# HabitatSampler issues
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues

## Issue #72: default values for increasing models and samples after classification fails
Author: Johannes Knoch · 2023-12-19T13:06:48+01:00
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/72

@romulo @carstenn
Arash and I are wondering why the default value for increasing the number of samples is so large (+50). Especially with regular sampling, increasing the sample count by just one already yields different samples, and the computation time increases considerably!
![sampling_time](/uploads/4e1dd8519b5e9db5caebcb0744027644/sampling_time.jpg)
Besides, the "increase number" of models is just 15, which in my experience does not lead to more models after the classification.
I usually use more than 30 models in one resampling step.
Are we missing something here? What were the intentions when those numbers were established?

Label: Future optimizations · Assignee: Romulo Pereira Goncalves

## Issue #57: The selection of the class should be independent of its order in the reference data
Author: Romulo Pereira Goncalves · 2022-07-04T17:40:49+02:00
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/57

The following code at [inner_procedure.r#L249](https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/blob/master/R-package/R/inner_procedure.r#L249)
```r
index <- which.max(dif[2,])
```
will pick the first class that has the maximum value. That means the order in which the classes are defined in the reference data dictates which class is selected. We printed the values for each run and saw the following:
```
[1] "The difference between classes:"
[1] "Deciduous_trees was 0.375"
[1] "Bare_ground was 0.3"
[1] "Xeric_grass was 0.3"
[1] "Fesh_meadow was 0.357142857142857"
[1] "Heather_mature was 0.375"
[1] "Heather_pioneer was 0.409090909090909"
[1] "Heather_scrubby was 0.409090909090909"
```
As we can see, `Heather_pioneer` will be selected, but if the order in the reference data were different, `Heather_scrubby` would be selected instead. We think that when more than one class has the maximum value, the user should re-sample. Would that be correct?
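To illustrate the tie behaviour: `which.max()` always returns the first index attaining the maximum, so the reference-data order breaks ties silently. A minimal sketch, where the `dif` vector simply re-creates the printed values from the log above:

```r
# Per-class difference values in reference-data order (copied from the log above)
dif <- c(
  Deciduous_trees = 0.375, Bare_ground = 0.3, Xeric_grass = 0.3,
  Fesh_meadow     = 0.357142857142857,
  Heather_mature  = 0.375,
  Heather_pioneer = 0.409090909090909,
  Heather_scrubby = 0.409090909090909
)

# which.max() returns only the FIRST index attaining the maximum:
names(which.max(dif))    # "Heather_pioneer", purely because it comes first

# Detecting all tied maxima, e.g. to warn the user or trigger a re-sample:
ties <- which(dif == max(dif))
names(ties)              # "Heather_pioneer" "Heather_scrubby"
length(ties) > 1         # TRUE -> the selection is ambiguous
```

Note that the `==` comparison is exact; if the values are computed rather than identical literals, a tolerance (e.g. `abs(dif - max(dif)) < 1e-12`) would be the safer tie test.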
Related to this, we still have a few questions:
1. We would like to understand what these values mean. Are they percentages?
2. In some cases the decision comes down to the fifth decimal digit. These values are also all below `0.4`; we have never seen them above `0.5`. Should the information above be shared with the user, to help them decide whether to accept the class selection or re-sample? This question depends on the meaning of the values, which is question 1.

Label: Future optimizations · Assignee: Carsten Neumann

## Issue #46: Idea: incorporate "expert area" to improve performance
Author: Julia Neelmeijer · 2022-04-19T15:02:31+02:00
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/46

@carstenn :
Classifying large areas with HaSa/MiSa.C is very slow. As the idea behind the tool is to incorporate expert knowledge, I was wondering whether it would make sense to define two different areas of interest:
1. the total image/extent that should be classified (can be large)
2. the "expert area" - this would be an AOI that is well known to the user (it could incorporate the reference points and probably also allow multi-polygons)
Idea: the entire model run / threshold-finding analysis would then be based only on the expert area, to save time and complexity. Once the final solution is found, the result would be deployed to the large area. I do understand that this requires the same imagery (data stack) and would not work if any images/bands etc. were changed.
Would that be possible?
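The core of the idea can be sketched in plain R; all names below are illustrative placeholders (a matrix standing in for the image stack, and hypothetical helper functions, not the actual HaSa/MiSa.C API):

```r
# Stand-in for the full image stack: a large numeric matrix (one band).
set.seed(42)
full_stack <- matrix(runif(1000 * 1000), nrow = 1000)  # 1. total extent

# 2. the "expert area": a small, well-known window of the image
expert_stack <- full_stack[1:100, 1:100]               # crop to the AOI

# The whole iterative model run / threshold finding would operate on
# expert_stack only (hypothetical helpers, not real HaSa functions):
# model <- find_thresholds(expert_stack, reference_points)

# Once the final solution is found, it is applied once to the full
# extent, which requires the same imagery/bands in both stacks:
# classified <- apply_model(full_stack, model)

length(expert_stack) / length(full_stack)   # 0.01 -> ~100x fewer pixels to iterate on
```

The speed-up in the iterative phase would scale roughly with the ratio of the two areas, since each model run touches every pixel of whichever stack it is given.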
@romulo : if this works it could be a real improvement for MiSa.C!

Label: Future optimizations · Assignee: Carsten Neumann