HabitatSampler issues
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues
2023-12-19T13:02:50+01:00

check data type of input image for processing time optimization
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/36
2023-12-19T13:02:50+01:00
Carsten Neumann

I noticed that we use a FLT4S (4-byte signed float) data type for our benchmark image. This is not necessary and slows down processing drastically! A smart solution would be to always automatically **rescale input imagery** to the range 0...10,000. However, this is not easy in the raster/rgdal *.tif environment, since changing values in memory does not affect the data type unless the raster is written to disk. **Any solution for changing the data type on the fly would be cool to speed up processing.** Have a look at our benchmark image to see what it means to artificially blow up an integer type to float! I guess this is already considered in the *json approach of the GUI.

Future optimizations
Carsten Neumann

Display error in browser plot after multitest
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/41
2022-04-19T14:56:16+02:00
Antonia Schönberg

When using multi_class_sampling with multitest=3, the following display errors occur in the browser plot:
- Negative values and values above the range of the probability in the legend.
- The range of the probability in the legend does not match the preview in R (see class "high").
However, this is only a display error, since the selected thresholds (determined by the mouse pointer) give the expected extent of the habitats.
Screenshots are attached.
![hasa_tresh](/uploads/0ea6d14ac4ce6df813bee09203a92822/hasa_tresh.PNG)
![hasa_treshB](/uploads/cf5c6ff2c95d38ca83f7e893ee36ae39/hasa_treshB.PNG)
![hasa_treshC](/uploads/fe286fe20b3ee04ca8eb780134fa598d/hasa_treshC.PNG)

Future optimizations

Idea: incorporate "expert area" to improve performance
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/46
2022-04-19T15:02:31+02:00
Julia Neelmeijer

@carstenn :
Classifying large areas with HaSa/MiSa.C is very slow. As the idea behind the tool is to incorporate expert knowledge I was wondering whether it would make sense to define two different areas of interest:
1. the total image/extent that should be classified (can be large)
2. the "expert area" - this would be an AOI that is well known to the user (could incorporate the reference points, probably allow also multi-polygons)
Idea: the entire model run / threshold finding analysis would then be based only on the expert area, to save time and reduce complexity. Once the final solution is found, the result would be deployed to the large area. I do understand that this requires the same imagery (data stack) and would not work if any images, bands and so on were changed.
Would that be possible?
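To make the proposal concrete, here is a minimal sketch in base R of "fit on the expert area, deploy to the full extent". Everything in it is an assumption for illustration: the pixel table, the `in_expert_area` mask and the logistic regression are stand-ins and are not how HaSa/MiSa.C builds its models.

```r
set.seed(42)

# Synthetic stand-in for the full image: one row per pixel, one column per band.
n_pixels <- 1000
pixels <- data.frame(band1 = runif(n_pixels), band2 = runif(n_pixels))

# Synthetic habitat label, used only so that a model can be fitted.
pixels$habitat <- as.integer(pixels$band1 + rnorm(n_pixels, sd = 0.2) > 0.5)

# Hypothetical "expert area": here simply the first 200 pixels.
in_expert_area <- seq_len(n_pixels) <= 200

# Model fitting / threshold finding uses the expert area only ...
model <- glm(habitat ~ band1 + band2,
             data = pixels[in_expert_area, ],
             family = binomial())

# ... and the fitted model is then applied to every pixel of the full extent.
probability <- predict(model, newdata = pixels, type = "response")
```

The fitted model only transfers to the full extent if both use exactly the same band stack, which is the constraint mentioned above.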
@romulo : if this works it could be a real improvement for MiSa.C!

Future optimizations
Carsten Neumann

The selection of the class should be independent of its order in the reference data.
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/57
2022-07-04T17:40:49+02:00
Romulo Pereira Goncalves

The following code at [inner_procedure.r#L249](https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/blob/master/R-package/R/inner_procedure.r#L249)
```
index <- which.max(dif[2,])
```
will pick the first class that has the maximum value. That means the order in which the classes are defined in the reference data dictates which class is selected. We printed the values for each run and saw the following:
```
[1] "The difference between classes:"
[1] "Deciduous_trees was 0.375"
[1] "Bare_ground was 0.3"
[1] "Xeric_grass was 0.3"
[1] "Fesh_meadow was 0.357142857142857"
[1] "Heather_mature was 0.375"
[1] "Heather_pioneer was 0.409090909090909"
[1] "Heather_scrubby was 0.409090909090909"
```
As we can see, `Heather_pioneer` will be selected, but if the order in the reference data were different, `Heather_scrubby` would be selected instead. We think that if more than one class has the maximum value, the user should re-sample. Would that be correct?
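The tie can be reproduced in base R with the values printed above. This is only a sketch: it assumes the values behave like a named numeric vector (in the package they come from `dif[2,]`), and `tied_classes` is a hypothetical helper, not an existing HaSa function.

```r
# Values copied from the printed output above (class names as printed).
dif <- c(Deciduous_trees = 0.375,
         Bare_ground     = 0.3,
         Xeric_grass     = 0.3,
         Fesh_meadow     = 0.357142857142857,
         Heather_mature  = 0.375,
         Heather_pioneer = 0.409090909090909,
         Heather_scrubby = 0.409090909090909)

# which.max() returns only the FIRST position of the maximum,
# so the result depends on the order of the classes.
first_max <- names(dif)[which.max(dif)]
first_max_reversed <- names(rev(dif))[which.max(rev(dif))]

# Hypothetical helper: every class that shares the maximum, with a small
# tolerance for floating-point noise.
tied_classes <- function(x, tol = 1e-9) {
  names(x)[abs(x - max(x)) <= tol]
}

ties <- tied_classes(dif)
needs_resample <- length(ties) > 1
```

With such a check, the caller could ask the user to re-sample whenever `needs_resample` is `TRUE` instead of silently taking the first class.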
Related to this, we still have a few questions:
1. We would like to understand what these values mean. Is it a percentage?
2. In some cases the decision is made at the fifth decimal digit. It also happens that these values are all below `0.4`, and we have never seen them above `0.5`. We wonder whether the information above should be shared with the user, to help them decide whether to accept the class selection or re-sample. This question depends on the meaning of the values, which is question 1.

Future optimizations
Carsten Neumann

Differences between saved .tif and .kmz files
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/65
2023-12-19T13:11:43+01:00
Johannes Knoch

@romulo
I was about to write a unit test for the save functions, so I compared a step_\*.tif file with a step_\*.kml file.
I was aware that transforming between CRSs causes a slight shift of the pixels.
But did you know that:
1. some pixels change class?
As you can see in the circled area, the shape of the pixel cluster is different, which means some pixels changed their class (or rather, their attribute value).
![tif_kml_pixel_change_class](/uploads/37c0067922b14b8c14a4c202ea632c6b/tif_kml_pixel_change_class.png)
------------------------------------------------------------------------------------------------------------------------------
2. The pixel values are saved in reverse order in the .kml file?
In the big circle on the right you can see that class "0" in step_01_decidiuous.tif is class "10" in step_01.kml, and the same holds for all the classes. Moreover, the step_01.kml file has one additional class (blue circle on the left) for "NA", which gets the value "0".
![reversed_order_pixel_values](/uploads/b1cf482b1125bb97aea3dc3fd50598c9/reversed_order_pixel_values.png)

Future optimizations
Johannes Knoch

Replace raster by terra package
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/70
2023-12-19T17:48:52+01:00
Romulo Pereira Goncalves

> [terra](https://github.com/rspatial/terra) replaces the raster package. The interfaces of terra and raster are similar, but terra is simpler, faster and can do more.

Future optimizations

default values for increasing models and samples after classification fails
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/72
2023-12-19T13:06:48+01:00
Johannes Knoch

@romulo @carstenn
Arash and I are wondering why the default increment for the number of samples is so large (+50). Especially with regular sampling, adding just one sample already leads to different samples, and the computation time increases a lot!
![sampling_time](/uploads/4e1dd8519b5e9db5caebcb0744027644/sampling_time.jpg)
Besides, the "increase number" of models is just 15, which in my experience does not lead to more models after the classification.
I usually use more than 30 models in one resampling step.
Are we missing something here? What was the intention when those numbers were chosen?

Future optimizations
Romulo Pereira Goncalves

BUG: returning 0 when predictive distance of last class is the same as class NA
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/76
2024-01-23T15:57:57+01:00
Johannes Knoch

The problem occurs when the last class has a predictive distance of 0.5 (the same as the "fake class" NA). In this case HaSa seems to return a zero (returns==0), which causes HaSa_API to assume that a valid result is coming from HaSa. As this is not the case, HaSa_API returns a cryptic error saying the "argument is of length zero".
HaSa should not return a zero in that case.
```plaintext
2024-01-23 15:14:14.362959 [INFO] classify: checking input
2024-01-23 15:14:14.363122 [INFO] Seed was set to: 1706019254
2024-01-23 15:14:14.363409 [INFO] runClassification: loading input data
2024-01-23 15:14:14.422804 [INFO] runClassification: call HaSa:sample_nb
[1] "Number of samples = 100 Number of models = 150"
GB
available : 7.15
60% : 4.29
needed : 0.01
allowed : 4.66 (if available)
[1] "It was possible to load the raster stack in_memory."
[1] "loading took 0.050469"
[1] "Matrix conversion took 0.000411"
|======================================================================| 100%[1] "sampling took 27.131906"
[1] "prediction took 0.491540"
[1] "The difference between classes:"
[1] "xeric_grass was 0.5"
[1] "NA was 0.5"
Warning in remove(points) : object 'points' not found
Warning in remove(mod_all) : object 'mod_all' not found
2024-01-23 15:14:42.244241 [INFO] classify: returned Error
2024-01-23 15:14:42.244341 [INFO] Error in if (returns == 0) {: argument is of length zero
```
![error_HaSa_return0_0.5_predict](/uploads/83d974c291a0c2f0839efa1f04ffca52/error_HaSa_return0_0.5_predict.png)
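
For reference, the message in the last log line is what base R raises when the condition of an `if` has length zero, for example when the compared value is `NULL`. The sketch below reproduces that and shows one possible defensive check. It assumes `returns` can arrive as `NULL` or another zero-length value, which is not confirmed here, and `is_zero_return` is a hypothetical helper that is not part of HaSa or HaSa_API.

```r
# NULL == 0 evaluates to logical(0), and if() rejects a zero-length condition.
err <- tryCatch(
  if (NULL == 0) "zero" else "non-zero",
  error = function(e) e
)

# Hypothetical helper: TRUE only for a single, non-missing value equal to 0.
is_zero_return <- function(returns) {
  length(returns) == 1 && !is.na(returns) && isTRUE(returns == 0)
}

# Unlike the bare comparison, the guarded check never fails on NULL or NA.
checks <- c(null = is_zero_return(NULL),
            na   = is_zero_return(NA),
            zero = is_zero_return(0),
            one  = is_zero_return(1))
```

A guard like this would only make the error less cryptic on the API side; whether HaSa should return something else when the last class ties with NA at 0.5 is the actual question of this issue.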