HabitatSampler issues
https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues
2024-01-11T10:12:05+01:00

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/67
remove function writeOutSamples
2024-01-11T10:12:05+01:00
Johannes Knoch
Right now the function `writeOutSamples()` does the same as `saveSamplePoints()` but slower, because it loads a saved _run_ file and does not use the variables already loaded into RAM.
@romulo
Can we deactivate or even remove it?
Or should we rewrite it so that it returns all samplePoints, regardless of the specified threshold?
Future optimizations
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/73
Rgdal is deprecated
2024-01-10T17:00:24+01:00
Sophia Horigan
Upgrade code to stop using rgeos, sp, and rgdal
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/75
Replace all sp functions by sf functions
2024-01-10T17:00:24+01:00
Romulo Pereira Goncalves
We should replace the following functions:
```R
sp::coordinates
sp::CRS
sp::is.projected
sp::proj4string
sp::SpatialPolygons
sp::spTransform
```
Upgrade code to stop using rgeos, sp, and rgdal
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/69
bugfix multi_Class_Sampling() and plot_results()
2022-10-17T14:49:33+02:00
Carsten Neumann
Two new bugs due to the R version change:
1. The condition `if (class(ref) == "SpatialPointsDataFrame")` in `multi_Class_Sampling()` does not work if the reference argument has two classes, e.g. `class(reference)` returns `"matrix" "array"`.
2. The function that checks `length(step*.kml) == length(step*.tif)` in `plot_results()` does not work if the `*.tif.aux.xml` file is also written to disk in the new raster or terra version.
Johannes Knoch

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/71
Pages are not updated
2022-10-17T13:49:46+02:00
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/66
3 feature requests: Display several band combinations and already extracted classes in one Leaflet window and less laborious way to continue at a certain step
2022-10-17T11:56:49+02:00
Irina Stockmann
1) The function multi_Class_Sampling allows the user to define only one band combination to be shown in the Leaflet window. However, for a better visual interpretation it would be very helpful if the user could define several band combinations within this function and if those combinations appeared in **one** Leaflet window where the user could switch between them in a similar way as in MiSa.C.
2) Only the current class (the one for which the threshold value has to be set) is displayed on the Leaflet window. However, for a better visual interpretation it would be helpful, if the user could see all already extracted classes and the currently considered class in **one** Leaflet window where the user could switch between these classes or could choose several classes to be displayed (also in a similar way as in MiSa.C).
3) If you stop at a certain step, close R and then continue with the next steps later, it would be less laborious if the user could just tell the function that the output folder contains the files needed to continue at a certain step.
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/68
R package and MiSa.C separation
2022-10-17T11:52:44+02:00
Carsten Neumann
The reason is that the R package installation is starting to get overloaded. There are too many dependencies, which makes the installation process slow and increases the potential for conflicts. For example, under Linux operating systems all the R dependencies require specific Linux libraries to be compiled. I got reports that users stop installing the HaSa package under Linux since too many Linux compile processes are required. Even on Windows it takes 5-10 minutes to install HaSa, which is unusual for R packages.
I have the feeling that many of the integrated R dependencies are used only by MiSa.C, or am I wrong? So my question is: would it make sense at a certain point to stop adding more and more dependencies to the R package that are not really needed for executing the algorithm itself, and to provide a minimalistic stable version of the R package that is then only updated if the algorithm itself is changed?
I have the feeling that the R package is already unnecessarily big with regard to dependencies!
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/63
Add unit tests and coverage reports to HaSa.
2022-10-13T15:32:45+02:00
Romulo Pereira Goncalves
Unit and Integration tests
Johannes Knoch

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/64
Bug when using Support Vector Machines
2022-05-30T22:24:58+02:00
Irina Stockmann
Whenever I choose Support Vector Machines as the machine learning algorithm, I get a result that looks like an error (see figure 1). However, when using Random Forest with the same data (see figure 2), I get plausible classification results (see figure 3).
Fig. 1
![2022-05-30_09_13_06-MiSa.C_Dashboard](/uploads/19870b1c3b5896117a2d2c8011711f0c/2022-05-30_09_13_06-MiSa.C_Dashboard.jpg)
Fig. 2
![2022-05-30_09_13_18-MiSa.C_Dashboard](/uploads/a94ea78eaa6d7308d68a8cd83b8a3e37/2022-05-30_09_13_18-MiSa.C_Dashboard.jpg)
Fig. 3
![2022-05-30_09_12_10-MiSa.C_Dashboard](/uploads/0a65dc2949b5deb098ae37aefc6483e7/2022-05-30_09_12_10-MiSa.C_Dashboard.jpg)
Unit and Integration tests
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/56
When using different predict functions we get different results.
2022-05-13T16:02:33+02:00
Romulo Pereira Goncalves
We saved the initial raster and a model. Then we decided to compare the results coming out of the predict functions of different packages: `randomForest:::predict.randomForest`, `stats::predict`, and `raster::predict`. For the first two, we need to convert the input raster to a matrix, and for that we used the `raster::as.matrix()` function.
Here are the results of our evaluation. Note that the factor returned by the first two functions needs to be reshaped and transposed, since the two representations differ in how the vector is built (one goes row by row, the other column by column).
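A minimal sketch of why the transpose is needed, using a hypothetical 2 x 3 grid instead of the real raster. It assumes that the predictions come back in row-by-row cell order, while `matrix()` fills column by column; all variable names are illustrative.

```r
# Hypothetical 2 x 3 grid standing in for one raster layer.
n_row <- 2
n_col <- 3
grid <- matrix(1:6, nrow = n_row, ncol = n_col, byrow = TRUE)

# Cell values in row-by-row order (assumed order of the prediction vector).
cell_values <- as.vector(t(grid))

# matrix() fills column by column, so build an n_col x n_row matrix
# and transpose it to recover the original layout.
rebuilt <- t(matrix(cell_values, n_col, n_row))

# Filling an n_row x n_col matrix directly scrambles the layout.
naive <- matrix(cell_values, n_row, n_col)

print(identical(rebuilt, grid))
print(identical(naive, grid))
```

This mirrors the `t(matrix(..., ncol(rast[[1]]), nrow(rast[[1]])))` pattern used in the snippets of this issue.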
1.
```
pred1 <- randomForest:::predict.randomForest(object = models[[1]], newdata = ma_rast, type = "response")
res1 <- t(matrix(as.numeric(pred1), ncol(rast[[1]]), nrow(rast[[1]])))
res1[!is.na(res1)]
```
`
12221111111112111111111112111111111111111222121222111111111112222222112222211111111211112222222222222222111111111211111112122222222221221111111111212211211111221122222222222212111111112222222222212221⋯22222222222222222211222222222222222222222222222222112222222222222222222222222211222222222222222222222112222222222222222222222222222222222222222211222222222221222221121222222222222211222222222222222222
`
2.
```
pred2 <- stats::predict(object = models[[1]], newdata = ma_rast, type = "response")
res2 <- t(matrix(as.numeric(pred2), ncol(rast[[1]]), nrow(rast[[1]])))
res2[!is.na(res2)]
```
`
12221111111112111111111112111111111111111222121222111111111112222222112222211111111211112222222222222222111111111211111112122222222221221111111111212211211111221122222222222212111111112222222222212221⋯22222222222222222211222222222222222222222222222222112222222222222222222222222211222222222222222222222112222222222222222222222222222222222222222211222222222221222221121222222222222211222222222222222222
`
3.
```
pred3 <- raster::predict(object = rast, model = models[[1]], type = "response")
res3 <- raster::as.matrix(pred3)
res3[!is.na(res3)]
```
`
22221111111112211111111112221111111111111222121222111111111112222222112222211111211211112222222222222222111111111221111112122222222221221111111111212211211111221122222222222211111111112222222222222221⋯22222222222222222211222222222222222222222222222222112222222222222222222222222211222222222222222222222112222222222222222222222222222222222222222211222222222221222221111222222222222211222222222222222222
`
As we can see, the results between `raster::predict` and the `randomForest::predict` slightly differ. @carstenn any idea why?
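To make the disagreement easier to quantify than by reading the digit strings, the result matrices could be compared cell by cell. The sketch below uses small hypothetical matrices (`res_a`, `res_b`) as stand-ins for two of the results above; it is not part of the original evaluation.

```r
# Hypothetical stand-ins for two of the result matrices (classes 1 and 2).
res_a <- matrix(c(1, 2, 2, 2,
                  1, 1, 2, NA,
                  2, 2, 1, 1), nrow = 3, byrow = TRUE)
res_b <- res_a
res_b[1, 1] <- 2   # introduce two disagreements on purpose
res_b[3, 3] <- 2

# Cells where both results are defined but assign different classes.
differs <- !is.na(res_a) & !is.na(res_b) & res_a != res_b

n_diff <- sum(differs)                        # number of disagreeing cells
where_diff <- which(differs, arr.ind = TRUE)  # their row/column positions

# Cross-tabulate the classes to see in which direction the disagreements go.
agreement <- table(a = res_a[!is.na(res_a)], b = res_b[!is.na(res_b)])

print(n_diff)
print(where_diff)
print(agreement)
```

The row/column positions in `where_diff` can then be checked against the raster, for example to see whether the differing cells cluster at the edges or at cells with missing bands.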
In case it is difficult to see, maybe this image helps.
![image](/uploads/894ab3c45142b04c0ce2a4cebc2d6290/image.png)
Carsten Neumann

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/21
Align the RMarkDown usage with the documentation.
2022-05-06T15:18:16+02:00
Romulo Pereira Goncalves
Align [Habitat_Sampler_Usage.Rmd](https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/blob/master/R-package/vignettes/Habitat_Sampler_Usage.Rmd) with the documentation.
Improve documentation
Alison Beamish

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/35
initial sample_type needs to be documented more specifically
2022-05-06T15:18:16+02:00
Carsten Neumann
SampleRegular ("regular") -> fast and regular: preferable at the beginning of the sampling procedure
SampleRandom ("random") -> only samples pixels with information!!! Preferable in the final steps, when only a few, irregularly distributed pixels remain; slow(!)
Improve documentation
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/60
documentation has two sections 4.1.2
2022-05-06T15:18:16+02:00
Daniela Rabe
Both deal with the probability map; the section 4.1.3 in between should be rearranged.
Improve documentation
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/62
Update documentation
2022-05-06T15:18:15+02:00
Romulo Pereira Goncalves
The following points should be covered in the documentation:
**Sampling**
1. There are three sampling methods. We need to explain each of them and their optimization strategies.
- `regular_raster`
- `random_raster`
- `random_matrix`
2. The `seed` for both `random_raster` and `random_matrix` should be set to a different value in each run, such as `seed=as.integer(Sys.time())`, unless the user wants reproducible results. Reproducibility is possible in two ways:
- At a specific step.
- An entire classification run.
3. In case it is not possible to find models, increasing the number of `init.samples` is not always the solution. The user should also try to re-sample so a new set of sample points is picked.
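The difference between a fixed and a time-based seed can be sketched as follows. This assumes that the `seed` argument ends up in a call to `set.seed()`, which the issue does not confirm; `draw_cells` is a hypothetical helper, not a HaSa function.

```r
# Hypothetical helper: draw sample cell indices from a raster with n_cells cells.
draw_cells <- function(n_cells, n_samples, seed) {
  set.seed(seed)  # assumption: the package seeds the RNG in a similar way
  sample.int(n_cells, n_samples)
}

# Reproducible: the same fixed seed yields the same sample points.
run_a <- draw_cells(n_cells = 10000, n_samples = 50, seed = 42L)
run_b <- draw_cells(n_cells = 10000, n_samples = 50, seed = 42L)
same_points <- identical(run_a, run_b)

# Not reproducible by design: a time-based seed changes from run to run.
time_seed <- as.integer(Sys.time())
run_c <- draw_cells(n_cells = 10000, n_samples = 50, seed = time_seed)

print(same_points)
print(time_seed)
```

Logging the time-based seed, as the last line does, keeps a run reproducible after the fact even when the seed was not fixed in advance.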
**Prediction**
1. For `randomForest` it is possible to set the number of trees, which should be 1/3 of the total number of predictors. For small values (below 100), the value should be odd so that the models can be used by different predict functions and remain reproducible between runs. Check the related issues for more information and for support material to add.
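The rule of thumb above could be written as a small helper. This is a sketch only: `suggest_ntree` is a hypothetical name, and rounding up to the next odd number is one possible reading of "the value should be odd".

```r
# Hypothetical helper implementing the rule of thumb from the item above.
suggest_ntree <- function(n_predictors) {
  # One third of the number of predictors, at least one tree.
  ntree <- max(1L, as.integer(round(n_predictors / 3)))
  # Below 100 trees, move to the next odd number.
  if (ntree < 100L && ntree %% 2L == 0L) {
    ntree <- ntree + 1L
  }
  ntree
}

print(suggest_ntree(9))    # one third is already odd
print(suggest_ntree(30))   # even value below 100 is bumped to the next odd one
print(suggest_ntree(600))  # values of 100 or more are left unchanged
```

The returned value would then be passed on as the number of trees for `randomForest`.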
**Classification**
1. Information to cover issue #57
2. Add information about issue #61
**Overall**
1. Describe the optimizations used in the `optimized_mode` operation mode.
Improve documentation

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/39
We no longer use velox, remove BH version pin
2022-05-02T11:23:43+02:00
Romulo Pereira Goncalves
Unit and Integration tests
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/48
Improve understanding and documentation for sampling point results
2022-04-28T09:01:26+02:00
Robert Behling
I have a few questions regarding the sampling points that MiSa.C produces and that can be viewed and downloaded in the result panel.
1. Why does the number of samples increase heavily from the first class run to the last class run, while fewer pixels are available for sampling? Below are two figures showing the sample point distribution of the first and last class. The first class has 2204 points, and the last class 21004 points.
2. In the geojson file all points are listed (see example figure below). The points can be differentiated by "nam", which is either 1 or 2: 2 means correctly predicted and 1 not. But there are X models, so the question is: to which model do these points refer?
I will ask Carsten again and will update some points here, but we still need to include the details in the documentation afterwards.
<details><summary>Figures: (Click to expand)</summary>
sample point distribution first class: ![image](/uploads/4326c4d3b5207222752d0775f138d17d/image.png)
sample point distribution last class : ![image](/uploads/2273f1ae77ef5aaaf854a963a7eb62c3/image.png)
samplePointsJson: ![image](/uploads/e46b66043f856a355f1261e64d32461c/image.png)
</details>
Improve documentation
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/2
We should use saveRDS and readRDS when working with single objects
2022-04-19T15:29:49+02:00
Romulo Pereira Goncalves
It makes the code easier to read. We should also add extensions to the files so we know how the object was saved.
Future optimizations
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/12
Create a runner for Unit tests and the setup of Unit tests.
2022-04-19T15:06:42+02:00
Romulo Pereira Goncalves
We should have everything in place to have unit tests. We should also create one as an example.
Unit and Integration tests
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/52
Mask out NaN across all the layers in the stack at loading time
2022-04-19T15:05:42+02:00
Romulo Pereira Goncalves
Improve documentation
Romulo Pereira Goncalves

https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/issues/44
Reclassification code is slow for high resolution images
2022-04-19T14:57:01+02:00
Romulo Pereira Goncalves
We are able to sample and predict quite fast; however, [this piece of code](https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler/-/blob/master/R-package/R/inner_procedure.r#L290) is really slow.
```
for (i in ch) {
  if (j == 1) {
    result1 <- raster::predict(object = raster, model = models[[i]])
    if (is.na(switch[i]) == F) {
      result1 <-
        raster::reclassify(result1, rbind(c(0.5, 1.5, 2), c(1.6, 2.5, 1)))
    }
  } else {
    result1 <-
      raster::stack(result1, raster::predict(object = raster, model = models[[i]]))
    if (is.na(switch[i]) == F) {
      result1[[j]] <-
        raster::reclassify(result1[[j]], rbind(c(0.5, 1.5, 2), c(1.6, 2.5, 1)))
    }
  }
  #print(j)
  j <- j + 1
}
```
@carstenn is this code responsible for the classification? All the previous code did the sampling and prediction, but this one does the final classification, is that right?
The issue is that `raster::predict()` is too slow for high-resolution rasters. I have updated the code to use the raster in memory (413234792aae5a8c36ecaf8550399dc98be2df5c), but even with the raster in memory `raster::predict()` is too slow. We could partition the raster as suggested [here](https://gis.stackexchange.com/questions/206822/speed-up-maxent-prediction-in-r/206905).
Future optimizations
Carsten Neumann
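The partitioning idea from issue 44 amounts to splitting the raster into blocks and predicting each block separately. A minimal base-R sketch of that idea follows, using a hypothetical cells-by-bands data frame and a stand-in `glm` model instead of the real raster and HaSa models; `predict_in_blocks` is an illustrative name, not a package function.

```r
# Hypothetical data: a "raster" flattened to a cells x bands data frame.
set.seed(1)
n_cells <- 1000
cells <- data.frame(b1 = runif(n_cells), b2 = runif(n_cells))

# Stand-in model; in HaSa this would be one of the fitted models.
train <- data.frame(b1 = runif(200), b2 = runif(200))
train$y <- rbinom(200, 1, plogis(4 * (train$b1 + train$b2 - 1)))
model <- glm(y ~ b1 + b2, data = train, family = binomial())

# Predict block by block, so each call only handles block_size cells.
predict_in_blocks <- function(model, newdata, block_size) {
  starts <- seq(1, nrow(newdata), by = block_size)
  parts <- lapply(starts, function(s) {
    rows <- s:min(s + block_size - 1, nrow(newdata))
    predict(model, newdata = newdata[rows, , drop = FALSE], type = "response")
  })
  unlist(parts, use.names = FALSE)
}

blocked <- predict_in_blocks(model, cells, block_size = 128)
whole <- unname(predict(model, newdata = cells, type = "response"))

# Both approaches should give the same predictions.
print(all.equal(blocked, whole))
```

Because the blocks are independent, the `lapply()` call is the place where the work could be handed to several workers; whether that actually speeds up `raster::predict()` on the real data would need to be measured.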