diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index d1c9f52d9ce8f3effcf9b709c0838f8097a206dd..d1219a145a0040462e1c7924b5b6c42f52c4f8b5 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -23,8 +23,8 @@ pages: # this job must be called 'pages' to advise GitLab to upload content to - mkdir -p public/images/ # Copy over the docs - - cp -r docs/*.html public/index.html - - cp -r docs/images/* public/images/ + - cp -r R-package/hasa/vignettes/*.html public/index.html + - cp -r R-package/hasa/vignettes/images/* public/images/ # Check if everything is working great - ls -al public @@ -36,5 +36,5 @@ pages: # this job must be called 'pages' to advise GitLab to upload content to expire_in: 30 days only: - master - - documentation + - improve_documentation diff --git a/docs/HabitatSampler.Rmd b/R-package/hasa/vignettes/HabitatSampler.Rmd similarity index 84% rename from docs/HabitatSampler.Rmd rename to R-package/hasa/vignettes/HabitatSampler.Rmd index 65c7b9fcc29c699480614c0fbb4f277369c282e2..1d69e595772c5f263f80575f99d343055eecfef6 100644 --- a/docs/HabitatSampler.Rmd +++ b/R-package/hasa/vignettes/HabitatSampler.Rmd @@ -1,16 +1,15 @@ --- title: "An Introduction to Habitat Sampler" author: "Carsten Neumann, Alison Beamish, Romulo Goncalves" -date: "01/06/2021" +date: "`r Sys.Date()`" output: - pdf_document: - toc: true - toc_depth: 2 md_document: - pandoc_args: ["--output", "README.md"] toc: true toc_depth: 2 variant: gfm + pdf_document: + toc: true + toc_depth: 2 html_document: theme: united highlight: tango @@ -33,23 +32,23 @@ knitr::opts_chunk$set(tidy.opts = list(width.cutoff = 75), tidy = TRUE, fig.pos \newpage # 1 Introduction -This manual introduces the Habitat Sampler (HaSa), an innovative tool that autonomously generates representative reference samples for predictive modelling of surface class probabilities. The tool can be applied to any image data that displays surface structures and dynamics of any kind at multiple spatial and temporal scales. 
HaSa was initially developed to classify habitat dynamics in semi-natural ecosystems but the procedure can theoretically be applied to any surface. The main innovation of the tool is that it reduces reliance on comprehensive in situ ground truth data or comprehensive training datasets which constrain accurate image classification particularly in complex scenes. +This manual introduces the Habitat Sampler (HaSa), an innovative tool that autonomously generates representative reference samples for predictive modeling of surface class probabilities. The tool can be applied to any image data that displays surface structures and dynamics of any kind at multiple spatial and temporal scales. HaSa was initially developed to classify habitat dynamics in semi-natural ecosystems, but the procedure can theoretically be applied to any surface. The main innovation of the tool is that it reduces reliance on comprehensive in situ ground truth data or comprehensive training data sets which constrain accurate image classification particularly in complex scenes. -Though development of HaSa has prioritized ease of use, this documentation assume a familiarity with the R software. The document is built successively and is intended to lead you step-by-step through the HaSa procedure of generating probability and classification maps. HaSa is still in development and any suggestions or improvements are welcomed and encouraged in our [GitHub Community Version](https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler.git). If questions remain please don't hesitate to contact the authors of the package. For a detailed description of the Habitat Sampler and its applications, see [Neumann et al., (2020)](https://doi.org/10.1111/ddi.13165). +Though development of HaSa has prioritized ease of use, this documentation assumes a familiarity with the R software. 
The document is built successively and is intended to lead you step-by-step through the HaSa procedure of generating probability and classification maps. HaSa is still in development and any suggestions or improvements are welcomed and encouraged in our [GitLab Community Version](https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler.git). If questions remain please don't hesitate to contact the authors of the package. For a detailed description of the Habitat Sampler and its applications, see [Neumann et al., (2020)](https://doi.org/10.1111/ddi.13165). ## 1.1 Usage The tool is implemented in R and uses Leaflet [(Cheng et al., 2019)](https://rdrr.io/cran/leaflet/) to generate interactive maps in a web browser. There are no assumptions about the input image data and there are no constraints for the spectral-temporal-spatial domain in which the image is sampled. The tool requires the input of a priori expert user knowledge to generate reference data about expected surface classes which are delineated in the imagery or extracted from an external spectral library. The user has the choice between image classifiers [random forest](https://doi.org/10.1023/A:1010933404324) (RF) and [support vector](https://doi.org/10.1145/130385.130401) (SV). ## 1.2 Sample datasets -The examples in this documentation use an L2A Sentinel-2 timeseries stack from 2018 (6 days, 9 bands each) and reference points that are included in the HaSa package. This Sentinel-2 data are from the Kyritz-Ruppiner Heide a former military training area north east of Berlin, Germany. The open heathlands in the former military training area are designated protected areas under the European Natura 2000 network and are subject to management activities including tree removal, controlled burning and machine mowing. The reference data include 7 classes identified with a priori expert knowledge. 
+The examples in this documentation use an L2A Sentinel-2 timeseries stack from 2018 (6 days, 9 bands each) and reference points that are included in the HaSa package. This Sentinel-2 data are from the Kyritz-Ruppiner Heide a former military training area northeast of Berlin, Germany. The open heathlands in the former military training area are designated protected areas under the European Natura 2000 network and are subject to management activities including tree removal, controlled burning and machine mowing. The reference data include 7 classes identified with a priori expert knowledge. -The Sentinel-2 data are downloaded and processed using the German Centre for Geosciences (GFZ) Time Series System for Sentinel-2 (GTS2). Data are made available and atmospherically corrected via a simple web API. Detailed information on the GTS2 system can be found [here](https://www.gfz-potsdam.de/gts2/). The metadata of the Senitnel-2 data including the band ID in the timeseries stack are provided below (Table 1). +The Sentinel-2 data are downloaded and processed using the German Centre for Geosciences (GFZ) Time Series System for Sentinel-2 (GTS2). Data are made available and atmospherically corrected via a simple web API. Detailed information on the GTS2 system can be found [here](https://www.gfz-potsdam.de/gts2/). The metadata of the Sentinel-2 data including the band ID in the timeseries stack are provided below (Table 1). 
```{r S2 metadata, eval = TRUE, echo=FALSE, message = FALSE, warning = FALSE} library(tools) #install.packages("kableExtra") library(kableExtra) -wd <- paste(tools::file_path_as_absolute("./"), "/../demo/", sep = "") +wd <- paste(tools::file_path_as_absolute("./"), "/../../../demo/", sep = "") metadat <- read.csv(paste(wd, "Data/S2_stack_metadata.csv", sep = ""), header = T, sep = ",") colnames(metadat) <- c("", "Band 2", "Band 3", "Band 4", "Band 5", "Band 6", "Band 7", "Band 8", "Band 11", "Band 12") metadat[1,] <- c("Date","Blue", "Green", "Red", "Red Edge 1", "Red Edge 2", "Red Edge 3", "NIR", "SWIR 1", "SWIR 2") @@ -85,7 +84,7 @@ remotes::install_git("https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler. ``` ## 2.3 Load HaSa -Before the user starts using `HaSa` it is necessary to load the library and some of its dependencies. In the following example, the libraries to be loaded are passed as a list. The option `options("rgdal_show_exportToProj4_warnings"="none")` is to supress the warning messages related with the latest changes in `gdal` and `PROJ6`. +Before the user starts using `HaSa` it is necessary to load the library and some of its dependencies. In the following example, the libraries to be loaded are passed as a list. The option `options("rgdal_show_exportToProj4_warnings"="none")` is to suppress the warning messages related with the latest changes in `gdal` and `PROJ6`. ```{r load libraries, echo = TRUE, results = 'hide', message = FALSE, warning = FALSE} options("rgdal_show_exportToProj4_warnings" = "none") libraries <- c("rgdal","raster","maptools","spatialEco","randomForest","e1071", @@ -99,7 +98,7 @@ An important step preceding classification is to load the Sentinel-2 satellite t ## 3.1 Data directories Before loading the input data and using `HaSa`, the user needs to define a series of directory paths. They are from where `HaSa` will read input data, and store intermediates and final results. 
These directory paths are relative to the working directory path, i.e., `wd`. The following code sets all the paths assuming that the root path is the current directory, i.e., the `demo` directory. ```{r set wd, eval = TRUE} -wd <- paste(tools::file_path_as_absolute("./"), "/../demo/", sep = "") +wd <- paste(tools::file_path_as_absolute("./"), "/../../../demo/", sep = "") #demo data stored here dataPath <- paste(wd,"Data/", sep = "") @@ -112,7 +111,7 @@ raster::rasterOptions(tmpdir = "./RasterTmp/") ``` ## 3.2 Satellite timeseries stack -The satellite time series is either passed as a **3.2.1** stack of images already clipped or **3.2.2** a stack of image to be clipped. In both cases, the input Satellite images needs to either have a valid projection or the projection be passed as parameter, i.e., `sat_crs_str = '+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs'`, otherwise, the function will report error. Satellite time series data are available in `dataPath`. +The satellite time series is either passed as a **3.2.1** stack of images already clipped or **3.2.2** a stack of image to be clipped. In both cases, the input satellite images needs to either have a valid projection or the projection be passed as parameter, i.e., `sat_crs_str = '+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs'`, otherwise, the function will report error. Satellite time series data are available in `dataPath`. 
```{r handling image data clipped, eval = TRUE, results= "hide", message = FALSE} satellite_series_path <- paste(dataPath,"SentinelStack_2018.tif",sep = "") @@ -207,10 +206,11 @@ Class sampling input HaSa::multi_Class_Sampling( in_raster = raster_stk, # clipped satellite time series stack [raster brick] num_samples = 75, # starting number of spatial samples (recommended value: 75) + # for more info *See note 1 sample_type = "regular_raster", # distribution of spatial samples # ("random_raster","regular_raster", "matrix_random") - # recommended: "regular_raster" *See note 1 + # recommended: "regular_raster" *See note 2 num_models = 200, # number of models to collect (recommended value: 200) num_iterations = 10, # number of iterations for model accuracy # (recommended value:10) @@ -222,23 +222,24 @@ HaSa::multi_Class_Sampling( # recommended input: rf) num_trees = 500, # if the model is "rf" set the number of trees for the # random forest algorithm, otherwise, the parameter is - # ignored *See note 2 + # ignored *See note 3 mtry = 10, # splitting nodes (recommended: mtry < number of predictors) mod_error = 0.02, # threshold for model error until which iteration is being last = FALSE, # only TRUE for one class classifier - # recommended input: FALSE *See note 3 + # recommended input: FALSE *See note 4 last_ref_val = 1000, # default reference value for the last step (default: 1000) + # for more info *See note 5 seed = - as.integer(Sys.time()), # set seed for reproducible results *See note 4 + as.integer(Sys.time()), # set seed for reproducible results *See note 6 init_seed = "sample", # "sample" for new or use Run@seeds to reproduce previous - # steps *See note 5 + # steps *See note 7 out_path = out_path, # output path for saving results step = 1, # at which step should the procedure start, e.g. 
use # step = 2 if the first habitat is already extracted class_names = class_names,# vector with class names in the order of reference spectra n_classes = 7, # total number of classes to be separated multi_test = 1, # number of test runs to compare different probability - # output *See note 6 + # output *See note 8 RGB = c(19,20,21), # pallette colors for the interactive plots color = c("lightgrey", "orange", "yellow", "limegreen", "forestgreen") @@ -254,12 +255,14 @@ HaSa::multi_Class_Sampling( ) ``` -* **Note 1**: There are threes sampling strategies: `random_raster` (it uses `raster::sampleRandom`), `raster_regular` (it uses `raster::sampleRegular`), and `random_matrix` (it uses matrices and the `stats::sample` function over only existent non NaN pixels). -* **Note 2**: The default value is 500, for small number of trees (at least 1/3 of the number of predictors) use an odd number for precise prediction results. -* **Note 3**: The argument `last = T` can be set when only one class should be separated from the background pixels -* **Note 4**: For different results the `seed` should have a different value on each run (use `seed = as.numeric(Sys.time())`). To repeat a specific run the user just needs to restart R, run again the `HaSa::multi_Class_Sampling` with the same arguments, this is, keep `seed` constant, and it will get the same results for all the steps. -* **Note 5**: The results from previous steps are reproducible when using the same seed value and `int.seed=Run@seeds` (e.g. Run02@seeds) in consequence, `init.sample` for regular sampling determines an invariant sample distribution, use `random` sampling or vary `init.sample` to get varying sample distributions. -* **Note 6**: If `multi_test > 1` the user will get multiple maps and will be asked to enter the number of the probability distribution that is appropriate. 
+* **Note 1**: In case it is not possible to find models, increasing the number of `num_samples` and `num_models` is not always the solution. The user should also try to re-sample with a different `seed` value. +* **Note 2**: There are three sampling strategies: `random_raster` (it uses `raster::sampleRandom`), `raster_regular` (it uses `raster::sampleRegular`), and `random_matrix` (it uses matrices and the `stats::sample` function over only existent non `NaN` pixels). The `regular_raster` -> fast: preferable at the beginning of sampling procedure. The `random_raster` -> slow: it only samples pixels with information and it is preferable to use at the final steps with few and irregular distributed pixels. The `random_matrix` -> fast: it only samples pixels with information and it is preferable to use at the final steps with few and irregular distributed pixels. +* **Note 3**: The default value is 500, for small number of trees (at least 1/3 of the number of predictors) use an odd number for precise prediction results. +* **Note 4**: The argument `last = T` can be set when only one class should be separated from the background pixels +* **Note 5**: The reference data for the pseudo class in the last step is built using the value of `last_ref_val`. In case the user gets NA as last class, the user should adjust the value of `last_ref_val` and re-sample again. +* **Note 6**: For different results the `seed` should have a different value on each run (use `seed = as.numeric(Sys.time())`). To repeat a specific run the user just needs to restart R, run again the `HaSa::multi_Class_Sampling` with the same arguments, this is, keep `seed` constant, and it will get the same results for all the steps. +* **Note 7**: The results from previous steps are reproducible when using the same seed value and `int.seed=Run@seeds` (e.g. 
Run02@seeds) in consequence, `init.sample` for regular sampling determines an invariant sample distribution, use `random` sampling or vary `init.sample` to get varying sample distributions. +* **Note 8**: If `multi_test > 1` the user will get multiple maps and will be asked to enter the number of the probability distribution that is appropriate. Habitat sampling output. R object that contains: ```r @@ -273,7 +276,7 @@ Habitat sampling output. R object that contains: seeds # seeds to reproduce respective step/habitat type sampling ``` -### 4.1.2 Interactive probability maps and downloading output +### 4.1.4 Interactive probability maps and downloading output An interactive map is plotted in a web browser (e.g., Firefox for Linux) containing a selected habitat type. The number of models predicting this habitat type can be viewed by hovering the mouse over the map. ![](./images/inter_map_ex_new.png){width=65%} diff --git a/docs/HabitatSampler.html b/R-package/hasa/vignettes/HabitatSampler.html similarity index 99% rename from docs/HabitatSampler.html rename to R-package/hasa/vignettes/HabitatSampler.html index e3a732a22902dcabaf7cba61940324ba92288c0a..767ca6e2875308b685b1200e9e092f11429a59db 100644 --- a/docs/HabitatSampler.html +++ b/R-package/hasa/vignettes/HabitatSampler.html @@ -11,7 +11,7 @@ - + An Introduction to Habitat Sampler @@ -2863,7 +2863,7 @@ div.tocify {

An Introduction to Habitat Sampler

Carsten Neumann, Alison Beamish, Romulo Goncalves

-

01/06/2021

+

2022-05-05

@@ -2871,8 +2871,8 @@ div.tocify {

1 Introduction

-

This manual introduces the Habitat Sampler (HaSa), an innovative tool that autonomously generates representative reference samples for predictive modelling of surface class probabilities. The tool can be applied to any image data that displays surface structures and dynamics of any kind at multiple spatial and temporal scales. HaSa was initially developed to classify habitat dynamics in semi-natural ecosystems but the procedure can theoretically be applied to any surface. The main innovation of the tool is that it reduces reliance on comprehensive in situ ground truth data or comprehensive training datasets which constrain accurate image classification particularly in complex scenes.

-

Though development of HaSa has prioritized ease of use, this documentation assume a familiarity with the R software. The document is built successively and is intended to lead you step-by-step through the HaSa procedure of generating probability and classification maps. HaSa is still in development and any suggestions or improvements are welcomed and encouraged in our GitHub Community Version. If questions remain please don’t hesitate to contact the authors of the package. For a detailed description of the Habitat Sampler and its applications, see Neumann et al., (2020).

+

This manual introduces the Habitat Sampler (HaSa), an innovative tool that autonomously generates representative reference samples for predictive modeling of surface class probabilities. The tool can be applied to any image data that displays surface structures and dynamics of any kind at multiple spatial and temporal scales. HaSa was initially developed to classify habitat dynamics in semi-natural ecosystems but the procedure can theoretically be applied to any surface. The main innovation of the tool is that it reduces reliance on comprehensive in situ ground truth data or comprehensive training data sets which constrain accurate image classification particularly in complex scenes.

+

Though development of HaSa has prioritized ease of use, this documentation assumes a familiarity with the R software. The document is built successively and is intended to lead you step-by-step through the HaSa procedure of generating probability and classification maps. HaSa is still in development and any suggestions or improvements are welcomed and encouraged in our GitLab Community Version. If questions remain, please don’t hesitate to contact the authors of the package. For a detailed description of the Habitat Sampler and its applications, see Neumann et al., (2020).

1.1 Usage

The tool is implemented in R and uses Leaflet (Cheng et al., 2019) to generate interactive maps in a web browser. There are no assumptions about the input image data and there are no constraints for the spectral-temporal-spatial domain in which the image is sampled. The tool requires the input of a priori expert user knowledge to generate reference data about expected surface classes which are delineated in the imagery or extracted from an external spectral library. The user has the choice between image classifiers random forest (RF) and support vector (SV).

@@ -3005,7 +3005,7 @@ div.tocify {

2.3 Load HaSa

-

Before the user starts using HaSa it is necessary to load the library and some of its dependencies. In the following example, the libraries to be loaded are passed as a list. The option options("rgdal_show_exportToProj4_warnings"="none") is to supress the warning messages related with the latest changes in gdal and PROJ6.

+

Before the user starts using HaSa, it is necessary to load the library and some of its dependencies. In the following example, the libraries to be loaded are passed as a list. The option options("rgdal_show_exportToProj4_warnings"="none") is to suppress the warning messages related to the latest changes in gdal and PROJ6.

options(rgdal_show_exportToProj4_warnings = "none")
 libraries <- c("rgdal", "raster", "maptools", "spatialEco", "randomForest", 
     "e1071", "devtools", "fasterize", "rgeos", "leaflet", "htmlwidgets", "IRdisplay", 
@@ -3019,7 +3019,7 @@ div.tocify {
 

3.1 Data directories

Before loading the input data and using HaSa, the user needs to define a series of directory paths. They are from where HaSa will read input data, and store intermediates and final results. These directory paths are relative to the working directory path, i.e., wd. The following code sets all the paths assuming that the root path is the current directory, i.e., the demo directory.

-
wd <- paste(tools::file_path_as_absolute("./"), "/../demo/", sep = "")
+
wd <- paste(tools::file_path_as_absolute("./"), "/../../../demo/", sep = "")
 
 # demo data stored here
 dataPath <- paste(wd, "Data/", sep = "")
@@ -3178,58 +3178,62 @@ div.tocify {
 
HaSa::multi_Class_Sampling(
     in_raster = raster_stk,   # clipped satellite time series stack [raster brick]
     num_samples = 75,         # starting number of spatial samples (recommended value: 75)
-    sample_type = 
-        "regular_raster",     # distribution of spatial samples 
-                              # ("random_raster","regular_raster", "matrix_random")
-                              # recommended: "regular_raster" *See note 1
-    num_models = 200,         # number of models to collect (recommended value: 200)
-    num_iterations = 10,      # number of iterations for model accuracy
-                              # (recommended value:10)
-    buffer = 10,              # distance (in m) for new sample collection around initial
-                              # samples (depends on pixel size and image resolution)
-    reference = ref,          # table of reference spectra [data.frame]
-    model = "rf",             # which machine learning algorithm to use ("rf" random
-                              # forest or "svm" support vector machine;
-                              # recommended input: rf)
-    num_trees = 500,          # if the model is "rf" set the number of trees for the
-                              # random forest algorithm, otherwise, the parameter is
-                              # ignored *See note 2
-    mtry = 10,                # splitting nodes (recommended: mtry < number of predictors)
-    mod_error = 0.02,         # threshold for model error until which iteration is being
-    last = FALSE,             # only TRUE for one class classifier
-                              # recommended input: FALSE *See note 3
-    last_ref_val = 1000,      # default reference value for the last step (default: 1000)
-    seed = 
-      as.integer(Sys.time()), # set seed for reproducible results *See note 4
-    init_seed = "sample",     # "sample" for new or use Run@seeds to reproduce previous
-                              # steps *See note 5
-    out_path = out_path,      # output path for saving results
-    step = 1,                 # at which step should the procedure start, e.g. use 
-                              # step = 2 if the first habitat is already extracted
-    class_names = class_names,# vector with class names in the order of reference spectra
-    n_classes = 7,            # total number of classes to be separated
-    multi_test = 1,           # number of test runs to compare different probability
-                              # output *See note 6
-    RGB = c(19,20,21),        # pallette colors for the interactive plots
-    color = 
-        c("lightgrey", "orange", "yellow", "limegreen", "forestgreen")
-                              #  single colors for continuous color palette interpolation
-    in_memory = TRUE,         # boolean for raster processing (memory = "TRUE", 
-                              # from disk = "FALSE")
-    optimized_mode = TRUE     # use the optimized mode (run in_memory if possible
-                              # and use matrices instead of rasters)
-    overwrite = TRUE,         # overwrite the KML and raster files from previous runs
-    save_runs = TRUE,         # an class object is saved into disk for each run 
-                              # (default TRUE)
-    plot_on_browser = FALSE   # plot on the browser or inline in a notebook (default TRUE)
-    )
+                              # for more info *See note 1
+    sample_type =
+        "regular_raster",     # distribution of spatial samples
+                              # ("random_raster","regular_raster", "matrix_random")
+                              # recommended: "regular_raster" *See note 2
+    num_models = 200,         # number of models to collect (recommended value: 200)
+    num_iterations = 10,      # number of iterations for model accuracy
+                              # (recommended value:10)
+    buffer = 10,              # distance (in m) for new sample collection around initial
+                              # samples (depends on pixel size and image resolution)
+    reference = ref,          # table of reference spectra [data.frame]
+    model = "rf",             # which machine learning algorithm to use ("rf" random
+                              # forest or "svm" support vector machine;
+                              # recommended input: rf)
+    num_trees = 500,          # if the model is "rf" set the number of trees for the
+                              # random forest algorithm, otherwise, the parameter is
+                              # ignored *See note 3
+    mtry = 10,                # splitting nodes (recommended: mtry < number of predictors)
+    mod_error = 0.02,         # threshold for model error until which iteration is being
+    last = FALSE,             # only TRUE for one class classifier
+                              # recommended input: FALSE *See note 4
+    last_ref_val = 1000,      # default reference value for the last step (default: 1000)
+                              # for more info *See note 5
+    seed =
+      as.integer(Sys.time()), # set seed for reproducible results *See note 6
+    init_seed = "sample",     # "sample" for new or use Run@seeds to reproduce previous
+                              # steps *See note 7
+    out_path = out_path,      # output path for saving results
+    step = 1,                 # at which step should the procedure start, e.g. use
+                              # step = 2 if the first habitat is already extracted
+    class_names = class_names,# vector with class names in the order of reference spectra
+    n_classes = 7,            # total number of classes to be separated
+    multi_test = 1,           # number of test runs to compare different probability
+                              # output *See note 8
+    RGB = c(19,20,21),        # pallette colors for the interactive plots
+    color =
+        c("lightgrey", "orange", "yellow", "limegreen", "forestgreen")
+                              #  single colors for continuous color palette interpolation
+    in_memory = TRUE,         # boolean for raster processing (memory = "TRUE",
+                              # from disk = "FALSE")
+    optimized_mode = TRUE     # use the optimized mode (run in_memory if possible
+                              # and use matrices instead of rasters)
+    overwrite = TRUE,         # overwrite the KML and raster files from previous runs
+    save_runs = TRUE,         # an class object is saved into disk for each run
+                              # (default TRUE)
+    plot_on_browser = FALSE   # plot on the browser or inline in a notebook (default TRUE)
+    )
    -
  • Note 1: There are threes sampling strategies: random_raster (it uses raster::sampleRandom), raster_regular (it uses raster::sampleRegular), and random_matrix (it uses matrices and the stats::sample function over only existent non NaN pixels).
  • -
  • Note 2: The default value is 500, for small number of trees (at least 1/3 of the number of predictors) use an odd number for precise prediction results.
  • -
  • Note 3: The argument last = T can be set when only one class should be separated from the background pixels
  • -
  • Note 4: For different results the seed should have a different value on each run (use seed = as.numeric(Sys.time())). To repeat a specific run the user just needs to restart R, run again the HaSa::multi_Class_Sampling with the same arguments, this is, keep seed constant, and it will get the same results for all the steps.
  • -
  • Note 5: The results from previous steps are reproducible when using the same seed value and int.seed=Run@seeds (e.g. Run02@seeds) in consequence, init.sample for regular sampling determines an invariant sample distribution, use random sampling or vary init.sample to get varying sample distributions.
  • -
  • Note 6: If multi_test > 1 the user will get multiple maps and will be asked to enter the number of the probability distribution that is appropriate.
  • +
  • Note 1: In case it is not possible to find models, increasing the number of num_samples and num_models is not always the solution. The user should also try to re-sample with a different seed value.
  • +
  • Note 2: There are three sampling strategies: random_raster (it uses raster::sampleRandom), regular_raster (it uses raster::sampleRegular), and random_matrix (it uses matrices and the stats::sample function over only existent non NaN pixels). regular_raster is fast and preferable at the beginning of the sampling procedure. random_raster is slow; it samples only pixels with information and is preferable at the final steps, when few and irregularly distributed pixels remain. random_matrix is fast; it likewise samples only pixels with information and is preferable at the final steps, when few and irregularly distributed pixels remain.
  • +
  • Note 3: The default value is 500. For a small number of trees (at least 1/3 of the number of predictors), use an odd number for precise prediction results.
  • +
  • Note 4: The argument last = T can be set when only one class should be separated from the background pixels
  • +
  • Note 5: The reference data for the pseudo class in the last step is built using the value of last_ref_val. If the user gets NA as the last class, the user should adjust the value of last_ref_val and re-sample.
  • +
  • Note 6: For different results the seed should have a different value on each run (use seed = as.numeric(Sys.time())). To repeat a specific run, the user just needs to restart R and run HaSa::multi_Class_Sampling again with the same arguments, that is, keep seed constant, and it will return the same results for all the steps.
  • +
  • Note 7: The results from previous steps are reproducible when using the same seed value and int.seed=Run@seeds (e.g. Run02@seeds). In consequence, init.sample for regular sampling determines an invariant sample distribution; use random sampling or vary init.sample to get varying sample distributions.
  • +
  • Note 8: If multi_test > 1 the user will get multiple maps and will be asked to enter the number of the probability distribution that is appropriate.

Habitat sampling output. R object that contains:

        models          # selected classifiers
@@ -3242,7 +3246,7 @@ div.tocify {
         seeds           # seeds to reproduce respective step/habitat type sampling
-

4.1.2 Interactive probability maps and downloading output

+

4.1.4 Interactive probability maps and downloading output

An interactive map is plotted in a web browser (e.g., Firefox for Linux) containing a selected habitat type. The number of models predicting this habitat type can be viewed by hovering the mouse over the map.

From this interactive map, the user has two choices:

diff --git a/docs/README.md b/R-package/hasa/vignettes/HabitatSampler.md similarity index 91% rename from docs/README.md rename to R-package/hasa/vignettes/HabitatSampler.md index 5557f40e5b7603b5ffca79b098161da1d5ca95fd..23546312ec10f096f8d4704dffcaff10ec9d455a 100644 --- a/docs/README.md +++ b/R-package/hasa/vignettes/HabitatSampler.md @@ -19,18 +19,17 @@ - [4.2 Generating classification map and summary statistics](#generating-classification-map-and-summary-statistics) - # 1 Introduction This manual introduces the Habitat Sampler (HaSa), an innovative tool that autonomously generates representative reference samples for -predictive modelling of surface class probabilities. The tool can be +predictive modeling of surface class probabilities. The tool can be applied to any image data that displays surface structures and dynamics of any kind at multiple spatial and temporal scales. HaSa was initially developed to classify habitat dynamics in semi-natural ecosystems but the procedure can theoretically be applied to any surface. The main innovation of the tool is that it reduces reliance on comprehensive in -situ ground truth data or comprehensive training datasets which +situ ground truth data or comprehensive training data sets which constrain accurate image classification particularly in complex scenes. Though development of HaSa has prioritized ease of use, this @@ -38,7 +37,7 @@ documentation assume a familiarity with the R software. The document is built successively and is intended to lead you step-by-step through the HaSa procedure of generating probability and classification maps. HaSa is still in development and any suggestions or improvements are welcomed -and encouraged in our [GitHub Community +and encouraged in our [GitLab Community Version](https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler.git). If questions remain please don’t hesitate to contact the authors of the package. 
For a detailed description of the Habitat Sampler and its @@ -134,7 +133,7 @@ remotes::install_git("https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler. Before the user starts using `HaSa` it is necessary to load the library and some of its dependencies. In the following example, the libraries to be loaded are passed as a list. The option -`options("rgdal_show_exportToProj4_warnings"="none")` is to supress the +`options("rgdal_show_exportToProj4_warnings"="none")` is to suppress the warning messages related with the latest changes in `gdal` and `PROJ6`. ``` r @@ -162,7 +161,7 @@ code sets all the paths assuming that the root path is the current directory, i.e., the `demo` directory. ``` r -wd <- paste(tools::file_path_as_absolute("./"), "/../demo/", sep = "") +wd <- paste(tools::file_path_as_absolute("./"), "/../../../demo/", sep = "") # demo data stored here dataPath <- paste(wd, "Data/", sep = "") @@ -322,10 +321,11 @@ Class sampling input HaSa::multi_Class_Sampling( in_raster = raster_stk, # clipped satellite time series stack [raster brick] num_samples = 75, # starting number of spatial samples (recommended value: 75) + # for more info *See note 1 sample_type = "regular_raster", # distribution of spatial samples # ("random_raster","regular_raster", "matrix_random") - # recommended: "regular_raster" *See note 1 + # recommended: "regular_raster" *See note 2 num_models = 200, # number of models to collect (recommended value: 200) num_iterations = 10, # number of iterations for model accuracy # (recommended value:10) @@ -337,23 +337,24 @@ HaSa::multi_Class_Sampling( # recommended input: rf) num_trees = 500, # if the model is "rf" set the number of trees for the # random forest algorithm, otherwise, the parameter is - # ignored *See note 2 + # ignored *See note 3 mtry = 10, # splitting nodes (recommended: mtry < number of predictors) mod_error = 0.02, # threshold for model error until which iteration is being last = FALSE, # only TRUE for one class classifier 
- # recommended input: FALSE *See note 3 + # recommended input: FALSE *See note 4 last_ref_val = 1000, # default reference value for the last step (default: 1000) + # for more info *See note 5 seed = - as.integer(Sys.time()), # set seed for reproducible results *See note 4 + as.integer(Sys.time()), # set seed for reproducible results *See note 6 init_seed = "sample", # "sample" for new or use Run@seeds to reproduce previous - # steps *See note 5 + # steps *See note 7 out_path = out_path, # output path for saving results step = 1, # at which step should the procedure start, e.g. use # step = 2 if the first habitat is already extracted class_names = class_names,# vector with class names in the order of reference spectra n_classes = 7, # total number of classes to be separated multi_test = 1, # number of test runs to compare different probability - # output *See note 6 + # output *See note 8 RGB = c(19,20,21), # pallette colors for the interactive plots color = c("lightgrey", "orange", "yellow", "limegreen", "forestgreen") @@ -369,26 +370,41 @@ HaSa::multi_Class_Sampling( ) ``` - - **Note 1**: There are threes sampling strategies: `random_raster` + - **Note 1**: In case it is not possible to find models, increasing + the number of `num_samples` and `num_models` is not always the + solution. The user should also try to re-sample with a different + `seed` value. + - **Note 2**: There are threes sampling strategies: `random_raster` (it uses `raster::sampleRandom`), `raster_regular` (it uses `raster::sampleRegular`), and `random_matrix` (it uses matrices and - the `stats::sample` function over only existent non NaN pixels). - - **Note 2**: The default value is 500, for small number of trees (at + the `stats::sample` function over only existent non `NaN` pixels). + The `regular_raster` -\> fast: preferable at the beginning of + sampling procedure. 
The `random_raster` -\> slow: it only samples + pixels with information and it is preferable to use at the final + steps with few and irregular distributed pixels. The `random_matrix` + -\> fast: it only samples pixels with information and it is + preferable to use at the final steps with few and irregular + distributed pixels. + - **Note 3**: The default value is 500, for small number of trees (at least 1/3 of the number of predictors) use an odd number for precise prediction results. - - **Note 3**: The argument `last = T` can be set when only one class + - **Note 4**: The argument `last = T` can be set when only one class should be separated from the background pixels - - **Note 4**: For different results the `seed` should have a different + - **Note 5**: The reference data for the pseudo class in the last step + is built using the value of `last_ref_val`. In case the user gets NA + as last class, the user should adjust the value of `last_ref_val` + and re-sample again. + - **Note 6**: For different results the `seed` should have a different value on each run (use `seed = as.numeric(Sys.time())`). To repeat a specific run the user just needs to restart R, run again the `HaSa::multi_Class_Sampling` with the same arguments, this is, keep `seed` constant, and it will get the same results for all the steps. - - **Note 5**: The results from previous steps are reproducible when + - **Note 7**: The results from previous steps are reproducible when using the same seed value and `int.seed=Run@seeds` (e.g. ) in consequence, `init.sample` for regular sampling determines an invariant sample distribution, use `random` sampling or vary `init.sample` to get varying sample distributions. - - **Note 6**: If `multi_test > 1` the user will get multiple maps and + - **Note 8**: If `multi_test > 1` the user will get multiple maps and will be asked to enter the number of the probability distribution that is appropriate. @@ -405,7 +421,7 @@ Habitat sampling output. 
R object that contains: seeds # seeds to reproduce respective step/habitat type sampling ``` -### 4.1.2 Interactive probability maps and downloading output +### 4.1.4 Interactive probability maps and downloading output An interactive map is plotted in a web browser (e.g., Firefox for Linux) containing a selected habitat type. The number of models predicting this diff --git a/docs/HabitatSampler.pdf b/R-package/hasa/vignettes/HabitatSampler.pdf similarity index 87% rename from docs/HabitatSampler.pdf rename to R-package/hasa/vignettes/HabitatSampler.pdf index d81ab0aa20228272fdc6ebf205c881ce855f2e80..e06dcb76884250044d0ab2ce62f3f3c2806277da 100644 Binary files a/docs/HabitatSampler.pdf and b/R-package/hasa/vignettes/HabitatSampler.pdf differ diff --git a/docs/HabitatSampler_files/figure-gfm/plot configuration-1.png b/R-package/hasa/vignettes/HabitatSampler_files/figure-gfm/plot configuration-1.png similarity index 100% rename from docs/HabitatSampler_files/figure-gfm/plot configuration-1.png rename to R-package/hasa/vignettes/HabitatSampler_files/figure-gfm/plot configuration-1.png diff --git a/docs/HabitatSampler_files/figure-gfm/raster preview clipped-1.png b/R-package/hasa/vignettes/HabitatSampler_files/figure-gfm/raster preview clipped-1.png similarity index 100% rename from docs/HabitatSampler_files/figure-gfm/raster preview clipped-1.png rename to R-package/hasa/vignettes/HabitatSampler_files/figure-gfm/raster preview clipped-1.png diff --git a/docs/HabitatSampler_files/figure-markdown_github/plot configuration-1.png b/R-package/hasa/vignettes/HabitatSampler_files/figure-markdown_github/plot configuration-1.png similarity index 100% rename from docs/HabitatSampler_files/figure-markdown_github/plot configuration-1.png rename to R-package/hasa/vignettes/HabitatSampler_files/figure-markdown_github/plot configuration-1.png diff --git a/docs/HabitatSampler_files/figure-markdown_github/raster preview clipped-1.png 
b/R-package/hasa/vignettes/HabitatSampler_files/figure-markdown_github/raster preview clipped-1.png similarity index 100% rename from docs/HabitatSampler_files/figure-markdown_github/raster preview clipped-1.png rename to R-package/hasa/vignettes/HabitatSampler_files/figure-markdown_github/raster preview clipped-1.png diff --git a/R-package/hasa/vignettes/Habitat_Sampler_Usage.Rmd b/R-package/hasa/vignettes/Habitat_Sampler_Usage.Rmd deleted file mode 100644 index 45d2837ae6988a7a9c2455e3c70ad680dc32d114..0000000000000000000000000000000000000000 --- a/R-package/hasa/vignettes/Habitat_Sampler_Usage.Rmd +++ /dev/null @@ -1,86 +0,0 @@ ---- -title: "Habitat_Sampler_Usage" -author: "Carsten Neumann" -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{Habitat_Sampler_Usage} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- -## Workflow of Habitat Sampling and Probability Mapping -```{r setup} -library(HaSa) -``` - -```R -multi_Class_Sampling(...) -``` -## step 1 - -**A)** an interactive map is plotted in a web browser (firefox for linux), containing: -a) background map -b) RGB image -c) selected habitat type map -d) probaility threshold on mouse hover -e) predictive distance - -**B)** the user has to decide to extract this habitat type on the basis of a threshold **(B.1)** or to sample again **(B.2)** - -### B.1 -``` -enter threshold in R console -``` -6 files are saved to disk for the selected habitat type -a) HabitatSampler object (Run) - R Binary -b) probability map - *.kml, *.png, geocoded *.tif -c) threshold list - R Binary -d) leaflet interactive web interface - *.html - -after habitat extraction is done the user have to decide to adjust starting number of samples and number of models or proceed automaticlay to the next step -``` -enter sample/model adjsutement (../..) or auto (0) in R console -``` -## step 2 ... 
proceed with A) - -### B.2 -``` -enter 0 in R console -``` -the user have to decide to adjust starting number of samples and number of models or proceed automaticlay to new sampling -``` -enter sample/model adjsutement (../..) or auto (0) in R console -``` -...proceed with A until decision (B.1) has made - -## step 2 ... proceed with A) - --------- -### if convergence fails / no models can be selected / num_samples are to little / or another error occurs, restart next step with: -```R -multi_Class_Sampling(in_raster = out.raster, reference = out.reference, class_names = out.names, ... ) -``` -step = specify next step number - --------- -## remarks -1) the results from previous steps are reproducable when using the same seed value and int.seed=Run@seeds (e.g. Run02@seeds) in consequence, init.sample for regular sampling determines an invariant sample distribution, use random sampling or vary init.sample to get varying sample distributions -2) regular sampling is faster -3) last = T can be set when only one class should be separated from the background pixels -4) The R object Run holds slots of: -models = selected classifiers -ref_samples = spatial points of selected samples (see ?write_Out_Samples.r) -switch = the target class is [2] if switch is not NA then the target class must be changed from [1] to [2] (see write_Out_Samples.r) -layer = raster layer of habitat type probability -mod_all = all classifiers from num_models -class_ind = predictive distance metric for all classes -seeds = seeds to reproduce respecitve step/habitat type sampling -5) if multi_test > 1 the user will get multiple maps and will be ask to enter the number of the probability distribution that is apropriate - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>" -) -``` - - diff --git a/docs/images/Logo.jpg b/R-package/hasa/vignettes/images/Logo.jpg similarity index 100% rename from docs/images/Logo.jpg rename to R-package/hasa/vignettes/images/Logo.jpg diff 
--git a/docs/images/Logo.png b/R-package/hasa/vignettes/images/Logo.png similarity index 100% rename from docs/images/Logo.png rename to R-package/hasa/vignettes/images/Logo.png diff --git a/docs/images/Results1.png b/R-package/hasa/vignettes/images/Results1.png similarity index 100% rename from docs/images/Results1.png rename to R-package/hasa/vignettes/images/Results1.png diff --git a/docs/images/Results2.png b/R-package/hasa/vignettes/images/Results2.png similarity index 100% rename from docs/images/Results2.png rename to R-package/hasa/vignettes/images/Results2.png diff --git a/docs/images/Results3.png b/R-package/hasa/vignettes/images/Results3.png similarity index 100% rename from docs/images/Results3.png rename to R-package/hasa/vignettes/images/Results3.png diff --git a/docs/images/Results4.png b/R-package/hasa/vignettes/images/Results4.png similarity index 100% rename from docs/images/Results4.png rename to R-package/hasa/vignettes/images/Results4.png diff --git a/docs/images/figure_1.jpg b/R-package/hasa/vignettes/images/figure_1.jpg similarity index 100% rename from docs/images/figure_1.jpg rename to R-package/hasa/vignettes/images/figure_1.jpg diff --git a/docs/images/figure_1.png b/R-package/hasa/vignettes/images/figure_1.png similarity index 100% rename from docs/images/figure_1.png rename to R-package/hasa/vignettes/images/figure_1.png diff --git a/docs/images/figure_2.jpg b/R-package/hasa/vignettes/images/figure_2.jpg similarity index 100% rename from docs/images/figure_2.jpg rename to R-package/hasa/vignettes/images/figure_2.jpg diff --git a/docs/images/figure_2.png b/R-package/hasa/vignettes/images/figure_2.png similarity index 100% rename from docs/images/figure_2.png rename to R-package/hasa/vignettes/images/figure_2.png diff --git a/docs/images/inter_map_ex.png b/R-package/hasa/vignettes/images/inter_map_ex.png similarity index 100% rename from docs/images/inter_map_ex.png rename to R-package/hasa/vignettes/images/inter_map_ex.png diff --git 
a/docs/images/inter_map_ex_new.png b/R-package/hasa/vignettes/images/inter_map_ex_new.png similarity index 100% rename from docs/images/inter_map_ex_new.png rename to R-package/hasa/vignettes/images/inter_map_ex_new.png diff --git a/docs/images/plot configuration-1.png b/R-package/hasa/vignettes/images/plot configuration-1.png similarity index 100% rename from docs/images/plot configuration-1.png rename to R-package/hasa/vignettes/images/plot configuration-1.png diff --git a/docs/images/raster_preview.png b/R-package/hasa/vignettes/images/raster_preview.png similarity index 100% rename from docs/images/raster_preview.png rename to R-package/hasa/vignettes/images/raster_preview.png diff --git a/docs/images/raster_preview_clipped.png b/R-package/hasa/vignettes/images/raster_preview_clipped.png similarity index 100% rename from docs/images/raster_preview_clipped.png rename to R-package/hasa/vignettes/images/raster_preview_clipped.png diff --git a/docs/images/results.png b/R-package/hasa/vignettes/images/results.png similarity index 100% rename from docs/images/results.png rename to R-package/hasa/vignettes/images/results.png diff --git a/README.rst b/README.rst index cdf64fe0fe85eb716e47b765ca25bb601037f12e..0ef1a231f2d357ab30673aca5293fcdfb990dc7a 100644 --- a/README.rst +++ b/README.rst @@ -1,4 +1,4 @@ -.. figure:: docs/images/Logo.png +.. figure:: R-package/hasa/vignettes/images/Logo.png :target: https://github.com/carstennh/HabitatSampler/tree/master/demo :align: center @@ -12,7 +12,7 @@ How to use 1. R package ----------------------- * You need R to install the **HaSa package** that includes all functions and demo data. -* The installation steps for R version **4.1.0** are defined in `Section 2.0 of the documentation `_. +* The installation steps for R version **4.1.0** are defined in `Section 2.0 of the documentation `_. * For Ubuntu systems the following system packages dependencies need to be installed: .. 
code-block:: @@ -21,14 +21,14 @@ How to use * For Windows operating systems the `Rtools `_ are needed * library(HaSa) and list datasets: data(package="HaSa") and functions: lsf.str("package:HaSa") or use library(help="HaSa") -* Information about program execution and function behavior is available in Rmarkdown: `HabitatSampler_Usage `_ +* Information about program execution and function behavior is available in Rmarkdown: `HabitatSampler.Rmd `_ 2. Stepwise Procedure ---------------------------------- * The **demo** directory provides a step-wised procedure via an R script: **HabitatSampler.r**, but also via a Jupyter notebook **HabitatSampler.ipynb**. -* All necessary data and information is available under the directory: `demo `_ -* For documentation please check the `docs `_ directory. +* All necessary data and information is available under the directory: `demo `_ +* For documentation please check the `HaSa vignettes `_ directory. Input ---------------- @@ -40,7 +40,7 @@ Output ---------------- * **Interactive Maps** of class type probabilities -.. image:: docs/images/figure_1.jpg +.. image:: R-package/hasa/vignettes/images/figure_1.jpg :width: 700px @@ -50,7 +50,7 @@ Output * the classes are referred to as class types -.. image:: docs/images/figure_2.jpg +.. image:: R-package/hasa/vignettes/images/figure_2.jpg :width: 450px Key Features