diff --git a/docs/01_Introduction.md b/docs/01_Introduction.md
index cc5a0d46152d507d1b8aa8950d4bce080ca12bd4..60b5db62f190a360f52cafc8564cbd4106c4517a 100644
--- a/docs/01_Introduction.md
+++ b/docs/01_Introduction.md
@@ -1,10 +1,11 @@
 # Introduction
 
-The [OpenBuildingMap](https://www.openbuildingmap.org/) (OBM) gap analysis program evaluates
-the completeness of buildings in OpenStreetMap. It compares the existing buildings with global
-high-resolution open-data settlement datasets as well as detailed remote sensing data with a
-local scope and categorizes the status on
-[Quadtree-based](https://www.sciencedirect.com/topics/engineering/quadtrees) tiles with a level-18 resolution.
+The [OpenBuildingMap](https://www.openbuildingmap.org/) (OBM) gap analysis program produces the data
+required to assess the completeness status of buildings in OpenStreetMap. It does so by processing
+gridded data in the shape of raster files and returning actual polygons. The existing buildings are
+compared with global high-resolution open-data settlement datasets, as well as with detailed
+self-assessed data derived with remote sensing techniques at a local scope, and the status is categorized on
+[Quadtree-based](https://www.sciencedirect.com/topics/engineering/quadtrees) tiles with any desired resolution.
 
 # Copyright and Copyleft
diff --git a/docs/02_Input_datasets.md b/docs/02_Input_datasets.md
index 9b1b579cb39a1db5c43b3a00a1849c32ac950be4..902d36c9466329aa4d9fe9a16ea32bed7cdd4a61 100644
--- a/docs/02_Input_datasets.md
+++ b/docs/02_Input_datasets.md
@@ -5,19 +5,20 @@ shape of a settlement by assigning pixel values to different status (e.g. 1=buil
 There are currently some models based on remote sensing observations that depict the settlement
 distributions among the world (e.g.
 [GHSL-BUILT](https://ghsl.jrc.ec.europa.eu/download.php)
-available in 30m resolution or [WSF-2015](https://www.esa.int/Applications/Observing_the_Earth/Mapping_our_global_human_footprint) available in 10m resolution).
-have different ways to describe the surface but can be reduced to a binary solution, in this
-document a short description of the datasets can be found. For more information, check the datasets
-official sites.
+available in 30m resolution or [WSF-2015](https://www.esa.int/Applications/Observing_the_Earth/Mapping_our_global_human_footprint)
+available in 10m resolution). These datasets have different ways to describe the surface (e.g. land cover
+classification, multi-temporal built-up expansion, population density, etc.). Still, they can be reduced
+to a binary solution. This document provides a short description of the datasets; for more information,
+check the datasets' official sources.
 
 ## source_id
 
-The input datasets can be instantiated with a `method_id` if desired. This is recommended if the aim is
-to create a multi-source product. This option can be activated by setting the `method_id` argument to
-any integer value. By assigning a `method_id`, the output can be queried by this ID. A multi-source
+The input datasets can be instantiated with a `source_id` if desired. This is recommended if the aim is
+to create a multi-source product. This option can be activated by setting the `source_id` argument to
+any integer value. By assigning a `source_id`, the output can be queried by this ID. A multi-source
 output may be organized like this:
 
-| method_id | description |
+| source_id | description |
 |:------:|:------:|
 | 1 | Built areas produced with GHSL data from 2014|
 | 2 | Built areas produced with GHSL data from 1975|
@@ -25,10 +26,13 @@ output may be organized like this:
 | ... | ... |
 | n | Built areas retrieved with n dataset |
 
+It is advised to make use of these IDs, since they make further calculations easier, as the buildings
+model to evaluate (e.g. OpenStreetMap) will have `source_id = 0`.
+
 # Data structure
 
 This program makes use of the original data structure from the [GHSL-BUILT](https://ghsl.jrc.ec.europa.eu/download.php)
-dataset. This structure can also be applied to any other group of rasters in order to use custom
+dataset. This structure can also be applied to any other group of raster files in order to use custom
 datasets. The structure has two main components to follow (`file paths` and `raster_files_index`).
 
 ## File paths
@@ -53,7 +57,7 @@ where the different datasets are stored and should follow this structure.
     +-- dataset_n_directory
 
 The naming of the directories, subdirectories and raster tiles is arbitrary as long as it is well
-established in the `raster index` files.
+established in the `raster_files_index` files.
 
 ## raster_files_index
 
@@ -71,9 +75,9 @@ The `raster_files_index` have the following minimum structure
 
 | location | geometry |
 |:------:|:------:|
 | dataset_1_directory/dataset_subdirectory_3/raster_1.tif | POLIGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
-| dataset_1_directory/dataset_subdirectory_3/raster_2.tif | POLIGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
+| dataset_1_directory/dataset_subdirectory_4/raster_2.tif | POLYGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
 | ... | POLIGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
-| dataset_n_directory/dataset_subdirectory_n/raster_n.tif | POLIGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
+| dataset_1_directory/dataset_subdirectory_n/raster_n.tif | POLYGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
 
 `location.dtype = str` relative path of the raster tile
diff --git a/docs/03_Configuration_file.md b/docs/03_Configuration_file.md
index 74a76c50773caa1d3da473a20a3881e7f5934ae2..475fb6fec3d0536d9c159a7607120862c67442cd 100644
--- a/docs/03_Configuration_file.md
+++ b/docs/03_Configuration_file.md
@@ -1,9 +1,10 @@
 # Configuration file
 
-The OpenBuildingMap (OBM) gap analysis program can be configured to fit user needs. This is done by
-changing the different parameters within the `config.yml` file. A sample file can be found in the
-package directory under the filename `config-example.yml`. by using the `-conf` argument, the
-configuration can be extracted from a `.yml` file with a custom name. If not, the default file is `config.yml`
+The OpenBuildingMap (OBM) gap analysis program can be configured to fit user needs and make the best
+use of the computing resources. This is done by changing the different parameters within the `config.yml`
+file. A sample file can be found in the package directory under the filename `config-example.yml`. By
+using the `-conf` argument, the configuration can be extracted from a `.yml` file with a custom name.
+If not, the default file `config.yml` in the current directory is used.
 
 ## config.yml
 
@@ -29,12 +30,15 @@ input the tiles. If set to `True`, Quadkeys are read from `txt_filepath` instead
 
     txt_filepath (str): file path of a text file with all quadkeys to be read
 
 The following parameters define the processing output and can improve the performance of the program.
-First, `output_pathname` is the directory to store and read CSV files for further import in SQL. The
-`number_cores` parameter refers to the maximum number of parallel processes the system can handle for
-tile processing. `number_cores_import` allows the parallel import of csv files into a database. This
-number should not be too high, since the database could fail with many parallel imports. Finally,
-`batch_size` sets the maximum amount of tiles to be handled per process. Each CSV file may contain
-maximum this amount of tiles if all of them provide built areas.
+First, `output_pathname` is the directory to store and read CSV files for further import in SQL. The
+`obm_output_pathname` is the directory where CSV files with data coming from OSM or OBM will be stored.
+The `number_cores` parameter refers to the maximum number of parallel processes the system can handle
+for tile processing; this depends on the number of threads that the CPU supports. `number_cores_import`
+allows the parallel import of CSV files into a database. Normally this value is lower than `number_cores`,
+as these imports are much faster and many parallel jobs could cause a bottleneck in the database. The
+`batch_size` parameter sets the maximum number of tiles handled per process and contained in a single CSV
+file; this value, combined with the number of cores, determines how much RAM is used at the same time.
+Finally, the `get_geometry` parameter sets whether the polygons of built-up areas will be stored or not.
 
     output_pathname (str): Target path name for the csv file writing and import.
     obm_output_pathname (str): Target path name for the OBM csv file writing and import.
@@ -43,15 +47,17 @@ maximum this amount of tiles if all of them provide built areas.
     batch_size (int): Maximum amount of tiles to be handled per process.
     get_geometry (bool): If True, geometries will be stored in the output csv files.
 
-The last sections refer to database connections. `database` holds a database from which roads can be
-extracted to refine built areas, also it may contain buildings if the program wants to calculate a
-tile based built-up area. The `process_buffer_magnitude` is a parameter that defines how the OBM roads
-(defined as lines and not polygons) are processed, giving them a width. Be careful to use the same units as in the
-`datasource.crs` (meters or deg). `target_database` is a second database with the table where processed
-tiles will be imported into.
+The last sections refer to database connections. `roads_database` points to a table within a database
+from which roads can be extracted in order to refine built areas. The `process_buffer_magnitude` parameter
+defines how the OBM roads (defined as lines and not polygons) are processed, giving them a width and
+removing the resulting areas from the coarse built-up polygons. Be careful to use the same units as in the
+`datasource.crs` (meters or deg). The `buildings_database` section holds the connection to a table in a
+database from which building footprints will be requested. This database is used by the `--obm_built_up`
+routine. Finally, the `target_database` is another database with the table into which processed tiles
+will be imported. This database is used during the `--import_csv` routine.
 
-    database:
+    roads_database:
         host (str): Postgres Database host address.
         dbname (str): PostgreSQL database name.
         port (int): Port to connect to the PostgreSQL database.
@@ -62,6 +68,16 @@ tiles will be imported into.
             geometry_field (str): Name of the column with geometries.
         process_buffer_magnitude (float): Numeric magnitude for the polygon buffer (units are equal to the coordinate system units).
+
+    buildings_database:
+        host (str): Postgres Database host address.
+        dbname (str): PostgreSQL database name.
+        port (int): Port to connect to the PostgreSQL database.
+        username (str): User to connect to the PostgreSQL database.
+        password (str or getpass.getpass): Password for `username` argument.
+        roads_table:
+            tablename (str): Table name within database for searching.
+            geometry_field (str): Name of the column with geometries.
 
     target_database:
         host (str): Postgres Database host address.
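As an illustration of the `raster_files_index` structure that `docs/02_Input_datasets.md` documents (a `location` column with relative raster paths and a `geometry` column with WKT polygon footprints), the sketch below builds a minimal two-row index and looks up which raster tile covers a point. All paths and coordinates are made-up examples, and the `covering_tiles` helper is not part of the OBM code base; it uses a bounding-box shortcut, whereas a real lookup over non-rectangular footprints would need a proper point-in-polygon test (e.g. with shapely or PostGIS).

```python
import csv
import io

# Hypothetical index rows: (location, WKT footprint). Paths and
# coordinates are invented for illustration only.
rows = [
    ("dataset_1_directory/dataset_subdirectory_3/raster_1.tif",
     "POLYGON((10 40, 20 40, 20 50, 10 50, 10 40))"),
    ("dataset_1_directory/dataset_subdirectory_4/raster_2.tif",
     "POLYGON((20 40, 30 40, 30 50, 20 50, 20 40))"),
]

# Serialize the index in the documented two-column shape
# (here to an in-memory buffer instead of a file on disk).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["location", "geometry"])
writer.writerows(rows)

def covering_tiles(index_rows, lon, lat):
    """Return raster locations whose footprint bounding box contains the point."""
    hits = []
    for location, wkt in index_rows:
        # Extract the coordinate list between the double parentheses.
        coords = wkt[wkt.index("((") + 2 : wkt.rindex("))")]
        xs, ys = zip(*((float(x), float(y))
                       for x, y in (pt.split() for pt in coords.split(","))))
        if min(xs) <= lon <= max(xs) and min(ys) <= lat <= max(ys):
            hits.append(location)
    return hits

print(covering_tiles(rows, 15.0, 45.0))
# prints ['dataset_1_directory/dataset_subdirectory_3/raster_1.tif']
```

This mirrors how the program can resolve which raster files intersect a processing tile purely from the index, without opening every raster.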