*(Unfortunately we are not allowed to redistribute the data and you'll need a password to access the sources. [Read here](https://git.gfz-potsdam.de/dynamicexposure/datasources/-/tree/master/ESRM20_boundaries) for more information about the data being used.)*
3. Place the downloaded data into paths and directories of your preference.
4. If you wish to run `gde-importer` for the industrial exposure of a country for which the geographical units used in ESRM20 are 30-arcsec cells, please read the [special preliminary steps for 30-arcsec industrial cells](#special-preliminary-steps-for-30-arcsec-industrial-cells) down below.
4. If you wish to run `gde-importer` for the industrial exposure of a country for which the geographic units used in ESRM20 are 30-arcsec cells, please read the [special preliminary steps for 30-arcsec industrial cells](#special-preliminary-steps-for-30-arcsec-industrial-cells) down below.
### Configuration
#### Quickstart:
Copy the file `config_example.yml` to your working directory as `config.yml` and provide the necessary parameters. Required parameters are:
Copy the file `config_example.yml` to your working directory as `config.yml` and provide the necessary parameters:
The following configuration options are available in the `config.yml`:
-`model_name`: Name of the input aggregated exposure model (only relevant for the user).
-`exposure_format`: Format of the input aggregated exposure model. Currently supported values: esrm20.
-`data_pathname`: Path to directory that contains the model data.
-`boundaries_pathname`: Path to directory that contains the boundary geodata files.
-`domain_boundary_filepath`: Path to the geodata file that contains the boundaries within which the input aggregated model is defined.
-`occupancies_to_run`: List of occupancies for which the code will be run, separated by ", " (comma and space). They need to exist for the indicated `exposure_format`. Currently supported values: residential, commercial, industrial.
-`occupancies_to_run`: List of occupancies for which the code will be run, separated by ", " (comma and space). They need to exist for the indicated `exposure_format`. Currently supported values: residential, commercial.
-`exposure_entities_to_run`: List of names of exposure entities for which the code will be run. Currently supported options:
- "all": The list of names will be retrieved from the metadata of the input aggregated exposure model.
- A comma-space-separated list of entity names: This list of names will be used.
...
@@ -83,8 +71,18 @@ The following configuration options are available in the `config.yml`:
-`database_built_up`: Credentials for the [database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles#obm_built_area_assessments-completeness-assessments-information) where the built-up areas per quadtile are stored.
-`database_gde_tiles`: Credentials for the [database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles) where information on the GDE tiles is stored.
-`data_units_surface_threshold`: Percentage difference (float between 0.0 and 100.0) of geographic areas to define the need to create data units to fill an exposure entity, if the data units defined in an aggregated exposure model do not fully cover the geographic extents of the exposure entity. Default: 1.0%.
-`data_units_min_admisible_area`: Minimum surface area (in m2) of data units created to fill an exposure entity if the data units defined in an aggregated exposure model do not fully cover the geographic extents of the exposure entity. Needs to be smaller than `data_units_max_admisible_area`.
-`data_units_max_admisible_area`: Maximum surface area (in m2) of data units created to fill an exposure entity if the data units defined in an aggregated exposure model do not fully cover the geographic extents of the exposure entity. If the area to cover is larger than `data_units_max_admisible_area`, it gets successively subdivided until complying with this requisite. Needs to be larger than `data_units_min_admisible_area`.
Optional parameters are the following:
-`model_name`: Name of the input aggregated exposure model (only relevant for the user).
-`exposure_format`: Format of the input aggregated exposure model. Currently supported values: esrm20.
-`domain_boundary_filepath`: Path to the geodata file that contains the boundaries within which the input aggregated model is defined.
-`force_creation_data_units`: Create data units to fill an exposure entity irrespective of other conditions (e.g. irrespective of `data_units_surface_threshold`). Default: False.
Further details on the meaning and use of these parameters can be found in the [documentation](docs/04_Configuration.md).
### Special preliminary steps for 30-arcsec industrial cells
The following countries have their industrial exposure models defined in terms of 30-arcsec cells in the ESRM20 model (names and underscores as per naming in ESRM20): Albania, Austria, Belgium, Bosnia_and_Herzegovina, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Moldova, Montenegro, Netherlands, North_Macedonia, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, United_Kingdom. This list might change. This information can be retrieved from the [ESRM20 exposure model repository](https://gitlab.seismo.ethz.ch/efehr/esrm20_exposure), under the path `esrm20_exposure/sources/European_Exposure_Model_Data_Inputs_Sources.xlsx`.
User-configurable parameters need to be provided in a file named `config.yml`, located in the working directory. The file [config_example.yml](../config_example.yml) in this repository can be used as a starting point.
## General parameters
-`model_name` (optional): Name of the input aggregated exposure model (only relevant for the user). E.g. "ESRM20".
-`exposure_format` (optional): Format of the input aggregated exposure model. Currently supported values: esrm20.
-`data_pathname` (required): Path to directory that contains the model data. For example, if the input aggregated exposure model is ESRM20, and the [ESRM20 repository](https://gitlab.seismo.ethz.ch/efehr/esrm20_exposure) has been cloned to the path `/home/username/`, then `data_pathname=/home/username/esrm20_exposure`.
-`boundaries_pathname` (required): Path to directory that contains the boundary geodata files.
## Parameters that control what cases are run
An input aggregated exposure model may cover different [exposure entities](02_Organisation_Geographic_Space.md#exposure-entity) and different [occupancy cases](02_Organisation_Geographic_Space.md#occupancy-cases). These parameters control which of them are run when calling `gde-importer`:
-`occupancies_to_run` (required): List of occupancies for which the code will be run, separated by ", " (comma and space). They need to exist for the indicated `exposure_format`. Currently supported values: residential, commercial, industrial.
-`exposure_entities_to_run` (required): List of names of exposure entities for which the code will be run. Currently supported options:
- "all": The list of names will be retrieved from the metadata of the input aggregated exposure model.
- A comma-space-separated list of entity names: This list of names will be used.
- A full path to a .txt or .csv file: The list of names will be retrieved from the indicated .txt/.csv file.
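The three options for `exposure_entities_to_run` can be illustrated with a minimal Python sketch. This is not the parsing code of the `gde-importer`: the function name `resolve_entities_to_run` is hypothetical, and the layout assumed for the .txt/.csv file (names separated by commas and/or line breaks) is an assumption made only for this example.

```python
import csv


def resolve_entities_to_run(value, names_in_metadata):
    """Return a list of exposure entity names (hypothetical helper).

    value: the string given for `exposure_entities_to_run`.
    names_in_metadata: names found in the metadata of the input model.
    """
    value = value.strip()

    # Option 1: "all" means every entity listed in the model metadata.
    if value.lower() == "all":
        return list(names_in_metadata)

    # Option 2: a path to a .txt or .csv file that contains the names.
    # Assumed layout: names separated by commas and/or line breaks.
    if value.lower().endswith((".txt", ".csv")):
        names = []
        with open(value, newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                names.extend(cell.strip() for cell in row if cell.strip())
        return names

    # Option 3: a comma-space-separated list of names.
    return [name.strip() for name in value.split(", ") if name.strip()]
```

For instance, `resolve_entities_to_run("Greece, Italy", [])` yields a list with the two names, while `"all"` simply returns a copy of the names found in the metadata.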
## Parameters needed to access databases
-`database_built_up` (required): Credentials for the [database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles#obm_built_area_assessments-completeness-assessments-information) where the built-up areas per quadtile are stored. The `sourceid` of the built-up areas needs to be indicated as a nested parameter. The `gde-importer` assumes that this database contains a table named `obm_built_area_assessments`. In order to connect to the database, some or all of the following parameters may be required:
- host: name of the host
- dbname: name of the database
- port: port number
- username: user name
- password: password associated with `username`
-`database_gde_tiles` (required): Credentials for the [database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles) where information on the GDE tiles is stored. The `gde-importer` assumes that this database contains the tables indicated in the link. In order to connect to the database, some or all of the following parameters may be required:
- host: name of the host
- dbname: name of the database
- port: port number
- username: user name
- password: password associated with `username`
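Because only some of the connection parameters may be needed, one way to handle them is to pass on just those that were provided. The sketch below is an illustration under assumptions, not the code of the `gde-importer`: the helper `build_connection_kwargs` is hypothetical, and it assumes a PostgreSQL client that follows libpq keyword names, where the user name keyword is `user` rather than `username`.

```python
# Mapping from the parameter names used in config.yml to libpq-style
# keyword names (assumption: the database client accepts these keywords).
CONFIG_TO_CLIENT_KEYWORDS = {
    "host": "host",
    "dbname": "dbname",
    "port": "port",
    "username": "user",
    "password": "password",
}


def build_connection_kwargs(credentials):
    """Keep only the connection parameters that were actually provided.

    Other nested parameters (such as `sourceid`) are ignored here, as they
    are not part of the connection itself.
    """
    return {
        CONFIG_TO_CLIENT_KEYWORDS[key]: value
        for key, value in credentials.items()
        if key in CONFIG_TO_CLIENT_KEYWORDS and value is not None
    }
```

The resulting dictionary could then be unpacked into the connection call of the chosen client, e.g. `psycopg2.connect(**kwargs)` if psycopg2 is the client in use.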
## Parameters associated with ensuring full geographic coverage
Input aggregated exposure models may not have exposure defined for the complete territory of an exposure entity. As explained [here](06_Ensuring_Full_Geographic_Coverage.md), the `gde-importer` contains a special routine to ensure that the whole territory is covered by a potential distribution of building classes. The following parameters control this routine:
-`data_units_surface_threshold` (required): Percentage difference (float between 0.0 and 100.0) of geographic areas to define the need to create data units to fill an exposure entity, if the data units defined in an aggregated exposure model do not fully cover the geographic extents of the exposure entity. Default (and reasonable value based on observation of existing models): 1.0%.
-`force_creation_data_units` (optional): True or False. If True, create data units to fill an exposure entity irrespective of other conditions that are automatically verified by the code (such as `data_units_surface_threshold`). If this parameter is not provided, the `gde-importer` takes it as False.
-`data_units_min_admisible_area` (required): Minimum surface area (in m2) of data units created to fill an exposure entity. It needs to be smaller than `data_units_max_admisible_area`. Its purpose is to avoid the creation of data units that are only artefacts of the resolution of the input boundaries and the accuracy of the geometric calculations carried out. Suggested value: 0.1 (m2).
-`data_units_max_admisible_area` (required): Maximum surface area (in m2) of data units created to fill an exposure entity. If the area to cover is larger than `data_units_max_admisible_area`, it gets successively subdivided until complying with this requisite. It needs to be larger than `data_units_min_admisible_area`. Suggested value: 3e9 (m2).
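The constraints on these parameters, and the way the threshold and the force flag interact, can be sketched as follows. The function names are hypothetical, and the definition of the percentage difference (relative to the area of the exposure entity) is an assumption made for illustration; the actual routine of the `gde-importer` may compute it differently.

```python
def check_coverage_parameters(surface_threshold, min_area, max_area):
    """Validate the ranges stated for the coverage parameters."""
    if not 0.0 <= surface_threshold <= 100.0:
        raise ValueError("data_units_surface_threshold must be in [0.0, 100.0]")
    if min_area >= max_area:
        raise ValueError(
            "data_units_min_admisible_area must be smaller than "
            "data_units_max_admisible_area"
        )


def needs_filling(entity_area, covered_area, surface_threshold, force=False):
    """Decide whether data units should be created to fill an entity.

    Assumption: the percentage difference is measured relative to the
    surface area of the exposure entity.
    """
    if force:  # mirrors force_creation_data_units=True
        return True
    if entity_area <= 0.0:
        raise ValueError("entity_area must be positive")
    difference = 100.0 * abs(entity_area - covered_area) / entity_area
    return difference > surface_threshold
```

With the default threshold of 1.0, an entity whose data units cover 99.5% of its area would not trigger the creation of new data units under this definition, whereas one with 95% coverage would.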
## Other parameters
-`domain_boundary_filepath` (optional): Path to the geodata file (including the name and extension of the file itself) that contains the boundaries within which the input aggregated model is defined. If provided, `gde-importer` verifies that the boundaries of the exposure entities lie inside it and cuts out areas that may fall outside. This is relevant for cases in which the geodata files associated with an exposure entity may include overseas territories that are located, e.g., in other continents and are thus not covered by the input aggregated exposure model.
-`number_cores` (required): Number of cores (integer) used for parallelising the creation and storage of data-unit tiles. If larger than 1, individual data units are sent to different cores to be processed in parallel.
-`exposure_entities_code` (required): This parameter controls the creation of the 3-character code that the `gde-importer` uses to identify [exposure entities](02_Organisation_Geographic_Space.md#exposure-entity). The 3-character code is also prepended to the IDs of data units (e.g. a data unit with ID "38271" in Greece is stored as "GRC_38271"). If the exposure entities of the input aggregated exposure model are countries, it is recommended to set this parameter to "ISO3", in which case the `gde-importer` will retrieve the corresponding alpha-3 [ISO 3166 country code](https://www.iso.org/iso-3166-country-codes.html), using the [iso3166 library](https://github.com/deactivated/python-iso3166). Alternatively, a nested structure with exposure entity names and 3-character codes can be provided. For example:
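The way a 3-character code is combined with a data-unit ID can be sketched in Python. The mapping and the function name below are hypothetical and only illustrate the idea of a user-provided correspondence between entity names and codes; they do not reproduce the structure expected in `config.yml` or the code of the `gde-importer`.

```python
# Hypothetical user-provided mapping of exposure entity names to codes.
EXAMPLE_CODES = {"Greece": "GRC", "Italy": "ITA"}


def prefixed_data_unit_id(entity_name, data_unit_id, codes):
    """Prepend the 3-character code of an entity to a data-unit ID."""
    code = codes[entity_name]
    if len(code) != 3:
        raise ValueError(f"Code for {entity_name!r} must have 3 characters")
    return f"{code}_{data_unit_id}"


print(prefixed_data_unit_id("Greece", "38271", EXAMPLE_CODES))  # GRC_38271
```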
When running `gdeimporter`, the `main()` function in
[gdeimporter.py](../gdeimporter/gdeimporter.py) is called and the program starts running.
The processing logic described herein is the one adopted at present for the `gde-importer`, but
the modularity of the code allows for it to be easily reorganised or modified in the future if
needed. In other words, there are aspects of this processing logic that do not need to follow
the order in which they are currently carried out.
## Reading the Configuration File
The first task carried out by the `gde-importer` is reading and interpreting the
[configuration file](04_Configuration.md). This is done by means of the
[Configuration](../gdeimporter/configuration.py#L29) class, which verifies that all the
required (i.e. non-optional) parameters are available and that their values/types are as
expected by the program. If a problem is found that does not allow the program to run, it will
raise an error and stop.
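The kind of check described above can be sketched as follows. This is a simplified illustration, not the `Configuration` class itself: the function `check_required_parameters` is hypothetical, it operates on an already-parsed dictionary, and it only verifies presence (not values or types). The list of required names follows the parameters marked as required in the configuration documentation.

```python
REQUIRED_PARAMETERS = (
    "data_pathname",
    "boundaries_pathname",
    "occupancies_to_run",
    "exposure_entities_to_run",
    "database_built_up",
    "database_gde_tiles",
    "data_units_surface_threshold",
    "data_units_min_admisible_area",
    "data_units_max_admisible_area",
    "number_cores",
    "exposure_entities_code",
)


def check_required_parameters(config):
    """Raise an error if any required parameter is missing or empty.

    config: dictionary with the contents of config.yml (already parsed).
    """
    missing = [
        name for name in REQUIRED_PARAMETERS if config.get(name) is None
    ]
    if missing:
        raise ValueError(
            "Missing required parameters: " + ", ".join(missing)
        )
```

Collecting all missing names before raising lets the user fix the configuration file in one pass rather than one parameter at a time.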
## Instantiating the Relevant `AggregatedExposureModel` and `ExposureEntity` Classes
Thanks to the `exposure_format` parameter in the configuration file, the `gde-importer` knows
which sub-class of the
[AggregatedExposureModel](02_Organisation_Geographic_Space.md#aggregated-exposure-model) class
to instantiate. During instantiation of the subclass, the object that represents the aggregated
exposure model is created and the attributes associated with indicating its data structure are
assigned. Moreover, critical information about its
[exposure entities](02_Organisation_Geographic_Space.md#exposure-entity) is retrieved from its
metadata files, by means of the `retrieve_exposure_entities` method.
The `retrieve_exposure_entities` method carries out the following actions:
- Retrieve the names of all exposure entities associated with the aggregated exposure model.
- For each exposure entity and occupancy case (covered by the aggregated exposure model):
- retrieve details on how the data units are defined (administrative units of which level or
30-arcsec cells, for example);
- retrieve the factors associated with the
[distribution of the population during different times of the day](03_Organisation_Building_Information.md#distribution-of-the-population-during-different-times-of-the-day);
- retrieve the factors associated with the
[disaggregation of replacement costs](03_Organisation_Building_Information.md#disaggregation-of-replacement-costs);
- For each exposure entity, retrieve its boundary and "trim" it if part of it falls outside the
geometry of the domain of the aggregated exposure model (only if the domain is provided by means of
specifying the `domain_boundary_filepath` parameter in the configuration file).
- Create an [ExposureEntity](02_Organisation_Geographic_Space.md#exposure-entity) object for
each exposure entity, and assign them to the corresponding attribute of the
`AggregatedExposureModel`.
Note that, at this stage, metadata is retrieved for all exposure entities and all
occupancy cases covered by the aggregated exposure model, not just the ones specified within
the configuration file under the parameters `exposure_entities_to_run` and
`occupancies_to_run`. This is needed so as to be able to assign
`exposure_entities_to_run="all"` in the configuration file and have the `gde-importer`
automatically interpret this as a list of names of exposure entities without the need for the
user to enumerate them. This is particularly useful for models that cover several exposure
entities (e.g. ESRM20).
## Storing Information on the Source of the Aggregated Exposure Model
After having instantiated the `AggregatedExposureModel` and `ExposureEntity` classes, the
`gde-importer` stores the `name` and `format` of the aggregated exposure model in the