Commit 93647423 authored by Cecilia Nievas
Added documentation

parent c937e469
Pipeline #40804 passed with stage
in 2 minutes and 13 seconds
...@@ -53,27 +53,15 @@ Currently the `gde-importer` only supports the European Seismic Risk Model 2020
$ `wget --http-user=USERNAME --http-password=PASSWORD --recursive --no-parent https://datasources.dynamicexposure.org/private/ESRM20_boundaries/data/`
*(Unfortunately we are not allowed to redistribute the data and you'll need a password to access the sources. [Read here](https://git.gfz-potsdam.de/dynamicexposure/datasources/-/tree/master/ESRM20_boundaries) for more information about the data being used.)*
3. Place the downloaded data into paths and directories of your preference.
4. If you wish to run `gde-importer` for the industrial exposure of a country for which the geographical units used in ESRM20 are 30-arcsec cells, please read the [special preliminary steps for 30-arcsec industrial cells](#special-preliminary-steps-for-30-arcsec-industrial-cells) down below. 4. If you wish to run `gde-importer` for the industrial exposure of a country for which the geographic units used in ESRM20 are 30-arcsec cells, please read the [special preliminary steps for 30-arcsec industrial cells](#special-preliminary-steps-for-30-arcsec-industrial-cells) down below.
### Configuration
#### Quickstart: Copy the file `config_example.yml` to your working directory as `config.yml` and provide the necessary parameters. Required parameters are:
Copy the file `config_example.yml` to your working directory as `config.yml` and provide the necessary parameters:
- `exposure_format: esrm20`
- `data_pathname: /path/to/downloaded/data/ESRM20`
- `boundaries_pathname: /path/to/downloaded/ESRM20_boundaries`
#### Configuration file
The following configuration options are available in the `config.yml`:
- `model_name`: Name of the input aggregated exposure model (only relevant for the user).
- `exposure_format`: Format of the input aggregated exposure model. Currently supported values: esrm20.
- `data_pathname`: Path to directory that contains the model data.
- `boundaries_pathname`: Path to directory that contains the boundary geodata files.
- `domain_boundary_filepath`: Path to the geodata file that contains the boundaries within which the input aggregated model is defined. - `occupancies_to_run`: List of occupancies for which the code will be run, separated by ", " (comma and space). They need to exist for the indicated `exposure format`. Currently supported values: residential, commercial, industrial.
- `occupancies_to_run`: List of occupancies for which the code will be run, separated by ", " (comma and space). They need to exist for the indicated `exposure format`. Currently supported values: residential, commercial.
- `exposure_entities_to_run`: List of names of exposure entities for which the code will be run. Currently supported options:
- "all": The list of names will be retrieved from the metadata of the input aggregated exposure model.
- A comma-space-separated list of entity names: This list of names will be used.
...@@ -83,8 +71,18 @@ The following configuration options are available in the `config.yml`:
- `database_built_up`: Credentials for the [database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles#obm_built_area_assessments-completeness-assessments-information) where the built-up areas per quadtile are stored.
- `database_gde_tiles`: Credentials for the [database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles) where information on the GDE tiles is stored.
- `data_units_surface_threshold`: Percentage difference (float between 0.0 and 100.0) of geographic areas to define the need to create data units to fill an exposure entity, if the data units defined in an aggregated exposure model do not fully cover the geographic extents of the exposure entity. Default: 1.0%.
- `data_units_min_admisible_area`: Minimum surface area (in m2) of data units created to fill an exposure entity if the data units defined in an aggregated exposure model do not fully cover the geographic extents of the exposure entity. Needs to be smaller than `data_units_max_admisible_area`.
- `data_units_max_admisible_area`: Maximum surface area (in m2) of data units created to fill an exposure entity if the data units defined in an aggregated exposure model do not fully cover the geographic extents of the exposure entity. If the area to cover is larger than `data_units_max_admisible_area`, it gets successively subdivided until complying with this requisite. Needs to be larger than `data_units_min_admisible_area`.
Optional parameters are the following:
- `model_name`: Name of the input aggregated exposure model (only relevant for the user).
- `exposure_format`: Format of the input aggregated exposure model. Currently supported values: esrm20.
- `domain_boundary_filepath`: Path to the geodata file that contains the boundaries within which the input aggregated model is defined.
- `force_creation_data_units`: Create data units to fill an exposure entity irrespective of other conditions (e.g. irrespective of `data_units_surface_threshold`). Default: False.
Further details on the meaning and use of these parameters can be found in the [documentation](docs/04_Configuration.md).
### Special preliminary steps for 30-arcsec industrial cells
The following countries have their industrial exposure models defined in terms of 30-arcsec cells in the ESRM20 model (names and underscores as per naming in ESRM20): Albania, Austria, Belgium, Bosnia_and_Herzegovina, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Moldova, Montenegro, Netherlands, North_Macedonia, Norway, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, United_Kingdom. This list might change. This information can be retrieved from the [ESRM20 exposure model repository](https://gitlab.seismo.ethz.ch/efehr/esrm20_exposure), under the path `esrm20_exposure/sources/European_Exposure_Model_Data_Inputs_Sources.xlsx`.
......
...@@ -3,7 +3,7 @@ exposure_format: esrm20 # Only supported value for now
data_pathname: path_to_directory_with_model_data
boundaries_pathname: path_to_directory_with_boundary_files
domain_boundary_filepath: path_to_file_with_domain_boundary # Boundary that defines the limits of the input aggregated model; optional
occupancies_to_run: residential, commercial # Need to exist for the indicated `exposure format`, industrial not supported occupancies_to_run: residential, commercial, industrial # Need to exist for the indicated `exposure format`
exposure_entities_to_run: Luxembourg # Either "all", a comma-space-separated list of entity names, or a name of a .txt or .csv file
exposure_entities_code: ISO3 # Either "ISO3" in this or a nested structure with exposure entities names and 3-character codes
number_cores: 1 # Number of cores used for parallelisation
......
# General concepts
Exposure models describe the location and value of assets and people, as well as the connection
to their fragility/vulnerability to specific hazards.
The concept of "aggregated" exposure models arises in contrast with the idea of
"building-by-building" exposure models: in the latter, each individual building is represented
by its geometry and location, while in the former larger groups of buildings are represented by
a single point in space, which is intended to represent a larger geographical extent.
In order to run a damage calculation associated with a specific hazard, an exposure model needs
to assign classes to the individual or grouped buildings, and these classes need to be
meaningful in terms of representing the expected behaviour of the buildings when subject to the
hazard. The latter means that the building classes need to allow the risk modeller to make a
connection with fragility models that represent such behaviour.
In order to run a loss calculation associated with a specific hazard, the exposure model also
needs to assign replacement costs and/or number of occupants to each building (or building
class) or group of buildings (or building classes). Similarly to the case of the damage
calculation, the building classes need to be meaningful in terms of representing the expected
behaviour of the buildings and outcome (in terms of losses) when subject to the hazard.
The Global Dynamic Exposure (GDE) model classifies buildings using the
[GEM Building Taxonomy v3.0](https://github.com/gem/gem_taxonomy). The GEM Building Taxonomy
v3.0 is a faceted taxonomy, which means that it characterises buildings by means of individual
relevant attributes, such as construction material, type of lateral load-resisting system,
number of storeys, expected ductility, etc. Building classes arise from the combination of the
different possible values of these attributes. For example:
- `CR/LFINF+CDL/HBET:2-4` represents a 2- to 4-storey (`HBET:2-4`) reinforced concrete (`CR`)
infilled frame (`LFINF`) with low seismic code design level (`CDL`),
- `MUR+CL/LWAL+CDN/H:1` represents a 1-storey (`H:1`) fired clay-unit (`CL`) unreinforced
masonry (`MUR`) wall-system (`LWAL`) building with no seismic code design (`CDN`).
The distribution of buildings into different classes can be conceptually decomposed into a
product between the total number of buildings in a certain geographical area/location and the
proportion each building class represents with respect to the total. As would be expected, the
proportions of all classes should add up to 1.0. When the total number of buildings is 1, these
proportions can be thought of as representing the probabilities of that particular building
belonging to any of those classes. This conceptualisation is relevant because, when
distributing an aggregated exposure model to a grid of zoom level 18 tiles, the `gde-importer`
treats the two components as separate processes: on the one hand, the total number of buildings
in a geographical area is distributed across the tiles and, on the other, the distribution of
building classes for that geographical area applies to all the tiles covered by it. The same
later holds for buildings retrieved from [OpenStreetMap](https://www.openstreetmap.org) by the
GDE model: the whole distribution of building classes applies to them as well (narrowed down
based on known attributes when possible).
It is not uncommon for aggregated exposure models to be associated with a specific occupancy
case. Usual occupancy cases include residential, commercial and industrial buildings, for
example. Other occupancy cases may be educational, agricultural or governmental buildings, as
well as many others. The total number of buildings and distribution of building classes
described above are thus associated with a specific occupancy case in the GDE model (and,
consequently, the `gde-importer`).
# Organisation of the Geographic Space
The `gde-importer` organises (or subdivides) the geographic space by means of five main
concepts:
- the `aggregated exposure model`
(class [`AggregatedExposureModel`](../gdeimporter/aggregatedexposuremodel.py#L35) and its
sub-classes)
- the `exposure entity` (class [`ExposureEntity`](../gdeimporter/exposureentity.py#L32))
- the `occupancy cases` (attribute of `ExposureEntity`)
- the `data unit` (class [`DataUnit`](../gdeimporter/dataunit.py#L42))
- the `data-unit tile` (attribute of `DataUnit`)
## `Aggregated Exposure Model`
The [`AggregatedExposureModel`](../gdeimporter/aggregatedexposuremodel.py#L35) class represents
an input aggregated exposure model. Geographically speaking, it covers all the areas in which
the aggregated exposure model is defined.
Because different input aggregated exposure models might have their peculiarities, particularly
in what refers to the structure of their data, the `AggregatedExposureModel` is intended as a
general class from which specific sub-classes stem. In the current version of `gde-importer`
only one sub-class of `AggregatedExposureModel` is implemented, `ExposureModelESRM20`, which
allows the exposure model of the European Seismic Risk Model 2020 (ESRM20) (Crowley et al.,
2020) to be imported.
### `Exposure Model ESRM20`
The [`ExposureModelESRM20`](../gdeimporter/aggregatedexposuremodel.py#L341) sub-class
represents the structure and contents of the exposure model of the European Seismic Risk Model
2020 (ESRM20, Crowley et al., 2020). The ESRM20 exposure model covers three occupancy cases
(residential, commercial and industrial) for 44 European countries (names include underscores
used internally by ESRM20 and `gde-importer`): Albania, Andorra, Austria, Belgium,
Bosnia_and_Herzegovina, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France,
Germany, Gibraltar, Greece, Hungary, Ireland, Iceland, Isle_of_Man, Italy, Kosovo, Latvia,
Liechtenstein, Lithuania, Luxembourg, Malta, Moldova, Monaco, Montenegro, Netherlands, Norway,
North_Macedonia, Poland, Portugal, Romania, Serbia, Slovakia, Slovenia, Spain, Sweden,
Switzerland, Turkey, and the United_Kingdom.
<img src="images/aem_ESRM20.png" width=75%>
Fig. 2.1 The `AggregatedExposureModel` and `ExposureModelESRM20` classes, with some of their
attributes.
## `Exposure Entity`
The [`ExposureEntity`](../gdeimporter/exposureentity.py#L32) class represents a geographic unit
where an exposure model is defined, which can be further sub-divided into smaller data units.
What makes several data units belong to the same exposure entity is the existence of
characteristics/parameters/properties that are defined at the level of the exposure entity and
thus apply to all its data units.
In the case of ESRM20, the exposure entities are the 44 European countries covered by the model.
However, it is also possible for an aggregated exposure model to consist of only one exposure
entity. The exposure entities can correspond to any geographic level: what defines them is the
fact that certain properties of the model are defined at this level.
Properties that are expected to be defined at the level of the exposure entity are, for
example, the distribution of people in buildings at different times of the day (day, night,
transit), or the replacement cost per area unit.
Exposure entities are identified with a 3-character code. If the exposure entity is a country,
the user can indicate to the `gde-importer` to use the alpha-3
[ISO 3166 country codes](https://www.iso.org/iso-3166-country-codes.html), though it is always
an option to manually indicate a 3-character code of one's choice (which will be necessary in
the case in which the exposure entities are not countries).
<img src="images/exposure_entity.png" width=75%>
Fig. 2.2 Example of an instance of the `ExposureEntity` class within an instance of
`ExposureModelESRM20` (and some of their attributes).
## `Occupancy Cases`
Occupancy cases are broad groupings of buildings according to their use, i.e. their occupancy.
Usual occupancy cases include residential, commercial and industrial buildings, for example.
Other occupancy cases may be educational, agricultural or governmental buildings, among many
others. It is not uncommon for aggregated exposure models to be associated with a specific
occupancy case, or series of occupancy cases, for which different building classes are expected
and potentially even different techniques, strategies and assumptions may have been used in
their creation. In the case of ESRM20, for example, the exposure models for different occupancy
cases of the same exposure entity may be defined at different administrative levels. For this
reason, occupancy cases are an
[attribute of ExposureEntity](../gdeimporter/exposureentity.py#L46), within which other smaller
geographic units (the data units) exist.
It is noted that the word _case_ is used herein to distinguish "occupancy cases" from other
categorisations of occupancy that are necessary. Within the Global Dynamic Exposure Model, the
occupancy of buildings is defined by means of the
[GEM Building Taxonomy v3.0](https://github.com/gem/gem_taxonomy), and this is referred to as
the occupancy _type_ of the building. An occupancy _case_ can encompass several occupancy
_types_. For example, the occupancy types `COM1` (retail trade), `COM2` (wholesale trade),
`COM3` (offices), `COM5` (restaurants, bars, cafes) and `RES3` (hotels, motels, guest lodges,
etc.) are all grouped under the occupancy case `commercial` in the ESRM20 model.
## `Data Unit`
The [`DataUnit`](../gdeimporter/dataunit.py#L42) class represents the smallest geographic unit
of an exposure entity where an exposure model is defined for a particular occupancy case. The
size of the data units defines the resolution of the exposure model. Data units may or may not
correspond to administrative units: they may be cells of a grid, Voronoi cells, or any
arbitrarily shaped area used by the input aggregated exposure model to define the number and
classes of buildings. In the case of
ESRM20, for example, residential and commercial exposure models are defined in terms of
administrative units of different levels (see, e.g., the [Nomenclature of Territorial Units for
Statistics](https://ec.europa.eu/eurostat/web/nuts/background) standard) while industrial
exposure models are defined in terms of administrative units for some countries and 30-arcsec
cells for others where the method of Sousa et al. (2017) was used for their generation.
Because of the possibility for data units to be defined in very different terms and the fact
that their definition depends on the occupancy case (even for the same exposure entity), the
[`occupancy_cases`](../gdeimporter/exposureentity.py#L46) attribute of `ExposureEntity` is a
dictionary that includes keys that define the kind of data units used. Within that same
dictionary, the key `data_units` contains a dictionary of instances of the
[`DataUnit`](../gdeimporter/dataunit.py#L42) class, each of which represents one of the data
units of the exposure entity for a particular occupancy case.
By definition, the whole area covered by a data unit has the same distribution of possible
building classes. By definition as well, data units cannot be separated from the occupancy case
they correspond to: this means that it is not possible to speak of "the data units of exposure
entity X" in a general sense, and it is only possible to speak of "the data units of exposure
entity X for Y occupancy case". If two different occupancy cases of the same exposure entity
use data units that refer to the exact same geographic areas, these are still treated as
separate data units by the `gde-importer` (even if they share the same data unit ID).
<img src="images/occupancy_cases_and_data_units.png" width=75%>
Fig. 2.3 Illustrative example of instances of the `DataUnit` class and their relation with
`occupancy cases` and an instance of the `ExposureEntity` class.
## `Data-Unit Tile`
Data-unit tiles result from intersecting data units, which carry information on the number of
buildings and building classes of a particular occupancy, with zoom level 18 tiles, which are
the unit selected for the Global Dynamic Exposure Model to create the link between input
aggregated exposure models and individual buildings from OpenStreetMap.
The zoom level 18 tiles used by GDE are Web Mercator tiles and thus look square-shaped when
visualised under EPSG:3857 and only exist between latitudes 85.0511 S and 85.0511 N (and the
full range of longitudes). Tiles are uniquely identified by a quadkey: a string of digits that
can only be 0, 1, 2 or 3 and whose total length reveals the zoom level at which it is defined
(e.g. '0113' is a tile defined at zoom level 4).
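As a minimal sketch of how a quadkey encodes a tile, the function below decodes it into column, row and zoom level. It follows the standard Web Mercator quadkey convention (as documented for the Bing Maps tile system) and is not taken from the `gde-importer` code; the name `quadkey_to_tile` is hypothetical.

```python
def quadkey_to_tile(quadkey):
    """Return (x, y, zoom) of the Web Mercator tile identified by a quadkey."""
    x = y = 0
    zoom = len(quadkey)  # one digit per zoom level
    for i, digit in enumerate(quadkey):
        if digit not in "0123":
            raise ValueError(f"Invalid quadkey digit: {digit!r}")
        mask = 1 << (zoom - i - 1)  # bit contributed by this zoom level
        value = int(digit)
        if value & 1:  # digits 1 and 3 select the eastern half
            x |= mask
        if value & 2:  # digits 2 and 3 select the southern half
            y |= mask
    return x, y, zoom


x, y, zoom = quadkey_to_tile("0113")
print(x, y, zoom)
```

The length of the string gives the zoom level directly, so a zoom level 18 tile always has an 18-digit quadkey.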
Data-unit tiles that are fully contained within their corresponding data unit are geometrically
identical to their associated zoom level 18 tile. However, tiles that are crossed by boundaries
of different data units result in two or more data-unit tiles.
As data-unit tiles stem from data units, data-unit tiles cannot be separated either from the
occupancy case of the data unit they correspond to. As a consequence, if two different
occupancy cases of the same exposure entity use data units that refer to the exact same
geographic areas, which leads to data-unit tiles that refer to the exact same geographic areas
as well, these are still treated as separate data-unit tiles by the `gde-importer`.
As shown in Fig. 2.4, data-unit tiles are an attribute of the [DataUnit](#data-unit) class,
named `data_unit_tiles`. This is a
[pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object
in which each row corresponds to a data-unit tile (see [here](07_Creation_Data_Unit_Tiles.md)
for further details).
<img src="images/data_unit_tiles.png" width=75%>
Fig. 2.4 Illustrative example of the definition of `data-unit tiles`.
# Organisation of the Building Information
Exposure models describe the location and value of assets and people, as well as the connection
to their fragility/vulnerability to specific hazards. Different attributes of buildings, their
value and occupants may be defined at different geographic levels, as explained below.
The distribution of buildings into different classes can be conceptually decomposed into a
product between the total number of buildings in a certain geographic area/location and the
proportion each building class represents with respect to the total.
## Distribution of the Population During Different Times of the Day
People usually occupy different buildings at different times of the day, depending on their
daily activities. Aligned with the approach of the European Seismic Risk Model 2020 (ESRM20,
Crowley et al., 2020), which uses a modified version of the PAGER population distribution
model (Jaiswal and Wald, 2010), the factors by which the census or statistical average
population per building can be multiplied to obtain an estimate of the people in the buildings
at a certain time of the day are attributes of the `ExposureEntity` class, within each
`occupancy case`.
For example, the factors associated with the residential occupancy case can be found under:
```
ExposureEntity.occupancy_cases["residential"]["population_time_distribution"]
```
`population_time_distribution` is a dictionary with three keys:
- `Day`: factor by which to multiply census people to obtain the number of people during day
time (approx. 10 am to 6 pm).
- `Night`: factor by which to multiply census people to obtain the number of people during
night time (approx. 10 pm to 6 am).
- `Transit`: factor by which to multiply census people to obtain the number of people during
transit time (approx. 6 am to 10 am and 6 pm to 10 pm).
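The use of these factors can be sketched as follows. The factor values are hypothetical placeholders (real values come from the input aggregated exposure model), and `people_by_time` is an illustrative helper, not a `gde-importer` function.

```python
# Hypothetical factors, for illustration only.
population_time_distribution = {"Day": 0.25, "Night": 0.875, "Transit": 0.5}


def people_by_time(census_people, factors):
    """Scale a census population by each time-of-day factor."""
    return {period: census_people * factor for period, factor in factors.items()}


people = people_by_time(1000.0, population_time_distribution)
print(people)
```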
## Disaggregation of Replacement Costs
Total building replacement costs can be disaggregated into three main categories: structural
components, non-structural components, and contents. The factors by which the total replacement
costs can be multiplied to obtain any of the three are attributes of the `ExposureEntity`
class, within each `occupancy case`.
For example, the factors associated with the residential occupancy case can be found under:
```
ExposureEntity.occupancy_cases["residential"]["costs_disaggregation"]
```
`costs_disaggregation` is a dictionary with three keys:
- `Structural`: factor by which to multiply total costs to obtain the cost of the structural
components.
- `Non-Structural`: factor by which to multiply total costs to obtain the cost of the
non-structural components.
- `Contents`: factor by which to multiply total costs to obtain the cost of the building
contents (e.g. furniture, appliances, machinery, merchandise, etc.).
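A minimal sketch of the disaggregation is shown below. The factor values are hypothetical, `disaggregate_cost` is an illustrative helper rather than part of `gde-importer`, and the check that the three factors add up to 1.0 is an assumption made here because the three categories are described as making up the total cost.

```python
import math

# Hypothetical factors, for illustration only.
costs_disaggregation = {"Structural": 0.25, "Non-Structural": 0.5, "Contents": 0.25}


def disaggregate_cost(total_cost, factors):
    """Split a total replacement cost into its components."""
    # Assumption of this sketch: the components make up the whole total.
    if not math.isclose(sum(factors.values()), 1.0):
        raise ValueError("Cost factors are expected to add up to 1.0")
    return {component: total_cost * factor for component, factor in factors.items()}


costs = disaggregate_cost(400000.0, costs_disaggregation)
print(costs)
```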
## Totals per Data Unit
The `DataUnit` class, which represents the smallest geographic unit of an exposure entity where
an exposure model is defined for a particular occupancy case, possesses the following attributes:
- `total_buildings`: total number of buildings as per the aggregated exposure model.
- `total_dwellings`: total number of dwellings as per the aggregated exposure model.
- `total_people`: dictionary with the total number of people in the data unit, subclassified as:
  - `Census`: total number of people expected to be inside all buildings in the data unit,
    obtained from distributing the census population to the buildings, without consideration of
    the time of the day or distribution of the population across different economic/educational/
    recreational activities.
  - `Day`: total number of people expected to be inside all buildings in the data unit during
    the day (approx. 10 am to 6 pm).
  - `Night`: total number of people expected to be inside all buildings in the data unit during
    the night (approx. 10 pm to 6 am).
  - `Transit`: total number of people expected to be inside all buildings in the data unit
    during transit times (approx. 6 am to 10 am and 6 pm to 10 pm).
- `total_cost`: dictionary with the monetary value of all the buildings in the data unit (cost
  of `total_buildings`), subclassified as:
  - `Total`: total replacement cost of all buildings, including structural and non-structural
    components, as well as contents.
  - `Structural`: replacement cost of the structural components of all buildings.
  - `Non-Structural`: replacement cost of the non-structural components of all buildings.
  - `Contents`: replacement cost of the contents of all buildings.
## Building Classes Proportions and Properties
The `total_buildings` attribute of the `DataUnit` class described above only indicates the
total number of buildings of a certain occupancy case in a data unit, but provides no details
on their characteristics. The attribute `building_classes_proportions_and_properties` of the
`DataUnit` class has this purpose instead. The attribute is a
[pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object
in which each row corresponds to a different building class. The column `proportions` indicates
the contribution of each building class to the total building stock of the data unit and, as a
consequence, the column must add up to 1.0. Multiplying the `proportions` column by the
`total_buildings` attribute results in the number of buildings per building class in the data
unit.
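The multiplication described above can be sketched in plain Python, with a dictionary standing in for the `proportions` column of the DataFrame. The numbers and the third class name are made up for illustration, and `buildings_per_class` is a hypothetical helper.

```python
import math

# Hypothetical data unit: total number of buildings and class proportions.
total_buildings = 200.0
proportions = {
    "CR/LFINF+CDL/HBET:2-4": 0.5,
    "MUR+CL/LWAL+CDN/H:1": 0.25,
    "MUR+CL/LWAL+CDN/H:2": 0.25,  # illustrative class string
}


def buildings_per_class(total, proportions):
    """Return the number of buildings of each class in a data unit."""
    if not math.isclose(sum(proportions.values()), 1.0):
        raise ValueError("Proportions must add up to 1.0")
    return {name: total * share for name, share in proportions.items()}


counts = buildings_per_class(total_buildings, proportions)
print(counts)
```

The resulting numbers of buildings are generally not integers, since they come from a product of a total and a proportion.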
The building class is defined by a combination of three columns of the DataFrame:
- `building_class_name`: name of the building classes associated with the data unit, as per the
[GEM Building Taxonomy v3.0](https://github.com/gem/gem_taxonomy).
- `settlement_type`: type of settlement within the data unit. E.g. "urban", "rural",
"big_city", "all".
- `occupancy_subtype`: details on the occupancy, if available and relevant to characterise the
building classes.
The [GEM Building Taxonomy v3.0](https://github.com/gem/gem_taxonomy)
is a faceted taxonomy, which means that it characterises buildings by means of individual
relevant attributes, such as construction material, type of lateral load-resisting system,
number of storeys, expected ductility, etc. Building classes arise from the combination of the
different possible values of these attributes. For example:
- `CR/LFINF+CDL/HBET:2-4` represents a 2- to 4-storey (`HBET:2-4`) reinforced concrete (`CR`)
infilled frame (`LFINF`) with low seismic code design level (`CDL`),
- `MUR+CL/LWAL+CDN/H:1` represents a 1-storey (`H:1`) fired clay-unit (`CL`) unreinforced
masonry (`MUR`) wall-system (`LWAL`) building with no seismic code design (`CDN`).
Knowledge on these attributes allows the risk modeller to select/assign/define appropriate
fragility and/or vulnerability models for each of these classes.
Due to its relevance for the connection between imported aggregated exposure models and
individual buildings retrieved from [OpenStreetMap](https://www.openstreetmap.org) by the
Global Dynamic Exposure (GDE) model, the `gde-importer` processes the `building_class_name`
strings to retrieve the range of number of storeys that the building class refers to. This
interpretation leads to filling in the `storeys_min` and `storeys_max` columns of the
`building_classes_proportions_and_properties` DataFrame. In the two examples above, the values
of `storeys_min` and `storeys_max` would be 2 and 4 for the first case, and 1 and 1, for the
second case. If the range of number of storeys does not have an upper bound, 9999 is assigned
to `storeys_max`.
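A minimal sketch of such an interpretation is given below. It is not the `gde-importer` implementation: the function name `storeys_range` is hypothetical, only the `H:n` and `HBET:a-b` forms from the examples above are handled, and an open-ended range is assumed to be written with a trailing hyphen (e.g. `HBET:6-`).

```python
import re

OPEN_ENDED_MAX = 9999  # value assigned when the range has no upper bound


def storeys_range(building_class_name):
    """Return (storeys_min, storeys_max) parsed from a building class string."""
    # Attributes are separated by "/" and "+" in the examples above.
    for attribute in re.split(r"[/+]", building_class_name):
        exact = re.fullmatch(r"H:(\d+)", attribute)
        if exact:
            storeys = int(exact.group(1))
            return storeys, storeys
        between = re.fullmatch(r"HBET:(\d+)-(\d*)", attribute)
        if between:
            lower = int(between.group(1))
            # Assumption: a missing upper bound means an open-ended range.
            upper = int(between.group(2)) if between.group(2) else OPEN_ENDED_MAX
            return lower, upper
    raise ValueError(f"No number of storeys found in {building_class_name!r}")


print(storeys_range("CR/LFINF+CDL/HBET:2-4"))
print(storeys_range("MUR+CL/LWAL+CDN/H:1"))
```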
Apart from the aforementioned ones, `building_classes_proportions_and_properties` contains two
further building properties that are extremely relevant for the calculation of losses in a risk
assessment:
- `census_people_per_building`: number of census-derived people per building (i.e. number of
occupants, not accounting for the time of the day). These values can be multiplied by the
`Day`, `Night` and `Transit` factors corresponding to the data unit's exposure entity and
occupancy case (`ExposureEntity.occupancy_cases[case]["population_time_distribution"]`) to
obtain the number of occupants at different times of the day.
- `total_cost_per_building`: total replacement cost per building, including costs of structural
and non-structural components as well as contents. These values can be multiplied by the
`Structural`, `Non-Structural` and `Contents` factors corresponding to the data unit's exposure
entity and occupancy case (`ExposureEntity.occupancy_cases[case]["costs_disaggregation"]`) to
obtain the replacement costs disaggregated by each of these categories.
Within a data unit (which implies a specific occupancy case as well), each unique combination
of `building_class_name`, `settlement_type` and `occupancy_subtype` has unique values of
`census_people_per_building` and `total_cost_per_building` associated with them.
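
The two multiplications described above can be sketched as follows. The function name `scale_by_factors` and all numerical values are illustrative assumptions; in practice the factors would come from the `population_time_distribution` and `costs_disaggregation` entries of the exposure entity.

```python
def scale_by_factors(value_per_building, factors):
    """Multiply one per-building value by each factor of a dictionary."""
    return {name: value_per_building * factor for name, factor in factors.items()}


# Illustrative values only (assumptions, not taken from any exposure model)
population_time_distribution = {"Day": 0.25, "Night": 1.0, "Transit": 0.5}
costs_disaggregation = {"Structural": 0.25, "Non-Structural": 0.5, "Contents": 0.25}

# census_people_per_building = 4.0, total_cost_per_building = 200000.0
occupants = scale_by_factors(4.0, population_time_distribution)
costs = scale_by_factors(200000.0, costs_disaggregation)

print(occupants)  # {'Day': 1.0, 'Night': 4.0, 'Transit': 2.0}
print(costs)  # {'Structural': 50000.0, 'Non-Structural': 100000.0, 'Contents': 50000.0}
```
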
## Number of Buildings per Data-Unit Tile
Since zoom level 18 tiles are the spatial unit selected for the resolution of the Global Dynamic
Exposure Model, one of the main tasks of the `gde-importer` is to distribute the total number
of buildings in a data unit onto its data-unit tiles. As explained in the chapter
[Processing Logic](05_Processing_Logic.md), this is done proportionally to the built-up area
associated with each tile, which is retrieved from the
[Global Human Settlement (GHS)](https://data.jrc.ec.europa.eu/dataset/jrc-ghsl-10007)
multitemporal built-up area layer. The total number of buildings associated with a data-unit
tile is assigned to the `aggregated_buildings` column of the `data_unit_tiles`
[pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object,
which is an attribute of the `DataUnit` class. This value can be multiplied by the
`proportions` column of the `building_classes_proportions_and_properties` DataFrame of the
corresponding data unit to obtain the number of buildings per building class in the data-unit
tile.
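
A minimal sketch of this proportional distribution is shown below. It uses plain dictionaries instead of pandas DataFrames to stay self-contained. The function names, the tile identifiers and the numbers are hypothetical, and raising an error when no built-up area is available is an assumption of the sketch, not necessarily what the `gde-importer` does.

```python
def distribute_buildings(total_buildings, built_up_area_by_tile):
    """Split the buildings of a data unit proportionally to built-up area."""
    total_area = sum(built_up_area_by_tile.values())
    if total_area <= 0.0:
        # Assumed behaviour for this sketch only
        raise ValueError("no built-up area available to distribute buildings")
    return {
        tile: total_buildings * area / total_area
        for tile, area in built_up_area_by_tile.items()
    }


def buildings_per_class(aggregated_buildings, proportions):
    """Multiply the buildings of one tile by the proportion of each class."""
    return {name: aggregated_buildings * p for name, p in proportions.items()}


# 100 buildings in the data unit, two tiles with 300 and 100 m2 of built-up area
tiles = distribute_buildings(100.0, {"tile_a": 300.0, "tile_b": 100.0})
classes = buildings_per_class(
    tiles["tile_a"],
    {"CR/LFINF+CDL/HBET:2-4": 0.75, "MUR+CL/LWAL+CDN/H:1": 0.25},
)

print(tiles)  # {'tile_a': 75.0, 'tile_b': 25.0}
print(classes)  # {'CR/LFINF+CDL/HBET:2-4': 56.25, 'MUR+CL/LWAL+CDN/H:1': 18.75}
```
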
# Configuration
User-configurable parameters need to be provided in a file named `config.yml`, located in the working directory. The file [config_example.yml](../config_example.yml) in this repository can be used as a starting point.
## General parameters
- `model_name` (optional): Name of the input aggregated exposure model (only relevant for the user). E.g. "ESRM20".
- `exposure_format` (optional): Format of the input aggregated exposure model. Currently supported values: esrm20.
- `data_pathname` (required): Path to directory that contains the model data. For example, if the input aggregated exposure model is ESRM20, and the [ESRM20 repository](https://gitlab.seismo.ethz.ch/efehr/esrm20_exposure) has been cloned to the path `/home/username/`, then `data_pathname=/home/username/esrm20_exposure`.
- `boundaries_pathname` (required): Path to directory that contains the boundary geodata files.
## Parameters that control what cases are run
An input aggregated exposure model may cover different [exposure entities](02_Organisation_Geographic_Space.md#exposure-entity) and different [occupancy cases](02_Organisation_Geographic_Space.md#occupancy-cases). These parameters control which of them are run when calling `gde-importer`:
- `occupancies_to_run` (required): List of occupancies for which the code will be run, separated by ", " (comma and space). They need to exist for the indicated `exposure_format`. Currently supported values: residential, commercial, industrial.
- `exposure_entities_to_run` (required): List of names of exposure entities for which the code will be run. Currently supported options:
    - "all": The list of names will be retrieved from the metadata of the input aggregated exposure model.
    - A comma-space-separated list of entity names: This list of names will be used.
    - A full path to a .txt or .csv file: The list of names will be retrieved from the indicated .txt/.csv file.
## Parameters needed to access databases
- `database_built_up` (required): Credentials for the [database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles#obm_built_area_assessments-completeness-assessments-information) where the built-up areas per quadtile are stored. The `sourceid` of the built-up areas needs to be indicated as a nested parameter. The `gde-importer` assumes that this database contains a table named `obm_built_area_assessments`. In order to connect to the database, some or all of the following parameters may be required:
    - host: name of the host
    - dbname: name of the database
    - port: port number
    - username: user name
    - password: password associated with `username`
- `database_gde_tiles` (required): Credentials for the [database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles) where information on the GDE tiles is stored. The `gde-importer` assumes that this database contains the tables indicated in the link. In order to connect to the database, some or all of the following parameters may be required:
    - host: name of the host
    - dbname: name of the database
    - port: port number
    - username: user name
    - password: password associated with `username`
## Parameters associated with ensuring full geographic coverage
Input aggregated exposure models may not have exposure defined for the complete territory of an exposure entity. As explained [here](06_Ensuring_Full_Geographic_Coverage.md), the `gde-importer` contains a special routine to ensure that the whole territory is covered by a potential distribution of building classes. The following parameters control this routine:
- `data_units_surface_threshold` (required): Percentage difference (float between 0.0 and 100.0) of geographic areas to define the need to create data units to fill an exposure entity, if the data units defined in an aggregated exposure model do not fully cover the geographic extents of the exposure entity. Default (and reasonable value based on observation of existing models): 1.0%.
- `force_creation_data_units` (optional): True or False. If True, create data units to fill an exposure entity irrespective of other conditions that are automatically verified by the code (such as `data_units_surface_threshold`, for example). If this parameter is not provided, the `gde-importer` takes it as False.
- `data_units_min_admisible_area` (required): Minimum surface area (in m2) of data units created to fill an exposure entity. It needs to be smaller than `data_units_max_admisible_area`. Its purpose is to avoid the creation of data units that are only artefacts of the resolution of the input boundaries and the accuracy of the geometric calculations carried out. Suggested value: 0.1 (m2).
- `data_units_max_admisible_area` (required): Maximum surface area (in m2) of data units created to fill an exposure entity. If the area to cover is larger than `data_units_max_admisible_area`, it gets successively subdivided until complying with this requisite. It needs to be larger than `data_units_min_admisible_area`. Suggested value: 3e9 (m2).
## Other parameters
- `domain_boundary_filepath` (optional): Path to the geodata file (including the name and extension of the file itself) that contains the boundaries within which the input aggregated model is defined. If provided, `gde-importer` verifies that the boundaries of the exposure entities lie inside it and cuts out areas that may fall outside. This is relevant for cases in which the geodata files associated with an exposure entity may include overseas territories that are located, e.g., in other continents and are thus not covered by the input aggregated exposure model.
- `number_cores` (required): Number of cores (integer) used for parallelising the creation and storage of data-unit tiles. If larger than 1, individual data units are sent to different cores to be processed in parallel.
- `exposure_entities_code` (required): This parameter controls the creation of the 3-character code that the `gde-importer` uses to identify [exposure entities](02_Organisation_Geographic_Space.md#exposure-entity). The 3-character code is also prepended to the IDs of data units (e.g. a data unit with ID "38271" in Greece is stored as "GRC_38271"). If the exposure entities of the input aggregated exposure model are countries, it is recommended to set this parameter to "ISO3", in which case the `gde-importer` will retrieve the corresponding alpha-3 [ISO 3166 country code](https://www.iso.org/iso-3166-country-codes.html), using the [iso3166 library](https://github.com/deactivated/python-iso3166). Alternatively, a nested structure with exposure entity names and 3-character codes can be provided. For example:
```
exposure_entities_code:
  Europe: EUE
  North_America: NNN
  South_America: SRR
  ...
```
# Processing Logic
Running `gdeimporter` calls the `main()` function in
[gdeimporter.py](../gdeimporter/gdeimporter.py), which starts the program.
The processing logic described herein is the one adopted at present for the `gde-importer`, but
the modularity of the code allows for it to be easily reorganised or modified in the future if
needed. In other words, there are aspects of this processing logic that do not need to follow
the order in which they are currently carried out.
## Reading the Configuration File
The first task carried out by the `gde-importer` is reading and interpreting the
[configuration file](04_Configuration.md). This is done by means of the
[Configuration](../gdeimporter/configuration.py#L29) class, which verifies that all the
required (i.e. non-optional) parameters are available and that their values/types are as
expected by the program. If a problem is found that does not allow the program to run, it will
raise an error and stop.
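
The kind of verification described above can be sketched as follows. This is not the actual `Configuration` class: the function `validate_config` is hypothetical, it operates on a dictionary such as the one obtained from parsing `config.yml`, and it only covers a few of the checks implied by the parameter descriptions in the [configuration file](04_Configuration.md) chapter. The requirement that `number_cores` be at least 1 is an assumption.

```python
# Parameters marked as "required" in the Configuration chapter
REQUIRED_PARAMETERS = (
    "data_pathname",
    "boundaries_pathname",
    "occupancies_to_run",
    "exposure_entities_to_run",
    "database_built_up",
    "database_gde_tiles",
    "data_units_surface_threshold",
    "data_units_min_admisible_area",
    "data_units_max_admisible_area",
    "number_cores",
    "exposure_entities_code",
)


def validate_config(config):
    """Raise an error if the configuration cannot be used; otherwise return it."""
    missing = [name for name in REQUIRED_PARAMETERS if name not in config]
    if missing:
        raise KeyError("missing required parameters: " + ", ".join(missing))

    if not 0.0 <= config["data_units_surface_threshold"] <= 100.0:
        raise ValueError("data_units_surface_threshold must lie within 0.0 and 100.0")

    if config["data_units_min_admisible_area"] >= config["data_units_max_admisible_area"]:
        raise ValueError("minimum admissible area must be smaller than the maximum")

    # Assumption of this sketch: at least one core is needed
    if not isinstance(config["number_cores"], int) or config["number_cores"] < 1:
        raise ValueError("number_cores must be a positive integer")

    return config
```
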
## Instantiating the Relevant `AggregatedExposureModel` and `ExposureEntity` Classes
Thanks to the `exposure_format` parameter in the configuration file, the `gde-importer` knows
which sub-class of the
[AggregatedExposureModel](02_Organisation_Geographic_Space.md#aggregated-exposure-model) class
to instantiate. During instantiation of the subclass, the object that represents the aggregated
exposure model is created and the attributes associated with indicating its data structure are
assigned. Moreover, critical information about its
[exposure entities](02_Organisation_Geographic_Space.md#exposure-entity) is retrieved from its
metadata files, by means of the `retrieve_exposure_entities` method.
The `retrieve_exposure_entities` method carries out the following actions:
- Retrieve the names of all exposure entities associated with the aggregated exposure model.
- For each exposure entity and occupancy case (covered by the aggregated exposure model):
    - retrieve details on how the data units are defined (administrative units of which level or
      30-arcsec cells, for example);
    - retrieve the factors associated with the
      [distribution of the population during different times of the day](03_Organisation_Building_Information.md#distribution-of-the-population-during-different-times-of-the-day);
    - retrieve the factors associated with the
      [disaggregation of replacement costs](03_Organisation_Building_Information.md#disaggregation-of-replacement-costs);
- For each exposure entity, retrieve its boundary and "trim" it if part of it falls outside the
geometry of the domain of the aggregated exposure model (only if it is provided by means of
specifying the `domain_boundary_filepath` parameter in the configuration file).
- Create an [ExposureEntity](02_Organisation_Geographic_Space.md#exposure-entity) object for
each exposure entity, and assign them to the corresponding attribute of the
`AggregatedExposureModel`.
It is noted that, at this stage, metadata is retrieved for all exposure entities and all
occupancy cases covered by the aggregated exposure model, not just the ones specified within
the configuration file under the parameters `exposure_entities_to_run` and
`occupancies_to_run`. This is needed so as to be able to assign
`exposure_entities_to_run="all"` in the configuration file and have the `gde-importer`
automatically interpret this as a list of names of exposure entities without the need for the
user to enumerate them. This is particularly useful for models that cover several exposure
entities (e.g. ESRM20).
## Storing Information on the Source of the Aggregated Exposure Model
After having instantiated the `AggregatedExposureModel` and `ExposureEntity` classes, the
`gde-importer` stores the `name` and `format` of the aggregated exposure model in the
[aggregated_sources](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles#aggregated_sources-information-about-sources-of-aggregated-exposure-models)
table of the GDE Tiles database. If an entry already exists for that `name`, the `format` is
updated as per the new value and the `aggregated_source_id` is retrieved; otherwise, a new
entry is created and a new `aggregated_source_id` is assigned. The `aggregated_source_id` is a
serial primary key of the table, which means that PostgreSQL automatically assigns an integer
that is one larger than the last one assigned. Even if entries are erased from the table, integer
values of `aggregated_source_id` are not repeated or "recycled". The value of
`aggregated_source_id` is needed to store all other data processed/generated by the
`gde-importer`.
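
The "update if the name exists, insert otherwise" logic can be sketched as shown below. To keep the example self-contained, it uses an in-memory SQLite database from the Python standard library instead of PostgreSQL, with `AUTOINCREMENT` standing in for the serial primary key. The function names and the simplified table definition are assumptions made for illustration and do not reproduce the actual schema or code.

```python
import sqlite3


def create_table(connection):
    """Create a simplified stand-in for the aggregated_sources table."""
    connection.execute(
        "CREATE TABLE aggregated_sources ("
        "aggregated_source_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT UNIQUE, format TEXT)"
    )


def store_aggregated_source(connection, name, source_format):
    """Return the ID of the source called `name`, creating the entry if needed."""
    row = connection.execute(
        "SELECT aggregated_source_id FROM aggregated_sources WHERE name = ?",
        (name,),
    ).fetchone()
    if row is not None:
        # Entry exists: update the format and reuse the existing ID
        connection.execute(
            "UPDATE aggregated_sources SET format = ? WHERE aggregated_source_id = ?",
            (source_format, row[0]),
        )
        connection.commit()
        return row[0]
    # No entry yet: the database assigns the next integer ID
    cursor = connection.execute(
        "INSERT INTO aggregated_sources (name, format) VALUES (?, ?)",
        (name, source_format),
    )
    connection.commit()
    return cursor.lastrowid
```

Calling `store_aggregated_source` twice with the same `name` returns the same ID, while a new `name` receives a new ID even if earlier entries have been deleted in the meantime.
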
## Interpreting the Exposure Entities to Run
Having retrieved the names of all the exposure entities associated with the input aggregated
exposure model, [gdeimporter.py](../gdeimporter/gdeimporter.py) calls the
`interpret_exposure_entities_to_run` method of the
[Configuration](../gdeimporter/configuration.py) class, which transforms a value of `"all"` of
`exposure_entities_to_run` to a list of all the retrieved names of the exposure entities, or
reads the list of exposure entities from the indicated `txt` or `csv` file, or does nothing and
keeps the list provided by the user, if that is the case. The outcome of this operation is
stored as the `exposure_entities_to_run` attribute of the `Configuration` class.
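
The three cases handled by this interpretation can be sketched as follows. This is an illustration and not the code of the `Configuration` class: the function name `interpret_entities_sketch` is hypothetical, and the sketch assumes that names in the `txt` or `csv` file are separated by commas and/or line breaks.

```python
import re


def interpret_entities_sketch(value, all_entity_names):
    """Turn the configured string into a list of exposure entity names."""
    value = value.strip()
    if value == "all":
        # Use every name retrieved from the metadata of the model
        return list(all_entity_names)
    if value.lower().endswith((".txt", ".csv")):
        # Assumed file layout: names separated by commas and/or line breaks
        with open(value, encoding="utf-8") as entities_file:
            content = entities_file.read()
        return [name.strip() for name in re.split(r"[,\n]", content) if name.strip()]
    # Otherwise keep the comma-space-separated list given by the user
    return [name.strip() for name in value.split(", ") if name.strip()]


print(interpret_entities_sketch("all", ["Greece", "Italy", "Spain"]))
# ['Greece', 'Italy', 'Spain']
print(interpret_entities_sketch("Greece, Italy", ["Greece", "Italy", "Spain"]))
# ['Greece', 'Italy']
```
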
## Processing Each Exposure Entity and Occupancy Case
The tasks that follow are carried out for each exposure entity listed in the
`exposure_entities_to_run` attribute and each occupancy case listed in the `occupancies_to_run`
attribute of the `Configuration` class.
### Retrieving All Data Units
The `get_data_units` method of the
[AggregatedExposureModel](../gdeimporter/aggregatedexposuremodel.py#L35) class (and/or its
subclasses) first retrieves the IDs of all data units associated with the exposure entity and
occupancy case. Then, for each data unit it carries out the following actions:
- Retrieve the boundary.
- Retrieve the