Commit cfcc5133 authored by Cecilia Nievas

Added documentation

parent 1c63a616
Pipeline #44875 passed with stage
in 2 minutes and 42 seconds
@@ -92,6 +92,9 @@ database where information on the OBM buildings is stored.
database where information on the OSM-completeness of tiles is stored.
- `number_cores`: Number of cores used for parallelisation.
Further details on the meaning and use of these parameters can be found in the
[documentation](docs/03_Configuration.md).
## Running gde-core
From the working directory (where you placed `config.yml`), run the code by typing:
@@ -130,3 +133,17 @@ solutions will be better for different programs; see section 13 for the
specific requirements.
See the [LICENSE](./LICENSE) for the full license text.
## Acknowledgements
This project is partially funded by:
- the Real-time Earthquake Risk Reduction for a Resilient Europe (RISE) project, which has
received funding from the European Union’s Horizon 2020 research and innovation programme under
grant agreement No 821115;
- the Large-scale EXecution for Industry and Society (LEXIS) project, which has received funding
from the European Union’s Horizon 2020 research and innovation programme under grant agreement
No 825532;
- the Airborne Observation of Critical Infrastructures (Luftgestützte Observation Kritischer
Infrastrukturen, LOKI in German) project, which has received funding from the German Federal
Ministry for Education and Research (BMBF) under funding code (FKZ) 03G0890D.
# General concepts
Exposure models describe the location and value of assets and people, as well as the connection
to their fragility/vulnerability to specific hazards.
The concept of "aggregated" exposure models arises in contrast with the idea of
"building-by-building" exposure models: in the latter, each individual building is represented
by its geometry and location, while in the former larger groups of buildings are represented by
a single point in space, which is intended to represent a larger geographical extent.
In order to run a damage calculation associated with a specific hazard, an exposure model needs
to assign classes to the individual or grouped buildings, and these classes need to be
meaningful in terms of representing the expected behaviour of the buildings when subject to the
hazard. The latter means that the building classes need to allow the risk modeller to make a
connection with fragility models that represent such behaviour.
In order to run a loss calculation associated with a specific hazard, the exposure model also
needs to assign replacement costs and/or number of occupants to each building (or building
class) or group of buildings (or building classes). Similarly to the case of the damage
calculation, the building classes need to be meaningful in terms of representing the expected
behaviour of the buildings and outcome (in terms of losses) when subject to the hazard.
The [Global Dynamic Exposure](02_GDE_Model.md) (GDE) model classifies buildings using the
[GEM Building Taxonomy v3.0](https://github.com/gem/gem_taxonomy) (Silva et al., 2022). The GEM
Building Taxonomy v3.0 is a faceted taxonomy, which means that it characterises buildings by
means of individual relevant attributes, such as construction material, type of lateral
load-resisting system, number of storeys, expected ductility, etc. Building classes arise from
the combination of the different possible values of these attributes. For example:
- `CR/LFINF+CDL/HBET:2-4` represents a 2- to 4-storey (`HBET:2-4`) reinforced concrete (`CR`)
infilled frame (`LFINF`) with low seismic code design level (`CDL`),
- `MUR+CL/LWAL+CDN/H:1` represents a 1-storey (`H:1`) fired clay-unit (`CL`) unreinforced
masonry (`MUR`) wall-system (`LWAL`) building with no seismic code design (`CDN`).
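
As an illustration of the faceted structure, the two strings above can be split into their attribute groups with a few lines of Python. This is only a minimal sketch based on these two examples (groups separated by `/`, values within a group joined by `+`, heights written as `H:n` or `HBET:a-b`); the function names are hypothetical and are not part of `gde-core` or of the GEM taxonomy tools.

```python
def split_taxonomy_string(building_class):
    """Split a taxonomy string into attribute groups.

    Assumes groups are separated by "/" and values within a group by "+",
    as in the two examples given in the text.
    """
    return [group.split("+") for group in building_class.split("/")]


def storey_range(height_attribute):
    """Return (minimum, maximum) storeys from "H:1" or "HBET:2-4"."""
    value = height_attribute.split(":", 1)[1]
    if "-" in value:
        lowest, highest = value.split("-")
        return int(lowest), int(highest)
    return int(value), int(value)


for name in ("CR/LFINF+CDL/HBET:2-4", "MUR+CL/LWAL+CDN/H:1"):
    groups = split_taxonomy_string(name)
    # In both examples the height is the only value of the last group.
    print(groups, storey_range(groups[-1][0]))
```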
It is not uncommon for aggregated exposure models to be associated with a specific occupancy
case. Usual occupancy cases include residential, commercial and industrial buildings, for
example. Other occupancy cases may be educational, agricultural or governmental buildings, as
well as many others. The total number of buildings and distribution of building classes
described above are thus associated with a specific occupancy case in the GDE model (and,
consequently, the `gde-core`).
# The Global Dynamic Exposure (GDE) Model
The Global Dynamic Exposure (GDE) model is a high-resolution exposure model that is generated by
means of combining:
1. Aggregated exposure models.
2. Data on individual buildings from [OpenStreetMap](https://www.openstreetmap.org).
3. Remote sensing-derived built-up areas from the Global Human Settlement Layer (GHSL; Corbane
et al., 2018).
Fig. 2.1 gives an overview of how these elements are processed and combined to create the GDE
model.
<img src="images/gde_model_overview.png" width=75%>
Fig. 2.1 Conceptual overview of OpenBuildingMap (OBM) and the Global Dynamic Exposure (GDE) model.
Aggregated exposure models (such as that of the European Seismic Risk Model 2020, ESRM20,
Crowley et al., 2020) are distributed onto a grid of zoom-level 18 quadtiles proportionally to
the built-up area expected in each tile. This task is carried out by the
[gde-importer](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-importer/-/tree/master/),
which also processes data on building classes, their attributes (e.g. people per building,
replacement cost per building) and other relevant exposure parameters. Currently, `gde-importer`
can only import the format of the exposure model of the European Seismic Risk Model 2020
(ESRM20) (Crowley et al., 2020), but further formats can be implemented in the future. The
expected built-up areas in each tile are an output of
[obmgapanalysis](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/obmgapanalysis),
which processes the [Global Human Settlement Layer (GHSL) built-up area layer with multi-temporal
analysis and global coverage](https://data.jrc.ec.europa.eu/dataset/jrc-ghsl-10007)
(Corbane et al., 2018) for this purpose.
Independently from this,
[rabotnik-obm](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm)
retrieves buildings from [OpenStreetMap](https://www.openstreetmap.org) together with some
additional attributes, and assigns them occupancy types as per the
[GEM Building Taxonomy v3.0](https://github.com/gem/gem_taxonomy) (Silva et al., 2022), the
resulting product being the `OpenBuildingMap`, or OBM.
OBM buildings are assigned building classes as per the aggregated exposure model, as a function
of their geographic location and their occupancy type. Only occupancy types that belong to
occupancy cases covered by the imported aggregated exposure models can be assigned building
classes. This is done by the
[gde-core](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-core), as
explained
[here](04_Processing_Logic.md#retrieve-obm-buildings-and-assign-building-classes-and-probabilities-to-them).
As `OpenStreetMap` is not yet complete everywhere, that is, as not all buildings that exist in
reality are represented in `OpenStreetMap`, the final number of buildings in each tile of the
GDE model is calculated by combining numbers of buildings from the aggregated exposure model and
OpenBuildingMap, taking into account the completeness of the tile. The completeness of the tile
is determined by
[obmgapanalysis](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/obmgapanalysis) in
an automatic fashion, by calculating the ratio between the area of building footprints from
`OpenStreetMap` and the expected built-up area as per GHSL, and using a threshold to decide
whether or not the tile is complete. The threshold was selected by analysing the built-up ratios
obtained for the region of Attica (Greece), for which a manual assessment of completeness was
carried out. If a tile is incomplete,
[gde-core](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-core) compares
the number of buildings in the tile as per the distribution of the aggregated exposure model
onto the grid against the number of OBM buildings. If the former is larger than the latter, a
number of so-called "remainder" buildings is calculated as the difference between the two and
assigned to the tile. These are buildings that are expected to exist in the tile but are not yet
represented in `OpenStreetMap`. If the number of OBM buildings is larger than that of the
so-called "aggregated" buildings, no remainder buildings are assigned to the tile. If the tile
is complete, no remainder buildings are assigned to the tile either. Further details can be
found [here](04_Processing_Logic.md#calculate-remainder-buildings-in-data-unit-tiles).
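
The completeness criterion described above can be sketched as follows. This is a hedged illustration, not the code of `obmgapanalysis`: the function name, the `threshold` argument, the use of `>=` in the comparison and the numeric values in the example are all assumptions, as the actual threshold is not stated here. Treating a tile with zero expected built-up area as complete follows the interpretation given in the processing logic.

```python
def is_tile_complete(osm_footprint_area, ghsl_built_up_area, threshold):
    """Decide whether a tile is complete from its built-up ratio.

    threshold is a hypothetical parameter; its real value is not given
    in this documentation.
    """
    if ghsl_built_up_area == 0:
        # No built-up area is expected as per GHSL: assume complete.
        return True
    ratio = osm_footprint_area / ghsl_built_up_area
    return ratio >= threshold


# Arbitrary example values (areas in the same units, threshold of 0.2).
print(is_tile_complete(50.0, 100.0, threshold=0.2))
print(is_tile_complete(5.0, 100.0, threshold=0.2))
print(is_tile_complete(30.0, 0.0, threshold=0.2))
```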
The resulting GDE model is thus a combination of individual OBM buildings (re-named as GDE
buildings once building classes have been assigned to them) and remainder buildings defined in
zoom-level 18 quadtiles.
# Configuration
User-configurable parameters need to be provided in a file named `config.yml`, located in the
working directory. The file [config_example.yml](../config_example.yml) in this repository can
be used as a starting point. Currently all parameters shown in the example file are mandatory to
run `gde-core` (i.e. none of them is optional).
## General parameters
- `model_name` (required): Name of the input aggregated exposure model to be processed. It needs to have
been imported by
[gde-importer](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-importer)
already and thus exist in the `aggregated_sources` table of the database specified under
`database_gde_tiles` in the configuration file, which needs to follow the structure of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
## Parameters that control what cases are run
An input aggregated exposure model may cover different exposure entities and different occupancy
cases. These parameters control which of them are run when calling `gde-core` (they need to
have been processed already by
[gde-importer](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-importer)):
- `occupancies_to_run` (required): List of occupancies for which the code will be run, separated
by ", " (comma and space). They need to exist for the indicated `model_name`. Currently
supported values: residential, commercial, industrial.
- `exposure_entities_to_run` (required): List of names of exposure entities for which the code
will be run. Currently supported options:
- "all": The list of 3-character codes of all exposure entities associated with `model_name`
will be retrieved from the `data_units` table of the
[GDE Tiles](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles)
database.
- A comma-space-separated list of entity names: This list of names will be transformed into a
list of their associated 3-character codes, by means of what is specified in
`exposure_entities_code` (see below).
- A full path to a .txt or .csv file: The list of names will be retrieved from the indicated
.txt/.csv file and it will be transformed into a list of their associated 3-character codes,
by means of what is specified in `exposure_entities_code` (see below).
## Parameters needed to access databases
- `database_gde_tiles` (required): Credentials for the
[GDE Tiles](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles)
database where information on the GDE tiles is stored.
- `database_obm_buildings` (required): Credentials for the
[OBM Buildings](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmbuildings)
database where information on the OBM buildings is stored.
- `database_completeness` (required): Credentials for the
[OBM Tiles](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles)
database where information on the OSM-completeness of tiles is stored. Apart from the parameters
listed below, the `sourceid` from which the completeness values will be read needs to be
indicated within a nested structure. This value is looked up in the `source_id` column
of the `obm_built_area_assessments` table of the `OBM Tiles` database.
The `gde-core` assumes that these databases contain the tables indicated in each corresponding
link. In order to connect to the databases, some or all of the following parameters may be
required (as nested elements of `database_gde_tiles`, `database_obm_buildings` and
`database_completeness`):
- host: name of the host
- dbname: name of the database
- port: port number
- username: user name
- password: password associated with username
## Other parameters
- `exposure_entities_code` (required): This parameter controls how the exposure entities listed
under `exposure_entities_to_run` get converted into their 3-character codes. The only case in
which `exposure_entities_code` is not used is when `exposure_entities_to_run` is "all", as the
3-character codes get directly retrieved from the database in this case. In all other cases, the
user will be specifying names (e.g. "Greece", "United_Kingdom") and:
- if `exposure_entities_code` is "ISO3", the `gde-core` will retrieve the corresponding
alpha-3 [ISO 3166 country codes](https://www.iso.org/iso-3166-country-codes.html), using the
[iso3166 library](https://github.com/deactivated/python-iso3166) (this only works if the
listed exposure entities are countries);
- if `exposure_entities_code` is a dictionary, the `gde-core` will look up the names of the
exposure entities among the keys of the dictionary and take as codes the values associated
with those keys.
- `number_cores` (required): Number of cores used for parallelisation.
# Processing Logic
Running `gdecore` calls the `main()` function in
[gdecore.py](../gdecore/gdecore.py), which starts the program.
The processing logic described herein is the one adopted at present for the `gde-core`, but
the modularity of the code allows for it to be easily reorganised or modified in the future if
needed. In other words, there are aspects of this processing logic that do not need to follow
the order in which they are currently carried out.
For details on the geographic units used by `gde-core` (e.g. "exposure entity", "data unit",
"data-unit tile"), please refer to the documentation of the `gde-importer`
[here](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-importer/-/blob/master/docs/02_Organisation_Geographic_Space.md).
## Reading the Configuration File
The first task carried out by the `gde-core` is reading and interpreting the
[configuration file](03_Configuration.md). This is done by means of the
[Configuration](../gdecore/configuration.py#L29) class, which verifies that all the
required (i.e. non-optional) parameters are available and that their values/types are as
expected by the program. If a problem is found that does not allow the program to run, it will
raise an error and stop.
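
The kind of check described above can be sketched as follows. This is not the code of the `Configuration` class: the function name and the example values are hypothetical, and only the parameter names are taken from the configuration documentation. The sketch assumes that the contents of `config.yml` have already been loaded into a dictionary.

```python
REQUIRED_PARAMETERS = (
    "model_name",
    "occupancies_to_run",
    "exposure_entities_to_run",
    "exposure_entities_code",
    "database_gde_tiles",
    "database_obm_buildings",
    "database_completeness",
    "number_cores",
)


def check_required_parameters(config):
    """Raise an error if a required parameter is missing or invalid."""
    missing = [name for name in REQUIRED_PARAMETERS if name not in config]
    if missing:
        raise KeyError("Missing required parameters: " + ", ".join(missing))
    if not isinstance(config["number_cores"], int) or config["number_cores"] < 1:
        raise ValueError("number_cores must be a positive integer")


# Placeholder values, for illustration only.
database = {"host": "localhost", "dbname": "placeholder", "port": 5432,
            "username": "user", "password": "secret"}
example_config = {
    "model_name": "esrm20",
    "occupancies_to_run": "residential, commercial",
    "exposure_entities_to_run": "Greece",
    "exposure_entities_code": "ISO3",
    "database_gde_tiles": database,
    "database_obm_buildings": database,
    "database_completeness": dict(database, sourceid=1),
    "number_cores": 4,
}
check_required_parameters(example_config)  # passes silently
```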
## Retrieving `aggregated_source_id` and `aggregated_source_format` for User-Specified `model_name`
The `model_name` of the aggregated exposure model specified by the user in the configuration
file is looked up in the `aggregated_sources` table of the database specified under
`database_gde_tiles` in the configuration file, which needs to follow the structure of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
If `model_name` is not found, the program stops running. If found, its ID
(`aggregated_source_id`) and format (`aggregated_source_format`) are retrieved.
## Initialise Occupancy Cases Class
The format of the aggregated exposure model is relevant to initialise the correct occupancy
cases sub-class of `OccupancyCases` from [occupancy_cases.py](../gdecore/occupancy_cases.py).
The `OccupancyCases` class represents the grouping of occupancy types (as per the GEM Building
Taxonomy v3.0) into occupancy cases. Occupancy cases are broader groups of occupancy types and
are relevant to connect individual OBM buildings with aggregated exposure models, as the latter
are usually defined for particular occupancy cases. For example, the "commercial" occupancy case
of the ESRM20 exposure model (Crowley et al., 2020) comprises the "COM", "COM99" (older format
for "COM"), "COM1", "COM2", "COM3", "COM5" and "RES3" occupancy types.
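
The grouping of occupancy types into occupancy cases can be pictured as a simple mapping. The sketch below is not the `OccupancyCases` class itself; the dictionary and function names are hypothetical, and only the "commercial" case is filled in because it is the only one whose occupancy types are listed above.

```python
# Occupancy types of the ESRM20 "commercial" case, as listed in the text.
# Other occupancy cases are left out because their types are not listed here.
ESRM20_OCCUPANCY_CASES = {
    "commercial": ["COM", "COM99", "COM1", "COM2", "COM3", "COM5", "RES3"],
}


def occupancy_case_of(occupancy_type, occupancy_cases):
    """Return the occupancy case an occupancy type belongs to, or None."""
    for case, occupancy_types in occupancy_cases.items():
        if occupancy_type in occupancy_types:
            return case
    return None


print(occupancy_case_of("COM3", ESRM20_OCCUPANCY_CASES))
print(occupancy_case_of("UNKNOWN_TYPE", ESRM20_OCCUPANCY_CASES))
```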
## Interpreting the Exposure Entities to Run
The `gde-core` offers different ways in which the user can specify which exposure entity/ies to
run (see documentation on the [configuration file](03_Configuration.md)). The
`interpret_exposure_entities_to_run` method of the [Configuration](../gdecore/configuration.py)
class is called to interpret the input provided by the user and update the
`exposure_entities_to_run` attribute of the configuration object accordingly.
If the user specifies to run `all` exposure entities, `gde-core` retrieves the list of
3-character codes of all exposure entities associated with `aggregated_source_id` in the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
If the user specifies a path to a `.txt` or `.csv`, `gde-core` retrieves the list of 3-character
codes of the exposure entities listed in the indicated `.txt`/`.csv` file.
If the user specifies a list with one or more names of exposure entities, `gde-core` retrieves
the 3-character codes associated with these exposure entities either from a nested structure
that the user indicates in the configuration file or by retrieving the corresponding alpha-3
[ISO 3166 country codes](https://www.iso.org/iso-3166-country-codes.html), using the
[iso3166 library](https://github.com/deactivated/python-iso3166), if `exposure_entities_code`
is set to `ISO3` in the configuration file.
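
A simplified view of this interpretation is sketched below. It is not the `interpret_exposure_entities_to_run` method: the function signature is hypothetical, the database query is replaced by a list passed in by the caller, and the file-based and `ISO3` cases are deliberately left out.

```python
def interpret_exposure_entities(entities_to_run, entities_code, codes_in_database):
    """Turn the user input into a list of 3-character codes.

    codes_in_database stands in for the codes that would be retrieved
    from the GDE Tiles database when the user asks for "all".
    """
    if entities_to_run == "all":
        return sorted(codes_in_database)
    if entities_to_run.endswith((".txt", ".csv")):
        raise NotImplementedError("Reading names from a file is not covered here")
    names = entities_to_run.split(", ")  # comma-space-separated list
    if isinstance(entities_code, dict):
        return [entities_code[name] for name in names]
    raise NotImplementedError("The ISO3 lookup is not covered here")


name_to_code = {"Greece": "GRC", "Luxembourg": "LUX"}
print(interpret_exposure_entities("Greece, Luxembourg", name_to_code, []))
print(interpret_exposure_entities("all", "ISO3", ["LUX", "GRC"]))
```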
## Processing Each Exposure Entity and Occupancy Case
The tasks that follow are carried out for each exposure entity listed in the
`exposure_entities_to_run` attribute and each occupancy case listed in the `occupancies_to_run`
attribute of the Configuration class.
### Retrieve Data Unit IDs and Geometries
All data unit IDs associated with the exposure entity, occupancy case and aggregated source ID
are retrieved, together with their geometries, from the `data_units` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
Data units for which no geometry is available (which should not occur unless an unexpected error
occurred while running the `gde-importer`) are logged with a warning, as they cannot be
processed.
<img src="images/gde_core_algorithm_01.png" width=75%>
Fig. 4.1 Flowchart showing the processing logic at the highest level.
### Process Data Units in Parallel
Data units are processed using the `process_data_unit` method of the
[GDEProcessor](../gdecore/processor.py#L34) class. Each data unit is processed by a core, using
as many cores as specified by the user in the configuration file.
Each of the subsections that follow refers to a task carried out or called by the
`process_data_unit` method, as shown in Fig. 4.2.
<img src="images/gde_core_algorithm_02_process_data_unit_overview.png" width=75%>
Fig. 4.2 Overview of the main tasks carried out by `GDEProcessor.process_data_unit()`.
#### Retrieve Building Classes of the Data Unit
The building classes associated with the data unit ID are retrieved from the
`data_units_buildings` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
For each building class, the proportion of buildings it represents within the data unit and the
minimum and maximum number of storeys it covers are retrieved.
Note that a data unit is implicitly associated with an aggregated source ID and an
occupancy case. Consequently, "building classes associated with the data unit ID" also implies
an association with the aggregated source ID and occupancy case being processed.
#### Retrieve OBM Buildings and Assign Building Classes and Probabilities to Them
As shown in Fig. 4.3, the following tasks are carried out in order to assign building classes to
all OpenBuildingMap (OBM) buildings that belong to the data unit and occupancy case being
processed:
<img src="images/gde_core_algorithm_03_OBM_buildings.png" width=75%>
Fig. 4.3 Flowchart of tasks associated with retrieving OBM buildings and assigning building
classes to them.
1. **Retrieve OBM buildings**: The `get_OBM_buildings_in_data_unit_by_occupancy_types` method of
`DatabaseQueries` retrieves from the
[OBM buildings database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmbuildings)
all OBM building parts whose centroids fall within the boundaries of the data unit and whose
occupancy type belongs to the occupancy case being processed (see
[here](#initialise-occupancy-cases-class)).
2. **Group parts of the same relations**: Parts of a building that is defined in OpenStreetMap
by a relation are gathered together and treated as one building by the
`post_process_obm_relations` method of `GDEProcessor`. For such cases, the method gathers the
parts and transforms them into one individual building in which the `osm_id` becomes that of the
`relation_id`, the number of storeys becomes the maximum of all individual parts, and the
quadkey of the ensemble is identified.
3. **Calculate the number of OBM buildings per quadkey**: Count numbers of OBM buildings in each
zoom-level 18 tile, represented by its quadkey.
4. **Assign building classes to OBM buildings**: As shown in Fig. 4.4, all building classes
retrieved for the data unit are initially considered as possible for each individual OBM
building. If the number of storeys is available (from OpenStreetMap), building classes that are
incompatible with the indicated number are discarded. If the building is of commercial occupancy
type "COM1", "COM2", "COM3", "COM5" or "RES3", only building classes associated with these types
are kept. This stems from the fact that the ESRM20 exposure model (Crowley et al., 2020) has
different sub-types of commercial classes, and the logic might need revision when other
aggregated exposure models are incorporated into the GDE model in the future. If discarding
potential building classes as per these criteria leads to no building class being compatible
with the building, a warning is
logged and all initial building classes are assigned to the building. This is done because
assigning no building classes to a building leads to the impossibility of considering the
building within a damage/loss assessment.
5. **Store building classes of GDE (OBM) buildings**: At this stage, OBM buildings that have
been assigned building classes become so-called GDE buildings. Their details are stored in the
`gde_buildings` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
<img src="images/gde_core_algorithm_04_OBM_buildings_assign_classes.png" width=75%>
Fig. 4.4 Algorithm used to assign building classes to individual OBM buildings.
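
The filtering logic of step 4 can be sketched as follows. This is a simplified illustration and not the code of `GDEProcessor`: the function name and the dictionary keys `storeys_min`, `storeys_max` and `occupancy_subtype` are assumptions about how a building class might be represented, and the occupancy sub-types attached to the two example classes are made up for the demonstration.

```python
import logging

logger = logging.getLogger(__name__)

SPECIFIC_COMMERCIAL_TYPES = {"COM1", "COM2", "COM3", "COM5", "RES3"}


def filter_building_classes(classes, storeys=None, occupancy_type=None):
    """Keep the building classes compatible with one OBM building."""
    candidates = list(classes)
    if storeys is not None:
        # Discard classes whose storey range excludes the known value.
        candidates = [c for c in candidates
                      if c["storeys_min"] <= storeys <= c["storeys_max"]]
    if occupancy_type in SPECIFIC_COMMERCIAL_TYPES:
        # Keep only classes associated with the specific commercial type.
        candidates = [c for c in candidates
                      if c["occupancy_subtype"] == occupancy_type]
    if not candidates:
        logger.warning("No compatible building class; keeping all classes")
        return list(classes)
    return candidates


classes = [
    {"name": "CR/LFINF+CDL/HBET:2-4", "storeys_min": 2, "storeys_max": 4,
     "occupancy_subtype": "COM1"},  # sub-type is illustrative only
    {"name": "MUR+CL/LWAL+CDN/H:1", "storeys_min": 1, "storeys_max": 1,
     "occupancy_subtype": "COM2"},  # sub-type is illustrative only
]
print([c["name"] for c in filter_building_classes(classes, storeys=3)])
print([c["name"] for c in filter_building_classes(classes, storeys=10)])
```

In the second call no class covers 10 storeys, so the sketch falls back to all initial classes, mirroring the behaviour described in the text.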
Step number 2 should become part of
[rabotnik-obm](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/tree/master)
in the future. As explained
[here](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/blob/master/docs/general_overview.md),
`rabotnik-obm` currently processes the IDs of individual building polygons without gathering
together building parts that make up a relation in
[OpenStreetMap](https://www.openstreetmap.org) (OSM). Relations are often used in OSM to
represent complex building geometries, like vertical irregularities (e.g. a building whose plan
comprises partly 3 storeys and partly 10 storeys). If in the future
[rabotnik-obm](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/tree/master)
were to gather parts of a building relation, the method
`get_OBM_buildings_in_data_unit_by_occupancy_types` of
[DatabaseQueries](../gdecore/database_queries.py#L30) would directly return what is now being
returned by the combination of running `get_OBM_buildings_in_data_unit_by_occupancy_types`
followed by `post_process_obm_relations` of [GDEProcessor](../gdecore/processor.py#L34), and the
latter (i.e. step 2) would thus no longer be needed.
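
The grouping carried out in step 2 can be sketched as follows. This is not the `post_process_obm_relations` method: the function name and the dictionary keys are assumptions, and the identification of the quadkey of the ensemble is left out because its details are not described here.

```python
def group_relation_parts(parts):
    """Merge building parts that share a relation into one building.

    Each part is a dict with keys "osm_id", "relation_id" (None if the
    part belongs to no relation) and "storeys" (None if unknown).
    """
    buildings = []
    by_relation = {}
    for part in parts:
        if part["relation_id"] is None:
            buildings.append(dict(part))
        else:
            by_relation.setdefault(part["relation_id"], []).append(part)
    for relation_id, members in by_relation.items():
        known = [m["storeys"] for m in members if m["storeys"] is not None]
        buildings.append({
            "osm_id": relation_id,  # the relation ID becomes the osm_id
            "relation_id": relation_id,
            "storeys": max(known) if known else None,  # maximum of the parts
        })
    return buildings


parts = [
    {"osm_id": 1, "relation_id": None, "storeys": 2},
    {"osm_id": 2, "relation_id": 900, "storeys": 3},
    {"osm_id": 3, "relation_id": 900, "storeys": 10},
]
print(group_relation_parts(parts))
```

With the partly 3-storey and partly 10-storey building of the example above, the merged building is assigned 10 storeys.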
Some limitations associated with the fact that `gde-core` is attempting to gather parts of the
same building relation (instead of having this done by `rabotnik-obm`) are the following:
- `gde-core` queries the
[OBM buildings database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmbuildings)
by data unit, which means that if some parts of a relation fall in one data unit and some others
fall in another, this building will be counted twice, once in each data unit. Results from the
GDE model for Greece indicate this happens just once for the whole country.
- `gde-core` queries the
[OBM buildings database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmbuildings)
by occupancy case, which means that if some parts of a relation have occupancy types that belong
to a different occupancy case (or to no occupancy case at all), they will not be drawn together.
- `rabotnik-obm` only retrieves the first relation ID it finds for each building part, but a
building might be represented in OpenStreetMap by a nested series of relations, as explained
[here](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/blob/master/docs/general_overview.md).
Trying to solve these limitations within `gde-core` would duplicate efforts with
respect to `rabotnik-obm`, which is not desirable. For example, bringing together different
occupancy types coming from parts of a relation should follow the same processing logic that is
already being used by `rabotnik-obm` to define a final occupancy type based on different types
stemming from different tags of the same building (details
[here](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/blob/master/docs/occupancy.md)).
The minimum has been done within `gde-core` to minimise the impact of buildings being
represented by relation IDs (i.e. step 2 above is this minimum). Out of over 30 million
buildings processed so far (at the time of writing, end of June 2022) by `rabotnik-obm`, only
0.07% belong to a relation (these buildings belong to Greece, Italy and part of Germany).
#### Calculate Remainder Buildings in Data-Unit Tiles
As shown in Fig. 4.5, the following tasks are carried out in order to calculate the number of
remainder buildings in each data-unit tile:
<img src="images/gde_core_algorithm_05_remainder_buildings.png" width=75%>
Fig. 4.5 Algorithm used to process data-unit tiles.
1. **Retrieve data-unit tiles**: The data-unit tiles associated with the data unit being
processed are retrieved from the `data_unit_tiles` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles),
in terms of their quadkeys and the number of aggregated buildings associated with them (i.e. the
number of buildings assigned to the tile by `gde-importer` when distributing the aggregated
exposure model onto the tiles).
2. **Retrieve completeness of data-unit tiles**: The completeness status of the data-unit tiles
is retrieved from the `obm_built_area_assessments` table of the
[OBM tiles database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles)
(entries associated with `source_id=1`, which corresponds to the Global Human Settlement Layer GHSL,
Corbane et al., 2018). As
[obmgapanalysis](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/obmgapanalysis)
does not create entries for tiles for which the built-up area is zero, if a quadkey does not
have an entry for `source_id=1`, it is assumed to be complete. If the entry exists, the
completeness status is retrieved from the database. The table below shows the interpretation of
the contents of `obm_built_area_assessments` depending on whether the GHS (`source_id=1`) and/or
the OSM (`source_id=0`) entries exist.
| GHS entry | OSM entry | Completeness |
| -------------- | -------------- | -------------------------------------- |
| Exists | Exists | Retrieve from source_id=1 |
| Exists | Does not exist | Retrieve from source_id=1 (incomplete) |
| Does not exist | Exists | OSM/0 --> infinity, assume complete |
| Does not exist | Does not exist | 0/0 --> undefined, assume complete |
3. **Calculate remainder buildings in data-unit tiles**: If the data-unit tile is complete, it
is assigned zero remainder buildings (and, as a consequence, the total number of buildings
comes entirely from `OpenBuildingMap`). If the data-unit tile is incomplete, the number of aggregated
buildings in the tile is compared against the number of OBM buildings. If the former is larger
than the latter, the "remainder" buildings are calculated as the difference between the two and
assigned to the tile. If the number of OBM buildings is larger than that of aggregated
buildings, zero remainder buildings are assigned to the tile. This algorithm is shown in Fig.
4.6.
<img src="images/gde_core_algorithm_06_remainder_buildings_calculation.png" width=75%>
Fig. 4.6 Algorithm used to calculate remainder buildings.
4. **Store number of OBM and remainder buildings of the data-unit tiles**: The number of OBM and
remainder buildings in the data-unit tile are stored in the `data_unit_tiles` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
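
Steps 2 and 3 can be condensed into the sketch below. It is a hedged illustration rather than the code of `gde-core`: the function names are hypothetical, and the completeness entries of `source_id=1` are represented by a plain dictionary from quadkey to a boolean instead of a database query. The numbers in the example are taken from the residential entries of quadkey `122100203132021122` shown in the Querying Results section, assuming that tile was flagged as incomplete.

```python
def completeness_of(quadkey, assessments):
    """Return the completeness of a tile; missing entries mean complete."""
    return assessments.get(quadkey, True)


def remainder_buildings(aggregated_buildings, obm_buildings, is_complete):
    """Number of buildings expected in the tile but not yet in OBM."""
    if is_complete:
        return 0.0
    # Never negative: zero if OBM already holds more buildings.
    return max(aggregated_buildings - obm_buildings, 0.0)


assessments = {"122100203132021122": False}  # assumed to be incomplete
complete = completeness_of("122100203132021122", assessments)
print(remainder_buildings(3.5232385833448014, 0, complete))
print(remainder_buildings(10.832479557437827, 38, complete))
```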
# Querying Results
Example queries for accessing data from the GDE model are shown herein. Unless stated
otherwise, all queries refer to the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
## Available aggregated sources
To list all available aggregated sources, that is, aggregated exposure models that have been
imported by the
[gde-importer](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-importer),
do:
```
gde_tiles=> SELECT aggregated_source_id, name, format FROM aggregated_sources;
```
## Total number of buildings of an exposure entity, grouped by occupancy case
The following example query returns the number of aggregated buildings, OBM buildings and
remainder buildings in Greece (`GRC`), as per `aggregated_source_id=1` (i.e., ESRM20), grouped
by each occupancy case:
```
gde_tiles=> SELECT occupancy_case, SUM(aggregated_buildings), SUM(obm_buildings), SUM(remainder_buildings) FROM data_unit_tiles WHERE exposure_entity='GRC' AND aggregated_source_id=1 GROUP BY occupancy_case;
occupancy_case | sum | sum | sum
----------------+--------------------+--------+--------------------
residential | 3051157.8920203564 | 607409 | 2553080.074310104
commercial | 249708.0000000016 | 21811 | 212562.61958915135
industrial | 51229.000000000095 | 15952 | 41849.26464570898
(3 rows)
```
The summation of all OBM buildings and all remainder buildings gives the total number of
buildings in Greece for the three occupancy cases. This does not mean that these are all the OBM
buildings that exist in Greece in the
[OBM buildings database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmbuildings),
as many of those correspond to occupancy cases not covered by ESRM20 (and, thus, by GDE).
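
As a worked example of this summation, the short script below adds the OBM and remainder buildings of the query output above, per occupancy case and in total. The values are copied from that output; the variable names are only illustrative.

```python
# (OBM buildings, remainder buildings) per occupancy case, from the query above.
rows = {
    "residential": (607409, 2553080.074310104),
    "commercial": (21811, 212562.61958915135),
    "industrial": (15952, 41849.26464570898),
}

totals = {case: obm + remainder for case, (obm, remainder) in rows.items()}
grand_total = sum(totals.values())

for case, total in totals.items():
    print(f"{case}: {total:.1f}")
print(f"all three occupancy cases: {grand_total:.1f}")
```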
## Number of tiles associated with an exposure entity
The following example query returns the number of zoom-level 18 tiles associated with Luxembourg
(`LUX`), as per `aggregated_source_id=1` (i.e., ESRM20):
```
gde_tiles=> SELECT COUNT(DISTINCT(quadkey)) FROM data_unit_tiles WHERE exposure_entity='LUX' AND aggregated_source_id=1;
count
--------
267637
(1 row)
```
Note that the use of `DISTINCT` is relevant because the same tile (identified by its quadkey)
normally has more than one entry associated with the same exposure entity and aggregated source
ID, for different occupancy cases and/or data unit IDs.
## Exposure results associated with a quadkey
The following example query returns the list of entries associated with the zoom-level 18 tile
with quadkey `122100203132021122`, as per `aggregated_source_id=1` (i.e., ESRM20). For each
entry, the data unit ID, occupancy case and numbers of aggregated, OBM and remainder buildings
are requested.
```
gde_tiles=> SELECT data_unit_id, occupancy_case, aggregated_buildings, obm_buildings, remainder_buildings FROM data_unit_tiles WHERE quadkey='122100203132021122' AND aggregated_source_id=1;
       data_unit_id        | occupancy_case | aggregated_buildings | obm_buildings | remainder_buildings
---------------------------+----------------+----------------------+---------------+---------------------
 GRC_3514508               | residential    |   3.5232385833448014 |             0 |  3.5232385833448014
 GRC_3514608               | residential    |   10.832479557437827 |            38 |                   0
 GRC_3514508               | commercial     |   0.2990195693416231 |             0 |  0.2990195693416231
 GRC_3514608               | commercial     |    1.437904894386666 |             0 |   1.437904894386666
 GRC_industrial_FILLER_205 | industrial     |                    0 |             0 |                   0
(5 rows)
```
This result shows that, for both residential and commercial exposure, this tile is intersected
by the boundary between data units `GRC_3514508` and `GRC_3514608` of Greece (`GRC`), while for
industrial exposure it is fully contained in the filler data unit created by the `gde-importer`
to ensure full geographic coverage (see the
[gde-importer documentation](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-importer/-/blob/master/docs/06_Ensuring_Full_Geographic_Coverage.md) for more details).
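This reading of the output can be sketched in a few lines of Python: the entries are grouped by
occupancy case, and more than one data unit for the same occupancy case is taken to mean that
the tile is crossed by a data-unit boundary. The pairs are copied from the output above; the
variable names and the wording of the messages are illustrative only.

```python
from collections import defaultdict

# (data_unit_id, occupancy_case) pairs copied from the query output above.
entries = [
    ("GRC_3514508", "residential"),
    ("GRC_3514608", "residential"),
    ("GRC_3514508", "commercial"),
    ("GRC_3514608", "commercial"),
    ("GRC_industrial_FILLER_205", "industrial"),
]

# Collect the data units associated with the tile, per occupancy case.
units_per_case = defaultdict(set)
for data_unit_id, occupancy_case in entries:
    units_per_case[occupancy_case].add(data_unit_id)

for case, units in units_per_case.items():
    if len(units) > 1:
        status = "crossed by a data-unit boundary"
    else:
        status = "within a single data unit"
    print(f"{case}: {sorted(units)} -> {status}")
```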
## Building classes associated with a data unit ID
If, following the previous example, one were interested in knowing the residential building
classes associated with `GRC_3514508`, as per `aggregated_source_id=1` (i.e., ESRM20), the
following query could be run:
```
gde_tiles=> SELECT building_class_name, settlement_type, occupancy_subtype, proportions FROM data_units_buildings WHERE data_unit_id='GRC_3514508' AND occupancy_case='residential' AND aggregated_source_id=1;
        building_class_name         | settlement_type | occupancy_subtype |      proportions
------------------------------------+-----------------+-------------------+------------------------
 CR/LDUAL+CDH+LFC:15.0/H:1          | urban           | ALL               |  0.0014801545326146078
 CR/LDUAL+CDH+LFC:15.0/H:2          | urban           | ALL               |  0.0023115539626318814
 CR/LDUAL+CDH+LFC:15.0/H:2/SOS      | urban           | ALL               |  0.0017060083401791966
 CR/LDUAL+CDH+LFC:15.0/HBET:3-5     | urban           | ALL               |   0.014476105111235434
 CR/LDUAL+CDH+LFC:15.0/HBET:3-5/SOS | urban           | ALL               |   0.007514762230467308
... (continues)
```
It is possible to check that all these proportions add up to unity (within floating-point
precision) by doing:
```
gde_tiles=> SELECT SUM(proportions) FROM data_units_buildings WHERE data_unit_id='GRC_3514508' AND occupancy_case='residential' AND aggregated_source_id=1;
        sum
--------------------
 1.0000000000000004
(1 row)
```
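The sum returned above differs from 1 in the last digit, which is typical of floating-point
summation. When checking this programmatically, a comparison with a tolerance is therefore
safer than an exact equality test. A minimal Python sketch, in which the tolerance is an
arbitrary choice for illustration:

```python
import math

# Sum returned by the query above.
total = 1.0000000000000004

# Compare against 1.0 with a relative tolerance instead of using ==.
# The tolerance value is an assumption made for this illustration.
sums_to_unity = math.isclose(total, 1.0, rel_tol=1e-9)

print(total == 1.0)   # False: exact comparison fails due to rounding
print(sums_to_unity)  # True: the proportions add up to unity within tolerance
```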
## Census people and total replacement cost per building for a specific building class
Taking one of the building classes from the output of the query above, it is possible to
retrieve the census people and total replacement cost per building by doing:
```
gde_tiles=> SELECT census_people_per_building, total_cost_per_building FROM data_units_buildings WHERE building_class_name='CR/LDUAL+CDH+LFC:15.0/HBET:3-5' AND settlement_type='urban' AND occupancy_subtype='ALL' AND occupancy_case='residential' AND data_unit_id='GRC_3514508' AND aggregated_source_id=1;
 census_people_per_building | total_cost_per_building
----------------------------+-------------------------
          7.905868678536067 |      303187.50000000006
(1 row)
```
Note that not specifying the data unit ID would lead to more than one entry being returned, as
in this other example:
```
gde_tiles=> SELECT data_unit_id, census_people_per_building, total_cost_per_building FROM data_units_buildings WHERE building_class_name='CR/LDUAL+CDH+LFC:15.0/HBET:3-5' AND settlement_type='urban' AND occupancy_subtype='ALL' AND occupancy_case='residential' AND aggregated_source_id=1;
 data_unit_id | census_people_per_building | total_cost_per_building
--------------+----------------------------+-------------------------
 GRC_1120703  |          5.482850339082017 |                303187.5
 GRC_2312405  |         10.349432913301737 |                303187.5
 GRC_1120709  |          9.110088642942177 |                303187.5
 GRC_1121301  |          7.159655217902061 |      303187.50000000006
 GRC_2322901  |          7.683854928944831 |      303187.50000000006
 GRC_2312403  |          4.688827891144585 |                303187.5
... (continues)
```
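In the rows shown above, the census people per building vary from one data unit to another,
while the total replacement cost per building is the same up to floating-point rounding. The
following Python sketch checks this for the six rows displayed only (the output is truncated,
so nothing is implied about the remaining rows); the variable names and the tolerance are
illustrative.

```python
import math

# Rows copied from the (truncated) query output above:
# data_unit_id -> (census_people_per_building, total_cost_per_building)
per_data_unit = {
    "GRC_1120703": (5.482850339082017, 303187.5),
    "GRC_2312405": (10.349432913301737, 303187.5),
    "GRC_1120709": (9.110088642942177, 303187.5),
    "GRC_1121301": (7.159655217902061, 303187.50000000006),
    "GRC_2322901": (7.683854928944831, 303187.50000000006),
    "GRC_2312403": (4.688827891144585, 303187.5),
}

people = [p for p, _ in per_data_unit.values()]
costs = [c for _, c in per_data_unit.values()]

# Range of census people per building across the data units shown.
people_range = (min(people), max(people))

# Costs are compared with a tolerance because of floating-point rounding.
same_cost = all(math.isclose(c, costs[0], rel_tol=1e-12) for c in costs)

print(f"census people per building range: {people_range}")
print(f"same replacement cost in all rows shown: {same_cost}")
```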
## People at different times of the day
To distribute the census people of the building class queried above across different times of
the day, first retrieve the respective coefficients for the corresponding exposure entity
(`GRC`), occupancy case (`residential`) and aggregated source ID (`1`):
```
gde_tiles=> SELECT day, night, transit FROM exposure_entities_population_time_distribution WHERE exposure_entity='GRC' AND occupancy_case='residential' AND aggregated_source_id=1;