# Processing Logic

Running `gdecore` calls the `main()` function in [gdecore.py](../gdecore/gdecore.py), which
starts the program.
The processing logic described herein is the one adopted at present for the `gde-core`, but
the modularity of the code allows for it to be easily reorganised or modified in the future if
needed. In other words, there are aspects of this processing logic that do not need to follow
the order in which they are currently carried out.

For details on the geographic units used by `gde-core` (e.g. "exposure entity", "data unit",
"data-unit tile"), please refer to the documentation of the `gde-importer`
[here](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/gde-importer/-/blob/master/docs/02_Organisation_Geographic_Space.md).

## Reading the Configuration File

The first task carried out by the `gde-core` is reading and interpreting the
[configuration file](03_Configuration.md). This is done by means of the
[Configuration](../gdecore/configuration.py#L29) class, which verifies that all the
required (i.e. non-optional) parameters are available and that their values/types are as
expected by the program. If a problem is found that does not allow the program to run, it will
raise an error and stop.
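
As a rough illustration of this kind of check, the sketch below validates a configuration held in a plain dictionary. The parameter names are those mentioned in this document, but the expected types and the function `validate_configuration` are assumptions made for the example; they do not reproduce the actual `Configuration` class.

```python
# Hypothetical table of required parameters and the types assumed for them.
REQUIRED_PARAMETERS = {
    "model_name": str,
    "database_gde_tiles": dict,
    "exposure_entities_to_run": (str, list),
    "occupancies_to_run": (str, list),
}


def validate_configuration(config):
    """Raise an error if a required parameter is missing or has an unexpected type."""
    for name, expected_type in REQUIRED_PARAMETERS.items():
        if name not in config:
            raise KeyError(f"required parameter '{name}' is missing")
        if not isinstance(config[name], expected_type):
            raise TypeError(f"parameter '{name}' has an unexpected type")


# Illustrative values only.
example_config = {
    "model_name": "esrm20",
    "database_gde_tiles": {"host": "localhost", "dbname": "gde_tiles"},
    "exposure_entities_to_run": "all",
    "occupancies_to_run": ["commercial"],
}
validate_configuration(example_config)  # passes silently
```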

## Retrieving `aggregated_source_id` and `aggregated_source_format` for User-Specified `model_name`

The `model_name` of the aggregated exposure model specified by the user in the configuration
file is looked up in the `aggregated_sources` table of the database specified under
`database_gde_tiles` in the configuration file, which needs to follow the structure of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
If `model_name` is not found, the program stops running. If found, its ID
(`aggregated_source_id`) and format (`aggregated_source_format`) are retrieved.
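
The lookup can be sketched as a single query. To keep the example self-contained, it uses an in-memory SQLite table as a stand-in for the GDE Tiles database. The column names `name` and `format`, the inserted row and the function `retrieve_aggregated_source` are assumptions made for illustration, not the actual schema or code.

```python
import sqlite3


def retrieve_aggregated_source(connection, model_name):
    """Return (aggregated_source_id, aggregated_source_format) for model_name."""
    cursor = connection.execute(
        "SELECT aggregated_source_id, format FROM aggregated_sources WHERE name = ?",
        (model_name,),
    )
    row = cursor.fetchone()
    if row is None:
        # The program stops if the model is not found; an exception is used here.
        raise ValueError(f"model '{model_name}' not found in 'aggregated_sources'")
    return row[0], row[1]


# In-memory stand-in for the database, with one illustrative row.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE aggregated_sources (aggregated_source_id INTEGER, name TEXT, format TEXT)"
)
connection.execute("INSERT INTO aggregated_sources VALUES (2, 'esrm20', 'esrm20')")

aggregated_source_id, aggregated_source_format = retrieve_aggregated_source(
    connection, "esrm20"
)
```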

## Initialise Occupancy Cases Class

The format of the aggregated exposure model is relevant to initialise the correct occupancy
cases sub-class of `OccupancyCases` from [occupancy_cases.py](../gdecore/occupancy_cases.py).
The `OccupancyCases` class represents the grouping of occupancy types (as per the GEM Building
Taxonomy v3.0) into occupancy cases. Occupancy cases are broader groups of occupancy types and
are relevant to connect individual OBM buildings with aggregated exposure models, as the latter
are usually defined for particular occupancy cases. For example, the "commercial" occupancy case
of the ESRM20 exposure model (Crowley et al., 2020) comprises the "COM", "COM99" (older format
for "COM"), "COM1", "COM2", "COM3", "COM5" and "RES3" occupancy types.
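
Conceptually, an occupancy case is a mapping from a case name to the occupancy types it groups. The sketch below encodes only the "commercial" example given above; the dictionary layout and the helper `occupancy_case_of` are hypothetical and are not the interface of `OccupancyCases`.

```python
# Only the "commercial" case of ESRM20 is listed, as described in the text above.
OCCUPANCY_CASES_ESRM20 = {
    "commercial": ["COM", "COM99", "COM1", "COM2", "COM3", "COM5", "RES3"],
}


def occupancy_case_of(occupancy_type, occupancy_cases):
    """Return the occupancy case that contains occupancy_type, or None."""
    for case_name, occupancy_types in occupancy_cases.items():
        if occupancy_type in occupancy_types:
            return case_name
    return None


case_of_res3 = occupancy_case_of("RES3", OCCUPANCY_CASES_ESRM20)
```

In this example, `"RES3"` maps to `"commercial"`, and an occupancy type that belongs to no listed case returns `None`.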

## Interpreting the Exposure Entities to Run

The `gde-core` offers different ways in which the user can specify which exposure entity/ies to
run (see documentation on the [configuration file](03_Configuration.md)). The
`interpret_exposure_entities_to_run` method of the [Configuration](../gdecore/configuration.py)
class is called to interpret the input provided by the user and update the
`exposure_entities_to_run` attribute of the configuration object accordingly.

If the user specifies to run `all` exposure entities, `gde-core` retrieves the list of
3-character codes of all exposure entities associated with `aggregated_source_id` in the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).

If the user specifies a path to a `.txt` or `.csv`, `gde-core` retrieves the list of 3-character
codes of the exposure entities listed in the indicated `.txt`/`.csv` file.

If the user specifies a list with one or more names of exposure entities, `gde-core` retrieves
the 3-character codes associated with these exposure entities either from a nested structure
that the user indicates in the configuration file or by retrieving the corresponding alpha-3
[ISO 3166 country codes](https://www.iso.org/iso-3166-country-codes.html), using the
[iso3166 library](https://github.com/deactivated/python-iso3166), if `exposure_entities_code`
is set to `ISO3` in the configuration file.
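
The three cases above can be sketched as a simple dispatch. This is not the actual `interpret_exposure_entities_to_run` method. The function name, its arguments and the assumed file layout (codes separated by commas or new lines) are hypothetical, and the `name_to_code` dictionary stands in for either the nested structure of the configuration file or an ISO 3166 lookup.

```python
def interpret_entities(user_input, codes_in_database, name_to_code):
    """Return the list of 3-character codes of the exposure entities to run."""
    if user_input == "all":
        # All exposure entities associated with the aggregated source.
        return sorted(codes_in_database)
    if isinstance(user_input, str) and user_input.lower().endswith((".txt", ".csv")):
        # Assumed layout: codes separated by commas and/or new lines.
        with open(user_input) as input_file:
            content = input_file.read()
        return [code.strip() for code in content.replace("\n", ",").split(",") if code.strip()]
    # Otherwise, a list of names that need to be converted into codes.
    return [name_to_code[name] for name in user_input]


codes_from_names = interpret_entities(
    ["Greece", "Italy"], [], {"Greece": "GRC", "Italy": "ITA"}
)
```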

## Processing Each Exposure Entity and Occupancy Case

The tasks that follow are carried out for each exposure entity listed in the
`exposure_entities_to_run` attribute and each occupancy case listed in the `occupancies_to_run`
attribute of the Configuration class.

### Retrieve Data Unit IDs and Geometries

All data unit IDs associated with the exposure entity, occupancy case and aggregated source ID
are retrieved, together with their geometries, from the `data_units` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
Data units for which no geometry is available (which should not occur unless an unexpected error
occurred while running the `gde-importer`) are logged with a warning, as they cannot be
processed.

<img src="images/gde_core_algorithm_01.png" width=75%>

Fig. 4.1 Flowchart showing the processing logic at the highest level.

### Process Data Units in Parallel

Data units are processed using the `process_data_unit` method of the
[GDEProcessor](../gdecore/processor.py#L34) class. Each data unit is processed by a core, using
as many cores as specified by the user in the configuration file.
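
A minimal sketch of this pattern is shown below, assuming Python's `multiprocessing.Pool` is used to distribute the data units. Whether `gde-core` uses exactly this mechanism is not stated here. The worker `process_data_unit_stub` is a hypothetical placeholder for `GDEProcessor.process_data_unit`, and `number_cores` stands for the value read from the configuration file.

```python
from multiprocessing import Pool


def process_data_unit_stub(data_unit_id):
    """Placeholder worker: the real method queries and updates the databases."""
    return data_unit_id, f"processed {data_unit_id}"


def process_all_data_units(data_unit_ids, number_cores):
    """Distribute the data units over number_cores worker processes."""
    with Pool(processes=number_cores) as pool:
        return pool.map(process_data_unit_stub, data_unit_ids)
```

Calling `process_all_data_units(["unit_1", "unit_2"], 2)` from a script's main guard would return one result per data unit, in the order of the input list.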

Each of the subtitles that follow refers to a task carried out or called by the
`process_data_unit` method, as shown in Fig. 4.2.

<img src="images/gde_core_algorithm_02_process_data_unit_overview.png" width=75%>

Fig. 4.2 Overview of the main tasks carried out by `GDEProcessor.process_data_unit()`.

#### Retrieve Building Classes of the Data Unit

The building classes associated with the data unit ID are retrieved from the
`data_units_buildings` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).

For each building class, the proportion of buildings it represents within the data unit and the
minimum and maximum number of storeys it covers are retrieved.

It is noted that a data unit is implicitly associated with an aggregated source ID and an
occupancy case. Consequently, "building classes associated with the data unit ID" also implies
an association with the aggregated source ID and occupancy case being processed.

#### Retrieve OBM Buildings and Assign Building Classes and Probabilities to Them

As shown in Fig. 4.3, the following tasks are carried out in order to assign building classes to
all OpenBuildingMap (OBM) buildings that belong to the data unit and occupancy case being
processed:

<img src="images/gde_core_algorithm_03_OBM_buildings.png" width=75%>

Fig. 4.3 Flowchart of tasks associated with retrieving OBM buildings and assigning building
classes to them.

1. **Retrieve OBM buildings**: The `get_OBM_buildings_in_data_unit_by_occupancy_types` method of
`DatabaseQueries` retrieves from the
[OBM buildings database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmbuildings)
all OBM building parts whose centroids fall within the boundaries of the data unit and whose
occupancy type belongs to the occupancy case being processed (see 
[here](#initialise-occupancy-cases-class)).
2. **Group parts of the same relations**: Parts of a building that is defined in OpenStreetMap
by a relation are gathered together and treated as one building by the
`post_process_obm_relations` method of `GDEProcessor`. For such cases, the method gathers the
parts and transforms them into one individual building in which the `osm_id` becomes that of the
`relation_id`, the number of storeys becomes the maximum of all individual parts, and the
quadkey of the ensemble is identified.
3. **Calculate the number of OBM buildings per quadkey**: The OBM buildings in each zoom-level
18 tile, represented by its quadkey, are counted.
4. **Assign building classes to OBM buildings**: As shown in Fig. 4.4, all building classes
retrieved for the data unit are initially considered as possible for each individual OBM
building. If the number of storeys is available (from OpenStreetMap), building classes that are
incompatible with the indicated number are discarded. If the building is commercial of the type
"COM1", "COM2", "COM3", "COM5" or "RES3", only building classes associated with these types are
kept. This stems from the fact that the ESRM20 exposure model (Crowley et al., 2020) has different
sub-types of commercial classes, and the logic might need revision when other aggregated exposure
models are incorporated into the GDE model in the future. If discarding potential building classes
as per these criteria leads to no building class being compatible with the building, a warning is
logged and all initial building classes are assigned to the building. This is done because
assigning no building classes to a building leads to the impossibility of considering the
building within a damage/loss assessment.
5. **Store building classes of GDE (OBM) buildings**: At this stage, OBM buildings that have
been assigned building classes become so-called GDE buildings. Their details are stored in the
`gde_buildings` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).

<img src="images/gde_core_algorithm_04_OBM_buildings_assign_classes.png" width=75%>

Fig. 4.4 Algorithm used to assign building classes to individual OBM buildings.
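
The filtering logic of step 4 can be sketched as follows. The dictionary keys (`storeys_min`, `storeys_max`, `occupancy_subtype`, `proportion`), the class names and the function `assign_building_classes` are hypothetical. Renormalising the proportions of the surviving classes into probabilities is an assumption of this sketch, as the exact calculation is not described here.

```python
import logging

logger = logging.getLogger(__name__)

COMMERCIAL_SUBTYPES = {"COM1", "COM2", "COM3", "COM5", "RES3"}


def assign_building_classes(building, building_classes):
    """Return {class name: probability} for one OBM building."""
    candidates = list(building_classes)
    storeys = building.get("storeys")
    if storeys is not None:
        # Discard classes incompatible with the known number of storeys.
        candidates = [
            c for c in candidates if c["storeys_min"] <= storeys <= c["storeys_max"]
        ]
    if building.get("occupancy") in COMMERCIAL_SUBTYPES:
        # Keep only classes associated with this commercial sub-type.
        candidates = [
            c for c in candidates if c["occupancy_subtype"] == building["occupancy"]
        ]
    if not candidates:
        logger.warning("No compatible class for building %s", building.get("osm_id"))
        candidates = list(building_classes)  # fall back to all initial classes
    # Assumption: probabilities are the proportions renormalised to sum to one.
    total = sum(c["proportion"] for c in candidates)
    return {c["name"]: c["proportion"] / total for c in candidates}


classes = [
    {"name": "class_A", "storeys_min": 1, "storeys_max": 3,
     "occupancy_subtype": "COM1", "proportion": 0.25},
    {"name": "class_B", "storeys_min": 4, "storeys_max": 10,
     "occupancy_subtype": "COM1", "proportion": 0.5},
    {"name": "class_C", "storeys_min": 1, "storeys_max": 3,
     "occupancy_subtype": "COM2", "proportion": 0.25},
]
low_rise = assign_building_classes({"osm_id": 1, "storeys": 2, "occupancy": "COM"}, classes)
```

In this example, the two-storey building keeps `class_A` and `class_C`, while a 20-storey building would match no class and fall back to all three.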

Step number 2 should become part of
[rabotnik-obm](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/tree/master)
in the future. As explained
[here](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/blob/master/docs/general_overview.md),
`rabotnik-obm` currently processes the IDs of individual building polygons without gathering
together building parts that make up a relation in
[OpenStreetMap](https://www.openstreetmap.org) (OSM). Relations are often used in OSM to
represent complex building geometries, like vertical irregularities (e.g. a building whose plan
comprises partly 3 storeys and partly 10 storeys). If in the future
[rabotnik-obm](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/tree/master)
were to gather parts of a building relation, the method
`get_OBM_buildings_in_data_unit_by_occupancy_types` of
[DatabaseQueries](../gdecore/database_queries.py#L30) would directly return what is now being
returned by the combination of running `get_OBM_buildings_in_data_unit_by_occupancy_types`
followed by `post_process_obm_relations` of [GDEProcessor](../gdecore/processor.py#L34), and the
latter (i.e. step 2) would thus no longer be needed.
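
Steps 2 and 3 can be sketched together as shown below. This is a simplified illustration, not the actual `post_process_obm_relations` method. The dictionary keys, the IDs and the quadkeys are hypothetical, and taking the quadkey of the first part as that of the ensemble is a placeholder, since how the quadkey of the ensemble is identified is not detailed here.

```python
from collections import Counter, defaultdict


def group_relation_parts(parts):
    """Merge building parts that share a relation_id into one building each."""
    buildings = []
    parts_by_relation = defaultdict(list)
    for part in parts:
        if part["relation_id"] is None:
            buildings.append(
                {"osm_id": part["osm_id"], "storeys": part["storeys"],
                 "quadkey": part["quadkey"]}
            )
        else:
            parts_by_relation[part["relation_id"]].append(part)
    for relation_id, members in parts_by_relation.items():
        known_storeys = [m["storeys"] for m in members if m["storeys"] is not None]
        buildings.append({
            "osm_id": relation_id,  # the relation ID replaces the part IDs
            "storeys": max(known_storeys) if known_storeys else None,
            "quadkey": members[0]["quadkey"],  # placeholder assumption
        })
    return buildings


TILE_A = "120210233222032112"  # illustrative zoom-level 18 quadkeys
TILE_B = "120210233222032113"
parts = [
    {"osm_id": 1, "relation_id": None, "storeys": 2, "quadkey": TILE_A},
    {"osm_id": 2, "relation_id": 900, "storeys": 3, "quadkey": TILE_A},
    {"osm_id": 3, "relation_id": 900, "storeys": 10, "quadkey": TILE_A},
    {"osm_id": 4, "relation_id": None, "storeys": None, "quadkey": TILE_B},
]
buildings = group_relation_parts(parts)
# Step 3: number of OBM buildings per quadkey.
buildings_per_quadkey = Counter(b["quadkey"] for b in buildings)
```

The two parts of relation 900 collapse into a single 10-storey building, so `TILE_A` counts two buildings instead of three.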

Some limitations associated with the fact that `gde-core` is attempting to gather parts of the
same building relation (instead of having this done by `rabotnik-obm`) are the following:
- `gde-core` queries the
[OBM buildings database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmbuildings)
by data unit, which means that if some parts of a relation fall in one data unit and some others
fall in another, this building will be counted twice, once in each data unit. Results from the
GDE model for Greece indicate this happens just once for the whole country.
- `gde-core` queries the
[OBM buildings database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmbuildings)
by occupancy case, which means that if some parts of a relation have occupancy types that belong
to a different occupancy case (or to no occupancy case at all), they will not be drawn together.
- `rabotnik-obm` only retrieves the first relation ID it finds for each building part, but a
building might be represented in OpenStreetMap by a nested series of relations, as explained
[here](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/blob/master/docs/general_overview.md). 

Trying to solve these limitations within `gde-core` would duplicate efforts already made in
`rabotnik-obm`, which is not desirable. For example, bringing together different
occupancy types coming from parts of a relation should follow the same processing logic that is
already being used by `rabotnik-obm` to define a final occupancy type based on different types
stemming from different tags of the same building (details
[here](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/rabotnik-obm/-/blob/master/docs/occupancy.md)).
The minimum has been done within `gde-core` to minimise the impact of buildings being
represented by relation IDs (i.e. step 2 above is this minimum). Out of over 30 million
buildings processed so far (at the time of writing, end of June 2022) by `rabotnik-obm`, only
0.07% belong to a relation (these buildings belong to Greece, Italy and part of Germany).

#### Calculate Remainder Buildings in Data-Unit Tiles

As shown in Fig. 4.5, the following tasks are carried out in order to calculate the number of
remainder buildings in each data-unit tile:

<img src="images/gde_core_algorithm_05_remainder_buildings.png" width=75%>

Fig. 4.5 Algorithm used to process data-unit tiles.

1. **Retrieve data-unit tiles**: The data-unit tiles associated with the data unit being
processed are retrieved from the `data_unit_tiles` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles),
in terms of their quadkeys and the number of aggregated buildings associated with them (i.e. the
number of buildings assigned to the tile by `gde-importer` when distributing the aggregated
exposure model onto the tiles).
2. **Retrieve completeness of data-unit tiles**: The completeness status of the data-unit tiles
is retrieved from the `obm_built_area_assessments` table of the
[OBM tiles database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles)
(entries associated with `source_id=1`, which corresponds to the Global Human Settlement Layer,
GHSL; Corbane et al., 2018). As
[obmgapanalysis](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/obmgapanalysis)
does not create entries for tiles for which the built-up area is zero, if a quadkey does not
have an entry for `source_id=1`, it is assumed to be complete. If the entry exists, the
completeness status is retrieved from the database. The table below shows the interpretation of
the contents of `obm_built_area_assessments` depending on whether the GHS (`source_id=1`) and/or
the OSM (`source_id=0`) entries exist.


| GHS entry      | OSM entry      | Completeness                           |
| -------------- | -------------- | -------------------------------------- |
| Exists         | Exists         | Retrieve from source_id=1              |
| Exists         | Does not exist | Retrieve from source_id=1 (incomplete) |
| Does not exist | Exists         | OSM/0 --> infinity, assume complete    |
| Does not exist | Does not exist | 0/0 --> undefined, assume complete     |

3. **Calculate remainder buildings in data-unit tiles**: If the data-unit tile is complete, it
is assigned zero remainder buildings (and, as a consequence, the total number of buildings
comes from `OpenBuildingMap`). If the data-unit tile is incomplete, the number of aggregated
buildings in the tile is compared against the number of OBM buildings. If the former is larger
than the latter, the "remainder" buildings are calculated as the difference between the two and
assigned to the tile. If the number of OBM buildings is larger than that of aggregated
buildings, zero remainder buildings are assigned to the tile. This algorithm is shown in Fig.
4.6.

<img src="images/gde_core_algorithm_06_remainder_buildings_calculation.png" width=75%>

Fig. 4.6 Algorithm used to calculate remainder buildings.

4. **Store number of OBM and remainder buildings of the data-unit tiles**: The number of OBM and
remainder buildings in the data-unit tile are stored in the `data_unit_tiles` table of the
[GDE Tiles database](https://git.gfz-potsdam.de/dynamicexposure/globaldynamicexposure/database-gdetiles).
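
The completeness rule of step 2 and the calculation of step 3 can be condensed into two small functions. This is a sketch under stated assumptions: the function names are hypothetical, a missing `source_id=1` entry is represented by `None`, and the number of aggregated buildings is treated as a float because it results from distributing the aggregated model onto tiles.

```python
def is_tile_complete(ghs_completeness):
    """Interpret the source_id=1 entry; None means that no entry exists."""
    if ghs_completeness is None:
        # No GHS entry: the built-up area is zero, so the tile is assumed complete.
        return True
    return ghs_completeness


def remainder_buildings(aggregated_buildings, obm_buildings, complete):
    """Return the number of remainder buildings of a data-unit tile."""
    if complete:
        return 0.0
    # Incomplete tile: the remainder is the positive difference, otherwise zero.
    return max(aggregated_buildings - obm_buildings, 0.0)


remainder = remainder_buildings(10.5, 4, is_tile_complete(False))
```

With 10.5 aggregated buildings and 4 OBM buildings in an incomplete tile, the remainder is 6.5. A complete tile, or one with more OBM than aggregated buildings, gets zero.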