Commit 79f71100 authored by Cecilia Nievas

Updated documentation

parent 557d9273
......@@ -193,7 +193,7 @@ def run_this_file(config_dict):
print(' '+ error_message)
log.append(error_message)
print('\n')
compl_cells[i]==3
compl_cells[i]= 3
cell_summary_dict['Completeness']= 999
if compl_cells[i] == 1 or compl_cells[i] == 5 or compl_cells[i] == 6: # cell is complete (1) or water (5) or empty (6), no additional OQ input files need to be written
leftover_num_bdgs= 0.0
......
......@@ -23,7 +23,11 @@ Section 2.6: Seismic Hazard and Risk Dynamics
SERA_mapping_admin_units_to_cells_add_GHS_from_OBM_tiles
========================================================
This code adds the surface of built area per tile according to the Global Human Settlement (GHS)
layer. It retrieves the corresponding value from the OBM Tiles database. The values are stored
in the PSQL database (field `ghs_km2`), and are thus allocated to intersections of cells and
administrative units proportionally to the areas of the intersections with respect to that of
the complete cell.
"""
import sys
......
......@@ -16,7 +16,7 @@ The overall procedure can be grouped into three main stages:
3. The combination of both sources of data.
The spacing of the grid used by the present prototype code is 10 arc-seconds, though the final code will work with a map tiles approach, handled through a Quadtree principle. In the present 10-arcsec grid, the cell ID starts from the North-West corner of the world, moves East by row, and finishes at the South-East corner of the world. First cell ID is 0, last cell ID is 8,398,079,999 (total number of cells is 8,398,080,000). There are 64,800 rows and 129,600 columns of cells.
The grid used by the present prototype code is composed of zoom-level 18 quadtiles (tiles with quadkeys) defined in EPSG:3857 projection. It is noted that previous releases of this code used a 10 arc-second grid instead. The words `tile` and `cell` are used interchangeably herein, and so are `cell ID` and `quadkey`.
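As an illustration of what a zoom-level 18 quadkey is, the sketch below converts a longitude/latitude pair into a quadkey. It assumes the widely used Web Mercator tile convention (as in the Bing Maps tile system); the function name `lonlat_to_quadkey` is hypothetical and is not taken from this repository, whose own tools in `GDE_TOOLS_world_grid.py` may be implemented differently.

```python
import math

ZOOM_LEVEL = 18  # zoom level stated in the text above


def lonlat_to_quadkey(lon, lat, zoom=ZOOM_LEVEL):
    """Return the quadkey of the tile containing (lon, lat), given in degrees.

    Hypothetical helper: assumes the common Web Mercator tiling convention.
    """
    # Clip latitude to the range covered by Web Mercator.
    lat = max(min(lat, 85.05112878), -85.05112878)
    n = 2 ** zoom  # number of tiles per side at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    sin_lat = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    # Keep the indices inside the grid (lon = 180 would otherwise overflow).
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    # Interleave the bits of x and y, most significant bit first.
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)
```

Under this convention a quadkey has one digit (0 to 3) per zoom level, so every cell ID of the grid described above would be an 18-character string.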
The first stage consists in going one by one through the relevant administrative units of each country, determining the grid cells associated with each unit, and distributing the total number of buildings indicated by the SERA exposure model across those grid cells, according to a certain criterion ("distribution method"), such as population count or built-up area estimated from the processing of remote-sensing imagery. The proportion or distribution of building classes (structural types) is also retrieved from SERA, as well as the parameters of relevance for each building class, such as the number of people per dwelling, number of dwellings per building, cost per area, etc. All this information is stored as HDF5 files.
......@@ -32,7 +32,7 @@ All scripts that are not tools require input parameters that are read from a con
# Copyright and Copyleft
Copyright (C) 2020
Copyright (C) 2020-2021
Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
......
......@@ -18,14 +18,12 @@ The order in which the scripts in the present repository need to be run to produ
2. Run `OBM_assign_cell_ids_and_adm_ids_to_footprints.py`
3. Run `SERA_create_HDF5_metadata.py`
4. Run `SERA_mapping_admin_units_to_cells.py`
5. Run `SERA_mapping_admin_units_to_cells_add_GHS_from_HDF5.py` (if GHS criterion desired)
6. Run `SERA_mapping_admin_units_to_cells_add_GPW.py` (if GPW criterion desired)
7. Run `SERA_mapping_admin_units_to_cells_add_Sat.py` (if Sat or Sat_mod criterion desired)
8. Run `SERA_distributing_exposure_to_cells.py` with the desired distribution method.
9. If the OpenQuake input files for the SERA model distributed onto a grid are desired (i.e. not GDE, just SERA), run `SERA_create_OQ_input_files.py` with the desired distribution method.
10. If a CSV summarising the number of buildings, dwellings, people and costs by cell according to the SERA model is desired (i.e. not GDE, just SERA), run `SERA_create_visual_output_of_grid_model_full_files.py` with the desired distribution method.
11. Run `OBM_buildings_per_cell.py` with the desired distribution method.
12. Run `GDE_gather_SERA_and_OBM.py` with the desired distribution method. The output is:
5. Run `SERA_mapping_admin_units_to_cells_add_GHS_from_OBM_tiles.py`
6. Run `SERA_distributing_exposure_to_cells.py` with the desired distribution method.
7. If the OpenQuake input files for the SERA model distributed onto a grid are desired (i.e. not GDE, just SERA), run `SERA_create_OQ_input_files.py` with the desired distribution method.
8. If a CSV summarising the number of buildings, dwellings, people and costs by cell according to the SERA model is desired (i.e. not GDE, just SERA), run `SERA_create_visual_output_of_grid_model_full_files.py` with the desired distribution method.
9. Run `OBM_buildings_per_cell.py` with the desired distribution method.
10. Run `GDE_gather_SERA_and_OBM.py` with the desired distribution method. The output is:
- a series of CSV files that serve as input for damage/risk calculations to be run in OpenQuake (https://github.com/gem/oq-engine);
- a CSV file that summarises results per cell and contains the geometry of the cells so that it can all be visualised with a GIS;
- a CSV file that summarises results per administrative unit and contains the geometry of the administrative boundaries so that it can all be visualised with a GIS;
......@@ -33,19 +31,19 @@ The order in which the scripts in the present repository need to be run to produ
## Testing Scripts
- The scripts `SERA_testing_rebuilding_exposure_from_cells_alternative_01.py`, `SERA_testing_rebuilding_exposure_from_cells_alternative_02.py` and `SERA_testing_rebuilding_exposure_from_cells_alternative_03.py` can be run after step 8 above. They compare the SERA-on-a-grid model against the original files of the SERA model.
- The scripts `SERA_testing_rebuilding_exposure_from_cells_alternative_01.py`, `SERA_testing_rebuilding_exposure_from_cells_alternative_02.py` and `SERA_testing_rebuilding_exposure_from_cells_alternative_03.py` can be run after step 6 above. They compare the SERA-on-a-grid model against the original files of the SERA model.
- The script `SERA_testing_compare_visual_output_vs_OQ_input_files.py` can be run after step 10 above to compare the number of buildings, people and cost per cell reported in the OpenQuake input file (generated from the grid) and the visual output CSV.
- The script `SERA_testing_compare_visual_output_vs_OQ_input_files.py` can be run after step 8 above to compare the number of buildings, people and cost per cell reported in the OpenQuake input file (generated from the grid) and the visual output CSV.
- The script `SERA_create_outputs_QGIS_for_checking.py` can be run after step 7 above to create a summary of the parameters mapped (GHS, GPW, Sat, etc) in CSV format to be read with QGIS, enabling a visual check of the results.
- The script `SERA_create_outputs_QGIS_for_checking.py` can be run after step 5 above to create a summary of the parameters mapped (GHS, area, etc) in CSV format to be read with QGIS, enabling a visual check of the results.
- The script `SERA_testing_mapping_admin_units_to_cells_qualitycontrol.py` can be run after step 4 above to check the areas of the cells mapped for the administrative units for which step 3 was run.
- The script `GDE_check_consistency.py` can be run after step 12 above. It carries out different consistency checks on the resulting GDE model (see detailed description of this script).
- The script `GDE_check_consistency.py` can be run after step 10 above. It carries out different consistency checks on the resulting GDE model (see detailed description of this script).
- The script `GDE_check_OQ_input_files.py` can be run after step 12 above. It prints to screen some summary values of the files and checks that the asset ID values are all unique.
- The script `GDE_check_OQ_input_files.py` can be run after step 10 above. It prints to screen some summary values of the files and checks that the asset ID values are all unique.
- The script `GDE_check_tiles_vs_visual_CSVs.py` can be run after step 12 above. It reads the visual CSV output by cell and the corresponding GDE tiles HDF5 files and compares the number of buildings, cost and number of people in each cell according to each of the two. An output CSV file collects the discrepancies found, if any.
- The script `GDE_check_tiles_vs_visual_CSVs.py` can be run after step 10 above. It reads the visual CSV output by cell and the corresponding GDE tiles HDF5 files and compares the number of buildings, cost and number of people in each cell according to each of the two. An output CSV file collects the discrepancies found, if any.
## Other Scripts
......@@ -60,6 +58,10 @@ The order in which the scripts in the present repository need to be run to produ
- `SERA_exploration_longest_strings.py`
- `SERA_exploration_summarise_columns_in_full_CSV_files.py`
- The following scripts were used in previous releases of this prototype code. They are now deprecated but kept in the repository nevertheless:
- `SERA_mapping_admin_units_to_cells_add_GHS_from_HDF5.py`
- `SERA_mapping_admin_units_to_cells_add_GPW.py`
- `SERA_mapping_admin_units_to_cells_add_Sat.py`
# Pre-Requisites / Initial Assumptions
......
......@@ -8,7 +8,8 @@ The configuration file is organised into sub-sections, some of which are common
- `Available Results`: used by a few scripts that work together on results from several different runs (e.g. using different methods to distribute the SERA model to a grid).
- `OBM Database`: name of the database, schema, table and user to access the OBM buildings.
- `Tiles Database`: name of the database, schema, table and user to access the tiles/cells database table.
- `Admin Units Database`name of the database, schema, table and user to access the administrative units database table.
- `Admin Units Database`: name of the database, schema, table and user to access the administrative units database table.
- `OBM Tiles`: name of the host, database, user and tables containing data on built-up areas and completeness (as well as the source ID of the built-up area) to obtain these data from the [OBM Tiles database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles).
- `Cells to Process`: the list of cells to process can be defined by means of several methods: by country, by administrative unit ID of a country, with a bounding box, specifying a number of random cells to select from a country, or by means of an arbitrary list of cell IDs. All these parameters are specified in this sub-section.
- `Ocuppancy String Groups`: mapping of occupancy strings to occupancy categories (e.g. RES1, RES2, etc., are "Res", COM1, COM11, etc., are "Com", etc.)
......
......@@ -83,7 +83,22 @@ This code determines the mapping between the administrative units used in SERA a
It goes per country and, within each country, per administrative unit. All three occupancy cases (Res, Com, Ind) are considered together because the code groups together the cases for which the administrative level at which the exposure models are defined is the same (otherwise there would be a lot of repeated processing). This code adds and/or updates entries to the tiles database. The fields that are written by this code are: `cell_id`, `country`, `occupancy`, `adm_level`, `adm_id`, `area` and `geom`. Each entry of the table is a cell-admin unit intersection for a particular occupancy type.
# SERA_mapping_admin_units_to_cells_add_GHS_from_HDF5.py
# SERA_mapping_admin_units_to_cells_add_GHS_from_OBM_tiles.py
## Configurable parameters:
No specific ones.
## What the code does:
This code adds the surface of built area per tile according to the Global Human Settlement (GHS) layer. It retrieves the corresponding value from the [OBM Tiles database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles). The original GHS layer is a raster file in Mollweide projection that uses a non-standard reference system.
The values are stored in the PSQL database (field `ghs_km2`), and are thus allocated to intersections of cells and administrative units proportionally to the areas of the intersections with respect to that of the complete cell.
The method for selecting the cells to process and associated parameters need to be specified in the configuration file under `Cells to Process`.
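The proportional allocation described above can be sketched as follows. This is a minimal illustration, not the code of the script: the function name and the dictionary-based inputs are hypothetical, and the only rule taken from the text is that each intersection receives the cell's `ghs_km2` value scaled by the ratio of the intersection area to the area of the complete cell.

```python
def allocate_ghs_to_intersections(ghs_km2_cell, intersection_areas, cell_area):
    """Split a cell's built-up area among its cell-admin unit intersections.

    ghs_km2_cell: built-up area of the complete cell (km2).
    intersection_areas: hypothetical mapping of admin unit ID -> intersection area.
    cell_area: area of the complete cell, in the same units as the intersections.
    """
    if cell_area <= 0.0:
        raise ValueError("cell_area must be positive")
    return {
        adm_id: ghs_km2_cell * area / cell_area
        for adm_id, area in intersection_areas.items()
    }


# Example: a cell split 60/40 between two administrative units.
shares = allocate_ghs_to_intersections(0.02, {"A": 0.006, "B": 0.004}, 0.01)
```

Because the denominator is the area of the complete cell, the shares add up to the cell value only when the intersections cover the whole cell.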
# SERA_mapping_admin_units_to_cells_add_GHS_from_HDF5.py (DEPRECATED)
## Configurable parameters:
......@@ -100,7 +115,7 @@ The input to this code is an HDF5 file with a 10-arcsec grid that is aligned wit
The method for selecting the cells to process and associated parameters need to be specified in the configuration file under `Cells to Process`.
# SERA_mapping_admin_units_to_cells_add_GPW.py
# SERA_mapping_admin_units_to_cells_add_GPW.py (DEPRECATED)
## Configurable parameters:
......@@ -118,7 +133,7 @@ The Gridded Population of the World (GPW) v4.0 dataset:
Center for International Earth Science Information Network-CIESIN-Columbia University (2016) Gridded Population of the World, Version 4 (GPWv4). NASA Socioeconomic Data and Applications Center, Palisades. http://dx.doi.org/10.7927/H4NP22DQ
# SERA_mapping_admin_units_to_cells_add_Sat.py
# SERA_mapping_admin_units_to_cells_add_Sat.py (DEPRECATED)
## Configurable parameters:
......@@ -259,10 +274,12 @@ When the number of storeys is not available (=UNK), the proportions of SERA buil
The parameters that need to be specified under the `GDE_gather_SERA_and_OBM` section of the configuration file are:
- version_of_SERA_metadata = string that defines the version of the SERA model. The code will look for the file "Europe_SERA_metadata_v_"+version_of_SERA_metadata+".hdf5".
- sera_disaggregation_to_consider = area, gpw_2015_pop, ghs, sat_27f or sat_27f_model. Select the parameter to use to distribute the SERA model to the grid.
- sera_disaggregation_to_consider = area, gpw_2015_pop (deprecated), ghs, sat_27f (deprecated) or sat_27f_model (deprecated). Select the parameter to use to distribute the SERA model to the grid.
- read_completeness = read completeness from CSV ("csv") or the [OBM Tiles database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles) ("obm_tiles").
- print_screen_during_run = if True, messages will be printed while running the code (useful for debugging).
- occupancy_cases = Res, Com, Ind. Occupancy cases to process.
## What the code does:
This code goes by cell (list of cells as specified in the configuration file under `Cells to Process`), combining the OBM buildings with the expected buildings from the SERA model distributed to the cells according to the selected criterion (`sera_disaggregation_to_consider`).
......@@ -284,7 +301,7 @@ The output is:
- a CSV file that summarises results per administrative unit and contains the geometry of the administrative boundaries so that it can all be visualised with a GIS;
- a series of HDF5 files (one per cell) that contain the final GDE model (the "GDE tiles").
The completeness status of the cells is retrieved from a CSV file. The code then goes by cell and, within each cell, by occupancy case (Res, Com, Ind, Oth). A dictionary is created and filled for each cell containing the final values of the cell, which are dumped in the by-cell visual output CSV file incrementally (each row corresponds to a cell and is written at the end of the loop dealing with that cell). Another dictionary is created outside of the main loop and is used to store information per administrative unit, which can only be dumped to an output file once the code has finished running all of the cells involved (by-admin unit visual output CSV). The administrative units are not predefined, they are automatically retrieved from the cell information. Apart from these two visual output CSV files, the code generates CSV exposure input files in the OpenQuake (OQ) format.
The completeness status of the cells is retrieved either from a CSV file or from the [OBM Tiles database](https://git.gfz-potsdam.de/dynamicexposure/openbuildingmap/database-obmtiles). The code then goes by cell and, within each cell, by occupancy case (Res, Com, Ind, Oth). A dictionary is created and filled for each cell containing the final values of the cell, which are dumped in the by-cell visual output CSV file incrementally (each row corresponds to a cell and is written at the end of the loop dealing with that cell). Another dictionary is created outside of the main loop and is used to store information per administrative unit, which can only be dumped to an output file once the code has finished running all of the cells involved (by-admin unit visual output CSV). The administrative units are not predefined, they are automatically retrieved from the cell information. Apart from these two visual output CSV files, the code generates CSV exposure input files in the OpenQuake (OQ) format.
The code starts by retrieving the OBM buildings associated with a cell and occupancy case from the OBM HDF5 files. These OBM buildings include those that can be classified according to SERA building classes and those that cannot (due to, for example, their centroid falling outside of an administrative unit where the SERA model is defined). If there are OBM buildings with SERA classes, these are appended to the corresponding OQ input CSV file. OBM buildings without SERA classes cannot be appended to the OQ input file because no value or number of people can be assigned to them.
......@@ -296,9 +313,9 @@ The function `cn_gral.disaggregate_OBM_by_class_and_adm_unit_main` is then calle
These cases should not occur, because right now the only possible classification of OBM buildings comes from SERA. This means that if the OBM DataFrame has buildings, their classes should be SERA classes and their country_adminIDs should match those of SERA too. Other classifications might be incorporated into the model in the future. The matter of cells intersected by boundaries between countries for which different external aggregated models are used has not yet been explicitly considered, though having coded the functions as generic as possible should make it relatively straightforward.
Before moving on to further calculations, the code checks if the cell has been assigned a completeness code equal to 5, which means “irrelevant (water)”, but there are OBM buildings in the cell. If this occurs, the cell is later on treated as incomplete and a completeness value of 999 is assigned to it in the by-cell visual output file.
Before moving on to further calculations, the code checks if the cell has been assigned a completeness code equal to 5, which means “irrelevant (water)”, or 6, which means "empty", but there are OBM buildings in the cell. If this occurs, the cell is later on treated as incomplete and a completeness value of 999 is assigned to it in the by-cell visual output file.
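The check described above can be sketched as follows. The function and constant names are hypothetical. The codes 5 (water) and 6 (empty) and the flag 999 come from the text, while the internal value 3 is the one assigned in the code hunk of this commit; reading it as "incomplete" is an assumption.

```python
WATER, EMPTY = 5, 6  # completeness codes named in the text
TREATED_AS_INCOMPLETE = 3  # value assigned in the code hunk (assumed to mean "incomplete")
FLAG_IN_VISUAL_OUTPUT = 999  # value written to the by-cell visual output


def resolve_completeness(code, num_obm_buildings):
    """Return (code used for calculations, code reported in the by-cell CSV)."""
    if code in (WATER, EMPTY) and num_obm_buildings > 0:
        # Water or empty cells should not contain buildings: flag the conflict.
        return TREATED_AS_INCOMPLETE, FLAG_IN_VISUAL_OUTPUT
    return code, code
```

Returning both values keeps the code used in the calculations separate from the one reported in the output file, which is the distinction the paragraph above draws.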
If the cell is complete (completeness code =1), the building stock of the cell is taken as the OBM buildings only. If the cell is incomplete, a number of “left-over” buildings is calculated as the difference between the number of SERA buildings and the number of OBM buildings. If the number of OBM buildings is larger than that of SERA buildings, the number of left-over buildings is zero. These calculations are carried out per administrative unit within the cell, hence the need to classify both OBM and SERA data per administrative unit. Only the number of left-over buildings is calculated in this way, not the number of dwellings, people or costs, as the totals of these parameters are not directly proportional and depend on the distribution of building classes. The number of left-over buildings are assigned the SERA distribution of building classes, irrespective of the building classes of the OBM buildings. This means, for example, that if according to SERA there should be 100 buildings in the cell-administrative unit, of which 25 are made of wood, 60 are reinforced concrete and 15 are steel, and according to the narrowing down of classes (based on the number of storeys), there are 20 steel buildings, 10 reinforced concrete ones and 5 wood ones, the number of left-over buildings is 100 – (20+10+5) = 65, of which 16.25 will be assigned the wood class, 39.0 will be assigned the reinforced concrete class and 9.75 will be assigned the steel class. As a result, the total number of buildings per class in the cell will be (5+16.25=) 21.25 wood, (10+39=) 49 reinforced concrete and (20+9.75=) 29.75 steel ones. This decision clearly has the potential to change the proportion of building classes with respect to that indicated in the SERA model.
If the cell is complete (completeness code = 1), water (completeness code = 5) or empty (completeness code = 6), the building stock of the cell is taken as the OBM buildings only. If the cell is incomplete, a number of “left-over” buildings is calculated as the difference between the number of SERA buildings and the number of OBM buildings. If the number of OBM buildings is larger than that of SERA buildings, the number of left-over buildings is zero. These calculations are carried out per administrative unit within the cell, hence the need to classify both OBM and SERA data per administrative unit. Only the number of left-over buildings is calculated in this way, not the number of dwellings, people or costs, as the totals of these parameters are not directly proportional and depend on the distribution of building classes. The left-over buildings are assigned the SERA distribution of building classes, irrespective of the building classes of the OBM buildings. This means, for example, that if according to SERA there should be 100 buildings in the cell-administrative unit, of which 25 are made of wood, 60 are reinforced concrete and 15 are steel, and according to the narrowing down of classes (based on the number of storeys), there are 20 steel buildings, 10 reinforced concrete ones and 5 wood ones, the number of left-over buildings is 100 – (20+10+5) = 65, of which 16.25 will be assigned the wood class, 39.0 will be assigned the reinforced concrete class and 9.75 will be assigned the steel class. As a result, the total number of buildings per class in the cell will be (5+16.25=) 21.25 wood, (10+39=) 49 reinforced concrete and (20+9.75=) 29.75 steel ones. This decision clearly has the potential to change the proportion of building classes with respect to that indicated in the SERA model.
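The worked example above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic for one cell-administrative unit: the function name and the dictionary inputs are hypothetical and do not correspond to functions of the repository.

```python
def leftover_buildings_by_class(sera_by_class, obm_by_class):
    """Distribute left-over buildings using the SERA proportions of classes."""
    sera_total = sum(sera_by_class.values())
    obm_total = sum(obm_by_class.values())
    # Left-over buildings cannot be negative: OBM may exceed SERA.
    leftover = max(sera_total - obm_total, 0.0)
    if sera_total == 0:
        return {cls: 0.0 for cls in sera_by_class}
    # The OBM classes play no role in how the left-over buildings are split.
    return {cls: leftover * n / sera_total for cls, n in sera_by_class.items()}


# Numbers from the example in the text.
sera = {"wood": 25, "reinforced concrete": 60, "steel": 15}
obm = {"wood": 5, "reinforced concrete": 10, "steel": 20}

leftover = leftover_buildings_by_class(sera, obm)
totals = {cls: obm.get(cls, 0) + leftover[cls] for cls in sera}
```

With these inputs, `leftover` holds 16.25 wood, 39.0 reinforced concrete and 9.75 steel buildings, and `totals` holds 21.25, 49.0 and 29.75 respectively, matching the figures quoted in the text.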
If there are any left-over buildings, these are appended to the corresponding OQ input CSV file. The total number of buildings, dwellings, people and cost are then calculated as the summation of the OBM buildings and the left-over buildings. If the number of OBM buildings with SERA classes is different from the total number of OBM buildings, then the total number of buildings reflects both OBM building with and without SERA classes, but the numbers of dwellings and people as well as the costs include only the OBM buildings with classes, as these values cannot be calculated if building classes are not assigned. Summary results are appended to the by-cell visual output CSV file.
......
......@@ -17,15 +17,7 @@ Tools used by the GDE code to access/query/write to the PSQL databases of tiles
# GDE_TOOLS_world_grid.py
This code allows to generate the 10-arcsec world grid over which the Global Dynamic Exposure model is generated in this version of the code. Future versions of GDE will use zoom level 18 Quadtiles instead.
The grid is conceptually defined in the following way:
- The grid spacing is 10 arc-seconds.
- The grid runs from -180.0 through +180.0 in longitude.
- The grid runs from -90.0 through +90.0 in latitude.
- The top-left-most cell (NW) is cell number 0.
- The cell id increases from this first cell to the east, by "row".
- At the end of each row, the cell id "jumps" to the first (westmost) cell of the next row.
This code generates the grid of zoom-level 18 quadtiles over which the Global Dynamic Exposure model is generated in this version of the code. Previous versions of GDE used 10-arcsec cells instead. The names and logic of the functions have been preserved so that they are equivalent to those used in the past with the 10-arcsec cells.
# GDE_TOOLS_read_SERA.py
......@@ -33,7 +25,7 @@ The grid is conceptually defined in the following way:
Tools used to read the original files of the SERA exposure model and write the SERA HDF5 cell files and the SERA HDF5 buildings files (when running `SERA_distributing_exposure_to_cells.py`).
# GDE_TOOLS_GPW.py
# GDE_TOOLS_GPW.py (DEPRECATED)
Tools to load the population and density grids of Gridded Population of the World (GPW) v4.0. The input HDF5 files read by this code have been previously parsed from the original GPW files.
......