Commit d82cd8e9 authored by Cecilia Nievas's avatar Cecilia Nievas
Browse files

Added function generate_country_industrial_cells, its tests, test files

parent 0e9f4756
...@@ -36,7 +36,7 @@ from copy import deepcopy ...@@ -36,7 +36,7 @@ from copy import deepcopy
import numpy as np import numpy as np
import pandas as pd import pandas as pd
import geopandas as gpd import geopandas as gpd
from shapely.geometry import Polygon from shapely.geometry import Point, Polygon
def define_corners(centr_lon, centr_lat, width_EW, width_NS): def define_corners(centr_lon, centr_lat, width_EW, width_NS):
...@@ -1203,3 +1203,345 @@ def get_relative_area_range(cells_gdf): ...@@ -1203,3 +1203,345 @@ def get_relative_area_range(cells_gdf):
areas_range_relative = areas_range / areas.mean() areas_range_relative = areas_range / areas.mean()
return areas_range_relative return areas_range_relative
def generate_country_industrial_cells(
country,
col_lon,
col_lat,
width_EW,
width_NS,
id_str,
precision_points,
precision_cells,
aggr_model_pathname,
boundaries_pathname,
boundaries_type="shp",
consistency_checks=True,
autoadjust_overlap_gap=True,
in_crs="EPSG:4326",
consistency_tol_dist=0.05,
consistency_tol_area=0.05,
):
"""This function reads the input aggregated exposure model of the country (location defined
by aggr_model_pathname and country), identifies the unique points present in this input
model, and generates cell geometries around these points of widths width_EW and width_NS
(in the east-west and north-south directions, respectively). It then adjusts these
geometries so that the cells do not overlap and that consecutive cells do not have small
gaps in between them. This adjustment is necessary because it cannot be guaranteed that
the input points are spaced with sufficient accuracy. The identification of unique points
and the adjustment of the geometries is carried out according to precision levels specified
by precision_points and precision_cells.
If consistency_checks is True, four consistency checks are run:
1) Function overlap_by_full_geom_intersection() is used to verify that the created cells do
not overlap with one another. The cells pass this check if no overlaps are found (output
parameter overlap_found is "False").
2) Functions swell_cells_with_buffer(), overlap_by_full_geom_intersection() and
auto_adjust_overlaps_gaps() are used to verify that the created cells do not have gaps in
between them where there should not be any. The cells pass this check if no gaps are found
(output parameter gap_found is "False").
3) Function get_distance_centroids() is used to calculate the maximum distance between the
original points and the final centroids of the generated cells (max_dist_centr). This
maximum distance is compared against the tolerance, defined as:
max_dist_centr <= min(width_EW, width_NS)*consistency_tol_dist
The cells pass this check if max_dist_centr is smaller than the tolerance (output parameter
big_dist_diff is "False").
4) Function get_relative_area_range() is used to calcuate the variability of the areas of
the resulting cells, as [max-min]/mean. This value is compared against consistency_tol_area:
if the former is smaller, the cells pass this check (output parameter big_area_diff is
"False").
If consistency_checks is False, the consistency checks are not run and all associated output
parameters take the value "Not_Checked".
If both consistency_checks and autoadjust_overlap_gap are True, the function automatically
adjusts the geometry of the cells if overlaps or gaps are identified. Gaps between cells
that are located diagonally with one another and that have other neighbouring cells in other
directions are ignored.
If autoadjust_overlap_gap is True, overlap_found and gap_found will be False, because the
function keeps on adjusting the geometries of the cells until no overlaps or gaps are found.
The adjusted cells are then cropped as per the country's boundaries (input file defined by
boundaries_pathname). The consistency checks are run before this step because the cropping
will affect the location of the centroids of the cells and their areas.
Args:
country (str): Name of country (as in the files of the aggregated model).
col_lon (str): Name of the column in the file of the aggregated exposure model that
contains longitudes.
col_lat (str): Name of the column in the file of the aggregated exposure model that
contains latitudes.
width_EW (float): Width of the cell in the east-west direction, in degrees, >0.
width_NS (float): Width of the cell in the north-south direction, in degrees, >0.
id_str (str): First part of the string used to generate IDs of the inidividual points.
in which the aggregated exposure model is defined. E.g. "IND_".
precision_points (int): Number of decimal places to be used to determine unique points
present in the aggregated exposure model (the aggregated model
will most likely have several entries for the "same" point).
precision_cells (dictionary): Dictionary with four keys, "lon_w", "lat_s", "lon_e", and
"lat_n". Each key contains a formatting string such as
'{:.3f}', for example. It is used to round the coordinates
of the cells' corners to a certain precision to identify
those that need to be adjusted to be the exact same value.
aggr_model_pathname (str): Path where the files of the aggregated model are located.
boundaries_pathname (str): Path where the boundaries of the country are located.
boundaries_type (str): File type containing the boundaries ("shp"=Shapefile, "gpkg"=
Geopackage). Default: "shp".
consistency_checks (bool): If True, consistency checks will be run to check the quality
of the output produced. If False, they will not.
Default: True.
autoadjust_overlap_gap (bool): If True, the code will automatically adjust cells for
which an overlap or gap was found (consistency_checks
needs to be True for this check to be run). If False, no
automatic adjustment will be carried out.
in_crs (str): CRS of the data (default="EPSG:4326").
consistency_tol_dist (float): Tolerance to assess how large the maximum distance between
the original points and the centroids of the generated
cells is with respect to the width of the cells. The value
of consistency_tol_dist is multiplied by the smallest of
width_EW and width_NS to define the tolerance in degrees.
Default: 0.05 (i.e. 5% of the smallest width). Only needed
if consistency_checks is True.
consistency_tol_area (float): Tolerance to assess how large the variability of the area
of the generated cells is. The value of
consistency_tol_area is compared against the relative area
range, which is calculated as ([max-min]/mean) of all
cells. Default: 0.05. Only needed if consistency_checks is
True.
Returns:
cells_adj_bound_gdf (GeoDataFrame): GeoPandas GeoDataFrame with the cells defined around
the points at which the aggregated model is defined
for the country. The cells have already been
"cropped" around the country boundaries.
Each row is one of these points. It contains the
following columns:
- id: ID of the point, given by id_str and an
incremental integer.
- coordinates: the names of these columns are given
by col_lon and col_lat. They contain
the original coordinates as given in
the input aggregated exposure model.
- lon_w, lat_s, lon_e, lat_n: coordinates that
define the corners of
the cells.
- geometry: (Shapely) polygons of the output cells.
aggr_mod_df (DataFrame): Pandas DataFrame with the aggregated exposure model of the
country. The content is exactly the same as in the original
input file, plus two additional columns "ID_99" and "NAME_99",
both of which contain the IDs of the cells of cells_adj_gdf
associated with each entry of aggr_mod_df.
overlap_found (str): "True" if consistency_checks=True and an overlap between the
calculated cell geometries was found, "False" if consistency_checks
=True and an overlap was not found (which means that the cells pass
this consistency check), "Not_Checked" if consistency_checks=False,
i.e. the consistency checks were not run.
gap_found (str): "True" if consistency_checks=True and a gap between the calculated cell
geometries was found, "False" if consistency_checks=True and a gap was
not found (which means that the cells pass this consistency check),
"Not_Checked" if consistency_checks=False, i.e. the consistency checks
were not run.
big_dist_diff (str): "True" if consistency_checks=True and the maximum distance between
the original input points and the centroids of the calculated cells
(before the cropping as per country boundaries) is larger than the
tolerance, "False" if consistency_checks=True and the maximum
distance is smaller than the tolerance (which means that the cells
pass this consistency check), "Not_Checked" if consistency_checks=
False, i.e. the consistency checks were not run.
big_area_diff (str): "True" if consistency_checks=True and the variability of the areas
of the resulting cells before the cropping as per country
boundaries) is larger than the tolerance, "False" if
consistency_checks=True and the variability is smaller than the
tolerance (which means that the cells pass this consistency check),
"Not_Checked" if consistency_checks=False, i.e. the consistency
checks were not run.
country_id (str): ID of the country as specified in the country boundaries file given
as input.
"""
# Return the input in_df if width_EW or width_NS are not valid:
if (width_EW <= 0.0) or (width_NS <= 0):
print(
"ERROR in generate_country_industrial_cells: "
"with_EW and width_NS need to be positive numbers >=0"
)
return
# Load exposure file of the aggregated exposure model:
aggr_model_filepath = os.path.join(aggr_model_pathname, "Exposure_Ind_%s.csv" % (country))
if not os.path.isfile(aggr_model_filepath):
print("ERROR in generate_country_industrial_cells: aggregated model file not found")
return
aggr_mod_df = pd.read_csv(aggr_model_filepath, sep=",") # aggr_mod_df is a Pandas DataFrame
# Load country boundaries:
boundaries_filepath = os.path.join(
boundaries_pathname, "Adm0_%s.%s" % (country, boundaries_type)
)
if not os.path.isfile(boundaries_filepath):
print("ERROR in generate_country_industrial_cells: country boundaries file not found")
return
bounds_gdf = gpd.read_file(boundaries_filepath)
# Retrieve unique points in the exposure file, determined with a specific precision
# (points_gdf is a GeoPandas GeoDataFrame, ids_aggr is an array of strings with length
# equal to the number of rows of aggr_mod_df):
points_gdf, ids_aggr = retrieve_unique_points(
aggr_mod_df, col_lon, col_lat, id_str, precision=precision_points, in_crs=in_crs
)
# Add the IDs of the unique points to the DataFrame of the input aggregated model:
aggr_mod_df["ID_99"] = ids_aggr
aggr_mod_df["NAME_99"] = ids_aggr
# Define cells around the unique points (cells_gdf is a GeoPandas GeoDataFrame):
cells_gdf = define_cells_in_dataframe(
points_gdf, col_lon, col_lat, width_EW, width_NS, in_crs=in_crs
)
# Create dictionaries with all coordinates of the corners of the cells:
coords_dict = create_dict_all_coordinates(cells_gdf, precision_cells)
if len(coords_dict.keys()) < 1:
print(
"ERROR in generate_country_industrial_cells: create_dict_all_coordinates "
"has returned an empty dictionary"
)
return
# Create dictionary with unique values of the coordinates of the corners of the cells, to
# a certain precision (given by precision_cells when generating coords_dict):
coords_uq = create_dict_unique_coordinates(coords_dict)
# Adjust all the coordinates of the corners of the cells (coords_dict) by taking the
# average value of all instances of that coordinate that "should be the same", as identified
# in coords_uq:
coords_dict_adj = adjust_coords(coords_dict, coords_uq) # coords_dict_adj is a dictionary
# Generate final output with adjusted cell geometries (cells_adj_gdf is a
# GeoPandas GeoDataFrame):
cells_adj_gdf = build_adjusted_cells_dataframe(cells_gdf, coords_dict_adj)
# Run consistency checks if requested:
if not consistency_checks:
overlap_found = "Not_Checked"
gap_found = "Not_Checked"
big_dist_diff = "Not_Checked"
big_area_diff = "Not_Checked"
else:
# Consistency check 1: the output geometries should not overlap
num_overlaps = 999 # Initialise variable for the while loop to run at least once
while num_overlaps > 0:
intsect_gdf = overlap_by_full_geom_intersection(cells_adj_gdf, "id_1", "id_2")
num_overlaps = intsect_gdf.shape[0]
if num_overlaps > 0:
if not autoadjust_overlap_gap: # The user specified not to automatically adjust
overlap_found = "True"
break
# Automatically adjust the overlaps
cells_adj_gdf, _ = auto_adjust_overlaps_gaps(
cells_adj_gdf,
intsect_gdf,
col_lon,
col_lat,
width_EW,
width_NS,
"overlap",
"id_1",
"id_2",
)
else: # No overlaps found, the while loop will end
overlap_found = "False"
# Consistency check 2:
gaps_found = True # Initialise variable for the while loop to run at least once
while gaps_found:
# Expand the cells by 25% of their dimensions in all directions:
cells_adj_offset_gdf = swell_cells_with_buffer(
cells_adj_gdf, 0.25 * width_EW, 0.25 * width_NS
)
# Identify intersections in the expanded version:
intsect_gdf = overlap_by_full_geom_intersection(
cells_adj_offset_gdf, "id_1", "id_2"
)
# Automatically adjust the potential gaps and store it in auxiliary variable to
# determine if there are gaps or not:
cells_adj_aux_gdf, gaps_found = auto_adjust_overlaps_gaps(
cells_adj_gdf,
intsect_gdf,
col_lon,
col_lat,
width_EW,
width_NS,
"gap",
"id_1",
"id_2",
)
if not autoadjust_overlap_gap: # The user specified not to automatically adjust
if gaps_found:
gap_found = "True"
else:
gap_found = "False"
break # This will be the result of the check, the while loop will not run again
# Adopt the auxiliary adjusted version:
cells_adj_gdf = deepcopy(cells_adj_aux_gdf)
gap_found = str(gaps_found)
# Consistency check 3: maximum distance between original points and final centroids:
max_dist_centr = get_distance_centroids(cells_adj_gdf, col_lon, col_lat)
# Compare the maximum distance against the tolerance:
if max_dist_centr > min(width_EW, width_NS) * consistency_tol_dist:
big_dist_diff = "True"
else:
big_dist_diff = "False"
# Consistency check 4: stability/variability of area of resulting cells:
rel_area_range = get_relative_area_range(cells_gdf)
# Compare the relative area range ([max-min]/mean) against the tolerance:
if rel_area_range > consistency_tol_area:
big_area_diff = "True"
else:
big_area_diff = "False"
# Intersect cells with admnistrative boundary of country:
cells_adj_bound_gdf = gpd.overlay(cells_adj_gdf, bounds_gdf, how="intersection")
# Eliminate columns that are not useful:
if "ID_0" in cells_adj_bound_gdf.columns:
country_id = str(cells_adj_bound_gdf["ID_0"].values[0])
del cells_adj_bound_gdf["ID_0"]
else:
country_id = "UNK"
if "NAME_0" in cells_adj_bound_gdf.columns:
del cells_adj_bound_gdf["NAME_0"]
if "id_2" in cells_adj_bound_gdf.columns:
del cells_adj_bound_gdf["id_2"]
if "id_1" in cells_adj_bound_gdf.columns:
cells_adj_bound_gdf = cells_adj_bound_gdf.rename(columns={"id_1": "id"})
# Update columns "lon_w", "lat_s", "lon_e", and "lat_n" of cells_adj_bound_gdf:
for row in range(0, cells_adj_bound_gdf.shape[0]):
geometry_row = cells_adj_bound_gdf["geometry"].values[row]
cells_adj_bound_gdf["lon_w"].values[row] = geometry_row.bounds[0]
cells_adj_bound_gdf["lon_e"].values[row] = geometry_row.bounds[2]
cells_adj_bound_gdf["lat_s"].values[row] = geometry_row.bounds[1]
cells_adj_bound_gdf["lat_n"].values[row] = geometry_row.bounds[3]
return (
cells_adj_bound_gdf,
aggr_mod_df,
overlap_found,
gap_found,
big_dist_diff,
big_area_diff,
country_id,
)
GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
\ No newline at end of file
GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]
GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
\ No newline at end of file
GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]
GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
\ No newline at end of file
GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]
LONGITUDE,LATITUDE,TAXONOMY,BUILDINGS,DWELLINGS,OCCUPANTS_PER_ASSET,COST_PER_AREA_EUR,ID_1,NAME_1,ID_2,NAME_2
12,50,BdgClassA,50,50,500,2000,1,State 1,11,Municipality 1
12,50,BdgClassB,30,30,400,2000,1,State 1,11,Municipality 1
12,50,BdgClassC,20,20,300,2000,1,State 1,11,Municipality 1
12,50.0082,BdgClassA,40,40,400,2000,1,State 1,11,Municipality 1
12,50.0082,BdgClassB,40,40,533,2000,1,State 1,11,Municipality 1
12,50.0082,BdgClassC,30,30,450,2000,1,State 1,11,Municipality 1
12.0082,50,BdgClassA,15,15,150,2000,1,State 1,11,Municipality 1
12.0082,50,BdgClassD,30,30,500,2000,1,State 1,11,Municipality 1
12.0081,50.0082,BdgClassA,20,20,200,2000,2,State 2,25,Municipality 5
12.0081,50.0082,BdgClassB,15,15,200,2000,2,State 2,25,Municipality 5
12.0081,50.0082,BdgClassC,25,25,375,2000,2,State 2,25,Municipality 5
12.0081,50.0082,BdgClassD,30,30,500,2000,2,State 2,25,Municipality 5
12.0162,50.0162,BdgClassB,45,45,600,2000,2,State 2,27,Municipality 7
12.0162,50.0162,BdgClassD,60,60,1000,2000,2,State 2,27,Municipality 7
\ No newline at end of file
LONGITUDE,LATITUDE,TAXONOMY,BUILDINGS,DWELLINGS,OCCUPANTS_PER_ASSET,COST_PER_AREA_EUR,ID_1,NAME_1,ID_2,NAME_2,ID_99,NAME_99
12,50,BdgClassA,50,50,500,2000,1,State 1,11,Municipality 1,CA_IND_0,CA_IND_0
12,50,BdgClassB,30,30,400,2000,1,State 1,11,Municipality 1,CA_IND_0,CA_IND_0
12,50,BdgClassC,20,20,300,2000,1,State 1,11,Municipality 1,CA_IND_0,CA_IND_0
12,50.0082,BdgClassA,40,40,400,2000,1,State 1,11,Municipality 1,CA_IND_1,CA_IND_1
12,50.0082,BdgClassB,40,40,533,2000,1,State 1,11,Municipality 1,CA_IND_1,CA_IND_1
12,50.0082,BdgClassC,30,30,450,2000,1,State 1,11,Municipality 1,CA_IND_1,CA_IND_1
12.0082,50,BdgClassA,15,15,150,2000,1,State 1,11,Municipality 1,CA_IND_3,CA_IND_3
12.0082,50,BdgClassD,30,30,500,2000,1,State 1,11,Municipality 1,CA_IND_3,CA_IND_3
12.0081,50.0082,BdgClassA,20,20,200,2000,2,State 2,25,Municipality 5,CA_IND_2,CA_IND_2
12.0081,50.0082,BdgClassB,15,15,200,2000,2,State 2,25,Municipality 5,CA_IND_2,CA_IND_2
12.0081,50.0082,BdgClassC,25,25,375,2000,2,State 2,25,Municipality 5,CA_IND_2,CA_IND_2
12.0081,50.0082,BdgClassD,30,30,500,2000,2,State 2,25,Municipality 5,CA_IND_2,CA_IND_2
12.0162,50.0162,BdgClassB,45,45,600,2000,2,State 2,27,Municipality 7,CA_IND_4,CA_IND_4
12.0162,50.0162,BdgClassD,60,60,1000,2000,2,State 2,27,Municipality 7,CA_IND_4,CA_IND_4
\ No newline at end of file
LONGITUDE,LATITUDE,TAXONOMY,BUILDINGS,DWELLINGS,OCCUPANTS_PER_ASSET,COST_PER_AREA_EUR,ID_1,NAME_1,ID_2,NAME_2
12,50,BdgClassA,50,50,500,2000,1,State 1,11,Municipality 1
12,50,BdgClassB,30,30,400,2000,1,State 1,11,Municipality 1
12,50,BdgClassC,20,20,300,2000,1,State 1,11,Municipality 1
12.001,50.0082,BdgClassA,40,40,400,2000,1,State 1,11,Municipality 1
12.001,50.0082,BdgClassB,40,40,533,2000,1,State 1,11,Municipality 1
12.001,50.0082,BdgClassC,30,30,450,2000,1,State 1,11,Municipality 1
12.0082,50,BdgClassA,15,15,150,2000,1,State 1,11,Municipality 1
12.0082,50,BdgClassD,30,30,500,2000,1,State 1,11,Municipality 1
12.0081,50.0082,BdgClassA,20,20,200,2000,2,State 2,25,Municipality 5
12.0081,50.0082,BdgClassB,15,15,200,2000,2,State 2,25,Municipality 5
12.0081,50.0082,BdgClassC,25,25,375,2000,2,State 2,25,Municipality 5
12.0081,50.0082,BdgClassD,30,30,500,2000,2,State 2,25,Municipality 5
12.0162,50.0162,BdgClassB,45,45,600,2000,2,State 2,27,Municipality 7
12.0162,50.0162,BdgClassD,60,60,1000,2000,2,State 2,27,Municipality 7
\ No newline at end of file
LONGITUDE,LATITUDE,TAXONOMY,BUILDINGS,DWELLINGS,OCCUPANTS_PER_ASSET,COST_PER_AREA_EUR,ID_1,NAME_1,ID_2,NAME_2,ID_99,NAME_99
12,50,BdgClassA,50,50,500,2000,1,State 1,11,Municipality 1,CB_IND_0,CB_IND_0
12,50,BdgClassB,30,30,400,2000,1,State 1,11,Municipality 1,CB_IND_0,CB_IND_0
12,50,BdgClassC,20,20,300,2000,1,State 1,11,Municipality 1,CB_IND_0,CB_IND_0
12.001,50.0082,BdgClassA,40,40,400,2000,1,State 1,11,Municipality 1,CB_IND_1,CB_IND_1
12.001,50.0082,BdgClassB,40,40,533,2000,1,State 1,11,Municipality 1,CB_IND_1,CB_IND_1
12.001,50.0082,BdgClassC,30,30,450,2000,1,State 1,11,Municipality 1,CB_IND_1,CB_IND_1
12.0082,50,BdgClassA,15,15,150,2000,1,State 1,11,Municipality 1,CB_IND_3,CB_IND_3
12.0082,50,BdgClassD,30,30,500,2000,1,State 1,11,Municipality 1,CB_IND_3,CB_IND_3
12.0081,50.0082,BdgClassA,20,20,200,2000,2,State 2,25,Municipality 5,CB_IND_2,CB_IND_2
12.0081,50.0082,BdgClassB,15,15,200,2000,2,State 2,25,Municipality 5,CB_IND_2,CB_IND_2
12.0081,50.0082,BdgClassC,25,25,375,2000,2,State 2,25,Municipality 5,CB_IND_2,CB_IND_2
12.0081,50.0082,BdgClassD,30,30,500,2000,2,State 2,25,Municipality 5,CB_IND_2,CB_IND_2
12.0162,50.0162,BdgClassB,45,45,600,2000,2,State 2,27,Municipality 7,CB_IND_4,CB_IND_4
12.0162,50.0162,BdgClassD,60,60,1000,2000,2,State 2,27,Municipality 7,CB_IND_4,CB_IND_4
\ No newline at end of file
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment