Commit eff87c88 authored by Nicolas Garcia Ospina's avatar Nicolas Garcia Ospina
Browse files

Included grid generator scripts

parent de53b79c
Pipeline #28250 passed with stage
in 3 minutes and 32 seconds
......@@ -9,6 +9,8 @@ cache:
- venv/
before_script:
- apt-get update -y
- apt-get install -y python3-pip proj-bin libgeos-dev gdal-bin libgdal-dev
- python3 -V
- pip3 install virtualenv
- virtualenv venv
......
# quadtreegrid
This repository contains code related to the creation and management of quadtree grids.
\ No newline at end of file
This repository contains code related to the creation and writing of quadtree grids.
# quadtreegrid.py examples
The `quadtreegrid.py` script allows the creation of a grid with all tiles that contain
a bounding box.
To launch this script, an input extent is always required, it may be inserted manually,
or it can be extracted from a shapefile.
## -in_bounds
For instance, a specific bounding box is introduced with the -in_bounds argument followed
by the West,South,East,North limits separated with single spaces:
```
python quadtreegrid.py -in_bounds 1 1 5 5
```
This would create a file called `output_quadkeys.txt` with all the `zoom level 18` tiles that
contain that bounding box.
## -in_file
The bounding box can be also extracted from a `shapefile` for example, this can be done with the
argument `-in_file` followed by the relative or absolute path of the target file. For example:
```
python quadtreegrid.py -in_file /path/to/my_shapefile.shp
```
This would create a file called `output_quadkeys.txt` with all the `zoom level 18` tiles that
contain the bounding box of the entire shapefile.
## Optional arguments
Two optional arguments to include are: 1) `-output` which allows to set a filename or filepath to
the resulting text file. If the resulting filename has the `csv` extension, a header with the name
`quadkey` will be included. The default for this is `output_quadkeys.txt`. 2) The `-zoom` argument
sets the desired zoom level of the output tiles. Its default value is `18`.
```
python quadtreegrid.py -in_bounds 1 1 5 5 -output ./custom_filepath.csv -zoom 9
```
This would create a file called `custom_filepath.csv` with all the `zoom level 9` tiles that
contain the input bounding box and the header `quadkey`.
# quadtreegrid_polyfill.py examples
The `quadtreegrid_polyfill.py` script allows for the identification of the quadkeys of all tiles that
contain an irregular polygon or multipolygon buffered by 0.05 deg, this way tiles in the borders
won't be accidentally skipped. It is an optimization of the `Babel("bing").polyfill()` function from
[Babelgrid](https://pypi.org/project/babelgrid/). The current version of this script uses the country
borders from [GADM 3.6](https://gadm.org/) dataset. This dataset must be downloaded separately from
[this_link](https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_levels_shp.zip), and the shapefile `gadm36_0.shp`
must be placed in the `data` directory of this repository (`quadtreegrid/data`). Currently, this is hard
coded but may become more flexible in a future version.
## -country
To launch this script, an input territory is always required, it is inserted manually with the `-country`
argument followed by the name of the country in english. Whenever the country name has more than one
word, they must be separated by an underscore `_`. Some examples may be:
```
python quadtreegrid_polyfill.py -country Italy
```
```
python quadtreegrid_polyfill.py -country Puerto_Rico
```
```
python quadtreegrid_polyfill.py -country Democratic_Republic_of_the_Congo
```
## Optional arguments
The `-zoom` and `-output` arguments work in the same way as in `quadtreegrid.py`. One characteristic of
the `polyfill()` function from `Babelgrid` is that it cannot allocate quadkeys if their size is larger than
a single polygon (e.g. a very small country, or an island with low zoom level quadkeys). This is partly solved
with the buffer applied to all polygons and using an adequate zoom level. An example that is likely to fail is:
```
python quadtreegrid_polyfill.py -country Vatican_City -zoom 8
```
To optimize the memory usage, and the program speed, a `multiprocessing` framework has been incorporated.
For it, the target geometry is gridded to produce smaller batches to be processed, this is done with the
`-max_batches` argument, whose default value is 100. If the amount of batches is increased, the processing
speed for each one decreases. Keep in mind that their size also depends on the size of the original geometry
(country).
The other argument that controls the `multiprocessing` behaviour is `-n_cores`, with it the amount of parallel
jobs to be executed is established. This depends on the CPU thread count. The default value is 1, meaning that
it will not parallelize the execution. A complete setup to launch this program could be the following:
```
python quadtreegrid_polyfill.py -country Italy -max_batches 10000 -n_cores 12 -output Italy_quadkeys.txt
```
This will launch a program to find the zoom level 18 quadkeys in Italy with 12 parallel jobs where 10000
batches will be distributed and finally write a text file called `Italy_quadkeys.txt`.
......@@ -19,18 +19,78 @@
import logging
import sys
import geopandas as gpd
import mercantile
import argparse
# Add a logger printing error, warning, info and debug messages to the screen
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.info("Launching quadtreegrid")
parser = argparse.ArgumentParser(
description="""Get all quadkeys from given bounds
or get the bounds of a vector file"""
)
parser.add_argument(
"-in_bounds",
type=float,
nargs="+",
help="Bounds in order West South East North, separated by spaces",
)
parser.add_argument(
"-in_file", type=str, help="Shapefile with a given geometry to extract a bounding box"
)
parser.add_argument(
"-zoom",
nargs="?",
const=1,
type=int,
default=18,
help="Set zoom level of output tiles, default: 18",
)
parser.add_argument(
"-output",
type=str,
nargs="?",
const=1,
default="./output_quadkeys.txt",
help="""Set the output file path and extension as
text or CSV file path, default: ./output_quadkeys.txt""",
)
args = parser.parse_args()
def main():
if args.in_bounds:
sw_lon, sw_lat, ne_lon, ne_lat = args.in_bounds
assert sw_lon < ne_lon, "West longitude is greater than east longitude"
assert sw_lat < ne_lat, "South latitude is greater than north latitude"
elif args.in_file:
outline = gpd.read_file(args.in_file).to_crs("epsg:4326")
geometry = outline.envelope.values[0]
sw_lon, sw_lat, ne_lon, ne_lat = geometry.bounds
else:
raise ValueError("No input geometry found")
# Example logging output
logger.info("quadtreegrid has started")
logger.info(
"Zoom level {} quadkeys will be created for the bounding "
"box {} {} {} {}.".format(args.zoom, sw_lon, sw_lat, ne_lon, ne_lat)
)
tiles_generator = mercantile.tiles(sw_lon, sw_lat, ne_lon, ne_lat, args.zoom)
logger.info("Writing to {}".format(args.output))
outfile = open(args.output, "w")
if args.output[-3:] == "csv":
outfile.write("quadkey\n")
for tile in tiles_generator:
outfile.write(mercantile.quadkey(tile) + "\n")
outfile.close()
# Example logging output
logger.info("quadtreegrid is closing")
# Leave the program
sys.exit()
......
#!/usr/bin/env python3
# Copyright (c) 2021:
# Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero
# General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.
import logging
import sys
import geopandas as gpd
import argparse
import mercantile
from multiprocessing import Pool
import numpy as np
from shapely.geometry import Polygon
import os
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.info("Launching quadtreegrid")
parser = argparse.ArgumentParser(
description="""Get all quadkeys from given polygon shapefile"""
)
parser.add_argument("-country", type=str, help="Country name in GADM 3.6")
parser.add_argument(
"-zoom",
nargs="?",
const=1,
type=int,
default=18,
help="Set zoom level of output tiles, default: 18",
)
parser.add_argument(
"-max_batches",
nargs="?",
const=1,
type=int,
default=100,
help="""Set the maximum amount of batches for multiprocessing""",
)
parser.add_argument(
"-n_cores",
nargs="?",
const=1,
type=int,
default=1,
help="""Set the maximum amount of jobs to spawn in multiprocessing""",
)
parser.add_argument(
"-output",
type=str,
nargs="?",
const=1,
default="./output_quadkeys.txt",
help="""Set the output file path and extension as
text or CSV file path, default: ./output_quadkeys.txt""",
)
args = parser.parse_args()
# Get a country polygon by name from the GADM dataset and do small buffer
country_name = args.country.replace("_", " ")
world = gpd.read_file(
os.path.join("..", os.path.dirname(__file__), "data", "gadm36_0.shp")
).to_crs("epsg:4326")
logger.info(country_name)
country = world[world["NAME_0"] == country_name]
geom = country.unary_union
del world
country = gpd.GeoDataFrame(crs="epsg:4326", geometry=[geom])
# Build the batches from griding the input polygon
sw_lon, sw_lat, ne_lon, ne_lat = country.geometry.values[0].bounds
logger.info("Creating batches")
width = abs(ne_lon - sw_lon) / np.sqrt(args.max_batches)
height = abs(ne_lat - sw_lat) / np.sqrt(args.max_batches)
cols = list(np.arange(int(np.floor(sw_lon)) - 0.5, int(np.ceil(ne_lon)) + 0.5, width))
rows = list(np.arange(int(np.floor(sw_lat)) - 0.5, int(np.ceil(ne_lat)) + 0.5, height))
rows.reverse()
polygons = []
for x in cols:
for y in rows:
polygons.append(
Polygon(
[
(x, y),
(x + width, y),
(x + width, y - height),
(x, y - height),
]
)
)
grid = gpd.GeoDataFrame({"geometry": polygons}, crs="epsg:4326")
gridded_polygon = gpd.overlay(grid, country, how="intersection")
batch_list = gridded_polygon.geometry.tolist()
logger.info(
"{} batches to be processed with {} parallel jobs".format(len(batch_list), args.n_cores)
)
def get_tiles_from_polygon(in_geometry):
"""Wrapper function that extracts quadkeys inside a given polygon.
Args:
in_geometry (shapely.geometry.polygon.Polygon): Polygon to fill with quadkeys.
Returns:
tiles_ids (list): List of quadkeys as strings.
"""
in_geometry = in_geometry.buffer(0.01)
tiles = []
if in_geometry.geom_type == "Polygon":
sw_lon, sw_lat, ne_lon, ne_lat = in_geometry.bounds
tiles_generator = mercantile.tiles(sw_lon, sw_lat, ne_lon, ne_lat, args.zoom)
tiles.extend([mercantile.quadkey(tile) for tile in tiles_generator])
elif in_geometry.geom_type == "MultiPolygon":
geometries = list(in_geometry)
for geometry in geometries:
sw_lon, sw_lat, ne_lon, ne_lat = geometry.bounds
tiles_generator = mercantile.tiles(sw_lon, sw_lat, ne_lon, ne_lat, args.zoom)
tiles.extend([mercantile.quadkey(tile) for tile in tiles_generator])
else:
raise IOError("Input is not a valid geometry.")
if len(tiles) > 0:
logger.info("Found {} tiles in batch".format(len(tiles)))
return tiles
else:
logger.info("No quadkeys found in batch")
return tiles
def main():
logger.info("create pool")
p = Pool(processes=args.n_cores)
logger.info("execute pool")
data = p.map(get_tiles_from_polygon, batch_list)
logger.info("closing pool")
p.close()
p.join()
# Remove possible duplicated quadkeys
flat_list_ids = list(set([item for sublist in data for item in sublist]))
if len(flat_list_ids) > 0:
logger.info("Writing {} quadkeys to {}".format(len(flat_list_ids), args.output))
outfile = open(args.output, "w")
if args.output[-3:] == "csv":
outfile.write("quadkey\n")
for element in flat_list_ids:
outfile.write(element + "\n")
outfile.close()
else:
logger.info("No output file will be produced")
if __name__ == "__main__":
main()
......@@ -8,10 +8,18 @@ linters_require = ["black>=20.8b1", "pylint", "flake8"]
setup(
name="quadtreegrid",
version="0.0.1",
description="Quadtree grid creation for OpenBuildingMap",
description="Quadtree grid creation",
keywords="OpenBuildingMap, quadtree, grid",
author="Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ",
install_requires=["numpy"],
install_requires=[
"pyproj==3.0.0.post1",
"rtree",
"shapely",
"mercantile",
"geopandas",
"numpy",
"babelgrid",
],
extras_require={
"tests": tests_require,
"linters": linters_require,
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment