Niklas Bohn authored

Database File of Manually classified Sentinel-2A Data

This repository contains a database of manually labeled Sentinel-2A spectra that were used in the paper: Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666.

The data itself and some associated metadata are stored in an HDF5 file which can be downloaded here:

https://git.gfz-potsdam.de/EnMAP/sentinel2_manual_classification_clouds/blob/master/20170710_s2_manual_classification_data.h5

The first dimensions of dates, spectra, and classes are aligned, so that the class selected for each spectrum can be retrieved by using the same index. The association between class_ids and class_names is given in additional attributes.

The figure below shows the layout of the file and some sample data:

hdf5 file
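
As a minimal sketch of how the aligned arrays might be read with h5py: the names `classes`, `class_ids` and `class_names` come from this description, but whether the id/name association is stored as datasets or as attributes is an assumption, so the hypothetical helper `read_entry` checks both places. The function names and the `path` argument are illustrative only.

```python
def decode(value):
    """Return HDF5 byte strings as str; leave everything else untouched."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def build_class_lookup(class_ids, class_names):
    """Map each class id to its human-readable class name."""
    return {int(i): decode(n) for i, n in zip(class_ids, class_names)}


def labels_for(classes, lookup):
    """Translate per-pixel class ids into class names, keeping the order."""
    return [lookup[int(c)] for c in classes]


def read_entry(h5_file, name):
    """Fetch `name` from the root datasets or, failing that, root attributes.

    Assumption: the exact storage location is not stated in this README,
    so both are tried.
    """
    if name in h5_file:
        return h5_file[name][...]
    return h5_file.attrs[name]


def load_labels(path, n=5):
    """Return the class names of the first `n` pixels in the file at `path`."""
    import h5py  # third-party: pip install h5py
    import numpy as np  # installed together with h5py

    with h5py.File(path, "r") as f:
        lookup = build_class_lookup(
            np.ravel(read_entry(f, "class_ids")),
            np.ravel(read_entry(f, "class_names")),
        )
        # ravel() copes with both a flat (N,) and a (1, N) layout
        first = f["classes"][...].ravel()[:n]
    return labels_for(first, lookup)
```

Calling `load_labels` with the path of the downloaded file should then give the class names of the first few pixels, provided the names above match the actual file layout.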

How the data was produced

1. Data Collection

Open-access Sentinel-2 data is available for download on the Scientific Data Hub. Each product consists of a 290 km wide image divided into 100 km granules in UTM/WGS84 projection. The product name includes the sensing and creation dates, as well as the relative orbit number of the image.

The following image shows the division into granules of the product S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_V20151211T084342_20151211T084342.SAFE:

granules
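
The fields encoded in a product name can be pulled apart with a regular expression. The sketch below assumes the older Sentinel-2 naming layout seen in the example above (mission, file class, file type, site, creation date, `R` plus relative orbit, `V` plus sensing start and stop). The field names in the returned dictionary are illustrative, and the function `parse_product_name` is hypothetical, not part of this repository.

```python
import re
from datetime import datetime

# Assumed layout, matching the example product name in this README.
PRODUCT_NAME = re.compile(
    r"^(?P<mission>S2[A-Z])_(?P<file_class>[A-Z0-9]{4})_"
    r"(?P<file_type>[A-Z0-9_]{10})_(?P<site>[A-Z0-9_]{4})_"
    r"(?P<creation>\d{8}T\d{6})_R(?P<orbit>\d{3})_"
    r"V(?P<start>\d{8}T\d{6})_(?P<stop>\d{8}T\d{6})(?:\.SAFE)?$"
)

TIMESTAMP = "%Y%m%dT%H%M%S"


def parse_product_name(name):
    """Split a product name into mission, dates and relative orbit number."""
    match = PRODUCT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised product name: {name!r}")
    return {
        "mission": match["mission"],
        "creation": datetime.strptime(match["creation"], TIMESTAMP),
        "relative_orbit": int(match["orbit"]),
        "sensing_start": datetime.strptime(match["start"], TIMESTAMP),
        "sensing_stop": datetime.strptime(match["stop"], TIMESTAMP),
    }


info = parse_product_name(
    "S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_"
    "V20151211T084342_20151211T084342.SAFE"
)
print(info["relative_orbit"], info["sensing_start"].date())
```

Names that do not follow this layout raise a `ValueError` rather than being guessed at.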

To create a spatially varied and representative dataset, the downloaded images cover a wide variety of regions from all over the world.

2. Data Classification

By means of different spectral tools, granule pixels are selected and classified into one of the following six classes:

Class       Coverage
cloud       opaque clouds
cirrus      cirrus and vapor trails
snow        snow and ice
shadow      shadows from clouds, cirrus, mountains, buildings, etc.
water       lakes, rivers, seas
clear-sky   remaining: crops, mountains, urban, etc.

Spectral tools include false-color composites, image enhancements and graphical visualization of spectra. Our aim is to create highly heterogeneous classes with a balanced number of pixels.

The figure below illustrates the benefit of false-color composites for distinguishing snow. For this RGB display of the Atlas Mountains in Morocco, bands 12/7/3 are selected. Snow pixels appear blue, whereas cloud pixels appear pinkish orange.

marokko
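
A composite of this kind can be sketched with NumPy by stretching three bands and stacking them as red, green and blue. This is only an illustration, not the tool used to produce the figure: the dictionary keys `B12`, `B07` and `B03`, the 2nd to 98th percentile stretch, and the assumption that all three bands were already resampled to a common pixel grid are choices made here.

```python
import numpy as np


def stretch(band, low=2.0, high=98.0):
    """Linearly rescale a band to [0, 1] between two percentiles."""
    band = np.asarray(band, dtype=float)
    lo, hi = np.percentile(band, [low, high])
    if hi <= lo:
        # Constant band: nothing to stretch
        return np.zeros_like(band)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)


def false_color(bands, rgb=("B12", "B07", "B03")):
    """Stack three stretched bands into an (H, W, 3) RGB array.

    `bands` maps hypothetical band labels to 2-D arrays of equal shape.
    """
    return np.dstack([stretch(bands[name]) for name in rgb])


# Synthetic data standing in for one granule
rng = np.random.default_rng(0)
demo = {name: rng.random((4, 4)) for name in ("B12", "B07", "B03")}
composite = false_color(demo)
print(composite.shape)
```

The resulting array can be passed to an image viewer such as matplotlib's `imshow`, which accepts float RGB values in the range 0 to 1.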

The next figure illustrates the pixel classification. The Fiji coastline is displayed in two different composites: (a) bands 4/3/2 and (b) bands 8a/3/2. Colored polygons represent four different classes: cyan, yellow, dark blue and green stand for water, shadow, cloud and clear-sky pixels, respectively.

fiji

The following graph shows four different spectral profiles from a Sentinel-2 image.

Dataset

Our dataset consists of a total of N=5647725 pixels. Pixel information is stored in different tables in the HDF5 file.

Relative to the Sentinel-2 spatial and spectral resolutions:

  • band associates a band position with its label
  • further band descriptions can be found in bandwidth_nm, central_wavelength_nm and spatial_sampling_m

Relative to the classes:

  • classes (1xN table) holds the class id assigned to each pixel in the dataset
  • class_ids lists the id associated with each class named in class_names

Relative to the spectra:

  • spectra (13xN table) collects the spectral values of each pixel; the Sentinel-2 instrument samples 13 spectral bands

Relative to the image metadata:

  • latitude and longitude hold the pixel coordinates
  • each pixel is located in a granule identified by granule_id; several granules make up an image associated with a product_id
  • pixels from the same product share the sensing date (date), the four sun and viewing angles (sun_azimuth_angle, sun_zenith_angle, viewing_azimuth_angle, viewing_zenith_angle) and the geographical location (continent and country)
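
Once `spectra` and `classes` are loaded as arrays, the class balance and a mean spectrum per class can be computed as in the sketch below. It assumes only the shapes described above. Because the stored orientation (13xN or Nx13) is not certain from this description, the hypothetical helper `pixels_first` accepts either; the synthetic arrays and class ids are made up for the demonstration.

```python
import numpy as np


def pixels_first(spectra, n_bands=13):
    """Return spectra as an (N, n_bands) array, whichever way it is stored."""
    spectra = np.asarray(spectra)
    if spectra.ndim != 2:
        raise ValueError("expected a 2-D spectra table")
    if spectra.shape[1] == n_bands:
        return spectra
    if spectra.shape[0] == n_bands:
        return spectra.T
    raise ValueError(f"no axis of length {n_bands} in shape {spectra.shape}")


def class_counts(classes):
    """Count how many pixels carry each class id."""
    ids, counts = np.unique(np.ravel(classes), return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}


def mean_spectrum_per_class(spectra, classes, n_bands=13):
    """Average the spectra of all pixels sharing a class id."""
    spectra = pixels_first(spectra, n_bands)
    classes = np.ravel(classes)
    return {int(i): spectra[classes == i].mean(axis=0)
            for i in np.unique(classes)}


# Synthetic stand-in: 4 pixels, 13 bands, two made-up class ids
demo_spectra = np.arange(52, dtype=float).reshape(13, 4)
demo_classes = np.array([[1, 1, 2, 2]])
print(class_counts(demo_classes))
print(mean_spectrum_per_class(demo_spectra, demo_classes)[1][:3])
```

On the real file, the counts give a quick check of how balanced the six classes are.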