Database File of Manually Classified Sentinel-2A Data
This repository contains a database of manually labeled Sentinel-2A spectra that were used in the paper: Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666.
The data itself and some associated metadata are stored in an HDF5 file which can be downloaded here:
The tables dates, spectra and classes are aligned along their first dimension, so the class assigned to each spectrum can be retrieved by its index. The association between class_ids and class_names is given in additional attributes.
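Because dates, spectra and classes share the pixel index, selecting all spectra of one class reduces to a boolean mask. The following is a minimal sketch, assuming the tables have already been read into NumPy arrays with the pixel index on the first axis (spectra of shape (N, 13)). The helper select_class and the toy class ids and names are hypothetical and do not reproduce the file's actual values.

```python
import numpy as np


def select_class(spectra, classes, class_ids, class_names, wanted):
    """Return the rows of `spectra` whose class name equals `wanted`.

    Hypothetical helper: assumes `spectra` has shape (N, 13) and that
    `classes` holds N labels aligned with it along the pixel axis.
    """
    # Build a name -> id lookup from the two parallel tables.
    # HDF5 string tables are often read back as bytes, so decode them.
    name_to_id = {
        (name.decode() if isinstance(name, bytes) else str(name)): int(cid)
        for cid, name in zip(class_ids, class_names)
    }
    labels = np.ravel(classes)  # accept (N,), (1, N) or (N, 1) layouts
    return spectra[labels == name_to_id[wanted]]


# Toy data: 4 pixels with 13 bands each (made-up values and ids).
spectra = np.arange(4 * 13, dtype=float).reshape(4, 13)
classes = np.array([[10, 20, 10, 30]])  # shape (1, N), like a 1xN table
class_ids = [10, 20, 30]
class_names = ["clear", "water", "shadow"]

water = select_class(spectra, classes, class_ids, class_names, "water")
print(water.shape)  # (1, 13)
```

Flattening the label table with np.ravel keeps the helper indifferent to whether classes is stored as a row or a column.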
The figure below shows the layout of the file and some sample data:
How the data was produced
1. Data Collection
Sentinel-2 data is freely available for download from the Scientific Data Hub. Each product is a 290 km wide image divided into 100 km × 100 km granules in UTM/WGS84 projection. The product name encodes the sensing and creation dates as well as the relative orbit number of the image.
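The fields encoded in a product name can be extracted with a regular expression. The sketch below assumes the older (2015-era) naming layout used by the products in this dataset, for example S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_V20151211T084342_20151211T084342.SAFE, and treats the two timestamps after "V" as the sensing start and stop. Later Sentinel-2 products use a different, shorter naming convention and will not match this pattern. The function name parse_product_name and the group names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout of the older Sentinel-2 L1C product names:
# mission_fileclass_PRD_MSIL1C_site_creation_Rorbit_Vstart_stop[.SAFE]
PRODUCT_RE = re.compile(
    r"^(?P<mission>S2[AB])_"
    r"(?P<file_class>[A-Z0-9]{4})_"
    r"(?P<product_type>PRD_MSIL[0-9][A-Z])_"
    r"(?P<site>[A-Z0-9_]{4})_"
    r"(?P<created>\d{8}T\d{6})_"
    r"R(?P<orbit>\d{3})_"
    r"V(?P<sensing_start>\d{8}T\d{6})_"
    r"(?P<sensing_stop>\d{8}T\d{6})"
    r"(?:\.SAFE)?$"
)
TIME_FMT = "%Y%m%dT%H%M%S"


def parse_product_name(name):
    """Split a product name into mission, relative orbit and timestamps."""
    match = PRODUCT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised product name: {name}")
    g = match.groupdict()
    return {
        "mission": g["mission"],
        "relative_orbit": int(g["orbit"]),
        "created": datetime.strptime(g["created"], TIME_FMT),
        "sensing_start": datetime.strptime(g["sensing_start"], TIME_FMT),
        "sensing_stop": datetime.strptime(g["sensing_stop"], TIME_FMT),
    }


info = parse_product_name(
    "S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_V20151211T084342_20151211T084342.SAFE"
)
print(info["relative_orbit"], info["sensing_start"], info["created"])
# 21 2015-12-11 08:43:42 2015-12-11 15:33:17
```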
The following image shows the division into granules of the product S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_V20151211T084342_20151211T084342.SAFE:
To obtain a varied and spatially representative dataset, the downloaded images cover a wide variety of regions all over the world.
2. Data Classification
Using different spectral tools, pixels within each granule are manually selected and classified into one of the following six classes:

Class | Coverage |
---|---|
cloud | opaque clouds |
cirrus | cirrus and vapor trails |
snow | snow and ice |
shadow | shadows from clouds, cirrus, mountains, buildings, etc. |
water | lakes, rivers, seas |
clear-sky | all remaining surfaces: crops, mountains, urban areas, etc. |

Spectral tools include false-color composites, image enhancements and graphical visualization of spectra. Our aim is for each class to be highly heterogeneous while keeping the number of pixels balanced across classes.
The figure below shows the benefit of false-color composites for distinguishing snow. In this RGB display of the Atlas Mountains in Morocco, bands 12/7/3 are used; snow pixels appear blue, whereas cloud pixels appear pinkish orange.
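A false-color composite simply assigns three bands to the red, green and blue channels, usually after a contrast stretch. The README does not state which enhancement was used, so the sketch below uses a common 2 to 98 percentile linear stretch as an assumption. It also assumes the bands have already been resampled to a common grid (B3 is sampled at 10 m, while B7 and B12 are sampled at 20 m). The helper names and the random input arrays are hypothetical.

```python
import numpy as np


def stretch(band, low=2.0, high=98.0):
    """Linearly rescale one band to [0, 1] between two percentiles, clipping outliers."""
    lo, hi = np.percentile(band, [low, high])
    if hi <= lo:  # constant band: avoid division by zero
        return np.zeros_like(band, dtype=float)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)


def false_color(red_band, green_band, blue_band):
    """Stack three co-registered bands into an (H, W, 3) RGB array in [0, 1]."""
    return np.dstack([stretch(b.astype(float)) for b in (red_band, green_band, blue_band)])


# Hypothetical reflectance arrays, already on the same 64x64 grid.
rng = np.random.default_rng(0)
b12, b07, b03 = (rng.random((64, 64)) for _ in range(3))
rgb = false_color(b12, b07, b03)  # bands 12/7/3 -> R/G/B
print(rgb.shape)  # (64, 64, 3)
```

Stretching each band independently maximizes contrast per channel; a shared stretch across bands would instead preserve relative brightness between them.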
The next figure illustrates the pixel classification. The Fiji coastline is displayed in two different composites: (a) bands 4/3/2 (natural color) and (b) bands 8a/3/2 (false color). Colored polygons mark four different classes: cyan, yellow, dark blue and green stand for water, shadow, cloud and clear-sky pixels, respectively.
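The README does not describe how the drawn polygons were turned into pixel selections. As a hypothetical sketch, an even-odd ray-casting test can mark every pixel whose center lies inside a polygon, and the resulting mask then picks out the 13-band spectra to label with that polygon's class. The function polygon_mask, the square polygon and the toy scene are illustrative only.

```python
import numpy as np


def polygon_mask(vertices, height, width):
    """Boolean (height, width) mask of pixels whose centers lie inside a polygon.

    `vertices` is a list of (x, y) pairs in pixel coordinates
    (x = column, y = row). Uses the even-odd ray-casting rule.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5  # pixel centers
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge cross the horizontal ray through each pixel center?
        crosses = (y1 > py) != (y2 > py)
        # x of the crossing point; horizontal edges never cross, so their
        # division by zero is harmless and its warning is silenced.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Toggle pixels lying to the left of the crossing point.
        inside ^= crosses & (px < x_cross)
    return inside


# Hypothetical analyst polygon on a toy 10x10 scene with 13 bands.
square = [(2, 2), (6, 2), (6, 6), (2, 6)]
mask = polygon_mask(square, 10, 10)
scene = np.zeros((10, 10, 13))
selected = scene[mask]  # (n_pixels, 13) spectra for one class
print(mask.sum(), selected.shape)  # 16 (16, 13)
```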
The following graph shows four different spectral profiles from a Sentinel-2 image.
Dataset
Our dataset consists of a total of N = 5,647,725 pixels. Pixel information is stored in several tables in the HDF5 file.

Relative to the Sentinel-2 spatial and spectral resolutions:
- band associates a band position with its label
- further band descriptions can be found in bandwidth_nm, central_wavelength_nm and spatial_sampling_m

Relative to the classes:
- classes (1xN table) contains the class id assigned to each pixel in the dataset
- class_ids gives the id associated with each class listed in class_names

Relative to the spectra:
- spectra (13xN table) contains the spectral values of each pixel, one per Sentinel-2 MSI spectral band (the instrument samples 13 bands)

Relative to the image metadata:
- latitude and longitude give the pixel coordinates
- each pixel lies in a granule, identified by granule_id, and several granules belong to one image, identified by product_id
- all pixels from the same product share the sensing date (date), four sun and viewing geometry angles (sun_azimuth_angle, sun_zenith_angle, viewing_azimuth_angle, viewing_zenith_angle) and the geographical location (continent and country)
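Once the file is downloaded, it is worth checking its actual layout before indexing, since the description above calls spectra a 13xN table while the file layout section aligns tables along their first dimension; depending on the reader, the array may come back as (13, N) or (N, 13). The sketch below, assuming h5py and NumPy are installed, prints every dataset and attribute in the file (class_ids and class_names may be stored either way), normalizes the spectra orientation and counts pixels per class to check the class balance. All helper names are hypothetical, and the toy ids and names do not reproduce the file's values.

```python
import numpy as np


def list_contents(path):
    """Print every dataset (with shape and dtype) and attribute in an HDF5 file."""
    import h5py  # imported lazily so the other helpers work without h5py

    with h5py.File(path, "r") as f:
        for key, value in f.attrs.items():
            print(f"attr  /{key}: {value}")

        def show(name, obj):
            if isinstance(obj, h5py.Dataset):
                print(f"data  /{name}: shape={obj.shape}, dtype={obj.dtype}")
            for key, value in obj.attrs.items():
                print(f"attr  /{name}@{key}: {value}")

        f.visititems(show)


def as_pixels_by_bands(spectra, n_bands=13):
    """Return spectra as (N, n_bands), transposing an (n_bands, N) table.

    A square (13, 13) array is ambiguous and is returned unchanged.
    """
    spectra = np.asarray(spectra)
    if spectra.shape[0] == n_bands and spectra.shape[1] != n_bands:
        return spectra.T
    return spectra


def class_balance(classes, class_ids, class_names):
    """Count pixels per class name from the classes/class_ids/class_names tables."""
    labels = np.ravel(classes)
    counts = {}
    for cid, name in zip(class_ids, class_names):
        key = name.decode() if isinstance(name, bytes) else str(name)
        counts[key] = int(np.count_nonzero(labels == cid))
    return counts


# Toy tables standing in for the real ones (made-up values).
print(class_balance(np.array([[1, 1, 2, 3, 3, 3]]), [1, 2, 3], ["cloud", "water", "snow"]))
print(as_pixels_by_bands(np.zeros((13, 6))).shape)  # (6, 13)
# list_contents("path/to/downloaded_file.h5")  # hypothetical path
```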