Big Data for the Environment
A new era in global environmental monitoring has begun. One of its driving forces is the European earth observation programme, whose Sentinel satellite fleet is supplying an unprecedented wealth of measurements: continuously, comprehensively and free of charge. By the end of 2017, Sentinel-1, Sentinel-2 and Sentinel-3 alone will be recording a daily data volume of some 20 terabytes (20,000 gigabytes). This massive flow of data requires new approaches to data access, processing and analysis. On behalf of the Bavarian Ministry of Economic Affairs and Media, Energy and Technology, EOC has developed and tested the required technologies together with industrial partners.
Formerly, users had to download and evaluate acquisitions individually. Now the mass of available data is processed directly at the source: at the reception facilities and the storage systems. In the OPUS research project, user algorithms are brought to the data, whether on computer clusters or in the Cloud. Data reception, archiving and processing are optimally linked for maximum speed. With the help of OPUS, only customized information products reach the user instead of raw data. This avoids the transfer of large amounts of data and makes it unnecessary to set up in-house computing facilities.
As a test, EOC has now produced a satellite image mosaic of Germany from almost 1500 Sentinel-1 data sets using a fully automated processing chain developed as part of the OPUS project. For the so-called TimeScan product the data are condensed not only spatially but also temporally. Each pixel is calculated from hundreds of individual scenes. Since the Sentinel-1 radar distinguishes finely between differences in ground roughness and conductivity/moisture, the new TimeScan product provides information on surface characteristics that is useful for purposes such as land use classification. In this instance all radar acquisitions collected over Germany between May 2014 and July 2016 were systematically extracted from the stream of received data and processed to produce the TimeScan data set. In the corresponding false-color image, the average, minimum and maximum backscatter values of every pixel are represented in the red, green and blue channels, respectively. Urban agglomerations, for example, reflect strongly and appear as prominent, bright areas. Water bodies deflect a large proportion of the oblique radar beams away from the satellite and are therefore dark. Vegetated regions are distinguished by comparatively high minimum backscatter, which causes forests and meadows to appear in green tones. In addition, the temporal dynamics of the acquisitions are highly condensed in the product, which effectively serves as an additional source of information. Land cover that has changed considerably during the acquisition period, such as cropland, appears lilac in this data product.
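The per-pixel temporal condensing described above can be illustrated with a minimal Python sketch. It is not the OPUS processing chain. The function names (`timescan_pixel`, `timescan_composite`, `to_byte`), the use of NaN to mark scenes that do not cover a pixel, and the display limits of -25 dB and 5 dB are all assumptions made for illustration. Averaging is done directly on the input values, and whether the actual product averages in decibels or in linear power is not stated here.

```python
from math import isnan
from statistics import fmean


def timescan_pixel(series):
    """Return (mean, minimum, maximum) of one pixel's backscatter time series.

    NaN marks a scene that did not cover the pixel (assumed convention)
    and is skipped. Returns None if the pixel was never observed.
    """
    valid = [v for v in series if not isnan(v)]
    if not valid:
        return None
    return (fmean(valid), min(valid), max(valid))


def timescan_composite(scenes):
    """Condense a list of co-registered scenes into one grid of statistics.

    scenes: list of equally sized 2-D grids (lists of rows), one grid per
    acquisition. The result has the same rows and columns, with one
    (mean, minimum, maximum) tuple per pixel.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    return [
        [timescan_pixel([scene[r][c] for scene in scenes]) for c in range(cols)]
        for r in range(rows)
    ]


def to_byte(value, lo=-25.0, hi=5.0):
    """Linearly stretch a value to 0..255 for display.

    lo and hi are hypothetical display limits, not values from the article.
    """
    scaled = (value - lo) / (hi - lo)
    return round(255 * min(1.0, max(0.0, scaled)))
```

Mapping the three statistics of each pixel through `to_byte` yields red, green and blue values in the channel order described above. A real implementation would operate on large raster arrays rather than nested lists.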
TimeScan condenses the information contained in countless acquisitions to produce a single product with a fraction of the original data volume. Such data compression is one way to make efficient use of the streams of satellite data now available. This is particularly relevant for achieving continuous environmental monitoring based on extensive time series of earth observation data. Accordingly, DFD scientists are already at work developing methodologies that yield a detailed portrayal of changes and dynamics in spatial utilization patterns. For example, TimeScan data sets can help address key scientific questions about aspects of climate change, such as urbanisation. Scenarios for using these new technologies to support commercial providers of geoinformation services are also currently being tested.
Prototypes of the developed processing chains have also been implemented for data from the Landsat, Envisat-ASAR and Sentinel-2 missions. In addition, the procedures have been successfully executed on a variety of platforms. Besides the classic Cloud environment with virtual machines, a Hadoop cluster and a high-performance computing cluster are being employed. So far a total of some 600,000 data sets and over 2 petabytes of data have been processed.
Accessing large amounts of Sentinel data and processing them on a Cloud platform is also one of the purposes of the national Copernicus portal CODE-DE. It will make the data and processing methodologies developed in the OPUS project available to national scientific, public-sector and commercial users.
As part of the new strategy of the European Commission, the body responsible for the European Copernicus Programme, such platforms are to be expanded and made available to users throughout Europe and beyond. The German Copernicus Centre and its concept are intended to make Big Data in earth observation a reality for scientists and the geoinformation industry. With support from the Bavarian Ministry of Economic Affairs and in cooperation with the commercial sector, DFD is pursuing the installation of these platforms.