Article from the DLRmagazine 175: A platform for high-performance data analysis

Big data in Earth Observation

Tokyo from space
This view from space shows the development of cities and settlements, here using the example of Tokyo. The red areas represent growth.

Earth observation has changed significantly over the last two decades. Numerous government and commercial satellites now paint an increasingly detailed picture of Earth's ecosystem. The growing volume and complexity of this data require new processing methods. The terrabyte high-performance computer, operated jointly by DLR and the Leibniz Supercomputing Centre (Leibniz-Rechenzentrum; LRZ), extracts valuable scientific and economic information from this mountain of data.

Global change and its effects on people and the environment pose major challenges for scientific research. To find solutions, researchers need direct access to the relevant data. The terrabyte high-performance computer provides exactly that. Networked with the satellite data archive of the German Remote Sensing Data Center (Deutsches Fernerkundungsdatenzentrum; DFD), users from DLR and selected external research institutions can access a comprehensive, curated collection of Earth observation data with global coverage. In addition, DLR is continuously uploading data from other providers such as the European Space Agency (ESA) and the US space agency NASA onto terrabyte.

There are currently around 60 petabytes in the DFD archive – the amount of data acquired over the last 50 years. This is equivalent to approximately 15 million feature films. Additionally, more than 15 terabytes of new data are added every day, providing both historical and current information about the state of our planet, thus enabling detailed mapping of changes.
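For readers who want to check these figures, the short calculation below is a rough sketch rather than an official DLR estimate. It assumes decimal units (1 petabyte = 10^15 bytes), which the article does not specify, and it takes "more than 15 terabytes" as exactly 15 terabytes per day. The variable and function names are illustrative only.

```python
# Back-of-the-envelope check of the archive figures quoted in the article.
# Assumption: decimal (SI) units; the article does not state the convention.
PETABYTE = 10**15
TERABYTE = 10**12
GIGABYTE = 10**9

archive_bytes = 60 * PETABYTE        # "around 60 petabytes"
feature_films = 15_000_000           # "approximately 15 million feature films"
daily_growth_bytes = 15 * TERABYTE   # "more than 15 terabytes ... every day"


def bytes_per_film(total_bytes, films):
    """Average size implied for one feature film."""
    return total_bytes / films


def days_to_add(total_bytes, daily_bytes):
    """Days needed to add total_bytes at a constant daily rate."""
    return total_bytes / daily_bytes


film_size_gb = bytes_per_film(archive_bytes, feature_films) / GIGABYTE
days = days_to_add(archive_bytes, daily_growth_bytes)

print(f"Implied size per film: {film_size_gb:.1f} GB")
print(f"Days to add another 60 PB at 15 TB/day: {days:.0f} (about {days / 365:.0f} years)")
```

Under these assumptions, the comparison implies about 4 gigabytes per film. At the current intake, a volume equal to today's archive would be added in roughly 11 years at most, since the daily figure is a lower bound.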

Direct access to the relevant data

DLR researchers analyse and process the data. For example, they were able to demonstrate that air quality had improved globally during the coronavirus pandemic, when the concentration of tropospheric nitrogen dioxide in Europe and Southeast Asia dropped by more than 40 percent. This was attributed to lower economic activity and reduced traffic volume during lockdown. To reach this conclusion, terrabyte evaluated 1.2 trillion individual measurements from the European satellite Sentinel-5P.

Rendering from satellite images
DLR researchers analysed 15,000 satellite images spanning 37 years to investigate how the snow lines in the Aosta Valley in northern Italy have changed. Yellow shows the snow deficit in 2022 compared to the long-term average.

Using the high-performance platform, researchers have also been able to map the development of settlements globally with a resolution of up to ten metres for the first time. The DLR researchers evaluated data dating back over 40 years for this purpose. "The end result, the World Settlement Footprint, even shows streets and buildings," says Mattia Marconcini, a terrabyte developer at the DFD. "It clearly shows how quickly the world's urban centres are expanding and where settlement density is growing."

terrabyte can also provide support in the area of disaster management. The high-precision mapping of flooded areas supports emergency services in the quick and efficient rescue of flood victims. Up-to-date satellite maps can be made available within 45 minutes. To achieve this, terrabyte evaluates radar data completely automatically, including data from the European Sentinel-1 satellite, whose sensors can penetrate dense clouds.

Secure data instead of cloud systems

Up until now, scientists have mostly used cloud systems from commercial providers such as Amazon Web Services or Google Earth Engine to process vast amounts of data. These providers not only offer significant computing capacity but also incorporate Earth observation data from Europe and the US into their cloud platforms. "However, these cloud systems do not provide the specific Earth observation data required for our specific applications," says Stefan Dech, Director of the DFD. "What is more, neither the data nor the algorithms we have developed for analysis are reliably protected from third-party access. Consequently, in the long term we would inevitably become dependent on proprietary, commercial systems. That was the motivation behind the development of terrabyte," he adds.

terrabyte computing power

"The high-performance platform has simplified working with Earth observation data," says Jonas Eberle, Project Manager of terrabyte. "Instead of days or months, now we might only need hours for complex calculations." The platform is specifically designed for analysing large Earth observation datasets. In addition, the latest software enables quick and easy transmission and execution of applications and programs. These services and tools are continually expanding to adapt terrabyte to new applications and improve the utilisation of computing resources. The Earth observation data is also processed as analysis-ready data (ARD) and can be immediately used and combined without the need for additional pre-processing steps.

The future of terrabyte

terrabyte will be continuously developed in conjunction with the LRZ over the coming years. Software for workflows will be integrated and standardised services offered to simplify processing. The developers are also working on applications that can automatically make databases available. terrabyte is also part of the DLR project Visual Data Analysis Platform (VisPlore). Its aim is to enable interactive applications to be executed by a web-based system on all three DLR HPC clusters in the future. Julian Zeidler from the DFD is confident: "Thanks to terrabyte, DLR is very well positioned to provide important information on social challenges and global change in light of the rapidly growing volume of Earth observation data."

High-performance computing cluster at DLR

terrabyte

terrabyte is one of three DLR high-performance computing clusters (HPC clusters). The other two, CARA and CARO (Computers for Advanced Research in Aerospace), are powerful supercomputers that offer extremely high computing performance. They are used, for example, to simulate flows around aircraft wings and the behaviour of fuel in a rocket engine (see article in DLRmagazine 173).

Trailer Terrabyte LRZ
The German Aerospace Center (DLR) and the Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities have launched 'terrabyte' – one of Europe's largest scientific platforms for analysing Earth observation data. The declared goal is to make current and historical Earth observation data centrally available to the scientific community for public use.

An article by Anja Philipp from the DLRmagazine 175

Contact

Editorial team DLRmagazine

German Aerospace Center (DLR)
Corporate Communications
Linder Höhe, 51147 Cologne