Data Management and Enrichment
The department researches and develops methods, tools, and systems for data management and enrichment. Its main topics are:
- Information extraction from documents,
- Development of methods for the automated exchange of heterogeneous data between different stakeholders (interoperability),
- Utilisation of data outside their original context of collection with the help of semantic descriptions,
- Methods for the manual and (semi-)automatic further development of semantic models (knowledge graph evolution), and
- Development of methods and systems for efficient data management, visualisation and exploration of raster, time series and point cloud data in different execution environments and using modern hardware.
Applications span all areas of DLR, with a focus on the circular economy and resilient supply chains, where data exchange between different stakeholders plays an important role. This work is pursued, for example, in the projects MaTiC-M, COOPERANTS, and Aerospace-X.
The project "Methods and Technologies for an intelligent Circularity of Materials" (MaTiC-M) focuses on the development of sustainable product designs and disassembly technologies; we contribute modelling and tool development to support recycling-friendly product design.
"Collaborative Processes and Services for Aeronautics and Space" (COOPERANTS) addresses data exchange in the aeronautics and space sector, focusing in particular on collaborative design processes and data exchange along supply chains. For the latter aspect, we provide support in the area of semantic interoperability.
The Aerospace-X project has a similar aim, but concentrates on production and quality assurance. Our contribution there is again semantic interoperability.
The department consists of three working groups:
Data access and processing
Modern data management systems face a variety of challenges. Data is far more heterogeneous (e.g. raster, time series and point cloud data) and is generated at widely varying volumes and rates. Data access patterns are becoming increasingly interactive and diverse (e.g. due to the growing use of mobile devices, interactive data exploration or access to data from virtual research environments). In addition, data management systems must be able to run in a wide variety of execution environments (e.g. edge, cloud, embedded). Today's data management systems are not designed to cover the full range of these requirements.
Based on these requirements, the working group aims to develop methods and technologies that make it possible to store data of various types in a data management system while efficiently providing heterogeneous data access methods. Performance, scalability (in terms of data volume and available hardware resources) and resource efficiency play an essential role here. In particular, the research should also take into account trends such as the diversification of computing and storage hardware (e.g. NVMe SSDs, persistent storage, computational storage) and different usage scenarios in the overall architecture of the systems. In summary, the working group pursues the following research topics:
- Database & Information System Architecture
- Efficient Data Management
- Data Storage Technologies
- Big Data Processing & Visualization
Metadata management
Data is increasingly becoming the driving force in many areas of science and industry. The growing number of stakeholders is making the data landscape ever more diverse, extensive, and interesting. However, this increasing heterogeneity also leads to new challenges:
- Descriptions can no longer be targeted at just one project or application, but have to cater to a general audience.
- Suitable data sets must be identified accurately from a rapidly growing number of sources.
- The meaning(s) of certain terms and concepts often differ between participants and need to be translated accordingly.
The Semantic Web promises solutions to these and other challenges. In practice, however, its approaches have not yet become established, and the full potential of data-driven science and business has yet to be realized.
The aim of the group is to make data available beyond its initial context and beyond the boundaries of projects, institutions, or specialist areas. The areas considered include:
- Metadata descriptions
- Semantic Web & knowledge graphs
- Data management in science and industry
- Semantically enhanced tools and services
Information extraction and interoperability
Information and data exchange is essential for communication in a wide variety of areas; for example, between companies with regard to production data or between research institutions with regard to measured and acquired data.
This exchange is made more difficult by the fact that information and data are available in incompatible and sometimes not even machine-actionable formats. For example, they are often stored in documents that are unstructured or semi-structured from a machine's perspective.
In order to simplify the exchange of information and to make unstructured data accessible for further automated processing, the working group is therefore pursuing two main areas of research:
- Information extraction
  - From semi-structured data, e.g. tables
  - Focus on technical documents such as data sheets or measurement reports
- Interoperability, especially at semantic level
  - (Semi-)automatic extension of knowledge graphs
  - Focus on supply chains, recycling routes, product life cycle