June 29, 2020

How artificial intelligence helps us to teach robots to "see"

Localization of Rollin' Justin in a simulated environment
Credit: DLR (CC BY 3.0)

In robotics, many basic problems still need to be solved before a robot can be genuinely useful in the household or in care. One of the exciting questions is: how can robots "see" and recognize their environment? Researcher Rudolph Triebel illustrates how artificial intelligence learns to recognize objects such as chairs or tables through colors and numbers. In this context, he presents the software BlenderProc, which was developed in his department.

The "brain" of a robot is essentially a computer that processes the input signals from the robot's sensors - a step also called perception - and then sends commands to the motors. This is the field of work of my department "Perception and Cognition" at the Institute of Robotics and Mechatronics. Here we develop methods for acquiring data with different sensors, for storing it efficiently and, above all, for interpreting it. Of particular importance are image data from single and stereo camera systems as well as depth data, where the distance between camera and object is measured in addition to color. Put simply, these sensors enable the robot to "see".

From numbers to pictures

In order for a robot not only to "see" its environment, but also to recognize things in it, we need a link between the color information coming from the images and the so-called semantics, which describes what exactly the image represents. For the computer, an image consists of several million individual picture elements called pixels. Each pixel is a point with exactly one color, which is stored as a number. This means that an image consists of a large quantity of numbers that must be converted into useful information in order to recognize what is in the image.
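To make "an image is just numbers" concrete, here is a minimal sketch. It assumes the common RGB encoding, in which each pixel's color is stored as three integers between 0 and 255, and it uses a tiny made-up image of 2 x 3 pixels instead of several million.

```python
import numpy as np

# A made-up 2x3 RGB image: 2 rows, 3 columns, 3 color channels per pixel.
# Each channel value is an integer between 0 and 255.
image = np.array(
    [
        [[255, 255, 255], [255, 0, 0], [0, 0, 255]],
        [[0, 0, 0], [255, 255, 255], [0, 255, 0]],
    ],
    dtype=np.uint8,
)

height, width, channels = image.shape
num_values = image.size  # total count of numbers stored for this image

# The color of one pixel is nothing more than three numbers (red, green, blue).
top_left = tuple(int(v) for v in image[0, 0])

print(height, width, channels, num_values, top_left)
```

A real camera image has the same layout, only with far more rows and columns.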

Left: Artificially generated image of a bedroom (= input value). Right: The corresponding segmentation, i.e. each color corresponds to an object class (= output value).

In concrete terms, the task of the engineers is to write a computer program that assigns each picture element the number that belongs to the corresponding object class. An object class does not refer to the one concrete chair we see, but to chairs in general - the class of chairs. For the existing object classes (in our picture, for instance, "chair", "table" or "floor"), the corresponding numbers must be defined in advance so that the assignment is unambiguous. The program we are looking for can then be imagined, in simplified form, as a mathematical function that calculates the segmentation y from the input image x.
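As an illustration, a segmentation y can be stored as an array with one class number per pixel. The class numbers and display colors below are hypothetical choices for this sketch; the only requirement is that they are fixed in advance.

```python
import numpy as np

# Hypothetical class numbers, defined in advance so the assignment is unambiguous.
CLASS_IDS = {"floor": 0, "chair": 1, "table": 2}

# Hypothetical display colors (RGB), one row per class number.
PALETTE = np.array(
    [
        [128, 128, 128],  # 0: floor
        [220, 20, 60],    # 1: chair
        [30, 144, 255],   # 2: table
    ],
    dtype=np.uint8,
)

# A segmentation y for a 2x3 image: one class number per pixel.
segmentation = np.array(
    [
        [CLASS_IDS["chair"], CLASS_IDS["chair"], CLASS_IDS["table"]],
        [CLASS_IDS["floor"], CLASS_IDS["floor"], CLASS_IDS["floor"]],
    ]
)

# Indexing the palette with the label map turns class numbers into colors,
# which is how a segmentation can be displayed as a colored picture.
color_view = PALETTE[segmentation]
print(color_view.shape)
```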

The big problem now is that a human programmer cannot simply write down this function, because the relations between input and output are much too complex. Simple rules do not work here - for example, a rule that every white pixel belongs to a wall fails because other objects also contain white pixels and because a wall in shadow is no longer pure white. Occlusion causes further problems: objects are only partially visible because other objects are partly in front of them.
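A small sketch shows how quickly such a hand-written rule breaks. The pixel values and class numbers are invented for illustration: one lit wall pixel, one wall pixel in shadow, and one pixel of a white bed sheet.

```python
import numpy as np

WALL, OTHER = 1, 0  # hypothetical class numbers

def naive_wall_rule(image):
    """Label a pixel as wall only if it is pure white."""
    is_white = np.all(image == 255, axis=-1)
    return np.where(is_white, WALL, OTHER)

# Three pixels in a row: a lit wall, a shaded wall, a white bed sheet.
image = np.array(
    [[[255, 255, 255], [200, 200, 205], [255, 255, 255]]], dtype=np.uint8
)
truth = np.array([[WALL, WALL, OTHER]])

prediction = naive_wall_rule(image)
errors = int(np.sum(prediction != truth))
print(prediction, errors)
```

The rule misses the shaded wall pixel and wrongly labels the bed sheet as wall, so two of the three pixels are wrong.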
 
Therefore, machine learning is used instead. In simplified terms: from an existing set of input images and associated segmentations, the function is calculated in such a way that its output y matches the known segmentation as closely as possible. With the help of the data, the algorithm learns to generate output images (like the right image in our example) on its own. It can then apply the learned function to new data. However, a single pair of images is not sufficient as a data source to learn the function. The more image data - we call this "training data" - is available, the better the results the artificial intelligence can achieve.
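The idea of computing a function from example pairs can be sketched with a deliberately simple stand-in. This is not the method used in practice, where far more expressive models such as neural networks are trained; the toy learner below only averages the color of each class and therefore suffers from exactly the shading and ambiguity problems described above. All pixel values and class numbers are invented for illustration.

```python
import numpy as np

def fit_class_means(images, segmentations, num_classes):
    """Estimate one mean color per class from labeled training pixels."""
    pixels = np.concatenate([img.reshape(-1, 3) for img in images]).astype(float)
    labels = np.concatenate([seg.reshape(-1) for seg in segmentations])
    return np.stack([pixels[labels == c].mean(axis=0) for c in range(num_classes)])

def predict(image, class_means):
    """Assign each pixel the class whose mean color is closest."""
    diffs = image[..., None, :].astype(float) - class_means  # shape (H, W, C, 3)
    distances = np.linalg.norm(diffs, axis=-1)                # shape (H, W, C)
    return distances.argmin(axis=-1)

# Training data: pairs of input image x and known segmentation y.
# Hypothetical classes: 0 = wall, 1 = table.
train_images = [
    np.array([[[250, 250, 250], [120, 80, 40]]], dtype=np.uint8),
    np.array([[[240, 240, 245], [110, 70, 50]]], dtype=np.uint8),
]
train_segmentations = [np.array([[0, 1]]), np.array([[0, 1]])]

class_means = fit_class_means(train_images, train_segmentations, num_classes=2)

# Apply the learned function to an image that was not part of the training data.
new_image = np.array([[[230, 230, 235], [100, 90, 60]]], dtype=np.uint8)
predicted = predict(new_image, class_means)
print(predicted)
```

The structure is the same as in real systems: fit on known pairs, then apply the result to unseen images.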

BlenderProc: Our "digital trainer"

At the German Aerospace Center (DLR), we have therefore developed a novel process that can automatically generate such training data. The idea is to simulate the depicted scenes in the computer and thereby generate images of the scenes that are as realistic as possible. This technique is also used in animated films, for example, to create artificial worlds or figures.

Our open-source software developed for this purpose is called BlenderProc because it is based on the program Blender. From given object positions and illumination directions, BlenderProc calculates the exact path of the light rays for each pixel (ray tracing). This results in an extremely realistic image that is difficult to distinguish from a real camera image. In our examples, the left image of the bedroom and the two images with the robot Rollin' Justin were created in this way.

The great advantage of this artificially generated data is that the information to be learned can be generated automatically as well. For each artificially generated color pixel, the object class can be calculated at the same time. Furthermore, much other useful information can be determined - such as the depth of each pixel, i.e. the distance between camera and object. In our bedroom example, BlenderProc can "calculate" or "see" on the basis of the colors how much the table is tilted, the distance between the picture and the bed, or the depth of the bedside table. A person, on the other hand, cannot state exactly how far a chair or table is tilted, for example.
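Because object and camera positions are known in a simulated scene, a per-pixel distance can be computed exactly. The following is a minimal geometric sketch, not BlenderProc's implementation: it assumes an idealized pinhole camera at the origin looking along the z axis at a flat wall, and the function name and parameters are made up for this example.

```python
import numpy as np

def wall_distance_map(width, height, focal_px, wall_z):
    """Distance from a pinhole camera at the origin to a wall at z = wall_z.

    The camera looks along +z and the wall is perpendicular to the viewing axis.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    # One viewing ray per pixel, scaled so that its z component equals 1.
    rays = np.stack(
        [(u - cx) / focal_px, (v - cy) / focal_px, np.ones(u.shape)], axis=-1
    )
    # The ray hits the wall at wall_z * ray, so the distance is wall_z * |ray|.
    return wall_z * np.linalg.norm(rays, axis=-1)

distances = wall_distance_map(width=5, height=3, focal_px=4.0, wall_z=2.0)
print(distances.round(3))
```

The center pixel sees the wall at exactly the wall distance, while pixels toward the image border look at it obliquely and therefore measure a slightly larger distance.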

From all these calculations, an arbitrarily large training data set can be generated by placing the virtual camera at as many randomly selected positions as possible and regenerating the view from each of them. These are always values that the artificial intelligence can learn. Finally, it can apply such a learned function to new, "real" data - a nursing robot, for example, can take pictures of a living room and use the calculated function in this real environment.
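Random camera placement can be sketched as follows. The room dimensions, the function name and the choice to sample only a position plus a rotation about the vertical axis are assumptions for this example; a real setup would also have to reject positions inside furniture or walls.

```python
import numpy as np

def sample_camera_poses(num_poses, room_min, room_max, seed=0):
    """Draw random camera positions inside a box-shaped room, plus a heading."""
    rng = np.random.default_rng(seed)
    # Uniform positions between the two opposite corners of the room.
    positions = rng.uniform(room_min, room_max, size=(num_poses, 3))
    # Random rotation about the vertical axis, in radians.
    yaw = rng.uniform(0.0, 2.0 * np.pi, size=num_poses)
    return positions, yaw

# Hypothetical room: 4 m x 3 m floor, cameras between 1.0 m and 2.0 m height.
room_min = np.array([0.0, 0.0, 1.0])
room_max = np.array([4.0, 3.0, 2.0])

positions, yaw = sample_camera_poses(4, room_min, room_max)
print(positions.round(2))
print(yaw.round(2))
```

Each sampled pose would yield one new image together with its segmentation, so the size of the data set is limited only by computing time.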
 
Our software consists of several modules, which are executed one after the other. These modules first load the simulated scene and then place the camera at different locations in the room. At the end the images and segmentations are generated. This process can be repeated as often as desired for different rooms. If one uses this data to train the machine learning algorithm, one can get the robot to segment such new and unknown scenes semantically as well.
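The principle of modules that run one after the other can be sketched in a few lines. The function names, the shared state dictionary and the placeholder outputs are purely illustrative and are not BlenderProc's actual module names or API.

```python
def load_scene(state):
    """Placeholder module: register the objects of a simulated room."""
    state["objects"] = {"bed": 1, "table": 2}  # hypothetical class numbers
    return state

def place_cameras(state):
    """Placeholder module: choose camera positions (x, y, z) in the room."""
    state["cameras"] = [(0.5, 0.5, 1.5), (2.0, 1.0, 1.5)]
    return state

def render(state):
    """Placeholder module: produce one image/segmentation pair per camera."""
    state["outputs"] = [
        {"camera": camera, "image": None, "segmentation": None}
        for camera in state["cameras"]
    ]
    return state

def run_pipeline(modules):
    """Execute the modules in order, passing the shared state along."""
    state = {}
    for module in modules:
        state = module(state)
    return state

result = run_pipeline([load_scene, place_cameras, render])
print(len(result["outputs"]))
```

Running the same list of modules on a different room only requires swapping the first module, which is what makes the process easy to repeat.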

Like us humans, the robot learns step by step to recognize more and more in the pictures it has "seen" - even if it needs much more training data than a human. Thanks to BlenderProc, however, we are confident that one day a robot will be able to describe its environment as well as we do. This will enable us to teach a nursing robot in the future how to find lost glasses, for example. We will "see"!

For those who want to go even deeper into the subject: BlenderProc is available as open-source software and offers an open interface that other researchers can use. It contains many examples that make it easy to get started.

Contact

Lioba Suchenwirth

Public Relations
Institute of Robotics and Mechatronics
Institute Development and Central Management
Münchener Straße 20, 82234 Oberpfaffenhofen-Weßling
Tel: +49 8153 28-4292

Prof. Dr. rer. nat. habil. Rudolph Triebel

Head of Department
Institute of Robotics and Mechatronics
Perception and Cognition
Münchener Straße 20, 82234 Oberpfaffenhofen-Weßling