DLR and Mozilla are researching technologies for voice control of robots
- Focus: Space, digitalisation, artificial intelligence, robotics
There is no room for error when controlling satellites or operating the Columbus Laboratory on the International Space Station. Every single working step and command follows an established procedure and is documented. The German Space Operations Center (GSOC) at the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt; DLR) is developing the voice transmission software openvocs in order to automatically convert control room voice radio transmissions into text, while simultaneously linking key content. In future, this technology could also be used by astronauts to command lunar rovers or other robotic systems. DLR has teamed up with the Mozilla Corporation to investigate whether the open speech-to-text (STT) engine project 'DeepSpeech' could be used for voice-based robot control. The aim is to develop an open software solution that is suitable for free use on smartphones and other common input devices. An initial prototype is to be developed by autumn 2020.
Voice transmission in openvocs is based on the open Web Real-Time Communications (WebRTC) framework and establishes the connections between the voice input device and the robot. This open transmission standard is supported by all of the major web browsers, so a large number of end devices can be used as voice terminals. The basic WebRTC technology is commonly used for data transmission in video conferencing, chats and desktop sharing.
The DLR scientists are pursuing the approach of first converting the voice input into text using 'DeepSpeech'. The openvocs artificial intelligence then analyses this text and detects defined commands. For the test scenario, the developers are using rover controls with simple sets of commands like 'left', 'right', 'forwards' or 'backwards'. In the final step, the text recognition system activates the corresponding motor control for the robot – the voice command is executed.
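The three steps described above (speech to text, command detection, motor activation) can be illustrated with a minimal Python sketch. It is not the openvocs implementation: the transcript is assumed to have already been produced by the speech-to-text engine, and the function names and motor callbacks are hypothetical stand-ins that only record the requested movement.

```python
from typing import Callable, Dict, List

# The four example commands named in the article.
COMMANDS = ("left", "right", "forwards", "backwards")


def detect_commands(transcript: str) -> List[str]:
    """Return known commands in the order they occur in the transcript."""
    found = []
    for word in transcript.lower().split():
        token = word.strip(".,!?;:'\"")  # drop surrounding punctuation
        if token in COMMANDS:
            found.append(token)
    return found


def execute(transcript: str, motors: Dict[str, Callable[[], None]]) -> List[str]:
    """Call the motor handler for each detected command; return what ran."""
    executed = []
    for command in detect_commands(transcript):
        motors[command]()  # hypothetical motor-control callback
        executed.append(command)
    return executed


# Stand-in motor handlers that only record the requested movement.
log: List[str] = []
motors = {name: (lambda n=name: log.append(n)) for name in COMMANDS}

execute("Forwards, then left.", motors)
print(log)
```

Keeping detection separate from execution means the same transcript analysis could drive different robots simply by swapping the table of callbacks.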
The voice commands can be programmed individually. The experts at GSOC are using machine learning for this purpose and are testing how well the new linguistic model detects the learned commands. Among other things, it is vital that the technology is able to correctly interpret words with multiple meanings, other semantic overlaps and negations. For example, the software has to learn that the expression 'Leave no one behind' is not a movement command and that the commands 'back' and 'backwards' have the same meaning.
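To make these requirements concrete, the sketch below handles synonyms, negations and a fixed non-command expression with hand-written rules. This is a deliberate simplification: the GSOC team is using a learned linguistic model, whereas the synonym table, the negation list and the two-word negation window here are assumptions chosen purely for illustration.

```python
import re
from typing import List

# Synonym table: spoken word -> canonical command (assumed for illustration).
CANONICAL = {
    "left": "left",
    "right": "right",
    "forward": "forwards",
    "forwards": "forwards",
    "back": "backwards",
    "backwards": "backwards",
}

# Words that negate a command following shortly after them (assumed list).
NEGATIONS = {"not", "don't", "never", "no"}

# Fixed expressions that are never commands. They are masked up front so they
# cannot trigger a movement, even if the vocabulary above grows later.
NON_COMMAND_PHRASES = ("leave no one behind",)


def interpret(utterance: str) -> List[str]:
    """Return the canonical commands expressed in an utterance."""
    text = utterance.lower()
    for phrase in NON_COMMAND_PHRASES:
        text = text.replace(phrase, " ")
    tokens = re.findall(r"[a-z']+", text)
    commands = []
    for i, token in enumerate(tokens):
        if token not in CANONICAL:
            continue
        # Skip the command if a negation occurs in the two preceding words.
        if NEGATIONS & set(tokens[max(0, i - 2):i]):
            continue
        commands.append(CANONICAL[token])
    return commands


print(interpret("Go back"))              # same command as "Move backwards"
print(interpret("Leave no one behind"))  # no movement command
print(interpret("Do not turn left"))     # negated, so nothing is executed
```

Rules like these break down quickly on natural speech, which is one reason a trained model is attractive for this task.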
The experts in Berlin and Oberpfaffenhofen are also endeavouring to make operating the technology as intuitive as possible. No trigger command should be required to activate voice control. Instead, the predefined commands must be automatically detected within the voice stream. Furthermore, Mozilla's 'DeepSpeech' engine does not require a cloud solution for data processing. It can be downloaded as software and trained on an individual basis. Users can upload the 'speech-to-text' model directly to the robot for local voice recognition. As 'DeepSpeech' has an open-source licence, the use of DLR's technology will also be free of charge.
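Detecting commands within a continuous voice stream, without a trigger word, could look roughly like the following Python sketch. It assumes the speech-to-text engine delivers a series of growing partial transcripts; the function name and the sample transcripts are hypothetical, and a real engine may revise earlier words between partial results, which this simple version does not handle.

```python
from typing import Iterable, Iterator

# The four example commands named in the article.
COMMANDS = {"left", "right", "forwards", "backwards"}


def commands_from_stream(partials: Iterable[str]) -> Iterator[str]:
    """Yield each command as soon as it appears in a growing transcript."""
    seen_words = 0  # number of words already inspected
    for partial in partials:
        words = partial.lower().split()
        for word in words[seen_words:]:  # only look at newly arrived words
            if word in COMMANDS:
                yield word
        seen_words = max(seen_words, len(words))


# Simulated partial transcripts, as a streaming recogniser might emit them.
partials = [
    "the rover",
    "the rover should go left",
    "the rover should go left and then forwards",
]
for command in commands_from_stream(partials):
    print(command)
```

Because only newly arrived words are inspected, each spoken command is reported once, no matter how many partial transcripts contain it.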
"We started the openvocs project at DLR with the aim of providing an open and flexible platform for control room communication. The Mozilla voice recognition solution fits seamlessly into this. I am very pleased about the combination of both activities, as it offers completely new and exciting possibilities for communication in the space sector," says openvocs systems engineer Markus Töpfer from DLR's Space Operations and Astronaut Training department.
Kelly Davis, Manager of Mozilla's Machine Learning Group, adds: "DLR's interest in our STT technology acknowledges our work on performance optimisation for embedded systems and small device platforms. Even though we are still in the test phase: Honestly, the child in me is also excited about the idea of maybe one day being able to look up into the sky and know that we have played a small part in what is happening so many miles above us."
Over the coming months, GSOC will build the 'speech-to-text' interface for the new communications solution and integrate it into DLR's openvocs platform. The team of developers is working specifically on the fundamental technologies in conjunction with Mozilla, so that in future astronauts and users on Earth will be able to keep their hands free when controlling a robot.