Article from DLRmagazine 172: How speech recognition software can improve air traffic control

A mutual understanding

Air traffic control at work
In this test, the controller was assisted by voice recognition software in directing traffic in the area around Frankfurt Airport.
Credit: Fraport AG

Speech recognition has become a part of everyday life. Voice assistants such as Alexa, OK Google and Siri can carry out many useful functions, for example allowing us to enter a new address into our navigation system without taking our hands off the steering wheel. But speech recognition software can also assist air traffic controllers in their communication with aircraft in flight and on the ground. It can save them time and even reduce aircraft fuel consumption. For many years, the DLR Institute of Flight Guidance has been researching speech recognition software and testing it in new areas of application. But how does this software work, and what other benefits does it offer?

Detail from the TowerPad
Controllers use the TowerPad to control air traffic on the ground
Credit: TowerPad™/ATRiCS Advanced Traffic Solutions GmbH

Air traffic controllers (ATCos) use specialised technical vocabulary when they communicate with pilots. Conventional applications such as Siri recognise just half of the words spoken in rapid, complex instructions such as "Speedbird two zero zero zero, reduce one eight zero knots until four miles final, contact tower on frequency one one eight decimal seven zero zero, bye bye!" This difficulty is compounded by the different English accents of the controllers around the world. A recognition rate of 50 percent is far from sufficient for day-to-day operation. That is why the DLR Institute of Flight Guidance has worked with various air navigation service providers and research institutions, such as Saarland University, Swiss research institute Idiap and the University of Brno, to develop speech recognition and understanding systems for air traffic control applications. To do this, the researchers evaluated and transcribed over 50 hours of spoken language. By comparison, Google has access to about 200,000 hours of transcribed speech data. The researchers used the data to train a neural network. The resulting system has now achieved a word recognition rate of over 97 percent for communication in the apron area of Frankfurt Airport.

How does air traffic control communicate?
Above is a typical example of communication from air traffic control. It begins with the callsign – the name of the flight. This is followed by instructions to reduce the speed to 180 knots, no later than four nautical miles before landing. The pilot should then switch to the frequency of 118.700 to communicate with the tower. The pilot repeats the speed and tower frequency and finally the callsign.
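To make the structure of such a transmission concrete, here is a minimal, purely illustrative Python sketch that converts the spoken digits of the example above into numeric values and arranges them into a callsign, commands and a condition. The dictionary layout and field names are invented for this article; the operational systems described here use trained neural networks rather than hand-written rules.

# Illustrative only: hand-written breakdown of the example transmission.
SPOKEN_DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def digits_to_number(words):
    """Convert spoken digits such as 'one eight zero' into the string '180'."""
    return "".join(SPOKEN_DIGITS[w] for w in words)

# "Speedbird two zero zero zero, reduce one eight zero knots until four miles final,
#  contact tower on frequency one one eight decimal seven zero zero"
transmission = {
    "callsign": "Speedbird " + digits_to_number(["two", "zero", "zero", "zero"]),
    "commands": [
        {
            "type": "REDUCE_SPEED",
            "value": int(digits_to_number(["one", "eight", "zero"])),
            "unit": "knots",
            "condition": "until 4 miles final",
        },
        {
            "type": "CONTACT",
            "station": "tower",
            "frequency": digits_to_number(["one", "one", "eight"])
                         + "." + digits_to_number(["seven", "zero", "zero"]),
        },
    ],
}

print(transmission["callsign"])                   # Speedbird 2000
print(transmission["commands"][0]["value"])       # 180
print(transmission["commands"][1]["frequency"])   # 118.700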

Speech recognition versus speech understanding

In addition to the recognition rate, the quality of an application also depends on which words go unrecognised. Failing to understand 'Good morning' is far less fraught with danger than a mix-up in 'heading two six zero'. If the system mishears the 'two' as a 'three', the aircraft will fly north instead of west.

A communication, either from the flight deck or from the air traffic control tower, consists of various elements: a callsign, a command and possible conditions. The pilot must read back every instruction from the air traffic controller in order to confirm it. While doing so, they may alter the order of the words or use slightly different expressions. To ensure that the software can 'understand' these variations, 22 partners from 15 European countries agreed in 2018, under the leadership of DLR, on rules for the semantic interpretation of radiotelephony communications. This set of rules is known as an ontology. DLR has since used the ontology in various speech understanding projects alongside air navigation service providers from all over Europe. These activities have proven its suitability, and the ontology is currently being developed further under DLR's leadership.
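As a rough illustration of what such an ontology buys, the following sketch reduces a controller instruction and a differently worded pilot readback to the same set of semantic units so that they can be compared directly. The callsign, command names and values are invented; the actual ontology agreed by the project partners is considerably richer.

def semantic_units(callsign, commands):
    """Normalise a transmission into comparable (callsign, type, value, unit) tuples."""
    return {(callsign, c["type"], c.get("value"), c.get("unit")) for c in commands}

# Controller: "DLH2KF, reduce one eight zero knots, contact tower one one eight decimal seven"
controller = semantic_units("DLH2KF", [
    {"type": "REDUCE_SPEED", "value": 180, "unit": "kt"},
    {"type": "CONTACT_FREQUENCY", "value": 118.700, "unit": "MHz"},
])

# Pilot readback with different word order and phrasing:
# "speed one eight zero knots, one one eight decimal seven, DLH2KF"
pilot = semantic_units("DLH2KF", [
    {"type": "CONTACT_FREQUENCY", "value": 118.700, "unit": "MHz"},
    {"type": "REDUCE_SPEED", "value": 180, "unit": "kt"},
])

print(controller == pilot)   # True: same meaning despite the different surface form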

From 2024, apron controllers at Frankfurt Airport will have to manually enter all spoken taxi clearances into a control system using a mouse or keyboard. This process will increase safety. However, it will also significantly increase their workload. In the worst-case scenario, this could reduce the number of take-offs and landings. Speech recognition software can transcribe and automatically interpret the spoken commands, leaving the air traffic controller to correct any remaining errors. A recognition rate of 90 percent would mean that they only have to manually enter every tenth clearance. This is what DLR researchers have been investigating in the Safety and Artificial Intelligence Speech Recognition (STARFiSH) project. Idiap developed the speech recogniser used in STARFiSH, DLR supplied the speech understanding module, Freiburg-based company ATRiCS Advanced Traffic Solutions GmbH developed the simulator and the TowerPad, and Frankfurt Airport AG (Fraport) provided the test air traffic controllers. The system was initially tested in the Fraport AG simulator. The tests carried out in the summer of 2022 demonstrated that good speech recognition software can reduce the workload of air traffic controllers, as the input they are required to type is reduced by over 50 percent.

How does speech recognition work?
The software's speech recognition algorithm transforms a spoken voice signal into a sequence of words. The speech understanding algorithm then converts the individual words into semantic units, such as callsigns, command types or command values. To improve both recognition and understanding, the system also uses command prediction: drawing on radar, flight plan and weather data, it automatically identifies the callsigns of aircraft in the vicinity that could soon be addressed and provides potential command types, such as changes in direction, speed or altitude, together with corresponding values for each callsign. The results of the speech understanding process are automatically displayed on the controller's radar screen.
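The chain described above can be summarised in a short sketch. All function names and return values here are placeholders invented for this article; they only illustrate how command prediction, speech recognition and speech understanding hand data to one another, not how the DLR modules are actually implemented.

def predict_commands(radar_tracks, flight_plans, weather):
    """Command prediction: which callsigns could be addressed next, and with what."""
    # Toy context: one aircraft on approach that may receive a speed reduction.
    return {"DLH2KF": {"REDUCE_SPEED": [160, 170, 180], "CONTACT_FREQUENCY": [118.700]}}

def recognise_speech(audio, context):
    """Speech recognition: audio signal -> sequence of words (toy output here)."""
    return "lufthansa two kilo fox reduce one eight zero knots".split()

def understand(words, context):
    """Speech understanding: word sequence -> semantic units, checked against the context."""
    callsign = "DLH2KF"                            # matched against the predicted callsigns
    command = {"type": "REDUCE_SPEED", "value": 180, "unit": "kt"}
    plausible = command["value"] in context.get(callsign, {}).get(command["type"], [])
    return {"callsign": callsign, "command": command, "plausible": plausible}

def process_transmission(audio, radar_tracks, flight_plans, weather):
    context = predict_commands(radar_tracks, flight_plans, weather)
    words = recognise_speech(audio, context)
    return understand(words, context)   # shown on the radar screen for confirmation or correction

print(process_transmission(audio=None, radar_tracks=[], flight_plans=[], weather={}))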

From simulation to control room

While the STARFiSH project tested the software with voice data from simulation operations, the European HAAWAII project led by DLR is already a step further. Here, speech recognition software was used to recognise and understand instructions issued by Icelandic and British air traffic controllers during live operations. HAAWAII stands for Highly Automated Air Traffic Controller Workstation with Artificial Intelligence Integration and involves Idiap, the University of Brno and the air navigation service providers of Iceland, the United Kingdom, Austria and Croatia. The software recognises not only the speech of the air traffic controllers during live operations, but also that of the pilots. This is challenging because of the high noise level on the flight deck and the variety of accents, and because the air-ground voice channel itself is usually very noisy. The project began in 2020 with a voice recognition system that had been trained with 3,000 hours of everyday English and with air traffic radio communication, but which had not yet had to contend with live speech from British or Icelandic air traffic control. As a result, word recognition rates were initially poor, with error rates of 30 to 40 percent. British and Icelandic air traffic controllers then transcribed their voice radio communications. Once the researchers had trained the neural network with these data, the word recognition rate rose to over 95 percent for the air traffic controllers and over 90 percent for the pilots. The Icelandic air navigation service provider used speech understanding to help air traffic controllers identify errors when pilots read back controller commands. British air traffic control used speech understanding to predict the workload of air traffic controllers.
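Conceptually, readback-error detection follows directly from the semantic units introduced above: once both the controller's instruction and the pilot's readback have been reduced to command types and values, a readback error is simply a mismatch between the two. The sketch below shows this comparison with invented values; the system used in Reykjavik works on the output of the speech recognition and understanding chain rather than on hand-typed dictionaries.

def readback_errors(instruction, readback):
    """Return the command types whose values differ between instruction and readback."""
    return {
        cmd: (value, readback.get(cmd))
        for cmd, value in instruction.items()
        if readback.get(cmd) != value
    }

# Invented example: the pilot reads back the wrong flight level.
controller = {"DESCEND_FL": 80, "CONTACT_FREQUENCY": 118.700}
pilot      = {"DESCEND_FL": 90, "CONTACT_FREQUENCY": 118.700}

print(readback_errors(controller, pilot))   # {'DESCEND_FL': (80, 90)} -> alert the controller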

Saving fuel with speech recognition

In 2015, researchers at the DLR Institute of Flight Guidance demonstrated that using speech recognition software can also save fuel. Freeing air traffic controllers from the need to enter commands manually allows them to spend more time on their primary task: safe and efficient air traffic management. If a command is issued even slightly late on approach, it can result in the aircraft flying a few kilometres too far in the wrong direction. This creates more work for the controllers. Between 2015 and 2017, tests with air traffic controllers from Germany, Austria, Croatia, the Czech Republic, Denmark, Sweden and Ireland demonstrated that voice recognition software could reduce flight times by an average of approximately 77 seconds. This corresponds to a reduction in fuel consumption of 60 litres of kerosene per flight.

First live application and future ambitions

SESAR
This project has received funding from the SESAR Joint Undertaking under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 874470.
Credit: SESAR

In the summer of 2022, the Icelandic air navigation service provider integrated the HAAWAII system into the control room in Reykjavik to display the spoken commands. The aim of this demonstration was to automatically detect readback errors, which the system did successfully. DLR and aircraft manufacturers are now planning to integrate the software into the flight deck so that pilots no longer need to manually enter taxiways into the system while taxiing at the airport. This brings the vision of the one-person flight deck a step closer. Frankfurt Airport plans to use the system not only in the simulator: from 2026 at the latest, it is also intended to reduce the workload of air traffic controllers in the control room when they enter commands.

Managing traffic at an airport is no easy task
Credit: VOO QQQ/Unsplash

An article by Hartmut Helmke from DLRmagazine 172

Contacts

Prof. Dr. Hartmut Helmke

German Aerospace Center (DLR)
Institute of Flight Guidance
Lilienthalplatz 7, 38108 Braunschweig
Germany

Julia Heil

Editorial management DLRmagazine
German Aerospace Center (DLR)
Communications and Media Relations
Linder Höhe, 51147 Cologne