Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control

Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators with the aim of improving safety, measuring air traffic controller workload, and analysing large quantities of controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not yet been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested in bringing automatic speech recognition into ATC, this paper gives an overview of the possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as of specific approaches relevant to the ATC context. Based on this literature review, we present criteria for selecting speech recognition approaches for the ATC domain and discuss the remaining challenges and possible solutions.



