Abstract: Over the past few years, a great deal of research has
been conducted to bring Automatic Speech Recognition (ASR) into
various areas of Air Traffic Control (ATC), such as air traffic
control simulation and training, monitoring live operators with
the aim of improving safety, measuring air traffic controller
workload, and analysing large quantities of controller-pilot speech.
Due to the high accuracy requirements of the ATC context and its
unique challenges, automatic speech recognition has not been widely
adopted in this field. With the aim of providing a good starting
point for researchers who are interested in bringing automatic speech
recognition into ATC, this paper gives an overview of the possibilities
and challenges of applying automatic speech recognition in air traffic
control. To provide this overview, we present an updated literature
review of speech recognition technologies in general, as well as
specific approaches relevant to the ATC context. Based on this
literature review, criteria for selecting speech recognition approaches
for the ATC domain are presented, and remaining challenges and
possible solutions are discussed.
Abstract: This research study aims to present a retrospective
study about speech recognition systems and artificial intelligence.
Speech recognition has become one of the most widely used
technologies, as it offers a great opportunity to interact and
communicate with automated machines. More precisely, speech
recognition helps its users perform their daily routine tasks
in a more convenient and effective manner. This
research intends to present the illustration of recent technological
advancements, which are associated with artificial intelligence.
Recent research has revealed that decoding speech remains the
foremost challenge in speech recognition. To overcome this
challenge, researchers have developed different statistical
models. Some of the most prominent statistical
models include acoustic model (AM), language model (LM), lexicon
model, and hidden Markov models (HMM). The research will help in
understanding all of these statistical models of speech recognition.
Researchers have also formulated different decoding methods, which
are being utilized for realistic decoding tasks and constrained
artificial languages. These decoding methods include pattern
recognition, acoustic-phonetic analysis, and artificial intelligence.
Artificial intelligence has been recognized as the most efficient
and reliable of these methods for speech recognition.
Abstract: In this study, we propose a novel technique for acoustic
echo suppression (AES) during speech recognition under barge-in
conditions. Conventional AES methods based on spectral subtraction
apply fixed weights to the estimated echo path transfer function
(EPTF) at the current signal segment and to the EPTF estimated until
the previous time interval. However, the effects of echo path changes
should be considered when eliminating undesired echoes. We
describe a new approach that adaptively updates weight parameters in
response to abrupt changes in the acoustic environment due to
background noise or double-talk. Furthermore, we devised a voice
activity detector and an initial time-delay estimator for barge-in speech
recognition in communication networks. The initial time delay is
estimated using a log-spectral distance measure as well as
cross-correlation coefficients. The experimental results show that the
developed techniques can be successfully applied in barge-in speech
recognition systems.
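The cross-correlation part of the initial time-delay estimation can be illustrated with a minimal sketch (signal names, lengths, and the echo attenuation are hypothetical; the paper's method additionally uses a log-spectral distance measure):

```python
import numpy as np

def estimate_delay(reference, observed, max_lag):
    """Estimate the lag (in samples) at which `observed` best aligns
    with `reference`, using normalized cross-correlation."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        seg = observed[lag:lag + len(reference)]
        if len(seg) < len(reference):
            break  # not enough samples left at this lag
        corr = np.dot(reference, seg) / (
            np.linalg.norm(reference) * np.linalg.norm(seg) + 1e-12)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

# toy example: an echo attenuated to 0.6 and delayed by 25 samples
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.concatenate([np.zeros(25), 0.6 * x])
print(estimate_delay(x, y, max_lag=50))  # prints 25
```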
Abstract: This paper reports on an automatic speech recognition
system based on dynamic programming techniques. Temporal
alignment (dynamic time warping) is used to synchronize the two
patterns being compared, and we show how this technique is
adapted to the field of automatic speech recognition. We first
present the theory of the alignment function, which is used to
compare and align an unknown pattern with a set of reference
patterns constituting the vocabulary of the application. We then
give the various algorithms necessary for their implementation
on a machine. The algorithms presented were tested on part of
the Arabic-language corpus Arabdic-10 [4] and gave fully
satisfactory results. These algorithms are effective insofar as
they are applied to small or medium vocabularies.
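The dynamic-programming comparison of an unknown pattern against reference templates can be sketched as a minimal dynamic time warping (DTW) recognizer (the feature vectors, templates, and Euclidean local distance below are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between feature sequences
    a (n x d) and b (m x d), filled in by dynamic programming."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(unknown, templates):
    """Return the vocabulary word whose reference template is
    closest to the unknown pattern under DTW."""
    return min(templates, key=lambda w: dtw_distance(unknown, templates[w]))

# toy 1-D "feature" templates for a two-word vocabulary
templates = {"one": np.array([[0.0], [1.0], [2.0]]),
             "two": np.array([[2.0], [1.0], [0.0]])}
unknown = np.array([[0.0], [0.5], [1.0], [2.0]])
print(recognize(unknown, templates))  # prints one
```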
Abstract: The performance of any continuous speech recognition system is highly dependent on the performance of its acoustic models. Generally, the development of robust spoken language technology relies on the availability of large amounts of data. A common way to cope with having little data for training each state of the Markov models is tree-based state tying, which applies contextual questions to tie states. The manual procedure for question generation suffers from human error and is time-consuming, so various automatically generated questions are used to construct the decision tree. There are three approaches to generating questions for constructing decision-tree-based HMMs: one is based on misrecognized phonemes, another uses a feature table, and the third is based on state distributions corresponding to context-independent subword units. In this paper, all these methods of automatic question generation are applied to the decision tree on the FARSDAT corpus of the Persian language, and their results are compared with those of manually generated questions. The results show that automatically generated questions yield much better results and can replace manually generated questions for Persian.
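The likelihood-gain criterion behind question selection in tree-based state tying can be sketched in a simplified one-dimensional, single-Gaussian form (the contexts, sample values, and questions below are toy assumptions, not drawn from FARSDAT):

```python
import math

def gauss_loglik(values):
    """Log-likelihood of 1-D samples under their own ML Gaussian."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n + 1e-6
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_question(states, questions):
    """Pick the yes/no context question giving the largest likelihood
    gain when splitting the pooled states, as in tree-based tying."""
    pooled = [v for _, vals in states for v in vals]
    base = gauss_loglik(pooled)
    best, best_gain = None, 0.0
    for q in questions:
        yes = [v for ctx, vals in states if q(ctx) for v in vals]
        no = [v for ctx, vals in states if not q(ctx) for v in vals]
        if not yes or not no:
            continue  # question does not actually split the states
        gain = gauss_loglik(yes) + gauss_loglik(no) - base
        if gain > best_gain:
            best, best_gain = q, gain
    return best, best_gain

# toy data: nasal left contexts cluster low, others cluster high
is_nasal = lambda ctx: ctx in {"m", "n"}
questions = [is_nasal, lambda ctx: ctx == "t"]
states = [("m", [1.0, 1.1, 0.9]), ("n", [1.2, 1.0]),
          ("s", [5.0, 5.1]), ("t", [4.9, 5.2])]
q, gain = best_question(states, questions)
print(q is is_nasal, gain > 0.0)  # prints True True
```

The nasal question separates the two clusters, so splitting on it yields two tight Gaussians and a large likelihood gain, which is why it is selected over the narrower "t" question.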
Abstract: In this project, a tele-operated anthropomorphic
robotic arm and hand is designed and built as a versatile robotic arm
system. The robot can manipulate objects, performing tasks such
as pick-and-place operations. It is also able to function by
itself, in standalone mode.
Firstly, the robotic arm is built in order to interface with a personal
computer via a serial servo controller circuit board. The circuit board
enables the user to completely control the robotic arm and,
moreover, provides feedback to the user. The control circuit
board uses a powerful integrated microcontroller, a PIC
(Programmable Interface Controller). The PIC is first programmed
using BASIC (Beginner's All-purpose Symbolic Instruction Code)
and is used as the 'brain' of the robot. In addition, a
user-friendly Graphical User Interface (GUI) is developed as the
serial servo interface software using Microsoft's Visual Basic 6.
The second part of the project is to use speech recognition control
on the robotic arm. A speech recognition circuit board is constructed
with onboard components such as PIC and other integrated circuits. It
replaces the computer's Graphical User Interface. The robotic
arm is able to receive instructions as spoken commands through a
microphone and to perform the corresponding operations, such as
picking and placing.
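The final step, mapping each recognized spoken command to an arm operation, can be sketched as a simple dispatcher (the command names and actions are hypothetical, standing in for the servo moves the PIC would execute):

```python
def make_dispatcher():
    """Build a dispatcher mapping recognized command words to
    (hypothetical) arm actions; unknown words are ignored."""
    actions = []
    commands = {
        "pick": lambda: actions.append("closing gripper"),
        "place": lambda: actions.append("opening gripper"),
        "home": lambda: actions.append("moving to home position"),
    }
    def dispatch(recognized_word):
        action = commands.get(recognized_word.lower())
        if action is None:
            actions.append("unknown command ignored")
        else:
            action()
        return actions[-1]
    return dispatch

dispatch = make_dispatcher()
print(dispatch("pick"))  # prints closing gripper
```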
Abstract: Emotion in speech is an issue that has been attracting
the interest of the speech community for many years, both in the
context of speech synthesis as well as in automatic speech
recognition (ASR). In spite of the remarkable recent progress in
Large Vocabulary Recognition (LVR), it is still far behind the
ultimate goal of recognising free conversational speech uttered by
any speaker in any environment. Current experiments show that,
with state-of-the-art large-vocabulary recognition systems, the
error rate increases substantially on spontaneous/emotional
speech. This paper shows that the recognition
rate for emotionally coloured speech can be improved by using a
language model based on increased representation of emotional
utterances.
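The proposed increase in the representation of emotional utterances can be sketched as a weighted interpolation of two language models (the corpora, the interpolation weight, and the unigram simplification are all assumptions; the paper's actual model is not specified here):

```python
from collections import Counter

def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities for a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(general, emotional, lam):
    """Mix two unigram models; a larger lam raises the weight of
    the emotional utterances in the combined language model."""
    vocab = set(general) | set(emotional)
    return {w: (1 - lam) * general.get(w, 0.0) + lam * emotional.get(w, 0.0)
            for w in vocab}

# toy corpora: neutral speech vs. emotionally coloured speech
general = unigram_probs("please confirm my seat booking please".split())
emotional = unigram_probs("oh no no this is terrible".split())
mixed = interpolate(general, emotional, lam=0.3)
print(mixed["no"] > general.get("no", 0.0))  # prints True
```

Because both component models sum to one, the interpolated model is still a valid distribution while emotional words receive higher probability.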
Abstract: In this study, the use of a silicon NAM (Non-Audible
Murmur) microphone in automatic speech recognition is presented.
NAM microphones are special acoustic sensors, which are attached
behind the talker's ear and can capture not only normal (audible)
speech, but also very quietly uttered speech (non-audible murmur).
As a result, NAM microphones can be applied in automatic speech
recognition systems when privacy is desired in human-machine communication.
Moreover, NAM microphones show robustness against
noise and they might be used in special systems (speech recognition,
speech conversion etc.) for sound-impaired people. Using a small
amount of training data and adaptation approaches, 93.9% word
accuracy was achieved for a 20k Japanese vocabulary dictation
task. Non-audible murmur recognition in noisy environments is also
investigated. In this study, further analysis of NAM speech has
been made using distance measures between hidden Markov model
(HMM) pairs. Using a metric distance, it has been shown that the
spectral space of NAM speech is reduced; however, the locations
of the different NAM phonemes are similar to those of the
phonemes of normal speech, and the NAM sounds are well
discriminated.
Promising results in using nonlinear features are also introduced,
especially under noisy conditions.
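The HMM-pair distance analysis can be illustrated with a symmetrized Kullback-Leibler distance between single-Gaussian states, one simple stand-in for such measures (the states and numbers below are illustrative, not NAM data):

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL divergence between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q))

def symmetric_distance(state_a, state_b):
    """Symmetrized KL distance between two single-Gaussian HMM
    states, each given as a (means, variances) pair."""
    (mu_a, var_a), (mu_b, var_b) = state_a, state_b
    return 0.5 * (kl_diag_gauss(mu_a, var_a, mu_b, var_b)
                  + kl_diag_gauss(mu_b, var_b, mu_a, var_a))

a_state = ([0.0, 0.0], [1.0, 1.0])
b_state = ([1.0, 1.0], [1.0, 1.0])
print(symmetric_distance(a_state, b_state))  # prints 1.0

# illustrative "reduced spectral space": NAM-like phoneme means are
# compressed toward each other, shrinking their pairwise distances
normal_d = symmetric_distance(([0.0], [1.0]), ([4.0], [1.0]))
nam_d = symmetric_distance(([0.0], [1.0]), ([1.5], [1.0]))
print(normal_d > nam_d)  # prints True
```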