Investigation of Combined Use of MFCC and LPC Features in Speech Recognition Systems

The paper states the automatic speech recognition problem and
describes the purpose of speech recognition and its fields of
application. Using Azerbaijani speech as the working example, the
principles of building a speech recognition system and the problems
that arise in such a system are investigated.

The algorithms for computing speech features, which form the main
part of a speech recognition system, are analyzed. To this end,
algorithms are developed for determining the Mel Frequency Cepstral
Coefficients (MFCC) and the Linear Predictive Coding (LPC)
coefficients, which express the basic features of speech. Combined
use of MFCC and LPC cepstral coefficients is proposed to improve the
reliability of the speech recognition system. For this purpose, the
recognition system is divided into an MFCC-based and an LPC-based
recognition subsystem. Training and recognition are carried out
separately in each subsystem, and the system accepts a decision only
when both subsystems produce the same result. This reduces the error
rate during recognition.

Training and recognition in the automatic speech recognition system
are implemented with artificial neural networks. The neural networks
are trained by the conjugate gradient method. The paper investigates
the problems related to the number of speech features that are
observed when training the neural networks of the MFCC-based and
LPC-based recognition subsystems.

The variation in the results of neural networks trained from
different initial points is analyzed. A methodology for the combined
use of neural networks trained from different initial points is
proposed to improve the reliability and recognition quality of the
system, and the practical results obtained are presented.
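
The agreement rule between the MFCC-based and LPC-based subsystems can
be illustrated with a minimal Python sketch. It assumes that each
subsystem's neural network outputs one score per vocabulary word and
that the highest score is taken as that subsystem's answer. The
function names and the score format are hypothetical and chosen here
for illustration only; they are not taken from the paper.

```python
from typing import Optional, Sequence


def best_label(scores: Sequence[float]) -> int:
    """Return the index of the highest score (assumed to be the recognized word)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def combined_decision(mfcc_scores: Sequence[float],
                      lpc_scores: Sequence[float]) -> Optional[int]:
    """Accept a word only if both subsystems pick the same one.

    Returns the agreed word index, or None when the subsystems disagree,
    in which case the utterance is rejected rather than risking an error.
    """
    mfcc_label = best_label(mfcc_scores)  # answer of the MFCC-based subsystem
    lpc_label = best_label(lpc_scores)    # answer of the LPC-based subsystem
    return mfcc_label if mfcc_label == lpc_label else None


# Both subsystems choose word 1, so the decision is accepted.
agreed = combined_decision([0.1, 0.8, 0.1], [0.2, 0.7, 0.1])

# The subsystems disagree (word 1 vs. word 0), so the utterance is rejected.
rejected = combined_decision([0.1, 0.8, 0.1], [0.9, 0.05, 0.05])
```

Under this rule a wrong word is accepted only when both subsystems make
the same mistake; the price is that some utterances are rejected when
the subsystems disagree.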



