Efficient DTW-Based Speech Recognition System for Isolated Words of Arabic Language

Although Arabic is one of the most widely spoken languages in the world, Arabic speech recognition has received little research attention relative to languages such as English and Japanese. Digital speech processing and voice recognition algorithms are central to designing automatic speech recognition systems that are efficient, accurate, and fast. The speech recognition process in this paper is divided into three stages. First, the signal is preprocessed to reduce noise effects; it is then digitized and hearingized, and the voice activity regions are segmented using a voice activity detection (VAD) algorithm. Second, features are extracted from the speech signal using the Mel-frequency cepstral coefficients (MFCC) algorithm, and delta and acceleration (delta-delta) coefficients are added to improve recognition accuracy. Finally, each test word's features are compared with the training database using the dynamic time warping (DTW) algorithm. With the best parameter settings found for these techniques, the proposed system achieved a recognition rate of about 98.5%, outperforming other HMM- and ANN-based approaches available in the literature.
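The feature and matching stages described above can be illustrated with a minimal sketch. It assumes regression-based delta coefficients with a window of two frames, a Euclidean frame distance, the basic three-direction DTW step pattern, and one stored template per word; none of these settings is stated in the abstract, and the names `add_deltas`, `dtw_distance`, and `classify`, as well as the toy one-dimensional "features", are hypothetical.

```python
import math


def add_deltas(frames, width=2):
    """Append regression-based delta coefficients to each feature vector.

    Assumed formula: d_t = sum_k k*(c[t+k] - c[t-k]) / (2 * sum_k k^2),
    with edge frames repeated where the window runs off the sequence.
    """
    n_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n_frames):
        delta = []
        for c in range(len(frames[t])):
            num = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, n_frames - 1)][c]  # clamp at last frame
                prv = frames[max(t - k, 0)][c]             # clamp at first frame
                num += k * (nxt - prv)
            delta.append(num / denom)
        out.append(list(frames[t]) + delta)
    return out


def frame_dist(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Cumulative DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat frame of seq_b
                                 cost[i][j - 1],      # repeat frame of seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(test_seq, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(test_seq, templates[label]))


# Toy data: the test sequence is a time-stretched copy of the "WAHID" template.
templates = {
    "WAHID": [[0.0], [1.0], [2.0], [1.0]],
    "SITTA": [[3.0], [3.0], [0.0], [0.0]],
}
test = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]
print(classify(test, templates))  # prints WAHID

# Delta of a linear ramp is 1.0 at an interior frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(add_deltas(ramp)[2])  # prints [2.0, 1.0]
```

In a fuller pipeline, `add_deltas` would be applied once to the MFCC frames to obtain delta features and a second time to the delta part to obtain acceleration features, before both test and template sequences are passed to `classify`.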




TABLE I
RECOGNITION RATES FOR DIFFERENT FEATURE SETS

Tested Word        Transcription   English Writing    Approach#1:   Approach#2:   Approach#3:
(Arabic Writing)                                      VAD+MFCC      VAD+MFCC+Δ    VAD+MFCC+Δ+ΔΔ
واحد               WAHID           ONE                85.7%         100%          100%
اثنان              ITHNAN          TWO                100%          100%          100%
ثلاثة              THALATHA        THREE              100%          100%          100%
أربعة              ARBAA           FOUR               100%          100%          100%
خمسة               KHAMSA          FIVE               100%          100%          100%
ستة                SITTA           SIX                85.7%         85.7%         85.7%
سبعة               SABAA           SEVEN              100%          100%          100%
ثمانية             THAMANIYA       EIGHT              100%          100%          100%
تسعة               TISAA           NINE               100%          100%          100%
عشرة               ASHRA           TEN                85.7%         100%          100%
السلام             ASSALAAMU       PEACE              100%          100%          100%
عليكم              ALAIKUM         UPON YOU           100%          100%          100%
كيف                KEEF            HOW                100%          100%          100%
حالك               HALAK           ARE YOU            85.7%         85.7%         85.7%
ما                 MA              WHAT               100%          100%          100%
اسمك               ESMOK           YOUR NAME          100%          100%          100%
كم                 KAM             HOW                85.7%         85.7%         100%
عمرك               OMROK           YOUR AGE           100%          100%          100%
مهنتك              MEHNATOK        YOUR OCCUPATION    100%          100%          100%
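
As a quick check on how Table I relates to the overall figure quoted in the abstract, the per-word rates can be averaged for each approach. The sketch below assumes the overall rate is an unweighted mean over the 19 tested words, which the text does not state explicitly; the variable names are illustrative only.

```python
# Per-word recognition rates (%) transcribed from Table I:
# (transcription, VAD+MFCC, VAD+MFCC+delta, VAD+MFCC+delta+delta-delta)
rows = [
    ("WAHID", 85.7, 100, 100), ("ITHNAN", 100, 100, 100),
    ("THALATHA", 100, 100, 100), ("ARBAA", 100, 100, 100),
    ("KHAMSA", 100, 100, 100), ("SITTA", 85.7, 85.7, 85.7),
    ("SABAA", 100, 100, 100), ("THAMANIYA", 100, 100, 100),
    ("TISAA", 100, 100, 100), ("ASHRA", 85.7, 100, 100),
    ("ASSALAAMU", 100, 100, 100), ("ALAIKUM", 100, 100, 100),
    ("KEEF", 100, 100, 100), ("HALAK", 85.7, 85.7, 85.7),
    ("MA", 100, 100, 100), ("ESMOK", 100, 100, 100),
    ("KAM", 85.7, 85.7, 100), ("OMROK", 100, 100, 100),
    ("MEHNATOK", 100, 100, 100),
]

# Unweighted mean per approach (assumption: every word counts equally).
means = [round(sum(r[col] for r in rows) / len(rows), 1) for col in (1, 2, 3)]
print(means)  # prints [96.2, 97.7, 98.5]
```

Under this assumption the third feature set averages about 98.5%, which agrees with the recognition rate reported in the abstract, while the two smaller feature sets average about 96.2% and 97.7%.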