Speaker Independent Quranic Recognizer Based on Maximum Likelihood Linear Regression

An automatic speech recognition system for the formal Arabic language is needed. The Quran is the most formal spoken book in Arabic, and it is recited all over the world. In this research, a speaker-independent automatic speech recognizer for Quranic recitation was developed and tested. The system is based on tri-phone Hidden Markov Models (HMMs) and Maximum Likelihood Linear Regression (MLLR). MLLR computes a set of transformations that reduces the mismatch between an initial model set and the adaptation data. It uses a regression class tree and estimates a set of linear transformations for the mean and variance parameters of a Gaussian mixture HMM system. The 30th Chapter of the Quran, recited by five of the most famous readers of the Quran, was used for training and testing. The chapter includes about 2000 distinct words. The advantages of using Quranic verses as the database for this recognizer are the uniqueness of the words and the high level of orderliness between verses. The accuracy on the test data ranged from 68% to 85%.
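As an illustration of the mean transformation that MLLR applies, the following is a minimal sketch, not the system described here. It assumes the commonly used formulation in which each Gaussian mean mu is replaced by W times the extended vector [1, mu], and every Gaussian tied to the same regression class shares one W. The function names and the numbers are hypothetical, and the estimation of W from adaptation data is not shown.

```python
def adapt_mean(W, mu):
    """Apply an MLLR-style mean transform: mu_hat = W * [1, mu_1, ..., mu_n]."""
    xi = [1.0] + list(mu)  # extended mean vector; the leading 1 carries the bias
    if any(len(row) != len(xi) for row in W):
        raise ValueError("each row of W must have len(mu) + 1 entries")
    return [sum(w * x for w, x in zip(row, xi)) for row in W]


def adapt_class(W, means):
    """Adapt every Gaussian mean tied to one regression class with the same W."""
    return [adapt_mean(W, mu) for mu in means]


# Toy 2-dimensional example with made-up numbers (assumption, not real data).
W = [
    [0.5, 1.0, 0.0],   # bias 0.5, identity on dimension 1
    [-1.0, 0.0, 2.0],  # bias -1.0, scale dimension 2 by 2
]
means = [[1.0, 2.0], [0.0, -1.0]]
adapted = adapt_class(W, means)
print(adapted)
```

Sharing one transform across a regression class is what lets a small amount of adaptation data update many Gaussians at once; a regression class tree decides how finely the Gaussians are grouped.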



