Efficient System for Speech Recognition using General Regression Neural Network

In this paper we present an efficient system for speaker-independent speech recognition based on a neural network approach. The proposed architecture comprises two phases: a preprocessing phase, which consists of segmental normalization and feature extraction, and a classification phase, which uses a neural network based on nonparametric density estimation, namely the general regression neural network (GRNN). The performance of the proposed model is compared with that of similar recognition systems, which we also implemented, based on the Multilayer Perceptron (MLP), the Recurrent Neural Network (RNN), and the well-known discrete Hidden Markov Model (HMM-VQ, also referred to here as DHMM). Experimental results obtained on Arabic digits show that nonparametric density estimation with an appropriate smoothing factor (spread) improves the generalization power of the neural network. The word error rate (WER) is significantly reduced relative to the baseline HMM method. The GRNN is therefore a successful alternative to the other neural networks and to the DHMM.
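
To illustrate how a GRNN classifies with a smoothing factor, the following is a minimal sketch of the standard formulation introduced by Specht (1991): the output is a Gaussian-kernel-weighted average of the training targets, and the spread controls how far each training pattern's influence reaches. It is not the system evaluated in this paper. The function names (`grnn_predict`, `grnn_classify`), the default spread value, and the two-dimensional toy vectors are assumptions made for illustration only; the feature vectors used in the actual experiments are not given here.

```python
import math


def grnn_predict(x, train_x, train_y, spread=0.5):
    """Kernel-weighted average of training targets (standard GRNN form).

    x        : query feature vector (list of floats)
    train_x  : list of stored training feature vectors
    train_y  : list of target vectors, one per training vector
    spread   : smoothing factor (sigma of the Gaussian kernel)
    """
    # Squared Euclidean distance from the query to every stored pattern.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_x]
    # Subtract the smallest distance before exponentiating; this rescales
    # all weights by a common factor (which cancels in the ratio below)
    # and keeps the denominator from underflowing to zero.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / (2.0 * spread ** 2)) for d in d2]
    total = sum(weights)
    n_out = len(train_y[0])
    return [
        sum(w * y[k] for w, y in zip(weights, train_y)) / total
        for k in range(n_out)
    ]


def grnn_classify(x, train_x, train_labels, n_classes, spread=0.5):
    """Classify by regressing one-hot targets and taking the largest output."""
    one_hot = [
        [1.0 if label == c else 0.0 for c in range(n_classes)]
        for label in train_labels
    ]
    scores = grnn_predict(x, train_x, one_hot, spread)
    return max(range(n_classes), key=lambda c: scores[c])


# Toy data (hypothetical): two classes in a 2-D feature space.
toy_x = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
toy_labels = [0, 0, 1, 1]

print(grnn_classify([0.05, 0.0], toy_x, toy_labels, n_classes=2))
print(grnn_classify([0.95, 1.0], toy_x, toy_labels, n_classes=2))
```

In this sketch a small spread makes the estimate follow the nearest stored patterns closely, while a very large spread flattens it toward the average of all targets. This is why the choice of spread affects generalization.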



