A New Vector Quantization Front-End Process for Discrete HMM Speech Recognition System

This paper presents a complete discrete statistical framework based on a novel vector quantization (VQ) front-end process. The new VQ approach distributes the VQ codebook components optimally over the HMM states. This technique, which we call distributed vector quantization (DVQ) of hidden Markov models, unifies the acoustic micro-structure and the phonetic macro-structure during the estimation of the HMM parameters. The DVQ technique is implemented in two variants. The first variant uses the K-means algorithm (K-means-DVQ) to optimize the VQ, while the second exploits the classification behavior of neural networks (NN-DVQ) for the same purpose. Both variants are compared with an HMM-based baseline system in recognition experiments on specific Arabic consonants. The results show that the distributed vector quantization technique increases the performance of the discrete HMM system.
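To make the role of the VQ front end concrete, the following is a minimal sketch of conventional K-means vector quantization, the step that turns continuous feature vectors into the discrete symbols a discrete HMM observes. It is not the DVQ procedure itself, whose distribution of codewords over HMM states is not detailed in this abstract. The function names, the toy two-dimensional frames, and the deterministic initialization are assumptions made for illustration only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index of the codeword nearest to `vector`."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(vector, codebook[k]))


def train_codebook(vectors, size, iterations=20):
    """Plain K-means codebook training (hypothetical helper, not DVQ).

    Initialization picks evenly spaced training vectors so that the
    result is reproducible; this is an assumption of the sketch.
    """
    step = len(vectors) // size
    codebook = [list(vectors[k * step]) for k in range(size)]
    for _ in range(iterations):
        # Assignment step: group each vector with its nearest codeword.
        clusters = [[] for _ in range(size)]
        for v in vectors:
            clusters[quantize(v, codebook)].append(v)
        # Update step: move each codeword to the mean of its cluster.
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


# Toy "acoustic frames" forming two well-separated groups.
frames = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
codebook = train_codebook(frames, size=2)

# Each frame becomes a discrete symbol: the index of its nearest codeword.
symbols = [quantize(f, codebook) for f in frames]
print(symbols)  # [0, 0, 0, 1, 1, 1]
```

In a discrete HMM system, a symbol sequence such as `symbols` would replace the raw feature vectors as the observation sequence used to estimate the state-dependent emission probabilities.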




