Hybrid Modeling Algorithm for Continuous Tamil Speech Recognition

In this paper, a hybrid modeling algorithm combining Fuzzy C-Means (FCM) clustering with an Expectation-Maximization Gaussian Mixture Model (EM-GMM) is proposed for continuous Tamil speech recognition. Speech sentences from various speakers are used in the training and testing phases, and objective measures are computed for both the proposed and existing continuous speech recognition algorithms. The simulation results show that the proposed algorithm improves recognition accuracy and F-measure by up to 3% compared with the existing algorithms for speech signals from various speakers. In addition, it reduces the Word Error Rate, Error Rate, and Error by up to 4% compared with the existing algorithms. In all respects, the proposed hybrid modeling for Tamil speech recognition provides significant improvements for speech-to-text conversion in various applications.
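The core of the proposed hybrid scheme, FCM soft clustering used to seed an EM-trained Gaussian mixture, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the diagonal-covariance GMM, the iteration counts, and the use of FCM centers as the EM initialization are assumptions for the sketch; in practice the input rows would be acoustic feature vectors such as MFCCs.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=50, seed=0):
    # Fuzzy C-Means: returns cluster centers (c x d) and the fuzzy
    # membership matrix U (n x c), whose rows sum to 1.
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances of every sample to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.fmax(d, 1e-12) ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

def em_gmm(X, means, n_iter=50):
    # EM for a diagonal-covariance GMM, initialized with the FCM centers
    # (hypothetical coupling of the two stages, assumed for this sketch).
    c, d = means.shape
    weights = np.full(c, 1.0 / c)
    var = np.tile(X.var(axis=0), (c, 1))
    for _ in range(n_iter):
        # E-step: per-sample responsibilities under each Gaussian
        log_p = -0.5 * (((X[:, None, :] - means[None]) ** 2 / var[None]).sum(-1)
                        + np.log(var).sum(-1)[None] + d * np.log(2 * np.pi))
        log_p += np.log(weights)[None]
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        R = np.exp(log_p)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances
        Nk = R.sum(axis=0)
        weights = Nk / len(X)
        means = (R.T @ X) / Nk[:, None]
        var = np.fmax((R.T @ X ** 2) / Nk[:, None] - means ** 2, 1e-6)
    return weights, means, var
```

In a recognizer, the resulting mixture likelihoods would score feature frames against the trained acoustic models; here the FCM stage simply gives EM a data-driven starting point instead of a random one.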




