Recognition of Isolated Speech Signals using Simplified Statistical Parameters

We present a novel scheme for recognizing isolated speech signals using statistical parameters derived from those signals. To reduce computational complexity, the statistical estimates are computed from extracted signal information rather than from the original signal. After the speech signal has been separated from ambient noise, subtle details of these estimates are first exploited to distinguish polysyllabic words from monosyllabic ones. Precise recognition of each distinct word is then carried out by analyzing the histogram obtained from this information.
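
The abstract does not specify how the speech is extracted, which statistics are used, or how histograms are compared. The Python sketch below is therefore only a hypothetical illustration of how a pipeline with this overall shape could be arranged. It is not the authors' method. Every choice in it is an assumption: short-time energy thresholding against a noise floor estimated from leading frames (`active_frames`, `extract_speech`), counting energy bursts as a rough syllable proxy (`count_bursts`), a normalized amplitude histogram of the extracted speech (`amplitude_histogram`), and nearest-template matching by L1 distance (`recognize`). All function names and parameter values are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative assumptions only; the paper does not give these values.
FRAME_LEN = 160         # samples per frame (20 ms at an assumed 8 kHz rate)
NOISE_FRAMES = 5        # leading frames assumed to contain only ambient noise
THRESHOLD_FACTOR = 4.0  # a frame counts as speech if energy > factor * noise floor
N_BINS = 16             # histogram bins over peak-normalized amplitude [-1, 1]


def frame_energies(signal: Sequence[float], frame_len: int = FRAME_LEN) -> List[float]:
    """Short-time energy of consecutive, non-overlapping frames."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def active_frames(signal: Sequence[float]) -> List[bool]:
    """Mark frames whose energy exceeds a multiple of the estimated noise floor."""
    energies = frame_energies(signal)
    head = energies[:NOISE_FRAMES]
    floor = sum(head) / len(head) if head else 0.0
    return [e > THRESHOLD_FACTOR * floor for e in energies]


def extract_speech(signal: Sequence[float]) -> List[float]:
    """Concatenate the samples of all active frames (a crude noise removal)."""
    out: List[float] = []
    for k, active in enumerate(active_frames(signal)):
        if active:
            out.extend(signal[k * FRAME_LEN:(k + 1) * FRAME_LEN])
    return out


def count_bursts(signal: Sequence[float]) -> int:
    """Count runs of consecutive active frames as a rough syllable count."""
    bursts, prev = 0, False
    for active in active_frames(signal):
        if active and not prev:
            bursts += 1
        prev = active
    return bursts


def amplitude_histogram(samples: Sequence[float], n_bins: int = N_BINS) -> List[float]:
    """Histogram of peak-normalized amplitudes, normalized to sum to 1."""
    hist = [0.0] * n_bins
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return hist
    for x in samples:
        b = int((x / peak + 1.0) / 2.0 * n_bins)  # map [-1, 1] onto [0, n_bins]
        hist[min(b, n_bins - 1)] += 1.0           # clamp x == peak into last bin
    total = sum(hist)
    return [h / total for h in hist]


def recognize(signal: Sequence[float],
              templates: Dict[str, Tuple[bool, List[float]]]) -> str:
    """Return the template word of the same mono/poly class whose histogram
    is closest in L1 distance. `templates` maps word -> (is_polysyllabic, histogram)."""
    is_poly = count_bursts(signal) > 1
    hist = amplitude_histogram(extract_speech(signal))
    candidates = {w: h for w, (p, h) in templates.items() if p == is_poly}
    if not candidates:  # fall back to all templates if the class is empty
        candidates = {w: h for w, (_, h) in templates.items()}
    return min(candidates,
               key=lambda w: sum(abs(a - b) for a, b in zip(hist, candidates[w])))
```

In use, `templates` would be built from one reference recording per vocabulary word, and `recognize(signal, templates)` would then be called on each new utterance. Filtering by the monosyllabic/polysyllabic decision before comparing histograms mirrors the two-stage structure described above and reduces the number of histogram comparisons per utterance.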