Recognition by Online Modeling – a New Approach of Recognizing Voice Signals in Linear Time

This work presents a novel means of extracting fixed-length parameters from voice signals, such that words can be recognized in linear time. The power and the zero-crossing rate are first calculated segment by segment from a voice signal, which yields two feature sequences. We then fit an FIR (finite impulse response) system relating these two sequences. The parameters of this FIR system, which serve as the input to a multilayer perceptron recognizer, can be derived by recursive LSE (least-squares estimation), so the complexity of the overall process is linear in the signal length. In the second part of this work, we introduce a weighting factor λ that emphasizes recent input, which extends the method to continuous speech signals. The experiments use voice signals of the digits zero to nine, spoken in Mandarin Chinese. The results indicate that the proposed method recognizes these voice signals efficiently and accurately.
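The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the non-overlapping segment length, the FIR order, the initialization constant delta, and the choice of which feature sequence drives the FIR system (input x, target d) are all assumptions made for illustration. The recursion is the standard recursive least-squares update with a weighting (forgetting) factor lam, where lam = 1 weights all samples equally and lam < 1 emphasizes recent input.

```python
import numpy as np

def frame_features(signal, frame_len=256):
    """Per-segment power and zero-crossing rate (non-overlapping segments, an assumption)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)
    # A zero crossing is a sign change between adjacent samples.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return power, zcr

def rls_fir(x, d, order=4, lam=1.0, delta=1e3):
    """Recursively fit FIR taps w so that d[n] ~ sum_k w[k] * x[n - k].

    Each update costs O(order^2), independent of the sequence length,
    so the total cost grows linearly with the number of segments.
    """
    w = np.zeros(order)
    P = delta * np.eye(order)      # inverse correlation estimate (large = weak prior)
    u = np.zeros(order)            # most recent inputs, newest first
    for xn, dn in zip(x, d):
        u = np.roll(u, 1)
        u[0] = xn
        Pu = P @ u
        k = Pu / (lam + u @ Pu)    # gain vector
        e = dn - w @ u             # a priori prediction error
        w = w + k * e
        P = (P - np.outer(k, Pu)) / lam
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    voice = rng.standard_normal(256 * 50)   # stand-in for a recorded word
    power, zcr = frame_features(voice)
    # Assumed direction: zero-crossing rate as input, power as target.
    taps = rls_fir(zcr, power, order=4, lam=0.98)
    print(taps.shape)                        # fixed length, whatever the signal length
```

Whatever the duration of the input signal, the returned tap vector has a fixed length (here 4), which is what allows it to be passed to a fixed-size recognizer such as a multilayer perceptron.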



