Recognition by Online Modeling – a New Approach of Recognizing Voice Signals in Linear Time
This work presents a novel means of extracting fixed-length parameters from voice signals, such that words can be recognized
in linear time. The power and the zero-crossing rate are first
calculated segment by segment from a voice signal, which yields two
feature sequences. We then construct an FIR system
across these two sequences. The parameters of this FIR system, used
as the input of a multilayer perceptron recognizer, can be derived by
recursive LSE (least-squares estimation), implying that the complexity of the overall process is linear in the signal size. In the second part of
this work, we introduce a weighting factor λ to emphasize recent
input, which allows us to further recognize continuous speech signals.
Experiments employ the voice signals of the numbers zero to nine, spoken in Mandarin Chinese. The proposed method is verified to
recognize voice signals efficiently and accurately.
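
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: frames are non-overlapping, the zero-crossing-rate sequence is treated as the FIR input and the power sequence as the FIR output, the filter order is chosen arbitrarily, and the names `frame_features` and `rls_fir` are hypothetical. Each frame is visited once and each update costs a fixed amount of work for a fixed filter order, so the total cost grows linearly with the signal length.

```python
def frame_features(signal, frame_len):
    """Return (power, zcr) sequences, one value per non-overlapping frame.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    powers, zcrs = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Mean squared amplitude of the frame.
        powers.append(sum(x * x for x in frame) / frame_len)
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcrs.append(crossings / (frame_len - 1))
    return powers, zcrs


def rls_fir(u, y, order, lam=1.0, delta=1e4):
    """Recursively fit y[n] ~ sum_k w[k] * u[n - k] for k = 0 .. order - 1.

    lam is the weighting (forgetting) factor: lam = 1 weights all samples
    equally, lam < 1 emphasizes recent input. delta scales the initial
    inverse-correlation matrix (an assumed, conventional initialization).
    """
    w = [0.0] * order
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for n in range(len(y)):
        # Regressor of the most recent inputs, zero-padded before the start.
        x = [u[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(x[i] * Px[i] for i in range(order))
        gain = [v / denom for v in Px]
        # A-priori prediction error for the current sample.
        err = y[n] - sum(w[i] * x[i] for i in range(order))
        w = [w[i] + gain[i] * err for i in range(order)]
        # P <- (P - gain * x^T P) / lam, using the symmetry of P.
        P = [
            [(P[i][j] - gain[i] * Px[j]) / lam for j in range(order)]
            for i in range(order)
        ]
    return w


# Usage: the returned coefficient list has a fixed length (the filter order)
# regardless of how long the signal is, so it could serve as a classifier input.
samples = [((i * 37) % 17 - 8) / 8.0 for i in range(400)]  # toy signal
power_seq, zcr_seq = frame_features(samples, frame_len=20)
coefficients = rls_fir(zcr_seq, power_seq, order=4, lam=0.98)
```

With `lam` below 1, older frames are discounted geometrically, which is one common way to let the estimate track a changing signal.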
@article{"International Journal of Information, Control and Computer Sciences:52776",
  author = "Jyh-Da Wei and Hsin-Chen Tsai",
  title = "Recognition by Online Modeling – a New Approach of Recognizing Voice Signals in Linear Time",
  abstract = "This work presents a novel means of extracting fixed-length parameters from voice signals, such that words can be recognized in linear time. The power and the zero-crossing rate are first calculated segment by segment from a voice signal, which yields two feature sequences. We then construct an FIR system across these two sequences. The parameters of this FIR system, used as the input of a multilayer perceptron recognizer, can be derived by recursive LSE (least-squares estimation), implying that the complexity of the overall process is linear in the signal size. In the second part of this work, we introduce a weighting factor λ to emphasize recent input, which allows us to further recognize continuous speech signals. Experiments employ the voice signals of the numbers zero to nine, spoken in Mandarin Chinese. The proposed method is verified to recognize voice signals efficiently and accurately.",
  keywords = "Speech Recognition, FIR system, Recursive LSE, Multilayer Perceptron",
  volume = "5",
  number = "5",
  pages = "447-4",
}