Formant Tracking Linear Prediction Model using HMMs for Noisy Speech Processing

This paper presents a formant-tracking linear prediction (FTLP) model for speech processing in noise. The main focus of this work is the detection of formant trajectories based on hidden Markov models (HMMs), for improved formant estimation in noise. The proposed approach provides a systematic framework for modelling and exploiting a time sequence of spectral peaks that satisfies continuity constraints on the parameters; the peaks themselves are modelled by the LP parameters. Estimation of the formant-tracking LP model is composed of three stages: (1) a pre-cleaning multi-band spectral subtraction stage to reduce the effect of residual noise on the formants; (2) an estimation stage in which an initial estimate of the LP model of speech is obtained for each frame; and (3) a formant classification stage using probability models of the formants and Viterbi decoders. Evaluation results for the formant-tracking LP model, tested in a Gaussian white noise background, demonstrate that combining the initial noise reduction stage with formant tracking and variable-order LPC analysis results in a significant reduction in errors and distortions. Performance was evaluated on noisy natural vowels extracted from international French and English vocabulary speech signals at an SNR of 10 dB. In each case, the estimated formants are compared to reference formants.
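As a rough illustration of stages (2) and (3), the sketch below estimates LP coefficients per frame, takes the angles of the LP poles as formant candidates, and selects one candidate per frame with a Viterbi search. It is a minimal sketch under stated assumptions, not the model evaluated in this paper: the function names (`lpc`, `formant_candidates`, `viterbi_track`), the fixed LP order, and the Gaussian emission and transition scores are all hypothetical stand-ins for the formant probability models, and the multi-band spectral subtraction of stage (1) is omitted.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LP coefficients [1, a1, ..., a_order]."""
    x = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    # Solve the Toeplitz normal equations R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))

def formant_candidates(a, fs):
    """Frequencies in Hz of the LP poles in the upper half plane, sorted."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

def viterbi_track(candidates, mean, std, jump_std):
    """Pick one candidate per frame for a single formant.

    Assumed scoring (illustrative only): a Gaussian log-score around an
    expected formant frequency (mean, std) and a Gaussian log-score on the
    frame-to-frame frequency jump (jump_std) as the continuity constraint.
    Every frame must contain at least one candidate.
    """
    def log_gauss(x, mu, sigma):
        return -0.5 * ((x - mu) / sigma) ** 2

    score = log_gauss(candidates[0], mean, std)
    back = []
    for prev, cur in zip(candidates[:-1], candidates[1:]):
        # total[j, i]: best score ending in cur[j] when coming from prev[i]
        total = log_gauss(cur[:, None], prev[None, :], jump_std) + score[None, :]
        back.append(np.argmax(total, axis=1))
        score = np.max(total, axis=1) + log_gauss(cur, mean, std)

    # Backtrack from the best final candidate
    idx = int(np.argmax(score))
    path = [idx]
    for bp in reversed(back):
        idx = int(bp[idx])
        path.append(idx)
    path.reverse()
    return np.array([c[i] for c, i in zip(candidates, path)])
```

In use, each frame would be passed through `lpc` and `formant_candidates`, and the resulting per-frame candidate arrays handed to `viterbi_track` once per formant with a different expected frequency. The continuity term is what lets the tracker reject a spurious noise-induced peak that is plausible in isolation but inconsistent with neighbouring frames.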



