Improvement of MLLR Speaker Adaptation Using a Novel Method

This paper presents a speaker adaptation method called WMLLR, which is based on maximum likelihood linear regression (MLLR). In MLLR, a linear regression-based transform that adapts the HMM mean vectors is estimated so as to maximize the likelihood of the adaptation data. In WMLLR, the prior knowledge carried by the initial model is additionally incorporated into the adaptation. A series of speaker adaptation experiments is carried out on a database of 30 famous city names to investigate the effectiveness of the proposed method. Experimental results show that the WMLLR method outperforms the conventional MLLR method, especially when only a few utterances from a new speaker are available for adaptation.
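
For orientation, the conventional MLLR mean transform mentioned above can be written out as follows. This is a sketch of the standard formulation by Leggetter and Woodland, not of WMLLR itself, whose weighting scheme is not specified in this abstract. The symbols are assumptions introduced here for illustration: $\mu_m$ and $\sigma_{m,i}^2$ are the mean and the $i$-th diagonal variance of Gaussian $m$, $o_t$ is the adaptation observation at time $t$, $\gamma_m(t)$ is the occupation probability of Gaussian $m$ at time $t$, and a single regression class with diagonal covariances is assumed.

```latex
\begin{align}
  % Adapted mean: affine transform of the speaker-independent mean
  \hat{\mu}_m &= A\mu_m + b = W\xi_m,
  \qquad W = [\, b \;\; A \,], \quad \xi_m = [\, 1 \;\; \mu_m^{\top} \,]^{\top} \\
  % Maximum-likelihood objective over the adaptation data
  W^{*} &= \arg\max_{W} \sum_{t}\sum_{m} \gamma_m(t)\,
           \log \mathcal{N}\!\left(o_t;\, W\xi_m,\, \Sigma_m\right) \\
  % Row-wise closed form for diagonal covariances (w_i is the i-th row of W, as a column vector)
  w_i &= G_i^{-1} k_i, \\
  G_i &= \sum_{m} \frac{1}{\sigma_{m,i}^{2}}\, \xi_m \xi_m^{\top} \sum_{t} \gamma_m(t),
  \qquad
  k_i = \sum_{m} \frac{1}{\sigma_{m,i}^{2}}\, \xi_m \sum_{t} \gamma_m(t)\, o_{t,i}
\end{align}
```

When very few utterances are available, the sums defining $G_i$ and $k_i$ rest on little data and the estimate of $W$ becomes unreliable, which is the regime in which the abstract reports the largest gains for WMLLR.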
