Improvement of MLLR Speaker Adaptation Using a Novel Method
This paper presents a speaker adaptation
method called WMLLR, which is based on maximum likelihood linear
regression (MLLR). In MLLR, a linear regression-based transform
that adapts the HMM mean vectors is estimated to maximize the
likelihood of the adaptation data. In the proposed method, the prior
knowledge of the initial model is appropriately incorporated into the
adaptation. A series of speaker adaptation experiments is carried out on
a database of 30 famous city names to investigate the effectiveness of
the proposed method. Experimental results show that the WMLLR method
outperforms the conventional MLLR method, especially when only a few
utterances from a new speaker are available for adaptation.
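
For orientation, the conventional MLLR step that the abstract builds on can be sketched as follows. This is a minimal illustration and not the WMLLR method itself, whose weighting scheme is not described in this text. It assumes a single global regression class, diagonal covariances, and occupation probabilities that have already been computed. The function names `estimate_mllr_mean_transform` and `adapt_means` are hypothetical. Each adapted mean is W applied to the extended mean vector [1, mean], and each row of W is obtained from a small linear system.

```python
import numpy as np


def estimate_mllr_mean_transform(obs, gamma, means, variances):
    """Estimate one global MLLR mean transform W = [b A] (illustrative sketch).

    obs:       (T, n) adaptation feature vectors
    gamma:     (T, M) occupation probability of each Gaussian at each frame
    means:     (M, n) means of the initial (speaker-independent) model
    variances: (M, n) diagonal covariances of the initial model
    Returns W with shape (n, n + 1), so that adapted_mean = W @ [1, mean].
    """
    M, n = means.shape
    xi = np.hstack([np.ones((M, 1)), means])   # extended mean vectors, (M, n+1)
    occ = gamma.sum(axis=0)                    # total occupancy per Gaussian, (M,)
    obs_sum = gamma.T @ obs                    # occupancy-weighted observation sums, (M, n)

    W = np.zeros((n, n + 1))
    for i in range(n):
        inv_var = 1.0 / variances[:, i]
        # G = sum_m occ_m / var_{m,i} * xi_m xi_m^T
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # k = sum_m obs_sum_{m,i} / var_{m,i} * xi_m
        k = xi.T @ (obs_sum[:, i] * inv_var)
        # Row i of W maximizes the likelihood of the adaptation data.
        W[i] = np.linalg.solve(G, k)
    return W


def adapt_means(W, means):
    """Apply the transform to every mean: row m of the result is W @ [1, mean_m]."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

With very little adaptation data the matrix `G` can become poorly conditioned, so the maximum likelihood estimate of the transform becomes unreliable. That is the small-data regime in which the abstract reports the largest gain for WMLLR.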
@article{"International Journal of Electrical, Electronic and Communication Sciences:49638", author = "Ing-Jr Ding", title = "Improvement of MLLR Speaker Adaptation Using a Novel Method", abstract = "This paper presents a speaker adaptation
method called WMLLR, which is based on maximum likelihood linear
regression (MLLR). In MLLR, a linear regression-based transform
that adapts the HMM mean vectors is estimated to maximize the
likelihood of the adaptation data. In the proposed method, the prior
knowledge of the initial model is appropriately incorporated into the
adaptation. A series of speaker adaptation experiments is carried out on
a database of 30 famous city names to investigate the effectiveness of
the proposed method. Experimental results show that the WMLLR method
outperforms the conventional MLLR method, especially when only a few
utterances from a new speaker are available for adaptation.", keywords = "hidden Markov model, maximum likelihood linear regression, speech recognition, speaker adaptation.", volume = "3", number = "11", pages = "1907-6", }