A Hybrid GMM/SVM System for Text Independent Speaker Identification

This paper proposes a novel approach to text-independent speaker identification that combines statistical models with support vector machines. We describe and evaluate a hybrid scheme that draws on the advantages of both the generative and the discriminative modeling paradigms. Support vector machines (SVMs) are trained to divide the full speaker space into small subsets of speakers within a hierarchical tree structure. During testing, a speech token is first assigned to its corresponding group and is then evaluated using Gaussian mixture models (GMMs). Experimental results show that the proposed method significantly improves performance on the text-independent speaker identification task: we report reductions in identification error rate of up to 50% compared to the baseline statistical model.
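The two-stage decision described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the 2-D toy features and speaker centres are invented, a single linear SVM node (trained by subgradient descent on the hinge loss) stands in for the full hierarchical tree, a one-component diagonal Gaussian stands in for each speaker's GMM, and a token is routed to a group by summing the frame-level SVM decision values. All names (`CENTRES`, `GROUP_OF`, `identify`, and so on) are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical toy data: 4 speakers with made-up 2-D feature centres.
CENTRES = {0: (-3.0, -1.0), 1: (-3.0, 1.0), 2: (3.0, -1.0), 3: (3.0, 1.0)}
# Assumed grouping: one SVM node splits the speakers into two subsets.
GROUP_OF = {0: -1, 1: -1, 2: +1, 3: +1}

def sample(speaker, n):
    """Draw n synthetic feature frames for one speaker."""
    cx, cy = CENTRES[speaker]
    return [(random.gauss(cx, 0.5), random.gauss(cy, 0.5)) for _ in range(n)]

def train_linear_svm(points, labels, lam=0.01, eta=0.01, epochs=5):
    """Linear soft-margin SVM via stochastic subgradient descent on the hinge loss."""
    w, b = [0.0, 0.0], 0.0
    order = list(range(len(points)))
    for _ in range(epochs):
        random.shuffle(order)
        for i in order:
            x, y = points[i], labels[i]
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            # Gradient of the L2 regulariser shrinks w on every step.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1.0:  # hinge loss is active for this frame
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b

def fit_gaussian(points):
    """Diagonal Gaussian (the one-component special case of a GMM)."""
    n = len(points)
    mean = [sum(p[d] for p in points) / n for d in range(2)]
    var = [sum((p[d] - mean[d]) ** 2 for p in points) / n for d in range(2)]
    return mean, var

def log_likelihood(model, frames):
    """Sum of per-frame log densities under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for x in frames:
        for d in range(2):
            total -= 0.5 * (math.log(2 * math.pi * var[d])
                            + (x[d] - mean[d]) ** 2 / var[d])
    return total

# Training: one discriminative node plus one generative model per speaker.
train = {s: sample(s, 200) for s in CENTRES}
points = [x for s in train for x in train[s]]
labels = [GROUP_OF[s] for s in train for _ in train[s]]
w, b = train_linear_svm(points, labels)
models = {s: fit_gaussian(train[s]) for s in train}

def identify(frames):
    """Stage 1: SVM picks the speaker group. Stage 2: best likelihood within it."""
    score = sum(w[0] * x[0] + w[1] * x[1] + b for x in frames)
    group = 1 if score > 0 else -1
    candidates = [s for s, g in GROUP_OF.items() if g == group]
    return max(candidates, key=lambda s: log_likelihood(models[s], frames))

if __name__ == "__main__":
    for speaker in CENTRES:
        token = sample(speaker, 30)
        print(speaker, "->", identify(token))
```

In this sketch the generative stage only scores the speakers in the selected group, so a routing error in the first stage cannot be recovered in the second; how the actual system handles that trade-off is not stated in the abstract.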



