An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System
Speaker Identification (SI) is the task of establishing the
identity of an individual based on his/her voice characteristics. The SI
task is typically achieved by two-stage signal processing: training and
testing. The training process calculates speaker specific feature
parameters from the speech and generates speaker models
accordingly. In the testing phase, speech samples from unknown
speakers are compared with the models and classified. Even though
the performance of speaker identification systems has improved due to
recent advances in speech processing techniques, there is still room for
improvement. In this paper, a Closed-Set Text-Independent Speaker
Identification System (CISI) based on a Multiple Classifier System
(MCS) is proposed, using Mel Frequency Cepstrum Coefficients
(MFCC) for feature extraction and a suitable combination of Vector
Quantization (VQ) and a Gaussian Mixture Model (GMM), together
with the Expectation Maximization (EM) algorithm, for speaker
modeling. The use of a Voice Activity Detector (VAD) with a hybrid
approach based on Short Time Energy (STE) and Statistical
Modeling of Background Noise in the pre-processing step of the
feature extraction yields a better and more robust automatic speaker
identification system. In addition, using the Linde-Buzo-Gray (LBG)
clustering algorithm to initialize the GMM parameters estimated in
the EM step improved the convergence rate and the system's
performance. The system also uses a relative index as a confidence
measure when the GMM and VQ classifiers give contradictory
identifications. Simulations carried out on the voxforge.org
speech database using MATLAB highlight the efficacy of the
proposed method compared to earlier work.
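
The abstract gives no implementation details, so the following is only a minimal Python/NumPy sketch of one idea it names: building an LBG codebook and using it as the starting point for GMM parameters before EM. The function names (`nearest`, `lbg_codebook`, `gmm_init_from_codebook`), the additive split perturbation, the iteration count, and the variance floor are all assumptions made for illustration, not the authors' MATLAB code. `features` stands in for a matrix of MFCC vectors, one row per frame; the EM refinement itself is not shown.

```python
import numpy as np

def nearest(features, codebook):
    """Index of the closest codeword (squared Euclidean) for each frame."""
    dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def lbg_codebook(features, size, eps=0.01, n_iter=20):
    """Grow a VQ codebook by LBG binary splitting (size: a power of two)."""
    features = np.asarray(features, dtype=float)
    codebook = features.mean(axis=0, keepdims=True)
    # Assumed perturbation: a small fraction of the per-dimension spread.
    delta = eps * features.std(axis=0)
    while codebook.shape[0] < size:
        # Split every codeword into a perturbed pair, then refine.
        codebook = np.vstack([codebook + delta, codebook - delta])
        for _ in range(n_iter):
            labels = nearest(features, codebook)
            for k in range(codebook.shape[0]):
                members = features[labels == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

def gmm_init_from_codebook(features, codebook, var_floor=1e-3):
    """Derive initial GMM weights, means and diagonal variances from a codebook."""
    features = np.asarray(features, dtype=float)
    labels = nearest(features, codebook)
    n_comp, dim = codebook.shape
    counts = np.bincount(labels, minlength=n_comp)
    weights = counts / counts.sum()
    variances = np.full((n_comp, dim), var_floor)
    for k in range(n_comp):
        members = features[labels == k]
        if len(members) > 1:
            variances[k] = np.maximum(members.var(axis=0), var_floor)
    return weights, codebook.copy(), variances
```

The returned weights, means and variances would then serve as the initial parameters of an EM routine, in place of a random start. The codebook alone can also act as the VQ speaker model in a minimum-distortion classifier.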
@article{"International Journal of Information, Control and Computer Sciences:70754", author = "Cheima Ben Soltane and Ittansa Yonas Kelbesa", title = "An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System", keywords = "Feature Extraction, Speaker Modeling, Feature
Matching, Mel Frequency Cepstrum Coefficient (MFCC), Gaussian
mixture model (GMM), Vector Quantization (VQ), Linde-Buzo-Gray
(LBG), Expectation Maximization (EM), pre-processing, Voice
Activity Detection (VAD), Short Time Energy (STE), Background
Noise Statistical Modeling, Closed-Set Text-Independent Speaker
Identification System (CISI).", volume = "8", number = "10", pages = "1949-10", }