Speaker Identification Using Admissible Wavelet Packet Based Decomposition

Mel Frequency Cepstral Coefficient (MFCC) features are widely used as acoustic features for both speech recognition and speaker recognition. In the MFCC representation, the Mel frequency scale gives high resolution in the low-frequency region and low resolution in the high-frequency region. This kind of processing is good for obtaining stable phonetic information, but it is not suitable for speaker features located in the high-frequency regions. Speaker-specific information, which is non-uniformly distributed across the high frequencies, is equally important for speaker recognition. Based on this observation, we propose an admissible wavelet packet based filter structure for speaker identification. The multiresolution capabilities of the wavelet packet transform are used to derive the new features. The proposed scheme differs from previous wavelet-based work mainly in the design of the filter structure: unlike other approaches, the proposed filter structure does not follow the Mel scale. Closed-set speaker identification experiments on the TIMIT database show improved identification performance compared with other commonly used Mel-scale-based filter structures using wavelets.
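The abstract describes replacing the Mel filter bank with subbands selected from an admissible wavelet packet tree. Below is a minimal sketch of that idea. It assumes an orthonormal Haar filter pair, a hypothetical leaf set, and MFCC-style post-processing (log subband energy followed by a DCT-II). The abstract does not specify the authors' wavelet, tree, or feature normalization, so none of these choices should be read as theirs. A leaf path such as "dad" records the sequence of low-pass ("a") and high-pass ("d") splits. The tree is admissible when its leaves tile the whole band with no gaps or overlaps.

```python
import math
from fractions import Fraction


def haar_split(x):
    """One level of an orthonormal Haar analysis filter bank.

    Assumption: Haar is a stand-in; the abstract does not name the wavelet.
    Returns (approximation, detail), each half the length of x.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("segment length must be even and >= 2")
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return approx, detail


def check_admissible(leaves):
    """Check that leaf paths ('a' = low-pass child, 'd' = high-pass child)
    are the leaves of a full binary tree, so the subbands tile the band."""
    if any(set(p) - {"a", "d"} for p in leaves):
        raise ValueError("paths may only contain 'a' and 'd'")
    for p in leaves:
        for q in leaves:
            if p != q and q.startswith(p):
                raise ValueError(f"{p!r} is an ancestor of {q!r}")
    # Prefix-free with a Kraft sum of exactly 1 means a complete tree.
    if sum(Fraction(1, 2 ** len(p)) for p in leaves) != 1:
        raise ValueError("leaves do not cover the full band")


def wavelet_packet_decompose(x, leaves):
    """Split x along the admissible tree; returns {leaf_path: coefficients}."""
    leaf_set = set(leaves)
    check_admissible(leaf_set)

    def split(seg, path):
        if path in leaf_set:
            return {path: seg}
        approx, detail = haar_split(seg)
        out = split(approx, path + "a")
        out.update(split(detail, path + "d"))
        return out

    return split(list(x), "")


def subband_cepstra(frame, leaves, n_coeffs):
    """Log mean energy per subband, decorrelated with a DCT-II.

    Bands are taken in the order listed in `leaves`. Note that tree order is
    not necessarily ascending frequency: the high-pass branch of a wavelet
    packet reverses the spectrum of its children.
    """
    bands = wavelet_packet_decompose(frame, leaves)
    # Small floor avoids log(0) for silent subbands.
    log_e = [math.log(sum(v * v for v in bands[p]) / len(bands[p]) + 1e-12)
             for p in leaves]
    m_bands = len(log_e)
    return [sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / m_bands)
                for m in range(m_bands))
            for k in range(n_coeffs)]
```

The hypothetical leaf set `["a", "dd", "daa", "dad"]` keeps the lower half of the band as one subband and splits the upper half more finely, loosely mirroring the abstract's emphasis on high-frequency speaker information. For a 16-sample frame, `wavelet_packet_decompose(frame, ["a", "dd", "daa", "dad"])` returns 8, 4, 2 and 2 coefficients for the four subbands. Because the Haar filters are orthonormal, their total energy equals that of the frame.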
