Speaker Identification Using Admissible Wavelet Packet Based Decomposition
Mel Frequency Cepstral Coefficient (MFCC) features
are widely used as acoustic features for speech recognition as well
as speaker recognition. In the MFCC feature representation, the Mel frequency
scale gives high resolution in the low frequency region
and low resolution in the high frequency region. This kind of processing
is good for obtaining stable phonetic information, but is not suitable
for speaker features that are located in high frequency regions. The
speaker-specific information, which is non-uniformly distributed
in the high frequencies, is equally important for speaker recognition.
Based on this fact, we propose an admissible wavelet packet based
filter structure for speaker identification. The multiresolution capabilities
of the wavelet packet transform are used to derive the new features.
The proposed scheme differs from previous wavelet based work
mainly in the design of the filter structure. Unlike others, the proposed
filter structure does not follow the Mel scale. Closed-set speaker
identification experiments performed on the TIMIT database show
improved identification performance compared to other commonly
used Mel scale based filter structures using wavelets.
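
As a rough illustration of what an admissible wavelet packet filter structure means, the sketch below decomposes a signal along a binary tree in which every node has either zero or two children, then takes the log energy of each leaf sub-band. The abstract does not give the actual tree, the wavelet, or the feature post-processing, so everything specific here is an assumption: the Haar filters, the `example_tree` layout (which simply splits the upper half-band more finely than the lower one, unlike a Mel-like layout), and the log-energy step are hypothetical choices made only to keep the example short and self-contained.

```python
import math


def haar_split(x):
    """One two-channel Haar analysis step: returns (low, high) half-length bands."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return low, high


def admissible_wp(x, tree):
    """Decompose x along an admissible binary tree.

    `tree` is None for a leaf, or a pair (low_subtree, high_subtree) for a
    split node, so every node has zero or two children.
    Returns the leaf sub-bands in tree order (not sorted by frequency).
    """
    if tree is None:
        return [x]
    low, high = haar_split(x)
    return admissible_wp(low, tree[0]) + admissible_wp(high, tree[1])


def log_energies(bands):
    """Log of the mean energy in each sub-band (small offset avoids log(0))."""
    return [math.log(sum(v * v for v in b) / len(b) + 1e-12) for b in bands]


# Hypothetical tree, NOT the structure from the paper: the lower half-band is
# split once (2 leaves), the upper half-band is split twice (4 leaves).
example_tree = ((None, None), ((None, None), (None, None)))

# Toy 64-sample frame standing in for a windowed speech frame.
signal = [math.sin(2.0 * math.pi * 5.0 * n / 64.0) for n in range(64)]

bands = admissible_wp(signal, example_tree)
features = log_energies(bands)
print([len(b) for b in bands])
print(features)
```

Because the Haar filters are orthonormal, the total energy of the leaf coefficients equals the energy of the input frame, which is a convenient sanity check when experimenting with different tree shapes.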
[1] Z. Tufekci and J.N. Gowdy, "Feature extraction using discrete wavelet
transform for speech recognition," in Proc. IEEE Southeastcon, USA,
2000, pp. 116-123.
[2] O. Farooq and S. Datta, "Phoneme recognition using wavelet based
features," Information Sciences, vol. 150, no.1-2, Mar. 2003, pp. 5-15.
[3] O. Farooq and S. Datta, "Wavelet based robust sub-band features for
phoneme recognition," in IEE Proc. Image signal process., vol. 151, no.
3, June 2004, pp. 187-192.
[4] O. Farooq and S. Datta, "Mel filter like admissible wavelet packet
structure for speech recognition," IEEE Signal Process. Lett., vol. 8, no.
7, pp. 196-199, July 2001.
[5] R. Sarikaya and H. L. Hansen, "High resolution speech feature parameterization
for monophone-based stressed speech recognition," IEEE Signal
Process. Lett., vol. 7, no. 7, pp. 182-185, July 2000.
[6] R. Sarikaya, B. L. Pellom and H. L. Hansen, "Wavelet packet transform
features with application to speaker identification," in Proc. IEEE Nordic
Signal processing Symposium, Visgo, Denmark, 1998, pp. 81-84.
[7] C. T. Hsieh, E. Lai and Y. C. Wang, "Robust speech features based on
wavelet transform with application to speaker identification," in IEE Proc.
Image signal process., vol. 149, no. 2, April 2002, pp. 108-114.
[8] S.-Y. Lung, "Further reduced form of wavelet feature for text independent
speaker recognition," Pattern recognition, vol. 37, 2004, pp. 1565-1566.
[9] S.-Y. Lung, "Wavelet feature selection based neural networks with application
to the text independent speaker recognition," Pattern recognition,
vol. 39, 2006, pp. 1518-1521.
[10] H. M. Torres and H. L. Rufiner, "Automatic speaker identification by
means of Mel cepstrum, wavelets and wavelets packets," in Proc. IEEE
international conference, EMBS, Chicago, IL, 2002, pp. 978-981.
[11] Xugang Lu and Jianwu Dang, "An investigation of dependencies
between frequency components and speaker characteristics for text-independent
speaker identification," Speech communication, vol. 50,
2008, pp. 312-322.
[12] S. Hayakawa and F. Itakura, "Text-dependent speaker recognition using
the information in the higher frequency band," in Proc. IEEE international
conference on Acoustic Speech and signal Processing, ICASSP, Adelaide,
Australia, 1994, pp. 137-140.
[13] S. Mallat, A wavelet tour of signal processing. Second ed., Academic
Press, 1998.
[14] D.A. Reynolds and R.C. Rose, "Robust text-independent speaker identification
using Gaussian mixture speaker models," IEEE Trans. Speech
and Audio Processing, vol. 3, no. 1, pp. 72-83, Jan. 1995.
[15] K. Markov and S. Nakagawa, "Frame level likelihood normalization for
text-independent speaker identification using Gaussian mixture models,"
in Proc. IEEE ICSLP, 1996, pp. 1764-1767.
@article{"International Journal of Electrical, Electronic and Communication Sciences:52876",
  author = "Mangesh S. Deshpande and Raghunath S. Holambe",
  title = "Speaker Identification Using Admissible Wavelet Packet Based Decomposition",
  abstract = "Mel Frequency Cepstral Coefficient (MFCC) features
are widely used as acoustic features for speech recognition as well
as speaker recognition. In the MFCC feature representation, the Mel frequency
scale gives high resolution in the low frequency region
and low resolution in the high frequency region. This kind of processing
is good for obtaining stable phonetic information, but is not suitable
for speaker features that are located in high frequency regions. The
speaker-specific information, which is non-uniformly distributed
in the high frequencies, is equally important for speaker recognition.
Based on this fact, we propose an admissible wavelet packet based
filter structure for speaker identification. The multiresolution capabilities
of the wavelet packet transform are used to derive the new features.
The proposed scheme differs from previous wavelet based work
mainly in the design of the filter structure. Unlike others, the proposed
filter structure does not follow the Mel scale. Closed-set speaker
identification experiments performed on the TIMIT database show
improved identification performance compared to other commonly
used Mel scale based filter structures using wavelets.",
  keywords = "Speaker identification, Wavelet transform, Feature extraction, MFCC, GMM.",
  volume = "4",
  number = "1",
  pages = "88-4",
}