Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns

In this paper, we present a wavelet coefficient masking approach
based on Local Binary Patterns (WLBP) that enhances the
temporal spectra of the wavelet coefficients for speech enhancement.
This technique exploits the wavelet denoising scheme, which splits
the degraded speech into pyramidal subband components and extracts
frequency information without losing temporal information. In each
high-frequency subband, enhancement is performed with binary
labels produced by local binary pattern masking, which encodes the ratio
between the original value of each coefficient and the values of its
neighbouring coefficients. This approach enhances the high-frequency
spectra of the wavelet transform instead of eliminating them through
a threshold. A comparative analysis against conventional
speech enhancement algorithms shows that the proposed
technique achieves significant improvements in PESQ, the
ITU-T recommended objective measure for estimating
subjective speech quality. Informal listening tests also show that
the proposed method improves the perceived quality
of speech, avoiding the annoying musical noise present in other
speech enhancement techniques. Experimental results obtained with a
DNN-based speech recognizer in noisy environments corroborate the
superiority of the proposed scheme for robust speech
recognition.
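
The processing chain described above (pyramidal wavelet decomposition, LBP-based masking of each high-frequency subband, reconstruction) can be illustrated with a minimal sketch. It is not the WLBP algorithm itself: the abstract does not specify the wavelet, the neighbourhood size, or how the binary labels are turned into a mask. The Haar wavelet, the magnitude comparison used to form the labels, the majority rule, and the names `haar_dwt`, `haar_idwt`, `lbp_code`, `lbp_mask`, `enhance`, `half_width` and `floor_gain` are all assumptions chosen for illustration. The signal length is assumed to be divisible by 2**levels.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; returns (approx, detail)."""
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def lbp_code(coeffs, i, half_width=2):
    """1-D LBP code of coefficient i: one bit per neighbour.

    Assumption: a bit is set when the neighbour's magnitude is at least
    the centre's magnitude. Neighbours outside the subband count as zero.
    """
    centre = abs(coeffs[i])
    offsets = list(range(-half_width, 0)) + list(range(1, half_width + 1))
    code = 0
    for off in offsets:
        j = i + off
        neighbour = abs(coeffs[j]) if 0 <= j < len(coeffs) else 0.0
        code = (code << 1) | (1 if neighbour >= centre else 0)
    return code


def lbp_mask(coeffs, half_width=2, floor_gain=0.3):
    """Hypothetical mask: keep a coefficient when at most half of its
    neighbours dominate it, otherwise attenuate it (not zero it)."""
    gains = []
    for i in range(len(coeffs)):
        dominated = bin(lbp_code(coeffs, i, half_width)).count("1")
        gains.append(1.0 if dominated <= half_width else floor_gain)
    return gains


def enhance(signal, levels=2, half_width=2, floor_gain=0.3):
    """Decompose, mask every detail subband, and reconstruct."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        gains = lbp_mask(detail, half_width, floor_gain)
        details.append([g * d for g, d in zip(gains, detail)])
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


if __name__ == "__main__":
    noisy = [0.1, -0.2, 4.0, -3.5, 0.15, 0.05, -0.1, 0.2]
    print(enhance(noisy, levels=2))
```

Attenuating with `floor_gain` rather than zeroing follows the abstract's statement that high-frequency coefficients are enhanced instead of eliminated through a threshold. Setting `floor_gain=1.0` disables the mask and returns the input signal up to rounding error.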



