Speech Intelligibility Improvement Using Variable Level Decomposition DWT

Intelligibility is an essential characteristic of a speech
signal: it determines how well a listener can extract the information
the speech carries. Background noise in the environment can degrade
the intelligibility of recorded speech. In this paper, we present a
simple variance-subtracted, variable-level discrete wavelet transform
(DWT) method that improves speech intelligibility. The proposed
algorithm does not require an explicit estimate of the noise, i.e., no
prior knowledge of the noise; hence it is easy to implement and has a
low computational burden. The proposed algorithm selects a separate
decomposition level for each frame based on signal-dominant and
noise-dominant criteria. The performance of the proposed algorithm
is evaluated with the Short-Time Objective Intelligibility (STOI)
measure, and the results are compared with universal DWT
thresholding and Minimum Mean Square Error (MMSE) methods. The
experimental results show that the proposed scheme outperforms both
competing methods.
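The abstract does not give the exact decision rule, so the following is only a minimal sketch of the general idea: decompose each frame with a DWT, deepen the decomposition while the detail band looks noise-dominant, and soft-threshold the detail coefficients with Donoho's universal threshold. The Haar wavelet, the energy-based dominance test, and the function names (`denoise_frame`, `choose_level` logic inside it) are illustrative assumptions, not the paper's method.

```python
import math

def haar_dwt(x):
    # One level of Haar DWT: approximation and detail coefficients.
    approx = [(x[2*i] + x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    detail = [(x[2*i] - x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return approx, detail

def haar_idwt(approx, detail):
    # Inverse of one Haar DWT level.
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / math.sqrt(2))
        x.append((a - d) / math.sqrt(2))
    return x

def soft_threshold(coeffs, t):
    # Donoho's soft thresholding: shrink coefficients toward zero by t.
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise_frame(frame, max_level=4):
    # Noise estimate from the finest-scale details (median / 0.6745),
    # then the universal threshold t = sigma * sqrt(2 * log N).
    _, d1 = haar_dwt(frame)
    sigma = sorted(abs(c) for c in d1)[len(d1) // 2] / 0.6745
    t = sigma * math.sqrt(2 * math.log(len(frame)))

    # Per-frame level selection (illustrative heuristic): keep decomposing
    # while the detail band is noise-dominant; stop once its mean energy
    # rises above the noise floor, i.e. the band is signal-dominant.
    approx, details, level = list(frame), [], 0
    while level < max_level and len(approx) >= 2:
        approx, d = haar_dwt(approx)
        details.append(d)
        level += 1
        if sum(c * c for c in d) / len(d) > 2 * sigma * sigma:
            break

    # Threshold every detail band, then reconstruct from the deepest level.
    details = [soft_threshold(d, t) for d in details]
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

In this sketch the decomposition depth varies per frame, so a frame dominated by broadband noise is decomposed (and thresholded) more aggressively than a signal-dominant one, matching the paper's stated motivation of avoiding a fixed global level.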




References:
[1] P. C. Loizou, Speech Enhancement: Theory and Practice. Boca Raton,
FL, USA: CRC Press, 2007.
[2] Y. Ephraim and D. Malah, “Speech enhancement using a
Minimum-Mean Square Error Short-Time Spectral Amplitude
estimator,” IEEE Transactions on Acoustics, Speech, and Signal
Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
[3] S. G. Mallat, “A Theory for Multiresolution Signal Decomposition: The
Wavelet Representation,” IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 11, no. 7, pp. 674–693, 1989.
[4] G. Kim and P. C. Loizou, “Improving Speech Intelligibility in
Noise using Environment-Optimized Algorithms,” IEEE Transactions on
Audio, Speech, and Language Processing, vol. 18, no. 8, pp. 2080–2090,
2010.
[5] P. C. Loizou and G. Kim, “Reasons Why Current Speech-Enhancement
Algorithms do not Improve Speech Intelligibility and Suggested
Solutions,” IEEE Transactions on Audio, Speech, and Language
Processing, vol. 19, no. 1, pp. 47–56, 2011.
[6] D. Wang and J. Chen, “Supervised Speech Separation based on Deep
Learning: An Overview,” IEEE Transactions on Audio, Speech, and
Language Processing, vol. 26, no. 10, pp. 1702–1726, 2018.
[7] M. Kolbæk, Z.-H. Tan, and J. Jensen,
“Speech Intelligibility Potential of General and Specialized Deep Neural
Network based Speech Enhancement Systems,” IEEE Transactions on
Audio, Speech, and Language Processing, vol. 25, no. 1, pp. 153–167,
2017.
[8] S. Y. Low, D. S. Pham, and S. Venkatesh, “Compressive Speech
Enhancement,” Speech Communication, vol. 55, no. 6, pp. 757–768,
2013.
[9] M. Srivastava, C. L. Anderson, and J. H. Freed, “A New Wavelet
Denoising Method for Selecting Decomposition Levels and Noise
Thresholds,” IEEE Access, vol. 4, pp. 3862–3877, 2016.
[10] J. S. Garofolo et al., “Getting started with the DARPA TIMIT CD-ROM:
An acoustic-phonetic continuous speech database,” National Institute of
Standards and Technology (NIST), Gaithersburg, MD, vol. 107, pp.
1–6, 1988.
[11] A. Varga and H. J. Steeneken, “Assessment for Automatic Speech
Recognition: II. NOISEX-92: A Database and an Experiment to Study
the Effect of Additive Noise on Speech Recognition Systems,” Speech
Communication, vol. 12, no. 3, pp. 247–251, 1993.
[12] D. L. Donoho and I. M. Johnstone, “Ideal Spatial Adaptation by Wavelet
Shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994.
[13] D. L. Donoho, “De-noising by Soft-Thresholding,” IEEE Transactions on
Information Theory, vol. 41, no. 3, pp. 613–627, 1995.
[14] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An
Algorithm for Intelligibility Prediction of Time–Frequency Weighted
Noisy Speech,” IEEE Transactions on Audio, Speech, and Language
Processing, vol. 19, no. 7, pp. 2125–2136, 2011.