Speech Enhancement by Marginal Statistical Characterization in the Log Gabor Wavelet Domain

This work combines the Log Gabor Wavelet (LGW) transform with a Maximum a Posteriori (MAP) estimator as a speech enhancement method for reducing acoustic background noise. The probability density function (pdf) of the speech spectral amplitude is approximated by a Generalized Laplacian Distribution (GLD). Compared with earlier estimators, the proposed method models the underlying statistics more accurately through an appropriate choice of the GLD model parameters. Experimental results in two different noisy environments show that the proposed estimator achieves a larger improvement in Segmental Signal-to-Noise Ratio (S-SNR) and a lower Log-Spectral Distortion (LSD) than the other estimators tested.
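The abstract does not give the estimator in closed form. The following is therefore only a minimal sketch of the ingredients it names, under assumptions that are not taken from the paper. First, a radial log-Gabor transfer function in the form usually credited to Field. Second, a GLD pdf in the form p(x) = p / (2 s Gamma(1/p)) exp(-|x/s|^p), as commonly used for wavelet coefficients. Third, a MAP estimate of a single real-valued coefficient observed in additive white Gaussian noise, found by grid search. Fourth, a segmental SNR using the frequently used per-frame clipping to [-10, 35] dB. The paper itself models the spectral amplitude, so applying the prior to a real coefficient is a simplification. The helper names `log_gabor_response`, `gld_logpdf`, `map_gld` and `segmental_snr`, and all parameter values, are hypothetical.

```python
import math


def log_gabor_response(f, f0, sigma_ratio):
    """Radial log-Gabor transfer function.

    f0 is the centre frequency; sigma_ratio = sigma/f0 sets the bandwidth
    (smaller values give a wider band). ASSUMPTION: standard textbook form,
    not necessarily the filter bank used in the paper.
    """
    if f <= 0.0:
        return 0.0  # log-Gabor filters have no DC component by construction
    return math.exp(-(math.log(f / f0) ** 2) / (2.0 * math.log(sigma_ratio) ** 2))


def gld_logpdf(x, s, p):
    """Log of the generalized Laplacian pdf p/(2 s Gamma(1/p)) * exp(-|x/s|^p)."""
    return math.log(p) - math.log(2.0 * s) - math.lgamma(1.0 / p) - abs(x / s) ** p


def map_gld(y, noise_std, s, p, n_grid=2001):
    """Hypothetical MAP estimate of x from y = x + n, n ~ N(0, noise_std^2).

    ASSUMPTION: real-valued coefficient with a GLD prior of scale s and shape p.
    Maximises -(y - x)^2 / (2 noise_std^2) - |x/s|^p. The maximiser always lies
    between 0 and y, so a dense grid over that interval suffices, even for
    p < 1 where the objective is not concave.
    """
    best_x, best_val = 0.0, -math.inf
    for k in range(n_grid):
        x = y * k / (n_grid - 1)
        val = -((y - x) ** 2) / (2.0 * noise_std ** 2) - abs(x / s) ** p
        if val > best_val:
            best_x, best_val = x, val
    return best_x


def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB, each frame clipped to [lo, hi].

    ASSUMPTION: non-overlapping frames and clipping limits are a common
    convention, not values reported in the paper.
    """
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        sig = sum(v * v for v in s)
        err = sum((a - b) ** 2 for a, b in zip(s, e))
        if sig == 0.0:
            continue  # skip silent frames
        snr = hi if err == 0.0 else 10.0 * math.log10(sig / err)
        snrs.append(min(max(snr, lo), hi))
    return sum(snrs) / len(snrs) if snrs else float("nan")


if __name__ == "__main__":
    # Illustrative values only; none of them come from the paper.
    print(log_gabor_response(500.0, 1000.0, 0.55))
    print(map_gld(1.2, 0.5, 1.0, 0.8))
    print(segmental_snr([1.0] * 512, [0.9] * 512))
```

As a sanity check, setting p = 2 reduces the MAP rule to a Wiener-type linear gain, and p = 1 reduces it to soft thresholding with threshold noise_std^2 / s. Values of p below 1 give heavier-tailed (super-Gaussian) priors. For these the objective is no longer concave, which is why the sketch uses a grid search rather than a gradient step.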
