Noise Estimation for Speech Enhancement in Non-Stationary Environments: A New Method

This paper presents a new method for estimating the non-stationary noise power spectral density from a noisy signal. The method averages the noisy speech power spectrum using time- and frequency-dependent smoothing factors, which are adjusted according to the signal-presence probability in individual frequency bins. Signal presence is determined from the ratio of the noisy speech power spectrum to its local minimum; the local minimum is updated continuously by averaging past values of the noisy speech power spectrum with a look-ahead factor. The method adapts very quickly to highly non-stationary noise environments and achieves significant improvements over a system that uses a voice activity detector (VAD) for noise estimation.
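The processing chain described above can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation: the function name `estimate_noise_psd`, the recursive form of the minimum tracker, and every constant (`eta`, `gamma`, `beta`, `alpha_p`, `alpha_d`, `delta`) are assumptions chosen for illustration, since the abstract gives no equations or parameter values. The input is assumed to be an array of noisy speech power spectra, one row per frame and one column per frequency bin.

```python
import numpy as np


def estimate_noise_psd(noisy_power, eta=0.7, gamma=0.998, beta=0.8,
                       alpha_p=0.2, alpha_d=0.85, delta=5.0):
    """Illustrative noise PSD tracker (all constants are assumed values).

    noisy_power: array of shape (frames, bins) holding |Y(t, k)|^2.
    Returns an array of the same shape with the noise PSD estimate.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    n_frames, n_bins = noisy_power.shape
    eps = 1e-12  # guards the ratio against division by zero

    smoothed = noisy_power[0].copy()   # smoothed noisy speech power
    local_min = noisy_power[0].copy()  # tracked local minimum
    presence = np.zeros(n_bins)        # signal-presence probability
    noise = noisy_power[0].copy()      # noise PSD estimate
    out = np.empty_like(noisy_power)
    out[0] = noise

    for t in range(1, n_frames):
        prev = smoothed
        # First-order recursive smoothing of the noisy power spectrum.
        smoothed = eta * prev + (1.0 - eta) * noisy_power[t]

        # Local minimum: averaged continuously while the power rises,
        # with beta acting as the assumed look-ahead factor; reset to
        # the smoothed power as soon as the power falls to the minimum.
        tracked = (gamma * local_min
                   + (1.0 - gamma) / (1.0 - beta) * (smoothed - beta * prev))
        local_min = np.where(local_min < smoothed, tracked, smoothed)

        # Signal presence from the ratio of power to its local minimum.
        ratio = smoothed / np.maximum(local_min, eps)
        indicator = (ratio > delta).astype(float)
        presence = alpha_p * presence + (1.0 - alpha_p) * indicator

        # Time- and frequency-dependent smoothing factor: close to 1
        # (estimate frozen) where signal is likely present, alpha_d elsewhere.
        alpha_s = alpha_d + (1.0 - alpha_d) * presence
        noise = alpha_s * noise + (1.0 - alpha_s) * noisy_power[t]
        out[t] = noise

    return out
```

In this sketch the noise estimate is updated in every frame, so no explicit speech/non-speech decision is required; bins judged likely to contain signal simply update more slowly. The threshold `delta` is a single scalar here for brevity, although a frequency-dependent threshold would fit the same structure.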



