On-line Speech Enhancement by Time-Frequency Masking under Prior Knowledge of Source Location

This paper presents a source extraction system for on-line operation that extracts only the target signal, using prior knowledge of the source location as a constraint. The proposed system belongs to the class of methods that enhance a target signal while suppressing interfering signals. Compared with conventional methods, it achieves better performance and extracts the target source comparatively completely. The method builds on the beamforming concept and uses an improved time-frequency (TF) mask-based blind source separation (BSS) algorithm to separate the target signal from multiple noise sources. The target source is assumed to be located in front of the microphones, and the test data were recorded in a reverberant room. The proposed method was evaluated with the perceptual evaluation of speech quality (PESQ) score on real recorded sentences, and the results showed noticeable speech enhancement.
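
As a rough illustration of TF masking under a known (frontal) source location, the Python sketch below builds a binary mask from the inter-microphone phase difference and applies it to a simple frontal beamformer output. It is not the authors' improved TF-mask BSS algorithm. It is a plain phase-difference mask, assuming two microphones, a far-field source, STFTs computed elsewhere, and no spatial aliasing (microphone spacing below half a wavelength at the highest frequency). The function names, the 20 degree acceptance angle, and the 343 m/s speed of sound are hypothetical illustrative choices.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value (illustrative)


def frontal_beamform(X1, X2):
    """Delay-and-sum for a broadside (frontal) target: zero steering delay,
    so the beamformer reduces to averaging the two channels bin by bin."""
    return [[0.5 * (a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(X1, X2)]


def frontal_tf_mask(X1, X2, freqs, mic_distance, max_angle_deg=20.0):
    """Hypothetical binary TF mask: keep a bin only if its estimated
    direction of arrival lies within +/- max_angle_deg of broadside.

    X1, X2: STFTs as lists of frames, each a list of complex bins.
    freqs:  centre frequency (Hz) of each bin.
    """
    sin_max = math.sin(math.radians(max_angle_deg))
    mask = []
    for frame1, frame2 in zip(X1, X2):
        row = []
        for x1, x2, f in zip(frame1, frame2, freqs):
            if f <= 0.0:
                # The DC bin carries no direction information; keep it.
                row.append(1.0)
                continue
            # Inter-channel phase difference in (-pi, pi].
            phase_diff = cmath.phase(x2 * x1.conjugate())
            # Far field, no aliasing: |phase_diff| = 2*pi*f*d*|sin(theta)|/c.
            sin_theta = phase_diff * SPEED_OF_SOUND / (2.0 * math.pi * f * mic_distance)
            row.append(1.0 if abs(sin_theta) <= sin_max else 0.0)
        mask.append(row)
    return mask


def apply_mask(X, mask):
    """Element-wise product of an STFT with a mask of the same shape."""
    return [[m * x for m, x in zip(mrow, xrow)] for mrow, xrow in zip(mask, X)]


# Usage: frame 0 comes from the front (no phase difference), frame 1 from
# 90 degrees (maximum delay d/c between the microphones).
freqs = [0.0, 1000.0, 2000.0]
d = 0.05  # assumed 5 cm microphone spacing
tau = d / SPEED_OF_SOUND
X1 = [[1 + 0j] * 3, [1 + 0j] * 3]
X2 = [[1 + 0j] * 3, [cmath.exp(-2j * math.pi * f * tau) for f in freqs]]
mask = frontal_tf_mask(X1, X2, freqs, d)
Y = apply_mask(frontal_beamform(X1, X2), mask)
```

Frontal bins pass through with zero estimated angle, while the lateral source's bins (estimated at 90 degrees) are zeroed; only the direction-free DC bin of the lateral frame survives. A binary mask is the simplest choice here; a soft mask would reduce musical-noise artifacts at the cost of less suppression.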