Forensic Speaker Verification in Noisy Environments by Enhancing the Speech Signal Using an ICA Approach

We propose a system that is robust to real environmental noise and
channel mismatch for forensic speaker verification. The
method suppresses various types of real environmental
noise using an independent component analysis (ICA) algorithm.
Mel frequency cepstral coefficients (MFCC), with or without feature
warping, are then extracted from the enhanced speech signal to capture
its essential characteristics. Channel effects are
reduced using an intermediate vector (i-vector) and probabilistic
linear discriminant analysis (PLDA) approach for classification. The
proposed algorithm is evaluated on an Australian forensic voice
comparison database, combined with car, street, and home noises
from QUT-NOISE at signal-to-noise ratios (SNR) ranging from -10
dB to 10 dB. Experimental results indicate that MFCC feature
warping-ICA reduces the equal error rate by about 48.22%,
44.66%, and 50.07% compared with MFCC feature warping alone when the
test speech signals are corrupted with random sessions of street, car,
and home noises, respectively, at -10 dB SNR.
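
The evaluation described above rests on two simple computations: corrupting clean speech with noise at a chosen SNR, and comparing equal error rates (EER) between systems. The following is a minimal Python sketch of both, not the authors' implementation. The function names (mix_at_snr, measured_snr_db, equal_error_rate, relative_reduction) and the synthetic signals are hypothetical and chosen only for illustration. The abstract does not state whether the reported percentages are relative or absolute reductions, so treating them as relative is an assumption here.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches snr_db, then add it."""
    noise = noise[:len(speech)]  # assumes the noise segment is at least as long as the speech
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Target noise power is p_speech / 10^(snr_db / 10); gain is the amplitude scale.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean speech is known."""
    p_speech = sum(s * s for s in speech)
    p_noise = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_speech / p_noise)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds and take the point where the
    false acceptance rate (FAR) and false rejection rate (FRR) are closest."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent (assumed interpretation of the reported figures)."""
    return 100 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    rng = random.Random(0)
    speech = [math.sin(0.01 * i) for i in range(1000)]  # synthetic stand-in for speech
    noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic stand-in for noise
    for snr in (-10, -5, 0, 5, 10):
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 3))
```

In a real experiment the synthetic lists would be replaced by speech samples and randomly selected noise sessions, and the scores passed to equal_error_rate would come from the verification back end.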



