A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit

A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decomposition of speech for environments with high Gaussian noise. Matching pursuit is used for the atomic decomposition, and is shown to achieve optimal speech detection capability at high data compression rates for low signal-to-noise ratios (SNRs). The most active dictionary elements found by matching pursuit are used for the signal reconstruction, so that the algorithm adapts to the individual speaker's dominant time-frequency characteristics. Speech has a high peak-to-average ratio, which enables the greedy matching pursuit heuristic of selecting the highest inner products to isolate high-energy speech components in high-noise environments. Gabor and gammatone atoms are both investigated with identical logarithmically spaced center frequencies and similar bandwidths. The algorithm performs equally well for both atom types, with no statistically significant difference between them. The algorithm achieves 70% accuracy at 0 dB SNR, 90% accuracy at 5 dB SNR, and 98% accuracy at 20 dB SNR, using 30 dB SNR as the reference for voice activity.
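The greedy selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame length, the frequency range, the number of atoms, and the bandwidth rule are all assumptions made for illustration, and each atom is fixed at the frame center rather than searched over time shifts. The abstract does not state the detection threshold, so no voice activity decision is included.

```python
import numpy as np

def gabor_atom(n, fc, fs, sigma):
    """Unit-norm Gabor atom: a Gaussian-windowed cosine centred in the frame."""
    t = (np.arange(n) - n / 2) / fs
    atom = np.exp(-0.5 * (t / sigma) ** 2) * np.cos(2 * np.pi * fc * t)
    return atom / np.linalg.norm(atom)

def build_dictionary(n, fs, n_freqs=16, f_lo=100.0, f_hi=3500.0):
    """Stack atoms with logarithmically spaced centre frequencies as rows."""
    freqs = np.geomspace(f_lo, f_hi, n_freqs)
    # Assumed bandwidth rule: Gaussian standard deviation of two carrier periods.
    atoms = [gabor_atom(n, fc, fs, sigma=2.0 / fc) for fc in freqs]
    return np.array(atoms), freqs

def matching_pursuit(x, dictionary, n_iter):
    """Greedy decomposition: repeatedly pick the atom with the largest
    absolute inner product with the residual and subtract its projection."""
    residual = np.asarray(x, dtype=float).copy()
    picks = []  # list of (atom index, coefficient)
    for _ in range(n_iter):
        products = dictionary @ residual
        k = int(np.argmax(np.abs(products)))
        residual = residual - products[k] * dictionary[k]
        picks.append((k, float(products[k])))
    return picks, residual

# Usage: one 64 ms frame at an assumed 8 kHz sampling rate, in Gaussian noise.
fs, n = 8000.0, 512
D, freqs = build_dictionary(n, fs)
rng = np.random.default_rng(0)
frame = 3.0 * D[5] + 0.05 * rng.standard_normal(n)
picks, residual = matching_pursuit(frame, D, n_iter=3)
```

Because the atoms have unit norm, each iteration removes exactly the squared coefficient from the residual energy, so a few high-energy components can be captured in a small number of coefficients. A gammatone dictionary could be substituted by replacing `gabor_atom` with a function that returns unit-norm gammatone atoms at the same center frequencies.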



