A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit
A simple adaptive voice activity detector (VAD) is
implemented using Gabor and gammatone atomic decomposition of
speech for high Gaussian noise environments. Matching pursuit is
used for atomic decomposition, and is shown to achieve optimal
speech detection capability at high data compression rates for low
signal-to-noise ratios (SNR). The most active dictionary elements
found by matching pursuit are used for the signal reconstruction so
that the algorithm adapts to the individual speaker's dominant
time-frequency characteristics. Speech has a high peak-to-average
ratio, enabling the matching pursuit greedy heuristic of highest inner
products to isolate high-energy speech components in high-noise
environments. Gabor and gammatone atoms are both investigated with
identical logarithmically spaced center frequencies and similar
bandwidths. The algorithm performs equally well for both Gabor and
gammatone atoms, with no statistically significant differences. The
algorithm achieves 70% accuracy at 0 dB SNR, 90% accuracy at 5 dB
SNR, and 98% accuracy at 20 dB SNR, using 30 dB SNR as a reference
for voice activity.
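The greedy selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sampling rate, frame length, frequency range, number of atoms, the bandwidth rule (a fixed fraction of the center frequency), and all function names are assumptions chosen for illustration. The sketch builds frame-length Gabor or gammatone atoms at logarithmically spaced center frequencies, then runs plain matching pursuit, which repeatedly picks the atom with the largest absolute inner product against the residual.

```python
import numpy as np

def gabor_atom(fc, fs, length, bandwidth):
    """Unit-norm Gabor atom: Gaussian envelope times a cosine carrier."""
    t = (np.arange(length) - length / 2) / fs      # time axis centered on the frame
    sigma = 1.0 / (2.0 * np.pi * bandwidth)        # assumed envelope width
    atom = np.exp(-0.5 * (t / sigma) ** 2) * np.cos(2.0 * np.pi * fc * t)
    return atom / np.linalg.norm(atom)

def gammatone_atom(fc, fs, length, bandwidth, order=4):
    """Unit-norm gammatone atom: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(length) / fs
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * bandwidth * t)
    atom = envelope * np.cos(2.0 * np.pi * fc * t)
    return atom / np.linalg.norm(atom)

def build_dictionary(atom_fn, fs, length, f_lo=100.0, f_hi=4000.0, n_atoms=16):
    """Stack atoms with logarithmically spaced center frequencies, one per row."""
    centers = np.geomspace(f_lo, f_hi, n_atoms)
    # Bandwidth proportional to center frequency is an assumption of this sketch.
    return np.stack([atom_fn(fc, fs, length, 0.25 * fc) for fc in centers])

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy decomposition: pick the atom with the highest |inner product| each step."""
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(dictionary.shape[0])
    for _ in range(n_iter):
        products = dictionary @ residual           # inner product with every atom
        k = int(np.argmax(np.abs(products)))       # most active dictionary element
        coeffs[k] += products[k]
        residual -= products[k] * dictionary[k]    # remove its contribution
    return coeffs, residual

# Hypothetical demo: one strong atom buried in Gaussian noise.
fs, length = 16000, 512
dictionary = build_dictionary(gabor_atom, fs, length)
rng = np.random.default_rng(0)
frame = 3.0 * dictionary[8] + 0.05 * rng.standard_normal(length)
coeffs, residual = matching_pursuit(frame, dictionary, n_iter=1)
print("selected atom:", int(np.argmax(np.abs(coeffs))))
```

In this demo the embedded atom has a much larger inner product with the frame than the noise does, so the first iteration is expected to select it. A detector along the lines of the abstract could then base its speech decision on the energy captured by the most active atoms, but the decision rule itself is not specified here and is left out of the sketch.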
@article{Bryan:70009,
  author = "Thomas Bryan and Veton Kepuska and Ivica Kostanic",
  title = "A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented by Matching Pursuit",
  journal = "International Journal of Information, Control and Computer Sciences",
  abstract = "A simple adaptive voice activity detector (VAD) is
implemented using Gabor and gammatone atomic decomposition of
speech for high Gaussian noise environments. Matching pursuit is
used for atomic decomposition, and is shown to achieve optimal
speech detection capability at high data compression rates for low
signal-to-noise ratios (SNR). The most active dictionary elements
found by matching pursuit are used for the signal reconstruction so
that the algorithm adapts to the individual speaker's dominant
time-frequency characteristics. Speech has a high peak-to-average
ratio, enabling the matching pursuit greedy heuristic of highest inner
products to isolate high-energy speech components in high-noise
environments. Gabor and gammatone atoms are both investigated with
identical logarithmically spaced center frequencies and similar
bandwidths. The algorithm performs equally well for both Gabor and
gammatone atoms, with no statistically significant differences. The
algorithm achieves 70% accuracy at 0 dB SNR, 90% accuracy at 5 dB
SNR, and 98% accuracy at 20 dB SNR, using 30 dB SNR as a reference
for voice activity.",
  keywords = "Atomic Decomposition, Gabor, Gammatone, Matching Pursuit, Voice Activity Detection",
  volume = "9",
  number = "5",
  pages = "1296--1298"
}