Automatic Recognition of an Unknown and Time-Varying Number of Simultaneous Environmental Sound Sources

This work addresses the problem of automatically enumerating and recognizing an unknown and time-varying number of environmental sound sources using a single microphone. We assume that the recorded sound is a realization of sound sources belonging to a group of audio classes that is known a priori. We describe two variations of the same principle: computing the distance between the current unknown audio frame and every possible combination of the classes assumed to span the sound scene. We concentrate on categorizing environmental sound sources, such as birds and insects, for the task of monitoring the biodiversity of a specific habitat.
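The combination-search principle can be illustrated with a minimal sketch. It is not the method of this work: the feature (a per-frame power spectrum), the mixing model (class templates add in the power domain), the distance (Euclidean on log spectra), and all names such as `best_combination` and `templates` are assumptions made only for illustration. The size of the closest combination gives the source count, and its members give the recognized classes.

```python
from itertools import combinations

import numpy as np


def best_combination(frame, templates, max_sources=None):
    """Return (subset, distance) for the class subset closest to the frame.

    frame     : 1-D power spectrum of the current frame (assumed feature).
    templates : dict mapping class name -> 1-D mean power spectrum.
    Assumption: simultaneous sources add in the power-spectral domain.
    """
    names = sorted(templates)
    if max_sources is None:
        max_sources = len(names)
    eps = 1e-12  # avoids log(0)
    log_frame = np.log(frame + eps)

    best_subset, best_dist = (), np.inf
    # Exhaustive search over all non-empty subsets of up to max_sources classes.
    for k in range(1, max_sources + 1):
        for subset in combinations(names, k):
            mixture = sum(templates[name] for name in subset)
            dist = np.linalg.norm(log_frame - np.log(mixture + eps))
            if dist < best_dist:
                best_subset, best_dist = subset, dist
    return best_subset, best_dist


# Toy templates with made-up values, one per hypothetical class.
templates = {
    "bird": np.array([0.1, 0.1, 4.0, 6.0, 0.1, 0.1]),
    "insect": np.array([0.1, 0.1, 0.1, 0.1, 5.0, 3.0]),
    "frog": np.array([6.0, 3.0, 0.1, 0.1, 0.1, 0.1]),
}

# A synthetic frame in which a bird and an insect sound simultaneously.
frame = templates["bird"] + templates["insect"]
subset, dist = best_combination(frame, templates)
print(len(subset), subset)
```

The number of candidate combinations grows as 2^C - 1 for C classes, so an exhaustive search of this kind is only practical for a small class inventory or a bounded `max_sources`.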




