Using HMM-based Classifier Adapted to Background Noises with Improved Sounds Features for Audio Surveillance Application

The goal of this work is to discriminate between different classes of environmental sounds. A sound recognition system offers concrete potential for surveillance and security applications. The first contribution of this paper is a thorough investigation of the applicability of state-of-the-art audio features to environmental sound recognition. In addition, a set of novel features obtained by combining the basic parameters is introduced. The quality of the investigated features is evaluated with an HMM-based classifier, to which particular attention is given. Specifically, we propose a multi-style training system based on HMMs: a single recognizer is trained on a database that includes different levels of background noise and is then used as a universal recognizer for every environment. To improve robustness by reducing environmental variability, we explore several adaptation algorithms: Maximum Likelihood Linear Regression (MLLR), Maximum A Posteriori (MAP), and a MAP/MLLR algorithm that combines the two. Experimental evaluation shows that a good recognition rate can be reached, even under severe noise degradation, when the system is fed with a suitable set of features.
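
As a rough illustration of two ideas named in the abstract, the sketch below shows how noisy training copies for multi-style training might be produced at a chosen signal-to-noise ratio, and how a MAP update could shift a single Gaussian mean toward adaptation data. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `mix_at_snr` and `map_adapt_mean`, the prior weight `tau`, and the use of NumPy are all hypothetical choices made for this example, and the MAP formula is the standard textbook mean update rather than anything specified in the paper.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add background noise to a clean sound at a target SNR (in dB).

    Assumes `noise` is at least as long as `clean`; extra samples are dropped.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    p_clean = np.mean(clean ** 2)   # average power of the clean signal
    p_noise = np.mean(noise ** 2)   # average power of the raw noise
    # Scale the noise so that 10*log10(p_clean / p_scaled_noise) == snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def map_adapt_mean(prior_mean, frames, gammas, tau=10.0):
    """Standard MAP update of one Gaussian mean (hypothetical helper).

    prior_mean: mean from the noise-independent model, shape (d,)
    frames:     adaptation feature vectors, shape (T, d)
    gammas:     occupation probability of this Gaussian per frame, shape (T,)
    tau:        prior weight; larger values keep the result closer to the prior
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    occupancy = gammas.sum()        # soft count of frames for this Gaussian
    weighted_sum = gammas @ frames  # occupation-weighted sum of the frames
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)

# Build a small multi-style set: one noisy copy per SNR level (levels assumed).
rng = np.random.default_rng(0)
clean = rng.standard_normal(1000)
noise = rng.standard_normal(1000)
multi_style = [mix_at_snr(clean, noise, snr) for snr in (20.0, 10.0, 0.0)]
```

With little adaptation data the MAP estimate stays near the prior mean, and with more data it moves toward the sample mean, which is why MAP is often paired with a transform-based method such as MLLR when adaptation data are scarce.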



