Slice Bispectrogram Analysis-Based Classification of Environmental Sounds Using Convolutional Neural Network

Some systems work well only if they recognize the acoustic environment as humans do. This research addresses sound classification with a convolutional neural network (CNN) and aims to develop a method that automatically classifies a wide range of environmental sounds. Although a CNN is a powerful technique, its performance depends on the type of input data. We therefore propose using the slice bispectrogram as input, which is a third-order spectrogram obtained by taking a slice of the amplitude of the short-time bispectrum. This paper explains the slice bispectrogram and evaluates the proposed method through experiments on the ESC-50 sound dataset. In these experiments, the proposed scheme achieves high and stable accuracy. The results also indicate a relationship between classification accuracy and the non-Gaussianity of the sound signals.
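To make the idea of a slice bispectrogram concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes the slice is the diagonal f1 = f2 = f of the short-time bispectrum B(f1, f2) = X(f1) X(f2) X*(f1 + f2); the abstract does not state which slice is used. The function name `slice_bispectrogram`, the Hann window, and the `frame_len` and `hop` values are all illustrative choices.

```python
import numpy as np


def slice_bispectrogram(x, frame_len=256, hop=128):
    """Amplitude of a diagonal slice of the short-time bispectrum.

    Assumption (not taken from the paper): the slice is f1 = f2 = f, so each
    frame contributes |X(f) * X(f) * conj(X(2f))|.
    Returns an array of shape (n_bins, n_frames).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Keep only bins k for which 2k is still inside the one-sided spectrum.
    n_bins = frame_len // 4 + 1
    k = np.arange(n_bins)
    out = np.empty((n_bins, n_frames))
    for t in range(n_frames):
        frame = x[t * hop : t * hop + frame_len] * window
        X = np.fft.rfft(frame)  # one-sided short-time spectrum of this frame
        out[:, t] = np.abs(X[k] ** 2 * np.conj(X[2 * k]))
    return out


# Example: two tones at bins 10 and 20 of a 256-point frame.
n = np.arange(1024)
signal = np.cos(2 * np.pi * 10 * n / 256) + np.cos(2 * np.pi * 20 * n / 256)
image = slice_bispectrogram(signal)
```

Like an ordinary spectrogram, the resulting two-dimensional array (frequency by time) can be fed to a CNN as an image. Unlike a spectrogram, each value combines the components at f and 2f, so it reflects third-order structure in the signal.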

