Orchestra/Percussion Classification Algorithm for the Unified Speech and Audio Coding System

Unified Speech and Audio Coding (USAC), the latest MPEG standard for unified speech and audio coding, uses a speech/audio classification algorithm to separate speech segments from audio segments of the input signal. A well-designed orchestra/percussion classification, followed by matching subsequent processing, could further increase the quality of the recovered audio. However, the existing USAC classification only separates speech from audio; addressing this shortcoming by introducing an orchestra/percussion classification and modifying the subsequent processing can significantly increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system that extracts only 3 Mel-Frequency Cepstral Coefficients (MFCCs) rather than the traditional 13, and uses an Iterative Dichotomiser 3 (ID3) decision tree rather than a more complex learning method. As a result, the proposed algorithm has lower computational complexity than most existing algorithms. Because frequent changes of the class attribute may degrade the quality of the recovered audio signal, this paper also designs a modified subsequent process, with which the whole classification system reaches an accuracy of up to 97%, comparable to the 99% achieved by classical methods.
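The abstract does not give the feature-extraction settings. The following is a minimal sketch, assuming a standard per-frame MFCC pipeline (Hamming window, power spectrum, 26 triangular mel filters, log, DCT-II) in which only the first 3 cepstral coefficients are kept instead of 13. The filter count, the mel formula, keeping c0 through c2, and the helper names (`mel_filterbank`, `dct_ii`, `mfcc_frame`) are all illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula: m = 2595 * log10(1 + f / 700).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced uniformly on the mel scale from 0 Hz to sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising slope of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def dct_ii(x, n_keep):
    # Unnormalized DCT-II that computes only the first n_keep coefficients.
    n = np.arange(len(x))
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * len(x))))
                     for k in range(n_keep)])


def mfcc_frame(frame, sr, n_filters=26, n_ceps=3):
    # n_ceps=3 mirrors the abstract's "3 MFCCs"; the other settings are assumptions.
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2 / n_fft
    log_energies = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)
    return dct_ii(log_energies, n_ceps)


if __name__ == "__main__":
    frame = np.random.default_rng(0).standard_normal(512)  # stand-in for one audio frame
    print(mfcc_frame(frame, sr=16000))  # prints a vector of 3 coefficients
```

In this sketch, reducing `n_ceps` from 13 to 3 shrinks only the DCT stage and the classifier's input dimension. The FFT and filterbank cost is unchanged, so any overall saving also depends on how cheap the downstream classifier is.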
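ID3 grows a decision tree by choosing, at each node, the attribute with the largest information gain $IG(S, A) = H(S) - \sum_{v} \frac{|S_v|}{|S|} H(S_v)$, where $H$ is the Shannon entropy of the class labels. Because ID3 splits on categorical attributes, the 3 continuous MFCCs must first be quantized, and the abstract does not say how. The sketch below assumes a hypothetical binary split of each coefficient at its training-set median. The names `fit_thresholds`, `discretize`, `id3`, and `classify` are illustrative only.

```python
import math
from collections import Counter
from statistics import median


def fit_thresholds(features):
    # Hypothetical quantizer: one threshold per coefficient (median over training frames).
    return [median(col) for col in zip(*features)]


def discretize(vec, thresholds):
    # Map each continuous MFCC to a binary symbol: 1 if above its threshold, else 0.
    return tuple(int(c > t) for c, t in zip(vec, thresholds))


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    # H(S) minus the size-weighted entropy of the subsets produced by splitting on attr.
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    # Leaf: pure node, or no attributes left (fall back to the majority label).
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    # Internal node: (attribute index, child per value, default for unseen values).
    return (best, branches, majority)


def classify(tree, row):
    while isinstance(tree, tuple):
        attr, branches, default = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree
```

A toy usage: `tree = id3(rows, labels, [0, 1, 2])` on discretized 3-coefficient frames, then `classify(tree, discretize(new_mfcc, thresholds))`. With only 3 binary attributes the tree has at most depth 3, so classifying a frame costs at most 3 comparisons.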
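The abstract notes that frequent switching of the class attribute degrades the recovered audio, but it does not describe its modified subsequent process. As a hypothetical stand-in only, a common way to suppress rapid switching is a hangover rule. Under this rule, the output label changes only after a fixed number of consecutive frames agree on the new label. The function name `smooth_decisions` and the `hold` parameter are illustrative assumptions.

```python
def smooth_decisions(decisions, hold=3):
    # Hangover smoothing: keep the current label until `hold` consecutive
    # frames disagree with it, then switch to the new label.
    if not decisions:
        return []
    current, run = decisions[0], 0
    out = [current]
    for d in decisions[1:]:
        if d == current:
            run = 0  # an agreeing frame resets the disagreement counter
        else:
            run += 1
            if run >= hold:
                current, run = d, 0
        out.append(current)
    return out
```

Larger `hold` values give fewer label changes at the cost of a longer delay before a genuine orchestra/percussion transition is followed. This trade-off would have to be tuned against both accuracy and perceived audio quality.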
