Musical Instrument Classification Using Embedded Hidden Markov Models

In this paper, a novel method for recognizing musical instruments in polyphonic music using an embedded hidden Markov model (EHMM) is presented. An EHMM is a doubly embedded HMM structure in which each state of the external HMM is itself an independent HMM. Classification is performed with two different internal HMM structures, in which Gaussian mixture models (GMMs) serve as the likelihood estimators for the internal HMMs. The results are compared with those of an artificial neural network with two hidden layers. Satisfactory classification accuracies were achieved for both solo instrument performances and instrument combinations. This indicates that, by exploiting the dynamics of the signal, the new approach outperforms comparable classification methods.
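The EHMM structure described in the abstract can be illustrated with a short scoring sketch. This is a minimal example under stated assumptions, not the authors' implementation. It assumes that the feature sequence is divided into segments of feature frames. For each segment, the emission likelihood of external state k is taken to be the forward-algorithm likelihood that internal HMM k assigns to the segment. Internal-state emissions are modelled by diagonal-covariance GMMs, and a recording is assigned to the instrument whose EHMM scores highest. The names `DiagGMM`, `HMM`, `EmbeddedHMM`, `forward_log`, `classify` and `toy_model` are hypothetical, as are the left-to-right topology and all parameter values.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Assumption: a segment is a list of feature frames, each frame a feature vector.
Frame = Sequence[float]
Segment = Sequence[Frame]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p: float) -> float:
    """log(p), mapping zero probabilities to -inf."""
    return math.log(p) if p > 0 else -math.inf


@dataclass
class DiagGMM:
    """Diagonal-covariance GMM used as the likelihood of one internal state."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def log_likelihood(self, x: Frame) -> float:
        component_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = safe_log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            component_logs.append(ll)
        return logsumexp(component_logs)


@dataclass
class HMM:
    """Initial-state and transition probabilities of a discrete-state HMM."""
    initial: List[float]
    transitions: List[List[float]]


def forward_log(hmm: HMM, emission_logs: List[List[float]]) -> float:
    """Log-domain forward algorithm; emission_logs[t][j] = log p(o_t | state j)."""
    n = len(hmm.initial)
    alpha = [safe_log(hmm.initial[j]) + emission_logs[0][j] for j in range(n)]
    for t in range(1, len(emission_logs)):
        alpha = [
            logsumexp([alpha[i] + safe_log(hmm.transitions[i][j]) for i in range(n)])
            + emission_logs[t][j]
            for j in range(n)
        ]
    return logsumexp(alpha)


@dataclass
class EmbeddedHMM:
    """External HMM whose states are internal HMMs with GMM emissions."""
    outer: HMM
    inner: List[HMM]           # one internal HMM per external state
    gmms: List[List[DiagGMM]]  # gmms[k][s]: GMM of internal state s of external state k

    def log_likelihood(self, segments: Sequence[Segment]) -> float:
        outer_emissions = []
        for segment in segments:
            # External state k "emits" a segment with internal HMM k's likelihood.
            row = []
            for k, inner_hmm in enumerate(self.inner):
                frame_logs = [[g.log_likelihood(x) for g in self.gmms[k]] for x in segment]
                row.append(forward_log(inner_hmm, frame_logs))
            outer_emissions.append(row)
        return forward_log(self.outer, outer_emissions)


def classify(models: Dict[str, EmbeddedHMM], segments: Sequence[Segment]) -> str:
    """Return the label whose EHMM gives the highest log-likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(segments))


def toy_model(mean: float) -> EmbeddedHMM:
    """Hypothetical 2x2 left-to-right EHMM on 1-D features, for illustration only."""
    left_to_right = HMM(initial=[1.0, 0.0], transitions=[[0.5, 0.5], [0.0, 1.0]])
    states = [DiagGMM([1.0], [[mean]], [[1.0]]),
              DiagGMM([0.5, 0.5], [[mean], [mean + 1.0]], [[1.0], [1.0]])]
    return EmbeddedHMM(outer=left_to_right, inner=[left_to_right, left_to_right],
                       gmms=[states, states])


models = {"low": toy_model(0.0), "high": toy_model(5.0)}
segments = [[[0.1], [0.2]], [[-0.1], [0.3]]]
print(classify(models, segments))  # → low
```

Working in the log domain avoids numerical underflow when frame likelihoods are multiplied over long recordings. In practice, the transition probabilities and GMM parameters would typically be estimated from labelled training data, for example with Baum-Welch or Viterbi training, rather than set by hand as in this toy example.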
