Machine Learning for Music Aesthetic Annotation Using MIDI Format: A Harmony-Based Classification Approach

Riding the tide of deep learning, the field of music information retrieval (MIR) has developed in parallel, and a wide variety of feature-learning models have been applied to music classification and tagging tasks. Among these techniques, deep convolutional neural networks (CNNs) have been widely used and often outperform traditional approaches, especially in music genre classification and prediction. For music recommendation, however, a large semantic gap remains between audio-derived genre labels and the many aspects of a song that influence user preference. To bridge this gap, we construct an automatic music aesthetic annotation model based on the MIDI format, which allows the similarity between music pieces to be compared and measured through harmonic analysis. We use quantification matrices converted from MIDI files as input to train two classifiers, a support vector machine (SVM) and a decision tree (DT). Experimental results on a tag prediction task show that both learning algorithms can extract high-level properties from music information in an end-to-end manner. The proposed model helps capture audience taste, so the resulting recommendations are more likely to appeal to niche listeners.
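
As a rough illustration of the pipeline described above, the sketch below converts each MIDI file into a 12-dimensional pitch-class (chroma) profile and trains both classifiers on it. This is a minimal sketch under stated assumptions, not the paper's actual method: pretty_midi and scikit-learn stand in for whatever tooling the authors used, and the helper midi_to_harmony_vector, the fs sampling rate, and the placeholder corpus names are hypothetical.

import numpy as np
import pretty_midi
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def midi_to_harmony_vector(path, fs=10):
    # Hypothetical feature extractor: get_chroma() returns a 12 x T matrix
    # of pitch-class energies; averaging over time yields a coarse
    # 12-dimensional harmonic profile for the whole piece.
    pm = pretty_midi.PrettyMIDI(path)
    chroma = pm.get_chroma(fs=fs)
    profile = chroma.mean(axis=1)
    total = profile.sum()
    return profile / total if total > 0 else profile

def train_and_evaluate(midi_paths, aesthetic_tags):
    # `midi_paths` and `aesthetic_tags` are placeholders for a labeled corpus.
    X = np.stack([midi_to_harmony_vector(p) for p in midi_paths])
    X_train, X_test, y_train, y_test = train_test_split(
        X, aesthetic_tags, test_size=0.2, random_state=0)
    for clf in (SVC(kernel="rbf"), DecisionTreeClassifier(max_depth=10)):
        clf.fit(X_train, y_train)
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(type(clf).__name__, acc)

Note that averaging the chroma over time discards temporal structure; a representation that tracks chord progressions, closer in spirit to the harmonic analysis the abstract describes, would retain more of it.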




