Practical Method for Digital Music Matching Robust to Various Sound Qualities

In this paper, we propose a practical digital music matching system that is robust to variations in sound quality. The system is divided into two parts: a client and a server. The client consists of input, preprocessing, and feature extraction modules. The preprocessing module, which includes a music onset module, compensates for the time-axis offset that arises between identical songs stored in different formats. The proposed method extracts delta-grouped Mel-frequency cepstral coefficients (MFCCs) as music features that are robust to changes in sound quality. On the server, a feature database (FD) is constructed from sub-feature databases (SFDs), one for each sound quality format (SQF) in use. When the system receives a music file, the selection module selects an appropriate SFD from the FD, and the matching module then searches the selected SFD. In this study, we ran matching experiments on 3,000 queries across three cases with different FDs; in each case, 1,000 queries were constructed by combining 8 SQFs and 125 songs. The music matching success rate improved from 88.6% with a single SFD to 93.2% with quadruple SFDs, demonstrating that the proposed method is robust to various sound qualities.
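The abstract names delta-grouped MFCCs as the feature representation. The delta (first-order regression) step applied to an MFCC matrix can be sketched as follows; this is a minimal illustration using the standard regression formula over a time window, not the paper's exact implementation, and the function name `delta` and window size `N` are assumptions for illustration.

```python
import numpy as np

def delta(mfcc, N=2):
    """Compute delta (first-order regression) coefficients over time.

    mfcc: array of shape (n_coeffs, n_frames), e.g. static MFCCs.
    N: half-width of the regression window (standard choice: 2).
    Edge frames are handled by replicating the first/last frame.
    """
    # Pad along the time axis so the window is defined at the edges.
    padded = np.pad(mfcc, ((0, 0), (N, N)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(mfcc, dtype=float)
    for t in range(mfcc.shape[1]):
        # Weighted difference of frames ahead of and behind frame t.
        d[:, t] = sum(
            n * (padded[:, t + N + n] - padded[:, t + N - n])
            for n in range(1, N + 1)
        ) / denom
    return d
```

A constant MFCC track yields zero deltas, while coefficients that change linearly over time yield their per-frame slope, which is the temporal-dynamics information the delta grouping is meant to capture.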




