A Non-Parametric Based Mapping Algorithm for Use in Audio Fingerprinting

Over the past few years, online multimedia
collections have grown at a fast pace. Several companies have shown
interest in studying ways to organise this large amount of audio
information without the need for human intervention to generate
metadata. Over the same period, many applications have emerged on
the market that are capable of identifying a piece of music in a
short time. However, audio effects and degradation make it much
harder to identify an unknown piece. In this paper, an audio
fingerprinting system which makes use of a non-parametric based
algorithm is presented. Parametric analysis is also performed using
Gaussian Mixture Models (GMMs). The feature extraction methods
employed are the Mel Spectrum Coefficients and the MPEG-7 basic
descriptors. During the non-parametric modelling, the extracted
feature coefficients are replaced by bin numbers. The results show
that non-parametric analysis has potential, offering results
comparable to those reported in the literature.
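
As an illustration of the idea of replacing feature coefficients with bin numbers, the following minimal sketch maps each coefficient to the index of an equal-width bin. The abstract does not specify the binning scheme, so the equal-width edges, the value range, the number of bins, and the function names `make_edges` and `to_bin_numbers` are all assumptions made for this example, not the method used in the paper.

```python
from bisect import bisect_right


def make_edges(lo, hi, n_bins):
    """Return the n_bins - 1 interior edges of equal-width bins on [lo, hi].

    Equal-width binning is an assumption of this sketch.
    """
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def to_bin_numbers(coeffs, edges):
    """Replace each coefficient with the index of the bin it falls in.

    Values below the first edge map to bin 0 and values at or above the
    last edge map to the final bin, so out-of-range inputs are clamped.
    """
    return [bisect_right(edges, c) for c in coeffs]


# Hypothetical feature values for one frame, assumed to lie in [0, 1].
edges = make_edges(0.0, 1.0, 4)
frame = [0.05, 0.30, 0.55, 0.95]
print(to_bin_numbers(frame, edges))  # prints [0, 1, 2, 3]
```

Working with bin indices rather than raw coefficient values means that no distribution has to be fitted to the features, which is the sense in which such a mapping is non-parametric.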




