An Approach for Vocal Register Recognition Based on Spectral Analysis of Singing

Recognizing and controlling vocal registers during
singing is a difficult task for a beginner vocalist. It requires, among
other things, identifying which of the natural resonators is being used
when a sound propagates through the body. To support this, an application
has been designed that provides sound recording, automatic vocal
register recognition (VRR), and a graphical user interface with
real-time visualization of the signal and recognition results. Six
spectral features are determined for each time frame and passed to a
support vector machine classifier, which yields a binary decision
assigning the segment to the head or chest register. The training and
testing data were recorded by ten professional
female singers (soprano, aged 19-29) performing sounds in both the
chest and head registers. The classification accuracy exceeded 93%
in each of the validation schemes considered. Apart from the hard two-class
decision, the support vector classifier also returns the
distance between a given feature vector and the discriminating
hyperplane in the feature space. This information reflects the level
of certainty of the vocal register classification in a fuzzy way. Thus,
the designed recognition and training application is able to assess and
visualize the continuous trend in singing in a user-friendly graphical
mode, providing an easy way to control the vocal emission.
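
The relation between the hard decision and the distance to the hyperplane can be illustrated with a minimal sketch. It assumes a linear decision boundary w·x + b = 0, which the abstract does not state; the weights, the bias, the frame values, and the convention that a positive distance means "head" are all hypothetical and chosen only for illustration. The six actual spectral features are not named here, so the vectors below are placeholders.

```python
import math

def signed_distance(x, w, b):
    """Signed Euclidean distance from feature vector x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def classify(x, w, b):
    """Return (label, distance); the sign gives the class, the magnitude the certainty."""
    d = signed_distance(x, w, b)
    # Assumption: the positive side of the hyperplane is labeled "head".
    label = "head" if d >= 0 else "chest"
    return label, d

# Hypothetical parameters for a six-dimensional feature space (not from the paper).
w = [3.0, 0.0, 4.0, 0.0, 0.0, 0.0]
b = -1.0

# Hypothetical per-frame feature vectors.
frames = [
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

for x in frames:
    label, d = classify(x, w, b)
    print(f"{label:5s} distance = {d:+.2f}")
```

A frame far from the hyperplane is classified with high certainty, while a frame with a distance near zero lies between the two registers. Plotting the distance over consecutive frames gives the kind of continuous trend described above.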



