Accent Identification by Clustering and Scoring Formants

Automatic speech recognition technology has improved significantly. However, existing systems still face difficulties, particularly when used by non-native speakers with accents. In this paper we address the problem of identifying the accent in English speech produced by speakers from different backgrounds. Once an accent is identified, the speech recognition software can utilise a training set for the appropriate accent and thereby improve the efficiency and accuracy of the recognition system. We introduce the Q factor, which is defined by the sum of relationships between the frequencies of the formants. Four different accents were considered in the experiments. A scoring method is introduced in order to analyse accents effectively. The results indicate that a speaker's accent can be identified by analysing the formants of their speech.
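
As a rough illustration of the idea, the sketch below computes a Q-like value from formant frequencies and scores a speaker against per-accent reference values. The abstract does not state which "relationships" are summed, so the use of pairwise frequency ratios here is an assumption, as are the function names `q_factor`, `score_accents`, and `identify_accent`, the nearest-centroid vote, and all numeric values. It should be read as one possible interpretation, not as the method used in the paper.

```python
from itertools import combinations


def q_factor(formants):
    """Hypothetical Q factor: sum of pairwise ratios of formant frequencies.

    ASSUMPTION: 'relationships' is interpreted as the ratio of the later
    formant to the earlier one for every pair in the list; the paper may
    define it differently.
    """
    return sum(later / earlier for earlier, later in combinations(formants, 2))


def score_accents(sample_qs, accent_centroids):
    """Give one vote per Q value to the accent whose centroid is closest.

    ASSUMPTION: each accent is summarised by a single centroid Q value,
    for example obtained by clustering training data.
    """
    scores = {accent: 0 for accent in accent_centroids}
    for q in sample_qs:
        nearest = min(accent_centroids, key=lambda a: abs(accent_centroids[a] - q))
        scores[nearest] += 1
    return scores


def identify_accent(sample_qs, accent_centroids):
    """Return the accent with the highest score."""
    scores = score_accents(sample_qs, accent_centroids)
    return max(scores, key=scores.get)


# Illustrative values only (not taken from the paper).
centroids = {"accent_A": 9.0, "accent_B": 12.0}
vowel_formants = [(500.0, 1500.0, 2500.0), (400.0, 1600.0, 2400.0)]
qs = [q_factor(f) for f in vowel_formants]
predicted = identify_accent(qs, centroids)
```

A voting scheme is used here because it keeps the scoring step separate from the feature itself, so a different definition of the Q factor could be substituted without changing the rest of the sketch.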



