Automatic Recognition of Emotionally Coloured Speech
Emotion in speech has attracted the interest of the speech
community for many years, both in speech synthesis and in
automatic speech recognition (ASR). Despite remarkable recent
progress in large vocabulary recognition (LVR), the field remains
far from the ultimate goal of recognising free conversational
speech uttered by any speaker in any environment. Experimental
tests show that the error rate of state-of-the-art large vocabulary
recognition systems increases substantially when they are applied
to spontaneous or emotional speech. This paper shows that the
recognition rate for emotionally coloured speech can be improved
by using a language model with increased representation of
emotional utterances.
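The paper's keywords point to a statistical N-gram language model. The abstract does not spell out how the "increased representation of emotional utterances" is achieved, but one simple reading is to weight emotional training sentences more heavily when estimating N-gram counts. The sketch below illustrates that idea with a toy bigram model; the `boost` factor, the helper names, and the tiny corpora are all illustrative assumptions, not the authors' actual procedure.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count bigrams over whitespace-tokenised sentences,
    with explicit sentence-boundary markers."""
    counts = Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def build_lm(neutral, emotional, boost=3):
    """Estimate maximum-likelihood bigram probabilities from a
    combined corpus in which each emotional utterance is repeated
    `boost` times, increasing its representation in the counts."""
    corpus = list(neutral) + list(emotional) * boost
    counts = bigram_counts(corpus)
    totals = Counter()
    for (w1, _w2), c in counts.items():
        totals[w1] += c
    # P(w2 | w1) = count(w1, w2) / count(w1 as a bigram history)
    return {bg: c / totals[bg[0]] for bg, c in counts.items()}


neutral = ["the meeting starts at noon"]
emotional = ["i am so angry"]
lm = build_lm(neutral, emotional, boost=3)
```

With `boost=3`, the sentence-initial bigram `("<s>", "i")` receives probability 0.75 instead of the 0.5 it would get from unweighted counts, so emotional word sequences become more likely under the model. A production system would use a smoothed trigram or higher-order model rather than raw maximum-likelihood estimates.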
@article{"International Journal of Electrical, Electronic and Communication Sciences:51026",
  author   = "Theologos Athanaselis and Stelios Bakamidis and Ioannis Dologlou",
  title    = "Automatic Recognition of Emotionally Coloured Speech",
  abstract = "Emotion in speech has attracted the interest of the speech community for many years, both in speech synthesis and in automatic speech recognition (ASR). Despite remarkable recent progress in large vocabulary recognition (LVR), the field remains far from the ultimate goal of recognising free conversational speech uttered by any speaker in any environment. Experimental tests show that the error rate of state-of-the-art large vocabulary recognition systems increases substantially when they are applied to spontaneous or emotional speech. This paper shows that the recognition rate for emotionally coloured speech can be improved by using a language model with increased representation of emotional utterances.",
  keywords = "Statistical language model, N-grams, emotionally coloured speech",
  volume   = "1",
  number   = "12",
  pages    = "1738-4",
}