Emotion Recognition Using Neural Network: A Comparative Study

Emotion recognition is an active research field with a growing number of applications. This work focuses on recognizing different emotions from the speech signal. The extracted features are related to statistics of the pitch, formant, and energy contours, as well as spectral, perceptual, and temporal features, jitter, and shimmer. An Artificial Neural Network (ANN) was chosen as the classifier. Our concern is to find a robust and fast ANN classifier suitable for different real-life applications. Several experiments were carried out on different ANNs to investigate the factors that affect the classification success rate. Using a database containing 7 different emotions, it is shown that with proper and careful adjustment of the feature format, the sorting of the training data, the number of features selected, and even the ANN type and architecture used, a success rate of 85% or more can be achieved without increasing the system complexity or the computation time.




