Multimodal Database of Emotional Speech, Video and Gestures

People express emotions through multiple modalities.
Integrating verbal and non-verbal communication channels makes
the conveyed message easier to understand. Extending the focus to
several forms of expression can facilitate research on emotion
recognition as well as human-machine interaction. In this
article, the authors present a Polish emotional database composed of
three modalities: facial expressions, body movement and gestures,
and speech. The corpus contains recordings registered under studio
conditions, acted out by 16 professional actors (8 male and 8 female).
The data are labeled with six basic emotion categories according to
Ekman's model. To verify the quality of the performances,
all recordings were evaluated by experts and volunteers. The database
is available to the academic community and may be useful in studies
on audio-visual emotion recognition.
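
As an illustration of how the three modalities and the Ekman-style labels might be handled programmatically, the sketch below loads a hypothetical copy of such a corpus. The directory layout, file extensions, and actor naming are assumptions made for illustration only; they are not the database's documented structure. Only the six label names follow Ekman's basic emotion categories.

    from dataclasses import dataclass
    from pathlib import Path
    from typing import List

    # Ekman's six basic emotion categories used as labels.
    EMOTIONS = {"anger", "disgust", "fear", "happiness", "sadness", "surprise"}

    @dataclass
    class Recording:
        actor_id: str     # hypothetical actor identifier, e.g. "F01" or "M07"
        emotion: str      # one of EMOTIONS
        video_path: Path  # facial expression / body movement and gesture video
        audio_path: Path  # speech recording

    def load_corpus(root: Path) -> List[Recording]:
        """Collect paired video/audio recordings, assuming a
        <root>/<actor_id>/<emotion>/<clip>.mp4 layout with a matching
        .wav file next to each video (an assumed layout, for illustration)."""
        recordings = []
        for video in sorted(root.glob("*/*/*.mp4")):
            actor_id, emotion = video.parts[-3], video.parts[-2]
            if emotion not in EMOTIONS:
                continue  # skip folders that are not emotion labels
            audio = video.with_suffix(".wav")
            if audio.exists():
                recordings.append(Recording(actor_id, emotion, video, audio))
        return recordings

    if __name__ == "__main__":
        for rec in load_corpus(Path("corpus")):
            print(rec.actor_id, rec.emotion, rec.video_path.name)

A loader of this kind simply pairs each video clip (face and body channels) with its speech track and its categorical label, which is the form most audio-visual emotion recognition pipelines expect as input.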



