Abstract: Affective adaptation is a creative way for game designers to add an extra layer of engagement to their productions. When player’s emotions are an explicit factor in mechanics design, endless possibilities for imaginative gameplay emerge. Whilst gaining popularity, existing affective game research mostly runs controlled experiments in restrictive settings and rely on one or more specialist devices for measuring player’s emotional state. These conditions albeit effective, are not necessarily realistic. Moreover, the simplified narrative and intrusive wearables may not be suitable for players. This exploratory study investigates delivering an immersive affective experience in the wild with minimal requirements, in an attempt for the average developer to reach the average player. A puzzle game is created with rich narrative and creative mechanics. It employs both explicit and implicit adaptation and only requires a web camera. Participants played the game on their own machines in various settings. Whilst it was rated feasible, very engaging and enjoyable, it remains questionable whether a fully immersive experience was delivered due to the limited sample size.
Abstract: During COVID-19, the depression rate has increased dramatically. Young adults are most vulnerable to the mental health effects of the pandemic. Lower-income families have a higher ratio to be diagnosed with depression than the general population, but less access to clinics. This research aims to achieve early depression detection at low cost, large scale, and high accuracy with an interdisciplinary approach by incorporating clinical practices defined by American Psychiatric Association (APA) as well as multimodal AI framework. The proposed approach detected the nine depression symptoms with Natural Language Processing sentiment analysis and a symptom-based Lexicon uniquely designed for young adults. The experiments were conducted on the multimedia survey results from adolescents and young adults and unbiased Twitter communications. The result was further aggregated with the facial emotional cues analyzed by the Convolutional Neural Network on the multimedia survey videos. Five experiments each conducted on 10k data entries reached consistent results with an average accuracy of 88.31%, higher than the existing natural language analysis models. This approach can reach 300+ million daily active Twitter users and is highly accessible by low-income populations to promote early depression detection to raise awareness in adolescents and young adults and reveal complementary cues to assist clinical depression diagnosis.
Abstract: We as humans use words with accompanying visual and facial cues to communicate effectively. Classifying facial emotion using computer vision methodologies has been an active research area in the computer vision field. In this paper, we propose a simple method for facial expression recognition that enhances accuracy. We tested our method on the FER-2013 dataset that contains static images. Instead of using Histogram equalization to preprocess the dataset, we used Unsharp Mask to emphasize texture and details and sharpened the edges. We also used ImageDataGenerator from Keras library for data augmentation. Then we used Convolutional Neural Networks (CNN) model to classify the images into 7 different facial expressions, yielding an accuracy of 69.46% on the test set. Our results show that using image preprocessing such as the sharpening technique for a CNN model can improve the performance, even when the CNN model is relatively simple.
Abstract: Neural networks are appealing for many applications since they are able to learn complex non-linear relationships between input and output data. As the number of neurons and layers in a neural network increase, it is possible to represent more complex relationships with automatically extracted features. Nowadays Deep Neural Networks (DNNs) are widely used in Computer Vision problems such as; classification, object detection, segmentation image editing etc. In this work, Facial Emotion Recognition task is performed by proposed Convolutional Neural Network (CNN)-based DNN architecture using FER2013 Dataset. Moreover, the effects of different hyperparameters (activation function, kernel size, initializer, batch size and network size) are investigated and ablation study results for Pooling Layer, Dropout and Batch Normalization are presented.
Abstract: People with speech disorders may rely on augmentative
and alternative communication (AAC) technologies to help them
communicate. However, the limitations of the current AAC
technologies act as barriers to the optimal use of these technologies in
daily communication settings. The ability to communicate effectively
relies on a number of factors that are not limited to the intelligibility
of the spoken words. In fact, non-verbal cues play a critical role in
the correct comprehension of messages and having to rely on verbal
communication only, as is the case with current AAC technology,
may contribute to problems in communication. This is especially true
for people’s ability to express their feelings and emotions, which are
communicated to a large part through non-verbal cues. This paper
focuses on understanding more about the non-verbal communication
ability of people with dysarthria, with the overarching aim of this
research being to improve AAC technology by allowing people
with dysarthria to better communicate emotions. Preliminary survey
results are presented that gives an understanding of how people with
dysarthria convey emotions, what emotions that are important for
them to get across, what emotions that are difficult for them to convey,
and whether there is a difference in communicating emotions when
speaking to familiar versus unfamiliar people.
Abstract: Human pose estimation and tracking are to accurately identify and locate the positions of human joints in the video. It is a computer vision task which is of great significance for human motion recognition, behavior understanding and scene analysis. There has been remarkable progress on human pose estimation in recent years. However, more researches are needed for human pose tracking especially for online tracking. In this paper, a framework, called PoseSRPN, is proposed for online single-person pose estimation and tracking. We use Siamese network attaching a pose estimation branch to incorporate Single-person Pose Tracking (SPT) and Visual Object Tracking (VOT) into one framework. The pose estimation branch has a simple network structure that replaces the complex upsampling and convolution network structure with deconvolution. By augmenting the loss of fully convolutional Siamese network with the pose estimation task, pose estimation and tracking can be trained in one stage. Once trained, PoseSRPN only relies on a single bounding box initialization and producing human joints location. The experimental results show that while maintaining the good accuracy of pose estimation on COCO and PoseTrack datasets, the proposed method achieves a speed of 59 frame/s, which is superior to other pose tracking frameworks.
Abstract: The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.
Abstract: People express emotions through different modalities.
Integration of verbal and non-verbal communication channels creates
a system in which the message is easier to understand. Expanding
the focus to several expression forms can facilitate research on
emotion recognition as well as human-machine interaction. In this
article, the authors present a Polish emotional database composed of
three modalities: facial expressions, body movement and gestures,
and speech. The corpora contains recordings registered in studio
conditions, acted out by 16 professional actors (8 male and 8 female).
The data is labeled with six basic emotions categories, according to
Ekman’s emotion categories. To check the quality of performance,
all recordings are evaluated by experts and volunteers. The database
is available to academic community and might be useful in the study
on audio-visual emotion recognition.
Abstract: Motion recognition from videos is actually a very
complex task due to the high variability of motions. This paper
describes the challenges of human motion recognition, especially
motion representation step with relevant features. Our descriptor
vector is inspired from Laban Movement Analysis method. We
propose discriminative features using the Random Forest algorithm
in order to remove redundant features and make learning algorithms
operate faster and more effectively. We validate our method on
MSRC-12 and UTKinect datasets.
Abstract: Technological and sociological developments in the automotive sector are shifting the focus of design towards developing a better understanding of driver needs, desires and emotions. Human centred design methods are being more frequently applied to automotive research, including the use of systems to detect human emotions in real-time. One method for a non-contact measurement of emotion with low intrusiveness is Facial-Expression Analysis (FEA). This paper describes a research study investigating emotional responses of 22 participants in a naturalistic driving environment by applying a multi-method approach. The research explored the possibility to investigate emotional responses and their frequencies during naturalistic driving through real-time FEA. Observational analysis was conducted to assign causes to the collected emotional responses. In total, 730 emotional responses were measured in the collective study time of 440 minutes. Causes were assigned to 92% of the measured emotional responses. This research establishes and validates a methodology for the study of emotions and their causes in the driving environment through which systems and factors causing positive and negative emotional effects can be identified.
Abstract: Human motion recognition has been extensively increased in recent years due to its importance in a wide range of applications, such as human-computer interaction, intelligent surveillance, augmented reality, content-based video compression and retrieval, etc. However, it is still regarded as a challenging task especially in realistic scenarios. It can be seen as a general machine learning problem which requires an effective human motion representation and an efficient learning method. In this work, we introduce a descriptor based on Laban Movement Analysis technique, a formal and universal language for human movement, to capture both quantitative and qualitative aspects of movement. We use Discrete Hidden Markov Model (DHMM) for training and classification motions. We improve the classification algorithm by proposing two DHMMs for each motion class to process the motion sequence in two different directions, forward and backward. Such modification allows avoiding the misclassification that can happen when recognizing similar motions. Two experiments are conducted. In the first one, we evaluate our method on a public dataset, the Microsoft Research Cambridge-12 Kinect gesture data set (MSRC-12) which is a widely used dataset for evaluating action/gesture recognition methods. In the second experiment, we build a dataset composed of 10 gestures(Introduce yourself, waving, Dance, move, turn left, turn right, stop, sit down, increase velocity, decrease velocity) performed by 20 persons. The evaluation of the system includes testing the efficiency of our descriptor vector based on LMA with basic DHMM method and comparing the recognition results of the modified DHMM with the original one. Experiment results demonstrate that our method outperforms most of existing methods that used the MSRC-12 dataset, and a near perfect classification rate in our dataset.
Abstract: In this study, we have proposed a gesture to emotion recognition method using flex sensors mounted on metacarpophalangeal joints. The flex sensors are fixed in a wearable glove. The data from the glove are sent to PC using Wi-Fi. Four gestures: finger pointing, thumbs up, fist open and fist close are performed by five subjects. Each gesture is categorized into sad, happy, and excited class based on the velocity and acceleration of the hand gesture. Seventeen inspectors observed the emotions and hand gestures of the five subjects. The emotional state based on the investigators assessment and acquired movement speed data is compared. Overall, we achieved 77% accurate results. Therefore, the proposed design can be used for emotional state detection applications.
Abstract: One of the main aims of current social robotic research
is to improve the robots’ abilities to interact with humans. In order
to achieve an interaction similar to that among humans, robots
should be able to communicate in an intuitive and natural way
and appropriately interpret human affects during social interactions.
Similarly to how humans are able to recognize emotions in other
humans, machines are capable of extracting information from the
various ways humans convey emotions—including facial expression,
speech, gesture or text—and using this information for improved
human computer interaction. This can be described as Affective
Computing, an interdisciplinary field that expands into otherwise
unrelated fields like psychology and cognitive science and involves
the research and development of systems that can recognize and
interpret human affects. To leverage these emotional capabilities
by embedding them in humanoid robots is the foundation of
the concept Affective Robots, which has the objective of making
robots capable of sensing the user’s current mood and personality
traits and adapt their behavior in the most appropriate manner
based on that. In this paper, the emotion recognition capabilities
of the humanoid robot Pepper are experimentally explored, based
on the facial expressions for the so-called basic emotions, as
well as how it performs in contrast to other state-of-the-art
approaches with both expression databases compiled in academic
environments and real subjects showing posed expressions as well
as spontaneous emotional reactions. The experiments’ results show
that the detection accuracy amongst the evaluated approaches differs
substantially. The introduced experiments offer a general structure
and approach for conducting such experimental evaluations. The
paper further suggests that the most meaningful results are obtained
by conducting experiments with real subjects expressing the emotions
as spontaneous reactions.
Abstract: Surface electromyographic (sEMG) signal has the potential to identify the human activities and intention. This potential is further exploited to control the artificial limbs using the sEMG signal from residual limbs of amputees. The paper deals with the development of multichannel cost efficient sEMG signal interface for research application, along with evaluation of proposed class dependent statistical approach of the feature selection method. The sEMG signal acquisition interface was developed using ADS1298 of Texas Instruments, which is a front-end interface integrated circuit for ECG application. Further, the sEMG signal is recorded from two lower limb muscles for three locomotions namely: Plane Walk (PW), Stair Ascending (SA), Stair Descending (SD). A class dependent statistical approach is proposed for feature selection and also its performance is compared with 12 preexisting feature vectors. To make the study more extensive, performance of five different types of classifiers are compared. The outcome of the current piece of work proves the suitability of the proposed feature selection algorithm for locomotion recognition, as compared to other existing feature vectors. The SVM Classifier is found as the outperformed classifier among compared classifiers with an average recognition accuracy of 97.40%. Feature vector selection emerges as the most dominant factor affecting the classification performance as it holds 51.51% of the total variance in classification accuracy. The results demonstrate the potentials of the developed sEMG signal acquisition interface along with the proposed feature selection algorithm.
Abstract: The problem of emotion recognition is a challenging problem. It is still an open problem from the aspect of both intelligent systems and psychology. In this paper, both voice features and facial features are used for building an emotion recognition system. A Support Vector Machine classifiers are built by using raw data from video recordings. In this paper, the results obtained for the emotion recognition are given, and a discussion about the validity and the expressiveness of different emotions is presented. A comparison between the classifiers build from facial data only, voice data only and from the combination of both data is made here. The need for a better combination of the information from facial expression and voice data is argued.
Abstract: In this paper we present a system for classifying videos
by frequency spectra. Many videos contain activities with repeating
movements. Sports videos, home improvement videos, or videos
showing mechanical motion are some example areas. Motion of these
areas usually repeats with a certain main frequency and several side
frequencies. Transforming repeating motion to its frequency domain
via FFT reveals these frequencies. Average amplitudes of frequency
intervals can be seen as features of cyclic motion. Hence determining
these features can help to classify videos with repeating movements.
In this paper we explain how to compute frequency spectra for video
clips and how to use them for classifying. Our approach utilizes series
of image moments as a function. This function again is transformed
into its frequency domain.
Abstract: Emotion recognition is an important research field that finds lots of applications nowadays. This work emphasizes on recognizing different emotions from speech signal. The extracted features are related to statistics of pitch, formants, and energy contours, as well as spectral, perceptual and temporal features, jitter, and shimmer. The Artificial Neural Networks (ANN) was chosen as the classifier. Working on finding a robust and fast ANN classifier suitable for different real life application is our concern. Several experiments were carried out on different ANN to investigate the different factors that impact the classification success rate. Using a database containing 7 different emotions, it will be shown that with a proper and careful adjustment of features format, training data sorting, number of features selected and even the ANN type and architecture used, a success rate of 85% or even more can be achieved without increasing the system complicity and the computation time
Abstract: In this paper, we propose a new method to distinguish
between arousal and relaxation states by using multiple features
acquired from a photoplethysmogram (PPG) and support vector
machine (SVM). To induce arousal and relaxation states in subjects, 2
kinds of sound stimuli are used, and their corresponding biosignals are
obtained using the PPG sensor. Two features–pulse to pulse interval
(PPI) and pulse amplitude (PA)–are extracted from acquired PPG
data, and a nonlinear classification between arousal and relaxation is
performed using SVM.
This methodology has several advantages when compared with
previous similar studies. Firstly, we extracted 2 separate features from
PPG, i.e., PPI and PA. Secondly, in order to improve the classification
accuracy, SVM-based nonlinear classification was performed.
Thirdly, to solve classification problems caused by generalized
features of whole subjects, we defined each threshold according to
individual features.
Experimental results showed that the average classification
accuracy was 74.67%. Also, the proposed method showed the better
identification performance than the single feature based methods.
From this result, we confirmed that arousal and relaxation can be
classified using SVM and PPG features.
Abstract: This paper is concerned with motion recognition based fuzzy WP(Wavelet Packet) feature extraction approach from Vicon physical data sets. For this purpose, we use an efficient fuzzy mutual-information-based WP transform for feature extraction. This method estimates the required mutual information using a novel approach based on fuzzy membership function. The physical action data set includes 10 normal and 10 aggressive physical actions that measure the human activity. The data have been collected from 10 subjects using the Vicon 3D tracker. The experiments consist of running, seating, and walking as physical activity motion among various activities. The experimental results revealed that the presented feature extraction approach showed good recognition performance.
Abstract: In modern human computer interaction systems
(HCI), emotion recognition is becoming an imperative characteristic.
The quest for effective and reliable emotion recognition in HCI has
resulted in a need for better face detection, feature extraction and
classification. In this paper we present results of feature space analysis
after briefly explaining our fully automatic vision based emotion
recognition method. We demonstrate the compactness of the feature
space and show how the 2d/3d based method achieves superior features
for the purpose of emotion classification. Also it is exposed that
through feature normalization a widely person independent feature
space is created. As a consequence, the classifier architecture has
only a minor influence on the classification result. This is particularly
elucidated with the help of confusion matrices. For this purpose
advanced classification algorithms, such as Support Vector Machines
and Artificial Neural Networks are employed, as well as the simple k-
Nearest Neighbor classifier.