
2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovations

EmotionSense: Emotion Recognition Based on Wearable Wristband
Bobo Zhao, Zhu Wang, Zhiwen Yu, Bin Guo
School of Computer Science
Northwestern Polytechnical University
Xi’an, China 710029
Email: zhaobobo@mail.nwpu.edu.cn, {wangzhu, zhiwenyu, guob}@nwpu.edu.cn

Abstract—In this paper, we develop an automatic emotion recognition system based on a sensor-enriched wearable wristband. Specifically, in order to obtain physiological data from participants, we first adopt a video induction method which can spontaneously evoke human emotions in a real-life environment. Meanwhile, a questionnaire is designed to record the emotion status of the participants, which is used as the ground truth for emotion recognition. Second, we collect multi-modal physiological signals by utilizing three different biosensors (measuring blood volume pulse, electrodermal activity, and skin temperature) embedded in the Empatica E4 wristband. Furthermore, we extract time-domain, frequency-domain, and nonlinear features from the collected physiological signals, and adopt the sequential forward floating selection (SFFS) method to search for the best emotion-related features. Finally, we classify different emotions with an SVM using the selected features, in terms of arousal, valence, and four emotion categories. An overall accuracy of 76% for 15 participants demonstrates that the proposed system can recognize human emotions effectively.

Keywords—Wearable wristband, emotion recognition, blood volume pulse, electrodermal activity, skin temperature, arousal, valence

I. INTRODUCTION

Human emotions are psycho-physiological experiences that affect every aspect of our daily life. They are triggered by unconscious or conscious perception of something, and are often associated with mood, personality, temperament, disposition, and motivation [1]. Additionally, they are a series of processes directed towards specific internal or external objects or events, which result in changes of both behavior and bodily state (i.e., physiological changes) [2]. In particular, emotional health is closely related to the quality of personal life as well as the security and stability of public communities [3]. A bad emotional state is not only harmful to personal health, but is also likely to be an early sign or cause of serious mental illness. For some groups in particular, e.g., drivers, pilots, and train operators, emotional health can even have a serious influence on public security. Therefore, developing an effective system for recognizing various emotions is very important.

Recently, many studies have focused on designing effective human emotion recognition systems, which identify implicit features contained in human communications, such as facial expressions, gestures, or speech, in different experimental setups [4,5,6]. However, the features of the aforementioned audio/visual emotion channels are usually not adequate for emotion classification, since humans can disguise their emotions through social masking [7]. For example, people may keep a "poker face", or they may not express emotion changes via intuitive body language when they are in the mood [6]. Similarly, using traditional physiological measurements, including electroencephalography (EEG), electromyography (EMG), respiration (RSP), and blood oxygen saturation, for emotion classification has some limitations [8,9,10,11]. First, the data acquisition equipment is medical-grade, which is very expensive and not suitable for daily use. Second, physiological patterns cannot be mapped onto specific emotional states uniquely, because emotions can be influenced by many other factors, e.g., time, context, space, and culture [1]. To this end, this work aims to deal with the above issues by developing an emotion sensing system based on a sensor-enriched wearable wristband. On one hand, the wristband captures physiological signals in a non-invasive manner and is thus suitable for real-life use. On the other hand, fine-grained features are extracted from multi-modal physiological signals, which take full advantage of the information contained in emotion changes and further improve the effectiveness of emotion recognition.

However, recognizing emotions is still a challenging problem for the following reasons:
• The fuzzy boundaries and individual variations of emotion make it difficult to obtain the "ground truth" behind human emotions. If we adopt the methods in [12,13,14], which stimulate emotion with pictures over a short time, it is hard to evoke an explicit emotion.
• As emotions affect several aspects of an individual's physiological signs, e.g., heartbeat, blood pressure, and skin temperature, it is hard to identify emotion patterns by employing one single attribute or attributes from one single physiological measurement. Thus, if we adopt traditional methods [15,16,17], which focus on a single signal feature or on features from one single aspect of emotion

978-1-5386-9380-3/18/$31.00 ©2018 IEEE
DOI 10.1109/SmartWorld.2018.00091
measurement, it is hard to achieve high recognition accuracy.
• Even though two subjects have the same emotion status, their physiological features may differ significantly, since different participants usually have different physiological baselines. Such individual differences within the same group are one of the key obstacles to emotion pattern discrimination, and make it difficult to distinguish emotion patterns by setting thresholds [18].

To tackle the above challenges, we propose a framework based on a wearable wristband, which leverages fine-grained emotion features to identify emotion patterns by systematically exploring multi-modal physiological signals. In particular, the proposed system is suitable for real-life environments and does not need to confine the subject to specific places. The main contributions of this work are as follows:
• We propose an emotion recognition system based on the wearable Empatica E4 wristband. Compared with the methods in [19,20], the proposed approach is light-weight and non-intrusive. Moreover, the wristband can connect with smartphones, which enables context-aware and real-time emotion analysis.
• We extract a set of fine-grained features from multi-modal physiological signals, which are closely related to emotion changes. Moreover, to search for the best emotion-related features, SFFS is used to select feature subsets, and different feature selection and pattern classification methods are tested.
• We evaluate the proposed system using 10-fold cross-validation on a real physiological dataset. Experimental results show that our system achieves a recognition accuracy of 76%, which demonstrates its effectiveness.

The rest of this paper is organized as follows. We first review the related work in Section II. Afterwards, Section III presents an overview of the proposed system, followed by the methodology in Section IV; the experimental evaluation is presented in Section V. Finally, we discuss future work and conclude the paper in Section VI.

II. RELATED WORK

A large number of studies have been conducted in the field of affective computing, and great achievements have been made. In this section, we review the related work in three aspects. The first line of research concerns the categorization of emotions. It is difficult to judge or model human emotions because people express their emotions differently. After years of research, two different models are commonly used. One is the discrete emotion model, in which one must choose from a specific list of word labels to describe emotion states, e.g., joy, fear, sadness, anger, tension, and surprise. For instance, Ekman and Friesen proposed the six basic emotions [21], and a tree-structured emotion model was presented by Parrott [22]. However, one problem with this method is that the stimuli may elicit blended emotions that cannot be adequately expressed in words, since the meanings of the chosen words are too restrictive or culturally dependent [1]. The other is the dimensional model, which categorizes emotions with multiple dimensions or scales instead of discrete words or labels. Arousal and valence are two commonly used scales in emotion classification, mapping all emotions onto the two-dimensional plane shown in Figure 1. In the dimensional model, the valence dimension represents the pleasantness of an emotion (positive or negative); for instance, joy and happiness have positive valence, while fear and sadness have negative valence. The arousal dimension represents the intensity of an emotion (low or high); for instance, sadness and relaxation are low-arousal, while fear and excitement are high-arousal. In this paper, we use the dimensional model to describe human emotions with the labels LANV, LAPV, HANV, and HAPV, as shown in Figure 1. Psychophysiologists are still studying how to categorize human emotions accurately and specifically.

The second line of research focuses on emotion recognition based on physiological signals. Much research has been conducted to recognize human emotion by utilizing physiological data, e.g., EEG, ECG, muscle activity, skin conductivity, and respiration [6,7,10,19]. In the research by Sander Koelstra et al. [19], for example, EEG data was collected from 32 participants while each watched one-minute excerpts of music videos, which were rated in terms of arousal, valence, like/dislike, dominance, and familiarity. An extensive analysis of the participants' ratings is presented, and decision fusion of the classification results is performed. The authors show that EEG data can be used to monitor human emotions. Although this work achieves an accuracy of 75 percent, deploying an EEG device in a daily office setting is not yet realistic. Yu-Liang Hsu et al. adopted a musical induction method to evoke participants' emotion states and collected their ECG signals. A generalized discriminant analysis feature-selection method was used to select significant ECG features, and four types of emotions (joy, tension, sadness, and peacefulness) were classified with the LS-SVM algorithm. They found that induction methods, emotion types, and the number of subjects all influence emotion recognition accuracy [7]. Cornelia Setz et al. presented work in which galvanic skin response was used to distinguish stress from cognitive load. An arithmetic task on a computer was solved by 32 participants to elicit both conditions, and the data was normalized using a baseline period to address individual differences. Leave-one-out cross-validation yielded an accuracy of 82% in distinguishing the two conditions [23]. However, it is difficult to ensure that the same stressful condition is evoked in each participant, since each one reacts differently to the designed tasks and the boundary between stress and workload is fuzzy. The abovementioned studies all achieved high accuracy, which appears acceptable for practical applications. However, the recognition rates depend heavily on the training set of the application.

The third line of research includes studies that recognize human emotions from audio/visual signals, e.g., facial expressions, postures, and speech. Sarode et al. presented work in which facial expressions were collected to recognize human emotions, and proposed a method using a 2D appearance-based local

approach to extract the intrinsic facial features. The Radial Symmetry Transform and edge projection analysis were further used for feature extraction, and an accuracy of 81 percent was achieved for facial expression recognition from grayscale images [24]. In the research by Wu S. et al. [25], modulation spectral features (MSFs) were proposed to automatically recognize human affective information from speech. The authors used a modulation filterbank and an auditory filterbank for speech analysis, capturing both temporal modulation frequency and acoustic frequency components. An overall recognition accuracy of 91 percent is achieved for classifying seven emotion types using the two features. However, as mentioned in the previous section, people may keep a "poker face", or they may not express their emotions via intuitive body language.

Figure 1. Dimensional emotion model

III. SYSTEM OVERVIEW

As illustrated in Figure 2, the system consists of three components: data preparation, feature extraction, and emotion recognition. We first collect physiological data from participants whose emotion changes are triggered by an emotion stimuli corpus, using the biosensor-equipped Empatica E4 wristband. Then, the physiological data is preprocessed to eliminate noise. Next, we extract distinct features from the three different biosensors, and finally use SFFS+SVM to achieve emotion recognition.

Data capture. Emotion is a complex process which consists of a series of reactions. To evoke the subject's emotion changes, an emotion stimuli corpus is constructed before we collect physiological data using the biosensors available in the wearable wristband. At the same time, a questionnaire recording emotion status is prepared for the subject to describe the detailed emotion condition. A concrete description of the experimental setting is given in Section V.

Feature extraction. This component characterizes the multi-modal physiological signals (BVP, EDA, and SKT) and extracts effective features corresponding to the functional relationship between physiological changes and emotion status. For BVP, we analyze heart rate variability (HRV), which reflects the activity of the autonomic nervous system (ANS) directly related to emotion changes. We can then further extract fine-grained features to qualitatively describe emotion from HRV. In addition, the principal frequency component of the BVP signal is analyzed. For EDA, the skin conductance level (SCL) as well as the skin conductance response (SCR) of the subject is studied; in particular, a Butterworth filter is designed to decompose the EDA signal into these two parts. For SKT, we mainly analyze time-domain characteristics of the signal to quantitatively describe its variation. A detailed description of the feature extraction is given in Section IV.

Emotion recognition. To demonstrate the effectiveness of the extracted features, we use them to classify different emotion patterns by learning an SVM classifier.

Figure 2. Overview of the proposed emotion recognition system

IV. METHODOLOGY

This section focuses on the detailed methods used in this paper. We first describe how to eliminate the noise contained in the collected physiological signals. Then, we explain the extracted features in detail, followed by a description of the classification method.

A. Data Pre-processing

Signal noise is an important factor that influences the stability of features and the recognition accuracy. Different types of artifacts are observed in all three signal channels, such as baseline wander caused by the subject's movement, or amplitude shifts generated by individual and instrumental differences. These artifacts lower the signal-to-noise ratio or even cover up the original signal, mostly at the beginning and the end of each recording. Thus, for all subjects and channels, we keep only the middle part of the signals, resulting in a data segment of 300 seconds for each recording.

The EDA signal in particular requires additional preprocessing, including deep smoothing and signal separation, because of electromyographic interference, which is caused by forearm muscle fibrillation and contains irregular fast waveforms. Using an adaptive bandpass filter, we remove these artifacts from the EDA signal. For the other signals, we use pertinent low-pass filters to remove noise without loss of information.
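As a concrete illustration of this pre-processing pipeline, the sketch below trims a recording to its middle 300 s, applies the per-subject min-max normalization of Eq. (1) below, and cuts the result into the fixed 16 s windows used later for feature extraction. This is a simplified pure-Python sketch under our own naming; the paper's actual filtering (adaptive band-pass for EDA, pertinent low-pass filters for the other channels) is replaced here by a crude moving average.

```python
def trim_middle(signal, fs, keep_s=300):
    # Keep only the middle keep_s seconds of a recording, discarding the
    # artifact-prone start and end of each session.
    keep = int(keep_s * fs)
    if len(signal) <= keep:
        return list(signal)
    start = (len(signal) - keep) // 2
    return list(signal[start:start + keep])

def moving_average(signal, win=5):
    # Crude smoothing; a stand-in for the paper's low-pass filtering
    # (a real pipeline might use a Butterworth design from scipy.signal).
    half = win // 2
    return [sum(signal[max(0, i - half):i + half + 1]) /
            len(signal[max(0, i - half):i + half + 1])
            for i in range(len(signal))]

def normalize_0_100(signal):
    # Per-subject min-max normalization, Eq. (1): global minimum -> 0,
    # global maximum -> 100.
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0] * len(signal)
    return [(v - lo) / (hi - lo) * 100.0 for v in signal]

def windows(signal, fs, win_s=16):
    # Non-overlapping fixed windows of N = win_s * fs samples
    # (e.g. N = 64 for the 4 Hz EDA channel); a trailing partial
    # window is dropped.
    n = int(win_s * fs)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

# Toy EDA-like recording: 4 Hz for 320 s.
fs = 4
raw = [0.5 + 0.001 * i for i in range(320 * fs)]
x = trim_middle(raw, fs)        # 1200 samples = 300 s at 4 Hz
x = normalize_0_100(moving_average(x))
segs = windows(x, fs)           # 18 windows of 64 samples each
```

With the paper's sampling rates, the same windowing yields N = 1024 for BVP (64 Hz) and N = 16 for SKT (1 Hz), matching the window sizes stated in Section IV-B.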
Furthermore, to overcome individual differences in the physiological data, we normalize the values of each signal to a number between 0 and 100. Taking the EDA signal as an example, a global minimum and maximum EDA value are obtained for each subject, and the normalized signal is then computed as follows. The same method is also used to normalize the BVP signal.

\[ \mathrm{EDA_{norm}}(i) = \frac{\mathrm{EDA}(i) - \mathrm{EDA}_{\min}}{\mathrm{EDA}_{\max} - \mathrm{EDA}_{\min}} \times 100 \quad (1) \]

B. Feature extraction

After preprocessing the collected physiological signals, we have removed noise such as baseline drift and EMG interference, and obtained reliable physiological data. To extract emotion-related features from these signals, we adopt a window-based method. In particular, a fixed window size of 16 s is used to segment the preprocessed data, so that each trial is divided into several segments, as shown in Figure 3. If we use N to represent the size of a time window, the data can be represented as X = {x_1, x_2, ..., x_N}. The value of N varies with the physiological signal: it is 1024 for the BVP signal, 64 for the EDA signal, and 16 for the temperature signal (BVP 64 Hz, EDA 4 Hz, SKT 1 Hz).

Figure 3. Segmentation of the trials in the experiment

BVP signal: BVP signals are obtained optically using a pulse oximeter embedded in the wearable device, which illuminates the subject's skin with a light-emitting diode (LED) and measures the intensity changes in the light reflected from the skin, forming the BVP signal. In traditional Chinese medicine, pulse diagnosis has a very important value: by analyzing the pulse wave, one can understand the status of the heart and hemodynamics, and then evaluate the entire blood circulation. Similarly, a subject's psychological and mental status changes under emotional stimuli, which further influences cardiac activity. It is therefore possible to characterize one's emotional changes by analyzing cardiovascular activity.

The BVP signal is quasi-periodic, and its typical waveform is shown in Figure 4(a). To obtain the spectral subbands of the BVP signal, we apply a 1024-point fast Fourier transform (FFT) and partition the coefficients in the frequency range 0~10 Hz into three non-overlapping subbands of unequal bandwidth. On the basis of the normal heart rate range, 50~120 beats per minute, we set 0~0.7 Hz as the low-frequency (LF) band, 0.7~2 Hz as the middle-frequency (MF) band, and 2~10 Hz as the high-frequency (HF) band, as shown in Figure 4(b). As features, the mean power values of each subband and the fundamental frequency are first calculated by finding the maximum magnitude in the spectrum, denoted as P_power_LF, P_power_MF, and P_power_HF, respectively. Then, to capture the peaks and their locations in each subband, represented as P_peak_LF, P_peak_MF, and P_peak_HF respectively, we compute the subband spectral entropy (SSE) of each subband. Entropy plays an important role in information theory as a measure of the disorganization or uncertainty in a random variable, and is often used to measure the degree of a classifier's confidence in pattern recognition. To compute the SSE, it is necessary to convert each spectrum into probability-mass-function form. Equation (2) is designed to normalize the spectrum:

\[ x_i = \frac{X_i}{\sum_{j=1}^{N} X_j}, \quad i = 1, \ldots, N \quad (2) \]

Here X_i represents the energy of the i-th frequency component of the spectrum, and x = {x_1, x_2, ..., x_N} is the resulting PMF of the spectrum. Finally, the SSE is computed from x using the following equation:

\[ H_{sub} = -\sum_{i=1}^{N} x_i \log_2 x_i \quad (3) \]

(a) Original BVP signal
(b) FFT of BVP with different frequency subbands
Figure 4. Original BVP signal and FFT analysis

HRV analysis: Heart rate variability (HRV) is one of the most often used measures in BVP analysis, and contains abundant information about the status of the autonomic nervous system [9]. To obtain the HRV from the continuous BVP signal, we design an algorithm named dynamic threshold difference (DTD) to detect heartbeats, and use cubic spline interpolation to smooth the HRV curve. The processing flow of the HRV analysis is shown in Figure 5. The main steps of the DTD algorithm are initial threshold setting, main crest detection, and threshold updating. The advantage of DTD is that the threshold for peaks

is not fixed, but varies with each detection. In Algorithm 1, C represents the time interval between two adjacent peaks in segment S_i, and H denotes the maximum amplitude of segment S_i. After removing the maximum and minimum among the five segments, we set the average of the remaining C values as the initial difference threshold C_0; H_0 is computed by the same process. Usually the interval between two adjacent peaks is not fixed but varies up and down. Here, we define a difference as reasonable when it is larger than 0.7 C_0, and an amplitude as reasonable when the peak is larger than 0.5 H_0 and smaller than 1.5 H_0. If a selected peak satisfies the above rules, we consider it the main crest of a heartbeat cycle. Finally, we adjust the threshold according to the previous five main crests.

Figure 5. Flow chart of HRV processing

Algorithm 1: Dynamic Threshold Difference (DTD)
Input: BVP time series X = {x_1, x_2, ..., x_n}, n = 1024.
Output: HRV time series Y = {y_1, y_2, ..., y_m}.
Step 1 (Initial threshold setting): Divide the BVP time series into five equal parts S_1, ..., S_5, and compute the maximal difference and maximal amplitude of each. Remove the maximum and minimum values among the five, then compute
  C_0 = (C_1 + C_2 + C_3) / 3,  H_0 = (H_1 + H_2 + H_3) / 3.
Define the difference threshold Th_1 = 0.7 C_0, the amplitude lower limit Th_2 = 0.5 H_0, and the amplitude upper limit Th_3 = 1.5 H_0.
Step 2 (Main crest detection): Select three consecutive peaks f_i, f_{i+1}, f_{i+2}. If f_{i+1} - f_i > Th_1 and f_{i+2} - f_{i+1} > Th_1, take the four successive peak points f_k, f_{k+1}, f_{k+2}, f_{k+3}; otherwise, reselect. If f_{k+1} - f_k > 0, f_{k+2} - f_{k+1} > 0, and f_{k+3} - f_{k+2} < 0, select f_{k+2} as the main crest.
Step 3 (Threshold updating): Take the maximal difference from f_{i+1} to f_{k+2}, denoted C_new, and the corresponding amplitude H_new. Update
  C_0 = (C_2 + C_3 + C_new) / 3,  H_0 = (H_2 + H_3 + H_new) / 3,
recompute Th_1, Th_2, Th_3, and return to Step 2 until all main crests are found.

In the time-domain analysis of the HRV time series, we calculate a set of statistical features, including the standard deviation of the NN intervals (H_SDNN), the mean value (H_meanValue), the maximum value (H_maxValue), the minimum value (H_minValue), the number of successive NN intervals that differ by more than 50 ms (H_NN50), and the standard deviation of the first derivative of the HRV (H_STDD). The equation for H_SDNN is as follows:

\[ H\_SDNN = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (R_i - \bar{R})^2} \quad (4) \]

where R_i is the i-th NN interval, \bar{R} is the average of the NN intervals, and n is the number of NN intervals.

The frequency domain of the HRV time series is also analyzed, and HRV-related parameters are calculated in three frequency bands. Before the fast Fourier transform (FFT) of the HRV time series, we interpolate the NN-interval signal to prevent the generation of additional harmonic components. Then, the power spectral density of the signal is analyzed to calculate the power in specific frequency ranges and the peak frequency of three frequency bands: the very-low-frequency band (VLF, 0.003-0.04 Hz), the low-frequency band (LF, 0.04-0.15 Hz), and the high-frequency band (HF, 0.15-0.4 Hz). Specifically, we extract the following frequency-domain features: the mean power of the VLF, LF, and HF bands; the ratio of the power in the LF band to that in the HF band (LF/HF); and the frequency of the highest peak in the VLF band (H_peak_VLF), in the LF band (H_peak_LF), and in the HF band (H_peak_HF). Furthermore, we also include the Poincare geometry in the feature set, in order to capture the nature of inter-beat-interval fluctuations. As shown in Figure 6, the Poincare plot graphs each RR interval against the next interval, and provides quantitative information about heart activity through the standard deviations of the distances: SD1 represents the standard deviation of the instantaneous beat-to-beat NN-interval variability, SD2 represents the standard deviation of the continuous long-term beat-to-beat NN-interval variability, and SD12 is the ratio of SD1 to SD2.

Figure 6. Poincare plot of HRV time series

EDA signal: Electrodermal activity (EDA) is one of the most often used measurements for capturing the affective status of users, especially differences in arousal. Recent studies have shown that the magnitude of electrodermal change and the intensity of emotional experience are almost linearly associated along the arousal dimension. EDA is obtained by measuring the voltage between two electrodes across which a low-level current is applied. There are two important EDA features: one is the skin
conductance level (SCL), which indicates the basic physiological level of the subject; the other is the skin conductance response (SCR), which is considered useful as it signifies a response to external stimuli. In this paper, we use a third-order Butterworth filter with a cut-off frequency of 0.5 Hz to decompose the EDA signal into these two components, as shown in Figure 7. Figure 7(a) shows the EDA signals of five different subjects under the same stimulus condition; the five waveforms lie at different scales, due to differences in basic physiological status. In addition, the physiological signals of one subject under different stimulus conditions are also collected; the result is shown in Figure 7(b).

The features we extract from the EDA signal are as follows: the mean value of the SCL (E_mean_SCL), the standard deviation of the SCL (E_std_SCL), the mean value of the SCR (E_mean_SCR), the standard deviation of the SCR (E_std_SCR), and the number of SCR occurrences (E_num_SCR), detected by finding two consecutive zero-crossings, from negative to positive and from positive to negative.

(a) EDA of different participants under the same stimulus.
(b) EDA of the same participant under different stimuli.
Figure 7. EDA under different conditions.

SKT signal: Skin temperature indicates the thermal response of the human skin. Variations in the SKT mainly come from localized changes in blood flow, caused by vascular resistance or arterial blood pressure. Local vascular resistance is modulated by smooth muscle tone, which is mediated by the sympathetic nervous system. The mechanism of arterial blood pressure variation can be described by a complicated model of cardiovascular regulation by the autonomic nervous system. Thus, SKT variation reflects the activity of the autonomic nervous system and is another effective measurement of emotional status. In this paper, the maximum, minimum, mean, and standard deviation values within a time interval of 16 s are used as the SKT features, i.e., T_maxValue, T_minValue, T_average, and T_STD.

C. Feature selection

We have described the features extracted from various domains in the previous section. Although we try to obtain features closely related to emotion changes, there may exist features that contribute little to the differentiation of emotion types. To achieve better emotion classification performance, provide good generalization, and reduce the computational complexity of successive steps, we retain only the features that contribute significantly to the classification performance. Specifically, both information gain (IG) and sequential forward floating selection (SFFS) are employed to measure the effectiveness of candidate features. First, features with an IG smaller than 0.1 are discarded. SFFS is then applied to select the best features for classification. SFFS performs a heuristic depth-first search on the feature space: starting with an empty set, a feature is selected from the unselected features and added to the feature subset so as to optimize the evaluation function; then, a feature whose removal optimizes the evaluation function is taken out of the subset. There are also many other feature selection algorithms; in order to examine the performance of the selected method, we compare different feature selection algorithms.

D. Emotion recognition

In this paper, we implement the SVM algorithm for emotion pattern classification. SVM is a practical classification method with excellent nonlinear mapping and generalization ability, which makes it widely used in many applications.

Consider a data set T with n samples:

\[ T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}, \quad x_i \in \mathbb{R}^d, \; y_i \in \{\pm 1\}. \]

The problem of finding a linear classifier for given data points with known class labels can be described as finding a separating hyperplane w^T x + b = 0 that satisfies

\[ y_i (w^T x_i + b) \geq 1, \quad i = 1, 2, \ldots, n \]

where x_i and y_i represent a feature vector and its class label, respectively. The problem can then be expressed as finding the Lagrange multipliers \alpha_i of

\[ \max_{\alpha} W(\alpha) = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{n} \alpha_i \left( y_i (w^T x_i + b) - 1 \right), \]

subject to

\[ \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \geq 0, \quad 1 - y_i (w^T x_i + b) \leq 0, \quad i = 1, 2, \ldots, n. \]

To solve the multi-class question, the SVM algorithm adopts a preliminary nonlinear mapping to a higher-dimensional feature space via a kernel function. Its decision rule is formulated as follows:

\[ f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right) \]

where b = y_j - \sum_{i=1}^{n} y_i \alpha_i K(x_i, x_j) and K(x_i, x) is a kernel function. In this paper, we use the radial basis function (RBF) kernel to project the data into a higher-dimensional space. In addition, the
least squares method is utilized to compute the separating hyperplane.

V. EXPERIMENT EVALUATION

In this section, we first report the experimental settings, including the subjects we recruited, the stimuli corpus we made, and the experimental protocol. Then, the evaluation results of the proposed system are described.

A. Experiment Setup

To capture the relationship between physiological signals and different emotion states, we made a special emotion stimuli corpus to evoke emotions, which consists of several different types of film clips, such as comedy, documentary, horror, and war films. These film clips contain both scenes and audio, which can immerse the participants in real-life scenarios and elicit strong subjective and physiological changes. When constructing the stimuli corpus, we mainly considered Chinese film clips, because native cultural factors may affect the elicitation of human emotion. The film clips are selected by the participants themselves, so that they can truly induce the four types of emotions corresponding to the four quadrants of the 2D emotion model. In addition, the criteria we used for selecting film clips are as follows: (1) the whole experiment should not last too long, lest it cause visual fatigue in the participants; (2) the video clips should be easy to understand; (3) each video clip should evoke a single desired target emotion. Finally, we chose 20 film clips with a total duration of about 2 hours. In particular, these 20 film clips consist of 4 happiness clips, 3 sadness clips, 3 fear clips, 3 anger clips, 4 sensation clips, and 4 neutral clips, each of which is cut into time slices of 5 minutes.

In this paper, 15 subjects (9 males and 6 females; age range: 22-28 years old, mean: 24.35, std: 3.52) were recruited for the experiment. All of them are graduate students, and none reported cardiovascular, neurological, epileptic, or hypertension disease. The volunteers were requested not to consume caffeine or salty or fatty food for one hour before the experiment. Moreover, they had not taken any somatic drugs, which could have an important impact on the physiological response.

The experiment was performed in a quiet laboratory room. We measure the subject's physiological signals using a wearable device, the Empatica E4 wristband, which embeds a photoplethysmography sensor (64 Hz), an electrodermal activity sensor (4 Hz), and a temperature sensor (1 Hz). The data-collecting experiment involves the following stages: first, we prepare the experiment environment, and the subject puts on the E4 wristband 10 minutes before playing the film clip, so that she can calm down and be as relaxed as possible during the experiment. Next, we play the first video clip to evoke the subject's emotion, and collect the subject's physiological signal

data set with 900 samples, each of which consists of the sensor recording as well as the ground truth provided by the questionnaire.

B. Feature evaluation

In the experiment, we first evaluate the effectiveness of the extracted features for emotion recognition. Specifically, we analyze the feature distributions of the participants under different emotion states. Based on the feature distributions, we use information gain (IG) for further feature evaluation; the result is shown in Table 1. Setting the threshold to 0.1, the IG of the features P_power_HF, H_maxValue, H_minValue, H_power_VLF, T_maxValue, and T_minValue falls below the threshold, and these features are therefore not used for emotion classification.

Table 1. A summary of the features extracted in this paper

Feature       Average       IG
P_peak_LF     5.06±2.43     0.113
P_peak_MF     24.37±6.25    0.119
P_peak_HF     7.42±3.64     0.156
P_power_LF    42.44±10.86   0.145
P_power_MF    45.90±9.42    0.264
P_power_HF    35.63±8.65    0.041
H_meanValue   0.76±0.32     0.137
H_SDNN        0.13±0.08     0.235
H_maxValue    0.91±0.15     0.042
H_minValue    0.61±0.24     0.011
H_STDD        2.38±0.39     0.164
H_SD1         4.23±2.27     0.188
H_SD2         8.97±3.65     0.143
H_SD12        0.63±0.22     0.201
H_NN50        4.12±2.68     0.203
H_power_VLF   5.65±4.32     0.035
H_power_LF    8.36±5.48     0.154
H_power_HF    7.23±4.16     0.176
E_mean        0.54±0.32     0.138
E_mean_SCL    0.48±0.37     0.214
E_std_SCL     0.35±0.58     0.195
E_num_SCR     38.63±13.52   0.187
E_mean_SCR    0.02±0.04     0.114
E_std_SCR     0.01±0.02     0.161
T_maxValue    34.21±3.21    0.067
T_minValue    27.99±4.38    0.058
T_average     32.94±2.64    0.107
T_STD         1.05±0.89     0.203

C. Emotion recognition
using the wristband. When the video clip is over, the subject will
In this paper, the performance of classifiers is evaluated by
have a little break of 60s, and during this time she can fill in the
means of the correct classification ratio (CCR), which can be
questionnaire to describe arousal (low, high) and valence (low,
TP  TN
high) of current emotion status. Then, we repeat the previous calculated as 100% . Meanwhile, to get robust
TP  TN  FP  FN
process to play the video clip continuously with an interval of
verification result, 10 fold cross-validation is applied.
60s, until the whole experiment is finished. Finally, we obtain a
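For concreteness, the CCR can be computed directly from the confusion counts of a binary classifier; the counts in this sketch are hypothetical, not values from the experiments:

```python
def ccr(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct classification ratio: share of correctly labeled instances, in %."""
    return (tp + tn) / (tp + tn + fp + fn) * 100.0

# Hypothetical confusion counts for a binary split such as high vs. low arousal.
print(ccr(tp=350, tn=370, fp=80, fn=100))  # 80.0
```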
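The 10-fold protocol can likewise be sketched as a plain index split; this is a generic illustration of k-fold partitioning, not the exact partitioning code used in the evaluation:

```python
import random

def kfold_indices(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        # Fold i is held out for testing; the other k-1 folds form the train set.
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

# With 900 instances, each of the 10 test folds holds 90 instances,
# and every instance is tested exactly once.
print([len(test) for _, test in kfold_indices(900)])
```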
Table 2. Classification results of four types of emotions, arousal, and valence

Participant ID    CCR (%)
                  Four types of emotions    Arousal    Valence
#1                65.00                     71.67      66.67
#2                73.33                     73.33      75.00
#3                75.00                     80.00      76.67
#4                66.67                     71.67      70.00
#5                78.33                     85.00      81.67
#6                76.67                     81.67      78.33
#7                73.33                     78.33      75.00
#8                78.33                     76.67      80.00
#9                83.33                     85.00      86.67
#10               81.67                     83.33      78.33
#11               76.67                     83.33      71.67
#12               78.33                     81.67      83.33
#13               75.00                     70.00      75.00
#14               76.67                     78.33      71.67
#15               78.33                     85.00      78.33
All               75.56                     82.89      75.56

Table 3. The confusion matrix of recognition results

                 LANV     LAPV     HANV     HAPV     Classification overall    Precision (%)
LANV             174      25       7        17       223                       78.03
LAPV             29       163      23       27       242                       67.36
HANV             7        19       186      24       236                       78.81
HAPV             15       18       9        157      199                       78.89
Truth overall    225      225      225      225      900
Accuracy (%)     77.33    72.44    82.67    69.78

1) Classification using SFFS+SVM

First, we applied SFFS+SVM to all 900 instances to classify the four emotion types, which indicates the overall performance of our emotion recognition system. Then, we tried to differentiate the emotions along the two axes of the 2D emotion model, arousal and valence. Instances of the four emotions were divided into groups of high arousal and low arousal, and into groups of negative valence and positive valence. By using this binary classification strategy, we can analyze the response of the subjects along the arousal and valence dimensions, respectively. Table 2 shows how the CCR changes from subject to subject. We can see that, for arousal classification, subject 9 achieves the highest CCR of 85.00%, much higher than the lowest CCR of 70.00% (subject 13), and the overall CCR is 78.89%. For valence classification, subject 9 achieves the best CCR of 86.67%, much higher than the lowest CCR of 66.67% (subject 1), and the overall CCR is 75.56%. It is evident that the CCR of arousal classification is higher than that of valence classification. The reason might be that human emotion changes are more sensitive along the arousal dimension. Moreover, we also evaluated the performance on the four kinds of emotions, and the confusion matrix is shown in Table 3, which indicates that the CCR varies across labels. For example, the best accuracy is 82.67% for HANV and the lowest is 69.78% for HAPV. In addition, we can see that most classification errors for HAPV are false classifications into LAPV and HANV, while an extreme uncertainty can be found between LANV and LAPV. The overall classification accuracy for the four emotions is 75.56%, as shown in Table 2.

From the results of the cross-participant model (i.e., "All" participants in Table 2) and the participant-independent emotion recognition model, we find that the performance of the model across participants is worse than that of a single-participant model. In general, we intend to train a model based on wearable-wristband data from a set of participants and perform inference on new data from other, unknown participants. However, this is technically challenging due to individual differences, even though the emotion patterns of different individuals share some common characteristics. This is the reason why the CCR of classifiers trained and tested on each individual participant is higher than that of a classifier trained on all the participants.

2) CCR variation with different numbers of features

In order to evaluate the relationship between emotion recognition accuracy and the number of features, and to find the best emotion-related features, we rank all the features using SFFS+SVM, where the number of features is increased from 2 to 28 by iteratively adding 2 features each time. Figure 8 summarizes the CCR of arousal classification, valence classification and four-emotion classification as the number of features increases. The result shows that the features obtained from all three biosensors by time, frequency and nonlinear analysis are effective. More particularly, it is obvious that the CCR increases with the number of selected features; the CCR of arousal reaches its maximum with 14 features, the CCR of valence with 18 features, and that of the four emotions with 16 features.

Table 4 shows the most emotion-related features for the three classification problems. For the arousal classification, relatively fewer features are used, while it achieves higher recognition accuracy compared with the other two classification problems. In addition, most of the features are HRV-related, which indicates that HRV variation is more sensitive to arousal classification. For valence classification, the EDA-related features E_std_SCL, E_mean_SCL, E_mean_SCR and E_std_SCR are more useful.

Figure 8. CCR varies with the number of features
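The per-class figures in Table 3 follow mechanically from the matrix: precision divides each diagonal entry by its classification-row total, while the accuracy row divides it by the truth-column total. A short check, using the counts from Table 3:

```python
# Confusion matrix from Table 3 (rows: classified label, columns: true label).
labels = ["LANV", "LAPV", "HANV", "HAPV"]
matrix = [
    [174, 25, 7, 17],
    [29, 163, 23, 27],
    [7, 19, 186, 24],
    [15, 18, 9, 157],
]

for i, name in enumerate(labels):
    precision = matrix[i][i] / sum(matrix[i]) * 100            # along the classified row
    accuracy = matrix[i][i] / sum(row[i] for row in matrix) * 100  # down the true column
    print(f"{name}: precision {precision:.2f}%, accuracy {accuracy:.2f}%")
```

The printed values reproduce the Precision and Accuracy entries of Table 3, e.g. 78.03% and 77.33% for LANV.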
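The ranking procedure in 2) is, at its core, a greedy forward pass; the floating variant additionally revisits and removes features after each addition. The sketch below uses a hypothetical additive relevance score in place of the SVM cross-validation accuracy that actually scores candidate subsets, so it only illustrates the selection loop:

```python
from typing import Callable, List, Sequence

def forward_select(n_features: int, k: int,
                   score: Callable[[Sequence[int]], float]) -> List[int]:
    """Greedily add the feature whose inclusion maximizes `score`
    until k features are selected (the forward pass of SFFS)."""
    selected: List[int] = []
    while len(selected) < k:
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected

# Hypothetical per-feature relevance values; with an additive subset score
# the loop simply picks the k most relevant features, highest first.
relevance = [0.113, 0.041, 0.264, 0.137, 0.011]

def subset_score(subset: Sequence[int]) -> float:
    return sum(relevance[f] for f in subset)

print(forward_select(5, 3, subset_score))  # [2, 3, 0]
```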
Table 4. Most emotion-related features

Classification    Best emotion-related features
Arousal           P_peak_MF, P_peak_HF, P_power_MF, H_meanValue, H_SD1, H_SD2, H_power_LF, E_mean_SCL, T_average
Valence           P_peak_LF, P_peak_MF, P_power_MF, H_meanValue, H_SD1, E_std_SCL, E_mean_SCL, E_mean_SCR, E_std_SCR, T_STD
Four emotions     P_peak_LF, P_peak_MF, P_peak_HF, P_power_MF, H_meanValue, H_SD1, H_SD2, H_power_LF, E_num_SCR, E_mean_SCL, E_mean_SCR, T_STD, T_average

3) Comparison of different feature-selection methods

In this paper, we also tested different feature selection methods: principal component analysis (PCA) and Information Gain. PCA is a linear technique for dimensionality reduction, which performs a linear mapping of the data to a lower dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. Information Gain can be used to define a preferred sequence of attributes according to the change of information entropy from a prior state to a state that takes some information. Table 5 shows the results of the different feature selection methods. We can see that SFFS performs best among the three feature selection methods, while Information Gain has the lowest CCR when using the SVM algorithm. It is surprising that PCA outperforms the other two feature selection methods when using Random Forest.

Table 5. Performance of different feature-selection methods (CCR %)

Feature-selection method    Random forest    Neural Network    SVM      Naïve Bayes
SFFS                        73.61            73.92             75.56    71.38
Information Gain            71.19            72.37             70.56    70.33
PCA                         75.42            71.31             73.25    70.69

4) Comparison of different classification methods

The classification performance of different classifiers, i.e., Random Forest, Neural Network, SVM, and Naïve Bayes, is compared in this work. Considering the individual differences that commonly exist in physiological features, all the classifiers are evaluated via leave-one-out cross-validation. Figure 9 shows the classification results for arousal, valence, and the four emotions using the different classifiers. We can see that SVM achieves the best performance in distinguishing arousal, valence, and the four emotions, which indicates that SVM is more suitable for our dataset. The other classifiers have quite similar performance, especially Random Forest and Neural Network, while Naïve Bayes has the lowest CCR among all the classifiers.

Figure 9. Comparison of different classifiers

VI. CONCLUSIONS

In this paper, we presented an emotion recognition system based on a wearable wristband, exploring the multi-modal physiological signals collected by the embedded multi-mode biosensors and achieving an average accuracy of 75.56%. In order to obtain a real-world dataset from subjects, we designed an emotion stimuli corpus to evoke human emotions, and performed the experiment in a real-life environment. Meanwhile, a questionnaire was designed to record the emotion status of the subjects, which was used as the ground-truth. Furthermore, we analyzed the collected multi-modal physiological signals to extract features from different domains, including the time domain, the frequency domain, and nonlinear analysis. A total of 28 features were extracted from the three biosensors of BVP, EDA, and SKT. In addition, the feature selection method SFFS was used to search for the best emotion-related features. Finally, we classified different emotions based on SVM using the selected features in terms of arousal, valence, and the four emotions. The results demonstrated that our system can detect the status of LANV, LAPV, HANV, and HAPV with high accuracy.

In the future, we intend to extend our work in two directions. First, we plan to jointly use other modalities to interpret human emotion states, such as audiovisual information, i.e., facial expression, gesture, voice and contextual information of human communication, which will help us understand human emotions more accurately. Second, we will combine our system with an Android cellphone to recognize emotions in a real-time environment.

ACKNOWLEDGMENT

This work was supported in part by the National Key R&D Program of China (No. 2016YFB1001400), the National Natural Science Foundation of China (No. 61332005, 61772428), the Innovative Talents Promotion Program of Shaanxi Province (No. 2018KJXX-011), and the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University (No. ZZ2018168).

REFERENCES

[1] Kim, Jonghwa, and Elisabeth André. "Emotion recognition based on physiological changes in music listening." IEEE Transactions on Pattern Analysis and Machine Intelligence 30.12 (2008): 2067-2083.
[2] Jang, Eun-Hye, et al. "Analysis of physiological signals for recognition of boredom, pain, and surprise emotions." Journal of Physiological Anthropology 34.1 (2015): 25.
[3] Dai, Yixiang, et al. "Reputation-driven multimodal emotion recognition in wearable biosensor network." Instrumentation and Measurement Technology Conference (I2MTC), 2015 IEEE International. IEEE, 2015.
[4] D. Kulić and A. Croft, "Affective state estimation for human-robot interaction," IEEE Trans. Robotics, vol. 23, no. 5, pp. 991-1000, 2007.
[5] P. Rainville, A. Bechara, N. Naqvi, and A. R. Damasio, "Basic emotions are associated with distinct patterns of cardiorespiratory activity," International Journal of Psychophysiology, vol. 61, pp. 5-18, 2006.
[6] G. Rigas, C. D. Katsis, G. Ganiatsas, and D. I. Fotiadis, "A user independent, biosignal based, emotion recognition method," in Proc. 11th Int'l Conf. User Modeling, pp. 314-318, 2007.
[7] Hsu, Yu-Liang, et al. "Automatic ECG-based emotion recognition in music listening." IEEE Transactions on Affective Computing (2017).
[8] A. Kleinsmith and N. Bianchi-Berthouze, "Affective body expression perception and recognition: A survey," IEEE Trans. Affective Computing, vol. 4, no. 1, pp. 15-33, 2013.
[9] K. Wac and C. Tsiourti, "Ambulatory assessment of affect: Survey of sensor systems for monitoring of autonomic nervous systems activation in emotion," IEEE Trans. Affective Computing, vol. 5, no. 3, pp. 251-272, 2014.
[10] R. Bailón, L. Sörnmo, and P. Laguna, "A robust method for ECG-based estimation of the respiratory frequency during stress testing," IEEE Trans. Biomedical Engineering, vol. 53, no. 7, pp. 1273-1285, 2006.
[11] M. Orini, R. Bailón, R. Enk, S. Koelsch, L. Mainardi, and P. Laguna, "A method for continuously assessing the automatic response to music-induced emotions through HRV analysis," Medical & Biological Engineering & Computing, vol. 48, pp. 423-433, 2010.
[12] Kuppens, Peter, et al. "The relation between valence and arousal in subjective experience."
[13] Quirin, Markus, Miguel Kazén, and Julius Kuhl. "When nonsense sounds happy or helpless: The implicit positive and negative affect test (IPANAT)." Journal of Personality and Social Psychology 97.3 (2009): 500.
[14] Schaaff, Kristina, and Tanja Schultz. "Towards emotion recognition from electroencephalographic signals." Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on. IEEE, 2009.
[15] R. Bailón, L. Sörnmo, and P. Laguna, "A robust method for ECG-based estimation of the respiratory frequency during stress testing," IEEE Trans. Biomedical Engineering, vol. 53, no. 7, pp. 1273-1285, 2006.
[16] M. Orini, R. Bailón, R. Enk, S. Koelsch, L. Mainardi, and P. Laguna, "A method for continuously assessing the automatic response to music-induced emotions through HRV analysis," Medical & Biological Engineering & Computing, vol. 48, pp. 423-433, 2010.
[17] M. D. van der Zwaag, J. H. Janssen, and J. H. D. M. Westerink, "Directing physiology and mood through music: Validation of an affective music player,
[18] M. Kusserow, O. Amft, and G. Tröster, "Modeling arousal phases in daily living using wearable sensors," IEEE Trans. Affective Computing, vol. 4, no. 1, pp. 93-105, 2013.
[19] Koelstra, Sander, et al. "DEAP: A database for emotion analysis; using physiological signals." IEEE Transactions on Affective Computing 3.1 (2012): 18-31.
[20] Cabibihan, John-John, and Sushil Singh Chauhan. "Physiological responses to affective tele-touch during induced emotional stimuli." IEEE Transactions on Affective Computing 8.1 (2017): 108-118.
[21] Sarode N, Bhatia S. "Facial expression recognition." International Journal on Computer Science and Engineering, 2010, 2(5): 1552-1557.
[22] Wu S, Falk T H, Chan W Y. "Automatic speech emotion recognition using modulation spectral features." Speech Communication, 2011, 53: 768-785.
[23] Setz, Cornelia, et al. "Discriminating stress from cognitive load using a wearable EDA device." IEEE Transactions on Information Technology in Biomedicine 14.2 (2010): 410-417.
[24] Sarode N, Bhatia S. "Facial expression recognition." International Journal on Computer Science and Engineering, 2010, 2(5): 1552-1557.
[25] Wu S, Falk T H, Chan W Y. "Automatic speech emotion recognition using modulation spectral features." Speech Communication, 2011, 53: 768-785.