Yongqiang Yin, Xiangwei Zheng, Bin Hu, Yuang Zhang, Xinchun Cui
PII: S1568-4946(20)30892-9
DOI: https://doi.org/10.1016/j.asoc.2020.106954
Reference: ASOC 106954
Please cite this article as: Y. Yin, X. Zheng, B. Hu et al., EEG emotion recognition using fusion
model of graph convolutional neural networks and LSTM, Applied Soft Computing Journal (2020),
doi: https://doi.org/10.1016/j.asoc.2020.106954.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the
addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive
version of record. This version will undergo additional copyediting, typesetting and review before it
is published in its final form, but we are providing this version to give early visibility of the article.
Please note that, during the production process, errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
EEG Emotion Recognition Using Fusion Model of Graph Convolutional Neural Networks and LSTM
a School of Information Science and Engineering, Shandong Normal University, Jinan,
China
b Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology,
Jinan, China
c School of Computer Science, Qufu Normal University, Rizhao, China
Abstract
In recent years, graph convolutional neural networks have become a research focus and inspired new ideas for emotion recognition based on EEG. Deep learning has been widely used in emotion recognition, but it is still challenging to construct models and algorithms in practical applications. In this paper, we propose a novel emotion recognition method based on a novel deep learning model (ERDL). Firstly, EEG data is calibrated by 3 s baseline data and divided into segments with a 6 s time window, and then differential entropy is extracted from each segment to construct a feature cube. Secondly, the feature cube of each segment serves as the input of the novel deep learning model, which fuses a graph convolutional neural network (GCNN) and long short-term memory neural networks (LSTM). In the fusion model, multiple GCNNs are applied to extract graph-domain features, LSTM cells are used to memorize the change of the relationship between two channels within a specific time and to extract temporal features, and a Dense layer is used to attain the emotion classification results.
Keywords: EEG, emotion recognition, long short-term memory neural network, graph convolutional neural network, differential entropy
1. Introduction
Emotions play an important role in daily life and influence the perception of the surroundings. Recently, many human-computer interaction systems have been established by research communities at home and abroad, so the automatic classification of emotional states has become indispensable. This can be achieved with a variety of methods, such as subjective self-reporting and neurophysiological measurements. In recent years, electroencephalography (EEG) based emotion recognition has received widespread attention because it is a simple, cheap, portable, and easy-to-use emotion classification method [1]. EEG signals record the relationship between emotional state and brain activity and reflect very subtle emotional changes with high time resolution [2]. However, EEG signals have shortcomings [3] such as time asymmetry and instability, a low signal-to-noise ratio, and uncertainty about the brain areas of specific reactions. Therefore, EEG-based emotion recognition is still a challenging task.
Many researchers have proposed their methods for emotion recognition using EEG.
In this paper, we propose an emotion recognition method based on a deep learning model which fuses a graph convolutional neural network and long short-term memory neural networks. In the fusion model, multiple GCNNs are applied to extract graph-domain features, LSTM cells are used to memorize the change of the relationship between two EEG channels within a specific time and to extract temporal features, and a Dense layer is used to attain the emotion classification result.
At last, we conducted extensive experiments on the DEAP dataset, and the experimental results demonstrate that the proposed method has better classification performance than the state-of-the-art methods. We attained average classification accuracies of 90.45% and 90.60% on the DEAP dataset for valence and arousal in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.
The main contributions of this paper are as follows:
- LSTM cells' gates are used to extract effective information from the input (the output of GCNNs) for emotion classification. The fusion of GCNN and LSTM improves the effectiveness of emotion recognition.
- Extensive experiments on the DEAP dataset show that the proposed method outperforms the state-of-the-art methods. Average accuracies of 90.45% and 90.60% for valence and arousal are achieved in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.
The remainder of this paper is as follows. We briefly review EEG features, GCNN, LSTM and emotion recognition methods on the DEAP dataset in Section 2. In Section 3, we present the proposed emotion recognition method and its key components, including DE and the fusion model of GCNN and LSTM and its architecture. In Section 4, we introduce the DEAP dataset adopted in the experiments and the evaluation indicators, and analyze the experimental results of ECLGCNN on DEAP in detail. We conclude the paper and discuss future work in Section 5.
2. Related work
2.1. EEG features
Time domain features aim to capture the time domain information of EEG signals. Frequency domain features aim to capture EEG emotional information from a frequency perspective. Feature extraction in the frequency domain consists of two steps. The first step is to decompose the EEG signal into several frequency bands, including the δ band (1-3 Hz), θ band (4-7 Hz), α band (8-13 Hz), β band (14-30 Hz) and γ band (31-50 Hz) [10, 15, 16, 17]. The second step is to extract EEG features from each frequency band. Commonly used EEG features include differential asymmetry [13], differential entropy (DE) [18, 19], power spectral density [19], approximate entropy [20], sample entropy [21] and rational asymmetry [22].
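The two-step pipeline above (band decomposition, then per-band feature extraction) can be sketched as follows. This is a minimal NumPy-only illustration that uses an ideal FFT mask for the band split; the band edges are the ones listed in the text, while the 128 Hz sampling rate (DEAP's rate) and the brick-wall mask, rather than a proper band-pass filter, are simplifying assumptions.

```python
import numpy as np

FS = 128  # DEAP EEG is resampled to 128 Hz
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def band_decompose(signal, fs=FS, bands=BANDS):
    """Split a 1-D EEG signal into frequency bands via an ideal FFT mask.

    A sketch: production code would typically use a proper band-pass
    filter (e.g. Butterworth) instead of a brick-wall spectral mask.
    """
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.fft.rfft(signal)
    out = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs <= hi)
        out[name] = np.fft.irfft(spectrum * mask, n=len(signal))
    return out

# Example: one second of synthetic EEG-like noise.
x = np.random.default_rng(0).standard_normal(FS)
sub = band_decompose(x)
```

A pure 10 Hz sine fed through this function concentrates its energy in the α band, which is a quick sanity check on the band edges.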
Due to the instability of EEG signals, it is one-sided to use only the time domain information or only the frequency domain information, so more and more studies simultaneously utilize time and frequency domain information to embody the nature of EEG signals. Features fusing the time and frequency domains are called time-frequency features. Common feature extraction methods include the short-time Fourier transform [23], the wavelet transform [24] and so on.
2.2. Graph convolutional neural network
Graph convolutional neural networks are built on graph theory and spectral theory. Compared with the classical convolutional neural network, the graph convolutional neural network is advantageous in the discriminative feature extraction of signals in the discrete spatial domain [27].
The graph convolutional neural network applies convolution operations to the transformed graph, and the definition of the convolution operation is the key: the Fourier transform is introduced on the graph, and the convolution theorem is adopted.
A dynamic graph convolutional neural network was proposed and applied to the SEED and DREAMER datasets, achieving good results. Wang et al. [3] introduced a broad learning system and proposed a model that combines a dynamic convolutional neural network and a broad learning system. They also applied the model to SEED and DREAMER to verify its effectiveness for emotion recognition. In image processing, Zhu et al. [28] adopted a graph convolutional neural network to extract the features of graph-structured data. Levie et al. [29] proposed CayleyNets based on the graph convolutional neural network, made use of the MNIST, CORA and MovieLens datasets to verify CayleyNets, and attained good experimental results. Valsesia et al. [30] proposed a convolutional neural network with a graph convolutional layer and applied it to recover an image from a noisy observation.
2.3. LSTM
The Recurrent Neural Network (RNN) has a good capability of addressing time series data and is commonly used in natural language processing. With its special network structure, an RNN memorizes previous information and utilizes it to influence the output of subsequent nodes. Long short-term memories have become an effective and scalable model for solving several learning problems related to sequential data. The LSTM architecture contains two important units, namely a storage unit and nonlinear gating units. The storage unit can maintain its state over time, and the nonlinear gating units can regulate the information flow into and out of the unit [32].
Yang et al. [33] proposed a novel approach to video captioning based on adversarial LSTM, and their method aimed at compensating for the deficiencies of LSTM-based video captioning methods. Yu et al. [34] proposed an end-to-end model based on LSTM to optimize biomedical event extraction. Salma et al. [35] designed a multi-layer LSTM framework for emotion recognition and applied it to the DEAP dataset.
2.4. Emotion recognition methods on DEAP
The public release of the DEAP [36] dataset provides opportunities for researchers in the field of emotion recognition. Prior to DEAP, most researchers focused on analyzing facial expressions and speech to determine a person's emotional states [37]. Recently, many researchers have proposed their own emotion recognition methods for DEAP. Tripathi et al. [4] used time domain features from EEG to train a deep neural network (DNN) and a CNN respectively, and the final classification accuracy exceeded 73%. Li et al. [5] applied wavelet features to train a CNN combined with LSTM, and the binary classification accuracy reached 72%. Salma et al. [35] designed a multi-layer LSTM framework to learn features from EEG signals; a dense layer then classified emotions into low/high arousal and valence. They used DEAP to verify their method and achieved average accuracies of 85.65% and 85.45% for arousal and valence, respectively. Classification accuracies of 72.10% in valence and 73.10% in arousal have also been reported. Liu et al. [41] used a bimodal deep autoencoder to generate new features, and then fed the new features into support vector machines (SVM) to complete emotion classification. They attained accuracies of 85.20% for binary classification of valence and 80.50% for arousal. Mert and Akan [42] first normalized the IMFs generated by multivariate empirical mode decomposition and extracted 10 features such as PSD and entropy, then processed them with ICA and fed them into an artificial neural network. They attained 72.87% in binary classification of valence and 75.00% in binary classification of arousal. Thammasan et al. [43] extracted the fractal dimension and power spectral density from EEG data, and then put the extracted features into an SVM classifier. They attained 73.00% in binary classification of valence and 72.50% in binary classification of arousal.
Zhang et al. [44] decomposed the EEG signal into four bands, i.e. theta, alpha, beta and gamma, and used the FFT to calculate the power as EEG features, which were input to a PNN. The experimental results showed that the mean classification accuracy of the PNN was 81.21% for valence (≥5 and <5) and 81.26% for arousal (≥5 and <5). He et al. [45] proposed a firefly integrated optimization algorithm (FIOA) to simultaneously accomplish multiple tasks, i.e. optimal feature selection, parameter setting and classifier selection according to different EEG-based emotion datasets. The experimental results showed that the average classification accuracy of FIOA was 86.90% for positive emotion (valence ≥5 and arousal ≥5) and negative emotion (valence <5 and arousal <5).
Although the above recognition methods achieved some progress in some applications, the classification accuracy is relatively low. Graph convolutional neural networks have shown successful applications and inspired new ideas for EEG-based emotion recognition.
3. Method
Emotion models are generally divided into two categories: discrete models, with features that contain positive emotions (amusement, joy, tenderness) and negative emotions (anger, sadness, fear, disgust) [46]; and dimensional models, with two dimensions affecting subjects: valence (disgust to pleasure) and arousal (calm to excitement) [47, 48, 49].
Figure 1: Emotion recognition method using a deep learning model based on EEG's differential entropy. The size of the feature cube is T × CN × FN, where T is the duration in seconds, CN is the number of EEG channels and FN is the number of features.
Our study builds on the above research, and Figure 1 illustrates the proposed emotion recognition method using a deep learning model based on EEG's differential entropy. The proposed ERDL method consists of four steps.
(1) Data calibration. Firstly, the 3 seconds of baseline EEG data, which is generated spontaneously by the brain, is copied 20 times and linked one by one. Then, the corresponding baseline data is subtracted from the EEG data recorded while watching the 60-second video, where the purpose of the processing is to remove the EEG components that are unrelated to the emotional stimuli.
(2) Data division. The calibrated EEG data is divided into ((60 − T)/S + 1) segments for each trial of each subject. In the following experiments, we set T to 6 seconds and S to 3 seconds, referencing the experimental results of the literature [51, 52].
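Steps (1) and (2) can be sketched in a few lines of NumPy. The array shapes and the implementation details are illustrative assumptions; only the 3 s baseline, its 20-fold tiling, T = 6 and S = 3 come from the text.

```python
import numpy as np

FS = 128          # DEAP sampling rate (Hz)
T, S = 6, 3       # window length and stride in seconds, as in the paper

def calibrate_and_segment(trial, baseline, fs=FS, t_win=T, stride=S):
    """Baseline-calibrate one trial and cut it into overlapping segments.

    trial:    (channels, 60*fs) EEG recorded while watching the video
    baseline: (channels, 3*fs) pre-trial baseline EEG
    Returns an array of shape (n_segments, channels, t_win*fs).
    """
    # Tile the 3 s baseline to cover the 60 s trial, then subtract it.
    tiled = np.tile(baseline, (1, trial.shape[1] // baseline.shape[1]))
    calibrated = trial - tiled
    n_seg = (trial.shape[1] // fs - t_win) // stride + 1
    return np.stack([calibrated[:, i * stride * fs:(i * stride + t_win) * fs]
                     for i in range(n_seg)])

rng = np.random.default_rng(0)
segs = calibrate_and_segment(rng.standard_normal((32, 60 * FS)),
                             rng.standard_normal((32, 3 * FS)))
```

With T = 6 and S = 3 this yields (60 − 6)/3 + 1 = 19 segments per trial, i.e. 19 × 40 = 760 samples per subject, matching the count given in Section 4.3.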
(3) Feature extraction. The experimental results of [3, 8, 53] show that the differential entropy of EEG data attains a higher accuracy in emotion classification, so the proposed method utilizes the differential entropy of EEG data. Section 3.2 briefly introduces the extraction process of the differential entropy of EEG data.
(4) Emotion recognition. A novel ECLGCNN is developed and used for emotion recognition in this paper. ECLGCNN contains three layers, namely a GCNNs layer, an LSTMs layer and a Dense layer. The GCNNs layer attains graph-domain and temporal information from the EEG channels' DE features, while the LSTMs and Dense layers are used to predict low/high arousal or negative/positive valence according to the output of the GCNNs layer.
3.2. Differential entropy
The differential entropy of a Gaussian-distributed variable x ∼ N(µ, σ²) is

DE = (1/2) log(2πeσ²)

where e and π are constants. In this paper, for the characteristics of EEG signals, DE is extracted from five main bands, namely the δ band (1-3 Hz), θ band (4-7 Hz), α band (8-13 Hz), β band (14-30 Hz) and γ band (31-50 Hz).
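For an approximately Gaussian band-limited segment, the closed form above reduces DE to a function of the segment's variance. A small sketch, with the synthetic test signal as an assumption:

```python
import numpy as np

def differential_entropy(band_signal):
    """DE of an (approximately Gaussian) band-limited EEG segment.

    For x ~ N(mu, sigma^2), DE = 1/2 * log(2*pi*e*sigma^2).
    """
    var = np.var(band_signal)
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Sanity check against the closed form for a known sigma.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=100_000)   # sigma = 2, so sigma^2 = 4
de = differential_entropy(x)
```

In the proposed method this function would be evaluated per channel and per band on every 6 s segment, giving the T × CN × FN feature cube of Figure 1.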
Figure 2: The whole architecture of the fusion model of LSTM and GCNN for emotion recognition
3.3. Fusion model of LSTM and GCNN for emotion recognition
The whole architecture of the fusion model of LSTM and GCNN for emotion recognition is shown in Figure 2; it contains three layers, namely a GCNNs layer, an LSTMs layer and a Dense layer. The GCNNs layer is used to calculate the relationship between two EEG channels during a period of time, the LSTMs layer memorizes changes between two EEG channels in a certain period, and the Dense layer completes the final emotion recognition according to the LSTMs layer's output. In the GCNNs layer, we set T GCNNs to extract the graph-domain features from the DE of T seconds of EEG data. In other words, we make use of graph-domain information and time domain information to improve EEG emotion recognition. The LSTMs layer contains an input layer, a hidden layer and an output layer. We set T LSTM cells to receive the calculation results of the T GCNNs, and set the number of LSTM hidden layer cells to num_cell. The LSTMs layer is used to memorize changes between two EEG channels in T seconds. The Dense layer is a fully connected layer and is used to perform data dimension transformation. At the end of the fusion model, the Dense layer attains the recognition result from the LSTMs layer's output.
The graph structure can describe the relationships between different nodes, which provides a potential way to explore the relationship among multiple EEG channels in emotion recognition using EEG [3]. The details of the i-th GCNN structure of the GCNNs layer are shown in Figure 3. The calculation process of the GCNN comprises the following two steps.

Figure 3: Graph convolutional neural network structure

(1) Graph Representation. Inspired by the successful applications of the graph convolutional neural network model in image processing [28, 29, 30, 54], this paper studies the problem of multi-channel EEG emotion recognition through a graph representation method. In the proposed graph representation, each EEG channel corresponds to a node [8], the functional relationship between two channels corresponds to an edge of the graph, and the value of the edge represents the closeness of the functional relationship: the greater the value of the edge, the closer the functional relationship between the two channels.
The graph can be defined as G = {V, ε, A}, where V is the set of N nodes, ε is the edge set, A ∈ R^{N×N} is the adjacency matrix of the node set V, and A_{i,j} is the functional relationship between node i and node j. The common method for calculating A_{i,j} applies a kernel of the inter-channel distance (formula (2)), where θ and τ are two parameters to be fixed and dist_{i,j} is the Euclidean distance between the i-th node and the j-th node.
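Since the exact form of formula (2) is not reproduced here, the sketch below assumes a common choice: a Gaussian kernel of the Euclidean distance, sparsified by k-nearest neighbours as in step 6 of the training algorithm in Section 3.3. The electrode coordinates, k and θ are all hypothetical placeholders.

```python
import numpy as np

def adjacency_knn_gaussian(coords, k=4, theta=1.0):
    """Sketch of the graph construction: a Gaussian kernel of electrode
    distance, sparsified by k-nearest neighbours.

    Assumed form (formula (2) is not shown in the text):
        A_ij = exp(-dist_ij^2 / (2 * theta^2)), kept only for the k
        strongest connections of each node, then symmetrised.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    a = np.exp(-d ** 2 / (2 * theta ** 2))
    np.fill_diagonal(a, 0.0)                 # no self-loops
    mask = np.zeros_like(a, dtype=bool)
    idx = np.argsort(-a, axis=1)[:, :k]      # k strongest neighbours per row
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask | mask.T, a, 0.0)   # symmetric sparsified adjacency

coords = np.random.default_rng(0).standard_normal((32, 3))  # 32 electrodes
A = adjacency_knn_gaussian(coords)
```

Symmetrising with `mask | mask.T` keeps an edge whenever either endpoint selects the other, which guarantees an undirected graph for the Laplacian of formula (3).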
(2) Spectral Graph Filtering. Spectral graph filtering, also known as graph convolution, is a popular signal processing method for graph data operations, in which the graph Fourier transform (GFT) is a typical method.
The normalized graph Laplacian is defined as

L = E − D^{−1/2} A D^{−1/2},  A, D ∈ R^{N×N}   (3)

where D ∈ R^{N×N} is a diagonal matrix with D_{i,i} = Σ_j A_{i,j}, and E is an identity matrix.
For a given spatial signal x ∈ R^{N×FN} (FN is the number of features), its GFT is

x̂ = U^T x   (4)

where U is the matrix of eigenvectors from the eigendecomposition L = U Λ U^T, with Λ the diagonal matrix of eigenvalues λ_1, ..., λ_N   (5). The inverse GFT is

x = U U^T x = U x̂   (6)

The convolution of two signals x and y on the graph G is defined as

x ∗_G y = U((U^T x) ⊙ (U^T y))   (7)

Filtering a signal x with a filter g can thus be written as

y = g(L) x = g(U Λ U^T) x = U g(Λ) U^T x   (8)

where

g(Λ) = diag(g(λ_1), ..., g(λ_N))   (9)
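Formulas (3)-(6) can be checked numerically: build the normalized Laplacian, eigendecompose it to obtain the GFT basis U, and verify that the inverse transform recovers the signal. The small random graph below is only a stand-in for an EEG channel graph.

```python
import numpy as np

def normalized_laplacian(a):
    """L = E - D^{-1/2} A D^{-1/2}  (formula (3)); E is the identity."""
    d = a.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return np.eye(len(a)) - d_inv_sqrt @ a @ d_inv_sqrt

rng = np.random.default_rng(0)
a = rng.random((8, 8))
a = (a + a.T) / 2                    # symmetric weights
np.fill_diagonal(a, 0.0)             # no self-loops
L = normalized_laplacian(a)

# Eigendecomposition L = U Lambda U^T gives the GFT basis.
lam, U = np.linalg.eigh(L)
x = rng.standard_normal((8, 5))      # 8 nodes, 5 features per node
x_hat = U.T @ x                      # forward GFT, formula (4)
x_rec = U @ x_hat                    # inverse GFT, formula (6)
```

Because U is orthogonal, `x_rec` equals `x` up to floating-point error, which is exactly the statement of formula (6).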
To avoid an expensive eigendecomposition, g(Λ) can be approximated with the method of Chebyshev polynomials [3], whose calculation process is formulated as

T_0(x) = 1
T_1(x) = x   (11)
T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), k ≥ 2

Combining formula (10), formula (8) can be converted into the following calculation form:

y = U g(Λ) U^T x = U [Σ_{k=0}^{K−1} diag(θ_k T_k(λ_1), ..., θ_k T_k(λ_N))] U^T x = Σ_{k=0}^{K−1} θ_k T_k(L̃) x   (12)

where L̃ = L/λ_MAX − E and E is an identity matrix.
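The recurrence (11) means that formula (12) needs only repeated multiplications by the rescaled Laplacian L̃, never an eigendecomposition. A sketch, with a random graph and filter coefficients as stand-ins; because L̃ shares its eigenvectors with L, the result coincides with filtering via formula (8).

```python
import numpy as np

def cheb_filter(L, x, theta, lam_max=None):
    """y = sum_{k=0}^{K-1} theta_k T_k(L_tilde) x   (formula (12)),
    with L_tilde = L/lambda_max - E as in the text."""
    n = L.shape[0]
    if lam_max is None:
        lam_max = np.linalg.eigvalsh(L).max()
    L_t = L / lam_max - np.eye(n)
    t_prev, t_cur = x, L_t @ x            # T_0(L~)x and T_1(L~)x
    y = theta[0] * t_prev
    for k in range(1, len(theta)):
        y = y + theta[k] * t_cur
        # Chebyshev recurrence (11): T_k = 2 L~ T_{k-1} - T_{k-2}
        t_prev, t_cur = t_cur, 2 * L_t @ t_cur - t_prev
    return y

# Stand-in graph: normalized Laplacian of a random symmetric adjacency.
rng = np.random.default_rng(1)
a = rng.random((6, 6)); a = (a + a.T) / 2; np.fill_diagonal(a, 0.0)
d = a.sum(axis=1)
L = np.eye(6) - np.diag(d ** -0.5) @ a @ np.diag(d ** -0.5)
x = rng.standard_normal((6, 4))
y = cheb_filter(L, x, theta=np.array([0.5, 0.3, 0.2]))
```

With K = 2, as selected in Section 4.3, only the terms T_0 and T_1 are used, so each GCNN reduces to a first-order polynomial in L̃.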
(1) Calculate the relationship between two EEG channels with the K-nearest neighbor;
Long short-term memories have become an effective and scalable model for solving several learning problems related to sequential data [32]. The purpose of the LSTMs layer in Figure 2 is to memorize the change of the relationship between two EEG channels in T seconds. The LSTMs layer defines three layers: the first is the input layer, which receives the results from the GCNNs layer; the second is the hidden layer, which memorizes the change of the relationship between EEG channels in T seconds;
the last layer outputs the emotion recognition information. Next, we depict the LSTM cell's structure, which is shown in Figure 4.

Figure 4: The structure of the LSTM cell
The forget gate, input gate and output gate of an LSTM cell can be used to add and remove information from the cell state and are defined as follows:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)

where σ(·) is the activation function, x_t is the current input, h_{t−1} is the output of the LSTM cell at the last moment, c_{t−1} is the state of the LSTM cell at the last moment, and b_f, b_i, b_o, b_c are biases.
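The gate equations can be exercised directly. This is a sketch of one standard LSTM cell, not the paper's trained model; the dimensions and the random weights are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W maps the concatenated [h_prev, x_t]; W and b are dicts keyed
    'f', 'i', 'c', 'o' (forget, input, candidate, output).
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])          # forget gate
    i = sigmoid(W['i'] @ z + b['i'])          # input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])    # candidate state
    c = f * c_prev + i * c_tilde              # new cell state
    o = sigmoid(W['o'] @ z + b['o'])          # output gate
    h = o * np.tanh(c)                        # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3                            # placeholder sizes
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for k in 'fico'}
b = {k: np.zeros(n_hid) for k in 'fico'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                            # unroll over a few time steps
    h, c = lstm_cell(rng.standard_normal(n_in), h, c, W, b)
```

In ECLGCNN, T such cells would each receive one GCNN's output, so the hidden state accumulates how the channel relationships change over the T seconds of a segment.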
The loss function of the ECLGCNN model is defined in formula (19):

Loss = cross_entropy(p, l) + α||W||_2   (19)

where p is the predicted value of the model, l is the label, W denotes all parameters of the model, and α is the regularization coefficient. The cross-entropy function cross_entropy(p, l) measures the difference between the actual label and the predicted value of the model, while the regular term α||W||_2 reduces the overfitting of the model's learning parameters.
The update rule of the graph convolution parameters is defined in formula (20) [25]:

θ* = θ* − λ ∂Loss/∂θ*   (20)

where λ is the learning rate.
 4:   for j = 1 to T do
 5:     x = FS_{i,j,:,:}
 6:     Calculate the adjacency matrix A of x according to k-NN and formula (2)
 7:     Calculate the Laplacian matrix L of x according to formula (3)
 8:     L̃ = L/λ_MAX − E
 9:     Calculate T_k(L̃)x according to formula (12)
10:     temp_{i,j} = (T_0(L̃)x, T_1(L̃)x, ..., T_{K−1}(L̃)x)
11:   end for
12: end for
13: Step_count = 0
14: while Loss > e || Step_count < MAX do
15:   y_j = sigmoid(batch_norm(temp_{i,j} ∗ θ_{·,j}))  (i = 1, 2, ..., n; j = 1, 2, ..., T)
16:   // ∗ is a convolution operation
17:   Convert y_j to a column vector y*_j
18:   y* = (y*_1, y*_2, ..., y*_T)
19:   Send y* to the receiving cells of the LSTM
20:   Calculate Loss according to formula (19)
21:   if Loss < e then
22:     Break
23:   end if
24:   Update the parameters of the LSTM based on the Loss and the BP algorithm
25:   Update the graph convolution parameters according to formula (20)
26:   Step_count = Step_count + 1
27: end while
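The control flow of steps 13-27 can be isolated from the model itself. In the sketch below a plain logistic regression deliberately stands in for the GCNN+LSTM stack; the error threshold e = 0.12 and iteration cap MAX = 100000 come from Table 3, while the data, dimensions and learning rate are arbitrary assumptions.

```python
import numpy as np

# Stand-in data: a separable binary problem replaces the EEG feature
# cubes; logistic regression replaces the GCNN+LSTM forward pass.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
y = (X @ rng.standard_normal(8) > 0).astype(float)

w = np.zeros(8)                                 # trainable parameters
e, MAX, lr, alpha = 0.12, 100_000, 0.1, 1e-4    # e, MAX as in Table 3
step_count, loss = 0, np.inf
while loss > e and step_count < MAX:            # steps 13-14
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # forward pass (steps 15-19)
    loss = (-np.mean(y * np.log(p + 1e-12)
                     + (1 - y) * np.log(1 - p + 1e-12))
            + alpha * np.sum(w ** 2))           # formula (19), step 20
    if loss < e:                                # steps 21-23
        break
    grad = X.T @ (p - y) / len(y) + 2 * alpha * w   # BP stand-in, step 24
    w -= lr * grad                              # formula (20), step 25
    step_count += 1                             # step 26
```

The loop terminates either when the loss drops below the error threshold e or when the iteration cap is reached, mirroring the two stopping conditions of the listing.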
4.1. DEAP
The experiments in this paper are based on the multimodal DEAP dataset. DEAP is a large open-source dataset that contains multiple physiological signals with sentiment evaluations. In its data collection experiments, evoked EEG, ECG, EMG and other bioelectric signals were detected and recorded; 32 subjects (16 males and 16 females) were involved in 40 trials of music videos with different emotional tendencies, where each music video lasted 1 minute. After watching the music videos, participants rated the videos on a scale of 1-9 for arousal, valence, liking, dominance and familiarity, where higher scores indicate stronger levels of each indicator.
In this paper, we used the 32 channels of EEG data in the dataset; that is, only EEG data is used. Eye myoelectricity, eye movement, and power supply noise were removed from the EEG data, and the sampling rate was adjusted to 128 Hz. The duration of each EEG signal is 63 seconds, including 3 seconds of pre-trial baseline data and 60 seconds of watching the emotional video. For subject-dependent experiments, we used the dataset of each subject to validate ECLGCNN. In order to verify the model's generalization, the data of all subjects were also collected into one sample set to train and verify ECLGCNN. We defined the labels of the DEAP EEG data as follows: arousal/valence with a self-score of more than 5 is high arousal/positive valence; otherwise it is low arousal/negative valence. Sections 4.3 and 4.4 discuss the subject-dependent experiments and subject-independent experiments, respectively.
4.2. Evaluation metrics
The classification accuracy Acc and the F-score are used to evaluate the ECLGCNN model. Acc is expressed as

Acc = (TP + TN) / (TP + TN + FP + FN)   (21)

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. The precision and recall rates are

Pre = TP / (TP + FP)   (22)

Rec = TP / (TP + FN)   (23)

The F-score is the harmonic mean of the precision and the recall rate, and its calculation method is defined as

F-score = (2 × Pre × Rec) / (Rec + Pre)   (24)
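Formulas (21)-(24) can be computed from the four confusion-matrix counts. A small self-contained check, with hypothetical predictions and labels:

```python
import numpy as np

def metrics(pred, label):
    """Accuracy, precision, recall and F-score per formulas (21)-(24)."""
    tp = np.sum((pred == 1) & (label == 1))   # true positives
    tn = np.sum((pred == 0) & (label == 0))   # true negatives
    fp = np.sum((pred == 1) & (label == 0))   # false positives
    fn = np.sum((pred == 0) & (label == 1))   # false negatives
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f

pred = np.array([1, 1, 0, 0, 1, 0])
label = np.array([1, 0, 0, 1, 1, 0])
acc, pre, rec, f = metrics(pred, label)
```

Here two of the three predicted positives are correct (Pre = 2/3) and two of the three actual positives are found (Rec = 2/3), so the F-score is also 2/3.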
4.3. Subject-dependent experiments on DEAP
In this paper, each subject's EEG data is divided into ((60 − T)/S + 1) × 40 samples; for example, when T is set to 6 and S is set to 3, 760 samples are generated. Three rounds of 5-fold cross-validation with a random strategy were adopted to verify ECLGCNN.
We selected the No. 16 subject's data to choose the model parameters, because the numbers of positive and negative samples of the No. 16 subject are the same in binary classification of arousal. Then, we explored the influence of the Chebyshev polynomial order (K) and the number of LSTM hidden layer cells (num_cell) on emotion classification with the ECLGCNN model. The influence of K and num_cell is shown in Figure 5. We found that, compared with num_cell, the value of K has the greater influence on ECLGCNN; when num_cell is 30 and K is 2, ECLGCNN reaches the highest accuracy in binary classification of arousal. Therefore, we set num_cell to 30 and K to 2 in the following experiments. The parameter settings of ECLGCNN are listed in Table 1.
Figure 5: Experimental values of num_cell and K

Table 1: Parameter settings

ECLGCNN model parameters                  Values
The number of GCNNs T                     6
The number of Chebyshev coefficients K    2
Table 2: Classification results of ECLGCNN for each subject

          Binary classification of valence    Binary classification of arousal
Subject   Accuracy(%)    F-score(%)           Accuracy(%)    F-score(%)
01        93.42          93.13                94.21          95.29
02        87.73          88.88                86.84          89.06
03        93.24          93.76                94.12          84.35
04        90.65          88.49                87.46          84.13
05        88.90          90.76                87.85          87.07
06        89.52          93.08                89.13          87.43
07        94.34          95.97                89.87          92.04
08        91.79          92.52                88.64          90.26
09        91.84          91.80                88.99          90.74
10        93.90          93.81                90.92          92.08
11        80.79          84.33                86.58          82.14
12        87.53          87.86                89.65          93.80
13        89.65          87.46                92.32          95.47
14        89.82          89.53                88.16          91.33
15        94.43          94.36                91.88          91.24
27        89.22          91.75                91.01          93.27
28        91.01          92.74                85.00          83.08
29        92.67          93.71                94.34          95.42
30        92.28          94.33                91.41          90.84
31        88.78          90.24                88.63          87.73
32        88.33          88.31                89.74          92.47
Average   90.45          91.08                90.60          90.94
From Table 2, the experimental results indicate that the minimum, maximum and average classification accuracies of ECLGCNN for the 32 subjects are 80.79%, 94.43% and 90.45% respectively in binary classification of valence. For binary classification of arousal, the minimum, maximum and average classification accuracies are 85.00%, 94.52% and 90.60% respectively. On the other hand, the average F-score reaches more than 90% in both classification tasks.
In the following, the classification results of support vector classification (SVC) [55], decision tree (DT) [56] and random forest (RF) [57] are compared with ECLGCNN using the same features. The comparison results are shown in Figures 6, 7, 8 and 9, respectively.
From Figure 6(a) and Figure 6(b), the classification accuracy and F-score of ECLGCNN are relatively stable in binary classification of arousal, while those of SVC are unstable compared with ECLGCNN, RF and DT. The reason for this phenomenon is that SVC fails to find an optimal classification surface. The classification accuracy and F-score of DT are close to RF, because the strategies of generating decision trees for RF and DT are similar. However, ECLGCNN makes use of temporal and graph features from the DE of EEG data to find a better classification surface compared with SVC, DT and RF. On the whole, this shows that our proposed model is effective in binary classification of arousal.
Figure 6: Classification accuracy (a) and F-score (b) of the four classifiers (ECLGCNN, SVC, DT, RF) on each subject in binary classification of arousal
Figure 7: Average classification result of the four classifiers on 32 subjects’ high/low arousal
From Figure 7, the average classification accuracy and F-score of ECLGCNN are the highest among the four classifiers in binary classification of arousal. The average F-score of SVC is higher compared with DT and RF, while the average classification accuracy of SVC is the lowest; the reason is that the classification accuracy and F-score of SVC are unstable across the 32 subjects. The average classification accuracy and F-score of DT are close to RF, because RF is an extension of DT. The average classification accuracy of ECLGCNN is at least 8.49% higher than the other three classifiers in binary classification of arousal. In summary, from Figures 6 and 7, ECLGCNN is effective in binary classification of arousal.
Next, we show the comparison results of ECLGCNN, SVC, DT and RF in binary classification of valence in Figures 8 and 9.
Figure 8: Classification accuracy (a) and F-score (b) of the four classifiers on each subject in binary classification of valence
From Figure 8(a) and Figure 8(b), the classification accuracy and F-score of ECLGCNN are relatively stable in binary classification of valence. The classification accuracy of ECLGCNN for subject 22 is lower than SVC, and the F-score of ECLGCNN for subjects 12, 13, 22 and 24 is lower than SVC. These results show that, based on DE, SVC is better at binary classification of valence than DT and RF, but still lower than ECLGCNN. The classification accuracy and F-score of DT are the lowest in binary classification of valence compared with ECLGCNN, SVC and RF.
Figure 9: Average classification result of the four classifiers on 32 subjects’ high/low valence
We also compared our method with two other methods, and the comparison results are shown in Figure 10. The experimental results show that our method is the most effective of the three. Our method is 5% higher than Salma's method [35] in binary classification of valence and 4.95% higher in binary classification of arousal; they designed a multi-layer LSTM framework for emotion recognition. Our method is also 5.25% higher than Liu's method [41] in binary classification of valence and 10.10% higher in binary classification of arousal; they combined a bimodal deep autoencoder and SVM to recognize emotions. In addition, we compared ERDL with He's method [45]; our experimental results are at least 3.55% higher than theirs. Their experiments were conducted on positive emotion (valence > 5 and arousal > 5) and negative emotion (valence < 5 and arousal < 5), while our experiments were conducted on positive (valence > 5)/negative (valence < 5) valence and high (arousal > 5)/low (arousal < 5) arousal. On the whole, our experimental results are higher than theirs.
Figure 10: Comparison of classification accuracy of ERDL with Liu's and Salma's methods for valence and arousal
From the above experimental results, we may conclude that ERDL is the most effective in the subject-dependent experiments, which is attributed to the fusion of GCNN and LSTM.
4.4. Subject-independent experiments on DEAP

Table 3: Parameter settings

ECLGCNN model parameters                           Values
The number of GCNNs T                              6
The number of Chebyshev coefficients K             10
The number of LSTM hidden layer cells (num_cell)   150
GCNN activation function type                      sigmoid
LSTM activation function type                      sigmoid
The number of nodes in the graph                   32
Maximum number of model iterations MAX             100000
Model error threshold e                            0.12
Model learning rate λ                              0.003
Model regular term coefficient α                   0.00008
From Figures 11 and 12, the average classification accuracy and F-score of ECLGCNN are the highest in binary classification of arousal and valence. The average classification accuracy of ECLGCNN is at least 7.74% higher than the other three classifiers in binary classification of arousal and at least 8.29% higher in binary classification of valence. The classification accuracy of SVC is the lowest for arousal and valence compared with the other three classifiers, because SVC is sensitive to the choice of parameters and kernel functions and fails to find the optimal parameters. Therefore, ECLGCNN is effective in subject-independent emotion classification.
[Figure: grouped bars of accuracy and F-score for ECLGCNN, SVC, DT, and RF]
Figure 11: Average classification result of four classifiers for low/high arousal in Subject-independent experiments
[Figure: grouped bars of accuracy and F-score for ECLGCNN, SVC, DT, and RF]
Figure 12: Average classification result of four classifiers for positive/negative valence in Subject-independent experiments
We also compare ERDL with methods reported in related literature, and the comparison results are shown in Figure 13. The classification accuracy of the ERDL method is the highest among the compared emotion recognition methods. ERDL is at least 3.4% higher than the other methods in binary classification of valence and at least 3.51% higher in binary classification of arousal. Specifically, ERDL is at least 3.4% higher than Tripathi's method [4], which made use of time-domain features of EEG signals and a deep learning model. ERDL is at least 11.15% higher than Li's method [5], which used a fusion model combining CNN and LSTM. ERDL is at least 3.71% higher than Xing's method [39], which proposed an SAE+LSTM classification model. ERDL is at least 11.97% higher than Wang's method [40], which proposed a 3D CNN for emotion classification. ERDL is at least 10.27% higher than Mert's method [42], which extracted PSD, entropy, and other features from EEG data, processed them with ICA, and fed them into an artificial neural network. ERDL is at least 11.81% higher than Thammasan's method [43], which extracted the fractal dimension and power spectral density from EEG data and then fed the extracted features into an SVM. And our method ERDL is at least 3.51% higher than Zhang's method [44], which decomposed the EEG signal into four bands (theta, alpha, beta and gamma) and used the FFT to calculate the band power as EEG features, which were input to a PNN. In addition, we compare ERDL with Chen's method [38]; our experimental results are at least 16.18% higher than theirs. Their method focuses on EEG channel selection and on the accuracy of emotion classification between different genders, whereas we ignore the influence of gender on emotion recognition in our research. This suggests that our method is more universal than theirs.
In summary, ERDL makes use of temporal and graph features of EEG data to achieve good classification results, and the nonlinear cells of ECLGCNN make it much more powerful at feature representation and learning.
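The nonlinear LSTM cells referenced here follow the standard gate formulation; below is a minimal single-step sketch of that textbook form. The parameter names, stacking convention, and shapes are our assumptions for illustration, not the authors' implementation; only the hidden size (150 cells, per Table 3) comes from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with stacked input/forget/cell/output parameters:
    W: (4H, D), U: (4H, H), b: (4H,), for input dim D and hidden size H."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c
```

The forget and input gates are what let the cell memorize or discard the evolving channel relationships fed in from the GCNN outputs over successive time steps.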
[Figure: bar chart of accuracy for valence and arousal comparing ERDL (0.8481, 0.8527) with Tripathi et al., Li et al., Xing et al., Wang et al., Mert et al., Thammasan et al., and Zhang et al.]
Figure 13: Comparison with other methods in Subject-independent experiments
4.5. Summary
The difference among subjects' EEG is not considered in subject-dependent experiments, but it must be handled in subject-independent experiments. The experimental results confirm this: as shown in Figure 9 and Figure 13, the classification accuracies of the subject-dependent experiments are at least 5% higher than those of the subject-independent experiments.
5. Conclusion

In this paper, we propose a new emotion recognition method using a deep learning model based on EEG differential entropy, which adopts a novel fusion model of GCNN and LSTM for emotion classification. ECLGCNN utilizes graph and temporal information: each EEG channel corresponds to a graph node, the functional relationship between two channels corresponds to an edge of the graph, and LSTM cells' gates are used to extract effective information. Both subject-dependent and subject-independent experiments were conducted on DEAP, and the experimental results indicate that ERDL achieves better recognition accuracy than state-of-the-art methods such as CNN, RNN [5], LSTM [35], SAE+LSTM [39], EmotioNet [40], SVM [41], ANN [42], and PNN [44].
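The differential-entropy feature the method builds on has a closed form for a band-limited EEG segment under a Gaussian assumption, h = 0.5 ln(2πeσ²), as used in the DE literature [18]. A minimal NumPy sketch (the function name is ours, not the authors' code):

```python
import numpy as np

def differential_entropy(segment):
    """Differential entropy of a band-pass-filtered EEG segment under a
    Gaussian assumption: h = 0.5 * ln(2 * pi * e * sigma^2)."""
    variance = np.var(segment)
    return 0.5 * np.log(2.0 * np.pi * np.e * variance)
```

For a unit-variance signal this evaluates to 0.5 ln(2πe), roughly 1.42 nats; in practice one value is computed per frequency band per channel to fill the feature cube.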
Furthermore, the average classification accuracy of ECLGCNN reaches 90.52% in subject-dependent experiments and 85.04% in subject-independent experiments. The better classification accuracy of ECLGCNN owes to the following mechanisms:
Acknowledgments

We are grateful for the support of the National Natural Science Foundation of China (91846205, 61373149), the National Key R&D Program (2017YFB1400102, 2016YFB1000602), and SDNSFC (no. ZR2017ZB0420).
References

[1] S. M. Alarcao, M. J. Fonseca, Emotions recognition using EEG signals: A survey, IEEE Transactions on Affective Computing 10 (3) (2019) 374–393. doi:10.1109/TAFFC.2017.2714671.

[2] M. Hamalainen, R. Hari, R. J. Ilmoniemi, J. Knuutila, O. V. Lounasmaa, Magnetoencephalography-theory, instrumentation, and applications to noninvasive studies of the working human brain, Reviews of Modern Physics 65 (2) (1993) 413–497. doi:10.1103/RevModPhys.65.413.

[3] …recognition using dynamical graph convolutional neural networks and broad learning system, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1240–1244. doi:10.1109/bibm.2018.8621147.

[4] S. Tripathi, S. Acharya, R. D. Sharma, S. Mittal, S. Bhattacharya, Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset, in: S. P. Singh, S. Markovitch (Eds.), Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, AAAI Press, 2017, pp. 4746–4752. URL http://aaai.org/ocs/index.php/IAAI/IAAI17/paper/view/15007

[5] X. Li, D. Song, P. Zhang, G. Yu, Y. Hou, B. Hu, Emotion recognition from multi-channel EEG data through convolutional recurrent neural network, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016, pp. 352–359. doi:10.1109/bibm.2016.7822545.
…in EEG-based dynamic music-emotion recognition, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016, pp. 881–888. doi:10.1109/IJCNN.2016.7727292.

[8] T. Song, W. Zheng, P. Song, Z. Cui, EEG emotion recognition using dynamical graph convolutional neural networks, IEEE Transactions on Affective Computing (2019) 1–1. doi:10.1109/TAFFC.2018.2817622.

…doi:10.1155/2013/573734.
…in Biomedicine 14 (2) (2010) 186–197. doi:10.1109/TITB.2009.2034649.

[15] R. J. Davidson, What does the prefrontal cortex "do" in affect: perspectives on frontal EEG asymmetry research, Biological Psychology 67 (1) (2004) 219–234. doi:10.1016/j.biopsycho.2004.03.008.

[16] M. Li, B. Lu, Emotion classification based on gamma-band EEG, in: 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009, pp. 1223–1226. doi:10.1109/IEMBS.2009.5334139.

[17] D. Nie, X. Wang, L. Shi, B. Lu, EEG-based emotion recognition during watching movies, in: 2011 5th International IEEE/EMBS Conference on Neural Engineering, 2011, pp. 667–670. doi:10.1109/NER.2011.5910636.

[18] L. Shi, Y. Jiao, B. Lu, Differential entropy feature for EEG-based vigilance estimation, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2013, pp. 6627–6630. doi:10.1109/EMBC.2013.6611075.
…7494017.

[21] Y. Shi, X. Zheng, T. Li, Unconscious emotion recognition based on multi-scale sample entropy, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1221–1226. doi:10.1109/bibm.2018.8621185.

…P. D. Bamidis, Toward emotion aware computing: An integrated approach using multichannel neurophysiological recordings and affective visual stimuli, IEEE Transactions on Information Technology in Biomedicine 14 (3) (2010) 589–597. doi:10.1109/TITB.2010.2041553.

…cation from SAR imagery based on the pixel grayscale decline by graph convolutional neural network, IEEE Sensors Letters 4 (6) (2020) 1–4. doi:10.1109/LSENS.2020.2995060.
…doi:10.1109/TSP.2018.2879624.

…cessing, ICIP 2019, Taipei, Taiwan, September 22-25, 2019, IEEE, 2019, pp. 2399–2403. doi:10.1109/ICIP.2019.8803367.

[33] Y. Yang, J. Zhou, J. Ai, Y. Bin, A. Hanjalic, H. Shen, Y. Ji, Video captioning by adversarial LSTM, IEEE Transactions on Image Processing 27 (11) (2018) 5600–5611. doi:10.1109/TIP.2018.2855422.
…org/10.14569/IJACSA.2017.081046.

…gorithm using fractal dimension, in: 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2014, pp. 3166–3171. doi:10.1109/SMC.2014.6974415.

[38] J. Chen, B. Hu, P. Moore, X. Zhang, X. Ma, Electroencephalogram-based emotion assessment system using ontology and data mining techniques, Applied Soft Computing 30 (2015) 663–674. doi:10.1016/j.asoc.2015.01.007.

[39] X. Xing, Z. Li, T. Xu, L. Shu, B. Hu, X. Xu, SAE+LSTM: A new framework for emotion recognition from multi-channel EEG, Frontiers in Neurorobotics 13 (2019) 37. doi:10.3389/fnbot.2019.00037.

…978-3-319-46672-9_58.

[42] A. Mert, A. Akan, Emotion recognition from EEG signals by using multivariate empirical mode decomposition, Pattern Analysis and Applications 21 (1) (2018) 81–89. doi:10.1007/s10044-016-0567-6.
[44] J. Zhang, M. Chen, S. Hu, Y. Cao, R. Kozma, PNN for EEG-based emotion recognition, in: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016, …7844584.

…Applied Soft Computing (2020) 106426. doi:10.1016/j.asoc.2020.106426.

…doi:10.1109/ACCESS.2019.2908285.
…2015.7320065.

…Man, and Cybernetics (SMC), 2016 IEEE International Conference on, 2016, pp. 002558–002563. doi:10.1109/SMC.2016.7844624.

[53] J. Li, Z. Zhang, H. He, Hierarchical convolutional neural networks for EEG-based emotion recognition, Cognitive Computation 10 (2) (2018) 368–380. doi:10.1007/s12559-017-9533-x.

[54] S. Fu, W. Liu, S. Li, Y. Zhou, Two-order graph convolutional networks for semi-supervised classification, IET Image Processing 13 (14) (2019) 2763–2771. doi:10.1049/iet-ipr.2018.6224.

[55] C. Chang, C. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (3) (2011) 27:1–27:27. doi:10.1145/1961189.1961199.

[57] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32. doi:10.1023/A:1010933404324.
The highlights of our paper are as follows:

A new emotion recognition method using a deep learning model based on EEG differential entropy is proposed. In contrast to traditional emotion recognition methods, multiple GCNN structures are utilized to extract temporal and graph-domain information, LSTM is integrated to memorize the change of the relationship between two EEG channels within a specific time, and the fusion of GCNN and LSTM improves the effectiveness of emotion recognition.

A fusion model of LSTM and GCNN for emotion classification (named ECLGCNN) is proposed, which utilizes graph and temporal information. In the fusion model, each EEG channel corresponds to a graph node, and the functional relationship between two channels corresponds to an edge of the graph; the greater the edge weight, the closer the functional relationship between the two channels. LSTM cells' gates are used to extract effective information from the input (the output of the GCNNs) for emotion classification.

Extensive experiments on DEAP are conducted to verify the ECLGCNN model, and the experimental results demonstrate that the proposed method has better emotion classification performance than state-of-the-art methods. The average accuracy reaches 90.45% and 90.60% for valence and arousal in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.
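The edge weights described above, where a larger value means a closer functional relationship between two channels, can be sketched with a common proxy: the absolute Pearson correlation between channel signals. This is an illustrative assumption (the paper may define the edge weights differently), and the function name is ours:

```python
import numpy as np

def channel_graph(eeg):
    """Build a weighted graph over EEG channels.

    eeg: array of shape (channels, samples). Edge weight = |Pearson
    correlation| between two channels, a stand-in for the 'functional
    relationship' described above. Larger weight = closer relationship."""
    adj = np.abs(np.corrcoef(eeg))  # rows are treated as variables
    np.fill_diagonal(adj, 0.0)      # no self-loops
    return adj
```

The resulting symmetric matrix can then serve as the adjacency input to a graph convolution over the 32-channel graph.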
Yongqiang Yin: Conceptualization, Methodology, Software.
Xiangwei Zheng: Writing - Reviewing and Editing.
Bin Hu: Supervision.
Yuang Zhang: Software.
Xinchun Cui: Validation.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
☐The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests: