Yongqiang Yin, Xiangwei Zheng, Bin Hu, Yuang Zhang, Xinchun Cui
PII: S1568-4946(20)30892-9
DOI: https://doi.org/10.1016/j.asoc.2020.106954
Reference: ASOC 106954
Please cite this article as: Y. Yin, X. Zheng, B. Hu et al., EEG emotion recognition using fusion
model of graph convolutional neural networks and LSTM, Applied Soft Computing Journal (2020),
doi: https://doi.org/10.1016/j.asoc.2020.106954.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the
addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive
version of record. This version will undergo additional copyediting, typesetting and review before it
is published in its final form, but we are providing this version to give early visibility of the article.
Please note that, during the production process, errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
EEG Emotion Recognition Using Fusion Model of Graph Convolutional Neural Networks and LSTM
a School of Information Science and Engineering, Shandong Normal University, Jinan,
China
b Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology,
Jinan, China
c School of Computer Science, Qufu Normal University, Rizhao, China
Abstract
In recent years, graph convolutional neural networks have become a research focus and inspired new ideas for emotion recognition based on EEG. Deep learning has been widely used in emotion recognition, but it is still challenging to construct models and algorithms in practical applications. In this paper, we propose a novel emotion recognition method based on a novel deep learning model (ERDL). Firstly, EEG data is calibrated by 3 s baseline data and divided into segments with a 6 s time window, and then differential entropy is extracted from each segment to construct a feature cube. Secondly, the feature cube of each segment serves as the input of the novel deep learning model, which fuses a graph convolutional neural network (GCNN) and long short-term memory neural networks (LSTM). In the fusion model, multiple GCNNs are applied to extract graph-domain features, LSTM cells are used to memorize the change of the relationship between two channels within a specific time and to extract temporal features, and a Dense layer is used to attain the emotion classification results.
Keywords: EEG, emotion recognition, long short-term memory neural network, graph convolutional neural network, differential entropy
1. Introduction
Emotions play an important role in daily life and influence the perception of the surroundings. Recently, many human-computer interaction systems have been established by research communities at home and abroad, so the automatic classification of emotional states has become indispensable. This can be achieved with a variety of methods, such as subjective self-reporting and neurophysiological measurements. In recent years, electroencephalography (EEG) based emotion recognition has received widespread attention because it is a simple, cheap, portable, and easy-to-use emotion classification method [1]. EEG signals record the relationship between emotional state and brain activity and reflect very subtle emotional changes with high time resolution [2]. However, EEG signals have shortcomings [3] such as time asymmetry and instability, a low signal-to-noise ratio, and uncertainty about the brain areas of specific reactions. Therefore, EEG-based emotion recognition is still a challenging task.
Many researchers have proposed their methods for emotion recognition using EEG.
In this paper, we propose an emotion recognition method based on a deep learning model which fuses a graph convolutional neural network and long short-term memory neural networks. In the fusion model, multiple GCNNs are applied to extract graph-domain features, LSTM cells are used to memorize the change of the relationship between two EEG channels within a specific time and to extract temporal features, and a Dense layer is used to attain the emotion classification result.
At last, we conducted extensive experiments on the DEAP dataset, and the experimental results demonstrate that the proposed method has better classification performance than the state-of-the-art methods. We attained average classification accuracies of 90.45% and 90.60% on the DEAP dataset for valence and arousal in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.
The main contributions of this paper are as follows:
- LSTM cells' gates are used to extract effective information from the input (the output of GCNNs) for emotion classification. The fusion of GCNN and LSTM improves the effectiveness of emotion recognition.
- Extensive experiments on the DEAP dataset show that the proposed method outperforms the state-of-the-art methods. Average accuracies of 90.45% and 90.60% for valence and arousal are achieved in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.
The remainder of this paper is as follows. We briefly review EEG features, GCNN, LSTM and emotion recognition methods on the DEAP dataset in Section 2. In Section 3, we present the proposed emotion recognition method and its key components, including DE and the fusion model of GCNN and LSTM and its architecture. In Section 4, we introduce the DEAP dataset adopted in the experiments and the evaluation indicators, and analyze the experimental results of ECLGCNN on DEAP in detail. We conclude the paper and discuss future work in Section 5.
2. Related work
2.1. EEG features
Time domain features aim to capture the time domain information of EEG signals. Frequency domain features aim to capture EEG emotional information from a frequency perspective. Feature extraction in the frequency domain consists of two steps. The first step is to decompose the EEG signal into several frequency bands, including the δ band (1-3 Hz), θ band (4-7 Hz), α band (8-13 Hz), β band (14-30 Hz) and γ band (31-50 Hz) [10, 15, 16, 17]. The second step is to extract EEG features from each frequency band. Commonly used EEG features include differential asymmetry [13], differential entropy (DE) [18, 19], power spectral density [19], approximate entropy [20], sample entropy [21] and rational asymmetry [22].
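The two-step pipeline above (band decomposition, then per-band feature extraction) can be sketched as follows. This is a minimal NumPy-only illustration that uses an ideal FFT mask for the band split; the band edges are the ones listed in the text, while the 128 Hz sampling rate (DEAP's rate) and the brick-wall mask, rather than a proper band-pass filter, are simplifying assumptions.

```python
import numpy as np

FS = 128  # DEAP EEG is resampled to 128 Hz
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def band_decompose(signal, fs=FS, bands=BANDS):
    """Split a 1-D EEG signal into frequency bands via an ideal FFT mask.

    A sketch: production code would typically use a proper band-pass
    filter (e.g. Butterworth) instead of a brick-wall spectral mask.
    """
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.fft.rfft(signal)
    out = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs <= hi)
        out[name] = np.fft.irfft(spectrum * mask, n=len(signal))
    return out

# Example: one second of synthetic EEG-like noise.
x = np.random.default_rng(0).standard_normal(FS)
sub = band_decompose(x)
```

A pure 10 Hz sine fed through this function concentrates its energy in the α band, which is a quick sanity check on the band edges.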
Due to the instability of EEG signals, it is one-sided to use only the time domain information or only the frequency domain information, so more and more studies simultaneously utilize time and frequency domain information to embody the nature of EEG signals. Features fusing the time and frequency domains are called time-frequency features. Common feature extraction methods include the short-time Fourier transform [23], the wavelet transform [24] and so on.
2.2. Graph convolutional neural network
Graph convolutional neural networks are built on graph theory and spectral theory. Compared with the classical convolutional neural network, the graph convolutional neural network is advantageous in the discriminative feature extraction of signals in the discrete spatial domain [27].
The graph convolutional neural network applies convolution operations to the transformed graph, and the definition of the convolution operation is the key: the Fourier transform is introduced on the graph, and the convolution theorem is adopted.
A dynamic graph convolutional neural network was proposed and applied to the SEED and DREAMER datasets, achieving good results. Wang et al. [3] introduced a broad learning system and proposed a model that combines a dynamic convolutional neural network and a broad learning system. They also applied the model to SEED and DREAMER to verify its effectiveness for emotion recognition. In image processing, Zhu et al. [28] adopted a graph convolutional neural network to extract the features of graph-structured data. Levie et al. [29] proposed CayleyNets based on the graph convolutional neural network, made use of the MNIST, CORA and MovieLens datasets to verify CayleyNets, and attained good experimental results. Valsesia et al. [30] proposed a convolutional neural network with a graph convolutional layer and applied it to recover an image from a noisy observation.
2.3. LSTM
The Recurrent Neural Network (RNN) has a good capability of addressing time series data and is commonly used in natural language processing. With its special network structure, an RNN memorizes previous information and utilizes it to influence the output of subsequent nodes. Long short-term memories have become an effective and scalable model for solving several learning problems related to sequential data. The LSTM architecture contains two important units, namely a storage unit and nonlinear gating units. The storage unit can maintain its state over time, and the nonlinear gating units can regulate the information flow into and out of the unit [32].
Yang et al. [33] proposed a novel approach to video captioning based on adversarial LSTM, and their method aimed at compensating for the deficiencies of LSTM-based video captioning methods. Yu et al. [34] proposed an end-to-end model based on LSTM to optimize biomedical event extraction. Salma et al. [35] designed a multi-layer LSTM framework for emotion recognition and applied it to the DEAP dataset.
2.4. Emotion recognition methods on DEAP
The public release of the DEAP [36] dataset provides opportunities for researchers in the field of emotion recognition. Prior to DEAP, most researchers focused on analyzing facial expressions and speech to determine a person's emotional states [37]. Recently, many researchers have proposed their own emotion recognition methods for DEAP. Tripathi et al. [4] used time domain features from EEG to train a deep neural network (DNN) and a CNN respectively, and the final classification accuracy exceeded 73%. Li et al. [5] applied wavelet features to train a CNN combined with LSTM, and the binary classification accuracy reached 72%. Salma et al. [35] designed a multi-layer LSTM framework to learn features from EEG signals; a dense layer then classified emotions into low/high arousal and valence. They used DEAP to verify their method and achieved average accuracies of 85.65% and 85.45% for arousal and valence, respectively. Classification accuracies of 72.10% in valence and 73.10% in arousal have also been reported. Liu et al. [41] used a bimodal deep autoencoder to generate new features, and then fed the new features into support vector machines (SVM) to complete emotion classification. They attained accuracies of 85.20% for binary classification of valence and 80.50% for arousal. Mert and Akan [42] first normalized the IMFs generated by multivariate empirical mode decomposition and extracted 10 features such as PSD and entropy, then processed them with ICA and fed them into an artificial neural network. They attained 72.87% in binary classification of valence and 75.00% in binary classification of arousal. Thammasan et al. [43] extracted the fractal dimension and power spectral density from EEG data, and then put the extracted features into an SVM classifier. They attained 73.00% in binary classification of valence and 72.50% in binary classification of arousal.
Zhang et al. [44] decomposed the EEG signal into four bands, i.e. theta, alpha, beta and gamma, and used the FFT to calculate the power as EEG features, which were input to a PNN. The experimental results showed that the mean classification accuracy of the PNN was 81.21% for valence (≥5 and <5) and 81.26% for arousal (≥5 and <5). He et al. [45] proposed a firefly integrated optimization algorithm (FIOA) to simultaneously accomplish multiple tasks, i.e. optimal feature selection, parameter setting and classifier selection according to different EEG-based emotion datasets. The experimental results showed that the average classification accuracy of FIOA was 86.90% for positive emotion (valence ≥5 and arousal ≥5) and negative emotion (valence <5 and arousal <5).
Although the above recognition methods achieved some progress in some applications, the classification accuracy is relatively low. Graph convolutional neural networks have shown successful applications and inspired new ideas for EEG-based emotion recognition.
3. Method
Emotion models are generally divided into two categories: discrete models, with features that contain positive emotions (amusement, joy, tenderness) and negative emotions (anger, sadness, fear, disgust) [46]; and dimensional models, with two dimensions affecting subjects: valence (disgust to pleasure) and arousal (calm to excitement) [47, 48, 49].
Figure 1: Emotion recognition method using a deep learning model based on EEG's differential entropy. The size of the feature cube is T × CN × FN, where T is the duration in seconds, CN is the number of EEG channels and FN is the number of features.
Our study builds on the above research, and Figure 1 illustrates the proposed emotion recognition method using a deep learning model based on EEG's differential entropy. The proposed ERDL method consists of four steps.
(1) Data calibration. Firstly, the 3 seconds of baseline EEG data, which is generated spontaneously by the brain, is copied 20 times and linked one by one. Then, the corresponding baseline data is subtracted from the EEG data recorded while watching the 60-second video, where the purpose of the processing is to remove the EEG components that are unrelated to the emotional stimuli.
(2) Data division. The calibrated EEG data is divided into ((60 − T)/S + 1) segments for each trial of each subject. In the following experiments, we set T to 6 seconds and S to 3 seconds, referencing the experimental results of the literature [51, 52].
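Steps (1) and (2) can be sketched in a few lines of NumPy. The array shapes and the implementation details are illustrative assumptions; only the 3 s baseline, its 20-fold tiling, T = 6 and S = 3 come from the text.

```python
import numpy as np

FS = 128          # DEAP sampling rate (Hz)
T, S = 6, 3       # window length and stride in seconds, as in the paper

def calibrate_and_segment(trial, baseline, fs=FS, t_win=T, stride=S):
    """Baseline-calibrate one trial and cut it into overlapping segments.

    trial:    (channels, 60*fs) EEG recorded while watching the video
    baseline: (channels, 3*fs) pre-trial baseline EEG
    Returns an array of shape (n_segments, channels, t_win*fs).
    """
    # Tile the 3 s baseline to cover the 60 s trial, then subtract it.
    tiled = np.tile(baseline, (1, trial.shape[1] // baseline.shape[1]))
    calibrated = trial - tiled
    n_seg = (trial.shape[1] // fs - t_win) // stride + 1
    return np.stack([calibrated[:, i * stride * fs:(i * stride + t_win) * fs]
                     for i in range(n_seg)])

rng = np.random.default_rng(0)
segs = calibrate_and_segment(rng.standard_normal((32, 60 * FS)),
                             rng.standard_normal((32, 3 * FS)))
```

With T = 6 and S = 3 this yields (60 − 6)/3 + 1 = 19 segments per trial, i.e. 19 × 40 = 760 samples per subject, matching the count given in Section 4.3.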
(3) Feature extraction. The experimental results of [3, 8, 53] show that the differential entropy of EEG data attains a higher accuracy in emotion classification, so the proposed method utilizes the differential entropy of EEG data. Section 3.2 briefly introduces the extraction process of the differential entropy of EEG data.
(4) Emotion recognition. A novel ECLGCNN is developed and used for emotion recognition in this paper. ECLGCNN contains three layers, namely a GCNNs layer, an LSTMs layer and a Dense layer. The GCNNs layer attains graph-domain and temporal information from the EEG channels' DE features, while the LSTMs and Dense layers are used to predict low/high arousal or negative/positive valence according to the output of the GCNNs layer.
3.2. Differential entropy
The differential entropy of a Gaussian-distributed variable x ∼ N(µ, σ²) is

DE = (1/2) log(2πeσ²)

where e and π are constants. In this paper, for the characteristics of EEG signals, DE is extracted from five main bands, namely the δ band (1-3 Hz), θ band (4-7 Hz), α band (8-13 Hz), β band (14-30 Hz) and γ band (31-50 Hz).
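For an approximately Gaussian band-limited segment, the closed form above reduces DE to a function of the segment's variance. A small sketch, with the synthetic test signal as an assumption:

```python
import numpy as np

def differential_entropy(band_signal):
    """DE of an (approximately Gaussian) band-limited EEG segment.

    For x ~ N(mu, sigma^2), DE = 1/2 * log(2*pi*e*sigma^2).
    """
    var = np.var(band_signal)
    return 0.5 * np.log(2 * np.pi * np.e * var)

# Sanity check against the closed form for a known sigma.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=100_000)   # sigma = 2, so sigma^2 = 4
de = differential_entropy(x)
```

In the proposed method this function would be evaluated per channel and per band on every 6 s segment, giving the T × CN × FN feature cube of Figure 1.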
Figure 2: The whole architecture of the fusion model of LSTM and GCNN for emotion recognition
3.3. Fusion model of LSTM and GCNN for emotion recognition
The whole architecture of the fusion model of LSTM and GCNN for emotion recognition is shown in Figure 2; it contains three layers, namely a GCNNs layer, an LSTMs layer and a Dense layer. The GCNNs layer is used to calculate the relationship between two EEG channels during a period of time, the LSTMs layer memorizes changes between two EEG channels in a certain period, and the Dense layer completes the final emotion recognition according to the LSTMs layer's output. In the GCNNs layer, we set T GCNNs to extract the graph-domain features from the DE of T seconds of EEG data. In other words, we make use of graph-domain information and time domain information to improve EEG emotion recognition. The LSTMs layer contains an input layer, a hidden layer and an output layer. We set T LSTM cells to receive the calculation results of the T GCNNs, and set the number of LSTM hidden layer cells to num_cell. The LSTMs layer is used to memorize changes between two EEG channels in T seconds. The Dense layer is a fully connected layer and is used to perform data dimension transformation. At the end of the fusion model, the Dense layer attains the recognition result from the LSTMs layer's output.
The graph structure can describe the relationships between different nodes, which provides a potential way to explore the relationship among multiple EEG channels in emotion recognition using EEG [3]. The details of the i-th GCNN structure of the GCNNs layer are shown in Figure 3. The calculation process of the GCNN comprises the following two steps.

Figure 3: Graph convolutional neural network structure

(1) Graph Representation. Inspired by the successful applications of the graph convolutional neural network model in image processing [28, 29, 30, 54], this paper studies the problem of multi-channel EEG emotion recognition through a graph representation method. In the proposed graph representation, each EEG channel corresponds to a node [8], the functional relationship between two channels corresponds to an edge of the graph, and the value of the edge represents the closeness of the functional relationship: the greater the value of the edge, the closer the functional relationship between the two channels.
The graph can be defined as G = {V, ε, A}, where V is the set of N nodes, ε is the edge set, A ∈ R^{N×N} is the adjacency matrix of the node set V, and A_{i,j} is the functional relationship between node i and node j. The common method for calculating A_{i,j} applies a kernel of the inter-channel distance (formula (2)), where θ and τ are two parameters to be fixed and dist_{i,j} is the Euclidean distance between the i-th node and the j-th node.
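Since the exact form of formula (2) is not reproduced here, the sketch below assumes a common choice: a Gaussian kernel of the Euclidean distance, sparsified by k-nearest neighbours as in step 6 of the training algorithm in Section 3.3. The electrode coordinates, k and θ are all hypothetical placeholders.

```python
import numpy as np

def adjacency_knn_gaussian(coords, k=4, theta=1.0):
    """Sketch of the graph construction: a Gaussian kernel of electrode
    distance, sparsified by k-nearest neighbours.

    Assumed form (formula (2) is not shown in the text):
        A_ij = exp(-dist_ij^2 / (2 * theta^2)), kept only for the k
        strongest connections of each node, then symmetrised.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    a = np.exp(-d ** 2 / (2 * theta ** 2))
    np.fill_diagonal(a, 0.0)                 # no self-loops
    mask = np.zeros_like(a, dtype=bool)
    idx = np.argsort(-a, axis=1)[:, :k]      # k strongest neighbours per row
    np.put_along_axis(mask, idx, True, axis=1)
    return np.where(mask | mask.T, a, 0.0)   # symmetric sparsified adjacency

coords = np.random.default_rng(0).standard_normal((32, 3))  # 32 electrodes
A = adjacency_knn_gaussian(coords)
```

Symmetrising with `mask | mask.T` keeps an edge whenever either endpoint selects the other, which guarantees an undirected graph for the Laplacian of formula (3).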
(2) Spectral Graph Filtering. Spectral graph filtering, also known as graph convolution, is a popular signal processing method for graph data operations, in which the graph Fourier transform (GFT) is a typical method.
The normalized graph Laplacian is defined as

L = E − D^{−1/2} A D^{−1/2},  A, D ∈ R^{N×N}   (3)

where D ∈ R^{N×N} is a diagonal matrix with D_{i,i} = Σ_j A_{i,j}, and E is an identity matrix.
For a given spatial signal x ∈ R^{N×FN} (FN is the number of features), its GFT is

x̂ = U^T x   (4)

where U is the matrix of eigenvectors from the eigendecomposition L = U Λ U^T, with Λ the diagonal matrix of eigenvalues λ_1, ..., λ_N   (5). The inverse GFT is

x = U U^T x = U x̂   (6)

The convolution of two signals x and y on the graph G is defined as

x ∗_G y = U((U^T x) ⊙ (U^T y))   (7)

Filtering a signal x with a filter g can thus be written as

y = g(L) x = g(U Λ U^T) x = U g(Λ) U^T x   (8)

where

g(Λ) = diag(g(λ_1), ..., g(λ_N))   (9)
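Formulas (3)-(6) can be checked numerically: build the normalized Laplacian, eigendecompose it to obtain the GFT basis U, and verify that the inverse transform recovers the signal. The small random graph below is only a stand-in for an EEG channel graph.

```python
import numpy as np

def normalized_laplacian(a):
    """L = E - D^{-1/2} A D^{-1/2}  (formula (3)); E is the identity."""
    d = a.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return np.eye(len(a)) - d_inv_sqrt @ a @ d_inv_sqrt

rng = np.random.default_rng(0)
a = rng.random((8, 8))
a = (a + a.T) / 2                    # symmetric weights
np.fill_diagonal(a, 0.0)             # no self-loops
L = normalized_laplacian(a)

# Eigendecomposition L = U Lambda U^T gives the GFT basis.
lam, U = np.linalg.eigh(L)
x = rng.standard_normal((8, 5))      # 8 nodes, 5 features per node
x_hat = U.T @ x                      # forward GFT, formula (4)
x_rec = U @ x_hat                    # inverse GFT, formula (6)
```

Because U is orthogonal, `x_rec` equals `x` up to floating-point error, which is exactly the statement of formula (6).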
To avoid an expensive eigendecomposition, g(Λ) can be approximated with the method of Chebyshev polynomials [3], whose calculation process is formulated as

T_0(x) = 1
T_1(x) = x   (11)
T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), k ≥ 2

Combining formula (10), formula (8) can be converted into the following calculation form:

y = U g(Λ) U^T x = U [Σ_{k=0}^{K−1} diag(θ_k T_k(λ_1), ..., θ_k T_k(λ_N))] U^T x = Σ_{k=0}^{K−1} θ_k T_k(L̃) x   (12)

where L̃ = L/λ_MAX − E and E is an identity matrix.
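The recurrence (11) means that formula (12) needs only repeated multiplications by the rescaled Laplacian L̃, never an eigendecomposition. A sketch, with a random graph and filter coefficients as stand-ins; because L̃ shares its eigenvectors with L, the result coincides with filtering via formula (8).

```python
import numpy as np

def cheb_filter(L, x, theta, lam_max=None):
    """y = sum_{k=0}^{K-1} theta_k T_k(L_tilde) x   (formula (12)),
    with L_tilde = L/lambda_max - E as in the text."""
    n = L.shape[0]
    if lam_max is None:
        lam_max = np.linalg.eigvalsh(L).max()
    L_t = L / lam_max - np.eye(n)
    t_prev, t_cur = x, L_t @ x            # T_0(L~)x and T_1(L~)x
    y = theta[0] * t_prev
    for k in range(1, len(theta)):
        y = y + theta[k] * t_cur
        # Chebyshev recurrence (11): T_k = 2 L~ T_{k-1} - T_{k-2}
        t_prev, t_cur = t_cur, 2 * L_t @ t_cur - t_prev
    return y

# Stand-in graph: normalized Laplacian of a random symmetric adjacency.
rng = np.random.default_rng(1)
a = rng.random((6, 6)); a = (a + a.T) / 2; np.fill_diagonal(a, 0.0)
d = a.sum(axis=1)
L = np.eye(6) - np.diag(d ** -0.5) @ a @ np.diag(d ** -0.5)
x = rng.standard_normal((6, 4))
y = cheb_filter(L, x, theta=np.array([0.5, 0.3, 0.2]))
```

With K = 2, as selected in Section 4.3, only the terms T_0 and T_1 are used, so each GCNN reduces to a first-order polynomial in L̃.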
(1) Calculate the relationship between two EEG channels with the K-nearest neighbor;
Long short-term memories have become an effective and scalable model for solving several learning problems related to sequential data [32]. The purpose of the LSTMs layer in Figure 2 is to memorize the change of the relationship between two EEG channels in T seconds. The LSTMs layer defines three layers: the first is the input layer, which receives the results from the GCNNs layer; the second is the hidden layer, which memorizes the change of the relationship between EEG channels in T seconds;
the last layer outputs the emotion recognition information. Next, we depict the LSTM cell's structure, which is shown in Figure 4.

Figure 4: The structure of the LSTM cell
The forget gate, input gate and output gate of an LSTM cell can be used to add and remove information from the cell state and are defined as follows:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)

where σ(·) is the activation function, x_t is the current input, h_{t−1} is the output of the LSTM cell at the last moment, c_{t−1} is the state of the LSTM cell at the last moment, and b_f, b_i, b_o, b_c are biases.
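The gate equations can be exercised directly. This is a sketch of one standard LSTM cell, not the paper's trained model; the dimensions and the random weights are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W maps the concatenated [h_prev, x_t]; W and b are dicts keyed
    'f', 'i', 'c', 'o' (forget, input, candidate, output).
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])          # forget gate
    i = sigmoid(W['i'] @ z + b['i'])          # input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])    # candidate state
    c = f * c_prev + i * c_tilde              # new cell state
    o = sigmoid(W['o'] @ z + b['o'])          # output gate
    h = o * np.tanh(c)                        # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3                            # placeholder sizes
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for k in 'fico'}
b = {k: np.zeros(n_hid) for k in 'fico'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                            # unroll over a few time steps
    h, c = lstm_cell(rng.standard_normal(n_in), h, c, W, b)
```

In ECLGCNN, T such cells would each receive one GCNN's output, so the hidden state accumulates how the channel relationships change over the T seconds of a segment.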
The loss function of the ECLGCNN model is defined in formula (19):

Loss = cross_entropy(p, l) + α||W||_2   (19)

where p is the predicted value of the model, l is the label, W denotes all parameters of the model, and α is the regularization coefficient. The cross-entropy function cross_entropy(p, l) measures the difference between the actual label and the predicted value of the model, while the regular term α||W||_2 reduces the overfitting of the model's learning parameters.
The update rule of the graph convolution parameters is defined in formula (20) [25]:

θ* = θ* − λ ∂Loss/∂θ*   (20)

where λ is the learning rate.
 4:   for j = 1 to T do
 5:     x = FS_{i,j,:,:}
 6:     Calculate the adjacency matrix A of x according to k-NN and formula (2)
 7:     Calculate the Laplacian matrix L of x according to formula (3)
 8:     L̃ = L/λ_MAX − E
 9:     Calculate T_k(L̃)x according to formula (12)
10:     temp_{i,j} = (T_0(L̃)x, T_1(L̃)x, ..., T_{K−1}(L̃)x)
11:   end for
12: end for
13: Step_count = 0
14: while Loss > e || Step_count < MAX do
15:   y_j = sigmoid(batch_norm(temp_{i,j} ∗ θ_{·,j}))  (i = 1, 2, ..., n; j = 1, 2, ..., T)
16:   // ∗ is a convolution operation
17:   Convert y_j to a column vector y*_j
18:   y* = (y*_1, y*_2, ..., y*_T)
19:   Send y* to the receiving cells of the LSTM
20:   Calculate Loss according to formula (19)
21:   if Loss < e then
22:     Break
23:   end if
24:   Update the parameters of the LSTM based on the Loss and the BP algorithm
25:   Update the graph convolution parameters according to formula (20)
26:   Step_count = Step_count + 1
27: end while
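The control flow of steps 13-27 can be isolated from the model itself. In the sketch below a plain logistic regression deliberately stands in for the GCNN+LSTM stack; the error threshold e = 0.12 and iteration cap MAX = 100000 come from Table 3, while the data, dimensions and learning rate are arbitrary assumptions.

```python
import numpy as np

# Stand-in data: a separable binary problem replaces the EEG feature
# cubes; logistic regression replaces the GCNN+LSTM forward pass.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
y = (X @ rng.standard_normal(8) > 0).astype(float)

w = np.zeros(8)                                 # trainable parameters
e, MAX, lr, alpha = 0.12, 100_000, 0.1, 1e-4    # e, MAX as in Table 3
step_count, loss = 0, np.inf
while loss > e and step_count < MAX:            # steps 13-14
    p = 1.0 / (1.0 + np.exp(-(X @ w)))          # forward pass (steps 15-19)
    loss = (-np.mean(y * np.log(p + 1e-12)
                     + (1 - y) * np.log(1 - p + 1e-12))
            + alpha * np.sum(w ** 2))           # formula (19), step 20
    if loss < e:                                # steps 21-23
        break
    grad = X.T @ (p - y) / len(y) + 2 * alpha * w   # BP stand-in, step 24
    w -= lr * grad                              # formula (20), step 25
    step_count += 1                             # step 26
```

The loop terminates either when the loss drops below the error threshold e or when the iteration cap is reached, mirroring the two stopping conditions of the listing.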
4.1. DEAP
The experiments in this paper are based on the multimodal DEAP dataset. DEAP is a large open-source dataset that contains multiple physiological signals with sentiment evaluations. In its data collection experiments, evoked EEG, ECG, EMG and other bioelectric signals were detected and recorded; 32 subjects (16 males and 16 females) were involved in 40 trials of music videos with different emotional tendencies, where each music video lasted 1 minute. After watching the music videos, participants rated the videos on a scale of 1-9 for arousal, valence, liking, dominance and familiarity, where higher scores indicate stronger levels of each indicator.
In this paper, we used the 32 channels of EEG data in the dataset; that is, only EEG data is used. Eye myoelectricity, eye movement, and power supply noise were removed from the EEG data, and the sampling rate was adjusted to 128 Hz. The duration of each EEG signal is 63 seconds, including 3 seconds of pre-trial baseline data and 60 seconds of watching the emotional video. For subject-dependent experiments, we used the dataset of each subject to validate ECLGCNN. In order to verify the model's generalization, the data of all subjects were also collected into one sample set to train and verify ECLGCNN. We defined the labels of the DEAP EEG data as follows: arousal/valence with a self-score of more than 5 is high arousal/positive valence; otherwise it is low arousal/negative valence. Sections 4.3 and 4.4 discuss the subject-dependent experiments and subject-independent experiments, respectively.
4.2. Evaluation metrics
The classification accuracy Acc and the F-score are used to evaluate the ECLGCNN model. Acc is expressed as

Acc = (TP + TN) / (TP + TN + FP + FN)   (21)

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. The precision and recall rates are

Pre = TP / (TP + FP)   (22)

Rec = TP / (TP + FN)   (23)

The F-score is the harmonic mean of the precision and the recall rate, and its calculation method is defined as

F-score = (2 × Pre × Rec) / (Rec + Pre)   (24)
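Formulas (21)-(24) can be computed from the four confusion-matrix counts. A small self-contained check, with hypothetical predictions and labels:

```python
import numpy as np

def metrics(pred, label):
    """Accuracy, precision, recall and F-score per formulas (21)-(24)."""
    tp = np.sum((pred == 1) & (label == 1))   # true positives
    tn = np.sum((pred == 0) & (label == 0))   # true negatives
    fp = np.sum((pred == 1) & (label == 0))   # false positives
    fn = np.sum((pred == 0) & (label == 1))   # false negatives
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f

pred = np.array([1, 1, 0, 0, 1, 0])
label = np.array([1, 0, 0, 1, 1, 0])
acc, pre, rec, f = metrics(pred, label)
```

Here two of the three predicted positives are correct (Pre = 2/3) and two of the three actual positives are found (Rec = 2/3), so the F-score is also 2/3.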
4.3. Subject-dependent experiments on DEAP
In this paper, each subject's EEG data is divided into ((60 − T)/S + 1) × 40 samples; for example, when T is set to 6 and S is set to 3, 760 samples are generated. Three rounds of 5-fold cross-validation with a random strategy were adopted to verify ECLGCNN.
We selected the No. 16 subject's data to choose the model parameters, because the numbers of positive and negative samples of the No. 16 subject are the same in binary classification of arousal. Then, we explored the influence of the Chebyshev polynomial order (K) and the number of LSTM hidden layer cells (num_cell) on emotion classification with the ECLGCNN model. The influence of K and num_cell is shown in Figure 5. We found that, compared with num_cell, the value of K has the greater influence on ECLGCNN; when num_cell is 30 and K is 2, ECLGCNN reaches the highest accuracy in binary classification of arousal. Therefore, we set num_cell to 30 and K to 2 in the following experiments. The parameter settings of ECLGCNN are listed in Table 1.
Figure 5: Experimental values of num_cell and K

Table 1: Parameter settings

ECLGCNN model parameters                  Values
The number of GCNNs T                     6
The number of Chebyshev coefficients K    2
Table 2: Classification results of ECLGCNN for each subject

          Binary classification of valence    Binary classification of arousal
Subject   Accuracy(%)    F-score(%)           Accuracy(%)    F-score(%)
01        93.42          93.13                94.21          95.29
02        87.73          88.88                86.84          89.06
03        93.24          93.76                94.12          84.35
04        90.65          88.49                87.46          84.13
05        88.90          90.76                87.85          87.07
06        89.52          93.08                89.13          87.43
07        94.34          95.97                89.87          92.04
08        91.79          92.52                88.64          90.26
09        91.84          91.80                88.99          90.74
10        93.90          93.81                90.92          92.08
11        80.79          84.33                86.58          82.14
12        87.53          87.86                89.65          93.80
13        89.65          87.46                92.32          95.47
14        89.82          89.53                88.16          91.33
15        94.43          94.36                91.88          91.24
27        89.22          91.75                91.01          93.27
28        91.01          92.74                85.00          83.08
29        92.67          93.71                94.34          95.42
30        92.28          94.33                91.41          90.84
31        88.78          90.24                88.63          87.73
32        88.33          88.31                89.74          92.47
Average   90.45          91.08                90.60          90.94
From Table 2, the experimental results indicate that the minimum, maximum and average classification accuracies of ECLGCNN for the 32 subjects are 80.79%, 94.43% and 90.45% respectively in binary classification of valence. For binary classification of arousal, the minimum, maximum and average classification accuracies are 85.00%, 94.52% and 90.60% respectively. On the other hand, the average F-score reaches more than 90% in both classification tasks.
In the following, the classification results of support vector classification (SVC) [55], decision tree (DT) [56] and random forest (RF) [57] are compared with ECLGCNN using the same features. The comparison results are shown in Figures 6, 7, 8 and 9, respectively.
From Figure 6(a) and Figure 6(b), the classification accuracy and F-score of ECLGCNN are relatively stable in binary classification of arousal, while those of SVC are unstable compared with ECLGCNN, RF and DT. The reason for this phenomenon is that SVC fails to find an optimal classification surface. The classification accuracy and F-score of DT are close to RF, because the strategies of generating decision trees for RF and DT are similar. However, ECLGCNN makes use of temporal and graph features from the DE of EEG data to find a better classification surface compared with SVC, DT and RF. On the whole, this shows that our proposed model is effective in binary classification of arousal.
Figure 6: Classification accuracy (a) and F-score (b) of the four classifiers (ECLGCNN, SVC, DT, RF) on each subject in binary classification of arousal
Figure 7: Average classification result of the four classifiers on 32 subjects’ high/low arousal
From Figure 7, the average classification accuracy and F-score of ECLGCNN are the highest among the four classifiers in binary classification of arousal. The average F-score of SVC is higher compared with DT and RF, while the average classification accuracy of SVC is the lowest; the reason is that the classification accuracy and F-score of SVC are unstable across the 32 subjects. The average classification accuracy and F-score of DT are close to RF, because RF is an extension of DT. The average classification accuracy of ECLGCNN is at least 8.49% higher than the other three classifiers in binary classification of arousal. In summary, from Figures 6 and 7, ECLGCNN is effective in binary classification of arousal.
Next, we show the comparison results of ECLGCNN, SVC, DT and RF in binary classification of valence in Figures 8 and 9.
Figure 8: Classification accuracy (a) and F-score (b) of the four classifiers on each subject in binary classification of valence
From Figure 8(a) and Figure 8(b), the classification accuracy and F-score of ECLGCNN are relatively stable in binary classification of valence. The classification accuracy of ECLGCNN for subject 22 is lower than SVC, and the F-score of ECLGCNN for subjects 12, 13, 22 and 24 is lower than SVC. These results show that, based on DE, SVC is better at binary classification of valence than DT and RF, but still lower than ECLGCNN. The classification accuracy and F-score of DT are the lowest in binary classification of valence compared with ECLGCNN, SVC and RF.
Figure 9: Average classification result of the four classifiers on 32 subjects’ high/low valence
We also compared our method with two other methods, and the comparison results are shown in Figure 10. The experimental results show that our method is the most effective of the three. Our method is 5% higher than Salma's method [35] in binary classification of valence and 4.95% higher in binary classification of arousal; they designed a multi-layer LSTM framework for emotion recognition. Our method is also 5.25% higher than Liu's method [41] in binary classification of valence and 10.10% higher in binary classification of arousal; they combined a bimodal deep autoencoder and SVM to recognize emotions. In addition, we compared ERDL with He's method [45]; our experimental results are at least 3.55% higher than theirs. Their experiments were conducted on positive emotion (valence > 5 and arousal > 5) and negative emotion (valence < 5 and arousal < 5), while our experiments were conducted on positive (valence > 5)/negative (valence < 5) valence and high (arousal > 5)/low (arousal < 5) arousal. On the whole, our experimental results are higher than theirs.
Figure 10: Comparison of classification accuracy of ERDL with Liu's and Salma's methods for valence and arousal
From the above experimental results, we may conclude that ERDL is the most effective in the subject-dependent experiments, which is attributed to the fusion of GCNN and LSTM.
4.4. Subject-independent experiments on DEAP

Table 3: Parameter settings

ECLGCNN model parameters                           Values
The number of GCNNs T                              6
The number of Chebyshev coefficients K             10
The number of LSTM hidden layer cells (num_cell)   150
GCNN activation function type                      sigmoid
LSTM activation function type                      sigmoid
The number of nodes in the graph                   32
Maximum number of model iterations MAX             100000
Model error threshold e                            0.12
Model learning rate λ                              0.003
Model regular term coefficient α                   0.00008
From Figures 11 and 12, the average classification accuracy and F-score of ECLGCNN are the highest in binary classification of arousal and valence. The average classification accuracy of ECLGCNN is at least 7.74% higher than the other three classifiers in binary classification of arousal and at least 8.29% higher in binary classification of valence. The classification accuracy of SVC is the lowest for arousal and valence compared with the other three classifiers, because SVC is sensitive to the choice of parameters and kernel functions and fails to find the optimal parameters. Therefore, ECLGCNN is effective in subject-independent emotion classification.
[Figure: grouped bars of accuracy and F-score for ECLGCNN, SVC, DT, and RF]
Figure 11: Average classification result of four classifiers for low/high arousal in Subject-independent experiments
[Figure: grouped bars of accuracy and F-score for ECLGCNN, SVC, DT, and RF]
Figure 12: Average classification result of four classifiers for positive/negative valence in Subject-independent experiments
We also compare ERDL with methods reported in related literature, and the comparison results are shown in Figure 13. The classification accuracy of the ERDL method is the highest among the compared emotion recognition methods. ERDL is at least 3.4% higher than the other methods in binary classification of valence and at least 3.51% higher in binary classification of arousal. Specifically, ERDL is at least 3.4% higher than Tripathi's method [4], which made use of time-domain features of EEG signals and a deep learning model. ERDL is at least 11.15% higher than Li's method [5], which used a fusion model combining CNN and LSTM. ERDL is at least 3.71% higher than Xing's method [39], which proposed an SAE+LSTM classification model. ERDL is at least 11.97% higher than Wang's method [40], which proposed a 3D CNN for emotion classification. ERDL is at least 10.27% higher than Mert's method [42], which extracted PSD, entropy, and other features from EEG data, processed them with ICA, and fed them into an artificial neural network. ERDL is at least 11.81% higher than Thammasan's method [43], which extracted the fractal dimension and power spectral density from EEG data and then fed the extracted features into an SVM. And our method ERDL is at least 3.51% higher than Zhang's method [44], which decomposed the EEG signal into four bands (theta, alpha, beta and gamma) and used the FFT to calculate the band power as EEG features, which were input to a PNN. In addition, we compare ERDL with Chen's method [38]; our experimental results are at least 16.18% higher than theirs. Their method focuses on EEG channel selection and on the accuracy of emotion classification between different genders, whereas we ignore the influence of gender on emotion recognition in our research. This suggests that our method is more universal than theirs.
In summary, ERDL makes use of temporal and graph features of EEG data to achieve good classification results, and the nonlinear cells of ECLGCNN make it much more powerful at feature representation and learning.
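The nonlinear LSTM cells referenced here follow the standard gate formulation; below is a minimal single-step sketch of that textbook form. The parameter names, stacking convention, and shapes are our assumptions for illustration, not the authors' implementation; only the hidden size (150 cells, per Table 3) comes from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with stacked input/forget/cell/output parameters:
    W: (4H, D), U: (4H, H), b: (4H,), for input dim D and hidden size H."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c
```

The forget and input gates are what let the cell memorize or discard the evolving channel relationships fed in from the GCNN outputs over successive time steps.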
[Figure: bar chart of accuracy for valence and arousal comparing ERDL (0.8481, 0.8527) with Tripathi et al., Li et al., Xing et al., Wang et al., Mert et al., Thammasan et al., and Zhang et al.]
Figure 13: Comparison with other methods in Subject-independent experiments
4.5. Summary
The difference among subjects' EEG is not considered in subject-dependent experiments, but it must be handled in subject-independent experiments. The experimental results confirm this: as shown in Figure 9 and Figure 13, the classification accuracies of the subject-dependent experiments are at least 5% higher than those of the subject-independent experiments.
5. Conclusion

In this paper, we propose a new emotion recognition method using a deep learning model based on EEG differential entropy, which adopts a novel fusion model of GCNN and LSTM for emotion classification. ECLGCNN utilizes graph and temporal information: each EEG channel corresponds to a graph node, the functional relationship between two channels corresponds to an edge of the graph, and LSTM cells' gates are used to extract effective information. Both subject-dependent and subject-independent experiments were conducted on DEAP, and the experimental results indicate that ERDL achieves better recognition accuracy than state-of-the-art methods such as CNN, RNN [5], LSTM [35], SAE+LSTM [39], EmotioNet [40], SVM [41], ANN [42], and PNN [44].
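The differential-entropy feature the method builds on has a closed form for a band-limited EEG segment under a Gaussian assumption, h = 0.5 ln(2πeσ²), as used in the DE literature [18]. A minimal NumPy sketch (the function name is ours, not the authors' code):

```python
import numpy as np

def differential_entropy(segment):
    """Differential entropy of a band-pass-filtered EEG segment under a
    Gaussian assumption: h = 0.5 * ln(2 * pi * e * sigma^2)."""
    variance = np.var(segment)
    return 0.5 * np.log(2.0 * np.pi * np.e * variance)
```

For a unit-variance signal this evaluates to 0.5 ln(2πe), roughly 1.42 nats; in practice one value is computed per frequency band per channel to fill the feature cube.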
Furthermore, the average classification accuracy of ECLGCNN reaches 90.52% in subject-dependent experiments and 85.04% in subject-independent experiments. The better classification accuracy of ECLGCNN owes to the following mechanisms:
Acknowledgments

We are grateful for the support of the National Natural Science Foundation of China (91846205, 61373149), the National Key R&D Program (2017YFB1400102, 2016YFB1000602), and SDNSFC (no. ZR2017ZB0420).
References

[1] S. M. Alarcao, M. J. Fonseca, Emotions recognition using EEG signals: A survey, IEEE Transactions on Affective Computing 10 (3) (2019) 374–393. doi:10.1109/TAFFC.2017.2714671.

[2] M. Hamalainen, R. Hari, R. J. Ilmoniemi, J. Knuutila, O. V. Lounasmaa, Magnetoencephalography-theory, instrumentation, and applications to noninvasive studies of the working human brain, Reviews of Modern Physics 65 (2) (1993) 413–497. doi:10.1103/RevModPhys.65.413.

[3] …recognition using dynamical graph convolutional neural networks and broad learning system, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1240–1244. doi:10.1109/bibm.2018.8621147.

[4] S. Tripathi, S. Acharya, R. D. Sharma, S. Mittal, S. Bhattacharya, Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset, in: S. P. Singh, S. Markovitch (Eds.), Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, AAAI Press, 2017, pp. 4746–4752. URL http://aaai.org/ocs/index.php/IAAI/IAAI17/paper/view/15007

[5] X. Li, D. Song, P. Zhang, G. Yu, Y. Hou, B. Hu, Emotion recognition from multi-channel EEG data through convolutional recurrent neural network, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016, pp. 352–359. doi:10.1109/bibm.2016.7822545.
…in EEG-based dynamic music-emotion recognition, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016, pp. 881–888. doi:10.1109/IJCNN.2016.7727292.

[8] T. Song, W. Zheng, P. Song, Z. Cui, EEG emotion recognition using dynamical graph convolutional neural networks, IEEE Transactions on Affective Computing (2019) 1–1. doi:10.1109/TAFFC.2018.2817622.

…doi:10.1155/2013/573734.
…in Biomedicine 14 (2) (2010) 186–197. doi:10.1109/TITB.2009.2034649.

[15] R. J. Davidson, What does the prefrontal cortex "do" in affect: perspectives on frontal EEG asymmetry research, Biological Psychology 67 (1) (2004) 219–234. doi:10.1016/j.biopsycho.2004.03.008.

[16] M. Li, B. Lu, Emotion classification based on gamma-band EEG, in: 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009, pp. 1223–1226. doi:10.1109/IEMBS.2009.5334139.

[17] D. Nie, X. Wang, L. Shi, B. Lu, EEG-based emotion recognition during watching movies, in: 2011 5th International IEEE/EMBS Conference on Neural Engineering, 2011, pp. 667–670. doi:10.1109/NER.2011.5910636.

[18] L. Shi, Y. Jiao, B. Lu, Differential entropy feature for EEG-based vigilance estimation, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2013, pp. 6627–6630. doi:10.1109/EMBC.2013.6611075.
…7494017.

[21] Y. Shi, X. Zheng, T. Li, Unconscious emotion recognition based on multi-scale sample entropy, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1221–1226. doi:10.1109/bibm.2018.8621185.

…P. D. Bamidis, Toward emotion aware computing: An integrated approach using multichannel neurophysiological recordings and affective visual stimuli, IEEE Transactions on Information Technology in Biomedicine 14 (3) (2010) 589–597. doi:10.1109/TITB.2010.2041553.

…cation from SAR imagery based on the pixel grayscale decline by graph convolutional neural network, IEEE Sensors Letters 4 (6) (2020) 1–4. doi:10.1109/LSENS.2020.2995060.
…doi:10.1109/TSP.2018.2879624.

…cessing, ICIP 2019, Taipei, Taiwan, September 22-25, 2019, IEEE, 2019, pp. 2399–2403. doi:10.1109/ICIP.2019.8803367.

[33] Y. Yang, J. Zhou, J. Ai, Y. Bin, A. Hanjalic, H. Shen, Y. Ji, Video captioning by adversarial LSTM, IEEE Transactions on Image Processing 27 (11) (2018) 5600–5611. doi:10.1109/TIP.2018.2855422.
…org/10.14569/IJACSA.2017.081046.

…gorithm using fractal dimension, in: 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2014, pp. 3166–3171. doi:10.1109/SMC.2014.6974415.

[38] J. Chen, B. Hu, P. Moore, X. Zhang, X. Ma, Electroencephalogram-based emotion assessment system using ontology and data mining techniques, Applied Soft Computing 30 (2015) 663–674. doi:10.1016/j.asoc.2015.01.007.

[39] X. Xing, Z. Li, T. Xu, L. Shu, B. Hu, X. Xu, SAE+LSTM: A new framework for emotion recognition from multi-channel EEG, Frontiers in Neurorobotics 13 (2019) 37. doi:10.3389/fnbot.2019.00037.

…978-3-319-46672-9_58.

[42] A. Mert, A. Akan, Emotion recognition from EEG signals by using multivariate empirical mode decomposition, Pattern Analysis and Applications 21 (1) (2018) 81–89. doi:10.1007/s10044-016-0567-6.
[44] J. Zhang, M. Chen, S. Hu, Y. Cao, R. Kozma, PNN for EEG-based emotion recognition, in: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2016, …7844584.

…Applied Soft Computing (2020) 106426. doi:10.1016/j.asoc.2020.106426.

…doi:10.1109/ACCESS.2019.2908285.
…2015.7320065.

…Man, and Cybernetics (SMC), 2016 IEEE International Conference on, 2016, pp. 002558–002563. doi:10.1109/SMC.2016.7844624.

[53] J. Li, Z. Zhang, H. He, Hierarchical convolutional neural networks for EEG-based emotion recognition, Cognitive Computation 10 (2) (2018) 368–380. doi:10.1007/s12559-017-9533-x.

[54] S. Fu, W. Liu, S. Li, Y. Zhou, Two-order graph convolutional networks for semi-supervised classification, IET Image Processing 13 (14) (2019) 2763–2771. doi:10.1049/iet-ipr.2018.6224.

[55] C. Chang, C. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (3) (2011) 27:1–27:27. doi:10.1145/1961189.1961199.

[57] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32. doi:10.1023/A:1010933404324.
The highlights of our paper are as follows:

A new emotion recognition method using a deep learning model based on EEG differential entropy is proposed. In contrast to traditional emotion recognition methods, multiple GCNN structures are utilized to extract temporal and graph-domain information, LSTM is integrated to memorize the change of the relationship between two EEG channels within a specific time, and the fusion of GCNN and LSTM improves the effectiveness of emotion recognition.

A fusion model of LSTM and GCNN for emotion classification (named ECLGCNN) is proposed, which utilizes graph and temporal information. In the fusion model, each EEG channel corresponds to a graph node, and the functional relationship between two channels corresponds to an edge of the graph; the greater the edge weight, the closer the functional relationship between the two channels. LSTM cells' gates are used to extract effective information from the input (the output of the GCNNs) for emotion classification.

Extensive experiments on DEAP are conducted to verify the ECLGCNN model, and the experimental results demonstrate that the proposed method has better emotion classification performance than state-of-the-art methods. The average accuracy reaches 90.45% and 90.60% for valence and arousal in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.
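The edge weights described above, where a larger value means a closer functional relationship between two channels, can be sketched with a common proxy: the absolute Pearson correlation between channel signals. This is an illustrative assumption (the paper may define the edge weights differently), and the function name is ours:

```python
import numpy as np

def channel_graph(eeg):
    """Build a weighted graph over EEG channels.

    eeg: array of shape (channels, samples). Edge weight = |Pearson
    correlation| between two channels, a stand-in for the 'functional
    relationship' described above. Larger weight = closer relationship."""
    adj = np.abs(np.corrcoef(eeg))  # rows are treated as variables
    np.fill_diagonal(adj, 0.0)      # no self-loops
    return adj
```

The resulting symmetric matrix can then serve as the adjacency input to a graph convolution over the 32-channel graph.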
Yongqiang Yin: Conceptualization, Methodology, Software.
Xiangwei Zheng: Writing - Reviewing and Editing.
Bin Hu: Supervision.
Yuang Zhang: Software.
Xinchun Cui: Validation.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
☐The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests: