Journal Pre-proof

EEG emotion recognition using fusion model of graph convolutional neural networks and LSTM

Yongqiang Yin, Xiangwei Zheng, Bin Hu, Yuang Zhang, Xinchun Cui

PII: S1568-4946(20)30892-9
DOI: https://doi.org/10.1016/j.asoc.2020.106954
Reference: ASOC 106954

To appear in: Applied Soft Computing Journal

Received date: 18 August 2020
Revised date: 5 November 2020
Accepted date: 25 November 2020

Please cite this article as: Y. Yin, X. Zheng, B. Hu et al., EEG emotion recognition using fusion model of graph convolutional neural networks and LSTM, Applied Soft Computing Journal (2020), doi: https://doi.org/10.1016/j.asoc.2020.106954.


© 2020 Elsevier B.V. All rights reserved.


EEG Emotion Recognition Using Fusion Model of Graph Convolutional Neural Networks and LSTM

Yongqiang Yin a,b, Xiangwei Zheng *a,b, Bin Hu a, Yuang Zhang a,b, Xinchun Cui c

a School of Information Science and Engineering, Shandong Normal University, Jinan, China
b Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, China
c School of Computer Science, Qufu Normal University, Rizhao, China

Abstract

In recent years, graph convolutional neural networks have become a research focus and inspired new ideas for emotion recognition based on EEG. Deep learning has been widely used in emotion recognition, but it is still challenging to construct models and algorithms for practical applications. In this paper, we propose a novel emotion recognition method based on a novel deep learning model (ERDL). Firstly, EEG data is calibrated by 3 s of baseline data and divided into segments with a 6 s time window, and then differential entropy is extracted from each segment to construct a feature cube. Secondly, the feature cube of each segment serves as the input of the deep learning model, which fuses a graph convolutional neural network (GCNN) and long short-term memory (LSTM) networks. In the fusion model, multiple GCNNs are applied to extract graph-domain features, LSTM cells are used to memorize the change of the relationship between two channels within a specific time and extract temporal features, and a Dense layer is used to attain the emotion classification results. Finally, we conducted extensive experiments on the DEAP dataset, and the experimental results demonstrate that the proposed method achieves better classification results than the state-of-the-art methods. We attained average classification accuracies of 90.45% and 90.60% for valence and arousal in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.

Keywords: EEG, emotion recognition, long short-term memory neural network, graph convolutional neural network, differential entropy

Email address: xwzhengcn@163.com (Xiangwei Zheng, corresponding author)

1. Introduction

Emotions play an important role in daily life and influence the perception of the surroundings. Recently, many human-computer interaction systems have been established by research communities at home and abroad, so the automatic classification of emotional states has become indispensable. This can be achieved with a variety of methods, such as subjective self-reporting, neurophysiological measurements and so on. In recent years, electroencephalography (EEG) based emotion recognition has received widespread attention because it is a simple, cheap, portable, and easy-to-use emotion classification method [1]. EEG signals record the relationship between emotional state and brain activity and reflect very subtle emotional changes with high time resolution [2]. However, EEG signals have shortcomings [3] such as time asymmetry and instability, a low signal-to-noise ratio, and uncertainty about which brain areas produce specific reactions. Therefore, EEG-based emotion recognition is still a challenging task.

Many researchers have proposed methods for emotion recognition using EEG, such as methods based on convolutional neural networks (CNN) [4, 5, 6], deep belief networks [7], graph convolutional neural networks [3, 8] and so on. Recently, graph convolutional neural networks (GCNN) and long short-term memory (LSTM) networks have been gradually adopted in this field, but the question of how to fuse GCNN with LSTM and apply the fusion to EEG-based emotion recognition remains open.

To address the above issues, we propose a novel emotion recognition method based on a deep learning model. Firstly, EEG data is calibrated by 3 s of baseline data and divided into segments with a 6 s time window, and then differential entropy (DE) is extracted from each segment to construct a feature cube. Secondly, the feature cube of each segment serves as the input of the deep learning model, which fuses a graph convolutional neural network and LSTM networks. In the fusion model, multiple GCNNs are applied to extract graph-domain features, LSTM cells are used to memorize the change of the relationship between two EEG channels within a specific time and extract temporal features, and a Dense layer is used to attain the emotion classification result.

Finally, we conducted extensive experiments on the DEAP dataset, and the experimental results demonstrate that the proposed method has better classification performance than the state-of-the-art methods. We attained average classification accuracies of 90.45% and 90.60% on the DEAP dataset for valence and arousal in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.
re-
The main contributions of this paper are as follows:

• We propose a new emotion recognition method using a deep learning model based on EEG differential entropy. In contrast to traditional emotion recognition methods, DE is extracted from each divided segment to generate a feature cube, multiple GCNNs are applied to extract graph-domain features from each feature cube, and LSTM cells are applied to memorize the change of the relationship between two EEG channels within a specific time and extract temporal features.

• We propose a fusion model of LSTM and GCNN for emotion classification (named ECLGCNN). In the fusion model, each EEG channel corresponds to a vertex node, and the functional relationship between two channels corresponds to an edge of the graph, where the greater the value of the edge is, the closer the functional relationship between the two channels is; the gates of the LSTM cells are used to extract effective information from the input (the output of the GCNNs) for emotion classification. The fusion of GCNN and LSTM improves the effectiveness of emotion recognition.

• We conduct extensive experiments on the DEAP dataset to verify the ECLGCNN model. Experimental results demonstrate that the proposed method has better emotion classification performance than the state-of-the-art methods. Average accuracies of 90.45% and 90.60% for valence and arousal are achieved in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.

The remainder of this paper is organized as follows. We briefly review EEG features, GCNN, LSTM and emotion recognition methods on the DEAP dataset in Section 2. In Section 3, we present the proposed emotion recognition method and its key components, including DE and the fusion model of GCNN and LSTM and its architecture. In Section 4, we introduce the DEAP dataset adopted in the experiments and the evaluation indices, and analyze the experimental results of ECLGCNN on DEAP in detail. We conclude the paper and discuss future work in Section 5.

2. Related work
2.1. EEG features

The research on applying EEG signals to emotion classification can be traced back to the work of Musha et al. [9], who extracted a set of 135 state variables, the cross-correlation coefficients of EEG signals collected with ten scalp electrodes, as EEG features. In recent decades, many machine learning and signal processing methods have been proposed to deal with EEG emotion classification [10, 11]. EEG features adopted in emotion classification can usually be divided into three types, namely, time domain features, frequency domain features and time-frequency domain features. Time domain features, such as the Hjorth features [12], fractal dimension features [13] and higher-order crossing features [14], mainly capture the time domain information of EEG signals. Frequency domain features aim to capture EEG emotional information from a frequency perspective. Feature extraction in the frequency domain consists of two steps. The first step is to decompose the EEG signal into several frequency bands, namely the δ band (1-3 Hz), θ band (4-7 Hz), α band (8-13 Hz), β band (14-30 Hz) and γ band (31-50 Hz) [10, 15, 16, 17]. The second step is to extract EEG features from each frequency band. Commonly used EEG features include differential asymmetry [13], differential entropy (DE) [18, 19], power spectral density [19], approximate entropy [20], sample entropy [21] and rational asymmetry [22].

Due to the instability of EEG signals, it is one-sided to use only time domain or only frequency domain information, so more and more studies simultaneously utilize the time and frequency domain information to capture the nature of EEG signals. Features fusing the time and frequency domains are called time-frequency features. Common extraction methods include the short-time Fourier transform [23], the wavelet transform [24] and so on.
2.2. Graph convolutional neural network

The ability of convolutional neural networks (CNN) to learn local stationary structures and compose them into multi-scale hierarchical patterns has led to breakthroughs in image processing, video processing, and sound recognition tasks [25]. However, the assumptions behind CNN also limit its capability to process irregular, non-Euclidean data that are naturally structured as graphs. The graph convolutional neural network [25, 26] extends the traditional convolutional neural network by combining it with spectral graph theory. Compared with the classical convolutional neural network, the graph convolutional neural network is advantageous for discriminative feature extraction from signals in a discrete spatial domain [27].

The graph convolutional neural network applies convolution operations to the transformed graph, and the definition of the convolution operation is the key. The Fourier transform is introduced on the graph and the convolution theorem is adopted, so the convolution operation can be expressed by the product of two Fourier transforms. The graph convolutional neural network provides an effective method to describe the internal relationship between different graph nodes, which offers a way to explore the relationship among multiple EEG channels in EEG emotion classification [8].

In emotion recognition, Song et al. [8] proposed a dynamical graph convolutional neural network, applied it to the SEED and DREAMER datasets, and achieved good results. Wang et al. [3] introduced a broad learning system and proposed a model that combines a dynamical graph convolutional neural network with the broad learning system; they also applied the model to SEED and DREAMER to verify its effectiveness for emotion recognition. In image processing, Zhu et al. [28] adopted a graph convolutional neural network to extract the features of graph-structured data. Levie et al. [29] proposed CayleyNets based on graph convolutional neural networks and used the MNIST, CORA and MovieLens datasets to verify CayleyNets, attaining good experimental results. Valsesia et al. [30] proposed a convolutional neural network with graph convolutional layers and applied it to recover an image from a noisy observation.

2.3. LSTM

The recurrent neural network (RNN) has a good capability for processing time-series data and is commonly used in natural language processing. With its special network structure, an RNN memorizes previous information and uses it to influence the output of subsequent nodes. The typical characteristic of the RNN architecture is a cyclic connection, which enables the RNN to update its current state based on past states and the current input data. However, an RNN consisting of sigmoid cells or tanh cells is unable to learn the relevant information of the input data when the gap between relevant inputs is large. By introducing gate functions into the cell structure, LSTM handles the problem of long-term dependencies well [31]. RNNs with long short-term memory have become an effective and scalable model for solving several learning problems related to sequential data. The LSTM architecture contains two important units, namely, a storage unit and nonlinear gating units. The storage unit can maintain its state over time, and the nonlinear gating units regulate the information flow into and out of the unit [32].

Yang et al. [33] proposed a novel approach to video captioning based on an adversarial LSTM, aiming to compensate for the deficiencies of LSTM-based video captioning methods. Yu et al. [34] proposed an end-to-end model based on LSTM to optimize biomedical event extraction. Salma et al. [35] designed a multi-layer LSTM framework for emotion recognition and applied it to the DEAP dataset.

2.4. Emotion recognition methods on DEAP

The public release of the DEAP dataset [36] provides opportunities for researchers in the field of emotion recognition. Prior to DEAP, most researchers focused on analyzing facial expressions and speech to determine a person's emotional state [37]. Recently, many researchers have proposed their own emotion recognition methods for DEAP. Tripathi et al. [4] used time domain features from EEG to train a deep neural network (DNN) and a CNN, respectively, and the final classification accuracy exceeded 73%. Li et al. [5] applied wavelet features to train a CNN combined with LSTM, and the binary classification accuracy reached 72%. Salma et al. [35] designed a multi-layer LSTM framework to learn features from EEG signals, with a dense layer classifying emotions into low/high arousal and valence; they verified their method on DEAP and achieved average accuracies of 85.65% and 85.45% for the arousal and valence classes, respectively. Chen et al. [38] proposed an enhanced EEG-based emotion assessment system and achieved average accuracies of 69.09% and 67.89% for the arousal and valence classes, respectively. Xing et al. [39] proposed a SAE+LSTM classification model, and their experiments indicated binary classification results of 81.10% for valence and 74.38% for arousal. Wang et al. [40] proposed a 3D CNN for emotion classification and obtained classification accuracies of 72.10% for valence and 73.10% for arousal. Liu et al. [41] used a bimodal deep autoencoder to generate new features and then fed the new features into support vector machines (SVM) to complete emotion classification; they attained accuracies of 85.20% for binary classification of valence and 80.50% for arousal. Mert and Akan [42] first normalized the intrinsic mode functions (IMFs) generated by multivariate empirical mode decomposition and extracted 10 features such as PSD and entropy, then processed them with ICA and fed them into an artificial neural network; they attained 72.87% in binary classification of valence and 75.00% in binary classification of arousal. Thammasan et al. [43] extracted the fractal dimension and power spectral density from EEG data and then fed the extracted features into an SVM classifier; they attained 73.00% in binary classification of valence and 72.50% in binary classification of arousal. Zhang et al. [44] decomposed the EEG signal into four bands, i.e. theta, alpha, beta and gamma, and used the FFT to calculate the power as EEG features that were input to a PNN. The experimental results showed that the mean classification accuracy of the PNN was 81.21% for valence (≥5 vs. <5) and 81.26% for arousal (≥5 vs. <5). He et al. [45] proposed a firefly integrated optimization algorithm (FIOA) to simultaneously accomplish multiple tasks, i.e. optimal feature selection, parameter setting and classifier selection, according to different EEG-based emotion datasets. The experimental results showed that the average classification accuracy of FIOA was 86.90% for positive emotion (valence ≥5 and arousal ≥5) versus negative emotion (valence <5 and arousal <5).

Although the above recognition methods have made some progress in certain applications, their classification accuracy is relatively low. Graph convolutional neural networks have shown successful applications and have inspired new ideas for emotion recognition based on EEG. Therefore, we attempt to develop a novel emotion recognition method based on a deep learning model to improve the accuracy of emotion classification.

3. Method

3.1. Emotion recognition method

A primary issue with emotion recognition is that subjects show different subjective emotional states in response to the same stimuli. Many psychologists have conducted research and proposed several theories. Typical emotion classification models include:

• The discrete model, which classifies emotional states based on developmental features comprising positive emotions (amusement, joy, tenderness) and negative emotions (anger, sadness, fear, disgust) [46].

• The dimensional model, which expresses emotion in terms of two dimensional states affecting subjects: valence (disgust, pleasure) and arousal (calm, excitement) [47, 48, 49].

Figure 1: Emotion recognition method using a deep learning model based on EEG differential entropy. The size of each feature cube is T × CN × FN, where T is the window length in seconds, CN is the number of EEG channels, and FN is the number of features.

Our study is based on the above research, and Figure 1 illustrates the proposed emotion recognition method using a deep learning model based on EEG differential entropy. The proposed ERDL method consists of four steps.

(1) Data calibration. Firstly, the 3 seconds of baseline EEG data, which is generated spontaneously by the brain, is copied 20 times and concatenated end to end. Then this baseline data is subtracted from the EEG data of the 60 seconds of video watching; the purpose of this processing is to remove the EEG noise generated spontaneously by the brain [50].

(2) Data division. The calibrated EEG data is divided into ((60 − T)/S + 1) segments for each trial of each subject. In the following experiments, we set T to 6 seconds and S to 3 seconds, referring to the experimental results of the literature [51, 52]. A sketch of steps (1) and (2) is given after this list.

(3) Feature extraction. The experimental results of [3, 8, 53] show that the differential entropy of EEG data yields higher accuracy in emotion classification, so the proposed method utilizes the differential entropy of EEG data. Section 3.2 briefly introduces the extraction process of the differential entropy of EEG data.

(4) Emotion recognition. A novel ECLGCNN is developed and used for emotion recognition in this paper. ECLGCNN contains three layers, namely, the GCNNs layer, the LSTMs layer and the Dense layer. The GCNNs layer attains graph domain and temporal information from the DE features of the EEG channels, while the LSTMs and Dense layers are used to predict low/high arousal or negative/positive valence according to the output of the GCNNs layer.
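A minimal sketch of steps (1) and (2), assuming one DEAP trial is an array of shape (32 channels, 63 s × 128 Hz), i.e. 3 s of baseline followed by 60 s of stimulus EEG. The function name and shapes are illustrative, not the authors' code.

```python
import numpy as np

FS = 128          # DEAP sampling rate (Hz)
T, S = 6, 3       # window length and stride in seconds

def calibrate_and_segment(trial: np.ndarray) -> np.ndarray:
    """trial: (32, 63 * FS) -> (19, 32, T * FS) calibrated segments."""
    baseline = trial[:, :3 * FS]                      # 3 s pre-trial baseline
    signal = trial[:, 3 * FS:]                        # 60 s of stimulus EEG
    # Copy the 3 s baseline 20 times to cover 60 s, then subtract it.
    calibrated = signal - np.tile(baseline, (1, 20))
    # Slide a T-second window with stride S: (60 - T) / S + 1 segments.
    n_seg = (60 - T) // S + 1
    return np.stack([calibrated[:, i * S * FS:(i * S + T) * FS]
                     for i in range(n_seg)])

# Example: 19 segments of shape (32, 768) per trial.
segments = calibrate_and_segment(np.random.randn(32, 63 * FS))
```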

3.2. Differential entropy

Differential entropy is an extension of Shannon entropy and is used to measure the complexity of continuous random variables. Differential entropy has a high capability to reflect changes of vigilance and is one of the most accurate and stable EEG features [18]. In several works [3, 8, 53], differential entropy is used as the EEG signal feature for emotion recognition and achieves the highest recognition accuracy compared with other features.

In this study, differential entropy is extracted from each segment to construct a feature cube, which serves as the input of the fusion model of GCNN and LSTM. The mathematical formulation of the differential entropy is

$$DE(X) = -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \log_2\!\left(\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right) dx = \frac{1}{2}\log_2(2\pi e) + \log_2(\sigma) \tag{1}$$

where $x \sim N(\mu, \sigma^2)$, and $e$ and $\pi$ are constants. In this paper, according to the characteristics of EEG signals, DE is extracted from five main bands, namely the δ band (1-3 Hz), θ band (4-7 Hz), α band (8-13 Hz), β band (14-30 Hz) and γ band (31-50 Hz).
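A minimal sketch of DE extraction per Eq. (1): for a Gaussian signal, DE reduces to (1/2)log2(2πe) + log2(σ). The text does not specify the band-pass filter, so the Butterworth design below is an assumption; function and constant names are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128  # sampling rate (Hz)
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def differential_entropy(segment: np.ndarray) -> np.ndarray:
    """segment: (channels, samples) -> DE per channel and band, (channels, 5)."""
    feats = []
    for lo, hi in BANDS.values():
        # Band-pass filter (assumed 4th-order Butterworth), then apply Eq. (1).
        b, a = butter(4, [lo / (FS / 2), hi / (FS / 2)], btype="band")
        filtered = filtfilt(b, a, segment, axis=-1)
        sigma = np.std(filtered, axis=-1)
        feats.append(0.5 * np.log2(2 * np.pi * np.e) + np.log2(sigma))
    return np.stack(feats, axis=-1)
```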

Figure 2: Architecture of ECLGCNN

3.3. Fusion model of LSTM and GCNN for emotion recognition

The overall architecture of the fusion model of LSTM and GCNN for emotion recognition is shown in Figure 2. It contains three layers, namely, the GCNNs layer, the LSTMs layer and the Dense layer. The GCNNs layer is used to calculate the relationship between two EEG channels during a period of time, the LSTMs layer memorizes the changes between two EEG channels over a certain period, and the Dense layer completes the final emotion recognition according to the output of the LSTMs layer. In the GCNNs layer, we set T GCNNs to extract the graph domain features from the DE of T seconds of EEG data. In other words, we make use of both graph domain information and time domain information to improve EEG emotion recognition. The LSTMs layer contains an input layer, a hidden layer and an output layer. We set T LSTM cells to receive the calculation results of the T GCNNs, and set the number of LSTM hidden layer cells to num_cell. The LSTMs layer is used to memorize the changes between two EEG channels within T seconds. The Dense layer is a fully connected layer used to perform data dimension transformation. At the end of the fusion model, the Dense layer attains the recognition result from the output of the LSTMs layer.
3.3.1. Design of parallel GCNNs

In order to attain the temporal information from the DE of EEG data, we design a parallel computing mode of GCNNs, that is, the GCNNs layer shown in Figure 2. The GCNNs layer receives the features of T seconds of EEG data in chronological order and outputs the calculation results to the LSTMs layer in time sequence. The GCNN provides an effective way to describe the internal relationship between different nodes of a graph, which offers a potential way to explore the relationship among multiple EEG channels in emotion recognition using EEG [3]. The structure of the i-th GCNN in the GCNNs layer is shown in Figure 3. The calculation process of the GCNN is comprised of the following two steps.

Figure 3: Graph convolutional neural network structure

(1) Graph Representation. Inspired by the successful applications of graph convolutional neural network models in image processing [28, 29, 30, 54], this paper studies the problem of multi-channel EEG emotion recognition through a graph representation method. In the proposed graph representation, each EEG channel corresponds to a node [8], the functional relationship between two channels corresponds to an edge of the graph, and the value of the edge represents the closeness of the functional relationship: the greater the value of the edge is, the closer the functional relationship of the two channels is.

The graph can be defined as $G = \{V, \varepsilon, A\}$, where $V$ is the set of $N$ nodes, $\varepsilon$ is the edge set, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the node set $V$, and $A_{i,j}$ is the functional relationship between node $i$ and node $j$. The common method for determining $A_{i,j}$ is k nearest neighbors (k-NN). The typical distance function is a Gaussian kernel, which is expressed as

$$A_{i,j} = \begin{cases} e^{-\frac{dist_{i,j}^2}{2\theta^2}}, & dist_{i,j} \le \tau \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where $\theta$ and $\tau$ are two parameters to be fixed, and $dist_{i,j}$ is the Euclidean distance between the i-th and j-th vertex nodes.
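A minimal sketch of the k-NN graph with Gaussian-kernel weights (Eq. (2)), assuming each node is described by its DE feature vector; the values of k and θ below are free parameters not fixed by the text, and the k-NN cutoff plays the role of the τ threshold.

```python
import numpy as np

def knn_adjacency(x: np.ndarray, k: int = 5, theta: float = 1.0) -> np.ndarray:
    """x: (N nodes, FN features) -> symmetric adjacency A of shape (N, N)."""
    n = x.shape[0]
    # Pairwise Euclidean distances dist[i, j] between node feature vectors.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    a = np.exp(-dist ** 2 / (2 * theta ** 2))         # Gaussian kernel, Eq. (2)
    np.fill_diagonal(a, 0.0)
    # Keep each node's k nearest neighbours; argsort index 0 is the node itself.
    keep = np.argsort(dist, axis=1)[:, 1:k + 1]
    mask = np.zeros_like(a, dtype=bool)
    mask[np.repeat(np.arange(n), k), keep.ravel()] = True
    return np.where(mask | mask.T, a, 0.0)            # symmetrized adjacency
```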
(2) Spectral Graph Filtering. Spectral graph filtering, also known as graph convolution, is a popular signal processing method for operating on graph data, where the graph Fourier transform (GFT) is a typical tool.

The symmetric normalized Laplacian matrix $L$ of graph $G$ is defined as follows:

$$L = E - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}, \quad A, D \in \mathbb{R}^{N \times N} \tag{3}$$

where $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix with $D_{i,i} = \sum_j A_{i,j}$, and $E$ is an identity matrix.

For a given spatial signal $x \in \mathbb{R}^{N \times FN}$ (FN is the number of features), its GFT is

$$\hat{x} = U^T x \tag{4}$$

where $\hat{x}$ is the transformed signal in the frequency domain and $U$ is an orthogonal matrix, which can be obtained by the singular value decomposition of the Laplacian matrix $L$ as follows:

$$L = U \Lambda U^T \tag{5}$$

The inverse GFT can be expressed as

$$x = U U^T x = U \hat{x} \tag{6}$$

The convolution of two signals $x$ and $y$ on graph $G$ is defined as

$$x *_G y = U\left((U^T x) \odot (U^T y)\right) \tag{7}$$

where $\odot$ is the Hadamard product.


g (·) is a filter function, so the signal x filtered by g (L) can be expressed as
rna


y = g (L) x = g U ΛU T x = U g (Λ) U T x (8)

where g (Λ) is expressed as follow


 
g (λ1 ) · · · 0
 
 .. .. .. 
g (Λ) =  . . .  (9)
 
Jou

0 ··· g (λN )

where λ1 , λ2 , · · · , λN are the eigenvalue of L.
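A minimal numpy sketch of Eqs. (3)-(9): build the normalized Laplacian, diagonalize it, and filter a graph signal with an arbitrary g(·). The adjacency matrix is assumed to come from the k-NN sketch above; the function names are illustrative, not the authors' code.

```python
import numpy as np

def normalized_laplacian(a: np.ndarray) -> np.ndarray:
    """Eq. (3): L = E - D^{-1/2} A D^{-1/2}."""
    d = a.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return np.eye(a.shape[0]) - d_inv_sqrt @ a @ d_inv_sqrt

def spectral_filter(a: np.ndarray, x: np.ndarray, g) -> np.ndarray:
    """Filter signal x (N, FN) on the graph with adjacency a: y = U g(Lam) U^T x."""
    lam, u = np.linalg.eigh(normalized_laplacian(a))  # L = U diag(lam) U^T, Eq. (5)
    x_hat = u.T @ x                                   # GFT, Eq. (4)
    return u @ (g(lam)[:, None] * x_hat)              # Eqs. (8)-(9)

# Example: a low-pass filter g(lambda) = exp(-lambda).
# y = spectral_filter(a, x, lambda lam: np.exp(-lam))
```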


Because the computation of g (Λ) is too expensive in practical problems,
K-order Chebyshev polynomial is adopted to approximate g (Λ) in [3], that is
K−1
X  
g (Λ) = θk Tk Λ̃ (10)
k=0

14
where $\theta_k$ is the Chebyshev polynomial coefficient and $T_k(\cdot)$ is the Chebyshev polynomial [3], computed by the recurrence

$$T_0(x) = 1, \quad T_1(x) = x, \quad T_k(x) = 2x\, T_{k-1}(x) - T_{k-2}(x), \; k \ge 2 \tag{11}$$

Combining formula (10), formula (8) can be converted into the following form:

$$y = U g(\Lambda) U^T x = U \begin{pmatrix} \sum_{k=0}^{K-1}\theta_k T_k(\lambda_1) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sum_{k=0}^{K-1}\theta_k T_k(\lambda_N) \end{pmatrix} U^T x = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\, x \tag{12}$$

where $\tilde{L} = \frac{L}{\lambda_{MAX}} - E$ and $E$ is an identity matrix.

The GCNN's calculation steps are as follows (a sketch is given after this list):

(1) Calculate the relationship between two EEG channels with k-nearest neighbors;

(2) Apply the Chebyshev recurrence to calculate $T_k(\tilde{L})\,x$;

(3) Use the Chebyshev polynomial coefficients $\theta$ as the convolution kernel to implement the convolutional operation on $(T_0(\tilde{L})x, T_1(\tilde{L})x, \cdots, T_{K-1}(\tilde{L})x)$.
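A minimal sketch of Chebyshev graph filtering (Eqs. (10)-(12)): expand the terms T_k(L̃)x with the recurrence (11) and combine them with learned coefficients θ_k. The scaling L̃ = L/λ_MAX − E follows the paper's definition; function names are illustrative.

```python
import numpy as np

def chebyshev_basis(l: np.ndarray, x: np.ndarray, k_order: int) -> list:
    """Return [T_0(L~)x, ..., T_{K-1}(L~)x] for the scaled Laplacian L~."""
    lam_max = np.linalg.eigvalsh(l).max()
    l_tilde = l / lam_max - np.eye(l.shape[0])        # L~ = L / lambda_MAX - E
    terms = [x, l_tilde @ x]                          # T_0 x and T_1 x
    for _ in range(2, k_order):
        terms.append(2 * l_tilde @ terms[-1] - terms[-2])   # recurrence, Eq. (11)
    return terms[:k_order]

def gcnn_layer(l: np.ndarray, x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """y = sum_k theta_k T_k(L~) x, with theta of shape (K,) (Eq. (12))."""
    return sum(t * term
               for t, term in zip(theta, chebyshev_basis(l, x, len(theta))))
```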

3.3.2. Architecture of LSTM

LSTM is a neural network with long and short-term memory and has become an effective and scalable model for solving several learning problems related to sequential data [32]. The purpose of the LSTMs layer in Figure 2 is to memorize the change of the relationship between two EEG channels over T seconds. The LSTMs layer consists of three sub-layers: the first is the input layer, which receives the results from the GCNNs layer; the second is the hidden layer, which memorizes the change of the relationship between EEG channels over T seconds; the last outputs the information for emotion recognition. Next, we depict the structure of an LSTM cell, which is shown in Figure 4.

Figure 4: The structure of LSTM cell

The forget gate, input gate and output gate of an LSTM cell are used to add and remove information from the cell state and are defined as follows:

$$f_t = \sigma(W_f \cdot [h_{t-1}, y_t] + b_f) \tag{13}$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, y_t] + b_i) \tag{14}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, y_t] + b_o) \tag{15}$$

The current candidate memory state of the cell is computed as

$$\tilde{c}_t = \sigma(W_c \cdot [h_{t-1}, y_t] + b_c) \tag{16}$$

The cell state is computed as

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{17}$$

The cell output is computed as

$$h_t = o_t \odot \tanh(c_t) \tag{18}$$

where $\sigma(\cdot)$ is the activation function, $h_{t-1}$ is the output of the LSTM cell at the previous step, $c_{t-1}$ is the state of the LSTM cell at the previous step, and $b_f, b_i, b_o, b_c$ are biases.
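A minimal sketch of one LSTM cell step following Eqs. (13)-(18), assuming the gate weights act on the concatenation [h_{t-1}, y_t] as in the text (note the candidate state uses σ here, as the paper writes it). The weight layout is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(y_t, h_prev, c_prev, w, b):
    """w: dict of (hidden+input, hidden) matrices; b: dict of (hidden,) biases."""
    z = np.concatenate([h_prev, y_t])                # [h_{t-1}, y_t]
    f = sigmoid(z @ w["f"] + b["f"])                 # forget gate, Eq. (13)
    i = sigmoid(z @ w["i"] + b["i"])                 # input gate, Eq. (14)
    o = sigmoid(z @ w["o"] + b["o"])                 # output gate, Eq. (15)
    c_tilde = sigmoid(z @ w["c"] + b["c"])           # candidate state, Eq. (16)
    c = f * c_prev + i * c_tilde                     # cell state, Eq. (17)
    h = o * np.tanh(c)                               # cell output, Eq. (18)
    return h, c
```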

3.3.3. Algorithm description of ECLGCNN

The loss function of the ECLGCNN model is defined in formula (19):

$$Loss = cross\_entropy(p, l) + \alpha \lVert W \rVert^2 \tag{19}$$

where $p$ is the predicted value of the model, $l$ is the label, $W$ denotes all parameters of the model, and $\alpha$ is the regularization coefficient. The cross-entropy term $cross\_entropy(p, l)$ measures the difference between the actual label and the predicted value of the model, while the regularization term $\alpha \lVert W \rVert^2$ reduces overfitting of the model's learned parameters. A sketch of this loss is given below.

The update rule of the graph convolution parameters is defined in formula (20) [25]:

$$\theta^* = \theta^* + \lambda \frac{\partial Loss}{\partial \theta^*} \tag{20}$$

where $\theta^* \in \mathbb{R}^{K \times T}$ is the matrix of Chebyshev polynomial coefficients for the T GCNNs, and $\lambda$ is the learning rate. The ECLGCNN algorithm is described in Algorithm 1.
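A minimal sketch of the loss in Eq. (19), assuming binary cross-entropy for the two-class tasks and an L2 penalty over all parameter arrays; the default α matches the value 0.0008 in Table 1, and the function name is illustrative.

```python
import numpy as np

def ecl_loss(p, l, params, alpha=8e-4):
    """p: predicted probabilities, l: 0/1 labels, params: list of weight arrays."""
    eps = 1e-12                                       # guard against log(0)
    ce = -np.mean(l * np.log(p + eps) + (1 - l) * np.log(1 - p + eps))
    l2 = sum(np.sum(w ** 2) for w in params)          # ||W||^2 over all parameters
    return ce + alpha * l2
```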
lP
Algorithm 1 The description of ECLGCNN

Input: Sample collection FS ∈ R^{n×T×CN×FN}, data labels l, Chebyshev order K, learning rate λ, maximum number of iterations MAX, stop-iteration threshold e, regularization weight α, number of LSTM hidden layer cells num_cell, number of graph convolution structures T
Output: The learned parameters of ECLGCNN

1:  Initialize θ* ∈ R^{K×T} and the other parameters to be learned in the ECLGCNN model
2:  // Calculate (T_0(L̃)x, T_1(L̃)x, ..., T_{K-1}(L̃)x) for each sample
3:  for i = 0; i < n; i++ do
4:      for j = 0; j < T; j++ do
5:          x = FS_{i,j,:,:}
6:          Calculate the adjacency matrix A of x according to k-NN and formula (2)
7:          Calculate the Laplacian matrix L of x according to formula (3)
8:          L̃ = L / λ_MAX − E
9:          Calculate T_k(L̃) of x according to formula (12)
10:         temp_{i,j} = (T_0(L̃)x, T_1(L̃)x, ..., T_{K-1}(L̃)x)
11:     end for
12: end for
13: Step_count = 0
14: while Loss > e || Step_count < MAX do
15:     y_j = sigmoid(batch_norm(temp_{i,j} ∗ θ_{·,j})) (i = 1, 2, ..., n; j = 1, 2, ..., T)
16:     // ∗ is a convolution operation
17:     Convert y_j to a column vector y_j^*
18:     y^* = (y_1^*, y_2^*, ..., y_T^*)
19:     Send y^* to the receiving cells of the LSTM
20:     Calculate Loss according to formula (19)
21:     if Loss < e then
22:         break
23:     end if
24:     Update the parameters of the LSTM based on the Loss and the BP algorithm
25:     Update the graph convolution parameters according to formula (20)
26:     Step_count = Step_count + 1
27: end while
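A minimal end-to-end sketch of the ECLGCNN forward pass for one sample, reusing the illustrative helpers defined in the sketches above (knn_adjacency, normalized_laplacian, gcnn_layer, lstm_step). The Dense-layer weights (w_d, b_d) and the omission of batch normalization are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def eclgcnn_forward(sample, theta, lstm_w, lstm_b, w_d, b_d, num_cell=30):
    """sample: (T, CN, FN) feature cube -> probability of the high/positive class."""
    h = np.zeros(num_cell)
    c = np.zeros(num_cell)
    for j, x in enumerate(sample):                    # one GCNN per time step
        a = knn_adjacency(x)                          # graph for time step j, Eq. (2)
        l = normalized_laplacian(a)                   # Eq. (3)
        g = gcnn_layer(l, x, theta[:, j])             # Chebyshev filtering, Eq. (12)
        y = 1.0 / (1.0 + np.exp(-g))                  # sigmoid GCNN activation
        h, c = lstm_step(y.ravel(), h, c, lstm_w, lstm_b)   # LSTMs layer
    logits = h @ w_d + b_d                            # Dense layer
    return 1.0 / (1.0 + np.exp(-logits))
```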
4. Experiments and discussion

4.1. DEAP

The experiments in this paper are based on the multimodal DEAP dataset. DEAP is a large open-source dataset that contains multiple physiological signals with sentiment evaluations. In its data collection experiments, evoked EEG, ECG, EMG and other bioelectric signals were detected and recorded, and 32 subjects (16 males and 16 females) each took part in 40 trials of music videos with different emotional tendencies, where each music video lasted 1 minute. After watching the music videos, participants rated the videos on a scale of 1-9 for arousal, valence, liking, dominance and familiarity; larger scores indicate stronger levels of each indicator.

In this paper, we used the 32-channel EEG data in the dataset; that is, only EEG data is used. Eye-movement artifacts, muscle artifacts and power-supply noise were removed from the EEG data, and the sampling rate was adjusted to 128 Hz. The duration of each EEG signal is 63 seconds, including 3 seconds of pre-trial baseline data and 60 seconds of watching the emotional video. For subject-dependent experiments, we used the dataset of each subject to validate ECLGCNN. In order to verify the model's generalization, the data of all subjects were also collected into one sample set to train and verify ECLGCNN. We defined the labels of the DEAP EEG data as follows: arousal/valence with a self-score greater than 5 is high arousal/positive valence; otherwise it is low arousal/negative valence (a labeling sketch is given below). Sections 4.3 and 4.4 discuss the subject-dependent and subject-independent experiments, respectively.
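A minimal sketch of the label definition: self-assessment scores in [1, 9] are binarized at 5 (score > 5 means high arousal / positive valence). The function name and the column order are illustrative assumptions.

```python
import numpy as np

def binarize_labels(ratings: np.ndarray) -> np.ndarray:
    """ratings: (trials, 2), columns = (valence, arousal) -> 0/1 labels."""
    return (ratings > 5).astype(int)

# Example: valence 6.2 -> positive (1), arousal 3.9 -> low (0).
print(binarize_labels(np.array([[6.2, 3.9]])))        # [[1 0]]
```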

4.2. Evaluation indices

The classification accuracy Acc and the F-score are used to evaluate the ECLGCNN model (a sketch of these indices is given below). Acc is expressed as

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{21}$$

where TP is the number of samples that the classification model accurately identifies as low arousal/negative valence (the positive examples); TN is the number of samples that the classification model accurately identifies as high arousal/positive valence (the negative examples); FP is the number of misclassified negative examples; and FN is the number of misclassified positive examples.

The precision Pre is defined as

$$Pre = \frac{TP}{TP + FP} \tag{22}$$

The recall rate Rec is defined as

$$Rec = \frac{TP}{TP + FN} \tag{23}$$

The F-score is an extension of the classification accuracy that combines the precision and the recall rate; it is defined as

$$F\text{-}score = \frac{2 \times Pre \times Rec}{Pre + Rec} \tag{24}$$
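A minimal sketch of Eqs. (21)-(24), treating low arousal / negative valence as the positive class as defined in the text, and assuming non-degenerate counts (no division by zero).

```python
import numpy as np

def metrics(pred: np.ndarray, true: np.ndarray):
    tp = np.sum((pred == 1) & (true == 1))
    tn = np.sum((pred == 0) & (true == 0))
    fp = np.sum((pred == 1) & (true == 0))
    fn = np.sum((pred == 0) & (true == 1))
    acc = (tp + tn) / (tp + tn + fp + fn)             # Eq. (21)
    pre = tp / (tp + fp)                              # Eq. (22)
    rec = tp / (tp + fn)                              # Eq. (23)
    f_score = 2 * pre * rec / (pre + rec)             # Eq. (24)
    return acc, pre, rec, f_score
```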

4.3. Subject-dependent experiments on DEAP

In this paper, each subject's EEG data is divided into ((60 − T)/S + 1) × 40 samples. For example, when T is set to 6 and S is set to 3, each trial yields (60 − 6)/3 + 1 = 19 segments, so 19 × 40 = 760 samples are generated per subject, and 3 repetitions of 5-fold cross-validation with a random strategy were adopted to verify ECLGCNN (a sketch of this protocol is given below).
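A minimal sketch of the evaluation protocol: 3 repetitions of randomized 5-fold cross-validation over one subject's 760 samples. The use of scikit-learn's KFold and the seeding scheme are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

samples = np.arange(760)                              # 19 segments x 40 trials
for rep in range(3):
    kf = KFold(n_splits=5, shuffle=True, random_state=rep)
    for train_idx, test_idx in kf.split(samples):
        pass  # train ECLGCNN on train_idx, evaluate on test_idx
```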
We selected subject No. 16's data to choose the model parameters, because the numbers of positive and negative samples of subject No. 16 are equal in the binary classification of arousal. We then explored the influence of the Chebyshev polynomial order (K) and the number of LSTM hidden layer cells (num_cell) on emotion classification with the ECLGCNN model; the influence of K and num_cell is shown in Figure 5. We found that, compared with num_cell, the value of K has the greater influence on ECLGCNN; when num_cell is 30 and K is 2, ECLGCNN reaches its highest accuracy in the binary classification of arousal. Therefore, we set num_cell to 30 and K to 2 in the following experiments. The parameter settings of ECLGCNN are listed in Table 1.
Figure 5: Experimental values of num_cell and K
Table 1: Parameter settings

ECLGCNN model parameters                             Values
The number of GCNNs (T)                              6
The number of Chebyshev coefficients (K)             2
The number of LSTM hidden layer cells (num_cell)     30
GCNN activation function type                        sigmoid
LSTM activation function type                        sigmoid
The number of nodes in the graph                     32
Maximum number of model iterations (MAX)             100000
Model error threshold (e)                            0.1
Model learning rate (λ)                              0.003
Model regularization coefficient (α)                 0.0008

Experiments were conducted based on the above parameters, and the emotion classification results of the ECLGCNN model for each subject are shown in Table 2.
Table 2: ECLGCNN classification results

          Binary classification of valence    Binary classification of arousal
Subject   Accuracy(%)   F-score(%)            Accuracy(%)   F-score(%)
01        93.42         93.13                 94.21         95.29
02        87.73         88.88                 86.84         89.06
03        93.24         93.76                 94.12         84.35
04        90.65         88.49                 87.46         84.13
05        88.90         90.76                 87.85         87.07
06        89.52         93.08                 89.13         87.43
07        94.34         95.97                 89.87         92.04
08        91.79         92.52                 88.64         90.26
09        91.84         91.80                 88.99         90.74
10        93.90         93.81                 90.92         92.08
11        80.79         84.33                 86.58         82.14
12        87.53         87.86                 89.65         93.80
13        89.65         87.46                 92.32         95.47
14        89.82         89.53                 88.16         91.33
15        94.43         94.36                 91.88         91.24
16        92.76         90.17                 94.52         94.42
17        82.62         83.65                 88.73         90.60
18        93.11         94.18                 92.37         93.95
19        92.94         93.92                 92.11         94.18
20        90.48         91.51                 93.68         95.95
21        91.36         91.75                 94.35         96.52
22        91.45         90.72                 88.59         90.54
23        93.47         94.99                 93.55         87.03
24        89.43         88.17                 92.94         95.67
25        87.50         86.86                 91.93         94.42
26        89.30         91.81                 89.61         87.51
27        89.22         91.75                 91.01         93.27
28        91.01         92.74                 85.00         83.08
29        92.67         93.71                 94.34         95.42
30        92.28         94.33                 91.41         90.84
31        88.78         90.24                 88.63         87.73
32        88.33         88.31                 89.74         92.47
Average   90.45         91.08                 90.60         90.94

From Table 2, the experimental results indicate that the minimum, maximum and average classification accuracies of ECLGCNN over the 32 subjects are 80.79%, 94.43% and 90.45%, respectively, in the binary classification of valence. For the binary classification of arousal, the minimum, maximum and average classification accuracies of ECLGCNN over the 32 subjects are 85.00%, 94.52% and 90.60%, respectively. Moreover, the average F-score exceeds 90% in both classification tasks.

In the following, the classification results of support vector classification (SVC) [55], decision tree (DT) [56] and random forest (RF) [57] are compared with ECLGCNN using the same features. The comparison results are shown in Figures 6, 7, 8 and 9, respectively.

From Figure 6(a) and Figure 6(b), the classification accuracy and F-score of ECLGCNN are relatively stable in the binary classification of arousal, while the classification accuracy and F-score of SVC are unstable compared with ECLGCNN, RF and DT. The reason for this phenomenon is that SVC fails to find an optimal classification surface. The classification accuracy and F-score of DT are close to those of RF, because the strategies of generating decision trees for RF and DT are similar. ECLGCNN, however, makes use of temporal and graph features from the DE of EEG data to find a better classification surface than SVC, DT and RF. On the whole, this shows that our proposed model is effective in the binary classification of arousal.
420 surface compared with SVC, DT and RF. On the whole, this shows that our
proposed model is effective in binary classification of arousal.

23
Journal Pre-proof

of
1

0.9

0.8

0.7

0.6
Accuracy

pro
0.5

0.4
ECLGCNN
0.3 SVC
DT
0.2 RF

0.1

0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Subject

(a) Accuracy of the four classifiers on 32 subjects’ high/low arousal

0.9

0.8

0.7

0.6
re-
F-score

0.5

0.4
ECLGCNN
0.3 SVC
DT
0.2 RF

0.1
lP
0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Subject

(b) F-score of the four classifiers on 32 subjects’ high/low arousal

Figure 6: Comparison of ECLGCNN, SVC, DF and RF in binary classification of arousal


rna

1.2
ECLGCNN
1.1 SVC
DT
1 RF
0.9060 0.9094
0.9
0.8131 0.8060 0.8211 0.8321 0.8152 0.8101
0.8
0.7
0.6
0.5
Jou

0.4
0.3
0.2
0.1
0
Accuracy F-score

Figure 7: Average classification result of the four classifiers on 32 subjects’ high/low arousal

As shown in Figure 7, the average classification accuracy and F-score of ECLGCNN are the highest among the four classifiers in the binary classification of arousal. The average F-score of SVC is higher than those of DT and RF, while the average classification accuracy of SVC is the lowest; the reason for this is that the classification accuracy and F-score of SVC are unstable across the 32 subjects. The average classification accuracy and F-score of DT are close to those of RF, because RF is an extension of DT. The average classification accuracy of ECLGCNN is at least 8.49% higher than the other three classifiers in the binary classification of arousal. In summary, from Figure 6 and Figure 7, ECLGCNN is effective in the binary classification of arousal.

Next, we show the comparison results of ECLGCNN, SVC, DT and RF in the binary classification of valence in Figure 8 and Figure 9.
Figure 8: Comparison of ECLGCNN, SVC, DT and RF in binary classification of valence. (a) Accuracy of the four classifiers on 32 subjects' high/low valence; (b) F-score of the four classifiers on 32 subjects' high/low valence.

From Figure 8(a) and Figure 8(b), the classification accuracy and F-score of ECLGCNN are relatively stable in the binary classification of valence. The classification accuracy of ECLGCNN for subject 22 is lower than that of SVC, and the F-score of ECLGCNN for subjects 12, 13, 22 and 24 is lower than that of SVC. These results show that, among the baselines, SVC is the best at DE-based binary classification of valence compared with DT and RF, but it still falls below ECLGCNN. The classification accuracy and F-score of DT are the lowest in the binary classification of valence compared with ECLGCNN, SVC and RF.

Figure 9: Average classification result of the four classifiers on 32 subjects' high/low valence (ECLGCNN: accuracy 0.9045, F-score 0.9108).

As shown in Figure 9, the average classification accuracy and F-score of ECLGCNN are the highest in the binary classification of valence compared with SVC, DT and RF. Comparing Figure 7 and Figure 9, SVC performs better in the binary classification of valence than arousal, while RF shows the opposite behavior. The classification results of ECLGCNN and DT in the binary classification of valence are close to those in the binary classification of arousal. The average classification accuracy of ECLGCNN is at least 7.09% higher than the other classifiers in the binary classification of valence. In summary, from Figure 8 and Figure 9, ECLGCNN is effective in the binary classification of valence.

Meanwhile, we compare ERDL with other methods, and the comparison results are shown in Figure 10. The experimental results show that our method is the most effective of the three. Our method is 5% higher than Salma's method [35] in the binary classification of valence and 4.95% higher in the binary classification of arousal; they designed a multi-layer LSTM framework for emotion recognition. Our method is also 5.25% higher than Liu's method [41] in the binary classification of valence and 10.10% higher in the binary classification of arousal; they combined a bimodal deep autoencoder and SVM to recognize emotions. In addition, we compare ERDL with He's method [45]; our experimental results are at least 3.55% higher than theirs. Their experiments were conducted on positive emotion (valence >5 and arousal >5) versus negative emotion (valence <5 and arousal <5), whereas our experiments were conducted on positive (valence >5) versus negative (valence <5) valence and high (arousal >5) versus low (arousal <5) arousal. On the whole, our experimental results are higher than theirs.
lP
Figure 10: Comparison with other methods in subject-dependent experiments (accuracy; ERDL: 0.9045 valence, 0.9060 arousal; Salma et al.: 0.8545 valence, 0.8565 arousal; Liu et al.: 0.8520 valence, 0.8050 arousal).

From the above experimental results, we may conclude that ERDL is the most effective method in the subject-dependent experiments, which we attribute to the fusion of GCNN and LSTM.

Table 3: Parameter settings

ECLGCNN model parameters                             Values
The number of GCNNs (T)                              6
The number of Chebyshev coefficients (K)             10
The number of LSTM hidden layer cells (num_cell)     150
GCNN activation function type                        sigmoid
LSTM activation function type                        sigmoid
The number of nodes in the graph                     32
Maximum number of model iterations (MAX)             100000
Model error threshold (e)                            0.12
Model learning rate (λ)                              0.003
Model regularization coefficient (α)                 0.00008

4.4. Subject-independent experiments on DEAP

In this section, we synthesize the 32 subjects' data into one dataset containing 24320 (32 × 760) samples. We again adopt 3 repetitions of 5-fold cross-validation with a random strategy to verify ECLGCNN. The purpose of the experiments in this section is to analyze whether ECLGCNN is effective in reducing the differences among subjects. In order to reduce the differences among subjects' EEG, we expanded K and num_cell by 5 times each; the remaining parameters are shown in Table 3. Experiments were conducted based on the parameters in Table 3, and the emotion classification results of the ECLGCNN model for all subjects were obtained. For the classifier comparison, we compared SVC, DT and RF with ECLGCNN using the same features. The comparison results are shown in Figure 11 and Figure 12.

From Figure 11 and Figure 12, the average classification accuracy and F-score of ECLGCNN are the highest in the binary classification of both arousal and valence. The average classification accuracy of ECLGCNN is at least 7.74% higher than the other three classifiers in the binary classification of arousal and at least 8.29% higher in the binary classification of valence. The average classification accuracy of SVC is the lowest in the binary classification of arousal and valence compared with the other three classifiers, because SVC is sensitive to the choice of parameters and kernel functions and fails to find the optimal parameters. Therefore, ECLGCNN is effective in subject-independent emotion classification.

Figure 11: Average classification result of four classifiers for low/high arousal in subject-independent experiments (ECLGCNN: accuracy 0.8527, F-score 0.8713).

Figure 12: Average classification result of four classifiers for positive/negative valence in subject-independent experiments (ECLGCNN: accuracy 0.8481, F-score 0.8621).

We also compare with other state-of-the-art emotion recognition methods in the literature, and the comparison results are shown in Figure 13. The classification accuracy of the ERDL method is the highest among the compared emotion recognition methods. ERDL is at least 3.4% higher than the other methods in the binary classification of valence and at least 3.51% higher in the binary classification of arousal. Specifically, ERDL is at least 3.4% higher than Tripathi's method [4], which made use of the time domain features of EEG signals and a deep learning model. ERDL is at least 11.15% higher than Li's method [5], which fused CNN and LSTM. ERDL is at least 3.71% higher than Xing's method [39], which proposed a SAE+LSTM classification model. ERDL is at least 11.97% higher than Wang's method [40], which proposed a 3D CNN for emotion classification. ERDL is at least 10.27% higher than Mert's method [42], which extracted PSD, entropy and other features from EEG data, processed them with ICA and fed them into an artificial neural network. ERDL is at least 11.81% higher than Thammasan's method [43], which extracted the fractal dimension and power spectral density from EEG data and fed the extracted features into an SVM. And ERDL is at least 3.51% higher than Zhang's method [44], which decomposed the EEG signal into four bands (theta, alpha, beta and gamma) and used the FFT to calculate the power as EEG features that were input to a PNN. In addition, we compare ERDL with Chen's method [38]; our experimental results are at least 16.18% higher than theirs. Their method focuses on EEG channel selection and the accuracy of emotion classification between different genders, whereas we ignore the influence of gender on emotion recognition in our research; this shows that our method is more universal than theirs.

In summary, ERDL makes use of temporal and graph features from EEG data to achieve good classification results, and the nonlinear cells of ECLGCNN make it much more powerful at feature representation and learning.

Figure 13: Comparison with other methods in subject-independent experiments (accuracy; ERDL: 0.8481 valence, 0.8527 arousal).

4.5. Summary

The differences among subjects' EEG are not considered in the subject-dependent experiments, but they are considered in the subject-independent experiments. The experimental results confirmed this, as shown in Figure 9 and Figure 13: the classification accuracies of the subject-dependent experiments are at least 5% higher than those of the subject-independent experiments. The purpose of the subject-independent experiments is to verify the generalization of the proposed model. The above experimental results show that the proposed method is the most effective in both the subject-dependent and the subject-independent experiments, which further shows that ECLGCNN is an effective emotion classification model.

5. Conclusion and future work

In this paper, we propose a new emotion recognition method using a deep learning model based on EEG differential entropy, which adopts a novel fusion model of GCNN and LSTM for emotion classification. ECLGCNN utilizes graph and temporal information: each EEG channel corresponds to a graph node, the functional relationship between two channels corresponds to an edge of the graph, and the gates of the LSTM cells are used to extract effective information. Both subject-dependent and subject-independent experiments were conducted on DEAP, and the experimental results indicate that ERDL achieves better recognition accuracy than state-of-the-art methods such as CNN and RNN [5], LSTM [35], SAE+LSTM [39], EmotioNet [40], SVM [41], ANN [42] and PNN [44].

Furthermore, the average classification accuracy of ECLGCNN reaches 90.52% in the subject-dependent experiments and 85.04% in the subject-independent experiments. The better classification accuracy of ECLGCNN owes to the following mechanisms:

• The nonlinear cells of ECLGCNN render it much more powerful at feature representation and learning;

• ECLGCNN simultaneously extracts and combines the temporal and graph features for emotion classification.

Although better emotion classification accuracy is obtained in the above experiments, we only explored the effectiveness of ECLGCNN on the binary classification of emotions (low/high arousal or negative/positive valence). In future work, we will extend ECLGCNN to a multi-class classifier to distinguish emotional states more finely. At the same time, the graph representation and new features will be elaborated so as to represent the intrinsic relationships of the EEG channels and reduce computing complexity.

Acknowledgments

We are grateful for the support of the National Natural Science Foundation of China (91846205, 61373149), the National Key R&D Program (2017YFB1400102, 2016YFB1000602), and SDNSFC (No. ZR2017ZB0420).

References

[1] S. M. Alarcao, M. J. Fonseca, Emotions recognition using EEG signals: A survey, IEEE Transactions on Affective Computing 10 (3) (2019) 374-393. doi:10.1109/TAFFC.2017.2714671.

[2] M. Hamalainen, R. Hari, R. J. Ilmoniemi, J. Knuutila, O. V. Lounasmaa, Magnetoencephalography: theory, instrumentation, and applications to noninvasive studies of the working human brain, Reviews of Modern Physics 65 (2) (1993) 413-497. doi:10.1103/RevModPhys.65.413.

[3] X. Wang, T. Zhang, X. Xu, L. Chen, X. Xing, C. L. P. Chen, EEG emotion recognition using dynamical graph convolutional neural networks and broad learning system, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1240-1244. doi:10.1109/bibm.2018.8621147.

[4] S. Tripathi, S. Acharya, R. D. Sharma, S. Mittal, S. Bhattacharya, Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset, in: S. P. Singh, S. Markovitch (Eds.), Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, AAAI Press, 2017, pp. 4746-4752. URL http://aaai.org/ocs/index.php/IAAI/IAAI17/paper/view/15007

[5] X. Li, D. Song, P. Zhang, G. Yu, Y. Hou, B. Hu, Emotion recognition from multi-channel EEG data through convolutional recurrent neural network, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016, pp. 352-359. doi:10.1109/bibm.2016.7822545.

[6] T. Wilaiprasitporn, A. Ditthapron, K. Matchaparn, T. Tongbuasirilai, N. Banluesombatkul, E. Chuangsuwanich, Affective EEG-based person identification using the deep learning approach, IEEE Transactions on Cognitive and Developmental Systems (2020) 1-1. doi:10.1109/TCDS.2019.2924648.

[7] N. Thammasan, K. Fukui, M. Numao, Application of deep belief networks in EEG-based dynamic music-emotion recognition, in: 2016 International Joint Conference on Neural Networks (IJCNN), 2016, pp. 881-888. doi:10.1109/IJCNN.2016.7727292.

[8] T. Song, W. Zheng, P. Song, Z. Cui, EEG emotion recognition using dynamical graph convolutional neural networks, IEEE Transactions on Affective Computing (2019) 1-1. doi:10.1109/TAFFC.2018.2817622.

[9] T. Musha, Y. Terasaki, H. A. Haque, G. A. Ivamitsky, Feature extraction from EEGs associated with emotions, Artificial Life and Robotics 1 (1) (1997) 15-19. doi:10.1007/BF02471106.

[10] L. I. Aftanas, N. V. Reva, A. A. Varlamov, S. V. Pavlov, V. P. Makhnev, Analysis of evoked EEG synchronization and desynchronization in conditions of emotional activation in humans: temporal and topographic characteristics, Neuroscience and Behavioral Physiology 34 (8) (2004) 859-867. doi:10.1023/B:NEAB.0000038139.39812.eb.

[11] M. Kim, M. Kim, E. Oh, S. Kim, A review on the computational methods for emotional state estimation from the human EEG, Computational and Mathematical Methods in Medicine 2013 (2013) 573734. doi:10.1155/2013/573734.

[12] B. Hjorth, EEG analysis based on time domain properties, Electroencephalography and Clinical Neurophysiology 29 (3) (1970) 306-310. doi:10.1016/0013-4694(70)90143-4.

[13] Y. Liu, O. Sourina, Real-time fractal-based valence level recognition from EEG, in: Transactions on Computational Science XVIII, Vol. 18, Springer Berlin Heidelberg, 2013, pp. 101-120. doi:10.1007/978-3-642-38803-3_6.

[14] P. C. Petrantonakis, L. J. Hadjileontiadis, Emotion recognition from EEG using higher order crossings, IEEE Transactions on Information Technology in Biomedicine 14 (2) (2010) 186-197. doi:10.1109/TITB.2009.2034649.

[15] R. J. Davidson, What does the prefrontal cortex "do" in affect: perspectives on frontal EEG asymmetry research, Biological Psychology 67 (1) (2004) 219-234. doi:10.1016/j.biopsycho.2004.03.008.

[16] M. Li, B. Lu, Emotion classification based on gamma-band EEG, in: 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009, pp. 1223-1226. doi:10.1109/IEMBS.2009.5334139.

[17] D. Nie, X. Wang, L. Shi, B. Lu, EEG-based emotion recognition during watching movies, in: 2011 5th International IEEE/EMBS Conference on Neural Engineering, 2011, pp. 667-670. doi:10.1109/NER.2011.5910636.

[18] L. Shi, Y. Jiao, B. Lu, Differential entropy feature for EEG-based vigilance estimation, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2013, pp. 6627-6630. doi:10.1109/EMBC.2013.6611075.

[19] W. Zheng, J. Zhu, Y. Peng, B. Lu, EEG-based emotion classification using deep belief networks, in: 2014 IEEE International Conference on Multimedia and Expo (ICME), 2014, pp. 1-6. doi:10.1109/ICME.2014.6890166.

[20] O. Lin, G. Liu, J. Yang, Y. Du, Neurophysiological markers of identifying regret by 64 channels EEG signal, in: 2015 12th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), 2015, pp. 395-399. doi:10.1109/ICCWAMTIP.2015.7494017.

[21] Y. Shi, X. Zheng, T. Li, Unconscious emotion recognition based on multi-scale sample entropy, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1221-1226. doi:10.1109/bibm.2018.8621185.

35
Journal Pre-proof

[22] C. A. Frantzidis, C. Bratsas, C. Papadelis, E. I. Konstantinidis, C. Pappas,

of
650 P. D. Bamidis, Toward emotion aware computing: An integrated approach
using multichannel neurophysiological recordings and affective visual stim-
uli, IEEE Transactions on Information Technology in Biomedicine 14 (3)

pro
(2010) 589–597. doi:10.1109/TITB.2010.2041553.

[23] Y. Lin, C. Wang, T. Jung, T. Wu, S. Jeng, J. Duann, J. Chen, Eeg-based


655 emotion recognition in music listening, IEEE Transactions on Biomedical
Engineering 57 (7) (2010) 1798–1806. doi:10.1109/TBME.2010.2048568.

[24] F. Sbargoud, M. Djeha, M. Guiatni, N. Ababou, Wpt-ann and belief theory


re-
based eeg/emg data fusion for movement identification, Traitement Du
Signal 36 (5) (2019) 383–391. doi:10.18280/ts.360502.

660 [25] M. Defferrard, X. Bresson, P. Vandergheynst, Convolutional neural net-


works on graphs with fast localized spectral filtering, Advances in Neural
Information Processing Systems 29 (2016) (2016) 3844–3852.
lP
URL http://infoscience.epfl.ch/record/218985

[26] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The


665 graph neural network model, IEEE Transactions on Neural Networks 20 (1)
(2009) 61–80. doi:10.1109/TNN.2008.2005605.
rna

[27] F. P. Such, S. Sah, M. Dominguez, S. Pillai, C. Zhang, A. M. Michael, N. D.


Cahill, R. Ptucha, Robust spatial filtering with graph convolutional neural
networks, IEEE Journal of Selected Topics in Signal Processing 11 (6)
670 (2017) 884–896. doi:10.1109/JSTSP.2017.2726981.

[28] H. Zhu, N. Lin, H. Leung, R. Leung, S. Theodoidis, Target classifi-


Jou

cation from sar imagery based on the pixel grayscale decline by graph
convolutional neural network, IEEE Sensors Letters 4 (6) (2020) 1–4.
doi:10.1109/LSENS.2020.2995060.

675 [29] R. Levie, F. Monti, X. Bresson, M. M. Bronstein, Cayleynets: Graph con-


volutional neural networks with complex rational spectral filters, IEEE

36
Journal Pre-proof

Transactions on Signal Processing 67 (1) (2019) 97–109. doi:10.1109/

of
TSP.2018.2879624.

[30] V. Diego, F. Giulia, M. Enrico, Image denoising with graph-convolutional


680 neural networks, in: 2019 IEEE International Conference on Image Pro-

pro
cessing, ICIP 2019, Taipei, Taiwan, September 22-25, 2019, IEEE, 2019,
pp. 2399–2403. doi:10.1109/ICIP.2019.8803367.

[31] Y. Yu, X. Si, C. Hu, J. Zhang, A review of recurrent neural networks:


Lstm cells and network architectures, Neural Computation 31 (7) (2019)
685 1235–1270. doi:10.1162/neco_a_01199.
re-
[32] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, J. Schmidhuber,
Lstm: A search space odyssey, IEEE Transactions on Neural Networks
28 (10) (2017) 2222–2232. doi:10.1109/TNNLS.2016.2582924.

[33] Y. Yang, J. Zhou, J. Ai, Y. Bin, A. Hanjalic, H. Shen, Y. Ji, Video caption-
lP
690 ing by adversarial lstm, IEEE Transactions on Image Processing 27 (11)
(2018) 5600–5611. doi:10.1109/TIP.2018.2855422.

[34] X. Yu, W. Rong, J. Liu, D. Zhou, Y. Ouyang, Z. Xiong, Lstm-based end-to-


end framework for biomedical event extraction, IEEE/ACM Transactions
rna

on Computational Biology and Bioinformatics (2019) 1–1doi:10.1109/


695 tcbb.2019.2916346.

[35] S. Alhagry, A. A. Fahmy, R. A. Elkhoribi, Emotion recognition based on


eeg using lstm recurrent neural network, International Journal of Advanced
Computer Science and Applications 8 (10) (2017). doi:http://dx.doi.
Jou

org/10.14569/IJACSA.2017.081046.

700 [36] S. Koelstra, C. Muhl, M. Soleymani, J. Lee, A. Yazdani, T. Ebrahimi,


T. Pun, A. Nijholt, I. Patras, Deap: A database for emotion analysis ;us-
ing physiological signals, IEEE Transactions on Affective Computing 3 (1)
(2012) 18–31. doi:10.1109/T-AFFC.2011.15.

37
Journal Pre-proof

[37] Y. Liu, O. Sourina, Eeg-based subject-dependent emotion recognition al-

of
705 gorithm using fractal dimension, in: 2014 IEEE International Confer-
ence on Systems, Man, and Cybernetics (SMC), 2014, pp. 3166–3171.
doi:10.1109/SMC.2014.6974415.

pro
[38] J. Chen, B. Hu, P. Moore, X. Zhang, X. Ma, Electroencephalogram-based
emotion assessment system using ontology and data mining techniques,
710 Applied Soft Computing 30 (2015) 663–674. doi:10.1016/j.asoc.2015.
01.007.

[39] X. Xing, Z. Li, T. Xu, L. Shu, B. Hu, X. Xu, Sae+lstm: A new framework
re-
for emotion recognition from multi-channel eeg, Frontiers in Neurorobotics
13 (2019) 37. doi:10.3389/fnbot.2019.00037.

715 [40] Y. Wang, Z. Huang, B. Mccane, P. Neo, Emotionet: A 3-d convolutional


neural network for eeg-based emotion recognition, in: 2018 International
lP
Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–7. doi:10.
1109/IJCNN.2018.8489715.

[41] W. Liu, W. Zheng, B. Lu, Emotion recognition using multimodal


720 deep learning, in: Neural Information Processing, Springer Interna-
tional Publishing, 2016, pp. 521–529. doi:https://doi.org/10.1007/
rna

978-3-319-46672-9_58.

[42] A. Mert, A. Akan, Emotion recognition from eeg signals by using multi-
variate empirical mode decomposition, Pattern Analysis and Applications
725 21 (1) (2018) 81–89. doi:10.1007/s10044-016-0567-6.

[43] N. Thammasan, K. Moriyama, K. Fukui, M. Numao, Familiarity effects


Jou

in eeg-based emotion recognition, Brain Informatics 4 (1) (2017) 39–50.


doi:10.1007/s40708-016-0051-5.

[44] J. Zhang, M. Chen, S. Hu, Y. Cao, R. Kozma, Pnn for eeg-based emotion
730 recognition, in: 2016 IEEE International Conference on Systems, Man, and

38
Journal Pre-proof

Cybernetics (SMC), 2016, pp. 002319–002323. doi:10.1109/smc.2016.

of
7844584.

[45] H. He, Y. Tan, J. Ying, W. Zhang, Strengthen eeg-based emotion recogni-


tion using firefly integrated optimization algorithm, Applied Soft Comput-

pro
735 ing (2020) 106426doi:10.1016/j.asoc.2020.106426.

[46] Y. Liu, M. Yu, G. Zhao, J. Song, Y. Ge, Y. Shi, Real-time movie-induced


discrete emotion recognition from eeg signals, IEEE Transactions on Affec-
tive Computing 9 (4) (2018) 550–562. doi:10.1109/TAFFC.2017.2660485.

[47] F. Zhou, S. Kong, C. C. Fowlkes, T. Chen, B. Lei, Fine-grained facial


740
re-
expression analysis using dimensional emotion model, Neurocomputing 392
(2020) 38–49. doi:10.1016/j.neucom.2020.01.067.

[48] R. Fourati, B. Ammar, J. J. Sanchezmedina, A. M. Alimi, Unsuper-


vised learning in reservoir computing for eeg-based emotion recogni-
lP
tion, IEEE Transactions on Affective Computing (2020) 1–1doi:10.1109/
745 taffc.2020.2982143.

[49] H. Becker, J. Fleureau, P. Guillotel, F. Wendling, I. Merlet, L. Albera,


Emotion recognition based on high-resolution EEG recordings and recon-
rna

structed brain sources, IEEE Transactions on Affective Computing 11 (2)


(2020) 244–257. doi:10.1109/TAFFC.2017.2768030.

750 [50] J. Chen, P. W. Zhang, Z. J. Mao, Y. F. Huang, D. Jiang, Y. Zhang,


Accurate eeg-based emotion recognition on combined features using deep
convolutional neural networks, IEEE Access 7 (2019) 44317–44328. doi:
Jou

10.1109/ACCESS.2019.2908285.

[51] H. Candra, M. Yuwono, R. Chai, A. Handojoseno, I. Elamvazuthi, H. T.


755 Nguyen, S. W. Su, Investigation of window size in classification of eeg-
emotion signal with wavelet entropy and support vector machine, in: 2015
37th Annual International Conference of the IEEE Engineering in Medicine

39
Journal Pre-proof

and Biology Society (EMBC), 2015, pp. 7250–7253. doi:10.1109/EMBC.

of
2015.7320065.

760 [52] Z. Lan, G. Mullerputz, L. Wang, Y. Liu, O. Sourina, R. Scherer, Using


support vector regression to estimate valence level from eeg, in: Systems,

pro
Man, and Cybernetics (SMC), 2016 IEEE International Conference on,
2016, pp. 002558–002563. doi:10.1109/SMC.2016.7844624.

[53] J. Li, Z. Zhang, H. He, Hierarchical convolutional neural networks for eeg-
765 based emotion recognition, Cognitive Computation 10 (2) (2018) 368–380.
doi:10.1007/s12559-017-9533-x.
re-
[54] S. Fu, W. Liu, S. Li, Y. Zhou, Two-order graph convolutional networks for
semi-supervised classification, Iet Image Processing 13 (14) (2019) 2763–
2771. doi:10.1049/iet-ipr.2018.6224.

770 [55] C. Chang, C. Lin, LIBSVM: A library for support vector machines,
lP
ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 27:1–27:27. doi:10.1145/
1961189.1961199.

[56] S. R. Safavian, D. A. Landgrebe, A survey of decision tree classifier method-


ology, IEEE Transactions on Systems, Man, and Cybernetics 21 (3) (1991)
rna

775 660–674. doi:10.1109/21.97458.

[57] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32. doi:https:
//doi.org/10.1023/A:1010933404324.
Jou

40
Journal Pre-proof
The highlights of our paper are as follows:

• A new emotion recognition method using a deep learning model based on the differential entropy of EEG is proposed. In contrast to traditional emotion recognition methods, multiple GCNN structures are utilized to extract graph-domain information from successive segments, LSTM is integrated to memorize the change of the relationship between two EEG channels within a specific time, and the fusion of GCNN and LSTM improves the effectiveness of emotion recognition.

• A fusion model of LSTM and GCNN for emotion classification (named ECLGCNN) is proposed, which utilizes both graph-domain and temporal information. In the fusion model, each EEG channel corresponds to a vertex of the graph, and the functional relationship between two channels corresponds to an edge: the greater the edge weight, the closer the functional relationship between the two channels. The gates of the LSTM cells extract effective information from their input (the output of the GCNNs) for emotion classification; a minimal sketch of this architecture follows this list.

• Extensive experiments on the DEAP dataset are conducted to verify the ECLGCNN model, and the experimental results demonstrate that the proposed method achieves better emotion classification performance than state-of-the-art methods. The average accuracy reaches 90.45% and 90.60% for valence and arousal in subject-dependent experiments, and 84.81% and 85.27% in subject-independent experiments.
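For concreteness, the following is a minimal sketch of the ECLGCNN idea, not the authors' implementation. It assumes PyTorch; a simple one-hop graph convolution over a learnable adjacency matrix stands in for the paper's spectral GCNN, and the class name ECLGCNNSketch, the layer sizes, and the default shapes (32 channels with a per-channel feature vector) are illustrative assumptions. The input is a sequence of differential-entropy feature cubes, one per EEG segment; for a band-pass-filtered signal that is approximately Gaussian with variance sigma^2, the differential entropy is 0.5*ln(2*pi*e*sigma^2).

import torch
import torch.nn as nn

class ECLGCNNSketch(nn.Module):
    def __init__(self, n_channels=32, n_features=5, hidden=64, n_classes=2):
        super().__init__()
        # Learnable adjacency: entry (i, j) plays the role of the edge weight,
        # i.e. the strength of the functional relationship between channels i and j.
        self.adj = nn.Parameter(torch.rand(n_channels, n_channels))
        self.gcn = nn.Linear(n_features, hidden)    # graph filter weights
        self.lstm = nn.LSTM(n_channels * hidden, hidden, batch_first=True)
        self.dense = nn.Linear(hidden, n_classes)   # Dense layer for classification

    def forward(self, x):
        # x: (batch, steps, channels, features) -- differential-entropy cubes,
        # one cube per EEG segment within the time window.
        b, t, c, f = x.shape
        a = torch.softmax(self.adj, dim=-1)         # normalized edge weights
        h = torch.relu(a @ self.gcn(x))             # one graph convolution per step
        out, _ = self.lstm(h.reshape(b, t, -1))     # LSTM memorizes step-to-step change
        return self.dense(out[:, -1])               # logits from the last time step

For example, ECLGCNNSketch()(torch.randn(8, 6, 32, 5)) returns an (8, 2) tensor of class logits for a batch of eight six-step samples. In the actual fusion model, multiple GCNNs produce the graph-domain features and the LSTM gates select the information passed to the Dense layer; the sketch mirrors that flow by applying a shared graph convolution at every step before the recurrent pass.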
Author contributions:
Yongqiang Yin: Conceptualization, Methodology, Software.
Xiangwei Zheng: Writing - Reviewing and Editing.
Bin Hu: Supervision.
Yuang Zhang: Software.
Xinchun Cui: Validation.
Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.