
Received: 13 October 2017 | Revised: 26 November 2017 | Accepted: 27 November 2017

DOI: 10.1002/cpe.4413

SPECIAL ISSUE PAPER

Short time Fourier transformation and deep neural networks for motor imagery brain computer interface recognition

Zijian Wang1, Lei Cao2, Zuo Zhang1, Xiaoliang Gong1, Yaoru Sun1, Haoran Wang2

* Lei Cao and Zijian Wang contributed equally.

1 Department of Computer Science and Technology, Tongji University, Shanghai 200092, China
2 Department of Computer Science, College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China

Correspondence
Xiaoliang Gong, Department of Computer Science and Technology, Tongji University, Shanghai 200092, China.
Email: gxllshsh@163.com

Funding information
Shanghai NSF research project, Grant/Award Number: 16JC1401300; Wuxi Technology Project, Grant/Award Number: 187

Summary
Motor imagery (MI) is an important control paradigm in the field of brain-computer interfaces (BCI), enabling the recognition of personal intention. Numerous methods have been designed to classify EEG signal features for MI tasks, but deep neural networks have seldom been applied to analyze EEG signals. In this study, two novel deep learning schemes based on convolutional neural networks (CNN) and long short-term memory (LSTM) were proposed for MI classification. The frequency domain representations of the EEG signals, obtained using the short time Fourier transform (STFT), were used to train the models. Classification results were compared between a conventional algorithm, the CNN models, and the LSTM model. The CNN algorithms showed the best performance, verifying that the CNN method is promising for MI-based BCIs.

KEYWORDS
BCI, CNN, deep learning, LSTM, motor imagery
1 INTRODUCTION

Brain Computer Interface (BCI) constructs a communication pathway between the human brain and external devices for the severely disabled.1 It allows users to operate wheelchairs, spelling interfaces, video games, and other assistive tools based on EEG signals. EEG signals are commonly selected as BCI input because they are non-invasive, inexpensive, and convenient to acquire.
Several modalities of EEG signals have been studied for transforming human intentions into control commands.2-5 Among them, motor imagery, which is evoked by the dynamics of spectral oscillations, is typically selected for BCI applications. MI signals are acquired during the imagination of limb movement. The change in the power of the mu and beta rhythms, referred to as event-related desynchronization and event-related synchronization (ERD/ERS), characterizes the discrimination between different limb movements.6
Pattern recognition techniques are utilized for signal detection in MI tasks. In conventional classification algorithms, spectrum analysis is first used for feature extraction, and machine learning methods are then used to classify the different MI modalities. Common spatial patterns (CSP) are widely used as MI features.7 The support vector machine (SVM) and various neural network algorithms are employed for BCI classification.8-10 However, the low signal-to-noise ratio (SNR) of EEG signals is disadvantageous for classification.11 Consequently, the precision of MI-based BCIs was lower than 80% in previous studies.12,13
In recent years, many machine learning researchers have focused on Deep Neural Networks (DNN), which were developed from BPNN by Hinton and Salakhutdinov.14 Common DNN algorithms include the Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN),15 the Auto Encoder (AE), Sparse Coding,16,17 Convolutional Neural Networks (CNN),18 and Recurrent Neural Networks (RNN).19 In particular, DBN, CNN, and RNN have been widely used in the fields of image recognition, Automatic Speech Recognition (ASR), and sentiment analysis20-24 and have produced breakthroughs in these fields.
Some studies have used DNN to classify BCI or EEG signals and obtained better performance than traditional algorithms. Li et al selected channels according to their contribution and used an RBM to classify the combined low-dimensional characteristics from the chosen EEG channels for affective state recognition.25 Jirayucharoensak et al combined a stacked autoencoder (SAE) and principal component analysis (PCA) for automatic emotion recognition from EEG signals, improving accuracy by 5%-6% compared with using PCA and covariate shift adaptation alone.26 Cecotti and Graser employed a new model based on a convolutional neural network for the detection of P300 waves, reaching a recognition rate of 95.5%.27 These methods provided new ways of analyzing EEG and BCI signals with DNN models.
In this paper, the Short Time Fourier Transform (STFT)28 was employed for feature extraction and for the proposed EEG re-presentation patterns. The STFT multiplies the data by a window function over a short period of time and takes the Fourier transform as the window slides along the time axis, finally yielding a two-dimensional representation of the signal. The STFT was used to calculate the power spectral density of every segment during feature extraction from EEG signals.29,30
This paper is organized as follows. Section 2 describes related methods, including CNN, LSTM, and activation functions. Section 3 introduces the procedures for data collection. Section 4 presents the experimental results. The discussion follows in Section 5, and Section 6 concludes the paper.

2 BRIEF REVIEW OF RELEVANT METHODS

2.1 Common spatial patterns and support vector machine


We have previously presented a method of data analysis for a motor imagery-based BCI task.9 In this paper, common spatial patterns (CSP) and the support vector machine (SVM) are combined to discriminate two-class (left hand and right hand) MI patterns and compared with the deep learning algorithms. The CSP method is useful for binary classification of EEG signals: based on the dataset, a transformation matrix is calculated as a spatial filter.7 The first and last rows of the CSP transformation matrix maximize the difference between the two classes; in our algorithm, the first and last two rows are selected to construct the transformation matrix for further data analysis. The feature vectors are fed into an SVM classifier, provided by the LIBSVM toolbox, to predict the class label and probability. The classifier outputs two probabilities, each ranging from 0 to 1 and summing to 1. The radial basis function (RBF) is used as the kernel function, and five-fold cross validation within the training data is used to choose suitable SVM parameters.
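As an illustrative sketch only (not the authors' code), this pipeline might be assembled as below: scikit-learn's SVC wraps LIBSVM, and the CSP filters come from the standard generalized eigenvalue formulation. The covariance inputs, the hypothetical arrays `X` and `y`, and the parameter grid are assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def csp_filters(cov_left, cov_right, n_pairs=2):
    """Generalized eigenvalue formulation of CSP; keeps the first and last
    `n_pairs` rows of the transformation matrix, as described in the text."""
    _, vecs = eigh(cov_left, cov_left + cov_right)   # eigenvalues in ascending order
    W = vecs.T                                       # rows are spatial filters
    return np.vstack([W[:n_pairs], W[-n_pairs:]])

# Hypothetical trial-by-feature matrix X and label vector y.
# RBF-kernel SVM with five-fold cross validation over C and gamma.
clf = GridSearchCV(SVC(kernel="rbf", probability=True),
                   {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1]},
                   cv=5)
# clf.fit(X, y); clf.predict_proba(...) then yields the two class
# probabilities, which range from 0 to 1 and sum to 1.
```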

2.2 Convolutional neural network


CNN was proposed by LeCun et al based on BPNN.31 A CNN typically consists of convolutional layers, pooling layers, dense layers, and an output layer. A convolutional layer is composed of feature maps containing several neurons, and every neuron connects with the previous feature maps through a convolutional kernel (a weight matrix). Using convolution operators, convolutional layers extract different features such as edges, lines, and corners, with deeper convolutional layers extracting more advanced features. A convolutional layer is usually followed by a pooling layer, which down-samples the feature maps and reduces the complexity of the model. The tail of a CNN normally comprises several dense layers and one output layer that combine all of the feature maps and output the classification according to the class label.
The convolutional layers are calculated as follows:

$$X_j^l = f\left(\sum_{i \in M_j} X_i^{l-1} * K_{ij}^l + b_j^l\right). \tag{1}$$

Here, $X_j^l$ is the $j$th feature map in layer $l$, $K_{ij}^l$ is the convolution kernel, $f(\cdot)$ is the activation function, and $b_j^l$ is the bias parameter. $M_j$ is the set of input feature maps from which the inputs are selected. For a specific output feature map, the convolution kernels applied to all the input feature maps are the same.
The gradient for convolutional layers can be computed as follows:

$$\delta_j^l = \beta_j^{l+1}\left(f'\left(u_j^l\right) \otimes \mathrm{up}\left(\delta_j^{l+1}\right)\right). \tag{2}$$

Here, $\delta_j^l$ is the error signal of the $j$th feature map in layer $l$, $\beta_j^{l+1}$ is the weight value of the next pooling layer, $u_j^l$ is the input of a single neuron, $f'$ is the derivative of the activation function, and $\mathrm{up}(\cdot)$ is the up-sampling operator. The gradient of the bias can be calculated as the sum of the entries of $\delta_j^l$:

$$\frac{\partial E}{\partial b_j} = \sum_{u,v}\left(\delta_j^l\right)_{uv}. \tag{3}$$

Finally, the weight gradient of the convolution kernels can be calculated with the back propagation algorithm:

$$\frac{\partial E}{\partial K_{ij}^l} = \sum_{u,v}\left(\delta_j^l\right)_{uv}\left(p_j^{l-1}\right)_{uv}, \tag{4}$$

where $\left(p_j^{l-1}\right)_{uv}$ is the pixel block in layer $l-1$ that is multiplied with $K_{ij}^l$ during convolution.
The pooling layer is computed by a simple down-sampling:

$$X_j^l = f\left(\beta_j^l\,\mathrm{down}\left(X_j^{l-1}\right) + b_j^l\right), \tag{5}$$

where $\mathrm{down}(\cdot)$ is the down-sampling operator and $f(\cdot)$ is the activation function for the pooling layer.
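For concreteness, here is a minimal NumPy sketch of the forward passes of Equations (1) and (5) for a single output feature map. It is an illustration under stated assumptions rather than the authors' implementation: "valid" cross-correlation (the usual CNN convention) and mean down-sampling are assumed.

```python
import numpy as np

def conv_layer(X_prev, K, b, f):
    """Forward pass of Equation (1) for one output feature map X_j^l.
    X_prev: list of input feature maps (2-D arrays), K: matching list of
    kernels, b: scalar bias, f: activation function."""
    kh, kw = K[0].shape
    H, W = X_prev[0].shape
    out = np.zeros((H - kh + 1, W - kw + 1))          # 'valid' output size
    for X_i, K_i in zip(X_prev, K):                   # sum over i in M_j
        for u in range(out.shape[0]):
            for v in range(out.shape[1]):
                out[u, v] += np.sum(X_i[u:u + kh, v:v + kw] * K_i)
    return f(out + b)

def pool_layer(X, beta, b, f, s=2):
    """Equation (5): down-sample by averaging s x s blocks, then scale by
    beta, add the bias, and apply the activation."""
    H, W = X.shape
    blocks = X[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s)
    return f(beta * blocks.mean(axis=(1, 3)) + b)
```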

2.3 Long short-term memory


LSTM is a kind of RNN model built from LSTM units (Figure 1), which mitigates the gradient explosion and gradient vanishing problems of simple RNNs.32
The LSTM unit is designed to store memory. The updating and reading of the memory are controlled by an input gate, a forget gate, and an output gate. In the figure, $h$ is the output of the LSTM unit, $c$ is the value of the memory cell, and $x$ is the input signal. There are six steps to update an LSTM unit.

1) Calculate the candidate memory value $\tilde{c}_t$, where $W_{xc}$ and $W_{hc}$ are the weights for the input data and the output of the previous time step, respectively:

$$\tilde{c}_t = \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c). \tag{6}$$

2) Calculate the value of the input gate $i_t$, where $\sigma$ is the activation function (usually the logistic sigmoid):

$$i_t = \sigma(W_{xi}x_t + W_{hi}h_{t-1} + w_{ci}c_{t-1} + b_i). \tag{7}$$

3) Calculate the value of the forget gate $f_t$:

$$f_t = \sigma(W_{xf}x_t + W_{hf}h_{t-1} + w_{cf}c_{t-1} + b_f). \tag{8}$$

4) Calculate the cell value $c_t$, where $\otimes$ is the pointwise product:

$$c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t. \tag{9}$$

5) Calculate the output gate $o_t$:

$$o_t = \sigma(W_{xo}x_t + W_{ho}h_{t-1} + w_{co}c_{t-1} + b_o). \tag{10}$$

6) At last, calculate the output value of the LSTM unit:

$$h_t = o_t \otimes \tanh(c_t). \tag{11}$$
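A minimal NumPy sketch of one such update is shown below; it follows Equations (6) to (11) literally, including the peephole terms $w_{ci}$, $w_{cf}$, and $w_{co}$ applied elementwise. The parameter dictionary `p` and its shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update following Equations (6)-(11). `p` holds the weight
    matrices W_*, peephole vectors w_c*, and biases b_* as NumPy arrays."""
    c_tilde = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])                   # Eq. (6)
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["wci"] * c_prev + p["bi"])   # Eq. (7)
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["wcf"] * c_prev + p["bf"])   # Eq. (8)
    c_t = f_t * c_prev + i_t * c_tilde                                                # Eq. (9)
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["wco"] * c_prev + p["bo"])   # Eq. (10)
    h_t = o_t * np.tanh(c_t)                                                          # Eq. (11)
    return h_t, c_t
```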

FIGURE 1 The structure of LSTM unit



2.4 Activation functions for deep neural networks


The activation function, normally a representation of the rate of action potential firing in a neuron, serves as the nonlinear factor in a neural network. Three kinds of activation functions are used in our work: ReLU (Rectified Linear Unit),33 ELU (Exponential Linear Unit),34 and SELU (Scaled Exponential Linear Unit).35 These activation functions are calculated as follows.
For ReLU:

$$f(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0. \end{cases} \tag{12}$$

For ELU:

$$f(x) = \begin{cases} x, & x > 0 \\ \alpha(\exp(x) - 1), & x \le 0. \end{cases} \tag{13}$$

For SELU:

$$f(x) = \lambda \begin{cases} x, & x > 0 \\ \alpha(\exp(x) - 1), & x \le 0. \end{cases} \tag{14}$$
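These functions are straightforward to implement; the NumPy sketch below uses the fixed self-normalizing constants $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$ from Klambauer et al35 for SELU.

```python
import numpy as np

def relu(x):
    return np.where(x > 0, x, 0.0)                        # Equation (12)

def elu(x, alpha=1.0):
    neg = alpha * (np.exp(np.minimum(x, 0.0)) - 1.0)      # clamp avoids overflow warnings
    return np.where(x > 0, x, neg)                        # Equation (13)

def selu(x, alpha=1.6733, lam=1.0507):
    neg = alpha * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return lam * np.where(x > 0, x, neg)                  # Equation (14)
```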

3 METHODS

3.1 Data collection and re-presentation of EEG signal data


A total of 14 subjects (12 males and 2 females), aged 24 to 28, participated in our experiment. An EEG cap with 11 Cu electrodes was used for data acquisition (Figure 2), and a high-performance bio-signal amplifier digitized the raw signals for the computer. The subjects were required to perform one of two MI tasks (left or right-hand motor imagery).
In the task, 60 MI trials (30 left-hand imageries and 30 right-hand imageries) were conducted by each subject. First, the screen displayed a green cross for 2 seconds to hold the subject's attention. Then, a left or right arrow was shown to prompt the subject to perform the corresponding MI task for another 2 seconds. The sampling rate was 256 Hz, and a 5-30 Hz band-pass filter was applied. The selected electrode channel set consisted of C3, Cz, and C4. Our EEG dataset can be accessed through http://eelab.tongji.edu.cn/Data/List/zlzq.
In the preprocessing step, we re-presented the EEG signals from the three channels by the Short Time Fourier Transform (STFT), as shown in Figure 3. The STFT window size was 64 sample points and the hop size was 32 sample points. Every window yielded seven frequency components in the 8-30 Hz band. For a single electrode, a 7 × 15 matrix was thus obtained per trial, giving a 21 × 15 matrix for all three electrodes. It was resized to 48 × 36 for reasons of computational convenience.
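A sketch of this re-presentation step under the stated parameters might look as follows. Here, scipy.signal.stft stands in for whatever STFT routine was actually used; the exact bin selection (seven 4-Hz bins covering roughly the 8-30 Hz band) and the spline-interpolated resize are assumptions.

```python
import numpy as np
from scipy.signal import stft
from scipy.ndimage import zoom

def trial_to_image(trial, fs=256):
    """trial: assumed (3, 512) array -- C3, Cz, C4 over a 2 s MI period."""
    rows = []
    for channel in trial:
        # 64-point window, 32-point hop, no padding -> 15 time windows
        f, t, Z = stft(channel, fs=fs, nperseg=64, noverlap=32,
                       boundary=None, padded=False)
        psd = np.abs(Z) ** 2                       # power spectral density
        rows.append(psd[(f >= 8) & (f <= 32)])     # seven 4-Hz bins, ~8-30 Hz
    image = np.vstack(rows)                        # 21 x 15 for the 3 channels
    return zoom(image, (48 / 21, 36 / 15))         # resized to 48 x 36
```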

FIGURE 2 The electrode distribution in this experiment. Channels marked with black circles collected signals

FIGURE 3 The STFT results for left-hand and right-hand movement imagery: two 21 × 15 STFT images for the left hand and right hand of one subject, formed by concatenating the STFT images for C4, Cz, and C3 over the 8-30 Hz frequency range

3.2 Description of our CNN model


Once the frequency domain data were obtained, we transformed them into images and fed them into the CNN model. The CNN model was composed of seven layers (Figure 4). The first layer was the input layer, followed by a convolutional layer with kernel size 3 × 3 and a max pooling layer with kernel size 2 × 2. To extract more advanced features, we placed an identical convolutional layer and max pooling layer behind them. The last two layers were dense layers, consisting of 100 and 2 neurons, respectively. A softmax function was used as the activation function of the second dense layer to compute the predicted labels.
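A minimal tf.keras sketch of this architecture follows. The layer sequence mirrors the seven layers described above (the Flatten step is an implementation detail, not one of the seven layers), but the numbers of convolution filters are assumptions, since the paper does not report them.

```python
import tensorflow as tf

cnn_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48, 36, 3)),               # input layer: 48 x 36 STFT image
    tf.keras.layers.Conv2D(32, (3, 3), activation="selu"),  # 3 x 3 convolution (filter count assumed)
    tf.keras.layers.MaxPooling2D((2, 2)),                   # 2 x 2 max pooling
    tf.keras.layers.Conv2D(32, (3, 3), activation="selu"),  # identical second convolutional layer
    tf.keras.layers.MaxPooling2D((2, 2)),                   # 2 x 2 max pooling
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="selu"),          # dense layer with 100 neurons
    tf.keras.layers.Dense(2, activation="softmax"),         # output layer with softmax
])
```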

3.3 Description of our LSTM model


The LSTM model was fed with sequence data as input. We regarded the image data as a sequence of 36 columns (the time series), each column composed of 48 RGB pixels and thus forming a 144-dimensional vector. The LSTM model had three layers, as shown in Figure 5. The first layer was the input layer, followed by an LSTM layer with 32 LSTM units. The last layer was a dense layer consisting of 2 neurons, with a softmax function applied as its activation.
In both models, cross entropy was used as the loss function, and stochastic gradient descent (SGD)36 with a learning rate of 0.0001 was applied in the learning process. The models were constructed on the TensorFlow platform.37
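For comparison, the corresponding tf.keras sketch of the three-layer LSTM model, compiled with the stated loss and optimizer, might look as follows (again an illustrative assumption rather than the exact training script).

```python
import tensorflow as tf

lstm_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(36, 144)),          # 36 columns, each a 144-dimension vector
    tf.keras.layers.LSTM(32),                        # LSTM layer with 32 units
    tf.keras.layers.Dense(2, activation="softmax"),  # dense output layer
])
lstm_model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),  # SGD, learning rate 0.0001
                   loss="categorical_crossentropy",                        # cross entropy loss
                   metrics=["accuracy"])
```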

FIGURE 4 The structure of CNN model



FIGURE 5 The structure of LSTM model

4 EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we introduce our experiment and results. First, the data re-presentation was applied to the dataset, producing images for each trial. Then, the CNN and LSTM models were used to analyze the data, and their results were compared with the conventional method. Confusion matrices were calculated to measure the performance of the five models. In addition, the loss curves of the CNN models and the LSTM model were compared to gain deeper insight into the DNN models.
For the 60 trials in each evaluation, we split the data into a training set of 42 trials and a test set of 18 trials. Evaluation for each subject used 4-fold cross validation, and every DNN model was trained for 4 × 200 epochs. The classification accuracy for the 14 subjects can be found in Table 1.
From Table 1, it is clear that the CNN models achieved the highest classification accuracy, exceeding the mean accuracy of CSP+SVM by 3.06%, 5.14%, and 9.05% for the ReLU, ELU, and SELU variants, respectively. For sub13 in particular, the CNN (SELU) model exceeded CSP+SVM by more than 20%.
By mean accuracy, LSTM was clearly the worst model at 80.19%, 3.49% below CSP+SVM. However, this poor score can probably be attributed to its instability: LSTM scored 20%-40% lower than CNN (SELU) and 5%-30% lower than CSP+SVM for sub01, sub02, sub04, sub06, sub07, and sub08, yet it outperformed the other models for sub12 and sub13, by more than 20% in the latter case. Paired t-tests on the accuracy standard deviations were conducted to test the stability of the three models (Table 2). The standard deviations of CSP+SVM were significantly higher than those of CNN (SELU). Nevertheless, LSTM and CNN (SELU) had similar standard deviations, which indicated that both kinds of DNN methods could outperform CSP+SVM but that neither CNN nor LSTM was a very robust method.
TABLE 1 Classification accuracy of the five methods. Values in parentheses are the standard deviations
Subject CSP+SVM CNN(ReLU) CNN(ELU) CNN(SELU) LSTM

S01 89.17 (8.33) 88.88 (4.55) 89.1 (3.83) 95.78 (2.83) 58.34 (5.56)
S02 68.06 (7.28) 84.58 (8.25) 83.3 (7.12) 88.61 (2.78) 62.5 (2.78)
S03 100 (0) 100 (0) 100 (0) 100 (0) 100 (0)
S04 90.28 (5.32) 66.67 (4.54) 84.32 (5.28) 87.49 (5.31) 76.39 (10.5)
S05 100 (0) 100(0) 100 (0) 100 (0) 100 (0)
S06 77.32 (6.29) 77.78 (0) 83.7 (6.45) 84.44 (4.84) 56.95 (2.78)
S07 70.00 (8.66) 84.43 (7.87) 89.44 (2.28) 93.08 (3.01) 63.89 (5.55)
S08 83.27 (8.30) 71.94 (6.99) 74.9 (7.51) 80.83 (8.33) 54.5 (3)
S09 82.17 (5.15) 81.95 (5.32) 78 (2.31) 88.89 (2.40) 84.72 (2.78)
S10 100 (0) 100 (0) 100 (0) 100 (0) 100 (0)
S11 74.44 (4.60) 77.78 (4.54) 79.6 (4.3) 87.5 (7.22) 73.61 (2.78)
S12 92.78 (0) 93.12 (3.54) 92.28 (4.12) 95.83 (2.66) 100 (0)
S13 75 (9.27) 88.61 (2.78) 94.7 (3.21) 95.71 (1.27) 97.22 (3.21)
S14 94.79 (2.08) 98.61 (2.78) 94.2 (2.14) 100 (0) 94.45
Ave. 83.68 86.74 88.92 92.73 80.19

TABLE 2 Paired t-tests of the accuracy standard deviations between CSP+SVM, CNN (SELU), and LSTM

Comparison                 T-value  P-value
CSP+SVM vs. CNN (SELU)     2.593    0.022
LSTM vs. CNN (SELU)        0.378    0.712
CSP+SVM vs. LSTM           1.905    0.079

TABLE 3 Paired t-tests of accuracy between CSP+SVM, CNN (SELU), and LSTM

Comparison                 T-value  P-value
CSP+SVM vs. CNN (SELU)     3.10     0.008
LSTM vs. CNN (SELU)        3.29     0.006
CSP+SVM vs. LSTM           1.04     0.315

FIGURE 6 Confusion matrices of the five models. In each block, the upper percentage is the average accuracy and the lower percentage is the fluctuation range of the accuracy

For every classifier, we obtained 14 accuracies. Paired t-tests between the CSP+SVM, CNN (SELU), and LSTM models were performed (Table 3). The results showed that CNN (SELU) performed significantly better than the other two methods, whereas CSP+SVM did not significantly surpass LSTM.
The confusion matrices of the five methods (Figure 6) indicate that the CNN (SELU) model was more robust and stable than the CNN (ReLU), CNN (ELU), CSP+SVM, and LSTM models. The LSTM model was especially unstable, with a spread of ±64.95% in the false negative rate.

5 DISCUSSION

In this paper, we designed CNN and LSTM models for MI-BCI analysis. CNN outperformed CSP+SVM in MI-BCI analysis: compared with CSP+SVM, CNN with SELU achieved a mean accuracy higher by 9.05%. The confusion matrices and the paired t-tests of the accuracy standard deviations also showed more stable performance for CNN with SELU, both within and between subjects.
However, why was the other DNN model, the LSTM model, outperformed even by CSP+SVM? In Table 1, the mean accuracy of LSTM was lower than that of CSP+SVM, but the paired t-test of accuracy showed no significant difference between LSTM and CSP+SVM, suggesting that they performed similarly. Indeed, LSTM performed similarly to CSP+SVM and the CNN models for the last six subjects and even outperformed CSP+SVM and CNN (ReLU) for sub12 and sub13. Individual differences may be the reason why LSTM was less accurate overall.
In the confusion matrices, LSTM was clearly more unstable than CSP+SVM and CNN. However, LSTM showed no significant difference from CNN in the paired t-test of the accuracy standard deviations. It can be inferred that LSTM was stable within subjects but unstable between subjects; that is, LSTM was not suitable for every subject. LSTM is usually applied in scenarios with common sequential patterns, such as text and video comprehension, whereas CNN is usually applied in scenarios with common spatial patterns, such as image classification. For the EEG signals, the STFT transformed the signal into images containing common spatial patterns, leading to the good performance of CNN. However, LSTM could not turn these common spatial patterns into common sequential patterns for some subjects, causing its poor performance.
In our study, we selected only 3 channels located in the central region for data analysis, fewer electrodes than in other studies.9,38 This is convenient for wearable BCI applications: few channels may be beneficial for developing easy-to-use detectors in practical environments.
The CNN algorithm proved robust for off-line analysis, which implies that it is feasible to apply this method to online MI-based model training. However, the time cost of training the classifier must be reduced in the experimental design. Consequently, we intend to employ a stable within-subject recognition model for every subject in future work.

6 CONCLUSION

This work has presented a comparison between CSP+SVM, CNN, and LSTM models for analyzing EEG signals in motor imagery tasks. We found that the CNN models with SELU outperformed CSP+SVM and LSTM, whereas, owing to its varied performance between subjects, LSTM achieved an even lower mean accuracy than CSP+SVM. This work proposed different adaptations for the different DNN models (CNN and RNN) and discussed the reasons for the poor performance of LSTM in classifying MI-EEG signals. The proposed CNN model is thus a potential classifier for EEG off-line analysis in MI-BCI. Future work will be carried out to resolve the unstable performance of LSTM across subjects and to compare the DNNs with methods such as the common spatio-spectral pattern (CSSP), sub-band common spatial pattern (SBCSP), and filter bank common spatial pattern (FBCSP).

ACKNOWLEDGEMENT

This research was supported in part by Shanghai NSF research project (Grant No. 16JC1401300) and Wuxi Technology Project (2016 No. 187).

ORCID

Zijian Wang http://orcid.org/0000-0002-8147-5109

REFERENCES
1. Birbaumer N, Ghanayim N, Hinterberger T, et al. A spelling device for the paralysed. Nature. 1999;398:297-298.
2. Blankertz B, Lemm S, Treder W, Haufe S, Müller KR. Single-trial analysis and classification of ERP components: a tutorial. Neuroimage. 2011;56:814-825.
3. Kübler A, Birbaumer N. Brain-computer interfaces and communication in paralysis: extinction of goal directed thinking in completely paralysed patients?
Clin Neurophysiol. 2008;119:2658-2666.
4. Pfurtscheller G, Neuper C. Future prospects of ERD/ERS in the context of brain-computer interface (BCI) developments. Prog Brain Res.
2006;159:433-437.
5. Muller-Putz GR, Pfurtscheller G. Control of an electrical prosthesis with an SSVEP-based BCI. IEEE Trans Biomed Eng. 2008;55(1):361-364.
6. Pfurtscheller G, Neuper C, Schlogl A, Lugger K. Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive
parameters. IEEE Trans Rehabil Eng. 1998;6(3):316-325.
7. Ramoser H, Muller-Gerking J, Pfurtscheller G. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans Rehabil Eng.
2000;8(4):441-446.
8. Anderson CW, Devulapalli SV, Stolz EA. Determining mental state from EEG signals using parallel implementations of neural networks. Sci Program.
1995;4(3):171-183.
9. Cao L, Xia B, Maysam O, Li J, Xie H, Birbaumer N. A synchronous motor imagery based neural physiological paradigm for brain computer interface speller.
Front Hum Neurosci. 2017;11:274. https://doi.org/10.3389/fnhum.2017.00274
10. Müller-Gerking J, Pfurtscheller G, Flyvbjerg H. Designing optimal spatial filters for single-trial EEG classification in a movement task. Clin Neurophysiol.
1999;110(5):787-798.
11. Ciregan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. Paper presented at: 2012 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR); 2012; Providence, RI.

12. Huang D, Lin P, Fei DY, Chen X, Bai O. Decoding human motor activity from EEG single trials for a discrete two-dimensional cursor control. J Neural Eng.
2009;6(4):1-12.
13. Pfurtscheller G, Neuper C, Flotzinger D, Pregenzer M. EEG-based discrimination between imagination of right and left hand movement. Electroen-
cephalogr Clin Neurophysiol. 1997;103(6):642-651.
14. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504-507.
15. Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527-1554.
16. Olshausen BA, Field DJ. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Res. 1997;37(23):3311-3325.
17. Lee H, Battle A, Raina R, Ng AY. Efficient sparse coding algorithms. Paper presented at: Advances in Neural Information Processing Systems;
December 7–8, 2007; Vancouver, Canada.
18. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Paper presented at: Advances in Neural
Information Processing Systems; December 2–8, 2012; South Lake Tahoe, CA.
19. Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S. Recurrent neural network based language model. Paper presented at: Interspeech; September 26–30, 2010; Chiba, Japan.
20. Krahenbuhl P, Doersch C, Donahue J, Darrell T. Data-dependent initializations of convolutional neural networks. arXiv preprint arXiv:1511.06856; 2015.
21. Ciresan D, Meier U, Masci J, Schmidhuber J. Multi-column deep neural network for traffic sign classification. Neural Netw. 2012;32:333-338.
22. Ciresan DC, Meier U, Masci J, Maria GL, Schmidhuber J. Flexible, high performance convolutional neural networks for image classification. Paper
presented at: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI); July 16–22, 2011; Barcelona, Spain.
23. Socher R, Perelygin A, Wu J, et al. Recursive deep models for semantic compositionality over a sentiment treebank. Paper presented at: Proceedings of
the 2013 Conference on Empirical Methods in Natural Language Processing; October 18–21, 2013; Seattle, WA.
24. Dos Santos CN, Gatti M. Deep convolutional neural networks for sentiment analysis of short texts. Paper presented at: COLING; August 23–29, 2014;
Dublin, Ireland.
25. Li K, Li X, Zhang Y, Zhang A. Affective state recognition from EEG with deep belief networks. Paper presented at: 2013 IEEE International Conference
on Bioinformatics and Biomedicine (BIBM); December 18–21, 2013; Shanghai, China.
26. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate shift adaptation. Sci World J. 2014;2014:627892.
27. Cecotti H, Graser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell.
2011;33(3):433-445.
28. Griffin D, Lim J. Signal estimation from modified short-time Fourier transform. IEEE Trans Acoust Speech Signal Process. 1984;32(2):236-243.
29. Tzallas AT, Tsipouras MG, Fotiadis DI. Epileptic seizure detection in EEGs using time–frequency analysis. IEEE Trans Inf Technol Biomed.
2009;13(5):703-710.
30. Sheikhani A, Behnam H, Mohammadi MR. Analysis of EEG background activity in Autism disease patients with bispectrum and STFT measure. Paper
presented at: Proceedings of the 11th WSEAS International Conference on Communications; July 26–28, 2007; Heraklion, Greece.
31. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278-2324.
32. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735-1780.
33. Nair V, Hinton GE. Rectified linear units improve restricted boltzmann machines. Paper presented at: Proceedings of the 27th International Conference
on Machine Learning (ICML'10); June 21–26, 2010; Haifa, Israel.
34. Clevert DA, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289;
2015.
35. Klambauer G, Unterthiner T, Mayr A, Hochreiter S. Self-normalizing neural networks. arXiv:1706.02515; 2017.
36. Bottou L. Large-scale machine learning with stochastic gradient descent. Paper presented at: Proceedings of COMPSTAT'2010; 2010; Paris, France.
37. Abadi M, Barham P, Chen J, et al. TensorFlow: A system for large-scale machine learning. Paper presented at: 12th USENIX Symposium on Operating
Systems Design and Implementation (OSDI); November 2–4, 2016; Savannah, GA.
38. Tang Z, Li C, Sun S. Single-trial EEG classification of motor imagery using deep convolutional neural networks. Optik. 2017;130:11-18.

How to cite this article: Wang Z, Cao L, Zhang Z, Gong X, Sun Y, Wang H. Short time Fourier transformation and deep neural networks
for motor imagery brain computer interface recognition. Concurrency Computat Pract Exper. 2018;e4413. https://doi.org/10.1002/cpe.4413
