DOI: 10.1002/cpe.4413
Zijian Wang1 Lei Cao2 Zuo Zhang1 Xiaoliang Gong1 Yaoru Sun1 Haoran Wang2
Funding information
Shanghai NSF research project, Grant/Award Number: 16JC1401300; Wuxi Technology Project, Grant/Award Number: 187
CNN algorithms had shown better performance. These conclusions verified that CNN method was promising for MI-based BCIs.
KEYWORDS
1 INTRODUCTION
A brain-computer interface (BCI) constructs a communication pathway between the human brain and an external device for the severely disabled.1 It
allows users to operate wheelchairs, spelling interfaces, video games, and other assistive tools based on electroencephalography (EEG) signals. EEG
signals are commonly selected as BCI input because they are non-invasive, inexpensive, and convenient to acquire.
Several modalities of EEG signals have been studied for transforming human intentions into control commands.2-5 Among them, motor
imagery (MI), which is evoked by the dynamics of spectral oscillations, is typically selected for BCI applications. MI signals are acquired while the
user imagines a limb movement. The change in the power of the mu and beta rhythms, referred to as event-related desynchronization and event-related
synchronization (ERD/ERS), is used to discriminate between different limb movements.6
Pattern recognition techniques are used to detect the signals of MI tasks. In conventional classification algorithms, spectrum
analysis is first used for feature extraction, and machine learning methods are then used to classify the different MI modalities. Common spatial
patterns (CSP) are widely used as MI features.7 The support vector machine (SVM) and neural network algorithms are employed for
BCI classification.8-10 However, the low signal-to-noise ratio (SNR) of EEG signals is disadvantageous for classification.11 As a result, the accuracy of
MI-based BCIs was lower than 80% in previous studies.12,13
In recent years, many machine learning researchers have focused on deep neural networks (DNN), which Hinton and Salakhutdinov developed on the
basis of the back-propagation neural network (BPNN).14 Common DNN algorithms include the Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN),15 the Auto
Encoder (AE), Sparse Coding,16,17 Convolutional Neural Networks (CNN),18 and Recurrent Neural Networks (RNN).19 In particular, DBN, CNN, and
RNN have been widely used in image recognition, automatic speech recognition (ASR), and sentiment analysis20-24 and have produced
breakthroughs in these fields.
Several studies have used DNN to classify BCI or EEG signals, and in these studies DNN performed better than
traditional algorithms. Li et al selected channels according to their contribution and used RBM to classify the combined low-dimensional characteristics
from the chosen EEG channels for affective state recognition.25 Jirayucharoensak et al combined a stacked autoencoder (SAE) and
principal component analysis (PCA) for automatic emotion recognition from EEG signals and improved the accuracy by 5%-6% compared with using
PCA and covariate shift adaptation.26 In addition, Cecotti and Graser employed a new model based on a convolutional neural network for the
detection of P300 waves, reaching a recognition rate of 95.5%.27 These methods provided new ways of analyzing EEG and BCI signals
with DNN models.
Concurrency Computat Pract Exper. 2018;e4413. wileyonlinelibrary.com/journal/cpe Copyright © 2018 John Wiley & Sons, Ltd.
https://doi.org/10.1002/cpe.4413
In this paper, the short-time Fourier transform (STFT)28 was employed for feature extraction and for the proposed EEG representation patterns. The
STFT multiplies the data by a window function that is nonzero for only a short period of time and takes the Fourier transform as the window
slides along the time axis, which results in a two-dimensional (time-frequency) representation of the signal. STFT was used to calculate the power
spectral density of every segment in the feature extraction of EEG signals.29,30
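The sliding-window procedure described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the Hann window, the window length, the hop size, and the test signal are all assumptions chosen for illustration, and the names `hann` and `stft_power` are hypothetical.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def stft_power(signal, win_len, hop):
    """Return one power spectrum per window position.

    Each frame holds the squared magnitude of the DFT of the windowed
    segment at the non-negative frequency bins 0 .. win_len // 2.
    """
    window = hann(win_len)
    n_bins = win_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(win_len)]
        frame = []
        for k in range(n_bins):
            # Direct DFT, O(n^2): slow but sufficient for a short sketch.
            coeff = sum(
                segment[i] * cmath.exp(-2j * math.pi * k * i / win_len)
                for i in range(win_len)
            )
            frame.append(abs(coeff) ** 2)
        frames.append(frame)
    return frames


# Hypothetical test signal: a 10 Hz sine sampled at 64 Hz for 2 s.
fs = 64
x = [math.sin(2.0 * math.pi * 10.0 * t / fs) for t in range(2 * fs)]
spec = stft_power(x, win_len=32, hop=16)

# Frequency of the strongest bin in the first frame (bin spacing is fs / win_len).
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 32
```

The result is indexed as `spec[time][frequency]`, which is the two-dimensional representation that can be rendered as an image.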
This paper is organized as follows. Section 2 describes related work, including CNN, LSTM, and activation functions. Section 3 introduces the
procedures for data collection. Section 4 presents the experimental results. Section 5 discusses the results, and Section 6 concludes
the paper.
$X_j^l$ is the $j$th feature map in layer $l$, $K_{ij}^l$ is the convolution kernel function, $f(\cdot)$ is the activation function, and $b_j^l$ is the bias parameter. $M_j$ is the set of
input feature maps, from which input feature maps could be selected. For a specific output feature map, convolution kernels for all the input feature
maps are the same.
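The symbols defined above describe the forward pass of a convolution layer. A commonly used form that is consistent with these definitions is sketched below. It is an assumed reconstruction rather than a formula quoted from this paper: the convolution operator $*$ and the input feature maps $X_i^{l-1}$ are notation introduced here.

```latex
X_j^l = f\left( \sum_{i \in M_j} X_i^{l-1} * K_{ij}^l + b_j^l \right)
```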
The gradient for convolution layers could be computed as follows:
$$\delta_j^l = \beta_j^{l+1} \left( f'\left(u_j^l\right) \otimes \mathrm{up}\left(\delta_j^{l+1}\right) \right). \tag{2}$$
$\delta_j^l$ is the error signal of the $j$th feature map in layer $l$, $\beta_j^{l+1}$ is the weight value of the next pooling layer, $u_j^l$ is a single neuron, $f'$ is the derivative
of the activation function, and $\mathrm{up}(\cdot)$ is the up-sampling operator. The gradient with respect to the bias could be calculated by summing $\delta_j^l$ over all positions:
$$\frac{\partial E}{\partial b_j} = \sum_{u,v} \left(\delta_j^l\right)_{uv}. \tag{3}$$
Finally, the weight gradient of convolution kernels could be calculated with back propagation algorithm:
$$\frac{\partial E}{\partial K_{ij}^l} = \sum_{u,v} \left(\delta_j^l\right)_{uv} \left(p_j^{l-1}\right)_{uv}. \tag{4}$$
$p_j^{l-1}$ denotes the pixel blocks in layer $l-1$ that are multiplied with $K_{ij}^l$.
The pooling layer could be calculated by a simple down sampling:
$$X_j^l = f\left( \beta_j^l \, \mathrm{down}\left(X_j^{l-1}\right) + b_j^l \right). \tag{5}$$
$\mathrm{down}(\cdot)$ is the down-sampling operator, and $f(\cdot)$ is the activation function of the pooling layer.
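As an illustration of Equation (5), the following minimal sketch applies a down-sampling operator to one feature map and then scales, shifts, and activates the result. The text does not specify how down() combines the values in a block, so averaging over non-overlapping 2 × 2 blocks is an assumption here, as are the function names and the example values.

```python
import math


def down(feature_map, n=2):
    """Down-sample by averaging non-overlapping n x n blocks (assumed choice)."""
    rows, cols = len(feature_map) // n, len(feature_map[0]) // n
    return [
        [
            sum(feature_map[r * n + i][c * n + j] for i in range(n) for j in range(n))
            / (n * n)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def pool_layer(x_prev, beta, bias, f=math.tanh):
    """Pooling layer output f(beta * down(x_prev) + bias), following Equation (5)."""
    return [[f(beta * value + bias) for value in row] for row in down(x_prev)]


# Hypothetical 4 x 4 input feature map.
x_prev = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = pool_layer(x_prev, beta=2.0, bias=1.0, f=lambda v: v)
```

With the identity activation used in this example, each output value is simply the block average multiplied by `beta` plus `bias`.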
1) Calculate the candidate unit value $\tilde{c}_t$, where $W_{xc}$ and $W_{hc}$ are the weights applied to the input data and to the output of the previous time step, respectively:
2) Calculate the value of the input gate $i_t$, where $\sigma$ is the activation function (usually the logistic sigmoid):
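For reference, the two steps above are commonly written in the standard LSTM formulation as shown below. This is an assumed form, not the equations of this paper: the input $x_t$, the previous output $h_{t-1}$, the input-gate weights $W_{xi}$ and $W_{hi}$, and the biases $b_c$ and $b_i$ are symbols introduced here for illustration.

```latex
\tilde{c}_t = \tanh\left( W_{xc} x_t + W_{hc} h_{t-1} + b_c \right)
i_t = \sigma\left( W_{xi} x_t + W_{hi} h_{t-1} + b_i \right)
```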
For ELU:
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha(\exp(x) - 1), & x \le 0. \end{cases} \tag{13}$$
For SELU:
$$f(x) = \lambda \begin{cases} x, & x > 0 \\ \alpha(\exp(x) - 1), & x \le 0. \end{cases} \tag{14}$$
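Equations (13) and (14) translate directly into code. The sketch below is for illustration only. The default $\alpha = 1$ for ELU is an assumption, and the SELU constants are the values published in the self-normalizing networks paper,35 which are not stated in this text.

```python
import math

# SELU constants from Klambauer et al.; assumed here, not given in this paper.
SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805


def elu(x, alpha=1.0):
    """ELU, Equation (13): identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x):
    """SELU, Equation (14): ELU with a fixed alpha, scaled by lambda."""
    return SELU_LAMBDA * elu(x, SELU_ALPHA)


# Evaluate both functions on a few sample inputs.
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
elu_values = [elu(v) for v in samples]
selu_values = [selu(v) for v in samples]
```

For large negative inputs, ELU saturates at $-\alpha$ and SELU at $-\lambda\alpha$, whereas both grow linearly for positive inputs.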
3 METHODS
FIGURE 2 The electrode distribution in this experiment. The channels marked with black circles are the channels used to collect signals
FIGURE 3 The STFT results for left hand and right hand movement imagery. Two 21 × 15 STFT images for the left hand and right hand of one subject,
each a concatenation of the STFT images for C4, Cz, and C3 at frequencies between 8 and 30 Hz
In this section, we introduce our experiment and results. First, the data representation was applied to the dataset to output images for each
trial. Then, CNN and LSTM models were used to analyze the data, and the results of the conventional method, the CNN models, and the LSTM model were
compared. Confusion matrices were calculated to measure the performance of the five models. In addition, the loss curves of the CNN models
and the LSTM model were compared to gain deeper insight into the DNN models.
For the 60 trials in each evaluation, we used 42 trials as the training set and 18 trials as the test set. Each subject was evaluated with
4-fold cross validation, and every DNN model was trained for 4 × 200 epochs. The classification accuracies of the 14 subjects are listed in Table 1.
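One possible reading of this evaluation protocol is four random partitions of the 60 trials into 42 training and 18 test trials. The sketch below shows that reading only. How the folds were actually drawn is not stated in the text, so the shuffling, the seed, and the function name `split_trials` are assumptions.

```python
import random


def split_trials(n_trials=60, n_train=42, n_rounds=4, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits."""
    rng = random.Random(seed)
    indices = list(range(n_trials))
    for _ in range(n_rounds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first n_train shuffled trials train the model; the rest test it.
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


splits = list(split_trials())
```

Each round keeps the training and test trials disjoint, so the accuracy of a round is always measured on trials the model has not seen.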
Table 1 shows that the CNN models achieved the highest classification accuracy. The mean accuracies of CNN (ReLU), CNN (ELU), and CNN (SELU) were
3.06, 5.24, and 9.05 percentage points higher than that of CSP+SVM, respectively. In particular, for sub13, the CNN (SELU) model achieved higher accuracy than CSP+SVM even by 30%.
By mean accuracy, LSTM was the worst model: its mean accuracy was 80.19%, which was 3.49 percentage points lower than that of CSP+SVM. However,
the poor score of LSTM can probably be attributed to its instability. LSTM was 20%-40% less accurate than CNN (SELU) and
5%-30% less accurate than CSP+SVM for sub01, sub02, sub04, sub06, sub07, and sub08, yet it performed better than the other models for sub12
and sub13, even by 30%. Paired t-tests of the accuracy standard deviations were calculated to test the stability of the three models (Table 2).
The accuracy standard deviations of CSP+SVM were significantly higher than those of CNN (SELU). Nevertheless, LSTM and CNN (SELU) had similar
standard deviations, which indicated that these two kinds of DNN methods could perform better than CSP+SVM, but neither CNN nor
LSTM was a very robust method.
TABLE 1 Classification accuracy (%) of the five methods. Values in parentheses are the standard deviations
Subject  CSP+SVM        CNN (ReLU)     CNN (ELU)      CNN (SELU)     LSTM
S01      89.17 (8.33)   88.88 (4.55)   89.1 (3.83)    95.78 (2.83)   58.34 (5.56)
S02      68.06 (7.28)   84.58 (8.25)   83.3 (7.12)    88.61 (2.78)   62.5 (2.78)
S03      100 (0)        100 (0)        100 (0)        100 (0)        100 (0)
S04      90.28 (5.32)   66.67 (4.54)   84.32 (5.28)   87.49 (5.31)   76.39 (10.5)
S05      100 (0)        100 (0)        100 (0)        100 (0)        100 (0)
S06      77.32 (6.29)   77.78 (0)      83.7 (6.45)    84.44 (4.84)   56.95 (2.78)
S07      70.00 (8.66)   84.43 (7.87)   89.44 (2.28)   93.08 (3.01)   63.89 (5.55)
S08      83.27 (8.30)   71.94 (6.99)   74.9 (7.51)    80.83 (8.33)   54.5 (3)
S09      82.17 (5.15)   81.95 (5.32)   78 (2.31)      88.89 (2.40)   84.72 (2.78)
S10      100 (0)        100 (0)        100 (0)        100 (0)        100 (0)
S11      74.44 (4.60)   77.78 (4.54)   79.6 (4.3)     87.5 (7.22)    73.61 (2.78)
S12      92.78 (0)      93.12 (3.54)   92.28 (4.12)   95.83 (2.66)   100 (0)
S13      75 (9.27)      88.61 (2.78)   94.7 (3.21)    95.71 (1.27)   97.22 (3.21)
S14      94.79 (2.08)   98.61 (2.78)   94.2 (2.14)    100 (0)        94.45
Ave.     83.68          86.74          88.92          92.73          80.19
FIGURE 6 Confusion matrices of the five models. In each block, the upper percentage is the average accuracy, and the lower percentage
is the fluctuation range of the accuracy
Each classifier yielded 14 accuracies, one per subject. Paired t-tests between the CSP+SVM, CNN (SELU), and LSTM models were performed (Table 3). The
results showed that CNN (SELU) performed significantly better than the other two methods, whereas CSP+SVM did not significantly
surpass LSTM.
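The paired t-test compares two classifiers on the same subjects by testing whether the mean of the per-subject differences is zero. The sketch below computes only the t statistic and the degrees of freedom. The p-value, which additionally requires the t distribution, is omitted. The function name and the toy accuracies are hypothetical and are not results from this paper.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical accuracies of two classifiers on the same four subjects.
acc_a = [1.0, 2.0, 3.0, 4.0]
acc_b = [2.0, 4.0, 5.0, 7.0]
t_stat, dof = paired_t(acc_a, acc_b)
```

With 14 subjects, as in this study, the test would have 13 degrees of freedom.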
Figure 6 shows the confusion matrices of the five methods, which indicate that the CNN (SELU) model was more robust and stable than the CNN (ReLU),
CNN (ELU), CSP+SVM, and LSTM models. The LSTM model was especially unstable, with a fluctuation of ±64.95% in the false negative rate.
5 DISCUSSION
In this paper, we designed CNN and LSTM models for MI-BCI analysis. CNN outperformed CSP+SVM in MI-BCI analysis. Compared with
CSP+SVM, CNN with SELU achieved a mean accuracy that was 9.05 percentage points higher. The confusion matrices and the paired t-tests of the accuracy
standard deviations also showed that CNN with SELU performed more stably within and between subjects.
However, why was the other DNN model, ie, the LSTM model, even outperformed by CSP+SVM? In Table 1, the mean accuracy of LSTM was
lower than that of CSP+SVM, but the paired t-test of the accuracies showed no significant difference between LSTM and CSP+SVM, suggesting that they performed
similarly in accuracy. LSTM performed similarly to the CSP+SVM and CNN models for the last six subjects and even outperformed CSP+SVM and
the CNN models for sub12 and sub13. Individual differences between subjects may therefore be what makes LSTM less accurate on average.
In the confusion matrices, LSTM was clearly more unstable than CSP+SVM and CNN. However, LSTM did not differ significantly
from CNN in the paired t-test of the accuracy standard deviations. It could be inferred that LSTM was stable within subjects but unstable
between subjects, which means that LSTM was not suitable for every subject. LSTM is usually applied in scenarios with common sequential
patterns, such as text comprehension and video comprehension, whereas CNN is usually applied in scenarios with common spatial patterns,
such as image classification. For the EEG signal, STFT transformed the signal into images that contained common spatial patterns, which led to the good
performance of CNN. However, LSTM could not turn these common spatial patterns into common sequential patterns for some subjects, which caused the
poor performance of LSTM.
In our study, we selected only three channels located in the central region for data analysis, fewer electrodes than
in other studies.9,38 This is convenient for wearable BCI applications: a small number of channels may help in developing easy-to-use detectors for practical
environments.
The CNN algorithm proved to be robust in off-line analysis, which implies that it is feasible to apply this method to online
MI-based model training. However, the time cost of training the classifier must be reduced in the experimental design. Consequently, we intend to employ
an intra-stable recognition model for every subject in future work.
6 CONCLUSION
This work has presented a comparison between CSP+SVM, CNN, and LSTM models for analyzing EEG signals in motor imagery tasks. We
found that the CNN model with SELU outperformed CSP+SVM and LSTM. However, because its performance varied between subjects, LSTM
achieved an even lower mean accuracy than CSP+SVM. This work showed that different DNN models (CNN and RNN) adapt differently to the task and discussed
the reasons for the poor performance of LSTM in classifying MI-EEG signals. As a result, the proposed CNN model is a potential classifier for
off-line EEG analysis in MI-BCI. Future work will be carried out to resolve the unstable performance of LSTM across subjects and to compare
the DNNs with methods such as common spatio-spectral pattern (CSSP), sub-band common spatial pattern (SBCSP), and filter
bank common spatial pattern (FBCSP).
ACKNOWLEDGEMENT
This research was supported in part by Shanghai NSF research project (Grant No. 16JC1401300) and Wuxi Technology Project (2016 No. 187).
REFERENCES
1. Birbaumer N, Ghanayim N, Hinterberger T, et al. A spelling device for the paralysed. Nature. 1999;398:297-298.
2. Blankertz B, Lemm S, Treder M, Haufe S, Müller KR. Single-trial analysis and classification of ERP components: a tutorial. Neuroimage. 2011;56:814-825.
3. Kübler A, Birbaumer N. Brain-computer interfaces and communication in paralysis: extinction of goal directed thinking in completely paralysed patients?
Clin Neurophysiol. 2008;119:2658-2666.
4. Pfurtscheller G, Neuper C. Future prospects of ERD/ERS in the context of brain-computer interface (BCI) developments. Prog Brain Res.
2006;159:433-437.
5. Muller-Putz GR, Pfurtscheller G. Control of an electrical prosthesis with an SSVEP-based BCI. IEEE Trans Biomed Eng. 2008;55(1):361-364.
6. Pfurtscheller G, Neuper C, Schlogl A, Lugger K. Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive
parameters. IEEE Trans Rehabil Eng. 1998;6(3):316-325.
7. Ramoser H, Muller-Gerking J, Pfurtscheller G. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans Rehabil Eng.
2000;8(4):441-446.
8. Anderson CW, Devulapalli SV, Stolz EA. Determining mental state from EEG signals using parallel implementations of neural networks. Sci Program.
1995;4(3):171-183.
9. Cao L, Xia B, Maysam O, Li J, Xie H, Birbaumer N. A synchronous motor imagery based neural physiological paradigm for brain computer interface speller.
Front Hum Neurosci. 2017;11:274. https://doi.org/10.3389/fnhum.2017.00274
10. Müller-Gerking J, Pfurtscheller G, Flyvbjerg H. Designing optimal spatial filters for single-trial EEG classification in a movement task. Clin Neurophysiol.
1999;110(5):787-798.
11. Ciregan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. Paper presented at: 2012 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR); 2012; Providence, RI.
12. Huang D, Lin P, Fei DY, Chen X, Bai O. Decoding human motor activity from EEG single trials for a discrete two-dimensional cursor control. J Neural Eng.
2009;6(4):1-12.
13. Pfurtscheller G, Neuper C, Flotzinger D, Pregenzer M. EEG-based discrimination between imagination of right and left hand movement.
Electroencephalogr Clin Neurophysiol. 1997;103(6):642-651.
14. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504-507.
15. Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18(7):1527-1554.
16. Olshausen BA, Field DJ. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Res. 1997;37(23):3311-3325.
17. Lee H, Battle A, Raina R, Ng AY. Efficient sparse coding algorithms. Paper presented at: Advances in Neural Information Processing Systems;
December 7–8, 2007; Vancouver, Canada.
18. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Paper presented at: Advances in Neural
Information Processing Systems; December 2–8, 2012; South Lake Tahoe, CA.
19. Mikolov T, Karafiát M, Burget L, Cernocky J, Khudanpur S. Recurrent neural network based language model. Paper presented at: Interspeech 3;
September 26–30, 2010; Chiba, Japan.
20. Krahenbuhl P, Doersch C, Donahue J, Darrell T. Data-dependent initializations of convolutional neural networks. arXiv preprint arXiv:1511.06856; 2015.
21. Ciresan D, Meier U, Masci J, Schmidhuber J. Multi-column deep neural network for traffic sign classification. Neural Netw. 2012;32:333-338.
22. Ciresan DC, Meier U, Masci J, Maria GL, Schmidhuber J. Flexible, high performance convolutional neural networks for image classification. Paper
presented at: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI); July 16–22, 2011; Barcelona, Spain.
23. Socher R, Perelygin A, Wu J, et al. Recursive deep models for semantic compositionality over a sentiment treebank. Paper presented at: Proceedings of
the 2013 Conference on Empirical Methods in Natural Language Processing; October 18–21, 2013; Seattle, WA.
24. Dos Santos CN, Gatti M. Deep convolutional neural networks for sentiment analysis of short texts. Paper presented at: COLING; August 23–29, 2014;
Dublin, Ireland.
25. Li K, Li X, Zhang Y, Zhang A. Affective state recognition from EEG with deep belief networks. Paper presented at: 2013 IEEE International Conference
on Bioinformatics and Biomedicine (BIBM); December 18–21, 2013; Shanghai, China.
26. Jirayucharoensak S, Pan-Ngum S, Israsena P. EEG-based emotion recognition using deep learning network with principal component based covariate
shift adaptation. Sci World J. 2014;2014:627892.
27. Cecotti H, Graser A. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans Pattern Anal Mach Intell.
2011;33(3):433-445.
28. Griffin D, Lim J. Signal estimation from modified short-time Fourier transform. IEEE Trans Acoust Speech Signal Process. 1984;32(2):236-243.
29. Tzallas AT, Tsipouras MG, Fotiadis DI. Epileptic seizure detection in EEGs using time–frequency analysis. IEEE Trans Inf Technol Biomed.
2009;13(5):703-710.
30. Sheikhani A, Behnam H, Mohammadi MR. Analysis of EEG background activity in Autism disease patients with bispectrum and STFT measure. Paper
presented at: Proceedings of the 11th WSEAS International Conference on Communications; July 26–28, 2007; Heraklion, Greece.
31. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278-2324.
32. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735-1780.
33. Nair V, Hinton GE. Rectified linear units improve restricted boltzmann machines. Paper presented at: Proceedings of the 27th International Conference
on Machine Learning (ICML'10); June 21–26, 2010; Haifa, Israel.
34. Clevert DA, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289;
2015.
35. Klambauer G, Unterthiner T, Mayr A, Hochreiter S. Self-normalizing neural networks. arXiv:1706.02515; 2017.
36. Bottou L. Large-scale machine learning with stochastic gradient descent. Paper presented at: Proceedings of COMPSTAT'2010; 2010; Paris, France.
37. Abadi M, Barham P, Chen J, et al. TensorFlow: A system for large-scale machine learning. Paper presented at: 12th USENIX Symposium on Operating
Systems Design and Implementation (OSDI); November 2–4, 2016; Savannah, GA.
38. Tang Z, Li C, Sun S. Single-trial EEG classification of motor imagery using deep convolutional neural networks. Optik. 2017;130:11-18.
How to cite this article: Wang Z, Cao L, Zhang Z, Gong X, Sun Y, Wang H. Short time Fourier transformation and deep neural networks
for motor imagery brain computer interface recognition. Concurrency Computat Pract Exper. 2018;e4413. https://doi.org/10.1002/cpe.4413