MULTI-CNN FEATURE FUSION FOR EFFICIENT EEG CLASSIFICATION

Syed Umar Amin*, Ghulam Muhammad†, Wadood Abdul†, Mohamed Bencherif†, and Mansour Alsulaiman†

*samin@ksu.edu.sa

*†The authors are with the Department of Computer Engineering, College of Computer and Information Sciences (CCIS), King Saud University, Riyadh 11543, Saudi Arabia. They are also with the Center of Smart Robotics Research, CCIS, King Saud University, Riyadh, Saudi Arabia.

ABSTRACT

Recently, motor imagery (MI) signals have been used in BCI systems for disabled persons, for controlling robots or wheelchairs, and even for driving cars. Hence, researchers are applying machine learning and deep learning techniques to decode MI signals. Properties of EEG signals, such as a low signal-to-noise ratio and their dynamic nature, make them complex and hard to decode. Research has shown that EEG has both spatial and temporal characteristics which can be exploited by deep learning models like convolutional neural networks (CNN). This paper shows that a multilevel CNN model can extract dynamic correlations from EEG MI data. Multilevel CNN models are fused using an autoencoder to extract the best features for EEG data, which helps improve MI decoding accuracy. The proposed fusion method performs well for EEG decoding when tested on within-subject trials as well as cross-subject trials. The results are compared with state-of-the-art techniques on the public EEG dataset BCI Competition IV 2a. A novel cross-encoding technique is also proposed which achieves a large improvement in cross-subject EEG decoding.

Index Terms—Deep learning, convolutional neural network, EEG motor imagery decoding, feature fusion

1. INTRODUCTION

Brain-computer interfaces (BCI) [1]-[3] are used to communicate between our brains and devices [4]. Electroencephalography (EEG) is a non-invasive brain activity recording which is inexpensive and easy to record. Multiple electrodes placed on specific portions of the scalp are used to record the EEG. EEG has a high temporal resolution, down to the range of milliseconds. Such high temporal resolution is not possible even with high-end imaging techniques like magnetic resonance imaging (MRI) or computed tomography (CT). Therefore, EEG is still widely used in research and medical systems to study brain function and related disorders.

Motor imagery (MI) brain signals are flexible EEG signals [5], [6] that have recently been used extensively in research to show differences in the various activities of the brain. These signals reflect the brain activity produced while a person imagines performing a movement of a body part. Machine learning and deep learning techniques [4, 5] are used to discriminate MI activity produced during different tasks. These signals are recorded from the scalp over the sensorimotor cortex region of the brain [2].

Machine learning-based methods have been used in the past by researchers to decode EEG signals; these methods are based on the extraction of handcrafted features. BCI systems employing machine learning have been used for stroke rehabilitation [6], for communication [7], and to communicate with robots or devices like autonomous wheelchairs [6]. Recent BCI studies have used deep learning methods to make intelligent machines and robots by transferring cognitive behavior from the human brain [8]. The characteristics of EEG, such as low spatial resolution and a low signal-to-noise ratio, make feature extraction and classification challenging. Machine learning techniques using handcrafted features have had limited success in decoding EEG. Deep learning models have been quite successful in EEG decoding and have reached state-of-the-art performance, although the accuracy remains low. Automated deep features have been able to provide greater insight into these complex brain signals. Different deep learning models have achieved the best performance in fields like image classification [9], speech detection [10], and forgery detection [9, 23]. CNNs are good at extracting spatial features [11]. RNNs are good for video and audio applications [12] where temporal features are involved. Autoencoders are better suited for unsupervised learning [13].

Recent studies have employed deep learning models such as CNNs and deep belief networks (DBN) for EEG data [11-15], and techniques like transfer learning and pretraining have been applied to EEG datasets because training samples are limited. Due to these issues and the characteristics of EEG, more research and customized deep learning models are needed for EEG.

Many types of CNN-based models have produced good results for image applications. Feature fusion and aggregation techniques have been successful in particular. The depth and

978-1-7281-1485-9/20/$31.00 ©2020 IEEE


Authorized licensed use limited to: Carleton University. Downloaded on June 21,2023 at 11:12:53 UTC from IEEE Xplore. Restrictions apply.

filter size of a CNN have a varying effect on the extracted filters; therefore, the architecture and depth play an important role in extracting robust features [16-22]. Although many studies have used deep learning-based methods for EEG decoding, the improvement in performance over conventional machine learning models has been limited [40-42].

This study proposes a feature fusion model where deep CNN and shallow CNN features are fused with the help of autoencoders. Shallow and deep CNNs with different filter sizes can extract different types of EEG features, which are then fused using autoencoders. The proposed model achieves better performance than the best CNN models for EEG decoding. Our method shows that robust CNN feature extraction depends on depth, architecture, and filter sizes.

The remainder of this paper is organized as follows. Section 2 gives an overview of studies related to EEG classification. Section 3 presents the proposed multi-CNN fusion method, and Section 4 presents the results. We provide a conclusion in Section 5.

2. RELATED STUDY

Many studies employ conventional machine learning techniques for MI classification. The best-known method among them is filter bank common spatial patterns (FBCSP) [4, 5], which achieved the best result for MI decoding. Many researchers have used the support vector machine (SVM) for classification and achieved good results [26-28]. Some researchers have also used principal component analysis (PCA) and independent component analysis (ICA) for reducing dimensionality and removing noise [16-17].

Many studies have started applying deep learning models such as CNNs, DBNs, and autoencoders to achieve good results for MI classification. CNNs have been widely used for feature extraction and MI classification tasks [19-22]. Some studies have used DBNs for temporal features, as research shows that EEG is a time series [24-26]. Some studies have employed both CNNs and RNNs for spatial and temporal feature extraction [20, 21]. SVM has also been used as a classifier together with DBN [24]. One study combined handcrafted CSP features with CNN features, fusing handcrafted and deep features to obtain good results [26]. CNNs and autoencoders were combined for EEG-based emotion recognition [27]. In one study, the authors transformed the EEG into images, and a CNN was used for image classification. Another study extracted the mu and beta bands from the EEG signal for MI classification and employed a stacked autoencoder (SAE) with a CNN [28, 29].

All the above deep learning-based methods improved classification results. In this study, we improve the MI decoding accuracy further by fusing CNNs using autoencoders.

3. MULTI-CNN FEATURE FUSION

The proposed model consists of two CNN models and an autoencoder for feature fusion. The CNN models are a shallow CNN with one convolution layer and a deep CNN with four convolution layers. First, each CNN model is pretrained on another MI database and then trained. When training is done, both CNNs are fused using an autoencoder. After the autoencoder, a softmax layer is used for classification.

Both the shallow and deep CNNs have blocks of convolution and pooling layers. The convolution layers learn spatial features, and the pooling layers are used for dimensionality reduction. The literature suggests that for EEG decoding, CNNs with one to four convolution layers give good results; very deep models do not work well for EEG [21-25]. Many studies have used CNNs with just one or two layers [33-34].

In both CNN models, the first convolution is a logical one that is divided into two convolution steps. Most EEG recordings consist of multiple channels, sometimes up to 128. The split convolution technique can manage multiple channels: the first convolution step is performed through time, and the second is performed spatially across all channels. The net effect is therefore a convolution through time and all channels. The EEG data is organized in a 2-D manner, with rows of channels and columns of samples, and the divided first convolution favors this input organization. The convolution step performed across time samples can extract temporal characteristics, while the convolution performed through all channels can learn spatial features.

In this study, we use the BCI Competition IV dataset 2a (BCID) [31]; many recent studies have been evaluated on this database. The BCI dataset has a limited number of training samples, so in this study the CNN models are pretrained on another dataset called the High Gamma Dataset (HGD) [33]. The BCI dataset is a motor imagery challenge dataset used by many researchers. It has 22-channel recordings from 9 healthy subjects recorded in two sessions, where each session has 288 trials, each four seconds long. The tasks consist of imagining movement of the right hand, left hand, feet, and tongue [31].

The input EEG signal is cropped with a 2 s sliding window and then given to the CNN model as input. Cropping increases the number of training samples, helps improve classification accuracy, and also prevents overfitting on small EEG datasets. The sampling frequency is 250 Hz, which gives around 1000 samples for a 4 s trial.

The split convolution is applied through time samples and then through all channels, as shown in Figure 1. After each convolution, we apply a nonlinearity and max-pooling, and at the end we apply the softmax layer. Batch normalization and dropout helped us increase decoding accuracy. We used exponential linear units (ELU) for activation.


Fig. 1. CNN-Deep with four layers

The architecture of the CNNs is given in Table 1. We used the Adam algorithm for optimization, which is a good optimizer for high-dimensional data like EEG. After training, the CNNs are optimized jointly.

Table 1. Structure of CNN models

Shallow CNN:
  Convolution (25×1, 100 filters)
  Convolution (1×22, 100 filters)
  Max Pool (3×1, stride 3)
  Fully Connected (1024)
  Softmax (4 classes)

Deep CNN:
  Convolution (10×1, 100 filters)
  Convolution (1×22, 100 filters)
  Max Pool (3×1, stride 3)
  Convolution (10×1, 100 filters)
  Max Pool (3×1, stride 3)
  Convolution (10×1, 100 filters)
  Max Pool (3×1, stride 3)
  Convolution (10×1, 200 filters)
  Max Pool (3×1, stride 3)
  Fully Connected (1024)
  Softmax (4 classes)

The fusion is performed using autoencoders, which are trained separately by freezing the CNN parameters. We fuse the shallow CNN and the deep CNN using an autoencoder. The CNNs were pretrained on the HGD dataset; then the softmax and dense layers were removed, and the CNNs were fused by concatenating their pool features. The multi-CNN fusion model is shown in Figure 2. The fusion model is trained in both a subject-specific and a cross-subject manner.

EEG is dynamic, so it changes from subject to subject, and trials also differ within a subject. Hence, we need robust features that can discriminate subjects and trials. Our study proposes training the autoencoder in a novel way to learn features for different subjects. Autoencoder cross-encoding and pretraining used in [37] achieved good results. The concatenated features are given to the autoencoder, which reconstructs the feature set for the same trial in subject-specific training; in the cross-subject encoding technique, the autoencoder instead reconstructs a trial from a different subject. In this process, the fusion model can extract a discriminative feature set for MI decoding.

The cross-subject encoding also helps increase the number of training samples. At the end, a softmax function is used to classify the feature set. The cross-encoding method is shown in Figure 3.

Fig. 2. Multi-CNN Feature Fusion Model
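The fusion stage can be summarized with a minimal numpy sketch. The dimensions below (1024-dimensional pool features, a 256-unit code, ReLU encoder) are illustrative assumptions, not the paper's exact sizes: pooled features from the two frozen CNNs are concatenated, an autoencoder is trained to reconstruct the concatenated vector, and a softmax head classifies the learned code.

```python
import numpy as np

rng = np.random.default_rng(1)
f_shallow = rng.standard_normal(1024)     # pool features from the shallow CNN
f_deep = rng.standard_normal(1024)        # pool features from the deep CNN
fused = np.concatenate([f_shallow, f_deep])          # (2048,)

# Autoencoder: encode to a smaller code, decode to reconstruct `fused`.
W_enc = rng.standard_normal((256, 2048)) * 0.01
W_dec = rng.standard_normal((2048, 256)) * 0.01
code = np.maximum(W_enc @ fused, 0)                  # encoder (ReLU assumed)
recon = W_dec @ code                                 # decoder output
recon_loss = np.mean((recon - fused) ** 2)           # minimized during training

# Classification head on the learned code: softmax over the 4 MI classes.
W_cls = rng.standard_normal((4, 256)) * 0.01
logits = W_cls @ code
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(code.shape, probs.shape)             # (256,) (4,)
```

The weights here are random placeholders; in the paper's setting the autoencoder is trained with the CNN parameters frozen, and the softmax layer is trained on the resulting codes.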


4. EXPERIMENTAL RESULTS

The proposed model was built using the PyTorch framework, and the MI preprocessing was done using MNE-Python. The CNN model in [33] acted as our baseline for the evaluation of the results. The accuracy comparison of the proposed model with state-of-the-art methods is presented in Table 2; it can be seen that the proposed fusion model gave better results. The proposed fusion model gave 74.8% accuracy on the BCI dataset when trained in a subject-specific manner, which is higher than the other methods.

This study also reports an improvement in cross-subject accuracy; cross-subject classification results have not been reported by other research on this dataset. We ran cross-subject experiments on the other models as well and compared the results in Table 3. Autoencoder cross-encoding gave us a good improvement over the other models. Cross-trial training of the autoencoder increased the number of training samples and helped the fusion model learn discriminative features. We achieved 51.5% cross-subject accuracy, which is an excellent improvement over the other models.

Fig. 3. Cross-Encoding

Table 3. Results for cross-subject testing

Methods                              | Accuracy (BCI) | Accuracy (HGD)
FBCSP [5]                            | 38.0%          | 65.2%
CNN (separable convolutions) [26]    | 40.0%          | -
CNN (cropped training) [33]          | 41.0%          | 69.5%
FBCSP and CNN [34]                   | 44.4%          | -
Multi-CNN fusion (proposed method)   | 51.5%          | 75.2%
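The cross-encoding idea can be illustrated by how training pairs for the autoencoder are built. This is a hypothetical sketch; the array sizes and the exact pairing rule are our assumptions. In subject-specific training the reconstruction target is the same trial's fused features, while in cross-encoding the target is a same-class trial from a different subject, which multiplies the number of training pairs.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_trials, feat = 9, 4, 16                   # toy sizes for illustration
feats = rng.standard_normal((n_subj, n_trials, feat))  # fused features per trial
labels = np.tile(np.arange(4), (n_subj, 1))            # one trial per MI class

def cross_pairs(feats, labels):
    """Yield (input, target) feature pairs: same class, different subjects."""
    pairs = []
    for s in range(feats.shape[0]):
        for t in range(feats.shape[1]):
            for s2 in range(feats.shape[0]):
                if s2 == s:
                    continue
                # target: a trial of the same class from another subject
                t2 = int(np.where(labels[s2] == labels[s, t])[0][0])
                pairs.append((feats[s, t], feats[s2, t2]))
    return pairs

pairs = cross_pairs(feats, labels)
# 9 subjects x 4 trials x 8 other subjects = 288 training pairs, versus
# only 36 same-trial pairs available in subject-specific training.
print(len(pairs))                                    # 288
```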

Table 2. Results for subject-specific testing

Methods                              | Accuracy (BCI) | Accuracy (HGD)
FBCSP [5]                            | 68.0%          | 91.2%
CNN (1D) and SAE [35]                | 70.0%          | -
CNN (separable convolutions) [26]    | 69.0%          | -
CNN (cropped training) [33]          | 72.0%          | 92.5%
FBCSP and CNN [34]                   | 74.4%          | -
CNN fusion [37]                      | 74.5%          | -
Multi-CNN fusion (proposed method)   | 74.8%          | 93.4%

The shallow CNN and the deep CNN [33] gave 73.0% and 71.0% MI classification accuracy, respectively. A CNN used with an SAE [35] gave 70% accuracy. Another model, EEGNet [26], based on a compact CNN, got 69% accuracy. Handcrafted features based on common spatial patterns and filter bank CSP [5] achieved 68% accuracy. FBCSP features were fused with CNN features in [34] to give 74.4% accuracy. In another study [37], fusion of CNN layers was proposed, and the method reached 74.5% accuracy. The multi-CNN fusion model proposed in this study gave a better subject-specific accuracy of 74.8% and 93.4% for the BCI and HGD data, respectively.

5. CONCLUSION

This study proposed a multilevel CNN fusion model for MI classification. The feature fusion was performed in subject-specific as well as cross-subject settings. Pool-layer features from the shallow and deep CNN models were fused and given to an autoencoder for the reconstruction of discriminative features. The results show that CNNs perform better than conventional machine learning models such as FBCSP. Pretraining helped improve accuracy and prevented the model from overfitting.

Cross-subject autoencoder training helped in finding discriminative features for the same subject as well as for different subjects. The fusion model was therefore able to find differences between trials belonging to the same subject and trials belonging to different subjects.

The subject-specific classification results were better than the best available methods for MI decoding, and the cross-subject classification accuracy achieved was the only result available for testing the system across subjects.

The proposed fusion model shows that with customized fusion, CNN models can achieve good results and extract spatially invariant features from MI data; hence, such fusion models can also be applied to other types of EEG signals. In the future, we would like to experiment with fusing CNNs with other deep learning models such as LSTM.


ACKNOWLEDGMENT

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number (DRI-KSU-1354).

6. REFERENCES

[1] L. J. Greenfield, J. D. Geyer, and P. R. Carney, "Reading EEGs: A Practical Approach", Lippincott Williams & Wilkins, 2012.
[2] G. Pfurtscheller and F. H. Lopes da Silva, "Event-related EEG/MEG synchronization and desynchronization: basic principles", Clin. Neurophysiol., 110:1842–1857, 1999.
[3] J. Müller-Gerking, G. Pfurtscheller and H. Flyvbjerg, "Designing optimal spatial filters for single-trial EEG classification in a movement task", Clin. Neurophysiol., 110:787–798, 1999.
[4] M. Grosse-Wentrup and M. Buss, "Multiclass common spatial patterns and information theoretic feature extraction", IEEE Trans. Biomed. Eng., 55:1991–2000, 2008.
[5] K. K. Ang, Z. Y. Chin, C. Wang, C. Guan and H. Zhang, "Filter bank common spatial pattern algorithm on BCI competition IV datasets 2a and 2b", Front. Neurosci., 6:39, 2012.
[6] L. Tonin, T. Carlson, R. Leeb and J. Millán, "Brain-controlled telepresence robot by motor-disabled people", 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4227–4230.
[7] M. S. Hossain, et al., "Applying Deep Learning to Epilepsy Seizure Detection and Brain Mapping," ACM Transactions on Multimedia Computing, Communications and Applications, vol. 15, no. 1s, Article 10, February 2019.
[8] S. U. Amin, et al., "Cognitive Smart Healthcare for Pathology Detection and Monitoring," IEEE Access, vol. 7, pp. 10745-10753, 2019.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Advances in Neural Information Processing Systems 25, pp. 1097–1105, 2012.
[10] M. S. Hossain, M. A. Rahman, and G. Muhammad, "Cyber-physical cloud-oriented multi-sensory smart home framework for elderly people: An energy efficiency perspective," Journal of Parallel and Distributed Computing, vol. 103, pp. 11-21, May 2017.
[11] G. Muhammad, M. F. Alhamid, M. Alsulaiman, and B. Gupta, "Edge Computing with Cloud for Voice Disorders Assessment and Treatment," IEEE Communications Magazine, vol. 56, no. 4, pp. 60-65, April 2018.
[12] M. Masud, M. S. Hossain, and A. Alamri, "Data Interoperability and Multimedia Content Management in e-health Systems," IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 6, pp. 1015-1023, Nov. 2012.
[13] M. S. Hossain, G. Muhammad, and A. Alamri, "Smart healthcare monitoring: a voice pathology detection paradigm for smart cities," ACM/Springer Multimedia Systems, vol. 25, no. 5, pp. 565-575, October 2019.
[14] G. Muhammad, et al., "Automatic Seizure Detection in a Mobile Multimedia Framework," IEEE Access, vol. 6, pp. 45372-45383, 2018.
[15] S. M. Plis, R. D. Hjelm, R. Salakhutdinov, et al., "Deep learning for neuroimaging: a validation study", Frontiers in Neuroscience, 8:1–11, August 2014.
[16] A. Ghoneim, et al., "Medical Image Forgery Detection for Smart Healthcare," IEEE Communications Magazine, vol. 56, no. 4, pp. 33-37, April 2018.
[17] Z. Ali, G. Muhammad, and M. F. Alhamid, "An Automatic Health Monitoring System for Patients Suffering from Voice Complications in Smart Cities," IEEE Access, vol. 5, pp. 3900-3908, 2017.
[18] M. S. Hossain, et al., "Improving consumer satisfaction in smart cities using edge computing and caching: A case study of date fruits classification," Future Generation Computer Systems, vol. 88, pp. 333-341, 2018.
[19] G. Muhammad, et al., "Formant analysis in dysphonic patients and automatic Arabic digit speech recognition," BioMedical Engineering OnLine, 10:41, 2011.
[20] H. Cecotti and A. Graser, "Convolutional neural networks for P300 detection with application to brain-computer interfaces", IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3):433–445, 2011.
[21] N. Guler, E. Ubeyli and I. Guler, "Recurrent neural networks employing Lyapunov exponents for EEG signals classification", Expert Systems with Applications, 29(3):506–514, 2005.
[22] P. Thodoroff, J. Pineau and A. Lim, "Learning robust features using deep learning for automatic seizure detection", Machine Learning for Healthcare Conference, 2016.
[23] H. Yang, S. Sakhavi, K. K. Ang and C. Guan, "On the use of convolutional neural networks and augmented CSP features for multi-class motor imagery of EEG signals classification", 2015 37th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2620–2623.
[24] X. An, D. Kuang, X. Guo, Y. Zhao and L. He, "A Deep Learning Method for Classification of EEG Data Based on Motor Imagery", Intelligent Computing in Bioinformatics, ICIC 2014, Lecture Notes in Computer Science, vol. 8590, Springer, Cham, 2014.
[25] H. Yang, S. Sakhavi, K. K. Ang and C. Guan, "On the use of convolutional neural networks and augmented CSP features for multi-class motor imagery of EEG signals classification", 2015 37th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2620–2623.
[26] V. Lawhern et al., "EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces," Journal of Neural Engineering, 15(5), 2018.
[27] G. Muhammad, et al., "A Facial-Expression Monitoring System for Improved Healthcare in Smart Cities," IEEE Access, vol. 5, pp. 10871-10881, 2017.
[28] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
[29] Y. Bengio, P. Lamblin, D. Popovici and H. Larochelle, "Greedy layer-wise training of deep networks", NIPS'06: Proceedings of the 19th International Conference on Neural


Information Processing Systems, pp. 153–160, December 2006.
[30] M. Chen, et al., "Edge-CoCaCo: Toward Joint Optimization of Computation, Caching, and Communication on Edge Cloud," IEEE Wireless Communications, vol. 25, no. 3, pp. 21-27, June 2018.
[31] C. Brunner, R. Leeb, G. Muller-Putz, A. Schlogl and G. Pfurtscheller, "BCI Competition 2008–Graz data set A and B", Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology, pp. 136–142.
[32] R. T. Canolty, E. Edwards, S. S. Dalal, M. Soltani, S. S. Nagarajan, H. E. Kirsch, M. S. Berger, N. M. Barbaro and R. T. Knight, "High gamma power is phase-locked to theta oscillations in human neocortex", Science, 313:1626–1628, 2006.
[33] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann and T. Ball, "Deep learning with convolutional neural networks for EEG decoding and visualization", Hum. Brain Mapp., 38:5391–5420, 2017.
[34] S. Sakhavi, C. Guan and S. Yan, "Learning Temporal Information for Brain-Computer Interface Using Convolutional Neural Networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5619-5629, Nov. 2018.
[35] Y. R. Tabar and U. Halici, "A novel deep learning approach for classification of EEG motor imagery signals", Journal of Neural Engineering, 14(1):016003, 2016.
[36] S. Stober, "Learning discriminative features from electroencephalography recordings by encoding similarity constraints," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, pp. 6175-6179.
[37] S. U. Amin, M. Alsulaiman, G. Muhammad, M. A. Bencherif and M. S. Hossain, "Multilevel Weighted Feature Fusion Using Convolutional Neural Networks for EEG Motor Imagery Classification," IEEE Access, vol. 7, pp. 18940-18950, 2019.
