Convolutional Neural Network for Hybrid fNIRS-EEG Mental Workload Classification
1 Introduction
Human-machine interfaces (HMIs) are widely used in everyday life, and their navigation imposes high cognitive demands on the operator's brain. This mental workload (MWL) affects the operator's interaction with computers and other devices. MWL may compromise the user's functioning, and sometimes safety, by increasing fault rates and reaction times, introducing fatigue [1], and causing the neglect of critical information known as cognitive tunneling [2]. Therefore, taking the operator's cognitive characteristics and condition into consideration is an integral step in improving the design of HMIs by installing adaptive features that can adjust to MWL changes [1]. We
consider two essential aspects of a protocol for measuring MWL: selecting the data to
be acquired and analyzed, and developing an algorithm for event-related classification.
To measure MWL, brain activity is typically recorded using the conventional
method of Electroencephalography (EEG), or more recently using functional Near
Infrared Spectroscopy (fNIRS) [3]. EEG measures voltage fluctuations resulting from ionic currents within the neurons of the brain. These electrical measurements provide high temporal resolution, but they are highly susceptible to electrical noise and motion artifacts, e.g., eye blinks or muscle movement. fNIRS, on the other hand, monitors the
hemodynamic responses that follow neuronal activity. fNIRS is less susceptible to
electrical noise and movement artifacts yet has lower temporal resolution and low
depth resolution. However, the higher spatial resolution offered by fNIRS provides a
better indication of which part of the cortex is activated. Integration of EEG and fNIRS
provides us with two different sources of data that are associated with the same neuronal
activities but are sensitive to different types of events. We can leverage the comple-
mentary temporal and spatial resolution of the two data types; while EEG identifies the
cortical reaction to a specific stimulus with relatively fine temporal resolution, fNIRS
measures hemodynamic changes that arise from neuronal activity to determine the
location of the reaction.
The classification of cognitive events based on fNIRS and EEG signals requires
solving high-dimensional pattern classification problems with a relatively small number
of training patterns. Conventional classification methods require that a priori feature
selection be performed before the model can be trained [4, 5]. Such feature selection is
a difficult and heavily studied problem; the large number of possible features and
relatively small amount of available data introduce the curse of dimensionality and the
possibility of overfitting. Deep Neural Networks (DNNs), and in particular Convolu-
tional Neural Networks (CNNs), allow for classifiers that bypass the need for feature
selection, since both are well suited for learning from raw data and hence recordings
can be fed directly to the algorithms for training.
CNNs take as input two-dimensional images, which differ in structure from the
neural time series obtained on the scalp surface using EEG and fNIRS. Therefore, both the existing CNN architectures and the fNIRS-EEG recordings must be adapted before such data can serve as CNN input. Furthermore, since the number of samples in the fNIRS-EEG
recordings is relatively small, avoiding underfitting or overfitting is a primary challenge. In this study, we aim to focus on how CNNs of different architectures can be designed and trained for end-to-end learning from fNIRS (or fNIRS-EEG) signals
recorded in human subjects. An input adaptation method, an efficient network structure,
and methods to overcome overfitting are introduced. The performance of a CNN
classifier for a 4-class mental workload task classification is evaluated, and the impact
of design choices (e.g., the overall network architecture and type of nonlinearity used)
on classification accuracy are investigated.
The remainder of the paper is organized as follows. Section 2 provides background
on CNNs and on published results related to the work presented here. After a
description of the dataset specifications, the preprocessing and input adaptation
methods are explained in Sects. 3 and 4. The deep neural networks and the proposed Convolutional Neural Network, including the network architecture and hyperparameter configuration, are described in Sect. 5. Finally,
classification results and discussion conclude the paper in Sect. 6.
Fig. 1. The proposed network architecture: a CNN with three convolutional layers (each followed by max-pooling and dropout layers), two fully connected layers and a Softmax output layer.
224 M. Saadati et al.
This study makes use of a dataset collected at the Technische Universität Berlin by
Shin et al. [20, 21] in 2017. The dataset includes simultaneous EEG (30 sensors) and
fNIRS (36 channels) recordings of the scalp for mental workload during n-back (0-, 2-
and 3-back) tasks. These tasks are classified into four possibilities: 0-, 2- and 3-back
tasks, and rest. The study included twenty-six right-handed healthy participants. The
dataset consists of three sessions, where each session contained three series of 0-, 2-,
and 3-back tasks. Hence, nine series of n-back tasks were performed for each participant. A single series included a 2 s instruction showing the type of the task (0-, 2- or 3-back), a 40 s task period, and a 20 s rest period. A total of 180 trials were performed
for each n-back task (20 trials, 3 series, 3 sessions). All EEG and fNIRS signals were
recorded simultaneously.
Thirty EEG active electrodes were distributed on the entire scalp according to the 5-10 system. Sixteen sources and sixteen detectors were placed at frontal, motor, parietal,
and occipital areas. The locations of the EEG and fNIRS channels and details of the
sequence of the tasks can be found in [20].
The fNIRS signals are first filtered to attenuate physiological noise such as heartbeat pulsation and respiration. The HbR and HbO signals are normalized by dividing them by the maximum value of the window samples. To calculate the HbR and HbO changes relative to rest, the difference between the HbR and HbO of the window and the mean of the HbR and HbO of the rest period (base) prior to the window is calculated and adapted to build images to feed to the CNN: ΔHbO = HbO_window − mean(HbO_base), and similarly for ΔHbR.
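As a concrete sketch of this baseline correction, the function below computes the change in a hemoglobin signal relative to the rest-period mean. The function name, array shapes, and the exact ordering of baseline subtraction versus normalization are illustrative assumptions, not taken from the original pipeline:

```python
import numpy as np

def delta_hb(window, baseline):
    """Baseline-correct one hemoglobin (HbO or HbR) window.

    window:   (n_samples, n_channels) samples in the task window
    baseline: (n_base, n_channels) samples from the rest period before the window

    Subtracts the rest-period mean per channel, then normalizes by the
    maximum absolute value of the window samples (one plausible reading
    of the normalization described in the text).
    """
    delta = window - baseline.mean(axis=0)   # change relative to rest
    peak = np.abs(delta).max()
    return delta / peak if peak > 0 else delta
```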
CNNs, like other supervised machine learning algorithms, require training data in the form of labeled examples. All recordings in the dataset described above are labeled as one of four classes. The EEG recordings are downsampled to 10 Hz to match the
sampling frequency of the fNIRS data. Data matrices form a set of N input images; the
length of the image is equal to the number of samples in the time window, and N
depends on the window length. Due to the low sampling rate of the fNIRS data, the
dataset is relatively small, and the length of the window must be chosen carefully. The window should not be so long that N becomes small enough to cause overfitting, nor so short that accuracy is compromised. The image width varies depending on which recordings are used. For the full hybrid fNIRS-EEG data, the width is 102 (36 + 36 + 30) columns, which include the average ERD/ERS of the EEG recordings and the ΔHbO and ΔHbR of the fNIRS recordings in the desired time window. In order to compare the classification accuracy of the full hybrid fNIRS-EEG system to that of standalone fNIRS and EEG, four other cases are tested: the combined EEG and ΔHbO, standalone EEG, the combined ΔHbO and ΔHbR, and standalone ΔHbO. The widths of the images corresponding to each of these cases are presented in Table 1. The input image shape for the full hybrid EEG-fNIRS system is represented in Fig. 1.
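One plausible way to assemble such input images is sketched below: the three feature groups are stacked column-wise to width 36 + 36 + 30 = 102 and segmented into non-overlapping windows. The windowing scheme and all names here are illustrative assumptions:

```python
import numpy as np

def build_images(delta_hbo, delta_hbr, eeg_erd, fs=10, win_s=3):
    """Stack fNIRS and EEG features into CNN input 'images'.

    delta_hbo, delta_hbr: (n_samples, 36) fNIRS features at 10 Hz
    eeg_erd:              (n_samples, 30) ERD/ERS of EEG, downsampled to 10 Hz

    Returns (N, win, 102): N non-overlapping windows of win = fs * win_s
    samples, each 102 columns wide as in the full hybrid case.
    """
    feats = np.concatenate([delta_hbo, delta_hbr, eeg_erd], axis=1)  # (n, 102)
    win = fs * win_s                       # samples per window
    n_win = feats.shape[0] // win          # N depends on the window length
    return feats[:n_win * win].reshape(n_win, win, feats.shape[1])
```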
A CNN with three convolutional layers (including max-pooling and dropout layers
after each convolutional layer) and two fully connected layers is adapted to suit the
given dataset. Its structure is shown in Fig. 1. The output layer consists of four units,
with a Softmax activation representing a probability distribution over the classes.
Dropout, as a regularization technique to improve generalization, randomly sets the
output of units in the network to zero during training, preventing those units from
affecting the output or the gradient of the loss function for an update step [6]. This
ensures that the network does not overfit by depending on specific hidden units. There
are 32 3 × 3 kernels in each convolutional layer. The two fully connected layers have 256 and 128 neurons, respectively. This architecture was selected after experimenting with many different architectures, both shallower and deeper. The change in image size as the data passes through the CNN is depicted in Fig. 1. For the full hybrid EEG-fNIRS classification, the input image size is 30 × 112; after three convolutional layers with a filter size of 3 × 3, the final image is 4 × 13.
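A forward pass through a network of this shape can be sketched in pure NumPy. Layer counts and sizes (three blocks of 32 3 × 3 kernels with max-pooling, dense layers of 256 and 128 units, Softmax over 4 classes) follow the text; the random untrained weights, 'valid' convolutions, and the omission of dropout at inference are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(x, kernels):
    """x: (H, W, C_in); kernels: (K, 3, 3, C_in). 'Valid' 3x3 convolution."""
    H, W, _ = x.shape
    K = kernels.shape[0]
    out = np.zeros((H - 2, W - 2, K))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = x[i:i + 3, j:j + 3, :]
            out[i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def maxpool2(x):
    """Non-overlapping 2x2 max-pooling, dropping odd trailing rows/columns."""
    H, W, C = x.shape
    x = x[:H // 2 * 2, :W // 2 * 2]
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(img, n_classes=4):
    """Three conv blocks (32 kernels each) + dense 256/128 + Softmax over classes."""
    x = img[..., None]                          # add channel axis
    for _ in range(3):
        k = 0.01 * rng.standard_normal((32, 3, 3, x.shape[-1]))
        x = maxpool2(elu(conv2d_valid(x, k)))   # dropout omitted at inference
    x = x.ravel()
    for units in (256, 128):
        W = 0.01 * rng.standard_normal((units, x.size))
        x = elu(W @ x)
    Wo = 0.01 * rng.standard_normal((n_classes, x.size))
    return softmax(Wo @ x)
```

With a 30 × 112 input this sketch returns a probability distribution over the four classes; note that with 'valid' convolutions and 2 × 2 pooling the exact intermediate sizes depend on padding choices not specified in the paper.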
We apply two of the most recently introduced activation functions and compare their performance in this application. The activation functions we consider are the rectified linear unit (ReLU), which has been shown to dampen the vanishing gradient problem due to its non-saturating properties [10], and the exponential linear unit (ELU), chosen for its fast cost convergence and more accurate results.
Fig. 2. Processing of the data: after filtering, the fNIRS data is squared, then ΔHbR and ΔHbO are calculated. The EEG data is downsampled to 10 Hz, and ERD/ERS analysis is performed on the downsampled data. Finally, the data are fed to the CNN algorithm to train the network.
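The two activation functions compared above can be written directly; this is their standard definition, not code from the paper:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x); non-saturating for x > 0."""
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha*(exp(x)-1) otherwise,
    so negative inputs saturate smoothly toward -alpha instead of a hard zero."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```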
The entire classification process flow is shown in Fig. 2. After filtering, the fNIRS data is squared, then ΔHbR and ΔHbO are calculated. The EEG data is downsampled to 10 Hz, and ERD/ERS analysis is performed on the downsampled data. Finally, the data are fed to the CNN algorithm to train the network.
Single-subject classification accuracies for all the examined combinations of recordings (EEG, fNIRS (ΔHbO and ΔHbR), fNIRS (ΔHbO), EEG + fNIRS (ΔHbO and ΔHbR), EEG + fNIRS (ΔHbO)) and classifiers (SVM and CNN) are reported in Table 2. Average accuracy is computed across the 26 subjects. Similar to previous investigations comparing classification performance using standalone fNIRS, standalone EEG, and hybrid EEG-fNIRS with other machine learning methods such as SVM and LDA, a clear increase in CNN accuracy can be seen when fNIRS is employed together with EEG compared to standalone modalities (average EEG-CNN accuracy 69%, average fNIRS-CNN accuracy 82%, average fNIRS + EEG-CNN accuracy 89% at iteration 1900).
Fig. 3. Average and standard deviation of classification accuracy for each scenario considered.
Fig. 4. Average recall (a) and classification accuracy (b) in the training process for the ELU and ReLU activation functions.
Using both ΔHbO and ΔHbR yields a higher average accuracy than ΔHbO alone (81% compared to 71%). fNIRS shows a higher average accuracy than EEG and than the combination of EEG and ΔHbO. The last two columns of the table compare the
classification accuracy for SVM to CNN for the full hybrid scenario. CNN achieves
better classification performance than SVM, reaching an average 89% accuracy which
shows a 7% improvement. Figure 3 summarizes the average accuracy and standard
deviation for all of the modalities considered. EEG shows the highest standard devi-
ation, and full hybrid modality shows the lowest. Applying CNN on the fNIRS data
and SVM on the hybrid system yield very similar average accuracies.
To investigate the impact of the time window size on classification accuracy, we
calculated the average and standard deviation of the accuracy for three methodologies
with the highest performance. We considered four window lengths ranging from 2 to 5 s; the results are shown in a bar chart in Fig. 5. A 2 s window yields the lowest accuracy for all three modalities, likely because of the small dimensions of the window. The 3 s window yields the highest accuracy and is used for the rest of the analysis. The effect of the window length is negligible for EEG data, since the higher sampling frequency results in a large number of samples even in a small window.
Fig. 5. Impact of the time window size; average and standard deviation of classification
accuracy for the three methodologies with the best classification performance.
Minimum, maximum, standard deviation, and recall for classification accuracy are
computed for each participant for the CNN method and full hybrid system, and results
appear in Table 3. The lowest single-subject Min accuracy is 71%, while the average Min is 81%. The highest single-subject Max accuracy reaches 99%, while the average Max is 91%. The average recall is 82%, similar to the recall of the SVM method.
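Per-subject summaries of this kind can be computed as below. The function reproduces only the min/max/std columns (recall would additionally require per-class predictions); the input layout and names are illustrative assumptions, not the paper's actual data:

```python
import numpy as np

def subject_summary(acc):
    """Summarize per-subject classification accuracy.

    acc: (n_subjects, n_runs) array of per-run accuracies for one method.
    Returns per-subject min/max/std and their averages across subjects,
    mirroring the kind of summary reported in Table 3.
    """
    per_min = acc.min(axis=1)
    per_max = acc.max(axis=1)
    per_std = acc.std(axis=1)
    return {
        "min": per_min, "max": per_max, "std": per_std,
        "avg_min": per_min.mean(), "avg_max": per_max.mean(),
    }
```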
In order to study the effect of the activation function and the number of iterations, we computed the average recall and classification accuracy during training. The results are shown in Fig. 4. Figure 4(b) shows the average training accuracy. Note that the highest accuracy is achieved at 1900 iterations, and that ELU yields 2% better performance than ReLU. The recall graph in Fig. 4(a) confirms this result as well.
In conclusion, the CNN-based classifier introduced in this paper shows promising results in comparison with a more conventional SVM approach. The CNN-based classification demonstrates the highest accuracy when the hybrid EEG-fNIRS recordings are used, in comparison with the standalone methods. The best results were obtained with a three-second window and the ELU activation function, for which the CNN yields 89% correct classification, while the SVM achieves only 82%. The ELU activation function shows higher accuracy and recall in comparison with ReLU. In future work, we plan to investigate different image structures and the performance of other CNN structures such as ResNet in HMI and BCI applications.
References
1. Aghajani, H., Garbey, M., Omurtag, A.: Measuring mental workload with EEG + fNIRS.
Front. Hum. Neurosci. 11, 359 (2017)
2. Jarmasz, J., Herdman, C.M., Johannsdottir, K.R.: Object-based attention and cognitive
tunneling. J. Exp. Psychol. Appl. 11(1), 3 (2005)
3. Ayaz, H., Dehais, F.: Neuroergonomics: The Brain at Work and Everyday Life, 1st edn.
Elsevier, Academic Press, Cambridge (2019)
4. Wang, L., Curtin, A., Ayaz, H.: Comparison of machine learning approaches for motor
imagery based optical brain computer interface. In: Ayaz, H., Mazur, L. (eds.) Advances in
Neuroergonomics and Cognitive Engineering, vol. 775, pp. 124–134. Springer, Cham (2019)
5. Liu, Y., Ayaz, H., Shewokis, P.A.: Multisubject “learning” for mental workload classification
using concurrent EEG, fNIRS, and physiological measures. Front. Hum. Neurosci. 11(389)
(2017). https://doi.org/10.3389/fnhum.2017.00389
6. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Networks 61, 85–
117 (2015)
7. Bashivan, P., Rish, I., Yeasin, M., Codella, N.: Learning representations from EEG with
deep recurrent-convolutional neural networks. In: arXiv preprint arXiv:1511.06448 (2015)
8. Hajinoroozi, M., Mao, Z., Jung, T.-P., Lin, C.-T., Huang, Y.: EEG-based prediction of
driver’s cognitive performance by deep convolutional neural network. Sig. Process. Image
Commun. 47, 549–555 (2016)
9. Thodoroff, P., Pineau, J., Lim, A.: Learning robust features using deep learning for automatic seizure detection. In: Machine Learning for Healthcare Conference, pp. 178–190 (2016)
10. Schirrmeister, R.T., Springenberg, J.T., Fiederer, L.D.J., Glasstetter, M., Eggensperger, K.,
Tangermann, M., Hutter, F., Burgard, W., Ball, T.: Deep learning with convolutional neural
networks for EEG decoding and visualization. Hum. Brain Mapp. 38(11), 5391–5420 (2017)
11. Trakoolwilaiwan, T., Behboodi, B., Lee, J., Kim, K., Choi, J.-W.: Convolutional neural
network for high-accuracy functional near-infrared spectroscopy in a brain-computer
interface: three-class classification of rest, right-, and left-hand motor execution. Neurophotonics 5(1) (2018)
12. Croce, P., Zappasodi, F., Merla, A., Chiarelli, A.M.: Exploiting neurovascular coupling: a
Bayesian sequential Monte Carlo approach applied to simulated EEG fNIRS Data. J. Neural
Eng. 14(4) (2017)
13. Hong, K.-S., Naseer, N., Kim, Y.-H.: Classification of prefrontal and motor cortex signals
for three-class fNIRS-BCI. Neurosci. Lett. 587, 87–92 (2015)
14. Chiarelli, A.M., Zappasodi, F., Di Pompeo, F., Merla, A.: Simultaneous functional near-
Infrared spectroscopy and electroencephalography for monitoring of human brain activity
and oxygenation: a review. Neurophotonics 4(4) (2017)
15. Fazli, S., Mehnert, J., Steinbrink, J., Curio, G., Villringer, A., Müller, K.-R., Blankertz, B.:
Enhanced performance by a hybrid NIRS-EEG brain computer interface. Neuroimage 59(1),
519–529 (2012)
16. Saadati, M., Nelson, J.K.: Multiple transmitter localization using clustering by likelihood of
transmitter proximity. In: 51st Asilomar Conference on Signals, Systems, and Computers,
pp. 1769–1773 (2017)
17. Khan, M.J., Hong, M.J., Hong, K.-S.: Decoding of four movement directions using hybrid
NIRS-EEG brain-computer interface. Front. Hum. Neurosci. 8, 244 (2014)
18. Lee, M.-H., Fazli, S., Mehnert, J., Lee, S.-W.: Hybrid brain-computer interface based on
EEG and NIRS modalities. In: Brain-Computer Interface (BCI), 2014 International Winter
Workshop (2014)
19. Jirayucharoensak, S., Pan-Ngum, S., Israsena, P.: EEG-based emotion recognition using
deep learning network with principal component based covariate shift adaptation. Sci.
World J. (2014)
20. Shin, J., von Lühmann, A., Kim, D.-W., Mehnert, J., Hwang, H.-J., Müller, K.-R.: Simultaneous acquisition of EEG and NIRS during cognitive tasks for an open access dataset. In: Generic Research Data (2018)
21. Shin, J., von Lühmann, A., Kim, D.-W., Mehnert, J., Hwang, H.-J., Müller, K.-R.: Simultaneous acquisition of EEG and NIRS during cognitive tasks for an open access dataset. Sci. Data 5 (2018)
22. Pfurtscheller, G.: Functional brain imaging based on ERD/ERS. Vis. Res. 41(10–11), 1257–
1260 (2001)
23. Neuper, C., Pfurtscheller, G.: Event-related dynamics of cortical rhythms: frequency-specific
features and functional correlates. Int. J. Psychophysiol. 43(1), 41–58 (2001)
24. Pourshafi, A., Saniei, M., Saeedian, A., Saadati, M.: Optimal reactive power compensation in
a deregulated distribution network. In: 44th International Universities Power Engineering
Conference (UPEC), pp. 1–6 (2009)
25. Ma, L., Zhang, L., Wang, L., Xu, M., Qi, H., Wan, B., Ming, D., Hu, Y.: A hybrid brain-
computer interface combining the EEG and NIRS. In: 2012 IEEE International Conference
on Virtual Environments Human-Computer Interfaces and Measurement Systems
(VECIMS), pp. 159–162 (2012)
26. Saadati, M., Mortazavi, S., Pourshafi, A., Saeedian, A.: Comparing two modified methods for
harmonic and flicker measurement based on RMS, p. R1 (2009). https://rms.scu.ac.ir/Files/
Articles/Conferences/Abstract/v56–27.pdf2009102231223218.pdf
27. Schirrmeister, R.T., Springenberg, J.T., Fiederer, L.D.J., Glasstetter, M., Eggensperger, K.,
Tangermann, M., Hutter, F., Burgard, W., Ball, T.: Deep learning with convolutional neural
networks for EEG decoding and visualization. Hum. Brain Mapp. 38(11), 5391–5420
(2017)