IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 54, NO. 7, JULY 2007 1231

A Feature Selection Method for Multilevel Mental Fatigue EEG Classification
Kai-Quan Shen, Chong-Jin Ong, Xiao-Ping Li*, Zheng Hui, and Einar P. V. Wilder-Smith

Abstract—Two feature selection approaches for multilevel mental fatigue electroencephalogram (EEG) classification are presented in this paper, in which random forest (RF) is combined with the heuristic initial feature ranking scheme (INIT) or with the recursive feature elimination scheme (RFE). Under a "leave-one-proband-out" evaluation strategy, both feature selection approaches are evaluated on mental fatigue EEG time series recorded from 12 subjects (each over a 25-h session) after initial feature extraction. The latter of the two approaches performs better, both in classification performance and, more importantly, in feature reduction. RF with RFE achieved its lowest test error rate of 12.3% using the 24 top-ranked features, whereas RF with INIT reached its lowest test error rate of 15.1% using the 64 top-ranked features, compared to a test error rate of 22.1% using all 304 features. The results also show that 17 key features (out of the 24 top-ranked features) are consistent across subjects using RF with RFE, which is superior to the set of 64 features determined by RF with INIT.

Index Terms—Electroencephalogram (EEG), feature selection, mental fatigue, random forests.

Manuscript received March 7, 2006; revised October 29, 2006. Asterisk indicates corresponding author.
K.-Q. Shen, C.-J. Ong, and Z. Hui are with the Department of Mechanical Engineering, National University of Singapore, Singapore 117576, Singapore (e-mail: shen@nus.edu.sg; mpeongcj@nus.edu.sg; zhenghui@nus.edu.sg).
*X.-P. Li is with the Department of Mechanical Engineering and Division of Bioengineering, National University of Singapore, Singapore 117576, Singapore (e-mail: mpelixp@nus.edu.sg).
E. P. V. Wilder-Smith is with the Department of Medicine, National University of Singapore, Singapore 119074, Singapore (e-mail: mdcwse@nus.edu.sg).
Digital Object Identifier 10.1109/TBME.2007.890733

I. INTRODUCTION

Mental fatigue, defined as a state of reduced mental alertness that impairs performance [1], [2], has become one of the most significant causes of accidents in modern society [3]–[6]. The electroencephalogram (EEG) may be the most predictive and reliable physiological indicator of mental fatigue [7]–[9]. In recent years, there has been increasing interest in EEG-based mental fatigue tracking technology [10]–[13], with the widespread hope that it will prove invaluable in the prevention of mental fatigue related accidents. A major obstacle for this "most wanted" technology [14] is the challenge of selecting key EEG features from the large number of available EEG features.

Feature selection is important for improving generalization, meeting system specifications or constraints (such as running time and storage), and enhancing system interpretability [15]–[17]. However, an exhaustive search for the optimal feature subset over all possible feature combinations is computationally expensive. To overcome this, many feature selection methods have been proposed; a good review can be found in the paper by Guyon and Elisseeff [18]. It is generally recognized that feature selection methods fall into two categories, filter-based and wrapper-based approaches, with the latter being superior in performance but normally carrying a heavier computational load [19].

Most feature selection methods in the literature were originally designed for selecting features for two-class classification [18]. As a result, past attempts at feature selection in biomedical applications have largely followed the idea of decomposing the multiclass classification problem into several two-class classification problems and carrying out feature selection for each independently [18], [20]–[23]. However, it may be advantageous to treat the multiclass classification problem directly rather than decomposing it into combinatorial two-class classification problems; in that case, feature selection has to be carried out in the context of multiclass classification [18].

This paper investigates the use of RF [24], [25] to select the key EEG features from a broad pool of available features for the purpose of multilevel mental fatigue EEG classification. Unlike some other methods, the feature ranking criterion of RF provides a direct relationship between the ranking criterion and the feature's influence on the performance of the multiclass classification. This feature ranking criterion of RF is combined with the RFE approach [21] or with a heuristic INIT approach to yield an overall feature selection scheme. The performances of both schemes are compared using EEG data recorded from 12 subjects during 25-h mental fatigue experiments.

This paper is organized as follows. The EEG experimental setup and data preparation are described in Section II. The feature selection method using RF is detailed in Section III. The numerical experiments are reported in Section IV, followed by the discussion in Section V and conclusions in Section VI.

II. EXPERIMENTAL SETUP AND DATA PREPARATION

A. Subjects

In total, 12 right-handed healthy young adults (age range: 19–25 years; years of education: 11–15 years) were recruited from local tertiary institutions. These subjects were selected from volunteers who fulfilled the inclusion criteria of not being on any medication, having no history of severe disease, and having no history of a sleep disorder, with regular sleep hygiene verified by a one-week sleep diary documenting nightly sleep periods of no less than 8 h. Each person was asked to document that, over a period of one week, they did not go to bed later than 1 a.m. and did not wake later than 9 a.m. Additionally, habitual daytime napping resulted in exclusion. Informed consent was obtained, and the protocol was approved by the local ethics committee.
B. Auditory Vigilance Task (AVT)

The details of the AVT have been reported previously [26]. The task was designed to capture the cortical deactivation of the functional lobes of the brain, which is believed to be the physiological basis of mental fatigue [27]. Four audio commands (left, right, up, down), each of 500 ms duration, were randomly ordered in one command set. Each test session comprised 50 command sets in about 3 min (1.5-s interset interval). Subjects were required to concentrate constantly and to press the corresponding prespecified buttons within 1.5 s once they heard each complete command set. For every AVT session, the AVT score was calculated as the percentage of correct responses.

C. Experimental Protocol

Subjects were monitored in a temperature-controlled laboratory (23–25 °C) from 8:30 a.m. until 9:30 a.m. the next day. This 25-h design ensures that all phases of circadian-induced mental fatigue were sampled. Caffeine, tea, and smoking were prohibited for 24 h before and throughout the experiment. Subjects were required to perform one AVT session (with eyes closed) once an hour throughout the experiment. They were allowed to engage in nonstrenuous activities during the periods when they were not required to perform an AVT session.

D. EEG Acquisition

Multichannel EEG data were recorded simultaneously during every AVT session according to the international 10–20 system [28] (A1 or A2 as the reference channel for the right or left brain, respectively), with a sampling rate of 167 Hz (i.e., a 6-ms sampling interval) and a 0.02–35 Hz pass band, using a customized bandpass filter implemented in LabVIEW (version 6.1, National Instruments, USA).
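For readers who want to reproduce this preprocessing offline, the sketch below applies a comparable band-pass to a single channel. It is only an illustration: it substitutes a zero-phase Butterworth filter from SciPy for the customized LabVIEW filter used in the study, and the filter order is an assumption.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 167.0               # sampling rate reported above (Hz)
LOW, HIGH = 0.02, 35.0   # pass band reported above (Hz)

def bandpass_eeg(channel, fs=FS, low=LOW, high=HIGH, order=4):
    """Zero-phase Butterworth band-pass for one EEG channel (illustrative
    stand-in for the customized LabVIEW filter; order=4 is an assumption)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, channel)

# usage: filter 10 s of synthetic single-channel data
eeg = np.random.randn(int(10 * FS))
filtered = bandpass_eeg(eeg)
```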
E. EEG Data Labeling

The EEG data were labeled with five mental fatigue levels according to the AVT performance score. Specifically, for every subject, the individual performance span (from the highest AVT score to the lowest AVT score) was evenly divided into five segments corresponding to fatigue levels 1 to 5, respectively. The label (i.e., one of the five mental fatigue levels) of the EEG data recorded in an AVT session was determined by the segment into which the corresponding AVT performance score fell.
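As a concrete illustration of this labeling rule, the sketch below maps an AVT score onto one of the five levels by dividing the subject's individual span into five equal segments. The function name and the handling of scores that fall exactly on a segment boundary are assumptions, not details given in the paper.

```python
import numpy as np

def fatigue_level(score, best_score, worst_score):
    """Map an AVT score to a fatigue level in {1, ..., 5}: the span from the
    subject's best to worst score is split into five equal segments
    (level 1 near the best score, level 5 near the worst)."""
    drop = (best_score - score) / (best_score - worst_score)
    level = 1 + int(np.floor(drop * 5))
    return min(max(level, 1), 5)      # clip boundary cases into [1, 5]

# usage: a subject whose AVT scores ranged from 98% down to 58%
print(fatigue_level(95, best_score=98, worst_score=58))   # -> 1
print(fatigue_level(60, best_score=98, worst_score=58))   # -> 5
```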
F. Feature Extraction

In the present study, EEG feature extraction can be carried out in both an online and an offline manner, for the purposes of offline feature selection and online multilevel mental fatigue classification, respectively. The EEG data were passed through feature extraction windows, with one window for each EEG channel. The length of each feature extraction window was 2 s (334 EEG samples), with a half-second (84 EEG samples) interval between two adjacent feature extraction calculations. The power spectral density $P(f)$ of the EEG data segment falling into the feature extraction window was calculated by the fast Fourier transform with a Hann window [30]. Four features were chosen to characterize the power spectral density $P(f)$. Their definitions are as follows.

Dominant Frequency (DF): Among all the peaks in the spectrum segment corresponding to a frequency band, the peak with the largest average power in its full-width-half-maximum band [31] is defined as the dominant peak. The frequency of this dominant peak is the dominant frequency.

Average Power of Dominant Peak (APDP): This is defined as the average power over the full-width-half-maximum band of the dominant peak.

Center of Gravity Frequency (CGF): This is defined as

$$\mathrm{CGF} = \frac{\int f\,P(f)\,\mathrm{d}f}{\int P(f)\,\mathrm{d}f} \qquad (1)$$

where $f$ is frequency and $P(f)$ is the estimated power spectral density.

Frequency Variability (FV): This is defined as

$$\mathrm{FV} = \sqrt{\frac{\int \left(f-\mathrm{CGF}\right)^{2} P(f)\,\mathrm{d}f}{\int P(f)\,\mathrm{d}f}}. \qquad (2)$$

As a result, in total 304 quantitative EEG features (4 kinds of features × 19 channels × 4 frequency bands, i.e., the standard δ, θ, α, and β frequency bands [29]) were extracted. Combined with the EEG labels described in Section II-E, the feature extraction yields a data subset for each subject. Specifically, for the $i$th subject, a data subset $D_i = \{(\mathbf{x}_k, y_k)\}$ is formed, where $\mathbf{x}_k$ is a 304-dimensional feature vector for the $k$th sample and $y_k$ is the corresponding mental fatigue level. A small portion of the EEG data (less than 20 s) at the beginning of each recording was discarded to ensure well-balanced samples across the five classes.
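A minimal sketch of how these four features could be computed for one channel and one frequency band is given below. It is an approximation of the procedure described above under stated assumptions: Welch's method with a Hann window stands in for the windowed FFT estimate, the dominant peak is taken as the largest local peak in the band, its half-height width approximates the full-width-half-maximum band, and the integrals in (1) and (2) are replaced by sums over the frequency bins of the band.

```python
import numpy as np
from scipy.signal import welch, find_peaks, peak_widths

FS = 167.0   # sampling rate (Hz)

def band_features(segment, band=(8.0, 13.0), fs=FS):
    """Return (DF, APDP, CGF, FV) for one 2-s EEG segment and one band."""
    f, p = welch(segment, fs=fs, window="hann", nperseg=len(segment))
    sel = (f >= band[0]) & (f <= band[1])
    fb, pb = f[sel], p[sel]

    peaks, _ = find_peaks(pb)
    if len(peaks):                        # dominant peak and its half-height band
        dom = peaks[np.argmax(pb[peaks])]
        _, _, left, right = peak_widths(pb, [dom], rel_height=0.5)
        lo, hi = int(np.floor(left[0])), int(np.ceil(right[0]))
    else:                                 # no interior peak: fall back to the whole band
        dom, lo, hi = int(np.argmax(pb)), 0, len(pb) - 1

    df = fb[dom]                                              # dominant frequency
    apdp = pb[lo:hi + 1].mean()                               # average power of dominant peak
    cgf = np.sum(fb * pb) / np.sum(pb)                        # center of gravity frequency, cf. (1)
    fv = np.sqrt(np.sum((fb - cgf) ** 2 * pb) / np.sum(pb))   # frequency variability, cf. (2)
    return df, apdp, cgf, fv

# usage: a 2-s window of 334 samples, alpha band
segment = np.random.randn(334)
print(band_features(segment, band=(8.0, 13.0)))
```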
III. FEATURE SELECTION USING RF

A random forest [24], [25] is an ensemble of many randomized decision trees whose output is the plurality vote of all of these trees. Each decision tree is randomized by both bootstrapping (a statistical resampling technique) and random variable selection. Specifically, each decision tree of the RF is constructed from a bootstrapped set (containing only about two-thirds of the original training data); the remaining one-third is left out as the out-of-bag (OOB) cases. During tree growing, a random subset of features (of fixed size $m_{try}$, with $m_{try} \ll n$, where $n$ denotes the dimensionality of the feature vector) is used to determine the split at each node. The randomization by both bootstrapping and random variable selection keeps the correlation between individual trees low. Meanwhile, each decision tree is grown to maximum depth to keep its bias low (at the cost of high variance). The ensemble of these weak decision trees (with low correlation and low bias but high variance) yields a strong classifier, i.e., the RF (with both low bias and low variance).
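The sketch below illustrates this kind of randomized ensemble with scikit-learn's random forest, which likewise combines bootstrapping, a random subset of roughly √n candidate features at each split, fully grown trees, and out-of-bag evaluation. It is only a stand-in for the Fortran implementation of Breiman and Cutler used in the paper, and the data are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 304))      # placeholder for 304 EEG features per sample
y = rng.integers(1, 6, size=600)     # placeholder 5-level fatigue labels

rf = RandomForestClassifier(
    n_estimators=200,        # number of trees (the paper uses n_tree = 1000)
    max_features="sqrt",     # m_try ~ sqrt(n) features tried at each split
    max_depth=None,          # grow each tree to maximum depth (low bias, high variance)
    bootstrap=True,          # each tree sees a bootstrapped set (~2/3 of the data)
    oob_score=True,          # evaluate the ensemble on the out-of-bag cases
    random_state=0,
).fit(X, y)

print("OOB accuracy:", rf.oob_score_)
```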
TABLE I. Procedure of the RF-INIT feature selection scheme.
TABLE II. Procedure of the RF-RFE feature selection scheme.

Random forests are, unusually for nonlinear classifiers, very easy to train. Even without extensive tuning of the few RF hyper-parameters ($m_{try}$ and another hyper-parameter, $n_{tree}$, i.e., the number of decision trees to grow), they usually yield classification results comparable to or better than other state-of-the-art classifiers on many data sets [25], [32], [33]. Besides its superior classification performance, another feature of the RF is that it gives an importance measure for each feature according to that feature's contribution to the overall classification performance. The contribution of a feature is calculated by comparing the performance of the predictors (decision trees) on the original OOB cases with that obtained on the permuted OOB cases, in which the content of that feature has been masked off (by randomly permuting the values of that feature). In more detail, let $v_t$ and $\tilde{v}_t^{(j)}$ denote the number of votes cast for the correct class by the $t$th tree over its OOB cases before and after permuting the values of the $j$th feature, respectively. The raw importance score $I_j$ for the $j$th feature is the average decrease in the number of votes for the correct class due to permuting the $j$th feature, defined as

$$I_j = \frac{1}{n_{tree}} \sum_{t=1}^{n_{tree}} \left( v_t - \tilde{v}_t^{(j)} \right). \qquad (3)$$

As suggested by Breiman [32], however, the raw importance score of (3) is too unstable to be used as the feature ranking criterion. The correlation of the decreases in the number of votes for the correct class due to permuting the $j$th feature, from tree to tree, has been shown to be quite low. The standard error of these decreases can therefore be computed in the classical way, and the standard error of the corresponding raw importance score is $\hat{\sigma}_j / \sqrt{n_{tree}}$, where $\hat{\sigma}_j$ is the standard deviation of the per-tree decreases. A more robust feature ranking criterion is the $z$-score of the raw importance score, computed by dividing the raw importance score by its standard error under an assumption of normality:

$$z_j = \frac{I_j}{\hat{\sigma}_j / \sqrt{n_{tree}}}. \qquad (4)$$
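The per-tree permutation scheme behind (3) and (4) can be sketched as follows. One simplification should be kept in mind: the per-tree vote counts are measured on a common held-out validation set rather than on each tree's own out-of-bag cases, because per-tree OOB membership is not exposed through scikit-learn's public API.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_zscores(rf, X_val, y_val, seed=0):
    """Permutation-importance z-scores in the spirit of (3) and (4), with a
    held-out set standing in for the per-tree out-of-bag cases."""
    rng = np.random.default_rng(seed)
    # trees inside a fitted sklearn forest predict encoded class indices 0..K-1
    y_enc = np.searchsorted(rf.classes_, y_val)
    base = np.array([(tree.predict(X_val) == y_enc).sum() for tree in rf.estimators_])
    z = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        Xp = X_val.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])       # mask feature j by permuting it
        perm = np.array([(tree.predict(Xp) == y_enc).sum() for tree in rf.estimators_])
        drops = base - perm                        # per-tree decrease in correct votes
        raw = drops.mean()                         # raw importance score, cf. (3)
        se = drops.std(ddof=1) / np.sqrt(len(drops))
        z[j] = raw / se if se > 0 else 0.0         # z-score, cf. (4)
    return z

# usage on synthetic data with a handful of features
rng = np.random.default_rng(1)
X, y = rng.normal(size=(300, 10)), rng.integers(1, 6, size=300)
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0).fit(X[:200], y[:200])
print(rf_zscores(rf, X[200:], y[200:]))
```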

The remainder of this section presents two approaches that use this criterion in an overall feature selection scheme: the RF-based initial feature ranking (RF-INIT) approach and the RF-based recursive feature elimination (RF-RFE) approach. Both approaches assume that the feature ranking criteria $z_j$ (one per feature) are available from the RF solution.

Let $n$ denote the dimensionality of the feature vector ($n = 304$ in the present case). The RF-INIT approach takes as inputs the data set $D$ and the index set $S = \{1, 2, \ldots, n\}$. The output of RF-INIT is a ranked list of the features in the form of an index set $R$, ordered from the most to the least important feature. The steps involved in this approach are summarized in Table I.

The RF-RFE approach is similar to the one given by Guyon [21], but with the $z_j$ of (4) used as the ranking criterion. The steps involved in this approach are summarized in Table II. The inputs are the same as for RF-INIT, and the output is the ranked list of features $R$. One starts with the full feature set $S$. RF-INIT is called, which builds a random forest and produces a ranked list of the features containing all elements of $S$. Next, the last element of this list (the feature with the smallest $z_j$) is removed from $S$ and stored in the last position of $R$, meaning it is the least important feature. RF-INIT is then called again on the reduced $S$. As it now builds a random forest using a reduced set of features, it will give a different random forest and, therefore, a possibly different ranking of the features still in $S$. The last feature in this new ranking is stored in the second-to-last position of $R$ and removed from $S$. This process continues, each time removing the least important feature from $S$ and storing it in the rightmost free position of $R$, until $S$ is empty.

RF-INIT is in fact the existing RF feature selection method proposed by Breiman [25], while RF-RFE is a combination of RF and a backward feature elimination scheme motivated by the SVM-RFE algorithm [21]. Implementing RF-RFE requires only slight modifications of the standard random forest package [34]. Steps 3 and 4 of RF-RFE in Table II remove one feature from the data set at a time; obviously, more than one feature can be removed at a time with slight modifications to Steps 3 and 4 in Table II. The current description of RF-RFE does not involve determining the parameters $m_{try}$ and $n_{tree}$ for each inner loop through a model selection process. Such a process is possible, albeit at a higher computational cost.
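A compact sketch of this backward elimination loop (one feature removed per pass) is given below. The ranking criterion is passed in as a function; the paper's criterion is the z-score of (4), for which the hypothetical `rf_zscores` helper sketched after (4) could be supplied, while the built-in impurity importance is used here only as a runnable fallback. The data are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_rfe(X_trn, y_trn, X_val, y_val, score_fn=None, n_tree=100):
    """Backward feature elimination driven by an RF importance criterion
    (cf. Table II). Returns feature indices ranked from most to least important.
    score_fn(rf, X_val, y_val) should return one importance value per feature;
    the paper uses the z-score of (4)."""
    remaining = list(range(X_trn.shape[1]))   # the surviving feature set S
    ranked = []                               # the ranked list R, filled from the back
    while remaining:
        rf = RandomForestClassifier(n_estimators=n_tree, max_features="sqrt",
                                    random_state=0).fit(X_trn[:, remaining], y_trn)
        scores = (score_fn(rf, X_val[:, remaining], y_val)
                  if score_fn else rf.feature_importances_)
        worst = int(np.argmin(scores))            # least important survivor
        ranked.insert(0, remaining.pop(worst))    # goes to the back of R
    return ranked

# usage on a small synthetic problem
rng = np.random.default_rng(2)
X, y = rng.normal(size=(300, 8)), rng.integers(1, 6, size=300)
print(rf_rfe(X[:200], y[:200], X[200:], y[200:]))
```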
IV. NUMERICAL EXPERIMENTS

A. Methods

Both the RF-INIT and the RF-RFE feature selection approaches are applied to the multilevel mental fatigue EEG data obtained in Section II. The two approaches are compared quantitatively by their predictive performance in the classification of multilevel mental fatigue.

The EEG data were recorded from 19 channels in 4 frequency bands (Section II). Four kinds of features were calculated for each of them in a sliding window (sampled at 0.5-s intervals, with a 2-s window) along the time series, resulting in a 304-dimensional feature vector for each sample. The data are organized into subject-wise data subsets $D_i$, one for each of the 12 subjects. The down-sampling scheme using an overlapping sliding window makes the features of adjacent samples highly correlated.
This, together with the high correlation that is natural to most time series data such as EEG, violates the assumption of independently and identically distributed data. As a consequence, randomly sampling a testing set is not an appropriate way to assess predictive performance on this kind of data.

To estimate the predictive performance, a block resampling scheme, "leave-one-proband-out" [35], is used, where each subject corresponds to one proband. Samples from 11 subjects form a training set $D_{trn}$, and the samples from the left-out subject form a testing set $D_{tst}$. This resampling procedure results in 12 pairs of $D_{trn}$ and $D_{tst}$ in total. In the numerical experiments, for each pair, $D_{trn}$ is used by the feature selection approaches RF-INIT (Table I) and RF-RFE (Table II), producing ranked feature lists $R_{INIT}$ and $R_{RFE}$, respectively. For comparison purposes, a random forest is iteratively fit on $D_{trn}$, at each iteration building a new random forest after discarding the least important features according to $R_{INIT}$ and $R_{RFE}$, respectively. To estimate the predictive performance of the selected features, the test errors are obtained by putting $D_{tst}$ (with the least important features discarded correspondingly) down the random forest and calculating the percentage of misclassifications on $D_{tst}$. To save computational time, a three-tier feature removal scheme was used, in which 20 features were removed at each recursion until 44 features were left, then 5 features were removed at each recursion until 24 features were left, followed by 2 features removed at each recursion.
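The evaluation loop described above can be sketched as follows: subject identifiers define the leave-one-proband-out folds, and a small helper enumerates the three-tier removal schedule (steps of 20 down to 44 retained features, then steps of 5 down to 24, then steps of 2). Data, rankings, and forest sizes below are synthetic placeholders; in a real run the ranking would come from RF-INIT or RF-RFE on the training fold.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

def removal_schedule(n_features=304):
    """Numbers of retained top-ranked features under the three-tier scheme."""
    sizes, n = [], n_features
    while n > 44:
        sizes.append(n); n -= 20
    while n > 24:
        sizes.append(n); n -= 5
    while n >= 2:
        sizes.append(n); n -= 2
    return sizes

# synthetic placeholders: 12 subjects, 30 samples each, 304 features
rng = np.random.default_rng(0)
X = rng.normal(size=(12 * 30, 304))
y = rng.integers(1, 6, size=12 * 30)
subjects = np.repeat(np.arange(12), 30)

logo = LeaveOneGroupOut()                      # each subject is one proband/group
for trn, tst in logo.split(X, y, groups=subjects):
    ranking = rng.permutation(304)             # placeholder for R_INIT or R_RFE on this fold
    for k in removal_schedule():
        keep = ranking[:k]                     # k top-ranked features
        rf = RandomForestClassifier(n_estimators=50, max_features="sqrt",
                                    random_state=0).fit(X[trn][:, keep], y[trn])
        err = 1.0 - rf.score(X[tst][:, keep], y[tst])
        print(f"{k:3d} features: test error {err:.3f}")
    break                                      # one fold shown; a real run uses all 12
```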
The most important random forest hyper-parameter to tune, $m_{try}$, i.e., the number of input variables considered for splitting at each node, has been shown to be insensitive over a wide range of values, and the default value is often a good choice [36]. In addition, as mentioned earlier, re-tuning the hyper-parameters within the inner loops of RF-RFE is computationally very expensive. Therefore, $m_{try}$ is set to the integer nearest to the square root of the feature vector dimensionality: a fixed value for RF-INIT, and a value that decreases with the shrinking feature set for RF-RFE. The other hyper-parameter, $n_{tree}$, is set to 1000, at which the OOB error rates have already converged under the above setting of $m_{try}$.

Fig. 1. Mean test error rate against the number of top-ranked features, where the top-ranked features are selected by either RF-RFE (solid curve) or RF-INIT (dashed curve). The test error rates are obtained by averaging the 12 test error rates over all resampled subsets $D_{tst}$.

TABLE III. Mean confusion matrix for the optimal classification (using the 24 key features chosen by RF-RFE for each pair of resampled $D_{trn}$ and $D_{tst}$).

B. Results

Fig. 1 shows the mean test error rates of RF on the unseen testing sets $D_{tst}$ as the number of top-ranked features decreases, where the top-ranked features were selected by either the RF-INIT or the RF-RFE approach. The mean test error rates are averages over the 12 test error rates corresponding to the 12 pairs of $D_{trn}$ and $D_{tst}$. Error bars of one standard deviation are not plotted, for the sake of clarity. The results show, however, that for the RF-RFE approach the standard deviation is rather stable with respect to the number of top-ranked features used for classification (about 5%), whereas for the RF-INIT approach it varies between 5% and 7%.

Fig. 1 shows that the performances of RF-RFE and RF-INIT are similar in terms of the trend of the mean test error rate against the number of top-ranked features, but that RF-RFE performs better. The difference in performance is confirmed by a one-tailed paired t-test [37] at the chosen significance level.
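Because the two schemes are evaluated on the same 12 leave-one-proband-out folds, their per-fold error rates are naturally paired, and the comparison can be reproduced as below. The error values are placeholders rather than the study's numbers, and the `alternative` argument requires a reasonably recent SciPy.

```python
import numpy as np
from scipy.stats import ttest_rel

# placeholder per-fold test error rates for the 12 folds (not the study's values)
err_rfe  = np.array([0.11, 0.14, 0.10, 0.13, 0.12, 0.15, 0.11, 0.13, 0.12, 0.14, 0.10, 0.13])
err_init = np.array([0.14, 0.16, 0.13, 0.17, 0.15, 0.18, 0.14, 0.16, 0.15, 0.17, 0.13, 0.16])

# one-tailed paired t-test: is the RF-RFE error smaller than the RF-INIT error?
t_stat, p_value = ttest_rel(err_rfe, err_init, alternative="less")
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
```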
The curve for RF-RFE in Fig. 1 shows that the mean test error rate, as a function of the number of top-ranked features, has two stages. In stage 1, the mean test error rate decreased monotonically as the number of top-ranked features decreased from 304 to 24. The lowest mean test error rate (approximately 12.3%) was obtained using the 24 top-ranked features, compared to a mean test error rate of 22.1% with all 304 features and no feature selection. In stage 2, however, as the number of features was reduced below 24, the mean test error rate increased. The transition of the mean test error rate from stage 1 to stage 2 is a clear indication of the effectiveness of the RF-RFE feature selection approach. The curve for RF-INIT in Fig. 1 shows a similar trend, but the result is not as appealing. Compared to RF-RFE, RF-INIT generally gave a higher mean test error rate for the same number of top-ranked features. Its lowest mean test error rate, 15.1%, was obtained using the 64 top-ranked features, compared to 12.3% using 24 top-ranked features for RF-RFE. It should also be noted that the mean test error rate fluctuated during stage 1 for RF-INIT, in contrast to the clear monotonic decrease for RF-RFE.

Equal costs for all misclassifications are assumed in the present study. Table III shows the mean confusion matrix for the optimal classification (using the 24 key features chosen by RF-RFE), obtained by averaging the 12 confusion matrices for the 12 $D_{tst}$'s. It shows that gross errors (such as mental fatigue level 1 being misclassified as level 5) did not occur frequently. The feature rankings $R_{INIT}$ and $R_{RFE}$ obtained by RF-INIT and RF-RFE are significantly different, which can easily be shown by a scatter plot (not shown). RF-RFE achieves optimal classification with 24 features, which is superior to the set of 64 features determined by RF-INIT (less variability, easier to interpret).
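Averaging the fold-wise confusion matrices, as in Table III, can be done as sketched below. The labels and predictions are synthetic placeholders, and row normalization is an assumption; the exact normalization used in Table III is not stated in the text.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

levels = [1, 2, 3, 4, 5]
rng = np.random.default_rng(0)

# placeholder (true, predicted) labels for each of the 12 left-out subjects
folds = [(rng.integers(1, 6, 100), rng.integers(1, 6, 100)) for _ in range(12)]

# row-normalized confusion matrix per fold, averaged over the 12 folds
cms = [confusion_matrix(t, p, labels=levels, normalize="true") for t, p in folds]
print(np.round(np.mean(cms, axis=0), 3))
```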
Fig. 2. Distribution of the 17 features that are important in all of the resampled subsets at optimal classification (selected by RF-RFE). The number in brackets following each channel name is the number of key features derived from that channel.

TABLE IV. List of the 17 features that are important in all of the resampled subsets at optimal classification (selected by RF-RFE).

It may be useful to examine the features that appear among the 24 top-ranked features in all of the resampled subsets $D_{trn}$ under RF-RFE. There are 17 such features in total (shown in Fig. 2 and Table IV). The neurophysiological interpretation of this common feature set is discussed next.
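Mechanically, this common feature set is the intersection of the 24 top-ranked features across the 12 resampled subsets, as sketched below with placeholder rankings.

```python
import numpy as np

rng = np.random.default_rng(0)
# placeholder: one RF-RFE ranking (most to least important) per resampled subset
rankings = [rng.permutation(304) for _ in range(12)]

# features appearing among the 24 top-ranked features in every resampled subset
common = set(rankings[0][:24])
for r in rankings[1:]:
    common &= set(r[:24])
print(len(common), sorted(common))
```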
C. Neurophysiological Interpretation of Key Features

Benefiting from the increased system interpretability resulting from feature selection by RF-RFE, the identified key features also give important information on the critical EEG channels and features for discriminating mental fatigue at five levels. Fig. 2 indicates that the frontal and occipital regions of the brain were important for the classification of mental fatigue at different levels, which is consistent with the neurophysiological basis of mental fatigue [27]. However, no key feature was derived from channels Fp1 and Fp2, which overlie the region of the prefrontal lobe known to be involved in the complex effects induced by mental fatigue [38]. This could be explained by the presence of unavoidable EOG and EMG artifacts, which are generated by eye movements and facial muscle movements and typically have larger amplitudes than normal EEG. The locations of Fp1 and Fp2 are close to the sources of EOG and EMG; thus, channels Fp1 and Fp2 failed to contribute to the classification of mental fatigue at different levels.

In addition, Table IV implies an interesting finding: the four standard frequency bands (i.e., the δ, θ, α, and β bands) are all important for adequately capturing the subtle changes in the EEG related to mental fatigue, whereas only some of the four frequency bands have been reported as relevant in the literature [6], [27], [39]. APDP appears to be an important feature in Table IV, which means that the power over the standard frequency bands (δ, θ, α, β) of the EEG is an important measure of mental fatigue.

V. DISCUSSION

Two different random forest feature selection approaches (i.e., RF-INIT and RF-RFE) are evaluated quantitatively in this paper for the classification of multilevel mental fatigue EEG. In the literature, only a few applications of RF feature selection have been reported [36], [40]. Most of them relate to gene selection using microarray data sets, where the ill-posedness of the classification task demands strong dimension reduction. Other examples of successful applications of random forest feature selection remain rare.

In the evaluation of generalization performance, a block resampling scheme, "leave-one-proband-out" [35], is used. The benefits of the "leave-one-proband-out" resampling scheme are manifold: 1) it ensures that the temporally overlapping neighbours of a test sample (the immediately preceding and following samples) are not part of the training set; 2) multiple testing sets are obtained, so the error can be evaluated and a confidence assessed; this confidence is indispensable when discussing the significance of the comparison between the performances of the two proposed feature selection approaches; 3) each testing set is an independent data set from a left-out proband. For an RF trained on a leave-one-proband-out subset, it is natural to simulate its generalization performance on other, unseen subjects by the test error on the corresponding testing set of the left-out proband, and to seek key features that are repeatedly selected in all of the resampled subsets.

The better performance of RF-RFE over RF-INIT (Fig. 1) is most interesting and deserves attention. RF-INIT is in fact the existing RF feature selection method proposed by Breiman [25], while RF-RFE is a feature selection method motivated by the SVM-RFE method [21], obtained by inserting RF instead of SVM into the backward feature elimination procedure. It is also worth acknowledging that the RF-INIT approach is virtually the same as the gene selection method proposed by Diaz-Uriarte and de Andres [36], whereas RF-RFE is similar to a "gene-shaving" approach proposed by Jiang et al. [40]. In their papers, good performance of variable selection using random forests for microarray data was reported. However, they used the raw importance score of (3) rather than the more reliable z-score of (4) as the feature ranking criterion. More importantly, their methods for evaluating the test error are suitable for microarray data but not appropriate for time series data such as
EEG data. The following paragraphs highlight the reasons why the proposed RF-RFE approach, together with the "leave-one-proband-out" resampling scheme, is well suited for the analysis of neuronal (and other biomedical) data.

First, RF-RFE is naturally a feature selection method for both two-class and multiclass classification, in contrast to most existing methods (such as SVM-RFE), which are designed for binary classification only. Many biomedical applications require feature selection for multiclass classification. One may argue that a multiclass classification problem can always be decomposed into combinatorial binary classification problems and that feature selection can be carried out for each binary problem independently. However, the mixing strategy for combining the multiple ranked feature lists resulting from feature selection on each binary classification problem remains an open issue. Moreover, it may be advantageous to carry out feature selection directly in the context of multiclass classification, because the larger the number of classes, the less likely it is that a "random" set of features can provide a good classification [18].

Second, RF-RFE inherits every appealing characteristic of random forests. They usually yield good classification results, comparable to or better than other state-of-the-art classifiers on a number of data sets [33]. In addition, they are very easy to train, with little need to fine-tune the few hyper-parameters. Random forests can also be used in extremely ill-posed classification problems, where there are many more variables than observations.

Third, RF-RFE runs efficiently on large data sets (e.g., the mental fatigue EEG data set used in the present study). The little need to fine-tune the hyper-parameters may account for part of the computational advantage of RF-RFE. In addition, more than one feature can be removed at a time in the RFE procedure, which makes the computation cost more affordable. Beyond the major cost of re-growing the random forest at each iteration, re-calculating the feature importance measure increases the computation cost only slightly. The feature importance measure of RF is calculated through internal out-of-bag estimates [25]; its computation cost amounts to running the OOB and permuted OOB cases down each decision tree, which is very fast.

The present study also gives some insight into the consistency of the random forest importance measure over multiple resampled subsets, an issue that still seems largely ignored in the literature. In the present study, there are 17 features (Table IV) that are within the 24 top-ranked features in all of the resampled subsets (i.e., the $D_{trn}$'s) using the RF-RFE feature selection approach. This is somewhat disappointing with regard to the consistency of feature selection. The slight inconsistency could be due to the individual dependence of mental fatigue EEG from subject to subject, since the samples in the present study were divided subject-wise according to the "leave-one-proband-out" resampling scheme. Therefore, it is natural to seek features that are repeatedly selected over multiple resampled subsets. This concern is taken into account in the present study in the attempt to relate key features to the biological basis of mental fatigue. The consistency of these common key features, across multiple resampled subsets, with the anatomical regions known to be involved in mental fatigue is also a strong indication of the good performance of the proposed RF-RFE feature selection approach.

VI. CONCLUSIONS

A method that identifies relevant features for the classification of multilevel mental fatigue EEG has been proposed. This is done by transforming the search into a multiclass classification experiment and using a feature selection strategy. Two different feature selection approaches are evaluated, both based on random forests and their feature importance measures. The first is in fact a filter-based approach proposed by Breiman, i.e., random forest with the heuristic initial feature ranking scheme; the other, following Guyon's machine learning advice, is a wrapper-based approach using an iterative feature elimination scheme. Under the "leave-one-proband-out" evaluation strategy, both feature selection approaches are tested on the EEG time series data after initial feature extraction. The latter of the two approaches performs better, both in classification performance and, more importantly, in feature reduction. The results show that 17 (out of 304) features are consistently important across subjects, which is superior to the set of 64 features (less variability, easier to interpret) determined by Breiman's standard filter-based approach. From these 17 key features, it was found that electrode locations in the frontal and occipital regions of the brain are most important for multilevel mental fatigue classification, which is consistent with the anatomical regions known to be involved in mental fatigue. These key features also show that the four standard frequency bands (δ, θ, α, β) are all important to multilevel mental fatigue classification.

ACKNOWLEDGMENT

The authors are grateful to L. Breiman and A. Cutler, who made their random forest code available through the Internet; the implementation for the present study grew from that Fortran package. The authors would also like to thank the reviewers for their valuable comments and suggestions.

REFERENCES

[1] E. Grandjean, Fitting the Task to the Man. London, U.K.: Taylor & Francis, 1981.
[2] D. J. Mascord and R. A. Heath, "Behavioral and physiological indices of fatigue in a visual tracking task," J. Safety Res., vol. 23, pp. 19–25, 1992.
[3] K. Idogawa, "On the brain wave activity of professional drivers during monotonous work," Behaviormetrika, vol. 30, pp. 23–34, 1991.
[4] D. F. Dinges, "An overview of sleepiness and accidents," J. Sleep Res., vol. 4, no. 2, pp. 4–14, 1995.
[5] J. C. Stutts, J. W. Wilkins, and B. V. Vaughn, Why Do People Have Drowsy Driving Crashes? Input From Drivers Who Just Did. AAA Foundation for Traffic Safety, 1999.
[6] S. K. L. Lal and A. Craig, "A critical review of the psychophysiology of driver fatigue," Biological Psychology, vol. 55, pp. 173–194, 2001.
[7] O. G. Okogbaa, R. L. Shell, and D. Filipusic, "On the investigation of the neurophysiological correlates of knowledge worker mental fatigue using the EEG signal," Appl. Ergonomics, vol. 25, pp. 355–365, 1994.
[8] J. A. Horne and L. A. Reyner, "Driver sleepiness," J. Sleep Res., vol. 4, pp. 23–29, 1995.
[9] S. K. L. Lal and A. Craig, "Driver fatigue: Electroencephalography and psychological assessment," Psychophysiology, vol. 39, no. 3, pp. 313–321, 2002.
[10] P. Artaud, S. Planque, C. Lavergne, H. Cara, P. de Lepine, C. Tarriere, and B. Gueguen, "An on-board system for detecting lapses of alertness in car driving," in Proc. 14th Int. Conf. Enhanced Safety of Vehicles, Munich, Germany, 1994, vol. 1.
[11] A. Gevins, H. Leong, R. Du, M. E. Smith, J. Le, D. DuRousseau, J. Zhang, and J. Libove, "Towards measurement of brain function in operational environments," Biological Psychol., vol. 40, no. 1–2, pp. 169–186, 1995.
[12] D. Dinges and M. Mallis, in L. R. Hartley, Ed., Managing Fatigue in Transportation: Selected Papers From the 3rd Fatigue in Transportation Conference. Oxford, U.K.: Elsevier, 1998, pp. 209–229.
[13] S. K. L. Lal, A. Craig, P. Boord, L. Kirkup, and H. Nguyen, "Development of an algorithm for an EEG-based driver fatigue countermeasure," J. Safety Res., vol. 34, pp. 321–328, 2003.
[14] National Transportation Safety Board (NTSB), Most Wanted Transportation Safety Improvements [Online]. Available: http://www.ntsb.gov/Recs/mostwanted/index.htm
[15] B. Boser, I. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proc. 5th Annu. Workshop Computational Learning Theory, Pittsburgh, PA, 1992, pp. 144–152.
[16] C. Cortes and V. N. Vapnik, "Support vector networks," Mach. Learn., vol. 20, pp. 273–297, 1995.
[17] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[18] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157–1182, 2003.
[19] R. Kohavi and G. John, "Wrappers for feature subset selection," Artif. Intell., vol. 97, pp. 273–324, 1997.
[20] E. Yom-Tov and G. F. Inbar, "Feature selection for the classification of movements from single movement-related potentials," IEEE Trans. Neural Syst. Rehabil. Eng., vol. 10, no. 3, Sep. 2002.
[21] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Mach. Learn., vol. 46, no. 1–3, pp. 389–422, 2002.
[22] A. Rakotomamonjy, "Variable selection using SVM-based criteria," J. Mach. Learn. Res., vol. 3, pp. 1357–1370, 2003.
[23] T. N. Lal, M. Schroder, T. Hinterberger, J. Weston, M. Bogdan, N. Birbaumer, and B. Scholkopf, "Support vector channel selection in BCI," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1003–1010, Jun. 2004.
[24] L. Breiman, "Bagging predictors," Mach. Learn., vol. 26, pp. 123–140, 1996.
[25] L. Breiman, "Random forests," Mach. Learn., vol. 45, pp. 5–32, 2001.
[26] Y. Y. Pang, X. P. Li, K. Q. Shen, H. Zheng, W. Zhou, and E. P. V. Wilder-Smith, "An auditory vigilance task for mental fatigue detection," presented at the 27th Annu. Int. Conf. EMBC, Shanghai, China, 2005.
[27] R. D. Ogilvie, "The process of falling asleep," Sleep Med. Rev., vol. 5, no. 3, pp. 247–270, 2001.
[28] H. Jasper, "The 10–20 electrode system of the international federation," Electroencephalogr. Clin. Neurophysiol., vol. 10, pp. 371–375, 1958.
[29] E. Niedermeyer and F. L. D. Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Baltimore, MD: Lippincott Williams & Wilkins, 1999.
[30] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Upper Saddle River, NJ: Prentice-Hall, 1989.
[31] E. W. Weisstein, Full Width at Half Maximum, MathWorld, A Wolfram Web Resource [Online]. Available: http://mathworld.wolfram.com/FullWidthatHalfMaximum.html
[32] L. Breiman, RF/Tools: A Class of Two-Eyed Algorithms, Expository Notes. Statist. Dept., Univ. California, Berkeley, 2003.
[33] D. Meyer, F. Leisch, and K. Hornik, "The support vector machine under test," Neurocomputing, vol. 55, no. 1–2, pp. 169–186, 2003.
[34] L. Breiman and A. Cutler, Random Forests, Version 5.1 [Online]. Available: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
[35] S. N. Lahiri, Resampling Methods for Dependent Data, ser. Statistics. New York: Springer, 2003.
[36] R. Diaz-Uriarte and S. A. de Andres, "Gene selection and classification of microarray data using random forest," BMC Bioinformatics, vol. 7, no. 3, 2006.
[37] G. E. P. Box, W. G. Hunter, and J. S. Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York: Wiley, 1978.
[38] A. Muzur, E. F. Pace-Schott, and J. A. Hobson, "The prefrontal cortex in sleep," Trends Cognit. Sci., vol. 6, pp. 475–481, 2002.
[39] S. Makeig and T. Jung, "Tonic, phasic, and transient EEG correlates of auditory awareness in drowsiness," Cognit. Brain Res., vol. 4, no. 1, pp. 15–25, 1996.
[40] H. Y. Jiang, Y. P. Deng, H.-S. Chen, L. Tao, Q. Y. Sha, J. Chen, C.-J. Tsai, and S. L. Zhang, "Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes," BMC Bioinformatics, vol. 5, no. 81, 2004.

Kai-Quan Shen received the B.S. degree from the University of Science and Technology of China, Hefei, China. He is currently working toward the Ph.D. degree under the supervision of Prof. X.-P. Li, Associate Professor E. P. V. Wilder-Smith, and Associate Professor C.-J. Ong at the Department of Mechanical Engineering, National University of Singapore. His research interests focus on feature selection methods, support vector machines, brain signal processing, blind signal separation, and the investigation of the neurophysiological mechanisms of the human brain using functional MRI.

Chong-Jin Ong received the B.Eng. (Hons.) and M.Eng. degrees in mechanical engineering from the National University of Singapore, Singapore, in 1986 and 1988, respectively, and the M.S.E. and Ph.D. degrees in mechanical and applied mechanics from the University of Michigan, Ann Arbor, in 1992 and 1993, respectively. He joined the National University of Singapore in 1993 and is now an Associate Professor with the Department of Mechanical Engineering. His research interests are in algorithms for machine learning, feature selection, and robust control.

Xiao-Ping Li received the Ph.D. degree in mechanical and manufacturing engineering from the University of New South Wales, Sydney, Australia, in 1991. He joined the National University of Singapore in 1992, where he is currently a Full Professor with the Department of Mechanical Engineering and Division of Bioengineering. His current research interests include neurosensors and nanomachining. He is a guest editor of the International Journal of Computer Applications in Technology, an editorial board member of the International Journal of Abrasive Technology and Engineering, an editorial advisor of the Chinese Journal of Mechanical Engineering, and a regular reviewer for 14 international journals. His research achievements include 6 patents granted, 2 patents pending, and 230 technical publications, of which 118 are international refereed journal papers. He has supervised 6 postdoctoral research fellows, 13 Ph.D. students, and 23 masters students. Dr. Li is a member of the American Society of Mechanical Engineers (ASME), a senior member of the Society of Manufacturing Engineers (SME), and a senior member of the North American Manufacturing Institute of SME.

Zheng Hui received the B.S. degree in electrical and computer engineering in 2004 and spent two years doing research in signal processing and pattern recognition for EEG applications as a masters student at the National University of Singapore, Singapore. He is currently a Research Scientist with the Cognitive Neuroscience Laboratory, Singapore.

Einar P. V. Wilder-Smith received the M.B.B.S. (equivalent) and M.D. degrees from Heidelberg University, Heidelberg, Germany, in 1986 and 1989, respectively. He joined the National University of Singapore, Singapore, in 2001 as an Associate Professor in the Department of Medicine, where he heads the Neurology Diagnostic Laboratory and is Director of Research for the Department of Medicine. His research interests are in the field of clinical neurophysiology and include EEG changes in relation to emotions and neurological disorders.