
J Med Syst (2014) 38:18

DOI 10.1007/s10916-014-0018-0

RESEARCH ARTICLE

A Comparative Study on Classification of Sleep Stage Based on EEG Signals Using Feature Selection and Classification Algorithms
Baha Şen & Musa Peker & Abdullah Çavuşoğlu & Fatih V. Çelebi

Received: 19 September 2013 / Accepted: 23 February 2014 / Published online: 9 March 2014
© Springer Science+Business Media New York 2014

Abstract Sleep scoring is one of the most important diagnostic methods in psychiatry and neurology. Sleep staging is a time-consuming and difficult task undertaken by sleep experts. This study aims to identify a method which would classify sleep stages automatically and with a high degree of accuracy and, in this manner, assist sleep experts. The study consists of three stages: feature extraction from EEG signals, feature selection, and classification of these signals. In the feature extraction stage, 20 attribute algorithms in four categories were used, and 41 feature parameters were obtained from these algorithms. Feature selection is important in the elimination of irrelevant and redundant features; in this manner prediction accuracy is improved and computational overhead in classification is reduced. Effective feature selection algorithms such as minimum redundancy maximum relevance (mRMR), fast correlation based feature selection (FCBF), ReliefF, t-test, and Fisher score are preferred at the feature selection stage for selecting the set of features which best represents the EEG signals. The features obtained are used as input parameters for the classification algorithms. At the classification stage, five different classification algorithms (random forest (RF), feed-forward neural network (FFNN), decision tree (DT), support vector machine (SVM), and radial basis function neural network (RBF)) classify the problem. The results obtained from the different classification algorithms are provided so that a comparison can be made between computation times and accuracy rates. Finally, a classification accuracy of 97.03 % was obtained using the proposed method. The results show that the proposed method indicates the ability to design a new intelligent assisted sleep scoring system.

Keywords EEG signals . Classification algorithms . Feature selection algorithms . Classification of sleep stage

B. Şen (*) : A. Çavuşoğlu : F. V. Çelebi
Computer Engineering Department, Yıldırım Beyazıt University, Ulus, Ankara, Turkey
e-mail: bsen@ybu.edu.tr

M. Peker
Computer Engineering Department, Karabuk University, 78050 Karabuk, Turkey

Introduction

Healthy sleep habits affect our daily lives in different ways. Performance at work, morale, mood and relationships with other individuals are but a few of them. In the medical world, sleep analysis is of vital importance in the identification of problems related to sleep. Sleep analysis leads to various psycho-physiological analyses. In human physiology, a healthy deep sleep stage is known to accelerate physical recuperation [1]. In addition, a healthy rapid eye movement (REM) stage enhances learning skills and memory. In the identification of possible sleep problems, a sleep scoring process is needed in almost all procedures. Sleep scoring is the identification of sleep stages with the help of polysomnographic recording (PSG) during sleep. A number of signals are examined in patients' PSG recordings; these include electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) data [1]. An expert evaluates these records by following the Rechtschaffen & Kales (R&K) rules, which were identified in 1968. According to the R&K rules, each epoch (30 s of data) is classified as awake (W); non-rapid eye movement (N-REM stage 1, N-REM stage 2, N-REM stage 3 and N-REM stage 4: from light to deep sleep); or REM.

Amongst the physiological signals, EEG signals are used most often since they best represent the brain's activity [2]. EEG waves (alpha, beta, delta and theta) show different characteristics during different sleep stages. Low amplitude mixed EEG frequency, a saw-tooth wave-like

pattern, low amplitude EMG and high level EOG signals from both eyes are apparent during the REM stage. During N-REM stage 1, the highest amplitude, with a frequency range of 2–7 Hz, and the existence of alpha waves are found in the EEG signal. The EMG level is lower when compared to the awake stage. Sleep spindles (12–14 Hz) and K-complexes are observed during N-REM stage 2. N-REM stages 3–4, the deep sleep stages, may consist of low-frequency waves lower than 2 Hz, sleep spindles and K-complexes [3]. Figure 1 represents the EEG signals obtained during the different sleep stages.

EEG data has highly complicated transformation patterns. These signals are not periodic and their amplitude, phase and frequencies constantly change. Therefore, extended periods are required for the measurements in order to obtain meaningful data [2]. EEG data recorded for hours is analyzed by doctors while it is played on the screen in 5–10 second frames [4]. This process is exhausting and hard. As can be seen, visually interpreting complex EEG signals for sleep staging is a difficult problem. Therefore, analyzing EEG signals with the help of a consistent and suitable method is necessary to obtain an effective and speedy sleep staging system. In order to assist the experts in their studies, the current study proposes an effective feature selection based method that automatically undertakes the sleep staging process.

The paper is organized in the following manner: Section 2 presents a brief literature review about classification of sleep stages, including information regarding the method suggested in the current study. Section 3 briefly describes the data set of EEG signals employed in our research; this section also presents information regarding the methods used in this study, as well as the performance evaluation criteria employed. Section 4 provides the experiments undertaken in the framework of the study, the assessment procedures used and the experimental results obtained. Finally, Section 5 describes the conclusions derived from the study and some thoughts with regard to future work.

Related work

Recently, various models have been proposed for the classification of sleep stages. Feature extraction and classification algorithms commonly used in the literature are examined here. Some of the studies with a high rate of classification accuracy are listed below.

Algorithms such as standard deviation [5, 6], median [6, 7], arithmetic mean [7, 8], skewness [9, 10], kurtosis [9, 10], zero crossing value [2, 7], variance value [7, 11], values of maximum and minimum [2, 7], mean energy [2, 12, 13], mean teager energy [12], Petrosian fractal dimension [13], Rényi entropy [2, 13], spectral entropy [2, 13, 14], permutation entropy [15], approximate entropy [16], Wigner-Ville coefficients [2, 11], wavelet transform [6, 13, 17, 18], mean curve length [12, 13], Hurst exponent [16] and Hjorth parameters [13, 17] are commonly used for extracting features from EEG data.

Holzmann et al. [19] developed an expert system for infants by using ganglionar lattices. In this regard, they obtained a 96.4 % accuracy rate from data which did not contain artifacts. In their studies, the same authors also obtained an

Fig. 1 EEG signals for the different stages of sleep: Awake, N-REM stage 1, N-REM stage 2, N-REM stage 3, N-REM stage 4 and REM. Panel annotations include: low amplitude mixed frequency; alpha waves; K-complex; sleep spindle; delta waves; sawtooth waves. (Plot panels omitted.)

84.9 % accuracy rate from data which included artifacts. By using discrete wavelet transform for sleep staging, Oropesa et al. [20] divided EEG waves into 7 specific frequency bands. The authors computed the energy of the bands and obtained a 76.6 % accuracy rate by transferring the obtained values into an artificial neural network (ANN) environment. In another study, Agarval and Gotman [21] employed segmentation and clustering strategies that included the active participation of the sleep operator in the implementation and obtained an 80.6 % accuracy rate. In their studies, Estrada et al. [22] utilized EEG signals and employed three different algorithms identified as relative spectral band energy, harmonic parameters and Itakura distance. Differing from the other studies, which used linear measurement, the authors undertook non-linear analysis of the EEG signals. These studies showed that non-linear measures could also be used effectively in the scoring process. Becq et al. [23] investigated which classifier was most effective in sleep scoring, comparing 5 classification algorithms: linear and quadratic classifiers; k nearest neighbors; Parzen kernels; and neural network algorithms. According to their findings, the best classifier was found to be neural networks, with a 72 % scoring accuracy. Sinha [24] classified the three stages of a) sleep spindles, b) REM sleep and c) awake, and obtained a 95.55 % classification rate in a study which employed wavelet transform and ANN. On the other hand, Šušmáková and Krakovská [25] used discriminant analysis with Fisher's quadratic classifier to study 73 characteristic measures in the sleep staging process. Chapotot and Becq [26] proposed a new method which combined robust feature extraction, artificial neural network classification and flexible decision rules. They obtained an average accuracy rate of 78 %. In order to classify alert vs. drowsy states in arbitrary subjects, Subasi et al. [27] used 5-s long sequences of full-spectrum EEG recordings and employed a wavelet-based neural network model, trained with the Levenberg–Marquardt algorithm, to discriminate the alertness level of the subject. This study's classification results were found to be 93.3 % for alert, 96.6 % for drowsy and 90 % for sleep states. Zoubek et al. [28] used a feed-forward neural network structure for the classification of sleep stages. They used EEG, electromyogram (EMG) and electrooculogram (EOG) data as input to the neural network. Fourier transform coefficients, entropy values, kurtosis values and standard deviation values obtained from these signals were presented as input to the network. Doroshenkov et al. [29] developed a classification algorithm based on hidden Markov models using only EEG signals. The authors achieved their best result in the classification of the REM stage. Ebrahimi et al. [30] used neural networks and wavelet packet coefficients to differentiate between different sleep stages. The authors achieved a success rate of 93 % in a study of 5 classified stages. Jo et al. [31] obtained an accuracy rate of 84.6 % by using a 4 stage genetic fuzzy classifier. As a novel data pre-processing method, Gunes et al. [32] proposed the combination of k-means clustering based feature weighting (KMCFW) with a C4.5 decision tree in the classification of sleep stages. Whilst classifying sleep stages with ten-fold cross validation by using frequency domain features belonging to EEG signals, a decision tree was obtained with a classification accuracy rate of 37.84 %. Sleep stages were classified, with a 41.85 % accuracy rate, by employing a decision tree based on frequency domain features which belonged to EEG and chin EMG signals. A 92.40 % classification accuracy rate was achieved for sleep stage classification using a decision tree on KMCFW-weighted frequency domain features belonging to an EEG signal. Tagluk et al. [33] used a feed-forward neural network structure for the classification of sleep stages. As input to the neural network, they preferred EEG, EMG and EOG data. The authors achieved a success rate of 74.7 % in a study of 5 classified stages. Fraiwan et al. [34] used time-frequency entropy values as attributes in different frequency bands. They preferred the linear discriminant analysis algorithm for the classification stage. The authors achieved a success rate of 84 % in a study of 6 classified stages. Ozsen [9] used EEG, EMG and EOG data to classify the sleep stages. First, she obtained time-frequency based attributes to represent this data. In the next step, a sequential class-dependent feature selection algorithm was used to determine the effective attributes. She presented the selected attributes as input to a neural network. In a study of 5 classified stages, a success rate of 90.93 % was achieved. Hsu et al. [35] obtained energy-based features from EEG signals to classify sleep data. These attributes were presented as input to a feedback neural network. The authors achieved a success rate of 87.2 % in a study of 5 classified stages.

A different approach to the classification of sleep stages is proposed by this study. One of the most important problems in classifying sleep stages is the identification of the features that will represent the data. Various algorithms used for the identification of features represent data, but most of the time, determining which algorithms identify the effective features for the specified problem depends on trial and error. A system that automatically identifies effective features will facilitate the work of researchers to a great extent. The current study seeks to determine a method that will ensure the identification of features that can best represent the data. Previous studies in the literature were researched with this purpose in mind, and 41 features that have previously been used in the classification of sleep stages were extracted. Feature selection algorithms are used to identify the features that provide the highest effect. Identification of the classification algorithm that provided the highest accuracy rates was possible with the features selected in the last stage.

Table 1 The distribution of sleep stages in the data set

Sleep stages Awake N-REM stage 1 N-REM stage 2 N-REM stage 3 N-REM stage 4 REM Total

Number of epochs in stages 1,109 897 988 1,078 764 324 5,160

The duration of each epoch is 30 s

Materials and methods

Data set

The data set used in the study was provided by St. Vincent's University Hospital and University College Dublin [36]. Data from 25 individuals was selected from the database. The demographic characteristics of the individuals whose data was used are as follows: 21 males and 4 females; age: 50±10 years, range 28–68 years; BMI: 31.6±4.0 kg/m2, range 25.1–42.5 kg/m2; AHI: 24.1±20.3, range 1.7–90.9. Polysomnogram recordings were obtained by utilizing the Jaeger-Toennies system. Using the 10–20 electrode placement system, each of these acquisitions consisted of 2 EEG channels (C3-A2 and C4-A1), 2 EOG channels and 1 EMG channel. Only one of the EEG channels (C3-A2) was used in this work. The sample rate was 256 Hz. In the pre-processing stage, a 10-point noncausal moving average filter was employed to smooth the EEG signal. A 10th order IIR Butterworth bandpass filter, with a frequency range of 0.1–60 Hz, was applied to the EEG signals in order to remove noise and artifacts. In addition, a 12th order stopband Butterworth notch filter at a frequency of 50 Hz was applied to the EEG signals in order to remove 50 Hz power line interference. Then, the EEG signal was segmented into epochs of 30 s, each epoch corresponding to a single sleep stage. Each segment was windowed using the Hamming window. The average recording time was 6.9 h. Sleep experts performed the sleep scoring. Table 1 presents the distribution of the sleep epochs belonging to the 25 subjects.

Feature extraction

Features in 4 different categories (time, non-linear, frequency-based and entropy) were identified in the feature extraction phase. A total of 21 feature algorithms were used: 10 in the time category, 5 in the non-linear category, 2 in the frequency-based category and 4 in the entropy category. 41 feature values were obtained from the 21 feature algorithms. The obtained features are displayed in Table 2.

Time domain features

Statistical measures In this phase, the statistical attributes of the EEG signals are obtained. Short explanations of the attributes are provided in Table 3.

Number of zero crossings (ZC) Zero-crossing is a time based feature widely used in electronics, mathematics, image processing and signal processing. It expresses the number of zero crossings generated in a segment. A zero crossing occurs when the sign of the signal differs between successive samples. This feature can be used as a measure that expresses the noise rate in the signal [37]. Researchers have observed that the number of zero crossings in EEG signals changes during brain activities and different sleep stages [38]. A zero crossing at sample n is defined by (xn−1 < 0 and xn > 0) or (xn−1 > 0 and xn < 0) or (xn−1 ≠ 0 and xn = 0).

Hjorth parameters Hjorth parameters (i.e., activity, mobility and complexity) are features that are often used in the analysis of EEG signals [39]. The first and second derivatives of the signal are used in calculating the Hjorth parameters, which are defined in Table 4.
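The pre-processing chain described under "Data set" above (moving-average smoothing, 0.1–60 Hz Butterworth bandpass, 50 Hz notch, 30 s Hamming-windowed epochs) can be sketched as follows. This is an illustrative reconstruction using SciPy, not the authors' code; note that applying a 5th order design with zero-phase `filtfilt` yields a 10th order overall response, one plausible reading of the "10th order noncausal" description.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 256  # sampling rate (Hz), as stated in the text

def preprocess(eeg, fs=FS):
    """Smooth, bandpass, notch-filter and epoch one raw EEG channel."""
    # 10-point noncausal moving average for smoothing
    smoothed = np.convolve(eeg, np.ones(10) / 10, mode="same")
    # Butterworth bandpass 0.1-60 Hz (order 5 -> 10th order via filtfilt)
    b, a = butter(5, [0.1, 60.0], btype="bandpass", fs=fs)
    x = filtfilt(b, a, smoothed)
    # Notch filter at 50 Hz to suppress power-line interference
    bn, an = iirnotch(50.0, Q=30.0, fs=fs)
    x = filtfilt(bn, an, x)
    # Segment into non-overlapping 30 s epochs, Hamming-windowed
    samples_per_epoch = 30 * fs
    n_epochs = len(x) // samples_per_epoch
    window = np.hamming(samples_per_epoch)
    epochs = [x[i * samples_per_epoch:(i + 1) * samples_per_epoch] * window
              for i in range(n_epochs)]
    return np.array(epochs)

# Example: two minutes of synthetic noise yields four 30 s epochs
rng = np.random.default_rng(0)
epochs = preprocess(rng.standard_normal(FS * 120))
```

The Q-factor of the notch filter is an assumption; the paper only specifies the 50 Hz center frequency.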

Table 2 Attribute table

N Feature N Feature N Feature N Feature N Feature

1 Arithmetic mean (AM) 10 Petrosian fractal dimension (PFD) 19 D3-1 28 D5-2 37 Hjorth complexity (HC)
2 Maximum value (MaxV) 11 Rényi entropy (REn) 20 D3-2 29 D5-3 38 Mean curve length (MCL)
3 Minimum value (MinV) 12 Spectral entropy (SpEn) 21 D3-3 30 D5-4 39 Hurst exponent (HE)
4 Standard deviation (SD) 13 Permutation entropy (PEn) 22 D3-4 31 A5-1 40 Mean energy (ME)
5 Variance (V) 14 Approximate entropy (ApEn) 23 D4-1 32 A5-2 41 Mean teager energy (MTE)
6 Median (MN) 15 Wigner ville coefficient 1 (WV-1) 24 D4-2 33 A5-3
7 Zero crossings (ZC) 16 Wigner ville coefficient 2 (WV-2) 25 D4-3 34 A5-4
8 Skewness (SK) 17 Wigner ville coefficient 3 (WV-3) 26 D4-4 35 Hjorth activity (HA)
9 Kurtosis (KH) 18 Wigner ville coefficient 4 (WV-4) 27 D5-1 36 Hjorth mobility (HM)
J Med Syst (2014) 38:18 Page 5 of 21, 18

Table 3 Short explanations of the statistical attributes

Minimum value (MinV): MinV = min[x_n]  (1)
Maximum value (MaxV): MaxV = max[x_n]  (2)
Standard deviation (SD): SD = sqrt( Σ_{n=1}^{N} (x_n − AM)^2 / (N−1) )  (3)
Arithmetic mean (AM): AM = (1/N) Σ_{n=1}^{N} x_n  (4)
Variance (V): V = Σ_{n=1}^{N} (x_n − AM)^2 / (N−1)  (5)
Skewness (S): S = Σ_{n=1}^{N} (x_n − AM)^3 / ((N−1) SD^3)  (6)
Kurtosis (K): K = Σ_{n=1}^{N} (x_n − AM)^4 / ((N−1) SD^4)  (7)
Median (MN): MN = ((N+1)/2)th value, if the number of values is odd  (8)
MN = [ (N/2)th value + (N/2 + 1)th value ] / 2, if the number of values is even  (9)

where x_n, n = 1, 2, 3, …, N, is a time series, N is the number of data points, and AM is the mean of the sample.
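As an illustration, the statistical measures of Table 3 and the zero-crossing count can be computed directly from their definitions. The following pure-Python sketch is ours, not the authors' code, and uses the (N−1)-denominator forms given in the table:

```python
import math

def stats_features(x):
    """Statistical attributes of a time series per Table 3 (N-1 denominators)."""
    n = len(x)
    am = sum(x) / n                                              # Eq. (4)
    v = sum((s - am) ** 2 for s in x) / (n - 1)                  # Eq. (5)
    sd = math.sqrt(v)                                            # Eq. (3)
    skew = sum((s - am) ** 3 for s in x) / ((n - 1) * sd ** 3)   # Eq. (6)
    kurt = sum((s - am) ** 4 for s in x) / ((n - 1) * sd ** 4)   # Eq. (7)
    xs = sorted(x)
    # Eqs. (8)-(9): middle value for odd N, mean of middle pair for even N
    median = xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2
    return {"min": min(x), "max": max(x), "AM": am, "V": v,
            "SD": sd, "S": skew, "K": kurt, "MN": median}

def zero_crossings(x):
    """Count zero crossings: a sign change between consecutive samples,
    or an exact zero preceded by a nonzero sample."""
    count = 0
    for prev, cur in zip(x, x[1:]):
        if (prev < 0 < cur) or (prev > 0 > cur) or (prev != 0 and cur == 0):
            count += 1
    return count

f = stats_features([1, 2, 3, 4, 5])
zc = zero_crossings([1, -1, 2, 0, 3])
```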

where σ0 is the variance (or mean power) of the signal, σ1 is the variance of the first derivative and σ2 is the variance of the second derivative of the signal.

Table 4 Hjorth parameters

Activity (HA): HA = σ0^2
Mobility (HM): HM = σ1/σ0
Complexity (HC): HC = sqrt( (σ2/σ1)^2 − (σ1/σ0)^2 )

Nonlinear-based features

Petrosian fractal dimension (PFD) Fractal dimension is a chaotic method that calculates the complexity of a signal [40]. PFD facilitates the speedy calculation of a fractal dimension, doing so by transforming the signal into binary sequences. It can be estimated by the following expression:

PFD = log10(k) / ( log10(k) + log10( k / (k + 0.4 Nδ) ) )  (10)

where k is the number of samples in the signal and Nδ is the number of sign changes in the signal derivative.

Mean teager energy (MTE) MTE is a feature value that is widely used in EEG research [2, 41]. MTE was first proposed in [41] and is defined as:

MTE[k] = (1/N) Σ_{m=k−N+3}^{k} ( x[m−1]^2 − x[m] x[m−2] )  (11)

where x[m] is an EEG time series, N is the window length and k is the last sample in the epoch.

Mean energy (ME) Energy values in signals increase along with increases in activity. When the different activities in different sleep stages are considered, mean energy is believed to be a good indicator. It is calculated as the average of the squares of all samples in the signal. ME is defined as:

ME[n] = (1/N) Σ_{m=k−N+1}^{k} x[m]^2  (12)

where x[m] is an EEG time series, N is the window length and k is the last sample in the epoch.

Mean curve length (MCL) MCL was proposed by Esteller et al. [42] to provide an estimate of the Katz fractal dimension [43]. It is commonly used in the identification of activity in EEG signals. MCL is defined as:

CL[n] = (1/N) Σ_{m=k−N+1}^{k} | x[m] − x[m−1] |  (13)

where x[m] is an EEG time series, N is the window length and k is the last sample in the epoch.

Hurst exponent (H) The Hurst exponent is used as a measure of long-range dependence within a signal, with a range of 0–1. H is used in EEG time-series analysis to represent the non-stationary EEG states observed in sleep. H is identified as:

H = log(R/S) / log(T)  (14)

Here, T is the duration of the data sample and R/S is the value of the rescaled range. A value of 0.5 represents a signal with the attributes of a random walk or Brownian motion. Values H < 0.5 show negative correlations between the increments (an anti-persistent time series), while values H > 0.5 reflect a positive correlation between the increments (a persistent time series).
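A minimal sketch (ours, not the authors' code) of the features above, computed per Eqs. (10)–(13) and Table 4 over a whole epoch, i.e., with window length N equal to the epoch length:

```python
import math

def petrosian_fd(x):
    """Petrosian fractal dimension, Eq. (10)."""
    k = len(x)
    diff = [b - a for a, b in zip(x, x[1:])]
    n_delta = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    return math.log10(k) / (math.log10(k) + math.log10(k / (k + 0.4 * n_delta)))

def mean_teager_energy(x):
    """Mean Teager energy, Eq. (11)."""
    n = len(x)
    return sum(x[m - 1] ** 2 - x[m] * x[m - 2] for m in range(2, n)) / n

def mean_energy(x):
    """Mean energy, Eq. (12): average of the squared samples."""
    return sum(s ** 2 for s in x) / len(x)

def mean_curve_length(x):
    """Mean curve length, Eq. (13)."""
    return sum(abs(b - a) for a, b in zip(x, x[1:])) / len(x)

def hjorth(x):
    """Hjorth activity, mobility and complexity (Table 4);
    sigma_i is taken as the standard deviation of the i-th derivative."""
    def std(v):
        m = sum(v) / len(v)
        return math.sqrt(sum((s - m) ** 2 for s in v) / len(v))
    d1 = [b - a for a, b in zip(x, x[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    s0, s1, s2 = std(x), std(d1), std(d2)
    return s0 ** 2, s1 / s0, math.sqrt((s2 / s1) ** 2 - (s1 / s0) ** 2)

pfd = petrosian_fd([0, 1, 0, 1, 0, 1, 0, 1])
me = mean_energy([1, 2, 3])
mcl = mean_curve_length([0, 1, 3])
mte = mean_teager_energy([1, 2, 3])
ha, hm, hc = hjorth([0, 1, 0, 1, 0, 1])
```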

Frequency and time-frequency based features

Wigner-Ville distribution (WV) WV is a time-frequency distribution and a useful method for the analysis of signals that potentially change through time. WV is defined as:

WV(t, ω) = Σ_{t'=−∞}^{∞} f(t + t'/2) f*(t − t'/2) e^{−j t' ω}  (15)

where WV(t, ω) is the energy distribution of the signal f(t) in time t and frequency ω, and * represents the complex conjugate of the signal.

Features are also extracted from the time-frequency plane obtained through the application of a Wigner-Ville transformation to the signal. The largest frequency that corresponds to each time value in the plane is used while features are extracted. The function formed by these frequency values was modelled with a polynomial, of the third degree in the current study, and the coefficients of this polynomial were used as features [2, 44]. As a result, 4 features composed of the coefficients of the polynomial (WV-1, WV-2, WV-3 and WV-4) were extracted.

Discrete wavelet transform (DWT) and wavelet coefficients This is a transformation method developed to overcome the deficiencies of the Fourier transform over non-stationary signals [45]; the method is less sensitive to noise and can easily be applied to non-stationary signals. The computational load of the continuous wavelet transform is rather heavy, and DWT is used in order to decrease this load. DWT is defined as:

DWT(j, k) = (1/sqrt(2^j)) ∫_{−∞}^{∞} x(t) ψ( (t − 2^j k) / 2^j ) dt  (16)

Here x(t) is the signal itself and ψ is the mother wavelet function. A low-pass filter is used to obtain the approximation coefficients, which carry the low-frequency content of the signal, and a high-pass filter is used to obtain the detail coefficients, which carry the high-frequency content. A sufficient number of coefficients should be calculated to obtain the most appropriate structure of the signal. Figure 2 displays the decomposition procedure of the x(n) signal.

As can be seen in the figure, the discrete x(n) signal passes through the high-pass filter to generate detail coefficients (Di[n]) and through the low-pass filter to obtain approximation coefficients (Ai[n]). At each decomposition level, the half-band filters produce signals spanning half of the frequency band.

For the DWT, which is frequently used in the analysis of EEG signals, it is important to identify an appropriate wavelet type and to determine the level of decomposition. The number of decomposition levels is chosen based on the dominant frequency components of the signal: the levels are chosen such that those parts of the signal that correlate well with the frequencies required for classification are retained in the wavelet coefficients. EEG signals carry significant information in the range of 0–30 Hz; therefore the decomposition level is set at 5. If a decomposition level below 5 were selected, sensitivity for the low frequency bands would be lost; for instance, the representation of the delta band would be difficult. Selecting a decomposition level above 5 would be unnecessary, because with the fifth level the representation of all EEG bands is possible. Thus, the signal is decomposed into the details D1–D5 and one final approximation, A5.

The frequency ranges of the sub-bands obtained by fifth-level DWT decomposition of the EEG data are given in Fig. 3. It can be seen from Fig. 3 that the A5 component lies within the delta range (1–4 Hz), the D5 component within the theta range (4–8 Hz), the D4 component within the alpha range (8–13 Hz) and the D3 component within the beta range (13–30 Hz). Because the D1 and D2 bands contain frequency content above 30 Hz, these sub-bands carry less meaning than the others. Therefore, in this study, the D3–D5 detail sub-bands and the A5 approximation band were used.

In choosing the appropriate wavelet, different wavelets were tested and it was observed that the best results were obtained with the fourth-order Daubechies wavelet. In order to investigate the effect of other wavelets on classification accuracy, tests were also carried out using other wavelets. Apart from Daubechies of order 4 (db4), Symmlet of order 10

Fig. 2 Sub-band decomposition of the DWT implementation: at each level the signal passes through the high-pass filter h[n] and the low-pass filter g[n], each followed by downsampling by 2, yielding detail coefficients D1, D2, D3, … and approximation coefficients A1, A2, A3, … (Filter-bank diagram omitted.)
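The filter-bank step shown in Fig. 2 can be illustrated with a single decomposition level. The sketch below is our simplified example using Haar filters (the paper itself uses db4 at five levels): convolve with the low-/high-pass filter pair, then downsample by 2, so each output band has half the input length.

```python
import math

def haar_dwt_level(x):
    """One DWT level with Haar filters: returns (approximation, detail),
    each half the input length (input length assumed even)."""
    s = 1 / math.sqrt(2)
    g = [s, s]    # low-pass (scaling) filter
    h = [s, -s]   # high-pass (wavelet) filter
    # Convolution followed by downsampling by 2 (stride-2 inner products)
    approx = [g[0] * x[i] + g[1] * x[i + 1] for i in range(0, len(x), 2)]
    detail = [h[0] * x[i] + h[1] * x[i + 1] for i in range(0, len(x), 2)]
    return approx, detail

# A1 holds the low-frequency half-band, D1 the high-frequency half-band
a1, d1 = haar_dwt_level([4.0, 4.0, 2.0, 6.0])
```

Repeating the step on the approximation output produces the A2/D2, A3/D3, … cascade of Fig. 2.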

Fig. 3 Frequency ranges of the sub-bands obtained by fifth-level DWT decomposition of the EEG data (original signal sampled at 256 Hz): A1 [0–64 Hz], D1 [64–128 Hz]; A2 [0–32 Hz], D2 [32–64 Hz]; A3 [0–16 Hz], D3 [16–32 Hz]; A4 [0–8 Hz], D4 [8–16 Hz]; A5 [0–4 Hz], D5 [4–8 Hz]
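The fifth-level db4 decomposition that produces these sub-bands can be reproduced with the PyWavelets package; the sketch below is illustrative (it assumes pywt is installed) and is not the authors' code.

```python
import numpy as np
import pywt

# Five-level DWT with the Daubechies-4 wavelet, as chosen in the text.
# wavedec returns the coefficient arrays as [A5, D5, D4, D3, D2, D1].
rng = np.random.default_rng(0)
epoch = rng.standard_normal(30 * 256)          # one 30 s epoch at 256 Hz
coeffs = pywt.wavedec(epoch, "db4", level=5)
a5, d5, d4, d3 = coeffs[0], coeffs[1], coeffs[2], coeffs[3]
# Only D3-D5 and A5 are kept, since D1/D2 lie above 30 Hz
kept = {"A5": a5, "D5": d5, "D4": d4, "D3": d3}
```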

(sym10), Coiflet of order 4 (coif4) and Daubechies of order 2 (db2) were also tried. It was noticed that the Daubechies wavelet gives better accuracy than the others, and db4 is slightly better than db2. In several successful studies related to EEG, it can be seen that 5 is selected as the level of decomposition and db4 is preferred as the wavelet type [8, 46, 47].

As an example, Fig. 4 shows the approximation (A5) and details (D1–D5) of an awake EEG signal, and Fig. 5 shows the approximation (A5) and details (D1–D5) of an N-REM stage 2 EEG signal. Statistical characteristics were calculated on the wavelet coefficients in order to decrease the dimensions of the extracted feature vectors. The statistical characteristics used to

Fig. 4 EEG signal belonging to the awake stage, with a 30 s epoch from the PSG recordings: original signal and its D1–D5 detail and A5 approximation coefficients. (Plot panels omitted.)

Fig. 5 EEG signal belonging to N-REM stage 2, with a 30 s epoch from the PSG recordings: original signal and its D1–D5 detail and A5 approximation coefficients. (Plot panels omitted.)

display the time-frequency distributions of the EEG signals are provided below:

1. Mean of the absolute values of the coefficients in each sub-band (D3-1, D4-1, D5-1, and A5-1).
2. Average power of the wavelet coefficients in each sub-band (D3-2, D4-2, D5-2, and A5-2).
3. Standard deviation of the coefficients in each sub-band (D3-3, D4-3, D5-3, and A5-3).
4. Ratio of the absolute mean values of adjacent sub-bands (D3-4, D4-4, D5-4, and A5-4).

Features 1–2 show the frequency distribution of the signal, whereas features 3–4 display the amount of change in the distribution of the frequency [48]. 16 features are obtained in this manner.

Entropy-based features

Spectral entropy (SpEn) SpEn is a feature that allows the identification of the degree of regularity in complex signals. The entropy of data with a regular probability distribution is higher; conversely, the entropy of data with an irregular probability distribution is low [49]. Unlike the usual entropy estimates, spectral entropy is calculated by using the probability values of the power spectrum of the signal. SpEn is defined as:

H_sp = − Σ_{a=f_a}^{f_b} P_a log(P_a)  (17)

where P_a is the power density over a defined frequency band of the signal, f_a and f_b are the lower and upper frequencies, and the power is normalized such that Σ P_a = 1 [2, 49]. H_sp is also used in the normalized form:

SpEn = H_sp / log(N_f)  (18)

where N_f is the number of frequencies within the defined band [f_a, f_b]. In this work the frequency band is specified as [0, 50] Hz.

Rényi entropy (RE) Rényi entropy, introduced by Rényi [50], is a special case of spectral entropy based on the concept of the generalized entropy of a probability distribution. If p is a probability distribution on a finite set, its RE of order q is defined to be:

S_q = (1/(1−q)) ln( Σ_i p_i^q )  (q ≠ 1 and q > 0)  (19)

RE is calculated with q = 2 in this study.

Approximate entropy (ApEn) ApEn is a statistical parameter which allows us to grasp the regularity of a time series [51, 52]. ApEn uses a non-negative number to quantify the complexity of the data and the formation of information in the time series: the higher the number, the more complex and irregular the data in the time series. ApEn has been applied to classify EEG in psychiatric diseases such as schizophrenia [49] and epilepsy [53].

For N data points x(1), x(2), …, x(N) with an embedding dimension n, the ApEn measure is given by:

ApEn(n, l, N) = (1/(N−n+1)) Σ_{i=1}^{N−n+1} log C_i^n(l) − (1/(N−n)) Σ_{i=1}^{N−n} log C_i^{n+1}(l)  (20)

where C_i^n(l) = (1/(N−n+1)) Σ_{j=1}^{N−n+1} θ( l − ||x_i − x_j|| ) is the correlation integral [54].

In this study, n was chosen as 2 and l was chosen as 0.15 times the standard deviation of the original data sequence. The n and l values were selected based on Pincus' results in previous studies, which indicated good statistical validity for ApEn [55].

Permutation entropy (PEn) PEn was developed by Bandt and Pompe [56]. PEn is a complexity measure for time series based on comparing neighboring values. It has the qualities of simplicity, sturdiness and very low computational cost [57]. Given a scalar time series (x_t, t = 1, 2, …, T), an embedding procedure forms the vector X_t = [x_t, x_{t+l}, …, x_{t+(n−1)l}] with embedding dimension n and lag l (here l = 1). Then, X_t is arranged in increasing order: x_{t+(j1−1)l} ≤ x_{t+(j2−1)l} ≤ … ≤ x_{t+(jn−1)l}. For n different numbers there are n! possible order patterns π, which are also called permutations. Let f(π) denote the frequency of a pattern in the time series; its relative frequency is p(π) = f(π)/(T−(n−1)l). The permutation entropy is then defined as [57]:

H_p(n) = − Σ_π p(π) ln p(π)  (21)

where the sum runs over all n! permutations. More details can be found in [56].

Feature selection algorithms

The feature selection process, which is an important part of pattern recognition and machine learning, reduces computation costs and increases classification performance. Finding a suitable representation of the data from among all of the features is an important problem in machine learning and data mining. The original features are not always all useful for classification or regression tasks since, in the distribution of the dataset,

some features may be irrelevant, redundant or noisy, and they reduce the classification performance. Therefore, the feature selection process ought to be used in classification or regression problems so that classification performance is enhanced and the computation cost of classifiers is reduced [58]. This study uses the efficient feature selection algorithms FCBF, t-test, ReliefF, Fisher score and mRMR. Brief information about these algorithms is provided below.

Fast correlation based filter (FCBF)

FCBF was developed based on the relevance and redundancy criteria among the features [59]. A Symmetrical Uncertainty (SU) value is used in the assessment of the relationships and redundancy between two features, or between a feature and a class. SU is based on entropy and is a measure of non-linear correlation. The SU value is calculated for variables A and B as shown in Eq. 22; here A and B denote a feature and a class label, or any two features.

    SU(A, B) = 2 [IG(A | B) / (H(A) + H(B))]    (22)

    IG(A | B) = H(A) − H(A | B)    (23)

    H(A) = −Σ_i P(a_i) log_2 P(a_i)    (24)

Here IG(A | B) is the information gain of A after observing variable B [59]. The entropies of variables A and B are H(A) and H(B), respectively. P(a_i) is the probability of variable A, and the entropy H(A | B) of A after observing the values of another variable B is defined as

    H(A | B) = −Σ_j P(b_j) Σ_i P(a_i | b_j) log_2 P(a_i | b_j)    (25)

where P(a_i) are the prior probabilities for all values of A and P(a_i | b_j) are the posterior probabilities of A given the values of B [59].

The FCBF algorithm is composed of two stages. First, SU values are calculated to evaluate the correlation between each feature and the class labels; features whose correlation values are equal to or under a specific threshold value are eliminated. In the second stage, redundant features among the remaining ones are identified with the help of the features that are most related to the class labels, and these redundant features are also eliminated. The features that survive both stages are identified as the features most related to the class labels and with the least amount of redundancy among themselves.

mRMR algorithm

The mRMR algorithm is a feature selection method which selects the features most relevant to the class labels whilst trying to minimize redundancy amongst the selected features [60]. mRMR uses mutual information to compute feature–feature and feature–label similarity scores. For features a and b, p(a) and p(b) are the marginal probability functions and p(a, b) is the joint probability distribution, while I(a, b) is the amount of mutual information of a and b:

    I(a, b) = Σ_{i,j} p(a_i, b_j) log [ p(a_i, b_j) / (p(a_i) p(b_j)) ]    (26)

The mutual information function allows speedy computation of non-linear similarities amongst the features. The mRMR method aims to minimize redundancy (Rd) whilst maximizing relevance (Re) amongst the features:

    Rd = (1 / |S|^2) Σ_{i,j ∈ S} I(i, j)    (27)

    Re = (1 / |S|) Σ_{i ∈ S} I(h, i)    (28)

Here S is the set of features, h is the target class, and I(i, j) is the mutual information between features i and j. The mutual information difference (MID) and mutual information quotient (MIQ) criteria combine the two quantities so as to maximize relevance with the class labels while minimizing redundancy; their definitions are shown in Eqs. (29) and (30).

    MID = max(Re − Rd)    (29)

    MIQ = max(Re / Rd)    (30)

Fisher score algorithm (FS)

Fisher score is an efficient and simple method which measures the distinctiveness between two classes. Using this method, the Fisher score value of each feature in the data set is computed according to Eq. 31 and, in order to select the efficient features from the data set, a threshold value is obtained by calculating the average of the Fisher score values of all features. If a feature's Fisher score value is higher than the threshold value, the feature is included in the attribute space of the feature data set. The Fisher score is calculated using the formula

    FS = [ Σ_{a=1}^{b} n_a (μ_{i,a} − μ_i)^2 ] / [ Σ_{a=1}^{b} n_a σ_{i,a}^2 ]    (31)

where μ_i is the mean of feature i over all samples; n_a is the number of samples in the a-th class; μ_{i,a} is the mean of feature i in the a-th class; and σ_{i,a}^2 is the variance of feature i in the a-th class.

T-test algorithm

The t-test is a widely applied method used to determine whether or not the means of two groups differ statistically. The t-test formula is the ratio given below: the numerator is the difference between the two means, and the denominator is a measure of the variability or dispersion of the scores.

    t = |μ_0 − μ_1| / sqrt[ (1/n_0 + 1/n_1) · ((n_0 − 1)σ_0^2 + (n_1 − 1)σ_1^2) / (n_0 + n_1 − 2) ]    (32)

where μ_i, σ_i^2 and n_i denote the mean, variance and sample size of class i, respectively.

ReliefF algorithm

ReliefF is a supervised feature weighting algorithm of the filter model. It determines the extent to which feature values discriminate between instances of different classes, and it is used to estimate the quality of the features according to this criterion. The ReliefF algorithm has the advantage of dealing with noisy and unknown data [61]. The ReliefF weight is calculated using the following formula:

    RF = (1/2) Σ_{a=1}^{b} [ dm(f_{a,t} − f_{PQ(x_a),t}) − dm(f_{a,t} − f_{PR(x_a),t}) ]    (33)

where f_{a,t} represents the value of instance x_a on the t-th feature; f_{PQ(x_a),t} and f_{PR(x_a),t} represent the values, on the t-th feature, of the nearest points to x_a; and dm represents the distance measurement.

Classifier algorithms

Decision trees (DT)

Since decision trees are easier to structure and comprehend compared to other algorithms, they are most often used in the solution of classification problems [62]. The reason why this method is widely used is the simplicity and ease of understanding of the rules used in tree structures. Decision trees use a multi-stage or consecutive approach in the classification procedure [63]: the tree is generated in the first stage, and data is applied, one by one, to the tree to undertake the classification process in the second stage. Many algorithms have been developed in relation to decision trees; the most widely known in the literature are ID3, C4.5 and C5. The current study utilizes the C4.5 algorithm. The C4.5 algorithm, calculated based on an entropy value, was developed by Quinlan [64]. It was built on the ID3 algorithm in order to overcome some deficiencies and problems in ID3, and it generates top-to-bottom decision trees just like ID3. The ID3 algorithm calculates the knowledge acquisition values for each node in the decision tree [65]. C4.5 calculates, along with the knowledge acquisitions, the knowledge acquisition rates for the characteristics in the lower set, and selects the characteristic with the highest knowledge acquisition rate as the node. The procedure continues until each branch of the tree corresponds to a single class [65]. Later, the decision tree is transformed into a rule set. For more information on C4.5 decision tree learning, the reader can refer to [62].

Feed-forward neural network (FFNN)

Artificial neural networks (ANNs) are mathematical systems that are composed of many operation units (neurons) connected to each other in a weighted manner [66]. An operation unit receives the signals from other neurons, combines them and generates a numerical result. In general, operation units roughly correspond to real neurons and they are combined with each other in a network [67, 68]; this structure forms the neural network. The current study uses feed-forward neural networks, one of the neural network models. There are mainly three layers in feed-forward artificial neural networks: the input layer, which receives the data entering the neural network; the hidden layer, where operations are undertaken and which trains itself according to the desired result; and the output layer, which displays the output results. The sigmoid activation function, which produced good results among the activation functions, was preferred in the current study.

Radial basis function network (RBF)

RBF networks were developed and inspired by the action and reaction behaviours observed in biological neural cells. Training of RBF models can be regarded as a curve fitting approach in multidimensional space [69]. Therefore, the training of RBF models turns into an interpolation problem through finding the most appropriate surface for the data in the output vector space. RBF models are defined in three layers, as in the ANN structure, and include input, hidden and output layers. However, contrary to classical ANN structures, RBF models use radial basis activation functions, and the mapping from the input layer to the hidden layer is non-linear. Many types of activation functions (linear, cubic, Gauss,

Fig. 6 The applied methods for classification of sleep stages (block diagram: raw EEG signals in 30-second segments → data preprocessing (band-pass filter, moving average filter, segmentation, windowing) → feature extraction → feature selection → classification into the sleep stages Awake, REM, N-REM 1, N-REM 2, N-REM 3 and N-REM 4 → performance evaluation)

multi-quadratic and reverse multi-quadratic) are used in RBF models. The Gauss function was preferred in the current study.

Support vector machine (SVM)

SVM is a machine learning method based on statistical learning theory [70]. SVM is a classification and regression method that easily classifies difficult (linear and non-linear) data sets with the help of kernel functions. It has been widely used recently since it has a strong theoretical foundation, it can be used with large data sets, it has a flexible algorithm along with its kernel functions, and it generates high accuracy rates in its results. For data that can be classified linearly, the method determines the linear function with the largest margin from among many linear functions [71]. It maps data that cannot be linearly classified into a higher-dimensional space by using kernel functions and finds the hyperplanes with the largest margins there. Polynomial, radial basis function (RBF), Pearson VII (PUK) and normalized polynomial kernels are used as kernel functions. RBF was used as the kernel function in the current study.

Random forest algorithm (RF)

In RF, more than one decision tree is used in the classification procedure to increase the classification value. Breiman [72] suggests combining multiple trees with multiple variables, each of which can be trained with a different training cluster, instead of generating a single decision tree. The different training clusters are generated from the original training set by using bootstrapping and random feature selection. First, each decision tree reaches a decision; then the class which receives the maximum votes in the decision forest is regarded as the final decision, and the input data is assigned to this class. The number of trees was determined to be 25 in the current study, based on good results obtained in previous studies.

Performance evaluation methods

Five methods were used for performance evaluation: classification accuracy, the confusion matrix, analysis of sensitivity and specificity, and k-fold cross-validation.

Classification accuracy, sensitivity and specificity

The following expressions are used for classification accuracy, sensitivity and specificity analysis:

    Classification accuracy = (TP + TN) / (TP + FN + FP + TN)  (%)    (34)
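The bagging-and-voting scheme attributed to Breiman above can be illustrated with a toy ensemble. For brevity the sketch below uses depth-1 decision stumps rather than full C4.5-style trees and omits per-node random feature subsampling, so it demonstrates only the bootstrap-and-majority-vote idea, not the study's actual RF classifier; all names are our own.

```python
import numpy as np

class Stump:
    """Depth-1 decision tree: best single-feature threshold split."""
    def fit(self, X, y):
        best_err = np.inf
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                pred = (X[:, j] > thr).astype(int)
                for pol in (pred, 1 - pred):      # try both polarities
                    err = np.mean(pol != y)
                    if err < best_err:
                        best_err = err
                        self.j, self.thr, self.flip = j, thr, pol is not pred
        return self
    def predict(self, X):
        pred = (X[:, self.j] > self.thr).astype(int)
        return 1 - pred if self.flip else pred

def random_forest_predict(X_train, y_train, X_test, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample; the class with the
    maximum votes in the forest is the final decision."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((n_trees, len(X_test)), dtype=int)
    for t in range(n_trees):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap
        votes[t] = Stump().fit(X_train[idx], y_train[idx]).predict(X_test)
    return (votes.mean(axis=0) > 0.5).astype(int)   # majority vote
```

Using 25 trees, as in the study, keeps the vote count odd so no ties occur for two classes.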

Fig. 7 Variation in classification accuracy and computation time with increasing number of selected features in results from Experiment 1 (a: accuracy rate (%) vs. number of features; b: computation time (s) vs. number of features, for the RF, DT, RBF, FFNN and SVM classifiers)
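The Fisher score (Eq. 31) and t-test (Eq. 32) rankings used in the experiments can be prototyped as below. This is an illustrative NumPy sketch with our own function names, not the study's Matlab implementation; features are ranked by descending score, e.g. `np.argsort(-fisher_scores(X, y))`.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature (Eq. 31): class-weighted scatter of the
    class means around the overall mean over the pooled class variance."""
    mu = X.mean(axis=0)                       # overall mean of feature i
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = len(Xc)
        num += n_c * (Xc.mean(axis=0) - mu) ** 2   # n_a (mu_ia - mu_i)^2
        den += n_c * Xc.var(axis=0)                # n_a sigma_ia^2
    return num / den

def t_scores(X, y):
    """Two-class t statistic per feature (Eq. 32), pooled variance."""
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    pooled = ((n0 - 1) * X0.var(axis=0, ddof=1)
              + (n1 - 1) * X1.var(axis=0, ddof=1)) / (n0 + n1 - 2)
    return np.abs(X0.mean(axis=0) - X1.mean(axis=0)) / np.sqrt(
        pooled * (1.0 / n0 + 1.0 / n1))
```

A feature that separates the classes receives a much larger score than a pure-noise feature under both criteria.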

Table 5 The best result of all methods and their corresponding size of selected feature subsets

Method   Feature number   Classification rate (%)
RF       12               96.39
DT       38               93.30
RBF      20               89.25
FFNN     39               71.53
SVM      34               93.27

Table 6 The best result of all methods and their corresponding size of selected feature subsets

Method   Feature number   Classification rate (%)
RF       37               96.33
DT       24               92.32
RBF      37               89.36
FFNN     36               71.14
SVM      30               93.26

    Sensitivity = TP / (TP + FN)  (%)    (35)

    Specificity = TN / (FP + TN)  (%)    (36)

where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively.

k-Fold cross-validation

As a first step, the data set is divided into k sub-clusters in a k-fold cross-validation test. (k − 1) sub-clusters are used for training, whereas one sub-cluster is used for testing the trained network. The process is continued until all sub-clusters have been left outside the training and tested. The success achieved on the tested data sets provides the degree of reliability and validity of the employed method. The average testing success over the k data sets is obtained to arrive at a single validity value.

Experimental results and discussion

A computer with an Intel(R) Core™ i7-2670QM 2.20 GHz microprocessor and 8 GB RAM was used to solve the problems. Figure 6 gives the block diagram of the proposed structure. The preprocessing of the EEG signals was completed in the first phase. In this phase, a band-pass filter, a Savitzky-Golay filter and the windowing and segmentation processes mentioned in Section 3.1 were applied, respectively. In the next phase, features of the data were extracted. For the 6 sleep stages, 6 separate matrices were created using Matlab R2009a. The sizes of these matrices for the Awake, N-REM stage 1, N-REM stage 2, N-REM stage 3, N-REM stage 4 and REM stages were 1109×3840; 897×3840; 988×3840; 1078×3840; 764×3840; and 324×3840 respectively. Here, the matrix dimensions denote the epoch number × the sample number in each epoch. The 41 separate feature parameters mentioned in Section 3 were obtained for each row (epoch) of the matrices. The next phase aimed to identify the efficient features from amongst the feature cluster. In order to achieve this aim, the best m features, as selected by the five feature selection methods, were used in each experiment (m = 1, 2, …, 41). By using the 10-fold cross-validation method on the RF, DT, FFNN, RBF and SVM classifiers, the best m features were evaluated for each feature selection method. Comparative analyses of the experiments and the obtained results are provided below.

Experiment 1: Fisher score feature selection algorithm

During experiment 1, the Fisher score algorithm was carried out with regard to the feature cluster. The ordering of efficient

Fig. 8 Variation in classification accuracy and computation time with increasing number of selected features in results from Experiment 2 (a: accuracy rate (%) vs. number of features; b: computation time (s) vs. number of features, for the RF, DT, RBF, FFNN and SVM classifiers)

Fig. 9 Variation in classification accuracy and computation time with increasing number of selected features in results from Experiment 3 (a: accuracy rate (%) vs. number of features; b: computation time (s) vs. number of features, for the RF, DT, RBF, FFNN and SVM classifiers)

features obtained through the use of the Fisher score algorithm was found to be: D5-1, PEn, D5-3, D3-3, MCL, PFD, D4-3, D5-2, MTE, D4-2, SD, HE, A5-4, A5-1, D4-1, A5-3, D3-2, MaxV, D3-1, MinV, HC, HM, V, ME, HA, ApEn, ZC, A5-2, SpEn, KT, SK, WV-4, D5-4, REn, MN, D4-4, WV-3, D3-4, WV-1, AM and WV-2. Figure 7 displays the classification accuracy rates and the computation times of the feature clusters identified by the Fisher score algorithm.

Figure 7a shows that, for some of the algorithms, even a single feature value allows high classification accuracies to be obtained. This shows that the selected feature clusters are efficient. The graph shows that, with the RF, DT and RBF algorithms, the feature cluster which includes the first three feature values obtains accuracy values higher than 80 %. It can be seen that the best results were obtained using the RF algorithm. The investigation of the other algorithms shows that, in terms of success, the RBF algorithm is more efficient over the first three features. After the first three feature clusters, the DT algorithm is more advantageous and continues to be so for the first 23 features. After the first 23 features, the SVM is advantageous and, in general, the lowest success level is seen with the FFNN algorithm.

Table 5 shows the best result for each method together with the corresponding size of the selected feature subset. It is identified that, with the use of the RF algorithm, the feature cluster which has 12 feature values obtains the highest classification accuracy rate, 96.39 %. The ordering of efficient features obtained through the use of the RF algorithm is found to be: 27 (D5-1); 13 (PEn); 29 (D5-3); 21 (D3-3); 38 (MCL); 10 (PFD); 25 (D4-3); 28 (D5-2); 41 (MTE); 24 (D4-2); 4 (SD) and 39 (HE). After the RF algorithm, the best scores were obtained with the DT, SVM, RBF and FFNN algorithms respectively.

Investigation of Fig. 7b shows that the computation time increases with the number of features. The algorithm which worked the fastest is the DT algorithm. In general, the DT algorithm is followed by the RF, SVM, FFNN and RBF algorithms respectively. The RBF algorithm is found to require the highest computation time. It can be seen that the neural network based algorithms require high computation times.

Table 7 The best result of all methods and their corresponding size of selected feature subsets

Method   Feature number   Classification rate (%)
RF       31               96.22
DT       39               92.73
RBF      37               89.36
FFNN     36               71.08
SVM      39               92.88

Experiment 2: mRMR feature selection algorithm

During experiment 2, the mRMR algorithm was carried out with regard to the feature cluster. The ordering of efficient features obtained through the use of the mRMR algorithm was found to be: PFD, PEn, HE, D5-1, WV-1, AM, WV-2, WV-3, WV-4, MaxV, D4-1, D4-3, D5-2, D4-2, D5-4, MN, MinV, D5-3, D3-3, D3-2, D3-1, MCL, A5-1, HC, SpEn, MTE, ME, HA, A5-2, ZC, V, SK, KT, A5-3, A5-4, SD, D3-4, ApEn, REn, D4-4 and HM. Figure 8 displays the classification accuracy rates and the computation times of the feature clusters identified by the mRMR algorithm.

Figure 8a indicates that the classification accuracy rate is low for the first five feature values and that it increases above 80 % in some of the sub-clusters which consist of 5 or more feature values. In terms of success, the SVM algorithm is found to be more efficient over the first five features. After the first five features, the RF algorithm becomes more advantageous. In general, the best results are obtained with the help of the RF algorithm. In terms of success, for the first 20 features,
algorithm. In terms of success, for the first 20 features,

Fig. 10 Variation in classification accuracy and computation time with increasing number of selected features in results from Experiment 4 (a: accuracy rate (%) vs. number of features; b: computation time (s) vs. number of features, for the RF, DT, RBF, FFNN and SVM classifiers)

the RF algorithm is followed by the DT and RBF algorithms. Following the first 20 features, the SVM algorithm becomes more advantageous. The lowest success level is found with the FFNN algorithm.

Table 6 shows the best result for each method together with the corresponding size of the selected feature subset. It is identified that, with the use of the RF algorithm, the feature cluster which has 37 feature values obtains the highest classification accuracy rate, 96.33 %. The ordering of efficient features obtained through the use of the RF algorithm is found to be: 10 (PFD), 13 (PEn), 39 (HE), 27 (D5-1), 15 (WV-1), 1 (AM), 16 (WV-2), 17 (WV-3), 18 (WV-4), 2 (MaxV), 23 (D4-1), 25 (D4-3), 28 (D5-2), 24 (D4-2), 30 (D5-4), 6 (MN), 3 (MinV), 29 (D5-3), 21 (D3-3), 20 (D3-2), 19 (D3-1), 38 (MCL), 31 (A5-1), 37 (HC), 12 (SpEn), 41 (MTE), 40 (ME), 35 (HA), 32 (A5-2), 7 (ZC), 5 (V), 8 (SK), 9 (KT), 33 (A5-3), 34 (A5-4), 4 (SD), 22 (D3-4).

Investigation of Fig. 8b shows that the computation time increases with the number of features. The algorithm which worked the fastest is the DT algorithm. In general, the DT algorithm is followed by the RF, SVM, FFNN and RBF algorithms respectively. The RBF algorithm is found to require the highest computation time. It can be seen that the neural network based algorithms require high computation times.

Experiment 3: t-test feature selection algorithm

During experiment 3, the t-test algorithm was carried out with regard to the feature cluster. The ordering of efficient features obtained through the use of the t-test algorithm was found to be: PFD, HM, WV-4, A5-4, D5-1, MaxV, HE, ZC, MTE, D4-4, SK, KT, SD, HC, MinV, REn, WV-1, D3-1, WV-3, V, SpEn, D4-1, ME, AM, MN, A5-1, D3-2, WV-2, D5-4, ApEn, A5-2, D3-4, D4-2, D5-2, A5-3, D3-3, D5-3, HA, PEn, D4-3 and MCL. Figure 9 displays the classification accuracy rates and the computation times of the feature clusters identified by the t-test algorithm.

Figure 9a indicates that the classification accuracy rate is low for the first four feature values, and that it increases above 80 % in some of the sub-clusters which consist of 4 or more feature values. In terms of success, the DT algorithm is found to be more efficient over the first five features. After the first five features, the RF algorithm becomes more advantageous. In general, the best results are obtained with the help of the RF algorithm. In terms of success, for the first 23 features, the RF algorithm is followed by the DT algorithm until, following the first 23 features, the SVM algorithm becomes more advantageous. The lowest success level is found with the FFNN algorithm.

Table 7 shows the best result for each method together with the corresponding size of the selected feature subset. It is identified that, with the use of the RF algorithm, the feature cluster which has 31 feature values obtains the highest classification accuracy rate, 96.22 %. The ordering of efficient features obtained through the use of the RF algorithm is found to be: 10 (PFD), 36 (HM), 18 (WV-4), 34 (A5-4), 27 (D5-1), 2 (MaxV), 39 (HE), 7 (ZC), 41 (MTE), 26 (D4-4), 8 (SK), 9 (KT), 4 (SD), 37 (HC), 3 (MinV), 11 (REn), 15 (WV-1), 19 (D3-1), 17 (WV-3), 5 (V), 12 (SpEn), 23 (D4-1), 40 (ME), 1 (AM), 6 (MN), 31 (A5-1), 20 (D3-2), 16 (WV-2), 30 (D5-4), 14 (ApEn), 32 (A5-2).

Table 8 The best result of all methods and their corresponding size of selected feature subsets

Method   Feature number   Classification rate (%)
RF       32               96.22
DT       36               92.61
RBF      37               89.36
FFNN     38               71.49
SVM      26               93.12

Investigation of Fig. 9b shows that the computation time increases with the number of features. The

Fig. 11 The classification accuracy rates (a) and computation times (b) for the feature cluster identified by the FCBF algorithm, for the RF, DT (C4.5), RBF, FFNN and SVM classifiers

algorithm which worked the fastest is the DT algorithm. In general, the DT algorithm is followed by the RF, SVM, FFNN and RBF algorithms respectively. The RBF algorithm is found to require the highest computation time. It can be seen that the neural network based algorithms require high computation times.

Experiment 4: ReliefF feature selection algorithm

During experiment 4, the ReliefF algorithm was carried out with regard to the feature cluster. The ordering of efficient features obtained through the use of the ReliefF algorithm was found to be: PFD, HE, PEn, MaxV, D5-1, ZC, SK, MinV, SpEn, D5-4, D5-3, AM, D4-1, D5-2, MN, HC, KT, D4-3, REn, WV-3, WV-2, D3-1, WV-1, D4-2, D3-3, MCL, ApEn, A5-1, D3-2, MTE, A5-3, WV-4, A5-4, HM, SD, D4-4, A5-2, ME, HA, V and D3-4. Figure 10 displays the classification accuracy rates and the computation times of the feature clusters identified by the ReliefF algorithm.

Figure 10a indicates that the classification accuracy rate is low up to the first 3 feature values. It then increases above 80 % in some of the sub-clusters which consist of 3 or more feature values. In general, the best results are obtained by using the RF algorithm. In terms of success, the RF algorithm is followed by the DT, RBF and SVM algorithms. The SVM algorithm's success is found to increase, especially after 15 features. The lowest success level is found with the FFNN algorithm.

Table 8 shows the best result for each method together with the corresponding size of the selected feature subset. It is identified that, with the use of the RF algorithm, the feature cluster which has 32 feature values obtains the highest classification accuracy rate, 96.22 %. The ordering of efficient features obtained through the use of the RF algorithm is found to be: 10 (PFD), 39 (HE), 13 (PEn), 2 (MaxV), 27 (D5-1), 7 (ZC), 8 (SK), 3 (MinV), 12 (SpEn), 30 (D5-4), 29 (D5-3), 1 (AM), 23 (D4-1), 28 (D5-2), 6 (MN), 37 (HC), 9 (KT), 25 (D4-3), 11 (REn), 17 (WV-3), 16 (WV-2), 19 (D3-1), 15 (WV-1), 24 (D4-2), 21 (D3-3), 38 (MCL), 14 (ApEn), 31 (A5-1), 20 (D3-2), 41 (MTE), 33 (A5-3), 18 (WV-4).

Investigation of Fig. 10b shows that the computation time increases with the number of features. The algorithm which worked the fastest is the DT algorithm. In general, the DT algorithm is followed by the RF, SVM, FFNN and RBF algorithms respectively. The RBF algorithm is found to require the highest computation time. It can be seen that the neural network based algorithms require high computation times.

Experiment 5: FCBF feature selection algorithm

The FCBF feature selection algorithm, an efficient feature selection method, was carried out during Experiment 5. As mentioned in Section 3.3.1, the FCBF algorithm works by elimination and, therefore, in contrast to the other feature selection algorithms, only one feature cluster was obtained. The ordering of efficient features obtained with the implementation of this algorithm was as follows: 10 (PFD); 13 (PEn); 27 (D5-1); 31 (A5-1); and 8 (SK). Figure 11 displays the classification accuracy rates and computation times for the feature cluster identified by the FCBF algorithm.

Fig. 12 The 5 attributes most often chosen as a result of the different experiments (histogram of the number of repetitions for the features 27 (D5-1), 10 (PFD), 13 (PEn), 2 (MaxV) and 39 (HE))

Figure 11a shows that the RF algorithm obtains the best result: with the use of the RF algorithm, the feature cluster with 5 feature values, identified with a 94.65 % accuracy rate, reaches the highest level of classification

Fig. 13 The classification accuracy rates (a) and computation times (b) for the feature cluster identified by the hybrid approach, for the RF, C4.5, RBF, FFNN and SVM classifiers (accuracy rates of 97.03, 92.35, 89.45, 71.88 and 93.21 % respectively; the RBF required the longest computation time, 5550.45 s)

accuracy. In terms of success, the RF algorithm is followed by the DT (88.5 %), the RBF (87.92 %), the SVM (72.07 %) and the FFNN (62.94 %).

Investigation of Fig. 11b shows that the algorithm which worked the fastest was the DT algorithm. In general, the DT algorithm was followed by the RF, SVM, FFNN and RBF algorithms respectively. The RBF algorithm was found to require the highest computation time. It was seen that the neural network based algorithms required high computation times.

Experiment 6: A hybrid approach

In the first five experiments, effective attribute rankings were obtained according to the 5 different attribute selection algorithms, and experiments were carried out depending on these rankings. A more functional approach was aimed at in the sixth experimental stage. At this stage, the most effective attributes obtained as a result of the 5 different experiments were determined. Accordingly, the first 10 attributes obtained as a result of each algorithm were noted. To view the repetition frequency of the selected attributes, a histogram was plotted. According to the histogram chart, a new set of attributes was obtained by keeping the attributes selected 3 or more times, and experiments were carried out with this set of attributes. The obtained attributes, and the histogram showing the number of times they were selected, are shown in Fig. 12.

As can be seen in Fig. 12, 5 attributes were selected during the stage of experiment 6. These are the attribute values 27 (D5-1), 10 (PFD), 13 (PEn), 2 (MaxV) and 39 (HE), respectively. These five attributes were presented as input to the 5 different classification algorithms, and the success rates obtained are presented in Fig. 13.

Figure 13a shows that the RF algorithm obtains the best result: with the use of the RF algorithm, the feature cluster with 5 feature values reaches the highest level of classification accuracy, a 97.03 % accuracy rate. In terms of success, the RF algorithm is followed by the SVM (93.21 %), the DT (92.35 %), the RBF (89.45 %) and the FFNN (71.88 %).

Investigation of Fig. 13b shows that the algorithm which worked the fastest was the DT algorithm. In general, the DT algorithm was followed by the RF, SVM, FFNN and RBF algorithms respectively. The RBF algorithm was found to require the highest computation time. It was seen that the neural network based algorithms required high computation times.

General assessment

Based on the results, the RF algorithm was found to be the best algorithm in terms of success. Following that, the SVM, DT and RBF algorithms were found to be effective in different feature clusters. Similar results were obtained in
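The histogram-based selection of Experiment 6 can be reproduced from the top-10 orderings reported in Experiments 1–5: count how often each attribute index appears among a ranking's first ten entries and keep those chosen three or more times. The sketch below uses the orderings as reported in the text (the helper name is our own) and recovers the five attributes of Fig. 12.

```python
from collections import Counter

def hybrid_select(rankings, top=10, min_count=3):
    """Count how often each feature appears in the first `top` positions
    of each ranking; keep features chosen by >= min_count selectors."""
    votes = Counter(f for rank in rankings for f in rank[:top])
    return [f for f, c in votes.most_common() if c >= min_count]

# Top-10 feature indices reported for each selector in the text:
rankings = [
    [27, 13, 29, 21, 38, 10, 25, 28, 41, 24],   # Fisher score (Exp. 1)
    [10, 13, 39, 27, 15, 1, 16, 17, 18, 2],     # mRMR (Exp. 2)
    [10, 36, 18, 34, 27, 2, 39, 7, 41, 26],     # t-test (Exp. 3)
    [10, 39, 13, 2, 27, 7, 8, 3, 12, 30],       # ReliefF (Exp. 4)
    [10, 13, 27, 31, 8],                        # FCBF (Exp. 5, 5 features)
]
selected = hybrid_select(rankings)
# selected contains {27, 10, 13, 2, 39}, the five attributes of Fig. 12
```

Indices 27 (D5-1) and 10 (PFD) are chosen by all five selectors, which is why they dominate the histogram.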

Table 9 The best result of all experiments

Experiment     Feature selection algorithm   Classification algorithm   Feature number   Classification rate (%)
Experiment 1   Fisher score                  RF                         12               96.39
Experiment 2   mRMR                          RF                         37               96.33
Experiment 3   t-test                        RF                         31               96.22
Experiment 4   ReliefF                       RF                         32               96.22
Experiment 5   FCBF                          RF                         5                94.65
Experiment 6   A hybrid approach             RF                         5                97.03
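The voting scheme of Experiment 6 (take the top 10 attributes from each of the 5 rankings, then keep the attributes chosen by 3 or more algorithms) can be sketched as below. The ranking lists here are illustrative placeholders only, not the paper's actual ranking output; they are merely constructed so that the surviving set matches the five features reported in Table 9.

```python
from collections import Counter

def hybrid_select(rankings, top_n=10, min_votes=3):
    """Keep attributes appearing in the top-`top_n` of at least
    `min_votes` of the given ranking lists (Experiment 6 style)."""
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:top_n])          # top-10 of each algorithm
    return sorted(a for a, v in votes.items() if v >= min_votes)

# Illustrative attribute-index rankings, NOT the paper's real rankings.
rankings = [
    [27, 10, 13, 2, 39, 5, 8, 1, 40, 22],    # e.g. Fisher score
    [27, 13, 10, 39, 2, 7, 4, 11, 30, 18],   # e.g. mRMR
    [27, 10, 2, 39, 13, 6, 9, 12, 31, 20],   # e.g. t-test
    [10, 27, 13, 2, 39, 14, 16, 19, 23, 3],  # e.g. ReliefF
    [27, 39, 10, 13, 2, 15, 17, 21, 25, 28], # e.g. FCBF
]
print(hybrid_select(rankings))  # -> [2, 10, 13, 27, 39]
```

With these placeholder rankings the surviving cluster is [2, 10, 13, 27, 39], i.e. the MaxV, PFD, PEn, D5-1 and HE parameters listed for Experiment 6 in Table 9.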
J Med Syst (2014) 38:18 Page 17 of 21, 18

Fig. 14 Performance statistics for each sleep stage (percentage values)

Stage     Accuracy   Sensitivity   Specificity
Wake      100        100           100
Stage 1   98         96            99
Stage 2   94         95            99
Stage 3   96         96            99
Stage 4   97         97            99
REM       97         95            100

different experiments relating to the identification of computation time. The DT algorithm was found to be the one with the lowest computation time. It appears that, in general, this algorithm is followed by the RF, SVM, FFNN and RBF algorithms respectively.

Table 9 presents the best results obtained through each experiment. The best result was obtained through using the feature cluster which included 5 feature values obtained by combining the hybrid approach and RF algorithms.

The best method's success was evaluated according to the performance evaluation criteria mentioned in Section 3.5 (see Fig. 14). These statistics were calculated based on a one-versus-all classification (where the analyzed stage was positive and all other combined stages were negative). The highest classification accuracy was obtained in the wake stage and, in terms of classification accuracy, N-REM stage 1, N-REM stage 4, REM, N-REM stage 3 and N-REM stage 2 were found to follow the wake stage. Efficient results were observed also in terms of other statistical parameters. The proposed method was found to produce efficient results in terms of both success and speed performance.

The errors in each stage can be investigated by exploring the confusion matrix, as shown in Table 10. This indicates the agreement between the proposed method and the experts' scores. The expert score was obtained by the consensus of the three experts. Also, the proposed method substantially solved the problem of combining Stages 3 and 4, an aspect that the literature regarded as an important problem.

The above-mentioned analyses, carried out in terms of the classification problem, were developed by taking into consideration the R&K criteria as specified in the study's introduction. There was also an assessment of the success of the proposed method's classification regarding sleep stages identified according to the American Academy of Sleep Medicine (AASM) standards [73]. Under Dr. Conrad Iber's chairmanship, in 2007 the AASM determined new rules for scoring sleep, and these new rules were employed in sleep staging. The following changes were made in the sleep stage definitions: S1,

Table 10 Scoring agreement between manual scoring and the proposed method (according to R&K rules)

                       Expert's score
Proposed method        Awake   Stage 1   Stage 2   Stage 3   Stage 4   REM
Awake                  1,109   0         0         0         0         0
Stage 1                0       875       25        7         0         0
Stage 2                0       15        931       25        5         0
Stage 3                0       7         15        1,037     15        3
Stage 4                0       0         7         5         742       8
REM                    0       0         10        4         2         313
Accuracy (%)           100     97.55     94.23     96.20     97.12     96.60
Overall Accuracy (%)   97.03
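The agreement figures in Table 10 can be recomputed directly from the confusion matrix. The sketch below (plain Python, matrix values taken from Table 10 itself) reproduces the per-stage and overall accuracy rows, and adds a one-versus-all helper of the kind described for Fig. 14, where the analyzed stage is treated as positive and all other stages combined as negative.

```python
# Confusion matrix from Table 10 (rows: proposed method, columns: experts' score)
stages = ["Awake", "Stage 1", "Stage 2", "Stage 3", "Stage 4", "REM"]
cm = [
    [1109,   0,   0,    0,   0,   0],
    [   0, 875,  25,    7,   0,   0],
    [   0,  15, 931,   25,   5,   0],
    [   0,   7,  15, 1037,  15,   3],
    [   0,   0,   7,    5, 742,   8],
    [   0,   0,  10,    4,   2, 313],
]
n = len(stages)
total = sum(sum(row) for row in cm)
diag = sum(cm[i][i] for i in range(n))

# Overall agreement (last row of Table 10).
overall = 100.0 * diag / total

# Per-stage agreement: diagonal / expert-column sum ("Accuracy (%)" row).
col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
per_stage = [100.0 * cm[c][c] / col_sums[c] for c in range(n)]

def one_vs_all(c):
    """One-versus-all sensitivity and specificity for stage index c."""
    tp = cm[c][c]
    fp = sum(cm[c]) - tp        # scored c by the method, other by experts
    fn = col_sums[c] - tp       # scored c by experts, other by the method
    tn = total - tp - fp - fn
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)

print(round(overall, 2))                 # 97.03
print([round(a, 2) for a in per_stage])  # [100.0, 97.55, 94.23, 96.2, 97.12, 96.6]
```

Recomputing the diagonal sum against the 5,160 scored epochs gives 5,007/5,160 ≈ 97.03 %, matching the table's overall accuracy.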

S2, S3 and S4 are used instead of N-REM stage 1, N-REM stage 2, N-REM stage 3 and N-REM stage 4, respectively [32, 73]. Since the characteristics of N-REM stage 3 and N-REM stage 4 are very similar, the AASM combined stages S3 and S4 into the deep sleep, or slow wave sleep (SWS), stage. This was done to facilitate simple and accurate sleep staging. The term SWS was used to reinforce this stage's physical meaning. In the same vein, the current study utilizes the five-stage classification: Awake, S1, S2, SWS and REM. During the study, experts undertook the sleep scoring process by following AASM standards. Table 11 displays the results obtained from the confusion matrix table and the accuracy rates obtained in each stage.

Table 11 Scoring agreement between manual scoring and the proposed method (according to AASM rules)

                       Expert's score
Proposed method        Awake   Stage 1   Stage 2   SWS     REM
Awake                  1,109   0         0         0       0
Stage 1                0       881       18        10      0
Stage 2                0       9         947       16      4
SWS                    0       7         17        1,804   3
REM                    0       0         6         12      317
Accuracy (%)           100     98.21     95.85     97.93   97.83
Overall Accuracy (%)   98.02

As far as the AASM standards were concerned, an accuracy rate of 98.02 % was obtained. This value was found to be higher than the result obtained under the R&K standards. This difference in accuracy rates was due mainly to the similarity of the N-REM levels in the R&K standards, and this similarity reduced the success of the classification. Combining S3 and S4, based on the AASM recommendations, enabled us to obtain better results.

The performance of the proposed procedure was compared with the recent work available from the literature listed in Table 12. Table 12 shows that the performance of automatic sleep stage classification implementations including 6 sleep stages was generally in the range of 70–93 %. Many of the studies which obtained 90 % or higher compatibility generalized the stages into 3–4 stages. In this sense, it is believed that this study has contributed substantially to the field.

Conclusion

It is a difficult task to classify a patient's sleep stages. This requires the observation of the patient, an EEG, and the collection of additional clinical information. This study proposes an efficient model to help neurologists to analyze EEG signals with high rates of accuracy. The key parts of the study are as follows:

• The effect on performance of the features selected by the feature selection algorithm was found to be more positive and higher when compared to the use of all features.
• Investigation of the literature about EEG classification shows that many feature extraction algorithms have been used in studies, but the major problem is the identification of the effective features. This study is important also in the sense that it will prove to be a guide to researchers in the field of sleep study. The same method could be used in the diagnosis and scoring of epilepsy, the identification of anesthesia depth, migraine, etc., all of which use EEG signals, and efficient features could be identified.
• This study presents a unique analysis, both in the identification of the most effective features and in the identification of the most efficient algorithm in the classification of the problem. In the study, features were selected from amongst the features used for the representation of the problem, and the algorithm was chosen from amongst the 6 most popular classification algorithms. High levels of classification accuracy were obtained during the identification of efficient features and classification algorithms. Whilst using other biomedical signals, the same method could be used to achieve high rates of accuracy. Studies such as this one might be of service in discovering effective solutions to the question "Which feature algorithm should be employed to obtain the feature that best represents the data?"
• The method proposed in this study could be implemented, without difficulty, with other medical data, and it could be used in areas other than the classification of sleep stages. After establishing the algorithms used previously for the problems under consideration, researchers who were willing to employ the proposed method could easily identify the effective features. It was not at all difficult to create the format which the model required. After the data feed, the system administered automatically, in turn, the implementation of the feature algorithms, the feature selection algorithm and the classification algorithms. Also, the results of the analysis were provided graphically. Analysis shows that an important benefit was being able to obtain results in a relatively short time. In future studies, it is planned to prepare a visual interface for the model. The application of such an interface might increase its applicability.
• The effective features employed in this study were selected from amongst the 41 feature parameters mentioned previously. Undoubtedly, different features have also been used in the classification of EEG data; it is planned to undertake a more comprehensive study in the future that will expand the feature cluster.
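The automated pipeline described above (feature extraction, then feature selection, then classification) can be sketched in miniature as follows. Everything here is a stand-in: the statistics computed, the toy epochs and the majority-vote classifier are illustrative only, although line length and the maximum value (MaxV) are among the kinds of time-domain parameters the study references; the paper's actual system ran 20 feature algorithms, 5 selectors and 6 classifiers, with the random forest performing best.

```python
import random
from collections import Counter

def extract_features(epoch):
    """Stand-in for the paper's 20 feature algorithms / 41 parameters:
    simple time-domain statistics of one EEG epoch."""
    n = len(epoch)
    mean = sum(epoch) / n
    var = sum((x - mean) ** 2 for x in epoch) / n
    line_length = sum(abs(epoch[i] - epoch[i - 1]) for i in range(1, n))
    return [mean, var, max(epoch), min(epoch), line_length]

def select_features(vectors, selected):
    """Keep only the columns chosen by the feature-selection stage."""
    return [[v[i] for i in selected] for v in vectors]

class MajorityClassifier:
    """Stand-in for the random forest: predicts the most common label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, X):
        return [self.label for _ in X]

# Toy run: 4 random "epochs", 2 stage labels.
random.seed(1)
epochs = [[random.gauss(0, 1) for _ in range(100)] for _ in range(4)]
labels = ["Wake", "Wake", "S1", "Wake"]

X = select_features([extract_features(e) for e in epochs], selected=[1, 4])
pred = MajorityClassifier().fit(X, labels).predict(X)
print(pred)  # ['Wake', 'Wake', 'Wake', 'Wake']
```

Swapping the stand-ins for real feature algorithms, a ranking-based selector and a random forest reproduces the structure of the system described in the text, which is why the format the model requires is straightforward to prepare for other medical data.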
Table 12 Recent works for automatic sleep stage scoring

Authors | Sleep stages | Extracted features | Model | Accuracy rate
Šušmáková and Krakovská [25] | W, S1, S2, SWS, REM | EEG, EMG, EOG and ECG signals: fractal exponent and fractal dimension | Discriminant analysis | Classification error: 23 %
Chapotot and Becq [26] | W, transitional sleep (N1), shallow sleep (N2), deep sleep (N3), REM, movement time (MT) | EEG signals: Shannon entropy, Hjorth activity, mobility and complexity, Hurst exponent, spectral edge frequency 95 %, relative power: delta, theta, alpha, sigma, beta and gamma; EMG signals: Shannon entropy, spectral edge frequency 95 %, gamma relative power | FFNN (with back propagation algorithm) | W: 34 %, N1: 43 %, N2: 51 %, N3: 82 %, REM: 82 %, MT: 13 %
Zoubek et al. [28] | W, S1, S2, SWS, REM | EEG signals: FT coefficients; EMG signals: entropy; EOG signals: entropy, kurtosis number and standard deviation | FFNN (with back propagation algorithm) | 71 % (EEG only), 80 % (EEG, EOG and EMG); 84.57 %: W, 64.56 %: S1, 85.55 %: S2, 92.90 %: SWS, 72.81 %: REM
Tagluk et al. [33] | REM, S1 (drowsy), S2 (light sleep), S3 and S4 (deep sleep) | EEG signals: 5 s segments of 0.3–50 Hz; EMG signals: 40–4,000 Hz; LEOG and REOG signals: 0.5–100 Hz | FFNN (with back propagation algorithm) | 74.7 %
Sinha [24] | W, sleep spindles, REM | EEG signals: wavelet transform coefficients | FFNN (with back propagation algorithm), combined with content rules | 95.35 %
Fraiwan et al. [34] | W, N-REM 1, N-REM 2, N-REM 3, N-REM 4, REM | EEG signals: time frequency entropy at different frequency bands | Linear discriminant analysis | 84 %
Subasi et al. [27] | Alert, drowsy, sleep | EEG signals: wavelet transform | FFNN (with Levenberg–Marquardt algorithm) | 94.03 %
Ebrahimi et al. [30] | W, S1, S2, SWS, REM | EEG signals: wavelet packet coefficients | FFNN (with back propagation algorithm) | 93 %
Doroshenkov et al. [29] | W, N-REM 1, N-REM 2, N-REM 3, N-REM 4, REM | EEG signals: Fourier transform | Hidden Markov models | 92 %
Gunes et al. [32] | W, N-REM 1, N-REM 2, N-REM 3, N-REM 4, REM | EEG signals: Welch spectral analysis, k-means clustering based feature weighting | Decision tree (C4.5) | 92.40 %
Ozsen [9] | W, N-REM 1, N-REM 2, N-REM 3, REM | EEG, EOG and EMG signals: class-dependent sequential feature selection | ANN | 90.93 %
Hsu et al. [35] | W, N-REM 1, N-REM 2, SWS, REM | EEG signals: energy features | Elman recurrent neural classifier | 87.2 %
Proposed method | W, N-REM 1, N-REM 2, N-REM 3, N-REM 4, REM | EEG signals: 27 (D5-1), 10 (PFD), 13 (PEn), 2 (MaxV) and 39 (HE) | Random forest | 97.03 %
Proposed method | W, S1, S2, SWS, REM | EEG signals: 27 (D5-1), 10 (PFD), 13 (PEn), 2 (MaxV) and 39 (HE) | Random forest | 98.02 %

References

1. Pan, S. T., Kuo, C. E., Zeng, J. H., and Liang, S. F., A transition-constrained discrete hidden Markov model for automatic sleep staging. BioMedical Eng OnLine 11:52–71, 2012.
2. Sen, B., and Peker, M., Novel approaches for automated epileptic diagnosis using FCBF feature selection and classification algorithms. Turk J Electr Eng Comput Sci 21:2092–2109, 2013.
3. Fraiwan, L., Lweesy, K., Khasawneh, N., Wenz, H., and Dickhaus, H., Automated sleep stage identification system based on time-frequency analysis of a single EEG channel and random forest classifier. Comput Methods Prog Biomed 108(1):10–19, 2012.
4. Artan, R. B., and Yazgan, E., Epileptic seizure detection from SEEG data by using higher order statistics and spectra. itüdergisi 7:102–111, 2008.
5. Fathima, T., Bedeeuzzaman, M., Farooq, O., and Khan, Y. U., Wavelet based features for epileptic seizure detection. MES J Technol Manag 2(1):108–112, 2010.
6. Yuen, C. T., San, W. S., Rizoni, M., and Seong, T. C., Classification of human emotions from EEG signals using statistical features and neural network. Int J Integr Eng 1:71–79, 2009.
7. Albayrak, M., and Koklukaya, E., The detection of an epileptiform activity on EEG signals by using data mining process. e-Journal of New World Sci Acad 4(1):1–12, 2009.
8. Subasi, A., EEG signal classification using wavelet feature extraction and a mixture of expert model. Expert Syst Appl 32:1084–1093, 2007.
9. Ozsen, S., Classification of sleep stages using class-dependent sequential feature selection and artificial neural network. Neural Comput Applic, 2012. doi:10.1007/s00521-012-1065-4.
10. Gandhi, T. K., Chakraborty, P., Roy, G. G., and Panigrahi, B. K., Discrete harmony search based expert model for epileptic seizure detection in electroencephalography. Expert Syst Appl 39(4):4055–4062, 2012.
11. Mohseni, H. R., Maghsoudi, A., and Shamsollahi, M. B., Seizure detection in EEG signals: A comparison of different approaches. IEEE EMBS'06. pp. 6724–6727, 2006.
12. D'Alessandro, M., Vachtsevanos, G., Hinson, A., Esteller, R., Echauz, J., and Litt, B., A genetic approach to selecting the optimal feature for epileptic seizure prediction. IEEE EMBC'01. pp. 1703–1706, 2001.
13. Kannathal, N., Choo, M., Acharya, U., and Sadasivan, P., Entropies for detection of epilepsy in EEG. Comput Methods Prog Biomed 80:187–194, 2005.
14. Srinivasan, V., Eswaran, C., and Sriraam, N., Artificial neural network based epileptic detection using time domain and frequency domain features. J Med Syst 29:647–660, 2005.
15. Bruzzo, A. A., Gesierich, B., Santi, M., Tassinari, C. A., Birbaumer, N., and Rubboli, G., Permutation entropy to detect vigilance changes and preictal states from scalp EEG in epileptic patients - a preliminary study. Neurol Sci 29(1):3–9, 2008.
16. Geng, S., Zhou, W., Yuan, Q., Cai, D., and Zeng, Y., EEG non-linear feature extraction using correlation dimension and Hurst exponent. Neurol Res 33(9):908–912, 2011.
17. Bao, F. S., Lie, D. Y., and Zhang, Y., A new approach to automated epileptic diagnosis using EEG and probabilistic neural network. ICTAI'08. pp. 482–486, 2008.
18. Sezer, E., Isik, H., and Saracoglu, E., Employment and comparison of different artificial neural networks for epilepsy diagnosis from EEG signals. J Med Syst 36(1):347–362, 2012.
19. Holzmann, C. A., Pérez, C. A., Held, C. M., Martín, M. S., Pizarro, F., Pérez, J. P., Garrido, M., and Pierano, P., Expert-system classification of sleep/waking states in infants. Med Biol Eng Comput 37:466–476, 1999.
20. Oropesa, E., Cycon, H. L., and Jobert, M., Sleep stage classification using wavelet transform and neural network. ICSI Technical Report TR-99-008. pp. 1–7, 1999.
21. Agarwal, R., and Gotman, J., Computer-assisted sleep staging. IEEE Trans Biomed Eng 48:1412–1423, 2001.
22. Estrada, E., Nazeran, H., Nava, P., Behbehani, K., Burk, J., and Lucas, E., EEG feature extraction for classification of sleep stages. In: Proceedings of the 26th annual conference of the IEEE EMBS. San Francisco. pp. 196–199, 2004.
23. Becq, G., Charbonnier, S., Chapotot, F., Buguet, A., Bourdon, L., and Baconnier, P., Comparison between five classifiers for automatic scoring of human sleep recordings. Stud Comput Intell 4:113–127, 2005.
24. Sinha, R. K., Artificial neural network and wavelet based automated detection of sleep spindles, REM sleep and wake states. J Med Syst 32:291–299, 2008.
25. Šušmáková, K., and Krakovská, A., Discrimination ability of individual measures used in sleep stages classification. Artif Intell Med 44:261–277, 2008.
26. Chapotot, F., and Becq, G., Automated sleep-wake staging combining robust feature extraction, artificial neural network classification, and flexible decision rules. Int J Adapt Control Signal Process 24:409–423, 2010.
27. Subasi, A., Kiymik, M. K., Akin, M., and Erogul, O., Automatic recognition of vigilance state by using wavelet-based artificial neural network. Neural Comput Appl 14(1):45–55, 2005.
28. Zoubek, L., Charbonnier, S., Lesecq, S., Buguet, A., and Chapotot, F., Feature selection for sleep/wake stages classification using data driven methods. Biomed Signal Process Control 2:171–179, 2007.
29. Doroshenkov, L. G., Konyshev, V. A., and Selishchev, S. V., Classification of human sleep stages based on EEG processing using hidden Markov models. Biomed Eng 41:25–28, 2007.
30. Ebrahimi, F., Mikaeili, M., Estrada, E., and Nazeran, H., Automatic sleep stage classification based on EEG signals using neural networks and wavelet packet coefficients. Proceedings of IEEE EMBC. pp. 1151–1154, 2008.
31. Jo, H. G., Park, J. Y., Lee, C. K., An, S. K., and Yoo, S. K., Genetic fuzzy classifier for sleep stage identification. Comput Biol Med 40:629–634, 2010.
32. Gunes, S., Polat, K., Yosunkaya, S., and Dursun, M., A novel data pre-processing method on automatic determining of sleep stages: K-means clustering based feature weighting. Complex Systems and Applications - ICCSA. Le Havre, France. pp. 112–117, 2009.
33. Tagluk, M. E., Sezgin, N., and Akin, M., Estimation of sleep stages by an artificial neural network employing EEG, EMG and EOG. J Med Syst 34:717–725, 2010.
34. Fraiwan, L., Lweesy, K., Khasawneh, N., Fraiwan, M., Wenz, H., and Dickhaus, H., Classification of sleep stages using multi-wavelet time frequency entropy and LDA. Methods Inf Med 49(3):230–237, 2010.
35. Hsu, Y. L., Yang, Y. T., Wang, J. S., and Hsu, C. Y., Automatic sleep stage recurrent neural classifier using energy features of EEG signals. Neurocomputing 104:105–114, 2013.
36. Goldberger, A. L., Amaral, L. A., Glass, L., et al., PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220, 2000.
37. Smith, J. R., Karacan, I., and Yang, M., Automated EEG analysis with microcomputers. Sleep 1(4):435–443, 1979.
38. Quyen, M. L. V., Martinerie, J., Baulac, M., and Varela, F., Anticipating epileptic seizures in real time by a non-linear analysis of similarity between EEG recordings. Neuroreport 10:2149–2155, 1999.
39. Hjorth, B., Time domain descriptors and their relation to a particular model for generation of EEG activity. In: CEAN - Computerized EEG Analysis. Stuttgart: Gustav Fischer Verlag. pp. 3–8, 1975.
40. Petrosian, A., Kolmogorov complexity of finite sequences and recognition of different preictal EEG patterns. IEEE CBMS'95. pp. 212–217, 1995.
41. Gardner, A. B., Krieger, A. E., Vachtsevanos, G., and Litt, B., One-class novelty detection for seizure analysis from intracranial EEG. J Mach Learn Res 7:1025–1044, 2006.
42. Esteller, R., Echauz, J., Tcheng, T., Litt, B., and Pless, B., Line length: an efficient feature for seizure onset detection. IEEE EMBS'01. pp. 1707–1710, 2001.
43. Katz, M. J., Fractals and the analysis of waveforms. Comput Biol Med 18:145–156, 1988.
44. Avsar, E., Epileptic EEG signal classification using one-class support vector machines. M.Sc. Thesis, Istanbul Technical University, 2009.
45. Hasiloglu, A., Rotation-invariant texture analysis and classification by artificial neural networks and wavelet transform. Technical report, 1999.
46. Subasi, A., Application of adaptive neuro-fuzzy inference system for epileptic seizure detection using wavelet feature extraction. Comput Biol Med 37(2):227–244, 2007.
47. Mahajan, K., Vargantwar, M. R., and Rajput, M. S., Classification of EEG using PCA, ICA and neural network. Int J Eng Adv Technol (IJEAT) 1(1):1–5, 2011.
48. Peker, M., and Sen, B., A new complex-valued intelligent system for automated epilepsy diagnosis using EEG signals. Glob J Technol: 3rd World Conference on Inf Technol 3:1121–1128, 2013.
49. Sabeti, M., Katebi, S., and Boostani, R., Entropy and complexity measures for EEG signal classification of schizophrenic and control participants. Artif Intell Med 47:263–274, 2009.
50. Rényi, A., On a new axiomatic theory of probability. Acta Math Hung 6:285–335, 1955.
51. Approximate entropy, http://en.wikipedia.org/wiki/Approximate_entropy (Accessed: 10.10.2012)
52. Xu, L., Meng, M. Q. H., Qi, X., and Wang, K., Morphology variability analysis of wrist pulse waveform for assessment of arteriosclerosis status. J Med Syst 34(3):331–339, 2010.
53. Yuan, Q., Zhou, W., Li, S., and Cai, D., Epileptic EEG classification based on extreme learning machine and nonlinear features. Epilepsy Res 96:29–38, 2011.
54. Acharya, U. R., Joseph, K. P., Kannathal, N., Lim, C. M., and Suri, J. S., Heart rate variability: a review. Med Biol Eng Comput 44:1031–1051, 2006.
55. Pincus, S. M., and Goldberger, A. L., Physiological time-series analysis: what does regularity quantify? Am J Physiol 266:1643–1656, 1994.
56. Bandt, C., and Pompe, B., Permutation entropy: a natural complexity measure for time series. Phys Rev Lett 88(17):1–4, 2002.
57. Liu, X. F., and Wang, Y., Fine-grained permutation entropy as a measure of natural complexity for time series. Chinese Phys B 18:2690–2695, 2009.
58. Cao, B., Shen, D., Sun, J. T., Yang, Q., and Chen, Z., Feature selection in a kernel space. 24th Annual International Conference on Machine Learning. pp. 121–128, 2007.
59. Yu, L., and Liu, H., Feature selection for high-dimensional data: A fast correlation-based filter solution. ICML'03. pp. 856–863, 2003.
60. Ding, C., and Peng, H. C., Minimum redundancy feature selection from microarray gene expression data. Second IEEE Computational Systems Bioinformatics Conference. pp. 523–528, 2003.
61. Kononenko, I., Estimating attributes: Analysis and extensions of RELIEF. ECML'94. pp. 171–182, 1994.
62. Sen, B., Ucar, E., and Delen, D., Predicting and analyzing secondary education placement-test scores: A data mining approach. Expert Syst Appl 39(10):9468–9476, 2012.
63. Kavzoglu, T., and Colkesen, I., Classification of satellite images using decision trees: Kocaeli case. Electron J Map Technol 2(1):36–45, 2010.
64. Quinlan, J. R., C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo, 1993.
65. Akgobek, O., Application of inductive learning to gain knowledge of an expert system. VI. Production Research Symposium. pp. 1–4, 2006.
66. Yigit, S., Eryigit, R., and Celebi, F. V., Optical gain model proposed with the use of artificial neural networks optimised by artificial bee colony algorithm. Optoelectron Adv Mater Rapid Commun 5(9):1026–1029, 2011.
67. Celebi, F. V., A proposed CAD model based on amplified spontaneous emission spectroscopy. J Optoelectron Adv Mater 7(3):1573–1579, 2005.
68. Goktas, H., Cavusoglu, A., Sen, B., and Toktas, I., The use of artificial neural networks in simulation of mobile ground vehicles. Math Comput Appl 12(2):87–96, 2007.
69. Celebi, N., An accurate single CAD model based on radial basis function network. J Optoelectron Adv Mater Rapid Commun 4(4):498–501, 2010.
70. Cortes, C., and Vapnik, V., Support-vector networks. Mach Learn 20(3):273–297, 1995.
71. Ocak, H., A medical decision support system based on support vector machines and the genetic algorithm for the evaluation of fetal well-being. J Med Syst 37(2):1–9, 2013.
72. Breiman, L., Random forests. Mach Learn 45(1):5–32, 2001.
73. American Academy of Sleep Medicine Task Force, Sleep related breathing disorders in adults: recommendations for syndrome definition and measurement techniques in clinical research. Sleep 22:667–689, 1999.
