
MATLAB ANALYSIS OF EEG SIGNALS FOR

DIAGNOSIS OF EPILEPTIC SEIZURES

By
ASHWANI SINGH
117BM0731

Under The Guidance of

Prof. Bibhukalyan Prasad Nayak

Department of Biotechnology and Biomedical Engineering
National Institute of Technology Rourkela

ACKNOWLEDGEMENT

I would like to express my special appreciation and thanks to my professor, Prof. B.P. Nayak, who has been a tremendous mentor for me. I would like to thank him for encouraging my research and for allowing me to grow as a person. His advice, on research as well as on my career, has been priceless.

I thank my senior (PhD scholar) Rakesh Buhlan for giving me the opportunity to work on this project on signal processing using MATLAB, and for his inputs in suggesting various feature extraction techniques and algorithms.

Table of Contents

1. Introduction
2. MATLAB code for extracted features
3. Classification techniques and corresponding MATLAB codes
4. Analysis of features using hypothesis testing
5. Miscellaneous analysis tools
6. Discussion

INTRODUCTION

The following report presents a method for the diagnosis of epileptic seizures in patients with brain-related disorders. This is done by collecting large samples of EEG data from patients during seizure and seizure-free periods, and applying feature extraction techniques that help uniquely classify the seizure signals. The seizure data was extracted from ictal-region recordings; the intervals in which the patients showed no signs of seizure form the inter-ictal data. The signal sample space consists of 200 EEG epochs: 100 ictal and 100 inter-ictal.
This thesis focuses on the implementation of the paper:

Lan-Lan Chen, Jian Zhang, Jun-Zhong Zou, Chen-Jie Zhao, Gui-Song Wang, "A framework on wavelet-based nonlinear features and extreme learning machine for epileptic seizure detection", Biomedical Signal Processing and Control, Elsevier. doi: http://dx.doi.org/10.1016/j.bspc.2013.11.010

The wavelet decomposition technique was used to filter the signal into different sub-bands, each covering a unique frequency range. The sampling frequency used while extracting the signal was 256 Hz, so the maximum frequency of the EEG signal is 128 Hz. A 7-level decomposition with a Daubechies 4th-order (db4) filter was therefore used. The following features were used for classification:

1) Approximate entropy of main signal

2) Approximate entropy of sub-bands d3-d7

3) Sample entropy of main signal

4) Recurrence Quantification analysis (%DET, %ENTR, LEN, %REC)

5) Sub band Energy d3-d7

This makes a total of 16 features for each EEG epoch, giving a 200 x 16 feature matrix. The MATLAB code for extracting each feature is given in the appendices.
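The decomposition scheme can be cross-checked with a short Python sketch using the PyWavelets package (the random epoch, its length of 1024 samples, and the printed frequency ranges are illustrative assumptions, not the project's data):

```python
import numpy as np
import pywt

fs = 256.0                            # sampling frequency used in the report (Hz)
rng = np.random.default_rng(0)
epoch = rng.standard_normal(1024)     # stand-in for one EEG epoch

# 7-level discrete wavelet decomposition with the Daubechies-4 ('db4') wavelet
coeffs = pywt.wavedec(epoch, 'db4', level=7)   # [a7, d7, d6, d5, d4, d3, d2, d1]

# detail level dj covers approximately fs/2^(j+1) .. fs/2^j Hz
for j in range(1, 8):
    lo, hi = fs / 2 ** (j + 1), fs / 2 ** j
    print(f"d{j}: ~{lo:.1f}-{hi:.1f} Hz ({len(coeffs[8 - j])} coefficients)")
```

Under this scheme the five detail bands d3-d7 used as features cover roughly 1-32 Hz, where most clinically relevant EEG activity lies.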

Various classification algorithms were also explored: least-squares support vector machines, extreme learning machines, discriminant analysis, resilient back-propagation neural networks and the random forest algorithm. The accuracy of classification using the various features was studied and compared. ROC curves were plotted to study the sensitivity of the various features. Other miscellaneous analysis tools such as box plots, the Frost algorithm, scalograms and time-frequency plots were also used to study the signals.

EXTRACTED FEATURES

1) In statistics, approximate entropy (ApEn) is a technique used to quantify the amount of regularity and the unpredictability of fluctuations in time-series data.

The algorithm used to measure approximate entropy is as follows:

Two input parameters, the embedding dimension m and a tolerance window r, must be specified. Usually, m is set to 1, 2 or 3, and r is set to some percentage of the standard deviation of the amplitude of the time series. Given N data points from a time series {x(n)} = x(1), x(2), ..., x(N), ApEn can be calculated using the following steps:

Step 1 : Form m-dimensional vectors Xm(1) to Xm(N-m+1), defined as Xm(i) = [x(i), x(i+1), ..., x(i+m-1)], i = 1, 2, ..., N-m+1. Each vector is viewed as a template.

Step 2 : The distance between each template and the other templates, denoted d[Xm(i), Xm(j)], is computed as the maximum absolute difference between their scalar components:

d[Xm(i), Xm(j)] = max(|x(i+k) - x(j+k)|), k = 0, 1, ..., m-1.

Step 3 : For a given template Xm(i), count the number of template matches, denoted Øi, i.e. the number of j (1 <= j <= N-m+1) satisfying the distance d[Xm(i), Xm(j)] <= r. Then Øi^m(r) is the probability that any template Xm(j) matches Xm(i):

Øi^m(r) = Øi / (N-m+1)

Step 4 : Compute Φ^m(r), the average of the natural logarithms of the probabilities Øi^m(r):

Φ^m(r) = (1/(N-m+1)) * Σ_i ln Øi^m(r)

Step 5 : Increase the dimension to m+1 and follow steps 1-4 to compute Øi^{m+1}(r) and Φ^{m+1}(r).

Step 6 : The approximate entropy is defined as:

ApEn(m, r, N) = Φ^m(r) - Φ^{m+1}(r)

The code for computing approximate entropy for all the data is given in Appendix 1. This was done similarly for all sub-bands and the data was stored in Excel files.
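The six steps above can be condensed into a pure Python/NumPy sketch (an illustrative re-implementation, not the MATLAB approx_entropy routine of Appendix 1; here r is treated as an absolute tolerance and the test series is synthetic):

```python
import numpy as np

def approx_entropy(m, r, x):
    """Approximate entropy ApEn(m, r) of a 1-D series, following the
    step-by-step definition above (r is an absolute tolerance here)."""
    x = np.asarray(x, dtype=float)
    N = len(x)

    def phi(m):
        # Step 1: all overlapping templates of length m
        templates = np.array([x[i:i + m] for i in range(N - m + 1)])
        # Step 2: Chebyshev distance between every pair of templates
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Step 3: fraction of templates within tolerance r (self-match included)
        C = np.mean(d <= r, axis=1)
        # Step 4: average of the log-probabilities
        return np.mean(np.log(C))

    # Step 6: ApEn is the difference between successive phi values
    return phi(m) - phi(m + 1)

rng = np.random.default_rng(1)
print(approx_entropy(2, 0.5, rng.standard_normal(300)))
```

A perfectly regular (constant) series yields ApEn = 0, while random noise gives a strictly positive value, which is the property the classifier exploits.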

The approximate entropies were extracted and scatter plots were made for the
same. The scatter plots for different sub-bands are as shown below.

RESULTS : SCATTER PLOTS, D3 AND D4 SUB-BANDS

Approximate entropy of both signals for the d3 sub-band (blue: ictal, red: inter-ictal)

Approximate entropy of both signals for the d4 sub-band (blue: ictal, red: inter-ictal)

RESULTS : SCATTER PLOTS, D5 AND D6 SUB-BANDS

Approximate entropy of both signals for the d5 sub-band (blue: ictal, red: inter-ictal)

Approximate entropy of both signals for the d6 sub-band (blue: ictal, red: inter-ictal)

2) Sample Entropy : Sample entropy (SampEn) is a modification of approximate entropy (ApEn), used extensively for assessing the complexity of a physiological time series and thereby detecting diseased states. Unlike ApEn, SampEn shows good traits such as data-length independence and trouble-free implementation.

Sample Entropy is defined as follows (the standard form):

SampEn(m, r, N) = -ln(A/B)

where B is the number of template pairs of length m whose distance is within the tolerance r (self-matches excluded), and A is the number of such pairs of length m+1.

The code for computing the same is given in Appendix 2. This feature is extracted for all the data and stored in Excel files.
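A NumPy sketch of this definition (again an illustration, not the SampEn routine of Appendix 2; the tolerance r is absolute and the input series is synthetic):

```python
import numpy as np

def sample_entropy(m, r, x):
    """Sample entropy SampEn(m, r) = -ln(A/B), where B counts template
    pairs of length m within tolerance r (self-matches excluded) and A
    counts the same pairs at length m + 1."""
    x = np.asarray(x, dtype=float)

    def count_matches(m):
        t = np.array([x[i:i + m] for i in range(len(x) - m)])
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)
        # count ordered pairs i != j whose Chebyshev distance is <= r
        return np.sum(d <= r) - len(t)

    A = count_matches(m + 1)
    B = count_matches(m)
    return -np.log(A / B)

rng = np.random.default_rng(2)
print(sample_entropy(2, 0.5, rng.standard_normal(300)))
```

Unlike ApEn, the self-match exclusion here removes the bias towards regularity, which is why SampEn is less sensitive to the record length.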

3) Recurrence Quantification Analysis : Recurrence quantification analysis (RQA) is a method of nonlinear data analysis (cf. chaos theory) for the investigation of dynamical systems. It quantifies the number and duration of recurrences of a dynamical system represented by its phase-space trajectory.

Typically we extract 4 main parameters.

a) Recurrence rate (%REC), which is the density of recurrence dots in a recurrence plot. The recurrence rate corresponds to the probability that a specific state will recur. %REC is calculated as shown below, where R(i,j) is the N x N recurrence matrix:

%REC = (1/N^2) * Σ_{i,j} R(i,j)

The code for computing the same is given in Appendix 3. This feature is extracted for all the data and stored in Excel files.

b) Determinism (%DET), which is the ratio of recurrence dots on the diagonal structures to all recurrence dots. Diagonal lines represent epochs with similar time evolution of states; therefore, %DET is related to the determinism of the system. %DET is calculated as shown below, where P(l) is the frequency distribution of the lengths l of the diagonal structures in the RP and Lmin is a threshold that excludes the diagonal lines formed by the tangential motion of the phase-space trajectory (in this work Lmin = 2):

%DET = Σ_{l>=Lmin} l*P(l) / Σ_{l>=1} l*P(l)

c) ENTR refers to the Shannon entropy of the frequency distribution of the diagonal line lengths, which is a measure of the complexity of the recurrence structure. ENTR is calculated by

ENTR = -Σ_{l>=Lmin} p(l) * ln p(l), where p(l) is the probability of a diagonal line of length l.

d) LEN refers to the average diagonal line length of the recurrence plot. The code for the same is in Appendix 3.
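All four measures can be read off a thresholded recurrence matrix. The sketch below is an illustrative NumPy re-implementation, not the RPplot/Recu_RQA routines of Appendix 3; the embedding parameters (dim = 3, tau = 1, eps = 0.5) mirror the values used there:

```python
import numpy as np

def rqa(x, dim=3, tau=1, eps=0.5, lmin=2):
    """Recurrence-plot based RQA sketch: returns %REC, %DET, ENTR, LEN."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    emb = np.array([x[i:i + (dim - 1) * tau + 1:tau] for i in range(n)])
    # recurrence matrix: 1 where embedded states are closer than eps
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    rp = (d <= eps).astype(int)

    rec = rp.sum() / rp.size  # %REC: density of recurrence points

    # collect lengths of diagonal line segments (above the main diagonal)
    lengths = []
    for k in range(1, n):
        run = 0
        for v in list(np.diagonal(rp, offset=k)) + [0]:
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
    lengths = np.array(lengths)
    long_lines = lengths[lengths >= lmin]

    det = long_lines.sum() / max(lengths.sum(), 1)           # %DET
    avg_len = long_lines.mean() if long_lines.size else 0.0  # LEN
    # ENTR: Shannon entropy of the diagonal line-length distribution
    if long_lines.size:
        _, counts = np.unique(long_lines, return_counts=True)
        p = counts / counts.sum()
        entr = -np.sum(p * np.log(p))
    else:
        entr = 0.0
    return rec, det, entr, avg_len

t = np.linspace(0, 8 * np.pi, 200)
print(rqa(np.sin(t)))
```

A deterministic signal such as a sine wave produces long diagonal lines and hence a %DET close to 1, which is the behaviour these features are meant to capture.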

RECURRENCE PLOTS :

Recurrence plot of a normal and an abnormal signal

4) Sub-Band Energy : For the purpose of comparison with the three nonlinear methods introduced above, the energy in each of the sub-band signals {d3, d4, d5, d6, d7} is computed. The sub-band signals are not used directly as entries of the feature vector, since such a direct representation of the EEG waveform is too sensitive to noise and slight variations in morphology. Instead, the energy in each sub-band is used. An explicit representation of the features computed for each sub-band is:

Ej = Σ_k (dj(k))^2, j = 3, ..., 7, where dj(k) are the detail coefficients at level j.

The code for computing the sub-band energies for all the data is given in Appendix 4. This was done for all sub-bands and the data was stored in Excel files.
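The same computation in Python with PyWavelets (illustrative; 'periodization' mode is chosen so that the db4 transform is orthogonal and, by Parseval's relation, the coefficient energies sum exactly to the signal energy):

```python
import numpy as np
import pywt

rng = np.random.default_rng(3)
epoch = rng.standard_normal(1024)     # stand-in for one EEG epoch

coeffs = pywt.wavedec(epoch, 'db4', mode='periodization', level=7)
# coeffs = [a7, d7, d6, d5, d4, d3, d2, d1]

# energy of sub-band dj = sum of squared detail coefficients at level j
energies = {f'd{j}': float(np.sum(coeffs[8 - j] ** 2)) for j in range(3, 8)}
total = float(np.sum(epoch ** 2))
for band, e in energies.items():
    print(f'{band}: {e:.1f} ({100 * e / total:.1f}% of total energy)')
```

The five relative energies form the last block of the 16-entry feature vector.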

CLASSIFICATION TECHNIQUES AND CORRESPONDING MATLAB
CODES

The extracted features were all stored in an Excel file of size 200 x 16, the first 100 entries being ictal (seizure) signal features and entries 101 to 200 being inter-ictal. This data is then fed as training data to the classifiers. Various types of classifiers and their accuracies were studied in this report.

1) Support Vector Machines :

An LS-SVM, a least-squares support vector machine, was used for classification of this data. First a target vector was created for binary classification: seizure (ictal) signals were given a target of 0 and inter-ictal signals a target of 1. The code for classification using the LS-SVM in MATLAB is presented in Appendix 5.

The obtained accuracy for given training data was found to be 99.5%.
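scikit-learn has no LS-SVM, but the shape of the experiment can be mimicked with an ordinary kernel SVM on a synthetic stand-in for the 200 x 16 feature matrix (the class means and spreads below are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# synthetic stand-in for the 200 x 16 feature matrix:
# rows 0-99 "ictal" (target 0), rows 100-199 "inter-ictal" (target 1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 16)),
               rng.normal(1.5, 1.0, (100, 16))])
T = np.r_[np.zeros(100), np.ones(100)]

clf = SVC(kernel='rbf').fit(X, T)
train_acc = clf.score(X, T)
print(f'training accuracy: {train_acc:.3f}')
```

Note that, as in the report, this is training accuracy; a held-out test split would give a less optimistic estimate.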

ROC Plots Obtained using LS-SVM

2) Artificial neural networks with resilient back-propagation :

ACCURACY : 98.64 %

Back Prop ANN Training Accuracy
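scikit-learn offers no resilient back-propagation ('trainrp'); as an illustrative stand-in, an MLP with one hidden layer of 10 units (mirroring feedforwardnet(10) in the appendix) trained with the default solver behaves similarly on synthetic data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
# synthetic stand-in for the 200 x 16 feature matrix (two separable classes)
X = np.vstack([rng.normal(0.0, 1.0, (100, 16)),
               rng.normal(1.5, 1.0, (100, 16))])
T = np.r_[np.zeros(100), np.ones(100)]

# one hidden layer of 10 units; adam is used instead of resilient backprop
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(X, T)
print(f'training accuracy: {net.score(X, T):.3f}')
```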

3) CLASSIFICATION USING DISCRIMINANT ANALYSIS :


Linear discriminant analysis (LDA) and the related Fisher's linear discriminant
are methods used in statistics, pattern recognition and machine learning to find a
linear combination of features which characterizes or separates two or more
classes of objects or events. The resulting combination may be used as a linear
classifier, or, more commonly, for dimensionality reduction before later
classification. The code is in Appendix 6.

4) RANDOM FOREST ALGORITHM : (TreeBagger)


Random forests are an ensemble learning method for classification (and
regression) that operate by constructing a multitude of decision trees at training
time and outputting the class that is the mode of the classes output by individual
trees.

The accuracy obtained using this classifier was 99%. This usually depends on the number of trees used. The code for the same is in Appendix 7.
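An illustrative scikit-learn equivalent of the TreeBagger experiment, sampling one variable per split (NVarToSample = 1 maps to max_features = 1) and varying the number of trees, on synthetic stand-in features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
# synthetic stand-in for the 200 x 16 feature matrix
X = np.vstack([rng.normal(0.0, 1.0, (100, 16)),
               rng.normal(1.5, 1.0, (100, 16))])
T = np.r_[np.zeros(100), np.ones(100)]

accs = {}
for n_trees in (2, 10, 50):   # accuracy depends on the number of trees
    rf = RandomForestClassifier(n_estimators=n_trees, max_features=1,
                                random_state=0).fit(X, T)
    accs[n_trees] = rf.score(X, T)
print(accs)
```

The appendix uses only 2 trees; increasing n_estimators generally stabilizes the accuracy at the cost of training time.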

5) EXTREME LEARNING MACHINE :


ELM was originally proposed for standard single-hidden-layer feedforward neural networks (with random hidden nodes: random hidden neurons, random features), and has recently been extended to kernel learning as well.

ELM provides a unified learning platform with a widespread type of feature mappings and can be applied directly in regression and multi-class classification applications. From the optimization point of view, ELM has milder optimization constraints compared to SVM, LS-SVM and PSVM. In theory, ELM can approximate any continuous target function and classify any disjoint regions; compared to ELM, SVM, LS-SVM and PSVM achieve suboptimal solutions and require higher computational complexity (cf. details on the reasons why SVM/LS-SVM provide suboptimal solutions). The accuracy obtained using ELM was 93.5% and the code is in Appendix 8.
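The core of ELM, random untrained hidden weights with output weights solved in closed form by least squares, fits in a few lines of NumPy (an illustrative sketch with 20 sigmoid hidden neurons, matching elm_train(..., 20, 'sig'); the data are synthetic stand-ins):

```python
import numpy as np

rng = np.random.default_rng(7)
# synthetic stand-in for the 200 x 16 feature matrix
X = np.vstack([rng.normal(0.0, 1.0, (100, 16)),
               rng.normal(1.5, 1.0, (100, 16))])
T = np.r_[np.zeros(100), np.ones(100)]

# ELM: random input weights, sigmoid hidden layer, output weights solved
# in closed form by least squares (no iterative training at all)
L = 20                                    # hidden neurons
W = rng.standard_normal((X.shape[1], L))  # random, never trained
b = rng.standard_normal(L)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # hidden-layer output matrix
beta = np.linalg.pinv(H) @ T              # Moore-Penrose least-squares solution

pred = (H @ beta > 0.5).astype(float)
print(f'training accuracy: {np.mean(pred == T):.3f}')
```

The absence of iterative weight updates is what makes ELM training orders of magnitude faster than back-propagation.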

ANALYSIS OF FEATURES USING HYPOTHESIS TESTING :

TTEST HYPOTHESIS TESTING:

The two-sample t-test is used to check whether two data sets come from normal distributions with the same mean. If h = 1, the null hypothesis of equal means is rejected, i.e. the feature differs significantly between the ictal and inter-ictal classes; if h = 0 the null hypothesis is accepted. Thus, ideally, every discriminative feature should show h = 1. The code for the same is in Appendix 9.
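A sketch of the same test in Python with SciPy (the two samples are synthetic stand-ins for one feature over the 100 ictal and 100 inter-ictal epochs):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# stand-ins for one feature over 100 ictal and 100 inter-ictal epochs
ictal = rng.normal(1.0, 0.3, 100)        # e.g. higher d3 energy during seizures
interictal = rng.normal(0.4, 0.3, 100)

t, p = stats.ttest_ind(ictal, interictal)
h = int(p < 0.05)   # h = 1: equal means rejected, feature is discriminative
print(f't = {t:.2f}, p = {p:.2e}, h = {h}')
```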

MISCELLANEOUS ANALYSIS TOOLS :

WAVELET BASED SPECTROGRAM VIEW :

In signal processing, a scaleogram or scalogram is a visual method of displaying a wavelet transform. There are three axes: x representing time, y representing scale, and z representing coefficient value. The z axis is often shown by varying the colour or brightness. A scaleogram is the equivalent of a spectrogram for wavelets.

Scalogram View for a Normal EEG Signal

SPECTROGRAM PLOTS FOR WAVELET SUB-BANDS (Appendix 11)

Time Frequency Plots for Normal Signals

Time Frequency Plots for Abnormal Signals

Time-frequency plots for a seizure signal

How sensitive are the features to classification?

ROC plots for the different features and for the trained classifiers are shown below. ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones, independently from (and prior to specifying) the cost context or the class distribution. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.
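How such a curve is obtained from a single scalar feature can be sketched with scikit-learn (the scores are synthetic stand-ins; the AUC summarizes the curve, with 0.5 = chance and 1.0 = perfect separation):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(9)
labels = np.r_[np.ones(100), np.zeros(100)]          # 1 = ictal, 0 = inter-ictal
# a feature that separates the two classes imperfectly
scores = np.r_[rng.normal(1.0, 0.5, 100), rng.normal(0.0, 0.5, 100)]

# sweep the decision threshold and record (false positive, true positive) rates
fpr, tpr, thresholds = roc_curve(labels, scores)
print(f'AUC = {auc(fpr, tpr):.3f}')
```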

ROC Plot of Sub-band Energies

ROC Plot of Approximate Entropy

ROC Plot of Recurrence Quantification Analysis

ROC Plot of Sample Entropy

BOX PLOT ANALYSIS :

To further compare the two signals statistically, through the mean, median, quartiles, etc., box plots of the two signals for the different features were obtained.

Box plots of the d3 sub-band energy for seizure (left) and inter-ictal signals are shown, clearly indicating that a seizure signal carries more energy than a non-seizure signal.

Discussion
In the course of this project, wavelet-based signal processing techniques were studied and used for feature extraction. Features such as sample entropy, approximate entropy, recurrence quantifiers and wavelet energy of the main signal as well as the decomposed signals were extracted. These features were then used to classify the EEG signals.

Many different classification techniques and algorithms were used, and their accuracies were tested. The random forest algorithm was found to give the most accurate results. The results show that these extracted features could be used for seizure detection with good accuracy.

Scalograms, time-frequency plots, box plots and ROC curves were among the other tools used to extract information from the EEG signals.

APPENDIX

1) Code for computing approximate entropies :

CODE FOR APPROXIMATE ENTROPY (main signal) :

a=textread('S001.txt','%f');   % load one EEG epoch

approx_entropy(2,0.5,a)        % m = 2, r = 0.5

CODE FOR APPROXIMATE ENTROPY (d3 sub-band) :

i=1;

anNw=zeros(1,1);

a=textread('N001.txt','%f');

[C,L] = wavedec(a,7,'db4');      % 7-level db4 decomposition

D3 = wrcoef('d',C,L,'db4',3);    % reconstruct the d3 sub-band

anNw(i)=approx_entropy(2,0.5,D3); disp(anNw(i)); i=i+1;

2) CODE FOR SAMPLE ENTROPY (main signal) :

amp=zeros(1,1);

i=1;

a=textread('N001.txt','%f');

amp(i)=SampEn(2,0.5,a,1); i=i+1;

3) CODE FOR RECURRENCE QUANTIFICATION ANALYSIS :

rec=zeros(1,1);

det=zeros(1,1);

entr=zeros(1,1);

lent=zeros(1,1);

i=1;j=1;k=1;y=1;

a=textread('N001.txt','%f');

Q=transpose(a);

[RP,DD] = RPplot(Q,3,1,.5,0);

[RR,DET,ENTR,L] = Recu_RQA(RP,0)

rec(i)= RR; i=i+1;

det(j)=DET; j=j+1;

entr(k)=ENTR;k=k+1;

lent(y)=L;y=y+1;

4) CODE FOR SUB BAND ENERGIES :

E3=zeros(1,1);

E4=zeros(1,1);

E5=zeros(1,1);

E6=zeros(1,1);

E7=zeros(1,1);

i=1; j=1; k=1; l=1;m=1;

a=textread('S001.txt','%f');

[C,L] = wavedec(a,7,'db4');

[Ea,Ed] = wenergy(C,L);

E3(i)= Ed(3); i=i+1;

E4(j)=Ed(4); j=j+1;

E5(k)=Ed(5); k=k+1;

E6(l)=Ed(6); l=l+1;

E7(m)=Ed(7);m=m+1;

5) The Code for LS-SVM training and Accuracy :

X=xlsread('Final.xls');

X=transpose(X);

T=zeros(200,1);

for i=101:200

T(i)=1;

end

T=transpose(T);

SVMStruct = svmtrain(X,T,'method','LS')

plotroc(T,X)

%Define input A

Group = svmclassify(SVMStruct,A)

plotroc(T,X)

ACCURACY TEST : 99.5%



5b) Back Prop ANN Code with ROC Plot :

X=xlsread('Final.xls');

X=transpose(X);

T=zeros(200,1);

for i=101:200

T(i)=1;

end

T=transpose(T);

net = feedforwardnet(10,'trainrp')

net = train(net,X,T);

y=net(X);

plotroc(T,y)

6) LINEAR DISCRIMINANT ANALYSIS MATLAB CODE :

X=xlsread('Final.xls');

T=zeros(200,1);

for i=101:200

T(i)=1;

end

%Define the matrix Y of test samples to be classified

class = classify(Y,X,T);

ACCURACY : 99%
X=xlsread('Final.xls');

T=zeros(200,1);

for i=101:200

T(i)=1;

end

count=0;

A=xlsread('Final.xls');   % test samples (here: the training data itself)

for i=1:100

Sample=A(i, :) ;

class = classify(Sample,X,T);

if(class==0) count=count+1;

end

end

for i=101:200

Sample=A(i, :) ;

class = classify(Sample,X,T);

if(class==1) count=count+1;

end

end

Accuracy= count/200*100

7) RANDOM FOREST ALGORITHM CODE

X=xlsread('Final.xls');   % 200 x 16 feature matrix, rows = observations

T=zeros(200,1);

for i=101:200

T(i)=1;

end

B = TreeBagger(2,X,T,'NVarToSample',1)

%Define the matrix P of test samples

Class = predict(B,P)

ACCURACY OF RANDOM FOREST : 99%


X=xlsread('Final.xls');

T=zeros(200,1);

for i=101:200

T(i)=1;

end

B = TreeBagger(2,X,T,'NVarToSample',1)

count=0;

A=xlsread('Final.xls');

for i=1:100

Sample=A(i, :) ;class = predict(B,Sample);

if(strcmp(class,'0'))count=count+1;

end

end

for i=101:200

Sample=A(i, :) ;

class = predict(B,Sample);

if(strcmp(class,'1')) count=count+1;

end

end

Accuracy= count/200*100

8) EXTREME LEARNING MACHINE :

93.5 % ACCURACY
[TrainingTime,TrainingAccuracy] = elm_train('Final.txt', 1, 20,'sig')

X=xlsread('Final.xls');

%X=transpose(X);

T=zeros(200,1);

for i=101:200

T(i)=1;

end

%T=transpose(T);

plotroc(T,X)

9) TTESTS :

X=xlsread('N Approximate entropy d3.xls');

X=X';

Y=xlsread('S Approximate entropy d3.xls');

Y=Y';

[h,p] = ttest2(X,Y)

X=xlsread('N Approximate entropy d4.xls');

X=X';

Y=xlsread('S Approximate entropy d4.xls');

Y=Y';

[h,p] = ttest2(X,Y)

X=xlsread('N Approximate entropy d5.xls');

X=X';

Y=xlsread('S Approximate entropy d5.xls');

Y=Y';

[h,p] = ttest2(X,Y)

X=xlsread('N Approximate entropy d6.xls');

X=X';

Y=xlsread('S Approximate entropy d6.xls');

Y=Y';

[h,p] = ttest2(X,Y)

X=xlsread('N Approximate entropy d7.xls');

X=X';

Y=xlsread('S Approximate entropy d7.xls');

Y=Y';

[h,p] = ttest2(X,Y)

X=xlsread('N E3.xls');

X=X';

Y=xlsread('S E3.xls');

Y=Y';

[h,p] = ttest2(X,Y)

X=xlsread('N E4.xls');

X=X';

Y=xlsread('S E4.xls');

Y=Y';

[h,p] = ttest2(X,Y)

X=xlsread('N E5.xls');

X=X';

Y=xlsread('S E5.xls');

Y=Y';

[h,p] = ttest2(X,Y)

X=xlsread('N E6.xls');

X=X';

Y=xlsread('S E6.xls');

Y=Y';

[h,p] = ttest2(X,Y)

10) SCALOGRAM :

a=textread('S001.txt','%f');   % load one EEG epoch

COEFS = cwt(a,1:128,'db4');    % CWT coefficients over scales 1 to 128

SC = wscalogram('image',COEFS);

11) TIME FREQUENCY USING WAVELET :


a=textread('S010.txt');

[C,L] = wavedec(a,3,'db4');

D1 = wrcoef('d',C,L,'db4',1);

D2 = wrcoef('d',C,L,'db4',2);

D3 = wrcoef('d',C,L,'db4',3);

figure(1)

spectrogram(D1);

figure(2)

spectrogram(D2);

figure(3)

spectrogram(D3);


