Autism Severity Detection Using EEG
Signal
Sankalp Shrivastava
18EC35025
I hereby declare that the work contained in this report has been done by me under
the guidance of my supervisor Prof. Goutam Saha. The work has not been submitted
to any other institute for any degree or diploma. I have conformed to the norms and
guidelines given in the Ethical Code of Conduct of the Institute. Whenever I have used
materials (data, theoretical analysis, figures and text) from other sources, I have given
due credit to them by citing them in the text of the thesis and providing their details in
the references.
DEPARTMENT OF ELECTRONICS AND ELECTRICAL
COMMUNICATION ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR
KHARAGPUR - 721302, INDIA
Certificate
This is to certify that the project report entitled "Autism Severity Detection Using EEG Signal" submitted by Sankalp Shrivastava (Roll No. 18EC35025) to the Indian Institute of Technology Kharagpur towards partial fulfilment of the requirements for the award of the degree of Master of Technology in Electronics and Electrical Communication Engineering is a record of bonafide work carried out by him under my supervision and guidance during the year 2021-22.
Abstract
Background:
Materials and Method:
Results:
Conclusions:
Contents
Declaration
Certificate
Abstract
List of Figures
List of Tables
Acronyms
1 Introduction
2 Literature Review
2.1 Fundamentals of EEG measurement
3 Objectives
4 Motivation
5 Database Used
5.1 BCIAUT-P300: Benchmark dataset on Autism
5.1.1 P300-Based BCI System
5.1.2 BCI session procedure
5.1.3 Dataset Structure
6 Experimental Setup
6.1 Experimental Data
6.2 Data Preprocessing
6.3 Classification
6.3.1 EEGNet
6.3.2 CNN-BLSTM
6.4 Specifications
7 Experimental Results
7.1 Performance Metrics
7.2 Data Preprocessing Results
7.2.1 Filtering
7.2.2 Independent Component Analysis
7.2.3 Z-score normalization
7.3 EEGNet Results
7.4 CNN-BLSTM Results
8 Discussion
9 Conclusion
10 Future Work
Bibliography
List of Figures
List of Tables
Acronyms
CT Computed Tomography
ECG Electrocardiogram
EMG Electromyography
1 Introduction
According to estimates from the CDC’s Autism and Developmental Disabilities Monitoring (ADDM) Network, about 1 in 44 children is diagnosed with ASD [2]. ASD is currently incurable; however, its effects can be lessened by implementing comprehensive early interventions that improve children’s learning and functioning as well as their participation in their communities [3]. Early identification of ASD is crucial because younger children learn the necessary abilities more quickly, and because the early introduction of specialised education helps reduce some ASD symptoms sooner [4].
Deep Learning has significantly reduced the need for manual feature extraction, leading to state-of-the-art performance in areas like speech recognition and computer vision. In particular, the adoption of deep Convolutional Neural Networks (CNNs) has increased, partly as a result of their superior performance over approaches that depend on hand-crafted features in a variety of difficult image classification applications. Even though CNNs have outperformed more traditional machine learning techniques, they fall short when it comes to learning long-term, high-level, temporally dispersed characteristics. Recurrent Neural Networks (RNNs) may be able to overcome this limitation. These models can learn long-term temporal patterns and have demonstrated excellent performance in the classification of complicated time series. Long Short Term Memory (LSTM) RNNs are the most popular kind; these networks perform particularly well in speech and language recognition.
2 Literature Review
In this literature review, comprehensive study on a few methods related to the detection
of P300 waves in EEG signals have been presented. This chapter of the report reviews
a few published papers related to those methods. It also presents a detailed review of a
paper that describes a benchmark dataset on Autism for P300 based Brain Computer
Interface (BCI) that this project used.
EEG signals are typically around 100 µV in amplitude, which is about 100 times weaker than Electrocardiogram (ECG) signals. Brain waves can be categorized into six basic groups based on their dominant frequencies, as described in table 2.1. These brain wave types can be separated and extracted from the power spectrum of the time-domain brain signals by means of the Fourier transform. The normal alpha rhythm is the best known and most extensively studied rhythm of the human brain. Alpha waves are usually observed best in the posterior, central and occipital regions [6]. Alpha activity is induced by closing the eyes and by relaxation. It is suppressed by eye-opening or by alerting mechanisms such as thinking and calculating. Beta waves are dominant during the normal state of wakefulness with open eyes.
Event Related Potentials (ERPs) are widely used in the study of cognitive processes of both a normal and disordered nature (neurological or psychiatric disorders such as Autism Spectrum Disorder). Mental processes like perception, selective attention, language processing, and memory take place across time intervals of tens of milliseconds or less. Due to their high temporal resolution, ERPs can help determine the time course of these activations.
The amplitude of ERP components is usually much smaller than normal EEG com-
ponents making them difficult to recognise. In order to extract ERP signals, digital
averaging of epochs is employed. The natural background EEG noise and fluctuations
are averaged out, leaving the evoked brain potentials.
Oddball Paradigm
The oddball paradigm is a commonly used task for cognitive and attention measurement
in ERP studies. Presentations of sequences of repetitive stimuli are infrequently inter-
rupted by a deviant stimulus. The participant’s response to this "oddball" stimulus is
seen and recorded.
In ERP research it has been found that an event-related potential over the parieto-central area of the skull, usually occurring around 300 ms after stimulus presentation and called the P300, is larger after the target stimulus. The P300 wave only occurs if the subject is actively engaged in the task of detecting the targets. Its amplitude varies with the improbability of the targets, and its latency varies with the difficulty of discriminating the target stimulus from the standard stimuli.
3 Objectives
1. To identify the P300 Event Related Potential (ERP) in autistic individuals from the EEG signal.
4 Motivation
About 50 years ago, Autism Spectrum Disorder (ASD) was narrowly defined and was considered a rare disorder of childhood onset. Today it has become a well publicised, advocated, and researched lifelong condition, recognised as fairly common and very heterogeneous. Several countries have confirmed an increase in the number of ASD cases. Given this increase, it is necessary to understand the impact of having children with ASD on family social life, because among the characteristics of ASD, according to the American Psychiatric Association (APA), are limitations or disturbances in communication and social interaction. The outlook for many individuals with autism spectrum disorder today is brighter than it was 50 years ago; more people with the condition are able to speak, read, and live in the community rather than in institutions, and some will be largely free from symptoms of the disorder by adulthood. Nevertheless, most individuals will not work full-time or live independently. It is also important to implement what we already know and to develop services for adults with autism spectrum disorder. Clinicians can make a difference by providing timely and individualised help to families navigating referrals and access to community support systems, by providing accurate information despite often unfiltered media input, and by anticipating transitions such as family changes and school entry and leaving.
5 Database Used
1. The g.Nautilus system was used as the data acquisition module to record EEG data.
2. The EEG data was acquired from 8 active electrodes positioned at C3, Cz, C4,
CPz, P3, Pz, P4 and POz locations.
3. The reference electrode was placed at the right ear and the ground electrode at
AFz location.
5. The Vizard toolkit was used as the stimuli presentation module, which created and displayed a virtual environment consisting of the following objects that were used as stimuli.
• Each training session was divided into two phases: calibration (training) phase
and online (testing) phase.
• Each phase was composed of several blocks. Each block consisted of K runs in
which the subject tried to identify one of the 8 objects as the target.
• Each run is composed of a single flash of every object for 100 ms at different times and in random order, with an inter-stimulus interval (ISI) of 200 ms.
• The training phase was composed of 20 blocks and each block contained 10 runs.
Hence, in each training session, there are a total of 1600 EEG signals in 200 runs,
out of which only 200 are target P300 signals and 1400 are non-target signals.
• The testing phase was composed of 50 blocks. The number of runs in each block
varied between subjects and sessions ranging from 3 to 10.
• trainData.mat – Data from the calibration phase, structured as [channels x epoch x event], each epoch being the data samples from -200 ms to 1200 ms relative to the event stimulus.
• trainEvents.txt – One label per line (from 1 to 8), corresponding to the order of the flashed objects.
• trainTargets.txt – 1 or 0 per line, indicating if the flashed object was the target or not, respectively.
• trainLabels.txt – Label of the target object per line (from 1 to 8), one for each block.
• testData.mat – Data from the online phase, in the same structure as the train data.
• testEvents.txt – One label per line (from 1 to 8), corresponding to the order of the flashed objects.
• testTargets.txt – 1 or 0 per line, indicating if the flashed object was the target or not, respectively.
• runs_per_block.txt – File containing only one number, corresponding to the number of runs per block used in the online phase (from 3 to 10).
◦ For training data, this represents 8 events per run × 10 runs per block × 20 blocks = 1600 epochs.
◦ For testing data, this represents 8 events per run × K runs per block × 50 blocks = 400K epochs.
• The first sample of each epoch corresponds to the time -200 ms relative to the stimulus onset and the last sample corresponds to the time 1200 ms.
• Hence, the number of time sample points = (200 ms + 1200 ms) × 250 Hz = 350.
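The epoch and sample counts above follow from simple arithmetic on the dataset description, which can be sketched as:

```python
# Epoch and sample-count arithmetic for the BCIAUT-P300 training data.
EVENTS_PER_RUN = 8      # one flash per object
RUNS_PER_BLOCK = 10     # training (calibration) phase
BLOCKS = 20             # training (calibration) phase
FS = 250                # sampling frequency in Hz

train_epochs = EVENTS_PER_RUN * RUNS_PER_BLOCK * BLOCKS   # 1600 epochs
target_epochs = RUNS_PER_BLOCK * BLOCKS                   # one target flash per run -> 200
nontarget_epochs = train_epochs - target_epochs           # 1400

# Each epoch spans -200 ms to +1200 ms around the stimulus (1400 ms total).
samples_per_epoch = (200 + 1200) * FS // 1000             # 350 time sample points
```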
6 Experimental Setup
The proposed model consists of three primary steps: data selection, data preprocessing
and classification. A snapshot of the presented model is shown in figure 6.1. The details
of these steps are given in the sections below.
Feature                 Value
Signal Type             EEG
Subjects Details        15 patients with Autism Spectrum Disorder
Age Range and Gender    Age from 16 to 38, all male
Number of Subjects      15
Number of Sessions      7
Sampling Frequency      250 Hz
Electrodes              C3, Cz, C4, CPz, P3, Pz, P4 and POz
Notch Filtering
A notch filter is a type of filter that removes a single frequency component from an input
signal. More specifically, a notch filter is a band-stop filter with a very narrow stopband.
While collecting EEG data, shielded rooms are used to minimize the impact of the urban electric background, in particular 50/60 Hz alternating current line noise. Usually, most of the information of interest in EEG signals lies below this line noise, so a low-pass filter with a cut-off below 50/60 Hz can be used. If one wants to keep the higher frequency bands, a notch filter can be applied instead, which reduces only a narrow band around 50/60 Hz. The notch filter, however, distorts the phases of the signal.
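As an illustration, a 50 Hz notch for the 250 Hz recordings in this dataset could be designed in Python with SciPy (a sketch only; the preprocessing in this work was actually done in EEGLAB/MATLAB). Applying it with zero-phase forward-backward filtering also avoids the phase distortion mentioned above:

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt, freqz

FS = 250.0          # sampling frequency of the dataset (Hz)
F0 = 50.0           # line-noise frequency to remove (Hz)
Q = 30.0            # quality factor: higher Q -> narrower stopband

# Design the notch filter; filtfilt runs it forward and backward,
# cancelling the phase distortion a single pass would introduce.
b, a = iirnotch(F0, Q, fs=FS)

t = np.arange(0, 2.0, 1.0 / FS)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * F0 * t)  # toy 10 Hz rhythm + line noise
clean = filtfilt(b, a, eeg)

# Magnitude response at 10 Hz (should pass) and 50 Hz (should be rejected).
freqs, h = freqz(b, a, worN=[10.0, 50.0], fs=FS)
```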
Bandpass filtering
A bandpass filter passes only a band of frequencies. It is composed of a high-pass filter and a low-pass filter. A high-pass filter is required to reduce low frequencies coming from bioelectric events such as breathing, which remain in the signal after subtracting voltages toward the ground electrode. Its cut-off frequency typically falls between 0.1 and 0.7 Hz. A low-pass filter with a cut-off frequency equal to the highest frequency of interest is employed to guarantee that the signal is band-limited (in the range from 40 Hz up to less than one-half of the sampling rate). Low-pass filters also prevent the signal from being distorted by aliasing, the interference caused by sampling-rate effects that would occur if frequencies above one-half of the sampling rate persisted without attenuation.
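A bandpass stage of this kind can be sketched with SciPy as follows (the 0.5-40 Hz cut-offs are illustrative assumptions consistent with the ranges above, not the exact values used in this work):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, sosfreqz

FS = 250.0                  # sampling frequency (Hz)
LOW, HIGH = 0.5, 40.0       # assumed high-pass and low-pass cut-offs (Hz)

# Fourth-order Butterworth bandpass in second-order-sections form (numerically stable).
sos = butter(4, [LOW, HIGH], btype="bandpass", fs=FS, output="sos")

t = np.arange(0, 2.0, 1.0 / FS)
eeg = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 80 * t)  # in-band + out-of-band components
filtered = sosfiltfilt(sos, eeg)   # zero-phase application

# Magnitude response at an in-band (10 Hz) and an out-of-band (80 Hz) frequency.
freqs, h = sosfreqz(sos, worN=[10.0, 80.0], fs=FS)
```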
Artefacts removal
The electroencephalogram is designed to record cerebral activity, but it is not isolated from electrical activity arising from sites other than the brain, and it records that activity as well. Recorded activity that is not of cerebral origin is called artefact. Typically, it is a sequence that has a larger amplitude and a different form from signal sequences that are not significantly contaminated. There are two chief categories of artefacts: physiological/biological (such as cardiac, pulse, respiratory, sweat, eye movement (blink, lateral rectus spikes from lateral eye movement), and muscle and movement artefacts) and nonphysiological/technical artefacts (caused by electrical phenomena or devices in the recording environment). The most common EEG artefact sources are:
Z-score normalization
Z-score normalization refers to normalizing every value in a dataset such that the mean of all the values is 0 and the standard deviation is 1. The z-score can be written as in equation 6.1. The final step of preprocessing the Autism data is z-score normalization. After z-score normalization, the pattern of the signal does not change, but its range shrinks, which makes the classification models train faster.
Zn = (Xn − µ) / σ (6.1)

where µ and σ are the mean and standard deviation of the signal values.
Sampling techniques
The distribution of classes across the instance space is highly uneven, which indicates that a sampling technique is required so that the learning system sees an equal number of training samples from each of the classes. Two sampling techniques are employed in the subsequent work:

Random Oversampling: For a given batch of data samples across all the classes, this method upsamples the minority classes by replicating their samples within the given batch. The samples to be replicated are selected randomly with replacement.

Random Undersampling: For a given batch of data samples across all the classes, this method downsamples the majority classes by removing some of their samples from the given batch. The samples to be removed are selected randomly without replacement.

RandomUpsampling: [xA xB xB xC xC xC] → [xA xA xA xB xB xB xC xC xC] (6.2)
RandomDownsampling: [xA xA xA xB xB xC xC xC xC xC] → [xA xA xB xB xC xC] (6.3)

where xA, xB, xC indicate training examples from classes A, B, C respectively. As indicated, random upsampling increases the number of samples that the learning system sees during an iteration, whereas random undersampling decreases it.
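The two strategies can be sketched as follows (using a hypothetical per-class sample dictionary for illustration; the actual work applies them per mini-batch):

```python
import random

def random_oversample(groups, rng):
    """Replicate samples (with replacement) so every class matches the largest class."""
    target = max(len(v) for v in groups.values())
    return {c: v + rng.choices(v, k=target - len(v)) for c, v in groups.items()}

def random_undersample(groups, rng):
    """Drop samples (without replacement) so every class matches the smallest class."""
    target = min(len(v) for v in groups.values())
    return {c: rng.sample(v, k=target) for c, v in groups.items()}

rng = random.Random(0)
batch = {"A": ["xA1"], "B": ["xB1", "xB2"], "C": ["xC1", "xC2", "xC3"]}
up = random_oversample(batch, rng)     # every class has 3 samples
down = random_undersample(batch, rng)  # every class has 1 sample
```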
6.3 Classification
In this work, deep classifiers, i.e. Deep Neural Networks (DNNs), have been used for the classification of the P300 and non-P300 classes. A brief description of the classifiers used is given below:
6.3.1 EEGNet
EEGNet is a previously validated CNN architecture for P300 decoding [9]. Lawhern,
Solon, Waytowich, et al. show that EEGNet generalizes across paradigms better than,
and achieves comparably high performance to, the reference algorithms when only lim-
ited training data is available.
A modified version of EEGNet has been implemented in this work.
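A minimal PyTorch sketch of the EEGNet-style blocks (temporal convolution, depthwise spatial convolution over the electrodes, then a separable convolution) for the 8-channel, 350-sample epochs of this dataset is given below. The layer sizes here are illustrative assumptions, not the exact hyper-parameters of the modified model in table 6.2:

```python
import torch
import torch.nn as nn

class EEGNetSketch(nn.Module):
    """Illustrative EEGNet-style network for (batch, 1, channels=8, samples=350) inputs."""

    def __init__(self, n_channels=8, n_samples=350, n_classes=2, f1=8, d=2, f2=16):
        super().__init__()
        self.features = nn.Sequential(
            # Block 1: temporal convolution, then depthwise spatial convolution.
            nn.Conv2d(1, f1, (1, 64), padding=(0, 32), bias=False),
            nn.BatchNorm2d(f1),
            nn.Conv2d(f1, f1 * d, (n_channels, 1), groups=f1, bias=False),  # depthwise over electrodes
            nn.BatchNorm2d(f1 * d),
            nn.ELU(),
            nn.AvgPool2d((1, 4)),
            nn.Dropout(0.5),
            # Block 2: separable convolution = depthwise temporal conv + pointwise conv.
            nn.Conv2d(f1 * d, f1 * d, (1, 16), padding=(0, 8), groups=f1 * d, bias=False),
            nn.Conv2d(f1 * d, f2, (1, 1), bias=False),
            nn.BatchNorm2d(f2),
            nn.ELU(),
            nn.AvgPool2d((1, 8)),
            nn.Dropout(0.5),
            nn.Flatten(),
        )
        # Probe the feature size with a dummy input instead of hand-computing it.
        with torch.no_grad():
            n_feat = self.features(torch.zeros(1, 1, n_channels, n_samples)).shape[1]
        self.classifier = nn.Linear(n_feat, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x))

model = EEGNetSketch()
out = model(torch.randn(4, 1, 8, 350))   # logits of shape (4, 2)
```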
Table 6.2: EEGNet model hyper-parameters and output shape after each layer
6.3.2 CNN-BLSTM
Santamaría-Vázquez, Martínez-Cagigal, Gomez-Pilar, et al., in their paper [10], presented a novel deep learning architecture for Brain Computer Interfaces based on Event Related Potentials (ERPs). Four models were proposed, of which the model that combined a CNN and a BLSTM performed the best. This neural network combines convolutional and recurrent layers in order to learn high-level spatial and temporal features. Hence, in this work, the same architecture has been implemented for performing classification on the BCIAUT-P300 dataset.
The recurrent part of the network uses Bidirectional Long Short Term Memory (BLSTM) layers, one of the most common types of RNN. A BLSTM layer consists of two LSTM layers linked to the same output layer, one of which processes the training sequence forward and the other backward. This design enables determining whether a certain EEG pattern is an ERP by using both past and future information. Compared to LSTM, BLSTM networks have been shown to be more effective at tackling problems like speech recognition. Dropout regularization was applied to avoid overfitting. The architecture used in CNN-BLSTM is shown graphically in the figure, and a summary of the architecture along with the number of parameters is given in table 6.3.
Preprocessing: The input signals are truncated from 0 to 1000 ms; at a sampling frequency of 250 Hz, the time signals are therefore 250 samples long.
Training: The optimizer used in training was Adam with the default parameters and a learning rate of 5 × 10⁻⁴, minimizing the binary cross-entropy loss function. A mini-batch size of 128 was used and the maximum number of epochs was set to 600. To address the considerable class imbalance, each mini-batch was undersampled so that it contained a 50-50% proportion of the classes, with the trials selected randomly from within the dataset.
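The stated training setup can be sketched in PyTorch as follows (the tiny linear model is a stand-in for either classifier; only the optimizer, loss, and balanced mini-batch reflect the description above):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 250, 1))   # stand-in classifier
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)    # Adam, lr = 5e-4, default betas
criterion = nn.BCEWithLogitsLoss()                           # binary cross-entropy on logits

# One training step on a balanced toy mini-batch of 128 trials (8 channels x 250 samples).
x = torch.randn(128, 8, 250)
y = torch.cat([torch.zeros(64), torch.ones(64)]).unsqueeze(1)  # 50-50 class proportion
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```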
Table 6.3: CNN-BLSTM model hyper-parameters and output shape after each layer
6.4 Specifications
The hardware specifications for the computing device used are:
• 4 CPU cores with processor base frequency of 2.40 GHz with overclocking up to
4.1 GHz.
• 8 GB RAM
7 Experimental Results
In this study, EEG signals obtained from the BCIAUT-P300 dataset were used for automated P300 and non-P300 signal classification. For this purpose, two Deep Neural Network models were implemented in Python using PyTorch. The preprocessing of the EEG data was done in MATLAB using the EEGLAB open-source toolbox. This chapter presents the results obtained throughout the project.
The ROC curve plots the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate) at various threshold settings. TPR is also known as sensitivity, and FPR is one minus the specificity (the true negative rate).
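The two rates can be computed directly from confusion-matrix counts, as in this small helper (illustrative, not code from the project):

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts.

    TPR = TP / (TP + FN)   (sensitivity)
    FPR = FP / (FP + TN) = 1 - specificity
    """
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr

# Example: 80 of 100 positives detected, 10 of 100 negatives falsely flagged.
tpr, fpr = rates(tp=80, fp=10, tn=90, fn=20)   # tpr = 0.8, fpr = 0.1
```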
7.2.1 Filtering
The EEG data was passed through the following filters:
Figure 7.3: Some common artefacts visible in EEGLAB after performing runICA decomposition: (a) eye movement artefact, (b) muscle artefact, (c) line noise artefact.
Figure 7.4 shows the 2-D scalp map plots of ICA decomposition for subject 02 and the
first session.
An EEGLAB plugin called ICLabel was used to automatically classify and distinguish
independent components as brain or non-brain sources. The results shown in figure 7.5
confirm that all components are brain waves and none of them is an artefact.
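The project used EEGLAB's runICA in MATLAB; an equivalent decomposition could be sketched in Python with scikit-learn's FastICA (illustrative only, with synthetic sources, not the pipeline actually used):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 2, 500)

# Two toy sources (a "brain" oscillation and a "line noise" square wave), mixed into 8 channels.
sources = np.column_stack([np.sin(2 * np.pi * 10 * t), np.sign(np.sin(2 * np.pi * 50 * t))])
mixing = rng.standard_normal((8, 2))
eeg = sources @ mixing.T                     # (samples, channels)

ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(eeg)          # estimated independent components
# An artefactual component could now be zeroed out and the signal rebuilt:
cleaned = ica.inverse_transform(components)
```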
Subject No.        S01   S02   S03   S04   S05   S06   S07   S08   S09   S10   S11   S12   S13   S14   S15   acc (mean±SEM)
Training accuracy  96.36 98.11 97.68 97.54 99.32 97.86 97.57 98.32 96.68 98.93 98.71 95.93 97.57 94.89 97.82 97.55±0.30
Testing accuracy   82.02 93.60 83.76 85.12 95.39 90.08 89.88 90.59 87.57 84.31 91.78 84.41 88.73 88.81 93.00 88.60±1.03
AUC ROC score       0.88  0.98  0.89  0.93  0.98  0.95  0.95  0.96  0.93  0.90  0.97  0.91  0.94  0.94  0.98 0.94
F1-score            0.82  0.93  0.85  0.85  0.95  0.90  0.89  0.90  0.87  0.85  0.91  0.83  0.86  0.88  0.93 0.88

Table 7.1: Performance at the level of a single subject, as represented by the average target object accuracies of the EEGNet model
Following are the plots of training and validation accuracies obtained for each subject
to visualize training progress:
Figure 7.8: EEGNet Accuracies plot during training phase for each subject
Figure 7.9: EEGNet Accuracies plot during training phase for each subject
Following are the confusion matrices for the testing data for each subject.
Figure 7.11: EEGNet Confusion matrices for testing data for each subject
Figure 7.12: EEGNet Confusion matrices for testing data for each subject
8 Discussion
Review: In this project, a number of published articles and papers on , and a few non-contact procedures for diagnosing , were reviewed. While reviewing studies on , it was noted that polysomnography serves as the standard for calculating diagnostic values for sleep studies. We analyzed a promising automatic video analysis for the diagnosis of . The algorithm, which is based on the principle that the volume of air that circulates into the lungs is proportional to the amplitude of thoracic movement that a patient presents while breathing, can detect respiratory movements independently of the position and situation of the subject while sleeping and can infer sleep/awake periods. The results from this algorithm were similar to the results produced by on the same data sample. Some other non-contact techniques for the diagnosis of were also reviewed. These included a technique which used impulse-radio ultra-wideband radar for the non-intrusive diagnosis of obstructive sleep apnea. The radio technology could recognise large body movements as well as subtle breathing using its very short (~100 ps) energy pulses.
Results: From the results of this project, the following observations can be made:
1. The raw movement signal represents the movement within a frame of the input video. The plot of this signal contains periodic peaks, which represent the body movement of patients while they are breathing. Although this signal correctly represents the movements, the differences between raw movement signals obtained from different classes of sleep event are not significant. From figure ??, there are no clearly noticeable differences in the signal; perhaps the differences are so subtle that they can only be distinguished by a neural network.
2. The MFCC plots represent the coefficients as a function of time. The differences between MFCC plots from different sleep events are considerable. Normal events do not have many peaks and the coefficients are mostly around 0 (green). In hypopnea events, there are many peaks and regular oscillations. In OSA events, the plot has irregular oscillations.
3. From table ??, it is clear that the training accuracies after 30 epochs are close to 95% for all the cases. Although this means that our model is trained very well, such high values of training accuracy can also indicate that the model could be overfitting.
4. From the confusion matrices of all the cases, it is evident that most of the correct predictions are for the normal events. The database contains a very large number of normal events compared to hypopnea or OSA events. This uneven distribution causes the model to train mostly to predict normal events; most of the hypopnea and OSA events are also falsely classified as normal events by the classifier model. This high rate of correct prediction for normal events is also responsible for the high testing accuracy in table ??.
5. The major source of inaccuracy in the classifier model is the limited database available for the sleep events. There are only 4084 observations in total in our database, including all four subjects. This small dataset causes overfitting during training and shows high variance and high error during testing.
9 Conclusion
Image and signal processing techniques and machine learning algorithms such as Deep Neural Networks (DNNs), along with the power of hardware acceleration using Graphics Processing Units, can be used to build reliable systems to diagnose various sleep disorders such as . These algorithms overcome the disadvantages and limitations of . The techniques do not require the use of expensive and advanced medical instruments, and they can achieve the same accuracy as PSG without making any contact with the patients. The image processing techniques do not require a large database of samples, unlike supervised machine learning algorithms, and are therefore ideal where data is scarce, which is indeed the case with . Meanwhile, classification models using neural networks can overcome the limitation of image processing techniques by automating the whole process without needing to manually tweak thresholds and parameters.
From the results of this project, it can be concluded that the proposed non-invasive diagnostic method using image processing and machine learning is able to extract a good amount of information about the sleep events.
10 Future Work
The present work proposed a diagnostic method based on image processing to extract respiratory movement signals from the video data, signal processing to extract features from the audio data, and a classification model using a neural network to classify the sleep events into apnea events and normal events. The following are some future works that can build upon the proposed algorithm:
Bibliography