
M.Tech Project

Autism Severity Detection Using EEG Signal

Sankalp Shrivastava
18EC35025

Under the guidance of


Prof. Goutam Saha

Department of Electronics and Electrical Communication Engineering


Indian Institute of Technology Kharagpur, India
November 2022
Declaration

I hereby declare that the work contained in this report has been done by me under
the guidance of my supervisor Prof. Goutam Saha. The work has not been submitted
to any other institute for any degree or diploma. I have conformed to the norms and
guidelines given in the Ethical Code of Conduct of the Institute. Whenever I have used
materials (data, theoretical analysis, figures and text) from other sources, I have given
due credit to them by citing them in the text of the thesis and providing their details in
the references.

Date: November 24, 2022


Place: Kharagpur

Sankalp Shrivastava
18EC35025

DEPARTMENT OF ELECTRONICS AND ELECTRICAL
COMMUNICATION ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR
KHARAGPUR - 721302, INDIA

Certificate

This is to certify that the project report entitled "Autism Severity Detection Using
EEG Signal" submitted by Sankalp Shrivastava (Roll No. 18EC35025) to the Indian
Institute of Technology Kharagpur towards partial fulfilment of the requirements for the
award of the degree of Master of Technology in Electronics and Electrical Communication
Engineering is a record of bona fide work carried out by him under my supervision and
guidance during the year 2021-22.

Prof. Goutam Saha


Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology Kharagpur, India

Abstract
Background: Autism Spectrum Disorder (ASD) is a common, early-appearing neurodevelopmental condition for which early, objective assessment is crucial. The P300 event-related potential, elicited within a brain-computer interface (BCI) paradigm, offers a non-invasive window into attention in individuals with ASD.
Materials and Method: EEG recordings from the BCIAUT-P300 benchmark dataset (15 ASD subjects, 7 sessions each) were preprocessed in MATLAB/EEGLAB (notch and bandpass filtering, ICA-based artefact inspection, and z-score normalization) and classified into P300 and non-P300 classes using two deep neural network models, EEGNet and CNN-BLSTM.
Results: Across subjects, the classifier achieved a mean training accuracy of 97.55 ± 0.30%, a mean testing accuracy of 88.60 ± 1.03%, and a mean ROC AUC score of 0.94.
Conclusions: Deep neural networks can reliably detect P300 responses in the EEG of individuals with ASD, providing a basis for automated, non-invasive investigation of autism severity via thresholding of P300 detectability.

Contents

Declaration

Certificate

Abstract

List of Figures

List of Tables

Acronyms

1 Introduction

2 Literature Review
2.1 Fundamentals of EEG measurement

3 Objectives

4 Motivation

5 Database Used
5.1 BCIAUT-P300: Benchmark dataset on Autism
5.1.1 P300-Based BCI System
5.1.2 BCI session procedure
5.1.3 Dataset Structure

6 Experimental Setup
6.1 Experimental Data
6.2 Data Preprocessing
6.3 Classification
6.3.1 EEGNet
6.3.2 CNN-BLSTM


6.4 Specifications

7 Experimental Results
7.1 Performance Metrics
7.2 Data Preprocessing Results
7.2.1 Filtering
7.2.2 Independent Component Analysis
7.2.3 Z-score normalization
7.3 EEGNet Results
7.4 CNN-BLSTM Results

8 Discussion

9 Conclusion

10 Future Work

Bibliography

List of Figures

6.1 Flowchart showing the proposed model
6.2 Data preprocessing steps
7.1 Channel locations
7.2 Frequency response of the bandpass filter in semilog scale
7.3 Some common artefacts visible in EEGLAB after performing runICA decomposition
7.4 Independent component analysis of EEG data for subject 02, session 01
7.5 Automatic independent component classification using the ICLabel EEGLAB plugin
7.6 EEG data
7.8 EEGNet accuracy plots during the training phase for each subject
7.9 EEGNet accuracy plots during the training phase for each subject
7.11 EEGNet confusion matrices for the testing data for each subject
7.12 EEGNet confusion matrices for the testing data for each subject

List of Tables

2.1 Brain waves classification
6.1 Details of the BCIAUT EEG dataset used
6.2 EEGNet model hyper-parameters and output shape after each layer
6.3 CNN-BLSTM model hyper-parameters and output shape after each layer
7.1 Performance at the level of a single subject, as represented by the average target object accuracies of the EEGNet model

Acronyms

ADDM Autism and Developmental Disabilities Monitoring

APA American Psychiatric Association

ASD Autism Spectrum Disorder

AUC Area Under Curve

BCI Brain Computer Interface

BLSTM Bidirectional Long Short Term Memory

CDC Centers for Disease Control and Prevention

CNN Convolutional Neural Network

CT Computer Tomography

DNN Deep Neural Network

ECG Electrocardiogram

EEG Electroencephalogram

EMG Electromyography

ERP Event Related Potential

GPU Graphics Processing Unit

ICA Independent Component Analysis

ISI Inter-Stimulus Interval

LSTM Long Short Term Memory


MRI Magnetic Resonance Imaging

RNN Recurrent Neural Network

ROC Receiver Operating Characteristic

1 Introduction

Autism Spectrum Disorder (ASD) is a common constellation of early-appearing social
communication deficits and repetitive sensory-motor behaviours. It causes problems
with social communication and interaction. Autism is a spectrum that ranges from
very mild to severe. It typically has a significant hereditary component along with
other contributing factors. According to studies, early abnormal brain development and
neuronal reorganisation cause ASD [1].

According to estimates from the CDC's Autism and Developmental Disabilities Mon-
itoring (ADDM) Network, about 1 in 44 children is diagnosed with ASD [2]. ASD is
currently incurable; however, the effects of this condition can be lessened by
implementing comprehensive early interventions to improve children's learning and func-
tioning as well as their participation in their communities [3]. Early ASD identification
is crucial because younger children acquire the necessary skills more quickly and be-
cause the early implementation of specialised education helps reduce some of the
ASD symptoms earlier [4].

A Brain-Computer Interface (BCI) is a system that provides direct communication
between the brain and a computer or external device. In short, it must interpret brain
activity and translate it into commands that can be used to control devices or programs,
from prostheses, orthoses, wheelchairs and other robots to a mouse or a keyboard in a
controlled computer environment. Traditionally, BCIs have been used for medical appli-
cations such as neural control of prosthetic artificial limbs. However, recent research has
opened up the possibility of novel BCIs focused on enhancing the performance of healthy
users, often with noninvasive approaches based on the Electroencephalogram (EEG).

Deep Learning has significantly reduced the requirement for manual feature extraction,
leading to state-of-the-art performance in areas like speech recognition and computer
vision. In particular, the adoption of deep Convolutional Neural Network (CNN) models
has increased, partly as a result of their superior performance over approaches that
depend on hand-crafted features in a variety of difficult image classification applications.
Even though CNNs have outperformed more traditional machine learning techniques,
they fall short when it comes to learning long-term, high-level, temporally dispersed
characteristics. Recurrent Neural Network (RNN) models might be able to get around
this restriction. These models are able to learn long-term temporal patterns and have
demonstrated excellent performance in the classification of complicated time series.
Long Short Term Memory (LSTM) RNNs are the most popular kind; these networks
perform particularly well in speech and language recognition.

This work is an attempt at detecting the severity of Autism Spectrum Disorder in
patients through the detection of P300 Event Related Potential signals in the EEG data
obtained from the BCIAUT-P300 dataset. The P300 event-related potential is a
stereotyped brain response to a novel visual stimulus. It is frequently induced using the
visual oddball paradigm, in which participants are exposed to set amounts of repeated
"non-target" visual stimuli interspersed with rare "target" visual stimuli (presented, for
example, at 1 Hz). The P300 waveform is a substantial positive deflection of electrical
activity recorded over the parietal cortex 300 ms after the onset of the target stimulus,
with the amplitude of the observed deflection being inversely proportional to the frequency
of the target stimuli. The classification was achieved with the use of two prior-art neural
network models: EEGNet and CNN-BLSTM. As the autism dataset used in this work is
small and limited, EEGNet was chosen because it uses depthwise and separable
convolutions for feature extraction while keeping the number of trainable parameters
low. CNN-BLSTM helps extract long-term temporal dependencies within the data,
which EEGNet fails to do.

2 Literature Review

In this literature review, a comprehensive study of a few methods related to the detection
of P300 waves in EEG signals is presented. This chapter reviews a few published papers
related to those methods. It also presents a detailed review of the paper that describes
the benchmark dataset on Autism for P300-based Brain Computer Interface (BCI) that
this project used.

2.1 Fundamentals of EEG measurement


Electroencephalography is a medical imaging technique used to read electrical activ-
ity from the scalp generated by brain structures. The Electroencephalogram (EEG) is
characterised as an alternating electrical activity that is recorded from the scalp surface
using conductive materials and metal electrodes [5]. EEG mostly measures the local
current flows produced when brain cells or neurons are activated. These currents flow
during synaptic excitations of the dendrites of many pyramidal neurons in the cerebral
cortex [6].
From the anatomical point of view, the brain is divided into three sections: the cere-
brum, the cerebellum, and the brain stem. The cerebral cortex, a highly convoluted
outer layer of the cerebrum, covers the left and right hemispheres of the brain and makes
up the largest part of the central nervous system. The cerebrum houses the brain's
centres for movement initiation, conscious awareness of sensation, complex processing,
and the expression of emotion and behaviour. The cerebellum coordinates voluntary
muscle movements and maintains balance. The brain stem regulates the heartbeat,
biorhythms, and the release of hormones and neurohormones, among other functions.
Owing to its surface location, the cerebral cortex's electrical activity has the greatest
impact on the EEG.

Brain Waves Classification


It has been confirmed that brain activity changes in a consistent and recognizable way
when the general status of the subject changes, for example, from relaxation to alertness
[7]. Brain wave patterns are commonly sinusoidal. They normally range from 0.5 to
100 µV in amplitude, which is about 100 times weaker than Electrocardiogram (ECG)
signals. The brain waves can be categorized into six basic groups based on their dom-
inant frequencies, as described in table 2.1. These brain wave types can be separated
and extracted from the power spectrum of the time-domain brain signals by means of
the Fourier transform (a sketch of this band-power computation is given after table 2.1).
The normal alpha rhythm is the best known and most extensively studied rhythm of
the human brain. Alpha waves are usually observed best in the posterior, central and
occipital regions [6]. Alpha activity is induced by closing the eyes and by relaxation,
and it is suppressed by eye-opening or by alerting mechanisms such as thinking and
calculating. Beta waves are dominant during the normal state of wakefulness with open
eyes.

Brain wave type    Frequency range

Delta              0.5–4 Hz
Theta              4–8 Hz
Alpha              8–13 Hz
Beta               13–30 Hz
Lower Gamma        30–50 Hz
Higher Gamma       > 50 Hz

Table 2.1: Brain waves classification
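As a concrete illustration of the band separation described above, the following sketch estimates the average power in each band of table 2.1 from a single EEG channel using Welch's method. The signal here is a random placeholder and the 2 s segment length is an assumption, not a value taken from this project.

```python
import numpy as np
from scipy.signal import welch

def band_power(x, fs, band):
    """Average power of signal x within the frequency band (Hz),
    estimated from the Welch power spectral density."""
    freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))  # 2 s segments
    mask = (freqs >= band[0]) & (freqs < band[1])
    return np.trapz(psd[mask], freqs[mask])

fs = 250                                  # BCIAUT sampling rate (Hz)
eeg = np.random.randn(30 * fs)            # placeholder 30 s channel
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "lower gamma": (30, 50),
         "higher gamma": (50, 100)}       # upper edge kept below Nyquist
for name, rng in bands.items():
    print(f"{name:12s}: {band_power(eeg, fs, rng):.4f}")
```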

Advantages and Applications


The major advantages of EEG over other medical imaging techniques such as Computer
Tomography (CT) and Magnetic Resonance Imaging (MRI) are speed and low cost. Its
speed comes from its high temporal resolution: neural activity can be measured within
fractions of a second after the administration of a stimulus. The main disadvantage of
EEG is that it provides lower spatial resolution than MRI. The non-invasive and painless
nature of the EEG procedure makes it popular for research into how the brain organises
cognitive functions like perception, memory, attention, language, and emotion in healthy
adults and children. For this purpose, EEG finds its most useful application in the Event
Related Potential (ERP) technique.

Event Related Potentials


Event-related potentials (ERPs) are significant voltage fluctuations resulting from
evoked neural activity; they are also known as evoked potentials. An evoked potential
can be initiated by an external or internal stimulus. ERPs are a suitable methodology
for the study of cognitive processes of both a normal and a disordered nature (neurological
or psychiatric disorders such as Autism Spectrum Disorder). Mental processes like perception,
selective attention, language processing, and memory take place across time intervals of
tens of milliseconds or less. Due to its high temporal resolution, the time course of these
activations can be determined with the aid of ERPs.
The amplitude of ERP components is usually much smaller than that of the background
EEG, making ERPs difficult to recognise. In order to extract ERP signals, digital
averaging of epochs is employed: the natural background EEG noise and fluctuations
average out, leaving the evoked brain potentials.
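A toy numpy sketch of this epoch averaging, with synthetic data standing in for real recordings: a small P300-like bump buried in noise twice its size becomes clearly visible once a couple of hundred stimulus-locked epochs are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 200, 350            # e.g. 1400 ms epochs at 250 Hz
erp = np.zeros(n_samples)
erp[125:150] = 5.0                        # toy P300-like bump near +300 ms
noise = rng.normal(0.0, 10.0, size=(n_trials, n_samples))  # background EEG
epochs = erp + noise                      # each trial: ERP + ongoing EEG

# Averaging stimulus-locked epochs attenuates the zero-mean background
# roughly by 1/sqrt(n_trials), leaving the time-locked ERP.
average = epochs.mean(axis=0)
print(average[125:150].mean(), average[:125].mean())  # bump vs baseline
```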

Brain Computer Interface


A communication system called a Brain Computer Interface (BCI) interprets a user's
commands solely from their brainwaves and responds accordingly. A straightforward
task can be as simple as having the subject imagine moving either the left or the right
hand in the direction of an arrow presented on the screen. As a result of this motor
imagery, some brainwave characteristics are enhanced and can be utilised to recognise
user commands, such as motor mu waves (brain waves in the alpha frequency range
linked with physical movements or the intention to move) or specific ERPs.

Oddball Paradigm
The oddball paradigm is a commonly used task for cognitive and attention measurement
in ERP studies. Presentations of sequences of repetitive stimuli are infrequently inter-
rupted by a deviant stimulus, and the participant's response to this "oddball" stimulus
is observed and recorded.
ERP research has found that an event-related potential called P300, which appears
across the parieto-central area of the skull around 300 ms after stimulus presentation,
is larger after the target stimulus. The P300 wave occurs only if the subject is actively
engaged in the task of detecting the targets. Its amplitude varies with the improbability
of the targets, and its latency varies with the difficulty of discriminating the target
stimulus from the standard stimuli.

3 Objectives

The primary objective of this project is to devise an automated, non-invasive technique
to detect the severity of Autism in individuals, so as to be able to differentiate between
mild and severe Autism using P300 signals in the EEG data obtained from a BCI system.
Thus, the main objective can be divided into two major tasks:

1. To identify P300 Event Related Potentials (ERPs) in autistic individuals from the
EEG signal.

2. To investigate the Autism severity (mild or severe) of the individuals through a
thresholding method.

4 Motivation

About 50 years ago, Autism Spectrum Disorder (ASD) was narrowly defined and was
considered a rare disorder of childhood onset. But today it has become a well publi-
cised, advocated, and researched lifelong condition, recognised as fairly common and
very heterogeneous. Several countries in the world have confirmed an increase in the
number of ASD cases. Due to the increasing number of ASD cases in various countries,
it is necessary to understand the impact of having children with ASD on family social
life because some of the characteristics of ASD patients, according to the American
Psychiatric Association (APA), are the characteristics of ASD, namely limitations or
disturbances in communication and social interaction The outlook for many individuals
with autism spectrum disorder today is brighter than it was 50 years ago; more peo-
ple with the condition are able to speak, read, and live in the community rather than
in institutions, and some will be largely free from symptoms of the disorder by adult-
hood. Nevertheless, most individuals will not work full-time or live independently. It
is also important to implement what we already know and develop services for adults
with autism spectrum disorder. Clinicians can make a difference by providing timely
and individualised help to families navigating referrals and access to community support
systems, by providing accurate information despite often unfiltered media input, and by
anticipating transitions such as family changes and school entry and leaving.

To decode brain signals, a Brain-Computer Interface (BCI) employs machine learning
techniques. Reliable identification of the P300 response in electroencephalography
(EEG) data can be used to develop P300-based BCIs that encourage social attention in
individuals with Autism Spectrum Disorder (ASD). Recently, there has been growing
interest in the application of Convolutional Neural Network (CNN) models to decode
the P300 in an end-to-end fashion. However, the complexity of these models needs to be
carefully considered. To decode whether ASD participants were paying attention to the
virtual environment, a lightweight CNN (EEGNet) previously validated for P300 detection
was employed. This study also aims to compare EEGNet with different CNN architectures
and to analyze handcrafted CSP features with shallow networks.

5 Database Used

5.1 BCIAUT-P300: Benchmark dataset on Autism


BCIAUT-P300 is a multi-session and multi-subject benchmark dataset on Autism for
P300-Based Brain Computer Interface. The BCIAUT-P300 dataset contains the full
EEG recordings from a clinical experiment to determine the viability of using a P300-
based Brain Computer Interface to train youngsters with Autism Spectrum Disorder
(ASD) to recognise and respond to social cues. The dataset was obtained from 15
ASD individuals who underwent 7 sessions of P300-based BCI joint-attention training,
thereby making a total of 105 sessions.

5.1.1 P300-Based BCI System


The dataset was recorded using a BCI system developed by Amaral, Simões, Mouga, et
al. (2017) [8]. The system is based on P300 signals and uses a virtual environment with
a virtual human character and several objects of interest to train the ability of
participants to follow the cues of the virtual character to the objects.
The major highlights of the BCI system are as follows:

1. The g.Nautilus system was used as the data acquisition module to record EEG
data.

2. The EEG data was acquired from 8 active electrodes positioned at C3, Cz, C4,
CPz, P3, Pz, P4 and POz locations.

3. The reference electrode was placed at the right ear and the ground electrode at
AFz location.

4. Sampling rate = 250Hz.

5. The Vizard toolkit was used as the stimuli presentation module which created and
displayed a virtual environment consisting of following objects which were used as
stimuli.


(i) Books on a shelf, (ii) A radio on a dresser, (iii) A printer on a shelf,
(iv) A laptop on a table, (v) A ball on the ground, (vi) A corkboard on a wall,
(vii) A wooden plane, (viii) A picture on a wall

5.1.2 BCI session procedure


• The fifteen participants underwent 7 identical training sessions on different days.

• Each training session was divided into two phases: calibration (training) phase
and online (testing) phase.

• Each phase was composed of several blocks. Each block consisted of K runs in
which the subject tried to identify one of the 8 objects as the target.

• Each run is composed of a single 100 ms flash of every object, at different times
and in random order, with an Inter-Stimulus Interval (ISI) of 200 ms.

• The training phase was composed of 20 blocks and each block contained 10 runs.
Hence, in each training session, there are a total of 1600 EEG signals in 200 runs,
out of which only 200 are target P300 signals and 1400 are non-target signals.

• The testing phase was composed of 50 blocks. The number of runs in each block
varied between subjects and sessions ranging from 3 to 10.

5.1.3 Dataset Structure


The dataset folder contains 15 subfolders, one per subject, named SBJXX for XX ∈
{01, ..., 15}. Each subject folder contains 7 session folders named SYY. Each session
folder contains one folder for training data and another for testing data.


Train Folder:

• trainData.mat – Data from the calibration phase, structured as [channels × epochs × events], each epoch covering the data samples from -200 ms to 1200 ms relative to the event stimulus.

• trainEvents.txt – One label per line (from 1 to 8), corresponding to the order of the flashed objects.

• trainTargets.txt – 1 or 0 per line, indicating if the flashed object was the target or not, respectively.

• trainLabels.txt – Label of the target object per line (from 1 to 8), one for each block.

Test Folder:

• testData.mat – Data from the online phase, in the same structure as the train data.

• testEvents.txt – One label per line (from 1 to 8), corresponding to the order of the flashed objects.

• testTargets.txt – 1 or 0 per line, indicating if the flashed object was the target or not, respectively.

• runs_per_block.txt – File containing only one number, corresponding to the number of runs per block used in the online phase (from 3 to 10).

• Number of Epochs = events_per_run × runs_per_block × blocks

◦ For training data, this gives 8 events per run × 10 runs per block × 20
blocks = 1600 epochs.
◦ For testing data, this gives 8 events per run × K runs per block × 50
blocks = 400K epochs.

• The first sample of each epoch corresponds to the time -200ms relative to the
stimulus onset and the last sample corresponds to the time 1200ms.

• Hence, the number of time sample points = (200ms + 1200ms) × 250Hz = 350

• The feature vectors are, therefore, of dimensions 8 × 350.
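A sketch of how one session might be loaded in Python to verify these dimensions. The folder layout follows the description above, but the variable name stored inside trainData.mat is not documented here, so the key lookup below is an assumption.

```python
import numpy as np
from scipy.io import loadmat

session = "SBJ01/S01/Train"                          # hypothetical path
mat = loadmat(f"{session}/trainData.mat")
key = [k for k in mat if not k.startswith("__")][0]  # assumed single variable
data = mat[key]                                      # expected (8, 350, 1600)

targets = np.loadtxt(f"{session}/trainTargets.txt", dtype=int)
events = np.loadtxt(f"{session}/trainEvents.txt", dtype=int)

assert data.shape[:2] == (8, 350)   # 8 electrodes x 350 time samples
assert targets.sum() == 200         # one target event per run, 200 runs
```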

6 Experimental Setup

The proposed model consists of three primary steps: data selection, data preprocessing
and classification. A snapshot of the presented model is shown in figure 6.1. The details
of these steps are given in the sections below.

Figure 6.1: Flowchart showing the proposed model

6.1 Experimental Data


The Autism dataset used in this project was the BCIAUT-P300 dataset. This dataset
contains EEG signals acquired from 15 subjects over 7 sessions. The details of this
dataset are thoroughly explained in the previous chapter. A brief description of the
characteristics of the BCIAUT-P300 dataset is given in table 6.1.


Feature                Value
Signal type            EEG
Subject details        15 patients with Autism Spectrum Disorder
Age range and gender   16 to 38 years; all male
Number of subjects     15
Number of sessions     7
Sampling frequency     250 Hz
Electrodes             C3, Cz, C4, CPz, P3, Pz, P4 and POz

Table 6.1: Details of the BCIAUT EEG dataset used

6.2 Data Preprocessing


Before the EEG signals of the subjects from BCIAUT-P300 can be fed into the classifi-
cation models, the data must be prepared and preprocessed. The general steps involved
in the preprocessing of EEG data are graphically represented in figure 6.2. The data
was preprocessed in MATLAB® using the EEGLAB open-source toolbox.

Figure 6.2: Data preprocessing steps

Notch Filtering
A notch filter is a type of filter that removes a single frequency component from an input
signal. More specifically, a notch filter is a band-stop filter with a very narrow stopband.


While collecting EEG data, shielded rooms are used to minimize the impact of the
urban electrical background, in particular 50/60 Hz alternating-current line noise.
Usually, most of the information of interest in EEG signals lies below this line noise,
so a low-pass filter with a cut-off below 50/60 Hz can be used. If one wants to keep the
higher frequency bands, a notch filter can be applied instead, which attenuates only a
narrow band around 50/60 Hz. A drawback is that the notch filter distorts the phases
of the signal around its stopband.

Bandpass filtering
A bandpass filter is a filter that allows only a band of frequencies. It is composed
of a high-pass filter and a low-pass filter. A high-pass filter is required for reducing
low frequencies coming from bioelectric events such as breathing, that remain in the
signal after subtracting voltages toward the ground electrode. Its cut-off frequency
typically falls between 0.1 and 0.7 Hz. A low-pass filter with a cut-off frequency equal
to the highest frequency of our interest is employed to guarantee that the signal is
band-limited (in the range from 40 Hz up to less than one-half of the sampling rate).
Low-pass filters also prevent the signal from being distorted by aliasing, or interference
caused by sampling rate effects, which would happen if frequencies more than one-half
of the sample rate persisted without decreasing.
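For illustration, a minimal scipy sketch of the two filtering stages, using the 50 Hz notch and the 0.5–80 Hz passband applied later in chapter 7. The filter order and quality factor are assumptions, and filtfilt is used so the zero-phase forward-backward pass avoids the phase distortion noted above.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

fs = 250.0                                     # sampling rate (Hz)

# Narrow 50 Hz notch for power-line interference (Q sets stopband width).
b_notch, a_notch = iirnotch(w0=50.0, Q=30.0, fs=fs)

# 4th-order Butterworth bandpass with 0.5-80 Hz cut-offs.
b_band, a_band = butter(4, [0.5, 80.0], btype="bandpass", fs=fs)

def filter_epoch(x):
    # Forward-backward filtering gives zero net phase shift.
    x = filtfilt(b_notch, a_notch, x, axis=-1)
    return filtfilt(b_band, a_band, x, axis=-1)

eeg = np.random.randn(8, 350)                  # (channels, samples) placeholder
clean = filter_epoch(eeg)
```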

Artefacts removal
The electroencephalogram is designed to record cerebral activity, but it is not isolated
from the electrical activities arising from sites other than the brain and records them as
well. Recorded activity that is not of cerebral origin is called an artefact. Typically,
an artefact is a sequence that has a larger amplitude and a different form from signal
sequences that are not significantly contaminated. There are two chief categories of
artefacts: physiological/biological artefacts (such as cardiac, pulse, respiratory, sweat,
eye movement (blink, lateral rectus spikes from lateral eye movement), and muscle and
movement artefacts) and nonphysiological/technical artefacts (caused by electrical
phenomena or devices in the recording environment). The most common EEG artefact
sources are:


Physiological artefacts:

- any minor body movements
- EMG
- ECG (pulse, cardiac)
- respiratory
- sweating
- eye movements

Technical artefacts:

- 50/60 Hz line noise
- impedance fluctuations
- cable movements
- broken wire contacts
- too much electrode paste or dried pieces
- low battery

Z-score normalization
Z-score normalization refers to normalizing every value in a dataset such that the
mean of all the values is 0 and the standard deviation is 1. The z-score of a sample x_i
can be written in simple terms as in equation 6.1. Z-score normalization is the final
step of preprocessing the Autism data. After z-score normalization, the pattern of the
signal does not change, but its range is reduced, making the classification models train
faster.

z_i = (x_i − μ) / σ    (6.1)

where μ and σ are the mean and the standard deviation of the data.
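A minimal sketch of this normalization; normalizing each channel of an epoch independently is an assumption here, since the report does not state over which axis the statistics are computed.

```python
import numpy as np

def zscore(epoch, eps=1e-8):
    """Scale each channel of a (channels, samples) epoch to zero mean
    and unit standard deviation; eps guards against flat channels."""
    mu = epoch.mean(axis=-1, keepdims=True)
    sigma = epoch.std(axis=-1, keepdims=True)
    return (epoch - mu) / (sigma + eps)

epoch = np.random.randn(8, 350) * 40 + 12      # placeholder raw epoch
z = zscore(epoch)
print(z.mean(axis=-1).round(6), z.std(axis=-1).round(6))  # ~0 and ~1
```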

Sampling techniques
An inspection of the distribution of classes across the instance space shows that it is
highly uneven, which indicates that a sampling technique is required so that the learning
system sees an equal number of training samples from each of the classes. Two sampling
techniques are employed in the subsequent work (a sketch of both is given after the
examples below):

Random oversampling: for a given batch of data samples across all the classes, this
method upsamples the minority classes by replicating their samples within the given
batch. The samples to be replicated are selected randomly with replacement.

Random undersampling: for a given batch of data samples across all the classes, this
method downsamples the majority classes by removing some of their samples from the
given batch. The samples to be removed are selected randomly without replacement.

Random oversampling: [x_A x_B x_B x_C x_C x_C] → [x_A x_A x_A x_B x_B x_B x_C x_C x_C]
Random undersampling: [x_A x_A x_A x_B x_B x_C x_C x_C x_C x_C] → [x_A x_A x_B x_B x_C x_C]

where x_A, x_B and x_C indicate training examples from classes A, B and C respectively.
As indicated, random oversampling increases the number of samples that the learning
system sees during an iteration, whereas random undersampling decreases it.
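The two schemes can be sketched in a few lines of numpy; applying them to whole batches mirrors the per-batch balancing described above, though the helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_oversample(X, y):
    """Replicate minority-class samples (with replacement) until every
    class matches the size of the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([rng.choice(np.where(y == c)[0],
                                     size=counts.max(), replace=True)
                          for c in classes])
    return X[idx], y[idx]

def random_undersample(X, y):
    """Drop majority-class samples (without replacement) until every
    class matches the size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([rng.choice(np.where(y == c)[0],
                                     size=counts.min(), replace=False)
                          for c in classes])
    return X[idx], y[idx]

X = np.random.randn(1600, 8, 350)     # epochs: 200 targets, 1400 non-targets
y = np.array([1] * 200 + [0] * 1400)
Xb, yb = random_undersample(X, y)     # 200 epochs of each class
```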

6.3 Classification
In this work, deep classifiers, i.e. Deep Neural Network (DNN) models, have been used
for the classification of the P300 and non-P300 classes. A brief description of the
classifiers used is given below:

6.3.1 EEGNet
EEGNet is a previously validated CNN architecture for P300 decoding [9]. Lawhern,
Solon, Waytowich, et al. show that EEGNet generalizes across paradigms better than,
and achieves comparably high performance to, the reference algorithms when only lim-
ited training data is available.
A modified version of EEGNet has been implemented in this work.

Architecture: EEGNet is a compact CNN architecture carefully designed to
reduce the number of trainable parameters through the use of depthwise and separable
convolutional layers. The architecture used in EEGNet is graphically shown in figure.
The summary of the architecture, along with the number of parameters, is shown in
table 6.2.
Preprocessing: The input signals are truncated from -100 ms to 1000 ms. Therefore,
the time signals are 275 samples long at the sampling frequency of 250 Hz.
Training: The optimizer used in training was Adam with the default parameters.
A multi-step learning rate scheduler was employed, starting from a learning rate of 0.01
and reducing it by a factor of 0.3684 after 30, 60 and 90 epochs, until the final learning
rate reached 5×10⁻⁴. This was done to increase the convergence rate. The binary cross-
entropy loss function was minimized. A mini-batch size of 64 was used and the maximum
number of epochs was set to 600. To address the considerable class unbalance, each
mini-batch was undersampled such that it contained a 50-50% proportion of the two
classes, randomly selecting the trials within the dataset.
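A sketch of this training configuration in PyTorch; EEGNet and balanced_loader are placeholders for the model of table 6.2 and a 50-50 undersampled batch iterator, and taking the second softmax output as P(target) is our reading of the two-unit output layer.

```python
import torch

model = EEGNet()                      # placeholder: the model of table 6.2
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Multiply the learning rate by 0.3684 after epochs 30, 60 and 90:
# 0.01 * 0.3684**3 ~= 5e-4, the final rate quoted above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.3684)
criterion = torch.nn.BCELoss()        # binary cross-entropy

for epoch in range(600):
    for xb, yb in balanced_loader:    # 64-sample, class-balanced mini-batches
        optimizer.zero_grad()
        p_target = model(xb)[:, 1]    # P(target) from the 2-way softmax
        loss = criterion(p_target, yb.float())
        loss.backward()
        optimizer.step()
    scheduler.step()
```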


Layer Output Shape Param Count


EEGNet [2] –
Sequential: 1-1 [1, 2] –
Conv2d: 2-1 [1, 8, 8, 276] 1,024
BatchNorm2d: 2-2 [1, 8, 8, 276] 16
Depth-Conv2D: 2-3 [1, 16, 1, 276] 128
BatchNorm2d: 2-4 [1, 16, 1, 276] 32
ELU: 2-5 [1, 16, 1, 276] –
AvgPool2d: 2-6 [1, 16, 1, 69] –
Dropout: 2-7 [1, 16, 1, 69] –
Sep-Conv2D: 2-8 [1, 16, 1, 69] 528
Sep-Conv2D: 2-9 [1, 16, 1, 69] 256
BatchNorm2d: 2-10 [1, 16, 1, 69] 32
ELU: 2-11 [1, 16, 1, 69] –
AvgPool2d: 2-12 [1, 16, 1, 8] –
Dropout: 2-13 [1, 16, 1, 8] –
Depth-Conv2D: 2-14 [1, 2, 1, 1] 258
Softmax: 2-15 [1, 2, 1, 1] –
Squeeze: 2-17 [1, 2] –
Total params: 2,274
Trainable params: 2,274
Non-trainable params: 0
Total mult-adds (M): 2.35
Input size (MB): 0.01
Forward/backward pass size (MB): 0.38
Params size (MB): 0.01
Estimated Total Size (MB): 0.40

Table 6.2: EEGNet model hyper-parameters and output shape after each layer

6.3.2 CNN-BLSTM
Santamaría-Vázquez, Martínez-Cagigal, Gomez-Pilar, et al., in their paper [10], pre-
sented a novel deep learning architecture for Brain Computer Interfaces based on Event
Related Potentials (ERPs). Four models were proposed, of which the model that combined
a CNN and a BLSTM performed the best. This neural network combines convolutional
and recurrent layers in order to learn high-level spatial and temporal features. Hence,
in this work, the same architecture has been implemented for performing classification
on the BCIAUT-P300 dataset.

Architecture: The first layer of this model is a convolutional 1D layer, which is


designed to learn spatial patterns. The second and third layers of this architecture are
Bidirectional Long Short Term Memory (BLSTM) layers, which are one of the most
common types of RNN. BLSTM layers are two LSTM layers linked to the same output
layer, one of which processes the training sequence forward and the other backward.
This design makes it possible to determine whether a certain EEG pattern is an ERP by
using both past and future information. Compared to LSTM, BLSTM networks have
been shown to be more effective at tackling problems like speech recognition. Dropout
regularization was applied to avoid overfitting. The architecture used in CNN-BLSTM
is graphically shown in figure. The summary of the architecture along with the number
of parameters is shown in table 6.3.
Preprocessing: The input signals are truncated from 0 to 1000 ms. Therefore, the
time signals are 250 samples long at the sampling frequency of 250 Hz.
Training: The optimizer used in training was Adam with the default parameters and
a learning rate of 5×10⁻⁴, minimizing the binary cross-entropy loss function. A mini-
batch size of 128 was used and the maximum number of epochs was set to 600. To
address the considerable class unbalance, each mini-batch was undersampled such that
it contained a 50-50% proportion of the two classes, randomly selecting the trials within
the dataset.


Layer Output Shape Param Count


cnn_blstm [1, 1] –
Sequential: 1-1 [1, 32, 62] –
BatchNorm1d: 2-1 [1, 8, 250] 16
Conv1d: 2-2 [1, 32, 62] 1,056
ReLU: 2-3 [1, 32, 62] –
Dropout: 2-4 [1, 32, 62] –
Sequential: 1-2 [1, 62, 32] –
BatchNorm1d: 2-5 [1, 62, 32] 124
LSTM: 2-6 [1, 62, 32] 6,400
Sequential: 1-3 [1, 62, 16] –
BatchNorm1d: 2-7 [1, 62, 32] 124
LSTM: 2-8 [1, 62, 16] 2,688
Sequential: 1-4 [1, 1] –
Linear: 2-9 [1, 1] 17
Sigmoid: 2-10 [1, 1] –
Total params: 10,425
Trainable params: 10,425
Non-trainable params: 0
Total mult-adds (M): 0.63
Input size (MB): 0.01
Forward/backward pass size (MB): 0.09
Params size (MB): 0.04
Estimated Total Size (MB): 0.14

Table 6.3: CNN-BLSTM model hyper-parameters and output shape after each layer
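A PyTorch sketch that reproduces the layer structure and parameter counts of table 6.3 (16 + 1,056 + 124 + 6,400 + 124 + 2,688 + 17 = 10,425). The kernel size, stride, dropout rate, and the use of the last time step before the linear layer are inferred from the output shapes rather than stated in the text, so treat them as assumptions.

```python
import torch
import torch.nn as nn

class CNNBLSTM(nn.Module):
    """Sketch matching the layer/parameter counts of table 6.3."""
    def __init__(self, n_channels=8, n_samples=250):
        super().__init__()
        self.bn0 = nn.BatchNorm1d(n_channels)                  # 16 params
        self.conv = nn.Conv1d(n_channels, 32, kernel_size=4,
                              stride=4)                        # 1,056 params
        self.drop = nn.Dropout(0.5)                            # assumed rate
        self.bn1 = nn.BatchNorm1d(62)                          # 124 params
        self.blstm1 = nn.LSTM(32, 16, batch_first=True,
                              bidirectional=True)              # 6,400 params
        self.bn2 = nn.BatchNorm1d(62)                          # 124 params
        self.blstm2 = nn.LSTM(32, 8, batch_first=True,
                              bidirectional=True)              # 2,688 params
        self.fc = nn.Linear(16, 1)                             # 17 params
        self.out = nn.Sigmoid()

    def forward(self, x):              # x: (batch, 8 channels, 250 samples)
        x = torch.relu(self.conv(self.bn0(x)))                 # (B, 32, 62)
        x = self.drop(x).permute(0, 2, 1)                      # (B, 62, 32)
        x, _ = self.blstm1(self.bn1(x))                        # (B, 62, 32)
        x, _ = self.blstm2(self.bn2(x))                        # (B, 62, 16)
        return self.out(self.fc(x[:, -1, :]))                  # (B, 1)

model = CNNBLSTM()
print(sum(p.numel() for p in model.parameters()))  # 10425, as in the table
```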

6.4 Specifications
The hardware specifications for the computing device used are:

• CPU: Intel Core i5-9300H, 9th Gen

• 4 CPU cores with processor base frequency of 2.40 GHz with overclocking up to
4.1 GHz.

• GPU: NVIDIA GeForce GTX 1650

• 4 GB dedicated GPU memory

• 8 GB RAM

The software specifications are:


• MATLAB®, 64-bit, academic version R2021a, for data preprocessing.

– Signal Processing Toolbox™ for filtering data.


– EEGLAB - an open source toolbox for processing electrophysiological signals.

• Visual Studio Code for source-code editing.

• Python version 3.10.0

7 Experimental Results

In this study, EEG signals obtained from the BCIAUT-P300 dataset were used for auto-
mated P300 and non-P300 signal classification. For this purpose, two Deep Neural
Network models were implemented in Python using PyTorch. The preprocessing of the
EEG data was done in MATLAB using the EEGLAB open-source toolbox. This chapter
shows the results obtained throughout the project.

7.1 Performance Metrics


Evaluating the performance of a system is a necessary task. For classification purposes,
accuracy is used as the performance metric in most cases, but it is misleading if the
dataset is skewed, i.e. if there is an uneven class distribution across the instance space.
For dealing with this, the F1-score is preferred as the performance metric. The definitions
of these metrics are given below. For calculating them, scikit-learn [11], a machine
learning library for Python, was used.

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (7.1)

Precision = TP / (TP + FP)    (7.2)

Recall = TP / (TP + FN)    (7.3)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (7.4)

where TP, FP, FN and TN are the numbers of true positives, false positives, false
negatives and true negatives respectively.
Along with these metrics, the ROC AUC score was also calculated with scikit-learn. A
Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the
performance of a binary classifier system as its discrimination threshold is varied. It is
created by plotting the fraction of true positives out of the positives (TPR, the true
positive rate) against the fraction of false positives out of the negatives (FPR, the false
positive rate) at various threshold settings. The TPR is also known as sensitivity, and
the FPR is one minus the specificity (the true negative rate).
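These metrics can be computed in a few lines with scikit-learn, in line with equations 7.1–7.4; the labels and scores below are toy values for illustration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])          # toy ground truth
y_prob = np.array([0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.8, 0.3])
y_pred = (y_prob >= 0.5).astype(int)                 # hard labels at 0.5

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_prob))   # needs scores, not labels
```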

7.2 Data Preprocessing Results


The data preprocessing was performed using the EEGLAB electrophysiological signal
processing toolbox in MATLAB. EEGLAB provides a graphical user interface for easy
EEG data processing. The BCIAUT-P300 dataset contains EEG recordings from 8 EEG
electrodes. The locations of the 8 channels on the scalp were plotted in EEGLAB and
are shown in figure 7.1.

Figure 7.1: Channel locations

7.2.1 Filtering
The EEG data was passed through the following filters:

1. Notch filtering: the EEG data was notch filtered at 50 Hz to remove the interference
of the AC power line.

2. Bandpass filtering: between 0.5 Hz and 80 Hz, to remove low-frequency DC compo-
nents as well as unnecessary high-frequency components, guaranteeing band-limited
signals. Figure 7.2 shows the frequency response of the bandpass filter used in
EEGLAB.


Figure 7.2: Frequency response of the bandpass filter in semilog scale.

7.2.2 Independent Component Analysis


Independent Component Analysis (ICA) is a technique used to remove artefacts embed-
ded in the data, such as muscle activity, eye blinks or eye movements. ICA was performed
on the dataset using the runica algorithm in EEGLAB.

(a) Eye movement artefact (b) Muscle artefact (c) Line noise artefact

Figure 7.3: Some common artefacts visible on EEGLAB after performing runICA
decomposition

Figure 7.4 shows the 2-D scalp map plots of ICA decomposition for subject 02 and the
first session.


Figure 7.4: Independent component analysis of EEGdata for subject 02 session 01

An EEGLAB plugin called ICLabel was used to automatically classify and distinguish
independent components as brain or non-brain sources. The results shown in figure 7.5
confirm that all components are brain waves and none of them is an artefact.

Figure 7.5: Automatic independent component classification using the ICLabel EEGLAB plugin
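The decomposition above was done with EEGLAB's runica/ICLabel workflow; for readers working in Python, an equivalent decomposition can be sketched with the MNE library. This is an alternative tool, not the one used in this project, and the excluded component index below is purely illustrative.

```python
import numpy as np
import mne

fs = 250.0
ch_names = ["C3", "Cz", "C4", "CPz", "P3", "Pz", "P4", "POz"]
info = mne.create_info(ch_names, sfreq=fs, ch_types="eeg")
raw = mne.io.RawArray(np.random.randn(8, int(60 * fs)), info)  # placeholder

raw.filter(l_freq=1.0, h_freq=None)        # high-pass helps ICA converge
ica = mne.preprocessing.ICA(n_components=8, random_state=0)
ica.fit(raw)
ica.exclude = [0]                          # index of an artefactual component
clean = ica.apply(raw.copy())
```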


7.2.3 Z-score normalization


The sample result after z-score normalization is shown in figure 7.6.

(a) before z-score normalization

(b) after z-score normalization

Figure 7.6: EEG data

7.3 EEGNet Results


7.4 CNN-BLSTM Results
Several variations of the model were trained using different sets of training samples.
The following table shows the results of training and testing the classifier model for
each subject:

Subject No.       S01   S02   S03   S04   S05   S06   S07   S08   S09   S10   S11   S12   S13   S14   S15   mean±SEM
Training accuracy 96.36 98.11 97.68 97.54 99.32 97.86 97.57 98.32 96.68 98.93 98.71 95.93 97.57 94.89 97.82 97.55±0.30
Testing accuracy  82.02 93.60 83.76 85.12 95.39 90.08 89.88 90.59 87.57 84.31 91.78 84.41 88.73 88.81 93.00 88.60±1.03
AUC ROC score     0.88  0.98  0.89  0.93  0.98  0.95  0.95  0.96  0.93  0.90  0.97  0.91  0.94  0.94  0.98  0.94
F1-score          0.82  0.93  0.85  0.85  0.95  0.90  0.89  0.90  0.87  0.85  0.91  0.83  0.86  0.88  0.93  0.88

Table 7.1: Performance at the level of single subject as represented by the average target
object accuracies of the EEGNet model

The following are the plots of training and validation accuracies obtained for each
subject, to visualize the training progress:

(a) S01 (b) S02

(c) S03 (d) S04


(e) S05 (f) S06

(g) S07 (h) S08

(i) S09 (j) S10

Figure 7.8: EEGNet Accuracies plot during training phase for each subject


(k) S11 (l) S12

(m) S13 (n) S14

(o) S15

Figure 7.9: EEGNet Accuracies plot during training phase for each subject

The following are the confusion matrices for the testing data for each subject.


(a) S01 (b) S02

(c) S03 (d) S04


(e) S05 (f) S06

(g) S07 (h) S08

(i) S09 (j) S10

Figure 7.11: EEGNet Confusion matrices for testing data for each subject


(k) S11 (l) S12

(m) S13 (n) S14

(o) S15

Figure 7.12: EEGNet Confusion matrices for testing data for each subject

8 Discussion

Review: In this project, a number of published articles and papers on obstructive sleep
apnea (OSA) and a few non-contact procedures for diagnosing it were reviewed. While
reviewing these studies it was noted that polysomnography serves as the standard for
calculating diagnostic values for sleep studies. A promising automatic video analysis
for the diagnosis of OSA was analyzed. The algorithm, which is based on the principle
that the volume of air that circulates into the lungs is proportional to the amplitude of
thoracic movement that a patient presents while breathing, can detect respiratory
movements independently of the position and situation of the subject while sleeping
and can infer sleep/awake periods. The results from this algorithm were similar to the
results produced by polysomnography on the same data sample. Some other non-contact
techniques for the diagnosis of OSA were also reviewed. These included a technique
which used impulse-radio ultra-wideband radar for the non-intrusive diagnosis of
obstructive sleep apnea. The radio technology could recognise large body movements
as well as subtle breathing using its very short (~100 ps) energy pulses.

Results: From the results of this project, the following observations can be made:

1. The raw movement signal represents the movement within a frame of the input
video. The plot of this signal contains periodic peaks, which represent the body
movements of patients while they are breathing. Although this signal correctly
represents the movements, the differences between raw movement signals obtained
from different classes of sleep events are not significant. From figure ??, there are
no marked differences in the signal; perhaps the differences are so subtle that they
can only be distinguished by a neural network.

2. The MFCC plots represent the coefficients as a function of time. The differences
between MFCC plots from different sleep events are considerable. Normal events
do not have many peaks and the coefficients are mostly around 0 (green). In hypop-
nea events, there are many peaks and regular oscillations. In OSA events, the plot
has irregular oscillations.

3. From table ??, it is clear that the training accuracies after 30 epochs are close
to 95% for all the cases. Although this suggests that the model is trained very
well, such high training accuracy can also indicate that the model could be
overfitting.

4. From the confusion matrices of all the cases, it is evident that most of the correct
predictions are for the normal events. The database contains a very large number
of normal events compared to hypopnea or OSA events. This uneven distribution
causes the model to train mostly to predict normal events, and most of the hypopnea
and OSA events are falsely classified as normal events by the classifier model.
This high correct-prediction rate for normal events is also responsible for the high
testing accuracy in table ??.

5. The major source of inaccuracy in the classifier model is the limited database
available for the sleep events. There are only 4084 observations in the database,
including all four subjects. This small dataset causes overfitting during training
and leads to high variance and high error during testing.

9 Conclusion

Image and signal processing techniques and machine learning algorithms such as Deep
Neural Networks (DNNs), along with the power of hardware acceleration using Graphics
Processing Units, can be used to build reliable systems to diagnose various sleep disorders
such as OSA. These algorithms overcome the disadvantages and limitations of
polysomnography (PSG). The techniques do not require the use of expensive and
advanced medical instruments and can achieve the same accuracy as PSG without
making any contact with the patients. The image processing techniques do not require
a large database of data samples, unlike supervised machine learning algorithms; they
are ideal in cases where there is a scarcity of data, which is indeed the case with OSA.
Meanwhile, classification models using neural networks can overcome the limitation of
image processing techniques by automating the whole process without the need to
manually tweak thresholds and parameters.
From the results of this project, it can be concluded that the proposed non-invasive
diagnostic method using image processing and machine learning is able to extract a good
amount of information about the sleep events.

10 Future Work

The present work proposed a diagnostic method based on image processing to extract
respiratory movement signals from the video data, signal processing to extract features
from the audio data, and a classification model using a neural network to classify the
sleep events into apnea events and normal events. The following are some future works
that can be built upon the proposed algorithm:

• Data augmentation to increase the number of observations in the training dataset
obtained from the limited database available for OSA.

• Modifying the hyper-parameters of the proposed classifier model to increase its
accuracy.

Bibliography

[1] M. Bauman and T. Kemper, “Neuroanatomic observations of the brain in autism: A


review and future directions,” International journal of developmental neuroscience
: the official journal of the International Society for Developmental Neuroscience,
vol. 23, pp. 183–7, Apr. 2005. doi: 10.1016/j.ijdevneu.2004.09.006.
[2] M. J. Maenner, K. A. Shaw, A. V. Bakian, D. A. Bilder, and et. al., “Prevalence
and characteristics of autism spectrum disorder among children aged 8 years —
autism and developmental disabilities monitoring network, 11 sites, united states,
2018,” Dec. 2021. doi: 10.15585/mmwr.ss7011a1.
[3] K. Dillenburger, “Why early diagnosis of autism in children is a good thing,”
English, The Conversation, Oct. 2014.
[4] J. H. Elder, C. M. Kreider, S. N. Brasher, and M. Ansell, “Clinical impact of early
diagnosis of autism on the prognosis and parent–child relationships.,” Psychology
research and behavior management, 2017.
[5] D. L. Schomer and F. H. Lopes da Silva, Niedermeyer's Electroencephalography:
Basic Principles, Clinical Applications, and Related Fields. Oxford University Press,
2018.
[6] M. Teplan, "Fundamentals of EEG measurement," Measurement Science Review,
vol. 2, Jan. 2002.
[7] J. D. Bronzino, The Biomedical Engineering Handbook. CRC/Taylor & Francis,
2006, pp. 201–212.
[8] C. P. Amaral, M. A. Simões, S. Mouga, J. Andrade, and M. Castelo-Branco,
"A novel brain computer interface for classification of social joint attention in
autism and comparison of 3 experimental setups: A feasibility study," Journal of
Neuroscience Methods, vol. 290, pp. 105–115, 2017, issn: 0165-0270. doi:
10.1016/j.jneumeth.2017.07.029. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0165027017302728.


[9] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung,
and B. J. Lance, "EEGNet: A compact convolutional neural network for EEG-
based brain–computer interfaces," Journal of Neural Engineering, vol. 15, no. 5,
p. 056013, Jul. 2018. doi: 10.1088/1741-2552/aace8c. [Online]. Available:
https://doi.org/10.1088/1741-2552/aace8c.
[10] E. Santamaría-Vázquez, V. Martínez-Cagigal, J. Gomez-Pilar, and R. Hornero,
"Deep learning architecture based on the combination of convolutional and recur-
rent layers for ERP-based brain-computer interfaces," in XV Mediterranean Con-
ference on Medical and Biological Engineering and Computing – MEDICON 2019,
J. Henriques, N. Neves, and P. de Carvalho, Eds., Cham: Springer International
Publishing, 2020, pp. 1844–1852, isbn: 978-3-030-31635-8.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-learn: Machine learning
in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

