
Capstone Project II
PRJ63504
MACHINE LEARNING APPROACH FOR

DEPRESSION DETECTION USING EEG SIGNALS
TAN YI XUEN
(0348746)
SCHOOL OF COMPUTER SCIENCE

BACHELOR OF COMPUTER SCIENCE (HONS)
APRIL 2023 [PART 2]
SUPERVISOR: MS. SUMATHI A/P BALAKRISHNAN

TABLE OF CONTENTS
Acknowledgement
Abstract
Chapter 1: Project Proposal
1.1 Executive Summary
1.2 Project Purpose
1.2.1 Problem Statement
1.2.2 Project Objectives
1.3 Assumptions
1.4 Project Description, Scope and Management Milestones
1.4.1 Project Description
1.4.2 Scope
1.4.3 Summary of Milestones and Deliverables
1.5 Project Organization
1.5.1 Project Organization Chart
1.5.2 Roles and Responsibilities
Chapter 2: Literature Review
2.1 Data Acquisition
2.2 Methodology
Chapter 3: Analysis
3.1 Use Case Diagram
3.2 Architecture Diagram
3.2.1 User end
3.2.2 Front end
3.2.3 Back end
3.3 System Flow Diagram
3.4 Data Flow Diagram
Chapter 4: Design
4.1 Introduction
4.2 Application Developer
4.2.1 Prototype Design
4.2.2 Methodology
4.3 Data Engineer
4.3.1 Training Dataset
4.3.2 Data Pre-processing
4.3.3 Methodology
Chapter 5: Project Implementation
5.1 Introduction
5.2 Back-end System Architecture
5.2.1 Preprocessing stage
5.2.2 Implementation of Machine Learning Model
5.2.3 Development Tools (Framework)
5.3 Implementation Details
5.3.1 Preprocessing Raw Data
5.3.2 Extracting the machine learning model
5.3.3 Flask implementation
5.4 Challenges
5.4.1 Processing training dataset
5.4.2 Unable to access EEG recording devices
5.4.3 Lack knowledge of EEG
5.4.4 Lack knowledge in application development
Chapter 6: System Testing
6.1 Login Page
6.2 Booking Function
6.3 Depression Test
Chapter 7: Conclusion & Critical Evaluation
7.1 Key Findings
7.2 Critical Evaluation
7.2.1 Limitations
7.2.2 Future Enhancements
7.3 Critical Appraisal of Work Done
References
Appendix
Appendix A: Project Proposal
Appendix B: Log Sheets
Appendix C: Turnitin Result
Acknowledgement
First and foremost, I would like to express my sincere gratitude to Taylor’s University for giving
me and my group mates the opportunity to work on this wonderful project on the topic Machine
Learning Approach for Depression Detection Using EEG Signals.
Secondly, I am extremely grateful to my academic lecturer, who is also my project supervisor, Ms.
Sumathi for her continuous support and guidance throughout the semester. She has provided our
team with invaluable advice and suggestions to further improve and refine our project scopes and
objectives.
Finally, this endeavour would not have been possible without the coordination and help from my
fellow group mates Chew Li Wei, Ooi Jing Kai, and Yap Wan Teng. Everyone was able to assist
each other and complete this project within the provided time frame.
Abstract
Depression has been around for a long time and is currently one of the most common mental
disorders affecting humans. According to the World Health Organization (2021), an estimated 3.8%
of the worldwide population suffers from depression, which is around 280 million people. It
has also been found that younger adults aged 18 to 29 tend to have a higher rate of depression
than older adults aged 65 or above (Villarroel & Terlizzi, 2020). To detect depression,
various depression tests have emerged, such as the Patient Health Questionnaire (PHQ-9) and the
Beck Depression Inventory-II (BDI-II) (Titov et al., 2011). Moreover, in recent years there
has been research that incorporates machine learning to detect depression from social media texts,
brain signals, and other sources.
Hence, this project proposes to use machine learning as the core of detecting depression using
EEG (electroencephalogram) signals. A machine learning model will be constructed and
implemented on a desktop application platform called DeNotPress that is integrated into clinic
computers. As secondary support for the results generated by the machine learning model, the
application will also include a self-evaluation questionnaire (PHQ-9) to be answered by
the patient, and the two results will be compared. If the patient is detected with depression, the clinic
practitioner can use the DeNotPress application to search for nearby specialists and recommend
them to the patient for further medical attention. Before the development phase began,
the team performed extensive research and analysis on the approach of the project. This
paper contains the proposal, the literature review, the analysis of the system design, the design of the
initial prototype of the desktop application, the implementation stage, the testing phase of the
application, and the conclusion.
Chapter 1: Project Proposal
1.1 Executive Summary
Depression is an ongoing concern in the medical field: more and more people are suffering from
depression, and it is becoming more common among young teenagers (Aguiar Neto & Garcia Rosa, 2019).
According to the World Health Organization (2021), 3.8% of the worldwide population is
affected by depression. The highest depression rates are found among teenagers aged 12 to 17 years
(14.4%), followed by young adults aged 18 to 25 years (13.8%) (Sison, 2020). Sison (2020) also
reports that people aged 50 and above have the lowest depression rate, at 4.5%. There are many
ways of detecting depression, one of which is websites that present a set of questions and,
based on the answers chosen, report whether the user may have depression. The problems that we
identified with currently available tests include unreliable results from online depression tests,
training datasets that are not properly processed and therefore degrade the performance of the
machine learning model, and disclosure of user privacy.
The objective of this project is to develop a web application that produces accurate results
when detecting depression. Instead of relying only on the traditional PHQ-9 questionnaire, we
propose an additional approach that detects depression using electroencephalogram
(EEG) signals. An EEG is essentially a recording of the electrical signals produced by brain activity.
The application will also generate a medical report and locate nearby specialists, and patients who
are detected with depression will be advised to seek medical attention from a specialist.
Lastly, the application will implement cybersecurity measures to protect patients'
data privacy.
The main approach of the proposed web application is to detect depression in two stages, using
conscious and unconscious responses. The conscious response is captured with the PHQ-9 questionnaire,
which records the patient's answers for an initial depression test. For the unconscious response, an EEG
test is performed to record the patient's brain signal activity. The EEG results are then fed to the
system for a secondary detection process. The depression detection system includes a machine
learning model that does all the processing and prediction work. Multiple machine
learning algorithms, including deep learning algorithms, will be compared beforehand, and the one
with the highest accuracy will be implemented in the final depression detection model of the
application. Likewise, multiple datasets will be compared, and the most suitable one will be
selected as the training dataset to train the model before deployment.
1.2 Project Purpose

1.2.1 Problem Statement
Depression is a common mental disorder that is affecting people worldwide. Many depression tests
have emerged throughout the years, from traditional hand-written questionnaires, to the now more
advanced online websites that provide similar self-assessment tests. All these methods seem
practical on paper but in reality, nobody can define how reliable these tests are and nobody knows
if they can be trusted. The major problems that depression test websites encounter will be
addressed by the proposed desktop application platform. The following points serve as a summary
of the issues encountered.
1. Disclosure of user privacy

Cybersecurity breaches are closely tied to user information because many online
platforms have been exposed to security vulnerabilities. Users worry that once they have
given a platform their permission, a third party might steal their data. The site itself may
on occasion resell user data to generate illicit revenue, and websites with weak security
may be attacked by black hat hackers. Because the confidentiality of a network security
system depends on a number of circumstances, users are reluctant to enter their true data,
which could affect how statistically reliable the information provided by
the Research Institute is.
2. Unreliable results of online depression tests

People look online for answers because they want to know everything. Some people
tend to look online for support when they want to know if they are depressed. This
is when the downside of free speech on the internet becomes apparent. On numerous websites,
users can quickly get answers to all of their questions about depression, but not all of
these answers are accurate or consistent with their symptoms. Due to inaccurate and
confusing information, some users may even be misinformed about depression or
misunderstand it. There are also many questionnaires available online that offer
a preliminary test; however, some of the questions are not specific enough for users to
answer according to their particular conditions, which causes the system to misjudge the
results.
3. Training dataset that is not properly processed affects the performance of the
machine learning model
Due to the limitations of questionnaire-based depression tests, in recent years there have
been attempts to apply machine learning to the detection of depression. Machine
learning models with a reported accuracy below 70% are not good enough for
early detection of depression because there is a high possibility of misclassification
(Lee & Ham, 2022). The accuracy of a machine learning model is highly affected by its
training dataset: data that is not properly processed (containing noise, outliers, or
inconsistencies) will reduce the model’s performance (Rozhnova, 2021).
1.2.2 Project Objectives

1. To implement cybersecurity measures to the system to protect patient data privacy.
2. To develop a reliable depression test system that analyses EEG signals using a machine
learning approach.
3. To determine a training dataset that is most reliable to give better accuracy and reliability.
No. | Proposed Functionality | Problems Solved / Opportunities
1 | Control panel for user | Functionality for Depression Test Application; Lack of user-friendliness
2 | Control panel for system developer | Functionality for Depression Test Application
3 | Account management for system developer | Functionality for Depression Test Application; Lack of system security
4 | Account management for user | Functionality for Depression Test Application; Lack of system security
5 | Store the data for statistical purpose | Lack of analytics tools; Lack of data for further research; Lack of system security
6 | Depression questionnaire | Functionality for Depression Test Application
7 | Generating depression report | Functionality for Depression Test Application
8 | Generating level of depression detection | Functionality for Depression Test Application
9 | Booking appointments | Functionality for Depression Test Application
10 | Locating nearby specialists | Functionality for Depression Test Application
11 | Register new patients | Functionality for Depression Test Application
Table 1. Proposed Functionalities of the Application
1.3 Assumptions
We assume that we will have access to open-source online records and the necessary documents,
including the test data and scripts required to execute the application.
We also assume that data acquisition and data collection are the same across users, including:
- Necessary preparations before performing EEG test
- EEG devices used
- Procedures to follow while performing EEG test
- Time required for EEG test
1.4 Project Description, Scope and Management Milestones

1.4.1 Project Description
This project proposes a web application that uses EEG to detect depression. The approach
is to develop a web application as a platform that clinic practitioners can use for
depression detection based on EEG signals obtained from EEG tests. Two types of responses are used
to detect depression: conscious and unconscious responses. For the conscious response, the clinic
practitioner will administer the PHQ-9 questionnaire as the initial depression test. For the
unconscious response, an EEG test will be performed and its output fed through the depression detection
system as a secondary test. The results of the conscious and unconscious responses are then
compared to conclude whether or not the patient has depression.
The system's ability to detect depression from EEG signals comes mainly from
a machine learning model. Such models can process EEG signals as
data inputs and extract features for classification. The extracted features then undergo
feature selection to pick the subset of features that is necessary for detecting signs of
depression. Since the model will be trained on training data beforehand, it will rely on what it
learned during training to predict whether the patient's EEG signals indicate depression or not. To
determine which machine learning algorithm produces the most accurate results, multiple
algorithms will be compared, and the one that performs with the highest accuracy
will be implemented in the final depression detection model.
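The selection step described above can be sketched as follows. This is only an illustrative sketch: the data, the single made-up feature, and the three threshold "models" are hypothetical stand-ins for trained classifiers, and the names `accuracy` and `select_best` are not taken from the project code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], int]  # returns 1 (depressed) or 0 (healthy)


def accuracy(model: Classifier, X: List[Features], y: List[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for features, label in zip(X, y) if model(features) == label)
    return correct / len(y)


def select_best(models: Dict[str, Classifier],
                X: List[Features], y: List[int]) -> Tuple[str, float]:
    """Return the name and accuracy of the highest-scoring candidate."""
    scores = {name: accuracy(model, X, y) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical held-out data: one made-up feature value per EEG recording.
X_val = [[0.2], [0.4], [0.6], [0.8], [0.9], [0.1]]
y_val = [0, 0, 1, 1, 1, 0]

# Toy stand-ins for trained models (simple thresholds on feature 0).
candidates = {
    "threshold_0.5": lambda f: int(f[0] > 0.5),
    "threshold_0.3": lambda f: int(f[0] > 0.3),
    "always_healthy": lambda f: 0,
}

best_name, best_acc = select_best(candidates, X_val, y_val)
```

In practice each candidate would be a trained model evaluated on data it has not seen during training, so that the comparison reflects how well it generalises rather than how well it memorised the training set.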
The application can help many people who feel that they might have some level of depression but
are afraid of visiting specialists. Instead, they can visit the nearest clinic and take the
depression test; only if they are detected with depression will they be recommended to see a
specialist. This application can help reduce cases of late diagnosis of depression, which can be
fatal if not treated well, and help improve the lives of people who have mild or moderate depression.
Moreover, the application enables developers to use the data collected to perform
statistical analysis and discover insights that allow specialists to better understand which groups
of individuals are more likely to have depression.
1.4.2 Scope
Our proposed web application that uses EEG signals to detect depression is a dedicated system
that is only accessible to clinics with trained practitioners. Below is a detailed description of the
scope of our proposed application. First of all, there are 3 essential roles that make up the system,
as follows:
a) Front End Developers - A critical group of individuals who are responsible for
the entire development of the application interface and functionality.
b) Back End Developers - A critical group who are responsible for output generation and the
predictive model of the system.
c) User - The clinic practitioners who use the system.
There are three main parts that contribute to the application, as stated
below:
Application Platform
a) Framework
A framework is the foundation that allows web application developers to build their
applications from scratch. One of the most popular frameworks for web application
development is Flask. Flask is a Python-based web development framework that is
beginner-friendly because developers can build a web application with basic
knowledge of Python (Makai, 2022). It also allows developers to combine front-end code
(HTML, CSS, JavaScript) with the back-end code, which in this project is the machine
learning model. It is relatively lightweight and extensible, making it suitable for
developing the depression detection system.
b) Detect depression using EEG and questionnaire

This is the core of the application, which lets the user know whether the patient
is likely to have depression. The clinic practitioner will first
ask the patient the questions of the Patient Health Questionnaire (PHQ-9) orally
and enter the patient’s
answers into the application. Next, the clinic practitioner will perform an EEG test on
the patient and transfer the patient’s EEG signals to the application, and the system
will process this input as programmed. The machine
learning model in the system will then use the processed input to determine whether the patient is
depressed. The result will be shown on the system’s interface.
c) Provide medical report and recommend nearby specialists

If both test results confirm depression in a patient, a medical report will be generated. The
report will clearly state the PHQ-9 score of the patient and the machine learning test
result. The clinic practitioner can then use the “locate nearby specialist” function
to search for specialists near the patient’s location and recommend them to the patient.
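To illustrate how Flask can connect a web front end to a back-end model as described in item a), the following is a minimal sketch rather than the project's actual implementation. The `/predict` route, the `features` field in the JSON body, and the `predict_depression` placeholder are all assumptions made for this example; the placeholder is not a real classifier.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_depression(features):
    """Hypothetical stand-in for the trained model; NOT a real classifier."""
    return sum(features) / len(features) > 0.5


@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    is_number_list = (
        isinstance(features, list)
        and len(features) > 0
        and all(isinstance(v, (int, float)) for v in features)
    )
    if not is_number_list:
        return jsonify(error="'features' must be a non-empty list of numbers"), 400
    return jsonify(depressed=bool(predict_depression(features)))
```

The front end would send the extracted EEG features to this route and display the returned result. During development the application can be started with the `flask run` command or by calling `app.run()`.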
User Side
a) User
The clinic practitioner accesses our application to test a patient for depression by
logging in to the system and then entering the patient's personal information,
such as the patient's name, gender, age, and occupation. The purpose of logging in to the system
as an admin is to protect the patient's privacy and to enforce security policies that prevent
patient data and information from leaking.
Before the patient takes the EEG test, the clinic practitioner will administer the PHQ-9
questionnaire to the patient as an initial depression test. Next, the clinic practitioner
will evaluate the patient's level of depression using electroencephalogram (EEG) methods
to obtain accurate results that show whether the patient is suffering from depression.
Combining the two results from the questionnaire and the depression test using EEG
signals, the practitioner can conclude whether the patient is depressed or not.
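The combination of the two results can be sketched as below. The PHQ-9 has nine items, each scored from 0 to 3, giving a total between 0 and 27 with the commonly published severity bands shown in the code. The cut-off of 10 for a positive questionnaire result and the rule that both tests must agree are assumptions made for this sketch (the latter follows the report-generation rule described under Application Platform); the function names are hypothetical.

```python
from typing import List

# Commonly published PHQ-9 severity bands: (upper bound of total, label).
SEVERITY_BANDS = [
    (4, "minimal"),
    (9, "mild"),
    (14, "moderate"),
    (19, "moderately severe"),
    (27, "severe"),
]

PHQ9_CUTOFF = 10  # assumption: a total of 10 or more counts as a positive screen


def phq9_total(answers: List[int]) -> int:
    """Sum the nine item scores after validating that each is 0, 1, 2 or 3."""
    if len(answers) != 9 or any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("PHQ-9 needs exactly nine answers, each scored 0 to 3")
    return sum(answers)


def phq9_severity(total: int) -> str:
    """Map a PHQ-9 total (0 to 27) to its severity label."""
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label
    raise ValueError("PHQ-9 total must be between 0 and 27")


def both_tests_positive(total: int, eeg_positive: bool) -> bool:
    """True only when the questionnaire and the EEG model both indicate depression."""
    return total >= PHQ9_CUTOFF and eeg_positive


answers = [2, 1, 2, 1, 2, 1, 2, 1, 0]  # hypothetical patient answers
total = phq9_total(answers)
```

With these hypothetical answers the total is 12, which falls in the "moderate" band, so a report would be generated only if the EEG model also returns a positive result.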
Server Side
a) Database
Cloud databases are becoming increasingly popular among web application developers, as
they offer numerous advantages over traditional storage methods. Unlike conventional
storage solutions, cloud databases can be accessed through the internet and do not require
physical storage devices. This allows developers to access their data from anywhere using
the vendor's API or web interface, making it incredibly convenient to use. Additionally,
cloud databases are highly scalable and flexible, making it easy for developers to store and
manage large amounts of data. As a result, developers prefer to use cloud databases because
of their ease of use, accessibility, and flexibility.
In a nutshell, the project will include the following functions:

a) Control panel for user
b) Control panel for system developer
c) Account management for system developer
d) Account management for user
e) Store the data for statistical purpose
f) Generating depression report
g) Generating level of depression detection
h) Booking appointments
i) Locating nearby specialists
j) Register new patients
k) Depression questionnaire
The project will not include the following functions:
a) Functions that are not stated in the scope above
b) Profit for the developers
c) Mobile app support (Android or iOS)
1.4.3 Summary of Milestones and Deliverables

Milestone No. | Milestone | Person Responsible | Expected Duration (days)
001 | Strategy planning | All | 14 days
002 | Design use case diagram | Chew Li Wei | 3 days
003 | Design architecture of system | Tan Yi Xuen | 5 days
004 | Come up with design of system | Yap Wan Teng, Ooi Jing Kai | 5 days
005 | Develop system interface | Ooi Jing Kai | 14 days
006 | Develop functionalities of system | Ooi Jing Kai, Yap Wan Teng, Tan Yi Xuen | 14 days
007 | Research suitable dataset | Tan Yi Xuen | 3 days
008 | Construct machine learning model | Chew Li Wei | 14 days
009 | Develop database | Chew Li Wei | 7 days
010 | Test system functionalities | Ooi Jing Kai | 3 days
011 | Implement cybersecurity measures | Yap Wan Teng | 7 days
012 | Identify potential bugs and debugging | Tan Yi Xuen | 5 days
013 | Beta testing | All | 5 days
014 | Launching of system | All | 3 days
Table 2. Summary of milestones
1.5 Project Organization
1.5.1 Project Organization Chart
Figure 1. Project Organization Chart
1.5.2 Roles and Responsibilities
Name | Roles | Responsibilities
Sumathi A/P Balakrishnan | Project Supervisor | Supervise the team’s progress and provide suggestions to further improve the project’s scope. Provide necessary guidance and assistance in terms of knowledge, opinion, and viewpoint to ensure the team is on the right track on achieving the project’s goals
Yap Wan Teng | Cybersecurity engineer | Implement cybersecurity measures to the system to protect users from attacks while using the system.
Ooi Jing Kai | Front-end developer | Design the interface of the system; Improve the users’ experience quality of the application; Develop the functionalities of the system that are stated in the project scope
Tan Yi Xuen | Application developer, Back-end developer | Combine front-end and back-end code to develop application; Research on suitable training dataset; Pre-process the raw data into usable data
Chew Li Wei | Back-end developer, Database engineer | Construct a predictive model and analyse data; Construct the database
Table 3. Roles and Responsibilities
Chapter 2: Literature Review
The main goal of this section is to review studies that have worked on the same
topic, which is detecting depression using machine learning and EEG signals. The findings of each
study will be summarized and compared in order to derive the project’s approach based on the
advantages and limitations of each study.
2.1 Data Acquisition

Safayari & Bolhasani (2021) reviewed a variety of papers on detecting
depression using deep learning with EEG signals. Because their study is a review,
the papers they investigated used a variety of datasets, and a table
summarizes the features of each dataset. The majority of the datasets collected EEG
signals from participants aged around 20 to 55 years on both the left and right hemispheres. Some
of the papers used a sampling frequency of 256 Hz and some went up to 500 Hz, with a
signal duration of 5 minutes preferred by most papers. Most papers had an equal number of
healthy and depressed participants, but the EEG recordings were made on a small number of
participants. Because deep learning requires a large sample size to work
well, the authors found that the small samples collected might have degraded the reported
accuracy.
Avots et al. (2022) acquired their dataset from Tallinn University of Technology (TalTech). The
equipment used to perform the EEG recordings was the Cadwell Easy II EEG (Kennewick, WA,
USA), and a total of 18 channels were recorded. During the recording session, the subjects
were lying down with their eyes closed. The sampling frequency was 400 Hz for linear methods and
200 Hz for nonlinear methods. The dataset consists of EEG signals recorded from 20 participants,
14 females and 6 males, aged 24 to 60 years. Half of the participants were healthy and
half were depressed. To assess the healthy group of subjects, the HAM-D and EST-Q scores
were used to ensure that they did not show any signs of depression or other mental disorders.
Akbari et al. (2020) conducted their study using a dataset that consists of 22 healthy and 22
depressed subjects: 16 men and 6 women in the healthy group, and 10 men and 12 women in the
depressed group. All subjects were aged 23 to 58 years. During the EEG recording process, the
subjects were asked to be in their resting state with eyes open and eyes closed, and each experiment
lasted 10 minutes. For the left hemisphere, the EEG signals were recorded from the FP1-T3 channel;
for the right hemisphere, the FP2-T4 channel was recorded. The sampling frequency was 256 Hz,
and 50 Hz power line interference was eliminated.
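Power line interference at 50 Hz is typically removed with a notch filter. The sketch below uses a standard second-order (biquad) notch design at the 256 Hz sampling rate mentioned above; it is a generic textbook filter chosen for illustration, not necessarily the method Akbari et al. used, and the quality factor of 30 and the test signals are assumptions.

```python
import math
from typing import List, Tuple


def notch_coefficients(f0: float, fs: float, q: float = 30.0) -> Tuple[List[float], List[float]]:
    """Biquad notch coefficients (b, a), normalised so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs           # notch frequency in radians per sample
    alpha = math.sin(w0) / (2.0 * q)       # controls the width of the notch
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(b: List[float], a: List[float], x: List[float]) -> List[float]:
    """Filter x with the difference equation of a second-order IIR section."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


FS = 256.0                      # sampling rate in Hz
b, a = notch_coefficients(50.0, FS)
n = int(8 * FS)                 # 8 seconds of synthetic signal

# Synthetic test tones: 50 Hz mains interference and a 10 Hz brain-band tone.
mains = [math.sin(2 * math.pi * 50.0 * i / FS) for i in range(n)]
tone_10hz = [math.sin(2 * math.pi * 10.0 * i / FS) for i in range(n)]

mains_out = apply_biquad(b, a, mains)
tone_out = apply_biquad(b, a, tone_10hz)
```

After the initial transient has died away, the 50 Hz tone is almost completely suppressed while the 10 Hz tone passes through nearly unchanged, which is the behaviour wanted from power line removal.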
Wang et al. (2022) used two datasets for their study: the first is the MODMA dataset from
Lanzhou University and the second is the Depression Rest dataset from the University of New Mexico.
The first dataset consists of 18 depressed and 25 normal subjects, aged 18 to 53 years. The
equipment used was a three-electrode EEG acquisition sensor that collects signals from the Fp1, Fp2,
and Fpz electrodes. The subjects' EEG signals were recorded with closed eyes in their resting state.
For the second dataset, the participants were aged 18 to 25 years and had no history of mental
disease. The EEG recordings were made on 64 channels with a 500 Hz sampling rate. Since the first
dataset only uses three channels, the authors selected the same three channels
from the second dataset to perform the comparison experiment.
Mohammadi & Moradi (2020) acquired their EEG dataset by performing EEG recording sessions
with the Mistar-201 system. Each session lasted 5 minutes, and the participants were
required to be in a resting state with their eyes closed. EEG data were collected from a total
of 60 participants, and the DSM-IV interview and BDI scores were used to assess their
depression severity levels. During the EEG recording process, the sampling rate was set to
256Hz and 19 channels were recorded.
2.2 Methodology
Safayari & Bolhasani (2021) reviewed papers on detecting depression using deep learning with
EEG signals as the main data input. They evaluated a total of 22 articles published from 2016
to 2021 and organized them into a systematic literature review (SLR). For pre-processing, they
found that the majority of the papers used a 50Hz notch filter and band-pass filters for noise
removal, while independent component analysis (ICA) and the wavelet transform were used by
most papers to remove artifacts such as eye blinks and muscle movements. For feature
extraction, since almost all papers used deep learning to detect depression, most of them used
convolutional layers in an end-to-end manner to extract relevant features, including frequency
bands (alpha, beta, theta, and so on). The Fast Fourier Transform (FFT) and band-pass filters
were the second most used techniques for feature extraction, and some papers used them in
combination with convolutional layers. The review showed that CNN-based deep learning
algorithms are preferred over other deep learning algorithms, with a hybrid model of CNN and
LSTM ranked second to CNN-based models. The limitations they identified in the papers include
insufficient input data, inefficient data pre-processing, and a lack of usability in real-world
applications.
Avots et al. (2022) conducted their research by performing classification using a variety of
linear and nonlinear features. Their main focus was the feature selection stage, which selects
the subset of features that gives the highest accuracy. They also included some classifier
configurations that can further improve the classification performance. During the EEG
recording session, they recorded 18 channels from 20 subjects (14 females and 6 males) that had
been diagnosed with depression at some point in time. The extracted features were represented
as 1D vectors containing 9 feature types for each of the 18 electrodes, giving 162 unique
features in total. To select the best subset of features, univariate feature ranking with the
F-test was used together with ReliefF. The machine learning algorithms used consist of SVM,
LDA, Naïve Bayes, KNN, and DT. Moreover, ensemble methods were implemented to improve
classification further. Feature selection was found to have a large impact on the classifiers'
accuracy, in particular ReliefF, which enabled the classifiers to classify subjects with an
accuracy of 80-95%. The ensemble method allowed the Naïve Bayes classifier to reach an accuracy
of up to 93.3%. They concluded that the best combination is an ensemble approach using
ReliefF-selected features.
Akbari et al. (2020) conducted their research using the second order difference plot (SODP)
of EEG signals, with binary particle swarm optimization (BPSO) as the feature selection
technique, to classify depressed subjects. Their study used a dataset consisting of 22 healthy
and 22 depressed subjects (26 men and 18 women). They used the SODP to visualize the EEG
signals in 2D space and to represent the complexity of the signals. Features were selected
using the Kruskal-Wallis test and BPSO, then passed to KNN and SVM classifiers with ten-fold
cross validation. To evaluate the classifiers' performance, accuracy, sensitivity, specificity,
positive predictive value, negative predictive value, and Matthews correlation coefficient
were used as evaluation metrics. The results showed that the classifiers distinguished
depressed from non-depressed subjects with accuracies of 97.84% and 98.79% for the left and
right hemispheres respectively. Based on their results, they observed that the SODP shapes of
depression EEG signals differ from those of normal signals, which makes depressed subjects
relatively easy to detect.
Wang et al. (2022) developed a CNN-based model that uses multi-channel data fusion and
clipping augmentation of EEG signals to detect depression. Two datasets were used in this
study: the MODMA dataset from Lanzhou University and the Depression Rest dataset from the
University of New Mexico. For data pre-processing, multi-channel data fusion is used to combine
the EEG signals into 2D images, which removes the need for manual processing and feature
extraction of the EEG signals. In addition, to obtain a larger dataset, they used multi-scale
clipping augmentation to expand the original data. Each dataset was expanded by 2, 4, and 8
times. The datasets were then fed to the CNN model to perform the classification task. For both
datasets, the accuracy was lower without fusion and augmentation and higher when both
pre-processing techniques were applied. Through their research they found that methods such as
data fusion and clipping augmentation can overcome the issue of small sample size and improve
the performance of the CNN model. These methods can also reduce the feature extraction steps
required, making the detection of depression more efficient.
Mohammadi & Moradi (2020) investigated whether the functional connectivity and complexity of
EEG signals can be used to classify the severity level of depression, as measured by the Beck
Depression Inventory-II (BDI-II) scores. A total of 60 participants took part in this study,
consisting of 17 minimal, 10 mild, 10 moderate, and 23 severe cases. The EEGLAB software was
used to process the EEG signals, with methods including a Butterworth filter, a low pass
filter, and ICA. Significant differences among the depression levels were investigated using a
post hoc test with the Šidák method. Pearson's correlation coefficient was used to find the
relationship between the EEG signals and the depression severity scores. To predict the
depression severity level, a linear regression model was implemented. The model was evaluated
using leave-one-out cross validation (LOOCV), with RMSE and MAE as the metrics. They found a
negative relationship between functional connectivity and depression severity, and a positive
relationship between depression severity and the complexity features in the alpha and delta
bands. They reported a performance of RMSE = 7.69 and MAE = 6.11, and considered the model
reliable.
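To make the evaluation procedure concrete, the sketch below runs leave-one-out cross validation on a one-feature linear regression and reports RMSE and MAE. It is written in plain Python, the data are synthetic, and the function names are hypothetical. It illustrates the procedure only and does not reproduce the model or the features of Mohammadi & Moradi (2020).

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (xs must not all be equal)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse_mae(xs, ys):
    """Hold out each sample once, fit on the rest, and collect the prediction errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(ys[i] - (slope * xs[i] + intercept))
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Synthetic example: one hypothetical EEG feature against a severity score
rng = random.Random(0)
feature = [rng.uniform(0.0, 1.0) for _ in range(30)]
severity = [40.0 - 30.0 * x + rng.gauss(0.0, 3.0) for x in feature]
rmse, mae = loocv_rmse_mae(feature, severity)
print(f"RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, so a wide gap between the two suggests a few poorly predicted subjects.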
Mahato & Paul (2019) compared multiple signal processing techniques for EEG signals and the
accuracy of different classifiers in detecting depression. In the pre-processing phase,
artifacts produced by eye blinks, muscle movements, or electrical noise from the device power
line were removed. Feature extraction techniques were then used to extract linear and
nonlinear features. For linear features, the methods used were band power, Wavelet Transform
(WT), Fast Fourier Transform (FFT), Auto Regressive (AR), etc. For nonlinear features, the
methods used were Correlation Dimension (CD), Approximate Entropy (ApEn), etc. Since a large
number of features is extracted from the EEG channels, a number of feature selection methods
were used to select a subset of significant features to feed to the classifiers. These methods
include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Genetic
Algorithm (GA). The classifiers used to separate normal subjects from depressed subjects are
LR, K-Means clustering, KNN, DT, SVM, and many more. Based on their results, the highest
performing classifier is SVM using all signal power and alpha symmetry features, with an
accuracy of 98.4%. The limitations encountered in their research include the lack of a
standardized dataset for depression, differences in the environment and procedures of EEG
recordings, and differences in EEG recording length and epoch length.
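As an illustration of the band power feature mentioned above, the sketch below estimates the power of a single-channel signal in four frequency bands using the FFT. It assumes NumPy is available. The band edges are conventional values assumed here, and the test signal is synthetic; neither is taken from Mahato & Paul (2019).

```python
import numpy as np

# Conventional band edges in Hz (an assumption; definitions vary between papers)
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def band_powers(signal, fs):
    """Return the summed spectral power of a 1-D signal in each band."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    return {
        name: float(power[(freqs >= low) & (freqs < high)].sum())
        for name, (low, high) in BANDS.items()
    }


# Synthetic signal: a strong 10 Hz (alpha) and a weak 20 Hz (beta) component
fs = 256
t = np.arange(fs * 4) / fs
eeg_like = np.sin(2 * np.pi * 10 * t) + 0.2 * np.sin(2 * np.pi * 20 * t)
powers = band_powers(eeg_like, fs)
print(max(powers, key=powers.get))
```

Computing such a value per band and per channel gives one linear feature vector per recording, which is the kind of input the feature selection methods above would then reduce.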
Chapter 3: Analysis
3.1 Use Case Diagram
Figure 2. Use Case Diagram
Figure 2 shows the use case diagram of the DeNotPress web application. This diagram depicts
the main functionalities that make up the whole system. There are four main actors that play
a role in different parts of the application, namely Clinic practitioner, Patient,
Administrator, and Database engineer. The relationships between the actors and the use
cases of the DeNotPress application are explained in detail in the table below.
Actor | Use Case
Admin | Register approved clinic practitioner: To register practitioners with certain certification or training to use the application.
Admin | Verify user: When users login to the application, the admin can verify if that user is a registered practitioner.
Database engineer | Store data: To manage and store the data imported from the system into the database.
Database engineer | Search query for data: Perform necessary queries for searching up data.
Clinic Practitioner | Login: Enter the account details given by the admin and login to the application.
Clinic Practitioner | Booking for appointment: Make booking of appointments for the clients by entering the client’s name, date and time of appointment.
Clinic Practitioner | Register new patient: Register new clients by entering their personal information and medical history.
Clinic Practitioner | Depression test: Performs the depression test which is separated into two sections, the PHQ-9 questionnaire and the machine learning prediction. The PHQ-9 questionnaire will be done by the patient, next the machine learning prediction will be done by the clinic practitioner by uploading the patient’s EEG data into the system.
Clinic Practitioner | Search nearby specialist: If the patient is detected with depression, the clinic practitioner can search for nearby specialists and recommend it to the patient.
Patient | PHQ-9 questionnaire: Answer the set of questions in the questionnaire.
Table 4. Use Case Specifications

3.2 Architecture Diagram
Figure 3. Architecture Diagram
Figure 3 shows the architecture diagram of the DeNotPress application. It depicts some of the
physical components of the application and how they interact with each other. According to
Figure 3, the architecture of the DeNotPress application can be separated into the following
sections: User end, Front end, and Back end.
3.2.1 User end

The user end of the architecture diagram shows how the user can access the DeNotPress
application. Since the application is designed for computer use, the user accesses it mainly
through web browsers such as Google Chrome and Safari. Moreover, the user is required to have
a stable Internet connection in order to connect to the DeNotPress web application.
3.2.2 Front end

The front end of the DeNotPress application shown in Figure 3 includes the tools used for
developing the UI/UX design of the application. The three main components are HTML, CSS, and
JavaScript. HTML is used for creating the layout and structure of the web pages, CSS is used
for styling the elements created in HTML to give them a more appealing appearance, and
JavaScript is used for adding interactivity to the web pages.
3.2.3 Back end

The back end architecture of the DeNotPress application shown in Figure 3 is made up of
several components: database, machine learning, development tools, and security. For the
database, MongoDB is chosen because it allows unstructured data such as EEG data to be stored,
and it can be easily integrated into the application by installing the PyMongo package. For
machine learning, the application integrates the machine learning model that detects
depression using EEG data. For development tools, Flask is used as the framework to develop
the entire application in the Python language. Lastly, for security, Advanced Encryption
Standard (AES) and human verification are used to strengthen the security of the DeNotPress
application and protect clients’ data privacy.
3.3 System Flow Diagram
Figure 4. System Flow Diagram
Figure 4 shows the system flow diagram, which depicts how the processes of the system proceed.
At the start of the application, the main user, which is the clinic practitioner, has to log
in to the system at the login page. They must use the account details provided by the admin;
otherwise they will not be able to log in. In addition, the system generates a random sequence
for the user to enter as an extra level of verification, to ensure that a human being rather
than an automated program is trying to use the application. If the user enters the wrong
sequence, the system generates another random sequence for the user to enter. After entering
the correct random sequence, the user is directed to the home page.
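The verification step described above can be sketched as follows. This is a minimal illustration using only the Python standard library. The sequence length, the character set, and the function names are assumptions made for the example and are not taken from the DeNotPress implementation.

```python
import secrets
import string

# Assumed character set for the challenge: upper-case letters and digits
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Generate a random sequence for the user to retype."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_correct(challenge, typed):
    """Compare the typed text with the challenge in constant time."""
    return secrets.compare_digest(challenge.encode(), typed.strip().encode())


# Simulated login: a wrong first attempt triggers a fresh challenge
challenge = new_challenge()
typed = "WRONG!"  # can never match because '!' is not in ALPHABET
while not is_correct(challenge, typed):
    challenge = new_challenge()  # regenerate after a failed attempt
    typed = challenge            # simulate the user typing it correctly
print("verified, redirect to home page")
```

Regenerating the sequence after every failed attempt, as the system flow specifies, prevents repeated guessing against the same challenge.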
There are three scenarios in which the user can interact with the system. The first is booking
an appointment for the client. The user can use the booking function by entering the client’s
name and the date and time of the appointment. After that, the system displays the available
bookings in a table on the home page.
The second scenario is when the user wants to look up the client’s information. First, the
clinic practitioner asks the client whether they have previously visited the clinic and been
registered as a patient; if not, the user can register them as a new patient. If the client
has been registered before, the user can directly search for the client’s details by entering
their IC number or passport number. The system then displays the client’s details.
The last scenario is when the client wants to take the depression test. The clinic
practitioner first enters the client’s basic information such as first name, last name, IC or
passport number, and phone number. Once the client’s information is entered, the system
directs the user to the PHQ-9 questionnaire page, where the client is required to answer the 9
questions by themselves. After the questionnaire is completed, the clinic practitioner
proceeds to the next page, where they upload the client’s EEG file, and the system then
generates the depression test results. If the client is detected as depressed, the clinic
practitioner can use the locate nearby specialist function to search for nearby specialists
and recommend one to the client.
3.4 Data Flow Diagram
Figure 5. Data Flow Diagram
In order to easily understand the flow of data of the application, a data flow diagram is used to
depict the flow of information in the system as shown in Figure 5. The input and output of the
system can be easily visualized using the data flow diagram and necessary improvements can be
made if there are any areas that contain redundancies.
At the initial stage of the system, the clinic practitioner has to enter their account details
to log in. The data entered here are the Business ID and the Password, which are passed to the
administrative system for verification. Next, if the patient is not yet registered, the clinic
practitioner enters the patient’s personal details such as their name, age, and IC number, and
these data are sent to the database for storage. Next is the depression detection stage, where
the patient first answers the PHQ-9 questionnaire through the application interface. The
answers generate a score, which in turn produces the initial result that is stored in the
database. For the second depression test, the patient’s EEG signals are recorded with the help
of the clinic practitioner, and the resulting EEG recording is fed into the depression
detection model in the system. The system then generates a result showing whether the patient
is depressed or not, and this result is also stored in the database. This sums up the basic
data flow of the DeNotPress application.
Chapter 4: Design
4.1 Introduction
This chapter provides the explanation and prototype design of our proposed web application,
DeNotPress, based on the analysis done in the previous chapter. Our team consists of four
members, and each of us is responsible for different parts of the system. For this report, the
roles and tasks that will be discussed in detail are those of the application developer and
the data engineer.
4.2 Application Developer

The application developer is an individual who designs and creates the interactive
functionalities of a software application, in this case the DeNotPress application. They
follow the software requirements provided by the client and develop a functional and
interactive application based on those requirements. As the application developer of the
DeNotPress web application, the main goal is to develop the functionalities that users
interact with while using the application. The prototype design and methodologies that will be
used are discussed in detail in the sections below.
4.2.1 Prototype Design

The functionalities whose prototype designs are discussed in this report are the PHQ-9
questionnaire test and the import EEG recording data function. The prototype is designed using
Figma, a web-based design tool that allows designers to create the UI and UX designs of their
web or mobile applications.
4.2.1.1 PHQ-9 Questionnaire Test
The PHQ-9, also known as the Patient Health Questionnaire-9, is a commonly used method for
depression screening worldwide (Wang, Kroenke, Stump, & Monahan, 2020), and many healthcare
systems currently use it as their metric for detecting depression. It is chosen as the initial
response to collect from the patient in the DeNotPress application for the reasons stated
below:
• Severity-based method for classification of patients of different depression levels, which
include minimal to severe depression (Lamela, Soreira, Matos, & Morais, 2020).
• Ease of use, high accuracy and sensitivity (Levis, Benedetti, & Thombs, 2019).
• Easy to interpret since the depression levels are classified by scores and can be used as a
guideline for monitoring patient’s condition.
Figure 6. PHQ-9 Questionnaire Sample
Figure 6 shows a sample of the standard, internationally recognized PHQ-9 questionnaire. There
are a total of 9 questions to be answered and the score for each answer ranges from 0 to 3.
The total score is then calculated and classified into the corresponding severity level of
depression, as shown in the table below.
PHQ-9 score | Depression severity
0-4 | None to minimal
5-9 | Mild
10-14 | Moderate
15-19 | Moderately severe
20-27 | Severe
Table 5. PHQ-9 scores and Depression Severity Levels
According to Table 5, patients can be classified as having no or minimal depression (0-4),
mild (5-9), moderate (10-14), moderately severe (15-19), or severe (20-27) depression. One
advantage of this approach is that it is a relatively simple way of screening patients for
depression, but it cannot be the sole guideline for determining whether a patient is
depressed. Instead, it should serve as supporting evidence, and additional measures should be
taken to confirm the result.
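The score classification in Table 5 can be expressed as a short function. The sketch below is illustrative only: the function and variable names are hypothetical, and it is not the code of the DeNotPress application.

```python
# Upper bound of each score band and its label, following Table 5
SEVERITY_BANDS = [
    (4, "None to minimal"),
    (9, "Mild"),
    (14, "Moderate"),
    (19, "Moderately severe"),
    (27, "Severe"),
]


def phq9_severity(answers):
    """Return (total score, severity label) for nine answers scored 0 to 3."""
    if len(answers) != 9 or any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("PHQ-9 needs exactly nine answers, each scored 0 to 3")
    total = sum(answers)
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return total, label


# Example: a patient answering 1 to eight questions and 2 to one question
print(phq9_severity([1, 1, 1, 1, 1, 1, 1, 1, 2]))
```

Validating the answers before summing them keeps an incomplete questionnaire from being reported as a low score.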
Figure 7. Prototype Design of PHQ-9 in DeNotPress application
Figure 7 shows the prototype design of how the PHQ-9 questionnaire will be implemented in
the DeNotPress application. Each question is displayed on a separate page that can be
navigated with the Previous and Next buttons, and the patient selects an answer by clicking
the radio button beside it. As in the standard PHQ-9 questionnaire, all 9 questions are
included in the system and the same score classification method is used to classify the
different depression levels.
4.2.1.2 Import EEG Recording Data
One of the core functions of the DeNotPress application is the detection of depression using
EEG signals. To achieve that, the EEG recordings of the patient must be collected and fed
into the system.
Figure 8. Electrode Layout of EEG Cap for 128 Channels
Figure 8 shows a simplified view of the EEG cap that is used for EEG recording with 128
channels. The clinic practitioner places the EEG device or cap on the patient’s scalp, and
then the EEG recording session begins. The recorded EEG signals are then imported into the
system and fed through the depression detection model.
Figure 9. Prototype Design of EEG Recording Instructions
Figure 9 shows the prototype design of the page that guides the clinic practitioner through
the EEG recording session. Instructions are provided to ensure that the procedure is performed
according to the guidelines and to minimise errors. A button below the instructions prompts
the practitioner to confirm once the preparations are done, after which they can click the
Start button to initiate the EEG recording session. Once the recording session is done, the
EEG signal data is imported into the system’s depression detection model.
4.2.2 Methodology
To implement the above-mentioned functionalities of the DeNotPress application, a
combination of HTML, CSS, and JavaScript is required. These three web technologies are
commonly used by developers when building websites, mobile applications, and so on.
4.2.2.1 HTML
HTML, or Hypertext Markup Language, is a markup language that allows developers to create web
pages and applications. It is the skeleton of most web pages nowadays. HTML provides a set of
tags and elements that are used to create things like headings, images, footers and many
more. In short, HTML is used to create the foundational structure of a web page or
application, which can then be enhanced with CSS and JavaScript to make the web page
interactive.
4.2.2.2 CSS
CSS, known as Cascading Style Sheets, is a styling language that is used to enhance the appearance
of web pages or applications. It provides developers a pathway to design the layout and apply
different fonts, colours, positioning and so on to the elements and content of the web page while
separating it from the original structure of the web page. This allows easy management and ease
of modification of the appearance of the web page.
4.2.2.3 JavaScript
JavaScript is a well-known programming language in the web development ecosystem that is used
primarily to make web pages functional and interactive. It is especially common among
front-end developers who design the web interface and implement functionalities for the web
pages. For back-end developers, JavaScript is normally used for server-side scripting, and
also for building desktop applications with the help of frameworks like Electron.
4.2.2.4 Flask
Flask is a Python-based framework which is used for web or application development. It enables
developers with basic Python and front-end (HTML, CSS, JavaScript) knowledge to build a
functional application with simple, easy-to-understand code. It is also lightweight and
extensible, allowing developers to import additional libraries and dependencies to add special
functions to the application. This is especially useful for this project, which requires the
integration of machine learning.
4.3 Data Engineer

The data engineer is responsible for working with the raw data collected in the data
acquisition phase. They are required to research different datasets and pre-process the data
to convert it into a usable format to pass to the machine learning model. As previously
mentioned in the project objectives, before the machine learning model is integrated into the
DeNotPress application, it must be trained with a training dataset and tested to ensure that
its results are accurate and reliable. In this project, there are two main tasks for the data
engineer: finding a suitable training dataset to train the model, and devising data
pre-processing methods to process the data.
4.3.1 Training Dataset

This section reviews four datasets that are available online and compares them in the table
below.
Dataset | Stimuli (Stressor) | Stress Labelling | Total EEG Channels | Selected Channels | No. Participant | Frequency Rate (Hz) | Classes
Schizophrenia and depression dataset | Actigraph watch | Motor activity | - | - | 23 schizophrenia; 23 depressed; 32 control group | 32Hz | MADRS1/MADRS2
Resting state EEG of adolescents | EEG/Questionnaire | Resting-state or eyes-closed and -open | 60 | 60 | 88 adolescents | - | BDI_II/PHQ9
SEED/Emotional BCI Competition Database | Video stimulation | EEG signals | 62 | 62 | SEED: 15; Emotional BCI Competition Database: 23 | SEED: 200Hz; Emotional BCI Competition Database: 100Hz | -
MODMA | Emotional-neutral face pairs | EEG signals | 128 | 128 | 24 MDD; 29 healthy | 250Hz | -
Table 6. Summary of Datasets
The four datasets analysed are: Schizophrenia and depression dataset; Resting state EEG of
adolescents; SEED/Emotional BCI Competition Database; MODMA. The key points reviewed in
Table 6 are: Stimuli, Stress Labelling, Total EEG Channels, Selected Channels, No.
Participant, Frequency Rate (Hz), and Classes.
4.3.1.1 Schizophrenia and Depression Dataset

This dataset was collected for research on motor activity in people with schizophrenia and
major depression. The motor activity was recorded via an actigraph watch worn on the
participant’s wrist. The sampling rate was set to 32Hz and the class labels were determined
using the MADRS1 and MADRS2 scores.
4.3.1.2 Resting State EEG of Adolescents

This dataset is a collection of EEG data from 88 adolescents in a resting state, with both
eyes closed and eyes open. Among the 88 participants, the numbers for each depression
severity level are 30 minimal, 29 mild, and 29 moderate. 60 EEG channels were used and the
class labels were the BDI-II and PHQ-9 scores.
4.3.1.3 SEED/Emotional BCI Competition Database

This dataset is a combination of two datasets, SEED and the Emotional BCI Competition
Database. The SEED dataset contains EEG signals collected from 15 subjects under video
stimulation. The sampling rate was 200Hz with a bandpass filter applied from 0 to 75Hz. The
Emotional BCI Competition Database contains EEG signals from 23 subjects, also under video
stimulation, with a sampling rate of 100Hz.
4.3.1.4 MODMA
This dataset contains EEG signals recorded from 24 MDD and 29 healthy subjects with ages
ranging from 16 to 52 years old. The total EEG channels used were 128 and the sampling frequency
rate was set to 250Hz. The subjects were shown emotion-neutral face pairs as stimuli while
recording their EEG signals.
Based on the research done on the four datasets, the MODMA dataset was chosen as the training
dataset for this project for the following reasons:
• It records EEG signals from both normal subjects and subjects with depression, which
matches this project’s objective and approach.
• It records EEG signals through 128 channels, which provides more features to choose from
at the feature extraction stage. Moreover, it allows more combinations during feature
selection when selecting the most relevant set of features to detect depression.
• Some of the articles in the literature review also chose this dataset for their research,
which suggests that it is accepted by other researchers.
4.3.2 Data Pre-processing

Now that a dataset is chosen, the first step is data pre-processing. It is an important
process that converts the raw data into a usable format for feature extraction, feature
selection and eventually the machine learning model for classification. Based on the
literature review in Chapter 2, data pre-processing can be divided into two steps: removal of
noise and removal of artifacts.
4.3.2.1 Removal of Noise

Noise refers to any random variation or disturbance that is present in the EEG signals. It is
normally caused by interference from electronic devices, fluctuations of the power supply
during the recording session, etc. One way to remove noise from EEG signals is filtering.
According to Safayari & Bolhasani (2021) and Mohammadi & Moradi (2020), the most common
methods for noise removal include the high pass filter, low pass filter, notch filter, and
band-pass filter. These filters work by specifying a frequency range so that unwanted
frequencies are removed and the targeted signals are retained. If used correctly, they can
eliminate most of the noise in the EEG signals, which can improve the machine learning
model’s performance.
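The idea of keeping a target frequency range and rejecting a narrow band around the power line frequency can be sketched as follows. This example assumes NumPy is available and uses a simple FFT mask as a stand-in; it is not the Butterworth or notch filter design used in the reviewed papers, and the cut-off values and the function name are assumptions chosen for illustration.

```python
import numpy as np


def fft_bandpass_notch(signal, fs, low, high, notch=None, notch_width=1.0):
    """Zero all frequency bins outside [low, high] Hz and, optionally,
    all bins within notch_width Hz of the notch frequency."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    spectrum = np.fft.rfft(signal)
    keep = (freqs >= low) & (freqs <= high)
    if notch is not None:
        keep &= np.abs(freqs - notch) > notch_width
    return np.fft.irfft(spectrum * keep, n=signal.size)


# Synthetic recording: 10 Hz activity + 50 Hz power line noise + DC offset
fs = 250
t = np.arange(fs * 4) / fs
brain = np.sin(2 * np.pi * 10 * t)
noisy = brain + 0.5 * np.sin(2 * np.pi * 50 * t) + 2.0
clean = fft_bandpass_notch(noisy, fs, low=1.0, high=60.0, notch=50.0)
print(float(np.max(np.abs(clean - brain))))  # residual error after filtering
```

A hard frequency mask like this is easy to follow but can introduce ringing on real recordings, which is one reason practical pipelines use designed filters instead.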
4.3.2.2 Removal of Artifact

Artifacts are distortions that are present in EEG signals. Some forms of artifact include
spikes, glitches, dropouts, etc. In EEG signals, artifacts are mostly produced by eye
blinking, muscle movement, and even electromagnetic interference. Based on the literature
review, there are two common approaches for removing artifacts: Independent Component Analysis
(ICA) and the Wavelet Transform method. ICA separates a multivariate signal into independent
components, whereas the Wavelet Transform removes artifacts from each channel individually.
The combination of these two techniques can remove most of the artifacts in the EEG signals.
In addition, methods like the common average reference (CAR) can also be used to remove
artifacts or noise caused by electrical interference or muscle activity.
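The common average reference can be illustrated with a few lines of NumPy: at every time sample, the mean across all channels is subtracted from each channel, which removes any component shared by all electrodes. The sketch below assumes a channels-by-samples array layout and uses synthetic data; the function name is hypothetical.

```python
import numpy as np


def common_average_reference(data):
    """Re-reference EEG data of shape (n_channels, n_samples) to the channel mean."""
    data = np.asarray(data, dtype=float)
    return data - data.mean(axis=0, keepdims=True)


# Synthetic example: four channels that all pick up the same interference
rng = np.random.default_rng(0)
shared_interference = rng.normal(size=500)
channels = rng.normal(size=(4, 500)) + shared_interference
referenced = common_average_reference(channels)
print(referenced.shape)
```

After re-referencing, the channels sum to zero at every sample, so the shared interference no longer appears in any single channel.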
4.3.3 Methodology
Figure 10. MNE
To perform data pre-processing, a library that contains the necessary packages and tools must
be selected. MNE-Python (MNE), a package for analysing magnetoencephalography (MEG) and
electroencephalography (EEG) data, is chosen as the library that provides the data
pre-processing tools to analyse and process the EEG data. The reasons why MNE was chosen are
stated below:
• It is a Python-based software package that is commonly used for analysing and visualising
EEG signals.
• It is open-source hence it can be easily accessible.
• It supports multiple formats of EEG files which is crucial because EEG data can be
collected using different formats.
• Some notable tools that are provided include filters, artifact removal, and so on, which is
basically everything needed for this project.
Chapter 5: Project Implementation
5.1 Introduction
In this chapter, the development process and implementation of the system are discussed in
detail. This chapter covers the techniques and methods used to develop the code for
preprocessing the raw EEG data and the code for the whole application, which combines
front-end and back-end code to complete the web application. In addition, the challenges
encountered during the development process are discussed together with the solutions used to
resolve them. The aim of this chapter is to ensure that the implementation meets the system
scope and requirements that were outlined previously in the proposal.
5.2 Back-end System Architecture
Figure 11. Back-end System Architecture Diagram
Figure 11 shows the back-end system architecture of the DeNotPress web application.
It includes the database, machine learning, development tools (framework), and the security
aspects. The author is responsible for the preprocessing stage, the integration of the machine
learning model into the application, and the development of the code that builds the entire
application using the framework. Hence, this chapter only discusses the above-mentioned parts.
The other parts of the back-end development are discussed by the other back-end developers of
the project.
5.2.1 Preprocessing stage
In the preprocessing stage, one of the first tasks is acquiring the dataset to be used.
As mentioned in the previous chapter, the MODMA dataset is chosen. To obtain authorization and
access to the dataset, the author is required to submit an application form to the
administrators of the MODMA website. After receiving the dataset, the author starts to analyse
the raw data using the MNE library in Jupyter Notebook. Issues were encountered during this
analysis, and the solutions are discussed in detail later.
Preprocessing techniques such as bandpass filtering, EEG re-referencing, and epoch creation are
implemented to reduce noise and bias in the raw EEG data and to separate it into
segments for the modelling process. The detailed implementation is
discussed in Section 5.3.1.
5.2.2 Implementation of Machine Learning Model

Once the other back-end developer has completed the comparison between three algorithms, namely
Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Convolutional Neural
Network (CNN), the best performing algorithm is chosen for constructing the classification model.
While constructing the machine learning model, hyperparameter tuning is used to adjust the
parameters of the classifier to achieve higher accuracy and better performance.
Next, the model is exported and integrated into the application.
5.2.3 Development Tools (Framework)

The development tool (framework) used for developing the DeNotPress application is Flask. The
majority of the back-end code, including the machine learning model, the database connection,
and the security features, can be written entirely in Python. Flask also provides
methods for rendering HTML templates, together with CSS and JavaScript, for designing the UI and
UX of the application.
5.3 Implementation Details
In this section, the implementation process is documented in technical detail to
provide some insight into the thought process and the problem-solving steps taken to handle the
issues encountered.
5.3.1 Preprocessing Raw Data

The preprocessing stage is crucial for converting raw EEG data into clean, usable data for
analysis. Removing noise and bias, filtering, and artifact removal are
some of the common methods used to process EEG data. The technical details of the
preprocessing process are discussed below.
5.3.1.1 MODMA Dataset

In the previous chapters, the author mentioned that the 128-channel EEG dataset from
MODMA was chosen as the training dataset for this project. Eventually, the author decided to
retrieve the 3-channel dataset from MODMA instead, since preprocessing the 128-channel dataset
was deemed too difficult for the purpose of this project. The 3-channel dataset includes 55
participants: 26 depressed outpatients (15 male and 11 female) aged between 16 and 56 years,
and 29 healthy controls (20 male and 9 female) aged between 18
and 55 years (Cai et al., 2022).
Figure 12
The 3 electrode EEG recording device
The EEG data was recorded using a 24-bit A/D converter with a sampling frequency of 250 Hz.
The equipment is shown in Figure 12 above. The experiment was conducted in a soundproof room, and
the participants kept their eyes closed during the 90-second EEG recording session.
Figure 13. Snippet of EEG data file
Figure 13 shows a snippet of one of the EEG data files in txt format. It contains the values
recorded in the experiment from 3 electrodes located on the prefrontal lobe (Fp1, Fpz, Fp2). To
find the exact size of the EEG data file, the author used Jupyter Notebook, imported
the numpy library to read the txt file, and inspected the shape of the resulting array.
Figure 14. Loading the txt file and finding the data size
Based on Figure 14, the author used the np.loadtxt() function to load one of the txt files. The
shape of the loaded array shows that the data file contains 301740 rows and 3 columns, that is,
one column per electrode. The large number of rows indicates that the file holds a long stretch
of recorded EEG data.
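As a quick sanity check on the file size, the recording length implied by the row count can be worked out from the sampling frequency. The sketch below assumes that each row is one sample and that the 250 Hz rate quoted earlier applies to this file; the variable names are illustrative.

```python
# Row count reported for the loaded file and the stated sampling rate.
n_samples = 301740
sfreq_hz = 250

# Duration = number of samples / samples per second.
duration_s = n_samples / sfreq_hz
duration_min = duration_s / 60

print(f"{duration_s:.2f} s (~{duration_min:.1f} min)")
```

Under these assumptions the file covers roughly 20 minutes, which is noticeably longer than a single 90-second session, so it may be worth checking how the recording is laid out in the file.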
Figure 15. Code snippet for creating EEG object
To convert the raw EEG data from the txt file into an MNE raw EEG object, the
author first specifies the names of the recorded channels (Fp1, Fpz, Fp2). An info object is
created to store the information about the EEG data, which includes the channel names, channel
types, and the sampling frequency of 250 Hz. Next, the raw EEG object is created by passing the
data and the info object into the function mne.io.RawArray(). The author also used
set_eeg_reference() to re-reference the EEG channels and remove unwanted common noise, and
applied a bandpass filter with a low cutoff frequency of 0.5 Hz and a high cutoff frequency of
45 Hz (MNE Developers, 2023). Lastly, the plot() function is used to plot the EEG data as a
continuous graph for visualization.
Figure 16. Output code snippet of creating raw object and plot
Figure 16 shows the output generated when creating the raw EEG object and plotting it
for visualization. The plot does not show any recognisable EEG trace.
This is possibly due to an error while loading the EEG data file as a txt file, or the values
contained in the file may not fall within the range expected for EEG signals.
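One thing that may be worth ruling out in this situation is the layout and scale of the array. MNE's RawArray expects data shaped (n_channels, n_samples) with EEG values in volts, whereas a text file with one row per sample loads as (n_samples, n_channels). The sketch below uses synthetic data and assumes, purely for illustration, that the file values are in microvolts; it is not the code used in this project, and the helper name to_mne_layout is hypothetical.

```python
import numpy as np

# Synthetic stand-in for a loaded text file: one row per sample, 3 columns.
rng = np.random.default_rng(0)
n_samples, n_channels = 1000, 3
data_uv = rng.normal(loc=0.0, scale=20.0, size=(n_samples, n_channels))

def to_mne_layout(samples_by_channels, unit_scale=1e-6):
    """Return an array shaped (n_channels, n_samples), scaled to volts.

    unit_scale=1e-6 assumes the input is in microvolts (an assumption).
    """
    return samples_by_channels.T * unit_scale

data_v = to_mne_layout(data_uv)
print(data_v.shape)  # (3, 1000)
```

An array prepared this way could then be passed to mne.io.RawArray() together with the info object.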
Figure 17. Loading the EEG file with pandas read_csv method
In an attempt to solve the previous issue, a different method was used to load the EEG txt
file, namely the pandas read_csv() function. The author set the separator (delimiter)
to a space, since the values in the txt file are separated by spaces, and set the header to
None, since the data file does not contain any column names. The data is then
converted to a numpy array and displayed, as shown in Figure 17, to check that the array is valid.
Figure 18. Code snippet for creating raw EEG object and plotting
A similar method is used to create the raw EEG object. After defining the sampling frequency
(250 Hz), channel names (Fp1, Fpz, Fp2), and channel type (eeg), the data is passed to
the mne.io.RawArray() function to create the raw EEG object. This time, when plotting the EEG
data, the author specifically selected the Fp1 channel and a duration of 100 s to check whether
the raw EEG data is properly processed and can be visualized as a graph.
Figure 19. Output code snippet of creating raw object and plot
According to Figure 19, there are still problems with reading the EEG data, because the
generated plot still does not show a recognisable EEG trace. After multiple further attempts,
the issue remained unsolved. The author therefore decided to search for other datasets in file
formats that the MNE library can read and process directly, instead of loading the data with
pandas or numpy first and then using MNE to create the raw EEG object.
5.3.1.2 Dataset from Wajid Mumtaz
The second dataset retrieved can be found through this link:
https://figshare.com/articles/EEG_Data_New/4244171
This dataset was uploaded by Mumtaz et al. (2017) and was collected through an experiment
approved by and carried out at the Hospital Universiti Sains Malaysia (HUSM), Kelantan, Malaysia.
The dataset consists of EEG data samples collected from 34 major depressive disorder (MDD)
outpatients (17 males and 17 females) with a mean age of 40.3±12.9, and from 30 healthy
controls (21 males and 9 females) with a mean age of
38.3±15.6.
Figure 20
The EEG cap with 19 electro-gel sensors
Note. The EEG cap with sensors placed according to the International 10-20 electrode placement
standard, by Mumtaz et al., 2017, PLOS ONE.
Figure 20 above shows the sensor placement of the EEG cap that was used in this experiment
to record the EEG data of the participants. The cap, with 19 electro-gel sensors, follows the
International 10-20 system. Three types of EEG data were collected from
each participant: eyes closed (5 minutes), eyes open (5 minutes), and task (10 minutes), which
involves a 3-stimulus oddball task.
Figure 21. Code snippet for reading EEG data
After retrieving the dataset from Mumtaz et al. (2017), the author follows the same procedure as
before to load the EEG data and read it using the MNE library. Based on Figure 21, the
code uses the mne.io.read_raw_edf() function to read the EEG data, which is in edf format.
Figure 22. Output code snippet of successful read
According to Figure 22, the output generated after running the code above shows that the edf
file was read successfully and a raw EEG object was created. To further confirm that the
raw EEG object was created properly, the author used the same plot() function to plot the EEG
data as a graph.
Figure 23. Graph generated
Figure 23 shows the plot of the raw EEG object that was read earlier. This time
the plot resembles typical EEG data, with the channel names shown on the left and the time axis
along the bottom. This confirms that the dataset, which stores EEG data in edf
format, can be read successfully using the MNE library. The plot also shows that the EEG
signals contain some noise components, so the author moves on to the preprocessing
stage to clean and convert the raw data into a usable format for feature selection and machine
learning model development.
Figure 24. Preprocessing EEG data
Next, the author applies preprocessing techniques to the EEG data. According to
Figure 24, two methods are used to preprocess the EEG data: the first is set_eeg_reference() and
the second is filter(), which is used here as a bandpass filter.
The function set_eeg_reference() specifies the reference against which the other electrodes are
measured, which serves as a baseline for the recording (MNE
Developers, 2023). When the type of reference is not specified, the default reference
method is used, which is the Common Average Reference (CAR). CAR re-references the signal from
each electrode by subtracting the average across all electrodes. This reduces
noise that is common to all channels, such as electrical interference and muscle activity, and
enhances the spatial distribution of the EEG data (Labs, 2019).
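Conceptually, the common average reference can be sketched with numpy alone. The snippet below is an illustration on synthetic data, not MNE's implementation; the function name common_average_reference is hypothetical.

```python
import numpy as np

def common_average_reference(data):
    """Subtract the across-channel mean from every channel.

    data: array shaped (n_channels, n_samples).
    """
    return data - data.mean(axis=0, keepdims=True)

# Synthetic example: 4 channels that all share the same interference term.
rng = np.random.default_rng(42)
common_noise = rng.normal(size=500)
signals = rng.normal(size=(4, 500)) + common_noise

referenced = common_average_reference(signals)
# After re-referencing, the channel average is (numerically) zero at each sample.
print(np.allclose(referenced.mean(axis=0), 0.0))
```

Because the shared interference term appears equally in every channel, it is contained in the average and is removed by the subtraction.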
Next, the bandpass filter passes a specific frequency range and attenuates the frequencies
outside it, which isolates the frequency components of interest. This is a common technique for
removing unwanted noise and improving the quality of EEG data. Based on Figure
24, the bandpass filter is set with a low cutoff frequency of 0.5 Hz and a high cutoff frequency
of 45 Hz.
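The effect of a 0.5 to 45 Hz passband can be illustrated with a simple FFT mask in numpy. This is a deliberately crude brick-wall sketch on a synthetic signal and not the filter design that MNE's filter() applies; the function name fft_bandpass and the test frequencies are chosen for illustration.

```python
import numpy as np

def fft_bandpass(signal, sfreq, l_freq, h_freq):
    """Zero every frequency component outside [l_freq, h_freq]."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    keep = (freqs >= l_freq) & (freqs <= h_freq)
    return np.fft.irfft(spectrum * keep, n=signal.size)

sfreq = 250.0
t = np.arange(1000) / sfreq             # 4 seconds of samples
inside = np.sin(2 * np.pi * 10 * t)     # 10 Hz component, inside the band
outside = np.sin(2 * np.pi * 60 * t)    # 60 Hz component, outside the band

filtered = fft_bandpass(inside + outside, sfreq, l_freq=0.5, h_freq=45.0)
# The 60 Hz component is removed and the 10 Hz component is left intact.
print(np.allclose(filtered, inside, atol=1e-8))
```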
Figure 25. Graph generated after preprocessing
After applying the preprocessing methods, the author plotted the same EEG data again
to visualize the difference before and after preprocessing. Figure 25 shows that the
EEG signals are cleaner and that some of the noise visible in the previous graph has been
removed. The author then applies these reading and preprocessing techniques
to the whole dataset.
Figure 26. Importing dataset files and grouping
First, the author specifies the file path to the folder containing the EEG files. Since
the EEG files are grouped into healthy and depressed samples, the code shown in Figure 26 above
uses the naming convention of the EEG files to separate the two groups of samples
into healthy and patient file paths.
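Grouping by file name can be sketched as follows. The file names and prefixes below are assumptions made for illustration, since the exact naming convention is not spelled out here, and the helper split_by_prefix is hypothetical.

```python
from pathlib import PurePath

# Hypothetical file names; the real naming convention may differ.
all_files = [
    "data/H S1 EC.edf",
    "data/MDD S1 EC.edf",
    "data/H S2 EO.edf",
    "data/MDD S2 EC.edf",
]

def split_by_prefix(paths, healthy_prefix="H ", patient_prefix="MDD "):
    """Split file paths into (healthy, patient) lists by file-name prefix."""
    healthy = [p for p in paths if PurePath(p).name.startswith(healthy_prefix)]
    patient = [p for p in paths if PurePath(p).name.startswith(patient_prefix)]
    return healthy, patient

healthy_file_paths, patient_file_paths = split_by_prefix(all_files)
print(len(healthy_file_paths), len(patient_file_paths))
```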
Figure 27. Defining function for reading and preprocess EEG data
Next, the author defines a function called read_data that reads the EEG files and preprocesses
them using the reading and preprocessing techniques discussed earlier. While
implementing this function, the author found that the EEG files in the dataset were not
consistent in their number of EEG channels. Some of the EEG files had 20 channels and some had 22,
so the author added code that drops the two extra channels (23A-23R, 24A-24R)
so that all EEG files have the same 20 channels. The author also added
the mne.make_fixed_length_epochs() function to read_data, with a
duration of 5 seconds and an overlap of 2 seconds. Creating epochs is a common way to
segment EEG data, because EEG data is continuous and
recorded in long sessions, which makes each recording large. Segmenting the recording into
short, fixed-length epochs makes the data more manageable to process in the later stages.
Lastly, the read_data function returns an array for each EEG file.
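The idea of fixed-length epochs with overlap can be sketched in numpy. This illustrates the windowing only, on synthetic data; MNE's own function may treat details such as the final partial window differently, and the helper name fixed_length_epochs is hypothetical.

```python
import numpy as np

def fixed_length_epochs(data, sfreq, duration, overlap):
    """Cut (n_channels, n_samples) data into overlapping fixed-length windows.

    Returns an array shaped (n_epochs, n_channels, samples_per_epoch).
    """
    win = int(round(duration * sfreq))
    step = int(round((duration - overlap) * sfreq))
    starts = range(0, data.shape[1] - win + 1, step)
    return np.stack([data[:, s:s + win] for s in starts])

sfreq = 250
data = np.zeros((20, 60 * sfreq))   # 20 channels, 60 s of synthetic data
epochs = fixed_length_epochs(data, sfreq, duration=5, overlap=2)
print(epochs.shape)  # (19, 20, 1250)
```

With a 5-second window and a 2-second overlap, a new epoch starts every 3 seconds, so 60 seconds of data yield 19 complete epochs in this sketch.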
Figure 28. Calling the read_data function
The author used the read_data function to read and process the EEG files in the healthy and
patient file paths and saved the resulting arrays in separate array objects,
control_epochs_array and patient_epochs_arrray. Now that the EEG data has been processed and
cleaned, the author hands the workflow over to the other back-end developer, who performs
feature selection and constructs the machine learning model.
5.3.2 Extracting the machine learning model

After the other back-end developer completed the machine learning comparison, it was found
that Support Vector Machine (SVM) had the best performance compared to K-Nearest Neighbour
(KNN) and Convolutional Neural Network (CNN). Before the author can integrate the
SVM classifier into the DeNotPress application, the classifier must be saved to a sav
file so that it can be loaded by the system.
Figure 29. Code snippet for extracting model
To save the SVM classifier model, a module called pickle is used. Pickle uses
binary protocols for object serialization and deserialization (Python, 2023). First, the author
specifies the file name under which to save the model (PredictionModel.sav), then uses the dump()
function from pickle to write the pickled representation of the model object into the file. As a
result, the PredictionModel.sav file is saved in the same folder as the Python code.
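The save-and-load round trip with pickle can be sketched as follows. A plain dictionary with made-up values stands in for the trained classifier, and the file is written to a temporary folder instead of the project folder; both are assumptions made to keep the example self-contained.

```python
import os
import pickle
import tempfile

# Stand-in for a trained classifier (hypothetical values); any picklable
# Python object is handled the same way.
model = {"kernel": "rbf", "C": 1.0}

filename = os.path.join(tempfile.mkdtemp(), "PredictionModel.sav")

# Serialize the object to disk ...
with open(filename, "wb") as f:
    pickle.dump(model, f)

# ... and read it back, as the application would do before predicting.
with open(filename, "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model == model)
```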
5.3.3 Flask implementation

The framework chosen to develop the DeNotPress web application is Flask. Flask is a
popular Python-based web framework for developing web applications (Makai, 2022). Some of
the advantages of using Flask for developing the DeNotPress application are stated below:
• Uses Python language for code development: Python is the main language used for
developing Flask applications. This is convenient for developing DeNotPress
because one of the main features of the application is the machine learning model used for
detecting depression. Using Python makes integrating the model much easier
than with frameworks in other languages.
• Microframework: Flask is considered a microframework, which means that it is suitable
for developing smaller projects initially while still allowing them to scale up in the future
(Deery, 2022). It only includes the simple and essential features for developing an
application, which makes it very beginner-friendly.
• Extensibility: Flask is extensible because it lets the developer import
external libraries or modules. For this project, libraries such as MNE, numpy, scipy, and
scikit-learn can be imported easily to integrate machine learning into
the application.
• Easy development using local computers: Flask provides a built-in
development server that runs on the local host. This feature is useful
for web application development because it lets developers test the application
without subscribing to an online web server. Moreover, the debug mode
reloads the application automatically when changes are made to the
DeNotPress code, making the development process more efficient.
5.3.3.1 Importing libraries and modules
Figure 30. Code snippet for importing libraries and modules
Figure 30 shows the libraries and modules that must be imported
when developing the DeNotPress application. Modules such as mne, numpy, sklearn, and
scipy are needed to integrate the SVM machine learning model into
the application. The PyMongo module is used for connecting the DeNotPress application to the
MongoDB database. The rest of the libraries and modules support the remaining functions of the
application.
5.3.3.2 Connecting to database
Figure 31. Code snippet for connecting to database
The code in Figure 31 connects the Flask application to the MongoDB database.
The first line establishes a connection to the
author’s MongoDB database platform by passing the connection string copied from the
database. The second line accesses the database ‘patient_info’. The
third and fourth lines access the two collections that were created by the
author beforehand: ‘Patient’, which stores patient details and is assigned to collection1,
and ‘Booking’, which stores appointment bookings and is assigned to collection2.
5.3.3.3 Developing login page
Figure 32. Code snippet for login page
Figure 32 shows the code that implements the login page of DeNotPress.
The route (‘/’) is the first page displayed every time the application is run.
Using the render_template() function in Flask, the author renders the Login.html template
to display the login page. The route (‘/login’) defines the login logic that verifies the
user’s username and password. Defining the method as ‘POST’
allows the system to retrieve the user’s input from the HTML form using the
request.form.get() function from Flask. With the condition shown in the code, if the user
enters admin as the username together with the correct password, the Flask application directs
the user to the home page using the redirect() function with the destination (‘/home’). If the
user enters the wrong details, the system returns the message “Invalid username or password”.
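The login logic described above can be sketched in a few lines of Flask. The credentials, form field names, and the placeholder home route are assumptions made for illustration; the sketch returns plain strings instead of rendering the project's templates and is not the application's actual code.

```python
from flask import Flask, redirect, request

app = Flask(__name__)

# Hypothetical credentials for illustration only.
VALID_USERNAME = "admin"
VALID_PASSWORD = "example-password"

@app.route("/login", methods=["POST"])
def login():
    # Read the submitted form fields (field names are assumed).
    username = request.form.get("username")
    password = request.form.get("password")
    if username == VALID_USERNAME and password == VALID_PASSWORD:
        return redirect("/home")
    return "Invalid username or password"

@app.route("/home")
def home():
    return "Home page placeholder"

# Exercise the route with Flask's test client, without starting a server.
client = app.test_client()
ok = client.post("/login", data={"username": "admin", "password": "example-password"})
bad = client.post("/login", data={"username": "admin", "password": "wrong"})
print(ok.status_code, bad.get_data(as_text=True))
```

A successful login produces an HTTP redirect to the home route, while a failed login returns the error message directly.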
Figure 33. Login page
5.3.3.4 Developing home page
Figure 34. Code snippet for home page
The code in Figure 34 shows the route (‘/home’) and the code that implements the
home page of DeNotPress. The home page includes a table that displays the
appointment bookings made by the user. To retrieve the bookings from the database and display
them on the home page, the author created an object called ‘bookings’, which stores the list of
bookings retrieved from the database using collection2.find(). The for loop assigns index
numbers to the bookings, starting from 1. Lastly, the author used the render_template() function
to render the Home.html template and passed the bookings object to the html file, where the
bookings are displayed in a table.
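The 1-based numbering of bookings can be sketched with enumerate(). The list below is a stand-in for the documents returned by the database query, with made-up values, and the "index" key is a hypothetical name.

```python
# Stand-in for documents returned by a database query (hypothetical values).
bookings = [
    {"customer_name": "Patient A", "date": "2023-07-01", "time_slot": "10:00"},
    {"customer_name": "Patient B", "date": "2023-07-02", "time_slot": "14:00"},
]

# Attach a row number, starting from 1, to each booking for display in a table.
for index, booking in enumerate(bookings, start=1):
    booking["index"] = index

print([b["index"] for b in bookings])  # [1, 2]
```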
Figure 35. Home page
5.3.3.5 Developing booking page
Figure 36. Code snippet for booking page
Figure 36 shows the code that implements the booking page by defining
the route (‘/booking’). Since the booking page needs to retrieve user input, the ‘POST’ method is
specified. The author created an object called booking_data, a dictionary that stores the user
inputs (customer_name, date, time_slot). Next, the booking_data dictionary
is stored in the database collection ‘Booking’ using collection2.insert_one(). At the end of the
code, render_template() is used to render Booking.html.
Figure 37. Booking page
5.3.3.6 Developing register client page
Figure 38. Code snippet for register client page
The code in Figure 38 for the register client page is nearly identical to that of the booking
page. The only difference is the set of variables retrieved from the html form, which includes
the client’s first name, last name, nationality, NRIC number, and so on. All of this information
is stored in the patient_data dictionary and inserted into the database collection ‘Patient’
using collection1.insert_one(). Lastly, render_template() is used to render Register.html.
Figure 39. Register client page
5.3.3.7 Developing search client page
Figure 40. Code snippet for search client page
The code shown in Figure 40 defines the route (‘/search’) for the search client page. It
also uses the ‘POST’ method, because it receives the user’s input and uses it to
search for client details in the database. The identifier variable holds the
user’s input (NRIC or passport number), which is then used to look up the corresponding client in
the database and read the data stored for that client using collection1.find(). The client’s
details are then displayed on the page.
Figure 41. Search client page
5.3.3.8 Developing the locate nearby specialist page
Figure 42. Code snippet for locate nearby specialist page
The code in Figure 42 renders the locate nearby specialist page by specifying the route
(‘/specialist’) and using render_template() to render the Nearby_specialist.html file. Since
the logic for displaying the specialists is handled entirely in the HTML file, no additional
code is required on the Flask side.
Figure 43. Locate nearby specialist page
5.3.3.9 Developing the depression test function

The depression test is the core feature of the DeNotPress application. It uses the PHQ-9
questionnaire and EEG data to generate the depression detection results. It is also
integrated with the SVM machine learning pipeline, which processes the EEG data and classifies
the client as healthy or depressed. The depression test function consists of several parts,
which are discussed below.
Figure 44. Code snippet for enter patient details page
The first page displayed for the depression test is the enter patient details page, which is
used for entering the patient’s details such as first name, last name, IC or passport number,
and phone number. This information determines which patient is performing the depression test.
The code snippet in Figure 44 shows the implementation of the route (‘/searchPatient’) with the
method ‘POST’ to receive the user’s input. The user’s inputs are stored in the variable patient
and passed to the database to search for the corresponding patient using the
collection1.find_one() function. Once the corresponding patient is found, the patient’s data is
stored using the session feature and the json_util.dumps() function. This keeps the patient’s
data in the session, inside a temporary directory on the server, so that it can be used in later
stages (PythonBasics, 2022). As a result, at the end of the depression test the system knows
which patient’s record to update with the results, and it can display the results together with
the patient’s details. Lastly, this route
redirects to the next page, which is the questionnaire page.
Figure 45. Enter patient details page
Figure 46. Code snippet of questionnaire page
Figure 46 shows the code that renders the PHQ-9 questionnaire page in the route
(‘/questionnaire’). The PHQ-9 score is calculated in the HTML file, while the code here stores
the patient’s PHQ-9 score in the database. First, the patient data saved in the session is
retrieved into the variable patient. Next, the score variable stores the patient’s
score as an integer, using the request.form.get() function to retrieve the score from the HTML
form. Lastly, the score is written to the patient’s record in the database using the
collection1.update_one() function. The system then redirects to the next page, where the clinic
practitioner uploads the patient’s EEG file.
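In this project the score is computed in the HTML page, but the calculation itself is simple enough to sketch in Python. PHQ-9 has nine items, each scored from 0 to 3, giving a total between 0 and 27; the severity bands below are the standard PHQ-9 cut-offs, and whether DeNotPress displays them is not stated here. The function names are hypothetical.

```python
def phq9_total(answers):
    """Sum nine PHQ-9 item responses, each scored 0 to 3."""
    if len(answers) != 9 or any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("PHQ-9 expects nine answers, each between 0 and 3")
    return sum(answers)

def phq9_severity(total):
    """Map a total score (0-27) to the standard PHQ-9 severity band."""
    if total <= 4:
        return "minimal"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    if total <= 19:
        return "moderately severe"
    return "severe"

score = phq9_total([1, 2, 0, 3, 1, 0, 2, 1, 0])
print(score, phq9_severity(score))  # 10 moderate
```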
Figure 47. Questionnaire page
Figure 48. Code snippet for upload EEG file page
The code snippet in Figure 48 renders the upload EEG file page.
The upload function is defined in the HTML file (EEG_upload.html), and the patient data stored
in the session is passed into the route (‘/upload’) using the json_util.loads() and
session.get() functions.
Figure 49. Upload EEG file page
5.3.3.10 Implementation of SVM model
Since the SVM model was extracted and saved into a sav file in the previous section,
the author moves on to integrating the machine learning model into the Flask
application by building the depression detection pipeline.
Figure 50. Code snippet of loading the SVM model
Based on Figure 50, the SVM model is loaded into the Flask application using the load() function
from pickle. This deserializes the PredictionModel.sav file and assigns the result to the model
object, which is called later in the depression detection pipeline.
Figure 51. Code snippet of reading data function
According to Figure 51, the function read_data accepts a raw EEG data file as input
and processes it to create the raw EEG object whose data is later passed to the SVM model. The
same preprocessing techniques described in Section 5.3.1 are applied here. The
function returns an array of the epochs created from the raw EEG object.
Figure 52. Code snippet of feature analysing function
The code shown in Figure 52 extracts various statistical features from the EEG
data, using functions that calculate the mean, standard deviation, variance, and so on.
Lastly, the concatenate_features function combines the results of all the above
functions into a single array that describes different aspects of the EEG data through the
extracted statistical features.
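A minimal sketch of this kind of feature extraction is shown below. It covers only three of the statistics mentioned, runs on synthetic epochs, and uses hypothetical helper names (mean, std, var); the project's own function may compute more features.

```python
import numpy as np

def mean(data):
    return np.mean(data, axis=-1)

def std(data):
    return np.std(data, axis=-1)

def var(data):
    return np.var(data, axis=-1)

def concatenate_features(data):
    """Stack per-channel statistics into one feature vector per epoch.

    data: array shaped (n_epochs, n_channels, n_samples).
    """
    return np.concatenate((mean(data), std(data), var(data)), axis=-1)

# Synthetic epochs: 19 epochs, 20 channels, 1250 samples each.
epochs = np.random.default_rng(1).normal(size=(19, 20, 1250))
features = concatenate_features(epochs)
print(features.shape)  # (19, 60)
```

Each epoch is reduced to one row with three values per channel, which is the two-dimensional layout a scikit-learn classifier expects.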
Figure 53. Code snippet of depression detection function
The code in Figure 53 shows the main implementation of the SVM machine learning
model, in a function called predict_depression. This function accepts an
EEG data file in edf format as input and passes it to the read_data function, which reads and
processes the raw data file. Next, a for loop looks for the StandardScaler object inside
the machine learning pipeline; if the object is found, it is assigned to the scaler variable.
The concatenate_features function is then called to extract the relevant features and store
them in the features array. This features array is passed to the SVM model through the
predict() function, which generates a prediction from the defined class_labels ‘Healthy’
and ‘Depressed’. Lastly, the predict_depression function returns the result of the machine
learning prediction pipeline so that it can be stored in the database later on.
Figure 54. Code snippet for predict pipeline
The code snippet in Figure 54 shows the prediction pipeline, routed as (‘/predict’), which runs
after the clinic practitioner has uploaded the EEG file into the system. First, the
system retrieves the patient’s data from the session and assigns it to the current_patient
variable using json_util.loads(). Next, the system retrieves the uploaded EEG file using
request.files and passes it to the predict_depression() function. The result is stored in the
variable result and then saved to the corresponding patient’s record in the database using
collection1.update_one(). Lastly, the system redirects the user to the results page (‘/result’).
Figure 55. Code snippet for results page
Figure 55 shows the code snippet for the results page with the route (‘/result’). First,
the system retrieves the patient’s data using the session.get() function and loads it into the
current_patient variable using json_util.loads(). It then searches for the corresponding
patient’s details in the database using collection1.find_one(), with the patient’s IC or
passport number as the identifier. Lastly, once the patient has been found, the patient’s
details are passed to Result.html to be displayed on the results page.
Figure 56. Results page
5.4 Challenges
In this section, the author discusses the challenges faced during the implementation stage of
the project and the approaches attempted to overcome them.
5.4.1 Processing training dataset

First, preprocessing the training dataset that was chosen initially, the
MODMA dataset, posed multiple issues that the author was not able to overcome. As discussed
in the implementation section, the MODMA dataset EEG files are in txt format, and the MNE
library does not provide a function that can directly read and load EEG files in txt format.
Hence, the author attempted to read the EEG file as an array using numpy and pandas and then
create a raw EEG object using MNE. Although this process did not produce any error in the code
output, plotting the EEG data did not produce a recognisable visual representation of
the EEG signals. After multiple attempts, the author decided to find another dataset that
could be read directly using MNE, since the MODMA dataset was proving difficult to work with in
the early stages. Eventually, the author settled on the dataset by Wajid Mumtaz, which contains
EEG files in edf format, a format for storing time-series data. Since the MNE library provides
a function that can read EEG files in edf format directly, plotting the EEG data this time
showed actual graph representations of the EEG signals.
5.4.2 Unable to access EEG recording devices

Second, during the proposal and brainstorming stages of this project, it was originally planned
that the system would connect to an external EEG recording device to record real-time EEG data
from the patient and pass that EEG data directly into the DeNotPress application. Unfortunately,
the author was unable to gain access to such devices, since operating them and conducting such
experiments requires a trained professional. Hence, the team later changed the scope to
providing a file upload feature that lets users upload a patient’s EEG data file into the
system. This means that the patient needs to complete an EEG recording session elsewhere, for
example at a clinic, beforehand.
5.4.3 Lack knowledge of EEG
Moreover, due to a lack of knowledge and experience in dealing with EEG data, developing
the code needed to process the EEG data properly was time-consuming and often
inefficient, because the techniques tried were not always effective. The author had to spend
time researching communities such as Stack Overflow and the MNE community to look for solutions
on how to process EEG data. The finalized code may not be the best solution, but it
converts the raw EEG data into a usable format that can be passed into the machine learning
model later on.
5.4.4 Lack knowledge in application development

Lastly, the author had no prior knowledge or experience in application development at the
beginning of this project, which made development more difficult and time-consuming because the
author had to start from scratch by doing extensive research and learning from online resources.
Eventually, the author came across Flask and found it suitable for developing the DeNotPress
application since it is Python-based and extensible, meaning the author could write the code in
Python and import Python libraries and modules as in previous data science assignments and
projects. This made the development process faster and more effective, since the author found
Python easier and more straightforward to handle than other languages.
Chapter 6: System Testing
In this chapter, the author discusses the system test cases carried out during the development of
the DeNotPress application. This is a crucial stage of development because it allows developers
to assess the application’s performance and whether it is functional. This chapter contains only
the test cases performed by the author himself.
6.1 Login Page

Test Case ID : TCID_01
Test Title : Login interface and function
Test Designed by : Ooi Jing Kai
Tested by : Tan Yi Xuen
Description : User can log in to the DeNotPress application with the given credentials.

Step | Test Steps | Expected Results | Actual Results | Status
1 | User launches the application. | The application pops up in a separate window with the login page displayed. | The application pops up in a separate window and the login page is displayed successfully. | Pass
2 | User enters the username and password. | The username entered is displayed and the password entered is hashed. | The username entered by the user is displayed and the password is hashed successfully. | Pass
3 | User clicks on the “Generate” button to generate the random sequence for human verification. | The system generates the random sequence. | The system successfully generates the random sequence and displays it. | Pass
4 | User enters the random sequence and clicks on the “Login” button to log in to the application. | The user can log in to the system’s home page. | The user successfully logs in to the system and is directed to the home page. | Pass
Table 7. TCID_01 for login page
Table 7 shows the steps required to test the flow of the login page to ensure it is fully
functional and runs without errors. The results acquired after testing the login page meet the
expected results, which means users can log in to the DeNotPress application with the given
credentials. The human verification step, in which the system generates a random sequence, is an
additional feature intended to ensure that login attempts are not made by a robot or AI.
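The human verification step tested in Table 7 can be sketched as follows. This is a minimal
illustration rather than the DeNotPress implementation: the function names, the six-character
length, and the alphabet of uppercase letters and digits are all assumptions.

```python
import hmac
import secrets
import string

# Assumed alphabet for the challenge; the report does not specify one.
ALPHABET = string.ascii_uppercase + string.digits


def generate_sequence(length=6):
    """Return a random challenge string drawn from letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify_sequence(expected, typed):
    """Compare the typed value with the stored challenge in constant time."""
    return hmac.compare_digest(expected.encode(), typed.strip().encode())


challenge = generate_sequence()
print(verify_sequence(challenge, challenge))        # True
print(verify_sequence(challenge, challenge + "X"))  # False
```

The secrets module is used instead of random because the sequence acts as a security check, and
the server would need to keep the generated challenge (for example in the user's session) until
the login form is submitted.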
6.2 Booking Function
Test Case ID : TCID_02
Test Title : Booking appointment and display on home page
Test Designed by : Ooi Jing Kai
Description : User can make bookings for client appointments and the bookings are shown on the home page.

Step | Test Steps | Expected Results | Actual Results | Status
1 | User clicks on the “New Appointment” button. | The system directs the user to the booking page. | The system successfully directs the user to the booking page. | Pass
2 | User enters the client’s name, date and time slot. | The user can enter the client’s name, select the date from the calendar pop up, and select the time slot from the time slot pop up. | The user can enter the name, choose the date from the calendar pop up, and choose an available time slot in the time slot pop up. | Pass
3 | User clicks on the “Submit” button. | The system generates a pop up displaying the success message and redirects the user back to the home page. The bookings are displayed in a table. | The system successfully generates the success pop up and redirects the user back to the home page. The user is able to see the bookings in a table. | Pass
Table 8. TCID_02 for booking function
Table 8 shows the steps taken to test the booking function of the DeNotPress application. This
function is more complicated, since the saved bookings must be displayed on the home page after
an appointment is booked successfully. The code behind this function is more prone to error
because the system has to fetch the booking data from the database and display it in a table.
After testing, the system did not generate any errors: the user can book appointments and they
are shown on the home page in a table. This meets the expected results and indicates that the
booking feature is fully functional.
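The save-then-display flow tested in Table 8 can be sketched with Python's built-in sqlite3
module. The report does not describe the database or its schema, so the table layout, the column
names, and the rule that a date and time slot can be booked only once are assumptions made for
illustration.

```python
import sqlite3


def init_db(conn):
    """Create the (assumed) bookings table with one booking per slot."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookings ("
        "id INTEGER PRIMARY KEY, client_name TEXT NOT NULL, "
        "date TEXT NOT NULL, time_slot TEXT NOT NULL, "
        "UNIQUE (date, time_slot))"
    )


def add_booking(conn, client_name, date, time_slot):
    """Insert a booking; return False if the slot is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bookings (client_name, date, time_slot) "
                "VALUES (?, ?, ?)",
                (client_name, date, time_slot),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def list_bookings(conn):
    """Return bookings ordered by date and time for the home-page table."""
    return conn.execute(
        "SELECT client_name, date, time_slot FROM bookings "
        "ORDER BY date, time_slot"
    ).fetchall()


conn = sqlite3.connect(":memory:")
init_db(conn)
add_booking(conn, "Client A", "2023-07-03", "10:00")
add_booking(conn, "Client B", "2023-07-01", "14:00")
print(add_booking(conn, "Client C", "2023-07-03", "10:00"))  # False
print(list_bookings(conn))
```

In a Flask application, the rows returned by list_bookings would typically be passed to the home
page template and rendered as table rows.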
6.3 Depression Test
Test Case ID : TCID_03
Test Title : Perform PHQ-9 questionnaire and depression detection test using EEG
Test Designed by : Chew Li Wei
Description : User can perform the depression test by first completing the PHQ-9 questionnaire and then uploading an EEG file to the system to generate the depression test result.

Step | Test Steps | Expected Results | Actual Results | Status
1 | User clicks on the “Depression Test” button. | The system directs the user to the enter client details page. | The system successfully directs the user to the enter client details page. | Pass
2 | User enters the client’s first name, last name, IC or passport number, and phone number. | The user can enter the client’s first name, last name, IC or passport number, and phone number. | The user can enter the client’s first name, last name, IC or passport number, and phone number successfully. | Pass
3 | User clicks on the “Start” button. | The system directs the user to the PHQ-9 questionnaire page. | The system successfully directs the user to the PHQ-9 questionnaire page. | Pass
4 | User answers the PHQ-9 questionnaire. | The user can select from the four choices for each question. | The user is able to answer all 9 questions by selecting one answer for each question. | Pass
5 | User clicks on the “Submit” button. | The system directs the user to the EEG file upload page. | The system successfully directs the user to the EEG file upload page. | Pass
6 | User uploads the EEG file. | The system can receive the EEG file and pass the EEG data through the detection pipeline. | The system receives the EEG file and passes the EEG data through the detection pipeline. | Pass
7 | User clicks on the submit icon. | The system directs the user to the results page and displays the client’s details and the depression test results. | The system successfully directs the user to the results page and displays the client’s details and depression test results (PHQ-9 score and EEG prediction result). | Pass
Table 9. TCID_03 for depression test
Table 9 shows the steps required to test the main function of the DeNotPress application, which
is the depression test. First, the system needs the user to enter the details of the client who
is going to take the depression test. The system also needs to identify the corresponding client
in the database in order to store the results afterwards. This test case is crucial in
determining whether the DeNotPress application’s main feature works. Based on Table 9, all test
steps were successful and met the expected results: the system processes the uploaded EEG file
and classifies the client as depressed or healthy.
Chapter 7: Conclusion & Critical Evaluation
In this chapter, the author provides the conclusion and critical evaluation of the development of
the DeNotPress application. This chapter is separated into the following sections: key findings,
critical evaluation, and lastly a critical appraisal of the work done.
7.1 Key Findings

First and foremost, the DeNotPress application is designed to address the current problem of late
detection of depression, where many people are not aware that they are suffering from depression
because of the unreliable depression test results produced by current online systems. This
application allows clinic practitioners to perform a depression test on this group of people
using their EEG data, with a machine learning algorithm predicting whether they are depressed or
not. If the system detects that the person is depressed, the result of the PHQ-9 questionnaire
which the person answered is used to assess their level of depression. Based on that, the clinic
practitioner can recommend nearby specialists so that the person can seek further medical
attention.
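The way the two results combine can be sketched as below. The severity bands are the standard
PHQ-9 cut-offs (nine items scored 0 to 3, total 0 to 27) and are assumed to match the report's
Table 5; the function names and the shape of the returned summary are hypothetical.

```python
# Upper bound of each band and its label (standard PHQ-9 cut-offs).
SEVERITY_BANDS = [
    (4, "Minimal"),
    (9, "Mild"),
    (14, "Moderate"),
    (19, "Moderately severe"),
    (27, "Severe"),
]


def phq9_total(answers):
    """Sum nine answers, each scored 0 to 3."""
    if len(answers) != 9 or any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("PHQ-9 needs nine answers, each scored 0 to 3")
    return sum(answers)


def phq9_severity(total):
    """Map a PHQ-9 total score to its severity label."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label


def summarise(answers, eeg_says_depressed):
    """Report severity only when the EEG model predicts depression."""
    total = phq9_total(answers)
    severity = phq9_severity(total) if eeg_says_depressed else None
    return {"phq9_total": total, "eeg_depressed": eeg_says_depressed,
            "severity": severity}


print(summarise([2, 1, 3, 2, 1, 0, 2, 1, 0], True))
# {'phq9_total': 12, 'eeg_depressed': True, 'severity': 'Moderate'}
```

Keeping the questionnaire score separate from the EEG prediction mirrors the flow described
above, where the EEG result decides whether depression is detected and the PHQ-9 result grades
its level.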
The key findings during the completion of this project are stated below:
• EEG data is much more complex to process than expected. Extensive research was required to
ensure that properly cleaned data is passed into the machine learning model. The different file
formats in which EEG data can be stored mean that there are multiple approaches to processing it
and no universal solution for all types of EEG data.
• The DeNotPress application is able to produce highly accurate and reliable depression test
results using the EEG data file uploaded into the system. The SVM machine learning model
implemented in the system achieved an accuracy of 97%, which is well above the benchmark set by
previous research work.
• The application interface is simple for users to navigate and does not contain excessive
elements that would confuse users. The interface was designed to be as professional as possible,
since its main users will be clinic practitioners.
• The DeNotPress application has the potential to help save the lives of people suffering from
depression because of the highly reliable depression detection results it produces. It can
reduce the cases of late diagnosis of depression and encourage more people who are unsure of
their current mental health status to take the step of undergoing these depression tests.
In short, the DeNotPress application can be life-saving for those who are unaware that they are
suffering from depression. They can go to their nearest clinic and take the depression test using
the PHQ-9 together with their EEG data, which produces a more reliable depression detection
result and thus reduces the cases of late diagnosis of depression.
7.2 Critical Evaluation

The DeNotPress application is designed to provide a valuable contribution to the medical field in
the hope of reducing the cases of late diagnosis of depression. To show that this application is
essential for early depression detection and that it has an edge over the current online
depression test systems, this section presents the critical evaluation of the project, which is
discussed in two parts: limitations and future enhancements.
7.2.1 Limitations
One of the main limitations of the DeNotPress application is the lack of an administrative system
that allows admins to register certified clinic practitioners as users. Since there is currently
no administrative system, the credentials of the available certified users are hard-coded into
the system, which means the admin is unable to register new users in the future. This makes it
hard to support several users, and the excessive blocks of code required make the back-end
development much more tedious.
Second, another limitation of the application is that the machine learning model pipeline
designed by the author can only accept EEG files in edf format. Since EEG data can be stored in
multiple file formats such as bdf, vhdr, fdt, and so on (BIDS Contributors, 2022), it would be
difficult to require that clients have their EEG data recorded specifically in edf format. Hence,
it would be best if the machine learning pipeline could accept different file formats of EEG
data.
Moreover, the third limitation of the DeNotPress application is that the machine learning model
used might not perform well as time progresses. This is because the deployed model is trained
once on past data and is not retrained as new input data accumulates. If there are new variations
in EEG data in the future, the machine learning model might not be able to accurately detect
whether a person is depressed or not.
7.2.2 Future Enhancements

All the above-mentioned limitations point to possible future enhancements that would improve the
DeNotPress application’s usability. First, to address the lack of an administrative system,
developers could build one that lets admins register new users and grant them access to the
application. This system should require the admin to enter the clinic practitioner’s valid
certifications to prove that they have received proper training, since this application is not
intended for use by ordinary clinicians.
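One possible shape for such a registration back end is sketched below, using only the Python
standard library. It replaces hard-coded credentials with a users table and stores a salted
PBKDF2 hash instead of the password itself. The schema, the function names, the iteration count,
and the idea of recording a certification number are assumptions for illustration and are not
part of the current DeNotPress code.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 200_000  # assumed work factor for PBKDF2


def _hash(password, salt):
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, certificate_no TEXT NOT NULL, "
        "salt BLOB NOT NULL, password_hash BLOB NOT NULL)"
    )


def register_user(conn, username, password, certificate_no):
    """Admin action: store a new practitioner with a salted password hash."""
    if not certificate_no.strip():
        raise ValueError("a certification number is required")
    salt = os.urandom(16)
    with conn:
        conn.execute(
            "INSERT INTO users VALUES (?, ?, ?, ?)",
            (username, certificate_no, salt, _hash(password, salt)),
        )


def check_login(conn, username, password):
    """Return True only if the user exists and the password matches."""
    row = conn.execute(
        "SELECT salt, password_hash FROM users WHERE username = ?",
        (username,),
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(stored, _hash(password, salt))


conn = sqlite3.connect(":memory:")
init_db(conn)
register_user(conn, "dr_lee", "s3cret-pass", "CERT-0001")
print(check_login(conn, "dr_lee", "s3cret-pass"))  # True
print(check_login(conn, "dr_lee", "wrong"))        # False
```

How a certification is actually validated would depend on the issuing body, so the sketch only
checks that a value was supplied.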
Moreover, extensive research on the machine learning pipeline can be done in the future to come
up with a method that accepts different file formats of EEG data instead of just one. This would
greatly enhance the capabilities of the application, since it could process different types of
EEG files while still producing highly accurate and reliable results. In addition, further
research can be done on connecting external EEG recording devices directly to the system, so that
the EEG data can be imported immediately after the recording session instead of being recorded at
a different place and time.
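A first step towards multi-format support could be a small dispatcher that picks an MNE reader
from the file extension, sketched below. The reader functions named in the table exist in mne.io,
but the dispatcher itself, its function names, and the chosen set of extensions are assumptions
for illustration. Files with the fdt extension are omitted because they hold the data for an
EEGLAB set file, which is the file that the EEGLAB reader takes.

```python
from pathlib import Path

# File extension -> name of the matching reader function in `mne.io`.
READERS = {
    ".edf": "read_raw_edf",
    ".bdf": "read_raw_bdf",
    ".vhdr": "read_raw_brainvision",
    ".set": "read_raw_eeglab",
}


def pick_reader_name(path):
    """Return the name of the MNE reader for this file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(
            f"unsupported EEG file type: {suffix or '(none)'}"
        ) from None


def load_raw(path):
    """Load an EEG recording with the reader chosen by pick_reader_name."""
    import mne  # imported lazily so the dispatch logic runs without MNE

    reader = getattr(mne.io, pick_reader_name(path))
    return reader(path, preload=True)


print(pick_reader_name("session01.EDF"))  # read_raw_edf
print(pick_reader_name("subject7.vhdr"))  # read_raw_brainvision
```

Rejecting unknown extensions up front lets the upload page show a clear error before any
processing starts. MNE also provides a generic mne.io.read_raw function that infers the reader
from the extension, which may make a hand-written table unnecessary.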
Lastly, deep learning algorithms could be explored in the future, since neural networks can be
updated incrementally with new EEG data over time and tend to benefit more from larger datasets
than classical machine learning models do. Deep learning is loosely inspired by biological neural
networks, and this capacity to keep “learning” from new data is what can make it perform well in
the long run. During the model comparison stage of this project, the deep learning algorithm used
(CNN) did not perform as well as the other machine learning algorithms, but with a sufficiently
large dataset its performance may eventually surpass theirs.
7.3 Critical Appraisal of Work Done
Throughout the development of the DeNotPress application and the completion of this capstone
project, the author has acquired a considerable amount of knowledge on the project’s topic.
During the beginning stages of the capstone project, the author faced many difficulties because
the project topic was new and previously unexplored. It took a long period of time and extensive
research to fully understand the concepts of EEG and how other researchers have analysed and
processed EEG data in order to construct the machine learning model pipeline for predicting
depression. Even though the author specialises in Data Science, working with EEG data was a
completely new domain, and the challenges faced were hard to overcome because of how diverse the
topic is. The author often ended up overwhelmed by the different solutions provided by online
resources such as Stack Overflow. In the end, it was a great learning experience for the author
to deepen his knowledge of working with EEG data for detecting depression.
Besides, working with a team to develop a web application from scratch was completely new to the
whole team. None of the team members had any experience in application development, so the amount
of work and research required of each member was extensive and time-consuming. Progress was often
slow, since it took members longer to debug or resolve the issues they faced during the
implementation phase of their work. Moreover, communication was challenging because a design or
function of the application was often changed without every member being aware of the change.
Members ended up proceeding with the old design and had to change it later on after realising
that changes had been made. This made the development process take longer than expected, and
during the last few weeks it became stressful for each member because there was not enough time
to finish the application. Fortunately, everyone held on until the end, and the DeNotPress
application was successfully finished and is functioning. Although it is not perfect and could
use further refinement and improvement, the whole team is satisfied with the result.
References
1. Aguiar Neto, F. S. de, & Garcia Rosa, J. L. (2019). Elsevier Enhanced Reader.
Reader.elsevier.com.
https://reader.elsevier.com/reader/sd/pii/S0149763419303823?token=624FC1B3974D68
95E60AEEDE729FD147F612BC4D776184FC63B30474C22C860A58CF2553C5A9EA
6F32A62623EC24A81C&originRegion=eu-west-1&originCreation=20230207061155
2. Akbari, H., Sadiq, M. T., Payan, M., Esmaili, S. S., Baghri, H., & Bagheri, H. (2020).
Depression Detection Based on Geometrical Features Extracted from SODP Shape of
EEG Signals and Binary PSO. ResearchGate.
https://www.researchgate.net/profile/Hesam-Akbari-
6/publication/350434065_Depression_Detection_Based_on_Geometrical_Features_Extra
cted_from_SODP_Shape_of_EEG_Signals_and_Binary_PSO/links/62fd7e4bceb9764f72
04550e/Depression-Detection-Based-on-Geometrical-Features-Extracted-from-SODP-
Shape-of-EEG-Signals-and-Binary-PSO.pdf?origin=publication_detail
3. Avots, E., Jermakovs, K., Bachmann, M., Päeske, L., Ozcinar, C., & Anbarjafari, G.
(2022). Ensemble Approach for Detection of Depression Using EEG Features. Entropy,
24(2), 211. https://doi.org/10.3390/e24020211
4. BIDS Contributors. (2022). Electroencephalography - Brain Imaging Data Structure v1.8.0.
Bids-Specification.readthedocs.io.
https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/03-electroencephalography.html
5. Cai, H., Yuan, Z., Gao, Y., Sun, S., Li, N., Tian, F., Xiao, H., Li, J., Yang, Z., Li, X.,
Zhao, Q., Liu, Z., Yao, Z., Yang, M., Peng, H., Zhu, J., Zhang, X., Gao, G., Zheng, F., &
Li, R. (2022). A multi-modal open dataset for mental-disorder analysis. Scientific Data,
9(1), 178. https://doi.org/10.1038/s41597-022-01211-x
6. Deery, M. (2022). The Flask Web Framework: A Beginner’s Guide. Careerfoundry.com.
https://careerfoundry.com/en/blog/web-development/what-is-flask/#advantages-and-disadvantages-of-flask
7. Dinculeană, D., & Cheng, X. (2019). Vulnerabilities and Limitations of MQTT Protocol
Used between IoT Devices. Applied Sciences, 9(5), 848.
https://doi.org/10.3390/app9050848
8. Labs, S. (2019, March 31). Common Average vs Infinity Reference in EEG. Sapien Labs |
Neuroscience | Human Brain Diversity Project. https://sapienlabs.org/lab-talk/common-
average-vs-infinity-reference-in-eeg/#:~:text=The%20Average%20Reference
9. Lamela, D., Soreira, C., Matos, P., & Morais, A. (2020). Systematic review of the factor
structure and measurement invariance of the patient health questionnaire-9 (PHQ-9) and
validation of the Portuguese version in community settings. Journal of Affective
Disorders, 276, 220–233. https://doi.org/10.1016/j.jad.2020.06.066
10. Lee, K.-S., & Ham, B.-J. (2022). Machine Learning on Early Diagnosis of Depression.
Psychiatry Investigation, 19(8), 597–605. https://doi.org/10.30773/pi.2022.0075
11. Levis, B., Benedetti, A., & Thombs, B. D. (2019). Accuracy of Patient Health
Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant
data meta-analysis. BMJ, 365(1476). https://doi.org/10.1136/bmj.l1476
12. Mahato, S., & Paul, S. (2019). Electroencephalogram (EEG) Signal Analysis for
Diagnosis of Major Depressive Disorder (MDD): A Review: Proceeding of NCCS 2017.
ResearchGate.
https://www.researchgate.net/publication/326784158_Electroencephalogram_EEG_Signa
l_Analysis_for_Diagnosis_of_Major_Depressive_Disorder_MDD_A_Review_Proceedin
g_of_NCCS_2017
13. Makai, M. (2022). Flask. Fullstackpython.com.
https://www.fullstackpython.com/flask.html
14. MNE Developers. (2023). MNE — MNE 1.0.2 documentation. Mne.tools.
https://mne.tools/stable/index.html
15. Mohammadi, Y., & Moradi, M. H. (2020). Prediction of Depression Severity Scores
Based on Functional Connectivity and Complexity of the EEG Signal. Clinical EEG and
Neuroscience, 155005942096543. https://doi.org/10.1177/1550059420965431
16. Mumtaz, W., Xia, L., Mohd Yasin, M. A., Azhar Ali, S. S., & Malik, A. S. (2017). A
wavelet-based technique to predict treatment outcome for Major Depressive Disorder.
PLOS ONE, 12(2), e0171409. https://doi.org/10.1371/journal.pone.0171409
17. Park, C.-S., & Nam, H.-M. (2020). Security Architecture and Protocols for Secure
MQTT-SN. IEEE Access, 8, 226422–226436.
https://doi.org/10.1109/ACCESS.2020.3045441
18. Python. (2023). pickle — Python object serialization — Python 3.9.6 documentation.
Docs.python.org.
https://docs.python.org/3/library/pickle.html#:~:text=%E2%80%9CPickling%E2%80%9
D%20is%20the%20process%20whereby
19. PythonBasics. (2022). Session data in Python Flask - Python Tutorial. Pythonbasics.org.
https://pythonbasics.org/flask-sessions/
20. Rozhnova, M. (2021, July 27). Impact of dataset errors on model accuracy. Deelvin
Machine Learning. https://medium.com/deelvin-machine-learning/impact-of-dataset-
errors-on-model-accuracy-723fef5e0b28
21. Safayari, A., & Bolhasani, H. (2021). Depression diagnosis by deep learning using EEG
signals: A systematic review. Medicine in Novel Technology and Devices, 12, 100102.
https://doi.org/10.1016/j.medntd.2021.100102
22. Sison, G. (2020, February 4). Statistics about depression in the U.S. The Checkup.
https://www.singlecare.com/blog/news/depression-statistics/
23. Titov, N., Dear, B. F., McMillan, D., Anderson, T., Zou, J., & Sunderland, M. (2011).
Psychometric Comparison of the PHQ-9 and BDI-II for Measuring Response during
Treatment of Depression. Cognitive Behaviour Therapy, 40(2), 126–136.
https://doi.org/10.1080/16506073.2010.550059
24. Villarroel, M., & Terlizzi, E. (2020, September 24). Symptoms of Depression Among
Adults: United States, 2019. Www.cdc.gov.
https://www.cdc.gov/nchs/products/databriefs/db379.htm
25. Wang, B., Kang, Y., Huo, D., Feng, G., Zhang, J., & Li, J. (2022). EEG diagnosis of
depression based on multi-channel data fusion and clipping augmentation and
convolutional neural network. Frontiers.
https://www.frontiersin.org/articles/10.3389/fphys.2022.1029298/full
26. Wang, L., Kroenke, K., Stump, T. E., & Monahan, P. O. (2020). Screening for perinatal
depression with the Patient Health Questionnaire depression scale (PHQ-9): A systematic
review and meta-analysis. General Hospital Psychiatry, 68, 74–82.
https://doi.org/10.1016/j.genhosppsych.2020.12.007
27. World Health Organization. (2021). Depressive Disorder (Depression). World Health
Organisation. https://www.who.int/news-room/fact-sheets/detail/depression
Appendix
Appendix A: Project Proposal
Appendix B: Log Sheets
Appendix C: Turnitin Result
The high Turnitin result was due to Chapters 2 to 4, which were submitted in the previous
semester and account for 34% of the overall 37% plagiarism percentage.