
BANGALORE INSTITUTE OF TECHNOLOGY

K.R Road, V.V Puram, Bengaluru-04


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

“Detection of Parkinson’s Disease


using Machine Learning”
Presented by

1BI17CS158 Suryansh Singh


1BI17CS159 Sushrut M
1BI17CS160 Swapnil Negi
1BI17CS161 Syed Ameen
Under the Guidance of
Dr Suneetha K R

Associate Professor
Dept of CSE
Agenda
• Introduction
• Literature Survey
• Existing System
• Problem Statement
• System Design
• Applications
• References
Introduction
Parkinson’s disease is a neurodegenerative disorder of the central
nervous system that causes partial or full loss of motor reflexes,
speech, behavior, mental processing, and other vital functions.

It is generally observed in elderly people and causes speech and motor
disorders in about 90% of patients.

People with Parkinson’s disease suffer from speech impairments such as
dysphonia (defective use of the voice), hypophonia (reduced volume),
monotone (reduced pitch range), and dysarthria (difficulty with the
articulation of sounds or syllables).

In this project, the voice parameters of Parkinson’s disease patients
and healthy subjects are analyzed to predict the presence of
Parkinson’s disease.
Objectives
• To achieve improved accuracy of the models used for detecting and
predicting Parkinson’s disease.
• To build supervised machine learning models trained on the cleaned
UCI Parkinson’s dataset.
• To design a web application that takes voice features as parametric
input and checks whether a person has Parkinson’s disease.
• To implement a system that monitors the activities of patients
suffering from Parkinson’s disease and stores their history.
Literature Survey
1. Collection and Analysis of a Parkinson Speech Dataset with Multiple Types of Sound Recordings – Betul Erdogdu Sakar, M. Erdem Isenkul, C. Okan Sakar, Ahmet Sertbas, Fikret Gurgen, Sakir Delil, Hulya Apaydin, and Olcay Kursun (2013)
• Introduced a dataset consisting of voice samples from 40 individuals, of which 20 were healthy and 20 were suffering from Parkinson’s disease.
• Compared the leave-one-subject-out and summarized leave-one-out validation schemes on the KNN and SVM classification algorithms. Results showed that s-LOSO performed much better than the LOO method.
• Mean and standard deviation were found to be better metrics for summarizing the features obtained from the voice samples than considering every sample of a subject.
• Main drawback: the model gave low-accuracy results with KNN using the LOSO and s-LOSO schemes, since vowels carried more weight in PD classification than sentences or words.

2. Analysis of multiple types of voice recordings in cepstral domain using MFCC for discriminating between patients with Parkinson’s disease and healthy people – Achraf Benba, Abdelilah Jilbab, Ahmed Hammouch (2016)
• The MFCC technique was applied to obtain a voiceprint from the recorded speech samples, which was compressed by taking the average value. This avoided the low accuracy otherwise obtained and reduced the processing time of the classification algorithms.
• SVM with different kernels (RBF, linear and polynomial) was applied to the Istanbul PD dataset, for the vowels a, o and u and their combination. The results indicated that SVM with the MLP kernel provided the highest accuracy for the vowels a, o and u separately, with 80.0%, 77.5% and 82.5%.
• SVM with a linear kernel obtained the highest accuracy of 77.5% with the vowels a, o and u combined.
• Main drawback: the SVM models obtained low accuracy with the different kernels, which could not be used for clinical or medical diagnosis of PD, as feature selection was not performed.
3. Detection of Parkinson’s Disease from Vocal Features Using Random Subspace Classifier Ensemble – Ömer Eskidere, Ali Karatutlu, Cevat Ünal (2015)
• A random subspace ensemble method using KNN was proposed to improve the performance of individual classifiers. With the help of k-fold cross-validation, KNN, LDA and QDA were used as the base classifiers.
• These were applied to the normalized features of the Istanbul PD dataset. They were tested with varying numbers of features and varying numbers of KNN, LDA and QDA learners, and it was found that the ensemble of KNN learners outperformed LDA and QDA. With k = 10 and 114 KNN classifiers on 7-dimensional subspaces, the lowest classification error was demonstrated.
• It was also shown that the variation and type of base classifier changed the performance of the random subspace classifier ensemble method.
• Main drawback: the random selection of feature subspaces, because some of the randomly selected subsets might have poor discrimination capability.

4. An LSTM based Deep Learning model for voice-based detection of Parkinson’s disease – Danish Raza Rizvi, Iqra Nissar, Sarfaraz Masood, Mumtaz Ahmed, Faiyaz Ahmad (2020)
• DNN and LSTM models were used to detect and predict Parkinson’s disease on the UCI Parkinson voice sample dataset.
• It was observed that using sizes of 256, 128 and 64 for the first, second and third hidden layers of the DNN produced an accuracy of 97%. Along with this, a dropout of 0.5, the ADAM optimizer to prevent overfitting, and categorical cross-entropy loss were used.
• Dropout also made sure the model did not rely on a single node. The ReLU and softmax activation functions were used in the hidden and output layers respectively.
• For the LSTM, a batch size of 16 and a hidden layer size of 32, trained for 80 epochs, produced an accuracy of 99.03%.
• These proposed models outperformed previously applied and studied methods.
5. Simultaneous Learning of Speech Feature and Segment for Classification of Parkinson Disease – Yongming Li, Yunjian Jia, Xiaoheng Zhang, Cheng Zhang, Ping Wang, Tingjie Xie (2015)
• A feature selection algorithm that attains new hybrid features for classification without any feature transformation. The Istanbul dataset was split into training and test datasets after the hybrid features were constructed by combining features and segments.
• Hybrid voice features were chosen after normalization, using LOSO to construct a new training set.
• The SVM classifier was applied to the datasets for classification. The results indicated that the linear kernel performed better than the RBF kernel, and that p-value and SDC were better evaluation criteria compared to corrcoef.
• The SVM classifier performed better on the hybrid selected features than on the original dataset features, with a mean accuracy of 82.5%, sensitivity of 85% and specificity of 80%.
• Main drawback: very few samples were considered for feature selection; new samples could be acquired for further verification and modification.

6. Diagnosis of the Parkinson Disease by using Deep Neural Network Classifier – Abdullah Caliskan, Hasan Badem, Alper Baştürk, Mehmet Emin Yüksel (2017)
• A DNN with a stacked autoencoder and a softmax classifier cascaded one after another was used for prediction on the Istanbul PD dataset and the OPD dataset, with 10-fold cross-validation repeated 30 times.
• The DNN uses autoencoders, thereby reducing the dimension of the features, and softmax layers for the classification process, in contrast to conventional classifier techniques.
• Several simulations were performed over the two databases to demonstrate the effectiveness of the deep neural network classifier.
• The proposed DNN model performs better than the SVM, DT and NB classification algorithms, with accuracies of 65.549% and 86.095% on the PD and OPD datasets respectively. The proposed DNN classifier has the ability to extract hidden features, increasing the performance of the classifier.
7. Classifying Parkinson Disease based on acoustic measures using artificial neural networks – Lucijano Berus, Simon Klancnik, Miran Brezocnik and Mirko Ficko (2018)
• Multiple ANNs were used on the PD dataset with the LOSO scheme. LOSO has far less bias and provides practically unbiased prediction.
• The method involved feature selection followed by selecting the best result from multiple ANN classifiers using a majority voting technique. Feature selection was done using Pearson’s and Kendall’s correlation coefficients, PCA and self-organizing maps.
• The number 4 and short sentence 4 from the voice samples carried more value in classifying PD according to Pearson’s and Kendall’s correlation coefficients. The NN was fine-tuned, and a test accuracy of 86.47% was achieved.
• Main drawback: the performance of the ANNs could be improved by using other feature selection procedures and by additional fine-tuning. Several vocal tests in other languages were not included in the study; performing the classification on those datasets would help increase the reach of the models in predicting PD.

8. A Multiple-Classifier Framework for Parkinson’s Disease Detection Based on Various Vocal Tests – Mahnaz Behroozi and Ashkan Sami (2016)
• Instead of considering every voice sample together for prediction on the Istanbul PD dataset, the authors separated each vocal test in the PD dataset and applied the classification algorithms to each.
• Features were selected based on the Pearson correlation coefficient. If a vocal test had no relevant features, MCFS and A-MCFS were applied to select features based on prevalent features and to remove unsuccessful voice samples respectively.
• The LOO CV technique was used on KNN, SVM, Naïve Bayes and discriminant analysis classifiers, all of which obtained their highest accuracy with the A-MCFS method compared to the LOSO, s-LOSO and MCFS methods. The final classification result is a majority vote of all the classifiers.
• Main drawback: the discriminating ability of all the vocal terms is not the same; even some vocal terms considered discriminating in the literature, such as the vowel “a”, failed to be successful. Further studies on different vocal terms from the proposed perspective, and on vocal tests from various languages, could be carried out.
9. Can a Smartphone Diagnose Parkinson Disease? A Deep Neural Network Method and Telediagnosis System Implementation – Y. N. Zhang (2017)
• An SAE was used for dimension reduction, and various classification algorithms such as KNN, LDA, NB, LSVM, RSVM, CART, KELM and MSVM were applied to predict Parkinson’s disease on the PD voice telemonitoring datasets.
• The proposed SAE had a batch size of 20, with hidden layers of 10, 9, 8 and 8, 7, 6 neurons respectively. It was shown that SAE with KNN gave the most accurate classification result on the Istanbul PD dataset. The dimensional space was low for the proposed method to remap time-frequency features.
• Main drawback: the smartphone application that used this method to predict PD required that no noise be present when the recording is taken through a smartphone, and that the microphone quality be high, for prediction through the smartphone application over the web.

10. A Machine Learning System for the Diagnosis of Parkinson’s Disease from Speech Signals and Its Application to Multiple Speech Signal Types – İsmail Cantürk and Fethullah Karabiber (2016)
• Four feature selection algorithms (LASSO, Relief, LLBFS and RMR) were applied to the features of the PD voice dataset. Six classification methods (AdaBoost, LibSVM, MLP, Ensemble, K-NN and NB) were applied to the features obtained through the feature selection methods.
• The validation methods were 10-fold CV and LOSO. The results using 10-fold CV indicated that KNN with k = 3, MLP with 20 neurons, MLP with 10 neurons and KNN with k = 7 performed best for the LASSO, Relief, LLBFS and RMR feature selection algorithms respectively.
• The results using LOSO with n = 12 indicated that LibSVM with a linear kernel, AdaBoost with 10 iterations, LibSVM with a linear kernel and MLP with 10 neurons performed best for the LASSO, Relief, LLBFS and RMR feature selection algorithms respectively. The results also indicated that the use of feature selection algorithms improved the performance of the classifiers by a significant amount.
• Main drawback: daily speech and speech variation are ignored. Daily speech is more important, because if patients are diagnosed with certain phonations, their stress level during the PD diagnosis process might rise, and the voice is easily affected by stress or excitement.
11. Automated Detection of Parkinson’s Disease Based on Multiple Types of Sustained Phonations Using Linear Discriminant Analysis and Genetically Optimized Neural Network – Liaqat Ali, Ce Zhu, Zhonghao Zhang, Yipeng Liu (2019)
• Previous PD detection methods lacked generalization, provided low-accuracy predictions and had issues such as subject overlap. A hybrid intelligent system is proposed that uses LDA and a genetic algorithm to reduce dimensionality and optimize the hyperparameters of a neural network to predict PD.
• The proposed model had low complexity and provided an accuracy of 82.14% on the test dataset after balancing the gender-imbalanced dataset.
• This provided a generalized model, highlighted the gender imbalance in the Istanbul PD dataset, and showed how the associated features could be eliminated to provide high accuracy.
• Main drawback: the method was not exploited for prodromal and differential diagnosis, which are considered challenging tasks. An independent dataset that was collected only from PD patients and is highly imbalanced was used as the test data. Missing information about the feature extraction process, such as the extraction of features corrected for pitch, was not considered.

12. Classification of Parkinson’s disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples – He-Hua Zhang, Liuyang Yang, Yuchuan Liu, Pin Wang, Jun Yin, Yongming Li, Mingguo Qiu, Xueru Zhu and Fang Yan (2016)
• A multi-edit nearest-neighbor (MENN) algorithm combined with an ensemble learning algorithm was proposed. It was observed that existing classification methods did not consider optimizing the samples via a selection method, which caused noisy samples and outliers to be included in training.
• The proposed MENN method performed sample selection and reduced the effect of these outliers. MENN removed the overlapping regions present in samples of different classes, by which the misleading data was suppressed.
• MENN was combined with the RF and DNNE methods to improve the prediction accuracy on the Istanbul PD dataset.
• Main drawback: compressed speech feature data was not examined. This would require further research to verify and possibly modify the PD_MEdit_EL algorithm.
13. Parkinsons Disease Classification using Neural Network and Feature Selection – Anchana Khemphila and Veera Boonjing (2012)
• The authors noticed that previous classifiers had low accuracy because the dataset involved a large number of attributes. A multi-layer perceptron with back-propagation learning was proposed to classify the UCI Oxford PD dataset effectively.
• The features were selected according to their importance using information gain. The 22 features were reduced to 16 using information gain, and an ANN was used to classify on them.
• This method obtained accuracies of 82.05% and 83.33% on the training and validation datasets. The results indicated that reducing the number of attributes increased the accuracy of PD classification.
• Main drawback: information gain is biased toward variables with a large number of distinct values, not variables that have observations with large values. For an attribute (variable) with many distinct values, information gain fails to accurately discriminate among the attributes.

14. An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach – Hui-Ling Chen, Chang-Cheng Huang, Xin-Gang Yu, Xin Xu, Xin Sun, Gang Wang and Su-Jing Wang (2013)
• A fuzzy KNN (FKNN) method was proposed and compared against SVM and ANN on the PD dataset. The dataset was normalized by scaling to the range 0 to 1. After reducing the features of the UCI PD dataset using PCA, the optimized fuzzy KNN classifier was applied.
• The step size was 0.01 for the fuzzy strength parameter m, and multiple analyses were carried out for different numbers of neighbours k. The dataset was divided using the 10-fold CV method to obtain an unbiased estimate of the generalized accuracy.
• This increased the reliability since the test dataset was independent. FKNN with k = 7 and m = 1.02 performed much better than SVM with linear and RBF kernels, with an accuracy of 95.79%.
• Main drawback: FKNN is computationally expensive. Since the model must be run over the entire dataset, it is time consuming and requires a large amount of memory to store all the training data.
15. Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests – Athanasios Tsanas, Max A. Little, Patrick E. McSharry, Lorraine O. Ramig (2009)
• Three linear and one nonlinear regression methods were studied to link the voice attribute measures with the total and motor UPDRS scores on the AHTD PD dataset. CART outperformed the linear methods, with small deviation from the interpolated scores.
• CART was seen to have a low prediction error and performed well in tracking the linearly interpolated UPDRS. Using 1000 runs and 10-fold CV, they could accurately predict the motor and total UPDRS scores with low prediction error.
• The linear predictors used performed better than conventional LS and LASSO methods. Using LASSO, the non-classical phonetic attributes also contributed to correct prediction, which was backed by the AIC and BIC techniques.
• Main drawback: the study was confined to using dysphonia measures to predict the average clinical overview of the PD metric UPDRS. Although the dysphonia measures have physiological interpretations, it is difficult to link self-perception and physiology.

16. A novel diagnosis system for Parkinson’s disease using complex-valued artificial neural network with k-means clustering feature weighting method – Huseyin Guruler (2017)
• Noticing the need for a hybrid system for the diagnosis of PD, a combination of a k-means clustering based feature weighting method and a complex-valued artificial neural network was introduced.
• The feature weighting method helps in achieving high classification accuracy. The weighting method collects similar data points and helps convert a nonlinearly separable dataset into a linearly separable one.
• The newly extracted features were converted into complex-number format and fed to the neural network as its input. The proposed method achieved high accuracy: 99.52% with the tenfold CV method and 99.39% with the 50-50 training/testing data selection method.
• The method provided a fast and computationally light classification of PD.
17. A comparative analysis of speech signal processing algorithms for Parkinson disease classification and the use of the tunable Q-factor wavelet transform – C. Okan Sakar, Gorkem Serbes, Aysegul Gunduz, Hunkar C. Tunc, Hatice Nizam, Betul Erdogdu Sakar, Melih Tutuncu, Tarkan Aydin, M. Erdem Isenkul (2018)
• The authors applied the tunable Q-factor wavelet transform (TQWT) to voice signal samples of PD patients. The required frequency for the band-pass filters can be determined by changing the Q-factor, the oversampling rate and the number of analysis levels.
• Higher Q-values in the TQWT analysis made it possible to obtain narrower frequency responses, which helps obtain a better decomposition of the sub-bands.
• The highest accuracy was obtained by using the normal Q-wavelet transform features with a multilayer perceptron classifier. It was noted that increasing the Q-value too much, such as to 4 or 5, does not increase the classification accuracy, due to the need for a larger number of analysis levels.
• Main drawback: the TQWT technique, which showed promising results on the PD classification problem, could also be used to predict the Unified Parkinson’s Disease Rating Scale (UPDRS) score of PD patients to build a robust PD telemonitoring system, but this was not done.

18. Classification of Parkinsons Disease Using Data Mining Techniques – Sajid Ullah Khan (2015)
• The proposed idea, cluster analysis, is an iterative process applied to modify the data pre-processing and model parameters until the required properties are achieved.
• The data is passed through the data pre-processing phases of data cleaning, recovering missing values and transformation, after which the three techniques are applied.
• The techniques used were K-NN, Random Forest and AdaBoost, in order to obtain the most accurate model for detecting the disease. The accuracies achieved were 90.25%, 87.17% and 88.71% for K-NN, Random Forest and AdaBoost respectively. The results showed that K-NN was the best classification model, with the highest accuracy.
• Main drawback: SVM was not applied to the reduced dataset and compared to previous work.
19. An Ensemble Method for Diagnosis of Parkinson’s Disease Based on Voice Measurements – Razieh Sheibani, Elham Nikookar, and Seyed Enayatollah Alavi (2019)
• The authors applied an ensemble-based method to identify patients by class label prediction using voice frequency characteristics. In this method the probability of making a mistake in determining the class label is significantly reduced.
• It has three stages: data pre-processing, internal classification, and ultimate classification. In the first stage the dataset is divided into six subsets according to the recorded voice types. In the next stage, prediction models are generated by applying internal classifiers.
• Then the result of each prediction model is calculated and used as the input for the next stage. Finally, the ultimate classifiers determine the final class label of the sample. The authors used the WEKA software, which includes machine learning and data mining algorithms.
• It was observed from the results that the k-NN algorithm with k = 1 gave the best performance, with 90% accuracy.

20. SVM Classification to Distinguish Parkinson Disease Patients – Ipsita Bhattacharya and M.P.S. Bhatia (2010)
• The authors used a data mining tool called Weka to pre-process the dataset. They then used the SVM method to differentiate between healthy people and people with Parkinson’s disease. The accuracy achieved was 65.217%.
• It was observed from the results that increasing the number of cross-validation folds increased the true positive rate and decreased the false positive rate.
• This happened because as the cross-validation value is increased, the size of the training set increases and the size of the test set decreases, which leads to an increase in accuracy. It was also observed that accuracy can be increased by changing the split ratio and repeating the test.
• Main drawback: testing the same dataset on different tools, such as MATLAB, and comparing their efficiency was not done. With proper partitioning of the dataset, better accuracy can be achieved.
Drawbacks of Existing System
• There are currently no blood or laboratory tests to diagnose
nongenetic cases of Parkinson's disease.
• Parkinson's disease can't be cured, but medications can help
control the symptoms, often dramatically. Early detection and
treatment becomes vital.
• There has been little attempt to summarize and synthesize
qualitative studies concerning the experience and perception
of living with Parkinson’s disease.
• There is a need for improvement in the accuracy of detection of
Parkinson’s disease.
• Outpatient integrated PD care models may improve patient‐
reported health‐related quality of life compared with standard
care.
Problem Statement
To develop a web application to detect whether a person has Parkinson’s disease
on the basis of input voice features, using supervised machine learning models.
Input: The training data belong to 20 PWP (6 female, 14 male) and 20
healthy individuals (10 female, 10 male) who appealed at the Department of
Neurology in Cerrahpasa Faculty of Medicine, Istanbul University. From all
subjects, multiple types of sound recordings (26 voice samples including
sustained vowels, numbers, words and short sentences) are taken. A group of
26 linear and time-frequency based features is extracted from each voice
sample. These voice features are used to train the machine learning models
and are given as input for prediction.
Output: Presence of Parkinson’s disease or not
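The sketch below shows how such a feature table might be loaded and split for training; the file name train_data.csv and the "class" label column are assumptions used only for illustration, not the actual layout of the dataset.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical layout: one row per voice sample, 26 extracted voice features,
# and a "class" column (1 = Parkinson's disease, 0 = healthy).
data = pd.read_csv("train_data.csv")

X = data.drop(columns=["class"])   # the 26 voice features
y = data["class"]                  # presence of Parkinson's disease

# Hold out 20% of the samples for testing, matching the split used later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)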
Proposed System
The system proposes a method for detecting Parkinson’s disease from
voice features using supervised machine learning classification
algorithms – K-Nearest Neighbors, Support Vector Machines, Random
Forest and Decision Trees.
A web application, where data about patients can be recorded on a
daily basis and accessed and reviewed by doctors to monitor the
condition of the patients, is also proposed.
Requirement Engineering
Hardware Requirements:
• Processor (CPU) with a frequency of 2 gigahertz (GHz) or above
• Minimum of 4 GB of RAM and a minimum of 2 GB of available space on
the hard disk
• Internet connection: broadband connection with a speed of 2 Mbps or
higher
• NVIDIA GeForce GTX graphics card
Software Requirements:
• Jupyter Notebook (an open-source web application used to create and
share documents)
• Programming Language – Python
• Python Libraries for machine learning
• Operating System - Windows 7 / Windows 8 / Windows 10
• HTML, CSS, Bootstrap, JavaScript and Flask
Conceptual/Analysis Modelling
Scenarios
A new patient can provide his/her voice parameters to test whether
he/she has Parkinson’s disease. If he/she is diagnosed with PD, he/she
can register a new account. Existing patients can log in to their
accounts and provide daily records for the doctor’s reference. Doctors
can log in to their accounts and check their patients’ daily records.
If they notice any significant changes, they can interact with the
patients and set up an appointment at a suitable time.
Use Case Diagram
Sequence Diagram
Sequence Diagram – Doctors in PD System
Activity Diagram for Patients
Activity Diagram for Doctors
State Diagram for Patients
State Diagram for Doctors
Class Diagram
Software Requirements Specification
Functional Requirements:
• Classify whether or not a person is suffering from Parkinson’s disease based
on the input voice features.
• Users of the web application should be authenticated whenever they log
into the system.
• Only the patient and their concerned doctor have the right to view the
patient’s history data.
Non-Functional Requirements:
• The machine learning models developed should predict whether a person is
suffering from Parkinson’s disease with high accuracy.
• Provide security by removing access for a user if he/she fails to
authenticate the account several times.
• An easy-to-use user interface.
Project Scheduling
System Design
• System Architecture
Component Design / Module Decomposition
Machine Learning Components
• Dataset collection: In this module, we collect the data from
UCI dataset archives. This dataset contains the information of
voice parameters and the presence of Parkinson’s disease.
• Data Cleaning: In this module data cleaning is done to
prepare the data for analysis by removing or modifying the
data that may be incorrect, incomplete, duplicated or
improperly formatted.
• Feature Extraction: This is done to reduce the number of
attributes in the dataset, thereby providing advantages such as
faster training and improved accuracy.
• Model training: In this module we use supervised classification
algorithms such as KNN, SVM, Random Forest and XGBoost to train the
model on the cleaned dataset after dimensionality reduction.
• Testing the trained model: In this module we test the trained
machine learning model using the test dataset.
• Performance Evaluation: In this module, we evaluate the
performance of trained machine learning model using
performance evaluation criteria such as F1 score, accuracy
and classification error. In case the model performs poorly,
we optimize the machine learning algorithms to improve the
performance.
• Prediction of Parkinson’s disease: In this module we use
trained and optimized machine learning model to predict
whether the patient has Parkinson’s disease or not using the
voice features.
Web Application Components
• Login : This module is used to check credentials and provide
access to the application.
• Registration: This module is used to register new patients in
the web application database. After successful registration
patient can login from next time.
• Patient profile: This module provides information about the
patient and his/her previous prediction test results.
• Disease Prediction: This module is used to predict whether
the patient has Parkinson’s disease or not using the trained
machine learning model based on the patient’s voice feature
inputs.
• Doctor Profile: This module provides information about the
doctor and his/her associated patients.
• Patient History: This module provides past test history of
the patient.
• Appointment: This module helps the patient book multiple
appointments with his/her associated doctor. The doctor can then
accept or reject these appointments. Both patient and doctor can
also view their scheduled appointments.
• Information Page: This module provides information about
Parkinson’s disease such as symptoms, mental health issues,
advancements about cure and research.
Module Description - Dataset Collection

● We collected the dataset from the dataset archives of the UCI
Machine Learning Repository.
● The main dataset used was the Istanbul dataset of voice recordings
of Parkinson’s patients.
● The dataset contains the voice parameters of the patients and
whether or not the presence of Parkinson’s disease was found.
Module Description - Data Cleaning

● The dataset was cleaned by removing rows or entries which
fulfilled one of the following criteria:
1. Presence of null values in the row
2. Duplicate row entry
3. Outlier elimination – improper values, i.e. values outside the
specified range or an incorrect data type in one of the columns
● Columns were renamed if the naming was not clear or did not give
sufficient context (a pandas sketch of these steps follows).
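This is a minimal sketch, assuming the raw Istanbul PD dataframe is already loaded as data; the percentile band used for outlier removal and the renamed columns are illustrative assumptions rather than the project's exact rules.

import pandas as pd

# data: the raw Istanbul PD dataframe loaded earlier (assumed).
data = data.drop_duplicates()          # criterion 2: duplicate row entries
data = data.dropna()                   # criterion 1: rows containing null values

# Criterion 3: outlier elimination. Keeping values inside the 1st-99th
# percentile band is an illustrative rule, not the project's exact threshold.
numeric = data.select_dtypes(include="number")
low, high = numeric.quantile(0.01), numeric.quantile(0.99)
data = data[((numeric >= low) & (numeric <= high)).all(axis=1)]

# Rename columns that do not give sufficient context (illustrative names only).
data = data.rename(columns={"col_26": "jitter_local", "col_27": "shimmer_local"})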
Module Description - Feature Extraction

● Reduced the number of attributes or columns present in the dataset.
● K-Means clustering and the silhouette score were used to select the
clusters. From the selected clusters and the correlation analysis
done during EDA, the number of attributes was reduced from 28 to 13
(a cluster-selection sketch follows).
● Outliers were eliminated to improve the accuracy of the priority
found for each attribute.
● Histograms, heat maps and scatter plots were used to compare
attributes with one another and with the class label.
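The sketch assumes the cleaned feature matrix is available as X; the candidate range of 2 to 10 clusters is an illustrative assumption.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# X: cleaned feature matrix (assumed available from the data-cleaning step).
X_scaled = StandardScaler().fit_transform(X)

# Try a range of cluster counts and keep the one with the best silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    score = silhouette_score(X_scaled, labels)
    if score > best_score:
        best_k, best_score = k, score

print("selected number of clusters:", best_k, "silhouette score:", best_score)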
Module Description - Model training

● Supervised classification algorithms were used to train the model
on the cleaned dataset after feature extraction and dimensionality
reduction (an equivalent library-based sketch follows the list).
● The algorithms used include:
1. K-Nearest Neighbour
2. Support Vector Machine
3. Random Forest
4. XGBoost
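This sketch trains library versions of the four classifiers with scikit-learn and XGBoost, assuming the cleaned splits are available as X_train and y_train; the project's own from-scratch implementations appear in the later slides, and the hyperparameter values here are illustrative.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# X_train, y_train: cleaned, feature-reduced training split (assumed available).
models = {
    "knn": KNeighborsClassifier(n_neighbors=3),
    "svm": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "xgboost": XGBClassifier(n_estimators=25, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)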
Module Description - Performance evaluation

● The test dataset was formed by doing a 20-80 split on the cleaned
dataset.
● Evaluation was done on the test results for accuracy, recall,
precision and classification error using the machine learning models
on the test dataset, as discussed previously.
● The performance evaluation criteria include the F1 score, accuracy
and classification error (a short evaluation sketch follows).
● In case one of the models performs poorly, we optimise the machine
learning algorithms to improve the performance.
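Continuing the library-based training sketch above, the held-out 20% split (assumed available as X_test and y_test) can be scored against these criteria as follows.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# models: dictionary from the training sketch above.
for name, model in models.items():
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    print(name,
          "accuracy:", acc,
          "precision:", precision_score(y_test, preds),
          "recall:", recall_score(y_test, preds),
          "F1:", f1_score(y_test, preds),
          "classification error:", 1 - acc)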
Module Description - Prediction of
Parkinson’s disease
● The trained and optimised machine learning model with the best
accuracy is selected to predict whether the user has Parkinson’s
disease.
● The input taken from the user is in terms of voice parameters,
which have to be entered as numerical values.
● The result of this prediction is used to decide whether the patient
is asked to sign in and book an appointment with a doctor.
Module Description - Login module

● The login module is used by a user to sign in to his or her
existing account.
● The login module can be used by both patients and doctors.
● Authentication in the form of a password is required in order to
access the details of any patient or doctor (a hashed-password
sketch follows).
● This authentication is necessary since data privacy for both
patients and doctors is given the utmost priority.
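This is a minimal sketch using werkzeug.security, which is commonly used alongside Flask; the in-memory users dictionary and helper functions are hypothetical stand-ins for the project's actual user database.

from werkzeug.security import generate_password_hash, check_password_hash

# Hypothetical in-memory user store; the project would use its web-application
# database instead.
users = {}

def register_user(username, password):
    users[username] = generate_password_hash(password)   # never store plain text

def authenticate(username, password):
    stored_hash = users.get(username)
    return stored_hash is not None and check_password_hash(stored_hash, password)

register_user("patient01", "s3cret")
print(authenticate("patient01", "s3cret"))   # True: access granted
print(authenticate("patient01", "wrong"))    # False: access denied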
Module Description - Registration

● The registration module is used by users to create a new account so
that they can access the website.
● Patients need to create an account in order to test whether they
have Parkinson’s disease.
● Doctors need to create an account in order to access a patient’s
details and for patients to book an appointment with the doctor.
Module Description - Patient profile

● Provides information about the patient.
● This information needs to be filled in by the patient at the time
of account registration.
● The result of the Parkinson’s disease test will also be present,
and the patient’s history of tests taken will be recorded depending
on the result of the initial test.
Module Description - Disease Prediction

● Used to predict whether the patient has Parkinson’s disease.
● The patient needs to provide input for their voice feature
parameters as numerical values.
● The prediction is done by the trained machine learning model
selected for the best accuracy (a sketch of such a prediction route
follows).
● Depending on the result, the patient will be able to set up
appointments with a doctor.
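This is a minimal Flask sketch; the model file best_model.pkl, the form field names and the result.html template are assumptions used only for illustration, not the project's actual artifacts.

from flask import Flask, request, render_template
import joblib
import numpy as np

app = Flask(__name__)
model = joblib.load("best_model.pkl")   # hypothetical path to the selected trained model

# Hypothetical names of the reduced set of 13 voice-feature form fields.
FEATURES = ["feature_%d" % i for i in range(1, 14)]

@app.route("/predict", methods=["POST"])
def predict():
    # Read each numerical voice-feature value submitted through the form.
    values = [float(request.form[name]) for name in FEATURES]
    result = model.predict(np.array(values).reshape(1, -1))[0]
    verdict = ("Parkinson's disease detected" if result == 1
               else "Parkinson's disease not detected")
    return render_template("result.html", verdict=verdict)   # hypothetical template

if __name__ == "__main__":
    app.run(debug=True)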
Module Description - Doctor Profile

● Provides information about the doctor and the list of his/her
associated patients.
● This information needs to be filled in by the doctor at the time of
account registration.
● The list of patients is updated as and when patients book an
appointment with the doctor.
Module Description - Patient History

● Provides information about the patient and his/her past history of
tests taken and their results.
● This information can be accessed by the corresponding doctor with
whom the patient has an appointment.
● This can help when a new doctor wants to look at a patient’s past
history and come to a suitable conclusion about his/her health.
Module Description - Appointments

● The patient can book one or multiple appointments with a doctor of
his/her choice.
● The doctor can choose to accept or reject a patient’s appointment.
● When the doctor accepts the appointment, the corresponding patient
can see this in his/her scheduled appointments.
● The doctor can also check his/her scheduled appointments with all
of his/her patients on his/her page.
Module Description - Information Page

● This page provides information about Parkinson’s disease and acts
as a bulletin to spread awareness and dispel myths about the
disease.
● It includes the symptoms, issues regarding mental health, and the
latest advancements in the cure and research on the disease.
● It is shown to and can be read by users who are not registered or
logged in, since it is important to spread social awareness about
the disease and its impact on patients’ lives.
Interface Design

• User: The user is a registered person on the web application who is
undergoing a test to determine whether he/she has Parkinson’s disease
and who records their daily activities.
• Task: To determine whether the user is suffering from Parkinson’s
disease. If the user is suffering from Parkinson’s disease, they can
request an appointment with a doctor. Users can also record their
daily activities.
• Environmental Analysis: The interface is provided through a web
application, so users can perform their tasks anywhere with an
internet connection. There are no constraints on the web application
through the physical medium, such as space, light and noise. The human
factor involved in the environment is the user input, which is the
voice features for the prediction of Parkinson’s disease.
USER INTERFACE DESIGN ARCHITECTURE
PICTORIAL REPRESENTATION OF USER INTERFACE
Data Structure Design
List : Lists are used to store multiple items in a single
variable.
● In KNN, lists are used to store the Euclidean, Manhattan and
Hamming distances. They also store the error rate and the nearest
neighbors.
● In SVM, a list is used to store the loss, which includes the data
points exceeding the margin region of the hyperplane even with bias.
Lists also store the C and gamma values in GridSearchCV.
● In Random Forest, a list is used to store tree-filter pairs, i.e.
each tree model and the column filter associated with it.
● In XGBoost, lists are used to store the max_depth, alpha, subsample
and learning-rate values.
Array : Array is a data structure consisting of a
collection of elements, each identified by at least one
array index or key
• In SVM, it is used to store margin value, misclassified
points, prediction value and accuracy scores
• In Random Forest, it is used to store the number of
columns and bagged data for input data
DataFrame : DataFrame is a collection of series where
each series represents a record in the dataset.
• In all the algorithms, input data- train and test data is
stored in the form of a dataframe
Dictionary : A dictionary is a general-purpose data structure
for storing a group of objects. A dictionary is an ordered or
unordered list of key-element pairs, where keys are used to
locate elements in the list.
• In both SVM and XGBoost, dictionary is used to store
the parameter values for optimization
Algorithm Design - KNN
Steps:
1. Split the dataset into test and train datasets.
2. Iterate through each row in the training dataset.
3. For each row, find the distance between the test data and each row
of training data. The distance metric used can be Euclidean, Hamming
or Manhattan.
4. Sort the calculated distances in ascending order based on the
distance.
5. Get the first k rows from the sorted array; the k value by default
is taken as 3.
6. The prediction is taken as the most frequently appearing result
among the first k rows.
7. The best accuracy among all 3 distance metrics is taken.
KNN Optimisation - Minimising error rate
Steps:
1. Iterate through each value of k from 1 to 40.
2. For each value of k, find the accuracy of the model on the test
dataset.
3. Plot the graph of error rate vs k value, and find a suitable value
of k where both the error rate and k are relatively low.
4. Consider that value of k and find the accuracy for each distance
metric.
5. The highest accuracy among the three can be taken as the optimised
accuracy of the trained model.
KNN - Pseudo Code
PURPOSE: Prediction of presence of Parkinson’s disease.
INPUT: Voice features obtained from cleaned dataset after
dimensionality reduction.
OUTPUT: Presence of Parkinson’s disease or not.

STEPS :
1. Calculate accuracy and error rate for different values of k,
plot the error rate vs k graph and find the suitable value of k
for which error rate is low
2. For the selected value of k, find the accuracy for all distance
metrics and choose the metric with the best accuracy as the
selected training model.
Implementation of KNN Algorithm

import operator
import numpy as np


class distanceMetrics:
    def __init__(self):
        pass

    def euclideanDistance(self, vector1, vector2):
        self.vectorA, self.vectorB = vector1, vector2
        if len(self.vectorA) != len(self.vectorB):
            raise ValueError("Undefined for sequences of unequal length.")
        distance = 0.0
        for i in range(len(self.vectorA)):      # compare every feature of the two rows
            distance += (self.vectorA[i] - self.vectorB[i]) ** 2
        return distance ** 0.5

    def manhattanDistance(self, vector1, vector2):
        self.vectorA, self.vectorB = vector1, vector2
        if len(self.vectorA) != len(self.vectorB):
            raise ValueError("Undefined for sequences of unequal length.")
        return np.abs(np.array(self.vectorA) - np.array(self.vectorB)).sum()

    def hammingDistance(self, vector1, vector2):
        self.vectorA, self.vectorB = vector1, vector2
        if len(self.vectorA) != len(self.vectorB):
            raise ValueError("Undefined for sequences of unequal length.")
        return sum(el1 != el2 for el1, el2 in zip(self.vectorA, self.vectorB))


class kNNClassifier:
    def __init__(self, k=3, distanceMetric='euclidean'):
        # Store the defaults; predict() can override them per call.
        self.k = k
        self.distanceMetric = distanceMetric

    def fit(self, xTrain, yTrain):
        assert len(xTrain) == len(yTrain)
        self.trainData = xTrain
        self.trainLabels = yTrain

    def getNeighbors(self, testRow):
        calcDM = distanceMetrics()
        distances = []
        for i, trainRow in enumerate(self.trainData):
            if self.distanceMetric == 'euclidean':
                distances.append([trainRow, calcDM.euclideanDistance(testRow, trainRow), self.trainLabels[i]])
            elif self.distanceMetric == 'manhattan':
                distances.append([trainRow, calcDM.manhattanDistance(testRow, trainRow), self.trainLabels[i]])
            elif self.distanceMetric == 'hamming':
                distances.append([trainRow, calcDM.hammingDistance(testRow, trainRow), self.trainLabels[i]])
        distances.sort(key=operator.itemgetter(1))      # sort rows by distance, ascending
        neighbors = []
        for index in range(self.k):
            neighbors.append(distances[index])
        return neighbors

    def predict(self, xTest, k, distanceMetric):
        self.testData = xTest
        self.k = k
        self.distanceMetric = distanceMetric
        predictions = []
        for i, testCase in enumerate(self.testData):
            neighbors = self.getNeighbors(testCase)
            output = [row[-1] for row in neighbors]
            prediction = max(set(output), key=output.count)   # majority vote among the k neighbours
            predictions.append(prediction)
        return predictions


# Error rate of the classifier for k = 1..39 on the held-out test split.
error_rate = []
for i in range(1, 40):
    knn = kNNClassifier(k=i)
    knn.fit(xTrain, yTrain)
    pred_i = knn.predict(xTest, i, 'euclidean')
    error_rate.append(np.mean(np.array(pred_i) != np.array(yTest)))
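The error-rate list computed above can then be plotted against k to pick the low, flat region of the curve described in the optimisation steps (a minimal matplotlib sketch):

import matplotlib.pyplot as plt

# Plot the error rate for k = 1..39 and pick a k in the low, flat region.
plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, marker='o')
plt.xlabel('k value')
plt.ylabel('error rate')
plt.title('Error rate vs k value')
plt.show()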
ALGORITHM DESIGN - SVM
Steps:
1. Split the dataset into train and test datasets.
2. Normalise the attribute values in the train dataset.
3. Initialise the C, gamma, bias, weight, train and test dataset variables.
4. Create the kernel matrix for the train dataset attribute values.
5. For the selected number of epochs:
* Calculate the margin value set by the boundary for the train dataset
attribute values.
* If the margin is less than 1, consider it a misclassification.
* For each misclassification, calculate the number of data points which
exceed the margin threshold and store it as a loss in a loss array.
6. After the epochs have run, plot the graph of the loss and print the score
achieved by comparing the predicted target values with the actual test target
values for validation.
ALGORITHM - SVM OPTIMISATION

Algorithm:
Steps
1. Initialize the C and gamma values for SVM in the form of a list.
2. Use the GridSearchCV method along with the specified C and gamma
values to optimize the SVM algorithm. GridSearchCV tries all the
combinations of the values passed in the dictionary and evaluates the
model for each combination using cross-validation. We use this
function to get the accuracy/loss for every combination of
hyperparameters and then choose the one with the best performance to
set the parameters of the model. A short sketch is given below.
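The sketch uses scikit-learn's SVC as a stand-in for the custom SVM class shown later and assumes the cleaned splits are available as X_train, y_train, X_test and y_test; the candidate C and gamma values are illustrative.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate hyperparameter values (illustrative).
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf'],
}

# GridSearchCV evaluates every combination with k-fold cross-validation and
# refits the best one on the full training split.
grid = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy', refit=True)
grid.fit(X_train, y_train)

print(grid.best_params_)           # best C/gamma combination found
print(grid.score(X_test, y_test))  # accuracy of the refitted model on the test split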
SVM - PSEUDO CODE
PURPOSE: Prediction of presence of Parkinson’s disease.
INPUT: Voice features obtained from cleaned dataset after dimensionality
reduction.
OUTPUT: Presence of Parkinson’s disease or not.

STEPS:
1. Generate hyperplanes which segregates the classes in the best possible
way. There are many hyperplanes that might classify the data. We should
look for the best hyperplane that represents the largest separation, or
margin, between the two classes.
2. We choose the hyperplane so that distance from it to the support vectors
on each side is maximized. If such a hyperplane exists, it is known as
the maximum margin hyperplane. This divides the target values and hence
does the classification
IMPLEMENTATION OF SVM
import numpy as np
import matplotlib.pyplot as plt


class SVMPrimalProblem:
    def __init__(self, C=1.0, kernel='rbf', sigma=.1, degree=2):
        if kernel == 'poly':
            self.kernel = self._polynomial_kernel
            self.c = 1                  # constant term of the polynomial kernel
            self.degree = degree
        else:
            self.kernel = self._rbf_kernel
            self.sigma = sigma
        self.C = C
        self.w = None
        self.b = None
        self.X = None
        self.y = None
        self.K = None

    def get_params(self, deep=False):
        return {'C': self.C}

    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self

    def _polynomial_kernel(self, X1, X2):
        # Polynomial kernel (assumed form; its definition was not shown in the original excerpt).
        return (self.c + X1.dot(X2.T)) ** self.degree

    def _rbf_kernel(self, X1, X2):
        return np.exp(-(1 / self.sigma ** 2) *
                      np.linalg.norm(X1[:, np.newaxis] - X2[np.newaxis, :], axis=2) ** 2)

    def __decision_function(self, X):
        return self.w.dot(self.kernel(self.X, X)) + self.b

    def __margin(self, X, y):
        return y * self.__decision_function(X)

    def score(self, X, y):
        prediction = self.predict(X)
        return np.mean(y == prediction)

    def predict(self, X):
        return np.sign(self.__decision_function(X))

    def fit(self, X, y, lr=1e-5, epochs=500):
        self.w = np.random.randn(X.shape[0])
        self.b = 0
        self.X = X
        self.y = y
        self.K = self.kernel(X, X)
        loss_array = []
        for _ in range(epochs):
            margin = self.__margin(X, y)
            misclassified_pts_idx = np.where(margin < 1)[0]
            # Gradient step on the weights and bias using only the misclassified points.
            d_w = self.K.dot(self.w) - self.C * y[misclassified_pts_idx].dot(self.K[misclassified_pts_idx])
            self.w = self.w - lr * d_w
            d_b = -self.C * np.sum(y[misclassified_pts_idx])
            self.b = self.b - lr * d_b
            # Hinge-loss objective recorded for the loss-per-epoch plot.
            loss = (1 / 2) * self.w.dot(self.K.dot(self.w)) + self.C * np.sum(np.maximum(0, 1 - margin))
            loss_array.append(loss)
        plt.plot(loss_array)
        plt.title("loss per epochs")
        plt.show()
ALGORITHM DESIGN - RANDOM FOREST
n_estimators = the required number of trees in the Random Forest.
Step 1: Split the dataset into train and test datasets.
Step 2: Choose a set of columns randomly drawn according to the number
requested by the input data.
Step 3 (bagging data): Choose random rows to populate a bootstrapped
dataset, maintaining the correlation between the train and test data.
Step 4: Create a decision tree with the given depth of tree (set while
initializing the Random Forest algorithm).
Step 5: Fit the new decision tree from step 4 with the bagged train and
test data; only the matching index values are filtered from the dataset.
Step 6: Use the list of tree models built in step 5 to predict the data
and score the accuracy of the predicted data against the test input
values.
RANDOM FOREST - PSEUDO CODE
PURPOSE: Prediction of presence of Parkinson’s disease.
INPUT: Voice features obtained from cleaned dataset after
dimensionality reduction.
OUTPUT: Presence of Parkinson’s disease or not.

STEPS:
1. Select random data points from the training set. Build the decision
trees associated with the randomly selected data points. Choose the
number of decision trees that you want to build.

2. For new data points, find the prediction of each decision tree,
assign the new data points to the category that has the majority vote,
and compute the accuracy.
IMPLEMENTATION OF RANDOM FOREST
import collections
import numpy as np
import pandas as pd


class random_forest_classifier:

    def __init__(self, n_trees=10, max_depth=None, n_features='sqrt', mode='rfnode', seed=None):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.n_features = n_features
        self.tree_filter_pairs = []
        self.mode = mode
        if seed:
            self._seed = seed
            np.random.seed(seed)

    def find_number_of_columns(self, X):
        # Number of columns each tree is allowed to see.
        if isinstance(self.n_features, int):
            return self.n_features
        if self.n_features == 'sqrt':
            return int(np.sqrt(X.shape[1]) + 0.5)
        if self.n_features == 'div3':
            return int(X.shape[1] / 3 + 0.5)
        raise ValueError("Invalid n_features selection")

    def get_bagged_data(self, X, y):
        # Bootstrap sample: draw len(X) rows with replacement.
        index = np.random.choice(np.arange(len(X)), len(X))
        return X[index], y[index]

    def randomize_columns(self, X):
        num_col = self.find_number_of_columns(X)
        filt = np.random.choice(np.arange(0, X.shape[1]), num_col, replace=False)
        filtered_X = self.apply_filter(X, filt)
        return filtered_X, filt

    def apply_filter(self, X, filt):
        filtered_X = X.T[filt]
        return filtered_X.T

    def fit(self, X, y):
        X = self.convert_to_array(X)
        y = self.pandas_to_numpy(y)
        try:
            self.base_filt = [x for x in range(X.shape[1])]
        except IndexError:
            self.base_filt = [0]
        for _ in range(self.n_trees):
            filt = self.base_filt
            bagX, bagy = self.get_bagged_data(X, y)
            if self.mode == 'rftree':
                bagX, filt = self.randomize_columns(bagX)
            # decision_tree_classifier is a companion class assumed to be
            # defined elsewhere in the project; it is not part of this excerpt.
            new_tree = decision_tree_classifier(self.max_depth, mode=self.mode, n_features=self.n_features)
            new_tree.fit(bagX, bagy)
            self.tree_filter_pairs.append((new_tree, filt))

    def predict(self, X):
        # Majority vote across the bagged trees (assumed implementation; this
        # method was not shown in the original excerpt but is needed by score()).
        X = self.convert_to_array(X)
        votes = np.array([tree.predict(self.apply_filter(X, filt))
                          for tree, filt in self.tree_filter_pairs])
        return [collections.Counter(votes[:, i]).most_common(1)[0][0]
                for i in range(votes.shape[1])]

    def score(self, X, y):
        pred = self.predict(X)
        correct = 0
        for i, j in zip(y, pred):
            if i == j:
                correct += 1
        return float(correct) / float(len(y))

    def pandas_to_numpy(self, x):
        if isinstance(x, (pd.DataFrame, pd.Series)):
            return x.to_numpy()
        if isinstance(x, np.ndarray):
            return x
        return np.array(x)

    def handle_1d_data(self, x):
        if x.ndim == 1:
            x = x.reshape(-1, 1)
        return x

    def convert_to_array(self, x):
        x = self.pandas_to_numpy(x)
        x = self.handle_1d_data(x)
        return x
ALGORITHM DESIGN - XGBOOST

Steps:
1. Split the dataset into train and test datasets
2. Construct a base tree using the initial training set.
3. Calculate the similarity weight and total gain for each split and
use total gain to decide priorities of the features.
4. Continue this process to form the complete decision tree.
5. Update the values of the residuals for the initial dataset.
6. Create a new decision tree in the same way as done earlier but
using the updated residual values.
7. Keep updating the residual values after each iteration of
decision-tree creation and repeat the process until the required
number of iterations has been successfully performed.
ALGORITHM – XGBOOST OPTIMISATION
Algorithm:
Steps
1. Write down the parameters that we want to consider; the best ones
will be selected from these.
2. Create a RandomizedSearchCV object, fit the data for various values
of the parameters and print the accuracy. RandomizedSearchCV samples a
fixed number of combinations (n_iter) from the values passed and
evaluates the model for each sampled combination. We use this function
to get the accuracy for every sampled combination of parameters and
choose the one that gives the best accuracy and performance.
XGBOOST - PSEUDO CODE
PURPOSE: Prediction of presence of Parkinson’s disease.
INPUT: Voice features obtained from cleaned dataset after dimensionality
reduction.
OUTPUT: Presence of Parkinson’s disease or not.

STEPS:
1. Create the initial base decision tree using the similarity weight
and total gain for each split on the training set.
2. Update the residual values and create new decision trees using
these new values. In this way the new learners learn from the
residuals of the previous model and reduce them in subsequent models.
IMPLEMENTATION OF XGBOOST
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import accuracy_score

# X, y: cleaned voice-feature matrix and class labels prepared earlier.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=123)

# Baseline model with 10 boosting rounds.
xgb_clf = xgb.XGBClassifier(random_state=123)
xgb_clf.set_params(n_estimators=10)
xgb_clf.fit(X_train, y_train)
preds = xgb_clf.predict(X_test)
accuracy = float(np.sum(preds == y_test)) / y_test.shape[0]
print("Baseline accuracy:", accuracy)

# Feature importance by split count and by total gain.
matplotlib.rcParams['figure.figsize'] = (10.0, 8)
xgb.plot_importance(xgb_clf)
xgb.plot_importance(xgb_clf, importance_type="gain")
plt.show()

# Cross-validated baseline with the native xgb.cv interface.
df_dmatrix = xgb.DMatrix(data=X, label=y)
params = {"objective": "binary:logistic", 'max_depth': 3}
xgb_cv = xgb.cv(dtrain=df_dmatrix, params=params, nfold=3, num_boost_round=10,
                metrics="error", seed=123)
accuracy = 1 - xgb_cv["test-error-mean"].iloc[-1]
print("baseline cv accuracy:", accuracy)

xgb_cv = xgb.cv(dtrain=df_dmatrix, params=params, nfold=3, num_boost_round=40,
                early_stopping_rounds=10, metrics="error", seed=123)
accuracy = 1 - xgb_cv["test-error-mean"].iloc[-1]
print("accuracy:", accuracy)

# Manual tuning of individual hyperparameters.
xgb_clf = xgb.XGBClassifier(n_estimators=25, random_state=123)
xgb_clf.set_params(max_depth=10)
xgb_clf.fit(X_train, y_train)
preds = xgb_clf.predict(X_test)
print(accuracy_score(y_test, preds))

xgb_clf.set_params(colsample_bytree=0.5)
xgb_clf.fit(X_train, y_train)
preds = xgb_clf.predict(X_test)
print(accuracy_score(y_test, preds))

xgb_clf.set_params(subsample=0.75)
xgb_clf.fit(X_train, y_train)
preds = xgb_clf.predict(X_test)
print(accuracy_score(y_test, preds))

xgb_clf.set_params(gamma=0.25)
xgb_clf.fit(X_train, y_train)
preds = xgb_clf.predict(X_test)
print(accuracy_score(y_test, preds))

xgb_clf.set_params(learning_rate=0.3)
xgb_clf.fit(X_train, y_train)
preds = xgb_clf.predict(X_test)
print(accuracy_score(y_test, preds))

xgb_clf.set_params(reg_alpha=0.01)
xgb_clf.fit(X_train, y_train)
preds = xgb_clf.predict(X_test)
print(accuracy_score(y_test, preds))

# Randomised search over a wider hyperparameter space.
rs_param_grid = {
    'max_depth': list(range(3, 12)),
    'alpha': [0, 0.001, 0.01, 0.1, 1],
    'subsample': [0.5, 0.75, 1],
    'learning_rate': np.linspace(0.01, 0.5, 10),
    'n_estimators': [10, 25, 40],
}
xgb_clf = xgb.XGBClassifier(random_state=123)
xgb_rs = RandomizedSearchCV(estimator=xgb_clf, param_distributions=rs_param_grid, cv=3, n_iter=5,
                            verbose=2, random_state=123)
xgb_rs.fit(X_train, y_train)
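Once the randomized search finishes, the chosen hyperparameters and their cross-validated score can be read back (a short usage sketch):

print(xgb_rs.best_params_)   # hyperparameter combination selected by the search
print(xgb_rs.best_score_)    # mean cross-validated accuracy for that combination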
Testing
Unit Testing:
Login (Patient & Doctor)
Test case 1:
Test cases which are checked when a Patient or Doctor tries to login from
their respective login page

Test 1 – Input: Username and password both valid. Expected: Display the home page of the patient or doctor. Actual: The home page of the patient or doctor is displayed. Result: Pass
Test 2 – Input: Username is valid but the password is not, or vice versa. Expected: Display an error message indicating the username and password combination is incorrect. Actual: An error message indicating the username and password combination is incorrect is displayed. Result: Pass
Test 3 – Input: Username is empty or password is empty. Expected: Display an error message indicating the username or password is empty. Actual: An error message indicating the username or password is empty is displayed. Result: Pass
Registration (Patient & Doctor)
Test case 2:
Test cases which are covered when a Patient or Doctor tries to register on the
website

Test 1 – Input: All the required fields are provided by the patient or doctor. Expected: A new patient or doctor is registered on the website. Actual: A new patient or doctor is registered on the website. Result: Pass
Test 2 – Input: Some of the required fields are missing. Expected: Display an error message indicating which required fields are missing. Actual: An error message indicating the missing required fields is displayed. Result: Pass
Test 3 – Input: The email is already present on the website. Expected: Display an error message that the patient or doctor already exists in the database. Actual: An error message that the patient or doctor already exists in the database is displayed. Result: Pass
Appointment Booking (Patient)
Test case 3:
Test cases which are covered when a Patient tries to book an appointment with
a doctor

Test 1 – Input: The patient selects a doctor from the list of doctors to book an appointment, with all required fields filled. Expected: An appointment request is sent to the doctor. Actual: An appointment request is sent to the doctor. Result: Pass
Test 2 – Input: Any required field is missing. Expected: No appointment is booked and an alert to enter all fields is displayed. Actual: No appointment is booked and an alert to enter all fields is displayed. Result: Pass
Appointment Acceptance (Doctor)
Test case 4:
Test cases which are covered when a doctor accepts an appointment

Test 1 – Input: The doctor accepts the appointment request. Expected: The appointment is accepted and its status is updated to accepted. Actual: The appointment is accepted and its status is updated to accepted. Result: Pass
Test 2 – Input: The doctor does not accept the appointment. Expected: The appointment stays pending. Actual: The appointment stays pending. Result: Pass
Info And Activity( Patient and Doctor)
Test case 5:
Test cases which cover the patient info tab in the site.

Test 1 – Input: A patient visits a particular doctor. Expected: Correct information of the patient and the associated doctor is shown. Actual: Correct information of the patient and the associated doctor is shown. Result: Pass
Test 2 – Input: The patient history of a particular patient is viewed by a doctor. Expected: The doctor can view patient data only for those patients he/she is associated with. Actual: The doctor can view patient data only for those patients he/she is associated with. Result: Pass
Integrated manual testing
Test 1 – Input: Attributes entered by a patient suffering from Parkinson’s disease. Expected: Parkinson’s is detected. Actual: Parkinson’s is detected. Result: Pass
Test 2 – Input: Attributes entered by a healthy person. Expected: Parkinson’s is not detected. Actual: Parkinson’s is not detected. Result: Pass
Test 3 – Input: The user leaves a field empty while entering the voice attribute values. Expected: Generate the error “empty field detected”. Actual: The model does not process the input because of the wrong number of attributes passed, and an alert is shown to enter values for all fields. Result: Pass
Test 4 – Input: The user enters a negative value as one of the input values for prediction. Expected: Generate the error “negative value entered”. Actual: The model processes the input with the negative values, and an alert is shown to enter only positive numerical values. Result: Pass
Applications
• To help in early detection of Parkinson’s disease, which would
enable early diagnosis and thus slow down disease progression.
• To provide machine learning models which increase the accuracy of
detection of Parkinson’s disease.
• To help or aid current research towards finding a cure for
Parkinson’s disease.
• To implement a web application that takes in voice features,
predicts whether the person is suffering from Parkinson’s disease,
and monitors patients on a day-to-day basis.
Conclusion
This project aims to provide a better prediction for
Parkinson’s disease by optimizing and tuning the
parameters of KNN, SVM and Random Forest
Algorithms. This project also provides a reliable health
monitoring system for those suffering from Parkinson’s
disease. The proposed web application also provides a
real time appointment and feedback system which helps
in gathering valuable data which aids the current
research to find a cure for Parkinson’s disease.
CONFERENCE PROCEEDINGS
Thanks to the guidance of Dr. Suneetha K R, our paper “Detection of
Parkinson’s Disease using Machine Learning” has been accepted at the
CoMeSySo 2021 conference, to be published by Springer.
References
[1]B. E. Sakar et al., "Collection and Analysis of a Parkinson Speech Dataset With Multiple
Types of Sound Recordings," in IEEE Journal of Biomedical and Health Informatics, vol. 17,
no. 4, pp. 828-834, July 2013, doi: 10.1109/JBHI.2013.2245674.
[2]Benba, A., Jilbab, A. & Hammouch, A. Analysis of multiple types of voice recordings in
cepstral domain using MFCC for discriminating between patients with Parkinson’s disease
and healthy people. Int J Speech Technol 19, 449–456 (2016).
[3]Ö. Eskıdere, A. Karatutlu and C. Ünal, "Detection of Parkinson's disease from vocal features
using random subspace classifier ensemble," 2015 Twelve International Conference on
Electronics Computer and Computation (ICECCO), Almaty, 2015, pp. 1-4, doi:
10.1109/ICECCO.2015.7416886.
[4]Nissar, Iqra & Rizvi, Danish & Masood, Sarfaraz & Mir, Aqib. (2018). Voice-Based Detection
of Parkinson’s Disease through Ensemble Machine Learning Approach: A Performance Study.
EAI Endorsed Transactions on Pervasive Health and Technology. 5. 162806. 10.4108/eai.13-
7-2018.162806.
[5]Y. Li, C. Zhang, Y. Jia, P. Wang, X. Zhang and T. Xie, "Simultaneous learning of speech
feature and segment for classification of Parkinson disease," 2017 IEEE 19th International
Conference on e-Health Networking, Applications and Services (Healthcom), Dalian, 2017,
pp. 1-6, doi: 10.1109/HealthCom.2017.8210820.
[6]Caliskan, Abdullah & Badem, Hasan & Basturk, Alper & Yüksel, Meltem. (2017). Diagnosis
of the Parkinson disease by using deep neural network classifier. Istanbul University - Journal
of Electrical and Electronics Engineering. 17. 3311-3318.
[7]Berus L, Klancnik S, Brezocnik M, Ficko M. Classifying Parkinson's Disease Based on
Acoustic Measures Using Artificial Neural Networks. Sensors (Basel). 2018;19(1):16.
Published 2018 Dec 20. doi:10.3390/s19010016
[8]Mahnaz Behroozi, Ashkan Sami, "A Multiple-Classifier Framework for Parkinson’s Disease
Detection Based on Various Vocal Tests", International Journal of Telemedicine and
Applications, vol. 2016, Article ID 6837498, 9 pages, 2016.
https://doi.org/10.1155/2016/6837498
[9]Y. N. Zhang, "Can a Smartphone Diagnose Parkinson Disease? A Deep Neural Network
Method and Telediagnosis System Implementation", Parkinson’s Disease, vol. 2017, Article
ID 6209703, 11 pages, 2017. https://doi.org/10.1155/2017/6209703
[10]Cantürk, İ., Karabiber, F. A Machine Learning System for the Diagnosis of Parkinson’s
Disease from Speech Signals and Its Application to Multiple Speech Signal Types. Arab J Sci
Eng 41, 5049–5059 (2016). https://doi.org/10.1007/s13369-016-2206-3
[11]L. Ali, C. Zhu, Z. Zhang and Y. Liu, "Automated Detection of Parkinson’s Disease Based on
Multiple Types of Sustained Phonations Using Linear Discriminant Analysis and Genetically
Optimized Neural Network," in IEEE Journal of Translational Engineering in Health and
Medicine, vol. 7, pp. 1-10, 2019, Art no. 2000410, doi: 10.1109/JTEHM.2019.2940900.
[12]Zhang HH, Yang L, Liu Y, et al. Classification of Parkinson's disease utilizing multi-edit
nearest-neighbor and ensemble learning algorithms with speech samples. Biomed Eng Online.
2016;15(1):122. Published 2016 Nov 16. doi:10.1186/s12938-016-0242-6
[13]Khemphila, A. , Boonjing, V. (2012). 'Parkinsons Disease Classification using Neural
Network and Feature Selection'. World Academy of Science, Engineering and Technology,
Open Science Index 64, International Journal of Mathematical and Computational Sciences,
6(4), 377 - 380.
[14]Chen, Huiling & Huang, Chang-Cheng & Yu, Xin-Gang & Xu, Xin & Sun, Xin & Wang, Su-
Jing. (2013). An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-
nearest neighbor approach. Expert Systems with Applications. 40. 263–271.
10.1016/j.eswa.2012.07.014.
[15]A. Tsanas, M. A. Little, P. E. McSharry and L. O. Ramig, "Accurate Telemonitoring of
Parkinson's Disease Progression by Noninvasive Speech Tests," in IEEE Transactions on
Biomedical Engineering, vol. 57, no. 4, pp. 884-893, April 2010, doi:
10.1109/TBME.2009.2036000.
[16]Gürüler, H. A novel diagnosis system for Parkinson’s disease using complex-valued artificial
neural network with k-means clustering feature weighting method. Neural Comput & Applic
28, 1657–1666 (2017). https://doi.org/10.1007/s00521-015-2142-2
[17]Sakar, C. Okan & Serbes, Gorkem & Gunduz, Aysegul & Tunc, Hunkar & Nizam, Hatice &
Sakar, Betul & Tutuncu, Melih & Aydin, Tarkan & Isenkul, Muhammed & Apaydin, Hulya.
(2018). A comparative analysis of speech signal processing algorithms for Parkinson’s disease
classification and the use of the tunable Q-factor wavelet transform. Applied Soft Computing.
74. 10.1016/j.asoc.2018.10.022.
[18]Khan SU. Classification of Parkinson’s Disease Using Data Mining Techniques. J Parkinsons
Dis Alzheimer Dis. 2015;2(1): 4.
[19]Sheibani R, Nikookar E, Alavi SE. An Ensemble Method for Diagnosis of Parkinson's
Disease Based on Voice Measurements. J Med Signals Sens. 2019;9(4):221-226. Published
2019 Oct 24. doi:10.4103/jmss.JMSS_57_18
[20]Bhattacharya, Ipsita & Bhatia, Mohinder :Pal Singh. (2010). SVM classification to
distinguish Parkinson disease patients. 14. 10.1145/1858378.1858392.
