
Proceedings of the International Conference on Sustainable Computing and Data Communication Systems (ICSCDS-2023)

IEEE Xplore Part Number: CFP23AZ5-ART; ISBN: 978-1-6654-9199-0

Analysis and Implementation of Machine Learning Model for Detection of Parkinson’s Disease

2023 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS) | 978-1-6654-9199-0/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICSCDS56580.2023.10104661


1st Kumar Spandan, Amity Inst. of Information Technology, Amity University Jharkhand, Ranchi, INDIA, kumar.spandan@s.amity.edu
2nd Biresh Kumar, Amity School of Engg. and Technology, Amity University Jharkhand, Ranchi, INDIA, bkumar@rnc.amity.edu
3rd Pallab Banerjee, Amity School of Engg. and Technology, Amity University Jharkhand, Ranchi, INDIA, pbanerjee@rnc.amity.edu
4th Pooja Jha, Amity Inst. of Information Technology, Amity University Jharkhand, Ranchi, INDIA, pjha@rnc.amity.edu
5th Laxmi Kumari Pathak, Amity Inst. of Information Technology, Amity University Jharkhand, Ranchi, INDIA, lkpathak@rnc.amity.edu
6th Mohan Kumar Dehury, Amity Inst. of Information Technology, Amity University Jharkhand, Ranchi, INDIA, mkdehury@rnc.amity.edu, mohankdehury@gmail.com

Abstract—Parkinson’s disease, which affects 2-3% of people aged 65 years and older, is the second most prevalent neurodegenerative disorder. Its neuropathological hallmarks are neuronal loss in the substantia nigra, which results in striatal dopamine deficiency, and intracellular inclusions containing aggregates of α-synuclein. Many other cell types throughout the central and peripheral autonomic nervous systems are also involved, probably from the early stages of the disease. Although bradykinesia and other cardinal motor features are required for clinical diagnosis, Parkinson’s disease is also accompanied by a wide range of non-motor symptoms that add to overall disability. Numerous pathways and mechanisms, including α-synuclein proteostasis, mitochondrial function, oxidative stress, calcium homeostasis, axonal transport, and neuroinflammation, are involved in the underlying molecular etiology. This paper aims to implement the SVM algorithm to predict Parkinson’s disease in a patient.

Index Terms—Parkinson’s Disease, SVM, KNN, ANN, Machine Learning.

I. INTRODUCTION

The brain is the body’s primary functioning unit, and even a minor mishap in any part of the body has an immediate impact on the other organs. Parkinson’s disease is one such condition that often goes undiagnosed. It is an incurable neurological disorder that develops over time. An estimated 9.4 million people worldwide were living with this disease by 2020. Only 4% of the cases occur in people under 50 years of age; the disease mainly affects adults over the age of 60. The condition is associated with both motor and non-motor symptoms. Motor symptoms of Parkinson’s disease include sluggishness, tremors, rapid eye movement problems, shivering, gait difficulty, and unstable posture. The most common non-motor complaints are hypotension, body sweating, fatigue, constipation, urinary difficulties, and weight loss.

Several studies have found that 90% of Parkinson’s disease patients have speech and voice acoustic difficulties, including microphonic, singular-chromatic, dysarthria, and dysphonia, and that the first symptom discovered in persons with this condition is a lack of voice. There is currently no curative therapy for the condition, but a range of medications that can greatly lessen symptoms, especially in the initial stages, is available. Voice-frequency analysis is brief and non-invasive, so voice frequency can be used to monitor the course of the disease [1].

Parkinson’s disease causes additional problems, some of which are manageable with medication or other forms of therapy. Parkinson’s disease is a major global health concern whose burden might be greatly reduced with the aid of machine learning and innovative computer algorithms in the field of medical science.

II. PROBLEM STATEMENT

A. Importance of the study

The study is significant because a large portion of diagnostic data analysis in the healthcare sector is carried out by medical professionals. Because of the complexity and variety of the characteristics in the data being handled, the interpretation of medical images is restricted to specialists with in-depth knowledge of the field. Covering information gathering, feature extraction, feature-set selection, different classifiers, and comparison of results, the main contributions of the paper are to (a) provide a comprehensive review that includes the latest research publications and (b) provide a range of comparisons from various points of view.

Neurodegenerative disorders, such as Parkinson’s disease, affect the nervous system and brain. Parkinson’s disease is characterized by degeneration of the nervous system, specifically

978-1-6654-9199-0/23/$31.00 ©2023 IEEE 14


Authorized licensed use limited to: AMITY University. Downloaded on July 25,2023 at 10:54:55 UTC from IEEE Xplore. Restrictions apply.

brain neuron degeneration. Among neurodegenerative diseases, only Alzheimer’s disease is more common than Parkinson’s disease. The most affected neurons are the dopamine-producing neurons of the substantia nigra, a specific portion of the brain. The loss of these neurons impairs the control of voluntary movements. The disorder can cause both motor and non-motor symptoms, such as tremors, sluggish movement, sleep problems, posture problems, sadness, and others [2].

A number of machine learning models, including SVM, KNN, ANN, and Random Forest, have been proposed and have assisted in the early detection of Parkinson’s disease. Given that symptomatic therapy may be ineffective and time-consuming, a neuroimaging-based diagnosis of Parkinson’s disease may also be acceptable [3]. Besides other medical scans such as Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), and Functional Magnetic Resonance Imaging (fMRI), Single-photon Emission Computed Tomography (SPECT) is the functional imaging technique most commonly used in European clinics for the early diagnosis of Parkinson’s disease [4].

Clinical decisions are often based on a physician’s intuition and knowledge, supported by hospital-wide data. The large volume of information in such data sets calls for approaches that can support therapeutic choices, and this data should be assessed for medical research and for use in health centers.

B. Background of the Study

The brain is the main functioning unit of the body, and any small incident in any part of the human body will have an immediate impact on the other organs. Parkinson’s disease is an incurable neurological condition that worsens over time. By 2020, an estimated 9.4 million people worldwide were living with this condition. Only 4% of the cases occur in people under 50 years of age, and the condition mainly affects people over the age of 60. The symptoms of this disease are classified as motor or non-motor. Slowness of movement, tremors, rapid eye movement disorder, shivering, gait difficulty, and unstable posture are the most common motor symptoms.

Currently, there is no curative therapy for the condition, but a variety of medications are available that can greatly reduce symptoms, especially in the initial stages. Voice-frequency analysis is brief and non-invasive, so voice frequency can be used to monitor the course of the disease [3].

C. The Problem

The prevalence of Parkinson’s disease is estimated at 120-180 per 100,000 persons, and the proportion (and thus the number of affected people) increases as life expectancy increases. For decades, researchers have worked to understand this disease and to find techniques for minimizing its symptoms, which are often recurrent muscular tremors and/or rigidity. In the later stages of Parkinson’s syndrome, further symptoms such as akinesia, bradykinesia, and speech impairment may arise. Symptoms usually begin modestly and worsen over time. As the disease progresses, people may have difficulty speaking and walking. They may also encounter behavioral and mental problems, sleep troubles, depression, memory loss, and fatigue [5].

D. Solution

Parkinson’s disease is a major global health concern whose burden might be greatly reduced with the aid of machine learning and innovative computer algorithms in the field of medical science. Algorithms such as a support vector classifier can be developed to detect it at an early stage.

III. OBJECTIVE AND GOAL

The goal of this work is to compare different existing machine learning models for detecting Parkinson’s disease and to implement our own model for the same task.

A. Goal

To detect Parkinson’s disease using vocal features.

B. Specific Objectives

The objectives are:
• To compare the different machine-learning models
• To analyze their accuracy scores
• To implement our SVM model

IV. LITERATURE REVIEW

Many authors have studied Parkinson’s disease prediction using machine learning methods. Paper [2] provides a complete survey of the research papers published up to the year 2017. Based on vocal datasets obtained from the UCI repository, it compares the accuracy of existing classifiers and validates the performance of implemented classifiers. In [6], the authors reviewed and addressed numerous issues while also providing future recommendations and opportunities; in particular, this review offers significant insights and guidance for future advancements in neural networks and associated learning systems.

The authors in [7] characterized this disorder by tremors, muscle rigidity, and abnormal walking motions. Whereas earlier attempts distinguished Parkinson’s disease from healthy subjects, their work focused on differentiating Parkinson’s disease from other neurological diseases such as Huntington’s disease and amyotrophic lateral sclerosis (ALS) based on gait characteristics. The authors in [4] suggested an ML model that classifies a given DaTSCAN as Parkinson’s disease or not and also offers a reasonable basis for the prediction. Visual indicators


created by the Local Interpretable Model-Agnostic Explainer (LIME) are used to provide this reasoning. Transfer learning was used to train a CNN (VGG16) on DaTSCANs from the Parkinson’s Progression Markers Initiative.

The authors in paper [8] described how, due to its widespread applicability, speech signal processing has received a great deal of attention in recent years. They carried out a comparative investigation of machine learning classifiers for the efficient identification of Parkinson’s disease from a vocal disorder known as dysphonia. To demonstrate a robust identification technique, they used Artificial Neural Networks (ANN) and K Nearest Neighbours (KNN) algorithms to distinguish between Parkinson’s disease patients and healthy individuals.

The authors in [9] used Artificial Neural Networks (ANN) to diagnose medical disorders. Two types of ANNs are employed in that work for effective Parkinson’s disease diagnosis: a Multilayer Perceptron (MLP) with the back-propagation learning method and a Radial Basis Function (RBF) network. These ANNs were used to distinguish, on the basis of clinical factors, between samples (N = 195) with and without Parkinson’s disease, and the study detailed the difference in accuracy obtained in both scenarios.

The study in [10] created an enhanced fuzzy min-max neural network with the OneR attribute evaluator (EFMM-OneR) as a hybrid model for diagnosing Parkinson’s disease. The proposed model is separated into two stages. In the first stage, feature selection is used to locate and remove irrelevant, redundant, or noisy characteristics from the provided dataset. In the second stage, the enhanced fuzzy min-max (EFMM) neural network is used for classification. Compared with other classifiers in the literature, the EFMM-OneR model was shown to increase classification accuracy.

In paper [8], feature selection and classification algorithms have been studied separately without considering the relationship between the two procedures, resulting in poor performance. To address this, an algorithm learning-based neural network (ALBNN) has been proposed, a novel neural network approach that boosts classification accuracy by combining the feature selection and classification operations. In general, a knowledge-based artificial neural network builds on prior knowledge derived from domain expertise, which gives it better starting points for the objective function and results in higher classification accuracy.

Machine learning algorithms are frequently employed in the analysis of various types of data [11], [12]. However, as large-scale data sets proliferate, many machine-learning approaches suffer from intractability. Although several factors contribute to intractability, one of the most important is the number of features, or dimensionality. A learning system’s data dimensionality influences both the training and run-time phases, and excessive dimensionality may lead to the curse of dimensionality.

V. METHODOLOGY

In this paper, we primarily use three major machine learning algorithms to detect Parkinson’s disease in a patient: SVM, ANN, and KNN. Our main concern is to implement a working model that detects this disease from a set of feature vectors. The support vector machine algorithm has been used because it is expected to give an accuracy score above 75%. The artificial neural network (ANN) and the k-nearest neighbor (KNN) algorithm have also been included in this study to understand how well these machine learning models identify Parkinson’s disease. Several Python libraries, such as scikit-learn (including its SVM module), pandas, and NumPy, were used in the implementation. The development environment is Google Colab, and the programming language is Python.

A. System Requirements and Tools

Tools and Software Used:
• Software: Google Colab
• Operating System: Windows 10 Pro
• RAM: 8GB
• Cache Memory: 32MB
• SSD: 256GB
• Processor: Intel Core i3 (10th generation)
• Language: Python 3.10

B. How SVM Works

SVM works by mapping the data to a high-dimensional feature space, which allows data points to be categorized even if they are not otherwise linearly separable. When a separator between these categories is found, the data is transformed so that the separator can be represented as a hyperplane. After that, the attributes of the transformed data can be used to determine which category a new record should be assigned to. The data points closest to the hyperplane are called the support vectors.

A broad range of mathematical tools is used to select features. Thirteen separate features, such as crossing frequencies [13], peak-to-peak attributes [5], and frequency bands, were computed for each of the seven gait variables, producing a total of 13 × 7 = 91 distinct features. These initial features were then screened before being evaluated with the SVM, and the set was reduced by running the SVM on each feature individually. C=1 and sigma=1 were used in the radial basis kernel SVM.

The SVM is a non-probabilistic binary classifier that classifies new samples. It is a


supervised, kernel-based approach that first analyses data with known categories before classifying unlabeled test samples. The linear SVM is the most elementary form of this classifier: instances are arranged as points in space and divided into two classes by a clear gap whose width is maximised [14]. For classification or regression, it constructs a hyperplane in a high- (possibly infinite-) dimensional space. The hyperplane with the largest separation between the two classes generally gives a low classification error.
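As a minimal sketch of the maximum-margin idea (not the configuration used in this work), the snippet below fits scikit-learn's SVC with a linear kernel to six hand-made two-dimensional points, a hypothetical toy dataset, and reads back the support vectors and the margin width 2/‖w‖. The large value C=1e6 is an assumption chosen so that the soft-margin solver approximates a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable classes in 2-D.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],   # class -1
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin linear SVM.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                   # normal vector of the separating hyperplane
margin = 2.0 / np.linalg.norm(w)   # width of the gap between the two classes

print("support vectors:\n", clf.support_vectors_)
print("margin width:", round(margin, 3))
print("prediction for (2, 2.5):", clf.predict([[2.0, 2.5]])[0])
```

For these points the widest gap lies along the diagonal direction, so the margin should come out close to 5/√2 ≈ 3.54, and the support vectors should be the three points nearest the boundary: (1, 0), (0, 1), and (3, 3).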
Support vectors are the data points that determine this hyperplane; they are the points nearest to it. SVM can be applied in either a linear or a non-linear fashion. When a linear boundary in the original feature space does not give a good fit, a non-linear SVM produces better results: it finds the maximum-margin hyperplane after transforming the feature space with the kernel technique. For classification, a Gaussian radial basis function (RBF) kernel is employed in this work. Consider a set of n data points $(x_1, y_1), \dots, (x_n, y_n)$ [7], where $x_i$ is the original data, $y_i \in \{+1, -1\}$ is a binary label denoting the class to which $x_i$ belongs, and $\phi(x_i)$ is the transformed data in the new feature space of the linear SVM. The kernel function is related to this transformation by

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \qquad (1)$$

where $K(x_i, x_j)$ is the kernel function and $\phi(x_i)$ and $\phi(x_j)$ are the transformed data points in the higher-dimensional feature space. For the Gaussian (RBF) kernel, the kernel function is

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right) \qquad (2)$$

where $\lVert x_i - x_j \rVert^2$ is the squared Euclidean distance and $\sigma$ is a free parameter. In this work, the value of C is chosen by a grid search, with C = 1000 giving the lowest classification error. The data points were scaled to zero mean and unit variance before training. The separating hyperplane was found using sequential minimal optimization, with the tolerance level of the optimality conditions set to 3. The data were divided into training and testing samples, with the first seven samples of each category (PD and control) used for training and the remaining samples used for testing. The process flow of the proposed method is shown in Fig. 1.

Fig. 1. Process flow diagram of proposed method

C. Role of SVM

SVMs are utilized in a variety of applications, including web page classification, intrusion detection, face identification, email categorization, and gene classification. We use SVMs in machine learning for several reasons. SVM is useful for both classification and regression, and it supports both linear and non-linear data. SVMs can also uncover intricate relationships in the data without requiring extensive manual manipulation. They are a good choice for smaller datasets, including those with tens of thousands to hundreds of thousands of features, and because they handle small, complex datasets well, they often produce more accurate results than other algorithms.

VI. IMPLEMENTATION

A. Importing Major Python Libraries for Machine Learning

NumPy is a Python package for working with arrays. It also provides functions for linear algebra and matrices. NumPy is short for Numerical Python. Pandas is a Python library for data handling and analysis; it provides data structures and operations for manipulating numerical tables and, in particular, time series. Scikit-learn (sklearn) is a robust and widely used Python machine-learning package. It provides a variety of tools for statistical modeling and machine learning, including classification, regression, and others, via a consistent Python interface. SVM is used in web page classification, intrusion detection, face identification, gene classification, and handwriting recognition; here we apply it to detecting Parkinson’s disease. SVM supports both classification and regression of linear and non-linear data. The sklearn.metrics module implements several loss, score, and utility functions to measure performance. Some metrics might require probability estimates of the positive class, confidence


values, or binary decision values. The accuracy_score function of the sklearn.metrics package computes the accuracy of a set of predicted labels against the true labels. Sklearn’s preprocessing package provides common utility functions and transformer classes that convert raw feature vectors into a representation that is more suitable for downstream estimators. In general, standardizing the dataset benefits many learning algorithms.
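To illustrate that accuracy_score computes the ratio of correct predictions to all predictions, the following minimal sketch compares a hand count with the sklearn.metrics result. The two label lists are hypothetical, with 1 standing for Parkinson’s disease and 0 for healthy.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels (1 = Parkinson's, 0 = healthy).
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

# Accuracy by hand: number of correct predictions / total number of predictions.
correct = sum(t == p for t, p in zip(y_true, y_pred))
manual_accuracy = correct / len(y_true)

# The same quantity from sklearn.metrics.
sk_accuracy = accuracy_score(y_true, y_pred)

print(correct, len(y_true), manual_accuracy, sk_accuracy)  # 6 8 0.75 0.75
```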

StandardScaler() standardizes each feature by removing its mean and scaling it to unit variance. This is done independently for each feature, using the empirical mean and standard deviation estimated from the data. The train_test_split() function divides arrays or matrices into random training and testing subsets; we import it into our Google Colab notebook. Using pd.read_csv('/content/parkinsons.csv'), we import our Parkinson’s disease dataset, in which each row holds the attributes of one sample used for classification; it will be used to predict whether a person has Parkinson’s disease or not. The dataset used here is from Kaggle.com [15]. parkinsons_data.head() shows the first five rows of the imported dataset, as shown in Fig. 2.

Fig. 2. Top rows of the dataset.

Fig. 3. Dataset information.

Fig. 4. Information about missing values.
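The split performed by train_test_split() can be sketched as follows. This is a minimal sketch that assumes the data has 195 rows and 22 feature columns once the name and status columns are removed; random placeholder values stand in for the real CSV, which is not loaded here. With test_size=0.2, scikit-learn puts 39 rows in the test set and 156 in the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the assumed shape of the Parkinson's data:
# 195 rows and 22 feature columns (the values are random, not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(195, 22))
Y = rng.integers(0, 2, size=195)

# 20% of the rows go to the test set; random_state fixes the shuffle.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=2)
print(X_train.shape, X_test.shape)  # (156, 22) (39, 22)

# Re-running with the same random_state reproduces exactly the same split.
X_train2, X_test2, _, _ = train_test_split(X, Y, test_size=0.2, random_state=2)
print(np.array_equal(X_test, X_test2))  # True
```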

B. Displaying Data information

We can see details about the dataset by using the info function, with the syntax parkinsons_data.info(). The info method prints the data frame information, which includes the total number of columns, their labels, data types, memory used, range index, and the number of non-null values in each column, as shown in Fig. 3.

C. Displaying number of null values in each field attribute

The isnull().sum() function shows the number of missing values in each column. As shown in Fig. 4, there are no missing values in the dataset. It is important for all field attributes in the dataset to have values, because complete data plays a vital role in achieving high accuracy in both training and testing.

D. Describing the data in a statistical manner

The dataset consists of 195 samples and 24 columns (22 voice features plus the name and status columns). describe() displays statistical measures drawn from our data, such as count, mean, std, and min, as shown in Fig. 5. These help us understand the data before training.

Fig. 5. Different statistical measures of the dataset.

E. Displaying status and mean of the two binary statuses

Fig. 6 shows the two values of the status column, 0 and 1, together with the mean of the features for each status: ‘1’ represents Parkinson’s disease and ‘0’ represents no Parkinson’s disease. parkinsons_data['status'].value_counts() shows how many rows have status ‘1’ and how many have status ‘0’. From the data, we can interpret that healthy people have a higher mean vocal frequency than people with Parkinson’s disease.
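A minimal sketch of the inspection calls above (head(), info(), isnull().sum(), describe(), value_counts(), and a per-status mean via groupby) is shown below on a hypothetical five-row DataFrame. Only the column names name, MDVP:Fo(Hz), and status are assumed to follow the real file; the values are invented to illustrate the calls and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for parkinsons.csv (invented values).
parkinsons_data = pd.DataFrame({
    "name": ["s1", "s2", "s3", "s4", "s5"],
    "MDVP:Fo(Hz)": [120.0, 122.0, 197.0, 117.0, 202.0],
    "status": [1, 1, 0, 1, 0],
})

print(parkinsons_data.head())                    # first rows
parkinsons_data.info()                           # dtypes and non-null counts
print(parkinsons_data.isnull().sum())            # missing values per column
print(parkinsons_data.describe())                # count, mean, std, min, ...
print(parkinsons_data["status"].value_counts())  # rows per status value

# Mean of each numeric feature for status 0 (healthy) and 1 (Parkinson's).
class_means = parkinsons_data.drop(columns="name").groupby("status").mean()
print(class_means)
```

The name column is dropped before groupby(...).mean() because recent pandas versions raise an error when asked to average a text column.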


Fig. 6. Status and mean of the two binary statuses.

F. Using drop() command

In this phase, data pre-processing starts. We separate the features from the target: the ‘features’ are the other columns in the dataset and the ‘target’ is the status column. Two variables, X and Y, are created: X holds all the feature columns and Y holds the status column. To build X, we drop the ‘name’ and ‘status’ columns from the dataset, as shown in Fig. 7.

Fig. 7. Parkinson’s disease status.

G. Splitting the data into training and testing data

To train and evaluate the model, four arrays are generated: X_train, Y_train, X_test, and Y_test. We create them with the train_test_split function imported earlier, which splits the data into training and testing arrays. Here test_size=0.2 means that 20% of the data is used for evaluation and the remaining 80% for training. The argument random_state=2 fixes the seed of the random number generator, so the same split is reproduced on every run; a different value would give a different, but equally random, split.

H. Data Standardization

For data standardization, we assign an instance of StandardScaler() to the scaler variable, as shown in Fig. 8, and use it to bring our data to a common scale. Next, scaler.fit() estimates the mean and standard deviation of each feature, and scaler.transform() converts the data to that common scale. After this step the values of X_train are centred on zero with unit variance, so they are concentrated around the range -1 to +1, whereas previously some values in the dataset were in the hundreds. The obtained output is shown in Fig. 9.

Fig. 8. Use of scaler function.

Fig. 9. Range of output values between -1 and +1.

I. Model Training

In this stage, we train a machine learning model, an SVM (Support Vector Machine), on our data. SVM can be used for both classification and regression; here we use it for classification, because we classify patients as Parkinson-positive or Parkinson-negative.
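Steps F to I, together with the evaluation and single-sample prediction described in the following sections, can be sketched end to end as below. This is a minimal sketch under stated assumptions: make_classification generates a synthetic stand-in with 195 rows and 22 features instead of reading parkinsons.csv, and the scaler is fitted on the training split only, a common practice that the text does not spell out. The accuracies it prints therefore say nothing about the figures reported in this paper.

```python
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for parkinsons.csv (assumption): 195 rows, 22 numeric
# features, a 'name' column and a binary 'status' target (1 = Parkinson's).
features, target = make_classification(
    n_samples=195, n_features=22, n_informative=8, random_state=0)
parkinsons_data = pd.DataFrame(features, columns=[f"f{i}" for i in range(22)])
parkinsons_data.insert(0, "name", [f"sample_{i}" for i in range(195)])
parkinsons_data["status"] = target

# F. Separate the features (X) from the target (Y) as NumPy arrays.
X = parkinsons_data.drop(columns=["name", "status"]).to_numpy()
Y = parkinsons_data["status"].to_numpy()

# G. 80/20 split; random_state=2 makes the shuffle reproducible.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=2)

# H. Learn mean and std on the training split, then apply them to both splits.
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# I. Train a support vector classifier with a linear kernel.
model = svm.SVC(kernel="linear")
model.fit(X_train, Y_train)

# Evaluation: accuracy = correct predictions / all predictions, on each split.
train_acc = accuracy_score(Y_train, model.predict(X_train))
test_acc = accuracy_score(Y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.4f}, test accuracy: {test_acc:.4f}")

# Prediction for one sample: reshape to a single row, standardize, predict.
input_data = X[0]
std_data = scaler.transform(input_data.reshape(1, -1))
prediction = model.predict(std_data)[0]
print("Parkinson's" if prediction == 1 else "healthy")
```

Converting X to a NumPy array before splitting keeps the later reshape(1, -1) call consistent with the scaler, which was fitted without column names.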


model.fit() function displays the parameters for support vector


classifiers. It will fit our data points into the model and will
try to find the hyperplane. When the data can be split using
a single line, or when it is linearly separable, a linear kernel Fig. 12. Accuracy score of testing data.
is utilized. It is one of the most often utilized kernels. It is
typically employed when a given data set has a sizable number
of features. In our case, as we are classifying patients into a person has Parkinson’s or not. Here we have taken the field
Parkinson-positive patients and Parkinson-negative patients, values from the .csv file that holds different values and has a
we find linear kernel to be most suitable. As shown in Fig. status of either 0 or 1. We had previously deliberately removed
10, we have used kernel=’linear’ for training purposes. After the status column while training the data and hence we will be
this, the training of our data is done. able to find whether our model is predicting it correctly. Here
we took a NumPy array as the processing of NumPy array is
much easier. Upon reshaping the array (to give one data point),
we will standardize the data using Scaler. transform(). At last,
we are using prediction using the model. predict(std data).
Fig. 10. SVC kernel Here, if the print statement is ‘1’, the person has Parkinson
and if it is ‘0’, the person is healthy. As shown in Fig. 13, we
have correctly predicted that the patient is Parkinson-positive.
J. Model Evaluation
This is the model evaluation stage where we find out the
accuracy score of our training data and testing data. The
accuracy score in machine learning is a statistical measurement
that compares the proportion of accurate predictions made by
a model to all predictions made. It is determined by dividing
the number of correct predictions by the total number of Fig. 13. Parkinson’s disease prediction.
predictions which is given as follows:
number of correct predictions
Accuracy score = (3)
total number of predictions VII. R ESULT A NALYSIS
Here the training data accuracy achieved was 0.8846 or A. KNN
88.46 in percentage as shown in Fig. 11. An interpretation
K-NN could be a statistic classification technique
of this could be that our model is able to predict 88 cases
employed in machine learning [16]. The end result of the
successfully out of 100 cases. Generally, if the accuracy score
KNN technique is set by the style of output needed for the
particular applications. If K is set to 1, the genre of that
Fig. 11. Accuracy score of training data.

is more than 75%, we assume that the model is working well. Training the model takes 4 seconds.

Next, the accuracy score on the test data is computed by calling model.predict() on X_test and comparing the predictions with Y_test. The test accuracy comes out to 0.8717 (87.17%), as shown in Fig. 12. The model should achieve nearly the same accuracy on both sets. A very wide gap between training and test accuracy indicates a poor fit: if the training accuracy is much higher than the test accuracy, the model has likely overfitted, whereas if the accuracy is low even on the training data, the model has likely underfitted.

K. Predictive System
The final stage of this work is the Parkinson’s prediction system. Here we use different input values to predict whether

object’s sole nearest neighbor is allocated. An efficient diagnosis system for Parkinson’s disease (paralysis agitans) based on the fuzzy k-nearest neighbor method was proposed in [10]. It uses the same speech-impairment dataset described above for detecting Parkinson’s disease. The subjects ranged in age from 46 to 85 years (mean 65.8, standard deviation 9.8). On average, six phonations lasting 1 to 36 seconds were recorded for each individual. The data were then divided into training (70%) and test (30%) sets [1].

The k-nearest neighbor classifier, which stores all of the training samples, was used. The authors employed ten-fold cross-validation, a widely used technique for estimating a predictive system’s accuracy and guarding against overfitting. Each new sample was classified from its k nearest neighbours, which were determined with a distance function; using the cosine distance, the best accuracy score of 79% was obtained with k = 1 [8].

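A 1-nearest-neighbour classifier typically scores 100% on its own training data, because every training sample is its own nearest neighbour, so one accuracy figure says little about fit. The training-versus-test comparison described with Figs. 11 and 12 can be written as a small rule. The sketch below is stdlib-only Python; the 75% floor comes from the text, while the 10-point gap tolerance and the names `accuracy` and `assess_fit` are hypothetical choices, not the authors’ code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels (same argument
    order as scikit-learn's accuracy_score)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def assess_fit(train_acc, test_acc, min_acc=0.75, max_gap=0.10):
    """Rough diagnosis from a training/test accuracy pair.

    min_acc follows the 75% threshold mentioned in the text; max_gap is a
    hypothetical tolerance for the train-test difference.
    """
    if train_acc < min_acc:
        return "underfitting"   # the model cannot even fit its training data
    if train_acc - test_acc > max_gap:
        return "overfitting"    # fits training data far better than unseen data
    return "acceptable"         # similar, sufficiently high accuracies


# Hypothetical accuracy pairs, not results from this paper.
for train_acc, test_acc in [(1.00, 0.70), (0.60, 0.58), (0.90, 0.87)]:
    print(train_acc, test_acc, assess_fit(train_acc, test_acc))
```

For a fitted model, the inputs would be `accuracy(Y_train, model.predict(X_train))` and `accuracy(Y_test, model.predict(X_test))`.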
TABLE I
COMPARISON OF ML MODELS

ML model    Accuracy score
KNN         79%
ANN         83.02%
SVM         87.17%

B. ANN

In recent years, Artificial Neural Network (ANN)-based diagnosis of clinical disorders has received a great deal of attention. Two types of ANN have been used to classify Parkinson’s disease for diagnosis: the Multilayer Perceptron (MLP) trained with the backpropagation learning algorithm, and the Radial Basis Function (RBF) network. The ANNs were trained to distinguish, from medical parameters, between samples (195 in total) with and without Parkinson’s disease [17]. An ANN is a dynamical system with a directed-graph topology that processes information through its state response to input. The nodes of the graph are processing elements, joined by directed connections. The accuracy score obtained using ANN is 83.02% [18].

C. SVM

Using the SVM model on the dataset discussed in the previous section, the accuracy score is 87.17%, as shown in Fig. 12. Table I presents the accuracy scores of the machine learning models employed in this paper.

VIII. CONCLUSION

In this paper, we have identified and implemented three supervised machine learning algorithms in our evaluation. The performance of the three classifiers in predicting Parkinson’s disease was then evaluated using various statistical approaches. In the tests, each classification algorithm was trained and tested on data that included both positive and negative samples. The preliminary results show that the SVM outperformed the other two classifiers on the Parkinson’s dataset.

In the future, this study can support the development of an automated program that responds more precisely to typical presentations of the disease and supports more informed decisions in complex situations. Such a program could diagnose Parkinson’s disease in a matter of minutes and warn users of the possibility of developing the disease. This could be especially beneficial in areas with a shortage of healthcare institutions and physicians. The model can be scaled further by gathering data locally from many clinical and medical institutes. Ensemble methods may be preferable for building an accurate Parkinson’s disease prediction model and could further improve performance.

REFERENCES

[1] M. Wodzinski, A. Skalski, D. Hemmerling, J. R. Orozco-Arroyave, and E. Nöth, “Deep learning approach to Parkinson’s disease detection using voice recordings and convolutional neural network dedicated to image classification,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2019, pp. 717–720.
[2] G. Pahuja and T. Nagabhushan, “A comparative study of existing machine learning approaches for Parkinson’s disease detection,” IETE Journal of Research, vol. 67, no. 1, pp. 4–14, 2021.
[3] M. Ricci, G. Di Lazzaro, A. Pisani, N. B. Mercuri, F. Giannini, and G. Saggio, “Assessment of motor impairments in early untreated Parkinson’s disease patients: The wearable electronics impact,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 1, pp. 120–130, 2019.
[4] P. R. Magesh, R. D. Myloth, and R. J. Tom, “An explainable machine learning model for early detection of Parkinson’s disease using LIME on DaTSCAN imagery,” Computers in Biology and Medicine, vol. 126, p. 104041, 2020.
[5] Y. Matsumoto, M. Seki, T. Ando, Y. Kobayashi, H. Iijima, M. Nagaoka, and M. G. Fujie, “Analysis of EMG signals of patients with essential tremor focusing on the change of tremor frequency,” in 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2012, pp. 2244–2250.
[6] A. Rana, A. Dumka, R. Singh, M. K. Panda, N. Priyadarshi, and B. Twala, “Imperative role of machine learning algorithm for detection of Parkinson’s disease: Review, challenges and recommendations,” Diagnostics, vol. 12, no. 8, p. 2003, 2022.
[7] S. Shetty and Y. Rao, “SVM based machine learning approach to identify Parkinson’s disease using gait analysis,” in 2016 International Conference on Inventive Computation Technologies (ICICT), vol. 2. IEEE, 2016, pp. 1–5.
[8] O. Asmae, R. Abdelhadi, C. Bouchaib, S. Sara, and K. Tajeddine, “Parkinson’s disease identification using KNN and ANN algorithms based on voice disorder,” in 2020 1st International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET). IEEE, 2020, pp. 1–6.
[9] F. S. Gharehchopogh and P. Mohammadi, “A case study of Parkinson’s disease diagnosis using artificial neural networks,” International Journal of Computer Applications, vol. 73, no. 19, 2013.
[10] H.-L. Chen, C.-C. Huang, X.-G. Yu, X. Xu, X. Sun, G. Wang, and S.-J. Wang, “An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach,” Expert Systems with Applications, vol. 40, no. 1, pp. 263–271, 2013.
[11] B. Kumar, S. Roy, A. Sinha, C. Iwendi, and L. Strážovská, “E-commerce website usability analysis using the association rule mining and machine learning algorithm,” Mathematics, vol. 11, no. 1, p. 25, 2022.
[12] R. Vunnava, L. Bodla, M. K. Dehury, and B. K. Mohanta, “Performance analysis of ML techniques in identification of fake news,” in 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS). IEEE, 2022, pp. 276–281.
[13] S. Rissanen, M. Kankaanpää, M. P. Tarvainen, J. Nuutinen, I. M. Tarkka, O. Airaksinen, and P. A. Karjalainen, “Analysis of surface EMG signal morphology in Parkinson’s disease,” Physiological Measurement, vol. 28, no. 12, p. 1507, 2007.
[14] D. Surangsrirat, C. Thanawattano, R. Pongthornseri, S. Dumnin, C. Anan, and R. Bhidayasiri, “Support vector machine classification of Parkinson’s disease and essential tremor subjects based on temporal fluctuation,” in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2016, pp. 6389–6392.
[15] [Online]. Available: https://www.kaggle.com/datasets/vikasukani/parkinsons-disease-data-set
[16] P. Hall, B. U. Park, and R. J. Samworth, “Choice of neighbor order in nearest-neighbor classification,” The Annals of Statistics, vol. 36, no. 5, pp. 2135–2152, 2008.
[17] M. Tsuda, S. Asano, Y. Kato, K. Murai, and M. Miyazaki, “Differential diagnosis of multiple system atrophy with predominant parkinsonism and Parkinson’s disease using neural networks,” Journal of the Neurological Sciences, vol. 401, pp. 19–26, 2019.
[18] A. I. Galushkin, Neural Networks Theory. Springer Science & Business Media, 2007.
