Abstract— Parkinson's is a complex disease whose diagnosis involves numerous physical, mental, and neurological assessments. These assessments include examining various signs and symptoms, exploring the clinical history, and checking the state of the nervous system. In our proposed method, we applied five different classifiers. This paper aims to increase classification accuracy and to optimize the features used from the dataset. For optimization we used a hybrid feature selection method that combines the wrapper and filter methods; the wrapper method is Recursive Feature Elimination. The highest accuracy, 97.43%, was gained by the XGBoost classifier with nine features out of 22.

Keywords— Parkinson, XGBoost, SVM, KNN, Classification

I. INTRODUCTION

Parkinson's disease is a degenerative disorder [1]. A degenerative condition is the continuous decline of the structure of neurons. This illness mostly affects the motor system, the set of central structures in the nervous system that supports movement. The symptoms of Parkinson's disease appear slowly, and non-motor symptoms start to show as the disease worsens. One symptom is difficulty with walking; others are rigidity, slowness of movement, and tremor. These are early symptoms. Cognitive problems can also occur in Parkinson's disease; cognition covers mental activity as well as the process of acquiring knowledge. Behavioral issues and anxiety can also appear when the disease is at an advanced stage. Those who have Parkinson's disease can also have sleep problems and issues with the sensory system. In this disease, the cells of the substantia nigra, which is situated in the midbrain, die.

The World Health Organization (WHO) published Parkinson's disease data in 2018. According to that data, deaths from Parkinson's disease in Bangladesh reached 1,363 [3], or 0.18% of total deaths. The death rate is 1.32 per 100,000 of population. Based on Parkinson's disease deaths, Bangladesh ranks 159 in the world; the highest-ranked country is Ireland.

Machine learning is a field where we study computer algorithms [2], algorithms that improve through experience. Machine learning can be regarded as a part of artificial intelligence. A model is built on sample data, which is called "training data." The machine learning algorithm builds the model to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are valuable for a broad range of applications and are used when it is challenging to design traditional algorithms.

Classification is a prediction method [4]. Classification in machine learning means anticipating the class of a given data point. Classes are known as labels; they are also known as targets or categories.

Feature selection is needed in machine learning to improve the efficiency or the accuracy of the machine learning model. It is used to remove unrelated features, that is, features that are not relevant to the model [5]. The features that we use as training data to train the machine learning model have a large effect on the model's performance.

In our proposed method, we predict Parkinson's disease using a hybrid feature selection method that combines the filter method and the wrapper method. We first normalize the features; then we apply the filter-based feature selection method and select only 11 features. We then apply the RFE method and keep only nine features out of 22. Finally, we apply the XGBoost classifier and obtain an accuracy of 97.43% using the nine features obtained from the hybrid feature selection method.

II. RELATED WORKS

Tarigoppula V.S Sriram et al. [6] applied the Random Forest, KNN, SVM, KStar, and Naïve Bayes classifiers and achieved 90.26%, 88.90%, 82.50%, 89.74%, and 69.23% accuracy, respectively. Among these classifiers, Random Forest shows the highest accuracy, followed by the KStar classifier. The lowest accuracy on the PD dataset was achieved by Naïve Bayes. The dataset is the same as ours [19].

Andres Ortiz et al. [7] used a Convolutional Neural Network (CNN) for classification. The paper proposed the use of isosurfaces as a way to separate the relevant information from 3D DaTSCAN images, to be used as inputs to CNN architectures. This system achieves an accuracy of 95.1% and an AUC of 97%.

Inzamam et al. [8] used the ensemble method with the PASW benchmark and applied six classifiers: KNN, Quest, C5.0, SVM, Logistic Regression, and Bayesian
Network. The paper showed how the ensemble method gave better accuracy than single classifiers: the accuracy was 95.31% on the testing dataset and 98.47% on the training dataset. The ensemble technique overcomes the shortcomings of single classifiers and is used to combine their strengths to predict classes. The proposed method used a dataset collected from UCI [19] that is also used in our proposed method.

Mostafa et al. [9] used three classifiers, namely decision tree, neural network, and naïve Bayes, to diagnose Parkinson's disease. In this paper, they applied 10-fold cross-validation. With the decision tree, the highest accuracy score and the Root Mean Square Error (RMSE) were 91.63% and 0.2701, respectively. For the naïve Bayes classifier, the highest accuracy score and RMSE were 89.46% and 0.2668, respectively. For the neural network, the highest accuracy score was 91.01%, and the RMSE was the lowest, 0.2871.

Shivangi, Anubhav et al. [10] used a Deep Neural Network with two modules. The accuracies of the two modules on the testing dataset are 88.17% and 89.15%, respectively. The result is compared with three major algorithms, SVM, XGBoost, and MLP, and achieved the best result.

KNN is a classification method that was developed by Joseph Hodges in 1951 [11] and later expanded by Thomas Cover. The algorithm is used in classification and also in regression.

In KNN, k is a positive integer that is typically small [11]. When KNN is used for classification, an object is classified by a plurality vote among its k nearest neighbours.

XGBoost (eXtreme Gradient Boosting) is dominating the applied machine learning field and also the Kaggle competition field [12]. It is designed for speed and performance.

XGBoost is a scalable learning system [19]. It is part of large-scale machine learning and is a package for solving data science problems. XGBoost is fast and optimized for out-of-core computation. It is useful for data science problems and can also be put into a production pipeline. It also focuses on scalability. XGBoost takes input and produces models, and it was designed to be a self-contained package.

Many existing methods have been proposed to predict Parkinson's disease, but their correct classification rate is low. Also, many existing methods use all the features for classification, and features that are not relevant to the classification can decrease the performance of the classification result. So, in this paper, we use a hybrid feature selection technique, combining the filter method and the wrapper method, and then apply a classification technique using the individual classifiers used in many existing methods; we obtain higher accuracy than the existing methods.

III. PROPOSED METHOD

And then we got the result. Fig. 1 is the block diagram of the proposed method.

A. Data Representation

For our proposed method, we collected the Parkinson dataset from the UCI Machine Learning Repository [6], which has also been used in many existing methods. The data representation of the Parkinson dataset is shown in Table I.

TABLE I. DATA REPRESENTATION OF PARKINSON DATASET

Feature Name     | Description                                              | Range
Name             | ASCII subject name and recording number                  | Unique for each patient
MDVP:Fo(Hz)      | Average vocal fundamental frequency (VFF)                | [88.333, 260.105]
MDVP:Fhi(Hz)     | Maximum of VFF                                           | [102.145, 592.03]
MDVP:Flo(Hz)     | Minimum of VFF                                           | [74.997, 239.17]
MDVP:Jitter(%)   | Measure of fluctuation in fundamental frequency (FF)     | [0.00567, 0.00784]
MDVP:Jitter(Abs) | Measure of fluctuation in FF                             | [0.00003, 0.00007]
MDVP:RAP         | Measure of fluctuation in FF                             | [0.00295, 0.0037]
MDVP:PPQ         | Measure of fluctuation in FF                             | [0.00317, 0.00554]
Jitter:DDP       | Measure of fluctuation in FF                             | [0.00885, 0.01109]
MDVP:Shimmer     | Measure of fluctuation in amplitude                      | [0.01884, 0.04374]
MDVP:Shimmer(dB) | Measure of fluctuation in amplitude                      | [0.19, 0.426]
Shimmer:APQ3     | Measure of fluctuation in amplitude                      | [0.01026, 0.02182]
Shimmer:APQ5     | Measure of fluctuation in amplitude                      | [0.01161, 0.0313]
MDVP:APQ         | Measure of fluctuation in amplitude                      | [0.01373, 0.02971]
Shimmer:DDA      | Measure of fluctuation in amplitude                      | [0.03078, 0.06545]
NHR              | Ratio of noise to tonal components in the voice          | [0.02211, 0.04398]
HNR              | Ratio of noise to tonal components in the voice          | [21.033, 21.209]
Status           | Subject's health status                                  | 1 = Parkinson's disease, 0 = healthy
RPDE             | Dynamical complexity measure                             | [0.414783, 0.462803]
D2               | Dynamical complexity measure                             | [2.301442, 2.555477]
DFA              | Signal fractal scaling exponent                          | [0.664357, 0.815285]
spread1          | Nonlinear measure of FF variation (FFV)                  | [0.414783, 0.462803]
spread2          | Nonlinear measure of FF variation (FFV)                  | [0.190667, 0.266482]
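The plurality-vote rule behind KNN, described in Section II, can be sketched in pure Python. The toy points, the helper name `knn_classify`, and the choice of Euclidean distance are illustrative assumptions, not details taken from the paper:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by a plurality vote among its k nearest
    training points (Euclidean distance). `train` is a list of
    (feature_vector, label) pairs."""
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data: two healthy points (label 0) near the origin,
# two Parkinson's points (label 1) near (1, 1).
train = [((0.0, 0.1), 0), ((0.1, 0.0), 0), ((1.0, 0.9), 1), ((0.9, 1.0), 1)]
print(knn_classify(train, (0.95, 0.95), k=3))  # -> 1 (two of the three nearest vote 1)
```

With k = 3, the two label-1 points outvote the single label-0 neighbour, which is exactly the plurality-vote behaviour the text describes.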
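The hybrid filter-plus-wrapper pipeline described above (normalize, filter down to 11 features, RFE down to nine, then classify) can be sketched with scikit-learn. This is a minimal sketch under several assumptions: the data is synthetic rather than the real Parkinson dataset, `SelectKBest` with the ANOVA F-test is an assumed filter criterion (the paper does not specify its filter scoring here), and `GradientBoostingClassifier` stands in for XGBoost when the `xgboost` package is unavailable:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 22-feature, 195-recording Parkinson dataset.
X, y = make_classification(n_samples=195, n_features=22,
                           n_informative=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 1) Normalize the features.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2) Filter step: keep the 11 highest-scoring features.
filt = SelectKBest(f_classif, k=11).fit(X_train, y_train)
X_train, X_test = filt.transform(X_train), filt.transform(X_test)

# 3) Wrapper step: Recursive Feature Elimination down to 9 features.
clf = GradientBoostingClassifier(random_state=0)
rfe = RFE(clf, n_features_to_select=9).fit(X_train, y_train)
X_train, X_test = rfe.transform(X_train), rfe.transform(X_test)

# 4) Train the final classifier on the 9 selected features.
model = clf.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

RFE works here because gradient-boosted trees expose `feature_importances_`, which RFE uses to drop the weakest feature at each round; swapping in `xgboost.XGBClassifier` for the stand-in estimator would follow the paper more closely.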