
Parkinson disease prediction using feature selection technique in machine learning


Tamim Wasif
Computer science and engineering
North Western University
Khulna, Bangladesh
deviltamim88@gmail.com

Md. Inzamam Ul Hossain
Computer science and engineering
North Western University
Khulna, Bangladesh
cse.inzamam@yahoo.com

Mariam Sultana
Computer science and engineering
North Western University
Khulna, Bangladesh
mariammitu007@gmail.com

Asif Mahmud
Computer science and engineering
North Western University
Khulna, Bangladesh
am456636@gmail.com

Abstract— Parkinson's is a complex disease whose diagnosis includes numerous physical, mental, and neurological assessments. The assessments incorporate examining various signs and symptoms, exploring the clinical history, and checking the state of the nervous system. In our proposed method, we applied five different classifiers. This paper aims to increase the accuracy and optimize the features used in the dataset. For optimization, we used a hybrid feature selection method that combines the wrapper and filter methods: Recursive Feature Elimination. The highest accuracy, 97.43%, was gained by the XGBoost classifier with nine features out of 22.

Keywords— Parkinson, XGBoost, SVM, KNN, Classification
I. INTRODUCTION

Parkinson's disease is a degenerative disorder [1]. A degenerative condition is the gradual breakdown of the structure of neurons. The illness mostly affects the motor system, meaning the set of central structures in the nervous system that supports movement. The symptoms of Parkinson's disease arise slowly, and non-motor symptoms start to show as the disease gets worse. One early symptom is difficulty with walking; others are rigidity, slowness of movement, and tremor. Cognitive problems can also occur in Parkinson's; cognition means mental activity as well as the process of acquiring knowledge. Behavioral issues and anxiety can also appear when the disease is at an advanced level. Those who have Parkinson's disease can also have sleep problems, as well as issues with the sensory system. In this disease, the cells of the substantia nigra, which is situated in the midbrain, die.

The World Health Organization (WHO) published Parkinson's disease data in 2018. According to that data, deaths from Parkinson's disease in Bangladesh reached 1363 [3], or 0.18% of total deaths, and the death rate is 1.32 per 100,000 population. Based on Parkinson's disease deaths, Bangladesh ranked 159th in the world; the highest-ranked country is Ireland.

Machine learning is the field that studies computer algorithms that improve automatically through experience [2]. We can say that machine learning is a part of artificial intelligence. A model is built on sample data, called "training data," and the machine learning algorithm uses it to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are valuable in a broad range of applications and are used when it is challenging to develop traditional algorithms.

Classification is a prediction method [4]: classification in machine learning means anticipating the class of given data. Classes are known as labels; they are also known as targets or categories.

Feature selection is needed in machine learning to improve the efficiency or the accuracy of the machine learning model. It is used to remove unrelated features, that is, features that are not relevant to the model [5]. The features we use as training data have a major influence on the performance the model can achieve.

In our proposed method, we predict Parkinson's disease using a hybrid feature selection method that combines the filter method and the wrapper method. We first normalize the features; then we use the feature selection filtering method and select only 11 features. We also applied the RFE method and got only nine features out of 22. Finally, we applied the XGBoost classifier and got an accuracy of 97.43% using the nine features obtained from the hybrid feature selection method.
II. RELATED WORKS

Tarigoppula V.S Sriram et al. [6] proposed the Random Forest, KNN, SVM, KStar, and Naïve Bayes classifiers for classification and achieved 90.26%, 88.90%, 82.50%, 89.74%, and 69.23% accuracy, respectively. Among these classifiers, Random Forest shows the highest accuracy, trailed by the KStar classifier. The lowest accuracy was achieved by Naïve Bayes on the PD dataset. The dataset is the same as our dataset [19].

Andres Ortiz et al. [7] used a Convolutional Neural Network (CNN) for classification. The paper proposed the utilization of isosurfaces as an approach to extract the applicable information from 3D DaTSCAN images to use as inputs to CNN architectures. This system accomplishes an accuracy and AUC of 95.1% and 97%, respectively.

Inzamam et al. [8] used the ensemble method in the PASW benchmark and applied six classifiers: KNN, Quest, C5.0, SVM, Logistic Regression, and Bayesian Network. The paper showed how the ensemble method gave better accuracy than single classifiers: for the testing dataset, the accuracy gained was 95.31%, and for the training dataset, 98.47%. The ensemble technique overcomes single classifiers' shortcomings and is utilized to join their strengths to predict classes. The proposed method used a dataset collected from UCI [19] that is also used in our proposed method.

Mostafa et al. [9] used three classifiers, namely decision tree, neural network, and naïve Bayes, to diagnose Parkinson's disease, applying a 10-fold cross-validation process. With the decision tree, the highest accuracy score and Root Mean Square Error (RMSE) gained were 91.63% and 0.2701, respectively. For the naïve Bayes classifier, the highest accuracy score and RMSE achieved were 89.46% and 0.2668, respectively. For the neural network, the highest accuracy score was 91.01%, and the RMSE was the lowest at 0.2871.

Shivangi, Anubhav et al. [10] used deep neural networks. In their paper, two modules are used, and the accuracies of these two modules are 88.17% and 89.15%, respectively, on the testing dataset. The result is compared with three major algorithms, SVM, XGBoost, and MLP, and achieved the best result.

KNN is a classification method that was developed by Joseph Hodges in 1951 [11] and later expanded by Thomas Cover. This algorithm is used in classification and also in regression. In KNN, k is a positive integer that is typically small [11]. When KNN is used for classification, an object is classified by a plurality vote among its k nearest neighbors.
XGBoost (eXtreme Gradient Boosting) is dominating the applied machine learning field as well as Kaggle competitions [12]. It is designed for speed and performance.

XGBoost is a scalable learning system [17]. It is part of large-scale machine learning, packaged to solve data science problems. XGBoost is fast and optimized for out-of-core computation; it is useful for data science problems and can also be put into a production pipeline, and it focuses on scalability. XGBoost takes input and produces models; in the beginning, it was designed as a closed package.
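For illustration, the following minimal Python sketch shows the kind of XGBoost usage described above. The synthetic data (195 samples and 22 features, mirroring the dimensions of the Parkinson dataset [19]) and the hyperparameter values are our assumptions, not the configuration used in any cited work.

```python
# Minimal XGBoost sketch: fit a gradient-boosted tree ensemble and
# report held-out accuracy. Data and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=195, n_features=22, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
model.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```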
Many existing methods have been proposed to predict Parkinson's disease, but their correct classification rate is low. Also, many existing methods used all features for classification, and features that are not relevant to the classification can decrease the performance of the classification result. So, in this paper, we used a hybrid feature selection technique combining the filter method and the wrapper method, then applied classification using the individual classifiers that are used in many existing methods to predict Parkinson's disease, and achieved higher accuracy than the existing methods.
III. PROPOSED METHOD

In our work, we first applied the filter method to the Parkinson's dataset; after that, we got optimized features and executed classifiers on those features. After applying the wrapper method to the Parkinson's dataset, we again executed classifiers on the features. After applying RFE (Recursive Feature Elimination), we executed classifiers on the features once more, and then we got the result. Fig. 1 is the block diagram of the proposed method.

Fig. 1. Feature selection method
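A compact sketch of how the pipeline in Fig. 1 could be wired together with pandas, scikit-learn, and xgboost is given below. This is an illustrative reconstruction under stated assumptions (the UCI file name parkinsons.data, a 0.95 correlation threshold for the filter step, and default XGBoost settings), not the authors' exact code.

```python
# Hypothetical end-to-end sketch of the proposed pipeline:
# normalize -> filter-style selection -> RFE to 9 features -> XGBoost.
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier

df = pd.read_csv("parkinsons.data")            # UCI Parkinson's dataset [19]
y = df["status"]
X = df.drop(columns=["name", "status"])        # 22 voice features

X = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)

# Filter step (illustrative): drop one of each highly correlated pair.
corr = X.corr().abs()
drop = {c2 for i, c1 in enumerate(corr.columns)
        for c2 in corr.columns[i + 1:] if corr.loc[c1, c2] > 0.95}
X = X.drop(columns=list(drop))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Wrapper step: RFE keeps the 9 highest-ranked features.
rfe = RFE(XGBClassifier(eval_metric="logloss"), n_features_to_select=9)
rfe.fit(X_tr, y_tr)

clf = XGBClassifier(eval_metric="logloss")
clf.fit(X_tr.loc[:, rfe.support_], y_tr)
pred = clf.predict(X_te.loc[:, rfe.support_])
print("accuracy:", accuracy_score(y_te, pred))
```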
A. Data Representation

For our proposed method, we collected the Parkinson dataset from the UCI Machine Learning Repository [19], which was also used in many existing methods. The data representation of the Parkinson dataset is shown in Table I.

TABLE I. DATA REPRESENTATION OF PARKINSON DATASET

Feature Name | Description | Range
Name | ASCII subject name and recording number | Unique for each patient
MDVP:Fo(Hz) | Average vocal fundamental frequency (VFF) | [88.333, 260.105]
MDVP:Fhi(Hz) | Maximum VFF | [102.145, 592.03]
MDVP:Flo(Hz) | Minimum VFF | [74.997, 239.17]
MDVP:Jitter(%) | Measure of fluctuation in fundamental frequency (FF) | [0.00567, 0.00784]
MDVP:Jitter(Abs) | Measure of fluctuation in FF | [0.00003, 0.00007]
MDVP:RAP | Measure of fluctuation in FF | [0.00295, 0.0037]
MDVP:PPQ | Measure of fluctuation in FF | [0.00317, 0.00554]
Jitter:DDP | Measure of fluctuation in FF | [0.00885, 0.01109]
MDVP:Shimmer | Measure of fluctuation in amplitude | [0.01884, 0.04374]
MDVP:Shimmer(dB) | Measure of fluctuation in amplitude | [0.19, 0.426]
Shimmer:APQ3 | Measure of fluctuation in amplitude | [0.01026, 0.02182]
Shimmer:APQ5 | Measure of fluctuation in amplitude | [0.01161, 0.0313]
MDVP:APQ | Measure of fluctuation in amplitude | [0.01373, 0.02971]
Shimmer:DDA | Measure of fluctuation in amplitude | [0.03078, 0.06545]
NHR | Ratio of noise to tonal components in the voice | [0.02211, 0.04398]
HNR | Ratio of noise to tonal components in the voice | [21.033, 21.209]
Status | Subject's health status | 1 - Parkinson's disease, 0 - healthy
RPDE | Dynamical complexity measure | [0.414783, 0.462803]
D2 | Dynamical complexity measure | [2.301442, 2.555477]
DFA | Signal fractal scaling exponent | [0.664357, 0.815285]
spread1 | Nonlinear measure of fundamental frequency variation (FFV) | [0.414783, 0.462803]
spread2 | Nonlinear measure of FFV | [0.190667, 0.266482]
PPE | Nonlinear measure of FFV | [0.148569, 0.284654]
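As a quick, illustrative way to reproduce the kind of per-feature ranges reported in Table I, the snippet below (our sketch, assuming the UCI file parkinsons.data has been downloaded locally) prints the minimum and maximum of each column.

```python
# Inspect per-feature minima and maxima of the Parkinson dataset,
# mirroring the "Range" column of Table I. Assumes the UCI file
# "parkinsons.data" is available locally [19].
import pandas as pd

df = pd.read_csv("parkinsons.data")
features = df.drop(columns=["name"])      # keep numeric columns only

summary = features.agg(["min", "max"]).T  # one row per feature
print(summary)
```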
B. Feature selection

Feature selection selects a subset of features [18]. It is used to simplify the model so that it is easier to interpret by researchers or users. Feature selection removes redundant or irrelevant features from the dataset with no loss of information [18], since such features can be strongly correlated with the retained ones.
C. Filter method

Filter methods select features independently of any machine learning algorithm model [13]. This independent selection process is one of the big advantages of filter methods: the features selected by a filtering method can be used as input to any machine learning model. Another advantage of the filter method is that it is very fast. In any feature selection pipeline, the filtering method is the first step. The filter method can be divided into two parts:

• Univariate filter methods
• Multivariate filter methods

In the univariate filtering method, the features are ranked according to specific criteria [13]. After ranking, the top N features are selected. There are different ranking criteria in the univariate filtering method, such as the Fisher score, mutual information, and the variance of the feature. A disadvantage of the univariate filtering method is that it may select redundant features, because it does not take the relationships between individual features into account. This method is ideal for removing constant and quasi-constant features from the data.

Multivariate filtering methods can remove redundant features because they take the mutual relationship between features into account [13]. They are used to remove duplicate and correlated features from the data too.

Constant features are features that consist of one value [13]. Constant features are not helpful in classification, which is why constant features should be removed from the dataset.

We call a feature quasi constant if it is almost constant [13]. In other words, this type of feature has the same value for a very large subset of the observations, so it is not very useful for predictions. For the variance of quasi-constant features, there is no fixed rule as to what the threshold should be; quasi-constant features that have more than 99% similar values across the output observations should be removed.

We can say two features are duplicates if they have identical values [13]. Duplicate features don't add any value to algorithm training; they only add overhead and unnecessary delay to the training time. That's why duplicate features should be removed from the dataset.

Mutual information is a part of the filter method [14]. It is determined between two variables.

Like duplicate features, a dataset can contain correlated features [13]. If two or more features are close to each other in the linear space, then we say these features are correlated. As an example of correlated features, take a dataset of fruit baskets: the weight of a fruit basket is correlated with its price, since the more the basket weighs, the higher its price.
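The quasi-constant, duplicate, and mutual-information steps described above can be sketched as follows. The 99% dominant-value threshold follows the text, while the file name and other details are our assumptions.

```python
# Sketch of filter steps: quasi-constant removal, duplicate removal,
# and mutual-information ranking. Assumes the UCI file "parkinsons.data".
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

df = pd.read_csv("parkinsons.data")
y = df["status"]
X = df.drop(columns=["name", "status"])

# Quasi-constant filter: drop features whose most frequent value covers
# more than 99% of observations (the threshold mentioned in the text).
keep = [c for c in X.columns
        if X[c].value_counts(normalize=True).iloc[0] <= 0.99]
X = X[keep]

# Duplicate filter: keep only one of any identical columns.
X = X.T.drop_duplicates().T

# Mutual information between each feature and the class label;
# higher scores indicate more informative features.
mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
print(mi.sort_values(ascending=False))
```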
D. Wrapper method

Step forward feature selection selects the best features by checking which feature performs best in the algorithm model [15]. It continues the process until the required number of features has been selected. We applied this method using our four classifiers.

Step backward feature selection is nearly analogous to step forward feature selection [15], but it works backward: it removes features until only the required features remain.
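A minimal sketch of step forward selection using scikit-learn's SequentialFeatureSelector (available in scikit-learn 0.24 and later) is shown below; setting direction="backward" yields the step backward variant. The estimator choice and target subset size are illustrative assumptions.

```python
# Step forward feature selection sketch (scikit-learn >= 0.24).
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("parkinsons.data")
y = df["status"]
X = df.drop(columns=["name", "status"])

# Greedily add the feature that most improves cross-validated accuracy,
# stopping once the required number of features has been selected.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(),
    n_features_to_select=9,   # illustrative target size
    direction="forward",      # "backward" removes features instead
    scoring="accuracy",
    cv=5,
)
sfs.fit(X, y)
print(list(X.columns[sfs.get_support()]))
```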
E. Recursive feature elimination

Recursive Feature Elimination, known as RFE, is a feature selection algorithm [16]. It is a wrapper-type feature selection algorithm [16], meaning that a machine learning algorithm is used inside the method: the algorithm is wrapped by RFE and helps to select features. This feature selection method scores the features and selects the features which have a higher score.
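A minimal RFE sketch in the spirit of [16] follows, assuming XGBoost as the wrapped core estimator and nine retained features, as in our best result; the other details are illustrative.

```python
# RFE sketch: the wrapped estimator is refit repeatedly and the weakest
# features are eliminated until nine remain.
import pandas as pd
from sklearn.feature_selection import RFE
from xgboost import XGBClassifier

df = pd.read_csv("parkinsons.data")
y = df["status"]
X = df.drop(columns=["name", "status"])

rfe = RFE(estimator=XGBClassifier(eval_metric="logloss"),
          n_features_to_select=9,
          step=1)               # eliminate one feature per iteration
rfe.fit(X, y)

# Rank 1 marks a selected feature; larger ranks were eliminated earlier.
for rank, name in sorted(zip(rfe.ranking_, X.columns)):
    print(rank, name)
```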
F. Classifiers

We used the XGBoost classifier after applying the feature selection filter methods to check the performance with the optimized features achieved from the feature selection method. We also applied Random Forest, SVM, KNN, and Neural Network. In recursive feature elimination, we also used those classifiers and collected the best accuracy.
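One plausible way to compare the five classifiers on a chosen feature subset is 5-fold cross-validation, sketched below; here MLPClassifier stands in for the neural network, all hyperparameters are left at defaults as an assumption, and in practice X would be replaced by the subset chosen by the selection step.

```python
# Cross-validated comparison sketch of the five classifiers used.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

df = pd.read_csv("parkinsons.data")
y = df["status"]
X = df.drop(columns=["name", "status"])  # or a reduced feature subset

models = {
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.4f}")
```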
IV. EXPERIMENTAL RESULT

In the Parkinson's dataset, there are no constant features and no quasi-constant features. The dataset also doesn't have any duplicate features. There are 11 correlated features. We got this information by executing the feature selection filter method.

The RFE accuracy of XGBoost, Random Forest, and SVM is given in line-graph form in Fig. 2. The blue line is the accuracy of XGBoost, the orange line is the accuracy of Random Forest, and the ash-colored line is the accuracy of SVM. The vertical axis shows the percentage accuracy, and the horizontal axis shows the number of features. The highest accuracy of XGBoost is 97.43% at nine features, the highest accuracy of Random Forest is 94.87% at 20 features, and the highest accuracy of SVM is 92.30% at 15 features.

Fig. 2. Line graph of RFE accuracy

The KNN classifier got the best accuracy of 97.43% with the uncorrelated features; the number of uncorrelated features is 11. The Neural Network got its best accuracy of 87.18% at epoch 1.

In Table II, we can see that our proposed method has the highest accuracy among the existing methods. The nearest accuracy is achieved in [8], but it is also lower than the proposed method's.
TABLE II. COMPARISON TABLE

Method | Accuracy
Random Forest [6] | 90.26%
LeNet-based [7] | 0.95±0.03
KNN [8] | 96.875%
Decision tree [9] | 91.63%
Voice Impairment Classifier [10] | 89.15%

Our paper's best accuracy is achieved by XGBoost, and the accuracy is 97.43%, with nine features out of 22.
The best accuracy collected from our work is shown in Table III. Here, we can see that the highest accuracy is achieved by XGBoost and KNN, but XGBoost gives the highest accuracy using the minimum number of features. It is also observed that the recursive feature elimination technique gives the optimal feature number for the highest accuracy.

TABLE III. BEST ACCURACY TABLE

Classifier | Accuracy (%) | Method | Features Used | AUC | Sensitivity | Specificity
XGBoost | 97.43 | RFE | 9 | 1.0 | 85.71 | 92.85
KNN | 97.43 | Uncorrelated features | 11 | 96.8 | 1.0 | 98.43
Random Forest | 94.87 | Uncorrelated features | 11 | 1.0 | 71.42 | 85.71
SVM | 92.30 | RFE | 12 | 1.0 | 57.14 | 78.57
Neural Network | 87.18 | RFE with XGBoost | One epoch | - | - | -

CONCLUSION

Parkinson's disease symptoms show gradually and get worse over time. The Parkinson's dataset has been utilized by numerous specialists in clinical and classification investigations. In this paper, we use feature selection methods to optimize the accuracy. We used several classifiers: XGBoost, Random Forest, KNN, SVM, and Neural Network. The best accuracy among all the classifiers was achieved by XGBoost, with an accuracy of 97.43% using nine features. Not every classification technique shows the same performance on a dataset. In our proposed method, we applied some popular classifiers for Parkinson's disease prediction, but we did not apply a vast number of classifiers to search for the best classifier(s). Also, the ensemble method is a popular classification technique that can combine the results of the individual classifiers used in the ensemble; it shows better performance than single classifiers in many existing methods, but we didn't apply it to our dataset. These are the limitations of our proposed method. In the future, we will enhance our proposed work to eliminate these limitations.

REFERENCES

[1] En.wikipedia.org, 2021. Parkinson's disease. [online] Available at: <https://en.wikipedia.org/wiki/Parkinson%27s_disease> [Accessed 2 February 2021].
[2] En.wikipedia.org, 2021. Machine learning. [online] Available at: <https://en.wikipedia.org/wiki/Machine_learning> [Accessed 4 February 2021].
[3] World Life Expectancy, 2021. Parkinson's Disease in Bangladesh. [online] Available at: <https://www.worldlifeexpectancy.com/bangladesh-parkinson-disease> [Accessed 4 February 2021].
[4] Medium, 2021. Machine Learning Classifiers. [online] Available at: <https://towardsdatascience.com/machine-learning-classifiers-a5cc4e1b0623> [Accessed 4 February 2021].
[5] Medium, 2021. Feature Selection Techniques in Machine Learning with Python. [online] Available at: <https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e> [Accessed 5 February 2021].
[6] Sriram, T. V., et al., 2013. Intelligent Parkinson Disease Prediction Using Machine Learning Algorithms, 3(3), pp. 212-214.
[7] Ortiz, A., Munilla, J., Martínez-Ibañez, M., Górriz, J., Ramírez, J. and Salas-Gonzalez, D., 2019. Parkinson's Disease Detection Using Isosurfaces-Based Features and Convolutional Neural Networks. Frontiers in Neuroinformatics, 13, pp. 1-11.
[8] M. Inzamam-Ul-Hossain, L. MacKinnon and M. R. Islam, "Parkinson disease detection using ensemble method in PASW benchmark," 2015 IEEE International Advance Computing Conference (IACC), Banglore, 2015, pp. 666-670, DOI: 10.1109/IADCC.2015.7154790.
[9] Mostafa, S., Mustapha, A., Al-Dulaimi, S. H. and Ahmad, M. S., 2018. Evaluating the Performance of Three Classification Methods in Diagnosis of Parkinson's Disease, pp. 43-52. DOI: 10.1007/978-3-319-72550-5_5.
[10] Shivangi, A. Johri and A. Tripathi, "Parkinson Disease Detection Using Deep Neural Networks," 2019 Twelfth International Conference on Contemporary Computing (IC3), Noida, India, 2019, pp. 1-4, DOI: 10.1109/IC3.2019.8844941.
[11] En.wikipedia.org, 2021. k-nearest neighbors algorithm. [online] Available at: <https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm> [Accessed 30 April 2021].
[12] Brownlee, J., 2021. A Gentle Introduction to XGBoost for Applied Machine Learning. [online] Machine Learning Mastery. Available at: <https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/> [Accessed 30 April 2021].
[13] Stack Abuse, 2021. Applying Filter Methods in Python for Feature Selection. [online] Available at: <https://stackabuse.com/applying-filter-methods-in-python-for-feature-selection/> [Accessed 7 April 2021].
[14] Brownlee, J., 2021. Information Gain and Mutual Information for Machine Learning. [online] Machine Learning Mastery. Available at: <https://machinelearningmastery.com/information-gain-and-mutual-information/> [Accessed 7 April 2021].
[15] Mayo, M., 2021. Step Forward Feature Selection: A Practical Example in Python. [online] KDnuggets. Available at: <https://www.kdnuggets.com/2018/06/step-forward-feature-selection-python.html> [Accessed 7 April 2021].
[16] Brownlee, J., 2021. Recursive Feature Elimination (RFE) for Feature Selection in Python. [online] Machine Learning Mastery. Available at: <https://machinelearningmastery.com/rfe-feature-selection-in-python/> [Accessed 7 April 2021].
[17] Sites.google.com, 2021. Story and Lessons Behind the Evolution of XGBoost - nttrungmt-wiki. [online] Available at: <https://sites.google.com/site/nttrungmtwiki/home/it/data-science---python/xgboost/story-and-lessons-behind-the-evolution-of-xgboost> [Accessed 1 May 2021].
[18] En.wikipedia.org, 2021. Feature selection. [online] Available at: <https://en.wikipedia.org/wiki/Feature_selection> [Accessed 1 May 2021].
[19] Archive.ics.uci.edu, 2021. UCI Machine Learning Repository: Parkinson's Data Set. [online] Available at: <https://archive.ics.uci.edu/ml/datasets/parkinsons> [Accessed 16 May 2021].
