Oman Medical Journal

A Novel Diagnosis System for Parkinson Disease Based on Ensemble Random Forest
--Manuscript Draft--

Full Title: A Novel Diagnosis System for Parkinson Disease Based on Ensemble Random Forest

Article Type: Original Article

Keywords: Parkinson Disease; Ensemble; Random Forest; Machine Learning Classifiers

Corresponding Author: AVIJIT KUMAR CHAUDHURI, MTECH, MBA


Techno Engineering College Banipur
KOLKATA, West Bengal INDIA

Corresponding Author's Institution: Techno Engineering College Banipur

First Author: AVIJIT KUMAR CHAUDHURI, MTECH, MBA

Order of Authors: AVIJIT KUMAR CHAUDHURI, MTECH, MBA

Manuscript Region of Origin: INDIA

Abstract: Diagnosing Parkinson's disease (PD) is an important concern in healthcare and machine
learning research. This study aims to develop a PD prediction model. It proposes an ensemble
random forest (ERF) model, which employs a variety of classification approaches to achieve
this goal. The proposed ERF classifier is evaluated on the PD dataset from the machine learning
repository of the University of California, Irvine (UCI). The classifier is also compared with
several state-of-the-art machine-learning classifiers: random forest, naive Bayes, support
vector machine with radial basis function kernel, and decision tree. To assess the effectiveness
of the ERF classifier, several performance indicators (accuracy, sensitivity, specificity,
F-measure, receiver operating characteristic, and area under the curve) and a statistical test,
the kappa statistic, were used. The ERF model showed its potential in the classification
results, with an accuracy of 96%.

Suggested Reviewers: Dr. Deepankar Sinha


dsinha2000@gmail.com
NA

SANKHAYAN CHOUDHURI
sankhayan@gmail.com
NA

1. Introduction
Parkinson's disease (PD) is a serious health problem that affects individuals worldwide. It
is the prototypical adult-onset neurodegenerative disorder, initially described by Doctor James
Parkinson as shaking palsy (Parkinson, 2002). It is a chronic illness that causes the patient's
health to deteriorate over time. Dopamine is a neurotransmitter produced by neurons that is
necessary for the regulation of many motor and non-motor processes in the human body. In PD,
certain clusters of neurons lose the ability to generate dopamine. The quantity of dopamine
generated in the nervous system diminishes as PD advances, leaving a person unable to move
appropriately (Langston, 2002; Meissner et al., 2011). Tremors in one hand, foot, or leg are
often the initial symptoms of PD. Other symptoms include slow movement, stiffness, decreased
body balance, difficulty in standing, decreased facial expressions, coordination issues,
difficulty in thinking, understanding, and writing, a distorted sense of smell, dribbling urine,
impaired voice, soft speech, and voice box spasms (Langston, 2002; Meissner et al., 2011;
Gallagher et al., 2010; Mittel, 2003). About 90% of people with PD have vocal dysfunction and
have trouble speaking or communicating (Schley et al., 1982). As a result, careful examination
of the voice of PD patients using modern signal processing algorithms assists in diagnosing the
disease and tracking its progress.
In the absence of any known risk factors, the diagnosis is especially difficult, and
consistency of results between doctors is difficult to achieve. It is hard to determine the
risk of PD or to make a diagnosis based on risk factors alone (Akyol, 2017). The machine-learning
approach is useful in this situation because of its ability to collect, store, and process data
in order to reveal patterns and provide insights. Machine-learning techniques can predict risk
early in the course of PD. Any error in disease prediction, especially a Type II error (a missed
case), has the potential to be fatal. In addition to accuracy, other performance measures such
as consistency, sensitivity, and specificity are therefore important in such studies. Several
machine-learning techniques for disease prediction in general, and for PD in particular, fail
to meet these additional performance criteria.
An Ensemble Random Forest (ERF) classifier is proposed in this paper. It is compared
against several state-of-the-art machine-learning classifiers, namely Logistic Regression (LR),
Naïve Bayes (NB), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF), as
well as earlier research on the same dataset, as shown in Table 8. The author compares
several performance measures and statistical tests (accuracy, sensitivity, specificity, ROC,
AUC, and the kappa statistic) on 50–50%, 66–34%, and 80–20% splits of training and testing data
and on 20-fold cross-validation.
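The hold-out partitions described above can be sketched as follows. This is a minimal illustration, not the author's code: the helper name `stratified_split`, the fixed seed, and the class counts (147 PD and 48 healthy recordings, which the text does not state) are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx) that roughly preserve the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Cut each class separately so both sides keep the same class mix.
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Assumed class counts for illustration: 1 = PD, 0 = healthy.
labels = [1] * 147 + [0] * 48
for fraction in (0.50, 0.66, 0.80):
    train_idx, test_idx = stratified_split(labels, fraction)
    print(f"{fraction:.2f}: {len(train_idx)} training, {len(test_idx)} testing")
```

Splitting each class separately is one simple way to keep the PD/healthy ratio comparable between the training and testing portions at every split size.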
The goal is to find answers to the following research questions:
Research Question 1: Is the suggested ERF classifier suitable for PD prediction?
Research Question 2: Does the suggested ERF classifier fulfill the added Sensitivity and
Specificity criteria?
Research Question 3: Is the suggested ERF classifier consistent and statistically significant
throughout the dataset's various levels of Training and Testing samples?
Figure 1 depicts the flowchart of the experimental design and model construction.
Figure 1. PD prediction model
2. Related Work
Emerging technologies for disease diagnosis and detection are becoming possible as technology
advances (Ray & Chaudhuri, 2021). Researchers have proposed machine learning, expert system, and
soft computing approaches in practically every field of medicine (Chaudhuri et al., 2021;
Chaudhuri et al., 2021). Various methods for diagnosing PD are available in the literature,
based on modeling of voice and speech datasets. Preprocessing, feature extraction, and
classification are the critical phases in these computerized classification systems. Little et
al. (2008) used a kernel support vector machine (SVM) with a feature selection approach to
diagnose PD and achieved a promising accuracy of 91.4%. Nonlinear models using Dirichlet
process mixtures were proposed for PD diagnosis by Shahbaba & Neal (2009). Psorakis et al. (2010)
employed improved multi-class multi-kernel Relevance Vector Machines (mRVMs) and attained
89.47% accuracy using ten-fold cross-validation. Guo et al. (2010) suggested a model based on
genetic programming and the expectation-maximization technique. Sakar & Kursun (2010) built a
model using the mutual information measure and Support Vector Machines. Das (2010) conducted a
comparative analysis of four classification models and found that the Artificial Neural
Network-based model had the highest accuracy (92.9%). Luukka (2011) used fuzzy entropy
measures and a similarity classifier; this study produced an average accuracy of 85.03% using
a 50/50 training and testing split. For the diagnosis of PD, Ozcift & Gulten (2011) suggested a
correlation-based feature selection (CFS) technique with rotation forest ensemble classifiers.
To improve classification performance on relatively small datasets, Li et al. (2011) presented
a fuzzy-based nonlinear transformation approach using PCA and SVM. To demonstrate the
performance of their technique, they applied it to six medical datasets, including a PD dataset.
Using a parallel Artificial Neural Network architecture, Åström & Koker (2011) achieved 91.20%
classification accuracy. Spadoto et al. (2011) used evolutionary-based feature selection
strategies to increase the accuracy of an optimum path forest classifier for PD diagnosis.
Polat (2012) used a fuzzy c-means (FCM) clustering-based feature weighting approach and a kNN
classifier to achieve a 97.93% accuracy rate. Daliri (2013) used a Support Vector Machine with
a chi-square distance kernel and achieved an accuracy of 91.20%. Chen et al. (2013) employed
PCA and a fuzzy kNN system; with 10-fold cross-validation, they attained a promising accuracy
of 96.07%. Zuo et al. (2013) used an evolutionary approach (Particle Swarm Optimization) to
improve the performance of a fuzzy k-nearest neighbor classifier, achieving an accuracy of
97.47%. Zhang (2017) presented PD diagnosis using time-frequency characteristics, stacked
autoencoders (SAE), and kNN classifiers.

3. Methodology
3.1. Dataset
Max Little of the University of Oxford created the dataset in collaboration with the National
Centre for Voice and Speech in Denver, Colorado, which recorded the speech signals. The feature
extraction approaches for general voice disorders were reported in the original paper. This
dataset includes biomedical voice measurements from 31 people, 23 of whom have PD. Each
column in the table represents a different voice measure, and each row represents one of the 195
voice recordings made by these people ("name" column), as described in Table 1. The primary
goal of the data is to distinguish healthy people from those with PD by using the "status"
column, which is set to 0 for healthy people and 1 for those with PD (Little et al., 2007).
The data is stored in ASCII CSV format. Each row in the CSV file corresponds to a single voice
recording. There are approximately six recordings per patient, with the patient's name
in the first column.
Table 1. Description of the dataset

| Sl. No | Attribute | Description | Range of Values | Mean | Std. Dev |
|---|---|---|---|---|---|
| 1 | name | Patient name in ASCII and recording number | - | - | - |
| 2 | MDVP:Fo(Hz) | Average vocal fundamental frequency | 88.333 - 260.105 | 154.228641 | 41.39006475 |
| 3 | MDVP:Fhi(Hz) | Maximum fundamental frequency of the voice | 102.145 - 592.03 | 197.1049179 | 91.49154764 |
| 4 | MDVP:Flo(Hz) | Minimum fundamental frequency of the voice | 65.476 - 239.17 | 116.3246308 | 43.52141318 |
| 5 | MDVP:Jitter(%) | Measure of fundamental frequency variation | 0.00168 - 0.03316 | 0.00622 | 0.004848 |
| 6 | MDVP:Jitter(Abs) | Measure of fundamental frequency variation | 0.000007 - 0.00026 | 4.40E-05 | 3.48E-05 |
| 7 | MDVP:RAP | Measure of fundamental frequency variation | 0.00068 - 0.02144 | 0.00330641 | 0.002967774 |
| 8 | MDVP:PPQ | Measure of fundamental frequency variation | 0.00092 - 0.01958 | 0.003446359 | 0.002758977 |
| 9 | Jitter:DDP | Measure of fundamental frequency variation | 0.00204 - 0.06433 | 0.009919949 | 0.008903344 |
| 10 | MDVP:Shimmer | Amplitude variation measure | 0.00954 - 0.11908 | 0.029709128 | 0.018856932 |
| 11 | MDVP:Shimmer(dB) | Amplitude variation measure | 0.085 - 1.302 | 0.282251282 | 0.19487729 |
| 12 | Shimmer:APQ3 | Amplitude variation measure | 0.00455 - 0.05647 | 0.015664154 | 0.010153162 |
| 13 | Shimmer:APQ5 | Amplitude variation measure | 0.0057 - 0.0794 | 0.017878256 | 0.012023706 |
| 14 | MDVP:APQ | Amplitude variation measure | 0.00719 - 0.13778 | 0.024081487 | 0.016946736 |
| 15 | Shimmer:DDA | Amplitude variation measure | 0.01364 - 0.16942 | 0.046992615 | 0.030459119 |
| 16 | NHR | Measure of the noise-to-tonal component ratio in the voice | 0.00065 - 0.31482 | 0.024847077 | 0.040418449 |
| 17 | HNR | Measure of the noise-to-tonal component ratio in the voice | 8.441 - 33.047 | 21.88597436 | 4.425764269 |
| 18 | RPDE | Measure of nonlinear dynamical complexity | 0.25657 - 0.685151 | 0.498535538 | 0.103941714 |
| 19 | D2 | Measure of nonlinear dynamical complexity | 1.423287 - 3.671155 | 2.381826087 | 0.382799047 |
| 20 | DFA | Exponent of signal fractal scaling | 0.574282 - 0.825288 | 0.718099046 | 0.05533583 |
| 21 | spread1 | Nonlinear measure of fundamental frequency fluctuation | (-7.964984) - (-2.434031) | -5.684396744 | 1.090207764 |
| 22 | spread2 | Nonlinear measure of fundamental frequency fluctuation | 0.006274 - 0.450493 | 0.226510349 | 0.083405763 |
| 23 | PPE | Nonlinear measure of fundamental frequency fluctuation | 0.044539 - 0.527367 | 0.206551641 | 0.090119322 |
| 24 | status | Patient's health condition: 1 = PD, 0 = healthy | 0 - 1 | | |

3.2. Algorithm for the ERF and Stacking

Stacking combines the strengths of several state-of-the-art classifiers to obtain prediction
accuracy that is better than that of any individual classifier in the ensemble. Stacking divides
the training dataset into as many subsets as there are classifiers in the ensemble, with each
classifier training on a non-overlapping subset. Based on the obtained fit, each classifier is
given a relative weight, and a meta RF classifier is formed (Sikora, 2015; Bhasuran et al.,
2016). The proposed ERF technique employs RF as the final meta-classifier, training an RF
classifier on each non-overlapping subset produced using stratified random sampling (to ensure
equal class distribution). The working of the proposed classifier is illustrated in Figure 2.

Figure 2. Working of ERF classifier
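The construction of non-overlapping, class-balanced training subsets can be sketched as follows. This is a hedged illustration only: the helper `stratified_partition`, the number of subsets, and the label counts are hypothetical, and the paper does not specify which sampling routine was actually used.

```python
import random
from collections import defaultdict


def stratified_partition(labels, n_subsets, seed=0):
    """Deal sample indices into n_subsets disjoint, class-balanced subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    subsets = [[] for _ in range(n_subsets)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Round-robin dealing spreads each class evenly over the subsets.
        for position, idx in enumerate(indices):
            subsets[position % n_subsets].append(idx)
    return subsets


# Hypothetical training labels: 1 = PD, 0 = healthy.
train_labels = [1] * 90 + [0] * 30
subsets = stratified_partition(train_labels, n_subsets=3)
for k, subset in enumerate(subsets):
    n_pd = sum(train_labels[i] for i in subset)
    print(f"subset {k}: {len(subset)} samples, {n_pd} PD")
```

In a stacking setup of the kind described, each subset would train one base random forest, and the base predictions would then be passed to the meta random forest.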


Suppose we have a set of training data $\{(x_j, y_j)\}_{j=1}^{N}$, where $x_j \in \mathbb{R}^Q$ and
$y_j \in \{-1, 1\}$, and suppose we are given a potentially large number of weak classifiers,
denoted $f_m(x) \in \{-1, 1\}$, together with a 0-1 loss function $I$ defined as

$$I(f_m(x_j) \neq y_j) = \begin{cases} 0 & \text{if } f_m(x_j) = y_j \\ 1 & \text{if } f_m(x_j) \neq y_j \end{cases}$$

Then the AdaBoost algorithm can be stated as follows:

for j from 1 to N, set $w_j^{(1)} = 1$
for m = 1 to M do
    Fit weak classifier m (random forest) to minimize the objective function

    $$\epsilon_m = \frac{\sum_{j=1}^{N} w_j^{(m)} \, I(f_m(x_j) \neq y_j)}{\sum_{j=1}^{N} w_j^{(m)}}$$

    $$\alpha_m = \ln \frac{1 - \epsilon_m}{\epsilon_m}$$

    for all j do

        $$w_j^{(m+1)} = w_j^{(m)} \, e^{\alpha_m I(f_m(x_j) \neq y_j)}$$

    end for
end for

After learning, the final classifier is a linear combination of the weak classifiers:

$$g(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m f_m(x) \right)$$

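As a concrete illustration of the weight update and the final weighted vote, the following minimal sketch runs AdaBoost on a toy one-dimensional dataset. It deliberately swaps the random forest weak learner for a single-feature threshold rule (a decision stump) so that the example stays self-contained; the function names and the data are hypothetical and are not taken from the study.

```python
import math


def fit_stump(X, y, w):
    """Return (error, feature, threshold, sign) with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[feature] > threshold else -sign for row in X]
                err = sum(wj for wj, p, t in zip(w, preds, y) if p != t) / sum(w)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best


def stump_predict(stump, row):
    _, feature, threshold, sign = stump
    return sign if row[feature] > threshold else -sign


def adaboost(X, y, n_rounds):
    w = [1.0] * len(X)                              # initial weights w_j = 1
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        eps = min(max(stump[0], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Only misclassified samples are up-weighted, by a factor exp(alpha).
        w = [wj * math.exp(alpha) if stump_predict(stump, xj) != yj else wj
             for wj, xj, yj in zip(w, X, y)]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single threshold can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([predict(model, row) for row in X])
```

No single stump classifies this toy set correctly, but the weighted vote of three stumps does, which is the effect the reweighting step is designed to produce.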
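Table 2. Performance evaluation metrics and statistical tests

| Metrics | Formula/Description |
|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) |
| Sensitivity | TP / (TP + FN) |
| Specificity | TN / (TN + FP) |
| F-Score | 2 × Precision × Recall / (Precision + Recall), where Precision = TP / (TP + FP) and Recall = Sensitivity |
| Kappa statistic | (PR_a - PR_ac) / (1 - PR_ac), where PR_a represents the total agreement probability and PR_ac represents the probability of agreement ‘by chance’ |
| Receiver operating characteristic (ROC) | ROC is plotted between Sensitivity and (1 - Specificity). The area under the curve (AUC) measures the degree to which the curve is up in the northwest corner (Vandewiele et al., 2021). |

TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives.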
4. Results and Discussion

The author developed and simulated the proposed model using the Python programming
language. A comparative study was performed between five state-of-the-art
machine-learning algorithms, namely LR, RF, NB, SVM, and DT, and the proposed model.
Among these five popular machine-learning techniques, some show better accuracy, whereas the
performance of others is inferior. To boost the accuracy and performance of the weak
classifiers, the author used ensemble machine learning and proposed an ensemble
meta-algorithmic technique (Figure 2).
The main aim of the experimentation is to distinguish healthy people from people with PD.
The machine-learning techniques were applied to the UCI dataset of biomedical voice
measurements named “Oxford Parkinson’s Disease Detection (OPD)”, on which the proposed model
showed 96% accuracy. The performance evaluation metrics used in this study are
listed in Table 2.
As shown in Table 3, different approaches yielded different levels of accuracy: NB
recorded the lowest accuracy in three of the four partitions (as low as 69%), while the
proposed ERF reached 96% on the 50-50 partition.
Table 3. Comparison of accuracies

| Training-Testing Partition | LR | NB | DT | SVM | RF | ERF |
|---|---|---|---|---|---|---|
| 50-50 | 0.90 | 0.75 | 0.86 | 0.92 | 0.90 | 0.96 |
| 66-34 | 0.84 | 0.90 | 0.85 | 0.85 | 0.94 | 0.93 |
| 80-20 | 0.87 | 0.69 | 0.92 | 0.87 | 0.92 | 0.95 |
| 20-fold cross validation | 0.82 | 0.70 | 0.79 | 0.84 | 0.79 | 0.88 |

A confusion matrix summarizes the actual and predicted classifications produced by a
classification system. The performance of such systems is generally assessed using the data
in this matrix. Table 4 shows the results derived from the confusion matrices of the different
machine-learning algorithms. The performance of the proposed model, along with the performance
of the other methods, was evaluated based on sensitivity, specificity, and accuracy, which are
computed from the true positive (TP), true negative (TN), false negative (FN), and false
positive (FP) counts.
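A short sketch of how these measures follow from the four confusion-matrix counts is given below. The function name and the counts are hypothetical and chosen only for illustration; they are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the standard measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts: 74 PD recordings and 24 healthy recordings in a test set.
metrics = classification_metrics(tp=70, tn=22, fp=2, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, a classifier can reach high accuracy while its specificity is noticeably lower than its sensitivity, which is why the measures are reported separately.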

Table 4. Comparison of sensitivity and specificity (Sens. = sensitivity, Spec. = specificity)

| Training-Testing Partition | LR Sens. | LR Spec. | NB Sens. | NB Spec. | DT Sens. | DT Spec. | SVM Sens. | SVM Spec. | RF Sens. | RF Spec. | ERF Sens. | ERF Spec. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50-50 | 0.89 | 0.93 | 0.96 | 0.48 | 0.91 | 0.70 | 0.90 | 1 | 0.93 | 0.78 | 0.96 | 0.95 |
| 66-34 | 0.86 | 0.75 | 0.92 | 0.47 | 0.90 | 0.71 | 0.87 | 0.77 | 0.94 | 0.93 | 0.94 | 0.88 |
| 80-20 | 0.89 | 0.75 | 0.92 | 0.33 | 0.94 | 0.83 | 0.89 | 0.75 | 0.94 | 0.83 | 0.94 | 1 |
| 20-fold cross validation | 0.82 | 0.85 | 0.70 | 0.97 | 0.79 | 0.87 | 0.84 | 0.86 | 0.89 | 0.94 | 0.88 | 0.87 |

The sensitivity and specificity results in Table 4 demonstrate the potential of the
proposed model in two-class classification. The comparison of the proposed model with
other widely used independent classification techniques is shown in Figure 6. It is clear from
the comparative results that the proposed classification technique has the highest accuracy,
sensitivity, and specificity values (accuracy = 96%, sensitivity = 0.96, specificity = 1) for
the PD dataset.
The analyses in this study show that the classification accuracy of LR
was 90%, with 89% sensitivity and 93% specificity. RF achieved a classification accuracy of
94%, with 94% sensitivity and 94% specificity. The classification accuracy of
the SVM was 92%, with 90% sensitivity and 100% specificity. However, the best performance
of the six classifiers evaluated was that of the proposed classifier, which achieved 96%
classification accuracy, with 96% sensitivity and 100% specificity. Table 3 and Table 4 show
the complete set of results.
The analytical results of the proposed classifier are equally acceptable. The enhanced
specificity and sensitivity of the proposed classifier in predicting PD is a significant
outcome. The NB classifier reached 96% sensitivity and 97% specificity, but not under the same
partition (Table 4), and the other classifiers produced lower sensitivity and specificity than
ERF. The proposed classifier overcomes that disadvantage, yielding sensitivity and specificity
of 96% and 100% respectively, implying that fewer patients would need to be tested for PD due
to the improved specificity. At the same time, a greater sensitivity would save money and
minimize the waiting periods of genuinely ill patients, both of which would be crucial in
saving lives.
Table 5. Comparison of F-Score

| Training-Testing Partition | LR | NB | DT | SVM | RF | ERF |
|---|---|---|---|---|---|---|
| 50-50 | 0.94 | 0.81 | 0.91 | 0.95 | 0.93 | 0.97 |
| 66-34 | 0.90 | 0.78 | 0.90 | 0.90 | 0.96 | 0.95 |
| 80-20 | 0.93 | 0.79 | 0.95 | 0.93 | 0.95 | 0.97 |
| 20-fold cross validation | 0.80 | 0.78 | 0.80 | 0.82 | 0.90 | 0.90 |

The F-measure is a commonly used metric in information retrieval
and class-imbalance problems that has been studied by numerous researchers. The fundamental
advantage of the F-measure is that it compares classifier performance in terms of recall (or
true positive rate) and precision, using a factor that adjusts their relative importance
(Soleymani, Granger & Fumera, 2020). The F-measure results in Table 5 show the potential of the
proposed model in two-class classification. Table 5 compares the proposed model with
several frequently applied independent classification algorithms. The comparison found
that the proposed classification algorithm attains the greatest F-measure value (0.97) for
the PD dataset.
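For reference, the weighting factor mentioned above is conventionally written as $\beta$ in the general form of the F-measure. The expression below is the standard textbook definition rather than a formula stated in this paper; $P$ denotes precision and $R$ denotes recall.

$$F_\beta = \frac{(1 + \beta^2) \, P \, R}{\beta^2 P + R}$$

With $\beta = 1$, recall and precision are weighted equally and the expression reduces to $F_1 = 2PR / (P + R)$. For example, $P = 0.95$ and $R = 0.99$ give $F_1 = 1.881 / 1.94 \approx 0.97$.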
The ROC charts for these experiments with the individual machine-learning techniques are
presented in Table 6: six ROC charts, drawn in blue, for 20-fold cross-validation, together
with the corresponding AUC values. Experimental results show that, in terms of cross-validation
accuracy, the proposed classifier outperformed most of the previously used methods discussed
in the literature review. With the proposed model, the AUC value reaches 0.95.
Table 6. Comparison of AUC

| Training-Testing Partition | Metric | LR | NB | DT | SVM | RF | ERF |
|---|---|---|---|---|---|---|---|
| 20-fold cross validation | AUC | 0.84 | 0.82 | 0.73 | 0.80 | 0.93 | 0.95 |
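One way to read an AUC value is as the probability that a randomly chosen PD recording receives a higher classifier score than a randomly chosen healthy recording. The sketch below computes AUC through this pairwise interpretation; the function name, labels, and scores are hypothetical and are not taken from the experiments.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores: 1 = PD, 0 = healthy.
labels = [1, 1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.2]
print(round(auc_from_scores(labels, scores), 4))
```

Here 11 of the 12 positive-negative pairs are ordered correctly, so the AUC is 11/12; a value of 0.5 would correspond to random ranking.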

Comparing the performance of different machine-learning classifiers might give an
ambiguous result if the comparison is based only on accuracy-based metrics. Cohen’s
Kappa Statistic (CKS) is used to support a more reliable comparison of the efficiency of
different classifiers (Chaudhuri, Banerjee & Das, 2021). The cost of error must be considered in
such evaluations. In this respect, the CKS is an excellent measure for identifying
classifications that may be due to chance. The CKS takes a value between -1 and +1. As a
classifier’s kappa value approaches 1, its performance is less likely to be the result of
chance. Therefore, the CKS is a recommended metric in the performance analysis of classifiers
(Ben-David, 2008). The kappa value is calculated using Equation 1:
$$\mathrm{CKS} = \frac{PR_a - PR_{ac}}{1 - PR_{ac}} \qquad (1)$$

where $PR_a$ represents the total agreement probability and $PR_{ac}$ represents the
probability of agreement ‘by chance’.
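The kappa statistic can be computed directly from a two-class confusion matrix, as in the following minimal sketch. The function name and the counts are hypothetical; the chance-agreement term uses the usual product of row and column marginals.

```python
def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa for a two-class confusion matrix."""
    total = tp + tn + fp + fn
    observed = (tp + tn) / total                    # total agreement probability
    # Agreement expected by chance, from predicted and actual class marginals.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return (observed - chance) / (1 - chance)


# Hypothetical counts, not results from the study.
print(round(cohen_kappa(tp=70, tn=22, fp=2, fn=4), 3))

# A classifier that labels every sample as PD agrees with the truth only by chance.
print(cohen_kappa(tp=5, tn=0, fp=5, fn=0))
```

The second call shows why kappa complements accuracy: predicting the same class for every sample yields 50% accuracy on that toy matrix but a kappa of zero.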
The results of the CKS analysis of the five popular machine-learning techniques and the
proposed model are shown in Table 7. These results show that the proposed model
attains the highest kappa value overall (0.88, on the 50-50 partition).
Table 7. Kappa statistic for each model

| Training-Testing Partition | LR | NB | DT | SVM | RF | ERF |
|---|---|---|---|---|---|---|
| 50-50 | 0.68 | 0.46 | 0.60 | 0.74 | 0.72 | 0.88 |
| 66-34 | 0.52 | 0.40 | 0.61 | 0.57 | 0.84 | 0.80 |
| 80-20 | 0.48 | 0.28 | 0.72 | 0.48 | 0.72 | 0.80 |
| 20-fold cross validation | 0.40 | 0.48 | 0.43 | 0.45 | 0.70 | 0.66 |

Table 8. PD dataset performance comparison

| Study (year) | Method | Classification Accuracy (%) | Sensitivity/Specificity | F-Measure | Kappa | ROC/AUC |
|---|---|---|---|---|---|---|
| Çağlar et al., 2010 | ANFC-LH | 94.72 | 0.88/1 | × | × | × |
| Çağlar et al., 2010 | MLPNN | 89.69 | 0.88/0.96 | × | × | × |
| Çağlar et al., 2010 | RBFNN | 87.63 | 0.88/0.96 | × | × | × |
| Ene, 2008 | PNN | 81.28 | × | × | × | × |
| Avci & Dogantekin, 2016 | GA-WK-ELM | 96.81 | 0.95/0.98 | × | × | × |
| Cai et al., 2018 | CBFO-FKNN | 96.97 | 0.97/0.99 | × | × | 0.98 |
| Ozcift & Gulten, 2011 | CFS-RF | 87.10 | × | × | × | 0.86 |
| Cai et al., 2017 | SVM-BFO | 97.42 | 0.99/0.92 | × | × | × |
| Chen et al., 2013 | PCA-FKNN | 96.07 | 0.96/0.96 | × | × | 0.96 |
| Kadam & Jadhav, 2019 | FESA-DNN | 92.19 | 0.97/0.90 | × | × | × |
| This study | ERF | 96 | 0.96/1 | 0.97 | 0.88 | 0.95 |

5. Conclusion
PD is one of the major causes of mortality worldwide, particularly in low- and middle-income
nations. Detecting PD necessitates a series of medical tests and their interpretation by experts.
When the findings of medical tests disagree, making a diagnosis becomes difficult, and it may
impair the consistency of diagnosis among physicians. In this case, the machine-learning
technique can identify patterns and give insights. Machine-learning approaches can identify the
risk of PD at an early stage based on elements of regular lifestyles and the results of a few
medical tests.
The proposed ensemble random forest classifier, like the stacking-ensemble approach, combines
the predictions of random forest classifiers, using RF as the meta-classifier as well. The PD
dataset contains biomedical voice measurements from 31 people, 23 of whom have PD. Each column
in the table represents a distinct voice measure, and each row represents one of the 195 voice
recordings made by these people ("name" column).
The classes in the dataset are distributed unequally. Such unbalanced datasets increase the risk of
a diagnosed sick patient being misdiagnosed as healthy (which is very severe) and decrease the
likelihood of a healthy patient being misdiagnosed.
The author compares the proposed classifier with two parametric classifiers (LR and NB) and
three non-parametric classifiers (SVM with radial basis kernel, DT, and RF). The comparison is
based on accuracy, sensitivity, specificity, F-measure, ROC, AUC, and the kappa statistic. The
author uses 50–50, 66–34, and 80–20% train-test splits, as well as 20-fold cross-validation,
and confirms that the training and testing datasets do not contain identical records of PD
patients. Non-parametric classifiers outperform parametric classifiers (whose learning is bound
by their assumptions), and the Random Forest classifier outperforms them all. The proposed
classifier outperforms the Random Forest classifier in terms of accuracy, with 96% accuracy and
the lowest false positives/negatives overall.

Akyol, K. (2017). A study on the diagnosis of Parkinson’s disease using digitized wacom
graphics tablet dataset. Int J Inf Technol Comput Sci, 9, 45-51.
American Parkinson Disease Association (APDA). Symptoms of Parkinson’s disease.
https://www.apdaparkinson.org/what-is-parkinsons/symptoms/ (2017). Accessed 21 Nov 2017.
Åström, F., & Koker, R. (2011). A parallel neural network approach to prediction of Parkinson’s
Disease. Expert systems with applications, 38(10), 12470-12474.
Avci, D., & Dogantekin, A. (2016). An expert diagnosis system for parkinson disease based on
genetic algorithm-wavelet kernel-extreme learning machine. Parkinson’s disease, 2016.
Ben-David, A. (2008). Comparison of classification accuracy using Cohen’s weighted
kappa. Expert Systems with Applications, 34(2), 825–832.
Bhasuran, B., Murugesan, G., Abdulkadhar, S., & Natarajan, J. (2016). Stacked ensemble
combined with fuzzy matching for biomedical named entity recognition of diseases.
Journal of biomedical informatics, 64, 1-9.
Çağlar, M. F., Çetişli, B., & Toprak, I. B. (2010). Automatic recognition of Parkinson’s
disease from sustained phonation tests using ANN and adaptive neuro-fuzzy classifier.
Mühendislik Bilimleri ve Tasarım Dergisi, 1(2), 59-64.
Cai, Z., Gu, J., Wen, C., Zhao, D., Huang, C., Huang, H., ... & Chen, H. (2018). An intelligent
Parkinson’s disease diagnostic system based on a chaotic bacterial foraging optimization
enhanced fuzzy KNN approach. Computational and mathematical methods in medicine,
2018.
Chaudhuri, A. K., Banerjee, D. K., & Das, A. (2021). A Dataset Centric Feature Selection and
Stacked Model to Detect Breast Cancer. International Journal of Intelligent Systems and
Applications (IJISA), 13(4), 24-37.
Chaudhuri, A. K., Sinha, D., Banerjee, D. K., & Das, A. (2021). A novel enhanced decision tree
model for detecting chronic kidney disease. Network Modeling Analysis in Health
Informatics and Bioinformatics, 10(1), 1-22.
Chaudhuri, A.K., Ray, A., Banerjee, D.K., & Das, A. (2021). A Multi-Stage Approach
Combining Feature Selection with Machine Learning Techniques for Higher Prediction
Reliability and Accuracy in Cervical Cancer Diagnosis. International Journal of
Intelligent Systems and Applications.
Chen, H. L., Huang, C. C., Yu, X. G., Xu, X., Sun, X., Wang, G., & Wang, S. J. (2013). An
efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest
neighbor approach. Expert systems with applications, 40(1), 263-271.
Daliri, M. R. (2013). Chi-square distance kernel of the gaits for the diagnosis of Parkinson's
disease. Biomedical Signal Processing and Control, 8(1), 66-70.
Das, R. (2010). A comparison of multiple classification methods for diagnosis of Parkinson
disease. Expert Systems with Applications, 37(2), 1568-1572.
Ene, M. (2008). Neural network-based approach to discriminate healthy people from those with
Parkinson's disease. Annals of the University of Craiova-Mathematics and Computer
Science Series, 35, 112-116.
Gallagher, D. A., Lees, A. J., & Schrag, A. (2010). What are the most important nonmotor
symptoms in patients with Parkinson's disease and are we missing them?. Movement
Disorders, 25(15), 2493-2500.
Guo, P. F., Bhattacharya, P., & Kharma, N. (2010, June). Advances in detecting Parkinson’s
disease. In International Conference on Medical Biometrics (pp. 306-314). Springer,
Berlin, Heidelberg.
Kadam, V. J., & Jadhav, S. M. (2019). Feature ensemble learning based on sparse autoencoders
for diagnosis of Parkinson’s disease. In Computing, communication and signal
processing (pp. 567-581). Springer, Singapore.
Langston, J. W. (2002). Parkinson’s disease: current and future challenges. Neurotoxicology,
23(4-5), 443-450.
Li, D. C., Liu, C. W., & Hu, S. C. (2011). A fuzzy-based data transformation for feature
extraction to increase classification performance with small medical data sets. Artificial
intelligence in medicine, 52(1), 45-52.
Little, M., McSharry, P., Hunter, E., Spielman, J., & Ramig, L. (2008). Suitability of dysphonia
measurements for telemonitoring of Parkinson’s disease. Nature Precedings, 1-1.
Little, M., McSharry, P., Roberts, S., Costello, D., & Moroz, I. (2007). Exploiting nonlinear
recurrence and fractal scaling properties for voice disorder detection. Nature Precedings,
1-1.
Luukka, P. (2011). Feature selection using fuzzy entropy measures with similarity classifier.
Expert Systems with Applications, 38(4), 4600-4607.
Meissner, W. G., Frasier, M., Gasser, T., Goetz, C. G., Lozano, A., Piccini, P., ... & Bezard, E.
(2011). Priorities in Parkinson's disease research. Nature reviews Drug discovery, 10(5),
377-393.
Mittel, C. S. (2003). Parkinson's disease: Overview and current abstracts.
Ozcift, A., & Gulten, A. (2011). Classifier ensemble construction with rotation forest to improve
medical diagnosis performance of machine learning algorithms. Computer methods and
programs in biomedicine, 104(3), 443-451.
Polat, K. (2012). Classification of Parkinson's disease using feature weighting method on the
basis of fuzzy C-means clustering. International Journal of Systems Science, 43(4), 597-
609.
Psorakis, I., Damoulas, T., & Girolami, M. A. (2010). Multiclass relevance vector machines:
sparsity and accuracy. IEEE Transactions on neural networks, 21(10), 1588-1598.
Ray, A., & Chaudhuri, A. K. (2021). Smart healthcare disease diagnosis and patient
management: Innovation, improvement and skill development. Machine Learning with
Applications, 3, 100011.
Sahu, B., & Mohanty, S. N. (2021). CMBA-SVM: a clinical approach for Parkinson disease
diagnosis. International Journal of Information Technology, 13(2), 647-655.
Sakar, C. O., & Kursun, O. (2010). Telediagnosis of Parkinson’s disease using measurements of
dysphonia. Journal of medical systems, 34(4), 591-599.
Schley, W. S., Fenton, E., & Niimi, S. (1982). Vocal symptoms in parkinson disease treated with
levodopa: a case report. Annals of Otology, Rhinology & Laryngology, 91(1), 119-121.
Shahbaba, B., & Neal, R. (2009). Nonlinear models using Dirichlet process mixtures. Journal of
Machine Learning Research, 10(8).
Sikora, R. (2015). A modified stacking ensemble machine learning algorithm using genetic
algorithms. In Handbook of Research on Organizational Transformations through Big
Data Analytics (pp. 43-53). IGI Global.
Soleymani, R., Granger, E., & Fumera, G. (2020). F-measure curves: A tool to visualize
classifier performance under imbalance. Pattern Recognition, 100, 107146.
Spadoto, A. A., Guido, R. C., Carnevali, F. L., Pagnin, A. F., Falcão, A. X., & Papa, J. P. (2011,
August). Improving Parkinson's disease identification through evolutionary-based feature
selection. In 2011 Annual International Conference of the IEEE Engineering in Medicine
and Biology Society (pp. 7857-7860). IEEE.
Vandewiele, G., Dehaene, I., Kovács, G., Sterckx, L., Janssens, O., Ongenae, F., ... &
Demeester, T. (2021). Overly optimistic prediction results on imbalanced data: a case
study of flaws and benefits when applying over-sampling. Artificial Intelligence in
Medicine, 111, 101987.
www.apdaparkinson.org/what-is-parkinsons/symptoms/ (2017). Accessed 21 Nov 2017
Zhang, Y. N. (2017). Can a smartphone diagnose parkinson disease? a deep neural network
method and telediagnosis system implementation. Parkinson’s disease, 2017.
Zuo, W. L., Wang, Z. Y., Liu, T., & Chen, H. L. (2013). Effective detection of Parkinson's
disease using an adaptive fuzzy k-nearest neighbor approach. Biomedical Signal
Processing and Control, 8(4), 364-373.

Declaration of Interests

The author declares that he has no known competing financial interests or personal relationships that
could have appeared to influence the work reported in this paper.

Highlights
- This study addresses Parkinson's disease, a disease associated with high mortality
- It proposes a novel improved random forest model that works on an ensemble of classifiers
- The model is found to be efficient compared with other classifiers and prior research on Parkinson's disease
- The model achieves 96% accuracy in the classification results
- The approach can significantly reduce medical costs for the prediction of Parkinson's disease
Cover Page Only

A Novel Diagnosis System for Parkinson Disease Based on Ensemble Random Forest

Dr. Avijit Kumar Chaudhuri

Computer Science and Engineering, Techno Engineering College Banipur, Kolkata, India

c.avijit@gmail.com