
COMPARATIVE ANALYSIS OF THE CLASSIFICATION PERFORMANCE OF MACHINE LEARNING CLASSIFIERS AND DEEP NEURAL NETWORK CLASSIFIER FOR PREDICTION OF PARKINSON DISEASE
AMIN UL HAQ, JIANPING LI, MUHAMMAD HAMMAD MEMON, JALALUDDIN KHAN, SALAH UD DIN,
IJAZ AHAD, RUINAN SUN, ZHILONG LAI

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
E-MAIL: khan.amin50@yahoo.com, jpli2222@uestc.edu.cn, muhammadhammadmemon@yahoo.com, jalal4amu@yahoo.com, swati_pak2003@yahoo.com, ijazahad1@gmail.com, rui_nan@outlook.com, 641449078@qq.com

Abstract: The accurate diagnosis of Parkinson disease, specifically in its initial stages, is extremely complex and time consuming. Thus, the accurate and efficient diagnosis of Parkinson disease has been a significant challenge for medical experts and researchers. To tackle this issue, we proposed a non-invasive prediction system based on machine learning and deep neural networks for the accurate and timely diagnosis of Parkinson disease. In developing the system, machine learning predictive models such as support vector machine, logistic regression and deep neural network were used to classify people with Parkinson disease and healthy people. The dataset was split into 70% for training and 30% for testing. Furthermore, performance evaluation metrics such as classification accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC) were used to evaluate the models. The Parkinson disease dataset of 23 attributes and 195 instances, available in the UCI machine learning repository, was used to test the proposed system. Our experimental results show that the proposed system classified people with Parkinson disease and healthy people effectively. We also found that the classification performance of the deep neural network was excellent compared to traditional machine learning classifiers. These findings suggest that the proposed diagnosis system could be used to accurately predict Parkinson disease.

Keywords: Parkinson disease; Classification; Machine learning classifier; Deep neural network; Diagnosis

978-1-7281-1536-8/18/$31.00 ©2018 IEEE 101

1. Introduction

Parkinson's disease (PD) is considered the second most common neurological disease in the world after Alzheimer's disease. PD is a progressive, long-term degenerative disorder of the central nervous system that badly affects numerous people, usually those above 60 years of age. The cells affected by PD do not have a consistent flow of dopamine to the motor system. Vocal impairment is hypothesized to be one of the initial signs of the disease [1]. It has been observed that people with Parkinsonism (PD) have vocal disorders. These disorders affect the volume of their speech and make syllables difficult to pronounce. Thus, vocal measurements can be used as an effective diagnostic tool for PD [2]. In 1817, PD was described as "shaking palsy" by Doctor James Parkinson [3]. Doctor J. Parkinson described six cases, three of which he examined himself. In [4], the disease was earlier referred to as "paralysis agitans". In the 19th century, Charcot credited Parkinson by naming the disease "maladie de Parkinson", or PD. He also diagnosed the tremulous form of PD and accurately characterized its slowness of motion, which Parkinson had described as weakness or "lessened muscular power". After Parkinson's original description, it was recognized that a patient with PD loses cells in the substantia nigra. According to [5], PD diagnosis is typically based on a few invasive techniques and empirical tests and examinations. Invasive techniques for diagnosing PD are very expensive and require very complex equipment. They are also less efficient, and their accuracy is not satisfactory.

New techniques and methods are therefore needed to diagnose PD. Less expensive, simpler and reliable methods should be adopted to diagnose the disease and ensure treatment, so non-invasive diagnosis techniques for PD need to be investigated. Machine learning techniques are used to classify PD and healthy people. As noted above, people with Parkinsonism have vocal disorders that affect the volume of their speech and make syllables difficult to pronounce [2]. These vocal disorders can be assessed for early PD diagnosis [6]. PD diagnosis and monitoring through
speech signals is more effective and reliable. In addition, telemonitoring systems that use speech signals permit remote monitoring of PD. The use of speech signals is therefore a promising method for diagnosing PD from speech impairments and for classifying PD and healthy people. In the literature, different machine learning based classification techniques have been proposed to classify PD and healthy people using speech signals; a few of them are discussed in this study.

In [7], 132 features were extracted from speech signals by using dysphonia measures. The database only contained vowels. Feature selection (FS) algorithms were then applied, namely least absolute shrinkage and selection operator (LASSO), minimal redundancy maximal relevance (mRMR), Relief, and local learning based feature selection (LLBFS). These algorithms selected 10 of the 132 features. The 10 selected features were then used as input parameters for classification with two machine learning algorithms, i.e. random forests and support vector machines. In another study, Tsanas et al. [8] processed speech signals of people with Parkinson's (PWP) to compute a relation between the severity of PD and speech disorder.

In [9], the proposed system was a machine learning based system using speech signals. Four feature selection algorithms (LASSO, Relief, LLBFS and mRMR) were used to filter out the most appropriate features from the dataset. Classifiers such as AdaBoost, SVM, k-NN, multi-layer perceptron (MLP) and Naïve Bayes (NB) were then applied to classify PD and healthy subjects. Two validation techniques (k-fold and LOSO) were used for the correct classification of PWP. The system performance was evaluated with metrics such as accuracy, sensitivity, specificity and MCC. The computational complexity of the algorithms was also computed, and the system was tested on a PD dataset that contained multiple types of speech signals. Zhennao Cai et al. [10] proposed a new intelligent framework for the diagnosis of Parkinson's disease. They used an SVM classifier and the Relief feature selection algorithm with bacterial foraging optimization (BFO), and achieved the best classification performance. Naranjo et al. [11] proposed a classification system for Parkinson's disease diagnosis based on a two-stage variable selection and classification approach using voice recording replications, and obtained the best classification performance.

The performance of conventional machine learning classifiers sometimes depends on feature selection. These classifiers therefore require feature selection algorithms to choose appropriate features. Deep neural networks (DNNs), by contrast, automatically perform the job of feature selection algorithms. Deep neural networks perform well on classification problems, and many researchers have applied deep neural techniques to the classification of diseases. Deep neural networks are an effective classifier for distinguishing PD from healthy subjects, and various research works have used them for this purpose. Ali H. Al-Fatlawi et al. [12] used a deep belief network (DBN) to diagnose Parkinson's disease on dataset [10] and achieved an accuracy of 94%. Abdullah Caliskan et al. [13] proposed a system for PD prediction using deep neural networks and obtained high classification accuracy.

In this paper, we propose a system that uses speech signals to diagnose people with PD successfully, which could make patients' lives more comfortable. Machine learning predictive classifiers and a deep neural network classifier were used to classify PD and healthy people. Their performance was tested on a PD dataset that is available in the University of California Irvine (UCI) machine learning repository [14] and that was also used by other researchers [10]. The dataset was divided into 70% for training and 30% for testing. Performance evaluation metrics such as classification accuracy, sensitivity, specificity and MCC were computed in order to check the performance of the proposed system.

The remainder of the paper is organized as follows. Section 2 briefly discusses the PD dataset, the machine learning classifiers and the deep neural network, together with the validation technique and the performance evaluation metrics. In Section 3, the experimental results are analyzed and discussed in detail. Section 4 presents the conclusions and future work.

2. Materials and Method

2.1. Dataset

The dataset used in this study was created by Max Little et al. [14, 15]. It was collected at the University of Oxford in collaboration with the National Center for Voice and Speech and is available at the UCI machine learning repository. The original study published the feature extraction methods for general voice disorders. The study used voice recordings of 31 people: 23 people with Parkinson's disease (16 males and 7 females) and 8 healthy controls (3 males and 5 females). In the dataset table, each column corresponds to a particular voice measure and each row corresponds to one of the 195 voice recordings from these individuals. The subjects ranged from 46 to 85 years of age, with a mean age of 65.8 years and a standard deviation of 9.8. The main aim of this dataset was to discriminate people with Parkinson's disease from healthy people by finding differences in their vowel vocalization, according to the "status"
column, which is set to 0 for healthy people and 1 for PD. For each subject, an average of 6 phonations of a vowel were recorded for 36 seconds, giving a total of 195 samples. The phonations were recorded in an Industrial Acoustics Company (IAC) sound-treated booth. The microphone was placed 8 cm from the mouth and was calibrated as described in [22]. The voice signals were recorded on a computer using the Computerized Speech Laboratory.

2.2. Methodology of proposed system

The proposed system was developed to classify people with Parkinson's disease and healthy people. The system uses machine learning predictive classifiers (SVM, logistic regression and k-NN) and a deep neural network classifier to classify PD and healthy people. It was tested on the PD dataset available online at the UCI machine learning repository. The dataset was divided into 70% for training and 30% for testing. Four performance evaluation metrics were used to evaluate the predictive models. The methodology of the proposed classification system consists of the following steps: preprocessing of the dataset, data partition, machine learning and deep neural network classification, and classifier performance evaluation. Fig.1 shows the framework of the proposed system.

Fig.1 Proposed system model

2.2.1. Data preprocessing

Data preprocessing is a necessary step to obtain a good representation of the data, so that the machine learning classifiers can be trained and tested on it effectively. Preprocessing techniques such as removal of missing values, standard scaling and Min-Max scaling were applied so that the dataset could be used effectively by the classifiers. Standard scaling ensures that every feature has mean 0 and variance 1 ($z = (x - \mu)/\sigma$), bringing all features to the same scale. Similarly, Min-Max scaling shifts and rescales the data so that all features lie between 0 and 1 ($x' = (x - x_{\min})/(x_{\max} - x_{\min})$).

2.2.2. Machine learning and deep neural network classifiers

In this study, the following classifiers were used to classify PD and healthy people. Brief theoretical and mathematical backgrounds of these classifiers are presented here.

2.2.2.1. Support vector machine

The support vector machine (SVM) is a machine learning algorithm that has mostly been used for classification problems [16][23-27]. SVM uses a maximum-margin strategy that is transformed into a complex quadratic programming problem. Owing to its high classification performance, SVM has been widely applied in various applications [26-27]. In a binary classification problem, the instances are separated by a hyperplane $w^{T}x + b = 0$, where $w$ is a $d$-dimensional coefficient vector normal to the hyperplane, $b$ is the offset from the origin, and $x$ denotes the data values. The SVM obtains $w$ and $b$. In the linear case, $w$ can be solved for by introducing Lagrange multipliers. The data points on the margin borders are called support vectors. The solution for $w$ can be expressed as in equation (1).

$$w = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{1}$$

where $n$ is the number of support vectors, $\alpha_i$ are the Lagrange multipliers and $y_i$ are the target labels of $x_i$. Once $w$ and $b$ are calculated, the linear discriminant function can be written as in equation (2).

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i x_i^{T} x + b\right) \tag{2}$$

In the non-linear case, the kernel trick gives the decision function in equation (3).

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right) \tag{3}$$

Positive semi-definite functions that satisfy Mercer's condition can be used as kernel functions [25]. One example is the polynomial kernel in equation (4), where $d$ denotes the polynomial degree.

$$K(x, x_i) = \left(x^{T} x_i + 1\right)^{d} \tag{4}$$

Another is the Gaussian kernel in equation (5).

$$K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^{2}\right) \tag{5}$$

2.2.2.2. Logistic regression

Logistic regression is a classification algorithm [28-30]. In a binary classification problem, it predicts the value of the target variable $y \in \{0, 1\}$, where 0 is the negative class and 1 is the positive class. It is also used for multi-class classification, for example to predict $y \in \{0, 1, 2, 3\}$. To classify the two classes 0 and 1, a hypothesis $h_\theta(x)$ is built on the linear score $\theta^{T}x$, and its output is thresholded at 0.5. If $h_\theta(x) \ge 0.5$, the classifier predicts $y = 1$, meaning that the person has PD. If $h_\theta(x) < 0.5$, it predicts $y = 0$, meaning that the person is healthy. The output of logistic regression therefore always satisfies $0 \le h_\theta(x) \le 1$.

The logistic regression sigmoid function is written as in equation (6).
$$h_\theta(x) = g(\theta^{T} x), \qquad g(z) = \frac{1}{1 + e^{-z}} \tag{6}$$

Similarly, the logistic regression cost function can be written as in equation (7), where $m$ is the number of training examples.

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) \tag{7}$$

2.2.2.3. K-Nearest Neighbor

The K-NN classifier was used in this study to classify people with PD and healthy people [31]. To predict the class label of a new input, K-NN uses the similarity between the new input and the samples in the training set. Its classification performance therefore depends on how closely the new input resembles those training samples. Let (x, y) be the training observations and h: X → Y the learning function, so that given an observation x, h(x) can determine the value of y [32]. The two output class values of K-NN correspond to people with PD and healthy people. Here k is an integer, and eight values were tested, k ∈ {1, 2, 3, 4, 5, 6, 7, 8}. The K-NN classifier is appropriate for this classification problem because it does not need additional parameters.

2.2.2.4. Deep neural network

A DNN is a feed-forward artificial neural network that has more than one layer of hidden units between its input layer and its output layer. Each hidden unit $j$ typically uses the logistic function to map its total input from the layer below, $x_j$, to the scalar state $y_j$ that it sends to the layer above [33], as shown in equation (8).

$$y_j = \mathrm{logistic}(x_j) = \frac{1}{1 + e^{-x_j}}, \qquad x_j = b_j + \sum_i y_i w_{ij} \tag{8}$$

where $b_j$ is the bias of unit $j$, $i$ is an index over units in the layer below, and $w_{ij}$ is the weight on the connection to unit $j$ from unit $i$ in the layer below. In binary classification, output unit $j$ converts its total input $x_j$ into a class probability $p_j$ by using the softmax nonlinearity, as expressed in equation (9).

$$p_j = \frac{\exp(x_j)}{\sum_k \exp(x_k)} \tag{9}$$

where $k$ is an index over classes. DNNs can be discriminatively trained by back-propagating derivatives of a cost function that measures the discrepancy between the target outputs and the actual outputs produced for each training case [34]. When the softmax output function is used, the natural cost function $C$ is the cross-entropy between the target probabilities $d$ and the softmax outputs $p$, as shown in equation (10).

$$C = -\sum_j d_j \log p_j \tag{10}$$

where the target probabilities, typically taking values of one or zero, are the supervised information provided to train the DNN classifier.

2.2.3. Data Partition

The dataset was divided into 70% for training and 30% for testing.

2.2.4. Performance Evaluation Metrics

Evaluation metrics are used to evaluate the performance of classifiers, and four such metrics were used in this study. Table 1 shows the confusion matrix of the binary classification problem.

Table 1 Confusion Matrix [29]
                         Predicted PD Patient   Predicted Healthy Person
Actual PD Patient        TP                     FN
Actual Healthy Person    FP                     TN

From Table 1, we compute the metrics expressed in equations (11), (12), (13) and (14), respectively. The four counts are defined as follows.
TP (true positive): a PD subject is classified as PD.
TN (true negative): a healthy subject is classified as healthy.
FP (false positive): a healthy subject is classified as PD.
FN (false negative): a PD subject is classified as healthy.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \tag{11}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100 \tag{12}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \times 100 \tag{13}$$

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \times 100 \tag{14}$$

3. Experimental results analysis and discussion

In this section, the machine learning classifiers and the DNN classifier were used to predict PD. The PD dataset was divided into 70% for training and 30% for testing. Performance evaluation metrics were computed to check the performance of the classifiers. All features were normalized and standardized before being applied to the classifiers. All computation was performed in Python on an Intel(R) Core™ i5-2410M CPU @ 3.10 GHz with 4 GB RAM, running Windows 10.

Experiment: Results of the different machine learning classifiers and the DNN classifier for prediction of PD.

In this experiment, PD was diagnosed using machine learning classifiers such as logistic regression, SVM and k-NN.
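The preprocessing, partition and evaluation steps of Sections 2.2.1, 2.2.3 and 2.2.4 can be illustrated with a minimal plain-Python sketch. It uses K-NN with k=5, the K-NN setting reported in Table 2, as the example classifier. This is not the authors' code, and several parts are assumptions:
- All function names are hypothetical.
- K-NN uses Euclidean distance, since the paper does not state the distance measure.
- The 70/30 split is a plain seeded shuffle, since the paper does not say whether it was stratified.
- The demo runs on synthetic toy data rather than the UCI PD dataset, so it reproduces none of the reported results.

```python
import math
import random


def min_max_fit(rows):
    """Learn per-feature minima and maxima for Min-Max scaling (Section 2.2.1)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_transform(rows, lows, highs):
    """Map each feature to [0, 1] with the fitted minima/maxima; constant features become 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split_70_30(X, y, seed=0):
    """Seeded random shuffle, then 70% of samples for training and 30% for testing (assumed unstratified)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.7 * len(idx)))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote (1 = PD, 0 = healthy) among the k nearest training samples.

    Euclidean distance is an assumption; a tie (possible only for even k) returns 0.
    """
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    votes_pd = sum(y_train[i] for i in nearest)
    return 1 if 2 * votes_pd > len(nearest) else 0


def confusion_counts(y_true, y_pred):
    """TP, TN, FP, FN as defined for Table 1 (1 = PD, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(tp, tn, fp, fn):
    """Equations (11)-(14), each multiplied by 100; an undefined ratio is reported as 0."""
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total * 100 if total else 0.0,
        "sensitivity": tp / (tp + fn) * 100 if tp + fn else 0.0,
        "specificity": tn / (tn + fp) * 100 if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom * 100 if denom else 0.0,
    }


if __name__ == "__main__":
    # Synthetic toy data only (NOT the UCI PD dataset): two Gaussian clusters, 2 features.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
    X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(30)]
    y = [0] * 30 + [1] * 30
    X_tr, y_tr, X_te, y_te = split_70_30(X, y)
    lows, highs = min_max_fit(X_tr)  # fit the scaler on the training split only
    X_tr = min_max_transform(X_tr, lows, highs)
    X_te = min_max_transform(X_te, lows, highs)
    y_hat = [knn_predict(X_tr, y_tr, x, k=5) for x in X_te]
    print(evaluate(*confusion_counts(y_te, y_hat)))
```

In this sketch the Min-Max scaler is fitted on the training split only and then applied to the test split, which keeps test-set statistics out of training. `evaluate(*confusion_counts(y_true, y_pred))` returns the four percentages defined by equations (11)-(14) for any pair of label lists.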
The DNN classifier used a sequential model with 3 layers:
- Layer 1 is the input layer with 22 units.
- Layer 2 is a hidden layer with two units.
- Layer 3 is the output layer with one unit.

The sigmoid activation function was used for activation, and the softmax function was also used. The performance evaluation metrics (classification accuracy, specificity, sensitivity and MCC) were computed automatically to evaluate these classifiers. They are tabulated in Table 2, which shows the performance of each classifier.

Table 2 The performance evaluation of classifiers
Classifier            Parameters       Accuracy (%)   Specificity (%)   Sensitivity (%)   MCC
Logistic regression   C=10             91             70                95                83
SVM                   C=10, γ=0.025    90             63                97                80
K-NN                  K=5              94             78                97                88
DNN                   -                98             95                99                95

According to Table 2, logistic regression achieved a classification accuracy of 91%, a sensitivity of 95% and a specificity of 70%. SVM obtained 90% accuracy, 63% specificity and 97% sensitivity. K-NN performed better than both logistic regression and SVM: at k=5 it obtained 94% accuracy, 78% specificity and 97% sensitivity. Hence, K-NN is the best of the three machine learning classifiers for PD diagnosis. The DNN outperformed all the machine learning classifiers, obtaining 98% classification accuracy, 95% specificity and 99% sensitivity. Hence, for PD diagnosis the DNN is more effective than the traditional machine learning classifiers. The performance of these classifiers is shown graphically in Fig.2.

[Fig.2 chart titled "Performance of classifiers"; y-axis: Performance rate; x-axis: Classifiers (Logistic regression, SVM, K-NN, DNN); series: Accuracy, Specificity, Sensitivity]
Fig.2 Performances of classifiers

4. Conclusions

In this study, we compared the classification performance of machine learning classifiers (logistic regression, SVM and K-NN) with that of a deep neural network classifier for PD diagnosis. The experimental results reported in Table 2 and shown graphically in Fig.2 show that the deep neural network performed excellently compared to the machine learning classifiers. The DNN obtained 98% classification accuracy. Its 95% specificity shows that the DNN is the best classifier for detecting healthy people, and its 99% sensitivity shows that it is good at detecting PD. From these findings, we suggest that the deep neural network classifier is the best choice for PD diagnosis and could easily be incorporated into the design of a diagnosis system for PD.

Acknowledgements

This paper was supported by the National Natural Science Foundation of China (Grant No. 61370073), the National High Technology Research and Development Program of China (Grant No. 2007AA01Z423), the project of Science and Technology Department of Sichuan Province, and Chengdu Civil-military Integration Project Management Co., Ltd.

References

[1] J. R. Duffy, "Motor Speech Disorders: Substrates, Differential Diagnosis, and Management," Elsevier, 2005.
[2] B. E. Sakar, et al., "Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings," IEEE Journal of Biomedical and Health Informatics, vol. 17, pp. 828-834, 2013.
[3] J. Jankovic, "Parkinson's disease: clinical features and diagnosis," Journal of Neurology, Neurosurgery & Psychiatry, vol. 79, pp. 368-376, 2008.
[4] P. A. Kempster, B. Hurwitz and A. J. Lees, "A new look at James Parkinson's essay on the shaking palsy," Neurology, 2007.
[5] National Collaborating Centre for Chronic Conditions, "Parkinson's disease: national clinical guideline for diagnosis and management in primary and secondary care," 2006.
[6] B. Harel, et al., "Variability in fundamental frequency during speech in prodromal and incipient Parkinson's disease: A longitudinal case study," Brain and Cognition, vol. 56, pp. 24-29, 2004.
[7] A. Tsanas, et al., "Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson's disease symptom severity," Journal of the Royal Society Interface, vol. 8, pp. 842-855, 2011.
[8] A. Tsanas, et al., "Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests," 2008.
[9] İ. Cantürk and F. Karabiber, "A Machine Learning System for the Diagnosis of Parkinson's Disease from Speech Signals and Its Application to Multiple Speech Signal Types," Arabian Journal for Science and Engineering, vol. 41, pp. 5049-5059, 2016.
[10] Z. Cai, et al., "A new hybrid intelligent framework for predicting Parkinson's disease," IEEE Access, vol. 5, pp. 17188-17200, 2017.
[11] L. Naranjo, et al., "A two-stage variable selection and classification approach for Parkinson's disease detection by using voice recording replications," Computer Methods and Programs in Biomedicine, vol. 142, pp. 147-156, 2017.
[12] A. H. Al-Fatlawi, et al., "Efficient diagnosis system for Parkinson's disease using deep belief network," in IEEE Congress on Evolutionary Computation, 2016, pp. 1-8.
[13] A. Caliskan, et al., "Diagnosis of the Parkinson disease by using deep neural network classifier," Istanbul University-Journal of Electrical & Electronics Engineering, vol. 17, pp. 3311-3319, 2017.
[14] M. Lichman, "UCI Machine Learning Repository," School of Information and Computer Science, University of California, Irvine, 2013.
[15] M. A. Little, et al., "Suitability of dysphonia measurements for telemonitoring of Parkinson's disease," IEEE Transactions on Biomedical Engineering, vol. 56, pp. 1015-1022, 2009.
[16] J. G. Švec, et al., "Measurement of vocal doses in speech: Experimental procedure and signal processing," Logopedics Phoniatrics Vocology, vol. 28, pp. 181-192, 2003.
[17] N. Cristianini and J. Shawe-Taylor, "An introduction to support vector machines," Cambridge, United Kingdom: Cambridge University Press, 2000.
[18] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, p. 27, 2011.
[19] H.-L. Chen, et al., "A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis," Expert Systems with Applications, vol. 38, pp. 9014-9022, 2011.
[20] J. Mourao-Miranda, et al., "Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data," NeuroImage, vol. 28, pp. 980-995, 2005.
[21] V. D. Sánchez A, "Advanced support vector machines and kernel methods," Neurocomputing, vol. 55, pp. 5-20, 2003.
[22] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies, Springer, 2015, pp. 311-325.
[23] K. Larsen, et al., "Interpreting parameters in the logistic regression model with random effects," Biometrics, vol. 56, pp. 909-914, 2000.
[24] V. Vapnik, The Nature of Statistical Learning Theory, Springer Science & Business Media, 2013.
[25] N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," The American Statistician, vol. 46, pp. 175-185, 1992.
[26] X. Wu, et al., "Top 10 algorithms in data mining," Knowledge and Information Systems, vol. 14, pp. 1-37, 4 December 2008.
[27] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, pp. 82-97, 15 October 2012.
[28] D. E. Rumelhart, et al., "Learning representations by back-propagating errors," Nature, vol. 323, p. 533, 9 October 1986.
[29] A. U. Haq, et al., "A Hybrid Intelligent System Framework for the Prediction of Heart Disease Using Machine Learning Algorithms," Mobile Information Systems, vol. 2018, 2 December 2018.
