School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
E-mail: khan.amin50@yahoo.com, jpli2222@uestc.edu.cn, muhammadhammadmemon@yahoo.com, jalal4amu@yahoo.com, swati_pak2003@yahoo.com, ijazahad1@gmail.com, rui_nan@outlook.com, 641449078@qq.com
column which is set to 0 for healthy and 1 for PD. For each subject an average of six phonations of a vowel, each 36 seconds long, was recorded, and in total 195 samples were collected. The phonations were recorded in an Industrial Acoustics Company (IAC) sound-treated booth using a microphone placed at a distance of 8 cm from the mouth; the microphone was calibrated as described in [22]. The voice signals were recorded to a computer using the Computerized Speech Laboratory.

2.2. Methodology of the proposed system

The proposed system has been developed with the aim of classifying people with Parkinson's disease and healthy people. In the system design, machine learning predictive classifiers such as SVM, logistic regression, k-NN and a deep neural network classifier were used to classify PD and healthy subjects. The PD dataset, available online at the UCI machine learning repository, was used to test the proposed system. The dataset was divided into 70% for training and 30% for testing. Four performance evaluation metrics were used to evaluate the predictive models. The methodology of the proposed classification system is organized into these steps: preprocessing of the dataset, data partition, machine learning and deep neural network classification, and performance evaluation of the classifiers. Fig. 1 shows the framework of the proposed system.

Fig. 1 Proposed system model

2.2.1. Data preprocessing

Preprocessing of the data is a necessary step for a good representation of the data, so that machine-learning classifiers can be trained and tested on it effectively. Preprocessing techniques such as removal of missing values, the standard scaler and the Min-Max scaler were applied to the dataset for effective use in the classifiers. The standard scaler ensures that every feature has mean 0 and variance 1, bringing all features to the same scale. Similarly, the Min-Max scaler shifts the data such that all features lie between 0 and 1.

2.2.2. Machine learning and deep neural network classifiers

In this study the following classifiers were used for PD and healthy people classification. Here the brief theoretical and mathematical backgrounds of these classifiers are presented.

2.2.2.1. Support vector machine

The support vector machine is a machine learning algorithm which has mostly been used for classification problems [16][23-27]. SVM uses a maximum-margin strategy that is transformed into solving a complex quadratic programming problem. Owing to its high classification performance, SVM has been widely applied in various applications [26-27]. In a binary classification problem, the instances are separated with a hyperplane w^T x + b = 0, where w is a d-dimensional coefficient vector normal to the hyperplane, b is the offset value from the origin, and x denotes the data points. SVM computes w and b; in the linear case, w can be solved by introducing Lagrange multipliers. The data points on the margin borders are called support vectors. The solution for w can be expressed as in equation (1).

w = ∑_{i=1}^{n} α_i y_i x_i    (1)

Where n is the number of support vectors and y_i is the target label of x_i. Once the values of w and b are calculated, the linear discriminant function can be written as in equation (2).

f(x) = sgn(∑_{i=1}^{n} α_i y_i x_i^T x + b)    (2)

In the non-linear scenario, using the kernel trick, the decision function can be written as in equation (3).

f(x) = sgn(∑_{i=1}^{n} α_i y_i K(x_i, x) + b)    (3)

Positive semi-definite functions that obey Mercer's condition can serve as kernel functions [25], such as the polynomial kernel expressed in equation (4)

K(x, x_i) = ((x^T x_i) + 1)^d    (4)

and the Gaussian kernel expressed in equation (5).

K(x, x_i) = exp(−γ ||x − x_i||^2)    (5)

2.2.2.2. Logistic regression

Logistic regression is a classification algorithm [28-30] used in binary classification problems to predict the value of a predictive variable y, where y ∈ {0, 1}, 0 being the negative class and 1 the positive class. It is also used for multi-class classification to predict the value of y when y ∈ {0, 1, 2, 3}. In order to classify the two classes 0 and 1, a hypothesis h_θ(x) = θ^T x is designed and the classifier output h_θ(x) is thresholded at 0.5. If h_θ(x) ≥ 0.5, the model predicts y = 1, meaning that the person has PD; if h_θ(x) < 0.5, it predicts y = 0, which shows that the person is healthy. Hence the prediction of logistic regression lies under the condition 0 ≤ h_θ(x) ≤ 1. The logistic regression sigmoid function is written as in equation (6).
h_θ(x) = g(θ^T x)    (6)

Where g(z) = 1/(1 + e^{−z}).

Similarly, the logistic regression cost function can be written as in equation (7).

J(θ) = (1/m) ∑_{i=1}^{m} Cost(h_θ(x^{(i)}), y^{(i)})    (7)

2.2.2.3. K-nearest neighbors

In order to classify people with PD and healthy people, the k-NN classifier was also used in this study [31]. The k-NN classifier predicts the class label of a new input from the similarity of the new input to the samples in the training set. If the new input is not similar to the samples in the training set, the classification performance of k-NN is not good. Let (x, y) be the training observations and h: X → Y the learning function, so that given an observation x, h(x) can determine the value of y [32]. The two output prediction class values of k-NN are associated with people with PD or healthy people, where k is an integer. Eight different values of k were used, k = 1, 2, ..., 8. The k-NN classifier is appropriate for the classification of such a problem because it needs no additional parameters.

2.2.2.4. Deep neural network

A DNN is a feed-forward artificial neural network that has more than one layer of hidden units between its input layer and its output. Each hidden unit j typically uses the logistic function to map its total input from the layer below, x_j, to the scalar state y_j that it sends to the layer above [33], as shown in equation (8).

y_j = logistic(x_j) = 1/(1 + e^{−x_j}),  x_j = b_j + ∑_i y_i w_ij    (8)

Where b_j is the bias of unit j, i is an index over units in the layer below, and w_ij is the weight on the connection to unit j from unit i in the layer below. In binary classification, output unit j converts its total input x_j into a class probability p_j by using the 'softmax' non-linearity, which is expressed in equation (9).

p_j = exp(x_j) / ∑_k exp(x_k)    (9)

Where k is an index over the classes. DNNs can be discriminatively trained by back-propagating derivatives of a cost function that measures the discrepancy between the target outputs and the actual outputs produced for each training case [34]. When using the softmax output function, the natural cost function C is the cross-entropy between the target probabilities d and the outputs of the softmax p, as shown in equation (10).

C = −∑_j d_j log p_j    (10)

Where the target probabilities, typically taking values of one or zero, are the supervised information provided to train the DNN classifier.

2.2.3. Data partition

The dataset was divided into 70% for training and 30% for testing.

2.2.4. Classifiers performance evaluation

Evaluation metrics were used to evaluate the performance of the classifiers. In this study four performance evaluation metrics were used. Table 1 shows the confusion matrix of a binary classification problem.

Table 1 Confusion matrix [29]

                          Predicted PD patient    Predicted healthy person
Actual PD patient                  TP                        FN
Actual healthy person              FP                        TN

TP: true positive, a PD subject classified as PD.
TN: true negative, a healthy subject classified as healthy.
FP: false positive, a healthy subject classified as PD.
FN: false negative, a PD subject classified as healthy.

From Table 1 we compute the following metrics, as expressed in equations (11), (12), (13) and (14) respectively.

Accuracy = (TP + TN)/(TP + TN + FP + FN) × 100    (11)

Sensitivity = TP/(TP + FN) × 100    (12)

Specificity = TN/(TN + FP) × 100    (13)

MCC = (TP × TN − FP × FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)) × 100    (14)

3. Experimental results analysis and discussion

In this section machine learning classifiers and the DNN classifier were used for the prediction of PD. The PD dataset used in this study was divided into 70% for training and 30% for testing. In order to check the performance of the classifiers, the performance evaluation metrics were computed. All features were normalized and standardized before being applied to the classifiers. All computation was performed in Python on an Intel(R) Core™ i5-2410M CPU @ 3.10 GHz, 4 GB RAM, Windows 10.

Experiment: Results of the different machine learning classifiers and the DNN classifier for the prediction of PD.

In this experiment PD was diagnosed by using machine learning classifiers such as logistic regression, SVM and k-NN, and the DNN classifier. The DNN classifier used a sequential model with 3 layers: layer 1 is the input layer with 22 units, layer 2 is a hidden layer with two units, and layer 3 is the output layer with one unit; the sigmoid activation function was used for activation and the soft-max function was used at the output. The performance evaluation metrics classification accuracy, specificity, sensitivity and MCC were automatically computed in order to evaluate the performance of these classifiers and are tabulated in Table 2. Hence Table 2 shows the performances of these classifiers.

Table 2 The performance evaluation of classifiers

[Figure: "Performance of classifiers" — bar chart of accuracy, specificity and sensitivity (performance rate) for each classifier.]
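The four metrics in equations (11)–(14) map directly onto the confusion-matrix counts of Table 1. The sketch below is a minimal pure-Python illustration of that computation; the TP/TN/FP/FN counts in the example call are hypothetical and are not results from this study.

```python
import math

def evaluation_metrics(tp, tn, fp, fn):
    """Compute the four metrics of equations (11)-(14) as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100                 # equation (11)
    sensitivity = tp / (tp + fn) * 100                               # equation (12)
    specificity = tn / (tn + fp) * 100                               # equation (13)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) * 100         # equation (14)
    return accuracy, sensitivity, specificity, mcc

# Hypothetical confusion-matrix counts, for illustration only.
acc, sen, spe, mcc = evaluation_metrics(tp=40, tn=12, fp=2, fn=5)
```

Note that equation (14) scales MCC by 100 so that it sits on the same percentage scale as the other three metrics; in its usual definition MCC lies in the range [−1, 1].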
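The DNN forward pass described by equations (8)–(10) can be traced with a small numeric sketch. The inputs, weights and biases below are made-up values chosen only to illustrate the computation; they do not come from the trained model of this study.

```python
import math

def logistic(z):
    # The logistic squashing function of equation (8): 1 / (1 + e^{-z}).
    return 1.0 / (1.0 + math.exp(-z))

def hidden_unit(inputs, weights, bias):
    # Equation (8): x_j = b_j + sum_i y_i * w_ij, then y_j = logistic(x_j).
    return logistic(bias + sum(y * w for y, w in zip(inputs, weights)))

def softmax(xs):
    # Equation (9): p_j = exp(x_j) / sum_k exp(x_k).
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, probs):
    # Equation (10): C = -sum_j d_j * log(p_j).
    return -sum(d * math.log(p) for d, p in zip(targets, probs))

# Made-up values, purely to trace the computation once.
h = hidden_unit(inputs=[0.5, -1.0], weights=[0.8, 0.3], bias=0.1)   # one hidden unit state
p = softmax([2.0, 1.0])                                             # two-class output probabilities
c = cross_entropy([1.0, 0.0], p)                                    # cost when the first class is the target
```

The softmax output always sums to one, and the cross-entropy cost shrinks toward zero as the probability assigned to the target class grows, which is what makes it the natural cost function for back-propagation training.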
References
[2] B. E. Sakar, et al., "Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings," IEEE Journal of Biomedical and Health Informatics, vol. 17, pp. 828-834, 2013.
[3] J. Jankovic, "Parkinson's disease: clinical features and diagnosis," Journal of Neurology, Neurosurgery & Psychiatry, vol. 79, pp. 368-376, 2008.
[4] P. A. Kempster, B. Hurwitz, and A. J. Lees, "A new look at James Parkinson's essay on the shaking palsy," Neurology, 2007.
[5] National Collaborating Centre for Chronic Conditions, "Parkinson's disease: national clinical guideline for diagnosis and management in primary and secondary care," 2006.
[6] B. Harel, et al., "Variability in fundamental frequency during speech in prodromal and incipient Parkinson's disease: A longitudinal case study," Brain and Cognition, vol. 56, pp. 24-29, 2004.
[7] A. Tsanas, et al., "Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson's disease symptom severity," Journal of the Royal Society Interface, vol. 8, pp. 842-855, 2011.
[8] A. Tsanas, et al., "Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests," 2008.
[9] İ. Cantürk and F. Karabiber, "A Machine Learning System for the Diagnosis of Parkinson's Disease from Speech Signals and Its Application to Multiple Speech Signal Types," Arabian Journal for Science and Engineering, vol. 41, pp. 5049-5059, 2016.
[10] Z. Cai, et al., "A new hybrid intelligent framework for predicting Parkinson's disease," IEEE Access, vol. 5, pp. 17188-17200, 2017.
[11] L. Naranjo, et al., "A two-stage variable selection and classification approach for Parkinson's disease detection by using voice recording replications," Computer Methods and Programs in Biomedicine, vol. 142, pp. 147-156, 2017.
[12] A. H. Al-Fatlawi, et al., "Efficient diagnosis system for Parkinson's disease using deep belief network," in IEEE Congress on Evolutionary Computation, 2016, pp. 1-8.
[13] A. Caliskan, et al., "Diagnosis of the Parkinson disease by using deep neural network classifier," Istanbul University-Journal of Electrical & Electronics Engineering, vol. 17, pp. 3311-3319, 2017.
[14] M. Lichman, UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2013.
[15] M. A. Little, et al., "Suitability of dysphonia measurements for telemonitoring of Parkinson's disease," IEEE Transactions on Biomedical Engineering, vol. 56, pp. 1015-1022, 2009.
[16] J. G. Švec, et al., "Measurement of vocal doses in speech: Experimental procedure and signal processing," Logopedics Phoniatrics Vocology, vol. 28, pp. 181-192, 2003.
[17] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines. Cambridge, United Kingdom: Cambridge University Press, 2000.
[18] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, p. 27, 2011.
[19] H.-L. Chen, et al., "A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis," Expert Systems with Applications, vol. 38, pp. 9014-9022, 2011.
[20] J. Mourao-Miranda, et al., "Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data," NeuroImage, vol. 28, pp. 980-995, 2005.
[21] V. D. Sánchez A, "Advanced support vector machines and kernel methods," Neurocomputing, vol. 55, pp. 5-20, 2003.
[22] F. E. Harrell, "Ordinal logistic regression," in Regression Modeling Strategies. Springer, 2015, pp. 311-325.
[23] K. Larsen, et al., "Interpreting parameters in the logistic regression model with random effects," Biometrics, vol. 56, pp. 909-914, 2000.
[24] V. Vapnik, The Nature of Statistical Learning Theory. Springer Science & Business Media, 2013.
[25] N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," The American Statistician, vol. 46, pp. 175-185, 1992.
[26] X. Wu, et al., "Top 10 algorithms in data mining," Knowledge and Information Systems, vol. 14, pp. 1-37, December 2008.
[27] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, pp. 82-97, October 2012.
[28] D. E. Rumelhart, et al., "Learning representations by back-propagating errors," Nature, vol. 323, p. 533, October 1986.
[29] A. U. Haq, et al., "A Hybrid Intelligent System Framework for the Prediction of Heart Disease Using Machine Learning Algorithms," Mobile Information Systems, vol. 2018, December 2018.