
Parkinson’s Disease Identification using KNN and ANN Algorithms based on Voice Disorder


Ouhmida Asmae (asmaeouhmida1995@gmail.com), Raihani Abdelhadi (raihani@enset-media.ac.ma), Cherradi Bouchaib (bouchaib.cherradi@gmail.com), Sandabad Sara, Khalili Tajeddine
SSDIA Laboratory, ENSET Mohammedia, Hassan II University, Casablanca, Morocco

Abstract—In recent years, speech signal processing has attracted considerable attention because of its widespread application. In this study, we carry out a comparative analysis of machine learning classifiers for the efficient detection of Parkinson's disease (PD) from the voice disorder known as dysphonia. To obtain a robust detection process, we used Artificial Neural Network (ANN) and K-Nearest Neighbors (KNN) algorithms to distinguish between PD patients and healthy individuals. Experimental results show that the ANN classifier achieved higher average performance than the KNN classifier in terms of accuracy. The UCI data set used in the experiments comprises 31 subjects, of which 23 were diagnosed with Parkinson's disease. The resulting system distinguishes healthy people from people with PD with an accuracy of 96.7% when using the ANN.

Keywords—Parkinson's disease; ANN; KNN; dysphonia

I. INTRODUCTION

Neurodegenerative diseases have human, social, and financial impacts at the personal, professional, and social levels. They are progressive pathologies that affect the brain and the nervous system, leading to the death of nerve cells. The best known and most frequent are Alzheimer's and Parkinson's diseases. Parkinson's disease is particularly linked to the loss of dopamine-producing neurons in the basal ganglia. Medication for Parkinson's disease is very expensive, and at the moment no cure has been found. Treatment is limited to medication that, at an early stage, improves the patient's quality of life.

Several methods have been used to detect the symptoms of Parkinson's disease, but most of them rely on motor symptoms that appear only at an advanced stage of the disease.

The most widely used traditional methods for diagnosing the disease are costly, invasive imaging methods, namely SPECT and CT, which are effective essentially at a mature stage of the disease. Besides these classic methods, practitioners have adopted several other diagnostic paths. Some are based on handwriting, exploiting the relationship between handwriting and nervous system problems. Others rely on peripheral biomarkers for early detection of PD.

The current study focuses on diagnostic paths based on voice analysis. Voice-based systems are the focal point of recent PD telemedicine studies. Several speech signal processing algorithms have been used to extract the information necessary for the assessment of PD. The extracted characteristics are passed to learning algorithms in order to build reliable decision support systems.

Currently, the acoustic parameters most commonly used in acoustic analysis applications, and most cited in the literature, are fundamental frequency, jitter, shimmer, and harmonics-to-noise ratio (HNR).

The fundamental frequency (F0), measured in hertz, is defined as the number of times the vocal cords repeat a produced sound wave during a given period. It is also the number of opening/closing cycles of the glottis. There is a typical range of values of this frequency for different genders and ages, but these values are not stationary, since F0 is also used to convey prosody [1].

Jitter is the variation of frequency from one cycle to the next. It is mainly affected by a lack of control over the vibration of the vocal cords; the voices of patients with pathologies often have a higher percentage of jitter. Most researchers estimate that jitter in sustained phonation for adults ranges from 0.5 to 1.0%.

Shimmer refers to the variation in amplitude of the sound wave [2]. It changes with reduced glottal resistance and mass lesions of the vocal cords, and is correlated with the presence of noise and breathiness. It is considered as pathological voice for values below 3% for adults and between 0.4 and 1% for children [3].

The detection of Parkinson's disease relies on different classifiers. They are compared using measurement criteria such as classification accuracy, the Matthews correlation coefficient (MCC), the Spearman correlation coefficient, specificity, sensitivity, and the F-score (F-measure). Each criterion has its own formula, and together they indicate which classifier is the most adequate for the study.

Before defining these criteria, we must introduce the confusion matrix. Also called a contingency table, it is a tool for measuring the performance of a learning model by checking how often its predictions agree with reality in classification problems.

Table I shows the confusion matrix for a two-class classifier.

978-1-7281-4979-0/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: Auckland University of Technology. Downloaded on June 04,2020 at 04:10:44 UTC from IEEE Xplore. Restrictions apply.
TABLE I. CONFUSION MATRIX

                    Predicted Positive    Predicted Negative
Actual Positive     TP                    FN
Actual Negative     FP                    TN

TP (True Positive): the prediction is positive and the actual value is positive.
TN (True Negative): the prediction is negative and the actual value is negative.
FP (False Positive): the prediction is positive while the actual value is negative.
FN (False Negative): the prediction is negative while the actual value is positive.

a) Matthews correlation coefficient (MCC):

$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (1)$$

This coefficient is used as a quality measure for binary and multi-class classification. It is based on the true and false positives and negatives, and returns a value ranging from -1 to 1.

(1): Perfect classifier (perfect prediction).
(0): No better than random prediction.
(-1): Total disagreement between prediction and observation.

b) Spearman correlation coefficient:

This coefficient analyzes monotonic, possibly non-linear, relations (if one of the variables increases, the other does the same, and vice versa).

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \qquad (2)$$

n: number of observations.
$d_i = rg(X_i) - rg(Y_i)$: the difference between the two ranks of each observation.

c) Classification accuracy, specificity, sensitivity:

The classification model can be evaluated based on the confusion matrix, from which the following performance measures are extracted.

Precision: the proportion of correct predictions among the points that have been predicted positive.

$$Precision = \frac{TP}{TP + FP} \qquad (3)$$

Accuracy: the number of correct predictions made by the model over all predictions made.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

Sensitivity: the rate of true positives (also called recall).

$$Sensitivity = \frac{TP}{TP + FN} \qquad (5)$$

Specificity: the rate of true negatives.

$$Specificity = \frac{TN}{TN + FP} \qquad (6)$$

F1-score: the harmonic mean of the recall (sensitivity) and the precision.

$$F_{score} = 2 \cdot \frac{precision \cdot sensitivity}{precision + sensitivity} \qquad (7)$$

In general, the evaluation of a classifier is based on the confusion matrix, and the performance of a classifier is quantified by the measurement criteria extracted from this matrix.

To carry out the evaluation, several authors based their work on the overall accuracy, which is an index of overall performance: among all the reference samples, it indicates what proportion has been correctly classified, without regard to the size of each class. The overall accuracy is generally expressed as a percentage; a value of 100% corresponds to a perfect classification in which all the reference samples have been classified correctly.

In this study, two classifiers, the ANN and the KNN, were applied to distinguish healthy people from people with Parkinson's disease (PD). In terms of accuracy, the ANN gives a classification rate of 96.7% and the KNN gives 79.3%.

II. MATERIALS AND METHODS:

The current study is based on the Parkinson's data set from the UCI Machine Learning Repository [4], which contains acoustic characteristics.

A. Data set:

The database consists of 195 sustained vocal phonations from 31 male and female subjects, of which 23 were diagnosed with Parkinson's disease. The age of the subjects ranges from 46 to 85 years (average of 65.8, standard deviation of 9.8). For each subject, an average of six phonations was recorded, with a length ranging from 1 to 36 s.

With this database, the main goal is to distinguish healthy people from those suffering from Parkinson's disease. This is done using the "status" column, which is 0 for healthy people and 1 for patients with PD. Each column in the table represents a voice measure, and each row corresponds to one of the 195 voice recordings from these individuals (identified by the "name" column).

B. Speech recordings:

As noted above, each subject contributed an average of six phonations lasting from 1 to 36 s. Subject details are given in [5]. Figures 1 and 2 show plots of two of these voice signals [5].
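The table layout described in Section II-A (one row per recording, a "name" identifier column, a binary "status" label, and one column per voice measure) can be read with a few lines of code. The sketch below is only an illustration under stated assumptions: the helper name `split_features_labels`, the feature column names `feat_a` and `feat_b`, and all values in `SAMPLE` are hypothetical placeholders, not entries from the actual data set.

```python
import csv
import io


def split_features_labels(csv_text, id_col="name", label_col="status"):
    """Split CSV text into recording names, feature rows, and 0/1 labels.

    Assumes one header row, an identifier column, a label column holding
    0 (healthy) or 1 (PD), and numeric values in every other column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    names, features, labels = [], [], []
    for row in reader:
        names.append(row[id_col])
        labels.append(int(row[label_col]))
        # Every remaining column is treated as one numeric voice measure.
        features.append(
            [float(v) for k, v in row.items() if k not in (id_col, label_col)]
        )
    return names, features, labels


# Tiny synthetic sample: names and numbers are placeholders, not real data.
SAMPLE = """name,feat_a,feat_b,status
rec_01,0.10,1.5,1
rec_02,0.20,2.5,0
rec_03,0.30,3.5,1
"""

names, X, y = split_features_labels(SAMPLE)
n_pd = sum(y)               # recordings labelled 1 (PD)
n_healthy = len(y) - n_pd   # recordings labelled 0 (healthy)
```

With the real file, the same function would be expected to yield 195 rows and 22 feature columns, provided the file contains only the columns described above.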

Fig. 1. Example of speech signal of a healthy individual

Fig. 2. Example of speech signal of a subject with PD

The phonations were recorded in an Industrial Acoustics Company (IAC) sound-treated booth using a head-mounted microphone (AKG C420) positioned 8 cm from the lips. The microphone was calibrated using a class 1 sound-level meter (B&K 2238) placed 30 cm from the speaker. Computerized Speech Laboratory (CSL) 4300B hardware (KAY Elemetrics) was used to record the voice signals directly to computer, sampled at 44.1 kHz with 16-bit resolution [5].

C. Feature Extraction / Selection:

The features are computed by applying both traditional and nonstandard measurement methods to all voice signals. Each method produces a single number for each of the 195 signals. Table II lists the measures used as features in this study.

TABLE II. ATTRIBUTE DETAILS FOR DIAGNOSIS OF PD PATIENT

Feature                Description
MDVP: F0 (Hz)          Average vocal fundamental frequency
MDVP: Fhi (Hz)         Maximum vocal fundamental frequency
MDVP: Flo (Hz)         Minimum vocal fundamental frequency
MDVP: Jitter (%)       Fundamental frequency perturbation (%)
MDVP: Jitter (Abs)     Absolute jitter in microseconds
MDVP: RAP              Relative Average Perturbation
MDVP: PPQ              Five-point Period Perturbation Quotient
Jitter: DDP            Average absolute difference of differences between cycles, divided by the average period
MDVP: Shimmer          Local amplitude perturbation
MDVP: Shimmer (dB)     Local amplitude perturbation (decibels)
Shimmer: APQ3          3-point Amplitude Perturbation Quotient
Shimmer: APQ5          5-point Amplitude Perturbation Quotient
MDVP: APQ              11-point Amplitude Perturbation Quotient
Shimmer: DDA           Average absolute difference between the amplitudes of consecutive periods
NHR                    Noise-to-Harmonics Ratio
HNR                    Harmonics-to-Noise Ratio
RPDE                   Recurrence Period Density Entropy
D2                     Correlation Dimension
DFA                    Detrended Fluctuation Analysis
Spread1                Fundamental frequency variation
Spread2                Fundamental frequency variation
PPE                    Pitch period entropy

1) Calculation of traditional measures

This calculation was made using the Praat software [6]. Traditional measures are based on applying short-term autocorrelation to successive segments of the signal, with peak picking to determine the frequency of vibration of the vocal cords (F0 or pitch period) and the location in time of the beginning of each cycle of vibration of the vocal cords (pitch marks) [7].

The jitter and period perturbation measures are calculated from the sequence of frequencies for each vocal cycle, by taking the successive absolute differences between the frequencies of each cycle and averaging them over a varying number of cycles, optionally normalizing by the overall average.

The shimmer and amplitude perturbation measures are obtained from the sequence of the maximum extent of the amplitude of the signal within each vocal cycle. The average difference of this sequence is taken as a measure of the variation between cycle amplitudes. The noise-to-harmonics (and harmonics-to-noise) ratios are obtained from the signal-to-noise estimates of the autocorrelation of each cycle. For more details on how to calculate the traditional measures, see [6], [7], and [8].

2) Calculation of Nonstandard measures:

The calculation of the correlation dimension (D2) is based on time-delay embedding of the signal to reconstruct the phase space of the nonlinear dynamical system that is assumed to generate the speech signal [10]. In this reconstructed phase space, a geometrically self-similar (fractal) object indicates the complex dynamics that have been implicated in dysphonia [9]. We employ the Time Series Analysis (TISEAN) implementation [11]. The recurrence period density entropy (RPDE) quantifies the extent to which the dynamics in the reconstructed phase space after time-delay embedding can be regarded as strictly periodic, i.e., repeating exactly [13]. A recurrent signal returns to the same point in the phase space after a certain length of time, the recurrence period T. It has been shown that the deviation from periodicity, estimated by the entropy H of the distribution of these recurrence periods $P(T)$, is a convenient indicator of general voice disorders, as general voice pathologies lead to impairment in the capability to sustain regular vibration of the vocal folds [13]. The RPDE

values ($H_{norm}$) are normalized to the range [0,1] by dividing by the entropy of the uniform distribution.

Finally, DFA measures the extent of the stochastic self-similarity of the noise in the speech signal. This noise is mostly generated by turbulent airflow through the vocal folds [13]. For general voice disorders, the scaling exponent has been observed to be larger for dysphonic than for healthy subjects [8]. The DFA algorithm calculates the extent of amplitude variation $F(L)$ of the speech signal over a range of time scales $L$, and the self-similarity of the speech signal is measured by the slope $\alpha$ of a straight line on a log-log plot of $L$ versus $F(L)$. These slope values ($\alpha_{norm}$) are then normalized by a simple nonlinear transformation to the range [0,1] [8].

D. Baseline classifiers:

In our study, the machine learning algorithms used are Artificial Neural Networks (ANN) and K-Nearest Neighbors (KNN). They are among the most popular classifiers in the literature. Each has its own techniques and tuning methods, which are used here to separate PD patients from healthy subjects.

1) ANN Classification:

Artificial neural networks were inspired by biological neural networks. An ANN comprises a large network of interconnected neurons that exchange signals with each other. The links are weighted based on past experience, so the network is adaptive and capable of learning [14].

Artificial neural networks are widely used to model complex patterns and to solve prediction and classification problems. Fig. 3 shows the general structure of an artificial neural network.

Fig. 3. General Structure of Artificial Neural Network

Many different optimization algorithms can be used to train an ANN. They differ in their requirements, speed, and precision, so the algorithm should be chosen carefully to suit the problem at hand. The main optimization algorithms used in the training phase of multilayer networks are Gradient Descent (GD), Batch Back Propagation (BBP), Conjugate Gradient Descent (CGD), and Levenberg-Marquardt (LM). In our study, the LM algorithm, a nonlinear optimization algorithm, was preferred for the classification process. Developed by Kenneth Levenberg and Donald Marquardt, the LM algorithm combines characteristics of Newton's method and gradient descent in back-propagation training [15]. It is designed to work specifically with loss functions that take the form of a sum of squared errors, and it works with the gradient vector and the Jacobian matrix.

$$f = \sum_{i=1}^{m} e_i^2 \qquad (8)$$

where m is the number of instances in the data set.

The weight update equation used in the LM algorithm is shown in (9).

$$\Delta W = (J^T J + \mu U)^{-1} J^T e \qquad (9)$$

where J is the Jacobian matrix, composed of the derivatives of the errors with respect to each weight, μ is a scalar, U is the identity matrix, and e is the error vector of the network.

2) K-NN Classification:

K-Nearest Neighbors (KNN) is one of the most widely used supervised machine learning algorithms and one of the simplest classification algorithms. It determines which group a data point belongs to by looking at the data points around it. A case is classified by the majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by a distance function. In summary, the formal k-NN classifier algorithm is as follows:

$$\arg\min \, d_e(t, o, k) \;\Rightarrow\; \text{identify } P \qquad (10)$$

where $t$ is the training data, $o$ is the object to be classified, $P$ is the class assigned to the new object, $k$ is the number of closest neighbors to be considered, and $d_e$ is the Euclidean distance given by:

$$d_e(t, o, k) = \sqrt{\sum_{i=1}^{L} (t_{i,k} - o_{i,k})^2} \qquad (11)$$

where $L$ is the length of each data vector.

To obtain the best results, we should choose an appropriate value of K and an appropriate distance measure. Historically, the optimal value of k for most datasets has been between 3 and 10. There are different ways of calculating distance, and the preferable one depends on the problem being solved; the Euclidean distance is nevertheless the most popular choice.

III. RESULTS AND DISCUSSION

The present study is based on 22 acoustic features extracted from dysphonia measurements, which are used to identify PD patients. The 22 features are listed in Table II.

A. ANN Classification:

To perform the classification, the Neural Network Pattern Recognition tool (nprtool) in MATLAB was used. The data were divided into three parts:

Training: the network is trained on 70% of the data set (137 samples), which is presented to the network during training.

Validation: 5% of the data set is used to decide when to terminate the training, once the network stops improving.

Test: 25% of the data set is used to measure the performance of the network after the training and validation steps. Figure 4 shows the view of the ANN.
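The Levenberg-Marquardt weight update of Eq. (9) can be illustrated on a toy problem. The sketch below is not the MATLAB training routine used in the study; it fits a hypothetical two-weight linear model `w0*x + w1` so that the 2x2 system can be solved in closed form without external libraries. It assumes the error is defined as prediction minus target, so the update of Eq. (9) is subtracted from the weights. The function names and the rule of dividing or multiplying μ by 10 are common illustrative choices, not details taken from the source.

```python
def sum_squared_error(w, xs, ys):
    """Loss of Eq. (8): sum of squared errors of the toy model w0*x + w1."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in zip(xs, ys))


def lm_step(w, xs, ys, mu):
    """One update of Eq. (9): delta = (J^T J + mu*I)^-1 J^T e."""
    # Error vector, assuming e_i = prediction_i - target_i.
    e = [w[0] * x + w[1] - y for x, y in zip(xs, ys)]
    # Jacobian rows: derivatives of e_i with respect to (w0, w1).
    J = [(x, 1.0) for x in xs]
    # A = J^T J + mu * I  (symmetric 2x2 matrix).
    a11 = sum(r[0] * r[0] for r in J) + mu
    a12 = sum(r[0] * r[1] for r in J)
    a22 = sum(r[1] * r[1] for r in J) + mu
    # g = J^T e
    g1 = sum(r[0] * ei for r, ei in zip(J, e))
    g2 = sum(r[1] * ei for r, ei in zip(J, e))
    # Solve A * delta = g with the closed-form 2x2 inverse.
    det = a11 * a22 - a12 * a12
    d1 = (a22 * g1 - a12 * g2) / det
    d2 = (a11 * g2 - a12 * g1) / det
    return [w[0] - d1, w[1] - d2]


def lm_fit(xs, ys, w=(0.0, 0.0), mu=1.0, iterations=50):
    """Repeat LM steps, adapting mu after each trial step."""
    w = list(w)
    for _ in range(iterations):
        candidate = lm_step(w, xs, ys, mu)
        if sum_squared_error(candidate, xs, ys) < sum_squared_error(w, xs, ys):
            w, mu = candidate, mu / 10.0   # accept: behave more like Newton
        else:
            mu *= 10.0                     # reject: behave more like gradient descent
    return w


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w_fit = lm_fit(xs, ys)
```

A large μ makes the step resemble a small gradient-descent step, while a small μ makes it approach the Gauss-Newton step, which is the combination of behaviours attributed to LM in the text.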

The regression graphs of the training, validation, and test sets are shown in Figs. 5, 6, and 7, respectively. Fig. 8 shows the regression over the entire data set.

Fig. 4. The view of ANN

Fig. 5. Training regression

Fig. 8. All Dataset regression

B. K-NN Classification:
The data were divided into training (70%) and test (30%) sets. The KNN classifier stores all the training data. We used ten-fold cross-validation, a reliable technique widely used to assess the accuracy of a predictive system and to avoid overfitting. The number of neighbors k determines how many stored examples vote on the class of a new example, and the distance function determines which examples are nearest. The best accuracy of 79% was obtained with k = 1 and the cosine distance.
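The k-NN rule of Eqs. (10) and (11), together with the cosine distance mentioned above, can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the study: the function names and the four toy training vectors are hypothetical, and the cosine distance assumes that no vector has zero length.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance of Eq. (11)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity (assumes non-zero vectors)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k=1, distance=cosine_distance):
    """Assign the majority class among the k stored points nearest to query."""
    # Rank stored examples by their distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature training set (placeholder values, not study data).
train_X = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_y = [0, 0, 1, 1]

pred_a = knn_predict(train_X, train_y, [0.8, 0.3], k=1)
pred_b = knn_predict(train_X, train_y, [0.1, 0.7], k=1)
pred_c = knn_predict(train_X, train_y, [0.8, 0.3], k=3, distance=euclidean_distance)
```

Because the classifier only stores the training data, choosing k and the distance function are the main tuning decisions, which is why the text reports the combination that gave the best accuracy.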
Using the ANN, a correct classification rate of 96.7% was obtained, whereas the accuracy achieved by the KNN was 79.31%. To our knowledge, the result produced by the ANN is the best score.
Our results therefore show that the ANN is better than the KNN for classifying PD patients, since it achieves the higher accuracy.
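The comparison above relies on accuracy, but the other measures defined in Eqs. (1) and (3) to (7) follow from the same four confusion-matrix counts. The sketch below shows one way to compute them; the function name `binary_metrics` and the counts in the example call are hypothetical and are not results from this study.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute the measures of Eqs. (1) and (3)-(7) from confusion-matrix counts."""
    precision = tp / (tp + fp)                      # Eq. (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (4)
    sensitivity = tp / (tp + fn)                    # Eq. (5)
    specificity = tn / (tn + fp)                    # Eq. (6)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (7)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is set to 0 by convention when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0  # Eq. (1)
    return {
        "precision": precision,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to exercise the formulas.
example = binary_metrics(tp=8, fn=2, fp=1, tn=9)
```

Reporting sensitivity, specificity, and MCC alongside accuracy can be useful for a data set such as this one, where 23 of the 31 subjects belong to the PD class.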
IV. CONCLUSION:
In this study, we compared two machine learning classifiers, Artificial Neural Networks (ANN) and K-Nearest Neighbors (KNN), and used them to distinguish PD patients from healthy subjects. The ANN algorithm yielded the best score: the experiments reached 96.7% classification accuracy for the ANN, a result that depends on the data set and the number of acoustic features used. MATLAB is one of the most widely used software tools for this purpose. Nowadays, in the medical imaging domain, many classification techniques are applied in order to achieve the highest accuracy.
This work can be extended to other machine learning algorithms and more diverse datasets in order to improve classifier performance and reach the highest possible accuracy. As future work, we intend to study hybrid classification methods that combine ANN and KNN with other machine learning classifiers.

Fig. 6. Validation regression

Fig. 7. Test regression

REFERENCES
[1] J. P. Teixeira, D. Ferreira, and S. Carneiro, “Análise acústica vocal - determinação do Jitter e Shimmer para diagnóstico de patologias da fala,” in 6º Congresso Luso-Moçambicano de Engenharia, Maputo, Moçambique, 2011.
[2] I. Zwetsch, R. Fagundes, T. Russomano, and D. Scolari, “Digital signal processing in the differential diagnosis of benign larynx diseases,” Porto Alegre, 2006.
[3] I. Guimarães, A Ciência e a Arte da Voz Humana. Escola Superior de Saúde de Alcoitão, 2007.
[4] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig, “Suitability of dysphonia measurements for telemonitoring of Parkinson's disease,” IEEE Transactions on Biomedical Engineering, 2009.
[5] M. A. Little, P. E. McSharry, E. J. Hunter, J. Spielman, and L. O. Ramig, “Suitability of dysphonia measurements for telemonitoring of Parkinson's disease,” 2009.
[6] P. Boersma and D. Weenink, “Praat, a system for doing phonetics by computer,” Glot Int., vol. 5, pp. 341–345, 2001.
[7] P. Boersma, “Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound,” presented at the Inst. Phonet. Sci., University of Amsterdam, Amsterdam, The Netherlands, 1993, vol. 17.
[8] KayPENTAX, “Kay elemetrics disordered voice database, model 4337,” Kay Elemetrics, Lincoln Park, NJ, 1996–2005.
[9] J. J. Jiang, Y. Zhang, and C. McGilligan, “Chaos in voice, from modeling to measurement,” J. Voice, vol. 20, pp. 2–17, 2006.
[10] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, New ed. Cambridge, U.K.: Cambridge Univ. Press, 1999.
[11] R. Hegger, H. Kantz, and T. Schreiber, “Practical implementation of nonlinear time series methods: The TISEAN package,” Chaos, vol. 9, pp. 413–435, 1999.
[12] M. A. Little, P. E. McSharry, S. J. Roberts, D. A. Costello, and I. M. Moroz, “Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection,” Biomed. Eng. Online, vol. 6, p. 23, 2007.
[13] R. P. Dixit, “On defining aspiration,” in Proc. 12th Int. Conf. Linguistics, Tokyo, Japan, 1988, pp. 606–610.
[14] S. Manik, L. M. Saini, and N. Vadera, “Counting and classification of white blood cell using artificial neural network (ANN),” in Proc. IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems, 2016.
[15] M. T. Hagan and M. B. Menhaj, “Training feedforward networks with the Marquardt algorithm,” IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, 1994.
