
Word Accuracy and Dynamic Time Warping to Assess Intelligibility Deficits in Patients with Parkinson's Disease

J. C. Vasquez-Correa (1), J. R. Orozco-Arroyave (1, 2), and E. Nöth (2)

(1) Faculty of Engineering, University of Antioquia UdeA, Calle 70 No. 52-21, Medellín, Colombia.
(2) Pattern Recognition Lab, Friedrich Alexander Universität, Erlangen-Nürnberg, Germany
jcamilo.vasquez@udea.edu.co

978-1-5090-3797-1/16/$31.00 ©2016 IEEE

Abstract

Parkinson's disease patients develop several impairments related to the speech production process. These deficits include reduced phonation, articulation, prosody, and intelligibility capabilities. Related studies have analyzed the phonation, articulation, and prosody of patients with Parkinson's disease, while the intelligibility impairments have not been sufficiently evaluated. In this study we propose two novel features, based on the word accuracy and the dynamic time warping algorithm, with the aim of assessing the intelligibility deficits of the patients using an automatic speech recognition system. We evaluate the suitability of the features by automatically classifying utterances of patients vs. healthy controls, and by automatically predicting the neurological state of the patients. According to the results, an accuracy of up to 92% is obtained, indicating that the proposed features are highly accurate for detecting Parkinson's disease from speech. Regarding the automatic monitoring of the neurological state, the proposed approach could be used as a complement to other features derived from speech or other bio-signals.

Keywords: Parkinson's disease, Intelligibility, Word accuracy, Dynamic time warping, Classification, Regression.

1. Introduction

Parkinson's disease (PD) is a neurological disorder characterized by the progressive loss of dopaminergic neurons in the midbrain, producing several motor and non-motor impairments [1]. Speech disorders are among the most prevalent, and are an early sign of further motor impairments [2]. The most common symptoms in the speech of PD patients include reduced loudness, monopitch, monoloudness, reduced stress, breathy and hoarse voice quality, and imprecise articulation [2]. The research community has been interested in the objective evaluation of the neurological state of PD patients, with the aim of developing technologies to monitor PD patients non-intrusively [3, 4]. This aim can be achieved by using speech analysis. Several studies in the literature have described the speech impairments of Parkinson's patients in terms of phonation, articulation, and prosody. Phonation is related to the capability of the speaker to make the vocal folds vibrate, and it has been analyzed in terms of perturbation measures such as jitter, shimmer, the amplitude perturbation quotient, the pitch perturbation quotient, and non-linear dynamics measures [5, 6]. Articulation is related to the modification of the position, stress, and shape of several limbs and muscles to produce speech, and it has been described with features such as the vowel space area, the vowel articulation index, the formant centralization ratio, the diadochokinetic analysis, and the energy in the voiced/unvoiced transitions [7, 8, 9]. Finally, prosody reflects the variation of loudness, pitch, and timing needed to produce natural speech, and it is commonly evaluated with features related to the fundamental frequency, the energy contour, and the speech rate [10]. Along with these three aspects of speech, intelligibility is related to the capability of a person to be understood by another person or by a system. Intelligibility also deteriorates in Parkinson's patients, causing loss of communication abilities and social isolation, especially at advanced stages of the disease [11, 12].

In this study two novel features are proposed to analyze the intelligibility deficits of patients with PD. The first one is the word accuracy (WA) obtained from the Google API system for automatic speech recognition (ASR), i.e., how many words are correctly recognized by the system. The second one corresponds to a similarity score computed between several sentences read by the patients and the sequence of words recognized by the Google API for ASR. The similarity score is computed using the dynamic time warping (DTW) algorithm. Two experiments are performed to analyze the intelligibility of Parkinson's patients: (1) the automatic classification of PD vs. healthy control (HC) speakers using a support vector machine (SVM) with a Gaussian kernel, and (2) the automatic prediction of the neurological state of the patients according to the standard Unified Parkinson's Disease Rating Scale (UPDRS). The scale is assigned by expert neurologists and considers several items to evaluate motor activities such as finger tapping, walking, neck movement, and others. The scale includes only one item to evaluate speech [13]. The prediction of the neurological state of the patients is performed using a support vector regressor (SVR) with a linear kernel. According to the results, we obtain a high accuracy for the classification of PD vs. HC. Most of the patients also exhibit reduced intelligibility compared to the HC speakers. In general, the WA and the DTW-based features are lower in patients than in HC speakers. For the prediction of the neurological state of the patients, the proposed features appear promising and might be considered as a complement to analyze speech deficits in Parkinson's disease.

The rest of the paper is organized as follows: Section 2 describes the methods used for feature extraction, classification, and regression; Section 3 describes the database and the experiments performed; Section 4 details the results of this study; and Section 5 presents the main conclusions.

2. Materials and methods

2.1. Speech recognition

ASR systems have been used to analyze the intelligibility of people with speech pathologies [14, 15]. In this study we consider the off-the-shelf, publicly accessible, cloud-based ASR system developed by Google Inc. (https://www.google.com/intl/es/chrome/demos/speech.html). To perform the ASR we send requests to a server and analyze the resulting transcription. Then we compare the strings obtained from the ASR with the original text read by the patients, using the two proposed features.

2.2. Feature extraction

2.2.1 Word accuracy

The WA has been established as a marker to analyze the performance of ASR systems and the intelligibility of persons. The WA is defined as the number of words correctly recognized by the ASR system relative to the total number of words in the original string. The WA is computed following Equation 1.

\[ \mathrm{WA} = \frac{\#\ \text{words correctly recognized}}{\#\ \text{total words}} \tag{1} \]

For example, in the first row of Table 1, three of the five words of the original sentence are recognized, so WA = 3/5 = 0.60.

2.2.2 Dynamic Time Warping

The DTW is a technique to analyze similarities between two time-series when both sequences may differ in time and number of samples, performing a time-alignment between the sequences. We compute the DTW distance between the predicted string, i.e., the complete sentence returned by the ASR system, and the original sentence read by the patients. The distance is computed over the whole text, at grapheme level.

Then the distance is transformed into a similarity score using Equation 2. If the sequences are the same, the DTW distance is zero and the similarity is 1; if the strings are very different, the DTW distance is high and the similarity is close to zero.

\[ \text{similarity} = \frac{1}{1 + \mathrm{DTW}_{\text{distance}}} \tag{2} \]

An example of the features computed for six sentences is shown in Table 1.

Table 1. Example of the intelligibility features for six sentences

Original String                                       Predicted String                                        WA    DTW
Mi casa tiene tres cuartos                            Ricardo tiene tres cuartos                              0.60  0.70
Omar, que vive cerca, trajo miel                      Omar vive cerca dragón bien                             0.50  0.38
Laura sube al tren que pasa                           Laura soultrain que pasa                                0.50  0.44
Los libros nuevos no caben en la mesa de la oficina   Los libros nuevo Logan en la mesa de la oficina         0.73  0.59
Rosita Niño que pinta bien donó sus cuadros ayer      Recital Niño que pinta bien todos los juegos ayer       0.44  0.39
Luisa Rey compra el colchón duro que tanto le gusta   Luisa Rey comprar un colchón duro que tanto la lluvia   0.60  0.57

2.3. Classification and Regression

We perform two different prediction tasks to analyze the suitability of the computed intelligibility features: (1) the classification of PD patients vs. HC speakers, and (2) the prediction of the neurological state of PD patients according to the UPDRS score.

For the classification we use an SVM with a Gaussian kernel, following the state of the art [5, 9]. The meta-parameters \(C\) and \(\gamma\) are optimized in a grid search, with a selection criterion based on the accuracy obtained in the train set (\(C \in \{10^{-5}, 10^{-4}, \ldots, 10^{4}\}\) and \(\gamma \in \{10^{-6}, 10^{-5}, \ldots, 10^{2}\}\)). For this problem, a leave-one-speaker-out (LOSO) cross-validation strategy is performed, i.e., one subject is used for test, and the others are used for training and for optimizing the meta-parameters of the classifier.
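The two features of Section 2.2, which feed the classifier and the regressor, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the text does not specify how recognized words are aligned with the original ones, nor the local cost or normalization used in the DTW, so the sketch counts matching words regardless of position and uses an unnormalized 0/1 grapheme cost. The function names are hypothetical, and the DTW values produced this way are not expected to reproduce the DTW column of Table 1.

```python
import re
from collections import Counter


def _words(text: str) -> list[str]:
    # Lowercase and drop punctuation; \w matches accented letters in Python 3.
    return re.findall(r"\w+", text.lower())


def word_accuracy(original: str, predicted: str) -> float:
    """Equation 1: words correctly recognized / total words in the original.

    Assumption: a word counts as correct if it appears in the prediction,
    regardless of its position (multiset intersection).
    """
    ref, hyp = _words(original), _words(predicted)
    if not ref:
        raise ValueError("original string contains no words")
    matched = sum((Counter(ref) & Counter(hyp)).values())
    return matched / len(ref)


def dtw_distance(a: str, b: str) -> float:
    """Classic DTW between two grapheme sequences.

    Assumption: local cost is 0 for equal graphemes and 1 otherwise,
    with no length normalization.
    """
    if not a or not b:
        raise ValueError("both strings must be non-empty")
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for ch_a in a:
        curr = [inf] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            cost = 0.0 if ch_a == ch_b else 1.0
            # Extend the cheapest of: insertion, deletion, or match step.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def dtw_similarity(original: str, predicted: str) -> float:
    """Equation 2: similarity = 1 / (1 + DTW distance)."""
    return 1.0 / (1.0 + dtw_distance(original, predicted))


wa = word_accuracy("Mi casa tiene tres cuartos", "Ricardo tiene tres cuartos")
sim = dtw_similarity("Mi casa tiene tres cuartos", "Ricardo tiene tres cuartos")
print(f"WA = {wa:.2f}, DTW similarity = {sim:.2f}")
```

With the first sentence of Table 1, the word accuracy of this sketch is 3/5, matching the tabulated WA. Identical strings give a DTW distance of zero and a similarity of 1, as described for Equation 2.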

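The leave-one-speaker-out protocol and the hyper-parameter grid described for the classifier could be organized as in the sketch below. It only enumerates the grid and the folds; the SVM itself is left out, and `train_accuracy` is a hypothetical callable standing in for "train a Gaussian-kernel SVM with these values and return its accuracy on the train set". The function and variable names are assumptions made for illustration.

```python
from itertools import product

# Grids stated in the text: C in {1e-5, ..., 1e4}, gamma in {1e-6, ..., 1e2}.
C_GRID = [10.0 ** k for k in range(-5, 5)]
GAMMA_GRID = [10.0 ** k for k in range(-6, 3)]


def loso_splits(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices) per speaker.

    All samples of one speaker form the test set; every other sample is
    available for training and for meta-parameter optimization.
    """
    for held_out in sorted(set(speaker_ids)):
        train = [i for i, s in enumerate(speaker_ids) if s != held_out]
        test = [i for i, s in enumerate(speaker_ids) if s == held_out]
        yield held_out, train, test


def select_meta_parameters(train_accuracy):
    """Return the (C, gamma) pair maximizing a user-supplied accuracy function."""
    return max(product(C_GRID, GAMMA_GRID), key=lambda pair: train_accuracy(*pair))


# Toy usage with hypothetical speaker labels and a stand-in accuracy function.
speakers = ["s1", "s1", "s2", "s3"]
folds = list(loso_splits(speakers))
best = select_meta_parameters(lambda c, g: -(abs(c - 10.0) + abs(g - 0.01)))
print(len(folds), best)
```

In practice the stand-in function would wrap an actual SVM, for instance scikit-learn's `SVC(kernel="rbf", C=..., gamma=...)`, fitted on the training indices of each fold.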
For the regression problem we use an SVR with an ε-insensitive loss function and a linear kernel. The parameters of the regressor (\(C\) and \(\varepsilon\)) are optimized in a grid search with \(C \in \{10^{-4}, 10^{-3}, 10^{-2}, \ldots, 100\}\) and \(\varepsilon \in \{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 20\}\). The performance is evaluated using Spearman's correlation coefficient between the predicted values and the UPDRS scores, following a 10-fold cross-validation strategy, so that each correlation coefficient is computed over more data points than a LOSO cross-validation strategy would allow.

3. Experimental Framework

3.1. Database

The PC-GITA database is considered in this study. The data comprise utterances from 50 PD patients and 50 HC speakers [16]. The data are balanced by age and gender, and were recorded in a sound-proof booth with a dynamic omnidirectional microphone and a professional audio card, with a sampling frequency of 44.1 kHz and 16-bit resolution. All patients were recorded in ON state, i.e., no more than three hours after their morning medication, and were labeled by an expert neurologist according to the UPDRS score. The tasks considered for this study include the reading of six sentences and a text formed with 36 words. The sentences and the read text were chosen to include both simple and complex language structures. The read text also contains all the phonemes of the Spanish language.

3.2. Experiments

We perform the classification of the 50 PD patients vs. the 50 HC speakers, and the monitoring of the neurological state of the 50 PD patients. We perform the classification and regression individually for each task (the six sentences and the read text), and also combine the features of all tasks.

4. Results and Discussion

4.1. Classification

Table 2 contains the results for the classification of PD patients vs. HC speakers using the proposed intelligibility features for all tasks. The performance is evaluated considering the accuracy of the classification and the area under the Receiver Operating Characteristic (ROC) curve (AUC). The best individual result is obtained with sentence 5, with an accuracy of 82%. Note also that the fusion of the features from the different tasks greatly improves the results. For the fusion of the six sentences and the read text, an accuracy of 92% is obtained. The results are similar to or even higher than those obtained in related works using the same database [6]. It is worth highlighting that the proposed approach uses fewer features (only 2) than the related works, where the number of features is considerably higher (on the order of 100).

Table 2. Results for classification of PD patients vs. HC speakers using intelligibility features

Task                        Accuracy   AUC
sentence 1                  78%        0.67
sentence 2                  61%        0.64
sentence 3                  63%        0.63
sentence 4                  64%        0.67
sentence 5                  82%        0.82
sentence 6                  62%        0.65
read text                   75%        0.79
all sentences               88%        0.96
all sentences + read text   92%        0.98

Figure 1. Box-plots of WA (top) and DTW similarity (bottom) for PD patients and HC speakers. Both panels show the HC and PD groups on the horizontal axis and values between 0 and 1 on the vertical axis.

The box-plots of Figure 1 illustrate the difference between the two intelligibility features extracted from PD patients and HC speakers. Note the large difference between the groups, especially for the DTW-based features. The features for PD patients also exhibit lower values in both cases. The outliers observed for the HC speakers might be due to reading errors in the sentences, which reduce the WA for some of the speakers. To evaluate whether the difference between PD and HC is statistically significant, we perform a Wilcoxon rank sum test [17], which is a non-parametric statistical hypothesis test used to assess whether the medians of two populations differ. We obtain P-values of 8.3e-37 for the WA and 7.5e-25 for the DTW-based features, rejecting the null hypothesis that both populations (features from PD and HC) have the same median.

4.2. Regression

Table 3 contains the Spearman correlation coefficient (ρ) between the real and predicted UPDRS scores using the SVR for all tasks. The highest correlation is obtained with sentence 6 (0.23). In general, note that using only the proposed features, the neurological state of the PD patients is not well predicted; however, it must be considered that the UPDRS is a complete motor scale, in which only one item is related to speech impairments. The proposed features might provide suitable information that could be used as a complement to other features from speech or from other bio-signals to monitor the neurological state of PD patients.

Table 3. Spearman correlation coefficient (ρ) between the real and predicted UPDRS

Task                        ρ
sentence 1                  0.20
sentence 2                  0.02
sentence 3                  0.16
sentence 4                  -0.40
sentence 5                  -0.07
sentence 6                  0.23
read text                   0.19
all sentences               -0.12
all sentences + read text   0.01

5. Conclusion

In this study two novel features are proposed to evaluate the intelligibility deficits in PD patients. The suitability of the features is tested in two scenarios: (1) the classification of PD patients vs. HC speakers, and (2) the monitoring of the neurological state of PD patients according to the UPDRS score.

According to the results, the proposed features are highly accurate for discriminating PD patients from HC speakers. Accuracies of up to 92% are obtained when the features extracted from different tasks are combined. Significant differences are also observed between the features extracted from PD patients and HC speakers, with lower values obtained for the utterances of the patients. For the regression problem, the proposed features are not able to predict the UPDRS score of the patients by themselves; however, it must be considered that the UPDRS is a complete motor scale that analyzes aspects other than speech. The proposed features might provide suitable information that could be used as a complement to other features from speech or from other bio-signals to monitor the neurological state of PD patients.

Further studies might combine the proposed features with other speech features that analyze other impairments of PD patients, such as phonation, articulation, and prosody, in order to improve the results in both the classification and the regression problems.

6. Acknowledgments

The authors would like to thank COLCIENCIAS (project # 111556933858) for funding this study, and Fundalianza Parkinson Colombia.

References

[1] O. Hornykiewicz. Biochemical aspects of Parkinson's disease. Neurology, 51(2):S2–S9, 1998.
[2] J. A. Logemann, H. B. Fisher, B. Boshes, and E. R. Blonsky. Frequency and cooccurrence of vocal tract dysfunctions in the speech of a large sample of Parkinson patients. Journal of Speech and Hearing Disorders, 43(1):47–57, 1978.
[3] J. C. Vásquez-Correa, T. Arias-Vergara, J. R. Orozco-Arroyave, J. F. Vargas-Bonilla, J. D. Arias-Londoño, and E. Nöth. Automatic detection of Parkinson's disease from continuous speech recorded in non-controlled noise conditions. In Annual Conference of the International Speech Communication Association (INTERSPEECH), 2015.
[4] A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig. Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(1):181–190, 2014.
[5] A. Tsanas, M. Little, P. E. McSharry, J. Spielman, and L. O. Ramig. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease. IEEE Transactions on Biomedical Engineering, 59(5):1264–1271, 2012.
[6] J. R. Orozco-Arroyave, E. A. Belalcazar-Bolaños, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, K. Daqrouq, F. Hönig, and E. Nöth. Characterization methods for the detection of multiple voice disorders: Neurological, functional, and laryngeal diseases. IEEE Journal of Biomedical and Health Informatics, 19(6):1820–1828, 2015.
[7] S. Skodda, W. Visser, and U. Schlegel. Vowel articulation in Parkinson's disease. Journal of Voice, 25(4):467–472, 2012.
[8] J. Rusz, R. Cmejla, T. Tykalova, H. Ruzickova, J. Klempir, V. Majerova, J. Picmausova, J. Roth, and E. Ruzicka. Imprecise vowel articulation as a potential early marker of Parkinson's disease: Effect of speaking task. The Journal of the Acoustical Society of America, 134(3):2171–2181, 2013.
[9] J. R. Orozco-Arroyave, F. Hönig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, and E. Nöth. Voiced/unvoiced transitions in speech as a potential bio-marker to detect Parkinson's disease. In 16th Annual Conference of the Speech and Communication Association (INTERSPEECH), pages 95–99, 2015.

[10] S. Skodda, W. Grönheit, and U. Schlegel. Intonation and speech rate in Parkinson's disease: General and dynamic aspects and responsiveness to levodopa admission. Journal of Voice, 25(4):e199–e205, 2011.
[11] M. De Letter, P. Santens, and J. Van Borsel. The effects of levodopa on word intelligibility in Parkinson's disease. Journal of Communication Disorders, 38(3):187–196, 2005.
[12] N. Miller, L. Allcock, D. Jones, E. Noble, A. J. Hildreth, and D. J. Burn. Prevalence and pattern of perceived intelligibility changes in Parkinson's disease. Journal of Neurology, Neurosurgery & Psychiatry, 78(11):1188–1190, 2007.
[13] C. G. Goetz et al. Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS): Scale presentation and clinimetric testing results. Movement Disorders, 23(15):2129–2170, 2008.
[14] M. Schuster, A. Maier, T. Haderlein, E. Nkenke, U. Wohlleben, F. Rosanowski, U. Eysholdt, and E. Nöth. Evaluation of speech intelligibility for children with cleft lip and palate by means of automatic speech recognition. International Journal of Pediatric Otorhinolaryngology, 70(10):1741–1747, 2006.
[15] J. R. Orozco-Arroyave, J. C. Vásquez-Correa, F. Hönig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, and E. Nöth. Towards an automatic monitoring of the neurological state of the Parkinson's patients from speech. In 41st International Conference on Acoustic, Speech, and Signal Processing (ICASSP), 2016.
[16] J. R. Orozco-Arroyave, J. D. Arias-Londoño, J. F. Vargas-Bonilla, M. C. Gonzalez-Rátiva, and E. Nöth. New Spanish speech corpus database for the analysis of people suffering from Parkinson's disease. In 9th Language Resources and Evaluation Conference, LREC, pages 342–347, 2014.
[17] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945.
