978-1-5090-3797-1/16/$31.00 ©2016 IEEE
ers using a support vector machine (SVM) with a Gaussian kernel, and (2) the automatic prediction of the neurological state of patients according to the standard Unified Parkinson’s Disease Rating Scale (UPDRS). The scale is assigned by expert neurologists and considers several items to evaluate motor activities such as finger tapping, walking, neck movement, and others. The scale includes only one item to evaluate speech [13]. The prediction of the neurological state of the patients is performed using a support vector regressor (SVR) with a linear kernel. According to the results, we obtain a high accuracy for the classification of PD vs. HC. Most patients also exhibit reduced intelligibility compared to the HC speakers. In general, the WA and the DTW-based features are lower in patients than in HC speakers. For the prediction of the neurological state of the patients, the proposed features appear promising for that task, and might be considered as a complement to analyze speech deficits in Parkinson’s disease.

The rest of the paper is organized as follows: Section 2 describes the methods used for feature extraction, classification, and regression; Section 3 describes the database and the experiments performed; Section 4 details the results of this study; and Section 5 presents the main conclusions.

2. Materials and methods

2.1. Speech recognition

ASR systems have been used to analyze the intelligibility of people with speech pathologies [14, 15]. In this study we consider the off-the-shelf, publicly accessible, cloud-based ASR system developed by Google Inc. (https://www.google.com/intl/es/chrome/demos/speech.html). To perform the ASR, we send requests to a server and analyze the resulting transcription. Then we compare the strings obtained with the ASR to the original text read by the patients using the two proposed features.

[...] time, and number of samples, performing a time alignment between the sequences. We compute the DTW distance between the predicted string, i.e., the complete sentence recognized by the ASR system, and the original sentence read by the patients. The distance is computed over the whole text, at the grapheme level.

Then the distance is transformed into a similarity score using Equation 2. If the sequences are the same, the DTW distance is zero and the similarity is 1; if the strings are very different, the DTW distance is high and the similarity is close to zero.

$$\text{similarity} = \frac{1}{1 + \text{DTW distance}} \qquad (2)$$

An example of the features computed for six sentences is shown in Table 1.

Table 1. Example of the intelligibility features for six sentences

Original String                                        Predicted String                                         WA     DTW
Mi casa tiene tres cuartos                             Ricardo tiene tres cuartos                               0.60   0.70
Omar, que vive cerca, trajo miel                       Omar vive cerca dragón bien                              0.50   0.38
Laura sube al tren que pasa                            Laura soultrain que pasa                                 0.50   0.44
Los libros nuevos no caben en la mesa de la oficina    Los libros nuevo Logan en la mesa de la oficina          0.73   0.59
Rosita Niño que pinta bien donó sus cuadros ayer       Recital Niño que pinta bien todos los juegos ayer        0.44   0.39
Luisa Rey compra el colchón duro que tanto le gusta    Luisa Rey comprar un colchón duro que tanto la lluvia    0.60   0.57
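
The DTW-based similarity of Equation 2 can be illustrated with a minimal sketch. The text does not state the local cost between graphemes or any normalization of the distance, so the 0/1 mismatch cost and the unnormalized accumulated distance used here are assumptions, and the function names `dtw_distance` and `dtw_similarity` are hypothetical. The values produced are therefore not expected to match those of Table 1.

```python
def dtw_distance(reference: str, predicted: str) -> float:
    """Accumulated DTW cost between two grapheme sequences.

    Assumption: the local cost is 0 for identical graphemes and 1 otherwise,
    and the accumulated cost is not normalized by the path length.
    """
    n, m = len(reference), len(predicted)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning
    # reference[:i] with predicted[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if reference[i - 1] == predicted[j - 1] else 1.0
            cost[i][j] = local + min(
                cost[i - 1][j],      # reference advances, predicted repeats
                cost[i][j - 1],      # predicted advances, reference repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_similarity(reference: str, predicted: str) -> float:
    """Equation 2: similarity = 1 / (1 + DTW distance)."""
    return 1.0 / (1.0 + dtw_distance(reference, predicted))


if __name__ == "__main__":
    original = "Mi casa tiene tres cuartos"
    recognized = "Ricardo tiene tres cuartos"
    print(dtw_similarity(original, original))    # identical strings give 1.0
    print(dtw_similarity(original, recognized))  # some value in (0, 1)
```

Because DTW allows one grapheme to be aligned with several consecutive graphemes of the other sequence, repeated characters (for example `"aab"` against `"ab"`) add no cost under this local cost. That is the time alignment behaviour the text refers to.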
For the regression problem we use an SVR with an ε-insensitive loss function and a linear kernel. The parameters of the regressor (C and ε) are optimized in a grid search with C ∈ {10⁻⁴, 10⁻³, 10⁻², ..., 100} and ε ∈ {10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 1, 10, 20}. The performance is evaluated using the Spearman’s correlation coefficient between the predicted values and the UPDRS scores, following a 10-fold cross-validation strategy in order to compute the correlation coefficient with more data points than a LOSO cross-validation strategy would provide.

Table 2. Results for classification of PD patients vs. HC speakers using intelligibility features

Task                         Accuracy   AUC
sentence 1                   78%        0.67
sentence 2                   61%        0.64
sentence 3                   63%        0.63
sentence 4                   64%        0.67
sentence 5                   82%        0.82
sentence 6                   62%        0.65
read text                    75%        0.79
all sentences                88%        0.96
all sentences + read text    92%        0.98

3. Experimental Framework

3.1. Database
The PC-GITA database is considered in this study. The data consist of utterances from 50 PD patients and 50 HC speakers [16]. The data are balanced by age and gender, and were recorded in a sound-proof booth with a dynamic omnidirectional microphone and a professional audio card, at a sampling frequency of 44.1 kHz and 16-bit resolution. All patients were recorded in ON state, i.e., no more than three hours after their morning medication, and were labeled by an expert neurologist according to the UPDRS score. The tasks considered in this study include the reading of six sentences and a text of 36 words. The sentences and the read text were chosen to include both simple and complex language structures. The read text also contains all the phonemes of the Spanish language.

Figure 1. Box-plots of WA (top) and DTW similarity (bottom) for PD patients and HC speakers. Both panels use a vertical axis from 0 to 1 and compare the HC and PD groups.

3.2. Experiments

We perform the classification of the 50 PD patients vs. the 50 HC speakers, and the monitoring of the neurological state of the 50 PD patients. We perform the classification and regression individually for each task (the six sentences and the read text), and also combine the features of all tasks.
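
The regression protocol described earlier (a linear-kernel SVR, a grid search over C and ε, 10-fold cross-validation, and the Spearman’s correlation between predicted and real UPDRS scores) can be sketched with scikit-learn and SciPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the text does not say how the grid search is combined with the cross-validation, so the inner 5-fold search, the mean-absolute-error selection criterion, the pooling of predictions across folds, and the function name `evaluate_svr` are all assumptions. The data in the usage part are synthetic placeholders, not PC-GITA features.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict
from sklearn.svm import SVR

# Grids taken from the text. The "..." in the C grid is assumed to mean
# successive powers of ten from 1e-4 up to 100.
PARAM_GRID = {
    "C": [1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100],
    "epsilon": [1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 20],
}


def evaluate_svr(features, updrs, seed=0):
    """Return (rho, predictions) for a linear SVR under 10-fold cross-validation.

    Assumptions: C and epsilon are tuned inside each training fold with a
    5-fold grid search scored by mean absolute error, and Spearman's rho is
    computed once over the predictions pooled from the 10 test folds.
    """
    outer = KFold(n_splits=10, shuffle=True, random_state=seed)
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(
        SVR(kernel="linear"),
        PARAM_GRID,
        cv=inner,
        scoring="neg_mean_absolute_error",
    )
    # Each sample is predicted by a model that never saw it during training.
    predictions = cross_val_predict(search, features, updrs, cv=outer)
    rho, _ = spearmanr(updrs, predictions)
    return rho, predictions


if __name__ == "__main__":
    # Synthetic placeholder data: 50 "patients" with 4 features each.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))
    y = 40 + 10 * X[:, 0] + rng.normal(scale=2.0, size=50)
    rho, _ = evaluate_svr(X, y)
    print(f"Spearman rho = {rho:.2f}")
```

Pooling the out-of-fold predictions gives a single correlation over all 50 patients. With a leave-one-out scheme each test fold would contain a single point, so no per-fold correlation could be computed.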
4.2. Regression

Table 3 contains the Spearman correlation coefficient (ρ) between the real and predicted UPDRS scores using the SVR for all tasks. The highest correlation is obtained with sentence 6 (0.23). In general, using only the proposed features, the neurological state of the PD patients is not well predicted; however, it must be considered that the UPDRS is a complete motor scale in which only one item is related to speech impairments. The proposed features might provide suitable information that could be used as a complement to other features from speech or from other bio-signals to monitor the neurological state of PD patients.

Table 3. Spearman correlation coefficient (ρ) between the real and predicted UPDRS

Task                         ρ
sentence 1                   0.20
sentence 2                   0.02
sentence 3                   0.16
sentence 4                   -0.40
sentence 5                   -0.07
sentence 6                   0.23
read text                    0.19
all sentences                -0.12
all sentences + read text    0.01

5. Conclusion

In this study two novel features are proposed to evaluate the intelligibility deficits of PD patients. The suitability of the features is tested in two scenarios: (1) the classification of PD patients vs. HC speakers, and (2) the monitoring of the neurological state of PD patients according to the UPDRS score.

According to the results, the proposed features are highly accurate for discriminating PD patients from HC speakers. Accuracies of up to 92% are obtained when the features extracted from different tasks are combined. Significant differences are also observed between the features extracted from PD patients and HC speakers, with lower values of the features obtained from the utterances of the patients. For the regression problem, the proposed features are not able to predict the UPDRS score of the patients by themselves; however, it must be considered that the UPDRS is a complete motor scale that evaluates other aspects besides speech. The proposed features might provide suitable information that could be used as a complement to other features from speech or from other bio-signals to monitor the neurological state of PD patients.

Further studies might combine the proposed features with other speech features that analyze other impairments of PD patients, such as phonation, articulation, and prosody, in order to improve the results in both the classification and the regression problems.

6. Acknowledgments

The authors would like to thank COLCIENCIAS project # 111556933858 for funding this study, and Fundalianza Parkinson Colombia.

References

[1] O. Hornykiewicz. Biochemical aspects of Parkinson’s disease. Neurology, 51(2):S2–S9, 1998.
[2] J. A. Logemann, H. B. Fisher, B. Boshes, and E. R. Blonsky. Frequency and cooccurrence of vocal tract dysfunctions in the speech of a large sample of Parkinson patients. Journal of Speech and Hearing Disorders, 43(1):47–57, 1978.
[3] J. C. Vásquez-Correa, T. Arias-Vergara, J. R. Orozco-Arroyave, J. F. Vargas-Bonilla, J. D. Arias-Londoño, and E. Nöth. Automatic detection of Parkinson’s disease from continuous speech recorded in non-controlled noise conditions. In Annual Conference of the International Speech Communication Association (INTERSPEECH), 2015.
[4] A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig. Objective automatic assessment of rehabilitative speech treatment in Parkinson’s disease. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(1):181–190, 2014.
[5] A. Tsanas, M. Little, P. E. McSharry, J. Spielman, and L. O. Ramig. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Transactions on Biomedical Engineering, 59(5):1264–1271, 2012.
[6] J. R. Orozco-Arroyave, E. A. Belalcazar-Bolaños, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, K. Daqrouq, F. Hönig, and E. Nöth. Characterization methods for the detection of multiple voice disorders: Neurological, functional, and laryngeal diseases. IEEE Journal of Biomedical and Health Informatics, 19(6):1820–1828, 2015.
[7] S. Skodda, W. Visser, and U. Schlegel. Vowel articulation in Parkinson’s disease. Journal of Voice, 25(4):467–472, 2012.
[8] J. Rusz, R. Cmejla, T. Tykalova, H. Ruzickova, J. Klempir, V. Majerova, J. Picmausova, J. Roth, and E. Ruzicka. Imprecise vowel articulation as a potential early marker of Parkinson’s disease: Effect of speaking task. The Journal of the Acoustical Society of America, 134(3):2171–2181, 2013.
[9] J. R. Orozco-Arroyave, F. Hönig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, and E. Nöth. Voiced/unvoiced transitions in speech as a potential bio-marker to detect Parkinson’s disease. In 16th Annual Conference of the Speech and Communication Association (INTERSPEECH), pages 95–99, 2015.
[10] S. Skodda, W. Grönheit, and U. Schlegel. Intonation and speech rate in Parkinson’s disease: General and dynamic aspects and responsiveness to levodopa admission. Journal of Voice, 25(4):e199–e205, 2011.
[11] M. De Letter, P. Santens, and J. Van Borsel. The effects of levodopa on word intelligibility in Parkinson’s disease. Journal of Communication Disorders, 38(3):187–196, 2005.
[12] N. Miller, L. Allcock, D. Jones, E. Noble, A. J. Hildreth, and D. J. Burn. Prevalence and pattern of perceived intelligibility changes in Parkinson’s disease. Journal of Neurology, Neurosurgery & Psychiatry, 78(11):1188–1190, 2007.
[13] C. G. Goetz et al. Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS): Scale presentation and clinimetric testing results. Movement Disorders, 23(15):2129–2170, 2008.
[14] M. Schuster, A. Maier, T. Haderlein, E. Nkenke, U. Wohlleben, F. Rosanowski, U. Eysholdt, and E. Nöth. Evaluation of speech intelligibility for children with cleft lip and palate by means of automatic speech recognition. International Journal of Pediatric Otorhinolaryngology, 70(10):1741–1747, 2006.
[15] J. R. Orozco-Arroyave, J. C. Vásquez-Correa, F. Hönig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, and E. Nöth. Towards an automatic monitoring of the neurological state of the Parkinson’s patients from speech. In 41st International Conference on Acoustic, Speech, and Signal Processing (ICASSP), 2016.
[16] J. R. Orozco-Arroyave, J. D. Arias-Londoño, J. F. Vargas-Bonilla, M. C. Gonzalez-Rátiva, and E. Nöth. New Spanish speech corpus database for the analysis of people suffering from Parkinson’s disease. In 9th Language Resources and Evaluation Conference, LREC, pages 342–347, 2014.
[17] F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945.