
UM5

ENSAM Rabat
Biomedical Engineering

A Spectrogram Based Deep Feature Assisted Computer-Aided Diagnostic System for
Parkinson’s Disease

Artificial Intelligence Project


Prepared by

OUBAHA Soumia
IDRISSI MANALI Ichraq

Supervised by:
Prof. Nsiri Benayad

Academic year 2023/2024


GBM-S5
ENSAM Rabat

Parkinson’s Disease Detection: A Comprehensive Study


December 17, 2023


Abstract

Parkinson’s disease is a neurodegenerative disorder characterized by the progressive degeneration
of dopamine-producing cells in the brain, leading to motor and non-motor symptoms. Early detection
is challenging due to the gradual onset of symptoms. This study explores different diagnostic
systems focusing on gait, tremor, and speech characteristics. Recent research indicates speech
impairments as potential predictors for Parkinson’s disease, emphasizing the need for modeling
speech variations using acoustic features.
Three detection methods are proposed: a transfer learning-based approach using speech
spectrograms, evaluation of deep features extracted from spectrograms with machine learning
classifiers, and assessment of simple acoustic features using machine learning classifiers. The
frameworks are evaluated on the Spanish dataset pc-Gita. Results show promising outcomes
with deep features, achieving the highest accuracy of 99.7%.
INDEX TERMS: Parkinson’s disease, Classification, Deep features, Speech signals, Transfer Learning
0.1 I. INTRODUCTION
Parkinson’s disease (PD) is a slowly progressing neurodegenerative condition whose origin
remains unclear [1]. Researchers have identified hereditary and environmental factors that
contribute to its development, and the disease affects approximately 100-250 individuals per
100,000 [2]. While it is most prevalent among individuals aged 50 and above, early symptoms have
also been observed in those aged 30-50. The disease primarily impacts the central nervous
system, leading to the degeneration of dopamine-producing neurons. Dopamine, a chemical
produced in the substantia nigra (part of the basal ganglia), plays a crucial role in transmitting
signals within the brain. The loss of dopamine-producing cells results in movement disorders in PD
patients.
The symptoms of Parkinson’s disease are categorized into motor and non-motor symptoms.
Motor symptoms, more noticeable than non-motor symptoms, involve slowness of movement
(bradykinesia), rigidity, postural instability, and tremor [3]. Non-motor symptoms, occurring
at specific intervals, include sleep disorders, speech and swallowing problems, and olfactory
disorder (loss of sense of smell) [1, 3]. Parkinson’s disease has a distinctive impact on speech,
affecting phonation, articulation, and prosody. Phonation refers to the use of vocal folds for
speech, articulation involves the use of special tissues in speech production, and prosody is
related to amplitude, loudness, and pitch to produce sound. Research in Parkinson’s disease
detection often focuses on phonation, including the pronunciation of vowels [4].
Speech signals are a prominent method for diagnosing Parkinson’s disease, with studies
revealing significant pronunciation issues in vowels, sentences, and words [5]. Consequently,
speech is considered a major predictor of PD. Articulation, intelligibility, and prosody features
in speech signals have demonstrated promising results in PD detection [6]. Additionally, age
factors contribute to the disease, with significant defects observed in the speech recordings of
young speakers [7]. Monitoring of Skype calls using normal sentences has revealed substantial
errors in the pronunciation of PD patients [8].
Acoustic features, often coupled with support vector machines (SVM), have traditionally been
used for PD detection. Recent studies have also explored disease detection using gait,
handwriting, and speech datasets. The literature further highlights Gaussian-based models,
several machine learning techniques, and convolutional neural networks as contributors to PD
diagnosis [12]. Identifying the most suitable speech impairment features for detecting
Parkinson’s disease remains an important open research area.
This research focuses on analyzing speech recordings using spectrograms and acoustic
features. In our approach, all recordings are converted into spectrograms with the short-time
Fourier transform (STFT), and these spectrograms are used in the transfer learning method. We
also propose a method based on simple acoustic features and consider a pre-trained
convolutional neural network architecture, specifically the AlexNet model, for deep feature
extraction and PD detection [13, 14]. To ensure a fair comparison, transfer learning-based
classification is also employed. The proposed methods are evaluated using Parkinson’s disease
speech recordings from the PC-GITA dataset [15]. Results indicate that the deep feature-based
technique yields superior outcomes.


The primary contributions of our research in PD detection using speech signals are
summarized as follows:
1. Introduction of a spectrogram-based approach for extracting deep features to distinguish
PD patients from healthy individuals.
2. Proposal of an acoustic-phonetic-based approach for PD detection.
3. A comprehensive comparison between the proposed deep feature-based approach, simple
acoustic features, and transfer learning-based methods.
The remainder of the research paper is organized as follows: Section 2 provides a literature
review, Section 3 explains the proposed methodology, Section 4 details the experimental setup
and dataset, Section 5 presents a detailed view of results and simulations, and the paper
concludes with Section 6, including discussions on future work.

0.1.1 Acoustic Features and SVM


Traditionally, acoustic features, often combined with support vector machines (SVM),
have been employed for PD detection. Recent research expands detection methods to include
gait, handwriting, and speech datasets, along with Gaussian-based models and convolutional
neural networks (CNNs) [8]. However, the selection of suitable speech impairment features for
PD detection remains an open research area.

0.2 II. LITERATURE REVIEW


This section provides an overview of existing techniques for Parkinson’s disease detection
using the Spanish speech dataset pc-Gita. Techniques are categorized into machine learning-
based methods and other approaches.

0.2.1 MACHINE LEARNING BASED METHODS


In recent years, machine learning-based disease classification has become widely used
in the medical field [16, 17]. L. Velazquez et al. [18] employed a
Gaussian-based density approach, focusing on four to five different Parkinson’s disease (PD)
corpora. Their study utilized phonetic text-dependent utterances that required vocal tract
features of speech signals. Evaluation encompassed words, sentences, monologues, and vowels
across three corpora, with a predominant male patient presence in the Czech dataset. The
results demonstrated superior performance compared to other datasets, specifically 81%.
Rueda et al. [6] applied a wrapper feature selection method to analyze vowels and the words
“pa-ta-ka” (articulation, phonation, and diadochokinetic features) from pc-Gita recordings,
achieving 70%.
Karan, B. et al. [20] assessed inherent and decomposition-based features from vowels
in two datasets, pc-Gita and Saarbrücken. Their study reported a notable 96%.
Various other studies explored different aspects. Vasquez-Correa et al. [22] proposed a novel
approach considering the on-off state of vocal folds, reporting 94.9%. The authors considered
gender factors and emphasized the significant role of age in Parkinson’s disease (PD)
classification. Their research found that signals from younger speakers contribute more to the
PD classification process than signals from older speakers. They employed binary and
multi-class support vector machines, comparing them with neural networks and reporting an
impressive 95%.
Cernak et al. [25] proposed a novel approach for classifying speech signals as coming from
Parkinson’s patients or healthy speakers. Using a phonological feature-based method on the
Spanish pc-Gita dataset, they found this approach valuable in assessing PD patients in clinical
settings.
Moro-Velázquez et al. [25] utilized traditional machine learning methods,
focusing on articulation and phonological features for PD detection. Their assessment of kinetic
features and speech signals from Parkinson’s disease patients using Gaussian mixture modeling
and i-vectors resulted in 87%.
Orozco-Arroyave et al. [26] developed open-source software for PD assessment, using
phonation, articulation, prosody, and intelligibility dimensions from the pc-Gita dataset. They
designed a clinician-friendly system for identifying Parkinson’s disease using conventional
machine learning, aiming at practicality and ease of adoption.
T. Arias-Vergara et al. [8] proposed a model for PD assessment through individual speaker
speech signal analysis. They examined phonation, articulation, and prosody in spontaneous
and read speech from the Spanish pc-Gita dataset, observing the effectiveness of Skype speech
signals in distant observation. Evaluation using Gaussian mixture modeling and i-vectors
yielded a 0.77
J.C. Vásquez-Correa et al. [27] presented an enhanced version of m-FDA for PD detection,
considering phonation, articulation, prosody, and intelligibility features from Spanish vowel
/a/, sentences, and words.
Orhan et al. [28] used statistical pooling and ReliefF for feature selection, achieving 91%.
Immane et al. [30] assessed the PhysioNet gait dataset for PD diagnosis, employing a deep
1-D neural network that achieved 98.7%.
Turker et al. [31] proposed an octopus-based multiple pooling method for feature extraction,
achieving 99.2%.
Diogo et al. [32] presented an approach for early PD diagnosis using three distinct databases
with vowel pronunciation in different languages. Their work achieved the highest accuracy of
99.94%.

0.2.2 Other Approaches

Vasquez-Correa et al. [14] proposed a novel approach based on the on-off state of vocal
folds, reporting 94.9%.

0.3 III. PROPOSED METHODOLOGY

This section details the proposed methodology, encompassing spectrogram-based approaches,
acoustic-phonetic methods, and deep feature extraction using a pre-trained convolutional
neural network architecture.


0.3.1 Spectrogram-Based Approach


In our work, all recordings are converted into spectrograms using the short-time Fourier
transform (STFT); the spectrograms serve as input to the transfer learning method.
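As an illustration of the spectrogram step, the following is a minimal sketch of a log-magnitude STFT written with NumPy only. The frame length, hop size, Hann window, dB scaling, and the synthetic 440 Hz test tone are all assumptions made for the example; the report does not state the parameters actually used.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=512, hop=128):
    """Log-magnitude spectrogram of shape (frame_len // 2 + 1, n_frames).

    frame_len and hop are illustrative defaults, not values from the report.
    The signal must contain at least frame_len samples.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of every frame: (n_frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose to (frequency, time) and convert to decibels.
    return 20.0 * np.log10(magnitude.T + 1e-10)


# Synthetic stand-in for a recording: 1 s of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = stft_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)
```

Each column of `spec` is the spectrum of one short frame, so the array can be rendered as an image and resized to the input size expected by a convolutional network.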

0.3.2 Acoustic-Phonetic Approach


We propose an acoustic-phonetic based approach for PD detection, leveraging traditional
acoustic features.
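The report does not list which acoustic features are used, so the sketch below computes three common, purely illustrative ones (RMS energy, zero-crossing rate, and spectral centroid) to show what a simple per-recording feature vector could look like. The function name `simple_acoustic_features` and the feature choice are assumptions.

```python
import numpy as np


def simple_acoustic_features(signal, sr):
    """Return [rms, zero_crossing_rate, spectral_centroid_hz] for one recording.

    The feature set is a hypothetical example, not the one from the report.
    """
    signal = np.asarray(signal, dtype=float)
    # Root-mean-square energy of the waveform.
    rms = np.sqrt(np.mean(signal ** 2))
    # Fraction of adjacent sample pairs whose signs differ.
    zcr = np.mean(np.signbit(signal[:-1]) != np.signbit(signal[1:]))
    # Magnitude-weighted mean frequency of the one-sided spectrum.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, zcr, centroid])


# Synthetic stand-in for a sustained vowel: 1 s of a 200 Hz tone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
features = simple_acoustic_features(np.sin(2 * np.pi * 200.0 * t), sr)
print(features)
```

Vectors of this kind, one per recording, are what a conventional classifier such as an SVM would take as input.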

0.3.3 Deep Feature Extraction


Additionally, we consider a pre-trained convolutional neural network architecture,
specifically the AlexNet model, for deep feature extraction and PD detection.

0.4 IV. EXPERIMENTAL SETUP AND DATASET DETAILS


This section provides an overview of the experimental setup and details of the pc-Gita
dataset used for evaluating the proposed methods.

0.5 V. RESULTS AND DISCUSSION


Results and simulations are presented, highlighting the performance of the proposed
methods in PD detection.
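For reference, the accuracy figure quoted in the abstract can be computed from predicted and true labels as sketched below. Sensitivity and specificity are included only as commonly reported companions and are not taken from the report; the label convention (1 = PD, 0 = healthy) and the toy labels are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for labels 1 (PD) and 0 (healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of PD speakers that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of healthy speakers that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels for illustration only; these are not results from the report.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
print(metrics)
```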

0.6 VI. CONCLUSION AND FUTURE WORK


The paper concludes with a summary of contributions and potential avenues for future
research.


Bibliography
[1] Author1, Title1, Journal1, Year1.

[2] Author2, Title2, Journal2, Year2.

[3] Author3, Title3, Journal3, Year3.

[4] Author4, Title4, Journal4, Year4.

[5] Author5, Title5, Journal5, Year5.

[6] Author6, Title6, Journal6, Year6.

[7] Author7, Title7, Journal7, Year7.

[8] Author12, Title12, Journal12, Year12.

[9] Author16, Title16, Journal16, Year16.

[10] Author17, Title17, Journal17, Year17.

[11] Author18, Title18, Journal18, Year18.

[12] Author19, Title19, Journal19, Year19.

[13] Author20, Title20, Journal20, Year20.

[14] Author22, Title22, Journal22, Year22.

[15] Author23, Title23, Journal23, Year23.

[16] Author24, Title24, Journal24, Year24.
