Ahmed Al-nasheri1, Zulfiqar Ali1,3, Ghulam Muhammad1, Mansour Alsulaiman1, Khalid H. Almalki2, Tamer A. Mesallam2, Mohamed Farahat2
1 Digital Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences
2 Otolaryngology Department, College of Medicine
King Saud University
Riyadh 11543, Saudi Arabia
a.alnashari@yahoo.com, {zuali, ghulam, msuliman}@ksu.edu.sa
3 Centre for Intelligent Signal and Imaging Research (CISIR), Department of Electrical and Electronic Engineering
Universiti Teknologi PETRONAS
Tronoh 31750, Perak, Malaysia
zulfiqar_g02579@utp.edy.my
Abstract: This paper investigates the use of Multi-Dimensional Voice Program (MDVP) parameters to automatically detect voice pathology in the Arabic Voice Pathology Database (AVPD). MDVP parameters are widely used by physicians and clinicians to detect voice pathology; however, MDVP is commercial software. AVPD is a newly developed speech database designed to suit a wide range of experiments in the field of automatic voice pathology detection, classification, and automatic speech recognition. This paper is the first step in evaluating MDVP parameters in AVPD using the sustained vowel /a/. The experimental results demonstrate that some of the acoustic features show an excellent ability to discriminate between normal and pathological voices. The best overall accuracy is 81.33%, obtained with an SVM classifier.

Keywords: voice pathology detection; AVPD; MDVP; SVM; MEEI.

I. INTRODUCTION

Voice pathologies affect the vocal folds, and these disorders produce irregular vibrations in the vocal folds due to the malfunctioning of the voice box. Vocal fold pathologies exhibit variations in the vibratory cycle of the vocal folds due to their incomplete closure. A voice disorder also changes the shape of the vocal tract and produces irregularities in spectral properties.

The number of dysphonic patients with different types of voice disorders has increased significantly. In the United States, approximately 7.5 million people have vocal difficulty [1]. It has been found that 15% of all visitors to the King Abdul-Aziz University Hospital complain of a voice disorder [2]. The complications caused by a voice problem are significantly greater for a teaching professional than for a non-teaching professional. Studies revealed that, in the U.S., the lifetime prevalence of voice disorders is 57.7% for teachers and 28.8% for non-teachers [3]. Approximately 33% of male and female teachers in the Riyadh area suffer from voice disorders [4]. At our voice center (Communication and Swallowing Disorders Unit, King Abdul-Aziz University Hospital), a high volume of voice disorder cases is examined (almost 760 cases per annum in individuals with various professional and etiological backgrounds). The use of computers to detect or identify pathological problems in speech is a non-invasive method that is advancing with time.

In the last decade, much research has been done on the automatic detection of vocal fold disorders, and this task continues to require further investigation owing to the lack of standard diagnostic approaches and equipment for voice disorders. Pathology detection is the first crucial step in correctly diagnosing and managing voice disorders. Objective assessment that includes acoustical analysis is independent of human bias and can assess voice quality more reliably by relating certain parameters to vocal fold behavior. Subjective measurement of voice quality, on the other hand, is based on individual experience. Automatic voice pathology detection can be accomplished with various types of features, which can be obtained by long-term and short-term signal analysis. The long-term parameters can be derived by acoustic analysis of speech [5], [6], and the short-term parameters can be calculated from linear predictive coefficients (LPC) [7], [8], linear predictive cepstral coefficients (LPCC) [9], Mel-frequency cepstral coefficients (MFCC) [10], [11], etc. Different pattern matching techniques, such as the Gaussian mixture model (GMM) [12], [13], hidden Markov model (HMM) [14], support vector machine (SVM) [15], and artificial neural networks (ANN) [16], have been used to differentiate between disordered and normal subjects. Multiple long-term acoustic features, namely, pitch, shimmer, jitter, APQ (amplitude perturbation quotient), PPQ (pitch perturbation quotient), HNR (harmonic to noise ratio), NNE (normalized noise energy), VTI (voice turbulence index), SPI (soft phonation index), FATR (frequency amplitude tremor), and the glottal to noise excitation ratio (GNE) are frequently used
In this paper, four different experiments are performed with the samples of normal and pathological subjects, both male and female. The pathological set contains 106 subjects with 5 different disorders, as shown in Table II. The normal set contains 120 subjects, male and female, as shown in Table III. All samples are taken from AVPD and contain only the vowel /a/.

No.  Feature    No.  Feature
7    sAPQ       18   Fftr
8    Shim       19   Flo
9    APQ        20   Jita
10   PFR        21   Fhi
11   FTRI       22   F0
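The features in Table IV are ordered by F-ratio. As a minimal sketch of how such a ranking could be computed, the snippet below assumes the F-ratio of a feature is the squared difference of the two class means divided by the sum of the two class variances, and it assumes population variance. The function names, feature names, and toy values are hypothetical; this is not the authors' code and the numbers are not AVPD measurements.

```python
from statistics import mean, pvariance


def f_ratio(normal, pathological):
    """F-ratio of one feature: squared gap between the class means
    divided by the sum of the class variances (population variance
    is an assumption made for this sketch)."""
    gap = mean(normal) - mean(pathological)
    return gap ** 2 / (pvariance(normal) + pvariance(pathological))


def rank_features(normal_by_feature, pathological_by_feature):
    """Return feature names sorted by descending F-ratio."""
    scores = {
        name: f_ratio(normal_by_feature[name], pathological_by_feature[name])
        for name in normal_by_feature
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy values for two hypothetical features (not real measurements).
normal = {
    "feat_a": [1.0, 1.2, 0.8, 1.0],
    "feat_b": [5.0, 6.0, 4.0, 5.0],
}
pathological = {
    "feat_a": [3.0, 3.2, 2.8, 3.0],
    "feat_b": [5.5, 6.5, 4.5, 5.5],
}

ranking = rank_features(normal, pathological)
```

In this toy data the class means of `feat_a` are far apart relative to its spread, so it ranks ahead of `feat_b`, whose classes overlap heavily. A feature that is constant in both classes would give a zero denominator and would need separate handling.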
All features were sorted in descending order of their F-ratio values, which were calculated between the two classes (pathological and normal) for each feature individually over all voice samples. The sorted features used in the experiments are shown in Table IV, and the F-ratio values were calculated using formula (1).

$$F_i = \frac{(\mu_{N,i} - \mu_{P,i})^2}{\delta_{N,i}^2 + \delta_{P,i}^2}, \quad i = 1, 2, 3, \ldots, 22 \qquad (1)$$

Here $\mu$ and $\delta$ denote the mean and standard deviation of feature $i$ over the normal (N) and pathological (P) samples.

The value of the F-ratio enables us to make a decision about our null hypothesis (the two classes are the same). When the F-ratio value is small, we accept the null hypothesis and conclude that there are no significant differences between the two classes. On the other hand, a large F-ratio value enables us to reject the null hypothesis and conclude that there are significant differences between the two classes. In our case, the two classes are the normal and pathological samples.

In every experiment, all features are fed to a support vector machine (SVM) classifier using 5-fold cross-validation to avoid bias toward the testing data. All four experiments were repeated 10 times to verify the accuracy of the detection process. We then took the average of the 10 accuracies, sensitivities, and specificities. The overall accuracies, sensitivities, and specificities obtained are shown in Table V. The performance measures are calculated using equations (2), (3), and (4), respectively, where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$

$$\text{Specificity} = \frac{TN}{FP + TN} \qquad (4)$$

The receiver operating characteristic (ROC) curve illustrates the performance of a binary classifier graphically. As we can see from Fig. 1, which shows the ROC curves for the four experiments, the best overall accuracy is obtained with 22 features, followed by 15 features, 10 features, and 5 features. The areas under the ROC curve (AUC) are 0.82, 0.78, 0.79, and 0.81, respectively. False positive and true positive rates are plotted along the x- and y-axes, respectively. It shows that performance of