
Voice Pathology Detection with MDVP Parameters Using Arabic Voice Pathology Database

Ahmed Al-nasheri¹, Zulfiqar Ali¹,³, Ghulam Muhammad¹, Mansour Alsulaiman¹, Khalid H. Almalki², Tamer A. Mesallam², Mohamed Farahat²

¹Digital Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences
²Otolaryngology Department, College of Medicine
King Saud University, Riyadh 11543, Saudi Arabia
a.alnashari@yahoo.com, {zuali, ghulam, msuliman}@ksu.edu.sa

³Centre for Intelligent Signal and Imaging Research (CISIR), Department of Electrical and Electronic Engineering
Universiti Teknologi PETRONAS, Tronoh 31750, Perak, Malaysia
zulfiqar_g02579@utp.edu.my

Abstract: This paper investigates the use of Multi-Dimensional Voice Program (MDVP) parameters to automatically detect voice pathology in the Arabic Voice Pathology Database (AVPD). MDVP parameters are very popular among physicians and clinicians for detecting voice pathology; however, MDVP is commercial software. AVPD is a newly developed speech database designed to suit a wide range of experiments in automatic voice pathology detection, classification, and automatic speech recognition. This paper is a first step toward evaluating MDVP parameters on AVPD using the sustained vowel /a/. The experimental results demonstrate that some of the acoustic features show an excellent ability to discriminate between normal and pathological voices. The best overall accuracy, 81.33%, is obtained with an SVM classifier.

Keywords: voice pathology detection; AVPD; MDVP; SVM; MEEI.

978-1-4799-7626-3/15/$31.00 ©2015 IEEE

I. INTRODUCTION

Voice pathologies affect the vocal folds, and these disorders produce irregular vibrations of the vocal folds due to the malfunctioning of the voice box. Vocal fold pathologies exhibit variations in the vibratory cycle of the vocal folds due to their incomplete closure. A voice disorder also changes the shape of the vocal tract and produces irregularities in its spectral properties.

The number of dysphonic patients with different types of voice disorders has increased significantly. In the United States, approximately 7.5 million people have vocal difficulty [1]. It has been found that 15% of all visitors to King Abdul Aziz University Hospital complain of a voice disorder [2]. The complications caused by a voice problem are significantly greater for teaching professionals than for non-teaching professionals. Studies revealed that, in the U.S., the lifetime prevalence of voice disorders is 57.7% for teachers and 28.8% for non-teachers [3]. Approximately 33% of male and female teachers in the Riyadh area suffer from voice disorders [4]. At our voice center (Communication and Swallowing Disorders Unit, King Abdul Aziz University Hospital), a high volume of voice disorder cases is examined: almost 760 cases per annum, in individuals with various professional and etiological backgrounds. The use of computers to detect or identify pathological problems in speech is a non-invasive method that is advancing with time.

In the last decade, much research has been done on the automatic detection of vocal fold disorders, and these tasks continue to require further investigation due to the lack of standard diagnostic approaches and equipment for voice disorders. Pathology detection is the first crucial step toward correctly diagnosing and managing voice disorders. Objective assessment, which includes acoustic analysis, is independent of human bias and can assess voice quality more reliably by relating certain parameters to vocal fold behavior. Subjective measurement of voice quality, on the other hand, depends on individual experience. Automatic voice pathology detection can be accomplished with various types of features, which can be obtained by long-term and short-term signal analysis. Long-term parameters can be derived by acoustic analysis of speech [5], [6], and short-term parameters can be calculated as linear predictive coefficients (LPC) [7], [8], linear predictive cepstral coefficients (LPCC) [9], Mel-frequency cepstral coefficients (MFCC) [10], [11], etc. Different pattern matching techniques, such as Gaussian mixture models (GMM) [12], [13], hidden Markov models (HMM) [14], support vector machines (SVM) [15], and artificial neural networks (ANN) [16], have been used to differentiate between disordered and normal subjects. Multiple long-term acoustic features, namely pitch, shimmer, jitter, APQ (amplitude perturbation quotient), PPQ (pitch perturbation quotient), HNR (harmonic-to-noise ratio), NNE (normalized noise energy), VTI (voice turbulence index), SPI (soft phonation index), FATR (frequency amplitude tremor), and the glottal-to-noise excitation ratio (GNE), are frequently used to diagnose voice pathology (referred to in [13] as [1]-[11]).

TABLE I: TWENTY-TWO ACOUSTIC PARAMETERS EXTRACTED BY MDVP SOFTWARE ANALYSIS [19].
Furthermore, jitter and shimmer capture the vocal fold vibration characteristics of both pathological and normal speakers, and both parameters are widely used for clinical and scientific diagnosis [17]. Rosa et al. [18] extracted seven acoustic parameters, including shimmer and jitter, by means of an iterative residual signal estimator; jitter alone provided 54.8% pathology detection accuracy over 21 pathologies. Arjmandi et al. [19] list thirty-three long-term acoustic parameters with their definitions. They selected twenty-two of these parameters and extracted them with MDVP from the voice samples of the Massachusetts Eye and Ear Infirmary (MEEI) database, using 50 dysphonic patients and 50 normal persons. The 22 parameters were calculated for each sample and fed to six different classifiers to compare their accuracies, and two feature reduction techniques were applied before classification. The binary SVM classifier gave the best results of all the classifiers, with a recognition rate of 94.26%. In [20], MFCC and six acoustic parameters (jitter, shimmer, NHR, SPI, APQ, and RAP, the relative average perturbation) were extracted, and the results were compared with the NN-based voice pathology detection system of [21]. The acoustic parameters were fed to GMM- and SVM-based systems, which achieved accuracies of 98.4% and 95.2% on the training and testing data, respectively, higher than the other systems. Recently, MPEG-7 audio descriptors and multi-directional regression based features have been used for voice pathology detection and found to give good accuracy [23], [24]. Another recent study investigates the most discriminative frequency region for voice pathology detection [25].
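Jitter, shimmer, and RAP are cycle-to-cycle perturbation measures. As an illustration only, the sketch below computes their common textbook forms from sequences of pitch periods and per-cycle peak amplitudes that are assumed to be already extracted. It does not reproduce MDVP's own period extraction, whose exact conventions may differ, and all function names are hypothetical.

```python
from statistics import fmean


def _local_perturbation(values):
    """Mean absolute difference between consecutive values, as a percentage
    of the mean value (the common 'local' perturbation measure)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * fmean(diffs) / fmean(values)


def jitter_percent(periods):
    """Local jitter (%) from pitch periods T_1..T_N (any time unit)."""
    return _local_perturbation(periods)


def shimmer_percent(amplitudes):
    """Local shimmer (%) from per-cycle peak amplitudes A_1..A_N."""
    return _local_perturbation(amplitudes)


def rap_percent(periods):
    """Relative average perturbation (%): deviation of each period from the
    3-period moving average centred on it, relative to the mean period."""
    if len(periods) < 3:
        raise ValueError("need at least three cycles")
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * fmean(devs) / fmean(periods)


# Hypothetical period sequence (ms) alternating between two values.
periods = [10.0, 12.0, 10.0, 12.0]
print(round(jitter_percent(periods), 2))  # 18.18
print(round(rap_percent(periods), 2))     # 12.12
```

RAP smooths over three neighbouring periods, so it responds less than local jitter to a single alternating pattern like the one above.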
In this paper, the well-known Multi-Dimensional Voice Program (MDVP) parameters are used on the Arabic Voice Pathology Database (AVPD) to detect voice pathology. MDVP parameters are commonly used by physicians and clinicians to assess voice pathology; however, MDVP is commercial software, and it includes parameters for voice quality measurement.

The rest of the paper is organized as follows. Section II describes the MDVP parameters. Section III gives an overview of the AVPD speech database. Section IV describes the experimental setup and discusses the results. Finally, Section V draws some conclusions.
II. MDVP PARAMETERS

MDVP is a software program, introduced in 1993, that analyzes 33 quantitative voice parameters. These parameters allow the evaluation of fundamental frequency, amplitude, spectral energy balance, and the presence of any sonority gaps and diplophony. The software is widely used in the literature for various analysis purposes, such as determining the severity of phonation disorders. In this paper, we use only 22 parameters, because the remaining parameters of the 33 either do not reflect voice quality or are not generated for some voices. The parameters used in this paper for automatic voice pathology detection on AVPD are listed in Table I.

III. ARABIC VOICE PATHOLOGY DATABASE (AVPD)

Having a dedicated, versatile, and relevant database for the study of voice pathology is very important to researchers in this field.



The MEEI database is the most popular database in this research area, even though it has several disadvantages. For example, the pathological and normal samples were recorded in two different environments, and the sampling rates for normal and pathological subjects are 50 kHz and 25 kHz, respectively. When a technique is evaluated on the MEEI database, it is therefore unclear whether it is detecting the pathology or the recording environment.

These drawbacks of the MEEI database motivate the design of a new database that overcomes such problems and adds further information, such as the severity of disorders, which is missing from the MEEI database. As a result, we developed AVPD at the Communication and Swallowing Disorders Unit of King Abdul Aziz University Hospital, Riyadh, Saudi Arabia. Patients seen in these clinics undergo a comprehensive assessment protocol that includes voice recording. Voice recording is performed in a sound-treated room using a Kay Pentax Computerized Speech Laboratory (CSL) with MDVP software. The recording process begins by seating the patient upright in a comfortable position, holding the microphone 10 cm from his/her mouth with a microphone spacer. The patient is then asked to perform successive speech tasks following the clinician's instructions, including sustained phonation of the vowels /a/, /e/, and /o/, counting from 0 to 10, reading a standardized Arabic passage, and reading three common words.

IV. EXPERIMENTS

In this paper, four different experiments are performed with samples of normal and pathological subjects, both male and female. The four experiments differ in the number of features used: the top 5, top 10, and top 15 features of the F-ratio ranking in Table IV, and all 22 features. The pathological samples come from 106 subjects with 5 different disorders, as shown in Table II. The normal samples come from 120 male and female subjects, as shown in Table III. All samples were taken from AVPD and contain only the sustained vowel /a/.

TABLE II: PATHOLOGICAL SAMPLES
Disorder     Male   Female   Total
Cyst            3        7      10
Nodules         2       13      15
Paralysis      13       19      32
Polyp          12       14      26
Sulcus         11       12      23
Total          41       65     106

TABLE III: NORMAL SAMPLES
Male   Female   Total
  23       97     120

TABLE IV: THE 22 FEATURES SORTED ACCORDING TO F-RATIO
No.  Feature     No.  Feature
 1   NHR         12   vFo
 2   VTI         13   Fatr
 3   RAP         14   vAm
 4   PPQ         15   SPI
 5   Jitt        16   STD
 6   sPPQ        17   ATRI
 7   sAPQ        18   Fftr
 8   Shim        19   Flo
 9   APQ         20   Jita
10   PFR         21   Fhi
11   FTRI        22   F0
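Table IV ranks the 22 features by the F-ratio of Eq. (1), computed separately for each feature between the normal and pathological classes, and the four experiments keep the top 5, 10, 15, or all 22 of them. A minimal sketch of that ranking and top-k selection, assuming each feature's values are already extracted per class. Population variances are assumed for the squared standard deviations, since the paper does not say which variance estimator was used; the function names and toy numbers are hypothetical, not AVPD data.

```python
import math
from statistics import fmean, pvariance


def f_ratio(normal, pathological):
    """Eq. (1): squared difference of the class means divided by the sum of
    the class variances (population variances are assumed here)."""
    num = (fmean(normal) - fmean(pathological)) ** 2
    den = pvariance(normal) + pvariance(pathological)
    if den == 0:
        # Both classes are constant: perfectly separable if the means differ.
        return math.inf if num > 0 else 0.0
    return num / den


def rank_features(normal, pathological):
    """Return (feature, F-ratio) pairs sorted by F-ratio, highest first.

    `normal` and `pathological` map a feature name to that feature's values
    over the samples of each class.
    """
    scores = {name: f_ratio(normal[name], pathological[name]) for name in normal}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_k_features(ranking, k):
    """Names of the k highest-ranked features (e.g. k = 5, 10, 15, or 22)."""
    return [name for name, _ in ranking[:k]]


# Hypothetical toy values, not AVPD data.
normal = {"NHR": [0.10, 0.12, 0.11], "F0": [120.0, 200.0, 160.0]}
pathological = {"NHR": [0.20, 0.22, 0.21], "F0": [130.0, 190.0, 170.0]}
ranking = rank_features(normal, pathological)
print(top_k_features(ranking, 1))  # ['NHR']
```

In the toy data, NHR separates the classes with small within-class spread (F-ratio of about 75), while F0 differs little between the classes relative to its spread, so NHR ranks first.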

The whole feature set was sorted in descending order of F-ratio. The F-ratio was computed for each feature individually, over all voice samples, between two classes: normal and pathological. The sorted features used in the experiments are shown in Table IV, and the F-ratio values were calculated with Eq. (1):

$$F_i = \frac{(\mu_{N,i} - \mu_{P,i})^2}{\delta_{N,i}^2 + \delta_{P,i}^2}, \qquad i = 1, 2, 3, \ldots, 22 \qquad (1)$$

where $\mu_{N,i}$ and $\mu_{P,i}$ are the means, and $\delta_{N,i}$ and $\delta_{P,i}$ the standard deviations, of feature $i$ over the normal (N) and pathological (P) samples.

The value of the F-ratio allows us to make a decision about the null hypothesis that the two classes are the same. When the F-ratio is small, we cannot reject the null hypothesis, and we conclude that there is no significant difference between the two classes. Conversely, a large F-ratio allows us to reject the null hypothesis and conclude that there are significant differences between the two classes. In our case, the two classes are the normal and pathological samples.

In every experiment, the selected features are fed to a support vector machine (SVM) classifier with 5-fold cross-validation, to avoid bias from the choice of test data. Each of the four experiments was repeated 10 times to ensure the reliability of the detection accuracy, and we then took the average of the 10 accuracies, sensitivities, and specificities. The resulting overall accuracies, sensitivities, and specificities are shown in Table V. These performance measures are calculated with Eqs. (2), (3), and (4), respectively, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (4)$$

The receiver operating characteristic (ROC) curve illustrates the performance of a binary classifier graphically. Fig. 1 shows the ROC curves for the four experiments, with false positive and true positive rates along the x- and y-axes, respectively. The best overall accuracy is obtained with 22 features, followed by 15, 10, and 5 features. The areas under the ROC curve (AUC) are 0.82, 0.78, 0.79, and 0.81, respectively.
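The evaluation protocol above (5-fold cross-validation repeated 10 times, Eqs. (2)-(4), and AUC) can be sketched in pure Python under several stated assumptions. Pathological is treated as the positive class (label 1). Stratified fold assignment is our assumption, since the paper only says "5-fold". The metrics are computed once per repeat from the predictions pooled over all folds and then averaged, because the paper does not say whether it pools or averages per fold. The SVM is replaced by a caller-supplied `fit_predict` function; in practice this could wrap, for example, scikit-learn's `SVC`, whose kernel and hyperparameters the paper does not report. AUC uses the rank-based (Mann-Whitney) formulation. All function names are hypothetical.

```python
import random
from statistics import fmean


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` (pathological) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity as in Eqs. (2)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def stratified_folds(y, k, rng):
    """Split sample indices into k folds with roughly equal class proportions."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv(X, y, fit_predict, k=5, repeats=10, seed=0):
    """Average (accuracy, sensitivity, specificity) over `repeats` runs of
    k-fold cross-validation. Each run pools the held-out predictions of all
    k folds before computing the metrics.
    `fit_predict(train_X, train_y, test_X)` must return labels for test_X."""
    rng = random.Random(seed)
    runs = []
    for _ in range(repeats):
        y_pred = [None] * len(y)
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in fold])
            for i, p in zip(fold, preds):
                y_pred[i] = p
        runs.append(performance(*confusion_counts(y, y_pred)))
    return tuple(fmean(column) for column in zip(*runs))


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve: the probability that a random positive
    sample scores higher than a random negative one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical one-feature data and a fixed-threshold stand-in classifier.
X = [[0.10], [0.20], [0.15], [0.30], [0.25], [0.80], [0.90], [0.70], [0.85], [0.60]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def threshold_classifier(train_X, train_y, test_X):
    return [1 if x[0] > 0.5 else 0 for x in test_X]


print(repeated_cv(X, y, threshold_classifier))       # (1.0, 1.0, 1.0)
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Fixing the random seed makes the 10 repetitions reproducible. Note that `performance` raises `ZeroDivisionError` if a class is missing from `y`.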



The ROC curves show that the performance of the SVM classifier in discriminating between normal and pathological subjects is good. The 95% confidence interval (C.I.) is [0.9449, 0.9870], and the one-tailed p-value is zero (< 0.05), which indicates that the difference between the data of the two classes is significant.

In [19], the accuracy obtained on the MEEI database is 89.29% with 22 acoustic features. Our accuracy on AVPD with 22 acoustic features is 81.33%, which is a comparable result.

TABLE V: AVERAGE ACCURACY (%), SENSITIVITY (%), AND SPECIFICITY (%) FOR ALL EXPERIMENTS
Experiment          Accuracy   Sensitivity   Specificity
All 22 features        81.33         72.18         88.96
Top 5 features         75.55         69.00         83.49
Top 10 features        77.77         66.46         89.67
Top 15 features        80.88         78.72         83.75

Figure 1. ROC curves for the four experiments

The scatter plot of the two top-ranked acoustic features, NHR and VTI, shown in Fig. 2, indicates that these features have an excellent ability to discriminate between normal and pathological voices.

Figure 2. Scatter plot of VTI and NHR

V. CONCLUSION

In this work, we evaluated MDVP parameters on AVPD with four different experiments. The best overall accuracy we obtained is 81.33%, using all 22 features. Some of the acoustic features, such as NHR and VTI, show a high ability to discriminate between normal and pathological subjects. The detection accuracy is comparable with that obtained on other databases used in related research, such as the MEEI database. In future work, we will evaluate other features, such as MFCC and LPCC, on AVPD.

ACKNOWLEDGMENT

This project was supported by the NSTIP strategic technologies programs, number (12-MED-2474-02), in the Kingdom of Saudi Arabia.

REFERENCES

[1] National Institute on Deafness and Other Communication Disorders, "Voice, Speech, and Language: Quick Statistics," 2014. Available: http://www.nidcd.nih.gov/health/statistics/vsl/Pages/stats.aspx. Accessed Dec. 2014.
[2] Research Chair of Voicing and Swallowing Disorders. Available: http://c.ksu.edu.sa/vas/en/vsb. Accessed Dec. 2014.
[3] N. Roy, R. M. Merrill, S. Thibeault, R. A. Parsa, S. D. Gray, and E. M. Smith, "Prevalence of voice disorders in teachers and the general population," J. Speech Lang. Hear. Res., vol. 47, no. 2, pp. 281-293, Apr. 2004.
[4] K. H. Malki, "Voice disorders among Saudi teachers in Riyadh city," Saudi Journal of Oto-Rhinolaryngology Head and Neck Surgery, 2010.
[5] B. Boyanov and S. Hadjitodorov, "Acoustic analysis of pathological voices. A voice analysis system for the screening of laryngeal diseases," Proceedings of IEEE International Conference on Engineering in Medicine and Biology Society, vol. 16, pp. 74-82, 1997.
[6] C. E. Martinez and L. H. Rufiner, "Acoustic analysis of speech for detection of laryngeal pathologies," Proceedings of the 22nd Annual IEEE International Conference on Engineering in Medicine and Biology Society, vol. 3, pp. 2369-2372, 2000.
[7] B. S. Atal, "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and recognition," J. Acoust. Soc. Amer., vol. 54, no. 6, pp. 1304-1312, 1974.
[8] L. Xugang and D. Jianwu, "An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification," Speech Communication, vol. 50, no. 4, pp. 312-322, Oct. 2007.
[9] M. A. Anusuya and S. K. Katti, "Front end analysis of speech recognition: a review," International Journal of Speech Technology, vol. 14, pp. 99-145, Dec. 2010.
[10] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[11] Z. Ali, M. Aslam, and M. E. Ana María, "A speaker identification system using MFCC features with VQ technique," Proceedings of 3rd IEEE International Symposium on Intelligent Information Technology Application, pp. 115-119, 2009.
[12] W. J. J. Roberts and J. P. Willmore, "Automatic speaker recognition using Gaussian mixture models," Proceedings of Information, Decision and Control, IDC'99, pp. 465-470, 1999.
[13] J. I. Godino-Llorente, P. Gómez-Vilda, and M. Blanco-Velasco, "Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and short-term cepstral parameters," IEEE Transactions on Biomedical Engineering, vol. 53, no. 10, pp. 1943-1953, Oct. 2006.
[14] L. E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," Ann. Math. Stat., vol. 37, pp. 1554-1563, 1966.
[15] S. Abe, Support Vector Machines for Pattern Classification. Berlin, Heidelberg, New York: Springer-Verlag, 2005.
[16] T. Ritchings, M. McGillion, and C. Moore, "Pathological voice quality assessment using artificial neural networks," Med. Eng. Phys., vol. 24, no. 8, pp. 561-564, Sept. 2002.



[17] M. Brockmann, M. J. Drinnan, C. Storck, and P. N. Carding, "Reliable jitter and shimmer measurements in voice clinics: The relevance of vowel, gender, vocal intensity, and fundamental frequency effects in a typical clinical task," Journal of Voice, vol. 25, no. 1, pp. 44-53, 2011.
[18] M. Rosa, J. C. Pereira, and M. Grellet, "Adaptive estimation of residue signal for voice pathology diagnosis," IEEE Trans. Biomed. Eng., vol. 47, no. 1, pp. 96-104, Jan. 2000.
[19] M. K. Arjmandi, M. Pooyan, M. Mikaili, M. Vali, and A. Moqarehzadeh, "Identification of voice disorders using long-time features and support vector machine with different feature reduction methods," Journal of Voice, vol. 25, no. 6, pp. 275-289, Nov. 2011.
[20] J. Wang and C. Jo, "Vocal folds disorder detection using pattern recognition method," Proceedings of 29th Annual International Conference of the IEEE EMBS, pp. 3253-3256, Lyon, France, 2007.
[21] T. Li, C. Jo, and S. Wang, "Classification of pathological voice including severely noisy cases," Proceedings of 8th International Conference on Spoken Language Processing, I, Jeju, Korea, 2004, pp. 77-80.
[22] N. Sáenz-Lechón, J. I. Godino-Llorente, V. Osma-Ruiz, and P. Gómez-Vilda, "Methodological issues in the development of automatic systems for voice pathology detection," Biomedical Signal Processing and Control, vol. 1, no. 2, pp. 120-128, April 2006.
[23] G. Muhammad and M. Melhem, "Pathological voice detection and binary classification using MPEG-7 audio features," Biomedical Signal Processing and Control, vol. 11, pp. 1-9, 2014.
[24] G. Muhammad, T. Mesallam, K. Almalki, M. Farahat, A. Mahmood, and M. Alsulaiman, "Multi Directional Regression (MDR) based features for automatic voice disorder detection," Journal of Voice, Elsevier, vol. 26, no. 6, pp. 817.e19-817.e27, 2012.
[25] A. Al-nasheri, Z. Ali, G. Muhammad, and M. Alsulaiman, "Voice pathology detection using auto-correlation of different filters bank," 11th ACS/IEEE International Conference on Computer Systems and Applications, Doha, Qatar, November 10-13, 2014.
