Voice Disorder Identification Using Machine Learning Techniques
Vinit Kumar Agrawal, 116CS0161, Dept. of Computer Science & Engineering
National Institute of Technology, Rourkela

Abstract— The use of mobile devices in the healthcare sector is increasing significantly. Mobile technologies offer not only forms of communication for multimedia content (e.g. clinical audio-visual notes and medical records) but also promising solutions for people who want to detect, monitor and treat their health conditions anywhere and at any time. Mobile health systems can help make patient care faster, better and cheaper. Several pathological conditions can benefit from the use of mobile technologies. In this work we focus on alterations of voice quality, which affect about one person in three at least once in his/her lifetime. Voice disorders are spreading rapidly, although they are often underestimated. Mobile health systems can offer easy and fast support for voice pathology detection. Building a valid and precise mobile health system requires an algorithm that discriminates between pathological and healthy voices more accurately. The key contribution of this study is to investigate and compare the performance of several machine learning techniques useful for voice pathology detection. The results are evaluated in terms of accuracy, sensitivity and specificity. They show that the best accuracy in voice disorder detection is achieved by either the Support Vector Machine or the Decision Tree algorithm, depending on the features chosen by the feature selection methods.

Keywords— Mobile Health Systems, Machine Learning Techniques, Voice Disorders

I. INTRODUCTION
The introduction of mobile devices for data transmission or disease control and monitoring has attracted considerable attention from the research and business communities. These devices offer numerous opportunities to realize efficient mobile health (m-health) systems. These solutions allow patients and doctors to access medical records, clinical audio-visual notes and drug information anywhere and at any time from their mobile devices, such as a tablet or smartphone, to monitor several conditions. M-health solutions can also be used in other important applications such as the detection and prevention of specific diseases, decision making and the management of chronic conditions and emergencies, improving the quality of patient care and reducing the costs of healthcare. Several pathological conditions can be detected and monitored, such as the well-known and widespread cardiovascular diseases.

Several acoustic parameters are estimated to evaluate the state of health of the voice. Unfortunately, the accuracy of these parameters in the detection of voice disorders often depends on the algorithms used to estimate them. For this reason, the main effort of researchers is oriented to the study of acoustic parameters and the application of classification techniques able to obtain a high discrimination accuracy. Recently, research in speech pathology has focused on machine learning techniques.

In this work, we discuss the application of machine learning algorithms and feature selection methods capable of discriminating between pathological and healthy voices with better accuracy. In detail, we evaluate pathology recognition using patient information, such as age and gender, and different features extracted from the voice signals.

II. VOICE DISORDER DETECTION
Speech or, in general, the voice signal is used in several kinds of application, ranging from emotion recognition to recognition of a patient's state of health.

Several m-health solutions adopt these signals to estimate the state of voice health, as do systems that use voice signals to evaluate emotional condition. Voice pathology detection has often been achieved through specific machine learning techniques, and over recent years several approaches have been developed to improve the accuracy of discrimination between healthy and pathological voices.

These studies focus on identifying parameters to measure voice quality and new techniques able to detect voice disorders.

Among the machine learning techniques existing in the literature, the Support Vector Machine (SVM) has been widely used in voice signal processing.

The SVM technique has also been used to estimate the presence of dysphonia, investigating four types of pathology: chronic laryngitis, cysts, Reinke's edema and spasmodic dysphonia. The proposed algorithm is based on the use of Mel-frequency cepstral coefficients (MFCC) and Linear Discriminant Analysis (LDA) as a dimensionality reduction method. This algorithm identifies the presence of a pathology with fair accuracy (86%). However, it was tested on a very limited dataset.

III. MATERIALS AND METHODS
In this study, we analyzed the accuracy in the discrimination
of pathological from healthy voices of the main machine learning techniques to identify the most reliable one. The idea is to integrate the best one into a valid m-health system, where the voice signal can be acquired by a mobile device, such as a smartphone or tablet, processed in real time to extract the voice features, and analyzed by the machine learning classifier to detect the presence or absence of a voice disorder.

In the following subsections we introduce the dataset used in this study, the features extracted from the voice signal and used for the classification, and the machine learning techniques compared.

A. The database
In our research, we had to select a subset of voice samples containing voices from both healthy and pathological individuals. To evaluate a patient's voice quality, the use of vowels is preferable because they avoid linguistic artifacts and are used in many voice assessment applications. For voice disorder detection and identification, clinical practice uses the vocalization of the vowel /a/. Our experiments had to be performed on a well-balanced database containing both pathological and healthy voices. To test the capability of the considered algorithms, we selected pathological voices from all types of pathology existing in the database. These include organic voice disorders, such as chronic laryngitis or Reinke's edema, and functional dysphonia, either hyperfunctional or hypofunctional.

B. Features
Feature extraction is an important task that improves the analysis and classification. The choice of which features of the speech signal to use in our study was made by taking into account two considerations. On the one hand, we used the main parameters adopted by the specialist during the clinical evaluation; on the other, we chose the main features used in several related studies in the literature concerning the use of machine learning techniques for voice classification.

In detail, the parameters used in clinical practice are:

Fundamental Frequency (F0): This represents the rate of vibration of the vocal folds and is an important index of laryngeal function. It is the basis of the other parameters calculated in the acoustic analysis and of most noise estimation methods.

Jitter: This describes the instabilities of the oscillating pattern of the vocal folds, quantifying the cycle-to-cycle changes in fundamental frequency.

Shimmer: This indicates the instabilities of the oscillating pattern of the vocal folds, quantifying the cycle-to-cycle changes in amplitude.

Harmonic to Noise Ratio (HNR): This quantifies the ratio of signal information to noise due to turbulent airflow, resulting from incomplete vocal fold closure in speech pathologies.

C. Machine Learning Classifications
In order to make an exhaustive comparison, we chose different machine learning algorithms, each as a representative of a class of algorithms based on similar characteristics. These techniques are:

Support Vector Machine (SVM): This is a discriminative classifier formally defined by a separating hyperplane that divides data belonging to different classes. The aim is to identify the class to which each data point belongs. Training a support vector machine requires the solution of a very large quadratic programming optimization problem. To solve this problem the sequential minimal optimization (SMO) technique is used, which divides the optimization problem into a series of the smallest possible sub-problems.

Decision Tree (DT): This technique is used to classify categorical data, with the learned function represented by a decision tree. Decision trees are easy to interpret and capable of working with missing values and with both categorical and continuous data, all characteristic of the medical field.

Bayesian Classification (BC): This approach is named after Thomas Bayes, who proposed Bayes' theorem. The classification is achieved by evaluating the probabilistic model that represents a set of random variables and their conditional dependencies, identified respectively as nodes and edges. The major advantages are the easy interpretation of the results and the robustness in dealing with missing data.

Logistic Model Tree (LMT): This technique combines logistic regression models with tree induction. It consists of a standard decision tree structure with logistic regression functions at the leaves.

Instance-based Learning algorithms: These algorithms use specific instances to make the classification predictions. The algorithm used is k-nearest neighbor (k-NN), where the classification of a new instance is based on its k nearest neighbors. It is important to remark that other classification techniques are not reported in this study because of the poor performance they achieved during the experiments.
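To illustrate the instance-based approach described above, the following is a minimal k-NN sketch in plain Python. It is not the implementation used in this study: the function name knn_predict, the two-feature vectors, the labels and the value k = 3 are all hypothetical and chosen only to show the majority vote among nearest neighbors.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training instances."""
    # Euclidean distance from the query to every stored instance.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_x, train_y)
    ]
    # Keep the k closest instances and count how often each label occurs.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled feature vectors (e.g. jitter, shimmer).
# These values are illustrative only and are not data from the study.
train_x = [(0.2, 0.3), (0.3, 0.2), (0.25, 0.35),
           (1.1, 1.4), (1.3, 1.2), (1.2, 1.5)]
train_y = ["healthy", "healthy", "healthy",
           "pathological", "pathological", "pathological"]

print(knn_predict(train_x, train_y, (0.3, 0.3)))
print(knn_predict(train_x, train_y, (1.0, 1.3)))
```

In practice the features would need to be brought to a common scale before computing distances, since otherwise the feature with the largest numeric range dominates the vote.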
IV. RESULTS & DISCUSSIONS

The performance of the selected machine learning classification techniques was evaluated in terms of accuracy, sensitivity, specificity and ROC area by using the following measurements:

True Positive (TP): the voice sample is pathological and the algorithm recognizes this.
True Negative (TN): the voice sample is healthy and the algorithm recognizes this.
False Positive (FP): the voice sample is healthy but the algorithm recognizes it as pathological.
False Negative (FN): the voice sample is pathological but the algorithm recognizes it as healthy.

The accuracy, that is, the proportion of correctly classified instances, is defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

The sensitivity and the specificity, which represent respectively the test's ability to detect positive results and to identify negative results, are defined as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

The ROC area is a measure of the goodness of a classification algorithm, evaluated by plotting a curve of sensitivity versus the complement of specificity (1 - specificity) and measuring the area under this curve (AUC). The AUC can be interpreted as the average value of sensitivity over all possible values of specificity. The maximum (AUC = 1) means that the algorithm perfectly separates diseased from non-diseased voices. On the other hand, AUC = 0 means that the algorithm incorrectly classifies all subjects with diseases as negative and all healthy subjects as pathological.

A. Feature Selection
Attribute selection is an important task that improves the dataset analysis by identifying redundant and/or irrelevant features, reducing memory usage and computing time. For these reasons, in our study we chose to test the machine learning classification techniques on the whole database and, additionally, on three different subsets of the database obtained by selecting some of the calculated features with the following feature selection methods:

InfoGainAttributeEval algorithm: This calculates the information gain for each feature. The results can vary from 0 (no information) to 1 (maximum information). The best value is achieved by age, followed by two MFCC coefficients, the HNR, jitter, the second derivative, and others. For our experimental tests, we excluded all features that produced no information gain (a value equal to 0) and kept those with an information gain greater than 0.
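As a rough illustration of the information gain criterion, the sketch below computes H(class) - H(class | feature) for discrete features and keeps only those with a gain greater than 0, mirroring the selection rule stated above. It is a simplified stand-in written in plain Python, not the InfoGainAttributeEval implementation itself; the function names, the feature names and the toy data are hypothetical, and continuous features such as jitter would first have to be discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for one discrete feature."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class within each group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 4 pathological and 4 healthy samples (not study data).
labels = ["path"] * 4 + ["healthy"] * 4
features = {
    # Separates the two classes perfectly in this toy example.
    "age_group": ["old"] * 4 + ["young"] * 4,
    # Evenly split within each class, so it carries no information here.
    "gender": ["f", "m", "f", "m", "f", "m", "f", "m"],
}

gains = {name: information_gain(values, labels)
         for name, values in features.items()}
# Keep only features with a strictly positive gain (small tolerance for
# floating-point rounding).
selected = [name for name, gain in gains.items() if gain > 1e-12]
print(gains, selected)
```

With two balanced classes the class entropy is 1 bit, which is why the gain in this example is bounded by 1.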
Correlation method: This assesses the predictive ability of each attribute, giving us the possibility of preferring sets of attributes that are highly correlated with the class.

Principal Components Analysis (PCA) method: Similarly, we used the PCA method to select the most significant parameters. We selected the principal components which have obtained at least 50% of the ranking. We obtained four new parameters that are combinations of several features.

The choice of these feature selection methods was suggested by the wide use of such techniques in machine learning classification to improve the overall quality of the patterns and/or the time required for the actual mining.

B. Classification Performance
We evaluated the accuracy, sensitivity, specificity and ROC area over the whole dataset and also over the three subsets obtained from the feature selection methods.

Comparing the results obtained with several machine learning algorithms, the best accuracy in voice pathology detection was achieved using the SMO technique with all parameters used for the classification. This result was confirmed by the subsequent experimental tests, in which we selected features with two of the three feature selection methods. Considering only the features selected with the InfoGainAttributeEval and PCA methods, the highest classification accuracy values were again obtained using the SMO technique. Meanwhile, when we considered the features selected with the Correlation method, the best accuracy in discriminating between pathological and healthy voices was achieved with the Decision Tree algorithm.

V. CONCLUSIONS

In recent years, the use of mobile multimedia services and applications in the healthcare sector has been increasing significantly. Mobile health applications allow people to access medical information and data of interest at any time and anywhere, which is useful for the monitoring and detection of specific diseases such as dysphonia, an often underestimated voice disorder that affects a large percentage of people.

Research on mobile automatic systems to estimate voice disorders has received considerable attention in the last few years due to its objectivity and non-invasive nature. Machine learning techniques can be a valid support for investigating new approaches to signal processing in an easy and fast way that can be implemented in an m-health solution. This study compares the performance of different voice pathology identification methods, taking into account the main machine learning techniques. Several techniques are applied, such as the Support Vector Machine, Decision Tree, Bayesian Classification, Logistic Model Tree and Instance-based Learning algorithms. Moreover, in this work we focus on identifying appropriate voice signal features through a comparative study of different classifiers.

Although the accuracy values are lower than those obtained in other studies in the literature, it is necessary to highlight that all those studies were performed on very limited and, often, non-accessible datasets. To enhance the classification rate obtained, we are interested in improving the classification phase by developing a hybrid system using a combination of several machine learning techniques. In future work, we want to integrate this hybrid classifier into an m-health system able to detect the presence or absence of a voice disorder, useful for the monitoring and treatment of patients suffering from these pathologies.
