
Towards evidence based diagnosis of voice disorders using Phonovibrograms

Jörg Lohscheller
University of Applied Sciences Trier, Department of Computer Science
Trier, Germany
J.Lohscheller@fh-trier.de

Abstract- Clinical diagnosis of voice disorders is based on examination of the oscillating vocal folds during phonation with state-of-the-art endoscopic high-speed cameras. Commonly, the offline analysis is performed in a subjective and time-consuming manner via slow-motion playback. In this study, an objective method for overcoming this drawback is presented, which is based on Phonovibrogram (PVG) images. For a set of normal voices and subjects suffering from vocal fold paralysis or functional dysphonia, the laryngeal dynamics were captured by specialized Phonovibrogram features and classified using a support vector machine (SVM). In case of functional voice disorders a mean classification accuracy of 78.5% was obtained, while in vocal fold paralysis a classification accuracy of up to 93% could be obtained. The classification results show that the PVG features hold considerable promise for supporting the diagnosis of voice pathologies, even in case of muscle tension dysphonia.

Keywords- PVG; Phonovibrogram; Vocal Folds; Dysphonia; Machine Learning; Diagnosis Support.

I. INTRODUCTION

Human voice is generated within the larynx by two oscillating vocal folds. Voice disorders are closely related to disturbances of the dynamic patterns of the vocal folds. Hence, distinguishing between oscillation patterns of healthy and pathological vocal folds is a fundamental part of the diagnosis of voice disorders. The state-of-the-art technique for examining the laryngeal dynamics is high-speed endoscopy, which captures the two-dimensional oscillation patterns of the vocal folds during voice production with a frame rate of up to 4,000 images per second. To date, the clinical analysis of the recorded high-speed (HS) videos is commonly performed in a subjective manner via slow-motion playback. The resulting diagnoses frequently exhibit a high inter/intra-rater variability, which does not satisfy the demands of evidence based medicine.

To overcome this drawback, a computerized method is presented which allows an objective description of the two-dimensional vocal fold vibration patterns. It comprises a precise extraction of vocal fold dynamics from the high-speed videos and further depicts the extracted movement information in a single image, called Phonovibrogram (PVG), which can be grasped very intuitively. The geometric structures occurring within a PVG image depend directly on the individual vocal fold vibration pattern of a subject. Hence, by describing the geometric PVG structure, an objective analysis of vocal fold dynamics is obtained.

Of particular interest is the identification of voice disorders (vocal fold paralysis, functional voice disorder) which occur without abnormal organic changes of the vocal folds. An automatic classification approach is presented which is capable of differentiating between normal and potentially disturbed vocal fold vibrations (left/right vocal fold paralysis, hypo/hyperfunctional voice disorder). From the PVG images, sets of features are extracted that quantitatively describe the spatio-temporal behaviour of vocal fold (VF) dynamics. By applying machine learning methods (support vector machine, SVM) to a training set of healthy and pathological voices, models of voice disorders were inductively built and subsequently used to predict the class membership of unseen cases.

II. METHOD

A. Data Collection
The data were collected with the Richard Wolf high-speed video system HS ENDOCAM, which is equipped with a rigid 90° endoscope optic. The recordings were performed at a frame rate of 4,000 frames/sec, while the images were captured in gray scale with a resolution of 256 x 256 pixels. Endoscopy was performed as in conventional videostroboscopy.

B. Clinical Material
The vocal fold movements of one hundred and five patients were analyzed. The diagnoses that subsequently served as a gold standard for classification and evaluation were made by clinically experienced and speech therapists according to the basic protocol of voice pathology assessment of the European Laryngological Society [1].
In this manner, a population of 50 women with a diagnosed functional voice disorder was obtained. This clinical picture is also referred to as primary muscle tension dysphonia (MTD), and is diagnosed in case of dysphonia given normal vocal fold morphology and motion, and the absence of organic pathological conditions [2]. The considered population included 25 cases with a hyperfunctional and 25 cases with a
hypofunctional disorder. The distinction between these two dysfunctional types is made clinically based on the patient's overall muscle tension status, the amount of laryngeal muscle tension applied during phonation, the varying degree of hoarseness during crescendo, and abnormal laryngeal posture during connected speech [2].

The second investigated clinical group consists of patients with a vocal fold paresis. This relatively prevalent voice disorder involves the degradation of one VF side's vibratory properties, commonly caused by neural damage. For one half of the examined patients mainly the left VF side was affected by paresis, while for the other half the pathology was identified on the right side. In contrast to organic voice disorders (e.g. nodule, polyp, edema), which can be assessed quite reliably by analyzing laryngeal still images, it is widely held that, among other things, an appropriate diagnosis of paralysis can only be made by considering the overall oscillation behaviour of the VFs during phonation and relating both sides' dynamics to each other.

Furthermore, as a reference population for normal voices, the laryngeal dynamics of 25 female candidate speech therapists were recorded. These healthy individuals exhibited no voice irregularities.

C. Phonovibrogram (PVG)
To identify the dynamics of vocal fold vibrations, Phonovibrograms were computed. The derivation of PVGs has been described in detail previously [3]. For ease of readability, a brief description is given here. Phonovibrographic visualization first demands the segmentation of the laryngeal high-speed recording, Fig. 1 (left).

Figure 1. PVG visualization of vocal fold vibrations.

As the quality of the image data, in terms of manual focus, lighting condition, and image contrast, varies considerably between different high-speed images, a special image segmentation procedure was developed that allows a highly accurate reconstruction of the vocal fold edges from high-speed recordings, even in movies with low image quality. The accuracy of the segmentation procedure was evaluated within an extensive study [4]. There, it was applied to 372 different high-speed movies, each comprising 500 frames. The procedure showed a mean processing rate of approx. 100 images per second and can thus be applied directly after the recording of a high-speed movie. Image processing yields the glottal main axis and the vocal fold edges for both vocal folds.

The principal steps in the procedure to derive a PVG are shown in Fig. 1: Firstly, for each image the absolute values of the distances between the glottal symmetry axis and the vocal fold contours are computed (Fig. 1, middle). In a further step, the glottal axis is longitudinally split and the left vocal fold is turned 180° around the posterior ending P (Fig. 1, middle). The computed distances are finally stored within a column vector (Fig. 1). For visualization, the vector elements are color coded (Fig. 1, right). The value of each vector element is represented by the pixel intensity (red); black denotes no distance of the vocal fold edge from the glottal midline.

By iterating the above described procedure for an entire sequence within a high-speed movie, the resulting set of vectors can be consecutively arranged in a matrix as shown in Fig. 1 on the right. The posterior endings of the vocal folds P are mapped to the horizontal centre line. In contrast to other approaches, the PVG image visualizes the entire dynamical characteristics of vocal fold vibrations along the full vocal fold length over time [5-7].

D. PVG feature extraction
To objectify the criteria underlying the clinical diagnosis of voice disorders using HS videos, the resulting PVG data matrix was subsequently analyzed to extract a set of numerical features describing the contained laryngeal movement information. As VF vibrations exhibit periodically recurring movement patterns (see Fig. 1), it is appropriate to take these individual oscillation episodes, comprising distinct opening and closing phases, as a starting point for further analysis.

Figure 2. Extraction of characteristic contour lines for all cycles allows the mean spatio-temporal vibration pattern of both vocal folds to be described.

The dynamic opening and closing properties of the VFs are characterized through the geometric shapes within each PVG cycle, which can be seen in [cross-reference missing in source]. Here, according to the ELS classification given in [cross-reference missing in source], the depicted movement patterns exhibit a dorsal triangular shape. This relevant shape information can be represented using so-called contour lines.
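To make the PVG construction of Section C concrete, the following minimal Python sketch stacks per-frame edge distances into a PVG-style matrix and scales it to pixel intensities. It assumes that segmentation has already produced, for every frame, the distances of the left and right vocal fold edges from the glottal axis, sampled from the posterior ending P to the anterior ending. The function names `build_pvg` and `to_intensity`, the list-of-lists data layout, and the placement of the left fold in the upper half of the image are assumptions made for illustration only; they are not taken from the original implementation.

```python
def build_pvg(left_dist, right_dist):
    """Arrange per-frame edge distances as a PVG-style matrix (hypothetical helper).

    left_dist, right_dist: one sequence per frame, each holding the distances
    of the fold edge from the glottal axis, ordered from the posterior
    ending P (index 0) to the anterior ending.
    Returns a list of rows: rows are positions along the folds, columns are frames.
    """
    if len(left_dist) != len(right_dist):
        raise ValueError("left and right must contain the same number of frames")
    columns = []
    for left, right in zip(left_dist, right_dist):
        if len(left) != len(right):
            raise ValueError("left and right must be sampled at the same points")
        # Turn the left fold around P, so that P of both folds meets in the
        # middle of the column; absolute values of the distances are stored.
        column = [abs(d) for d in reversed(left)] + [abs(d) for d in right]
        columns.append(column)
    # Transpose: one column per frame, consecutive frames side by side.
    return [list(row) for row in zip(*columns)]


def to_intensity(pvg):
    """Scale distances to 0..255; 0 (black) means no distance from the midline."""
    peak = max((v for row in pvg for v in row), default=0)
    if peak == 0:
        return [[0 for _ in row] for row in pvg]
    return [[round(255 * v / peak) for v in row] for row in pvg]


# Two frames, three sample points per fold (made-up numbers).
left = [[1, 2, 3], [4, 5, 6]]
right = [[-7, 8, 9], [10, 11, 12]]
pvg = build_pvg(left, right)
image = to_intensity(pvg)
```

In this layout the two middle rows of `pvg` correspond to the posterior ending P of the left and right fold, which matches the mapping of P to the horizontal centre line described above.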
In this manner these PVG contour lines serve as numerical features to describe the oscillation patterns of both vocal folds individually.

E. SVM Classification
The derived feature sets were subsequently taken as a basis for inductively building models of normal and pathological VF vibrations, which can potentially support clinical decision making. For this purpose each patient's VF oscillation patterns were represented by numerical feature vectors computed from the corresponding PVG. By attaching the underlying clinical diagnoses to this representative set of feature vectors, a training set was obtained which was then analyzed with a support vector machine (SVM) [8].
In this study the following clinically relevant classification tasks were examined; for each classification task a balanced group size was used.
• Healthy versus Paresis (all)
• Healthy versus Paresis (Left / Right)
• Paresis Left versus Paresis Right
• Healthy versus functional dysphonia (all)
• Healthy versus Hyper/Hypo functional dysphonia
• Hypo versus Hyper functional dysphonia

III. RESULTS AND DISCUSSION

The results of classifying the different types of dysphonia are given in Fig. 3 and Fig. 4. In case of functional voice disorders a mean classification accuracy of 78.5% was reached, while in vocal fold paralysis an average classification accuracy of 93% could be obtained.

Figure 3. Classification accuracy of classifying vocal fold paralysis using PVG features.

While for both clinical pictures (paresis, functional dysphonia) the differentiation of healthy versus pathological cases achieved a relatively high classification performance, the differentiation within the pathological groups does not show such high classification accuracy.

In case of left/right paresis it can be concluded that the relation between the vibrations of the left and right VF side is quite balanced; neither of the two VFs can be characterized as being functionally inferior to the other. Hence, for these particular pathological cases the common diagnostic criterion of lateral vibration dissimilarity does not hold; by all appearances they rather resemble cases from the healthy control group. Thus, an automatic distinction between these pathological cases is quite difficult.

Figure 4. Classification accuracy of classifying functional dysphonia using PVG features.

In case of functional dysphonia, the subsumption of the two pathological (hyper/hypo) disorders into one pathology class has been shown to be obstructive in terms of differentiating between healthy and pathological vocal fold movement. Therefore, it can be stated that it is beneficial to analyze and model hyper- and hypofunctional diagnoses individually, as the corresponding classification results are better than in the class pooling approach.

Overall, the classification results show that the PVG features hold considerable promise for automatically supporting the diagnosis of voice pathologies in non-organic cases such as paresis and muscle tension dysphonia. In future, further clinical studies with a higher number of subjects will be conducted to improve the training step of the applied machine learning approaches.

ACKNOWLEDGMENT
This work was supported by the German Research Association (DFG), project number LO 1413/2-1.

REFERENCES
[1] PH Dejonckere, P Bradley, P Clemente, G Cornut, L Crevier-Buchman, G Friedrich, P Van De Heyning, M Remacle, V Woisard, and Committee on Phoniatrics of the European Laryngological Society (ELS). A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Guideline elaborated by the
Committee on Phoniatrics of the European Laryngological Society (ELS). Eur Arch Otorhinolaryngol, 258(2):77-82, Feb 2001.
[2] CA Rosen and T Murry. Nomenclature of voice disorders and vocal pathology. Otolaryngol Clin North Am, 33(5):1035-1046, Oct 2000.
[3] J. Lohscheller, U. Eysholdt, H. Toy, and M. Doellinger, “Phonovibrography: mapping high-speed movies of vocal fold vibrations into 2-d diagrams for visualizing and analyzing the underlying laryngeal dynamics,” IEEE Trans Med Imaging, vol. 27, pp. 300-309, 2008.
[4] J. Lohscheller, H. Toy, F. Rosanowski, U. Eysholdt, and M. Doellinger, “Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos,” Med Image Anal, vol. 11, pp. 400-413, 2007.
[5] Q. Qiu, H. K. Schutte, L. Gu, and Q. Yu, “An automatic method to quantify the vibration properties of human vocal folds via videokymography,” Folia Phoniatr Logop, vol. 55, pp. 128-136, 2003.
[6] Y. Yan, K. Ahmad, M. Kunduk, and D. Bless, “Analysis of vocal-fold vibrations from high-speed laryngeal images using a Hilbert transform-based methodology,” J Voice, vol. 19, pp. 161-175, 2005.
[7] J. J. Jiang, S. Tang, M. Dalal, C. H. Wu, and D. G. Hanson, “Integrated analyzer and classifier of glottographic signals,” IEEE Trans Rehabil Eng, vol. 6, pp. 227-234, 1998.
[8] C. W. Hsu, C. C. Chang, and C. J. Lin, A practical guide to support vector classification, Technical report, Department of Computer Science and Information Engineering, National Taiwan University, 2003.
