ARTICLE IN PRESS

Validation of Acoustic Voice Quality Index Version 3.01 and Acoustic Breathiness Index in Korean Population
*Geun-Hyo Kim, †Ben Barsties von Latoszek, and *Yeon-Woo Lee, *Busan, South Korea, and †Antwerp, Belgium

Summary: Objectives. This study aimed to verify the Acoustic Voice Quality Index (AVQI) version 3.01 and
the Acoustic Breathiness Index (ABI) as tools for acoustic analyses in the Korean language.
Methods. Concatenated voice samples of sustained vowels (SV) and continuous speech (CS) were collected
from 151 subjects with dysphonia and 71 vocally healthy subjects. The overall voice disorder severity (grade [G]
and overall severity [OS]) and breathiness severity (B) were subjected to an auditory-perceptual rating by three
raters. First, we equalized the proportions of SV and CS with respect to the time lengths of the voice samples to
improve the ecological validity. We then validated the AVQI and ABI in the Korean language, using our most
recent dataset of 1,667 voice samples. Second, we compared the results of the acoustic analyses between the
vocally healthy controls and the dysphonia groups. Third, we confirmed the concurrent validity and diagnostic
accuracy using the Spearman rank-order correlation coefficient (rs) and various statistical methods (receiver oper-
ating characteristic curve, pairwise comparison, and likelihood ratio [LR] analyses).
Results. We observed strong inter-rater reliability for G, B, and OS. Moreover, we identified 26 standardized
syllables in the CS samples (equivalent to 3 seconds of voiced segments), which allowed the equalization of both voice tasks.
A comparison of the two voice groups revealed statistically significant differences in the AVQI, ABI, G, B, and
OS (all P < 0.01). Moreover, we identified strong correlations of the AVQI with G (rs > 0.88, P < 0.01) and OS
(rs > 0.84, P < 0.01) and of the ABI with B (rs > 0.87, P < 0.01). Finally, we identified optimal cutoffs of 3.154 (sensitivity:
90%, specificity: 89%, LR+: 8.45, and LR-: 0.12) for the AVQI and 3.685 (sensitivity: 88%, specificity: 86%, LR+: 6.47, and
LR-: 0.14) for the ABI.
Conclusion. As per our results, in a sample of Korean speakers, the AVQI and ABI exhibited strong concurrent
validity for the quantification of dysphonia severity with respect to OS and B. We consider that analyses based on
the AVQI and ABI will enable the discrimination and assessment of dysphonia in clinical practice.
Key Words: Acoustic voice quality index−Acoustic breathiness index−Dysphonia−Auditory-perceptual rat-
ings.

Accepted for publication October 10, 2019.
Declarations of interest: none.
Financial disclosure: none.
From the *Department of Otorhinolaryngology-Head and Neck Surgery and Biomedical Research Institute, Pusan National University Hospital, Busan, South Korea; and the †Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium.
Address correspondence and reprint requests to Yeon-Woo Lee, Department of Otorhinolaryngology-Head and Neck Surgery and Biomedical Research Institute, Pusan National University Hospital, 179 Gudeok-ro, Seo-gu, Busan, South Korea. E-mail: ahaha1216@gmail.com
Journal of Voice, Vol. &&, No. &&, pp. &&−&&
0892-1997
© 2019 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
https://doi.org/10.1016/j.jvoice.2019.10.005

INTRODUCTION
Both clinical practice and voice research regard voice as a multidimensional entity. Vocal quality is considered an important factor in the evaluation of voice anomalies, such as roughness and hoarseness, in patients with dysphonia.1 Various types of voice evaluations have been used to understand and confirm this multidimensional nature. Generally, voice quality is assessed using acoustic, aerodynamic, and video-endoscopic (visual) evaluations; auditory-perceptual (A-P) ratings; and self-rated questionnaires.2 Standardized A-P measures, such as the Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS) scale and the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), are considered the reference standards for rating the severity of voice disorders.3,4 However, these measures are subjective, with limited validity and intra- and inter-rater reliability.

Acoustic methods, such as two recently introduced acoustic indexes with sufficient validity and reliability in voice evaluations, could potentially overcome the limitations of A-P ratings. First, the Acoustic Voice Quality Index (AVQI) is a multiparametric index that measures the overall voice quality using six acoustic parameters.5 The AVQI is calculated from a sustained vowel (SV; e.g., [a:]) and continuous speech (CS), which are combined for the index measurement. The inclusion of both the SV and CS increases the ecological validity of the AVQI by enabling a better representation of the voice quality used in everyday communication. For an AVQI measurement, the SV and CS samples are concatenated, and a single score is calculated from the concatenated samples using the PRAAT script.6

Several studies have reported a relatively strong relationship between the AVQI and A-P ratings of overall voice quality in various spoken languages. The regression equation used to calculate the AVQI has been developed through three versions. The first version measured variables using two programs, PRAAT and SpeechTool,5 whereas only PRAAT was used to measure all variables in the second version.7 The third version of the AVQI (AVQIv3) was designed such that the voice quality was analyzed after ensuring equal durations of the SV and CS.8 The use of PRAAT to customize the AVQI has increased the validity of the latter. Moreover, unlike the
Dysphonia Severity Index (DSI), which comprises the parameters of F0-high, intensity-low, jitter, and maximum phonation time (MPT),9 the AVQI is not affected by age or gender and is sensitive to changes in vocal recovery and damage. The DSI includes F0-high and MPT, both of which are affected by gender and age. As the values of these variables increase, the DSI value also increases. There is a significant difference in F0-high between males and females. In addition, relatively low DSI values can be measured, especially in children and the elderly, owing to a lack of breathing support (short MPT). These attributes result in low DSI values even when the voice quality is not poor. Because gender and age are not considered in the regression formula for the DSI, the DSI value may vary depending on gender and age, regardless of the voice quality.

The second acoustic method, the Acoustic Breathiness Index (ABI), is also used to analyze concatenated voice samples of SV and CS and thus evaluate the severity of breathiness,10 a parameter characterized by turbulent noise. Breathiness is caused by the incomplete closure of the vocal folds during phonation and is a prominent sign of laryngeal diseases associated with glottal gaps, such as vocal fold nodules, polyps, and vocal cord paralysis caused by recurrent laryngeal nerve injury. This quality, defined elsewhere as “audible unintentional air leakage in the voice,”4 does not occur independently of other characteristics such as asthenia or roughness1 and is often attributed to increased airflow.11 Like the AVQI, the ABI is a multiparameter model; it is composed of nine acoustic variables. The ABI and AVQIv3 are similar with respect to the length of the voice samples and rating systems used during analyses.12-18 According to initial studies, the ABI yielded acceptable concurrent validity and predictive power in the Dutch,5 Spanish,13 German,16 and Japanese languages.18 Although the ABI has been verified in other languages, it must still be validated in Korean. For example, Korean is a syllable-timed language with insignificant stress on individual words, whereas English is a stress-timed language.19 In other words, the rhythm and intonation of Korean are based on each syllable, whereas English derives these from the distribution of stressed and unstressed syllables. Moreover, the two languages differ in the number of consonants and vowels. The Korean alphabet has 24 letters (14 consonants and 10 vowels), whereas the English alphabet has 26 letters (21 consonants and 5 vowels).20 In addition, English sounds such as /f/, /v/, /th/, and /z/ are not present in the Korean inventory, which has led to the substitution of these sounds with the most similar ones.21 Sound composition in different languages can have different effects on acoustic analysis. Therefore, it is necessary to validate the standard tests in each language.

In this study, we aimed to verify both the AVQI and ABI in a sample of Korean speakers and to determine the ability of both indexes to discriminate vocal pathology. We have verified diverse multivariate models that can quantify dysphonia and help us discriminate voice disorders. This study included the equalization of SV and CS lengths to improve the ecological validity, automatic analysis of concatenated voice samples using the PRAAT script, large data sets, a rigorous selection process, analysis of intra- and inter-rater reliability, and various validation studies.

The specific questions addressed in this study were as follows:

1) What is the number of standardized syllables that corresponds to 3 seconds of the vowel /a/ in Korean?
2) Are there significant differences between vocally healthy and pathological voice groups in acoustic variables (AVQI and ABI) and A-P ratings (G, B, and OS)?
3) What are the correlations between acoustic variables and A-P ratings?
4) What are the optimal cutoff values and diagnostic predictive powers that can discriminate normal and pathological voices?

MATERIALS AND METHODS
Subjects
This retrospective study included all voice samples obtained from March 1, 2015 to February 28, 2019. The voice samples were recorded during routine voice assessments in a voice clinic prior to medical intervention or behavioral therapy. Subjects with voice disorders were recruited to participate in the present study at Pusan National University Hospital in Korea. The Institutional Review Board of Pusan National University Hospital approved this study [H1904-011-078]. The initial study population comprised voice samples from 253 subjects, of which 31 were excluded following a signal to noise ratio (SNR) analysis. The 222 subjects remaining after this exclusion were randomly selected, and the severity of their voice disorders was evaluated by three speech-language pathologists (SLPs). We included subjects with organic, functional, and neurological voice disorders who exhibited dysphonia at varying levels of severity. The subjects included 112 men and 110 women with a mean age (± standard deviation [SD]) of 54.1 ± 14.6 years (range: 16−87 years). Medical diagnoses were determined based on chart reviews, patient interviews, laryngeal videoendoscopy, and stroboscopy examinations.

The core aim of the study was to validate the ability of the AVQI and the ABI in the Korean language to discriminate between vocally healthy subjects and patients with dysphonia. The vocally healthy group comprised subjects with no pathological laryngeal lesions or voice disorders and a grade of 0 on the GRBAS scale. In addition, these subjects had no hearing loss, communication problems, or neurological problems (brain infarction, cerebral hemorrhage, Parkinson's disease, etc.). The vocally healthy group was selected according to the above criteria from those who visited the ENT clinic for a medical check-up. The absence of laryngeal disease was confirmed through several evaluations, such as laryngoscopy, patient reports, and chart review.
Study design
Our research design modified the method presented in a previous study.12 Using the initial 222 samples (phase 1), we identified the number of syllables corresponding to a 3-second duration of the vowel /a/, and based on this result, intra-/inter-rater reliabilities were confirmed for the A-P ratings (phase 2a). Subsequently, a large number of voice samples (1,445 samples) were added for acoustic analysis (AVQI and ABI). For the correlations between acoustic measurements and A-P ratings, A-P ratings were performed on all voice samples (1,667 samples), and the ratings of the three raters were presented as averaged scores. A total of 1,667 voice samples were automatically analyzed for AVQI and ABI using the PRAAT script, and the results were saved as text files (phase 2b).

Voice samples
We assessed the voice samples according to the instructions of the European Laryngological Society.22 We recorded all voice samples using a cardioid AKG Perception 220 microphone (AKG Acoustics, Vienna, Austria) situated at a fixed distance of 10 cm from the lips and an angle of 45° from the front of the mouth. The recordings were saved as WAV format files with a 44.1-kHz sampling rate and 16-bit quantization using the Computerized Speech Lab (model 4500, KayPENTAX, Lincoln Park, NJ). The subjects were instructed to sustain the vowel /a/ for more than 3 seconds and read a phonetically balanced Korean text, “Walk,” with 70 syllables (/nopʰɯn sʰane ollaga malgɯn goEgirɯl maɕimyʌ sʰorirɯl tsirɯmyʌn kasʰɯmi hwalts̕ aE ɲyʌʎʎinɯn tɯtʰada/ and /nʌlk̕ e pʰyʌltsʰʌinnɯn padarɯl parabomyʌn nɛ maɯm yʌk̚ ɕ̕ i nʌlbʌdzinɯn gʌt̚ kat̚t̕ a/) at a comfortable pitch and loudness.

Signal to noise ratio (SNR)
We measured the SNR to ensure the reliability of the sound recording. Previous studies have shown that stability of the recording is guaranteed at a minimum SNR of 30 dB, although a value >42 dB is preferred.23,24 The SNRs of the recordings ranged from 19.8 to 57.2 dB, with a mean value of 39.5 dB. The records of 31 subjects (from the pathological voice group) with SNRs <30 dB were excluded from this study.

Phase 1: Standardized syllable number in the CS segments
To improve the ecological validity, the ratio of the two voice processing segments (3-second SV and CS) must be balanced. The number of standardized syllables (SSN) in the CS segments of the Korean language samples was measured as proposed by Barsties and Maryn,25 who recommended the following protocol for standardizing the CS duration. First, we extracted all voiceless components of the whole Korean sentence using the PRAAT script.6 Second, we identified a hand-marked (HM) custom cutoff point in the original text without removing the unvoiced interval corresponding to the first 3 seconds of the 222 voice samples containing only voiced segments. An acceptable value for the SSN requires no statistically significant differences between (1) the HM syllables and SSN and (2) the AVQI and ABI values.

Phase 2a: Auditory-perceptual assessments
During the A-P assessments, the voice quality was rated by a panel of three SLPs with 7−25 years of specialized professional experience in voice evaluation, therapy, and speech science. The A-P ratings were based on concatenated samples comprising the 3-second mid-portion of the SV and the CS (i.e., the final syllable number determined in Phase 1). The GRBAS scale and CAPE-V were used for the A-P ratings. Each rater scored the overall voice quality, represented by the grade (G) variable of the GRBAS scale and the overall severity (OS) variable of the CAPE-V, as well as the breathiness severity (B variable). The B variable corresponds to the degree of audible or excessive emission of breath. The G and B variables were scored using an ordinal four-point equalized interval scale, whereas the OS variable was scored using a 100-mm visual analog scale with anchoring points. In many previous studies, A-P ratings were conducted mainly using the G and OS scales. In this study, we used the B of the GRBAS scale to validate the ABI in Korean speakers according to the method suggested in preceding studies.10,18 We did not evaluate the CAPE-V subitem for breathiness (B) because we first sought to validate the ABI against the more representative OS variable before deciding whether the B subitem would be appropriate.

The A-P ratings were conducted in a quiet, noise-controlled room, and all judgments were completed in a single session. All raters were allowed to rate each voice sample as many times as necessary before completing the final rating. Completion times of the 222 sample ratings were not measured. To minimize factors such as fatigue and reduced concentration due to repeated A-P ratings, raters took short breaks after every 50 voice samples. The recordings of 110 randomly selected voice samples (≈50% of the initial total samples) were evaluated a second time after a 2-week interval to assess the intra-rater reliability. Moreover, the A-P ratings produced by the three raters were compared to measure inter-rater reliability. A Google survey for the GRBAS variables G and B and a customized computer program for the OS variable of the digital CAPE-V were used to facilitate the saving of data files and analyses and enabled the automatic saving of the A-P scores of the voice samples. The OS rating program was designed for our research purposes and is not a commercialized program. We already introduced it in our previous study.26 This software will be further developed and shared.

Phase 2b: Acoustic measures
Both the AVQI and ABI were applied to the concatenated files comprising the 3-second SV and extracted CS voiced
segment. An additional 1,445 subjects (758 male and 687 female) were included to enhance the validity of the acoustic analyses (AVQI and ABI); all corresponding samples were assessed using the PRAAT script, and the results were saved as text files.6,10 The subjects' data are summarized in Table 1. The following acoustic variables were measured. First, the six parameters of the AVQI were determined: (1) smoothed cepstral peak prominence (CPPS), (2) harmonics to noise ratio (HNR), (3) shimmer local (SL), (4) shimmer local dB (SL-dB), (5) general slope of the spectrum (Slope), and (6) trend line of slope (Tilt). Second, the nine parameters of the ABI were determined: (1) CPPS, (2) jitter local (Jit), (3) glottal to noise excitation ratio at 4,500 Hz (GNEmax-4,500Hz), (4) relative level of high-frequency noise between the energy from 0 to 6,000 Hz and the energy from 6,000 to 10,000 Hz (Hfno-6,000Hz), (5) HNR of Dejonckere (HNR-D), (6) difference between the amplitudes of H1 and H2 in the spectrum (H1-H2), (7) SL, (8) SL-dB, and (9) standard deviation of the period (PSD).

In this study, we used the following equation, Eq. (1), to calculate the AVQIv3:8

\[
\mathrm{AVQI} = \bigl(4.152 - [0.177 \times \text{CPPS}] - [0.006 \times \text{HNR}] - [0.037 \times \text{SL}] + [0.941 \times \text{SL-dB}] + [0.01 \times \text{Slope}] + [0.093 \times \text{Tilt}]\bigr) \times 2.8902 \tag{1}
\]

We used the next equation, Eq. (2), to calculate the ABI:10

\[
\begin{aligned}
\mathrm{ABI} = \bigl(&5.0447730915 - [0.172 \times \text{CPPS}] - [0.193 \times \text{Jit}] - [1.283 \times \text{GNEmax-4,500Hz}] \\
&- [0.396 \times \text{Hfno-6,000Hz}] + [0.01 \times \text{HNR-D}] + [0.017 \times \text{H1-H2}] + [1.473 \times \text{SL-dB}] \\
&- [0.088 \times \text{SL}] - [68.295 \times \text{PSD}]\bigr) \times 2.9257400394
\end{aligned} \tag{2}
\]

TABLE 1.
The Primary Laryngological Diagnoses Included in the Present Study

Laryngeal status    Age (Mean ± SD)    Absolute number    % of total
Vocally healthy     51.8 ± 13.0        197                11.8%
Nodules             55.9 ± 15.7        250                15.0%
Sulcus                                 190                11.4%
Cancer                                 186                11.2%
Presby                                 177                10.6%
SD                                     117                7.0%
Palsy                                  112                6.7%
Polyp                                  79                 4.7%
PTC                                    75                 4.5%
LPRD                                   66                 4.0%
Papilloma                              59                 3.5%
Leuko                                  49                 2.9%
Edema                                  47                 2.8%
Cyst                                   46                 2.8%
FD                                     17                 1.0%
Total                                  1,667              100.0%

Abbreviations: Cancer, glottic cancer; Cyst, vocal cyst; FD, functional dysphonia; Leuko, leukoplakia; Edema, laryngeal edema; LPRD, laryngopharyngeal reflux disease; Nodules, vocal nodules; Palsy, vocal cord palsy; Papilloma, laryngeal papillomatosis; Polyp, vocal polyp; Presby, presbylaryngis; PTC, papillary thyroid cancer; SD, spasmodic dysphonia; Sulcus, sulcus vocalis.

Statistics
The R platform, version 3.5.2 (The R Foundation for Statistical Computing, Vienna, Austria) and R Studio 1.1.463 (R Studio Inc., Boston, MA) were used to calculate all statistical variables except the inter- and intra-rater reliabilities. The latter variables were calculated using SPSS for Windows, version 20.0 (IBM Corp, Armonk, NY). A P value of <0.05 was considered statistically significant.

First, we identified and set the SSN of the CS segment using rounded integers corresponding to the 95% confidence interval lower bound of the HM syllable number, as described by Barsties and Maryn.25 The Wilcoxon signed-rank test was used to compare the AVQI and ABI values between the cutoff point of the CS portion of the SSN and the number of HM syllables.

Second, to verify Phase 2a, we used the intraclass correlation coefficient (ICC) to calculate the inter-rater reliability in G, B, and OS. Previous studies have reported that the GRBAS scale is an ordinal scale and that ICCs are suitable for measuring the intra-/inter-rater reliability of ordinal scales with two or more raters.27-32 The intra-rater reliability of the three raters for the 110 voice samples was measured. The scores of the three SLPs were averaged, and the results were expressed as mean G, B, and OS values for the 222 subjects. The ICC was calculated using a two-way mixed, consistency, average-measures model (P < 0.05).33

Third, we used an independent t test to evaluate the differences in acoustic analyses (AVQI and ABI) and A-P ratings (G, B, and OS) between the vocally healthy controls and dysphonia groups (P < 0.05).

Fourth, the concurrent validity of the AVQI and ABI was determined using the correlation coefficient (r) and coefficient of determination (r²). The Spearman rank-order correlation coefficients (rs) were measured between G, OS, and AVQI values and between B and ABI values (P < 0.01). The correlation coefficient (r) is classified as follows: r ≥ 0.9, very high; 0.7 ≤ r < 0.9, high; 0.5 ≤ r < 0.7, moderate; 0.3 ≤ r < 0.5, low; and r < 0.3, negligible.34

Fifth, we performed a receiver operating characteristic (ROC) curve analysis to calculate the cutoff values and diagnostic predictive accuracy (i.e., area under the curve [AUC]) of the AVQI and ABI for discriminating between the vocally healthy controls and pathological voice group. The ROC curve analysis was also used to calculate the sensitivity (Sens.), specificity (Spec.), AUC, and likelihood ratio (positive/negative result; LR+/LR-) of each acoustic measure (AVQI or ABI) for discriminating pathological voice. Finally, we applied a pairwise comparison to the ROC curves and thus compared the AUCs of variables contributing to variance in the index parameters.35 Discriminative ability is classified as follows: AUC ≥ 0.9, excellent; 0.8 ≤ AUC < 0.9, good; 0.7 ≤ AUC < 0.8, fair; and AUC < 0.7, poor.36

TABLE 3.
Intra-Rater Reliability of the Three SLPs Who Evaluated G, B, and OS in A-P Ratings

ICC    Rater 1    Rater 2    Rater 3
G      0.949**    0.909**    0.910**
B      0.915**    0.892**    0.881**
OS     0.873**    0.904**    0.901**

** P < 0.01.
Abbreviations: B, breathiness; G, grade; ICC, intraclass correlation coefficient; OS, overall severity.
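
As a worked illustration of Eq. (1) and Eq. (2) from the Phase 2b section, the following Python sketch evaluates both indexes from already-measured acoustic parameters. It is not the PRAAT script used in the study. The signs follow the published AVQI v3.01 and ABI formulas (refs. 8 and 10), the function and argument names are hypothetical, the example input values are invented, and the tie-handling rule in `above_cutoff` is an assumption, since the text only reports the cutoff values.

```python
"""Sketch: evaluate Eq. (1) (AVQI v3.01) and Eq. (2) (ABI) from measured parameters."""

# Cutoffs reported in this study for Korean speakers.
AVQI_CUTOFF = 3.154
ABI_CUTOFF = 3.685


def avqi(cpps, hnr, sl, sl_db, slope, tilt):
    """Eq. (1): weighted sum of the six AVQI parameters, scaled by 2.8902."""
    return (4.152 - 0.177 * cpps - 0.006 * hnr - 0.037 * sl
            + 0.941 * sl_db + 0.01 * slope + 0.093 * tilt) * 2.8902


def abi(cpps, jit, gne_max, hfno, hnr_d, h1_h2, sl_db, sl, psd):
    """Eq. (2): weighted sum of the nine ABI parameters, scaled by 2.9257400394."""
    return (5.0447730915 - 0.172 * cpps - 0.193 * jit - 1.283 * gne_max
            - 0.396 * hfno + 0.01 * hnr_d + 0.017 * h1_h2
            + 1.473 * sl_db - 0.088 * sl - 68.295 * psd) * 2.9257400394


def above_cutoff(score, cutoff):
    """Assumption: a score strictly above the cutoff is flagged as dysphonic."""
    return score > cutoff


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    score = avqi(cpps=14.0, hnr=20.0, sl=3.0, sl_db=0.3, slope=-25.0, tilt=-10.0)
    print(f"AVQI = {score:.2f}, above cutoff: {above_cutoff(score, AVQI_CUTOFF)}")
```

Because CPPS enters both equations with a negative weight, a higher CPPS lowers the index, which agrees with the convention that larger AVQI and ABI values indicate poorer voice quality.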

RESULTS
Standardized syllable number for the CS segment
In the CS segment, 10−68 syllables were required to achieve a 3-second duration. We confirmed an SSN of 26 syllables as the lower 95% confidence boundary. We did not observe any significant difference in the duration of the HM syllable selection and the SSN selection of 26 syllables (z = 1.060, P = 0.289) (see Table 2). Furthermore, comparisons of the AVQI and ABI values calculated using the HM syllable and SSN with 26 syllables did not yield statistically significant differences (AVQI: z = 1.372, P = 0.170; ABI: z = 1.043, P = 0.297) (see Table 2). For further analyses, the CS segment was edited based on the SSN of 26.

Reliability of auditory-perceptual judgments
Tables 3 and 4 summarize the intra- and inter-rater reliabilities of the A-P ratings produced by the three raters. The intra-rater reliabilities for the G, B, and OS variables ranged from 0.909 to 0.949, from 0.881 to 0.915, and from 0.873 to 0.904, respectively. The inter-rater reliabilities ranged from moderate (ICC = 0.777 between raters 1 and 3 for G) to high (ICC = 0.879 between raters 1 and 2 for G), with a mean ICC of 0.833. The multiple correlation coefficients indicated strong correlations among the three A-P parameters (G: 0.936, B: 0.930, and OS: 0.933). Taken together, these statistics indicate statistically significant agreement in the A-P ratings among the three SLPs. The general variability among the three raters was considered acceptable for the purpose of this study.

Comparison of ABI and AVQI between two groups
Figure 1 presents the mean values measured for the acoustic variables, while Table 5 lists the average values of acoustic analyses and the A-P ratings according to groups. The results demonstrate statistically significant differences for all variables (P < 0.01). Here, the vocally healthy group exhibited significantly lower values of AVQI, ABI, G, B, and OS, whereas the dysphonia group exhibited relatively high values.

Correlation between acoustic analyses (ABI and AVQI) and A-P ratings (G, B, and OS)
First, we observed strong correlations of the AVQI with G (rs > 0.88, P < 0.01) and OS (rs > 0.84, P < 0.01). Second, we observed a strong correlation of the ABI with B (rs > 0.87, P < 0.01), indicating a high level of concurrent validity. Third, the coefficients of determination (r²) of 0.77, 0.71, and 0.76 indicated that 77%, 71%, and 76% of the variances in G, OS, and B, respectively, could be attributed to the AVQI and ABI.

ROC analysis of diagnostic accuracy
The AUC of the AVQI was 0.955, suggesting an excellent predictive power for discriminating between vocally healthy controls and dysphonic voice. In the Korean population, an AVQI value of 3.154 was identified as the best cutoff point between the vocally healthy controls and dysphonia groups. This value provided the best balance between Sens. and Spec., with excellent values of 90% and 89%, respectively. In addition, the LR values at this point were acceptable (LR+: 8.45 and LR-: 0.12) (Figure 2).

The AUC of the ABI was 0.948, which also suggested a strong predictive power for discriminating vocally healthy controls from dysphonic voice. Here, an ABI value of 3.685 was identified as the optimal cutoff for distinguishing

TABLE 2.
Descriptive Outcomes Between the Hand-Marked Selection Number and Standardized Selection Number of 26 Syllables of Time, AVQI, and ABI Values

                                          Time (in seconds)     AVQI                  ABI
                                          Mean      SD          Mean      SD          Mean      SD
Hand-marked selection                     3.0014    0.0745      3.1206    1.4453      3.6689    1.4842
Standardized selection of 26 syllables    2.9974    0.1140      2.9801    1.5932      3.6042    1.5159

Abbreviations: ABI, acoustic breathiness index; AVQI, acoustic voice quality index.
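
The standardized selection compared in Table 2 rests on the rule stated in the Statistics section: the SSN is the rounded lower bound of the 95% confidence interval of the HM syllable number. A minimal Python sketch of that rule is given below. It assumes the interval is a normal-approximation interval for the mean across speakers, which the text does not spell out; the function name and the example counts are hypothetical and are not study data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def standardized_syllable_number(hm_counts, confidence=0.95):
    """Rounded lower bound of the CI for the mean hand-marked syllable count.

    Assumption: normal-approximation interval (z quantile rather than t),
    which is reasonable only for large samples.
    """
    n = len(hm_counts)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    lower = mean(hm_counts) - z * stdev(hm_counts) / sqrt(n)
    return round(lower)


if __name__ == "__main__":
    # Hypothetical per-speaker syllable counts needed to reach 3 s of voiced speech.
    example_counts = [26, 28, 30, 32, 34]
    print(standardized_syllable_number(example_counts))
```

Taking the lower bound rather than the mean keeps the CS portion from exceeding the 3-second SV portion for most speakers, which matches the stated goal of balancing the two voice tasks.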
TABLE 4.
Inter-Rater Reliability of the Three SLPs Who Judged G, B, and OS in A-P Ratings

           Rater 2                              Rater 3
           ICC on G   ICC on B   ICC on OS      ICC on G   ICC on B   ICC on OS
Rater 1    0.879**    0.791**    0.807**        0.777**    0.813**    0.832**
Rater 2                                         0.867**    0.872**    0.859**

** P < 0.01.
Abbreviations: B, breathiness; G, grade; ICC, intraclass correlation coefficient; OS, overall severity.

FIGURE 1. Comparison of acoustic variables (AVQI and ABI) between normal and pathological voice groups (A: AVQI, B: ABI).

breathy voice, with significant Sens. and Spec. values of 88% and 86%, respectively. The LR values at this point were excellent (LR+: 6.47 and LR-: 0.14). Finally, a pairwise comparison of the ROC curves revealed no significant difference between the AVQI and ABI (z = 1.353, P = 0.1759) (Table 6; Figure 3).

DISCUSSION
In this study, we validated the AVQI and ABI tests with a dataset of voice samples recorded from speakers of Korean language and investigated the predictive power of these measures to discriminate dysphonia. Specifically, we evaluated the severity of various voice disorders in our dataset and identified significant differences between vocally healthy controls and those with pathological voice disorders. In particular, we identified strong correlations among both the AVQI and ABI tests with the A-P variables and calculated the cut-off values that most accurately discriminate between the two voice groups.

First, we confirmed the number of syllables required to balance the influences of SV and CS in the Korean language

TABLE 5.
Independent t Test Results for Comparison Between Normal and Pathological Groups in Acoustic Analyses and A-P Ratings

                    Group              Value (Mean ± SD)    t           P
AVQI                Vocally healthy    2.12 ± 0.95          -26.0731    0.01**
                    Pathological       5.53 ± 1.81
ABI                 Vocally healthy    2.56 ± 1.06          28.022      0.01**
                    Pathological       5.64 ± 1.49
Grade               Vocally healthy    0.00 ± 0.00          31.849      0.01**
                    Pathological       1.49 ± 0.66
Breathiness         Vocally healthy    0.00 ± 0.00          26.121      0.01**
                    Pathological       1.35 ± 0.66
Overall severity    Vocally healthy    15.53 ± 7.16         37.217      0.01**
                    Pathological       47.37 ± 16.94

** P < 0.01.
Abbreviations: AVQI, acoustic voice quality index version 3.01; ABI, acoustic breathiness index.
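
The t statistics in Table 5 can be approximately re-derived from the tabulated means and SDs. The Python sketch below uses the pooled-variance (Student) form of the independent t test. Two points are assumptions rather than statements from the text: the pooled-variance form itself, and the group sizes of 197 vocally healthy and 1,470 pathological samples, which are inferred from Table 1 (1,667 minus 197). The function name is hypothetical, and because the inputs are rounded, the result only lands near the tabulated value.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent-samples t statistic from summary statistics."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Assumed group sizes (inferred from Table 1, not stated alongside Table 5).
N_HEALTHY, N_PATHOLOGICAL = 197, 1470

t_avqi = pooled_t(2.12, 0.95, N_HEALTHY, 5.53, 1.81, N_PATHOLOGICAL)
t_abi = pooled_t(2.56, 1.06, N_HEALTHY, 5.64, 1.49, N_PATHOLOGICAL)

if __name__ == "__main__":
    print(f"AVQI: t = {t_avqi:.2f}; ABI: t = {t_abi:.2f}")
```

The sign of t depends only on which group mean is subtracted from which, so a negative AVQI value and a positive ABI value in the same table describe differences in the same direction.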
FIGURE 2. Correlation between AVQI, ABI, and A-P ratings (G, OS, and B). A: AVQI and Grade; B: AVQI and Overall Severity; C: ABI and Breathiness.
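
The correlations shown in Figure 2 connect to the reported coefficients of determination by simple squaring, and to the verbal labels by the bands given in the Statistics section. The short Python sketch below makes both steps explicit. The function names are hypothetical, and the inputs are the lower bounds quoted in the Results (rs > 0.88, 0.84, and 0.87), so the squared values are likewise approximate.

```python
def coefficient_of_determination(r):
    """Share of variance explained, obtained by squaring the correlation."""
    return r * r


def classify_correlation(r):
    """Verbal band for |r|, using the thresholds quoted in the Statistics section."""
    magnitude = abs(r)
    if magnitude >= 0.9:
        return "very high"
    if magnitude >= 0.7:
        return "high"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.3:
        return "low"
    return "negligible"


if __name__ == "__main__":
    reported = {"AVQI vs G": 0.88, "AVQI vs OS": 0.84, "ABI vs B": 0.87}
    for pair, rs in reported.items():
        r2 = coefficient_of_determination(rs)
        print(f"{pair}: rs = {rs}, r2 = {r2:.2f}, band = {classify_correlation(rs)}")
```

Under these bands, all three reported correlations fall in the "high" range (0.7 ≤ r < 0.9).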

and then analyzed additional speech samples using the identified SSN of 26 syllables. This value is similar to the SSNs reported for the German16 (27 syllables), Japanese18 (30 syllables), Dutch25 (34 syllables), and Spanish13 (33 syllables) languages. This SSN of 26 syllables did not yield statistically significant differences with respect to the syllable length, AVQI, and ABI. Furthermore, the AVQI and ABI values obtained using the SSN of 26 were fairly consistent with those determined using individual HM syllables. Our results suggest that this SSN facilitates a highly reliable and balanced estimation of both voice tasks.

Second, a comparison of the acoustic analyses and A-P ratings between the vocally healthy controls and pathological voice groups revealed statistically significant differences in all variables. These differences were confirmed by quantifying normal and pathological voices based on multivariate models such as the AVQI and ABI. Although these two

TABLE 6.
Pairwise Comparison of ROC Curve

                            AVQI-ABI
Difference between areas    0.00682
Standard error              0.00504
95% Confidence interval     −0.00306 to 0.0167
Z statistics                1.353
Significance level          P = 0.1759

FIGURE 3. ROC analysis to discriminate normal and pathological voice groups
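
Two of the ROC-related quantities reported here follow from short formulas, sketched below in Python. The likelihood ratios use the standard definitions LR+ = Sens. / (1 − Spec.) and LR− = (1 − Sens.) / Spec.; the z statistic in Table 6 is the AUC difference divided by its standard error, with a two-sided P value from the standard normal distribution. The function names are hypothetical. Because the sensitivities and specificities are reported as rounded percentages, the likelihood ratios computed from them only approximate the published values.

```python
from statistics import NormalDist


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def auc_difference_test(difference, standard_error):
    """Return (z, two-sided P) for a difference between two AUCs."""
    z = difference / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(0.90, 0.89)        # AVQI, rounded inputs
    z, p = auc_difference_test(0.00682, 0.00504)          # Table 6 values
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, z = {z:.3f}, P = {p:.4f}")
```

The same z and standard error also give the 95% confidence interval in Table 6 as the difference plus or minus 1.96 standard errors, which is why the interval spans zero when P exceeds 0.05.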
indexes differed with respect to the included parameters, both the AVQI and ABI were designed such that larger values would indicate a poorer voice quality, thus reflecting the severity of the voice pathology. Consistent with the intended design, the vocally healthy group in our study exhibited lower AVQI and ABI values than those in the dysphonia group. Previous studies similarly reported higher AVQI and ABI values for subjects with voice disorders of various etiologies relative to vocally healthy subjects.5,10,16,18,37,38

Third, we identified statistically significant correlations between the acoustic indexes and various A-P ratings. Both the AVQI and ABI were strongly correlated with G, B, and OS (rs > 0.84, P < 0.01). In other words, both indexes appear to form strong correlations with the auditory perceptions of overall voice quality and breathiness severity in a Korean population. In previous studies, relatively high concurrent validity assessments of the A-P ratings with the AVQI (rs: 0.73–0.92) and ABI (rs: 0.83–0.89) were reported.5,10,12,14-17,37,39-42 These correlations between the acoustic and A-P evaluations confirm that the former can be considered multivariate models associated with the perceived severity of overall dysphonia and breathiness, consistent with previous acoustic studies of these vocal qualities.26,32,40,42-44 These acoustic evaluations differ from traditional analysis models, which were used to analyze SV or CS individually.45-48 In contrast, the AVQI and ABI are determined using a multistep process involving (1) concatenation of SV and CS segments from samples, (2) extraction of voiced segments only, and (3) calculation of models with multiple variables. In addition, our study included more than 1,600 voice samples from subjects with voice disorders of various types and severity levels, which were analyzed automatically using the PRAAT script. An important aspect of the PRAAT script was the nonvoiced extraction, which applied the algorithms presented in a previous study47; the script will need further development to apply more recently introduced nonvoiced extraction methods in PRAAT.49-51 Accordingly, our report presents a multivariate large-group analysis of the acoustic measurements corresponding to overall voice quality and breathiness.

Fourth, both the AVQI and ABI exhibited high levels of predictive power for discriminating between dysphonic and normal voices. ROC curve analysis of AVQI identified an AUC of 0.955, which was comparable to the values identified previously for the Dutch (AUC = 0.98 and 0.92), German (AUC = 0.90), Japanese (AUC = 0.92), and Spanish (AUC = 0.91) languages. Moreover, previous studies have identified optimal AVQI cutoff values of <2.43 (Sens. = 78.5%, Spec. = 93.2%), <1.85 (Sens. = 72%, Spec. = 90%), <2.06 (Sens. = 72.1%, Spec. = 93.8%), and <2.28 (Sens. = 74.8%, Spec. = 94.6%) for the Dutch,8,25 German,16 Japanese,12 and Spanish13 languages, respectively. In our study, an optimal AVQI cutoff value of <3.15 was determined, with a Sens. of 90% and Spec. of 89%. This AVQI cutoff value is relatively large, compared to those determined for other languages. The LR statistics for the AVQI were also acceptable (LR+: 8.45, LR-: 0.12) and identified AVQI as a meaningful index. Again, these results were comparable to those of previous studies of the Dutch (LR+: 19.9, LR-: 0.27)6, English (LR+: 10.25, LR-: 0.20)52, German16 (LR+: 7.4, LR-: 0.31), Japanese43 (LR+: 15.1, LR-: 0.29), and Spanish13 (LR+: 13.8, LR-: 0.27) languages.

Regarding breathiness severity, the ABI yielded an AUC of 0.948, indicating a good capacity to distinguish breathiness in the voice signal. This result is comparable to those of previous studies of the Dutch10 (AUC = 0.95), German16 (AUC = 0.91), Japanese18 (AUC = 0.89), and Spanish13 (AUC = 0.92) languages and thus confirms the high predictive ability of this index for the assessment of breathiness. In the Korean language, we identified an optimal cutoff of <3.69 for distinguishing between normal and breathy voices, with a Sens. of 88% and a Spec. of 86%. The LR statistics were also acceptable, but lower than those reported from previous studies of the Dutch10 (LR+: 11.6, LR-: 0.19), German16 (LR+: 15.6, LR-: 0.3), Japanese18 (LR+: 8.09, LR-: 0.25), and Spanish13 (LR+: 16.0, LR-: 0.27) languages. Both the AVQI and ABI showed high discriminative power, with no statistically significant difference between the two acoustic indexes. This means that both indexes are useful for discriminating normal voice from pathological voice. In this study, we conducted AVQI and ABI measurements for entire voice groups; further work is needed on groups with specific voice characteristics (e.g., predominantly breathy voices). A further study comparing groups with breathy and nonbreathy voices is expected to produce interesting results. In addition, we will study the latest voice extraction algorithm and apply it to PRAAT.

This study has the following limitation: The multivariate models used in this study are affected by breathy voice. In a previous pilot study, we found that these models have limited predictive ability to detect rough voice, strained voice, tremulous voice, etc. For the evaluation of various voice disorders, future studies should develop multivariate models that will detect not only the characteristics of breathy voice but also those of rough voice, strained voice, tremulous voice, etc.

Future directions
The key point of this study is that regression equations for the AVQI and ABI indexes are being developed. As regression equations change, the outcome of the study may also change. The regression equations should be continually developed to improve the ability to quantify and differentiate pathological voices.

CONCLUSION
In this study, we explored the combined validity of the AVQI and ABI tests for acoustic analyses and examined the variables that could significantly distinguish between pathological and normal voice in a cohort of Korean speakers. We used the methodologies of both indexes to quantify
normal and dysphonic voices and identified strong correlations of AVQI with A-P ratings G and OS, and of ABI with A-P rating B. We further identified the optimal cutoff values of the AVQI and ABI tests that could be used to discriminate between normal and pathological voices, as well as evaluated the corresponding diagnostic predictive accuracies. Our results demonstrate strong concurrent validity of the AVQI and ABI tests to quantify the severity of dysphonia, compared to the overall severity and breathiness in our sample cohort of Korean speakers. Finally, we assessed the effectiveness of both the AVQI and ABI tests to successfully quantify and discriminate voice disorders. We believe that AVQI- and ABI-based analyses of voice samples will facilitate the differentiation and evaluation of voice disorders in clinical settings.

SUPPLEMENTARY MATERIALS
Supplementary material associated with this article can be found in the online version at https://doi.org/10.1016/j.jvoice.2019.10.005.

REFERENCES
1. Kreiman J, Gerratt BR, Berke GS. The multidimensional nature of pathologic vocal quality. J Acoust Soc Am. 1994;96:1291–1302.
2. Barsties B, De Bodt M. Assessment of voice quality: Current state-of-the-art. Auris Nasus Larynx. 2015;42:183–188.
3. Hirano M. Psyco-acoustic evaluation of voice. Clin Exam Voice: Disorders Human Commun. 1981:81–84.
4. Kempster GB, Gerratt BR, Abbott KV, Barkmeier-Kraemer J, Hillman RE. Consensus auditory-perceptual evaluation of voice: development of a standardized clinical protocol. Am J Speech-Lang Pat. 2009;18:124–132.
5. Maryn Y, De Bodt M, Roy N. The Acoustic Voice Quality Index: Toward improved treatment outcomes assessment in voice disorders. J Commun Disord. 2010;43:161–174.
6. Maryn Y, Corthals P, Van Cauwenberge P, Roy N, De Bodt M. Toward Improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels. J Voice. 2010;24:540–555.
7. Maryn Y, Weenink D. Objective dysphonia measures in the program Praat: smoothed cepstral peak prominence and acoustic voice quality index. J Voice. 2015;29:35–43.
8. Barsties B, Maryn Y. External validation of the acoustic voice quality index version 03.01 with extended representativity. Ann Otol Rhinol Laryngol. 2016;125:571–583.
9. Wuyts FL, De Bodt MS, Molenberghs G, et al. The Dysphonia Severity Index: an objective measure of vocal quality based on a multiparameter approach. J Speech Language Hearing Res. 2000;43:796–809.
10. Barsties v, Latoszek B, Maryn Y, Gerrits E, De Bodt M. The Acoustic Breathiness Index (ABI): a multivariate acoustic model for breathiness. J Voice. 2017;31:511–e511.
11. Fritzell B, Hammarberg B, Gauffin J, Karlsson I, Sundberg J. Breathiness and insufficient vocal fold closure. J Phonetics. 1986;14:549–553.
12. Hosokawa K, Barsties VLB, Iwahashi T, et al. The Acoustic voice quality index version 03.01 for the Japanese-speaking population. J Voice. 2019;33:125 e121-125 e112.
13. Hernandez JD, Gomez NML, Jimenez A, Izquierdo LM, vander Latoszek B. Validation of the acoustic voice quality index version 03.01 and the acoustic breathiness index in the Spanish language. Ann Otol Rhinol Laryngol. 2018;127:317–326.
14. Kim GH, Lee YW, Bae IH, Park HJ, Lee BJ, Kwon SB. Comparison of two versions of the acoustic voice quality index for quantification of dysphonia severity. J Voice. 2018.
15. Pommee T, Maryn Y, Finck C, Morsomme D. The Acoustic voice quality index, version 03.01, in French and the voice handicap index. J Voice. 2018.
16. Barsties B, Lehnert B, Janotte B. Validation of the acoustic voice quality index version 03.01 and acoustic breathiness index in German. J Voice. 2018.
17. Englert M, Lima L, Behlau M. Acoustic voice quality index and acoustic breathiness index: Analysis with different speech material in the Brazilian Portuguese. J Voice. 2019.
18. Hosokawa K, von Latoszek BB, Ferrer-Riesgo CA et al. Acoustic breathiness index for the Japanese-speaking population: Validation study and exploration of affecting factors. J Speech Lang Hear Res 2019:1–15.
19. Ha S, Johnson CJ, Kuehn DP. Characteristics of Korean phonology: review, tutorial, and case studies of Korean children speaking English. J Commun Disord. 2009;42:163–179.
20. Teahan WJ, Cleary JG. The entropy of English using PPM-based models. In: Proceedings of Data Compression Conference-DCC'96. IEEE; 1996:53–62.
21. Koo HS. A study of production difficulties of English bilabial stops and labiodental fricatives by Korean learners of English. Phonetics Speech Sci. 2009;1:11–15.
22. Dejonckere PH, Bradley P, Clemente P, et al. A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques – guideline elaborated by the committee on phoniatrics of the European Laryngological Society (ELS). Eur Arch Oto-Rhino-L. 2001;258:77–82.
23. Ingrisano DRS, Perry CK, Jepson KR. Environmental noise: a threat to automatic voice analysis. Am J Speech-Lang Pat. 1998;7:91–96.
24. Deliyski DD, Shaw HS, Evans MK. Adverse effects of environmental noise on acoustic voice quality measurements. J Voice. 2005;19:15–28.
25. Barsties B, Maryn Y. The improvement of internal consistency of the Acoustic Voice Quality Index. Am J Otolaryngol. 2015;36:647–656.
26. Kim GH, Lee YW, Bae IH, Park HJ, Wang SG, Kwon SB. Validation of the acoustic voice quality index in the Korean language. J Voice. 2018.
27. Portney LG. Foundations of Clinical Research: Applications to Practice. 3th (third) Edition 2009.
28. Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutorials Quantitative Methods Psychol. 2012;8:23.
29. Webb A, Carding P, Deary IJ, MacKenzie K, Steen N, Wilson JA. The reliability of three perceptual evaluation scales for dysphonia. Euro Arch Oto-Rhino-Laryngol Head Neck. 2004;261:429–434.
30. Yiu EM, Ng CY. Equal appearing interval and visual analogue scaling of perceptual roughness and breathiness. Clin Linguist Phon. 2004;18:211–229.
31. Mutlu A, Livanelioglu A, Gunel MK. Reliability of Ashworth and Modified Ashworth scales in children with spastic cerebral palsy. BMC Musculoskelet Disord. 2008;9:44.
32. Maryn Y, Kim HT, Kim J. Auditory-Perceptual and Acoustic methods in measuring dysphonia severity of Korean speech. J Voice. 2016;30:587–594.
33. McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychol Methods. 1996;1:30–46.
34. Mukaka MM. A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24:69–71.
35. Delong ER, Delong DM, Clarkepearson DI. Comparing the areas under 2 or more correlated receiver operating characteristic curves - a nonparametric approach. Biometrics. 1988;44:837–845.
36. Muller MP, Tomlinson G, Marrie TJ, et al. Can routine laboratory tests discriminate between severe acute respiratory syndrome and other causes of community-acquired pneumonia? Clin Infect Dis. 2005;40:1079–1086.

37. Englert M, Lima L, Constantini AC, Latoszek BBV, Maryn Y, Behlau M. Acoustic Voice Quality Index - AVQI for Brazilian Portuguese speakers: analysis of different speech material. Codas. 2019;31:e20180082.
38. Barsties VLB, Ulozaite-Staniene N, Petrauskas T, Uloza V, Maryn Y. Diagnostic accuracy of dysphonia classification of DSI and AVQI. Laryngoscope. 2019;129:692–698.
39. Pommee T, Maryn Y, Finck C, Morsomme D. Validation of the acoustic voice quality index, version 03.01, in French. J Voice. 2018.
40. Barsties VLB, Ulozaite-Staniene N, Maryn Y, Petrauskas T, Uloza V. The influence of gender and age on the acoustic voice quality index and Dysphonia severity index: a normative study. J Voice. 2019;33:340–345.
41. Uloza V, Barsties B, Ulozaite-Staniene N, Petrauskas T, Maryn Y. A comparison of Dysphonia Severity Index and Acoustic Voice Quality Index measures in differentiating normal and dysphonic voices. Eur Arch Otorhinolaryngol. 2018;275:949–958.
42. Uloza V, Petrauskas T, Padervinskis E, Ulozaite N, Barsties B, Maryn Y. Validation of the acoustic voice quality index in the Lithuanian language. J Voice. 2017;31:257–e251.
43. Hosokawa K, Barsties B, Iwahashi T, et al. Validation of the acoustic voice quality index in the Japanese language. J Voice. 2017;31:260–e261.
44. Nuñez-Batalla F, Díaz-Fresno E, Alvarez-Fernández A, Muñoz Cordero G, Llorente Pendas JL. Application of the acoustic voice quality index for objective measurement of Dysphonia severity. Acta Otorrinolaringol (English Edition). 2017;68:204–211.
45. Watts CR, Awan SN, Worth F, Bodt D, Cauwenberge V. Use of spectral / cepstral analyses for differentiating normal from hypofunctional voices in sustained vowel. J Speech, Language Hearing Res. 2011;54:1525–1538.
46. Awan SN, Roy N, Dromey C. Estimating dysphonia severity in continuous speech: application of a multi-parameter spectral/cepstral model. Clin Linguist Phon. 2009;23:825–841.
47. Parsa V, Jamieson DG. Acoustic discrimination of pathological voice: sustained vowels versus continuous speech. J Speech Lang Hear Res. 2001;44:327–339.
48. Hillenbrand J, Houde RA. Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech. J Speech Lang Hear Res. 1996;39:311–321.
49. Kumar SBS, Rao KS. Voice/non-voice detection using phase of zero frequency filtered speech signal. Speech Commun. 2016;81:90–103.
50. Germain FG, Sun DL, Mysore GJ. Speaker and noise independent voice activity detection. In: Proc. Interspeech. 2013:732–736.
51. Gorriz JM, Ramirez J, Lang EW, Puntonet CG, Turias I. Improved likelihood ratio test based voice activity detector applied to speech recognition. Speech Commun. 2010;52:664–677.
52. Reynolds V, Buckland A, Bailey J, et al. Objective assessment of pediatric voice disorders with the acoustic voice quality index. J Voice. 2012;26:672–e671.
