
Ann Otol Rhinol Laryngol 112:2003

CEPSTRAL PEAK PROMINENCE:


A MORE RELIABLE MEASURE OF DYSPHONIA

YOLANDA D. HEMAN-ACKAH, MD (CHICAGO, ILLINOIS)
REINHARDT J. HEUER, PHD (PHILADELPHIA, PENNSYLVANIA)
DEIRDRE D. MICHAEL, PHD (MINNEAPOLIS, MINNESOTA)
ROSEMARY OSTROWSKI, MM, MS (PHILADELPHIA, PENNSYLVANIA)
MICHELLE HORMAN, MA (PHILADELPHIA, PENNSYLVANIA)
MARGARET M. BAROODY, MM (PHILADELPHIA, PENNSYLVANIA)
JAMES HILLENBRAND, PHD (KALAMAZOO, MICHIGAN)
ROBERT T. SATALOFF, MD, DMA (PHILADELPHIA, PENNSYLVANIA)

Quantification of perceptual voice characteristics allows the assessment of voice changes. Acoustic measures of jitter, shimmer,
and noise-to-harmonic ratio (NHR) are often unreliable. Measures of cepstral peak prominence (CPP) may be more reliable predictors
of dysphonia. Trained listeners analyzed voice samples from 281 patients. The NHR, amplitude perturbation quotient, smoothed
pitch perturbation quotient, percent jitter, and CPP were obtained from sustained vowel phonation, and the CPP was obtained from
running speech. For the first time, normal and abnormal values of CPP were defined, and they were compared with other acoustic
measures used to predict dysphonia. The CPP for running speech is a good predictor and a more reliable measure of dysphonia than
are acoustic measures of jitter, shimmer, and NHR.
KEY WORDS - acoustic analysis, cepstral peak prominence, cepstrum, dysphonia, voice.

From The Voice Center, Department of Otolaryngology-Head and Neck Surgery, University of Illinois at Chicago, Chicago, Illinois (Heman-Ackah), the Department of Speech and Hearing Sciences, Temple University (Heuer), the American Institute for Voice and Ear Research (Ostrowski, Horman, Baroody), and the Department of Otolaryngology-Head and Neck Surgery, Thomas Jefferson University (Sataloff), Philadelphia, Pennsylvania, the Department of Otolaryngology-Head and Neck Surgery, University of Minnesota, Minneapolis, Minnesota (Michael), and the Department of Speech and Hearing Sciences, Western Michigan University, Kalamazoo, Michigan (Hillenbrand).
Presented at the meeting of the American Laryngological Association, Palm Desert, California, May 12-13, 2001.
CORRESPONDENCE - Yolanda D. Heman-Ackah, MD, The Voice Center, Dept of Otolaryngology-Head and Neck Surgery, University of Illinois at Chicago, 1855 W Taylor St (MC 648), Chicago, IL 60612.

BACKGROUND AND SIGNIFICANCE

There is an increasing need for the quantification of abnormalities of the voice. The currently available methods of acoustic analysis have not been as successful as originally hoped in their abilities to quantify the voice consistently and reliably.1-3 Jitter (frequency perturbation), shimmer (amplitude perturbation), and noise-to-harmonic ratio (NHR) are the most frequently used measures for acoustic analysis. These measures rely on the ability to identify and track changes in fundamental frequency accurately. In the mildly dysphonic yet reasonably periodic voice, this is possible. However, as the voice becomes increasingly dysphonic and less periodic, tracking changes in fundamental frequency becomes increasingly difficult, and measures that rely on frequency tracking are less reliable.

An ideal acoustic measure should be able to quantify the voice signal independently, without relying on frequency tracking or other variables that may influence the accuracy of the measure. Such a measure should be reliable, should correlate with the severity of dysphonia, and should be reproducible. Cepstral peak prominence (CPP) may be one such measure. The CPP is calculated from the Fourier transformation of the voice spectrum.

Fig 1. Graphic representation of simple sound wave. [Horizontal axis: time in seconds.]

The process of calculating the CPP begins with an understanding of the voice signal.8,9 The primary unit of the voice signal is the sine wave (Fig 1). This wave has both a frequency and an amplitude. The frequency is the number of cycles the wave makes per second. The amplitude is the magnitude of deflection of the
wave in the positive and negative directions. The amplitude of the wave varies with intensity (which corresponds to loudness); more intense signals produce larger amplitudes, and softer signals have smaller amplitudes. The frequency of the wave correlates with the pitch. A signal that has a constant and unchanging frequency is said to be periodic. If the frequency varies inconsistently over time, the signal is said to be aperiodic.

Fig 2. Graphic representation of complex sound signal and its component sound waves. A) Wave 1. B) Wave 2. C) Wave 3. D) Wave 4. E) Complex wave (Wave 1 + Wave 2 + Wave 3 + Wave 4). [Horizontal axes: time in seconds.]

Most sounds in nature, including the human voice, are complex signals; that is, they consist of several
sine waves of different frequencies and amplitudes that, when added together, produce one complex signal (Fig 2). Each of these component sine waves has an amplitude and a frequency. When the amplitudes of each of the component waves are added together at any given moment in time, the amplitude of the complex wave at that moment in time is derived. If the amplitudes of the component waves are graphed as a function of frequency, the amplitude (or power) spectrum is produced (Fig 3A). The representation of the complex wave is graphed in the "time domain"; that is, amplitude is graphed as a function of time. The representation of the spectrum is graphed in the "frequency domain," in which amplitude is graphed as a function of frequency. The process of transformation of the initial voice signal from the time domain to the frequency domain is called Fourier transformation.8,9 Often, the amplitude in the spectral representation is a logarithm of the amplitude rather than the absolute amplitude. By use of the logarithm, greater resolution of the difference between smaller amplitudes and larger amplitudes is obtained.

The human voice is modified, in part, by resonators in the vocal tract. Each voice has a fundamental frequency that is determined primarily by the vocal folds. The fundamental frequency has the largest amplitude of all of the frequencies in the voice spectrum and corresponds to pitch. There are also other frequencies that are amplified by the resonators in the vocal tract and that are usually multiples of the fundamental frequency. These also produce characteristic amplitude peaks in the spectrum and are referred to as the harmonic frequencies. Because the human voice is not perfectly periodic, there are usually low-amplitude sound energies in a continuum of frequencies in the spectrum. With a relatively periodic voice, the peaks of the fundamental frequency and the harmonic frequencies are distinctly higher in amplitude than the background noise energy (Fig 3A). With an aperiodic voice, there is no definable fundamental frequency, and the amplitude is relatively equally distributed across many frequencies (Fig 3B).

Fig 3. Spectral representations of A) normal and B) severely dysphonic voice signals. [Both panels plot intensity (dB) versus frequency (kHz); panel A marks the fundamental frequency, a harmonic frequency, and the background noise.]

If a Fourier transformation of the spectrum is performed, a cepstrum is produced (Fig 4). Thus, the cepstrum is a spectral representation of the spectrum. In producing a cepstrum, the spectrum is thought of as a complex waveform that is the summation of many smaller component sine waves. Each of these sine waves also has an amplitude and a "frequency." To avoid confusion in terminology, Noll4,5 renamed the "frequency" of each of the component waves of the spectrum and called it "quefrency." Quefrency is the frequency of the occurrence of the frequency in the power spectrum; the unit of measurement is cycles per frequency, which is seconds. When the amplitude of each of the component waves of the spectrum is graphed as a function of quefrency, the cepstrum is produced, and the resultant Fourier transformation has taken the information in the spectrum (the frequency domain) and transformed it to a time (quefrency) domain.

The predominant peak in the cepstrum is the fundamental period of the spectrum. The fundamental period is the quefrency ("frequency") of the dominant sine wave of the complex wave termed the spectrum, just as the fundamental frequency is the frequency of the dominant sine wave of the complex wave termed the voice signal. The smaller-amplitude peaks in the cepstrum are called rhamonics.4,5 A highly periodic voice signal will have a strong peak at
the fundamental frequency and at multiples of the fundamental frequency in the voice spectrum (Fig 4). These peaks will occur at regular intervals. This interval corresponds to the fundamental period of the cepstrum. Thus, a large-amplitude peak is seen at the fundamental period.

Fig 4. Spectral and cepstral representations of normal voice signal. CPPS-/a/ - smoothed cepstral peak prominence for sustained vowel phonation. [Spectrum: intensity (dB) versus frequency (kHz), with the fundamental frequency and harmonic frequencies marked. Cepstrum: magnitude (dB) versus quefrency (ms), with the cepstral peak (fundamental period; CPPS-/a/ = 12.3 dB) and rhamonic periods marked.]

An aperiodic or weakly periodic voice signal will have multiple similar-amplitude peaks in the voice spectrum at many frequencies, without any definite pattern or defined intervals. A weakly periodic signal will produce a very low-amplitude cepstral peak (Fig 5). The cepstrum of an aperiodic signal will demonstrate multiple similar-amplitude peaks at many quefrencies (Fig 6).

The cepstral peak is the peak in the cepstrum with the highest amplitude. When a linear regression line that represents the average sound energy is drawn through the cepstrum, the distance from the cepstral peak to this linear regression line is termed the CPP. This linear regression line is drawn to normalize for variability in amplitude of phonation from one person to another, as well as from one testing situation to another within the same person.6,7 Without this linear regression line, a speaker with a weakly periodic voice who is talking at 70 dB will produce a cepstral peak that is greater in absolute amplitude
than that of a speaker with a very periodic voice who is talking at 50 dB because of the loudness of phonation rather than because of the prominence of the peak. Thus, the addition of the linear regression line allows one to determine the magnitude of the cepstral peak in relation to the amplitude of phonation and allows for objective comparison from one testing situation to another without having to account for differences in loudness of phonation, microphone distance, or recording level. A highly periodic voice signal will have a high-amplitude CPP, and a weakly periodic or aperiodic voice signal will have a low-amplitude CPP.

Fig 5. Spectral and cepstral representations of moderately dysphonic voice signal. [Spectrum: intensity (dB) versus frequency (kHz). Cepstrum: magnitude (dB) versus quefrency (ms), with the cepstral peak marked (CPPS-/a/ = 5.6 dB).]

Fig 6. Spectral and cepstral representations of severely dysphonic voice signal. [Spectrum: intensity (dB) versus frequency (kHz). Cepstrum: magnitude (dB) versus quefrency (ms), with the cepstral peak marked.]

Although the idea of the cepstrum was first introduced by Noll4,5 in 1964, the lack of high-speed computers made calculating the cepstrum cumbersome and time-consuming. In 1994, Hillenbrand et al6 developed an automated method of calculating the cepstrum using the high-speed capabilities of modern computers. In addition, the concept of the linear regression line was added as a means of normalizing the measure for purposes of comparison, and the use of the CPP was introduced.6 A smoothing feature was added in 1996 to produce the smoothed CPP (CPPS), in which the individual cepstra are averaged over a given number of frames before and after the frame of interest.7

Both CPP and CPPS were shown to be reliable indicators of breathiness in 2 separate samples of 20 voice signals that were analyzed perceptually with regard to the quality of breathiness.6,7 Subsequently, in a sample of 38 dysphonic voices, it was found that the CPPS reflected overall dysphonia most strongly, although it continued to correlate well with breathiness.10 The limitation of each of these earlier studies was that the small number of voice samples did not provide sufficient information regarding a wide range of dysphonic and normal voices; thus, the range of CPP values in the population remains unknown, as does the ability of the CPP to differentiate normal from abnormal voices in the general population reliably. The purposes of this study were to determine the ability of the CPP to predict severity of dysphonia reliably, to determine the reliability of the CPP in predicting dysphonia relative to other acoustic measures, and to determine the range of normal and abnormal values of CPP.

METHODS

Voice samples from 281 consecutive patients who presented for objective voice analysis during 1999 at the senior author's (R.T.S.) private laryngology and professional voice practice were used in this study. The patients ranged in age from 7 to 80 years; the mean age was 43 years. Of the 281 patients, 176 were female and 105 were male. The voice samples had been recorded on an analog tape recorder at the time of initial patient presentation. All recordings were performed with the microphone positioned 6 inches from the mouth and consisted of sustained vowel and running speech phonation.11 The sustained vowel samples were digitized at a 50-kHz sampling rate with the Computerized Speech Laboratory (CSL) system (Kay Elemetrics, Pine Brook, New Jersey). The samples were edited to include only the second 1-second portion of the /a/. Samples of running speech were taken while the patients were reading the "Marvin Williams" passage. All patients had been instructed to begin the reading of the passage with recitation of the title. The running speech samples were digitized at a 25-kHz sampling rate and edited by
means of the CSL to consist only of the portion of the passage containing the first sentence ("Marvin Williams is only nine").

Perceptual Analysis. Ten individual speech-language pathologists and voice specialists performed perceptual ratings of the samples of running speech in a blinded fashion. Each of the raters had a minimum of 3 years of professional experience specializing in voice disorders; the range was 3 to 25 years. Three of the raters had their professional practices in Philadelphia, Pennsylvania; the other 7 had their professional practices in Minneapolis, Minnesota, and its surrounding areas. The raters were separated into 3 groups of 3, 3, and 4 raters each. The 3 raters in Philadelphia constituted 1 group, and the Minneapolis area raters constituted the other 2 groups. Raters whose experience and expertise in voice analysis were well known to the authors were chosen to rate the samples. The raters were chosen from 2 geographically separate areas of the country to help to minimize the effect of regional bias on the perceptual ratings.

Each of the groups underwent an initial training session separate from the other groups. During the training session, the raters were given general definitions of grade, roughness, breathiness, and strain. Nineteen standardized voice samples with various degrees of overall dysphonia (grade), roughness, breathiness, and strain were presented, and the raters were asked individually to rate these samples. Each of the voice samples was rated in the categories of grade, roughness, breathiness, and strain by means of a GRBAS-like scale. Severity in each category was quantified by a mark on a 100-mm line from most normal (0) to most abnormal (100). The distance in millimeters from the end of the line designated "most normal" to the rater's mark represented the numerical rating of the voice sample. After the raters made their initial judgments of the training samples, they were asked to discuss their ratings with the other members of their group and to come to a consensus regarding the definitions of grade, roughness, breathiness, and strain, as well as the relative severities of each in the training voice samples. The separation of the raters into 3 groups allowed for the possibility of greater variability of definitions of grade, roughness, breathiness, and strain that were based more on the background of the raters than on a single unified group definition. The goal of doing so was to gain perceptual ratings more representative of many speech-language pathologists and voice specialists than of one particular region or style of analysis.

After the initial training period, the 3 groups were asked to rate the running speech portions of the study voice samples. The Philadelphia group rated each of the 281 voice samples. However, because of time and scheduling constraints, the 2 Minneapolis groups each rated only the first 145 of the voice samples. The samples were arranged randomly and presented to the raters in a blinded fashion. There were no significant differences in the quality or character of the voices between the first 145 voices that were analyzed by all of the raters and the second 136 voices that were rated only by the Philadelphia group. Each of the 10 raters analyzed the voice samples individually. None of the raters were allowed to discuss their ratings of the study voice samples with any of the other raters in their group. Each of the groups rated the samples on separate occasions so that none of the members from one group ever had contact with the members of the other groups. Each of the voice samples was rated in the categories of grade, roughness, breathiness, and strain in a manner similar to the methods used in the training session.

Acoustic Analysis. Each of the 281 sustained vowel samples underwent acoustic analysis using conventional measures of jitter, shimmer, and NHR. Percent jitter, relative average perturbation (RAP), and smoothed pitch perturbation quotient (sPPQ) are measures of jitter; the RAP measures short-term frequency perturbation, and the sPPQ measures long-term frequency perturbation. The amplitude perturbation quotient (APQ) is a measure of amplitude perturbation or shimmer. The NHR is a ratio of the amplitude of the portion of the voice signal that is primarily aperiodic to that portion of the voice signal that is primarily periodic. These measures were obtained with the Multi-Dimensional Voice Program (MDVP) model 4300B (Kay Elemetrics).

The CPPS was obtained from both the running speech and the sustained vowel samples. The CPPS software was designed by one of the co-authors (J.H.).6,7 The CPPS uses larger windows for smoothing of samples of sustained vowel phonation than it does for samples of running speech.7 The CSL-digitized samples of running speech were input directly into the CPPS software for running speech; the resultant value is termed the CPPS-s (CPPS for speech). The sustained vowel samples were input directly into the CPPS software for sustained vowels; the resultant value is termed the CPPS-/a/ (CPPS for /a/).

DATA ANALYSIS

Perceptual Analysis. The inter-rater reliabilities of the perceptual ratings of grade (overall dysphonia), breathiness, roughness, and strain were determined with Cronbach's coefficient α (Table 1), which assesses the correlation between each individual rater's scores and that of the group, as well as the overall inter-rater reliability.

TABLE 1. INTER-RATER RELIABILITY

Rater   Grade   Breathiness   Roughness   Strain
1       0.92    0.91          0.85        0.82
2       0.87    0.79          0.76        0.79
3       0.92    0.84          0.83        0.87
4       0.87    0.83          0.83        0.84
5       0.94    0.88          0.84        0.89
6       0.88    0.80          0.83        0.82
7       0.92    0.86          0.73        0.83
8       0.92    0.89          0.82        0.87
9       0.94    0.87          0.92        0.88
10      0.92    0.90          0.82        0.83
Mean    0.91    0.86          0.82        0.84

Cronbach's coefficient α, p < .001 for each.

The mean value of all raters for the first 145 voice samples was considered the "gold standard" perceptual rating, as it represented the perceptions of several speech-language pathologists and voice specialists from 2 regions of the country and was not thought to be influenced greatly by regional bias. Because the Philadelphia group was the only group that rated each of the 281 samples, correlations between ratings of this group and "the gold standard" (all raters) were performed to ensure that these raters' perceptions were not significantly different from the standard. Pearson's correlation coefficient was used to determine the relationship between the mean ratings of the Philadelphia group and the mean of all of the raters for the first 145 voice samples. This correlation was 0.968 for grade, 0.937 for breathiness, 0.930 for roughness, and 0.953 for strain (p < .001 for each). The correlation between the means of the Philadelphia raters and those of the Minneapolis raters was 0.942 for grade, 0.879 for breathiness, 0.865 for roughness, and 0.906 for strain (p < .001 for each). The mean perceptual rating of the Philadelphia raters for each of the 281 voice samples was then used as the "rating" for the individual samples. For each of the perceptual categories, severity was divided into mild, moderate, and severe. Mild was defined as a rating between 1 and 33, moderate was defined as a rating of 34 to 67, and severe was defined as a rating between 68 and 100. A histogram of the relative distribution of the ratings for each of the perceptual categories is displayed in Fig 7. There were too few samples with ratings above 50 in the categories of breathiness, roughness, and strain for us to make inferences about the ability of the acoustic measures to predict any of these individual characteristics reliably. Therefore, only the perception of grade, which we defined as overall dysphonia, was used as the standard against which the acoustic measures were compared.

Fig 7. Histograms of distribution of mean ratings of A) grade, B) breathiness, C) roughness, and D) strain for 281 voice samples. [Each panel plots number of samples versus mean score; N = 281 in each. A) Mean = 37.2, SD = 25.18. B) Mean = 20.0, SD = 23.04. C) Mean = 25.1, SD = 23.81. D) Mean = 27.6, SD = 23.50.]

Acoustic Analysis. Sensitivity, specificity, positive predictive value, and negative predictive value were calculated for the percent jitter, RAP, sPPQ, APQ, and NHR by use of the values for the normal range given by MDVP as the defined normal values for these measures. Sensitivity was defined as the number of severely dysphonic voice samples with values in the abnormal range divided by the total number of severely dysphonic voice samples. Specificity was defined as the number of mildly dysphonic voice samples with values in the normal range divided by the total number of mildly dysphonic voice samples. Positive predictive value was defined as the ratio of the combined sum of moderately and severely dysphonic voice samples with abnormal test scores to the total number of voice samples with abnormal values. Negative predictive value was defined as the ratio of the number of mildly dysphonic voice samples with normal test values to the total number of voice samples with normal test values. By use of the same definitions of sensitivity, specificity, positive predictive value, and negative predictive value, a receiver operating characteristic (ROC) curve was drawn to determine the criteria for positivity for the CPPS-s and CPPS-/a/.

RESULTS

The ROC curves for the CPPS-s and CPPS-/a/ as measures of overall dysphonia (grade) are presented in Fig 8. The CPPS-s and CPPS-/a/ that gave the highest values of both sensitivity and specificity were chosen as the criteria for positivity. The range of values for the CPPS-/a/ was 0 to 16.99 dB. The criterion for positivity for the CPPS-/a/ was 10 dB or lower, with values above 10 dB falling within the normal range. This criterion resulted in a sensitivity for the CPPS-/a/ of 89% and a specificity of 77%. The positive predictive value of the CPPS-/a/ was 69%, and the negative predictive value was 80%. For the CPPS-s, the criterion for positivity was 5.0 dB or lower. The range of CPPS-s values was 0.76 to 8.13 dB. All values above 5.0 dB were deemed to be within the normal range. By these criteria, the sensitivity of the CPPS-s in detecting overall dysphonia was 87%, and the specificity was 90%. The corresponding positive predictive value was 81%, and the negative predictive value was 77%. The sensitivity, specificity, and positive predictive and negative predictive values of the percent jitter, RAP, sPPQ, APQ, and NHR are presented in Table 2.

Fig 8. Receiver operating characteristic curves for A) smoothed cepstral peak prominence for sustained vowel phonation (CPPS-/a/) and B) smoothed cepstral peak prominence for running speech (CPPS-s) as indicators of grade of dysphonia. [Each panel plots sensitivity versus false-positive rate (1 - specificity). A) At CPPS-/a/ ≤ 10 dB, sensitivity = .89 and specificity = .77. B) At CPPS-s ≤ 5.0 dB, sensitivity = .87 and specificity = .90.]

TABLE 2. SENSITIVITY, SPECIFICITY, AND PREDICTIVE VALUE OF ACOUSTIC MEASURES

Measure                                                             Sensitivity   Specificity   Positive Predictive Value   Negative Predictive Value
Smoothed cepstral peak prominence for running speech                0.87          0.90          0.81                        0.77
Smoothed cepstral peak prominence for sustained vowel phonation     0.89          0.77          0.69                        0.80
Amplitude perturbation quotient                                     0.87          0.55          0.54                        0.82
Percent jitter                                                      0.70          0.87          0.74                        0.76
Noise-to-harmonic ratio                                             0.50          0.96          0.83                        0.69
Relative average perturbation                                       0.82          0.79          0.66                        0.78
Smoothed pitch perturbation quotient                                0.91          0.67          0.59                        0.81

DISCUSSION

The assessment of a new diagnostic test must "...begin with the identification of a group of patients known to have the disorder of interest, using an accepted reference test known as the gold standard."12(p34) The assessment of dysphonia relies on perceptual judgments. There is no other reliable standard to measure dysphonia; thus, perceptual judgments must be considered the gold standard. Because there is an inherent bias in a rater's perceptions of dysphonia based on an individual sense of aesthetics, it is somewhat risky to use individual assessments of dysphonia as the gold standard. This risk was lessened in this study by using the perceptions of 10 speech-language pathologists and voice specialists with extensive knowledge and experience in the diagnosis of voice disorders. Although there is some debate about the reliability of perceptual judgments, the raters in this study were found to have a remarkable degree of inter-rater reliability in the assessment of overall dysphonia, with a mean correlation of 0.91 (Cronbach's coefficient α, p < .001). Thus, the use of their perceptual ratings as the gold standard against which the other measures are evaluated seems reasonable. Currently, there is no other method of assessing the voice that has this degree of reliability.

If a diagnostic measure is to be clinically useful, it should be able to distinguish reliably the diseased state from the nondiseased state. The ability of the measure to do so is reflected in its sensitivity, specificity, and predictive values. If dysphonia is the disease of interest, the sensitivity of a measure of dysphonia is the percentage of patients with dysphonia who have a positive test result. The specificity is the proportion of normal patients who have a negative test result. The sensitivity and specificity are not affected by the prevalence of the disease in the population of interest.12 The predictive values of the test give information regarding the interpretation of a negative or positive test result. The positive predictive value is the proportion of individuals with a positive test result who have dysphonia; the negative predictive value is the proportion of patients with a negative test result who are normal.

A comparison of the sensitivity, specificity, and predictive values of the CPPS-s and CPPS-/a/ to those of the APQ, percent jitter, NHR, RAP, and sPPQ reveals several interesting findings. The sensitivity of the CPPS-s and CPPS-/a/ is similar to the sensitivity of the APQ and sPPQ; that is, the proportion of patients with dysphonia who test positive on these 4 measures is on the order of 87% to 91%. However, when one compares the predictive values of positive tests using these measures, the CPPS-s and CPPS-/a/ are better measures, with predictive values that are 81% and 69%, respectively (versus 54% and 59% for the APQ and sPPQ, respectively). Although the positive predictive value of the NHR is similar to that of the CPPS-s, the sensitivity of the NHR is considerably lower than that of the CPPS-s or the CPPS-/a/ (50% versus 87% and 89%), as is reflected in a lower negative predictive value for the NHR. Although the percent jitter and RAP have predictive values that are similar to those of the CPPS-/a/, both of these measures lack the sensitivity and positive predictive value of the CPPS-s.

CONCLUSIONS

The CPPS-s and CPPS-/a/ are good predictors of dysphonia. Overall, the CPPS-s has better sensitivity, specificity, and positive and negative predictive values than do measures of jitter, shimmer, and NHR. The software that measures the CPPS-s and CPPS-/a/ is fast and relatively easy to use, and relies merely on the transfer into the CPP program of acoustic signals captured by the CSL. The CPPS-s and CPPS-/a/ are reliable measures that should become routine in objective voice analysis.
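The two-class definitions of sensitivity, specificity, and predictive values given in the Discussion, together with a "value at or below the cutoff is a positive test" rule such as CPPS-s of 5.0 dB or lower, can be illustrated with a minimal sketch. The function names and the sample values are hypothetical and are not taken from the study data. The cutoff search simply maximizes sensitivity plus specificity, which is an assumed stand-in for reading the criterion off an ROC curve; the study's own calculations used severity subgroups as described under Data Analysis.

```python
def confusion_counts(values_db, has_dysphonia, cutoff_db):
    """Count test outcomes when a value at or below cutoff_db is a positive test."""
    tp = fn = fp = tn = 0
    for value, diseased in zip(values_db, has_dysphonia):
        positive = value <= cutoff_db
        if diseased and positive:
            tp += 1      # dysphonic voice, abnormal test value
        elif diseased:
            fn += 1      # dysphonic voice, normal test value
        elif positive:
            fp += 1      # normal voice, abnormal test value
        else:
            tn += 1      # normal voice, normal test value
    return tp, fn, fp, tn


def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and predictive values (all denominators must be nonzero)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def best_cutoff(values_db, has_dysphonia):
    """Observed value that maximizes sensitivity + specificity (assumed selection rule)."""
    def score(cutoff):
        tp, fn, fp, tn = confusion_counts(values_db, has_dysphonia, cutoff)
        return tp / (tp + fn) + tn / (tn + fp)
    return max(sorted(set(values_db)), key=score)


if __name__ == "__main__":
    # Hypothetical CPPS-s values (dB) and perceptual labels, for illustration only.
    values = [3.1, 4.8, 5.0, 6.2, 7.0, 4.9, 5.5, 2.0]
    labels = [True, True, True, False, False, False, True, True]
    print(diagnostic_metrics(*confusion_counts(values, labels, 5.0)))
    print(best_cutoff(values, labels))
```

Because the predictive values depend on how many dysphonic and normal voices are in the sample, the same cutoff can give different predictive values in a population with a different prevalence of dysphonia, whereas sensitivity and specificity do not change.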
ACKNOWLEDGMENTS - We thank Leslie Glaze, Nancy Solomon, Miriam van Mersbergen, Carol Rue, Lynne Conley, and Robert Grider for
their contributions to this study.
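The computation outlined under Background and Significance (spectrum, then cepstrum, then the distance from the cepstral peak to a regression line) can be sketched in a few lines. This is an illustration only and is not the CPPS software used in the study: the Hann window, the single unsmoothed frame, the 60 to 300 Hz search range for the fundamental period, the choice to fit the regression line over that same quefrency range, and the synthetic test signals are all assumptions made here.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def db(magnitude):
    """Magnitude in decibels, with a tiny floor to avoid log(0)."""
    return 20.0 * math.log10(magnitude + 1e-12)


def cepstrum_db(frame):
    """Cepstrum (dB) of one frame: Fourier transform of the log spectrum."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]              # Hann window
    log_spectrum = [db(abs(c)) for c in fft(windowed)]     # frequency domain
    return [db(abs(c)) for c in fft(log_spectrum)]         # quefrency domain


def cpp(frame, fs, f0_min=60.0, f0_max=300.0):
    """Return (peak prominence in dB, quefrency of the cepstral peak in ms)."""
    ceps = cepstrum_db(frame)
    lo = int(math.ceil(fs / f0_max))   # shortest period searched, in samples
    hi = int(fs / f0_min)              # longest period searched, in samples
    qs = list(range(lo, hi + 1))
    ys = [ceps[q] for q in qs]
    # Least-squares regression line through the cepstrum over the searched range.
    mean_q, mean_y = sum(qs) / len(qs), sum(ys) / len(ys)
    slope = (sum((q - mean_q) * (y - mean_y) for q, y in zip(qs, ys))
             / sum((q - mean_q) ** 2 for q in qs))
    intercept = mean_y - slope * mean_q
    peak_q = max(qs, key=lambda q: ceps[q])
    return ceps[peak_q] - (slope * peak_q + intercept), 1000.0 * peak_q / fs


def synth(fs, n, f0, noise_sd, seed, harmonics=20):
    """Synthetic frame: harmonic series at f0 (none if f0 is 0) plus Gaussian noise."""
    rng = random.Random(seed)
    frame = []
    for i in range(n):
        t = i / fs
        tone = sum(math.sin(2 * math.pi * k * f0 * t) / k
                   for k in range(1, harmonics + 1)) if f0 else 0.0
        frame.append(tone + rng.gauss(0.0, noise_sd))
    return frame


if __name__ == "__main__":
    fs, n = 10000, 1024
    print(cpp(synth(fs, n, 125.0, 0.05, seed=1), fs))   # periodic test signal
    print(cpp(synth(fs, n, 0.0, 1.0, seed=2), fs))      # noise-only test signal
```

With these synthetic inputs, the periodic signal should show a clearly larger prominence than the noise-only signal, with its cepstral peak near the 8-ms period of the 125-Hz fundamental. Values from this sketch are not comparable to the CPPS-s or CPPS-/a/ values reported in the study.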
REFERENCES
1. Nichols AC. Jitter and shimmer related to vocal roughness: a comment on the Deal and Emanuel study. J Speech Hear Res 1979;22:670-1.
2. Yumoto E, Sasaki Y, Okamura H. Harmonics-to-noise ratio and psychophysical measurement of the degree of hoarseness. J Speech Hear Res 1984;27:2-6.
3. Wolfe V, Fitch J, Cornell R. Acoustic prediction of severity in commonly occurring voice problems. J Speech Hear Res 1995;38:273-9.
4. Noll AM. Short-time spectrum and "cepstrum" techniques for vocal-pitch detection. J Acoust Soc Am 1964;36:296-302.
5. Noll AM. Cepstrum pitch determination. J Acoust Soc Am 1967;41:293-309.
6. Hillenbrand J, Cleveland RA, Erickson RL. Acoustic correlates of breathy vocal quality. J Speech Hear Res 1994;37:769-78.
7. Hillenbrand J, Houde RA. Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech. J Speech Hear Res 1996;39:311-21.
8. Kersta LG. Amplitude cross-section representation with the sound spectrograph. J Acoust Soc Am 1948;20:796-801.
9. Baken RJ, Orlikoff RF. Clinical measurement of speech and voice. 2nd ed. San Diego, Calif: Singular Publishing Group, 2000:227-33.
10. Heman-Ackah YD, Michael DD, Goding GS Jr. The relationship between cepstral peak prominence and selected parameters of dysphonia. J Voice 2002;16:20-7.
11. Price DB, Sataloff RT. Technical note. A simple technique for consistent microphone placement in voice recording. J Voice 1988;2:206-7.
12. Knapp RG, Miller MC. Clinical epidemiology and biostatistics. Malvern, Pa: Harwal Publishing Company, 1992:31-60.
