
Journal of Voice

Vol. 10, No. 3, pp. 228-235


© 1996 Lippincott-Raven Publishers, Philadelphia

Singing Power Ratio: Quantitative Evaluation of Singing Voice Quality

Koichi Omori, Ashutosh Kacker, Linda M. Carroll, William D. Riley, and Stanley M. Blaugrund
Ames Vocal Dynamics Laboratory, Lenox Hill Hospital, New York, New York, U.S.A.

Summary: This paper presents a parameter for objectively evaluating singing voice quality. The power spectrum of the vowel sound /a/ was analyzed by Fast Fourier Transform. The greatest harmonics peak between 2 and 4 kHz and the greatest harmonics peak between 0 and 2 kHz were identified. The power ratio of these peaks, termed the singing power ratio (SPR), was calculated in 37 singers and 20 nonsingers. The SPR of sung /a/ in singers was significantly greater than in nonsingers. In singers, the SPR of sung /a/ was significantly greater than that of spoken /a/. The power spectrum of sung /a/ was varied by digital signal processing, and the processed sounds were perceptually analyzed. SPR had a significant relationship with perceptual scores of "ringing" quality. SPR provides an important quantitative measurement for evaluating singing voice quality for all voice types, including soprano.
Key Words: Singing voice--Ringing--Spectrum analysis--Perceptual analysis--Singer's formant.
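
As an illustration of the ratio defined above, the following Python sketch computes SPR from harmonic peaks whose levels have already been measured in dB. It is a minimal sketch under stated assumptions, not the authors' program: the function name `singing_power_ratio`, the assignment of exactly 2 kHz to the upper band, and the peak levels are all hypothetical. Only the four peak frequencies are borrowed from the representative case reported in the Methods.

```python
def singing_power_ratio(peaks):
    """Return SPR in dB from (frequency_hz, level_db) harmonic peaks.

    Assumption: the lower band is 0 <= f < 2000 Hz and the upper band is
    2000 <= f <= 4000 Hz. The paper does not say how 2 kHz itself is assigned.
    """
    low = [level for freq, level in peaks if 0 <= freq < 2000]
    high = [level for freq, level in peaks if 2000 <= freq <= 4000]
    if not low or not high:
        raise ValueError("need at least one peak in each band")
    # Levels are already in dB, so the power ratio becomes a difference.
    return max(high) - max(low)


# Frequencies follow the paper's representative case (P1..P4);
# the dB levels are invented for illustration only.
example_peaks = [(580, -12.0), (950, -15.0), (2280, -30.0), (2850, -22.0)]
print(singing_power_ratio(example_peaks))  # -10.0
```

With these made-up levels the strongest upper-band peak (P4) is 10 dB below the strongest lower-band peak (P1), so the sketch reports an SPR of -10 dB. A less negative value corresponds to a relatively stronger peak between 2 and 4 kHz.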

Accepted November 28, 1995.
Address correspondence and reprint requests to Dr. Koichi Omori, Department of Otolaryngology, Kyoto University Hospital, 54 Kawahara-cho, Shogoin, Sakyo-ku, Kyoto 606 Japan.
This paper was presented at the Voice Foundations' 24th annual symposium, Care of the Professional Voice, Philadelphia, 1995.

Professional singers have an exciting "ringing" voice quality in the singing performance. The presence of this quality, which corresponds to the so-called singer's formant in the spectrum envelope, enhances the singer's ability to be heard without amplification over an orchestra (1). Seidner et al. reported that the center frequency of the singer's formant varies with pitch between 2.3 and 3.0 kHz in basses and between 3.0 and 3.8 kHz in tenors (2). Sundberg reported that the center frequency of the singer's formant varies, depending on the voice type, and was approximately 2.2 kHz in basses, 2.7 kHz in baritones, 2.8 kHz in tenors, and 3.2 kHz in altos (3). He also reported that there was an extra formant in sung vowels between the third and fourth formants of the spoken vowels and that this extra formant improved the ability of the vocal tract to transfer sound (4). In spite of many investigations (1-6), no uniform agreement of the definition of the singer's formant exists.

This paper attempts to provide a new parameter of spectrum analysis for quantitative evaluation of singing voice quality rather than the presence or absence of the singer's formant. From previous studies (1-6), the center frequency of the singer's formant lies roughly between 2 and 4 kHz in power spectrum. In the first part of the present study, regardless of the existence of the extra peak, the greatest harmonics peak between 2 and 4 kHz was identified in spectrum display, and its power was measured in contrast to the power of the greatest harmonics peak between 0 and 2 kHz in trained singers (professional and nonprofessional) and nonsingers. In the second part of the study, to ascertain whether the greatest harmonics peak between 2 and 4 kHz affects singing voice quality, power of the peaks was varied by digital signal processing, and the processed sounds were perceptually analyzed.
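
The two-band peak measurement outlined above can be sketched end to end in Python. The frame length (4,096 points), the sampling rate (44.1 kHz), and the Hanning window follow the Methods; everything else is an assumption made for illustration: the pure-Python radix-2 `fft`, the helper names, the assignment of exactly 2 kHz to the upper band, and the synthetic test tone, whose harmonics are placed exactly on FFT bins so that the expected answer is known. It is a sketch, not the laboratory's own program.

```python
import cmath
import math

SAMPLE_RATE = 44_100  # Hz, as stated in the Methods
N_POINTS = 4_096      # samples per analysis frame, as stated in the Methods


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum_db(frame):
    """Hann-windowed power spectrum in dB relative to the strongest bin."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    power = [abs(c) ** 2 for c in spectrum[: n // 2]]  # 2,048 points
    ref = max(power)
    return [10 * math.log10(p / ref + 1e-20) for p in power]


def band_peak_db(levels_db, lo_hz, hi_hz):
    """Highest level among bins with lo_hz <= frequency < hi_hz."""
    bin_hz = SAMPLE_RATE / (2 * len(levels_db))
    return max(l for k, l in enumerate(levels_db) if lo_hz <= k * bin_hz < hi_hz)


def spr_db(frame):
    """Greatest peak in 2-4 kHz minus greatest peak in 0-2 kHz, in dB."""
    levels = power_spectrum_db(frame)
    return band_peak_db(levels, 2000, 4000) - band_peak_db(levels, 0, 2000)


bin_hz = SAMPLE_RATE / N_POINTS            # about 10.8 Hz per point
frame_ms = 1000 * N_POINTS / SAMPLE_RATE   # about 92.9 ms

# Hypothetical test tone: 18 harmonics of a fundamental placed on bin 20,
# with a strong 3rd harmonic (low band) and a weaker 13th (2-4 kHz band).
f0 = 20 * bin_hz
amplitudes = {h: 0.02 for h in range(1, 19)}
amplitudes[3], amplitudes[13] = 1.0, 0.1
frame = [
    sum(a * math.sin(2 * math.pi * h * f0 * i / SAMPLE_RATE)
        for h, a in amplitudes.items())
    for i in range(N_POINTS)
]
print(round(bin_hz, 1), round(frame_ms, 1), round(spr_db(frame), 1))
```

Because the strongest upper-band harmonic has one tenth the amplitude of the strongest lower-band harmonic, the sketch should report an SPR close to 20 log10(0.1) = -20 dB. The bin spacing and frame duration it prints reproduce the 10.8 Hz/point and 92.9 ms figures given in the Methods. With recorded voices the harmonics rarely fall on bin centers, so measured peak levels would also depend on the window's response between bins.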

SUBJECTS AND METHODS

Spectrum analysis
Thirty-seven trained singers, 21 professional and 16 nonprofessional, were studied. Sixteen were males (9 baritones, 7 tenors) and 21 were females (7 mezzo-sopranos, 14 sopranos). Age of the 37 singers ranged from 19 to 60 years (mean 32.4 years, SD 10.9 years). Period of their voice training ranged from 1 to 42 years (mean 9.1 years, SD 8.8 years). The control group consisted of 10 male and 10 female nonsingers who had no voice training. Group demographics are shown in Table 1.

TABLE 1. Subjects

                     Singer
                     Professional   Nonprofessional   Nonsinger
Male                   8              8                10
  (Baritone)          (2)            (7)
  (Tenor)             (6)            (1)
Female                13              8                10
  (Mezzo-soprano)     (5)            (2)
  (Soprano)           (8)            (6)

Each individual was asked to phonate the sustained vowel sound /a/ and to sing the sustained vowel sound /a/ at a comfortable pitch and intensity. An ECM909 electret condenser microphone and TCD-D10 and TCD-D3 digital audio tape recorders (Sony, Tokyo) were used for data acquisition. The distance from the microphone was set at 20 cm. The frequency response of the microphone was from 50 Hz to 18 kHz. After the data were collected, voice samples were checked for peak clipping to ensure that an appropriate recording level had been used. The data were played back with a PCM-2500A digital audio tape recorder (Sony), transferred to a Macintosh IIx computer (Apple Computer, Cupertino, CA, U.S.A.) through a 16-bit analog-to-digital converter at a sampling rate of 44.1 kHz, and stored on an erasable optical disk. Digital audio tape recorders had completely flat frequency responses up to 20 kHz.

A steady 92.9-millisecond portion containing 4,096 data points of each sustained vowel /a/ was selected. The power spectrum of this portion was calculated by Fast Fourier Transform (FFT) using a Hanning window. The 4,096-point FFT has a resolution of 2,048 points in the range of 0 to 22,050 Hz and 10.8 Hz/point in the spectrum display. Spectrum analysis was performed utilizing a software (C language) program written by the members of the Ames Vocal Dynamics Laboratory (7). Figure 1 displays an example of the power spectrum of the vowel sound /a/ sung by a trained professional singer (baritone, aged 42 years). Between 0 and 2 kHz, two harmonics peaks were identified in the spectrum envelope at 580 Hz (P1) and at 950 Hz (P2). Between 2 and 4 kHz, two harmonics peaks were identified in the spectrum envelope at 2,280 Hz (P3) and at 2,850 Hz (P4). These harmonics peaks (P1, P2, P3, and P4) correspond to the first, second, third, and fourth formant, respectively, although the exact frequencies of the harmonics peaks were slightly different from the formant frequencies. The greatest harmonics peak between 2 and 4 kHz was termed singing power peak (SPP). Power ratio of SPP and the greatest peak between 0 and 2 kHz, termed singing power ratio (SPR), was calculated and expressed in dB. As shown in Fig. 1, power ratio of P4 and P1 was calculated for SPR in the representative case.

Perceptual analysis
Stored vowel sounds /a/ sung by the 37 singers were played back and perceptually judged in Test 1. Sung voice samples /a/ were processed using Sound Designer II software (Digidesign, Menlo Park, CA, U.S.A.) with a Macintosh IIcx computer (Apple Computer) in Tests 2 and 3. Digital filter function was utilized for varying power of the harmonics peaks.

In Test 2, power of the harmonics peaks between 2 and 4 kHz was reduced by 12 dB from the level of the original sung sample with the frequency range of 300 Hz in 37 singers. When two harmonics peaks were identified in the spectrum envelope between 2 and 4 kHz, the power of each of the two peaks was reduced. When only one harmonics peak was identified in the spectrum envelope between 2 and 4 kHz, power of the one peak was reduced. Figure 1 represents a power spectrum of the original sung sample /a/ of a male singer. Figure 2 shows a power spectrum of the processed sound sample with 12-dB reduction of power of P4 in contrast to his original sung sample. P1, P2, and P3 of the processed sound sample had the same spectrum as the original sung sample.

In Test 3, a sung vowel sound /a/ of the same male singer was also used as an original voice sample for further investigation. Power of SPP (P4 in this case) with the frequency range of 300 Hz was
reduced by 6, 12, 18, or 24 dB from the level of the original voice sample.

All the original sung samples and processed sound samples of singers were played back through a 16-bit D/A converter (Digidesign) and a Roommate II speaker (Bose). Frequency response of the speaker ranged from 50 Hz to 15 kHz. Each sample was judged by five experienced voice teachers on two semantic bipolar scales (ringing - dull, rich - thin). Degree of "ringing" was scored 1, 2, 3, 4, 5, 6, or 7, with 7 the best and 1 the worst ringing quality (dull voice). Degree of "richness" was scored 1, 2, 3, 4, 5, 6, or 7, with 7 the richest and 1 the least rich quality (thin voice).

FIG. 1. Power spectrum display of a sung voice sample /a/. SPP, singing power peak; SPR, singing power ratio; P1, P2, harmonics peaks between 0 and 2 kHz; P3, P4, harmonics peaks between 2 and 4 kHz. [Axes: power, 0 to -100 dB; frequency, 0 to 5 kHz.]

FIG. 2. Power spectrum display of the processed sound sample. P4, harmonics peak (original voice sample); P4', harmonics peak with 12-dB reduction (processed sample). [Axes: power, 0 to -100 dB; frequency, 0 to 5 kHz.]

RESULTS

Spectrum analysis
In the spectrum envelope between 2 and 4 kHz, two harmonics peaks were identified in 24 cases (15 males, 9 females) and only one peak was identified
in 13 cases (1 male, 12 female). There was no greater peak above 4 kHz than SPP in any case.

Table 2 summarizes all the data of SPR of sung and spoken voice samples in all subjects. Figure 3 illustrates SPR of sung samples in singers (professional and nonprofessional) and nonsingers. Statistical differences of SPR between singers and nonsingers were examined by analysis of variance (ANOVA). In males and females, SPR in singers was significantly greater than in nonsingers (p < 0.01). Statistical differences of SPR between professional and nonprofessional singers were also analyzed by ANOVA. In males and females, there were no significant differences of SPR between professional and nonprofessional singers.

TABLE 2. SPR of sung and spoken /a/ in singers and nonsingers

Group                    Subjects          Sung (mean ± SD)   Spoken (mean ± SD)
Nonsinger                Male (n = 10)     -21.1 ± 2.8        -22.4 ± 8.7
                         Female (n = 10)   -24.2 ± 6.4        -22.9 ± 6.1
                         Total (n = 20)    -22.7 ± 5.1        -22.7 ± 7.3
Nonprofessional singer   Male (n = 8)      -11.5 ± 8.2        -19.9 ± 9.1
                         Female (n = 8)    -16.9 ± 3.3        -25.4 ± 7.0
                         Total (n = 16)    -14.2 ± 6.7        -22.6 ± 8.3
Professional singer      Male (n = 8)      -11.8 ± 2.2        -18.1 ± 2.9
                         Female (n = 13)   -14.0 ± 4.4        -20.6 ± 4.8
                         Total (n = 21)    -13.1 ± 3.8        -19.7 ± 4.2

SPR, singing power ratio.

FIG. 3. Singing power ratio (SPR) of sung sample /a/ in singers and nonsingers. Prof.: professional singers; Non-prof.: nonprofessional singers. Separate symbols mark male and female subjects. [Axes: SPR, -40 to 10 dB; groups: singer (Prof., Non-prof.) and non-singer.]

Figure 4 illustrates SPR of spoken voice sample /a/ and sung voice sample /a/ in the 37 singers. Statistical differences of SPR between spoken sample and sung sample were analyzed using ANOVA. In male and female singers, SPR of the sung sample was significantly greater than that of the spoken sample (p < 0.01). Statistical differences of SPR between 16 male singers and 21 female singers were analyzed using ANOVA. Between male and female singers, there were no significant differences in SPR of the sung sample and in SPR of the spoken sample. However, SPR of the sung sample in soprano singers was significantly lower than that in other voice type singers (ANOVA, p < 0.01). Data of SPR of sung sample for each voice type were plotted in Fig. 5.

Relationships between SPR and the age and period of voice training were statistically analyzed in sung samples of the 37 singers. SPR had no relationship with the singer's age by Pearson's correlation coefficients. Figure 6 shows a relationship between SPR and the period of voice training. By ANOVA, there was a significant difference in SPR between the singers who had voice training <4 years and the singers who had voice training ≥4 years. Relationships between SPR of singer's sung sample and the acoustic parameters (fundamental frequency, frequency of SPP) were statistically analyzed by Pearson's correlation coefficients. SPR had no significant relationships with fundamental frequency and with frequency of SPP.

Relationship between voice type and frequency of SPP in sung samples of trained singers is shown in Fig. 7. By ANOVA, frequency of SPP in soprano singers was significantly higher than that in other voice type singers (p < 0.01). Relationship between fundamental frequency and frequency of SPP in sung samples of trained singers is shown in Fig. 8. By Pearson's correlation coefficients, frequency of SPP had a significant relationship with fundamental frequency (p < 0.01).

Perceptual analysis
In Test 1, perceptual scores of five listeners were averaged for each sample of original sung vowel
sound /a/ in the 37 singers. Intersubject relationship between SPR and averaged perceptual scores was statistically analyzed by Pearson's correlation coefficients for each semantic scale (ringing, richness). Ringing quality of original sung samples had a significant correlation to SPR (correlation coefficient 0.4285, p < 0.01), although there was no significant correlation to SPR in richness quality. Fig. 9 illustrates a significant relationship between SPR and perceptual scores of ringing in original sung samples.

In Test 2, perceptual scores of 5 listeners were averaged for each original and processed sample in the 37 singers. Intrasubject differences of averaged perceptual scores between the original and processed sample were analyzed for each semantic scale by ANOVA. Figure 10 represents perceptual scores of ringing of the original and processed sample in each subject. Perceptual scores of ringing quality in the processed sample were significantly worse than those in the original sample (p < 0.01). Perceptual scores of richness quality also had a significant difference between the original and processed sample (p < 0.01). In the 24 cases with two harmonics peaks identified in the spectrum envelope between 2 and 4 kHz, there were no significant differences of perceptual scores in the two semantic scales between the original voice samples and the processed sound samples of the smaller harmonics peaks (ANOVA).

In Test 3, intrasubject relationship between power of SPP and degree of perceptual scores was analyzed for each semantic scale by ANOVA. Table 3 shows perceptual scores of ringing quality for five listeners. As power of SPP became lower, the score worsened in all listeners. There were significant relationships between power of SPP and degree of ringing voice quality (p < 0.01). Power of SPP also had a significant relationship with degree of richness quality (p < 0.01).

FIG. 4. Singing power ratio (SPR) of spoken and sung sample /a/ in singers. Separate symbols mark male and female singers. [Axes: SPR (spoken /a/) against SPR (sung /a/), both -40 to 10 dB.]

FIG. 5. Voice type and singing power ratio (SPR) of sung sample in singers. Horizontal line, mean of SPR. [Axes: SPR (dB) by voice type: baritone, tenor, mezzo-soprano, soprano.]

DISCUSSION

This paper presents a new parameter for quantitatively evaluating singing voice quality. Based on the spectrum analysis, SPR of sung sample in singers was significantly greater than that in nonsingers, and SPR of sung sample was significantly greater than that of spoken sample in singers. From these results, SPR can represent the acoustic characteristic of singing voice quality in trained singers. Based on the intersubject study of perceptual analysis, SPR had a significant relationship with ringing
voice quality. Based on the intrasubject study of perceptual analyses, power of SPP had significant relationships with degree of ringing and richness voice quality. Therefore, it is possible to quantitatively document singing voice quality by the measurement of SPR. Hollien and co-workers measured the mean energy level within the 2,700- to 3,400-Hz frequency band contrasted to the total energy within the signal (8). Their artistic-level singers exhibited more relative energy within 2,700 to 3,400 Hz than did any nonsinger group. Although their method may provide a quantitative measure of power of singer's formant in some sense, the band width within 2,700 to 3,400 Hz is open to argument because the band width of singer's formant is inconclusive.

In spectrum analysis of sung /a/ and spoken /a/, it is well known that frequencies of the first and second formant are ≤2 kHz, and frequencies of the third, fourth, and fifth formant are >2 kHz (4). In the spectrum analysis of our current study, there was no greater peak >4 kHz than the greatest peak between 2 and 4 kHz. As reported earlier (1-4), the center frequency of the singer's formant varies from 2.2 to 3.8 kHz. The present study demonstrated that SPR separated the singers' group from the nonsingers' group and also separated sung from spoken voices. Therefore, the frequency range of our current study between 2 and 4 kHz was appropriate to identify the greatest peak in power spectrum that represents singing voice quality of trained singers.

FIG. 6. Period of voice training and singing power ratio (SPR) of sung sample in singers. [Axes: SPR (dB) by period of voice training: 0-2, 2-4, 4-6, 6+ years.]

FIG. 7. Voice type and frequency of singing power peak (SPP) of sung sample in singers. Horizontal line, mean of frequency of SPP. [Axes: frequency of SPP, 2000 to 4000 Hz, by voice type: baritone, tenor, mezzo-soprano, soprano.]

Sundberg (4) reported that the main acoustical contribution to the generation of the singer's formant stems from a clustering of the third, fourth, and fifth formants. Burns (9) demonstrated that opera singers lowered their fourth formant, creating a wide-band resonance area. In clustering of these formants, identification of the exact peak of singer's formant is difficult and meaningless. In the perceptual analysis of our current study, power of the greatest harmonics peak between 2 and 4 kHz (SPP) had a significant relationship to singing voice quality; the smaller harmonics peak did not. In cases that had only one harmonics peak in the spectrum envelope between 2 and 4 kHz, the one peak also affected singing voice quality without an extra formant. Power of the greatest harmonics peak in the frequency range between 2 and 4 kHz is important for singing voice quality regardless of the existence of the extra formant and the exact center frequency of the singer's formant.

FIG. 8. Relationship between fundamental frequency and frequency of singing power peak (SPP) in sung samples of singers. [Axes: frequency of SPP, 2000 to 4000 Hz, against fundamental frequency, 0 to 600 Hz.]

FIG. 9. Relationship between singing power ratio (SPR) and perceptual scores of ringing in the original sung samples in Test 1. [Axes: ringing score, 1 to 7, against singing power ratio, -30 to 10 dB.]

FIG. 10. Perceptual scores of ringing for the original voice sample and the processed sound sample of singing power peak (SPP) in Test 2. [Axes: score (processed) against score (original), 2 to 7.]

TABLE 3. Perceptual scores of ringing quality in Test 3

Power of SPP        A   B   C   D   E
Original            7   7   7   7   6
6-dB reduction      6   6   5   5   6
12-dB reduction     6   5   3   4   4
18-dB reduction     4   3   2   3   3
24-dB reduction     4   1   2   2   2

SPP, singing power peak. A, B, C, D, E: listeners.

Hollien reported that the singer's formant has a lower amplitude in female voices, particularly sopranos, than in male voices (8). From our current study, SPR in soprano singers was significantly lower than that in other voice type singers, although there was no significant difference in SPR between male and female singers. Hollien also reported that the power of the singer's formant appears to be closely related to variations in fundamental frequency. Our current study, however, showed that SPR had no significant correlation to fundamental frequency. The reason may be that SPR is not directly influenced by vocal fold vibration but by the
shape of vocal tract resonators. On the other hand, our current study demonstrated that frequency of SPP had significant relationships with fundamental frequency and with voice type. These findings are consistent with previous reports in which the center frequency of the singer's formant varies depending on the pitch and voice type (2,3,5).

Because "ringing" voice quality is essential for professional singers to be heard clearly over a large orchestra, electronic instruments, or background noise, they have long sought this quality and teachers have aimed to train it. From our perceptual analysis, SPR represents ringing voice quality and also affects richness voice quality in trained singers. SPR analysis provides a quantitative assessment of singing voice quality, based on the ratio of the power spectrum peaks, rather than simply on the presence or absence of the extra formant. Our current study also showed that SPR of well-trained singers, trained ≥4 years, was significantly greater than that of less trained singers. Measurement of SPR is informative to a singing teacher as well as a student through visual biofeedback from the spectrum analysis display. SPR analysis may help singing pedagogy refine vocal tract resonance. Although our discussion of SPR is a preliminary exploration of a new objective voice measurement strategy, it is a valid strategy for the quantitative measurement of a singing voice, including the soprano voice, that is not involved in the current definition of the singer's formant.

CONCLUSIONS

SPR represents an acoustic characteristic of trained singers' voices in spectrum analysis. SPR provides a quantitative measurement for evaluating singing voice quality, and it shows a distinctive relationship with period of voice training.

REFERENCES

1. Bartholomew WT. A physical definition of "good voice quality" in the male voice. J Acoust Soc Am 1934;6:25-33.
2. Seidner W, Schutte H, Wendler J, Rauhut A. Dependence of the high singing formant on pitch and vowel in different voice types. Proceedings of the Stockholm Music Acoustics Conference, 1983.
3. Sundberg J. Vocal tract resonance. In: Sataloff RT, ed. The professional voice: the science and art of clinical care. New York: Raven Press, 1991:49-68.
4. Sundberg J. The science of the singing voice. De Kalb, Illinois: Northern Illinois University Press, 1987.
5. Dmitriev L, Kiselev A. Relationship between the formant structure of different types of singing voices and the dimensions of the supraglottal cavities. Folia Phoniatr (Basel) 1979;31:238-41.
6. Sundberg J. Towards a definition of the singer's formant. Proceedings of the 23rd annual symposium: Care of the professional voice, 1994.
7. Shoji K, Regenbogen E, Yu JD, Blaugrund SM. High frequency power ratio of breathy voice. Laryngoscope 1992;102:267-71.
8. Hollien H. The puzzle of the singer's formant. In: Bless DM, Abbs JH, eds. Vocal fold physiology: contemporary research and clinical issues. San Diego: College-Hill, 1983:368-78.
9. Burns P. Acoustical analysis of underlying voice differences between two groups of professional singers: opera and country and western. Laryngoscope 1986;96:549-54.