Acoustic markers to differentiate gender in prepubescent children's
speaking and singing voice

Marco Guzman a,b,*, Daniel Munoz c, Martin Vivero d, Natalia Marín e, Mirta Ramírez e,
María Trinidad Rivera e, Carla Vidal e, Julia Gerhard f, Catalina González e

a School of Communication Sciences, University of Chile, Santiago, Chile
b Department of Otolaryngology, Voice Center, Las Condes Clinic, Santiago, Chile
c Barros Luco-Trudeau Hospital, Department of Network Management, Av. Jose Miguel Carrera 3604, Santiago, Chile
d Del Salvador Hospital, Department of Otolaryngology, Avenida Salvador 364, Providencia, Santiago, Chile
e Andres Bello National University, Fernandez Concha 700, Santiago, Chile
f Department of Otolaryngology, University of Miami, Miami, FL, USA

International Journal of Pediatric Otorhinolaryngology xxx (2014) xxx-xxx
http://dx.doi.org/10.1016/j.ijporl.2014.06.030

Article info
Article history:
Received 8 April 2014
Received in revised form 25 June 2014
Accepted 28 June 2014
Available online xxx
Keywords:
Children
Gender
Acoustic analysis
Perceptual analysis
Singing voice
Speaking voice

Abstract
Objectives: This investigation sought to determine whether there is any acoustic variable to objectively
differentiate gender in children with normal voices.
Methods: A total of 30 children, 15 boys and 15 girls, with perceptually normal voices were examined.
They were between 7 and 10 years old (mean: 8.1, SD: 0.7 years). Subjects were required to perform the
following phonatory tasks: (1) to phonate sustained vowels [a:], [i:], [u:], (2) to read a phonetically
balanced text, and (3) to sing a song. Acoustic analysis included long-term average spectrum (LTAS),
fundamental frequency (F0), speaking fundamental frequency (SFF), equivalent continuous sound level
(Leq), linear predictive coding (LPC) to obtain formant frequencies, perturbation measures, harmonic to
noise ratio (HNR), and cepstral peak prominence (CPP). Auditory perceptual analysis was performed by
four blinded judges to determine gender.
Results: No significant gender-related differences were found for most acoustic variables. Perceptual
assessment showed good intra- and inter-rater reliability for gender. Cepstrum for [a:], alpha ratio in text,
shimmer for [i:], F3 in [a:], and F3 in [i:] were the parameters that composed the multivariate logistic
regression model to best differentiate male and female children's voices.
Conclusion: Since perceptual assessment reliably detected gender, it is likely that other acoustic markers
(not evaluated in the present study) are able to make clearer gender differences. For example,
gender-specific patterns of intonation may be a more accurate feature for differentiating gender in
children's voices.
© 2014 Published by Elsevier Ireland Ltd.

* Corresponding author at: School of Communication Sciences, University of
Chile, Avenida Independencia 1027, Santiago, Chile. Tel.: +562 2978 6605.
E-mail addresses: guzmanvoz@gmail.com,
mguzman@med.uchile.cl (M. Guzman).

1. Introduction
Several acoustic differences have been found when comparing
adult male and female voices; fundamental frequency (F0) is one of
the most investigated parameters [1-4]. However, F0 is less widely
documented as a distinguishing parameter when reporting
gender-related differences in children. F0 has also been considered
a relevant feature in differentiating voices across age groups.
A number of studies have demonstrated a decrease in F0
from infancy and/or preschool through puberty [5-9]. Anatomical
modifications, specifically an increased length and mass of the
vocal folds, are the main explanations for the F0 changes in the human
voice. Regarding gender differences, there is some evidence to
suggest that male children, overall, have lower fundamental
frequency values than their female peers starting from about 7 to 8
years of age [10,11]. In a study with children between 8 and 10 years
of age, Whiteside et al. reported similar differences in F0 values
between genders, with lower frequency values for males
than for females [12]. On the other hand, speaking fundamental
frequency (SFF) extracted from running speech has not
been reliably associated with voice differences between genders
in children. Studies have reported only small intergender
differences, or even contradictory findings. In general, neither F0
nor SFF has been reported as a reliable predictor of gender in
children's voices [13-21].
Formant frequencies have also been studied as possible
acoustic markers of gender and age [5,6,22-25]. Children have
been shown to produce higher formant frequencies
than adult females, who, in turn, have higher formant
frequencies than adult males. Previous studies have shown that
formant frequencies decrease with age among children, with the
most evident change between 3 and 5 years of age [5,6]. There is
also some evidence to suggest that formant frequency
characteristics may play a role in the perceived gender of a
pre-adolescent child [22]. Girls have demonstrated higher values
than boys for vowel productions. The authors did not attribute
these differences to anatomical vocal tract shape; rather, they
attributed them to boys using a smaller jaw opening, more lip
rounding, and/or a lower larynx position than girls (producing a
relatively longer vocal tract) [23]. In an investigation conducted
by Huber et al., no clear differences in formant frequencies
between girls and boys were demonstrated. Nevertheless, the
authors pointed out that frequencies for the first three formants
decrease with age and that there is a tendency for girls to yield
higher values than boys of comparable age [24]. Sergeant et al.
showed a linear trend in which F1 moves downwards across the
4-11 years age range [25]. However, the authors did not find any
systematic or significant intergender differences in the formant
analysis, either for the whole sample or within any age group. It is
important to highlight that these findings were calculated from
sung production, not spoken utterances as in the previously cited
studies.
Prior work on instrumental measurements of voice in children
has also analyzed spectral energy distribution using the long-term
average spectrum [25-27]. Results have shown that spectral
energy levels at frequencies above 5.75 kHz decreased between the
ages of 4 and 11 years, while those at frequencies below 5.75 kHz
increased [26]. With regard to gender differences, a similar study
conducted by White aimed to report the actual and perceived
differences between boys and girls [27]. Outcomes showed that
there are differences between genders related to the spectral
curves; a boy-like sound produced a peak at 5 kHz, whereas a
girl-like sound produced a relative decrease of spectral energy at 5 kHz.
Interestingly, the same energy peak at 5 kHz existed in the spectra
of girls who were wrongly but confidently identified as boys.
Sound pressure level (SPL) has been used to identify possible
gender-related differences in children. Böhme et al. conducted a
study with the purpose of developing a standard childhood voice
profile describing the capacity of a healthy, vocally untrained
child's voice. Results established that boys between the ages of 7
and 10 phonated more loudly than girls [28]. SPL differences have
also been compared between different age groups. Stathopoulos
et al. reported that young children used higher SPL than young
adults when required to phonate at comfortable loudness levels
[7]. Moreover, McAllister et al. demonstrated that women and
older children approaching puberty produced a wider dynamic
range than 10-year-old children [29].
Even though a number of studies have attempted to find
acoustic markers to reliably detect gender-related differences in
children's voices, to date there are no conclusive results regarding
this issue. The present investigation sought to determine whether
there is any acoustic variable to objectively differentiate gender in
children with normal voices. To that end, we included
acoustic measures that have not been examined in earlier studies.
The topic of the present study may be of relevance since knowledge
about normality in boys' and girls' voices could add more
specific information for further treatment or more accurate voice
assessment.
2. Methods
2.1. Participants
A total of 30 children, 15 boys and 15 girls, with perceptually
normal voices were included. They were between 7 and 10
years old (mean: 8.1, SD: 0.7 years). The participants
were selected by convenience sampling.
Therefore, the sample size was determined using non-probabilistic
criteria; for practical reasons, it
was arbitrarily selected. Participants were recruited from several
primary schools. The severity of dysphonia was assessed with the
GRBAS scale by one of the authors of this article (MG), who has
more than 12 years of experience as a voice clinician. All
participants had a GRBAS score of 00000 (perceptually normal
voice) and no history of vocal difficulty for the last year. Some
minor deviations in voice quality were rated as normal. Although
38 subjects were initially recruited, eight of them did not meet the
inclusion criteria because of higher GRBAS scores. Therefore,
only 30 were included in the analysis. Parents were contacted
and informed about the aims and procedures of the study.
After this information was given, parents signed an informed consent form.
This study was reviewed and approved by the Andres Bello
University Institutional Review Board.
2.2. Voice recordings
All of the participants were asked to attend a single recording
session lasting no more than 30 min. Before recordings were
conducted, each subject was trained regarding the recording
process and phonatory tasks by one of the experimenters.
Individual demonstrations and verbal descriptions were provided.
Once all of the instructions were understood, children were asked
to enter a soundproof booth to complete the voice recordings.
The following protocol was performed: (1) production of sustained
vowels [a:], [i:], and [u:] for approximately five seconds each, (2)
reading of a phonetically balanced text for one minute, (3) singing
the song "Happy Birthday" for one minute. All phonatory tasks were
performed at the child's habitual loudness level.
Acoustic output was captured at a constant microphone-to-mouth
distance of 20 cm using a condenser omnidirectional
microphone (Samson MM01; Samson Technologies, Hauppauge,
NY) connected to an audio interface (Tascam US-122MKII; Teac
Corporation, Montebello, CA). Samples were recorded digitally at a
sampling rate of 44,000 Hz with 16 bits per sample quantization.
Samples were edited with the software GoldWave, version 5.57
(GoldWave Inc., St. John's, Newfoundland, Canada). The audio signal
was calibrated for subsequent sound level measurements using a
220 Hz tone at 80 dB produced with a sound generator. The SPL of this
reference sound was measured with a Brüel & Kjær 2250L sound
level meter (Brüel & Kjær Sound & Vibration Measurement,
Nærum, Denmark), also positioned at a distance of 20 cm from the
generator.
2.3. Acoustic analysis
To compare the samples recorded from male and female
participants, most acoustic measurements were made using Praat
software, version 5.2 (Boersma and Weenink, University of
Amsterdam, Amsterdam, The Netherlands). From long-term
average spectrum (LTAS) analysis, the following variables were
assessed: (1) The level difference between the F1 and F0 regions
(L1-L0) [30], which may also be described as the level difference
between 300-800 Hz and 50-300 Hz (Fig. 1). This level difference
provides information on the mode of phonation (degree of glottal
adduction). (2) The alpha ratio, the level difference between
50-1000 Hz and 1-5 kHz, which provides information on the
overall spectral slope declination (Fig. 2). (3) The energy level
difference between 1-5 kHz and 5-8 kHz (Fig. 3), which provides
information about glottal noise (breathy voice quality). All LTAS
variables were obtained from both reading and singing tasks.
A frequency bandwidth of 25 Hz and a Hanning window were used
for LTAS analysis. Unvoiced sounds and pauses were automatically
eliminated from the samples by Praat using the
pitch-corrected version with standard settings.
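
The three LTAS variables described above are all differences between band levels. The following is a minimal sketch of that arithmetic, not the Praat procedure used in the study. It assumes the LTAS is already available as two parallel lists (bin center frequencies in Hz and linear power per bin); the function names, the summing of power within each band, and the direction of each subtraction are illustrative assumptions.

```python
import math


def band_level_db(freqs_hz, powers, lo_hz, hi_hz):
    """Level in dB of the summed linear power in the band [lo_hz, hi_hz).

    The band is half-open so that a bin lying exactly on a shared edge
    (for example 1000 Hz) is counted in only one of two adjacent bands.
    """
    total = sum(p for f, p in zip(freqs_hz, powers) if lo_hz <= f < hi_hz)
    if total <= 0:
        raise ValueError("no energy found in the requested band")
    return 10.0 * math.log10(total)


def ltas_measures(freqs_hz, powers):
    """Return the three band-level differences (in dB) named in the text."""

    def level(lo_hz, hi_hz):
        return band_level_db(freqs_hz, powers, lo_hz, hi_hz)

    return {
        # Assumed direction: F1 region minus F0 region.
        "L1-L0": level(300, 800) - level(50, 300),
        # Assumed direction: low band minus high band.
        "alpha_ratio": level(50, 1000) - level(1000, 5000),
        "1-5/5-8 kHz": level(1000, 5000) - level(5000, 8000),
    }
```

With 25 Hz bins, as in the analysis bandwidth reported above, `freqs_hz` would simply be 0, 25, 50, ... Hz. A steeper spectral slope then shows up as a larger value of `alpha_ratio` under the sign convention assumed here.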
From the spectral analysis window (spectrogram obtained from the
view and edit command in Praat), the following variables were
assessed: equivalent continuous sound level (Leq) from text
reading (by the get intensity command), mean speaking fundamental
frequency (SFF) from text reading (by the get pitch command), and
mean fundamental frequency (F0) during sustained vowels (by the get
pitch command). The resulting pitch curve was checked by visual
inspection before calculation. The time window for spectral
analysis varied depending on the phonatory task. After Leq
values were obtained from Praat, calibration was done using values captured
with the sound level meter. Perturbation measures (jitter % and
shimmer %) and harmonic to noise ratio (HNR) were obtained from
Praat. Linear predictive coding (LPC) was performed to
obtain the formant frequencies from F1 to F4. The fifth formant
frequency (F5) was not calculated because of the lack of accuracy shown
by Praat. FFT was used to corroborate all formant frequency values.
Cepstral peak prominence (CPP) was also assessed in sustained
vowels [a:], [i:], and [u:]. Because the cepstral peak is a short-term
measurement obtained at a specific point of the voice
waveform, three different points in the middle section of each vowel
waveform were taken and averaged for every audio sample. A Kay
Computerized Speech Laboratory (CSL) and Multi-Speech software
(KayPENTAX, Lincoln Park, NJ) were used to calculate CPP.
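
The Leq calibration step can be illustrated with a short sketch. It assumes that calibration amounts to adding a constant offset, chosen so that the recorded reference tone reads the 80 dB measured with the sound level meter. The function names and the sample-based level computation are illustrative assumptions; the study itself took Leq values from Praat.

```python
import math


def level_db(samples):
    """Uncalibrated equivalent level: 10*log10 of the mean squared amplitude."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def calibrated_leq(samples, reference_samples, reference_spl_db=80.0):
    """Shift the level of `samples` so the reference tone reads reference_spl_db.

    Assumes the recording gain and the microphone distance were identical
    for the reference tone and for the voice sample.
    """
    offset = reference_spl_db - level_db(reference_samples)
    return level_db(samples) + offset
```

For example, a signal with half the amplitude of the reference tone comes out about 6 dB below the reference level, that is, at roughly 74 dB.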
2.4. Auditory perceptual analysis
Audio samples obtained from sustained vowel /a/, text reading,
and singing (a total of 90 samples) were perceptually assessed by
four blinded raters. Additionally, 20 percent of the samples were
randomly repeated in order to determine whether judges were
consistent in their perceptions (intra-rater reliability analysis).
Judges were not informed about the purpose of the study. The order
of recordings was randomized to avoid recognition of any pattern.
Raters were required to determine whether the voice sample
belonged to a girl or a boy. Raters could replay each sample as many
times as they wanted before making their determination and
moving on to the next recording. The evaluation was performed in a
quiet room using a high-quality loudspeaker (Audioengine, Sao
Paulo, Brazil). All the listeners reported normal hearing.
2.5. Statistical analysis
Descriptive statistics were calculated for the variables,
including mean and standard deviation. The kappa test was performed to
assess inter- and intra-rater concordance for gender. A cut point of
>0.60 was used to determine good reliability. Fisher's exact test
was used to compare proportions between male and female
accuracy by individual judges and overall. In order to obtain the
vocal characteristics that distinguished between girls' and boys'
voices, the relationship of each acoustic variable with
gender was first analyzed univariately by t-test, and the variables were
then analyzed multivariately to assess their joint association. A multivariate logistic
regression model was fitted with the variables whose t-test p-value
in univariate analysis was below 0.25 (Hosmer and Lemeshow criterion).
Then, a stepwise technique with a retention probability of 0.2 was
used. Odds ratio, sensitivity, specificity, positive predictive value
(PPV), negative predictive value (NPV) and receiver operating
characteristic (ROC) curve analysis were reported. All analyses
were performed using Stata 13.1 (StataCorp, College Station, TX).
p < 0.05 was considered to be statistically significant, and all
reported p values were two-sided.
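
To make the reliability statistic concrete, the sketch below computes an agreement kappa for two sets of categorical ratings, for example a judge's first and repeated ratings of the same samples. The text does not state which kappa variant was computed in Stata; Cohen's kappa is assumed here, and the function name is illustrative.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of samples on which the two ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the marginal label frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)
```

Under the cut point given above, a value returned by `cohen_kappa` would be read as good reliability when it exceeds 0.60.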
3. Results
3.1. Acoustic analysis
Values of F0 and formant frequencies (F1-F3) extracted from
sustained vowels are displayed in Table 1. No significant
differences were found between male and
female participants for these acoustic variables, with the exception of F3 in vowel [i:]
(p = 0.0156) and F0 in vowel [u:] (p = 0.0183). There was no
significant difference between male and female children with
regard to SFF (p = 0.5775) and Leq (p = 0.1269) obtained from the
reading of a phonetically balanced text. For SFF, boys obtained a mean (SD) of
251.60 (38.54) Hz and girls 257.86 (19.13) Hz. For Leq, boys obtained
77.87 (4.53) dB and girls 80.09 (3.06) dB. Results of alpha ratio, L1-L0,
and 1-5/5-8 kHz difference are summarized in Table 2. No
significant differences were found for these LTAS markers. Cepstral
analysis obtained from all sustained vowels did not show any
difference between boys and girls. Table 3 displays results from
perturbation measures (jitter and shimmer) and harmonic to noise
ratio. No differences were detected between males and females for
any of these parameters. Both boys and girls demonstrated higher
values for shimmer compared to normal values for adults
(approximately 3%).

Fig. 1. Spectrum showing the alpha ratio.
Fig. 2. Spectrum showing the L1-L0 ratio.
Fig. 3. Spectrum showing the 1-5/5-8 kHz difference.
3.2. Auditory perceptual analysis
Results from auditory perceptual assessment performed by four
blinded judges are shown in Table 4. Kappa values indicated that
there was good intra- and inter-rater reliability for gender. Results
from Fisher's exact test to compare proportions between male and
female accuracy by individual judge and overall were as follows:
judge 1: 67.64%; male = 63.43%; female = 76.47% (p = 0.0021),
judge 2: 83.33%; male = 61.76%; female = 94.11% (p < 0.0001),
judge 3: 76.47%; male = 50.0%; female = 94.11% (p < 0.0001), judge
4: 78.43%; male = 63.63%; female = 86.48% (p < 0.0001), and
overall: 76.47%; male = 60.72%; female = 87.13% (p < 0.0001).
Furthermore, regarding type of phonatory task, results showed
that gender was correctly identified in 84.55% of cases in text reading,
80.88% in singing, and 63.23% in sustained vowel production.
3.3. Multivariate logistic regression model
The acoustic variables that met the Hosmer and Lemeshow
criterion and were retained by stepwise selection were cepstrum [a:], alpha ratio text,
shimmer [i:], F3 [a:], and F3 [i:]. Therefore, these parameters
compose the predictive model for the present study. Numerical
results are shown in Table 5 and Fig. 4. Moreover, this model
obtained a sensitivity = 80.00%, specificity = 86.67%, positive
predictive value = 85.71%, and negative predictive value = 81.25%.
Fig. 5 shows results from the sensitivity/specificity analysis, that is, the
relationship between the sensitivity and specificity of the logistic
model in predicting the sex of the individual according to their vocal
characteristics. Results from the receiver operating
characteristic (ROC) curve analysis are reported in Fig. 6. This
figure shows the sensitivity and specificity of the logistic
model in predicting the sex of the individual from the
predictor variables; the model's discriminative ability is also reflected in the high area under the
ROC curve (0.89).
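
The four classification figures reported above follow from a 2 x 2 confusion matrix. The sketch below shows the arithmetic. The counts passed to it are not reported in the text: they are inferred as a split of the 15 + 15 children that reproduces the reported percentages, and which gender was coded as the positive class is left open as an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly classified
        "specificity": tn / (tn + fp),  # negatives correctly classified
        "ppv": tp / (tp + fp),          # predicted positives that are correct
        "npv": tn / (tn + fn),          # predicted negatives that are correct
    }


# Inferred counts (an assumption): 12 of 15 positives and 13 of 15 negatives
# classified correctly by the logistic model.
metrics = diagnostic_metrics(tp=12, fn=3, tn=13, fp=2)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

With these counts, `percentages` matches the reported 80.00%, 86.67%, 85.71% and 81.25%.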
Table 1
Values of F0 and formant frequencies (F1-F3) extracted from sustained vowels (mean ± SD). Boys and girls comparison.
Vowel  Parameter (Hz)  Boys               Girls              p-Value
[a:]   F0              233.25 ± 40.69     248.25 ± 27.45     0.2465
       F1              820.16 ± 139.82    888.28 ± 148.32    0.2062
       F2              1574.97 ± 147.07   1621.31 ± 100.35   0.3221
       F3              3080.07 ± 307.93   2825.67 ± 590.79   0.1503
       F4              5396.33 ± 536.65   5269.33 ± 414.80   0.4744
[i:]   F0              238.31 ± 42.31     265.28 ± 29.89     0.0535
       F1              384.05 ± 82.74     414.89 ± 84.05     0.3199
       F2              2426.77 ± 477.20   2294.27 ± 707.27   0.5524
       F3              3287.60 ± 229.11   3472.50 ± 157.37   0.0156
       F4              5466 ± 558.56      5361.26 ± 288.85   0.5241
[u:]   F0              230.34 ± 46.77     265.29 ± 27.00     0.0183
       F1              448.78 ± 78.73     453.91 ± 59.12     0.8414
       F2              1134.24 ± 317.29   966.97 ± 206.41    0.0980
       F3              2890.38 ± 191.85   2665.36 ± 567.03   0.1566
       F4              6388.2 ± 556.88    6452.86 ± 223.76   0.6796
Table 2
Results of alpha ratio and L1-L0 (mean ± SD). Boys and girls comparison.
                         Boys            Girls           p-Value
Alpha ratio text (dB)    18.90 ± 2.39    17.13 ± 3.14    0.0926
Alpha ratio song (dB)    18.42 ± 2.83    17.08 ± 3.11    0.2278
L0-L1 text (dB)          2.37 ± 4.43     3.36 ± 3.37     0.4963
L0-L1 song (dB)          1.28 ± 4.03     2.75 ± 2.60     0.2448
1-5/5-8 kHz text (dB)    17.57 ± 2.80    16.02 ± 3.19    0.2691
1-5/5-8 kHz song (dB)    16.58 ± 4.33    17.41 ± 3.20    0.5538
Table 3
Values of perturbation measures (jitter and shimmer) and harmonic to noise ratio (mean ± SD).
Boys and girls comparison.
              Vowel   Boys            Girls           p-Value
Shimmer (%)   [a:]    8.04 ± 2.71     7.12 ± 3.79     0.4493
              [i:]    6.38 ± 2.78     4.74 ± 1.44     0.0533
              [u:]    5.33 ± 1.75     4.71 ± 1.39     0.2943
Jitter (%)    [a:]    0.56 ± 0.24     0.48 ± 0.28     0.4150
              [i:]    0.63 ± 0.40     0.64 ± 0.67     0.9687
              [u:]    0.42 ± 0.11     0.53 ± 0.32     0.2517
HNR (dB)      [a:]    14.38 ± 3.38    14.8 ± 3.90     0.7373
              [i:]    17.02 ± 3.93    17.56 ± 2.40    0.6579
              [u:]    21.81 ± 2.52    21.33 ± 1.97    0.5652
Table 4
Results from auditory perceptual assessment performed by four blinded judges.
             Judge 1    Judge 2    Judge 3    Judge 4    Kappa
Vowel [a:]   k = 0.48   k = 0.95   k = 0.73   k = 0.81   0.68
Text         k = 0.67   k = 1      k = 0.86   k = 0.81   0.83
Song         k = 0.35   k = 1      k = 0.62   k = 1      0.74
Table 5
Estimated results from multivariate logistic regression model for gender prediction.
Variable                Odds ratio   [95% CI]        p-Value
Shimmer [i:] (%)        2.36         [1.14-4.88]     0.019
Cepstrum [a:] (%)       0.62         [0.38-0.99]     0.046
F3 [a:] (Hz)            1.001        [0.99-1.004]    0.156
F3 [i:] (Hz)            0.73         [0.62-0.81]     0.001
Alpha ratio text (dB)   0.71         [0.57-0.94]     0.002
4. Discussion
The present investigation examined several acoustic variables
as possible objective markers of gender in children's voices. To the
best of our knowledge, this is the first study to include as possible
acoustic markers the cepstral peak prominence and parameters
related to spectral slope. It is also the first attempt to include
speaking and singing voice together. To examine whether the
acoustic variables were sensitive to gender, we compared SFF, the
first four formant frequencies, cepstral peak prominence, Leq, alpha
ratio, L1-L0 difference, 1-5/5-8 kHz difference, jitter, shimmer, and
HNR. Inspection of the results revealed that most acoustic variables
did not differ significantly between male and female voices. The
multivariate logistic regression analysis showed that cepstrum
during sustained vowel [a:], alpha ratio extracted from reading,
shimmer during sustained vowel [i:], F3 during vowel [a:], and F3
during vowel [i:] are the only parameters that together could be
considered good predictors of gender in the present study.
In general, F0 during sustained vowels and SFF extracted from
text reading did not differ significantly between male and female
children. Even though some previous investigations have
reported differences in F0 between boys and girls, our findings are
in good agreement with most earlier studies. F0 and SFF have not
been found to be accurate acoustic variables for detecting
gender-related differences in prepubescent children's voices [13-21].
A possible explanation is provided by Bennett [14], whose results
showed that F0 decreased with age. Nevertheless, the decrease
was only 12 Hz with a standard deviation of 8 Hz, suggesting that
the between-subject standard deviation values were larger than
the age-related changes that occurred over a period of time.
A number of studies have reported that formant frequencies are
good acoustic indicators of gender for male and female adults
[5,6,12,22-25,31-33]. However, they do not appear to contribute
in the same way to the identification of speaker gender in children.
Although there are some data reporting a tendency for girls to have
higher formant frequencies than boys [8,12], authors
have noted that the differences are small. Our data showed that
only F3 in vowel [i:] differed significantly between boys and
girls. According to the acoustic theory of speech, the formant
frequencies depend on the length of the vocal tract and the
cross-sectional shape of the vocal tract as a function of its length [33].
Vocal tract length determines the average spacing of formant
frequencies; as the vocal tract becomes shorter, the
formant frequencies increase. Conversely, as it
becomes longer, the formant frequencies decrease.
Findings related to morphology of the vocal tract may support the
lack of gender-related differences in children. Fitch et al. found that
differences begin to become established between 10.3 and 14.5 years of
age [34]. Yang et al. have reported similar outcomes using
magnetic resonance imaging [35]. Furthermore, Lee et al. observed
that differentiation in formant frequencies begins at around 11
years [36]. Since participants in the present study were between 7
and 10 years old, these morphological outcomes could be a suitable
explanation for our data.
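
The inverse relation between vocal tract length and formant frequencies can be illustrated with the textbook model of a uniform tube that is closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1) c / (4 L). This idealized model, the speed of sound of 350 m/s, and the tube lengths used below are assumptions for illustration only; none of them are measurements from the present study.

```python
def tube_formants_hz(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances of a uniform tube closed at one end and open at the other.

    Implements F_n = (2n - 1) * c / (4 * L) for n = 1 .. n_formants.
    """
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# Hypothetical lengths: a longer and a shorter vocal tract.
longer_tract = tube_formants_hz(17.5)   # about 500, 1500 and 2500 Hz
shorter_tract = tube_formants_hz(12.5)  # about 700, 2100 and 3500 Hz
```

In this model every formant scales with 1/L, so two groups with similar vocal tract lengths would be expected to show similar formant frequencies, which is consistent with the morphological explanation given above.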
Spectral energy distribution using LTAS has been widely applied
in different types of studies regarding speaker recognition [37,38],
voice qualities [39], voice disorders [40-42], aging voice [43,44],
evaluation of techniques of voice therapy [42,45,46], and gender
difference detection [25,27,47-49]. To the best of our knowledge,
only two studies have reported clear spectral differences in
children's voices regarding gender [25,27]. White [27] observed in
LTAS analysis a peak at 5 kHz for boys, and a flat spectrum at 5 kHz
for girls. Comparable results were found by Sergeant et al. [25].
The authors found higher spectral energy for boys than girls in several
spectral bands. These differences were found for the age groups
6-8 and 9-11 years. No significant differences were observed for the
youngest children (aged 4-5 years) [25]. In contrast, results
from the present study did not show any significant difference for
LTAS parameters, either for speaking or for singing voice tasks.
Since the L1-L0 difference provides information on the mode of
phonation and the alpha ratio provides information on the overall
spectral slope (both related to functional glottal characteristics), it
is likely that our subjects do not have any major difference in
patterns of glottal closure. Moreover, a possible explanation for the
lack of gender-related differences may be the fact that we did not
analyze specific spectral bands as earlier investigations
did. In our study, only spectral slope measures were carried out.
Possibly, the analysis of specific bands of the spectrum is more
sensitive in detecting spectral differences between boys and girls. The
main reason to include spectral slope variables in the present study
was the fact that these markers have not been investigated before
to detect gender-related differences in children.

Fig. 4. Multivariate logistic regression model results plot.
Fig. 5. Sensitivity/specificity analysis for the multivariate logistic regression model.
Fig. 6. Receiver operating characteristic (ROC) curve analysis.
Equivalent level (Leq) did not differ between
boys and girls during the text reading task in our subjects. These
findings are in line with previous studies. Sergeant et al. observed
no gender differences for any age group [25]. It is important to
highlight that those results were obtained from singing voice
samples. Similar findings in speaking voice samples were reported
by Glaze et al. [50]. However, the opposite has also been reported.
Böhme et al. established that boys between the ages of 7 and 10
phonate more loudly than girls [28].
Perturbation measures were also analyzed in this study.
Outcomes showed that boys and girls did not differ in jitter and
shimmer values. Comparable results were reported by Nicollas
et al. [51] in a study that aimed to investigate possible changes of the
normal voice in children before mutation. Glaze et al. [52] likewise
observed no statistically significant age-related differences.
Additionally, they found that jitter was the only
acoustic parameter measured that fell within the normal adult
range. The jitter values reported were lower than values from the
normal adults tested [52].
The present study is the first one using cepstral analysis as a
possible acoustic marker to differentiate gender in children's
voices. The cepstrum is defined as a Fourier transformation of a
(logarithmic) spectrum [53,54]. A strong cepstral peak (high value) is obtained
from a voice characterized by a well-defined harmonic structure
(normal voice). On the other hand, a breathy and hoarse voice has
a poorly defined harmonic structure; hence the cepstral peak is
weak (low value). The reason to include cepstral analysis as a
possible acoustic marker in this study is that
previous investigations have reported that cepstral peak value is
the best predictor of overall dysphonia in comparison to
perturbation and noise measures [55-58]. Additionally,
cepstrum-related measures have shown strong correlations to
dysphonia severity in different voice disorders [59-63]. Our
results showed significant differences only for vowel [i:]. Since no
previous investigations have used CPP to differentiate gender, no
comparisons are feasible.
In addition to the t-tests used in our statistical analysis to compare
acoustic variables, a multivariate logistic regression analysis was
performed in order to obtain a predictive model to best
differentiate male and female children's voices. Results showed
that this predictive model is composed of cepstrum during
sustained vowel [a:], alpha ratio extracted from reading, shimmer
during sustained vowel [i:], F3 during vowel [a:], and F3 during
vowel [i:]. Even though it is generally appropriate to analyze acoustic
markers using a univariate analysis, it is better to consider a
multivariate model when prediction of gender is targeted, since
voice is a complex phenomenon (composed of several features
that coexist).
Results from auditory perceptual assessment indicated that
blinded judges reliably detected gender. Since most acoustic
variables included in our study did not differentiate gender, it is
likely that other acoustic markers (not evaluated in the present
study) are able to make clearer differences. For example,
gender-specific patterns of intonation may be a more accurate feature for
differentiating gender in children's voices.
There is good evidence that boys and girls use intonation
differently. Key found that when children read a story, girls
showed significantly more expressive intonation than boys [64].
Similarly, Ferrand et al. [65] reported that there are clear
gender-related differences in the number and extension of flat periods in
intonation. Boys showed more restricted intonational patterns
than girls. Similar differences in intonation have also been found in
adults [13]. The fact that, in our study,
gender was correctly identified more often in text reading (84.55%)
than in the sustained vowel task (63.23%) supports the assumption that
intonational patterns may help gender detection.
5. Conclusion
Comparison of spectral measures, cepstral peak prominence,
perturbation, glottal noise, F0, intensity and formant frequencies between
genders revealed no significant differences for most parameters. As
earlier acoustic studies have indicated, there are no clear
differences between boys' and girls' voices. A multivariate approach
seems to be a better option than univariate analysis when comparing
children's voices. Since perceptual assessment reliably detected
gender in our study as well as in previous studies, it is likely that
other acoustic markers (not evaluated in the present study) are
able to make clearer differentiations between boys' and girls' voices.
Gender-specific patterns of intonation may be a more accurate
feature for differentiating gender in children's voices.
References
[1] K. Wilcox, Y. Horii, Age and changes in vocal jitter, J. Gerontol. 35 (1980) 194-198.
[2] Y. Horii, Fundamental frequency perturbation observed in sustained phonation,
J. Speech Hear Res. 22 (1979) 5-19.
[3] Y. Horii, Jitter and shimmer differences among sustained vowel phonations,
J. Speech Hear Res. 25 (1982) 12-14.
[4] D. Sorensen, Y. Horii, Frequency and amplitude perturbation in the voices of
female speakers, J. Commun. Disord. 16 (1983) 57-61.
[5] G.E. Peterson, H.L. Barney, Control methods used in a study of the vowels,
J. Acoust. Soc. Am. 24 (1952) 175-184.
[6] S. Eguchi, I.J. Hirsh, Development of speech sounds in children, Acta Otolaryngol.
257 (1969) 1-51.
[7] E.T. Stathopoulos, C.M. Sapienza, Developmental changes in laryngeal and
respiratory function with variations in sound pressure level, J. Speech Hear Res. 40
(1997) 595-614.
[8] J.E. Huber, E.T. Stathopoulos, G.M. Curione, T.A. Ash, K. Johnson, Formants of
children, women, and men: the effects of vocal intensity variation, J. Acoust. Soc.
Am. 106 (1999) 1532-1542.
[9] D. Sergeant, G.F. Welch, Age-related changes in long-term average spectra of
children's voices, J. Voice 22 (2008) 658-670.
[10] C.S. Hasek, S. Singh, T. Murry, Acoustic attributes of preadolescent voices,
J. Acoust. Soc. Am. 68 (1980) 1262-1265.
[11] D.K. Wilson, Voice Problems of Children, Williams and Wilkins, Baltimore, MD,
1987.
[12] S.P. Whiteside, C. Hodgson, Some acoustic characteristics in the voices of 6- to
10-year-old children and adults: a comparative sex and developmental
perspective, Logoped. Phoniatr. Vocol. 25 (2000) 122-132.
[13] J.D. Avery, J.M. Liss, Acoustic characteristics of less masculine-sounding male
speech, J. Acoust. Soc. Am. 99 (1996) 3738-3748.
[14] S. Bennett, A 3-year longitudinal study of school-aged children's fundamental
frequencies, J. Speech Hear Res. 26 (1983) 137-142.
[15] S. Bennett, B. Weinberg, Sexual characteristics of pre-adolescent children's voices,
J. Acoust. Soc. Am. 65 (1979) 179-189.
[16] D. Ingrisano, G. Weismer, G.H. Schucker, Sex identification of preschool children's
voices, Folia Phoniatr. 32 (1980) 61-69.
[17] P.A. Busby, G.L. Plant, Formant frequency values of vowels produced by
preadolescent boys and girls, J. Acoust. Soc. Am. 97 (1995) 2603-2606.
[18] T.L. Perry, R.N. Ohde, D.H. Ashmead, The acoustic basis for gender identification
from children's voices, J. Acoust. Soc. Am. 109 (2001) 2988-2998.
[19] R.O. Coleman, A comparison of contribution of two vocal characteristics to the
perception of maleness and femaleness in the voice, J. Speech Hear Res. 19 (1976)
168-180.
[20] B. Weinberg, M. Zlatin, Speaking fundamental frequency characteristics of 5-6
year old children with mongolism, J. Speech Hear Res. 13 (1970) 418-425.
[21] D.N. Sorenson, A fundamental frequency investigation of children ages 6-10 years
old, J. Commun. Disord. 22 (1989) 115-123.
[22] S. Bennett, B. Weinberg, Acoustic correlates of perceived sexual identity in
preadolescent children's voices, J. Acoust. Soc. Am. 66 (1979) 989-1000.
[23] S. Bennett, Vowel formant frequency characteristics of preadolescent males and
females, J. Acoust. Soc. Am. 69 (1981) 231-238.
[24] J.E. Huber, E.T. Stathopoulos, G.M. Curione, T.A. Ash, K. Johnson, Formants of
children, women and men: the effect of vocal intensity variation, J. Acoust. Soc.
Am. 106 (1999) 1532-1542.
[25] D.C. Sergeant, G.F. Welch, Gender differences in long-term average spectra of
children's singing voices, J. Voice 23 (2009) 319-336.
[26] D.C. Sergeant, G.F. Welch, Age related changes in the long-term average spectra of
children's voices, J. Voice 22 (2008) 658-670.
[27] P. White, Long-term average spectrum analysis of sex- and gender-related
differences in children's voices, Logoped. Phoniatr. Vocol. 26 (2001) 97-101.
[28] G. Böhme, G. Stuchlik, Voice profiles and standard voice profile of untrained
children, J. Voice 9 (1995) 304-307.
[29] A. McAllister, E. Sederholm, J. Sundberg, P. Gramming, Relations between voice
range profiles and physiological and perceptual voice characteristics in ten-year
old children, J. Voice 8 (1994) 230-239.
[30] P. Kitzing, LTAS criteria pertinent to the measurement of voice quality, J. Phon. 14
(1986) 477-482.
[31] D.G. Childers, K. Wu, Gender recognition from speech: Part II. Fine analysis,
J. Acoust. Soc. Am. 90 (1991) 1841-1865.
[32] D. Deterding, The formants of monophthong vowels in standard southern British
English pronunciation, J. Int. Phon. Assoc. 27 (1997) 47-55.
[33] R. Kent, Vocal tract acoustics, J. Voice 7 (1993) 97-117.
[34] W.T. Fitch, J. Giedd, Morphology and development of the human vocal tract: a
study using magnetic resonance imaging, J. Acoust. Soc. Am. 106 (1999)
1511-1522.
[35] C.-S. Yang, H. Kasuya, Speaker individualities of vocal tract shapes of
Japanese vowels measured by magnetic resonance images, in: Presented at:
The Fourth International Conference on Spoken Language Process, October
3-6, 1996, Philadelphia, PA, 1996, Available at:
http://www.isca-speech.org/archive.
[36] S. Lee, A. Potamianos, S. Narayanan, Acoustics of children's speech:
developmental changes of temporal and spectral parameters, J. Acoust. Soc. Am. 105
(1999) 1455-1468.
[37] W. Majewski, H. Hollien, Speaker identification by long-term spectra under
normal and distorted speech conditions, J. Acoust. Soc. Am. 62 (1977) 975-979.
[38] J. Zalewski, W. Majewski, H. Hollien, Cross correlation of long-term speech spectra
as a speaker identification technique, Acustica 34 (1975) 20-24.
[39] J. Wendler, A. Rauhut, J. Kruger, Classification of voice qualities, J. Phon. 14 (1986)
483-488.
[40] K. Tanner, N. Roy, A. Ash, E.H. Buder, Spectral moments of the long-term
average spectrum: sensitive indices of voice change after therapy, J. Voice 19
(2005) 211-222.
[41] K. Idzebski, Overpressure and breathiness in spastic dysphonia, Acta Otolaryngol.
97 (1984) 373-378.
[42] D. Hartl, S. Hans, J. Vaissiere, D. Brasnu, Objective acoustic and aerodynamic
measures of breathiness in paralytic dysphonia, Eur. Arch. Otorhinolaryngol. 260
(2003) 175-182.
[43] S. Linville, J. Rens, Vocal tract resonance analysis of aging voice using the long
term average spectra, J. Voice 15 (2001) 323-330.
[44] P.T. Da Silva, S. Master, S. Andreoni, P. Pontes, L.R. Ramos, Acoustic and long-term
average spectrum measures to detect vocal aging in women, J. Voice 25 (2011)
411-419.
[45] P. De Jonkere, Recognition of hoarseness by means of LTAS, Int. J. Rehabil. Res. 6
(1983) 343-345.
[46] S. Master, N. De Blaise, V. Pedrosa, B.M.C. Chiari, The long-term-average spectrum
in research and in the clinical practice of speech therapists, Pro-Fono Rev. Attual.
Cient. 18 (2006) 111-120.
[47] A. Bladon, Acoustic phonetics, auditory phonetics, speaker sex and speech
recognition - a thread, in: F. Fallside, A. Woods (Eds.), Computer Speech Processing,
Prentice-Hall, Englewood Cliffs, NJ, 1983, pp. 29-38.
[48] D. Klatt, Detailed spectral analysis of female voice, J. Acoust. Soc. Am. 81 (1986)
S80.
[49] D. Klatt, L. Klatt, Analysis, synthesis and perception of voice quality variations
among female and male talkers, J. Acoust. Soc. Am. 87 (1990) 820-857.
[50] L. Glaze, D. Bless, R. Susser, Acoustic analysis of vowel and loudness differences in
children's voice, J. Voice 4 (1990) 37-44.
[51] R. Nicollas, R. Garrel, M. Ouaknine, A. Giovanni, J-M.B. Nazarian Triglia, Normal
voice in children between 6 and 12 years of age: database and nonlinear analysis,
J. Voice 22 (2007) 671-675.
[52] L. Glaze, D. Bless, P. Milenkovic, R. Susser, Acoustic characteristics of children's
voice, J. Voice 2 (1988) 312-319.
[53] J. Hillenbrand, R.A. Cleveland, R.L. Erickson, Acoustic correlates of breathy vocal
quality, J. Speech Hear Res. 37 (1994) 769-778.
[54] J. Hillenbrand, R.A. Houde, Acoustic correlates of breathy vocal quality, J. Speech
Hear Res. 39 (1996) 311-321.
[55] Y.D. Heman-Ackah, R.J. Heuer, D.D. Michael, Cepstral peak prominence: a more
reliable measure of dysphonia, Ann. Otol. Rhinol. Laryngol. 112 (2003) 324-333.
[56] Y.D. Heman-Ackah, D.D. Michael, G.S. Goding Jr., The relationship between
cepstral peak prominence and selected parameters of dysphonia, J. Voice 16
(2000) 20-27.
[57] Y.D. Heman-Ackah, Reliability of calculating the cepstral peak without linear
regression analysis, J. Voice 18 (2004) 203-208.
[58] K. Zieger, C. Schneider, G. Gerull, D. Mrowinski, Cepstrum analysis in voice
disorders, Folia Phoniatr. Logop. 47 (1995) 210-217.
[59] T.L. Eadie, C.R. Baylor, The effect of perceptual training on inexperienced listeners'
judgments of dysphonic voice, J. Voice 20 (2006) 527-544.
[60] S.N. Awan, N. Roy, Toward the development of an objective index of dysphonia
severity: a four-factor acoustic model, Clin. Linguist. Phon. 20 (2006) 35-49.
[61] B. Radish Kumar, J.S. Bhat, N. Prasad, Cepstral analysis of voice in persons with
vocal nodules, J. Voice 24 (2010) 651-653.
[62] R.K. Balasubramanium, J.S. Bhat, S. Fahim 3rd, R. Raju 3rd., Cepstral analysis of
voice in unilateral adductor vocal fold palsy, J. Voice 25 (2011) 326-329.
[63] S.Y. Lowell, R.H. Colton, R.T. Kelley, Y.C. Hahn, Spectral- and cepstral-based
measures during continuous speech: capacity to distinguish dysphonia and
consistency within a speaker, J. Voice 25 (2011) 223-232.
[64] M.R. Key, Linguistic behaviour of male and female, Linguistics 88 (1972) 15-31.
[65] C.T. Ferrand, R.L. Bloom, Gender differences in children's intonational patterns,
J. Voice 10 (1996) 281-291.