Acoustic markers to differentiate gender in prepubescent children's speaking and singing voice
Marco Guzman a,b,*, Daniel Munoz c, Martin Vivero d, Natalia Marín e, Mirta Ramírez e, María Trinidad Rivera e, Carla Vidal e, Julia Gerhard f, Catalina González e

a School of Communication Sciences, University of Chile, Santiago, Chile
b Department of Otolaryngology, Voice Center, Las Condes Clinic, Santiago, Chile
c Barros Luco-Trudeau Hospital, Department of Network Management, Av. Jose Miguel Carrera 3604, Santiago, Chile
d Del Salvador Hospital, Department of Otolaryngology, Avenida Salvador 364, Providencia, Santiago, Chile
e Andres Bello National University, Fernandez Concha 700, Santiago, Chile
f Department of Otolaryngology, University of Miami, Miami, FL, USA

* Corresponding author at: School of Communication Sciences, University of Chile, Avenida Independencia 1027, Santiago, Chile. Tel.: +562 2978 6605. E-mail addresses: guzmanvoz@gmail.com, mguzman@med.uchile.cl (M. Guzman).

International Journal of Pediatric Otorhinolaryngology xxx (2014) xxx–xxx
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/ijporl
G Model PEDOT-7187; No. of Pages 7
Please cite this article in press as: M. Guzman, et al., Acoustic markers to differentiate gender in prepubescent children's speaking and singing voice, Int. J. Pediatr. Otorhinolaryngol. (2014), http://dx.doi.org/10.1016/j.ijporl.2014.06.030
http://dx.doi.org/10.1016/j.ijporl.2014.06.030
0165-5876/© 2014 Published by Elsevier Ireland Ltd.

ARTICLE INFO
Article history: Received 8 April 2014; received in revised form 25 June 2014; accepted 28 June 2014; available online xxx
Keywords: Children; Gender; Acoustic analysis; Perceptual analysis; Singing voice; Speaking voice

ABSTRACT
Objectives: This investigation sought to determine whether there is any acoustic variable that objectively differentiates gender in children with normal voices.
Methods: A total of 30 children, 15 boys and 15 girls, with perceptually normal voices were examined. They were between 7 and 10 years old (mean: 8.1, SD: 0.7 years). Subjects were required to perform the following phonatory tasks: (1) phonate the sustained vowels [a:], [i:], and [u:], (2) read a phonetically balanced text, and (3) sing a song. Acoustic analysis included long-term average spectrum (LTAS), fundamental frequency (F0), speaking fundamental frequency (SFF), equivalent continuous sound level (Leq), linear predictive coding (LPC) to obtain formant frequencies, perturbation measures, harmonics-to-noise ratio (HNR), and cepstral peak prominence (CPP). Auditory perceptual analysis was performed by four blinded judges to determine gender.
Results: No significant gender-related differences were found for most acoustic variables. Perceptual assessment showed good intra- and inter-rater reliability for gender. Cepstrum for [a:], alpha ratio in text, shimmer for [i:], F3 in [a:], and F3 in [i:] were the parameters that composed the multivariate logistic regression model that best differentiated male and female children's voices.
Conclusion: Since perceptual assessment reliably detected gender, it is likely that other acoustic markers (not evaluated in the present study) are able to make clearer gender differences. For example, gender-specific patterns of intonation may be a more accurate feature for differentiating gender in children's voices.
© 2014 Published by Elsevier Ireland Ltd.

1. Introduction

Several acoustic differences have been found when comparing adult male and female voices; fundamental frequency (F0) is one of the most investigated parameters [1–4]. However, F0 is less widely documented as a distinguishing parameter when reporting gender-related differences in children. F0 has also been considered a relevant feature in differentiating voices across age groups. A number of studies have demonstrated a decrease in F0 from infancy and/or preschool age through puberty [5–9]. Anatomical modifications, specifically an increase in the length and mass of the vocal folds, are the main explanations for these F0 changes in the human voice. Regarding gender differences, there is some evidence to suggest that boys, overall, have lower fundamental frequency values than their female peers starting from about 7 to 8 years of age [10,11]. In a study with children between 8 and 10 years of age, Whiteside et al. reported similar differences in F0 between genders, with lower values for boys than for girls [12]. On the other hand, speaking fundamental frequency (SFF) extracted from running speech has not been clearly associated with gender differences in children's voices. Studies have reported only small intergender differences, or even contradictory findings. In general, neither F0 nor SFF has been reported as a reliable predictor of gender in children's voices [13–21].

Formant frequencies have also been studied as possible acoustic markers of gender and age [6,5,22–25]. Children have been shown to produce higher formant frequencies than adult females, who, in turn, have higher formant frequencies than adult males. Previous studies have shown that formant frequencies decrease with age among children, with the most evident change between 3 and 5 years of age [6,5]. There is also some evidence to suggest that formant frequency characteristics may play a role in the perceived gender of a preadolescent child [22]. Girls have shown higher formant values than boys in vowel productions. The authors did not attribute these differences to the anatomical shape of the vocal tract; instead, they concluded that the results were due to boys using a smaller jaw opening, more lip rounding, and/or a lower larynx position than girls (producing a relatively longer vocal tract) [23]. In an investigation conducted by Huber et al., no clear differences in formant frequencies between girls and boys were demonstrated.
Nevertheless, the authors pointed out that the frequencies of the first three formants decrease with age and that there is a tendency for girls to yield higher values than boys of comparable age [24]. Sergeant et al. showed a linear trend in which F1 moves downwards across the 4–11 years age range [25]. However, the authors did not find any systematic or significant intergender differences for the children sampled in the formant analysis or within any age group. It is important to highlight that these findings were calculated from sung production, not from spoken utterances as in the previously cited studies.

Prior work on instrumental measurement of voice in children has also analyzed spectral energy distribution using the long-term average spectrum [25–27]. Results have shown that spectral energy levels at frequencies above 5.75 kHz decreased between the ages of 4 and 11 years, while those at frequencies below 5.75 kHz increased [26]. Regarding gender differences, a similar study conducted by White aimed to report the actual and perceived differences between boys and girls [27]. Outcomes showed that there are differences between genders in the spectral curves; a boy-like sound produced a peak at 5 kHz, whereas a girl-like sound produced a relative decrease of spectral energy at 5 kHz. Interestingly, the same energy peak at 5 kHz existed in the spectra of girls who were wrongly but confidently identified as boys.

Sound pressure level (SPL) has been used to identify possible gender-related differences in children. Böhme et al. conducted a study with the purpose of developing a standard childhood voice profile describing the capacity of a healthy, vocally untrained child's voice. Results established that boys between the ages of 7 and 10 phonated more loudly than girls [28]. SPL differences have also been compared between age groups. Stathopoulos et al. reported that young children used higher SPL than young adults when required to phonate at comfortable loudness levels [7].
Moreover, McAllister et al. demonstrated that women and older children approaching puberty produced a wider dynamic range than 10-year-old children [29].

Even though a number of studies have attempted to find acoustic markers that reliably detect gender-related differences in children's voices, to date there are no conclusive results regarding this issue. The present investigation sought to determine whether there is any acoustic variable that objectively differentiates gender in children with normal voices. To that end, we included acoustic measures that have not been used in earlier studies. The topic of the present study may be relevant because knowledge about normality in boys' and girls' voices could add more specific information for treatment and for more accurate voice assessment.

2. Methods

2.1. Participants

A total of 30 children, 15 boys and 15 girls, with perceptually normal voices were included. They were between 7 and 10 years old (mean: 8.1, SD: 0.7 years). The participants were selected by convenience; therefore, the sample size was determined using non-probabilistic criteria and, for reasons of technical applicability, was selected arbitrarily. Participants were recruited from several primary schools. The severity of dysphonia was assessed with the GRBAS scale by one of the authors of this article (MG), who has more than 12 years of experience as a voice clinician. All participants had a GRBAS rating of 00000 (perceptually normal voice) and no history of vocal difficulty during the previous year. Some minor deviations in voice quality were rated as normal. Although 38 subjects were initially recruited, eight of them did not meet the inclusion criteria because of higher GRBAS ratings; therefore, only 30 were included in the analysis. Parents were contacted and informed about the aims and procedures of the study, after which they signed an informed consent form.
This study was reviewed and approved by the Andres Bello University Institutional Review Board.

2.2. Voice recordings

All participants attended a single recording session lasting no more than 30 min. Before the recordings, one of the experimenters trained each subject in the recording process and the phonatory tasks, using individual demonstrations and verbal descriptions. Once all of the instructions were understood, the children entered a soundproof booth to complete the voice recordings. The following protocol was performed: (1) production of the sustained vowels [a:], [i:], and [u:] for approximately five seconds each, (2) reading of a phonetically balanced text for one minute, and (3) singing the song "Happy Birthday" for one minute. All phonatory tasks were performed at the child's habitual loudness level.

Acoustic output was captured at a constant microphone-to-mouth distance of 20 cm using an omnidirectional condenser microphone (Samson MM01; Samson Technologies, Hauppauge, NY) connected to an audio interface (Tascam US-122MKII; Teac Corporation, Montebello, CA). Samples were recorded digitally at a sampling rate of 44,000 Hz with 16-bit quantization. Samples were edited with GoldWave software, version 5.57 (GoldWave Inc., St. John's, Newfoundland, Canada). For subsequent sound level measurements, the audio signal was calibrated using a 220 Hz tone at 80 dB produced with a sound generator. The SPL of this reference sound was measured with a Brüel & Kjær 2250L sound level meter (Brüel & Kjær Sound & Vibration Measurement, Nærum, Denmark), also positioned 20 cm from the generator.

2.3. Acoustic analysis

To compare the samples recorded from male and female participants, most acoustic measurements were made using Praat software version 5.2 (Boersma and Weenink, University of Amsterdam, Amsterdam, The Netherlands).
From long-term average spectrum (LTAS) analysis, the following variables were assessed: (1) the level difference between the F1 and F0 regions (L1–L0) [30]. L1–L0 may also be described as the level difference between 300–800 Hz and 50–300 Hz (Fig. 1). This level difference provides information on the mode of phonation (degree of glottal adduction). (2) The alpha ratio, the level difference between 50–1000 Hz and 1–5 kHz, which provides information on the overall declination of the spectral slope (Fig. 2). (3) The energy level difference between 1–5 kHz and 5–8 kHz (Fig. 3), which provides information about glottal noise (breathy voice quality). All LTAS variables were obtained from both the reading and the singing tasks. A frequency bandwidth of 25 Hz and a Hanning window were used for the LTAS analysis. Unvoiced sounds and pauses were automatically eliminated from the samples by Praat, using the pitch-corrected version with standard settings.

From the spectral analysis window (the spectrogram obtained with the View & Edit command in Praat), the following variables were assessed: equivalent continuous sound level (Leq) from text reading (Get intensity command), mean speaking fundamental frequency (SFF) from text reading (Get pitch command), and mean fundamental frequency (F0) during sustained vowels (Get pitch command). The resulting pitch curve was checked by visual inspection before calculation. The time window for spectral analysis varied depending on the phonatory task. After the Leq values were obtained from Praat, they were calibrated using the values captured with the sound level meter.
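The calibration step amounts to a constant dB offset. The sketch below is a minimal illustration, assuming that the offset is the difference between the sound level meter reading for the reference tone (80 dB) and the uncalibrated level of the same recorded tone, and that Leq is taken as the mean-square level of the samples. The function names and the synthetic signals are hypothetical, and Praat's own intensity computation is not reproduced here.

```python
import math

def uncalibrated_level_db(samples):
    """Mean-square level of digital samples in dB (arbitrary digital reference)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def calibrated_leq_db(samples, reference_tone, reference_spl_db=80.0):
    """Leq in dB SPL: the uncalibrated level shifted by the offset that maps the
    recorded reference tone onto its sound-level-meter reading."""
    offset_db = reference_spl_db - uncalibrated_level_db(reference_tone)
    return uncalibrated_level_db(samples) + offset_db

fs = 44000  # sampling rate reported in Section 2.2
# 1 s synthetic 220 Hz reference tone (hypothetical digital amplitude)
tone = [0.1 * math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]
# Hypothetical "voice" signal with twice the amplitude of the reference
voice = [0.2 * math.sin(2 * math.pi * 250 * t / fs) for t in range(fs)]
print(round(calibrated_leq_db(voice, tone), 2))  # 86.02
```

Doubling the amplitude raises the level by 20·log10(2), about 6.02 dB, so the hypothetical signal lands at roughly 86 dB SPL once the 80 dB reference is applied.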
Perturbation measures (jitter % and shimmer %) and the harmonics-to-noise ratio (HNR) were obtained with Praat. Linear predictive coding (LPC) was performed to obtain the formant frequencies F1 to F4. The fifth formant frequency (F5) was not calculated because of the lack of accuracy shown by Praat. FFT was used to corroborate all formant frequency values. Cepstral peak prominence (CPP) was also assessed in the sustained vowels [a:], [i:], and [u:]. Because the cepstral peak is a short-term measurement obtained at a specific point of the voice waveform, three different points in the middle section of each vowel waveform were taken and averaged for every audio sample. A Kay Computerized Speech Laboratory (CSL) and Multi-Speech software (KayPENTAX, Lincoln Park, NJ) were used to calculate CPP.

2.4. Auditory perceptual analysis

Audio samples of the sustained vowel /a/, text reading, and singing (a total of 90 samples) were perceptually assessed by four blinded raters. Additionally, 20 percent of the samples were randomly repeated to determine whether the judges were consistent in their perceptions (intra-rater reliability analysis). Judges were not informed about the purpose of the study. The order of recordings was randomized to avoid recognition of any pattern. Raters were required to determine whether each voice sample belonged to a girl or a boy. Raters could replay each sample as many times as they wanted before making their determination and moving on to the next recording. The evaluation was performed in a quiet room using a high-quality loudspeaker (Audioengine, Sao Paulo, Brazil). All listeners reported normal hearing.

2.5. Statistical analysis

Descriptive statistics, including mean and standard deviation, were calculated for the variables. The kappa test was performed to assess inter- and intra-rater concordance for gender. A cut point of >0.60 was used to determine good reliability.
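For reference, Cohen's kappa compares the observed agreement p_o between two sets of ratings with the agreement p_e expected by chance from their marginal proportions: kappa = (p_o - p_e) / (1 - p_e). The sketch below applies it to hypothetical boy/girl judgments, for example a judge's first and repeated ratings of the same samples (the intra-rater case), and then applies the >0.60 cut point. The source does not state which kappa variant was used across four judges, so this two-rater form is an assumption.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical ratings.
    Undefined (division by zero) if both raters always use one identical label."""
    n = len(ratings_a)
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(ratings_a) | set(ratings_b)
    # Chance agreement from each rater's marginal label frequencies
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical judgments of 10 samples ("B" = boy, "G" = girl)
first = list("BBBBBGGGGG")
second = list("BBBBGGGGGG")
kappa = cohens_kappa(first, second)
print(round(kappa, 2), "good" if kappa > 0.60 else "not good")  # prints: 0.8 good
```

Here the raters agree on 9 of 10 samples (p_o = 0.9) while chance agreement is 0.5, giving kappa = 0.8, above the cut point.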
Fisher's exact test was used to compare the proportions of correct identification for male and female voices, by individual judge and overall. To identify the vocal characteristics that distinguished girls' from boys' voices, the relationship of each acoustic variable with gender was first analyzed with univariate t-tests, and the variables were then analyzed jointly with multivariate methods. Variables with a univariate t-test p-value below 0.25 (Hosmer and Lemeshow criterion) were entered into a multivariate logistic regression model, and a stepwise technique with a retention probability of 0.2 was then applied. Odds ratios, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and receiver operating characteristic (ROC) curve analysis are reported. All analyses were performed using Stata 13.1 (StataCorp, College Station, TX). p < 0.05 was considered statistically significant, and all reported p-values are two-sided.

Fig. 1. Spectrum showing the alpha ratio.
Fig. 2. Spectrum showing the L1–L0 ratio.
Fig. 3. Spectrum showing the 1–5/5–8 kHz difference.

3. Results

3.1. Acoustic analysis

Values of F0 and formant frequencies (F1–F4) extracted from sustained vowels are displayed in Table 1. No significant differences were found between male and female participants for these acoustic variables, with the exception of F3 in vowel [i:] (p = 0.0156) and F0 in vowel [u:] (p = 0.0183). There was no significant difference between male and female children with regard to SFF (p = 0.5775) and Leq (p = 0.1269) obtained from reading a phonetically balanced text.
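As a consistency check on the univariate step, a pooled two-sample t-test can be recomputed from the group means and SDs reported in Section 3.1 for SFF (boys 251.60 ± 38.54 Hz, girls 257.86 ± 19.13 Hz) and Leq (boys 77.87 ± 4.53 dB, girls 80.09 ± 3.06 dB). This sketch assumes n = 15 per group and the equal-variance Student t-test (the source does not say which variant was used); to stay within the standard library, it integrates the t density numerically with Simpson's rule. It then applies the Hosmer and Lemeshow screening threshold of 0.25, under which variables with a univariate p < 0.25 become candidates for the multivariate model.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t with df degrees of freedom,
    via Simpson integration of the t density over [0, |t|] (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = pdf(a) + pdf(b)
    total += sum((4 if i % 2 else 2) * pdf(a + i * h) for i in range(1, steps))
    area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * area

def pooled_t_test(m1, s1, n1, m2, s2, n2):
    """Student's equal-variance t-test computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, t_two_sided_p(t, df)

summaries = {  # (boys mean, SD, n, girls mean, SD, n), as reported in Section 3.1
    "SFF": (251.60, 38.54, 15, 257.86, 19.13, 15),
    "Leq": (77.87, 4.53, 15, 80.09, 3.06, 15),
}
for name, stats in summaries.items():
    t, p = pooled_t_test(*stats)
    print(f"{name}: t = {t:.3f}, p = {p:.4f}, passes p < 0.25 screen: {p < 0.25}")
```

With these inputs the recomputed two-sided p-values come out close to the reported 0.5775 and 0.1269, so under the 0.25 screen SFF would be excluded and Leq would be a candidate for the multivariate step.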
For SFF, boys obtained 251.60 (38.54) Hz and girls 257.86 (19.13) Hz. For Leq, boys obtained 77.87 (4.53) dB and girls 80.09 (3.06) dB. Results of the alpha ratio, L1–L0, and the 1–5/5–8 kHz difference are summarized in Table 2. No significant differences were found for these LTAS markers. Cepstral analysis of all sustained vowels did not show any difference between boys and girls. Table 3 displays the results of the perturbation measures (jitter and shimmer) and the harmonics-to-noise ratio. No differences were detected between males and females for any of these parameters. Both boys and girls showed higher shimmer values than the normal values for adults (approximately 3%).

3.2. Auditory perceptual analysis

Results from the auditory perceptual assessment performed by four blinded judges are shown in Table 4. Kappa values indicated good intra- and inter-rater reliability for gender. Results of Fisher's exact test comparing the proportions of correct identification for male and female voices, by individual judge and overall, were as follows: judge 1: 67.64% (male = 63.43%; female = 76.47%; p = 0.0021); judge 2: 83.33% (male = 61.76%; female = 94.11%; p < 0.0001); judge 3: 76.47% (male = 50.0%; female = 94.11%; p < 0.0001); judge 4: 78.43% (male = 63.63%; female = 86.48%; p < 0.0001); and overall: 76.47% (male = 60.72%; female = 87.13%; p < 0.0001). Furthermore, regarding the type of phonatory task, gender was correctly identified in 84.55% of text reading samples, 80.88% of singing samples, and 63.23% of sustained vowel samples.

3.3. Multivariate logistic regression model

The acoustic variables that met the Hosmer and Lemeshow criterion and survived stepwise selection were cepstrum [a:], alpha ratio (text), shimmer [i:], F3 [a:], and F3 [i:]. These parameters therefore compose the predictive model of the present study. Numerical results are shown in Table 5 and Fig. 4.
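The model's sensitivity, specificity, PPV, and NPV (reported as 80.00%, 86.67%, 85.71%, and 81.25%) all derive from a single 2 × 2 classification table. With 15 boys and 15 girls, those four percentages are reproduced by 12 true positives, 3 false negatives, 13 true negatives, and 2 false positives. These counts are a reconstruction from the percentages, not counts given in the source, and which sex is treated as the "positive" class is likewise an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic accuracy measures from a 2 x 2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly classified
        "specificity": tn / (tn + fp),  # negatives correctly classified
        "ppv": tp / (tp + fp),          # share of positive predictions that are right
        "npv": tn / (tn + fn),          # share of negative predictions that are right
    }

# Reconstructed counts (assumption): 15 children in each class
metrics = diagnostic_metrics(tp=12, fn=3, tn=13, fp=2)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
# sensitivity: 80.00%
# specificity: 86.67%
# ppv: 85.71%
# npv: 81.25%
```

Because PPV and NPV depend on the proportion of boys and girls in the sample (50% here), they would change in a sample with a different sex ratio even if sensitivity and specificity stayed the same.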
Moreover, this model obtained a sensitivity of 80.00%, a specificity of 86.67%, a positive predictive value of 85.71%, and a negative predictive value of 81.25%. Fig. 5 shows the results of the sensitivity/specificity analysis, illustrating the balance between the sensitivity and specificity of the logistic model in predicting the sex of the individual from their vocal characteristics. Results of the receiver operating characteristic (ROC) curve analysis are reported in Fig. 6, which shows the sensitivity and specificity of the logistic model in predicting the sex of the individual from the predictor variables; this performance is also reflected in the area under the ROC curve (0.89).

Table 1
Values of F0 and formant frequencies (F1–F4) extracted from sustained vowels. Boys and girls comparison.

Vowel  Parameter (Hz)  Boys               Girls              p-Value
[a:]   F0              233.25 ± 40.69     248.25 ± 27.45     0.2465
       F1              820.16 ± 139.82    888.28 ± 148.32    0.2062
       F2              1574.97 ± 147.07   1621.31 ± 100.35   0.3221
       F3              3080.07 ± 307.93   2825.67 ± 590.79   0.1503
       F4              5396.33 ± 536.65   5269.33 ± 414.80   0.4744
[i:]   F0              238.31 ± 42.31     265.28 ± 29.89     0.0535
       F1              384.05 ± 82.74     414.89 ± 84.05     0.3199
       F2              2426.77 ± 477.20   2294.27 ± 707.27   0.5524
       F3              3287.60 ± 229.11   3472.50 ± 157.37   0.0156
       F4              5466 ± 558.56      5361.26 ± 288.85   0.5241
[u:]   F0              230.34 ± 46.77     265.29 ± 27.00     0.0183
       F1              448.78 ± 78.73     453.91 ± 59.12     0.8414
       F2              1134.24 ± 317.29   966.97 ± 206.41    0.0980
       F3              2890.38 ± 191.85   2665.36 ± 567.03   0.1566
       F4              6388.2 ± 556.88    6452.86 ± 223.76   0.6796

Table 2
Results of alpha ratio, L1–L0, and 1–5/5–8 kHz difference. Boys and girls comparison.

Parameter                  Boys            Girls           p-Value
Alpha ratio text (dB)      18.90 ± 2.39    17.13 ± 3.14    0.0926
Alpha ratio song (dB)      18.42 ± 2.83    17.08 ± 3.11    0.2278
L0–L1 text (dB)            2.37 ± 4.43     3.36 ± 3.37     0.4963
L0–L1 song (dB)            1.28 ± 4.03     2.75 ± 2.60     0.2448
1–5/5–8 kHz text (dB)      17.57 ± 2.80    16.02 ± 3.19    0.2691
1–5/5–8 kHz song (dB)      16.58 ± 4.33    17.41 ± 3.20    0.5538

Table 3
Values of perturbation measures (jitter and shimmer) and harmonics-to-noise ratio. Boys and girls comparison.
Measure       Vowel   Boys            Girls           p-Value
Shimmer (%)   [a:]    8.04 ± 2.71     7.12 ± 3.79     0.4493
              [i:]    6.38 ± 2.78     4.74 ± 1.44     0.0533
              [u:]    5.33 ± 1.75     4.71 ± 1.39     0.2943
Jitter (%)    [a:]    0.56 ± 0.24     0.48 ± 0.28     0.4150
              [i:]    0.63 ± 0.40     0.64 ± 0.67     0.9687
              [u:]    0.42 ± 0.11     0.53 ± 0.32     0.2517
HNR (dB)      [a:]    14.38 ± 3.38    14.8 ± 3.90     0.7373
              [i:]    17.02 ± 3.93    17.56 ± 2.40    0.6579
              [u:]    21.81 ± 2.52    21.33 ± 1.97    0.5652

Table 4
Results from the auditory perceptual assessment performed by four blinded judges.

            Judge 1     Judge 2     Judge 3     Judge 4     Kappa
Vowel [a:]  κ = 0.48    κ = 0.95    κ = 0.73    κ = 0.81    0.68
Text        κ = 0.67    κ = 1       κ = 0.86    κ = 0.81    0.83
Song        κ = 0.35    κ = 1       κ = 0.62    κ = 1       0.74

Table 5
Estimated results from the multivariate logistic regression model for gender prediction.

Variable                 Odds ratio [95% CI]    p-Value
Shimmer [i:] (%)         2.36 [1.14–4.88]       0.019
Cepstrum [a:] (%)        0.62 [0.38–0.99]       0.046
F3 [a:] (Hz)             1.001 [0.99–1.004]     0.156
F3 [i:] (Hz)             0.73 [0.62–0.81]       0.001
Alpha ratio text (dB)    0.71 [0.57–0.94]       0.002

4. Discussion

The present investigation examined several acoustic variables as possible objective markers of gender in children's voices. To the best of our knowledge, this is the first study to include cepstral peak prominence and parameters related to spectral slope as possible acoustic markers. It is also the first attempt to include speaking and singing voice together. To examine whether the acoustic variables were sensitive to gender, we compared SFF, the first four formant frequencies, cepstral peak prominence, Leq, alpha ratio, the L1–L0 difference, the 1–5/5–8 kHz difference, jitter, shimmer, and HNR.
Inspection of the results revealed that most acoustic variables did not differ significantly between male and female voices. The multivariate logistic regression analysis showed that cepstrum during sustained vowel [a:], alpha ratio extracted from reading, shimmer during sustained vowel [i:], F3 during vowel [a:], and F3 during vowel [i:] are the only parameters that together could be considered good predictors of gender in the present study.

In general, F0 during sustained vowels and SFF extracted from text reading did not differ significantly between male and female children. Even though some previous investigations have reported differences in F0 between boys and girls, our findings are in good agreement with most earlier studies, in which F0 and SFF have not been found to be accurate acoustic variables for detecting gender-related differences in prepubescent children's voices [13–21]. A possible explanation is provided by Bennett [14], whose results showed that F0 decreased with age. Nevertheless, the decrease was only 12 Hz with a standard deviation of 8 Hz, suggesting that the between-subject standard deviation values were larger than the age-related changes that occurred over the study period.

A number of studies have reported that formant frequencies are good acoustic indicators of gender in adult males and females [6,12,5,22–25,31–33]. However, formants do not appear to contribute in the same way to the identification of speaker gender in children. Although some data report a tendency for girls to have higher formant frequencies than boys [8,12], the authors have noted that the differences are small. Our data showed that only F3 in vowel [i:] differed significantly between boys and girls. According to the acoustic theory of speech production, formant frequencies depend on the length of the vocal tract and on its cross-sectional shape as a function of its length [33].
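The dependence on vocal tract length can be illustrated with the textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / 4L. This is a deliberate simplification (real vocal tracts are not uniform, and their cross-sectional shape also matters); the speed of sound (350 m/s) and the two tube lengths are illustrative assumptions, not measurements from this study.

```python
def uniform_tube_formants(length_m, n_formants=3, c=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other:
    F_n = (2n - 1) * c / (4 * L), with c the speed of sound in m/s."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_formants + 1)]

for label, length in [("17.5 cm tube", 0.175), ("12.5 cm tube", 0.125)]:
    formants = uniform_tube_formants(length)
    print(label, [round(f) for f in formants])
# 17.5 cm tube [500, 1500, 2500]
# 12.5 cm tube [700, 2100, 3500]
```

Shortening the tube by a factor of 1.4 raises every resonance by the same factor, which is the inverse relation between vocal tract length and formant frequency.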
Vocal tract length determines the average spacing of formant frequencies: as the vocal tract becomes shorter, formant frequencies increase; conversely, as it becomes longer, they decrease. Findings on the morphology of the vocal tract may explain the lack of gender-related differences in children. Fitch et al. found that differences begin to become established between 10.3 and 14.5 years of age [34]. Yang et al. reported similar outcomes using magnetic resonance imaging [35]. Furthermore, Lee et al. observed that differentiation in formant frequencies begins at around 11 years [36]. Since participants in the present study were between 7 and 10 years old, these morphological findings could explain our data.

Spectral energy distribution using LTAS has been widely applied in different types of studies, including speaker recognition [37,38], voice quality [39], voice disorders [40–42], the aging voice [43,44], evaluation of voice therapy techniques [42,45,46], and detection of gender differences [25,27,47–49]. To the best of our knowledge, only two studies have reported clear gender-related spectral differences in children's voices [25,27]. White [27] observed in LTAS analysis a peak at 5 kHz for boys and a flat spectrum at 5 kHz for girls. Comparable results were found by Sergeant et al. [25], who found higher spectral energy for boys than for girls in several spectral bands. These differences were found for the age groups 6–8 and 9–11 years; no significant differences were observed for the youngest children (aged 4–5 years) [25]. In contrast, the present study did not show any significant difference in LTAS parameters for either the speaking or the singing voice tasks.
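The three LTAS measures used in this study are differences between band levels of the same spectrum. The sketch below computes them from an LTAS given as (frequency, level in dB) pairs, taking each band level as the mean power of the bins in that band, expressed in dB. The band edges follow Section 2.3; the mean-power convention and the order of subtraction (first-named band minus second-named band, as worded in Section 2.3) are assumptions, since conventions differ across studies and software, and many authors define the alpha ratio with the opposite sign.

```python
import math

def band_level_db(ltas, lo_hz, hi_hz):
    """Mean power of the LTAS bins with lo_hz <= f < hi_hz, expressed in dB."""
    powers = [10 ** (level / 10) for f, level in ltas if lo_hz <= f < hi_hz]
    return 10 * math.log10(sum(powers) / len(powers))

def ltas_measures(ltas):
    """Band-level differences (dB) for the three LTAS variables."""
    return {
        "L1-L0": band_level_db(ltas, 300, 800) - band_level_db(ltas, 50, 300),
        "alpha": band_level_db(ltas, 50, 1000) - band_level_db(ltas, 1000, 5000),
        "1-5/5-8 kHz": band_level_db(ltas, 1000, 5000) - band_level_db(ltas, 5000, 8000),
    }

# Hypothetical LTAS with 25 Hz bins: 60 dB below 1 kHz, 40 dB from 1 to 5 kHz,
# and 30 dB from 5 to 8 kHz
ltas = [(f, 60.0 if f < 1000 else 40.0 if f < 5000 else 30.0) for f in range(0, 8000, 25)]
print({k: round(v, 2) for k, v in ltas_measures(ltas).items()})
# {'L1-L0': 0.0, 'alpha': 20.0, '1-5/5-8 kHz': 10.0}
```

A steeper high-frequency roll-off would increase both the alpha ratio and the 1–5/5–8 kHz difference under this sign convention, while L1–L0 reacts only to the balance between the F0 and F1 regions.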
Since the L1–L0 difference provides information on the mode of phonation and the alpha ratio provides information on the overall spectral slope (both related to functional glottal characteristics), it is likely that our subjects did not have any major differences in patterns of glottal closure. Moreover, a possible explanation for the lack of gender-related differences may be that we did not analyze specific spectral bands as earlier investigations did; in our study, only spectral slope measures were carried out. Possibly, the analysis of specific bands of the spectrum is more sensitive for detecting spectral differences between boys and girls. The main reason to include spectral slope variables in the present study was that these markers had not been investigated before for detecting gender-related differences in children.

Fig. 4. Multivariate logistic regression model results plot.
Fig. 5. Sensitivity/specificity analysis for the multivariate logistic regression model.
Fig. 6. Receiver operating characteristic (ROC) curve analysis.

The equivalent level (Leq) did not show differences between boys and girls during the text reading task in our subjects. These findings are in line with previous studies. Sergeant et al. observed no gender differences for any age group [25]; it is important to highlight that those results were obtained from singing voice samples. Similar findings in speaking voice samples were reported by Glaze et al. [50]. However, the opposite has also been reported: Böhme et al. established that boys between the ages of 7 and 10 phonate more loudly than girls [28].
Perturbation measures were also analyzed in this study. Outcomes showed that boys and girls did not differ in jitter and shimmer values. Comparable results were reported by Nicollas et al. [51] in a study that investigated possible changes in the normal voice of children before mutation. Glaze et al. also observed no statistically significant age-related differences [52]. Additionally, they found that jitter was the only acoustic parameter measured that fell within the normal adult range; the jitter values reported were lower than those of the normal adults tested [52].

The present study is the first to use cepstral analysis as a possible acoustic marker to differentiate gender in children's voices. The cepstrum is defined as a Fourier transformation of a spectrum [53,54]. A strong cepstral peak (high value) is obtained from a voice characterized by a well-defined harmonic structure (normal voice). On the other hand, a breathy and hoarse voice has a poorly defined harmonic structure; hence, its cepstral peak is weak (low value). Cepstral analysis was included as a possible acoustic marker in this study because previous investigations have reported that the cepstral peak value is the best predictor of overall dysphonia in comparison with perturbation and noise measures [55–58]. Additionally, cepstrum-related measures have shown strong correlations with dysphonia severity in different voice disorders [59–63]. Our results showed significant differences only for vowel [i:]. Since no previous investigations have used CPP to differentiate gender, no comparisons are feasible.

In addition to the t-tests used to compare acoustic variables, a multivariate logistic regression analysis was performed to obtain a predictive model that best differentiates male and female children's voices.
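Returning to the cepstral measure discussed above, the sketch below computes a simplified cepstral peak prominence for one analysis frame: the log power spectrum of a Hann-windowed frame is Fourier-transformed again to give the cepstrum, the highest cepstral peak is located within the quefrency range of plausible F0 values (assumed here to be 60–500 Hz), and CPP is the height of that peak above a least-squares regression line fitted over the same range. This is an illustrative approximation, not the algorithm of the CSL/Multi-Speech software used in this study; the naive DFT, frame length, search range, regression range, and synthetic signals are all assumptions. In the procedure described in Section 2.3, such a value would be computed at three points in the middle of each sustained vowel and averaged.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(N^2) discrete Fourier transform; slow but dependency-free."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [sum(x[t] * tw[(k * t) % n] for t in range(n)) for k in range(n)]

def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=500.0, eps=1e-12):
    """Simplified CPP (dB) and the F0 (Hz) implied by the cepstral peak."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]  # Hann
    spectrum = dft([s * w for s, w in zip(frame, window)])
    log_power = [10 * math.log10(abs(X) ** 2 + eps) for X in spectrum]
    # For a real input, |DFT| / n equals |inverse DFT|, so a forward transform
    # of the log spectrum gives the cepstral magnitudes.
    cepstrum = dft(log_power)
    cep_db = [20 * math.log10(abs(c) / n + eps) for c in cepstrum]
    # Quefrency search range (in samples) for F0 between f0_min and f0_max
    qs = list(range(int(fs / f0_max), min(int(fs / f0_min), n // 2) + 1))
    ys = [cep_db[q] for q in qs]
    q_mean, y_mean = sum(qs) / len(qs), sum(ys) / len(ys)
    slope = (sum((q - q_mean) * (y - y_mean) for q, y in zip(qs, ys))
             / sum((q - q_mean) ** 2 for q in qs))
    intercept = y_mean - slope * q_mean
    q_peak = max(qs, key=lambda q: cep_db[q])
    cpp = cep_db[q_peak] - (intercept + slope * q_peak)  # peak height above the line
    return cpp, fs / q_peak

fs, n = 8000, 512  # illustrative 64 ms frame at 8 kHz
# Synthetic harmonic signal: F0 = 250 Hz with 15 harmonics of amplitude 1/h
harmonic = [sum(math.sin(2 * math.pi * 250 * h * t / fs) / h for h in range(1, 16))
            for t in range(n)]
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(n)]  # no harmonic structure
cpp_h, f0_h = cepstral_peak_prominence(harmonic, fs)
cpp_n, _ = cepstral_peak_prominence(noise, fs)
print(f"harmonic: CPP = {cpp_h:.1f} dB at {f0_h:.0f} Hz; noise: CPP = {cpp_n:.1f} dB")
```

The strictly periodic synthetic signal should yield a clearly higher CPP than white noise, mirroring the contrast between well-defined and poorly defined harmonic structure described above.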
Results showed that this predictive model is composed of cepstrum during sustained vowel [a:], alpha ratio extracted from reading, shimmer during sustained vowel [i:], F3 during vowel [a:], and F3 during vowel [i:]. Even though it is generally appropriate to analyze acoustic markers with univariate analysis, a multivariate model is preferable when the goal is to predict gender, since voice is a complex phenomenon composed of several coexisting features.

Results from the auditory perceptual assessment indicated that blinded judges reliably detected gender. Since most acoustic variables included in our study did not differentiate gender, it is likely that other acoustic markers (not evaluated in the present study) are able to make clearer differentiations. For example, gender-specific patterns of intonation may be a more accurate feature for differentiating gender in children's voices. There is good evidence that boys and girls use intonation differently. Key found that when children read a story, girls showed significantly more expressive intonation than boys [64]. Similarly, Ferrand et al. [65] reported clear gender-related differences in the number and extent of flat periods in intonation, with boys showing more restricted intonational patterns than girls. Similar differences in intonation have also been found in adults [13]. The finding that gender was correctly identified more often in text reading (84.55%) than in the sustained vowel task (63.23%) supports the assumption that intonational patterns may help gender detection.

5. Conclusion

Comparison of spectral measures, cepstral peak prominence, perturbation, glottal noise, F0, intensity, and formant frequencies between genders revealed no significant differences for most parameters. As earlier acoustic studies have indicated, there are no clear differences between boys' and girls' voices.
A multivariate approach seems to be a better option than univariate analysis when comparing children's voices. Since perceptual assessment reliably detected gender in our study as well as in previous studies, it is likely that other acoustic markers (not evaluated in the present study) allow clearer differentiation between boys' and girls' voices. Gender-specific patterns of intonation may be a more accurate feature for differentiating gender in children's voices.

References

[1] K. Wilcox, Y. Horii, Age and changes in vocal jitter, J. Gerontol. 35 (1980) 194-198.
[2] Y. Horii, Fundamental frequency perturbation observed in sustained phonation, J. Speech Hear. Res. 22 (1979) 5-19.
[3] Y. Horii, Jitter and shimmer differences among sustained vowel phonations, J. Speech Hear. Res. 25 (1982) 12-14.
[4] D. Sorensen, Y. Horii, Frequency and amplitude perturbation in the voices of female speakers, J. Commun. Disord. 16 (1983) 57-61.
[5] G.E. Peterson, H.L. Barney, Control methods used in a study of the vowels, J. Acoust. Soc. Am. 24 (1952) 175-184.
[6] S. Eguchi, I.J. Hirsh, Development of speech sounds in children, Acta Otolaryngol. 257 (1969) 1-51.
[7] E.T. Stathopoulos, C.M. Sapienza, Developmental changes in laryngeal and respiratory function with variations in sound pressure level, J. Speech Hear. Res. 40 (1997) 595-614.
[8] J.E. Huber, E.T. Stathopoulos, G.M. Curione, T.A. Ash, K. Johnson, Formants of children, women, and men: the effects of vocal intensity variation, J. Acoust. Soc. Am. 106 (1999) 1532-1542.
[9] D. Sergeant, G.F. Welch, Age-related changes in long-term average spectra of children's voices, J. Voice 22 (2008) 658-670.
[10] C.S. Hasek, S. Singh, T. Murry, Acoustic attributes of preadolescent voices, J. Acoust. Soc. Am. 68 (1980) 1262-1265.
[11] D.K. Wilson, Voice Problems of Children, Williams and Wilkins, Baltimore, MD, 1987.
[12] S.P. Whiteside, C. Hodgson, Some acoustic characteristics in the voices of 6- to 10-year-old children and adults: a comparative sex and developmental perspective, Logoped. Phoniatr. Vocol. 25 (2000) 122-132.
[13] J.D. Avery, J.M. Liss, Acoustic characteristics of less masculine-sounding male speech, J. Acoust. Soc. Am. 99 (1996) 3738-3748.
[14] S. Bennett, A 3-year longitudinal study of school-aged children's fundamental frequencies, J. Speech Hear. Res. 26 (1983) 137-142.
[15] S. Bennett, B. Weinberg, Sexual characteristics of pre-adolescent children's voices, J. Acoust. Soc. Am. 65 (1979) 179-189.
[16] D. Ingrisano, G. Weismer, G.H. Schucker, Sex identification of preschool children's voices, Folia Phoniatr. 32 (1980) 61-69.
[17] P.A. Busby, G.L. Plant, Formant frequency values of vowels produced by preadolescent boys and girls, J. Acoust. Soc. Am. 97 (1995) 2603-2606.
[18] T.L. Perry, R.N. Ohde, D.H. Ashmead, The acoustic basis for gender identification from children's voices, J. Acoust. Soc. Am. 109 (2001) 2988-2998.
[19] R.O. Coleman, A comparison of contribution of two vocal characteristics to the perception of maleness and femaleness in the voice, J. Speech Hear. Res. 19 (1976) 168-180.
[20] B. Weinberg, M. Zlatin, Speaking fundamental frequency characteristics of 5- and 6-year-old children with mongolism, J. Speech Hear. Res. 13 (1970) 418-425.
[21] D.N. Sorenson, A fundamental frequency investigation of children ages 6-10 years old, J. Commun. Disord. 22 (1989) 115-123.
[22] S. Bennett, B. Weinberg, Acoustic correlates of perceived sexual identity in preadolescent children's voices, J. Acoust. Soc. Am. 66 (1979) 989-1000.
[23] S. Bennett, Vowel formant frequency characteristics of preadolescent males and females, J. Acoust. Soc. Am. 69 (1981) 231-238.
[24] J.E. Huber, E.T. Stathopoulos, G.M. Curione, T.A. Ash, K. Johnson, Formants of children, women and men: the effect of vocal intensity variation, J. Acoust. Soc. Am. 106 (1999) 1532-1542.
[25] D.C. Sergeant, G.F. Welch, Gender differences in long-term average spectra of children's singing voices, J. Voice 23 (2009) 319-336.
[26] D.C. Sergeant, G.F. Welch, Age related changes in the long-term average spectra of children's voices, J. Voice 22 (2008) 658-670.
[27] P. White, Long-term average spectrum analysis of sex- and gender-related differences in children's voices, Logoped. Phoniatr. Vocol. 26 (2001) 97-101.
M. Guzman et al. / International Journal of Pediatric Otorhinolaryngology xxx (2014) xxxxxx 6 G Model PEDOT-7187; No. of Pages 7 Please cite this article in press as: M. Guzman, et al., Acoustic markers to differentiate gender in prepubescent children's speaking and singing voice, Int. J. Pediatr. Otorhinolaryngol. (2014), http://dx.doi.org/10.1016/j.ijporl.2014.06.030
[28] G. Böhme, G. Stuchlik, Voice profiles and standard voice profile of untrained children, J. Voice 9 (1995) 304-307.
[29] A. McAllister, E. Sederholm, J. Sundberg, P. Gramming, Relations between voice range profiles and physiological and perceptual voice characteristics in ten-year-old children, J. Voice 8 (1994) 230-239.
[30] P. Kitzing, LTAS criteria pertinent to the measurement of voice quality, J. Phon. 14 (1986) 477-482.
[31] D.G. Childers, K. Wu, Gender recognition from speech: Part II. Fine analysis, J. Acoust. Soc. Am. 90 (1991) 1841-1865.
[32] D. Deterding, The formants of monophthong vowels in standard southern British English pronunciation, J. Int. Phon. Assoc. 27 (1997) 47-55.
[33] R. Kent, Vocal tract acoustics, J. Voice 7 (1993) 97-117.
[34] W.T. Fitch, J. Giedd, Morphology and development of the human vocal tract: a study using magnetic resonance imaging, J. Acoust. Soc. Am. 106 (1999) 1511-1522.
[35] C.-S. Yang, H. Kasuya, Speaker individualities of vocal tract shapes of Japanese vowels measured by magnetic resonance images, in: Presented at: The Fourth International Conference on Spoken Language Processing, October 3-6, 1996, Philadelphia, PA, 1996. Available at: http://www.isca-speech.org/archive.
[36] S. Lee, A. Potamianos, S. Narayanan, Acoustics of children's speech: developmental changes of temporal and spectral parameters, J. Acoust. Soc. Am. 105 (1999) 1455-1468.
[37] W. Majewski, H. Hollien, Speaker identification by long-term spectra under normal and distorted speech conditions, J. Acoust. Soc. Am. 62 (1977) 975-979.
[38] J. Zalewski, W. Majewski, H. Hollien, Cross correlation of long-term speech spectra as a speaker identification technique, Acustica 34 (1975) 20-24.
[39] J. Wendler, A. Rauhut, J. Kruger, Classification of voice qualities, J. Phon. 14 (1986) 483-488.
[40] K. Tanner, N. Roy, A. Ash, E.H. Buder, Spectral moments of the long-term average spectrum: sensitive indices of voice change after therapy, J. Voice 19 (2005) 211-222.
[41] K. Izdebski, Overpressure and breathiness in spastic dysphonia, Acta Otolaryngol. 97 (1984) 373-378.
[42] D. Hartl, S. Hans, J. Vaissiere, D. Brasnu, Objective acoustic and aerodynamic measures of breathiness in paralytic dysphonia, Eur. Arch. Otorhinolaryngol. 260 (2003) 175-182.
[43] S. Linville, J. Rens, Vocal tract resonance analysis of aging voice using the long-term average spectra, J. Voice 15 (2001) 323-330.
[44] P.T. Da Silva, S. Master, S. Andreoni, P. Pontes, L.R. Ramos, Acoustic and long-term average spectrum measures to detect vocal aging in women, J. Voice 25 (2011) 411-419.
[45] P. De Jonkere, Recognition of hoarseness by means of LTAS, Int. J. Rehabil. Res. 6 (1983) 343-345.
[46] S. Master, N. De Blaise, V. Pedrosa, B.M.C. Chiari, The long-term-average spectrum in research and in the clinical practice of speech therapists, Pro-Fono Rev. Attual. Cient. 18 (2006) 111-120.
[47] A. Bladon, Acoustic phonetics, auditory phonetics, speaker sex and speech recognition: a thread, in: F. Fallside, A. Woods (Eds.), Computer Speech Processing, Prentice-Hall, Englewood Cliffs, NJ, 1983, pp. 29-38.
[48] D. Klatt, Detailed spectral analysis of female voice, J. Acoust. Soc. Am. 81 (1986) S80.
[49] D. Klatt, L. Klatt, Analysis, synthesis and perception of voice quality variations among female and male talkers, J. Acoust. Soc. Am. 87 (1990) 820-857.
[50] L. Glaze, D. Bless, R. Susser, Acoustic analysis of vowel and loudness differences in children's voice, J. Voice 4 (1990) 37-44.
[51] R. Nicollas, R. Garrel, M. Ouaknine, A. Giovanni, B. Nazarian, J.-M. Triglia, Normal voice in children between 6 and 12 years of age: database and nonlinear analysis, J. Voice 22 (2007) 671-675.
[52] L. Glaze, D. Bless, P. Milenkovic, R. Susser, Acoustic characteristics of children's voice, J. Voice 2 (1988) 312-319.
[53] J. Hillenbrand, R.A. Cleveland, R.L. Erickson, Acoustic correlates of breathy vocal quality, J. Speech Hear. Res. 37 (1994) 769-778.
[54] J. Hillenbrand, R.A. Houde, Acoustic correlates of breathy vocal quality: dysphonic voices and continuous speech, J. Speech Hear. Res. 39 (1996) 311-321.
[55] Y.D. Heman-Ackah, R.J. Heuer, D.D. Michael, Cepstral peak prominence: a more reliable measure of dysphonia, Ann. Otol. Rhinol. Laryngol. 112 (2003) 324-333.
[56] Y.D. Heman-Ackah, D.D. Michael, G.S. Goding Jr., The relationship between cepstral peak prominence and selected parameters of dysphonia, J. Voice 16 (2000) 20-27.
[57] Y.D. Heman-Ackah, Reliability of calculating the cepstral peak without linear regression analysis, J. Voice 18 (2004) 203-208.
[58] K. Zieger, C. Schneider, G. Gerull, D. Mrowinski, Cepstrum analysis in voice disorders, Folia Phoniatr. Logop. 47 (1995) 210-217.
[59] T.L. Eadie, C.R. Baylor, The effect of perceptual training on inexperienced listeners' judgments of dysphonic voice, J. Voice 20 (2006) 527-544.
[60] S.N. Awan, N. Roy, Toward the development of an objective index of dysphonia severity: a four-factor acoustic model, Clin. Linguist. Phon. 20 (2006) 35-49.
[61] B. Radish Kumar, J.S. Bhat, N. Prasad, Cepstral analysis of voice in persons with vocal nodules, J. Voice 24 (2010) 651-653.
[62] R.K. Balasubramanium, J.S. Bhat, S. Fahim 3rd, R. Raju 3rd, Cepstral analysis of voice in unilateral adductor vocal fold palsy, J. Voice 25 (2011) 326-329.
[63] S.Y. Lowell, R.H. Colton, R.T. Kelley, Y.C. Hahn, Spectral- and cepstral-based measures during continuous speech: capacity to distinguish dysphonia and consistency within a speaker, J. Voice 25 (2011) 223-232.
[64] M.R. Key, Linguistic behaviour of male and female, Linguistics 88 (1972) 15-31.
[65] C.T. Ferrand, R.L. Bloom, Gender differences in children's intonational patterns, J. Voice 10 (1996) 281-291.