

Differences between ANSI S3.4-2007 and the proposed ISO532-2

Article in The Journal of the Acoustical Society of America · September 2015


DOI: 10.1121/1.4933931


1 author:

Brian C J Moore
University of Cambridge

All content following this page was uploaded by Brian C J Moore on 14 January 2016.



Differences between ANSI S3.4-2007 and the proposed ISO532-2

Brian C.J. Moore

Department of Psychology, University of Cambridge, Downing Street,
Cambridge, CB2 3EB, U.K.

Background
ANSI S3.4-2007 is based on the loudness model developed by Moore,
Glasberg and Baer (Journal of the Audio Engineering Society, 1997), but with
the modified middle-ear transfer function described in Glasberg and Moore
(JASA, 2006)
Model assumption: loudness simply sums across ears
Loudness is calculated for each ear and then added across ears
Implications:
(1) A diotic sound is twice as loud as the same sound presented monaurally
(2) The level difference required for equal loudness (LDEL) of monaural and
diotic sounds is about 10 dB for mid-range levels (since a change in level of
10 dB gives roughly a factor of 2 change in loudness)
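
The arithmetic behind implications (1) and (2) can be sketched as follows. This is only an illustration: it assumes the rule of thumb quoted above (loudness doubles for every 10 dB at mid-range levels), and the function name `ldel_db` and its parameter `db_per_doubling` are made up for the example.

```python
import math


def ldel_db(loudness_ratio, db_per_doubling=10.0):
    """Level difference (dB) that offsets a given loudness ratio.

    Assumes loudness doubles for every `db_per_doubling` dB, which is
    only a rough rule for mid-range levels.
    """
    if loudness_ratio <= 0:
        raise ValueError("loudness ratio must be positive")
    return db_per_doubling * math.log2(loudness_ratio)


monaural = 1.0                   # loudness of the sound in one ear (arbitrary units)
diotic = monaural + monaural     # simple summation across ears
ratio = diotic / monaural

print(ratio, ldel_db(ratio))     # prints "2.0 10.0"
```

Under the same rule, a smaller diotic/monaural ratio maps to a smaller LDEL, which is why the two quantities are discussed together.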

Data on binaural loudness summation

• Most recent data obtained using methods that reduce biases
suggest that loudness does not simply sum across ears:
– A diotic sound is less than twice as loud as the same sound presented
monaurally: estimates for the ratio range from about 1.1 to 1.8
– The LDEL is typically about 5-6 dB for mid-range levels
• Binaural loudness summation may be affected by “higher-level” factors
– The diotic/monaural ratio and the LDEL appear to be lower for speech
sounds presented audio-visually via a loudspeaker (Epstein and
Florentine, JASA, 2012) than for audio-alone headphone presentation
– Not handled by current loudness models
• Scharf (JASA, 1969) showed that the loudness of a tone
presented to one ear was reduced when a tone with a
different frequency was presented to the opposite ear
– Consistent with the concept of binaural inhibition
– The effect was broadly tuned in frequency

Implementing binaural inhibition

• Moore and Glasberg (JASA, 2007) modified the loudness
model to include broadly tuned binaural inhibition
ISO 532-2 is based on this version of the model
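
A heavily simplified sketch of how binaural inhibition might be computed is given below. The slides do not state the formula: the sech-based gain and the constant `THETA` are assumptions based on the published description of the Moore and Glasberg (2007) model, and the broad smoothing across frequency is left out entirely, so each ear is reduced to a single loudness number. All function names are hypothetical.

```python
import math

# Assumed constant (not given on the slides); with this value
# sech(1) ** THETA is about 0.5, which yields a diotic/monaural ratio near 1.5.
THETA = 1.5978


def inhibition(own, other, theta=THETA):
    """Factor by which one ear's loudness is divided, given the other ear's loudness."""
    if own <= 0.0:
        return 1.0                      # nothing to inhibit
    ratio = min(other / own, 50.0)      # cap the ratio so cosh() cannot overflow
    sech = 1.0 / math.cosh(ratio)
    return 2.0 / (1.0 + sech ** theta)  # 1 (no inhibition) up to 2 (strong inhibition)


def binaural_loudness(left, right):
    """Sum the two ears after each has been inhibited by the other."""
    return left / inhibition(left, right) + right / inhibition(right, left)


monaural = binaural_loudness(1.0, 0.0)  # sound in one ear only
diotic = binaural_loudness(1.0, 1.0)    # same sound in both ears
ratio = diotic / monaural               # close to 1.5 rather than 2
```

With these assumptions a monaural sound is not inhibited at all, while each ear of a diotic sound contributes about three quarters of its monaural loudness.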

Predictions of the model

• A diotic sound is 1.5 times as loud as the same sound
presented monaurally
• The LDEL is 5-6 dB for mid-range levels and smaller at low
levels
• Broadly consistent with empirical data
• The model also gives reasonably accurate predictions of the
loudness of sounds:
– whose spectra differ at the two ears (Zwicker & Zwicker, 1991, JASA;
Glasberg and Moore, JASA, 2010)
– whose level differs at the two ears (Keen, 1972, JASA; Zwicker &
Zwicker, 1991, JASA; Shao et al., 2015, JASA)

Prediction of the data of Scharf (1969)

[Figure not reproduced.] Circles show data; lines show predictions.
The ordinate shows the LDEL between a monaural tone (frequency
close to 500, 1000 or 2000 Hz) and that tone together with a tone of
different frequency presented to the other ear
Summary: Binaural loudness summation
• The loudness model incorporating binaural inhibition
gives reasonably accurate predictions of:
– the loudness ratio of diotic and monaural sounds
– the LDEL
– the loudness of sounds with different spectra at the two
ears
– the loudness of sounds with different levels at the two ears
– the effect of a sound in one ear on the loudness of a
different sound in the other ear
• The model does not take into account “higher level”
factors:
– the influence of visual cues
– the impression of speaking effort of a talker

Loudness of time-varying sounds
• The models in ANSI S3.4-2007 and ISO532-2 apply only to
steady sounds
• The spectra of the sounds are used as input
• Glasberg and Moore (Journal of the Audio Engineering
Society, 2002) extended the 1997 model to deal with time-
varying sounds - the TVL model
• This version of the model was based on summation of
loudness across ears
• The model has been modified to incorporate binaural
inhibition (Moore, 2014, Trends in Hearing)
• It also incorporates the modified middle-ear transfer function
proposed by Glasberg and Moore (JASA, 2006)

Block diagram of the model for time-varying sounds
[Figure: block diagram, not reproduced]

Implementation details


• The model accepts a two-channel (stereo) waveform as its input (32-kHz
sample rate)
• Transfer through the outer and middle ear is modelled using a single finite
impulse response filter with 4097 coefficients
• Predefined filters for free-field, diffuse field and middle-ear alone
• To give good spectral resolution at low frequencies and good temporal
resolution at high frequencies, six FFTs are calculated in parallel, based
on Hanning-windowed signal segments with durations of 2, 4, 8, 16, 32
and 64 ms, all aligned at their temporal centres
• All FFTs are updated at 1-ms intervals
• Each FFT is used to calculate spectral magnitudes over a specific range:
20-80, 80-500, 500-1250, 1250-2540, 2540-4050 and 4050-15000 Hz
• An excitation pattern is calculated at 1-ms intervals, with centre
frequencies spaced by 0.25 Cam (units of the ERB_N-number scale)
• “Instantaneous loudness” is calculated from the short-term excitation
pattern
• This is an intervening variable, not available for conscious perception
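
The multi-resolution spectral analysis and the 0.25-Cam frequency grid described above can be sketched roughly as follows. This is an illustration under assumptions, not the model's actual code. The pairing of the longest window with the lowest band is inferred from the stated aim of good spectral resolution at low frequencies. Zero-padding to 2048 points, the amplitude scaling, the small pure-Python FFT, the 20-15000 Hz grid limits and all function names are choices made for the example. The Hz-to-Cam conversion uses the widely cited Glasberg and Moore (1990) formula, which the slides do not state.

```python
import cmath
import math

FS = 32000    # sample rate in Hz, as stated above
NFFT = 2048   # 64 ms at 32 kHz; shorter segments are zero-padded to this length
# (window duration in ms, lower band edge in Hz, upper band edge in Hz).
# Assumption: the longest window serves the lowest band.
BANDS = [(64, 20, 80), (32, 80, 500), (16, 500, 1250),
         (8, 1250, 2540), (4, 2540, 4050), (2, 4050, 15000)]


def fft(x):
    """Recursive radix-2 FFT for sequences whose length is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def multi_resolution_spectrum(frame):
    """Return {frequency_hz: amplitude} for one 64-ms frame of NFFT samples."""
    if len(frame) != NFFT:
        raise ValueError("frame must contain NFFT samples")
    centre = NFFT // 2
    spectrum = {}
    for dur_ms, f_lo, f_hi in BANDS:
        n = FS * dur_ms // 1000          # segment length in samples
        start = centre - n // 2          # all segments share one temporal centre
        win = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # Hann
        seg = [s * w for s, w in zip(frame[start:start + n], win)]
        bins = fft(seg + [0.0] * (NFFT - n))
        scale = 2.0 / sum(win)           # so a unit-amplitude sinusoid reads about 1
        for k in range(NFFT // 2):
            f = k * FS / NFFT
            if f_lo <= f < f_hi:         # keep only this window's frequency range
                spectrum[f] = abs(bins[k]) * scale
    return spectrum


def hz_to_cam(f_hz):
    """ERB_N-number (Cam) for a frequency in Hz (Glasberg and Moore, 1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def cam_to_hz(cam):
    """Inverse of hz_to_cam."""
    return (10.0 ** (cam / 21.4) - 1.0) * 1000.0 / 4.37


def centre_frequencies(f_lo=20.0, f_hi=15000.0, step_cam=0.25):
    """Centre frequencies spaced by step_cam on the Cam scale (limits assumed)."""
    lo, hi = hz_to_cam(f_lo), hz_to_cam(f_hi)
    count = int((hi - lo) / step_cam) + 1
    return [cam_to_hz(lo + i * step_cam) for i in range(count)]
```

For a running analysis, the 64-ms frame would be advanced by 32 samples (1 ms at 32 kHz) and the function called again for each step.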

Calculation of short-term loudness

• Short-term loudness corresponds to the loudness of
a brief segment of a sound, e.g., a specific syllable in
speech or note in music
• It is calculated from the instantaneous loudness
using a form of averaging that resembles an
automatic gain control (AGC)
• The “attack” time is short – loudness can increase
rapidly when a sound is turned on
• The “release” time is longer – this may correspond to
the persistence of activity at some level in the
auditory system

Calculation of long-term loudness

• Long-term loudness corresponds to the overall
loudness impression of a relatively long piece of
sound, e.g., a sentence or a musical phrase
• It is calculated from the short-term loudness, again
using a form of averaging that resembles an
automatic gain control (AGC)
• Time constants are longer than for the first averager
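
The two AGC-like averaging stages can be sketched as a one-pole smoother whose coefficient depends on whether the input is rising or falling. This is a minimal sketch, not the model's code: the coefficient values below are illustrative placeholders (the slides give no numbers), chosen only so that attack is faster than release and the second stage is slower than the first. The name `agc_smooth` is hypothetical.

```python
def agc_smooth(values, attack, release):
    """Smooth a sequence sampled at 1-ms steps with separate attack and release.

    `attack` is used while the input exceeds the running value,
    `release` otherwise; larger coefficients mean faster tracking.
    """
    out = []
    state = 0.0
    for v in values:
        coeff = attack if v > state else release
        state = coeff * v + (1.0 - coeff) * state
        out.append(state)
    return out


# Test input: 200 ms of constant instantaneous loudness, then 200 ms of silence.
instantaneous = [1.0] * 200 + [0.0] * 200

# Placeholder coefficients (assumptions, not values from the slides).
short_term = agc_smooth(instantaneous, attack=0.045, release=0.02)
long_term = agc_smooth(short_term, attack=0.01, release=0.0005)
```

With these placeholder values the short-term output rises quickly at onset and decays more slowly after offset, while the long-term output changes far more gradually in both directions.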
Empirical tests of the model
• The model gives reasonably accurate predictions of the
overall loudness of technical sounds:
– Rennies, J., Wächtler, M., Hots, J., and Verhey, J. (2015).
"Spectro-temporal characteristics affecting the loudness of technical
sounds: data and model predictions," Acta Acust. - Acust. (in press).
[Figure not reproduced.] The ordinate shows mean level
differences between test signal and reference signal (jet linear)
at equal loudness
All models perform badly for ratchet and planing machine
The long-term loudness of the TVL model gives better
predictions than other models for some sounds

Empirical tests (2)
• The model also gives reasonably accurate
predictions of the loudness of speech and speech-
like signals, including signals subjected to dynamic
range compression and expansion
– Moore, B. C. J., Glasberg, B. R., and Stone, M. A. (2003).
"Why are commercials so loud? - Perception and modeling
of the loudness of amplitude-compressed speech," J.
Audio Eng. Soc. 51, 1123-1132.
– Rennies, J., Holube, I., and Verhey, J. L. (2013).
"Loudness of speech and speech-like signals," Acta Acust.
- Acust. 99, 268-282.
• The long time constants used to calculate the long-
term loudness appear to be advantageous

Summary and conclusions


• There is now considerable evidence indicating that loudness
does not simply sum across ears, as is assumed in ANSI
S3.4-2007
• The model incorporating binaural inhibition as described in
ISO532-2 gives reasonably accurate predictions of binaural
loudness
– this model is proposed as a revision to ANSI S3.4-2007
• The model for stationary sounds has been extended to deal
with time-varying sounds
• The extended model gives reasonably accurate predictions of
the loudness of technical sounds and speech sounds
• The model for time-varying sounds incorporating binaural
inhibition is proposed as an extension of ANSI S3.4 and as
ISO532-3
– this model is suitable for use with signals picked up by a dummy head
