Conference Paper
Presented at the Conference on
Audio for Virtual and Augmented Reality
2016 September 30–October 1, Los Angeles, CA, USA
This conference paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at
least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This conference paper has been
reproduced from the author’s advance manuscript without editing, corrections, or consideration by the Review Board. The
AES takes no responsibility for the contents. This paper is available in the AES E-Library (http://www.aes.org/e-lib), all rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the
Audio Engineering Society.
ABSTRACT
The contribution presents an experiment that uses the room acoustic parameter Direct-to-Reverberant Energy Ratio
(DRR) to resolve the room divergence effect in binaural listening via headphones. Perceived externalization of
auditory events decreases if there is acoustic divergence between the listening room and the resynthesized room.
The DRR is used to push the synthesis towards the listening room in order to increase externalization. The
listeners adjust the DRR of the synthesis to the expected DRR of the listening room until they perceive congruence
between synthesis and listening room. The results show that the DRR is a suitable acoustic parameter: the
listeners are able to reliably adjust the DRR to that of the listening room solely from their expectations, without
an explicit external reference. A subsequent quality test shows that the congruent DRR conditions have only a minor
effect on increasing externalization under divergent room conditions.
[…] achieve congruence between the listening room and the resynthesized room and to increase externalization.
The DRR is investigated as one possible acoustic parameter that can be used to adapt the system in order to reach a high QoE. Former studies show that adjusting the DRR can increase the perceived congruence between synthesized and listening room [9]. That experiment also shows a similarity between the inter-quartile distances (IQDs) of the adjustments and the just-noticeable differences (JNDs) reported for the perception of DRR. From that work it is conjectured that adjusting the DRR is a valid method to adapt a binaural synthesis to the context parameter listening room, because the small IQDs indicate a high inter-rater reliability. This contribution repeats the DRR adjustment with more test persons and evaluates its effect on the perceived externalization of auditory events.

2 Binaural Synthesis System
The binaural synthesis system used consists of a headphone system using individual and artificial binaural room impulse responses (BRIRs; the artificial ones measured with a KEMAR head and torso simulator) for the selected rooms, sound sources, and positions. A non-dynamic system without head tracking is used so that dynamic cues do not resolve perceptual ambiguities such as front-back confusions and in-head localization. A customizable binaural system is used to increase the fidelity of the simulation compared to real loudspeakers [11]. The use of individual BRIRs reduces within-cone and out-of-cone confusion errors [10]. Different rooms have been chosen to include different room acoustic properties such as reverberation and source-receiver distances. Reverberation supports the perception of externalization of an auditory illusion and the impression of distance [12]. The headphones are equalized using individual headphone transfer functions (HPTFs) if individual BRIRs are used, and using HPTFs of the head and torso simulator if artificial BRIRs are used. In-ear microphones are used to measure individual BRIRs and HPTFs at the entrance of the blocked ear canal of each test person [13]. The microphones are not removed between the BRIR and HPTF measurements. The inverse of an HPTF is calculated by a least-squares method with minimum-phase inversion [14]. BK211 extra-aural headphones [15], which fulfill the requirements for open headphones, are used for playback [13]. Extra-aural headphones in particular minimize the distortion of the sound incidence from the real sources in the room to the listener’s ears. This allows a test design with listening to real loudspeakers and to the resynthesis of those loudspeakers via headphones.

3 Direct-to-Reverberant Energy Ratio
Sound waves are emitted from a sound source in multiple directions, depending on the frequency and directivity of the source. The propagation of these waves is likely to be disturbed by the boundary surfaces of the room, and a superposition of sound waves is measurable and perceivable at the listening point. Only the sound energy that reaches the listening point undisturbed is defined as direct sound. The sound waves reflected by the room surfaces are called reverberant sound; this sound is delayed and reaches the listening point via several paths. In a binaural context, we define the direct sound as sound that reaches the ears up to 1.5 ms after the undisturbed sound. This time span includes monaural and binaural cues resulting from reflections and diffractions at the outer ear, head, and torso, but not from the room. The reverberant sound is defined as sound reaching the listening point after the direct sound. The energy ratio between direct and reverberant sound is called the DRR. The DRR strongly depends on the distance between source and receiver; in addition, for a fixed source-to-receiver distance, the DRR is a room-specific acoustic parameter. The DRR can be calculated as

$$\mathrm{DRR} = 10\log_{10}\frac{\int_{0}^{T} h^{2}(t)\,\mathrm{d}t}{\int_{T}^{\infty} h^{2}(t)\,\mathrm{d}t} \qquad (1)$$

with h(t) as the impulse response between two points in an enclosure and T = 1.5 ms separating direct sound and room reflections.

Several experiments have been conducted in the past to determine the JND of DRR perception at the 50% discrimination point of the psychometric function. JND values between 2 dB and 20 dB are reported, depending on test design and DRR magnitude. For loudspeaker listening, [16] reports a minimum JND of 2 dB at 0 dB DRR, rising to 20 dB at ±20 dB DRR. For virtual acoustics, JNDs of 5 dB to 6 dB are assessed in [17], and JNDs of 2.4 dB to 8.7 dB are reported in [18].

4 Perceptual Evaluation
This section describes the apparatus and the methodology of the experiment. The experiment consists of a preceding pre-test session, an adjustment session, and a quality rating session, held in two rooms with different room acoustic characteristics. Twenty-two test persons who are inexperienced in binaural perceptual experiments, with a mean age of 29 years, participate in the listening test. A speech-shaped noise signal, a male speech signal
AES Conference on Audio for Virtual and Augmented Reality, Los Angeles, CA, USA.
2016 September 30–October 1,
Page 2 of 7
Werner, Klein, and Sporer DRR and Externalization in Binaural Synthesis
(Dutch speaker; no test person speaks Dutch), and a short excerpt of saxophone playing are used as audio signals, with durations of four, five, and nine seconds.

a. Quality Feature of Investigation
The experiment focuses on the quality feature externalization as one feature describing the perception of an auditory event. Externalization describes whether the auditory event is perceived outside or inside the head of the listener [5, 7, 19]. It is crucial for reaching a plausible spatial auditory illusion with binaural headphone systems. Externalization is treated as a dichotomous quality feature and expressed as an index, i.e., the proportion of externalized ratings, derived from ratings on a three-point scale. In addition to the scale points “in-head” and “outside the head”, a transition point “outside but close to the head” is used. This scale accounts for each test person’s individual mapping of the percept of externalization onto the scale. Only the scale point “outside the head” is counted as an externalized auditory event in the further analysis; we regard an event perceived very close to the head or ears as in-head-localized, i.e., non-externalized. We suppose that this conservative approach maps the ratings in a reliable way referred […]

Acoustic divergence between the listening room and the resynthesized room is the main object of investigation. A listening lab (abbreviated HL; Rec. ITU-R BS.1116-1, V = 179 m³, RT60 = 0.3 s) and an empty seminar room (abbreviated SR; V = 182 m³, RT60 = 2.0 s [21]) are used as recording and test rooms. Playback is therefore possible in convergent and divergent combinations of synthesized room and listening room.

Within the adjustment session (see section 4c), the test persons adjust the DRR of the synthesized scene until they perceive congruence between the synthesis and the real listening room. The internal reference of the test person serves as the reference for this adjustment. The adjustment is done in both rooms, each with resynthesis of both rooms. Adjusting the DRR of the synthesis yields further combinations of listening and synthesized room. The names of the room conditions follow the nomenclature “synthesized room (DRR adjustment on room)_listening room”. Figure 1 gives an overview of the different room combinations and of the congruence of the DRR between the DRR-adjusted synthesized room and the listening room.
Figure 2. Adjustable steps of the DRR in dB as median and 0.25/0.75 quantiles from the BRIR measurements of all test persons’ right ears and 240° source direction; left: listening lab, right: seminar room; step 47 is the measured BRIR.

5 Ratings
The analysis of the ratings comprises the DRR adjustment and the externalization ratings. The localization ratings, which are also available, are not analysed in this contribution for the sake of clarity. Furthermore, the ratings for the three audio signals are combined in the presented analysis. No significant differences between the audio signals are found for the DRR and the externalization indices at the p<.05 level (Fisher’s exact test for externalization and Wilcoxon signed-rank test for DRR).
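The externalization index defined in Section 4a is the share of ratings on the scale point “outside the head”. The sketch below computes it together with a 95% binomial confidence interval. The paper does not name the interval method; the Wilson score interval used here is an assumption (an exact Clopper-Pearson interval would be an equally plausible choice), and the string labels for the three scale points are hypothetical.

```python
import math

# Hypothetical labels for the three scale points described in Section 4a.
IN_HEAD = "in-head"
CLOSE = "outside but close to the head"
OUTSIDE = "outside the head"


def externalization_index(ratings):
    """Return (index, k, n): the share k/n of ratings on 'outside the head'.

    The transition point 'outside but close to the head' is counted as
    non-externalized, following the conservative mapping of Section 4a.
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("no ratings given")
    k = sum(1 for r in ratings if r == OUTSIDE)
    return k / n, k, n


def wilson_interval(k, n, z=1.959963984540054):
    """Approximate 95% Wilson score interval for the binomial proportion k/n."""
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, five ratings of which three are “outside the head” give an index of 0.6; the ratings “outside but close to the head” and “in-head” both lower the index.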
[…] loudspeakers in the seminar room are rated with externalization indices close to one (see figure 4). However, not all loudspeaker test stimuli are rated as externalized, which is especially visible for the front and back directions. The binaurally resynthesized loudspeakers using artificial head BRIRs are rated […] real loudspeakers. Significant differences (p<.05) are found only for the 0° and 30° directions. Here, too, the lowest indices occur for the front and back directions. The non-binaural anchor signal is rated with externalization indices slightly above 0.2. The pre-test shows that the signals and systems used are valid for the further experiment.

Figure 4. Externalization indices from the pre-test with 95% binomial confidence intervals; resynthesis of a seminar room in the same room using dummy head BRIRs. (Plot: externalization index from 0 to 1 over source direction 0°, 30°, 60°, 180°, 240°, 300° for the loudspeaker, binaural, and anchor stimuli.)

Figure 5. Adjusted DRR in dB of the test persons as boxplots with the 95% confidence interval as notch; the dashed line indicates the DRR of the listening room; circles indicate outliers; top: listening lab (HL) as listening […] (Four panels: absolute DRR in dB from 0 to 40 over source direction 0°, 30°, 60°, 180°, 240°, 300°.)

a. DRR Adjustment
Figure 5 shows the adjusted DRR for the resynthesis of the seminar room or the listening lab in the seminar room or the listening lab. A dashed line indicates the DRR of the listening room. For the congruent room conditions “HL in HL” and “SR in SR”, the median of the DRR adjustment coincides with the DRR of the listening room. Higher DRRs, of about 6.8 dB (mean over all directions), are adjusted for the divergent room condition “SR in HL”: the test persons chose a resynthesis of the seminar room that is less reverberant than the listening room. Slightly higher DRRs are also chosen for the divergent room condition “HL in SR”. This effect can be explained by the DRR steps available in the test. Figure 2 shows that the most reverberant possible DRR of the listening lab (step 70) is approx. 3 dB higher than the originally measured DRR of the seminar room (step 53 in figure 2 right), so the DRR of the seminar room could not be reached with the resynthesized listening lab.

[…] the range of the JND of DRR perception (see section 3). The mean IQDs over all directions are 9.3 dB for “SR in HL”, 4.0 dB for “HL in HL”, 4.7 dB for “SR in SR”, and 2.9 dB for “HL in SR”.

b. Externalization rating
Figures 6 and 7 show the ratings of the quality test for the different room conditions and adjusted room conditions. The externalization indices and the 95% binomial confidence intervals are shown for the different directions. Significances and effect sizes (odds ratios) are calculated using Fisher’s exact test. The room conditions in question are highlighted with respective asterisks and numbers. Figure 6 shows the ratings for the more reverberant seminar room as listening room, while figure 7 shows those for the listening lab.
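The significance tests and odds ratios mentioned above compare two room conditions in a 2×2 table of externalized and non-externalized ratings. A minimal standard-library sketch of a two-sided Fisher’s exact test follows; it uses the common convention of summing the probabilities of all tables with the observed margins that are no more likely than the observed table. The function name and the table layout are assumptions for illustration, not taken from the paper.

```python
from math import comb, inf, nan


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: two room conditions; columns: externalized / not externalized.
    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of the table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-7))

    if b * c == 0:
        # Odds ratio is unbounded (or undefined if a*d is also zero).
        odds = inf if a * d > 0 else nan
    else:
        odds = (a * d) / (b * c)
    return odds, min(p, 1.0)
```

An infinite odds ratio results whenever one condition has no ratings in an off-diagonal cell, for instance when all ratings of one condition are externalized and none of the other.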
[Figure 6: externalization index (0 to 1) over the room conditions SR(SR)_SR, SR(HL)_SR, HL(HL)_SR, and HL(SR)_SR, one panel per source direction (0°, 30°, 60°, 180°, 240°, 300°), with asterisks and odds ratios marking the compared conditions.]
[Figure 7: externalization index per room condition with the listening lab as listening room, one panel per source direction.]

[…] (significant at p<.001 or at least with p<.11). This observation is also consistent with the influence of reverberation on externalization. Increasing the amount of reverberation for the resynthesis of the listening lab in the listening lab (“HL(SR)_HL”) does not significantly (p<.2) increase or decrease the externalization indices compared to the congruent condition “HL(HL)_HL”.

6 Conclusion
[…] divergence between the expected room and the synthesized room also depends on the reflection […] effect and to adapt the synthesis on the context of […]