
Comparative Study of Effective Soundfield Reconstruction
Preprint 2842 (G-2)

Dermot J. Furlong
Trinity College Dublin
Dublin, Ireland

Presented at
the 87th Convention
1989 October 18-21
New York
This preprint has been reproduced from the author's advance
manuscript, without editing, corrections or consideration by
the Review Board. The AES takes no responsibility for the
contents.

Additional preprints may be obtained by sending request
and remittance to the Audio Engineering Society, 60 East
42nd Street, New York, New York 10165, USA.

All rights reserved. Reproduction of this preprint, or any
portion thereof, is not permitted without direct permission
from the Journal of the Audio Engineering Society.

AN AUDIO ENGINEERING SOCIETY PREPRINT


Comparative Study of Effective Soundfield Reconstruction

D.J. Furlong

Dept. of Microelectronics and Electrical Eng.
Trinity College Dublin,
Ireland.

Abstract

Given that the goal of exact physical reconstruction of primary (concert hall) soundfields in secondary (domestic) environments is not practical, attention needs to be directed towards the optimisation of an effective approximation. Whilst the necessity for flat frequency response of all audio components is a widely accepted criterion for the correct reconstruction of the direct soundfield component, there is little agreement as to how the indirect soundfield component should be handled, even though it is generally recognised that the indirect component is critical to the quality of the acoustic impression of a concert hall. This study examines how well a quantitative measure of subjective preference for concert hall soundfields is preserved in transmission from primary to secondary environments under mono, stereo, and ambisonic formats, thereby providing an index of effective soundfield (direct and indirect) reconstruction.

Introduction

It is perhaps surprising to the average person that the bane of that part of audio engineering which concerns itself with the reproduction, in a secondary domestic environment, of a primary concert hall performance is the lack of a clearly defined overall design objective. What might be even more surprising is that many working in this field have failed to realise this shortcoming. Yet fundamental to the process which constitutes engineering design is the establishment and articulation of objectives and criteria by which the end product may be assessed. When such are lacking, it is true to say that the pursuit being undertaken cannot be categorised as engineering science, and it is indeed the case that much of what has been undertaken in audio engineering constitutes what is referred to as "black art" rather than systematic design procedure. Nowhere is this more manifest than in the specification of microphone arrangements for signal capture and loudspeaker systems for signal reconstruction. Much debate revolves around the number, location and type of microphones for concert hall recording and of loudspeakers for domestic reproduction. For example, whether spaced or coincident microphones are used for stereo recording, and whether loudspeakers have a flat amplitude or power response, are popularly relegated to being a matter of individual preference, as there is no accepted basis by which to establish the superiority of any particular technique. That the transduction aspects of recording and reproduction abound most in confused design objectives is a result of the fact that it is at these stages that a single-input single-output system representation ceases to be sufficient: the concern is not merely with the frequency response between input and output points, as is the case, say, for amplifiers, but rather with the handling of spatially distributed soundfields.
Given this state of affairs, it is essential that we formulate clearly a design objective for
the audio recording and reproduction system as a whole, to which all subsequent design
decisions may be referred, be they global or local. Thus, it will be stated that the
function of the audio recording and reproduction system should be to
optimally reconstruct the concert hall experience for the domestic living
room listener. This may seem a very general statement, but it has two immediate
consequences. First, it distinguishes the domain of discourse and thereby segregates
relevant from irrelevant considerations. For example, and by way of analogy, in the
domain of photography, standard camera design is undertaken with reference to the
objective of projecting a three-dimensional space onto a two-dimensional surface with a
minimum of distortion. No consideration is given to ultimate three-dimensional image
reconstruction, as the defined purpose of the device is to generate a satisfactory planar
representation. However, if the design objective were to be altered by specifying that the
final image should reconstruct the three-dimensional impression of the original space, then
camera designers would need to broaden their considerations to stereoscopy or holography,
which considerations would undoubtedly have an effect on the technical specification of the
details of the camera system. When we consider that it is widely recognised that the
concert hall listener experience is very much influenced by the acoustic of the performance
space, it becomes evident, in the light of the specified objective, that the reconstruction
techniques used must be such as to preserve the impression of the concert hall soundfield
and not merely to reproduce the sound of the orchestra, i.e. it is not sufficient to translate
the orchestra into the living room acoustic, but, rather, we seek to translate the experience
of hearing the orchestra in the concert hall acoustic to the living room listener. The second
consequence of the stated design objective is that it refers all subsequent considerations to
the comparison of the experience of the listener in the primary environment with that in
the secondary environment. This very simple and obvious requirement has rarely been
articulated with any force (but see Han [1]), with the result that the specification and design
of audio components, such as loudspeakers, has usually proceeded without direct reference
to how the resulting listener experience compares to that in the concert hall, and,
consequently, with considerable disagreement as to what constitutes a desirable design
objective [2].

Physical and Perceptual Soundfield Reconstruction

The most direct method of achieving the specified objective would be to initially exclude
the listener and restate the design goal in terms of physical reconstruction of the concert hall
soundfield in the domestic living room. That is to say, if we were to reproduce exactly
such a portion of the concert hall soundfield as would fit into the living room, then the
aural experience of the reintroduced listener would have to be the same as that in the
corresponding region of the concert hall. This holographic approach has been investigated
by Romano [3]. Consider recording at M points in a concert hall, and playing back
through L loudspeakers in a listening room such that the original soundfield is exactly
reconstructed at the corresponding M points. Evidently, if the original soundfield is to be
recreated, then compensatory filtering must be introduced for the response of the secondary
listening room. In dealing with the recorded signals and the loudspeaker to listening
position impulse responses, it is essential that directional information be preserved and
consequently a vector representation is used. Thus, if the recorded signals, which we seek to reconstruct, are given by $y_1(x,y,z \mid t), y_2(x,y,z \mid t), \ldots, y_M(x,y,z \mid t)$, where $y_i(x,y,z \mid t) = [y_i(x(t)), y_i(y(t)), y_i(z(t))]$, and the impulse response between loudspeaker $i$ and point $j$ is $h_{ij}(x,y,z \mid t) = [h_{ij}(x(t)), h_{ij}(y(t)), h_{ij}(z(t))]$, then the input $x_i(t)$ necessary for each of the loudspeakers may be found by inverse Fourier transforming each of the $X_i(f)$ given by:

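$$
\begin{bmatrix} X_1(f) \\ X_2(f) \\ \vdots \\ X_L(f) \end{bmatrix}
=
\begin{bmatrix}
H_{11}(x(f)) & H_{21}(x(f)) & \cdots & H_{L1}(x(f)) \\
H_{11}(y(f)) & H_{21}(y(f)) & \cdots & H_{L1}(y(f)) \\
H_{11}(z(f)) & H_{21}(z(f)) & \cdots & H_{L1}(z(f)) \\
\vdots & \vdots & & \vdots \\
H_{1M}(x(f)) & H_{2M}(x(f)) & \cdots & H_{LM}(x(f)) \\
H_{1M}(y(f)) & H_{2M}(y(f)) & \cdots & H_{LM}(y(f)) \\
H_{1M}(z(f)) & H_{2M}(z(f)) & \cdots & H_{LM}(z(f))
\end{bmatrix}^{-1}
\begin{bmatrix} Y_1(x(f)) \\ Y_1(y(f)) \\ Y_1(z(f)) \\ \vdots \\ Y_M(x(f)) \\ Y_M(y(f)) \\ Y_M(z(f)) \end{bmatrix}
$$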
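As an illustration of the inversion above, the following minimal Python sketch solves the system for a single frequency bin by Gaussian elimination with partial pivoting. It assumes the transfer matrix is square (3M = L), so that the inverse exists; the function name `solve_bin` and the toy 3 × 3 matrix are hypothetical. A complete filter design would repeat the solve for every frequency bin before inverse Fourier transforming each $X_i(f)$.

```python
def solve_bin(H, Y):
    """Solve H X = Y for the loudspeaker spectra X at one frequency bin.

    H: square list of lists of complex transfer values (rows are the 3M
       x/y/z components at the M points, columns are the L loudspeakers).
    Y: list of the 3M recorded components at this frequency.
    """
    n = len(H)
    if any(len(row) != n for row in H) or len(Y) != n:
        raise ValueError("sketch assumes a square system (3M == L)")
    # Augmented working copy [H | Y] so the caller's data are untouched.
    a = [[complex(v) for v in row] + [complex(y)] for row, y in zip(H, Y)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("transfer matrix is singular at this frequency")
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate the current column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))
        x[r] = acc / a[r][r]
    return x


# Toy check: one point (M = 1, three vector components) and L = 3 loudspeakers.
H = [[1 + 1j, 2.0, 0.5j],
     [0.3, 1 - 2j, 1.0],
     [2j, 0.1, 3.0]]
X_true = [1.0, -1j, 0.5 + 0.5j]
Y = [sum(h * x for h, x in zip(row, X_true)) for row in H]
X = solve_bin(H, Y)
```

When 3M differs from L, a least-squares (pseudo-inverse) solution would be the natural substitute for the exact inverse.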

Although this filtering could be achieved with precision, information theory considerations reveal the impracticality of such an endeavour. If we were to reconstruct the full 20 kHz bandwidth soundfield, it would be necessary to specify M points in a volume V with maximum spacing $S = \lambda_{min}/8 = 343/(20 \times 10^3 \times 8) \approx 2.14 \times 10^{-3}$ metres. So, for even a 1 cubic metre volume, the number of microphone-to-listening-point channels required, M, would be of the order of $V/S^3 \approx 10^8$. Obviously, holographic reconstruction is not easily or cheaply realisable, and therefore it is necessary to re-examine the objective of complete physical mapping between spaces.
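The channel-count estimate can be reproduced in a few lines of Python. The sketch assumes, as the estimate implicitly does, that the M points lie on a cubic grid of pitch S, so that M is approximately V/S³.

```python
C = 343.0          # speed of sound in m/s, as in the text
F_MAX = 20e3       # upper limit of the audio band in Hz
VOLUME = 1.0       # reconstruction volume V in cubic metres

lambda_min = C / F_MAX        # shortest wavelength, about 17 mm
spacing = lambda_min / 8      # maximum point spacing S used in the text
points = VOLUME / spacing**3  # points (channels) on a cubic grid of pitch S

print(f"S = {spacing * 1e3:.2f} mm, M = {points:.2e}")  # S = 2.14 mm, M = 1.02e+08
```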

If the soundfield impression reconstructed in the domestic living room were indistinguishable from the original performance, then it would be of no consequence whether or not the exact physical sound structure had been recreated at the listener's ears. That is to say, if the recording and reproduction techniques used were such as to preserve those features of the original soundfield which are of perceptual importance, then a valid reconstruction would be achieved. The need therefore arises to identify those attributes of the concert hall soundfield which are of perceptual significance. At any listening position in a concert hall, the soundfield will be a combination of a direct component due to the straight-line propagation from the source to our ears, and an indirect component due to the reflected sound which constitutes the acoustic of the hall.
Historically, audio engineering has concentrated on refining the reconstruction of the direct
component by recognising that, at least to a first approximation, the dominant perceptual
characteristics of loudness, pitch, and timbre were strongly correlated to the physical
parameters of intensity, frequency, and spectral distribution, respectively. Perceived
impression can therefore be manipulated by physical parameter variation. As the design
objective here specified concerns the complete soundfield, it is necessary to also identify
which characteristics of the indirect soundfield are perceptually of particular importance,
and what are the correlated physical parameters. These issues have been the subject of
considerable research effort in architectural acoustics over the last three decades, and whilst
the results achieved have not always been in entire agreement, it is fair to say that
considerable progress has been made in both the identification of significant perceptual
characteristics and related physical parameters [4,5]. While reverberation time has
traditionally been recognised as an important factor in the judgement of acoustic quality,
experience has suggested that the achievement of satisfactory reverberation time values
alone does not guarantee acoustic excellence. The recognition by Schroeder [6], from
preference judgements of paired comparisons of concert hall soundfields and
multidimensional scaling analysis, of the importance of interaural coherence indirectly
substantiated the findings of Barton [7] with regard to the significance of early lateral
reflections for the production of a desirable "spatial impression" in a concert hall. Ando [8]
developed these insights by examining the nature of the signals arriving at the two ears,
identifying the relevant physical parameters, and then performing paired comparison tests
to establish listener preferred values for these parameters in relation to various types of
source signal. For the concert hall listener, the left and right ear signals resulting from an
impulsive source, p(t), on stage are given by
$$f_l(t) = \sum_{n=1}^{\infty} p(t) * A_n w_n(t - \Delta t_n) * h_{nl}(t)$$

$$f_r(t) = \sum_{n=1}^{\infty} p(t) * A_n w_n(t - \Delta t_n) * h_{nr}(t)$$

where $A_n$ and $\Delta t_n$ are the $n$th reflection amplitude and delay relative to the direct sound, $w_n(t)$ is the boundary wall impulse response, $h_{nl}$ and $h_{nr}$ are the impulse responses from free field to the left and right ears respectively, and $*$ denotes convolution. Very briefly, Ando hypothesises that all
acoustic information must be contained in these signals and that, therefore, the significant
objective parameters can be reduced to the following:

(i) Listening Level

(ii) Delay times of early reflections

(iii) Subsequent reverberation time

(iv) Interaural cross-correlation
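The binaural model for $f_l$ and $f_r$ can be sketched in discrete time as follows. This is a toy illustration under stated assumptions: signals are short lists of samples, each path is supplied as a tuple (amplitude $A_n$, delay in samples, wall response $w_n$, ear response $h_n$), and the helper names are hypothetical. Calling `ear_signal` once with the left-ear responses $h_{nl}$ and once with $h_{nr}$ gives the two ear signals.

```python
def convolve(a, b):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ear_signal(p, paths, length):
    """Sum over n of p(t) * A_n w_n(t - dt_n) * h_n(t), truncated to `length`.

    paths: iterable of (A_n, delay_n_in_samples, w_n, h_n) tuples, where w_n
    is the wall impulse response and h_n the free-field-to-ear response.
    """
    f = [0.0] * length
    for amp, delay, w, h in paths:
        wall = [0.0] * delay + [amp * v for v in w]  # A_n w_n(t - dt_n)
        contribution = convolve(convolve(p, wall), h)
        for i, v in enumerate(contribution[:length]):
            f[i] += v
    return f


# Toy example with ideal (impulsive) wall responses.
p = [1.0]
left_paths = [(1.0, 0, [1.0], [1.0, 0.5]),   # unit amplitude, no delay
              (0.5, 3, [1.0], [1.0])]        # half amplitude, 3-sample delay
print(ear_signal(p, left_paths, 6))  # [1.0, 0.5, 0.0, 0.5, 0.0, 0.0]
```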

By carrying out a series of paired comparison tests, preferred values for each of these parameters were identified in relation to various source signals (speech, vocal music, orchestral music etc.), which were characterised by their effective auto-correlation duration, $\tau_e$, this being the time taken for the auto-correlation function of the source signal to decrease to one tenth of its maximum value. The results achieved from these tests can be summarised as follows:

(i) A listening level of approximately 79 dB is most preferred for all source types.

(ii) The most preferred delay time to the strongest reflection is given by

$$[\Delta t_1]_p = [1 - \log_{10} A]\,\tau_e, \qquad \text{where } A = \left(\sum_{n=1}^{\infty} A_n^2\right)^{1/2}$$

(iii) The most preferred reverberation time is given by

$$[T_{sub}]_p = 23\,\tau_e \quad \text{(seconds)}$$

(iv) For all source types, the magnitude of the interaural cross-correlation, defined by

$$\mathrm{IACC} = |\phi_{lr}(\tau)|_{max} \quad \text{for } |\tau| < 1\ \text{ms},$$

should be as small as possible.
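The preferred values in (ii) and (iii) follow directly from $\tau_e$ and the total reflection amplitude $A$. The sketch below computes them; the reflection amplitudes in the example are hypothetical, and $\tau_e = 0.075$ s is the orchestral-music value used later in this paper.

```python
import math


def total_reflection_amplitude(amplitudes):
    """A = (sum of A_n squared) ** 0.5 over all reflections."""
    return math.sqrt(sum(a * a for a in amplitudes))


def preferred_initial_delay(tau_e, A):
    """[dt_1]_p = (1 - log10 A) * tau_e, in the same units as tau_e."""
    return (1.0 - math.log10(A)) * tau_e


def preferred_reverberation_time(tau_e):
    """[T_sub]_p = 23 * tau_e, with tau_e in seconds."""
    return 23.0 * tau_e


# Hypothetical reflection amplitudes for an orchestral source, tau_e = 0.075 s.
A = total_reflection_amplitude([0.5, 0.4, 0.3])
print(round(A, 4))                                    # 0.7071
print(round(preferred_initial_delay(0.075, A), 4))    # 0.0863
print(round(preferred_reverberation_time(0.075), 3))  # 1.725
```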
Knowing the objective soundfield parameters and their most preferred values for a particular source type allows the calculation of a total scale value of preference, S, for any listening position in a concert hall, thereby providing a quantitative preference rating. At any particular location, for any particular source type,

$$S = -\sum_{i=1}^{4} \alpha_i\,|x_i|^{3/2}$$

where, for each of the objective parameters, $\alpha_i$ and $x_i$ have been defined by a least squares fit to the measured preference data, normalised by the most preferred value. For listening level, $x_1 = 20\log_{10}(P/[P]_p)$, the level relative to the preferred 79 dB, with $\alpha_1 = 0.07$ for $x_1 \ge 0$ and $\alpha_1 = 0.04$ for $x_1 < 0$. For reflection delay time, $x_2 = \log_{10}(\Delta t_1/[\Delta t_1]_p)$, with $\alpha_2 = 1.42$ for $x_2 \ge 0$ and $\alpha_2 = 1.11$ for $x_2 < 0$. For reverberation time, $x_3 = \log_{10}(T_{sub}/23\tau_e)$, with $\alpha_3 = 0.45 + 0.74A$ for $x_3 \ge 0$ and $\alpha_3 = 2.36 - 0.42A$ for $x_3 < 0$. For interaural cross-correlation, $x_4 = \mathrm{IACC}$ and $\alpha_4 = 1.45$. The nearer S is to 0, the nearer the listening conditions are to the ideal for the particular source type in question. By evaluating the total preference value for each listener location, a map of listener preference may be established which indicates how listener preference will vary over some distributed region of the concert hall.
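The total preference value can be evaluated for one listener position as sketched below. The preferred values $[\Delta t_1]_p$ and $[T_{sub}]_p$ are passed in as arguments. Two assumptions are made here: $x_1$ is taken as the listening level in dB minus 79 dB, and the logarithms in $x_2$ and $x_3$ are base 10, as in Ando's formulation. The function name and example inputs are hypothetical.

```python
import math


def preference_scale(spl_db, dt1, dt1_pref, tsub, tsub_pref, iacc, A):
    """Total preference S = -sum_i alpha_i |x_i|^(3/2) for one listener position.

    Assumes x_1 is the level in dB relative to 79 dB and base-10 logs for
    x_2 and x_3. A is the total reflection amplitude used in alpha_3.
    """
    x1 = spl_db - 79.0
    x2 = math.log10(dt1 / dt1_pref)
    x3 = math.log10(tsub / tsub_pref)
    x4 = iacc
    a1 = 0.07 if x1 >= 0 else 0.04
    a2 = 1.42 if x2 >= 0 else 1.11
    a3 = 0.45 + 0.74 * A if x3 >= 0 else 2.36 - 0.42 * A
    a4 = 1.45
    return -sum(a * abs(x) ** 1.5
                for a, x in ((a1, x1), (a2, x2), (a3, x3), (a4, x4)))


# Ideal conditions give S = 0; any deviation from them makes S negative.
ideal = preference_scale(79.0, 0.0863, 0.0863, 1.725, 1.725, 0.0, 0.7071)
degraded = preference_scale(82.0, 0.040, 0.0863, 1.8, 1.725, 0.25, 0.7071)
print(ideal == 0.0, degraded < 0.0)  # True True
```

Evaluating this at each of the listener positions yields the preference map described above.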

Investigation of Effective Soundfield Transmission

The concern being addressed here is not how good a particular environment is for the
performance of a particular type of music, but, rather, how well the perceptual impression
of a primary soundfield (with specific inclusion of the indirect reflected component) is
transmitted to a secondary environment. To assess this it is necessary to compare the map
of S values over some area in the concert hall to the map of S values over the same area in
the domestic living room. If there is a strong correlation between the maps, then the
listener preference for the perceptual impression of the domestic living room soundfield
will approach that of the concert hall situation. Thus, by adopting this approach, it is
possible to investigate such matters as the effectiveness of various recording and
reproduction formats in soundfield reconstruction, the influence of microphone and
loudspeaker polar characteristics, and the effect of listening room construction details on
effective soundfield transmission.

For the purposes of preliminary experimentation, a room impulse response simulation package was developed which allowed the evaluation of listener preference with full control of primary and secondary room shape, size and absorption characteristics; microphone and loudspeaker number, directionality and placement; and listener locations. Room impulse responses were generated using the image method [9,10] for a single omnidirectional source on stage. In order to allow flexibility in the specification of geometric and acoustic details, while avoiding inordinate representational complexity, a systematic framework for all geometric details was achieved through the adoption of homogeneous coordinates for all points, lines, and planes. Generation of any potential higher-order virtual source location could then be achieved simply by multiplying the initial source, or virtual source, location by the appropriate plane reflection transformation matrix. As the computational load of the image method increases greatly with room shape complexity, and since details of room shape were not critical to the present study, the room models used for the following investigations were
rectangular. Furthermore, since the number of parameters which can be varied is large (primary and secondary room size and absorption, microphone placement, loudspeaker placement, listener locations etc.), some restrictions were adopted in order to limit the number of degrees of freedom. The primary (concert hall) environment was taken to have dimensions roughly corresponding to those of Boston Symphony Hall (25 × 50 × 18 metres), while the secondary (domestic living room) environment was dimensioned as specified in BS6840:13 Recommended Listening Room Dimensions, i.e. 6.7 × 4.2 × 2.8 metres. The wall absorption characteristics were chosen such that the calculated reverberation time in the primary environment was 1.8 seconds, and that of the secondary environment was 0.3 seconds. A matrix of 25 listener locations was examined in each space, chosen to encompass an area of 1 square metre, which covers the typical range of listener movements in normal situations [11]. Simulated impulse responses were then generated for each of the 25 locations in the primary room, from which the objective parameters required to calculate listener preference were measured. Sound pressure level was evaluated using

$$\mathrm{SPL} = \mathrm{PWL} + 10\log_{10}(1 + A^2) - 20\log_{10} d_0 - 11 \quad \text{(dB)}$$

where PWL is the power level of the source, $A$ is the total amplitude of reflections, and $d_0$ is the distance between the source and the particular listener position. PWL was assigned a value which made the SPL at the centre point of the listener array equal to the optimum value of 79 dB. The delay time between the direct sound and the strongest reflection was found by simple examination of the impulse response:

$$\Delta t_1 = (d_1 - d_0)/c \quad \text{(seconds)}$$

where $d_1$ is the path length to the strongest reflection, and $c$ is the speed of sound.
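The geometric steps just described can be sketched as follows: a plane reflection written as a 4 × 4 homogeneous matrix, first-order image sources for a rectangular room, and the SPL and $\Delta t_1$ expressions above. This is a simplified, hypothetical sketch, not the simulation package itself. It generates only first-order images, gives all walls one absorption coefficient α, assumes 1/r spreading with a pressure reflection coefficient √(1 − α), and treats wall impulse responses as ideal impulses. The 10 m cube in the example is arbitrary.

```python
import math

C = 343.0  # speed of sound in m/s


def reflection_matrix(n, d):
    """4x4 homogeneous matrix reflecting points in the plane n.x = d (|n| = 1)."""
    rows = []
    for i in range(3):
        row = [(1.0 if i == j else 0.0) - 2.0 * n[i] * n[j] for j in range(3)]
        row.append(2.0 * d * n[i])
        rows.append(row)
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def transform(m, point):
    """Apply a homogeneous transformation matrix to a 3-D point."""
    p = [*point, 1.0]
    q = [sum(m[i][k] * p[k] for k in range(4)) for i in range(4)]
    return tuple(q[i] / q[3] for i in range(3))


def first_order_images(src, room):
    """Images of src in the six walls of a box room with a corner at the origin."""
    images = []
    for axis, length in enumerate(room):
        n = tuple(1.0 if k == axis else 0.0 for k in range(3))
        for d in (0.0, length):  # the wall at 0 and the opposite wall
            images.append(transform(reflection_matrix(n, d), src))
    return images


def spl_and_delay(src, listener, room, alpha, pwl):
    """SPL in dB and delay (s) from direct sound to the strongest reflection."""
    beta = math.sqrt(1.0 - alpha)  # pressure reflection coefficient
    d0 = math.dist(src, listener)
    # (amplitude relative to the direct sound, path length) per image source
    refl = []
    for img in first_order_images(src, room):
        dn = math.dist(img, listener)
        refl.append((beta * d0 / dn, dn))
    A = math.sqrt(sum(a * a for a, _ in refl))  # total reflection amplitude
    spl = pwl + 10 * math.log10(1 + A * A) - 20 * math.log10(d0) - 11
    _, d1 = max(refl)  # path length of the strongest reflection
    return spl, (d1 - d0) / C


spl, dt1 = spl_and_delay((5.0, 5.0, 5.0), (5.0, 5.0, 2.0), (10.0, 10.0, 10.0),
                         alpha=0.2, pwl=100.0)
print(f"SPL = {spl:.1f} dB, delay = {dt1 * 1e3:.2f} ms")
```

Higher-order images follow by multiplying further reflection matrices together before applying them, which is the convenience homogeneous coordinates provide.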
Reverberation times were initially calculated using the integrated impulse response method. However, as this proved computationally expensive, in that impulse responses of the order of a second in length were required for accuracy, and as reverberation time was not found to vary much over a particular space, Millington's formula was used to calculate an estimate which was assumed to apply throughout. This assumption allowed the truncation of all simulated impulse responses to around 0.5 seconds, as the remaining parameters were found to be evaluated reasonably accurately from the first few hundred milliseconds of response. As the listener's head could not be modelled, IACC was evaluated at each listener location by cross-correlating the impulse responses received at two omnidirectional microphones 16.4 cm either side of the listener position, as 32.8 cm has been found to be the effective acoustic distance between listeners' ears [12,13]. From these measures, the total preference values $S_{1i}$ ($i = 1, \ldots, 25$) for the matrix of primary environment listener locations were calculated for an assumed orchestral music source characterised by a $\tau_e$ value of 0.075 s.
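The IACC measurement can be sketched as below, with `left` and `right` standing for the impulse responses at the two omnidirectional microphones placed 16.4 cm either side of the listener position. The sketch assumes the usual energy-normalised cross-correlation, searched over lags within ±1 ms. The function name and toy signals are hypothetical; the 1 kHz sample rate is chosen only so that the 1 ms window spans a single sample.

```python
import math


def iacc(left, right, fs, max_lag_s=0.001):
    """Max of |phi_lr(tau)| over |tau| <= max_lag_s, where phi_lr is the
    cross-correlation normalised by the geometric mean of the two energies."""
    energy = math.sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    if energy == 0.0:
        raise ValueError("impulse responses must not be silent")
    max_lag = int(round(max_lag_s * fs))
    n = min(len(left), len(right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # phi_lr(lag) = sum over t of left[t] * right[t + lag], overlap only
        s = sum(left[t] * right[t + lag]
                for t in range(max(0, -lag), min(n, n - lag)))
        best = max(best, abs(s) / energy)
    return best


left = [1.0, 0.0, 0.0, 0.0]
print(iacc(left, [0.0, 1.0, 0.0, 0.0], fs=1000))  # 1.0 (1 ms lag, inside window)
print(iacc(left, [0.0, 0.0, 1.0, 0.0], fs=1000))  # 0.0 (2 ms lag, outside window)
```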

Before repeating this process for the secondary environment, it was necessary to convolve all simulated loudspeaker-to-listener-location impulse responses with signals recorded in the primary space. The simulated recording microphone options available included spaced or coincident techniques, with omnidirectional, bipolar, cardioid, hypercardioid, or supercardioid directional characteristics. Coincident recording was implemented using a simulated Soundfield microphone arrangement of three orthogonal bipolar characteristics (x, y, z) and an omnidirectional characteristic (w). This allowed easy generation of any coincident recording arrangement, from mono, through stereo, to B-format ambisonic, by manipulation of the relative contributions of w, x, y, and z. In order that the interaction of loudspeaker directionalities and listening room characteristics might be investigated, the number, location, and polar characteristics of the secondary environment loudspeakers were variable. The loudspeaker polar characteristics were idealised as one of omnidirectional, bipolar, cardioid, hypercardioid, and supercardioid over the entire audio bandwidth. The secondary environment listener location impulse responses were thus simulated by summing the contributions from each loudspeaker convolved with its appropriate recorded impulse response, from which the objective measures were calculated as above. The only difference was that the reverberation time was taken to be the same as in the primary environment, this being the general case for electroacoustically coupled rooms [14]. From these measures, the total preference values $S_{2i}$ ($i = 1, \ldots, 25$) for the matrix of secondary environment listener locations were calculated. The simulation arrangement is summarised in Fig. 1.
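The two signal operations described above can be sketched as follows. The first derives a coincident microphone of any first-order pattern from the w, x, y, z responses. The second forms a listener-location response by summing each loudspeaker's recorded signal convolved with its loudspeaker-to-listener impulse response. The pattern coefficients p (in p + (1 − p) cos θ) are common textbook values, not figures from this paper. The sketch also assumes that w has the same on-axis gain as the bipolar components, whereas conventional B-format scales W by 1/√2. All function names and toy signals are hypothetical.

```python
import math

# Common first-order pattern coefficients p in p + (1 - p) * cos(theta).
PATTERNS = {"omni": 1.0, "cardioid": 0.5, "supercardioid": 0.37,
            "hypercardioid": 0.25, "bipolar": 0.0}


def virtual_mic(w, x, y, z, pattern, azimuth, elevation=0.0):
    """Coincident microphone signal aimed at (azimuth, elevation) in radians."""
    p = PATTERNS[pattern]
    ux = math.cos(azimuth) * math.cos(elevation)
    uy = math.sin(azimuth) * math.cos(elevation)
    uz = math.sin(elevation)
    return [p * wi + (1 - p) * (ux * xi + uy * yi + uz * zi)
            for wi, xi, yi, zi in zip(w, x, y, z)]


def convolve(a, b):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def listener_response(recorded, speaker_to_listener):
    """Sum over loudspeakers of recorded signal * loudspeaker-to-listener IR."""
    parts = [convolve(r, h) for r, h in zip(recorded, speaker_to_listener)]
    out = [0.0] * max(len(part) for part in parts)
    for part in parts:
        for i, v in enumerate(part):
            out[i] += v
    return out


# Toy B-format impulse responses for a source 30 degrees to the left.
theta = math.radians(30)
w = [1.0, 0.0, 0.3]
x = [math.cos(theta), 0.0, -0.3]
y = [math.sin(theta), 0.0, 0.0]
z = [0.0, 0.0, 0.0]
left = virtual_mic(w, x, y, z, "cardioid", math.radians(45))
right = virtual_mic(w, x, y, z, "cardioid", math.radians(-45))
# Hypothetical loudspeaker-to-listener impulse responses for one location.
at_listener = listener_response([left, right], [[0.0, 1.0, 0.2], [0.0, 0.9, 0.25]])
```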

Finally, a measure of similarity or difference was sought in order to calculate an index of correlation between the matrices of $S_1$ and $S_2$ values. Initially, normalised cross-correlation between $S_1$ and $S_2$ was used as an index of similarity. However, this proved not to be very sensitive to the obvious dissimilarities between the $S_1$ and $S_2$ matrices, but rather reflected the similarities in the variation of the $S_{1i}$ or $S_{2i}$ values relative to their respective averages. Thus, if the $S_{1i}$ were all at a constant value $K_1$ and the $S_{2i}$ were all at a constant value $K_2$, the cross-correlation would be unity irrespective of whether or not $K_1 = K_2$. Consequently, a Euclidean distance measure was adopted as an index of difference between the two matrices, defined as

$$\text{Index of Preference Field Difference:} \quad \mathrm{DI} = \left[\sum_{i=1}^{25} (S_{1i} - S_{2i})^2\right]^{1/2}$$

This has the disadvantage of being unbounded, unlike normalised cross-correlation, but this is not seen as particularly detrimental, as the concern is with relative comparison rather than absolute classification. A DI value of zero indicates exact preferential similarity for primary and secondary soundfields, while increasing values provide a relative measure of preference matrix dissimilarity. It might be noted that a more refined test of soundfield reconstruction can be based on the individual preference ratings for each of the objective parameters (SPL, $\Delta t_1$, $T_{sub}$, and IACC), as this gives more detailed insight into the nature of the distortion being introduced. However, for simplicity of comparison, DI values will be used throughout this presentation.
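The contrast between the two measures can be checked numerically. The sketch assumes the normalised cross-correlation was computed without subtracting the means, since that is the reading under which two constant fields give exactly unity, as stated above. The two uniform 25-point preference fields are hypothetical.

```python
import math


def preference_field_difference(s1, s2):
    """DI = sqrt(sum_i (S1_i - S2_i)^2) over the listener grid."""
    if len(s1) != len(s2):
        raise ValueError("preference fields must cover the same positions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


def normalised_cross_correlation(s1, s2):
    """sum(S1*S2) / sqrt(sum(S1^2) * sum(S2^2)), without mean removal."""
    num = sum(a * b for a, b in zip(s1, s2))
    return num / math.sqrt(sum(a * a for a in s1) * sum(b * b for b in s2))


# Two uniform preference fields at different constant values K1 and K2.
s1 = [-1.0] * 25
s2 = [-2.0] * 25
print(normalised_cross_correlation(s1, s2))  # 1.0, blind to the offset
print(preference_field_difference(s1, s2))   # 5.0
```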

Even with the simplifications and idealisations outlined above, the computational
requirements are still very high, mainly due to the performance of all calculations on full
20kHz bandwidth signals, and the necessity of performing multiple convolutions of long
impulse responses. For this reason, all computations were carried out on an Inmos T800
transputer system, the speed of which makes repeated, complex room acoustic simulations
practicable.

Experimental Results

The nature of this study is such that the experimental results cannot be taken as final or conclusive, as the number of parameters involved and their range of variation militate against exhaustive investigation. However, what has been attempted is to calculate DI values for various configurations where one significant parameter is allowed to vary whilst all others are kept at some fixed values. Of principal interest to the investigator was the
effect of choice of recording format on DI. Very generally, there seems to be tacit agreement that stereo is superior to mono in the transmission of the concert hall soundfield impression, with little appreciation as to why this might be, other than recourse to the ability to localise sources on the sound stage with stereo. On the other hand, there seems to be quite a prevalent, but again tacit, belief that the various "surround sound" techniques that have been introduced over the years are mere contrivances with limited novelty value compared to purist stereo techniques, and indeed, in many cases, this is true. What is being suggested here is that any particular reconstruction methodology can be assessed by comparison with the original concert hall impression and with other reconstruction methodologies, rather than being merely a matter of subjective opinion. In the light of these considerations, the initial investigation undertaken was to compare DI values for mono, stereo, and ambisonic formats (using a single Soundfield microphone placement) with all other factors held constant. Concert hall and listening room dimensions and absorption were as specified above, while listener position matrices in both spaces, and microphone and loudspeaker placement, were as shown in Fig. 1. Under these arrangements, the following results were achieved:

Index of Soundfield Difference

Format               Microphones             Loudspeakers (number, directivity)   DI
Mono                 Omnidirectional         1 cardioid                           6.0715
Mono                 Omnidirectional         2 cardioid                           4.9952
Mono                 Omnidirectional         4 cardioid                           5.1374
Coincident stereo    Bipolar                 2 cardioid                           4.93
Coincident stereo    Cardioid                2 cardioid                           5.2254
Coincident stereo    Hypercardioid           2 cardioid                           5.2179
Coincident stereo    Supercardioid           2 cardioid                           5.1678
B-format ambisonic   B-format (w, x, y, z)   4 omnidirectional                    4.0596
B-format ambisonic   B-format (w, x, y, z)   4 cardioid                           4.1459
B-format ambisonic   B-format (w, x, y, z)   4 hypercardioid                      4.0411
B-format ambisonic   B-format (w, x, y, z)   4 supercardioid                      4.1122

Recalling that all the above results were achieved using the one Soundfield microphone recording, some comments can be made regarding the DI values. First, the ambisonic ratings are the lowest in general, irrespective of the directional characteristics of the loudspeakers. At the other end of the scale, mono reconstruction using a single loudspeaker is furthest from the original soundfield impression. This is not surprising, as ambisonic techniques were developed to preserve the directional nature of the concert hall soundfield, while mono techniques were never intended to transmit spatial information. Consequently, the measurements of IACC in the secondary environment for mono are high at all the listener locations (0.39 - 0.57) compared to those for ambisonic transmission (0.23 - 0.31) or to those in the concert hall (0.1 - 0.23). This situation is not at all ameliorated by adding extra loudspeakers in mono reproduction, as the IACC values are modified to (0.43 - 0.51) for two loudspeakers, and (0.55 - 0.6) for four loudspeakers. The changes in DI values for ambisonic transmission with variation in loudspeaker directivity are an indication of a complex interplay between variations in IACC and reflection delay time (SPL and reverberation time being the same for all four-loudspeaker cases). For omnidirectional loudspeakers, IACC values are in a range of (0.24 - 0.31), while this changes to (0.23 - 0.31) for cardioid, (0.18 - 0.3) for hypercardioid, and (0.21 - 0.31) for supercardioid characteristics. The preference for omnidirectional loudspeakers above cardioid or supercardioid is a consequence of the changes in reflection patterns, and therefore in the delay time to the strongest reflection, $\Delta t_1$, which result from the interaction of the loudspeakers with the listening room boundary walls. Any such changes will depend on the details of room shape, wall absorption coefficients, loudspeaker directivity and placement. That this is not a simple interaction can be seen from the following tests, where the room absorption coefficients were changed to values approaching their limits:
Index of Soundfield Difference

Format               Loudspeakers    Wall absorption coefficients   DI
B-format ambisonic   4 cardioid      all 0.19                       4.3841
B-format ambisonic   4 cardioid      all 0.99                       5.229
The latter result seems to contra-indicate the use of dead rooms for ambisonic reproduction. This same loudspeaker-room interaction is responsible for the DI value variation for coincident stereo reconstruction given above. The IACC value ranges go from (0.35 - 0.42) for the Blumlein coincident bipolars, to (0.42 - 0.49) for 90° coincident cardioids, to (0.41 - 0.47) for 90° coincident hypercardioids, to (0.4 - 0.46) for 90° coincident supercardioids, while the delay time to the strongest reflection, $\Delta t_1$, varies in an unpredictable way. As a result, the DI values for coincident stereo overlap with those of mono, while, in all cases tested, the IACC values for coincident stereo were lower on average, indicating that the stereo reproductions are more spacious than those using mono. It is indeed the case that the ranking indicated could very well be altered by changes to microphone or loudspeaker positioning and/or listening room characteristics.

While in all the above we have been comparing like with like, in the sense that all recordings were made at the same location, it would be interesting to examine the DI values for spaced stereo recording techniques. To this end, spaced stereo recordings were made with omni and cardioid microphones for the various locations indicated in Fig. 1. The results achieved are as follows:

Index of Soundfield Difference

Format          Microphones   Microphone locations          Loudspeakers   DI
Spaced stereo   Omni          (10,20,4) and (15,20,4)       2 cardioid     5.4543
Spaced stereo   Cardioid      (10,20,4) and (15,20,4)       2 cardioid     5.6302
Spaced stereo   Omni          (7.5,20,8) and (17.5,20,8)    2 cardioid     7.93
Spaced stereo   Omni          (7.5,35,8) and (17.5,35,8)    2 cardioid     8.3578

The high values recorded indicate a relatively large difference between primary and secondary preference value matrices. Examination of the details shows IACC ranges of (0.44 - 0.5), (0.42 - 0.49), (0.45 - 0.52), and (0.36 - 0.43) for each of the above, respectively, with $\Delta t_1$ values showing wide fluctuations, leading to the markedly larger DI figures. The IACC results are not very different from those achieved using coincident stereo, while, in general, spaced stereo recordings are judged to be "airier" or more spacious than coincident techniques. To some degree, the DI values above substantiate the remarks of Lipshitz concerning the artificiality of the spaciousness of spaced stereo reproduction [14].

Observations

The comparative, exploratory nature of this investigation is such that it would be wrong,
even if it were possible, to draw any firm conclusions. However, as one becomes more
familiar with any experimental apparatus, various insights emerge. In the case of the
simulation studies carried out to date, one notable observation is that, at least for the
B-format signals used, ambisonic reconstruction consistently leads to the lowest DI values,
from which it can be inferred that listener preference will be highest. This result is a
welcome complement to the more physical approach to ambisonic wavefront reconstruction
[15], which tends to highlight its limitations. Further experiments of obvious interest
would examine 6-loudspeaker ambisonic reproduction and the effect of UHJ
encoding-decoding; these issues will be examined in the near future. Another point
of interest is that the interplay between loudspeaker directivity and listening-room
geometry and acoustics can have very considerable effects on listener preference, as noted
by Griesinger [16]. This is surely even more the case with real loudspeakers, whose
directivities vary significantly with frequency. It is therefore not very surprising that
loudspeakers whose on-axis responses measure very similarly in the laboratory may
sound very different in situ. Detailed investigations of these interactions in a
wide variety of listening rooms should prove valuable in specifying general-purpose
loudspeaker directivity characteristics. Finally, while the presentation here is
based entirely on simulated data, there is no reason other than the availability of equipment
why the same investigative methodology could not be applied to real recordings and
reproductions. It would be very interesting to examine the relative performance of digital
surround sound processors, direct-reflect loudspeakers, stereo shuffling, and binaural
techniques, along with the mono, stereo, and ambisonic formats examined here, with a
view to refining their respective merits and diminishing their shortcomings.

Acknowledgements

The author would like to thank Dr. David Vernon of the Comp. Sc. Dept., Trinity
College Dublin for the introduction to homogeneous transformations, use of transputer
facilities, and general support and discussion. Thanks are also due to Mr. John Coleman
of Bose (Itl) for equipment and support.

References

[1] Han, H.L., "Frequency responses in acoustical enclosures", presented at the 82nd
AES Convention, London (March 1987), preprint 2452.

[2] Toole, F.E., "Loudspeaker Measurements and Their Relationship to Listener
Preferences: Part 1", J. Audio Eng. Soc., vol. 34, pp. 227-235 (Apr. 1986).

[3] Romano, A., "Three-Dimensional Image Reconstruction in Audio", J. Audio Eng.
Soc., vol. 35, pp. 749-759 (Oct. 1987).

[4] Kuttruff, H., Chap. VII, Room Acoustics (Applied Science Publishers, 1979).

[5] Cremer, L. and Muller, H. A., Part III, Principles and Applications of Room Acoustics,
Vol. I (Applied Science Publishers, 1982).

[6] Schroeder, M. et al., "Comparative study of European concert halls: correlation of
subjective preference with geometric and acoustic parameters", J. Acoust. Soc. Am.,
vol. 56, pp. 1195-1201 (Oct. 1974).

[7] Barron, M., "The Subjective Effects of First Reflections in Concert Halls - The Need
for Lateral Reflections", J. Sound Vib., vol. 15, pp. 475-494 (1971).

[8] Ando, Y., Concert Hall Acoustics (Springer-Verlag, 1985).

[9] Allen, J.B. and Berkley, D.A., "Image method for efficiently simulating small-room
acoustics", J. Acoust. Soc. Am., vol. 65, pp. 943-950 (Apr. 1979).

[10] Borish, J., "Extension of the image model to arbitrary polyhedra", J. Acoust.
Soc. Am., vol. 75, pp. 1827-1836 (June 1984).

[11] Plenge, G., "On the behavior of listeners to stereophonic sound reproduction and the
consequences for the theory of sound perception in a stereophonic sound field", presented
at the 83rd AES Convention, New York (Oct. 1987), preprint 2532.

[12] Tohyama, M. and Suzuki, A., "Interaural cross-correlation coefficients in
stereo-reproduced sound fields", J. Acoust. Soc. Am., vol. 85, pp. 780-786 (Feb. 1989).

[13] Lindevald, I.M. and Benade, A.H., "Two-ear correlation in the statistical sound fields
of rooms", J. Acoust. Soc. Am., vol. 80, pp. 661-664 (Aug. 1986).

[14] Lipshitz, S.P., "Stereo Microphone Techniques", J. Audio Eng. Soc., vol. 34,
pp. 716-744 (Sept. 1986).

[15] Vanderkooy, J. and Lipshitz, S.P., "Anomalies of Wavefront Reconstruction in
Stereo and Surround-Sound Reproduction", presented at the 83rd AES Convention, New
York (Oct. 1987), preprint 2554.

[16] Griesinger, D., "Spaciousness and Localisation in Listening Rooms and Their Effects
on the Recording Technique", J. Audio Eng. Soc., vol. 34, pp. 255-268 (Apr. 1986).

Primary Room (not to scale)
    Dimensions: 25 m (x) by 50 m (y); height z = 18 m
    Reverberation time = 1.8 sec
    Source position = (12.33,42.5,2.2)
    Listener matrix (1 m) centre = (12.5,20,1.1)
    Soundfield Microphone position = (12.5,20,1.1)
    Spaced mic positions:
        L (7.5,20,8), R (17.5,20,8)
        L (10,20,8), R (15,20,8)
        L (7.5,35,8), R (17.5,35,8)

Secondary Room (not to scale)
    Dimensions: 4.2 m (x) by 6.7 m (y); height = 2.8 m
    Reverberation time = 0.2 sec
    Listener matrix (1 m) centre = (2,3.35,1.1)
    Loudspeakers:
        (2.1,6,1.25)
        (1,6,1.25), (3.2,6,1.25)
        (1,0.7,1.25), (3.2,6,1.25)

Fig. 1 Simulation Arrangement.
