This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration
by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request
and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org.
All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
_________________________________
ABSTRACT
The reappearance of surround sound in the form of the DVD's 5.1 format has led sound engineers to re-evaluate
current microphone techniques used for stereo recording.
The author has measured various 2-channel main-microphone signals in order to prove that their correlation is
strongly frequency dependent. Due to properties of the human hearing mechanism as well as standard
loudspeaker playback arrangements, frequencies below approximately 700 Hz are particularly critical in terms of
faithful spatial reproduction in a stereo as well as a 5.1 surround environment and therefore deserve special
attention already during the recording process. Conclusions from the measurements are drawn and a
microphone system ("AB-Polycardioid Centerfill"), well suited for 5.1 surround, is proposed.
0. INTRODUCTION
Stereo sound recording and reproduction is by now more than 100 years old; however, the "one and only" method for capturing a musical performance perfectly has not been found, because too many variables are involved: the size of the hall and the individual room acoustics, the size of the ensemble, and the type of event (a live performance or a recording session) are all parameters which enter into the equation. At the end of the day it is also simply a matter of taste which microphone technique one prefers for capturing and reproducing a musical event.

Simply put, in order to achieve decorrelation of the stereophonic sound signal we need either sufficient spacing between two omnidirectional transducers or coincident or near-coincident techniques with directional microphones.

Apparently there is disagreement about the amount of capsule spacing needed for "correct spatial reproduction" of a sound event: On the one hand, important representatives of the academic world [1,2,3] seem convinced that only small microphone spacings ("small AB"), based on psychoacoustic principles, are able to provide correct localisation. On the other hand, a large percentage of practicing sound engineers favour largely spaced AB techniques ("large AB") [4], or at least use supplementary "outriggers" (largely spaced omnidirectional microphones in front of the orchestra), due to the more "open sound" they provide.

In addition, RCA's "Living Stereo" [5] series of stereophonic recordings from the 1950s and 60s (which used a large AB/Centerfill microphone technique) is still cherished by music lovers worldwide.
PFANZAGL-CARDONE AB - POLYCARDIOID CENTERFILL (AB-PC)
Research by Yost [8] in 1971 has shown that the low frequency
components of transient sounds in a binaural signal are of greatest
importance for localisation: High-pass filtering of clicks, so as to
include only energy above 1500 Hz resulted in a deterioration of
localisation ability, but the same clicks low-pass filtered to include
only energy up to 1500Hz resulted in little change.
Researchers have proposed various measures for acoustic parameters related to spatial impression and envelopment:

Traditionally spaciousness is regarded to be linked to "apparent source width" (ASW) and "listener envelopment" (LEV). The research of Marshall [17] and Barron [18] has proved the importance of lateral reflections for spatial impression in concert halls. Lateral Energy Fraction (LF) has been used as a measure of apparent source width (ASW) by Barron and Marshall [19]:

eqn. (6)

where θ is the lateral angle (where θ=0 is 90° from straight ahead), p(t) is the sound pressure (measured by a nondirectional microphone).

The "Interaural Cross-Correlation Function" IACFt (τ) is a binaural measure of the difference in sound at the two ears and, hence, of lateralness:

The technically most objective way to measure how well a particular microphone technique is able to translate an acoustic event to the listener at home with the least amount of alteration is probably the following:
1. make a recording of the (orchestral) event with the microphone technique of your choice
2. at the same time make a second recording at the "best seat in the house" position of the concert hall with a binaural artificial head (dummy head)
3. while reproducing the first recording through a stereo or 5.1 loudspeaker system, make a third recording (using the same dummy head) in the sweet spot of the listening room
4. measure the correlation of the two dummy head recordings.

Of course one has to be aware that with this method signal distortion (concerning amplitude, frequency and phase) introduced by the replay system and the acoustics of the listening room will be included in the evaluation process and might bias the results. However, the microphone technique which produces the highest correlation (regarding the entire frequency range) between the original sound event and the re-recorded reproduction would appear to be the one with the highest fidelity.

The measurements for this paper are based on a much simpler approach: phase correlation over frequency is being evaluated in respect to various 2-channel stereo main-microphone techniques.
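As an illustration of the quantity under discussion, the following minimal Python sketch computes a zero-lag correlation coefficient between two channels. It uses the textbook definition only; it is not the correlation meter or the dummy-head procedure used by the author, and the function names and the test-tone helper are hypothetical.

```python
import math


def correlation_coefficient(x, y):
    """Zero-lag normalized cross-correlation of two equal-length sample lists."""
    if len(x) != len(y) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    cross = sum(a * b for a, b in zip(x, y))
    energy_x = sum(a * a for a in x)
    energy_y = sum(b * b for b in y)
    if energy_x == 0.0 or energy_y == 0.0:
        # Convention chosen for this sketch: silence counts as uncorrelated.
        return 0.0
    return cross / math.sqrt(energy_x * energy_y)


def tone(freq_hz, phase_rad, sample_rate=48000, n=4800):
    """Hypothetical helper: a sine test tone with a given start phase."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate + phase_rad)
            for i in range(n)]


if __name__ == "__main__":
    left = tone(100.0, 0.0)
    for degrees in (0, 90, 180):
        right = tone(100.0, math.radians(degrees))
        print(degrees, round(correlation_coefficient(left, right), 3))
```

For a single tone the coefficient equals the cosine of the inter-channel phase angle, so in-phase channels give +1, quadrature gives 0 and polarity-reversed channels give -1.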
As will be shown later, the exact amount of correlation cannot be deduced from this simple method, since the measurement provides only the peak-hold value and not the time-averaged value of the correlation coefficient in each frequency band. However, the results of the measurements are indicative of the overall behavior of the various main-microphone techniques with respect to decorrelation.

The results of the measurements with the correlation meter are compared with the "frequency-dependent-level-attenuation" based evaluation method.

Research by Yamamoto and Nagata at NHK [28] has shown that the definition D of a recorded source for symphonic music is 0.5. Measurement of common values of correlation coefficients for standard recordings, conducted by the author, showed that the averaged CC was around 0.5 (CCmin=0.0, CCmax=+0.7) for recordings that used a lot of spot microphones, while recordings with AB-technique and few spot microphones had an average value of around +/-0.0 (CCmin=-0.4, CCmax=+0.4).

3.3.2 Measurement procedure
The measurement employed digital tape-based recorders with 16-bit resolution (as recording and playback sources), a 1/3rd octave bandpass-based digital implementation of a realtime spectrum analyser (RTA) with snapshot memory and peak-hold function for all 31 bands, and an analog mixing desk with phase-reversal function on the line inputs.

The length of music passages used for the measurements was usually between 30s and 2min.

The RTA was set up to measure with a short integration time of 15ms and peak characteristics (instead of averaging mode), in order to optimize accuracy towards detail.

For the first measurement the signals of the L and R channel of the stereo-recording were summed and the resulting mono-signal ("L+R") was sent to the RTA, which was set to "peak hold" mode (with τ=∞; i.e. the peaks are held indefinitely, until the machine is reset).

Therefore, at the end of the first run-through the peak-hold values for each of the 31 center frequencies represented the absolute peak level which had been measured within the respective frequency band. The peak-hold values were stored to one of the snapshot memories.
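The procedure can be approximated in software. The sketch below is a simplified illustration under stated assumptions, not the author's analyser: it assumes that a second pass measured the difference signal L-R (obtained with the phase-reversal switch, as the later figure captions suggest), it compares broadband peak-hold levels instead of filtering into 31 bands, and it omits the 15ms integration time. The band centres use the exact base-2 third-octave series around 1 kHz; all function names are hypothetical.

```python
import math


def third_octave_centres(n_low=-17, n_high=13, ref_hz=1000.0):
    """Base-2 third-octave centres; indices -17..13 give 31 bands (about 20 Hz to 20 kHz)."""
    return [ref_hz * 2.0 ** (n / 3.0) for n in range(n_low, n_high + 1)]


def peak_hold_db(samples):
    """Absolute peak over the whole passage, in dB relative to 1.0."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")


def sum_difference_attenuation_db(left, right):
    """Peak-hold level of L+R minus peak-hold level of L-R (broadband, no band filtering)."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return peak_hold_db(total) - peak_hold_db(diff)


def tone(freq_hz, phase_rad, sample_rate=48000, n=4800):
    """Hypothetical helper: a sine test tone with a given start phase."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate + phase_rad)
            for i in range(n)]


if __name__ == "__main__":
    left = tone(100.0, 0.0)
    right = tone(100.0, math.radians(30.0))
    print(len(third_octave_centres()), "bands")
    print(round(sum_difference_attenuation_db(left, right), 2), "dB")
```

For a 100 Hz tone with a 30° phase offset between the channels, the difference signal lies about 11.4 dB below the sum signal, which is the kind of low-frequency level attenuation the L-R measurements make visible.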
Fig. 7: The ORTF L-R signal exhibits level attenuation already from
600Hz downwards, but to a smaller extent: it is usually in the range
of 10dB or less.
Fig. 10: The dummy head organ recording exhibits high amounts of correlation below 200Hz, as was to be expected. Level attenuation in this frequency range reaches values of 15dB and more. With a CC=+0.75 it has one of the highest values of low frequency correlation. High frequency content of the stereo signal appears to be relatively decorrelated (CC=+0.25). Overall, the characteristics of the signal in respect to correlation are relatively close to small AB (50cm) and the sphere-microphone.
LF(<700Hz): CC=+0.75 (averaged) (CCmin=+0.5, CCmax=+0.8)
HF(>1000Hz): CC=+0.25 (avgd) (CCmin=+0.2, CCmax=+0.3)
full-bw signal: CCtot=+0.3 (avgd) (CCmin=+0.25, CCmax=+0.5)
Dummy head microphones: information not available

Fig. 11: The small AB-system with 20cm spacing exhibits clear level attenuation below 400Hz, which usually is in the order of 12-14dB.
LF(<700Hz): CC=+0.85 (averaged) (CCmin=+0.7, CCmax=+0.9)
HF(>1000Hz): CC=+/-0.0 (avgd) (CCmin=-0.15, CCmax=+0.2)
full-bw signal: CCtot=+0.7 (avgd) (CCmin=+0.4, CCmax=+0.8)

Fig. 12: The 40cm AB system (the recording of which was done at a different occasion) shows a similar behavior: attenuation starts to establish itself clearly from the 400Hz region downwards. Again, attenuation levels are mainly around 12-14dB.
LF(<700Hz): CC=+0.7 (averaged) (CCmin=+0.4, CCmax=+0.8)
HF(>1000Hz): CC=+/-0.0 (avgd) (CCmin=-0.1, CCmax=+0.2)
full-bw signal: CCtot=+0.4 (avgd) (CCmin=+0.3, CCmax=+0.7)
Fig. 15: The 720cm spacing looks more decorrelated again than the
320cm spacing as can be seen mainly in the region between 100 and
200Hz, as well as from the values of attenuation from 40Hz
downwards.
LF(<700Hz): CC=±0.0 (averaged) (CCmin=-0.3, CCmax=+0.4)
HF(>1000Hz): CC=±0.0 (avgd) (CCmin=-0.1, CCmax=+0.1)
full-bw signal: CCtot=±0.0 (avgd) (CCmin=-0.25, CCmax=+0.3)
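The correlation coefficients and the L-R level attenuation quoted in the figure captions are two views of the same property. As a rough cross-check, and assuming two channels of equal power with correlation coefficient CC, the mean power of L+R is proportional to (1+CC) and that of L-R to (1-CC). The sketch below evaluates the resulting level difference. It is an idealized mean-power relation, not the peak-hold reading of the RTA, so it should not be expected to reproduce the measured values exactly; the function name is hypothetical.

```python
import math


def sum_difference_ratio_db(cc):
    """Level of L+R relative to L-R in dB for equal-power channels with correlation cc."""
    if not -1.0 < cc < 1.0:
        raise ValueError("cc must lie strictly between -1 and +1")
    return 10.0 * math.log10((1.0 + cc) / (1.0 - cc))


if __name__ == "__main__":
    for cc in (0.0, 0.25, 0.7, 0.75, 0.85):
        print(cc, round(sum_difference_ratio_db(cc), 1), "dB")
```

Under this assumption a fully decorrelated pair (CC=0) shows no attenuation of L-R relative to L+R, while the attenuation grows quickly as CC approaches +1.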
Level attenuation in the L-R signal seems to have a 9dB/Octave slope from the "critical frequency" downwards. With the 20cm spacing, frequencies up to 800Hz were clearly affected; for the 40cm spacing the "critical frequency" dropped to the 500Hz region, and with the 80cm spacing it shifted to the 250Hz region.

3.3.3.4 Cross-referencing
For reasons of cross-referencing, recordings made at acoustically quite different locations have been measured as well.
The ORTF recording of an orchestra in a small church (Fig. 22) showed a similar pattern as the one carried out at the Salzburg Festival Hall (Fig. 17): level attenuation starts with a few dB in the 1kHz region and increases towards lower frequencies to values of about 10-12dB for the Salzburg Festival Hall and around 14dB for the church.
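The downward shift of the "critical frequency" with increasing capsule spacing (about 800Hz, 500Hz and 250Hz for 20cm, 40cm and 80cm) can be compared with a standard idealization that is not part of this paper: in a perfectly diffuse field the coherence between two omnidirectional pressure points a distance d apart follows sin(kd)/(kd), with k = 2πf/c, and first reaches zero at f = c/(2d). The sketch below evaluates that model, assuming a speed of sound of 343 m/s; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 °C


def diffuse_field_coherence(freq_hz, spacing_m, c=SPEED_OF_SOUND_M_S):
    """sin(kd)/(kd) coherence of two omni points in an ideal diffuse field."""
    x = 2.0 * math.pi * freq_hz * spacing_m / c
    return 1.0 if x == 0.0 else math.sin(x) / x


def first_null_hz(spacing_m, c=SPEED_OF_SOUND_M_S):
    """Lowest frequency at which the diffuse-field coherence reaches zero."""
    return c / (2.0 * spacing_m)


if __name__ == "__main__":
    for spacing in (0.2, 0.4, 0.8):
        print(spacing, "m:", round(first_null_hz(spacing)), "Hz")
```

The model places the first null near 860Hz, 430Hz and 215Hz for the three spacings, the same order of magnitude and the same halving per doubling of distance as the measured "critical frequencies". Real halls are not ideally diffuse, so only the trend should be read from this.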
On the above mentioned test CD (see section 3.1) recordings of a woodwind trio can be found, executed in relatively dry studio acoustics. Taking a look at the measurement of the Blumlein-Pair recording of the trio gives a similar result: from about 1 kHz downwards level attenuation becomes apparent with relatively small values of just a few dB.
LF(<700Hz): CC=+0.25 (averaged) (CCmin=-0.2, CCmax=+0.7)
HF(>1000Hz): CC=+0.1 (avgd) (CCmin=-0.25, CCmax=+0.2)
full-bw signal: CCtot=+0.25 (avgd) (CCmin=+0.1, CCmax=+0.7)

Fig. 25: XY-pair (90 deg.) recording in studio
LF(<700Hz): CC=+0.9 (averaged) (CCmin=+0.5, CCmax=+0.9)
HF(>1000Hz): CC=+0.6 (avgd) (CCmin=+0.3, CCmax=+0.75)
full-bw signal: CCtot=+0.8 (avgd) (CCmin=+0.6, CCmax=+0.95)

The 90° XY-microphone system recording on the same CD brought a little surprise (Fig. 25): probably due to the dry studio acoustics (lack of diffuse field) the L-R signal displays almost constant level attenuation also at higher frequencies. For frequencies below 1kHz the measurements show a similar behavior as with the already examined XY-recording (see Fig. 8): significant amounts of level attenuation in the L-R signal, mainly in the range of 12-14dB.

Remark: The somewhat strange "hump" at frequencies below 40Hz with the last two recordings from the test CD is probably the result of a lack of sound-proofing (mechanical de-coupling) in that frequency range against exterior noise sources.

Fig. 26: MS recording (with cardioid capsule) in studio
LF(<700Hz): CC=+0.6 (averaged) (CCmin=+0.3, CCmax=+0.8)
HF(>1000Hz): CC=+0.3 (avgd) (CCmin=+0.1, CCmax=+0.7)
full-bw signal: CCtot=+0.6 (avgd) (CCmin=+0.4, CCmax=+0.8)

The MS recording displays an interesting characteristic: in contrast to the MS system measurement of Fig. 9, the MS recording from the studio exhibits clear level attenuation throughout the whole frequency range. Two plausible explanations come to mind:
First, the L-channel signal in an MS system is essentially of the form (M+S), while the R-channel signal consists of (M-S). Therefore, our L+R and L-R signal processing operation within the measurements will result in getting "2M" and "2S". Since the L-R (=2S) signal in Fig. 26 is always weaker than the L+R signal, this means essentially that the side signal (S) of the MS system had less level than the forward facing cardioid capsule (M). This can be due to either
a) lack of side reflections (dry studio acoustics), and / or
b) the side signal was simply included in the mix with several dB less than the M signal.
However, it is to be noted that once we compensated the lack of level in the L-R signal of Fig. 26 by about +6 to +10dB, the overall level attenuation would drop significantly, displaying a reasonably decorrelated stereo signal. It is mainly the frequency range below 100Hz which displays level attenuation in excess of 10dB (before level compensation).

The author is of contrary opinion, since the perception of the spectral characteristics of an instrument within the orchestra will depend on the position of the listener in the hall, due to the complex radiation characteristics of acoustic music instruments interacting with shadowing effects of adjacent musicians. In order to achieve a more naturalistic impression of the sound event it is necessary to achieve proper localization (in the sense of spatialisation, positioning of instruments in relation to sound stage depth, etc.). The use of too many spot microphones (for the sake of sonic "brightness") however certainly "blurs" the integrity of the overall sonic picture of the orchestra in respect to spectral fidelity as well as spatialisation.

"Single point" recordings in the form of coincident or near coincident techniques have one major disadvantage in comparison to wide AB: They almost always provide a perspective "from the inside" of the orchestra, meaning from somewhere along the center line that splits the orchestra in half left and right from the conductor's position. Usually the microphones are also quite close to the orchestra, especially to the string instruments next to the conductor's podium. Only XY techniques with crossed cardioids or figure-of-eights (Blumlein-Pair) might allow the microphones to be moved half the stage width or even more out into the hall. For most of the other arrangements the microphones stay inside the critical distance (reverberation radius), usually slightly back from the conductor's position and between 3 and 5 meters high.

Large AB on the other hand captures the orchestra more from the "outside". Since the recording normally gets played back on home systems with loudspeaker spacings of much smaller dimension, it also seems to be the right approach to try to capture the orchestra in its full width, which will suffer "downscaling" on playback in any case. It is the author's experience (with orchestra as well as opera recording) that large spaced main-microphone techniques which make use of only very few spot-microphones preserve the sense of "space" much better when played back on low-fi sound systems with very small speaker spacings (tv-monitors, for example) than most other techniques.
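The sum and difference argument made for the MS recording of Fig. 26 can be checked numerically. The sketch below is a minimal illustration with made-up sample values: it decodes M and S into L = M+S and R = M-S and confirms that L+R returns 2M and L-R returns 2S. The optional side gain in dB is a hypothetical parameter standing in for explanation b), a side signal mixed in at reduced level.

```python
def ms_to_lr(mid, side, side_gain_db=0.0):
    """Decode M/S to L/R with L = M + g*S and R = M - g*S."""
    gain = 10.0 ** (side_gain_db / 20.0)
    left = [m + gain * s for m, s in zip(mid, side)]
    right = [m - gain * s for m, s in zip(mid, side)]
    return left, right


def sum_and_difference(left, right):
    """Return the L+R and L-R signals used in the measurements."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


if __name__ == "__main__":
    mid = [1.0, 0.5, -0.25]     # illustrative sample values only
    side = [0.25, -0.5, 0.125]  # illustrative sample values only
    left, right = ms_to_lr(mid, side)
    total, diff = sum_and_difference(left, right)
    print(total)  # twice the mid signal
    print(diff)   # twice the side signal
```

A negative side gain lowers L-R by the same number of dB while leaving L+R untouched, which is why the measured attenuation alone cannot distinguish a dry room from a side signal that was simply mixed low.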
(20Hz – 20kHz)
4. The "sonic picture" created through the recording should emanate
from one plane ("zero delay plane"), therefore the use of spot
microphones should be restricted as much as possible (with the
exception of vocal- and instrumental soloists)
5. Microphones are preferably pointing from the direction of the
audience towards the orchestra ("audience perspective"), in order to
achieve more natural tonal colors.
6. The use of artificial reverb or room simulation should be kept to a
minimum
At first glance these guidelines seem to be pretty much common
knowledge among practicing sound engineers; however, following them
strictly leads to the exclusion of various well-established
main-microphone techniques.
One of the main ideas behind the purist AB-Polycardioid Centerfill approach is to preferably picture the entire orchestra from a so-called "Zero-Delay Reference-Plane" (see also Theile [31]), which is oriented at the AB microphones. An offset of the centerfill section of about +/- 1m from the plane seems to be uncritical in respect to localization distortion (depending on the size of the orchestra and exact position of the microphones); bigger values have to be compensated for. In general it can be said that the ORTF-Triple may be positioned further back in the hall in comparison with the AB microphones to achieve a better balance between the front and rear parts of the orchestra's middle section. However, since it is also used as some sort of spot-microphone for the first desks of the string sections left and right of the conductor, it will probably be positioned nearer to the orchestra than if it was used as a traditional main microphone system.

Also, it has to be pointed out that the L and R channel signal of the ORTF-microphone will not be panned fully L and R as usual, but instead panned towards the center (L channel at "10-11a.m.", R channel at "1-2p.m.") according to its function as centerfill system.

Fig. 32: AB-PC with "Decca-Triangle" style centerfill system

Figure 32 displays an alternative centerfill system: a "DECCA-Triangle" style arrangement with 3 cardioids. The center microphone C is pointing towards the woodwinds and is routed directly to the center channel in a 5.1 system. There are significant differences though in comparison to a traditional Decca-Triangle setup: the centerfill microphones D and E are not assigned fully to the L and R channel, but are panned "ad libitum" center-L and center-R. The microphones might be angled slightly outwards for better acoustic separation (important, since their signals are being panned in relative vicinity on the L/R stereo-bus).
Again, the center microphone C should be compensated in terms of arrival-time differences to optimize localization. Replacing the single C microphone with an ORTF-Triple is an alternative, if better coverage of the rear orchestra parts seems necessary.

All the above explanations might sound quite theoretical, therefore the author would like to emphasize the fact that these microphone setups are a result of tried-and-tested work practice, which has already achieved superior sonic results in respect to transparency and spatial reproduction. In connection to the above described AB-Polycardioid Centerfill system, the author has coined the term "Natural Perspective", because this is what the minimalist microphone technique wants to achieve.

to create envelopment, it is important to capture or add rear signal information of decorrelated nature. It is also the author's experience that, if no particular sound event at the back of the hall apart from the acoustic response of the room is to be captured, two cardioid microphones, pointing towards the rear of the hall, separated from each other by, and placed at a distance from the orchestra of, at least the reverberation radius (critical distance) will deliver appropriate rear channel signals.

6.2 AB-PC in relation to other 5.1 systems
Since the above proposed AB-PC technique, which eliminates as far as possible the use of spot-microphones, may seem odd to some of the readers, the author was happy to find that another professional in the field uses a very similar technique: As described in [32], Tomlinson Holman uses an arrangement in which he combines an ORTF-system, panned center-L and center-R, with an AB-pair, but uses cardioids for the AB-pair, which are each fully routed to the L and R channel. To this front-channel arrangement he adds delayed spot-microphones. The information for the rear channels is derived from a mix of cardioid and omnidirectional microphones positioned in the hall. Sonic time-of-arrival differences between the front and rear microphones are compensated for by advancing the rear microphone signals in time via shifting of tracks on a harddisk editing system.

On the one hand there is quite a number of proposals for 5.1 surround microphone systems which make use of relatively closely spaced omnidirectional as well as directional microphone systems; on the other hand a recent survey [33] of techniques applied by practicing sound-engineers in the field has shown that there is a discrepancy: the majority of sound-engineers actually use proprietary techniques for 5.1 recording, which are derived from previous 2-channel stereo techniques, usually with larger microphone spacings in the range of 1.5 to 4m. Also, there is a trend towards using more omnidirectional transducers than directional ones; the majority of engineers in this survey used 3-5 omnidirectional microphones. The advantage of these "free" microphone arrangements is that they can easily be adjusted according to the requirements of the piece, hall acoustics, etc.

The conclusion of a project on surround recording of orchestral music, carried out by a group of students at the "Hochschule der Künste" in Berlin in 1997, was that the front channel signals should preferably be decorrelated (at least the L and R channel), which could be achieved by using three omnidirectional microphones at a spacing of at least 1.5m each [34].

A comparative study of statistically evaluated listening tests by Hildebrandt and Braun [35] also brought the result that for broad sound sources, like an orchestra, microphone techniques using widely spaced omnidirectional microphones (in that particular case 5 pan-potted pressure transducers) were preferred for their superior spatial reproduction and localization qualities (for off-axis listening positions, in the second case).

All the above mentioned studies show that largely spaced systems provide superior spatial reproduction of sound events, at least for big sound sources like orchestras.
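The arrival-time compensation mentioned for the center microphone and for the rear microphones amounts to converting a path-length difference into a time shift. The following minimal sketch shows that arithmetic, assuming a speed of sound of 343 m/s and a 48 kHz sample rate; neither value is stated in the paper, and the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 °C


def delay_samples(distance_m, sample_rate_hz=48000, c=SPEED_OF_SOUND_M_S):
    """Number of samples corresponding to an acoustic path difference."""
    return round(distance_m / c * sample_rate_hz)


def advance(track, n_samples):
    """Shift a track earlier in time by n_samples, zero-padding the end."""
    n = max(0, min(n_samples, len(track)))
    return track[n:] + [0.0] * n


if __name__ == "__main__":
    # A microphone 1 m behind the reference plane hears the sound later.
    n = delay_samples(1.0)
    print(n, "samples at 48 kHz for 1 m")
    print(advance([1.0, 2.0, 3.0, 4.0], 1))
```

At these assumed values a 1m offset corresponds to roughly 2.9 ms, or 140 samples. Rear microphones placed many meters back in the hall need correspondingly larger shifts.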
[5] Valin J., "The RCA Bible – A Compendium of Opinion on RCA Living Stereo Records", Second Edition, The Music Lovers Press (1994), pp. 123, 124
[21] Griesinger D., "Spatial Impression and Envelopment in Small Rooms", paper presented at the 103rd AES Convention in New York, Sept. 1997, preprint # 4638 (H-2)
[23] "Best of Chesky Classics & Jazz and Audiophile Test Disk Volume 3", Chesky Records, JD111D, Tracks 15, 18
[31] Theile G., "Microphone and Mixing Concepts for 5.1 Music Recordings", paper presented at the "21st International Audio Convention" of the VDT in Hannover, 2000, Proceedings (ISBN 3-598-20362-4), pp. 384
[32] Holman T., "Mixing the Sound (Part 2): Perspective – where do the sounds go?", Surround Professional, May/June 2001, pp. 35
10 APPENDIX
Fig. 2: Variation in loudness level as a sound source is rotated in a horizontal plane around the head (from [7])