\sum_{l=0}^{L-1} w_{mi}(l)\, R_{x_i x_i}(n - l) - R_{x_i d_m}(n) = 0.   (6)

For the causally constrained case, one sets 0 \le n \le L-1. For the unconstrained case, n is unbounded and the summation operates from -\infty to \infty. In this equation, R_{x_i x_i} and R_{x_i d_m} are the auto-correlation of the reference signal and the cross-correlation between the reference signal and array microphone m, respectively. For the applications discussed in this paper, the proposed method relies on the unconstrained normal equations. In this case, the optimal filter from reference signal i to microphone m is found using the discrete Fourier transform of the previous equation. Thus, for frequency bin k [5],

\tilde{w}_{mi}(k) = \tilde{S}_{x_i d_m}(k) / \tilde{S}_{x_i x_i}(k), \quad 0 \le k \le L-1,   (7)
where \tilde{\,\cdot\,} indicates a frequency-domain quantity. Finite-impulse-response filter coefficients w_{mi} are found by inverse Fourier transform. In Eq. (7), \tilde{S}_{x_i d_m} is the cross-spectral density and \tilde{S}_{x_i x_i} the power spectral density.
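Under these definitions, the unconstrained solution of Eq. (7) can be sketched numerically: estimate the cross- and power spectral densities (e.g., with Welch averaging), divide per frequency bin, and inverse-transform to obtain the FIR coefficients. The helper below is an illustrative sketch, not the authors' implementation; the signals, segment length, and function name are arbitrary assumptions.

```python
import numpy as np
from scipy import signal

def optimal_fir(x, d, nperseg=256):
    """Eq. (7) sketch: w_mi(k) = S_xd(k) / S_xx(k), then inverse FFT.

    x: reference signal x_i(n); d: array microphone signal d_m(n).
    Two-sided spectra are used so np.fft.ifft returns the real taps."""
    _, S_xd = signal.csd(x, d, nperseg=nperseg, return_onesided=False)
    _, S_xx = signal.welch(x, nperseg=nperseg, return_onesided=False)
    return np.real(np.fft.ifft(S_xd / S_xx))   # FIR coefficients w_mi(l)

# Toy check: d is the reference delayed by 5 samples and scaled by 0.8,
# so the estimated filter should peak at tap 5.
rng = np.random.default_rng(0)
x = rng.standard_normal(16384)
d = 0.8 * np.roll(x, 5)
w = optimal_fir(x, d)
print(np.argmax(w))   # -> 5
```

With a stationary reference and enough averaging segments, the spectral division converges to the unconstrained optimal filter; short records or low coherence degrade the estimate, which motivates the coherence gating introduced later in the paper.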
AES 55th International Conference, Helsinki, Finland, 2014 August 27–29
Gauthier et al. Microphone arrays and proximity microphones
[Fig. 2 diagram: foreground source i is captured by a proximity microphone, giving the reference signal x_i(n); the M-channel microphone array in the surrounding sound environment (background sources) gives the array signals d_m(n); the SIMO optimal filter W_i applied to x_i(n) yields the foreground signals y_mi(n), which are subtracted from d_m(n) to give the surrounding signals s_mi(n).]
Fig. 2: Signal processing for foreground and surrounding signal separation at the microphone array illustrated for the
i-th reference signal and M-microphone array.
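The subtraction branch in Fig. 2 amounts to filtering the reference with the per-channel optimal filters and removing the result from each array channel. A minimal sketch (array shapes and names are illustrative assumptions, not the paper's code):

```python
import numpy as np
from scipy.signal import fftconvolve

def separate(d, x, W):
    """Fig. 2 signal flow: y_mi(n) = (w_mi * x_i)(n) is the foreground
    estimate and s_mi(n) = d_m(n) - y_mi(n) the surrounding residual.

    d: (M, N) array signals, x: (N,) reference signal x_i(n),
    W: (M, L) FIR filters, one row per array microphone."""
    y = np.stack([fftconvolve(x, w)[: d.shape[1]] for w in W])
    return y, d - y   # foreground signals, surrounding signals
```

When the filters model the foreground propagation path exactly, the residual s_mi(n) contains only the background sound, which is exactly the separation the figure depicts.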
5. NUMERICAL SIMULATIONS
The results are based on an audio ray-tracing simulation
of the environment shown in Fig. 3. The space simulates
two large rooms and two corridors. The room height is
4 m. The largest dimensions of the model are 76 m by
21 m. Wall reflectivity coefficients are set to 0.8, 0.75, and 0.7 for the low, mid, and high frequency ranges, respectively. The 3D model was done with Blender [7]
and the acoustical simulation was achieved using the
E.A.R. [8] plug-in for Blender. The sound speed is
343 m/s. Based on the Schroeder curve of one impulse response obtained from the model, the reverberation time of the space is approximately 1.59 s. Preliminary validation tests of E.A.R. under free-field conditions, carried out to verify the sound speed and the direction of arrival at the circular microphone array, proved conclusive. E.A.R. was also shown to be in good agreement with the Sabine prediction of the reverberation time [8].
The background and foreground sound sources are simu-
lated as omnidirectional sources fed by real monophonic
recordings of various machinery noise (engines, indus-
trial sewing machine). These stationary or nearly station-
ary signals are shown as Welch power spectral densities
in Fig. 4. These power spectral densities overlap over
certain frequency bands. The foreground sound tends to
cover more uniformly the entire spectrum and the back-
ground sources have more pronounced low-frequency
content. The reference signal used for the foreground
source was simulated with an omnidirectional proximity
microphone 50 cm from the foreground sound source in
the 3D model. In Fig. 3, note that the foreground source
[Fig. 4 plot: power/frequency in dB/Hz versus frequency in kHz for Background source #1, Background source #2, and the Foreground source.]
Fig. 4: Power spectral densities of sources.
and the background source #1 can both produce direct
sound at the microphone array while the background
source #2 cannot provide direct sound at the microphone
array, since this background source is not aligned with
an opening like background source #1. This may impact the acoustical imaging results and should be kept in
mind. Simulations were run at a 44.1 kHz sampling rate and then downsampled to 12 kHz to test the presented method. Downsampling was used to reduce the computational burden of the algorithm while still illustrating the validity of the method. This does not limit the extent of the reported results, since they could be achieved at higher sampling rates.
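The 44.1 kHz to 12 kHz conversion is a rational-ratio resampling (12000/44100 = 40/147). A sketch with SciPy's polyphase resampler, which applies the required anti-aliasing filter (the signal here is a placeholder, not the paper's recordings):

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 44_100, 12_000
up, down = 40, 147                 # fs_out / fs_in reduced to lowest terms
x = np.random.default_rng(1).standard_normal(fs_in)   # 1 s placeholder signal
y = resample_poly(x, up, down)     # anti-alias filtered, then decimated
print(len(y))                      # -> 12000 samples, i.e. 1 s at 12 kHz
```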
To capture the sound field, a 48-channel microphone array was used, as shown in Fig. 3. The microphones are omnidirectional. The array is uniform, circular, and horizontal, placed 1.22 m above the floor in the 3D model. The array radius is 1 m.
[Fig. 3 diagram: 76 m by 21 m floor plan showing background sound source #1, background sound source #2, the foreground sound source, and the 48-microphone array; source directions of 135° and 176.8° with respect to the array are indicated.]
Fig. 3: Top view of the modeled environment with background sources and a foreground source.
6. NUMERICAL RESULTS
First, standard beamforming was performed on the original microphone array signals d_m(n) to evaluate the acoustical map and validate the incoming sound directions of the foreground and background sound sources along with sound reflections. The acoustical maps are obtained using horizontal beamforming with a scan grid defined over 360° (with a 1° step).
This suggests that, besides being able to properly sepa-
rate the frequency content of the foreground and back-
ground sound as suggested by Fig. 7, the method is able
to preserve the phase and relative time-alignment information between each of the microphones, for both separated array signals.

[Fig. 5 plots: three acoustical maps, level in dB (ref. 1) versus steering direction from 0° to 360°; dashed vertical lines mark the source directions at 135° and 176.8°; annotated levels of -58.63 dB and -64.31 dB.]

Fig. 5: (a): Acoustical map obtained with beamforming of the original scene with foreground and background sources. (b): Comparison of the acoustical maps of the original foreground sound with the extracted foreground sound. (c): Comparison of the acoustical maps of the original background signals with extracted background signals. Exact angular positions of sources with respect to the array are shown as vertical dashed lines.

[Fig. 6 plot: filter coefficients w_11 versus time, from 0 to 1.2 s.]

Fig. 6: Example of optimal FIR filter coefficients from the proximity microphone to array microphone #1.

[Fig. 7 plots: power spectral densities in dB/Hz versus frequency in kHz for the foreground, background, and mixed sound at microphone #1 of the array, comparing original with extracted or reconstructed curves.]

Fig. 7: Power spectral density of foreground and background signal separation for one microphone of the array. Top: Original and extracted (y_11(n)) foreground signals. Center: Original and extracted background (s_11(n)) sound signals. Bottom: Original (d_1(n)) and reconstructed mixed (s_11(n) + y_11(n)) signals.

Accordingly, any subsequent spatial
microphone array processing (as in Fig. 1) on either ex-
tracted foreground or background signals should perform
adequately. A closer look at Fig. 5(b) shows that the original and extracted acoustical maps correspond less perfectly for two directions. The first direction corresponds to the background source angular position at 176.8°. The second direction (between 270° and 315°) corresponds to the lowest level of the map, where it is assumed that this discrepancy is less significant and possibly introduced through beamforming side lobes and background source
#2. In Fig. 5(c), the agreement between the original and
extracted background map is also good (-65.13 dB FS at
177° for
the extracted map). As for Fig. 5(a), in Fig. 5(c), the cor-
respondence is good except for the angular position of
the foreground source.
6.1. Investigation of near-field effects
In the previous case, most of the signal sent to the foreground source fully propagates to the microphone array, since there is no near-field effect simulation in the ray-tracing model. In order to evaluate the impact of near-field sound such as evanescent waves, a synthetically generated tone at 93 Hz was artificially added to the reference signal x_i(n) at different levels, namely -26, -20, and -14 dB FS. The peak signal in the original reference occurs at 58 Hz at -40 dB FS. Therefore, the addition of this near-field signal is drastic with respect to the original reference signal. The effect on the extracted foreground signal at one of the microphones of the array is shown in Fig. 8. Ideally, the extracted power spectral density should not be influenced by this simulated near-field effect. Note that the four extracted curves are superimposed in Fig. 8, except at 93 Hz. Clearly, the extraction of the foreground signal is less efficient at 93 Hz, where part of the artificial near-field signal (which does not radiate to the microphone array) has been associated with other environmental sound at the microphone array.
In order to attenuate this undesirable effect, an additional filter is introduced for each mi path in w_{mi}. This filter is derived from the coherence measurement given by

C_{mi}(k) = |\tilde{S}_{x_i d_m}(k)|^2 / [\tilde{S}_{x_i x_i}(k)\, \tilde{S}_{d_m d_m}(k)],   (8)

on which a sigmoid function is applied:

F_{mi}(k) = 1 / (1 + e^{-c(C_{mi}(k) - a)}),   (9)

with 0 \le F_{mi}(k) \le 1, where a represents the threshold coherence below which the filter will cut the signal and c is the steepness of the sigmoid function, i.e., how rapidly the gate opens around the threshold. Therefore, F_{mi} represents a frequency-domain gate that only lets through signals coherent between the i-th proximity microphone and the m-th microphone in the array. The final filter is the multiplication of the optimal filter and this coherence filter:

\tilde{w}_{mi}(k) = F_{mi}(k)\, \tilde{S}_{x_i d_m}(k) / \tilde{S}_{x_i x_i}(k), \quad 0 \le k \le L-1.   (10)
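Eqs. (8)–(10) can be sketched with Welch-type spectral estimates as below; the function and parameter names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy import signal

def coherence_gated_filter(x, d, a=0.33, c=10.0, nperseg=256):
    """Coherence C_mi (Eq. 8), sigmoid gate F_mi (Eq. 9), and the
    gated optimal-filter spectrum of Eq. (10), per frequency bin."""
    _, C = signal.coherence(x, d, nperseg=nperseg)        # Eq. (8)
    F = 1.0 / (1.0 + np.exp(-c * (C - a)))                # Eq. (9)
    _, S_xd = signal.csd(x, d, nperseg=nperseg)
    _, S_xx = signal.welch(x, nperseg=nperseg)
    return F, F * S_xd / S_xx                             # gate, Eq. (10)
```

For a channel pair dominated by coherent sound, F stays near 1 and the optimal filter passes unchanged; for incoherent content, the estimated coherence falls below a and the gate attenuates the bin.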
For this test, a was set to 0.33 and c was set to 10. The result of this modified filter for foreground source separation is illustrated in Fig. 9. Clearly, the artificial peak at 93 Hz is reduced, except for the most extreme case of 93 Hz at -14 dB FS. In all cases, the peak is lower than without the additional coherence filter. However, part of the extracted signal just above the 93 Hz peak is also altered, though it is still closer to the original foreground signal. This avenue seems promising and should be further investigated.

[Fig. 8 plot: power/frequency in dB/Hz versus frequency from 0.08 to 0.12 kHz; curves: original, extracted, and extracted with the 93 Hz tone at -26, -20, and -14 dB FS.]

Fig. 8: Power spectral densities of foreground signal separation illustrated at the first microphone of the array for the original scene with three cases of artificial signal mixed with the reference signal.

[Fig. 9 plot: same axes and curves as Fig. 8, with the coherence filter applied.]

Fig. 9: Power spectral densities of foreground signal separation with optimal filtering combined with coherence filtering, illustrated at the first microphone of the array for the original scene with three cases of artificial signal mixed with the reference signal.
7. PRACTICAL CONSIDERATIONS
Although the method performs well over a large bandwidth, it was observed that some leakage of the background sound environment into the proximity microphone signal can degrade the performance of the source separation. For the reported case, this was enhanced by the recordings' spectral difference: at 26 Hz (the frequency at which the method performed worst), the background sound spectrum at the proximity microphone was no less than 25 dB louder than the foreground sound signal. While this highlights a limitation of the proposed method, it also gives some hints for practical considerations. First, the proximity microphone should be in
very close proximity to the source in order to increase the foreground-to-background ratio. Second, directive (cardioid or hyper-cardioid) microphones could be used as proximity microphones to further reduce background signals. Third, if the proximity microphone is close to the source, the channel sensitivity can be reduced, which should also attenuate the background sound leakage into the foreground sound signal. Finally, this issue may matter less in practical situations since, in many cases, placing a microphone very close to a sound source will enhance the low-frequency content of that source because of low-frequency evanescent waves in the near-field not taken into account in the reported simulations (this is also known as the proximity effect in audio engineering). Consequently, the method is not expected to suffer greatly in real practical situations. Another approach would be to use a vibration sensor on the foreground source in place of a proximity microphone.
Once the foreground and background have been separated at the microphone array, one has to spatially process this information in order to drive conventional WFS algorithms and virtual sources. The reproduction of background and surrounding sound environments could be achieved using beamforming and plane-wave reproduction, although a crossover might be necessary to introduce more beams for the high frequencies than for the low frequencies, where main lobes are larger [6].
8. CONCLUSION
In this paper, a simple and efficient method based on optimal filters, a microphone array, and proximity microphones was proposed. The aim was to separate the microphone array signals into different parts. The first part is associated with foreground sound sources, identified on site and recorded with proximity microphones. The second part is associated with the remaining background sound environment related to other background sound sources and reflected sound. The optimal filters can extract the foreground sound source signal at the microphone array, i.e., excluding the near field in the close proximity of the foreground source. The results show that this separation is effective over a wide frequency band. Acoustical maps of the original foreground and background signals were compared with acoustical maps of the extracted signals. Their agreement demonstrates that time-alignment and relative phase between microphone signals (both essential to the performance of any subsequent array or spatial processing) are preserved through the separation process. In order to increase the separation performance, a filtering stage based on the coherence function was introduced. Investigations of how this performs for sound field reproduction are a topic of current research. In due course, several industrial environments will be recorded using a 196-channel microphone array system, where some of these channels will be used as proximity microphones for up to 8 foreground sources. The separated signals will then be reproduced using a 96-channel WFS system.
9. REFERENCES
[1] Nicol R. and Emerit M., "3D-Sound Reproduction Over an Extensive Listening Area: A Hybrid Method Derived from Holophony and Ambisonic," presented at the AES 16th International Conference, Rovaniemi, Finland, 1999.
[2] Ahrens J., Analytic Methods of Sound Field Synthesis, Springer, Berlin, 2012.
[3] Hulsebos E., de Vries D., and Bourdillat E., "Improved Microphone Array Configurations for Auralization of Sound Fields by Wave-Field Synthesis," J. Audio Eng. Soc., vol. 50, no. 10, pp. 779–790 (2002 October).
[4] Gauthier P.-A., Camier C., Lebel F.-A., Pasco Y., and Berry A., "Experiments of Sound Field Reproduction Inside Aircraft Cabin Mock-Up," presented at the 133rd AES Convention, San Francisco, 2012.
[5] Elliott S., Signal Processing for Active Control, Academic Press, San Diego, 2001.
[6] Hur Y., Abel J. S., Park Y.-C., and Youn D. H., "Techniques for Synthetic Reconfiguration of Microphone Arrays," J. Audio Eng. Soc., vol. 59, no. 6, pp. 404–418 (2011 June).
[7] Home of the Blender project, http://www.blender.org, accessed 2014 January 22.
[8] E.A.R., http://www.explauralisation.org, accessed 2014 January 22.