You are on page 1of 13

CAN A GROUND-BASED VEHICLE HEAR THE SHAPE OF A ROOM?

MIREILLE BOUTIN AND GREGOR KEMPER

Abstract. Assume that a ground-based vehicle moves in a room with walls or other planar
arXiv:2204.00244v1 [math.AC] 1 Apr 2022

surfaces. Can the vehicle reconstruct the positions of the walls from the echoes of a single
sound event? We assume that the vehicle carries some microphones and that a loudspeaker is
either also mounted on the vehicle or placed at a fixed location in the room. We prove that the
reconstruction is almost always possible if (1) no echoes are received from floors, ceilings or
sloping walls and the vehicle carries at least three non-collinear microphones, or if (2) walls of
any inclination may occur, the loudspeaker is fixed in the room and there are four non-coplanar
microphones.
The difficulty lies in the echo-matching problem: how to determine which echoes come from
the same wall. We solve this by using a Cayley-Menger determinant. Our proofs use methods
from computational commutative algebra.

Introduction
This paper is concerned with the problem of reconstructing the position of walls and other
planar surfaces using echoes. More specifically, an omni-directional loudspeaker produces a short,
high frequency impulse. The echoes of the impulses are captured by some microphones. The
loudspeaker and the microphones are assumed to be synchronized. The walls are modeled using
mirror points, which are the reflections of the loudspeaker with respect to the planar surfaces.
The times of arrival of an echo give us the total distance travelled by the sound, which is equal
to the distance from the microphone to a mirror point. We are interested in determining to what
extent it is theoretically possible to reconstruct the wall positions from the distance measurements
obtained in this fashion. All measurements are assumed to be exact, and the computations are
assumed to be performed with infinite precision. Yet, even in this theoretical scenario, it may
be impossible, at least in some circumstances, to correctly reconstruct the wall positions. One of
the goal of our work is to narrow down the problematic cases and show that one can expect to
avoid them in some common application scenarios. Here, we are thinking of a vehicle carrying
the microphones and having the ability to move more or less freely on the ground, for example,
a robot rolling on the floor inside a warehouse.
In previous work [5], we considered the case of a drone or, more generally, a vehicle with all
six degrees of freedom (the three coordinate axes and yaw, pitch and roll). If there are four
microphones mounted on the vehicle and there is a loudspeaker either on the vehicle or at a fixed
location in the room, then our algorithm [5] reconstructs the position of every wall from which
an echo is received by all microphones. But it cannot be avoided that for certain “bad” vehicle
positions a “ghost wall”, one which is not really there, is detected. Our main result states that
the bad positions are rare. However, to get out of a possibly bad position the vehicle may need
to use all six degrees of freedom, including pitch and roll. For a drone this means that being in a
good position may result in receiving a thrust in some direction, so the drone cannot hover while
carrying out the wall reconstruction. So we are interested in investigating whether our results
carry over to a situation where the degrees of freedom are restricted to the coordinate axes and
yaw. Perhaps more interesting is the case of a ground-based vehicle. These are the scenarios
considered in this paper.
The restriction of degrees of freedom not only makes the mathematical arguments harder but
also yields somewhat weaker results. In fact, we show in Section 2 that there are “unlucky” wall

Date: April 4, 2022.


2010 Mathematics Subject Classification. 51K99, 13P10, 13P25.
Key words and phrases. Geometry from echoes, echo sorting, shape reconstruction.
1
2 MIREILLE BOUTIN AND GREGOR KEMPER

arrangements for which it is not true that almost all vehicle positions are good, if the degrees of
freedom are restricted as stated above. So the cases of a “hovering drone” and a ground-based
vehicle really are more difficult. The first main result (Theorem 3.1) deals with the case of a
ground-based vehicle and a loudspeaker placed at a fixed position in the room, and gives a precise
characterization of the wall arrangements for which it is not true that almost all vehicle positions
are good. These are the arrangements where an “unlucky stack of mirror points”, as defined
in Section 2, occurs, and they are themselves very rare. The second main result (Theorem 3.2)
treats the case of a hovering drone (four degrees of freedom) and, perhaps surprisingly, reaches
the same conclusion: the additional degree of freedom does not result in fewer exceptional wall
arrangements. The proofs of these two results are similar in spirit to the proofs in [5], and in
particular involve large computations in ideal theory done by computer. But apart from the
new aspect of exceptional wall arrangements, there arise further theoretical difficulties that are
explained after the statement of Theorem 3.1. Analyzing our proof method shows that for each
degree of freedom taken away, the final computation requires roughly two additional variables.
This is why computational bottlenecks prevented us from achieving general results in the case
that the loudspeaker is mounted on the vehicle.
In the first result of this paper, the vehicle moves in two dimensions, but still lives in three-
dimensional space, meaning that the walls it detects may be floors, ceilings, and sloping walls.
The last result (Theorem 4.1) of this paper concerns the truly two-dimensional case. In real-world
terms, this means that no echoes should be received from floors, ceilings, and sloping walls, since
they cannot be dealt with. This case is much easier, has no exceptional wall arrangements, only
requires three microphones, and also allows for the loudspeaker to be mounted on the vehicle,
which we were not able to address in a satisfactory way in three dimensions.
In this paper, a room is just a spacial arrangement of walls, and a wall is just a plane surface.
We only consider first order echoes, meaning that the sound has not bounced twice or more
often. A further assumption is that the sound frequency is high enough to justify the use of
ray acoustics. When we say that almost all vehicle positions are good, we mean that within the
configuration space of all vehicle positions, the bad ones form a subset of volume zero. In fact,
we use this language only if the bad positions are contained in a subvariety of lower dimension.
Intuitively “almost all” can be thought of as “with probability one.”
Acknowledgments. This work has benefited immensely from a research stay of the two
authors at the Banff International Research Station for Mathematical Innovation and Discovery
(BIRS) under the “Research in Teams” program. We would like to thank BIRS for its hospitality
and for providing an optimal working environment.

1. Related work
The relationship between the geometry of a room and the source-to-receiver acoustic impulse
response has long been a subject of interest, in particular for the design of concert halls with
desirable acoustic properties. For example, the two-point impulse response within rectangular
rooms was analyzed in [1]. The results were later extended to arbitrarily polyhedral rooms in
[3]. Subsequent work used the impulse response of a room to reconstruct its geometry, including
that of objects within the room (e.g. [14]).
Some reconstruction methods are focused on 2D room geometry (e.g., [7],2], [9]) while others
consider a more general but still constrained 3D geometry. For example, Park and Choi [15]
reconstruct the wall surface for a convex, polyhedral and bounded room using several loudspeak-
ers and microphones in known positions. Other methods are applicable to reconstructing of
arbitrarily planar surfaces in 3D (e.g. [8]).
Some methods reconstruct wall points directly (e.g., [6], [11], [19], [9], [15]) while others, like
us, reconstruct mirror points representing the reflection of a loudspeaker with respect to a wall
plane (e.g., [23], [13], [8], [20], [10]). Some, like us, assume that the microphones are synchronized
(e.g., [23], [8],[12], [18],), while others do not (e.g., [21], [17], [2]).
In certain setups, the microphones are placed on a vehicle such as a robot (e.g. [16]) or a drone
[5]. Recent work uses cell phones as a cheaper alternative. For example, Shih and Rowe [22]
place an omnidirectional speaker at a known position in a polyhedral room and use a cell phone
CAN A GROUND-BASED VEHICLE HEAR THE SHAPE OF A ROOM? 3

to record the echoes while Zhou et al. [24] assume a smart phone is held horizontally by a person
walking along rooms and corridors.
The vast majority of the work in the literature is focused on the numerical reconstruction of the
room, including the difficult problem of labeling echoes coming from the same surface. However,
from a theoretical standpoint, there is still a lot to be understood regarding the well-posedness of
different reconstruction setups, especially for non-generic room geometries or receiver and source
placements.

2. Unlucky stacks of mirror points


As mentioned above, the primary challenge in wall detection from echoes is the matching of
echoes heard by different microphones, i.e., determining which echoes come from the same wall
and which ones do not.
Let us recall an example from [5] where this matching inevitably goes wrong, leading to a
ghost wall. As it turns out, this type of example will play a crucial role in the present paper.
Figure 2.1 shows three microphones in a plane at positions m1 , m2 , m3 that hear echoes from

W2 W3

m3
m2
W1

m1
z

x
L
Wghost

Figure 2.1. The microphones mi think they are hearing echoes from the wall
Wghost . The dotted lines stand for sound rays, L for the loudspeaker.

three walls Wi , but the time elapsed between sound emission and echo detection is the same as if
they were hearing echoes from one single wall Wghost , which does not exist. This arises because
(1) the walls Wi are all horizontal, (2) the distances between mi and Wi are the same for all i,
and (3) each mi can hear the echo from Wi . It is easy to add a fourth microphone outside of the
drawing plane, together with a wall possibly also outside of the plane, such that (1)–(3) extend
to the fourth microphone and wall. (We find it harder to include that in our two-dimensional
sketch in Figure 2.1.) Since the echoes heard by the microphones behave precisely like echoes
from the single wall Wghost , any matching procedure will falsely assume that they have, in fact,
come from Wghost , giving rise to a ghost wall. The situation is particularly severe since the ghost
wall has some persistence properties. In fact, the conditions (1)–(3), and therefore the emergence
of a ghost wall, will be preserved if
(a) the microphones are moved, independently and within a certain range, in directions
parallel to the walls,
(b) the microphones are moved, together and within a certain range, along the z-axis, or, less
importantly,
(c) the loudspeaker is moved, within a certain range, in any direction.
Our sketch does not contain any vehicle that carries the microphones. Let us now imagine
there is such a vehicle, and that it is ground-based, in the sense that its movement (rotation and
translation) is restricted to the x-y-plane, where the y-axis is perpendicular to the drawing plane.
The vehicle may also be allowed to move up or down along the z-axis, like a hovering drone.
Then the persistence properties imply that within a certain range, vehicle movements will not
make the ghost wall disappear. For this it does not matter whether the loudspeaker is mounted
4 MIREILLE BOUTIN AND GREGOR KEMPER

on the vehicle or at a fixed position. In all cases, there is a region of positive volume within the
space of all vehicle movements, in which the vehicle remains in a bad position. So it is not true
that almost all vehicle positions are good. Thus here we hit a limitation to what can possibly be
shown for microphones mounted on ground-based vehicles, or vehicles with an additional degree
of freedom of moving up or down. (In comparison, a drone has two additional degrees of freedom:
pitch and roll.)
The following closer analysis will reveal a more general situation where it is not true that
almost all positions are good. Since we are using ray acoustics, an echo reflected at a wall arrives
with the same time delay as if it were emitted at what we call the mirror point, which is the
reflection of the loudspeaker position at the wall or, more precisely, at the plane containing the
wall surface. Figure 2.2 shows the mirror points si in the situation of Figure 2.1.

s3 s3

s2 s2

s1 s1
W3
W2
W1 m3 m3
m2 m2

m1 m1

L
Wghost
sghost sghost

Figure 2.2. Virtually, the sound comes from the mirror points si . So the
loudspeaker and walls might as well be left out of the sketch.

s3 s3

s2 s2

s1 s1
W3
W2
W1 m3 m3
m2 m2

m1 L m1

Wghost

sghost sghost

Figure 2.3. A different loudspeaker position and different walls produce the
exact same unlucky stack of mirror points as in Figure 2.2. But here the walls
are not horizontal.

If we have mirror points as in the right sketch of Figure 2.2, then we have the persistence
properties (a) and (b) of the ghost wall discussed above. For this, it is not necessary that the
CAN A GROUND-BASED VEHICLE HEAR THE SHAPE OF A ROOM? 5

loudspeaker be situated on the same vertical line as the mirror points, so the walls may be non-
horizontal. Figure 2.3 illustrates that. In fact, it is only if the loudspeaker is moving as well
that we need horizontal walls to achieve the persistence properties of the ghost wall. It is useful
to introduce a catchphrase for the situation shown in Figures 2.2 and 2.3, which is done in the
following definition. From now on, we will only consider the case in which the loudspeaker is at
a fixed position and not moving with the vehicle.
Definition 2.1. Assume that four microphones are mounted on a vehicle that can move within
the x-y-plane, and possibly up and down along the z-axis. Assume that the vehicle moves in
a scene that contains several flat walls and a loudspeaker at a fixed position. We say that the
microphones are deceived by an unlucky stack of mirror points if among the walls there
are four whose mirror points s1 , . . . , s4 are contained in a common vertical line, such that for
i, j ∈ {1, . . . , 4}, the z-coordinate of the vector sj − si is twice the z-coordinate of mj − mi . In
particular, this means that if for some i 6= j the positions mi and mj share the same z-coordinate,
then si = sj ; so for deception by an unlucky stack of mirror points, the mirror points (and the
walls) need not be pairwise distinct.
Assume that the microphones are deceived by an unlucky stack of mirror points s1 , . . . , s4 .
Then the reflection of si at the horizontal plane containing mi produces a point sghost that is
the same for all i. Therefore the distance between mi and si equals the distance between mi
and sghost . So if the vehicle is positioned in such a way that the ith microphone can hear the
echo of the ith wall, then a ghost wall, corresponding to sghost , will be detected. This justifies
our speaking of “deception.”

3. A ground based vehicle and a hovering drone


Before stating our main result it is useful to recall how the wall detection algorithm from [5]
works. The idea is to use a relation satisfied by the distances travelled by the sound that is
reflected at the same wall, but received by different microphones. This relation between the
squared distances d1 , . . . , d4 is
fD (d1 , . . . , d4 ) = 0, (3.1)
where for real numbers u1 , . . . , u4 ∈ R, and with Di,j the squared distance between the ith and
the jth microphone, the polynomial fD is defined as the Cayley-Menger determinant
 
0 u1 · · · u4 1
u1 D1,1 · · · D1,4 1
 
 ..  .
fD (u1 , . . . , u4 ) := det  ... ..
.
..
. . (3.2)
 
u4 D4,1 · · · D4,4 1
1 1 ··· 1 0
With this, we can state the wall detection algorithm as Algorithm 3.1.
There is no need here to be specific about what parameters are computed to describe a
wall detected by the algorithm. Algorithm 3.1 is guaranteed to detect every wall from which
a first order echo is heard by all four microphones. However, it may happen that the relation
fD (d1 , . . . , d4 ) = 0 is satisfied by accident even though the squared distances di come from sound
reflections at different walls. This can deceive the algorithm into outputting walls that do not
actually exist. As in [5], we call such walls ghost walls and we say that the vehicle carrying
the microphones (and possibly the loudspeaker) is in a bad position if at least one ghost wall is
detected. One instance where this happens is when the microphones are deceived by an unlucky
stack of mirror points, as discussed in the previous subsection.
Theorem 3.1 (A ground-based vehicle in a scene with a fixed loudspeaker). Assume that four
microphones are mounted on a ground-based vehicle, which can move on the ground plane within
a three-dimensional scene containing n walls and a loudspeaker at fixed positions. Assume the
microphones do not lie on a common plane. If the microphones are not deceived by an unlucky
stack of mirror points, according to Definition 2.1, then almost all vehicle positions are good.
6 MIREILLE BOUTIN AND GREGOR KEMPER

Algorithm 3.1 Detect walls from first-order echoes

Input: The delay times of the first-order echoes recorded by four microphones, and the distances
Di,j between the microphones.
1: For i = 1, . . . , 4, collect the recorded times of the first-order echoes recorded by the ith
microphone in the set Ti .
2: Set Di := {c2 (t − t0 )2 | t ∈ Ti } (i = 1, . . . , 4), where c is the speed of sound and t0 is the time
of sound emission.
3: for (d1 , d2 , d3 , d4 ) ∈ D1 × D2 × D3 × D4 do
4: With fD defined by (3.2), evaluate fD (d1 , . . . , d4 ).
5: if fD (d1 , . . . , d4 ) = 0 then
6: Use the geometric methods such as those given in [5, Proposition 1.1] to compute the
parameters describing the wall corresponding to the distances d1 , . . . , d4 .
7: Output the parameters of this wall.
8: end if
9: end for

Moreover, if l ∈ {2, 3, 4} is the number of distinct z-coordinates of the four microphone


positions, then within the (3n)-dimensional configuration space of all wall arrangements, the
ones where an unlucky stack of mirror points occurs are contained in a subvariety of codimen-
sion 3(l − 1).
Before giving the proof, we make a remark that should explain why some additional difficulties
arise compared to our proofs in [5]. In [5] we used the notion of a “very good position,” which
we briefly recall now. Let W be the (finite) set of walls in our room. The mirror points are given
by s = ref W (L), the reflection of the loudspeaker position at a wall W ∈ W. We call a vehicle
position very good if the following holds: for four walls W1 , . . . , W4 ∈ W the relation

fD kref W1 (L) − m1 k2 , . . . , kref W4 (L) − m4 k2 = 0
is satisfied only if W1 = W2 = W3 = W4 . (Of course, the above expression depends on the
vehicle position because the mi do.) It is easy to see, and formally proved in [5], that a very
good position is good. The utility of this concept lies in the fact that the very good positions
form a Zariski-open set, so for proving “allmost all” statements it suffices to show that for every
wall arrangement there exists at least one very good position. This turned out to be possible in
the cases considered in [5], where the microphones are mounted on a drone with six degrees of
freedom. However, for microphones on a ground-based vehicle, it may happen that no very good
position exists, but nevertheless all positions are good. Figure 3.1 shows a wall arrangement and
microphone configuration (again in dimensions two) where this happens.
In fact, if the vehicle carrying the microphones and possibly also the loudspeaker moves in
the x-y-plane, m1 will remain exactly in the middle between the walls, so m1 receives the echoes
from both walls simultaneously. This implies

fD kref W2 (L) − m1 k2 , kref W1 (L) − m2 k2 , kref W1 (L) − m3 k2 =

fD kref W1 (L) − m1 k2 , kref W1 (L) − m2 k2 , kref W1 (L) − m3 k2 = 0,
since in the middle expression all echoes come from the same wall. But since W1 6= W2 , this
means that the vehicle is not in a very good position. Nevertheless, the wall detection algorithm
will not detect a ghost wall in this situation, so the position is good. So indeed Figure 3.1 provides
an example where no very good position exists for a ground-based vehicle, but all positions are
good.
We must be able to deal with such situations in our proof. To this end, we will introduce the
notion of a “really good position,” which encapsulates the idea that if the position is not very
good, then this is because of simultaneous arrivals of echoes. The definition of this notion will
be given in the proof of Theorem 3.1, where we will also see that a really good position is good.
(That very good implies really good will be obvious.) The problem with really good positions is
CAN A GROUND-BASED VEHICLE HEAR THE SHAPE OF A ROOM? 7

W2

m2

z L m1

x m3

W1

Figure 3.1. The echoes from both walls arrive at m1 simultaneously. No ghost
wall is detected.

that, unlike very good positions, they do not form a Zariski-open set. So the proof has to take
care of this difficulty.
Proof of Theorem 3.1. Let us first prove the last statement about the codimension of the wall
arrangements with unlucky stacks of mirror points. To have such a stack, l walls are needed to
satisfy the restrictions of Definition 2.1. These restrictions leave 3 degrees of freedom for the
mirror points of these l walls, which gives an affine subspace of codimension 3l − 3 = 3(l − 1).
The wall arrangements with unlucky stacks are contained in the (finite) union of these subspaces
associated to each choice of l walls, so we obtain the claimed codimension. Having thus shown
the last statement, let us now turn to proving the main statement.
A configuration of microphones on the vehicle is given by initial positions mini 3
i ∈ R . For a
a1,1 a1,2 a1,3  2×3 a1,1 a1,2  2×3
matrix A = a2,1 a2,2 a2,3 ∈ R such that a2,1 a2,2 ∈ R is orthogonal with determinant 1,
consider the map
   
a1,1 a1,2 0 a1,3
ϕ = ϕA : R3 → R3 , v 7→ a2,1 a2,2 0 · v + a2,3  , (3.3)
0 0 1 0
and write ASO(2) for the group of all such maps. This group describes the possible vehicle
positions. If a vehicle position is given by ϕ ∈ ASO(2), then the microphone positions are
ϕ(mini
i ). The correspondence A ↔ ϕA makes ASO(2) into an affine variety in R .
6

Let W be the (finite) set of walls from our room. In this proof we identify the walls with
the planes containing them. The mirror points are given by s = ref W (L), the reflection of the
loudspeaker position at a wall W ∈ W. We call a group element ϕ ∈ ASO(2) really good if the
following holds: for four (not necessarily distinct) walls W1 , . . . , W4 ∈ W the relation

fD kref W1 (L) − ϕ(mini 2 ini 2
1 )k , . . . , kref W4 (L) − ϕ(m4 )k =0
is satisfied only if there is an i ∈ {1, . . . , 4} such that
k ref Wj (L) − ϕ(mini ini
j )k = k ref Wi (L) − ϕ(mj )k for all j ∈ {1, . . . , 4}.

(The index i on the right hand side is not a typo, but the whole point of the condition.) It follows
immediately that if ϕ (or the vehicle position given by ϕ) is very good, then it is really good.
More important is the following claim.
Claim 1. If ϕ ∈ ASO(2) is really good, then the vehicle position given by ϕ is good.
Indeed, with the notation of Algorithm 3.1, let (d1 , . . . , d4 ) ∈ D1 × · · · × D4 . For each j there
exists a wall Wj ∈ W such that dj = kref Wj (L) − ϕ(mini 2
j )k . If the algorithm detects a wall from
8 MIREILLE BOUTIN AND GREGOR KEMPER

the tuple (d1 , . . . , d4 ) , then fD (d1 , . . . , d4 ) = 0, so by hypothesis we have an i such that


k ref Wi (L) − ϕ(mini 2 ini 2
j )k = k ref Wj (L) − ϕ(mj )k = dj for all j. (3.4)
Step 6 of the algorithm computes a wall whose mirror point s satisfies ks − ϕ(mini
j )k
2
= dj for
all j, so (3.4) implies
1 
s − ref Wi (L), ϕ(mini
j ) = ksk2 − kref Wi (L)k2 ,
2
where h·, ·i denotes the standard scalar product. This holds for every ϕ(minij ), so if s−ref Wi (L) 6=
0 this would imply that the vectors ϕ(mj ) lie on a common affine plane. But the mini
ini
j and
therefore also the ϕ(mini
j ) are not coplanar, so s = ref Wi (L). This means that the wall computed
by the algorithm is Wi , a wall that actually exists. Therefore the algorithm detects no ghost
walls.
Claim 2. Let s1 , . . . , s4 ∈ R3 be vectors that do not form an unlucky stack. If

fD ks1 − ϕ(mini 2 ini 2
1 )k , . . . , ks4 − ϕ(m4 )k = 0 for all ϕ ∈ ASO(2), (3.5)
then there exists an i ∈ {1, . . . , 4} such that
ksj − ϕ(mini ini
j )k = ksi − ϕ(mj )k for all ϕ ∈ ASO(2) and all j ∈ {1, . . . , 4}. (3.6)
Before proving the claim, we show that it implies the theorem. For W1 , . . . , W4 ∈ W, consider
the set
 
UW1 ,...,W4 := ϕ ∈ ASO(2) | fD kref W1 (L) − ϕ(mini 2 ini 2
1 )k , . . . , kref W4 (L) − ϕ(m4 )k 6= 0 .

Since fD kref W1 (L) − ϕ(mini 2 ini 2
1 )k , . . . , kref W4 (L) − ϕ(m4 )k depends polynomially on the coef-
ficients of the matrix A defining ϕ, and since ASO(2) is an irreducible variety, it follows that if
UW1 ,...,W4 6= ∅, then its complement in ASO(2) has dimension strictly less than dim(ASO(2)) = 3.
Therefore the intersection \
U := UW1 ,...,W4
W1 ,...,W4 ∈W such that
UW1 ,...,W4 6=∅

has a complement of dimension ≤ 2. To prove the theorem, it suffices to show that all ϕ ∈ U
are good under the hypothesis that there is no unlucky stack of mirror points. So let ϕ ∈
U. By Claim 1 it is enough to show that ϕ is really  good, so let W1 , . . . , W4 ∈ W such that
fD kref W1 (L)−ϕ(mini 1 )k 2
, . . . , kref W4 (L)−ϕ(m ini 2
4 )k = 0. Since ϕ ∈ U, this implies UW1 ,...,W4 =
∅. Now we apply Claim 2 (which we are assuming to be true), to si := ref Wi (L). This tells us
that there is an i such that ksj − ϕ(mini ini
j )k = ksi − ϕ(mj )k holds for all j. (The claim makes
this assertion for all ϕ ∈ ASO(2), but we only need it for ϕ′ = ϕ.) But this is just what it means

for ϕ to be really good.


So we are left with proving Claim 2. Thus we are given four non-coplanar vectors mini i and
four vectors s1 , s2 , s3 , s4 ∈ R3 that do not form an unlucky stack. To make the computations in
the final part of the proof feasible, we “preprocess” the given data. For this, we use the group G
of all maps
ψ = ψσ,α,v0 : R3 → R3 , v 7→ α · σ(v) + v0
where σ is an orthogonal map sending the x-y-plane to itself, 0 6= α ∈ R, and v0 ∈ R3 . It is
crucial but straightforward to check that G normalizes ASO(2), i.e., ψ −1 ϕψ ∈ ASO(2) for ψ ∈ G
and ϕ ∈ ASO(2).
Now assume that we have chosen a suitable ψ = ψσ,α,v0 ∈ G and a ϕ0 ∈ ASO(2) such that we
can prove Claim 2 for e si := ψ(si ) and m e ini ini
i := (ϕ0 ◦ ψ)(mi ). To show that Claim 2 then also
follows for the original si and mini
i , we first verify that the e
s i do not form an unlucky stack. First,
if two of the si have different x- or y-coordinates, then the same is true for the e si . Moreover, the
z-coordinate of e si − e
sj is ±α times the z-coordinate of si − sj ; and, likewise, the z-coordinate
of me ini
i −m e ini ini ini
j is ±α times the z-coordinate of mi − mj . It follows that indeed the e si do not
form an unlucky stack.
CAN A GROUND-BASED VEHICLE HEAR THE SHAPE OF A ROOM? 9


Now assume that fD ks1 − ϕ(mini 2 ini 2
1 )k , . . . , ks4 − ϕ(m4 )k = 0 for all ϕ ∈ ASO(2). We need
to deduce the assertion (3.6) of Claim 2 from this. Let ϕ′ ∈ ASO(2). Then ϕ := ψ −1 ϕ′ ϕ0 ψ ∈
ASO(2) and ϕ′ (me ini ini
i ) = (ψ ◦ ϕ)(mi ). For i, j ∈ {1, . . . , 4} we obtain
 2  2
si − ϕ′ (m
ke e ini 2 ini
j )k = ψ si − ϕ(mj ) = α2 σ si − ϕ(mini
j ) = α2 ksi − ϕ(mini 2
j )k . (3.7)

For the Di,j = kϕ(mini ini 2


i ) − ϕ(mj )k that go into the polynomial fD , the same calculation shows
e i,j := kϕ′ (m
e ini ′ e ini 2 2
D i ) − ϕ (m j )k = α Di,j . With (3.7) we obtain

 
0 de1 ··· de4 1
e e e 1,4 1
d1 D1,1 · · · D 
  . . . . 
ke ′
s1 − ϕ (m ini 2
e 1 )k , . . . , ke ′
s4 − ϕ (m ini 2 
e 4 )k = det  .. .. .. .. 
fDe =
 
e e
d4 D4,1 · · · D4,4 1e
1 1 ··· 1 0
 
0 d1 ··· d4 1
d1 D1,1 · · · D1,4 1
 

α8 det  ... .. .. ..  = α8 f ks − ϕ(mini )k2 , . . . , ks − ϕ(mini )k2  = 0.
 . . .
D 1 1 4 4
d4 D4,1 · · · D4,4 1
1 1 ··· 1 0

Since we are assuming Claim 2 for the e si and me ini


i , there is an i ∈ {1, . . . , 4} such that ke
sj −
′ e ini ′ e ini ′
ϕ (mj )k = kesi − ϕ (mj )k for all j and all ϕ ∈ ASO(2).
Now let ϕ ∈ ASO(2) and set ϕ′ := ψϕψ −1 ϕ−1 0 ∈ ASO(2). Then (3.7) yields

−1
ksj − ϕ(mini
j )k = |α| sj − ϕ′ (m
ke e ini
j )k = |α|
−1
si − ϕ′ (m
ke e ini ini
j )k = ksi − ϕ(mj )k

for all j, so indeed Claim 2 follows for the si and mini i if it is true for the esi = ψ(si ) and
me ini
i = (ϕ0 ◦ ψ)(m ini
i ).  si,1 
We use this to simplify the si and mini
i in five steps. First, writing s i = si,2
si,3
and mini
i =
 mi,1   
−s1,1
mi,2 , we use ψ = ψid,1,v ∈ G with v0 = −s1,2 . Then es1 has x- and y-coordinates 0, and
m 0
i,3 −m1,3
me ini ini
1 has z-coordinate 0. Hence without loss we may assume these things of s1 and m1 . The
3×4
second step comes from a QR-decomposition if the top two rows of S := (si,j ) ∈ R . We have
 
s2,1 s3,1 s4,1
= QR
s2,2 s3,2 s4,2

with Q ∈ O(2) orthognal and R ∈ R2×3 upper triangular. Now forming σ ∈ O(3) with Q−1 as
upper left part, and using ψσ,1,0 , we may assume s2,2 = 0. A bit more can be done: if the upper
2 × 4-part of S is nonzero and k ∈ {2, 3, 4} is the number of the first nonzero column, then we
may assume sk,1 6= 0 and sk,2 = 0. As a third step, we use ψid,α,0 with α = s−1
k,1 . This means we
can additionally assume sk,1 = 1. Summing up the first three steps, we may assume
 
 0 1 b1 b2
S = s1 s2 s3 s4 =  0 0 b3 b4  (3.8)
b5 b6 b7 b8

with bi ∈ R. (This is for the case k = 2; for k = 3 or 4, the upper part has more zeroes, and as
a last case, which we indicate by setting k := 5, it may be all zeroes.) For the mini
i we have only
achieved m1,3 = 0, but so far we have only used transformations where ϕ0 = id. Now, as the
fourth step, we set ϕ0 ∈ ASO(2) to be the translation by the vector −m1 . This yields m e ini
1 = 0,
so we may assume mini 1 = 0. The last step uses a QR-decomposition of the upper 2 × 4-part of
Mini := (mi,j ) ∈ R3×4 as we did before for S. (Notice that Q can be assumed special orthogonal.)
10 MIREILLE BOUTIN AND GREGOR KEMPER

So finally we may assume


 
 0 c1 c2 c3
Mini = mini
1 mini
2 mini
3 m4 ini
= 0 0 c4 c5  (3.9)
0 c6 c7 c8
 c1 c2 c3 
with ci ∈ R. The hypothesis that the mini i are not coplanar translates into det c6 c47 c58
0 c c 6= 0.
a1,1 a1,2 a1,3  2×3 a a  2×3
For A = a2,1 a2,2 a2,3 ∈ R with a2,1 a2,2 ∈ R
1,1 1,2
orthogonal of determinant 1, and for S
and Mini as in (3.8) and (3.9), there are polynomials F (x1,1 , . . . , x2,3 , y1 , . . . , y8 , z1 , . . . , z8 ) and
Fi,j (x1,1 , . . . , x2,3 , y1 , . . . , y8 , z1 , . . . , z8 ) in 22 indeterminates such that
ksi − ϕA (mini 2
j )k = Fi,j (a1,1 , . . . , a2,3 , b1 , . . . , b8 , c1 , . . . , c8 ) (i, j ∈ {1, . . . , 4})

and

fD ks1 − ϕA (mini 2 ini 2
1 )k , . . . , ks4 − ϕA (m4 )k = F (a1,1 , . . . , a2,3 , b1 , . . . , b8 , c1 , . . . , c8 ).
(In the case where k ≥ 3, there are fewer bi and hence fewer yi .) Let I ⊂ R[x1,1 , . . . , x2,3 ] be the
ideal generated by x1,1 − x2,2 , x1,2 + x2,1 and x22,1 + x22,2 . This is the vanishing ideal of ASO(2)
as a subvariety of R6 . Choose a Gröbner basis G of I with respect to an arbitrary monomial
ordering, such as the ideal basis given above, and consider the normal forms Fe := NFG (F ) and
Fei,j := NFG (Fi,j ). Let J ⊆ R[y1 , . . . , y8 , z1 , . . . , z8 ] be the ideal generated by the coefficients of
Fe , viewed as a polynomial in x1,1 , . . . , x2,3 with coefficients in R[y1 , . . . , y8 , z1 , . . . , z8 ]. Moreover,
for i = 1, . . . , 4, let Di ⊆ R[y1 , . . . , y8 , z1 , . . ., z8 ] be the ideals generated by the coefficients of all
z1 z2 z3 
Fei,j − Fej,j (j = 1, . . . , 4). Also set d := det 0 z4 z5 .
z6 z7 z8

Claim 3. If there is an r such that for each k ∈ {2, . . . , 5}


r
(d) · D1 · · · D4 ⊆ J in the case k ≤ 4 (3.10)
or
r
(y6 − y5 − 2z6 , y7 − y5 − 2z7 , y8 − y5 − 2z8 ) · (d) · D1 · · · D4 ⊆J in the case k = 5, (3.11)
then Claim 2 and therefore the theorem follow. Recall that k is the number of the first nonzero
column of the upper 2 × 4-submatrix of S, with k = 5 indicating that this submatrix is zero.
In fact, we have already seen that to prove Claim 2, we may assume the si and mini i to be given
by (3.8) and (3.9). Assume that the assertion (3.6) of Claim 2 is not true, so for every i = 1, . . . , 4
there is a j ∈ {1, . . . , 4} and a ϕ = ϕA ∈ ASO(2) such that ksj − ϕA (mini
j )k =6 ksi − ϕA (minij )k.
Then
0 6= Fi,j (a1,1 , . . . , a2,3 , b1 , . . . , b8 , c1 , . . . , c8 ) − Fj,j (a1,1 , . . . , a2,3 , b1 , . . . , b8 , c1 , . . . , c8 ).
Since Fi,j − Fei,j vanishes at xi,j = ai,j , we may replace Fi,j − Fj,j in the above inequality by
Fei,j − Fej,j . So there is a j such that some coefficient of Fei,j − Fej,j , viewed as a polynomial in
x1,1 , . . . , x2,3 , does not vanish when evaluated at yi = bi and zi = ci . This coefficient is one of
the generators of Di , so each Di contains an element that does not vanish at yi = bi and zi = ci .
Moreover, d does not vanish when evaluated at zi = ci , and we might as well also evaluate it
at yi = bi since the yi do not occur in d. So in the case k ≤ 4, (3.10) implies that there exists an
element of J that does not vanish when evaluated at yi = bi and zi = ci .
On the other hand, if k = 5, then all si share the same x- and y-coordinates, so the hypothesis
that they do not form an unlucky stack translates into bi − b5 6= 2ci for some i ∈ {6, 7, 8}. So
one of the generators of the first ideal in the product in (3.11) does not vanish when evaluated
at yi = bi . In this case (3.11) yields the same conclusion: that J contains an element that does
not vanish when evaluated at yi = bi and zi = ci .
Because of the way J was constructed, this means that at least one coefficient of Fe does not
vanish at yi = bi and zi = ci , so
Fe(x1,1 , . . . , x2,3 , b1 , . . . , b8 , c1 , . . . , c8 ) 6= 0.
CAN A GROUND-BASED VEHICLE HEAR THE SHAPE OF A ROOM? 11


The linearity of the normal form map implies NFG F (x1,1 , . . . , x2,3 , b1 , . . . , b8 , c1 , . . . , c8 ) =
Fe (x1,1 , . . . , x2,3 , b1 , . . . , b8 , c1 , . . . , c8 ), so we conclude that F (x1,1 , . . . , x2,3 , b1 , . . . , b8 , c1 , . . . , c8 )
has nonzero normal form and therefore does not lie in I. Since I is the vanishing ideal of ASO(2),
this means that there exists a ϕ = ϕA ∈ ASO(2) such that

0 6= F (a1,1 , . . . , a2,3 , b1 , . . . , b8 , c1 , . . . , c8 ) = fD ks1 − ϕ(mini 2
1 )k , . . . , ks4 − ϕ(m4 )k .
ini 2

So the hypothesis (3.5) of Claim 2 is not true under the assumption that the assertion is not
true. So indeed Claim 2 follows from (3.10) and (3.11).
It remains to verify (3.10) and (3.11), and this can be checked with the help of a computer.
For the computation, we used MAGMA [4] and proceeded as follows:
• We ran the following steps for each k ∈ {2, . . . , 5}. We will not introduce any notation
to indicate the cases for different k below.
• It is straightforward to compute the polynomials F , Fi,j , Fe , and Fei,j according to their
definitions, and to pick out the sets of coefficients C, Ci,j ⊂ R[y1 , . . . , y8 , z1 , . . . , z8 ] of Fe
and the Fei,j .
• Using an additional indeterminate t, we computed the set C hom of homogenizations of
the polynomials in C with respect to t.
• We computed truncated Gröbner bases Ghom of the ideal generated by C hom of rising
degree. 
• For each degree we computed the normal form NFGhom ti dj for all i and j such that ti dj
has the degree up to which the Gröbner basis was computed. If at least one of the normal
forms is zero, this shows that ti dj is an R[y1 , . . . , y8 , z1 , . . . , z8 , t]-linear combination of
the polynomials in C hom . Setting t = 1 shows that dj ∈ J, so (3.10) holds with r = j.
As it turned out, this was successful for all k ≤ 4, and we never needed to go beyond
degree 11. This means that in these cases the ideals Di are not actually needed in (3.10).
• Finally we dealt with the case k = 5. Here, thanks to the reduced number of veriables,
we were able to directly compute the ideal J0 := J : (d)∞ , and then Ji := Ji−1 : Di∞
for i = 1, . . . , 4. The result is J4 = (y6 − y5 − 2z6 , y7 − y5 − 2z7 , y8 − y5 − 2z8 ), which
shows (3.11).
The total computation time less than two minutes. 
Theorem 3.2 (A hovering drone and a fixed loudspeaker). Assume the same situation as in
Theorem 3.1, except that the vehicle can take positions in the same way as a hovering drone.
Then the assertions from Theorem 3.1 hold.
Proof. The proof is exactly like the one of Theorem 3.1. The only difference is that instead of
ASO(2) we need to consider the group of all ϕ as in (3.3), but with the last vector having a third
component a3,3 instead of zero. The computer computations have to be run with modified input,
and take about four minutes to finish. Optimizing the “preprocessing” of the data for the group
used in this proof would probably shorten the computation time. 

4. The two-dimensional case


In this section we briefly consider the purely two-dimensional case: a vehicle moves in the
plane, and all reflecting walls are also in this plane. A physical example of such a scene is a robot
navigating in a warehouse, with no echoes coming back from either the ceiling or the floor, nor
from any inclined walls. In this case only three microphones on the vehicle are required. The
loudspeaker can be at a fixed position, but will more typically be mounted on the vehicle. In
contrast to the case of a ground-based vehicle moving in 3D, we can also handle the case of a
mounted loudspeaker here. In fact, this is the two-dimensional variant of the situation considered
in [5], and everything in that paper carries over directly to other dimensions. This includes the
relation, given as (3.1) in this paper, and the wall detection algorithm. Moreover, the MAGMA-
programs used for the computational verifications in [5] were written for general dimension. As
it turn out, if the dimension is set to 2, the entire computations require only about one second.
(They take about five minutes for the three-dimensional case considered in [5].) Notice that for
12 MIREILLE BOUTIN AND GREGOR KEMPER

a two-dimensional modelling to be admissible, the microphones and the loudspeaker need to be


contained in a common plane, which is parallel to the plane of motion. The following result
emerges.
Theorem 4.1 (A vehicle in a two-dimensional scene). Assume a vehicle can move in a two-
dimensional scene, which contains a finite number of walls. Assume a loudspeaker is either
placed at a fixed position in the scene or mounted on the vehicle, and that three microphones are
mounted on the vehicle, but do not lie on a common line. In any case, the microphones and the
loudspeaker need to be in a common plane, which is parallel to the plane of motion. Then almost
all vehicle positions are good, again in the sense that all walls whose echoes are heard by every
microphone are detected, but no ghost walls are detected.
In the two-dimensional situation the issue of unlucky stacks does not occur.

References
[1] JB Alien, DA Berkley, Image method for efficiently simulating small-room acoustics, The Journal of the
Acoustical Society of America 60(S1) (1976), S9–S9.
[2] Fabio Antonacci, Jason Filos, Mark RP Thomas, Emanuël AP Habets, Augusto Sarti, Patrick A Naylor,
Stefano Tubaro, Inference of room geometry from acoustic impulse responses, IEEE Transactions on Audio,
Speech, and Language Processing 20(10) (2012), 2683–2695.
[3] Jeffrey Borish, Extension of the image model to arbitrary polyhedra, The Journal of the Acoustical Society
of America 75(6) (1984), 1827–1836.
[4] Wieb Bosma, John J. Cannon, Catherine Playoust, The Magma Algebra System I: The User Language,
J. Symb. Comput. 24 (1997), 235–265.
[5] Mireille Boutin, Gregor Kemper, A Drone Can Hear the Shape of a Room, SIAM J. Appl. Algebra Geometry
4 (2020), 123–140.
[6] Antonio Canclini, Fabio Antonacci, Mark RP Thomas, Jason Filos, Augusto Sarti, Patrick A Naylor, Stefano
Tubaro, Exact localization of acoustic reflectors from quadratic constraints, in: 2011 IEEE Workshop on
Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 17–20, IEEE, 2011.
[7] Ivan Dokmanić, Yue M Lu, Martin Vetterli, Can one hear the shape of a room: The 2-D polygonal case, in:
2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 321–324,
IEEE, 2011.
[8] Ivan Dokmanić, Reza Parhizkar, Andreas Walther, Yue M. Lu, Martin Vetterli, Acoustic echoes reveal room
shape, Proceedings of the National Academy of Sciences 110 (2013).
[9] Youssef El Baba, Andreas Walther, Emanuël AP Habets, Reflector localization based on multiple reflection
points, in: 2016 24th European Signal Processing Conference (EUSIPCO), pp. 1458–1462, IEEE, 2016.
[10] Youssef El Baba, Andreas Walther, Emanuël AP Habets, 3D room geometry inference based on room impulse
response stacks, IEEE/ACM Transactions on Audio, Speech, and Language Processing 26(5) (2017), 857–
872.
[11] Jason Filos, Antonio Canclini, Mark RP Thomas, Fabio Antonacci, Augusto Sarti, Patrick A Naylor, Robust
inference of room geometry from acoustic measurements using the Hough transform, in: 2011 19th European
Signal Processing Conference, pp. 161–165, IEEE, 2011.
[12] Ingmar Jager, Richard Heusdens, Nikolay D Gaubitch, Room geometry estimation from acoustic echoes
using graph-based echo labeling, in: 2016 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp. 1–5, IEEE, 2016.
[13] Edwin Mabande, Konrad Kowalczyk, Haohai Sun, Walter Kellermann, Room geometry inference based on
spherical microphone array eigenbeam processing, The Journal of the Acoustical Society of America 134(4)
(2013), 2773–2789.
[14] Marco Moebus, Abdelhak M Zoubir, Three-dimensional ultrasound imaging in air using a 2D array on a fixed
platform, in: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07,
vol. 2, pp. II–961, IEEE, 2007.
[15] Sooyeon Park, Jung-Woo Choi, Iterative Echo Labeling Algorithm With Convex Hull Expansion for Room
Geometry Estimation, IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021),
1463–1478.
[16] Fangrong Peng, Tiexing Wang, Biao Chen, Room shape reconstruction with a single mobile acoustic sensor,
in: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1116–1120, IEEE,
2015.
[17] Marc Pollefeys, David Nister, Direct computation of sound and microphone locations from time-difference-
of-arrival data, in: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.
2445–2448, IEEE, 2008.
[18] Tilak Rajapaksha, Xiaojun Qiu, Eva Cheng, Ian Burnett, Geometrical room geometry estimation from room
impulse responses, in: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), pp. 331–335, IEEE, 2016.
CAN A GROUND-BASED VEHICLE HEAR THE SHAPE OF A ROOM? 13

[19] Luca Remaggi, Philip JB Jackson, Wenwu Wang, Jonathon A Chambers, A 3D model for room boundary
estimation, in: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
pp. 514–518, IEEE, 2015.
[20] Luca Remaggi, Philip JB Jackson, Philip Coleman, Wenwu Wang, Acoustic reflector localization: Novel
image source reversion and direct localization methods, IEEE/ACM Transactions on Audio, Speech, and
Language Processing 25(2) (2016), 296–309.
[21] Jan Scheuing, Bin Yang, Disambiguation of TDOA Estimation for Multiple Sources in Reverberant Envi-
ronments, IEEE Transactions on Audio, Speech, and Language Processing 16(8) (2008), 1479–1489.
[22] Oliver Shih, Anthony Rowe, Can a phone hear the shape of a room?, in: 2019 18th ACM/IEEE International
Conference on Information Processing in Sensor Networks (IPSN), pp. 277–288, IEEE, 2019.
[23] Sakari Tervo, Timo Tossavainen, 3D room geometry estimation from measured impulse responses, in: 2012
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 513–516, IEEE,
2012.
[24] Bing Zhou, Mohammed Elbadry, Ruipeng Gao, Fan Ye, BatMapper: Acoustic sensing based indoor floor plan
construction using smartphones, in: Proceedings of the 15th Annual International Conference on Mobile
Systems, Applications, and Services, pp. 42–55, 2017.

Department of Mathematics, Purdue University, 150 N. University St., West Lafayette, IN, 47906,
USA
Email address: mboutin@purdue.edu

Technische Universität München, Zentrum Mathematik - M11, Boltzmannstr. 3, 85748 Garching,


Germany
Email address: kemper@ma.tum.de

You might also like