You are on page 1of 15

1

Log-GPIS-MOP: A Unified Representation for


Mapping, Odometry and Planning
Lan Wu1 , Ki Myung Brian Lee1 , and Teresa Vidal-Calleja1

Abstract—Whereas dedicated scene representations are re-


quired for each different tasks in conventional robotic sys-
tems, this paper demonstrates that a unified representation
can be used directly for multiple key tasks. We propose the
arXiv:2206.09506v1 [cs.RO] 19 Jun 2022

Log-Gaussian Process Implicit Surface for Mapping, Odometry


and Planning (Log-GPIS-MOP): a probabilistic framework for
surface reconstruction, localisation and navigation based on a
unified representation. Our framework applies a logarithmic
transformation to a Gaussian Process Implicit Surface (GPIS)
formulation to recover a global representation that accurately
captures the Euclidean distance field with gradients and, at
the same time, the implicit surface. By directly estimate the
distance field and its gradient through Log-GPIS inference,
the proposed incremental odometry technique computes the
optimal alignment of an incoming frame, and fuses it globally
to produce a map. Concurrently, an optimisation-based planner
computes a safe collision-free path using the same Log-GPIS
surface representation. We validate the proposed framework on
simulated and real datasets in 2D and 3D and benchmark against
the state-of-the-art approaches. Our experiments show that Log-
GPIS-MOP produces competitive results in sequential odometry,
surface mapping and obstacle avoidance.
Index Terms—Gaussian Process Implicit Surfaces, Euclidean
Distance Field, Mapping, Localisation, Planning, SLAM.

Fig. 1. The proposed framework Log-GPIS-MOP incrementally builds the


I. I NTRODUCTION distance field (yellow to blue colour map) and estimates the implicit surface
Simultaneous localisation and mapping (SLAM) is a well- (red) of the first loop around the Intel Lab [7]. Our incremental odometry and
planning use the distance and gradients of the Log-GPIS as input to compute
known problem in robotics [1], which concerns reconstructing the current alignment and the white optimal trajectory for the robot to avoid
or building a map of an unknown environment while simulta- colliding with the surfaces respectively.
neously tracking the location of a robot-sensor system within
the map. Using the built map and the estimated location, a
ability to reconstruct a sparse map while self localising and
path planner may produce collision-free paths for the robot to
path planning in the unknown environment. Although sparse
navigate safely. Each different task raises varying requirements
feature-based representations have been widely used for rep-
leading to dedicated environment representations. The choice
resenting geometric structure and proven advantageous for
of scene representation strongly affects the performance of the
localisation [3], they are not suitable for accurate depiction
localisation, mapping, and path planning tasks.
of the environment or path planning, because dense maps are
For instance, motion estimation favours sparse representa-
necessary, and it is difficult to reconstruct dense maps from
tions such as the locations of features in the environment,
sparse features. As an alternative, several approaches [4], [5]
which allows consistent estimation of the robot’s location. On
employ signed distance function (SDF) as a representation for
the other hand, a key objective of the mapping task is to
path planning and dense mapping. The distance function can
reconstruct an accurate, dense and high resolution map of the
be used directly by a planner and implicit surfaces can be com-
scene for, e.g., inspection purposes. Path planning similarly
puted to recover the dense map. However, without matching
requires dense information such as obstacle occupancy or
sparse features to identify correspondences, it is difficult to
closest distance to collision in order to avoid obstacles.
preform a globally consistent alignment for motion estimation
As a unified representation for multi-purposes [2], a feature-
for localisation purpose using the SDF representation [6].
based representation such as a set of landmarks has the
To this end, we propose a unified representation called
This work was supported by the Australian Government Research Training Log-GPIS suitable for all of localisation, mapping, and path
Program. planning (see Fig. 1), inspired by recent work on SDF rep-
1 All authors are with the UTS Robotics Institute, Faculty of Engineering
and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia resentation with Gaussian process implicit surfaces (GPISs).
Lan.Wu-1@student.uts.edu.au GPIS [8], [9], [10] is a continuous and probabilistic represen-
2

tation that exploits the implicit surfaces. Whereas conventional SLAM systems [3], [15], [16] have demonstrated accurate
GPIS representations [11], [12] can only infer the Euclidean pose estimation, global consistency for long term trajectory,
distance field (EDF) accurately close to the measurements and real-time performance. Due to the difficulties of extracting
(noisy points on the surface), Log-GPIS proposed in our continuous surfaces from a sparse set of points, most of
previous work [13] faithfully models the EDF and gradients by them are not suitable for applications other than localisation.
enforcing Eikonal equation [14] through a simple log trans- In addition, dedicated challenging algorithms are required
formation on the standard GPIS. This representation allows to recover a dense and accurate reconstruction from sparse
us to use Log-GPIS as a single unified representation for points [17].
localisation, mapping and planning multiple robotics tasks, As an alternative representation to achieve sensor tracking
which is the focus of this paper. The smooth and probabilistic and mapping, the seminal work of KinectFusion [4] demon-
nature of Log-GPIS provides an accurate dense surface for strates the performance of camera motion estimation and
mapping given noisy measurements. The extrapolative power dense reconstruction in real-time. To represent the geome-
of Log-GPIS provides information not only near but also far try, KinectFusion uses the truncated signed distance function
away from the measurements, and hence enables estimation (TSDF) [18] and a coarse-to-fine ICP algorithm [19] to
of the full EDF with gradients. This allows motion estimation perform alignment. TSDF, originally proposed in computer
via direct optimisation of the alignment between the global graphics literature, has become a widely used representation
map representation and a new set of measurements. Further- to exploit in odometry and mapping approaches. However,
more, the EDF and its gradients are suitable for trajectory KinectFusion has some drawbacks. First, the algorithm im-
optimisation-based planners. plementation relies on GPU. Second, the distance of the
The contributions of this paper are as follow: TSDF is truncated and generally overestimates the distance
• The Log-GPIS-MOP framework, which achieves incre- within limited viewpoints. Third, the representation can not
mental localisation, mapping and planning based on a be directly used for planning applications.
single continuous and probabilistic representation through Another relevant representation for mapping and odometry
accurate prediction of the EDF and the implicit surface tracking is from RGB-D SLAM [20]. Such as the framework
with gradients. recently proposed in [21], the ElasticFusion represents the map
• A sound formulation for 2D/3D point cloud alignment as a collection of surfels, which are small units with normals
for odometry estimation, where the Log-GPIS is used to and colour. Given a surfel-based representation, ElasticFusion
minimise the point cloud distance to the implicit surface incrementally estimates the sensor pose through the dense
from a set of newly arrived observations following the frame-to-map tracking. No pose-graph optimisation or post-
gradients. processing is required. Such representation generally assists
• A generic mapping approach for organised and unorgan- to compute accurate pose estimation, along with high-quality
ised point clouds that consists in probabilistically fusing reconstruction and a rich understanding of the occupied space.
new measurements into a global Log-GPIS representation However, the representation does not differentiate free space
and is ready for a modified version of marching cubes to from unknown space explicitly, which is of limited applicabil-
recover dense surface reconstruction. ity for autonomous navigation tasks.
• A path planning approach that uses Log-GPIS to avoid In contrast, occupancy-based representations [22] efficiently
collisions with the mapped surfaces following the EDF compute occupancy information, such as free, occupied, and
and gradient information. unknown. It is commonly used for mapping and path planning
Moreover, we present a thorough evaluation of the Log- applications. Many options for occupancy maps have been
GPIS-MOP incremental tasks and overall framework on public presented in the literature varying from discrete [22] [23] [24]
benchmarks, and qualitative and quantitative benchmarking to probabilistic and continuous [25], [26]. For the discrete
with the state-of-the-art. Our work enables sequential and occupancy map, the constructed maps contain a set of grid
concurrent localisation, mapping and planning, focusing in cells with occupancy information of the environment. The
accurate performance as opposed to real-time operations. planning approaches take the occupancy grids as input [27].
The paper structure is organised as follows. The related As for the probabilistic method in [28], the authors propose
work is presented in Section II. Section III explains the the continuous occupancy mapping through viewing Gaussian
background knowledge. Our Log-GPIS representation is dis- processes as implicit functions of occupancy. By doing that, it
cussed in Section IV. The methodology of Log-GPIS-MOP is capable of building the representation with both occupancy
framework is presented in Section V , with odometry in VI, information and implicit surfaces. If naively extended from 2D
mapping VII, and planning VIII. In Section IX, we present a occupancy grids to 3D, however, that requires a huge memory
thorough evaluation of the framework on public benchmarks, consumption. It is difficult to perform fast ray-casting and
and compare the performance to existing solutions. Finally, look-up tabling for any space larger than a room. In particular,
conclusion and future works is given the Section X. the online solution commonly used in 3D is Octomap [23].
This approach uses an octree-based structure to represent
II. R ELATED W ORK occupancy probabilities of cells. The octree structure allows
For visual SLAM, several successful frameworks process the map to deal with large space and immensely decreases the
the input data into a set of sparse features as the repre- amount of memory to represent unknown or free space.
sentation for odometry and mapping. Recent feature-based However, the representation with occupancy information
3

only is not enough for some planning approaches. For instance, discontinuity. In contrast, our paper uses a single and unified
trajectory optimisation-based planners including CHOMP [5] representation generated from depth data only for odometry
and TrajOpt [29], require the robot to build an efficient estimation, continuous dense mapping and planning with no
representation with distance to obstacles and direction infor- aid from any other positioning system.
mation. In addition, having a distance map speeds up collision Gaussian Process implicit surfaces (GPIS) is a continuous
checking of complex circumstances. For example, robot arms and probabilistic representation given noisy sensor obser-
are commonly represented as a set of overlapping spheres. It vations initially proposed to reconstruct surfaces [8]. Later
is convenient to check the distance field in the center of each in [11], the authors used the GPIS as an approximate signed
sphere [5]. distance function (GPIS-SDF) that infers the distance to the
These planners require a distance field, which is not trun- nearest surfaces. However, it only keeps tracking the distance
cated further from surface. ESDF containing distance values information close to the surface as there is a lack of observa-
over the entire exploring space is also necessary. The ESDF tions further away. Based on the Varadhan’s approach [34], our
information can be naturally passed to the collision cost func- previous work in [13] applies the logarithmic transformation
tion to produce collision-free trajectories [5] [29]. Checking to a GPIS with a specific covariance function to recover
the distance values of the neighbors gives the gradient at a the distance field in the space. This method provides the
given point. This allows CHOMP and other planners to follow extra potential of GPIS to accomplish accurate distance field
the gradient direction of the distance to push the robot on the mapping even further away from the surface. However, our
trajectory without collision [30]. previous work does not consider to solve the localisation
Following above requirements, Oleynikova et al [31] pro- problem as assumes the sensor poses are known and do not
posed the Voxblox framework to incrementally generate a actually propose a planning approach that leverages such a
representation for mapping and optimised planning. It adopts representation.
the advantages of TSDF to generate implicit surface as the Catering for the various requirements to have a represen-
dense reconstruction ideal for human visualisation. Based on tation suitable for odometry, mapping and navigation is still
the TSDF, the distance within truncated threshold is relatively an open research problem. Thus, in the paper we present the
accurate. Then it uses the ESDF representation that is nu- unified environment representation that can be used for dense
merically approximated from TSDF to the space within the reconstruction, collision avoidance, while providing additional
observations field of view. The basic unit for ESDF is voxel information needed for sensor’s odometry estimation.
grid, which contains the distance to the nearest obstacles
III. BACKGROUND
and gradient. Given the exploring ESDF, it has the ability
to work with trajectory optimisation methods, such as the In this section we introduce the Eikonal equation, the
trajectory optimization-based local planner [32] proposed by derivation of the distance approximation used in this work and
the same authors to perform online navigation in unknown a brief description of the Gaussian Process Implicit Surfaces.
environments.
Compared to KinectFusion, Voxblox does not use the dis- A. Euclidean Distance Field
tance information of the TSDF or ESDF to perform motion Assume a bounded domain denoted as S ⊂ RD with suit-
estimation. The system still requires ground truth poses as ably smooth boundary ∂S. The ∂S has a constant orientation,
input from a motion capturing system or estimated poses which is given by the the normal n. The nearest distance from
from a visual-inertial odometry framework [33]. To make all x ∈ RD to w ∈ ∂S is expressed as EDF d(x):
the distance play a role in the trajectory estimation, same d(x) = min |x − w| . (1)
authors extended their work to tackle the long term consistent w∈∂S
mapping problem in Voxgraph [6]. It introduces submap-based Suppose the boundary travels continuously along its normal
ESDF and TSDF representations for mapping and navigation. direction to reach at x. Let f (x) be the speed of travel and
Voxgraph adopts the distance information to generate corre- b(x) the arrival time. The minimal amount of time required is
spondence free constraints between submaps. Then through given by the solution of
optimising a pose graph, the relative pose given submaps 1
in a closed loop is computed to assist with the long term |∇b(x)| = (2)
f (x)
drifts. However, Voxgraph system needs an external visual
odometry to convert incoming sensor data into submaps for where | · | represents the Euclidean norm. In the special case
further optimisation and localisation. that travel speed equals to 1, the arrival time b(x) is exactly
The authors proposed a landmark-based representation the distance from x to ∂S, equivalent to d(x). This basic
in [2] for sparse map construction, self localisation and path characteristic of distance function satisfies the solution of the
planning. The representation is a set of image landmarks well-known Eikonal equation [35]:
selecting from a sequence of images. The localisation is |∇d(x)| = 1 , (3)
implemented by computing a flow between the landmarks,
where the Euclidean distance between two points is given by
while at the same time the shortest path is computed along
a straight line. ∇ represents the gradient. Note that (3) is
the selected landmarks. However, sparse landmarks represent
constrained with boundary conditions,
geometric structure properly, but they are not suitable for dense
reconstruction or path planning due to the sparseness and d(x) = 0 and ∂d(x)/∂n = 1 on x ∈∂S. (4)
4

B. Varadhan’s Formula Thus the expression of v(x) in terms of u(x) from (6) is
given by, √
In the robotics literature, recently proposed methods such
v(x) = exp[−u(x)/ t]. (8)
as [31] [36] adopted the concept of wave front propagation
to estimate the distance field efficiently. This approach incre- Substituting v(x) from (8) in (7) leads to
mentally propagates the distance field to the neighbors directly h  √ i
from TSDF or from the occupancy map through a voxel grid. v − t∆v = v 1 − |∇u|2 + t∆u = 0, (9)
In this way, compromise has to be made to not obtain the
and this results in the regularised Eikonal equation given by,
EDF directly as further post processing algorithm is required.
 √
To recover the distance field exactly, one option is to solve 1 − |∇u|2 + t∆u = 0. (10)
the Eikonal equation. Various methods have been introduced
in the past few decades. Some of these methods take advantage Essentially, (10) demonstrates that the regularised form of the
of the relationships provided by physical constraints, generally Eikonal equation can be effectively solved in a linear way.
discretising the domain through a grid, thereby computing a
solution for each discrete point. However, the main restriction C. Gaussian Process Implicit Surfaces
in exploiting the Eikonal equation (3) is that the equation is Gaussian Process (GP) [38] is a non-parametric stochastic
hyperbolic and non-linear, which makes it difficult to directly method ideal for solving non-linear regression problems. GPIS
solve and recover the accurate distance values. techniques [8], [9], [10] use GP approach to estimate a prob-
Inspired by the idea proposed in the computer graphics abilistic and continuous representation of the implicit surface
literature [14], we adopt a smooth alternative based on Varad- given noisy measurements. Recently as proposed in [11], [12],
han’s distance formula [34] [37]. The intuitive explanation GPIS representation can be used for distance field estimation,
of Varadhan formula in physical perspective is as follows. which provides approximate distance values only near the
Assume a hot needle touching a surface, so that the heat surface.
radiates on the manifold. The heat spreading is treated as Following the formulation in [11], for an unknown function
taking a random walk from the boundary. Assume the time of d that represents the distance to a surface, the implicit surface
random walks to be minimum, the nearest possible trajectory is modelled with zero mean GP and gradient ∇d:
will be taken. This relationship between heat and distance on  
d
the manifold S is captured by the Varadhan formula through ∼ GP(0, k̃(x, x0 )). (11)
the heat kernel. The heat kernel denoted as v(x) represents ∇d
the conduction of heat from a source to a target over time. Then, the combined covariance matrix k̃(x, x0 ) with partial
According to the celebrated result of Theorem 2.3 of derivatives can be written as,
Varadhan in [34], the heat kernel v(x) is closely related to
k (x, x0 ) k (x, x0 ) ∇>
 
the EDF d(x) as follows: 0 x0
k̃ (x, x ) = , (12)
√ ∇x k (x, x0 ) ∇x k (x, x0 ) ∇>x0
d(x) = lim {− t ln[v(x)]} , (5) where k(x, x0 ) is the covariance function, which characterises
t→0
the properties of function d through multiple hyperparameters.
where t is the time and the condition for the equation to hold ∇x and ∇x0 are the partial derivatives of k(x, x0 ) at x and
is that t approaches zero. Consequently, given a relative small x0 [10].
value of t, it is possible to define the approximation of d(x) Consider a set of points corrupted by noise. The distance
as u(x): J
measurements y = {yj = v(xj ) + j }j=1 ⊂ R and the corre-

u(x) = − t ln v(x). (6) sponding gradients ∇y ⊂ R are taken at locations xj ∈ RD
and affected by additive Gaussian noise j ∼ N (0, σv2 ). The
In this case, t controls the smoothing property of u(x).
posterior distribution of d at an arbitrary testing point x∗ is
Instead of solving the non-linear solution of the Eikonal
given by d(x∗ ) ∼ N (d¯∗ , V [d∗ ]) with the predictive mean and
equation in terms of d(x), we can solve the homogeneous
variance with derivatives are expressed as follows
screened Poisson partial differential equation (PDE) [37] in
d¯∗
   
terms of v(x). Screened Poisson exhibits similar properties to > 2 −1 y
= k̃∗ (K̃ + σd I) (13)
the conventional heat equation: ∇d¯∗ ∇y
 
(1 − t∆)v = (1/t − ∆)v = 0 in S, V [d∗ ]
(7) = k̃(x∗ , x∗ ) − k̃> 2 −1
∗ (K̃ + σd I) k̃∗ . (14)
V [∇d∗ ]
v = 1 on ∂S,
P ∂2 Note that I represents the identity matrix of size J(D + 1) ×
v is to be determined. ∆ = i ∂x 2 is the Laplace operator J(D + 1). k̃∗ represents the covariance vector between the
i
and 1/t is a positive parameter that controls the screening. training points and the testing point x∗ . K̃ is the covariance
Unlike the non-linear Eikonal equation (3), it is important to matrix of the training points J with gradients. Furthermore,
highlight that the screened Poisson equation (7) is linear and k̃(x∗ , x∗ ) is the covariance function between the testing point.
easy to solve. That is to say, the proposed method solves the Most GPIS techniques ensure the accuracy of the recon-
heat equation in terms of v(x) instead of Eikonal equation structed surfaces, but only few of them directly model the
related to d(x). distance field [11], [12]. However, one of the disadvantages of
5

standard GPIS applying inference of the distance field is that We also note that the log-transformation affects the gradient.
only accurately infer the distance near the surface. Given the Taking the partial derivative of (16) respect to the distance
limited training data, it is hard to keep the high precision and leads to: √
continuity of the distance field further from the measurements. − t
∇d¯∗ = ∇v̄∗ , (17)
Our representation, Log-GPIS method [13], aims to overcome v̄∗
this shortcoming. It uses the log transformation of the well- where we can find that the main difference between
√ ∇d¯∗ and
known GPIS model after applying the Varadhan’s equation to ∇v̄∗ is the direction and the scaling factor t/v̄∗ . Typically,
predict the exact Euclidean distance field given minimal and for a standard GPIS, the magnitude of the gradients at the
sparse observations close to surface. With accurate distance surface does not have to be fixed. However, the Eikonal
predicted, we unlock the potential of Log-GPIS to accomplish equation (3) requires that the magnitude of the gradient is
robotic tasks other than mapping, such as Log-GPIS based normalised to 1. Since the gradient of d¯∗ is in the opposite
odometry discussed in section VI and path planning in sec- direction as the gradient of v̄∗ , we simply normalise the
tion VIII. gradient of v̄∗ , and invert the sign to have the d¯∗ .
Based on (17), the predictive variance V [d∗ ] can be directly
IV. L OG -G AUSSIAN P ROCESS I MPLICIT S URFACES recovered in terms of V [v∗ ] using a first-order approximation,
A. Derivation √ √ >
− t − t
As the name suggests, our Log-GPIS [13] representation is V [d∗ ] = V [v∗ ] . (18)
v̄∗ v̄∗
based on applying the logarithmic transformation of Varad-
han’s formula on the GPIS. As discussed above, the main So far the Varadhan’s equation only applies to x ∈ S as
difference between the Eikonal equation with solution d(x) shown in (7). In [13], we showed that this domain can be
and the equation with the heat kernel v(x) as solution, is that extended to x ∈ RD . For completeness we detail here.
heat equation (also known as screened Poisson PDE) is linear Let us consider the complement of the EDF as S c , and
to solve. It is advantageous to model the heat kernel v(x) (i.e. the corresponding heat equation in S c . Given ∂S c = ∂S, the
the exponential of the d(x)) as a GP, instead of the d(x), boundary condition remains the same for both heat equations
  in S and S c . Combining two heat equations for x ∈ S and
v ∈ S c , we arrive at:
∼ GP(0, k̃(x, x0 )) . (15)
∇v
v − t∆v = 0 in RD ,
We then use (6) to recover the distance and thus its (19)
v = 1 on ∂S.
name. Essentially, Log-GPIS representation simply takes the
logarithm of a standard GPIS to model v(x). Thus, the linear Note that, this domain change has essentially ‘stitched’ the
heat equation or the screened Poisson PDE constraints can EDFs of S and S c . Thus the information on whether x is inside
be naturally incorporated into GP regression by ensuring that or outside S is lost (i.e. the sign is lost). In our perspective,
the covariance function follows the linear PDE [39], which the advantages of using (19) to approximate the distance field
we will describe how to choose the covariance function for everywhere on RD , losing the sign is a small price to pay for
Log-GPIS shortly in section IV-B. the convenience of the simplicity and accuracy of the distance
Moreover, assuming that the choice of covariance function estimation.
respects (19) already, we estimate v̄(x) and ∇v̄ and its
variance by simply using the GPIS regression equation (13) B. Choice of Covariance Function
and (14) with target measurements set to yi = 1 at locations xi In our previous work [13], we derived a covariance function
on the surface (instead of d¯ and ∇d).
¯ Intuitively, this implies
that exactly satisfies the screened Poisson PDE (19). Inspired
the target values of measurements in terms of the EDF is by Sarkka’s work [39], we noted that the so-called Whittle
log(1) = 0 on the surface boundary. Similarly, this enforces kernel [40], solves exactly the screened Poisson equation (19),
the boundary condition in (19).
|x − x0 |
Subsequently, after the GP inference, we need to use Varad- k (x, x0 ) = Kν (λ |x − x0 |) , (20)
han’s formula (6) to recover the EDF from the predictive mean 2λ

and covariance function: where λ = 1/ t and Kν is the modified Bessel function
√ of the second kind of order ν = 1. λ is the characteristic
d¯∗ = − t ln v̄∗ . (16)
length scale hyperparameter. Note that in this case, (5) shows
Remarkably, (16) shows that we can solve the screened that when t gets closer zero it will produce a better distance
Poisson equation by a careful choice of covariance and a approximation, so that relative larger λ will produce a more
simple log-transformation, which justifies the name of Log- accurate EDF. In this paper, hyperparameter optimisation is not
GPIS. Despite its simplicity, the log-transformation has a non- considered. Nevertheless, λ is exactly the hyperparameter of
trivial benefit as the predicted distance will approach infinity the Whittle covariance, and (5) provides the necessary intuition
when querying points further away from the surface unlike to choose a good λ value for the EDF inference. Empirically
the standard GPIS, which will predict distance value of zero we found that λ can be chosen from 5 up to 40. Too large
(falling back to the mean far away from the measurements). In λ will cause numerical issues when querying distance away
other words, Log-GPIS solves the issue of undesirable artifacts from input observations and too small λ value will introduce
in the predicted surface geometry. inaccuracy to distance field around the surface.
6

Furthermore, as the Whittle covariance is |ν − 1| = 0 times that probabilistically represents the EDF and its gradient.
differentiable, it cannot be used for GPIS gradient inference. Finally, a path planning approach uses the Log-GPIS to avoid
Given that the standard Whittle covariance is a special case of the surfaces of the mapped environment using the EDF and
the Matérn family of covariance functions, and Matérn ν = gradient information. Note that Log-GPIS-MOP is based on
3/2 is once differentiable, we proposed in [13] to use this the assumption that the environment has sufficient curvature
latter instead. The Matérn family of covariance functions are to unequivocally recover the pose of the sensor at any given
given by: sensor frame.
1−ν
√ !ν √ !
0 22 2ν 0 2ν 0 VI. O DOMETRY
k(x, x ) = σ |x − x | Kν |x − x |
Γ(ν) l l
Let us consider a depth sensor moving in a static envi-
(21) ronment. The sensor captures a sequence of organised or
where l is the characteristic length scale and σ 2 (the signal unorganised point clouds as measurements of the environment.
variance) are the hyperparameters controlling the generalisa- Let us assume that the measurement noise is independent and
tion information of GP models. Sample functions from Matérn Gaussian distributed. The sensor frame at time ti is denoted as
family are |ν − 1| times differentiable. Thus, ν manages the FCi , where i = (0, ..., L). The world reference FW is given
degree of smoothness. by the sensor frame FC0 at initial the timestamp. The pose
We empirically observed
√ in [13] that a modified version of of FCi in world reference FW is given by the rotation matrix
Matern 3/2 with l = 2νλ is a good approximation of Whittle W
RCi ∈ SO(3) and the translation vector W tCi , combined
covariance with the advantage that it is once differentiable. into the 4 × 4 homogeneous transformation matrix ∈ SE(3)
Given that the modified Matérn 3/2 allows normal prediction as,
and Whittle does not, and the approximation is acceptable,
 W
RCi W pCi

W
we propose to use this Matérn version as a trade-off for extra TCi = . (22)
0> 1
information.
Exploiting the accurate EDF with gradients in the space, not Denote the point cloud measurement of the sensor frame
only around the boundary but also further away, establishes the FCi at time ti as Ci Pi and Ci pi,j ∈ R3 the j-th point of Ci Pi
mathematical foundation to solve the pose estimation problem. with j = 0, ..., Ji . This 3D point cloud Ci Pi can be projected
Furthermore, faithfully modelling the EDF with gradients is from FCi to FW using W TCi ,
 W   C 
sufficient for a trajectory optimisation-based planner, which Pi W
i
Pi
requires the distance and gradients to the nearest obstacles = TCi . (23)
1 1
to explore in complex surroundings. Moreover, the GPIS
provides continuous and probabilistic representation for map- Also, let us express W TCi through the concatenation of
W
ping and surface reconstruction. Our Log-GPIS unlocks the TCi−1 and Ci−1 TCi as
potential to provide a unified solution for sensor odometry, W
TCi = W TCi−1 Ci−1 TCi , (24)
surface reconstruction, and safe navigation.
where W TCi−1 is given by the odometry computation at time
ti−1 . We aim to find the W TCi such that the newly arrived
V. L OG -GPIS-MOP OVERVIEW Ci
Pi can be projected to the global world frame.
We present first the incremental formulation that leverages
directly the Log-GPIS. This incremental odometry formulation
is the one used by the incremental mapping and planning
approaches all tightly integrated through the Log-GPIS. Fur-
thermore, we also present the batch optimisation odometry
formulation that can be used as post-processing step to refine
the incremental odometry. Note that the batch formulation uses
Fig. 2. Diagram of Log-GPIS-MOP framework. the sequential poses only and not directly the Log-GPIS.
Let us consider the Log-GPIS as a unified representation for
simultaneous mapping, odometry and planning (MOP) with A. Incremental Formulation
depth sensors. Given the input data from any type of depth In this work, we propose an iterative approach to estimate
sensor, e.g. LiDAR and depth cameras, Log-GPIS-MOP aims the odometry based on the Log-GPIS representation. Our
at incrementally estimate the EDF, while at the same time odometry estimation approach follows the iterative and frame
localise the sensor, and with this information plan its path to transformation estimation aspects behind Iterative Closest
avoid colliding into the surfaces implicitly described by the Point (ICP)[19] but without the CP approach. ICP algorithm
EDF. Fig. 2 shows an overview of the full approach. aligns two point clouds using for instance, point to point or
The pose estimation problem for odometry is formulated as point to plane distance metrics. For each point in the source
iterative alignment optimisation where the Log-GPIS is used point cloud, ICP searches for the closest point in the target
to minimise the distance to the surface from a newly arrived point cloud, which are tentative correspondences. Then, the
point cloud following the gradients. The mapping consists in transformation than aligns both point clouds is computed by
fusing this current point cloud into the existing Log-GPIS iteratively minimising the total distance error of all points
7

correspondences. Searching for the closest points is the most Given the local Log-GPIS for an arbitrary testing point
Ci−1
expensive part in a standard ICP [4], [41]. pi,j , the predictive mean v̄i with derivatives are obtained
Our work proposes instead, a direct query of the distance from (13). Also, as per (16), d¯i can be computed by applying
to the surface based on Log-GPIS to find the alignment the logarithm transformation to v̄i ,
efficiently and accurately without the need of the closest 
v̄i
 
yi−1

points search. Given a fused global representation with the = k̃>
i−1 (K̃ i−1 + σ 2 −1
I) (27)
∇v̄i ∇yi−1
information of all previous frames, the minimum distance to
the surface can be obtained directly from the Log-GPIS, which √
d¯i = − t ln v̄i . (28)
saves us from doing the correspondences search. With the
assumption that the difference between two consecutive frames K̃i−1 is the covariance matrix of the input points and k̃i−1
is relatively small, the transformation is computed iteratively represents the covariance vector between the input points of
by minimising in each step the sum of the Log-GPIS inference frame FCi−1 and the testing point. yi−1 are the measurements
itself (which is equivalent to the distance to the surface in a at ti−1 , i.e. Ci−1 Pi−1 .
point to plane way as the distance gradients are utilised as
well). After the transformation of current frame is applied,
the current point cloud is fused to the global representation ...
using a fusion update method, which will be discussed in
section VII-A. X0 X1 X2 Xi−1 Xi
More formally, let us consider a local Log-GPIS represen-
tation d¯i obtained using the previous frame only and a global Fig. 4. Factor graph representation of the frame to map odometry. Low opacity
Log-GPIS representation d¯W i , which is the result of fusing all nodes is used for those nodes that are not used for current alignment. Black
the previous frames into the global frame. Let us assume the factors are the global Log-GPIS distance constrains.
initial W TC0 is identity and define Xi = {W TC1 , ..., W TCi }
As in any frame to frame odometry, it can easily drift. The
with i = (0, ..., L) as the state to be estimated. The odometry
other downside of this simple alignment is that the local log-
is then formulated incrementally to estimate W TCi with ti
GPIS has to be created at every step and discarded afterwards.
as the current time stamp through a frame to map iterative
This motivates the use of a less subject to drift frame to map
alignment Fig. 3, with the possibility to compute the frame to
alignment, which uses the fused global Log-GPIS.
frame alignment as shown in Fig. 4.
b) Frame-to-Map: Let us consider a sequentially com-
The iterative alignment in both cases, frame to map and
puted global Log-GPIS d¯Wi , which is used frame to map align-
frame to frame, is performed by minimising the total distance
ment to compute the poses as the sensor moves through the
to the surface to find the optimal current pose in an non-linear
environment. Particularising eidis for frame to map odometry
least squares optimisation manner,
(Fig. 4), we have
2 Ji
X̌i = arg min eidis ,

(25)  X
eidis W
d¯W W
TCi Ci pi,j ,

Xi TCi = i (29)
j=0
where eidiscorresponds to the total distance from current point
Ci
cloud Pi to the surface and X̌i is the sequential estimation where the global Log-GPIS d¯W
i is computed from,
of the current pose. More importantly, with this formulation
v̄iW
   W 
Log-GPIS is directly used to compute this metric from all the W > W 2 −1 yi−1
points in point cloud Ci Pi by querying the distance values. = (k̃i−1 ) (K̃i−1 + σ I) (30)
∇v̄iW W
∇yi−1
a) Frame-to-frame: This alignment is computed between √
the current frame and the previous-only representation d¯i ; the d¯W W
i = − t ln v̄i , (31)
local Log-GPIS. W
where K̃i−1
is the covariance matrix of all the fused point
clouds from 0 to i − 1 in global reference frame W P{0,...,i−1}
... and k̃W
X0 X1 X2 Xi−1 Xi i−1 represents the covariance vector between all merged
W W
points in world frame and the testing point. yi−1 and ∇yi−1
are the measurements and gradients of W P{0,...,i−1} .
Fig. 3. Factor graph representation of the frame to frame odometry. Low Further, to solve the optimisation of the alignment problem,
opacity nodes is used for those nodes that are not used for current alignment.
Black factors are the local Log-GPIS distance constrains. the derivative of the distance with respect of W TCi is desired.
Based on the chain rule, the Jacobian function is as follows,
Particularising (25) for this case (Fig. 3), let us write eidis ∂ d¯W ∂ d¯W
W
  
i TCi Ci pi,j i
W
TCi Ci pi,j ∂ W TCi Ci pi,j
as: =
∂ (W TCi ) ∂ (W TCi Ci pi,j ) ∂ (W TCi )
Ji
X (32)
eidis Ci−1
TCi = W TCi−1 d¯i Ci−1
TCi Ci pi,j ,
 
(26) The first term is one of the valuable byproducts of the global
j=0 Log-GPIS, which is the gradient of query point W TCi Ci pi,j :
where the W TCi−1 is given by accumulating the relative poses ∂ d¯W W

i TCi Ci pi,j
from 1 to i − 1. = ∇d¯W W Ci
i ( TCi pi,j ). (33)
∂ (W TCi Ci pi,j )
8

The second term is the derivative of W TCi regarding to a A. Incremental Mapping


point Ci pi,j . It is equivalent as below: After the pose is estimated and applied as described in
∂ W TCi Ci pi,j
  ∧  Section VI, the current frame needs to be fused/appended
I − W TCi Ci pi,j
= , (34) with the global Log-GPIS. To do so, we adopt the approach
∂ (W TCi ) 0T 0T
proposed in [11] for GPIS-SDF. This approach uses a Octree-
where ∧ is the skew-symetric operator for the non- based Gaussian Process regressor and Bayesian fusion to
homogeneous 3D point p, merge old measurements and new observations.
 ∧   We briefly summarise the process here, which has been
p1 0 −p3 p2 generalised for organised and unorganised point clouds. Let
p∧ =  p2  =  p3 0 −p1  . (35) us consider a current point cloud W Pi and all fused global
p3 −p2 p1 0 points W P{0,...,i−1} . The full point cloud W P{0,...,i−1} is
Using the querying distance and analytical Jacobian func- stored in a Octree-based structure to reduce the computational
tion, we implement our proposed odometry in Ceres 1 . complexity, which divides the complete data into many over-
lapped clusters. In order to fuse W Pi with W P{0,...,i−1} , let
us model a GP regressor in terms of the bearing angles α and
B. Post Process Batch Optimisation ε of each point W pi,j ∈ W Pi and its inverse range for either
unorganised or organised point clouds as,
ρ−1 ∼ GP{0, k((α, ε), (α0 , ε0 ))}. (38)
...
The Ornsten-Uhlenbeck covariance is used in this case as it
... captures more details without strong smoothing. Note that
X0 X1 X2 Xi−1 Xi
for organised point clouds the conversion from pixels co-
ordinates to bearing angles given the camera calibration is
Fig. 5. Factor graph representation the pose SLAM problem. Xi = straightforward. Then GP regressor can infer smoothly and
{W TC1 , ..., W TCi } with i = (0, ..., L). Green factors represent the frame probabilistically the reciprocal range value at any given point
to map constrains and red factors the frame to frame constrains.
with values α and ε. For each point W pi,j in W Pi , the
algorithm first searches the target point W p0 i,j in the Octree
Given the incremental pose estimation with frame to frame
and frame to map transformations, it is possible to optimise of W P{0,...,i−1} . When W p0 i,j is found, it is fused using the
as a batch the poses as the pose graph problem in Fig. 5. GP regressor.
Instead of naively fusing W p0 i,j (equivalent to α0 , ε0 and
Therefore, we employ the standard formulation as a Maximum 0−1
Likelihood estimation: ρ in polar coordinate) with W pi,j , the regressor accepts
the bearings of W p0 i,j to infer the range value ρ̄−1 as in (38).
2
X̂ = argmin − log(p(X | X̌ )) = arg min kepgo kΣpgo , (36) Then, test the occupancy function Φ below,
X X
2
Φ= , (39)
where X̂ is the state on the pose estimation problem and, 1 + exp (−η (ρ0−1 − ρ̄−1 ))
epgo and Σpgo are the error and the covariance of the poses
where ρ0 is the given range of the point and ρ̄ is the range
respectively on a pose graph batch optimisation (PGO). In
inference. η is a slope parameter. The point is free with ρ0 > ρ̄,
here, the PGO takes the frame to map constraints (in green
and occupied with ρ0 < ρ̄.
measurements) and frame to frame constraints (in red mea-
If the occupancy value is larger than a threshold, then
surements) to optimise a sequence of poses:
target point moving procedure is performed. With negative
occupancy value for beyond the surface, it moves one step
X
epgo W TCi = log W Ť−1 W
 
Ci TCi Σ . (37)
i∈0,.,L
i
along the direction of surface normal. On the contrary, with
positive value, it moves along the opposite direction of normal.
Note that the batch optimisation is presented here as a post- Iteratively, using the regressor, the algorithm ends up with a
processing algorithm. It is not used during the sequential new point W p00 i,j very close to the surface.
estimation, thus not used for incremental mapping, nor path Given W pi,j with noise σi,j and a new target point W p00 i,j
planning as it is too time consuming as an odometry approach. with noise σ 00 i,j both in the overlapping part, the Bayesian
fusion directly updates the surface point pf used with noise
VII. M APPING σf used of global Log-GPIS using:
Mapping aims to fuse new point clouds into the global Log- σ 00 i,j W pi,j + σi,j W p00 i,j
GPIS. We present two different pipelines again, the incremen-
W
p00 i,j =
σ 00 i,j + σi,j (40)
tal mapping that appends the Log-GPIS with new point clouds
σ 00 i,j = (σ i,j ) + (σi,j )−1 .
00 −1
sequentially and the post-processing mapping, which simply
uses the output of the odometry and the raw point clouds all
B. Post Processing Mapping
together to recover a global GPIS.
Given the final estimated transformations W TCi for the
1 https://ceres-solver.org sequence of point clouds Ci Pi , which can be naively merged
9

into one point cloud in global reference frame, we aim to build FV move towards the implicit surface and iteratively query
a globally consistent continuous and probabilistic reconstruc- the Log-GPIS until the iso-values are smaller than a threshold
tion. However, a well-known disadvantage of general GPs lies (0.0001m in our experiments).
on the memory allocation and the computational complexity to
invert the covariance matrix of the size of the training points. VIII. P LANNING
Moreover, as pointed out above, we model the joint GPIS
with gradients, which improves the accuracy of the predicted The differentiable nature of Log-GPIS permits straight-
surfaces. Despite this being advantageous as there is no need to forward integration with existing optimisation-based planners.
generate artificial points to direct inside and outside surfaces, In this section, we present how Log-GPIS can be used together
the computational complexity gets worse when considering the with the covariant Hamiltonian optimisation-based motion
gradient inferences. planner (CHOMP) [5] for obstacle avoidance planning.
To reduce the general computational complexity, and Let xr : t 7→ xr (t) be the robot’s trajectory. For simplicity,
thereby improving its scalability, in our previous work [9], we consider the robot’s position in the workspace over time,
we adopted the idea of the scaling GP algorithm to achieve excluding orientation, so that xr (t) ∈ RD . However, the
fast kernel learning [42], and exploited the Structured Kernel framework can be easily extended to cases with orientation
Interpolation framework with Derivatives (D-SKI) [43] to or with multiple joints.
consider the gradients. The D-SKI is an extension development CHOMP minimises an objective functional defined over a
of the Structured Kernel Interpolation method (SKI) [42] and space of trajectories. We consider the conventional objective
the Matrix-Vector Multiplications method (MVMs) [44]. The functional as introduced in [5] for smooth obstacle avoidance:
main idea is to use an approximate kernel function with in- Z T
1 r
ducing points, enabling fast processing through interpolation. C[xr ] ≡ ||ẋ (t)||2 + λc(xr (t))||ẋr (t)||dt. (44)
The input dataset is reprojected onto a grid generated by 0 2

inducing points, which significantly reduces the requirement Here, the first term encourages smoothness by regularising
to compute GPIS. The interpolated kernel function of k (x, x0 ) the velocity. The second term is responsible for obstacle
is formulated as, avoidance, which is enforced by the collision penalty term
c(xr (t)) set as:
K ≈ V K(U, U )V T , (41) 
r 1
where U is a uniform grid of inducing points m, and V −d(x (t)) + 2 ,
 if d(xr (t)) < 0
r 1
is an N × m sparse matrix of interpolation weights. When c(x (t)) = 2 (d(xr (t)) − )2 , if 0 < d(xr (t)) ≤  (45)

considering gradients and preserving the positive definiteness 0, otherwise.

of the approximate kernel, D-SKI differentiates combined
covariance matrix. The target scaling covariance matrix of CHOMP incrementally minimises the objective functional
k̃(x, x0 ) can be written: C[xr ] using its functional gradient2 . With the objective set
as (44), the functional gradient is given by:
   T
V V ∇C[xr ] = −ẍr + ||ẋr || I − ẋr ẋrT ∇c(xr ) − c(xr )κ .
 
K̃ ≈ K(U, U ) =
∇V ∇V (46)
V K(U, U )V T V K(U, U )(∇V )T
 
Here, κ = ||ẋr ||−2 I − ẋr ẋrT ẍr is the curvature vector.

, (42)
(∇V )K(U, U )V T (∇V )K(U, U )(∇V )T The gradient of the collision penalty term c(xr ) is given by
where V and ∂V are sparse matrices with assumption of 
r
if d(xr ) < 0
quintic interpolation [45]. −∇d(x ),

r
∇c(x ) = (d(xr ) − )∇d(xr ) if 0 < d(xr ) ≤  (47)

0, otherwise.

C. Marching Cubes for Log-GPIS
Given a set of query points for Log-GPIS to obtain the The merit of our framework is that we can use the Log-GPIS
implicit surface values, let us use a modified version of gradient inference equation (17) to efficiently and accurately
marching cubes algorithm [46] to reconstruct a dense map. compute the gradient ∇d(xr ), and, in turn, (46), (47). This is
The marching cube computes and extracts the surface by because our formulation produces the gradients analytically,
connecting points of a constant value (iso-value) within a whereas conventional approaches rely on discrete grid-based
volume of space: approximation [5], [32].
FV = fiso (M, viso ), (43)
IX. E XPERIMENTAL R ESULTS
where M specifies a set of points for querying. viso is the
iso-value at which to compute the surface, specified as a In this section, we illustrate the performance of the proposed
scalar. FV contains the computed faces and vertices of the framework both qualitatively and quantitatively. The frame-
surface. Note that since the lost of the sign, instead of going work is implemented in C++ and Matlab. All experiments were
through exactly zero crossing, our marching cubes sets the run on an 8-core i7 CPU at 2.5GHz.
iso-value as a constant value (0.1m for 2D cases and 0.001m 2 We focus only on gradient computation and omit the exact update process.
for 3D in our experiments). Then, the computed vertices in Interested readers are referred to [5].
10

increasing noise values, with a median RMSE is 0.07629m


given reasonable sensor noise of 0.01m. In particular, a linear
trend is observed between the noise and the distance field
RMSE. This illustrates that the proposed Log-GPIS technique
can accurately predict the EDF far away from measurements
even though the measurements are only available around the
surface.

B. Comparison to SLAM Frameworks


1) 2D SLAM: We compare the performance of Log-GPIS
MOP for 2D SLAM against Cartographer [24], a popularly
(a) Trajectory comparison with different noise used open-source 2D SLAM framework. Cartographer pro-
duces an occupancy grid as opposed to EDF produces by our
framework. We thus qualitatively examine the similarity of the
produced maps. Nonetheless, we quantitatively compare the
odometry performance. In doing so, Cartographer is advan-
taged in that it additionally uses IMU readings for short-term
odometry, as well as loop closure detection and pose graph
optimisation for long-term drift compensation. Meanwhile, our
Log-GPIS odometry and mapping approach solely uses the
LiDAR measurements. IMU fusion and loop closure remain
avenues for future work.
We use two real-world datasets for this comparison, namely
the Deutsches Museum [24] and the Intel Research Lab [7]
datasets. The Deutsches Museum dataset was collected using
(b) Translation RMSE (c) Rotation RMSE
a 2D LiDAR backpack. The Intel Research Lab dataset also
Fig. 6. Comparison of ground truth and estimated trajectories using Log-
GPIS with varying levels of simulated measurement noise. The estimated
consists of 2D LiDAR measurements (13631 frames) collected
trajectories in a) closely resembles the ground truth for all noise magnitudes. from a robot. The dataset comprises multiple loops, with two
The translational and rotational RMSE in b) and c) exhibits acceptable loops in the corridor followed by detailed scans of each room.
increase with noise.
Since the Deutsches Museum dataset was collected from the
real world, there is no ground truth trajectory. On the other
A. Robustness to Noise hand, we adopt the evaluation method suggested in [47], which
We first illustrate the robustness of the proposed Log- computes relative poses between frames for the Intel Research
GPIS-MOP framework against noise. To do so, we use a Lab dataset, which we use to derive the ground truth poses.
simulated 2D LiDAR dataset from [11], [25], and add varying The comparison using the Deutsches Museum dataset is
magnitudes of Gaussian noise from 0.01m to 0.3m. The full shown in Fig. 8. Due to the lack of ground truth poses,
20 × 16m environment is illustrated in Fig. 14. we qualitatively compare the trajectory from Log-GPIS-MOP
The result for odometry is shown in Fig. 6a. It can be seen against that from Cartographer [24] as shown in Fig. 8a.
that, with a realistic noise (σ = 0.01m, cyan), the result is Even without explicit loop closure or IMU preintegration, our
almost identical to the ground truth trajectory (black). Even incremental Log-GPIS trajectory (red line) produces a close
with a significantly large noise of σ = 0.3m (blue), the to identical result to Cartographer (blue line). Similarly, we
estimated trajectory only has acceptable drift. Figs. 6b and 6c compare the mapping results in Figs. 8c and 8b. In Fig. 8c,
illustrate the root mean squared error (RMSE) over 50 Monte the colormap shows the distance field values from 0 (blue) to
Carlo runs in translation and rotation respectively. It can be 4 (yellow) meters. The reconstructed surface (i.e. minimum
seen that the overall error of Log-GPIS odometry remains absolute distance) is shown in white. We truncated the colour
small, demonstrating robustness against noise. In particular, map at 4 meters for better visualisation of the building
even with a large sensor noise of σ = 0.3m, the maximum structures, although Log-GPIS remains accurate further away.
translation RMSE is less than 0.25m. It can be seen that the reconstructed surface closely aligns
In Fig. 7, we compare the reconstructed EDF (Fig. 7b with the occupancy map from Cartographer shown in Fig. 8b.
against the ground truth (Fig. 7a). We visualise the EDF in 3D,
with the z-axis being the distance field value. The colourmap TABLE I
Q UANTITATIVE COMPARISON OF ERRORS ON I NTEL L AB DATASET
varys from blue to yellow, corresponding to distance field
value from 0 to 4 meters. It can be seen that the EDF Absolute translation[m] Absolute rotation[deg]
reconstructed using our method is close to identical to the Scan matching 0.220±0.296 1.7±4.8
Log-GPIS 0.0336±0.0253 1.4904±4.5507
ground truth, sharing the same peaks and valleys.
Graph Mapping 0.031±0.026 1.3±4.7
The RMSE plot in Fig. 7c further shows that the recon- Cartographer 0.0229±0.0239 0.453±1.335
structed EDF is accurate across the entire space even with
11

TABLE II 100 frames for evaluation.


Q UANTITATIVE COMPARISON OF ERRORS WITH ORB-SLAM2 ON A The result is shown in Table. II. As can be seen, Log-
SEGMENT OF THE T EDDY B EAR DATASET
GPIS-MOP outperforms ORB-SLAM in translational error,
ORB-SLAM2 Log-GPIS although ORB-SLAM2 shows lower rotational error. In other
Translational error RMSE [m] 0.020317 0.017197 words, Log-GPIS-MOP exhibits comparable performance to
Translational error mean [m] 0.017291 0.014583
Translational error median [m] 0.015385 0.013027 ORB-SLAM2, even though it does not use RGB features for
Translational error std [m] 0.010668 0.009114 alignment as ORB-SLAM2 does.
Translational error max [m] 0.050399 0.044593
Rotational error RMSE [deg] 1.225683 1.452126
Rotational error mean [deg] 1.096656 1.289137 C. Comparison to Distance Field Mapping Frameworks
Rotational error median [deg] 0.018657 0.021269 We compare the performance of Log-GPIS-MOP in surface
Rotational error std [deg] 0.547399 0.668427
Rotational error max [deg] 2.582371 3.734151
reconstruction against Voxblox [31], a state-of-the-art distance
field mapping and surface reconstruction framework. Since
Voxblox is a mapping-only framework that requires external
odometry, we allow Voxblox to use the ground truth trajectory
We conduct a more quantitative analysis using the Intel from the dataset. Given the pointcloud data and ground truth
Research Lab dataset [7]. We adopt the relative poses between trajectory, Voxblox generates similar outputs as ours including
frames as the ground truth and the absolute translational the surface mesh, and the EDF. This is achieved by building
and rotational errors technique from [47] to evaluate the a TSDF first, followed by wavefront propagation from some
performance. initial voxels for a certain distance to compute the EDF values.
For the Intel Research Lab dataset, we present comparison A dense mesh is reconstructed from the TSDF. The resolution
against the Scan Matching [48] (SM), Graph Mapping [49] and TSDF are set at 0.01m and 0.03m respectively. For fair
(GM) in addition to Cartographer, using the values reported comparison, Log-GPIS-MOP uses the a grid with the same
in [24]. Table. I presents the absolute translational and ro- resolution as testing points. The rest of the parameters of
tational errors with standard deviation. Overall, Cartographer Voxblox are set as default.
exhibits the lowest error owing to loop closure detection. Our We use two datasets, the Freiburg3 Teddy [50] used in
framework outperforms the SM approach, and demonstrates Sec. IX-B and the Cow and Lady dataset 3 . For this com-
similar results to the GM and Cartographer approaches. The parison, we use the first 600 frames of the Freiburg3 Teddy
mapping result in Fig. 9 shows that the framework performs dataset, which corresponds to the first loop around the teddy
as expected, providing an accurate reconstruction of the wall bear.
surface, rooms and corridors. The Cow and Lady dataset includes fibreglass models of
2) 3D SLAM: To evaluate the performance of Log-GPIS- a large cow and a lady standing side by side in a room, as
MOP in the 3D setting, we compare against ORB-SLAM2 [3], illustrated in Fig. 12a. The dataset consists of RGB-D point
a state-of-the-art feature-based SLAM framework. ORB- clouds collected using a Kinect RGB-D camera, with camera
SLAM2 uses sparse keypoint features from the RGB images trajectory captured using a Vicon motion system. We use the
for odometry, whereas Log-GPIS-MOP does not use the RGB first 450 frames of the dataset. This dataset is much more
component. As ORB-SLAM2 does not produce a dense map, challenging than the Freiburg3 Teddy dataset, as the camera
we only compare the odometry performance. The quality of motion is relatively fast, with large changes in orientation.
the estimated camera trajectory is evaluated using the tool Further, the Vicon trajectory is occasionally misaligned. We
from [50]. use Log-GPIS-MOP to refine such misalignment by using the
We use a real-world RGB-D dataset called the Freiburg3 Vicon trajectory as a prior for odometry.
Teddy from TUM [50], illustrated in Figs. 10a and 10b. The The results for the Freiburg3 Teddy dataset is shown in
dataset consists of RGB-D images collected using Kinect sen- Fig. 10. Figs. 10 c-d) and e-f) show the reconstructed map
sor, along a ground-truth trajectory recorded using a motion- using Voxblox and Log-GPIS-MOP respectively. Most notably,
capture system with eight tracking cameras. The image reso-
lution is 640 × 480 at a frequency of 30 Hz. We use the first 3 https://projects.asl.ethz.ch/datasets/doku.php?id=iros2017

(a) Ground Truth (b) Log-GPIS (σ = 0.01m) (c) RMSE


Fig. 7. Comparison of ground truth and estimated EDF with varying levels of noise. z-axes in b) and c) correspond to the distance value. The estimated EDF
with a realistic noise σ = 0.01m in c) is accurate and closely resembles the ground truth in b). The RMSE in d) exhibits a linear increase with noise.
12

(a) Estimated Trajectory (b) Cartographer (c) Log-GPIS


Fig. 8. a) Cartographer map using 2D LiDAR dataset. The resolution of the occupancy grid is 0.05 meter. The odometry and global optimisation are enabled.
The rest of parameters are set as default. b) As a comparison, the mapping result of our proposed method is shown. We can clearly see the wall and hallways.
c) The surface modeling with distance field is demonstrated on the Intel dataset. The local minimal of iso-surfaces is in white colour.

because Log-GPIS-MOP has the advantage of extrapolating


the gaps in data, as we saw in the Freiburg3 Teddy dataset.
Fig. 12a shows the raw RGB image of the scene. Log-GPIS
and Voxblox both produce similar reconstructed meshes shown
Fig. 12b and Fig. 12c respectively. As we can see on the top
of the mattresses, Log-GPIS has the advantage of filling the
missing data gaps.
To further examine the behaviour of the two frameworks, we
compare 2D slices of the ground-truth and estimated distance
fields in Fig. 13. We choose a horizontal slice 0.8m above
the ground. As shown in Fig. 13c, Voxblox only computes
Fig. 9. Reconstructed Log-GPIS map on the Intel Research Lab dataset.
the EDF within the sensor’s field of view, whereas Log-GPIS-
MOP naturally predicts the EDF value at all points as can be
seen in Fig. 13b. For a quantitative comparison, we compare
it can be seen that there is a gap in the Voxblox result (Figs. 10 against a ground-truth distance field computed using the global
c-d)) between the hind paws due to missing data, whereas no point cloud from a Leica scanner. We compute the RMSE
such gap exists in the Log-GPIS-MOP result (Figs. 10 e-f)). for Voxblox only within the sensor field of view, as Voxblox
This is because Log-GPIS-MOP naturally allows dealing with does not compute the EDF outside this region. Fig. 13d shows
incomplete and sparse data, as GPIS allows extrapolation at that the RMSE error of Log-GPIS and Voxblox are 0.0672m
unseen points. and 0.0735m respectively. Thus, Log-GPIS-MOP outperforms
Qualitatively, our reconstruction is closer to the image than Voxblox in both accuracy of the EDF, and the prediction range.
Voxblox, especially the head and hind paws. Some colour
differences are apparent due to light source variations. Voxblox
D. Effects of Log Transformation
fuses the colour of each frame. In contrast, our method obtains
the colour directly from the global merged point cloud, which We examine the effects of the log transformation on map-
clearly produces the three black stripes on the sleeve of the ping, odometry and planning. We first illustrate the nominal
yellow shirt. behaviour of the full Log-GPIS-MOP framework including
It is challenging to quantitatively evaluate the surface recon- mapping, odometry, and planning. We use the first simulated
struction performance, since no ground truth is available for dataset used in Sec. IX-A, the second simulated dataset in [51],
the Teddy bear’s geometry. Instead, we examine the precision both with a sensor noise of σ = 0.01, and the Intel Research
by computing the difference between the estimated normal Lab dataset [7] used in Sec. IX-B. To emulate online planning
vectors and the smoothed normal over ten nearest neighbours, with incremental mapping, we generate the plan from the
as this reveals how noisy the estimated normal vectors are. In robot’s current pose to its future pose in the dataset, 50 time-
Fig. 11, it can be seen that the normal vectors from Log-GPIS- steps ahead. Although the robot does not strictly follow the
MOP are less noisy than that of Voxblox. This implies that generated plan, this serves as a useful proxy for performance
Log-GPIS-MOP produces a smoother surface with less noisy in online scenarios.
normal vectors. The result for the first simulated dataset is shown in Fig. 14,
The reconstruction results on the Cow and Lady dataset with the travelled trajectory in red, the planned trajectory in
using Log-GPIS-MOP and Voxblox are shown in Figs. 12. It white, and the underlying colourmap representing the EDF
can be seen that while there is a gap at the top of the mattresses built so far. Most notably, in Fig. 14c), it can be seen that the
on the left-hand side in the Voxblox result due to missing data, white planned trajectory avoids the obstacles even when it is
whereas Log-GPIS-MOP does not have the same gap. This is not yet seen. The same behaviour is observed in the more
13

(a) Image of Teddy’s front (b) Image of Teddy’s back

Fig. 11. A histogram of the angle distribution of the normals in degrees. Top
figure is Log-GPIS and bottom figures is Voxblox.

because GPIS-SDF produces increasingly inaccurate EDF


predictions further from previous measurements unlike Log-
GPIS. Thus, we present a portion of the results before GPIS-
SDF fails, in order to compare the planned trajectories.
The planned trajectories are shown in white in Fig. 15.
In Figs. 15a-b), It can be seen that trajectory planning using
GPIS-SDF results in nearly a straight line, with a near miss
(c) Voxblox + GT trajectory(d) Voxblox + GT trajectory
(front) (back) in Fig. 15b. On the other hand, with the same weighting for
collision avoidance, Log-GPIS-MOP produces a plan strongly
biased towards the medial axis of the environment (i.e. equidis-
tant from obstacles). This is because GPIS-SDF predictions are
inaccurately low further away from the measured surfaces.
These observations illustrate that the log transformation in
Log-GPIS-MOP represents a critical improvement in predic-
tion accuracy that is necessary for practical use in odometry
and planning.
The full Log-GPIS-MOP for example on the simulated
dataset [51] (angular range from 0◦ to 360◦ with 1◦ resolution)
consumes a median computational time of 3.63s, including
(e) Log-GPIS-MOP (front) (f) Log-GPIS-MOP (back) 1.98s for the frame to map odometry, 0.03s for incremental
Fig. 10. Qualitative evaluation on Freiburg3 Teddy dataset. a) and b) show mapping 0.03s, and 2.29s for path planning. Queries for final
the raw images roughly in the same angle as the reconstructed mesh. Voxblox map reconstruction after post processing takes an additional
uses the ground truth poses for the reconstruction. In contrast, our method 29.12s for the entirety of the map.
has no prior for the localisation.

challenging scenario of the Intel Research Lab dataset [7] X. C ONCLUSION


illustrated in Fig. 1, where the white planned trajectories are We proposed Log-GPIS-MOP, a unified probabilistic frame-
smooth and stays a fixed distance away from the obstacles work for mapping, odometry and planning based on Log-
as intended. This is because Log-GPIS extrapolates well to GPIS. The main ingredient is the Log-GPIS representation,
areas with missing data, and to regions further away from mea- which allows accurate prediction of the EDF and its gradients.
surements. Moreover, in both cases, Log-GPIS-MOP produces By exploiting the global and local Log-GPIS, we presented a
accurate odometry and EDF, as was observed in Sec. IX-A. sequential odometry formulation for the incremental mapping
We attribute such extrapolation to the log transformation, as and planning approaches, and a batch optimisation as a post-
it implicitly enforces the Eikonal equation across the entire processing to refine a sequence of poses. For Log-GPIS
space. mapping, two different pipelines were proposed: the incre-
We now interrogate this hypothesis through comparison mental mapping that appends the Log-GPIS with incoming
against GPIS-SDF [11] on the second simulated dataset. For point clouds and the post-processing mapping, which simply
fairness, we implemented the same odometry and planning uses the output of the odometry and the raw point clouds
algorithms using GPIS-SDF. In particular, we use the same all together to recover a global GPIS. Concurrently, a path
weights for trajectory smoothness and collision avoidance planning approach uses the reconstructed map to compute an
when planning the trajectories. optimal collision-free trajectory in the environment. Extensive
The results are shown in Fig. 15. We observed that odometry analysis has been conducted to evaluate the proposed method
based on GPIS-SDF fails with a moderately large motion, on simulated and real datasets in both 2D and 3D against state-
14

(a) Raw image of the scene (b) Log-GPIS mesh (c) Voxblox mesh
Fig. 12. 3D odometry and mapping evaluation on the cow and lady dataset. a) shows the raw image of the scene. b) and c) show the reconstructed meshes
of our framework and Voxblox respectively. Note that Voxblox fuses the colour for the final reconstruction. In contrast, we directly take the colour from the
global merged point cloud.

(a) Ground truth distance (b) Log-GPIS distance (c) Voxblox distance (d)

Fig. 13. A horizontal slice (5m × 3m) shows the distance accuracy given Log-GPIS odometry and mapping on the cow and lady dataset. The slice is 0.8m
above the ground. All figures show from the top view.

(a) (b) (c) (d) (e)


Fig. 14. The Log-GPIS-MOP on a simulated 2D dataset. The reconstructed surfaces are red lines. The colourmap represents the distance field varying from
-4 to 4 meters. Red line with arrow is the estimated trajectory where the robot traveled. The white line is the suggested path computed by the path planner
using the global Log-GPIS.

of-the-art frameworks. Our experiments showed that Log- real-time 3d reconstruction and interaction using a moving depth cam-
GPIS-MOP exhibits comparable results to the state-of-the- era,” in Proc. of the ACM symposium, 2011, pp. 559–568.
[5] M. Zucker, N. Ratliff, A. D. Dragan, M. Pivtoraiko, M. Klingensmith,
art frameworks for localisation, surface mapping and obstacle C. M. Dellin, J. A. Bagnell, and S. S. Srinivasa, “Chomp: Covari-
avoidance, even without using IMU data or loop closure ant hamiltonian optimization for motion planning,” Int. J. of Rob.
detection. Future work lies in loop closure detection to develop Res.(IJRR), pp. 1164–1193, 2013.
a full SLAM solution which will improve long-term robustness [6] V. Reijgwart, A. Millane, H. Oleynikova, R. Siegwart, C. Cadena, and
J. Nieto, “Voxgraph: Globally consistent, volumetric mapping using
in large-scale environments. Further, we would like to recover signed distance function submaps,” IEEE Robotics and Automation
the sign of EDF and develop an efficient implementation that Letters, 2020.
will allow online operation. [7] D. Haehnel, Cyrill Stachniss—Robotics Datasets, Intel Lab, 2019.
[Online]. Available: http://www2.informatik.uni-freiburg.de/∼stachnis/
datasets/
[8] O. Williams and A. Fitzgibbon, “Gaussian process implicit surfaces,”
R EFERENCES 2007.
[1] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, [9] L. Wu, R. Falque, V. Perez-Puchalt, L. Liu, N. Pietroni, and T. Vidal-
I. Reid, and J. Leonard, “Past, present, and future of simultaneous Calleja, “Skeleton-based conditionally independent gaussian process
localization and mapping: Towards the robust-perception age,” Trans. implicit surfaces for fusion in sparse to dense 3d reconstruction,” IEEE
on Rob., p. 1309–1332, 2016. Robotics and Automation Letters, no. 2, pp. 1532–1539, 2020.
[2] J. Thoma, D. P. Paudel, A. Chhatkuli, T. Probst, and L. V. Gool, [10] W. Martens, Y. Poffet, P. R. Soria, R. Fitch, and S. Sukkarieh, “Geo-
“Mapping, localization and path planning for image-based navigation metric priors for gaussian process implicit surfaces,” IEEE Robotics and
using visual features and map,” in Proc. of CVPR, 2019, pp. 7383–7391. Automation Letters (RA-L), pp. 373–380, 2017.
[3] R. Mur-Artal and J. D. Tardós, “ORB-SLAM2: an open-source SLAM [11] B. Lee, C. Zhang, Z. Huang, and D. D. Lee, “Online continuous
system for monocular, stereo, and RGB-D cameras,” IEEE Transactions mapping using gaussian process implicit surfaces,” in 2019 International
on Robotics (T-RO), vol. 33, no. 5, pp. 1255–1262, 2017. Conference on Robotics and Automation (ICRA), 2019.
[4] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, [12] J. A. Stork and T. Stoyanov, “Ensemble of sparse gaussian process
J. Shotton, S. Hodges, D. Freeman, A. Davison et al., “Kinectfusion: experts for implicit surface mapping with streaming data,” 2020 IEEE
15

gaussian processes,” in IEEE International Conference on Robotics and


Automation (ICRA), 2012, pp. 4756–4761.
[27] E. G. Tsardoulias, A. Iliakopoulou, A. Kargakos, and L. Petrou, “A
review of global path planning methods for occupancy grid maps
regardless of obstacle density,” Int. J. of Rob. Res.(IJRR), 2016.
[28] S. Kim and J. Kim, “Occupancy mapping and surface reconstruction
using local Gaussian Processes with Kinect Sensors,” in IEEE Transac-
tions on Cybernetics, 2013, pp. 1335–1346.
[29] J. Schulman, Y. Duan, J. Ho, A. Lee, I. Awwal, H. Bradlow, J. Pan,
S. Patil, K. Goldberg, and P. Abbeel, “Motion planning with sequential
convex optimization and convex collision checking,” The International
Journal of Robotics Research, vol. 33, no. 9, pp. 1251–1270, 2014.
[30] H. Oleynikova, A. Millane, Z. Taylor, E. Galceran, J. Nieto, and
R. Siegwart, “Signed distance fields: A natural representation for both
(a) (b) mapping and planning,” in RSS 2016 Workshop: Geometry and Beyond-
Representations, Physics, and Scene Understanding for Robotics, 2016.
[31] H. Oleynikova, Z. Taylor, M. Fehr, R. Siegwart, and J. Nieto, “Voxblox:
Incremental 3d euclidean signed distance fields for on-board mav
planning,” in 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), 2017, pp. 1366–1373.
[32] H. Oleynikova, Z. Taylor, R. Siegwart, and J. Nieto, “Safe local
exploration for replanning in cluttered unknown environments for micro-
aerial vehicles,” IEEE Robotics and Automation Letters, 2018.
[33] M. Bloesch, M. Burri, S. Omari, M. Hutter, and R. Siegwart, “Iterated
extended kalman filter based visual-inertial odometry using direct pho-
tometric feedback,” Int. J. of Rob. Res.(IJRR), pp. 1053–1072, 2017.
[34] S. R. S. Varadhan, “On the behavior of the fundamental solution of the
heat equation with variable coefficients,” Communications on pure and
applied mathematics, pp. 431–455, 1967.
(c) (d) [35] S. Mauch, “A fast algorithm for computing the closest point and
distance transform,” Technical Report caltechASCI/2000.077, Applied
Fig. 15. Mapping and planning results of GPIS-SDF (a-b) and Log-GPIS- and Computational Math., Calif. Inst. of Technology, 2000.
MOP (c-d). Red arrows show robot’s current pose. With the same collision [36] B. Lau, C. Sprunk, and W. Burgard, “Improved updating of euclidean
avoidance weight, the planned trajectories (white) from Log-GPIS-MOP in distance maps and voronoi diagrams,” in 2010 IEEE/RSJ International
c-d) are strongly biased towards the medial axis, whereas that of GPIS-SDF Conference on Intelligent Robots and Systems, 2010.
in a-b) are nearly straight lines, with a near-hit in b). [37] A. G. Belyaev and P. A. Fayolle, “On variational and PDE-based
distance function approximations,” Computer Graphics Forum, 2015.
[38] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine
International Conference on Robotics and Automation(ICRA), 2020. Learning. Cambridge, Mass.: MIT Press, 2006.
[13] L. Wu, K. M. B. Lee, L. Liu, and T. Vidal-Calleja, “Faithful eu- [39] S. Särkkä, “Linear operators and stochastic partial differential equations
clidean distance field from log-gaussian process implicit surfaces,” IEEE in gaussian process regression,” in International Conference on Artificial
Robotics and Automation Letters, pp. 2461–2468, 2021. Neural Networks and Machine Learning, 2011, pp. 151–158.
[14] K. Crane, C. Weischedel, and M. Wardetzky, “Geodesics in heat,” ACM [40] P. Whittle, “On stationary processes in the plane,” Biometrika, 1954.
Transactions on Graphics, 2012. [41] H. Pottmann, S. Leopoldseder, and M. Hofer, “Registration without icp,”
[15] T. Schneider, M. T. Dymczyk, M. Fehr, K. Egger, S. Lynen, I. Gilitschen- Computer Vision and Image Understanding, vol. 95, 2004.
ski, and R. Siegwart, “maplab: An open framework for research in [42] A. G. Wilson and H. Nickisch, “Kernel interpolation for scalable
visual-inertial mapping and localization,” IEEE Robotics and Automa- structured gaussian processes (kiss-gp),” International Conference on
tion Letters, 2018. Machine Learning, pp. 1775–1784, 2015.
[16] T. Qin, P. Li, and S. Shen, “Vins-mono: A robust and versatile monocular [43] D. Eriksson, D. Kun, E. H. Lee, D. Bindel, and A. G. Wilson, “Scal-
visual-inertial state estimator,” IEEE Transactions on Robotics, 2018. ing gaussian process regression with derivatives,” Neural Information
Processing Systems (NIPS), 2018.
[17] S. Fuhrmann, F. Langguth, and M. Goesele, “Mve: A multi-view
[44] K. Dong, D. Eriksson, H. Nickisch, D. Bindel, and A. G. Wilson,
reconstruction environment,” in Eurographics Workshop on Graphics
“Scalable log determinants for gaussian process kernel learning,” Neural
and Cultural Heritage (GCH), 2014, pp. 11–18.
Information Processing Systems (NIPS), pp. 6330–6340, 2017.
[18] B. Curless and M. Levoy, “A volumetric method for building complex [45] E. H. W. Meijering, K. J. Zuiderveld, and M. A. Viergever, “Image
models from range images,” in Proc. of the Conference on Computer reconstruction by convolution with symmetrical piecewise nth-order
Graphics and Interactive Techniques, 1996, p. 303–312. polynomial kernels,” IEEE Transactions on Image Processing, 1999.
[19] P. J. Besl and N. D. McKay, “Method for registration of 3-d shapes,” in [46] W. E. Lorensen and H. E. Cline, “Marching cubes: A high resolution
Sensor fusion IV: control paradigms and data structures. International 3d surface construction algorithm,” in Proc. of the Conf. on Computer
Society for Optics and Photonics, 1992, pp. 586–606. Graphics and Interactive Techniques, 1987.
[20] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “Rgb-d mapping: [47] R. Kümmerle, B. Steder, C. Dornhege, M. Ruhnke, G. Grisetti, C. Stach-
Using kinect-style depth cameras for dense 3d modeling of indoor niss, and A. Kleiner, “On measuring the accuracy of slam algorithms,”
environments,” Int. J. of Rob. Res.(IJRR), pp. 647–663, 2012. Autonomous Robots, vol. 27, no. 4, pp. 387–407, 2009.
[21] T. Whelan, S. Leutenegger, R. F. Salas-Moreno, B. Glocker, and A. J. [48] A. Censi, “Scan matching in a probabilistic framework,” in Proc. of
Davison, “Elasticfusion: Dense slam without a pose graph,” in Robotics: IEEE ICRA. IEEE, 2006, pp. 2291–2296.
Science and Systems, 2015. [49] G. Grisetti, C. Stachniss, S. Grzonka, and W. Burgard, “A tree param-
[22] A. Elfes, “Using occupancy grids for mobile robot perception and eterization for efficiently computing maximum likelihood maps using
navigation,” Computer, vol. 22, no. 6, pp. 46–57, 1989. gradient descent.” in Robotics: Science and Systems, vol. 3, 2007, p. 9.
[23] A. Hornung, K. M. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, [50] J. Sturm, W. Burgard, and D. Cremers, “Evaluating egomotion and
“Octomap: an efficient probabilistic 3d mapping framework based on structure-from-motion approaches using the TUM RGB-D benchmark,”
octrees,” Autonomous Robots, pp. 189–206, 2013. in Workshop on Color-Depth Camera Fusion in Robotics at the
[24] W. Hess, D. Kohler, H. Rapp, and D. Andor, “Real-time loop closure IEEE/RJS Int. Conf. on Intelligent Robot Systems (IROS), 2012.
in 2d lidar slam,” in 2016 IEEE International Conference on Robotics [51] S. Anilkumar, “Lidar slam,” https://github.com/soorajanilkumar/Lidar
and Automation (ICRA), 2016, pp. 1271–1278. SLAM, 2019.
[25] S. T. O’Callaghan and F. T. Ramos, “Gaussian process occupancy maps,”
The International Journal of Robotics Research, 2012.
[26] S. Kim and J. Kim, “Building occupancy maps with a mixture of

You might also like