
IEEE TRANSACTIONS ON ROBOTICS, VOL. 30, NO. 1, FEBRUARY 2014

3-D Mapping With an RGB-D Camera
Felix Endres, Jürgen Hess, Jürgen Sturm, Daniel Cremers, and Wolfram Burgard

Abstract—In this paper, we present a novel mapping system
that robustly generates highly accurate 3-D maps using an RGB-D
camera. Our approach requires no further sensors or odometry. With the availability of low-cost and light-weight RGB-D
sensors such as the Microsoft Kinect, our approach applies to small
domestic robots such as vacuum cleaners, as well as flying robots
such as quadrocopters. Furthermore, our system can also be used
for free-hand reconstruction of detailed 3-D models. In addition to
the system itself, we present a thorough experimental evaluation
on a publicly available benchmark dataset. We analyze and discuss
the influence of several parameters such as the choice of the feature
descriptor, the number of visual features, and validation methods.
The results of the experiments demonstrate that our system can
robustly deal with challenging scenarios such as fast camera motions and feature-poor environments while being fast enough for
online operation. Our system is fully available as open source and
has already been widely adopted by the robotics community.
Index Terms—Localization, mapping, open source, RGB-D,
simultaneous localization and mapping (SLAM).

I. INTRODUCTION
THE problem of simultaneous localization and mapping
(SLAM) has been one of the most actively studied problems in the robotics community over the past decade. The availability of a map of the robot’s workspace is an important requirement for the autonomous execution of several tasks including
localization, planning, and navigation. Especially for mobile
robots that work in complex, dynamic environments, e.g., fulfilling transportation tasks on factory floors or in a hospital, it is
important that they can quickly generate (and maintain) a 3-D
map of their workspace using only onboard sensors.
Manipulation robots, for example, require a detailed model of
their workspace for collision-free motion planning and aerial vehicles need detailed maps for localization and navigation. While
previously many 3-D mapping approaches relied on expensive
and heavy laser scanners, the commercial launch of RGB-D
cameras based on structured light provided an attractive, powerful alternative.


Manuscript received January 3, 2013; revised May 16, 2013; accepted August 13, 2013. Date of publication September 9, 2013; date of current version
February 3, 2014. This paper was recommended for publication by Associate
Editor R. Eustice and Editor W. K. Chung upon evaluation of the reviewers’
comments. This work was supported in part by the European Commission under
the Contract FP7-ICT-248258-First-MM.
F. Endres, J. Hess, and W. Burgard are with the Department of
Computer Science, University of Freiburg, 79110 Freiburg, Germany
(e-mail: endres@informatik.uni-freiburg.de; hess@informatik.uni-freiburg.de;
burgard@informatik.uni-freiburg.de).
J. Sturm and D. Cremers are with the Department of Computer Science,
Technische Universität München, 85748 Munich, Germany (e-mail: sturmju@
in.tum.de; cremers@in.tum.de).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TRO.2013.2279412

Fig. 1. (Top) Occupancy voxel map of the PR2 robot. Voxel resolution is
5 mm. Occupied voxels are represented with color for easier viewing. (Bottom
row) A sample of the RGB input images.

In this study, we describe one of the first RGB-D SLAM
systems that took advantage of the dense color and depth images provided by RGB-D cameras. Compared with previous
work, we introduce several extensions that aim at further increasing the robustness and accuracy. In particular, we propose
the use of an environment measurement model (EMM) to validate the transformations estimated by feature correspondences
and the iterative-closest-point (ICP) algorithm. In extensive experiments, we show that our RGB-D SLAM system allows us to
accurately track the robot pose over long trajectories and under
challenging circumstances. To allow other researchers to use
our software, reproduce the results, and improve on them, we
released the presented system under an open-source license.
The code and detailed installation instructions are available
online [1].


II. RELATED WORK

Wheeled robots often rely on 2-D laser range scanners, which commonly provide very accurate geometric measurements of the environment at high frequencies. To compute the relative motion between observations, most state-of-the-art SLAM (and also localization-only) systems use variants of the ICP algorithm [2]–[4]. A variant particularly suited for man-made environments uses the point-to-line metric [5]. Recent approaches demonstrated that the robot pose can be estimated at millimeter accuracy [6] using two laser range scanners and ICP. Disadvantages of ICP include the dependence on a good initial guess to avoid getting stuck in a local minimum and the lack of a measure of the overall quality of the match. Approaches that use planar localization and a movable laser range scanner, e.g., on a mobile base with a pan-tilt unit or at the tip of a manipulator, allow for precise localization of a 2-D sensor in 3-D. In combination with an inertial measurement unit (IMU), this can also be used to create a map with a quadrocopter [7].

Visual SLAM approaches [8]–[10], also referred to as "structure and motion estimation" [11], [12], compute the robot's motion and the map using cameras as sensors. Stereo cameras are commonly used to gain sparse distance information from the disparity in textured areas of the respective images. In contrast with laser-based SLAM, visual SLAM systems typically extract sparse keypoints from the camera images. Visual feature points have the advantage of being more distinctive than typical geometric structures, which simplifies data association. Popular general-purpose keypoint detectors and descriptors include SIFT [13], SURF [14], and ORB [15]. Descriptors can easily be combined with different keypoint detectors. In our experiments, we use the detector originally proposed for the descriptor. For SURF, the detection time strongly dominates the runtime; we therefore further analyzed the descriptor in combination with the keypoint detector proposed by Shi and Tomasi [16], which is much faster (though at the price of lower repeatability) than the detector proposed in [14]. We compare the performance of the aforementioned descriptors in our SLAM system in Section IV-B.
Recently introduced RGB-D cameras such as the Microsoft Kinect or the Asus Xtion Pro Live offer a valuable alternative to laser scanners, as they provide dense, high-frequency depth information at a low price, size, and weight. The depth sensor projects structured light in the infrared spectrum, which is perceived by an infrared camera with a small baseline. As structured light sensors are sensitive to illumination, they are generally not applicable in direct sunlight; their application is mostly limited to indoor environments, due to distance limitations and sunlight. Time-of-flight cameras are less sensitive to sunlight but have lower resolutions, are more noisy, more difficult to calibrate, and much more expensive.

The first scientifically published RGB-D SLAM system was proposed by Henry et al. [17], who used visual features in combination with Generalized-ICP [18] to create and optimize a pose graph. Unfortunately, neither the software nor the data used for evaluation have been made publicly available, so that a direct comparison cannot be carried out.

KinectFusion [19] is an impressive approach for surface reconstruction based on a voxel grid that contains the truncated signed distance [20] to the surface. Each measurement is directly fused into the voxel representation. This reduces drift as compared with the frame-to-frame comparisons we employ, yet lacks the capability to recover from accumulating drift by loop closures. Real-time performance is achieved but requires high-performance graphics hardware. The size of the voxel grid has cubic influence on the memory usage, so that KinectFusion only applies to small workspaces. Kintinuous [21] overcomes this limitation by virtually moving the voxel grid with the current camera pose. The parts that are shifted out of the reconstruction volume are triangulated. However, so far, the system cannot deal with loop closures and, therefore, may drift indefinitely. Zeng et al. [22] show that the memory requirements of the voxel grid can be greatly reduced using an octree to store the distance values. Hu et al. [23] recently proposed a SLAM system that switches between bundle adjustment with and without available depth, which makes it more robust to lack of depth information. Our experiments show comparable quality in the trajectory estimation.

Our system has been one of the first SLAM systems specifically designed for Kinect-style sensors. In contrast with other RGB-D SLAM systems, we extensively evaluated the overall system [24], [25] and freely provide an open-source implementation to stimulate scientific comparison and progress. While many of the discussed approaches bear the potential to perform well, they are difficult to compare, because the evaluation data are not available. Therefore, we advocate the use of publicly available benchmarks and developed the TUM RGB-D benchmark [26] that provides several sequences with varying difficulty. It contains synchronized ground truth data for the sensor trajectory of each sequence, captured with a high-precision motion capturing system. Each sequence consists of approximately 500 to 5000 RGB-D frames.

Fig. 2. Even under challenging conditions, a robot's trajectory can be accurately reconstructed for long trajectories using our approach. Here, the vertical deviations are within 20 cm. (Sequence shown: "fr2/pioneer_slam.")

therefore.g. We quickly initialize a transformation estimate from three feature correspondences. Schematic overview of our approach. Subsequently. as these usually contain repetitive structures. we mutually register pairs of image frames and build a pose graph. which can be efficiently used for navigation and manipulation tasks. The alternative validation method proposed in Section III-C is. one computes their distance in the descriptor space. Arandjelovi´c and Zisserman [29] proposed to use the Hellinger kernel to compare SIFT features. For increased robustness in the case of largely missing depth values. the input is an RGB image IRGB and a depth image ID . . in case of few possible matches or many similar features. Thus.g. They report substantial performance improvements for object recognition. However. . B. therefore. by sensor noise. it is crucial to exploit the possible feature matches. dj ). Particularly. Combined with a threshold for the minimum number of matching features for a valid estimate.: 3-D MAPPING WITH AN RGB-D CAMERA 179 Fig. Egomotion Estimation The frontend of our SLAM system uses the sensor input in form of landmark positions Y = y1 . The frontend processes the sensor data to extract geometric relationships. as using sensor data directly would be highly inefficient. a threshold on the ratio between the distances of nearest and second nearest neighbor can be used effectively to control the ratio between false negatives and false positives. as proposed by Chum et al. For larger man-made environments.ENDRES et al. there are different methods that can be applied to compute the motion in between two observations. a task-specific map representation is required. It has been highly beneficial to recursively reestimate the transformation with reducing threshold θ for the inlier determination. window. [31]... The following sections describe the illustrated parts of the system. For SIFT and SURF. which proves to be very effective against individual mismatches. or repetitive wallpapers. To deal with the inherent uncertainty introduced. A schematic representation of the presented system is shown in Fig. the method is limited in its effectiveness. we generate a textured voxel occupancy map using the OctoMapping approach. However.. . Optimization of this graph structure can be used to obtain a maximum likelihood solution for the represented robot trajectory. however. . To be robust against false positive matches.g. the same type of chair. We extract visual features that we associate to 3-D points. For ORB.. The threshold on the minimum number of matches helps against random similarities and repetition of objects with few features. We. we resort to the ratio between the nearest neighbor and the second nearest neighbor in feature space. We determine landmarks by extracting a high-dimensional descriptor vector d ∈ R64 from IRGB and storing them together with y ∈ R3 . e. the distance is not a criterion for association as the distance of matching descriptors can vary greatly. in most applications.g. that is optimized using g 2 o. Finally. the robot’s motion needs to be computed from a sequence of observations. yn to compute geometric relations zij that allow us to estimate the motion of the robot between state xi and xj . the Hamming distance is used. Under the assumption that a keypoint only matches to exactly one other keypoint in another image. their location relative to the observation pose x ∈ R6 . e. As proposed by Lowe. Except for sensors that measure the motion itself as. 
B. Egomotion Estimation

The frontend of our SLAM system uses the sensor input in form of landmark positions Y = y_1, …, y_n to compute geometric relations z_ij that allow us to estimate the motion of the robot between state x_i and state x_j. Visual features ease the data association for the landmarks by providing a measure for similarity. We determine landmarks by extracting a high-dimensional descriptor vector d ∈ R^64 from I_RGB and storing it together with y ∈ R^3, the location of the landmark relative to the observation pose x ∈ R^6.

To match a pair of keypoint descriptors (d_i, d_j), one computes their distance in the descriptor space. For SIFT and SURF, the proposed distance is Euclidean. However, Arandjelović and Zisserman [29] proposed to use the Hellinger kernel to compare SIFT features. They report substantial performance improvements for object recognition. We implemented both distance measures and briefly discuss the impact on accuracy in Section IV-B. For ORB, the Hamming distance is used. By itself, the distance is not a criterion for association, as the distance of matching descriptors can vary greatly. Because of the high dimensionality of the feature space, it is generally not feasible to learn a mapping for a rejection threshold. As proposed by Lowe, we resort to the ratio between the nearest neighbor and the second nearest neighbor in feature space: under the assumption that a keypoint only matches to exactly one other keypoint in another image, the second nearest neighbor should be much further away. Thus, a threshold on the ratio between the distances of nearest and second nearest neighbor can be used effectively to control the ratio between false negatives and false positives. However, in case of few possible matches or many similar features, it is crucial to exploit the possible feature matches. We, therefore, threshold the matches at a permissive ratio.

To be robust against false positive matches, we employ RANSAC [30] when estimating the transformation between two frames, which proves to be very effective against individual mismatches. We quickly initialize a transformation estimate from three feature correspondences. The transformation is verified by computing the inliers using a threshold θ based on the Mahalanobis distance between the corresponding features. For increased robustness in the case of largely missing depth values, we also include features without depth reading into the verification. It has been highly beneficial to recursively reestimate the transformation with reducing threshold θ for the inlier determination, as proposed by Chum et al. [31]. Combined with a threshold for the minimum number of matching features for a valid estimate, this approach works well in many scenarios.

For larger man-made environments, however, the method is limited in its effectiveness, as these usually contain repetitive structures, e.g., the same type of chair, window, or repetitive wallpapers. Given enough similar features through such identical instances, the corresponding feature matches between two images result in the estimation of a bogus transformation. The threshold on the minimum number of matches helps against random similarities and repetition of objects with few features, but our experiments show that setting the threshold high enough to exclude estimates from systematic misassociations comes with a performance penalty in scenarios without the mentioned ambiguities. The alternative validation method proposed in Section III-C is, therefore, a highly beneficial extension.
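To illustrate the matching-and-RANSAC scheme just described, including the least-squares fit [32] used in each iteration, consider the following Python sketch. It is our own minimal rendition, not the released implementation: function names and thresholds are ours, numpy and OpenCV are assumed, and the inlier test uses a plain Euclidean residual where the system uses a Mahalanobis threshold θ.

import numpy as np
import cv2

def match_ratio(desc1, desc2, ratio=0.8):
    # Lowe-style nearest/second-nearest ratio test; permissive ratio as in the text.
    knn = cv2.BFMatcher(cv2.NORM_L2).knnMatch(desc1, desc2, k=2)
    return [(m.queryIdx, m.trainIdx) for m, n in knn if m.distance < ratio * n.distance]

def rigid_fit(P, Q):
    # Least-squares R, t with Q ≈ P @ R.T + t, following the SVD method of [32].
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_transform(pts1, pts2, pairs, iters=200, theta=0.03):
    # Three-point hypotheses; recursive reestimation with a shrinking
    # inlier threshold on the winning hypothesis (cf. [31]).
    if len(pairs) < 3:
        return None
    pairs = np.asarray(pairs)
    best_inl = np.empty((0, 2), dtype=int)
    for _ in range(iters):
        sample = pairs[np.random.choice(len(pairs), 3, replace=False)]
        R, t = rigid_fit(pts1[sample[:, 0]], pts2[sample[:, 1]])
        res = np.linalg.norm(pts1[pairs[:, 0]] @ R.T + t - pts2[pairs[:, 1]], axis=1)
        inl = pairs[res < theta]
        if len(inl) > len(best_inl):
            best_inl = inl
    R = t = None
    for th in (theta, theta / 2, theta / 4):
        if len(best_inl) < 3:
            break
        R, t = rigid_fit(pts1[best_inl[:, 0]], pts2[best_inl[:, 1]])
        res = np.linalg.norm(pts1[best_inl[:, 0]] @ R.T + t - pts2[best_inl[:, 1]], axis=1)
        best_inl = best_inl[res < th]
    return R, t, best_inl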

We use a least-squares estimation method [32] in each iteration of RANSAC to compute the motion estimate from the established 3-D point correspondences. To take the strongly anisotropic uncertainty of the measurements into account, the transformation estimates can be improved by minimizing the squared Mahalanobis distance instead of the squared Euclidean distance between the correspondences. This procedure has also been independently proposed by Henry et al. [17] in his most recent work and is referred to as two-frame sparse bundle adjustment. We implemented this by applying g2o (see Section III-E) after the motion estimation: we optimize a small graph that consists only of the two sensor poses and the previously determined inliers. However, in our experiments, this additional optimization step shows only a slight improvement of the overall trajectory estimates. We also investigated including the landmarks in the global graph optimization, as it has been applied by other researchers. Contrary to our expectations, we could only achieve minor improvements. As the number of landmarks is much higher than the number of poses, the optimization runtime increases substantially.
C. Environment Measurement Model

Given a high percentage of inliers, the discussed methods for egomotion estimation can be assumed successful. However, a low percentage does not necessarily indicate an unsuccessful transformation estimate and could be a consequence of low overlap between the frames or of few visual features, e.g., due to motion blur, occlusions, or lack of texture. Hence, both ICP and RANSAC using feature correspondences lack a reliable failure detection. We, therefore, developed a method to verify a transformation estimate, independent of the estimation method used. Our method exploits the availability of structured, dense depth data, in particular, the contained dense free-space information. We propose the use of a beam-based environment measurement model (EMM). An EMM can be used to penalize pose estimates under which the sensor measurements are improbable given the physical properties of the sensing process. In our case, we employ a beam model to penalize transformations for which observed points of one depth image should have been occluded by a point of the other depth image.

EMMs have been extensively researched in the context of 2-D Monte Carlo localization and SLAM methods [33], where they are used to determine the likelihood of particles on the basis of the current observation. Beam-based models have been mostly used for 2-D range finders such as laser range scanners, where the range readings are evaluated using ray casting in the map. While this is typically done in a 2-D occupancy grid map, a recent adaptation of such a beam-based model for localization in 3-D voxel maps has been proposed by Oßwald et al. [34]. Unfortunately, the EMM cannot be trivially adapted for our purpose. First, due to the size of the input data, it is computationally expensive to compute even a partial 3-D voxel map in every time step. Second, since a beam model only provides a probability density for each beam [33], we still need to find a way to decide whether to accept the transformation based on the observation. The resulting probability density value obtained for each beam neither constitutes an absolute quality measure, nor does the product of the densities of the individual beams. In Monte Carlo methods, the probability density is used as a likelihood value that determines the particle weight; in the resampling step, this weight is used as a comparative measure of quality between the particles. This is not applicable in our context, as we do not perform a comparison between several transformation candidates for one measurement. We, therefore, need to compute an absolute quality measure.

We propose to use a procedure analogous to statistical hypothesis testing. In our case, the null hypothesis being tested is the assumption that, after applying the transformation estimate, spatially corresponding depth measurements stem from the same underlying surface location.

To compute the spatial correspondences for an alignment of two depth images I_D and I'_D, we project the points y'_i of I'_D into I_D to obtain the points y_i (denoted without the prime). The image raster allows for a quick association to a corresponding depth reading y_j. Points that do not project into the image area of I_D or onto a pixel without valid depth reading are ignored. To take the strongly anisotropic measurement noise into account, we model it according to the equations for the covariances given by Khoshelham and Elberink [35], from which we construct the covariance matrix Σ_j for each point y_j. The sensor noise for the points in the second depth image is represented accordingly. To transform the covariance matrix of a point to the coordinate frame of the other sensor pose, we rotate it using R, the rotation matrix of the estimated transformation, i.e., Σ'_i = Rᵀ Σ_i R.

The probability for the observation y_i, given an observation y_j from a second frame, can be computed as

  p(y_i | y_j) = η ∫ p(y_i, y_j | z) p(z) dz,  with η = p(y_j)⁻¹.  (1)

Since the observations are independent given the true obstacle location z, we can rewrite the right-hand side to

  p(y_i | y_j) = η ∫ p(y_i | z) p(y_j | z) p(z) dz  (2)
             = η ∫ N(y_i ; z, Σ_i) N(y_j ; z, Σ_j) p(z) dz.  (3)

Exploiting the symmetry of Gaussians, we can write

             = η ∫ N(z ; y_i, Σ_i) N(z ; y_j, Σ_j) p(z) dz.  (4)

The product of the two normal distributions contained in the integral can be rewritten [36] as

  N(z ; y_i, Σ_i) N(z ; y_j, Σ_j) = N(y_i ; y_j, Σ_ij) N(z ; y_c, Σ̂_ij)  (5)

so that we obtain

  p(y_i | y_j) = η ∫ N(y_i ; y_j, Σ_ij) N(z ; y_c, Σ̂_ij) p(z) dz  (6)

where

  y_c = (Σ_i⁻¹ + Σ_j⁻¹)⁻¹ (Σ_i⁻¹ y_i + Σ_j⁻¹ y_j),  Σ̂_ij = (Σ_i⁻¹ + Σ_j⁻¹)⁻¹  (7)

and

  Σ_ij = Σ_i + Σ_j.  (8)

The first term in the integral in (6) is constant with respect to z, which allows us to move it out of the integral

  p(y_i | y_j) = η N(y_i ; y_j, Σ_ij) ∫ N(z ; y_c, Σ̂_ij) p(z) dz.  (9)

Since we have no prior knowledge about p(z), we assume it to be a uniform distribution. As it is constant, the value of p(z) becomes independent of z, and we can also move it out of the integral. The remaining integral only contains the normal distribution over z and, by the definition of a probability density function, reduces to one, leaving only

  p(y_i | y_j) = η N(y_i ; y_j, Σ_ij) p(z).  (10)

We will see that the posterior distribution remains a proper distribution, despite the choice of an improper prior [37]. Informally speaking, having no prior knowledge about the true obstacle also means that we have no prior knowledge about the measurement. This can be shown by expanding the normalization factor

  η⁻¹ = p(y_j) = ∫ p(y_j | z) p(z) dz  (11)
             = ∫ N(y_j ; z, Σ_j) p(z) dz  (12)

and, using the same reasoning as earlier, we obtain

  p(y_j) = p(z) ∫ N(z ; y_j, Σ_j) dz = p(z)  (13)

and, thus,

  η = p(z)⁻¹.  (14)

Combining (10) and (14), we get the final result

  p(y_i | y_j) = N(y_i ; y_j, Σ_ij).  (15)
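As a concrete reading of (15): given the estimated rotation R and per-point covariances from the sensor model, the squared Mahalanobis distance of Δy = y_i − y_j under Σ_ij decides the inlier status of a data association. The following numpy sketch is ours; the covariance values are placeholders, not the calibrated Kinect model of [35].

import numpy as np

def mahalanobis_sq(y_i, y_j, Sigma_i, Sigma_j, R):
    # Squared Mahalanobis distance of y_i - y_j under
    # Sigma_ij = R^T Sigma_i R + Sigma_j, cf. (8) and (15).
    Sigma_ij = R.T @ Sigma_i @ R + Sigma_j
    d = y_i - y_j
    return float(d @ np.linalg.solve(Sigma_ij, d))

# Toy example with a strongly anisotropic covariance: the depth (z) noise of
# the Kinect grows roughly quadratically with range [35]; the numbers below
# are placeholders, not the calibrated model.
z = 2.0
Sigma = np.diag([1e-4, 1e-4, (0.0019 * z ** 2) ** 2])
d2 = mahalanobis_sq(np.array([0.0, 0.0, 2.01]), np.array([0.0, 0.0, 2.0]),
                    Sigma, Sigma, np.eye(3))
print(d2 < 3.0 ** 2)  # inlier under a three-sigma criterion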
We can combine the aforementioned 3-D distributions of all data associations to a 3N-dimensional normal distribution, where N is the number of data associations. Assuming a perfect alignment and independent measurements yields

  p(ΔY) = N(ΔY | 0, Σ),  ΔY ∈ R^{3N}  (16)

where ΔY = (…, Δy_ijᵀ, …)ᵀ is a column vector containing the N individual terms Δy_ij = y_i − y_j, and Σ contains the corresponding covariance matrices Σ_ij on the (block-) diagonal.

A standard hypothesis test could then be used to reject a transformation estimate at a certain confidence level, by testing the p-value of the Mahalanobis distance of ΔY for a χ²_{3N} distribution (a chi-square distribution with 3N degrees of freedom). Care has to be taken when examining the statistical properties, however: in practice, this test is very sensitive to small errors in the transformation estimate and is, therefore, hardly useful. Even under small misalignments, the outliers at depth jumps will be highly improbable under the given model and will lead to rejection. We, therefore, apply a measure that varies more smoothly with the error of the transformation. Analogously to robust statistics such as the median and the median absolute deviation, we use the hypothesis test on the distributions of the individual observations (15) and compute the fraction of outliers as a criterion to reject a transformation; the fraction of inliers within, e.g., three standard deviations can be computed from the cumulative density function of the normal distribution. The fraction of inliers is independent of the absolute value of the outliers and, thus, smoothly degenerates for increasing errors in the transformation while retaining an intuitive statistical meaning. In our experiments (see Section IV-D), we show that applying a threshold on this fraction allows effective reduction of highly erroneous transformation estimates that would greatly diminish the overall quality of the map.

Fig. 4 illustrates the different cases of associated observations. We assume each point occludes an area of one pixel. Note that the aforementioned formulation contains no additional term for short readings, i.e., range readings that fall into the space between the origin and the measurement, since we expect a static environment during mapping and want to penalize this kind of range reading, as it is our main indication for misalignment. In contrast, range readings that are projected behind the corresponding depth value, e.g., when looking behind an obstacle from a different perspective, are common. We, therefore, do want to use the positive information of "occluded inliers," points projected closely behind the associated beam, which in practice confirm the transformation estimate; evaluated in both projection directions, this effectively doubles the inliers. "Occluded outliers," the points projected far behind the associated beam (e.g., further than three standard deviations), are, however, ignored.

Fig. 4. Two cameras and their observations aligned by the estimated transformation T_AB. Projecting the points observed from camera A to camera B, the data association of y_i and y_j is counted as an inlier. y_p is counted as an outlier, as it falls in the free space between camera A and observation y_q. The projection of y_q cannot be seen from camera B, as it is occluded by y_k. In the reverse projection, the association between y_i and y_j is again counted as an inlier. The last observation y_k is outside of the field of view of camera A and is, therefore, ignored. Hence, the final result of the EMM is two inliers, one outlier, and one occluded.
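A compact sketch of this per-point classification follows. The code is ours: pinhole intrinsics in a 3×3 matrix K are assumed, sigma_fn(z) stands for the sensor's depth-noise model (e.g., from [35]), and only the depth discrepancy is tested, where the system evaluates the full 3-D distribution (15).

import numpy as np

def emm_classify(cloud_src, depth_dst, K, T, sigma_fn, n_sigma=3.0):
    # Classify projected points as inlier / outlier / occluded.
    # cloud_src: (N, 3) points of frame A; T: 4x4 estimate mapping A -> B;
    # depth_dst: HxW depth image of frame B; K: 3x3 camera intrinsics.
    pts = cloud_src @ T[:3, :3].T + T[:3, 3]
    inl = out = occ = 0
    H, W = depth_dst.shape
    for x, y, z in pts:
        if z <= 0:
            continue
        u = int(round(K[0, 0] * x / z + K[0, 2]))
        v = int(round(K[1, 1] * y / z + K[1, 2]))
        if not (0 <= u < W and 0 <= v < H):
            continue                  # outside the image area: ignored
        d = depth_dst[v, u]
        if d <= 0:
            continue                  # no valid depth reading: ignored
        band = n_sigma * sigma_fn(d)
        if abs(z - d) <= band:
            inl += 1                  # same surface (incl. occluded inliers)
        elif z > d + band:
            occ += 1                  # far behind the beam: occluded, ignored
        else:
            out += 1                  # in observed free space: outlier
    return inl, out, occ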

D. Visual Odometry and Loop-Closure Search

Applying an egomotion estimation procedure, such as the one described in Section III-B, between consecutive frames provides visual odometry information. However, the individual estimations are noisy, particularly in situations with few features or when most features are far away or even out of range. Combining several motion estimates, i.e., additionally estimating the transformation to frames other than the direct predecessor, substantially increases accuracy and reduces the drift. Successful transformation estimates to much earlier frames, i.e., loop closures, may drastically reduce the accumulating error. Naturally, this increases the computational expense linearly with the number of estimates. For multicore processors, this is mitigated to a certain degree, since the individual frame-to-frame estimates are independent and can, therefore, be parallelized. However, a comparison of a new frame to all predecessor frames is not feasible: because of the sensor limitations, the possibility of estimating a valid transformation is strongly limited by the overlap of the field of view, the repeatability of the keypoint detector, and the robustness of the keypoint descriptor. We, therefore, require a more efficient strategy to select candidate frames for which to estimate the transformation. Recognition of images in large sets of images has been investigated mostly in the context of image retrieval systems [38], but for large-scale SLAM as well [39]. While these methods may be required for datasets spanning hundreds of kilometers, they require an offline training step to build efficient data structures. In this study, we focus on indoor applications and proposed an efficient, straightforward-to-implement algorithm to suggest candidates for frame-to-frame matching [25].

We employ a strategy with three different types of candidates (a sketch follows below). First, we apply the egomotion estimation to n immediate predecessors. Second, to efficiently reduce the drift, we search for loop closures in the geodesic (graph-) neighborhood of the previous frame. We compute a minimal spanning tree of limited depth from the pose graph, with the sequential predecessor as the root node. We then remove the n immediate predecessors from the tree to avoid duplication and randomly draw l frames from the tree with a bias toward earlier frames. Thereby, once a loop closure is found, e.g., when the robot revisits a place, this procedure exploits the knowledge about the loop by preferring candidates near the loop closure in the sampling; we, therefore, guide the search for potentially successful estimates by those previously found. Finally, to find large loop closures, we randomly sample k frames from a set of designated keyframes. A frame is added to the set of keyframes when it cannot be matched to the previous keyframe. This way, the number of frames for sampling is greatly reduced, while the field of view of the frames in between keyframes always overlaps with at least one keyframe.

The choice of these parameters is crucial for the performance of the system. For short, feature-rich sequences, low values can be set, as done for "fr1/desk." For longer sequences, the values need to be increased. Table I exemplarily states the parameterization for a subset of our experiments.

TABLE I
DETAILED RESULTS OBTAINED WITH THE PRESENTED SYSTEM

Fig. 5 shows a comparison between a pose graph constructed without and with sampling of the geodesic neighborhood. On the challenging "Robot SLAM" dataset, the average error is reduced by 26%.

Fig. 5. Pose graph for the sequence "fr1/floor." (Top) Transformation estimation to consecutive predecessors and randomly sampled frames. (Bottom) Additional exploitation of previously found matches using the geodesic neighborhood. The extension of found loop closures is clearly visible. The top graph has been created by matching n = 3 immediate predecessors and k = 6 randomly sampled keyframes. The bottom graph has been created with n = 2, k = 5, and l = 2 sampled frames from the geodesic neighborhood. In both runs, the same overall number of candidate frames for frame-to-frame matching were processed.
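The candidate-selection strategy can be sketched as follows. This is our own simplified rendition: the pose graph is a plain adjacency dictionary, breadth-first search stands in for the limited-depth spanning tree, and the bias weights are illustrative.

import random
from collections import deque

def matching_candidates(graph, prev_frame, keyframes, n=2, l=2, k=5, depth_limit=6):
    # Three candidate types: n immediate predecessors, l frames from the
    # geodesic (graph) neighborhood, k randomly sampled keyframes.
    # graph: {frame_id: set(neighbor_ids)} for frames seen so far.
    preds = list(range(max(0, prev_frame - n + 1), prev_frame + 1))

    # BFS from the sequential predecessor approximates the spanning tree
    # of limited depth used to define the geodesic neighborhood.
    seen, frontier, neighborhood = {prev_frame}, deque([(prev_frame, 0)]), []
    while frontier:
        node, d = frontier.popleft()
        if d >= depth_limit:
            continue
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                neighborhood.append(nb)
                frontier.append((nb, d + 1))
    pool = [f for f in neighborhood if f not in preds]       # avoid duplication
    weights = [max(1, prev_frame - f + 1) for f in pool]     # bias to earlier frames
    geo = random.choices(pool, weights=weights, k=min(l, len(pool))) if pool else []

    kf_pool = [f for f in keyframes if f not in preds and f not in geo]
    sampled_kf = random.sample(kf_pool, min(k, len(kf_pool)))
    return preds, geo, sampled_kf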
E. Graph Optimization

The pairwise transformation estimates between sensor poses, as computed by the SLAM frontend, form the edges of a pose graph. Because of estimation errors, the edges form no globally consistent trajectory. To compute a globally consistent trajectory, we optimize the pose graph using the g2o framework [40], which performs a minimization of a nonlinear error function that can be represented as a graph. More precisely, we minimize an error function of the form

  F(X) = Σ_{⟨i,j⟩∈C} e(x_i, x_j, z_ij)ᵀ Ω_ij e(x_i, x_j, z_ij)  (17)

to find the optimal trajectory X* = argmin_X F(X). Here, X = (x_1ᵀ, …, x_nᵀ)ᵀ is a vector of sensor poses, and the terms z_ij and Ω_ij represent, respectively, the mean and the information matrix of a constraint relating the poses x_i and x_j, i.e., the pairwise transformation computed by the frontend. Finally, e(x_i, x_j, z_ij) is a vector error function that measures how well the poses x_i and x_j satisfy the constraint z_ij. It is 0 when x_i and x_j perfectly match the constraint, i.e., when the difference of the poses exactly matches the estimated transformation.
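A numeric reading of (17) may help. In the sketch below (ours; poses and constraints as 4×4 homogeneous matrices, rotational residual via a small-angle approximation of the matrix logarithm), the function is only evaluated; g2o [40] additionally minimizes it with sparse nonlinear least squares.

import numpy as np

def relative_pose_error(xi, xj, zij):
    # e(xi, xj, zij): discrepancy between the predicted relative pose
    # inv(xi) @ xj and the measured constraint zij, as a 6-vector
    # (translation part plus a small-angle rotation approximation).
    E = np.linalg.inv(zij) @ np.linalg.inv(xi) @ xj
    rot = 0.5 * np.array([E[2, 1] - E[1, 2],
                          E[0, 2] - E[2, 0],
                          E[1, 0] - E[0, 1]])  # ~log of R for small errors
    return np.concatenate([E[:3, 3], rot])

def graph_error(poses, constraints):
    # F(X) from (17): sum of squared, information-weighted residuals.
    # constraints: iterable of (i, j, zij, omega) with omega a 6x6 matrix.
    total = 0.0
    for i, j, zij, omega in constraints:
        e = relative_pose_error(poses[i], poses[j], zij)
        total += e @ omega @ e
    return total

For a perfectly satisfied constraint, E is the identity and the residual vanishes, matching the definition of e(x_i, x_j, z_ij) above.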

Global optimization is especially beneficial in case of large loop closures, i.e., when revisiting known parts of the map, since the loop-closing edges in the graph diminish the accumulated error. Unfortunately, large errors in the motion estimation step can impede the accuracy of large parts of the graph. This is primarily a problem in areas of systematic misassociation of features, e.g., due to repeated occurrences of objects; in such cases, the trajectory estimate obtained after graph optimization may be highly distorted. The validation method proposed in Section III-C substantially improves the rate of faulty transformation estimates; however, the validity of transformations cannot be guaranteed in every case. The residual error of the graph after optimization allows us to determine inconsistencies in edges. We, therefore, use a threshold on the summands of (17) to prune edges with high error values after the initial convergence and continue the optimization. For challenging data where several bogus transformations are found, this pruning, in combination with the EMM, substantially improves the final estimate. Fig. 10 shows the effectiveness of this approach.

F. Map Representation

The system described so far computes a globally consistent trajectory. Using this trajectory, we can project the original point measurements into a common coordinate frame, thereby creating a point cloud representation of the world. Adding the sensor viewpoint to each point, a surfel map can be created. Such models, however, are highly redundant and require vast computational and memory resources: each frame added to a point cloud map requires approximately 3.6 MB in memory, and an unfiltered map constructed from the benchmark data used in our experiments would require between 2 and 5 GB. The memory-efficient 2.5D representation in a depth image, on the other hand, cannot be used to store a complete map. Therefore, in most applications, the point clouds are subsampled, e.g., using a voxel grid.

To overcome the limitations of point cloud representations, we use 3-D occupancy grid maps to represent the environment. In our implementation, we use the octree-based mapping framework OctoMap [41]. The voxels are managed in an efficient tree structure that leads to a compact memory representation and inherently allows for map queries at multiple resolutions. The use of probabilistic occupancy estimation furthermore provides a means of coping with noisy measurements and errors in pose estimation. A crucial advantage, in contrast with a point-based representation, is the explicit representation of free space and unmapped areas, which is essential for collision avoidance and exploration tasks. The corresponding OctoMaps for our experiments, with a resolution of 2 cm, range from only 4.2 to 25 MB. A further reduction to an average of a few hundred kilobytes can be achieved if the maps are stored binary (i.e., only "free" versus "occupied").

On the downside, the creation of an OctoMap requires more computational resources, since every depth measurement is raycasted into the map. The time required per frame is highly dependent on the voxel size, as the number of traversed voxels per ray increases with the resolution. Raycasting takes about one second per 100 000 points at a voxel size of 5 cm on a single core. At a 5-mm resolution, as in Fig. 1, raycasting a single RGB-D frame took about 25 s on the mentioned hardware. Note that a voxel map cannot be updated efficiently in case of major corrections of the past trajectory as obtained by large loop closures. It is, therefore, reasonable to recreate the map in the case of such an event. Nevertheless, the presented system has also been successfully used for online mapping. For online generation of a voxel map, we need to lower the resolution and raycast only a subset of the cloud. In our experiments, using a resolution of 10 cm and a subsampling factor of 16 allowed for 30-Hz updates of the map and resulted in maps suitable for online navigation. (A toy version of the underlying occupancy update follows below.)
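The occupancy update behind OctoMap [41] reduces to log-odds increments along each measurement ray. The following dictionary-backed toy version is ours; OctoMap's actual octree storage, clamping policy, and exact ray traversal are considerably more refined.

import numpy as np

L_OCC, L_FREE = 0.85, -0.4   # log-odds increments (illustrative values)

def integrate_ray(grid, origin, endpoint, voxel=0.05):
    # Mark voxels along origin -> endpoint as free, the end voxel as occupied.
    # grid: dict mapping integer voxel index triples to log-odds values.
    o, e = np.asarray(origin) / voxel, np.asarray(endpoint) / voxel
    n = int(np.ceil(np.linalg.norm(e - o))) + 1
    for s in np.linspace(0.0, 1.0, n, endpoint=False):  # uniform ray sampling
        key = tuple(np.floor(o + s * (e - o)).astype(int))
        grid[key] = grid.get(key, 0.0) + L_FREE
    end_key = tuple(np.floor(e).astype(int))
    grid[end_key] = grid.get(end_key, 0.0) + L_OCC

def occupancy(grid, key):
    # Recover the occupancy probability from the stored log-odds.
    l = grid.get(key, 0.0)
    return 1.0 - 1.0 / (1.0 + np.exp(l))

The log-odds form makes each update a constant-time addition, which is why the per-frame cost is dominated by the number of traversed voxels, as noted above.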
IV. EXPERIMENTAL EVALUATION

For a 3-D SLAM system that aims to be robust in real-world applications, there are many parameters and design choices that influence the overall performance. To determine the influence of each parameter on the overall performance of our system, we employ the RGB-D benchmark [26]. In the following, we analyze the presented system by means of the quantitative results of our experiments. We first describe the benchmark datasets and the error metric; subsequently, we evaluate the individual components and parameters. We used an Intel Core i7 CPU with 3.40 GHz and an nVidia GeForce GTX 570 graphics card for all experiments. Note that the presented results were obtained offline, processing every recorded frame, which makes the qualitative results independent of the used hardware.

A. RGB-D Benchmark Datasets

The RGB-D benchmark provides an RGB-D dataset of several sequences captured with two Microsoft Kinect and one Asus Xtion Pro Live sensor. Synchronized ground truth data for the sensor trajectory, captured with a high-precision motion capturing system, is available for all sequences. The benchmark also provides evaluation tools to compute several error metrics given an estimated trajectory. We use the root-mean-square of the absolute trajectory error (ATE) in our experiments, which measures the deviation of the estimated trajectory from the ground truth trajectory. For a trajectory estimate X̂ = {x̂_1, …, x̂_n} and the corresponding ground truth X, it is defined as

  ATE_RMSE(X̂, X) = √( (1/n) Σ_{i=1}^{n} ‖trans(x̂_i) − trans(x_i)‖² )  (18)

i.e., the root-mean-square of the Euclidean distances between the corresponding ground truth and estimated poses. The correspondences of poses are established using the timestamps. To make the error metric independent of the coordinate system in which the trajectories are expressed, the trajectories are aligned such that the aforementioned error is minimal. The error metric chosen in this study does not directly assess the quality of the map; the map error and the trajectory error of a specific dataset strongly depend on the given scene and the definition of the respective error functions. However, for most sequences in the used dataset, it is reasonable to assume that the error in the map will be directly related to the error of the trajectory.
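The metric (18), including the alignment step, fits in a few lines of numpy. The sketch below is ours and uses the SVD-based alignment of [32]; the benchmark's own evaluation tools should be preferred for reported numbers.

import numpy as np

def ate_rmse(est, gt):
    # Root-mean-square ATE (18) after rigidly aligning the estimated
    # positions to the ground truth (SVD method of [32]).
    # est, gt: (n, 3) arrays of corresponding translations trans(x_i).
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    U, _, Vt = np.linalg.svd((est - mu_e).T @ (gt - mu_g))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    aligned = (est - mu_e) @ R.T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))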

The data sequences cover a wide range of challenges. The "fr1" set of sequences contains, for example, fast camera motions, motion blur, quickly changing lighting conditions, and short-term absence of salient visual features. Overall, however, the scenario is office-sized and rich on features. Fig. 6 shows a voxel map our system created for the "fr1/desk" sequence.

Fig. 6. Occupancy voxel representation of the sequence "fr1/desk" with 1 cm³ voxel size. Occupied voxels are represented with color for easier viewing.

To emphasize the robustness of our system, we further present an evaluation on the four sequences of the "Robot SLAM" category, where the Kinect was mounted on a Pioneer 3 robot. In addition to the aforementioned difficulties, these sequences combine many properties that are representative of highly challenging input. Recorded in an industrial hall, the floor contains few distinctive visual features. Because of the size of the hall and the comparatively short maximum range of the Kinect, the sequences contain stretches with hardly any visual features with depth measurements. Furthermore, the occurrence of repeated instances of objects of the same kind can easily lead to faulty associations: the repeatedly occurring poles have a spiral pattern that, similar to a barber's pole, suggests a vertical motion when viewed from a different angle. Some objects, like cables and tripods, have a very thin structure, such that they are only visible in the RGB image, yet do not occur in the depth image, resulting in features with wrong depth information. The sequences also contain short periods of sensor outage. Note that, even though the sequences contain wheel odometry and the motion is roughly restricted to a plane, we make no use of any additional information in the presented experiments. Fig. 2 shows a 2-D projection of the ground truth trajectory for the "fr2/pioneer_slam" sequence and a corresponding estimate of our approach. Further information about the dataset can be found on the benchmark's web page (http://vision.in.tum.de/data/datasets/rgbd-dataset).

Recently, RGB-D datasets captured at the MIT Stata center have been made available (http://projects.csail.mit.edu/stata/). Results obtained with our system for a 229-m-long sequence of this dataset are given in Table I. (Correspondences between recorded streams and between estimate and ground truth are established via timestamps; a nearest-timestamp association sketch follows below.)
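The nearest-timestamp association commonly used with this benchmark can be sketched as follows (our own rendition; the benchmark ships an equivalent tool, and the tolerance value is illustrative).

def associate(stamps_a, stamps_b, max_dt=0.02):
    # Greedily pair two sorted timestamp lists when entries differ by at
    # most max_dt seconds; returns (i, j) index pairs.
    pairs, j = [], 0
    for i, ta in enumerate(stamps_a):
        while j + 1 < len(stamps_b) and \
                abs(stamps_b[j + 1] - ta) <= abs(stamps_b[j] - ta):
            j += 1
        if abs(stamps_b[j] - ta) <= max_dt:
            pairs.append((i, j))
    return pairs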
B. Visual Features

One of the most influential choices for accuracy and runtime performance is the used feature detector and descriptor. We evaluated SIFT, SURF, ORB, and a combination of the Shi–Tomasi detector with SURF descriptors. The keypoint detectors and descriptors offer different tradeoffs between accuracy and processing times. The plots in Fig. 7 show a performance comparison on the "fr1" dataset; the results clearly show that each feature offers a tradeoff for a different use case. With an average error of about 15 cm on the "fr1" dataset, ORB and the combination of Shi–Tomasi and SURF may be good choices for robots with limited computational resources or for applications that require real-time performance (i.e., at the sensor rate of 30 Hz). However, the extraction speed of these options comes at the price of reduced accuracy and robustness, which makes them applicable only in benign scenarios or with additional sensing. If a GPU is available, SIFT is clearly the best choice, as it provides the highest accuracy (median RMSE of 0.04 m). We use the OpenCV implementations in our implementation, except for SIFT, where we employ a GPU-based implementation [42].

Fig. 7. Evaluation of (left) accuracy and (right) runtime with respect to feature type. Timings do not include graph optimization.

Another influential choice is the number of features extracted per frame. In our experiments, increasing the number of features until about 600 to 700 improves the accuracy; no noticeable impact on accuracy was obtained when using more features.

The aforementioned results were computed on the nine sequences of the "fr1" dataset with four different parameterizations for each sequence. Fig. 8 shows the detailed results per sequence, together with the computed ATE root-mean-square error.

Fig. 8. Evaluation of the accuracy of the proposed system on the sequences of the "fr1" dataset using SIFT features. The plot has been generated from 288 evaluation runs using different parameter sets. The runs represent various parameterizations for the required EMM inlier fraction [as in Fig. 9(b)], feature count (600–1200), and matching candidates (18–36).

We also evaluated the performance impact of matching SIFT and SURF descriptors with the Hellinger distance instead of the Euclidean distance, as recently proposed in the context of object recognition [29]. In our experiments, we could observe that the improved matching of features led to improvements of up to 25.8% for some datasets. However, for most sequences in the used dataset, the improvement was not significant, as the error is dominated by effects other than those from erroneous feature matching. As the change in distance measure neither increases the runtime nor the memory requirements noticeably, we suggest the adoption of the Hellinger distance. (A sketch follows below.)
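The Hellinger comparison of SIFT descriptors [29] amounts to L1-normalizing each descriptor and taking elementwise square roots, after which the ordinary Euclidean distance corresponds to the Hellinger kernel on the originals:

import numpy as np

def rootsift(desc, eps=1e-12):
    # Map SIFT descriptors so that Euclidean distance between the results
    # corresponds to the Hellinger kernel on the original descriptors [29].
    desc = np.asarray(desc, dtype=np.float64)
    desc = desc / (np.abs(desc).sum(axis=1, keepdims=True) + eps)  # L1 normalize
    return np.sqrt(desc)

Matching then proceeds unchanged, e.g., with the ratio test sketched in Section III-B, applied to rootsift(desc1) and rootsift(desc2), which is why the switch costs essentially no runtime or memory.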

C. Graph Optimization

The graph optimization backend is a crucial part of our SLAM system. It significantly reduces the drift in the trajectory estimate. However, in some cases, graph optimization may also distort the trajectory. Common causes for this are wrong "loop closures" due to repeated structure in the scenario, or highly erroneous transformation estimates due to systematically wrong depth information, e.g., in the case of thin structures. Increased robustness can be achieved by detecting transformations that are inconsistent with other estimates. We do this by pruning edges after optimization based on the Mahalanobis distance obtained from g2o, as described in Section III-E. Most recently, similar approaches for discounting edges during optimization have been published [43], [44].

Fig. 10 shows an evaluation of edge pruning on the "fr2/slam" sequence. As can be seen from the green boxes, the use of the EMM substantially reduces the error. The boxes to the right show that the accuracy is drastically improved by pruning erroneous edges. Pruning of graph edges based on the statistical error thus also proves to be effective, particularly in combination with the EMM. In a challenging scenario ("Pioneer SLAM"), which contains several of the mentioned pitfalls, the best performance is achieved using both rejection methods, i.e., the EMM in the SLAM frontend as well as the pruning of edges in the SLAM backend.

Fig. 10. Evaluation of edge pruning on the "fr2/slam" sequence. The plot has been generated from 240 runs and shows the successive application of the named techniques.

Since the main computational cost of optimization lies in solving a system of linear equations, we investigated the effect of the used solver. g2o provides three solvers, two of which are based on Cholesky decomposition (CHOLMOD, CSparse) and one that implements preconditioned conjugate gradient (PCG). The runtime of PCG drastically decreases given a good initialization; in online operation, this is usually given by the previous optimization step. PCG is, therefore, ideal for online operation, except when large loop closures cause major changes in the shape of the graph. However, CHOLMOD and CSparse are less dependent on the initial guess than PCG, both in terms of accuracy and computation time; the results from CHOLMOD and CSparse are, therefore, more reliable. For the presented experiments, we optimize the pose graph offline using CSparse.
D. Environment Measurement Model

In this section, we describe the implementation of the EMM that is proposed in Section III-C and evaluate the increase in robustness. Our system tries to estimate the transformation of captured RGB-D frames to a selection of previous frames, as described in Section III-D. After computing such a transformation, we compute the number of inliers, outliers, and occluded points. We determine the inliers by applying the three-sigma rule to (15): points projected within a Mahalanobis distance of three are counted as inliers. Outliers are classified as occluded if they are projected behind the corresponding measurement. We evaluate both the projection of the points in the new image to the depth image of the older frame and vice versa. The data association between projected point and beam is not symmetric: a point projected outside of the image area of the other frame has no association, while, in the reversed process, it could occlude or be occluded by a projected point.

To reduce the requirements on runtime and memory, we subsample the depth image. We construct our point cloud using only every eighth row and column of the depth image, effectively reducing the cloud to be stored to a resolution of 80 by 60. This also decreases the statistical dependence between the measurements. In our experiments, the average runtime for the bidirectional EMM evaluation was 0.82 ms. The achieved frame rates are similar for all sequences: the median frame rate for the above experiments is 13.1 Hz, with a minimum of 9.4 Hz and a maximum of 16.0 Hz.

We compute the quality q of the point cloud alignment using the number of inliers I and the sum of inliers and outliers I + O as q = I/(I + O). To avoid accepting transformations with nearly no overlap, we also require the inliers to be at least 25% of the observed points. (A sketch of this acceptance rule follows below.)
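Combining the counts from the classification sketch in Section III-C with the rule just stated gives a one-function acceptance test. The threshold q_min is a tunable parameter; the experiments below sweep it, so the default here is only a placeholder.

def accept_transformation(inliers, outliers, observed, q_min=0.75, min_overlap=0.25):
    # EMM acceptance test: occluded points enter neither I nor O.
    if inliers + outliers == 0:
        return False
    q = inliers / (inliers + outliers)
    return q >= q_min and inliers >= min_overlap * observed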

Fig. 9. (a) Evaluation of the accuracy of the presented system on the sequences of the "Robot SLAM" dataset. (b) Evaluation of the proposed EMM for the "Robot SLAM" scenarios of the RGB-D benchmark for various quality thresholds. The value 0.0 on the horizontal axis represents the case where no EMM has been used.

To evaluate the effect of rejecting transformations with the EMM, we first ran the system repeatedly with eight minimum values for the quality q on the "fr1" dataset. To avoid reporting results depending on a specific parameter setting, we show statistics over many trials with varied parameter settings. A q-threshold from 0.25 to 0.9 results in a minor improvement over the baseline (without EMM); the use of the EMM decreases the average error for thresholds on the quality measure up to 0.9. For thresholds above 0.95, the robustness decreases: while most trials remain unaffected, the system performs substantially worse in a number of trials. We conclude from these experiments that the error in the "fr1" dataset does not stem from individual misalignments for which alternative, higher precision alignments would be available; in this case, the EMM-based rejection will provide no substantial gain, as it can only filter the estimates. Analogous to this finding, pruning edges based on the Mahalanobis distance in the graph does not improve the results for the "fr1" dataset; the number of edges of the pose graph is only minimally reduced, and the overall runtime increases slightly due to the additional computations.

In contrast, the same evaluation on the four sequences of the "Robot SLAM" category results in greatly increased accuracy and robustness. As apparent in Fig. 9(b), the application of the EMM leads to greatly improved SLAM results: the rejection of inaccurate estimates and wrong associations significantly reduces the error in the final trajectory estimates.

V. CONCLUSION

In this paper, we presented a novel 3-D SLAM system for RGB-D sensors such as the Microsoft Kinect. Our approach extracts visual keypoints from the color images and uses the depth images to localize them in 3-D. We use RANSAC to estimate the transformations between associated keypoints and optimize the pose graph using nonlinear optimization. Finally, we generate a volumetric 3-D map of the environment that can be used for robot localization, navigation, and path planning. To improve the reliability of the transformation estimates, we introduced a beam-based EMM that allows us to evaluate the quality of a frame-to-frame estimate. By rejecting highly inaccurate estimates based on this quality measure, our approach can robustly deal with highly challenging scenarios. We performed a detailed experimental evaluation of all components and parameters based on a publicly available RGB-D benchmark and characterized the expected error in different types of scenes. We furthermore provided detailed information about the properties of an RGB-D SLAM system that are critical for its performance. To allow other researchers to reproduce our results, to improve on them, and to build upon them, we have fully released all source code required to run and evaluate the RGB-D SLAM system as open source.
REFERENCES

[1] F. Endres. (2013). [Online]. Available: http://ros.org/wiki/rgbdslam
[2] P. Besl and H. McKay, "A method for registration of 3-D shapes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, Feb. 1992.
[3] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in Proc. Int. Conf. 3-D Digital Imaging and Modeling, 2001.
[4] A. Fitzgibbon, "Robust registration of 2D and 3D point sets," Image Vision Comput., vol. 21, no. 13–14, pp. 1145–1153, 2003.
[5] A. Censi, "An ICP variant using a point-to-line metric," in Proc. IEEE Int. Conf. Robot. Autom., Pasadena, CA, USA, May 2008, pp. 19–25.
[6] J. Roewekaemper, C. Sprunk, G. D. Tipaldi, C. Stachniss, P. Pfaff, and W. Burgard, "On the position accuracy of mobile robot localization based on particle filters combined with scan matching," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2012, pp. 3158–3164.
[7] S. Grzonka, G. Grisetti, and W. Burgard, "A fully autonomous indoor quadrotor," IEEE Trans. Robot., vol. 28, no. 1, pp. 90–100, Feb. 2012.
[8] A. Davison, "Real-time simultaneous localisation and mapping with a single camera," in Proc. IEEE Int. Conf. Comput. Vis., 2003, pp. 1403–1410.
[9] G. Klein and D. Murray, "Parallel tracking and mapping for small AR workspaces," in Proc. IEEE ACM Int. Symp. Mixed Augmented Reality, Nara, Japan, 2007, pp. 225–235.
[10] H. Strasdat, J. Montiel, and A. Davison, "Scale drift-aware large scale monocular SLAM," presented at the Robotics: Sci. Syst. Conf., Zaragoza, Spain, 2010.
[11] H. Jin, P. Favaro, and S. Soatto, "Real-time 3-D motion and structure of point-features: A front-end for vision-based control and interaction," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2000, pp. 778–779.
[12] D. Nister, "Preemptive RANSAC for live structure and motion estimation," in Proc. IEEE Int. Conf. Comput. Vis., 2003, pp. 199–206.
[13] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[14] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Comput. Vis. Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[15] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in Proc. IEEE Int. Conf. Comput. Vis., 2011, pp. 2564–2571.
[16] J. Shi and C. Tomasi, "Good features to track," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1994, pp. 593–600.
[17] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, "RGB-D mapping: Using kinect-style depth cameras for dense 3D modeling of indoor environments," Int. J. Robot. Res., vol. 31, no. 5, pp. 647–663, Apr. 2012.
[18] A. Segal, D. Haehnel, and S. Thrun, "Generalized-ICP," presented at the Robotics: Sci. Syst. Conf., Seattle, WA, USA, Jun. 2009.
[19] R. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, "KinectFusion: Real-time dense surface mapping and tracking," in Proc. IEEE Int. Symp. Mixed Augmented Reality, 2011, pp. 127–136.
[20] B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in Proc. ACM Special Interest Group Graph. Interactive Techn., 1996, pp. 303–312.
[21] T. Whelan, H. Johannsson, M. Kaess, J. Leonard, and J. McDonald, "Robust real-time visual odometry for dense RGB-D mapping," presented at the IEEE Int. Conf. Robot. Autom., Karlsruhe, Germany, May 2013.
[22] M. Zeng, F. Zhao, J. Zheng, and X. Liu, "Octree-based fusion for realtime 3D reconstruction," Graph. Models, vol. 75, 2013.
[23] G. Hu, S. Huang, L. Zhao, A. Alempijevic, and G. Dissanayake, "A robust RGB-D SLAM algorithm," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2012, pp. 1714–1719.

[24] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard, "An evaluation of the RGB-D SLAM system," in Proc. IEEE Int. Conf. Robot. Autom., St. Paul, MN, USA, May 2012, pp. 1691–1696.
[25] F. Endres, J. Hess, N. Engelhard, J. Sturm, and W. Burgard, "6D visual SLAM for RGB-D sensors," Automatisierungstechnik, vol. 60, pp. 270–278, May 2012.
[26] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A benchmark for the evaluation of RGB-D SLAM systems," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Vilamoura, Portugal, Oct. 2012, pp. 573–580.
[27] H. Durrant-Whyte and T. Bailey, "Simultaneous localization and mapping: Part I," IEEE Robot. Autom. Mag., vol. 13, no. 2, pp. 99–110, Jun. 2006.
[28] T. Bailey and H. Durrant-Whyte, "Simultaneous localization and mapping (SLAM): Part II," IEEE Robot. Autom. Mag., vol. 13, no. 3, pp. 108–117, Sep. 2006.
[29] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2012, pp. 2911–2918.
[30] M. Fischler and R. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, pp. 381–395, 1981.
[31] O. Chum, J. Matas, and J. Kittler, "Locally optimized RANSAC," in Pattern Recognition. New York, NY, USA: Springer-Verlag, 2003, pp. 236–243.
[32] S. Umeyama, "Least-squares estimation of transformation parameters between two point patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 4, pp. 376–380, Apr. 1991.
[33] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA, USA: MIT Press, 2005.
[34] S. Oßwald, A. Hornung, and M. Bennewitz, "Improved proposals for highly accurate localization using range and vision data," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Vilamoura, Portugal, 2012, pp. 1809–1814.
[35] K. Khoshelham and S. Elberink, "Accuracy and resolution of kinect depth data for indoor mapping applications," Sensors, vol. 12, no. 2, pp. 1437–1454, 2012.
[36] K. Petersen and M. Pedersen, "The Matrix Cookbook," Oct. 2008. [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?3274
[37] C. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag, 2006.
[38] D. Nister and H. Stewenius, "Scalable recognition with a vocabulary tree," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Washington, DC, USA, 2006, pp. 2161–2168.
[39] M. Cummins and P. Newman, "Appearance-only SLAM at large scale with FAB-MAP 2.0," Int. J. Robot. Res., vol. 30, pp. 1100–1123, 2011.
[40] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, "G2o: A general framework for graph optimization," in Proc. IEEE Int. Conf. Robot. Autom., Shanghai, China, May 2011, pp. 3607–3613.
[41] A. Hornung, K. Wurm, M. Bennewitz, C. Stachniss, and W. Burgard, "OctoMap: An efficient probabilistic 3D mapping framework based on octrees," Auton. Robots, pp. 189–206, 2013.
[42] C. Wu. (2007). SiftGPU: A GPU implementation of scale invariant feature transform (SIFT). [Online]. Available: http://cs.unc.edu/ccwu/siftgpu
[43] P. Agarwal, G. D. Tipaldi, L. Spinello, C. Stachniss, and W. Burgard, "Robust map optimization using dynamic covariance scaling," in Proc. IEEE Int. Conf. Robot. Autom., Karlsruhe, Germany, May 2013.
[44] E. Olson and P. Agarwal, "Inference on networks of mixtures for robust robot mapping," presented at the Robotics: Sci. Syst. Conf., Jul. 2012.
Felix Endres received the Master degree in applied computer science from the University of Freiburg, Freiburg, Germany, where he is currently working toward the Ph.D. degree with the Autonomous Intelligent Systems Lab headed by W. Burgard. His research interests include 3-D perception, simultaneous localization and mapping systems, and the learning of manipulation skills by interaction.

Jürgen Hess received the Master degree in computer science from the University of Freiburg, Freiburg, Germany, in 2008. Since 2009, he has been working toward the Ph.D. degree with the Autonomous Intelligent Systems Lab headed by W. Burgard. His research interests include robot manipulation, surface coverage, and 3-D perception.

Jürgen Sturm received the Ph.D. degree from the Autonomous Intelligent Systems Lab headed by Prof. W. Burgard, Department of Computer Science, University of Freiburg, Freiburg, Germany. He is currently a Postdoctoral Researcher with the Computer Vision and Pattern Recognition group of Prof. D. Cremers, Department of Computer Science, Technische Universität München, Munich, Germany. His major research interests include dense localization and mapping, 3-D reconstruction, and visual navigation for autonomous quadrocopters. While working toward the Ph.D. degree, he developed novel approaches to estimate kinematic models of articulated objects and mobile manipulators, as well as approaches for tactile sensing and imitation learning. Dr. Sturm received the European Coordinating Committee for Artificial Intelligence (ECCAI) Artificial Intelligence Dissertation Award 2011 for his Ph.D. thesis and was shortlisted for the European Robotics Research Network (EURON) Georges Giralt Award 2012.

Daniel Cremers received the Bachelor degree in mathematics and physics and the Master degree in theoretical physics from the University of Heidelberg, Heidelberg, Germany, in 1994 and 1997, respectively, and the Ph.D. degree in computer science from the University of Mannheim, Mannheim, Germany, in 2002. Subsequently, he spent two years as a Postdoctoral Researcher with the University of California at Los Angeles (UCLA), Los Angeles, CA, USA, and one year as a Permanent Researcher with Siemens Corporate Research, Princeton, NJ, USA. From 2005 to 2009, he was an Associate Professor with the University of Bonn, Bonn, Germany. Since 2009, he has been the Chair of Computer Vision and Pattern Recognition with the Department of Computer Science, Technische Universität München, Munich, Germany. Dr. Cremers has received several awards for his publications, including the award of Best Paper of the Year 2003 from the International Pattern Recognition Society and the 2005 UCLA Chancellor's Award for Postdoctoral Research. In December 2010, the magazine Capital listed him among "Germany's Top 40 Researchers Below 40."

Wolfram Burgard received the Ph.D. degree in computer science from the University of Bonn, Bonn, Germany, in 1991. He is currently a Professor of computer science with the University of Freiburg, Freiburg, Germany, where he heads the Laboratory for Autonomous Intelligent Systems. He is also the Spokesperson of the Research Training Group Embedded Microsystems and the Cluster of Excellence BrainLinks-BrainTools. His areas of interest include artificial intelligence and mobile robots. In the past, he and his group developed several innovative probabilistic techniques for robot navigation and control, covering different aspects such as localization, map-building, path-planning, and exploration. Dr. Burgard has received several best paper awards from outstanding national and international conferences for his work. In 2009, he received the Gottfried Wilhelm Leibniz Prize, the most prestigious German research award. In 2010, he received the Advanced Grant of the European Research Council.