Professional Documents
Culture Documents
4
System Integration, Sapporo Convention Center, Sapporo, Japan,
December 13-15, 2016
Abstract— For a mobile robot to explore in an unknown mation matrix for alignment of two point cloud. The ICP
environment, the robot needs to do Simultaneous Localization algorithm is iteratively applied to consecutive point cloud
and Mapping (SLAM). Recently, RGB-D sensor such as the data from the RGB-D sensor. The map of the environment
Microsoft Kinect has been extensively used as a low cost 3D
sensor measurement and mapping the environment. The advan- can then be constructed by simultaneous tracking the motion
tage of RGB-D sensor is that it can capture RGB image with of the sensor.
per-pixel depth information. One disadvantage for this sensor is Almost all sensors are affected by noise and many re-
that its depth information is very noisy. Moreover, if the robot searchers in the past have studied the effect of sensor noise
equipped with the Kinect moves long distances, the localization on mapping and localization [9], [10]. For short distances
error gets accumulated over time. The main aim of this study is
to reduce the error of the SLAM using RGB-D sensor, by taking the noise from the sensor can be neglected. However, if
into account the noise characteristics of the Kinect sensor. The the RGB-D sensor moves long distances, the number of
noise characteristics are investigated by experiments taking into point cloud data become large, and the error from mapping
account the RGB and depth information. A noise model is and localization gets accumulated over time. To avoid the
derived as a function of both distance and angle to the observed accumulation of error, loop closure is often used. Loop
surface. The noise model is applied to Iterative Closest Point
(ICP) algorithm for SLAM. Finally, the results of the proposed closure is an algorithm that detects whether the scene has
SLAM are shown and the improvement in accuracy is verified. already been seen and if so, it can correct the trajectory of
the sensor through matching the features from the location.
I. INTRODUCTION However, loop closure needs high computational cost and it
For a mobile robot, navigating in an unknown environment can only be used if the loop closure is detected.
is a crucial task. The robot traversing in an environment Some research utilize the noise characteristics of RGB-D
needs to localize its position with respect to the environment sensor for ICP algorithm [11]. This approach can decrease
and at the same time needs to build the map of the area by the accumulated error without higher computational cost.
utilizing its sensors. This problem is termed as SLAM or The Kinect sensor has two sensing modes, near-mode and
Simultaneous Localization and Mapping [1]–[3]. Depending normal-mode. Most of the previous researches have only
on the type of sensor used, the environment can be mapped considered the near-mode in their studies [12]. Most RGB-
in 2D or 3D. SLAM is necessary for applications of mobile D sensors including Kinect are especially affected by dis-
robots in search and rescue, infrastructure inspection, agri- cretization effects in depth measurements and in fact most
culture, space and so on [4] [5]. Recently many researchers cameras are only calibrated for certain range. This causes
have proposed 3D SLAM system using various sensors such considerable measurement errors in far ranges (< 3.5 m).
as Kinect, 3D-LIDAR and stereo camera [6]. These errors hinder the measurements of surfaces at longer
RGB-D sensors such as the Microsoft Kinect has emerged ranges and thus affect surface modeling.
as a popular choice amongst the robotics researchers in recent In this paper, we investigated the noise characteristics in
years. Due to its small form factor and easy integration it is normal-mode for the Kinect sensor, and proposed an accurate
very popular with hobby roboticists and researchers. What ICP algorithms that utilizes the noise characteristics. More-
also makes it popular is its low cost, and relatively high over, the SLAM algorithm along with the ICP is evaluated
performance. RGB-D sensors are devices that can capture by comparing the accuracy for localization.
RGB images and depth images at high frame rates. We can
get colored point cloud data from these images that can be II. NOISE CHARACTERISTICS OF K INECT
used to construct high density 3D point cloud model of the The depth information from Kinect sensor is very noisy
object or environment. Kinect [7] is one of the most popular in long range. Many researchers have investigated noise
choices for RGB-D sensor and has been extensively been characteristics of this sensor. An experiment to evaluate the
used in various robotics applications. However, the value of accuracy of its depth data has been carried out in [13]. In
depth have only 11-bits, which makes its accuracy to be the experiment, the planar surface of a door was measured
lesser than other 3D sensor like 3D-LIDARs. by Kinect sensor, and the noise model was derived by using
SLAM algorithms with RGB-D sensor often uses Iterative point cloud on the surface. The result showed that the random
Closest Point (ICP) algorithm [8] to estimate rigid transfor- error of depth measurement increases with increasing the
distance to the sensor. But the noise model did not consider
Corresponding author email: xlfs 6862@eis.hokudai.ac.jp
1 Graduate School of Engineering, Hokkaido University, Sapporo, Japan the surface angle. In works by [14], the surface angle was
2 Faculty of Engineering, Hokkaido University, Sapporo, Japan considered and the noise model was derived as a function
380
created from these two images. We build the 3D map by
combining many point clouds captured by moving the Kinect
sensor. ICP (Iterative Closest Point) matching is often used
to determine the alignment between the current frame and
the previous frame. At the same time, we can know the
relative motion of Kinect sensor between frames by using
ICP. However, ICP matching algorithm cannot determine the
correct alignment without a good initial alignment. So we
use the feature points and descriptor of RGB images for
initial alignment. Moreover, we introduce a weighted ICP
matching algorithm by using noise model derived in the
previous section.
A. Initial alignment
In this paper, initial alignment(IA) is executed by detecting
Fig. 5: An image of sequence freiburg2 desk
feature points and by extracting descriptors from RGB image.
We used ORB (Oriented FAST and Rotated BRIEF) feature
TABLE I: Information about dataset for SLAM
detector and descriptor [18] for this study. The ORB has low
calculation cost and gives binary features that are invariant Duration 99.36 s
to rotation and scale. Fig. 3 shows ORB feature points in Duration with ground-truth 69.15 s
the RGB image, and the point cloud. Some of the points Ground-truth trajectory length 18.880 m
Avg. translational velocity 0.193 m/s
vanishes in point cloud because some pixels of depth image Avg. angular velocity 6.338 deg/s
have no depth value, due to long distance, surface material, Trajectory dim 3.90 × 4.13 × 0.57 m
and surface angle. We estimated the initial alignment by
using only feature points. The top figure in Fig. 4 shows
the descriptor matching result for two consecutive images.
Most of the matches seems to be good, but some matches sensor in one frame, and it reduced to 15, 000 points after
indicates totally different points in the two images. To avoid filtering. We applied the noise model to the ICP algorithm to
the influence of false matches like this, RANSAC algorithm find the best transformation and by minimizing the sum of
is often used to determine a rigid transformation to align squared distances between correspondences. We considered
the consecutive two point clouds. The result of RANSAC the problem to find the transformation M , that transforms
is shown in the lower figure in Fig. 4 and disappearance of the source cloud si to the coordinate of the destination cloud
false matches can be clearly seen. di . The sum of weighed squared distances is expressed in
Equation 2.
B. Iterative Closest Point (ICP) 2
After the initial alignment is done, it is important to refine Mopt = argmin (M · si − di ) · ni · wi (2)
M i
the alignment using ICP. The ICP is iterative method to
find the transformation to align one point cloud to another Here, Mopt is the best transformation to align the two clouds
point cloud. The flow of ICP algorithm is as follows. Firstly, and ni is the unit normal vector at di , which is estimated
the algorithm searches for corresponding points from source by PCA (principal component analysis) [21]. The standard
point cloud and destination point cloud. These corresponding deviation calculated from the noise model indicates how
points are usually the nearest neighbor points. Secondly, the noisy the data is. If the value of standard deviation is high,
algorithm estimates the transformation matrix to minimize it means that the data is unlikely to be accurate. Therefore,
the sum of squared distances between the corresponding the weighing factor wi is calculated by the equation wi =
points. Thirdly, the original source cloud is transformed. 1/σ(diz , θi ) where diz is the z component of points di in
Moreover, corresponding points from the transformed cloud destination cloud, and θi is angle of the surface containing
and destination cloud are estimated and finally, the ICP some points near points di .
algorithm is iterated.
In the calculation of transformation matrix estimation, IV. EVALUATION
Levenberg-Marquardt algorithm [19] or linearization are To evaluate our algorithm, we used datasets of TUM RGB-
generally used. In the ICP algorithm of this paper, we D benchmark [22], [23]. This data consists of the RGB and
took linearization by using small-anlgle approximation [20], depth images captured by moving the Kinect sensor, and the
because the angle difference of two consecutive frames is ground truth of Kinect motion obtained from motion-capture
very small. system. Furthermore, the benchmark provides an evaluation
ICP uses not only the feature points used in IA but tool to calculate some values to evaluate the accuracy of
uniformly filtered (voxel grid filter) point cloud from the SLAM system or visual odometry. We chose the sequence
Kinect. In total, 307, 200 points are obtained by the Kinect ‘freiburg2 desk’ for this evaluation. A scene from the dataset
381
TABLE II: Each parameter of our algorithm TABLE III: Comparison of translational ATE
382
Fig. 8: Comparison of rotational RPE
383
to use not only Kinect data but also data from other sensors, [15] M. A. Fischler and R. C. Bolles, “Random sample consensus: a
such as wheel odometry and IMU in case of a mobile robot. paradigm for model fitting with applications to image analysis and
automated cartography,” Communications of the ACM, vol. 24, no. 6,
In addition, global optimization of estimated trajectory is also pp. 381–395, 1981.
needed to make a more consistent map. [16] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox, “Rgb-d
mapping: Using kinect-style depth cameras for dense 3d modeling of
VI. CONCLUSIONS indoor environments,” The International Journal of Robotics Research,
vol. 31, no. 5, pp. 647–663, 2012.
In this paper, we proposed a weighting ICP algorithm [17] A. Ravankar, “Probabilistic approaches and algorithms for indoor robot
using the error characteristics of the normal-mode Kinect mapping in structured environments,” Ph.D. dissertation, Hokkaido
sensor and SLAM algorithm using both RGB image and University, Japan, 2015.
[18] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An
depth image. The results shows the improvement in the efficient alternative to sift or surf,” in 2011 International conference
accuracy of the position estimation. We also showed that it on computer vision. IEEE, 2011, pp. 2564–2571.
is possible to improve the accuracy of SLAM using Kinect [19] J. J. Moré, “The levenberg-marquardt algorithm: implementation and
theory,” in Numerical analysis. Springer, 1978, pp. 105–116.
without changing the sensor to more expensive one. [20] K.-L. Low, “Linear least-squares optimization for point-to-plane icp
However, we estimated the noise model of only one surface registration,” Chapel Hill, University of North Carolina, vol. 4,
Kinect sensor and we ignored the individual difference of 2004.
[21] I. Jolliffe, Principal component analysis. Wiley Online Library, 2002.
it for evaluation. For future work, we plan to investigate [22] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers,
the individual difference and take evaluations with different “A benchmark for the evaluation of rgb-d slam systems,” in 2012
Kinect sensors. Moreover, we plan to improve the function IEEE/RSJ International Conference on Intelligent Robots and Systems.
IEEE, 2012, pp. 573–580.
of noise model to fit the data obtained by Kinect sensor and [23] Rgb-d slam dataset and benchmark, homepage, (last checked: Jul
find better weighting function for the ICP algorithm. 2016). [Online]. Available: http://vision.in.tum.de/data/datasets/rgbd-
dataset
R EFERENCES [24] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs,
R. Wheeler, and A. Y. Ng, “Ros: an open-source robot operating
[1] S. Thrun, W. Burgard, and D. Fox, Probabilistic robotics. MIT press, system,” in ICRA workshop on open source software, vol. 3, no. 3.2.
2005. Kobe, Japan, 2009, p. 5.
[2] S. Thrun and J. J. Leonard, “Simultaneous localization and mapping,” [25] Opencv japan, homepage, (last checked: Jul 2016). [Online].
in Springer handbook of robotics. Springer, 2008, pp. 871–889. Available: http://opencv.jp/
[3] A. Ravankar, A. A. Ravankar, Y. Hoshino, T. Emaru, and Y. Kobayashi, [26] R. B. Rusu and S. Cousins, “3d is here: Point cloud library (pcl),” in
“On a hopping-points svd and hough transform-based line detection Robotics and Automation (ICRA), 2011 IEEE International Conference
algorithm for robot localization and mapping,” 2016. on. IEEE, 2011, pp. 1–4.
[4] L. Pedersen, D. Kortenkamp, D. Wettergreen, I. Nourbakhsh, and [27] Point cloud library (pcl), homepage (last checked: Jul 2016). [Online].
D. Korsmeyer, “A survey of space robotics,” 2003. Available: http://pointclouds.org/
[5] K. Nagatani, S. Kiribayashi, Y. Okada, K. Otake, K. Yoshida, S. Ta- [28] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Bur-
dokoro, T. Nishimura, T. Yoshida, E. Koyanagi, M. Fukushima et al., gard, “An evaluation of the rgb-d slam system,” in Robotics and
“Emergency response to the nuclear accident at the fukushima daiichi Automation (ICRA), 2012 IEEE International Conference on. IEEE,
nuclear power plants using mobile rescue robots,” Journal of Field 2012, pp. 1691–1696.
Robotics, vol. 30, no. 1, pp. 44–63, 2013.
[6] Z. Fang and Y. Zhang, “Experimental evaluation of rgb-d visual odom-
etry methods,” International Journal of Advanced Robotic Systems,
vol. 12, 2015.
[7] Microsoft kinect, homepage, (last checked: Jul 2016). [Online].
Available: http://www.xbox.com/ja-JP/kinect/
[8] S. Rusinkiewicz and M. Levoy, “Efficient variants of the icp algo-
rithm,” in 3-D Digital Imaging and Modeling, 2001. Proceedings.
Third International Conference on. IEEE, 2001, pp. 145–152.
[9] A. A. Ravankar, Y. Hoshino, A. Ravankar, L. Jixin, T. Emaru, and
Y. Kobayashi, “Algorithms and a framework for indoor robot mapping
in a noisy environment using clustering in spatial and hough domains,”
Int J Adv Robot Syst, vol. 12, p. 27, 2015.
[10] A. A. Ravankar, Y. Hoshino, T. Emaru, and Y. Kobayashi, “Map
building from laser range sensor information using mixed data clus-
tering and singular value decomposition in noisy environment,” in
System Integration (SII), 2011 IEEE/SICE International Symposium
on. IEEE, 2011, pp. 1232–1238.
[11] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim,
A. J. Davison, P. Kohi, J. Shotton, S. Hodges, and A. Fitzgibbon,
“Kinectfusion: Real-time dense surface mapping and tracking,” in
Mixed and augmented reality (ISMAR), 2011 10th IEEE international
symposium on. IEEE, 2011, pp. 127–136.
[12] S. May, D. Droeschel, D. Holz, S. Fuchs, E. Malis, A. Nüchter,
and J. Hertzberg, “Three-dimensional mapping with time-of-flight
cameras,” Journal of Field Robotics, vol. 26, no. 11-12, pp. 934–965,
2009.
[13] K. Khoshelham and S. O. Elberink, “Accuracy and resolution of kinect
depth data for indoor mapping applications,” Sensors, vol. 12, no. 2,
pp. 1437–1454, 2012.
[14] C. V. Nguyen, S. Izadi, and D. Lovell, “Modeling kinect sensor noise
for improved 3d reconstruction and tracking,” in 2012 Second Interna-
tional Conference on 3D Imaging, Modeling, Processing, Visualization
& Transmission. IEEE, 2012, pp. 524–530.
384