Professional Documents
Culture Documents
GSLAM - A General SLAM Framework and Benchmark
GSLAM - A General SLAM Framework and Benchmark
Yong Zhao1 , Shibiao Xu∗,2 , Shuhui Bu†,1 , Hongkai Jiang1 , and Pengcheng Han1
1
Northwestern Polytechnical University
2
NLPR, Institute of Automation, Chinese Academy of Sciences
arXiv:1902.07995v1 [cs.CV] 21 Feb 2019
Rotation, rigid and similarity are three of the most used R = exp(φ∧ ) = exp(θa∧ ) (2)
transformations in SLAM research. A similarity transfor- = cos θI + (1 − cos θ)aaT + sin θaT . (3)
mation of a point p = (x, y, z)T is common to use a 4 × 4
Where a is the rotation axis and θ is the angle to rotate. φ∧
homogeneous transformation matrix or decompose such a
is the skew-symmetric matrix of φ.
matrix into rotational and translational components:
Similarly the Lie algebra of rigid and similarity trans-
0
p sR t
p
formation se(3) and sim(3) are defined. GSLAM uses
= . (1) quaternion to represent the rotational component and pro-
1 0T 1 1
vide functions converting from one representation to other
Here R ∈ R3×3 represents the rotation matrix, which is representations. Table 1 demonstrates our transforms im-
given as a member of the SO(3) Lie group [61] with three plementation with comparison to three other popular im-
unit direction axises. t ∈ R3 means the translation and s is plementations Sophus, TooN and Ceres. Since Ceres im-
the scale factor. The similarity transform matrix belongs to plementation uses the angle axis representation, the rota-
tion exponential and logarithm are not needed. As the ta- Table 2: Algorithms which the GSLAM Estimator imple-
ble demonstrates, the GSLAM implementation outperforms mented.
due to the use of quaternion and better optimization, while Algorithm Ref. Model
TooN utilizes the matrix implementation and outperforms F8-Point [19] Fundamental
on point transformation. F7-Point [33] Fundamental
E5-Stewenius [62] Essential
3.2.4 Image Format 2D-2D E5-Nister [54] Essential
E5-Kneip [42] Essential
Image data storing and transferring are two of the most im-
H4-Point [33] Homography
portant functions for visual SLAM. For efficiency and con-
A3-Point [4] Affine2D
venience, GSLAM utilizes a data structure GImage which
P4-EPnP [43] SE3
is compatible to cv::Mat. It has a smart point counter for
P3-Gao [26] SE3
safely memory free and is easy to be transfered without
P3-Kneip [41] SE3
memory copy. And the data pointer is aligned so that it 2D-3D
P3-GPnP [40] SE3
would be easier for single instruction multiple data (SIMD)
P2-Kneip [38] SE3
speed up. Users can convert between GImage and cv::Mat
T2-Triangulate [39] Translation
seamlessly and safely without memory copy.
A4-Point [4] Affine3D
3D-3D S3-Horn [34] SIM3
3.2.5 Camera Models P3-Plane [41] SE3
A camera model should be defined to project a 3D point pc
from camera coordinates to 2D pixel x. One most popular
camera model is the pinhole model where the projection can Mappoints are used to express the environment observed
be represented by multiply an intrinsic matrix K known as: by frames, which are generally used by feature based meth-
ods. However, a mappoint could not only represents a key-
fx cx point but also a GCP, edge line or 3D object. Their cor-
x = Kpc = fy cy pc (4) respondences with mapframes form an observation graph
1 which are often called as bundle graph.
As images for SLAM possibly contain radial and tangen- 4. SLAM Implementation Utilities
tial distortion due to imperfect manufacturing or are cap-
tured with a fish-eye or panorama camera, different camera For making things easier to implement a SLAM system,
models are proposed to describe the projection. GSLAM GSLAM provides some utility classes. This section will
provides implementations including the OpenCV [23] (used briefly introduce three optimized modules named Estimator,
by ORBSLAM [51]), ATAN (used by PTAM [37]) and Optimizer and Vocabulary.
OCamCalib [59] (used by MultiCol-SLAM [66] ). Users
4.1. Estimator
are also easy to inherit the class and implement some other
camera models like Kannala-Brandt [35] and Equirectangu- The purely geometric computation remains a fundamen-
lar panorama model. tal problem that requires robust and accurate real-time solu-
tions. Both classical visual SLAM algorithms [21, 37, 49])
3.2.6 Map Data Structure or modern visual-inertial solutions [44, 56, 67] rely on ge-
ometric vision algorithms for initialization, relocalization
For a SLAM implementation, its goal is to localize the real- and loop-closure. OpenCV [4] provides several geometry
time poses and generate a map. GSLAM suggests an unified algorithms and Kneip presents a toolbox for geometric vi-
map data structure which is consisted by several mapframes sion OpenGV [39] which is limited to camera pose compu-
and mappoints. This data structure is appropriate for most tation. Estimator of GSLAM aims to provide a collection
of the existed visual SLAM systems including both feature of close-form solvers cover all interesting cases with robust
based or direct methods. sample consensus (RANSAC) [18] methods.
Mapframes are used to represent location statuses in dif- Table 2 lists the algorithms supported by the estima-
ferent times with various information captured by sensors or tor. They are divided into three categories according to
estimated results including IMU or GPS raw data, depth in- the given observations. 2D-2D matches are used to esti-
formation and camera models. Relationships between them mate epipolar or homography constraints and relative pose
are estimated by SLAM implementations and their connec- could be decomposed from them. 2D-3D correspondences
tions form a pose graph. are used to estimate both central or non-central absolute
pose for monocular or multiple camera systems, which is Table 3: Comparison of four BoW implementations in load-
the famous PnP problem. 3D geometry functions such as ing, saving and training a same vocabulary with memory us-
plane fitting, and estimating the SIM 3 transformation of age statistics. The experiment is performed on a computer
two point clouds are also supported. Most algorithms are with i7-6700 CPU, 16GB RAM running 64bit Ubuntu. 400
implemented depending on the open-source linear algebra and 10k images from DroneMap [7] dataset are used to train
library Eigen, which is header-only and readily for most the models with 4 and 6 levels.
platforms. Implementation Ours DBoW2 DBoW3 FBoW
4.2. Optimizer ORB-4 67.3us 47.2ms 7.1ms 72.3us
ORB-6 7.2ms 6.8 s 1.1 s 9.5ms
Nonlinear optimization is the core part of state-of-the-art Load
SIFT-4 1.0ms 436.1ms 5.1ms 1.1ms
geometric SLAM systems. Due to the high dimensional- ORB-4 437.9us 40.4ms 1.7ms 553.1us
ity and sparseness of Hessian matrix, graph structures are ORB-6 34.4ms 4.8 s 632.4ms 20.6ms
used to modeling complex estimation problems for SLAM. Save
SIFT-4 4.4ms 437.6ms 6.7ms 2.7ms
Several frameworks including Ceres [1], G2O [30] and GT- ORB-4 7.6 s 24.8 s 23.6 s 8.5 s
SAM [12] are proposed to solve general graph optimization ORB-6 230.5 s 1.1Ks 911.4 s 270.4 s
problems. These frameworks are popular used by different Train
SIFT-4 23.5 s 327.7 s 299.0 s 18.7 s
SLAM systems. ORB-SLAM [49, 51], SVO [21, 22] use ORB-4 615.5us 2.1ms 1.9ms 862.4us
G2O for bundle adjustment and pose graph optimization. ORB-6 723.7us 6.0ms 4.9ms 1.2ms
OKVIS [44], VINS [56] use Ceres for graph optimization Trans
SIFT-4 1.1ms 10.3ms 9.2ms 11.5ms
with IMU factors and sliding window is used to control the ORB-4 0.44MB 2.5MB 2.5MB 0.45MB
computation complex. Forster et al. present a visual-initial ORB-6 44.4MB 247.1MB 246.5MB 45.3MB
method [20] based on SVO and implement the back-end Mem
SIFT-4 5.8MB 7.8MB 7.8MB 5.8MB
with GTSAM.
Optimizer of GSLAM aims to provide an unified inter-
face for most of nonlinear SLAM problems such as PnP
solver, bundle adjustment, pose graph optimization. A gen- Inspired by the above works, GSLAM carries out a
eral implementation plugin for these problems is carried out header only implementation of DBoW3 vocabulary with the
based on the Ceres library. For a particular problem such following features:
as bundle adjustment, some more efficient implementations 1. Removed OpenCV dependency and the all functions
such as PBA [71] and ICE-BA [45] could also be provided are implemented within one single header file only de-
as a plugin. With the optimizer utility, developers are able pending on C++11.
to access different implementations with an united interface,
particularly for deep learning based SLAM systems. 2. Combined the advantages of both DBoW2/3 and
FBoW [48] which are extremely fast and easy to use.
4.3. Vocabulary Interface similar to DBoW3 is provided and both bi-
Place recognition is one of the most important part for nary and float descriptors are accelerated using SSE
SLAM relocalization and loop detection. Bag of words and AVX instructions.
(BoW) approach is popular used in SLAM systems since 3. We improved the memory usage and accelerated the
its efficiency and performance. FabMap [10] [29] propose a speed of loading, saving or training a vocabulary and
probabilistic approach to the problem of recognizing places transformation from features of images to a BoW vec-
based on their appearance, which is used by RSLAM [47], tor.
LSD-SLAM [15]. As it uses float descriptors like SIFT and
SURF, DBoW2 [24] builds a vocabulary tree for training A comparison of the four implementations is demon-
and detection, which supports both binary and float descrip- strated in Table 3. In the experiment, each parent node
tors. Rafael presents two improved versions of DBoW2 has 10 children, and for ORB feature detection we use the
named DBoW3 and FBoW [48], which simplify the in- ORB-SLAM [51], and SiftGPU [70] is used for SIFT de-
terface and accelerate the training and loading speed. Af- tection. Two ORB vocabularies with 4 and 6 levels and
ter ORB-SLAM [49] adopts the ORB [58] descriptor and one SIFT vocabulary are used in the results. Both FBoW
uses DBoW2 for loop detection [50], relocalization and fast and GSLAM use multi-thread for vocabulary training. Our
matching, a varies of SLAM systems such as ORB-SLAM2 implementation outperforms in almost all items including
[51], VINS-Mono [56] and LDSO [25] use DBoW3 for loop loading and saving the vocabulary, training a new vocabu-
detection. It has become the most popular tool to implement lary, transforming a descriptor list to a BoW vector for place
place recognition for SLAM systems. recognition and a feature vector for fast feature matching.
Table 4: Some open dataset plugins build-in implemented
by now.
Dataset Year Environment Type
KITTI [28] 2012 outdoors multi-cam, imu
TUMRGBD [64] 2012 indoors RGBD
ICL [32] 2014 simulation RGBD
TUMMono [17] 2016 indoors mono
Euroc [8] 2016 indoors stereo, imu
NPUDroneMap [7] 2016 aerial mono
TUMVI [60] 2018 in/outdoors stereo, imu
CVMono [4] - - mono Figure 2: Screenshots of some SLAM and SfM plugins
ROS [57] - - - implemented, including direct method DSO [14] (left-top),
semi-direct visual odometer SVO [21, 22] (right-top), fea-
ture based method ORBSLAM [49, 51] (left-bottom) and
global SfM system TheiaSfM [65] (right-bottom).
Furthermore, the GSLAM implementation uses less mem-
ory and allocates less pieces of dynamic memories as we
found that the fragmentation problem is the main reason that
GSLAM has already implemented several popular visual
DBoW2 needs lots of memory.
SLAM dataset plugins which are listed in Table. 4. It is also
very easy for users to implement a dataset plugin based on
5. SLAM Evaluation Benchmark the header-only GSLAM core and publish it as a plugin or
compile it along with the applications.
Existed benchmarks [28, 64] need users download test
datasets and upload results for precision evaluation, which
5.2. SLAM Implementations
are not able to unify the running environment and evalu-
ate a fairy performance comparison. Benefit from the uni- Fig.2 demonstrates the screenshots of some open-source
fied interface of GSLAM, the evaluation of SLAM systems SLAM and SfM plugins running with build-in Qt visualizer.
becomes much more elegant. With help of GSLAM, de- Different architectures of SLAM systems including direct,
velopers just need to upload the SLAM plugin and various semi-direct, feature based and even SfM methods are sup-
evaluations on speed, computation cost and accuracy could ported by our framework. It should be mentioned that since
be done in a dockerlized environment with fixed resources. SVO, ORBSLAM and TheiaSfM utilize the map data struc-
In this section we will firstly introduce some datasets ture introduced in Sec. 3.2.6, the visualization is auto sup-
and SLAM plugins implemented. And then an evaluation ported. The DSO implementation needs to publish the re-
is carried out with three representative SLAM implemen- sults such as pointcloud, camera poses, trajectory and pose
tations on speed, accuracy, memory and CPU usages. The graph for visualization just like the ROS based implementa-
evaluation is performed to demonstrate the possibility of an tion does. Users are able to access different SLAM plugins
united SLAM benchmark with different SLAM implemen- with the unified framework and it is very convenient to de-
tation plugins. velop a SLAM based applications depending on the C++,
Python and Node-JS interfaces. As many researchers use
5.1. Datasets ROS for development, GSLAM also provides the ROS vi-
sualizer plugin to transfer the ROS defined messages seam-
A sensor data stream with corresponding configuration
lessly, and developers could utilize Rviz for display or con-
is always needed to run a SLAM system. For letting devel-
tinue to develop other ROS based applications.
opers focus on the development of the core SLAM plugins,
GSLAM provides a standard dataset interface where devel- 5.3. Evaluation
opers do not need to take care of the SLAM inputs. Both on-
line sensor input and offline data are provided through dif- As most existing benchmarks only provide datasets with
ferent dataset plugins, and correct plugin is dynamic loaded or without ground-truth for users to perform evaluations by
by identify the given dataset path suffix. A dataset imple- themselves. GSLAM provides a build-in plugin and some
mentation should provide all sensor stream requested with script tools for both computation performance and accuracy
related configurations, thus no extra setting is needed for evaluation.
different datasets. All different sensor streams are published The sequence nostructure-texture-near-withloop from
through Messenger introduced in Sec. 3.2.2 with standard TUMRGBD dataset is used to demonstrate how the eval-
topic names and data formats. uation performs. And three open-source monocular SLAM
(a) Memory usage (b) Memory malloc number (c) CPU usage (d) Frame duration
Figure 3: Computation performance statistics of three monocular implementations integrated within the evaluation tool. The
recordings of memory usage and memory allocated numbers are started after the SLAM application loaded, and updated
after every frame processed. CPU usage is updated when the process occupied CPU time increases a curtain value. Frame
duration is measured by the time between current frame published and processed.
Figure 4: Odometer trajectories aligned with ground-truth (left) and absolute pose error (APE) distributions of DSO, SVO
and ORBSLAM. The odometer trajectories published lively instead of final results are used.
plugins DSO, SVO and ORBSLAM are adopted for the fol- while ORBSLAM achieves the best accuracy in terms of
lowing experiments. A computer with i7-6700 CPU, GTX absolute pose error (APE). The relative pose error (RPE)
1060 GPU and 16GB RAM running 64bit Ubuntu 16.04 is are also provided, however due to the limitation of the para-
used for all the experiments. graph, more experimental results are provided in the sup-
The computation performance evaluation including plementary materials. Since the integrated evaluation is a
memory usage, malloc numbers, CPU usage and time used pluggable plugin application, it can be reimplemented with
by every-frame statistics are shown in Fig. 3. The re- more evaluation metrics such as the precision of pointcloud.
sults demonstrate that SVO uses the least memory, CPU re-
sources and obtains fastest speed. And all cost keeps stable 6. Conclusions
since SVO is a visual odometer and just a local map is main-
tained inside the implementation. DSO mallocs fewer mem- In this paper, we introduce a novel and generic SLAM
ory block numbers, but consumes more than 100MB RAM platform named GSLAM, which provides support from de-
which increases slowly. One problem of DSO is that the velopment, evaluation to application. Through this plat-
processing time increases dramatically when frame num- form, frequently used toolkits are provided by plugin form,
ber is below 500, in addition, the processing times for key- and users can also easily develop their own modules. To
frames are even longer. ORBSLAM uses the most CPU make the platform easy to be used, we make the interfaces
resources and the computation time is stable, but the mem- only depend C++11. In addition, Python and JavaScript in-
ory usage increases fast and it allocates and frees a lot of terfaces are provided for better integrating traditional and
memory blocks since the bundle adjustment uses the G2O deep learning based SLAM or distributed on the web.
library and no incremental optimization approach is used. We hope that researchers and engineers will find
The odometer trajectory evaluation is presented in Fig. GSLAM is an useful platform for practical development,
4. As we can see, SVO is faster but have much higher drift, and it could further boost the applications of SLAM to var-
ious fields. In the following research, more SLAM imple- [15] J. Engel, T. Schöps, and D. Cremers. Lsd-slam: Large-scale
mentations, documents and demonstrate code will be sup- direct monocular slam. In Computer Vision–ECCV 2014,
plied for easy to learn and use. In addition, integration of pages 834–849. Springer, 2014. 1, 2, 6
traditional and deep learning based SLAM will be provided [16] J. Engel, J. Stuckler, and D. Cremers. Large-scale direct
to further explore the undiscovered possibilities of SLAM slam with stereo cameras. In Intelligent Robots and Systems
systems. (IROS), 2015 IEEE/RSJ International Conference on, pages
1935–1942. IEEE, 2015. 1, 2
References [17] J. Engel, V. Usenko, and D. Cremers. A photometrically
calibrated benchmark for monocular visual odometry. arXiv
[1] S. Agarwal, K. Mierle, and Others. Ceres solver. http: preprint arXiv:1607.02555, 2016. 7
//ceres-solver.org. 6 [18] M. A. Fischler and R. C. Bolles. Random sample consen-
[2] T. Bailey and H. Durrant-Whyte. Simultaneous localization sus: a paradigm for model fitting with applications to image
and mapping (slam): Part ii. IEEE Robotics & Automation analysis and automated cartography. Communications of the
Magazine, 13(3):108–117, 2006. 1 ACM, 24(6):381–395, 1981. 5
[3] B. Bodin, H. Wagstaff, S. Saecdi, L. Nardi, E. Vespa, [19] M. A. Fischler and O. Firschein. Readings in computer vi-
J. Mawer, A. Nisbet, M. Luján, S. Furber, A. J. Davison, sion: issues, problem, principles, and paradigms. Elsevier,
et al. Slambench2: Multi-objective head-to-head bench- 2014. 5
marking for visual slam. In 2018 IEEE International Confer- [20] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza. On-
ence on Robotics and Automation (ICRA), pages 1–8. IEEE, manifold preintegration for real-time visual–inertial odome-
2018. 1, 2 try. IEEE Transactions on Robotics, 33(1):1–21, 2017. 6
[4] G. Bradski et al. The opencv library (2000). Dr. Dobbs [21] C. Forster, M. Pizzoli, and D. Scaramuzza. Svo: Fast semi-
Journal of Software Tools, 2000. 5, 7 direct monocular visual odometry. In Robotics and Automa-
[5] S. Brahmbhatt, J. Gu, K. Kim, J. Hays, and J. Kautz. Mapnet: tion (ICRA), 2014 IEEE International Conference on, pages
Geometry-aware learning of maps for camera localization. 15–22. IEEE, 2014. 1, 2, 5, 6, 7
arXiv preprint arXiv:1712.03342, 2017. 1, 2
[22] C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and
[6] S. Bu, Y. Zhao, G. Wan, and K. Li. Semi-direct tracking and
D. Scaramuzza. Svo: Semidirect visual odometry for
mapping with rgb-d camera for mav. Multimedia Tools and
monocular and multicamera systems. IEEE Transactions on
Applications, 75:1–25, 2016. 1, 2
Robotics, 33(2):249–265, 2017. 1, 2, 6, 7
[7] S. Bu, Y. Zhao, G. Wan, and Z. Liu. Map2dfusion: real-time
[23] J. G. Fryer and D. C. Brown. Lens distortion for close-range
incremental uav image mosaicing based on monocular slam.
photogrammetry. Photogrammetric engineering and remote
In Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ
sensing, 52(1):51–58, 1986. 5
International Conference on, pages 4564–4571. IEEE, 2016.
[24] D. Gálvez-López and J. D. Tardós. Bags of binary words for
6, 7
fast place recognition in image sequences. IEEE Transac-
[8] M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder,
tions on Robotics, 28(5):1188–1197, October 2012. 6
S. Omari, M. W. Achtelik, and R. Siegwart. The euroc micro
aerial vehicle datasets. The International Journal of Robotics [25] X. Gao, R. Wang, N. Demmel, and D. Cremers”. Ldso: Di-
Research, 35(10):1157–1163, 2016. 7 rect sparse odometry with loop closure”. In International
[9] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, Conference on Intelligent Robots and Systems (IROS), Octo-
J. Neira, I. Reid, and J. J. Leonard. Past, present, and ber 2018. 6
future of simultaneous localization and mapping: Toward [26] X.-S. Gao, X.-R. Hou, J. Tang, and H.-F. Cheng. Complete
the robust-perception age. IEEE Transactions on Robotics, solution classification for the perspective-three-point prob-
32(6):1309–1332, 2016. 1 lem. IEEE transactions on pattern analysis and machine in-
[10] M. Cummins and P. Newman. Fab-map: Probabilistic local- telligence, 25(8):930–943, 2003. 5
ization and mapping in the space of appearance. The Interna- [27] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for au-
tional Journal of Robotics Research, 27(6):647–665, 2008. 6 tonomous driving? the kitti vision benchmark suite. In
[11] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse. IEEE Conference on Computer Vision and Pattern Recog-
Monoslam: Real-time single camera slam. IEEE trans- nition (CVPR), 2012. 2
actions on pattern analysis and machine intelligence, [28] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for au-
29(6):1052–1067, 2007. 1, 2 tonomous driving? the kitti vision benchmark suite. In Com-
[12] F. Dellaert. Factor graphs and gtsam: A hands-on intro- puter Vision and Pattern Recognition (CVPR), 2012 IEEE
duction. Technical report, Georgia Institute of Technology, Conference on, pages 3354–3361. IEEE, 2012. 7
2012. 6 [29] A. Glover, W. Maddern, M. Warren, S. Reid, M. Milford,
[13] H. Durrant-Whyte and T. Bailey. Simultaneous localization and G. Wyeth. Openfabmap: An open source toolbox for
and mapping: part i. IEEE robotics & automation magazine, appearance-based loop closure detection. In Robotics and
13(2):99–110, 2006. 1 Automation (ICRA), 2012 IEEE International Conference
[14] J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. on, pages 4730–4735, May 2012. 6
IEEE transactions on pattern analysis and machine intelli- [30] G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard. g2o:
gence, 40(3):611–625, 2018. 1, 2, 7 A general framework for graph optimization. In IEEE In-
ternational Conference on Robotics and Automation, 2011. [46] Y. Maruyama, S. Kato, and T. Azumi. Exploring the per-
6 formance of ros2. In Proceedings of the 13th International
[31] A. Handa, T. Whelan, J. McDonald, and A. Davison. A Conference on Embedded Software, page 5. ACM, 2016. 2
benchmark for RGB-D visual odometry, 3D reconstruction [47] C. Mei, G. Sibley, M. Cummins, P. Newman, and I. Reid.
and SLAM. In IEEE Intl. Conf. on Robotics and Automa- Rslam: A system for large-scale mapping in constant-time
tion, ICRA, Hong Kong, China, May 2014. 2 using stereo. International journal of computer vision,
[32] A. Handa, T. Whelan, J. McDonald, and A. J. Davison. A 94(2):198–214, 2011. 6
benchmark for rgb-d visual odometry, 3d reconstruction and [48] R. Muoz-Salinas. FBox fast bag of words, 2017. 6
slam. In Robotics and automation (ICRA), 2014 IEEE inter- [49] R. Mur-Artal, J. Montiel, and J. D. Tardos. Orb-slam: a ver-
national conference on, pages 1524–1531. IEEE, 2014. 7 satile and accurate monocular slam system. arXiv preprint
[33] R. Hartley and A. Zisserman. Multiple view geometry in arXiv:1502.00956, 2015. 1, 2, 5, 6, 7
computer vision. Cambridge university press, 2003. 5 [50] R. Mur-Artal and J. D. Tardós. Fast relocalisation and loop
[34] B. K. Horn. Closed-form solution of absolute orientation closing in keyframe-based slam. In Robotics and Automation
using unit quaternions. JOSA A, 4(4):629–642, 1987. 5 (ICRA), 2014 IEEE International Conference on, 2014. 6
[35] J. Kannala and S. S. Brandt. A generic camera model and cal- [51] R. Mur-Artal and J. D. Tardós. Orb-slam2: An open-source
ibration method for conventional, wide-angle, and fish-eye slam system for monocular, stereo, and rgb-d cameras. IEEE
lenses. IEEE transactions on pattern analysis and machine Transactions on Robotics, 33(5):1255–1262, 2017. 1, 2, 5,
intelligence, 28(8):1335–1340, 2006. 5 6, 7
[36] C. Kerl, J. Sturm, and D. Cremers. Dense visual slam for [52] L. Nardi, B. Bodin, M. Z. Zia, J. Mawer, A. Nisbet, P. H.
rgb-d cameras. In Intelligent Robots and Systems (IROS), Kelly, A. J. Davison, M. Luján, M. F. O’Boyle, G. Riley,
2013 IEEE/RSJ International Conference on, pages 2100– et al. Introducing slambench, a performance and accuracy
2106. IEEE, 2013. 1, 2 benchmarking methodology for slam. In Robotics and Au-
[37] G. Klein and D. Murray. Parallel tracking and mapping for tomation (ICRA), 2015 IEEE International Conference on,
small ar workspaces. In Mixed and Augmented Reality, 2007. pages 5783–5790. IEEE, 2015. 1
ISMAR 2007. 6th IEEE and ACM International Symposium
[53] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison. Dtam:
on, pages 225–234. IEEE, 2007. 1, 2, 5
Dense tracking and mapping in real-time. In Computer Vi-
[38] L. Kneip, M. Chli, and R. Y. Siegwart. Robust real-time vi-
sion (ICCV), 2011 IEEE International Conference on, pages
sual odometry with a single camera and an imu. In Proceed-
2320–2327. IEEE, 2011. 1, 2
ings of the British Machine Vision Conference 2011. British
[54] D. Nistér. An efficient solution to the five-point relative pose
Machine Vision Association, 2011. 5
problem. IEEE transactions on pattern analysis and machine
[39] L. Kneip and P. Furgale. Opengv: A unified and general-
intelligence, 26(6):756–770, 2004. 5
ized approach to real-time calibrated geometric vision. In
Robotics and Automation (ICRA), 2014 IEEE International [55] G. L. Oliveira, N. Radwan, W. Burgard, and T. Brox.
Conference on, pages 1–8. IEEE, 2014. 5 Topometric localization with deep learning. arXiv preprint
arXiv:1706.08775, 2017. 1, 2
[40] L. Kneip, P. Furgale, and R. Siegwart. Using multi-camera
systems in robotics: Efficient solutions to the npnp prob- [56] T. Qin, P. Li, and S. Shen. Vins-mono: A robust and versatile
lem. In Robotics and Automation (ICRA), 2013 IEEE In- monocular visual-inertial state estimator. IEEE Transactions
ternational Conference on, pages 3770–3776. IEEE, 2013. on Robotics, 34(4):1004–1020, 2018. 1, 2, 5, 6
5 [57] M. Quigley, B. Gerkey, K. Conley, J. Faust, T. Foote,
[41] L. Kneip, D. Scaramuzza, and R. Siegwart. A novel J. Leibs, E. Berger, R. Wheeler, and N. Andrew. Ros : an
parametrization of the perspective-three-point problem for a open-source robot operating system. Proc. IEEE ICRA Work-
direct computation of absolute camera position and orienta- shop on Open Source Robotics, 2009. 2, 7
tion. 2011. 5 [58] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. Orb:
[42] L. Kneip, R. Siegwart, and M. Pollefeys. Finding the ex- an efficient alternative to sift or surf. In Computer Vi-
act rotation between two images independently of the trans- sion (ICCV), 2011 IEEE International Conference on, pages
lation. In European conference on computer vision, pages 2564–2571. IEEE, 2011. 6
696–709. Springer, 2012. 5 [59] D. Scaramuzza, A. Martinelli, and R. Siegwart. A flexi-
[43] V. Lepetit, F. Moreno-Noguer, and P. Fua. Epnp: An accurate ble technique for accurate omnidirectional camera calibra-
o (n) solution to the pnp problem. International journal of tion and structure from motion. In Computer Vision Systems,
computer vision, 81(2):155, 2009. 5 2006 ICVS’06. IEEE International Conference on, pages 45–
[44] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Fur- 45. IEEE, 2006. 5
gale. Keyframe-based visual–inertial odometry using non- [60] D. Schubert, T. Goll, N. Demmel, V. Usenko, J. Stueck-
linear optimization. The International Journal of Robotics ler, and D. Cremers. The tum vi benchmark for evaluating
Research, 34(3):314–334, 2015. 1, 2, 5, 6 visual-inertial odometry. In International Conference on In-
[45] H. Liu, M. Chen, G. Zhang, H. Bao, and Y. Bao. Ice-ba: telligent Robots and Systems (IROS), October 2018. 7
Incremental, consistent and efficient bundle adjustment for [61] J. Selig. Lie groups and lie algebras in robotics. In Compu-
visual-inertial slam. In The IEEE Conference on Computer tational Noncommutative Algebra and Applications, pages
Vision and Pattern Recognition (CVPR), June 2018. 6 101–125. Springer, 2004. 4
[62] H. Stewenius, C. Engels, and D. Nistér. Recent develop-
ments on direct relative orientation. ISPRS Journal of Pho-
togrammetry and Remote Sensing, 60(4):284–294, 2006. 5
[63] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cre-
mers. A benchmark for the evaluation of rgb-d slam systems.
In IEEE/RSJ International Conference on Intelligent Robots
and Systems, pages 573–580, Oct 2012. 2
[64] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cre-
mers. A benchmark for the evaluation of rgb-d slam systems.
In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ In-
ternational Conference on, pages 573–580. IEEE, 2012. 7
[65] C. Sweeney. Theia multiview geometry library: Tutorial &
reference. University of California Santa Barbara, 2, 2015.
7
[66] S. Urban and S. Hinz. Multicol-slam-a modular
real-time multi-camera slam system. arXiv preprint
arXiv:1610.07336, 2016. 5
[67] L. von Stumberg, V. Usenko, and D. Cremers. Direct
sparse visual-inertial odometry using dynamic marginaliza-
tion. arXiv preprint arXiv:1804.05625, 2018. 1, 2, 5
[68] S. Wang, R. Clark, H. Wen, and N. Trigoni. Deepvo:
Towards end-to-end visual odometry with deep recurrent
convolutional neural networks. In Robotics and Automa-
tion (ICRA), 2017 IEEE International Conference on, pages
2043–2050. IEEE, 2017. 1, 2
[69] T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison,
and S. Leutenegger. Elasticfusion: Real-time dense slam
and light source estimation. The International Journal of
Robotics Research, 35(14):1697–1716, 2016. 1, 2
[70] C. Wu. Siftgpu: A gpu implementation of scale invari-
ant feature transform (sift)(2007). URL http://cs. unc. edu/˜
ccwu/siftgpu, 2011. 6
[71] C. Wu, S. Agarwal, B. Curless, and S. M. Seitz. Multicore
bundle adjustment. In Computer Vision and Pattern Recogni-
tion (CVPR), 2011 IEEE Conference on, pages 3057–3064.
IEEE, 2011. 6
[72] Z. Yin and J. Shi. Geonet: Unsupervised learning of dense
depth, optical flow and camera pose. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recogni-
tion (CVPR), volume 2, 2018. 1, 2
[73] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe. Unsu-
pervised learning of depth and ego-motion from video. In
CVPR, volume 2, page 7, 2017. 1, 2