Review
Role of Deep Learning in Loop Closure Detection for Visual
and Lidar SLAM: A Survey
Saba Arshad and Gon-Woo Kim *

Intelligent Robots Laboratory, Department of Control and Robot Engineering, Chungbuk National University,
Cheongju-si 28644, Korea; sabarshad1000@gmail.com
* Correspondence: gwkim@cbnu.ac.kr

Abstract: Loop closure detection is of vital importance in the process of simultaneous localization
and mapping (SLAM), as it helps to reduce the cumulative error of the robot’s estimated pose and
generate a consistent global map. Many variations of this problem have been considered in the past, and the existing methods differ in the acquisition approach of query and reference views, the choice of scene representation, and the associated matching strategy. The contributions of this survey are manifold. It provides a thorough study of the existing literature on loop closure detection algorithms for visual and Lidar SLAM and discusses their insights along with their limitations. It presents a taxonomy of state-of-the-art deep learning-based loop detection algorithms with detailed comparison metrics. It also identifies the major challenges of conventional approaches. Based on those challenges, deep learning-based methods are reviewed in which the identified challenges are tackled, focusing on methods that provide long-term autonomy under varying conditions such as changing weather, light, seasons, and viewpoint, and occlusion due to the presence of mobile objects. Furthermore, open challenges and future directions are also discussed.

Keywords: simultaneous localization and mapping; loop closure detection; deep learning; neural networks; autonomous mobile robots

1. Introduction
Over the past few decades, simultaneous localization and mapping (SLAM) has been one of the most actively studied problems in autonomous robotic systems. Its main function is to enable the robot to navigate autonomously in an unknown environment by generating a map and accurately localizing itself in that map. During localization, the robot must correctly recognize previously visited places, known as true loops. This recognition is done by one of the components of SLAM, known as loop closure detection. True-loop closure detection helps the SLAM system to relocalize and enhances the mapping accuracy by reducing the drift accumulated in the map due to robot motion [1]. However, existing loop closure detection systems are not yet efficient enough to accurately detect true loops. The parameters affecting the detection of true loops are changing illumination, environmental conditions, seasons, different viewpoints, occlusion of features due to the presence of mobile objects, and the presence of similar objects in different places.
Earlier studies have used point features for the detection of closed loops. Point features, such as the scale-invariant feature transform (SIFT) [2] and speeded-up robust features (SURF) [3], used for visual loop closure detection are computationally expensive and are not suitable for real-time visual SLAM systems [4–6]. To enhance computational efficiency, many researchers have developed bag-of-words (BoW)-based loop closure detection methods [7–10]. These methods store the visual information of the environment as a visual dictionary and generate clusters where each cluster represents a “word”. The computational efficiency of BoW-based loop detection methods is further boosted with the inverted-index approach used for the retrieval of previously visited places [10]. Though the BoW approach provides a fast solution to handcrafted feature-based loop closure detection, it requires a large amount of memory to store the visual words [11].
Most of the BoW-based loop detection algorithms generate a fixed-size vocabulary in an offline step and perform the loop closure detection in an online step to further reduce the computational cost [5,10]. Such methods perform well only if the loop closure detection is performed in a previously known environment and are not practical for unexplored environments. To address this issue, many researchers have created the BoW vocabulary in an online step to enable the system to work in real environments [10]. However, these methods are still inefficient, as the memory requirement increases with vocabulary size. Recent studies using convolutional neural network (CNN) features for loop closure detection have proven to be more robust against the above-mentioned challenges [12–15]. In addition, efforts have been made to reduce the memory usage of stored features through deep neural networks [16]. However, detecting truly closed loops is still an open problem.
Loop closure detection is of key importance to the SLAM system for the relocalization of a robot in a map. Several survey articles have extensively discussed SLAM algorithms and their efforts to improve closed-loop detection; however, a thorough taxonomy is still required to categorize the loop closure detection algorithms. In recent research [17], an extensive comparative analysis of feature-based visual SLAM algorithms was presented. It grouped the existing research into four categories based on visual features, i.e., low-level, middle-level, high-level, and hybrid features, and highlighted their limitations. Another review of SLAM systems is limited in scope to vision-based SLAM algorithms [18]. Similarly, Sualeh et al. [19] developed a taxonomy of SLAM algorithms proposed in the last decade and discussed the impact of deep learning in overcoming the challenges of SLAM, but loop closure detection was not thoroughly highlighted. A detailed survey of deep learning-based localization and mapping is provided in [20]. Other surveys and tutorials focusing on individual flavors of SLAM include the probabilistic formulation of SLAM [21], pose-graph SLAM [22], visual odometry [23], and SLAM in dynamic environments [24]. We recommend these surveys and tutorials to our readers for a detailed understanding of SLAM.
In this survey, state-of-the-art loop closure detection algorithms developed for visual and Lidar SLAM in the past decade are discussed and categorized into three major groups: vision-based, Lidar-based, and deep learning-based loop closure detection methods. To the best of our knowledge, this is the first survey article that provides an extensive study primarily focused on loop closure detection algorithms for SLAM. Moreover, open challenges are also identified.
The rest of the paper is organized as follows: In Section 2, the existing research on vision- and Lidar-based loop detection using conventional approaches is discussed along with its limitations. Section 3 explains the state-of-the-art deep learning-based loop closure detection methods. Section 4 summarizes the challenges of conventional loop closure detection schemes and the existing research on deep learning-based loop detection to overcome these challenges. Finally, Section 5 concludes the survey.

2. Taxonomy of Loop Closure Detection


One of the essential parts of SLAM is the recognition of previously mapped places, which eliminates the incremental drift of the pose estimate. During loop closure detection, the incremental estimation process alone cannot be trusted because of its accumulated inconsistency. Thus, a dedicated algorithm is needed for the relocalization of the vehicle in a prebuilt map. In this section, the loop closure detection techniques are grouped into two major categories: vision-based and Lidar-based loop closure detection. Each category further groups the existing research on the basis of data acquisition and matching methods.

2.1. Vision-Based Loop Closure Detection


As imagery provides rich visual information, most of the methods make use of camera sensors for loop detection. Based on the matching schemes, we have grouped the vision-based loop detection methods into image-to-image matching [25], map-to-map matching [26], and image-to-map matching [27] schemes, as done in [28].

2.1.1. Image-to-Image Matching


Loop closure detection methods that perform image-to-image matching using correspondences between visual features are grouped in this category. These methods do not require the metric information of the features; instead, they rely on topological information.
The bag-of-words (BoW) model [29] has been widely used for loop closure detection. It first generates a vocabulary consisting of visual words, where each word is a combination of features extracted from a large training dataset. These features are clustered using the K-means clustering algorithm [30], as it is effective for unsupervised learning [31]. While searching for a match for the current image, the BoW method converts the image into a set of descriptors and, for each descriptor, searches for the closest cluster center to generate a BoW vector, which is used for image matching against previously seen images. The vocabulary generation process is done as a preprocessing step, either offline or online.
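To make the matching step concrete, the following minimal Python sketch builds a vocabulary from training descriptors, quantizes an image into a BoW vector, and scores the similarity between two images with the cosine measure. It is an illustration, not the implementation of any cited system; the vocabulary size k and the use of scikit-learn's KMeans are assumptions.

```python
# Hypothetical BoW matching sketch; local descriptor extraction (SIFT/ORB, etc.)
# is assumed to have been done elsewhere.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(training_descriptors: np.ndarray, k: int = 1000) -> KMeans:
    """Cluster training descriptors; each cluster center acts as one visual 'word'."""
    return KMeans(n_clusters=k, n_init=10).fit(training_descriptors)

def bow_vector(descriptors: np.ndarray, vocab: KMeans) -> np.ndarray:
    """Assign each descriptor to its closest word and build a normalized histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """Cosine similarity of two BoW vectors; scores near 1 flag loop candidates."""
    return float(np.dot(v1, v2))
```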

Methods with Offline Vocabulary


This category includes the loop detection techniques using the bag-of-words approach, where image features are discretized in the descriptor space and a unique vocabulary word is assigned to each group of similar visual or binary features. This vocabulary is generated in a hierarchical structure, which enhances the matching performance.
In [8], a probabilistic framework, FABMAP, is presented for appearance-based place recognition and loop closing. FABMAP not only detects previously visited places on the map but also identifies new places and augments the map. The algorithm applies the Chow–Liu tree [32] for building a generative model of a visual BoW vocabulary with 11,000 words. These visual words consist of groups of visual features extracted from the images using a SURF detector/descriptor [33]. The system has been evaluated on an outdoor environment dataset. The complexity of FABMAP is linear in the number of places on the map. Although FABMAP achieved high performance, it was suitable only for trajectories of a few kilometers. Also, the system performance degraded in environments other than the training data. As FABMAP uses SURF features, it requires 400 milliseconds for the feature extraction process alone.
To enhance the applicability of FABMAP in large-scale environments, FABMAP 2.0 [34] was proposed and evaluated on 70 km and 1000 km trajectory datasets. It also applied the inverted index with the BoW model for place recognition, generating a vocabulary of 100,000 words, which improves the overall performance of the system in terms of loop detection and resource consumption. It has become the gold standard for loop detection, but its robustness decreases if similar structures appear in the images for a long time [35].
Vocabulary generated using visual features such as SIFT and SURF provides high performance because of invariance to light, scale, and rotation. However, these features require a long computation time [35–38]. This problem was addressed by the usage of binary features such as BRIEF [39], BRISK [40], and ORB [41]. As their information is compact, they are fast to compute and compare, thus allowing much faster place matching, as illustrated by the sketch below. Binary features were used for the first time in [9,42], with the FAST detector and BRIEF descriptor, to build a vocabulary of binary words. These systems can perform loop detection and verification in an order of magnitude less time than other similar techniques. As BRIEF is not invariant to significant scale and rotation changes, these methods are suitable for loop detection with planar camera motion.
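For intuition, comparing two binary descriptors reduces to an XOR followed by a bit count, as in this small illustrative sketch (the 256-bit descriptor length matches ORB/BRIEF-style descriptors but is otherwise an assumption).

```python
import numpy as np

def hamming(d1: np.ndarray, d2: np.ndarray) -> int:
    """Hamming distance between binary descriptors packed as uint8 (32 bytes = 256 bits)."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

# Two 256-bit descriptors differing in two bits -> distance 2; a small
# distance indicates a likely match between features.
a = np.random.randint(0, 256, size=32, dtype=np.uint8)
b = a.copy()
b[0] ^= 0b00000011  # flip two bits
assert hamming(a, b) == 2
```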
The BoW-based loop closure detection methods depend on the appearance features
and their existence in the dictionary, ignoring the geometric information and relative
position in the space, thereby resulting in false loops due to similar features appearing in
different places [43]. Also, in the presence of dynamic objects, the similarity in the loop
scene is reduced, thus causing the system to lose stability.

Methods with Online Vocabulary


The above-mentioned schemes build the vocabulary in an offline step. Systems trained on a prebuilt vocabulary show high performance on the same dataset, but the performance degrades if the traversed places are inconsistent with the training dataset. This problem is addressed by generating the vocabulary in an online step.
An offline vocabulary is not suitable for dynamic robot environments. In [44], a loop closure detection method for dynamic indoor and outdoor environments is developed by incrementally generating and updating the vocabulary in an online step through feature tracking among consecutive frames. The loop candidates are identified by a likelihood function that is based on the inverse frequency of corresponding image features. After the likelihood evaluation, the vocabulary is updated based on new features extracted from the current image. Through extensive experiments, it is shown that the incremental vocabulary generation achieves a higher number of true positives in comparison to [37].
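As a rough illustration of the inverse-frequency idea, the sketch below weights each word by how rarely it occurs across past frames, so distinctive words dominate the vote for loop candidates. This is hypothetical: the exact likelihood in [44] differs, and `inverted_index` is an assumed structure mapping a word id to the set of past frames containing it.

```python
import math
from collections import defaultdict

def loop_likelihoods(current_words, inverted_index, num_frames):
    """Score past frames; rare words (low frequency across frames) vote with more weight."""
    scores = defaultdict(float)
    for w in set(current_words):
        frames = inverted_index.get(w, set())
        if not frames:
            continue  # unseen word: it would be added to the vocabulary instead
        weight = math.log(num_frames / len(frames))  # inverse-frequency weight
        for f in frames:
            scores[f] += weight
    return scores  # the highest-scoring frames become loop closure candidates
```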
The BoW provides fast and easy loop detection. However, the performance of such systems is highly dependent on the appearance of a place. Thus, they suffer from the perceptual aliasing problem, i.e., the occurrence of similar features in different places or a drastic change in the appearance of a place due to variation in environmental conditions. Kejriwal et al. [45] generated a bag-of-word-pairs dictionary using quantized SURF features, incorporating the spatial co-occurrence information of image features to improve the recall rate, and reported better performance than [44].
To deal with occlusion due to the presence of dynamic objects, SIFT features have been used to enhance the loop detection accuracy for monocular SLAM [46]. Appearance changes caused by moving objects are detected by projecting features from keyframes into the current image and comparing them; tracking is performed through feature matching, so dynamic changes in the environment are detected through gradual changes in image portions. As SIFT features are computationally expensive, the system ensures real-time performance through GPU acceleration and multithreaded programming. Similarly, in [47], SURF and BRIEF features have been extracted to perform the word training for loop detection in long-term autonomous driving. To improve the detection accuracy of BoW-based closed-loop detection in a dynamic environment, Xu et al. [48] discriminated between feature points belonging to static and dynamic objects. The algorithm first detects and removes the feature points belonging to the dynamic objects and then generates the BoW vocabulary using the static features.

2.1.2. Map-to-Map Matching


Methods performing map-to-map feature matching detect loops by using the visual features and relative distances between features common to two submaps. In [26], loop closure detection is performed in monocular SLAM by using the geometric compatibility branch and bound (GCBB) algorithm, which matches submaps based on the similarity of the visual features common to both submaps and their relative geometry. However, the system is not suitable for sparse maps [28]. The major limitation of such methods is that the maps are either too sparse to be distinctive or too complex to be completely explored for high performance in real time. For such methods, the exploration space can be reduced by using the position information of the map features, as done in [49].

2.1.3. Image-to-Map Matching


This group includes the loop detection methods which use the correspondences between the visual features of the current camera image and the feature map. In image-to-map matching, the aim is to determine the camera pose relative to the point features in the map; matching is based on appearance features along with their structural information.
In [25], a feature-tracking and loop closing method is proposed for monocular SLAM using appearance and structure information. At each time step, 16D SIFT features are extracted from the current image, representing the appearance information, and matched with the map features using the BoW model. The BoW appearance model helps to identify the part of the map that is similar to the current image by comparing image features with the map features and generating the loop closure candidates. The map is stored as a graph where each node stores the structure of landmarks. The structure of loop candidates is matched through landmark appearance models. The current pose relative to the map features is further determined by MLESAC [50] and the three-point pose algorithm [51].
Williams et al. [52] proposed a method for camera-pose estimation relative to the map for relocalization and loop closing through an image-to-map matching scheme. The feature map is built using the visual and metric information of landmarks. The appearance information of map features is learned using a randomized tree classifier [53], and correspondences between the current image and the map are generated by landmark recognition. Once the landmarks are recognized in an image, the camera pose is determined using the estimated metric information. For this purpose, a global metric map is divided into submaps. The relative positions of these submaps are determined by mutual landmarks. The global map is represented by a graph where each node is a submap and edges between the nodes represent the transformations between submaps. Tracking is performed between the current and the previous submap, thus merging the maps. In the case of true overlap, the relative transformation between submaps is determined by the poses from their trajectories, and an edge is added between the two submaps, thus representing a detected loop. Though it performs well in relocalization and loop closing, the randomized classifier is memory-inefficient.
Xiang et al. [54] proposed direct sparse odometry with loop closure detection (LDSO) as an extension of direct sparse odometry (DSO) [55] for monocular visual SLAM. The DSO ensures the robustness of the system in a featureless environment. To retain the repeatability of the feature points, LDSO extracts ORB features from keyframes. The loop closure candidates are selected using the BoW approach as used in [9]. Later, RANSAC PnP [56] is applied for the verification of loop candidates. Mur-Artal et al. [57] addressed the relocalization and loop closure problem in keyframe-based SLAM by using ORB features. The proposed solution applies the image-to-map feature matching scheme and is robust to scale changes from 0.5 to 2.5 and viewpoint changes of up to 50 degrees. Loops can be detected and corrected within 39 milliseconds per frame on a database of 10,000 images. Due to scale and viewpoint invariance, the proposed method achieves a higher recall rate in comparison to [8,9,34].

2.2. Lidar-Based Loop Closure Detection


Vision-based loop closure detection for a long-term autonomous system is a challenging task due to large viewpoint and appearance changes. When such systems revisit a place, they are subject to extreme variations in season, weather, illumination, and viewpoint, along with dynamic objects. These environmental changes make robust place recognition extremely difficult. These limitations can be handled, to some extent, by Lidar, as Lidar measurements are less prone to light and environmental changes in comparison to vision sensors and provide a 360-degree field of view. Unlike vision-based loop detection, research on Lidar-based solutions is rare. One of the reasons could be the high cost of Lidar sensors, which prevents their wider use. Another reason is that Lidar point clouds contain only geometric information while images contain rich information; thus, place recognition is a challenging problem when using point clouds. The existing research on Lidar-based loop detection can be generally grouped into histogram- and segmentation-based methods.

2.2.1. Histograms
Histogram-based methods extract feature values from points and encode them as descriptors using global features [58–60] or selected keypoints [61–64]. One of the approaches used by these methods is the normal distribution transform (NDT) histogram [65,66], which provides a compact representation of point cloud maps as a set of normal distributions. In [67], an NDT histogram is used to extract the structural information from Lidar scans by spatially dividing the scans into overlapping cells. The NDT is computed for each cell, and instances of certain classes of NDT in range intervals constitute the histograms. The authors compared the histograms using the Euclidean distance metric [68]. It is shown that the structural information provided by the histograms of NDT descriptors improves the accuracy of the loop detection algorithm. A similar approach is used in [69] for loop closure detection, where scan matching is performed using the histograms of NDT descriptors.
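The sketch below illustrates the flavor of such an NDT histogram descriptor. It is a simplification under assumed cell size, range intervals, and eigenvalue thresholds (the cited works differ in detail): each cell's point distribution is classified as linear, planar, or spherical from its covariance eigenvalues, class occurrences are histogrammed per range interval, and scans are compared with the Euclidean distance.

```python
import numpy as np

def ndt_class(cov: np.ndarray, eps: float = 1e-3) -> int:
    """Classify a cell's Gaussian as linear (0), planar (1), or spherical (2)
    from the eigenvalue ratios of its covariance."""
    w = np.sort(np.linalg.eigvalsh(cov))[::-1]  # eigenvalues, descending
    if w[1] / (w[0] + eps) < 0.5:
        return 0  # one dominant direction -> linear
    if w[2] / (w[1] + eps) < 0.5:
        return 1  # two dominant directions -> planar
    return 2      # roughly isotropic -> spherical

def ndt_histogram(points: np.ndarray, cell: float = 1.0,
                  range_edges=(0.0, 10.0, 20.0, 40.0)) -> np.ndarray:
    """Histogram of NDT class occurrences per range interval (3 classes x 3 rings)."""
    hist = np.zeros((3, len(range_edges) - 1))
    keys = np.floor(points / cell).astype(int)       # cell index per point
    for key in np.unique(keys, axis=0):
        cell_pts = points[(keys == key).all(axis=1)]
        if len(cell_pts) < 5:
            continue  # too few points to fit a distribution
        c = ndt_class(np.cov(cell_pts.T))
        ring = np.searchsorted(range_edges, np.linalg.norm(cell_pts.mean(0))) - 1
        if 0 <= ring < hist.shape[1]:
            hist[c, ring] += 1
    v = hist.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

# Scans are compared with the Euclidean distance between their histograms,
# as in [67]; a small distance marks a loop closure candidate.
def scan_distance(h1: np.ndarray, h2: np.ndarray) -> float:
    return float(np.linalg.norm(h1 - h2))
```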
NDT histogram-based methods are computationally expensive. To overcome the computational overhead, many researchers have put effort into developing fast loop detection methods. In [58], the performance of the loop detection method presented in [67] was improved, and the computational cost was reduced, by using similarity-measure histograms extracted from Lidar scans that are independent of NDT. Lin et al. [70] developed a fast loop closure detection system for Lidar odometry and mapping that performs similarity matching among keyframes through 2D histograms. Another approach for reducing the matching time is proposed in [62], where place recognition is performed by using 3D point cloud keypoints and 3D Gestalt descriptors [71,72]. The descriptors of the current scan's keypoints are matched with the point cloud map, and a matching score is computed for each keypoint using a nearest-neighbor voting scheme. The true loop is determined by the highest voting score after geometric verification.
Histogram-based methods can handle two major issues: rotation invariance for large viewpoint changes and noise handling for spatial descriptors, as spatial descriptors are affected by the relative distance of an object from the Lidar [61,73,74]. Their major limitation is that they cannot preserve the information of the internal structure of a scene, which makes them less distinctive and causes false loop detections.

2.2.2. Segmentation
Loop detection methods using a point-cloud-segmentation approach are based on shape or object recognition [75–80]. In such methods, segmentation is performed as a preprocessing step because a priori knowledge about the location of the objects to be segmented during robot navigation is needed. Segment maps provide a better representation of a scene in which static objects may become dynamic, and they are closer to the way humans perceive the environment. One of the advantages of such techniques is the ability to compress the point cloud map into a set of distinctive features, which largely reduces the matching time and the likelihood of obtaining false matches. Douillard et al. [81] provide a detailed discussion of several segmentation methods for Lidar point clouds, including ground segmentation, cluster-all, base-of, and base-of with ground for dense data segmentation, and Gaussian process incremental sample consensus and mesh-based segmentation for sparse data. SegMatch [82] uses the cluster-all method for point cloud segmentation and extracts two types of features, eigenvalue-based features and shape histograms. The features are segmented as trees, vehicles, buildings, etc., and matching is done using a random forest algorithm [83]. It is observed that SegMatch requires real-time odometry for loop detection and does not perform well when using only the Lidar sensor. Also, the maps generated by SegMatch are less accurate. A similar segmentation approach is used in [84] to enhance the robustness of loop closure detection by reducing noise and resolution effects. The point cloud descriptor encodes the topological information of segmented objects. However, the performance degrades if the segmentation information is not sufficient. In recent research [85], an optimized Lidar odometry and mapping algorithm is proposed in integration with SegMatch-based loop detection [82] to enhance the robustness and optimization of the global pose. The false matches are removed using ground plane constraints based on RANSAC [86]. Tomono et al. [87] applied a coarse-to-fine approach for loop detection among feature segments to reduce the processing time, where lines, planes, and balls are used for coarse estimation instead of feature points.
Based on the literature reviewed in this section, the benefits and limitations of each
method are summarized in Table 1.

Table 1. Summary of the benefits and limitations of camera- and Lidar-based loop closure detection methods.

Vision-based, Image-to-Image (Offline Vocabulary)
Benefits:
• Does not require the metric information of the features
• Good for loop detection with planar camera motion
Limitations:
• Not suitable for dynamic robot environments
• Dependent on the appearance features and their existence in the dictionary
• Memory consumption is proportional to vocabulary size
• Performance degrades if tested on a different dataset

Vision-based, Image-to-Image (Online Vocabulary)
Benefits:
• Allows features to be learned in real time
Limitations:
• Memory consumption is proportional to vocabulary size
• Does not use geometric information

Vision-based, Map-to-Map
Benefits:
• Detects true loops when common features exist in two submaps
Limitations:
• Not suitable for sparse maps
• Cannot achieve high performance for complex dense maps

Vision-based, Image-to-Map
Benefits:
• High performance when tuned for 100% precision
• Allows online map feature training for real environments
Limitations:
• Memory inefficient

Lidar-based, Histograms
Benefits:
• Provides rotation invariance for large viewpoint changes
• Noise handling for spatial descriptors
Limitations:
• Cannot preserve distinctive information of the internal structure of a scene

Lidar-based, Segmentation
Benefits:
• Compresses large point cloud maps into a set of distinctive features
• Reduced matching time
Limitations:
• Requires prior knowledge of object locations
3. Role of Deep Learning in Loop Closure Detection

In the past few years, deep learning has been introduced into visual and Lidar SLAM systems to overcome the challenges of truly-closed-loop detection [13–16,19,88–90]. Deep learning-based loop detection methods are known to be more robust to changing environmental conditions, seasonal changes, and occlusion due to the presence of dynamic objects [91]. This section presents the state-of-the-art deep learning-based loop closure detection methods using camera and Lidar sensors. The main characteristics of the algorithms are tabulated in Table 2. The table lists the reference of each algorithm followed by the year of publication, the sensors used for environment perception, the type of features used for environment representation, the neural network used by the algorithm, the type of environment for which the algorithm is developed, the loop closure challenges addressed by the algorithm, i.e., variation in weather, seasons, light, and viewpoint, computational efficiency, and dynamic interference in the environment due to moving objects, and the semantics used for environment classification.

3.1. Vision-Based Loop Closing

In deep learning-based visual loop closure detection algorithms, research efforts have been made to overcome the limitations of handcrafted feature-based methods. In recent research [89], a multiscale deep-feature fusion-based solution is presented in which abstract features are extracted from an AlexNet [92] pre-trained on ImageNet [93] and fused with different receptive fields to generate fixed-length image representations that are invariant to illumination changes. A similar approach is used in [94], where features are extracted from a fast and lightweight CNN to improve the loop detection accuracy and computation speed.
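As a rough sketch of this style of pipeline, a pre-trained network can be turned into a fixed-length place descriptor as shown below. This is an illustration with assumptions: global average pooling of one AlexNet conv block stands in for the multiscale fusion of [89], and the 0.85 similarity threshold is arbitrary.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def describe(image) -> torch.Tensor:
    """PIL image -> fixed-length descriptor via pooled conv activations."""
    x = preprocess(image).unsqueeze(0)     # (1, 3, 224, 224)
    fmap = model.features(x)               # (1, 256, 6, 6) conv activations
    v = fmap.mean(dim=(2, 3)).squeeze(0)   # (256,) pooled descriptor
    return v / v.norm()

def is_loop(desc_a: torch.Tensor, desc_b: torch.Tensor,
            thresh: float = 0.85) -> bool:
    """Cosine similarity test between two place descriptors."""
    return float(desc_a @ desc_b) >= thresh
```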
Many researchers have used semantics-based object and scene classification for loop detection through deep learning [95,96]. Semantic segmentation classifies each image pixel according to the available object categories; all pixels in an image with the same object class label are grouped together and represented with the same color. Maps with semantic information enable robots to have a high-level understanding of the environment.
Another approach used by researchers for loop detection is autoencoders. An autoencoder compresses the input frame and regenerates the original image at the output end of the system [88,97]. Merrill et al. [98] proposed an autoencoder-based unsupervised deep neural network for visual loop detection. Illumination invariance is achieved by generating the HoG descriptor [99] from the autoencoder output instead of the original image. The network is trained on the Places dataset [100], which contains images from different places and is primarily built for scene recognition. During robot navigation, similar objects may exist in different places, which can greatly affect the performance of the algorithm when it is executed on a sequence of images along a path. The major limitation of this approach is that the autoencoder cannot indicate which keyframe in the database matches the current image; it can only detect whether the current place has already been visited. Gao et al. [88] used the deep features from the intermediate layer of a stacked denoising autoencoder (SDA) [101] and compared the current image with previous keyframes. This method is time-consuming, as the current image is compared against all previous images. Also, the perceptual aliasing problem is not addressed, which may result in false loops and incorrect map estimations.
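To make the idea concrete, here is a minimal PyTorch sketch of the HoG-regressing autoencoder concept from [98]; the layer sizes and descriptor dimensions are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class LoopEncoder(nn.Module):
    def __init__(self, img_dim: int = 120 * 160, code_dim: int = 128,
                 hog_dim: int = 3648):  # all sizes hypothetical
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(img_dim, 1024), nn.ReLU(),
            nn.Linear(1024, code_dim),      # compact place descriptor
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 1024), nn.ReLU(),
            nn.Linear(1024, hog_dim),       # regress the HoG of the input
        )

    def forward(self, img_flat: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(img_flat))

# Training minimizes ||decoder(encoder(image)) - HoG(image)||^2; at run time
# only the encoder's code is kept and compared between query and keyframes,
# which is what makes the descriptor robust to illumination.
```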

3.2. Lidar-Based Loop Closing

To improve the matching time and detection accuracy of histogram-based loop detection methods, Zaganidis et al. [102] generated an NDT histogram-based local descriptor using semantic information obtained from PointNet++ [103]. In [104], PointNetVLAD [105], which integrates PointNet [106] and NetVLAD [107], is implemented to generate a global descriptor from a 3D point cloud. Similarly, LocNet [108] applies a semi-handcrafted deep network to generate a global representation of scan maps for place matching and loop detection.
In deep learning-augmented segment-based loop detection methods, SegMap [109] produces segments of the scene incrementally as the robot navigates and passes those segments to a deep neural network to generate a signature per segment. A loop is detected by matching the segment signatures, as sketched below. SegMap aims to extract meaningful features for global retrieval, while its semantic class types are limited to vehicles, buildings, and others. The performance of SegMap is further improved in [110].
Though segmentation-based loop detection methods succeed in enhancing the performance in terms of processing time, they are highly dependent on the segmentation information available in the environment. One possible solution is the formulation of a more generic and robust descriptor using segmentation information from multiview Lidar scans.
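The following sketch shows the schematic matching stage of such segment-based pipelines. It is hypothetical: the `SegmentDatabase` class, the signature dimensionality, and the distance threshold are assumptions, and the learned descriptor network itself is not shown.

```python
import numpy as np
from scipy.spatial import cKDTree

class SegmentDatabase:
    def __init__(self, signatures: np.ndarray, place_ids: np.ndarray):
        self.tree = cKDTree(signatures)   # index over past segment signatures
        self.place_ids = place_ids        # which place each signature came from

    def propose_loops(self, query_sigs: np.ndarray, k: int = 3,
                      max_dist: float = 0.5):
        """Vote for past places whose segments match the current scan's segments."""
        votes = {}
        dists, idxs = self.tree.query(query_sigs, k=k)
        dists, idxs = np.atleast_2d(dists), np.atleast_2d(idxs)
        for d_row, i_row in zip(dists, idxs):
            for d, i in zip(d_row, i_row):
                if d <= max_dist:
                    pid = int(self.place_ids[i])
                    votes[pid] = votes.get(pid, 0) + 1
        # Places with enough independent segment matches would then go to
        # geometric verification before being accepted as loop closures.
        return sorted(votes.items(), key=lambda kv: -kv[1])
```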
Table 2. State-of-the-art deep learning-based loop closure detection methods for visual and Lidar SLAM.

Ref. | Year | Sensor | Features | Learning Algorithm
[89] | 2019 | C | CNN feature | AlexNet [92]
[95] | 2019 | C | Semantic class | Faster R-CNN
[96] | 2020 | C | Semantic class | Yolo [111]
[98] | 2018 | C | HoG | Autoencoder
[102] | 2019 | L | Semantic-NDT | PointNet++ [103]
[108] | 2018 | L | Semi-handcrafted | Siamese
[109] | 2018 | L | SegMap | CNN
[110] | 2020 | L | SegMap | CNN
[112] | 2016 | C | SIFT, SURF, ORB | PCANet [113]
[114] | 2020 | L | Semantic class | RangeNet++ [115]
[116] | 2020 | C | CNN feature | ResNet18 [117]
[118] | 2020 | C/L | CNN feature | VGG16 [119]
[120] | 2018 | C | CNN feature | VGG16 [119]
[121] | 2019 | C | CNN Multiview | ResNet-50 [117]
[122] | 2019 | C | Semantic descriptor | Hybrid [123]

L: LiDAR; C: Camera.

4. Challenges of Loop Closure Detection and Role of Deep Learning

Based on the existing literature and the limitations presented in Section 3, the major challenges of loop closure detection methods are identified. This section lists those challenges and the role of deep learning in SLAM systems in overcoming them.

4.1. Perceptual Aliasing

In an environment, the objects may have some visual, geometric, and topological features based on appearance, structure, and the relative position of objects, as depicted in Figure 1. Similar features may appear in different places, such as in many buildings that have the same structure, color, and topological features, in corridors of a building with the same structure, or in doors that have the same geometry. This occurrence of similar features at different locations causes the loop detection algorithm to generate false correspondences and is termed the “perceptual aliasing” problem. Perceptual aliasing is one of the main reasons for the failure of appearance-based loop detection methods. Many BoW-based methods generate false correspondences as they only consider similar visual features for true-loop detection [88]. This problem is well addressed in recent work by combining multiview information of a place, instead of a single view, through deep neural networks.
that the
the
appearhave
the
literature
detection
objects objects
in:
the
loop Outdoor;
same
detection
may and
may
different methods +:
geometry.
their
have
haveplaces Present;
algorithm
limitations
some
are
some such -: Absent.
This
visual,
identified.
visual,as tooccurrence
in generate
presented
geometric,
geometric,
many This of
false
in
section similar
Section
and
and
buildings loop 3, features
correspond-
the
topological
lists
topological
that the
have major at
fea-
fea-
the
[122] 2019 C Semantic tures
features based
based
feature andon
based
works on
Hybrid
bining methods
[121]. ofClosure
appearance,
role [123]
multiview deepstructure,
Also, generate
the
information
Detection
structure,
learning
false
temporal
+ in and
correspondences
of information
aThis
place, and
SLAMand
relative
insteadis
Role
+ relativesystems
asas they
embedded
of single only inin
view,
ofposition
+position Deep
consider
the ofLearning
to+ overcome
descriptors
through similar
deepobjects
of objects
by as depicted
- those
visual
concate-
neural fea-
net-
- in Figure
aschallenges.
depicted
tures
same for true-loop
structure,
different color,detection
and [88].
topological problem
features, is
in well addressed in recent work by com-
based
structure,
different
ences
same
4.1.
tures
Figure Based
challenges
tures In
L:
1. Similar
1.
Based
and
based
an locations
methods
or
on
structure,
Perceptual
basedLiDAR; is
of in
locations
the
on doors
termed
loop
environment,
and
on
features
Similar
on the color,
role
Aliasing
C: causes
generate
existing
of
appearance, that
causes
as
closure
appearance,
Camera;
may
features
existing and
deep the
maythe
false
have
the
aappear loop
loop
“perceptual
literature
detection the
structure,
:objects
topological
learning
structure,
Indoor; detection
correspondences
same
detection
andmay
in aliasing”
:their
methods
and
and
ininformation
appear
literature different
andin SLAMhave
features, algorithm
geometry.
algorithm
relative
relative
Outdoor;
places
different
their are
somein
systems corridors
problem.
limitations they
This to
identified.
position
visual,
corridors
position
+:places
limitations to
Present;
such to
assuchgenerate
only
occurrence
generate
Perceptual
presented
-:
in of Thisaain
objects
geometric,
overcome
of
presentedin
objects
Absent.
many
as building
consider
in false
of
false
Section
section
building as
those and
buildings
many
in
loop
similar
loop
asaliasing with
similar
depicted
depicted
Section topological
with
challenges.
buildingsthat the
correspond-
visual
features
3,correspond-
lists is
the
theone
inin
the
3,neural
the
same
offea-
major
major
Figure
same
Figure
havethat
major
at
the
fea-
the
tures
nating
works
bining
4.
tures
different
ences for
the
and
for
and true-loop
[121]. descriptors
multiview
Challenges
structure,
ences or isin
locations
is Also,
of
termeddoors
termed
true-loop detection
the
causes
as of
information
Loop Closure
that
as
detection
a a have [88].
consecutive
temporal of
“perceptual
the loop
“perceptual a
the
[88]. This
place,
Detectionsame
This problem
frames.
detection instead
and Through
geometry.
aliasing”
problem
aliasing” is is
Role
algorithm well
embedded
of single
problem.
is
problem. of
This
well addressed
experiments,
Deep
to in
view,
occurrence
Perceptual
addressed
generate
Perceptual the
Learning in it
in recent
of
false is
descriptors
through shown
deep
similar
aliasing
recent
loop
aliasing work is
work that
by by
features
one
correspond-
is one byofcom-
image
concate-
ofnet-
com- at
the
the
main
challenges
challenges
1.
1.4.1.
have Similar
same the reasons
Similar
tures
structure, based of
sameorandloop
features
on
in
features
structure,
Perceptual for
role
structure, the
closure
of
may
appearance,
doors may
color,
Aliasing failure
deep
thatappear
and detection
appear
have
color, of
learning
structure,
the
in
and
topologicalappearance-based
methods
in
indifferent
different
same SLAM
and
topological are
systems
places
relative
geometry.
places
features, loop
identified.
such asas
position
such
features,
in This
corridors detection
in
in ofThis
overcome
incorridors
many
occurrence
many objects
in methods.
section
those
buildings
assimilar
ofand
a buildings
in
building lists
adepicted Many
the
challenges.
that
that
building
with major
have
in
features
have
the BoW-
Figure
with the
same the
at
bining
nating
works In
challenges
descriptors an
the
[121]. environment,
multiviewof loop
generated
descriptors
Also, closure
information
the of the
from objects
detection
the
consecutive
temporal of may have
methods
aappearance-based
place,
sequence frames.
information instead some
are
ofalgorithm
images
Through
is of visual,
identified.
embeddedsingle
are geometric,
view,
more
experiments,
in This
the section
through
robust it is
descriptors and topological
lists
deep
shown the
neural
distinctive
by that majorfea-
net-
image
concate- in
different
main
bining
ences
main
based
challenges and locations
reasons
multiview
reasons
Based is
4.methods
andon termed
Challenges for
for
the
role causes
the
the as
existing
generate
of failure
information
a
failure
deep the
false loop
“perceptual
of
literature
learning ofof adetection
place,
aliasing”
appearance-based
and
correspondences
insame
SLAM instead
their limitations
systems of
problem. loop
loop
asgeometry.
they
to to
single generate
detection
view,
Perceptual
detection
presented
only
overcome consider false
methods.
through
aliasing
methods.
in
those loop
Section deep
similar correspond-
isMany
Many
challenges.3, neural
one
the
ofvisual of BoW-
BoW-
major net-
the
fea-
thesame
4.1.
same
1. same
structure,
tures
works
structure,
Perceptual
differentstructure,
Similar
based
challenges
In
comparison
descriptors
structure,
an
[121]. or
on
and Aliasing
features
locations in color,
or
doors ofof
color,
may
causes
appearance,
role
environment,
toAlso,
the
generated the inand Loop
and
doors
that
deep the
temporal
descriptors have Closure
topological
topological
appear
the loop
structure,
learning in
that
objects the in
may
Detection
different
detection
have and
information
generated
features,
features,
the
SLAM places
algorithm
same
geometry.
relative
have
from
inand
issystems
some in
embedded
single
Role
corridors
corridors
such This
position
visual,
image.
ofmany
toasgenerate
to in Deep
in
occurrence
of
overcome in
This
objects
geometric,
in the
Learning
aabuilding
building
buildings
false
occurrence
ofand
asis
itthose
descriptors
loop
similar
depicted with
with that
challenges.
topological
by in the
the
correspond-have
similar
features same
same
Figure
concate-
the
at
fea-
nating
main
based
ences
works
based
challenges
4.1.
tures
structure,
same
structure,
ences
features
different
the
reasons
methods
and[121].
methods
Perceptual
for
at
descriptors
is
of
true-loop
or
structure,
and or
is in for
termed
Also,
loop
in
termed
different
locations
the
generate
generate
Aliasing
doors
doors
of
asthat
the
closure afrom
failure
detection
color,as that
a
locations
causes and
consecutive
false
have
have
the
of
“perceptual
temporal
false
detection
“perceptual
the [88].
causes
sequence
the
topological
the
loop This frames.
appearance-based
correspondences
aliasing”
information
correspondences
methods
same
same
the problem
detection
ofThrough
images
geometry.
features,
geometry.
aliasing”
loop isproblem.
are isin
problem.
detection
algorithm
loop
embedded
as they
well
are
asidentified.
they
This
This
more
experiments,
detection
only
Perceptual
only
addressed
corridors occurrence
occurrence
Perceptual
algorithm
to generatein
robust
consider
inconsider
the
This a methods.
in
to of
and
shown
aliasing
descriptors
section
recent
of
building similar
similar lists
similar
similar
aliasing
generate
false loop
distinctive
Many
is
work
with
is
that
visual
one
by
visual
the
features
one
false
image
byBoW-
of
concate-
major
features
the
correspond- of
loop the
fea-
com-
same
the
in
fea-
atat
1.
nating Inbased
Similar
tures
comparison
descriptorsthe Based
an features
environment,
on
descriptors
tofor on
may the
appearance,
the
generated of
descriptors
from existing
the
appear objectsin
structure,
consecutive
the literature
may
different
generated
sequence have
and
frames. and some
places
relative
from
of their
Through such
single
images limitations
visual,as in
position
image.
are geometric,
moremany
ofconsider
experiments, presented
objects
robust itit and
buildings
assimilar
is indistinctive
depicted
shown
and Section
topological
that thathave
inimage3,fea-
Figure the
thein major
based
tures
main
nating
tures
4.1.
bining
different methods
for
reasons
for
correspondences
structure,
different
main the
Perceptual
challenges
reasons true-loop
and
multiview
locations
or in generate
descriptors
true-loop
Aliasing
role
andthe
doors detection
ofisfailure
of
detection
deep
information
causes false
termed
that the of
consecutive correspondences
[88].
[88].
learning
have of
loop
as thea
aThis
This in
place, problem
appearance-based
frames.
problem
SLAM
detection
“perceptual
same insteadThrough
isas
is
systems
algorithm
geometry. ofthey
well
loop
well
aliasing” to
single
This only
addressed
detection
experiments,
addressed
toovercome
view,
generate
problem.
occurrence inin recent
methods.
those
through
false is
recent shown
Perceptual
of work
workMany
challenges.
deep
loop
similar visual
thatby
neuralby
correspond- BoW-
aliasing
features fea-
com-
image
com- net-at
ences
tures
4.1.
same
1. In anlocations
and
challenges
based
Perceptual
structure,
Similar
descriptors is
on for
environment,
termed
features causes
the
of
appearance,
Aliasing
generatedcolor, may failure
as
loop
and the
from
the
aappear loop
of
objects
“perceptual
closure
structure,
topological
the in detection
appearance-based
may
detection
and
different
sequence have
aliasing” algorithm
some
methods
relative
features,
ofplaces
imagesin loop
visual,
problem. are
position to
corridors
such are as generate
detection
geometric,
Perceptual
identified.
of in
in
more objects
many arobustfalse
methods.
as
building and
buildings loop
aliasing
This and section
depictedwith correspond-
Many
topological
is the
that one
in
distinctive
BoW-
listsof
Figure
have fea-
same the
the
inin major
the
comparison
based
bining
tures
bining
works formethods
descriptors multiviewto
true-loop
multiview
[121]. the
Also, descriptors
generate
generated information
detection
information
the fromfalse
temporal generated
the
[88].correspondences
ofof a a
Thisplace,
sequence
place, from
problem
information instead
of
instead single
is isas
imagesofofthey
well image.
single
single
embedded areonly
addressedview,
more
view,in consider
the through
robust
in
through similar
recent
descriptors anddeep
deepwork visual
neural
distinctive
neural
by by fea-
com- net-
net-
concate-
ences
istures
one
ences
main
1. of
different
based
Similar
structure,anand
and
Inbased the
methods isis
maintermed
locations
termed
environment,
reasons
challengeson
features
or in generate
appearance,
for
doors the
and
may as
reasons
causes
asthat
the afalse
aand
failure
role
appear “perceptual
for
the
“perceptual
objects the
loop
structure,
of of
deep
in failure
may aliasing”
detection
correspondences have
and
appearance-based
learning
different of
aliasing” some
relative
in
places SLAMproblem.
appearance-based
algorithm
as
problem.they
visual,
position
such loop asto
only
systems Perceptual
generate
Perceptual
geometric,
in ofmany toloop
consider
objects
detection and
asaliasing
detection
false
aliasing
methods.
overcome
a buildings loop
similar isis
topological
depicted those
thatMany one
methods.
correspond-
visual
one
in
have ofoffea-
fea-
Figure
BoW-
challenges.thethe
the
same
comparison
tures
works
4.1.
structure,
Infor
comparison an
Perceptual
bining
works multiview
[121]. to the
environment,
true-loop
[121]. toAlso,
the
color,
Aliasing
Also, descriptors
detection
the
information
the thehave
temporal
descriptors
temporal [88].
of
the
topological
generated
objects same
may
This
generated
adifferent
place,
information
geometry.
features,
from
have
problem
information from
instead is single
some is
in
issingle
of
This
corridors
well
embedded
embedded image.
visual,
single
occurrence
addressed
image. view,
in
in
geometric,
inthey
thethe in
through
of
building
andsimilar
recent
descriptors
descriptors deep
with
topological
work
features
by
neural
by
the
by
same
fea-
com-
concate-
concate- net-
at
nating
main
Many
ences
tures
main
tures
1. for
Similar
based
same the
reasons
BoW-based
and
reasons
based methods
structure,ondescriptors
is
true-loop
features for
termed
for the
methods
the
appearance,may
generate
color, as of
failure
detection a
failure
appear
and consecutive
generate
false of
structure,of
“perceptual
[88].in
topological This frames.
appearance-based
falsealiasing”
problem
appearance-based
and
correspondences relative
places
features, Through
correspondencesproblem.
is
in well
as loop
loop
position
such they experiments,
as
corridors detection
as
Perceptual
addressed
detection
of
in
only objects
many
in a inonlyit
methods.
as isconsider
methods. shown
aliasing
recent
depicted
buildings
consider
building work
similar
with Many
is
Many
that inthat
have
visual
the image
similar
one
by BoW-
of
com-
BoW-
Figure
same thethe
fea-
different
structure,
tures based locations
or
on in causes
doors
appearance, thatthe have loop
structure, the detection
same
and algorithm
geometry.
relative to generate
This
position occurrence
of objects false
as loop
ofdepicted
similar correspond-
features
in Figure at
bining
nating
works
nating multiview
the
[121].
the
descriptors descriptors
Also,
descriptors
generated information
the of
temporal
of consecutive
consecutive
from of
the aappearance-based
place,
informationframes.
frames.
sequence instead
of Through
is of
embedded
Through
images singleare view,
experiments,
inin
experiments,
more the through itand
descriptors
it
robust is deep
issimilar
shown
shown
and neural
by that
that concate-
distinctive imagenet-
image
visual
1. based
based
main
bining
Similar
same
tures
ences
1.
features
In methods
an
methods
reasons
multiview
features
structure,
structure, for
4.1. for
environment,
true-loop
orPerceptual
and features
different
Similar in doors
is termed
locations true-loop
generate
generate
for may
color,the
information
and
detectionthe
appear
Aliasing
asthat
causes
may aappear detection
false
false
failure objects
in
topological
have
“perceptual
the [88].
the
loop
in
may
This
same[88].
correspondences
correspondences
of a place,
different haveThis
instead
places
features,
problem
detection
different geometry.
aliasing” problem
some as
of
in
isas
such
problem.
algorithm
places such
they
visual,
they
loop
single
as
corridors
well
This as
is well
only
only
in geometric,
detection
view,
many
addressed
occurrence
Perceptual
toin generate
many
addressed
consider
consider
athrough methods.
buildings
building
in recent
of similar
aliasing
false
buildings
in
similar
deep
loop recent
topological
that
with work
that
visual
Many
neural
have
the
features work
visual
by
iscorrespond-
onehave
fea-
BoW-
net-
the
same
com-
of thetheatin
fea-
bynating
works
descriptors
comparison
tures
based
works
same the
descriptors
[121].
combining
tures for
based
formethodsdescriptors
on
true-loop
[121].
structure, generated
Also,
generated
to the
multiview
true-loop the
appearance,
Also, generate
color, of
from
descriptors
detection
detection
the and consecutive
from
temporal
temporal false the
the
information[88].
structure,
[88].
topological sequence
generated
This
This frames.
sequence
information
information of
and
correspondences of
problem ofThrough
from is
ageometry.
place,
relative
problem
features, is images
inembedded
images
single
isisinstead
as well
position
they
well
embedded
corridorsexperiments,
are
are
image. more
in
more
of
addressed
of
only
addressed
in the
insingle robust
objects
aconsider
the in it
robust
in asis
descriptors
view, shown
and
recent
descriptors
building and
depicted
recent similar
with work
work bythat
distinctive
by
distinctive
through thein by image
concate-
deep
by
Figure
visual
same com-
com-
concate- inin
fea-
structure,
bining
different
main
ences reasons
and or
multiview
is
Inin
locations doors
for
termed
an the that
information
causesfailure
as
environment, have
a the loop
of
“perceptual the
of same
adetection
place,
appearance-based
the objects instead
aliasing”
may algorithm of This
single
loop
problem.
have some to occurrence
view,
generate
detection
Perceptual
visual, throughof
false
methods. similar
loop
aliasing
geometric, deep isfeatures
neural
correspond-
Many
and one BoW-
of
topological at
net-
the
same
1.
structure,
descriptors
comparison
nating
comparison
neural
bining
Similar
tures
bining
nating
structure, the
networks
multiview
for generated
toto
descriptors
features
true-loop
multiview
the or the
descriptors
in doors
color,
the
[121].may
and
from
descriptors
ofappear
descriptors
Also,
information
detection
information
of
that
topological
the
consecutive
the
consecutive
have in
[88].
of
the sequence
generated
generated
temporal
of a a
different
This
place,
same
features,
frames.
place,
frames. of
from
from
places
problem
instead
geometry. images
Through
information
instead
Through
in
single
single of
such
is
of
corridors
are
image.
single
well
single
This as more
experiments,
image.
is embedded
in
in
view,
many
addressed
view,
experiments,
occurrence
a building
robust it is
in
through is
buildings
in
through
it
of theand
shown
recent
shown
similar
with
descriptors
deep
deep that
work
the
distinctive
that
neural
have
neural
that
features by
same
image
imageby in fea-
net-
the
com-
net-
at
different
works
ences
based
main and
structure, locations
[121].
methods
reasons
tures is Also,
termed
based
orgenerated
inthe for
doors causes
generatethe
as
the
on that a
failurethe
temporal
appearance,
have loop
“perceptual
false detection
information
correspondences
ofthe aliasing”
appearance-based
structure,
same from algorithm
and
geometry. is embedded
problem.
as they
loop
relative
This to generate
only in
Perceptual
detection
position
occurrence the
consider false
descriptors
aliasing
of objects loop
similar
methods.
of similar as correspond-
is by
one
visual
Many
depicted concate-
features atof the
fea-
BoW- in Figure
descriptors
comparison
concatenating
works
same
bining
works
different
ences
nating [121].
structure,
descriptors multiview
[121].
and locations
the totheAlso,
generated
Also,
isdescriptors
termed descriptors
color, the
the
causes from
descriptors
atemporal
and
information
from
temporal
asfailure
of the ofthe
topological
the
loop
“perceptual
consecutive ofsequence
generated
consecutive
information
a place,
sequence
information
detectionaliasing”
frames. of
features,
instead
of images
frames.single
isproblem.
images
is
algorithm
Throughembedded
in of
embedded are
image.
Through
corridors
single
are
to more in
inas
view,
more
in
generate
Perceptual
experiments, the robust
experiments,
the
arobust
through
descriptors
false and
descriptors
building and
loop
aliasing
it is itdeep
shownis
withdistinctive
shown
byby the
neural
distinctive
correspond-
is one that
concate-
same
concate-
that of
image in
net-
thein
main
tures
based
differentreasons
for1. true-loop
methods
Similar
locations for the
detection
generate
features
causes false
may
the of
[88]. appearance-based
appear
loop This problem
correspondences
in
detection different is
algorithm loop
well
as they
places to detection
addressed
only
such
generate in
consider
in methods.
manyrecent
false work
similar
loop Many
buildings by
visual
correspond- BoW-
com-
that fea-
have the
comparison
image
nating
works descriptors
structure,
comparison
nating
ences
main and the
isor
[121].
the
reasons to
in
to
termed the
descriptors
doors
Also,
the
descriptors
for descriptors
generated
the
as
the aofof
that temporal
descriptors from
consecutive
have
consecutive
“perceptual generated
thethe samesequence
frames.
information
generated frames. from
geometry.
from
aliasing” single
of
Through
is
single
ofThrough images
embedded
problem. image.
This
image. are
experiments,
Perceptualmore
experiments,
occurrence
in the robustitis
of is
descriptors
itin
aliasing and
shown
similar
shown distinctive
iswork bythat
features
that
one image
concate-
image
of the at
descriptors
based
bining
tures
ences methods
multiview
for
same
and generated
true-loop generate
structure,
is termed asfailure
afrom
information
detection
color,false of
the appearance-based
[88].
and
“perceptual asequence
correspondences
of place,
This
topological instead
problem
aliasing” images
as
features,
problem. loop
is they
of single
well are
indetection
only more
view,
addressed consider
corridors
Perceptual methods.
robust
through
in recent
aliasing and
asimilar
deep
building Many
distinctive
is visual
neural
onewithbyBoW-
of fea-
net-
com-
the in same
the
indifferent
maincomparison
descriptors
descriptors
nating
based reasons locations
the
methods
comparison
tures for true-loop to
generated
for
to the
generated
descriptorsthe
generate
the descriptors
causes
failurefrom
from
of
descriptors
detection the
false of the
loop
the
consecutive
[88]. generated
sequence
detection
sequence frames.
appearance-based
correspondences
generated
This problemfrom
ofof
from single
images
algorithm
images
Through
as
singleloop
they
isgeometry.
well image.
are
to
are more
generate
more
experiments,
detection
only
image.
addressed robust
robust
consider false
it
methods.
in isandand
loop
shown
similar
recentdeep distinctive
correspond-
distinctive
Many
work that
visualbyBoW-image
BoW- fea-
com- inin
works
bining
main [121].
multiview
structure,
reasons Also,
for or the
the temporal
information
infailure
doors of of
that information
a place,
have the instead
appearance-based same is embeddedofloop
single in
view,
This
detection theoccurrence
descriptors
through
methods. by
ofMany concate-
neural
similar net-
features at
comparison
ences
bining
nating
works and
descriptors
comparison
based
tures methods
for is
multiview
the to
true-loop to
termed
descriptors
[121].
different the
generated
the
generate
Also, descriptors
as a
information
the
locations of from
descriptors
detection “perceptual
false
consecutive
temporal
causes the
ofgenerated
a sequence
generated
correspondences
[88]. This
place,aliasing”
problem
frames.
information
the loop from
from of
insteadThrough
detection issingle
problem.
images
single
asisof they
well image.
single
embedded are
image. Perceptual
only more
addressed
view,
experiments,
algorithm consider
in tothe robust
in
throughitaliasing
similar
recent
is and
deep
shown
descriptors
generate work
false is one
distinctive
visual
neural
that
by
loopby offea-
com-
image the
net-
concate- in
correspond-
based methods generate false correspondences as they only consider similar visual fea-
main
tures
biningforreasons
comparison true-loop
multiview toforthe the failure
descriptors
detection
information [88].ofof appearance-based
agenerated
This
aThis
place,problem from
instead issingle
well loop detection
image.
addressed methods.
indescriptors
recent work Many by BoW-
com-
works
descriptors
Figure
nating
tures for[121].
1. (a)
the
ences Also,
generated
Same
descriptors
true-loopand isthe
place temporal
from
with
termed
detection the
different
of consecutive
as[88]. information
sequence
appearance
“perceptualframes.
problem is
and isofwell
embedded
of aliasing”
images
Through (b)single
are
similar view,
inlooking
more
experiments,
problem.
addressed the through
robust it isand
indifferent
Perceptualrecent deep
shown neural
by
distinctive
places.
aliasing
work concate-
that
by is net-
image
com- in of the
one
based
bining
works
nating
comparison
Figure methods
multiview
[121].
the
descriptors Also,
descriptors
to
1. (a) Same generate
information
theplace
generated the of
descriptors
with false
temporal
consecutive
from of correspondences
a place,
information
generated
thedifferent
the instead
frames. from is
Throughof
of single as they
single
embedded image. only
view,in
experiments, consider
through
the descriptors
it is similar
deep
shown visual
neural
by
that concate-
imagefea-
net-
bining main
multiview reasons for
information failure appearance
of asequence of appearance-based
place, instead and of(b)
images similar
singleareview,more
loop looking robust
detection
through different and places.
distinctive
methods.
deep neuralMany net-in BoW-
tures
works
nating
4.2. for
[121].
descriptors
comparison
Figure the true-loop
based
1. Also,
(a)descriptors
Variation in
Sameto detection
the temporal
of
Environmental
generated
the
methodsplace from
descriptors
generate
with
[88].
consecutive
the This
information
Conditions
different sequence
generated
false problem
frames. ofis
from
correspondences
appearance images
and
is well
embedded
Throughsingle addressed
are in
experiments,
asmore
image.
(b) similar they theonly init
robust recent
descriptors
is and
shown
consider workbysimilar
distinctive byimage
concate-
that com- in
visual fea-
works [121]. Also, the temporal information is embedded inlooking
the different
descriptors places.
by concate-
bining
nating
Figure the
descriptors
4.2. The multiview
1. descriptors
Variation
comparison (a)
tures
loop generated
to
Same
forinthe
closure information
ofis consecutive
from
Environmental
descriptors
place
true-loop withan open the of a place,
generated
different
detection frames.
sequence
Conditions
appearance
problem [88]. instead
from
This
due Through
of images
single
and
problem
to of(b) single
variation are
image.
similar
is view,
experiments,
more
well
in looking through
addressed
illumination itdifferent
robust is shown
and deep
in neural
that
distinctive
places.
recent
conditions image
work net-
[124], inby com-
nating
Figurethe 1. (a)descriptors
Same placeofwith consecutive frames. Through
different appearance and (b) experiments,
similar lookingitdifferent is shown that image
places.
works
descriptors
comparison
4.2. [121].
Variation
bining toAlso,
generatedthe
inmultiview the temporal
from
descriptors
Environmental the
information information
sequence
generated
Conditions of afromof
place, isdepicts
images embedded
single
instead are
image. more
of in the
single robust descriptors
view,in and through by
distinctive concate-
deep in
neural
seasons The
descriptors [125],
loop and
closure
generated viewpoints
is
from an open
the [126].
problem
sequence Figure of2images
due to variation the
are variation
in
more illumination
robust seasons
and conditions from
distinctive sum-
[124], in net-
nating
comparison
4.2. the
Variation
works descriptors
to the
in of
descriptors
Environmental
[121]. Also, consecutivegenerated
Conditions frames.
from Through
single experiments,
image. it is shown that image
mer to
seasons
comparison
4.2.
Figure winter,
1.[125],
Variation
The loop to light
and
the
in
(a)generated
Same changes anthe
viewpoints
descriptors
Environmental
closure place iswith openfromtemporalday
[126].
generated
Conditions
different problem toinformation
Figurenight,
due2to
from
appearance and
depicts
single
variation
and is
(b) embedded
viewpoint
the
image. invariation
similar variation
illumination
looking inin theseasons
due
different descriptors
to
conditions lateral
from[124],
places. by
and
sum- concate-
descriptors
angularThe nating
loop
changes. the
closure isfrom
descriptors an open the
of sequence
consecutive
problem dueofframes.
images
to variation are
Through more
in robust
experiments,
illumination and it distinctive
is
conditions shown that
[124], in image
mer
seasons
Figure to
FigureThe winter,
1.[125],
(a)
1. (a)loop Same
Same light
and placechanges
viewpoints
place with
closure with from[126].
different
is andifferent
open day to
Figure
appearance
appearance
problem night,2
dueand and
depicts
and viewpoint
(b) the
similarvariation
(b) similarinlooking
to variation variation
looking
illumination indifferent
different due
seasons to lateral
from
places.
places. [124],
conditions sum- and
comparison descriptorsto andthe descriptors
generated generated
from the from
sequence single of image.
images are more robust
seasons
angular
mer
Figure
4.2. to
seasons 1. [125],
changes.
winter,
(a)
Variation Same
[125], light
inandplaceviewpoints
changes
with different
Environmental
viewpoints from [126].day Figure
to
appearance
Conditions
[126]. night,
Figure 22and
depicts
and
depicts(b) the
thevariation
viewpoint
similar variation
looking
variation ininseasons
different dueplaces.
seasons toand from
lateral
from distinctive
sum-
and
sum- in
mer
angular
4.2.
Figure
mer to
to comparison
winter,
4.2.Variation
Variation
changes.
1.winter,
(a)loopSame light
ininclosure to the
changes
Environmental
Environmental
placechanges
light with descriptors
from
different
from day
Conditions
Conditions generated
to
appearance
day night,
to night, and from
and single
viewpoint
(b)viewpoint image.
similarinlooking variation
looking different due to
places.lateral and
Figure The
1. (a) Same place is andifferent
with open problem
appearance dueand toandvariation
(b) similar variation
illumination different due to lateral
conditions
places. and
[124],
angular
4.2. Variation
angular The changes.
The1.loop loop
changes.in Environmental
closure
closure is
iswith an
an open open Conditions
problem
problem due
due 2toand to variation
variation in illumination conditions [124],
seasons
Figure [125],
(a) Same and viewpoints
place different[126]. Figure
appearance depicts theinvariation
(b) similar illumination in seasons
looking different conditions from[124],
places. sum-
4.2.
4.2. Variation
seasons
seasons
mer
FigureThe
Variation [125],
loop
[125],
to1.winter, in
in
(a) Same Environmental
and
closure viewpoints
iswith
Environmental
andlight an open
viewpoints
place changes Conditions
[126].
problem
Conditions
[126].
from
different day Figure
todue
Figure
appearance night, 2depicts
2to depicts
variation
and(b)
and thein
the
viewpoint
similar variation
illumination
variation variation
looking ininseasons
seasons
differentconditions
dueplaces. from
tofrom
lateral sum-
[124],
sum- and
Figure 1. (a) Same place with different appearance and (b) similar looking different places.
merto
mer
seasonstowinter,
The
angular
4.2. The winter,
1.loop
[125],
changes.
Variation
Figure in
(a) Same
loop light
closure
light
and changes
iswith
changes
Environmental
closureplace
is an from
an open
open
viewpoints from daytotonight,
problem
day
[126].
Conditions
different night,
due
Figure
appearance
problem due to
2to and(b)
and
depicts
and viewpoint
variation in
viewpoint
the
similar
variation in variation
illumination
variation
variation
looking
illumination due
due
indifferent
seasons tofrom
to lateral
conditions
lateral
places.
conditions and
[124],
and
sum-
[124],
mer to winter, light changes from day to night, and viewpoint variation due to lateral and
based methods generate false correspondences as they only consider similar visual fea-
tures for true-loop detection [88]. This problem is well addressed in recent work by com-
bining multiview information of a place, instead of single view, through deep neural net-
works [121]. Also, the temporal information is embedded in the descriptors by concate-
Sensors 2021, 21, 1243
nating the descriptors of consecutive frames. Through experiments, it is shown that image
10 of 17
descriptors generated from the sequence of images are more robust and distinctive in
comparison to the descriptors generated from single image.

(a)Same
Figure1.1.(a)
Figure Sameplace
placewith
withdifferent
differentappearance
appearanceand
and(b)
(b)similar
similarlooking
lookingdifferent
differentplaces.
places.
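The sequence-descriptor idea in [121] can be sketched as follows. This is an illustrative example, not the authors' implementation: the per-frame descriptors are random stand-ins for CNN features, and the window length and similarity measure are assumptions.

```python
import numpy as np

def sequence_descriptor(frame_descriptors, seq_len=5):
    """Concatenate the L2-normalized descriptors of the last seq_len
    consecutive frames into one sequence-level place descriptor."""
    recent = frame_descriptors[-seq_len:]
    normed = [f / (np.linalg.norm(f) + 1e-12) for f in recent]
    return np.concatenate(normed)

def cosine(a, b):
    """Cosine similarity between two descriptors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy usage: match a query sequence against stored place sequences.
rng = np.random.default_rng(0)
db = [sequence_descriptor([rng.normal(size=256) for _ in range(5)])
      for _ in range(10)]
query = sequence_descriptor([rng.normal(size=256) for _ in range(5)])
best = max(range(len(db)), key=lambda i: cosine(query, db[i]))
print("most similar stored sequence:", best)
```

Because a single aliased view contributes only a fraction of the concatenated vector, repeated structure in one frame is less likely to produce a false match than with single-image descriptors.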

4.2. Variation in Environmental Conditions
The loop closure is an open problem due to variation in illumination conditions [124], seasons [125], and viewpoints [126]. Figure 2 depicts the variation in seasons from summer to winter, light changes from day to night, and viewpoint variation due to lateral and angular changes.

Figure 2. Change in appearance of places due to variation in (a) seasons [125], (b) light [124], and (c) viewpoint [126].

For robot navigation in environments with viewpoint variations, the conventional approaches succeed up to some extent to achieve high performance in loop closure detection. In the case of light, weather, and seasonal variations, the features cannot be detected; i.e., features detected during daytime are not detectable at night due to light effects. Seasonal changes affect the appearance of the environment drastically, e.g., leaves disappear in autumn, the ground is covered with snow in winter, etc. Similarly, weather conditions such as rain, clouds, and sunlight change the appearance of the environment. It is complicated to overcome these challenges using conventional methods as they are sensitive to such environmental conditions [88].
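The CNN-based remedies reviewed next typically replace hand-crafted features with deep activations pooled over several network layers. The following is a minimal sketch of such multiscale deep-feature fusion, in the spirit of (but not identical to) the AlexNet-based fusion of [89] discussed below; the layer shapes and normalization are assumptions.

```python
import numpy as np

def fuse_multiscale(layer_activations):
    """Global-average-pool each conv activation map (C, H, W) and
    concatenate the pooled vectors into one multiscale descriptor."""
    pooled = [act.mean(axis=(1, 2)) for act in layer_activations]
    fused = np.concatenate(pooled)
    return fused / (np.linalg.norm(fused) + 1e-12)  # normalize for matching

# Toy activations with AlexNet-like conv output shapes (channels, H, W).
rng = np.random.default_rng(1)
acts = [rng.random((64, 27, 27)),   # early layer: edges and texture
        rng.random((192, 13, 13)),  # middle layer
        rng.random((256, 6, 6))]    # late layer: abstract content
print(fuse_multiscale(acts).shape)  # (64 + 192 + 256,) -> (512,)
```

Early layers retain fine detail while late layers encode more abstract content; combining several scales into one descriptor is the intuition behind the improved illumination invariance reported in [89].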
To overcome the challenges of changing environmental conditions, many researchers have proposed CNN-based loop detection methods that are robust to the variance of illumination and other conditions [16,88,98,122]. In [98], an unsupervised deep neural network is used to achieve illumination invariance in visual loop detection. Autoencoder-based loop detection methods using unsupervised deep neural networks achieved state-of-the-art performance for loop detection in variable environmental conditions [88,127]. However, these methods are not scale invariant. The viewpoint invariance problem up to 180-degree rotation change is addressed in [84] through an object-based point-cloud-segmentation approach.

BoW features are sensitive to illumination changes and cause loop detection failure in severe environmental condition changes. Chen et al. [89] extracted the abstract features from AlexNet and improved the illumination invariance through multiscale deep-feature fusion. For varying weather conditions, [118] performed camera-LiDAR-based loop closure detection using a deep neural network.

4.3. Dynamic Environment
The presence of moving objects in the environment is one of the major challenges for true loop detection. The mobile objects cause occlusion to the essential features in the scene [126,128], as shown in Figure 3; thus, the available features are not sufficient for the algorithm to detect the loop closure, leading to the closed-loop detection failure.

Figure 3. Occlusion in features of same place due to moving objects such as person [128] and vehicles [126].

Many researchers have addressed this problem to improve the accuracy of the algorithm. The BoW-based loop closure detection methods perform well in a static environment. However, the detection performance decreases in the presence of dynamic objects due to reduced feature similarity in loop scenes. This issue can be better addressed using the semantic information of the environment, as the semantic extraction is not affected if there are mobile objects or people [112].

Hu et al. [95] fused the object-level semantic information with the point features based on the BoW model [129] to enhance the image similarity in loop scenes and stabilize the system in a dynamic environment. Object-level semantic information is extracted using Faster R-CNN [130], pretrained on the COCO dataset [131]. It is shown that the point feature matching fused with semantics achieves better detection precision in comparison to only BoW-based loop detection. In [96], the object-based semantic information is embedded with the ORB features to improve the performance of the overall SLAM framework. The proposed framework extracts the semantic information of objects and assigns the class labels to the feature vectors which lie within the boundary boxes. Feature matching is only performed among the features with the same class labels, thus avoiding wrong matches and reducing the computation time for the loop closure detection thread. Memon et al. [16] combined the supervised and unsupervised deep learning methods to speed up the loop detection process. The true-loop detection is ensured by removing the features from dynamic objects that are either moving or temporarily static. Through deep learning, the proposed system performs eight times faster loop closure detection at low memory usage in comparison to traditional BoW-based methods.

4.4. Real-Time Loop Detection
In SLAM, the mapping and loop detection run in parallel threads. As the robot keeps on generating the environment map along the trajectory, the loop detection algorithm compares the current frame (in visual SLAM) with all previously seen images to detect the closed loop. As the map size increases, the similarity computation time for each frame increases, which slows down the system and is not suitable for real-time applications [88]. Many approaches have been proposed in the past to overcome this challenge, such as selecting random frames [132,133] or fixed keyframes, as used in ORB-SLAM2 [134], for comparison with the current frame; but still, the system will slow down in the case of longer trajectories, and the probability of detecting a true loop will also decrease [88]. Thus, developing a real-time loop closure detection algorithm able to optimize the computation time with the variable map size is one of the major challenges. In the previous few years, deep learning-based loop closure detection methods have been developed to enhance the computational efficiency through different schemes such as reducing the descriptor size [121], reducing the deep network layers during deployment [94,98], matching features of the same semantic class [96], and generalizing scene representation with segmentation and matching segment feature descriptors instead of point features [109].
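The keyframe selection discussed above can be sketched as follows: a frame is stored only when it is sufficiently novel, so the loop search cost grows with the number of keyframes rather than with every frame. The thresholds and similarity measure are illustrative placeholders, not those of ORB-SLAM2 [134].

```python
import numpy as np

class KeyframeDatabase:
    """Store a descriptor only when it differs enough from the last
    keyframe; loop queries then scan keyframes, not all frames."""

    def __init__(self, novelty=0.3):
        self.novelty = novelty
        self.keyframes = []

    @staticmethod
    def _norm(d):
        return d / (np.linalg.norm(d) + 1e-12)

    def add(self, desc):
        desc = self._norm(desc)
        # Insert as a keyframe only if it is dissimilar to the last one.
        if not self.keyframes or 1.0 - float(desc @ self.keyframes[-1]) > self.novelty:
            self.keyframes.append(desc)

    def query(self, desc, threshold=0.9):
        """Index of the best-matching keyframe, or None if below threshold."""
        if not self.keyframes:
            return None
        desc = self._norm(desc)
        sims = [float(desc @ k) for k in self.keyframes]
        best = int(np.argmax(sims))
        return best if sims[best] >= threshold else None

# Toy usage: near-duplicate frames collapse into a handful of keyframes.
rng = np.random.default_rng(2)
db = KeyframeDatabase()
base = rng.normal(size=128)
for _ in range(50):
    db.add(base + 0.01 * rng.normal(size=128))  # slowly moving robot
print(len(db.keyframes))  # far fewer than 50 frames stored
```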
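Similarly, the class-constrained matching described in Section 4.3 can be sketched as below: descriptors are compared only against map features carrying the same semantic label, and features on dynamic classes are discarded beforehand. The data layout, class list, and distance threshold are illustrative assumptions, not the implementation of [96] or [16].

```python
import numpy as np
from collections import defaultdict

DYNAMIC = {"person", "car"}  # classes treated as unreliable for loops

def match_by_class(query_feats, map_feats, max_dist=0.7):
    """Match (descriptor, label) pairs only within the same class,
    dropping features detected on dynamic objects."""
    by_class = defaultdict(list)
    for desc, label in map_feats:
        if label not in DYNAMIC:
            by_class[label].append(desc)
    matches = []
    for desc, label in query_feats:
        if label in DYNAMIC or label not in by_class:
            continue  # never match against dynamic or unseen classes
        cands = np.stack(by_class[label])
        dists = np.linalg.norm(cands - desc, axis=1)
        j = int(dists.argmin())
        if dists[j] <= max_dist:
            matches.append((label, j))
    return matches

# Toy usage with 32-D descriptors.
rng = np.random.default_rng(3)
feat = lambda label: (rng.random(32), label)
print(match_by_class([feat("door"), feat("person")],
                     [feat("door"), feat("chair"), feat("car")], max_dist=10.0))
```

Restricting candidates to a single class both removes wrong cross-class matches and shrinks the search per feature, which is why it also helps the real-time budget discussed above.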

5. Conclusions and Future Research Directions


SLAM is an integral part of most autonomous robots. This article presents an extensive
survey primarily focused on loop closure detection methods based on visual and Lidar
features and groups them into two major categories. Based on the limitations of each
approach, the major challenges of loop closure detection are identified. The survey also
discusses how those challenges are addressed by the deep learning-based methods. From
the reviewed literature, it is observed that loop detection methods based on deep neural
networks proved to be robust to the challenges, but true-loop detection is still an open
issue as both the camera and Lidar-based deep-learning loop closure detection approaches
have some limitations.
The vision-based loop closure detection methods are sensitive to illumination variations and cannot work under such changes, whereas LiDAR can. Similarly, the Lidar-based methods fail in weather
changes such as rain, while vision-based methods can perform comparatively well [135].
Thus, there is a need for research in visual–LiDAR fusion-based loop closure detection
to take advantage of both modalities for achieving robustness against illumination and
environmental changes [118]. To this end, the LiDAR scan analysis for feature detection and
camera–LiDAR calibration are the primary problems to be addressed [118]. The semantics
provides a high-level understanding of the environment, allowing the robot to perceive the environment like humans. One of the major limitations of semantics-based loop detection
methods is the assumption that there are enough objects learned by the pretrained CNN
model. In a real environment, this assumption may not be satisfied. Also, learning-based
methods are computationally expensive, and the performance is dependent on the dataset
used for training the network.

Author Contributions: Conceptualization, S.A.; Data curation, S.A.; Formal analysis, S.A.; Resources,
S.A.; writing—original draft preparation, S.A.; Supervision, G.-W.K.; Review & edit, S.A. and G.-W.K.
All authors have read and agreed to the published version of the manuscript.
Funding: This research was financially supported in part by the Ministry of Trade, Industry and En-
ergy (MOTIE) and Korea Institute for Advancement of Technology (KIAT) through the International
Cooperative R&D program (Project No. P0004631), in part by the MSIT (Ministry of Science and ICT),
Korea, under the Grand Information Technology Research Center support program (IITP-2020-0-
01462) supervised by the IITP (Institute for Information & communications Technology Planning
& Evaluation) and in part by Institute of Information & communications Technology Planning &
Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2020-0-00211, Development of
5G based swarm autonomous logistics transport technology for smart post office).
Institutional Review Board Statement: Not Applicable.
Informed Consent Statement: Not Applicable.
Data Availability Statement: Not Applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.D.; Leonard, J.J. Past, Present, and Future of
Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Trans. Robot. 2016, 32, 1309–1332. [CrossRef]
2. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]

3. Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359.
[CrossRef]
4. Li, S.; Zhang, T.; Gao, X.; Wang, D.; Xian, Y. Semi-direct monocular visual and visual-inertial SLAM with loop closure detection.
Robot. Auton. Syst. 2019, 112, 201–210. [CrossRef]
5. Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot.
2015, 31, 1147–1163. [CrossRef]
6. Zhu, H.; Wang, H.; Chen, W.; Wu, R. Depth estimation for deformable object using a multi-layer neural network. In Proceedings
of the 2017 IEEE International Conference on Real-time Computing and Robotics (RCAR), Okinawa, Japan, 14–18 July 2017;
Volume 2017, pp. 477–482.
7. Stumm, E.; Mei, C.; Lacroix, S.; Nieto, J.; Hutter, M.; Siegwart, R. Robust Visual Place Recognition with Graph Kernels. In
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June
2016; Volume 2016, pp. 4535–4544.
8. Cummins, M.; Newman, P. FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. Int. J. Robot. Res.
2008, 27, 647–665. [CrossRef]
9. Galvez-López, D.; Tardos, J.D. Bags of Binary Words for Fast Place Recognition in Image Sequences. IEEE Trans. Robot. 2012, 28,
1188–1197. [CrossRef]
10. Garcia-Fidalgo, E.; Ortiz, A. IBoW-LCD: An Appearance-Based Loop-Closure Detection Approach Using Incremental Bags of
Binary Words. IEEE Robot. Autom. Lett. 2018, 3, 3051–3057. [CrossRef]
11. Chen, D.M.; Tsai, S.S.; Chandrasekhar, V.; Takacs, G.; Vedantham, R.; Grzeszczuk, R.; Girod, B. Inverted Index Compression for
Scalable Image Matching. In Proceedings of the 2010 Data Compression Conference, Snowbird, UT, USA, 24–26 March 2010;
p. 525.
12. Naseer, T.; Ruhnke, M.; Stachniss, C.; Spinello, L.; Burgard, W. Robust visual SLAM across seasons. In Proceedings of the 2015
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015;
Volume 2015, pp. 2529–2535.
13. Zhang, X.; Su, Y.; Zhu, X. Loop closure detection for visual SLAM systems using convolutional neural network. In Proceedings of
the 23rd International Conference on Automation and Computing (ICAC), Huddersfield, UK, 7–8 September 2017; pp. 1–6.
14. Shin, D.-W.; Ho, Y.-S. Loop Closure Detection in Simultaneous Localization and Mapping Using Learning Based Local Patch
Descriptor. Electron. Imaging 2018, 2018, 284–291. [CrossRef]
15. Qin, H.; Huang, M.; Cao, J.; Zhang, X. Loop closure detection in SLAM by combining visual CNN features and submaps. In
Proceedings of the 4th International Conference on Control, Automation and Robotics, ICCAR, Auckland, New Zealand, 20–23
April 2018; pp. 426–430.
16. Memon, A.R.; Wang, H.; Hussain, A. Loop closure detection using supervised and unsupervised deep neural networks for
monocular SLAM systems. Rob. Auton. Syst. 2020, 126, 103470. [CrossRef]
17. Azzam, R.; Taha, T.; Huang, S.; Zweiri, Y. Feature-based visual simultaneous localization and mapping: A survey. SN Appl. Sci.
2020, 2, 1–24. [CrossRef]
18. Taketomi, T.; Uchiyama, H.; Ikeda, S. Visual SLAM algorithms: A survey from 2010 to 2016. IPSJ Trans. Comput. Vis. Appl. 2017,
9, 16. [CrossRef]
19. Sualeh, M.; Kim, G.-W. Simultaneous Localization and Mapping in the Epoch of Semantics: A Survey. Int. J. Control. Autom. Syst.
2019, 17, 729–742. [CrossRef]
20. Chen, C.; Wang, B.; Lu, C.X.; Trigoni, N.; Markham, A. A Survey on Deep Learning for Localization and Mapping: Towards the
Age of Spatial Machine Intelligence. arXiv 2020, arXiv:2006.12567.
21. Thrun, S. Probalistic Robotics. Kybernetes 2006, 35, 1299–1300. [CrossRef]
22. Grisetti, G.; Kummerle, R.; Stachniss, C.; Burgard, W. A Tutorial on Graph-Based SLAM. IEEE Intell. Transp. Syst. Mag. 2010, 2,
31–43. [CrossRef]
23. Scaramuzza, D.; Fraundorfer, F. Tutorial: Visual odometry. IEEE Robot. Autom. Mag. 2011, 18, 80–92. [CrossRef]
24. Saputra, M.R.U.; Markham, A.; Trigoni, N. Visual SLAM and structure from motion in dynamic environments: A survey. ACM
Comput. Surveys 2018, 51. [CrossRef]
25. Eade, E.; Drummond, T. Unified Loop Closing and Recovery for Real Time Monocular SLAM. In Procedings of the British Machine
Vision Conference 2008; BMVA Press: London, UK, 2008; Volume 13, p. 6.
26. Burgard, W.; Brock, O.; Stachniss, C. Mapping Large Loops with a Single Hand-Held Camera. In Robotics: Science and Systems III;
MIT Press: Cambridge, MA, USA, 2008; pp. 297–304.
27. Williams, B.; Cummins, M.; Neira, J.; Newman, P.; Reid, I.; Tardos, J. An image-to-map loop closing method for monocular SLAM.
In Proceedings of the 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems; IEEE: Piscataway, NJ, USA, 2008;
pp. 2053–2059.
28. Williams, B.; Cummins, M.; Neira, J.; Newman, P.; Reid, I.; Tardós, J. A comparison of loop closing techniques in monocular
SLAM. Robot. Auton. Syst. 2009, 57, 1188–1197. [CrossRef]
29. Nister, D.; Stewenius, H. Scalable Recognition with a Vocabulary Tree. In Proceedings of the 2006 IEEE Computer Society Conference
on Computer Vision and Pattern Recognition (CVPR’06); IEEE: Piscataway, NJ, USA, 2006; Volume 2, pp. 2161–2168.
30. Likas, A.; Vlassis, N.; Verbeek, J.J. The global k-means clustering algorithm. Pattern Recognit. 2003, 36, 451–461. [CrossRef]

31. Silveira, G.; Malis, E.; Rives, P. An Efficient Direct Approach to Visual SLAM. IEEE Trans. Robot. 2008, 24, 969–979. [CrossRef]
32. Chow, C.; Liu, C. Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 1968, 14,
462–467. [CrossRef]
33. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded up robust features. In Lecture Notes in Computer Science (Including Subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Elsevier: Amsterdam, The Netherlands, 2006; Volume 3951,
pp. 404–417.
34. Cummins, M.; Newman, P. Appearance-only SLAM at large scale with FAB-MAP 2.0. Int. J. Robot. Res. 2010, 30, 1100–1123.
[CrossRef]
35. Piniés, P.; Paz, L.M.; Gálvez-López, D.; Tardós, J.D. CI-Graph simultaneous localization and mapping for three-dimensional
reconstruction of large and complex environments using a multicamera system. J. Field Robot. 2010, 27, 561–586. [CrossRef]
36. Cadena, C.; Gálvez-López, D.; Ramos, F.; Tardós, J.D.; Neira, J. Robust place recognition with stereo cameras. In IEEE/RSJ 2010
International Conference on Intelligent Robots and Systems, IROS 2010—Conference Proceedings; IEEE: Piscataway, NJ, USA, 2010;
pp. 5182–5189.
37. Angeli, A.; Filliat, D.; Doncieux, S.; Meyer, J.A. Fast and incremental method for loop-closure detection using bags of visual
words. IEEE Trans. Robot. 2008, 24, 1027–1037. [CrossRef]
38. Paul, R.; Newman, P. FAB-MAP 3D: Topological mapping with spatial and visual appearance. In Proceedings—IEEE International
Conference on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2010; pp. 2649–2656.
39. Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary robust independent elementary features. In European Conference on
Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6314, pp. 778–792.
40. Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary Robust invariant scalable keypoints. In Proceedings of the IEEE International
Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2548–2555.
41. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on
Computer Vision; IEEE: Piscataway, NJ, USA, 2011; pp. 2564–2571.
42. Galvez-Lopez, D.; Tardos, J.D. Real-time loop detection with bags of binary words. In Proceedings of the 2011 IEEE/RSJ International
Conference on Intelligent Robots and Systems; IEEE: Piscataway, NJ, USA, 2011; pp. 51–58.
43. Gao, X.; Zhang, T. Loop closure detection for visual SLAM systems using deep neural networks. In Proceedings of the 2015 34th
Chinese Control Conference (CCC); IEEE: Piscataway, NJ, USA, 2015; pp. 5851–5856.
44. Khan, S.; Wollherr, D. IBuILD: Incremental bag of Binary words for appearance based loop closure detection. In Proceedings—IEEE
International Conference on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2015; pp. 5441–5447.
45. Kejriwal, N.; Kumar, S.; Shibata, T. High performance loop closure detection using bag of word pairs. Robot. Auton. Syst. 2016, 77,
55–65. [CrossRef]
46. Tan, W.; Liu, H.; Dong, Z.; Zhang, G.; Bao, H. Robust monocular SLAM in dynamic environments. In Proceedings of the 2013 IEEE
International Symposium on Mixed and Augmented Reality (ISMAR); IEEE: Piscataway, NJ, USA, 2013; pp. 209–218.
47. Johannsson, H.; Kaess, M.; Fallon, M.; Leonard, J.J. Temporally scalable visual SLAM using a reduced pose graph. In Proceedings—
IEEE International Conference on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2013; pp. 54–61.
48. Xu, H.; Zhang, H.-X.; Yao, E.-L.; Song, H.-T. A Loop Closure Detection Algorithm in Dynamic Scene. DEStech Trans. Comput. Sci.
Eng. 2018. [CrossRef]
49. Li, H.; Nashashibi, F. Multi-vehicle cooperative localization using indirect vehicle-to-vehicle relative pose estimation. In 2012
IEEE International Conference on Vehicular Electronics and Safety, ICVES 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 267–272.
50. Torr, P.H.S.; Zisserman, A. MLESAC: A new robust estimator with application to estimating image geometry. Comput. Vis. Image
Underst. 2000, 78, 138–156. [CrossRef]
51. Kneip, L.; Scaramuzza, D.; Siegwart, R. A novel parametrization of the perspective-three-point problem for a direct computation
of absolute camera position and orientation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition; IEEE: Piscataway, NJ, USA, 2011; pp. 2969–2976.
52. Williams, B.; Klein, G.; Reid, I. Automatic Relocalization and Loop Closing for Real-Time Monocular SLAM. IEEE Trans. Pattern
Anal. Mach. Intell. 2011, 33, 1699–1712. [CrossRef] [PubMed]
53. Lepetit, V.; Fua, P. Keypoint recognition using randomized trees. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1465–1479.
[CrossRef]
54. Gao, X.; Wang, R.; Demmel, N.; Cremers, D. LDSO: Direct Sparse Odometry with Loop Closure. In Proceedings of the IEEE
International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018; pp. 2198–2204.
55. Engel, J.; Koltun, V.; Cremers, D. Direct Sparse Odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 611–625. [CrossRef]
56. Zhou, H.; Zhang, T.; Jagadeesan, J. Re-weighting and 1-Point RANSAC-Based PnP Solution to Handle Outliers. IEEE Trans.
Pattern Anal. Mach. Intell. 2019, 41, 3022–3033. [CrossRef] [PubMed]
57. Mur-Artal, R.; Tardós, J.D. Fast relocalisation and loop closing in keyframe-based SLAM. In Proceedings—IEEE International
Conference on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2014; pp. 846–853.
58. Rohling, T.; Mack, J.; Schulz, D. A fast histogram-based similarity measure for detecting loop closures in 3-D LIDAR data. In
IEEE International Conference on Intelligent Robots and Systems; IEEE: Piscataway, NJ, USA, 2015; Volume 2015, pp. 736–741.
59. Granstrom, K.; Schön, T.B.; Nieto, J.I.; Ramos, F.T. Learning to close loops from range data. Int. J. Robot. Res. 2011, 30, 1728–1754.
[CrossRef]
60. Zhou, Q.-Y.; Park, J.; Koltun, V. Fast Global Registration. In Computer Vision–ECCV 2016. Lecture Notes in Computer Science;
Springer: Cham, Switzerland, 2016; Volume 9906, pp. 766–782.
61. Muhammad, N.; Lacroix, S. Loop closure detection using small-sized signatures from 3D LIDAR data. In 9th IEEE International
Symposium on Safety, Security, and Rescue Robotics, SSRR 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 333–338.
62. Bosse, M.; Zlot, R. Place recognition using keypoint voting in large 3D lidar datasets. In Proceedings—IEEE International Conference
on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2013; pp. 2677–2684.
63. Schmiedel, T.; Einhorn, E.; Gross, H.M. IRON: A fast interest point descriptor for robust NDT-map matching and its application
to robot localization. In IEEE International Conference on Intelligent Robots and Systems; IEEE: Piscataway, NJ, USA, 2015; Volume
2015, pp. 3144–3151.
64. Rusu, R.B.; Blodow, N.; Beetz, M. Fast Point Feature Histograms (FPFH) for 3D registration. In 2009 IEEE International Conference
on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2009; pp. 3212–3217.
65. Biber, P. The Normal Distributions Transform: A New Approach to Laser Scan Matching. In Proceedings of the 2003 IEEE/RSJ
International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA, 27–31 October 2003.
66. Magnusson, M.; Lilienthal, A.J.; Duckett, T. Scan registration for autonomous mining vehicles using 3D-NDT. J. Field Robot. 2007,
24, 803–827. [CrossRef]
67. Magnusson, M.; Andreasson, H.; Nüchter, A.; Lilienthal, A.J. Automatic appearance-based loop detection from three-dimensional
laser data using the normal distributions transform. J. Field Robot. 2009, 26, 892–914. [CrossRef]
68. Weinberger, K.Q.; Blitzer, J.; Saul, L.K. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Adv. Neural
Inf. Process. Syst. 2005, 18, 1473–1480.
69. Magnusson, M.; Andreasson, H.; Nuchter, A.; Lilienthal, A.J. Appearance-based loop detection from 3D laser data using the
normal distributions transform. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation; IEEE: Piscataway,
NJ, USA, 2009; pp. 23–28.
70. Lin, J.; Zhang, F. A fast, complete, point cloud based loop closure for LiDAR odometry and mapping. arXiv 2019, arXiv:1909.11811.
71. Bosse, M.; Zlot, R. Keypoint design and evaluation for place recognition in 2D lidar maps. Robot. Auton. Syst. 2009, 57, 1211–1224.
[CrossRef]
72. Walthelm, A. Enhancing global pose estimation with laser range. In International Conference on Intelligent Autonomous Systems;
Elsevier: Amsterdam, The Netherlands, 2004.
73. Himstedt, M.; Frost, J.; Hellbach, S.; Bohme, H.-J.; Maehle, E. Large scale place recognition in 2D LIDAR scans using Geometrical
Landmark Relations. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems; IEEE: Piscataway,
NJ, USA, 2014; pp. 5030–5035.
74. Wohlkinger, W.; Vincze, M. Ensemble of shape functions for 3D object classification. In Proceedings of the 2011 IEEE International
Conference on Robotics and Biomimetics; IEEE: Piscataway, NJ, USA, 2011; pp. 2987–2992.
75. Fernández-Moral, E.; Rives, P.; Arévalo, V.; González-Jiménez, J. Scene structure registration for localization and mapping. Robot.
Auton. Syst. 2016, 75, 649–660. [CrossRef]
76. Fernández-Moral, E.; Mayol-Cuevas, W.; Arevalo, V.; Gonzalez-Jimenez, J. Fast place recognition with plane-based maps. In
Proceedings of the 2013 IEEE International Conference on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2013; pp. 2719–2724.
77. Nieto, J.; Bailey, T.; Nebot, E. Scan-SLAM: Combining EKF-SLAM and scan correlation. In Springer Tracts in Advanced Robotics;
Springer: Berlin, Germany, 2006; Volume 25, pp. 167–178.
78. Douillard, B.; Underwood, J.; Vlaskine, V.; Quadros, A.; Singh, S. A pipeline for the segmentation and classification of 3D point
clouds. In Springer Tracts in Advanced Robotics; Springer: Berlin, Germany, 2014; Volume 79, pp. 585–600.
79. Douillard, B.; Quadros, A.; Morton, P.; Underwood, J.; De Deuge, M.; Hugosson, S.; Hallstrom, M.; Bailey, T. Scan segments
matching for pairwise 3D alignment. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation; IEEE:
Piscataway, NJ, USA, 2012; pp. 3033–3040.
80. Ye, Q.; Shi, P.; Xu, K.; Gui, P.; Zhang, S. A Novel Loop Closure Detection Approach Using Simplified Structure for Low-Cost
LiDAR. Sensors 2020, 20, 2299. [CrossRef]
81. Douillard, B.; Underwood, J.; Kuntz, N.; Vlaskine, V.; Quadros, A.; Morton, P.; Frenkel, A. On the segmentation of 3D LIDAR
point clouds. In Proceedings of the IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011.
82. Dube, R.; Dugas, D.; Stumm, E.; Nieto, J.; Siegwart, R.; Cadena, C. SegMatch: Segment based place recognition in 3D point clouds.
In Proceedings—IEEE International Conference on Robotics and Automation; IEEE: Piscataway, NJ, USA, 2017; pp. 5266–5272.
83. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
84. Fan, Y.; He, Y.; Tan, U.-X. Seed: A Segmentation-Based Egocentric 3D Point Cloud Descriptor for Loop Closure Detection. In
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2020; pp. 25–29.
85. Liu, X.; Zhang, L.; Qin, S.; Tian, D.; Ouyang, S.; Chen, C. Optimized LOAM Using Ground Plane Constraints and SegMatch-Based
Loop Detection. Sensors 2019, 19, 5419. [CrossRef] [PubMed]
86. Schnabel, R.; Wahl, R.; Klein, R. Efficient RANSAC for Point-Cloud Shape Detection. Comput. Graph. Forum 2007, 26, 214–226.
[CrossRef]
87. Tomono, M. Loop detection for 3D LiDAR SLAM using segment-group matching. Adv. Robot. 2020, 34, 1530–1544. [CrossRef]
88. Gao, X.; Zhang, T. Unsupervised learning to detect loops using deep neural networks for visual SLAM system. Auton. Robots
2017, 41, 1–18. [CrossRef]
89. Chen, B.; Yuan, D.; Liu, C.; Wu, Q. Loop Closure Detection Based on Multi-Scale Deep Feature Fusion. Appl. Sci. 2019, 9, 1120.
[CrossRef]
90. Cascianelli, S.; Costante, G.; Bellocchio, E.; Valigi, P.; Fravolini, M.L.; Ciarfuglia, T.A. Robust visual semi-semantic loop closure
detection by a covisibility graph and CNN features. Robot. Auton. Syst. 2017, 92, 53–65. [CrossRef]
91. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and
Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
92. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017,
60, 84–90. [CrossRef]
93. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. Imagenet: A Large-Scale Hierarchical Image Database. In Proceedings of the
2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
94. He, Y.; Chen, J.; Zeng, B. A fast loop closure detection method based on lightweight convolutional neural network. Comput. Eng.
2018, 44, 182–187.
95. Hu, M.; Li, S.; Wu, J.; Guo, J.; Li, H.; Kang, X. Loop closure detection for visual SLAM fusing semantic information. In Chinese
Control Conference, CCC; IEEE: Piscataway, NJ, USA, 2019; Volume 2019, pp. 4136–4141.
96. Wang, Y.; Zell, A. Improving Feature-based Visual SLAM by Semantics. In Proceedings of the 2018 IEEE International Conference on
Image Processing, Applications and Systems (IPAS), Sophia Antipolis, France, 12–14 December 2018; IEEE: Piscataway, NJ, USA, 2018;
pp. 7–12.
97. Liao, Y.; Wang, Y.; Liu, Y. Graph Regularized Auto-Encoders for Image Representation. IEEE Trans. Image Process. 2017, 26,
2839–2852. [CrossRef]
98. Merrill, N.; Huang, G. Lightweight Unsupervised Deep Loop Closure. In Robotics: Science and Systems XIV; MIT Press: Cambridge,
MA, USA, 2018.
99. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society
Conference on Computer Vision and Pattern Recognition, CVPR, San Diego, CA, USA, 20–25 June 2005.
100. Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 Million Image Database for Scene Recognition. IEEE Trans.
Pattern Anal. Mach. Intell. 2018, 40, 1452–1464. [CrossRef] [PubMed]
101. Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.-A. Stacked Denoising Autoencoders: Learning Useful Representations
in a Deep Network with a Local Denoising Criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408.
102. Zaganidis, A.; Zerntev, A.; Duckett, T.; Cielniak, G. Semantically Assisted Loop Closure in SLAM Using NDT Histograms. In
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2019;
pp. 4562–4568.
103. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Adv.
Neural Inform. Process. Syst. 2017, 30, 5099–5108.
104. Yang, Y.; Song, S.; Toth, C. CNN-Based Place Recognition Technique for Lidar Slam. ISPRS—Int. Arch. Photogramm. Remote. Sens.
Spat. Inf. Sci. 2020. [CrossRef]
105. GitHub—Mikacuy/Pointnetvlad: PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition, CVPR
2018. Available online: https://github.com/mikacuy/pointnetvlad (accessed on 27 November 2020).
106. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA,
2017; pp. 652–660.
107. Arandjelović, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016;
IEEE: Piscataway, NJ, USA, 2016; pp. 5297–5307.
108. Yin, H.; Tang, L.; Ding, X.; Wang, Y.; Xiong, R. LocNet: Global Localization in 3D Point Clouds for Mobile Vehicles. In IEEE
Intelligent Vehicles Symposium, Proceedings; IEEE: Piscataway, NJ, USA, 2018; pp. 728–733.
109. Dubé, R.; Cramariuc, A.; Dugas, D.; Nieto, J.; Siegwart, R.; Cadena, C. SegMap: 3D Segment Mapping using Data-Driven
Descriptors. In Robotics: Science and Systems XIV; MIT Press: Cambridge, MA, USA, 2018.
110. Dubé, R.; Cramariuc, A.; Dugas, D.; Sommer, H.; Dymczyk, M.; Nieto, J.; Siegwart, R.; Cadena, C. SegMap: Segment-based
mapping and localization using data-driven descriptors. Int. J. Robot. Res. 2019, 39, 339–355. [CrossRef]
111. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2016; pp. 779–788.
112. Xia, Y.; Li, J.; Qi, L.; Fan, H. Loop closure detection for visual SLAM using PCANet features. In Proceedings of the International Joint
Conference on Neural Networks; IEEE: Piscataway, NJ, USA, 2016; pp. 2274–2281.
113. Chan, T.-H.; Jia, K.; Gao, S.; Lu, J.; Zeng, Z.; Ma, Y. PCANet: A Simple Deep Learning Baseline for Image Classification? IEEE
Trans. Image Process. 2015, 24, 5017–5032. [CrossRef]
114. Chen, X.; Labe, T.; Milioto, A.; Rohling, T.; Vysotska, O.; Haag, A.; Behley, J.; Stachniss, C. OverlapNet: Loop Closing for
LiDAR-based SLAM. In Proceedings of the Robotics: Science and Systems (RSS), Online Proceedings, 14–16 July 2020.
115. Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. RangeNet++: Fast and Accurate LiDAR Semantic Segmentation. In 2019 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS); IEEE: Piscataway, NJ, USA, 2019; pp. 4213–4220.
116. Wang, S.; Lv, X.; Liu, X.; Ye, D. Compressed Holistic ConvNet Representations for Detecting Loop Closures in Dynamic
Environments. IEEE Access 2020, 8, 60552–60574. [CrossRef]
117. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
118. Żywanowski, K.; Banaszczyk, A.; Nowicki, M. Comparison of camera-based and 3D LiDAR-based loop closures across weather
conditions. arXiv 2020, arXiv:2009.03705.
119. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd
International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015.
120. Olid, D.; Fácil, J.M.; Civera, J. Single-View Place Recognition under Seasonal Changes. arXiv 2018, arXiv:1808.06516.
121. Facil, J.M.; Olid, D.; Montesano, L.; Civera, J. Condition-Invariant Multi-View Place Recognition. arXiv 2019, arXiv:1902.09516.
122. Liu, Y.; Xiang, R.; Zhang, Q.; Ren, Z.; Cheng, J. Loop closure detection based on improved hybrid deep learning architecture.
In Proceedings—2019 IEEE International Conferences on Ubiquitous Computing and Communications and Data Science and Computa-
tional Intelligence and Smart Computing, Networking and Services, IUCC/DSCI/SmartCNS 2019; IEEE: Piscataway, NJ, USA, 2019;
pp. 312–317.
123. Sunderhauf, N.; Shirazi, S.; Dayoub, F.; Upcroft, B.; Milford, M. On the performance of ConvNet features for place recognition. In
IEEE International Conference on Intelligent Robots and Systems; IEEE: Piscataway, NJ, USA, 2015; pp. 4297–4304.
124. Maddern, W.; Pascoe, G.; Linegar, C.; Newman, P. 1 year, 1000 km: The Oxford RobotCar dataset. Int. J. Robot. Res. 2016, 36, 3–15.
[CrossRef]
125. Nordlandsbanen: Minute by Minute, Season by Season. Available online: https://nrkbeta.no/2013/01/15/nordlandsbanen-minute-by-minute-season-by-season/ (accessed on 5 December 2020).
126. Suenderhauf, N.; Shirazi, S.; Jacobson, A.; Dayoub, F.; Pepperell, E.; Upcroft, B.; Milford, M. Place recognition with ConvNet
landmarks: Viewpoint-robust, condition-robust, training-free. In Proceedings of the Robotics: Science and Systems Conference
XI, Rome, Italy, 13–15 July 2015.
127. Hou, Y.; Zhang, H.; Zhou, S. Convolutional neural network-based image representation for visual loop closure detection. In 2015
IEEE International Conference on Information and Automation, ICIA 2015—In Conjunction with 2015 IEEE International Conference on
Automation and Logistics; IEEE: Piscataway, NJ, USA, 2015; pp. 2238–2245.
128. Computer Vision Group—Dataset Download. Available online: https://vision.in.tum.de/data/datasets/rgbd-dataset/download
(accessed on 2 December 2020).
129. Peng, X.; Wang, L.; Wang, X.; Qiao, Y. Bag of visual words and fusion methods for action recognition: Comprehensive study and
good practice. Comput. Vis. Image Underst. 2016, 150, 109–125. [CrossRef]
130. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances
in Neural Information Processing Systems; Neural Information Processing Systems Foundation Inc.: San Diego, CA, USA, 2015;
pp. 91–99.
131. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects
in context. In Computer Vision–ECCV 2014. ECCV 2014. Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2014;
pp. 740–755.
132. Stückler, J.; Behnke, S. Multi-resolution surfel maps for efficient dense 3D modeling and tracking. J. Vis. Commun. Image Represent.
2014, 25, 137–147. [CrossRef]
133. Endres, F.; Hess, J.; Sturm, J.; Cremers, D.; Burgard, W. 3-D Mapping With an RGB-D Camera. IEEE Trans. Robot. 2014, 30, 177–187.
[CrossRef]
134. Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras. IEEE Trans.
Robot. 2017, 33, 1255–1262. [CrossRef]
135. Debeunne, C.; Vivet, D. A Review of Visual-LiDAR Fusion based Simultaneous Localization and Mapping. Sensors 2020, 20, 2068.
[CrossRef]