IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 7, NO. 4, JULY 2020
Abstract—Monocular vision-based navigation is a considerable ability for a home mobile robot. However, due to diverse disturbances, helping robots avoid obstacles, especially non-Manhattan obstacles, remains a big challenge. In indoor environments, there are many spatial right-corners that are projected into two-dimensional projections with special geometric configurations. These projections, which consist of three lines, might enable us to estimate their position and orientation in 3D scenes. In this paper, we present a method for home robots to avoid non-Manhattan obstacles in indoor environments from a monocular camera. The approach first detects non-Manhattan obstacles. Through analyzing geometric features and constraints, it is possible to estimate posture differences between the orientation of the robot and non-Manhattan obstacles. Finally, according to the convergence of posture differences, the robot can adjust its orientation to keep pace with the pose of detected non-Manhattan obstacles, making it possible to avoid these obstacles by itself. Based on geometric inferences, the proposed approach requires no prior training or any knowledge of the camera's internal parameters, making it practical for robot navigation. Furthermore, the method is robust to errors in calibration and image noise. We compared the errors from corners of estimated non-Manhattan obstacles against the ground truth. Furthermore, we evaluated the validity of convergence of differences between the robot orientation and the posture of non-Manhattan obstacles. The experimental results showed that our method is capable of avoiding non-Manhattan obstacles, meeting the requirements for indoor robot navigation.

Index Terms—Avoiding obstacle, monocular vision, navigation, non-Manhattan obstacle, spatial corner.

I. Introduction

WITH the aging population and the growing number of disabled people, the development of home service robots is becoming an increasingly urgent issue. Visual navigation in an indoor environment has considerable value for monitoring and mission planning. However, there exists a multitude of disturbances from clutter and occlusion in an indoor environment, resulting in the predicament of avoiding obstacles, especially non-Manhattan obstacles (e.g., shelves, sofas, chairs), which remains a difficult challenge for vision-based robots.

Instead of current methods (e.g., 3D laser scanners), visual navigation that uses a single low-cost camera draws more attention, because it is advantageous in consumption and efficiency. In the human visual system, Gibson described that perception of depth is inborn and does not require additional knowledge, via the "visual cliff" experiment [1]. It used to be believed that humans recover three-dimensional structures using binocular parallax. However, it was indicated that the human ability to estimate the depth of isolated points is extremely weak, and that we are more likely to infer relative depths of different surfaces from their jointed points [2]. This indicates that binocular features are not that important, and that it is possible to understand scenes using only monocular images. Meanwhile, it was reported that humans are sensitive to surfaces of different orientations, allowing us to extract surface and orientation information for understanding a scene [3]. Accordingly, it can be assumed that there are some simple rules that can be used to infer 3D structure over a short period of time. Methods were presented to understand indoor scenes based on projections of rectangles and right angles, but non-Manhattan obstacles remain an undiscussed issue [4], [5].
In this paper, we present a method which allows for understanding of non-Manhattan obstacles in an indoor environment from a single image, without prior training or internal calibration of a camera. First, straight lines are detected, and spatial corner projections consisting of three lines can be extracted. Secondly, through geometric inferences, it is possible to understand the non-Manhattan obstacles. Finally, through convergence of differences in geometric features, it is possible to adjust the robot orientation to keep pace with the posture of non-Manhattan obstacles, allowing for the avoidance of such objects.

Manuscript received June 16, 2019; revised December 6, 2019 and February 6, 2020; accepted February 21, 2020. This work was supported by the National Natural Science Foundation of China (61771146, 61375122), the National Thirteen 5-Year Plan for Science and Technology (2017YFC1703303), and in part by Shanghai Science and Technology Development Funds (13dz2260200, 13511504300). Recommended by Associate Editor Pu Wang. (Corresponding author: Luping Wang.)

Citation: L. P. Wang and H. Wei, "Avoiding non-Manhattan obstacles based on projection of spatial corners in indoor environment," IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 1190–1200, Jul. 2020.

L. P. Wang is with the School of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China (e-mail: wangluping@usst.edu.cn).

H. Wei is with the Laboratory of Algorithms for Cognitive Models, Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai 201203, China (e-mail: weihui@fudan.edu.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JAS.2020.1003117

Instead of data-driven methods, such as those using deep learning, the proposed approach requires no prior training. With the use of simple geometric inferences, the proposed algorithm is robust to changes in illumination and color. For disturbances, the method can understand non-Manhattan
Authorized licensed use limited to: VIT University. Downloaded on November 26,2020 at 09:51:38 UTC from IEEE Xplore. Restrictions apply.
WANG AND WEI: AVOIDING NON-MANHATTAN OBSTACLES BASED ON PROJECTION OF SPATIAL CORNERS IN INDOOR ENVIRONMENT 1191
obstacles with neither knowledge of the camera's intrinsic parameters nor the relation between the camera and the world, making it practical and efficient for a navigating robot. Besides, without other external devices, the method has the advantage of lower required investment.

For classic benchmarks, our algorithm is capable of describing details of non-Manhattan obstacles. We compared the corners estimated by the proposed approach against the corner ground truth, measuring the error as a percentage of pixels obtained by summing up all Euclidean distances between estimated corners and the associated ground-truth corners. Furthermore, the experimental results demonstrated that robots can understand non-Manhattan obstacles and avoid them via the convergence of the posture difference between the robot orientation and the non-Manhattan obstacle, meeting the requirements of indoor robot navigation.

II. Related Work

There are previous works which have made impressive progress, including structure-from-motion [6]–[9] and visual SLAM [10]–[14]. Through a series of visual observations, they propose a scene model in the form of a 3D point cloud. A method showed that three-dimensional point clouds and image data could be combined for semantic segmentation [15]. Nevertheless, just a fraction of the information from original images can be provided via point clouds and geometric cues, thus some aspects such as edge textures are sometimes lost.

Also, 3D structures can be reconstructed through inferring the relationship between connected superpixels. Saxena et al. assigned each pixel of an image to grass, trees, sky, or something else, through heuristic knowledge [16]. But these methods hardly work in indoor settings with different levels of clutter and incomplete surfaces and coverage.

Furthermore, there are approaches that model geometric scene structures from a single image, including approaches for geometric label classification [17] and for finding vertical/ground fold-lines [18]. As to others [19], local image properties were linked to a classification system of local surface orientation, and walls were extracted based on jointed points with the floor. However, due to a great dependence on precise floor segmentation, these methods may fail in an indoor environment with clutter and covers. There has been renewed interest in 3D structures in restricted domains such as the Manhattan world [20], [21]. Based on vanishing points, a method detected rectangular surfaces aligned with major orientations [5]. But dominant directions alone were discussed, and object surface information was not extracted.

Additionally, a top-down approach for understanding indoor scenes was presented by Pero et al. [22]. However, it was difficult to explain room box edges when there were no additional objects. Although Pero's algorithm [23] can understand the 3D geometry of indoor environments, it required [25]. However, this method sampled possible spatial layout hypotheses without clutter, was prone to errors because of occlusions, and tended to fit rooms where walls coincided with object surfaces. Meanwhile, the relative depth-order of rectangular surfaces was inferred by considering their relationships [26], [27], but this just provided depth cues of partial rectangular regions in the image and not the entire scene.

Approaches that can estimate what part of the 3D space is free and what part is occupied by objects are modeled either in terms of clutter [28], [29] or bounding boxes [30], [22]. A significant work was found to combine 3D geometry and semantics in the scope of outdoor scenes. Hedau proposed a method that identified beds by combining image appearances and 3D reasoning made possible by estimating the room layout [31].

As to Dasgupta's work [32], indoor layout can be estimated by using a fully convolutional neural network in conjunction with an optimization algorithm. It evenly sampled a grid of a feasible region to generate candidates for vanishing points. Nevertheless, the vanishing point may not lie in the feasible region when the robot faces certain layout scenarios, such as a two-wall layout scenario. Additionally, because of the iterative refinement process, optimization took approximately 30 seconds per frame, with a step size of 4 pixels for sampling lines, and a grid of 200 vanishing points. Hence, the efficiency of this method cannot meet the requirements of robot navigation in an indoor environment. Also, a method was presented to predict room layout from a panoramic image [33]. Meanwhile, other methods using convolutional neural networks were proposed to infer indoor scenes from a single image [34]–[38]. Since these methods have no regard for non-Manhattan structures, it is difficult for them to understand non-Manhattan obstacles.

Recently, a method was presented to detect horizontal vanishing points and the zenith vanishing point in man-made environments [39]. Also, another method was proposed to estimate the camera orientation and vanishing points through nonlinear Bayesian filtering in a non-Manhattan world [40]. However, it is difficult for these methods to understand non-Manhattan obstacles. In previous works, the proposed algorithm could estimate the layout of an indoor scene via projections of spatial rectangles, but there was difficulty in handling non-Manhattan structures [5]. Also, a method can provide understanding of indoor scenes that satisfy the Manhattan assumption [4]; however, it failed to understand non-Manhattan obstacles because structures that do not satisfy the Manhattan assumption were not discussed. Therefore, it is necessary to develop an algorithm to understand non-Manhattan obstacles for visual navigation that uses a single low-cost camera in robots. Furthermore, a method of low consumption and high efficiency meets the requirements of robot navigation.
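The corner-error measure used for evaluation (summing all Euclidean distances between estimated corners and their associated ground-truth corners, expressed as a percentage of pixels) can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the normalization by the image diagonal and the one-to-one corner association are assumptions, since the paper does not spell out the normalization.

```python
import math

def corner_error(estimated, ground_truth, image_w, image_h):
    """Corner error as a percentage: sum of Euclidean distances between
    each estimated corner and its associated ground-truth corner,
    normalized by the image diagonal (normalization is an assumption)."""
    assert len(estimated) == len(ground_truth)
    total = sum(math.dist(e, g) for e, g in zip(estimated, ground_truth))
    diagonal = math.hypot(image_w, image_h)
    return 100.0 * total / (diagonal * len(estimated))

# Hypothetical corners: each estimate is 5 px from its ground truth.
est = [(103.0, 104.0), (203.0, 204.0)]
gt = [(100.0, 100.0), (200.0, 200.0)]
err = corner_error(est, gt, 640, 480)
```

A lower value means the estimated non-Manhattan corners lie closer to the annotated ones, in the spirit of the corner errors reported later in Table III.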
Fig. 1. System architecture. From a monocular frame Ft captured at time t, lines are extracted and corner projections of the non-Manhattan obstacle (corners A, B, C) are detected. Posture estimating yields the differences DtA, DtB, and DtC, which form the difference Dt between the robot orientation and the obstacle. If Dt = 0, the robot keeps forwarding; otherwise it turns and updates its orientation.
estimate the layout and details satisfying the constraint of the Manhattan world assumption [4], [5]. The Manhattan world assumption means that many surfaces are aligned with the three main world axes; in other words, many surfaces are parallel to the three principal ones. However, there are many surfaces (e.g., chairs and sofas in clutter) that are not aligned with the three main world axes, and these surfaces can be seen as non-Manhattan structures. Due to clutter, there are many obstacles consisting of non-Manhattan spatial corners, resulting in difficulty in navigation for home robots. In 2D images, the part projections of these obstacles could be considered a composition of corner-pairs, which would enable us to estimate their original positions in a 3D scene.

Fig. 1 shows our system architecture. Ft is a monocular capture at time t. After preprocessing, we can detect non-Manhattan obstacles and the robot's orientation. By estimating the pose of non-Manhattan obstacles, it is possible to calculate the difference Dt, which can help us determine the robot's action (forwarding or turning). The details are shown in the following sections.

A. Preprocessing

Firstly, the edges are detected and straight lines are found [41]. The lines are defined as follows:

Fig. 2. The LVPs and LMVP [4].

Spatial corners can be projected onto the image, resulting in all kinds of corner projections. Each of them can be seen as a composition of three lines. Hence, one corner can be defined as follows:

C = {L_s; L_n; L_r}.  (2)

The integrity of a corner can be defined as follows:

Λ_LP = |d(p_c, p_s) − l_s/2| + |d(p_c, p_n) − l_n/2|  (3)
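The integrity measure of Eq. (3) can be sketched as a small function. Taking p_s and p_n as the midpoints of L_s and L_n, and the tuple layout of a corner, are assumptions for illustration; the ranking helper mirrors the later step of keeping the top-ranked (e.g., top 100) corners with the smallest Λ.

```python
import math

def corner_integrity(pc, ps, ls, pn, ln):
    """Integrity of a corner projection, after Eq. (3):
    |d(pc, ps) - ls/2| + |d(pc, pn) - ln/2|.
    pc: corner point; ps, pn: points on lines Ls, Ln (assumed midpoints);
    ls, ln: lengths of Ls, Ln. Smaller values mean better integrity."""
    return abs(math.dist(pc, ps) - ls / 2) + abs(math.dist(pc, pn) - ln / 2)

def top_corners(corners, k=100):
    """Keep the k corners with the best (smallest) integrity value."""
    return sorted(corners, key=lambda c: corner_integrity(*c))[:k]

# An ideal corner: pc at (0, 0); Ls has midpoint (2, 0) and length 4,
# so pc sits exactly ls/2 from the midpoint; likewise for Ln.
ideal = ((0, 0), (2, 0), 4.0, (0, 3), 6.0)
val = corner_integrity(*ideal)
```

For the ideal corner above the two terms cancel and the integrity is 0; displaced corners score higher and are eliminated by the ranking.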
Fig. 3. Extraction of spatial corners projections. (a) line segments; (b) projection of spatial corners (e.g., in red, blue, brown); (c) spatial corner projections of better integrity; (d) spatial corner projections satisfying the constraint.
Corners can be ranked according to the Λ value, and it is possible to select top-ranked (e.g., top 100) corners via smaller Λ_C. The projections of worse integrity (e.g., in brown) would be eliminated.

For each corner of an obstacle that does not satisfy the Manhattan world assumption, there are at least two lines that do not belong to the scene layout VPs. Since the obstacle to be discussed is placed on the floor, there is at least one line in the corner that belongs to the LVP (V1 is the LMVP and V3 at infinity). Here, lines in corner pairs should satisfy the constraint shown in Table I. Therefore, corners can be extracted as shown in Fig. 3.

TABLE I
Lines in Corner Pairs Should Satisfy the Constraint. L_V1, L_V2, L_V3 Represents the Line Is Assigned to V1, V2, V3

Line/Corner | C^g                     | C^h
L_s         | L_s^g ∉ {L_V1 ∪ L_V2}   | L_s^h ∉ {L_V1 ∪ L_V2}
L_n         | L_n^g ∉ {L_V1 ∪ L_V2}   | L_n^h ∉ {L_V1 ∪ L_V2}
L_r         | L_r^g ∈ L_V3            | L_r^h ∈ L_V3

2) Detection of Non-Manhattan Obstacles: Therefore, it is possible to determine the obstacle via corner pairs as follows:

G = {C^g; C^h}.  (5)

Firstly, there exist two lines, which respectively come from the two corners, and these two lines should belong to a same line. This can be defined as follows:

λ1 = L_r^g Θ L_r^h → 0  (6)

where L_r^g represents the line which comes from C^g and L_r^h represents the other. Here Θ represents an operator which determines whether two lines are collinear. A smaller λ1 represents that these two lines are more likely to belong to the same line, as shown in Fig. 4.

Secondly, these two lines should also satisfy the following condition:

λ2 = v_r^g Θ v_r^h → π  (7)

where v_r^g represents the vector from the C^g center to the L_r^g midpoint, and v_r^h represents the other. Here Θ is the operator that determines whether two vectors are collinear, and π indicates that they are in opposite directions. For smaller λ2, for example, a smaller angle β represents that these two vectors are more likely to run in opposite directions, as shown in Fig. 4.

Thirdly, the corners of the obstacle share the same obstacle-vanishing-points (OVPs):

λ3 = O^g Θ O^h Θ O^t → 0  (8)

where O^g are the vanishing points of C^g and O^h belong to C^h; the OVPs of the obstacle (here, for example, O1, O2, O3) can be computed from the corner pairs (C^g and C^h). Since all the other corners of the obstacle would share the same OVPs (O^t), it is possible to determine other corners via λ3. Here Θ stands for an operator which determines whether the corners share
the same OVPs. A smaller λ3 implies that corners (e.g., in yellow) are more likely to share the same OVPs with the obstacle, as shown in Fig. 4.

Fig. 4. Non-Manhattan estimation. (a) Collinear condition; (b) Opposite directions; (c) Sharing the OVPs; (d) Estimation of non-Manhattan obstacles.

D_t = [η_t, φ_t]  (11)

where D_t represents the difference between the orientation of the robot and the pose of the non-Manhattan obstacle.
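The corner-pairing conditions of Eqs. (6) and (7) can be sketched numerically. The paper only names the operator Θ, so the concrete residuals below are assumptions: lines are represented by endpoint pairs, collinearity combines a direction misalignment with a point-to-line offset, and "opposite directions" is measured by how far the angle between the two vectors is from π.

```python
import math

def angle_between(u, v):
    """Unsigned angle between two 2-D vectors, in [0, pi]."""
    dot = u[0] * v[0] + u[1] * v[1]
    nu = math.hypot(*u)
    nv = math.hypot(*v)
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def collinear_residual(line_g, line_h):
    """lambda_1-style residual (Eq. (6)): 0 when the two Lr segments lie
    on one line. Combines direction misalignment with the perpendicular
    offset of line_h's start point from the line through line_g."""
    (gx0, gy0), (gx1, gy1) = line_g
    (hx0, hy0), (hx1, hy1) = line_h
    dg = (gx1 - gx0, gy1 - gy0)
    dh = (hx1 - hx0, hy1 - hy0)
    ang = angle_between(dg, dh)
    ang = min(ang, math.pi - ang)  # direction-insensitive
    cross = dg[0] * (hy0 - gy0) - dg[1] * (hx0 - gx0)
    offset = abs(cross) / math.hypot(*dg)
    return ang + offset

def opposite_residual(v_g, v_h):
    """lambda_2-style residual (Eq. (7)): small when v_g and v_h point in
    opposite directions, i.e., the angle between them is close to pi."""
    return math.pi - angle_between(v_g, v_h)
```

Two Lr segments on the same horizontal line give a zero collinear residual, and center-to-midpoint vectors pointing in opposite directions give a zero opposite residual, matching the "→ 0" and "→ π" conditions.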
Fig. 5. An example of avoiding an obstacle of non-Manhattan structure.

TABLE II
Turn Motion Mode in the Camera Coordinate System

Difference | Motion     | Angle
Dt < 0     | turn left  | abs(Dt)
Dt > 0     | turn right | abs(Dt)
Dt = 0     | forward    | NA

Based on its understanding of the indoor scene, the robot can turn its orientation in order to keep pace with the posture of different structures (Manhattan or non-Manhattan). The turning of its orientation can be modeled as a convergence of the function Dt, as shown in Fig. 6. With a converging value for the posture difference, the robot can adjust its orientation step by step. As Dt → 0, the robot's orientation is in accordance with the posture of the obstacle, allowing it to avoid the obstacle by itself.

Fig. 6. The decreasing posture difference between the obstacle and robot orientation.

IV. Experimental Results

We design experiments to evaluate the performance of the robot in avoiding non-Manhattan obstacles through the proposed approach. The focus of the experiments is to evaluate the algorithms underlying the execution of a real robot mounted with only one camera. The goals of the experiment are to evaluate not only their performance in detecting non-Manhattan obstacles in indoor settings, but also their ability to avoid such non-Manhattan obstacles by turning the robot's orientation via Dt.

A. Performance of Detecting Non-Manhattan Obstacles

For an input image that contains many occlusions and clutter, our method copes with the clutter without prior training. Based on geometric constraints of spatial corners, our approach not only detects the obstacles satisfying the Manhattan assumption, but also can estimate the pose of the obstacles, especially non-Manhattan obstacles.

We compare the obstacles estimated by our algorithm against the ground truth, measuring the corner error by summing up all Euclidean distances between the estimated corners and the associated ground-truth corners. The performance on the LSUN dataset [44] is compared in Table III. Although the errors of corners appear lower in methods [32], [34], they only measured the error of corners that belong to the layout of their indoor setting, without competence of understanding and estimating corners of non-Manhattan obstacles. However, our method estimates the error of corners that belong to non-Manhattan obstacles, which plays an important role in the navigation of the robot, allowing the robot to avoid non-Manhattan obstacles in the indoor setting.

TABLE III
Performance on the LSUN Dataset

Method               | Corner error (%)
Hedau et al. [42]    | 15.48
Mallya et al. [43]   | 11.02
Dasgupta et al. [32] | 8.2
Ren et al. [34]      | 7.95
Wei [5]              | 10.86
Our method           | 9.98

Experimental comparisons were conducted between our method and Wei's method [4], as shown in Fig. 7. In Fig. 7, the image sizes (height and width) of Wei's scene understanding and of our non-Manhattan obstacle detection are the same. For example, the scene understanding (the fifth row, fourth column) and the non-Manhattan obstacle detection (the sixth row, fourth column) are from the same group of line segments, in which the numbers of line segments are the same. What is different is that some line segments which do not satisfy the constraint of angle projections in the scene understanding (the fifth row, fourth column) are eliminated, resulting in fewer displayed lines. Since Wei's method only considers lines belonging to vanishing points of the layout in indoor scenes, it is prone to failure in detecting non-Manhattan obstacles. However, our method can deal with clutter, and can efficiently detect details, especially non-Manhattan obstacles, without any prior training.

Experimental comparisons were also conducted between our method and Wang's method [5], as shown in Fig. 8. Since Wang's method only considers rectangular projections that belong to vanishing points of the layout of indoor scenes, it is also difficult for it to detect non-Manhattan obstacles.
Fig. 7. Experimental comparisons. (a) input frames from UCB dataset [26]; (b) understanding of indoor scenes by Wei's method [4]; (c) non-Manhattan obstacles estimated by our method; (d) images from Hedau dataset [42]; (e) results estimated by Wei's method [4]; (f) non-Manhattan structures estimated by our method.
B. Avoiding Non-Manhattan Obstacles

Here, as shown in Fig. 9, an unmanned aerial vehicle with a two-megapixel fixed camera was used for capturing video. The vision information was transmitted to a computer with an Intel Core i7-6500 CPU at 2.50 GHz. Then, our method can efficiently be applied to identify non-Manhattan obstacles in a
Fig. 8. Experimental comparisons. (a) input images from LSUN dataset [44]; (b) understanding of indoor scenes by Wang's method [5]; (c) non-Manhattan obstacles estimated by our method.
Fig. 10. Pose difference estimation. (a), (c) input frames; (b), (d) pose difference (Mx and η) between the robot orientation and the non-Manhattan obstacles.
keep pace with the posture of the detected non-Manhattan obstacles, making it possible to avoid such obstacles. Instead of data-driven approaches, the proposed method requires no prior training. With the use of geometric inference, the presented method is robust against changes in illumination and color. Furthermore, without any knowledge of the camera's internal parameters, the algorithm is more practical for robotic application in navigation. In addition, using features from a monocular camera, the approach is robust to errors in calibration and image noise. Without other external devices, this method has the advantages of lower investment and energy efficiency. The experiments measure the error of corners by comparing the corners of non-Manhattan obstacles estimated by our algorithm against the ground truth. Moreover, we demonstrated the validity of avoiding obstacles via the convergence of the difference between the robot orientation and the non-Manhattan obstacle posture. The experimental results showed that our method can understand and avoid non-Manhattan obstacles, meeting the requirements of indoor robot navigation.
Fig. 11. Convergence. (a) convergence curve of Mx; (b) convergence curve of η.
“Simultaneous body part and motion identification for human-following robots,” Pattern Recognition, vol. 50, pp. 118–130, 2016.
[29] Z. Y. Jia, A. Gallagher, A. Saxena, and T. Chen, “3D-based reasoning with blocks, support, and stability,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, IEEE, pp. 1–8, 2013.
[30] D. Lee, A. Gupta, M. Hebert, and T. Kanade, “Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces,” in Proc. NIPS, pp. 1288–1296, 2010.
[31] V. Hedau, D. Hoiem, and D. Forsyth, “Thinking inside the box: Using appearance models and context based on room geometry,” in Proc. European Conf. Computer Vision: Part VI, Berlin, Heidelberg, Germany: Springer, pp. 224–237, 2010.
[32] S. Dasgupta, K. Fang, K. Chen, and S. Savarese, “DeLay: Robust spatial layout estimation for cluttered indoor scenes,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, IEEE, pp. 616–624, 2016.
[33] C. H. Zou, A. Colburn, Q. Shan, and D. Hoiem, “LayoutNet: Reconstructing the 3D room layout from a single RGB image,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA: IEEE, 2018, pp. 2051–2059.
[34] Y. Z. Ren, S. W. Li, C. Chen, and C.-C. J. Kuo, “A coarse-to-fine indoor layout estimation (CFILE) method,” in Proc. Asian Conf. Computer Vision, Springer, Cham, pp. 36–51, 2016.
[35] P. Miraldo, F. Eiras, and S. Ramalingam, “Analytical modeling of vanishing points and curves in catadioptric cameras,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA: IEEE, 2018, pp. 2012–2021.
[36] H. Howard-Jenkins, S. Li, and V. Prisacariu, “Thinking outside the box: Generation of unconstrained 3D room layouts,” in Proc. Asian Conf. Computer Vision, Perth, Australia: Springer, 2018, pp. 432–448.
[37] X. T. Li, S. F. Liu, K. Kim, X. L. Wang, M. H. Yang, and J. Kautz, “Putting humans in a scene: Learning affordance in 3D indoor environments,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA: IEEE, 2019, pp. 12368–12376.
[38] A. Atapour-Abarghouei and T. P. Breckon, “Veritatem dies aperit – temporally consistent depth prediction enabled by a multi-task geometric and semantic scene understanding approach,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA: IEEE, 2019, pp. 3373–3384.
[39] M. H. Zhai, S. Workman, and N. Jacobs, “Detecting vanishing points using global image context in a non-Manhattan world,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 5657–5665.
[40] J. Lee and K. Yoon, “Joint estimation of camera orientation and vanishing points from an image sequence in a non-Manhattan world,” Int. J. Computer Vision, vol. 127, no. 10, pp. 1426–1442, 2019.
[41] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik, “From contours to regions: An empirical evaluation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, IEEE, pp. 2294–2301, 2009.
[42] V. Hedau, D. Hoiem, and D. Forsyth, “Recovering the spatial layout of cluttered rooms,” in Proc. 12th IEEE Int. Conf. Computer Vision, Kyoto, Japan: IEEE, pp. 1849–1856, 2009.
[43] A. Mallya and S. Lazebnik, “Learning informative edge maps for indoor scene layout prediction,” in Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile: IEEE, pp. 936–944, 2015.
[44] Y. Zhang, F. Yu, S. Song, P. Xu, A. Seff, and J. Xiao, Large-scale Scene Understanding Challenge: Room Layout Estimation, 2016.

Luping Wang received the Ph.D. degree from the Department of Computer Science and Engineering, Fudan University, in 2019. Since August 2019, he has been with the Department of Electrical Engineering, University of Shanghai for Science and Technology. His research interests include scene understanding, robotics, navigation, computer vision, pattern recognition, and artificial intelligence.

Hui Wei received the Ph.D. degree from the Department of Computer Science, Beijing University of Aeronautics and Astronautics, in 1998. From 1998 to 2000, he was a Postdoctoral Fellow in the Department of Computer Science and the Institute of Artificial Intelligence, Zhejiang University. Since November 2000, he has been with the Department of Computer Science and Engineering, Fudan University. His research interests include artificial intelligence and cognitive science.