
Learning-based Methods for VO and Global Localization
NGUYEN ANH MINH - IVSR
Based on: Chen et al. – "A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence"
Learning-based Visual Odometry Estimation

- End-to-end learning:
  - Supervised learning
  - Unsupervised (self-supervised) learning
- Hybrid learning
Supervised Learning

Input: a pair of images at timestamps t and t+1
Output: relative pose p̂ = {R̂, t̂} (rotation and translation)

Euclidean regression loss between the estimated and ground-truth relative pose (translation t, rotation φ, with a weighting factor κ):
L = ‖t̂ − t‖² + κ‖φ̂ − φ‖²
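As a minimal sketch (not the authors' exact implementation), the supervised pose-regression loss can be written as a weighted sum of translation and rotation errors; the value of the rotation weight kappa here is an assumption:

```python
import numpy as np

def pose_regression_loss(t_pred, t_gt, r_pred, r_gt, kappa=100.0):
    """Euclidean regression loss on a relative pose.

    t_*: (3,) translation vectors; r_*: (3,) rotation vectors (e.g. Euler angles).
    kappa weights the rotation term, which is typically much smaller in magnitude.
    """
    t_err = np.sum((t_pred - t_gt) ** 2)   # squared translation error
    r_err = np.sum((r_pred - r_gt) ** 2)   # squared rotation error
    return t_err + kappa * r_err

# Example: identical predicted and ground-truth poses give zero loss
loss = pose_regression_loss(np.zeros(3), np.zeros(3), np.zeros(3), np.zeros(3))
```

In practice the loss is averaged over all image pairs in a batch, and kappa balances the differing scales of the translation and rotation terms.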


DeepVO

Other works:
- Saputra et al. 2019a: curriculum learning and geometric loss constraints
- Saputra et al. 2019b: knowledge distillation for a compressed pose regressor
- Xue et al. 2019: a memory module that stores global information, and a refining module that improves pose estimates and preserves contextual information
Self-Supervised Learning

- Image at timestamp t (target view) I_t → the depth network predicts a depth image D̂
- Image at timestamp t+1 (source view) I_s → the pose network predicts the relative pose p̂ = {R̂, t̂} (rotation and translation)
- The source image is warped into the target view to form the synthesized image Î_s

Photometric reconstruction loss: L = Σ_p |I_t(p) − Î_s(p)|
SfMLearner: view synthesis as supervision

For each pixel p_t in the target-view image (ground truth):

1. Project the target pixel coordinate into the source view:
   p_s ~ K T̂(t→s) D̂(p_t) K⁻¹ p_t
2. Compute the intensity at the projected location in the source image using bilinear interpolation, giving the synthesized pixel Î_s(p_t).

Then calculate the photometric reconstruction loss between the real target image and the predicted synthesis:
L_vs = Σ_p |I_t(p) − Î_s(p)|
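The projection and sampling steps above can be sketched as follows (a simplified per-pixel version, not the vectorized implementation a real system would use):

```python
import numpy as np

def project_to_source(p_t, depth, K, T):
    """Project a homogeneous target pixel into the source view (SfMLearner-style).

    p_t: (3,) pixel in homogeneous coordinates [u, v, 1]
    depth: scalar predicted depth at that pixel
    K: (3, 3) camera intrinsics; T: (4, 4) relative pose target -> source
    Returns the continuous (u, v) coordinates in the source image.
    """
    # Back-project the pixel to a 3D point in the target camera frame
    point_t = depth * (np.linalg.inv(K) @ p_t)
    # Transform the point into the source camera frame
    point_s = T[:3, :3] @ point_t + T[:3, 3]
    # Project with the intrinsics and normalize by depth
    proj = K @ point_s
    return proj[:2] / proj[2]

def bilinear_sample(img, uv):
    """Sample an image intensity at continuous coordinates via bilinear interpolation."""
    u, v = uv
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * img[v0, u0] + du * (1 - dv) * img[v0, u0 + 1]
            + (1 - du) * dv * img[v0 + 1, u0] + du * dv * img[v0 + 1, u0 + 1])
```

The photometric loss then sums |I_t(p_t) − bilinear_sample(I_s, p_s)| over all valid pixels; bilinear interpolation keeps the sampling differentiable so gradients flow back to the depth and pose networks.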
Two main problems of unsupervised VO

1. Globally inconsistent scale due to scale ambiguity
2. The photometric loss assumes a static scene and no camera occlusions

Publication and contribution:
- Bian et al.: scale recovery by transforming depth maps into 3D space and projecting them back to produce a reconstructed depth
- GeoNet: a geometric consistency loss and a 2D flow generator
- GANVO: generative-adversarial learning to generate better synthetic depth maps
- Li et al.: a GAN for a more accurate synthetic target view
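Because of the scale ambiguity noted above, monocular predictions are commonly aligned to the ground-truth trajectory with a single scale factor before evaluation. A minimal sketch of this standard practice (not taken from the slides):

```python
import numpy as np

def align_scale(pred, gt):
    """Scale-align a predicted trajectory to ground truth.

    pred, gt: (N, 3) arrays of camera positions.
    Uses the least-squares optimal scale s = <gt, pred> / <pred, pred>.
    """
    s = np.sum(gt * pred) / np.sum(pred * pred)
    return s * pred
```

After this alignment, trajectory errors reflect the shape of the estimated path rather than the arbitrary global scale of the monocular reconstruction.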
Benchmarks

1. KITTI Odometry:
- 11 training sequences, 11 testing sequences
- Sequences 09 and 10 are commonly used to evaluate learning-based methods
2. TUM RGB-D:
- 21 sequences for training and testing
Summary

- Hybrid VO shows the best performance
- Unsupervised VO is slightly outperformed by supervised VO; however, the gap is diminishing
Thank you

- NGUYEN ANH MINH
