
Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild

Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi

VISUAL GEOMETRY GROUP, UNIVERSITY OF OXFORD


Supervision for 3D Reconstruction

3D ground truth or multiple views; depth maps; shape models; silhouettes; keypoints; camera viewpoint

Unsupervised Learning of 3D Objects
Training data: single-view images of a category
Model: Unsup3D
Output: instance-specific 3D shapes

NO other supervision!

Training Pipeline: Photo-Geometric Autoencoding

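Photo-Geometric Autoencoding

[Diagram: from the input image 𝐈, three encoders and two decoders predict the view 𝑤, the depth 𝑑 and the texture. A renderer combines them into the reconstruction 𝐈̂, which is compared with 𝐈 by the reconstruction loss.]
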
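The autoencoding objective in the diagram can be written compactly. This is a sketch under assumptions: the symbols Π (renderer), t (texture) and Ω (the pixel grid) are notation introduced here, and the choice of an L1 penalty is an assumption, since the slide names only a "reconstruction loss".

```latex
\hat{\mathbf{I}} = \Pi(t, d, w), \qquad
\mathcal{L}_{\text{rec}} = \frac{1}{|\Omega|} \sum_{(u,v) \in \Omega}
  \bigl| \hat{\mathbf{I}}_{uv} - \mathbf{I}_{uv} \bigr|
```

Nothing in this objective alone ties the predicted depth to real geometry, which is what motivates the question about degenerate solutions.
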
Q1: How to avoid degenerate solutions?
A1: Enforce symmetry by flipping

[Diagram: the networks predict the view 𝑤, the depth 𝑑 and the texture from the input 𝐈. The depth and the texture are also flipped horizontally, giving 𝑑′ and a flipped texture. A flip switch selects either the original or the flipped pair as input to the renderer, and the reconstruction loss compares the rendered 𝐈̂ with 𝐈.]

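The flipping scheme can be illustrated with a small, self-contained sketch. Everything here is a toy stand-in: images are nested lists, `render` is a hypothetical callable supplied by the caller rather than the renderer used in the talk, and evaluating the loss for both positions of the flip switch is an assumption (the slide shows only the switch itself).

```python
def hflip(image):
    """Mirror an image left-to-right by reversing every row."""
    return [list(reversed(row)) for row in image]


def l1_loss(a, b):
    """Mean absolute difference between two equally sized images."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def symmetric_reconstruction_losses(image, depth, texture, render):
    """Return the loss for the original pair and for the flipped pair.

    `render` is a hypothetical function (depth, texture) -> image.
    """
    recon = render(depth, texture)
    recon_flipped = render(hflip(depth), hflip(texture))
    return l1_loss(recon, image), l1_loss(recon_flipped, image)


def toy_render(depth, texture):
    """Placeholder renderer that ignores depth and returns the texture."""
    return texture


flat_depth = [[0.0, 0.0, 0.0]]
symmetric = [[1.0, 2.0, 1.0]]
asymmetric = [[1.0, 2.0, 3.0]]

loss_sym = symmetric_reconstruction_losses(symmetric, flat_depth, symmetric, toy_render)
loss_asym = symmetric_reconstruction_losses(asymmetric, flat_depth, asymmetric, toy_render)
```

With the symmetric texture both losses vanish, while the asymmetric texture is penalized on the flipped branch. That penalty is what pushes the predicted depth and texture towards a symmetric canonical form.
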
Q2: What about non-symmetric lighting?
A2: Enforce symmetry on albedo

[Diagram: a fourth encoder predicts the light 𝑙, and the texture is replaced by the albedo 𝑎. The depth 𝑑 and the albedo 𝑎 are flipped horizontally to give 𝑑′ and 𝑎′, and a flip switch selects which pair is used. A shading step and the renderer produce the canonical view 𝐉 and the reconstruction 𝐈̂, which enters the reconstruction loss.]

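To see why symmetry is enforced on the albedo rather than on the shaded appearance, the sketch below shades a symmetric albedo with a light coming from one side. It assumes a Lambertian model with an ambient and a diffuse term; the slide itself names only a "shading" step. The coefficients `k_ambient` and `k_diffuse` and the hand-written normals are hypothetical values chosen for illustration.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def lambert_shading(normal, light_dir, k_ambient=0.5, k_diffuse=0.5):
    """Ambient term plus diffuse term proportional to max(0, <normal, light>)."""
    n = normalize(normal)
    l = normalize(light_dir)
    cosine = sum(a * b for a, b in zip(n, l))
    return k_ambient + k_diffuse * max(0.0, cosine)


def shade(albedo, normals, light_dir):
    """Multiply every albedo pixel by the shading computed from its normal."""
    return [
        [a * lambert_shading(n, light_dir) for a, n in zip(albedo_row, normal_row)]
        for albedo_row, normal_row in zip(albedo, normals)
    ]


# A symmetric two-pixel "object": equal albedo, mirrored surface normals.
albedo = [[0.8, 0.8]]
normals = [[[-1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]]
light = [1.0, 0.0, 1.0]  # light arriving from the right

canonical = shade(albedo, normals, light)
```

The left pixel comes out darker than the right one even though the albedo is perfectly symmetric, so a symmetry constraint on the shaded image would be violated by lighting alone.
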
Q3: Non-symmetric albedo, deformation, etc.?
A3: Predict uncertainty

[Diagram: an additional encoder-decoder predicts two confidence maps, 𝜎 and 𝜎′, alongside the view 𝑤, the depth 𝑑 (flipped: 𝑑′), the light 𝑙 and the albedo 𝑎 (flipped: 𝑎′). The rest of the pipeline is unchanged: flip switch, shading, renderer, canonical view 𝐉, reconstruction 𝐈̂ and reconstruction loss.]

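One way a predicted confidence map can enter the reconstruction loss is as the per-pixel scale of a Laplacian likelihood, so that the loss becomes a negative log-likelihood. The sketch below makes that assumption explicit; the slide states only that uncertainty is predicted, and the function name is hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def confidence_l1_loss(recon, image, sigma):
    """Mean negative log-likelihood of a Laplacian with per-pixel scale sigma.

    Per pixel: sqrt(2) * |recon - image| / sigma + log(sqrt(2) * sigma).
    """
    total = 0.0
    count = 0
    for recon_row, image_row, sigma_row in zip(recon, image, sigma):
        for r, i, s in zip(recon_row, image_row, sigma_row):
            total += SQRT2 * abs(r - i) / s + math.log(SQRT2 * s)
            count += 1
    return total / count


# A pixel with a large error is cheaper when the model admits high uncertainty,
bad_confident = confidence_l1_loss([[1.0]], [[0.0]], [[0.5]])
bad_uncertain = confidence_l1_loss([[1.0]], [[0.0]], [[2.0]])

# while a perfectly reconstructed pixel is cheaper with low uncertainty.
good_confident = confidence_l1_loss([[0.0]], [[0.0]], [[0.5]])
good_uncertain = confidence_l1_loss([[0.0]], [[0.0]], [[2.0]])
```

The log term stops the model from declaring every pixel uncertain. Large values of 𝜎 pay off only where the symmetric reconstruction really cannot explain the image, for example on asymmetric hair or a deformed expression.
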
Results on human faces
Images taken from CelebA and 3DFAW

[Figure: pairs of panels labeled "input" and "reconstruction"]

Results on face paintings
Images taken from [1]

[1] Elliot J. Crowley, Omkar M. Parkhi, and Andrew Zisserman. Face painting: querying art with photos. In Proc. BMVC, 2015.

[Figure: pairs of panels labeled "input" and "reconstruction"]

Results on abstract faces
Images taken from [1] and the Internet

[Figure: pairs of panels labeled "input" and "reconstruction"]

Results on video frames
Video clips taken from VoxCeleb2
We do not use videos for training or fine-tuning. These results are obtained by applying our model, trained on CelebA, to the video clips frame by frame.

[Figure: each input frame is shown with three renderings labeled "recon.", "new view" and "rotated"]

Relighting effects
Video clips taken from CelebA

[Figure: pairs of panels labeled "input" and "reconstruction"]

Results on cat faces
Images taken from [2] and [3]

[2] Weiwei Zhang, Jian Sun, and Xiaoou Tang. Cat head detection - how to effectively exploit shape and texture features. In Proc. ECCV, 2008.
[3] Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman, and C. V. Jawahar. Cats and dogs. In Proc. CVPR, 2012.

[Figure: pairs of panels labeled "input" and "reconstruction"]

Thank you!

Demo: https://bit.ly/2zBNjXx
Code: https://github.com/elliottwu/unsup3d
