
Joint Embeddings of Shapes and Images

via CNN Image Purification


Yangyan Li* Hao Su* Charles R. Qi Noa Fish
Daniel Cohen-Or Leonidas J. Guibas
(*Joint First Authors)
Deep learning is so cool for so many problems…
Deep learning, yay or nay?
A piece of cake, elementary math… Y = 𝑓(𝑋)
What the hell is the 𝑓?

It eats, a lot!
128 dim space visualized by t-SNE
Image based Shape Retrieval
Shape based Image Retrieval
Cross-View Image Retrieval
Text Images Shapes
Text based Shape Retrieval
Shape Embedding

𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝑆𝑖, 𝑆𝑗) = ‖𝒫𝑖 − 𝒫𝑗‖
Many choices for 𝒫𝑖 :
Shape Histograms, Spin Images, Spherical
Harmonics, Shape Distributions, etc.
LFD-HoG
Very Strong!

Light Field Rendering
[Figure: each shape 𝑆1 … 𝑆𝑛 is rendered from multiple views; HoG is computed per view and the results are concatenated into one 203,760-dimensional descriptor per shape, which PCA reduces to 128 dimensions.]
Distance Matrix: 𝑑(𝑆𝑖, 𝑆𝑗) in the (𝑖, 𝑗)-th element
[Figure: distance matrix with shapes grouped by class: chairs, planes, cars.]
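A minimal sketch of how such a distance matrix could be filled in. The slides do not say which metric is applied to the descriptors, so the Euclidean distance here is an assumption, and the 2-D `toy_descriptors` are hypothetical stand-ins for real LFD-HoG vectors.

```python
import math

def distance_matrix(descriptors):
    """Return the symmetric matrix D with D[i][j] = distance between descriptors i and j.

    Assumption: Euclidean distance; the slides do not name the metric.
    """
    n = len(descriptors)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(descriptors[i], descriptors[j])
            D[i][j] = D[j][i] = d  # fill both halves; the diagonal stays 0
    return D

# Hypothetical toy descriptors (real ones would be per-shape LFD-HoG vectors).
toy_descriptors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = distance_matrix(toy_descriptors)
```

Row 𝑖 of `D` then holds the distances from shape 𝑆𝑖 to every other shape.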


[Figure: the 𝑛 × 𝑛 distance matrix over shapes 𝑆1 … 𝑆𝑛 is mapped by MDS to 𝑛 rows of 128 dimensions, one row per shape.]
Sammon's Error:
E = \frac{1}{\sum_{i<j} d^*_{ij}} \sum_{i<j} \frac{(d^*_{ij} - d_{ij})^2}{d^*_{ij}}
(d^*_{ij}: distance in the original space; d_{ij}: distance in the embedding space)


Each row can serve as the embedding point
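Sammon's error divides each squared distance mismatch by the original distance, so small original distances weigh more heavily than large ones. Below is a small sketch of evaluating it for a candidate embedding; the function name and the toy data are illustrative, and the minimization itself (which MDS performs) is not shown.

```python
import math

def sammon_stress(D_orig, points):
    """Sammon's error of an embedding `points` against original distances D_orig.

    Assumes D_orig[i][j] > 0 for i != j (no duplicate items).
    """
    n = len(points)
    total_orig = 0.0    # sum of original distances over i < j
    weighted_err = 0.0  # sum of (d* - d)^2 / d* over i < j
    for i in range(n):
        for j in range(i + 1, n):
            d_star = D_orig[i][j]                # original-space distance
            d = math.dist(points[i], points[j])  # embedding-space distance
            total_orig += d_star
            weighted_err += (d_star - d) ** 2 / d_star
    return weighted_err / total_orig

# Toy example: three items whose original distances fit exactly on a line.
D_orig = [[0.0, 1.0, 2.0],
          [1.0, 0.0, 1.0],
          [2.0, 1.0, 0.0]]
exact = [[0.0], [1.0], [2.0]]      # reproduces every distance
distorted = [[0.0], [1.0], [3.0]]  # stretches the distances to the last item
stress_exact = sammon_stress(D_orig, exact)
stress_distorted = sammon_stress(D_orig, distorted)
```

An embedding that reproduces every distance has zero stress; any distortion raises it.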
[Plot: number of neighbors by original distance (y-axis, 0 to 250) versus neighborhood size in embedding space (x-axis, 0 to 250), comparing Sammon, PCA, LLE, NPE, and Optimal.]
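One plausible reading of this comparison is a neighborhood-preservation count: for each item, how many of its k nearest neighbors under the original distance are also among its k nearest neighbors in the embedding. The sketch below follows that reading, which is an assumption rather than the authors' stated protocol; the function names and toy positions are hypothetical.

```python
def knn_indices(D, i, k):
    """Indices of the k nearest neighbors of item i under distance matrix D (excluding i)."""
    order = sorted((j for j in range(len(D)) if j != i), key=lambda j: D[i][j])
    return set(order[:k])

def preserved_neighbors(D_orig, D_embed, k):
    """Average number of original k-NN that are also k-NN in the embedding."""
    n = len(D_orig)
    total = sum(len(knn_indices(D_orig, i, k) & knn_indices(D_embed, i, k))
                for i in range(n))
    return total / n

def line_distances(positions):
    """Distance matrix for items placed at 1-D positions (toy helper)."""
    return [[abs(a - b) for b in positions] for a in positions]

D_orig = line_distances([0, 1, 3, 7])
D_same = line_distances([0, 1, 3, 7])  # embedding that keeps every neighborhood
D_bad = line_distances([0, 7, 3, 1])   # items 1 and 3 swapped
perfect = preserved_neighbors(D_orig, D_same, 2)
broken = preserved_neighbors(D_orig, D_bad, 1)
```

Under this reading, an ideal embedding preserves all k neighbors for every k, which would correspond to the "Optimal" curve.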
Shape Embedding

𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝑆𝑖, 𝑆𝑗) = ‖𝒫𝑖 − 𝒫𝑗‖
Our choice of embedding point 𝒫𝑖 :
1. Extract Light Field HoG Descriptors
2. Compute Distance Matrix
3. MDS with Sammon’s Error
Image Embedding
via CNN Image Purification
Deep learning, yay or nay?
𝒫𝑖 = 𝑓(𝐼𝑖)
‖𝒫2 − 𝒫3‖ < ‖𝒫1 − 𝒫2‖
A piece of cake, elementary math…
What the hell is the 𝑓?
http://shapenet.org
Shape Embedding Image Synthesis

Many image-point pairs (𝐼𝑆𝑖 , 𝒫𝑖 )


≠ 1014 ∗

It’s not only the number…


Training Phase Testing Phase
Input: many image-point pairs (𝐼𝑆𝑖 , 𝒫𝑖 )
Task: learn the function 𝒫𝑖 = 𝑓(𝐼𝑆𝑖 )
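To make the learning task concrete, here is a toy sketch of fitting 𝑓 to (input, point) pairs by minimizing a Euclidean loss with gradient descent. A linear map stands in for the CNN, which is a deliberate simplification and not the method of the talk; the feature vectors, learning rate, and step count are all hypothetical.

```python
def predict(W, x):
    """Linear stand-in for f: maps a feature vector x to the embedding point W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def euclidean_loss(W, pairs):
    """Mean over pairs of 0.5 * ||f(x) - p||^2."""
    total = 0.0
    for x, p in pairs:
        y = predict(W, x)
        total += 0.5 * sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    return total / len(pairs)

def train(pairs, dim_in, dim_out, lr=0.5, steps=200):
    """Plain gradient descent on the Euclidean loss, starting from W = 0."""
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(steps):
        grad = [[0.0] * dim_in for _ in range(dim_out)]
        for x, p in pairs:
            y = predict(W, x)
            for r in range(dim_out):
                err = y[r] - p[r]
                for c in range(dim_in):
                    grad[r][c] += err * x[c] / len(pairs)
        for r in range(dim_out):
            for c in range(dim_in):
                W[r][c] -= lr * grad[r][c]
    return W

# Hypothetical (feature, embedding point) pairs generated by W = [[2, 0.5], [-1, 3]].
pairs = [([1.0, 0.0], [2.0, -1.0]),
         ([0.0, 1.0], [0.5, 3.0]),
         ([1.0, 1.0], [2.5, 2.0])]
W = train(pairs, dim_in=2, dim_out=2)
final_loss = euclidean_loss(W, pairs)
```

In the real setting the inputs are images, the targets are the shape embedding points 𝒫𝑖, and 𝑓 is a CNN; only the loss has the same form.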
Hey, wake up!
Here comes the most important slide!
Shape Embedding Precious High Quality Supervision

Image Synthesis Messy but Nutritional Training Data

Training Phase
Testing Phase 𝒫𝑖 = 𝑓(𝐼𝑆𝑖 ), the hell function
Quantitative Evaluation

AUC of image to image retrieval precision-recall curve

First and last image match rankings in shape to image retrieval


Quantitative Evaluation

Image to shape retrieval
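Once an image has been mapped to a point in the embedding space, image to shape retrieval reduces to a nearest-neighbor search over the shape points. A minimal sketch, assuming Euclidean distance in the embedding space; the shape names and 2-D points are made up for illustration (the talk uses a 128-dimensional space).

```python
import math

def retrieve_shapes(image_point, shape_points, top_k=2):
    """Rank shapes by distance to the image's embedding point; return the top_k names."""
    ranked = sorted(shape_points,
                    key=lambda name: math.dist(image_point, shape_points[name]))
    return ranked[:top_k]

# Hypothetical shape embedding points.
shape_points = {
    "chair_a": [0.0, 0.0],
    "chair_b": [0.2, 0.1],
    "plane_a": [5.0, 5.0],
    "car_a": [-4.0, 3.0],
}
image_point = [0.15, 0.1]  # stands in for f(I), the embedding of a query image
results = retrieve_shapes(image_point, shape_points)
```

The same ranking, run with a shape point as the query and image points as the candidates, gives shape based image retrieval.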


Key Steps towards 3D Reconstruction

Similar Shape Retrieval


+
Viewpoint estimation
Render for CNN: Viewpoint Estimation in Images Using CNNs
Trained with Rendered 3D Model Views, ICCV 2015 Oral
Limitations & Future Work
• Dynamic embedding space construction
• Similarity: visual → semantic
  - For example, the upcoming SHED!
• Similarity: scalar → vector/matrix
• Whole shape → Parts
• Joint analysis of shapes and images
• …
http://shapenet.github.io/JointEmbedding/
Stay Cool with http://shapenet.github.io/RenderForCNN/
Thank you!
[Figure: network diagram with labels: CONV Layer, FC Layer, Softmax Loss Layer (Class Label), Euclidean Loss Layer (Embedding Point), m Dimensions.]
