
Learning to Predict 3D Objects with an

Interpolation-based Differentiable Renderer


NeurIPS '19

Outline
● Introduction
● Contribution
● Method
● Experimental Results
● Inspiration

Introduction

● Input: 2D image (RGBA) and camera parameters


● Output: mesh, light, and texture

Contribution
● 2D supervision with camera viewpoint
● Differentiable renderer design
● State-of-the-art results on single-image 3D reconstruction
● Training on natural images via an adversarial loss

Prediction of Network: Triangle Mesh
Represent a 3D shape as a set of triangles

● Vertices: V x 3 matrix giving real-valued positions in 3D space


● Faces: F x 3 matrix giving F triangles, each corner specified as an
index into the vertices

(+) Standard representation for graphics


(+) Explicitly represents 3D shapes
(+) Adaptive: Can represent flat surfaces very efficiently, can allocate more
faces to areas with fine detail
(+) Can attach data on verts and interpolate over the whole surface: RGB
colors, texture coordinates, normal vectors, etc.
(Slide adapted from the Mesh R-CNN ICCV '19 presentation)
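The vertex/face representation above can be sketched with two NumPy arrays; this is a made-up minimal example (a unit square), not the paper's code:

```python
import numpy as np

# Vertices: V x 3 real-valued positions; faces: F x 3 vertex indices.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])
faces = np.array([
    [0, 1, 2],   # two triangles covering the square
    [0, 2, 3],
])

# Per-vertex attributes (e.g. RGB color) can be attached and later
# interpolated over each face during rasterization.
colors = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])

# Gather the three corner positions of every face: shape F x 3 x 3.
corners = vertices[faces]
print(corners.shape)  # (2, 3, 3)
```

Note how indexing `vertices[faces]` lets many faces share the same vertex, which is what makes the representation compact and adaptive.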
Method

Differentiable Renderer

● Render with input camera parameters


○ Colored image
○ Rendered silhouette image
● Compute loss on rendered images
○ Convert 3D triangles into sets of pixels
○ Backpropagate gradients to mesh attributes
Three Steps for Rendering Pipeline

1) Vertex shader: project each 3D vertex onto the defined 2D image plane (differentiable ✓)

2) Rasterization: determine which pixels are covered (not differentiable ✗)

3) Fragment shader: compute how each pixel is colored (differentiable ✓)

Step-1: Vertex Shader
● Purpose: project 3D points onto the 2D image plane
● How: multiply each vertex by the corresponding model, view, and projection matrices
● The vertex shader operation is directly differentiable
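A minimal sketch of this projection step, assuming a combined 4x4 model-view-projection (MVP) matrix and homogeneous coordinates; this is illustrative, not the paper's implementation:

```python
import numpy as np

def project_vertices(verts, mvp):
    """Project V x 3 world-space vertices onto the 2D image plane
    using a combined 4x4 model-view-projection matrix."""
    V = verts.shape[0]
    homo = np.hstack([verts, np.ones((V, 1))])   # V x 4 homogeneous coords
    clip = homo @ mvp.T                          # V x 4 clip-space coords
    ndc = clip[:, :3] / clip[:, 3:4]             # perspective divide by w
    return ndc[:, :2]                            # x, y on the image plane

# Illustration with an identity MVP (no transform): x, y pass through.
pts = np.array([[0.5, -0.5, 1.0]])
print(project_vertices(pts, np.eye(4)))  # [[ 0.5 -0.5]]
```

Every operation here is a matrix product or a division, so gradients flow back to the vertex positions, which is exactly why this stage is directly differentiable.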

Step-2: Interpolation-based Differentiable Rasterization
● Foreground pixels
○ A weighted interpolation of local face properties
■ Vertex positions
■ Attributes (color, lighting)

○ Notations
■ Ii: value of pixel i, computed as Ii = Σk wk uk
■ wk: barycentric weight
■ pi: pixel i (2D coordinates on the image plane)
■ vk: vertex k (2D coordinates on the image plane)
■ fj: triangle j
■ uk: attribute of vk
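The interpolation Ii = Σk wk uk can be illustrated with explicit barycentric weights; the triangle, pixel, and attribute values below are made-up examples:

```python
import numpy as np

def barycentric_weights(p, v0, v1, v2):
    """Barycentric weights of 2D point p w.r.t. triangle (v0, v1, v2).
    The weights are smooth functions of the vertex positions, so the
    interpolated pixel value is differentiable in the mesh."""
    d = (v1[1] - v2[1]) * (v0[0] - v2[0]) + (v2[0] - v1[0]) * (v0[1] - v2[1])
    w0 = ((v1[1] - v2[1]) * (p[0] - v2[0]) + (v2[0] - v1[0]) * (p[1] - v2[1])) / d
    w1 = ((v2[1] - v0[1]) * (p[0] - v2[0]) + (v0[0] - v2[0]) * (p[1] - v2[1])) / d
    return np.array([w0, w1, 1.0 - w0 - w1])

# Triangle vertices vk on the image plane, with per-vertex attributes uk
# (here RGB colors); interpolate at a pixel pi inside the triangle.
v0, v1, v2 = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
u = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
w = barycentric_weights(np.array([0.25, 0.25]), v0, v1, v2)
I_i = w @ u   # Ii = sum_k wk * uk
print(w)      # [0.5  0.25 0.25]
print(I_i)    # [0.5  0.25 0.25]
```

Because the weights sum to 1 and vary smoothly as vertices move, gradients of the rendering loss can be pushed back through Ii to both positions and attributes.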
Step-2: Interpolation-based Differentiable Rasterization
● Background pixels
○ A distance-based aggregation of global geometry
■ Softly assign each face fj to pixel pi'
■ Combine the probabilistic influence of each face
■ Produce alpha-channel predictions (?)

● Notations
○ Ii: value of pixel i
○ wk: barycentric weight
○ pi: pixel i
○ vk: vertex k
○ fj: triangle j
○ uk: attribute of vk
○ d(pi', fj): distance function from pi' to fj
○ δ: smoothness control
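A sketch of this soft assignment, assuming precomputed pixel-to-face distances d(pi', fj), an exponential influence exp(-d/δ), and a product-style combination of per-face influences; the exact distance function is simplified away here:

```python
import numpy as np

def soft_alpha(face_distances, delta=0.01):
    """Soft alpha for a pixel pi': each face fj contributes a
    probabilistic influence exp(-d(pi', fj) / delta), and the
    influences are combined so that any nearby face pulls the
    alpha toward 1. `face_distances` holds precomputed d(pi', fj)."""
    influence = np.exp(-np.asarray(face_distances) / delta)
    return 1.0 - np.prod(1.0 - influence)

# A pixel on a face (distance 0) gets alpha 1; a far pixel gets ~0.
print(soft_alpha([0.0, 0.5]))   # 1.0
print(soft_alpha([1.0]))        # ~0.0
```

The key point is that δ controls how quickly influence decays with distance, so background pixels near the silhouette still receive nonzero gradients, which is what makes the rasterization step differentiable.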
Loss Computation on Rendered Images

● Render with input camera parameters
○ Colored image
○ Silhouette image
● Compute loss on rendered images
● Loss functions
○ L_IOU: silhouette loss
○ L_col: color loss
○ L_sm: smoothness loss (?)
○ L_lap: Laplacian loss (?)
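Illustrative sketches of two of these losses, with L_IOU written as a soft IoU over alpha maps and L_lap as an offset-from-neighbor-centroid penalty; the `neighbors` adjacency format is an assumption for illustration, not the paper's code:

```python
import numpy as np

def silhouette_iou_loss(alpha_pred, alpha_gt, eps=1e-8):
    """L_IOU: 1 minus the soft intersection-over-union of predicted
    and ground-truth silhouettes (alpha values in [0, 1])."""
    inter = np.sum(alpha_pred * alpha_gt)
    union = np.sum(alpha_pred + alpha_gt - alpha_pred * alpha_gt)
    return 1.0 - inter / (union + eps)

def laplacian_loss(vertices, neighbors):
    """L_lap: penalize each vertex's offset from the centroid of its
    neighbors, encouraging a smooth surface. `neighbors` maps a vertex
    index to the indices of its adjacent vertices."""
    loss = 0.0
    for i, nbrs in neighbors.items():
        centroid = vertices[nbrs].mean(axis=0)
        loss += np.sum((vertices[i] - centroid) ** 2)
    return loss / len(neighbors)

# Identical silhouettes give ~0 loss; disjoint silhouettes give loss 1.
s = np.array([[1.0, 0.0], [0.0, 1.0]])
print(silhouette_iou_loss(s, s))        # ~0.0
print(silhouette_iou_loss(s, 1.0 - s))  # 1.0
```

Both terms are simple differentiable expressions over rendered images and mesh vertices, so they can be summed with the color loss and minimized end-to-end.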
AI + Inverse Graphics

● Takes only 2D images and camera viewpoints as input

● Predicts the corresponding 3D properties, including the 3D mesh, lighting, and texture map
● Differentiable renderer

3D GAN of Textured Shapes via 2D Supervision
● Train a discriminator to distinguish real images from rendered images
● Predict camera poses via multi-view structure-from-motion (SfM)

Experimental Results

Geometry Results on Single Image 3D Object Prediction
[Figure: qualitative comparison of ground-truth image, prediction, SoftRas [20], and N3MR [14], reported as IoU (%) / F-score (%)]

● The only difference in this experiment is the renderer

● The results demonstrably differ from SoftRas (?)
● Hard to judge the claim that "DIB-R faithfully reconstructs both the fine-detailed color and geometry of the 3D shape, compared to SoftRas-Mesh and N3MR"
Inspiration
● Introduce a mesh structure to encode properties of the object
○ location, semantic labels of parts, object boundaries
● If the mesh representation is so powerful, what motivates point-cloud-based research?
● Cannot easily train on natural images, since the method requires camera parameters

Key References
● [14] Neural 3D Mesh Renderer. CVPR '18
● [20] Soft Rasterizer: Differentiable Rendering for Unsupervised Single-View Mesh Reconstruction. CVPR '19 (slide)
