Intrinsic3D: High-Quality 3D Reconstruction by
Joint Appearance and Geometry Optimization
with Spatially-Varying Lighting

R. Maier¹,², K. Kim¹, D. Cremers², J. Kautz¹, M. Nießner²,³
¹ NVIDIA  ² TU Munich  ³ Stanford University

International Conference on Computer Vision 2017

[Teaser figure: Fusion vs. Ours]
Overview

• Motivation & State-of-the-art
• Approach
• Results
• Conclusion
Motivation

• Recent progress in Augmented Reality (AR) / Virtual Reality (VR)
  [Images: Microsoft HoloLens, HTC Vive]
• Requirement of high-quality 3D content for AR, VR, gaming, …
  [Image: NVIDIA VR Funhouse]
• Usually: manual modelling (e.g. Maya)
• Wide availability of commodity RGB-D sensors: efficient methods for 3D reconstruction
  [Image: Asus Xtion]
• Challenge: how to reconstruct high-quality 3D models with best-possible geometry and color from low-cost depth sensors?
State-of-the-art
RGB-D based 3D Reconstruction
• Goal: stream of RGB-D frames of a scene → 3D shape that maximizes the geometric consistency
• Real-time, robust, fairly accurate geometric reconstructions

KinectFusion, 2011: “KinectFusion: Real-time Dense Surface Mapping and Tracking”, Newcombe et al., ISMAR 2011.
DynamicFusion, 2015: “DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-time”, Newcombe et al., CVPR 2015.
BundleFusion, 2017: “BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration”, Dai et al., ToG 2017.
State-of-the-art
Voxel Hashing
• Baseline RGB-D based 3D reconstruction framework:
  • initial camera poses
  • sparse SDF reconstruction

• Challenges:
  • (Slightly) inaccurate and over-smoothed geometry
  • Bad colors
  • Inaccurate camera pose estimation
  • Input data quality (e.g. motion blur, sensor noise)

• Goal: High-Quality Reconstruction of Geometry and Color

“Real-time 3D Reconstruction at Scale using Voxel Hashing”, Nießner et al., ToG 2013.
State-of-the-art
High-Quality Colors [Zhou2014]
• Optimize camera poses and image deformations to optimally fit the initial (maybe wrong) reconstruction
• But: HQ images required, no geometry refinement involved
“Color Map Optimization for 3D Reconstruction with Consumer Depth Cameras”, Zhou and Koltun, ToG 2014.

High-Quality Geometry [Zollhöfer2015]
• Adjust camera poses in advance (bundle adjustment) to improve color
• Use shading cues (RGB) to refine geometry (shading-based refinement of surface & albedo)
• But: RGB is fixed (no color refinement based on refined geometry)
“Shading-based Refinement on Volumetric Signed Distance Functions”, Zollhöfer et al., ToG 2015.

Idea: jointly optimize for geometry, albedo and image formation model to simultaneously obtain high-quality geometry and appearance!
Our Method

• Temporal view sampling & filtering techniques (input frames)
• Joint optimization of
  • surface & albedo (Signed Distance Field)
  • image formation model
• Lighting estimation using Spatially-Varying Spherical Harmonics (SVSH)
• Optimized colors (by-product)
Approach
Overview
Pipeline: RGB-D → SDF Fusion → Shading-based Refinement (Shape-from-Shading) → High-Quality 3D Reconstruction
• Spatially-Varying Lighting Estimation
• Joint Appearance and Geometry Optimization
  • surface
  • albedo
  • image formation model
• Temporal view sampling / filtering
RGB-D Data
Example: Fountain dataset
• 1086 RGB-D frames
• Sensor: PrimeSense
  • Depth: 640x480 px
  • Color: 1280x1024 px
  • ~10 Hz
• Poses estimated using Voxel Hashing

Source: http://qianyi.info/scenedata.html
Signed Distance Fields
Volumetric 3D model representation
• Voxel grid: dense (e.g. KinectFusion) or sparse (e.g. Voxel Hashing)
• Each voxel stores:
  • Signed Distance Function (SDF): signed distance to the closest surface
  • Color values
  • Weights

“A volumetric method for building complex models from range images”, Curless and Levoy, SIGGRAPH 1996.
Signed Distance Fields
Fusion of depth maps
• Integrate depth maps into the SDF with their estimated camera poses
• Voxel updates using weighted average
• Extract iso-surface with Marching Cubes (triangle mesh)

“Marching cubes: A high resolution 3D surface construction algorithm”, Lorensen and Cline, SIGGRAPH 1987.
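The weighted-average voxel update can be sketched in a few lines; a minimal sketch of the Curless & Levoy running average, assuming a scalar distance and weight per voxel:

```python
def integrate(sdf, weight, new_d, new_w=1.0):
    """Weighted-average update of a voxel's signed distance, as in
    Curless & Levoy: D' = (W*D + w*d) / (W + w), W' = W + w."""
    sdf_new = (weight * sdf + new_w * new_d) / (weight + new_w)
    return sdf_new, weight + new_w

# A voxel observed three times with noisy distances converges
# to the (weighted) mean of its observations.
d, w = 0.0, 0.0
for obs in [0.02, 0.04, 0.03]:
    d, w = integrate(d, w, obs)
```

In a real fusion system the same update runs per voxel over the truncated distances of each incoming depth map.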
Keyframe Selection
• Compute per-frame blur score (for each color image)
  [Example: frames 81 and 84]
• Select the frame with the best score within a fixed-size window as keyframe

“The blur effect: perception and estimation with a new no-reference perceptual blur metric”, Crete et al., SPIE 2007.
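The selection step can be sketched as follows. The paper uses the Crete et al. blur metric; as a stand-in, this sketch scores sharpness with the variance of a discrete Laplacian (an assumption, not the cited metric), then keeps the sharpest frame per fixed-size window:

```python
import numpy as np

def blur_score(gray):
    """Higher = sharper. Stand-in for the Crete et al. metric:
    variance of a discrete Laplacian response (an assumption here)."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def select_keyframes(frames, window=10):
    """Pick the sharpest frame index inside each fixed-size window."""
    keyframes = []
    for start in range(0, len(frames), window):
        chunk = frames[start:start + window]
        best = max(range(len(chunk)), key=lambda i: blur_score(chunk[i]))
        keyframes.append(start + best)
    return keyframes
```

Any no-reference sharpness measure can be dropped into `blur_score`; the windowed argmax is the part the slide describes.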
Sampling / Filtering
Sampling of voxel observations
• Sample from selected keyframes only
• Collect observations for each voxel in the input views: voxel center transformed and projected into the input view
• Observation weights: view-dependent on normal and depth
• Filter observations: keep only the best 5 observations by weight
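The weighting and best-5 filtering can be sketched like this. The exact weight formula is not given on the slide; the form below (cosine of the viewing angle times a depth falloff) is an assumed stand-in for "view-dependent on normal and depth":

```python
import numpy as np

def observation_weight(normal, view_dir, depth, sigma_d=1.0):
    """View-dependent weight (an assumed form): favor views that see
    the surface head-on (n · -v) and from close range."""
    cos_angle = max(0.0, float(np.dot(normal, -view_dir)))
    return cos_angle * np.exp(-depth**2 / (2.0 * sigma_d**2))

def best_observations(observations, k=5):
    """Keep only the k observations with the highest weight."""
    return sorted(observations, key=lambda o: o["weight"], reverse=True)[:k]

# Hypothetical observations of one voxel, each tagged with its weight:
obs = [{"frame": i, "weight": w}
       for i, w in enumerate([0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.05, 0.6])]
top5 = best_observations(obs, k=5)
```

Filtering before optimization keeps the data term small and drops grazing-angle or far-away samples.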
Approach
Double-hierarchical optimization (coarse-to-fine: SDF volume / RGB-D)

Shading-based Refinement (Shape-from-Shading):
• Spatially-Varying Lighting Estimation
• Joint Appearance and Geometry Optimization (surface, albedo, image formation model)
Shape-from-Shading

• Shading equation: B(v) = a(v) · Σ_{m=1}^{9} l_m · H_m(n(v))
  (shading B, albedo a, lighting coefficients l_m, surface normal n)

• Shading-based refinement:
  • Intuition: high-frequency changes in surface geometry → shading cues in the input images
  • Estimate lighting given surface and albedo (intrinsic material properties)
  • Estimate surface and albedo given the lighting: minimize the difference between estimated shading and input luminance
Lighting Estimation
Spherical Harmonics (SH)
• Encode incident lighting for a given surface point
• Smooth for Lambertian surfaces
• SH basis functions H_m parametrized by unit normal n
• Good approximation using only 9 SH basis functions (2nd order)
• Estimate SH coefficients by fitting the estimated shading to the input luminance
• Shortcoming: purely directional → cannot represent scene lighting for all surface points simultaneously
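The 9-coefficient fit can be sketched as an ordinary least-squares problem. The basis below is one common second-order real SH convention with the normalization constants folded into the coefficients (an assumption; the paper's exact convention may differ):

```python
import numpy as np

def sh_basis(n):
    """Second-order SH basis of a unit normal n = (x, y, z).
    Normalization constants are folded into the lighting
    coefficients (one common convention; an assumption here)."""
    x, y, z = n
    return np.array([1.0, y, z, x,
                     x * y, y * z, -x * x - y * y + 2 * z * z,
                     z * x, x * x - y * y])

def estimate_global_sh(normals, albedos, luminances):
    """Least-squares fit of 9 SH coefficients l so that
    albedo * (sh_basis(n) @ l) matches the observed luminance."""
    A = np.stack([a * sh_basis(n) for n, a in zip(normals, albedos)])
    l, *_ = np.linalg.lstsq(A, np.asarray(luminances), rcond=None)
    return l
```

With the coefficients in hand, the shading of a voxel is `albedo * (sh_basis(normal) @ l)`, matching the shading equation above.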
Spatially-Varying Lighting
Subvolume Partitioning
• Partition the SDF volume into subvolumes
• Estimate independent SH coefficients for each subvolume
• Obtain per-voxel SH coefficients through tri-linear interpolation
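The per-voxel interpolation step can be sketched directly; a minimal sketch, assuming the subvolume SH coefficients are stored on a regular grid and positions are given in grid coordinates:

```python
import numpy as np

def trilinear_sh(coeff_grid, p):
    """Trilinearly interpolate per-subvolume SH coefficient vectors.
    coeff_grid: (X, Y, Z, 9) array of coefficients at subvolume centers;
    p: continuous voxel position in grid coordinates."""
    i = np.clip(np.floor(p).astype(int), 0,
                np.array(coeff_grid.shape[:3]) - 2)
    f = p - i  # fractional offset inside the cell
    c = np.zeros(coeff_grid.shape[3])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                c += w * coeff_grid[i[0] + dx, i[1] + dy, i[2] + dz]
    return c
```

Interpolating the coefficients (rather than using one set per subvolume) keeps the estimated lighting continuous across subvolume boundaries.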
Spatially-Varying Lighting
Joint Optimization
• Estimate SVSH coefficients for all subvolumes jointly:
  • Data term: similarity between estimated shading and input luminance
  • Laplacian regularizer: smooth illumination changes between neighboring subvolumes
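Structurally, the joint estimate is one stacked least-squares problem: per-subvolume data blocks plus smoothness rows coupling neighboring subvolumes. A sketch of that structure, taking the per-subvolume design matrices A_s (shading basis rows weighted by albedo) and targets b_s (luminances) as given:

```python
import numpy as np

def solve_svsh(data_A, data_b, neighbors, lam=1.0):
    """Jointly solve for per-subvolume coefficient vectors l_s minimizing
    sum_s ||A_s l_s - b_s||^2 + lam * sum_(s,t) ||l_s - l_t||^2
    (data term + pairwise smoothness, stacked into one LS system)."""
    S, k = len(data_A), data_A[0].shape[1]
    rows_A, rows_b = [], []
    for s in range(S):                       # data blocks
        block = np.zeros((data_A[s].shape[0], S * k))
        block[:, s * k:(s + 1) * k] = data_A[s]
        rows_A.append(block)
        rows_b.append(data_b[s])
    w = np.sqrt(lam)
    for s, t in neighbors:                   # smoothness rows
        block = np.zeros((k, S * k))
        block[:, s * k:(s + 1) * k] = w * np.eye(k)
        block[:, t * k:(t + 1) * k] = -w * np.eye(k)
        rows_A.append(block)
        rows_b.append(np.zeros(k))
    A, b = np.vstack(rows_A), np.concatenate(rows_b)
    l, *_ = np.linalg.lstsq(A, b, rcond=None)
    return l.reshape(S, k)
```

A full implementation would use the Laplacian over the subvolume grid and a sparse solver; the dense stack above only illustrates how data and regularizer share one system.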
Joint Optimization
Shading-based SDF optimization
• Joint optimization of geometry, albedo and image formation model (camera poses and camera intrinsics):
  • Gradient-based shading constraint (data term)
  • Volumetric regularizer: smoothness in distance values (Laplacian)
  • Surface stabilization constraint: stay close to initial distance values
  • Albedo regularizer: constrain albedo changes based on chromaticity (Laplacian)
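How the four terms combine can be sketched as a weighted sum of squared residuals. The weights below are placeholders, not the paper's values:

```python
import numpy as np

def laplacian_1d(x):
    """Discrete Laplacian, the stencil behind the volumetric
    and albedo regularizers (1D for illustration)."""
    return x[:-2] - 2.0 * x[1:-1] + x[2:]

def surface_stabilization(d, d0):
    """Residuals keeping refined distances d close to the initial d0."""
    return d - d0

def total_energy(r_shading, r_vol, r_surf, r_albedo,
                 w_vol=0.1, w_surf=0.5, w_albedo=0.05):
    """Weighted sum of the four squared-residual terms of the joint
    energy. The weights are placeholders, not the paper's values."""
    terms = {
        "shading": np.sum(np.square(r_shading)),
        "vol": w_vol * np.sum(np.square(r_vol)),
        "surf": w_surf * np.sum(np.square(r_surf)),
        "albedo": w_albedo * np.sum(np.square(r_albedo)),
    }
    return sum(terms.values()), terms
```

In the actual method this nonlinear least-squares energy is minimized coarse-to-fine over the SDF, albedo, poses, and intrinsics; the sketch only shows how the terms are balanced.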
Joint Optimization
Shading Constraint (data term)
• Idea: maximize consistency between estimated voxel shading and sampled intensities in the input luminance images (gradients for robustness)
  • Best views for each voxel and respective view-dependent weights
  • Shading: allows for optimization of surface (through the normal) and albedo
  • Voxel center transformed and projected into the input view
  • Sampling: allows for optimization of camera poses and camera intrinsics
Recolorization
Optimal colors
• Recompute voxel colors after optimization at each level
• Sampling:
  • Sample from keyframes only
  • Collect, weight and filter observations
• Weighted average of the observations
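The final per-voxel color is just the weighted mean of the filtered observations; a minimal sketch:

```python
import numpy as np

def recolorize(colors, weights):
    """Weighted average of the filtered color observations of a voxel.
    colors: (N, 3) RGB observations; weights: N observation weights."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * np.asarray(colors, dtype=float)).sum(axis=0) / w.sum()

# Two observations of one voxel, the first trusted three times as much:
c = recolorize([[255, 0, 0], [0, 0, 255]], [3.0, 1.0])
```

Because the camera poses and intrinsics were refined in the joint optimization, these reprojected samples line up better than in the initial fusion, which is where the "optimized colors as by-product" come from.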
Ground Truth: Geometry
Frog (synthetic)
[Figures: Fusion, Zollhöfer et al. ’15, Ours vs. ground truth]
Ground Truth: Quantitative Results
Frog (synthetic)
• Generated synthetic RGB-D dataset (noise on depth and camera poses)
• Quantitative surface accuracy evaluation
• Color coding: absolute distances to ground truth
[Figures: error maps for Zollhöfer et al. ’15 and Ours]

Mean absolute deviation:
• Ours: 0.222 mm (std. dev. 0.269 mm)
• Zollhöfer et al.: 0.278 mm (std. dev. 0.299 mm)
→ 20.14% more accurate
Qualitative Results
• Relief (geometry): [Figures: Input Color, Fusion, Zollhöfer et al. ’15, Ours]
• Fountain (appearance): [Figures: Input Color, Fusion, Zollhöfer et al. ’15, Ours]
• Lion: [Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
• Tomb Statuary: [Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
• Gate: [Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
• Hieroglyphics: [Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
• Bricks: [Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Shading: Global SH vs. SVSH
Fountain
[Figures: input Luminance and Albedo; Global SH shading and difference; SVSH shading and difference]
Conclusion

• High-Quality 3D Reconstruction of Geometry and Appearance
• Temporal view sampling & filtering techniques
• Spatially-Varying Lighting estimation
• Joint optimization of surface & albedo (SDF) and image formation model
• Optimized texture as by-product

Thank you! Questions?

Robert Maier
Technical University of Munich
Computer Vision Group
robert.maier@in.tum.de
https://vision.in.tum.de/members/maierr