Derek Hoiem
July 10, 2007
Robotics Institute
Carnegie Mellon University
Thesis Committee: Alexei A. Efros, Martial Hebert,
Rahul Sukthankar, Takeo Kanade, William Freeman
Scene Understanding
The World Behind the Image
3D Spatial Layout
[Figure: image regions labeled with surface classes SKY, VERTICAL, and SUPPORT]
• Description of 3D Surfaces
• Occlusion Relationships
• Camera Viewpoint & Objects
Recent Work in 3D
…
Our World is Structured
…
Infer Most Likely Scene
Unlikely Likely
Description of 3D Surfaces
• Vertical
– Planar: facing Left, Center, Right
• Sky
Use All Available Cues
Texture gradient
Get Good Spatial Support
…
Labeling Segments
Labeled Pixels
Very High
[Figure: decision tree with tests "Blue?" and "Vanishing Point?", each with Yes/No branches]
Avg. Accuracy
Main Class: 88%
Subclass: 62%
Avg. Accuracy
Main Class: 93%
Subclass: 76%
Guzman 1968
Figure/Ground Accuracy
Shapemes + CRF
Pb Boundaries 68.9%
Human Boundaries 78.3%
R1
R2
Occlusion
Initial Segmentation boundary?
2D Cues for Occlusions
[Figure: Illustration of Depth Range, with panels "Depth Underestimate" and "Depth Overestimate"]
[Figure: surface labels (SKY, SUPPORT), P(occlusion) maps, and Depth (Min)]
• Training: 50 images
• Testing: 250 images (50 quantitative)
Occlusion vs. Non-Occlusion
Foreground/Background Accuracy (Ours)

          Edge/Region Cues   + 3D Cues   With CRF
Stage 1   58.7%              71.7%       -
Stage 2   65.4%              75.6%       77.3%
Stage 3   68.2%              77.1%       79.9%
[Figures: Depth (Min) estimates]
Objects
Viewpoint 3D Surfaces
Results of a 2D Pedestrian Detector
[Figure: detector output marked with true detections, false detections, and missed pedestrians]
Detector from [Dalal Triggs 2005]
2D Contextual Reasoning
[Figure: regions labeled "Close" and "Not Close"]
Camera Viewpoint
Image
3D Object Heights
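The link between camera viewpoint, image measurements, and 3D object heights can be made concrete with a small worked sketch. It assumes a simplified setup that the slides do not spell out: the camera sits at a known height above a flat ground plane with negligible tilt, the object rests on the ground, and image rows are measured upward from the bottom of the image. Under those assumptions, an object's world height is roughly the camera height scaled by the ratio of the object's image height to the distance between its base and the horizon. The function and parameter names are hypothetical.

```python
def object_height_m(v_top, v_bottom, v_horizon, camera_height_m):
    """Approximate world height (meters) of an object resting on the ground.

    Assumptions (not taken from the slides): flat ground plane, negligible
    camera tilt, and image rows measured upward from the image bottom.
    """
    if v_horizon <= v_bottom:
        raise ValueError("the object's base must lie below the horizon")
    return camera_height_m * (v_top - v_bottom) / (v_horizon - v_bottom)


def expected_image_height(world_height_m, v_bottom, v_horizon, camera_height_m):
    """Inverse relation: image height (rows) expected for a given world height."""
    if v_horizon <= v_bottom:
        raise ValueError("the object's base must lie below the horizon")
    return world_height_m * (v_horizon - v_bottom) / camera_height_m


# An object whose top touches the horizon is as tall as the camera is high.
same_as_camera = object_height_m(v_top=300, v_bottom=100, v_horizon=300,
                                 camera_height_m=1.6)
```

Read in the other direction, the same relation says how tall a detection should appear at a given image position, which is what allows a viewpoint estimate to make a candidate detection more or less plausible.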
Viewpoint from Scene Matching
Input Image
+
…
What do surfaces and viewpoint say about objects?
...
[Figure: graphical model over object variables o1, …, on and surface variables s1, …, sn]
Improved Viewpoint Estimate
[Figure: likelihood over camera height and horizon position, shown for the initial and final viewpoint estimates]
Improved Object Estimate
                 Initial (Local)   Final (Global)
Car Detection    4 TP / 2 FP       4 TP / 1 FP
Ped Detection    3 TP / 2 FP       4 TP / 0 FP
Experiments on LabelMe Dataset
[Figure: two detection-rate plots; legend entries ending in "Only"; annotation "90% Bound"]
More is Better
Initial: 6 TP / 1 FP Final: 9 TP / 0 FP
Results
Car: TP / FP Ped: TP / FP
Initial: 3 TP / 3 FP Final: 5 TP / 1 FP
Putting Objects in Perspective
[Figure: pedestrian (Ped) and car detections plotted on axes in meters]
Geometrically Coherent Image Interpretation
[Diagram: Surfaces, Occlusions, and Viewpoint/Size Reasoning, with links labeled "Surface Maps" and "Support"]
[Diagram: Surfaces, Occlusions, and Viewpoint/Size Reasoning, with links labeled "Surface Maps"; "Depth, Boundaries"; "Support"; "Horizon, Object Maps"; and "Boundaries"]
Input Surfaces
Acknowledgements
• Committee: Alyosha, Martial, Rahul, Takeo, and Bill
• Practice Presentation: Srinivas, Tom, Alex
Vision as Scene Understanding
Initial: 1 TP / 23 FP Final: 0 TP / 10 FP
Initial: 1 TP / 5 FP Final: 5 TP / 2 FP
How do we get robust scene priors?
Input Image → Multiple Segmentations → Surface Estimates → Final Labels (using Learned Models)
Estimating surface properties
• We want to know:
– Is a segment good?
P(good segment | data)
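One way P(good segment | data) can be used is to weight each segment's label prediction by how likely that segment is to be good, then sum over the segments (one per segmentation) that contain a given pixel. The sketch below assumes this weighting scheme; the slide states only the quantity itself. The classifier outputs are taken as given, and the function name, data layout, and example numbers are hypothetical.

```python
def pixel_label_confidence(segments):
    """Combine label estimates from several segments covering one pixel.

    segments: list of (p_good, label_probs) pairs, one per segmentation, where
      p_good      ~ P(good segment | data)
      label_probs ~ dict mapping label -> P(label | good segment, data)
    Returns a dict of normalized label confidences for the pixel.
    Assumption: each segment's vote is weighted by its p_good and the
    weighted votes are summed, then normalized to sum to 1.
    """
    scores = {}
    for p_good, label_probs in segments:
        for label, p_label in label_probs.items():
            scores[label] = scores.get(label, 0.0) + p_good * p_label
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no segment gives this pixel any support")
    return {label: score / total for label, score in scores.items()}


# Illustrative numbers only: a confident segment and a doubtful one.
confidence = pixel_label_confidence([
    (0.9, {"support": 0.8, "vertical": 0.2, "sky": 0.0}),
    (0.3, {"support": 0.1, "vertical": 0.7, "sky": 0.2}),
])
best_label = max(confidence, key=confidence.get)
```

With this weighting, a segment judged unlikely to be good has little influence on the final label, so a poor segmentation does less damage than it would if every segmentation counted equally.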
Before After
Object Pasting
Before After
Are Surfaces Enough?