
Lecture 1 - Introduction to Visual Geometry

DD2429

August 29, 2018


Computational Photography

General definition: the integration of digital cameras and
computers in order to create new visual information.

• Image enhancement

• Removal of artifacts

• Automatic focusing and stabilization

• Stitching images into panoramas

• Multiple-view 3D reconstruction

• Texture mapping to 3D models


Image stitching to generate panoramas

Multiple views → 3D structure

Generic problem formulation: given several images of the same
object or scene, compute a representation of its 3D shape.
Multiple Views → 3D structure + Textured 3D Model

[Figure: surface reconstruction process, (a) ⇓ (b)]

Given several images of the same object or scene, compute a
representation of its 3D shape + transfer texture.
Current view + multiple other views → Where am I?

Photo Tourism: Exploring Photo Collections
Noah Snavely, Steven M. Seitz (University of Washington), Richard Szeliski (Microsoft Research)

[Figure: (a) images, (b) cameras corresponding to the images]

Figure 1: Our system takes unstructured collections of photographs such as those from online image
collections (a) [...] and viewpoints (b) to enable novel ways of browsing the photos (c).
Given multiple images of the same scene, where was each image
taken in the 3D world relative to one another?

Basic high-level ideas behind the general approach.

Multi-view stereo: a 3D point generates a 2D point in multiple cameras.

Stereo

[Figure: scene point, image plane, optical center]

A 3D point is projected into each camera view (perspective camera).


Stereo

2D → 3D for calibrated cameras: Triangulation

Basic principle: triangulation

• Gives the reconstruction as the intersection of two rays.

• Requires
  - calibration and
  - point correspondences.

Can triangulate back to 3D if we have point correspondences and
calibrated cameras.

2D → 3D for un-calibrated cameras?

• Possible, but need multiple point matches (≥ 7 for two views).

• Can only recover cameras and 3D points up to a projective
  transformation.

• Need an extra process to determine metric information about
  the scene and camera projections.

• Usually need many more point matches than the minimal number
  to obtain an accurate 3D reconstruction, because of noise in
  the point positions.
2D → 3D for un-calibrated cameras


Usual intermediary step

• From an initial small number of point matches one can compute a
  geometric relation which, given the position of a point in image
  a, predicts its possible corresponding positions in image b.

• =⇒ Makes it easier to find and verify more matches.


High level overview of standard pipeline

1. Find corresponding points across images.

2. Use these correspondences to infer
   - the 3D position of these points, and
   - the position and orientation of each camera relative to one another.
   This process usually gives a sparse reconstruction.

3. From this sparse reconstruction, find a denser set of
   correspondences to build a dense 3D model.

Steps 1 and 2 are known as Structure-from-Motion; a code sketch of
the whole pipeline follows below.
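
To make the steps concrete, a minimal Python skeleton is sketched below. The OpenCV feature-extraction call is real; `match_features`, `structure_from_motion`, and `densify` are hypothetical placeholders for methods developed in later lectures, not actual library calls.

```python
import cv2

def reconstruction_pipeline(images):
    """Hedged sketch of the standard pipeline; the step 2 and 3 helpers
    are hypothetical placeholders, not real library calls."""
    # Step 1: find corresponding points across images, starting from
    # features detected independently in each image.
    sift = cv2.SIFT_create()
    features = [sift.detectAndCompute(img, None) for img in images]
    # matches = match_features(features)                       # hypothetical

    # Step 2 (Structure-from-Motion): infer the sparse 3D points and the
    # relative position and orientation of each camera.
    # cameras, sparse_points = structure_from_motion(matches)  # hypothetical

    # Step 3: use the sparse reconstruction to find a denser set of
    # correspondences and build a dense 3D model.
    # dense_model = densify(images, cameras, sparse_points)    # hypothetical
    return features
```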


High level overview of standard pipeline

input sequence
  → Feature Extraction / Matching → (matched) features
  → Relating Images → multi-view relations
  → Projective Reconstruction → projective 3D model
  → Self-Calibration → metric 3D model
  → Dense Matching → dense depth maps
  → 3D Model Building → textured metric 3D surface model

Figure 1.7: Overview of the presented approach for 3D modeling from images

Applications of image based geometric computing

• Visualization of objects, scenes, events.

• Motion capture: medical diagnosis, character animation.

• Immersion of 3D objects into existing video.

• Robotics: navigation for autonomous vehicles, object grasping
  and manipulation.

• People identification in forensic science.

What you will learn and do in this course.

Overview of the course

Understanding the geometrical and mathematical foundations of the
connection between the 3D world and the 2D image.

• Prerequisites: basic linear algebra and statistics.

• Good to know: multivariate analysis, optimization, matrix
  theory, computer vision, machine learning.

• Examination:
  - Panoramic image + 3D visualization lab (Pass/Fail) (3 hp)
  - Written examination (results in final grade) (3 hp)

Lab Part 1: Image stitching to panoramas (image mosaics)

A topic in the laboratory exercises


Lab Part 2: Multi-view 3D object reconstruction

But not as large as this one from ETHZ (Thomas Schöps)


Course Admin: Laboratory exercises

To pass the course you must complete the lab projects.


• The lab has 3 parts.

• There are 4 lab sessions.

• At the lab sessions you can
  - get help, and
  - demonstrate and explain your code to get your work approved.

If you complete and demonstrate your lab projects on time, you


can get bonus points that go towards the final exam.
Course Admin: Laboratory exercises

Lab timetable and deadlines for bonus points

Date         Time            Part Deadline   Bonus Points
Mon, 17 Sep  13:00 - 17:00   -               -
Mon, 24 Sep  13:00 - 17:00   1               1
Mon, 1 Oct   13:00 - 17:00   2               2
Mon, 8 Oct   13:00 - 17:00   3               1

Material you learn in this course is not purely academic
Start-up companies with former DD2429 students and
technology covered by DD2429

13th Lab (acquired by Oculus)    Tracab (acquired by ChyronHego)

Volumental Univrses
Computer vision courses at RPL/CSC

• Image analysis and computer vision (DD2423), P2, 7.5 hp
  - Image restoration, filtering,
  - feature extraction,
  - stereo matching, recognition.

• Computational Photography (DD2429), P1, 6 hp
  - Panoramic images,
  - Estimating 3D structure from images.

• Deep learning in data science (DD2424), P4, 7.5 hp
  - Neural network architectures,
  - Nitty-gritty of the maths and practicalities of training neural networks,
  - Applied to problems of recognition and classification in computer
    vision, speech and NLP.

Need image processing and geometry to find point correspondences
Capturing images
Images are just pixels
Feature extraction - potential points to match

• Independently, in each image find points that have potentially


distinctive local image structure.
• Build a descriptor of the image patch around each found point.

More on this in DD2423 Image analysis and computer vision


Point matching

Match points across images such that


• local image descriptors are similar and
• the resulting point matches are geometrically consistent

More on this in DD2423 Image analysis and computer vision; a code sketch follows below.
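
As a concrete illustration, here is a minimal OpenCV sketch of this matching step; the file names are made up, and the ratio test is only a first filter before the geometric-consistency checks mentioned above.

```python
import cv2

# A minimal sketch of descriptor-based matching with OpenCV; the file
# names "a.jpg" and "b.jpg" are made up.
img_a = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and build local descriptors (cf. feature extraction).
sift = cv2.SIFT_create()
kps_a, desc_a = sift.detectAndCompute(img_a, None)
kps_b, desc_b = sift.detectAndCompute(img_b, None)

# Match by descriptor similarity; Lowe's ratio test keeps only matches
# that are clearly better than their runner-up. Geometric consistency
# is checked in a later stage of the pipeline.
matcher = cv2.BFMatcher()
pairs = matcher.knnMatch(desc_a, desc_b, k=2)
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
print(f"{len(good)} tentative matches")
```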


Or make finding point matches easy...

• Use distinctive markers on the 3D object.


• Then easy to find and match these across multiple views.
• Example: Motion capture with optical markers.
Now to some more explicit geometrical details
2D image projections from 3D object

Parallel projection Perspective projection


Triangulation from 2D to 3D

With multiple 2D images you can reconstruct the object in 3D.


Single camera (parallel) projection equation

What is the position in 3D space?

$$\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

2 equations, 3 unknowns ⇒ multiple solutions per point.


Calibrated reconstruction from two cameras

$$\begin{pmatrix} x^1 \\ y^1 \end{pmatrix} =
\begin{pmatrix} m^1_{11} & m^1_{12} & m^1_{13} & m^1_{14} \\ m^1_{21} & m^1_{22} & m^1_{23} & m^1_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\qquad
\begin{pmatrix} x^2 \\ y^2 \end{pmatrix} =
\begin{pmatrix} m^2_{11} & m^2_{12} & m^2_{13} & m^2_{14} \\ m^2_{21} & m^2_{22} & m^2_{23} & m^2_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

4 equations, 3 unknowns ⇒ solvable, but over-determined!
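
A minimal numerical sketch of solving this over-determined system in the least-squares sense; the projection matrices and image measurements below are made-up values.

```python
import numpy as np

# A minimal sketch of least-squares triangulation for the camera model
# above; the projection matrices and image points are made-up values.
M1 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]])
M2 = np.array([[0.8, 0.0, 0.6, 0.1],
               [0.0, 1.0, 0.0, 0.0]])
x1, y1 = 0.5, 0.2   # point observed in camera 1
x2, y2 = 0.7, 0.2   # the matched point in camera 2

# Each image coordinate gives one linear equation in (X, Y, Z):
#   m_i1 X + m_i2 Y + m_i3 Z = observed - m_i4
A = np.vstack([M1[:, :3], M2[:, :3]])                       # 4 x 3
b = np.array([x1, y1, x2, y2]) - np.hstack([M1[:, 3], M2[:, 3]])

# 4 equations, 3 unknowns: solve in the least-squares sense.
XYZ = np.linalg.lstsq(A, b, rcond=None)[0]
print("triangulated point:", XYZ)
```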


Unknown camera

What is the projection matrix?

$$\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

2 equations, 8 unknowns ⇒ too many solutions.


Calibration from multiple points

$$\begin{pmatrix} x^i \\ y^i \end{pmatrix} =
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \end{pmatrix}
\begin{pmatrix} X^i \\ Y^i \\ Z^i \\ 1 \end{pmatrix}
\qquad \text{for } i = 1, 2, 3, 4$$

8 equations, 8 unknowns ⇒ solvable
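
A minimal sketch of setting up and solving these 8 equations; the correspondences are made-up values (the four 3D points must not be coplanar, or the system becomes singular).

```python
import numpy as np

# Made-up correspondences: four non-coplanar 3D points and their
# observed 2D projections.
pts3d = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 2.0],
                  [0.0, 1.0, 2.0],
                  [1.0, 1.0, 0.0]])
pts2d = np.array([[0.1, 0.0],
                  [0.6, 0.1],
                  [0.0, 0.6],
                  [0.5, 0.5]])

# Each point gives two equations in m = (m11..m14, m21..m24):
#   [X Y Z 1 0 0 0 0] m = x   and   [0 0 0 0 X Y Z 1] m = y
A = np.zeros((8, 8))
b = np.zeros(8)
for i, ((X, Y, Z), (x, y)) in enumerate(zip(pts3d, pts2d)):
    A[2 * i, 0:4] = [X, Y, Z, 1.0]
    A[2 * i + 1, 4:8] = [X, Y, Z, 1.0]
    b[2 * i], b[2 * i + 1] = x, y

M = np.linalg.solve(A, b).reshape(2, 4)   # 8 equations, 8 unknowns
print("estimated projection matrix:\n", M)
```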


Simultaneous calibration and reconstruction

What is the projection matrix and what are the positions in 3D space?

$$\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Assuming M points in N views:

⇒ 2MN equations, 3M + 8N unknowns
⇒ solvable if M and N are large enough (e.g. for N = 2 views,
2MN ≥ 3M + 8N requires M ≥ 16 points).

Reconstruction from single view

2 equations, 3 unknowns ⇒ multiple solutions per point.


Reconstruction from single view

But... this is an equally valid solution.


Reconstruction from single view

Can be avoided by constrained optimization:

$$f_1(X^1, Y^1, Z^1, X^2, Y^2, \ldots) = 0$$
$$f_2(X^1, Y^1, Z^1, X^2, Y^2, \ldots) = 0$$
$$\vdots$$

Perspective projection

Pinhole camera model:

The most basic set-up of perspective projection.

• Centre of projection at (0, 0, 0)^T.

• The image plane is the plane

$$(0, 0, 1)\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = 1$$

  (i.e. points of the form (·, ·, 1)).

• The origin of the image plane's (2D) coordinate system is (0, 0, 1)^T.

Perspective projection
A 3D point (X, Y, Z)^T is projected onto the image plane at

$$x = \frac{X}{Z} \qquad y = \frac{Y}{Z}$$

Homogeneous coordinates

• Perspective projection is non-linear:

$$x = \frac{X}{Z} \qquad y = \frac{Y}{Z}$$

• This makes things awkward...

• But we can introduce a scale factor λ such that

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

• For what value of λ do we get back the original
  projection equations?

• What's great about this equation?



Relating projections in different cameras

In the first camera:

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

In the second camera:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda' \begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}$$

How do we relate $(x', y', 1)^T$ to $(x, y, 1)^T$?

Camera rotations - single axis

Say the cameras are related by a rotation through an angle θ around the Y-axis.

In matrix notation:

$$\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix} =
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

Camera rotation - single axis

1. Rotate the camera around the Y-axis:

$$\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix} =
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

2. Project the point onto the image:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda' \begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}$$

3. Putting these together:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda'
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

Relating projections in different cameras

• Remember, in the first camera:

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \lambda^{-1} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

• Therefore:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda'
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
= \frac{\lambda'}{\lambda}
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$
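
A small numeric sanity check of this relation, with a made-up angle and scene point:

```python
import numpy as np

# Check that the two projections of the same 3D point, in cameras
# related by a Y-axis rotation, differ by R_Y up to scale.
theta = np.deg2rad(10.0)
R_Y = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                [ 0.0,           1.0, 0.0          ],
                [-np.sin(theta), 0.0, np.cos(theta)]])

P = np.array([0.3, -0.2, 2.0])        # scene point (X, Y, Z)
p = P / P[2]                          # (x, y, 1) in the first camera
P2 = R_Y @ P                          # point in the rotated camera's frame
p2 = P2 / P2[2]                       # (x', y', 1) in the second camera

q = R_Y @ p                           # transfer the first projection
print(np.allclose(q / q[2], p2))      # True: equal after rescaling
```
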
Next part of the lecture

• Image mosaics (panoramas)

• Estimating the camera rotation

• Least squares parameter estimation


Image mosaics

• The field-of-view of most cameras is limited to 30-40 degrees.

• How can we build a wide field-of-view image given one
  ordinary camera?
  1. Take multiple pictures of the scene.
  2. Each picture covers some part of the overall scene.

• What do we do with all these images??


Image mosaic from rotated cameras

[Figure: top view of image plane A and the Z-axis, with 5 points in the 3D world]

Image mosaic from rotated cameras

[Figure: top view of image plane A and the Z-axis]

• Camera A is aligned with the world coordinate axes.

• Three points are in the field of view of camera A.
• Two points are not.

Image mosaic from rotated cameras

[Figure: image plane B with axes (X', Z') added to the top view]

• Camera B is related to camera A by a rotation around the Y-axis.

• The red points are visible in camera B.

Image mosaic from rotated cameras

[Figure: image plane of camera A extended to cover camera B's view]

• Can estimate θ from points viewed in both images.

• Given θ, can transfer points in image B to image A.

Estimating the camera rotation

Know:
• Our two cameras are related by a rotation θ (unknown) around the
  Y-axis.
• The real-world point (X, Y, Z)^T maps to (x^a, y^a, 1)^T in camera A
  and (x^b, y^b, 1)^T in camera B.

Given the matched image points, what can we say about θ?

Estimating the camera rotation

Given the matched image points, what can we say about θ?

As we know, the image points are related by

$$\begin{pmatrix} x^a \\ y^a \\ 1 \end{pmatrix} = \sigma
\begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} x^b \\ y^b \\ 1 \end{pmatrix}$$

Multiplying this out we get:

$$\begin{pmatrix} x^a \\ y^a \\ 1 \end{pmatrix} =
\begin{pmatrix} \sigma(\cos\theta \, x^b - \sin\theta) \\ \sigma y^b \\ \sigma(\sin\theta \, x^b + \cos\theta) \end{pmatrix}$$

Estimating the camera rotation

Given the matched image points, what can we say about θ?

Equality holds when σ = 1/(sin θ x^b + cos θ); thus we get:

$$x^a = \frac{\cos\theta \, x^b - \sin\theta}{\sin\theta \, x^b + \cos\theta} \qquad
y^a = \frac{y^b}{\sin\theta \, x^b + \cos\theta}$$

These two equations give these constraints:

$$\sin\theta \,(x^b x^a + 1) + \cos\theta \,(x^a - x^b) = 0$$
$$\sin\theta \, x^b y^a + \cos\theta \, y^a - y^b = 0$$

Least squares parameter estimation

Say we have n point matches between the two images:

$$(x^a_i, y^a_i) \longleftrightarrow (x^b_i, y^b_i) \qquad \text{for } i = 1, \ldots, n$$

• Each pair of matched points gives these constraints on θ:

$$\sin\theta \,(x^b_i x^a_i + 1) + \cos\theta \,(x^a_i - x^b_i) = 0$$
$$\sin\theta \, x^b_i y^a_i + \cos\theta \, y^a_i - y^b_i = 0$$

• We live, however, in a world of noisy measurements....

• Hence the constraints on θ are really

$$\sin\theta \,(x^b_i x^a_i + 1) + \cos\theta \,(x^a_i - x^b_i) \approx 0$$
$$\sin\theta \, x^b_i y^a_i + \cos\theta \, y^a_i - y^b_i \approx 0$$

Introduction of notation

• Unknowns:

$$s = \sin\theta \qquad c = \cos\theta$$

• For each matched pair introduce:

$$a_i = x^b_i x^a_i + 1 \qquad b_i = x^a_i - x^b_i \qquad e_i = 0$$
$$c_i = x^b_i y^a_i \qquad d_i = y^a_i \qquad f_i = -y^b_i$$

• Soft constraints on s and c: for i = 1, . . . , n

$$s\, a_i + c\, b_i + e_i \approx 0$$
$$s\, c_i + c\, d_i + f_i \approx 0$$

Optimization problem

• Finding an s and c such that for i = 1, . . . , n:

$$s\, a_i + c\, b_i + e_i \approx 0$$
$$s\, c_i + c\, d_i + f_i \approx 0$$

• Similar to finding the s and c which minimize:

$$\min_{s,c} \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)^2 + (s\, c_i + c\, d_i + f_i)^2\right]$$

Optimization problem

$$\min_{s,c} \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)^2 + (s\, c_i + c\, d_i + f_i)^2\right]$$

• How is this least-squares optimization problem solved?

1. Take the derivatives w.r.t. s and c.
2. Set these derivatives to zero.
3. Solve the resulting set of linear equations.

Calculate the derivatives

$$C(s, c) = \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)^2 + (s\, c_i + c\, d_i + f_i)^2\right]$$

Calculate the derivatives:

$$\frac{\partial C(s,c)}{\partial s} \quad \text{and} \quad \frac{\partial C(s,c)}{\partial c}$$

Calculate the derivatives

$$\frac{1}{2}\frac{\partial C(s,c)}{\partial s}
= \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)\, a_i + (s\, c_i + c\, d_i + f_i)\, c_i\right]
= s\sum_{i=1}^{n}(a_i^2 + c_i^2) + c\sum_{i=1}^{n}(a_i b_i + c_i d_i) + \sum_{i=1}^{n}(a_i e_i + c_i f_i)$$

Calculate the derivatives

$$\frac{1}{2}\frac{\partial C(s,c)}{\partial c}
= \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)\, b_i + (s\, c_i + c\, d_i + f_i)\, d_i\right]
= s\sum_{i=1}^{n}(a_i b_i + c_i d_i) + c\sum_{i=1}^{n}(b_i^2 + d_i^2) + \sum_{i=1}^{n}(e_i b_i + d_i f_i)$$

Set derivatives to zero

$$s\sum_{i=1}^{n}(a_i^2 + c_i^2) + c\sum_{i=1}^{n}(a_i b_i + c_i d_i) + \sum_{i=1}^{n}(a_i e_i + c_i f_i) = 0$$
$$s\sum_{i=1}^{n}(a_i b_i + c_i d_i) + c\sum_{i=1}^{n}(b_i^2 + d_i^2) + \sum_{i=1}^{n}(e_i b_i + d_i f_i) = 0$$

Solve this linear set of equations

Solve this equation system:

$$s\sum_{i=1}^{n}(a_i^2 + c_i^2) + c\sum_{i=1}^{n}(a_i b_i + c_i d_i) + \sum_{i=1}^{n}(a_i e_i + c_i f_i) = 0$$
$$s\sum_{i=1}^{n}(a_i b_i + c_i d_i) + c\sum_{i=1}^{n}(b_i^2 + d_i^2) + \sum_{i=1}^{n}(e_i b_i + d_i f_i) = 0$$

Solve this linear set of equations

In matrix form the system can be written as:

$$\begin{pmatrix}
\sum_{i=1}^{n}(a_i^2 + c_i^2) & \sum_{i=1}^{n}(a_i b_i + c_i d_i) \\
\sum_{i=1}^{n}(a_i b_i + c_i d_i) & \sum_{i=1}^{n}(b_i^2 + d_i^2)
\end{pmatrix}
\begin{pmatrix} s \\ c \end{pmatrix}
+
\begin{pmatrix}
\sum_{i=1}^{n}(a_i e_i + c_i f_i) \\
\sum_{i=1}^{n}(e_i b_i + d_i f_i)
\end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

Solve this linear set of equations

Thus

$$\begin{pmatrix} s \\ c \end{pmatrix}
= -
\begin{pmatrix}
\sum_{i=1}^{n}(a_i^2 + c_i^2) & \sum_{i=1}^{n}(a_i b_i + c_i d_i) \\
\sum_{i=1}^{n}(a_i b_i + c_i d_i) & \sum_{i=1}^{n}(b_i^2 + d_i^2)
\end{pmatrix}^{-1}
\begin{pmatrix}
\sum_{i=1}^{n}(a_i e_i + c_i f_i) \\
\sum_{i=1}^{n}(e_i b_i + d_i f_i)
\end{pmatrix}$$

To summarize

• Know
  One camera is related to another by a rotation around the
  Y-axis by an unknown θ.

• Given
  n point matches between the two cameras' images:

$$(x^a_i, y^a_i) \longleftrightarrow (x^b_i, y^b_i) \qquad \text{for } i = 1, \ldots, n$$

• Estimate
  sin(θ) and cos(θ) from the correspondences by solving an
  unconstrained optimization problem - linear least squares. A code
  sketch of this estimator follows below.
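
A minimal NumPy sketch of this estimator, following the notation on the slides; `pts_a` and `pts_b` are assumed to be n×2 arrays of matched points.

```python
import numpy as np

def estimate_rotation(pts_a, pts_b):
    """Least-squares estimate of theta from matched points, following
    the slides; pts_a, pts_b are assumed (n, 2) arrays of (x_i, y_i)."""
    xa, ya = pts_a[:, 0], pts_a[:, 1]
    xb, yb = pts_b[:, 0], pts_b[:, 1]

    # Per-match coefficients: s*a_i + c*b_i + e_i ≈ 0 and
    # s*c_i + c*d_i + f_i ≈ 0, with s = sin(theta), c = cos(theta).
    a, b, e = xb * xa + 1.0, xa - xb, np.zeros_like(xa)
    cc, d, f = xb * ya, ya, -yb

    # The 2x2 normal equations derived on the previous slides.
    A = np.array([[np.sum(a * a + cc * cc), np.sum(a * b + cc * d)],
                  [np.sum(a * b + cc * d),  np.sum(b * b + d * d)]])
    rhs = -np.array([np.sum(a * e + cc * f), np.sum(e * b + d * f)])
    s, c = np.linalg.solve(A, rhs)

    # s^2 + c^2 = 1 was never enforced (see the next slide), so recover
    # the angle in a way that is invariant to the common scale.
    return np.arctan2(s, c)
```
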
Technical issue

• In this example cos(θ) and sin(θ) were treated as independent
  variables and the constraint

$$\sin^2\theta + \cos^2\theta = 1$$

  was not enforced.

• To be formally correct we should have used this constraint
  and solved a non-linear constrained minimization problem.

• However, in this case one can still get adequate results by
  ignoring this constraint.

Re-examine the image mosaic

[Figure: image plane of camera A extended to cover camera B's view]

If the fields of view of the two cameras overlap then we can:

1. Find point matches between the two images.
2. Estimate the rotation matrix.
3. Transfer points from one camera to the other.

Extended image back-projected to cylinder

The approach on the previous slide can result in a very distorted final
image, therefore...

Extended image back-projected to cylinder

[Figure 6: Cylindrical backprojection of extended image - extended image plane, image point x^a, rotation center]

The x-coordinate transformation to the cylindrical coordinate θ is

$$\tan\theta = \frac{x}{f}$$
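
A minimal sketch of this back-projection, assuming a made-up focal length f in pixel units:

```python
import numpy as np

# Back-project x-coordinates of the extended image onto a cylinder.
f = 500.0
x = np.linspace(-800.0, 800.0, 5)   # x-coordinates in the extended image

theta = np.arctan(x / f)            # cylindrical angle, from tan(theta) = x / f
x_cyl = f * theta                   # arc length on a cylinder of radius f
print(np.round(x_cyl, 1))
```
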
Extended image back-projected to cylinder

Looks better, but straight lines are no longer straight...


General camera rotation

• The camera can also be rotated around the X- and Z-axes
  through the projection center.

• Can use 3 × 3 rotation matrices to represent these:

$$R_X = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{pmatrix}$$

$$R_Y = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}$$

$$R_Z = \begin{pmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

General camera rotation

• Any rotation in 3D can be decomposed into simple rotations
  around each axis.

• A general rotation can be written as:

$$R = R_X R_Y R_Z$$

• The projection from 3D to image coordinates can then be
  written as:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda' R \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$
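
A minimal sketch composing a general rotation from the three axis rotations above; the angles are made-up values.

```python
import numpy as np

# Compose R = R_X R_Y R_Z from the axis rotations on the slides.
def R_X(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def R_Y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def R_Z(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

R = R_X(0.1) @ R_Y(0.2) @ R_Z(0.3)
print(np.allclose(R.T @ R, np.eye(3)))   # rotations are orthonormal: True
```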
