
Lecture 1 - Introduction to Visual Geometry

DD2429

August 29, 2018


Computational Photography

General definition: the integration of digital cameras and
computers in order to create new visual information.

• Image enhancement

• Removal of artifacts

• Automatic focusing and stabilization

• Stitching images into panoramas

• Multiple-view 3D reconstruction

• Texture mapping to 3D models


Image stitching to generate panoramas

Multiple views → 3D structure

Generic problem formulation: given several images of the same
object or scene, compute a representation of its 3D shape.
Multiple Views → 3D structure + Textured 3D Model

[Figure: surface reconstruction process, (a) ⇓ (b)]

Given several images of the same object or scene, compute a
representation of its 3D shape + transfer texture.
Current view + multiple other views → Where am I?

Photo Tourism: Exploring Photo Collections
Noah Snavely, Steven M. Seitz (University of Washington), Richard Szeliski (Microsoft Research)

[Figure: (a) images, (b) cameras corresponding to the images]

Figure 1: Our system takes unstructured collections of photographs such as those from online image
collections (a) [...] and viewpoints (b) to enable novel ways of browsing the photos (c).
Given multiple images of the same scene, where was each image
taken in the 3D world relative to one another?

Basic high-level ideas behind the general approach.

Multi-view stereo: a 3D point generates a 2D point in multiple cameras.

Stereo

[Figure: scene point, image plane, optical center]

A 3D point is projected into each camera view (perspective camera).


Stereo

2D → 3D for calibrated cameras: Triangulation

Basic principle: triangulation

• Gives the reconstruction as the intersection of two rays.

• Requires
  - calibration and
  - point correspondences.

Can triangulate back to 3D if we have point correspondences and
calibrated cameras.

2D → 3D for un-calibrated cameras?

• Possible, but need multiple point matches (≥ 7 for two views).

• Can only recover cameras and 3D points up to a projective
  transformation.

• Need an extra process to determine metric information about
  the scene and camera projections.

• Usually need many more point matches than the minimal number
  to obtain an accurate 3D reconstruction, because of noise in
  the point positions.
2D → 3D for un-calibrated cameras


Usual intermediary step

• From an initial small number of point matches one can compute a
  geometric relation which, given the position of a point in image
  a, predicts its possible corresponding positions in image b.

• =⇒ Makes it easier to find and verify more matches.


High level overview of standard pipeline

1. Find corresponding points across images.

2. Use these correspondences to infer
   - the 3D position of these points, and
   - the position and orientation of each camera relative to one another.
   This process usually gives a sparse reconstruction.

3. From this sparse reconstruction, find a denser set of
   correspondences to build a dense 3D model.

Steps 1 and 2 are known as Structure-from-Motion; a code sketch of
the whole pipeline follows below.
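
To make the steps concrete, a minimal Python skeleton is sketched below. The OpenCV feature-extraction call is real; `match_features`, `structure_from_motion`, and `densify` are hypothetical placeholders for methods developed in later lectures, not actual library calls.

```python
import cv2

def reconstruction_pipeline(images):
    """Hedged sketch of the standard pipeline; the step 2 and 3 helpers
    are hypothetical placeholders, not real library calls."""
    # Step 1: find corresponding points across images, starting from
    # features detected independently in each image.
    sift = cv2.SIFT_create()
    features = [sift.detectAndCompute(img, None) for img in images]
    # matches = match_features(features)                       # hypothetical

    # Step 2 (Structure-from-Motion): infer the sparse 3D points and the
    # relative position and orientation of each camera.
    # cameras, sparse_points = structure_from_motion(matches)  # hypothetical

    # Step 3: use the sparse reconstruction to find a denser set of
    # correspondences and build a dense 3D model.
    # dense_model = densify(images, cameras, sparse_points)    # hypothetical
    return features
```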


High level overview of standard pipeline

input sequence
  → Feature Extraction / Matching → (matched) features
  → Relating Images → multi-view relations
  → Projective Reconstruction → projective 3D model
  → Self-Calibration → metric 3D model
  → Dense Matching → dense depth maps
  → 3D Model Building → textured metric 3D surface model

Figure 1.7: Overview of the presented approach for 3D modeling from images

Applications of image based geometric computing

• Visualization of objects, scenes, events.

• Motion capture: medical diagnosis, character animation.

• Immersion of 3D objects into existing video.

• Robotics: navigation for autonomous vehicles, object grasping
  and manipulation.

• People identification in forensic science.

What you will learn and do in this course.

Overview of the course

Understanding the geometrical and mathematical foundations of the
connection between the 3D world and the 2D image.

• Prerequisites: basic linear algebra and statistics.

• Good to know: multivariate analysis, optimization, matrix
  theory, computer vision, machine learning.

• Examination:
  - Panoramic image + 3D visualization lab (Pass/Fail) (3 hp)
  - Written examination (results in final grade) (3 hp)

Lab Part 1: Image stitching to panoramas (image mosaics)

A topic in the laboratory exercises


Lab Part 2: Multi-view 3D object reconstruction

But not as large as this one from ETHZ (Thomas Schöps)


Course Admin: Laboratory exercises

To pass the course you must complete the lab projects.


• The lab has 3 parts.

• There are 4 lab sessions.

• At the lab sessions you can
  - get help, and
  - demonstrate and explain your code to get your work approved.

If you complete and demonstrate your lab projects on time, you


can get bonus points that go towards the final exam.
Course Admin: Laboratory exercises

Lab timetable and deadlines for bonus points

Date         Time            Part Deadline   Bonus Points
Mon, 17 Sep  13:00 - 17:00   -               -
Mon, 24 Sep  13:00 - 17:00   1               1
Mon, 1 Oct   13:00 - 17:00   2               2
Mon, 8 Oct   13:00 - 17:00   3               1

Material you learn in this course is not purely academic
Start-up companies with former DD2429 students and
technology covered by DD2429

13th Lab (acquired by Oculus)    Tracab (acquired by ChyronHego)

Volumental Univrses
Computer vision courses at RPL/CSC

• Image analysis and computer vision (DD2423), P2, 7.5 hp
  - Image restoration, filtering,
  - feature extraction,
  - stereo matching, recognition.

• Computational Photography (DD2429), P1, 6 hp
  - Panoramic images,
  - Estimating 3D structure from images.

• Deep learning in data science (DD2424), P4, 7.5 hp
  - Neural network architectures,
  - Nitty-gritty of the maths and practicalities of training neural networks,
  - Applied to problems of recognition and classification in computer
    vision, speech and NLP.

Need image processing and geometry to find point correspondences
Capturing images
Images are just pixels
Feature extraction - potential points to match

• Independently, in each image find points that have potentially


distinctive local image structure.
• Build a descriptor of the image patch around each found point.

More on this in DD2423 Image analysis and computer vision


Point matching

Match points across images such that


• local image descriptors are similar and
• the resulting point matches are geometrically consistent

More on this in DD2423 Image analysis and computer vision; a code sketch follows below.
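
As a concrete illustration, here is a minimal OpenCV sketch of this matching step; the file names are made up, and the ratio test is only a first filter before the geometric-consistency checks mentioned above.

```python
import cv2

# A minimal sketch of descriptor-based matching with OpenCV; the file
# names "a.jpg" and "b.jpg" are made up.
img_a = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and build local descriptors (cf. feature extraction).
sift = cv2.SIFT_create()
kps_a, desc_a = sift.detectAndCompute(img_a, None)
kps_b, desc_b = sift.detectAndCompute(img_b, None)

# Match by descriptor similarity; Lowe's ratio test keeps only matches
# that are clearly better than their runner-up. Geometric consistency
# is checked in a later stage of the pipeline.
matcher = cv2.BFMatcher()
pairs = matcher.knnMatch(desc_a, desc_b, k=2)
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
print(f"{len(good)} tentative matches")
```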


Or make finding point matches easy...

• Use distinctive markers on the 3D object.


• Then easy to find and match these across multiple views.
• Example: Motion capture with optical markers.
Now to some more explicit geometrical details
2D image projections from 3D object

Parallel projection Perspective projection


Triangulation from 2D to 3D

With multiple 2D images you can reconstruct the object in 3D.


Single camera (parallel) projection equation

What is the position in 3D space?

$$\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

2 equations, 3 unknowns ⇒ multiple solutions per point.


Calibrated reconstruction from two cameras

$$\begin{pmatrix} x^1 \\ y^1 \end{pmatrix} =
\begin{pmatrix} m^1_{11} & m^1_{12} & m^1_{13} & m^1_{14} \\ m^1_{21} & m^1_{22} & m^1_{23} & m^1_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\qquad
\begin{pmatrix} x^2 \\ y^2 \end{pmatrix} =
\begin{pmatrix} m^2_{11} & m^2_{12} & m^2_{13} & m^2_{14} \\ m^2_{21} & m^2_{22} & m^2_{23} & m^2_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

4 equations, 3 unknowns ⇒ solvable, but over-determined!
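
A minimal numerical sketch of solving this over-determined system in the least-squares sense; the projection matrices and image measurements below are made-up values.

```python
import numpy as np

# A minimal sketch of least-squares triangulation for the camera model
# above; the projection matrices and image points are made-up values.
M1 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]])
M2 = np.array([[0.8, 0.0, 0.6, 0.1],
               [0.0, 1.0, 0.0, 0.0]])
x1, y1 = 0.5, 0.2   # point observed in camera 1
x2, y2 = 0.7, 0.2   # the matched point in camera 2

# Each image coordinate gives one linear equation in (X, Y, Z):
#   m_i1 X + m_i2 Y + m_i3 Z = observed - m_i4
A = np.vstack([M1[:, :3], M2[:, :3]])                       # 4 x 3
b = np.array([x1, y1, x2, y2]) - np.hstack([M1[:, 3], M2[:, 3]])

# 4 equations, 3 unknowns: solve in the least-squares sense.
XYZ = np.linalg.lstsq(A, b, rcond=None)[0]
print("triangulated point:", XYZ)
```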


Unknown camera

What is the projection matrix?

$$\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

2 equations, 8 unknowns ⇒ too many solutions.


Calibration from multiple points

$$\begin{pmatrix} x^i \\ y^i \end{pmatrix} =
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \end{pmatrix}
\begin{pmatrix} X^i \\ Y^i \\ Z^i \\ 1 \end{pmatrix}
\qquad \text{for } i = 1, 2, 3, 4$$

8 equations, 8 unknowns ⇒ solvable
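
A minimal sketch of setting up and solving these 8 equations; the correspondences are made-up values (the four 3D points must not be coplanar, or the system becomes singular).

```python
import numpy as np

# Made-up correspondences: four non-coplanar 3D points and their
# observed 2D projections.
pts3d = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 2.0],
                  [0.0, 1.0, 2.0],
                  [1.0, 1.0, 0.0]])
pts2d = np.array([[0.1, 0.0],
                  [0.6, 0.1],
                  [0.0, 0.6],
                  [0.5, 0.5]])

# Each point gives two equations in m = (m11..m14, m21..m24):
#   [X Y Z 1 0 0 0 0] m = x   and   [0 0 0 0 X Y Z 1] m = y
A = np.zeros((8, 8))
b = np.zeros(8)
for i, ((X, Y, Z), (x, y)) in enumerate(zip(pts3d, pts2d)):
    A[2 * i, 0:4] = [X, Y, Z, 1.0]
    A[2 * i + 1, 4:8] = [X, Y, Z, 1.0]
    b[2 * i], b[2 * i + 1] = x, y

M = np.linalg.solve(A, b).reshape(2, 4)   # 8 equations, 8 unknowns
print("estimated projection matrix:\n", M)
```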


Simultaneous calibration and reconstruction

What is the projection matrix and what are the positions in 3D space?

$$\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Assuming M points in N views:

⇒ 2MN equations, 3M + 8N unknowns
⇒ solvable if M and N are large enough (e.g. for N = 2 views,
2MN ≥ 3M + 8N requires M ≥ 16 points).

Reconstruction from single view

2 equations, 3 unknowns ⇒ multiple solutions per point.


Reconstruction from single view

But... this is an equally valid solution.


Reconstruction from single view

Can be avoided by constrained optimization:

$$f_1(X^1, Y^1, Z^1, X^2, Y^2, \ldots) = 0$$
$$f_2(X^1, Y^1, Z^1, X^2, Y^2, \ldots) = 0$$
$$\vdots$$

Perspective projection

Pinhole camera model:

The most basic set-up of perspective projection.

• Centre of projection at (0, 0, 0)^T.

• The image plane is the plane

$$(0, 0, 1)\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = 1$$

  (i.e. points of the form (·, ·, 1)).

• The origin of the image plane's (2D) coordinate system is (0, 0, 1)^T.

Perspective projection
A 3D point (X, Y, Z)^T is projected onto the image plane at

$$x = \frac{X}{Z} \qquad y = \frac{Y}{Z}$$

Homogeneous coordinates

• Perspective projection is non-linear:

$$x = \frac{X}{Z} \qquad y = \frac{Y}{Z}$$

• This makes things awkward...

• But we can introduce a scale factor λ such that

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

• For what value of λ do we get back the original
  projection equations?

• What's great about this equation?



Relating projections in different cameras

In the first camera:

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

In the second camera:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda' \begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}$$

How do we relate $(x', y', 1)^T$ to $(x, y, 1)^T$?

Camera rotations - single axis

Say the cameras are related by a rotation through an angle θ around the Y-axis.

In matrix notation:

$$\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix} =
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

Camera rotation - single axis

1. Rotate the camera around the Y-axis:

$$\begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix} =
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

2. Project the point onto the image:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda' \begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix}$$

3. Putting these together:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda'
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

Relating projections in different cameras

• Remember, in the first camera:

$$\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \lambda \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \lambda^{-1} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

• Therefore:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda'
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
= \frac{\lambda'}{\lambda}
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$
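
A small numeric sanity check of this relation, with a made-up angle and scene point:

```python
import numpy as np

# Check that the two projections of the same 3D point, in cameras
# related by a Y-axis rotation, differ by R_Y up to scale.
theta = np.deg2rad(10.0)
R_Y = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                [ 0.0,           1.0, 0.0          ],
                [-np.sin(theta), 0.0, np.cos(theta)]])

P = np.array([0.3, -0.2, 2.0])        # scene point (X, Y, Z)
p = P / P[2]                          # (x, y, 1) in the first camera
P2 = R_Y @ P                          # point in the rotated camera's frame
p2 = P2 / P2[2]                       # (x', y', 1) in the second camera

q = R_Y @ p                           # transfer the first projection
print(np.allclose(q / q[2], p2))      # True: equal after rescaling
```
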
Next part of the lecture

• Image mosaics (panoramas)

• Estimating the camera rotation

• Least squares parameter estimation


Image mosaics

• The field-of-view of most cameras is limited to 30-40 degrees.

• How can we build a wide field-of-view image given one
  ordinary camera?
  1. Take multiple pictures of the scene.
  2. Each picture covers some part of the overall scene.

• What do we do with all these images??


Image mosaic from rotated cameras

[Figure: top view of image plane A and the Z-axis, with 5 points in the 3D world]

Image mosaic from rotated cameras

[Figure: top view of image plane A and the Z-axis]

• Camera A is aligned with the world coordinate axes.

• Three points are in the field of view of camera A.
• Two points are not.

Image mosaic from rotated cameras

[Figure: image plane B with axes (X', Z') added to the top view]

• Camera B is related to camera A by a rotation around the Y-axis.

• The red points are visible in camera B.

Image mosaic from rotated cameras

[Figure: image plane of camera A extended to cover camera B's view]

• Can estimate θ from points viewed in both images.

• Given θ, can transfer points in image B to image A.

Estimating the camera rotation

Know:
• Our two cameras are related by a rotation θ (unknown) around the
  Y-axis.
• The real-world point (X, Y, Z)^T maps to (x^a, y^a, 1)^T in camera A
  and (x^b, y^b, 1)^T in camera B.

Given the matched image points, what can we say about θ?

Estimating the camera rotation

Given the matched image points, what can we say about θ?

As we know, the image points are related by

$$\begin{pmatrix} x^a \\ y^a \\ 1 \end{pmatrix} = \sigma
\begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} x^b \\ y^b \\ 1 \end{pmatrix}$$

Multiplying this out we get:

$$\begin{pmatrix} x^a \\ y^a \\ 1 \end{pmatrix} =
\begin{pmatrix} \sigma(\cos\theta \, x^b - \sin\theta) \\ \sigma y^b \\ \sigma(\sin\theta \, x^b + \cos\theta) \end{pmatrix}$$

Estimating the camera rotation

Given the matched image points, what can we say about θ?

Equality holds when σ = 1/(sin θ x^b + cos θ); thus we get:

$$x^a = \frac{\cos\theta \, x^b - \sin\theta}{\sin\theta \, x^b + \cos\theta} \qquad
y^a = \frac{y^b}{\sin\theta \, x^b + \cos\theta}$$

These two equations give these constraints:

$$\sin\theta \,(x^b x^a + 1) + \cos\theta \,(x^a - x^b) = 0$$
$$\sin\theta \, x^b y^a + \cos\theta \, y^a - y^b = 0$$

Least squares parameter estimation

Say we have n point matches between the two images:

$$(x^a_i, y^a_i) \longleftrightarrow (x^b_i, y^b_i) \qquad \text{for } i = 1, \ldots, n$$

• Each pair of matched points gives these constraints on θ:

$$\sin\theta \,(x^b_i x^a_i + 1) + \cos\theta \,(x^a_i - x^b_i) = 0$$
$$\sin\theta \, x^b_i y^a_i + \cos\theta \, y^a_i - y^b_i = 0$$

• We live, however, in a world of noisy measurements....

• Hence the constraints on θ are really

$$\sin\theta \,(x^b_i x^a_i + 1) + \cos\theta \,(x^a_i - x^b_i) \approx 0$$
$$\sin\theta \, x^b_i y^a_i + \cos\theta \, y^a_i - y^b_i \approx 0$$

Introduction of notation

• Unknowns:

$$s = \sin\theta \qquad c = \cos\theta$$

• For each matched pair introduce:

$$a_i = x^b_i x^a_i + 1 \qquad b_i = x^a_i - x^b_i \qquad e_i = 0$$
$$c_i = x^b_i y^a_i \qquad d_i = y^a_i \qquad f_i = -y^b_i$$

• Soft constraints on s and c: for i = 1, . . . , n

$$s\, a_i + c\, b_i + e_i \approx 0$$
$$s\, c_i + c\, d_i + f_i \approx 0$$

Optimization problem

• Finding an s and c such that for i = 1, . . . , n:

$$s\, a_i + c\, b_i + e_i \approx 0$$
$$s\, c_i + c\, d_i + f_i \approx 0$$

• Similar to finding the s and c which minimize:

$$\min_{s,c} \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)^2 + (s\, c_i + c\, d_i + f_i)^2\right]$$

Optimization problem

$$\min_{s,c} \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)^2 + (s\, c_i + c\, d_i + f_i)^2\right]$$

• How is this least-squares optimization problem solved?

1. Take the derivatives w.r.t. s and c.
2. Set these derivatives to zero.
3. Solve the resulting set of linear equations.

Calculate the derivatives

$$C(s, c) = \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)^2 + (s\, c_i + c\, d_i + f_i)^2\right]$$

Calculate the derivatives:

$$\frac{\partial C(s,c)}{\partial s} \quad \text{and} \quad \frac{\partial C(s,c)}{\partial c}$$

Calculate the derivatives

$$\frac{1}{2}\frac{\partial C(s,c)}{\partial s}
= \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)\, a_i + (s\, c_i + c\, d_i + f_i)\, c_i\right]
= s\sum_{i=1}^{n}(a_i^2 + c_i^2) + c\sum_{i=1}^{n}(a_i b_i + c_i d_i) + \sum_{i=1}^{n}(a_i e_i + c_i f_i)$$

Calculate the derivatives

$$\frac{1}{2}\frac{\partial C(s,c)}{\partial c}
= \sum_{i=1}^{n}\left[(s\, a_i + c\, b_i + e_i)\, b_i + (s\, c_i + c\, d_i + f_i)\, d_i\right]
= s\sum_{i=1}^{n}(a_i b_i + c_i d_i) + c\sum_{i=1}^{n}(b_i^2 + d_i^2) + \sum_{i=1}^{n}(e_i b_i + d_i f_i)$$

Set derivatives to zero

$$s\sum_{i=1}^{n}(a_i^2 + c_i^2) + c\sum_{i=1}^{n}(a_i b_i + c_i d_i) + \sum_{i=1}^{n}(a_i e_i + c_i f_i) = 0$$
$$s\sum_{i=1}^{n}(a_i b_i + c_i d_i) + c\sum_{i=1}^{n}(b_i^2 + d_i^2) + \sum_{i=1}^{n}(e_i b_i + d_i f_i) = 0$$

Solve this linear set of equations

Solve this equation system:

$$s\sum_{i=1}^{n}(a_i^2 + c_i^2) + c\sum_{i=1}^{n}(a_i b_i + c_i d_i) + \sum_{i=1}^{n}(a_i e_i + c_i f_i) = 0$$
$$s\sum_{i=1}^{n}(a_i b_i + c_i d_i) + c\sum_{i=1}^{n}(b_i^2 + d_i^2) + \sum_{i=1}^{n}(e_i b_i + d_i f_i) = 0$$

Solve this linear set of equations

In matrix form the system can be written as:

$$\begin{pmatrix}
\sum_{i=1}^{n}(a_i^2 + c_i^2) & \sum_{i=1}^{n}(a_i b_i + c_i d_i) \\
\sum_{i=1}^{n}(a_i b_i + c_i d_i) & \sum_{i=1}^{n}(b_i^2 + d_i^2)
\end{pmatrix}
\begin{pmatrix} s \\ c \end{pmatrix}
+
\begin{pmatrix}
\sum_{i=1}^{n}(a_i e_i + c_i f_i) \\
\sum_{i=1}^{n}(e_i b_i + d_i f_i)
\end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

Solve this linear set of equations

Thus

$$\begin{pmatrix} s \\ c \end{pmatrix}
= -
\begin{pmatrix}
\sum_{i=1}^{n}(a_i^2 + c_i^2) & \sum_{i=1}^{n}(a_i b_i + c_i d_i) \\
\sum_{i=1}^{n}(a_i b_i + c_i d_i) & \sum_{i=1}^{n}(b_i^2 + d_i^2)
\end{pmatrix}^{-1}
\begin{pmatrix}
\sum_{i=1}^{n}(a_i e_i + c_i f_i) \\
\sum_{i=1}^{n}(e_i b_i + d_i f_i)
\end{pmatrix}$$

To summarize

• Know
  One camera is related to another by a rotation around the
  Y-axis by an unknown θ.

• Given
  n point matches between the two cameras' images:

$$(x^a_i, y^a_i) \longleftrightarrow (x^b_i, y^b_i) \qquad \text{for } i = 1, \ldots, n$$

• Estimate
  sin(θ) and cos(θ) from the correspondences by solving an
  unconstrained optimization problem - linear least squares. A code
  sketch of this estimator follows below.
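
A minimal NumPy sketch of this estimator, following the notation on the slides; `pts_a` and `pts_b` are assumed to be n×2 arrays of matched points.

```python
import numpy as np

def estimate_rotation(pts_a, pts_b):
    """Least-squares estimate of theta from matched points, following
    the slides; pts_a, pts_b are assumed (n, 2) arrays of (x_i, y_i)."""
    xa, ya = pts_a[:, 0], pts_a[:, 1]
    xb, yb = pts_b[:, 0], pts_b[:, 1]

    # Per-match coefficients: s*a_i + c*b_i + e_i ≈ 0 and
    # s*c_i + c*d_i + f_i ≈ 0, with s = sin(theta), c = cos(theta).
    a, b, e = xb * xa + 1.0, xa - xb, np.zeros_like(xa)
    cc, d, f = xb * ya, ya, -yb

    # The 2x2 normal equations derived on the previous slides.
    A = np.array([[np.sum(a * a + cc * cc), np.sum(a * b + cc * d)],
                  [np.sum(a * b + cc * d),  np.sum(b * b + d * d)]])
    rhs = -np.array([np.sum(a * e + cc * f), np.sum(e * b + d * f)])
    s, c = np.linalg.solve(A, rhs)

    # s^2 + c^2 = 1 was never enforced (see the next slide), so recover
    # the angle in a way that is invariant to the common scale.
    return np.arctan2(s, c)
```
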
Technical issue

• In this example cos(θ) and sin(θ) were treated as independent
  variables and the constraint

$$\sin^2\theta + \cos^2\theta = 1$$

  was not enforced.

• To be formally correct we should have used this constraint
  and solved a non-linear constrained minimization problem.

• However, in this case one can still get adequate results by
  ignoring this constraint.

Re-examine the image mosaic

[Figure: image plane of camera A extended to cover camera B's view]

If the fields of view of the two cameras overlap then we can:

1. Find point matches between the two images.
2. Estimate the rotation matrix.
3. Transfer points from one camera to the other.

Extended image back-projected to cylinder

The approach on the previous slide can result in a very distorted final
image, therefore...

Extended image back-projected to cylinder

[Figure 6: Cylindrical backprojection of extended image - extended image plane, image point x^a, rotation center]

The x-coordinate transformation to the cylindrical coordinate θ is

$$\tan\theta = \frac{x}{f}$$
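
A minimal sketch of this back-projection, assuming a made-up focal length f in pixel units:

```python
import numpy as np

# Back-project x-coordinates of the extended image onto a cylinder.
f = 500.0
x = np.linspace(-800.0, 800.0, 5)   # x-coordinates in the extended image

theta = np.arctan(x / f)            # cylindrical angle, from tan(theta) = x / f
x_cyl = f * theta                   # arc length on a cylinder of radius f
print(np.round(x_cyl, 1))
```
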
Extended image back-projected to cylinder

Looks better, but straight lines are no longer straight...


General camera rotation

• The camera can also be rotated around the X- and Z-axes
  through the projection center.

• Can use 3 × 3 rotation matrices to represent these:

$$R_X = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{pmatrix}$$

$$R_Y = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}$$

$$R_Z = \begin{pmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

General camera rotation

• Any rotation in 3D can be decomposed into simple rotations
  around each axis.

• A general rotation can be written as:

$$R = R_X R_Y R_Z$$

• The projection from 3D to image coordinates can then be
  written as:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \lambda' R \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$
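
A minimal sketch composing a general rotation from the three axis rotations above; the angles are made-up values.

```python
import numpy as np

# Compose R = R_X R_Y R_Z from the axis rotations on the slides.
def R_X(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def R_Y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def R_Z(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

R = R_X(0.1) @ R_Y(0.2) @ R_Z(0.3)
print(np.allclose(R.T @ R, np.eye(3)))   # rotations are orthonormal: True
```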
