Perception II:
Fundamentals of Computer Vision
Autonomous Mobile Robots
Margarita Chli
Roland Siegwart and Nick Lawrance
www.v4rl.ethz.ch
[Figure: the see-think-act cycle — Sensing (raw data) → Information Extraction (Perception) → environment model / local map → path → Path Execution / Motion Control → actuator commands → Acting on the Real World Environment]
Today’s Topics
§ Perspective Projection
§ Camera Calibration
Vision
§ Vision is our most powerful sense in aiding our perception of the 3D world around us.
→ a large proportion of our brain power is dedicated to processing the signals from our eyes
Cameras:
§ Vision is increasingly popular as a sensing modality:
§ descriptive
§ compactness, compatibility, …
§ low cost
§ HW advances necessary to support the processing of images
[Figure: scene-understanding example — an outdoor image with regions labeled sky, roof, tower, building, window, door, people, plant, car, shade, ground]
§ Partial 3D model from overlapping images (Enqvist et al., Tech Rep 2011)
§ Dense 3D surface model from more overlapping images (Goesele et al., ICCV 2007)
§ Face recognition (www.dailymail.co.uk)
[Figure: light from an object falling onto a photoreceptive surface, without and with a pinhole barrier in between]
§ Image is inverted
§ Depth of the room (box) is the effective focal length
§ Pinhole model:
§ Captures beam of rays – all rays through a single point (note: no lens!)
§ The point is called Center of Projection or Optical Center
§ The image is formed on the Image Plane
§ We will use the pinhole camera model to describe how the image is formed
Based on slide by Steve Seitz
What can we do to reduce the blur? (www.debevec.org/Pinhole/)
§ Making the pinhole bigger (i.e. aperture) makes the image blurry
[Figure: converging Lens imaging an object — Optical Center, Optical Axis, Focal Point, Focal Length f]
[Figure: thin-lens imaging — object of height A at distance z, image of height B formed at distance e behind the lens, focal length f, Focal Point]
§ Similar triangles: $\dfrac{B}{A} = \dfrac{e}{z}$
§ What happens if z is very big, i.e. z >> f ?
§ We need to adjust the image plane s.t. objects at infinity are in focus
$\dfrac{1}{f} = \dfrac{1}{z} + \dfrac{1}{e}$
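A quick numeric check of this thin-lens relation (a minimal sketch in Python; the focal length and object distances are made-up illustrative values, not lecture data):

```python
# Thin-lens relation: 1/f = 1/z + 1/e  =>  e = 1 / (1/f - 1/z)
f = 0.05  # focal length [m] (illustrative, e.g. a 50 mm lens)

for z in [0.5, 2.0, 100.0]:          # object distances [m]
    e = 1.0 / (1.0 / f - 1.0 / z)    # image distance [m]
    print(f"z = {z:6.1f} m  ->  e = {e * 1000:.2f} mm")
# As z grows, 1/z -> 0 and e -> f: objects at infinity are in focus at e = f.
```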
[Figure: object of height h at distance z imaged to height h' at distance e behind the Optical Center (Center of Projection) C; Focal Point at distance f; Image plane]
§ What happens if z is very big, i.e. z >> f ?
§ We adjust the image plane s.t. objects at infinity are in focus
$\dfrac{1}{f} = \dfrac{1}{z} + \dfrac{1}{e} \;\;\Rightarrow\;\; \dfrac{1}{f} \approx \dfrac{1}{e} \;\;\Rightarrow\;\; f \approx e \qquad \left(\text{since } \dfrac{1}{z} \approx 0\right)$
§ The dependence of the apparent size of an object on its depth (i.e. distance from
camera) is known as perspective.
$\dfrac{h'}{h} = \dfrac{f}{z} \;\;\Rightarrow\;\; h' = \dfrac{f}{z}\,h$
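A tiny worked example of this perspective scaling (again a sketch with made-up numbers, not lecture data):

```python
# Apparent image size h' = (f / z) * h for a pinhole camera.
f = 0.05   # focal length [m] (illustrative)
h = 1.7    # object height [m], e.g. a person

for z in [2.0, 4.0, 8.0]:            # object depths [m]
    h_img = f / z * h                # image-plane height [m]
    print(f"z = {z:4.1f} m  ->  h' = {h_img * 1000:.1f} mm")
# Doubling the depth halves the apparent size: the essence of perspective.
```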
Perspective effects
[Figure: the “Ames room” perspective illusion]
[Figure: vanishing geometry in an image — Vertical vanishing point (at infinity), Vanishing line, and two Vanishing points]
Perspective projection
§ For convenience: the image plane is usually represented in front of C, such that the
image preserves the same orientation (i.e. not flipped)
§ A camera does not measure distances but angles → a camera is a “bearing sensor”
[Figure: camera model — optical axis $Z_c$, camera axes $X_c$, $Y_c$, Image plane (CCD) at distance f, principal point O, optical center C (= center of the lens), 3D point P projecting to image point p with pixel coordinates (u, v) and bearing angle ϕ]
§ The camera point $P_c = (X_c, Y_c, Z_c)^T$ projects to $p = (x, y)$ on the image plane
§ To convert p from the local image-plane coordinates (x, y) to the pixel coordinates (u, v), we need to account for:
§ the pixel coordinates of the camera optical center $O = (u_0, v_0)$
§ scale factors $k_u$, $k_v$ for the pixel size in both dimensions
§ So: $u = u_0 + k_u x \;\Rightarrow\; u = u_0 + \dfrac{k_u f X_c}{Z_c}$ and $v = v_0 + k_v y \;\Rightarrow\; v = v_0 + \dfrac{k_v f Y_c}{Z_c}$
[Figure: image plane with origin (0, 0), pixel axes u, v, principal point $O = (u_0, v_0)$, and image point p]
§ Use homogeneous coordinates for a linear mapping from 3D to 2D, by introducing an extra element (scale):
$p = \begin{pmatrix} u \\ v \end{pmatrix}, \qquad \tilde p = \begin{bmatrix} \lambda u \\ \lambda v \\ \lambda \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$, and similarly for the world coordinates. Note: usually $\lambda = 1$.
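Putting the pixel-coordinate mapping and the homogeneous representation together, here is a minimal NumPy sketch of projecting a camera-frame point to pixel coordinates; the intrinsic values in K are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Illustrative intrinsics: alpha_u = k_u * f, alpha_v = k_v * f.
K = np.array([[800.0,   0.0, 320.0],    # [alpha_u,    0,  u0]
              [  0.0, 800.0, 240.0],    # [      0, alpha_v, v0]
              [  0.0,   0.0,   1.0]])

P_c = np.array([0.5, -0.2, 4.0])        # 3D point in the camera frame (X_c, Y_c, Z_c)

p_tilde = K @ P_c                       # homogeneous pixel point (lambda*u, lambda*v, lambda)
u, v = p_tilde[:2] / p_tilde[2]         # dividing by lambda = Z_c gives pixel coordinates
print(u, v)                             # u = u0 + alpha_u * X_c / Z_c,  v = v0 + alpha_v * Y_c / Z_c
```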
Radial Distortion
[Figure: lens radial distortion and its parameter]
Camera Calibration
§ To estimate the 11 unknowns, we need at least 6 points to calibrate the camera → solved using linear least squares
$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$
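A minimal sketch of this linear (DLT-style) estimation of the 3×4 projection matrix, assuming n ≥ 6 world–image correspondences are already available; function and variable names are illustrative, not from the lecture:

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix M from n >= 6 point correspondences.

    world_pts: (n, 3) array of (Xw, Yw, Zw); image_pts: (n, 2) array of (u, v).
    Each correspondence contributes two linear equations in the 12 entries of M.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        Pw = [X, Y, Z, 1.0]
        rows.append([*Pw, 0.0, 0.0, 0.0, 0.0, *(-u * p for p in Pw)])
        rows.append([0.0, 0.0, 0.0, 0.0, *Pw, *(-v * p for p in Pw)])
    A = np.asarray(rows)                 # (2n, 12) homogeneous system A m = 0
    _, _, Vt = np.linalg.svd(A)          # linear least-squares solution: right singular
    M = Vt[-1].reshape(3, 4)             # vector of the smallest singular value
    return M / np.linalg.norm(M[2, :3])  # fix the arbitrary scale (11 unknowns)
```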
§ what we obtained: the 3 x 4 projection matrix,
what we need: its decomposition into the camera calibration matrix K, and the rotation R and position T of
the camera.
§ Use QR factorization (in its RQ form) to decompose the 3×3 submatrix ($m_{11}$ … $m_{33}$) into the product of an upper-triangular matrix K and a rotation matrix R (orthogonal matrix)
§ The translation T can subsequently be obtained by: $T = K^{-1} \begin{bmatrix} m_{14} \\ m_{24} \\ m_{34} \end{bmatrix}$
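A possible sketch of this decomposition step, assuming SciPy is available for the RQ factorization; the sign fix-up is one common convention rather than something prescribed by the slides:

```python
import numpy as np
from scipy.linalg import rq   # assumption: SciPy is available

def decompose_projection_matrix(M):
    """Split M = K [R | T] into intrinsics K, rotation R, and translation T."""
    K, R = rq(M[:, :3])                  # upper-triangular K times orthogonal R
    S = np.diag(np.sign(np.diag(K)))     # sign fix-up so that diag(K) > 0
    K, R = K @ S, S @ R                  # S @ S = I, so the product K @ R is unchanged
    T = np.linalg.inv(K) @ M[:, 3]       # T = K^{-1} * (m14, m24, m34)^T
    # A full implementation would also ensure det(R) = +1 (flip signs if needed).
    return K / K[2, 2], R, T
```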
§ From a single image, we can only deduce the ray along which each image point lies
[Figure: a 3D Object observed from a Left and a Right Center of projection]
§ Observe the scene from 2 different viewpoints → solve for the intersection of the rays and recover the 3D structure
§ Stereo vision: using 2 cameras with known relative position T and orientation R, recover the 3D scene information
§ Stereopsis: the brain allows us to perceive the left and right retinal images as a single 3D image
§ Observe the image disparity
§ An ideal, simplified case: assume both cameras are identical and are aligned with
the x-axis
$P = (X_P, Y_P, Z_P)$
From similar triangles:
$\dfrac{f}{Z_P} = \dfrac{u_l}{X_P}$ and $\dfrac{f}{Z_P} = \dfrac{u_r}{X_P - b}$ $\;\;\Rightarrow\;\; Z_P = \dfrac{b f}{u_l - u_r}$
[Figure: rectified stereo pair — optical centers $C_l$, $C_r$ separated by baseline b, focal length f, projections $u_l$, $u_r$ of point P]
§ Disparity: difference in image location of the projection of a 3D point in two image planes
§ Baseline: distance between the optical centers of the two cameras
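A quick numeric check of the depth-from-disparity relation above (baseline, focal length, and disparities are illustrative values):

```python
# Z_P = b * f / (u_l - u_r) for an ideal, axis-aligned stereo pair.
b = 0.12          # baseline [m] (illustrative)
f = 800.0         # focal length [pixels] (illustrative)

for disparity in [40.0, 20.0, 10.0]:     # u_l - u_r [pixels]
    Z = b * f / disparity
    print(f"disparity = {disparity:5.1f} px  ->  depth Z = {Z:.2f} m")
# Depth is inversely proportional to disparity: far points have small disparity.
```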
§ Disparity: the difference in image location of corresponding 3D points as projected to the left
& right images
$P_w = (X_w, Y_w, Z_w)$
§ To estimate the 3D position of $P_w$, we construct the system of equations of the left and right camera, setting the world frame to coincide with the left camera frame (the right camera is at relative pose (R, T)):

Left camera: $\tilde p_l = \lambda_l \begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} = K_l \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

Right camera: $\tilde p_r = \lambda_r \begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} = K_r \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

§ Triangulation: the problem of determining the 3D position of a point, given a set of corresponding image locations & known camera poses.
§ Goal: identify image regions / patches in the left & right images corresponding to the same scene structure
§ Typical similarity measures: Normalized Cross-Correlation (NCC), Sum of Squared Differences (SSD), Sum of Absolute Differences (SAD), … (a small sketch of these follows below)
§ Exhaustive image search can be computationally very expensive!
Can we search for correspondences more efficiently?
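As a side note to the similarity measures listed above, here is a minimal sketch of SSD, SAD, and (zero-mean) NCC for two equally sized image patches given as NumPy arrays; this is illustrative code, not from the lecture:

```python
import numpy as np

def ssd(a, b):
    """Sum of Squared Differences: lower means more similar."""
    return np.sum((a.astype(float) - b.astype(float)) ** 2)

def sad(a, b):
    """Sum of Absolute Differences: lower means more similar."""
    return np.sum(np.abs(a.astype(float) - b.astype(float)))

def ncc(a, b):
    """Zero-mean Normalized Cross-Correlation in [-1, 1]: higher means more similar."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
```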
[Figure: epipolar geometry — camera centers $C_L$, $C_R$ and epipoles $e_L$, $e_R$]
§ Thanks to the epipolar constraint, corresponding points can be searched for along epipolar lines
⇒ computational cost reduced to 1 dimension!
Epipolar Rectification
§ Focal lengths
§ Lens Distortion
§ Translation
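In practice, epipolar rectification is usually done with an existing library. A hedged sketch using OpenCV, assuming the intrinsics K1, K2, distortion coefficients d1, d2, and the relative pose (R, T) between the cameras are already known (names are illustrative):

```python
import cv2

def rectify_pair(img_l, img_r, K1, d1, K2, d2, R, T):
    """Warp a stereo pair so that epipolar lines become horizontal scanlines."""
    size = img_l.shape[1], img_l.shape[0]          # (width, height)
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
    map_l = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
    map_r = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map_l[0], map_l[1], cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map_r[0], map_r[1], cv2.INTER_LINEAR)
    return rect_l, rect_r
```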
Triangulation
• Given the projections p1 and p2 of a 3D point P in two or more images (with known camera
matrices R and T), find the coordinates of the 3D point
[Figure: cameras at $C_1$ and $C_2$, related by [R | T], observing the unknown 3D point P (= ?) through its projections $p_1$ and $p_2$]
Triangulation
• We want to intersect the two visual rays corresponding to p1 and p2, but because of noise and
numerical errors, they don’t meet exactly
• Find the shortest segment connecting the two viewing rays, and let P be the midpoint of that segment (see the sketch below)
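A small sketch of this midpoint method for two 3D viewing rays, written directly from the geometry; the ray origins c1, c2 and directions d1, d2 are assumed to be given (e.g. camera centers and back-projected pixel directions):

```python
import numpy as np

def midpoint_triangulation(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Solve for the ray parameters s, t minimizing ||(c1 + s*d1) - (c2 + t*d2)||.
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
    s, t = np.linalg.solve(A, b)
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))   # midpoint of closest approach
```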
Left camera:
$\tilde p_1 = K_1 \begin{bmatrix} I & 0 \end{bmatrix} \tilde P \;\;\Rightarrow\;\; \lambda_1 \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix} = \underbrace{K_1 \begin{bmatrix} I & 0 \end{bmatrix}}_{M_1} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

Right camera:
$\tilde p_2 = K_2 \begin{bmatrix} R & T \end{bmatrix} \tilde P \;\;\Rightarrow\;\; \lambda_2 \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix} = \underbrace{K_2 \begin{bmatrix} R & T \end{bmatrix}}_{M_2} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

[Figure: cameras $C_1$, $C_2$ related by [R | T], projections $p_1$, $p_2$ of the unknown point P (= ?)]
Left camera — cross-multiply by $\tilde p_1$:
$\tilde p_1 = M_1 \tilde P \;\Rightarrow\; M_1 \tilde P - \tilde p_1 = 0 \;\Rightarrow\; \tilde p_1 \times M_1 \tilde P - \underbrace{\tilde p_1 \times \tilde p_1}_{0} = 0 \;\Rightarrow\; [\tilde p_{1\times}]\, M_1 \tilde P = 0$

Right camera — cross-multiply by $\tilde p_2$:
$\tilde p_2 = M_2 \tilde P \;\Rightarrow\; M_2 \tilde P - \tilde p_2 = 0 \;\Rightarrow\; \tilde p_2 \times M_2 \tilde P - \underbrace{\tilde p_2 \times \tilde p_2}_{0} = 0 \;\Rightarrow\; [\tilde p_{2\times}]\, M_2 \tilde P = 0$
Cross product as matrix multiplication: $\;a \times b = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix} \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix} = [a_\times]\, b$
Stacking the left-camera constraint $[\tilde p_{1\times}]\, M_1 \tilde P = 0$ and the right-camera constraint $[\tilde p_{2\times}]\, M_2 \tilde P = 0$:
§ 3 unknowns ($P$): $X_w, Y_w, Z_w$
§ 6 equations (4 of them independent) → an over-determined linear system, solved in a least-squares sense
[Figure: two camera frames C and C′ related by (R, T)]
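A compact sketch of the linear triangulation described above: stack the two cross-product constraints and solve the over-determined homogeneous system with SVD (function and variable names are illustrative):

```python
import numpy as np

def skew(a):
    """Matrix [a_x] such that skew(a) @ b == np.cross(a, b)."""
    ax, ay, az = a
    return np.array([[0.0, -az,  ay],
                     [ az, 0.0, -ax],
                     [-ay,  ax, 0.0]])

def triangulate_linear(p1, p2, M1, M2):
    """3D point from homogeneous pixel points p1, p2 and 3x4 matrices M1, M2."""
    A = np.vstack([skew(p1) @ M1,        # [p1_x] M1 P = 0  (2 independent rows)
                   skew(p2) @ M2])       # [p2_x] M2 P = 0  (2 independent rows)
    _, _, Vt = np.linalg.svd(A)          # least-squares solution of A P = 0
    P_h = Vt[-1]
    return P_h[:3] / P_h[3]              # de-homogenize to (Xw, Yw, Zw)
```

Only two rows of each cross-product constraint are independent, which is why the stacked system has 6 rows but effectively 4 constraints on the 3 unknowns.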
Multi-view SFM
§ Results of structure from motion applied to user images from flickr.com [Seitz & Szeliski, ICCV 2009]
Event-based Cameras