
ASL

Autonomous Systems Lab

Perception II:
Fundamentals of Computer Vision
Autonomous Mobile Robots

Margarita Chli
Roland Siegwart and Nick Lawrance

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 1
ASL
Autonomous Systems Lab

www.v4rl.ethz.ch

Collaborative scene perception

Vision-based robotic perception

Focus on small aircraft


Onboard visual 3D reconstruction
§  Tracking & mapping
§  Collaborative sensing & mapping for multi-robot systems
§  Experimentation with a variety of cameras
Autonomous Mobile Robots
Depth-camera surface estimation
Margarita Chli, Nick Lawrance and Roland Siegwart | 2
ASL
Autonomous Systems Lab

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 3
ASL
Autonomous Systems Lab

Mobile Robot Control Scheme


[Block diagram: the see-think-act cycle. Sensing delivers raw data to Information Extraction (Perception); Localization / Map Building maintains the local map, environment model, global map and the robot's "position"; Cognition / Path Planning combines these with the knowledge / data base and mission commands to produce a path; Motion Control / Path Execution turns the path into actuator commands for Acting on the Real World Environment.]
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 4
ASL
Autonomous Systems Lab

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart | 5
ASL
Autonomous Systems Lab

Today’s Topics

Section 4.2 in the book: Fundamentals of Computer Vision


§  Pinhole Camera Model

§  Perspective Projection
§  Camera Calibration

§  Stereo Vision, Epipolar Geometry & Triangulation

§  Structure from Motion & Optical Flow


Additional, optional reading on Computer Vision:
“Computer Vision: Algorithms and Applications“, by
Richard Szeliski (Springer)

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 6
ASL
Autonomous Systems Lab

Vision

Sony Cybershot WX1

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 7
ASL
Autonomous Systems Lab

One picture, a thousand words

§  Vision is our most powerful sense in aiding our perception of the 3D world around us.

§  The retina is ~10 cm² and contains millions of photoreceptors

(~120 million rods and ~7 million cones, the latter for colour sampling)

§  It provides an enormous amount of information: a data-rate of ~3 GBytes/s

→ a large proportion of our brain power is dedicated to processing the signals from our
eyes

Autonomous Mobile Robots http://webvision.med.utah.edu/sretina.html


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 8
ASL
Autonomous Systems Lab

Human Visual Capabilities

§  Our visual system is very sophisticated


§  Humans can interpret images successfully under a wide range of conditions – even in
the presence of very limited cues

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 9
ASL
Autonomous Systems Lab

Do we always get it right?

Count the black spots

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 10
ASL
Autonomous Systems Lab

Do we always get it right?

Which square is darker, A or B?

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 11
ASL
Autonomous Systems Lab

Do we always get it right?

Which square is darker, A or B?

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 12
ASL
Autonomous Systems Lab

Computer Vision for Robotics

§  Enormous descriptive power of images → a lot of data to process


(human vision involves 60 billion neurons!)

§  Vision provides humans with a great number of useful cues

→ explore the power of vision to build intelligent robots

Cameras:
§  Vision is increasingly popular as a sensing modality:
§  descriptive
§  compactness, compatibility, …
§  low cost
§  HW advances are necessary to support the processing of images

Image: Nicholas M. Short


Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 13
ASL
Autonomous Systems Lab

Computer Vision | applications

q97country.com

www.robertlpeters.com www.samsung.com news.nationalgeographic.com

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 14
ASL
Autonomous Systems Lab

Computer Vision | what is it?

§  Automatic extraction of “meaningful” information from images and videos


– varies depending on the application

[Example image annotated with semantic labels: sky, roof, tower, building, window, door, people, plant, car, shade, ground, outdoors]

Semantic information Geometric information


Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 15
ASL
Autonomous Systems Lab

Computer Vision | how far along are we?

§  Partial 3D model from overlapping images (Enqvist et al., Tech Rep 2011)
§  Dense 3D surface model from more overlapping images (Goesele et al., ICCV 2007)
§  Face recognition (www.dailymail.co.uk)

§ How many animals in the image?


§ Resort to physics-based probabilistic
models to disambiguate solutions

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 16
http://articles.orlandosentinel.com
ASL
Autonomous Systems Lab

The camera | image formation

§  If we place a piece of film in front of an object, do we get a reasonable image?

[Figure: object and photoreceptive surface]

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 17
ASL
Autonomous Systems Lab

The camera | image formation

§  If we place a piece of film in front of an object, do we get a reasonable image?


§  Add a barrier to block off most of the rays
§  This reduces blurring
§  The opening is known as the aperture

[Figure: object, barrier with aperture, and photoreceptive surface]

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 18
ASL
Autonomous Systems Lab

The camera | camera obscura

§  Basic principle known to Mozi (470-390 BC),


Aristotle (384-322 BC), Euclid (323-283 BC)
§  Drawing aid for artists: described by Leonardo da
Vinci (1452-1519)
commons.wikimedia.org

§  Image is inverted
§  Depth of the room (box) is the effective focal length

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 19
www.thelivingmoon.com
ASL
Autonomous Systems Lab

The camera | the pinhole camera model

§  Pinhole model:
§  Captures beam of rays – all rays through a single point (note: no lens!)
§  The point is called Center of Projection or Optical Center
§  The image is formed on the Image Plane

§  We will use the pinhole camera model to describe how the image is formed
Based on slide by Steve Seitz

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 20
ASL
Autonomous Systems Lab

The camera | home-made pinhole camera

What can we do
to reduce the blur?

www.debevec.org/Pinhole/

Based on slide by Steve Seitz


Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 21
ASL
Autonomous Systems Lab

The camera | shrinking the aperture

Why not make the aperture as small as possible?

§  Less light gets through (must increase the exposure)


§  Diffraction effects…

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 22
ASL
Autonomous Systems Lab

The camera | why use a lens?

§  The ideal pinhole:


only one ray of light reaches each point on the film
→ the image can be very dim; gives rise to diffraction effects

§  Making the pinhole (i.e. the aperture) bigger makes the image blurry

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 23
ASL
Autonomous Systems Lab

The camera | why use a lens?

§  The ideal pinhole:


only one ray of light reaches each point on the film
→ the image can be very dim; gives rise to diffraction effects

§  Making the pinhole (i.e. the aperture) bigger makes the image blurry

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 24
ASL
Autonomous Systems Lab

The camera | why use a lens?

§  A lens focuses light onto the film


§  Rays passing through the optical center are not deviated

object Lens film

Optical Center

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 25
ASL
Autonomous Systems Lab

The camera | why use a lens?

§  A lens focuses light onto the film


§  Rays passing through the optical center are not deviated
§  All rays parallel to the optical axis converge at the focal point

object Lens

Focal Point
Optical Axis

Optical Center

Focal Length: f
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 26
ASL
Autonomous Systems Lab

The camera | how to create a focused image?


[Figure: object of size A at distance z from the lens, focused image of size B at distance e behind the lens; focal length f]

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 27
ASL
Autonomous Systems Lab

The camera | how to create a focused image?


[Figure: object of size A at distance z, lens with focal length f, image of size B at distance e]

§  Similar triangles:  $\frac{B}{A} = \frac{e}{z}$

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 28
ASL
Autonomous Systems Lab

The camera | the thin lens equation


[Figure: object of size A at distance z, lens with focal length f, image of size B at distance e]

§  Similar triangles:  $\frac{B}{A} = \frac{e}{z}$  and  $\frac{B}{A} = \frac{e - f}{f} = \frac{e}{f} - 1$

$\Rightarrow \frac{e}{z} = \frac{e}{f} - 1 \;\Rightarrow\; \frac{1}{f} = \frac{1}{z} + \frac{1}{e}$   ("Thin lens equation")
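As a quick numeric illustration of the thin lens equation, here is a minimal Python sketch (the focal length and object distances are made-up example values, not taken from the lecture):

```python
# Minimal sketch of the thin lens equation 1/f = 1/z + 1/e:
# given focal length f and object distance z, solve for the image
# distance e at which the object appears in focus.

def image_distance(f: float, z: float) -> float:
    """Image distance e from 1/f = 1/z + 1/e."""
    return 1.0 / (1.0 / f - 1.0 / z)

f = 0.05                              # 50 mm lens (made-up value)
for z in [0.5, 5.0, 50.0, 5000.0]:    # object distances in metres
    print(f"z = {z:7.1f} m  ->  e = {image_distance(f, z) * 1000:.2f} mm")
# As z >> f, e converges to f: the pinhole approximation of the next slides.
```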
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 29
ASL
Autonomous Systems Lab

The camera | pinhole approximation


[Figure: object at distance z, lens with focal length f, image plane at distance e]

§  What happens if z is very big, i.e. z >> f ?
§  We need to adjust the image plane s.t. objects at infinity are in focus

$\frac{1}{f} = \frac{1}{z} + \frac{1}{e}$,   with   $\frac{1}{z} \approx 0$
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart
Perception II | 30
ASL
Autonomous Systems Lab

The camera | pinhole approximation

[Figure: object of height h at distance z, optical center C (center of projection), focal plane at distance e from C, image of height h']

§  What happens if z is very big, i.e. z >> f ?
§  We adjust the image plane s.t. objects at infinity are in focus

$\frac{1}{f} = \frac{1}{z} + \frac{1}{e} \;\Rightarrow\; \frac{1}{f} \approx \frac{1}{e} \;\Rightarrow\; f \approx e$   (since $\frac{1}{z} \approx 0$)
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart
Perception II | 31
ASL
Autonomous Systems Lab

The camera | pinhole approximation

[Figure: object of height h at depth z, optical center C (center of projection), image of height h' on the focal plane at distance f]

§  The dependence of the apparent size of an object on its depth (i.e. its distance from the
camera) is known as perspective.

$\frac{h'}{h} = \frac{f}{z} \;\Rightarrow\; h' = \frac{f}{z}\,h$
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 32
ASL
Autonomous Systems Lab

Playing with Perspective

§  Perspective gives us very strong depth cues


→ hence we can perceive a 3D scene by viewing its 2D representation (i.e. an image)

“Ames room”

A clip from "The computer that


ate Hollywood" documentary. Dr.
Vilayanur S. Ramachandran.

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 33
ASL
Autonomous Systems Lab

Perspective effects

§  Far away objects appear smaller

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 34
ASL
Autonomous Systems Lab

Vanishing points and lines

§  Parallel lines in the world intersect in the image at a “vanishing point”

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 35
ASL
Autonomous Systems Lab

Vanishing points and lines

[Figure: two vanishing points on the horizon connected by the vanishing line, plus a vertical vanishing point at infinity]

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 36
ASL
Autonomous Systems Lab

Perspective projection

How do world points map to pixels in the image?

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 37
ASL
Autonomous Systems Lab

Perspective projection

§  For convenience: the image plane is usually represented in front of C, such that the
image preserves the same orientation (i.e. not flipped)

§  A camera does not measure distances but angles!
→ a camera is a "bearing sensor"

[Figure: camera frame (Xc, Yc, Zc) with Zc the optical axis, C = optical center = center of the lens, O = principal point, image plane (CCD) at focal length f; a world point P projects to p = (u, v), subtending angle ϕ]
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 38
ASL
Autonomous Systems Lab

Perspective projection | from world to pixel coordinates

Find the pixel coordinates (u, v) of a point Pw given in the world frame:

0. Convert the world point Pw to a camera-frame point Pc

Find the pixel coordinates (u, v) of the point Pc in the camera frame:

1. Convert Pc to image-plane coordinates (x, y)
2. Convert (x, y) to (discretised) pixel coordinates (u, v)

[Figure: world frame W (Xw, Yw, Zw) related to the camera frame C (Xc, Yc, Zc) by [R|T]; the point projects to p on the image plane at pixel coordinates (u, v), with principal point O]
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 39
ASL
Autonomous Systems Lab

Perspective projection | from camera frame to image plane

§  The camera point Pc = (Xc, 0, Zc)^T projects to p = (x, y) on the image plane

§  From similar triangles:   $\frac{x}{f} = \frac{X_c}{Z_c} \;\Rightarrow\; x = \frac{f X_c}{Z_c}$

§  Similarly, in the general case:   $\frac{y}{f} = \frac{Y_c}{Z_c} \;\Rightarrow\; y = \frac{f Y_c}{Z_c}$

(Recap: 1. convert Pc to image-plane coordinates (x, y); 2. convert to discretised pixel coordinates (u, v).)
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 40
ASL
Autonomous Systems Lab

Perspective projection | from camera frame to pixel coords.

§  To convert p from local image-plane coordinates (x, y) to pixel coordinates (u, v), we need to account
for:
§  the pixel coordinates of the camera optical center, O = (u0, v0)
§  scale factors ku, kv for the pixel size in both dimensions

§  So:   $u = u_0 + k_u x \;\Rightarrow\; u = u_0 + \frac{k_u f X_c}{Z_c}$,   $v = v_0 + k_v y \;\Rightarrow\; v = v_0 + \frac{k_v f Y_c}{Z_c}$

§  Use homogeneous coordinates for a linear mapping from 3D to 2D, by introducing an extra
element (scale):
$p = \begin{pmatrix} u \\ v \end{pmatrix}, \qquad \tilde{p} = \begin{bmatrix} \lambda u \\ \lambda v \\ \lambda \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$, and similarly for the world coordinates. Note: usually λ = 1.
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 41
ASL
Autonomous Systems Lab

Perspective projection | from camera frame to pixel coords.


$u = u_0 + \frac{k_u f X_c}{Z_c}, \qquad v = v_0 + \frac{k_v f Y_c}{Z_c}$

§  Expressed in matrix form and homogeneous coordinates:
$\begin{bmatrix} \lambda u \\ \lambda v \\ \lambda \end{bmatrix} = \begin{bmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$

§  Or alternatively:
$\begin{bmatrix} \lambda u \\ \lambda v \\ \lambda \end{bmatrix} = \begin{bmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$

with $\alpha_u$, $\alpha_v$ the focal lengths in the u- and v-directions, and K the "calibration matrix" / "matrix of intrinsic parameters".
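A minimal numeric sketch of this mapping in Python/NumPy (the intrinsic values and the 3D point below are made-up examples, not calibration results):

```python
import numpy as np

# Made-up intrinsic parameters for illustration only.
alpha_u, alpha_v = 800.0, 800.0   # focal lengths in pixels (k_u*f, k_v*f)
u0, v0 = 320.0, 240.0             # principal point in pixels

K = np.array([[alpha_u, 0.0,     u0],
              [0.0,     alpha_v, v0],
              [0.0,     0.0,     1.0]])

P_c = np.array([0.5, -0.2, 4.0])  # a point in the camera frame (metres)

p_tilde = K @ P_c                 # homogeneous pixel coordinates (lambda*u, lambda*v, lambda)
u, v = p_tilde[:2] / p_tilde[2]   # divide by lambda = Z_c
print(u, v)                       # prints 420.0 200.0
```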
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 42
ASL
Autonomous Systems Lab

Perspective projection | from the world to the camera frame

$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \\ t_3 \end{bmatrix}$,   or in homogeneous coordinates:

$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

where [R|T] are the extrinsic parameters relating the world frame W to the camera frame C.

From the previous slide:
$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} \;\;\Rightarrow\;\; \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \;|\; T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$,   with K [R|T] the projection matrix.
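Putting extrinsics and intrinsics together, a minimal Python/NumPy sketch of the full world-to-pixel projection λ p̃ = K [R|T] P̃w (the pose and point values are made up for illustration):

```python
import numpy as np

def project(K: np.ndarray, R: np.ndarray, T: np.ndarray, P_w: np.ndarray) -> np.ndarray:
    """Project a 3D world point P_w to pixel coordinates (u, v)."""
    P_c = R @ P_w + T                 # world frame -> camera frame (extrinsics)
    p_tilde = K @ P_c                 # homogeneous pixel coordinates (intrinsics)
    return p_tilde[:2] / p_tilde[2]   # perspective division by lambda = Z_c

# Made-up example: camera aligned with the world frame, shifted by 0.1 m along x.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([0.1, 0.0, 0.0])
print(project(K, R, T, np.array([1.0, 0.5, 5.0])))   # pixel coordinates of the point
```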

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 43
ASL
Autonomous Systems Lab

Perspective projection | radial distortion

§  The amount of distortion is a non-linear function of the distance from the center of the image

§  From ideal pixel coordinates (u, v) to distorted pixel coordinates (ud, vd):
§  Simple quadratic distortion model, governed by a single radial distortion parameter

§  Works well for most lenses


No distortion Barrel distortion Pincushion distortion
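A minimal sketch of one common form of such a quadratic radial model (the exact formula and parameter values here are assumptions for illustration, not taken from the slide):

```python
def distort(u: float, v: float, u0: float, v0: float, k1: float) -> tuple:
    """One common quadratic radial distortion model (an assumed form for
    illustration): the ideal pixel (u, v) is moved along the radial
    direction from the principal point (u0, v0) by a factor (1 + k1*r^2)."""
    du, dv = u - u0, v - v0
    r2 = du * du + dv * dv            # squared distance from the image center
    scale = 1.0 + k1 * r2
    return u0 + scale * du, v0 + scale * dv

# k1 < 0 gives barrel-like, k1 > 0 gives pincushion-like distortion (made-up values).
print(distort(400.0, 300.0, u0=320.0, v0=240.0, k1=-1e-7))
```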
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 44
ASL
Autonomous Systems Lab

Camera Calibration

§  Finding the intrinsic parameters of your camera

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 45
ASL
Autonomous Systems Lab

Camera Calibration

§  Use camera model to interpret the projection from world to


image plane

§  Using known correspondences p ↔ P, we can compute the unknown parameters K, R, T by
applying the perspective projection equation
§  … i.e. associate known, physical distances in the world with pixel distances in the image

$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \;|\; T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$,   with K [R|T] the projection matrix.

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 46
ASL
Autonomous Systems Lab

Camera Calibration | direct linear transform (DLT)


§  We know that:
$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \;|\; T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

§  So there are 11 values to estimate (the overall scale doesn't matter, so e.g. m34 can be set to 1)

§  Each observed point gives us a pair of equations:
$u_i = \frac{\lambda u_i}{\lambda} = \frac{m_{11} X_i + m_{12} Y_i + m_{13} Z_i + m_{14}}{m_{31} X_i + m_{32} Y_i + m_{33} Z_i + m_{34}}, \qquad v_i = \frac{\lambda v_i}{\lambda} = \frac{m_{21} X_i + m_{22} Y_i + m_{23} Z_i + m_{24}}{m_{31} X_i + m_{32} Y_i + m_{33} Z_i + m_{34}}$

§  To estimate the 11 unknowns, we need at least 6 points to calibrate the camera → solved using
linear least squares
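A minimal Python/NumPy sketch of this linear least-squares step (it assumes the 3D-2D correspondences are already given; it illustrates the idea rather than reproducing any reference implementation):

```python
import numpy as np

def dlt_calibrate(points_3d: np.ndarray, points_2d: np.ndarray) -> np.ndarray:
    """Estimate the 3x4 projection matrix M from >= 6 world<->pixel
    correspondences: each point contributes two linear equations in the
    entries of M, and the homogeneous system is solved with SVD."""
    A = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)   # right singular vector of the smallest singular value
```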

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 47
ASL
Autonomous Systems Lab

Camera Calibration | direct linear transform (DLT)


$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = K\,[R \;|\; T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

§  What we obtained: the 3 × 4 projection matrix.
What we need: its decomposition into the camera calibration matrix K, and the rotation R and position T of
the camera.

§  Use QR factorization to decompose the 3×3 submatrix (m11 … m33) into the product of an upper triangular matrix
K and a rotation matrix R (orthogonal matrix)

§  The translation T can subsequently be obtained by:   $T = K^{-1} \begin{bmatrix} m_{14} \\ m_{24} \\ m_{34} \end{bmatrix}$
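A sketch of this decomposition using SciPy's RQ factorization (equivalent in spirit to the QR-based decomposition above; the sign handling is simplified and glosses over some corner cases):

```python
import numpy as np
from scipy.linalg import rq

def decompose_projection(M: np.ndarray):
    """Split a 3x4 projection matrix M = K [R | T] into intrinsics K,
    rotation R and translation T via an RQ factorization of M[:, :3]."""
    K, R = rq(M[:, :3])              # K upper triangular, R orthogonal
    S = np.diag(np.sign(np.diag(K))) # fix signs so K has a positive diagonal
    K, R = K @ S, S @ R              # S @ S = I, so the product K @ R is unchanged
    T = np.linalg.solve(K, M[:, 3])  # T = K^-1 * (m14, m24, m34)^T
    return K, R, T                   # K may be normalized so that K[2, 2] = 1
```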

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 48
ASL
Autonomous Systems Lab

How do we measure distances with cameras?

learning.covcollege.ac.uk csiro.au

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 49
ASL
Autonomous Systems Lab

How do we measure distances with cameras?

§  From a single image: we can only deduce the ray along which each image-point lies
3D Object

Left Image Right Image

Left Right
Center of projection Center of projection

§  Observe the scene from 2 different viewpoints → solve for the intersection of the rays
and recover the 3D structure
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 50
ASL
Autonomous Systems Lab

How do we measure distances with cameras?

§  Stereo vision:
using 2 cameras with known relative position T and orientation R, recover the 3D
scene information

§  Structure from Motion:


recover the 3D scene structure & the camera poses (up to scale) from multiple
images, from potentially unknown cameras

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 51
ASL
Autonomous Systems Lab

Disparity in the human retina

§  Stereopsis: the brain allows us to see the left and right retinal images as a single 3D image
§  Observe image disparity

disparity

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 52
ASL
Autonomous Systems Lab

Stereo Vision | simplified case

§  An ideal, simplified case: assume both cameras are identical and are aligned with
the x-axis
[Figure: two identical cameras Cl, Cr separated by baseline b along the X-axis, both observing the 3D point P = (XP, YP, ZP), which projects to ul in the left image and ur in the right image]

From similar triangles:
$\frac{f}{Z_P} = \frac{u_l}{X_P}, \qquad \frac{f}{Z_P} = \frac{u_r}{X_P - b} \;\;\Rightarrow\;\; Z_P = \frac{b f}{u_l - u_r}$

Disparity: the difference in image location of the projection of a 3D point in the two image planes
Baseline: the distance between the optical centers of the two cameras
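A one-line numeric illustration of $Z_P = \frac{bf}{u_l - u_r}$ (the baseline, focal length and disparities below are made-up values):

```python
def depth_from_disparity(b: float, f: float, d: float) -> float:
    """Depth from the simplified stereo geometry Z = b*f/d, with baseline b [m],
    focal length f [pixels] and disparity d = u_l - u_r [pixels]."""
    return b * f / d

# Made-up setup: 12 cm baseline, 700-pixel focal length.
for d in [70.0, 35.0, 7.0]:
    print(f"disparity {d:5.1f} px  ->  depth {depth_from_disparity(0.12, 700.0, d):.2f} m")
# Larger disparity -> smaller depth: nearby objects show bigger disparity.
```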
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 53
ASL
Autonomous Systems Lab

Stereo Vision | disparity map

§  Disparity: the difference in image location of corresponding 3D points as projected to the left
& right images

§  The disparity map holds the disparity value at every pixel:
§  Identify corresponding points for all image pixels in the original images
§  Compute the disparity for each pair of correspondences
[Figure: left image, right image and the resulting disparity map]

§  Nearby objects exhibit a bigger disparity:
$Z_P = \frac{b f}{u_l - u_r}$
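A minimal sketch of computing a dense disparity (and depth) map with OpenCV's block-matching stereo; it assumes an already-rectified pair, and the file names, baseline and focal length are placeholders:

```python
import cv2
import numpy as np

# Rectified grayscale stereo pair (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # StereoBM returns fixed-point (x16) disparities

b, f = 0.12, 700.0                       # made-up baseline [m] and focal length [px]
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = b * f / disparity[valid]  # Z = b*f / (u_l - u_r), per pixel
```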
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 54
ASL
Autonomous Systems Lab

Stereo Vision | general case

§  Two identical cameras do not exist in nature!


§  Aligning both cameras on a horizontal axis is very difficult
§  In order to use a stereo camera, we need:
§  the relative pose between the cameras (rotation, translation) → "extrinsics", and

§  the focal length, optical center and radial distortion of each camera → "intrinsics"

→ use a calibration method

[Figure: world point Pw = (Xw, Yw, Zw) observed by two cameras related by (R, T)]

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart
Perception II | 55
ASL
Autonomous Systems Lab

Stereo Vision | general case

§  To estimate the 3D position of Pw = (Xw, Yw, Zw), we construct the system of equations of the left and
right camera

Left camera (we set the world frame to coincide with the left camera frame):
$\tilde{p}_l = \lambda_l \begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} = K_l\,[I \;|\; 0] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

Right camera (related to the left camera by (R, T)):
$\tilde{p}_r = \lambda_r \begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} = K_r\,[R \;|\; T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

§  Triangulation: the problem of determining the 3D position of a point, given a set of
corresponding image locations & known camera poses.
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 56
ASL
Autonomous Systems Lab

Correspondence Search | the problem

§  Goal: identify image regions / patches in the left & right images corresponding to the same
scene structure
§  Typical similarity measures: Normalized Cross-Correlation (NCC) , Sum of Squared Differences
(SSD), Sum of Absolute Differences (SAD), …
§  Exhaustive image search can be computationally very expensive!
Can we search for correspondences more efficiently?
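Minimal NumPy sketches of the three similarity measures for two equal-sized image patches:

```python
import numpy as np

def ssd(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of Squared Differences (lower = more similar)."""
    return float(np.sum((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def sad(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of Absolute Differences (lower = more similar)."""
    return float(np.sum(np.abs(a.astype(np.float64) - b.astype(np.float64))))

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized Cross-Correlation (close to 1 = very similar);
    invariant to affine brightness changes between the two patches."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```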

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 57
ASL
Autonomous Systems Lab

Correspondence Search | the epipolar constraint

§  Potential matches for pL have to lie on the corresponding epipolar line


§  Epipolar line: the projection of the infinite ray that connects CL and pL , onto the image
plane of CR (and vice versa)
§  Epipole: the projection of the optical center CL onto the image plane of CR (and vice
versa)
§  Epipolar plane is defined by pL , CL , CR
[Figure: the epipolar line in the right image corresponding to pL → pR lies on this line; epipoles eL, eR; optical centers CL, CR]

→ impose the epipolar constraint to aid the search for correspondences
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 58
ASL
Autonomous Systems Lab

Correspondence Search | the epipolar constraint

§  Thanks to the epipolar constraint, corresponding points can be searched for along
epipolar lines
⇒ the search is reduced to 1 dimension, greatly reducing the computational cost!

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 59
ASL
Autonomous Systems Lab

Epipolar Rectification

§  In practice, it is convenient if image scan-lines are the epipolar lines


§  Epipolar rectification: warps image pairs into new “rectified” images, whose
epipolar lines are parallel & collinear (aligned to the baseline)

Image from Left Camera Image from Right Camera
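With calibrated intrinsics and stereo extrinsics, this warping can be computed, for example, with OpenCV; the calibration values below are made up purely to make the sketch self-contained:

```python
import cv2
import numpy as np

# Made-up calibration values for illustration only.
K1 = K2 = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
D1 = D2 = np.zeros(5)                      # lens distortion coefficients (none here)
R = np.eye(3)                              # relative rotation between the two cameras
T = np.array([[0.12], [0.0], [0.0]])       # 12 cm baseline along x
image_size = (640, 480)

R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)
# left_rect = cv2.remap(left_img, map1x, map1y, cv2.INTER_LINEAR)   # epipolar lines become
# right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR) # horizontal scan-lines
```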

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 60
ASL
Autonomous Systems Lab

Epipolar Rectification

§  In practice, it is convenient if image scan-lines are the epipolar lines


§  Epipolar rectification: warps image pairs into new “rectified” images, whose
epipolar lines are parallel & collinear (aligned to the baseline)

Image from Left Camera Image from Right Camera


Rotation

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 61
ASL
Autonomous Systems Lab

Epipolar Rectification

§  In practice, it is convenient if image scan-lines are the epipolar lines


§  Epipolar rectification: warps image pairs into new “rectified” images, whose
epipolar lines are parallel & collinear (aligned to the baseline)

Image from Left Camera Image from Right Camera


Rotation

Focal lengths

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 62
ASL
Autonomous Systems Lab

Epipolar Rectification

§  In practice, it is convenient if image scan-lines are the epipolar lines


§  Epipolar rectification: warps image pairs into new “rectified” images, whose
epipolar lines are parallel & collinear (aligned to the baseline)

Image from Left Camera Image from Right Camera


Rotation

Focal lengths

Lens Distortion

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 63
ASL
Autonomous Systems Lab

Epipolar Rectification

§  In practice, it is convenient if image scan-lines are the epipolar lines


§  Epipolar rectification: warps image pairs into new “rectified” images, whose
epipolar lines are parallel & collinear (aligned to the baseline)

Image from Left Camera Image from Right Camera


Rotation

Focal lengths

Lens Distortion

Translation
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 64
ASL
Autonomous Systems Lab

Epipolar Rectification

§  In practice, it is convenient if image scan-lines are the epipolar lines


§  Epipolar rectification: warps image pairs into new “rectified” images, whose
epipolar lines are parallel & collinear (aligned to the baseline)

Image from Left Camera Image from Right Camera


Rotation

Focal lengths

Lens Distortion

Translation
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 65
ASL
Autonomous Systems Lab

Epipolar Rectification | example

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 66
ASL
Autonomous Systems Lab

Triangulation

•  Given the projections p1 and p2 of a 3D point P in two or more images (with known camera
matrices, i.e. known intrinsics and relative pose R, T), find the coordinates of the 3D point

P=?

p2
p1

C1 C2
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart
Perception II | 67
ASL
Autonomous Systems Lab

Triangulation

•  We want to intersect the two visual rays corresponding to p1 and p2, but because of noise and
numerical errors, they don’t meet exactly

P=?

p2
p1

C1 C2
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart
Perception II | 68
ASL
Autonomous Systems Lab

Triangulation | geometric approach

•  Find shortest segment connecting the two viewing rays and let P be the midpoint of that
segment

p2
p1

C1 C2
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart
Perception II | 69
ASL
Autonomous Systems Lab

Triangulation | linear approach

Left camera:
$\tilde{p}_1 = \lambda_1 \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix} = \underbrace{K_1\,[I \;|\; 0]}_{M_1} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = M_1 \tilde{P}$

Right camera:
$\tilde{p}_2 = \lambda_2 \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix} = \underbrace{K_2\,[R \;|\; T]}_{M_2} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = M_2 \tilde{P}$

[Figure: the unknown point P = ? observed as p1 from C1 and as p2 from C2, with relative pose [R|T]]

C1 C2
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart
Perception II | 70
ASL
Autonomous Systems Lab

Triangulation | linear approach

Left camera (take the cross product with $\tilde{p}_1$, noting $\tilde{p}_1 \times \tilde{p}_1 = 0$):
$\tilde{p}_1 = M_1 \tilde{P} \;\Rightarrow\; M_1 \tilde{P} - \tilde{p}_1 = 0 \;\Rightarrow\; \tilde{p}_1 \times M_1 \tilde{P} - \tilde{p}_1 \times \tilde{p}_1 = 0 \;\Rightarrow\; [\tilde{p}_{1\times}]\, M_1 \tilde{P} = 0$

Right camera (take the cross product with $\tilde{p}_2$, noting $\tilde{p}_2 \times \tilde{p}_2 = 0$):
$\tilde{p}_2 = M_2 \tilde{P} \;\Rightarrow\; M_2 \tilde{P} - \tilde{p}_2 = 0 \;\Rightarrow\; \tilde{p}_2 \times M_2 \tilde{P} - \tilde{p}_2 \times \tilde{p}_2 = 0 \;\Rightarrow\; [\tilde{p}_{2\times}]\, M_2 \tilde{P} = 0$

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 71
ASL
Autonomous Systems Lab

Triangulation | linear approach

Left camera:
$\tilde{p}_1 = M_1 \tilde{P} \;\Rightarrow\; M_1 \tilde{P} - \tilde{p}_1 = 0 \;\Rightarrow\; \tilde{p}_1 \times M_1 \tilde{P} = 0 \;\Rightarrow\; [\tilde{p}_{1\times}]\, M_1 \tilde{P} = 0$

Right camera:
$\tilde{p}_2 = M_2 \tilde{P} \;\Rightarrow\; M_2 \tilde{P} - \tilde{p}_2 = 0 \;\Rightarrow\; \tilde{p}_2 \times M_2 \tilde{P} = 0 \;\Rightarrow\; [\tilde{p}_{2\times}]\, M_2 \tilde{P} = 0$

Cross product as matrix multiplication:
$a \times b = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix} \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix} = [a_\times]\, b$
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 72
ASL
Autonomous Systems Lab

Triangulation | linear approach

Left camera:
$\tilde{p}_1 = M_1 \tilde{P} \;\Rightarrow\; [\tilde{p}_{1\times}]\, M_1 \tilde{P} = 0$

Right camera:
$\tilde{p}_2 = M_2 \tilde{P} \;\Rightarrow\; [\tilde{p}_{2\times}]\, M_2 \tilde{P} = 0$

3 unknowns (P): Xw, Yw, Zw
6 equations (of which 4 are independent)

⇒ Solve the system A · P = 0 using SVD
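A minimal Python/NumPy sketch of this linear triangulation (M1 and M2 are the two 3×4 projection matrices; in practice they come from calibration):

```python
import numpy as np

def skew(p: np.ndarray) -> np.ndarray:
    """Matrix [p_x] such that skew(p) @ b equals np.cross(p, b)."""
    x, y, z = p
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def triangulate(p1: np.ndarray, p2: np.ndarray, M1: np.ndarray, M2: np.ndarray) -> np.ndarray:
    """Linear triangulation: stack [p1_x] M1 P = 0 and [p2_x] M2 P = 0
    (p1, p2 are homogeneous pixels (u, v, 1)) and solve A P = 0 with SVD."""
    A = np.vstack([skew(p1) @ M1, skew(p2) @ M2])   # 6 x 4 homogeneous system
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1]                                      # null-space direction of A
    return P[:3] / P[3]                             # back from homogeneous coordinates
```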
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 73
ASL
Autonomous Systems Lab

Triangulation | non-linear approach

§  Find P that minimizes the Sum of Squared Reprojection Errors:

$SSRE = \| p_1 - M_1 P \|^2 + \| p_2 - M_2 P \|^2$
§  In practice: initialize P using linear approach and then minimize SSRE using Gauss-
Newton or Levenberg-Marquardt
P=?

Reprojected point: M1P


p2 : Observed point
p1 : Observed point
Reprojected point: M2P
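A sketch of this refinement using SciPy's Levenberg-Marquardt solver; it assumes the linear estimate P0 and the projection matrices M1, M2 from the previous slides are available:

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(P, p1, p2, M1, M2):
    """Stacked pixel residuals (observed minus reprojected) for both cameras;
    p1, p2 are the observed pixel coordinates (u, v)."""
    res = []
    for p_obs, M in ((p1, M1), (p2, M2)):
        q = M @ np.append(P, 1.0)          # reproject the current 3D estimate
        res.extend(p_obs - q[:2] / q[2])   # 2 residuals per camera
    return np.asarray(res)

# P0: initial 3D point from the linear (SVD) triangulation; minimizing the sum of
# squared residuals is exactly minimizing the SSRE above.
# result = least_squares(reprojection_residuals, P0, args=(p1, p2, M1, M2), method="lm")
# P_refined = result.x
```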

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 74
ASL
Autonomous Systems Lab

Triangulation | sensitivity to baseline

§  What’s the optimal baseline?


§  Too small:
§  Large depth error
§  Can you quantify the error as a function of the disparity?
§  Too large:
§  Object might be visible from one camera, but not the other
§  Difficult search problem for close objects

Autonomous Mobile Robots


Small Baseline          Large Baseline
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 75
ASL
Autonomous Systems Lab

SFM: Structure From Motion

§  Given image point correspondences xi ↔ xi′, determine R and T

§  Simultaneously estimate both the 3D geometry (structure) and the camera pose (motion):
solve via non-linear minimization of the reprojection errors.

[Figure: a 3D point observed as x in camera C and as x′ in camera C′, with relative pose (R, T)]

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 76
Image courtesy of Nader Salman
ASL
Autonomous Systems Lab

Multi-view SFM

§  Results of Structure from Motion on user images from flickr.com [Seitz & Szeliski, ICCV 2009]

Colosseum, Rome San Marco square, Venice


2,106 images, 819,242 points 14,079 images, 4,515,157 points
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 77
ASL
Autonomous Systems Lab

Optical Flow -- the most general (and challenging) version of motion


estimation: compute the motion of each pixel
§  Optical flow: vector field representing
motion of parts of an image in subsequent
frames

§  It computes the motion vectors of pixels in


the image (or a subset of them to be faster)

§  By minimizing the brightness/color difference between corresponding pixels, summed over
the image

§  Explains motion w.r.t. pixel intensity -- does it always work?
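For dense flow, a minimal sketch using OpenCV's Farnebäck algorithm (the frame file names are placeholders):

```python
import cv2

# Two consecutive grayscale frames (placeholder file names).
prev = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow: an HxWx2 field holding the (dx, dy) motion vector of every pixel,
# estimated by minimizing brightness differences over a pyramid of image resolutions.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```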
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 78
ASL
Autonomous Systems Lab

Event-based Cameras

§  Dynamic Vision Sensor (DVS)


§  Similar to the human retina:
captures intensity changes asynchronously instead of capturing image
frames at a fixed rate
✓  Low power
✓  High temporal resolution → tackles motion blur
✓  High dynamic range

Effects of motion blur

Work with Wilko Schwarting and Dragos Stanciu


Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 79
ASL
Autonomous Systems Lab

Event-based Cameras for Robot Navigation

Work with I. Alzugaray, W.


Schwarting & D. Stanciu
Autonomous Mobile Robots
Margarita Chli, Nick Lawrance and Roland Siegwart Perception II | 80
ASL
Autonomous Systems Lab

Autonomous Mobile Robots


Margarita Chli, Nick Lawrance and Roland Siegwart www.donparrish.com/FavoriteOpticalIllusion.html Perception II | 81
