
Autonomous Mobile Robots

"Position"
Localization Cognition
Global Map

Environment Model Path


Local Map

Perception Real World Motion Control


Environment

Perception: Sensors, Vision, Uncertainties, Line extraction from laser scans

Zürich Autonomous Systems Lab
© R. Siegwart, M. Chli and D. Scaramuzza, ETH Zurich - ASL


One picture, a thousand words
 Of all our senses, vision is the most powerful in aiding our perception of the 3D world around us
 The retina is ~1000 mm² and contains millions of photoreceptors (~120 million rods and ~7 million cones for colour sampling)
 Provides an enormous amount of information: a data rate of ~3 GB/s
 → a large proportion of our brain power is dedicated to processing the signals from our eyes

http://webvision.med.utah.edu/sretina.html
Human Visual Capabilities

 Our visual system is very sophisticated
 Humans can interpret images successfully under a wide range of conditions – even in the presence of very limited cues

Do we always get it right?

Count the black dots

Do we always get it right?

Which square is darker, A or B?

Vision for Robotics

 Enormous descriptive power of images
 → a lot of data to process (human vision involves 60 billion neurons!)
 It is not sensible to copy the biology, but to learn from it:
 capture light → convert it to a digital image → process it to get "salient" information
 Vision is increasingly popular as a sensing modality:
 compactness,
 compatibility,
 low cost, …
 HW advances are necessary to support the processing of images

Image: Nicholas M. Short

Computer Vision

Connection to other disciplines

Applications of Computer Vision

Today's Topics

Section 4.2 in the book

 Pinhole Camera Model

 Perspective Projection

 Stereo Vision

 Optical Flow

 Color Tracking

The camera

Sony Cybershot WX1

How do we see the world?

(Diagram: a piece of film placed in front of an object)

 Place a piece of film in front of an object
 Do we get a reasonable image?

Pinhole camera

(Diagram: a barrier with a small opening placed between object and film)

 Add a barrier to block off most of the rays
 This reduces blurring
 The opening is known as the aperture

Camera obscura
 Basic principle known to Mozi (470-390 BC) and Aristotle (384-322 BC)
 Drawing aid for artists: described by Leonardo da Vinci (1452-1519)
 The depth of the room (box) is the effective focal length

Solar eclipse
http://www.thelivingmoon.com/45jack_files/03files/Launch_Sites_Baikonur_Tour.html
Pinhole camera model

 Pinhole model:
 captures a beam of rays – all rays passing through a single point
 the point is called the Center of Projection or the Optical Center
 the image is formed on the Image Plane

 We will use the pinhole camera model to describe how the image is formed
Slide by Steve Seitz

Home-made pinhole camera

Why so blurry?
http://www.debevec.org/Pinhole/

Shrinking the aperture

 Why not make the aperture as small as possible?
 Less light gets through (must increase the exposure)
 Diffraction effects…
Images courtesy of Steve Seitz

Shrinking the aperture

Images courtesy of Steve Seitz


Why use a lens?
 Ideal pinhole: only one ray of light reaches each point
 → the image can be very dim
 Why not make the pinhole bigger? → aperture

 A lens can focus multiple rays coming from the same point

Image Formation using a converging lens

(Diagram: a lens between object and film)

 A lens focuses light onto the film
 Rays passing through the optical center are not deviated

Image formation using a converging lens

(Diagram: rays from an object pass through the lens; rays parallel to the Optical Axis converge at the Focal Point, at Focal Length f)

 A lens focuses light onto the film
 Rays passing through the center are not deviated
 All rays parallel to the Optical Axis converge at the Focal Point

Thin lens equation

(Diagram: object of height A at distance z from a lens with focal length f; its image of height B forms at distance e behind the lens)

 Similar triangles (object and image): $\frac{B}{A} = \frac{e}{z}$
 Similar triangles (through the focal point): $\frac{B}{A} = \frac{e-f}{f} = \frac{e}{f} - 1$
 Combining the two: $\frac{e}{z} = \frac{e}{f} - 1 \;\Rightarrow\; \frac{1}{f} = \frac{1}{z} + \frac{1}{e}$, the "thin lens equation"

Thin lenses

(Diagram: object at distance z, lens with focal length f, image at distance e)

 Thin lens equation: $\frac{1}{f} = \frac{1}{z} + \frac{1}{e}$
 Any object point satisfying this equation is in focus
 "Depth from Focus": use this to estimate (roughly) the distance to the object
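The thin lens equation can be solved for either unknown. A minimal Python sketch (the function names are ours; units are arbitrary but must be consistent):

```python
def image_distance(f, z):
    """Image distance e at which an object at depth z is in focus:
    1/f = 1/z + 1/e  =>  e = 1 / (1/f - 1/z)."""
    return 1.0 / (1.0 / f - 1.0 / z)


def depth_from_focus(f, e):
    """'Depth from Focus': invert the thin lens equation to estimate the
    depth z of an object that is in focus at image distance e."""
    return 1.0 / (1.0 / f - 1.0 / e)


print(image_distance(f=0.5, z=1.0))  # 1.0: an object at z = 1 focuses at e = 1
```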

“In focus”

(Diagram: object, lens with focal length f, and film; the Optical Axis passes through the Focal Point; an out-of-focus point spreads into a "Circle of Confusion", or "Blur Circle")

 There is a specific distance at which objects are "in focus"
 → other points project to a "blur circle" in the image

Blur Circle

(Diagram: lens with aperture diameter L; an object at distance z focuses on the Focal Plane at distance e; the Image Plane intersects the cone of rays in a Blur Circle of radius R)

 An object that is out of focus → its Blur Circle has radius $R = \frac{L\,\delta}{2e}$, where $\delta$ is the offset of the image plane from the focal plane
 A minimal L (pinhole) gives a minimal R
 For objects out of focus, a larger aperture gives worse blur
 Adjust the camera settings such that R remains smaller than the image resolution

Blur Circle

(Diagram as above, with the image plane at distance e′)

$R = \frac{L\,\delta}{2e}, \qquad \frac{1}{f} = \frac{1}{z} + \frac{1}{e}, \qquad \delta = e' - e$

 Let e′ = 1.2, L = 0.2 and f = 0.5. If z = 1, then e = 1, δ = 0.2 and R = 0.02
 Same setup, but now:
 z = 2 → R ≈ 0.08
 z = 10 → R ≈ 0.128
 z = 11 → R ≈ 0.129

(Plot: R against z, rising steeply for small z and flattening towards ≈ 0.14 for large z)

 Increased sensitivity to blurring when the object is close to the lens
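The worked example above is easy to reproduce; a quick numeric check in Python (blur_radius is our name, values from the slide):

```python
def blur_radius(f, e_prime, L, z):
    """Blur circle radius R = L * delta / (2e) for an object at depth z:
    e is the in-focus image distance (thin lens equation) and
    delta = e' - e is the offset of the actual image plane from it."""
    e = 1.0 / (1.0 / f - 1.0 / z)  # where the object actually focuses
    delta = e_prime - e
    return L * delta / (2.0 * e)


print(blur_radius(f=0.5, e_prime=1.2, L=0.2, z=1.0))  # 0.02
print(blur_radius(f=0.5, e_prime=1.2, L=0.2, z=2.0))  # ~0.08
```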

From Pinhole to Perspective Camera

(Diagram: a lens camera with optical center C and an object of height h at distance z forming an image of height h′, next to the equivalent pinhole camera with image plane at distance f)

C = "optical center", "center of projection"

 Adjust the image plane so that objects at infinity are in focus: as $z \to \infty$, the thin lens equation $\frac{1}{f} = \frac{1}{z} + \frac{1}{e}$ gives $e = f$
 Assuming z ≫ f, so that e ≈ f stays constant: $\frac{h'}{h} = \frac{f}{z} \;\Rightarrow\; h' = h\,\frac{f}{z}$

The dependence of the apparent size of objects on their distance from the observer is known as perspective.
Playing with Perspective

 Perspective gives us very strong depth cues → hence we can perceive a 3D scene by viewing its 2D representation (i.e. an image)
 When viewing 3D scenes, it is possible to be misled by perception

"Ames room"

(Video: a clip from "The computer that ate Hollywood" documentary, featuring Dr. Vilayanur S. Ramachandran)

Perspective Camera

(Diagram: optical center C at the center of the lens; camera axes Xc, Yc, Zc with Zc the optical axis; the image plane (CCD) at focal distance f, with principal point O; a 3D point Pc projects to the image point p = (u, v))

C = optical center = center of the lens; O = principal point; Zc = optical axis

 For convenience, the image plane is usually represented in front of C, so that the image preserves the same orientation (i.e. it is not flipped)
 Note: a camera does not measure distances but angles!
 → a camera is a "bearing sensor"
Perspective Projection: from world to pixel coords

(Diagram: world frame W with axes Xw, Yw, Zw; camera frame with axes Xc, Yc, Zc and optical center C; the two frames are related by [R|T]; the point Pw = Pc projects to p = (u, v) near the principal point O)

Find the pixel coordinates (u, v) of a point Pw given in the world frame:
0. Convert the world point Pw to a camera point Pc (using the extrinsics [R|T])
1. Convert Pc to image-plane coordinates (x, y)
2. Convert Pc to (discretised) pixel coordinates (u, v)

Perspective Projection (1)
From the Camera frame to the image plane

(Diagram: optical center C, principal point O, image plane at focal distance f)

 The 3D camera point Pc = (Xc, 0, Zc)ᵀ projects to the point p = (x, y) on the image plane
 Analysing similar triangles: $\frac{x}{f} = \frac{X_c}{Z_c} \;\Rightarrow\; x = \frac{f X_c}{Z_c}$
 For the full 3D case we likewise obtain: $\frac{y}{f} = \frac{Y_c}{Z_c} \;\Rightarrow\; y = \frac{f Y_c}{Z_c}$
Perspective Projection (2)
From the Camera frame to pixel coordinates

(Diagram: pixel frame with origin (0,0) in the image corner and axes u, v; local image-plane axes x, y centred on the principal point O = (u0, v0); image point p)

 To convert p from the local image-plane coords (x, y) to the pixel coords (u, v), we have to account for:
 the pixel coords of the camera optical center, O = (u0, v0)
 the scale factors ku, kv for the pixel size in the two dimensions

So: $u = u_0 + k_u x = u_0 + \frac{k_u f X_c}{Z_c}$ and $v = v_0 + k_v y = v_0 + \frac{k_v f Y_c}{Z_c}$

 Use Homogeneous Coordinates for a linear mapping from 3D to 2D, by introducing an extra element (scale):
$p = \begin{bmatrix} u \\ v \end{bmatrix} \;\rightarrow\; \tilde{p} = \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$, and similarly for the world coordinates. Note: usually λ = 1.
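The two conversion equations translate directly into code; a minimal sketch (camera_to_pixel is our name, the numeric values are illustrative):

```python
def camera_to_pixel(P_c, f, ku, kv, u0, v0):
    """Pinhole projection of a camera-frame point (Xc, Yc, Zc) to pixel
    coordinates (u, v), following the two equations above (no distortion)."""
    Xc, Yc, Zc = P_c
    u = u0 + ku * f * Xc / Zc
    v = v0 + kv * f * Yc / Zc
    return u, v


# e.g. an 8 mm lens with 10 um pixels (ku = kv = 1e5 px/m):
print(camera_to_pixel((0.1, -0.2, 2.0), f=0.008, ku=1e5, kv=1e5, u0=320, v0=240))
```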
Perspective Projection (3)

 So: $u = u_0 + \frac{k_u f X_c}{Z_c}, \qquad v = v_0 + \frac{k_v f Y_c}{Z_c}$

Expressed in matrix form and homogeneous coords:
$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} k_u f & 0 & u_0 \\ 0 & k_v f & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$

Or alternatively, with $\alpha_u = k_u f$ (the focal length in the u-direction) and $\alpha_v = k_v f$ (the focal length in the v-direction):
$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$

K is called the "Calibration matrix" or "Matrix of Intrinsic Parameters".
Perspective Projection (4)
From the Camera frame to the World frame

(Diagram: world frame W and camera frame related by [R|T], as before)

$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \\ t_3 \end{bmatrix}$

or, in homogeneous coords:
$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$ (the Extrinsic Parameters)

Combining intrinsics and extrinsics yields the Projection Matrix:
$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = K\,[R\,|\,T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$
Radial distortion

 Straight lines in the world → straight lines in the image?
 From ideal pixel coordinates (u, v) to distorted pixel coordinates (ud, vd)
 Simple distortion model with a single Radial Distortion parameter

(Figures: barrel distortion and pincushion distortion)
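The exact expression on the slide was lost in the export; a commonly used one-parameter radial model (our reconstruction, stated as an assumption rather than the slide's formula) scales image-plane coordinates by 1 + k1·r²:

```python
def distort(x, y, k1):
    """One-parameter radial distortion of ideal image-plane coordinates
    (x, y), measured from the principal point: r^2 = x^2 + y^2, and both
    coordinates are scaled by (1 + k1 * r^2).
    Commonly, k1 < 0 gives barrel distortion, k1 > 0 gives pincushion."""
    r2 = x * x + y * y
    s = 1.0 + k1 * r2
    return s * x, s * y
```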

Camera Calibration

 Use our camera model to interpret the projection from the world to the image plane
 Using known correspondences p ↔ P, we can compute the unknown parameters K, R, T by applying the perspective projection equation
 … i.e. associate known, physical distances in the world with pixel-distances in the image

Projection Matrix:
$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R\,|\,T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$
Camera Calibration
 We know that: $\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R\,|\,T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

 So there are 11 values to estimate (the overall scale does not matter, so e.g. m34 can be set to 1)

 Each observed point gives us a pair of equations:
$u_i = \frac{m_{11} X_i + m_{12} Y_i + m_{13} Z_i + m_{14}}{m_{31} X_i + m_{32} Y_i + m_{33} Z_i + m_{34}}, \qquad v_i = \frac{m_{21} X_i + m_{22} Y_i + m_{23} Z_i + m_{24}}{m_{31} X_i + m_{32} Y_i + m_{33} Z_i + m_{34}}$

 To estimate the 11 unknowns, we need at least 6 points to calibrate the camera → solved using linear least squares (see the sketch below)
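Rearranging each pair of equations into linear constraints on the m_ij gives a 2n × 11 system; a minimal numpy sketch of the least-squares step (calibrate_dlt is our name; m34 is fixed to 1 as suggested above):

```python
import numpy as np

def calibrate_dlt(P_w, p):
    """Estimate the 3x4 projection matrix M from n >= 6 correspondences.
    P_w: (n, 3) world points, p: (n, 2) pixel points.
    Multiplying out u = (m11 X + ... + m14) / (m31 X + ... + m34) with
    m34 = 1 gives two linear equations per observed point."""
    A, b = [], []
    for (X, Y, Z), (u, v) in zip(P_w, p):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        b.append(u)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        b.append(v)
    m, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return np.append(m, 1.0).reshape(3, 4)  # m34 fixed to 1
```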

Camera Calibration

$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = K\,[R\,|\,T] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$

 What we obtained is the 3×4 projection matrix; what we need is its decomposition into the camera calibration matrix K, and the rotation R and position T of the camera
 Use QR factorization to decompose the 3×3 submatrix (m11 … m33) into the product of an upper triangular matrix K and a rotation matrix R (an orthogonal matrix)
 The translation T can subsequently be obtained by: $T = K^{-1} \begin{bmatrix} m_{14} \\ m_{24} \\ m_{34} \end{bmatrix}$
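In code the decomposition is one RQ factorization plus a linear solve; a sketch using scipy (the sign correction resolves the ambiguity of the factorization so that K has a positive diagonal):

```python
import numpy as np
from scipy.linalg import rq

def decompose_projection(M):
    """Split a 3x4 projection matrix M = K [R|T] into intrinsics K,
    rotation R and translation T."""
    K, R = rq(M[:, :3])               # upper triangular times orthogonal
    S = np.diag(np.sign(np.diag(K)))  # sign fix: S @ S = I
    K, R = K @ S, S @ R
    T = np.linalg.solve(K, M[:, 3])   # T = K^-1 (m14, m24, m34)^T
    return K / K[2, 2], R, T          # normalise K's arbitrary scale
```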

How do we measure distances with cameras?
 It is impossible to recover 3D structure from a single image: we can only deduce the ray on which each image point lies

(Diagram: a 3D Object observed in a Left Image and a Right Image)

 Observe the scene from 2 different viewpoints → solve for the intersection of the rays and recover the 3D structure
Disparity in the human retina

How do we measure distances with cameras?

 Structure from stereo (stereo vision): use two cameras with known relative position and orientation
 Structure from motion: use a single moving camera; both the 3D structure and the camera motion can be estimated, up to a scale

Stereo Vision - The simplified case

 An ideal, simplified case: assume both cameras are identical and aligned along a horizontal axis

(Diagram: point P = (X_P, Y_P, Z_P); left and right cameras with optical centers Cl and Cr separated by the baseline b, focal length f, and image coordinates ul and ur)

From similar triangles: $\frac{f}{Z_P} = \frac{u_l}{X_P}$ and $\frac{f}{Z_P} = \frac{-u_r}{b - X_P}$, hence
$Z_P = \frac{b\,f}{u_l - u_r}$

 Disparity: the difference in the image location of the projection of a 3D point in the two image planes
 Baseline: the distance between the optical centers of the two cameras
 "Triangulation": the intersection of rays to estimate the scene depth (see the sketch below)
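Once the images are aligned like this, depth really is one division; a minimal sketch with illustrative numbers:

```python
def depth_from_disparity(b, f, u_l, u_r):
    """Depth Z_P = b*f / (u_l - u_r) for a point seen at horizontal pixel
    positions u_l, u_r in an aligned stereo pair with baseline b and
    focal length f (here expressed in pixels)."""
    return b * f / (u_l - u_r)


print(depth_from_disparity(b=0.12, f=700.0, u_l=400.0, u_r=365.0))  # 2.4 m
```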

Disparity map

 Disparity = the difference in the image location of the projection of a 3D point in two image planes

(Figure: left image, right image, and the resulting disparity map)

 The disparity map holds the disparity value at every pixel:
 find the corresponding points of all image pixels of the original images
 compute the disparity for each pair of correspondences
 Usually visualised as a gray-scale image

 Close objects have bigger disparity → they appear brighter in the disparity map

Stereo Vision facts

$Z_P = \frac{b\,f}{u_l - u_r}$

 Depth is inversely proportional to the disparity $(u_l - u_r)$
 → foreground objects have bigger disparity than background objects
 Disparity is proportional to the stereo baseline b
 The smaller the baseline, the smaller the disparities, and the more uncertain our estimate of the scene depth
 Note: when increasing b, some objects may become visible from only one camera
 The projections of a single 3D point onto the left and the right stereo images are called a 'correspondence pair'

Stereo Vision – the general case

 Two identical cameras do not exist in nature!
 Aligning both cameras along a horizontal axis is very hard

(Diagram: two cameras related by a rotation and translation (R, T))

 In order to be able to use a stereo camera, we need:
 the relative pose between the cameras (rotation, translation), and
 the focal length, optical center and radial distortion of each camera
 Use a calibration method, as mentioned before
Stereo Vision – the general case

 To estimate the 3D position of a point Pw = (X_w, Y_w, Z_w), we simply construct the system of equations of the left and the right camera (see the sketch below):

Left camera (we set the world frame to coincide with the left camera frame):
$\tilde{p}_l = \lambda_l \begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} = K_l \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}$

Right camera (at relative pose (R, T)):
$\tilde{p}_r = \lambda_r \begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} = K_r \left( R \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T \right)$
Correspondence Problem

 Which patch in the Left image corresponds to the projection of the same 3D scene point in the Right image?
 Correspondence search: compare the query patch against candidate patches at pixel positions in the other image
 Typical similarity measures are the Normalised Cross-Correlation (NCC) and the Sum of Squared Differences (SSD)
 Exhaustive image search can be computationally very expensive!
Can we reduce the correspondence search to 1D?

Epipolar Geometry

 The epipolar plane is defined by a 3D point P = (x, y, z) and the two optical centers C1 and C2

(Diagram: P projects to p1 = (u1, v1) on Epipolar Line 1 and to p2 = (u2, v2) on Epipolar Line 2; the baseline C1–C2 intersects the image planes at the epipoles E1 and E2)

 Impose the epipolar constraint to aid matching: search for a correspondence along the epipolar line
Correspondence Problem: Epipolar Constraint

 Thanks to the epipolar constraint, conjugate points can be searched for along epipolar lines: this reduces the computational cost to 1 dimension (see the sketch below)!
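For rectified images the epipolar lines are the image rows, and the 1D search becomes a simple loop; a brute-force SSD sketch (ssd_search_1d is our name):

```python
import numpy as np

def ssd_search_1d(left, right, row, col, half=5):
    """Return the column in `right` whose patch best matches (smallest SSD)
    the patch centred at (row, col) in `left`, searching along the same
    row only (valid once epipolar lines are horizontal)."""
    patch = left[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    best_col, best_ssd = None, np.inf
    for c in range(half, right.shape[1] - half):
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(float)
        ssd = np.sum((patch - cand) ** 2)
        if ssd < best_ssd:
            best_col, best_ssd = c, ssd
    return best_col


left = np.random.default_rng(1).integers(0, 255, (100, 160), dtype=np.uint8)
right = np.roll(left, -7, axis=1)                  # synthetic 7-pixel disparity
print(ssd_search_1d(left, right, row=50, col=80))  # -> 73
```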

Epipolar Rectification

 Determine a transformation of each image plane so that pairs of conjugate epipolar lines become collinear and parallel to one of the image axes (usually the horizontal one)
 The transformation must compensate for the differences between the two cameras: rotation, focal lengths, lens distortion and translation

(Figures: images from the Left and Right cameras, stepwise corrected for rotation, focal lengths, lens distortion and translation)
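In practice the rectifying transforms are computed from the stereo calibration, e.g. with OpenCV; a sketch with illustrative, made-up calibration values:

```python
import cv2
import numpy as np

w, h = 640, 480
K_l = K_r = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
d_l = d_r = np.zeros(5)         # pretend there is no lens distortion
R = np.eye(3)                   # relative rotation between the cameras
T = np.array([0.12, 0.0, 0.0])  # 12 cm horizontal baseline

R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K_l, d_l, K_r, d_r, (w, h), R, T)

# Per-pixel map that undistorts and rectifies the left image.
map_lx, map_ly = cv2.initUndistortRectifyMap(K_l, d_l, R1, P1, (w, h), cv2.CV_32FC1)
left = np.zeros((h, w), np.uint8)  # stand-in for a real left image
rect_left = cv2.remap(left, map_lx, map_ly, cv2.INTER_LINEAR)
```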

Stereo Vision Output: 3D Reconstruction via triangulation
(Figure: a reconstructed 3D point cloud plotted against the X, Y and Z axes)
Stereo Vision - summary

(Diagram: a 3D Object observed in a Left Image and a Right Image)

1. Stereo camera calibration → compute the cameras' relative pose
2. Epipolar rectification → align the images and the epipolar lines
3. Search for correspondences
4. Output: compute the stereo triangulation or the disparity map
5. Consider how the baseline and the image resolution affect the accuracy of the depth estimates
Structure from motion

 Given image point correspondences $x_i \leftrightarrow x_i'$, determine R and T
 Rotate and translate the camera until the bundles of rays intersect
 At least 5 point correspondences are needed (for calibrated cameras)

(Diagram: camera centers C and C′ related by (R, T), with corresponding viewing rays x and x′)
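With calibrated cameras, the relative pose can be recovered from five or more correspondences, e.g. with OpenCV's five-point algorithm inside RANSAC; a sketch on synthetic data (all values are illustrative):

```python
import cv2
import numpy as np

K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])

# Synthetic scene: random 3D points seen from two poses (known ground truth).
rng = np.random.default_rng(0)
P = rng.uniform([-1, -1, 4], [1, 1, 8], size=(50, 3))
R_true, _ = cv2.Rodrigues(np.array([0.0, 0.2, 0.0]))  # small rotation
t_true = np.array([[1.0], [0.0], [0.0]])

def project(P, R, t):
    p = (K @ (R @ P.T + t)).T
    return p[:, :2] / p[:, 2:]

pts1 = project(P, np.eye(3), np.zeros((3, 1)))
pts2 = project(P, R_true, t_true)

E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, T, _ = cv2.recoverPose(E, pts1, pts2, K)  # T is recovered up to scale
```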
Multiple-view structure from motion

Image courtesy of Nader Salman

Multiple-view structure from motion
 Results of structure from motion on user images from flickr.com [Seitz, Szeliski ICCV 2009]

Colosseum, Rome: 2,106 images, 819,242 points. San Marco square, Venice: 14,079 images, 4,515,157 points.

Optical Flow (1)

 Optical flow: a vector field representing the motion of parts of an image in subsequent frames

Optical Flow

 It computes the motion vectors of all pixels in the image (or of a subset of them, to be faster)
 Applications include collision avoidance
 Does it always work?
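As an illustration, a dense flow field (one motion vector per pixel) can be computed with e.g. OpenCV's Farnebäck method; a sketch with typical parameter values, using blank frames as stand-ins for real camera images:

```python
import cv2
import numpy as np

prev_gray = np.zeros((480, 640), np.uint8)  # stand-ins for two consecutive
next_gray = np.zeros((480, 640), np.uint8)  # grayscale camera frames

flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
print(flow.shape)  # (480, 640, 2): one (dx, dy) vector per pixel
```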

Color Tracking
 Motion estimation of the ball and the robots in robot soccer, using color tracking

Color segmentation with fixed thresholds

 The simplest way of doing this is constant thresholding: a given pixel is selected if and only if its RGB values (r, g, b) fall simultaneously within the desired ranges:
$r_{min} \le r \le r_{max}, \qquad g_{min} \le g \le g_{max}, \qquad b_{min} \le b \le b_{max}$

 Alternatively, use the YUV color space:
 while the R, G and B values each mix color and intensity, YUV separates the color measure (chrominance, captured in U and V) from the brightness measure (luminosity, captured in Y)
 → thresholding in YUV space can achieve greater stability to illumination changes than in RGB space (see the sketch below)
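A fixed-threshold segmentation takes only a few lines, e.g. with OpenCV; a sketch in which the threshold values are made up and would be tuned for the target color:

```python
import cv2
import numpy as np

bgr = np.zeros((480, 640, 3), np.uint8)  # stand-in for a camera frame
yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)

# A pixel is selected iff its Y, U and V values all fall inside the ranges.
lower = np.array([0, 100, 140], np.uint8)    # illustrative bounds, e.g. for
upper = np.array([255, 130, 180], np.uint8)  # a coloured RoboCup ball
mask = cv2.inRange(yuv, lower, upper)        # 255 where selected, else 0
```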

Color Tracking in RoboCup
 A typical play


Image from http://www.donparrish.com/FavoriteOpticalIllusion.html

