Lecture16 Stereo

Stereopsis-III Motion -I
Epipolar Geometry for Parallel Cameras
Ol T
Or f el er
P Epipoles are at infinity Epipolar lines are parallel to the baseline
We can always achieve this geometry with image rectification
Image Reprojection
reproject image planes onto common plane parallel to line between optical centers
Notice, only focal point of camera really matters
(Seitz)
Stereo Correspondences
Left scanline
Right scanline
Dynamic Programming: Assume ordering constraint: a matched pair means no other matches can occur to their left on either scanline
Possibilities at each pixel pair

At any pixel all possibilities are to the right pixels on the left have already been matched Thus any global optimal match will contain a local optimal match Match pixel pair Goodness of match determined by some cost-function Or, we cannot match pixels because:
1. Occlusion in the right image 2. Occlusion in the left image
Possibilities at each match Const.
Search Over Correspondences

Occluded Pixels Left scanline
Right scanline Disoccluded Pixels
Three cases:
Sequential add cost of match (small if intensities agree) Occluded add cost of no match (large cost) Disoccluded add cost of no match (large cost)
Stereo Matching with Dynamic Programming

Occluded Pixels Left scanline Dynamic programming yields the optimal path through grid. This is the best set of matches that satisfy the ordering constraint
Start
Dis-occluded Pixels
Right scanline
End
Dynamic Programming
i =1
States: 1
12 22
32
i=2 i=3
Ct 1
Ct
Ct +1
Suppose cost can be decomposed into stages:
ij = Cost of going from state i to state j
Dynamic Programming
i =1 i=2 i=3
1
12 22 32
j=2
Ct 1
Ct
Ct +1
Principle of Optimality for an n-stage assignment problem:
Ct ( j ) = min i ( ij + Ct 1 (i ))
Algorithm
Decide which cost is minimum Start from end
Structure-from-Motion
Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving camera.
Equivalently, we can think of the world as moving and the camera as fixed.
Like stereo, but the position of the camera isnt known (and its more natural to use many images with little motion between them, not just two with a lot of motion).
We may or may not assume we know the parameters of the camera, such as its focal length.
As with stereo, we can divide problem:
Correspondence. Reconstruction.
Again, well talk about reconstruction first.

Assume that each image contains some points, we know which points match which
Cases to consider
Moving camera (moving human) Moving objects in the scene General case: Both are moving
We must relate motion in the real world with motion in images
Reconstruction
A lot harder than with stereo. Start with simpler case: scaled orthographic projection (weak perspective).
Recall, in this we remove the z coordinate and scale all x and y coordinates the same amount. We will consider a fixed camera and a moving scene
First: Represent motion

Well talk about a fixed camera, and moving object. Key point: Some matrix Points
x1 y1 P = z1 1
x2 xn y2 . . . yn z2 zn 1 1
r1,1 r1,2 S = r2,1 r2,2 r 3,1 r3,2
r1,3 t x r2,3 t y r3,3 t z
The image (ignoring z) X X . . . X I = Y Y Y

1 2 n 1 2 n
Then:
I = SP
Objects represented as a moving set of points These are projected into the image Projection is represented with Matrix multiplication take inner product between each row of S and each point. First row of S produces X coordinates, while second row produces Y. Projection model is scaled orthographic here from now on we ignore third row of S. Translation occurs with tx and ty; tz does not matter Scaling encoded with a scale factor in S. The rest of S accounts for rotation
Examples: S = [s, 0, 0, 0; 0, s, 0, 0]; This is just projection, with scaling by s. S = [s, 0, 0, s*tx; 0, s, 0, s*ty]; This is translation by (tx,ty,something), projection, and scaling.
S encodes:
Projection: only two lines Scaling, since S can have a scale factor. Translation, by tx and ty. Rotation:
I = SP
Rotation
r1,1 r2,1 r 3,1 r1,3 r2,3 P r3,3
r1, 2 r2, 2 r3, 2
Represents a 3D rotation of the points in P.
First, look at 2D rotation (easier)

Matrix R acts on points by rotating them.
cos R= sin
x2 y2 . .
sin cos
. xn yn
cos sin
sin x1 cos y1
Also, RRT = Identity. RT is also a rotation matrix, in the opposite direction to R.
Why does multiplying points by R rotate them? rows of R are basis vectors of a new coordinate system Taking inner products of each point with these expresses that point in that coordinate system. This means rows of R must be orthonormal vectors (orthogonal unit vectors). points (1,0) and (0,1) go to (cos , -sin ), and (sin , cos ). They remain orthonormal, and rotate clockwise by theta. Any other point, (a,b) can be written as a(1,0) + b(0,1). So R(a(1,0)+b(0,1) = Ra(1,0) + Ra(0,1) = aR(1,0) + bR(0,1). So its in the same position relative to the rotated coordinates that it was in before rotation relative to the x, y coordinates. That is, its rotated.
Simple 3D Rotation
cos sin 0 sin cos 0 0 x 0 y 1 z x y z . . . x y z
n n n
Rotation about z axis. Rotates x,y coordinates. Leaves z coordinates fixed.
Full 3D Rotation
cos R = sin 0 sin cos 0 0 cos 0 0 sin 1 0 sin 1 0 1 0 0 cos 0 sin 0 cos 0 sin cos
Any rotation can be expressed as combination of three rotations about three axes.
1 0 0 RR = 0 1 0 0 0 1
T
Rows (and columns) of R are orthonormal vectors. R has determinant 1 (not -1).
3D rotations can be expressed as 3 separate rotations about fixed axes. Rotations have 3 degrees of freedom; two describe an axis of rotation, and one the amount. Order of rotations matter Typically rotations described using Euler angles Rotations preserve the length of a vector, and the angle between two vectors. Axes, (1,0,0), (0,1,0), (0,0,1) must remain orthonormal after rotation. After rotation, they are the three columns of R. So these columns must be orthonormal vectors for R to be a rotation. Same reasoning as 2D tells us all other points rotate too. If R has determinant -1, then R is a rotation plus a reflection.
Putting it Together
Scale
1 0 0 t r 1 0 0 s 0 1 0 t r 0 1 0 0 0 1 t 0 Projection
x y z
3D Translation r
1,1
2 ,1
r r r
1, 2
2,2
r r r
1, 3
2,3
3 ,1
3, 2
3,3
0 0 P 0 1
3D Rotation
s s s s where
1,1 2 ,1 1,1 1, 2
1, 2
s s
1, 3
2,2
2,3
st P st
x y
(s , s , s ) (s , s , s ) = 0
1, 3 2 ,1 2,2 2,3
We can just write stx as tx and sty as ty.
(s , s , s ) = (s , s , s )
1,1 1, 2 1, 3 2 ,1 2,2 2,3
Affine Structure from Motion
s s t s P s s s t where (s , s , s ) (s , s , s ) = 0
1,1 1, 2 1, 3 x 2 ,1 2,2 2,3 y 1,1 1, 2 1, 3 2 ,1 2,2 2,3
(s , s , s ) = (s , s , s )
1,1 1, 2 1, 3 2 ,1 2,2 2,3
Affine Structure-from-Motion: Two Frames (1)
u v u v
1 1 1
u v u v
1 2 1 2 2 2 2
1 2
1 2
u s v s = s u v s
1 n 1 n 2 n 2 n
1,1 1
2 ,1 2
1,1 2
s s s s
1, 2 1
2,2 2
1, 2 2 2,2
s s s s
1, 3 1
2,3 2
1, 3 2 2,3
2 ,1
x y z t 1 t t t
1 x 1 y 2 x 2 y
x y
z 1
x y z 1
n n n
To make things easy, suppose:
x y z 1
y z 1
y z 1
x 0 y 0 = z 0 1 1
4 4 4
1 0 0 0 1 0 0 0 1 1 1 1

u v u v
1 1 1
u v u v
1 2 1 2 2 2 2
1 2
1 2
u s v s = s u v s
1 n 1 n 2 n 2 n
1,1 1
s s s s
1, 2 1 2,2 2
s s s s
1, 3 1 2,3 2
2 ,1 2
1,1 2
1, 2 2 2,2
1, 3 2 2,3
2 ,1
t x t y t z t 1
1 x 1 y 2 x 2 y
z 1
x y z 1
n n n
Looking at the first four points, we get:
u v u v
1 1
1 2
1 2
u v u v
1 2 1 2 2 2 2
u v u v
1 3 1
3 2
3 2
u v u
s s =s v s
1 4 1 4 2 4 2 4
1,1 1
2 ,1 2
1,1 2
2 ,1
s s s s
1, 2 1
2,2 2
1, 2 2
2,2
s s s s
1, 3 1
2,3 2
1, 3 2
2,3
t t t t
1 x 1 y 2 x 2 y
0 0 0 1
1 0 0 1
0 1 0 1
0 0 1 1
Affine Structure-from-Motion: Two Frames (4) u u u u s s s t 0 1 0 0 v v v v s s s t 0 0 1 0 = u u u u s s s t 0 0 0 1 1 1 1 1 v v v v s s s t

1 1 1 1 1 1 1 1 1 2 3 4 1,1 1 1, 2 1 1, 3 1 x 1 1 1 1 1 1 2 3 4 2 ,1 2 2, 2 2 2, 3 2 y 2 2 2 2 2 1 2 3 4 1,1 2 1, 2 2 1, 3 2 x 2 2 2 2 2 1 2 3 4 2 ,1 2, 2 2, 3 y
We can solve for motion by inverting matrix of points. Or, explicitly, we see that first column on left (images of first point) give the translations. After solving for these, we can solve for the each column of the s components of the motion using the images of each point, in turn.
u s v s = u s v s
1 k 1 k 2 k 2 k
1,1 1
1, 2 1
1, 3 1
2 ,1 2
2, 2 2
2, 3 2
1,1 2
2 ,1
s s
1, 2 2
2, 2
s s
1, 3 2
2, 3
t a t b t c t 1
1 x k 1 y k 2 x k 2 y
Once we know the motion, we can use the images of another point to solve for the structure. We have four linear equations, with three unknowns.
Affine Structure-from-Motion: Two Frames (6) Suppose we just know where the kth point is in image 1.
u s s s t a v s s s t b = ? s s s t c ? s s s t 1 Then, we can use the first two equations to write a and b as linear
1 1 1 1 1 k 1,1 1 1, 2 1 1, 3 1 x k 1 1 k 2,1 2 2, 2 2 2, 3 2 y k 2 1,1 2 1, 2 2 1, 3 2 x k 2 2,1 2, 2 2, 3 y
k k
in ck. The final two equations lead to two linear equations in the missing values and ck. If we eliminate ck we get one linear equation in the missing values. This means the unknown point lies on a known line. That is, we recover the epipolar constraint. Furthermore, these lines are all parallel.

But, what if the first four points arent so simple? Then we define A so that:
x y A z 1
1 1 1
x x x 0 y y y 0 = z z z 0 1 1 1 1
2 3 4 2 3 4 2 3 4
1 0 0 0 1 0 0 0 1 1 1 1
This is always possible as long as the points arent coplanar.

Then, given:
u v u v
1 1 1
u v u v
1 2 1 2 2 2 2
1 2
1 2
u s v s = s u s v
1 n 1 n 2 n 2 n
1,1 1
2 ,1 2
s s s s
1, 2 1
2,2 2
s s s s
1, 3 1
2,3 2
1,1 2
1, 2 2
1, 3 2
2 ,1
2,2
2,3
t x t y t z 1 t
1 x 1 y 2 x 2 y
x y
z 1
x y z 1
n n n
We have:
u v u v
u v u v
1 1 1
1 1 1
1 2
u v u v
u v u v
1 2 1 2 2 2 2 2
1 2 1 2 2 2 2
1 2
u s v s = s u v s
1 n 1 n 2 n 2 n
1 1 n 1,1 1 1
1,1 1
2 ,1 2
1,1 2
s s s s
1 1, 2 1
1, 2 1
2,2 2
1, 2 2 2,2
s s s s
1 1, 3 1
1, 3 1
2,3 2
1, 3 2 2,3
2 ,1
x y A A z 1 t t t t
1 x 1 y 1 2 x 2 y
1 x 1
x y z
x y z
1
. . .
1
n n n
n
And:
1 2
1 2
u s v s = s u v s
n 2 n 2 n
s s s s
s s s s
2 ,1 2
2,2 2
2,3 2
1,1 2
1, 2 2
1, 3 2
2 ,1
2,2
2,3
t t A t t
y 2 x 2 y
0 0 0 1
1 0 0 0 1 0 0 0 1 1 1 1
x y z 1
n n

u v u v
1 1 1
u v u v
1 2 1 2 2 2 2
Given:
1 2
1 2
u s v s = u s v s
1 n 1 n 2 n 2 n
1,1 1
s s s s
1, 2 1 2,2 2
s s s s
1, 3 1 2,3 2
2 ,1 2
1,1 2
1, 2 2
1, 3 2
2 ,1
2,2
2,3
t t A t t
1 x 1 y 2 x 2 y
0 0 0 1
1 0 0 0 1 0 0 0 1 1 1 1
x y z 1
n n n
Then we just pretend that:
s s s s
1,1 1
2 ,1 2
1,1 2
s s s s
1, 2 1
2,2 2
1, 2 2 2,2
s s s s
1, 3 1
2,3 2
1, 3 2 2,3
2 ,1
A t t t t
1 x 1 y 2 x 2 y
is our motion, and solve as before.

This means that we can never determine the exact 3D structure of the scene. We can only determine it up to some transformation, A. Since if a structure and motion explains the points:
u v u v
1
1 1
1 2
1 2
u . . . u s v v s = s u u v v s
1 1 2 n 1 1 2 n 2 2 2 n 2 2 2 n
1,1 1
s s s s
1 n
1, 2 1
s s s s
1 1,1 1
1, 3 1
2 ,1 2
2, 2 2
2, 3 2
1,1 2
1, 2 2
1, 3 2
2 ,1
2, 2
2, 3
t x t y t z t 1
1 x 1 1 y 1 2 x 1 2 y
1 1, 2 1
x . . . x y y
2 n 2 n
z 1
2
z 1
n
1
So does another image of the form:
u v u v
1 1
u v u v
1 2 1 2 2 2 2
1 2
1 2
u s v s = u s v s
1 n 2 n 2 n
2 ,1 2
s s
2,2 2
s s
1, 3 1
2,3 2
1,1 2
2 ,1
s s
1, 2 2
2,2
s s
1, 3 2
2,3
t t A t t
1 x 1 y 2 x 2 y
x y A z 1
x y
z 1
x y z 1
n n n
Affine Structure-from-Motion: Two Frames (11) u u . . . u s s s t x x . . .

v v u u v v
1 1 1 1 2 1 2 1 1 2 1 2 2 2 2
v s = s u s v
1 n 1 n 2 n 2 n
1,1 1
1, 2 1
1, 3 1
2,1 2
1,1 2
s s s
2, 2 2
1, 2 2
s s s
2, 3 2
1, 3 2
2,1
2, 2
2, 3
t y y A A t z z t 1 1
1 x 1 2 1 y 1 1 2 2 x 1 2 2 y
x y z 1
n n n
Note that A has the form: A corresponds to translation of the points, plus a linear transformation.
a a a a a a a a A= a a a a 0 0 0 1
1,1 1, 2 1, 3 1, 4 2 ,1 2, 2 2, 3 2, 4 3, 1 3, 2 3, 3 3, 4
For example, there is clearly a translational ambiguity in recovering the points. We cant tell the difference between two point sets that are identical up to a translation when we only see them after they undergo an unknown translation. Similarly, theres clearly a rotational ambiguity. The rest of the ambiguity is a stretching in an unknown direction.

Lecture16 Stereo

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture16 Stereo

Uploaded by

Copyright:

Available Formats

Stereopsis-III Motion -I

Epipolar Geometry for Parallel Cameras

P Epipoles are at infinity Epipolar lines are parallel to the baseline

We can always achieve this geometry with image rectification

Possibilities at each pixel pair

Possibilities at each match Const.

Search Over Correspondences

Right scanline Disoccluded Pixels

Stereo Matching with Dynamic Programming

Suppose cost can be decomposed into stages:

ij = Cost of going from state i to state j

Principle of Optimality for an n-stage assignment problem:

Again, well talk about reconstruction first.

We must relate motion in the real world with motion in images

First: Represent motion

r1,1 r1,2 S = r2,1 r2,2 r 3,1 r3,2

r1,3 t x r2,3 t y r3,3 t z

The image (ignoring z) X X . . . X I = Y Y Y

r1, 2 r2, 2 r3, 2

Represents a 3D rotation of the points in P.

First, look at 2D rotation (easier)

Also, RRT = Identity. RT is also a rotation matrix, in the opposite direction to R.

Rotation about z axis. Rotates x,y coordinates. Leaves z coordinates fixed.

We can just write stx as tx and sty as ty.

Affine Structure from Motion

Affine Structure-from-Motion: Two Frames (1)

Affine Structure-from-Motion: Two Frames (2)

To make things easy, suppose:

Affine Structure-from-Motion: Two Frames (3)

Looking at the first four points, we get:

Affine Structure-from-Motion: Two Frames (4) u u u u s s s t 0 1 0 0 v v v v s s s t 0 0 1 0 = u u u u s s s t 0 0 0 1 1 1 1 1 v v v v s s s t

Affine Structure-from-Motion: Two Frames (5)

Affine Structure-from-Motion: Two Frames (7)

This is always possible as long as the points arent coplanar.

Affine Structure-from-Motion: Two Frames (8)

Affine Structure-from-Motion: Two Frames (9)

Then we just pretend that:

is our motion, and solve as before.

Affine Structure-from-Motion: Two Frames (10)

So does another image of the form:

Affine Structure-from-Motion: Two Frames (11) u u . . . u s s s t x x . . .

You might also like