Professional Documents
Culture Documents
Ol T
Or f el er
Image Reprojection
reproject image planes onto common plane parallel to line between optical centers
Notice, only focal point of camera really matters
(Seitz)
Stereo Correspondences
Left scanline
Right scanline
Dynamic Programming: Assume ordering constraint: a matched pair means no other matches can occur to their left on either scanline
Three cases:
Sequential add cost of match (small if intensities agree) Occluded add cost of no match (large cost) Disoccluded add cost of no match (large cost)
Start
Dis-occluded Pixels
Right scanline
End
Dynamic Programming
i =1
States: 1
12 22
32
i=2 i=3
Ct 1
Ct
Ct +1
Dynamic Programming
i =1 i=2 i=3
1
12 22 32
j=2
Ct 1
Ct
Ct +1
Ct ( j ) = min i ( ij + Ct 1 (i ))
Algorithm
Decide which cost is minimum Start from end
Structure-from-Motion
Determining the 3-D structure of the world, and/or the motion of a camera using a sequence of images taken by a moving camera.
Equivalently, we can think of the world as moving and the camera as fixed.
Like stereo, but the position of the camera isnt known (and its more natural to use many images with little motion between them, not just two with a lot of motion).
We may or may not assume we know the parameters of the camera, such as its focal length.
Structure-from-Motion
As with stereo, we can divide problem:
Correspondence. Reconstruction.
Cases to consider
Moving camera (moving human) Moving objects in the scene General case: Both are moving
Structure-from-Motion
Reconstruction
A lot harder than with stereo. Start with simpler case: scaled orthographic projection (weak perspective).
Recall, in this we remove the z coordinate and scale all x and y coordinates the same amount. We will consider a fixed camera and a moving scene
x1 y1 P = z1 1
x2 xn y2 . . . yn z2 zn 1 1
Then:
I = SP
Objects represented as a moving set of points These are projected into the image Projection is represented with Matrix multiplication take inner product between each row of S and each point. First row of S produces X coordinates, while second row produces Y. Projection model is scaled orthographic here from now on we ignore third row of S. Translation occurs with tx and ty; tz does not matter Scaling encoded with a scale factor in S. The rest of S accounts for rotation
Examples: S = [s, 0, 0, 0; 0, s, 0, 0]; This is just projection, with scaling by s. S = [s, 0, 0, s*tx; 0, s, 0, s*ty]; This is translation by (tx,ty,something), projection, and scaling.
Structure-from-Motion
S encodes:
Projection: only two lines Scaling, since S can have a scale factor. Translation, by tx and ty. Rotation:
I = SP
Rotation
r1,1 r2,1 r 3,1 r1,3 r2,3 P r3,3
cos R= sin
x2 y2 . .
sin cos
. xn yn
cos sin
sin x1 cos y1
Why does multiplying points by R rotate them? rows of R are basis vectors of a new coordinate system Taking inner products of each point with these expresses that point in that coordinate system. This means rows of R must be orthonormal vectors (orthogonal unit vectors). points (1,0) and (0,1) go to (cos , -sin ), and (sin , cos ). They remain orthonormal, and rotate clockwise by theta. Any other point, (a,b) can be written as a(1,0) + b(0,1). So R(a(1,0)+b(0,1) = Ra(1,0) + Ra(0,1) = aR(1,0) + bR(0,1). So its in the same position relative to the rotated coordinates that it was in before rotation relative to the x, y coordinates. That is, its rotated.
Simple 3D Rotation
cos sin 0 sin cos 0 0 x 0 y 1 z x y z . . . x y z
n n n
Full 3D Rotation
cos R = sin 0 sin cos 0 0 cos 0 0 sin 1 0 sin 1 0 1 0 0 cos 0 sin 0 cos 0 sin cos
Any rotation can be expressed as combination of three rotations about three axes.
1 0 0 RR = 0 1 0 0 0 1
T
Rows (and columns) of R are orthonormal vectors. R has determinant 1 (not -1).
3D rotations can be expressed as 3 separate rotations about fixed axes. Rotations have 3 degrees of freedom; two describe an axis of rotation, and one the amount. Order of rotations matter Typically rotations described using Euler angles Rotations preserve the length of a vector, and the angle between two vectors. Axes, (1,0,0), (0,1,0), (0,0,1) must remain orthonormal after rotation. After rotation, they are the three columns of R. So these columns must be orthonormal vectors for R to be a rotation. Same reasoning as 2D tells us all other points rotate too. If R has determinant -1, then R is a rotation plus a reflection.
Putting it Together
Scale
1 0 0 t r 1 0 0 s 0 1 0 t r 0 1 0 0 0 1 t 0 Projection
x y z
3D Translation r
1,1
2 ,1
r r r
1, 2
2,2
r r r
1, 3
2,3
3 ,1
3, 2
3,3
0 0 P 0 1
3D Rotation
s s s s where
1,1 2 ,1 1,1 1, 2
1, 2
s s
1, 3
2,2
2,3
st P st
x y
(s , s , s ) (s , s , s ) = 0
1, 3 2 ,1 2,2 2,3
(s , s , s ) = (s , s , s )
1,1 1, 2 1, 3 2 ,1 2,2 2,3
s s t s P s s s t where (s , s , s ) (s , s , s ) = 0
1,1 1, 2 1, 3 x 2 ,1 2,2 2,3 y 1,1 1, 2 1, 3 2 ,1 2,2 2,3
(s , s , s ) = (s , s , s )
1,1 1, 2 1, 3 2 ,1 2,2 2,3
u v u v
1 1 1
u v u v
1 2 1 2 2 2 2
1 2
1 2
u s v s = s u v s
1 n 1 n 2 n 2 n
1,1 1
2 ,1 2
1,1 2
s s s s
1, 2 1
2,2 2
1, 2 2 2,2
s s s s
1, 3 1
2,3 2
1, 3 2 2,3
2 ,1
x y z t 1 t t t
1 x 1 y 2 x 2 y
x y
z 1
x y z 1
n n n
x y z 1
y z 1
y z 1
x 0 y 0 = z 0 1 1
4 4 4
1 0 0 0 1 0 0 0 1 1 1 1
u v u v
1 2 1 2 2 2 2
1 2
1 2
u s v s = s u v s
1 n 1 n 2 n 2 n
1,1 1
s s s s
1, 2 1 2,2 2
s s s s
1, 3 1 2,3 2
2 ,1 2
1,1 2
1, 2 2 2,2
1, 3 2 2,3
2 ,1
t x t y t z t 1
1 x 1 y 2 x 2 y
z 1
x y z 1
n n n
u v u v
1 1
1 2
1 2
u v u v
1 2 1 2 2 2 2
u v u v
1 3 1
3 2
3 2
u v u
s s =s v s
1 4 1 4 2 4 2 4
1,1 1
2 ,1 2
1,1 2
2 ,1
s s s s
1, 2 1
2,2 2
1, 2 2
2,2
s s s s
1, 3 1
2,3 2
1, 3 2
2,3
t t t t
1 x 1 y 2 x 2 y
0 0 0 1
1 0 0 1
0 1 0 1
0 0 1 1
We can solve for motion by inverting matrix of points. Or, explicitly, we see that first column on left (images of first point) give the translations. After solving for these, we can solve for the each column of the s components of the motion using the images of each point, in turn.
u s v s = u s v s
1 k 1 k 2 k 2 k
1,1 1
1, 2 1
1, 3 1
2 ,1 2
2, 2 2
2, 3 2
1,1 2
2 ,1
s s
1, 2 2
2, 2
s s
1, 3 2
2, 3
t a t b t c t 1
1 x k 1 y k 2 x k 2 y
Once we know the motion, we can use the images of another point to solve for the structure. We have four linear equations, with three unknowns.
Affine Structure-from-Motion: Two Frames (6) Suppose we just know where the kth point is in image 1.
u s s s t a v s s s t b = ? s s s t c ? s s s t 1 Then, we can use the first two equations to write a and b as linear
1 1 1 1 1 k 1,1 1 1, 2 1 1, 3 1 x k 1 1 k 2,1 2 2, 2 2 2, 3 2 y k 2 1,1 2 1, 2 2 1, 3 2 x k 2 2,1 2, 2 2, 3 y
k k
in ck. The final two equations lead to two linear equations in the missing values and ck. If we eliminate ck we get one linear equation in the missing values. This means the unknown point lies on a known line. That is, we recover the epipolar constraint. Furthermore, these lines are all parallel.
x y A z 1
1 1 1
x x x 0 y y y 0 = z z z 0 1 1 1 1
2 3 4 2 3 4 2 3 4
1 0 0 0 1 0 0 0 1 1 1 1
u v u v
1 2 1 2 2 2 2
1 2
1 2
u s v s = s u s v
1 n 1 n 2 n 2 n
1,1 1
2 ,1 2
s s s s
1, 2 1
2,2 2
s s s s
1, 3 1
2,3 2
1,1 2
1, 2 2
1, 3 2
2 ,1
2,2
2,3
t x t y t z 1 t
1 x 1 y 2 x 2 y
x y
z 1
x y z 1
n n n
We have:
u v u v
u v u v
1 1 1
1 1 1
1 2
u v u v
u v u v
1 2 1 2 2 2 2 2
1 2 1 2 2 2 2
1 2
u s v s = s u v s
1 n 1 n 2 n 2 n
1 1 n 1,1 1 1
1,1 1
2 ,1 2
1,1 2
s s s s
1 1, 2 1
1, 2 1
2,2 2
1, 2 2 2,2
s s s s
1 1, 3 1
1, 3 1
2,3 2
1, 3 2 2,3
2 ,1
x y A A z 1 t t t t
1 x 1 y 1 2 x 2 y
1 x 1
x y z
x y z
1
. . .
1
n n n
n
And:
1 2
1 2
u s v s = s u v s
n 2 n 2 n
s s s s
s s s s
2 ,1 2
2,2 2
2,3 2
1,1 2
1, 2 2
1, 3 2
2 ,1
2,2
2,3
t t A t t
y 2 x 2 y
0 0 0 1
1 0 0 0 1 0 0 0 1 1 1 1
x y z 1
n n
u v u v
1 2 1 2 2 2 2
Given:
1 2
1 2
u s v s = u s v s
1 n 1 n 2 n 2 n
1,1 1
s s s s
1, 2 1 2,2 2
s s s s
1, 3 1 2,3 2
2 ,1 2
1,1 2
1, 2 2
1, 3 2
2 ,1
2,2
2,3
t t A t t
1 x 1 y 2 x 2 y
0 0 0 1
1 0 0 0 1 0 0 0 1 1 1 1
x y z 1
n n n
s s s s
1,1 1
2 ,1 2
1,1 2
s s s s
1, 2 1
2,2 2
1, 2 2 2,2
s s s s
1, 3 1
2,3 2
1, 3 2 2,3
2 ,1
A t t t t
1 x 1 y 2 x 2 y
u v u v
1
1 1
1 2
1 2
u . . . u s v v s = s u u v v s
1 1 2 n 1 1 2 n 2 2 2 n 2 2 2 n
1,1 1
s s s s
1 n
1, 2 1
s s s s
1 1,1 1
1, 3 1
2 ,1 2
2, 2 2
2, 3 2
1,1 2
1, 2 2
1, 3 2
2 ,1
2, 2
2, 3
t x t y t z t 1
1 x 1 1 y 1 2 x 1 2 y
1 1, 2 1
x . . . x y y
2 n 2 n
z 1
2
z 1
n
1
u v u v
1 1
u v u v
1 2 1 2 2 2 2
1 2
1 2
u s v s = u s v s
1 n 2 n 2 n
2 ,1 2
s s
2,2 2
s s
1, 3 1
2,3 2
1,1 2
2 ,1
s s
1, 2 2
2,2
s s
1, 3 2
2,3
t t A t t
1 x 1 y 2 x 2 y
x y A z 1
x y
z 1
x y z 1
n n n
v s = s u s v
1 n 1 n 2 n 2 n
1,1 1
1, 2 1
1, 3 1
2,1 2
1,1 2
s s s
2, 2 2
1, 2 2
s s s
2, 3 2
1, 3 2
2,1
2, 2
2, 3
t y y A A t z z t 1 1
1 x 1 2 1 y 1 1 2 2 x 1 2 2 y
x y z 1
n n n
Note that A has the form: A corresponds to translation of the points, plus a linear transformation.
a a a a a a a a A= a a a a 0 0 0 1
1,1 1, 2 1, 3 1, 4 2 ,1 2, 2 2, 3 2, 4 3, 1 3, 2 3, 3 3, 4
For example, there is clearly a translational ambiguity in recovering the points. We cant tell the difference between two point sets that are identical up to a translation when we only see them after they undergo an unknown translation. Similarly, theres clearly a rotational ambiguity. The rest of the ambiguity is a stretching in an unknown direction.