
Motion

Computer Vision: motion

Erhardt Barth

Institut für Neuro- und Bioinformatik
Universität zu Lübeck

June 4, 2020


What is motion?
Defining motion:
Motion is a powerful feature of image sequences that relates
spatial image features to temporal changes.

From a temporal sequence of 2-D images, the only accessible motion parameter is the optical flow v, which is an approximation of the 2-D motion field u, which is the projection of the 3-D motion field w of points in the scene onto the image sensor.
The optical flow v can be used for
motion detection and segmentation
motion compensation and motion-based data compression
3-D scene reconstruction
autonomous navigation (of robots and cars)
tracking
analysis of dynamical processes in scientific applications.

Optical flow
The notion of optical flow was introduced by J.J. Gibson, the
founder of ecological psychology.

This is the optical flow that we generate by egomotion when moving towards a wall at different angles.

What does the optical flow depend on?
Egomotion induces a characteristic image pattern, which is influenced by the shape and position of objects in the scene.
Motions of objects in the scene disturb the egomotion pattern.
Local image features determine to what extent motion is measurable.

Information in the optical flow

Examples of information that one would like to extract from the optical flow:
egomotion parameters (velocity and direction of motion)
relative motions of image parts for segmentation
structure from motion: two frames taken at positions P1 and
P2 can be used like a stereo pair with baseline P2 − P1
time-to-contact for obstacle avoidance
Examples of behavior based on optical flow:
Many animals make use of the optical flow. For example, honeybees adjust their speed of flight according to the optical flow generated by their flight (they fly slower in a narrower tube than in a wider tube; see the slide on grazing landings below).
Robots, cars, and airplanes can make use of such information to navigate.

Notations for the different motion fields

w(X, Y, Z) ∈ R^3:
3-D motion field: relative motion between observer (camera) and a fixed point in 3-D space

u(x, y) ∈ R^2:
2-D motion field: projection of the 3-D motion field onto the image plane (only the motion vectors of visible points are projected)

v(x, y) ∈ R^2:
2-D optical flow: an approximation of the 2-D motion field that is estimated from image intensities


3D motion field (I)


The following equation describes the position P(t) of a fixed point P0 relative to a moving point (e.g. the center of projection of the camera):

P(t) = A(t)(P0 − k(t))    (1)

The motion of a rigid body (here the camera) can always be decomposed into a translation and a rotation; see the section on image formation.
k(t) is the trajectory of the moving point (it describes the translation of the point) and A(t) is a matrix that describes the rotation.
The 3-D motion field is obtained (in camera coordinates) by computing the time derivative (using the product rule):

w(P(t)) = A′(t)(P0 − k(t)) − A(t)k′(t)    (2)

3D motion field (II)

At t = 0 (we are looking at the instantaneous flow) we obtain (by assuming, without loss of generality, that A(0) is the unit matrix and k(0) = 0):

w(P0) = A′(0)P0 − k′(0)    (3)

k′(0) is the infinitesimal translation of the point (the derivative of the trajectory).
A′(0) is the infinitesimal rotation and it is of the form

A′(0) = [0, ω3, −ω2; −ω3, 0, ω1; ω2, −ω1, 0]    (4)

(rows separated by semicolons). The axis of rotation is the vector (ω1, ω2, ω3)^T and we denote the norm of this vector by ω.

3D motion field (II++)

Note that from the expression (4) on the previous slide (obtained by a Taylor expansion of A around t = 0) it follows that −A′ = A′^T, i.e., A′ is antisymmetric.
Furthermore, for any vector x, we have

A′x = (x2 ω3 − x3 ω2, x3 ω1 − x1 ω3, x1 ω2 − x2 ω1)^T = x × (ω1, ω2, ω3)^T,    (5)

i.e., we can replace the matrix multiplication with a vector (cross) product.
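
As a quick check of Eq. (5), the following small numpy sketch (not part of the original slides) builds the antisymmetric matrix of Eq. (4) for an arbitrary axis and compares the matrix product with the cross product.

```python
import numpy as np

def skew(omega):
    """Antisymmetric matrix A'(0) of Eq. (4) for an axis omega = (w1, w2, w3)."""
    w1, w2, w3 = omega
    return np.array([[0.0,  w3, -w2],
                     [-w3, 0.0,  w1],
                     [ w2, -w1, 0.0]])

omega = np.array([0.3, -0.1, 0.7])   # arbitrary rotation axis (not normalized here)
x = np.array([1.0, 2.0, -0.5])       # arbitrary test vector

# Eq. (5): multiplying by the antisymmetric matrix equals the cross product x × omega
assert np.allclose(skew(omega) @ x, np.cross(x, omega))
```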


3D motion field (III)

If we denote by T the unit translation vector, by v the scalar speed of translation, and by R the (unit-vector) rotation axis, we obtain (from Eq. 3):

w(P) = vT − ωR × P    (6)

Note that the translation (induced by egomotion) does not depend on the position P, whereas the rotation depends on both the axis of rotation and the point in space.

The figure illustrates the rotational component. Note that w is perpendicular to R and to P, and proportional to the length of P and to sin(α).


2D motion field (I)

Remember that the perspective projection of a point P(t) = (X, Y, Z)^T is p(t) = −(1/Z)(X, Y)^T.
By differentiating p = (x, y) with respect to t we obtain the projection of the 3D motion field as

(d/dt) x(t) = (d/dt)(−X(t)/Z(t)) = −(1/Z)(dX/dt + x dZ/dt)  and  (d/dt) y(t) = −(1/Z)(dY/dt + y dZ/dt).

By noting that (d/dt) P(0) = w = (w1, w2, w3), we finally obtain the projection equation

u = −(1/Z)((w1, w2)^T + w3 (x, y)^T).    (7)
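
A minimal numpy sketch (not from the slides) of the projection equation (7): it projects an arbitrary 3-D point and its 3-D motion onto the image plane; the numerical values are made up.

```python
import numpy as np

def project_point(P):
    """Perspective projection p = -(1/Z)(X, Y) of a 3-D point P = (X, Y, Z)."""
    X, Y, Z = P
    return np.array([-X / Z, -Y / Z])

def project_motion(P, w):
    """Eq. (7): 2-D motion u induced by the 3-D motion w of the point P."""
    x, y = project_point(P)
    Z = P[2]
    return -(w[:2] + w[2] * np.array([x, y])) / Z

# Example: a point moving towards the camera along the optical axis
P = np.array([1.0, 0.5, 4.0])
w = np.array([0.0, 0.0, -1.0])
print(project_motion(P, w))   # the flow points radially away from the image center (expansion)
```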


2D motion field (II)

3D point P = (X, Y, Z)   → (projection p(t) = −(1/Z)(X, Y)^T) →   2D point p = (x, y)
        ↓ d/dt                                                           ↓ d/dt
3D motion w(P)   → (projection u = −(1/Z)((w1, w2)^T + w3 (x, y)^T)) →   2D motion u(p)


2D motion field (III)

What do we learn from the projection equation?

u = −(1/Z)((w1, w2)^T + w3 (x, y)^T)

The 2D velocity depends inversely on depth Z, thus closer objects (seem to) move faster.
The 2D motion field contains information not only about the motion of 3D points, but also about the geometry of objects in the scene (Z).
The equation is linear in the 3-D motion field w. Therefore different 3D motions generate 2D motion fields that superimpose linearly (as rotations and translations do).
The motion and the geometry of objects in 3D can only be recovered up to a scale factor (we can scale P and w without changing u).

2D motion field induced by camera rotation

We start with the 3D motion field (Eq. 6):

w(X, Y, Z) = −ωR × P = −ω (R2 Z − R3 Y, R3 X − R1 Z, R1 Y − R2 X)^T.

Now we use the projection equation (7) to obtain the 2D motion field

u(x, y) = (ω/Z)((R2 Z − R3 Y, R3 X − R1 Z)^T + (R1 Y − R2 X)(x, y)^T),

and move to 2D coordinates only (since the 3D coordinates occur only as the ratios X/Z and Y/Z):

u(x, y) = ω(−R1 (xy, 1 + y²)^T + R2 (1 + x², xy)^T + R3 (y, −x)^T).    (8)


Names for rotations

pitch (Nicken)
yaw (Gieren)
roll (Rollen)


Examples of camera rotations

roll: R = (0, 0, 1), ω = 1, and u(x, y) = (y, −x)^T

yaw: R = (0, 1, 0), ω = 1, and u(x, y) = (1 + x², xy)^T


2D motion field induced by camera translation

We start with the 3D motion field (Eq. 6):

w(X, Y, Z) = −vT = −v (T1, T2, T3)^T.

Now we use the projection equation (7) to obtain the 2D motion field

u(x, y) = (v/Z)((T1, T2)^T + T3 (x, y)^T).    (9)

The FOE (focus of expansion) F is the point where the motion field vanishes; it exists if T3 ≠ 0. Since from u = 0 it follows that T1 + T3 x0 = 0 and T2 + T3 y0 = 0, the equation that defines the FOE is

(x0, y0)^T = −(1/T3)(T1, T2)^T := F    (10)
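
A minimal numpy sketch (not from the slides) of Eqs. (9) and (10): it evaluates the translational flow for an assumed egomotion and verifies that the flow vanishes at the FOE; the speed, translation direction, and depth are made-up values.

```python
import numpy as np

v, T = 2.0, np.array([0.2, 0.0, 1.0])   # assumed speed and translation direction
Z = 5.0                                  # assumed depth of a frontoparallel wall

def u_translation(x, y):
    """Eq. (9): translational 2-D motion field at image position (x, y)."""
    return (v / Z) * (T[:2] + T[2] * np.array([x, y]))

# Eq. (10): the focus of expansion, where the translational field vanishes
foe = -T[:2] / T[2]
print("FOE:", foe, "flow at FOE:", u_translation(*foe))   # flow is (0, 0) at the FOE
```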


Examples of camera translations

Flying towards a wall (straight ahead). Note that the FOE is in the center of the image.

Flying towards a wall and to the left. Note that the FOE is on the left.


Optical flow summary

The total equation for the 2-D motion field induced by egomotion is (putting together Eqs. 8 and 9):

u(x, y) = (v/Z)((T1, T2)^T + T3 (x, y)^T) + ω(−R1 (xy, 1 + y²)^T + R2 (1 + x², xy)^T + R3 (y, −x)^T).    (11)
Rigid motion can be separated into translation and rotation.
The translational field depends on depth Z and can thus be
used to infer 3D structure.
The rotational field does not contain information about 3D
structure.
The focus-of-expansion (FOE) can be used to infer the
direction of heading.
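
To make Eq. (11) concrete, here is a small numpy sketch (not from the slides) that evaluates the egomotion flow field on a coarse image grid; the depth, speed, and rotation values are arbitrary.

```python
import numpy as np

def egomotion_flow(x, y, Z, v, T, omega, R):
    """Eq. (11): 2-D motion field at (x, y) for translation (v, T) and rotation (omega, R)."""
    trans = (v / Z) * (np.array([T[0], T[1]]) + T[2] * np.array([x, y]))
    rot = omega * (-R[0] * np.array([x * y, 1 + y**2])
                   + R[1] * np.array([1 + x**2, x * y])
                   + R[2] * np.array([y, -x]))
    return trans + rot

# Example: forward translation plus a slow roll, over a wall at depth Z = 10
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
flow = np.array([[egomotion_flow(x, y, Z=10.0, v=1.0, T=(0, 0, 1), omega=0.1, R=(0, 0, 1))
                  for x, y in zip(rx, ry)]
                 for rx, ry in zip(xs, ys)])
print(flow.shape)   # (5, 5, 2): one flow vector per grid position
```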


How honeybees make grazing landings

The already mentioned simple principle of keeping the optical flow constant during flight can help the bees to make a smooth landing (figure and text taken from [2]).

Merriam-Webster: to graze = to touch lightly in passing


How can we estimate the optical flow?

The problems are similar to those we have seen in stereo vision.
Correspondence: which element of a frame corresponds to which element in the next frame?
Reconstruction: given a number of correspondences, what can we say about the 3-D motion of the objects?
However, the problem is not well posed since the optical flow v and the 2D motion field u can differ as shown below.

A sphere without structure will not generate optical flow when it rotates, but a moving light source might.


The aperture problem

The figure below illustrates that, at straight edges, different local motions are valid when observing a particular displacement of the edge.

Summarizing this and the previous slide, we note that the optical flow cannot be estimated at constant or straight image features (which have intrinsic dimension 0 and 1, respectively).

Biological motion sensors

The figure below shows two elementary motion detectors with two sensors (shown at the top) at two different spatial positions.

The Reichardt detector (left) is based on time-delay units τ and multiplications M.
The detector on the right is based on the ratio of temporal and spatial derivatives of the image intensity¹.

¹ The '−' just indicates spatial derivatives by discrete differences.

A local model of optical flow

A common assumption on optical flow is that the image brightness I(x, y, t) at a point (x, y) and at time t should only change because of object motion, i.e., the total time derivative is zero, leading to the brightness-change constraint equation (BCCE)

dI/dt = (∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0,    (12)

which can be written as

∇xy I^T v + It = 0    (13)

since v = (vx, vy) = (dx/dt, dy/dt). ∇xy I is the spatial gradient of I and It the temporal derivative. Another way of formulating the same constraint is to require that all changes in intensity are due to translations only, i.e., that I(x, y, t) can be written as I′(x − t vx, y − t vy). Note that in this case, I has intrinsic dimension 2.

BCCE derived by approximation

We assume that the brightness of an object does not change with time. If such an object moves by dx and dy in time dt, we can approximate I(x, y, t) by a truncated Taylor-series expansion:

I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x) dx + (∂I/∂y) dy + (∂I/∂t) dt    (14)

The above assumption implies that

I(x + dx, y + dy, t + dt) = I(x, y, t),

and it follows from (14) that (∂I/∂x) dx + (∂I/∂y) dy + (∂I/∂t) dt = 0 and, as on the previous slide, that

∇xy I^T v + It = 0.


How to solve the BCCE

Because the BCCE provides only one equation for two unknowns, we sum a norm of the BCCE over a local neighborhood (assuming that the flow is constant there!) to obtain more constraints and search for the velocity that minimizes the term

v = arg min_v (h ∗ (∇xy I^T v + It)²) = arg min_v (E),    (15)

where h is a convolution kernel.
Minimization with standard least squares (the partial derivatives of E with respect to vx and vy must equal zero) leads to the solution

v = −A⁻¹ b    (16)

with A = h ∗ [Ix², Ix Iy; Ix Iy, Iy²] and b = h ∗ (Ix It, Iy It)^T.

So, v is obtained with the BCCE and local weighted least squares. No solution exists if A cannot be inverted, i.e., if det(A) = 0.
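
Below is a small numpy/scipy sketch of this local least-squares estimator (a Lucas-Kanade-style solver, not the original course code); the Gaussian kernel for h and the simple derivative filters are assumed choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_flow(frame1, frame2, sigma=3.0):
    """Local least-squares flow from the BCCE (Eqs. 15-16); h is a Gaussian here."""
    Ix = np.gradient(frame1, axis=1)          # simple derivative estimates; the choice
    Iy = np.gradient(frame1, axis=0)          # of kernels matters in practice
    It = frame2 - frame1

    # Blur the products of derivatives with the kernel h
    Axx = gaussian_filter(Ix * Ix, sigma)
    Axy = gaussian_filter(Ix * Iy, sigma)
    Ayy = gaussian_filter(Iy * Iy, sigma)
    bx = gaussian_filter(Ix * It, sigma)
    by = gaussian_filter(Iy * It, sigma)

    # Solve v = -A^(-1) b per pixel; mask pixels where A is (nearly) singular
    det = Axx * Ayy - Axy ** 2
    valid = det > 1e-6
    safe_det = np.where(valid, det, 1.0)
    vx = np.where(valid, -(Ayy * bx - Axy * by) / safe_det, 0.0)
    vy = np.where(valid, -(Axx * by - Axy * bx) / safe_det, 0.0)
    return vx, vy, valid

# Example: a smooth random pattern shifted one pixel to the right
rng = np.random.default_rng(0)
f1 = gaussian_filter(rng.random((64, 64)), 2.0)
f2 = np.roll(f1, 1, axis=1)
vx, vy, valid = local_flow(f1, f2)
print(np.median(vx[valid]), np.median(vy[valid]))   # roughly (1, 0) for this shift
```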

Tensor methods

The image shows a movie, and an (x, t) section thereof, in which a cloud pattern moves rightwards. In the (x, y, t) space, the motion generates a direction of constant brightness r, which is related to v by v = (r1, r2)/r3.
Since r is perpendicular to the gradient ∇I = (Ix, Iy, It), r can be found as

r = arg min_{r, r^T r = 1} ||E||²,   where   ||E||² = h ∗ (r^T ∇I ∇I^T r).    (17)


Motion and structure tensor

Under the assumption that r (and thus v) is constant (at least in the region defined by h), we obtain

||E||² = r^T (h ∗ (∇I ∇I^T)) r = r^T J r    (18)

where J is our well-known structure tensor

J = h ∗ [Ix², Ix Iy, Ix It; Ix Iy, Iy², Iy It; Ix It, Iy It, It²].    (19)

Thus, our problem can be solved by minimizing r^T J r under the additional constraint r^T r = 1 (to avoid the solution r = 0).
By using Lagrange multipliers, one obtains the system of equations Jr = λr and ||E||² = r^T J r = r^T λ r = λ for the minimizing r.
So, the minimum is reached if r is the eigenvector that corresponds to the minimum eigenvalue of J.

The need for confidence measures

But how do we know that our motion model was correct?

In the left panel, the motions of overlaid gratings generate a plaid pattern that does not have a unique direction of motion (intrinsic dimension = 3).
In the right panel, the motion of one straight grating generates a plane of constant image intensity (intrinsic dimension = 1).

The optical flow cannot be estimated if there is no defined direction of constant brightness, i.e., it can only be estimated if the intrinsic dimension = 2.

Confidence measures based on the eigenvalues of J

It is often more difficult to detect motion with good confidence than to estimate the motion parameters themselves.

If λ1 ≥ λ2 ≥ λ3 are the eigenvalues of J, the following confidence measures can be defined.
Total coherence:
ct = ((λ1 − λ3)/(λ1 + λ3))²    (20)
Spatial coherence:
cs = ((λ1 − λ2)/(λ1 + λ2))²    (21)
Corner measure:
cc = ct − cs    (22)


Motion estimation with J

Compute partial derivatives with respect to x, y, and t (the kernels used to estimate the derivatives are important)
Compute the products Ix Iy, ...
Blur the products with the convolution kernel h
Estimate the eigenvalues of the structure tensor J
Based on the eigenvalues, define confidence measures that are related to the conditioning of J
If the i2D confidence is high (one eigenvalue is small and the other two large), compute the eigenvector to the minimum eigenvalue
Obtain the motion parameters as the first two components of the above eigenvector divided by the last component; a sketch of these steps is given below
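
A compact numpy/scipy sketch of this pipeline (a slow, loop-based illustration rather than an efficient implementation); the Gaussian kernel for h and the numerical thresholds are assumed choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_flow(volume, sigma=3.0):
    """Flow and confidence from an (y, x, t) volume via the structure tensor J (Eqs. 19-22)."""
    Iy, Ix, It = np.gradient(volume)                   # partial derivatives (kernel choice matters)
    blur = lambda a, b: gaussian_filter(a * b, sigma)  # blur the products with the kernel h
    Jc = {'xx': blur(Ix, Ix), 'xy': blur(Ix, Iy), 'xt': blur(Ix, It),
          'yy': blur(Iy, Iy), 'yt': blur(Iy, It), 'tt': blur(It, It)}

    h, w, t = volume.shape
    t0 = t // 2                                        # analyse the middle frame
    flow = np.zeros((h, w, 2))
    conf = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            J = np.array([[Jc['xx'][i, j, t0], Jc['xy'][i, j, t0], Jc['xt'][i, j, t0]],
                          [Jc['xy'][i, j, t0], Jc['yy'][i, j, t0], Jc['yt'][i, j, t0]],
                          [Jc['xt'][i, j, t0], Jc['yt'][i, j, t0], Jc['tt'][i, j, t0]]])
            lam, vec = np.linalg.eigh(J)               # ascending: lam[0] <= lam[1] <= lam[2]
            l1, l2, l3 = lam[2], lam[1], lam[0]
            ct = ((l1 - l3) / (l1 + l3 + 1e-12)) ** 2  # total coherence, Eq. (20)
            cs = ((l1 - l2) / (l1 + l2 + 1e-12)) ** 2  # spatial coherence, Eq. (21)
            conf[i, j] = ct - cs                       # corner (i2D) measure, Eq. (22)
            r = vec[:, 0]                              # eigenvector of the minimum eigenvalue
            if abs(r[2]) > 1e-6:
                flow[i, j] = r[:2] / r[2]              # v = (r1, r2) / r3
    return flow, conf
```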


The minors of J
The minors Mij of J are the determinants of the matrices obtained from J by eliminating the row 4 − i and the column 4 − j; for example, M11 = (h ∗ Ix²)(h ∗ Iy²) − (h ∗ (Ix Iy))².
Fact: If a matrix has a single zero eigenvalue, the corresponding eigenvector can be evaluated in terms of the minors of that matrix.
Based on this fact, one can show [1] that if a pattern moves with constant velocity v, i.e., I(x, y, t) = I′(x − t vx, y − t vy), we have:

v = (M31, −M21)/M11 = (M32, −M22)/M12 = (M33, −M23)/M13.    (23)

In other words, if our motion model is valid, the 3 expressions above are equal and equal to v.

Motion estimation with the minors of J

Compute partial derivatives with respect to x, y, and t (the kernels used to estimate the derivatives are important)
Compute the products Ix Iy, ...
Blur the products with the convolution kernel h
Compute the minor M11 of J
Stop if the minor M11 is below a threshold (indicating an aperture problem)
Otherwise compute the 3 different motion vectors based on Eq. (23)
If the 3 vectors are similar, take the mean of the 3 vectors as the final result
Otherwise consider the confidence to be too low (indicating occlusions, noise, ...); a sketch of this procedure follows below
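
A minimal numpy sketch of the minors-based estimate for a single structure tensor J (assumed to be built from the blurred products as above); the thresholds are arbitrary and divisions by M12 or M13 are not guarded.

```python
import numpy as np

def minors_velocity(J, m11_threshold=1e-6, agreement=0.1):
    """Velocity from a 3x3 structure tensor J via its minors (Eq. 23).
    J holds the blurred derivative products in the order (x, y, t)."""
    def minor(i, j):
        # M_ij: delete row 4-i and column 4-j (1-based) of J and take the determinant
        rows = [r for r in range(3) if r != 3 - i]
        cols = [c for c in range(3) if c != 3 - j]
        return np.linalg.det(J[np.ix_(rows, cols)])

    if minor(1, 1) < m11_threshold:          # aperture problem: no unique velocity
        return None
    candidates = [np.array([minor(3, k), -minor(2, k)]) / minor(1, k) for k in (1, 2, 3)]
    if max(np.linalg.norm(c - candidates[0]) for c in candidates) > agreement:
        return None                          # low confidence (occlusions, noise, ...)
    return np.mean(candidates, axis=0)
```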

Correlation-based methods

The motion vector is approximated by

v = s(x, y) / (t2 − t1),    (24)

where s(x, y) is the displacement that yields the best match between two image regions in two consecutive frames.
The best match can be determined in 2 ways:
by maximizing the cross-correlation function

c(x, s) = (h ∗ (I(x′, t1) I(x′ − s, t2))) / √((h ∗ I²(x′, t1))(h ∗ I²(x′ − s, t2))),   x = (x, y)    (25)

by minimizing the distance function

d(x, s) = h ∗ (I(x′, t1) − I(x′ − s, t2))².    (26)


Block matching
In practice, correlation-based methods are most often implemented
by using block-matching techniques:
Subdivide every image into square blocks
Find one displacement vector for each block
Within a search range, find a best match that minimizes an error measure such as SSD or SAD:

SSD = Σ_block (It2(x, y) − It1(x + sx, y + sy))²
SAD = Σ_block |It2(x, y) − It1(x + sx, y + sy)|
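
A small numpy sketch of full-search block matching with the SAD measure (not the original course code); the block size and search range are arbitrary choices.

```python
import numpy as np

def block_matching(prev, curr, block=8, search=4):
    """Full-search block matching: one displacement per block, best SAD match."""
    H, W = curr.shape
    flow = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            target = curr[by:by + block, bx:bx + block]
            best, best_s = np.inf, (0, 0)
            for sy in range(-search, search + 1):
                for sx in range(-search, search + 1):
                    y0, x0 = by + sy, bx + sx
                    if y0 < 0 or x0 < 0 or y0 + block > H or x0 + block > W:
                        continue
                    cand = prev[y0:y0 + block, x0:x0 + block]
                    sad = np.abs(target - cand).sum()       # SAD error measure
                    if sad < best:
                        best, best_s = sad, (sx, sy)
            flow[by // block, bx // block] = best_s
    return flow

# Example: a random frame shifted two pixels to the right
rng = np.random.default_rng(1)
f1 = rng.random((32, 32))
f2 = np.roll(f1, 2, axis=1)
print(block_matching(f1, f2)[1, 1])   # roughly (-2, 0): the best match in f1 lies 2 px to the left
```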


Block matching: efficient search

The following search options can be used:
Full search
computationally expensive
highly regular, can run in parallel
Successive elimination
speeds up matching significantly
Hierarchical block matching
reduces search space
handles large displacements


Block matching: successive elimination


The method is based on the following (triangle) inequality related to the SAD measure (SSD in analogy):

SAD = Σ_block |It2(x, y) − It1(x + sx, y + sy)| ≥ |Σ_block (It2(x, y) − It1(x + sx, y + sy))| = |Σ_block It2(x, y) − Σ_block It1(x + sx, y + sy)|

Based on the above relation, the strategy is to
1. Compute partial sums for blocks in the current and previous frame
2. Compare blocks based on the partial sums
3. Omit the full block comparison if the partial sums indicate a worse error measure than the previous best result (note that the initial estimate is important); a sketch follows below
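
A minimal sketch of this strategy for a single block (block sums are recomputed here instead of being precomputed, which a real implementation would avoid); the zero displacement serves as the initial estimate.

```python
import numpy as np

def sad(a, b):
    return np.abs(a - b).sum()

def match_with_elimination(prev, curr, by, bx, block=8, search=4):
    """Successive elimination for one block: skip candidates whose block-sum
    difference already exceeds the best SAD found so far (the lower bound above)."""
    target = curr[by:by + block, bx:bx + block]
    target_sum = target.sum()
    # Initial estimate (important!): the zero displacement
    best_s, best = (0, 0), sad(target, prev[by:by + block, bx:bx + block])
    for sy in range(-search, search + 1):
        for sx in range(-search, search + 1):
            y0, x0 = by + sy, bx + sx
            if y0 < 0 or x0 < 0 or y0 + block > prev.shape[0] or x0 + block > prev.shape[1]:
                continue
            cand = prev[y0:y0 + block, x0:x0 + block]
            if abs(target_sum - cand.sum()) >= best:
                continue                      # lower bound already worse: skip the full SAD
            s = sad(target, cand)
            if s < best:
                best, best_s = s, (sx, sy)
    return best_s, best
```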

Hierarchical block-matching

Strategy: Start to match at a coarse level and then search around the coarse estimate at finer levels. This strategy reduces the search space and can handle large displacements; a coarse-to-fine sketch is given below.
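
A rough coarse-to-fine sketch (not from the slides) using a simple averaging pyramid; the block size, search range, and number of levels are arbitrary, and boundary handling is minimal.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (a simple pyramid level)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def hierarchical_match(prev, curr, by, bx, block=8, search=2, levels=3):
    """Coarse-to-fine matching for one block: estimate the displacement at the
    coarsest level, then refine it (doubled) with a small search at each finer level."""
    pyramid = [(prev, curr)]
    for _ in range(levels - 1):
        pyramid.append((downsample(pyramid[-1][0]), downsample(pyramid[-1][1])))

    sx = sy = 0
    for lvl in reversed(range(levels)):
        p, c = pyramid[lvl]
        cy, cx = by >> lvl, bx >> lvl                  # block position at this level
        target = c[cy:cy + block, cx:cx + block]
        best, best_s = np.inf, (sx, sy)
        for dy in range(-search, search + 1):          # small search around the estimate
            for dx in range(-search, search + 1):
                y0, x0 = cy + sy + dy, cx + sx + dx
                if y0 < 0 or x0 < 0 or y0 + block > p.shape[0] or x0 + block > p.shape[1]:
                    continue
                cand = p[y0:y0 + block, x0:x0 + block]
                err = np.abs(target - cand).sum()
                if err < best:
                    best, best_s = err, (sx + dx, sy + dy)
        sx, sy = best_s
        if lvl > 0:
            sx, sy = 2 * sx, 2 * sy                    # propagate the estimate to the finer level
    return sx, sy
```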


Complex motions

Optical flow difficulties
In case of occlusions, illumination changes, multiple motions, etc., we need more complex motion models.
With our simple local motion model, only sparse flow fields can be estimated.
For dense flow fields, additional constraints are needed.

Notations

v(x, y): optical flow
u(x, y): 2D motion field
w(X, Y, Z): 3D motion field
T: unit translation vector
R: unit-vector rotation axis
I(x, y, t): movie intensity
r: 3D direction of constant intensity
J: structure tensor
λi: eigenvalues of J
Mij: minors of J


Acknowledgement and literature

Some parts of the course are based on an earlier course by T. Aach and E. Barth.

The books [4] and [3] have a good coverage of motion and are recommended for further reading.

[1] E. Barth. The minors of the structure tensor. In G. Sommer, editor, Mustererkennung 2000, pages 221–228, Berlin, 2000. Springer.

[2] V. Srinivasan, S. W. Zhang, J. S. Chahl, E. Barth, and S. Venkatesh. How honeybees make grazing landings on flat surfaces. Biological Cybernetics, 83(3):171–183, 2000.

[3] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer, Boston, 2011.

[4] Emanuele Trucco and Alessandro Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1998.
