
Motion

Computer Vision: motion

Erhardt Barth

Institut für Neuro- und Bioinformatik
Universität zu Lübeck

June 4, 2020


What is motion?
Defining motion:
Motion is a powerful feature of image sequences that relates
spatial image features to temporal changes.

From a temporal sequence of 2-D images, the only accessible motion parameter is the optical flow v, which is an approximation of the 2-D motion field u, which is the projection of the 3-D motion field w of points in the scene onto the image sensor.
The optical flow v can be used for
motion detection and segmentation
motion compensation and motion-based data compression
3-D scene reconstruction
autonomous navigation (of robots and cars)
tracking
analysis of dynamical processes in scientific applications.

Optical flow
The notion of optical flow was introduced by J.J. Gibson, the
founder of ecological psychology.

This is the optical flow that we generate by egomotion when moving towards a wall at different angles.

What does the optical flow depend on?
Egomotion induces a characteristic image pattern, which is influenced by the shape and position of objects in the scene.
Motions of objects in the scene disturb the egomotion pattern.
Local image features determine to what extent motion is measurable.

Information in the optical flow

Examples of information that one would like to extract from the optical flow:
egomotion parameters (velocity and direction of motion)
relative motions of image parts for segmentation
structure from motion: two frames taken at positions P1 and
P2 can be used like a stereo pair with baseline P2 − P1
time-to-contact for obstacle avoidance
Examples of behavior based on optical flow:
Many animals make use of the optical flow. For example, honeybees adjust their speed of flight according to the optical flow generated by their flight (they fly slower in a narrower tube than in a wider tube; see the slide on grazing landings below).
Robots, cars, and airplanes can make use of such information to navigate.

Notations for the different motion fields

w(X, Y, Z) ∈ R^3:
3-D motion field: relative motion between observer (camera) and a fixed point in 3-D space

u(x, y) ∈ R^2:
2-D motion field: projection of the 3-D motion field onto the image plane (only the motion vectors of visible points are projected)

v(x, y) ∈ R^2:
2-D optical flow: an approximation of the 2-D motion field that is estimated from image intensities


3D motion field (I)


The following equation describes the position P(t) of a fixed point P0 relative to a moving point (e.g. the center of projection of the camera):

P(t) = A(t)(P0 − k(t))    (1)

The motion of a rigid body (here the camera) can always be decomposed into a translation and a rotation; see the section on image formation.
k(t) is the trajectory of the moving point (it describes the translation of the point) and A(t) is a matrix that describes the rotation.
The 3-D motion field is obtained (in camera coordinates) by computing the time derivative (using the product rule):

w(P(t)) = A′(t)(P0 − k(t)) − A(t)k′(t)    (2)

3D motion field (II)

At t = 0 (we are looking at the instantaneous flow) we obtain (by assuming, without loss of generality, that A(0) is the unit matrix and k(0) = 0):

w(P0) = A′(0)P0 − k′(0)    (3)

k′(0) is the infinitesimal translation of the point (the derivative of the trajectory).
A′(0) is the infinitesimal rotation and it is of the form

A′(0) = [0, ω3, −ω2; −ω3, 0, ω1; ω2, −ω1, 0]    (4)

(rows separated by semicolons). The axis of rotation is the vector (ω1, ω2, ω3)^T and we denote the norm of this vector by ω.

3D motion field (II++)

Note that from the expression (4) on the previous slide (obtained by a Taylor expansion of A around t = 0) it follows that −A′ = A′^T, i.e., A′ is antisymmetric.
Furthermore, for any vector x, we have

A′x = (x2 ω3 − x3 ω2, x3 ω1 − x1 ω3, x1 ω2 − x2 ω1)^T = x × (ω1, ω2, ω3)^T,    (5)

i.e., we can replace the matrix multiplication with a vector (cross) product.
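
As a quick check of Eq. (5), the following small numpy sketch (not part of the original slides) builds the antisymmetric matrix of Eq. (4) for an arbitrary axis and compares the matrix product with the cross product.

```python
import numpy as np

def skew(omega):
    """Antisymmetric matrix A'(0) of Eq. (4) for an axis omega = (w1, w2, w3)."""
    w1, w2, w3 = omega
    return np.array([[0.0,  w3, -w2],
                     [-w3, 0.0,  w1],
                     [ w2, -w1, 0.0]])

omega = np.array([0.3, -0.1, 0.7])   # arbitrary rotation axis (not normalized here)
x = np.array([1.0, 2.0, -0.5])       # arbitrary test vector

# Eq. (5): multiplying by the antisymmetric matrix equals the cross product x × omega
assert np.allclose(skew(omega) @ x, np.cross(x, omega))
```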


3D motion field (III)

If we denote by T the unit translation vector, by v the scalar speed of translation, and by R the (unit-vector) rotation axis, we obtain (from Eq. 3):

w(P) = vT − ωR × P    (6)

Note that the translation (induced by egomotion) does not depend on the position P, whereas the rotation depends on both the axis of rotation and the point in space.

The figure illustrates the rotational component. Note that w is perpendicular to R and to P, and proportional to the length of P and to sin(α).


2D motion field (I)

Remember that the perspective projection of a point P(t) = (X, Y, Z)^T is p(t) = −(1/Z)(X, Y)^T.
By differentiating p = (x, y) with respect to t we obtain the projection of the 3D motion field as

(d/dt) x(t) = (d/dt)(−X(t)/Z(t)) = −(1/Z)(dX/dt + x dZ/dt)  and  (d/dt) y(t) = −(1/Z)(dY/dt + y dZ/dt).

By noting that (d/dt) P(0) = w = (w1, w2, w3), we finally obtain the projection equation

u = −(1/Z)((w1, w2)^T + w3 (x, y)^T).    (7)
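
A minimal numpy sketch (not from the slides) of the projection equation (7): it projects an arbitrary 3-D point and its 3-D motion onto the image plane; the numerical values are made up.

```python
import numpy as np

def project_point(P):
    """Perspective projection p = -(1/Z)(X, Y) of a 3-D point P = (X, Y, Z)."""
    X, Y, Z = P
    return np.array([-X / Z, -Y / Z])

def project_motion(P, w):
    """Eq. (7): 2-D motion u induced by the 3-D motion w of the point P."""
    x, y = project_point(P)
    Z = P[2]
    return -(w[:2] + w[2] * np.array([x, y])) / Z

# Example: a point moving towards the camera along the optical axis
P = np.array([1.0, 0.5, 4.0])
w = np.array([0.0, 0.0, -1.0])
print(project_motion(P, w))   # the flow points radially away from the image center (expansion)
```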


2D motion field (II)

3D point P = (X, Y, Z)   → (projection p(t) = −(1/Z)(X, Y)^T) →   2D point p = (x, y)
        ↓ d/dt                                                           ↓ d/dt
3D motion w(P)   → (projection u = −(1/Z)((w1, w2)^T + w3 (x, y)^T)) →   2D motion u(p)


2D motion field (III)

What do we learn from the projection equation?

u = −(1/Z)((w1, w2)^T + w3 (x, y)^T)

The 2D velocity depends inversely on depth Z, thus closer objects (seem to) move faster.
The 2D motion field contains information not only about the motion of 3D points, but also about the geometry of objects in the scene (Z).
The equation is linear in the 3-D motion field w. Therefore different 3D motions generate 2D motion fields that superimpose linearly (as rotations and translations do).
The motion and the geometry of objects in 3D can only be recovered up to a scale factor (we can scale P and w without changing u).

2D motion field induced by camera rotation

We start with the 3D motion field (Eq. 6):

w(X, Y, Z) = −ωR × P = −ω (R2 Z − R3 Y, R3 X − R1 Z, R1 Y − R2 X)^T.

Now we use the projection equation (7) to obtain the 2D motion field

u(x, y) = (ω/Z)((R2 Z − R3 Y, R3 X − R1 Z)^T + (R1 Y − R2 X)(x, y)^T),

and move to 2D coordinates only (since the 3D coordinates occur only as the ratios X/Z and Y/Z):

u(x, y) = ω(−R1 (xy, 1 + y²)^T + R2 (1 + x², xy)^T + R3 (y, −x)^T).    (8)


Names for rotations

pitch (Nicken)
yaw (Gieren)
roll (Rollen)


Examples of camera rotations

roll: R = (0, 0, 1), ω = 1, and u(x, y) = (y, −x)^T

yaw: R = (0, 1, 0), ω = 1, and u(x, y) = (1 + x², xy)^T


2D motion field induced by camera translation

We start with the 3D motion field (Eq. 6):

w(X, Y, Z) = −vT = −v (T1, T2, T3)^T.

Now we use the projection equation (7) to obtain the 2D motion field

u(x, y) = (v/Z)((T1, T2)^T + T3 (x, y)^T).    (9)

The FOE (focus of expansion) F is the point where the motion field vanishes; it exists if T3 ≠ 0. Since from u = 0 it follows that T1 + T3 x0 = 0 and T2 + T3 y0 = 0, the equation that defines the FOE is

(x0, y0)^T = −(1/T3)(T1, T2)^T := F    (10)
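
A minimal numpy sketch (not from the slides) of Eqs. (9) and (10): it evaluates the translational flow for an assumed egomotion and verifies that the flow vanishes at the FOE; the speed, translation direction, and depth are made-up values.

```python
import numpy as np

v, T = 2.0, np.array([0.2, 0.0, 1.0])   # assumed speed and translation direction
Z = 5.0                                  # assumed depth of a frontoparallel wall

def u_translation(x, y):
    """Eq. (9): translational 2-D motion field at image position (x, y)."""
    return (v / Z) * (T[:2] + T[2] * np.array([x, y]))

# Eq. (10): the focus of expansion, where the translational field vanishes
foe = -T[:2] / T[2]
print("FOE:", foe, "flow at FOE:", u_translation(*foe))   # flow is (0, 0) at the FOE
```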


Examples of camera translations

Flying towards a wall (straight ahead). Note that the FOE is in the center of the image.

Flying towards a wall and to the left. Note that the FOE is on the left.


Optical flow summary

The total equation for the 2-D motion field induced by egomotion is (putting together Eqs. 8 and 9):

u(x, y) = (v/Z)((T1, T2)^T + T3 (x, y)^T) + ω(−R1 (xy, 1 + y²)^T + R2 (1 + x², xy)^T + R3 (y, −x)^T).    (11)
Rigid motion can be separated into translation and rotation.
The translational field depends on depth Z and can thus be
used to infer 3D structure.
The rotational field does not contain information about 3D
structure.
The focus-of-expansion (FOE) can be used to infer the
direction of heading.
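
To make Eq. (11) concrete, here is a small numpy sketch (not from the slides) that evaluates the egomotion flow field on a coarse image grid; the depth, speed, and rotation values are arbitrary.

```python
import numpy as np

def egomotion_flow(x, y, Z, v, T, omega, R):
    """Eq. (11): 2-D motion field at (x, y) for translation (v, T) and rotation (omega, R)."""
    trans = (v / Z) * (np.array([T[0], T[1]]) + T[2] * np.array([x, y]))
    rot = omega * (-R[0] * np.array([x * y, 1 + y**2])
                   + R[1] * np.array([1 + x**2, x * y])
                   + R[2] * np.array([y, -x]))
    return trans + rot

# Example: forward translation plus a slow roll, over a wall at depth Z = 10
xs, ys = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
flow = np.array([[egomotion_flow(x, y, Z=10.0, v=1.0, T=(0, 0, 1), omega=0.1, R=(0, 0, 1))
                  for x, y in zip(rx, ry)]
                 for rx, ry in zip(xs, ys)])
print(flow.shape)   # (5, 5, 2): one flow vector per grid position
```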


How honeybees make grazing landings

The already mentioned simple principle of keeping the optical flow constant during flight can help the bees to make a smooth landing (figure and text taken from [2]).

Merriam-Webster: to graze = to touch lightly in passing


How can we estimate the optical flow?

The problems are similar to those we have seen in stereo vision.
Correspondence: which element of a frame corresponds to which element in the next frame?
Reconstruction: given a number of correspondences, what can we say about the 3-D motion of the objects?
However, the problem is not well posed since the optical flow v and the 2D motion field u can differ as shown below.

A sphere without structure will not generate optical flow when it rotates, but a moving light source might.


The aperture problem

The figure below illustrates that, at straight edges, different local motions are valid when observing a particular displacement of the edge.

Summarizing this and the previous slide, we note that the optical flow cannot be estimated at constant or straight image features (which have intrinsic dimension 0 and 1, respectively).

Biological motion sensors

The figure below shows two elementary motion detectors with two sensors (shown at the top) at two different spatial positions.

The Reichardt detector (left) is based on time-delay units τ and multiplications M.
The detector on the right is based on the ratio of temporal and spatial derivatives of the image intensity¹.

¹ The '−' just indicates spatial derivatives by discrete differences.

A local model of optical flow

A common assumption on optical flow is that the image brightness I(x, y, t) at a point (x, y) and at time t should only change because of object motion, i.e., the total time derivative is zero, leading to the brightness-change constraint equation (BCCE)

dI/dt = (∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0,    (12)

which can be written as

∇xy I^T v + It = 0    (13)

since v = (vx, vy) = (dx/dt, dy/dt). ∇xy I is the spatial gradient of I and It the temporal derivative. Another way of formulating the same constraint is to require that all changes in intensity are due to translations only, i.e., that I(x, y, t) can be written as I′(x − t vx, y − t vy). Note that in this case, I has intrinsic dimension 2.

BCCE derived by approximation

We assume that the brightness of an object does not change with time. If such an object moves by dx and dy in time dt, we can approximate I(x, y, t) by a truncated Taylor-series expansion:

I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x) dx + (∂I/∂y) dy + (∂I/∂t) dt    (14)

The above assumption implies that

I(x + dx, y + dy, t + dt) = I(x, y, t),

and it follows from (14) that (∂I/∂x) dx + (∂I/∂y) dy + (∂I/∂t) dt = 0 and, as on the previous slide, that

∇xy I^T v + It = 0.


How to solve the BCCE

Because the BCCE provides only one equation for two unknowns, we sum a norm of the BCCE over a local neighborhood (assuming that the flow is constant there!) to obtain more constraints and search for the velocity that minimizes the term

v = arg min_v (h ∗ (∇xy I^T v + It)²) = arg min_v (E),    (15)

where h is a convolution kernel.
Minimization with standard least squares (the partial derivatives of E with respect to vx and vy must equal zero) leads to the solution

v = −A⁻¹ b    (16)

with A = h ∗ [Ix², Ix Iy; Ix Iy, Iy²] and b = h ∗ (Ix It, Iy It)^T.

So, v is obtained with the BCCE and local weighted least squares. No solution exists if A cannot be inverted, i.e., if det(A) = 0.
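
Below is a small numpy/scipy sketch of this local least-squares estimator (a Lucas-Kanade-style solver, not the original course code); the Gaussian kernel for h and the simple derivative filters are assumed choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_flow(frame1, frame2, sigma=3.0):
    """Local least-squares flow from the BCCE (Eqs. 15-16); h is a Gaussian here."""
    Ix = np.gradient(frame1, axis=1)          # simple derivative estimates; the choice
    Iy = np.gradient(frame1, axis=0)          # of kernels matters in practice
    It = frame2 - frame1

    # Blur the products of derivatives with the kernel h
    Axx = gaussian_filter(Ix * Ix, sigma)
    Axy = gaussian_filter(Ix * Iy, sigma)
    Ayy = gaussian_filter(Iy * Iy, sigma)
    bx = gaussian_filter(Ix * It, sigma)
    by = gaussian_filter(Iy * It, sigma)

    # Solve v = -A^(-1) b per pixel; mask pixels where A is (nearly) singular
    det = Axx * Ayy - Axy ** 2
    valid = det > 1e-6
    safe_det = np.where(valid, det, 1.0)
    vx = np.where(valid, -(Ayy * bx - Axy * by) / safe_det, 0.0)
    vy = np.where(valid, -(Axx * by - Axy * bx) / safe_det, 0.0)
    return vx, vy, valid

# Example: a smooth random pattern shifted one pixel to the right
rng = np.random.default_rng(0)
f1 = gaussian_filter(rng.random((64, 64)), 2.0)
f2 = np.roll(f1, 1, axis=1)
vx, vy, valid = local_flow(f1, f2)
print(np.median(vx[valid]), np.median(vy[valid]))   # roughly (1, 0) for this shift
```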

Tensor methods

The image shows a movie, and an (x, t) section thereof, in which a cloud pattern moves rightwards. In the (x, y, t) space, the motion generates a direction of constant brightness r, which is related to v by v = (r1, r2)/r3.
Since r is perpendicular to the gradient ∇I = (Ix, Iy, It), r can be found as

r = arg min_{r, r^T r = 1} ||E||²,   where   ||E||² = h ∗ (r^T ∇I ∇I^T r).    (17)


Motion and structure tensor

Under the assumption that r (and thus v) is constant (at least in the region defined by h), we obtain

||E||² = r^T (h ∗ (∇I ∇I^T)) r = r^T J r    (18)

where J is our well-known structure tensor

J = h ∗ [Ix², Ix Iy, Ix It; Ix Iy, Iy², Iy It; Ix It, Iy It, It²].    (19)

Thus, our problem can be solved by minimizing r^T J r under the additional constraint r^T r = 1 (to avoid the solution r = 0).
By using Lagrange multipliers, one obtains the system of equations Jr = λr and ||E||² = r^T J r = r^T λ r = λ for the minimizing r.
So, the minimum is reached if r is the eigenvector that corresponds to the minimum eigenvalue of J.

The need for confidence measures

But how do we know that our motion model was correct?

In the left panel, the motions of overlaid gratings generate a plaid pattern that does not have a unique direction of motion (intrinsic dimension = 3).
In the right panel, the motion of one straight grating generates a plane of constant image intensity (intrinsic dimension = 1).

The optical flow cannot be estimated if there is no defined direction of constant brightness, i.e., it can only be estimated if the intrinsic dimension = 2.

Confidence measures based on the eigenvalues of J

It is often more difficult to detect motion with good confidence than to estimate the motion parameters themselves.

If λ1 ≥ λ2 ≥ λ3 are the eigenvalues of J, the following confidence measures can be defined.
Total coherence:
ct = ((λ1 − λ3)/(λ1 + λ3))²    (20)
Spatial coherence:
cs = ((λ1 − λ2)/(λ1 + λ2))²    (21)
Corner measure:
cc = ct − cs    (22)


Motion estimation with J

Compute partial derivatives with respect to x, y, and t (the kernels used to estimate the derivatives are important)
Compute the products Ix Iy, ...
Blur the products with the convolution kernel h
Estimate the eigenvalues of the structure tensor J
Based on the eigenvalues, define confidence measures that are related to the conditioning of J
If the i2D confidence is high (one eigenvalue is small and the other two large), compute the eigenvector to the minimum eigenvalue
Obtain the motion parameters as the first two components of the above eigenvector divided by the last component; a sketch of these steps is given below
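
A compact numpy/scipy sketch of this pipeline (a slow, loop-based illustration rather than an efficient implementation); the Gaussian kernel for h and the numerical thresholds are assumed choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_flow(volume, sigma=3.0):
    """Flow and confidence from an (y, x, t) volume via the structure tensor J (Eqs. 19-22)."""
    Iy, Ix, It = np.gradient(volume)                   # partial derivatives (kernel choice matters)
    blur = lambda a, b: gaussian_filter(a * b, sigma)  # blur the products with the kernel h
    Jc = {'xx': blur(Ix, Ix), 'xy': blur(Ix, Iy), 'xt': blur(Ix, It),
          'yy': blur(Iy, Iy), 'yt': blur(Iy, It), 'tt': blur(It, It)}

    h, w, t = volume.shape
    t0 = t // 2                                        # analyse the middle frame
    flow = np.zeros((h, w, 2))
    conf = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            J = np.array([[Jc['xx'][i, j, t0], Jc['xy'][i, j, t0], Jc['xt'][i, j, t0]],
                          [Jc['xy'][i, j, t0], Jc['yy'][i, j, t0], Jc['yt'][i, j, t0]],
                          [Jc['xt'][i, j, t0], Jc['yt'][i, j, t0], Jc['tt'][i, j, t0]]])
            lam, vec = np.linalg.eigh(J)               # ascending: lam[0] <= lam[1] <= lam[2]
            l1, l2, l3 = lam[2], lam[1], lam[0]
            ct = ((l1 - l3) / (l1 + l3 + 1e-12)) ** 2  # total coherence, Eq. (20)
            cs = ((l1 - l2) / (l1 + l2 + 1e-12)) ** 2  # spatial coherence, Eq. (21)
            conf[i, j] = ct - cs                       # corner (i2D) measure, Eq. (22)
            r = vec[:, 0]                              # eigenvector of the minimum eigenvalue
            if abs(r[2]) > 1e-6:
                flow[i, j] = r[:2] / r[2]              # v = (r1, r2) / r3
    return flow, conf
```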


The minors of J
The minors Mij of J are the determinants of the matrices obtained from J by eliminating the row 4 − i and the column 4 − j; for example, M11 = (h ∗ Ix²)(h ∗ Iy²) − (h ∗ (Ix Iy))².
Fact: If a matrix has a single zero eigenvalue, the corresponding eigenvector can be evaluated in terms of the minors of that matrix.
Based on this fact, one can show [1] that if a pattern moves with constant velocity v, i.e., I(x, y, t) = I′(x − t vx, y − t vy), we have:

v = (M31, −M21)/M11 = (M32, −M22)/M12 = (M33, −M23)/M13.    (23)

In other words, if our motion model is valid, the 3 expressions above are equal and equal to v.

Motion estimation with the minors of J

Compute partial derivatives with respect to x, y, and t (the kernels used to estimate the derivatives are important)
Compute the products Ix Iy, ...
Blur the products with the convolution kernel h
Compute the minor M11 of J
Stop if the minor M11 is below a threshold (indicating an aperture problem)
Otherwise compute the 3 different motion vectors based on Eq. (23)
If the 3 vectors are similar, take the mean of the 3 vectors as the final result
Otherwise consider the confidence to be too low (indicating occlusions, noise, ...); a sketch of this procedure follows below
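
A minimal numpy sketch of the minors-based estimate for a single structure tensor J (assumed to be built from the blurred products as above); the thresholds are arbitrary and divisions by M12 or M13 are not guarded.

```python
import numpy as np

def minors_velocity(J, m11_threshold=1e-6, agreement=0.1):
    """Velocity from a 3x3 structure tensor J via its minors (Eq. 23).
    J holds the blurred derivative products in the order (x, y, t)."""
    def minor(i, j):
        # M_ij: delete row 4-i and column 4-j (1-based) of J and take the determinant
        rows = [r for r in range(3) if r != 3 - i]
        cols = [c for c in range(3) if c != 3 - j]
        return np.linalg.det(J[np.ix_(rows, cols)])

    if minor(1, 1) < m11_threshold:          # aperture problem: no unique velocity
        return None
    candidates = [np.array([minor(3, k), -minor(2, k)]) / minor(1, k) for k in (1, 2, 3)]
    if max(np.linalg.norm(c - candidates[0]) for c in candidates) > agreement:
        return None                          # low confidence (occlusions, noise, ...)
    return np.mean(candidates, axis=0)
```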

Correlation-based methods

The motion vector is approximated by

v = s(x, y) / (t2 − t1),    (24)

where s(x, y) is the displacement that yields the best match between two image regions in two consecutive frames.
The best match can be determined in 2 ways:
by maximizing the cross-correlation function

c(x, s) = (h ∗ (I(x′, t1) I(x′ − s, t2))) / √((h ∗ I²(x′, t1))(h ∗ I²(x′ − s, t2))),   x = (x, y)    (25)

by minimizing the distance function

d(x, s) = h ∗ (I(x′, t1) − I(x′ − s, t2))².    (26)


Block matching
In practice, correlation-based methods are most often implemented
by using block-matching techniques:
Subdivide every image into square blocks
Find one displacement vector for each block
Within a search range, find a best match that minimizes an error measure such as SSD or SAD:

SSD = Σ_block (It2(x, y) − It1(x + sx, y + sy))²
SAD = Σ_block |It2(x, y) − It1(x + sx, y + sy)|
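
A small numpy sketch of full-search block matching with the SAD measure (not the original course code); the block size and search range are arbitrary choices.

```python
import numpy as np

def block_matching(prev, curr, block=8, search=4):
    """Full-search block matching: one displacement per block, best SAD match."""
    H, W = curr.shape
    flow = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            target = curr[by:by + block, bx:bx + block]
            best, best_s = np.inf, (0, 0)
            for sy in range(-search, search + 1):
                for sx in range(-search, search + 1):
                    y0, x0 = by + sy, bx + sx
                    if y0 < 0 or x0 < 0 or y0 + block > H or x0 + block > W:
                        continue
                    cand = prev[y0:y0 + block, x0:x0 + block]
                    sad = np.abs(target - cand).sum()       # SAD error measure
                    if sad < best:
                        best, best_s = sad, (sx, sy)
            flow[by // block, bx // block] = best_s
    return flow

# Example: a random frame shifted two pixels to the right
rng = np.random.default_rng(1)
f1 = rng.random((32, 32))
f2 = np.roll(f1, 2, axis=1)
print(block_matching(f1, f2)[1, 1])   # roughly (-2, 0): the best match in f1 lies 2 px to the left
```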


Block matching: efficient search

The following search options can be used:
Full search
computationally expensive
highly regular, can run in parallel
Successive elimination
speeds up matching significantly
Hierarchical block matching
reduces search space
handles large displacements


Block matching: successive elimination


The method is based on the following (triangle) inequality related to the SAD measure (SSD in analogy):

SAD = Σ_block |It2(x, y) − It1(x + sx, y + sy)| ≥ |Σ_block (It2(x, y) − It1(x + sx, y + sy))| = |Σ_block It2(x, y) − Σ_block It1(x + sx, y + sy)|

Based on the above relation, the strategy is to
1. Compute partial sums for blocks in the current and previous frame
2. Compare blocks based on the partial sums
3. Omit the full block comparison if the partial sums indicate a worse error measure than the previous best result (note that the initial estimate is important); a sketch follows below
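
A minimal sketch of this strategy for a single block (block sums are recomputed here instead of being precomputed, which a real implementation would avoid); the zero displacement serves as the initial estimate.

```python
import numpy as np

def sad(a, b):
    return np.abs(a - b).sum()

def match_with_elimination(prev, curr, by, bx, block=8, search=4):
    """Successive elimination for one block: skip candidates whose block-sum
    difference already exceeds the best SAD found so far (the lower bound above)."""
    target = curr[by:by + block, bx:bx + block]
    target_sum = target.sum()
    # Initial estimate (important!): the zero displacement
    best_s, best = (0, 0), sad(target, prev[by:by + block, bx:bx + block])
    for sy in range(-search, search + 1):
        for sx in range(-search, search + 1):
            y0, x0 = by + sy, bx + sx
            if y0 < 0 or x0 < 0 or y0 + block > prev.shape[0] or x0 + block > prev.shape[1]:
                continue
            cand = prev[y0:y0 + block, x0:x0 + block]
            if abs(target_sum - cand.sum()) >= best:
                continue                      # lower bound already worse: skip the full SAD
            s = sad(target, cand)
            if s < best:
                best, best_s = s, (sx, sy)
    return best_s, best
```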

Hierarchical block-matching

Strategy: Start to match at a coarse level and then search around the coarse estimate at finer levels. This strategy reduces the search space and can handle large displacements; a coarse-to-fine sketch is given below.
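
A rough coarse-to-fine sketch (not from the slides) using a simple averaging pyramid; the block size, search range, and number of levels are arbitrary, and boundary handling is minimal.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (a simple pyramid level)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def hierarchical_match(prev, curr, by, bx, block=8, search=2, levels=3):
    """Coarse-to-fine matching for one block: estimate the displacement at the
    coarsest level, then refine it (doubled) with a small search at each finer level."""
    pyramid = [(prev, curr)]
    for _ in range(levels - 1):
        pyramid.append((downsample(pyramid[-1][0]), downsample(pyramid[-1][1])))

    sx = sy = 0
    for lvl in reversed(range(levels)):
        p, c = pyramid[lvl]
        cy, cx = by >> lvl, bx >> lvl                  # block position at this level
        target = c[cy:cy + block, cx:cx + block]
        best, best_s = np.inf, (sx, sy)
        for dy in range(-search, search + 1):          # small search around the estimate
            for dx in range(-search, search + 1):
                y0, x0 = cy + sy + dy, cx + sx + dx
                if y0 < 0 or x0 < 0 or y0 + block > p.shape[0] or x0 + block > p.shape[1]:
                    continue
                cand = p[y0:y0 + block, x0:x0 + block]
                err = np.abs(target - cand).sum()
                if err < best:
                    best, best_s = err, (sx + dx, sy + dy)
        sx, sy = best_s
        if lvl > 0:
            sx, sy = 2 * sx, 2 * sy                    # propagate the estimate to the finer level
    return sx, sy
```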


Complex motions

Optical flow difficulties
In case of occlusions, illumination changes, multiple motions, etc., we need more complex motion models.
With our simple local motion model, only sparse flow fields can be estimated.
For dense flow fields, additional constraints are needed.

Notations

v(x, y): optical flow
u(x, y): 2D motion field
w(X, Y, Z): 3D motion field
T: unit translation vector
R: unit-vector rotation axis
I(x, y, t): movie intensity
r: 3D direction of constant intensity
J: structure tensor
λi: eigenvalues of J
Mij: minors of J


Acknowledgement and literature

Some parts of the course are based on an earlier course by T. Aach and E. Barth.

The books [4] and [3] have a good coverage of motion and are recommended for further reading.

[1] E. Barth. The minors of the structure tensor. In G. Sommer, editor, Mustererkennung 2000, pages 221–228, Berlin, 2000. Springer.

[2] V. Srinivasan, S. W. Zhang, J. S. Chahl, E. Barth, and S. Venkatesh. How honeybees make grazing landings on flat surfaces. Biological Cybernetics, 83(3):171–183, 2000.

[3] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer, Boston, 2011.

[4] Emanuele Trucco and Alessandro Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1998.
