You are on page 1of 21

Int J Comput Vis (2011) 93: 368–388

DOI 10.1007/s11263-011-0422-6
Optic Flow in Harmony
Henning Zimmer · Andrés Bruhn · Joachim Weickert
Received: 30 August 2010 / Accepted: 13 January 2011 / Published online: 26 January 2011
© Springer Science+Business Media, LLC 2011
Abstract Most variational optic flow approaches just con-
sist of three constituents: a data term, a smoothness term
and a smoothness weight. In this paper, we present an ap-
proach that harmonises these three components. We start by
developing an advanced data term that is robust under out-
liers and varying illumination conditions. This is achieved
by using constraint normalisation, and an HSV colour rep-
resentation with higher order constancy assumptions and a
separate robust penalisation. Our novel anisotropic smooth-
ness is designed to work complementary to the data term.
To this end, it incorporates directional information from the
data constraints to enable a filling-in of information solely
in the direction where the data term gives no information,
yielding an optimal complementary smoothing behaviour.
This strategy is applied in the spatial as well as in the spatio-
temporal domain. Finally, we propose a simple method for
automatically determining the optimal smoothness weight.
This method bases on a novel concept that we call “opti-
mal prediction principle” (OPP). It states that the flow field
obtained with the optimal smoothness weight allows for the
best prediction of the next frames in the image sequence.
The benefits of our “optic flowin harmony” (OFH) approach
H. Zimmer () · J. Weickert
Mathematical Image Analysis Group, Faculty of Mathematics and
Computer Science, Saarland University, Campus E1.1, 66041
Saarbrücken, Germany
e-mail: zimmer@mia.uni-saarland.de
J. Weickert
e-mail: weickert@mia.uni-saarland.de
A. Bruhn
Vision and Image Processing Group, Cluster of Excellence
Multimodal Computing and Interaction, Saarland University,
Campus E1.1, 66041 Saarbrücken, Germany
e-mail: bruhn@mmci.uni-saarland.de
are demonstrated by an extensive experimental validation
and by a competitive performance at the widely used Mid-
dlebury optic flow benchmark.
Keywords Optic flow · Variational methods · Anisotropic
smoothing · Robust penalisation · Motion tensor ·
Parameter selection
1 Introduction
Despite almost three decades of research on variational optic
flow approaches, there have been hardly any investigations
on the compatibility of their three main components: the
data term, the smoothness term and the smoothness weight.
While the data term models constancy assumptions on im-
age features, the smoothness term penalises fluctuations in
the flow field, and the smoothness weight determines the
balance between the two terms. In this paper, we present the
optic flow in harmony (OFH) method, which harmonises the
three constituents by following two main ideas:
(i) Widely-used data terms such as the one resulting from
the linearised brightness constancy assumption only
constrain the flow in one direction. However, most
smoothness terms impose smoothness also in the data
constraint direction, leading to an undesirable interfer-
ence. A notable exception is the anisotropic smoothness
term that has been proposed by Nagel and Enkelmann
(1986) and modified by Schnörr (1993). At large image
gradients, this regulariser solely smoothes the flow field
along image edges. For a basic data term modelling the
brightness constancy assumption, this smoothing direc-
tion is orthogonal to the data constraint direction and
thus both terms complement each other in an optimal
Int J Comput Vis (2011) 93: 368–388 369
manner. Unfortunately, this promising concept of com-
plementarity between data and smoothness term has not
been further investigated. Our paper revives this con-
cept for state-of-the-art optic flow models by presenting
a novel complementary smoothness term in conjunction
with an advanced data term.
(ii) Having adjusted the smoothing behaviour to the im-
posed data constraints, it remains to determine the op-
timal balance between the two terms for the image se-
quence under consideration. This comes down to select-
ing an appropriate smoothness weight, which is usually
considered a difficult task. We propose a method that
is easy to implement for all variational optic flow ap-
proaches and nevertheless gives surprisingly good re-
sults. It bases on the assumption that the flow estimate
obtained by an optimal smoothness weight allows for
the best possible prediction of the next frames in the
image sequence. This novel concept we name optimal
prediction principle (OPP).
1.1 Related Work
In the first ten years of research on optic flow, several basic
strategies have been considered, e.g. phase-based methods
(Fleet and Jepson 1990), local methods (Lucas and Kanade
1981; Bigün et al. 1991) and energy-based methods (Horn
and Schunck 1981). In recent years, the latter class of meth-
ods became increasingly popular, mainly due to their poten-
tial for giving highly accurate results. Within energy-based
methods, one can distinguish discrete approaches that min-
imise a discrete energy function and are often probabilisti-
cally motivated, and variational approaches that minimise a
continuous energy functional.
Our variational approach naturally incorporates concepts
that have proven their benefits over the years. In the follow-
ing, we briefly review advances in the design of data and
smoothness terms that are influential for our work.
Data Terms To cope with outliers caused by noise or oc-
clusions, Black and Anandan (1996) replaced the quadratic
penalisation from Horn and Schunck (1981) by a robust sub-
quadratic penaliser.
To obtain robustness under additive illumination changes,
Brox et al. (2004) combined the classical brightness con-
stancy assumption of Horn and Schunck (1981) with the
higher-order gradient constancy assumption (Tretiak and
Pastor 1984; Schnörr 1994). Bruhn and Weickert (2005) im-
proved this idea by a separate robust penalisation of bright-
ness and gradient constancy assumption. This gives advan-
tages if one of the two constraints produces an outlier. Re-
cently, Xu et al. (2010) went a step further and estimate
a binary map that locally selects between imposing either
brightness or gradient constancy. An alternative to higher-
order constancy assumptions can be to preprocess the im-
ages by a structure-texture decomposition (Wedel et al.
2008).
In addition to additive illumination changes, realistic sce-
narios also encompass a multiplicative part (van de Weijer
and Gevers 2004). For colour image sequences, this issue
can be tackled by normalising the colour channels (Golland
and Bruckstein 1997), or by using alternative colour spaces
with photometric invariances (Golland and Bruckstein 1997;
van de Weijer and Gevers 2004; Mileva et al. 2007). If one is
restricted to greyscale sequences, using log-derivatives (Mil-
eva et al. 2007) can be useful.
A further successful modification of the data term has
been reported by performing a constraint normalisation (Si-
moncelli et al. 1991; Lai and Vemuri 1998; Schoenemann
and Cremers 2006). It prevents an undesirable overweight-
ing of the data term at large image gradient locations.
Smoothness Terms First ideas go back to Horn and Schunck
(1981) who used a homogeneous regulariser that does not re-
spect any flow discontinuities. Since different image objects
may move in different directions, it is, however, desirable to
also permit discontinuities.
This can for example be achieved by using image-driven
regularisers that take into account image discontinuities. Al-
varez et al. (1999) proposed an isotropic model with a scalar-
valued weight function that reduces the regularisation at im-
age edges. An anisotropic counterpart that also exploits the
directional information of image discontinuities was intro-
duced by Nagel and Enkelmann (1986). Their method regu-
larises the flow field along image edges but not across them.
As noted by Xiao et al. (2006), this comes down to con-
volving the flow field with an oriented Gaussian where the
orientation is adapted to the image edges.
Note that for a basic data term modelling the brightness
constancy assumption, the image edge direction coincides
with the complementary direction orthogonal to the data
constraint direction. In such cases, the regularisers of Nagel
and Enkelmann (1986) and Schnörr (1993) can thus be con-
sidered as early complementary smoothness terms.
Of course, not every image edge will coincide with a
flow edge. Thus, image-driven strategies are prone to give
oversegmentation artefacts in textured image regions. To
avoid this, flow-driven regularisers have been proposed that
respect discontinuities of the evolving flow field and are
therefore not misled by image textures. In the isotropic set-
ting this comes down to the use of robust, subquadratic pe-
nalisers which are closely related to line processes (Blake
and Zisserman 1987). For energy-based optic flow meth-
ods, such a strategy was used by Shulman and Hervé (1989)
and Schnörr (1994). An anisotropic extension was later pre-
sented by Weickert and Schnörr (2001a). The drawback of
370 Int J Comput Vis (2011) 93: 368–388
flow-driven regularisers lies in less well-localised flowedges
compared to image-driven approaches.
Concerning the individual problems of image- and flow-
driven strategies, the idea arises to combine the advantages
of both worlds. This goal was first achieved in the discrete
method of Sun et al. (2008). There, the authors developed
an anisotropic regulariser that uses directional flow deriva-
tives steered by image structures. This allows to adapt the
smoothing direction to the direction of image structures and
the smoothing strength to the flow contrast. We call such a
strategy image- and flow-driven regularisation. It combines
the benefits of image- and flow-driven methods, i.e. sharp
flow edges without oversegmentation problems.
The smoothness terms discussed so far only assume
smoothness of the flow field in the spatial domain. As im-
age sequences usually consist of more than two frames,
yielding more than one flow field, it makes sense to also
assume temporal smoothness of the flow fields. This leads
to spatio-temporal smoothness terms. In a discrete setting
they go back to Murray and Buxton (1987). For varia-
tional approaches, an image-driven spatio-temporal smooth-
ness terms was proposed by Nagel (1990) and a flow-driven
counterpart was later presented by Weickert and Schnörr
(2001b). The latter smoothness term was later successfully
used in the methods of Brox et al. (2004) and Bruhn and
Weickert (2005).
Recently, non-local smoothing strategies (Yaroslavsky
1985) have been introduced to the optic flow community by
Sun et al. (2010) and Werlberger et al. (2010). In these ap-
proaches, it is assumed that the flow vector at a certain pixel
is similar to the vectors in a (possibly large) spatial neigh-
bourhood. Adapting ideas from Yoon and Kweon (2006),
the similarity to the neighbours is weighted by a bilateral
weight that depends on the spatial as well as on the colour
value distance of the pixels. Due to the colour value dis-
tance, these strategies can be classified as image-driven ap-
proaches and are thus also prone to oversegmentation prob-
lems. However, comparing non-local strategies to the pre-
viously discussed smoothness terms is somewhat difficult:
Whereas non-local methods explicitly model similarity in a
certain neighbourhood, the previous smoothness terms oper-
ate on flow derivatives that only consider their direct neigh-
bours. Nevertheless, the latter strategies model a globally
smooth flow field as each pixel communicates with each
other pixel through its neighbours.
Automatic Parameter Selection It is well-known that an
appropriate choice of the smoothness weight is crucial for
obtaining favourable results. Nevertheless, there has been
remarkably little research on methods that automatically es-
timate the optimal smoothness weight or other model para-
meters.
Concerning an optimal selection of the smoothness
weight for variational optic flow approaches, Ng and Solo
(1997) proposed an error measure which can be estimated
from the image sequence and the flow estimate only. Using
this measure, a brute-force search for the smoothness weight
that gives the smallest error is performed. Computing the
proposed error measure is, however, computationally expen-
sive, especially for robust data terms. Ng and Solo (1997)
hence restricted their focus to the basic method of Horn
and Schunck (1981). In a Bayesian framework, a parame-
ter selection approach that can also handle robust data terms
was presented by Krajsek and Mester (2007). This method
jointly estimates the flow and the model parameters where
the latter encompass the smoothness weight and also the
relative weights of different data terms. This method does
not require a brute-force search, but the minimisation of the
objective function is more complicated and only computa-
tionally feasible if certain approximations are performed.
1.2 Our Contributions
The OFH method is obtained in three steps. We first de-
velop a robust and invariant data term. Then an anisotropic
image- and flow-driven smoothness term is designed that
works complementary to the data term. Finally we propose
a simple method for automatically determining the optimal
smoothness weight for the given image sequence.
Our data term combines the brightness and the gradient
constancy assumption, and performs a constraint normali-
sation. It further uses a Hue-Saturation-Value (HSV) colour
representation with a separate robustification of each chan-
nel. The latter is motivated by the fact that each channel in
the HSV space has a distinct level of photometric invariance
and information content. Hence, a separate robustification
allows to choose the most reliable channel at each position.
Our anisotropic complementary smoothness term takes
into account directional information fromthe constraints im-
posed in the data term. Across “constraint edges”, we per-
form a robust penalisation to reduce the smoothing in the
direction where the data term gives the most information.
Along constraint edges, where the data term gives no in-
formation, a strong filling-in by using a quadratic penalisa-
tion makes sense. This strategy not only allows for an op-
timal complementarity between data and smoothness term,
but also leads to a desirable image- and flow-driven behav-
iour. We further show that our regulariser can easily be ex-
tended to work in the spatio-temporal domain.
Our method for determining the optimal smoothness
weight bases on the proposed OPP concept. This results
in finding the optimal smoothness weight as the one cor-
responding to the flow field with the best prediction quality.
To judge the latter, we evaluate the data constraints between
the first and the third frame of the sequence. Assuming a
constant speed and linear trajectory of objects within the
considered frames, this can be realised by simply doubling
Int J Comput Vis (2011) 93: 368–388 371
the flow vectors. Due to its simplicity, our method is easy
to implement for all variational optic flow approaches, but
nevertheless produces surprisingly good results.
The present article extends our shorter conference paper
(Zimmer et al. 2009) by the following points: (i) A more
extensive derivation and discussion of the data term. (ii)
An explicit discussion on the adequate treatment of the hue
channel of the HSV colour space. (iii) A taxonomy of ex-
isting smoothness terms within a novel general framework.
The latter allows to reformulate most existing as well as our
novel regulariser in a common notation. (iv) The extension
of our complementary regulariser to the spatio-temporal do-
main. (v) A simple method for automatically selecting the
smoothness weight. (vi) A deeper discussion of implemen-
tation issues. (vii) Amore extensive experimental validation.
Organisation In Sect. 2 we present our variational optic
flowmodel with the robust data termand the complementary
smoothness term. The latter is then extended to the spatio-
temporal domain. Section 3 describes the method proposed
for determining the smoothness weight. After discussing
implementation issues in Sect. 4, we show experiments in
Sect. 5. The paper is finished with conclusions and an out-
look on future work in Sect. 6.
2 Variational Optic Flow
Let f (x) be a greyscale image sequence where x :=
(x, y, t )

. Here, the vector (x, y)

∈ Ω denotes the location
within a rectangular image domain Ω ⊂ R
2
, and t ∈ [0, T ]
denotes time. We further assume that f is presmoothed by a
Gaussian convolution: Given an image sequence f
0
(x), we
obtain f (x) =(K
σ
∗ f
0
)(x), where K
σ
is a spatial Gaussian
of standard deviation σ and ∗ denotes the convolution oper-
ator. This presmoothing step helps to reduce the influence of
noise and additionally makes the image sequence infinitely
many times differentiable, i.e. f ∈ C

.
The optic flow field w := (u, v, 1)

describes the dis-
placement vector field between two frames at time t and
t +1. It is found by minimising a global energy functional
of the general form
E(u, v) =
_
Ω
_
M(u, v) +α V(∇
2
u, ∇
2
v)
_
dx dy, (1)
where ∇
2
:=(∂
x
, ∂
y
)

denotes the spatial gradient operator.
The term M(u, v) denotes the data term, V(∇
2
u, ∇
2
v) the
smoothness term, and α > 0 is a smoothness weight. Note
that the energy (1) refers to the spatial case where one com-
putes one flow field between two frames at time t and t +1.
The more general spatio-temporal case that uses all frames
t ∈ [0, T ] will be presented in Sect. 2.3.
According to the calculus of variations (Elsgolc 1962),
a minimiser (u, v) of the energy (1) necessarily has to fulfil
the associated Euler-Lagrange equations

u
M −α
_

x
_

u
x
V
_
+∂
y
_

u
y
V
__
= 0, (2)

v
M −α
_

x
_

v
x
V
_
+∂
y
_

v
y
V
__
= 0 (3)
with homogeneous Neumann boundary conditions.
2.1 Data Term
Let us now derive our data term in a systematic way. The
classical starting point is the brightness constancy assump-
tion used by Horn and Schunck (1981). It states that im-
age intensities remain constant under their displacement,
i.e. f (x + w) = f (x). Assuming that the image sequence
is smooth (which can be guaranteed by the Gaussian pres-
moothing) and that the displacements are small, we can per-
form a first-order Taylor expansion that yields the linearised
optic flow constraint (OFC)
0 =f
x
u +f
y
v +f
t
=∇

3
f w, (4)
where ∇
3
:= (∂
x
, ∂
y
, ∂
t
)

is the spatio-temporal gradient
operator and subscripts denote partial derivatives. With a
quadratic penalisation, the corresponding data term is given
by
M
1
(u, v) =
_

3
f w
_
2
=w

J
0
w, (5)
with the tensor
J
0
:=∇
3
f ∇

3
f. (6)
The single equation given by the OFC involves two un-
knowns u and v. It is thus not sufficient to compute a unique
solution, which is known as the aperture problem (Bertero
et al. 1988). Nevertheless, the OFC does allow to compute
the flowcomponent orthogonal to image edges, the so-called
normal flow. For |∇
2
f | =0 it is defined as
w
n
:=
_
u

n
, 1
_

:=
_

f
t
|∇
2
f |

2
f
|∇
2
f |
, 1
_

. (7)
Normalisation Our experiments will show that normalis-
ing the data term can be beneficial. Following Simoncelli
et al. (1991), Lai and Vemuri (1998), Schoenemann and Cre-
mers (2006) and using the abbreviation u := (u, v)

, we
rewrite the data term M
1
as
M
1
(u, v) =
_

2
f u +f
t
_
2
=
_
|∇
2
f |
_

2
f u
|∇
2
f |
+
f
t
|∇
2
f |
__
2
372 Int J Comput Vis (2011) 93: 368–388
Fig. 1 Geometric interpretation of the rewritten data term (8)
= |∇
2
f |
2
_

2
f
|∇
2
f |
_
u +
f
t

2
f
|∇
2
f |
2
__
2
= |∇
2
f |
2
_

2
f
|∇
2
f |
_
u −u
n
_
. ,, .
=:d
_
2
. (8)
The term d constitutes a projection of the difference between
the estimated flow u and the normal flow u
n
in the direction
of the image gradient ∇
2
f . In a geometric interpretation,
the term d describes the distance from u to the line l in the
uv-space that is given by
v =−
f
x
f
y
u −
f
t
f
y
. (9)
On this line, the flow u has to lie according to the OFC (4),
and the normal flow u
n
is the smallest vector that lies on
this line. A sketch of this situation is given in Fig. 1. Our
geometric interpretation suggests that one should ideally pe-
nalise the distance d in a data term M
2
(u, v) =d
2
. The data
term M
1
, however, weighs this distance by the squared spa-
tial image gradient, as M
1
(u, v) = |∇
2
f |
2
d
2
, see (8). This
results in a stronger enforcement of the data constraint at
high gradient locations. This overweighting is undesirable as
large gradients can be caused by unreliable structures, such
as noise or occlusions.
As a remedy, we normalise the data term M
1
by multiply-
ing it with a factor (Simoncelli et al. 1991; Lai and Vemuri
1998)
θ
0
:=
1
|∇
2
f |
2

2
, (10)
where the regularisation parameter ζ > 0 avoids division by
zero. Additionally, it reduces the influence of small gradients
which are significantly smaller than ζ
2
, e.g. noise in flat re-
gions. Nevertheless, the normalisation is not influenced for
large gradients. Thus, it may pay off to choose a larger value
of ζ in the presence of noise. Incorporating the normalisa-
tion into M
1
leads to the data term
M
2
(u, v) =w

¯
J
0
w, (11)
with the normalised tensor
¯
J
0
:=θ
0
J
0

0
_

3
f ∇

3
f
_
. (12)
Gradient Constancy Assumption To render the data term
robust under additive illumination changes, it was proposed
to impose the gradient constancy assumption (Tretiak and
Pastor 1984; Schnörr 1994; Brox et al. 2004). In contrast
to the brightness constancy assumption, it states that im-
age gradients remain constant under their displacement, i.e.

2
f (x +w) =∇
2
f (x). A Taylor linearisation gives

3
f
x
w=0 and ∇

3
f
y
w=0, (13)
respectively. Combining both brightness and gradient con-
stancy assumption in a quadratic way gives the data term
M
3
(u, v) =w

Jw, (14)
where the tensor J can be written in the motion tensor no-
tation of Bruhn et al. (2006) that allows to combine the two
constancy assumptions in a joint tensor
J := J
0
+γ J
xy
:=J
0

_
J
x
+J
y
_
:= ∇
3
f ∇

3
f +γ
_

3
f
x

3
f
x
+∇
3
f
y

3
f
y
_
, (15)
where the parameter γ > 0 steers the contribution of the gra-
dient constancy assumption.
To normalise M
3
, we replace the motion tensor J by its
normalised counterpart
¯
J :=
¯
J
0

¯
J
xy
:=
¯
J
0

_
¯
J
x
+
¯
J
y
_
:= θ
0
J
0

_
θ
x
J
x

y
J
y
_
:= θ
0
_

3
f ∇

3
f
_

_
θ
x
_

3
f
x

3
f
x
_

y
_

3
f
y

3
f
y
__
, (16)
with two additional normalisation factors defined as
θ
x
:=
1
|∇
2
f
x
|
2

2
and θ
y
:=
1
|∇
2
f
y
|
2

2
. (17)
The normalised data term M
4
is given by
M
4
(u, v) =w

¯
Jw. (18)
Colour Image Sequences In a next step we extend our data
term to multi-channel sequences (f
1
(x), f
2
(x), f
3
(x)). If
Int J Comput Vis (2011) 93: 368–388 373
one uses the standard RGB colour space, the three channels
represent the red, green and blue channel, respectively. We
couple the three colour channels in the motion tensor
¯
J
c
:=
3

i=1
¯
J
i
:=
3

i=1
_
¯
J
i
0

¯
J
i
xy
_
:=
3

i=1
_
¯
J
i
0

_
¯
J
i
x
+
¯
J
i
y
__
:=
3

i=1
_
θ
i
0
_

3
f
i

3
f
i
_

_
θ
i
x
_

3
f
i
x

3
f
i
x
_

i
y
_

3
f
i
y

3
f
i
y
___
, (19)
with normalisation factors θ
i
for each colour channel f
i
.
The corresponding data term reads as
M
5
(u, v) =w

¯
J
c
w. (20)
Photometric Invariant Colour Spaces Realistic illumina-
tion models encompass a multiplicative influence (van de
Weijer and Gevers 2004), which cannot be captured by the
gradient constancy assumption that is only invariant under
additive illumination changes. This problem can be tackled
by using the Hue Saturation Value (HSV) colour space, as
proposed by Golland and Bruckstein (1997). The hue chan-
nel is invariant under global and local multiplicative illu-
mination changes, as well as under local additive changes.
The saturation channel is only invariant under global mul-
tiplicative illumination changes, and the value channel ex-
hibits no invariances. Mileva et al. (2007) thus only used
the hue channel for optic flow computation as it exhibits the
most invariances. We will additionally use the saturation and
value channel, because they contain information that is not
encoded in the hue channel.
As an example, consider the HSV decomposition of the
Rubberwhale image shown in Fig. 2. As we can see, the
shadow at the left of the wheel (shown in the zoom) is not
present in the hue and the saturation channel, but appears
in the value channel. Nevertheless, especially the hue chan-
nel discards a lot of image information, as can be observed
for the striped cloth. This information is, on the other hand,
available in the value channel.
One problem when using a HSV colour representation is
that the hue channel f
1
describes an angle in a colour circle,
i.e. f
1
∈ [0

, 360

). The hue channel is hence not differen-
tiable at the interface between 0

and 360

. Our solution to
this problem is to consider the unit vector (cos f
1
, sinf
1
)

corresponding to the angle f
1
. This results in treating the
hue channel as two (coupled) channels, which are both dif-
ferentiable. The corresponding motion tensor for the bright-
Fig. 2 HSV decomposition on the example of the Rubberwhale im-
age from the Middlebury optic flow database (Baker et al. 2009,
http://vision.middlebury.edu/flow/data/). First row, from left to right:
(a) Colour image with zoom in shadow region left of the wheel. (b)
Hue channel, visualised with full saturation and value. Second row,
from left to right: (c) Saturation channel. (d) Value channel
ness constancy assumption consequently reads as
¯
J
1
0
:= θ
1
0
_

3
cos f
1

3
cos f
1
+∇
3
sinf
1

3
sinf
1
_
,
(21)
where the normalisation factor is defined as
θ
1
0
:=
1
¸
¸

2
cos f
1
¸
¸
2
+
¸
¸

2
sinf
1
¸
¸
2

2
. (22)
The tensor
¯
J
1
xy
for the gradient constancy assumption is
adapted accordingly. Note that in the differentiable parts of
the hue channel, the motion tensor (21) is equivalent to our
earlier definition, as

3
cos f
1

3
cos f
1
+∇
3
sinf
1

3
sinf
1
=sin
2
f
1
_
¯

3
f
1
¯

3
f
1
_
+cos
2
f
1
_
¯

3
f
1
¯

3
f
1
_
=
¯

3
f
1
¯

3
f
1
, (23)
where
¯
∇ denotes the gradient in the differentiable parts of
the hue channel.
Robust Penalisers To provide robustness of the data term
against outliers caused by noise and occlusions, Black and
Anandan (1996) proposed to refrain from a quadratic penali-
sation. Instead they use a subquadratic penalisation function

M
(s
2
), where s
2
denotes the quadratic data term. Using
such a robust penaliser within our data term yields
M
6
(u, v) =
M
_
w

¯
J
c
w
_
. (24)
Good results are reported by Brox et al. (2004) for the sub-
quadratic penaliser
M
(s
2
) :=

s
2

2
using a small reg-
ularisation parameter ε > 0.
374 Int J Comput Vis (2011) 93: 368–388
Bruhn and Weickert (2005) later proposed a separate
penalisation of the brightness and the gradient constancy
assumption, which is advantageous if one assumption pro-
duces an outlier. Incorporating this strategy into our ap-
proach gives the data term
M
7
(u, v) =
M
_
w

¯
J
c
0
w
_

M
_
w

¯
J
c
xy
w
_
, (25)
where the separate motion tensors are defined as
¯
J
c
0
:=
3

i=1
¯
J
i
0
and
¯
J
c
xy
:=
3

i=1
¯
J
i
xy
. (26)
We will go further by proposing a separate robustification of
each colour channel in the HSV space. This can be justified
by the distinct information content of each of the three chan-
nels, see Fig. 2, that drives the optic flow estimation in dif-
ferent ways. The separate robustification then downweights
the influence of less appropriate colour channels.
Final Data Term Incorporating our separate robustifica-
tion idea into M
7
brings us to our final data term
M(u, v) =
3

i=1

M
_
w

¯
J
i
0
w
_

_
3

i=1

M
_
w

¯
J
i
xy
w
_
_
,
(27)
with the motion tensors
¯
J
1
adapted to the HSV colour space
as described before. Note that our final data term is (i) nor-
malised, (ii) combines the brightness and gradient constancy
assumption, and (iii) uses the HSV colour space with (iv) a
separate robustification of all colour channels.
The contributions of our data term (27) to the Euler-
Lagrange equations (2) and (3) are given by

u
M =
3

i=1
_

M
_
w

¯
J
i
0
w
_
·
__
¯
J
i
0
_
1,1
u +
_
¯
J
i
0
_
1,2
v +
_
¯
J
i
0
_
1,3
__

_
3

i=1
_

M
_
w

¯
J
i
xy
w
_
·
__
¯
J
i
xy
_
1,1
u +
_
¯
J
i
xy
_
1,2
v +
_
¯
J
i
xy
_
1,3
__
_
, (28)

v
M =
3

i=1
_

M
_
w

¯
J
i
0
w
_
·
__
¯
J
i
0
_
1,2
u +
_
¯
J
i
0
_
2,2
v +
_
¯
J
i
0
_
2,3
__

_
3

i=1
_

M
_
w

¯
J
i
xy
w
_
·
__
¯
J
i
xy
_
1,2
u +
_
¯
J
i
xy
_
2,2
v +
_
¯
J
i
xy
_
2,3
__
_
, (29)
where [J]
m,n
denotes the entry in row m and column n of
the tensor J, and

M
(s
2
) denotes the derivative of
M
(s
2
)
w.r.t. its argument. Analysing the terms (28) and (29), we see
that the separate robustification of the HSV channels makes
sense: If a specific channel violates the imposed constancy
assumption at a certain location, the corresponding argu-
ment of the decreasing function

M
will be large, yielding a
downweighting of this channel. Other channels that satisfy
the constancy assumption then have a dominating influence
on the solution. This will be confirmed by a specific experi-
ment in Sect. 5.1.
2.2 Smoothness Term
Following the extensive taxonomy on optic flow regularisers
by Weickert and Schnörr (2001a), we sketch some existing
smoothness terms that led to our novel complementary regu-
lariser. We rewrite the regularisers in a novel framework that
unifies their notation and eases their comparison.
Preliminaries for the General Framework We first intro-
duce concepts that will be used in our general framework.
Anisotropic image-driven regularisers take into account
directional information from image structures. These infor-
mation can be obtained by considering the structure tensor
(Förstner and Gülch 1987)
S
ρ
:=K
ρ

_

2
f ∇

2
f
_
=:
2

i=1
μ
i
s
i
s

i
, (30)
with an integration scale ρ > 0. The structure tensor is a
symmetric, positive semidefinite 2 ×2 matrix that possesses
two orthonormal eigenvectors s
1
and s
2
with corresponding
eigenvalues μ
1
≥ μ
2
≥ 0. The vector s
1
points across im-
age structures, whereas the vector s
2
points along them. In
the case for ρ = 0, i.e. without considering neighbourhood
information, one obtains
s
0
1
=

2
f
|∇
2
f |
and s
0
2
=


2
f
|∇
2
f |
, (31)
where ∇

2
f := (−f
y
, f
x
)

denotes the vector orthogonal
to ∇
2
f . For ρ =0 we also have that |∇
2
f |
2
=tr S
0
, with tr
denoting the trace operator.
Most regularisers impose smoothness by penalising the
magnitude of the flow gradients. As s
1
and s
2
constitute an
orthonormal basis, we can write
|∇
2
u|
2
=u
2
x
+u
2
y
=u
2
s
1
+u
2
s
2
, (32)
using the directional derivatives u
s
i
:= s

i

2
u. A corre-
sponding rewriting can also be performed for |∇
2
v|
2
.
Int J Comput Vis (2011) 93: 368–388 375
To analyse the smoothing behaviour of the regularisers,
we will consider the corresponding Euler-Lagrange equa-
tions that can be written in the form

u
M −α div(D∇
2
u) = 0, (33)

v
M −α div(D∇
2
v) = 0, (34)
with a diffusion tensor D that steers the smoothing of the
flow components u and v. More specific, the eigenvectors
of D give the smoothing direction, and the corresponding
eigenvalues determine the magnitude of smoothing.
Homogeneous Regularisation First ideas for the smooth-
ness term go back to Horn and Schunck (1981) who used a
homogeneous regulariser. In our framework it reads as
V
H
(∇
2
u, ∇
2
v) := |∇
2
u|
2
+|∇
2
v|
2
= u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
. (35)
The corresponding diffusion tensor is equal to the unit ma-
trix D
H
= I. The smoothing processes thus perform homo-
geneous diffusion that blurs important flow edges.
Image-Driven Regularisation To obtain sharp flow edges,
image-driven methods (Nagel and Enkelmann 1986; Al-
varez et al. 1999) reduce the smoothing at image edges, in-
dicated by large values of |∇
2
f |
2
=tr S
0
.
An isotropic image-driven regulariser goes back to Al-
varez et al. (1999) who used
V
II
(∇
2
u, ∇
2
v) := g
_
|∇
2
f |
2
_ _
|∇
2
u|
2
+|∇
2
v|
2
_
= g
_
tr S
0
_ _
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_
, (36)
where g is a decreasing, strictly positive weight function.
The corresponding diffusion tensor D
II
= g(tr S
0
) I shows
that the weight function allows to decrease the smoothing in
accordance to the strength of image edges.
The anisotropic image-driven regulariser of Nagel and
Enkelmann (1986) prevents smoothing of the flow field
across image boundaries but encourages smoothing along
them. This is achieved by the regulariser
V
AI
(∇
2
u, ∇
2
v) := ∇

2
uP(∇
2
f ) ∇
2
u +∇

2
v P
_

2
f
_

2
v,
(37)
where P(∇
2
f ) denotes a regularised projection matrix per-
pendicular to the image gradient. It is defined as
P(∇
2
f ) :=
1
|∇
2
f |
2
+2κ
2
_


2
f
_


2
f
_


2
I
_
. (38)
with a regularisation parameter κ > 0. In our common
framework, this regulariser can be written as
V
AI
(∇
2
u, ∇
2
v) =
κ
2
tr S
0
+2κ
2
_
u
2
s
0
1
+v
2
s
0
1
_
+
tr S
0

2
tr S
0
+2κ
2
_
u
2
s
0
2
+v
2
s
0
2
_
. (39)
The correctness of above rewriting can easily be verified and
is based on the observations that s
0
1
and s
0
2
are the eigenvec-
tors of P, and that the factors in front of (u
2
s
0
1
+ v
2
s
0
1
) and
(u
2
s
0
2
+v
2
s
0
2
) are the corresponding eigenvalues. The diffusion
tensor for the regulariser of Nagel and Enkelmann (1986) is
identical to the projection matrix: D
AI
= P. Concerning its
eigenvectors and eigenvalues, we observe that in the lim-
iting case for κ → 0, where V
AI
→ u
2
s
0
2
+ v
2
s
0
2
, we obtain
a smoothing solely in s
0
2
-direction, i.e. along image edges.
In the definition of the normal flow (7) we have seen that
a data term that models the brightness constancy assump-
tion constraints the flow only orthogonal to image edges. In
the limiting case, the regulariser of Nagel and Enkelmann
can hence be interpreted as a first complementary smooth-
ness term that fills in information orthogonal to the data con-
straint direction. This complementarity has been emphasised
by Schnörr (1993), who contributed a theoretical analysis as
well as modifications of the original Nagel and Enkelmann
functional.
The drawback of image-driven strategies is that they are
prone to oversegmentation artefacts in textured image re-
gions where image edges do not necessarily correspond to
flow edges.
Flow-Driven Regularisation To remedy the oversegmen-
tation problem, it makes sense to adapt the smoothing
process to the flow edges instead of the image edges.
In the isotropic setting, Shulman and Hervé (1989) and
Schnörr (1994) proposed to use subquadratic penaliser func-
tions for the smoothness term, i.e.
V
IF
(∇
2
u, ∇
2
v) :=
V
_
|∇
2
u|
2
+|∇
2
v|
2
_
=
V
_
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_
, (40)
where the penaliser function
V
(s
2
) is preferably increas-
ing, differentiable and convex in s. The associated diffusion
tensor is given by
D
IF
=

V
_
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_
I. (41)
The underlying diffusion processes perform nonlinear iso-
tropic diffusion, where the smoothing is reduced at the
boundaries of the evolving flow field via the decreasing dif-
fusivity

V
. If one uses the convex penaliser (Cohen 1993)

V
(s
2
) :=
_
s
2

2
, (42)
376 Int J Comput Vis (2011) 93: 368–388
one ends up with regularised total variation (TV) regularisa-
tion (Rudin et al. 1992) with the diffusivity

V
(s
2
) =
1
2

s
2

2

1
2|s|
. (43)
Another possible choice is the non-convex Perona-Malik
regulariser (Lorentzian) (Black and Anandan 1996; Perona
and Malik 1990) given by

V
(s
2
) :=λ
2
log
_
1 +
s
2
λ
2
_
, (44)
that results in Perona-Malik diffusion with the diffusivity

V
(s
2
) =
1
1 +
s
2
λ
2
, (45)
using a contrast parameter λ > 0.
We will not discuss the anisotropic flow-driven regu-
lariser of Weickert and Schnörr (2001a) as it does not fit
in our framework and also has not been used in the design
of our complementary regulariser.
Despite the fact that flow-driven methods reduce the
oversegmentation problem caused by image textures, they
suffer from another drawback: The flow edges are not as
well localised as with image-driven strategies.
Image- and Flow-Driven Regularisation We have seen
that image-driven methods suffer from oversegmentation
artefacts, but give sharp flow edges. Flow-driven strategies
remedy the oversegmentation problem but give less pleasant
flow edges. It is thus desirable to combine the advantages of
both strategies to obtain sharp flow edges without overseg-
mentation problems.
This aim was achieved by Sun et al. (2008) who pre-
sented an anisotropic image- and flow-driven smoothness
term in a discrete setting. It adapts the smoothing direc-
tion to image structures but steers the smoothing strength
in accordance to the flow contrast. In contrast to Nagel and
Enkelmann (1986) who considered ∇

2
f to obtain direc-
tional information of image structures, the regulariser from
Sun et al. (2008) analyses the eigenvectors s
i
of the struc-
ture tensor S
ρ
from (30) to obtain a more robust direction
estimation. A continuous version of this regulariser can be
written as
V
AIF
(∇
2
u, ∇
2
v) :=
V
_
u
2
s
1
_
+
V
_
v
2
s
1
_
+
V
_
u
2
s
2
_
+
V
_
v
2
s
2
_
. (46)
Here, we obtain two diffusion tensors, that for p ∈ {u, v}
read as
D
p
AIF
=

V
_
p
2
s
1
_
s
1
s

1
+

V
_
p
2
s
2
_
s
2
s

2
. (47)
We observe that these tensors allow to obtain the desired be-
haviour: The regularisation direction is adapted to the image
structure directions s
1
and s
2
, whereas the magnitude of the
regularisation depends on the flow contrast encoded in p
s
1
and p
s
2
. As a result, one obtains the same sharp flow edges
as image-driven methods but does not suffer from overseg-
mentation problems.
2.2.1 Our Novel Complementary Regulariser
In spite of its sophistication, the anisotropic image- and
flow-driven model from Sun et al. (2008) given in (46) still
suffers from a few shortcomings. We introduce three amend-
ments that we will discuss now.
Regularisation Tensor A first remark w.r.t. the model from
(46) is that the directional information from the structure
tensor S
ρ
is not consistent with the imposed constraints of
our data term (27). It is more natural to take into account di-
rectional information provided by the motion tensor (19) and
to steer the anisotropic regularisation process w.r.t. “con-
straint edges” instead of image edges. To this end we pro-
pose to analyse the eigenvectors r
1
and r
2
of the regularisa-
tion tensor
R
ρ
:=
3

i=1
K
ρ

_
θ
i
0
_

2
f
i

2
f
i
_

_
θ
i
x
_

2
f
i
x

2
f
i
x
_

i
y
_

2
f
i
y

2
f
i
y
___
, (48)
which can be regarded as a generalisation of the structure
tensor (30). Note that the regularisation tensor differs from
the motion tensor
¯
J
c
from (19) by the facts that it (i) in-
tegrates neighbourhood information via the Gaussian con-
volution, and (ii) uses the spatial gradient operator ∇
2
in-
stead of the spatio-temporal operator ∇
3
. The latter is due
to the spatial regularisation. In Sect. 2.3 we extend our reg-
ulariser to the spatio-temporal domain, yielding a regulari-
sation tensor that also uses the spatio-temporal gradient ∇
3
.
Further note that a Gaussian convolution of the motion ten-
sor leads to a combined local-global (CLG) data term in the
spirit of Bruhn et al. (2005). Our experiments in Sect. 5.1
will analyse in which cases such a modification of our data
term can be useful.
Rotational Invariance The smoothness term V
AIF
from
(46) lacks the desirable property of rotational invariance, be-
cause the directional derivatives of u and v in the eigenvec-
tor directions are penalised separately. We propose to jointly
penalise the directional derivatives, yielding
V
AIF
Rρ,RI
(∇
2
u, ∇
2
v) :=
V
_
u
2
r
1
+v
2
r
1
_
+
V
_
u
2
r
2
+v
2
r
2
_
,
(49)
where we use the eigenvectors r
i
of the regularisation tensor.
Int J Comput Vis (2011) 93: 368–388 377
Table 1 Comparison of
regularisation strategies. The
next to last column names the
tensor that is analysed to obtain
directional information for
anisotropic strategies, and the
last column indicates if the
corresponding regulariser is
rotationally invariant
Strategy Regulariser V Directional Rotationally
adaptation invariant
Homogeneous u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2

(Horn and Schunck 1981)
Isotropic image-driven g (tr S
0
)
_
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_

(Alvarez et al. 1999)
Anisotropic image-driven u
2
s
0
2
+v
2
s
0
2
, for κ →0 S
0

(Nagel and Enkelmann 1986)
Isotropic flow-driven
V
_
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_

(Shulman and Hervé 1989)
Anisotropic image and
V
_
u
2
s
1
_
+
V
_
v
2
s
1
_
+
V
_
u
2
s
2
_
+
V
_
v
2
s
2
_
S
ρ

flow-driven (Sun et al. 2008)
Anisotropic complementary
V
_
u
2
r
1
+v
2
r
1
_
+u
2
r
2
+v
2
r
2
R
ρ

image- and flow-driven
Single Robust Penalisation. The above regulariser (49)
performs a twofold robust penalisation in both eigenvec-
tor directions. However, the data term mainly constraints
the flow in direction of the largest eigenvalue of the spa-
tial motion tensor, i.e. in r
1
-direction. We hence propose a
single robust penalisation in r
1
-direction. In the orthogonal
r
2
-direction, we opt for a quadratic penalisation to obtain
a strong filling-in effect of missing information. The bene-
fits of this design will be confirmed by our experiments in
Sect. 5.2. Incorporating the single robust penalisation finally
yields our complementary regulariser
V
CR
(∇
2
u, ∇
2
v) :=
V
_
u
2
r
1
+v
2
r
1
_
+u
2
r
2
+v
2
r
2
, (50)
that complements the proposed robust data term from (27)
in an optimal fashion. For the penaliser
V
, we propose to
use the Perona-Malik regulariser (44).
The corresponding joint diffusion tensor is given by
D
CR
=

V
_
u
2
r
1
+v
2
r
1
_
r
1
r

1
+r
2
r

2
, (51)
with

V
given in (45). The derivation of this diffusion tensor
is presented in Appendix.
Discussion To understand the advantages of the comple-
mentary regulariser compared to the anisotropic image- and
flow-driven regulariser (46), we compare our joint diffusion
tensor (51) to its counterparts (47), which reveals the fol-
lowing innovations: (i) The smoothing direction is adapted
to constraint edges instead of image edges, as the eigenvec-
tors of the regularisation tensor r
i
are used instead of the
eigenvectors of the structure tensor. (ii) We achieve rota-
tional invariance by coupling the two flowcomponents in the
argument of

V
. (iii) We only reduce the smoothing across
constraint edges, i.e. in r
1
-direction. Along them, always a
strong diffusion with strength 1 is performed, resembling
edge-enhancing anisotropic diffusion (Weickert 1996).
Furthermore, when analysing our joint diffusion ten-
sor, the benefits of the underlying anisotropic image- and
flow-driven regularisation become visible. The smoothing
strength across constraint edges is determined by the expres-
sion

V
(u
2
r
1
+v
2
r
1
). Here we can distinguish two scenarios:
At a flowedge that corresponds to a constraint edge, the flow
gradients will be large and almost parallel to r
1
. Thus, the
argument of the decreasing function

V
will be large, yield-
ing a reduced diffusion which preserves this important edge.
At “deceiving” texture edges in flat flow regions, however,
the flow gradients are small. This results in a small argument
for

V
, leading to almost homogeneous diffusion. Hence,
we perform a pronounced smoothing in both directions that
avoids oversegmentation artefacts.
Finally note that our complementary regulariser has the
same structure, even if other data terms are used. Only the
regularisation tensor R
ρ
has to be adapted to the new data
term.
2.2.2 Summary
To conclude this section, Table 1 summarises the discussed
regularisers rewritten in our framework. It also compares
the way directional information is obtained for anisotropic
strategies, and it indicates if the regulariser is rotationally
invariant. Note that despite the fact these regularisers have
been developed within almost three decades, our taxonomy
shows their structural similarities.
2.3 Extension to a Spatio-Temporal Smoothness Term
The smoothness terms we have discussed so far model the
assumption of a spatially smooth flow field. As image se-
quences in general encompass more than two frames, yield-
ing several flow fields, it makes sense to also assume a
378 Int J Comput Vis (2011) 93: 368–388
temporal smoothness of the flow fields, leading to spatio-
temporal regularisation strategies.
A spatio-temporal (ST) version of the general energy
functional (1) reads as
E
ST
(u, v) =
_
Ω×[0,T ]
_
M(u, v)
+α V
ST
(∇
3
u, ∇
3
v)
_
dx dy dt. (52)
Compared to the spatial energy (1) we note the additional
integration over the time domain and that the smoothness
term now depends on the spatio-temporal flow gradient.
To extend our complementary regulariser from(50) to the
spatio-temporal domain, we define the spatio-temporal reg-
ularisation tensor
R
ST
ρ
:=K
ρ

¯
J
c
. (53)
For ρ = 0 it is identical to the motion tensor
¯
J
c
from (19).
The Gaussian convolution with K
ρ
is now performed in
the spatio-temporal domain, which also holds for the pres-
moothing of the image sequence. The spatio-temporal regu-
larisation tensor is a 3 ×3 tensor that possesses three ortho-
normal eigenvectors r
1
, r
2
and r
3
. With their help, we define
the spatio-temporal complementary regulariser (ST-CR)
V
ST
CR
(∇
3
u, ∇
3
v) :=
V
_
u
2
r
1
+v
2
r
1
_
+u
2
r
2
+v
2
r
2
+u
2
r
3
+v
2
r
3
.
(54)
The corresponding spatio-temporal diffusion tensor reads as
D
ST
CR
=

V
_
u
2
r
1
+v
2
r
1
_
r
1
r

1
+r
2
r

2
+r
3
r

3
. (55)
3 Automatic Selection of the Smoothness Weight
The last step missing for our OFH method is a strategy that
automatically determines the optimal smoothness parameter
α for the image sequence under consideration. This is es-
pecially important in real world applications of optic flow
where no ground truth flow is known. Note that if the latter
would be the case, we could simply select the smoothness
weight that gives the flow field with the smallest deviation
from the ground truth.
3.1 A Novel Concept
We propose an error measure that allows to judge the quality
of a flow field without knowing the ground truth. This error
measure bases on a novel concept, the optimal prediction
principle (OPP). The OPP states that the flow field obtained
with an optimal smoothness weight allows for the best pre-
diction of the next frames in the image sequence. This makes
sense as a too small smoothness weight would lead to an
overfit to the first two frames and consequently result in a
bad prediction of further frames. For too large smoothness
weights, the flow fields would be too smooth and thus also
lead to a bad prediction.
Following the OPP, our error measure needs to judge the
quality of the prediction achieved with a given flow field.
To this end, we evaluate the imposed data constraints be-
tween the first and the third frame of the image sequence,
resulting in an average data constancy error (ADCE) mea-
sure. To compute this measure, we assume that the mo-
tion of the scene objects is of more or less constant speed
and that it describes linear trajectories within the considered
three frames. Under these assumptions, we simply double
the flow vectors to evaluate the data constraints between first
and third frame. Following this strategy, we can define the
ADCE between frame 1 and 3 as
ADCE
1,3
(w
α
)
:=
1
|Ω|
_
Ω
_
3

i=1

M
_
θ
i
0
_
f
i
(x +2w
α
) −f
i
(x)
_
2
_

_
3

i=1

M
_
θ
i
x
_
f
i
x
(x +2w
α
) −f
i
x
(x)
_
2

i
y
_
f
i
y
(x +2w
α
) −f
i
y
(x)
_
2
_
__
dx dy, (56)
where w
α
denotes the flow field obtained with a smoothness
weight α. The integrand of above expression is (apart from
the doubled flow field) a variant of our final data term (27)
where no linearisation of the constancy assumptions have
been performed. To evaluate the images at the subpixel lo-
cations f
i
(x+2w
α
) we use Coons patches based on bicubic
interpolation (Coons 1967).
3.2 Determining the Best Parameter
In general, the relation between α and the ADCE is not
convex, which excludes the use of gradient descent-like ap-
proaches for finding the optimal value of α w.r.t. our error
measure.
We propose a brute-force method similar to the one of
Ng and Solo (1997): We first compute the error measures for
a “sufficiently large” set of flow fields obtained with differ-
ent α values. We then select the α that gives the smallest er-
ror. To reduce the number of α values to test, we propose to
start from a given, standard value α
0
, say. This value is then
incremented/decremented n
α
times by multiplying/dividing
it with a stepping factor a > 1, yielding in total 2n
α
+ 1
tests. This strategy results in testing more values of α that
are close to α
0
and, more important, tests less very small
or very large values of α that hardly give reasonable re-
sults.
Int J Comput Vis (2011) 93: 368–388 379
4 Implementation
The solution of the Euler-Lagrange equations for our method
comes down to solving a nonlinear system of equations. We
solve the system by a nonlinear multigrid scheme based on
a Gauß-Seidel type solver with alternating line relaxation
(Bruhn et al. 2006).
4.1 Warping Strategy for Large Displacements
The derivation of the optic flow constraint (4) by means of
a linearisation is only valid under the assumption of small
displacements. If the temporal sampling of the image se-
quence is too coarse, this precondition will be violated and
a linearised approach fails. To overcome this problem, Brox
et al. (2004) proposed a coarse-to-fine multiscale warping
strategy. To obtain a coarse representation of the problem,
we downsample the input images by a factor η ∈ [0.5, 1.0).
Prior to downsampling, we apply a low-pass filter to the im-
ages by performing a Gaussian convolution with standard
deviation

2/(4η). This prevents aliasing problems.
At each warping level, we split the flow field into an
already computed solution from coarser levels and an un-
known flow increment. As the increments are small, they
can computed by the presented linearised approach. At the
next finer level, the already computed solution serves as ini-
tialisation, which is achieved by performing a motion com-
pensation of the second frame by the current flow, known as
warping. For warping with subpixel precision we again use
Coons patches based on bicubic interpolation (Coons 1967).
Adapting the Smoothness Weight to the Warping Level The
influence of the data termusually becomes smaller at coarser
levels of our multiscale framework. This is due to the
smoothing properties of the downsampling that leads to
smaller values of the image gradients at coarse levels. Such
a behaviour is in fact desirable as the data term might not
be reliable at coarse levels. Our proposed data term normal-
isation leads, however, to image gradients that are approxi-
mately in the same range at each level. To recover the previ-
ous reduction of the data term at coarse levels, we propose
to adapt the smoothness weight α to the warping level k.
This is achieved by setting α(k) = α/η
k
which results in
larger values of α and an emphasis of the smoothness term
at coarse levels.
4.2 Discretisation
We follow Bruhn et al. (2006) for the discretisation of the
Euler-Lagrange equations. The images and the flow fields
are sampled on a rectangular pixel grid with grid size h and
temporal step size τ .
Spatial image derivatives are approximated via central fi-
nite differences using the stencil
1
12h
(1, −8, 0, 8, −1), re-
sulting in a fourth order approximation. The spatial flow
derivatives are discretised by second order approximations
with the stencil
1
2h
(−1, 0, 1). For approximating temporal
image derivatives we use a two-point stencil (−1, 1), result-
ing in a temporal difference. Concerning the temporal flow
derivatives that occur in the spatio-temporal case, we use the
stencil (−1, 1)/τ . Here, it makes sense to adapt the value of
τ to the given image sequence to allow for an appropriate
scaling of the temporal direction compared to the spatial di-
rections (Weickert and Schnörr 2001b).
When computing the motion tensor, occurring derivatives
are averaged from the two frames at time t and t + 1 to
obtain a lower approximation error. For the regularisation
tensor, the derivatives are solely computed at the first frame
as we only want to consider directional information from the
reference image.
5 Experiments
In our experiments we show the benefits of the OFH ap-
proach. The first experiments are concerned with our robust
data term and the complementary smoothness term in the
spatial and the spatio-temporal domain. Then, we turn to the
automatic selection of the smoothness weight. After a small
experiment on the importance of anti-aliasing in the warping
scheme, we finish our experiments by presenting the perfor-
mance at the Middlebury optic flow benchmark (Baker et al.
2009, http://vision.middlebury.edu/flow/eval/).
As all considered sequences exhibit relatively large dis-
placements, we use the multiscale warping approach de-
scribed in Sect. 4.1. The flow fields are visualised by a
colour code where hue encodes the flow direction and
brightness the magnitude, see Fig. 3 (d). If not said oth-
erwise, we use the following constant parameters: σ =0.5,
γ =20.0, ρ =4.0, η =0.95, ζ =0.1, ε =0.001, λ =0.1.
5.1 Robust Data Term
Benefits of Normalisation and the HSV Colour Space We
proposed two main innovations in the data term: constraint
normalisation and using an HSV colour representation. In
our first experiment, we thus compare our method against
variants (i) without data term normalisation, and (ii) using
the RGB instead of the HSV colour space. For the latter
we only separately robustify the brightness and the gradi-
ent constancy assumption, as a separate robustification of the
RGB channels makes no sense. In Fig. 3 we show the results
for the Snail sequence that we have created. Note that it is a
rather challenging sequence due to severe shadows and large
displacements up to 25 pixels. When comparing the results
380 Int J Comput Vis (2011) 93: 368–388
Fig. 3 Results for our Snail sequence with different variants of our
method. First row, from left to right: (a) First frame. (b) Zoom in
marked region of first frame. (c) Same for second frame. Second row,
from left to right: (d) Colour code. (e) Flow field in marked region,
without normalisation (α = 5000.0). (f) Same for RGB colour space
(α = 300.0). Third row, from left to right: (g) Same for TV regulari-
sation (α =50.0). (h) Same for image- and flow-driven regularisation
(Sun et al. 2008) (α =2000.0). (i) Same for our method (α =2000.0)
to our result in Fig. 3 (i), the following drawbacks of the
modified versions become obvious: Without data term nor-
malisation (Fig. 3 (e)), unpleasant artefacts at image edges
arise, even when using a large smoothness weight α. When
relying on the RGB colour space (Fig. 3 (f)), a phantom mo-
tion in the shadow region at the right border is estimated.
To further substantiate these observations, we compare
the discussed variants to our proposed method on some Mid-
dlebury training sequences. The results of our method are
shown together with the ground truth flow fields in Fig. 4.
To evaluate the quality of the flow fields compared to the
given ground truth, we use the average angular error (AAE)
Int J Comput Vis (2011) 93: 368–388 381
Fig. 4 Results for some Middlebury sequences with ground truth.
First column: Reference frame. Second column: Ground truth (white
pixels mark locations where no ground truth is given). Third col-
umn: Result with our method. From top to bottom: Rubberwhale
(α =3500.0, σ =0.3, γ =20.0, ρ =1.0, λ =0.01 =⇒AAE =2.38

,
AEE =0.076), Dimetrodon (α =8000.0, σ =0.7, γ =25.0, ρ =1.5,
λ = 0.01 =⇒ AAE = 1.39

, AEE = 0.073), Grove3 (α = 20.0, σ =
0.5, γ = 0.2, ρ = 1.0, λ = 0.1 =⇒ AAE = 4.87

, AEE = 0.487),
and Urban3 (α = 75.0, σ = 0.5, γ = 1.0, ρ = 1.5, λ = 0.1 =⇒
AAE =2.77

, AEE =0.298)
measure (Barron et al. 1994). To ease comparison with other
methods, we also give the alternative average endpoint er-
ror (AEE) measure (Baker et al. 2009). The resulting AAE
measures are listed in Table 2. Compared to our proposed
method, the results without normalisation are always signifi-
cantly worse. When using the RGB colour space, the results
are mostly inferior, apart from the Rubberwhale sequence.
Adeeper discussion on the assets and drawbacks of the RGB
and the HSV colour space can be found in Sect. 5.5. Addi-
tionally, Table 2 also lists results obtained with greyscale im-
ages. These are, however, always worse compared to RGB
or HSV results.
Effect of the Separate Robust Penalisation This experi-
ment illustrates the desirable effect of our separate robust
penalisation of the HSV channels. Using the Rubberwhale
sequence from the Middlebury database, we show in Fig. 5
382 Int J Comput Vis (2011) 93: 368–388
Table 2 Comparing different variants of our method (using the AAE).
In the Sun et al. regulariser, we use the Charbonnier et al. (1994) pe-
naliser function as this gives better results here
Sequence Rubber- Dime- Grove3 Urban3
whale trodon
Proposed 2.38

1.39

4.87

2.77

No normalisation 2.89

1.45

5.72

4.88

RGB colour space 2.34

1.44

5.14

2.97

Greyscale images 2.61

1.59

5.35

3.21

TV regulariser 2.54

1.39

5.09

3.27

Sun et al. regulariser 2.58

1.40

5.27

3.53

CLG 2.44

1.42

4.86

2.72

Fig. 5 Effect of our separate robust penalisation of the HSV channels.
First row, from left to right: (a) Zoom in first frame of the Rubberwhale
sequence. (b) Visualisation of the corresponding hue channel weight.
Brighter pixels correspond to a larger weight. Second row, from left
to right: (c) Same for the saturation channel. (d) Same for the value
channel
the data term weights

M
(w

¯
J
i
0
w) for the brightness con-
stancy assumption on the hue, the saturation and the value
channel (i =1, . . . , 3). Here, brighter pixels correspond to a
larger weight and we only show a zoom for better visibility.
As we can see, the weight of the value channel is reduced
in the shadow regions (left of the wheel, of the orange toy
and of the clam). This is desirable as the value channel is not
invariant under shadows, see Fig. 2.
A CLGVariant of Our Method Our next experiment is con-
cerned with a CLG variant of our data term where we, as for
the regularisation tensor, perform a Gaussian convolution
of the motion tensor entries. First, we compare our method
against a CLG variant for some Middlebury sequences, see
Table 2. We find that the CLG variant may only lead to small
improvements. On the other hand, it may also deteriorate
the results. We thus conclude that for the considered Mid-
Table 3 Comparison of our method to a CLG variant on noisy ver-
sions of the Yosemite sequence (using the AAE). We have added
Gaussian noise with zero mean and standard deviation σ
n
σ
n
0 10 20 40
CLG 1.75

4.01

5.82

8.14

Proposed 1.64

3.82

6.08

8.55

dlebury test sequences, this modification seems not too use-
ful.
However, a CLG variant of our method can actually be
useful in the presence of severe noise in the image sequence.
To prove this, we compare in Table 3 the performance of
our method to its CLG counterpart on noisy versions of
the Yosemite sequence. As it turns out, the CLG variant im-
proves the results at large noise scales, but deteriorates the
quality for low noise scenarios. This also explains the expe-
rienced behaviour on the Middlebury data sets, which hardly
suffer from noise.
5.2 Complementary Smoothness Term
Comparison with Other Regularisers. In Fig. 3, we com-
pare our method against two results obtained when using
another regulariser in conjunction with our robust data term:
(i) Using the popular TV regulariser; see (40) and (42). (ii)
Using the anisotropic image and flow-driven regulariser of
Sun et al. (2008), which built the basis of our complemen-
tary regulariser. Here, we use a rotationally invariant formu-
lation that can be obtained from (49) by replacing the eigen-
vectors r
i
of the regularisation tensor by the eigenvectors s
i
of the structure tensor. Comparing the obtained results to our
result in Fig. 3 (i), we see that TV regularisation (Fig. 3 (g)),
leads to blurred and badly localised flow edges. Using the
regulariser of Sun et al. (2008) (Fig. 3 (h)), unpleasant stair-
casing artefacts deteriorate the result.
As for the data term experiments, we also applied our
method with different regularisers to some Middlebury se-
quences. Considering the corresponding error measures in
Table 2, we see that the alternative regularisation strategies
mostly lead to higher errors. Only for the Dimetrodon se-
quence, the TV regulariser is en par with our result. Our
flow fields in Fig. 4 in general closely resemble the ground
truth. However, minor problems arising from incorporating
image information in the smoothness term are visible at a
few locations, e.g. at the wheel in the Rubberwhale result
and at the bottom of the Urban 3 result. Furthermore, some
tiny flow details are smoothed out, e.g. at the leafs in the
Grove 3 result. As recently shown by Xu et al. (2010), the
latter problem could be resolved by using a more sophisti-
cated warping strategy.
Int J Comput Vis (2011) 93: 368–388 383
Fig. 6 Results for the Marble sequence with our spatio-temporal
method. First row, from left to right: (a) Reference frame (frame 16).
(b) Ground truth (white pixels mark locations where no ground truth
is available). Second row, from left to right: (c) Result using 2 frames
(16–17). (d) Same for 6 frames (14–19)
Table 4 Smoothness weight α and AAE measures for our spatio-
temporal method on the Marble sequence, see Fig. 6. All results used
the fixed parameters σ = 0.5, γ = 0.5, ρ = 1.0, τ = 1.5, η = 0.5.
When using more than two frames, the convolutions with K
σ
and K
ρ
are performed in the spatio-temporal domain
Number of frames 2 4 6 8
(from – to) (16–17) (15–18) (14–19) (13–20)
Smoothn. weight α 75.0 50.0 50.0 50.0
AAE 4.85

2.63

1.86

2.04

Optic Flow in the Spatio-Temporal Domain Let us now
turn to the spatio-temporal extension of our complementary
smoothness term. As most Middlebury sequences consist
of 8 frames, a spatio-temporal method would in general be
applicable. However, the displacements between two subse-
quent frames are often rather large there, resulting in a vio-
lation of the assumption of a temporally smooth flow field.
Consequently, spatio-temporal methods do not improve the
results. In our experiments, we use the Marble sequence
(available at http://i21www.ira.uka.de/image_sequences/)
and the Yosemite sequence from the Middlebury datasets.
These sequences exhibit relatively small displacements and
our spatio-temporal method allows to obtain notably better
results, see Fig. 6 and Tables 4 and 5. Note that when using
more than two frames, a smaller smoothness weight α has
to be chosen and that a too large temporal window may also
deteriorate the results again.
Table 5 Smoothness weight α and AAE measures for our spatio-
temporal method on the Yosemite sequence from Middlebury. All re-
sults used the fixed parameters σ = 1.0, γ = 20.0, ρ = 1.5, τ = 1.0,
η =0.5. Here, better results could be obtained when disabling the tem-
poral presmoothing
Number of frames 2 4 6 8
(from – to) (10–11) (9–12) (8–13) (7–14)
Smoothn. weight α 2000.0 1000.0 1000.0 1000.0
AAE 1.65

1.16

1.05

1.01

Fig. 7 Automatic selection of the smoothness weight α at the Grove2
sequence. From left to right: (a) Angular error (AAE) for 51 values of
α, computed from α
0
=400.0 and a stepping factor a =1.1. (b) Same
for the proposed data constancy error (ADCE
1,3
)
Table 6 Results (AAE) for some Middlebury sequences when (i) fix-
ing the smoothness weight (α =400.0), (ii) estimating α and (iii) with
the optimal value of α
Sequence Fixed α Estimated α Optimal α
Rubberwhale 3.43

3.00

3.00

Grove2 2.59

2.43

2.43

Grove3 5.50

5.62

5.50

Urban2 3.22

2.84

2.66

Urban3 3.44

3.37

3.35

Hydrangea 1.96

1.94

1.86

Yosemite 2.56

1.89

1.71

Marble 5.73

5.05

4.94

5.3 Automatic Selection of the Smoothness Weight
Performance of our Proposed Error Measure We first
show that our proposed data constancy error between frame
1 and 3 (ADCE
1,3
) is a very good approximation of the pop-
ular angular error (AAE) measure. To this end, we com-
pare the two error measures for the Grove2 sequence in
Fig. 7. It becomes obvious that our proposed error measure
(Fig. 7 (b)) indeed exhibits a shape very close to the angular
error shown in Fig. 7 (a). As our error measure reflects the
quality of the prediction with the given flow field, our result
further substantiate the validity of the proposed OPP.
384 Int J Comput Vis (2011) 93: 368–388
Benefits of an Automatic Parameter Selection Next, we
show that our automatic parameter selection works well for
a large variety of different test sequences. In Table 6, we
summarise the AAE obtained when (i) setting α to a fixed
value (α = α
0
= 400.0), (ii) using our automatic parame-
ter selection method, and (iii) selecting the (w.r.t. the AAE)
optimal value of α under the tested proposals. It turns out
that estimating α with our proposed method allows to im-
prove the results compared to a fixed value of α in almost
all cases. Just for the Grove 3 sequence, the fixed value of α
accidentally coincides with the optimal value. Compared to
the results achieved with an optimal value of α, our results
are on average 3% and at most 10% worse than the optimal
result.
We wish to note that a favourable performance of the pro-
posed parameter selection of course depends on the validity
of our assumptions (constant speed and linear trajectories
of objects). For all considered test sequences these assump-
tions were obviously valid. In order to show the limitations
of our method, we applied our approach again to the same
sequences, but replaced the third frame by a copy of the sec-
ond one. This violates the assumption of a constant speed
and leads to a severe impairment of results (up to 30%).
5.4 Importance of Anti-Aliasing in the Warping Scheme
We proposed to presmooth the images prior to downsam-
pling in order to avoid aliasing problems. In most cases, the
resulting artefacts will not significantly deteriorate the flow
estimation, which can be attributed to the robust data term.
However, for the Urban sequence from the official Middle-
bury benchmark, anti-aliasing is crucial for obtaining rea-
sonable results, see Fig. 8. As it turns out, the large displace-
ment of the building in the lower left corner can only be esti-
mated when using anti-aliasing. We explain this by the high
frequent stripe pattern on the facade of the building.
5.5 Comparison to State-of-the-Art Methods
To compare our method to the state-of-the-art in optic flow
estimation, we submitted our results to the popular Middle-
bury benchmark (http://vision.middlebury.edu/flow/eval/).
We found that for the provided benchmark sequences,
using a HSV colour representation is not as beneficial as
seen in our experiment from Fig. 3. As the Middlebury se-
quences hardly suffer from difficult illumination conditions,
we cannot profit from the photometric invariances of the
HSV colour space. On the other hand, some sequences even
pose problems in their HSV representation. As an exam-
ple, consider the results for the Teddy sequence in the first
row of Fig. 9. Here we see that the small white triangle be-
neath the chimney causes unpleasant artefacts in the flow
field. This results from the problem that greyscales do not
Fig. 8 Importance of anti-aliasing on the example of the Urban se-
quence. Top row, from left to right: (a) Frame 10. (b) Frame 11. Bot-
tom row, from left to right: (c) Our result without anti-aliasing. (d)
Same with anti-aliasing. Note that for this sequence, no ground truth is
publicly available
have a unique representation in the hue as well as the satura-
tion channel. Nevertheless, there are also sequences where
a HSV colour representation is beneficial. For the Mequon
sequence (second row of Fig. 9) a HSV colour representa-
tion removes artefacts in the shadows left of the toys. The
bottom line is, however, that for the whole set of benchmark
sequences, we obtain slightly better results when using the
RGB colour space. Thus, we use this variant of our method
for evaluation at the Middlebury benchmark.
For our submission we used in accordance to the guide-
lines a fixed set of parameters: σ = 0.5, γ = 20.0, ρ =
4.0, η = 0.95, ζ = 0.1, λ = 0.1. The smoothness weight
α was automatically determined by our proposed method
with the settings n
α
= 8, α
0
= 400.0, a = 1.2. For the
Teddy sequence with only two frames, we set α = α
0
, as
our parameter estimation method is not applicable in this
case. The resulting running time for the Urban sequence
(640 × 480 pixels) was 620s on a standard PC (3.2 GHz
Intel Pentium 4). For the parameter selection we com-
puted 2 · 8 + 1 = 17 flow fields, corresponding to approx-
imately 36s per flow field. Following the trend of paral-
lel implementations on modern GPUs (Zach et al. 2007;
Grossauer and Thoman 2008; Sundaram et al. 2010; Werl-
berger et al. 2010) we recently came up with a GPU version
of our method that can compute flow fields of size 640×480
pixels in less than one second (Gwosdek et al. 2010).
At the time of submission (August 2010), we achieve the
4th place w.r.t. the AAE and the AEE measure among 39
listed methods. Note that our previous Complementary Op-
tic Flow method (Zimmer et al. 2009) only ranks 6th for the
AAE and 9th for the AEE, which demonstrates the benefits
of the proposed novelties in this paper, like the automatic
parameter selection and the anti-aliasing.
Int J Comput Vis (2011) 93: 368–388 385
Fig. 9 Comparison of results obtained with HSV or RGB colour representation. First row, from left to right: (a) Frame 10 of the Teddy sequence.
(b) Frame 11. (c) Result when using the HSV colour space (AAE = 3.94

). (d) Same for the RGB colour space (AAE = 2.64

). Second row, from
left to right: (e) Frame 10 of the Mequon sequence. (f) Frame 11. (g) Result when using the HSV colour space (AAE = 2.28

). (h) Same for the
RGB colour space (AAE = 2.84

)
Table 7 Estimated values of the smoothness weight α, using our au-
tomatic parameter selection method
Sequence Smoothness weight α
Army 277.8
Grove 277.8
Mequon 277.8
Schefflera 691.2
Urban 480.0
Wooden 995.3
Yosemite 1433.3
In Table 7 we additionally summarise the estimated val-
ues of α resulting from our automatic parameter selection
method. As desired, for sequences with small details in
the flow field (Army, Grove, Mequon) a small smoothness
weight is chosen. On the other hand, sequences like Wooden
and Yosemite with a rather smooth flow yield significantly
larger values for the smoothness weight.
6 Conclusions and Outlook
In this paper we have shown how to harmonise the three
main constituents of variational optic flow approaches: the
data term, the smoothness term and the smoothness weight.
This was achieved by two main ideas: (i) We developed a
smoothness term that achieves an optimal complementary
smoothing behaviour w.r.t. the imposed data constraints. (ii)
We presented a simple, yet well performing method for de-
termining the optimal smoothness weight for the given im-
age sequence. To this end, we came up with a novel para-
digm, the optimal prediction principle (OPP).
Our optic flow in harmony (OFH) method bases on an
advanced data term that combines and extended success-
ful concepts like normalisation, photometric invariant colour
representation, higher order constancy assumptions and ro-
bust penalisation. The anisotropic complementary smooth-
ness term incorporates directional information from the mo-
tion tensor occurring in the data term. The smoothing in data
constraint direction is reduced to avoid interference with
the data term, while a strong smoothing in the orthogonal
direction allows to fill-in missing information. This yields
an optimal complementary between both terms. Further-
more, our smoothness term unifies the benefits of image-
and flow-driven regularisers, resulting in sharp flow edges
without oversegmentation artefacts. The proposed parame-
ter selection method bases on the OPP that we introduced
in this paper. It states the flow field obtained with an opti-
mal smoothness weight allows for the best prediction of the
next frames in the image sequence. Under some assump-
tions, the quality of the prediction can be judged by evaluat-
ing the data constraints between first and third frame of the
sequence and using the doubled flow vectors. Due to its sim-
plicity, our method can easily be used in all variational optic
flow approaches and additionally gives surprisingly good re-
sults.
The benefits of the OFH idea are demonstrated by our
extensive experimental validation and the competitive per-
formance at the Middlebury optic flow benchmark. Our pa-
per thus shows that a careful design of data and smoothness
term together with an automatic choice of the smoothness
386 Int J Comput Vis (2011) 93: 368–388
weight allows to outperform other well-engineered methods
that incorporate many more processing steps, e.g. segmen-
tation (Lei and Yang 2009), or the integration of an epipo-
lar geometry prior (Wedel et al. 2009). We thus hope that
our work will give rise to more “harmonised” approaches in
other fields where energy-based methods are used, e.g. im-
age registration.
Our current research is on the one hand concerned with
investigating possibilities to small issues with the proposed
regularisation strategy; see our remarks in Sect. 5.2. Further-
more, we are interested in extensions of our novel parameter
selection approach. Here, we focus on higher efficiency and
ways to apply our method in cases where only two images
are available. The latter would also remedy the problems
with non-constant speed of objects that we have mentioned
in Sect. 5.3.
Acknowledgements The authors gratefully acknowledge partial
funding by the International Max-Planck Research School (IMPRS)
and the Cluster of Excellence “Multimodal Computing and Interac-
tion” within the Excellence Initiative of the German Federal Govern-
ment.
Appendix: Derivation of the Diffusion Tensor for the
Complementary Regulariser
Consider the complementary regulariser from (50):
V(∇
2
u, ∇
2
v) =
V
_
u
2
r
1
+v
2
r
1
_
+u
2
r
2
+v
2
r
2
. (57)
Its contributions to the Euler-Lagrange equations are given
by

x
_

u
x
V
_
+∂
y
_

u
y
V
_
, (58)
and

x
_

v
x
V
_
+∂
y
_

v
y
V
_
, (59)
respectively. We exemplify the computation of the first ex-
pression (58). The second one then follows analogously. Let
us first define the abbreviations
r
i
:=(r
i1
, r
i2
)

and ψ

V
(r
i
) :=

V
_
u
2
r
i
+v
2
r
i
_
, (60)
for i = 1, . . . , 2. With their help, we compute the expres-
sions

u
x
V = 2
_
ψ

V
(r
1
) u
r
1
r
11
+u
r
2
r
21
_
, (61)

u
y
V = 2
_
ψ

V
(r
1
) u
r
1
r
12
+u
r
2
r
22
_
(62)
from (58). Using the fact that

x
_

u
x
V
_
+∂
y
_

u
y
V
_
=div(∂
u
x
V, ∂
u
y
V)

, (63)
we obtain by plugging (61) and (62) into (58):

x
_

u
x
V
_
+∂
y
_

u
y
V
_
=2div
_
ψ

V
(r
1
) u
r
1
r
11
+u
r
2
r
21
ψ

V
(r
1
) u
r
1
r
12
+u
r
2
r
22
_
.
(64)
By multiplying out the expressions inside the divergence ex-
pressions one ends up with

x
_

u
x
V
_
+∂
y
_

u
y
V
_
=2div
_

V
(r
1
)r
2
11
+r
2
21
)u
x
+(ψ

V
(r
1
)r
11
r
12
+r
21
r
22
)u
y

V
(r
1
)r
11
r
12
+r
21
r
22
)u
x
+(ψ

V
(r
1
)r
2
12
+r
2
22
)u
y
_
.
(65)
We can write above equation in diffusion tensor notation as

x
_

u
x
V
_
+∂
y
_

u
y
V
_
=2div(D∇
2
u), (66)
with the diffusion tensor
D :=
_
ψ

V
(r
1
)· r
2
11
+1· r
2
21
ψ

V
(r
1
)· r
11
r
12
+1· r
21
r
22
ψ

V
(r
1
)· r
11
r
12
+1· r
21
r
22
ψ

V
(r
1
)· r
2
12
+1· r
2
22
_
.
(67)
We multiplied the second term of each sum by a factor of
1 to clarify that the eigenvalues of D are ψ

V
(r
1
) and 1, re-
spectively. The corresponding eigenvectors are r
1
and r
2
,
respectively, which allows to rewrite the tensor D as
D =
_
r
11
r
21
r
12
r
22
_
_
ψ

V
(r
1
) · r
11
ψ

V
(r
1
) · r
12
1 · r
21
1 · r
22
_
= (r
1
| r
2
)
_
ψ

V
(r
1
) 0
0 1
_
_
r

1
r

2
_
. (68)
This shows that D is identical to D
CR
from (51), as it can be
written as
D=ψ

V
(r
1
) r
1
r

1
+r
2
r

2
=

V
_
u
2
r
1
+v
2
r
1
_
r
1
r

1
+r
2
r

2
.
(69)
References
Alvarez, L., Esclarín, J., Lefébure, M., & Sánchez, J. (1999). A PDE
model for computing the optical flow. In Proc. XVI congreso de
ecuaciones diferenciales y aplicaciones (pp. 1349–1356). Las Pal-
mas de Gran Canaria, Spain.
Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., &Szeliski,
R. (2009). A database and evaluation methodology for optical
flow. Tech. Rep. MSR-TR-2009-179, Microsoft Research, Red-
mond, WA.
Barron, J. L., Fleet, D. J., & Beauchemin, S. S. (1994). Performance
of optical flow techniques. International Journal of Computer Vi-
sion, 12(1), 43–77.
Int J Comput Vis (2011) 93: 368–388 387
Bertero, M., Poggio, TA, & Torre, V. (1988). Ill-posed problems in
early vision. Proceedings of the IEEE, 76(8), 869–889.
Bigün, J., Granlund, G. H., & Wiklund, J. (1991). Multidimensional
orientation estimation with applications to texture analysis and
optical flow. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 13(8), 775–790.
Black, M. J., & Anandan, P. (1996). The robust estimation of multiple
motions: parametric and piecewise smooth flow fields. Computer
Vision and Image Understanding, 63(1), 75–104.
Blake, A., & Zisserman, A. (1987). Visual reconstruction. Cambridge:
MIT Press.
Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accu-
racy optical flow estimation based on a theory for warping. In T.
Pajdla & J. Matas (Eds.), Computer vision—ECCV 2004, Part IV.
Lecture notes in computer science (Vol. 3024, pp. 25–36). Berlin:
Springer.
Bruhn, A., & Weickert, J. (2005). Towards ultimate motion estima-
tion: combining highest accuracy with real-time performance. In
Proc. tenth international conference on computer vision (Vol. 1,
pp. 749–755). Beijing: IEEE Computer Society Press.
Bruhn, A., Weickert, J., & Schnörr, C. (2005). Lucas/Kanade meets
Horn/Schunck: combining local and global optic flow methods.
International Journal of Computer Vision, 61(3), 211–231.
Bruhn, A., Weickert, J., Kohlberger, T., & Schnörr, C. (2006).
A multigrid platform for real-time motion computation with
discontinuity-preserving variational methods. International Jour-
nal of Computer Vision, 70(3), 257–277.
Charbonnier, P., Blanc-Féraud, L., Aubert, G., & Barlaud, M. (1994).
Two deterministic half-quadratic regularization algorithms for
computed imaging. In Proc. IEEE international conference on
image processing (Vol. 2, pp. 168–172). Austin: IEEE Computer
Society Press.
Cohen, I. (1993). Nonlinear variational method for optical flow compu-
tation. In Proc. eighth Scandinavian conference on image analysis
(Vol. 1, pp. 523–530). Norway, Tromsø.
Coons, S. A. (1967). Surfaces for computer aided design of space
forms (Tech. Rep. MIT/LCS/TR-41). Massachusetts Institute of
Technology, Cambridge.
Elsgolc, L. E. (1962). Calculus of variations. London: Pergamon.
Fleet, D. J., & Jepson, A. D. (1990). Computation of component image
velocity from local phase information. International Journal of
Computer Vision, 5(1), 77–104.
Förstner, W., & Gülch, E. (1987). A fast operator for detection and
precise location of distinct points, corners and centres of circu-
lar features. In Proc. ISPRS intercommission conference on fast
processing of photogrammetric data (pp. 281–305). Interlaken,
Switzerland.
Golland, P., & Bruckstein, A. M. (1997). Motion from color. Computer
Vision and Image Understanding, 68(3), 346–362.
Grossauer, H., & Thoman, P. (2008). GPU-based multigrid: real-time
performance in high resolution nonlinear image processing. In
A. Gasteratos, M. Vincze, & J. K. Tsotsos (Eds.), Lecture notes in
computer science: Vol. 5008. Computer vision systems (pp. 141–
150). Berlin: Springer.
Gwosdek, P., Zimmer, H., Grewenig, S., Bruhn, A., & Weickert, J.
(2010). A highly efficient GPU implementation for variational
optic flow based on the Euler-Lagrange framework. In Proc.
2010 ECCV workshop on computer vision with GPUs, Heraklion,
Greece.
Horn, B., & Schunck, B. (1981). Determining optical flow. Artificial
Intelligence, 17, 185–203.
Krajsek, K., & Mester, R. (2007). Bayesian model selection for opti-
cal flow estimation. In F. A. Hamprecht, C. Schnörr, & B. Jähne
(Eds.), Pattern recognition. Lecture notes in computer science
(pp. 142–151). Berlin: Springer.
Lai, S. H., & Vemuri, B. C. (1998). Reliable and efficient computation
of optical flow. International Journal of Computer Vision, 29(2),
87–105.
Lei, C., & Yang, Y. H. (2009). Optical flow estimation on coarse-to-
fine region-trees using discrete optimization. In Proc. 2009 IEEE
international conference on computer vision. Kyoto: IEEE Com-
puter Society Press.
Lucas, B., & Kanade, T. (1981). An iterative image registration tech-
nique with an application to stereo vision. In Proc. seventh inter-
national joint conference on artificial intelligence (pp. 674–679).
Vancouver, Canada.
Mileva, Y., Bruhn, A., & Weickert, J. (2007). Illumination-robust vari-
ational optical flow with photometric invariants. In F. A. Ham-
precht, C. Schnörr, & B. Jähne (Eds.), Pattern recognition. Lec-
ture notes in computer science (pp. 152–162). Berlin: Springer.
Murray, D. W., & Buxton, B. F. (1987). Scene segmentation from vi-
sual motion using global optimization. IEEE Transactions on Pat-
tern Analysis and Machine Intelligence, 9(2), 220–228.
Nagel, H. H. (1990). Extending the ‘oriented smoothness constraint’
into the temporal domain and the estimation of derivatives of opti-
cal flow. In O. Faugeras (Ed.), Lecture notes in computer science:
Vol. 427. Computer vision—ECCV ’90 (pp. 139–148). Berlin:
Springer.
Nagel, H. H., & Enkelmann, W. (1986). An investigation of smooth-
ness constraints for the estimation of displacement vector fields
from image sequences. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 8, 565–593.
Ng, L., & Solo, V. (1997). A data-driven method for choosing smooth-
ing parameters in optical flow problems. In Proc. 1997 IEEE in-
ternational conference on image processing (Vol. 3, pp. 360–363).
Los Alamitos: IEEE Computer Society.
Perona, P., & Malik, J. (1990). Scale space and edge detection using
anisotropic diffusion. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 12, 629–639.
Rudin, L. I., Osher, S., & Fatemi, E. (1992). Nonlinear total variation
based noise removal algorithms. Physica D, 60, 259–268.
Schnörr, C. (1993). On functionals with greyvalue-controlled smooth-
ness terms for determining optical flow. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 15(10), 1074–1079.
Schnörr, C. (1994). Segmentation of visual motion by minimizing con-
vex non-quadratic functionals. In Proc. twelfth international con-
ference on pattern recognition (Vol. A, pp. 661–663). Jerusalem:
IEEE Computer Society Press.
Schoenemann, T., & Cremers, D. (2006). Near real-time motion seg-
mentation using graph cuts. In K. Franke, K. R. Müller, B. Nick-
olay, & R. Schäfer (Eds.), Pattern recognition, Lecture notes in
computer science (pp. 455–464). Berlin: Springer.
Shulman, D., & Hervé, J. (1989). Regularization of discontinuous flow
fields. In Proc. workshop on visual motion (pp. 81–86). Irvine:
IEEE Computer Society Press.
Simoncelli, E. P., Adelson, E. H., & Heeger, D. J. (1991). Probability
distributions of optical flow. In Proc. 1991 IEEE computer society
conference on computer vision and pattern recognition (pp. 310–
315). Maui: IEEE Computer Society Press.
Sun, D., Roth, S., Lewis, J. P., & Black, M. J. (2008). Learning optical
flow. In D. Forsyth, P. Torr, & A. Zisserman (Eds.), Computer
vision—ECCV 2008, Part III, Lecture notes in computer science
(pp. 83–97). Berlin: Springer.
Sun, D., Roth, S., & Black, M. J. (2010). Secrets of optical flow esti-
mation and their principles. In Proc. 2010 IEEE computer society
conference on computer vision and pattern recognition. San Fran-
cisco: IEEE Computer Society Press.
Sundaram, N., Brox, T., & Keutzer, K. (2010). Dense point trajectories
by GPU-accelerated large displacement optical flow. In Lecture
notes in computer science: Vol. 6311. Computer vision—ECCV
2010 (pp. 438–451). Berlin: Springer.
388 Int J Comput Vis (2011) 93: 368–388
Tretiak, O., & Pastor, L. (1984). Velocity estimation from image se-
quences with second order differential operators. In Proc. sev-
enth international conference on pattern recognition (pp. 16–19).
Montreal, Canada.
van de Weijer, J., & Gevers, T. (2004). Robust optical flow from pho-
tometric invariants. In Proc. 2004 IEEE international conference
on image processing (Vol. 3, pp. 1835–1838). Singapore: IEEE
Signal Processing Society.
Wedel, A., Pock, T., Zach, C., Bischof, H., & Cremers, D. (2008). An
improved algorithm for TV-L
1
optical flow computation. In D.
Cremers, B. Rosenhahn, A. L. Yuille, &F. R. Schmidt (Eds.), Lec-
ture notes in computer science: Vol. 5604. Statistical and geomet-
rical approaches to visual motion analysis (pp. 23–45). Berlin:
Springer.
Wedel, A., Cremers, D., Pock, T., & Bischof, H. (2009). Structure-
and motion-adaptive regularization for high accuracy optic flow.
In Proc. 2009 IEEE international conference on computer vision.
Kyoto: IEEE Computer Society Press.
Weickert, J. (1996). Theoretical foundations of anisotropic diffusion in
image processing. Computing Supplement, 11, 221–236.
Weickert, J., & Schnörr, C. (2001a). A theoretical framework for con-
vex regularizers in PDE-based computation of image motion. In-
ternational Journal of Computer Vision, 45(3), 245–264.
Weickert, J., &Schnörr, C. (2001b). Variational optic flowcomputation
with a spatio-temporal smoothness constraint. Journal of Mathe-
matical Imaging and Vision, 14(3), 245–255.
Werlberger, M., Pock, T., & Bischof, H. (2010). Motion estimation
with non-local total variation regularization. In Proc. 2010 IEEE
computer society conference on computer vision and pattern
recognition. San Francisco: IEEE Computer Society Press.
Xiao, J., Cheng, H., Sawhney, H., Rao, C., & Isnardi, M. (2006). Bilat-
eral filtering-based optical flow estimation with occlusion detec-
tion. In A. Leonardis, H. Bischof, & A. Pinz (Eds.), Lecture notes
in computer science: Vol. 3951. Computer vision—ECCV 2006,
Part I (pp. 211–224). Berlin: Springer.
Xu, L., Jia, J., & Matsushita, Y. (2010). Motion detail preserving opti-
cal flow estimation. In Proc. 2010 IEEE computer society confer-
ence on computer vision and pattern recognition. San Francisco:
IEEE Computer Society Press.
Yaroslavsky, L. P. (1985). Digital picture processing: an introduction.
Berlin: Springer.
Yoon, K. J., & Kweon, I. S. (2006). Adaptive support-weight approach
for correspondence search. IEEE Transactions on Pattern Analy-
sis and Machine Intelligence, 28(4), 650–656.
Zach, C., Pock, T., & Bischof, H. (2007). A duality based approach
for realtime TV-L
1
optical flow. In F. Hamprecht, C. Schnörr, &
B. Jähne (Eds.), Lecture notes in computer science: Vol. 4713.
Pattern recognition (pp. 214–223). Berlin: Springer.
Zimmer, H., Bruhn, A., Weickert, J., Valgaerts, L., Salgado, A., Rosen-
hahn, B., & Seidel, H. P. (2009). Complementary optic flow. In
D. Cremers, Y. Boykov, A. Blake, & F. R. Schmidt (Eds.), Lec-
ture notes in computer science: Vol. 5681. Energy minimization
methods in computer vision and pattern recognition (EMMCVPR)
(pp. 207–220). Berlin: Springer.

Int J Comput Vis (2011) 93: 368–388

369

manner. Unfortunately, this promising concept of complementarity between data and smoothness term has not been further investigated. Our paper revives this concept for state-of-the-art optic flow models by presenting a novel complementary smoothness term in conjunction with an advanced data term. (ii) Having adjusted the smoothing behaviour to the imposed data constraints, it remains to determine the optimal balance between the two terms for the image sequence under consideration. This comes down to selecting an appropriate smoothness weight, which is usually considered a difficult task. We propose a method that is easy to implement for all variational optic flow approaches and nevertheless gives surprisingly good results. It bases on the assumption that the flow estimate obtained by an optimal smoothness weight allows for the best possible prediction of the next frames in the image sequence. This novel concept we name optimal prediction principle (OPP). 1.1 Related Work In the first ten years of research on optic flow, several basic strategies have been considered, e.g. phase-based methods (Fleet and Jepson 1990), local methods (Lucas and Kanade 1981; Bigün et al. 1991) and energy-based methods (Horn and Schunck 1981). In recent years, the latter class of methods became increasingly popular, mainly due to their potential for giving highly accurate results. Within energy-based methods, one can distinguish discrete approaches that minimise a discrete energy function and are often probabilistically motivated, and variational approaches that minimise a continuous energy functional. Our variational approach naturally incorporates concepts that have proven their benefits over the years. In the following, we briefly review advances in the design of data and smoothness terms that are influential for our work. Data Terms To cope with outliers caused by noise or occlusions, Black and Anandan (1996) replaced the quadratic penalisation from Horn and Schunck (1981) by a robust subquadratic penaliser. To obtain robustness under additive illumination changes, Brox et al. (2004) combined the classical brightness constancy assumption of Horn and Schunck (1981) with the higher-order gradient constancy assumption (Tretiak and Pastor 1984; Schnörr 1994). Bruhn and Weickert (2005) improved this idea by a separate robust penalisation of brightness and gradient constancy assumption. This gives advantages if one of the two constraints produces an outlier. Recently, Xu et al. (2010) went a step further and estimate a binary map that locally selects between imposing either

brightness or gradient constancy. An alternative to higherorder constancy assumptions can be to preprocess the images by a structure-texture decomposition (Wedel et al. 2008). In addition to additive illumination changes, realistic scenarios also encompass a multiplicative part (van de Weijer and Gevers 2004). For colour image sequences, this issue can be tackled by normalising the colour channels (Golland and Bruckstein 1997), or by using alternative colour spaces with photometric invariances (Golland and Bruckstein 1997; van de Weijer and Gevers 2004; Mileva et al. 2007). If one is restricted to greyscale sequences, using log-derivatives (Mileva et al. 2007) can be useful. A further successful modification of the data term has been reported by performing a constraint normalisation (Simoncelli et al. 1991; Lai and Vemuri 1998; Schoenemann and Cremers 2006). It prevents an undesirable overweighting of the data term at large image gradient locations. Smoothness Terms First ideas go back to Horn and Schunck (1981) who used a homogeneous regulariser that does not respect any flow discontinuities. Since different image objects may move in different directions, it is, however, desirable to also permit discontinuities. This can for example be achieved by using image-driven regularisers that take into account image discontinuities. Alvarez et al. (1999) proposed an isotropic model with a scalarvalued weight function that reduces the regularisation at image edges. An anisotropic counterpart that also exploits the directional information of image discontinuities was introduced by Nagel and Enkelmann (1986). Their method regularises the flow field along image edges but not across them. As noted by Xiao et al. (2006), this comes down to convolving the flow field with an oriented Gaussian where the orientation is adapted to the image edges. Note that for a basic data term modelling the brightness constancy assumption, the image edge direction coincides with the complementary direction orthogonal to the data constraint direction. In such cases, the regularisers of Nagel and Enkelmann (1986) and Schnörr (1993) can thus be considered as early complementary smoothness terms. Of course, not every image edge will coincide with a flow edge. Thus, image-driven strategies are prone to give oversegmentation artefacts in textured image regions. To avoid this, flow-driven regularisers have been proposed that respect discontinuities of the evolving flow field and are therefore not misled by image textures. In the isotropic setting this comes down to the use of robust, subquadratic penalisers which are closely related to line processes (Blake and Zisserman 1987). For energy-based optic flow methods, such a strategy was used by Shulman and Hervé (1989) and Schnörr (1994). An anisotropic extension was later presented by Weickert and Schnörr (2001a). The drawback of

This allows to adapt the smoothing direction to the direction of image structures and the smoothing strength to the flow contrast. however.and flow-driven behaviour. It further uses a Hue-Saturation-Value (HSV) colour representation with a separate robustification of each channel. the previous smoothness terms operate on flow derivatives that only consider their direct neighbours. Our data term combines the brightness and the gradient constancy assumption. Along constraint edges. these strategies can be classified as image-driven approaches and are thus also prone to oversegmentation problems. and performs a constraint normalisation. To judge the latter.and flow-driven methods. non-local smoothing strategies (Yaroslavsky 1985) have been introduced to the optic flow community by Sun et al. Computing the proposed error measure is. Ng and Solo (1997) hence restricted their focus to the basic method of Horn and Schunck (1981). However. Automatic Parameter Selection It is well-known that an appropriate choice of the smoothness weight is crucial for obtaining favourable results. In these approaches. Due to the colour value distance. The latter smoothness term was later successfully used in the methods of Brox et al. computationally expensive. This method does not require a brute-force search. 1. comparing non-local strategies to the previously discussed smoothness terms is somewhat difficult: Whereas non-local methods explicitly model similarity in a certain neighbourhood. Across “constraint edges”. it makes sense to also assume temporal smoothness of the flow fields. but the minimisation of the objective function is more complicated and only computationally feasible if certain approximations are performed. Concerning the individual problems of image. We further show that our regulariser can easily be extended to work in the spatio-temporal domain. Hence. Adapting ideas from Yoon and Kweon (2006). This goal was first achieved in the discrete method of Sun et al. Our anisotropic complementary smoothness term takes into account directional information from the constraints imposed in the data term. Our method for determining the optimal smoothness weight bases on the proposed OPP concept. Using this measure. where the data term gives no information. it is assumed that the flow vector at a certain pixel is similar to the vectors in a (possibly large) spatial neighbourhood. the authors developed an anisotropic regulariser that uses directional flow derivatives steered by image structures. The latter is motivated by the fact that each channel in the HSV space has a distinct level of photometric invariance and information content. an image-driven spatio-temporal smoothness terms was proposed by Nagel (1990) and a flow-driven counterpart was later presented by Weickert and Schnörr (2001b). a strong filling-in by using a quadratic penalisation makes sense. Then an anisotropic image. Nevertheless. the idea arises to combine the advantages of both worlds. i.and flowdriven strategies. a parameter selection approach that can also handle robust data terms was presented by Krajsek and Mester (2007).370 Int J Comput Vis (2011) 93: 368–388 flow-driven regularisers lies in less well-localised flow edges compared to image-driven approaches. Assuming a constant speed and linear trajectory of objects within the considered frames. but also leads to a desirable image. Recently. Concerning an optimal selection of the smoothness weight for variational optic flow approaches. There. (2008). especially for robust data terms. (2004) and Bruhn and Weickert (2005). This strategy not only allows for an optimal complementarity between data and smoothness term. there has been remarkably little research on methods that automatically estimate the optimal smoothness weight or other model parameters. a brute-force search for the smoothness weight that gives the smallest error is performed. the latter strategies model a globally smooth flow field as each pixel communicates with each other pixel through its neighbours. (2010) and Werlberger et al. This method jointly estimates the flow and the model parameters where the latter encompass the smoothness weight and also the relative weights of different data terms. It combines the benefits of image. (2010). sharp flow edges without oversegmentation problems. we perform a robust penalisation to reduce the smoothing in the direction where the data term gives the most information. The smoothness terms discussed so far only assume smoothness of the flow field in the spatial domain. yielding more than one flow field. Nevertheless. the similarity to the neighbours is weighted by a bilateral weight that depends on the spatial as well as on the colour value distance of the pixels. Finally we propose a simple method for automatically determining the optimal smoothness weight for the given image sequence. a separate robustification allows to choose the most reliable channel at each position. As image sequences usually consist of more than two frames. we evaluate the data constraints between the first and the third frame of the sequence. This leads to spatio-temporal smoothness terms. In a Bayesian framework.and flow-driven smoothness term is designed that works complementary to the data term. For variational approaches. Ng and Solo (1997) proposed an error measure which can be estimated from the image sequence and the flow estimate only. We call such a strategy image. We first develop a robust and invariant data term.2 Our Contributions The OFH method is obtained in three steps. This results in finding the optimal smoothness weight as the one corresponding to the flow field with the best prediction quality.and flow-driven regularisation. In a discrete setting they go back to Murray and Buxton (1987). this can be realised by simply doubling .e.

Organisation In Sect. ∇2 v) the smoothness term. It states that image intensities remain constant under their displacement. Section 3 describes the method proposed for determining the smoothness weight. (6) 2 = w J0 w. the corresponding data term is given by M1 (u. The latter is then extended to the spatiotemporal domain. a minimiser (u. 4. we can perform a first-order Taylor expansion that yields the linearised optic flow constraint (OFC) 0 = fx u + fy v + ft = ∇3 f w. This presmoothing step helps to reduce the influence of noise and additionally makes the image sequence infinitely many times differentiable. we show experiments in Sect. After discussing implementation issues in Sect. Lai and Vemuri (1998). ∂t ) is the spatio-temporal gradient operator and subscripts denote partial derivatives. 1 := − f t ∇2 f .Int J Comput Vis (2011) 93: 368–388 371 the flow vectors. the so-called normal flow. (1) where ∇2 := (∂x . V (∇2 u. v) = ∇3 f w with the tensor J0 := ∇3 f ∇3 f.1 Data Term Let us now derive our data term in a systematic way. Assuming that the image sequence is smooth (which can be guaranteed by the Gaussian presmoothing) and that the displacements are small.1 |∇2 f | |∇2 f | . y) ∈ Ω denotes the location within a rectangular image domain Ω ⊂ R2 . The classical starting point is the brightness constancy assumption used by Horn and Schunck (1981). ∂y ) denotes the spatial gradient operator. It is thus not sufficient to compute a unique solution. 2. Schoenemann and Cremers (2006) and using the abbreviation u := (u. which is known as the aperture problem (Bertero et al. 1988).3. v) denotes the data term. (ii) An explicit discussion on the adequate treatment of the hue channel of the HSV colour space. 2009) by the following points: (i) A more extensive derivation and discussion of the data term. (iii) A taxonomy of existing smoothness terms within a novel general framework. 2. 2 we present our variational optic flow model with the robust data term and the complementary smoothness term. (v) A simple method for automatically selecting the smoothness weight. The term M(u. T ] will be presented in Sect. (1991). (vii) A more extensive experimental validation. It is found by minimising a global energy functional of the general form E(u. v) = Ω where ∇3 := (∂x . where Kσ is a spatial Gaussian of standard deviation σ and ∗ denotes the convolution operator. Due to its simplicity. Here. We further assume that f is presmoothed by a Gaussian convolution: Given an image sequence f0 (x). With a quadratic penalisation. (vi) A deeper discussion of implementation issues. The paper is finished with conclusions and an outlook on future work in Sect. and t ∈ [0. v) of the energy (1) necessarily has to fulfil the associated Euler-Lagrange equations ∂u M − α ∂x ∂ux V + ∂y ∂uy V ∂v M − α ∂x ∂vx V + ∂y ∂vy V = 0. we rewrite the data term M1 as M1 (u. T ] denotes time. our method is easy to implement for all variational optic flow approaches. (4) 2 Variational Optic Flow Let f (x) be a greyscale image sequence where x := (x. According to the calculus of variations (Elsgolc 1962). f ∈ C ∞ . Normalisation Our experiments will show that normalising the data term can be beneficial. =0 (2) (3) with homogeneous Neumann boundary conditions. For |∇2 f | = 0 it is defined as wn := un . v) = ∇2 f u + ft = |∇2 f | 2 2 ∇2 f u ft + |∇2 f | |∇2 f | . but nevertheless produces surprisingly good results.e. (5) The single equation given by the OFC involves two unknowns u and v. the vector (x. ∂y . 5. 1) describes the displacement vector field between two frames at time t and t + 1. The optic flow field w := (u.e. The present article extends our shorter conference paper (Zimmer et al. t) . 6. i. Following Simoncelli et al. the OFC does allow to compute the flow component orthogonal to image edges. y. (iv) The extension of our complementary regulariser to the spatio-temporal domain. The more general spatio-temporal case that uses all frames t ∈ [0. f (x + w) = f (x). v) + α V (∇2 u. Note that the energy (1) refers to the spatial case where one computes one flow field between two frames at time t and t + 1. and α > 0 is a smoothness weight. ∇2 v) dx dy. v. (7) M(u. Nevertheless. we obtain f (x) = (Kσ ∗ f0 )(x). i. The latter allows to reformulate most existing as well as our novel regulariser in a common notation. v) .

(13) . θ0 := |∇2 f |2 + ζ 2 (10) where the parameter γ > 0 steers the contribution of the gradient constancy assumption. 1. however. Our geometric interpretation suggests that one should ideally penalise the distance d in a data term M2 (u. see (8).e. i. To normalise M3 . it was proposed to impose the gradient constancy assumption (Tretiak and Pastor 1984. v) = w J w. v) = w J w. The data term M1 . weighs this distance by the squared spatial image gradient.g. A sketch of this situation is given in Fig. we normalise the data term M1 by multiplying it with a factor (Simoncelli et al. (14) The term d constitutes a projection of the difference between the estimated flow u and the normal flow un in the direction of the image gradient ∇2 f . fy fy (9) where the tensor J can be written in the motion tensor notation of Bruhn et al. we replace the motion tensor J by its normalised counterpart ¯ ¯ ¯ ¯ ¯ ¯ J := J0 + γ Jxy := J0 + γ Jx + Jy := θ0 J0 + γ θx Jx + θy Jy := θ0 ∇3 f ∇3 f + γ θ x ∇3 f x ∇3 f x + θ y ∇3 f y ∇3 f y . f 3 (x)). v) = |∇2 f |2 d 2 . As a remedy. with the normalised tensor ¯ J0 := θ0 J0 = θ0 ∇3 f ∇3 f . noise in flat regions. the normalisation is not influenced for large gradients. A Taylor linearisation gives ∇3 fx w = 0 and ∇3 fy w = 0. v) = d 2 . 1 Geometric interpretation of the rewritten data term (8) = |∇2 f |2 = |∇2 f |2 ∇2 f f t ∇2 f u+ |∇2 f | |∇2 f |2 ∇2 f u − un |∇2 f | =:d 2 2 Gradient Constancy Assumption To render the data term robust under additive illumination changes. Combining both brightness and gradient constancy assumption in a quadratic way gives the data term M3 (u. e. (8) respectively. |∇2 fy |2 + ζ 2 (17) (16) The normalised data term M4 is given by ¯ M4 (u.372 Int J Comput Vis (2011) 93: 368–388 of ζ in the presence of noise. the term d describes the distance from u to the line l in the uv-space that is given by v=− fx ft u− . This overweighting is undesirable as large gradients can be caused by unreliable structures. Incorporating the normalisation into M1 leads to the data term ¯ M2 (u. ∇2 f (x + w) = ∇2 f (x). (2006) that allows to combine the two constancy assumptions in a joint tensor J := J0 + γ Jxy := J0 + γ Jx + Jy := ∇3 f ∇3 f + γ ∇3 fx ∇3 fx + ∇3 fy ∇3 fy . it states that image gradients remain constant under their displacement. it may pay off to choose a larger value Colour Image Sequences In a next step we extend our data term to multi-channel sequences (f 1 (x). This results in a stronger enforcement of the data constraint at high gradient locations. 2004). f 2 (x). as M1 (u. 1991. (12) (11) Fig. v) = w J0 w. Brox et al. with two additional normalisation factors defined as θx := 1 |∇2 fx |2 + ζ 2 and θy := 1 . In a geometric interpretation. (18) where the regularisation parameter ζ > 0 avoids division by zero. Schnörr 1994. such as noise or occlusions. the flow u has to lie according to the OFC (4). Nevertheless. it reduces the influence of small gradients which are significantly smaller than ζ 2 . Lai and Vemuri 1998) 1 . Additionally. (15) On this line. If . Thus. In contrast to the brightness constancy assumption. and the normal flow un is the smallest vector that lies on this line.

respectively. which cannot be captured by the gradient constancy assumption that is only invariant under additive illumination changes. This problem can be tackled by using the Hue Saturation Value (HSV) colour space. The corresponding motion tensor for the bright1 ¯1 J0 := θ0 ∇3 cos f 1 ∇3 cos f 1 + ∇3 sin f 1 ∇3 sin f 1 . . on the other hand. from left to right: (a) Colour image with zoom in shadow region left of the wheel. (21) where the normalisation factor is defined as 1 θ0 := 1 ∇2 2 cos f 1 + ∇2 sin f 1 + ζ 2 2 . f 1 ∈ [0◦ . 2. but appears in the value channel. from left to right: (c) Saturation channel. as can be observed for the striped cloth. where s denotes the quadratic data term.edu/flow/data/). 360◦ ). http://vision. and the value channel exhibits no invariances. The hue channel is invariant under global and local multiplicative illumination changes. (19) with normalisation factors θ i for each colour channel f i . (2007) thus only used the hue channel for optic flow computation as it exhibits the most invariances. available in the value channel. v) = M ¯ w Jc w . Mileva et al. the motion tensor (21) is equivalent to our earlier definition. We couple the three colour channels in the motion tensor ¯ J c := 3 i=1 3 ¯ J i := 3 i=1 ¯i ¯i J0 + γ Jxy := i=1 3 ¯i ¯i ¯i J0 + γ Jx + Jy := i=1 i θ 0 ∇3 f i ∇3 f i i i i i i i + γ θ x ∇3 f x ∇3 f x + θ y ∇3 f y ∇3 f y . One problem when using a HSV colour representation is that the hue channel f 1 describes an angle in a colour circle. Our solution to this problem is to consider the unit vector (cos f 1 . 2 HSV decomposition on the example of the Rubberwhale image from the Middlebury optic flow database (Baker et al. especially the hue channel discards a lot of image information. The corresponding data term reads as ¯ M5 (u. as ∇3 cos f 1 ∇3 cos f 1 + ∇3 sin f 1 ∇3 sin f 1 = sin2 f 1 ∇3 f 1 ∇3 f 1 + cos2 f 1 ∇3 f 1 ∇3 f 1 = ∇3 f 1 ∇ 3 f 1 . visualised with full saturation and value. The hue channel is hence not differentiable at the interface between 0◦ and 360◦ . Black and Anandan (1996) proposed to refrain from a quadratic penalisation. As an example. green and blue channel. the shadow at the left of the wheel (shown in the zoom) is not present in the hue and the saturation channel. We will additionally use the saturation and value channel. (d) Value channel ness constancy assumption consequently reads as Photometric Invariant Colour Spaces Realistic illumination models encompass a multiplicative influence (van de Weijer and Gevers 2004).Int J Comput Vis (2011) 93: 368–388 373 one uses the standard RGB colour space. Nevertheless. Instead they use a subquadratic penalisation function 2 2 M (s ). Second row. As we can see. Using such a robust penaliser within our data term yields M6 (u. (2004) for the sub√ quadratic penaliser M (s 2 ) := s 2 + ε 2 using a small regularisation parameter ε > 0. (22) ¯1 The tensor Jxy for the gradient constancy assumption is adapted accordingly. as proposed by Golland and Bruckstein (1997). This results in treating the hue channel as two (coupled) channels. 2009. The saturation channel is only invariant under global multiplicative illumination changes. which are both differentiable. as well as under local additive changes. (b) Hue channel. (24) Good results are reported by Brox et al. (23) where ∇ denotes the gradient in the differentiable parts of the hue channel. First row. i.middlebury.e. Note that in the differentiable parts of the hue channel. consider the HSV decomposition of the Rubberwhale image shown in Fig. v) = w J c w. This information is. because they contain information that is not encoded in the hue channel. (20) Fig. sin f 1 ) corresponding to the angle f 1 . the three channels represent the red. Robust Penalisers To provide robustness of the data term against outliers caused by noise and occlusions.

see Fig. This will be confirmed by a specific experiment in Sect. Anisotropic image-driven regularisers take into account directional information from image structures. yielding a downweighting of this channel.2 using the directional derivatives usi := si ∇2 u. Final Data Term Incorporating our separate robustification idea into M7 brings us to our final data term 3 M(u.n denotes the entry in row m and column n of the tensor J. Analysing the terms (28) and (29). v) = i=1 M ¯i w J0 w + γ 3 M i=1 ¯i w Jxy w . 2. (25) where the separate motion tensors are defined as ¯c J0 := 3 i=1 ¯i J0 ¯c and Jxy := 3 i=1 ¯i Jxy . with tr denoting the trace operator. Preliminaries for the General Framework We first introduce concepts that will be used in our general framework.2 ¯i · J0 +γ 1. These information can be obtained by considering the structure tensor (Förstner and Gülch 1987) 2 We will go further by proposing a separate robustification of each colour channel in the HSV space.1 3 ¯i v + J0 1.t. As s1 and s2 constitute an orthonormal basis.2 ¯i u + Jxy 2. Note that our final data term is (i) normalised. This can be justified by the distinct information content of each of the three channels.e. v) = M ¯c w J0 w + γ M ¯c w Jxy w .3 (32) ¯i · Jxy 1.3 with an integration scale ρ > 0. the corresponding argument of the decreasing function M will be large. Incorporating this strategy into our approach gives the data term M7 (u. (29) i=1 |∇2 u|2 = u2 + u2 = u21 + u22 . We rewrite the regularisers in a novel framework that unifies their notation and eases their comparison. A corresponding rewriting can also be performed for |∇2 v|2 . x y s s 2. fx ) denotes the vector orthogonal to ∇2 f . Most regularisers impose smoothness by penalising the magnitude of the flow gradients. we sketch some existing smoothness terms that led to our novel complementary regulariser. positive semidefinite 2 × 2 matrix that possesses two orthonormal eigenvectors s1 and s2 with corresponding eigenvalues μ1 ≥ μ2 ≥ 0.3 ⊥ where ∇2 f := (−fy .2 3 ¯i v + J0 2. The separate robustification then downweights the influence of less appropriate colour channels. 2. and M (s 2 ) denotes the derivative of M (s 2 ) w. In the case for ρ = 0. and (iii) uses the HSV colour space with (iv) a separate robustification of all colour channels. without considering neighbourhood information. we see that the separate robustification of the HSV channels makes sense: If a specific channel violates the imposed constancy assumption at a certain location. i. .2 1. which is advantageous if one assumption produces an outlier. its argument. (28) i=1 (31) ¯i · Jxy 3 1. Other channels that satisfy the constancy assumption then have a dominating influence on the solution. The contributions of our data term (27) to the EulerLagrange equations (2) and (3) are given by 3 Sρ := Kρ ∗ ∇2 f ∇2 f =: i=1 μi si si . For ρ = 0 we also have that |∇2 f |2 = tr S0 . The structure tensor is a symmetric.2 ¯i · J0 +γ 1.3 ∂v M = i=1 M ¯i w J0 w ¯i u + J0 M 2. we can write ¯i w Jxy w ¯i v + Jxy .2 Smoothness Term Following the extensive taxonomy on optic flow regularisers by Weickert and Schnörr (2001a). that drives the optic flow estimation in different ways. (30) ∂u M = i=1 M ¯i w J0 w ¯i u + J0 M 1. The vector s1 points across image structures. 5. (27) ¯ with the motion tensors J 1 adapted to the HSV colour space as described before.1 ¯i u + Jxy 1.374 Int J Comput Vis (2011) 93: 368–388 Bruhn and Weickert (2005) later proposed a separate penalisation of the brightness and the gradient constancy assumption.1. (ii) combines the brightness and gradient constancy assumption. (26) where [J]m. |∇2 f | ¯i w Jxy w ¯i v + Jxy .r. one obtains s0 = 1 ∇2 f |∇2 f | and s0 = 2 ⊥ ∇2 f . whereas the vector s2 points along them.

Image-Driven Regularisation To obtain sharp flow edges. (33) (34) with a regularisation parameter κ > 0. we will consider the corresponding Euler-Lagrange equations that can be written in the form ∂u M − α div (D ∇2 u) = 0. 2 In the definition of the normal flow (7) we have seen that a data term that models the brightness constancy assumption constraints the flow only orthogonal to image edges. (42) . we observe that in the limiting case for κ → 0. Homogeneous Regularisation First ideas for the smoothness term go back to Horn and Schunck (1981) who used a homogeneous regulariser. strictly positive weight function. The corresponding diffusion tensor DII = g(tr S0 ) I shows that the weight function allows to decrease the smoothing in accordance to the strength of image edges. 1999) reduce the smoothing at image edges. More specific. ∇2 v) := ∇2 u P(∇2 f ) ∇2 u + ∇2 v P ∇2 f ∇2 v. The drawback of image-driven strategies is that they are prone to oversegmentation artefacts in textured image regions where image edges do not necessarily correspond to flow edges. In the limiting case. i. This complementarity has been emphasised by Schnörr (1993). An isotropic image-driven regulariser goes back to Alvarez et al. where the smoothing is reduced at the boundaries of the evolving flow field via the decreasing diffusivity V . s s (41) The underlying diffusion processes perform nonlinear isotropic diffusion. and the corresponding eigenvalues determine the magnitude of smoothing. Shulman and Hervé (1989) and Schnörr (1994) proposed to use subquadratic penaliser functions for the smoothness term. the eigenvectors of D give the smoothing direction. (37) where P(∇2 f ) denotes a regularised projection matrix perpendicular to the image gradient.e. Flow-Driven Regularisation To remedy the oversegmentation problem.e. where VAI → u20 + v 20 . this regulariser can be written as VAI (∇2 u. the regulariser of Nagel and Enkelmann can hence be interpreted as a first complementary smoothness term that fills in information orthogonal to the data constraint direction. The smoothing processes thus perform homogeneous diffusion that blurs important flow edges. tr S0 + 2κ 2 s2 2 (39) with a diffusion tensor D that steers the smoothing of the flow components u and v. (38) ) := s 2 + ε2 . i. The anisotropic image-driven regulariser of Nagel and Enkelmann (1986) prevents smoothing of the flow field across image boundaries but encourages smoothing along them. image-driven methods (Nagel and Enkelmann 1986. ∇2 v) := = V V s2 s2 s2 s2 s1 s1 (35) The corresponding diffusion tensor is equal to the unit matrix DH = I. ∇2 v) := |∇2 u|2 + |∇2 v|2 2 2 = u21 + u22 + vs1 + vs2 . we obtain a smoothing solely in s0 -direction. who contributed a theoretical analysis as well as modifications of the original Nagel and Enkelmann functional. If one uses the convex penaliser (Cohen 1993) V (s 2 + κ 2I . ∂v M − α div (D ∇2 v) = 0. This is achieved by the regulariser VAI (∇2 u. In the isotropic setting. (1999) who used VII (∇2 u. s s (40) where the penaliser function V (s 2 ) is preferably increasing. s s The correctness of above rewriting can easily be verified and is based on the observations that s0 and s0 are the eigenvec1 2 tors of P.Int J Comput Vis (2011) 93: 368–388 375 To analyse the smoothing behaviour of the regularisers. differentiable and convex in s. The diffusion tensor for the regulariser of Nagel and Enkelmann (1986) is identical to the projection matrix: DAI = P. along image edges. In our common framework. ∇2 v) := g |∇2 f |2 |∇2 u|2 + |∇2 v|2 2 2 = g tr S0 u21 + u22 + vs1 + vs2 . The associated diffusion tensor is given by DIF = V 2 2 u21 + u22 + vs1 + vs2 I. Alvarez et al. In our framework it reads as VH (∇2 u. s s (36) where g is a decreasing. and that the factors in front of (u20 + v 20 ) and (u20 + v 20 ) are the corresponding eigenvalues. It is defined as P(∇2 f ) := 1 |∇2 f |2 + 2κ 2 ⊥ ⊥ ∇2 f ∇2 f |∇2 u|2 + |∇2 v|2 2 2 u21 + u22 + vs1 + vs2 . ∇2 v) = κ2 2 u20 + vs0 tr S0 + 2κ 2 s1 1 + tr S0 + κ 2 2 2 u 0 + v s0 . VIF (∇2 u. it makes sense to adapt the smoothing process to the flow edges instead of the image edges. indicated by large values of |∇2 f |2 = tr S0 . Concerning its eigenvectors and eigenvalues.

(48) which can be regarded as a generalisation of the structure tensor (30). As a result.and flow-driven smoothness term in a discrete setting. r u21 + s V V 2 vs1 V 2 vs2 . Flow-driven strategies remedy the oversegmentation problem but give less pleasant flow edges. the regulariser from Sun et al. Perona and Malik 1990) given by V (s 2 We observe that these tensors allow to obtain the desired behaviour: The regularisation direction is adapted to the image structure directions s1 and s2 . It adapts the smoothing direction to image structures but steers the smoothing strength in accordance to the flow contrast.1 Our Novel Complementary Regulariser In spite of its sophistication. yielding a regularisation tensor that also uses the spatio-temporal gradient ∇3 . (2008) given in (46) still suffers from a few shortcomings. Image. To this end we propose to analyse the eigenvectors r1 and r2 of the regularisation tensor 3 ) := λ2 log 1 + s2 . Our experiments in Sect.2. “constraint edges” instead of image edges. one obtains the same sharp flow edges as image-driven methods but does not suffer from oversegmentation problems. 5. ∇2 v) := V Rρ := i=1 i Kρ ∗ θ0 ∇2 f i ∇2 f i i i i i i i + γ θ x ∇2 f x ∇2 f x + θ y ∇2 f y ∇2 f y . A continuous version of this regulariser can be written as VAIF (∇2 u.3 we extend our regulariser to the spatio-temporal domain. (47) . we obtain two diffusion tensors. the anisotropic image. We introduce three amendments that we will discuss now. Further note that a Gaussian convolution of the motion tensor leads to a combined local-global (CLG) data term in the spirit of Bruhn et al. + u22 + s (46) Here. (2005).1 will analyse in which cases such a modification of our data term can be useful. v} read as p DAIF (49) where we use the eigenvectors ri of the regularisation tensor. 2. We propose to jointly penalise the directional derivatives. Regularisation Tensor A first remark w. (2008) who presented an anisotropic image. ∇2 v) := V 2 u21 + vr1 + r V 2 u22 + vr2 .t.RI (∇2 u.and flow-driven model from Sun et al. This aim was achieved by Sun et al. In contrast to Nagel and ⊥ Enkelmann (1986) who considered ∇2 f to obtain directional information of image structures. 1992) with the diffusivity V (s 2 1 1 .r. yielding VAIFRρ .r. Rotational Invariance The smoothness term VAIF from (46) lacks the desirable property of rotational invariance.and Flow-Driven Regularisation We have seen that image-driven methods suffer from oversegmentation artefacts. but give sharp flow edges.t. whereas the magnitude of the regularisation depends on the flow contrast encoded in ps1 and ps2 . It is thus desirable to combine the advantages of both strategies to obtain sharp flow edges without oversegmentation problems. that for p ∈ {u. λ2 (44) that results in Perona-Malik diffusion with the diffusivity V (s 2 )= 1 1+ s2 λ2 .376 Int J Comput Vis (2011) 93: 368–388 one ends up with regularised total variation (TV) regularisation (Rudin et al. The latter is due to the spatial regularisation. (45) using a contrast parameter λ > 0. In Sect. (2008) analyses the eigenvectors si of the structure tensor Sρ from (30) to obtain a more robust direction estimation. the model from (46) is that the directional information from the structure tensor Sρ is not consistent with the imposed constraints of our data term (27). because the directional derivatives of u and v in the eigenvector directions are penalised separately. 2. Note that the regularisation tensor differs from ¯ the motion tensor Jc from (19) by the facts that it (i) integrates neighbourhood information via the Gaussian convolution. they suffer from another drawback: The flow edges are not as well localised as with image-driven strategies. It is more natural to take into account directional information provided by the motion tensor (19) and to steer the anisotropic regularisation process w. = 2 V ps 1 s1 s1 + 2 V ps 2 s2 s2 . )= √ ≈ 2 s 2 + ε 2 2 |s| (43) Another possible choice is the non-convex Perona-Malik regulariser (Lorentzian) (Black and Anandan 1996. and (ii) uses the spatial gradient operator ∇2 instead of the spatio-temporal operator ∇3 . We will not discuss the anisotropic flow-driven regulariser of Weickert and Schnörr (2001a) as it does not fit in our framework and also has not been used in the design of our complementary regulariser. Despite the fact that flow-driven methods reduce the oversegmentation problem caused by image textures.

leading to almost homogeneous diffusion. The next to last column names the tensor that is analysed to obtain directional information for anisotropic strategies.and flow-driven regulariser (46). and it indicates if the regulariser is rotationally invariant. as the eigenvectors of the regularisation tensor ri are used instead of the eigenvectors of the structure tensor. and the last column indicates if the corresponding regulariser is rotationally invariant Strategy Regulariser V Directional adaptation Homogeneous (Horn and Schunck 1981) Isotropic image-driven (Alvarez et al. in r1 -direction. ∇2 v) := V 2 2 u21 + vr1 + u22 + vr2 . r Furthermore. This results in a small argument for V . (iii) We only reduce the smoothing across constraint edges. . Finally note that our complementary regulariser has the same structure. yielding several flow fields. Table 1 summarises the discussed regularisers rewritten in our framework.and flow-driven regularisation become visible. The above regulariser (49) performs a twofold robust penalisation in both eigenvector directions. our taxonomy shows their structural similarities. Along them. The corresponding joint diffusion tensor is given by DCR = V 2 u21 + vr1 r1 r1 + r2 r2 . resembling edge-enhancing anisotropic diffusion (Weickert 1996). for κ → 0 s2 s2 u21 + s V 2 vs1 + u22 + s V 2 vs2 Sρ Rρ — Single Robust Penalisation. we perform a pronounced smoothing in both directions that avoids oversegmentation artefacts. in r1 -direction. 2. Incorporating the single robust penalisation finally yields our complementary regulariser VCR (∇2 u. when analysing our joint diffusion tensor. the flow gradients are small. 2. We hence propose a single robust penalisation in r1 -direction. The benefits of this design will be confirmed by our experiments in Sect. it makes sense to also assume a (51) with V given in (45). however. Here we can distinguish two scenarios: r At a flow edge that corresponds to a constraint edge. we propose to use the Perona-Malik regulariser (44).3 Extension to a Spatio-Temporal Smoothness Term The smoothness terms we have discussed so far model the assumption of a spatially smooth flow field. yielding a reduced diffusion which preserves this important edge. the flow gradients will be large and almost parallel to r1 . which reveals the following innovations: (i) The smoothing direction is adapted to constraint edges instead of image edges. It also compares the way directional information is obtained for anisotropic strategies.Int J Comput Vis (2011) 93: 368–388 Table 1 Comparison of regularisation strategies. As image sequences in general encompass more than two frames.e. i. the data term mainly constraints the flow in direction of the largest eigenvalue of the spatial motion tensor. The derivation of this diffusion tensor is presented in Appendix. always a strong diffusion with strength 1 is performed. 2008) Anisotropic complementary image. we compare our joint diffusion tensor (51) to its counterparts (47). Only the regularisation tensor Rρ has to be adapted to the new data term. However.2 Summary To conclude this section. the argument of the decreasing function V will be large. Hence. For the penaliser V . r r (50) that complements the proposed robust data term from (27) in an optimal fashion. Discussion To understand the advantages of the complementary regulariser compared to the anisotropic image. At “deceiving” texture edges in flat flow regions. The smoothing strength across constraint edges is determined by the expres2 sion V (u21 + vr1 ). we opt for a quadratic penalisation to obtain a strong filling-in effect of missing information.2. In the orthogonal r2 -direction. even if other data terms are used. the benefits of the underlying anisotropic image. (ii) We achieve rotational invariance by coupling the two flow components in the argument of V .e. 5. 1999) Anisotropic image-driven (Nagel and Enkelmann 1986) Isotropic flow-driven (Shulman and Hervé 1989) Anisotropic image and flow-driven (Sun et al.and flow-driven V 2 2 u21 + vr1 + u22 + vr2 r r V V 2 2 u21 + u22 + vs1 + vs2 s s 2 2 g (tr S0 ) u21 + u22 + vs1 + vs2 s s 2 2 u21 + u22 + vs1 + vs2 s s 377 Rotationally invariant – – S0 – V u20 + v 20 . Thus.2. Note that despite the fact these regularisers have been developed within almost three decades. i.

3 (wα ) := 1 |Ω| +γ i=1 i i i + θy fy (x + 2wα ) − fy (x) 2 3 M Ω 3 M i i i θx fx (x + 2wα ) − fx (x) 2 i=1 i θ0 f i (x + 2wα ) − f i (x) 2 (53) ¯ For ρ = 0 it is identical to the motion tensor Jc from (19). we define the spatio-temporal complementary regulariser (ST-CR) ST VCR (∇3 u. Note that if the latter would be the case. Compared to the spatial energy (1) we note the additional integration over the time domain and that the smoothness term now depends on the spatio-temporal flow gradient. Under these assumptions. We then select the α that gives the smallest error. overfit to the first two frames and consequently result in a bad prediction of further frames. The integrand of above expression is (apart from the doubled flow field) a variant of our final data term (27) where no linearisation of the constancy assumptions have been performed. (56) The corresponding spatio-temporal diffusion tensor reads as DST = CR V 2 u21 + vr1 r1 r1 + r2 r2 + r3 r3 . This error measure bases on a novel concept. The Gaussian convolution with Kρ is now performed in the spatio-temporal domain.2 Determining the Best Parameter In general. we evaluate the imposed data constraints between the first and the third frame of the image sequence. r r r (54) dx dy. say. To this end. ∇3 v) := V 2 2 2 u21 + vr1 + u22 + vr2 + u23 + vr3 . yielding in total 2 nα + 1 tests. This strategy results in testing more values of α that are close to α0 and. tests less very small or very large values of α that hardly give reasonable results.r. the optimal prediction principle (OPP). more important. the relation between α and the ADCE is not convex. v) = M(u. To compute this measure. ∇3 v) dx dy dt. our error measure needs to judge the quality of the prediction achieved with a given flow field. This value is then incremented/decremented nα times by multiplying/dividing it with a stepping factor a > 1. v) (52) Ω×[0. To reduce the number of α values to test.t. Following this strategy. we could simply select the smoothness weight that gives the flow field with the smallest deviation from the ground truth. This makes sense as a too small smoothness weight would lead to an where wα denotes the flow field obtained with a smoothness weight α. The spatio-temporal regularisation tensor is a 3 × 3 tensor that possesses three orthonormal eigenvectors r1 . For too large smoothness weights. .1 A Novel Concept We propose an error measure that allows to judge the quality of a flow field without knowing the ground truth. we simply double the flow vectors to evaluate the data constraints between first and third frame. which excludes the use of gradient descent-like approaches for finding the optimal value of α w. standard value α0 . To evaluate the images at the subpixel locations f i (x + 2wα ) we use Coons patches based on bicubic interpolation (Coons 1967).378 Int J Comput Vis (2011) 93: 368–388 temporal smoothness of the flow fields. our error measure. To extend our complementary regulariser from (50) to the spatio-temporal domain. We propose a brute-force method similar to the one of Ng and Solo (1997): We first compute the error measures for a “sufficiently large” set of flow fields obtained with different α values. 3. leading to spatiotemporal regularisation strategies. resulting in an average data constancy error (ADCE) measure. This is especially important in real world applications of optic flow where no ground truth flow is known. r (55) 3 Automatic Selection of the Smoothness Weight The last step missing for our OFH method is a strategy that automatically determines the optimal smoothness parameter α for the image sequence under consideration. A spatio-temporal (ST) version of the general energy functional (1) reads as E ST (u. 3. we can define the ADCE between frame 1 and 3 as ADCE1. we define the spatio-temporal regularisation tensor ST ¯ Rρ := Kρ ∗ Jc . r2 and r3 . which also holds for the presmoothing of the image sequence. we propose to start from a given. we assume that the motion of the scene objects is of more or less constant speed and that it describes linear trajectories within the considered three frames. The OPP states that the flow field obtained with an optimal smoothness weight allows for the best prediction of the next frames in the image sequence. the flow fields would be too smooth and thus also lead to a bad prediction. Following the OPP.T ] + α V ST (∇3 u. With their help.

As the increments are small.middlebury. When comparing the results . Here. however. In Fig. we downsample the input images by a factor η ∈ [0. If the temporal sampling of the image sequence is too coarse. and (ii) using the RGB instead of the HSV colour space. This prevents aliasing problems. For the regularisation tensor. When computing the motion tensor. (2006) for the discretisation of the Euler-Lagrange equations. At each warping level. To recover the previous reduction of the data term at coarse levels. 5. Adapting the Smoothness Weight to the Warping Level The influence of the data term usually becomes smaller at coarser levels of our multiscale framework.1 Warping Strategy for Large Displacements The derivation of the optic flow constraint (4) by means of a linearisation is only valid under the assumption of small displacements. ζ = 0. resulting in a temporal difference. −8. This is due to the smoothing properties of the downsampling that leads to smaller values of the image gradients at coarse levels. 1).95.2 Discretisation We follow Bruhn et al. ε = 0. occurring derivatives are averaged from the two frames at time t and t + 1 to obtain a lower approximation error. 8. http://vision. As all considered sequences exhibit relatively large displacements. For warping with subpixel precision we again use Coons patches based on bicubic interpolation (Coons 1967).1 Robust Data Term Benefits of Normalisation and the HSV Colour Space We proposed two main innovations in the data term: constraint normalisation and using an HSV colour representation. In our first experiment. we use the following constant parameters: σ = 0. For approximating temporal image derivatives we use a two-point stencil (−1. this precondition will be violated and a linearised approach fails.1. At the next finer level. 3 (d).0. The spatial flow derivatives are discretised by second order approximations 1 with the stencil 2h (−1. To obtain a coarse representation of the problem.edu/flow/eval/). The flow fields are visualised by a colour code where hue encodes the flow direction and brightness the magnitude. 1). η = 0. we finish our experiments by presenting the performance at the Middlebury optic flow benchmark (Baker et al. it makes sense to adapt the value of τ to the given image sequence to allow for an appropriate scaling of the temporal direction compared to the spatial directions (Weickert and Schnörr 2001b). 2009. 4. 4. −1). we use the stencil (−1. (2004) proposed a coarse-to-fine multiscale warping strategy.1. we propose to adapt the smoothness weight α to the warping level k. After a small experiment on the importance of anti-aliasing in the warping scheme.0).001. we apply a low-pass filter to the images by performing a Gaussian convolution with standard √ deviation 2/(4η).Int J Comput Vis (2011) 93: 368–388 379 4 Implementation The solution of the Euler-Lagrange equations for our method comes down to solving a nonlinear system of equations. we use the multiscale warping approach described in Sect. Such a behaviour is in fact desirable as the data term might not be reliable at coarse levels. Spatial image derivatives are approximated via central fi1 nite differences using the stencil 12h (1. which is achieved by performing a motion compensation of the second frame by the current flow. Our proposed data term normalisation leads. λ = 0. ρ = 4. the derivatives are solely computed at the first frame as we only want to consider directional information from the reference image. known as warping. If not said otherwise. we split the flow field into an already computed solution from coarser levels and an unknown flow increment. Brox et al. we thus compare our method against variants (i) without data term normalisation. Concerning the temporal flow derivatives that occur in the spatio-temporal case. The images and the flow fields are sampled on a rectangular pixel grid with grid size h and temporal step size τ . γ = 20. 0.5. 1)/τ . 4. 2006). 5 Experiments In our experiments we show the benefits of the OFH approach. To overcome this problem. the already computed solution serves as initialisation. as a separate robustification of the RGB channels makes no sense. they can computed by the presented linearised approach. This is achieved by setting α(k) = α/ηk which results in larger values of α and an emphasis of the smoothness term at coarse levels. Note that it is a rather challenging sequence due to severe shadows and large displacements up to 25 pixels. to image gradients that are approximately in the same range at each level. 3 we show the results for the Snail sequence that we have created. 0. resulting in a fourth order approximation. The first experiments are concerned with our robust data term and the complementary smoothness term in the spatial and the spatio-temporal domain.5. For the latter we only separately robustify the brightness and the gradient constancy assumption. see Fig.1. 1.0. Prior to downsampling. Then. We solve the system by a nonlinear multigrid scheme based on a Gauß-Seidel type solver with alternating line relaxation (Bruhn et al. we turn to the automatic selection of the smoothness weight.

3 (f)). we use the average angular error (AAE) .0). (b) Zoom in marked region of first frame. a phantom motion in the shadow region at the right border is estimated. (e) Flow field in marked region. we compare the discussed variants to our proposed method on some Middlebury training sequences.0). 3 (i).0). (i) Same for our method (α = 2000. To further substantiate these observations. even when using a large smoothness weight α. (c) Same for second frame. The results of our method are shown together with the ground truth flow fields in Fig.and flow-driven regularisation (Sun et al. (h) Same for image.0). 3 Results for our Snail sequence with different variants of our method. unpleasant artefacts at image edges arise. 3 (e)). First row. To evaluate the quality of the flow fields compared to the given ground truth. When relying on the RGB colour space (Fig. 2008) (α = 2000. from left to right: (a) First frame. from left to right: (g) Same for TV regularisation (α = 50.0) to our result in Fig. Second row. without normalisation (α = 5000. from left to right: (d) Colour code. Third row. the following drawbacks of the modified versions become obvious: Without data term normalisation (Fig. (f) Same for RGB colour space (α = 300. 4.380 Int J Comput Vis (2011) 93: 368–388 Fig.

AEE = 0.0.073).3. To ease comparison with other methods.0.77◦ .7. the results are mostly inferior.5. γ = 1. AEE = 0.01 =⇒ AAE = 1. σ = 0. First column: Reference frame. A deeper discussion on the assets and drawbacks of the RGB and the HSV colour space can be found in Sect.298) measure (Barron et al.0. ρ = 1. From top to bottom: Rubberwhale (α = 3500.38◦ .5. we also give the alternative average endpoint error (AEE) measure (Baker et al.2. AEE = 0. γ = 20. 4 Results for some Middlebury sequences with ground truth. 5.0. Third column: Result with our method. Effect of the Separate Robust Penalisation This experiment illustrates the desirable effect of our separate robust penalisation of the HSV channels. 1994).5. always worse compared to RGB or HSV results. ρ = 1.39◦ . Compared to our proposed method.0. λ = 0.5.0. 5 . ρ = 1. Grove3 (α = 20. we show in Fig. apart from the Rubberwhale sequence. AEE = 0. γ = 25. σ = 0. Using the Rubberwhale sequence from the Middlebury database. however.Int J Comput Vis (2011) 93: 368–388 381 Fig. Table 2 also lists results obtained with greyscale images. γ = 0. λ = 0.076). σ = 0. The resulting AAE measures are listed in Table 2. Second column: Ground truth (white pixels mark locations where no ground truth is given). Dimetrodon (α = 8000. These are.0.87◦ . When using the RGB colour space. λ = 0. λ = 0. σ = 0. the results without normalisation are always significantly worse.0.487). and Urban3 (α = 75.1 =⇒ AAE = 2.1 =⇒ AAE = 4. 2009). ρ = 1. Addi- tionally.01 =⇒ AAE = 2.0.5.

We find that the CLG variant may only lead to small improvements. we use a rotationally invariant formulation that can be obtained from (49) by replacing the eigenvectors ri of the regularisation tensor by the eigenvectors si of the structure tensor.44◦ 1. Considering the corresponding error measures in Table 2. . On the other hand. we see that the alternative regularisation strategies mostly lead to higher errors.53◦ 2. 4 in general closely resemble the ground truth. In Fig. Our flow fields in Fig.09◦ 5.72◦ Proposed Grove3 Urban3 Int J Comput Vis (2011) 93: 368–388 Table 3 Comparison of our method to a CLG variant on noisy versions of the Yosemite sequence (using the AAE).54◦ 2. 3).27◦ 4. from left to right: (a) Zoom in first frame of the Rubberwhale sequence. Here.64◦ 10 4. .86◦ 2. the saturation and the value channel (i = 1. (1994) penaliser function as this gives better results here Sequence Rubberwhale Proposed No normalisation RGB colour space Greyscale images TV regulariser Sun et al.35◦ 5.75◦ 1. First.97◦ 3. we see that TV regularisation (Fig. (2010). As recently shown by Xu et al.87◦ 5. brighter pixels correspond to a larger weight and we only show a zoom for better visibility.14◦ 5. we compare in Table 3 the performance of our method to its CLG counterpart on noisy versions of the Yosemite sequence. see Fig. First row.72◦ 5. Only for the Dimetrodon sequence.82◦ 20 5. we also applied our method with different regularisers to some Middlebury sequences. see Table 2. (2008). unpleasant staircasing artefacts deteriorate the result. leads to blurred and badly localised flow edges. which hardly suffer from noise.27◦ 3.42◦ 4.14◦ 8. minor problems arising from incorporating image information in the smoothness term are visible at a few locations. As for the data term experiments. 3 (h)). As it turns out. In the Sun et al. but deteriorates the quality for low noise scenarios.61◦ 2. at the leafs in the Grove 3 result. (d) Same for the value channel ¯i the data term weights M (w J0 w) for the brightness constancy assumption on the hue. (2008) (Fig. 2. from left to right: (c) Same for the saturation channel. 3 (g)).55◦ dlebury test sequences. the CLG variant improves the results at large noise scales.88◦ 2.01◦ 3. This also explains the experienced behaviour on the Middlebury data sets.21◦ 3.34◦ 2. the weight of the value channel is reduced in the shadow regions (left of the wheel.82◦ 6. Furthermore. regulariser. Second row. A CLG Variant of Our Method Our next experiment is concerned with a CLG variant of our data term where we.59◦ 1.40◦ 1. (b) Visualisation of the corresponding hue channel weight. . perform a Gaussian convolution of the motion tensor entries. this modification seems not too useful.g. a CLG variant of our method can actually be useful in the presence of severe noise in the image sequence. of the orange toy and of the clam). as for the regularisation tensor. Fig.89◦ 2. we compare our method against two results obtained when using another regulariser in conjunction with our robust data term: (i) Using the popular TV regulariser. the latter problem could be resolved by using a more sophisticated warping strategy.77◦ 4.38◦ 2.382 Table 2 Comparing different variants of our method (using the AAE).08◦ 40 8. 5 Effect of our separate robust penalisation of the HSV channels.39◦ 1. 3 (i). see (40) and (42). 3. Brighter pixels correspond to a larger weight. However.2 Complementary Smoothness Term Comparison with Other Regularisers.44◦ Dimetrodon CLG 1. This is desirable as the value channel is not invariant under shadows. (ii) Using the anisotropic image and flow-driven regulariser of Sun et al. e. we compare our method against a CLG variant for some Middlebury sequences. some tiny flow details are smoothed out. However. We have added Gaussian noise with zero mean and standard deviation σn σn 0 1. To prove this. Here. at the wheel in the Rubberwhale result and at the bottom of the Urban 3 result. Comparing the obtained results to our result in Fig. we use the Charbonnier et al. . As we can see. which built the basis of our complementary regulariser. regulariser CLG 2.58◦ 2. Using the regulariser of Sun et al. 5.g. the TV regulariser is en par with our result. We thus conclude that for the considered Mid- . e. it may also deteriorate the results.39◦ 1.45◦ 1.

43◦ 2. weight α AAE 2 (16–17) 75.56◦ 5.65◦ 4 (9–12) 1000. computed from α0 = 400.62◦ 2.0 1. In our experiments.71◦ 4. τ = 1. First row.22◦ 3.96◦ 2.0.0 1. our result further substantiate the validity of the proposed OPP. (b) Same for the proposed data constancy error (ADCE1.Int J Comput Vis (2011) 93: 368–388 383 Table 5 Smoothness weight α and AAE measures for our spatiotemporal method on the Yosemite sequence from Middlebury.0 2. All results used the fixed parameters σ = 0.0 1.de/image_sequences/) and the Yosemite sequence from the Middlebury datasets.50◦ 3.0 2.0 1.0). spatio-temporal methods do not improve the results. (ii) estimating α and (iii) with the optimal value of α Sequence Rubberwhale Grove2 Grove3 Urban2 Urban3 Hydrangea Fixed α 3. better results could be obtained when disabling the temporal presmoothing Number of frames (from – to) Smoothn. from left to right: (a) Reference frame (frame 16). Second row. However. 7 (a).43◦ 5.35◦ 1.3 Automatic Selection of the Smoothness Weight Performance of our Proposed Error Measure We first show that our proposed data constancy error between frame 1 and 3 (ADCE1. When using more than two frames. τ = 1.5.59◦ 5.50◦ 2.37◦ 1.63◦ 6 (14–19) 50.0 1. As our error measure reflects the quality of the prediction with the given flow field.43◦ 5. To this end.5.5. γ = 20.86◦ 8 (13–20) 50. 6 Results for the Marble sequence with our spatio-temporal method. All results used the fixed parameters σ = 1.94◦ Optic Flow in the Spatio-Temporal Domain Let us now turn to the spatio-temporal extension of our complementary smoothness term. (d) Same for 6 frames (14–19) Table 4 Smoothness weight α and AAE measures for our spatiotemporal method on the Marble sequence.85◦ 4 (15–18) 50. we compare the two error measures for the Grove2 sequence in Fig.66◦ 3. From left to right: (a) Angular error (AAE) for 51 values of α. see Fig. a spatio-temporal method would in general be applicable. As most Middlebury sequences consist of 8 frames. Note that when using more than two frames. from left to right: (c) Result using 2 frames (16–17). Here. weight α AAE 2 (10–11) 2000.01◦ Fig.16◦ 6 (8–13) 1000. 7 (b)) indeed exhibits a shape very close to the angular error shown in Fig.ira.0.0 4. It becomes obvious that our proposed error measure (Fig.44◦ 1.00◦ 2. 7 Automatic selection of the smoothness weight α at the Grove2 sequence. the convolutions with Kσ and Kρ are performed in the spatio-temporal domain Number of frames (from – to) Smoothn.89◦ 5. γ = 0.73◦ Estimated α 3. η = 0.3 ) is a very good approximation of the popular angular error (AAE) measure. Consequently.3 ) Table 6 Results (AAE) for some Middlebury sequences when (i) fixing the smoothness weight (α = 400. These sequences exhibit relatively small displacements and our spatio-temporal method allows to obtain notably better results. ρ = 1.0.0.5.5.0 and a stepping factor a = 1. (b) Ground truth (white pixels mark locations where no ground truth is available). resulting in a violation of the assumption of a temporally smooth flow field. a smaller smoothness weight α has to be chosen and that a too large temporal window may also deteriorate the results again. 6 and Tables 4 and 5.05◦ Optimal α 3. η = 0.5.uka.94◦ 1. ρ = 1. 7.00◦ 2.86◦ 1.1.05◦ 8 (7–14) 1000. we use the Marble sequence (available at http://i21www. see Fig. Yosemite Marble 5. the displacements between two subsequent frames are often rather large there. 6.84◦ 3. .04◦ Fig.

ρ = 4. However. 5. the resulting artefacts will not significantly deteriorate the flow estimation. Top row. from left to right: (c) Our result without anti-aliasing. and (iii) selecting the (w.r. As the Middlebury sequences hardly suffer from difficult illumination conditions. For the Teddy sequence with only two frames.5. using a HSV colour representation is not as beneficial as seen in our experiment from Fig. but replaced the third frame by a copy of the second one. corresponding to approximately 36 s per flow field. we submitted our results to the popular Middlebury benchmark (http://vision. This violates the assumption of a constant speed and leads to a severe impairment of results (up to 30%). 2010.0. see Fig. This results from the problem that greyscales do not Fig. we cannot profit from the photometric invariances of the HSV colour space. which can be attributed to the robust data term. In Table 6. however. Nevertheless. Following the trend of parallel implementations on modern GPUs (Zach et al. the AAE and the AEE measure among 39 listed methods. 3. 9. (ii) using our automatic parameter selection method. . 8 Importance of anti-aliasing on the example of the Urban sequence. It turns out that estimating α with our proposed method allows to improve the results compared to a fixed value of α in almost all cases. we applied our approach again to the same sequences. γ = 20. Note that for this sequence.edu/flow/eval/). the fixed value of α accidentally coincides with the optimal value.t. The resulting running time for the Urban sequence (640 × 480 pixels) was 620 s on a standard PC (3. our results are on average 3% and at most 10% worse than the optimal result. no ground truth is publicly available have a unique representation in the hue as well as the saturation channel. which demonstrates the benefits of the proposed novelties in this paper. a = 1. we show that our automatic parameter selection works well for a large variety of different test sequences. 5. λ = 0. consider the results for the Teddy sequence in the first row of Fig. Here we see that the small white triangle beneath the chimney causes unpleasant artefacts in the flow field. some sequences even pose problems in their HSV representation. The smoothness weight α was automatically determined by our proposed method with the settings nα = 8. Bottom row. 2010). As it turns out.t. we use this variant of our method for evaluation at the Middlebury benchmark. We explain this by the high frequent stripe pattern on the facade of the building. 2007.r.0). ζ = 0.0.0. At the time of submission (August 2010). Just for the Grove 3 sequence.4 Importance of Anti-Aliasing in the Warping Scheme We proposed to presmooth the images prior to downsampling in order to avoid aliasing problems. 9) a HSV colour representation removes artefacts in the shadows left of the toys. that for the whole set of benchmark sequences. Compared to the results achieved with an optimal value of α. we achieve the 4th place w. we summarise the AAE obtained when (i) setting α to a fixed value (α = α0 = 400. the large displacement of the building in the lower left corner can only be estimated when using anti-aliasing.1. In most cases. As an example. we obtain slightly better results when using the RGB colour space. α0 = 400.95. (d) Same with anti-aliasing. there are also sequences where a HSV colour representation is beneficial. For all considered test sequences these assumptions were obviously valid. Werlberger et al. the AAE) optimal value of α under the tested proposals. We wish to note that a favourable performance of the proposed parameter selection of course depends on the validity of our assumptions (constant speed and linear trajectories of objects). 2010) we recently came up with a GPU version of our method that can compute flow fields of size 640 × 480 pixels in less than one second (Gwosdek et al. Sundaram et al. Note that our previous Complementary Optic Flow method (Zimmer et al. The bottom line is. as our parameter estimation method is not applicable in this case. η = 0. like the automatic parameter selection and the anti-aliasing. Grossauer and Thoman 2008. we set α = α0 . (b) Frame 11. anti-aliasing is crucial for obtaining reasonable results. We found that for the provided benchmark sequences. 8. Thus. 2009) only ranks 6th for the AAE and 9th for the AEE.5 Comparison to State-of-the-Art Methods To compare our method to the state-of-the-art in optic flow estimation. For our submission we used in accordance to the guidelines a fixed set of parameters: σ = 0. On the other hand. for the Urban sequence from the official Middlebury benchmark.2.384 Int J Comput Vis (2011) 93: 368–388 Benefits of an Automatic Parameter Selection Next. In order to show the limitations of our method.2 GHz Intel Pentium 4). For the Mequon sequence (second row of Fig. For the parameter selection we computed 2 · 8 + 1 = 17 flow fields.1.middlebury. from left to right: (a) Frame 10.

from left to right: (a) Frame 10 of the Teddy sequence.t.r.Int J Comput Vis (2011) 93: 368–388 385 Fig.8 277. The smoothing in data constraint direction is reduced to avoid interference with the data term. for sequences with small details in the flow field (Army. (h) Same for the RGB colour space (AAE = 2. The benefits of the OFH idea are demonstrated by our extensive experimental validation and the competitive performance at the Middlebury optic flow benchmark. our smoothness term unifies the benefits of imageand flow-driven regularisers. To this end. our method can easily be used in all variational optic flow approaches and additionally gives surprisingly good results. Furthermore.8 277. (d) Same for the RGB colour space (AAE = 2. resulting in sharp flow edges without oversegmentation artefacts. (g) Result when using the HSV colour space (AAE = 2. Mequon) a small smoothness weight is chosen.3 1433. Due to its simplicity. (b) Frame 11. Under some assumptions. the quality of the prediction can be judged by evaluating the data constraints between first and third frame of the sequence and using the doubled flow vectors. Our paper thus shows that a careful design of data and smoothness term together with an automatic choice of the smoothness . we came up with a novel paradigm. As desired. (c) Result when using the HSV colour space (AAE = 3. Second row. (f) Frame 11. photometric invariant colour representation. using our automatic parameter selection method Sequence Army Grove Mequon Schefflera Urban Wooden Yosemite Smoothness weight α 277. This yields an optimal complementary between both terms. It states the flow field obtained with an optimal smoothness weight allows for the best prediction of the next frames in the image sequence.8 691. On the other hand. The anisotropic complementary smoothness term incorporates directional information from the motion tensor occurring in the data term.84◦ ) Table 7 Estimated values of the smoothness weight α.94◦ ). (ii) We presented a simple. from left to right: (e) Frame 10 of the Mequon sequence.3 In Table 7 we additionally summarise the estimated values of α resulting from our automatic parameter selection method. higher order constancy assumptions and robust penalisation. This was achieved by two main ideas: (i) We developed a smoothness term that achieves an optimal complementary smoothing behaviour w.2 480.0 995. sequences like Wooden and Yosemite with a rather smooth flow yield significantly larger values for the smoothness weight. The proposed parameter selection method bases on the OPP that we introduced in this paper.64◦ ). yet well performing method for determining the optimal smoothness weight for the given im- age sequence. the imposed data constraints. the smoothness term and the smoothness weight. 9 Comparison of results obtained with HSV or RGB colour representation. the optimal prediction principle (OPP). while a strong smoothing in the orthogonal direction allows to fill-in missing information. Grove. 6 Conclusions and Outlook In this paper we have shown how to harmonise the three main constituents of variational optic flow approaches: the data term.28◦ ). First row. Our optic flow in harmony (OFH) method bases on an advanced data term that combines and extended successful concepts like normalisation.

Using the fact that ∂x ∂ux V + ∂y ∂uy V = div(∂ux V . Fleet. M.. (2009). The second one then follows analogously.. L. Barron. J. S. r2 (68) (57) Its contributions to the Euler-Lagrange equations are given by ∂x ∂ux V + ∂y ∂uy V . 2. 43–77. ∂uy V ) . Roth. XVI congreso de ecuaciones diferenciales y aplicaciones (pp. Black.. r (69) (60) References Alvarez. J. for i = 1.g. 5. e. 5. Our current research is on the one hand concerned with investigating possibilities to small issues with the proposed regularisation strategy. (1994).g. J. Lewis. Performance of optical flow techniques. International Journal of Computer Vision. Redmond. S. 12(1). Appendix: Derivation of the Diffusion Tensor for the Complementary Regulariser Consider the complementary regulariser from (50): V (∇2 u. S. & Szeliski. Lefébure.. or the integration of an epipolar geometry prior (Wedel et al. MSR-TR-2009-179. Tech. J. segmentation (Lei and Yang 2009).. The corresponding eigenvectors are r1 and r2 . The latter would also remedy the problems with non-constant speed of objects that we have mentioned in Sect. Scharstein. L. 2009). Esclarín. D.. WA. . P. We thus hope that our work will give rise to more “harmonised” approaches in other fields where energy-based methods are used. r r (67) We multiplied the second term of each sum by a factor of 1 to clarify that the eigenvalues of D are ψV (r1 ) and 1. and ∂x ∂vx V + ∂y ∂vy V . A PDE model for computing the optical flow. M.3. D. S. Let us first define the abbreviations ri := (ri1 .2.. Microsoft Research. (63) (61) (62) . with the diffusion tensor D := 2 2 ψV (r1 )· r11 + 1· r21 (66) ψV (r1 )· r11 r12 + 1· r21 r22 2 2 ψV (r1 )· r12 + 1· r22 ψV (r1 )· r11 r12 + 1· r21 r22 . A database and evaluation methodology for optical flow. (59) (58) = (r1 | r2 ) This shows that D is identical to DCR from (51). Furthermore. & Sánchez. Rep. we are interested in extensions of our novel parameter selection approach. we focus on higher efficiency and ways to apply our method in cases where only two images are available. In Proc.. 1349–1356). respectively. ri2 ) and ψV (ri ) := V 2 u2i + vri . Las Palmas de Gran Canaria. Baker.. which allows to rewrite the tensor D as D= r11 r21 r12 r22 ψV (r1 ) · r11 ψV (r1 ) · r12 1 · r21 1 · r22 ψV (r1 ) 0 0 1 r1 . ψV (r1 ) ur1 r12 + ur2 r22 (64) By multiplying out the expressions inside the divergence expressions one ends up with ∂x ∂ux V + ∂y ∂uy V = 2div 2 2 (ψV (r1 )r11 + r21 )ux + (ψV (r1 )r11 r12 + r21 r22 )uy 2 2 (ψV (r1 )r11 r12 + r21 r22 )ux + (ψV (r1 )r12 + r22 )uy . Here. we obtain by plugging (61) and (62) into (58): ∂x ∂ux V + ∂y ∂uy V = 2div ψV (r1 ) ur1 r11 + ur2 r21 . ∂uy V = 2 ψV (r1 ) ur1 r12 + ur2 r22 from (58). & Beauchemin. respectively. ∇2 v) = V 2 2 u21 + vr1 + u22 + vr2 . as it can be written as D = ψV (r1 ) r1 r1 + r2 r2 = V 2 u21 + vr1 r1 r1 + r2 r2 . see our remarks in Sect.. Acknowledgements The authors gratefully acknowledge partial funding by the International Max-Planck Research School (IMPRS) and the Cluster of Excellence “Multimodal Computing and Interaction” within the Excellence Initiative of the German Federal Government. image registration.386 Int J Comput Vis (2011) 93: 368–388 weight allows to outperform other well-engineered methods that incorporate many more processing steps. . . We exemplify the computation of the first expression (58). R. Spain. e. With their help. r respectively. J. (1999). . (65) We can write above equation in diffusion tensor notation as ∂x ∂ux V + ∂y ∂uy V = 2div(D ∇2 u). J. we compute the expressions ∂ux V = 2 ψV (r1 ) ur1 r11 + ur2 r21 .

& Thoman. 141– 150). 2010 IEEE computer society conference on computer vision and pattern recognition. (1962). In Proc. & J. Switzerland. Rep. Ng. In T. (2005). Osher. Dense point trajectories by GPU-accelerated large displacement optical flow. 281–305). 61(3). Multidimensional orientation estimation with applications to texture analysis and optical flow. & Barlaud. pp. Sun. Bruhn. G. Part III. Lecture notes in computer science (Vol. (1998). corners and centres of circular features. & Fatemi. N. C. Heraklion. In Proc. Charbonnier. Cambridge. Hamprecht. H. Computer vision—ECCV 2010 (pp... Grewenig. Computer vision—ECCV ’90 (pp. Jähne (Eds. In Proc. In Proc. P. M. Kohlberger. Krajsek. In Proc.. J. Computer vision—ECCV 2004. ISPRS intercommission conference on fast processing of photogrammetric data (pp.. P. Austin: IEEE Computer Society Press. A. (2007). B. Golland. & Black. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. Computer vision—ECCV 2008... An iterative image registration technique with an application to stereo vision. In Proc. 629–639. Schnörr. & B. A. (2010). 5008. Probability distributions of optical flow. V. (1987). S. (1993). Interlaken. eighth Scandinavian conference on image analysis (Vol. Artificial Intelligence. Schnörr.. J. S. .. Forsyth. 13(8). Lei. 12. E. 81–86). J. 257–277. Proceedings of the IEEE. Two deterministic half-quadratic regularization algorithms for computed imaging. In Proc. H. Förstner. IEEE Transactions on Pattern Analysis and Machine Intelligence. & Hervé.. Schäfer (Eds. (1997). D. In Proc. Vincze. Lewis. W. (1981). H. N. & Heeger. & Torre.. I. T. 185–203. Kyoto: IEEE Computer Society Press. Bruhn. Lecture notes in computer science: Vol. & Buxton. J. A. P. B. Schoenemann..). J. Simoncelli. Gwosdek. Computer Vision and Image Understanding. M. D. I.Int J Comput Vis (2011) 93: 368–388 Bertero. (1988). Gasteratos. 220–228.. Y. Beijing: IEEE Computer Society Press. Pajdla & J. D. (2006). & Keutzer. pp. Perona. & R. J. 1991 IEEE computer society conference on computer vision and pattern recognition (pp. Adelson.. L. & Gülch. Illumination-robust variational optical flow with photometric invariants. R. A data-driven method for choosing smoothing parameters in optical flow problems. Bruhn. 2. 661–663). Bigün. & Schnörr. An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence. Nagel. (1987). A. K. D. Pattern recognition. Lecture notes in computer science (pp. In Proc. 6311. Tromsø. M. S. Surfaces for computer aided design of space forms (Tech. 211–231.). J. (1991). 1074–1079. G. workshop on visual motion (pp. H. J. 9(2). Zimmer. Brox. K. 68(3). M.. Y. IEEE Transactions on Pattern Analysis and Machine Intelligence. In F. Pattern recognition. & Mester. H.. In A. D. International Journal of Computer Vision. B. Berlin: Springer. 76(8). & Anandan. Berlin: Springer. Lecture notes in computer science (pp. Torr... W. TA. B. IEEE international conference on image processing (Vol.. H.. J. 438–451). 168–172). H. Sun. Regularization of discontinuous flow fields. Calculus of variations.). (2009). & Weickert. In D.. Extending the ‘oriented smoothness constraint’ into the temporal domain and the estimation of derivatives of optical flow. (1989). & Kanade.. A. J. Roth. Scene segmentation from visual motion using global optimization. (1996). (1993). J. Blanc-Féraud. Poggio. & A. 1997 IEEE international conference on image processing (Vol. Weickert. 310– 315). 674–679). J. Pattern recognition. A. Murray.. (2008). Brox. In Proc. Jerusalem: IEEE Computer Society Press. In K. High accuracy optical flow estimation based on a theory for warping. (1992). In O. M. L. S. twelfth international conference on pattern recognition (Vol. C. A. Maui: IEEE Computer Society Press. Motion from color. Sundaram.). 77–104. & Schnörr. S.. (2004). A. T. 60. Computer vision systems (pp. & Yang. P. S. 346–362. Horn. (2005). A. (2008). 259–268. Computation of component image velocity from local phase information. Norway. & Enkelmann. B. Visual reconstruction. & Wiklund. tenth international conference on computer vision (Vol. Berlin: Springer. J. P. & Cremers. T.. Papenberg... Greece. (1997). International Journal of Computer Vision. Secrets of optical flow estimation and their principles. In Proc. A. MIT/LCS/TR-41). Black. (1967). P. IEEE Transactions on Pattern Analysis and Machine Intelligence. W. Schnörr. Computer Vision and Image Understanding. (1981). A.. P. Reliable and efficient computation of optical flow. Berlin: Springer. B. Nickolay. pp. Jähne (Eds. R... Berlin: Springer. D. (1991). The robust estimation of multiple motions: parametric and piecewise smooth flow fields. 142–151). & Vemuri. Near real-time motion segmentation using graph cuts. 387 Lai.). D. A. 1. Aubert. Matas (Eds. Determining optical flow. 1. Zisserman (Eds. C. San Francisco: IEEE Computer Society Press. London: Pergamon. 139–148). E. 17. T. Mileva. K. & Bruckstein. Weickert. Berlin: Springer. E. Blake. Tsotsos (Eds. Bruhn. Bayesian model selection for optical flow estimation. 565–593... International Journal of Computer Vision. A multigrid platform for real-time motion computation with discontinuity-preserving variational methods. Lecture notes in computer science (pp. Grossauer. Scale space and edge detection using anisotropic diffusion. J.. D.. & Black. Berlin: Springer. Fleet. H.. Schnörr.. C. Optical flow estimation on coarse-tofine region-trees using discrete optimization.. A highly efficient GPU implementation for variational optic flow based on the Euler-Lagrange framework. T.. Franke. 455–464). C. In F. Canada. 360–363). Nonlinear variational method for optical flow computation. IEEE Transactions on Pattern Analysis and Machine Intelligence. J. Nagel. Vancouver. Segmentation of visual motion by minimizing convex non-quadratic functionals. seventh international joint conference on artificial intelligence (pp. 5(1). (2010). M. In Lecture notes in computer science: Vol. pp. 75–104. Los Alamitos: IEEE Computer Society. C. E. P. & B. pp. 25–36). E. Coons..). L. P. Physica D. F. Cambridge: MIT Press. (1994). J. 523–530). On functionals with greyvalue-controlled smoothness terms for determining optical flow. A fast operator for detection and precise location of distinct points. 83–97). 869–889. & Malik. Bruhn.. V.. K. 775–790. 2009 IEEE international conference on computer vision. Learning optical flow. Bruhn. C. (1994). 15(10). & Zisserman.. (1990). & Solo. Müller. Berlin: Springer. (1986). (1990).). Lucas.. Towards ultimate motion estimation: combining highest accuracy with real-time performance.. 3. (2007). 427.. 70(3). & Weickert. Ill-posed problems in early vision. (1987).. Hamprecht. A.. 152–162). Nonlinear total variation based noise removal algorithms. GPU-based multigrid: real-time performance in high resolution nonlinear image processing. In Proc. 87–105. Lecture notes in computer science: Vol. & Weickert. Rudin. Cohen. Lecture notes in computer science (pp. Granlund. Roth. M. & Schunck. Irvine: IEEE Computer Society Press. Faugeras (Ed. Elsgolc. Part IV.. 63(1). H. 8. A. pp. Shulman. (2006)... C. (1990). 3024. H. L. Massachusetts Institute of Technology. 2010 ECCV workshop on computer vision with GPUs. International Journal of Computer Vision. & Weickert. & Jepson. 29(2). 749–755). (2010).

& Seidel. (2010). L. 3951. 2004 IEEE international conference on image processing (Vol. J. & Gevers. Lecture notes in computer science: Vol. Zach. Motion detail preserving optical flow estimation. San Francisco: IEEE Computer Society Press. Salgado. Hamprecht. Lecture notes in computer science: Vol. & A. 16–19). Xu. C. Pock.. (2008). J. (1985). Sawhney. Wedel. Lecture notes in computer science: Vol. A.. (2004). C. Boykov. & Cremers. D. Bruhn. & Bischof. T. In A. Xiao. Theoretical foundations of anisotropic diffusion in image processing. H. Canada. Complementary optic flow... Berlin: Springer.. O. C. Leonardis.. International Journal of Computer Vision. I. Kyoto: IEEE Computer Society Press. P. Lecture notes in computer science: Vol. & F. Adaptive support-weight approach for correspondence search. 2010 IEEE Int J Comput Vis (2011) 93: 368–388 computer society conference on computer vision and pattern recognition. Pattern recognition (pp. Computing Supplement. H. (1984). Statistical and geometrical approaches to visual motion analysis (pp. Robust optical flow from photometric invariants.. 4713. In D. A. Wedel. Cheng. (2001a).. In Proc... Motion estimation with non-local total variation regularization. & Kweon.. A.. & F. Yuille. (1996). Jähne (Eds. Bischof.. & Matsushita.). Velocity estimation from image sequences with second order differential operators. San Francisco: IEEE Computer Society Press. D. Y. Cremers. T. In Proc. J. (2009). T. B. Y... Yaroslavsky. Montreal.). Berlin: Springer.. 5604.. 214–223). Journal of Mathematical Imaging and Vision. L.). Pinz (Eds. (2010)..). Berlin: Springer. H. Schnörr. H. Weickert. H. T.. Digital picture processing: an introduction.. Yoon. A. In F. Berlin: Springer. Werlberger. K. van de Weijer. Rosenhahn. (2001b). H. seventh international conference on pattern recognition (pp. & Bischof. IEEE Transactions on Pattern Analysis and Machine Intelligence. pp. M. S. Part I (pp. Weickert. Schmidt (Eds. & B. 207–220). C. (2006). An improved algorithm for TV-L1 optical flow computation. J. Bischof. Pock. & Isnardi. A theoretical framework for convex regularizers in PDE-based computation of image motion. 2009 IEEE international conference on computer vision. 221–236... Jia. In Proc. In Proc. Rosenhahn. & Schnörr. Weickert. 245–255. Structureand motion-adaptive regularization for high accuracy optic flow. & Bischof. C. Valgaerts. C. Energy minimization methods in computer vision and pattern recognition (EMMCVPR) (pp. Weickert. Schmidt (Eds. 1835–1838). A. (2006). 45(3). Singapore: IEEE Signal Processing Society. Variational optic flow computation with a spatio-temporal smoothness constraint. A. Blake.. R. B. P. Cremers. 2010 IEEE computer society conference on computer vision and pattern recognition.. 3. & Pastor. In Proc.388 Tretiak. J. H. . Pock. J. Zimmer. Computer vision—ECCV 2006. In D.. H. & Schnörr. L. 245–264. L. (2009). 11. Cremers. (2007). J. 14(3). 5681. Bilateral filtering-based optical flow estimation with occlusion detection. M.. A duality based approach for realtime TV-L1 optical flow. Berlin: Springer. L. J.. T. R.. Rao. H. 28(4). 650–656. Zach. 23–45). Pock. 211–224).