DOI 10.1007/s1126301104226
Optic Flow in Harmony
Henning Zimmer · Andrés Bruhn · Joachim Weickert
Received: 30 August 2010 / Accepted: 13 January 2011 / Published online: 26 January 2011
© Springer Science+Business Media, LLC 2011
Abstract Most variational optic ﬂow approaches just con
sist of three constituents: a data term, a smoothness term
and a smoothness weight. In this paper, we present an ap
proach that harmonises these three components. We start by
developing an advanced data term that is robust under out
liers and varying illumination conditions. This is achieved
by using constraint normalisation, and an HSV colour rep
resentation with higher order constancy assumptions and a
separate robust penalisation. Our novel anisotropic smooth
ness is designed to work complementary to the data term.
To this end, it incorporates directional information from the
data constraints to enable a ﬁllingin of information solely
in the direction where the data term gives no information,
yielding an optimal complementary smoothing behaviour.
This strategy is applied in the spatial as well as in the spatio
temporal domain. Finally, we propose a simple method for
automatically determining the optimal smoothness weight.
This method bases on a novel concept that we call “opti
mal prediction principle” (OPP). It states that the ﬂow ﬁeld
obtained with the optimal smoothness weight allows for the
best prediction of the next frames in the image sequence.
The beneﬁts of our “optic ﬂowin harmony” (OFH) approach
H. Zimmer () · J. Weickert
Mathematical Image Analysis Group, Faculty of Mathematics and
Computer Science, Saarland University, Campus E1.1, 66041
Saarbrücken, Germany
email: zimmer@mia.unisaarland.de
J. Weickert
email: weickert@mia.unisaarland.de
A. Bruhn
Vision and Image Processing Group, Cluster of Excellence
Multimodal Computing and Interaction, Saarland University,
Campus E1.1, 66041 Saarbrücken, Germany
email: bruhn@mmci.unisaarland.de
are demonstrated by an extensive experimental validation
and by a competitive performance at the widely used Mid
dlebury optic ﬂow benchmark.
Keywords Optic ﬂow · Variational methods · Anisotropic
smoothing · Robust penalisation · Motion tensor ·
Parameter selection
1 Introduction
Despite almost three decades of research on variational optic
ﬂow approaches, there have been hardly any investigations
on the compatibility of their three main components: the
data term, the smoothness term and the smoothness weight.
While the data term models constancy assumptions on im
age features, the smoothness term penalises ﬂuctuations in
the ﬂow ﬁeld, and the smoothness weight determines the
balance between the two terms. In this paper, we present the
optic ﬂow in harmony (OFH) method, which harmonises the
three constituents by following two main ideas:
(i) Widelyused data terms such as the one resulting from
the linearised brightness constancy assumption only
constrain the ﬂow in one direction. However, most
smoothness terms impose smoothness also in the data
constraint direction, leading to an undesirable interfer
ence. A notable exception is the anisotropic smoothness
term that has been proposed by Nagel and Enkelmann
(1986) and modiﬁed by Schnörr (1993). At large image
gradients, this regulariser solely smoothes the ﬂow ﬁeld
along image edges. For a basic data term modelling the
brightness constancy assumption, this smoothing direc
tion is orthogonal to the data constraint direction and
thus both terms complement each other in an optimal
Int J Comput Vis (2011) 93: 368–388 369
manner. Unfortunately, this promising concept of com
plementarity between data and smoothness term has not
been further investigated. Our paper revives this con
cept for stateoftheart optic ﬂow models by presenting
a novel complementary smoothness term in conjunction
with an advanced data term.
(ii) Having adjusted the smoothing behaviour to the im
posed data constraints, it remains to determine the op
timal balance between the two terms for the image se
quence under consideration. This comes down to select
ing an appropriate smoothness weight, which is usually
considered a difﬁcult task. We propose a method that
is easy to implement for all variational optic ﬂow ap
proaches and nevertheless gives surprisingly good re
sults. It bases on the assumption that the ﬂow estimate
obtained by an optimal smoothness weight allows for
the best possible prediction of the next frames in the
image sequence. This novel concept we name optimal
prediction principle (OPP).
1.1 Related Work
In the ﬁrst ten years of research on optic ﬂow, several basic
strategies have been considered, e.g. phasebased methods
(Fleet and Jepson 1990), local methods (Lucas and Kanade
1981; Bigün et al. 1991) and energybased methods (Horn
and Schunck 1981). In recent years, the latter class of meth
ods became increasingly popular, mainly due to their poten
tial for giving highly accurate results. Within energybased
methods, one can distinguish discrete approaches that min
imise a discrete energy function and are often probabilisti
cally motivated, and variational approaches that minimise a
continuous energy functional.
Our variational approach naturally incorporates concepts
that have proven their beneﬁts over the years. In the follow
ing, we brieﬂy review advances in the design of data and
smoothness terms that are inﬂuential for our work.
Data Terms To cope with outliers caused by noise or oc
clusions, Black and Anandan (1996) replaced the quadratic
penalisation from Horn and Schunck (1981) by a robust sub
quadratic penaliser.
To obtain robustness under additive illumination changes,
Brox et al. (2004) combined the classical brightness con
stancy assumption of Horn and Schunck (1981) with the
higherorder gradient constancy assumption (Tretiak and
Pastor 1984; Schnörr 1994). Bruhn and Weickert (2005) im
proved this idea by a separate robust penalisation of bright
ness and gradient constancy assumption. This gives advan
tages if one of the two constraints produces an outlier. Re
cently, Xu et al. (2010) went a step further and estimate
a binary map that locally selects between imposing either
brightness or gradient constancy. An alternative to higher
order constancy assumptions can be to preprocess the im
ages by a structuretexture decomposition (Wedel et al.
2008).
In addition to additive illumination changes, realistic sce
narios also encompass a multiplicative part (van de Weijer
and Gevers 2004). For colour image sequences, this issue
can be tackled by normalising the colour channels (Golland
and Bruckstein 1997), or by using alternative colour spaces
with photometric invariances (Golland and Bruckstein 1997;
van de Weijer and Gevers 2004; Mileva et al. 2007). If one is
restricted to greyscale sequences, using logderivatives (Mil
eva et al. 2007) can be useful.
A further successful modiﬁcation of the data term has
been reported by performing a constraint normalisation (Si
moncelli et al. 1991; Lai and Vemuri 1998; Schoenemann
and Cremers 2006). It prevents an undesirable overweight
ing of the data term at large image gradient locations.
Smoothness Terms First ideas go back to Horn and Schunck
(1981) who used a homogeneous regulariser that does not re
spect any ﬂow discontinuities. Since different image objects
may move in different directions, it is, however, desirable to
also permit discontinuities.
This can for example be achieved by using imagedriven
regularisers that take into account image discontinuities. Al
varez et al. (1999) proposed an isotropic model with a scalar
valued weight function that reduces the regularisation at im
age edges. An anisotropic counterpart that also exploits the
directional information of image discontinuities was intro
duced by Nagel and Enkelmann (1986). Their method regu
larises the ﬂow ﬁeld along image edges but not across them.
As noted by Xiao et al. (2006), this comes down to con
volving the ﬂow ﬁeld with an oriented Gaussian where the
orientation is adapted to the image edges.
Note that for a basic data term modelling the brightness
constancy assumption, the image edge direction coincides
with the complementary direction orthogonal to the data
constraint direction. In such cases, the regularisers of Nagel
and Enkelmann (1986) and Schnörr (1993) can thus be con
sidered as early complementary smoothness terms.
Of course, not every image edge will coincide with a
ﬂow edge. Thus, imagedriven strategies are prone to give
oversegmentation artefacts in textured image regions. To
avoid this, ﬂowdriven regularisers have been proposed that
respect discontinuities of the evolving ﬂow ﬁeld and are
therefore not misled by image textures. In the isotropic set
ting this comes down to the use of robust, subquadratic pe
nalisers which are closely related to line processes (Blake
and Zisserman 1987). For energybased optic ﬂow meth
ods, such a strategy was used by Shulman and Hervé (1989)
and Schnörr (1994). An anisotropic extension was later pre
sented by Weickert and Schnörr (2001a). The drawback of
370 Int J Comput Vis (2011) 93: 368–388
ﬂowdriven regularisers lies in less welllocalised ﬂowedges
compared to imagedriven approaches.
Concerning the individual problems of image and ﬂow
driven strategies, the idea arises to combine the advantages
of both worlds. This goal was ﬁrst achieved in the discrete
method of Sun et al. (2008). There, the authors developed
an anisotropic regulariser that uses directional ﬂow deriva
tives steered by image structures. This allows to adapt the
smoothing direction to the direction of image structures and
the smoothing strength to the ﬂow contrast. We call such a
strategy image and ﬂowdriven regularisation. It combines
the beneﬁts of image and ﬂowdriven methods, i.e. sharp
ﬂow edges without oversegmentation problems.
The smoothness terms discussed so far only assume
smoothness of the ﬂow ﬁeld in the spatial domain. As im
age sequences usually consist of more than two frames,
yielding more than one ﬂow ﬁeld, it makes sense to also
assume temporal smoothness of the ﬂow ﬁelds. This leads
to spatiotemporal smoothness terms. In a discrete setting
they go back to Murray and Buxton (1987). For varia
tional approaches, an imagedriven spatiotemporal smooth
ness terms was proposed by Nagel (1990) and a ﬂowdriven
counterpart was later presented by Weickert and Schnörr
(2001b). The latter smoothness term was later successfully
used in the methods of Brox et al. (2004) and Bruhn and
Weickert (2005).
Recently, nonlocal smoothing strategies (Yaroslavsky
1985) have been introduced to the optic ﬂow community by
Sun et al. (2010) and Werlberger et al. (2010). In these ap
proaches, it is assumed that the ﬂow vector at a certain pixel
is similar to the vectors in a (possibly large) spatial neigh
bourhood. Adapting ideas from Yoon and Kweon (2006),
the similarity to the neighbours is weighted by a bilateral
weight that depends on the spatial as well as on the colour
value distance of the pixels. Due to the colour value dis
tance, these strategies can be classiﬁed as imagedriven ap
proaches and are thus also prone to oversegmentation prob
lems. However, comparing nonlocal strategies to the pre
viously discussed smoothness terms is somewhat difﬁcult:
Whereas nonlocal methods explicitly model similarity in a
certain neighbourhood, the previous smoothness terms oper
ate on ﬂow derivatives that only consider their direct neigh
bours. Nevertheless, the latter strategies model a globally
smooth ﬂow ﬁeld as each pixel communicates with each
other pixel through its neighbours.
Automatic Parameter Selection It is wellknown that an
appropriate choice of the smoothness weight is crucial for
obtaining favourable results. Nevertheless, there has been
remarkably little research on methods that automatically es
timate the optimal smoothness weight or other model para
meters.
Concerning an optimal selection of the smoothness
weight for variational optic ﬂow approaches, Ng and Solo
(1997) proposed an error measure which can be estimated
from the image sequence and the ﬂow estimate only. Using
this measure, a bruteforce search for the smoothness weight
that gives the smallest error is performed. Computing the
proposed error measure is, however, computationally expen
sive, especially for robust data terms. Ng and Solo (1997)
hence restricted their focus to the basic method of Horn
and Schunck (1981). In a Bayesian framework, a parame
ter selection approach that can also handle robust data terms
was presented by Krajsek and Mester (2007). This method
jointly estimates the ﬂow and the model parameters where
the latter encompass the smoothness weight and also the
relative weights of different data terms. This method does
not require a bruteforce search, but the minimisation of the
objective function is more complicated and only computa
tionally feasible if certain approximations are performed.
1.2 Our Contributions
The OFH method is obtained in three steps. We ﬁrst de
velop a robust and invariant data term. Then an anisotropic
image and ﬂowdriven smoothness term is designed that
works complementary to the data term. Finally we propose
a simple method for automatically determining the optimal
smoothness weight for the given image sequence.
Our data term combines the brightness and the gradient
constancy assumption, and performs a constraint normali
sation. It further uses a HueSaturationValue (HSV) colour
representation with a separate robustiﬁcation of each chan
nel. The latter is motivated by the fact that each channel in
the HSV space has a distinct level of photometric invariance
and information content. Hence, a separate robustiﬁcation
allows to choose the most reliable channel at each position.
Our anisotropic complementary smoothness term takes
into account directional information fromthe constraints im
posed in the data term. Across “constraint edges”, we per
form a robust penalisation to reduce the smoothing in the
direction where the data term gives the most information.
Along constraint edges, where the data term gives no in
formation, a strong ﬁllingin by using a quadratic penalisa
tion makes sense. This strategy not only allows for an op
timal complementarity between data and smoothness term,
but also leads to a desirable image and ﬂowdriven behav
iour. We further show that our regulariser can easily be ex
tended to work in the spatiotemporal domain.
Our method for determining the optimal smoothness
weight bases on the proposed OPP concept. This results
in ﬁnding the optimal smoothness weight as the one cor
responding to the ﬂow ﬁeld with the best prediction quality.
To judge the latter, we evaluate the data constraints between
the ﬁrst and the third frame of the sequence. Assuming a
constant speed and linear trajectory of objects within the
considered frames, this can be realised by simply doubling
Int J Comput Vis (2011) 93: 368–388 371
the ﬂow vectors. Due to its simplicity, our method is easy
to implement for all variational optic ﬂow approaches, but
nevertheless produces surprisingly good results.
The present article extends our shorter conference paper
(Zimmer et al. 2009) by the following points: (i) A more
extensive derivation and discussion of the data term. (ii)
An explicit discussion on the adequate treatment of the hue
channel of the HSV colour space. (iii) A taxonomy of ex
isting smoothness terms within a novel general framework.
The latter allows to reformulate most existing as well as our
novel regulariser in a common notation. (iv) The extension
of our complementary regulariser to the spatiotemporal do
main. (v) A simple method for automatically selecting the
smoothness weight. (vi) A deeper discussion of implemen
tation issues. (vii) Amore extensive experimental validation.
Organisation In Sect. 2 we present our variational optic
ﬂowmodel with the robust data termand the complementary
smoothness term. The latter is then extended to the spatio
temporal domain. Section 3 describes the method proposed
for determining the smoothness weight. After discussing
implementation issues in Sect. 4, we show experiments in
Sect. 5. The paper is ﬁnished with conclusions and an out
look on future work in Sect. 6.
2 Variational Optic Flow
Let f (x) be a greyscale image sequence where x :=
(x, y, t )
. Here, the vector (x, y)
∈ Ω denotes the location
within a rectangular image domain Ω ⊂ R
2
, and t ∈ [0, T ]
denotes time. We further assume that f is presmoothed by a
Gaussian convolution: Given an image sequence f
0
(x), we
obtain f (x) =(K
σ
∗ f
0
)(x), where K
σ
is a spatial Gaussian
of standard deviation σ and ∗ denotes the convolution oper
ator. This presmoothing step helps to reduce the inﬂuence of
noise and additionally makes the image sequence inﬁnitely
many times differentiable, i.e. f ∈ C
∞
.
The optic ﬂow ﬁeld w := (u, v, 1)
describes the dis
placement vector ﬁeld between two frames at time t and
t +1. It is found by minimising a global energy functional
of the general form
E(u, v) =
_
Ω
_
M(u, v) +α V(∇
2
u, ∇
2
v)
_
dx dy, (1)
where ∇
2
:=(∂
x
, ∂
y
)
denotes the spatial gradient operator.
The term M(u, v) denotes the data term, V(∇
2
u, ∇
2
v) the
smoothness term, and α > 0 is a smoothness weight. Note
that the energy (1) refers to the spatial case where one com
putes one ﬂow ﬁeld between two frames at time t and t +1.
The more general spatiotemporal case that uses all frames
t ∈ [0, T ] will be presented in Sect. 2.3.
According to the calculus of variations (Elsgolc 1962),
a minimiser (u, v) of the energy (1) necessarily has to fulﬁl
the associated EulerLagrange equations
∂
u
M −α
_
∂
x
_
∂
u
x
V
_
+∂
y
_
∂
u
y
V
__
= 0, (2)
∂
v
M −α
_
∂
x
_
∂
v
x
V
_
+∂
y
_
∂
v
y
V
__
= 0 (3)
with homogeneous Neumann boundary conditions.
2.1 Data Term
Let us now derive our data term in a systematic way. The
classical starting point is the brightness constancy assump
tion used by Horn and Schunck (1981). It states that im
age intensities remain constant under their displacement,
i.e. f (x + w) = f (x). Assuming that the image sequence
is smooth (which can be guaranteed by the Gaussian pres
moothing) and that the displacements are small, we can per
form a ﬁrstorder Taylor expansion that yields the linearised
optic ﬂow constraint (OFC)
0 =f
x
u +f
y
v +f
t
=∇
3
f w, (4)
where ∇
3
:= (∂
x
, ∂
y
, ∂
t
)
is the spatiotemporal gradient
operator and subscripts denote partial derivatives. With a
quadratic penalisation, the corresponding data term is given
by
M
1
(u, v) =
_
∇
3
f w
_
2
=w
J
0
w, (5)
with the tensor
J
0
:=∇
3
f ∇
3
f. (6)
The single equation given by the OFC involves two un
knowns u and v. It is thus not sufﬁcient to compute a unique
solution, which is known as the aperture problem (Bertero
et al. 1988). Nevertheless, the OFC does allow to compute
the ﬂowcomponent orthogonal to image edges, the socalled
normal ﬂow. For ∇
2
f  =0 it is deﬁned as
w
n
:=
_
u
n
, 1
_
:=
_
−
f
t
∇
2
f 
∇
2
f
∇
2
f 
, 1
_
. (7)
Normalisation Our experiments will show that normalis
ing the data term can be beneﬁcial. Following Simoncelli
et al. (1991), Lai and Vemuri (1998), Schoenemann and Cre
mers (2006) and using the abbreviation u := (u, v)
, we
rewrite the data term M
1
as
M
1
(u, v) =
_
∇
2
f u +f
t
_
2
=
_
∇
2
f 
_
∇
2
f u
∇
2
f 
+
f
t
∇
2
f 
__
2
372 Int J Comput Vis (2011) 93: 368–388
Fig. 1 Geometric interpretation of the rewritten data term (8)
= ∇
2
f 
2
_
∇
2
f
∇
2
f 
_
u +
f
t
∇
2
f
∇
2
f 
2
__
2
= ∇
2
f 
2
_
∇
2
f
∇
2
f 
_
u −u
n
_
. ,, .
=:d
_
2
. (8)
The term d constitutes a projection of the difference between
the estimated ﬂow u and the normal ﬂow u
n
in the direction
of the image gradient ∇
2
f . In a geometric interpretation,
the term d describes the distance from u to the line l in the
uvspace that is given by
v =−
f
x
f
y
u −
f
t
f
y
. (9)
On this line, the ﬂow u has to lie according to the OFC (4),
and the normal ﬂow u
n
is the smallest vector that lies on
this line. A sketch of this situation is given in Fig. 1. Our
geometric interpretation suggests that one should ideally pe
nalise the distance d in a data term M
2
(u, v) =d
2
. The data
term M
1
, however, weighs this distance by the squared spa
tial image gradient, as M
1
(u, v) = ∇
2
f 
2
d
2
, see (8). This
results in a stronger enforcement of the data constraint at
high gradient locations. This overweighting is undesirable as
large gradients can be caused by unreliable structures, such
as noise or occlusions.
As a remedy, we normalise the data term M
1
by multiply
ing it with a factor (Simoncelli et al. 1991; Lai and Vemuri
1998)
θ
0
:=
1
∇
2
f 
2
+ζ
2
, (10)
where the regularisation parameter ζ > 0 avoids division by
zero. Additionally, it reduces the inﬂuence of small gradients
which are signiﬁcantly smaller than ζ
2
, e.g. noise in ﬂat re
gions. Nevertheless, the normalisation is not inﬂuenced for
large gradients. Thus, it may pay off to choose a larger value
of ζ in the presence of noise. Incorporating the normalisa
tion into M
1
leads to the data term
M
2
(u, v) =w
¯
J
0
w, (11)
with the normalised tensor
¯
J
0
:=θ
0
J
0
=θ
0
_
∇
3
f ∇
3
f
_
. (12)
Gradient Constancy Assumption To render the data term
robust under additive illumination changes, it was proposed
to impose the gradient constancy assumption (Tretiak and
Pastor 1984; Schnörr 1994; Brox et al. 2004). In contrast
to the brightness constancy assumption, it states that im
age gradients remain constant under their displacement, i.e.
∇
2
f (x +w) =∇
2
f (x). A Taylor linearisation gives
∇
3
f
x
w=0 and ∇
3
f
y
w=0, (13)
respectively. Combining both brightness and gradient con
stancy assumption in a quadratic way gives the data term
M
3
(u, v) =w
Jw, (14)
where the tensor J can be written in the motion tensor no
tation of Bruhn et al. (2006) that allows to combine the two
constancy assumptions in a joint tensor
J := J
0
+γ J
xy
:=J
0
+γ
_
J
x
+J
y
_
:= ∇
3
f ∇
3
f +γ
_
∇
3
f
x
∇
3
f
x
+∇
3
f
y
∇
3
f
y
_
, (15)
where the parameter γ > 0 steers the contribution of the gra
dient constancy assumption.
To normalise M
3
, we replace the motion tensor J by its
normalised counterpart
¯
J :=
¯
J
0
+γ
¯
J
xy
:=
¯
J
0
+γ
_
¯
J
x
+
¯
J
y
_
:= θ
0
J
0
+γ
_
θ
x
J
x
+θ
y
J
y
_
:= θ
0
_
∇
3
f ∇
3
f
_
+γ
_
θ
x
_
∇
3
f
x
∇
3
f
x
_
+θ
y
_
∇
3
f
y
∇
3
f
y
__
, (16)
with two additional normalisation factors deﬁned as
θ
x
:=
1
∇
2
f
x

2
+ζ
2
and θ
y
:=
1
∇
2
f
y

2
+ζ
2
. (17)
The normalised data term M
4
is given by
M
4
(u, v) =w
¯
Jw. (18)
Colour Image Sequences In a next step we extend our data
term to multichannel sequences (f
1
(x), f
2
(x), f
3
(x)). If
Int J Comput Vis (2011) 93: 368–388 373
one uses the standard RGB colour space, the three channels
represent the red, green and blue channel, respectively. We
couple the three colour channels in the motion tensor
¯
J
c
:=
3
i=1
¯
J
i
:=
3
i=1
_
¯
J
i
0
+γ
¯
J
i
xy
_
:=
3
i=1
_
¯
J
i
0
+γ
_
¯
J
i
x
+
¯
J
i
y
__
:=
3
i=1
_
θ
i
0
_
∇
3
f
i
∇
3
f
i
_
+γ
_
θ
i
x
_
∇
3
f
i
x
∇
3
f
i
x
_
+θ
i
y
_
∇
3
f
i
y
∇
3
f
i
y
___
, (19)
with normalisation factors θ
i
for each colour channel f
i
.
The corresponding data term reads as
M
5
(u, v) =w
¯
J
c
w. (20)
Photometric Invariant Colour Spaces Realistic illumina
tion models encompass a multiplicative inﬂuence (van de
Weijer and Gevers 2004), which cannot be captured by the
gradient constancy assumption that is only invariant under
additive illumination changes. This problem can be tackled
by using the Hue Saturation Value (HSV) colour space, as
proposed by Golland and Bruckstein (1997). The hue chan
nel is invariant under global and local multiplicative illu
mination changes, as well as under local additive changes.
The saturation channel is only invariant under global mul
tiplicative illumination changes, and the value channel ex
hibits no invariances. Mileva et al. (2007) thus only used
the hue channel for optic ﬂow computation as it exhibits the
most invariances. We will additionally use the saturation and
value channel, because they contain information that is not
encoded in the hue channel.
As an example, consider the HSV decomposition of the
Rubberwhale image shown in Fig. 2. As we can see, the
shadow at the left of the wheel (shown in the zoom) is not
present in the hue and the saturation channel, but appears
in the value channel. Nevertheless, especially the hue chan
nel discards a lot of image information, as can be observed
for the striped cloth. This information is, on the other hand,
available in the value channel.
One problem when using a HSV colour representation is
that the hue channel f
1
describes an angle in a colour circle,
i.e. f
1
∈ [0
◦
, 360
◦
). The hue channel is hence not differen
tiable at the interface between 0
◦
and 360
◦
. Our solution to
this problem is to consider the unit vector (cos f
1
, sinf
1
)
corresponding to the angle f
1
. This results in treating the
hue channel as two (coupled) channels, which are both dif
ferentiable. The corresponding motion tensor for the bright
Fig. 2 HSV decomposition on the example of the Rubberwhale im
age from the Middlebury optic ﬂow database (Baker et al. 2009,
http://vision.middlebury.edu/ﬂow/data/). First row, from left to right:
(a) Colour image with zoom in shadow region left of the wheel. (b)
Hue channel, visualised with full saturation and value. Second row,
from left to right: (c) Saturation channel. (d) Value channel
ness constancy assumption consequently reads as
¯
J
1
0
:= θ
1
0
_
∇
3
cos f
1
∇
3
cos f
1
+∇
3
sinf
1
∇
3
sinf
1
_
,
(21)
where the normalisation factor is deﬁned as
θ
1
0
:=
1
¸
¸
∇
2
cos f
1
¸
¸
2
+
¸
¸
∇
2
sinf
1
¸
¸
2
+ζ
2
. (22)
The tensor
¯
J
1
xy
for the gradient constancy assumption is
adapted accordingly. Note that in the differentiable parts of
the hue channel, the motion tensor (21) is equivalent to our
earlier deﬁnition, as
∇
3
cos f
1
∇
3
cos f
1
+∇
3
sinf
1
∇
3
sinf
1
=sin
2
f
1
_
¯
∇
3
f
1
¯
∇
3
f
1
_
+cos
2
f
1
_
¯
∇
3
f
1
¯
∇
3
f
1
_
=
¯
∇
3
f
1
¯
∇
3
f
1
, (23)
where
¯
∇ denotes the gradient in the differentiable parts of
the hue channel.
Robust Penalisers To provide robustness of the data term
against outliers caused by noise and occlusions, Black and
Anandan (1996) proposed to refrain from a quadratic penali
sation. Instead they use a subquadratic penalisation function
M
(s
2
), where s
2
denotes the quadratic data term. Using
such a robust penaliser within our data term yields
M
6
(u, v) =
M
_
w
¯
J
c
w
_
. (24)
Good results are reported by Brox et al. (2004) for the sub
quadratic penaliser
M
(s
2
) :=
√
s
2
+ε
2
using a small reg
ularisation parameter ε > 0.
374 Int J Comput Vis (2011) 93: 368–388
Bruhn and Weickert (2005) later proposed a separate
penalisation of the brightness and the gradient constancy
assumption, which is advantageous if one assumption pro
duces an outlier. Incorporating this strategy into our ap
proach gives the data term
M
7
(u, v) =
M
_
w
¯
J
c
0
w
_
+γ
M
_
w
¯
J
c
xy
w
_
, (25)
where the separate motion tensors are deﬁned as
¯
J
c
0
:=
3
i=1
¯
J
i
0
and
¯
J
c
xy
:=
3
i=1
¯
J
i
xy
. (26)
We will go further by proposing a separate robustiﬁcation of
each colour channel in the HSV space. This can be justiﬁed
by the distinct information content of each of the three chan
nels, see Fig. 2, that drives the optic ﬂow estimation in dif
ferent ways. The separate robustiﬁcation then downweights
the inﬂuence of less appropriate colour channels.
Final Data Term Incorporating our separate robustiﬁca
tion idea into M
7
brings us to our ﬁnal data term
M(u, v) =
3
i=1
M
_
w
¯
J
i
0
w
_
+γ
_
3
i=1
M
_
w
¯
J
i
xy
w
_
_
,
(27)
with the motion tensors
¯
J
1
adapted to the HSV colour space
as described before. Note that our ﬁnal data term is (i) nor
malised, (ii) combines the brightness and gradient constancy
assumption, and (iii) uses the HSV colour space with (iv) a
separate robustiﬁcation of all colour channels.
The contributions of our data term (27) to the Euler
Lagrange equations (2) and (3) are given by
∂
u
M =
3
i=1
_
M
_
w
¯
J
i
0
w
_
·
__
¯
J
i
0
_
1,1
u +
_
¯
J
i
0
_
1,2
v +
_
¯
J
i
0
_
1,3
__
+γ
_
3
i=1
_
M
_
w
¯
J
i
xy
w
_
·
__
¯
J
i
xy
_
1,1
u +
_
¯
J
i
xy
_
1,2
v +
_
¯
J
i
xy
_
1,3
__
_
, (28)
∂
v
M =
3
i=1
_
M
_
w
¯
J
i
0
w
_
·
__
¯
J
i
0
_
1,2
u +
_
¯
J
i
0
_
2,2
v +
_
¯
J
i
0
_
2,3
__
+γ
_
3
i=1
_
M
_
w
¯
J
i
xy
w
_
·
__
¯
J
i
xy
_
1,2
u +
_
¯
J
i
xy
_
2,2
v +
_
¯
J
i
xy
_
2,3
__
_
, (29)
where [J]
m,n
denotes the entry in row m and column n of
the tensor J, and
M
(s
2
) denotes the derivative of
M
(s
2
)
w.r.t. its argument. Analysing the terms (28) and (29), we see
that the separate robustiﬁcation of the HSV channels makes
sense: If a speciﬁc channel violates the imposed constancy
assumption at a certain location, the corresponding argu
ment of the decreasing function
M
will be large, yielding a
downweighting of this channel. Other channels that satisfy
the constancy assumption then have a dominating inﬂuence
on the solution. This will be conﬁrmed by a speciﬁc experi
ment in Sect. 5.1.
2.2 Smoothness Term
Following the extensive taxonomy on optic ﬂow regularisers
by Weickert and Schnörr (2001a), we sketch some existing
smoothness terms that led to our novel complementary regu
lariser. We rewrite the regularisers in a novel framework that
uniﬁes their notation and eases their comparison.
Preliminaries for the General Framework We ﬁrst intro
duce concepts that will be used in our general framework.
Anisotropic imagedriven regularisers take into account
directional information from image structures. These infor
mation can be obtained by considering the structure tensor
(Förstner and Gülch 1987)
S
ρ
:=K
ρ
∗
_
∇
2
f ∇
2
f
_
=:
2
i=1
μ
i
s
i
s
i
, (30)
with an integration scale ρ > 0. The structure tensor is a
symmetric, positive semideﬁnite 2 ×2 matrix that possesses
two orthonormal eigenvectors s
1
and s
2
with corresponding
eigenvalues μ
1
≥ μ
2
≥ 0. The vector s
1
points across im
age structures, whereas the vector s
2
points along them. In
the case for ρ = 0, i.e. without considering neighbourhood
information, one obtains
s
0
1
=
∇
2
f
∇
2
f 
and s
0
2
=
∇
⊥
2
f
∇
2
f 
, (31)
where ∇
⊥
2
f := (−f
y
, f
x
)
denotes the vector orthogonal
to ∇
2
f . For ρ =0 we also have that ∇
2
f 
2
=tr S
0
, with tr
denoting the trace operator.
Most regularisers impose smoothness by penalising the
magnitude of the ﬂow gradients. As s
1
and s
2
constitute an
orthonormal basis, we can write
∇
2
u
2
=u
2
x
+u
2
y
=u
2
s
1
+u
2
s
2
, (32)
using the directional derivatives u
s
i
:= s
i
∇
2
u. A corre
sponding rewriting can also be performed for ∇
2
v
2
.
Int J Comput Vis (2011) 93: 368–388 375
To analyse the smoothing behaviour of the regularisers,
we will consider the corresponding EulerLagrange equa
tions that can be written in the form
∂
u
M −α div(D∇
2
u) = 0, (33)
∂
v
M −α div(D∇
2
v) = 0, (34)
with a diffusion tensor D that steers the smoothing of the
ﬂow components u and v. More speciﬁc, the eigenvectors
of D give the smoothing direction, and the corresponding
eigenvalues determine the magnitude of smoothing.
Homogeneous Regularisation First ideas for the smooth
ness term go back to Horn and Schunck (1981) who used a
homogeneous regulariser. In our framework it reads as
V
H
(∇
2
u, ∇
2
v) := ∇
2
u
2
+∇
2
v
2
= u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
. (35)
The corresponding diffusion tensor is equal to the unit ma
trix D
H
= I. The smoothing processes thus perform homo
geneous diffusion that blurs important ﬂow edges.
ImageDriven Regularisation To obtain sharp ﬂow edges,
imagedriven methods (Nagel and Enkelmann 1986; Al
varez et al. 1999) reduce the smoothing at image edges, in
dicated by large values of ∇
2
f 
2
=tr S
0
.
An isotropic imagedriven regulariser goes back to Al
varez et al. (1999) who used
V
II
(∇
2
u, ∇
2
v) := g
_
∇
2
f 
2
_ _
∇
2
u
2
+∇
2
v
2
_
= g
_
tr S
0
_ _
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_
, (36)
where g is a decreasing, strictly positive weight function.
The corresponding diffusion tensor D
II
= g(tr S
0
) I shows
that the weight function allows to decrease the smoothing in
accordance to the strength of image edges.
The anisotropic imagedriven regulariser of Nagel and
Enkelmann (1986) prevents smoothing of the ﬂow ﬁeld
across image boundaries but encourages smoothing along
them. This is achieved by the regulariser
V
AI
(∇
2
u, ∇
2
v) := ∇
2
uP(∇
2
f ) ∇
2
u +∇
2
v P
_
∇
2
f
_
∇
2
v,
(37)
where P(∇
2
f ) denotes a regularised projection matrix per
pendicular to the image gradient. It is deﬁned as
P(∇
2
f ) :=
1
∇
2
f 
2
+2κ
2
_
∇
⊥
2
f
_
∇
⊥
2
f
_
+κ
2
I
_
. (38)
with a regularisation parameter κ > 0. In our common
framework, this regulariser can be written as
V
AI
(∇
2
u, ∇
2
v) =
κ
2
tr S
0
+2κ
2
_
u
2
s
0
1
+v
2
s
0
1
_
+
tr S
0
+κ
2
tr S
0
+2κ
2
_
u
2
s
0
2
+v
2
s
0
2
_
. (39)
The correctness of above rewriting can easily be veriﬁed and
is based on the observations that s
0
1
and s
0
2
are the eigenvec
tors of P, and that the factors in front of (u
2
s
0
1
+ v
2
s
0
1
) and
(u
2
s
0
2
+v
2
s
0
2
) are the corresponding eigenvalues. The diffusion
tensor for the regulariser of Nagel and Enkelmann (1986) is
identical to the projection matrix: D
AI
= P. Concerning its
eigenvectors and eigenvalues, we observe that in the lim
iting case for κ → 0, where V
AI
→ u
2
s
0
2
+ v
2
s
0
2
, we obtain
a smoothing solely in s
0
2
direction, i.e. along image edges.
In the deﬁnition of the normal ﬂow (7) we have seen that
a data term that models the brightness constancy assump
tion constraints the ﬂow only orthogonal to image edges. In
the limiting case, the regulariser of Nagel and Enkelmann
can hence be interpreted as a ﬁrst complementary smooth
ness term that ﬁlls in information orthogonal to the data con
straint direction. This complementarity has been emphasised
by Schnörr (1993), who contributed a theoretical analysis as
well as modiﬁcations of the original Nagel and Enkelmann
functional.
The drawback of imagedriven strategies is that they are
prone to oversegmentation artefacts in textured image re
gions where image edges do not necessarily correspond to
ﬂow edges.
FlowDriven Regularisation To remedy the oversegmen
tation problem, it makes sense to adapt the smoothing
process to the ﬂow edges instead of the image edges.
In the isotropic setting, Shulman and Hervé (1989) and
Schnörr (1994) proposed to use subquadratic penaliser func
tions for the smoothness term, i.e.
V
IF
(∇
2
u, ∇
2
v) :=
V
_
∇
2
u
2
+∇
2
v
2
_
=
V
_
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_
, (40)
where the penaliser function
V
(s
2
) is preferably increas
ing, differentiable and convex in s. The associated diffusion
tensor is given by
D
IF
=
V
_
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_
I. (41)
The underlying diffusion processes perform nonlinear iso
tropic diffusion, where the smoothing is reduced at the
boundaries of the evolving ﬂow ﬁeld via the decreasing dif
fusivity
V
. If one uses the convex penaliser (Cohen 1993)
V
(s
2
) :=
_
s
2
+ε
2
, (42)
376 Int J Comput Vis (2011) 93: 368–388
one ends up with regularised total variation (TV) regularisa
tion (Rudin et al. 1992) with the diffusivity
V
(s
2
) =
1
2
√
s
2
+ε
2
≈
1
2s
. (43)
Another possible choice is the nonconvex PeronaMalik
regulariser (Lorentzian) (Black and Anandan 1996; Perona
and Malik 1990) given by
V
(s
2
) :=λ
2
log
_
1 +
s
2
λ
2
_
, (44)
that results in PeronaMalik diffusion with the diffusivity
V
(s
2
) =
1
1 +
s
2
λ
2
, (45)
using a contrast parameter λ > 0.
We will not discuss the anisotropic ﬂowdriven regu
lariser of Weickert and Schnörr (2001a) as it does not ﬁt
in our framework and also has not been used in the design
of our complementary regulariser.
Despite the fact that ﬂowdriven methods reduce the
oversegmentation problem caused by image textures, they
suffer from another drawback: The ﬂow edges are not as
well localised as with imagedriven strategies.
Image and FlowDriven Regularisation We have seen
that imagedriven methods suffer from oversegmentation
artefacts, but give sharp ﬂow edges. Flowdriven strategies
remedy the oversegmentation problem but give less pleasant
ﬂow edges. It is thus desirable to combine the advantages of
both strategies to obtain sharp ﬂow edges without overseg
mentation problems.
This aim was achieved by Sun et al. (2008) who pre
sented an anisotropic image and ﬂowdriven smoothness
term in a discrete setting. It adapts the smoothing direc
tion to image structures but steers the smoothing strength
in accordance to the ﬂow contrast. In contrast to Nagel and
Enkelmann (1986) who considered ∇
⊥
2
f to obtain direc
tional information of image structures, the regulariser from
Sun et al. (2008) analyses the eigenvectors s
i
of the struc
ture tensor S
ρ
from (30) to obtain a more robust direction
estimation. A continuous version of this regulariser can be
written as
V
AIF
(∇
2
u, ∇
2
v) :=
V
_
u
2
s
1
_
+
V
_
v
2
s
1
_
+
V
_
u
2
s
2
_
+
V
_
v
2
s
2
_
. (46)
Here, we obtain two diffusion tensors, that for p ∈ {u, v}
read as
D
p
AIF
=
V
_
p
2
s
1
_
s
1
s
1
+
V
_
p
2
s
2
_
s
2
s
2
. (47)
We observe that these tensors allow to obtain the desired be
haviour: The regularisation direction is adapted to the image
structure directions s
1
and s
2
, whereas the magnitude of the
regularisation depends on the ﬂow contrast encoded in p
s
1
and p
s
2
. As a result, one obtains the same sharp ﬂow edges
as imagedriven methods but does not suffer from overseg
mentation problems.
2.2.1 Our Novel Complementary Regulariser
In spite of its sophistication, the anisotropic image and
ﬂowdriven model from Sun et al. (2008) given in (46) still
suffers from a few shortcomings. We introduce three amend
ments that we will discuss now.
Regularisation Tensor A ﬁrst remark w.r.t. the model from
(46) is that the directional information from the structure
tensor S
ρ
is not consistent with the imposed constraints of
our data term (27). It is more natural to take into account di
rectional information provided by the motion tensor (19) and
to steer the anisotropic regularisation process w.r.t. “con
straint edges” instead of image edges. To this end we pro
pose to analyse the eigenvectors r
1
and r
2
of the regularisa
tion tensor
R
ρ
:=
3
i=1
K
ρ
∗
_
θ
i
0
_
∇
2
f
i
∇
2
f
i
_
+γ
_
θ
i
x
_
∇
2
f
i
x
∇
2
f
i
x
_
+θ
i
y
_
∇
2
f
i
y
∇
2
f
i
y
___
, (48)
which can be regarded as a generalisation of the structure
tensor (30). Note that the regularisation tensor differs from
the motion tensor
¯
J
c
from (19) by the facts that it (i) in
tegrates neighbourhood information via the Gaussian con
volution, and (ii) uses the spatial gradient operator ∇
2
in
stead of the spatiotemporal operator ∇
3
. The latter is due
to the spatial regularisation. In Sect. 2.3 we extend our reg
ulariser to the spatiotemporal domain, yielding a regulari
sation tensor that also uses the spatiotemporal gradient ∇
3
.
Further note that a Gaussian convolution of the motion ten
sor leads to a combined localglobal (CLG) data term in the
spirit of Bruhn et al. (2005). Our experiments in Sect. 5.1
will analyse in which cases such a modiﬁcation of our data
term can be useful.
Rotational Invariance The smoothness term V
AIF
from
(46) lacks the desirable property of rotational invariance, be
cause the directional derivatives of u and v in the eigenvec
tor directions are penalised separately. We propose to jointly
penalise the directional derivatives, yielding
V
AIF
Rρ,RI
(∇
2
u, ∇
2
v) :=
V
_
u
2
r
1
+v
2
r
1
_
+
V
_
u
2
r
2
+v
2
r
2
_
,
(49)
where we use the eigenvectors r
i
of the regularisation tensor.
Int J Comput Vis (2011) 93: 368–388 377
Table 1 Comparison of
regularisation strategies. The
next to last column names the
tensor that is analysed to obtain
directional information for
anisotropic strategies, and the
last column indicates if the
corresponding regulariser is
rotationally invariant
Strategy Regulariser V Directional Rotationally
adaptation invariant
Homogeneous u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
–
(Horn and Schunck 1981)
Isotropic imagedriven g (tr S
0
)
_
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_
–
(Alvarez et al. 1999)
Anisotropic imagedriven u
2
s
0
2
+v
2
s
0
2
, for κ →0 S
0
(Nagel and Enkelmann 1986)
Isotropic ﬂowdriven
V
_
u
2
s
1
+u
2
s
2
+v
2
s
1
+v
2
s
2
_
–
(Shulman and Hervé 1989)
Anisotropic image and
V
_
u
2
s
1
_
+
V
_
v
2
s
1
_
+
V
_
u
2
s
2
_
+
V
_
v
2
s
2
_
S
ρ
—
ﬂowdriven (Sun et al. 2008)
Anisotropic complementary
V
_
u
2
r
1
+v
2
r
1
_
+u
2
r
2
+v
2
r
2
R
ρ
image and ﬂowdriven
Single Robust Penalisation. The above regulariser (49)
performs a twofold robust penalisation in both eigenvec
tor directions. However, the data term mainly constraints
the ﬂow in direction of the largest eigenvalue of the spa
tial motion tensor, i.e. in r
1
direction. We hence propose a
single robust penalisation in r
1
direction. In the orthogonal
r
2
direction, we opt for a quadratic penalisation to obtain
a strong ﬁllingin effect of missing information. The bene
ﬁts of this design will be conﬁrmed by our experiments in
Sect. 5.2. Incorporating the single robust penalisation ﬁnally
yields our complementary regulariser
V
CR
(∇
2
u, ∇
2
v) :=
V
_
u
2
r
1
+v
2
r
1
_
+u
2
r
2
+v
2
r
2
, (50)
that complements the proposed robust data term from (27)
in an optimal fashion. For the penaliser
V
, we propose to
use the PeronaMalik regulariser (44).
The corresponding joint diffusion tensor is given by
D
CR
=
V
_
u
2
r
1
+v
2
r
1
_
r
1
r
1
+r
2
r
2
, (51)
with
V
given in (45). The derivation of this diffusion tensor
is presented in Appendix.
Discussion To understand the advantages of the comple
mentary regulariser compared to the anisotropic image and
ﬂowdriven regulariser (46), we compare our joint diffusion
tensor (51) to its counterparts (47), which reveals the fol
lowing innovations: (i) The smoothing direction is adapted
to constraint edges instead of image edges, as the eigenvec
tors of the regularisation tensor r
i
are used instead of the
eigenvectors of the structure tensor. (ii) We achieve rota
tional invariance by coupling the two ﬂowcomponents in the
argument of
V
. (iii) We only reduce the smoothing across
constraint edges, i.e. in r
1
direction. Along them, always a
strong diffusion with strength 1 is performed, resembling
edgeenhancing anisotropic diffusion (Weickert 1996).
Furthermore, when analysing our joint diffusion ten
sor, the beneﬁts of the underlying anisotropic image and
ﬂowdriven regularisation become visible. The smoothing
strength across constraint edges is determined by the expres
sion
V
(u
2
r
1
+v
2
r
1
). Here we can distinguish two scenarios:
At a ﬂowedge that corresponds to a constraint edge, the ﬂow
gradients will be large and almost parallel to r
1
. Thus, the
argument of the decreasing function
V
will be large, yield
ing a reduced diffusion which preserves this important edge.
At “deceiving” texture edges in ﬂat ﬂow regions, however,
the ﬂow gradients are small. This results in a small argument
for
V
, leading to almost homogeneous diffusion. Hence,
we perform a pronounced smoothing in both directions that
avoids oversegmentation artefacts.
Finally note that our complementary regulariser has the
same structure, even if other data terms are used. Only the
regularisation tensor R
ρ
has to be adapted to the new data
term.
2.2.2 Summary
To conclude this section, Table 1 summarises the discussed
regularisers rewritten in our framework. It also compares
the way directional information is obtained for anisotropic
strategies, and it indicates if the regulariser is rotationally
invariant. Note that despite the fact these regularisers have
been developed within almost three decades, our taxonomy
shows their structural similarities.
2.3 Extension to a SpatioTemporal Smoothness Term
The smoothness terms we have discussed so far model the
assumption of a spatially smooth ﬂow ﬁeld. As image se
quences in general encompass more than two frames, yield
ing several ﬂow ﬁelds, it makes sense to also assume a
378 Int J Comput Vis (2011) 93: 368–388
temporal smoothness of the ﬂow ﬁelds, leading to spatio
temporal regularisation strategies.
A spatiotemporal (ST) version of the general energy
functional (1) reads as
E
ST
(u, v) =
_
Ω×[0,T ]
_
M(u, v)
+α V
ST
(∇
3
u, ∇
3
v)
_
dx dy dt. (52)
Compared to the spatial energy (1) we note the additional
integration over the time domain and that the smoothness
term now depends on the spatiotemporal ﬂow gradient.
To extend our complementary regulariser from(50) to the
spatiotemporal domain, we deﬁne the spatiotemporal reg
ularisation tensor
R
ST
ρ
:=K
ρ
∗
¯
J
c
. (53)
For ρ = 0 it is identical to the motion tensor
¯
J
c
from (19).
The Gaussian convolution with K
ρ
is now performed in
the spatiotemporal domain, which also holds for the pres
moothing of the image sequence. The spatiotemporal regu
larisation tensor is a 3 ×3 tensor that possesses three ortho
normal eigenvectors r
1
, r
2
and r
3
. With their help, we deﬁne
the spatiotemporal complementary regulariser (STCR)
V
ST
CR
(∇
3
u, ∇
3
v) :=
V
_
u
2
r
1
+v
2
r
1
_
+u
2
r
2
+v
2
r
2
+u
2
r
3
+v
2
r
3
.
(54)
The corresponding spatiotemporal diffusion tensor reads as
D
ST
CR
=
V
_
u
2
r
1
+v
2
r
1
_
r
1
r
1
+r
2
r
2
+r
3
r
3
. (55)
3 Automatic Selection of the Smoothness Weight
The last step missing for our OFH method is a strategy that
automatically determines the optimal smoothness parameter
α for the image sequence under consideration. This is es
pecially important in real world applications of optic ﬂow
where no ground truth ﬂow is known. Note that if the latter
would be the case, we could simply select the smoothness
weight that gives the ﬂow ﬁeld with the smallest deviation
from the ground truth.
3.1 A Novel Concept
We propose an error measure that allows to judge the quality
of a ﬂow ﬁeld without knowing the ground truth. This error
measure bases on a novel concept, the optimal prediction
principle (OPP). The OPP states that the ﬂow ﬁeld obtained
with an optimal smoothness weight allows for the best pre
diction of the next frames in the image sequence. This makes
sense as a too small smoothness weight would lead to an
overﬁt to the ﬁrst two frames and consequently result in a
bad prediction of further frames. For too large smoothness
weights, the ﬂow ﬁelds would be too smooth and thus also
lead to a bad prediction.
Following the OPP, our error measure needs to judge the
quality of the prediction achieved with a given ﬂow ﬁeld.
To this end, we evaluate the imposed data constraints be
tween the ﬁrst and the third frame of the image sequence,
resulting in an average data constancy error (ADCE) mea
sure. To compute this measure, we assume that the mo
tion of the scene objects is of more or less constant speed
and that it describes linear trajectories within the considered
three frames. Under these assumptions, we simply double
the ﬂow vectors to evaluate the data constraints between ﬁrst
and third frame. Following this strategy, we can deﬁne the
ADCE between frame 1 and 3 as
ADCE
1,3
(w
α
)
:=
1
Ω
_
Ω
_
3
i=1
M
_
θ
i
0
_
f
i
(x +2w
α
) −f
i
(x)
_
2
_
+γ
_
3
i=1
M
_
θ
i
x
_
f
i
x
(x +2w
α
) −f
i
x
(x)
_
2
+θ
i
y
_
f
i
y
(x +2w
α
) −f
i
y
(x)
_
2
_
__
dx dy, (56)
where w
α
denotes the ﬂow ﬁeld obtained with a smoothness
weight α. The integrand of above expression is (apart from
the doubled ﬂow ﬁeld) a variant of our ﬁnal data term (27)
where no linearisation of the constancy assumptions have
been performed. To evaluate the images at the subpixel lo
cations f
i
(x+2w
α
) we use Coons patches based on bicubic
interpolation (Coons 1967).
3.2 Determining the Best Parameter
In general, the relation between α and the ADCE is not
convex, which excludes the use of gradient descentlike ap
proaches for ﬁnding the optimal value of α w.r.t. our error
measure.
We propose a bruteforce method similar to the one of
Ng and Solo (1997): We ﬁrst compute the error measures for
a “sufﬁciently large” set of ﬂow ﬁelds obtained with differ
ent α values. We then select the α that gives the smallest er
ror. To reduce the number of α values to test, we propose to
start from a given, standard value α
0
, say. This value is then
incremented/decremented n
α
times by multiplying/dividing
it with a stepping factor a > 1, yielding in total 2n
α
+ 1
tests. This strategy results in testing more values of α that
are close to α
0
and, more important, tests less very small
or very large values of α that hardly give reasonable re
sults.
Int J Comput Vis (2011) 93: 368–388 379
4 Implementation
The solution of the EulerLagrange equations for our method
comes down to solving a nonlinear system of equations. We
solve the system by a nonlinear multigrid scheme based on
a GaußSeidel type solver with alternating line relaxation
(Bruhn et al. 2006).
4.1 Warping Strategy for Large Displacements
The derivation of the optic ﬂow constraint (4) by means of
a linearisation is only valid under the assumption of small
displacements. If the temporal sampling of the image se
quence is too coarse, this precondition will be violated and
a linearised approach fails. To overcome this problem, Brox
et al. (2004) proposed a coarsetoﬁne multiscale warping
strategy. To obtain a coarse representation of the problem,
we downsample the input images by a factor η ∈ [0.5, 1.0).
Prior to downsampling, we apply a lowpass ﬁlter to the im
ages by performing a Gaussian convolution with standard
deviation
√
2/(4η). This prevents aliasing problems.
At each warping level, we split the ﬂow ﬁeld into an
already computed solution from coarser levels and an un
known ﬂow increment. As the increments are small, they
can computed by the presented linearised approach. At the
next ﬁner level, the already computed solution serves as ini
tialisation, which is achieved by performing a motion com
pensation of the second frame by the current ﬂow, known as
warping. For warping with subpixel precision we again use
Coons patches based on bicubic interpolation (Coons 1967).
Adapting the Smoothness Weight to the Warping Level The
inﬂuence of the data termusually becomes smaller at coarser
levels of our multiscale framework. This is due to the
smoothing properties of the downsampling that leads to
smaller values of the image gradients at coarse levels. Such
a behaviour is in fact desirable as the data term might not
be reliable at coarse levels. Our proposed data term normal
isation leads, however, to image gradients that are approxi
mately in the same range at each level. To recover the previ
ous reduction of the data term at coarse levels, we propose
to adapt the smoothness weight α to the warping level k.
This is achieved by setting α(k) = α/η
k
which results in
larger values of α and an emphasis of the smoothness term
at coarse levels.
4.2 Discretisation
We follow Bruhn et al. (2006) for the discretisation of the
EulerLagrange equations. The images and the ﬂow ﬁelds
are sampled on a rectangular pixel grid with grid size h and
temporal step size τ .
Spatial image derivatives are approximated via central ﬁ
nite differences using the stencil
1
12h
(1, −8, 0, 8, −1), re
sulting in a fourth order approximation. The spatial ﬂow
derivatives are discretised by second order approximations
with the stencil
1
2h
(−1, 0, 1). For approximating temporal
image derivatives we use a twopoint stencil (−1, 1), result
ing in a temporal difference. Concerning the temporal ﬂow
derivatives that occur in the spatiotemporal case, we use the
stencil (−1, 1)/τ . Here, it makes sense to adapt the value of
τ to the given image sequence to allow for an appropriate
scaling of the temporal direction compared to the spatial di
rections (Weickert and Schnörr 2001b).
When computing the motion tensor, occurring derivatives
are averaged from the two frames at time t and t + 1 to
obtain a lower approximation error. For the regularisation
tensor, the derivatives are solely computed at the ﬁrst frame
as we only want to consider directional information from the
reference image.
5 Experiments
In our experiments we show the beneﬁts of the OFH ap
proach. The ﬁrst experiments are concerned with our robust
data term and the complementary smoothness term in the
spatial and the spatiotemporal domain. Then, we turn to the
automatic selection of the smoothness weight. After a small
experiment on the importance of antialiasing in the warping
scheme, we ﬁnish our experiments by presenting the perfor
mance at the Middlebury optic ﬂow benchmark (Baker et al.
2009, http://vision.middlebury.edu/ﬂow/eval/).
As all considered sequences exhibit relatively large dis
placements, we use the multiscale warping approach de
scribed in Sect. 4.1. The ﬂow ﬁelds are visualised by a
colour code where hue encodes the ﬂow direction and
brightness the magnitude, see Fig. 3 (d). If not said oth
erwise, we use the following constant parameters: σ =0.5,
γ =20.0, ρ =4.0, η =0.95, ζ =0.1, ε =0.001, λ =0.1.
5.1 Robust Data Term
Beneﬁts of Normalisation and the HSV Colour Space We
proposed two main innovations in the data term: constraint
normalisation and using an HSV colour representation. In
our ﬁrst experiment, we thus compare our method against
variants (i) without data term normalisation, and (ii) using
the RGB instead of the HSV colour space. For the latter
we only separately robustify the brightness and the gradi
ent constancy assumption, as a separate robustiﬁcation of the
RGB channels makes no sense. In Fig. 3 we show the results
for the Snail sequence that we have created. Note that it is a
rather challenging sequence due to severe shadows and large
displacements up to 25 pixels. When comparing the results
380 Int J Comput Vis (2011) 93: 368–388
Fig. 3 Results for our Snail sequence with different variants of our
method. First row, from left to right: (a) First frame. (b) Zoom in
marked region of ﬁrst frame. (c) Same for second frame. Second row,
from left to right: (d) Colour code. (e) Flow ﬁeld in marked region,
without normalisation (α = 5000.0). (f) Same for RGB colour space
(α = 300.0). Third row, from left to right: (g) Same for TV regulari
sation (α =50.0). (h) Same for image and ﬂowdriven regularisation
(Sun et al. 2008) (α =2000.0). (i) Same for our method (α =2000.0)
to our result in Fig. 3 (i), the following drawbacks of the
modiﬁed versions become obvious: Without data term nor
malisation (Fig. 3 (e)), unpleasant artefacts at image edges
arise, even when using a large smoothness weight α. When
relying on the RGB colour space (Fig. 3 (f)), a phantom mo
tion in the shadow region at the right border is estimated.
To further substantiate these observations, we compare
the discussed variants to our proposed method on some Mid
dlebury training sequences. The results of our method are
shown together with the ground truth ﬂow ﬁelds in Fig. 4.
To evaluate the quality of the ﬂow ﬁelds compared to the
given ground truth, we use the average angular error (AAE)
Int J Comput Vis (2011) 93: 368–388 381
Fig. 4 Results for some Middlebury sequences with ground truth.
First column: Reference frame. Second column: Ground truth (white
pixels mark locations where no ground truth is given). Third col
umn: Result with our method. From top to bottom: Rubberwhale
(α =3500.0, σ =0.3, γ =20.0, ρ =1.0, λ =0.01 =⇒AAE =2.38
◦
,
AEE =0.076), Dimetrodon (α =8000.0, σ =0.7, γ =25.0, ρ =1.5,
λ = 0.01 =⇒ AAE = 1.39
◦
, AEE = 0.073), Grove3 (α = 20.0, σ =
0.5, γ = 0.2, ρ = 1.0, λ = 0.1 =⇒ AAE = 4.87
◦
, AEE = 0.487),
and Urban3 (α = 75.0, σ = 0.5, γ = 1.0, ρ = 1.5, λ = 0.1 =⇒
AAE =2.77
◦
, AEE =0.298)
measure (Barron et al. 1994). To ease comparison with other
methods, we also give the alternative average endpoint er
ror (AEE) measure (Baker et al. 2009). The resulting AAE
measures are listed in Table 2. Compared to our proposed
method, the results without normalisation are always signiﬁ
cantly worse. When using the RGB colour space, the results
are mostly inferior, apart from the Rubberwhale sequence.
Adeeper discussion on the assets and drawbacks of the RGB
and the HSV colour space can be found in Sect. 5.5. Addi
tionally, Table 2 also lists results obtained with greyscale im
ages. These are, however, always worse compared to RGB
or HSV results.
Effect of the Separate Robust Penalisation This experi
ment illustrates the desirable effect of our separate robust
penalisation of the HSV channels. Using the Rubberwhale
sequence from the Middlebury database, we show in Fig. 5
382 Int J Comput Vis (2011) 93: 368–388
Table 2 Comparing different variants of our method (using the AAE).
In the Sun et al. regulariser, we use the Charbonnier et al. (1994) pe
naliser function as this gives better results here
Sequence Rubber Dime Grove3 Urban3
whale trodon
Proposed 2.38
◦
1.39
◦
4.87
◦
2.77
◦
No normalisation 2.89
◦
1.45
◦
5.72
◦
4.88
◦
RGB colour space 2.34
◦
1.44
◦
5.14
◦
2.97
◦
Greyscale images 2.61
◦
1.59
◦
5.35
◦
3.21
◦
TV regulariser 2.54
◦
1.39
◦
5.09
◦
3.27
◦
Sun et al. regulariser 2.58
◦
1.40
◦
5.27
◦
3.53
◦
CLG 2.44
◦
1.42
◦
4.86
◦
2.72
◦
Fig. 5 Effect of our separate robust penalisation of the HSV channels.
First row, from left to right: (a) Zoom in ﬁrst frame of the Rubberwhale
sequence. (b) Visualisation of the corresponding hue channel weight.
Brighter pixels correspond to a larger weight. Second row, from left
to right: (c) Same for the saturation channel. (d) Same for the value
channel
the data term weights
M
(w
¯
J
i
0
w) for the brightness con
stancy assumption on the hue, the saturation and the value
channel (i =1, . . . , 3). Here, brighter pixels correspond to a
larger weight and we only show a zoom for better visibility.
As we can see, the weight of the value channel is reduced
in the shadow regions (left of the wheel, of the orange toy
and of the clam). This is desirable as the value channel is not
invariant under shadows, see Fig. 2.
A CLGVariant of Our Method Our next experiment is con
cerned with a CLG variant of our data term where we, as for
the regularisation tensor, perform a Gaussian convolution
of the motion tensor entries. First, we compare our method
against a CLG variant for some Middlebury sequences, see
Table 2. We ﬁnd that the CLG variant may only lead to small
improvements. On the other hand, it may also deteriorate
the results. We thus conclude that for the considered Mid
Table 3 Comparison of our method to a CLG variant on noisy ver
sions of the Yosemite sequence (using the AAE). We have added
Gaussian noise with zero mean and standard deviation σ
n
σ
n
0 10 20 40
CLG 1.75
◦
4.01
◦
5.82
◦
8.14
◦
Proposed 1.64
◦
3.82
◦
6.08
◦
8.55
◦
dlebury test sequences, this modiﬁcation seems not too use
ful.
However, a CLG variant of our method can actually be
useful in the presence of severe noise in the image sequence.
To prove this, we compare in Table 3 the performance of
our method to its CLG counterpart on noisy versions of
the Yosemite sequence. As it turns out, the CLG variant im
proves the results at large noise scales, but deteriorates the
quality for low noise scenarios. This also explains the expe
rienced behaviour on the Middlebury data sets, which hardly
suffer from noise.
5.2 Complementary Smoothness Term
Comparison with Other Regularisers. In Fig. 3, we com
pare our method against two results obtained when using
another regulariser in conjunction with our robust data term:
(i) Using the popular TV regulariser; see (40) and (42). (ii)
Using the anisotropic image and ﬂowdriven regulariser of
Sun et al. (2008), which built the basis of our complemen
tary regulariser. Here, we use a rotationally invariant formu
lation that can be obtained from (49) by replacing the eigen
vectors r
i
of the regularisation tensor by the eigenvectors s
i
of the structure tensor. Comparing the obtained results to our
result in Fig. 3 (i), we see that TV regularisation (Fig. 3 (g)),
leads to blurred and badly localised ﬂow edges. Using the
regulariser of Sun et al. (2008) (Fig. 3 (h)), unpleasant stair
casing artefacts deteriorate the result.
As for the data term experiments, we also applied our
method with different regularisers to some Middlebury se
quences. Considering the corresponding error measures in
Table 2, we see that the alternative regularisation strategies
mostly lead to higher errors. Only for the Dimetrodon se
quence, the TV regulariser is en par with our result. Our
ﬂow ﬁelds in Fig. 4 in general closely resemble the ground
truth. However, minor problems arising from incorporating
image information in the smoothness term are visible at a
few locations, e.g. at the wheel in the Rubberwhale result
and at the bottom of the Urban 3 result. Furthermore, some
tiny ﬂow details are smoothed out, e.g. at the leafs in the
Grove 3 result. As recently shown by Xu et al. (2010), the
latter problem could be resolved by using a more sophisti
cated warping strategy.
Int J Comput Vis (2011) 93: 368–388 383
Fig. 6 Results for the Marble sequence with our spatiotemporal
method. First row, from left to right: (a) Reference frame (frame 16).
(b) Ground truth (white pixels mark locations where no ground truth
is available). Second row, from left to right: (c) Result using 2 frames
(16–17). (d) Same for 6 frames (14–19)
Table 4 Smoothness weight α and AAE measures for our spatio
temporal method on the Marble sequence, see Fig. 6. All results used
the ﬁxed parameters σ = 0.5, γ = 0.5, ρ = 1.0, τ = 1.5, η = 0.5.
When using more than two frames, the convolutions with K
σ
and K
ρ
are performed in the spatiotemporal domain
Number of frames 2 4 6 8
(from – to) (16–17) (15–18) (14–19) (13–20)
Smoothn. weight α 75.0 50.0 50.0 50.0
AAE 4.85
◦
2.63
◦
1.86
◦
2.04
◦
Optic Flow in the SpatioTemporal Domain Let us now
turn to the spatiotemporal extension of our complementary
smoothness term. As most Middlebury sequences consist
of 8 frames, a spatiotemporal method would in general be
applicable. However, the displacements between two subse
quent frames are often rather large there, resulting in a vio
lation of the assumption of a temporally smooth ﬂow ﬁeld.
Consequently, spatiotemporal methods do not improve the
results. In our experiments, we use the Marble sequence
(available at http://i21www.ira.uka.de/image_sequences/)
and the Yosemite sequence from the Middlebury datasets.
These sequences exhibit relatively small displacements and
our spatiotemporal method allows to obtain notably better
results, see Fig. 6 and Tables 4 and 5. Note that when using
more than two frames, a smaller smoothness weight α has
to be chosen and that a too large temporal window may also
deteriorate the results again.
Table 5 Smoothness weight α and AAE measures for our spatio
temporal method on the Yosemite sequence from Middlebury. All re
sults used the ﬁxed parameters σ = 1.0, γ = 20.0, ρ = 1.5, τ = 1.0,
η =0.5. Here, better results could be obtained when disabling the tem
poral presmoothing
Number of frames 2 4 6 8
(from – to) (10–11) (9–12) (8–13) (7–14)
Smoothn. weight α 2000.0 1000.0 1000.0 1000.0
AAE 1.65
◦
1.16
◦
1.05
◦
1.01
◦
Fig. 7 Automatic selection of the smoothness weight α at the Grove2
sequence. From left to right: (a) Angular error (AAE) for 51 values of
α, computed from α
0
=400.0 and a stepping factor a =1.1. (b) Same
for the proposed data constancy error (ADCE
1,3
)
Table 6 Results (AAE) for some Middlebury sequences when (i) ﬁx
ing the smoothness weight (α =400.0), (ii) estimating α and (iii) with
the optimal value of α
Sequence Fixed α Estimated α Optimal α
Rubberwhale 3.43
◦
3.00
◦
3.00
◦
Grove2 2.59
◦
2.43
◦
2.43
◦
Grove3 5.50
◦
5.62
◦
5.50
◦
Urban2 3.22
◦
2.84
◦
2.66
◦
Urban3 3.44
◦
3.37
◦
3.35
◦
Hydrangea 1.96
◦
1.94
◦
1.86
◦
Yosemite 2.56
◦
1.89
◦
1.71
◦
Marble 5.73
◦
5.05
◦
4.94
◦
5.3 Automatic Selection of the Smoothness Weight
Performance of our Proposed Error Measure We ﬁrst
show that our proposed data constancy error between frame
1 and 3 (ADCE
1,3
) is a very good approximation of the pop
ular angular error (AAE) measure. To this end, we com
pare the two error measures for the Grove2 sequence in
Fig. 7. It becomes obvious that our proposed error measure
(Fig. 7 (b)) indeed exhibits a shape very close to the angular
error shown in Fig. 7 (a). As our error measure reﬂects the
quality of the prediction with the given ﬂow ﬁeld, our result
further substantiate the validity of the proposed OPP.
384 Int J Comput Vis (2011) 93: 368–388
Beneﬁts of an Automatic Parameter Selection Next, we
show that our automatic parameter selection works well for
a large variety of different test sequences. In Table 6, we
summarise the AAE obtained when (i) setting α to a ﬁxed
value (α = α
0
= 400.0), (ii) using our automatic parame
ter selection method, and (iii) selecting the (w.r.t. the AAE)
optimal value of α under the tested proposals. It turns out
that estimating α with our proposed method allows to im
prove the results compared to a ﬁxed value of α in almost
all cases. Just for the Grove 3 sequence, the ﬁxed value of α
accidentally coincides with the optimal value. Compared to
the results achieved with an optimal value of α, our results
are on average 3% and at most 10% worse than the optimal
result.
We wish to note that a favourable performance of the pro
posed parameter selection of course depends on the validity
of our assumptions (constant speed and linear trajectories
of objects). For all considered test sequences these assump
tions were obviously valid. In order to show the limitations
of our method, we applied our approach again to the same
sequences, but replaced the third frame by a copy of the sec
ond one. This violates the assumption of a constant speed
and leads to a severe impairment of results (up to 30%).
5.4 Importance of AntiAliasing in the Warping Scheme
We proposed to presmooth the images prior to downsam
pling in order to avoid aliasing problems. In most cases, the
resulting artefacts will not signiﬁcantly deteriorate the ﬂow
estimation, which can be attributed to the robust data term.
However, for the Urban sequence from the ofﬁcial Middle
bury benchmark, antialiasing is crucial for obtaining rea
sonable results, see Fig. 8. As it turns out, the large displace
ment of the building in the lower left corner can only be esti
mated when using antialiasing. We explain this by the high
frequent stripe pattern on the facade of the building.
5.5 Comparison to StateoftheArt Methods
To compare our method to the stateoftheart in optic ﬂow
estimation, we submitted our results to the popular Middle
bury benchmark (http://vision.middlebury.edu/ﬂow/eval/).
We found that for the provided benchmark sequences,
using a HSV colour representation is not as beneﬁcial as
seen in our experiment from Fig. 3. As the Middlebury se
quences hardly suffer from difﬁcult illumination conditions,
we cannot proﬁt from the photometric invariances of the
HSV colour space. On the other hand, some sequences even
pose problems in their HSV representation. As an exam
ple, consider the results for the Teddy sequence in the ﬁrst
row of Fig. 9. Here we see that the small white triangle be
neath the chimney causes unpleasant artefacts in the ﬂow
ﬁeld. This results from the problem that greyscales do not
Fig. 8 Importance of antialiasing on the example of the Urban se
quence. Top row, from left to right: (a) Frame 10. (b) Frame 11. Bot
tom row, from left to right: (c) Our result without antialiasing. (d)
Same with antialiasing. Note that for this sequence, no ground truth is
publicly available
have a unique representation in the hue as well as the satura
tion channel. Nevertheless, there are also sequences where
a HSV colour representation is beneﬁcial. For the Mequon
sequence (second row of Fig. 9) a HSV colour representa
tion removes artefacts in the shadows left of the toys. The
bottom line is, however, that for the whole set of benchmark
sequences, we obtain slightly better results when using the
RGB colour space. Thus, we use this variant of our method
for evaluation at the Middlebury benchmark.
For our submission we used in accordance to the guide
lines a ﬁxed set of parameters: σ = 0.5, γ = 20.0, ρ =
4.0, η = 0.95, ζ = 0.1, λ = 0.1. The smoothness weight
α was automatically determined by our proposed method
with the settings n
α
= 8, α
0
= 400.0, a = 1.2. For the
Teddy sequence with only two frames, we set α = α
0
, as
our parameter estimation method is not applicable in this
case. The resulting running time for the Urban sequence
(640 × 480 pixels) was 620s on a standard PC (3.2 GHz
Intel Pentium 4). For the parameter selection we com
puted 2 · 8 + 1 = 17 ﬂow ﬁelds, corresponding to approx
imately 36s per ﬂow ﬁeld. Following the trend of paral
lel implementations on modern GPUs (Zach et al. 2007;
Grossauer and Thoman 2008; Sundaram et al. 2010; Werl
berger et al. 2010) we recently came up with a GPU version
of our method that can compute ﬂow ﬁelds of size 640×480
pixels in less than one second (Gwosdek et al. 2010).
At the time of submission (August 2010), we achieve the
4th place w.r.t. the AAE and the AEE measure among 39
listed methods. Note that our previous Complementary Op
tic Flow method (Zimmer et al. 2009) only ranks 6th for the
AAE and 9th for the AEE, which demonstrates the beneﬁts
of the proposed novelties in this paper, like the automatic
parameter selection and the antialiasing.
Int J Comput Vis (2011) 93: 368–388 385
Fig. 9 Comparison of results obtained with HSV or RGB colour representation. First row, from left to right: (a) Frame 10 of the Teddy sequence.
(b) Frame 11. (c) Result when using the HSV colour space (AAE = 3.94
◦
). (d) Same for the RGB colour space (AAE = 2.64
◦
). Second row, from
left to right: (e) Frame 10 of the Mequon sequence. (f) Frame 11. (g) Result when using the HSV colour space (AAE = 2.28
◦
). (h) Same for the
RGB colour space (AAE = 2.84
◦
)
Table 7 Estimated values of the smoothness weight α, using our au
tomatic parameter selection method
Sequence Smoothness weight α
Army 277.8
Grove 277.8
Mequon 277.8
Schefﬂera 691.2
Urban 480.0
Wooden 995.3
Yosemite 1433.3
In Table 7 we additionally summarise the estimated val
ues of α resulting from our automatic parameter selection
method. As desired, for sequences with small details in
the ﬂow ﬁeld (Army, Grove, Mequon) a small smoothness
weight is chosen. On the other hand, sequences like Wooden
and Yosemite with a rather smooth ﬂow yield signiﬁcantly
larger values for the smoothness weight.
6 Conclusions and Outlook
In this paper we have shown how to harmonise the three
main constituents of variational optic ﬂow approaches: the
data term, the smoothness term and the smoothness weight.
This was achieved by two main ideas: (i) We developed a
smoothness term that achieves an optimal complementary
smoothing behaviour w.r.t. the imposed data constraints. (ii)
We presented a simple, yet well performing method for de
termining the optimal smoothness weight for the given im
age sequence. To this end, we came up with a novel para
digm, the optimal prediction principle (OPP).
Our optic ﬂow in harmony (OFH) method bases on an
advanced data term that combines and extended success
ful concepts like normalisation, photometric invariant colour
representation, higher order constancy assumptions and ro
bust penalisation. The anisotropic complementary smooth
ness term incorporates directional information from the mo
tion tensor occurring in the data term. The smoothing in data
constraint direction is reduced to avoid interference with
the data term, while a strong smoothing in the orthogonal
direction allows to ﬁllin missing information. This yields
an optimal complementary between both terms. Further
more, our smoothness term uniﬁes the beneﬁts of image
and ﬂowdriven regularisers, resulting in sharp ﬂow edges
without oversegmentation artefacts. The proposed parame
ter selection method bases on the OPP that we introduced
in this paper. It states the ﬂow ﬁeld obtained with an opti
mal smoothness weight allows for the best prediction of the
next frames in the image sequence. Under some assump
tions, the quality of the prediction can be judged by evaluat
ing the data constraints between ﬁrst and third frame of the
sequence and using the doubled ﬂow vectors. Due to its sim
plicity, our method can easily be used in all variational optic
ﬂow approaches and additionally gives surprisingly good re
sults.
The beneﬁts of the OFH idea are demonstrated by our
extensive experimental validation and the competitive per
formance at the Middlebury optic ﬂow benchmark. Our pa
per thus shows that a careful design of data and smoothness
term together with an automatic choice of the smoothness
386 Int J Comput Vis (2011) 93: 368–388
weight allows to outperform other wellengineered methods
that incorporate many more processing steps, e.g. segmen
tation (Lei and Yang 2009), or the integration of an epipo
lar geometry prior (Wedel et al. 2009). We thus hope that
our work will give rise to more “harmonised” approaches in
other ﬁelds where energybased methods are used, e.g. im
age registration.
Our current research is on the one hand concerned with
investigating possibilities to small issues with the proposed
regularisation strategy; see our remarks in Sect. 5.2. Further
more, we are interested in extensions of our novel parameter
selection approach. Here, we focus on higher efﬁciency and
ways to apply our method in cases where only two images
are available. The latter would also remedy the problems
with nonconstant speed of objects that we have mentioned
in Sect. 5.3.
Acknowledgements The authors gratefully acknowledge partial
funding by the International MaxPlanck Research School (IMPRS)
and the Cluster of Excellence “Multimodal Computing and Interac
tion” within the Excellence Initiative of the German Federal Govern
ment.
Appendix: Derivation of the Diffusion Tensor for the
Complementary Regulariser
Consider the complementary regulariser from (50):
V(∇
2
u, ∇
2
v) =
V
_
u
2
r
1
+v
2
r
1
_
+u
2
r
2
+v
2
r
2
. (57)
Its contributions to the EulerLagrange equations are given
by
∂
x
_
∂
u
x
V
_
+∂
y
_
∂
u
y
V
_
, (58)
and
∂
x
_
∂
v
x
V
_
+∂
y
_
∂
v
y
V
_
, (59)
respectively. We exemplify the computation of the ﬁrst ex
pression (58). The second one then follows analogously. Let
us ﬁrst deﬁne the abbreviations
r
i
:=(r
i1
, r
i2
)
and ψ
V
(r
i
) :=
V
_
u
2
r
i
+v
2
r
i
_
, (60)
for i = 1, . . . , 2. With their help, we compute the expres
sions
∂
u
x
V = 2
_
ψ
V
(r
1
) u
r
1
r
11
+u
r
2
r
21
_
, (61)
∂
u
y
V = 2
_
ψ
V
(r
1
) u
r
1
r
12
+u
r
2
r
22
_
(62)
from (58). Using the fact that
∂
x
_
∂
u
x
V
_
+∂
y
_
∂
u
y
V
_
=div(∂
u
x
V, ∂
u
y
V)
, (63)
we obtain by plugging (61) and (62) into (58):
∂
x
_
∂
u
x
V
_
+∂
y
_
∂
u
y
V
_
=2div
_
ψ
V
(r
1
) u
r
1
r
11
+u
r
2
r
21
ψ
V
(r
1
) u
r
1
r
12
+u
r
2
r
22
_
.
(64)
By multiplying out the expressions inside the divergence ex
pressions one ends up with
∂
x
_
∂
u
x
V
_
+∂
y
_
∂
u
y
V
_
=2div
_
(ψ
V
(r
1
)r
2
11
+r
2
21
)u
x
+(ψ
V
(r
1
)r
11
r
12
+r
21
r
22
)u
y
(ψ
V
(r
1
)r
11
r
12
+r
21
r
22
)u
x
+(ψ
V
(r
1
)r
2
12
+r
2
22
)u
y
_
.
(65)
We can write above equation in diffusion tensor notation as
∂
x
_
∂
u
x
V
_
+∂
y
_
∂
u
y
V
_
=2div(D∇
2
u), (66)
with the diffusion tensor
D :=
_
ψ
V
(r
1
)· r
2
11
+1· r
2
21
ψ
V
(r
1
)· r
11
r
12
+1· r
21
r
22
ψ
V
(r
1
)· r
11
r
12
+1· r
21
r
22
ψ
V
(r
1
)· r
2
12
+1· r
2
22
_
.
(67)
We multiplied the second term of each sum by a factor of
1 to clarify that the eigenvalues of D are ψ
V
(r
1
) and 1, re
spectively. The corresponding eigenvectors are r
1
and r
2
,
respectively, which allows to rewrite the tensor D as
D =
_
r
11
r
21
r
12
r
22
_
_
ψ
V
(r
1
) · r
11
ψ
V
(r
1
) · r
12
1 · r
21
1 · r
22
_
= (r
1
 r
2
)
_
ψ
V
(r
1
) 0
0 1
_
_
r
1
r
2
_
. (68)
This shows that D is identical to D
CR
from (51), as it can be
written as
D=ψ
V
(r
1
) r
1
r
1
+r
2
r
2
=
V
_
u
2
r
1
+v
2
r
1
_
r
1
r
1
+r
2
r
2
.
(69)
References
Alvarez, L., Esclarín, J., Lefébure, M., & Sánchez, J. (1999). A PDE
model for computing the optical ﬂow. In Proc. XVI congreso de
ecuaciones diferenciales y aplicaciones (pp. 1349–1356). Las Pal
mas de Gran Canaria, Spain.
Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., &Szeliski,
R. (2009). A database and evaluation methodology for optical
ﬂow. Tech. Rep. MSRTR2009179, Microsoft Research, Red
mond, WA.
Barron, J. L., Fleet, D. J., & Beauchemin, S. S. (1994). Performance
of optical ﬂow techniques. International Journal of Computer Vi
sion, 12(1), 43–77.
Int J Comput Vis (2011) 93: 368–388 387
Bertero, M., Poggio, TA, & Torre, V. (1988). Illposed problems in
early vision. Proceedings of the IEEE, 76(8), 869–889.
Bigün, J., Granlund, G. H., & Wiklund, J. (1991). Multidimensional
orientation estimation with applications to texture analysis and
optical ﬂow. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 13(8), 775–790.
Black, M. J., & Anandan, P. (1996). The robust estimation of multiple
motions: parametric and piecewise smooth ﬂow ﬁelds. Computer
Vision and Image Understanding, 63(1), 75–104.
Blake, A., & Zisserman, A. (1987). Visual reconstruction. Cambridge:
MIT Press.
Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accu
racy optical ﬂow estimation based on a theory for warping. In T.
Pajdla & J. Matas (Eds.), Computer vision—ECCV 2004, Part IV.
Lecture notes in computer science (Vol. 3024, pp. 25–36). Berlin:
Springer.
Bruhn, A., & Weickert, J. (2005). Towards ultimate motion estima
tion: combining highest accuracy with realtime performance. In
Proc. tenth international conference on computer vision (Vol. 1,
pp. 749–755). Beijing: IEEE Computer Society Press.
Bruhn, A., Weickert, J., & Schnörr, C. (2005). Lucas/Kanade meets
Horn/Schunck: combining local and global optic ﬂow methods.
International Journal of Computer Vision, 61(3), 211–231.
Bruhn, A., Weickert, J., Kohlberger, T., & Schnörr, C. (2006).
A multigrid platform for realtime motion computation with
discontinuitypreserving variational methods. International Jour
nal of Computer Vision, 70(3), 257–277.
Charbonnier, P., BlancFéraud, L., Aubert, G., & Barlaud, M. (1994).
Two deterministic halfquadratic regularization algorithms for
computed imaging. In Proc. IEEE international conference on
image processing (Vol. 2, pp. 168–172). Austin: IEEE Computer
Society Press.
Cohen, I. (1993). Nonlinear variational method for optical ﬂow compu
tation. In Proc. eighth Scandinavian conference on image analysis
(Vol. 1, pp. 523–530). Norway, Tromsø.
Coons, S. A. (1967). Surfaces for computer aided design of space
forms (Tech. Rep. MIT/LCS/TR41). Massachusetts Institute of
Technology, Cambridge.
Elsgolc, L. E. (1962). Calculus of variations. London: Pergamon.
Fleet, D. J., & Jepson, A. D. (1990). Computation of component image
velocity from local phase information. International Journal of
Computer Vision, 5(1), 77–104.
Förstner, W., & Gülch, E. (1987). A fast operator for detection and
precise location of distinct points, corners and centres of circu
lar features. In Proc. ISPRS intercommission conference on fast
processing of photogrammetric data (pp. 281–305). Interlaken,
Switzerland.
Golland, P., & Bruckstein, A. M. (1997). Motion from color. Computer
Vision and Image Understanding, 68(3), 346–362.
Grossauer, H., & Thoman, P. (2008). GPUbased multigrid: realtime
performance in high resolution nonlinear image processing. In
A. Gasteratos, M. Vincze, & J. K. Tsotsos (Eds.), Lecture notes in
computer science: Vol. 5008. Computer vision systems (pp. 141–
150). Berlin: Springer.
Gwosdek, P., Zimmer, H., Grewenig, S., Bruhn, A., & Weickert, J.
(2010). A highly efﬁcient GPU implementation for variational
optic ﬂow based on the EulerLagrange framework. In Proc.
2010 ECCV workshop on computer vision with GPUs, Heraklion,
Greece.
Horn, B., & Schunck, B. (1981). Determining optical ﬂow. Artiﬁcial
Intelligence, 17, 185–203.
Krajsek, K., & Mester, R. (2007). Bayesian model selection for opti
cal ﬂow estimation. In F. A. Hamprecht, C. Schnörr, & B. Jähne
(Eds.), Pattern recognition. Lecture notes in computer science
(pp. 142–151). Berlin: Springer.
Lai, S. H., & Vemuri, B. C. (1998). Reliable and efﬁcient computation
of optical ﬂow. International Journal of Computer Vision, 29(2),
87–105.
Lei, C., & Yang, Y. H. (2009). Optical ﬂow estimation on coarseto
ﬁne regiontrees using discrete optimization. In Proc. 2009 IEEE
international conference on computer vision. Kyoto: IEEE Com
puter Society Press.
Lucas, B., & Kanade, T. (1981). An iterative image registration tech
nique with an application to stereo vision. In Proc. seventh inter
national joint conference on artiﬁcial intelligence (pp. 674–679).
Vancouver, Canada.
Mileva, Y., Bruhn, A., & Weickert, J. (2007). Illuminationrobust vari
ational optical ﬂow with photometric invariants. In F. A. Ham
precht, C. Schnörr, & B. Jähne (Eds.), Pattern recognition. Lec
ture notes in computer science (pp. 152–162). Berlin: Springer.
Murray, D. W., & Buxton, B. F. (1987). Scene segmentation from vi
sual motion using global optimization. IEEE Transactions on Pat
tern Analysis and Machine Intelligence, 9(2), 220–228.
Nagel, H. H. (1990). Extending the ‘oriented smoothness constraint’
into the temporal domain and the estimation of derivatives of opti
cal ﬂow. In O. Faugeras (Ed.), Lecture notes in computer science:
Vol. 427. Computer vision—ECCV ’90 (pp. 139–148). Berlin:
Springer.
Nagel, H. H., & Enkelmann, W. (1986). An investigation of smooth
ness constraints for the estimation of displacement vector ﬁelds
from image sequences. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 8, 565–593.
Ng, L., & Solo, V. (1997). A datadriven method for choosing smooth
ing parameters in optical ﬂow problems. In Proc. 1997 IEEE in
ternational conference on image processing (Vol. 3, pp. 360–363).
Los Alamitos: IEEE Computer Society.
Perona, P., & Malik, J. (1990). Scale space and edge detection using
anisotropic diffusion. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 12, 629–639.
Rudin, L. I., Osher, S., & Fatemi, E. (1992). Nonlinear total variation
based noise removal algorithms. Physica D, 60, 259–268.
Schnörr, C. (1993). On functionals with greyvaluecontrolled smooth
ness terms for determining optical ﬂow. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 15(10), 1074–1079.
Schnörr, C. (1994). Segmentation of visual motion by minimizing con
vex nonquadratic functionals. In Proc. twelfth international con
ference on pattern recognition (Vol. A, pp. 661–663). Jerusalem:
IEEE Computer Society Press.
Schoenemann, T., & Cremers, D. (2006). Near realtime motion seg
mentation using graph cuts. In K. Franke, K. R. Müller, B. Nick
olay, & R. Schäfer (Eds.), Pattern recognition, Lecture notes in
computer science (pp. 455–464). Berlin: Springer.
Shulman, D., & Hervé, J. (1989). Regularization of discontinuous ﬂow
ﬁelds. In Proc. workshop on visual motion (pp. 81–86). Irvine:
IEEE Computer Society Press.
Simoncelli, E. P., Adelson, E. H., & Heeger, D. J. (1991). Probability
distributions of optical ﬂow. In Proc. 1991 IEEE computer society
conference on computer vision and pattern recognition (pp. 310–
315). Maui: IEEE Computer Society Press.
Sun, D., Roth, S., Lewis, J. P., & Black, M. J. (2008). Learning optical
ﬂow. In D. Forsyth, P. Torr, & A. Zisserman (Eds.), Computer
vision—ECCV 2008, Part III, Lecture notes in computer science
(pp. 83–97). Berlin: Springer.
Sun, D., Roth, S., & Black, M. J. (2010). Secrets of optical ﬂow esti
mation and their principles. In Proc. 2010 IEEE computer society
conference on computer vision and pattern recognition. San Fran
cisco: IEEE Computer Society Press.
Sundaram, N., Brox, T., & Keutzer, K. (2010). Dense point trajectories
by GPUaccelerated large displacement optical ﬂow. In Lecture
notes in computer science: Vol. 6311. Computer vision—ECCV
2010 (pp. 438–451). Berlin: Springer.
388 Int J Comput Vis (2011) 93: 368–388
Tretiak, O., & Pastor, L. (1984). Velocity estimation from image se
quences with second order differential operators. In Proc. sev
enth international conference on pattern recognition (pp. 16–19).
Montreal, Canada.
van de Weijer, J., & Gevers, T. (2004). Robust optical ﬂow from pho
tometric invariants. In Proc. 2004 IEEE international conference
on image processing (Vol. 3, pp. 1835–1838). Singapore: IEEE
Signal Processing Society.
Wedel, A., Pock, T., Zach, C., Bischof, H., & Cremers, D. (2008). An
improved algorithm for TVL
1
optical ﬂow computation. In D.
Cremers, B. Rosenhahn, A. L. Yuille, &F. R. Schmidt (Eds.), Lec
ture notes in computer science: Vol. 5604. Statistical and geomet
rical approaches to visual motion analysis (pp. 23–45). Berlin:
Springer.
Wedel, A., Cremers, D., Pock, T., & Bischof, H. (2009). Structure
and motionadaptive regularization for high accuracy optic ﬂow.
In Proc. 2009 IEEE international conference on computer vision.
Kyoto: IEEE Computer Society Press.
Weickert, J. (1996). Theoretical foundations of anisotropic diffusion in
image processing. Computing Supplement, 11, 221–236.
Weickert, J., & Schnörr, C. (2001a). A theoretical framework for con
vex regularizers in PDEbased computation of image motion. In
ternational Journal of Computer Vision, 45(3), 245–264.
Weickert, J., &Schnörr, C. (2001b). Variational optic ﬂowcomputation
with a spatiotemporal smoothness constraint. Journal of Mathe
matical Imaging and Vision, 14(3), 245–255.
Werlberger, M., Pock, T., & Bischof, H. (2010). Motion estimation
with nonlocal total variation regularization. In Proc. 2010 IEEE
computer society conference on computer vision and pattern
recognition. San Francisco: IEEE Computer Society Press.
Xiao, J., Cheng, H., Sawhney, H., Rao, C., & Isnardi, M. (2006). Bilat
eral ﬁlteringbased optical ﬂow estimation with occlusion detec
tion. In A. Leonardis, H. Bischof, & A. Pinz (Eds.), Lecture notes
in computer science: Vol. 3951. Computer vision—ECCV 2006,
Part I (pp. 211–224). Berlin: Springer.
Xu, L., Jia, J., & Matsushita, Y. (2010). Motion detail preserving opti
cal ﬂow estimation. In Proc. 2010 IEEE computer society confer
ence on computer vision and pattern recognition. San Francisco:
IEEE Computer Society Press.
Yaroslavsky, L. P. (1985). Digital picture processing: an introduction.
Berlin: Springer.
Yoon, K. J., & Kweon, I. S. (2006). Adaptive supportweight approach
for correspondence search. IEEE Transactions on Pattern Analy
sis and Machine Intelligence, 28(4), 650–656.
Zach, C., Pock, T., & Bischof, H. (2007). A duality based approach
for realtime TVL
1
optical ﬂow. In F. Hamprecht, C. Schnörr, &
B. Jähne (Eds.), Lecture notes in computer science: Vol. 4713.
Pattern recognition (pp. 214–223). Berlin: Springer.
Zimmer, H., Bruhn, A., Weickert, J., Valgaerts, L., Salgado, A., Rosen
hahn, B., & Seidel, H. P. (2009). Complementary optic ﬂow. In
D. Cremers, Y. Boykov, A. Blake, & F. R. Schmidt (Eds.), Lec
ture notes in computer science: Vol. 5681. Energy minimization
methods in computer vision and pattern recognition (EMMCVPR)
(pp. 207–220). Berlin: Springer.
Int J Comput Vis (2011) 93: 368–388
369
manner. Unfortunately, this promising concept of complementarity between data and smoothness term has not been further investigated. Our paper revives this concept for stateoftheart optic ﬂow models by presenting a novel complementary smoothness term in conjunction with an advanced data term. (ii) Having adjusted the smoothing behaviour to the imposed data constraints, it remains to determine the optimal balance between the two terms for the image sequence under consideration. This comes down to selecting an appropriate smoothness weight, which is usually considered a difﬁcult task. We propose a method that is easy to implement for all variational optic ﬂow approaches and nevertheless gives surprisingly good results. It bases on the assumption that the ﬂow estimate obtained by an optimal smoothness weight allows for the best possible prediction of the next frames in the image sequence. This novel concept we name optimal prediction principle (OPP). 1.1 Related Work In the ﬁrst ten years of research on optic ﬂow, several basic strategies have been considered, e.g. phasebased methods (Fleet and Jepson 1990), local methods (Lucas and Kanade 1981; Bigün et al. 1991) and energybased methods (Horn and Schunck 1981). In recent years, the latter class of methods became increasingly popular, mainly due to their potential for giving highly accurate results. Within energybased methods, one can distinguish discrete approaches that minimise a discrete energy function and are often probabilistically motivated, and variational approaches that minimise a continuous energy functional. Our variational approach naturally incorporates concepts that have proven their beneﬁts over the years. In the following, we brieﬂy review advances in the design of data and smoothness terms that are inﬂuential for our work. Data Terms To cope with outliers caused by noise or occlusions, Black and Anandan (1996) replaced the quadratic penalisation from Horn and Schunck (1981) by a robust subquadratic penaliser. To obtain robustness under additive illumination changes, Brox et al. (2004) combined the classical brightness constancy assumption of Horn and Schunck (1981) with the higherorder gradient constancy assumption (Tretiak and Pastor 1984; Schnörr 1994). Bruhn and Weickert (2005) improved this idea by a separate robust penalisation of brightness and gradient constancy assumption. This gives advantages if one of the two constraints produces an outlier. Recently, Xu et al. (2010) went a step further and estimate a binary map that locally selects between imposing either
brightness or gradient constancy. An alternative to higherorder constancy assumptions can be to preprocess the images by a structuretexture decomposition (Wedel et al. 2008). In addition to additive illumination changes, realistic scenarios also encompass a multiplicative part (van de Weijer and Gevers 2004). For colour image sequences, this issue can be tackled by normalising the colour channels (Golland and Bruckstein 1997), or by using alternative colour spaces with photometric invariances (Golland and Bruckstein 1997; van de Weijer and Gevers 2004; Mileva et al. 2007). If one is restricted to greyscale sequences, using logderivatives (Mileva et al. 2007) can be useful. A further successful modiﬁcation of the data term has been reported by performing a constraint normalisation (Simoncelli et al. 1991; Lai and Vemuri 1998; Schoenemann and Cremers 2006). It prevents an undesirable overweighting of the data term at large image gradient locations. Smoothness Terms First ideas go back to Horn and Schunck (1981) who used a homogeneous regulariser that does not respect any ﬂow discontinuities. Since different image objects may move in different directions, it is, however, desirable to also permit discontinuities. This can for example be achieved by using imagedriven regularisers that take into account image discontinuities. Alvarez et al. (1999) proposed an isotropic model with a scalarvalued weight function that reduces the regularisation at image edges. An anisotropic counterpart that also exploits the directional information of image discontinuities was introduced by Nagel and Enkelmann (1986). Their method regularises the ﬂow ﬁeld along image edges but not across them. As noted by Xiao et al. (2006), this comes down to convolving the ﬂow ﬁeld with an oriented Gaussian where the orientation is adapted to the image edges. Note that for a basic data term modelling the brightness constancy assumption, the image edge direction coincides with the complementary direction orthogonal to the data constraint direction. In such cases, the regularisers of Nagel and Enkelmann (1986) and Schnörr (1993) can thus be considered as early complementary smoothness terms. Of course, not every image edge will coincide with a ﬂow edge. Thus, imagedriven strategies are prone to give oversegmentation artefacts in textured image regions. To avoid this, ﬂowdriven regularisers have been proposed that respect discontinuities of the evolving ﬂow ﬁeld and are therefore not misled by image textures. In the isotropic setting this comes down to the use of robust, subquadratic penalisers which are closely related to line processes (Blake and Zisserman 1987). For energybased optic ﬂow methods, such a strategy was used by Shulman and Hervé (1989) and Schnörr (1994). An anisotropic extension was later presented by Weickert and Schnörr (2001a). The drawback of
This allows to adapt the smoothing direction to the direction of image structures and the smoothing strength to the ﬂow contrast. however.and ﬂowdriven behaviour. It further uses a HueSaturationValue (HSV) colour representation with a separate robustiﬁcation of each channel. the previous smoothness terms operate on ﬂow derivatives that only consider their direct neighbours. Our data term combines the brightness and the gradient constancy assumption. Along constraint edges. these strategies can be classiﬁed as imagedriven approaches and are thus also prone to oversegmentation problems. and performs a constraint normalisation. To judge the latter.and ﬂowdriven methods. nonlocal smoothing strategies (Yaroslavsky 1985) have been introduced to the optic ﬂow community by Sun et al. Computing the proposed error measure is. Ng and Solo (1997) hence restricted their focus to the basic method of Horn and Schunck (1981). However. Automatic Parameter Selection It is wellknown that an appropriate choice of the smoothness weight is crucial for obtaining favourable results. In these approaches. Due to the colour value distance. The latter smoothness term was later successfully used in the methods of Brox et al. computationally expensive. This method does not require a bruteforce search. 1. comparing nonlocal strategies to the previously discussed smoothness terms is somewhat difﬁcult: Whereas nonlocal methods explicitly model similarity in a certain neighbourhood. Across “constraint edges”. it makes sense to also assume temporal smoothness of the ﬂow ﬁelds. but the minimisation of the objective function is more complicated and only computationally feasible if certain approximations are performed. Concerning the individual problems of image. We further show that our regulariser can easily be extended to work in the spatiotemporal domain. Hence. Adapting ideas from Yoon and Kweon (2006). This goal was ﬁrst achieved in the discrete method of Sun et al. Our anisotropic complementary smoothness term takes into account directional information from the constraints imposed in the data term. Our method for determining the optimal smoothness weight bases on the proposed OPP concept. Using this measure. where the data term gives no information. it is assumed that the ﬂow vector at a certain pixel is similar to the vectors in a (possibly large) spatial neighbourhood. the authors developed an anisotropic regulariser that uses directional ﬂow derivatives steered by image structures. The latter is motivated by the fact that each channel in the HSV space has a distinct level of photometric invariance and information content. an imagedriven spatiotemporal smoothness terms was proposed by Nagel (1990) and a ﬂowdriven counterpart was later presented by Weickert and Schnörr (2001b). a strong ﬁllingin by using a quadratic penalisation makes sense. Then an anisotropic image. Nevertheless. the idea arises to combine the advantages of both worlds. i.and ﬂowdriven strategies. a parameter selection approach that can also handle robust data terms was presented by Krajsek and Mester (2007).370 Int J Comput Vis (2011) 93: 368–388 ﬂowdriven regularisers lies in less welllocalised ﬂow edges compared to imagedriven approaches. Assuming a constant speed and linear trajectory of objects within the considered frames. but also leads to a desirable image. Recently. Concerning an optimal selection of the smoothness weight for variational optic ﬂow approaches. There. (2008). especially for robust data terms. (2004) and Bruhn and Weickert (2005). This strategy not only allows for an optimal complementarity between data and smoothness term. there has been remarkably little research on methods that automatically estimate the optimal smoothness weight or other model parameters. a bruteforce search for the smoothness weight that gives the smallest error is performed. the latter strategies model a globally smooth ﬂow ﬁeld as each pixel communicates with each other pixel through its neighbours. (2010) and Werlberger et al. This method jointly estimates the ﬂow and the model parameters where the latter encompass the smoothness weight and also the relative weights of different data terms. It combines the beneﬁts of image. (2010). sharp ﬂow edges without oversegmentation problems. we perform a robust penalisation to reduce the smoothing in the direction where the data term gives the most information. The smoothness terms discussed so far only assume smoothness of the ﬂow ﬁeld in the spatial domain. yielding more than one ﬂow ﬁeld. Nevertheless. the similarity to the neighbours is weighted by a bilateral weight that depends on the spatial as well as on the colour value distance of the pixels. Finally we propose a simple method for automatically determining the optimal smoothness weight for the given image sequence. a separate robustiﬁcation allows to choose the most reliable channel at each position. As image sequences usually consist of more than two frames. we evaluate the data constraints between the ﬁrst and the third frame of the sequence. This leads to spatiotemporal smoothness terms. In a Bayesian framework.and ﬂowdriven smoothness term is designed that works complementary to the data term. For variational approaches. Ng and Solo (1997) proposed an error measure which can be estimated from the image sequence and the ﬂow estimate only. We call such a strategy image. We ﬁrst develop a robust and invariant data term.2 Our Contributions The OFH method is obtained in three steps. This results in ﬁnding the optimal smoothness weight as the one corresponding to the ﬂow ﬁeld with the best prediction quality.and ﬂowdriven regularisation. In a discrete setting they go back to Murray and Buxton (1987). this can be realised by simply doubling .e.
Organisation In Sect. ∇2 v) the smoothness term. It states that image intensities remain constant under their displacement. Section 3 describes the method proposed for determining the smoothness weight. (6) 2 = w J0 w. the corresponding data term is given by M1 (u. The latter is then extended to the spatiotemporal domain. a minimiser (u. 4. we can perform a ﬁrstorder Taylor expansion that yields the linearised optic ﬂow constraint (OFC) 0 = fx u + fy v + ft = ∇3 f w. This presmoothing step helps to reduce the inﬂuence of noise and additionally makes the image sequence inﬁnitely many times differentiable. we show experiments in Sect. After discussing implementation issues in Sect. Lai and Vemuri (1998). ∂t ) is the spatiotemporal gradient operator and subscripts denote partial derivatives. 1 := − f t ∇2 f .Int J Comput Vis (2011) 93: 368–388 371 the ﬂow vectors. the socalled normal ﬂow. (1) where ∇2 := (∂x . V (∇2 u. v) = ∇3 f w with the tensor J0 := ∇3 f ∇3 f.1 Data Term Let us now derive our data term in a systematic way. Assuming that the image sequence is smooth (which can be guaranteed by the Gaussian presmoothing) and that the displacements are small.1 ∇2 f  ∇2 f  . y) ∈ Ω denotes the location within a rectangular image domain Ω ⊂ R2 . The classical starting point is the brightness constancy assumption used by Horn and Schunck (1981). ∂y ) denotes the spatial gradient operator. It is thus not sufﬁcient to compute a unique solution. 2. Schoenemann and Cremers (2006) and using the abbreviation u := (u. which is known as the aperture problem (Bertero et al. 1988).3. v) denotes the data term. (ii) An explicit discussion on the adequate treatment of the hue channel of the HSV colour space. 2009) by the following points: (i) A more extensive derivation and discussion of the data term. (iii) A taxonomy of existing smoothness terms within a novel general framework. 2. 2 we present our variational optic ﬂow model with the robust data term and the complementary smoothness term. (v) A simple method for automatically selecting the smoothness weight. The term M(u. T ] will be presented in Sect. (1991). (vii) A more extensive experimental validation. It is found by minimising a global energy functional of the general form E(u. v) = Ω where ∇3 := (∂x . where Kσ is a spatial Gaussian of standard deviation σ and ∗ denotes the convolution operator. Due to its simplicity. Here. We further assume that f is presmoothed by a Gaussian convolution: Given an image sequence f0 (x). With a quadratic penalisation. (vi) A deeper discussion of implementation issues. The paper is ﬁnished with conclusions and an outlook on future work in Sect. and t ∈ [0. v) of the energy (1) necessarily has to fulﬁl the associated EulerLagrange equations ∂u M − α ∂x ∂ux V + ∂y ∂uy V ∂v M − α ∂x ∂vx V + ∂y ∂vy V = 0. we rewrite the data term M1 as M1 (u. T ] denotes time. our method is easy to implement for all variational optic ﬂow approaches. (4) 2 Variational Optic Flow Let f (x) be a greyscale image sequence where x := (x. According to the calculus of variations (Elsgolc 1962). f ∈ C ∞ . Normalisation Our experiments will show that normalising the data term can be beneﬁcial. =0 (2) (3) with homogeneous Neumann boundary conditions. For ∇2 f  = 0 it is deﬁned as wn := un . v) = ∇2 f u + ft = ∇2 f  2 2 ∇2 f u ft + ∇2 f  ∇2 f  . but nevertheless produces surprisingly good results.e. (5) The single equation given by the OFC involves two unknowns u and v. the vector (x. ∂y . 5. 1) describes the displacement vector ﬁeld between two frames at time t and t + 1. The optic ﬂow ﬁeld w := (u.e. The present article extends our shorter conference paper (Zimmer et al. t) . 6. i. Following Simoncelli et al. the OFC does allow to compute the ﬂow component orthogonal to image edges. y. (iv) The extension of our complementary regulariser to the spatiotemporal domain. The more general spatiotemporal case that uses all frames t ∈ [0. f (x + w) = f (x). v) + α V (∇2 u. Note that the energy (1) refers to the spatial case where one computes one ﬂow ﬁeld between two frames at time t and t + 1. and α > 0 is a smoothness weight. ∇2 v) dx dy. v. (7) M(u. Nevertheless. we obtain f (x) = (Kσ ∗ f0 )(x). i. The latter allows to reformulate most existing as well as our novel regulariser in a common notation. v) .
(13) . θ0 := ∇2 f 2 + ζ 2 (10) where the parameter γ > 0 steers the contribution of the gradient constancy assumption. 1. however. Our geometric interpretation suggests that one should ideally penalise the distance d in a data term M2 (u. see (8).e. i. To normalise M3 . it was proposed to impose the gradient constancy assumption (Tretiak and Pastor 1984. v) = w J w. v) = w J w. The data term M1 . weighs this distance by the squared spatial image gradient.g. A sketch of this situation is given in Fig. we normalise the data term M1 by multiplying it with a factor (Simoncelli et al. (14) The term d constitutes a projection of the difference between the estimated ﬂow u and the normal ﬂow un in the direction of the image gradient ∇2 f . fy fy (9) where the tensor J can be written in the motion tensor notation of Bruhn et al. we replace the motion tensor J by its normalised counterpart ¯ ¯ ¯ ¯ ¯ ¯ J := J0 + γ Jxy := J0 + γ Jx + Jy := θ0 J0 + γ θx Jx + θy Jy := θ0 ∇3 f ∇3 f + γ θ x ∇3 f x ∇3 f x + θ y ∇3 f y ∇3 f y . f 3 (x)). v) = ∇2 f 2 d 2 . As a remedy. with the normalised tensor ¯ J0 := θ0 J0 = θ0 ∇3 f ∇3 f . noise in ﬂat regions. the normalisation is not inﬂuenced for large gradients. A Taylor linearisation gives ∇3 fx w = 0 and ∇3 fy w = 0. v) = d 2 . 1 Geometric interpretation of the rewritten data term (8) = ∇2 f 2 = ∇2 f 2 ∇2 f f t ∇2 f u+ ∇2 f  ∇2 f 2 ∇2 f u − un ∇2 f  =:d 2 2 Gradient Constancy Assumption To render the data term robust under additive illumination changes. Combining both brightness and gradient constancy assumption in a quadratic way gives the data term M3 (u. e. (8) respectively. ∇2 fy 2 + ζ 2 (17) (16) The normalised data term M4 is given by ¯ M4 (u.372 Int J Comput Vis (2011) 93: 368–388 of ζ in the presence of noise. the term d describes the distance from u to the line l in the uvspace that is given by v=− fx ft u− . This overweighting is undesirable as large gradients can be caused by unreliable structures. Incorporating the normalisation into M1 leads to the data term ¯ M2 (u. ∇2 f (x + w) = ∇2 f (x). (2006) that allows to combine the two constancy assumptions in a joint tensor J := J0 + γ Jxy := J0 + γ Jx + Jy := ∇3 f ∇3 f + γ ∇3 fx ∇3 fx + ∇3 fy ∇3 fy . it states that image gradients remain constant under their displacement. it may pay off to choose a larger value Colour Image Sequences In a next step we extend our data term to multichannel sequences (f 1 (x). This results in a stronger enforcement of the data constraint at high gradient locations. 2004). f 2 (x). as M1 (u. 1991. (12) (11) Fig. v) = w J0 w. Brox et al. with two additional normalisation factors deﬁned as θx := 1 ∇2 fx 2 + ζ 2 and θy := 1 . In a geometric interpretation. (18) where the regularisation parameter ζ > 0 avoids division by zero. Schnörr 1994. such as noise or occlusions. the ﬂow u has to lie according to the OFC (4). Nevertheless. it reduces the inﬂuence of small gradients which are signiﬁcantly smaller than ζ 2 . Lai and Vemuri 1998) 1 . Additionally. (15) On this line. If . Thus. In contrast to the brightness constancy assumption. and the normal ﬂow un is the smallest vector that lies on this line.
respectively. which cannot be captured by the gradient constancy assumption that is only invariant under additive illumination changes. This problem can be tackled by using the Hue Saturation Value (HSV) colour space. The corresponding motion tensor for the bright1 ¯1 J0 := θ0 ∇3 cos f 1 ∇3 cos f 1 + ∇3 sin f 1 ∇3 sin f 1 . . on the other hand. from left to right: (a) Colour image with zoom in shadow region left of the wheel. (21) where the normalisation factor is deﬁned as 1 θ0 := 1 ∇2 2 cos f 1 + ∇2 sin f 1 + ζ 2 2 . f 1 ∈ [0◦ . 2. but appears in the value channel. from left to right: (c) Saturation channel. as can be observed for the striped cloth. where s denotes the quadratic data term.edu/ﬂow/data/). 360◦ ). http://vision. and the value channel exhibits no invariances. The hue channel is invariant under global and local multiplicative illumination changes. (19) with normalisation factors θ i for each colour channel f i . (2007) thus only used the hue channel for optic ﬂow computation as it exhibits the most invariances. available in the value channel. v) = M ¯ w Jc w . Mileva et al. the motion tensor (21) is equivalent to our earlier deﬁnition. We couple the three colour channels in the motion tensor ¯ J c := 3 i=1 3 ¯ J i := 3 i=1 ¯i ¯i J0 + γ Jxy := i=1 3 ¯i ¯i ¯i J0 + γ Jx + Jy := i=1 i θ 0 ∇3 f i ∇3 f i i i i i i i + γ θ x ∇3 f x ∇3 f x + θ y ∇3 f y ∇3 f y . One problem when using a HSV colour representation is that the hue channel f 1 describes an angle in a colour circle. Our solution to this problem is to consider the unit vector (cos f 1 . 2 HSV decomposition on the example of the Rubberwhale image from the Middlebury optic ﬂow database (Baker et al. especially the hue channel discards a lot of image information. The corresponding data term reads as ¯ M5 (u. as ∇3 cos f 1 ∇3 cos f 1 + ∇3 sin f 1 ∇3 sin f 1 = sin2 f 1 ∇3 f 1 ∇3 f 1 + cos2 f 1 ∇3 f 1 ∇3 f 1 = ∇3 f 1 ∇ 3 f 1 . visualised with full saturation and value. The hue channel is hence not differentiable at the interface between 0◦ and 360◦ . Black and Anandan (1996) proposed to refrain from a quadratic penalisation. As an example. green and blue channel. the shadow at the left of the wheel (shown in the zoom) is not present in the hue and the saturation channel. We will additionally use the saturation and value channel. (d) Value channel ness constancy assumption consequently reads as Photometric Invariant Colour Spaces Realistic illumination models encompass a multiplicative inﬂuence (van de Weijer and Gevers 2004).Int J Comput Vis (2011) 93: 368–388 373 one uses the standard RGB colour space. Nevertheless. Instead they use a subquadratic penalisation function 2 2 M (s ). Second row. As we can see. Using such a robust penaliser within our data term yields M6 (u. (2004) for the sub√ quadratic penaliser M (s 2 ) := s 2 + ε 2 using a small regularisation parameter ε > 0. (22) ¯1 The tensor Jxy for the gradient constancy assumption is adapted accordingly. as proposed by Golland and Bruckstein (1997). This results in treating the hue channel as two (coupled) channels. 2009. The saturation channel is only invariant under global multiplicative illumination changes. which are both differentiable. as well as under local additive changes. (b) Hue channel. (24) Good results are reported by Brox et al. (23) where ∇ denotes the gradient in the differentiable parts of the hue channel. First row. i.middlebury.e. Note that in the differentiable parts of the hue channel. consider the HSV decomposition of the Rubberwhale image shown in Fig. v) = w J c w. This information is. because they contain information that is not encoded in the hue channel. (20) Fig. sin f 1 ) corresponding to the angle f 1 . the three channels represent the red. Robust Penalisers To provide robustness of the data term against outliers caused by noise and occlusions.
see Fig. This will be conﬁrmed by a speciﬁc experiment in Sect. Anisotropic imagedriven regularisers take into account directional information from image structures. yielding a downweighting of this channel.2 using the directional derivatives usi := si ∇2 u. Final Data Term Incorporating our separate robustiﬁcation idea into M7 brings us to our ﬁnal data term 3 M(u.n denotes the entry in row m and column n of the tensor J. Analysing the terms (28) and (29). v) = i=1 M ¯i w J0 w + γ 3 M i=1 ¯i w Jxy w . 2. (25) where the separate motion tensors are deﬁned as ¯c J0 := 3 i=1 ¯i J0 ¯c and Jxy := 3 i=1 ¯i Jxy . with tr denoting the trace operator. Preliminaries for the General Framework We ﬁrst introduce concepts that will be used in our general framework.2 ¯i · J0 +γ 1. These information can be obtained by considering the structure tensor (Förstner and Gülch 1987) 2 We will go further by proposing a separate robustiﬁcation of each colour channel in the HSV space.1 3 ¯i v + J0 1.t. As s1 and s2 constitute an orthonormal basis.2 ¯i u + Jxy 2. Note that our ﬁnal data term is (i) normalised. This can be justiﬁed by the distinct information content of each of the three channels.e. v) = M ¯c w J0 w + γ M ¯c w Jxy w .3 (32) ¯i · Jxy 1.3 with an integration scale ρ > 0. the corresponding argument of the decreasing function M will be large. Incorporating this strategy into our approach gives the data term M7 (u. (29) i=1 ∇2 u2 = u2 + u2 = u21 + u22 . We rewrite the regularisers in a novel framework that uniﬁes their notation and eases their comparison. A corresponding rewriting can also be performed for ∇2 v2 . x y s s 2. fx ) denotes the vector orthogonal to ∇2 f . Most regularisers impose smoothness by penalising the magnitude of the ﬂow gradients. we sketch some existing smoothness terms that led to our novel complementary regulariser. positive semideﬁnite 2 × 2 matrix that possesses two orthonormal eigenvectors s1 and s2 with corresponding eigenvalues μ1 ≥ μ2 ≥ 0.3 ⊥ where ∇2 f := (−fy .2 3 ¯i v + J0 2. The separate robustiﬁcation then downweights the inﬂuence of less appropriate colour channels. 2. and M (s 2 ) denotes the derivative of M (s 2 ) w. In the case for ρ = 0. and (iii) uses the HSV colour space with (iv) a separate robustiﬁcation of all colour channels. without considering neighbourhood information. we see that the separate robustiﬁcation of the HSV channels makes sense: If a speciﬁc channel violates the imposed constancy assumption at a certain location. i. .2 1. which is advantageous if one assumption produces an outlier. its argument. (28) i=1 (31) ¯i · Jxy 3 1. Other channels that satisfy the constancy assumption then have a dominating inﬂuence on the solution. The contributions of our data term (27) to the EulerLagrange equations (2) and (3) are given by 3 Sρ := Kρ ∗ ∇2 f ∇2 f =: i=1 μi si si . For ρ = 0 we also have that ∇2 f 2 = tr S0 . The structure tensor is a symmetric.2 ¯i · J0 +γ 1.3 ∂v M = i=1 M ¯i w J0 w ¯i u + J0 M 2. we can write ¯i w Jxy w ¯i v + Jxy .2 Smoothness Term Following the extensive taxonomy on optic ﬂow regularisers by Weickert and Schnörr (2001a). that drives the optic ﬂow estimation in different ways. (30) ∂u M = i=1 M ¯i w J0 w ¯i u + J0 M 1. The vector s1 points across image structures. 5. (27) ¯ with the motion tensors J 1 adapted to the HSV colour space as described before.1 ¯i u + Jxy 1.374 Int J Comput Vis (2011) 93: 368–388 Bruhn and Weickert (2005) later proposed a separate penalisation of the brightness and the gradient constancy assumption.1. (ii) combines the brightness and gradient constancy assumption. (26) where [J]m. ∇2 f  ¯i w Jxy w ¯i v + Jxy .r. one obtains s0 = 1 ∇2 f ∇2 f  and s0 = 2 ⊥ ∇2 f . whereas the vector s2 points along them.
ImageDriven Regularisation To obtain sharp ﬂow edges. (33) (34) with a regularisation parameter κ > 0. we will consider the corresponding EulerLagrange equations that can be written in the form ∂u M − α div (D ∇2 u) = 0. 2 In the deﬁnition of the normal ﬂow (7) we have seen that a data term that models the brightness constancy assumption constraints the ﬂow only orthogonal to image edges. (42) . we observe that in the limiting case for κ → 0. Homogeneous Regularisation First ideas for the smoothness term go back to Horn and Schunck (1981) who used a homogeneous regulariser. strictly positive weight function. The corresponding diffusion tensor DII = g(tr S0 ) I shows that the weight function allows to decrease the smoothing in accordance to the strength of image edges. 1999) reduce the smoothing at image edges. More speciﬁc. ∇2 v) := ∇2 u P(∇2 f ) ∇2 u + ∇2 v P ∇2 f ∇2 v. The drawback of imagedriven strategies is that they are prone to oversegmentation artefacts in textured image regions where image edges do not necessarily correspond to ﬂow edges. In the limiting case. i. This complementarity has been emphasised by Schnörr (1993). An isotropic imagedriven regulariser goes back to Alvarez et al. where the smoothing is reduced at the boundaries of the evolving ﬂow ﬁeld via the decreasing diffusivity V . s s (41) The underlying diffusion processes perform nonlinear isotropic diffusion. and the corresponding eigenvalues determine the magnitude of smoothing. Shulman and Hervé (1989) and Schnörr (1994) proposed to use subquadratic penaliser functions for the smoothness term. the eigenvectors of D give the smoothing direction. (37) where P(∇2 f ) denotes a regularised projection matrix perpendicular to the image gradient.e. FlowDriven Regularisation To remedy the oversegmentation problem.e. where VAI → u20 + v 20 . this regulariser can be written as VAI (∇2 u. the regulariser of Nagel and Enkelmann can hence be interpreted as a ﬁrst complementary smoothness term that ﬁlls in information orthogonal to the data constraint direction. The smoothing processes thus perform homogeneous diffusion that blurs important ﬂow edges. tr S0 + 2κ 2 s2 2 (39) with a diffusion tensor D that steers the smoothing of the ﬂow components u and v. (38) ) := s 2 + ε2 . i. The anisotropic imagedriven regulariser of Nagel and Enkelmann (1986) prevents smoothing of the ﬂow ﬁeld across image boundaries but encourages smoothing along them. imagedriven methods (Nagel and Enkelmann 1986. ∇2 v) := = V V s2 s2 s2 s2 s1 s1 (35) The corresponding diffusion tensor is equal to the unit matrix DH = I. ∇2 v) := ∇2 u2 + ∇2 v2 2 2 = u21 + u22 + vs1 + vs2 . we obtain a smoothing solely in s0 direction. who contributed a theoretical analysis as well as modiﬁcations of the original Nagel and Enkelmann functional. If one uses the convex penaliser (Cohen 1993) V (s 2 + κ 2I . ∂v M − α div (D ∇2 v) = 0. This is achieved by the regulariser VAI (∇2 u. In the isotropic setting. (1999) who used VII (∇2 u. s s (40) where the penaliser function V (s 2 ) is preferably increasing. s s The correctness of above rewriting can easily be veriﬁed and is based on the observations that s0 and s0 are the eigenvec1 2 tors of P.Int J Comput Vis (2011) 93: 368–388 375 To analyse the smoothing behaviour of the regularisers. differentiable and convex in s. The diffusion tensor for the regulariser of Nagel and Enkelmann (1986) is identical to the projection matrix: DAI = P. along image edges. In our common framework. ∇2 v) := g ∇2 f 2 ∇2 u2 + ∇2 v2 2 2 = g tr S0 u21 + u22 + vs1 + vs2 . The associated diffusion tensor is given by DIF = V 2 2 u21 + u22 + vs1 + vs2 I. Alvarez et al. In our framework it reads as VH (∇2 u. s s (36) where g is a decreasing. and that the factors in front of (u20 + v 20 ) and (u20 + v 20 ) are the corresponding eigenvalues. It is deﬁned as P(∇2 f ) := 1 ∇2 f 2 + 2κ 2 ⊥ ⊥ ∇2 f ∇2 f ∇2 u2 + ∇2 v2 2 2 u21 + u22 + vs1 + vs2 . ∇2 v) = κ2 2 u20 + vs0 tr S0 + 2κ 2 s1 1 + tr S0 + κ 2 2 2 u 0 + v s0 . VIF (∇2 u. it makes sense to adapt the smoothing process to the ﬂow edges instead of the image edges. indicated by large values of ∇2 f 2 = tr S0 . Concerning its eigenvectors and eigenvalues.
(48) which can be regarded as a generalisation of the structure tensor (30). As a result.and ﬂowdriven smoothness term in a discrete setting. r u21 + s V V 2 vs1 V 2 vs2 . Flowdriven strategies remedy the oversegmentation problem but give less pleasant ﬂow edges. the regulariser from Sun et al. Perona and Malik 1990) given by V (s 2 We observe that these tensors allow to obtain the desired behaviour: The regularisation direction is adapted to the image structure directions s1 and s2 . It adapts the smoothing direction to image structures but steers the smoothing strength in accordance to the ﬂow contrast.1 Our Novel Complementary Regulariser In spite of its sophistication. yielding a regularisation tensor that also uses the spatiotemporal gradient ∇3 . (2008) given in (46) still suffers from a few shortcomings. Image. To this end we propose to analyse the eigenvectors r1 and r2 of the regularisation tensor 3 ) := λ2 log 1 + s2 . Our experiments in Sect.2. “constraint edges” instead of image edges. one obtains the same sharp ﬂow edges as imagedriven methods but does not suffer from oversegmentation problems. 5. ∇2 v) := V Rρ := i=1 i Kρ ∗ θ0 ∇2 f i ∇2 f i i i i i i i + γ θ x ∇2 f x ∇2 f x + θ y ∇2 f y ∇2 f y . A continuous version of this regulariser can be written as VAIF (∇2 u.3 we extend our regulariser to the spatiotemporal domain. (47) . we obtain two diffusion tensors. the anisotropic image. We introduce three amendments that we will discuss now. Further note that a Gaussian convolution of the motion tensor leads to a combined localglobal (CLG) data term in the spirit of Bruhn et al. + u22 + s (46) Here. (2005).1 will analyse in which cases such a modiﬁcation of our data term can be useful. v} read as p DAIF (49) where we use the eigenvectors ri of the regularisation tensor. 2. We propose to jointly penalise the directional derivatives. Regularisation Tensor A ﬁrst remark w. (2008) who presented an anisotropic image. ∇2 v) := V 2 u21 + vr1 + r V 2 u22 + vr2 .t.RI (∇2 u.and ﬂowdriven model from Sun et al. This aim was achieved by Sun et al. In contrast to Nagel and ⊥ Enkelmann (1986) who considered ∇2 f to obtain directional information of image structures. 1992) with the diffusivity V (s 2 1 1 .r. yielding VAIFRρ .r. Rotational Invariance The smoothness term VAIF from (46) lacks the desirable property of rotational invariance.and FlowDriven Regularisation We have seen that imagedriven methods suffer from oversegmentation artefacts. but give sharp ﬂow edges.t. whereas the magnitude of the regularisation depends on the ﬂow contrast encoded in ps1 and ps2 . It is thus desirable to combine the advantages of both strategies to obtain sharp ﬂow edges without oversegmentation problems. that for p ∈ {u. λ2 (44) that results in PeronaMalik diffusion with the diffusivity V (s 2 )= 1 1+ s2 λ2 .376 Int J Comput Vis (2011) 93: 368–388 one ends up with regularised total variation (TV) regularisation (Rudin et al. The latter is due to the spatial regularisation. (45) using a contrast parameter λ > 0. In Sect. (2008) analyses the eigenvectors si of the structure tensor Sρ from (30) to obtain a more robust direction estimation. the model from (46) is that the directional information from the structure tensor Sρ is not consistent with the imposed constraints of our data term (27). because the directional derivatives of u and v in the eigenvector directions are penalised separately. 2. Note that the regularisation tensor differs from ¯ the motion tensor Jc from (19) by the facts that it (i) integrates neighbourhood information via the Gaussian convolution. they suffer from another drawback: The ﬂow edges are not as well localised as with imagedriven strategies. It is more natural to take into account directional information provided by the motion tensor (19) and to steer the anisotropic regularisation process w. = 2 V ps 1 s1 s1 + 2 V ps 2 s2 s2 . )= √ ≈ 2 s 2 + ε 2 2 s (43) Another possible choice is the nonconvex PeronaMalik regulariser (Lorentzian) (Black and Anandan 1996. and (ii) uses the spatial gradient operator ∇2 instead of the spatiotemporal operator ∇3 . We will not discuss the anisotropic ﬂowdriven regulariser of Weickert and Schnörr (2001a) as it does not ﬁt in our framework and also has not been used in the design of our complementary regulariser. Despite the fact that ﬂowdriven methods reduce the oversegmentation problem caused by image textures.
leading to almost homogeneous diffusion. The next to last column names the tensor that is analysed to obtain directional information for anisotropic strategies.and ﬂowdriven regulariser (46). and it indicates if the regulariser is rotationally invariant. as the eigenvectors of the regularisation tensor ri are used instead of the eigenvectors of the structure tensor. and the last column indicates if the corresponding regulariser is rotationally invariant Strategy Regulariser V Directional adaptation Homogeneous (Horn and Schunck 1981) Isotropic imagedriven (Alvarez et al. in r1 direction. ∇2 v) := V 2 2 u21 + vr1 + u22 + vr2 . r Furthermore. This results in a small argument for V . (iii) We only reduce the smoothing across constraint edges. . Finally note that our complementary regulariser has the same structure. yielding several ﬂow ﬁelds. Table 1 summarises the discussed regularisers rewritten in our framework.and ﬂowdriven regularisation become visible. The above regulariser (49) performs a twofold robust penalisation in both eigenvector directions. our taxonomy shows their structural similarities. Along them. The corresponding joint diffusion tensor is given by DCR = V 2 u21 + vr1 r1 r1 + r2 r2 . resembling edgeenhancing anisotropic diffusion (Weickert 1996). for κ → 0 s2 s2 u21 + s V 2 vs1 + u22 + s V 2 vs2 Sρ Rρ — Single Robust Penalisation. we perform a pronounced smoothing in both directions that avoids oversegmentation artefacts. in r1 direction. 2. Incorporating the single robust penalisation ﬁnally yields our complementary regulariser VCR (∇2 u. when analysing our joint diffusion tensor. the ﬂow gradients are small. 2. We hence propose a single robust penalisation in r1 direction. The beneﬁts of this design will be conﬁrmed by our experiments in Sect. it makes sense to also assume a (51) with V given in (45). however. Here we can distinguish two scenarios: r At a ﬂow edge that corresponds to a constraint edge. we propose to use the PeronaMalik regulariser (44).3 Extension to a SpatioTemporal Smoothness Term The smoothness terms we have discussed so far model the assumption of a spatially smooth ﬂow ﬁeld. yielding a reduced diffusion which preserves this important edge. the ﬂow gradients will be large and almost parallel to r1 . which reveals the following innovations: (i) The smoothing direction is adapted to constraint edges instead of image edges. It also compares the way directional information is obtained for anisotropic strategies.Int J Comput Vis (2011) 93: 368–388 Table 1 Comparison of regularisation strategies. As image sequences in general encompass more than two frames.e. i. the data term mainly constraints the ﬂow in direction of the largest eigenvalue of the spatial motion tensor. The derivation of this diffusion tensor is presented in Appendix. always a strong diffusion with strength 1 is performed. 2008) Anisotropic complementary image. we compare our joint diffusion tensor (51) to its counterparts (47). Only the regularisation tensor Rρ has to be adapted to the new data term. However.2 Summary To conclude this section. the argument of the decreasing function V will be large. Hence. For the penaliser V . r r (50) that complements the proposed robust data term from (27) in an optimal fashion. Discussion To understand the advantages of the complementary regulariser compared to the anisotropic image. At “deceiving” texture edges in ﬂat ﬂow regions. The smoothing strength across constraint edges is determined by the expres2 sion V (u21 + vr1 ). we opt for a quadratic penalisation to obtain a strong ﬁllingin effect of missing information.2. In the orthogonal r2 direction. even if other data terms are used. the beneﬁts of the underlying anisotropic image. (ii) We achieve rotational invariance by coupling the two ﬂow components in the argument of V .e. 5. 1999) Anisotropic imagedriven (Nagel and Enkelmann 1986) Isotropic ﬂowdriven (Shulman and Hervé 1989) Anisotropic image and ﬂowdriven (Sun et al.and ﬂowdriven V 2 2 u21 + vr1 + u22 + vr2 r r V V 2 2 u21 + u22 + vs1 + vs2 s s 2 2 g (tr S0 ) u21 + u22 + vs1 + vs2 s s 2 2 u21 + u22 + vs1 + vs2 s s 377 Rotationally invariant – – S0 – V u20 + v 20 . Thus.2. Note that despite the fact these regularisers have been developed within almost three decades. i.
3 (wα ) := 1 Ω +γ i=1 i i i + θy fy (x + 2wα ) − fy (x) 2 3 M Ω 3 M i i i θx fx (x + 2wα ) − fx (x) 2 i=1 i θ0 f i (x + 2wα ) − f i (x) 2 (53) ¯ For ρ = 0 it is identical to the motion tensor Jc from (19). we deﬁne the spatiotemporal complementary regulariser (STCR) ST VCR (∇3 u. Note that if the latter would be the case. Compared to the spatial energy (1) we note the additional integration over the time domain and that the smoothness term now depends on the spatiotemporal ﬂow gradient. Under these assumptions. We then select the α that gives the smallest error. overﬁt to the ﬁrst two frames and consequently result in a bad prediction of further frames. The integrand of above expression is (apart from the doubled ﬂow ﬁeld) a variant of our ﬁnal data term (27) where no linearisation of the constancy assumptions have been performed. (56) The corresponding spatiotemporal diffusion tensor reads as DST = CR V 2 u21 + vr1 r1 r1 + r2 r2 + r3 r3 . This error measure bases on a novel concept. The Gaussian convolution with Kρ is now performed in the spatiotemporal domain.2 Determining the Best Parameter In general. we evaluate the imposed data constraints between the ﬁrst and the third frame of the image sequence. r r r (54) dx dy. say. To this end. ∇3 v) := V 2 2 2 u21 + vr1 + u22 + vr2 + u23 + vr3 . yielding in total 2 nα + 1 tests. This strategy results in testing more values of α that are close to α0 and. tests less very small or very large values of α that hardly give reasonable results.r. the optimal prediction principle (OPP). more important. the relation between α and the ADCE is not convex. v) = M(u. To compute this measure. ∇3 v) dx dy dt. our error measure needs to judge the quality of the prediction achieved with a given ﬂow ﬁeld. This value is then incremented/decremented nα times by multiplying/dividing it with a stepping factor a > 1. v) (52) Ω×[0. To reduce the number of α values to test.t. Following this strategy. we could simply select the smoothness weight that gives the ﬂow ﬁeld with the smallest deviation from the ground truth. This makes sense as a too small smoothness weight would lead to an where wα denotes the ﬂow ﬁeld obtained with a smoothness weight α. The spatiotemporal regularisation tensor is a 3 × 3 tensor that possesses three orthonormal eigenvectors r1 . For too large smoothness weights. .1 A Novel Concept We propose an error measure that allows to judge the quality of a ﬂow ﬁeld without knowing the ground truth. we simply double the ﬂow vectors to evaluate the data constraints between ﬁrst and third frame. which excludes the use of gradient descentlike approaches for ﬁnding the optimal value of α w. standard value α0 . To evaluate the images at the subpixel locations f i (x + 2wα ) we use Coons patches based on bicubic interpolation (Coons 1967).378 Int J Comput Vis (2011) 93: 368–388 temporal smoothness of the ﬂow ﬁelds. our error measure. To extend our complementary regulariser from (50) to the spatiotemporal domain. We propose a bruteforce method similar to the one of Ng and Solo (1997): We ﬁrst compute the error measures for a “sufﬁciently large” set of ﬂow ﬁelds obtained with different α values. 3. leading to spatiotemporal regularisation strategies. resulting in an average data constancy error (ADCE) measure. This is especially important in real world applications of optic ﬂow where no ground truth ﬂow is known. r (55) 3 Automatic Selection of the Smoothness Weight The last step missing for our OFH method is a strategy that automatically determines the optimal smoothness parameter α for the image sequence under consideration. A spatiotemporal (ST) version of the general energy functional (1) reads as E ST (u. 3. we can deﬁne the ADCE between frame 1 and 3 as ADCE1. we deﬁne the spatiotemporal regularisation tensor ST ¯ Rρ := Kρ ∗ Jc . r2 and r3 . which also holds for the presmoothing of the image sequence. we propose to start from a given. we assume that the motion of the scene objects is of more or less constant speed and that it describes linear trajectories within the considered three frames. The OPP states that the ﬂow ﬁeld obtained with an optimal smoothness weight allows for the best prediction of the next frames in the image sequence. the ﬂow ﬁelds would be too smooth and thus also lead to a bad prediction. Following the OPP.T ] + α V ST (∇3 u. With their help.
As the increments are small.middlebury. When comparing the results . Here. however. In Fig. we downsample the input images by a factor η ∈ [0. If the temporal sampling of the image sequence is too coarse. and (ii) using the RGB instead of the HSV colour space. This prevents aliasing problems. For the regularisation tensor. When computing the motion tensor. (2006) for the discretisation of the EulerLagrange equations. At each warping level. To recover the previous reduction of the data term at coarse levels. 5. Adapting the Smoothness Weight to the Warping Level The inﬂuence of the data term usually becomes smaller at coarser levels of our multiscale framework.1 Warping Strategy for Large Displacements The derivation of the optic ﬂow constraint (4) by means of a linearisation is only valid under the assumption of small displacements. ζ = 0. resulting in a temporal difference. −8. This is due to the smoothing properties of the downsampling that leads to smaller values of the image gradients at coarse levels. 1).95.2 Discretisation We follow Bruhn et al. ε = 0. occurring derivatives are averaged from the two frames at time t and t + 1 to obtain a lower approximation error. 8. http://vision. As all considered sequences exhibit relatively large displacements. For warping with subpixel precision we again use Coons patches based on bicubic interpolation (Coons 1967).1 Robust Data Term Beneﬁts of Normalisation and the HSV Colour Space We proposed two main innovations in the data term: constraint normalisation and using an HSV colour representation. In our ﬁrst experiment. we use the following constant parameters: σ = 0. For approximating temporal image derivatives we use a twopoint stencil (−1. this precondition will be violated and a linearised approach fails.1. At the next ﬁner level. 3 (d).0. The spatial ﬂow derivatives are discretised by second order approximations 1 with the stencil 2h (−1. To obtain a coarse representation of the problem.edu/ﬂow/eval/). The ﬂow ﬁelds are visualised by a colour code where hue encodes the ﬂow direction and brightness the magnitude. 1). η = 0. we ﬁnish our experiments by presenting the performance at the Middlebury optic ﬂow benchmark (Baker et al. it makes sense to adapt the value of τ to the given image sequence to allow for an appropriate scaling of the temporal direction compared to the spatial directions (Weickert and Schnörr 2001b). 2009. 4. 4. −1). we use the stencil (−1. (2004) proposed a coarsetoﬁne multiscale warping strategy.1. we propose to adapt the smoothness weight α to the warping level k. After a small experiment on the importance of antialiasing in the warping scheme.0).001. we apply a lowpass ﬁlter to the images by performing a Gaussian convolution with standard √ deviation 2/(4η).Int J Comput Vis (2011) 93: 368–388 379 4 Implementation The solution of the EulerLagrange equations for our method comes down to solving a nonlinear system of equations. we use the multiscale warping approach described in Sect. Such a behaviour is in fact desirable as the data term might not be reliable at coarse levels. Spatial image derivatives are approximated via central ﬁ1 nite differences using the stencil 12h (1. which is achieved by performing a motion compensation of the second frame by the current ﬂow. Our proposed data term normalisation leads. λ = 0. ρ = 4. the derivatives are solely computed at the ﬁrst frame as we only want to consider directional information from the reference image. known as warping. If not said otherwise. we split the ﬂow ﬁeld into an already computed solution from coarser levels and an unknown ﬂow increment. Brox et al. we thus compare our method against variants (i) without data term normalisation. Concerning the temporal ﬂow derivatives that occur in the spatiotemporal case. The images and the ﬂow ﬁelds are sampled on a rectangular pixel grid with grid size h and temporal step size τ . γ = 20. 0.5. 1)/τ . 4. 2006). 5 Experiments In our experiments we show the beneﬁts of the OFH approach. To overcome this problem. the already computed solution serves as initialisation. as a separate robustiﬁcation of the RGB channels makes no sense. they can computed by the presented linearised approach. This is achieved by setting α(k) = α/ηk which results in larger values of α and an emphasis of the smoothness term at coarse levels. Note that it is a rather challenging sequence due to severe shadows and large displacements up to 25 pixels. to image gradients that are approximately in the same range at each level. 3 we show the results for the Snail sequence that we have created. 0. resulting in a fourth order approximation. The ﬁrst experiments are concerned with our robust data term and the complementary smoothness term in the spatial and the spatiotemporal domain.5. For the latter we only separately robustify the brightness and the gradient constancy assumption. see Fig.1. 1.0. Prior to downsampling. Then. We solve the system by a nonlinear multigrid scheme based on a GaußSeidel type solver with alternating line relaxation (Bruhn et al. we turn to the automatic selection of the smoothness weight.
3 (f)). we use the average angular error (AAE) .0). (b) Zoom in marked region of ﬁrst frame. a phantom motion in the shadow region at the right border is estimated. (e) Flow ﬁeld in marked region. we compare the discussed variants to our proposed method on some Middlebury training sequences.0). 3 (i).0). (i) Same for our method (α = 2000. To further substantiate these observations. even when using a large smoothness weight α. (c) Same for second frame. The results of our method are shown together with the ground truth ﬂow ﬁelds in Fig.and ﬂowdriven regularisation (Sun et al. (h) Same for image.0). 3 Results for our Snail sequence with different variants of our method. unpleasant artefacts at image edges arise. 3 (e)). First row. To evaluate the quality of the ﬂow ﬁelds compared to the given ground truth. When relying on the RGB colour space (Fig. 2008) (α = 2000. from left to right: (a) First frame. from left to right: (g) Same for TV regularisation (α = 50.0) to our result in Fig. Second row. without normalisation (α = 5000. from left to right: (d) Colour code. Third row. the following drawbacks of the modiﬁed versions become obvious: Without data term normalisation (Fig. (f) Same for RGB colour space (α = 300. 4.380 Int J Comput Vis (2011) 93: 368–388 Fig.
AEE = 0.0.073).3. To ease comparison with other methods.0.77◦ .7. the results are mostly inferior.5. γ = 1. AEE = 0.01 =⇒ AAE = 1. σ = 0. First column: Reference frame. A deeper discussion on the assets and drawbacks of the RGB and the HSV colour space can be found in Sect.298) measure (Barron et al.0. ρ = 1. From top to bottom: Rubberwhale (α = 3500.38◦ .5. we also give the alternative average endpoint error (AEE) measure (Baker et al.2. AEE = 0. γ = 20. 4 Results for some Middlebury sequences with ground truth. 5.0. Third column: Result with our method. Effect of the Separate Robust Penalisation This experiment illustrates the desirable effect of our separate robust penalisation of the HSV channels. 1994).5. always worse compared to RGB or HSV results. ρ = 1.39◦ . Compared to our proposed method.0. λ = 0.5.0. 5 . ρ = 1. Grove3 (α = 20. we show in Fig. apart from the Rubberwhale sequence. AEE = 0. γ = 25. σ = 0. Using the Rubberwhale sequence from the Middlebury database. however.Int J Comput Vis (2011) 93: 368–388 381 Fig. Table 2 also lists results obtained with greyscale images. γ = 0. λ = 0.076). σ = 0. The resulting AAE measures are listed in Table 2. Second column: Ground truth (white pixels mark locations where no ground truth is given). Dimetrodon (α = 8000. These are.0.87◦ . When using the RGB colour space. λ = 0. λ = 0. σ = 0. the results without normalisation are always signiﬁcantly worse.0.487). and Urban3 (α = 75.1 =⇒ AAE = 2.1 =⇒ AAE = 4. 2009). ρ = 1. Addi tionally.01 =⇒ AAE = 2.0.5.
We ﬁnd that the CLG variant may only lead to small improvements. we use a rotationally invariant formulation that can be obtained from (49) by replacing the eigenvectors ri of the regularisation tensor by the eigenvectors si of the structure tensor.44◦ 1. Considering the corresponding error measures in Table 2. . On the other hand. we see that the alternative regularisation strategies mostly lead to higher errors.53◦ 2. 4 in general closely resemble the ground truth. In Fig. Our ﬂow ﬁelds in Fig.09◦ 5.72◦ Proposed Grove3 Urban3 Int J Comput Vis (2011) 93: 368–388 Table 3 Comparison of our method to a CLG variant on noisy versions of the Yosemite sequence (using the AAE).54◦ 2. 3).27◦ 4. from left to right: (a) Zoom in ﬁrst frame of the Rubberwhale sequence. Here.64◦ 10 4. .86◦ 2. the saturation and the value channel (i = 1. (1994) penaliser function as this gives better results here Sequence Rubberwhale Proposed No normalisation RGB colour space Greyscale images TV regulariser Sun et al.35◦ 5.75◦ 1. First.97◦ 3. we see that TV regularisation (Fig. (2010). As recently shown by Xu et al.87◦ 5. brighter pixels correspond to a larger weight and we only show a zoom for better visibility.14◦ 5. we compare in Table 3 the performance of our method to its CLG counterpart on noisy versions of the Yosemite sequence. see Fig. First row.72◦ 5. Only for the Dimetrodon sequence.82◦ 20 5. we also applied our method with different regularisers to some Middlebury sequences. see Table 2. (2008). unpleasant staircasing artefacts deteriorate the result. leads to blurred and badly localised ﬂow edges. which hardly suffer from noise.27◦ 3.42◦ 4.14◦ 8. minor problems arising from incorporating image information in the smoothness term are visible at a few locations. As for the data term experiments. 3 (h)). As it turns out. In the Sun et al. but deteriorates the quality for low noise scenarios.61◦ 2. at the leafs in the Grove 3 result. (d) Same for the value channel ¯i the data term weights M (w J0 w) for the brightness constancy assumption on the hue. (2008) (Fig. 2. from left to right: (c) Same for the saturation channel. 3 (g)).55◦ dlebury test sequences. the CLG variant improves the results at large noise scales.88◦ 2.01◦ 3. This also explains the experienced behaviour on the Middlebury data sets.21◦ 3.34◦ 2. the weight of the value channel is reduced in the shadow regions (left of the wheel.82◦ 6. Furthermore. regulariser. Second row. A CLG Variant of Our Method Our next experiment is concerned with a CLG variant of our data term where we.59◦ 1.40◦ 1. (b) Visualisation of the corresponding hue channel weight. . perform a Gaussian convolution of the motion tensor entries. this modiﬁcation seems not too useful.g. a CLG variant of our method can actually be useful in the presence of severe noise in the image sequence. of the orange toy and of the clam). as for the regularisation tensor. Fig.89◦ 2. we compare our method against two results obtained when using another regulariser in conjunction with our robust data term: (i) Using the popular TV regulariser. the latter problem could be resolved by using a more sophisticated warping strategy.77◦ 4.38◦ 2.382 Table 2 Comparing different variants of our method (using the AAE).08◦ 40 8. 5 Effect of our separate robust penalisation of the HSV channels.39◦ 1. 3 (i). see (40) and (42). 3. Brighter pixels correspond to a larger weight. However.2 Complementary Smoothness Term Comparison with Other Regularisers.44◦ Dimetrodon CLG 1. This is desirable as the value channel is not invariant under shadows. (ii) Using the anisotropic image and ﬂowdriven regulariser of Sun et al. e. we compare our method against a CLG variant for some Middlebury sequences. some tiny ﬂow details are smoothed out. However. We have added Gaussian noise with zero mean and standard deviation σn σn 0 1. To prove this. Here. at the wheel in the Rubberwhale result and at the bottom of the Urban 3 result. Comparing the obtained results to our result in Fig. we use the Charbonnier et al. . As we can see. which built the basis of our complementary regulariser. regulariser CLG 2.58◦ 2. Using the regulariser of Sun et al. 5.g. the TV regulariser is en par with our result. We thus conclude that for the considered Mid . e. it may also deteriorate the results.39◦ 1.45◦ 1.
43◦ 2. weight α AAE 2 (16–17) 75.56◦ 5.65◦ 4 (9–12) 1000. computed from α0 = 400.62◦ 2.0 1. In our experiments.71◦ 4. τ = 1. First row.22◦ 3.96◦ 2.0.0 1. our result further substantiate the validity of the proposed OPP. (b) Same for the proposed data constancy error (ADCE1.Int J Comput Vis (2011) 93: 368–388 383 Table 5 Smoothness weight α and AAE measures for our spatiotemporal method on the Yosemite sequence from Middlebury.0 2. All results used the ﬁxed parameters σ = 0.0 1.de/image_sequences/) and the Yosemite sequence from the Middlebury datasets.50◦ 3.0 2.0 1.0). spatiotemporal methods do not improve the results. (ii) estimating α and (iii) with the optimal value of α Sequence Rubberwhale Grove2 Grove3 Urban2 Urban3 Hydrangea Fixed α 3. better results could be obtained when disabling the temporal presmoothing Number of frames (from – to) Smoothn. from left to right: (a) Reference frame (frame 16). Second row. However. 7 (a).43◦ 5.35◦ 1.3 Automatic Selection of the Smoothness Weight Performance of our Proposed Error Measure We ﬁrst show that our proposed data constancy error between frame 1 and 3 (ADCE1. When using more than two frames. τ = 1.5.59◦ 5.50◦ 2.37◦ 1.63◦ 6 (14–19) 50.0 1. As our error measure reﬂects the quality of the prediction with the given ﬂow ﬁeld.43◦ 5. To this end.5.5. γ = 20.86◦ 8 (13–20) 50. 6 Results for the Marble sequence with our spatiotemporal method. All results used the ﬁxed parameters σ = 1.94◦ Optic Flow in the SpatioTemporal Domain Let us now turn to the spatiotemporal extension of our complementary smoothness term. (d) Same for 6 frames (14–19) Table 4 Smoothness weight α and AAE measures for our spatiotemporal method on the Marble sequence.85◦ 4 (15–18) 50. we compare the two error measures for the Grove2 sequence in Fig.66◦ 3. From left to right: (a) Angular error (AAE) for 51 values of α. see Fig. a spatiotemporal method would in general be applicable. As most Middlebury sequences consist of 8 frames. Note that when using more than two frames. from left to right: (c) Result using 2 frames (16–17). Here. weight α AAE 2 (10–11) 2000.01◦ Fig.16◦ 6 (8–13) 1000. 7 (b)) indeed exhibits a shape very close to the angular error shown in Fig.ira.0.0 4. It becomes obvious that our proposed error measure (Fig.44◦ 1.00◦ 2. 7 Automatic selection of the smoothness weight α at the Grove2 sequence. the convolutions with Kσ and Kρ are performed in the spatiotemporal domain Number of frames (from – to) Smoothn.89◦ 5. γ = 0.73◦ Estimated α 3. η = 0.3 ) is a very good approximation of the popular angular error (AAE) measure. Consequently.3 ) Table 6 Results (AAE) for some Middlebury sequences when (i) ﬁxing the smoothness weight (α = 400. These sequences exhibit relatively small displacements and our spatiotemporal method allows to obtain notably better results. ρ = 1.0.0.5.5.0 and a stepping factor a = 1. (b) Ground truth (white pixels mark locations where no ground truth is available). resulting in a violation of the assumption of a temporally smooth ﬂow ﬁeld. a smaller smoothness weight α has to be chosen and that a too large temporal window may also deteriorate the results again. 6 and Tables 4 and 5.05◦ Optimal α 3. η = 0.5.uka.94◦ 1. ρ = 1. 7.00◦ 2.86◦ 1.1.05◦ 8 (7–14) 1000. we use the Marble sequence (available at http://i21www. see Fig. Yosemite Marble 5. the displacements between two subsequent frames are often rather large there. 6.84◦ 3. .04◦ Fig.
ρ = 4. However. 5. the resulting artefacts will not signiﬁcantly deteriorate the ﬂow estimation. Top row. from left to right: (c) Our result without antialiasing. and (iii) selecting the (w.r. As the Middlebury sequences hardly suffer from difﬁcult illumination conditions. For the Teddy sequence with only two frames.5. using a HSV colour representation is not as beneﬁcial as seen in our experiment from Fig. but replaced the third frame by a copy of the second one. corresponding to approximately 36 s per ﬂow ﬁeld. we submitted our results to the popular Middlebury benchmark (http://vision. This violates the assumption of a constant speed and leads to a severe impairment of results (up to 30%). 2010.0. see Fig. This results from the problem that greyscales do not Fig. we cannot proﬁt from the photometric invariances of the HSV colour space. which can be attributed to the robust data term. In Table 6. however. Nevertheless. Following the trend of parallel implementations on modern GPUs (Zach et al. the AAE and the AEE measure among 39 listed methods. 3. 9. (ii) using our automatic parameter selection method. . 8 Importance of antialiasing on the example of the Urban sequence. It turns out that estimating α with our proposed method allows to improve the results compared to a ﬁxed value of α in almost all cases. we applied our approach again to the same sequences. γ = 20. Note that for this sequence.edu/ﬂow/eval/). the ﬁxed value of α accidentally coincides with the optimal value.t. The resulting running time for the Urban sequence (640 × 480 pixels) was 620 s on a standard PC (3. our results are on average 3% and at most 10% worse than the optimal result. no ground truth is publicly available have a unique representation in the hue as well as the saturation channel. which demonstrates the beneﬁts of the proposed novelties in this paper. a = 1. we show that our automatic parameter selection works well for a large variety of different test sequences. 5. λ = 0. consider the results for the Teddy sequence in the ﬁrst row of Fig. Here we see that the small white triangle beneath the chimney causes unpleasant artefacts in the ﬂow ﬁeld. some sequences even pose problems in their HSV representation. The smoothness weight α was automatically determined by our proposed method with the settings nα = 8. Bottom row. 2010). As it turns out.t. we use this variant of our method for evaluation at the Middlebury benchmark. We explain this by the high frequent stripe pattern on the facade of the building. 2007.r.0). ζ = 0.0.0. At the time of submission (August 2010). Just for the Grove 3 sequence.4 Importance of AntiAliasing in the Warping Scheme We proposed to presmooth the images prior to downsampling in order to avoid aliasing problems. 9) a HSV colour representation removes artefacts in the shadows left of the toys. that for the whole set of benchmark sequences. Compared to the results achieved with an optimal value of α. we achieve the 4th place w. we summarise the AAE obtained when (i) setting α to a ﬁxed value (α = α0 = 400. the large displacement of the building in the lower left corner can only be estimated when using antialiasing.1. In most cases. As an example. we obtain slightly better results when using the RGB colour space. α0 = 400.95. (d) Same with antialiasing. there are also sequences where a HSV colour representation is beneﬁcial. For all considered test sequences these assumptions were obviously valid. Werlberger et al. the AAE) optimal value of α under the tested proposals. We wish to note that a favourable performance of the proposed parameter selection of course depends on the validity of our assumptions (constant speed and linear trajectories of objects). 2010) we recently came up with a GPU version of our method that can compute ﬂow ﬁelds of size 640 × 480 pixels in less than one second (Gwosdek et al. Sundaram et al. Note that our previous Complementary Optic Flow method (Zimmer et al. The bottom line is. as our parameter estimation method is not applicable in this case. η = 0. like the automatic parameter selection and the antialiasing. Grossauer and Thoman 2008. we set α = α0 . (b) Frame 11. antialiasing is crucial for obtaining reasonable results. We found that for the provided benchmark sequences. 8. Thus. 2009) only ranks 6th for the AAE and 9th for the AEE.5 Comparison to StateoftheArt Methods To compare our method to the stateoftheart in optic ﬂow estimation. For our submission we used in accordance to the guidelines a ﬁxed set of parameters: σ = 0. On the other hand. for the Urban sequence from the ofﬁcial Middlebury benchmark.2.384 Int J Comput Vis (2011) 93: 368–388 Beneﬁts of an Automatic Parameter Selection Next. In order to show the limitations of our method.2 GHz Intel Pentium 4). For the Mequon sequence (second row of Fig. For the parameter selection we computed 2 · 8 + 1 = 17 ﬂow ﬁelds.1.middlebury. from left to right: (a) Frame 10.
from left to right: (a) Frame 10 of the Teddy sequence.t.r.Int J Comput Vis (2011) 93: 368–388 385 Fig.8 277. The smoothing in data constraint direction is reduced to avoid interference with the data term. for sequences with small details in the ﬂow ﬁeld (Army. (h) Same for the RGB colour space (AAE = 2. The beneﬁts of the OFH idea are demonstrated by our extensive experimental validation and the competitive performance at the Middlebury optic ﬂow benchmark. our smoothness term uniﬁes the beneﬁts of imageand ﬂowdriven regularisers. To this end. our method can easily be used in all variational optic ﬂow approaches and additionally gives surprisingly good results. Furthermore.8 277. (d) Same for the RGB colour space (AAE = 2. resulting in sharp ﬂow edges without oversegmentation artefacts. (g) Result when using the HSV colour space (AAE = 2. Mequon) a small smoothness weight is chosen.3 1433. Due to its simplicity. (b) Frame 11. Under some assumptions. the quality of the prediction can be judged by evaluating the data constraints between ﬁrst and third frame of the sequence and using the doubled ﬂow vectors. Our paper thus shows that a careful design of data and smoothness term together with an automatic choice of the smoothness . we came up with a novel paradigm. As desired. (c) Result when using the HSV colour space (AAE = 3. Second row. (f) Frame 11. photometric invariant colour representation. using our automatic parameter selection method Sequence Army Grove Mequon Schefﬂera Urban Wooden Yosemite Smoothness weight α 277. This yields an optimal complementary between both terms. It states the ﬂow ﬁeld obtained with an optimal smoothness weight allows for the best prediction of the next frames in the image sequence.8 691. On the other hand. The anisotropic complementary smoothness term incorporates directional information from the motion tensor occurring in the data term.84◦ ) Table 7 Estimated values of the smoothness weight α.94◦ ). (ii) We presented a simple. from left to right: (e) Frame 10 of the Mequon sequence.3 In Table 7 we additionally summarise the estimated values of α resulting from our automatic parameter selection method. higher order constancy assumptions and robust penalisation. This was achieved by two main ideas: (i) We developed a smoothness term that achieves an optimal complementary smoothing behaviour w.2 480.0 995. sequences like Wooden and Yosemite with a rather smooth ﬂow yield signiﬁcantly larger values for the smoothness weight. The proposed parameter selection method bases on the OPP that we introduced in this paper.64◦ ). yet well performing method for determining the optimal smoothness weight for the given im age sequence. the imposed data constraints. the smoothness term and the smoothness weight. 9 Comparison of results obtained with HSV or RGB colour representation. the optimal prediction principle (OPP). while a strong smoothing in the orthogonal direction allows to ﬁllin missing information. Grove. 6 Conclusions and Outlook In this paper we have shown how to harmonise the three main constituents of variational optic ﬂow approaches: the data term.28◦ ). First row. Our optic ﬂow in harmony (OFH) method bases on an advanced data term that combines and extended successful concepts like normalisation.
Using the fact that ∂x ∂ux V + ∂y ∂uy V = div(∂ux V . Fleet. M.. (2009). The second one then follows analogously.. L. Barron. J. S. r2 (68) (57) Its contributions to the EulerLagrange equations are given by ∂x ∂ux V + ∂y ∂uy V . 2. 43–77. ∂uy V ) . Roth. XVI congreso de ecuaciones diferenciales y aplicaciones (pp. Black.. r (69) (60) References Alvarez. J. for i = 1.g. 5. e. 5. Our current research is on the one hand concerned with investigating possibilities to small issues with the proposed regularisation strategy. (1994).g. J. Lewis. Performance of optical ﬂow techniques. International Journal of Computer Vision. Redmond. S. 12(1). Appendix: Derivation of the Diffusion Tensor for the Complementary Regulariser Consider the complementary regulariser from (50): V (∇2 u. S. & Szeliski. Lefébure.. or the integration of an epipolar geometry prior (Wedel et al. MSRTR2009179. Tech. J. segmentation (Lei and Yang 2009).. The corresponding eigenvectors are r1 and r2 . The latter would also remedy the problems with nonconstant speed of objects that we have mentioned in Sect. Scharstein. L. 2009). Esclarín. D.. WA. . P. We thus hope that our work will give rise to more “harmonised” approaches in other ﬁelds where energybased methods are used. r r (67) We multiplied the second term of each sum by a factor of 1 to clarify that the eigenvalues of D are ψV (r1 ) and 1. and ∂x ∂vx V + ∂y ∂vy V . A PDE model for computing the optical ﬂow. M.3. D. S. Let us ﬁrst deﬁne the abbreviations ri := (ri1 .2.. Microsoft Research. (63) (61) (62) . with the diffusion tensor D := 2 2 ψV (r1 )· r11 + 1· r21 (66) ψV (r1 )· r11 r12 + 1· r21 r22 2 2 ψV (r1 )· r12 + 1· r22 ψV (r1 )· r11 r12 + 1· r21 r22 . A database and evaluation methodology for optical ﬂow. (59) (58) = (r1  r2 ) This shows that D is identical to DCR from (51). Furthermore. & Sánchez. Rep. we are interested in extensions of our novel parameter selection approach. we focus on higher efﬁciency and ways to apply our method in cases where only two images are available. In Proc.. 1349–1356). respectively. ri2 ) and ψV (ri ) := V 2 u2i + vri . Las Palmas de Gran Canaria. Baker.. which allows to rewrite the tensor D as D= r11 r21 r12 r22 ψV (r1 ) · r11 ψV (r1 ) · r12 1 · r21 1 · r22 ψV (r1 ) 0 0 1 r1 . ψV (r1 ) ur1 r12 + ur2 r22 (64) By multiplying out the expressions inside the divergence expressions one ends up with ∂x ∂ux V + ∂y ∂uy V = 2div 2 2 (ψV (r1 )r11 + r21 )ux + (ψV (r1 )r11 r12 + r21 r22 )uy 2 2 (ψV (r1 )r11 r12 + r21 r22 )ux + (ψV (r1 )r12 + r22 )uy . Here. we obtain by plugging (61) and (62) into (58): ∂x ∂ux V + ∂y ∂uy V = 2div ψV (r1 ) ur1 r11 + ur2 r21 . ∂uy V = 2 ψV (r1 ) ur1 r12 + ur2 r22 from (58). & Beauchemin. respectively. ∇2 v) = V 2 2 u21 + vr1 + u22 + vr2 . as it can be written as D = ψV (r1 ) r1 r1 + r2 r2 = V 2 u21 + vr1 r1 r1 + r2 r2 . see our remarks in Sect.. Acknowledgements The authors gratefully acknowledge partial funding by the International MaxPlanck Research School (IMPRS) and the Cluster of Excellence “Multimodal Computing and Interaction” within the Excellence Initiative of the German Federal Government. image registration.386 Int J Comput Vis (2011) 93: 368–388 weight allows to outperform other wellengineered methods that incorporate many more processing steps. . . We exemplify the computation of the ﬁrst expression (58). R. Spain. e. With their help. r respectively. J. (1999). . (65) We can write above equation in diffusion tensor notation as ∂x ∂ux V + ∂y ∂uy V = 2div(D ∇2 u). J. we compute the expressions ∂ux V = 2 ψV (r1 ) ur1 r11 + ur2 r21 .
& Thoman. 141– 150). 2010 IEEE computer society conference on computer vision and pattern recognition. (1962). In Proc. & J. Switzerland. Rep. Ng. In T. (2005). Osher. Dense point trajectories by GPUaccelerated large displacement optical ﬂow. 281–305). 61(3). Multidimensional orientation estimation with applications to texture analysis and optical ﬂow. & Barlaud. pp. Sun. Bruhn. G. Part III. Lecture notes in computer science (Vol. (1998). corners and centres of circular features. & Fatemi. N. C. Heraklion. In Proc. Charbonnier. Cambridge. Hamprecht. H. Computer vision—ECCV 2010 (pp... Grewenig. Computer vision—ECCV ’90 (pp. Jähne (Eds. In Proc. In Proc. P. M. Kohlberger. Krajsek. In Proc.. J. Computer vision—ECCV 2004. ISPRS intercommission conference on fast processing of photogrammetric data (pp.. P. Austin: IEEE Computer Society Press. A. (2007). B. Golland. & Black. Lucas/Kanade meets Horn/Schunck: combining local and global optic ﬂow methods. Computer vision—ECCV 2008... An iterative image registration technique with an application to stereo vision. In Proc. 629–639. Schnörr. & B. A. (2010). 5008. Probability distributions of optical ﬂow. V. (1987). S. (1993). Interlaken. eighth Scandinavian conference on image analysis (Vol. Artiﬁcial Intelligence. Schnörr.. J. S. .. Forsyth. 13(8). Lei. 12. E. 81–86). J. 257–277. Proceedings of the IEEE. Two deterministic halfquadratic regularization algorithms for computed imaging. In Proc. H. Förstner. IEEE Transactions on Pattern Analysis and Machine Intelligence. & Hervé.. Schäfer (Eds. (1997). D. In Proc. Vincze. Lewis. W. (1981). H. N. & Heeger. & Torre.. I. T. 185–203. Kyoto: IEEE Computer Society Press. Bruhn. Lecture notes in computer science: Vol. & Buxton. J. A. P. B. Schoenemann..). J. Simoncelli. Gwosdek. Computer Vision and Image Understanding. M. D. I.Int J Comput Vis (2011) 93: 368–388 Bertero. (1988). Gasteratos. 220–228.. Y. Beijing: IEEE Computer Society Press. Pajdla & J. D. (2006). & Keutzer. pp. Perona. & R. J. 1991 IEEE computer society conference on computer vision and pattern recognition (pp. Adelson.. L. & Gülch. Illuminationrobust variational optical ﬂow with photometric invariants. R. A datadriven method for choosing smoothing parameters in optical ﬂow problems. Bruhn. 2. 661–663). Bigün. & Schnörr. An investigation of smoothness constraints for the estimation of displacement vector ﬁelds from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence. Nagel. (1987). A. K. D. Pattern recognition. Lecture notes in computer science (pp. In Proc. 6311. Tromsø. M. S. Surfaces for computer aided design of space forms (Tech. 211–231.). J. (1991). 1074–1079. G. workshop on visual motion (pp. H. J. 9(2). Zimmer. Brox. K. 68(3). M.. Y. IEEE Transactions on Pattern Analysis and Machine Intelligence. In F. Pattern recognition. & Mester. H.. In A. D. International Journal of Computer Vision. B. Berlin: Springer. 76(8). & Anandan. Berlin: Springer. Lecture notes in computer science (pp. Torr... W. TA. B. IEEE international conference on image processing (Vol.. H.. J. 438–451). 168–172). H. Sun. Regularization of discontinuous ﬂow ﬁelds. Calculus of variations.). (2009). & Weickert. In D.. Extending the ‘oriented smoothness constraint’ into the temporal domain and the estimation of derivatives of optical ﬂow. (1989). & Kanade.. A. J. Roth. Scene segmentation from visual motion using global optimization. (1996). (1993). J. BlancFéraud. Poggio. & A. 1997 IEEE international conference on image processing (Vol. Weickert. 310– 315). 674–679). J. Pattern recognition. A. Murray.. (2008). Brox. In Proc. Jerusalem: IEEE Computer Society Press. In K. High accuracy optical ﬂow estimation based on a theory for warping. (1992). In O. M. L. S. twelfth international conference on pattern recognition (Vol. C. A. Maui: IEEE Computer Society Press. Motion from color. Sundaram.). 77–104. & Schnörr. S.. (2004). A. T. 60. Computer vision systems (pp. & Yang. P. S. 346–362. Horn. (2005). A. (2008). 259–268. Computation of component image velocity from local phase information. Norway. & Enkelmann. B. Visual reconstruction. & Wiklund. tenth international conference on computer vision (Vol. Berlin: Springer. J. P. & Cremers. T.. Papenberg... Greece. (1997). International Journal of Computer Vision. Secrets of optical ﬂow estimation and their principles. In Proc. A. MIT/LCS/TR41). Black. (1967). P. IEEE Transactions on Pattern Analysis and Machine Intelligence. W. Schnörr. Computer Vision and Image Understanding. (1981). A.. P. Reliable and efﬁcient computation of optical ﬂow. Berlin: Springer. B. Nickolay. pp. Jähne (Eds. R... Berlin: Springer. D. (1991). The robust estimation of multiple motions: parametric and piecewise smooth ﬂow ﬁelds. 142–151). & Vemuri. Near realtime motion segmentation using graph cuts. 387 Lai.). D. A. 1. Aubert. Matas (Eds. Determining optical ﬂow. 1. Zisserman (Eds. C. San Francisco: IEEE Computer Society Press. London: Pergamon. 139–148). E. 17. T. Mileva. K. & Bruckstein. Weickert. Berlin: Springer. E. Blake. Tsotsos (Eds. Bruhn. Bayesian model selection for optical ﬂow estimation. 565–593... International Journal of Computer Vision. A multigrid platform for realtime motion computation with discontinuitypreserving variational methods. Lecture notes in computer science (pp. Grossauer. Scale space and edge detection using anisotropic diffusion. J.. D.. & Black. Berlin: Springer. Fleet. H.. Schnörr.. C. Optical ﬂow estimation on coarsetoﬁne regiontrees using discrete optimization.. A highly efﬁcient GPU implementation for variational optic ﬂow based on the EulerLagrange framework. T.. Franke. 455–464). C. In F. Canada. 360–363). Nonlinear variational method for optical ﬂow computation. IEEE Transactions on Pattern Analysis and Machine Intelligence. J. Nagel. Vancouver. Segmentation of visual motion by minimizing convex nonquadratic functionals. seventh international joint conference on artiﬁcial intelligence (pp. 5(1). (2010). M. In Lecture notes in computer science: Vol. pp. 75–104. Los Alamitos: IEEE Computer Society. C. E. P. & B. pp. 25–36). E. Coons..). L. P. Physica D. F. Cambridge: MIT Press. (1994). J. 523–530). On functionals with greyvaluecontrolled smoothness terms for determining optical ﬂow. A fast operator for detection and precise location of distinct points. 83–97). 869–889. & Malik. Bruhn.. V.. K. 775–790. 2009 IEEE international conference on computer vision. Learning optical ﬂow. Bruhn. C. (1994). 15(10). & Zisserman.. (1990). & Solo. Müller. Berlin: Springer. (1986). (1990).). Lucas.. Towards ultimate motion estimation: combining highest accuracy with realtime performance.. 3. (2007). 427.. 70(3). & Weickert. Illposed problems in early vision. (1987).. Hamprecht. A.. 152–162). Nonlinear total variation based noise removal algorithms. GPUbased multigrid: realtime performance in high resolution nonlinear image processing. In Proc. 87–105. Lecture notes in computer science: Vol. & Weickert. Rudin. Cohen. Lecture notes in computer science (pp. Granlund. Roth. M. & Schunck. Irvine: IEEE Computer Society Press. Faugeras (Ed. Elsgolc. Part IV.. 63(1). H. 8. A. pp. Shulman. (2006)... C. (1990). 3024. H. L. Massachusetts Institute of Technology. 2010 ECCV workshop on computer vision with GPUs. International Journal of Computer Vision. & Weickert. & Jepson. 29(2). 749–755). (2010).
& Seidel. (2010). L. 3951. 2004 IEEE international conference on image processing (Vol. J. & Gevers. Lecture notes in computer science: Vol. Zach. Motion detail preserving optical ﬂow estimation. San Francisco: IEEE Computer Society Press. Salgado. Hamprecht. Lecture notes in computer science: Vol. & A. 16–19). Xu. C. Pock.. (2008). J. (1985). Sawhney. Wedel. Lecture notes in computer science: Vol. A.. (2004). C. Boykov. & Cremers. D. Bruhn. & Bischof. T. In A. Xiao. Theoretical foundations of anisotropic diffusion in image processing. H. Canada. Complementary optic ﬂow... Berlin: Springer.. O. C. Leonardis.. International Journal of Computer Vision. I. Kyoto: IEEE Computer Society Press. P. Lecture notes in computer science: Vol. & F. Adaptive supportweight approach for correspondence search. 2010 IEEE Int J Comput Vis (2011) 93: 368–388 computer society conference on computer vision and pattern recognition. Pattern recognition (pp. Computing Supplement. H. (1984). Statistical and geometrical approaches to visual motion analysis (pp. Robust optical ﬂow from photometric invariants.. 4713. In D. A. Wedel. Cheng. (2001a).. In Proc... Motion estimation with nonlocal total variation regularization. & Kweon.. A.. & F. Yuille. (1996). Jähne (Eds. Bischof.. & Matsushita.). Velocity estimation from image sequences with second order differential operators. San Francisco: IEEE Computer Society Press. D. Y. Cremers. T. In Proc. J. (2009). T. B. Y... Yaroslavsky. Montreal.). Berlin: Springer.. 5604.. 214–223). Journal of Mathematical Imaging and Vision. L.). Pinz (Eds. (2010)..). Berlin: Springer. H. Schnörr. H. Weickert. H. T.. Digital picture processing: an introduction.. Yoon. A. In F. Berlin: Springer. Werlberger. K. van de Weijer. Rosenhahn. (2001b). H. seventh international conference on pattern recognition (pp. & Bischof. IEEE Transactions on Pattern Analysis and Machine Intelligence. pp. M. S. Part I (pp. Weickert. Schmidt (Eds. & B. 207–220). C. (2006). An improved algorithm for TVL1 optical ﬂow computation. J. Bischof. Pock. & Isnardi. A theoretical framework for convex regularizers in PDEbased computation of image motion. 2009 IEEE international conference on computer vision. 221–236... Jia. In Proc. In Proc. Rosenhahn. & Schnörr. Weickert. 245–255. Structureand motionadaptive regularization for high accuracy optic ﬂow. & Bischof. C. Valgaerts. C. Energy minimization methods in computer vision and pattern recognition (EMMCVPR) (pp. Weickert. Schmidt (Eds. 1835–1838). A. (2006). 45(3). Singapore: IEEE Signal Processing Society. Variational optic ﬂow computation with a spatiotemporal smoothness constraint. A. Blake.. R. B. P. Cremers. 2010 IEEE computer society conference on computer vision and pattern recognition.. 3. & Pastor. In Proc.388 Tretiak. J. H. . Pock. J. Zimmer. Computer vision—ECCV 2006. In D.. H. & Schnörr. L. 245–264. L. (2009). 11. Cremers. (2007). J. 14(3). 5681. Bilateral ﬁlteringbased optical ﬂow estimation with occlusion detection. M.. A duality based approach for realtime TVL1 optical ﬂow. Berlin: Springer. L. J.. T. R.. Rao. H. 28(4). 650–656. Zach. 23–45). Pock. 211–224).