
Applications of Legendre-Fenchel transformation to computer vision problems
Ankur Handa, Richard A. Newcombe, Adrien Angeli, Andrew J. Davison
Department of Computing,
Huxley Building,
180, Queens Gate,
SW7 2AZ, London
{ahanda,r.newcombe08,a.angeli,ajd}@imperial.ac.uk,
WWW home page: http://www.doc.ic.ac.uk/~ahanda

Abstract. We aim to provide a short background on the Legendre-Fenchel transformation, whose applications have become increasingly popular in computer vision. A general motivation is followed by standard examples. We then take a closer look at its use in solving various standard computer vision problems, e.g. image denoising, optical flow and image deconvolution.
Keywords: Legendre-Fenchel transformation, Convex functions, Optimisation

1 Legendre-Fenchel transformation

The Legendre-Fenchel (LF) transformation of a continuous, but not necessarily differentiable, function f : \mathbb{R} \rightarrow \mathbb{R} \cup \{\infty\} is defined as

f^*(p) = \sup_{x \in \mathbb{R}} \{ px - f(x) \}    (1)

Geometrically, we look for the point x on the graph of f at which a line of slope p passing through (x, f(x)) attains the maximum intercept on the y-axis. For a differentiable f this is the point at which the line of slope p is tangent to the curve, i.e.

p = f'(x)    (2)

A vector-valued definition can be written as

f^*(p) = \sup_{x \in \mathbb{R}^n} \{ x^\top p - f(x) \}    (3)
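To make the definition concrete, here is a minimal numerical sketch (not part of the original text) that approximates f^*(p) by a brute-force maximum over a grid of x values; the grid ranges and the choice f(x) = x^2 are illustrative assumptions. For the parabola it recovers f^*(p) = p^2/4, a result derived analytically in Section 3.

```python
import numpy as np

def lf_conjugate(f, x_grid, p_grid):
    """Brute-force Legendre-Fenchel conjugate f*(p) = sup_x { p*x - f(x) },
    approximated by a maximum over a finite grid of x values."""
    X = x_grid[None, :]          # shape (1, Nx)
    P = p_grid[:, None]          # shape (Np, 1)
    return np.max(P * X - f(X), axis=1)

if __name__ == "__main__":
    x = np.linspace(-10.0, 10.0, 20001)   # grid wide enough for the slopes tested below
    p = np.linspace(-3.0, 3.0, 7)
    f_star = lf_conjugate(lambda t: t**2, x, p)
    print(np.allclose(f_star, p**2 / 4.0, atol=1e-3))   # matches the analytic conjugate p^2/4
```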

2 Motivation: Why do we need it?

2.1 Duality

Duality is the principle of looking at a function or a problem from two different perspectives, namely the primal and the dual form. In optimisation theory one often needs a deep understanding of a given function: for instance, whether it is linear, or whether it is well behaved in a given domain. Transformations are one way of mapping a function to another space in which it becomes easier to understand and manipulate; the Fourier and Laplace transforms are classic examples. The Legendre-Fenchel transform is one such transform: it maps the (x, f(x)) space to the space of slopes and conjugate values, (p, f^*(p)). However, while the Fourier transform is built from an integration against a kernel, the Legendre-Fenchel transform uses the supremum as the transformation procedure. Under the assumption that the transformation is reversible, one form is the dual of the other. This is expressed as

(x, f(x)) \Longleftrightarrow (p, f^*(p))    (4)

Fig. 1. Image courtesy: Wikipedia.org

Here p is the slope and f^*(p) is called the convex conjugate of the function f(x). The conjugate allows one to build a dual problem which may be easier to solve than the primal problem. The Legendre-Fenchel conjugate is always convex.

3 How to use the Duality

There are two ways of viewing a curve or a surface: either as a locus of points or as an envelope of tangents [1]. Let us now imagine that we want to use this duality and represent a function f(x) by its tangents instead of its points. A tangent is parameterised by two quantities, namely its slope p and the intercept c it cuts on the negative y-axis (using the negative y-axis for the intercept is purely a matter of choice). We present two ways of solving for the intercept and the slope, both arriving at the same result.


3.1 Motivation I

Let us denote the equation of a line with slope p and intercept c by

y = px - c    (5)

If this line touches the function f(x) at x, we can equate the two and write

px - c = f(x)    (6)

Suppose, as an example, that f(x) is the convex parabola x^2. Then we can solve the quadratic equation

x^2 - px + c = 0    (7)

which has the two roots

x = \frac{p \pm \sqrt{p^2 - 4c}}{2}    (8)

But the line should touch the convex function exactly once, since we want to use it as a tangent to represent the function in the dual space (if the function were non-convex, the line could touch it at two points). Therefore the two roots must coincide, which is to say that the discriminant of the quadratic must vanish, i.e. p^2 - 4c = 0, which gives us

f^*(p) = c = \frac{p^2}{4}    (9)

This is nothing but our Legendre-Fenchel transformed convex conjugate, and

p = 2x    (10)

is the slope. That is,

(x, f(x)) \Longleftrightarrow (p, f^*(p))    (11)

(x, x^2) \Longleftrightarrow \left( p, \frac{p^2}{4} \right)    (12)

3.2 Motivation II

For the line

y = px - c    (13)

to be a tangent to f(x) at x, it must be that

p = f'(x)    (14)

If the function is

f(x) = x^2    (15)

we can take the first-order derivative to obtain the slope,

p = f'(x) = 2x    (16)

x = f'^{-1}(p) = \frac{p}{2}    (17)

Replacing y we get

y = f(x)    (18)

y = f(f'^{-1}(p))    (19)

y = f\left(\frac{p}{2}\right) = \frac{p^2}{4}    (20)

and therefore we can solve for c as

c = p\, f'^{-1}(p) - f(f'^{-1}(p))    (21)

c = p\,\frac{p}{2} - \frac{p^2}{4}    (22)

c = \frac{p^2}{4}    (23)

4 How about functions which are not differentiable everywhere?

Let us now turn our attention to the case where a function may not be differentiable at a given point x^*, where it takes the value f(x^*). The slope of the tangent in a small vicinity of x^*, to the left of it (denoted x^*_-), is f'(x^*_-), and to the right (denoted x^*_+) it is f'(x^*_+). At the point of non-differentiability we can therefore draw many tangents, with slopes ranging over the interval [f'(x^*_-), f'(x^*_+)]; this interval is defined as the subgradient, and p can take any value in it. In this case we can rewrite our tangent equation as

p x^* - c = f(x^*)    (24)

c = p x^* - f(x^*)    (25)

which means that c is a linear function of p. The duality of a non-differentiable point at x^* therefore induces a linear function of the slope in the dual space: the point of non-differentiability in the primal space is readily explained in the dual space by a continuously varying slope in the range [f'(x^*_-), f'(x^*_+)] defining the linear function c(p). Hence, even if the primal space is non-differentiable, the dual space is differentiable. This is explained briefly in the next section.

Fig. 2.

4.1 Subdifferential and Subgradient

In calculus, the majority of the time we are interested in minimising or maximising a function f. The point \hat{x} which minimises the function is referred to as the critical point, i.e. \nabla f(\hat{x}) = 0. Convex functions belong to the class of functions which have a global minimiser. However, certain convex functions are not differentiable everywhere, so the gradient cannot always be computed. Instead, one defines the subdifferential.

A notorious example is f(x) = |x|, a convex function which is not differentiable at x = 0. At that point we can draw as many tangents as we want, with slopes ranging over [-1, 1]; they form the subgradients of the curve there. In simpler terms, a subdifferential at a point x collects the slopes of lines through (x, f(x)) that always touch or remain below the graph of the function. It is formalised as the set

\partial f(x) = \{ y \in \mathbb{R}^n : \langle y, x' - x \rangle \leq f(x') - f(x), \ \forall x' \in \mathbb{R}^n \}    (26)

Members of the subdifferential are called subgradients. For the |x| function the derivative is not defined at x = 0, while the subdifferential at x = 0 is the closed interval [-1, 1], because any line through the origin with a slope in [-1, 1] stays below the function. The subdifferential at any point x < 0 is the singleton {-1}, while at any point x > 0 it is the singleton {1}.

Fig. 3. An illustration of duality where lines are mapped to points while points are mapped to lines in the dual space.
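As a quick numerical illustration of definition (26), the following sketch checks the subgradient inequality for f(x) = |x| on a finite grid; the grid and the tested slopes are arbitrary choices of ours, and the grid maximum only approximates the "for all x'" condition.

```python
import numpy as np

def in_subdifferential(g, x, f, grid):
    """Check the subgradient inequality  <g, x' - x> <= f(x') - f(x)
    for every x' on a finite grid (a numerical stand-in for 'for all x')."""
    return bool(np.all(g * (grid - x) <= f(grid) - f(x) + 1e-12))

if __name__ == "__main__":
    grid = np.linspace(-5.0, 5.0, 1001)
    f = np.abs
    print(in_subdifferential(0.3, 0.0, f, grid))   # True:  0.3 lies in [-1, 1], the subdifferential of |x| at 0
    print(in_subdifferential(1.5, 0.0, f, grid))   # False: slope 1.5 cuts above the graph
    print(in_subdifferential(1.0, 2.0, f, grid))   # True:  {1} is the subdifferential for x > 0
```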

4.2 Proof: the Legendre-Fenchel conjugate is always convex

f^*(z) = \sup_{x \in \mathbb{R}^n} \{ x^\top z - f(x) \}    (27)

To prove that the conjugate is convex, we need to show that for any 0 \leq \lambda \leq 1 it obeys Jensen's inequality, i.e.

f^*(\lambda z_1 + (1-\lambda) z_2) \leq \lambda f^*(z_1) + (1-\lambda) f^*(z_2)    (28)

Let us expand the left hand side of the inequality to be proved:

f^*(\lambda z_1 + (1-\lambda) z_2) = \sup_{x \in \mathbb{R}^n} \{ x^\top (\lambda z_1 + (1-\lambda) z_2) - f(x) \}    (29)

We can rewrite f(x) as

f(x) = \lambda f(x) + (1-\lambda) f(x)    (30)

and replacing it in the equation above yields

f^*(\lambda z_1 + (1-\lambda) z_2) = \sup_{x \in \mathbb{R}^n} \{ \lambda (x^\top z_1 - f(x)) + (1-\lambda)(x^\top z_2 - f(x)) \}    (31)

Writing p_1 = x^\top z_1 - f(x) and p_2 = x^\top z_2 - f(x) for brevity, we know that

\sup_{x \in \mathbb{R}^n} \{ \lambda p_1 + (1-\lambda) p_2 \} \leq \sup_{x \in \mathbb{R}^n} \{ \lambda p_1 \} + \sup_{x \in \mathbb{R}^n} \{ (1-\lambda) p_2 \}    (32)

which follows from the property of the supremum

\sup \{x + y\} \leq \sup \{x\} + \sup \{y\}    (33)

Therefore we can substitute

\sup_{x \in \mathbb{R}^n} \{ \lambda (x^\top z_1 - f(x)) \} = \lambda f^*(z_1)    (34)

\sup_{x \in \mathbb{R}^n} \{ (1-\lambda)(x^\top z_2 - f(x)) \} = (1-\lambda) f^*(z_2)    (35)

and arrive at the desired result,

f^*(\lambda z_1 + (1-\lambda) z_2) \leq \lambda f^*(z_1) + (1-\lambda) f^*(z_2)    (36)

Therefore f^*(z) is always convex, irrespective of whether the function f(x) is convex or not.

5 Examples

5.1 Example 1: Norm function (non-differentiable at zero)

f(y) = \|y\|    (37)

f^*(z) = \sup_{y} \{ y^\top z - \|y\| \}    (38)

By the Cauchy-Schwarz inequality we can also write

\|y\| = \max_{\|b\| \leq 1} y^\top b    (39)

Since the maximum value of y^\top b is \|y\|, it is easy to see that

\max_{\|b\| \leq 1} \{ y^\top b - \|y\| \} = 0    (40)

Therefore we can write the conjugate as

f^*(z) = \begin{cases} 0 & \text{if } \|z\| \leq 1 \\ \infty & \text{otherwise} \end{cases}

The fact that the conjugate is \infty when \|z\| > 1 can be easily explained using Figure 4.
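The indicator behaviour of the conjugate can also be probed numerically. The sketch below evaluates the supremum in (38) for the 1-D case f(y) = |y| on a bounded grid; on such a grid the value \infty shows up as a number that grows with the grid bound, so the printed values are only indicative.

```python
import numpy as np

# Probe f*(z) = sup_y { y*z - |y| } for the 1-D norm f(y) = |y| on a bounded grid.
# For |z| <= 1 the supremum is 0; for |z| > 1 it grows with the grid bound,
# mimicking the indicator-function behaviour derived above.
y = np.linspace(-100.0, 100.0, 200001)
for z in (0.5, 1.0, 1.5):
    print(z, np.max(y * z - np.abs(y)))
# expected output values: 0.0, 0.0, ~50.0 (the last one diverges as the grid bound grows)
```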

Fig. 4. The image explains the process involved when fitting a tangent with slope p = 2 (or z = 2). Any line with slope \notin [-1, 1] has to intersect the y-axis at \infty to be able to be tangent to the function.

5.2 Example 2: Parabola

f(y) = y^2    (41)

The LF transform of the function f(y) is

f^*(z) = \sup_{y \in \mathbb{R}} \{ yz - f(y) \}    (42)

The function attains its maximum when

\partial_y (yz - y^2) = 0    (43)

z - 2y = 0    (44)

z = 2y    (45)

Substituting this value of y into the function f^*(z) we get

f^*(z) = z\,\frac{z}{2} - \left(\frac{z}{2}\right)^2 = \frac{1}{4} z^2    (46)

Fig. 5. The plot shows the parabola y^2 and its conjugate, which is also a parabola, \frac{1}{4} z^2.

5.3 Example 3: A general quadratic n-D curve

f(y) = \frac{1}{2} y^\top A y    (47)

Let us assume that A is symmetric. The LF transform of a function f(y) for an n-dimensional vector y is defined as

f^*(z) = \sup_{y \in \mathbb{R}^n} \{ y^\top z - f(y) \}    (48)

The function attains its maximum when

\partial_y \left( y^\top z - \frac{1}{2} y^\top A y \right) = 0    (49)

z - \frac{1}{2}(A + A^\top) y = 0    (50)

z - A y = 0 \quad [\because A \text{ is symmetric}]    (51)

y = A^{-1} z    (52)

Substituting this value of y into the function f^*(z) we get

f^*(z) = (A^{-1} z)^\top z - \frac{1}{2} (A^{-1} z)^\top A (A^{-1} z)    (53)

= z^\top A^{-1} z - \frac{1}{2} z^\top A^{-1} A A^{-1} z    (54)

= z^\top A^{-1} z - \frac{1}{2} z^\top A^{-1} z    (55)

= \frac{1}{2} z^\top A^{-1} z    (56)
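A small numerical check of this result, under the assumption that A is symmetric positive definite so that the supremum is attained at y = A^{-1} z:

```python
import numpy as np

# Sanity check of Example 3: for f(y) = 0.5 * y^T A y with A symmetric positive
# definite, the conjugate is f*(z) = 0.5 * z^T A^{-1} z, attained at y = A^{-1} z.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3.0 * np.eye(3)          # symmetric positive definite by construction
z = rng.standard_normal(3)

y_star = np.linalg.solve(A, z)          # maximiser y = A^{-1} z
value_at_max = y_star @ z - 0.5 * y_star @ A @ y_star
closed_form = 0.5 * z @ np.linalg.solve(A, z)
print(np.isclose(value_at_max, closed_form))   # True
```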

5.4 Example 4: \ell_p Norms

f(y) = \frac{1}{p} \|y\|^p \quad \forall \ 1 < p < \infty    (57)

Again writing the LF transform as

f^*(z) = \sup_{y} \left\{ y^\top z - \frac{1}{p} \|y\|^p \right\}    (58)

the maximum is attained when

\partial_y \left( y^\top z - \frac{1}{p} \|y\|^p \right) = 0    (59)

z - \|y\|^{p-1} \frac{y}{\|y\|} = 0    (60)

z = \|y\|^{p-2}\, y    (61)

\|z\| = \|y\|^{p-1}    (62)

\|y\| = \|z\|^{\frac{1}{p-1}}    (63)

y = \frac{z}{\|z\|^{\frac{p-2}{p-1}}}    (64)

Substituting this value of y into the function gives

f^*(z) = \frac{z^\top z}{\|z\|^{\frac{p-2}{p-1}}} - \frac{1}{p} \left\| \frac{z}{\|z\|^{\frac{p-2}{p-1}}} \right\|^p = \|z\|^{2 - \frac{p-2}{p-1}} - \frac{1}{p} \|z\|^{\frac{p}{p-1}}    (65)-(66)

Since

2 - \frac{p-2}{p-1} = \frac{2(p-1) - (p-2)}{p-1} = \frac{2p - 2 - p + 2}{p-1} = \frac{p}{p-1}    (67)-(70)

we obtain

f^*(z) = \|z\|^{\frac{p}{p-1}} - \frac{1}{p} \|z\|^{\frac{p}{p-1}} = \left(1 - \frac{1}{p}\right) \|z\|^{\frac{p}{p-1}}    (71)-(72)

= \left(1 - \frac{1}{p}\right) \|z\|^{\frac{1}{1 - 1/p}} = \frac{1}{\frac{1}{1 - 1/p}} \|z\|^{\frac{1}{1 - 1/p}}    (73)-(75)

Let us call

q = \frac{1}{1 - \frac{1}{p}}    (76)

Therefore we obtain

f^*(z) = \frac{1}{q} \|z\|^q    (77)

\frac{1}{p} + \frac{1}{q} = 1    (78)
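The same kind of brute-force check used earlier also works for this result, here in 1-D with the illustrative choice p = 3 (so q = 3/2):

```python
import numpy as np

# Numerical check of Example 4 in 1-D: the conjugate of (1/p)|y|^p is (1/q)|z|^q
# with 1/p + 1/q = 1.  The supremum is approximated on a finite grid of y values.
p = 3.0
q = 1.0 / (1.0 - 1.0 / p)
y = np.linspace(-50.0, 50.0, 500001)
for z in (-2.0, 0.5, 1.7):
    numeric = np.max(y * z - np.abs(y)**p / p)
    print(np.isclose(numeric, np.abs(z)**q / q, atol=1e-4))   # True for each z
```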

5.5 Example 5: Exponential function

Fig. 6. The plot shows the function e^y and its conjugate, z(\ln z - 1).

f(y) = e^y    (79)

f^*(z) = \sup_{y \in \mathbb{R}} \{ yz - e^y \}    (80)-(81)

If z < 0, \sup_{y} \{ yz - e^y \} is unbounded, so f^*(z) = \infty.

If z > 0, the supremum is bounded and can be computed from

\partial_y (yz - e^y) = 0    (82)

z - e^y = 0    (83)

y = \ln z    (84)

Substituting this value of y in the above function we get

f^*(z) = z \ln z - z = z(\ln z - 1)

If z = 0, \sup_{y} \{ yz - e^y \} = \sup_{y} \{ -e^y \} = 0.    (85)

5.6 Example 6: Negative logarithm

f(y) = -\log y    (86)

f^*(z) = \sup_{y} \{ yz - (-\log y) \}    (87)

Fig. 7. The plot shows the function -\log y and its conjugate, -1 - \log(-z).

\partial_y (yz + \log y) = 0    (88)

z + \frac{1}{y} = 0    (89)

y = -\frac{1}{z}    (90)

Substituting this value back into the equation we get

f^*(z) = -\frac{1}{z} z + \log(-1/z)    (91)

f^*(z) = -1 - \log(-z)    (92)

This is only valid if z < 0.

6 Summary of noteworthy points

– The Legendre-Fenchel transform only yields convex functions.
– Points of the function f are transformed into slopes of f^*, and slopes of f are transformed into points of f^*.
– The Legendre-Fenchel transform is more general than the Legendre transform because it is also applicable to non-convex as well as non-differentiable functions. The Legendre-Fenchel transform reduces to the Legendre transform for convex functions.

7 Applications to computer vision

Many problems in computer vision can be expressed in the form of energy minimisations [7]. A general class of the functions representing these problems can be written as

\min_{x \in X} \{ F(Kx) + G(x) \}    (93)

where F and G are proper convex functions and K \in \mathbb{R}^{n \times m}. Usually F(Kx) corresponds to a regularisation term of the form \|Kx\| and G(x) corresponds to the data term. The dual form can be easily derived by replacing F(Kx) with its convex conjugate, that is

\min_{x \in X} \max_{y \in Y} \{ \langle Kx, y \rangle - F^*(y) + G(x) \}    (94)

because, F being a convex function, by definition of the Legendre-Fenchel transformation

F(Kx) = \max_{y \in Y} \{ \langle Kx, y \rangle - F^*(y) \}    (95)

We know that the dot product is commutative, so we can rewrite

\langle Kx, y \rangle = \langle x, K^\top y \rangle    (96)

and in case the dot product is defined on a Hermitian space we can write it as

\langle Kx, y \rangle = \langle x, K^* y \rangle    (97)

where K^* is the adjoint of K, which is more general. Going back to Eqn. 94, the equation now becomes

\min_{x \in X} \max_{y \in Y} \{ \langle x, K^* y \rangle - F^*(y) + G(x) \}    (98)

Now, by definition,

\min_{x \in X} \{ \langle x, K^* y \rangle + G(x) \} = -G^*(-K^* y)    (99)

because

\max_{x \in X} \{ \langle x, K^* y \rangle - G(x) \} = G^*(K^* y)    (100)

\min_{x \in X} \{ -\langle x, K^* y \rangle + G(x) \} = -G^*(K^* y)    (101)

\min_{x \in X} \{ -\langle x, -K^* y \rangle + G(x) \} = -G^*(-K^* y)    (102)

\min_{x \in X} \{ \langle x, K^* y \rangle + G(x) \} = -G^*(-K^* y)    (103)

Under weak assumptions in convex analysis, min and max can be switched in Eqn. 98, and the dual problem then becomes

\max_{y \in Y} \{ -G^*(-K^* y) - F^*(y) \}    (104)

The primal-dual gap is then defined as

\min_{x \in X} \{ F(Kx) + G(x) \} - \max_{y \in Y} \{ -G^*(-K^* y) - F^*(y) \}    (105)

For the primal-dual algorithm to be applicable, one should be able to compute the proximal mappings of F and G, defined as

\mathrm{Prox}_{\gamma F}(x) = \arg\min_{y} \ \frac{1}{2} \|x - y\|^2 + \gamma F(y)    (106)

One can then formulate the minimisation steps as

y^{n+1} = \mathrm{Prox}_{\sigma F^*}(y^n + \sigma K \bar{x}^n)    (107)

x^{n+1} = \mathrm{Prox}_{\tau G}(x^n - \tau K^* y^{n+1})    (108)

\bar{x}^{n+1} = x^{n+1} + \theta (x^{n+1} - x^n)    (109)

Note that being able to compute the proximal mapping of F is equivalent to being able to compute the proximal mapping of F^*, due to Moreau's identity:

x = \mathrm{Prox}_{\tau F^*}(x) + \tau\, \mathrm{Prox}_{F/\tau}(x/\tau)    (110)

It can be shown that if 0 \leq \theta \leq 1 and \sigma \tau \|K\|^2 < 1, x^n converges to the minimiser of the original energy function.

Note: the standard way of writing the update equations is via the proximity operator, but for the indicator functions used below it reduces to pointwise gradient ascent/descent followed by a projection onto a unit ball, expressed via max(1, |p|) in the subsequent equations. This projection comes from the indicator function. A generic sketch of these iterations is given below.
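The iterations (107)-(109) can be written down generically once K, its adjoint and the two proximal maps are available as callables. The sketch below is a minimal illustration rather than a tuned implementation; the function and argument names are our own, and the step sizes are assumed to satisfy \sigma\tau\|K\|^2 < 1 as stated above.

```python
import numpy as np

def primal_dual(K, Kt, prox_Fstar, prox_G, x0, y0, sigma, tau, theta=1.0, iters=200):
    """Generic sketch of iterations (107)-(109): K and Kt are callables applying the
    linear operator and its adjoint, prox_Fstar / prox_G are the proximal mappings."""
    x = x0.copy()
    y = y0.copy()
    x_bar = x0.copy()
    for _ in range(iters):
        y = prox_Fstar(y + sigma * K(x_bar))        # dual ascent step, Eqn. (107)
        x_new = prox_G(x - tau * Kt(y))             # primal descent step, Eqn. (108)
        x_bar = x_new + theta * (x_new - x)         # over-relaxation, Eqn. (109)
        x = x_new
    return x, y
```

For the ROF, Huber-ROF and TV-L1 problems below, the proximal maps reduce to the pointwise projections and linear updates derived in the corresponding sections.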

7.1 Preliminaries

Given column vectors a and x, the dot product \langle a, x \rangle can be written as

\langle a, x \rangle = a^\top x    (111)

Then we can write the associated derivatives with respect to x as

\frac{\partial a^\top x}{\partial x} = \frac{\partial x^\top a}{\partial x} = a    (112)

We will be representing a 2-D image A_{m,n} as a vector a in which the elements a_{i,j} are arranged in lexicographical order,

A_{m,n} = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix} \ \longrightarrow \ a = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{m,n} \end{pmatrix}^\top    (113)

The gradient is obtained by applying the sparse matrices D_x and D_y, whose rows contain the discrete derivative operators \partial/\partial x and \partial/\partial y, to this vector and stacking the resulting x- and y-derivative images,

\nabla A = \begin{pmatrix} D_x \\ D_y \end{pmatrix} a = \begin{pmatrix} a^x \\ a^y \end{pmatrix}    (114)

The divergence of a field whose elements are stacked in the same vectorial fashion is derived analogously, giving one value per pixel:

\mathrm{div}\, A = \begin{pmatrix} D_x & D_y \end{pmatrix} \begin{pmatrix} a^x \\ a^y \end{pmatrix}    (115)

7.2 Representation of norms

We will use the following notational brevity to represent the L1 (total variation) norm:

E_{tv}(u) = \|\nabla u\|_1    (116)

\|\nabla u\|_1 = \sum_{i=1}^{W} \sum_{j=1}^{H} |\nabla u_{i,j}|    (117)

|\nabla u_{i,j}| = \sqrt{ (\partial_x u_{i,j})^2 + (\partial_y u_{i,j})^2 }    (118)

where the partial derivatives are defined on the discrete 2D grid as follows:

\partial^-_x u_{i,j} = u(i,j) - u(i-1,j)    (119)

\partial^-_y u_{i,j} = u(i,j) - u(i,j-1)    (120)

\partial^+_x u_{i,j} = u(i+1,j) - u(i,j)    (121)

\partial^+_y u_{i,j} = u(i,j+1) - u(i,j)    (122)

For brevity, u_{i,j} is denoted by u.

For maximisation we will use a gradient ascent step of the form

\frac{p^{n+1} - p^n}{\sigma_p} = \nabla_p E(p)    (123)

For minimisation we will use a gradient descent step of the form

\frac{p^n - p^{n+1}}{\sigma_p} = \nabla_p E(p)    (124)

NOTE the switch in the iteration numbers between maximisation and minimisation.

NOTE ON COMPUTING GRAD AND DIVERGENCE: to compute \nabla, forward differences \partial^+_x are used, while for computing the divergence, backward differences \partial^-_x are used, as illustrated in the sketch below.
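The following sketch shows one common way to implement these discrete operators in NumPy, with forward differences for the gradient and backward differences for the divergence. The boundary convention (Neumann-style, zero difference at the last row/column) is an assumption on our part, chosen so that div is exactly the negative adjoint of grad, which the script verifies.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient; the last row/column difference is set to zero."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Backward-difference divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[0, :] = px[0, :]
    dx[1:-1, :] = px[1:-1, :] - px[:-2, :]
    dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]
    dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]
    dy[:, -1] = -py[:, -2]
    return dx + dy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal((32, 24))
    px, py = rng.standard_normal((2, 32, 24))
    gx, gy = grad(u)
    # adjointness check: <grad u, p> = -<u, div p>
    print(np.isclose(np.sum(gx * px + gy * py), -np.sum(u * div(px, py))))
```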

7.3 ROF Model

A standard ROF model can be written as

\min_{u \in X} \|\nabla u\|_1 + \frac{\lambda}{2} \|u - g\|_2^2

We know that the convex conjugate of the \|\cdot\| norm is an indicator function,

\delta(p) = \begin{cases} 0 & \text{if } \|p\| \leq 1 \\ \infty & \text{otherwise} \end{cases}    (125)

\therefore \quad \|\nabla u\|_1 = \max_{p \in P} \ \langle p, \nabla u \rangle - \delta_P(p)    (126)

Therefore, we can write the ROF function as

\min_{u \in X} \max_{p \in P} \ \langle p, \nabla u \rangle + \frac{\lambda}{2} \|u - g\|_2^2 - \delta_P(p)    (127)

Let us call this new function E(u, p).

1. Compute the derivative with respect to p, i.e. \partial_p E(u,p), which is

\partial_p E(u,p) = \partial_p \left( \langle p, \nabla u \rangle + \frac{\lambda}{2}\|u-g\|_2^2 - \delta_P(p) \right)    (128)

\partial_p \langle p, \nabla u \rangle = \nabla u \quad \text{[proof given]}    (129)

\partial_p \left( \frac{\lambda}{2}\|u-g\|_2^2 \right) = 0    (130)

\partial_p \delta_P(p) = 0 \quad [\because \text{indicator, i.e. constant, function}]    (131)

\Rightarrow \partial_p E(u,p) = \nabla u    (132)

2. Compute the derivative with respect to u, i.e. \partial_u E(u,p), which is

\partial_u E(u,p) = \partial_u \left( \langle p, \nabla u \rangle + \frac{\lambda}{2}\|u-g\|_2^2 - \delta_P(p) \right)    (133)

\partial_u \langle p, \nabla u \rangle = \partial_u (-\langle u, \mathrm{div}\, p \rangle) = -\mathrm{div}\, p    (134)

\partial_u \left( \frac{\lambda}{2}\|u-g\|_2^2 \right) = \lambda (u - g)    (135)

\partial_u \delta_P(p) = 0    (136)

\Rightarrow \partial_u E(u,p) = -\mathrm{div}\, p + \lambda(u - g)    (137)

3. Use simple gradient ascent/descent:

\frac{p^{n+1} - p^n}{\sigma} = \partial_p E(u,p) = \nabla u^n    (138)

p^{n+1} = p^n + \sigma \nabla u^n    (139)

p^{n+1} = \frac{p^n + \sigma \nabla u^n}{\max(1, |p^n + \sigma \nabla u^n|)}    (140)

\frac{u^n - u^{n+1}}{\tau} = \partial_u E(u,p) = -\mathrm{div}\, p^{n+1} + \lambda(u^{n+1} - g)    (141)

u^{n+1} = \frac{u^n + \tau\, \mathrm{div}\, p^{n+1} + \tau \lambda g}{1 + \tau \lambda}    (142)

The projection onto the unit ball via max(1, |p|) comes from the indicator function, as explained in the note just below the primal-dual gap paragraph. In the subsequent equations we use this property wherever a projection is made.
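A compact sketch of the resulting ROF iteration, transcribing updates (139)-(142). It assumes the grad/div helpers from the sketch in Section 7.2 are in scope, and the values of \lambda, \sigma, \tau and the iteration count are illustrative (they satisfy \sigma\tau\|K\|^2 < 1 for \|K\|^2 \leq 8).

```python
import numpy as np

def rof_denoise(g, lam=8.0, sigma=0.25, tau=0.25, iters=300):
    """Primal-dual ROF denoising, a direct transcription of updates (139)-(142).
    Assumes the grad/div helpers defined earlier are in scope."""
    u = g.copy()
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(iters):
        gx, gy = grad(u)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2))   # projection onto the unit ball
        px, py = px / norm, py / norm
        u = (u + tau * div(px, py) + tau * lam * g) / (1.0 + tau * lam)
    return u
```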

7.4 Huber-ROF model

The interesting thing about the Huber model is that it has a continuous first derivative, so a simple gradient descent on the function can bring us to the minimum, whereas the Newton-Raphson method, which requires the second-order derivative, could not be used because the second derivative of the Huber model is not continuous. The function we want to minimise is

\min_{u \in X} \|\nabla u\|_{\alpha} + \frac{\lambda}{2} \|u - g\|_2^2    (143)

where \|\cdot\|_{\alpha} is the Huber norm, defined as

\|x\|_{\alpha} = \begin{cases} \frac{|x|^2}{2\alpha} & \text{if } |x| \leq \alpha \\ |x| - \frac{\alpha}{2} & \text{if } |x| > \alpha \end{cases}

The convex conjugate of the parabolic part can be written as

f^*(p) = \frac{\alpha}{2} \|p\|_2^2    (144)

and, combined with the conjugate of the \|\cdot\| part, which is the same indicator function as before, the conjugate of the Huber norm is

f^*(p) = \begin{cases} \frac{\alpha}{2}\|p\|_2^2 & \text{if } \|p\| \leq 1 \\ \infty & \text{otherwise} \end{cases}    (145)

Therefore the minimisation can be rewritten as

\min_{u \in X} \max_{p \in P} \ \langle p, \nabla u \rangle - \delta_P(p) - \frac{\alpha}{2}\|p\|^2 + \frac{\lambda}{2}\|u - g\|_2^2    (146)

Minimisation The minimisation can be carried out following this series of steps.

1. Compute the derivative with respect to p, i.e. \partial_p E(u,p), which is

\partial_p E(u,p) = \partial_p \left( \langle p, \nabla u \rangle - \delta_P(p) - \frac{\alpha}{2}\|p\|^2 + \frac{\lambda}{2}\|u-g\|_2^2 \right)    (147)

\partial_p \langle p, \nabla u \rangle = \nabla u    (148)

\partial_p \left( \frac{\lambda}{2}\|u-g\|_2^2 \right) = 0    (149)

\partial_p \delta_P(p) = 0 \quad [\because \text{p is an indicator function}]    (150)

\partial_p \left( \frac{\alpha}{2}\|p\|^2 \right) = \alpha p    (151)

\Rightarrow \partial_p E(u,p) = \nabla u - \alpha p    (152)

2. Compute the derivative with respect to u, i.e. \partial_u E(u,p), which is

\partial_u E(u,p) = \partial_u \left( \langle p, \nabla u \rangle - \delta_P(p) - \frac{\alpha}{2}\|p\|^2 + \frac{\lambda}{2}\|u-g\|_2^2 \right)    (153)

\partial_u \langle p, \nabla u \rangle = \partial_u (-\langle u, \mathrm{div}\, p \rangle) = -\mathrm{div}\, p    (154)

\partial_u \left( \frac{\lambda}{2}\|u-g\|_2^2 \right) = \lambda(u - g)    (155)

\partial_u \delta_P(p) = 0    (156)

\partial_u \left( \frac{\alpha}{2}\|p\|^2 \right) = 0    (157)

\Rightarrow \partial_u E(u,p) = -\mathrm{div}\, p + \lambda(u - g)    (158)

3. Use simple gradient ascent/descent:

\frac{p^{n+1} - p^n}{\sigma} = \partial_p E(u,p) = \nabla u^n - \alpha p^{n+1}    (159)

p^{n+1} = \frac{p^n + \sigma \nabla u^n}{1 + \sigma \alpha}    (160)

p^{n+1} = \frac{\ \frac{p^n + \sigma \nabla u^n}{1 + \sigma\alpha}\ }{\max\!\left(1, \left| \frac{p^n + \sigma \nabla u^n}{1 + \sigma\alpha} \right| \right)}    (161)

\frac{u^n - u^{n+1}}{\tau} = \partial_u E(u,p) = -\mathrm{div}\, p^{n+1} + \lambda(u^{n+1} - g)    (162)

u^{n+1} = \frac{u^n + \tau\, \mathrm{div}\, p^{n+1} + \tau \lambda g}{1 + \tau \lambda}    (163)
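The Huber-ROF updates (160)-(163) differ from the ROF ones only by the 1 + \sigma\alpha factor in the dual step; a sketch, under the same assumptions as the ROF sketch above (grad/div helpers in scope, illustrative parameters):

```python
import numpy as np

def huber_rof_denoise(g, lam=8.0, alpha=0.05, sigma=0.25, tau=0.25, iters=300):
    """Primal-dual Huber-ROF denoising, transcribing updates (160)-(163).
    Assumes the grad/div helpers from Section 7.2 are in scope."""
    u = g.copy()
    px = np.zeros_like(g)
    py = np.zeros_like(g)
    for _ in range(iters):
        gx, gy = grad(u)
        px = (px + sigma * gx) / (1.0 + sigma * alpha)
        py = (py + sigma * gy) / (1.0 + sigma * alpha)
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2))   # projection onto the unit ball
        px, py = px / norm, py / norm
        u = (u + tau * div(px, py) + tau * lam * g) / (1.0 + tau * lam)
    return u
```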

8 TVL1 denoising

TVL1 denoising can be written as

\min_{u \in X} \|\nabla u\|_1 + \lambda \|u - f\|_1    (164)

This can be rewritten as a new equation where \lambda is subsumed inside the norm, i.e.

\min_{u \in X} \|\nabla u\|_1 + \|\lambda(u - f)\|_1    (165)

We know that the convex conjugate of the \|\cdot\| norm is an indicator function, \delta(p) = 0 if \|p\| \leq 1 and \infty otherwise, therefore

\|\nabla u\|_1 = \max_{p \in P} \ \langle p, \nabla u \rangle - \delta_P(p)    (166)

\|\lambda(u - f)\|_1 = \max_{q \in Q} \ \langle q, \lambda(u - f) \rangle - \delta_Q(q)    (167)

Therefore, we can write the TVL1 denoising function as

\min_{u \in X} \max_{p \in P} \max_{q \in Q} \ \langle p, \nabla u \rangle + \langle q, \lambda(u - f) \rangle - \delta_P(p) - \delta_Q(q)    (168)

Let us call this new function E(u, p, q).

1. Compute the derivative with respect to p, i.e. \partial_p E(u,p,q), which is

\partial_p E(u,p,q) = \partial_p \left( \langle p, \nabla u \rangle + \langle q, \lambda(u-f) \rangle - \delta_P(p) - \delta_Q(q) \right)    (169)

\partial_p \langle p, \nabla u \rangle = \nabla u \quad \text{[proof given]}    (170)

\partial_p \langle q, \lambda(u-f) \rangle = 0    (171)

\partial_p \delta_P(p) = 0 \quad [\because \text{indicator, i.e. constant, function}]    (172)

\partial_p \delta_Q(q) = 0    (173)

\Rightarrow \partial_p E(u,p,q) = \nabla u    (174)

2. Compute the derivative with respect to q, i.e. \partial_q E(u,p,q), which is

\partial_q E(u,p,q) = \partial_q \left( \langle p, \nabla u \rangle + \langle q, \lambda(u-f) \rangle - \delta_P(p) - \delta_Q(q) \right)    (175)

\partial_q \langle p, \nabla u \rangle = 0    (176)

\partial_q \langle q, \lambda(u-f) \rangle = \lambda(u - f)    (177)

\partial_q \delta_P(p) = 0    (178)

\partial_q \delta_Q(q) = 0    (179)

\Rightarrow \partial_q E(u,p,q) = \lambda(u - f)    (180)

3. Compute the derivative with respect to u, i.e. \partial_u E(u,p,q), which is

\partial_u E(u,p,q) = \partial_u \left( \langle p, \nabla u \rangle + \langle q, \lambda(u-f) \rangle - \delta_P(p) - \delta_Q(q) \right)    (181)

\partial_u \langle p, \nabla u \rangle = \partial_u (-\langle u, \mathrm{div}\, p \rangle) = -\mathrm{div}\, p    (182)

\partial_u \langle q, \lambda(u-f) \rangle = \lambda q    (183)

\partial_u \delta_P(p) = 0    (184)

\partial_u \delta_Q(q) = 0    (185)

\Rightarrow \partial_u E(u,p,q) = -\mathrm{div}\, p + \lambda q    (186)

4. Use simple gradient ascent/descent:

\frac{p^{n+1} - p^n}{\sigma} = \partial_p E(u,p,q) = \nabla u^n    (187)

p^{n+1} = p^n + \sigma \nabla u^n    (188)

p^{n+1} = \frac{p^n + \sigma \nabla u^n}{\max(1, |p^n + \sigma \nabla u^n|)}    (189)

\frac{q^{n+1} - q^n}{\sigma} = \partial_q E(u,p,q) = \lambda(u^n - f)    (190)

q^{n+1} = q^n + \sigma \lambda (u^n - f)    (191)

q^{n+1} = \frac{q^n + \sigma \lambda (u^n - f)}{\max(1, |q^n + \sigma \lambda (u^n - f)|)}    (192)

\frac{u^n - u^{n+1}}{\tau} = \partial_u E(u,p,q) = -\mathrm{div}\, p^{n+1} + \lambda q^{n+1}    (193)

u^{n+1} = u^n + \tau\, \mathrm{div}\, p^{n+1} - \tau \lambda q^{n+1}    (194)

Another alternative, used in [8], is to do the projection slightly differently, so that the update equations on q and u change and \lambda appears only in the projection step, i.e.

\frac{p^{n+1} - p^n}{\sigma} = \partial_p E(u,p,q) = \nabla u^n    (195)

p^{n+1} = p^n + \sigma \nabla u^n    (196)

p^{n+1} = \frac{p^n + \sigma \nabla u^n}{\max(1, |p^n + \sigma \nabla u^n|)}    (197)

\frac{q^{n+1} - q^n}{\sigma} = \partial_q E(u,p,q) = u^n - f    (198)

q^{n+1} = q^n + \sigma (u^n - f)    (199)

q^{n+1} = \frac{q^n + \sigma (u^n - f)}{\max\!\left(1, \frac{|q^n + \sigma (u^n - f)|}{\lambda}\right)}    (200)

\frac{u^n - u^{n+1}}{\tau} = \partial_u E(u,p,q) = -\mathrm{div}\, p^{n+1} + q^{n+1}    (201)

u^{n+1} = u^n + \tau\, \mathrm{div}\, p^{n+1} - \tau q^{n+1}    (202)
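A sketch of the TV-L1 iteration (188)-(194), again assuming the grad/div helpers from Section 7.2 are in scope and with illustrative parameters:

```python
import numpy as np

def tvl1_denoise(f, lam=1.0, sigma=0.25, tau=0.25, iters=500):
    """Primal-dual TV-L1 denoising, transcribing updates (188)-(194).
    Assumes the grad/div helpers defined earlier are in scope."""
    u = f.copy()
    px = np.zeros_like(f)
    py = np.zeros_like(f)
    q = np.zeros_like(f)
    for _ in range(iters):
        gx, gy = grad(u)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2))
        px, py = px / norm, py / norm                    # projection of p onto the unit ball
        q = q + sigma * lam * (u - f)
        q = q / np.maximum(1.0, np.abs(q))               # projection of q onto [-1, 1]
        u = u + tau * div(px, py) - tau * lam * q
    return u
```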

8.1 Image Deconvolution

\min_{u \in X} \|\nabla u\|_1 + \frac{\lambda}{2} \|Au - g\|_2^2    (203)

The problem can be written as a saddle-point problem,

\min_{u \in X} \max_{p \in P} \ \langle p, \nabla u \rangle + \frac{\lambda}{2}\|Au - g\|_2^2 - \delta_P(p)    (204)

Minimisation The minimisation can be carried out following this series of steps.

1. Compute the derivative with respect to p, i.e. \partial_p E(u,p), which is

\partial_p E(u,p) = \partial_p \left( \langle p, \nabla u \rangle - \delta_P(p) + \frac{\lambda}{2}\|Au-g\|_2^2 \right)    (205)

\partial_p \langle p, \nabla u \rangle = \nabla u    (206)

\partial_p \left( \frac{\lambda}{2}\|Au-g\|_2^2 \right) = 0    (207)

\partial_p \delta_P(p) = 0 \quad [\because \text{p is an indicator function}]    (208)

\Rightarrow \partial_p E(u,p) = \nabla u    (209)

2. Compute the derivative with respect to u, i.e. \partial_u E(u,p), which is

\partial_u E(u,p) = \partial_u \left( \langle p, \nabla u \rangle - \delta_P(p) + \frac{\lambda}{2}\|Au-g\|_2^2 \right)    (210)

\partial_u \langle p, \nabla u \rangle = \partial_u(-\langle u, \mathrm{div}\, p \rangle) = -\mathrm{div}\, p    (211)

\partial_u \left( \frac{\lambda}{2}\|Au-g\|_2^2 \right) = \lambda \left( A^\top A u - A^\top g \right)    (212)

\partial_u \delta_P(p) = 0    (213)

\Rightarrow \partial_u E(u,p) = -\mathrm{div}\, p + \lambda \left( A^\top A u - A^\top g \right)    (214)

The derivative of the data term follows from expanding the quadratic:

\partial_u \|Au - g\|_2^2 = \partial_u (Au - g)^\top (Au - g)    (215)

(Au - g)^\top (Au - g) = ((Au)^\top - g^\top)(Au - g)    (216)

= (u^\top A^\top - g^\top)(Au - g)    (217)

= u^\top A^\top A u - u^\top A^\top g - g^\top A u + g^\top g    (218)

Let us say B = A^\top A.    (219)

\partial_u u^\top B u = (B + B^\top) u    (220)

\partial_u u^\top A^\top A u = (A^\top A + (A^\top A)^\top) u    (221)

= (A^\top A + A^\top A) u    (222)

= 2 A^\top A u    (223)

3. Use simple gradient ascent/descent:

\frac{p^{n+1} - p^n}{\sigma} = \partial_p E(u,p) = \nabla u^n    (224)

p^{n+1} = p^n + \sigma \nabla u^n    (225)

p^{n+1} = \frac{p^n + \sigma \nabla u^n}{\max(1, |p^n + \sigma \nabla u^n|)}    (226)

\frac{u^n - u^{n+1}}{\tau} = \partial_u E(u,p) = -\mathrm{div}\, p^{n+1} + \lambda \left( A^\top A u^{n+1} - A^\top g \right)    (227)

u^{n+1} + \tau \lambda A^\top A u^{n+1} = u^n + \tau\, \mathrm{div}\, p^{n+1} + \tau \lambda A^\top g    (228)

u^{n+1} \left( I + \tau \lambda A^\top A \right) = u^n + \tau\, \mathrm{div}\, p^{n+1} + \tau \lambda A^\top g    (229)

u^{n+1} = \left( I + \tau \lambda A^\top A \right)^{-1} \left( u^n + \tau\, \mathrm{div}\, p^{n+1} + \tau \lambda A^\top g \right)    (230)

This requires a matrix inversion, and usually one resorts to Fourier analysis. In some cases the matrix may be singular, because it is generally sparse, and therefore inversion is not a feasible solution. Another alternative is to dualise again, this time with respect to the data term in u, which yields

\min_{u \in X} \max_{p \in P, q \in Q} \ \langle p, \nabla u \rangle + \langle Au - g, q \rangle - \delta_P(p) - \frac{1}{2\lambda} \|q\|^2    (231)

1. Compute the derivative with respect to p, i.e. \partial_p E(u,p,q), which is

\partial_p E(u,p,q) = \nabla u    (232)

2. Compute the derivative with respect to q, i.e. \partial_q E(u,p,q), which is

\partial_q E(u,p,q) = Au - g - \frac{1}{\lambda} q    (233)

3. Compute the derivative with respect to u, i.e. \partial_u E(u,p,q), which is

\partial_u E(u,p,q) = -\mathrm{div}\, p + A^\top q    (234)

4. Use simple gradient ascent/descent:

\frac{p^{n+1} - p^n}{\sigma_p} = \partial_p E(u,p,q) = \nabla u^n    (235)

p^{n+1} = p^n + \sigma_p \nabla u^n    (236)

p^{n+1} = \frac{p^n + \sigma_p \nabla u^n}{\max(1, |p^n + \sigma_p \nabla u^n|)}    (237)

\frac{q^{n+1} - q^n}{\sigma_q} = \partial_q E(u,p,q) = A u^n - g - \frac{1}{\lambda} q^{n+1}    (238)

q^{n+1} = \frac{q^n + \sigma_q A u^n - \sigma_q g}{1 + \frac{\sigma_q}{\lambda}}    (239)

\frac{u^n - u^{n+1}}{\tau} = \partial_u E(u,p,q) = -\mathrm{div}\, p^{n+1} + A^\top q^{n+1}    (240)

u^{n+1} = u^n + \tau\, \mathrm{div}\, p^{n+1} - \tau A^\top q^{n+1}    (241)

This saves the matrix inversion.

Interesting tip Imagine we have a function of the form

E = (h * u - f)^2    (242)

where the * operator denotes convolution. If one wants to take the derivative with respect to u, one can make use of the fact that h * u can be expressed as a linear function Du of u, where D is a sparse matrix (see Eqn. 76 in [2]). Rewriting the equation we can derive

E = (Du - f)^2 = (Du - f)^\top (Du - f)    (243)

Now it is very easy to see the derivative of this function with respect to u:

\frac{\partial E}{\partial u} = 2 D^\top (Du - f)    (244)

Du = h * u    (245)

D^\top (Du - f) = \tilde{h} * (h * u - f)    (246)

where \tilde{h} is the mirrored kernel, i.e. \tilde{h} \equiv h(-x).
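A quick numerical check of this tip in 1-D: the adjoint of a zero-padded, 'same'-size convolution with h is convolution with the mirrored kernel \tilde{h}. The helper below and its boundary convention are our own illustrative choices (an odd-length kernel is assumed).

```python
import numpy as np

def conv_same_zero(x, k):
    """'same'-size 1-D convolution with zero padding (odd-length kernel assumed)."""
    pad = len(k) // 2
    return np.convolve(np.pad(x, pad), k, mode="valid")[: len(x)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u, v = rng.standard_normal((2, 64))
    h = rng.standard_normal(5)
    # <h*u, v> == <u, h~*v> where h~ is the mirrored kernel, matching Eqn. (246)
    print(np.isclose(conv_same_zero(u, h) @ v, u @ conv_same_zero(v, h[::-1])))
```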

8.2 Optic Flow

Optic flow was popularised by Horn and Schunck's seminal paper [4], which over the following two decades sparked great interest in minimising the energy function associated with computing optic flow, in its various different formulations [5][6][9]. Writing the standard L1-norm based optic flow energy,

\min_{u \in X, v \in Y} \ \|\nabla u\|_1 + \|\nabla v\|_1 + \lambda |I_1(\mathbf{x} + \mathbf{f}) - I_2(\mathbf{x})|    (247)

where \mathbf{f} is the flow vector (u, v) at any pixel (x, y) in the image; for brevity the pixel (x, y) is denoted by \mathbf{x}. Substituting p for the dual variable corresponding to \nabla u, q for \nabla v and r for I_1(\mathbf{x}+\mathbf{f}) - I_2(\mathbf{x}), we can rewrite the original energy formulation in its primal-dual form as

\max_{p \in P, q \in Q, r \in R} \ \min_{u \in X, v \in Y} \ \langle p, \nabla u \rangle + \langle q, \nabla v \rangle + \langle r, \lambda(I_1(\mathbf{x}+\mathbf{f}) - I_2(\mathbf{x})) \rangle - \delta_P(p) - \delta_Q(q) - \delta_R(r)    (248)

We have used the same trick for the dual formulation corresponding to I_1(\mathbf{x}+\mathbf{f}) - I_2(\mathbf{x}), subsuming \lambda inside, as we used while writing the dual formulation of the data term in the TV-L1 denoising equation 167. The various derivatives required for gradient ascent/descent can be computed as shown below.

1. Compute the derivative with respect to p, i.e. \partial_p E(u,v,p,q,r), which is

\partial_p E(u,v,p,q,r) = \nabla u    (249)

2. Compute the derivative with respect to q, i.e. \partial_q E(u,v,p,q,r), which is

\partial_q E(u,v,p,q,r) = \nabla v    (250)

3. Compute the derivative with respect to r, i.e. \partial_r E(u,v,p,q,r), which is

\partial_r E(u,v,p,q,r) = \lambda (I_1(\mathbf{x}+\mathbf{f}) - I_2(\mathbf{x}))    (251)

4. Compute the derivative with respect to u, i.e. \partial_u E(u,v,p,q,r), which is

\partial_u E(u,v,p,q,r) = -\mathrm{div}\, p + \partial_u \langle r, \lambda(I_1(\mathbf{x}+\mathbf{f}) - I_2(\mathbf{x})) \rangle    (252)

Linearising around \mathbf{f}_0, we can rewrite the expression involving r as

\langle r, \lambda(I_1(\mathbf{x}+\mathbf{f}_0) - I_2(\mathbf{x}) + (\mathbf{f} - \mathbf{f}_0)^\top [I_x \ I_y]^\top) \rangle    (253)

We can expand the terms involving \mathbf{f} and \mathbf{f}_0 as

I_1(\mathbf{x}+\mathbf{f}_0) - I_2(\mathbf{x}) + (\mathbf{f}-\mathbf{f}_0)^\top [I_x \ I_y]^\top = I_1(\mathbf{x}+\mathbf{f}_0) - I_2(\mathbf{x}) + (u - u_0)^\top I_x + (v - v_0)^\top I_y    (254)

It is then easy to expand the dot-product expression involving r as

\langle r, \lambda(I_1(\mathbf{x}+\mathbf{f}) - I_2(\mathbf{x})) \rangle = \lambda \langle r, I_1(\mathbf{x}+\mathbf{f}_0) - I_2(\mathbf{x}) \rangle + \lambda \langle r, (u - u_0)^\top I_x \rangle + \lambda \langle r, (v - v_0)^\top I_y \rangle    (255)

\langle r, (u - u_0)^\top I_x \rangle = \langle r, I_x (u - u_0) \rangle = r^\top I_x (u - u_0) = \langle I_x^\top r, (u - u_0) \rangle    (256)

Here I_x is a diagonal matrix composed of the image gradients along the x-axis for each pixel, and similarly I_y is composed of the gradients along the y-axis. The derivative of E with respect to u can then be written as

\partial_u E(u,v,p,q,r) = -\mathrm{div}\, p + \lambda I_x^\top r    (257)

5. Compute the derivative with respect to v, i.e. \partial_v E(u,v,p,q,r), which is

\partial_v E(u,v,p,q,r) = -\mathrm{div}\, q + \lambda I_y^\top r    (258)

The gradient ascent/descent equations then follow straightforwardly.

1. Maximise with respect to p:

\frac{p^{n+1} - p^n}{\sigma_p} = \nabla u^n    (259)

p^{n+1} = \frac{p^n + \sigma_p \nabla u^n}{\max(1, |p^n + \sigma_p \nabla u^n|)}    (260)

2. Maximise with respect to q:

\frac{q^{n+1} - q^n}{\sigma_q} = \nabla v^n    (261)

q^{n+1} = \frac{q^n + \sigma_q \nabla v^n}{\max(1, |q^n + \sigma_q \nabla v^n|)}    (262)

3. Maximise with respect to r:

\frac{r^{n+1} - r^n}{\sigma_r} = \lambda (I_1(\mathbf{x}+\mathbf{f}^n) - I_2(\mathbf{x}))    (263)

r^{n+1} = \frac{r^n + \sigma_r \lambda (I_1(\mathbf{x}+\mathbf{f}^n) - I_2(\mathbf{x}))}{\max(1, |r^n + \sigma_r \lambda (I_1(\mathbf{x}+\mathbf{f}^n) - I_2(\mathbf{x}))|)}    (264)

4. Minimise with respect to u:

\frac{u^n - u^{n+1}}{\sigma_u} = -\mathrm{div}\, p^{n+1} + \lambda I_x^\top r^{n+1}    (265)

5. Minimise with respect to v:

\frac{v^n - v^{n+1}}{\sigma_v} = -\mathrm{div}\, q^{n+1} + \lambda I_y^\top r^{n+1}    (266)

8.3 Super-Resolution

The formulation was first used in [8]; we describe the minimisation procedure below.

\min_{\hat{u} \in X} \ \lambda h^2 \|\nabla \hat{u}\|_{\epsilon_u} + (\xi h)^2 \sum_{i=1}^{N} \| DBW_i \hat{u} - \check{f}_i \|_{\epsilon_d}    (267)

With \lambda > 0, let us now rewrite the conjugate of \lambda\|\cdot\|:

f^*(p) = \sup_{u} \ \langle p, \nabla u \rangle - \lambda \|\nabla u\|    (268)

f^*(p) = \lambda \sup_{u} \ \left\langle \frac{p}{\lambda}, \nabla u \right\rangle - \|\nabla u\|    (269)

Let us denote p/\lambda = k; then we can write

f^*(p) = \lambda \sup_{u} \ \langle k, \nabla u \rangle - \|\nabla u\|    (270)

f^*(p) = \lambda f^*(k)    (271)

But we know that \sup_{u} \langle k, \nabla u \rangle - \|\nabla u\| is an indicator function defined by

f^*(k) = \begin{cases} 0 & \text{if } \|k\| \leq 1 \\ \infty & \text{otherwise} \end{cases}    (272)

Therefore we can write f^*(p) as

f^*(p) = \begin{cases} 0 & \text{if } \|k\| \leq 1 \\ \infty & \text{otherwise} \end{cases}    (273)

Now, replacing k by p/\lambda, we arrive at the expression

f^*(p) = \begin{cases} 0 & \text{if } \left\| \frac{p}{\lambda} \right\| \leq 1 \\ \infty & \text{otherwise} \end{cases}    (274)

The saddle-point formulation then becomes

\min_{\hat{u} \in X} \ \max_{\hat{p}, \check{q}} \ \langle \hat{p}, \nabla \hat{u} \rangle - \frac{\epsilon_u}{2\lambda h^2} \|\hat{p}\|^2 - \delta_{\{|\hat{p}| \leq \lambda h^2\}} + \sum_{i=1}^{N} \left( \langle \check{q}_i, DBW_i \hat{u} - \check{f}_i \rangle - \frac{\epsilon_d}{2(\xi h)^2} \|\check{q}_i\|^2 - \delta_{\{|\check{q}_i| \leq (\xi h)^2\}} \right)    (275)

Minimisation The minimisation equations can be written as follows.

1. Compute the derivative with respect to \hat{p}, i.e. \partial_{\hat{p}} E(\hat{u},\hat{p},\check{q}_i), which is

\partial_{\hat{p}} E(\hat{u},\hat{p},\check{q}_i) = \partial_{\hat{p}} \left( \langle \hat{p}, \nabla \hat{u} \rangle - \frac{\epsilon_u}{2\lambda h^2} \|\hat{p}\|^2 - \delta_{\{|\hat{p}| \leq \lambda h^2\}} + \sum_{i=1}^{N} \left( \langle \check{q}_i, DBW_i \hat{u} - \check{f}_i \rangle - \frac{\epsilon_d}{2(\xi h)^2} \|\check{q}_i\|^2 - \delta_{\{|\check{q}_i| \leq (\xi h)^2\}} \right) \right)    (276)

\partial_{\hat{p}} \langle \hat{p}, \nabla \hat{u} \rangle = \nabla \hat{u}    (277)

\Rightarrow \partial_{\hat{p}} E(\hat{u},\hat{p},\check{q}_i) = \nabla \hat{u} - \frac{\epsilon_u}{\lambda h^2} \hat{p}    (278)

2. Compute the derivative with respect to \check{q}_i, i.e. \partial_{\check{q}_i} E(\hat{u},\hat{p},\check{q}_i), which is

\partial_{\check{q}_i} E(\hat{u},\hat{p},\check{q}_i) = \partial_{\check{q}_i} \left( \langle \check{q}_i, DBW_i \hat{u} - \check{f}_i \rangle - \frac{\epsilon_d}{2(\xi h)^2} \|\check{q}_i\|^2 \right)    (279)

\Rightarrow \partial_{\check{q}_i} E(\hat{u},\hat{p},\check{q}_i) = (\xi h)^2 (DBW_i \hat{u} - \check{f}_i) - \frac{\epsilon_d}{(\xi h)^2} \check{q}_i    (280)

3. Compute the derivative with respect to \hat{u}, i.e. \partial_{\hat{u}} E(\hat{u},\hat{p},\check{q}_i), which is

\partial_{\hat{u}} E(\hat{u},\hat{p},\check{q}_i) = -\mathrm{div}\, \hat{p} + (\xi h)^2 \sum_{i=1}^{N} \partial_{\hat{u}} \left( \check{q}_i^\top (DBW_i \hat{u} - \check{f}_i) \right)    (281)

\Rightarrow \partial_{\hat{u}} E(\hat{u},\hat{p},\check{q}_i) = -\mathrm{div}\, \hat{p} + (\xi h)^2 \sum_{i=1}^{N} \left( W_i^\top B^\top D^\top \check{q}_i \right)    (282)

4. Use simple gradient ascent/descent:

\frac{\hat{p}^{n+1} - \hat{p}^n}{\sigma_p} = \partial_{\hat{p}} E(\hat{u},\hat{p},\check{q}_i) = h^2 \nabla_h \hat{u}^n - \frac{\epsilon_u}{\lambda h^2} \hat{p}^{n+1}    (283)

\hat{p}^{n+1} - \hat{p}^n = \sigma_p h^2 \nabla_h \hat{u}^n - \sigma_p \frac{\epsilon_u}{\lambda h^2} \hat{p}^{n+1}    (284)

\hat{p}^{n+1} = \frac{\hat{p}^n + \sigma_p h^2 \nabla_h \hat{u}^n}{1 + \sigma_p \frac{\epsilon_u}{\lambda h^2}}    (285)

\hat{p}^{n+1} = \frac{\hat{p}^{n+1}}{\max\!\left(1, \frac{|\hat{p}^{n+1}|}{\lambda h^2}\right)}    (286)

\frac{\check{q}_i^{n+1} - \check{q}_i^n}{\sigma_q} = (\xi h)^2 (DBW_i \hat{u}^n - \check{f}_i) - \frac{\epsilon_d}{(\xi h)^2} \check{q}_i^{n+1}    (287)

\check{q}_i^{n+1} - \check{q}_i^n = \sigma_q (\xi h)^2 (DBW_i \hat{u}^n - \check{f}_i) - \sigma_q \frac{\epsilon_d}{(\xi h)^2} \check{q}_i^{n+1}    (288)

\check{q}_i^{n+1} = \frac{\check{q}_i^n + \sigma_q (\xi h)^2 (DBW_i \hat{u}^n - \check{f}_i)}{1 + \sigma_q \frac{\epsilon_d}{(\xi h)^2}}    (289)

\check{q}_i^{n+1} = \frac{\check{q}_i^{n+1}}{\max\!\left(1, \frac{|\check{q}_i^{n+1}|}{(\xi h)^2}\right)}    (290)

\frac{\hat{u}^n - \hat{u}^{n+1}}{\tau} = \partial_{\hat{u}} E(\hat{u},\hat{p},\check{q}_i) = -\mathrm{div}\, \hat{p}^{n+1} + (\xi h)^2 \sum_{i=1}^{N} \left( W_i^\top B^\top D^\top \check{q}_i^{n+1} \right)    (291)

\hat{u}^{n+1} = \hat{u}^n - \tau \left( -\mathrm{div}\, \hat{p}^{n+1} + (\xi h)^2 \sum_{i=1}^{N} \left( W_i^\top B^\top D^\top \check{q}_i^{n+1} \right) \right)    (292)

8.4 Super Resolution with Joint Flow Estimation

Let us now turn our attention towards doing full joint tracking and super-resolution image reconstruction. Before we derive anything, let us formulate the problem from a Bayesian point of view. We are given the downsampling and blurring operators, and we want to determine the optical flow between the images and reconstruct the super-resolution image at the same time. The posterior probability can be written as

P(\hat{u}, \{\hat{w}_i\}_{i=1}^{N} \mid \{\check{f}_i\}_{i=1}^{N}, D, B)    (293)

Using the standard Bayes rule, we can write this in terms of likelihoods and priors as

P(\hat{u}, \{\hat{w}_i\}_{i=1}^{N} \mid \{\check{f}_i\}_{i=1}^{N}, D, B) \propto \prod_{i=1}^{N} P(\check{f}_i \mid \hat{w}_i, \hat{u}, D, B)\, P(\hat{w}_i, \hat{u}, D, B)    (294)

Under the L1 norm, the negative log-likelihood is our standard super-resolution data term,

-\log P(\check{f}_i \mid \hat{w}_i, \hat{u}, D, B) = \| DB\hat{u}(\mathbf{x} + \hat{w}_i) - \check{f}_i \|    (295)

while P(\hat{w}_i, \hat{u}, D, B) marks our prior for the super-resolution image and the flow. It can be easily simplified under the assumption that the flow prior is independent of the super-resolution prior:

P(\hat{w}_i, \hat{u}, D, B) = P(\hat{u}) \prod_{i=1}^{N} P(\hat{w}_i)    (296)

The priors are standard TV-L1 priors and can be written as

-\log P(\hat{w}_i) = \mu_i \{ \|\nabla \hat{w}_{x_i}\| + \|\nabla \hat{w}_{y_i}\| \}    (297)

and

-\log P(\hat{u}) = \lambda \|\nabla \hat{u}\|    (298)

The combined energy function can be written as

E(\hat{u}, \{\hat{w}_i\}_{i=1}^{N}) = \sum_{i=1}^{N} \| DB\hat{u}(\mathbf{x} + \hat{w}_i) - \check{f}_i \| + \sum_{i=1}^{N} \mu_i \{ \|\nabla \hat{w}_{x_i}\| + \|\nabla \hat{w}_{y_i}\| \} + \lambda \|\nabla \hat{u}\|    (299)

We dualise with respect to each L1 norm and obtain the following expression:

E(\hat{u}, \{\hat{w}_i\}_{i=1}^{N}) = \sum_{i=1}^{N} \langle q_i, DB\hat{u}(\mathbf{x} + \hat{w}_i) - \check{f}_i \rangle + \sum_{i=1}^{N} \mu_i \{ \langle r_{x_i}, \nabla \hat{w}_{x_i} \rangle + \langle r_{y_i}, \nabla \hat{w}_{y_i} \rangle \} + \lambda \langle p, \nabla \hat{u} \rangle    (300)

Optimising with respect to q_i:

\frac{q_i^{n+1} - q_i^n}{\sigma_q} = DB\hat{u}^n(\mathbf{x} + w_i^n) - \check{f}_i    (301)

Optimising with respect to p:

\frac{p^{n+1} - p^n}{\sigma_p} = \nabla u^n    (302)

Optimising with respect to r_{x_i}:

\frac{r_{x_i}^{n+1} - r_{x_i}^n}{\sigma_{r_x}} = \nabla w_{x_i}^n    (303)

Optimising with respect to r_{y_i}:

\frac{r_{y_i}^{n+1} - r_{y_i}^n}{\sigma_{r_y}} = \nabla w_{y_i}^n    (304)

Optimising with respect to \hat{w}_{x_i} The linearisation around the current solution leads to expanding the flow term as

DB\hat{u}(\mathbf{x} + \hat{w}_i^n) - \check{f}_i = DB\hat{u}(\mathbf{x} + \hat{w}_i^{n-1} + d\hat{w}_i^n) - \check{f}_i    (305)-(306)

DB\hat{u}(\mathbf{x} + \hat{w}_i^{n-1} + d\hat{w}_i^n) - \check{f}_i = DB\{ \hat{u}(\mathbf{x} + \hat{w}_i^{n-1}) + \partial_x \hat{u}(\mathbf{x} + \hat{w}_i^{n-1})\, d\hat{w}_{x_i}^n + \partial_y \hat{u}(\mathbf{x} + \hat{w}_i^{n-1})\, d\hat{w}_{y_i}^n \} - \check{f}_i    (307)

Replacing d\hat{w}_{x_i}^n and d\hat{w}_{y_i}^n by \hat{w}_{x_i}^n - \hat{w}_{x_i}^{n-1} and \hat{w}_{y_i}^n - \hat{w}_{y_i}^{n-1} respectively, we can rewrite the above equation as

DB\hat{u}(\mathbf{x} + \hat{w}_i^{n-1} + d\hat{w}_i^n) - \check{f}_i = DB\{ \hat{u}(\mathbf{x} + \hat{w}_i^{n-1}) + \partial_x \hat{u}(\mathbf{x} + \hat{w}_i^{n-1})(\hat{w}_{x_i}^n - \hat{w}_{x_i}^{n-1}) + \partial_y \hat{u}(\mathbf{x} + \hat{w}_i^{n-1})(\hat{w}_{y_i}^n - \hat{w}_{y_i}^{n-1}) \} - \check{f}_i    (308)

Treating \hat{w}_i^{n-1} as constant, we can now minimise the energy function with respect to \hat{w}_{x_i} and \hat{w}_{y_i} respectively. The obtained update equations can be written as

\frac{\hat{w}_{x_i}^n - \hat{w}_{x_i}^{n-1}}{\sigma_w} = \partial_{\hat{w}_{x_i}} \left( \langle q_i, DB\{\hat{u}(\mathbf{x}+\hat{w}_i^{n-1}) + \partial_x \hat{u}(\mathbf{x}+\hat{w}_i^{n-1})(\hat{w}_{x_i}^n - \hat{w}_{x_i}^{n-1})\} - \check{f}_i \rangle + \langle r_{x_i}, \nabla \hat{w}_{x_i} \rangle + \langle r_{y_i}, \nabla \hat{w}_{y_i} \rangle \right)    (309)

\frac{\hat{w}_{x_i}^n - \hat{w}_{x_i}^{n-1}}{\sigma_w} = I_x^\top B^\top D^\top q_i^n - \mathrm{div}\, r_{x_i}^n    (310)

or

\frac{\hat{w}_{x_i}^{n+1} - \hat{w}_{x_i}^n}{\sigma_w} = I_x^\top B^\top D^\top q_i^{n+1} - \mathrm{div}\, r_{x_i}^{n+1}    (311)

I_x = \mathrm{diag}\!\left( \partial_x (\hat{u}(\mathbf{x} + \hat{w}_i^n)) \right)    (312)

Optimising with respect to \hat{w}_{y_i} A similar optimisation scheme with respect to \hat{w}_{y_i} yields an analogous update equation:

\frac{\hat{w}_{y_i}^{n+1} - \hat{w}_{y_i}^n}{\sigma_w} = I_y^\top B^\top D^\top q_i^{n+1} - \mathrm{div}\, r_{y_i}^{n+1}    (313)

I_y = \mathrm{diag}\!\left( \partial_y (\hat{u}(\mathbf{x} + \hat{w}_i^n)) \right)    (314)

Optimisations with respect to q_i, w_{x_i}, w_{y_i}, r_{x_i} and r_{y_i} are done in a coarse-to-fine pyramid fashion.

Optimising with respect to \hat{u} Given the current solution for \hat{w}_i^n, we can write \hat{u}(\mathbf{x} + \hat{w}_i^n) as a linear function of \hat{u} by multiplying it with the warping matrix W_i^n, i.e. W_i^n \hat{u}:

\frac{\hat{u}^{n+1} - \hat{u}^n}{\sigma_{\hat{u}}} = -\left( \sum_{i=1}^{N} (W_i^{(n+1)})^\top B^\top D^\top q_i^{n+1} - \mathrm{div}\, p^{n+1} \right)    (315)

9 Setting the step sizes

The constants \tau and \sigma are usually very easy to set if the operator K in the equation

\min_{x \in X} F(Kx) + G(x)    (316)

is a simple operator, in which case \tau and \sigma can be found from the constraint \tau \sigma L^2 \leq 1, where L is the operator norm, i.e. \|K\|. For instance, if we look at our ROF model,

\min_{u \in X} \|\nabla u\|_1 + \frac{\lambda}{2} \|u - g\|_2^2    (317)

and dualise it, using p as a dual to \nabla u and q as a dual to u - g, we can reduce it to its dual form

\min_{u \in X} \max_{p \in P} \max_{q \in Q} \ \langle p, \nabla u \rangle - \delta_P(p) + \langle q, u - f \rangle - \frac{1}{2\lambda} \|q\|^2    (318)

Let us denote by y the single dual variable obtained by concatenating p and q, simply to put this expression in the form of Eqn. 94 (treating our u here as x there). We can then rewrite the expression very simply in x and y as

\min_{x \in X} \max_{y \in Y} \ \langle Kx, y \rangle - F^*(y) + G(x)    (319)

where our K now is

K = \begin{pmatrix} \nabla \\ I \end{pmatrix}, \quad x = u, \quad y = \begin{pmatrix} p \\ q \end{pmatrix}, \quad F^*(y) = \delta_P(p) + \langle q, f \rangle + \frac{1}{2\lambda} q^\top q, \quad G(x) = 0    (320)

It is easy to see that if K has a simple form, we can write a closed-form solution for the norm of K. However, if K has a more complicated structure, e.g. in the case of deblurring or super-resolution, K has different entries in each row and it is hard to come up with a closed-form expression for \|K\|, in which case one would like to know how to set \tau and \sigma so that we can still carry out the succession of iterations for the variables involved in the minimisation. A formulation from Pock et al. [10] describes a way to set \tau and \sigma such that the optimality condition for convergence still holds. It is

\tau_j = \frac{1}{\sum_{i=1}^{N} |K_{ij}|^{2-\alpha}} \quad \text{and} \quad \sigma_i = \frac{1}{\sum_{j=1}^{M} |K_{ij}|^{\alpha}}    (321)

where generally \alpha = 1. A minimal sketch of this rule is given below.
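This sketch computes the preconditioned step sizes of Eqn. (321) for a dense matrix K; in practice K is sparse and the row/column sums would be computed without forming K explicitly, so the code is purely illustrative.

```python
import numpy as np

def diagonal_preconditioners(K, alpha=1.0):
    """Step sizes from Eqn. (321): tau_j = 1 / sum_i |K_ij|^(2-alpha),
    sigma_i = 1 / sum_j |K_ij|^alpha, for a dense matrix K (illustrative only)."""
    absK = np.abs(K)
    tau = 1.0 / np.maximum(np.sum(absK**(2.0 - alpha), axis=0), 1e-12)   # one per column / primal variable
    sigma = 1.0 / np.maximum(np.sum(absK**alpha, axis=1), 1e-12)         # one per row / dual variable
    return tau, sigma

if __name__ == "__main__":
    K = np.array([[1.0, -1.0, 0.0],
                  [0.0,  1.0, -1.0]])
    print(diagonal_preconditioners(K))
```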

10 When to and when not to use Duality: What price do we pay on dualising a function?

It may at first seem a bit confusing that adding more variables through duality makes the optimisation quicker. However, expressing any convex function as a combination of simple linear functions in the dual space makes the whole problem easier to handle. Working on the primal and dual problems at the same time brings us closer to the solution very quickly. Being able to switch between min and max in the optimisation means that strong duality holds.

10.1 If the function is convex and differentiable, should we still dualise?

Let us take the example of a function which is convex and differentiable everywhere. We take the ROF model and replace the L1 norm with the standard L2 norm:

E(u, \nabla u) = \min_{u \in X} \|\nabla u\|_2^2 + \frac{\lambda}{2} \|u - g\|_2^2    (322)

If we were to use the standard Euler-Lagrange equations, we would obtain the following update equation,

\frac{u^{n+1} - u^n}{\sigma_u} = -\frac{\partial E(u, \nabla u)}{\partial u}    (323)

where \partial E(u, \nabla u)/\partial u is defined according to the Euler-Lagrange equation as

\frac{\partial E(u, \nabla u)}{\partial u} = \frac{\partial E}{\partial u} - \frac{\partial}{\partial x}\left(\frac{\partial E}{\partial u_x}\right) - \frac{\partial}{\partial y}\left(\frac{\partial E}{\partial u_y}\right)    (324)

where of course \|\nabla u\|^2 is defined as (u_x^2 + u_y^2), with u_x the derivative with respect to x and similarly for y. Writing the gradient descent update step with respect to u, we obtain

\frac{u^{n+1} - u^n}{\sigma_u} = -\left( \lambda(u - f) - 2\frac{\partial}{\partial x}(u_x) - 2\frac{\partial}{\partial y}(u_y) \right)    (325)

It therefore involves the Laplacian, which is nothing but

\nabla^2 u = \frac{\partial u_x}{\partial x} + \frac{\partial u_y}{\partial y} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}    (326)

Therefore our minimisation with respect to u leads to the final gradient step update

\frac{u^{n+1} - u^n}{\sigma_u} = -\left( \lambda(u - f) - 2\nabla^2 u \right)    (327)

It is therefore clear that if we were to use the Euler-Lagrange equations, we still have to compute the derivative of the regulariser in order to update u, and we have to approximate the Laplacian operator with a kernel, which makes the solution at one point dependent on its neighbours. If, however, we dualise the function, the data term and the regulariser term are decoupled. The following is the primal-dual min-max formulation of the same problem:

\min_{u \in X} \max_{p \in P} \max_{q \in Q} \ \langle p, \nabla u \rangle - \frac{1}{2}\|p\|^2 + \langle q, u - f \rangle - \frac{1}{2\lambda}\|q\|^2    (328)

Optimising with respect to p:

\frac{p^{n+1} - p^n}{\sigma_p} = \nabla u^n - p^{n+1}    (329)

p^{n+1} = \frac{p^n + \sigma_p \nabla u^n}{1 + \sigma_p}    (330)

Optimising with respect to q:

\frac{q^{n+1} - q^n}{\sigma_q} = u^n - f - \frac{1}{\lambda} q^{n+1}    (331)

Optimising with respect to u:

\frac{u^n - u^{n+1}}{\sigma_u} = -\mathrm{div}\, p^{n+1} + q^{n+1}    (332)

It is clear that while updating q we only work on u - f, and while updating p we only work on \nabla u: the primal-dual form decouples the data and regulariser terms, which makes the problem easier to handle. The equations we obtain are systems of linear equations and are pointwise separable, whereas in the Euler-Lagrange case the Laplacian couples the solution at one point to its neighbours.

References

1. Rockafellar, R.T.: Convex Analysis.
2. Matrix Cookbook: http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/3274/pdf/imm3274.pdf
3. http://gpu4vision.icg.tugraz.at/
4. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence, vol 17, pp 185-203, 1981.
5. Baker, S., Matthews, I.: Lucas-Kanade 20 Years On: A Unifying Framework. International Journal of Computer Vision, Volume 56, Number 3, pp 221-255.
6. Zach, C., Pock, T., Bischof, H.: A Duality Based Approach for Realtime TV-L1 Optical Flow. Proceedings of the DAGM conference on Pattern Recognition, 2007.
7. Chambolle, A., Pock, T.: A First-Order Primal-Dual Algorithm for Convex Problems. Journal of Mathematical Imaging and Vision.
8. Unger, M., Pock, T., Werlberger, M., Bischof, H.: A convex approach for variational Super-Resolution.
9. Steinbruecker, F., Pock, T., Cremers, D.: Large Displacement Optical Flow Computation without Warping. International Conference on Computer Vision 2009.
10. Pock, T., Chambolle, A.: Diagonal preconditioning for first order primal-dual algorithms in convex optimization. International Conference on Computer Vision 2011.