
Gradient Methods

May 2005
Preview

• Background
• Steepest Descent
• Conjugate Gradient
Preview

• Background
• Steepest Descent
• Conjugate Gradient
Background

• Motivation
• The gradient notion
• The Wolfe Theorems
Motivation

• The min (max) problem:

  $\min_{x} f(x)$

• But we learned in calculus how to solve that kind of question!
Motivation

• Not exactly.
• Functions: $f: \mathbb{R}^n \to \mathbb{R}$
• High-order polynomials, e.g.

  $x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$

• What about functions that don't have an analytic representation: a "black box"?
Motivation: a "real world" problem

• Connectivity shapes (Isenburg, Gumhold, Gotsman)

  $\text{mesh} = \{\, C = (V, E), \ \text{geometry} \,\}$

• What do we get from C alone, without the geometry?
Motivation: a "real world" problem

• First we introduce error functionals and then try to minimize them:

  $E_s\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{(i,j) \in E} \left( \lVert x_i - x_j \rVert - 1 \right)^2$

  $E_r\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{i=1}^{n} \lVert L(x_i) \rVert^2$

  where

  $L(x_i) = \frac{1}{d_i} \sum_{(i,j) \in E} \left( x_j - x_i \right)$
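A small sketch of how these energies could be evaluated in code (illustrative only; the function names, the dense numpy edge list, and the degree handling are my assumptions, not the authors' implementation):

```python
import numpy as np

def spring_energy(x, edges):
    """E_s: sum over edges (i, j) of (||x_i - x_j|| - 1)^2."""
    edges = np.asarray(edges)
    diffs = x[edges[:, 0]] - x[edges[:, 1]]
    return np.sum((np.linalg.norm(diffs, axis=1) - 1.0) ** 2)

def roughness_energy(x, edges):
    """E_r: sum over vertices of ||L(x_i)||^2, where L(x_i) averages
    the neighbor differences x_j - x_i with the vertex degree d_i."""
    n = x.shape[0]
    L = np.zeros_like(x, dtype=float)
    deg = np.zeros(n)
    for i, j in np.asarray(edges):      # each undirected edge contributes both ways
        L[i] += x[j] - x[i]
        L[j] += x[i] - x[j]
        deg[i] += 1
        deg[j] += 1
    L /= np.maximum(deg, 1)[:, None]    # divide by the vertex degree d_i
    return np.sum(L ** 2)
```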
Motivation: a "real world" problem

• Then we minimize:

  $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \left[ (1 - \lambda)\, E_s(x) + \lambda\, E_r(x) \right]$

• A high-dimensional, non-linear problem.
• The authors use the conjugate gradient method, which is perhaps the most popular optimization technique, based on what we will see here.
Motivation: a "real world" problem

• Changing the parameter $\lambda$:

  $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \left[ (1 - \lambda)\, E_s(x) + \lambda\, E_r(x) \right]$
Motivation

• General problem: find the global min (max).
• This lecture will concentrate on finding a local minimum.
Background

• Motivation
• The gradient notion
• The Wolfe Theorems
1  1 
f := ( x , y )cos x  cos y  x
2  2 
Directional Derivatives:
first, the one-dimensional derivative:


Directional Derivatives:
Along the Axes…

$\frac{\partial f(x, y)}{\partial y} \qquad \frac{\partial f(x, y)}{\partial x}$
Directional Derivatives:
In a general direction…

$v \in \mathbb{R}^2, \quad \lVert v \rVert = 1$

$\frac{\partial f(x, y)}{\partial v}$
Directional Derivatives

$\frac{\partial f(x, y)}{\partial y} \qquad \frac{\partial f(x, y)}{\partial x}$
The Gradient: Definition in $\mathbb{R}^2$

$f: \mathbb{R}^2 \to \mathbb{R}$

$\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$

In the plane: $\nabla f(x, y)$
The Gradient: Definition

f :R R
n

 f f 
f ( x1 ,..., xn ) :  ,..., 
 x1 xn 
The Gradient Properties

• The gradient defines the (hyper)plane that approximates the function infinitesimally:

  $\Delta z = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y$
The Gradient Properties

• By the chain rule (important for later use): for $\lVert v \rVert = 1$,

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f_p, v \rangle$
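A quick numerical sanity check of this identity (illustrative; the test function is arbitrary and the finite-difference step is a common default):

```python
import numpy as np

# Directional derivative along a unit vector v should equal <grad f(p), v>.
f = lambda x: np.cos(0.5 * x[0]) + np.cos(0.5 * x[1]) + x[0]     # arbitrary smooth test function
grad_f = lambda x: np.array([-0.5 * np.sin(0.5 * x[0]) + 1.0,
                             -0.5 * np.sin(0.5 * x[1])])

p = np.array([1.0, -2.0])
v = np.array([3.0, 4.0]); v /= np.linalg.norm(v)                 # unit direction

h = 1e-6
central_diff = (f(p + h * v) - f(p - h * v)) / (2 * h)           # numerical df/dv at p
print(central_diff, grad_f(p) @ v)                               # nearly identical
```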
The Gradient Properties

• Proposition 1: $\frac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \frac{1}{\lVert \nabla f_p \rVert} \nabla f_p$, and minimal when choosing $v = -\frac{1}{\lVert \nabla f_p \rVert} \nabla f_p$.

  (Intuitively: the gradient points in the direction of greatest change.)
The Gradient Properties

Proof (only for the minimum case):

Assign $v = -\frac{1}{\lVert \nabla f_p \rVert} \nabla f_p$. By the chain rule:

$\frac{\partial f}{\partial v}(p) = \left\langle (\nabla f)_p,\; -\frac{1}{\lVert (\nabla f)_p \rVert} (\nabla f)_p \right\rangle = -\frac{1}{\lVert \nabla f_p \rVert} \langle \nabla f_p, \nabla f_p \rangle = -\frac{\lVert \nabla f_p \rVert^2}{\lVert \nabla f_p \rVert} = -\lVert \nabla f_p \rVert$
The Gradient Properties

On the other hand, for a general unit vector $v$, by Cauchy–Schwarz:

$\frac{\partial f}{\partial v}(p) = \langle \nabla f_p, v \rangle \ge -\lVert \nabla f_p \rVert \cdot \lVert v \rVert = -\lVert \nabla f_p \rVert$

$\Rightarrow \quad \frac{\partial f}{\partial v}(p) \ge -\lVert \nabla f_p \rVert$
The Gradient Properties

• Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a smooth $C^1$ function around $p$. If $f$ has a local minimum (maximum) at $p$, then

  $\nabla f_p = 0$

  (Intuitively: a necessary condition for a local min (max).)
The Gradient Properties

Proof:
Intuitive: at a local minimum the function cannot decrease in any direction, so no directional derivative can be negative.
The Gradient Properties

Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get

$0 = \frac{d f(p + t \cdot v)}{dt}(0) = \langle (\nabla f)_p, v \rangle$

$\Rightarrow \quad (\nabla f)_p = 0$
The Gradient Properties

• We found the best INFINITESIMAL DIRECTION at each point.
• Looking for a minimum: a "blind man" procedure.
• How can we derive the way to the minimum using this knowledge?
Background

• Motivation
• The gradient notion
• The Wolfe Theorems
The Wolfe Theorem

• This is the link from the previous gradient properties to a constructive algorithm.
• The problem:

  $\min_{x} f(x)$
The Wolfe Theorem

• We introduce a model for the algorithm (a code sketch follows below):

  Data: $x_0 \in \mathbb{R}^n$
  Step 0: set $i = 0$
  Step 1: if $\nabla f(x_i) = 0$, stop; else, compute a search direction $h_i \in \mathbb{R}^n$
  Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$
  Step 3: set $x_{i+1} = x_i + \lambda_i \cdot h_i$ and go to Step 1
The Wolfe Theorem

The Theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function

$k: \mathbb{R}^n \to [0, 1]$

such that

$\forall x: \quad \nabla f(x) \ne 0 \;\Rightarrow\; k(x) > 0,$

and the search vectors constructed by the model algorithm satisfy:

$\langle -\nabla f(x_i), h_i \rangle \;\ge\; k(x_i) \cdot \lVert \nabla f(x_i) \rVert \cdot \lVert h_i \rVert$
The Wolfe Theorem

And f ( y )  0  hi  0
Then {ifxi }i0 is the sequence constructed by
the algorithm model,
then any accumulation point y of this sequence
satisfy:
f ( y )  0
The Wolfe Theorem

• The theorem has a very intuitive interpretation: always go in a descent direction.
Preview

• Background
• Steepest Descent
• Conjugate Gradient
Steepest Descent

• What does it mean?
• We now use what we have learned to implement the most basic minimization technique.
• First we introduce the algorithm, which is a special case of the model algorithm.
• The problem:

  $\min_{x} f(x)$
Steepest Descent

• Steepest descent algorithm (a code sketch follows below):

  Data: $x_0 \in \mathbb{R}^n$
  Step 0: set $i = 0$
  Step 1: if $\nabla f(x_i) = 0$, stop; else, compute the search direction $h_i = -\nabla f(x_i)$
  Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$
  Step 3: set $x_{i+1} = x_i + \lambda_i \cdot h_i$ and go to Step 1
Steepest Descent

• Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies:

  $\nabla f(y) = 0$

• Proof: from the Wolfe theorem (with $k(x) \equiv 1$, since $h_i = -\nabla f(x_i)$).

• Remark: the Wolfe theorem gives us numerical stability if the derivatives aren't given (i.e., they are calculated numerically).
Steepest Descent

• From the chain rule, at the minimizing step size:

  $\frac{d}{d\lambda} f(x_i + \lambda \cdot h_i) = \langle \nabla f(x_i + \lambda \cdot h_i), h_i \rangle = 0$

• Therefore the method of steepest descent looks like this:
Steepest Descent

• Steepest descent finds a critical point and a local minimum.
• Implicit step-size rule.
• Actually we reduced the problem to finding the minimum of a one-dimensional function:

  $f: \mathbb{R} \to \mathbb{R}$

• There are extensions that give the step-size rule in a discrete sense (Armijo); see the sketch below.
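A minimal sketch of the Armijo (backtracking) step-size rule (the parameters beta, sigma, and the initial step are common illustrative defaults, not values from the slides):

```python
import numpy as np

def armijo_step(f, grad, x, h, beta=0.5, sigma=1e-4, lam0=1.0, max_halvings=50):
    """Shrink the step until a sufficient-decrease condition holds,
    instead of solving the exact 1-D minimization."""
    g = grad(x)
    lam = lam0
    for _ in range(max_halvings):
        if f(x + lam * h) <= f(x) + sigma * lam * (g @ h):   # Armijo condition
            break
        lam *= beta                                          # backtrack
    return lam
```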
Steepest Descent

• Back to our connectivity shapes: the authors solve the one-dimensional problem analytically:

  $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$

• They change the spring energy and get a quartic polynomial in x (hence a quartic in $\lambda$ along the search line; a sketch follows below):

  $E_s\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{(i,j) \in E} \left( \lVert x_i - x_j \rVert^2 - 1 \right)^2$
Preview

• Background
• Steepest Descent
• Conjugate Gradient
Conjugate Gradient

• From now on we assume that we want to minimize the quadratic function:

  $f(x) = \frac{1}{2} x^T A x - b^T x + c$

• This is equivalent to solving the linear system:

  $0 = \nabla f(x) = A x - b$

• There are generalizations to general functions.
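A small numeric illustration of this equivalence (illustrative; the random SPD matrix is my own test setup):

```python
import numpy as np

# For SPD A, the minimizer of f(x) = 0.5 x^T A x - b^T x is the solution of Ax = b.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)             # symmetric positive definite
b = rng.standard_normal(5)

x_star = np.linalg.solve(A, b)          # solve the linear system
print(np.linalg.norm(A @ x_star - b))   # gradient of f at x_star, ~ 0
```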


Conjugate Gradient

• What is the problem with steepest descent?
• We can repeat the same directions over and over…
• Conjugate gradient takes at most n steps.
Conjugate Gradient

Search directions $d_0, d_1, \ldots, d_j, \ldots$ should span $\mathbb{R}^n$:

$x_{i+1} = x_i + \alpha_i d_i$

Let $\tilde{x}$ denote the solution of $A\tilde{x} = b$, and let $e_i = x_i - \tilde{x}$ be the error. Then:

$\nabla f(x) = Ax - b = Ax - A\tilde{x}$

$\nabla f(x_i) = A(x_i - \tilde{x}) = A e_i$
Conjugate Gradient

Given $d_i$, how do we calculate $\alpha_i$? (As before, the exact line search makes the new gradient orthogonal to the direction just used.)

$d_i^T \nabla f(x_{i+1}) = 0$

$d_i^T A e_{i+1} = 0$

$d_i^T A (e_i + \alpha_i d_i) = 0$

$\alpha_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$
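A quick numeric check of that orthogonality (illustrative test data):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)            # SPD
b = rng.standard_normal(4)

x = rng.standard_normal(4)
d = rng.standard_normal(4)             # an arbitrary search direction
g = A @ x - b                          # gradient of f at x
alpha = -(d @ g) / (d @ A @ d)         # exact step size along d
g_next = A @ (x + alpha * d) - b       # gradient after the step
print(d @ g_next)                      # ~ 0: orthogonal to the old direction
```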
Conjugate Gradient

How do we find the $d_j$? We want the error to be 0 after n steps, so expand the initial error in the (yet unknown) basis of search directions:

$e_0 = \sum_{i=0}^{n-1} \delta_i d_i$

$e_1 = e_0 + \alpha_0 d_0, \quad e_2 = e_0 + \alpha_0 d_0 + \alpha_1 d_1, \quad \ldots, \quad e_j = e_0 + \sum_{i=0}^{j-1} \alpha_i d_i$

$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i$
Conjugate Gradient

Here is the idea: if $\delta_j = -\alpha_j$, then:

$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i = \sum_{i=0}^{n-1} \delta_i d_i - \sum_{i=0}^{j-1} \delta_i d_i = \sum_{i=j}^{n-1} \delta_i d_i$

So if $j = n$:

$e_n = 0$
Conjugate Gradient

So we look for $d_j$ such that $\delta_j = -\alpha_j$. A simple calculation shows that this holds if we take the directions to be A-conjugate (A-orthogonal):

$d_j^T A d_i = 0 \quad \text{for } i \ne j$
Conjugate Gradient

• We have to find an A-conjugate basis $d_j$, $j = 0, \ldots, n-1$.
• We can do a "Gram-Schmidt" process, but we should be careful since it is an O(n³) process (a sketch follows below): given some series of vectors $u_1, u_2, \ldots, u_n$,

  $d_i = u_i + \sum_{k=0}^{i-1} \beta_{i,k} d_k$
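A sketch of this conjugate ("A-orthogonal") Gram-Schmidt construction, assuming the standard coefficient choice $\beta_{i,k} = -\frac{u_i^T A d_k}{d_k^T A d_k}$ (the coefficient formula is not on the slide, but it is the usual one that makes each new direction A-orthogonal to the previous ones):

```python
import numpy as np

def conjugate_gram_schmidt(U, A):
    """A-orthogonalize the columns of U: each d_i is u_i plus a combination of
    the previous directions, chosen so that d_j^T A d_i = 0 for j < i.
    Revisiting all previous directions is what makes this O(n^3)."""
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for k in range(i):
            beta = -(U[:, i] @ A @ D[:, k]) / (D[:, k] @ A @ D[:, k])
            d += beta * D[:, k]
        D[:, i] = d
    return D
```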
Conjugate Gradient

• So for an arbitrary choice of $u_i$ we don't gain anything.
• Luckily, we can choose $u_i$ so that the conjugate direction calculation is O(m), where m is the number of non-zero entries in A.
• The correct choice of $u_i$ is:

  $u_i = -\nabla f(x_i)$
Conjugate Gradient
• So the conjugate gradient algorithm for minimizing f (a code sketch follows below):

  Data: $x_0 \in \mathbb{R}^n$
  Step 0: $d_0 = r_0 := -\nabla f(x_0)$
  Step 1: $\alpha_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$, where $r_i := -\nabla f(x_i)$
  Step 2: $x_{i+1} = x_i + \alpha_i d_i$
  Step 3: $\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$
  Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat n times.
