
Gradient Methods

May 2005
Preview

• Background
• Steepest Descent
• Conjugate Gradient
Preview

• Background
• Steepest Descent
• Conjugate Gradient
Background

• Motivation
• The gradient notion
• The Wolfe Theorems
Motivation

• The min (max) problem:

  $\min_{x} f(x)$

• But we learned in calculus how to solve that kind of question!
Motivation

• Not exactly.
• Functions: $f: \mathbb{R}^n \to \mathbb{R}$
• High-order polynomials, e.g.

  $x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$

• What about functions that don't have an analytic representation: a "black box"?
Motivation: a "real world" problem

• Connectivity shapes (Isenburg, Gumhold, Gotsman)

  $\text{mesh} = \{\, C = (V, E), \ \text{geometry} \,\}$

• What do we get from C alone, without the geometry?
Motivation: a "real world" problem

• First we introduce error functionals and then try to minimize them:

  $E_s\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{(i,j) \in E} \left( \lVert x_i - x_j \rVert - 1 \right)^2$

  $E_r\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{i=1}^{n} \lVert L(x_i) \rVert^2$

  where

  $L(x_i) = \frac{1}{d_i} \sum_{(i,j) \in E} \left( x_j - x_i \right)$
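A small sketch of how these energies could be evaluated in code (illustrative only; the function names, the dense numpy edge list, and the degree handling are my assumptions, not the authors' implementation):

```python
import numpy as np

def spring_energy(x, edges):
    """E_s: sum over edges (i, j) of (||x_i - x_j|| - 1)^2."""
    edges = np.asarray(edges)
    diffs = x[edges[:, 0]] - x[edges[:, 1]]
    return np.sum((np.linalg.norm(diffs, axis=1) - 1.0) ** 2)

def roughness_energy(x, edges):
    """E_r: sum over vertices of ||L(x_i)||^2, where L(x_i) averages
    the neighbor differences x_j - x_i with the vertex degree d_i."""
    n = x.shape[0]
    L = np.zeros_like(x, dtype=float)
    deg = np.zeros(n)
    for i, j in np.asarray(edges):      # each undirected edge contributes both ways
        L[i] += x[j] - x[i]
        L[j] += x[i] - x[j]
        deg[i] += 1
        deg[j] += 1
    L /= np.maximum(deg, 1)[:, None]    # divide by the vertex degree d_i
    return np.sum(L ** 2)
```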
Motivation: a "real world" problem

• Then we minimize:

  $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \left[ (1 - \lambda)\, E_s(x) + \lambda\, E_r(x) \right]$

• A high-dimensional, non-linear problem.
• The authors use the conjugate gradient method, which is perhaps the most popular optimization technique, based on what we will see here.
Motivation: a "real world" problem

• Changing the parameter $\lambda$:

  $E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \left[ (1 - \lambda)\, E_s(x) + \lambda\, E_r(x) \right]$
Motivation

• General problem: find the global min (max).
• This lecture will concentrate on finding a local minimum.
Background

• Motivation
• The gradient notion
• The Wolfe Theorems
1  1 
f := ( x , y )cos x  cos y  x
2  2 
Directional Derivatives:
first, the one-dimensional derivative:


Directional Derivatives:
Along the Axes…

$\frac{\partial f(x, y)}{\partial y} \qquad \frac{\partial f(x, y)}{\partial x}$
Directional Derivatives:
In a general direction…

$v \in \mathbb{R}^2, \quad \lVert v \rVert = 1$

$\frac{\partial f(x, y)}{\partial v}$
Directional Derivatives

$\frac{\partial f(x, y)}{\partial y} \qquad \frac{\partial f(x, y)}{\partial x}$
The Gradient: Definition in $\mathbb{R}^2$

$f: \mathbb{R}^2 \to \mathbb{R}$

$\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$

In the plane: $\nabla f(x, y)$
The Gradient: Definition

f :R R
n

 f f 
f ( x1 ,..., xn ) :  ,..., 
 x1 xn 
The Gradient Properties

• The gradient defines the (hyper)plane that approximates the function infinitesimally:

  $\Delta z = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y$
The Gradient Properties

• By the chain rule (important for later use): for $\lVert v \rVert = 1$,

  $\frac{\partial f}{\partial v}(p) = \langle \nabla f_p, v \rangle$
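A quick numerical sanity check of this identity (illustrative; the test function is arbitrary and the finite-difference step is a common default):

```python
import numpy as np

# Directional derivative along a unit vector v should equal <grad f(p), v>.
f = lambda x: np.cos(0.5 * x[0]) + np.cos(0.5 * x[1]) + x[0]     # arbitrary smooth test function
grad_f = lambda x: np.array([-0.5 * np.sin(0.5 * x[0]) + 1.0,
                             -0.5 * np.sin(0.5 * x[1])])

p = np.array([1.0, -2.0])
v = np.array([3.0, 4.0]); v /= np.linalg.norm(v)                 # unit direction

h = 1e-6
central_diff = (f(p + h * v) - f(p - h * v)) / (2 * h)           # numerical df/dv at p
print(central_diff, grad_f(p) @ v)                               # nearly identical
```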
The Gradient Properties

• Proposition 1: $\frac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \frac{1}{\lVert \nabla f_p \rVert} \nabla f_p$, and minimal when choosing $v = -\frac{1}{\lVert \nabla f_p \rVert} \nabla f_p$.

  (Intuitively: the gradient points in the direction of greatest change.)
The Gradient Properties

Proof (only for the minimum case):

Assign $v = -\frac{1}{\lVert \nabla f_p \rVert} \nabla f_p$. By the chain rule:

$\frac{\partial f}{\partial v}(p) = \left\langle (\nabla f)_p,\; -\frac{1}{\lVert (\nabla f)_p \rVert} (\nabla f)_p \right\rangle = -\frac{1}{\lVert \nabla f_p \rVert} \langle \nabla f_p, \nabla f_p \rangle = -\frac{\lVert \nabla f_p \rVert^2}{\lVert \nabla f_p \rVert} = -\lVert \nabla f_p \rVert$
The Gradient Properties

On the other hand, for a general unit vector $v$, by Cauchy–Schwarz:

$\frac{\partial f}{\partial v}(p) = \langle \nabla f_p, v \rangle \ge -\lVert \nabla f_p \rVert \cdot \lVert v \rVert = -\lVert \nabla f_p \rVert$

$\Rightarrow \quad \frac{\partial f}{\partial v}(p) \ge -\lVert \nabla f_p \rVert$
The Gradient Properties

• Proposition 2: let $f: \mathbb{R}^n \to \mathbb{R}$ be a smooth $C^1$ function around $p$. If $f$ has a local minimum (maximum) at $p$, then

  $\nabla f_p = 0$

  (Intuitively: a necessary condition for a local min (max).)
The Gradient Properties

Proof:
Intuitive: at a local minimum the function cannot decrease in any direction, so no directional derivative can be negative.
The Gradient Properties

Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get

$0 = \frac{d f(p + t \cdot v)}{dt}(0) = \langle (\nabla f)_p, v \rangle$

$\Rightarrow \quad (\nabla f)_p = 0$
The Gradient Properties

• We found the best INFINITESIMAL DIRECTION at each point.
• Looking for a minimum: a "blind man" procedure.
• How can we derive the way to the minimum using this knowledge?
Background

• Motivation
• The gradient notion
• The Wolfe Theorems
The Wolfe Theorem

• This is the link from the previous gradient properties to a constructive algorithm.
• The problem:

  $\min_{x} f(x)$
The Wolfe Theorem

• We introduce a model for the algorithm (a code sketch follows below):

  Data: $x_0 \in \mathbb{R}^n$
  Step 0: set $i = 0$
  Step 1: if $\nabla f(x_i) = 0$, stop; else, compute a search direction $h_i \in \mathbb{R}^n$
  Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$
  Step 3: set $x_{i+1} = x_i + \lambda_i \cdot h_i$ and go to Step 1
The Wolfe Theorem

The Theorem: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function

$k: \mathbb{R}^n \to [0, 1]$

such that

$\forall x: \quad \nabla f(x) \ne 0 \;\Rightarrow\; k(x) > 0,$

and the search vectors constructed by the model algorithm satisfy:

$\langle -\nabla f(x_i), h_i \rangle \;\ge\; k(x_i) \cdot \lVert \nabla f(x_i) \rVert \cdot \lVert h_i \rVert$
The Wolfe Theorem

And f ( y )  0  hi  0
Then {ifxi }i0 is the sequence constructed by
the algorithm model,
then any accumulation point y of this sequence
satisfy:
f ( y )  0
The Wolfe Theorem

• The theorem has a very intuitive interpretation: always go in a descent direction.
Preview

• Background
• Steepest Descent
• Conjugate Gradient
Steepest Descent

• What does it mean?
• We now use what we have learned to implement the most basic minimization technique.
• First we introduce the algorithm, which is a special case of the model algorithm.
• The problem:

  $\min_{x} f(x)$
Steepest Descent

• Steepest descent algorithm (a code sketch follows below):

  Data: $x_0 \in \mathbb{R}^n$
  Step 0: set $i = 0$
  Step 1: if $\nabla f(x_i) = 0$, stop; else, compute the search direction $h_i = -\nabla f(x_i)$
  Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$
  Step 3: set $x_{i+1} = x_i + \lambda_i \cdot h_i$ and go to Step 1
Steepest Descent

• Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies:

  $\nabla f(y) = 0$

• Proof: from the Wolfe theorem (with $k(x) \equiv 1$, since $h_i = -\nabla f(x_i)$).

• Remark: the Wolfe theorem gives us numerical stability if the derivatives aren't given (i.e., they are calculated numerically).
Steepest Descent

• From the chain rule, at the minimizing step size:

  $\frac{d}{d\lambda} f(x_i + \lambda \cdot h_i) = \langle \nabla f(x_i + \lambda \cdot h_i), h_i \rangle = 0$

• Therefore the method of steepest descent looks like this:
Steepest Descent

• Steepest descent finds a critical point and a local minimum.
• Implicit step-size rule.
• Actually we reduced the problem to finding the minimum of a one-dimensional function:

  $f: \mathbb{R} \to \mathbb{R}$

• There are extensions that give the step-size rule in a discrete sense (Armijo); see the sketch below.
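A minimal sketch of the Armijo (backtracking) step-size rule (the parameters beta, sigma, and the initial step are common illustrative defaults, not values from the slides):

```python
import numpy as np

def armijo_step(f, grad, x, h, beta=0.5, sigma=1e-4, lam0=1.0, max_halvings=50):
    """Shrink the step until a sufficient-decrease condition holds,
    instead of solving the exact 1-D minimization."""
    g = grad(x)
    lam = lam0
    for _ in range(max_halvings):
        if f(x + lam * h) <= f(x) + sigma * lam * (g @ h):   # Armijo condition
            break
        lam *= beta                                          # backtrack
    return lam
```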
Steepest Descent

• Back to our connectivity shapes: the authors solve the one-dimensional problem analytically:

  $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda \cdot h_i)$

• They change the spring energy and get a quartic polynomial in x (hence a quartic in $\lambda$ along the search line; a sketch follows below):

  $E_s\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{(i,j) \in E} \left( \lVert x_i - x_j \rVert^2 - 1 \right)^2$
Preview

• Background
• Steepest Descent
• Conjugate Gradient
Conjugate Gradient

• From now on we assume that we want to minimize the quadratic function:

  $f(x) = \frac{1}{2} x^T A x - b^T x + c$

• This is equivalent to solving the linear system:

  $0 = \nabla f(x) = A x - b$

• There are generalizations to general functions.
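A small numeric illustration of this equivalence (illustrative; the random SPD matrix is my own test setup):

```python
import numpy as np

# For SPD A, the minimizer of f(x) = 0.5 x^T A x - b^T x is the solution of Ax = b.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)             # symmetric positive definite
b = rng.standard_normal(5)

x_star = np.linalg.solve(A, b)          # solve the linear system
print(np.linalg.norm(A @ x_star - b))   # gradient of f at x_star, ~ 0
```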


Conjugate Gradient

• What is the problem with steepest descent?
• We can repeat the same directions over and over…
• Conjugate gradient takes at most n steps.
Conjugate Gradient

Search directions $d_0, d_1, \ldots, d_j, \ldots$ should span $\mathbb{R}^n$:

$x_{i+1} = x_i + \alpha_i d_i$

Let $\tilde{x}$ denote the solution of $A\tilde{x} = b$, and let $e_i = x_i - \tilde{x}$ be the error. Then:

$\nabla f(x) = Ax - b = Ax - A\tilde{x}$

$\nabla f(x_i) = A(x_i - \tilde{x}) = A e_i$
Conjugate Gradient

Given $d_i$, how do we calculate $\alpha_i$? (As before, the exact line search makes the new gradient orthogonal to the direction just used.)

$d_i^T \nabla f(x_{i+1}) = 0$

$d_i^T A e_{i+1} = 0$

$d_i^T A (e_i + \alpha_i d_i) = 0$

$\alpha_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$
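A quick numeric check of that orthogonality (illustrative test data):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)            # SPD
b = rng.standard_normal(4)

x = rng.standard_normal(4)
d = rng.standard_normal(4)             # an arbitrary search direction
g = A @ x - b                          # gradient of f at x
alpha = -(d @ g) / (d @ A @ d)         # exact step size along d
g_next = A @ (x + alpha * d) - b       # gradient after the step
print(d @ g_next)                      # ~ 0: orthogonal to the old direction
```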
Conjugate Gradient

How do we find the $d_j$? We want the error to be 0 after n steps, so expand the initial error in the (yet unknown) basis of search directions:

$e_0 = \sum_{i=0}^{n-1} \delta_i d_i$

$e_1 = e_0 + \alpha_0 d_0, \quad e_2 = e_0 + \alpha_0 d_0 + \alpha_1 d_1, \quad \ldots, \quad e_j = e_0 + \sum_{i=0}^{j-1} \alpha_i d_i$

$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i$
Conjugate Gradient

Here is the idea: if $\delta_j = -\alpha_j$, then:

$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i = \sum_{i=0}^{n-1} \delta_i d_i - \sum_{i=0}^{j-1} \delta_i d_i = \sum_{i=j}^{n-1} \delta_i d_i$

So if $j = n$:

$e_n = 0$
Conjugate Gradient

So we look for $d_j$ such that $\delta_j = -\alpha_j$. A simple calculation shows that this holds if we take the directions to be A-conjugate (A-orthogonal):

$d_j^T A d_i = 0 \quad \text{for } i \ne j$
Conjugate Gradient

• We have to find an A-conjugate basis $d_j$, $j = 0, \ldots, n-1$.
• We can do a "Gram-Schmidt" process, but we should be careful since it is an O(n³) process (a sketch follows below): given some series of vectors $u_1, u_2, \ldots, u_n$,

  $d_i = u_i + \sum_{k=0}^{i-1} \beta_{i,k} d_k$
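A sketch of this conjugate ("A-orthogonal") Gram-Schmidt construction, assuming the standard coefficient choice $\beta_{i,k} = -\frac{u_i^T A d_k}{d_k^T A d_k}$ (the coefficient formula is not on the slide, but it is the usual one that makes each new direction A-orthogonal to the previous ones):

```python
import numpy as np

def conjugate_gram_schmidt(U, A):
    """A-orthogonalize the columns of U: each d_i is u_i plus a combination of
    the previous directions, chosen so that d_j^T A d_i = 0 for j < i.
    Revisiting all previous directions is what makes this O(n^3)."""
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for k in range(i):
            beta = -(U[:, i] @ A @ D[:, k]) / (D[:, k] @ A @ D[:, k])
            d += beta * D[:, k]
        D[:, i] = d
    return D
```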
Conjugate Gradient

• So for an arbitrary choice of $u_i$ we don't gain anything.
• Luckily, we can choose $u_i$ so that the conjugate direction calculation is O(m), where m is the number of non-zero entries in A.
• The correct choice of $u_i$ is:

  $u_i = -\nabla f(x_i)$
Conjugate Gradient
• So the conjugate gradient algorithm for minimizing f (a code sketch follows below):

  Data: $x_0 \in \mathbb{R}^n$
  Step 0: $d_0 = r_0 := -\nabla f(x_0)$
  Step 1: $\alpha_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$, where $r_i := -\nabla f(x_i)$
  Step 2: $x_{i+1} = x_i + \alpha_i d_i$
  Step 3: $\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$
  Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$, and repeat n times.
