Gradient Methods
May 2005
Preview
Background
Steepest Descent
Conjugate Gradient
Background
Motivation
The gradient notion
The Wolfe Theorems
Motivation
$\min_{x} f(x)$
But we learned in calculus how to solve that kind of question!
Motivation
Not exactly.
Functions: $f : \mathbb{R}^n \to \mathbb{R}$
High order polynomials, e.g.
$x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$
What about functions that don't have an analytic presentation: a "Black Box"?
Motivation: a "real world" problem
A graph-drawing energy over vertex positions $x_1, \dots, x_n \in \mathbb{R}^3$:
$E_s(x) = \sum_{(i,j) \in E} \left( \| x_i - x_j \| - 1 \right)^2$
$E_r(x) = \sum_{i=1}^{n} \| L(x_i) \|^2$, where $L(x_i) = \frac{1}{d_i} \sum_{(i,j) \in E} (x_j - x_i)$
(here $d_i$ is the degree of vertex $i$).
Motivation: a "real world" problem
Then we minimize:
$\min_{x \in \mathbb{R}^{3n}} E(x) = E_s(x) + E_r(x)$
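As a concrete sketch, these energies could be evaluated with NumPy roughly as follows (the edge-list format, the function names, and the assumption that every vertex has at least one incident edge are all illustrative, not part of the original):

```python
import numpy as np

def spring_energy(x, edges):
    """E_s: sum over edges of (||x_i - x_j|| - 1)^2."""
    return sum((np.linalg.norm(x[i] - x[j]) - 1.0) ** 2 for i, j in edges)

def laplacian_energy(x, edges):
    """E_r: sum_i ||L(x_i)||^2 with L(x_i) = (1/d_i) sum_{(i,j) in E} (x_j - x_i)."""
    L = np.zeros_like(x)
    deg = np.zeros(len(x))
    for i, j in edges:
        L[i] += x[j] - x[i]
        L[j] += x[i] - x[j]
        deg[i] += 1
        deg[j] += 1
    L /= deg[:, None]          # assumes deg[i] > 0 for every vertex
    return float((L ** 2).sum())

def total_energy(x, edges):
    """The objective to minimize over all positions x in R^{n x 3}."""
    return spring_energy(x, edges) + laplacian_energy(x, edges)
```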
Motivation
The gradient notion
The Wolfe Theorems
Example: $f(x, y) := \cos\left(\tfrac{1}{2}x\right) \cos\left(\tfrac{1}{2}y\right) x$
Directional Derivatives:
first, the one-dimensional derivative.
Directional Derivatives: Along the Axes…
$\frac{\partial f}{\partial x}(x, y), \quad \frac{\partial f}{\partial y}(x, y)$
Directional Derivatives: In a general direction…
$v \in \mathbb{R}^2$, $\| v \| = 1$:
$\frac{\partial f}{\partial v}(x, y)$
Directional Derivatives
$\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$ (plots of the two partial derivatives of the example)
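A small numeric sketch of a directional derivative via central differences (the helper name and the step size h are illustrative assumptions):

```python
import numpy as np

def directional_derivative(f, p, v, h=1e-6):
    """Central-difference estimate of (df/dv)(p) for a direction v."""
    v = v / np.linalg.norm(v)                  # enforce ||v|| = 1
    return (f(p + h * v) - f(p - h * v)) / (2 * h)

# The example function f(x, y) = cos(x/2) cos(y/2) x:
f = lambda p: np.cos(p[0] / 2) * np.cos(p[1] / 2) * p[0]
p = np.array([1.0, 1.0])
print(directional_derivative(f, p, np.array([1.0, 0.0])))  # along the x-axis
print(directional_derivative(f, p, np.array([0.0, 1.0])))  # along the y-axis
```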
The Gradient: Definition in $\mathbb{R}^2$
$f : \mathbb{R}^2 \to \mathbb{R}$
$\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$
In the plane: $\nabla f(x, y)$
The Gradient: Definition
$f : \mathbb{R}^n \to \mathbb{R}$
$\nabla f(x_1, \dots, x_n) := \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right)$
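For a black-box $f$, the gradient can be approximated one coordinate at a time (a sketch; finite differences are a stand-in when no analytic gradient is available):

```python
import numpy as np

def numeric_gradient(f, x, h=1e-6):
    """Central-difference estimate of (df/dx_1, ..., df/dx_n) at x."""
    g = np.zeros_like(x, dtype=float)
    for k in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[k] = h                        # perturb only coordinate k
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g
```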
The Gradient Properties
The gradient defines the (hyper)plane approximating the function infinitesimally:
$\Delta z = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y$
The Gradient Properties
For a unit vector $v$ ($\| v \| = 1$), the directional derivative is an inner product with the gradient:
$\frac{\partial f}{\partial v}(p) = \langle (\nabla f)_p, v \rangle$
The Gradient Properties
Proposition 1:
$\frac{\partial f}{\partial v}(p)$ is maximal when choosing $v = \frac{(\nabla f)_p}{\| (\nabla f)_p \|}$
and minimal when choosing $v = -\frac{(\nabla f)_p}{\| (\nabla f)_p \|}$.
Indeed, with the maximizing choice:
$\frac{\partial f}{\partial v}(p) = \left\langle (\nabla f)_p, \frac{(\nabla f)_p}{\| (\nabla f)_p \|} \right\rangle = \frac{1}{\| (\nabla f)_p \|} \langle (\nabla f)_p, (\nabla f)_p \rangle = \| (\nabla f)_p \|$
and by Cauchy-Schwarz no unit vector gives a larger value.
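Proposition 1 is easy to check numerically (a sketch with a made-up gradient vector): among many random unit vectors, $\langle g, v \rangle$ never exceeds $\| g \|$ and approaches it as $v$ aligns with $g$.

```python
import numpy as np

rng = np.random.default_rng(0)
g = np.array([3.0, -4.0])                        # stand-in for (grad f)_p
vs = rng.normal(size=(10_000, 2))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)  # normalize to unit vectors
print(np.max(vs @ g), np.linalg.norm(g))         # max inner product ~ ||g|| = 5
```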
The Gradient Properties
Proposition 2: if $f$ is differentiable at $p$ and has a local minimum (or maximum) there, then $(\nabla f)_p = 0$.
Proof (intuitive): along any direction $v$, the one-dimensional function $t \mapsto f(p + tv)$ has an extremum at $t = 0$. We get:
$0 = \frac{d f(p + tv)}{dt}(0) = \langle (\nabla f)_p, v \rangle$
Since this holds for every $v$, it follows that $(\nabla f)_p = 0$.
Motivation
The gradient notion
The Wolfe Theorems
The Wolfe Theorem
The problem: $\min_{x} f(x)$
The Wolfe Theorem
And, suppose there is a continuous function $k$ such that
$\forall x : \nabla f(x) \neq 0 \Rightarrow k(x) > 0$
And, the search vectors constructed by the model algorithm satisfy:
$\langle -\nabla f(x_i), h_i \rangle \geq k(x_i) \, \| \nabla f(x_i) \| \, \| h_i \|$
The Wolfe Theorem
And $\nabla f(x_i) \neq 0 \Rightarrow h_i \neq 0$.
Then, if $\{ x_i \}_{i=0}^{\infty}$ is the sequence constructed by the model algorithm,
any accumulation point $y$ of this sequence satisfies:
$\nabla f(y) = 0$
The Wolfe Theorem
In particular, the hypotheses hold for the steepest descent choice $h_i = -\nabla f(x_i)$.
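A sketch of the model algorithm as a generic descent loop (the backtracking rule, the constant k, and all names here are assumptions for illustration; the theorem's angle condition appears as the assert):

```python
import numpy as np

def model_algorithm(f, grad, x0, choose_h, k=0.5, tol=1e-8, max_iter=10_000):
    """Iterate x_{i+1} = x_i + t_i h_i until the gradient (nearly) vanishes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break                # any accumulation point has grad f = 0
        h = choose_h(x, g)
        # the theorem's condition: <-grad f(x_i), h_i> >= k ||grad f|| ||h||
        assert -g @ h >= k * np.linalg.norm(g) * np.linalg.norm(h)
        t = 1.0
        for _ in range(60):      # crude backtracking line search
            if f(x + t * h) < f(x):
                break
            t *= 0.5
        x = x + t * h
    return x
```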
Preview
Background
Steepest Descent
Conjugate Gradient
Steepest Descent
What does it mean?
We now use what we have learned to implement the most basic minimization technique.
First we introduce the algorithm, which is a version of the model algorithm.
The problem: $\min_{x} f(x)$
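A minimal sketch of steepest descent, i.e. the model algorithm with $h_i = -\nabla f(x_i)$, applied to the running example (the step-size rule is an illustrative assumption):

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    """Repeatedly move against the gradient, shrinking the step on failure."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        for _ in range(60):                 # shrink until we actually descend
            if f(x - t * g) < f(x):
                break
            t *= 0.5
        x = x - t * g
    return x

# The example f(x, y) = cos(x/2) cos(y/2) x and its analytic gradient:
f = lambda p: np.cos(p[0] / 2) * np.cos(p[1] / 2) * p[0]
grad = lambda p: np.array([
    np.cos(p[1] / 2) * (np.cos(p[0] / 2) - p[0] * np.sin(p[0] / 2) / 2),
    -p[0] * np.cos(p[0] / 2) * np.sin(p[1] / 2) / 2,
])
print(steepest_descent(f, grad, np.array([1.0, 1.0])))
```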
Steepest Descent
Every accumulation point $y$ of the steepest descent sequence satisfies $\nabla f(y) = 0$.
Proof: from the Wolfe theorem, since $h_i = -\nabla f(x_i)$ satisfies the angle condition.
Applied to our "real world" graph energy:
$E_s(x) = \sum_{(i,j) \in E} \left( \| x_i - x_j \| - 1 \right)^2$
Preview
Background
Steepest Descent
Conjugate Gradient
Conjugate Gradient
For the quadratic model $f(x) = \tfrac{1}{2} x^T A x - b^T x$, with error $e_i = x_i - x^*$ (so $\nabla f(x_i) = A e_i$), choose the step size $\alpha_i$ so that the search direction is orthogonal to the next gradient:
$d_i^T \nabla f(x_{i+1}) = 0$
$d_i^T A e_{i+1} = 0$
$d_i^T A (e_i + \alpha_i d_i) = 0$
$\alpha_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$
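A small numeric check of this step size on a made-up quadratic (A, b, x, and d are arbitrary illustrative data): after the step, the new gradient is orthogonal to d.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])     # symmetric positive definite
b = np.array([1.0, -1.0])
grad = lambda x: A @ x - b                  # grad f(x) = A x - b = A e

x = np.zeros(2)                             # current iterate
d = np.array([1.0, 0.5])                    # current search direction
alpha = -(d @ grad(x)) / (d @ A @ d)        # the step size derived above
x1 = x + alpha * d
print(d @ grad(x1))                         # ~0: exact line search achieved
```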
Conjugate Gradient
How do we find the directions $d_j$?
We want the error to be $0$ after $n$ steps, so expand the initial error in the basis of search directions:
$e_0 = \sum_{i=0}^{n-1} \delta_i d_i$
Each step adds $\alpha_i d_i$ to the error:
$e_1 = e_0 + \alpha_0 d_0, \quad e_2 = e_0 + \alpha_0 d_0 + \alpha_1 d_1, \quad \dots, \quad e_j = e_0 + \sum_{i=0}^{j-1} \alpha_i d_i$
Hence:
$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \alpha_i d_i$
Conjugate Gradient
So if the steps satisfy $\alpha_i = -\delta_i$, the first $j$ terms cancel, and for $j = n$:
$e_n = 0$
Conjugate Gradient
Take some series of vectors $u_1, u_2, \dots, u_n$ and build the search directions from them by a Gram-Schmidt-like conjugation:
$d_i = u_i + \sum_{k=0}^{i-1} \beta_{i,k} d_k$
Conjugate Gradient
The resulting algorithm, with residual $r_i := -\nabla f(x_i)$:
Step 0: $d_0 = r_0 := -\nabla f(x_0)$
Step 1: $\alpha_i = \frac{r_i^T r_i}{d_i^T A d_i}$
Step 2: $x_{i+1} = x_i + \alpha_i d_i$
Step 3: $\beta_{i+1} = \frac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$
Step 4: $d_{i+1} = r_{i+1} + \beta_{i+1} d_i$ (the standard direction update, completing the recurrence)
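Putting Steps 0-4 together gives the familiar CG iteration; a compact sketch for the quadratic case (solving $A x = b$, with made-up example data):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """CG for f(x) = 1/2 x^T A x - b^T x, A symmetric positive definite."""
    x = x0.astype(float).copy()
    r = b - A @ x                          # r_0 = -grad f(x_0)
    d = r.copy()                           # Step 0: d_0 = r_0
    for _ in range(len(b)):
        alpha = (r @ r) / (d @ A @ d)      # Step 1
        x = x + alpha * d                  # Step 2
        r_new = r - alpha * (A @ d)        # residual update
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)   # Step 3
        d = r_new + beta * d               # Step 4
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # ~[0.0909, 0.6364] = A^{-1} b
```

In exact arithmetic the loop terminates after at most $n$ iterations, matching the $e_n = 0$ argument above.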