
Optimization

Unconstrained Minimization
Def : f(x), x ∈ R^n, is said to be differentiable at a point x* if it is defined in a
neighborhood N around x* and if there exists a vector a, independent of h, such that

   f(x* + h) = f(x*) + <a, h> + <φ(x*, h), h>

where the vector a is called the gradient of f(x) evaluated at x*, and we denote it as

   a = ∇f(x*) = [ ∂f/∂x1   ∂f/∂x2   ...   ∂f/∂xn ]^T  evaluated at x = x*.
The term <a,h> is called the 1-st variation.



and

   φ(x*, h) = [ φ1(x*, h)   ...   φn(x*, h) ]^T,   with   lim_{h→0} φi(x*, h) = 0,   i = 1, 2, ...., n.
Unconstrained Minimization
Note: if f(x) is twice differentiable, then

   φ(x, h) = (1/2) F(x) h + H.O.T.

where F(x) is an n×n symmetric matrix, called the Hessian of f(x).








Then

   f(x* + h) = f(x*) + <∇f(x*), h> + (1/2) <h, F(x*) h> + H.O.T.

where <∇f(x*), h> is the 1st variation and (1/2)<h, F(x*) h> is the 2nd variation, and

   F(x) = [ ∂²f/∂x1²       ∂²f/∂x1∂x2     ...   ∂²f/∂x1∂xn ]
          [ ∂²f/∂x2∂x1     ∂²f/∂x2²       ...   ∂²f/∂x2∂xn ]
          [    :               :                    :       ]
          [ ∂²f/∂xn∂x1     ∂²f/∂xn∂x2     ...   ∂²f/∂xn²    ]
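As a quick numerical illustration of the expansion above, the short Python sketch below compares f(x* + h) against the first- and second-order approximations for a small h. The sample function, its derivatives, and all names are my own choices for illustration, not from the notes.

```python
import numpy as np

# Illustrative smooth function (not from the notes)
f = lambda x: x[0]**2 + 3*x[0]*x[1] + np.exp(x[1])

def grad(x):
    return np.array([2*x[0] + 3*x[1], 3*x[0] + np.exp(x[1])])

def hessian(x):
    return np.array([[2.0, 3.0],
                     [3.0, np.exp(x[1])]])

x_star = np.array([1.0, 0.5])
h = np.array([1e-2, -2e-2])

first  = f(x_star) + grad(x_star) @ h            # keep only the 1st variation
second = first + 0.5 * h @ hessian(x_star) @ h   # add the 2nd variation

print(f(x_star + h), first, second)   # the second-order estimate is closest
```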
Directional derivatives
Let w be a directional vector of unit norm || w|| =1
Now consider

   g(r) = f(x* + r w),

which is a function of the scalar r.

[Figure: the point x* and the displaced point x* + rw along the direction w.]
Def : The directional derivative of f(x) in the direction w (unit norm) at x* is defined as

   D_w f(x*) = dg(r)/dr |_{r=0}
             = lim_{r→0} [ g(r) − g(0) ] / r
             = lim_{r→0} [ f(x* + rw) − f(x*) ] / r
             = lim_{r→0} [ <∇f(x*), rw> + <φ(x*, rw), rw> ] / r
             = lim_{r→0} ( <∇f(x*), w> + <φ(x*, rw), w> )
             = <∇f(x*), w>
Directional derivatives
Example :
Let w = e_i = [0 ... 1 ... 0]^T (the i-th unit vector).

Then

   D_{e_i} f(x*) = <∇f(x*), e_i> = ∂f/∂x_i |_{x*}

i.e. the partial derivative of f(x*) w.r.t. x_i is the directional derivative
of f(x) in the direction e_i.
Interpretation of D_w f(x*)

Consider the projection of ∇f(x*) on w:

   proj_w ∇f(x*) = <∇f(x*), w> w,   with <w, w> = 1.

Then

   || proj_w ∇f(x*) ||² = < <∇f(x*), w> w , <∇f(x*), w> w > = <∇f(x*), w>² <w, w> = <∇f(x*), w>²

so

   || proj_w ∇f(x*) || = <∇f(x*), w> = D_w f(x*).

The directional derivative along a direction w (||w|| = 1) is the length of the projection vector of ∇f(x*) on w.
Unconstrained Minimization

[Q] : What direction w yields the largest directional derivative?

Ans :
   w = ∇f(x*) / ||∇f(x*)||

Recall that the 1st variation of f(x* + rw) is r <∇f(x*), w> = r D_w f(x*).

Conclusion 1 : The direction of the gradient is the direction that yields the largest change (1st variation) in the function.

This suggests the steepest descent method

   x^(k+1) = x^(k) − α ∇f(x^(k)),

which will be described later.
Example:
Let f(x) = x1² + x2², x ∈ R².

Then

   ∇f(x) = [ ∂f/∂x1   ∂f/∂x2 ]^T = [ 2x1   2x2 ]^T.

Let x* = [1  −1]^T, so ∇f(x*) = [2  −2]^T.

Sol :
   w = ∇f(x*) / ||∇f(x*)|| = (1/√8) [2  −2]^T = [1/√2  −1/√2]^T.

[Figure: circular level curves f(x1, x2) = c in the (x1, x2)-plane, with ∇f(x*) drawn at x* = (1, −1).]
Directional derivatives
The directional derivative in the direction of the gradient is

   D_w f(x*) = <∇f(x*), w> = < [2  −2]^T , [1/√2  −1/√2]^T > = 2/√2 + 2/√2 = 4/√2 = 2√2.

Notes :

   D_{e1} f(x*) = <∇f(x*), e1> = < [2  −2]^T , [1  0]^T > = 2 < 2√2

   D_{e2} f(x*) = <∇f(x*), e2> = < [2  −2]^T , [0  1]^T > = −2 < 2√2
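A minimal numerical check of these values is sketched below (plain NumPy, my own variable names): it compares the finite-difference quotient of g(r) = f(x* + rw) with <∇f(x*), w> for the three directions used above.

```python
import numpy as np

f = lambda x: x[0]**2 + x[1]**2
grad = lambda x: np.array([2*x[0], 2*x[1]])

x_star = np.array([1.0, -1.0])
g = grad(x_star)                      # [2, -2]

directions = {
    "gradient": g / np.linalg.norm(g),        # [1/sqrt(2), -1/sqrt(2)]
    "e1": np.array([1.0, 0.0]),
    "e2": np.array([0.0, 1.0]),
}

r = 1e-6
for name, w in directions.items():
    exact = g @ w                                    # <grad f(x*), w>
    numeric = (f(x_star + r*w) - f(x_star)) / r      # forward difference of g(r)
    print(name, exact, numeric)
# gradient direction gives 2*sqrt(2) ~ 2.828, e1 gives 2, e2 gives -2
```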
Directional derivatives
Def : f(x) is said to have a local (or relative) minimum at x* if, in a nbd N of x*,

   f(x*) ≤ f(x),   x ∈ N.

Theorem: Let f(x) be differentiable. If f(x) has a local minimum at x*, then

   ∇f(x*) = 0   (or D_w f(x*) = 0, ∀ w).

pf :
   f(x* + h) = f(x*) + <∇f(x*), h> + <φ(x*, h), h>,  and the last term vanishes faster than h as h → 0.
   Since x* is a local minimum, f(x* + h) − f(x*) ≥ 0 for all sufficiently small h, which requires <∇f(x*), h> ≥ 0.
   But if for some h we had <∇f(x*), h> > 0, then h1 = −h would give <∇f(x*), h1> < 0, a contradiction.
   Hence <∇f(x*), h> = 0 for every h, i.e. ∇f(x*) = 0.

Note: ∇f(x*) = 0 is a necessary condition, not a sufficient condition.
Directional derivatives
Theorem: If f(x) is twice differentiable and
   (1) ∇f(x*) = 0
   (2) F(x*) is a positive definite matrix (i.e. <v, F(x*) v> > 0 for all v ≠ 0),
then x* is a local minimum of f(x).

pf :
   f(x* + h) = f(x*) + <∇f(x*), h> + (1/2) <h, F(x*) h> + H.O.T.

   With ∇f(x*) = 0,

   f(x* + h) = f(x*) + (1/2) <h, F(x*) h> > f(x*)   for small h ≠ 0,

   i.e. f(x*) < f(x), x ∈ N.
Conclusion 2: The necessary & sufficient conditions for a local minimum of f(x) at x* are
   (1) ∇f(x*) = 0
   (2) F(x*) is p.d.
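A small sketch of checking these two conditions numerically is given below (NumPy; the quadratic test function, Q, b, and all names are my own illustration): the gradient should vanish and the Hessian eigenvalues should all be positive at the candidate point.

```python
import numpy as np

# Illustrative quadratic f(x) = 1/2 x^T Q x - b^T x with known minimizer Q^{-1} b
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

grad    = lambda x: Q @ x - b          # gradient of the quadratic
hessian = lambda x: Q                  # constant Hessian

x_cand = np.linalg.solve(Q, b)         # candidate minimizer

cond1 = np.allclose(grad(x_cand), 0.0)                    # (1) gradient = 0
cond2 = np.all(np.linalg.eigvalsh(hessian(x_cand)) > 0)   # (2) Hessian p.d.
print(cond1, cond2)   # True True -> x_cand is a local minimum
```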
Minimization of Unconstrained function

Prob. : Let y = f(x), x ∈ R^n. We want to generate a sequence

   x^(0), x^(1), x^(2), ........   such that   f(x^(0)) > f(x^(1)) > f(x^(2)) > ...

and such that it converges to the minimum of f(x).

Consider the kth guess, x^(k); we can generate x^(k+1) provided that we have two pieces of information:
   (1) the direction to go : d^(k)
   (2) a scalar step size : α^(k)
Then
   x^(k+1) = x^(k) + α^(k) d^(k)

Basic descent methods
   (1) Steepest descent
   (2) Newton-Raphson method
Steepest Descent

Steepest descent :

   d^(k) = −∇f(x^(k))

   x^(k+1) = x^(k) − α^(k) ∇f(x^(k)),   with α^(k) > 0.

To determine α^(k), consider g(α^(k)) = f(x^(k) − α^(k) ∇f(x^(k))); g(α^(k)) is a function of α^(k).

Note: for small α^(k) > 0,

   f(x^(k) − α^(k) ∇f(x^(k))) ≈ f(x^(k)) − α^(k) <∇f(x^(k)), ∇f(x^(k))> < f(x^(k)),

so moving along −∇f(x^(k)) decreases f.

1.a. Optimum α^(k) : it minimizes g(α^(k)), i.e.

   dg(α^(k)) / dα^(k) = 0
Steepest Descent

Example :

   f(x) = x1 + 1/(x1 x2),   x ∈ R².

Suppose x^(k) = [4/3  4/3]^T; then ∇f(x^(k)) = [37/64  −27/64]^T, and

   x^(k+1) = [4/3  4/3]^T − α^(k) [37/64  −27/64]^T

   g(α^(k)) = ( 4/3 − (37/64) α^(k) ) + 1 / [ ( 4/3 − (37/64) α^(k) ) ( 4/3 + (27/64) α^(k) ) ]

Setting dg(α^(k)) / dα^(k) = 0 is messy to calculate (in general).
Steepest Descent

Example :   f(x) = (1/2) x^T Q x − b^T x,   Q : n×n, symmetric and p.d.

   ∇f(x) = Q x − b

   x^(k+1) = x^(k) − α^(k) (Q x^(k) − b),   i.e. d^(k) = −(Q x^(k) − b) = b − Q x^(k)

   g(α^(k)) = f(x^(k) − α^(k) (Q x^(k) − b))
            = (1/2) (α^(k))² <d^(k), Q d^(k)> − α^(k) <d^(k), d^(k)> + (1/2) x^(k)^T Q x^(k) − b^T x^(k)
              (a parabola in α^(k))
Steepest Descent

   dg(α^(k)) / dα^(k) = α^(k) <d^(k), Q d^(k)> − <d^(k), b − Q x^(k)> = 0

   ⇒   α^(k) = <d^(k), d^(k)> / <d^(k), Q d^(k)>

Note
   d²g(α^(k)) / d(α^(k))² = <d^(k), Q d^(k)> > 0   (Q is p.d.)
Optimum iteration

   x^(k+1) = x^(k) + [ <d^(k), d^(k)> / <d^(k), Q d^(k)> ] d^(k),   d^(k) = b − Q x^(k)

Remark :
The optimal steepest descent step size can be determined analytically for quadratic functions.
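The sketch below (plain NumPy; the 2×2 Q, b, and the helper name are my own choices for illustration) implements this exact line-search step α^(k) = <d, d> / <d, Q d> for a quadratic.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, iters=50):
    """Minimize f(x) = 1/2 x^T Q x - b^T x with the analytic optimal step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = b - Q @ x                      # d^(k) = -grad f(x^(k))
        if np.linalg.norm(d) < 1e-12:
            break
        alpha = (d @ d) / (d @ (Q @ d))    # optimal step for a quadratic
        x = x + alpha * d
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric p.d. (illustrative)
b = np.array([1.0, 2.0])
x_min = steepest_descent_quadratic(Q, b, np.zeros(2))
print(x_min, np.linalg.solve(Q, b))        # the two should agree
```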
Steepest Descent

1.b. Other possibilities for choosing α^(k)

(1) Constant step size, i.e. α^(k) = α = constant ∀ k :

   x^(k+1) = x^(k) − α ∇f(x^(k))

   adv : simple
   disadv : no idea of which value of α to choose;
            if α is too large → may diverge;
            if α is too small → very slow.

(2) Variable step size
   i.e. choose α^(k) from a candidate set {α1, α2, ..., αk} such that g(α^(k)) is minimized:
   evaluate g(α1), g(α2), ..., g(αk) and pick the one that minimizes g.
Steepest Descent

1.b. Other possibilities for choosing α^(k)

(3) Polynomial fit methods

(i) Quadratic fit

   Guess three values for α, say α1, α2, α3.

   Let g(α) = a + b α + c α².

   g_i = g(α_i) = f(x^(k) − α_i ∇f(x^(k))) = a + b α_i + c α_i²,   i = 1, 2, 3.

   Solve for a, b, c; minimize g(α) by

      dg(α)/dα = b + 2 c α = 0   ⇒   α^(k) = −b / (2c) = fun(α1, α2, α3, g1, g2, g3)

   Check d²g(α)/dα² = 2c > 0.

[Figure: the three samples (α1, g1), (α2, g2), (α3, g3), the fitted parabola g(α), and its minimizer α^(k).]
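A minimal sketch of this quadratic fit (NumPy; the function names, the trial points, and the example line search are my own choices): fit a + bα + cα² through three samples of g and return −b/(2c) when the fit is convex.

```python
import numpy as np

def quadratic_fit_step(g, alphas=(0.0, 0.5, 1.0)):
    """Fit g(a) ~ a0 + b*a + c*a^2 through three samples and minimize it."""
    a1, a2, a3 = alphas
    A = np.array([[1.0, a, a*a] for a in alphas])
    a0, b, c = np.linalg.solve(A, np.array([g(a1), g(a2), g(a3)]))
    if c <= 0:                      # check d^2 g / d alpha^2 = 2c > 0
        raise ValueError("fit is not convex; pick different trial points")
    return -b / (2.0 * c)

# Example: line search for f(x) = x1^2 + 4*x2^2 along the negative gradient
f = lambda x: x[0]**2 + 4*x[1]**2
x_k = np.array([2.0, 1.0])
d = -np.array([2*x_k[0], 8*x_k[1]])            # -grad f(x_k)
g = lambda a: f(x_k + a*d)
print(quadratic_fit_step(g))                   # approximate minimizer of g
```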
Steepest Descent

1.b. Other possibilities for choosing α^(k)

(3) Polynomial fit methods

(ii) Cubic fit

   g(α) = a1 α³ + a2 α² + a3 α + a4

   To solve for a1, a2, a3, a4, use four samples g_i = g(α_i), i = 1, ..., 4:

      [ α1³  α1²  α1  1 ] [ a1 ]     [ g1 ]
      [ α2³  α2²  α2  1 ] [ a2 ]  =  [ g2 ]
      [ α3³  α3²  α3  1 ] [ a3 ]     [ g3 ]
      [ α4³  α4²  α4  1 ] [ a4 ]     [ g4 ]

   Then

      dg(α)/dα = 3 a1 α² + 2 a2 α + a3 = 0

      ⇒   α^(k) = [ −2 a2 ± sqrt(4 a2² − 12 a1 a3) ] / (6 a1) = [ −a2 ± sqrt(a2² − 3 a1 a3) ] / (3 a1)

   Check d²g(α)/dα² = 6 a1 α^(k) + 2 a2 > 0.

[Figure: the four samples (α1, g1), ..., (α4, g4), the fitted cubic g(α), and its minimizer α^(k).]
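A sketch of the cubic fit in the same spirit (NumPy; names and trial points are my own): solve the 4×4 system for the coefficients, then keep the stationary point with positive second derivative.

```python
import numpy as np

def cubic_fit_step(g, alphas=(0.0, 0.3, 0.6, 1.0)):
    """Fit g(a) ~ a1*a^3 + a2*a^2 + a3*a + a4 and return its local minimizer."""
    A = np.array([[a**3, a**2, a, 1.0] for a in alphas])
    a1, a2, a3, a4 = np.linalg.solve(A, np.array([g(a) for a in alphas]))
    disc = a2*a2 - 3.0*a1*a3
    if disc < 0:
        raise ValueError("no real stationary point; pick different trial points")
    # two roots of 3*a1*x^2 + 2*a2*x + a3 = 0; keep the one with 6*a1*x + 2*a2 > 0
    roots = [(-a2 + s*np.sqrt(disc)) / (3.0*a1) for s in (+1.0, -1.0)]
    for r in roots:
        if 6.0*a1*r + 2.0*a2 > 0:
            return r
    raise ValueError("no local minimum found")

g = lambda a: (a - 0.4)**2 + 0.1*a**3      # illustrative 1-D function
print(cubic_fit_step(g))                   # close to the minimizer of g
```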
Steepest Descent

1.b. Other possibilities for choosing α^(k)

(4) Region elimination methods

Assume g(α) is convex over [a, b], i.e. it has one minimum. Pick two points α1 < α2 in [a, b] and evaluate g1 = g(α1), g2 = g(α2). Three cases:

   (a) g1 > g2 : eliminate [a, α1]
   (b) g1 < g2 : eliminate [α2, b]
   (c) g1 = g2 : eliminate [a, α1] and [α2, b]

[Figure: three panels of g(α) over [a, b] showing the eliminated subinterval in each case.]

The initial interval of uncertainty is [a, b]; the next interval of uncertainty for (a) is [α1, b]; for (b) is [a, α2]; for (c) is [α1, α2].
Steepest Descent

[Q] : how do we choose α1 and α2 ?

(i) Two-point equal-interval search

   i.e.  α1 − a = α2 − α1 = b − α2

   L0 = b − a

   1st iteration :  L1 = (2/3) L0
   2nd iteration :  L2 = (2/3) L1 = (2/3)² L0
   3rd iteration :  L3 = (2/3)³ L0
   ...
   kth iteration :  Lk = (2/3)^k L0

[Figure: the interval [a, b] with α1 and α2 placed at the interior third points.]
Steepest Descent

[Q] : how do we choose α1 and α2 ?

(ii) Fibonacci search method

   F0 = F1 = 1,   Fk = Fk−1 + Fk−2

For N search iterations:

   α1^(k) = ( F_{N−k−1} / F_{N−k+1} ) (b_k − a_k) + a_k,   k = 0, 1, 2, ..., N−1
   α2^(k) = ( F_{N−k}   / F_{N−k+1} ) (b_k − a_k) + a_k,   k = 0, 1, 2, ..., N−1

Example: Let N = 5, initial a = 0, b = 1.

   k = 0 :  α1^(0) = (F4/F6)(1 − 0) + 0 = 5/13,   α2^(0) = (F5/F6)(1 − 0) + 0 = 8/13
            compare g1, g2  →  new interval [a1, b1], with L1 = (8/13) L0

   k = 1 :  α1^(1) = (F3/F5)(b1 − a1) + a1,   α2^(1) = (F4/F5)(b1 − a1) + a1
            compare g1, g2  →  new interval [a2, b2]

[Figure: the first Fibonacci iteration on [0, 1] with the sampled points α1^(0) = 5/13 and α2^(0) = 8/13.]
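A compact sketch of this Fibonacci search (pure Python; the function names and the test function are my own, and it assumes g is unimodal on [a, b]), following the two placement formulas above.

```python
def fibonacci_search(g, a, b, N=10):
    """Shrink [a, b] with N Fibonacci iterations; g assumed unimodal on [a, b]."""
    F = [1, 1]
    while len(F) < N + 2:
        F.append(F[-1] + F[-2])           # F0 = F1 = 1, Fk = Fk-1 + Fk-2
    for k in range(N - 1):                # the last ratio F1/F2 = 1/2 would place both points together
        a1 = a + (F[N - k - 1] / F[N - k + 1]) * (b - a)
        a2 = a + (F[N - k]     / F[N - k + 1]) * (b - a)
        if g(a1) > g(a2):
            a = a1                        # minimum cannot lie in [a, a1]
        else:
            b = a2                        # minimum cannot lie in [a2, b]
    return 0.5 * (a + b)

g = lambda x: (x - 0.3)**2               # illustrative unimodal function
print(fibonacci_search(g, 0.0, 1.0, N=10))   # close to 0.3
```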
Steepest Descent

[Q] : how do we choose α1 and α2 ?

(iii) Golden Section Method

   lim_{N→∞} F_{N−1} / F_{N+1} = 0.382,   lim_{N→∞} F_N / F_{N+1} = 0.618

then use

   α1^(k) = 0.382 (b_k − a_k) + a_k
   α2^(k) = 0.618 (b_k − a_k) + a_k,     k = 0, 1, 2, ...

until

   b_k − a_k < ε.

Example: [a0, b0] = [0, 2], then

   α1^(0) = 0.382 (2 − 0) + 0 = 0.764,   α2^(0) = 0.618 (2 − 0) + 0 = 1.236

Comparing g1 and g2 gives [a1, b1] = [0, 1.236], then

   α1^(1) = 0.382 (1.236 − 0) + 0 = 0.472,   α2^(1) = 0.618 (1.236 − 0) + 0 = 0.764

etc.

[Figure: the first two golden-section iterations on [0, 2], with the sampled points 0.764, 1.236 and 0.472, 0.764.]
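A minimal golden-section sketch (pure Python; names and the test function are my own, and g is assumed unimodal on [a, b]) using the 0.382 / 0.618 placements until the interval is shorter than ε.

```python
def golden_section(g, a, b, eps=1e-5):
    """Shrink [a, b] with the 0.382 / 0.618 points until b - a < eps."""
    while b - a >= eps:
        a1 = a + 0.382 * (b - a)
        a2 = a + 0.618 * (b - a)
        if g(a1) > g(a2):
            a = a1                 # eliminate [a, a1]
        else:
            b = a2                 # eliminate [a2, b]
    return 0.5 * (a + b)

g = lambda x: (x - 1.3)**2 + 0.5   # illustrative unimodal function on [0, 2]
print(golden_section(g, 0.0, 2.0)) # close to 1.3
```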
Steepest Descent

Flow chart of steepest descent

   Initial guess x^(0), k = 0
        |
   Compute f(x^(k)) and ∇f(x^(k))
        |
   ∇f(x^(k)) ≈ 0 ?  -- Yes -->  Stop! x^(k) is the minimum
        | No
   Determine α^(k)
     (candidate set {α1, ..., αn}, polynomial fit (quadratic, cubic), or region elimination)
        |
   x^(k+1) = x^(k) − α^(k) ∇f(x^(k)),   k = k + 1, return to the compute step.
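The flow chart translates into a short loop. Below is a sketch in NumPy (my own names; it uses the simple constant-step-size choice for α^(k) from 1.b(1), and a gradient-norm stopping test, which is my assumption since the flow chart's exact test is not legible).

```python
import numpy as np

def steepest_descent(grad, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Generic steepest descent with a constant step size (variant 1.b(1))."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # gradient ~ 0 -> stop, x^(k) is the minimum
            break
        x = x - alpha * g                  # x^(k+1) = x^(k) - alpha * grad f(x^(k))
    return x, k

grad = lambda x: np.array([2*x[0], 2*x[1]])   # gradient of f(x) = x1^2 + x2^2
x_min, iters = steepest_descent(grad, [1.0, -1.0])
print(x_min, iters)                            # near (0, 0)
```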
Steepest Descent

[Q] : Is the direction of −∇f(x) the best direction to go?

Consider f(x) = (1/2) x^T Q x − b^T x, so ∇f(x) = Q x − b, and suppose the initial guess is x^(0).

Consider the next guess

   x^(1) = x^(0) − M ∇f(x^(0)),   M : n×n matrix
         = x^(0) − M (Q x^(0) − b)

What should M be such that x^(1) is the minimum, i.e. ∇f(x^(1)) = 0 ?

Since we want ∇f(x^(1)) = Q x^(1) − b = 0 :

   Q ( x^(0) − M (Q x^(0) − b) ) − b = 0
   Q x^(0) − Q M Q x^(0) + Q M b − b = 0

If M Q = I, or M = Q^{-1}, this is satisfied.

Thus, for a quadratic function, x^(k+1) = x^(k) − Q^{-1} ∇f(x^(k)) will take us
to the minimum in one iteration, no matter what x^(0) is.
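A tiny numeric check of this claim (NumPy; Q, b, and the starting point are my own illustrative values): starting from an arbitrary x^(0), one step with M = Q^{-1} lands exactly on the minimizer Q^{-1} b.

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric p.d. (illustrative)
b = np.array([1.0, 2.0])
grad = lambda x: Q @ x - b

x0 = np.array([10.0, -7.0])                # arbitrary initial guess
x1 = x0 - np.linalg.solve(Q, grad(x0))     # x^(1) = x^(0) - Q^{-1} grad f(x^(0))

print(x1, np.linalg.solve(Q, b))           # identical: minimum reached in one step
```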
Newton-Raphson Method

Minimize f(x).
The necessary condition is ∇f(x) = 0.
The N-R algorithm is to find the roots of ∇f(x) = 0.

Guess x^(k); then x^(k+1) must satisfy

   ∇f(x^(k)) / ( x^(k) − x^(k+1) ) = d∇f(x)/dx |_{x^(k)}

i.e.

   x^(k+1) = x^(k) − ( d∇f(x)/dx |_{x^(k)} )^{-1} ∇f(x^(k))

Note: it does not always converge.

[Figure: the scalar case, showing ∇f(x), the tangent at x^(k), and the next iterate x^(k+1) where the tangent crosses zero.]
Newton-Raphson Method

A more formal derivation: minimize f(x^(k) + h) w.r.t. h.

   f(x^(k) + h) ≈ f(x^(k)) + <∇f(x^(k)), h> + (1/2) <h, F(x^(k)) h>

   ∇_h f(x^(k) + h) ≈ ∇f(x^(k)) + F(x^(k)) h = 0

   ⇒   h = −[F(x^(k))]^{-1} ∇f(x^(k))

   ⇒   x^(k+1) = x^(k) + h = x^(k) − [F(x^(k))]^{-1} ∇f(x^(k))

[Figure: successive Newton-Raphson iterates x^(k), x^(k+1), x^(k+2), x^(k+3) on the graph of ∇f(x).]
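A short sketch of this iteration (NumPy; the helper names and the test function are my own illustration): the loop simply applies x ← x − [F(x)]^{-1} ∇f(x), solving a linear system rather than forming the inverse.

```python
import numpy as np

def newton_raphson(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson: x^(k+1) = x^(k) - [F(x^(k))]^{-1} grad f(x^(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)   # solve F h = grad instead of inverting F
    return x

# Illustrative function f(x) = x1^4 + x2^2 (minimum at the origin)
grad = lambda x: np.array([4*x[0]**3, 2*x[1]])
hess = lambda x: np.array([[12*x[0]**2, 0.0], [0.0, 2.0]])

print(newton_raphson(grad, hess, [1.0, 2.0]))   # converges toward (0, 0)
```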
Newton-Raphson Method

Remarks

1. Computation of [F(x^(k))]^{-1} at every iteration is time consuming → modify the N-R algorithm to recalculate [F(x^(k))]^{-1} only every M-th iteration.

2. Must check that F(x^(k)) is p.d. at every iteration. If not, replace it by

      F̃(x^(k)) = F(x^(k)) + ε I,   ε > 0,

   where I is the n×n identity.

Example

   f(x1, x2) = −1 / (1 + x1² + x2²)

   ∇f = [ 2 x1 / (1 + x1² + x2²)²
          2 x2 / (1 + x1² + x2²)² ]
Newton-Raphson Method
   F(x) = 2 / (1 + x1² + x2²)³ · [ 1 + x2² − 3 x1²      −4 x1 x2
                                   −4 x1 x2              1 + x1² − 3 x2² ]

The minimum of f(x) is at (0, 0).

In the nbd of (0, 0), F(x) is p.d.; in particular F((0, 0)) = 2 [1 0; 0 1].

Now suppose we start with the initial guess

   x^(0) = [1  0]^T

Then

   ∇f(x^(0)) = [1/2  0]^T,   F(x^(0)) = [ −1/2  0 ; 0  1/2 ]   (not p.d.)

   x^(1) = x^(0) − [F(x^(0))]^{-1} ∇f(x^(0)) = [1  0]^T − [−1  0]^T = [2  0]^T

which moves away from the minimum; the iteration diverges.

Remark

3. The N-R algorithm is good (fast) when the initial guess is close to the minimum, but not very good when far from the minimum.
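Assuming the example function reconstructed above, f(x1, x2) = −1/(1 + x1² + x2²), the following sketch (NumPy, my own names) runs the raw N-R iteration once from the far guess and once from a guess near the origin, illustrating Remark 3.

```python
import numpy as np

# f(x1, x2) = -1 / (1 + x1^2 + x2^2), as in the example above
def grad(x):
    s = 1.0 + x @ x
    return 2.0 * x / s**2

def hess(x):
    s = 1.0 + x @ x
    x1, x2 = x
    return (2.0 / s**3) * np.array([[1 + x2**2 - 3*x1**2, -4*x1*x2],
                                    [-4*x1*x2, 1 + x1**2 - 3*x2**2]])

def newton_run(x0, steps=5):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - np.linalg.solve(hess(x), grad(x))   # raw N-R step, no safeguards
    return x

print(newton_run([1.0, 0.0]))    # far guess: x1 keeps growing, the iterates diverge
print(newton_run([0.1, 0.1]))    # near (0, 0): converges quickly to the minimum
```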
