
Optimization

Unconstrained Minimization
Def : f(x), x ∈ R^n, is said to be differentiable at a point x* if it is defined in a
neighborhood N around x* and if there exists a vector a, independent of h, such that

   f(x* + h) = f(x*) + <a, h> + <φ(x*, h), h>

where the vector a is called the gradient of f(x) evaluated at x*, and we denote it as

   a = ∇f(x*) = [ ∂f/∂x1   ∂f/∂x2   ...   ∂f/∂xn ]^T  evaluated at x = x*.
The term <a,h> is called the 1-st variation.



and

   φ(x*, h) = [ φ1(x*, h)   ...   φn(x*, h) ]^T,   with   lim_{h→0} φi(x*, h) = 0,   i = 1, 2, ...., n.
Unconstrained Minimization
Note: if f(x) is twice differentiable, then

   φ(x, h) = (1/2) F(x) h + H.O.T.

where F(x) is an n×n symmetric matrix, called the Hessian of f(x).








Then

   f(x* + h) = f(x*) + <∇f(x*), h> + (1/2) <h, F(x*) h> + H.O.T.

where <∇f(x*), h> is the 1st variation and (1/2)<h, F(x*) h> is the 2nd variation, and

   F(x) = [ ∂²f/∂x1²       ∂²f/∂x1∂x2     ...   ∂²f/∂x1∂xn ]
          [ ∂²f/∂x2∂x1     ∂²f/∂x2²       ...   ∂²f/∂x2∂xn ]
          [    :               :                    :       ]
          [ ∂²f/∂xn∂x1     ∂²f/∂xn∂x2     ...   ∂²f/∂xn²    ]
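As a quick numerical illustration of the expansion above, the short Python sketch below compares f(x* + h) against the first- and second-order approximations for a small h. The sample function, its derivatives, and all names are my own choices for illustration, not from the notes.

```python
import numpy as np

# Illustrative smooth function (not from the notes)
f = lambda x: x[0]**2 + 3*x[0]*x[1] + np.exp(x[1])

def grad(x):
    return np.array([2*x[0] + 3*x[1], 3*x[0] + np.exp(x[1])])

def hessian(x):
    return np.array([[2.0, 3.0],
                     [3.0, np.exp(x[1])]])

x_star = np.array([1.0, 0.5])
h = np.array([1e-2, -2e-2])

first  = f(x_star) + grad(x_star) @ h            # keep only the 1st variation
second = first + 0.5 * h @ hessian(x_star) @ h   # add the 2nd variation

print(f(x_star + h), first, second)   # the second-order estimate is closest
```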
Directional derivatives
Let w be a directional vector of unit norm || w|| =1
Now consider

   g(r) = f(x* + r w),

which is a function of the scalar r.

[Figure: the point x* and the displaced point x* + rw along the direction w.]
Def : The directional derivative of f(x) in the direction w (unit norm) at x* is defined as

   D_w f(x*) = dg(r)/dr |_{r=0}
             = lim_{r→0} [ g(r) − g(0) ] / r
             = lim_{r→0} [ f(x* + rw) − f(x*) ] / r
             = lim_{r→0} [ <∇f(x*), rw> + <φ(x*, rw), rw> ] / r
             = lim_{r→0} ( <∇f(x*), w> + <φ(x*, rw), w> )
             = <∇f(x*), w>
Directional derivatives
Example :
Let w = e_i = [0 ... 1 ... 0]^T (the i-th unit vector).

Then

   D_{e_i} f(x*) = <∇f(x*), e_i> = ∂f/∂x_i |_{x*}

i.e. the partial derivative of f(x*) w.r.t. x_i is the directional derivative
of f(x) in the direction e_i.
Interpretation of D_w f(x*)

Consider the projection of ∇f(x*) on w:

   proj_w ∇f(x*) = <∇f(x*), w> w,   with <w, w> = 1.

Then

   || proj_w ∇f(x*) ||² = < <∇f(x*), w> w , <∇f(x*), w> w > = <∇f(x*), w>² <w, w> = <∇f(x*), w>²

so

   || proj_w ∇f(x*) || = <∇f(x*), w> = D_w f(x*).

The directional derivative along a direction w (||w|| = 1) is the length of the projection vector of ∇f(x*) on w.
Unconstrained Minimization

[Q] : What direction w yields the largest directional derivative?

Ans :
   w = ∇f(x*) / ||∇f(x*)||

Recall that the 1st variation of f(x* + rw) is r <∇f(x*), w> = r D_w f(x*).

Conclusion 1 : The direction of the gradient is the direction that yields the largest change (1st variation) in the function.

This suggests the steepest descent method

   x^(k+1) = x^(k) − α ∇f(x^(k)),

which will be described later.
Example:
Let f(x) = x1² + x2², x ∈ R².

Then

   ∇f(x) = [ ∂f/∂x1   ∂f/∂x2 ]^T = [ 2x1   2x2 ]^T.

Let x* = [1  −1]^T, so ∇f(x*) = [2  −2]^T.

Sol :
   w = ∇f(x*) / ||∇f(x*)|| = (1/√8) [2  −2]^T = [1/√2  −1/√2]^T.

[Figure: circular level curves f(x1, x2) = c in the (x1, x2)-plane, with ∇f(x*) drawn at x* = (1, −1).]
Directional derivatives
The directional derivative in the direction of the gradient is

   D_w f(x*) = <∇f(x*), w> = < [2  −2]^T , [1/√2  −1/√2]^T > = 2/√2 + 2/√2 = 4/√2 = 2√2.

Notes :

   D_{e1} f(x*) = <∇f(x*), e1> = < [2  −2]^T , [1  0]^T > = 2 < 2√2

   D_{e2} f(x*) = <∇f(x*), e2> = < [2  −2]^T , [0  1]^T > = −2 < 2√2
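A minimal numerical check of these values is sketched below (plain NumPy, my own variable names): it compares the finite-difference quotient of g(r) = f(x* + rw) with <∇f(x*), w> for the three directions used above.

```python
import numpy as np

f = lambda x: x[0]**2 + x[1]**2
grad = lambda x: np.array([2*x[0], 2*x[1]])

x_star = np.array([1.0, -1.0])
g = grad(x_star)                      # [2, -2]

directions = {
    "gradient": g / np.linalg.norm(g),        # [1/sqrt(2), -1/sqrt(2)]
    "e1": np.array([1.0, 0.0]),
    "e2": np.array([0.0, 1.0]),
}

r = 1e-6
for name, w in directions.items():
    exact = g @ w                                    # <grad f(x*), w>
    numeric = (f(x_star + r*w) - f(x_star)) / r      # forward difference of g(r)
    print(name, exact, numeric)
# gradient direction gives 2*sqrt(2) ~ 2.828, e1 gives 2, e2 gives -2
```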
Directional derivatives
Def : f(x) is said to have a local (or relative) minimum at x* if, in a nbd N of x*,

   f(x*) ≤ f(x),   x ∈ N.

Theorem: Let f(x) be differentiable. If f(x) has a local minimum at x*, then

   ∇f(x*) = 0   (or D_w f(x*) = 0, ∀ w).

pf :
   f(x* + h) = f(x*) + <∇f(x*), h> + <φ(x*, h), h>,  and the last term vanishes faster than h as h → 0.
   Since x* is a local minimum, f(x* + h) − f(x*) ≥ 0 for all sufficiently small h, which requires <∇f(x*), h> ≥ 0.
   But if for some h we had <∇f(x*), h> > 0, then h1 = −h would give <∇f(x*), h1> < 0, a contradiction.
   Hence <∇f(x*), h> = 0 for every h, i.e. ∇f(x*) = 0.

Note: ∇f(x*) = 0 is a necessary condition, not a sufficient condition.
Directional derivatives
Theorem: If f(x) is twice differentiable and
   (1) ∇f(x*) = 0
   (2) F(x*) is a positive definite matrix (i.e. <v, F(x*) v> > 0 for all v ≠ 0),
then x* is a local minimum of f(x).

pf :
   f(x* + h) = f(x*) + <∇f(x*), h> + (1/2) <h, F(x*) h> + H.O.T.

   With ∇f(x*) = 0,

   f(x* + h) = f(x*) + (1/2) <h, F(x*) h> > f(x*)   for small h ≠ 0,

   i.e. f(x*) < f(x), x ∈ N.
Conclusion 2: The necessary & sufficient conditions for a local minimum of f(x) at x* are
   (1) ∇f(x*) = 0
   (2) F(x*) is p.d.
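A small sketch of checking these two conditions numerically is given below (NumPy; the quadratic test function, Q, b, and all names are my own illustration): the gradient should vanish and the Hessian eigenvalues should all be positive at the candidate point.

```python
import numpy as np

# Illustrative quadratic f(x) = 1/2 x^T Q x - b^T x with known minimizer Q^{-1} b
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

grad    = lambda x: Q @ x - b          # gradient of the quadratic
hessian = lambda x: Q                  # constant Hessian

x_cand = np.linalg.solve(Q, b)         # candidate minimizer

cond1 = np.allclose(grad(x_cand), 0.0)                    # (1) gradient = 0
cond2 = np.all(np.linalg.eigvalsh(hessian(x_cand)) > 0)   # (2) Hessian p.d.
print(cond1, cond2)   # True True -> x_cand is a local minimum
```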
Minimization of Unconstrained function

Prob. : Let y = f(x), x ∈ R^n. We want to generate a sequence

   x^(0), x^(1), x^(2), ........   such that   f(x^(0)) > f(x^(1)) > f(x^(2)) > ...

and such that it converges to the minimum of f(x).

Consider the kth guess, x^(k); we can generate x^(k+1) provided that we have two pieces of information:
   (1) the direction to go : d^(k)
   (2) a scalar step size : α^(k)
Then
   x^(k+1) = x^(k) + α^(k) d^(k)

Basic descent methods
   (1) Steepest descent
   (2) Newton-Raphson method
Steepest Descent

Steepest descent :

   d^(k) = −∇f(x^(k))

   x^(k+1) = x^(k) − α^(k) ∇f(x^(k)),   with α^(k) > 0.

To determine α^(k), consider g(α^(k)) = f(x^(k) − α^(k) ∇f(x^(k))); g(α^(k)) is a function of α^(k).

Note: for small α^(k) > 0,

   f(x^(k) − α^(k) ∇f(x^(k))) ≈ f(x^(k)) − α^(k) <∇f(x^(k)), ∇f(x^(k))> < f(x^(k)),

so moving along −∇f(x^(k)) decreases f.

1.a. Optimum α^(k) : it minimizes g(α^(k)), i.e.

   dg(α^(k)) / dα^(k) = 0
Steepest Descent

Example :

   f(x) = x1 + 1/(x1 x2),   x ∈ R².

Suppose x^(k) = [4/3  4/3]^T; then ∇f(x^(k)) = [37/64  −27/64]^T, and

   x^(k+1) = [4/3  4/3]^T − α^(k) [37/64  −27/64]^T

   g(α^(k)) = ( 4/3 − (37/64) α^(k) ) + 1 / [ ( 4/3 − (37/64) α^(k) ) ( 4/3 + (27/64) α^(k) ) ]

Setting dg(α^(k)) / dα^(k) = 0 is messy to calculate (in general).
Steepest Descent

Example :   f(x) = (1/2) x^T Q x − b^T x,   Q : n×n, symmetric and p.d.

   ∇f(x) = Q x − b

   x^(k+1) = x^(k) − α^(k) (Q x^(k) − b),   i.e. d^(k) = −(Q x^(k) − b) = b − Q x^(k)

   g(α^(k)) = f(x^(k) − α^(k) (Q x^(k) − b))
            = (1/2) (α^(k))² <d^(k), Q d^(k)> − α^(k) <d^(k), d^(k)> + (1/2) x^(k)^T Q x^(k) − b^T x^(k)
              (a parabola in α^(k))
Steepest Descent

   dg(α^(k)) / dα^(k) = α^(k) <d^(k), Q d^(k)> − <d^(k), b − Q x^(k)> = 0

   ⇒   α^(k) = <d^(k), d^(k)> / <d^(k), Q d^(k)>

Note
   d²g(α^(k)) / d(α^(k))² = <d^(k), Q d^(k)> > 0   (Q is p.d.)
Optimum iteration

   x^(k+1) = x^(k) + [ <d^(k), d^(k)> / <d^(k), Q d^(k)> ] d^(k),   d^(k) = b − Q x^(k)

Remark :
The optimal steepest descent step size can be determined analytically for quadratic functions.
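The sketch below (plain NumPy; the 2×2 Q, b, and the helper name are my own choices for illustration) implements this exact line-search step α^(k) = <d, d> / <d, Q d> for a quadratic.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, iters=50):
    """Minimize f(x) = 1/2 x^T Q x - b^T x with the analytic optimal step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = b - Q @ x                      # d^(k) = -grad f(x^(k))
        if np.linalg.norm(d) < 1e-12:
            break
        alpha = (d @ d) / (d @ (Q @ d))    # optimal step for a quadratic
        x = x + alpha * d
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric p.d. (illustrative)
b = np.array([1.0, 2.0])
x_min = steepest_descent_quadratic(Q, b, np.zeros(2))
print(x_min, np.linalg.solve(Q, b))        # the two should agree
```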
Steepest Descent

1.b. Other possibilities for choosing α^(k)

(1) Constant step size, i.e. α^(k) = α = constant ∀ k :

   x^(k+1) = x^(k) − α ∇f(x^(k))

   adv : simple
   disadv : no idea of which value of α to choose;
            if α is too large → may diverge;
            if α is too small → very slow.

(2) Variable step size
   i.e. choose α^(k) from a candidate set {α1, α2, ..., αk} such that g(α^(k)) is minimized:
   evaluate g(α1), g(α2), ..., g(αk) and pick the one that minimizes g.
Steepest Descent

1.b. Other possibilities for choosing α^(k)

(3) Polynomial fit methods

(i) Quadratic fit

   Guess three values for α, say α1, α2, α3.

   Let g(α) = a + b α + c α².

   g_i = g(α_i) = f(x^(k) − α_i ∇f(x^(k))) = a + b α_i + c α_i²,   i = 1, 2, 3.

   Solve for a, b, c; minimize g(α) by

      dg(α)/dα = b + 2 c α = 0   ⇒   α^(k) = −b / (2c) = fun(α1, α2, α3, g1, g2, g3)

   Check d²g(α)/dα² = 2c > 0.

[Figure: the three samples (α1, g1), (α2, g2), (α3, g3), the fitted parabola g(α), and its minimizer α^(k).]
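A minimal sketch of this quadratic fit (NumPy; the function names, the trial points, and the example line search are my own choices): fit a + bα + cα² through three samples of g and return −b/(2c) when the fit is convex.

```python
import numpy as np

def quadratic_fit_step(g, alphas=(0.0, 0.5, 1.0)):
    """Fit g(a) ~ a0 + b*a + c*a^2 through three samples and minimize it."""
    a1, a2, a3 = alphas
    A = np.array([[1.0, a, a*a] for a in alphas])
    a0, b, c = np.linalg.solve(A, np.array([g(a1), g(a2), g(a3)]))
    if c <= 0:                      # check d^2 g / d alpha^2 = 2c > 0
        raise ValueError("fit is not convex; pick different trial points")
    return -b / (2.0 * c)

# Example: line search for f(x) = x1^2 + 4*x2^2 along the negative gradient
f = lambda x: x[0]**2 + 4*x[1]**2
x_k = np.array([2.0, 1.0])
d = -np.array([2*x_k[0], 8*x_k[1]])            # -grad f(x_k)
g = lambda a: f(x_k + a*d)
print(quadratic_fit_step(g))                   # approximate minimizer of g
```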
Steepest Descent

1.b. Other possibilities for choosing α^(k)

(3) Polynomial fit methods

(ii) Cubic fit

   g(α) = a1 α³ + a2 α² + a3 α + a4

   To solve for a1, a2, a3, a4, use four samples g_i = g(α_i), i = 1, ..., 4:

      [ α1³  α1²  α1  1 ] [ a1 ]     [ g1 ]
      [ α2³  α2²  α2  1 ] [ a2 ]  =  [ g2 ]
      [ α3³  α3²  α3  1 ] [ a3 ]     [ g3 ]
      [ α4³  α4²  α4  1 ] [ a4 ]     [ g4 ]

   Then

      dg(α)/dα = 3 a1 α² + 2 a2 α + a3 = 0

      ⇒   α^(k) = [ −2 a2 ± sqrt(4 a2² − 12 a1 a3) ] / (6 a1) = [ −a2 ± sqrt(a2² − 3 a1 a3) ] / (3 a1)

   Check d²g(α)/dα² = 6 a1 α^(k) + 2 a2 > 0.

[Figure: the four samples (α1, g1), ..., (α4, g4), the fitted cubic g(α), and its minimizer α^(k).]
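A sketch of the cubic fit in the same spirit (NumPy; names and trial points are my own): solve the 4×4 system for the coefficients, then keep the stationary point with positive second derivative.

```python
import numpy as np

def cubic_fit_step(g, alphas=(0.0, 0.3, 0.6, 1.0)):
    """Fit g(a) ~ a1*a^3 + a2*a^2 + a3*a + a4 and return its local minimizer."""
    A = np.array([[a**3, a**2, a, 1.0] for a in alphas])
    a1, a2, a3, a4 = np.linalg.solve(A, np.array([g(a) for a in alphas]))
    disc = a2*a2 - 3.0*a1*a3
    if disc < 0:
        raise ValueError("no real stationary point; pick different trial points")
    # two roots of 3*a1*x^2 + 2*a2*x + a3 = 0; keep the one with 6*a1*x + 2*a2 > 0
    roots = [(-a2 + s*np.sqrt(disc)) / (3.0*a1) for s in (+1.0, -1.0)]
    for r in roots:
        if 6.0*a1*r + 2.0*a2 > 0:
            return r
    raise ValueError("no local minimum found")

g = lambda a: (a - 0.4)**2 + 0.1*a**3      # illustrative 1-D function
print(cubic_fit_step(g))                   # close to the minimizer of g
```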
Steepest Descent

1.b. Other possibilities for choosing α^(k)

(4) Region elimination methods

Assume g(α) is convex over [a, b], i.e. it has one minimum. Pick two points α1 < α2 in [a, b] and evaluate g1 = g(α1), g2 = g(α2). Three cases:

   (a) g1 > g2 : eliminate [a, α1]
   (b) g1 < g2 : eliminate [α2, b]
   (c) g1 = g2 : eliminate [a, α1] and [α2, b]

[Figure: three panels of g(α) over [a, b] showing the eliminated subinterval in each case.]

The initial interval of uncertainty is [a, b]; the next interval of uncertainty for (a) is [α1, b]; for (b) is [a, α2]; for (c) is [α1, α2].
Steepest Descent

[Q] : how do we choose α1 and α2 ?

(i) Two-point equal-interval search

   i.e.  α1 − a = α2 − α1 = b − α2

   L0 = b − a

   1st iteration :  L1 = (2/3) L0
   2nd iteration :  L2 = (2/3) L1 = (2/3)² L0
   3rd iteration :  L3 = (2/3)³ L0
   ...
   kth iteration :  Lk = (2/3)^k L0

[Figure: the interval [a, b] with α1 and α2 placed at the interior third points.]
Steepest Descent

[Q] : how do we choose α1 and α2 ?

(ii) Fibonacci search method

   F0 = F1 = 1,   Fk = Fk−1 + Fk−2

For N search iterations:

   α1^(k) = ( F_{N−k−1} / F_{N−k+1} ) (b_k − a_k) + a_k,   k = 0, 1, 2, ..., N−1
   α2^(k) = ( F_{N−k}   / F_{N−k+1} ) (b_k − a_k) + a_k,   k = 0, 1, 2, ..., N−1

Example: Let N = 5, initial a = 0, b = 1.

   k = 0 :  α1^(0) = (F4/F6)(1 − 0) + 0 = 5/13,   α2^(0) = (F5/F6)(1 − 0) + 0 = 8/13
            compare g1, g2  →  new interval [a1, b1], with L1 = (8/13) L0

   k = 1 :  α1^(1) = (F3/F5)(b1 − a1) + a1,   α2^(1) = (F4/F5)(b1 − a1) + a1
            compare g1, g2  →  new interval [a2, b2]

[Figure: the first Fibonacci iteration on [0, 1] with the sampled points α1^(0) = 5/13 and α2^(0) = 8/13.]
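A compact sketch of this Fibonacci search (pure Python; the function names and the test function are my own, and it assumes g is unimodal on [a, b]), following the two placement formulas above.

```python
def fibonacci_search(g, a, b, N=10):
    """Shrink [a, b] with N Fibonacci iterations; g assumed unimodal on [a, b]."""
    F = [1, 1]
    while len(F) < N + 2:
        F.append(F[-1] + F[-2])           # F0 = F1 = 1, Fk = Fk-1 + Fk-2
    for k in range(N - 1):                # the last ratio F1/F2 = 1/2 would place both points together
        a1 = a + (F[N - k - 1] / F[N - k + 1]) * (b - a)
        a2 = a + (F[N - k]     / F[N - k + 1]) * (b - a)
        if g(a1) > g(a2):
            a = a1                        # minimum cannot lie in [a, a1]
        else:
            b = a2                        # minimum cannot lie in [a2, b]
    return 0.5 * (a + b)

g = lambda x: (x - 0.3)**2               # illustrative unimodal function
print(fibonacci_search(g, 0.0, 1.0, N=10))   # close to 0.3
```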
Steepest Descent

[Q] : how do we choose α1 and α2 ?

(iii) Golden Section Method

   lim_{N→∞} F_{N−1} / F_{N+1} = 0.382,   lim_{N→∞} F_N / F_{N+1} = 0.618

then use

   α1^(k) = 0.382 (b_k − a_k) + a_k
   α2^(k) = 0.618 (b_k − a_k) + a_k,     k = 0, 1, 2, ...

until

   b_k − a_k < ε.

Example: [a0, b0] = [0, 2], then

   α1^(0) = 0.382 (2 − 0) + 0 = 0.764,   α2^(0) = 0.618 (2 − 0) + 0 = 1.236

Comparing g1 and g2 gives [a1, b1] = [0, 1.236], then

   α1^(1) = 0.382 (1.236 − 0) + 0 = 0.472,   α2^(1) = 0.618 (1.236 − 0) + 0 = 0.764

etc.

[Figure: the first two golden-section iterations on [0, 2], with the sampled points 0.764, 1.236 and 0.472, 0.764.]
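A minimal golden-section sketch (pure Python; names and the test function are my own, and g is assumed unimodal on [a, b]) using the 0.382 / 0.618 placements until the interval is shorter than ε.

```python
def golden_section(g, a, b, eps=1e-5):
    """Shrink [a, b] with the 0.382 / 0.618 points until b - a < eps."""
    while b - a >= eps:
        a1 = a + 0.382 * (b - a)
        a2 = a + 0.618 * (b - a)
        if g(a1) > g(a2):
            a = a1                 # eliminate [a, a1]
        else:
            b = a2                 # eliminate [a2, b]
    return 0.5 * (a + b)

g = lambda x: (x - 1.3)**2 + 0.5   # illustrative unimodal function on [0, 2]
print(golden_section(g, 0.0, 2.0)) # close to 1.3
```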
Steepest Descent

Flow chart of steepest descent

   Initial guess x^(0), k = 0
        |
   Compute f(x^(k)) and ∇f(x^(k))
        |
   ∇f(x^(k)) ≈ 0 ?  -- Yes -->  Stop! x^(k) is the minimum
        | No
   Determine α^(k)
     (candidate set {α1, ..., αn}, polynomial fit (quadratic, cubic), or region elimination)
        |
   x^(k+1) = x^(k) − α^(k) ∇f(x^(k)),   k = k + 1, return to the compute step.
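The flow chart translates into a short loop. Below is a sketch in NumPy (my own names; it uses the simple constant-step-size choice for α^(k) from 1.b(1), and a gradient-norm stopping test, which is my assumption since the flow chart's exact test is not legible).

```python
import numpy as np

def steepest_descent(grad, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Generic steepest descent with a constant step size (variant 1.b(1))."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # gradient ~ 0 -> stop, x^(k) is the minimum
            break
        x = x - alpha * g                  # x^(k+1) = x^(k) - alpha * grad f(x^(k))
    return x, k

grad = lambda x: np.array([2*x[0], 2*x[1]])   # gradient of f(x) = x1^2 + x2^2
x_min, iters = steepest_descent(grad, [1.0, -1.0])
print(x_min, iters)                            # near (0, 0)
```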
Steepest Descent

[Q] : Is the direction of −∇f(x) the best direction to go?

Consider f(x) = (1/2) x^T Q x − b^T x, so ∇f(x) = Q x − b, and suppose the initial guess is x^(0).

Consider the next guess

   x^(1) = x^(0) − M ∇f(x^(0)),   M : n×n matrix
         = x^(0) − M (Q x^(0) − b)

What should M be such that x^(1) is the minimum, i.e. ∇f(x^(1)) = 0 ?

Since we want ∇f(x^(1)) = Q x^(1) − b = 0 :

   Q ( x^(0) − M (Q x^(0) − b) ) − b = 0
   Q x^(0) − Q M Q x^(0) + Q M b − b = 0

If M Q = I, or M = Q^{-1}, this is satisfied.

Thus, for a quadratic function, x^(k+1) = x^(k) − Q^{-1} ∇f(x^(k)) will take us
to the minimum in one iteration, no matter what x^(0) is.
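A tiny numeric check of this claim (NumPy; Q, b, and the starting point are my own illustrative values): starting from an arbitrary x^(0), one step with M = Q^{-1} lands exactly on the minimizer Q^{-1} b.

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])     # symmetric p.d. (illustrative)
b = np.array([1.0, 2.0])
grad = lambda x: Q @ x - b

x0 = np.array([10.0, -7.0])                # arbitrary initial guess
x1 = x0 - np.linalg.solve(Q, grad(x0))     # x^(1) = x^(0) - Q^{-1} grad f(x^(0))

print(x1, np.linalg.solve(Q, b))           # identical: minimum reached in one step
```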
Newton-Raphson Method

Minimize f(x).
The necessary condition is ∇f(x) = 0.
The N-R algorithm is to find the roots of ∇f(x) = 0.

Guess x^(k); then x^(k+1) must satisfy

   ∇f(x^(k)) / ( x^(k) − x^(k+1) ) = d∇f(x)/dx |_{x^(k)}

i.e.

   x^(k+1) = x^(k) − ( d∇f(x)/dx |_{x^(k)} )^{-1} ∇f(x^(k))

Note: it does not always converge.

[Figure: the scalar case, showing ∇f(x), the tangent at x^(k), and the next iterate x^(k+1) where the tangent crosses zero.]
Newton-Raphson Method

A more formal derivation: minimize f(x^(k) + h) w.r.t. h.

   f(x^(k) + h) ≈ f(x^(k)) + <∇f(x^(k)), h> + (1/2) <h, F(x^(k)) h>

   ∇_h f(x^(k) + h) ≈ ∇f(x^(k)) + F(x^(k)) h = 0

   ⇒   h = −[F(x^(k))]^{-1} ∇f(x^(k))

   ⇒   x^(k+1) = x^(k) + h = x^(k) − [F(x^(k))]^{-1} ∇f(x^(k))

[Figure: successive Newton-Raphson iterates x^(k), x^(k+1), x^(k+2), x^(k+3) on the graph of ∇f(x).]
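A short sketch of this iteration (NumPy; the helper names and the test function are my own illustration): the loop simply applies x ← x − [F(x)]^{-1} ∇f(x), solving a linear system rather than forming the inverse.

```python
import numpy as np

def newton_raphson(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson: x^(k+1) = x^(k) - [F(x^(k))]^{-1} grad f(x^(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)   # solve F h = grad instead of inverting F
    return x

# Illustrative function f(x) = x1^4 + x2^2 (minimum at the origin)
grad = lambda x: np.array([4*x[0]**3, 2*x[1]])
hess = lambda x: np.array([[12*x[0]**2, 0.0], [0.0, 2.0]])

print(newton_raphson(grad, hess, [1.0, 2.0]))   # converges toward (0, 0)
```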
Newton-Raphson Method

Remarks

1. Computation of [F(x^(k))]^{-1} at every iteration is time consuming → modify the N-R algorithm to recalculate [F(x^(k))]^{-1} only every M-th iteration.

2. Must check that F(x^(k)) is p.d. at every iteration. If not, replace it by

      F̃(x^(k)) = F(x^(k)) + ε I,   ε > 0,

   where I is the n×n identity.

Example

   f(x1, x2) = −1 / (1 + x1² + x2²)

   ∇f = [ 2 x1 / (1 + x1² + x2²)²
          2 x2 / (1 + x1² + x2²)² ]
Newton-Raphson Method
   F(x) = 2 / (1 + x1² + x2²)³ · [ 1 + x2² − 3 x1²      −4 x1 x2
                                   −4 x1 x2              1 + x1² − 3 x2² ]

The minimum of f(x) is at (0, 0).

In the nbd of (0, 0), F(x) is p.d.; in particular F((0, 0)) = 2 [1 0; 0 1].

Now suppose we start with the initial guess

   x^(0) = [1  0]^T

Then

   ∇f(x^(0)) = [1/2  0]^T,   F(x^(0)) = [ −1/2  0 ; 0  1/2 ]   (not p.d.)

   x^(1) = x^(0) − [F(x^(0))]^{-1} ∇f(x^(0)) = [1  0]^T − [−1  0]^T = [2  0]^T

which moves away from the minimum; the iteration diverges.

Remark

3. The N-R algorithm is good (fast) when the initial guess is close to the minimum, but not very good when far from the minimum.
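Assuming the example function reconstructed above, f(x1, x2) = −1/(1 + x1² + x2²), the following sketch (NumPy, my own names) runs the raw N-R iteration once from the far guess and once from a guess near the origin, illustrating Remark 3.

```python
import numpy as np

# f(x1, x2) = -1 / (1 + x1^2 + x2^2), as in the example above
def grad(x):
    s = 1.0 + x @ x
    return 2.0 * x / s**2

def hess(x):
    s = 1.0 + x @ x
    x1, x2 = x
    return (2.0 / s**3) * np.array([[1 + x2**2 - 3*x1**2, -4*x1*x2],
                                    [-4*x1*x2, 1 + x1**2 - 3*x2**2]])

def newton_run(x0, steps=5):
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - np.linalg.solve(hess(x), grad(x))   # raw N-R step, no safeguards
    return x

print(newton_run([1.0, 0.0]))    # far guess: x1 keeps growing, the iterates diverge
print(newton_run([0.1, 0.1]))    # near (0, 0): converges quickly to the minimum
```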
