
Chapter 6


UNCONSTRAINED MULTIVARIABLE
OPTIMIZATION

6.1 Function Values Only

6.2 First Derivatives of f (gradient and conjugate direction methods)

6.3 Second Derivatives of f (e.g., Newton's method)

6.4 Quasi-Newton methods


General Strategy for Gradient methods

(1) Calculate a search direction s^k.
(2) Select a step length \alpha^k in that direction to reduce f(x):

x^{k+1} = x^k + \alpha^k s^k = x^k + \Delta x^k
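In code, this two-step strategy is just a loop. A minimal sketch follows; the search-direction and step-length routines are placeholders to be supplied by the specific methods discussed below, and the tolerance and iteration cap are illustrative choices.

```python
import numpy as np

def gradient_method(x0, search_direction, step_length, grad, tol=1e-8, max_iter=100):
    """Generic gradient-method loop: direction, then step length, then update."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:      # stop near a stationary point
            break
        s = search_direction(x)                # (1) search direction s^k
        alpha = step_length(x, s)              # (2) step length alpha^k
        x = x + alpha * s                      # x^{k+1} = x^k + alpha^k s^k
    return x
```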

Steepest Descent
Search Direction
s^k = -\nabla f(x^k)        (don't need to normalize)

Method terminates at any stationary point. Why?


\nabla f(x) = 0

So the procedure can stop at a saddle point. Need to show that H(x^*) is positive definite for a minimum.

Step Length

How to pick \alpha:
• analytically
• numerically

Analytical Method
How does one minimize a function in a search direction
using an analytical method?

It means s^k is fixed and you want to pick \alpha, the step length, to minimize f(x). Note that \Delta x^k = \alpha s^k.

k k k 1 k k k 1 k k k
f (x   s )  f (x )  f ( x )  T f ( x )(  x )  (  x )T H ( x )(  x )
2
k k
df ( x   s ) k k k k k
 0  T f ( x )(s )   (s )T H ( x )(s )
d
Solve for 
k k
T f ( x )(s ) (6.9)
 
k k k
(s )T H ( x )(s )

This yields a minimum of the approximating function.
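A quick numerical check of Eq. (6.9) with numpy. The quadratic chosen here happens to be the one used in the worked example later in the chapter, so H and the gradient are known exactly; for a general f these would come from a model or finite differences.

```python
import numpy as np

# f(x) = (x1 - 3)^2 + 9(x2 - 5)^2, so H = [[2, 0], [0, 18]] and grad(x) = H x + b
H = np.array([[2.0, 0.0],
              [0.0, 18.0]])
b = np.array([-6.0, -90.0])

def grad(x):
    return H @ x + b

x = np.array([1.0, 1.0])
s = -grad(x)                                 # steepest-descent direction

alpha = -(grad(x) @ s) / (s @ H @ s)         # Eq. (6.9)
print(alpha, x + alpha * s)                  # optimal step length and the resulting point
```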


Numerical Method
Use coarse search first
(1) Fixed \alpha (\alpha = 1) or variable \alpha (\alpha = 1, 2, ½, etc.)

Options for optimizing \alpha:


(1) Use interpolation such as quadratic, cubic
(2) Region Elimination (Golden Search)
(3) Newton, Secant, Quasi-Newton
(4) Random
(5) Analytical optimization

(1), (3), and (5) are preferred. However, it may not be desirable to optimize \alpha exactly; it is often better to generate new search directions.
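As an illustration of option (1), here is a minimal sketch of a single quadratic-interpolation step for the step length. The bracketing points and the test function phi are illustrative choices, not taken from the slides; phi corresponds to the line search in the worked example later in the chapter.

```python
def quad_interp_alpha(phi, a, b, c):
    """One quadratic-interpolation step for minimizing phi(alpha).

    a, b, c are three trial step lengths; returns the minimizer of the
    parabola fitted through (a, phi(a)), (b, phi(b)), (c, phi(c)).
    """
    fa, fb, fc = phi(a), phi(b), phi(c)
    num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

# phi(alpha) = f(x^k + alpha * s^k) for the quadratic used in the worked example
phi = lambda alpha: (4*alpha - 2)**2 + 9*(72*alpha - 4)**2
print(quad_interp_alpha(phi, 0.0, 0.05, 0.1))   # ~0.056; exact for a quadratic phi
```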
Suppose we calculate the gradient at the point x^T = [2  2].
Termination Criteria
[Two sketches of f(x) vs. x:]
• Big change in f(x) but little change in x: the code will stop prematurely if \Delta x is the sole criterion.
• Big change in x but little change in f(x): the code will stop prematurely if \Delta f is the sole criterion.
For minimization you can use up to three criteria for termination:

(1) \frac{|f(x^k) - f(x^{k+1})|}{|f(x^k)|} < \varepsilon_1; except when f(x^k) \to 0, then use |f(x^k) - f(x^{k+1})| < \varepsilon_2

(2) \frac{|x_i^{k+1} - x_i^k|}{|x_i^k|} < \varepsilon_3; except when x^k \to 0, then use \|x^{k+1} - x^k\| < \varepsilon_4

(3) \|\nabla f(x^k)\| < \varepsilon_5, or |s_i^k| < \varepsilon_6
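A possible way to code these checks. The tolerance values and the requirement that all three criteria hold simultaneously are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def converged(f_old, f_new, x_old, x_new, grad_new,
              eps1=1e-8, eps2=1e-8, eps3=1e-8, eps4=1e-8, eps5=1e-6):
    """Illustrative check of termination criteria (1)-(3)."""
    # (1) change in the objective: relative, or absolute when f(x^k) -> 0
    if abs(f_old) > eps2:
        crit1 = abs(f_old - f_new) / abs(f_old) < eps1
    else:
        crit1 = abs(f_old - f_new) < eps2

    # (2) change in the variables: relative, or absolute when x^k -> 0
    x_old, x_new = np.asarray(x_old, float), np.asarray(x_new, float)
    if np.linalg.norm(x_old) > eps4:
        crit2 = np.all(np.abs(x_new - x_old) / np.maximum(np.abs(x_old), eps4) < eps3)
    else:
        crit2 = np.linalg.norm(x_new - x_old) < eps4

    # (3) gradient norm (the components of s^k could be tested instead)
    crit3 = np.linalg.norm(grad_new) < eps5

    return crit1 and crit2 and crit3
```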
Conjugate Search Directions

• Improvement over the gradient method for general quadratic functions
• Basis for many NLP techniques
• Two search directions s^i and s^j are conjugate relative to Q if (s^i)^T Q (s^j) = 0 (a quick numerical check appears after this list)
• To minimize f(x), with x an n × 1 vector, when H is a constant matrix (= Q), you are guaranteed to reach the optimum in n conjugate-direction stages if you minimize exactly at each stage (one-dimensional search)
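A quick numerical check of the conjugacy condition, using the constant Hessian and the two search directions that appear in the worked example later in this chapter:

```python
import numpy as np

# Q (= H) for f = (x1 - 3)^2 + 9(x2 - 5)^2
Q = np.array([[2.0, 0.0],
              [0.0, 18.0]])

s0 = np.array([4.0, 72.0])        # first (steepest-descent) direction
s1 = np.array([3.564, -0.022])    # second direction from the worked example

print(s0 @ Q @ s1)                # ≈ 0, so the directions are conjugate w.r.t. Q
```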

Conjugate Gradient Method

Step 1. At x^0 calculate f(x^0). Let
    s^0 = -\nabla f(x^0)

Step 2. Save \nabla f(x^0) and compute
    x^1 = x^0 + \alpha^0 s^0
by minimizing f(x) with respect to \alpha in the s^0 direction (i.e., carry out a unidimensional search for \alpha^0).

Step 3. Calculate f(x^1) and \nabla f(x^1). The new search direction is a linear combination of s^0 and \nabla f(x^1):

    s^1 = -\nabla f(x^1) + s^0 \frac{\nabla^T f(x^1)\,\nabla f(x^1)}{\nabla^T f(x^0)\,\nabla f(x^0)}

For the kth iteration the relation is

    s^{k+1} = -\nabla f(x^{k+1}) + s^k \frac{\nabla^T f(x^{k+1})\,\nabla f(x^{k+1})}{\nabla^T f(x^k)\,\nabla f(x^k)}        (6.6)

For a quadratic function it can be shown that these successive search directions are conjugate. After n iterations (k = n), the quadratic function is minimized. For a nonquadratic function, the procedure cycles again with x^{n+1} becoming x^0.

Step 4. Test for convergence to the minimum of f(x). If convergence is not attained, return to step 3.

Step n. Terminate the algorithm when \|\nabla f(x^k)\| is less than some prescribed tolerance.
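One way these steps might look in code. This is a sketch, not the textbook's implementation: the restart interval n_restart plays the role of n in the cycling rule, SciPy's Brent search stands in for the unidimensional search, and the tolerance and iteration cap are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, n_restart, tol=1e-8, max_iter=200):
    """Fletcher-Reeves conjugate gradient with (approximately) exact 1-D searches."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = -g                                        # Step 1
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:               # Step 4 / Step n
            break
        alpha = minimize_scalar(lambda a: f(x + a * s)).x   # Step 2: 1-D search
        x = x + alpha * s
        g_new = grad(x)                           # Step 3
        if (k + 1) % n_restart == 0:
            s = -g_new                            # cycle: restart with steepest descent
        else:
            beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves weighting, Eq. (6.6)
            s = -g_new + beta * s
        g = g_new
    return x
```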
Minimize f = (x_1 - 3)^2 + 9(x_2 - 5)^2 using the method of conjugate gradients with x_1^0 = 1 and x_2^0 = 1 as an initial point.

In vector notation, x^0 = [1, 1]^T and

\nabla f(x^0) = [-4, -72]^T

For steepest descent,

s^0 = -\nabla f(x^0) = [4, 72]^T
Steepest Descent Step (1-D Search)

x^1 = [1, 1]^T + \alpha^0 [4, 72]^T,    \alpha^0 \ge 0.

The objective function can be expressed as a function of \alpha^0 as follows:

f(\alpha^0) = (4\alpha^0 - 2)^2 + 9(72\alpha^0 - 4)^2.

Minimizing f(\alpha^0), we obtain f = 3.1594 at \alpha^0 = 0.0555. Hence

x^1 = [1.223, 5.011]^T
Calculate Weighting of Previous Step

The new gradient can now be determined as

\nabla f(x^1) = [-3.554, 0.197]^T

and \beta^0 can be computed as

\beta^0 = \frac{(3.554)^2 + (0.197)^2}{(4)^2 + (72)^2} = 0.00244.

Generate New (Conjugate) Search Direction

s^1 = [3.554, -0.197]^T + 0.00244\,[4, 72]^T = [3.564, -0.022]^T

and

x^2 = [1.223, 5.011]^T + \alpha^1 [3.564, -0.022]^T
One-Dimensional Search

Solving for \alpha^1 as before (i.e., expressing f as a function of \alpha^1 and minimizing with respect to \alpha^1) yields f = 5.91 \times 10^{-10} at \alpha^1 = 0.4986. Hence

x^2 = [3.0000, 5.0000]^T

which is the optimum (reached in 2 steps, which agrees with the theory).
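This calculation can be reproduced numerically with the conjugate_gradient sketch given earlier in these notes (that helper is an assumption of this example, not part of the original slides):

```python
import numpy as np

f = lambda x: (x[0] - 3)**2 + 9*(x[1] - 5)**2
grad = lambda x: np.array([2*(x[0] - 3), 18*(x[1] - 5)])

x_star = conjugate_gradient(f, grad, x0=[1.0, 1.0], n_restart=2)
print(x_star)   # ≈ [3.0, 5.0] after two conjugate-direction stages
```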
Fletcher-Reeves Conjugate Gradient Method

Let
    s^0 = -\nabla f(x^0)
    s^1 = -\nabla f(x^1) + \beta^1 s^0
    s^2 = -\nabla f(x^2) + \beta^2 s^1

The \beta^k are chosen to make (s^k)^T H s^{k-1} = 0 (conjugate directions).

Derivation (let H^k = H):

\nabla f(x^{k+1}) \approx \nabla f(x^k) + \nabla^2 f(x^k)\,(x^{k+1} - x^k)

\nabla f(x^{k+1}) - \nabla f(x^k) = H\,\Delta x^k = \alpha^k H s^k
Hence

s^k = \frac{1}{\alpha^k}\,H^{-1}\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]

(s^k)^T = \frac{1}{\alpha^k}\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]^T H^{-1}

Using the definition of conjugate directions, (s^k)^T H s^{k+1} = 0:

\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]^T H^{-1} H \left[-\nabla f(x^{k+1}) + \beta^k s^k\right] = 0

Since \nabla^T f(x^k)\,\nabla f(x^{k+1}) = 0 and \nabla^T f(x^{k+1})\,s^k = 0, solving for the weighting factor gives

\beta^k = \frac{\nabla^T f(x^{k+1})\,\nabla f(x^{k+1})}{\nabla^T f(x^k)\,\nabla f(x^k)}

s^{k+1} = -\nabla f(x^{k+1}) + \beta^k s^k

Linear vs. Quadratic Approximation of f(x)

f(x) = f(x^k) + (x - x^k)^T \nabla f(x^k) + \tfrac{1}{2}(x - x^k)^T H(x^k)(x - x^k)

\Delta x^k = x - x^k = \alpha^k s^k

(1) Using a linear approximation of f(x):

\frac{df(x)}{d(\Delta x)} = 0 = \nabla^T f(x^k),   so we cannot solve for \Delta x!

(2) Using a quadratic approximation of f(x):

\frac{df(x)}{d(\Delta x)} = 0 = \nabla f(x^k) + H(x^k)(x - x^k)

Newton's method solves one of these, with x = x^{k+1}:

x^{k+1} = x^k - H^{-1}(x^k)\,\nabla f(x^k)

(simultaneous equation solving)
Note: Both the direction and the step length are determined.
- Requires second derivatives (the Hessian)
- H and H^{-1} must be positive definite (for a minimum) to guarantee convergence
- Iterate if f(x) is not quadratic

Modified Newton's Procedure:

x^{k+1} = x^k - \alpha^k [H(x^k)]^{-1}\,\nabla f(x^k)

with \alpha^k = 1 for Newton's method. (If H = I, you have steepest descent.)

Example
Minimize f(x) = x_1^2 + 20 x_2^2 starting at x^0 = [1, 1]^T.
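A minimal sketch of the (modified) Newton iteration applied to this example. Since f is quadratic here, a full Newton step with \alpha = 1 lands on the minimum at the origin in one iteration.

```python
import numpy as np

f = lambda x: x[0]**2 + 20*x[1]**2
grad = lambda x: np.array([2*x[0], 40*x[1]])
H = np.array([[2.0, 0.0],
              [0.0, 40.0]])          # constant Hessian of this quadratic

x = np.array([1.0, 1.0])
alpha = 1.0                          # alpha = 1 gives the pure Newton step
for _ in range(5):
    x = x - alpha * np.linalg.solve(H, grad(x))   # solve H s = grad instead of forming H^{-1}
print(x, f(x))                       # converges to [0, 0]; one step suffices here
```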
Marquardt’s Method
1
If H ( x ) or H ( x ) is not always positive definite, make it
positive definite.
1 1
Let H ( x )  H ( x )   I  ; similar for H( x )
  
 is a positive constant large enough to shift all the
Chapter 6

negative eigenvalues of H ( x ).
Example
0
At the start of the search, H( x ) is evaluated at x and
Not positive definite
0 1 2
found to be H ( x )    as the eigenvalues
 2 1  are e1  3, e2  1
0
Modify H ( x ) to be (  2)
Positive definite as the
1  2 2 
H

 eigenvalues are e1  5, e2  1
 2 1  2 
39
 is adjusted as search proceeds.
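A quick numpy check of the eigenvalue shift in this example:

```python
import numpy as np

H = np.array([[1.0, 2.0],
              [2.0, 1.0]])
print(np.linalg.eigvalsh(H))           # [-1., 3.]  -> not positive definite

beta = 2.0
H_tilde = H + beta * np.eye(2)         # shift the spectrum by beta
print(np.linalg.eigvalsh(H_tilde))     # [1., 5.]   -> positive definite
```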
Step 1
Pick x^0, the starting point. Let \varepsilon = convergence criterion.

Step 2
Set k = 0. Let \beta^0 = 10^3.

Step 3
Calculate \nabla f(x^k).

Step 4
Is \|\nabla f(x^k)\| < \varepsilon? If yes, terminate. If no, continue.

Step 5
Calculate s(x^k) = -[H^k + \beta^k I]^{-1}\,\nabla f(x^k).

Step 6
Calculate x^{k+1} = x^k + s(x^k).

Step 7
Is f(x^{k+1}) < f(x^k)? If yes, go to step 8. If no, go to step 9.

Step 8
Set \beta^{k+1} = \tfrac{1}{4}\beta^k and k = k + 1. Go to step 3.

Step 9
Set \beta^k = 2\beta^k. Go to step 5.
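One way these steps might look in code. The starting \beta = 10^3 and the ¼ / 2 adjustment factors follow the slides; the function arguments, tolerance, and iteration caps are illustrative, and the inner loop simply repeats steps 5 to 9 until a descent step is found.

```python
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-6, max_iter=500):
    """Marquardt's method as outlined in steps 1-9 above (sketch)."""
    x = np.asarray(x0, dtype=float)
    beta = 1e3                                     # step 2
    for _ in range(max_iter):
        g = grad(x)                                # step 3
        if np.linalg.norm(g) < eps:                # step 4
            break
        for _ in range(60):                        # inner loop over steps 5-9
            H_mod = hess(x) + beta * np.eye(x.size)
            s = -np.linalg.solve(H_mod, g)         # step 5
            x_new = x + s                          # step 6
            if f(x_new) < f(x):                    # step 7
                beta *= 0.25                       # step 8
                x = x_new
                break
            beta *= 2.0                            # step 9: retry step 5
    return x
```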
Secant Methods
Recall that for a one-dimensional search the secant method only uses values of f(x) and f'(x):

x^{k+1} = x^k - \left[\frac{f'(x^k) - f'(x^p)}{x^k - x^p}\right]^{-1} f'(x^k)

f'(x) is approximated by a straight line (the secant). Hence it is called a "quasi-Newton" method.

The basic idea (for a quadratic function):

\nabla f(x^k) + H\,\Delta x^k = 0   or   \Delta x = (x^{k+1} - x^k) = -H^{-1}\,\nabla f(x^k)

Pick two points to start (x^k = reference point):

\nabla f(x^2) = \nabla f(x^k) + H(x^2 - x^k)
\nabla f(x^1) = \nabla f(x^k) + H(x^1 - x^k)

\nabla f(x^2) - \nabla f(x^1) = y = H(x^2 - x^1)
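A short illustration of the one-dimensional secant update. The test function is hypothetical (not from the slides); the difference quotient of f' between the two most recent points stands in for f''.

```python
import math

# Hypothetical example: minimize f(x) = x**2 + exp(-x)
fprime = lambda x: 2*x - math.exp(-x)   # the secant iteration drives f'(x) to zero

xp, xk = 0.0, 1.0                       # two starting points
for _ in range(6):
    # slope of the secant through (xp, f'(xp)) and (xk, f'(xk)) replaces f''(x)
    xk, xp = xk - fprime(xk) * (xk - xp) / (fprime(xk) - fprime(xp)), xk
print(xk)                               # ≈ 0.3517, where f'(x) = 0
```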
For a non-quadratic function, \tilde{H} would be calculated, after taking a step from x^k to x^{k+1}, by solving the secant equations

y^k = \tilde{H}\,\Delta x^k   or   \Delta x^k = \tilde{H}^{-1} y^k

- An infinite number of candidates exist for \tilde{H} when n > 1.
- We want to choose \tilde{H} (or \tilde{H}^{-1}) close to H (or H^{-1}) in some sense. Several methods can be used to update \tilde{H}.
• Probably the best update formula is the BFGS update (Broyden-Fletcher-Goldfarb-Shanno), ca. 1970.

• BFGS is the basis for the unconstrained optimizer in the Excel Solver.

• It does not require inverting the Hessian matrix, but approximates the inverse using values of \nabla f.
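For reference, a BFGS-based unconstrained minimizer is available in SciPy; here it is applied to the quadratic from the worked example earlier in the chapter (the tolerance is an illustrative choice):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 3)**2 + 9*(x[1] - 5)**2
grad = lambda x: np.array([2*(x[0] - 3), 18*(x[1] - 5)])

result = minimize(f, x0=[1.0, 1.0], jac=grad, method="BFGS", tol=1e-10)
print(result.x)   # ≈ [3., 5.]
```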