
Chapter 6


UNCONSTRAINED MULTIVARIABLE
OPTIMIZATION

6.1 Function Values Only

6.2 First Derivatives of f (gradient and conjugate direction methods)

6.3 Second Derivatives of f (e.g., Newton's method)

6.4 Quasi-Newton methods


General Strategy for Gradient methods

(1) Calculate a search direction s^k.
(2) Select a step length \alpha^k in that direction to reduce f(x):

x^{k+1} = x^k + \alpha^k s^k = x^k + \Delta x^k
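In code, this two-step strategy is just a loop. A minimal sketch follows; the search-direction and step-length routines are placeholders to be supplied by the specific methods discussed below, and the tolerance and iteration cap are illustrative choices.

```python
import numpy as np

def gradient_method(x0, search_direction, step_length, grad, tol=1e-8, max_iter=100):
    """Generic gradient-method loop: direction, then step length, then update."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(grad(x)) < tol:      # stop near a stationary point
            break
        s = search_direction(x)                # (1) search direction s^k
        alpha = step_length(x, s)              # (2) step length alpha^k
        x = x + alpha * s                      # x^{k+1} = x^k + alpha^k s^k
    return x
```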

Steepest Descent
Search Direction
s^k = -\nabla f(x^k)        (don't need to normalize)

Method terminates at any stationary point. Why?


\nabla f(x) = 0

So the procedure can stop at a saddle point. Need to show that H(x^*) is positive definite for a minimum.

Step Length

How to pick \alpha:
• analytically
• numerically

Analytical Method
How does one minimize a function in a search direction
using an analytical method?

It means s^k is fixed and you want to pick \alpha, the step length, to minimize f(x). Note that \Delta x^k = \alpha s^k.

k k k 1 k k k 1 k k k
f (x   s )  f (x )  f ( x )  T f ( x )(  x )  (  x )T H ( x )(  x )
2
k k
df ( x   s ) k k k k k
 0  T f ( x )(s )   (s )T H ( x )(s )
d
Solve for 
k k
T f ( x )(s ) (6.9)
 
k k k
(s )T H ( x )(s )

This yields a minimum of the approximating function.
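A quick numerical check of Eq. (6.9) with numpy. The quadratic chosen here happens to be the one used in the worked example later in the chapter, so H and the gradient are known exactly; for a general f these would come from a model or finite differences.

```python
import numpy as np

# f(x) = (x1 - 3)^2 + 9(x2 - 5)^2, so H = [[2, 0], [0, 18]] and grad(x) = H x + b
H = np.array([[2.0, 0.0],
              [0.0, 18.0]])
b = np.array([-6.0, -90.0])

def grad(x):
    return H @ x + b

x = np.array([1.0, 1.0])
s = -grad(x)                                 # steepest-descent direction

alpha = -(grad(x) @ s) / (s @ H @ s)         # Eq. (6.9)
print(alpha, x + alpha * s)                  # optimal step length and the resulting point
```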


Numerical Method
Use coarse search first
(1) Fixed \alpha (\alpha = 1) or variable \alpha (\alpha = 1, 2, ½, etc.)

Options for optimizing \alpha:


(1) Use interpolation such as quadratic, cubic
(2) Region Elimination (Golden Search)
(3) Newton, Secant, Quasi-Newton
(4) Random
(5) Analytical optimization

(1), (3), and (5) are preferred. However, it may not be desirable to optimize \alpha exactly; it is often better to generate new search directions.
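As an illustration of option (1), here is a minimal sketch of a single quadratic-interpolation step for the step length. The bracketing points and the test function phi are illustrative choices, not taken from the slides; phi corresponds to the line search in the worked example later in the chapter.

```python
def quad_interp_alpha(phi, a, b, c):
    """One quadratic-interpolation step for minimizing phi(alpha).

    a, b, c are three trial step lengths; returns the minimizer of the
    parabola fitted through (a, phi(a)), (b, phi(b)), (c, phi(c)).
    """
    fa, fb, fc = phi(a), phi(b), phi(c)
    num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

# phi(alpha) = f(x^k + alpha * s^k) for the quadratic used in the worked example
phi = lambda alpha: (4*alpha - 2)**2 + 9*(72*alpha - 4)**2
print(quad_interp_alpha(phi, 0.0, 0.05, 0.1))   # ~0.056; exact for a quadratic phi
```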
Suppose we calculate the gradient at the point x^T = [2  2].
Termination Criteria
[Two sketches of f(x) vs. x:]
• Big change in f(x) but little change in x: the code will stop prematurely if \Delta x is the sole criterion.
• Big change in x but little change in f(x): the code will stop prematurely if \Delta f is the sole criterion.
For minimization you can use up to three criteria for termination:

(1) \frac{|f(x^k) - f(x^{k+1})|}{|f(x^k)|} < \varepsilon_1; except when f(x^k) \to 0, then use |f(x^k) - f(x^{k+1})| < \varepsilon_2

(2) \frac{|x_i^{k+1} - x_i^k|}{|x_i^k|} < \varepsilon_3; except when x^k \to 0, then use \|x^{k+1} - x^k\| < \varepsilon_4

(3) \|\nabla f(x^k)\| < \varepsilon_5, or |s_i^k| < \varepsilon_6
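A possible way to code these checks. The tolerance values and the requirement that all three criteria hold simultaneously are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def converged(f_old, f_new, x_old, x_new, grad_new,
              eps1=1e-8, eps2=1e-8, eps3=1e-8, eps4=1e-8, eps5=1e-6):
    """Illustrative check of termination criteria (1)-(3)."""
    # (1) change in the objective: relative, or absolute when f(x^k) -> 0
    if abs(f_old) > eps2:
        crit1 = abs(f_old - f_new) / abs(f_old) < eps1
    else:
        crit1 = abs(f_old - f_new) < eps2

    # (2) change in the variables: relative, or absolute when x^k -> 0
    x_old, x_new = np.asarray(x_old, float), np.asarray(x_new, float)
    if np.linalg.norm(x_old) > eps4:
        crit2 = np.all(np.abs(x_new - x_old) / np.maximum(np.abs(x_old), eps4) < eps3)
    else:
        crit2 = np.linalg.norm(x_new - x_old) < eps4

    # (3) gradient norm (the components of s^k could be tested instead)
    crit3 = np.linalg.norm(grad_new) < eps5

    return crit1 and crit2 and crit3
```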
Conjugate Search Directions

• Improvement over the gradient method for general quadratic functions
• Basis for many NLP techniques
• Two search directions s^i and s^j are conjugate relative to Q if (s^i)^T Q (s^j) = 0 (a quick numerical check appears after this list)
• To minimize f(x), with x an n × 1 vector, when H is a constant matrix (= Q), you are guaranteed to reach the optimum in n conjugate-direction stages if you minimize exactly at each stage (one-dimensional search)
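A quick numerical check of the conjugacy condition, using the constant Hessian and the two search directions that appear in the worked example later in this chapter:

```python
import numpy as np

# Q (= H) for f = (x1 - 3)^2 + 9(x2 - 5)^2
Q = np.array([[2.0, 0.0],
              [0.0, 18.0]])

s0 = np.array([4.0, 72.0])        # first (steepest-descent) direction
s1 = np.array([3.564, -0.022])    # second direction from the worked example

print(s0 @ Q @ s1)                # ≈ 0, so the directions are conjugate w.r.t. Q
```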

Conjugate Gradient Method

Step 1. At x^0 calculate f(x^0). Let
    s^0 = -\nabla f(x^0)

Step 2. Save \nabla f(x^0) and compute
    x^1 = x^0 + \alpha^0 s^0
by minimizing f(x) with respect to \alpha in the s^0 direction (i.e., carry out a unidimensional search for \alpha^0).

Step 3. Calculate f(x^1) and \nabla f(x^1). The new search direction is a linear combination of s^0 and \nabla f(x^1):

    s^1 = -\nabla f(x^1) + s^0 \frac{\nabla^T f(x^1)\,\nabla f(x^1)}{\nabla^T f(x^0)\,\nabla f(x^0)}

For the kth iteration the relation is

    s^{k+1} = -\nabla f(x^{k+1}) + s^k \frac{\nabla^T f(x^{k+1})\,\nabla f(x^{k+1})}{\nabla^T f(x^k)\,\nabla f(x^k)}        (6.6)

For a quadratic function it can be shown that these successive search directions are conjugate. After n iterations (k = n), the quadratic function is minimized. For a nonquadratic function, the procedure cycles again with x^{n+1} becoming x^0.

Step 4. Test for convergence to the minimum of f(x). If convergence is not attained, return to step 3.

Step n. Terminate the algorithm when \|\nabla f(x^k)\| is less than some prescribed tolerance.
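One way these steps might look in code. This is a sketch, not the textbook's implementation: the restart interval n_restart plays the role of n in the cycling rule, SciPy's Brent search stands in for the unidimensional search, and the tolerance and iteration cap are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, n_restart, tol=1e-8, max_iter=200):
    """Fletcher-Reeves conjugate gradient with (approximately) exact 1-D searches."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = -g                                        # Step 1
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:               # Step 4 / Step n
            break
        alpha = minimize_scalar(lambda a: f(x + a * s)).x   # Step 2: 1-D search
        x = x + alpha * s
        g_new = grad(x)                           # Step 3
        if (k + 1) % n_restart == 0:
            s = -g_new                            # cycle: restart with steepest descent
        else:
            beta = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves weighting, Eq. (6.6)
            s = -g_new + beta * s
        g = g_new
    return x
```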
Minimize f = (x_1 - 3)^2 + 9(x_2 - 5)^2 using the method of conjugate gradients with x_1^0 = 1 and x_2^0 = 1 as an initial point.

In vector notation, x^0 = [1, 1]^T and

\nabla f(x^0) = [-4, -72]^T

For steepest descent,

s^0 = -\nabla f(x^0) = [4, 72]^T
Steepest Descent Step (1-D Search)

x^1 = [1, 1]^T + \alpha^0 [4, 72]^T,    \alpha^0 \ge 0.

The objective function can be expressed as a function of \alpha^0 as follows:

f(\alpha^0) = (4\alpha^0 - 2)^2 + 9(72\alpha^0 - 4)^2.

Minimizing f(\alpha^0), we obtain f = 3.1594 at \alpha^0 = 0.0555. Hence

x^1 = [1.223, 5.011]^T
Calculate Weighting of Previous Step

The new gradient can now be determined as

\nabla f(x^1) = [-3.554, 0.197]^T

and \beta^0 can be computed as

\beta^0 = \frac{(3.554)^2 + (0.197)^2}{(4)^2 + (72)^2} = 0.00244.

Generate New (Conjugate) Search Direction

s^1 = [3.554, -0.197]^T + 0.00244\,[4, 72]^T = [3.564, -0.022]^T

and

x^2 = [1.223, 5.011]^T + \alpha^1 [3.564, -0.022]^T
One-Dimensional Search

Solving for \alpha^1 as before (i.e., expressing f as a function of \alpha^1 and minimizing with respect to \alpha^1) yields f = 5.91 \times 10^{-10} at \alpha^1 = 0.4986. Hence

x^2 = [3.0000, 5.0000]^T

which is the optimum (reached in 2 steps, which agrees with the theory).
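This calculation can be reproduced numerically with the conjugate_gradient sketch given earlier in these notes (that helper is an assumption of this example, not part of the original slides):

```python
import numpy as np

f = lambda x: (x[0] - 3)**2 + 9*(x[1] - 5)**2
grad = lambda x: np.array([2*(x[0] - 3), 18*(x[1] - 5)])

x_star = conjugate_gradient(f, grad, x0=[1.0, 1.0], n_restart=2)
print(x_star)   # ≈ [3.0, 5.0] after two conjugate-direction stages
```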
Fletcher-Reeves Conjugate Gradient Method

Let
    s^0 = -\nabla f(x^0)
    s^1 = -\nabla f(x^1) + \beta^1 s^0
    s^2 = -\nabla f(x^2) + \beta^2 s^1

The \beta^k are chosen to make (s^k)^T H s^{k-1} = 0 (conjugate directions).

Derivation (let H^k = H):

\nabla f(x^{k+1}) \approx \nabla f(x^k) + \nabla^2 f(x^k)\,(x^{k+1} - x^k)

\nabla f(x^{k+1}) - \nabla f(x^k) = H\,\Delta x^k = \alpha^k H s^k
Hence

s^k = \frac{1}{\alpha^k}\,H^{-1}\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]

(s^k)^T = \frac{1}{\alpha^k}\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]^T H^{-1}

Using the definition of conjugate directions, (s^k)^T H s^{k+1} = 0:

\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]^T H^{-1} H \left[-\nabla f(x^{k+1}) + \beta^k s^k\right] = 0

Since \nabla^T f(x^k)\,\nabla f(x^{k+1}) = 0 and \nabla^T f(x^{k+1})\,s^k = 0, solving for the weighting factor gives

\beta^k = \frac{\nabla^T f(x^{k+1})\,\nabla f(x^{k+1})}{\nabla^T f(x^k)\,\nabla f(x^k)}

s^{k+1} = -\nabla f(x^{k+1}) + \beta^k s^k

Linear vs. Quadratic Approximation of f(x)

f(x) = f(x^k) + (x - x^k)^T \nabla f(x^k) + \tfrac{1}{2}(x - x^k)^T H(x^k)(x - x^k)

\Delta x^k = x - x^k = \alpha^k s^k

(1) Using a linear approximation of f(x):

\frac{df(x)}{d(\Delta x)} = 0 = \nabla^T f(x^k),   so we cannot solve for \Delta x!

(2) Using a quadratic approximation of f(x):

\frac{df(x)}{d(\Delta x)} = 0 = \nabla f(x^k) + H(x^k)(x - x^k)

Newton's method solves one of these, with x = x^{k+1}:

x^{k+1} = x^k - H^{-1}(x^k)\,\nabla f(x^k)

(simultaneous equation solving)
Note: Both the direction and the step length are determined.
- Requires second derivatives (the Hessian)
- H and H^{-1} must be positive definite (for a minimum) to guarantee convergence
- Iterate if f(x) is not quadratic

Modified Newton's Procedure:

x^{k+1} = x^k - \alpha^k [H(x^k)]^{-1}\,\nabla f(x^k)

with \alpha^k = 1 for Newton's method. (If H = I, you have steepest descent.)

Example
Minimize f(x) = x_1^2 + 20 x_2^2 starting at x^0 = [1, 1]^T.
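A minimal sketch of the (modified) Newton iteration applied to this example. Since f is quadratic here, a full Newton step with \alpha = 1 lands on the minimum at the origin in one iteration.

```python
import numpy as np

f = lambda x: x[0]**2 + 20*x[1]**2
grad = lambda x: np.array([2*x[0], 40*x[1]])
H = np.array([[2.0, 0.0],
              [0.0, 40.0]])          # constant Hessian of this quadratic

x = np.array([1.0, 1.0])
alpha = 1.0                          # alpha = 1 gives the pure Newton step
for _ in range(5):
    x = x - alpha * np.linalg.solve(H, grad(x))   # solve H s = grad instead of forming H^{-1}
print(x, f(x))                       # converges to [0, 0]; one step suffices here
```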
Marquardt’s Method
1
If H ( x ) or H ( x ) is not always positive definite, make it
positive definite.
1 1
Let H ( x )  H ( x )   I  ; similar for H( x )
  
 is a positive constant large enough to shift all the
Chapter 6

negative eigenvalues of H ( x ).
Example
0
At the start of the search, H( x ) is evaluated at x and
Not positive definite
0 1 2
found to be H ( x )    as the eigenvalues
 2 1  are e1  3, e2  1
0
Modify H ( x ) to be (  2)
Positive definite as the
1  2 2 
H

 eigenvalues are e1  5, e2  1
 2 1  2 
39
 is adjusted as search proceeds.
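A quick numpy check of the eigenvalue shift in this example:

```python
import numpy as np

H = np.array([[1.0, 2.0],
              [2.0, 1.0]])
print(np.linalg.eigvalsh(H))           # [-1., 3.]  -> not positive definite

beta = 2.0
H_tilde = H + beta * np.eye(2)         # shift the spectrum by beta
print(np.linalg.eigvalsh(H_tilde))     # [1., 5.]   -> positive definite
```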
Step 1
Pick x^0, the starting point. Let \varepsilon = convergence criterion.

Step 2
Set k = 0. Let \beta^0 = 10^3.

Step 3
Calculate \nabla f(x^k).

Step 4
Is \|\nabla f(x^k)\| < \varepsilon? If yes, terminate. If no, continue.

Step 5
Calculate s(x^k) = -[H^k + \beta^k I]^{-1}\,\nabla f(x^k).

Step 6
Calculate x^{k+1} = x^k + s(x^k).

Step 7
Is f(x^{k+1}) < f(x^k)? If yes, go to step 8. If no, go to step 9.

Step 8
Set \beta^{k+1} = \tfrac{1}{4}\beta^k and k = k + 1. Go to step 3.

Step 9
Set \beta^k = 2\beta^k. Go to step 5.
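One way these steps might look in code. The starting \beta = 10^3 and the ¼ / 2 adjustment factors follow the slides; the function arguments, tolerance, and iteration caps are illustrative, and the inner loop simply repeats steps 5 to 9 until a descent step is found.

```python
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-6, max_iter=500):
    """Marquardt's method as outlined in steps 1-9 above (sketch)."""
    x = np.asarray(x0, dtype=float)
    beta = 1e3                                     # step 2
    for _ in range(max_iter):
        g = grad(x)                                # step 3
        if np.linalg.norm(g) < eps:                # step 4
            break
        for _ in range(60):                        # inner loop over steps 5-9
            H_mod = hess(x) + beta * np.eye(x.size)
            s = -np.linalg.solve(H_mod, g)         # step 5
            x_new = x + s                          # step 6
            if f(x_new) < f(x):                    # step 7
                beta *= 0.25                       # step 8
                x = x_new
                break
            beta *= 2.0                            # step 9: retry step 5
    return x
```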
Secant Methods
Recall that for a one-dimensional search the secant method only uses values of f(x) and f'(x):

x^{k+1} = x^k - \left[\frac{f'(x^k) - f'(x^p)}{x^k - x^p}\right]^{-1} f'(x^k)

f'(x) is approximated by a straight line (the secant). Hence it is called a "quasi-Newton" method.

The basic idea (for a quadratic function):

\nabla f(x^k) + H\,\Delta x^k = 0   or   \Delta x = (x^{k+1} - x^k) = -H^{-1}\,\nabla f(x^k)

Pick two points to start (x^k = reference point):

\nabla f(x^2) = \nabla f(x^k) + H(x^2 - x^k)
\nabla f(x^1) = \nabla f(x^k) + H(x^1 - x^k)

\nabla f(x^2) - \nabla f(x^1) = y = H(x^2 - x^1)
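A short illustration of the one-dimensional secant update. The test function is hypothetical (not from the slides); the difference quotient of f' between the two most recent points stands in for f''.

```python
import math

# Hypothetical example: minimize f(x) = x**2 + exp(-x)
fprime = lambda x: 2*x - math.exp(-x)   # the secant iteration drives f'(x) to zero

xp, xk = 0.0, 1.0                       # two starting points
for _ in range(6):
    # slope of the secant through (xp, f'(xp)) and (xk, f'(xk)) replaces f''(x)
    xk, xp = xk - fprime(xk) * (xk - xp) / (fprime(xk) - fprime(xp)), xk
print(xk)                               # ≈ 0.3517, where f'(x) = 0
```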
For a non-quadratic function, \tilde{H} would be calculated, after taking a step from x^k to x^{k+1}, by solving the secant equations

y^k = \tilde{H}\,\Delta x^k   or   \Delta x^k = \tilde{H}^{-1} y^k

- An infinite number of candidates exist for \tilde{H} when n > 1.
- We want to choose \tilde{H} (or \tilde{H}^{-1}) close to H (or H^{-1}) in some sense. Several methods can be used to update \tilde{H}.
• Probably the best update formula is the BFGS update (Broyden-Fletcher-Goldfarb-Shanno), ca. 1970.

• BFGS is the basis for the unconstrained optimizer in the Excel Solver.

• It does not require inverting the Hessian matrix, but approximates the inverse using values of \nabla f.
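For reference, a BFGS-based unconstrained minimizer is available in SciPy; here it is applied to the quadratic from the worked example earlier in the chapter (the tolerance is an illustrative choice):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 3)**2 + 9*(x[1] - 5)**2
grad = lambda x: np.array([2*(x[0] - 3), 18*(x[1] - 5)])

result = minimize(f, x0=[1.0, 1.0], jac=grad, method="BFGS", tol=1e-10)
print(result.x)   # ≈ [3., 5.]
```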