
Chapter 6
UNCONSTRAINED MULTIVARIABLE OPTIMIZATION
Methods
1. Function values only (grid search)
2. First derivatives of f (gradient and conjugate direction methods)
3. Second derivatives of f (e.g., Newton's method)
4. Quasi-Newton methods
Grid Search
• The function is evaluated at every point of a grid.
• The extreme value found at one particular grid point is taken as the optimum.
Gradient Method
(1) Calculate a search direction $s^k$
(2) Select a step length $\alpha^k$ in that direction to reduce $f(x)$

$$x^{k+1} = x^k + \alpha^k s^k = x^k + \Delta x^k$$
Gradient Method: Steepest Descent (Search Direction)

$$s^k = -\nabla f(x^k)$$

(No need to normalize.) The method terminates at any stationary point. Why? Because

$$\nabla f(x) = 0$$

there, so the procedure can stop at a saddle point. We need to show that $H(x^*)$ is positive definite for a minimum.
Gradient Method: Step Length

How to pick the step length $\alpha$:
• analytically
• numerically
Analytical Method
How does one minimize a function in a search direction using an analytical method?

It means $s$ is fixed and you want to pick $\alpha$, the step length that minimizes $f(x)$. Note that $\Delta x^k = \alpha s^k$.

$$f(x^k + \alpha s^k) \approx f(x^k) + \nabla^T f(x^k)\,\Delta x^k + \tfrac{1}{2}(\Delta x^k)^T H(x^k)\,\Delta x^k$$

$$\frac{df(x^k + \alpha s^k)}{d\alpha} = 0 = \nabla^T f(x^k)\,s^k + \alpha\,(s^k)^T H(x^k)\,s^k$$

Solve for $\alpha$:

$$\alpha = -\,\frac{\nabla^T f(x^k)\,s^k}{(s^k)^T H(x^k)\,s^k} \qquad (6.9)$$

This yields a minimum of the approximating (quadratic) function.
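To make the analytical step concrete, here is a minimal steepest-descent sketch that uses Eq. (6.9) for the step length at every iteration. It assumes NumPy; the quadratic test function $f(x) = (x_1-3)^2 + 9(x_2-5)^2$ is the one used in the conjugate-gradient example later in the deck, and the helper names grad and H are illustrative.

```python
import numpy as np

# Quadratic test function (also used in the conjugate-gradient example later):
# f(x) = (x1 - 3)^2 + 9*(x2 - 5)^2
def grad(x):
    return np.array([2.0 * (x[0] - 3.0), 18.0 * (x[1] - 5.0)])

H = np.diag([2.0, 18.0])              # constant Hessian of this quadratic

x = np.array([1.0, 1.0])
for k in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:      # stop at a stationary point
        break
    s = -g                            # steepest-descent direction
    alpha = -(g @ s) / (s @ H @ s)    # exact step length from Eq. (6.9)
    x = x + alpha * s
print(x)                              # converges (slowly, zig-zagging) to [3, 5]
```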
Numerical Method
Use a coarse search first:
(1) Fixed $\alpha$ ($\alpha = 1$) or variable $\alpha$ ($\alpha$ = 1, 2, ½, etc.)

Options for optimizing $\alpha$:
(1) Interpolation, such as quadratic or cubic
(2) Region elimination (golden section search)
(3) Newton, secant, quasi-Newton
(4) Random search
(5) Analytical optimization

(1), (3), and (5) are preferred. However, it may not be desirable to optimize $\alpha$ exactly (better to generate new search directions); see the line-search sketch below.
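As an illustration of option (1), here is a rough sketch (assuming NumPy) of a quadratic-interpolation line search: it repeatedly fits a parabola through three trial values of phi(alpha) = f(x + alpha*s) and jumps to the vertex. The bracketing update is deliberately crude; a production routine would maintain a proper bracket around the minimum.

```python
import numpy as np

def quadratic_line_search(phi, a=0.0, b=0.5, c=1.0, iters=10):
    """Successive parabolic interpolation on phi(alpha) = f(x + alpha*s)."""
    for _ in range(iters):
        fa, fb, fc = phi(a), phi(b), phi(c)
        num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
        den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        if abs(den) < 1e-14:
            break
        alpha = b - 0.5 * num / den          # vertex of the fitted parabola
        # keep the three best points (crude update, not a guaranteed bracket)
        pts = sorted([a, b, c, alpha], key=phi)[:3]
        a, b, c = sorted(pts)
    return min([a, b, c], key=phi)

# Example: search along the steepest-descent direction of
# f(x) = (x1-3)^2 + 9*(x2-5)^2 from x = [1, 1]
f = lambda x: (x[0] - 3.0)**2 + 9.0 * (x[1] - 5.0)**2
x0 = np.array([1.0, 1.0])
s0 = np.array([4.0, 72.0])                   # -grad f(x0)
phi = lambda alpha: f(x0 + alpha * s0)
print(quadratic_line_search(phi))            # ~0.0557, matching the worked example
```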
Suppose we calculate the gradient at the point $x^T = [2\;\;2]$.
Gradient Method: Termination Criteria

[Figure: two sketches of f(x) versus x. Left: a big change in f(x) with little change in x, so the code will stop prematurely if the change in x is the sole criterion. Right: a big change in x with little change in f(x), so the code will stop prematurely if the change in f(x) is the sole criterion.]

For minimization you can use up to three criteria for termination:

(1) $\dfrac{\left|f(x^k) - f(x^{k+1})\right|}{\left|f(x^k)\right|} < \varepsilon_1$, except when $f(x^k) = 0$; then use $\left|f(x^k) - f(x^{k+1})\right| < \varepsilon_2$

(2) $\dfrac{\left|x_i^{k+1} - x_i^k\right|}{\left|x_i^k\right|} < \varepsilon_3$, except when $x^k = 0$; then use $\left|x^{k+1} - x^k\right| < \varepsilon_4$

(3) $\left\|\nabla f(x^k)\right\| < \varepsilon_5$ or $\left\|s^k\right\| < \varepsilon_6$
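A small sketch of how the three criteria above might be combined in code (assuming NumPy; the tolerance values and function name are illustrative, not taken from the slides):

```python
import numpy as np

def converged(x_old, x_new, f_old, f_new, grad_new,
              eps=(1e-6, 1e-6, 1e-6, 1e-6, 1e-6)):
    """Combined termination test using criteria (1)-(3); eps are illustrative."""
    e1, e2, e3, e4, e5 = eps
    # (1) relative change in f, or absolute change when f(x^k) is near zero
    crit_f = (abs(f_new - f_old) < e2 if abs(f_old) < 1e-12
              else abs(f_new - f_old) / abs(f_old) < e1)
    # (2) relative change in each x_i, or absolute change when x^k is near zero
    denom = np.where(np.abs(x_old) < 1e-12, 1.0, np.abs(x_old))
    step = np.abs(x_new - x_old)
    crit_x = np.all(np.where(np.abs(x_old) < 1e-12, step < e4,
                             step / denom < e3))
    # (3) gradient norm
    crit_g = np.linalg.norm(grad_new) < e5
    return crit_f and crit_x and crit_g
```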
Gradient Method: Conjugate Search Directions

An improvement over the gradient method for general quadratic functions, and the basis for many NLP techniques.

Two search directions are conjugate relative to $Q$ if

$$(s^i)^T Q\, s^j = 0$$

To minimize $f(x)$, where $x$ is an $n \times 1$ vector, when $H$ is a constant matrix ($= Q$), you are guaranteed to reach the optimum in $n$ conjugate-direction stages if you minimize exactly at each stage (one-dimensional search). A quick numerical check of conjugacy is sketched below.
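A quick numerical check of the conjugacy condition, using the constant Hessian Q of the quadratic example worked later in the deck and the two search directions computed there (the numerical values are taken from that example):

```python
import numpy as np

# Q is the (constant) Hessian of f = (x1-3)^2 + 9*(x2-5)^2 used later.
Q = np.diag([2.0, 18.0])
s0 = np.array([4.0, 72.0])            # first (steepest-descent) direction
s1 = np.array([3.564, -0.022])        # second direction from the worked example
print(s0 @ Q @ s1)                    # ~0, so s0 and s1 are (nearly) conjugate
```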
Conjugate Gradient Method

Step 1. At $x^0$ calculate $f(x^0)$. Let
$$s^0 = -\nabla f(x^0)$$

Step 2. Save $\nabla f(x^0)$ and compute
$$x^1 = x^0 + \alpha^0 s^0$$
by minimizing $f(x)$ with respect to $\alpha$ in the $s^0$ direction (i.e., carry out a unidimensional search for $\alpha^0$).

Step 3. Calculate $f(x^1)$ and $\nabla f(x^1)$. The new search direction is a linear combination of $s^0$ and $\nabla f(x^1)$:
$$s^1 = -\nabla f(x^1) + s^0 \, \frac{\nabla^T f(x^1)\,\nabla f(x^1)}{\nabla^T f(x^0)\,\nabla f(x^0)}$$
For the $k$th iteration the relation is
$$s^{k+1} = -\nabla f(x^{k+1}) + s^k \, \frac{\nabla^T f(x^{k+1})\,\nabla f(x^{k+1})}{\nabla^T f(x^k)\,\nabla f(x^k)} \qquad (6.6)$$
For a quadratic function it can be shown that these successive search directions are conjugate. After $n$ iterations ($k = n$), the quadratic function is minimized. For a nonquadratic function, the procedure cycles again with $x^{n+1}$ becoming $x^0$.

Step 4. Test for convergence to the minimum of $f(x)$. If convergence is not attained, return to step 3.

Step n. Terminate the algorithm when $\left\|\nabla f(x^k)\right\|$ is less than some prescribed tolerance.
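A minimal sketch of the algorithm above (assuming NumPy). For simplicity the one-dimensional search in step 2 uses the analytical step of Eq. (6.9), which is exact here only because the test function is quadratic with a constant Hessian; a general implementation would substitute a numerical line search.

```python
import numpy as np

# Quadratic test problem: f(x) = (x1-3)^2 + 9*(x2-5)^2
grad = lambda x: np.array([2.0 * (x[0] - 3.0), 18.0 * (x[1] - 5.0)])
H = np.diag([2.0, 18.0])            # constant Hessian, so the line search is exact

def conjugate_gradient(x0, tol=1e-8, n_cycles=10):
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(n_cycles):                       # restart every n steps
        g = grad(x)
        s = -g                                      # Step 1
        for k in range(n):
            alpha = -(g @ s) / (s @ H @ s)          # exact 1-D search (Eq. 6.9)
            x = x + alpha * s                       # Step 2
            g_new = grad(x)
            if np.linalg.norm(g_new) < tol:         # Step n: gradient test
                return x
            beta = (g_new @ g_new) / (g @ g)        # weighting factor of Eq. (6.6)
            s = -g_new + beta * s                   # Step 3: new conjugate direction
            g = g_new
    return x

print(conjugate_gradient([1.0, 1.0]))               # -> [3., 5.] in two steps
```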
Minimize $f = (x_1 - 3)^2 + 9(x_2 - 5)^2$ using the method of conjugate gradients with $x_1^0 = 1$ and $x_2^0 = 1$ as the initial point.

In vector notation, $x^0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and
$$\nabla f(x^0) = \begin{bmatrix} -4 \\ -72 \end{bmatrix}$$

For steepest descent,
$$s^0 = -\nabla f(x^0) = \begin{bmatrix} 4 \\ 72 \end{bmatrix}$$

Steepest Descent Step (1-D Search)
$$x^1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \alpha^0 \begin{bmatrix} 4 \\ 72 \end{bmatrix}, \qquad \alpha^0 \ge 0$$

The objective function can be expressed as a function of $\alpha^0$ as follows:
$$f(\alpha^0) = (4\alpha^0 - 2)^2 + 9(72\alpha^0 - 4)^2$$

Minimizing $f(\alpha^0)$, we obtain $f = 3.1594$ at $\alpha^0 = 0.0555$. Hence
$$x^1 = \begin{bmatrix} 1.223 \\ 5.011 \end{bmatrix}$$
Calculate Weighting of Previous Step

The new gradient can now be determined as
$$\nabla f(x^1) = \begin{bmatrix} -3.554 \\ 0.197 \end{bmatrix}$$
and $\beta^0$ can be computed as
$$\beta^0 = \frac{(-3.554)^2 + (0.197)^2}{(-4)^2 + (-72)^2} = 0.00244$$

Generate New (Conjugate) Search Direction
$$s^1 = \begin{bmatrix} 3.554 \\ -0.197 \end{bmatrix} + 0.00244\begin{bmatrix} 4 \\ 72 \end{bmatrix} = \begin{bmatrix} 3.564 \\ -0.022 \end{bmatrix}$$
and
$$x^2 = \begin{bmatrix} 1.223 \\ 5.011 \end{bmatrix} + \alpha^1 \begin{bmatrix} 3.564 \\ -0.022 \end{bmatrix}$$

One-Dimensional Search

Solving for $\alpha^1$ as before [i.e., expressing $f(x^1 + \alpha^1 s^1)$ as a function of $\alpha^1$ and minimizing with respect to $\alpha^1$] yields $f = 5.91 \times 10^{-10}$ at $\alpha^1 = 0.4986$. Hence
$$x^2 = \begin{bmatrix} 3.0000 \\ 5.0000 \end{bmatrix}$$
which is the optimum (reached in 2 steps, in agreement with the theory).
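The two conjugate-gradient steps above can be checked numerically; the following sketch (assuming NumPy) reproduces the slide's numbers up to rounding.

```python
import numpy as np

f    = lambda x: (x[0] - 3.0)**2 + 9.0 * (x[1] - 5.0)**2
grad = lambda x: np.array([2.0 * (x[0] - 3.0), 18.0 * (x[1] - 5.0)])
H    = np.diag([2.0, 18.0])                 # constant Hessian

x0 = np.array([1.0, 1.0])
g0 = grad(x0)                               # [-4, -72]
s0 = -g0

alpha0 = -(g0 @ s0) / (s0 @ H @ s0)         # exact 1-D search: ~0.0557
x1 = x0 + alpha0 * s0                       # ~[1.223, 5.011]

g1 = grad(x1)                               # ~[-3.554, 0.197]
beta0 = (g1 @ g1) / (g0 @ g0)               # ~0.00244
s1 = -g1 + beta0 * s0                       # ~[3.564, -0.022]

alpha1 = -(g1 @ s1) / (s1 @ H @ s1)         # ~0.499
x2 = x1 + alpha1 * s1
print(x2, f(x2))                            # -> [3., 5.], f ~ 0
```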
For a quadratic function the gradients at successive points satisfy $\nabla f(x^{k+1}) - \nabla f(x^k) = H\,\Delta x^k = \alpha^k H s^k$, so

$$s^k = \frac{1}{\alpha^k}\,H^{-1}\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]$$

$$(s^k)^T = \frac{1}{\alpha^k}\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]^T H^{-1}$$

Using the definition of conjugate directions, $(s^k)^T H s^{k+1} = 0$:

$$\frac{1}{\alpha^k}\left[\nabla f(x^{k+1}) - \nabla f(x^k)\right]^T H^{-1} H \left[-\nabla f(x^{k+1}) + \beta^k s^k\right] = 0$$

Using $\nabla^T f(x^{k+1})\,\nabla f(x^k) = 0$ and $\nabla^T f(x^{k+1})\,s^k = 0$, and solving for the weighting factor:

$$\beta^k = \frac{\nabla^T f(x^{k+1})\,\nabla f(x^{k+1})}{\nabla^T f(x^k)\,\nabla f(x^k)}$$

$$s^{k+1} = -\nabla f(x^{k+1}) + \beta^k s^k$$
Newton Method: Linear vs. Quadratic Approximation of f(x)

$$f(x) \approx f(x^k) + \nabla^T f(x^k)(x - x^k) + \tfrac{1}{2}(x - x^k)^T H(x^k)(x - x^k)$$
$$\Delta x = x - x^k = \alpha^k s^k$$

(1) Using a linear approximation of $f(x)$:
$$\frac{df(x)}{d(\Delta x)} = 0 = \nabla f(x^k), \quad \text{so we cannot solve for } \Delta x!$$

(2) Using a quadratic approximation of $f(x)$:
$$\frac{df(x)}{d(\Delta x)} = 0 = \nabla f(x^k) + H(x^k)(x - x^k)$$
or
$$x = x^k - H^{-1}(x^k)\,\nabla f(x^k), \qquad \text{with } x = x^{k+1}$$
Newton's method solves one of these two forms (simultaneous equation solving).
Note: Both the direction and the step length are determined.
- Requires second derivatives (the Hessian)
- $H$ and $H^{-1}$ must be positive definite (for a minimum) to guarantee convergence
- Iterate if $f(x)$ is not quadratic

Modified Newton's procedure:
$$x^{k+1} = x^k - \alpha^k H^{-1}(x^k)\,\nabla f(x^k)$$
$\alpha^k = 1$ for Newton's method. (If $H = I$, you have steepest descent.)

Example
$$f(x) = x_1^2 + 20 x_2^2$$
Minimize $f$ starting at $x^0 = [1 \;\; 1]^T$; a short numerical sketch follows below.
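A short numerical sketch of the example (assuming NumPy). Because f is quadratic, a single full Newton step (alpha = 1) lands exactly on the minimum.

```python
import numpy as np

# Example from the slide: f(x) = x1^2 + 20*x2^2, started at x0 = [1, 1]
grad = lambda x: np.array([2.0 * x[0], 40.0 * x[1]])
H    = np.array([[ 2.0,  0.0],
                 [ 0.0, 40.0]])     # constant, positive-definite Hessian

x = np.array([1.0, 1.0])
step = -np.linalg.solve(H, grad(x)) # solve H*dx = -grad rather than inverting H
x = x + step                        # full Newton step (alpha = 1)
print(x, grad(x))                   # [0, 0]: one step suffices for a quadratic
```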
Marquardt's Method

If $H(x)$ or $H^{-1}(x)$ is not always positive definite, make it positive definite.

Let $\tilde{H}(x) = \left[H(x) + \beta I\right]$; similarly for $\tilde{H}^{-1}(x)$.

$\beta$ is a positive constant large enough to shift all the negative eigenvalues of $H(x)$.

Example
At the start of the search, $H(x)$ is evaluated at $x^0$ and found to be
$$H(x^0) = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$$
which is not positive definite, as the eigenvalues are $e_1 = 3$, $e_2 = -1$.

Modify $H(x^0)$ (with $\beta = 2$) to be
$$\tilde{H} = \begin{bmatrix} 1 + 2 & 2 \\ 2 & 1 + 2 \end{bmatrix}$$
which is positive definite, as the eigenvalues are $e_1 = 5$, $e_2 = 1$.

$\beta$ is adjusted as the search proceeds.
Step 1
Pick $x^0$, the starting point. Let $\varepsilon$ = convergence criterion.

Step 2
Set $k = 0$. Let $\beta^0 = 10^3$.

Step 3
Calculate $\nabla f(x^k)$.

Step 4
Is $\left\|\nabla f(x^k)\right\| < \varepsilon$? If yes, terminate. If no, continue.
Step 5
Calculate $s(x^k) = -\left[H^k + \beta^k I\right]^{-1} \nabla f(x^k)$.

Step 6
Calculate $x^{k+1} = x^k + s(x^k)$.

Step 7
Is $f(x^{k+1}) < f(x^k)$? If yes, go to step 8. If no, go to step 9.

Step 8
Set $\beta^{k+1} = \tfrac{1}{4}\beta^k$ and $k = k + 1$. Go to step 3.

Step 9
Set $\beta^k = 2\beta^k$. Go to step 5.

A compact sketch of the full loop follows below.
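A compact sketch of steps 1 through 9 (assuming NumPy). The callables obj, grad, and hess and the default tolerances are illustrative assumptions; $\beta^0 = 10^3$ is taken from step 2.

```python
import numpy as np

def marquardt(obj, grad, hess, x0, eps=1e-6, beta0=1.0e3, max_iter=200):
    """Minimal sketch of the loop in steps 1-9 above (helper names assumed)."""
    x = np.asarray(x0, dtype=float)          # Step 1
    beta = beta0                             # Step 2
    for _ in range(max_iter):
        g = grad(x)                          # Step 3
        if np.linalg.norm(g) < eps:          # Step 4
            break
        while True:
            H_mod = hess(x) + beta * np.eye(x.size)
            s = -np.linalg.solve(H_mod, g)   # Step 5
            x_new = x + s                    # Step 6
            if obj(x_new) < obj(x):          # Step 7
                beta *= 0.25                 # Step 8: beta^{k+1} = beta^k / 4
                x = x_new
                break
            beta *= 2.0                      # Step 9: beta^k = 2*beta^k
    return x

# Example usage on f(x) = x1^2 + 20*x2^2 (the test function from the Newton example)
obj  = lambda x: x[0]**2 + 20.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 40.0 * x[1]])
hess = lambda x: np.diag([2.0, 40.0])
print(marquardt(obj, grad, hess, [1.0, 1.0]))   # -> approximately [0, 0]
```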
Secant Methods

Recall that for a one-dimensional search the secant method uses only values of $f(x)$ and $f'(x)$:
$$x^{k+1} = x^k - \left[\frac{f'(x^k) - f'(x^p)}{x^k - x^p}\right]^{-1} f'(x^k)$$
That is, $f''(x)$ is approximated by the slope of a straight line (the secant) through the derivatives at two points; hence the name "quasi-Newton" method.

The basic idea (for a quadratic function):
$$\nabla f(x^k) + H\,\Delta x = 0 \quad \text{or} \quad (x^{k+1} - x^k) = -H^{-1}\nabla f(x^k)$$

Pick two points to start ($x^k$ = reference point):
$$\nabla f(x^2) = \nabla f(x^k) + H(x^2 - x^k)$$
$$\nabla f(x^1) = \nabla f(x^k) + H(x^1 - x^k)$$
$$\nabla f(x^2) - \nabla f(x^1) \equiv y = H(x^2 - x^1)$$
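A one-dimensional illustration of the secant update above, applied to f'(x) = 0 for a simple quadratic (the function and starting points are assumptions chosen for illustration):

```python
# One-dimensional secant iteration for f'(x) = 0 (a sketch of the update above).
def secant_min(fprime, x_prev, x_curr, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        g_prev, g_curr = fprime(x_prev), fprime(x_curr)
        if abs(g_curr) < tol or g_curr == g_prev:
            break
        # the finite-difference slope stands in for f''(x^k)
        x_next = x_curr - g_curr * (x_curr - x_prev) / (g_curr - g_prev)
        x_prev, x_curr = x_curr, x_next
    return x_curr

# Example: minimize f(x) = (x - 2)^2 + 1, i.e. solve f'(x) = 2(x - 2) = 0
print(secant_min(lambda x: 2.0 * (x - 2.0), 0.0, 1.0))   # -> 2.0
```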
For a nonquadratic function, an approximation $\tilde{H}$ would be calculated after taking a step from $x^k$ to $x^{k+1}$ by solving the secant equations
$$y^k = \tilde{H}\,\Delta x^k \quad \text{or} \quad \Delta x^k = \tilde{H}^{-1} y^k$$

- An infinite number of candidates exist for $\tilde{H}$ when $n > 1$.
- We want to choose $\tilde{H}$ (or $\tilde{H}^{-1}$) close to $H$ (or $H^{-1}$) in some sense. Several methods can be used to update $\tilde{H}$.
• Probably the best update formula is the BFGS update (Broyden-Fletcher-Goldfarb-Shanno), ca. 1970.

• BFGS is the basis for the unconstrained optimizer in the Excel Solver.

• It does not require inverting the Hessian matrix; instead it approximates the inverse using values of $\nabla f$. A sketch of the inverse update is given below.
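The deck does not show the BFGS formula itself. The following sketch uses the standard BFGS update of the inverse-Hessian approximation (a textbook formula, not taken from these slides), together with a crude backtracking line search; the function names and tolerances are illustrative assumptions.

```python
import numpy as np

def bfgs_inverse_update(H_inv, dx, dy):
    """Standard BFGS update of the inverse-Hessian approximation.
    dx = x^{k+1} - x^k,  dy = grad_f(x^{k+1}) - grad_f(x^k)."""
    rho = 1.0 / (dy @ dx)                 # requires dy^T dx > 0 (curvature condition)
    I = np.eye(len(dx))
    V = I - rho * np.outer(dx, dy)
    return V @ H_inv @ V.T + rho * np.outer(dx, dx)

def bfgs(obj, grad, x0, iters=100, tol=1e-8):
    """Quasi-Newton iteration using only function and gradient values."""
    x = np.asarray(x0, dtype=float)
    H_inv = np.eye(x.size)                # start from the identity
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        s = -H_inv @ g                    # quasi-Newton search direction
        alpha = 1.0
        while obj(x + alpha * s) >= obj(x) and alpha > 1e-12:
            alpha *= 0.5                  # crude backtracking line search
        x_new = x + alpha * s
        g_new = grad(x_new)
        if (g_new - g) @ (x_new - x) > 0: # keep H_inv positive definite
            H_inv = bfgs_inverse_update(H_inv, x_new - x, g_new - g)
        x, g = x_new, g_new
    return x

# Example on f(x) = x1^2 + 20*x2^2
obj  = lambda x: x[0]**2 + 20.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 40.0 * x[1]])
print(bfgs(obj, grad, [1.0, 1.0]))        # approaches the minimum at [0, 0]
```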
