
# Cheng-Liang Chen
PSE LABORATORY
Department of Chemical Engineering
National Taiwan University

## Numerical Methods for Unconstrained Optimization

**Analytical methods**: write the necessary conditions and solve them (analytically or numerically?) for candidate local minimum designs.

Some difficulties:

- The number of design variables and constraints can be large
- The functions of the design problem can be highly nonlinear
- In many applications, cost and/or constraint functions can be implicit in terms of the design variables

**Numerical methods**: estimate an initial design and improve it until the optimality conditions are satisfied.

## General Concepts Related to Numerical Algorithms

A general algorithm: iterate from a current estimate x⁽ᵏ⁾, k = 0, 1, 2, …:

    x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + Δx⁽ᵏ⁾ = x⁽ᵏ⁾ + αₖ d⁽ᵏ⁾

- Subproblem 1: find the search direction d⁽ᵏ⁾
- Subproblem 2: find the step size αₖ

(Figure: one iteration of a general algorithm, moving from the current estimate x⁽ᵏ⁾ to the new estimate x⁽ᵏ⁺¹⁾.)

## The Descent Condition

A new estimate must reduce the cost function:

    f(x⁽ᵏ⁺¹⁾) = f(x⁽ᵏ⁾ + αₖ d⁽ᵏ⁾)
              ≈ f(x⁽ᵏ⁾) + ∇f(x⁽ᵏ⁾)·(αₖ d⁽ᵏ⁾)       (Taylor expansion)
              = f(x⁽ᵏ⁾) + αₖ (c⁽ᵏ⁾·d⁽ᵏ⁾)

so f(x⁽ᵏ⁺¹⁾) < f(x⁽ᵏ⁾) requires

    ∇f(x⁽ᵏ⁾)ᵀ d⁽ᵏ⁾ = c⁽ᵏ⁾·d⁽ᵏ⁾ < 0    (descent condition)

The angle between c⁽ᵏ⁾ and d⁽ᵏ⁾ must be between 90° and 270°.

## Example: Check the Descent Condition

    f(x) = x₁² − x₁x₂ + 2x₂² − 2x₁ + e^(x₁+x₂)
Verify whether d₁ = (1, 2) and d₂ = (1, 0) are descent directions at (0, 0):

    c = ∇f = ( 2x₁ − x₂ − 2 + e^(x₁+x₂),  −x₁ + 4x₂ + e^(x₁+x₂) ) |₍₀,₀₎ = (−1, 1)

    c·d₁ = (−1)(1) + (1)(2) = −1 + 2 = 1 > 0     (not a descent direction)
    c·d₂ = (−1)(1) + (1)(0) = −1 + 0 = −1 < 0    (a descent direction)
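This check is easy to reproduce numerically; a minimal Python sketch (the helper names are ours, not from the slides):

```python
# Descent-condition check c·d < 0 for
# f(x) = x1^2 - x1*x2 + 2*x2^2 - 2*x1 + exp(x1 + x2) at (0, 0).
import math

def grad_f(x1, x2):
    e = math.exp(x1 + x2)
    return (2*x1 - x2 - 2 + e, -x1 + 4*x2 + e)

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1]

c = grad_f(0.0, 0.0)        # (-1.0, 1.0)
d1, d2 = (1.0, 2.0), (1.0, 0.0)
print(dot(c, d1))           # 1.0  -> not a descent direction
print(dot(c, d2))           # -1.0 -> a descent direction
```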

## One-Dimensional Minimization: Reduction to a Function of a Single Variable

Assume a descent direction d⁽ᵏ⁾ has been found, and define

    f̄(α) = f(x⁽ᵏ⁾ + α d⁽ᵏ⁾)

By a Taylor expansion,

    f̄(α) ≈ f(x⁽ᵏ⁾) + α ∇f(x⁽ᵏ⁾)ᵀd⁽ᵏ⁾ = f̄(0) + α (c·d)

Since c·d < 0, a small move along d⁽ᵏ⁾ reduces f: f̄(α) < f̄(0) = f(x⁽ᵏ⁾).

The necessary conditions for the optimum step size αₖ are

    df̄(αₖ)/dα = 0,    d²f̄(αₖ)/dα² > 0

Differentiating with the chain rule,

    0 = df̄(αₖ)/dα = (df/dx)ᵀ(dx/dα) = ∇f(x⁽ᵏ⁺¹⁾)ᵀ d⁽ᵏ⁾ = c⁽ᵏ⁺¹⁾·d⁽ᵏ⁾

The gradient of the cost function at the NEW point, c⁽ᵏ⁺¹⁾, is orthogonal to the current search direction d⁽ᵏ⁾.
## Example: Analytical Step Size Determination

For f(x) = 3x₁² + 2x₁x₂ + 2x₂² + 7, check d⁽ᵏ⁾ = (−1, −1) at x⁽ᵏ⁾ = (1, 2) and find the step size:

    c⁽ᵏ⁾ = ∇f(x⁽ᵏ⁾) = ( 6x₁ + 2x₂, 2x₁ + 4x₂ ) |ₓ₍ₖ₎ = (10, 10)

    c⁽ᵏ⁾·d⁽ᵏ⁾ = (10)(−1) + (10)(−1) = −20 < 0    (a descent direction)

    x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + α d⁽ᵏ⁾ = (1 − α, 2 − α)

    f(x⁽ᵏ⁺¹⁾) = 3(1 − α)² + 2(1 − α)(2 − α) + 2(2 − α)² + 7 = 7α² − 20α + 22 ≡ f̄(α)

    df̄/dα = 14αₖ − 20 = 0  ⟹  αₖ = 10/7,    d²f̄/dα² = 14 > 0

    x⁽ᵏ⁺¹⁾ = (1, 2) + (10/7)(−1, −1) = (−3/7, 4/7)

    f(x⁽ᵏ⁺¹⁾) = 54/7 < 22 = f(x⁽ᵏ⁾)

Check: ∇f(x⁽ᵏ⁺¹⁾) = (−10/7, 10/7), so

    ∇f(x⁽ᵏ⁺¹⁾)ᵀd⁽ᵏ⁾ = (−10/7)(−1) + (10/7)(−1) = 0  ✓
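The same step-size calculation can be done in exact arithmetic (a sketch; the function and direction are the ones from this example):

```python
# Exact step size along d = (-1, -1) from x = (1, 2) for
# f(x) = 3*x1^2 + 2*x1*x2 + 2*x2^2 + 7, where f(alpha) = 7a^2 - 20a + 22.
from fractions import Fraction

def grad(x1, x2):
    return (6*x1 + 2*x2, 2*x1 + 4*x2)

alpha = Fraction(10, 7)              # root of df/dalpha = 14*alpha - 20
x_new = (1 - alpha, 2 - alpha)       # (-3/7, 4/7)
g = grad(*x_new)                     # (-10/7, 10/7)
orth = g[0]*(-1) + g[1]*(-1)         # gradient orthogonal to search direction
print(x_new, orth)                   # (Fraction(-3, 7), Fraction(4, 7)) 0
```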

## Numerical Methods to Compute Step Size

Most one-dimensional search methods work only for unimodal functions. Take α_ℓ = 0 and assume α* ∈ [α_ℓ, α_u]; the interval (α_ℓ, α_u) is called the interval of uncertainty.

## Unimodal Function

A function f is unimodal on [0, 1] with minimum at x* if x₁ < x₂ < x* implies f(x₁) > f(x₂), and x* < x₃ < x₄ implies f(x₃) < f(x₄).

Outcome of two experiments at 0 < x₁ < x₂ < 1:

- f₁ < f₂  ⟹  x* ∈ [0, x₂]
- f₁ > f₂  ⟹  x* ∈ [x₁, 1]
- f₁ = f₂  ⟹  x* ∈ [x₁, x₂]

## Equal Interval Search

Goal: successively reduce the interval of uncertainty I = α_u − α_ℓ to a small acceptable value (start with α_ℓ = 0).

Evaluate the function at α = 0, δ, 2δ, 3δ, … and compare successive values:

- If f((q + 1)δ) < f(qδ), continue marching (the new point becomes the current point).
- Otherwise stop: set α_ℓ = (q − 1)δ and α_u = (q + 1)δ; the minimum lies in [α_ℓ, α_u].

## Equal Interval Search: Example

    f(α) = 2 − 4α + e^α,    δ = 0.5,    ε = 0.001

Note (bracketing illustration): for f(x) = x(x − 1.5) on x ∈ [0, 1], sampling at xᵢ = 0.1, …, 0.9 gives

| xᵢ    | .1   | .2   | .3   | .4   | .5   | .6   | .7   | .8   | .9   |
|-------|------|------|------|------|------|------|------|------|------|
| f(xᵢ) | −.14 | −.26 | −.36 | −.44 | −.50 | −.54 | −.56 | −.56 | −.54 |

Since f(x₇) = f(x₈), the minimum lies in [x₇, x₈] = [0.7, 0.8]. Using 99 equally spaced points would eliminate 98% of the interval in total, i.e., only about 1% per function evaluation.

Iteration history (δ is refined by a factor of 10 after each bracketing):

| No. | δ | Trial step α | f(α) |
|-----|-----|----------|----------|
| 1  | 0.5 | 0.000000 | 3.000000 |
| 2  |     | 0.500000 | 1.648721 |
| 3  |     | 1.000000 | 0.718282 |
| 4  |     | 1.500000 | 0.481689 |
| 5  |     | 2.000000 | 1.389056 |
| 6  | 0.05 (start from α = 1.0) | 1.050000 | 0.657651 |
| 7  |     | 1.100000 | 0.604166 |
| 8  |     | 1.150000 | 0.558193 |
| 9  |     | 1.200000 | 0.520117 |
| 10 |     | 1.250000 | 0.490343 |
| 11 |     | 1.300000 | 0.469297 |
| 12 |     | 1.350000 | 0.457426 |
| 13 |     | 1.400000 | 0.455200 |
| 14 |     | 1.450000 | 0.463115 |
| 15 | 0.005 (start from α = 1.35) | 1.355000 | 0.456761 |
| 16 |     | 1.360000 | 0.456193 |
| 17 |     | 1.365000 | 0.455723 |
| 18 |     | 1.370000 | 0.455351 |
| 19 |     | 1.375000 | 0.455077 |
| 20 |     | 1.380000 | 0.454902 |
| 21 |     | 1.385000 | 0.454826 |
| 22 |     | 1.390000 | 0.454850 |
| 23 | 0.0005 (start from α = 1.38) | 1.380500 | 0.454890 |
| 24 |     | 1.381000 | 0.454879 |
| 25 |     | 1.381500 | 0.454868 |
| 26 |     | 1.382000 | 0.454859 |
| 27 |     | 1.382500 | 0.454851 |
| 28 |     | 1.383000 | 0.454844 |
| 29 |     | 1.383500 | 0.454838 |
| 30 |     | 1.384000 | 0.454833 |
| 31 |     | 1.384500 | 0.454829 |
| 32 |     | 1.385000 | 0.454826 |
| 33 |     | 1.385500 | 0.454824 |
| 34 |     | 1.386000 | 0.454823 |
| 35 |     | 1.386500 | 0.454823 |
| 36 |     | 1.387000 | 0.454824 |
| 37 |     | 1.386500 | 0.454823 |

Each phase brackets the minimum ([α_ℓ, α_u] = [1.0, 2.0], [1.35, 1.45], [1.38, 1.39], [1.386, 1.387]) before δ is refined; row 37 is the accepted solution α* ≈ 1.3865, f(α*) ≈ 0.454823.
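The table can be generated by a few lines of Python; a sketch of the phase-refinement scheme used above (helper names are ours):

```python
import math

def equal_interval(f, delta=0.5, final_delta=1e-4, alpha0=0.0):
    """March forward in steps of delta while f decreases; back up one
    step and shrink delta by 10 (as in the table: 0.5, 0.05, ...)."""
    a = alpha0
    while delta >= final_delta:
        while f(a + delta) < f(a):
            a += delta
        a = max(alpha0, a - delta)   # minimum lies in [a, a + 2*delta]
        delta /= 10.0
    return a

f = lambda a: 2 - 4*a + math.exp(a)
alpha = equal_interval(f)            # ~1.386; exact minimizer is ln 4 = 1.38629...
```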

## Equal Interval Search: Two Interior Points

    α_a = α_ℓ + (1/3)(α_u − α_ℓ) = (1/3)(α_u + 2α_ℓ)
    α_b = α_ℓ + (2/3)(α_u − α_ℓ) = (1/3)(2α_u + α_ℓ)

Each pass keeps I′ = (2/3)I.

Two interior points eliminate only 16.7% of the interval per function evaluation?!
Trying 3 interior points (of which 2 are new each pass) eliminates 25% per function evaluation!! Why?

## Golden Section Search

Question about equal interval search (n = 2 interior points): the known midpoint is NOT reused in the next iteration. Solution: golden section search.

Fibonacci sequence:

    F₀ = 1;  F₁ = 1;  Fₙ = Fₙ₋₁ + Fₙ₋₂,  n = 2, 3, …

    1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

    Fₙ/Fₙ₋₁ → 1.618,    Fₙ₋₁/Fₙ → 0.618   as n → ∞

Reduction of the interval of uncertainty: given α_ℓ, α_u, and I = α_u − α_ℓ, select interior points α_a, α_b such that

    (α_u − α_a)/I = (α_b − α_ℓ)/I = τ

Suppose f(α_b) > f(α_a): then the minimum lies in [α_ℓ, α_b].

Delete [α_b, α_u]: the new interval is

    α_ℓ′ = α_ℓ,   α_u′ = α_b,   I′ = α_u′ − α_ℓ′ = τI

The old interior point α_a becomes the new α_b′, so only one new function evaluation is needed. For the scheme to be self-similar, α_b′ − α_ℓ′ = τI′ must equal α_a − α_ℓ = (1 − τ)I:

    (1 − τ)I = τI′ = τ(τI)   ⟹   τ² + τ − 1 = 0

    τ = (−1 + √5)/2 ≈ 0.618,    1 − τ ≈ 0.382,    1/τ ≈ 1.618

## Initial Bracketing of the Minimum

Q: how to find the initial three points? Starting at α = 0, evaluate the function at increasing trial steps

    α_q = Σⱼ₌₀^q δ(1.618)ʲ = α_{q−1} + δ(1.618)^q,   q = 0, 1, 2, …

    q = 0:  α₀ = δ
    q = 1:  α₁ = δ + 1.618δ = 2.618δ
    q = 2:  α₂ = 2.618δ + 1.618²δ = 5.236δ
    q = 3:  α₃ = 5.236δ + 1.618³δ = 9.472δ
    ⋮

so the ratio of increase of the trial step size is 1.618. If f(α_{q−1}) < f(α_q), the minimum is bracketed:

    α_ℓ = α_{q−2},   α_a = α_{q−1},   α_u = α_q,   I = α_u − α_ℓ

    α_a − α_ℓ = α_{q−1} − α_{q−2} = δ(1.618)^{q−1} = 0.382 I
    α_u − α_a = α_q − α_{q−1} = δ(1.618)^q = 0.618 I = 1.618 (α_a − α_ℓ)

i.e., α_{q−1} already sits at a golden-section point of [α_{q−2}, α_q].
## Golden Section Search: Algorithm

Step 1 (initial bracketing): for q = 0, 1, 2, …, compute α_q = Σⱼ₌₀^q δ(1.618)ʲ; stop at the first q with f(α_{q−1}) < f(α_q) and set

    α_ℓ = α_{q−2} = Σⱼ₌₀^{q−2} δ(1.618)ʲ,   α_u = α_q
    I = α_u − α_ℓ = δ(1.618)^{q−1} + δ(1.618)^q = 2.618 δ(1.618)^{q−1}

Step 2: compute f(α_b), where α_b = α_ℓ + 0.618I (note α_a = α_ℓ + 0.382I = α_{q−1}, and f(α_a) is already known).
Step 3: compare f(α_a) and f(α_b) and go to Step 4, 5, or 6.
Step 4: if f(α_a) < f(α_b), then α_ℓ ≤ α* ≤ α_b. Set α_ℓ′ = α_ℓ, α_u′ = α_b, α_b′ = α_a, α_a′ = α_ℓ′ + 0.382(α_u′ − α_ℓ′); go to Step 7.
Step 5: if f(α_a) > f(α_b), then α_a ≤ α* ≤ α_u. Set α_ℓ′ = α_a, α_u′ = α_u, α_a′ = α_b, α_b′ = α_ℓ′ + 0.618(α_u′ − α_ℓ′); go to Step 7.
Step 6: if f(α_a) = f(α_b), then α_a ≤ α* ≤ α_b. Set α_ℓ = α_a, α_u = α_b and return to Step 2.
Step 7: if I′ = α_u′ − α_ℓ′ < ε, stop with α* = (α_u′ + α_ℓ′)/2. Otherwise drop the primes and return to Step 3.
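The algorithm above can be sketched directly in Python (our own transcription; it assumes f decreases initially so the bracketing phase succeeds):

```python
import math

def golden_section(f, delta=0.5, eps=0.001):
    """Bracket with steps delta*(1.618)^j, then shrink the interval of
    uncertainty by tau = 0.618 with one new evaluation per iteration."""
    tau = (math.sqrt(5) - 1) / 2                  # 0.618...
    pts = [0.0, delta]                            # alpha_0 = delta
    j = 1
    while f(pts[-1]) < f(pts[-2]):
        pts.append(pts[-1] + delta * (1/tau)**j)  # alpha_q = alpha_{q-1} + delta*1.618^q
        j += 1
    lo, hi = pts[-3], pts[-1]                     # [alpha_{q-2}, alpha_q]
    a = lo + (1 - tau)*(hi - lo); fa = f(a)       # interior points at 0.382 I
    b = lo + tau*(hi - lo);       fb = f(b)       # and 0.618 I
    while hi - lo > eps:
        if fa < fb:                               # minimum in [lo, b]
            hi, b, fb = b, a, fa
            a = lo + (1 - tau)*(hi - lo); fa = f(a)
        else:                                     # minimum in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + tau*(hi - lo); fb = f(b)
    return 0.5*(lo + hi)

f = lambda a: 2 - 4*a + math.exp(a)
alpha = golden_section(f)     # ~1.386, matching the example tables below
```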

## Golden Section Search: Example

    f(α) = 2 − 4α + e^α,    δ = 0.5,    ε = 0.001

Initial bracketing:

| Trial | α        | f(α)     |
|-------|----------|----------|
| 1     | 0.000000 | 3.000000 |
| 2     | 0.500000 | 1.648721 |
| 3     | 1.309017 | 0.466464 |
| 4     | 2.618034 | 5.236610 |

Interval reduction (function values in brackets):

| No. | α_ℓ [f] | α_a [f] | α_b [f] | α_u [f] | I |
|---|---|---|---|---|---|
| 1 | 0.500000 [1.648721] | 1.309017 [0.466464] | 1.809017 [0.868376] | 2.618034 [5.236610] | 2.118034 |
| 2 | 0.500000 [1.648721] | 1.000000 [0.718282] | 1.309017 [0.466464] | 1.809017 [0.868376] | 1.309017 |
| 3 | 1.000000 [0.718282] | 1.309017 [0.466464] | 1.500000 [0.481689] | 1.809017 [0.868376] | 0.809017 |
| 4 | 1.000000 [0.718282] | 1.190983 [0.526382] | 1.309017 [0.466464] | 1.500000 [0.481689] | 0.500000 |
| 5 | 1.190983 [0.526382] | 1.309017 [0.466464] | 1.381966 [0.454860] | 1.500000 [0.481689] | 0.309017 |
| 6 | 1.309017 [0.466464] | 1.381966 [0.454860] | 1.427051 [0.458190] | 1.500000 [0.481689] | 0.190983 |
| 7 | 1.309017 [0.466464] | 1.354102 [0.456873] | 1.381966 [0.454860] | 1.427051 [0.458190] | 0.118034 |
| 8 | 1.354102 [0.456873] | 1.381966 [0.454860] | 1.399187 [0.455156] | 1.427051 [0.458190] | 0.072949 |

Continued, f(α) = 2 − 4α + e^α, δ = 0.5, ε = 0.001:

| No. | α_ℓ [f] | α_a [f] | α_b [f] | α_u [f] | I |
|---|---|---|---|---|---|
| 9  | 1.354102 [0.456873] | 1.371323 [0.455269] | 1.381966 [0.454860] | 1.399187 [0.455156] | 0.045085 |
| 10 | 1.371323 [0.455269] | 1.381966 [0.454860] | 1.388544 [0.454833] | 1.399187 [0.455156] | 0.027864 |
| 11 | 1.381966 [0.454860] | 1.388544 [0.454833] | 1.392609 [0.454902] | 1.399187 [0.455156] | 0.017221 |
| 12 | 1.381966 [0.454860] | 1.386031 [0.454823] | 1.388544 [0.454833] | 1.392609 [0.454902] | 0.010643 |
| 13 | 1.381966 [0.454860] | 1.384479 [0.454829] | 1.386031 [0.454823] | 1.388544 [0.454833] | 0.006570 |
| 14 | 1.384479 [0.454829] | 1.386031 [0.454823] | 1.386991 [0.454824] | 1.388544 [0.454833] | 0.004065 |
| 15 | 1.384479 [0.454829] | 1.385438 [0.454824] | 1.386031 [0.454823] | 1.386991 [0.454823] | 0.002512 |

The search continues until I < ε; the minimum is near α* ≈ 1.386.

## Golden Section Search: Example

    f(x) = x(x − 1.5),    δ = 0.25,    ε = 0.001

Initial bracketing:

| Trial | x        | f(x)     |
|-------|----------|----------|
| 1     | 0        | 0        |
| 2     | 0.25     | −0.3125  |
| 3     | 0.6545   | −0.55338 |
| 4     | 1.308981 | −0.25004 |

Interval reduction (function values in brackets):

| No. | x_ℓ [f] | x_a [f] | x_b [f] | x_u [f] | I |
|---|---|---|---|---|---|
| 1 | 0.250000000 [−0.312500000] | 0.654530742 [−0.553385621] | 0.904450258 [−0.538645118] | 1.308981000 [−0.250040242] | 1.058981000 |
| 2 | 0.250000000 [−0.312500000] | 0.499999999 [−0.499999999] | 0.654450259 [−0.553370247] | 0.904450258 [−0.538645118] | 0.654450258 |
| 3 | 0.499999999 [−0.499999999] | 0.654499998 [−0.553379750] | 0.749950259 [−0.562499998] | 0.904450258 [−0.538645118] | 0.404450259 |
| 4 | 0.654499998 [−0.553379750] | 0.749980997 [−0.562500000] | 0.808969259 [−0.559022627] | 0.904450258 [−0.538645118] | 0.249950260 |
| 5 | 0.654499998 [−0.553379750] | 0.713507255 [−0.561168280] | 0.749962001 [−0.562499999] | 0.808969259 [−0.559022627] | 0.154469261 |
| 6 | 0.713507255 [−0.561168280] | 0.749973741 [−0.562499999] | 0.772502773 [−0.561993625] | 0.808969259 [−0.559022627] | 0.095462003 |
| 7 | 0.713507255 [−0.561168280] | 0.736043543 [−0.562305217] | 0.749966485 [−0.562499999] | 0.772502773 [−0.561993625] | 0.058995518 |
| 8 | 0.736043543 [−0.562305217] | 0.749970969 [−0.562499999] | 0.758575347 [−0.562426463] | 0.772502773 [−0.561993625] | 0.036459230 |

| No. | x_ℓ [f] | x_a [f] | x_b [f] | x_u [f] | I |
|---|---|---|---|---|---|
| 9  | 0.736043543 [−0.562305217] | 0.744650692 [−0.562471385] | 0.749968198 [−0.562499999] | 0.758575347 [−0.562426463] | 0.022531804 |
| 10 | 0.744650692 [−0.562471385] | 0.749969911 [−0.562499999] | 0.753256129 [−0.562489398] | 0.758575347 [−0.562426463] | 0.013924655 |
| 11 | 0.744650692 [−0.562471385] | 0.747937969 [−0.562495748] | 0.749968852 [−0.562499999] | 0.753256129 [−0.562489398] | 0.008605437 |
| 12 | 0.747937969 [−0.562495748] | 0.749969506 [−0.562499999] | 0.751224592 [−0.562498500] | 0.753256129 [−0.562489398] | 0.005318160 |
| 13 | 0.747937969 [−0.562495748] | 0.749193459 [−0.562499349] | 0.749969102 [−0.562499999] | 0.751224592 [−0.562498500] | 0.003286623 |
| 14 | 0.749193459 [−0.562499349] | 0.749969352 [−0.562499999] | 0.750448699 [−0.562499799] | 0.751224592 [−0.562498500] | 0.002031133 |
| 15 | 0.749193459 [−0.562499349] | 0.749672961 [−0.562499893] | 0.749969198 [−0.562499999] | 0.750448699 [−0.562499799] | 0.001255240 |
| 16 | 0.749672961 [−0.562499893] | 0.749969293 [−0.562499999] | 0.750152367 [−0.562499977] | 0.750448699 [−0.562499799] | 0.000775738 |

The search converges to x* ≈ 0.75 with f(x*) = −0.5625.

## Polynomial Interpolation

Approximate f(α) by a quadratic q(α) = a₀ + a₁α + a₂α² fitted to three points α_ℓ < α_i < α_u:

    f(α_ℓ) = q(α_ℓ) = a₀ + a₁α_ℓ + a₂α_ℓ²
    f(α_i) = q(α_i) = a₀ + a₁α_i + a₂α_i²
    f(α_u) = q(α_u) = a₀ + a₁α_u + a₂α_u²

Solving for the coefficients,

    a₂ = [ (f(α_u) − f(α_ℓ))/(α_u − α_ℓ) − (f(α_i) − f(α_ℓ))/(α_i − α_ℓ) ] / (α_u − α_i)
    a₁ = (f(α_i) − f(α_ℓ))/(α_i − α_ℓ) − a₂(α_i + α_ℓ)
    a₀ = f(α_ℓ) − a₁α_ℓ − a₂α_ℓ²

The minimizer of q(α) is

    dq(ᾱ)/dα = a₁ + 2a₂ᾱ = 0  ⟹  ᾱ = −a₁/(2a₂),   valid if d²q/dα² = 2a₂ > 0

Computational algorithm:

Step 1: locate the initial interval of uncertainty (α_ℓ, α_u).
Step 2: select α_ℓ < α_i < α_u and compute f(α_i).
Step 3: compute a₀, a₁, a₂, ᾱ, and f(ᾱ).
Step 4: form the new interval and three points:

- ᾱ < α_i and f(ᾱ) < f(α_i): new interval [α_ℓ, α_i], points α_ℓ, ᾱ, α_i
- ᾱ < α_i and f(ᾱ) > f(α_i): new interval [ᾱ, α_u], points ᾱ, α_i, α_u
- ᾱ > α_i and f(ᾱ) < f(α_i): new interval [α_i, α_u], points α_i, ᾱ, α_u
- ᾱ > α_i and f(ᾱ) > f(α_i): new interval [α_ℓ, ᾱ], points α_ℓ, α_i, ᾱ

Step 5: stop if two successive estimates of the minimum point of f(α) are sufficiently close. Otherwise drop the primes on α_ℓ′, α_i′, α_u′ and return to Step 3.

Example: f(α) = 2 − 4α + e^α, δ = 0.5.

Iteration 1: α_ℓ = 0.5, α_i = 1.309017, α_u = 2.618034;
f(α_ℓ) = 1.648721, f(α_i) = 0.466464, f(α_u) = 5.236610.

    a₂ = (1/1.30902)( 3.5879/2.1180 − (−1.1823)/0.80902 ) = 2.410
    a₁ = −1.1823/0.80902 − (2.410)(1.80902) = −5.821
    a₀ = 1.648721 − (−5.821)(0.50) − 2.410(0.25) = 3.957

    ᾱ = −a₁/(2a₂) = 1.2077 < α_i,    f(ᾱ) = 0.5149 > f(α_i)

so α_ℓ′ = ᾱ = 1.2077, α_i′ = α_i = 1.309017, α_u′ = α_u = 2.618034.

Iteration 2: α_ℓ = 1.2077, α_i = 1.309017, α_u = 2.618034;
f(α_ℓ) = 0.5149, f(α_i) = 0.466464, f(α_u) = 5.236610.

    a₂ = 2.713,   a₁ = −7.30547,   a₀ = 5.3807
    ᾱ = 1.3464,   f(ᾱ) = 0.4579
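One interpolation step is a direct transcription of the coefficient formulas (a sketch; the function name is ours):

```python
import math

def quad_min(al, ai, au, fl, fi, fu):
    """Minimizer of q(a) = a0 + a1*a + a2*a^2 through three points."""
    a2 = ((fu - fl)/(au - al) - (fi - fl)/(ai - al)) / (au - ai)
    a1 = (fi - fl)/(ai - al) - a2*(ai + al)
    return -a1 / (2*a2)          # valid when a2 > 0

f = lambda a: 2 - 4*a + math.exp(a)
al, ai, au = 0.5, 1.309017, 2.618034
abar = quad_min(al, ai, au, f(al), f(ai), f(au))
print(round(abar, 4))            # 1.2077, the first iteration above
```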

## Multi-Dimensional Minimization: Powell's Conjugate Directions Method

Conjugate directions: let A be an n × n symmetric matrix. A set of n vectors (directions) {Sᵢ} is said to be A-conjugate if

    SᵢᵀA Sⱼ = 0   for i, j = 1, …, n;  i ≠ j

Note: orthogonal directions are a special case of conjugate directions (A = I).

Quadratically convergent method: if a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.

Theorem: given a quadratic function of n variables and two parallel hyperplanes 1 and 2 of dimension k < n, let the constrained stationary points of the quadratic function in the hyperplanes be X₁ and X₂, respectively. Then the line joining X₁ and X₂ is conjugate to any line parallel to the hyperplanes.
Meaning: if X₁ and X₂ are the minima of Q obtained by searching along the direction S from two different starting points X_a and X_b, respectively, the line (X₁ − X₂) will be conjugate to S.

Proof: let

    Q(X) = (1/2)XᵀAX + BᵀX + C,    ∇Q(X) = AX + B

At the line minima (stationary points along S), S is orthogonal to ∇Q(X₁) and ∇Q(X₂):

    Sᵀ∇Q(X₁) = SᵀAX₁ + SᵀB = 0
    Sᵀ∇Q(X₂) = SᵀAX₂ + SᵀB = 0

Subtracting:  Sᵀ[∇Q(X₁) − ∇Q(X₂)] = SᵀA(X₁ − X₂) = 0. ∎

Theorem: if a quadratic function

    Q(X) = (1/2)XᵀAX + BᵀX + C

is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum will be found at or before the nth step, irrespective of the starting point.

Proof: at the minimum, ∇Q(X*) = B + AX* = 0. Let X* = X₁ + Σⱼ₌₁ⁿ λⱼSⱼ, with the Sⱼ conjugate with respect to A. Then

    0 = B + AX₁ + A Σⱼ₌₁ⁿ λⱼSⱼ

Premultiplying by Sᵢᵀ and using conjugacy,

    0 = Sᵢᵀ(B + AX₁) + λᵢ SᵢᵀASᵢ   ⟹   λᵢ = −(B + AX₁)ᵀSᵢ / (SᵢᵀASᵢ)
## Powell's Conjugate Directions: Example

    f(x₁, x₂) = 6x₁² + 2x₂² − 6x₁x₂ − x₁ − 2x₂
              = (1/2)[x₁ x₂] [[12, −6], [−6, 4]] [x₁; x₂] + [−1 −2][x₁; x₂]

Take X₁ = (0, 0) and S₁ = (1, 2). Find S₂ = (s₁, s₂) conjugate to S₁:

    S₁ᵀAS₂ = [1 2] [[12, −6], [−6, 4]] [s₁; s₂] = 0·s₁ + 2·s₂ = 0
    ⟹ s₂ = 0;  take S₂ = (1, 0)

First search, along S₁:

    λ₁ = −(B + AX₁)ᵀS₁ / (S₁ᵀAS₁) = −[−1 −2][1; 2] / 4 = 5/4

    X₂ = X₁ + λ₁S₁ = (0, 0) + (5/4)(1, 2) = (5/4, 5/2)

Note: in practice X_{i+1} = Xᵢ + αᵢSᵢ, i = 1, …, n, where αᵢ is found by minimizing Q(Xᵢ + αSᵢ), so that Sᵢᵀ∇Q(X_{i+1}) = 0:

    ∇Q(X_{i+1}) = B + A(Xᵢ + αᵢSᵢ)
    0 = Sᵢᵀ{B + A(Xᵢ + αᵢSᵢ)} = (B + AXᵢ)ᵀSᵢ + αᵢ SᵢᵀASᵢ
    ⟹  αᵢ = −(B + AXᵢ)ᵀSᵢ / (SᵢᵀASᵢ)

Since Xᵢ = X₁ + Σⱼ₌₁^{i−1} αⱼSⱼ and the directions are conjugate,

    XᵢᵀASᵢ = X₁ᵀASᵢ + Σⱼ₌₁^{i−1} αⱼ SⱼᵀASᵢ = X₁ᵀASᵢ

so

    αᵢ = −(B + AX₁)ᵀSᵢ / (SᵢᵀASᵢ) = λᵢ

i.e., the sequential line-search steps coincide with the coefficients λᵢ in the proof above.

Second search, along S₂:

    λ₂ = −(B + AX₁)ᵀS₂ / (S₂ᵀAS₂) = −[−1 −2][1; 0] / 12 = 1/12

    X₃ = X₂ + λ₂S₂ = (5/4, 5/2) + (1/12)(1, 0) = (4/3, 5/2) = X* (?)

Check: ∇f(4/3, 5/2) = (12(4/3) − 6(5/2) − 1, −6(4/3) + 4(5/2) − 2) = (0, 0). ✓
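The whole example can be verified in exact arithmetic (our own transcription of the formulas above):

```python
# Verify: S1 = (1,2), S2 = (1,0) are A-conjugate for A = [[12,-6],[-6,4]],
# and two exact line searches from X1 = (0,0) reach the minimum (4/3, 5/2)
# of Q = (1/2)X'AX + B'X with B = (-1,-2).
from fractions import Fraction as Fr

A = [[Fr(12), Fr(-6)], [Fr(-6), Fr(4)]]
B = [Fr(-1), Fr(-2)]
Av  = lambda v: [A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1]]
dot = lambda u, v: u[0]*v[0] + u[1]*v[1]

S1, S2 = [Fr(1), Fr(2)], [Fr(1), Fr(0)]
assert dot(S1, Av(S2)) == 0                       # A-conjugacy

X = [Fr(0), Fr(0)]
for S in (S1, S2):
    g = [B[0] + Av(X)[0], B[1] + Av(X)[1]]        # gradient B + AX
    lam = -dot(g, S) / dot(S, Av(S))              # exact line-search step
    X = [X[0] + lam*S[0], X[1] + lam*S[1]]
print(X)     # [Fraction(4, 3), Fraction(5, 2)]
```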

## Powell's Algorithm

Work with n linearly independent directions, initially the coordinate directions u₁, …, uₙ. Each cycle performs one-dimensional minimizations along the current set of directions, forms a new pattern direction S⁽ᵏ⁾ from the net move of the cycle, and replaces the oldest direction with it:

    u₁, u₂, …, u_{n−1}, uₙ  →  S⁽¹⁾
    u₂, …, u_{n−1}, uₙ, S⁽¹⁾  →  S⁽²⁾
    u₃, …, uₙ, S⁽¹⁾, S⁽²⁾  →  S⁽³⁾
    ⋮

For a quadratic with Hessian A, the successive pairs (uₙ, S⁽¹⁾), (S⁽¹⁾, S⁽²⁾), … are A-conjugate, so uₙ, S⁽¹⁾, S⁽²⁾, … form a conjugate set.

## Powell's Method: Example

Minimize f(x₁, x₂) = x₁ − x₂ + 2x₁² + 2x₁x₂ + x₂² from X₁ = (0, 0), with u₁ = (1, 0), u₂ = (0, 1).

Search along u₂: df/dλ = 0 gives λ = 1/2,

    X₂ = X₁ + λu₂ = (0, 1/2)

Search along −u₁: df/dλ = 0 gives λ = 1/2,

    X₃ = X₂ − λu₁ = (−1/2, 1/2)

Search along u₂ again: df/dλ = 0 gives λ = 1/2,

    X₄ = X₃ + λu₂ = (−1/2, 1)

Pattern direction:

    S⁽¹⁾ = X₄ − X₂ = (−1/2, 1) − (0, 1/2) = (−1/2, 1/2)

Search along S⁽¹⁾:

    f(X₄ + λS⁽¹⁾) = f(−0.5 − 0.5λ, 1 + 0.5λ) = 0.25λ² − 0.5λ − 1
    df/dλ = 0  ⟹  λ = 1.0

    X₅ = X₄ + λS⁽¹⁾ = (−1.0, 1.5)
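The cycle above can be scripted; a sketch for this example's quadratic, using the closed-form step λ = −g·d / (dᵀHd) in place of a numerical line search (H is the Hessian of f; helper names are ours):

```python
# Powell-style cycle on f(x) = x1 - x2 + 2*x1^2 + 2*x1*x2 + x2^2.
H = [[4.0, 2.0], [2.0, 2.0]]                 # Hessian of f
grad = lambda x: [1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]]

def line_min(x, d):
    g = grad(x)
    Hd = [H[0][0]*d[0] + H[0][1]*d[1], H[1][0]*d[0] + H[1][1]*d[1]]
    lam = -(g[0]*d[0] + g[1]*d[1]) / (d[0]*Hd[0] + d[1]*Hd[1])
    return [x[0] + lam*d[0], x[1] + lam*d[1]]

X1 = [0.0, 0.0]
X2 = line_min(X1, [0.0, 1.0])              # along u2 -> (0, 0.5)
X3 = line_min(X2, [1.0, 0.0])              # along u1 -> (-0.5, 0.5)
X4 = line_min(X3, [0.0, 1.0])              # along u2 -> (-0.5, 1.0)
S1 = [X4[0] - X2[0], X4[1] - X2[1]]        # pattern direction (-0.5, 0.5)
X5 = line_min(X4, S1)
print(X5)                                  # [-1.0, 1.5] -- the exact minimum
```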

## Simplex Method

(Figures illustrating the simplex method; no text content on these slides.)

## Properties of the Gradient Vector

Property 1: the gradient vector c of a function f(x₁, …, xₙ) at the point x* = (x₁*, …, xₙ*) is orthogonal (normal) to the tangent plane of the surface f(x₁, …, xₙ) = constant:

    c = ∇f(x*) = [ ∂f/∂x₁, …, ∂f/∂xₙ ]ᵀ,    cᵢ = ∂f(x*)/∂xᵢ

If C is any curve on the surface through x* and T is a vector tangent to C at x*, then c·T = 0.

Proof: let s be any parameter along C and

    T = ( ∂x₁/∂s, …, ∂xₙ/∂s ) |_{x = x*}

be a unit tangent vector along C at x*. On the surface, f(x) = constant, so df/ds = 0:

    0 = df/ds = (∂f/∂x₁)(∂x₁/∂s) + ⋯ + (∂f/∂xₙ)(∂xₙ/∂s) = c·T = cᵀT  ∎

Property 2: the gradient represents a direction of maximum rate of increase of f(x) at x*.

Proof: let u be a unit vector in any direction not tangent to C, and let t be a parameter along u. Then

    df/dt = lim_{ε→0} [ f(x* + εu) − f(x*) ] / ε

Expanding,

    f(x* + εu) = f(x*) + ε( u₁ ∂f/∂x₁ + ⋯ + uₙ ∂f/∂xₙ ) + O(ε²)

    f(x* + εu) − f(x*) = ε Σᵢ₌₁ⁿ uᵢ (∂f/∂xᵢ) + O(ε²)

    df/dt = Σᵢ₌₁ⁿ (∂f/∂xᵢ) uᵢ = c·u = cᵀu = ‖c‖ ‖u‖ cos θ

Property 3: the maximum rate of change of f(x) at any point x* is the magnitude of the gradient vector:

    max |df/dt| = ‖c‖

attained when u is in the direction of the gradient vector (θ = 0).

## Properties of the Gradient Vector: Example

    f(x) = 25x₁² + x₂²,    x⁽⁰⁾ = (0.6, 4),    f(x⁽⁰⁾) = 25

    c = ∇f(0.6, 4) = (50x₁, 2x₂) |₍₀.₆,₄₎ = (30, 8)

Property 1: on the contour 25x₁² + x₂² = 25 at x⁽⁰⁾,

    slope of tangent:  m₁ = dx₂/dx₁ = −25x₁/x₂ = −3.75
    slope of gradient: m₂ = c₂/c₁ = 8/30 = 1/3.75

so m₁m₂ = −1. Equivalently, with the unit vectors

    C = c/‖c‖ = (30, 8)/√(30² + 8²) = (0.966235, 0.257663)
    t = (−4, 15)/√(4² + 15²) = (−0.257663, 0.966235)   (tangent to the contour)

we get C·t = 0.

Property 2: choose the arbitrary unit direction D = (0.501034, 0.865430) and ε = 0.1:

    x⁽¹⁾_C = x⁽⁰⁾ + εC = (0.6, 4.0) + 0.1(0.966235, 0.257663) = (0.6966235, 4.0257663)
    x⁽¹⁾_D = x⁽⁰⁾ + εD = (0.6, 4.0) + 0.1(0.501034, 0.865430) = (0.6501034, 4.0865430)

    f(x⁽¹⁾_C) = 28.3389 > f(x⁽¹⁾_D) = 27.2657

so the move along the gradient direction increases f the most.

Property 3: ‖c‖ = √(30² + 8²) = 31.05 is the maximum rate of change of f at x⁽⁰⁾.
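The three properties are easy to confirm numerically for this example (a sketch; variable names are ours):

```python
import math
# Check the gradient properties for f(x) = 25*x1^2 + x2^2 at (0.6, 4).
f = lambda x1, x2: 25*x1**2 + x2**2
c = (50*0.6, 2*4.0)                                  # gradient (30, 8)
C = (c[0]/math.hypot(*c), c[1]/math.hypot(*c))       # unit gradient
T = (-4/math.hypot(4, 15), 15/math.hypot(4, 15))     # unit tangent to the contour

orth = C[0]*T[0] + C[1]*T[1]        # Property 1: gradient is normal to tangent
D = (0.501034, 0.865430)            # an arbitrary unit direction
fC = f(0.6 + 0.1*C[0], 4 + 0.1*C[1])
fD = f(0.6 + 0.1*D[0], 4 + 0.1*D[1])
print(abs(orth) < 1e-12, fC > fD)   # True True: the gradient move increases f most
```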

## Steepest Descent Algorithm

Steepest descent direction: let f(x) be differentiable with respect to x. The direction of steepest descent for f(x) at any point is d = −c.

Steepest descent algorithm:

Step 1: choose a starting design x⁽⁰⁾; set k = 0 and a tolerance ε.
Step 2: compute c⁽ᵏ⁾ = ∇f(x⁽ᵏ⁾); stop if ‖c⁽ᵏ⁾‖ < ε.
Step 3: set d⁽ᵏ⁾ = −c⁽ᵏ⁾.
Step 4: calculate αₖ to minimize f(x⁽ᵏ⁾ + α d⁽ᵏ⁾).
Step 5: set x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + αₖ d⁽ᵏ⁾, k = k + 1; go to Step 2.

Notes:

- d = −c gives c·d = −‖c‖² < 0, so the steepest descent direction always satisfies the descent condition.
- The successive directions of steepest descent are normal to each other: d⁽ᵏ⁾·d⁽ᵏ⁺¹⁾ = c⁽ᵏ⁾·c⁽ᵏ⁺¹⁾ = 0.

Proof: the exact line search gives

    0 = df(x⁽ᵏ⁺¹⁾)/dα = (∂f(x⁽ᵏ⁺¹⁾)/∂x)ᵀ (∂(x⁽ᵏ⁾ + αd⁽ᵏ⁾)/∂α)
      = c⁽ᵏ⁺¹⁾·d⁽ᵏ⁾ = −c⁽ᵏ⁺¹⁾·c⁽ᵏ⁾ = −d⁽ᵏ⁺¹⁾·d⁽ᵏ⁾

## Steepest Descent: Example

    f(x₁, x₂) = x₁² + x₂² − 2x₁x₂

Step 1: x⁽⁰⁾ = (1, 0), k = 0.
Step 2: c⁽⁰⁾ = ∇f(x⁽⁰⁾) = (2x₁ − 2x₂, 2x₂ − 2x₁) = (2, −2);  ‖c⁽⁰⁾‖ = 2√2 ≠ 0.
Step 3: d⁽⁰⁾ = −c⁽⁰⁾ = (−2, 2).
Step 4:

    f(1 − 2α, 2α) = (1 − 2α)² + (2α)² − 2(1 − 2α)(2α) = 16α² − 8α + 1 ≡ f̄(α)
    df̄/dα = 32α − 8 = 0  ⟹  α₀ = 0.25,    d²f̄/dα² = 32 > 0

Step 5:

    x⁽¹⁾ = x⁽⁰⁾ + α₀d⁽⁰⁾ = (1 − 0.25(2), 0 + 0.25(2)) = (0.5, 0.5)
    c⁽¹⁾ = (0, 0)  ⟹  stop

## Steepest Descent: Example

    f(x₁, x₂, x₃) = x₁² + 2x₂² + 2x₃² + 2x₁x₂ + 2x₂x₃
    x⁽⁰⁾ = (2, 4, 10),    x* = (0, 0, 0)

Step 1: k = 0, ε = 0.005 (δ = 0.05, ε = 0.0001 for the golden section line search).
Step 2: c⁽⁰⁾ = ∇f(x⁽⁰⁾) = (2x₁ + 2x₂, 4x₂ + 2x₁ + 2x₃, 4x₃ + 2x₂) = (12, 40, 48);
        ‖c⁽⁰⁾‖ = √4048 = 63.6 > ε.
Step 3: d⁽⁰⁾ = −c⁽⁰⁾.
Steps 4-5: x⁽¹⁾ = x⁽⁰⁾ + α₀d⁽⁰⁾ = (0.0956, −2.348, 2.381);
        c⁽¹⁾ = (−4.5, −4.438, 4.828),  ‖c⁽¹⁾‖ = 7.952 > ε.

Note: c⁽¹⁾·d⁽⁰⁾ = 0 (perfect line search).

Optimum solution for Example 5.10 with the steepest descent program (iteration history of x₁, x₂, x₃, f(x), and ‖c‖ over 33 iterations):

    Optimum design variables:            8.04787E−03, 6.81319E−03, 3.42174E−03
    Optimum cost function value:         2.47347E−05
    Norm of gradient at optimum:         4.97071E−03
    Total no. of function evaluations:   753

Notes:

- Slow to converge, especially when approaching the optimum: a large number of iterations.
- Information calculated at previous iterations is NOT used; each iteration is started independently of the others.
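A small steepest descent routine with a bisection line search reproduces the behavior summarized above (a sketch; the tolerance values are ours):

```python
import math

def steepest_descent(grad, x0, eps=5e-3, max_iter=1000):
    x = list(x0)
    for _ in range(max_iter):
        c = grad(x)
        if math.sqrt(sum(ci*ci for ci in c)) < eps:
            break
        d = [-ci for ci in c]
        # line search: find a zero of the slope g(a) = grad(x + a*d)·d
        slope = lambda a: sum(gi*di for gi, di in
                              zip(grad([xi + a*di for xi, di in zip(x, d)]), d))
        lo, hi = 0.0, 1.0
        while slope(hi) < 0:
            hi *= 2
        for _ in range(60):
            mid = 0.5*(lo + hi)
            lo, hi = (mid, hi) if slope(mid) < 0 else (lo, mid)
        x = [xi + 0.5*(lo + hi)*di for xi, di in zip(x, d)]
    return x

grad = lambda x: [2*x[0] + 2*x[1], 4*x[1] + 2*x[0] + 2*x[2], 4*x[2] + 2*x[1]]
x = steepest_descent(grad, [2.0, 4.0, 10.0])   # creeps slowly toward (0, 0, 0)
```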

## Example: Scaling the Design Variables

The steepest descent method converges in only one iteration for a positive definite quadratic function with a unit condition number of the Hessian matrix. To accelerate the rate of convergence, scale the design variables so that the condition number of the new Hessian matrix is unity.

    Min: f(x₁, x₂) = 25x₁² + x₂²,    H = [[50, 0], [0, 2]]

Let x = Dy with D = diag(1/√50, 1/√2):

    Min: f(y₁, y₂) = (1/2)(y₁² + y₂²)

    x⁽⁰⁾ = (1, 1)  ⟹  y⁽⁰⁾ = (√50, √2)

## Example: Scaling with a Non-Diagonal Hessian

    Min: f(x₁, x₂) = 6x₁² − 6x₁x₂ + 2x₂² − 5x₁ + 4x₂ + 2

    H = [[12, −6], [−6, 4]],    eigenvalues λ₁ = 0.7889, λ₂ = 15.211

Let x = Qy, where Q = [v₁ v₂] = [[0.4718, −0.8817], [0.8817, 0.4718]] collects the eigenvectors:

    Min: f(y₁, y₂) = 0.5(0.7889y₁² + 15.211y₂²) + 1.1678y₁ + 6.2957y₂ + 2

Let y = Dz with D = diag(1/√0.7889, 1/√15.211):

    Min: f(z₁, z₂) = 0.5(z₁² + z₂²) + 1.3148z₁ + 1.6142z₂ + 2

    z* = −(1.3148, 1.6142)  ⟹  x* = QDz* = (−1/3, −3/2)

## Conjugate Gradient Method

Fletcher and Reeves (1964).

Steepest descent is orthogonal at consecutive steps: it converges, but slowly. The conjugate gradient method modifies the current steepest descent direction by adding a scaled previous direction, cutting diagonally through the orthogonal steepest descent directions.

Conjugate gradient directions d⁽ⁱ⁾, d⁽ʲ⁾ are orthogonal with respect to a symmetric positive definite matrix A:

    d⁽ⁱ⁾ᵀ A d⁽ʲ⁾ = 0

Algorithm:

Step 1: k = 0; choose x⁽⁰⁾; d⁽⁰⁾ = −c⁽⁰⁾ = −∇f(x⁽⁰⁾). Stop if ‖c⁽⁰⁾‖ < ε; otherwise go to Step 4.
Step 2: c⁽ᵏ⁾ = ∇f(x⁽ᵏ⁾); stop if ‖c⁽ᵏ⁾‖ < ε.
Step 3: d⁽ᵏ⁾ = −c⁽ᵏ⁾ + βₖ d⁽ᵏ⁻¹⁾,  βₖ = ( ‖c⁽ᵏ⁾‖ / ‖c⁽ᵏ⁻¹⁾‖ )².
Step 4: compute αₖ to minimize f(x⁽ᵏ⁾ + α d⁽ᵏ⁾).
Step 5: x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + αₖ d⁽ᵏ⁾; k = k + 1; go to Step 2.

Note:

- Finds the minimum in n iterations for positive definite quadratic forms with n design variables.
- With inexact line search or non-quadratic forms, restart every n + 1 iterations for computational stability (x⁽⁰⁾ = x⁽ⁿ⁺¹⁾).
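The algorithm above can be sketched for a quadratic, where the exact step size has the closed form α = −c·d / (dᵀAd) (our own transcription; function names are ours):

```python
def fletcher_reeves(grad, Av, x0, eps=1e-6, max_iter=50):
    """Fletcher-Reeves CG for a quadratic whose Hessian action is Av(v)."""
    dot = lambda u, v: sum(ui*vi for ui, vi in zip(u, v))
    x = list(x0)
    c = grad(x)
    d = [-ci for ci in c]
    for k in range(max_iter):
        if dot(c, c)**0.5 < eps:
            break
        alpha = -dot(c, d) / dot(d, Av(d))            # exact step for a quadratic
        x = [xi + alpha*di for xi, di in zip(x, d)]
        c_new = grad(x)
        beta = dot(c_new, c_new) / dot(c, c)          # Fletcher-Reeves beta
        d = [-ci + beta*di for ci, di in zip(c_new, d)]
        c = c_new
    return x, k

A = [[2, 2, 0], [2, 4, 2], [0, 2, 4]]                 # Hessian of the example f
Av = lambda v: [sum(A[i][j]*v[j] for j in range(3)) for i in range(3)]
grad = lambda x: Av(x)                                # f = (1/2) x'Ax here
x, k = fletcher_reeves(grad, Av, [2.0, 4.0, 10.0])
print(k)    # about 3: n iterations for a quadratic in n = 3 variables
```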

## Conjugate Gradient: Example

    Min: f(x) = x₁² + 2x₂² + 2x₃² + 2x₁x₂ + 2x₂x₃,    x⁽⁰⁾ = (2, 4, 10)

    c⁽⁰⁾ = (12, 40, 48);   ‖c⁽⁰⁾‖ = 63.6;   f(x⁽⁰⁾) = 332.0

    x⁽¹⁾ = (0.0956, −2.348, 2.381)
    c⁽¹⁾ = (−4.5, −4.438, 4.828);   ‖c⁽¹⁾‖ = 7.952;   f(x⁽¹⁾) = 10.75

    β₁ = ( ‖c⁽¹⁾‖ / ‖c⁽⁰⁾‖ )² = (7.952/63.6)² = 0.015633

    d⁽¹⁾ = −c⁽¹⁾ + β₁d⁽⁰⁾
         = (4.500, 4.438, −4.828) + 0.015633 (−12, −40, −48)
         = (4.31241, 3.81268, −5.57838)

Minimizing f(x⁽¹⁾ + αd⁽¹⁾) gives α₁ = 0.3156, and then ‖c⁽²⁾‖ = 0.7788.

Note: c⁽²⁾·d⁽¹⁾ = 0.

## Newton Method

A second-order method. Let x be the current estimate of x*, with x* ≈ x + Δx (desired):

    f(x + Δx) = f(x) + cᵀΔx + (1/2)ΔxᵀHΔx

    NC:  ∂f/∂(Δx) = c + HΔx = 0   ⟹   Δx = −H⁻¹c

    Δx = −α H⁻¹c    (modified Newton: add a step size α)

Steps (modified Newton method):

Step 1: k = 0; choose x⁽⁰⁾ and a tolerance ε.
Step 2: cᵢ⁽ᵏ⁾ = ∂f(x⁽ᵏ⁾)/∂xᵢ, i = 1, …, n; stop if ‖c⁽ᵏ⁾‖ < ε.
Step 3: H(x⁽ᵏ⁾) = [ ∂²f/∂xᵢ∂xⱼ ].
Step 4: d⁽ᵏ⁾ = −H⁻¹c⁽ᵏ⁾, or solve Hd⁽ᵏ⁾ = −c⁽ᵏ⁾.
        Note: for computational efficiency, a system of linear simultaneous equations is solved instead of evaluating the inverse of the Hessian.
Step 5: compute αₖ to minimize f(x⁽ᵏ⁾ + αd⁽ᵏ⁾).
Step 6: x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + αₖd⁽ᵏ⁾; k = k + 1; go to Step 2.

Note: unless H is positive definite, d⁽ᵏ⁾ will not be a descent direction for f:

    H > 0  ⟹  c⁽ᵏ⁾·d⁽ᵏ⁾ = −c⁽ᵏ⁾ᵀ H⁻¹ c⁽ᵏ⁾ < 0
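One modified-Newton step in code, solving Hd = −c rather than inverting H (a sketch using the quadratic from the example below; helper names are ours):

```python
# One modified-Newton step on f = 3x1^2 + 2x1x2 + 2x2^2 + 7 from (5, 10).
def solve2(H, b):                     # solve a 2x2 system H d = b
    det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
    return [( H[1][1]*b[0] - H[0][1]*b[1]) / det,
            (-H[1][0]*b[0] + H[0][0]*b[1]) / det]

H = [[6.0, 2.0], [2.0, 4.0]]                 # constant Hessian
x = [5.0, 10.0]
c = [6*x[0] + 2*x[1], 2*x[0] + 4*x[1]]       # (50, 50)
d = solve2(H, [-c[0], -c[1]])                # (-5, -10)
# exact step size for a quadratic: alpha = -c·d / (d' H d)
Hd = [H[0][0]*d[0] + H[0][1]*d[1], H[1][0]*d[0] + H[1][1]*d[1]]
alpha = -(c[0]*d[0] + c[1]*d[1]) / (d[0]*Hd[0] + d[1]*Hd[1])
x = [x[0] + alpha*d[0], x[1] + alpha*d[1]]
print(alpha, x)    # 1.0 [0.0, 0.0] -- one step reaches the minimum
```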

## Newton Method: Example

    x⁽⁰⁾ = (5, 10);    ε = 0.0001

    c⁽⁰⁾ = (6x₁ + 2x₂, 2x₁ + 4x₂) |ₓ₍₀₎ = (50, 50);    ‖c⁽⁰⁾‖ = 50√2

    H⁽⁰⁾ = [[6, 2], [2, 4]],    (H⁽⁰⁾)⁻¹ = (1/20)[[4, −2], [−2, 6]]

    d⁽⁰⁾ = −H⁻¹c⁽⁰⁾ = −(1/20)[[4, −2], [−2, 6]][50; 50] = (−5, −10)

    x⁽¹⁾ = x⁽⁰⁾ + αd⁽⁰⁾ = (5 − 5α, 10 − 10α)

Line search: df/dα = 0, or equivalently ∇f(x⁽¹⁾)·d⁽⁰⁾ = 0:

    ∇f(x⁽¹⁾) = ( 6(5 − 5α) + 2(10 − 10α), 2(5 − 5α) + 4(10 − 10α) ) = (50 − 50α, 50 − 50α)

    ∇f(x⁽¹⁾)·d⁽⁰⁾ = −5(50 − 50α) − 10(50 − 50α) = 0  ⟹  α₀ = 1

    x⁽¹⁾ = (0, 0),   c⁽¹⁾ = (0, 0)  ⟹  stop

## Newton Method: Example

    f(x) = 10x₁⁴ − 20x₁²x₂ + 10x₂² + x₁² − 2x₁ + 5,    x⁽⁰⁾ = (−1, 3)

    H(x) = ∇²f(x) = [[120x₁² − 40x₂ + 2, −40x₁], [−40x₁, 20]]

## Comparison of Steepest Descent and Newton Methods

    f(x) = 50(x₂ − x₁²)² + (2 − x₁)²,    x⁽⁰⁾ = (5, 5),    x* = (2, 4)

Drawbacks of the Newton method:

- Calculation of second-order derivatives at each iteration
- A system of simultaneous linear equations needs to be solved at each iteration
- The Hessian of the function may be singular at some iterations
- Memoryless method: each iteration is started afresh
- Not convergent unless the Hessian remains positive definite and a step size determination scheme is used

## Marquardt Modification (1963)

    d⁽ᵏ⁾ = −(H + λₖI)⁻¹ c⁽ᵏ⁾

Far away from the solution point, behave like steepest descent (large λ); near the solution point, behave like the Newton method (small λ).

Step 1: k = 0; choose x⁽⁰⁾, a tolerance ε, and a large λ₀ (e.g., 10000).
Step 2: cᵢ⁽ᵏ⁾ = ∂f(x⁽ᵏ⁾)/∂xᵢ, i = 1, …, n; stop if ‖c⁽ᵏ⁾‖ < ε.
Step 3: H(x⁽ᵏ⁾) = [ ∂²f/∂xᵢ∂xⱼ ].
Step 4: d⁽ᵏ⁾ = −(H + λₖI)⁻¹c⁽ᵏ⁾.
Step 5: if f(x⁽ᵏ⁾ + d⁽ᵏ⁾) < f(x⁽ᵏ⁾), go to Step 6; otherwise set λₖ = 2λₖ and go to Step 4.
Step 6: set λₖ₊₁ = 0.5λₖ, k = k + 1, and go to Step 2.

## Quasi-Newton Methods: Motivation

Steepest descent:

- Uses only first-order information → poor rate of convergence.
- Each iteration is started with new design variables without using any information from previous iterations.

Newton method:

- Uses second-order derivatives → quadratic convergence rate.
- Requires calculation of n(n+1)/2 second-order derivatives!
- Difficulties if the Hessian is singular.
- Not a learning process.
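The λ-doubling/halving logic can be sketched for the quadratic of the Newton example (our own transcription; the guard against float-level stagnation is ours):

```python
# Marquardt modification on f = 3x1^2 + 2x1x2 + 2x2^2 + 7:
# d = -(H + lam*I)^{-1} c; double lam until f decreases, then halve it.
def f(x):    return 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
def grad(x): return [6*x[0] + 2*x[1], 2*x[0] + 4*x[1]]
H = [[6.0, 2.0], [2.0, 4.0]]

def marquardt(x, lam=10000.0, eps=1e-5, max_iter=200):
    for _ in range(max_iter):
        c = grad(x)
        if (c[0]**2 + c[1]**2)**0.5 < eps:
            break
        while True:
            M = [[H[0][0] + lam, H[0][1]], [H[1][0], H[1][1] + lam]]
            det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
            d = [(-M[1][1]*c[0] + M[0][1]*c[1])/det,
                 ( M[1][0]*c[0] - M[0][0]*c[1])/det]
            x_new = [x[0] + d[0], x[1] + d[1]]
            if f(x_new) < f(x) or lam > 1e12:   # accept, or give up doubling
                break
            lam *= 2
        x, lam = x_new, 0.5*lam
    return x

x_opt = marquardt([5.0, 10.0])
print(max(abs(v) for v in x_opt) < 1e-4)   # True: converged to the minimum (0, 0)
```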

## Quasi-Newton Methods

Quasi-Newton (update) methods:

- Use first-order derivatives to generate approximations of the Hessian.
- Combine desirable features of both the steepest descent and Newton methods.
- Use information from previous iterations to speed up convergence (learning processes).
- There are several ways to approximate (update) the Hessian or its inverse.
- Preserve the properties of symmetry and positive definiteness.

## Davidon-Fletcher-Powell (DFP) Method

Davidon (1959), Fletcher and Powell (1963): approximate the Hessian inverse using only first derivatives:

    Δx = −H⁻¹c  ≈  −Ac,    A ≈ H⁻¹

DFP procedure:

Step 1: k = 0; choose x⁽⁰⁾, a tolerance ε, and A⁽⁰⁾ (= I, an estimate of H⁻¹).
Step 2: c⁽ᵏ⁾ = ∇f(x⁽ᵏ⁾); stop if ‖c⁽ᵏ⁾‖ < ε.
Step 3: d⁽ᵏ⁾ = −A⁽ᵏ⁾c⁽ᵏ⁾.
Step 4: compute αₖ to minimize f(x⁽ᵏ⁾ + αd⁽ᵏ⁾).
Step 5: x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + αₖd⁽ᵏ⁾.
Step 6: update A⁽ᵏ⁾:

    A⁽ᵏ⁺¹⁾ = A⁽ᵏ⁾ + B⁽ᵏ⁾ + C⁽ᵏ⁾
    B⁽ᵏ⁾ = s⁽ᵏ⁾s⁽ᵏ⁾ᵀ / (s⁽ᵏ⁾·y⁽ᵏ⁾),    C⁽ᵏ⁾ = −z⁽ᵏ⁾z⁽ᵏ⁾ᵀ / (y⁽ᵏ⁾·z⁽ᵏ⁾)
    s⁽ᵏ⁾ = αₖd⁽ᵏ⁾  (change in design),    y⁽ᵏ⁾ = c⁽ᵏ⁺¹⁾ − c⁽ᵏ⁾
    c⁽ᵏ⁺¹⁾ = ∇f(x⁽ᵏ⁺¹⁾),    z⁽ᵏ⁾ = A⁽ᵏ⁾y⁽ᵏ⁾

Step 7: set k = k + 1 and go to Step 2.

DFP properties:

- The matrix A⁽ᵏ⁾ is always positive definite, so the method always converges to a local minimum if αₖ > 0:

      df(x⁽ᵏ⁾ + αd⁽ᵏ⁾)/dα |_{α=0} = c⁽ᵏ⁾·d⁽ᵏ⁾ = −c⁽ᵏ⁾ᵀA⁽ᵏ⁾c⁽ᵏ⁾ < 0

- When applied to a positive definite quadratic form, A⁽ᵏ⁾ converges to the inverse of the Hessian of the quadratic form.
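The procedure, applied to the quadratic of the example below with an exact line search (a sketch; the true Hessian is used only for the step size, and the rounded hand values in the example differ slightly):

```python
# DFP on f = 5x1^2 + 2x1x2 + x2^2 + 7 from (1, 2).
H = [[10.0, 2.0], [2.0, 2.0]]                # true Hessian (step size only)
grad = lambda x: [10*x[0] + 2*x[1], 2*x[0] + 2*x[1]]
dot = lambda u, v: u[0]*v[0] + u[1]*v[1]
mv  = lambda M, v: [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

x = [1.0, 2.0]
Ak = [[1.0, 0.0], [0.0, 1.0]]                # A(0) = I approximates H^{-1}
c = grad(x)
for _ in range(2):                           # n = 2 steps suffice for a quadratic
    d = [-v for v in mv(Ak, c)]
    alpha = -dot(c, d) / dot(d, mv(H, d))    # exact line search
    s = [alpha*d[0], alpha*d[1]]
    x = [x[0] + s[0], x[1] + s[1]]
    c_new = grad(x)
    y = [c_new[0] - c[0], c_new[1] - c[1]]
    z = mv(Ak, y)
    sy, yz = dot(s, y), dot(y, z)
    for i in range(2):
        for j in range(2):
            Ak[i][j] += s[i]*s[j]/sy - z[i]*z[j]/yz   # B(k) + C(k)
    c = c_new
print([[round(v, 3) for v in row] for row in Ak])
# ~ H^{-1} = [[0.125, -0.125], [-0.125, 0.625]]; x is (0, 0) to machine precision
```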

## DFP Method: Example

    f(x) = 5x₁² + 2x₁x₂ + x₂² + 7;    x⁽⁰⁾ = (1, 2),  A⁽⁰⁾ = I,  k = 0,  ε = 0.001

Iteration 1:

1-1. c⁽⁰⁾ = (10x₁ + 2x₂, 2x₁ + 2x₂) = (14, 6)
1-2. ‖c⁽⁰⁾‖ = √(14² + 6²) = 15.232 > ε
1-3. d⁽⁰⁾ = −A⁽⁰⁾c⁽⁰⁾ = (−14, −6)
1-4. x⁽¹⁾ = x⁽⁰⁾ + αd⁽⁰⁾ = (1 − 14α, 2 − 6α)

     f̄(α) = 5(1 − 14α)² + 2(1 − 14α)(2 − 6α) + (2 − 6α)² + 7
     df̄/dα = 0  ⟹  α₀ = 0.0988,    d²f̄/dα² = 2348 > 0

1-5. x⁽¹⁾ = x⁽⁰⁾ + α₀d⁽⁰⁾ = (−0.386, 1.407)
1-6. y⁽⁰⁾ = c⁽¹⁾ − c⁽⁰⁾ = (−15.046, −3.958),    z⁽⁰⁾ = A⁽⁰⁾y⁽⁰⁾ = y⁽⁰⁾
     s⁽⁰⁾ = α₀d⁽⁰⁾;    s⁽⁰⁾·y⁽⁰⁾ = 23.20,    y⁽⁰⁾·z⁽⁰⁾ = 242.05

     s⁽⁰⁾s⁽⁰⁾ᵀ = [[1.921, 0.822], [0.822, 0.352]]
     z⁽⁰⁾z⁽⁰⁾ᵀ = [[226.40, 59.55], [59.55, 15.67]]

     B⁽⁰⁾ = [[0.0828, 0.0354], [0.0354, 0.0152]]
     C⁽⁰⁾ = −[[0.935, 0.246], [0.246, 0.0647]]

     A⁽¹⁾ = A⁽⁰⁾ + B⁽⁰⁾ + C⁽⁰⁾ = [[0.148, −0.211], [−0.211, 0.950]]

Iteration 2:

2-2. c⁽¹⁾ = (−1.046, 2.042);  ‖c⁽¹⁾‖ = 2.29 > ε
2-3. d⁽¹⁾ = −A⁽¹⁾c⁽¹⁾
2-4. α₁ = 0.776 (minimizing f(x⁽¹⁾ + αd⁽¹⁾))
2-5. x⁽²⁾ = x⁽¹⁾ + α₁d⁽¹⁾ = (−0.386, 1.407) + (0.455, −1.334) = (0.069, 0.073)
2-6. y⁽¹⁾ = c⁽²⁾ − c⁽¹⁾ = (1.882, −1.758),    z⁽¹⁾ = A⁽¹⁾y⁽¹⁾ = (0.649, −2.067)
     s⁽¹⁾·y⁽¹⁾ = 3.201,    y⁽¹⁾·z⁽¹⁾ = 4.855

     s⁽¹⁾s⁽¹⁾ᵀ = [[0.207, −0.607], [−0.607, 1.780]]
     z⁽¹⁾z⁽¹⁾ᵀ = [[0.421, −1.341], [−1.341, 4.272]]

     B⁽¹⁾ = [[0.0647, −0.19], [−0.19, 0.556]]
     C⁽¹⁾ = −[[0.0867, −0.276], [−0.276, 0.880]]

     A⁽²⁾ = A⁽¹⁾ + B⁽¹⁾ + C⁽¹⁾ = [[0.126, −0.125], [−0.125, 0.626]]

## Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

Directly update the Hessian using only first derivatives:

    Δx = −H⁻¹c  ⟺  HΔx = −c  ≈  AΔx = −c,    A ≈ H

i.e., find A using only first-order information.

BFGS procedure:

Step 1: k = 0; choose x⁽⁰⁾, a tolerance ε, and H⁽⁰⁾ (= I, an estimate of the Hessian H).
Step 2: c⁽ᵏ⁾ = ∇f(x⁽ᵏ⁾); stop if ‖c⁽ᵏ⁾‖ < ε.
Step 3: solve H⁽ᵏ⁾d⁽ᵏ⁾ = −c⁽ᵏ⁾ to obtain d⁽ᵏ⁾.
Step 4: compute αₖ to minimize f(x⁽ᵏ⁾ + αd⁽ᵏ⁾).
Step 5: x⁽ᵏ⁺¹⁾ = x⁽ᵏ⁾ + αₖd⁽ᵏ⁾.
Step 6: update H⁽ᵏ⁾:

    H⁽ᵏ⁺¹⁾ = H⁽ᵏ⁾ + D⁽ᵏ⁾ + E⁽ᵏ⁾
    D⁽ᵏ⁾ = y⁽ᵏ⁾y⁽ᵏ⁾ᵀ / (y⁽ᵏ⁾·s⁽ᵏ⁾),    E⁽ᵏ⁾ = c⁽ᵏ⁾c⁽ᵏ⁾ᵀ / (c⁽ᵏ⁾·d⁽ᵏ⁾)
    s⁽ᵏ⁾ = αₖd⁽ᵏ⁾  (change in design),    y⁽ᵏ⁾ = c⁽ᵏ⁺¹⁾ − c⁽ᵏ⁾,    c⁽ᵏ⁺¹⁾ = ∇f(x⁽ᵏ⁺¹⁾)

Step 7: set k = k + 1 and go to Step 2.
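The direct-update procedure on the same quadratic, with an exact line search (a sketch; the true Hessian is used only for the step size, and the rounded hand values in the example below differ slightly):

```python
# BFGS on f = 5x1^2 + 2x1x2 + x2^2 + 7 from (1, 2); after n = 2 updates
# H(k) reproduces the true Hessian [[10, 2], [2, 2]] of this quadratic.
grad = lambda x: [10*x[0] + 2*x[1], 2*x[0] + 2*x[1]]
dot = lambda u, v: u[0]*v[0] + u[1]*v[1]
mv  = lambda M, v: [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def solve2(M, b):                     # solve a 2x2 system M d = b
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [( M[1][1]*b[0] - M[0][1]*b[1])/det,
            (-M[1][0]*b[0] + M[0][0]*b[1])/det]

Htrue = [[10.0, 2.0], [2.0, 2.0]]     # used only for the exact step size
x = [1.0, 2.0]
Hk = [[1.0, 0.0], [0.0, 1.0]]         # H(0) = I
c = grad(x)
for _ in range(2):
    d = solve2(Hk, [-c[0], -c[1]])                # H(k) d = -c
    alpha = -dot(c, d) / dot(d, mv(Htrue, d))     # exact line search
    s = [alpha*d[0], alpha*d[1]]
    x = [x[0] + s[0], x[1] + s[1]]
    c_new = grad(x)
    y = [c_new[0] - c[0], c_new[1] - c[1]]
    ys, cd = dot(y, s), dot(c, d)
    for i in range(2):
        for j in range(2):
            Hk[i][j] += y[i]*y[j]/ys + c[i]*c[j]/cd   # D(k) + E(k)
    c = c_new
print([[round(v, 4) for v in row] for row in Hk])     # ~ [[10, 2], [2, 2]]
```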

## BFGS Method: Example

    f(x) = 5x₁² + 2x₁x₂ + x₂² + 7;    x⁽⁰⁾ = (1, 2),  H⁽⁰⁾ = I,  k = 0,  ε = 0.001

Iteration 1:

1-1. c⁽⁰⁾ = (10x₁ + 2x₂, 2x₁ + 2x₂) = (14, 6)
1-2. ‖c⁽⁰⁾‖ = √(14² + 6²) = 15.232 > ε
1-3. d⁽⁰⁾ = −c⁽⁰⁾ = (−14, −6)
1-4. x⁽¹⁾ = x⁽⁰⁾ + αd⁽⁰⁾ = (1 − 14α, 2 − 6α)

     df̄/dα = 10(1 − 14α)(−14) + 2[(−14)(2 − 6α) + (1 − 14α)(−6)] + 2(2 − 6α)(−6) = 0
     ⟹  α₀ = 0.0988,    d²f̄/dα² = 2348 > 0

1-5. x⁽¹⁾ = x⁽⁰⁾ + α₀d⁽⁰⁾ = (−0.386, 1.407)
1-6. s⁽⁰⁾ = α₀d⁽⁰⁾;   y⁽⁰⁾ = c⁽¹⁾ − c⁽⁰⁾ = (−15.046, −3.958);
     y⁽⁰⁾·s⁽⁰⁾ = 23.20;   c⁽⁰⁾·d⁽⁰⁾ = −232

     y⁽⁰⁾y⁽⁰⁾ᵀ = [[226.40, 59.55], [59.55, 15.67]]
     c⁽⁰⁾c⁽⁰⁾ᵀ = [[196, 84], [84, 36]]

     D⁽⁰⁾ = [[9.760, 2.567], [2.567, 0.675]]
     E⁽⁰⁾ = −[[0.845, 0.362], [0.362, 0.155]]

     H⁽¹⁾ = H⁽⁰⁾ + D⁽⁰⁾ + E⁽⁰⁾ = [[9.915, 2.205], [2.205, 0.520]]

Iteration 2:

2-2. c⁽¹⁾ = (−1.046, 2.042);  ‖c⁽¹⁾‖ = 2.29 > ε
2-3. solve H⁽¹⁾d⁽¹⁾ = −c⁽¹⁾  ⟹  d⁽¹⁾ = (17.20, −76.77)
2-4. α₁ = 0.018455 (minimizing f(x⁽¹⁾ + αd⁽¹⁾))
2-5. x⁽²⁾ = x⁽¹⁾ + α₁d⁽¹⁾
2-6. y⁽¹⁾·s⁽¹⁾ = 3.224;   c⁽¹⁾·d⁽¹⁾ = −174.76

     y⁽¹⁾y⁽¹⁾ᵀ = [[0.1156, −0.748], [−0.748, 4.836]]
     c⁽¹⁾c⁽¹⁾ᵀ = [[1.094, −2.136], [−2.136, 4.170]]

     D⁽¹⁾ = [[0.036, −0.232], [−0.232, 1.500]]
     E⁽¹⁾ = −[[0.0063, −0.0122], [−0.0122, 0.0239]]

     H⁽²⁾ = H⁽¹⁾ + D⁽¹⁾ + E⁽¹⁾ = [[9.945, 1.985], [1.985, 1.996]]

Note that H⁽²⁾ is already close to the true Hessian [[10, 2], [2, 2]] of this quadratic.