
Department of Chemical and Biological Engineering, Korea University

LECTURE NOTE II
Chapter 3

Functions of Several Variables

• Unconstrained multivariable minimization problem:

    \min_{x} f(x), \quad x \in R^N

where x is a vector of design variables of dimension N, and f is a scalar objective function.


- Gradient of f:   \nabla f = \left[ \frac{\partial f}{\partial x_1}, \ \frac{\partial f}{\partial x_2}, \ \frac{\partial f}{\partial x_3}, \ \dots, \ \frac{\partial f}{\partial x_N} \right]^T
- Possible locations of local optima
  • points where the gradient of f is zero
  • boundary points (only if a feasible region is defined)
  • points where f is discontinuous
  • points where the gradient of f is discontinuous or does not exist
- Assumption for the development of optimality criteria:
  f and its derivatives exist and are continuous everywhere.

3.1 Optimality Criteria


- Optimality criteria are necessary to recognize the solution.
- Optimality criteria provide the motivation for most of the useful methods.
- Taylor series expansion of f about the point \bar{x}:

    f(x) = f(\bar{x}) + \nabla f(\bar{x})^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(\bar{x}) \Delta x + O_3(\Delta x)

  where \bar{x} is the current expansion point, \Delta x = x - \bar{x} is the change in x,
  \nabla^2 f(\bar{x}) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right] is the N x N symmetric Hessian matrix at \bar{x}, and
  O_3(\Delta x) is the error of the second-order expansion.
- In order for \bar{x} to be a local minimum:

    \Delta f = f(x) - f(\bar{x}) \ge 0 \quad \text{for all } \|x - \bar{x}\| \le \varepsilon \ (\varepsilon > 0)

- In order for \bar{x} to be a strict local minimum:

    \Delta f = f(x) - f(\bar{x}) > 0 \quad \text{for all } 0 < \|x - \bar{x}\| \le \varepsilon \ (\varepsilon > 0)

- Optimality criterion (strict):

    \Delta f = f(x) - f(\bar{x}) = \nabla f(\bar{x})^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(\bar{x}) \Delta x > 0 \quad \text{for all } \|\Delta x\| \le \varepsilon

    \Rightarrow \ \nabla f(\bar{x}) = 0 \ \text{ and } \ \nabla^2 f(\bar{x}) > 0 \ \text{(positive definite)}

- For the quadratic form Q(z) = z^T A z:
    A is positive definite if Q(z) > 0 for all z \ne 0.
    A is positive semidefinite if Q(z) \ge 0 for all z and there exists z \ne 0 such that z^T A z = 0.
    A is negative definite if Q(z) < 0 for all z \ne 0.
    A is negative semidefinite if Q(z) \le 0 for all z and there exists z \ne 0 such that z^T A z = 0.
    A is indefinite if Q(z) > 0 for some z and Q(z) < 0 for other z.
  • Tests for positive definite matrices
    1. If any one of the diagonal elements is not positive, then A is not p.d.
    2. All the leading principal determinants must be positive.
    3. All eigenvalues of A are positive.
  • Tests for negative definite matrices
    1. If any one of the diagonal elements is not negative, then A is not n.d.
    2. The leading principal determinants must alternate in sign, starting from
       D_1 < 0 (D_2 > 0, D_3 < 0, D_4 > 0, ...).
    3. All eigenvalues of A are negative.
  • Tests for positive semidefinite matrices
    1. If any one of the diagonal elements is negative, then A is not p.s.d.
    2. All the principal determinants are nonnegative.
  • Tests for negative semidefinite matrices
    1. If any one of the diagonal elements is positive, then A is not n.s.d.
    2. All the k-th order principal determinants are nonpositive if k is odd, and
       nonnegative if k is even.
Remark 1: A principal minor of order k of an N x N matrix Q is a submatrix of size k x k
obtained by deleting any N-k rows and their corresponding columns from the matrix Q.
Remark 2: The leading principal minor of order k of an N x N matrix Q is the submatrix of
size k x k obtained by deleting the last N-k rows and their corresponding columns.
Remark 3: The determinant of a principal minor is called a principal determinant. For an
N x N matrix, there are 2^N - 1 principal determinants in all.
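
A quick numerical illustration of these tests (not part of the original notes): the sketch below, assuming NumPy is available, classifies a symmetric matrix by the eigenvalue test and also reports the leading principal determinants D_1, ..., D_N. The function names and tolerance are illustrative choices.

```python
import numpy as np

def classify_definiteness(A, tol=1e-10):
    """Classify a symmetric matrix using the eigenvalue tests listed above."""
    A = np.asarray(A, dtype=float)
    eig = np.linalg.eigvalsh(0.5 * (A + A.T))   # symmetrize to guard against round-off
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig >= -tol):
        return "positive semidefinite"
    if np.all(eig <= tol):
        return "negative semidefinite"
    return "indefinite"

def leading_principal_determinants(A):
    """D_1, D_2, ..., D_N used in the leading-principal-determinant tests."""
    A = np.asarray(A, dtype=float)
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

if __name__ == "__main__":
    A = np.array([[2.0, -1.0], [-1.0, 2.0]])     # Hessian of a strictly convex quadratic
    print(classify_definiteness(A))               # -> positive definite
    print(leading_principal_determinants(A))      # -> [2.0, 3.0], all positive
```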


- The stationary point \bar{x} is a

    minimum       if \nabla^2 f(\bar{x}) is positive definite,
    maximum       if \nabla^2 f(\bar{x}) is negative definite,
    saddle point  if \nabla^2 f(\bar{x}) is indefinite.

- Theorem 3.1  Necessary conditions for a local minimum

  For x^* to be a local minimum of f(x), it is necessary that
    \nabla f(x^*) = 0 and \nabla^2 f(x^*) \ge 0 (positive semidefinite).

- Theorem 3.2  Sufficient conditions for a strict local minimum

  If \nabla f(x^*) = 0 and \nabla^2 f(x^*) > 0 (positive definite),
  then x^* is a strict or isolated local minimum of f(x).

Remark 1: The converse of Theorem 3.1 is not true. (e.g., f(x) = x^3 at x = 0)
Remark 2: The converse of Theorem 3.2 is not true. (e.g., f(x) = x^4 at x = 0, which is a
strict local minimum even though \nabla^2 f(0) is not positive definite)
3.2 Direct Search Methods

- Direct search methods use only function values.
- They are for cases where \nabla f is not available or may not exist.

• Modified simplex search method (Nelder and Mead)

- In n dimensions, a regular simplex is a polyhedron composed of n+1 equidistant points,
  which form its vertices (an equilateral triangle in 2-D, a tetrahedron in 3-D).
- Let x_i = (x_{i1}, x_{i2}, ..., x_{in}) (i = 1, 2, ..., n+1) be the i-th vertex point in R^n of the
  simplex on each step of the search.
  Define f(x_h) = \max\{ f(x_i);\ i = 1, ..., n+1 \},
         f(x_g) = \max\{ f(x_i),\ i \ne h;\ i = 1, ..., n+1 \}, and
         f(x_l) = \min\{ f(x_i);\ i = 1, ..., n+1 \}.
  Select an initial simplex and termination criteria. (Set M = 0.)
  i) Determine x_h, x_g, x_l among the (n+1) simplex vertices and let x_c be the
     centroid of all vertices excluding the worst point x_h:

       x_c = \frac{1}{n} \left( \sum_{i=1}^{n+1} x_i - x_h \right)

  ii) Calculate f(x_h), f(x_l), and f(x_g). If x_l is the same as in the previous iteration,
      then let M = M + 1. If M > 1.65n + 0.05n^2, then set M = 0 and go to vi).
  iii) Reflection: x_r = x_c + \alpha (x_c - x_h)  (usually \alpha = 1).
       If f(x_l) \le f(x_r) \le f(x_g), then set x_h \leftarrow x_r and go to i).
  iv) Expansion: If f(x_r) < f(x_l), compute x_e = x_c + \gamma (x_r - x_c)  (2.8 \le \gamma \le 3.0).
      If f(x_e) < f(x_r), then set x_h \leftarrow x_e and go to i); else set x_h \leftarrow x_r and go to i).


  v) Contraction: If f(x_r) \ge f(x_h), set x_t = x_c + \beta (x_h - x_c)  (0.4 \le \beta \le 0.6).
     Else, if f(x_r) > f(x_g), set x_t = x_c - \beta (x_h - x_c).
     Then set x_h \leftarrow x_t and go to i).
  vi) If the simplex is small enough, then stop. Otherwise,
      Reduction: x_i \leftarrow x_l + 0.5 (x_i - x_l) for i = 1, 2, ..., n+1, and go to i).

Remark 1: The indices h, g, and l each take one of the values of i (they label particular vertices).

Remark 2: The termination criteria can be that the longest segment between vertices is
small enough and that the largest difference between function values is small enough.
Remark 3: If the contours of the objective function are severely distorted and elongated,
the search can be very inefficient and may fail to converge.
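
A minimal sketch of the reflection / expansion / contraction / reduction logic described above, assuming NumPy. The coefficient values follow the text (\alpha = 1, \gamma in [2.8, 3.0], \beta in [0.4, 0.6]); the counter M and the termination test are simplified, so treat this as an illustration rather than the full algorithm.

```python
import numpy as np

def simplex_search(f, x0, step=1.0, alpha=1.0, gamma=2.9, beta=0.5,
                   tol=1e-8, max_iter=500):
    """Sketch of the modified simplex (Nelder and Mead) search described above."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    X = np.vstack([x0] + [x0 + step * np.eye(n)[i] for i in range(n)])
    F = np.array([f(x) for x in X])
    for _ in range(max_iter):
        order = np.argsort(F)                 # l = best, g = next-worst, h = worst
        l, g, h = order[0], order[-2], order[-1]
        xc = (X.sum(axis=0) - X[h]) / n       # centroid excluding the worst vertex
        xr = xc + alpha * (xc - X[h])         # reflection
        fr = f(xr)
        if fr < F[l]:                          # expansion
            xe = xc + gamma * (xr - xc)
            fe = f(xe)
            X[h], F[h] = (xe, fe) if fe < fr else (xr, fr)
        elif fr <= F[g]:                       # accept the reflection
            X[h], F[h] = xr, fr
        else:                                  # contraction (toward x_h if x_r is worst)
            xt = xc + beta * (X[h] - xc) if fr >= F[h] else xc - beta * (X[h] - xc)
            ft = f(xt)
            if ft < F[h]:
                X[h], F[h] = xt, ft
            else:                              # reduction toward the best vertex
                X = X[l] + 0.5 * (X - X[l])
                F = np.array([f(x) for x in X])
        if np.max(np.abs(X - X[l])) < tol:     # crude size-based termination
            break
    return X[np.argmin(F)]

if __name__ == "__main__":
    rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    print(simplex_search(rosen, [-1.2, 1.0]))
```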

• Hooke-Jeeves Pattern Search

- It consists of exploratory moves and pattern moves.
  Select an initial guess x^{(0)}, increment vectors \Delta_i for
  i = 1, 2, ..., n, and termination criteria. Start with k = 1.
  i) Exploratory search:
     A. Let i = 1 and x_b^{(k)} = x^{(k-1)}.
     B. Try x_n^{(k)} = x_b^{(k)} + \Delta_i. If f(x_n^{(k)}) < f(x_b^{(k)}), then set x_b^{(k)} \leftarrow x_n^{(k)}.
     C. Else, try x_n^{(k)} = x_b^{(k)} - \Delta_i. If f(x_n^{(k)}) < f(x_b^{(k)}), then set x_b^{(k)} \leftarrow x_n^{(k)}.
     D. Let i = i + 1 and go to B, until i > n.
  ii) If the exploratory search fails (x_b^{(k)} = x^{(k-1)}):
     A. If \|\Delta_i\| \le \varepsilon for i = 1, 2, ..., n, then x^* = x^{(k-1)} and stop.
     B. Else, set \Delta_i \leftarrow 0.5 \Delta_i for i = 1, 2, ..., n and go to i).
  iii) Pattern search:
     A. Let x_p^{(k)} = x_b^{(k)} + ( x_b^{(k)} - x_b^{(k-1)} ).
     B. If f(x_p^{(k)}) < f(x_b^{(k)}), then set x^{(k)} = x_p^{(k)}, k = k + 1, and go to i).
     C. Else, set x^{(k)} = x_b^{(k)}, k = k + 1, and go to i).
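
A sketch of the exploratory-move / pattern-move loop, assuming NumPy. It explores about the pattern point before accepting it, a common refinement of the simplified acceptance test in step iii); the default increments and tolerances are illustrative.

```python
import numpy as np

def explore(f, x, d):
    """Coordinate-wise exploratory moves: try +d_i, then -d_i, for each coordinate."""
    x = np.array(x, dtype=float)
    for i in range(len(x)):
        for trial in (x[i] + d[i], x[i] - d[i]):
            cand = x.copy()
            cand[i] = trial
            if f(cand) < f(x):
                x = cand
                break
    return x

def hooke_jeeves(f, x0, delta=0.5, eps=1e-6, max_iter=1000):
    """Sketch of the Hooke-Jeeves pattern search described above."""
    x_base = np.asarray(x0, dtype=float)          # base point x^(k-1)
    d = np.full(len(x_base), float(delta))        # increment vector Delta_i
    for _ in range(max_iter):
        xb = explore(f, x_base, d)                # exploratory search about the base
        if f(xb) >= f(x_base):                    # exploratory search failed
            if np.all(d <= eps):
                return x_base                      # increments small enough: stop
            d *= 0.5                               # halve the increments and retry
            continue
        while True:                                # pattern moves while they help
            xp = xb + (xb - x_base)                # pattern point
            x_base = xb
            xn = explore(f, xp, d)                 # explore about the pattern point
            if f(xn) < f(x_base):
                xb = xn
            else:
                break
    return x_base

if __name__ == "__main__":
    rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    print(hooke_jeeves(rosen, [-1.2, 1.0]))
```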


Remark 1: The HJ method may terminate prematurely in the presence of severe
nonlinearity and will degenerate into a sequence of exploratory moves.
Remark 2: For efficiency, the pattern move can be modified to perform a line search
in the pattern-search direction.
Remark 3: Rosenbrock's rotating direction method rotates the exploratory search
directions based on the previous moves, using Gram-Schmidt orthogonalization.
- Let d_1, d_2, ..., d_n be the initial search directions.

- Let \lambda_i be the net distance moved in the d_i direction, and define

    u_1 = \lambda_1 d_1 + \lambda_2 d_2 + \cdots + \lambda_n d_n
    u_2 = \lambda_2 d_2 + \cdots + \lambda_n d_n
    \vdots
    u_n = \lambda_n d_n

  Then

    \hat{d}_1 = u_1 / \|u_1\|
    \hat{d}_j = w_j / \|w_j\| \ \text{ for } j = 2, 3, ..., n, \quad \text{where } w_j = u_j - \sum_{k=1}^{j-1} \left[ (u_j)^T \hat{d}_k \right] \hat{d}_k

- Use \hat{d}_1, \hat{d}_2, ..., \hat{d}_n as the new search directions for the next exploratory search.

Remark 4: More complicated methods can be derived. However, Powell's conjugate
direction method (below) is the better choice if a more sophisticated algorithm is to be used.

• Powell's Conjugate Direction Method

- Motivations
  • It is based on the model of a quadratic objective function.
  • If the objective function of n variables is quadratic and in the form of a perfect square,
    then the optimum can be found after exactly n single-variable searches.
  • Quadratic functions:

      q(x) = a + b^T x + 0.5\, x^T C x

    Similarity transform (diagonalization): find T with x = Tz so that

      Q(x) = x^T C x = z^T T^T C T z = z^T D z    (D is a diagonal matrix)

    cf) If C is diagonalizable, the columns of T are the eigenvectors of C.
  • For optimization, the matrix C of the objective function is generally not available.

- Conjugate directions
  • Definition:
    Given an n x n symmetric matrix C, the directions s_1, s_2, ..., s_r (r \le n) are said to be
    C-conjugate if the directions are linearly independent and

      s_i^T C s_j = 0 \quad \text{for all } i \ne j.

  Remark 1: If s_i^T s_j = 0 for all i \ne j, the directions are orthogonal.

  Remark 2: If s_i is the i-th column of a matrix T, then T^T C T is a diagonal matrix.

- Parallel subspace property

  For a 2-D quadratic function, pick a direction d and two initial points x^1 and x^2.

  Let z_i be the minimum point of \min_{\lambda} f(x^i + \lambda d). Then

    \frac{\partial f}{\partial \lambda} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial \lambda} = (b^T + x^T C)\, d = 0 \quad \text{at } x = z_1 \text{ or } z_2

    (b^T + z_1^T C) d = (b^T + z_2^T C) d = 0 \ \Rightarrow \ (z_1 - z_2)^T C d = 0

  \Rightarrow (z_1 - z_2) and d are C-conjugate directions.
- Extended parallel subspace property
  For a quadratic function, pick n directions s_i = e_i (i = 1, 2, ..., n) and an initial point x^0.
  i) Perform a line search in the s_n direction and let the result be x^1.
  ii) Perform n line searches along s_1, s_2, ..., s_n, each starting from the result of the last
      line search. Let the final point after the n line searches be z^1.
  iii) Then replace s_i with s_{i+1} (i = 1, 2, ..., n-1) and set s_n = (z^1 - x^1).
  iv) Repeat ii) and iii) (n-1) times. Then s_1, s_2, ..., s_n are conjugate to each other.

- Given a symmetric matrix A (the C above), n conjugate directions can be found as follows:

  A. Choose n linearly independent vectors u_1, u_2, ..., u_n. Let z_1 = u_1 and

       z_j = u_j - \sum_{k=1}^{j-1} \left( \frac{u_j^T A z_k}{z_k^T A z_k} \right) z_k \quad \text{for } j = 2, 3, ..., n

  B. Recursive method (from an arbitrary direction z_1):

       z_2 = A z_1 - \left( \frac{z_1^T A^2 z_1}{z_1^T A z_1} \right) z_1


       z_{j+1} = A z_j - \left( \frac{z_j^T A^2 z_j}{z_j^T A z_j} \right) z_j - \left( \frac{z_{j-1}^T A^2 z_j}{z_{j-1}^T A z_{j-1}} \right) z_{j-1} \quad \text{for } j = 2, 3, ..., n-1

     cf) Select b so that z_i^T A z_{i+1} = z_i^T A (A z_i + b z_i) = 0.
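
A small numerical check (not from the notes) of construction A, assuming NumPy: starting from u_i = e_i and an arbitrary symmetric positive definite A, the Gram-Schmidt-like sweep should make Z^T A Z diagonal.

```python
import numpy as np

def conjugate_directions(A, U):
    """Construction A above: A-conjugate z_1..z_n from linearly independent u_1..u_n."""
    n = A.shape[0]
    Z = [U[:, 0].astype(float)]
    for j in range(1, n):
        z = U[:, j].astype(float)
        for zk in Z:                                   # subtract A-projections on earlier z_k
            z -= (U[:, j] @ A @ zk) / (zk @ A @ zk) * zk
        Z.append(z)
    return np.column_stack(Z)

if __name__ == "__main__":
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])      # symmetric positive definite example
    Z = conjugate_directions(A, np.eye(3))
    print(np.round(Z.T @ A @ Z, 10))     # off-diagonal entries come out ~0
```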

- Powell's conjugate direction method

  Select an initial guess x^0 and a set of n linearly independent directions (s_i = e_i).
  i) Perform a line search in the e_n direction; let the result be x_0^{(1)}, and set x^{(1)} = x_0^{(1)} (k = 1).
  ii) Starting at x^{(k)}, perform n line searches in the s_i directions (i = 1, 2, ..., n), each from the
      previous line-search result. Let the point obtained from the i-th line search be x_i^{(k)}.
  iii) Form a new conjugate direction s_{n+1} using the extended parallel subspace property:

         s_{n+1} = ( x_n^{(k)} - x^{(k)} ) / \| x_n^{(k)} - x^{(k)} \|

  iv) If \| x_n^{(k)} - x^{(k)} \| \le \varepsilon, then x^* = x^{(k)} and stop.
  v) Perform an additional line search in the s_{n+1} direction and let the result be x_{n+1}^{(k)}.
  vi) Delete s_1 and replace s_i with s_{i+1} for i = 1, 2, ..., n. Then set x^{(k+1)} = x_{n+1}^{(k)}, k = k + 1,
      and go to ii).

  Remark 1: If the objective function is quadratic, the optimum will be found after approximately
  n^2 line searches.
  Remark 2: Before step vi), a procedure is needed to check the linear independence of the
  conjugate direction set.
  A. Modification by Sargent

     Suppose \lambda_k^* is obtained by \min_{\lambda_k} f( x^{(k)} + \lambda_k s_{n+1} ), so that

       x^{(k+1)} = x^{(k)} + \lambda_k^* s_{n+1}

     and let

       f(x_{m-1}^{(k)}) - f(x_m^{(k)}) = \max_j \left[ f(x_{j-1}^{(k)}) - f(x_j^{(k)}) \right]

     Check if

       \left[ \frac{f(x^{(k)}) - f(x^{(k+1)})}{f(x_{m-1}^{(k)}) - f(x_m^{(k)})} \right]^{0.5} \le \lambda_k^*

     If yes, use the old directions again. Else delete s_m and add s_{n+1}.
  B. Modification by Zangwill

     Let D^{(k)} = [ s_1 \ s_2 \ \cdots \ s_n ] and \| x_{m-1}^{(k)} - x_m^{(k)} \| = \max_j \| x_{j-1}^{(k)} - x_j^{(k)} \|.

     Check if

       \det( D^{(k)} ) \, \frac{\| x_{m-1}^{(k)} - x_m^{(k)} \|}{\| s_{n+1} \|} \le \varepsilon

     If yes, use the old directions again. Else delete s_m and add s_{n+1}.
  Remark 3: This method will converge to a local minimum at a superlinear convergence rate.
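
A minimal sketch of Powell's conjugate direction method, assuming NumPy and SciPy (scipy.optimize.minimize_scalar serves as the exact line search). It omits the initial e_n search and the Sargent/Zangwill linear-independence safeguards discussed above, so it illustrates only the basic cycle.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_search(f, x, s):
    """Minimize f(x + a*s) over the scalar a (Brent's method for brevity)."""
    a = minimize_scalar(lambda a: f(x + a * s)).x
    return x + a * s

def powell(f, x0, tol=1e-8, max_cycles=100):
    """Sketch of Powell's conjugate direction cycle (no direction-dependence test)."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    S = list(np.eye(n))                        # s_i = e_i initially
    for _ in range(max_cycles):
        x_start = x.copy()
        for s in S:                             # n line searches along s_1..s_n
            x = line_search(f, x, s)
        d = x - x_start                         # candidate new conjugate direction
        if np.linalg.norm(d) < tol:
            break
        x = line_search(f, x, d)                # extra search along the new direction
        S = S[1:] + [d / np.linalg.norm(d)]     # drop s_1, append normalized s_{n+1}
    return x

if __name__ == "__main__":
    rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    print(powell(rosen, [-1.2, 1.0]))
```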


  cf) Let \lim_{k \to \infty} \frac{\|\varepsilon^{(k+1)}\|}{\|\varepsilon^{(k)}\|^r} = C, where \varepsilon^{(k)} = x^{(k)} - x^*.

  If C < 1, the method is convergent with order of convergence r:

    r = 1 : linear convergence rate
    r = 2 : quadratic convergence rate
    r = 1 and C = 0 : superlinear convergence rate

• Among unconstrained multidimensional direct search methods, Powell's conjugate
  direction method is the most recommended.

3.3 Gradient-Based Methods

- All techniques employ a similar iterative procedure:

    x^{(k+1)} = x^{(k)} + \alpha^{(k)} s(x^{(k)})

  where \alpha^{(k)} is the step-length parameter found by a line search, and
  s(x^{(k)}) is the search direction.
- The \alpha^{(k)} is determined by a line search in the search direction s(x^{(k)}):
  i) Start from an initial guess x^{(0)} (k = 0).
  ii) Determine the search direction s(x^{(k)}).
  iii) Perform a line search in the search direction and obtain an improved point x^{(k+1)}.
  iv) Check the termination criteria. If satisfied, then stop.
  v) Else set k = k + 1 and go to ii).

- Gradient-based methods require accurate values of the first derivatives of f(x).

- Second-order methods additionally use values of the second derivatives of f(x).
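
A sketch of the generic loop i)-v), assuming NumPy and SciPy; the search-direction rule is passed in as a function, and the demonstration uses the steepest-descent choice s = -\nabla f from the next subsection. Tolerances and iteration limits are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gradient_based_minimize(f, grad, x0, direction, eps=1e-6, max_iter=5000):
    """Generic iteration: x^(k+1) = x^(k) + alpha^(k) * s(x^(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:                          # termination on ||grad f||
            break
        s = direction(x, g)                                   # ii) choose the direction
        alpha = minimize_scalar(lambda a: f(x + a * s)).x     # iii) line search
        x = x + alpha * s
    return x

steepest_descent_direction = lambda x, g: -g                  # Cauchy's choice (below)

if __name__ == "__main__":
    rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    rosen_grad = lambda x: np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])
    print(gradient_based_minimize(rosen, rosen_grad, [-1.2, 1.0],
                                  steepest_descent_direction))
```

Consistent with Remark 1 of the steepest descent method below, progress of this direction choice on such an elongated test function is slow.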

• Steepest Descent Method (Cauchy's Method)

    f(x) = f(\bar{x}) + \nabla f(\bar{x})^T \Delta x + \cdots   (higher-order terms ignored)
    f(\bar{x}) - f(x) = -\nabla f(\bar{x})^T \Delta x

  The steepest descent direction: maximize the descent by the choice of \Delta x:

    \Delta x^* = \arg\max_{\Delta x} \left[ -\nabla f(\bar{x})^T \Delta x \right] = -\alpha \nabla f(\bar{x}) \quad (\alpha > 0)

  The search direction:  s(x^{(k)}) = -\nabla f(x^{(k)}), i.e., x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)})

  Termination criteria:

    \|\nabla f(x^{(k)})\| \le \varepsilon_f \quad \text{and/or} \quad \|x^{(k+1)} - x^{(k)}\| / \|x^{(k)}\| \le \varepsilon_x

  Remark 1: This method shows slow improvement near the optimum (where \nabla f(\bar{x}) \to 0).

  Remark 2: This method possesses a descent property:

    \nabla f(x^{(k)})^T s(x^{(k)}) < 0

• Newton's Method (Modified Newton's Method)

    \nabla f(x) = \nabla f(\bar{x}) + \nabla^2 f(\bar{x}) \Delta x + \cdots   (higher-order terms ignored)

  The optimality condition for the approximate derivative at \bar{x}:

    \nabla f(x) = \nabla f(\bar{x}) + \nabla^2 f(\bar{x}) \Delta x = 0
    \Delta x = -\nabla^2 f(\bar{x})^{-1} \nabla f(\bar{x})

  The search direction:

    s(x^{(k)}) = -\nabla^2 f(x^{(k)})^{-1} \nabla f(x^{(k)})              (Newton's method)
    s(x^{(k)}) = -\alpha^{(k)} \nabla^2 f(x^{(k)})^{-1} \nabla f(x^{(k)})     (modified Newton's method)

  Remark 1: In the modified Newton's method, the step-size parameter \alpha^{(k)} is determined
  by a line search to ensure the best improvement.
  Remark 2: The calculation of the inverse of the Hessian matrix \nabla^2 f(x^{(k)}) imposes quite a
  heavy computational burden when the dimension of the optimization variable is high.
  Remark 3: The family of Newton's methods exhibits quadratic convergence:

    \|\varepsilon^{(k+1)}\| = C \|\varepsilon^{(k)}\|^2   (C is related to the condition number of the Hessian \nabla^2 f(x^{(k)}))

  For a one-dimensional illustration (x^{(k+1)} = x^{(k)} - f'(x^{(k)}) / f''(x^{(k)})):

    \varepsilon^{(k+1)} = x^{(k+1)} - x^* = x^{(k)} - x^* - \frac{f'(x^{(k)}) - f'(x^*)}{f''(x^{(k)})}

                = -\frac{f'(x^{(k)}) + f''(x^{(k)})(x^* - x^{(k)}) - f'(x^*)}{f''(x^{(k)})}

                \approx \frac{f'''(x^{(k)})}{2 f''(x^{(k)})} (x^* - x^{(k)})^2 \quad \Rightarrow \quad |\varepsilon^{(k+1)}| \le k\, |x^* - x^{(k)}|^2

  Also, if the initial guess is chosen such that \|\varepsilon^{(0)}\| < 1/C, the method will converge.
  This implies that if the initial guess is chosen poorly, the method may diverge.

  Remark 4: The family of Newton's methods does not necessarily possess the descent property:

    \nabla f(x^{(k)})^T s(x^{(k)}) = -\nabla f(x^{(k)})^T \nabla^2 f(x^{(k)})^{-1} \nabla f(x^{(k)}) < 0 \ \text{ only if the Hessian}
  is positive definite.
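
A minimal sketch of the modified Newton step, assuming analytical gradient and Hessian routines are supplied (NumPy only). A backtracking halving stands in for the exact line search of Remark 1, and np.linalg.solve is used instead of forming the explicit inverse mentioned in Remark 2.

```python
import numpy as np

def modified_newton(f, grad, hess, x0, eps=1e-8, max_iter=100):
    """Modified Newton step: solve H s = -g, then scale s by a backtracked alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        s = np.linalg.solve(hess(x), -g)        # Newton direction without explicit inverse
        alpha = 1.0
        while f(x + alpha * s) > f(x) and alpha > 1e-12:
            alpha *= 0.5                         # crude substitute for a line search
        x = x + alpha * s
    return x

if __name__ == "__main__":
    rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    rosen_grad = lambda x: np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])
    rosen_hess = lambda x: np.array([
        [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])
    print(modified_newton(rosen, rosen_grad, rosen_hess, [-1.2, 1.0]))
```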

• Marquardt's Method (Marquardt's Compromise)

- This method combines the steepest descent and Newton's methods.
- The steepest descent method gives a good reduction in f when x^{(k)} is far from x^*.
- Newton's method possesses quadratic convergence near x^*.
- The search direction:  s(x^{(k)}) = -[ H^{(k)} + \lambda^{(k)} I ]^{-1} \nabla f(x^{(k)})
- Start with a large \lambda^{(0)}, say 10^4 (steepest descent direction), and decrease it toward zero.
  If f(x^{(k+1)}) < f(x^{(k)}), then set \lambda^{(k+1)} = 0.5 \lambda^{(k)}.


  Else, set \lambda^{(k+1)} = 2 \lambda^{(k)}.

Remark 1: This is quite useful for problems with an objective function of the form

    f(x) = f_1^2(x) + f_2^2(x) + \cdots + f_m^2(x)    (Levenberg-Marquardt method)
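
A sketch of Marquardt's compromise with the \lambda update rule stated above, assuming NumPy; the starting value \lambda^{(0)} = 10^4 follows the text, while the tolerance and iteration limit are illustrative.

```python
import numpy as np

def marquardt(f, grad, hess, x0, lam=1e4, eps=1e-6, max_iter=200):
    """Sketch of Marquardt's compromise: s = -[H + lam*I]^{-1} grad f."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        s = np.linalg.solve(hess(x) + lam * np.eye(n), -g)
        if f(x + s) < f(x):        # success: accept the step and trust the model more
            x = x + s
            lam *= 0.5
        else:                       # failure: lean back toward the steepest descent direction
            lam *= 2.0
    return x

if __name__ == "__main__":
    rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    rosen_grad = lambda x: np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])
    rosen_hess = lambda x: np.array([
        [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])
    print(marquardt(rosen, rosen_grad, rosen_hess, [-1.2, 1.0]))
```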

Remark 2: Goldstein and Price algorithm

  Let \gamma (< 0.5) and \varepsilon be positive numbers.
  i) Start from x^{(0)} with k = 1. Let \delta(x^{(0)}) = \nabla f(x^{(0)}).
  ii) Check if \|\nabla f(x^{(k)})\| \le \varepsilon. If yes, then stop.
  iii) Calculate

         g(x^{(k)}, \alpha_k) = \frac{f(x^{(k)}) - f(x^{(k)} - \alpha_k \delta(x^{(k)}))}{\alpha_k \nabla f(x^{(k)})^T \delta(x^{(k)})}

       If g(x^{(k)}, 1) < \gamma, select \alpha_k such that \gamma \le g(x^{(k)}, \alpha_k) \le 1 - \gamma.
       Else, \alpha_k = 1.
  iv) Let Q = [ Q_1 \ Q_2 \ \cdots \ Q_n ]  (approximation of the Hessian),

         where Q_i = \frac{\nabla f(x^{(k)} + \|\nabla f(x^{(k-1)})\| e_i) - \nabla f(x^{(k)})}{\|\nabla f(x^{(k-1)})\|}

       If Q(x^{(k)}) is singular or \nabla f(x^{(k)})^T Q(x^{(k)})^{-1} \nabla f(x^{(k)}) \le 0,
       then \delta(x^{(k)}) = \nabla f(x^{(k)}). Else \delta(x^{(k)}) = Q(x^{(k)})^{-1} \nabla f(x^{(k)}).
  v) Set x^{(k+1)} = x^{(k)} - \alpha_k \delta(x^{(k)}) and k = k + 1. Then go to ii).

• Conjugate Gradient Method

- Quadratically convergent method: the optimum of an n-dimensional quadratic function can be
  found in approximately n steps using exact arithmetic.
- This method generates conjugate directions using gradient information.
- For a quadratic function, consider two distinct points x^{(0)} and x^{(1)}:

    g(x^{(0)}) = \nabla f(x^{(0)}) = C x^{(0)} + b
    g(x^{(1)}) = \nabla f(x^{(1)}) = C x^{(1)} + b

    \Delta g(x) = g(x^{(1)}) - g(x^{(0)}) = C (x^{(1)} - x^{(0)}) = C \Delta x

  (Property of a quadratic function: an expression for the change in the gradient.)
- Iterative update equation: x^{(k+1)} = x^{(k)} + \alpha^{(k)} s(x^{(k)})

    \frac{\partial f(x^{(k+1)})}{\partial \alpha^{(k)}} = b^T s^{(k)} + s^{(k)T} C ( x^{(k)} + \alpha^{(k)} s^{(k)} )
                                  = s^{(k)T} ( b + C x^{(k)} ) + \alpha^{(k)} s^{(k)T} C s^{(k)} = 0

    \alpha^{(k)} = -\frac{s^{(k)T} \nabla f(x^{(k)})}{s^{(k)T} C s^{(k)}} \quad \text{and} \quad \nabla f(x^{(k+1)})^T s^{(k)} = 0 \ \ \text{(optimality of the line search)}


k 1
- Search direction: s ( k )   g ( k )   i 0
(i ) (i )
s with s (0)   g (0)

- In order that the s ( k ) is C-conjugate to all previous search direction

  i) Choose \gamma^{(0)} such that s^{(1)T} C s^{(0)} = 0,

     where s^{(1)} = -g^{(1)} + \gamma^{(0)} s^{(0)} = -g^{(1)} - \gamma^{(0)} g^{(0)}

       [ -g^{(1)} - \gamma^{(0)} g^{(0)} ]^T C [ \Delta x / \alpha^{(0)} ] = 0 \quad (\Delta x = \alpha^{(0)} s^{(0)})
       [ -g^{(1)} - \gamma^{(0)} g^{(0)} ]^T \Delta g = 0 \quad \text{(property of a quadratic function)}

       \gamma^{(0)} = \frac{(g^{(1)} - g^{(0)})^T g^{(1)}}{(g^{(0)} - g^{(1)})^T g^{(0)}} = \frac{g^{(1)T} g^{(1)}}{g^{(0)T} g^{(0)}} = \frac{\|g^{(1)}\|^2}{\|g^{(0)}\|^2}

  ii) Choose \gamma^{(0)} and \gamma^{(1)} such that s^{(2)T} C s^{(1)} = 0 and s^{(2)T} C s^{(0)} = 0,

      where s^{(2)} = -g^{(2)} + \gamma^{(0)} s^{(0)} + \gamma^{(1)} s^{(1)}

      \Rightarrow \ \gamma^{(0)} = 0 \quad \text{and} \quad \gamma^{(1)} = \frac{\|g^{(2)}\|^2}{\|g^{(1)}\|^2}

  iii) In general, s^{(k)} = -g^{(k)} + \gamma^{(k)} s^{(k-1)}, with

        s^{(k)} = -g^{(k)} + \left( \frac{\|g^{(k)}\|^2}{\|g^{(k-1)}\|^2} \right) s^{(k-1)}    (Fletcher and Reeves method)
Remark 1: Variations of the conjugate gradient method
  i) Miele and Cantrell (memory gradient method):

       s^{(k)} = -\nabla f(x^{(k)}) + \gamma^{(k)} s^{(k-1)}

     where \gamma^{(k)} is sought directly at each iteration such that s^{(k)T} C s^{(k-1)} = 0.
     cf) Use when the objective and gradient evaluations are very inexpensive.
  ii) Daniel:

       s^{(k)} = -\nabla f(x^{(k)}) + \frac{s^{(k-1)T} \nabla^2 f(x^{(k)}) \nabla f(x^{(k)})}{s^{(k-1)T} \nabla^2 f(x^{(k)}) s^{(k-1)}}\, s^{(k-1)}

  iii) Sorenson and Wolfe:

       s^{(k)} = -\nabla f(x^{(k)}) + \frac{\Delta g(x^{(k)})^T g(x^{(k)})}{\Delta g(x^{(k)})^T s^{(k-1)}}\, s^{(k-1)}

  iv) Polak and Ribiere:

       s^{(k)} = -\nabla f(x^{(k)}) + \frac{\Delta g(x^{(k)})^T g(x^{(k)})}{\|g(x^{(k-1)})\|^2}\, s^{(k-1)}


Remark 2: These methods are doomed to a linear rate of convergence in the absence of
periodic restarts, which are needed to avoid dependence among the directions.

  \Rightarrow Set s^{(k)} = -g(x^{(k)}) whenever | g(x^{(k)})^T g(x^{(k-1)}) | \ge 0.2\, \|g(x^{(k)})\|^2, or
  every n iterations.

Remark 3: The Polak and Ribiere method is more efficient for general functions
and less sensitive to inexact line searches than the Fletcher and Reeves method.
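
A minimal Fletcher-Reeves sketch, assuming NumPy and SciPy for the line search; the coefficient named beta in the code is the \gamma^{(k)} = \|g^{(k)}\|^2 / \|g^{(k-1)}\|^2 above, and a restart every n iterations follows Remark 2. Tolerances are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fletcher_reeves(f, grad, x0, eps=1e-6, max_iter=2000):
    """Fletcher-Reeves conjugate gradient with a restart every n iterations."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    g = grad(x)
    s = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        alpha = minimize_scalar(lambda a: f(x + a * s)).x
        x = x + alpha * s
        g_new = grad(x)
        if (k + 1) % n == 0:                    # periodic restart (Remark 2)
            s = -g_new
        else:
            beta = (g_new @ g_new) / (g @ g)    # ||g^(k)||^2 / ||g^(k-1)||^2
            s = -g_new + beta * s
        g = g_new
    return x

if __name__ == "__main__":
    rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    rosen_grad = lambda x: np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])
    print(fletcher_reeves(rosen, rosen_grad, [-1.2, 1.0]))
```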
 Quasi-Newton Method
- Mimics Newton's method using only first-order (gradient) information.
- Form of the search direction: s(x^{(k)}) = -A^{(k)} \nabla f(x^{(k)}),
  where A is an n x n matrix called the metric.
- Variable metric methods employ search directions of this form.
- A quasi-Newton method is a variable metric method with the quadratic property

    \Delta x = C^{-1} \Delta g
- Recursive form for the estimate of the inverse of the Hessian:

    A^{(k+1)} = A^{(k)} + A_c^{(k)}    (A_c^{(k)} is a correction to the current metric)

- If A^{(k)} approaches H^{-1} = [\nabla^2 f(x^*)]^{-1}, one additional line search will produce the
  minimum if the function is quadratic.
- Assume H^{-1} = \beta A^{(k+1)}. Then \Delta x^{(k)} = H^{-1} \Delta g^{(k)} = \beta A^{(k+1)} \Delta g^{(k)}, so that

    A_c^{(k)} \Delta g^{(k)} = \Delta x^{(k)} / \beta - A^{(k)} \Delta g^{(k)}

  Family of solutions:

    A_c^{(k)} = \frac{1}{\beta} \frac{\Delta x^{(k)} y^T}{y^T \Delta g^{(k)}} - \frac{A^{(k)} \Delta g^{(k)} z^T}{z^T \Delta g^{(k)}}    (y and z are arbitrary vectors)

- DFP method (Davidon-Fletcher-Powell)

  Let \beta = 1, y = \Delta x^{(k)}, and z = A^{(k)} \Delta g^{(k)}. Then

    A^{(k)} = A^{(k-1)} + \frac{\Delta x^{(k-1)} \Delta x^{(k-1)T}}{\Delta x^{(k-1)T} \Delta g^{(k-1)}} - \frac{A^{(k-1)} \Delta g^{(k-1)} \Delta g^{(k-1)T} A^{(k-1)}}{\Delta g^{(k-1)T} A^{(k-1)} \Delta g^{(k-1)}}

  • If A^{(0)} is any symmetric positive definite matrix, then A^{(k)} will remain so in the absence
    of round-off error. (A^{(0)} = I is a convenient choice.)

      z^T A^{(k)} z = z^T A^{(k-1)} z + \frac{z^T \Delta x^{(k-1)} \Delta x^{(k-1)T} z}{\Delta x^{(k-1)T} \Delta g^{(k-1)}} - \frac{z^T A^{(k-1)} \Delta g^{(k-1)} \Delta g^{(k-1)T} A^{(k-1)} z}{\Delta g^{(k-1)T} A^{(k-1)} \Delta g^{(k-1)}}

                  = a^T a - \frac{(a^T b)^2}{b^T b} + \frac{(z^T \Delta x^{(k-1)})^2}{\Delta x^{(k-1)T} \Delta g^{(k-1)}}, \quad \text{where } a = A^{(k-1)1/2} z, \ b = A^{(k-1)1/2} \Delta g^{(k-1)}


    i)  \Delta x^{(k-1)T} \Delta g^{(k-1)} = \Delta x^{(k-1)T} g^{(k)} - \Delta x^{(k-1)T} g^{(k-1)} = -\Delta x^{(k-1)T} g^{(k-1)}
                                 = \alpha^{(k-1)} g^{(k-1)T} A^{(k-1)} g^{(k-1)} > 0

    ii)  (a^T a)(b^T b) - (a^T b)^2 \ge 0    (Schwarz inequality)

    iii) If a and b are proportional (and hence z and \Delta g^{(k-1)} are too, say z = c \Delta g^{(k-1)}),
         then (a^T a)(b^T b) - (a^T b)^2 = 0,
         but  \Delta x^{(k-1)T} z = c\, \Delta x^{(k-1)T} \Delta g^{(k-1)} = c\, \alpha^{(k-1)} g^{(k-1)T} A^{(k-1)} g^{(k-1)} \ne 0

    \therefore \ z^T A^{(k)} z > 0

  • This method has the descent property:

      \Delta f \approx \nabla f(x^{(k)})^T \Delta x = -\alpha^{(k)} \nabla f(x^{(k)})^T A^{(k)} \nabla f(x^{(k)}) < 0 \quad \text{for } \alpha^{(k)} > 0
- Variations
  • McCormick (Pearson No. 2):

      A^{(k)} = A^{(k-1)} + \frac{(\Delta x^{(k-1)} - A^{(k-1)} \Delta g^{(k-1)})\, \Delta x^{(k-1)T}}{\Delta x^{(k-1)T} \Delta g^{(k-1)}}

  • Pearson (Pearson No. 3):

      A^{(k)} = A^{(k-1)} + \frac{(\Delta x^{(k-1)} - A^{(k-1)} \Delta g^{(k-1)})\, \Delta g^{(k-1)T} A^{(k-1)}}{\Delta g^{(k-1)T} A^{(k-1)} \Delta g^{(k-1)}}

  • Broydon 1965 method (not symmetric):

      A^{(k)} = A^{(k-1)} + \frac{(\Delta x^{(k-1)} - A^{(k-1)} \Delta g^{(k-1)})\, \Delta x^{(k-1)T} A^{(k-1)}}{\Delta x^{(k-1)T} A^{(k-1)} \Delta g^{(k-1)}}


  • Broydon symmetric rank-one method (1967):

      A^{(k)} = A^{(k-1)} + \frac{(\Delta x^{(k-1)} - A^{(k-1)} \Delta g^{(k-1)})(\Delta x^{(k-1)} - A^{(k-1)} \Delta g^{(k-1)})^T}{(\Delta x^{(k-1)} - A^{(k-1)} \Delta g^{(k-1)})^T \Delta g^{(k-1)}}

  • Zoutendijk:

      A^{(k)} = A^{(k-1)} - \frac{A^{(k-1)} \Delta g^{(k-1)} \Delta g^{(k-1)T} A^{(k-1)}}{\Delta g^{(k-1)T} A^{(k-1)} \Delta g^{(k-1)}}

  • BFS method (Broydon-Fletcher-Shanno, rank-two method):

      A^{(k)} = \left( I - \frac{\Delta x^{(k-1)} \Delta g^{(k-1)T}}{\Delta x^{(k-1)T} \Delta g^{(k-1)}} \right) A^{(k-1)} \left( I - \frac{\Delta g^{(k-1)} \Delta x^{(k-1)T}}{\Delta x^{(k-1)T} \Delta g^{(k-1)}} \right) + \frac{\Delta x^{(k-1)} \Delta x^{(k-1)T}}{\Delta x^{(k-1)T} \Delta g^{(k-1)}}

  • Invariant DFP (Oren, 1974):

      A^{(k)} = \frac{\Delta x^{(k-1)T} \Delta g^{(k-1)}}{\Delta g^{(k-1)T} A^{(k-1)} \Delta g^{(k-1)}} \left[ A^{(k-1)} - \frac{A^{(k-1)} \Delta g^{(k-1)} \Delta g^{(k-1)T} A^{(k-1)}}{\Delta g^{(k-1)T} A^{(k-1)} \Delta g^{(k-1)}} \right] + \frac{\Delta x^{(k-1)} \Delta x^{(k-1)T}}{\Delta x^{(k-1)T} \Delta g^{(k-1)}}

  • Hwang (unification of many variations):

      A^{(k)} = A^{(k-1)} + \left[ \Delta x^{(k-1)} \ \ A^{(k-1)} \Delta g^{(k-1)} \right] B^{(k-1)} \left[ \Delta x^{(k-1)} \ \ A^{(k-1)} \Delta g^{(k-1)} \right]^T

    where B^{(k-1)} is 2 x 2 and satisfies B^{(k-1)} \left[ \Delta x^{(k-1)} \ \ A^{(k-1)} \Delta g^{(k-1)} \right]^T \Delta g^{(k-1)} = [ \beta^{-1} \ \ -1 ]^T.

    Remark: If \beta = 1 and B^{(k)} = \mathrm{diag}\!\left( 1 / \Delta x^{(k)T} \Delta g^{(k)}, \ -1 / \Delta g^{(k)T} A^{(k)} \Delta g^{(k)} \right), this
    method is the same as the DFP method.

Remark 1: As these methods iterate, A^{(k)} tends to become ill-conditioned or nearly
singular. Thus, they require restarts. (A^{(k)} = I: loss of second-order information)
  cf) Condition number = ratio of the maximum and minimum magnitudes of the eigenvalues of A.
      Ill-conditioned: A has a large condition number.
Remark 2: The size of A^{(k)} is quite large if n is large (computation and storage burden).
Remark 3: The BFS method is widely used; it is known to have a decreased need for restarts
and to be less dependent on exact line searches.
Remark 4: The line search is the most time-consuming phase of these methods.
Remark 5: If the gradient is not explicitly available, a numerical gradient can be obtained
using, for example, forward- or central-difference approximations. If the changes in x
and/or f between iterations are small, the central-difference approximation is better,
at the cost of more computation.
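
A minimal DFP sketch, assuming NumPy and SciPy for the line search, with A^{(0)} = I and a crude reset of A to I (in the spirit of Remark 1) when the update denominator becomes tiny; tolerances are illustrative and no further safeguards are included.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dfp(f, grad, x0, eps=1e-6, max_iter=500):
    """DFP variable metric sketch: s = -A grad f with the rank-two update above."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    A = np.eye(n)                                  # A^(0) = I, symmetric positive definite
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:
            break
        s = -A @ g
        alpha = minimize_scalar(lambda a: f(x + a * s)).x
        dx = alpha * s
        x = x + dx
        g_new = grad(x)
        dg = g_new - g
        if dx @ dg > 1e-12:                        # DFP update keeps A positive definite
            Adg = A @ dg
            A = A + np.outer(dx, dx) / (dx @ dg) - np.outer(Adg, Adg) / (dg @ Adg)
        else:
            A = np.eye(n)                          # crude restart (see Remark 1)
        g = g_new
    return x

if __name__ == "__main__":
    rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    rosen_grad = lambda x: np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])
    print(dfp(rosen, rosen_grad, [-1.2, 1.0]))
```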


3.4 Comparison of Methods

- Test functions (coded in the sketch below)
  • Rosenbrock's function:

      f(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2

  • Fenton and Eason's function:

      f(x) = \frac{1}{10} \left[ 12 + x_1^2 + \frac{1 + x_2^2}{x_1^2} + \frac{x_1^2 x_2^2 + 100}{(x_1 x_2)^4} \right]

  • Wood's function:

      f(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2 + 90 (x_4 - x_3^2)^2 + (1 - x_3)^2
             + 10.1 \left[ (x_2 - 1)^2 + (x_4 - 1)^2 \right] + 19.8 (x_2 - 1)(x_4 - 1)

- Test results
  • Himmelblau (1972): BFS, DFP, and Powell's direct search methods are superior.
  • Sargent and Sebastian (1971): BFS, among the BFS, DFP, and FR methods.
  • Shanno and Phua (1980): BFS.
  • Reklaitis (1983): FR, among the Cauchy, FR, DFP, and BFS methods.
