UNCONSTRAINED
OPTIMIZATION
Prof. S.S. Jang
Department of Chemical Engineering
National Tsing-Hua University
Contents
Optimality Criteria
Direct Search Methods
• Nelder and Mead (Simplex Search)
• Hooke and Jeeves (Pattern Search)
• Powell's Method (The Conjugate Direction Search)
Gradient-Based Methods
• Cauchy's Method (the Steepest Descent)
• Newton's Method
• Conjugate Gradient Method (The Fletcher and Reeves Method)
• Quasi-Newton Method – Variable Metric Method
3-1 The Optimality Criteria
Given an objective function $f(x)$, where $x = [x_1, x_2, \ldots, x_N]^T$.

Stationary condition: $\nabla f(x^*) = 0$
Example: $f(x) = 2x_1^2 + 4x_1x_2^3 - 10x_1x_2 + x_2^2$

x=linspace(-3,3);y=x;
for i=1:100
for j=1:100
z(i,j)=2*x(i)^2+4*x(i)*y(j)^3-10*x(i)*y(j)+y(j)^2;
end
end
mesh(x,y,z)
contour(x,y,z,100)
Example – Continued (Mesh and Contour)
Example – Continued
At $x^* = (0, 0)$, the stationary condition $\nabla f(x^*) = 0$ holds:

$\partial f/\partial x_1 \big|_{x=x^*} = 4x_1 + 4x_2^3 - 10x_2 = 0$

$\partial f/\partial x_2 \big|_{x=x^*} = 12x_1x_2^2 - 10x_1 + 2x_2 = 0$

The Hessian elements at $x^*$ are

$\partial^2 f/\partial x_1^2 \big|_{x=x^*} = 4$

$\partial^2 f/\partial x_2^2 \big|_{x=x^*} = 2$

$\partial^2 f/\partial x_1 \partial x_2 \big|_{x=x^*} = 12x_2^2 - 10 = -10$

$H = \begin{bmatrix} 4 & -10 \\ -10 & 2 \end{bmatrix}$ is indefinite, so $x^*$ is a non-optimum (saddle) point.
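As a quick numerical check, a minimal MATLAB sketch can evaluate the gradient and Hessian of this example at the origin and inspect the eigenvalues of H:

% Gradient and Hessian of f(x) = 2*x1^2 + 4*x1*x2^3 - 10*x1*x2 + x2^2
grad = @(x) [4*x(1)+4*x(2)^3-10*x(2); 12*x(1)*x(2)^2-10*x(1)+2*x(2)];
hess = @(x) [4, 12*x(2)^2-10; 12*x(2)^2-10, 24*x(1)*x(2)+2];
xstar = [0;0];
g = grad(xstar)      % [0; 0], so x* is a stationary point
H = hess(xstar)      % [4 -10; -10 2]
lambda = eig(H)      % one positive and one negative eigenvalue: H is indefinite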
Mesh and Contour – Himmelblau's Function: $f(x) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$
The optimality criteria may locate only a local minimum in this case.
x=linspace(-5,5);y=x;
for i=1:100
for j=1:100
z(i,j)=(x(i)^2+y(j)-11)^2+(x(i)+y(j)^2-7)^2;
end
end
mesh(x,y,z)
contour(x,y,z,80)
Mesh and Contour - Continued
[Figure: mesh and contour plots of Himmelblau's function on the range $-5 \le x_1, x_2 \le 5$.]
Example: $f(x) = x_1^2 + 25x_2^2$
Rosenbrock's Banana Function: $f(x, y) = 100(y - x^2)^2 + (1 - x)^2$
Example: Curve Fitting
From theoretical considerations, it is believed that the dependent variable y is related to the variable x via a two-parameter function:

$y = \dfrac{k_1 x}{1 + k_2 x}$

The parameters $k_1$ and $k_2$ are to be determined by a least-squares fit of the following experimental data:

x     y
1.0   1.05
2.0   1.25
3.0   1.55
4.0   1.59
Problem Formulation and Mesh, Contour

$\min_{k_1, k_2} \left(1.05 - \dfrac{k_1}{1 + k_2}\right)^2 + \left(1.25 - \dfrac{2k_1}{1 + 2k_2}\right)^2 + \left(1.55 - \dfrac{3k_1}{1 + 3k_2}\right)^2 + \left(1.59 - \dfrac{4k_1}{1 + 4k_2}\right)^2$
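Following the same mesh/contour pattern as the earlier scripts, the least-squares surface can be plotted over a $(k_1, k_2)$ grid; the grid limits below are an assumption chosen to bracket the optimum:

xdat=[1 2 3 4]; ydat=[1.05 1.25 1.55 1.59];
k1=linspace(0,4); k2=linspace(0,4);
for i=1:100
for j=1:100
yhat=k1(i)*xdat./(1+k2(j)*xdat);      % model predictions at the data points
z(i,j)=sum((ydat-yhat).^2);           % sum of squared errors
end
end
mesh(k1,k2,z')
contour(k1,k2,z',60)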
The Importance of the One-Dimensional Problem – The Iterative Optimization Procedure

Consider an objective function min $f(X) = x_1^2 + x_2^2$, with an initial point $X_0 = (-4, -1)$ and a direction $(1, 0)$. What is the optimum along this direction, i.e. $X_1 = X_0 + \alpha^*(1, 0)$? This is a one-dimensional search for $\alpha$.
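A minimal MATLAB sketch of this one-dimensional search (the fminbnd bracket [0, 10] is an assumption):

f = @(X) X(1)^2 + X(2)^2;
X0 = [-4 -1]; d = [1 0];
phi = @(alpha) f(X0 + alpha*d);       % objective along the search direction
alpha_star = fminbnd(phi, 0, 10)      % expected value: 4
X1 = X0 + alpha_star*d                % expected point: (0, -1)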
3-2 Steepest Descent Approach – Cauchy's Method
A direction $d = [x_1, x_2, \ldots, x_n]'$ is an n-dimensional vector. A line search is an approach such that, from a base point $x_0$, we find $\alpha^*$ for which $x_0 + \alpha^* d$ is the optimum along $d$, i.e. $f(x_0 + \alpha^* d) \le f(x_0 + \alpha d)$ for all $\alpha$.

Theorem: the direction $d = -[\partial f/\partial x_1, \partial f/\partial x_2, \ldots, \partial f/\partial x_n]'$ is the steepest descent direction.

Proof: Consider a two-dimensional system $f(x_1, x_2)$. The movement satisfies $ds^2 = dx_1^2 + dx_2^2$, and $df = (\partial f/\partial x_1)dx_1 + (\partial f/\partial x_2)dx_2 = (\partial f/\partial x_1)dx_1 + (\partial f/\partial x_2)(ds^2 - dx_1^2)^{1/2}$. To maximize the change $df$, set $d(df)/dx_1 = 0$. It can be shown that $dx_1/ds = (\partial f/\partial x_1)/[(\partial f/\partial x_1)^2 + (\partial f/\partial x_2)^2]^{1/2}$ and $dx_2/ds = (\partial f/\partial x_2)/[(\partial f/\partial x_1)^2 + (\partial f/\partial x_2)^2]^{1/2}$.
Example: $f(x) = 2x_1^3 + 4x_1x_2^3 - 10x_1x_2 + x_2^3$

Steepest descent search from the initial point (2, 2): the gradient is $\nabla f = [6x_1^2 + 4x_2^3 - 10x_2,\; 12x_1x_2^2 - 10x_1 + 3x_2^2]' = [36,\; 88]'$, so the search direction is $d = -\nabla f = [-36, -88]'$.
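A minimal sketch of the first steepest descent step from (2, 2); the line-search bracket is an assumption, and fminbnd simply plays the role of the one-dimensional search:

f = @(x) 2*x(1)^3 + 4*x(1)*x(2)^3 - 10*x(1)*x(2) + x(2)^3;
grad = @(x) [6*x(1)^2+4*x(2)^3-10*x(2); 12*x(1)*x(2)^2-10*x(1)+3*x(2)^2];
x0 = [2; 2];
g = grad(x0)                          % [36; 88]
d = -g;                               % steepest descent direction
phi = @(a) f(x0 + a*d);
astar = fminbnd(phi, 0, 0.03);        % one-dimensional search (bracket assumed)
x1 = x0 + astar*d                     % next iterate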
Quadratic Function Properties
Property #1: Optimal line search for a quadratic function:

$f(x) = f(x_k) + \nabla f_k^T (x - x_k) + \frac{1}{2}(x - x_k)^T H (x - x_k)$

Let $x = x_k + \alpha s_k$. Then

$\dfrac{df}{d\alpha} = 0 = \nabla f_k^T s_k + \alpha s_k^T H s_k$

$\alpha^* = -\dfrac{\nabla f_k^T s_k}{s_k^T H s_k}$
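The optimal step can be checked on a small quadratic; H, b, x_k, and s_k below are made up for illustration:

H = [4 1; 1 3]; b = [-1; 2];          % f(x) = b'x + 0.5*x'Hx
f = @(x) b'*x + 0.5*x'*H*x;
xk = [2; -1]; sk = [1; 1];
gk = b + H*xk;                        % gradient at xk
alpha_analytic = -(gk'*sk)/(sk'*H*sk) % Property #1
alpha_numeric = fminbnd(@(a) f(xk + a*sk), -10, 10)   % should agree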
Quadratic Function Properties – Continued

Property #2: $\nabla f_{k+1}$ is orthogonal to $s_k$ for a quadratic function.

Let $f(x) = a + x^T b + \frac{1}{2} x^T H x$

$\nabla f = b + Hx$, or $Hx_k = \nabla f_k - b$

Since $s_k^T \nabla f_k + \alpha s_k^T H s_k = 0$ (and $\alpha s_k = x_{k+1} - x_k$):

$s_k^T \nabla f_k + s_k^T H (x_{k+1} - x_k) = 0$

$s_k^T \nabla f_k + s_k^T [(\nabla f_{k+1} - b) - (\nabla f_k - b)] = 0$

$s_k^T \nabla f_{k+1} = 0$
3-2 Newton's Method – Best Convergence Approaching the Optimum

$f(x) = f(x_k) + \nabla f_k^T (x - x_k) + \frac{1}{2}(x - x_k)^T H (x - x_k)$

Let $\Delta x_k = x - x_k$. We want

$\dfrac{d}{d\Delta x_k}\left[f(x) - f(x_k)\right] = 0 = \nabla f_k + H(x - x_k)$

$x_{k+1} = x_k - H^{-1}\nabla f_k$

Define $d = -H^{-1}\nabla f_k$, so that with a line search

$x_{k+1} = x_k - \alpha^* H^{-1}\nabla f_k$
3-2 Newton's Method – Best Convergence Approaching the Optimum (Continued)

For the example $f(x) = 2x_1^3 + 4x_1x_2^3 - 10x_1x_2 + x_2^3$ at $x = (2, 2)$:

$d = -H^{-1}\nabla f = -\begin{bmatrix} 12x_1 & 12x_2^2 - 10 \\ 12x_2^2 - 10 & 24x_1x_2 + 6x_2 \end{bmatrix}^{-1} \begin{bmatrix} 6x_1^2 + 4x_2^3 - 10x_2 \\ 12x_1x_2^2 - 10x_1 + 3x_2^2 \end{bmatrix} = -\begin{bmatrix} 24 & 38 \\ 38 & 108 \end{bmatrix}^{-1} \begin{bmatrix} 36 \\ 88 \end{bmatrix} = \begin{bmatrix} -0.4739 \\ -0.6481 \end{bmatrix}$
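The same Newton direction can be reproduced numerically with a minimal sketch:

grad = @(x) [6*x(1)^2+4*x(2)^3-10*x(2); 12*x(1)*x(2)^2-10*x(1)+3*x(2)^2];
hess = @(x) [12*x(1), 12*x(2)^2-10; 12*x(2)^2-10, 24*x(1)*x(2)+6*x(2)];
xk = [2; 2];
g = grad(xk)        % [36; 88]
H = hess(xk)        % [24 38; 38 108]
d = -H\g            % Newton direction, approximately [-0.4739; -0.6481]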
Comparison of Newton’s Method and
Cauchy’s Method
Remarks
Newton's method is much more efficient than the steepest descent approach, especially when the starting point is close to the optimum.
Implementing Newton's method requires the Hessian to be positive definite; otherwise, the search may diverge.
Evaluating the Hessian requires second derivatives of the objective function; the number of function evaluations therefore increases drastically if numerical derivatives are used.
Steepest descent is very inefficient near the optimum, but very efficient at the very beginning of the search.
Conclusion
Consider a quadratic $q(x) = a + b'x + \frac{1}{2}x'Hx$ and let $x = x^{(1)} + \lambda d$. Then

$\dfrac{dq}{d\lambda} = \dfrac{dq}{dx}\dfrac{dx}{d\lambda} = (b' + x'H)d$

Assume the minimum along $d$ from $x^{(1)}$ is reached at $y_1$: $\dfrac{dq}{d\lambda} = 0 = (b' + y_1'H)d$ … (1)

By the same approach, starting from $x^{(2)}$ we obtain $y_2$ such that $\dfrac{dq}{d\lambda} = 0 = (b' + y_2'H)d$ … (2)

(1) − (2) yields $(y_1' - y_2')Hd = 0$

$y_1 - y_2$ is conjugate to $d$ with respect to $H$.
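A numerical check of this conjugacy result on a small quadratic (H, b, the two starting points, and d are made up for illustration):

H = [4 1; 1 3]; b = [-1; 2];
q = @(x) b'*x + 0.5*x'*H*x;
d = [1; 2];
linemin = @(x0) x0 + fminbnd(@(a) q(x0 + a*d), -10, 10)*d;   % minimum along d
y1 = linemin([0; 0]);
y2 = linemin([3; -2]);
conjugacy = (y1 - y2)'*H*d    % approximately zero: y1 - y2 is conjugate to d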
Powell’s Algorithm
Step 1: Define $x_0$, and set N linearly independent directions, say $s^{(i)} = e^{(i)}$.
Step 2: Perform N+1 one-dimensional searches along $s^{(N)}, s^{(1)}, \ldots, s^{(N)}$.
Step 3: Form the conjugate direction $d$ using the extended parallel subspace property.
Step 4: Set $s^{(1)} \leftarrow s^{(2)}$, $s^{(2)} \leftarrow s^{(3)}$, …, $s^{(N-1)} \leftarrow s^{(N)}$, $s^{(N)} \leftarrow d$. Go to Step 2.
Powell’s Algorithm (MATLAB)
function opt=powell_n_dim(x0,n,eps,tol,opt_func)
% Powell's conjugate direction search.  one_dim_pw performs the
% one-dimensional search along a given direction (separate routine).
xx=x0;obj_old=feval(opt_func,x0);s=zeros(n,n);obj=obj_old;df=100;absdx=100;
for i=1:n;s(i,i)=1;end            % start with the coordinate directions e(i)
while df>eps || absdx>tol         % stop when both objective and step changes are small
    x_old=xx;
    for i=1:n+1                   % N+1 one-dimensional searches
        if(i==n+1) j=1; else j=i; end
        ss=s(:,j);
        alopt=one_dim_pw(xx,ss',opt_func);
        xx=xx+alopt*ss';
        if(i==1)
            y1=xx;                % point after the first search
        end
        if(i==n+1)
            y2=xx;                % point after the last search
        end
    end
    d=y2-y1;                      % conjugate direction (parallel subspace property)
    nn=norm(d,'fro');
    for i=1:n
        d(i)=d(i)/nn;             % normalize d
    end
    dd=d';
    alopt=one_dim_pw(xx,dd',opt_func);
    xx=xx+alopt*d;                % search along the conjugate direction
    dx=xx-x_old;
    plot(xx(1),xx(2),'ro')        % plot the iterate (two-dimensional problems)
    absdx=norm(dx,'fro');
    obj=feval(opt_func,xx);
    df=abs(obj-obj_old);
    obj_old=obj;
    for i=1:n-1
        s(:,i)=s(:,i+1);          % discard the oldest direction and shift
    end
    s(:,n)=dd;                    % the new conjugate direction replaces s(N)
end
opt=xx
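A possible call for the quadratic example that follows, assuming an objective file (here hypothetically named quad25.m) and the one_dim_pw line-search routine are on the MATLAB path:

% quad25.m (hypothetical file name):
%   function f = quad25(x)
%   f = x(1)^2 + 25*x(2)^2;
x0 = [2 2]; n = 2;
figure, hold on                        % the routine plots each iterate
opt = powell_n_dim(x0, n, 1e-6, 1e-6, 'quad25')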
Example: $f(x) = x_1^2 + 25x_2^2$
Example 2: Rosenbrock’s function
Example: fittings
x0=[3 3];opt=fminsearch('fittings',x0)
opt =
2.0884 1.0623
Fittings
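The fminsearch call above implies an objective function file; a sketch of what fittings.m might contain, based on the problem statement (the exact file is assumed):

function f = fittings(k)
% Least-squares objective for the two-parameter model y = k1*x/(1 + k2*x)
xdat = [1 2 3 4];
ydat = [1.05 1.25 1.55 1.59];
yhat = k(1)*xdat./(1 + k(2)*xdat);
f = sum((ydat - yhat).^2);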
Function Fminsearch
>> help fminsearch
…

Quasi-Newton Method – Variable Metric Method

$x_{k+1} = x_k - \alpha A^{(k)} \nabla f_k$, where $\Delta x^{(k)} = x_{k+1} - x_k$, $\Delta g^{(k)} = \nabla f_{k+1} - \nabla f_k$, and the metric $A^{(k)}$ is updated as

$A^{(k+1)} = A^{(k)} + \dfrac{\Delta x^{(k)} y^T}{y^T \Delta g^{(k)}} - \dfrac{A^{(k)} \Delta g^{(k)} z^T}{z^T \Delta g^{(k)}}$
Quasi-Newton Method – Continued

Choose $y = \Delta x^{(k)}$, $z = A^{(k)} \Delta g^{(k)}$ (the Davidon-Fletcher-Powell method).
Broyden-Fletcher-Shanno (BFS) Method

$A^{(k+1)} = \left[I - \dfrac{\Delta x^{(k)} \Delta g^{(k)T}}{\Delta x^{(k)T} \Delta g^{(k)}}\right] A^{(k)} \left[I - \dfrac{\Delta g^{(k)} \Delta x^{(k)T}}{\Delta x^{(k)T} \Delta g^{(k)}}\right] + \dfrac{\Delta x^{(k)} \Delta x^{(k)T}}{\Delta x^{(k)T} \Delta g^{(k)}}$

Summary:
In the first run, given the initial conditions and convergence criteria, evaluate the gradient.
The first direction is the negative of the gradient, with $A^{(k)} = I$.
In the iteration phase, evaluate the new gradient, find $\Delta g^{(k)}$ and $\Delta x^{(k)}$, and update $A^{(k+1)}$ using the equation on the previous slide.
The new direction is $s^{(k+1)} = -A^{(k+1)} \nabla f_{k+1}$.
Check convergence, and continue the iteration phase.
Example: The BFS Method for Rosenbrock's Function
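A minimal variable-metric sketch for this example (not the course code); the starting point, the fminbnd line search, and the iteration limit are assumptions:

f = @(x) 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;              % Rosenbrock's function
grad = @(x) [-400*x(1)*(x(2)-x(1)^2)-2*(1-x(1)); 200*(x(2)-x(1)^2)];
x = [-1.2; 1]; A = eye(2); g = grad(x);
for k = 1:200
s = -A*g;                                               % search direction
a = fminbnd(@(t) f(x+t*s), 0, 2, optimset('TolX',1e-10));  % line search
dx = a*s; xnew = x + dx;
gnew = grad(xnew); dg = gnew - g;
A = (eye(2)-dx*dg'/(dx'*dg))*A*(eye(2)-dg*dx'/(dx'*dg)) + dx*dx'/(dx'*dg);  % BFS update
x = xnew; g = gnew;
if norm(g) < 1e-6, break, end
end
x, f(x)    % should approach the optimum at (1, 1)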
Heuristic Approaches (1) – The Simplex Approach

Main idea: the vertex with the highest function value is reflected through the centroid of the remaining vertices to generate a trial point (a single reflection step is sketched after the rules below).

Rules:
1. Straddled: if the selected "worst" vertex is the one generated in the previous iteration, choose instead the vertex with the next-highest function value.
2. Cycling: if a given vertex remains unchanged for more than M iterations, reduce the simplex size by some factor.
3. Termination: the search is terminated when the simplex gets small enough.
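A single reflection step of the basic simplex method in MATLAB; the objective (taken from an earlier example) and the three starting vertices are chosen here just for illustration:

f = @(x) x(1)^2 + 25*x(2)^2;
V = [2 2; 3 2; 2 3];                       % one vertex per row
fv = [f(V(1,:)); f(V(2,:)); f(V(3,:))];
[~, worst] = max(fv);                      % vertex with the highest value
xc = mean(V(setdiff(1:3, worst), :), 1);   % centroid of the remaining vertices
xnew = 2*xc - V(worst,:)                   % trial point: reflection through xc
V(worst,:) = xnew;                         % trial point replaces the worst vertex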
Example: Rosenbrock Function
Simplex Approach: Nelder and Mead
[Figure: Nelder and Mead operations – reflection of the worst vertex x(h) through the centroid x_c to x_new, with expansion or contraction (panels (c) and (d)) chosen by comparing f(x_new) with f(g) and f(h).]
Comparison of Different Approaches for Solving the Rosenbrock Function

Method (subprogram) | Optimum found | Number of iterations | Cost function value | Number of function calls