
Optimization Techniques

Multi-variable Unconstrained Optimization:


Cauchy's Method and Newton's Method

Dr. Nasir M Mirza


Email: nasirmm@yahoo.com

In this Lecture
In this lecture we will discuss two important methods
that deal with Multi-variable Unconstrained
Optimization:
Cauchy's Steepest Ascent Method
Newton's Method

Cauchy's Steepest Ascent Method


The search direction used in Cauchy's method is the negative of the gradient at the current point x(k).
Since this direction gives the maximum decrease in function value, the method is also known as the
steepest descent method. (For maximization, the search moves along the positive gradient, which is
why it is also called steepest ascent.)
At every iteration the derivative is computed at the current point and a unidirectional search is
performed along the negative of this derivative direction to find the minimum along that line.
That minimum becomes the new current point and the search is continued from there.

Cauchy's Steepest Ascent Method


Steepest Descent Algorithm
STEP 1: Choose a maximum number of iterations M, an initial point x(0) = (x1, x2, ..., xn)T, and two
termination parameters ε1 and ε2. Set k = 0.
STEP 2: At the k-th iteration, calculate s(k) = ∇f(x(k)), the first derivative at x(k).
STEP 3: If ||∇f(x(k))|| <= ε1, terminate. Else, if k >= M, terminate. Else go to Step 4.
STEP 4: Perform a unidirectional search to find a(k) such that
f(x(k+1)) = f(x(k) − a(k) ∇f(x(k))) is minimum.
STEP 5: If ||x(k+1) − x(k)|| / ||x(k)|| <= ε2, terminate; otherwise set k = k + 1 and go to Step 2.

The steepest descent (ascent) method converges linearly.
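
The loop below is a minimal MATLAB sketch of Steps 1-5, using the Himmelblau function of the later
exercise as a stand-in objective; the numerical-gradient step h and the line-search bracket (0, 1) are
our assumptions, not values from the lecture.

% Minimal steepest-descent sketch of Steps 1-5 (objective, h and bracket are assumptions)
f  = @(v) (v(1)^2 + v(2) - 11)^2 + (v(1) + v(2)^2 - 7)^2;   % Himmelblau function
x  = [0; 0];  M = 100;  eps1 = 1e-3;  eps2 = 1e-3;  h = 1e-4;
for k = 0:M
    g = zeros(2, 1);                        % Step 2: central-difference gradient
    for i = 1:2
        e = zeros(2, 1);  e(i) = h;
        g(i) = (f(x + e) - f(x - e)) / (2*h);
    end
    if norm(g) <= eps1, break; end          % Step 3: gradient small enough
    a    = fminbnd(@(t) f(x - t*g), 0, 1);  % Step 4: line search along -g
    xnew = x - a*g;
    if norm(xnew - x) / max(norm(x), eps2) <= eps2   % Step 5: iterates converged
        x = xnew;  break
    end
    x = xnew;                               % accept the new point and continue
end
disp(x')    % should end up near the minimum (3, 2)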

Example: Suppose f(x, y) = 2xy + 2x − x² − 2y².


Use the steepest ascent method to find the next point, moving from the point (−1, 1).

∂f/∂x = 2y + 2 − 2x
∂f/∂y = 2x − 4y

At (−1, 1):
∇f(−1, 1) = ( 2(1) + 2 − 2(−1), 2(−1) − 4(1) )T = ( 6, −6 )T

Let g(h) = f(−1 + 6h, 1 − 6h).


The next step is to find the h that maximizes g(h).

f(x, y) = 2xy + 2x − x² − 2y²
g(h) = f(−1 + 6h, 1 − 6h) = ... = −7 + 72h − 180h²
Setting g'(h) = 0 yields
72 − 360h = 0, so h = 0.2.
Since h = 0.2 maximizes g(h), the next point is x = −1 + 6(0.2) = 0.2 and y = 1 − 6(0.2) = −0.2.
So, moving along the gradient direction from the point (−1, 1), we reach the optimum along that
direction (our next point) at (0.2, −0.2).
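
A quick MATLAB check of this step (a sketch; fminbnd and the bracket (0, 1) are our choices, not
part of the lecture):

% Verify the steepest-ascent step for f(x,y) = 2xy + 2x - x^2 - 2y^2 from (-1, 1)
f  = @(x, y) 2*x.*y + 2*x - x.^2 - 2*y.^2;
g  = @(h) f(-1 + 6*h, 1 - 6*h);        % profile along the gradient direction (6, -6)
h  = fminbnd(@(t) -g(t), 0, 1);        % maximize g by minimizing -g; bracket assumed
xy = [-1 + 6*h, 1 - 6*h]               % expected: h = 0.2, next point (0.2, -0.2)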

EXERCISE 3.4.2
Consider the Himmelblau function:
Minimize f(x, y) = (x² + y − 11)² + (x + y² − 7)²

Step 1: In order to ensure proper convergence, a large value of M is usually chosen; the choice of M
also depends on the available time and computing resources. Let us choose M = 100, an initial point
x(0) = (0, 0)T, and termination parameters ε1 = ε2 = 10⁻³. We also set k = 0.

EXERCISE 3.4.2
3D graph of the Himmelblau function:
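The 3D surface shown on this slide can be reproduced with a short MATLAB snippet (a sketch of ours;
the grid range mirrors the contour code below):

% 3D surface of the Himmelblau function
[X, Y] = meshgrid(0:.1:5);
Z = (X.^2 + Y - 11).^2 + (X + Y.^2 - 7).^2;
surfc(X, Y, Z);                    % surface with contours underneath
xlabel('x'); ylabel('y'); zlabel('f(x,y)');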

Example
Contour graph:

% MATLAB program to draw the contour of the function
[X, Y] = meshgrid(0:.1:5);
Z = (X.*X + Y - 11).^2 + (X + Y.*Y - 7).^2;
contour(X, Y, Z, 150);
colormap(jet);

Minimum point

EXERCISE 3.4.2
Step 2:
The derivative at x(0) = (0, 0)T is first calculated numerically using the central-difference formula

∂f/∂xi ≈ [ f(xi + h) − f(xi − h) ] / (2h)

and found to be (−14, −22)T, which is identical to the exact derivative at that point (Figure 3.12).
Step 3:
The magnitude of the derivative vector is not small and k = 0 < M = 100. Thus, we do not terminate;
rather we proceed to Step 4.
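
A hedged MATLAB sketch of this numerical derivative (the step size h = 1e-4 is an assumption, not a
value from the exercise):

% Central-difference gradient of the Himmelblau function at (0, 0)
f = @(v) (v(1)^2 + v(2) - 11)^2 + (v(1) + v(2)^2 - 7)^2;
x = [0; 0];  h = 1e-4;  g = zeros(2, 1);
for i = 1:2
    e = zeros(2, 1);  e(i) = h;
    g(i) = (f(x + e) - f(x - e)) / (2*h);   % df/dx_i
end
disp(g')    % approximately (-14, -22)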

EXERCISE 3.4.2
Step 4: At this step, we need to perform a line search from x(0) in the direction −∇f(x(0)) such that
the function value is minimum. Along that direction, any point can be expressed by fixing a value of
the parameter a(0) in the equation:

x = x(0) − a(0) ∇f(x(0)) = (14a(0), 22a(0))T.

f(x, y) = (x² + y − 11)² + (x + y² − 7)²
g(a) = f(14a, 22a) = (196a² + 22a − 11)² + (484a² + 14a − 7)²
g'(a) = 2(392a + 22)(196a² + 22a − 11) + 2(968a + 14)(484a² + 14a − 7)
Setting g'(a) = 0 in the interval (0, 1) yields
a(0) ≈ 0.127.
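
The same line search can be sketched in MATLAB (fminbnd and the bracket (0, 1) are our choices; the
exercise itself uses a golden-section search):

% Line search along the negative gradient from x(0) = (0, 0)
f  = @(x, y) (x.^2 + y - 11).^2 + (x + y.^2 - 7).^2;
g  = @(a) f(14*a, 22*a);        % profile along -grad f(x(0)) = (14, 22)
a0 = fminbnd(g, 0, 1);          % approximately 0.127
x1 = [14*a0, 22*a0]             % approximately (1.788, 2.810)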

EXERCISE 3.4.2
Thus, the minimum point along the search direction is x(1) = (1.788, 2.810)T.
Step 5: Since x(1) and x(0) are quite different, we do not terminate; rather we move back to Step 2.
This completes one iteration of Cauchy's method. The total number of function evaluations required
in this iteration is equal to 30.
Step 2: The derivative vector at this point, computed numerically, is (−30.7, 18.8)T.

EXERCISE 3.4.2
Step 3: The magnitude of the derivative vector is not smaller than ε1. Thus, we continue with Step 4.
Step 4: Another unidirectional search along −∇f(x(1)) = (30.70, −18.80)T from the point
x(1) = (1.788, 2.810)T using the golden section search finds the new point
x(2) = (3.00, 1.99)T with a function value equal to 0.018.

Newton's Method
This method uses second-order derivatives to create search directions.
This allows faster convergence to the minimum point.
Considering the first three terms in the Taylor series expansion of a multivariable function, it can
be shown that the first-order optimality condition will be satisfied if the following search
direction is used:

s(k) = −[ ∇²f(x(k)) ]⁻¹ ∇f(x(k))

Newton's Method

One-dimensional optimization:
At the optimum: f'(x) = 0
Newton's update: x(i+1) = x(i) − f'(x(i)) / f''(x(i))

Multi-dimensional optimization:
At the optimum: ∇f(x) = 0
Newton's update: x(i+1) = x(i) − H(i)⁻¹ ∇f(x(i))

where H(i) is the Hessian matrix (the matrix of second partial derivatives) of f evaluated at x(i).

Newton's Method
x(i+1) = x(i) − H(i)⁻¹ ∇f(x(i))
Converges in a quadratic fashion.
May diverge if the starting point is not close enough to the optimum point.
H⁻¹ is costly to evaluate.
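
A minimal MATLAB sketch of this update on the Himmelblau function, using its analytic gradient and
Hessian (the starting point (2.5, 1.5) and the fixed six iterations are our assumptions, chosen close
to the minimum at (3, 2)):

% Pure Newton iteration x <- x - H\grad on the Himmelblau function
grad = @(x) [ 4*x(1)*(x(1)^2 + x(2) - 11) + 2*(x(1) + x(2)^2 - 7); ...
              2*(x(1)^2 + x(2) - 11) + 4*x(2)*(x(1) + x(2)^2 - 7) ];
hess = @(x) [ 12*x(1)^2 + 4*x(2) - 42,  4*x(1) + 4*x(2); ...
              4*x(1) + 4*x(2),          4*x(1) + 12*x(2)^2 - 26 ];
x = [2.5; 1.5];                          % assumed starting point near (3, 2)
for k = 1:6
    x = x - hess(x) \ grad(x);           % Newton step (no line search)
end
disp(x')    % approaches the minimum (3, 2) for this start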

EXERCISE 3.4.3
Consider the Himmelblau function:
Minimize, using Newton's method:
f(x, y) = (x² + y − 11)² + (x + y² − 7)²

Step 1: In order to ensure proper convergence, a large value of M (= 100) is usually chosen.
We also keep an initial point x(0) = (0, 0)T and termination parameters ε1 = ε2 = 10⁻³, and set
k = 0 as the iteration counter.
Step 2: The derivative at this point is calculated as (−14, −22)T.
Step 3: Since the termination criteria are not met, we go to Step 4.

EXERCISE 3.4.3
Example on Newton's Method
The 3D graph and contour plot of the Himmelblau function are the same as in Exercise 3.4.2 (see the
MATLAB contour snippet above); the minimum point is marked on the contour.

Example on Newton's Method

f(x, y) = (x² + y − 11)² + (x + y² − 7)²

∂f/∂x = 4x(x² + y − 11) + 2(x + y² − 7) = 4x³ + 4xy − 42x + 2y² − 14

∂f/∂y = 2(x² + y − 11) + 4y(x + y² − 7) = 2x² + 4xy + 4y³ − 26y − 22

H = [ 12x² + 4y − 42      4x + 4y
      4x + 4y             4x + 12y² − 26 ]

At (0, 0), ∇f = (−14, −22)T and H = [ −42   0
                                        0  −26 ].

Example on Newton's Method

f(x, y) = (x² + y − 11)² + (x + y² − 7)²

At (0, 0), ∇f = (−14, −22)T and H = [ −42  0 ; 0  −26 ], so

H⁻¹ ∇f = ( (−14)/(−42), (−22)/(−26) )T = ( 0.333, 0.846 )T

x(1) = x(0) − a H⁻¹ ∇f = ( −0.333a, −0.846a )T

Example on Newton's Method


Performing a unidirectional search along this direction, we obtain a = −3.349.
Since this quantity is negative, the function value does not reduce in the given search direction.
Instead, the function value reduces in the opposite direction.
This shows that the search direction in Newton's method may not always be a descent direction.
When this happens, we restart with a new point.
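
This behaviour can be checked with a short MATLAB calculation (a sketch; the numbers are the gradient
and Hessian values at (0, 0) from the previous slide):

% At (0, 0) the Newton direction is an ascent direction for the Himmelblau function
g = [-14; -22];          % gradient at (0, 0)
H = [-42 0; 0 -26];      % Hessian at (0, 0)
d = -H \ g;              % Newton direction, approx (-0.333, -0.846)
slope = g' * d           % approx +23.3 > 0, so f initially increases along d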

Example on Newton's Method


Now we select a new initial point x(0) = (2, 1)T; the function value at this point is f(x(0)) = 52.
Step 2: The derivative at this point is calculated as (−57, −24)T.
Step 3: Since the termination criteria are not met, we go to Step 4.
Step 4: At this point we need the Hessian. Using the gradient and Hessian expressions derived above,
the values taken at x(0) = (2, 1)T are

∇f = (−57, −24)T   and   H = [ 10  10
                               10   5 ].

Example on Newton's Method

f(x, y) = (x² + y − 11)² + (x + y² − 7)²

At (2, 1), ∇f = (−57, −24)T and H = [ 10  10 ; 10  5 ], so

H⁻¹ = (1/(−50)) [   5  −10
                  −10   10 ]

x(1) = x(0) − a H⁻¹ ∇f = ( 2 − 0.9a, 1 + 6.6a )T

Example on Newton's Method


A unidirectional search along this direction reveals that the minimum occurs at a = 0.34.
Since this quantity is positive, we accept it.
The new point is then (1.694, 3.244) and the function value is 44.335, which is smaller than the
previous value of 52.
This means we are moving in the right direction.
Continuing for two more iterations, we reach the point (3.021, 2.003)T with f = 0.0178.
Hence, going further, the answer for this case is the point (3, 2)T.

EXERCISE 3.4.3
This method is effective for initial points close to the
optimum.
This demands some knowledge of the optimum point.
Computation of the Hessian matrix and its inverse is
computationally expensive.
