
I. Classical Optimization Basics

The notes provided here are a guideline only, and are not a substitute for attending classes, writing your own notes and doing some reading of your own!

Introduction

In everyday language, optimisation means the process of obtaining the 'best'. Examples of optimisation are prevalent in all walks of day-to-day life. However, optimisation has a technical connotation, unlike the 'best' of everyday language. The word 'optimum' (or the older word 'extremum') refers to either the minimum or the maximum.

Optimisation theory is a branch of mathematics encompassing the quantitative study of optima (plural of optimum), and the methods for finding them.

Optimisation practice is the collection of techniques, methods, procedures, and algorithms that can be used to find the optima.

Even though optimisation seems to be present everywhere in some form or the other, applying optimisation to an engineering problem requires formal study and knowledge of both the theory and the practice involved.

Single Variable Optimisation

The optimum of f₁(x) = x²

From our knowledge of elementary calculus, we are familiar with the procedure of finding the optimum of f₁(x) = x². The steps involved in finding the optimum are

(1) Differentiate f₁(x) with respect to ('w.r.t.' hereafter) x and set it equal to zero.
(2) Solve the resulting equation to get x = x*, where x* refers to the value of x at which the function attains the optimum.

Following this procedure, we get df₁(x)/dx = 2x = 0.

The solution to the above is x = 0, denoted by x* hereafter. x*, the value of the independent variable at the solution point, is called the optimiser, to distinguish it from the function value at this point (f₁(x*), in this case), which is called the optimum. An optimiser that maximises the function is a maximiser, and one that minimises the function is a minimiser.
The optimum of f₂(x) = x³

Repeating the same procedure for f₂(x) = x³, we get x* = 0, as in the above case for f₁(x) = x². But in this case, x* is neither the minimiser nor the maximiser! Why does the procedure that found the minimum of f₁(x) = x² not work for f₂(x) = x³? Even in the case of f₁(x) = x², how do we know whether x* is a minimiser or a maximiser?
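The two-step procedure is easy to replay symbolically. A minimal sketch using sympy (the tool choice is ours; the notes do not prescribe one) applies steps (1) and (2) to both functions, and also prints f″(x*), previewing the classification question raised above:

    import sympy as sp

    x = sp.symbols('x')
    for f in (x**2, x**3):
        fprime = sp.diff(f, x)            # step (1): differentiate w.r.t. x
        stationary = sp.solve(fprime, x)  # step (2): solve f'(x) = 0
        fsecond = sp.diff(f, x, 2)
        for xs in stationary:
            print(f, 'x* =', xs, ", f''(x*) =", fsecond.subs(x, xs))
    # x**2: x* = 0, f''(0) = 2 > 0
    # x**3: x* = 0, f''(0) = 0 -- this procedure alone cannot classify it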

Optimality Conditions

Consider the Taylor’s series expansion of a single variable function f (x) about its optimiser,
x*:
f(x* + ∆x) = f(x*) + f′(x*)∆x + (1/2!) f″(x*)(∆x)² + (1/3!) f‴(x*)(∆x)³ + (1/4!) f⁽⁴⁾(x*)(∆x)⁴ + …
           = f(x*) + f′(x*)∆x + (1/2!) f″(x*)(∆x)² + H.O.T.

where ∆x = x − x* is a small change in the value of x about x*, and H.O.T. stands for higher order terms. The difference between the value of f at its optimiser x* and at its neighbourhood point (x* + ∆x) is given by

∆f = f(x* + ∆x) − f(x*) = f′(x*)∆x + (1/2!) f″(x*)(∆x)² + H.O.T.
[Figure: f(x) near a minimiser x*, showing f(x*) and the neighbouring values f(x* + ∆x) for both positive and negative ∆x.]

Consider the case of a minimum first. If ∆f on the L.H.S. has to be always positive for any ∆x, positive or negative (see Fig.), that is possible only if the first term on the R.H.S. of the above equation is zero. For a non-zero ∆x, this can happen if and only if

f′(x*) = 0

This condition is known as the first order necessary condition (FONC).

Further, since (∆x)² is always positive, ∆f can be positive (for a minimum) if and only if f″(x*) > 0. If instead f″(x*) < 0, then ∆f = f(x* + ∆x) − f(x*) < 0, and x* is a maximiser. That is,

f″(x*) > 0 for x* to be a minimiser
f″(x*) < 0 for x* to be a maximiser

The above is known as the second order sufficiency condition (SOSC) for a minimum or a maximum respectively. The necessary and sufficient conditions are together termed optimality conditions.
Examples of necessary and sufficient conditions
(1) To take an everyday example from a student's life: if a student wants to pass a subject, attending classes is a necessary condition, but mere physical presence in classes is not sufficient! The sufficient condition is that he should also be mentally present, and put in some effort of his own!
(2) To quote a simple example from engineering, an example of a necessary and sufficient
condition is the Routh-Hurwitz criterion used in control engineering. An example of a
sufficient condition that is not necessary is the Lyapunov criterion for stability analysis
that one comes across in modern control.

Going back to the examples of f₁(x) = x² and f₂(x) = x³, we see that f₁(x) = x² satisfies both the necessary and the sufficiency conditions (FONC and SOSC) for x* = 0 to be a minimiser, whereas f₂(x) = x³ satisfies only the necessary condition (FONC), and not the sufficient condition (SOSC), for x* = 0 to be an optimiser.

The optimum of f₃(x) = x⁴
Let us now consider a new case, f₃(x) = x⁴. Both f₂(x) = x³ and f₃(x) = x⁴ satisfy the FONC, but not the SOSC. On this basis, can we conclude that f₃(x) does not have a minimiser, similar to f₂(x)? But simple geometrical consideration tells us that f₃(x) has the same minimiser as f₁(x), at x* = 0! In such a case, where f″(x*) = 0, as in f₃(x) = x⁴, we need to look at the higher order derivatives:

f(x* + ∆x) = f(x*) + f′(x*)∆x + (1/2!) f″(x*)(∆x)² + (1/3!) f‴(x*)(∆x)³ + (1/4!) f⁽⁴⁾(x*)(∆x)⁴ + H.O.T.
By similar reasoning as in the case of the FONC, the third order derivative f‴(x*) must now be zero as a necessary condition, and whether x* is a minimiser or a maximiser depends on the sign of f⁽⁴⁾(x*):

f⁽⁴⁾(x*) > 0 for x* to be a minimum
f⁽⁴⁾(x*) < 0 for x* to be a maximum

Generalising, we can state the following theorem.

Theorem: If at a stationary point¹ x* of f(x) the first (n − 1) derivatives vanish, and f⁽ⁿ⁾(x*) ≠ 0, then at x = x*, f(x) has
(i) a saddle point if n is odd;
(ii) an optimum point if n is even.
This optimum point will be
a minimum if f⁽ⁿ⁾(x*) > 0, and
a maximum if f⁽ⁿ⁾(x*) < 0.

¹ Defined in the next section (below).

f₂(x) = x³ satisfies the FONC but does not satisfy the SOSC; looking at the third derivative, f‴(x*) = 6 ≠ 0, so it does not satisfy this third order necessary condition either. This means that it cannot have an interior optimum.
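The theorem translates directly into a procedure: keep differentiating at x* until a nonzero derivative appears, then check the parity of its order and its sign. A minimal sympy sketch of this (the helper name and tool choice are ours, not part of the notes):

    import sympy as sp

    def classify_stationary_point(f, x, xstar, max_order=8):
        """Classify x* by the first non-vanishing derivative f^(n)(x*)."""
        for n in range(1, max_order + 1):
            dn = sp.diff(f, x, n).subs(x, xstar)
            if dn != 0:
                if n == 1:
                    return 'not a stationary point'
                if n % 2 == 1:
                    return 'saddle (inflection) point'        # n odd
                return 'minimiser' if dn > 0 else 'maximiser'  # n even
        return 'inconclusive up to order %d' % max_order

    x = sp.symbols('x')
    print(classify_stationary_point(x**2, x, 0))  # minimiser (n = 2, f''(0) = 2 > 0)
    print(classify_stationary_point(x**3, x, 0))  # saddle (inflection) point (n = 3)
    print(classify_stationary_point(x**4, x, 0))  # minimiser (n = 4, f''''(0) = 24 > 0)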

Types of stationary points


Points at which the first derivative is equal to zero are known as stationary points. These points can be classified into three types: minimiser, maximiser and saddle (or inflection) point. Further, a maximiser (minimiser) may be a local maximiser (minimiser), or a global maximiser (minimiser). In the illustrative Fig. shown here, x₄ is a local minimiser, x₂ a global minimiser, x₃ a local maximiser, and x₆ is a global maximiser. The points x₁ and x₅ are saddle points.

[Figure: a function with stationary points x₁, …, x₆ and values f(x₁), …, f(x₆), illustrating local and global minimisers, local and global maximisers, and saddle points.]
Example: Find and classify the stationary points of (or, determine the minimum and maximum values of) f(x) = 12x⁵ − 45x⁴ + 40x³ + 5.

Solution:
By the FONC,
f′(x) = 60x⁴ − 180x³ + 120x² = 60x²(x² − 3x + 2) = 60x²(x − 1)(x − 2) = 0
⇒ x = 0 (a repeated root), 1, 2

f″(x) = 240x³ − 540x² + 240x = 60(4x³ − 9x² + 4x)

At x = 1: f″(1) = −60 ⇒ x = 1 is a maximiser, and f_max = f(1) = 12.

At x = 2: f″(2) = 240 ⇒ x = 2 is a minimiser, and f_min = f(2) = −11.

At x = 0: f″(0) = 0 ⇒ we must investigate the higher derivatives:

f‴(x) = 720x² − 1080x + 240, so f‴(0) = 240 ≠ 0

Since this does not satisfy the third order necessary condition, x = 0 is a saddle point.
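The hand classification can be cross-checked with the classify_stationary_point helper sketched earlier (sympy assumed):

    x = sp.symbols('x')
    f = 12*x**5 - 45*x**4 + 40*x**3 + 5
    for xs in sp.solve(sp.diff(f, x), x):   # roots of f'(x): 0, 1, 2
        print(xs, classify_stationary_point(f, x, xs), 'f =', f.subs(x, xs))
    # 0 -> saddle (inflection) point
    # 1 -> maximiser, f = 12
    # 2 -> minimiser, f = -11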

Example: Using the first three terms of Taylor's series, can we calculate cos(0.1)? Find the values of cos x in the vicinity of (i) 0, given cos 0 = 1, and (ii) π/2, given cos(π/2) = 0.

Solution:
f(x) = cos x, f′(x) = −sin x, f″(x) = −cos x
(i) About 0:
f(x) = f(0) + f′(0)(x − 0) + (1/2) f″(0)(x − 0)²
cos x = cos(0) − sin(0)(x − 0) − (1/2) cos(0)(x − 0)²
      = 1 − x²/2

Using this, we may calculate
cos(0.1) = 1 − (0.1)²/2 = 0.995

(ii) Similarly, we may compute the value of cos x in the vicinity of π/2.
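A two-line numerical check of the approximation (plain Python):

    import math
    approx = 1 - 0.1**2 / 2          # quadratic Taylor polynomial about 0
    print(approx, math.cos(0.1))     # 0.995 vs 0.9950041652780258

The quadratic polynomial already agrees with cos(0.1) to within about 4 × 10⁻⁶, since the first neglected term is of order x⁴/4!.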

Exercise 1: As a motivational example and a practical application of the basic concepts of derivative/gradient based optimisation that we have learnt so far, derive the condition for maximum (power) efficiency of a transformer. Note carefully all the assumptions that you make.

Exercise 2: Applying the optimality conditions, derive the condition for optimum power delivered to a load by a practical source.

Multivariable Optimisation

Optimality conditions for multivariable functions

What is the optimum of f(x) = x₁² + x₂²? To answer this, we need to generalise the optimality conditions of a single variable function to multivariable functions. Writing the Taylor's series for the multivariable function f(x) about its optimiser x*,

f(x* + ∆x) = f(x*) + ∇fᵀ(x*)(∆x) + (1/2)(∆x)ᵀ ∇²f(x*) (∆x) + H.O.T.

where ∆x = x − x* is a small change in the value of x about x*, H.O.T. stands for higher order terms, as in the single variable case, and x is the column vector of the n variables x₁, x₂, …, xₙ. In running text, this column vector is often denoted as x = [x₁ x₂ … xₙ]ᵀ to save space. The difference between the value of f at its optimiser x* and at its neighbourhood point (x* + ∆x) is given by

∆f = f(x* + ∆x) − f(x*) = ∇fᵀ(x*)(∆x) + (1/2)(∆x)ᵀ ∇²f(x*) (∆x) + H.O.T.
Consider the case of a minimum first. If ∆f on the L.H.S. has to be always positive for any ∆x, positive or negative, that is possible only if the first term on the R.H.S. of the above equation is zero. For a non-zero ∆x, this can happen if and only if

∇f(x*) = 0

This is the first order necessary condition (FONC), similar to the FONC in the single variable case. Further, ∆f on the L.H.S. of the above equation can be positive only if the quantity (∆x)ᵀ ∇²f(x*) (∆x) in the second term on the R.H.S. is positive. This quantity can be positive if and only if (iff)² ∇²f(x*) is positive definite³. If instead ∇²f(x*) is negative definite, then x* is a maximiser. That is,


∇²f(x*) > 0 ⇒ x* is a strong minimiser
∇²f(x*) < 0 ⇒ x* is a strong maximiser

where > 0 means positive definite and < 0 means negative definite, since the quantity involved (∇²f(x*)) is a matrix.

This is known as the second order sufficiency condition (SOSC) for a minimum or maximum respectively, as in the single variable case. The necessary and sufficient conditions are together termed optimality conditions.

Instead of the above, if only the weaker conditions of ≥ (positive semi-definite) or ≤ (negative semi-definite) hold, then

∇²f(x*) ≥ 0 ⇒ x* is a weak minimiser
∇²f(x*) ≤ 0 ⇒ x* is a weak maximiser

If none of the above hold, then the only remaining possibility is indefiniteness, in which case

∇²f(x*) ≶ 0 ⇒ x* is a saddle point

The saddle point is a minimum in some direction and a maximum in some other, this being possible since x is now a vector comprising many variables.

² 'if and only if' (iff) means the truth of one implies the truth of the other and vice versa, denoted by the symbol ⇔ or ↔.
³ Definiteness of matrices is defined in the next section, 'Review of Some Basic Calculus Concepts'.
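For f(x) = x₁² + x₂², the function that opened this section, these conditions are easy to verify symbolically. A minimal sketch (sympy assumed):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = x1**2 + x2**2
    grad = [sp.diff(f, v) for v in (x1, x2)]   # gradient components
    H = sp.hessian(f, (x1, x2))                # Hessian matrix

    print(sp.solve(grad, (x1, x2)))            # FONC: {x1: 0, x2: 0}
    print(H.eigenvals())                       # {2: 2} -> both eigenvalues are 2 > 0
    # The Hessian is positive definite, so x* = (0, 0) is a strong minimiser.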

Review of Some Basic Calculus Concepts

The derivative of a differentiable function f(x), x = [x₁, x₂, …, xₙ]ᵀ, is given by

∇f(x) = Df(x) = G(x) = J(x) = [∂f(x)/∂x₁  ∂f(x)/∂x₂  …  ∂f(x)/∂xₙ]ᵀ

∇f(x) is known as the gradient of f at x, and is of the order n × 1.

If ∇f(x) is differentiable, f is said to be twice differentiable, and the derivative of the gradient is given by

∇²f(x) = D²f(x) = H(x) = ∇(∇ᵀf(x)) =

  [ ∂²f(x)/∂x₁²     ∂²f(x)/∂x₂∂x₁   …   ∂²f(x)/∂xₙ∂x₁ ]
  [ ∂²f(x)/∂x₁∂x₂   ∂²f(x)/∂x₂²     …   ∂²f(x)/∂xₙ∂x₂ ]
  [      ⋮               ⋮          ⋱        ⋮        ]
  [ ∂²f(x)/∂x₁∂xₙ   ∂²f(x)/∂x₂∂xₙ   …   ∂²f(x)/∂xₙ²   ]

∇²f(x) is known as the Hessian matrix, or simply the Hessian, of f at x.

Definiteness of Matrices
A special nonlinear function of the form Q(x) = xᵀAx, containing only quadratic terms, is said to be in quadratic form.

The third term in the 3-term Taylor series expansion of a multivariable function (that is, of x = [x₁, x₂, …, xₙ]ᵀ) is in quadratic form. As mentioned in Section 1.7.1, whether the value of a function in quadratic form (at some given point) is positive or negative is determined by the definiteness of the matrix A in Q(x) = xᵀAx.

Q(x) > 0 iff A is positive definite (p.d.)

Q(x) ≥ 0 iff A is positive semi-definite (p.s.d.)

Q(x) < 0 iff A is negative definite (n.d.)

Q(x) ≤ 0 iff A is negative semi-definite (n.s.d.)

Q(x) ≶ 0 iff A is indefinite

Definiteness of a matrix by the eigenvalue method

An n × n matrix A is positive definite iff λᵢ(A) > 0, i = 1, 2, …, n; in other words, iff all its eigenvalues are strictly positive.

An n × n matrix A is positive semi-definite iff λᵢ(A) ≥ 0, i = 1, 2, …, n; in other words, iff all its eigenvalues are greater than or equal to zero.

An n × n matrix A is negative definite iff λᵢ(A) < 0, i = 1, 2, …, n; in other words, iff all its eigenvalues are strictly negative.

An n × n matrix A is negative semi-definite iff λᵢ(A) ≤ 0, i = 1, 2, …, n; in other words, iff all its eigenvalues are less than or equal to zero.

An n × n matrix A is indefinite iff λᵢ(A) ≶ 0; in other words, iff some of its eigenvalues are positive and others negative.
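These five tests are mechanical once the eigenvalues are known. A small numpy sketch for symmetric matrices (the function name and tolerance are our own choices):

    import numpy as np

    def definiteness(A, tol=1e-12):
        """Classify a symmetric matrix by the signs of its eigenvalues."""
        lam = np.linalg.eigvalsh(A)          # real eigenvalues, ascending order
        if np.all(lam > tol):
            return 'positive definite'
        if np.all(lam >= -tol):
            return 'positive semi-definite'
        if np.all(lam < -tol):
            return 'negative definite'
        if np.all(lam <= tol):
            return 'negative semi-definite'
        return 'indefinite'

    print(definiteness(np.array([[2., 2.], [2., 4.]])))  # positive definite
    print(definiteness(np.array([[0., 6.], [6., 0.]])))  # indefinite (eigenvalues ±6)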



Definiteness by Sylvester's Criterion/Theorem

Statement: Let M_k, k = 1, 2, …, n, be the kth leading principal minor of an n × n symmetric matrix A; each M_k is defined as the determinant of the k × k sub-matrix obtained by deleting the last (n − k) rows and columns of A. Assume that no two consecutive principal minors are zero. Then,

(i) A is positive definite iff M_k > 0 for all k = 1, 2, …, n.

(ii) A is positive semi-definite iff M_k > 0 for k = 1, 2, …, r and M_k = 0 for k > r, where r < n is the rank of A.

(iii) A is negative definite iff M_k < 0 for k odd and M_k > 0 for k even, k = 1, 2, …, n.

(iv) A is negative semi-definite iff M_k < 0 for k odd and M_k > 0 for k even, for k = 1, 2, …, r, and M_k = 0 for k > r, where r < n is the rank of A.

(v) A is indefinite iff it does not satisfy any of the preceding criteria.
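The leading principal minors are just determinants of the top-left k × k blocks, so the definite cases of the criterion take only a few lines of numpy (a sketch; the semi-definite cases need the rank bookkeeping described above, and determinants of large matrices are numerically delicate):

    import numpy as np

    def leading_principal_minors(A):
        return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

    def sylvester_definite(A):
        M = leading_principal_minors(A)
        if all(m > 0 for m in M):
            return 'positive definite'
        if all((m < 0) if k % 2 == 1 else (m > 0)
               for k, m in enumerate(M, start=1)):
            return 'negative definite'
        return 'not definite by this test'

    A = np.array([[2., 2.], [2., 4.]])
    print(leading_principal_minors(A))   # [2.0, 4.0]
    print(sylvester_definite(A))         # positive definite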

Exercises:

Using Sylvester's criterion, comment on the definiteness of

(i) A = [ 2  2 ]
        [ 2  4 ]

(ii) Q(x) = xᵀ [ 2  −10 ] x
               [ 0    4 ]

(iii) A = [ −1   1   0 ]
          [  1  −1   0 ]
          [  0   0  −1 ]

(iv) Q(x) = −x₁² − 3x₂² − 11x₃² + 2x₁x₂ − 4x₂x₃ − 2x₁x₃

(v) Q(x) = 2x₁x₂ − 2x₂²


Example: Find, up to the first 3 terms, the Taylor's series expansion of f(x) = 3x₁³x₂ about (1, 0).

Solution:

∇f(x) = [ 9x₁²x₂ ]    ∇²f(x) = [ 18x₁x₂  9x₁² ]
        [ 3x₁³   ]             [ 9x₁²    0    ]

At x₀ = (1, 0):

f(x₀) = 0,   ∇f(x₀) = [ 0 ]    ∇²f(x₀) = [ 0  9 ]
                      [ 3 ]              [ 9  0 ]

f(x)|₍₁,₀₎ = f(x₀) + ∇fᵀ(x₀)(x − x₀) + (1/2)(x − x₀)ᵀ ∇²f(x₀) (x − x₀)

= 0 + [0  3] [ x₁ − 1 ]  +  (1/2) [x₁ − 1  x₂] [ 0  9 ] [ x₁ − 1 ]
             [ x₂     ]                        [ 9  0 ] [ x₂     ]

= 3x₂ + (1/2) [9x₂  9(x₁ − 1)] [ x₁ − 1 ]
                               [ x₂     ]

= 3x₂ + (1/2)(9x₁x₂ − 9x₂ + 9x₁x₂ − 9x₂)

= 9x₁x₂ − 6x₂
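The same quadratic Taylor polynomial can be generated mechanically (sympy assumed):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = 3*x1**3*x2
    dx = sp.Matrix([x1 - 1, x2])                  # x - x0, about x0 = (1, 0)

    grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)]).subs({x1: 1, x2: 0})
    H = sp.hessian(f, (x1, x2)).subs({x1: 1, x2: 0})

    taylor2 = f.subs({x1: 1, x2: 0}) + (grad.T*dx)[0] \
              + sp.Rational(1, 2)*(dx.T*H*dx)[0]
    print(sp.expand(taylor2))                     # 9*x1*x2 - 6*x2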
Example: Find and classify the stationary points of f(x) = x₁³ + 3x₁x₂² − 3x₁² − 3x₂² + 4.

Solution:

∇f = [ 3x₁² + 3x₂² − 6x₁ ] = [ 0 ]
     [ 6x₁x₂ − 6x₂       ]   [ 0 ]

6x₂(x₁ − 1) = 0 ⇒ x₂ = 0, or x₁ = 1

x₂ = 0:
3x₁² + 3x₂² − 6x₁ = 3x₁² − 6x₁ = 3x₁(x₁ − 2) = 0 ⇒ x₁ = 0, 2

x₁ = 1:
3x₁² + 3x₂² − 6x₁ = 3 + 3x₂² − 6 = 0 ⇒ x₂ = ±1

The stationary points are (0, 0), (2, 0), (1, 1), (1, −1).

H(x) = ∇²f(x) = [ 6x₁ − 6   6x₂     ]
                [ 6x₂       6x₁ − 6 ]

Point     Hessian         Minors of H                Definiteness of H   Type/nature of point   Value of f
(0, 0)    [−6 0; 0 −6]    M₁ = −6 < 0, M₂ = 36 > 0   negative definite   maximum                4
(2, 0)    [6 0; 0 6]      M₁ = 6 > 0, M₂ = 36 > 0    positive definite   minimum                0
(1, 1)    [0 6; 6 0]      M₁ = 0, M₂ = −36 < 0       indefinite          saddle                 2
(1, −1)   [0 −6; −6 0]    M₁ = 0, M₂ = −36 < 0       indefinite          saddle                 2
 

Example 2: Find and classify the stationary points of f(x) = x₁² − x₂².

Weierstrass Theorem: See uploaded book


Direct and indirect methods:
Direct methods are those that use the function value itself in the search for the minimiser. Methods that depend on some other property of the function, such as the gradient at the search point, rather than the function value, are known as indirect methods.

Constrained Optimization Basics

Equality constrained optimization problem:

min_x f(x)
s.t. h(x) = 0

Forming the Lagrangian converts this into the unconstrained problem

min_{x,λ} L = f(x) + λᵀh(x)

where λ is known as the Lagrange multiplier. Using the optimality conditions (FONC and SOSC) learnt earlier for solving unconstrained problems, we can solve this now unconstrained problem:

∇L = [ ∂L/∂x ] = [ 0 ]
     [ ∂L/∂λ ]   [ 0 ]

will give us the stationary point(s) [x*; λ*], where the solution we are looking for, x*, is a subpart of the larger solution.

Example: Suppose we want to make a cylindrical water tank that is open at the top, using some material of fixed area A₀. What should the dimensions of the tank be, to maximise the volume for the material given?

Maximising the volume V = πr²h subject to the fixed material area is posed as

min_{r,h} f = −πr²h
s.t. h(r, h) = πr² + 2πrh − A₀ = 0

L = f + λh = −πr²h + λ(πr² + 2πrh − A₀)

∇L = [ ∂L/∂r ]   [ π(−2rh + 2rλ + 2hλ) ]   [ 0 ]
     [ ∂L/∂h ] = [ π(−r² + 2rλ)        ] = [ 0 ]
     [ ∂L/∂λ ]   [ π(r² + 2rh) − A₀    ]   [ 0 ]

Solving, we get

λ* = ±√(A₀/(12π))

r = h = 2λ = ±√(A₀/(3π))

Taking the positive values, r* = h* = 2λ* = +√(A₀/(3π)).

The optimum volume is given by V_opt = π r*² h* = (A₀/3) √(A₀/(3π)).
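The same stationary point can be obtained symbolically (sympy assumed; A₀ is kept as a positive symbol, and the positivity assumptions on r, h and λ select the physical root):

    import sympy as sp

    r, h, lam, A0 = sp.symbols('r h lambda A_0', positive=True)

    L = -sp.pi*r**2*h + lam*(sp.pi*r**2 + 2*sp.pi*r*h - A0)
    eqs = [sp.diff(L, v) for v in (r, h, lam)]     # FONC: grad L = 0
    sol = sp.solve(eqs, (r, h, lam), dict=True)[0]

    print(sol)                                     # r = h = sqrt(A0/(3*pi)), lam = r/2
    print(sp.simplify((sp.pi*r**2*h).subs(sol)))   # optimum volume (A0/3)*sqrt(A0/(3*pi))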

The inequality constrained optimization problem:

min_x f(x)
s.t. g(x) ≤ 0

The inequality constraint is converted to an equality constraint by adding a non-negative slack variable. By adding s² instead of s (with s a real number), we can ensure that the slack is always non-negative, whatever the sign of s:

g(x) ≤ 0 ⇔ g(x) + s² = 0

We know how to solve the equality constrained problem.


Example:
min f = x²
s.t. 1 − x ≤ 0

Solution: Converting the inequality constraint into an equality one, it becomes 1 − x + s² = 0.

Now writing the Lagrangian,

L = x² + µ(1 − x + s²)

Using the FONC,

∇L = [ ∂L/∂x ]   [ 2x − µ     ]   [ 0 ]
     [ ∂L/∂µ ] = [ 1 − x + s² ] = [ 0 ]
     [ ∂L/∂s ]   [ 2sµ        ]   [ 0 ]

The above equations can lead to multiple solutions, since the last equation, 2sµ = 0, is a nonlinear one: it is satisfied if s = 0, or µ = 0, or both.

Case 1, s = 0: x = 1, µ = 2. s = 0 ⇒ the inequality constraint is active.

Case 2, µ = 0: x = 0, s² = −1. s² = −1 ⇒ this solution is invalid/infeasible, since s is a real number and the slack s² cannot be negative.

Case 3, s = 0 and µ = 0: x = 0 and 1 = 0! Hence this solution too is unacceptable/infeasible!

In summary, the only feasible solution is (x*, µ*, s*) = (1, 2, 0).
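The three-case enumeration is exactly what a symbolic solver does over the reals (sympy assumed):

    import sympy as sp

    x, mu, s = sp.symbols('x mu s', real=True)
    L = x**2 + mu*(1 - x + s**2)
    eqs = [sp.diff(L, v) for v in (x, mu, s)]   # 2x - mu, 1 - x + s**2, 2*s*mu

    print(sp.solve(eqs, (x, mu, s), dict=True))
    # [{x: 1, mu: 2, s: 0}] -- the mu = 0 branch would need s**2 = -1,
    # which has no real s, so only Case 1 survives.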

