The notes provided here are a guideline only, and are not a substitute for attending classes, writing your own notes, and doing some reading of your own!
Introduction
In everyday language, optimisation means the process of obtaining the 'best'. Examples of optimisation are prevalent in all walks of day-to-day life. However, optimisation has a technical connotation, unlike the 'best' of everyday language. The word 'optimum' (or the older word 'extremum') refers to either the minimum or the maximum.

Optimisation theory is the branch of mathematics encompassing the quantitative study of optima (plural of optimum) and methods for finding them.

Optimisation practice is the collection of techniques, methods, procedures, and algorithms that can be used to find the optima.
Even though optimisation seems to be present everywhere in some form or the other, a systematic treatment calls for precise mathematical conditions for an optimum.

From our knowledge of elementary calculus, we are familiar with the procedure of finding the optimum of f₁(x) = x². The steps involved in finding the optimum are
(1) Differentiate f₁(x) with respect to ('w.r.t.' hereafter) x and set the derivative equal to zero.
(2) Solve the resulting equation to get x = x*, where x* refers to the value of x at which the function attains the optimum.
Following this procedure, we get df₁(x)/dx = 2x = 0
The solution to the above is x = 0, denoted by x* hereafter. x*, the value of the independent variable at the solution point, is called the optimiser, to distinguish it from the function value at this point (f₁(x*), in this case), which is called the optimum. An optimiser that maximises the function is a maximiser, and one that minimises the function is a minimiser.
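This two-step procedure is easy to mechanise; as a minimal sketch (assuming the `sympy` computer algebra library is available):

```python
import sympy as sp

x = sp.symbols('x', real=True)
f1 = x**2

# Step (1): differentiate f1 w.r.t. x.
df1 = sp.diff(f1, x)                      # 2*x

# Step (2): solve df1 = 0 for the optimiser x*.
optimisers = sp.solve(sp.Eq(df1, 0), x)
print(optimisers)                         # [0] -> x* = 0, with optimum f1(x*) = 0
```

The same two calls, `diff` followed by `solve`, apply to any differentiable single-variable function.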
The optimum of f₂(x) = x³

Repeating the same procedure for f₂(x) = x³, we get x* = 0, as in the above case of f₁(x) = x². But in this case, x* is neither the minimiser nor the maximiser! Why does the procedure that found the minimum of f₁(x) = x² fail for f₂(x) = x³? Even in the case of f₁(x) = x², how do we know whether x* is a minimiser or a maximiser?
Optimality Conditions
Consider the Taylor’s series expansion of a single variable function f (x) about its optimiser,
x*:
f(x* + Δx) = f(x*) + f′(x*)Δx + (1/2!) f″(x*)(Δx)² + (1/3!) f‴(x*)(Δx)³ + (1/4!) f⁽⁴⁾(x*)(Δx)⁴ + …
           = f(x*) + f′(x*)Δx + (1/2!) f″(x*)(Δx)² + H.O.T.
where Δx = x − x* is a small change in the value of x about x*, and H.O.T. stands for higher order terms. The difference between the value of f at its optimiser x* and at its neighbourhood point (x* + Δx) is given by

Δf = f(x* + Δx) − f(x*) = f′(x*)Δx + (1/2!) f″(x*)(Δx)² + H.O.T.
[Figure: f(x) near a minimiser x*, showing f(x*), f(x* + Δx) for Δx of either sign, and the horizontal tangent with f′(x*) = 0.]

If Δf has to be non-negative for a small Δx of either sign, the first term on the R.H.S. must vanish; for a non-zero Δx, this requires f′(x*) = 0 (the first order necessary condition, FONC). Further, since (Δx)² is always positive, Δf can then be positive (for a minimum) if and only if f″(x*) > 0. If at all f″(x*) < 0, then Δf = f(x* + Δx) − f(x*) < 0, and x* is a maximiser. That is, f″(x*) > 0 is sufficient for x* to be a minimiser, and f″(x*) < 0 for a maximiser (the second order sufficiency condition, SOSC).
Going back to the examples of f₁(x) = x² and f₂(x) = x³, we see that f₁(x) = x² satisfies both the necessary and the sufficiency conditions (FONC and SOSC)¹ for x* = 0 to be a minimiser, whereas f₂(x) = x³ satisfies only the necessary condition (FONC), and not the sufficient condition (SOSC), for x* = 0 to be an optimiser.
The optimum of f₃(x) = x⁴

Let us now consider a new case, f₃(x) = x⁴. Both f₂(x) = x³ and f₃(x) = x⁴ satisfy the FONC, but not the SOSC. On this basis, can we conclude that f₃(x) does not have a minimiser, similar to f₂(x)? But a simple geometrical consideration tells us that f₃(x) too has the same minimiser as f₁(x), at x* = 0! In such a case, where f″(x*) = 0 as in f₃(x) = x⁴, we need to look at the higher order derivatives:
f(x* + Δx) = f(x*) + f′(x*)Δx + (1/2!) f″(x*)(Δx)² + (1/3!) f‴(x*)(Δx)³ + (1/4!) f⁽⁴⁾(x*)(Δx)⁴ + H.O.T.
By similar reasoning as in the case of the FONC, the third order derivative f‴(x*) must be zero (a third order necessary condition), and x* is a minimiser if the fourth order derivative f⁽⁴⁾(x*) is positive. For f₃(x) = x⁴, f‴(0) = 0 and f⁽⁴⁾(0) = 24 > 0, so x* = 0 is indeed a minimiser.

f₂(x) = x³ does satisfy the FONC, but does not satisfy the SOSC; further, looking at the third derivative (f‴(0) = 6 ≠ 0), it does not satisfy this third order necessary condition either. This means that it cannot have an interior optimum.

¹ Defined in the next section (below).
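The chain of tests just described (first derivative, then second, then successively higher derivatives whenever a derivative vanishes at x*) can be automated; a sketch, again assuming `sympy` is available:

```python
import sympy as sp

x = sp.symbols('x', real=True)

def classify(f):
    """Return (order, value) of the first non-vanishing derivative at x* = 0."""
    order, val = 1, sp.diff(f, x).subs(x, 0)
    while val == 0:
        order += 1
        val = sp.diff(f, x, order).subs(x, 0)
    # Even order with positive value: minimiser; odd order: no interior optimum.
    return order, val

print(classify(x**2))   # (2, 2)  -> SOSC satisfied, minimiser
print(classify(x**3))   # (3, 6)  -> fails the third order necessary condition
print(classify(x**4))   # (4, 24) -> fourth order test gives a minimiser
```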
An optimiser may be a local maximiser (minimiser), or a global maximiser (minimiser). In the illustrative figure shown here, x₄ is a local minimiser, x₂ a global minimiser, x₃ a local maximiser, and x₆ is a global maximiser. The points x₁ and x₅ are saddle points.

[Figure: a curve f(x) with the points x₁, x₂, x₃, x₄, x₅, x₆ marked on the x-axis, and f(x₂) and f(x₄) marked on the curve.]
Example: Find and classify the stationary points of (or, determine the minimum and maximum values of) f(x) = 12x⁵ − 45x⁴ + 40x³ + 5.
Solution:

By the FONC,

f′(x) = 60x⁴ − 180x³ + 120x² = 60x²(x² − 3x + 2) = 60x²(x − 1)(x − 2) = 0
⇒ x₁,₂,₃,₄ = 0, 0, +1, +2

By the SOSC, f″(x) = 240x³ − 540x² + 240x. At x = 1, f″(1) = −60 < 0, so x = 1 is a local maximiser; at x = 2, f″(2) = 240 > 0, so x = 2 is a local minimiser. At the (double) root x = 0, f″(0) = 0, so we examine the third derivative: f‴(x) = 720x² − 1080x + 240, and f‴(0) = 240 ≠ 0. Since this does not satisfy the third order necessary condition, x = 0 is a saddle point.
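As a check on the working above, the same classification can be obtained symbolically (a sketch assuming `sympy`):

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = 12*x**5 - 45*x**4 + 40*x**3 + 5

fp = sp.diff(f, x)                     # 60*x**4 - 180*x**3 + 120*x**2
stationary = sorted(sp.solve(fp, x))   # [0, 1, 2]

fpp = sp.diff(f, x, 2)
second = [fpp.subs(x, p) for p in stationary]
print(stationary, second)              # [0, 1, 2] [0, -60, 240]
print(sp.diff(f, x, 3).subs(x, 0))     # 240, non-zero => x = 0 is a saddle point
```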
Example: Using the first three terms of Taylor's series, can we calculate cos(0.1)? Find the values of cos x in the vicinity of (i) 0, given cos 0 = 1, and (ii) π/2, given cos(π/2) = 0.
Solution:
f(x) = cos x,  f′(x) = −sin x,  f″(x) = −cos x
(i) About 0:

f(x) ≈ f(0) + f′(0)(x − 0) + ½ f″(0)(x − 0)²
cos x ≈ cos(0) − sin(0)(x − 0) − ½ cos(0)(x − 0)²
      = 1 − x²/2
Using this, we may calculate

cos(0.1) ≈ 1 − (0.1)²/2 = 0.995
(ii) Similarly, we may compute the value of cos (x) in the vicinity of π/2.
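As a quick numerical check of part (i), in Python:

```python
import math

def cos_approx(x):
    """Three-term Taylor approximation of cos about 0: 1 - x**2/2."""
    return 1 - x**2 / 2

print(cos_approx(0.1))     # 0.995
print(math.cos(0.1))       # about 0.9950042, so the approximation is good to ~4 decimals
```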
Exercise 1: As a motivational example and a practical application of the basic concepts of derivative/gradient based optimisation that we have learnt so far, derive the condition for maximum (power) efficiency of a transformer. Note carefully all the assumptions that you make.

Exercise 2: Applying the optimality conditions, derive the condition for optimum power delivered to a load by a practical source.
Multivariable Optimisation
Optimality conditions for multivariable functions
What is the optimum of f(x) = x₁² + x₂²? To answer this, we need to generalise the optimality conditions of a single variable function to multivariable functions. Writing the Taylor's series for the multivariable function f(x) about its optimiser x*,
f(x* + Δx) = f(x*) + ∇fᵀ(x*)(Δx) + ½ (Δx)ᵀ ∇²f(x*)(Δx) + H.O.T.
where Δx = x − x* is a small change in the value of x about x*, and H.O.T. stands for higher order terms, as in the single variable case, and

x = [x₁, x₂, …, xₙ]ᵀ

is the column vector of the n variables. In running text, the above is often denoted as x = [x₁ x₂ … xₙ]ᵀ to save space. The difference between the value of f at its optimiser x* and its neighbourhood point (x* + Δx) is given by

Δf = f(x* + Δx) − f(x*) = ∇fᵀ(x*)(Δx) + ½ (Δx)ᵀ ∇²f(x*)(Δx) + H.O.T.
Consider the case of a minimum first. If Δf on the L.H.S. has to be always positive for any Δx, positive or negative, that is possible only if the first term on the R.H.S. of the above equation is zero. For a non-zero Δx, this can happen if and only if

∇f(x*) = 0
This is the first order necessary condition (FONC), similar to the FONC in the single variable case. Further, Δf on the L.H.S. of the above equation can be positive only if the term (Δx)ᵀ∇²f(x*)(Δx) on the R.H.S. is positive. This quantity can be positive for every non-zero Δx if and only if (iff)² ∇²f(x*) is positive definite³, written ∇²f(x*) > 0. If at all this quantity is always negative, i.e. ∇²f(x*) < 0, then Δf < 0 and x* is a maximiser. Here > 0 means positive definite, and < 0 means negative definite, since the quantity involved (∇²f(x*)) is a matrix. This is known as the second order sufficiency condition (SOSC) for a minimum or maximum respectively, as in the single variable case. The necessary and sufficient conditions are together termed the optimality conditions.
Instead of the above, if only the weaker conditions of ≥ 0 (positive semi-definite) and ≤ 0 (negative semi-definite) hold, then:

∇²f(x*) ≥ 0: x* is a weak minimiser
∇²f(x*) ≤ 0: x* is a weak maximiser

If none of the above holds, then the only remaining possibility is indefiniteness, in which case

∇²f(x*) ≶ 0: x* is a saddle point

The saddle point is a minimum in some directions and a maximum in others, this being possible since x is a vector now, comprising many variables.

² 'if and only if' (iff) means the truth of one implies the truth of the other and vice versa, denoted by the symbol ⇔ or ↔.
³ Definiteness of matrices is defined in the next section, 'Review of Some Basic Calculus Concepts'.
Here the gradient is

∇f(x) = Df(x) = G(x) = J(x) = [∂f(x)/∂x₁, ∂f(x)/∂x₂, …, ∂f(x)/∂xₙ]ᵀ

∇f(x) is known as the gradient of f at x, and is of the order n × 1.
Similarly, the Hessian, of order n × n, is given by

∇²f(x) = D²f(x) = H(x) = ∇(∇ᵀf(x)) =

[ ∂²f(x)/∂x₁²      ∂²f(x)/∂x₂∂x₁    ⋯   ∂²f(x)/∂xₙ∂x₁ ]
[ ∂²f(x)/∂x₁∂x₂   ∂²f(x)/∂x₂²      ⋯   ∂²f(x)/∂xₙ∂x₂ ]
[       ⋮                ⋮           ⋱         ⋮       ]
[ ∂²f(x)/∂x₁∂xₙ   ∂²f(x)/∂x₂∂xₙ    ⋯   ∂²f(x)/∂xₙ²   ]
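With these definitions, the question posed at the start of this section, the optimum of f(x) = x₁² + x₂², can be answered mechanically; a sketch assuming `sympy`:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 + x2**2

grad = [sp.diff(f, v) for v in (x1, x2)]      # [2*x1, 2*x2]
hess = sp.hessian(f, (x1, x2))                # Matrix([[2, 0], [0, 2]])

# FONC: grad = 0
xstar = sp.solve(grad, (x1, x2))
print(xstar)                                  # {x1: 0, x2: 0}

# SOSC: eigenvalue 2 (multiplicity 2) > 0 => positive definite => minimiser
print(hess.eigenvals())                       # {2: 2}
```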
Definiteness of Matrices
A special nonlinear function of the form Q(x) = xᵀAx, containing only quadratic terms, is said to be in quadratic form.
The third term in the 3-term Taylor series expansion of a multivariable function (that is, of x = [x₁, x₂, …, xₙ]ᵀ) is in quadratic form. As mentioned in Section 1.7.1, whether the value of a function in quadratic form (at some given point) is positive or negative is determined by the definiteness of the matrix A in Q(x) = xᵀAx.
Q(x) > 0 iff A is positive definite (p.d.)

A matrix Aₙₓₙ is positive definite iff λᵢ(A) > 0, i = 1, 2, …, n; in other words, if all its eigenvalues are strictly positive.

A matrix Aₙₓₙ is positive semi-definite iff λᵢ(A) ≥ 0, i = 1, 2, …, n; in other words, if its eigenvalues are all greater than or equal to zero.

A matrix Aₙₓₙ is negative definite iff λᵢ(A) < 0, i = 1, 2, …, n; in other words, if all its eigenvalues are strictly negative.

A matrix Aₙₓₙ is negative semi-definite iff λᵢ(A) ≤ 0, i = 1, 2, …, n; in other words, if its eigenvalues are all less than or equal to zero.
A matrix Aₙₓₙ is indefinite iff λᵢ(A) ≶ 0; in other words, if some of its eigenvalues are positive and some are negative.

Definiteness can also be checked without computing eigenvalues, using Sylvester's criterion, in terms of the leading principal minors Mₖ (the determinant of the top-left k × k submatrix of A):

(i) A is positive definite iff Mₖ > 0 for all k = 1, 2, …, n.
(ii) A is negative definite iff Mₖ < 0 for k odd and Mₖ > 0 for k even, k = 1, 2, …, n.
(iii) A is positive semi-definite iff Mₖ > 0 for k = 1, 2, …, r < n, and Mₖ = 0 for k > r.
(iv) A is negative semi-definite iff Mₖ < 0 for k odd and Mₖ > 0 for k even, where k = 1, 2, …, r < n, and Mₖ = 0 for k > r.
(v) A is indefinite iff it does not satisfy any of the preceding criteria.
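An eigenvalue-based definiteness check is straightforward to code; a sketch assuming `numpy`, on illustrative symmetric matrices (not the exercise matrices below):

```python
import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)        # eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return 'positive definite'
    if np.all(lam < -tol):
        return 'negative definite'
    if np.all(lam >= -tol):
        return 'positive semi-definite'
    if np.all(lam <= tol):
        return 'negative semi-definite'
    return 'indefinite'

print(definiteness(np.array([[2.0, 0.0], [0.0, 3.0]])))   # positive definite
print(definiteness(np.array([[0.0, 6.0], [6.0, 0.0]])))   # indefinite (eigenvalues -6, 6)
```

Note that `eigvalsh` assumes a symmetric (Hermitian) matrix; a quadratic form with a nonsymmetric A should first be symmetrised as (A + Aᵀ)/2.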
Exercises:

Using Sylvester's criterion, comment on the definiteness of

(i) A = [2 2; 2 4]

(ii) Q(x) = xᵀ [2 −10; 4 0] x

(iii) A = [−1 1 0; 1 −1 0; 0 0 −1]
Example: Find, up to the first 3 terms, the Taylor's series expansion of f(x) = 3x₁³x₂ about (1, 0).

Solution:

∇f(x) = [9x₁²x₂; 3x₁³],  ∇²f(x) = [18x₁x₂ 9x₁²; 9x₁² 0]

At x₀ = (1, 0):

f(x₀) = 0,  ∇f(x₀) = [0; 3],  ∇²f(x₀) = [0 9; 9 0]

f(x)|₍₁,₀₎ = f(x₀) + ∇fᵀ(x₀)(x − x₀) + ½ (x − x₀)ᵀ ∇²f(x₀)(x − x₀)
= 0 + [0 3][x₁ − 1; x₂] + ½ [x₁ − 1  x₂][0 9; 9 0][x₁ − 1; x₂]
= 3x₂ + ½ [9x₂  9(x₁ − 1)][x₁ − 1; x₂]
= 3x₂ + ½ (9x₁x₂ − 9x₂ + 9x₁x₂ − 9x₂)
= 9x₁x₂ − 6x₂
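The expansion can be cross-checked symbolically; a sketch assuming `sympy`:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = 3*x1**3*x2
x0 = {x1: 1, x2: 0}

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])
hess = sp.hessian(f, (x1, x2))
dx = sp.Matrix([x1 - 1, x2 - 0])

# f(x0) + grad(x0)^T dx + (1/2) dx^T H(x0) dx
taylor3 = (f.subs(x0)
           + (grad.subs(x0).T * dx)[0, 0]
           + sp.Rational(1, 2) * (dx.T * hess.subs(x0) * dx)[0, 0])

print(sp.expand(taylor3))    # 9*x1*x2 - 6*x2
```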
Example: Find and classify the stationary points of f(x) = x₁³ + 3x₁x₂² − 3x₁² − 3x₂² + 4.

Solution:

∇f = [3x₁² + 3x₂² − 6x₁; 6x₁x₂ − 6x₂] = [0; 0]

From the second component, 6x₂(x₁ − 1) = 0 ⇒ x₂ = 0, or x₁ = 1.

x₂ = 0: 3x₁² + 3x₂² − 6x₁ = 3x₁² − 6x₁ = 3x₁(x₁ − 2) = 0 ⇒ x₁ = 0, 2

x₁ = 1: 3x₁² + 3x₂² − 6x₁ = 3 + 3x₂² − 6 = 0 ⇒ x₂ = ±1

The stationary points are (0, 0), (2, 0), (1, 1), (1, −1).
n
6 x1 − 6 6 x2
H (x ) =
∇ 2 (x ) =
6x
2 6 x1 − 6 io
at
ul
of H of point
eC
0 6 M 2 = 36 > 0 definite
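The classification in the table above can be reproduced numerically from the eigenvalues of H(x); a sketch assuming `numpy`:

```python
import numpy as np

def H(x1, x2):
    # Hessian of f(x) = x1**3 + 3*x1*x2**2 - 3*x1**2 - 3*x2**2 + 4
    return np.array([[6*x1 - 6, 6*x2],
                     [6*x2,     6*x1 - 6]], dtype=float)

results = {}
for p in [(0, 0), (2, 0), (1, 1), (1, -1)]:
    lam = np.linalg.eigvalsh(H(*p))      # eigenvalues, ascending
    if np.all(lam > 0):
        results[p] = 'minimiser'
    elif np.all(lam < 0):
        results[p] = 'maximiser'
    else:
        results[p] = 'saddle point'

print(results)
# {(0, 0): 'maximiser', (2, 0): 'minimiser', (1, 1): 'saddle point', (1, -1): 'saddle point'}
```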
Constrained Optimization Basics
The equality constrained problem

min_x f(x)
s.t. h(x) = 0

is converted into the unconstrained problem

min_{x,λ} L = f(x) + λᵀ h(x)

where λ is known as the Lagrange multiplier. Using the optimality conditions (FONC and SOSC) learnt earlier for solving unconstrained problems, we can solve this now unconstrained problem:
∇L = [∂L/∂x; ∂L/∂λ] = [0; 0]

will give us the stationary point(s) [x*; λ*], among which the solution of the original constrained problem lies.
Example: Suppose we want to make a cylindrical water tank that is open at the top, using some material of fixed area A₀. What should the dimensions of the tank be, to maximise its volume? Writing the maximisation of the volume V = πr²h as a minimisation of −V,

min_{r,h} f = −πr²h
s.t. h(r, h) = πr² + 2πrh − A₀ = 0

(here h(·) on the left denotes the constraint function, not to be confused with the height h). The Lagrangian is

L = f + λh = −πr²h + λ(πr² + 2πrh − A₀)
∇L = 0 gives

∂L/∂r = π(−2rh + 2rλ + 2hλ) = 0
∂L/∂h = π(−r² + 2rλ) = 0
∂L/∂λ = π(r² + 2rh) − A₀ = 0
Solving, we get

λ* = ±√(A₀/(12π))
r* = h* = 2λ* = ±√(A₀/(3π))

Taking the positive values, r* = h* = 2λ* = +√(A₀/(3π)).

The optimum volume is given by V_opt = π r*² h* = (A₀/3) √(A₀/(3π)).
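The stationarity system of the tank example can be solved symbolically; a sketch assuming `sympy`, and taking A₀ = 3π (an illustrative choice) so that the numbers come out clean:

```python
import sympy as sp

# With A0 = 3*pi, the formulas above give r* = h* = 1 and lam* = 1/2.
r, h, lam = sp.symbols('r h lam', positive=True)
A0 = 3 * sp.pi

# Lagrangian: minimise -pi*r**2*h subject to pi*r**2 + 2*pi*r*h = A0
L = -sp.pi*r**2*h + lam*(sp.pi*r**2 + 2*sp.pi*r*h - A0)

# FONC: all partial derivatives of L vanish
eqs = [sp.diff(L, v) for v in (r, h, lam)]
sol = sp.solve(eqs, (r, h, lam), dict=True)
print(sol)    # the positive solution: r* = h* = 1, lam* = 1/2
```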
The inequality constrained optimisation problem is

min_x f(x)
s.t. g(x) ≤ 0

The inequality constraint is converted to an equality constraint, by adding a non-negative slack variable s (a real number). By adding s² instead of s, we ensure that the added quantity is non-negative without needing a separate constraint on s:

g(x) ≤ 0 ⇔ g(x) + s² = 0

Example:

min f = x²
s.t. 1 − x ≤ 0

Now writing the Lagrangian,

L = x² + μ(1 − x + s²)
∇L = 0 gives

∂L/∂x = 2x − μ = 0
∂L/∂μ = 1 − x + s² = 0
∂L/∂s = 2sμ = 0
The above equations can lead to multiple solutions, since the last equation, 2sμ = 0, is a nonlinear one: either s = 0 or μ = 0.

Case 1, s = 0: s = 0 ⇒ the inequality constraint is active. Then 1 − x + s² = 0 gives x = 1, and 2x − μ = 0 gives μ = 2.
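Both cases can be enumerated by solving the FONC system symbolically; a sketch assuming `sympy`:

```python
import sympy as sp

x, mu, s = sp.symbols('x mu s', real=True)

# Lagrangian with squared slack: L = x**2 + mu*(1 - x + s**2)
L = x**2 + mu*(1 - x + s**2)

eqs = [sp.diff(L, v) for v in (x, mu, s)]   # [2*x - mu, 1 - x + s**2, 2*mu*s]
sols = sp.solve(eqs, (x, mu, s), dict=True)
print(sols)
# Only the active case survives over the reals: x = 1, mu = 2, s = 0.
# (mu = 0 would force x = 0, but then 1 - x + s**2 = 1 + s**2 > 0: infeasible.)
```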