
Optimization: Existence

Reading: Sundaram
Ch. 3; Ch. 4.1-4.5 (4.6 optional);
Ch. 5.1-5.5 (5.2.3, 5.6 optional); Ch. 6.1-6.4 (6.1.2, 6.1.3 optional)
Weierstrass’ Theorem
Let S ⊆ R^N be a compact set and let f : R^N → R be a continuous function. Then f attains a maximum and a minimum on S. That is, there exist m1, m2 ∈ S such that f(m2) ≤ f(x) ≤ f(m1) for all x ∈ S. These are also called the global maximum and the global minimum.
Proof: First we show that f(S) is a compact set. Then we show that every compact set in R has a maximum and a minimum.
Step 1: Take any sequence {y_n} in f(S). y_n ∈ f(S) implies there is x_n ∈ S such that y_n = f(x_n). Consider the sequence {x_n} in S. Since S is compact, {x_n} has a subsequence {z_n} which converges to a point z ∈ S. Since f is continuous, {f(z_n)} converges to f(z). Of course {f(z_n)} is a subsequence of {y_n}, and f(z) ∈ f(S) (because z ∈ S). Thus every sequence in f(S) has a subsequence converging to a point of f(S), so f(S) is compact.
Step 2: f(S) compact in R implies f(S) is closed and bounded. Since f(S) is bounded, it has a supremum (say M1) and an infimum (say M2). Since M1 (similarly M2) satisfies the ε-test, it is a limit point of f(S) or belongs to f(S). Since f(S) is closed, M1 must be in f(S). Thus there exists m1 ∈ S such that f(m1) = M1 ≥ f(x) for all x ∈ S.

Examples
1. f : R → R, f(x) = x. There is no maximum. Here f is continuous but the domain is not compact, because it is not bounded.
2. f : (0, 1) → R, f(x) = x. There is no maximum. f is continuous but once again the domain is not compact, because it is not closed.
3. f : [0, 1] → R, f(x) = x for all x ≠ 1 and f(1) = 0. There is no maximum. Here the domain is compact but f is not continuous.
Weierstrass' Theorem gives a 'sufficient condition' for the existence of a maximum and a minimum. These properties are not necessary for an optimum to exist.
4. Take f : [0, 1] → R, f(x) = x for all x ≠ 1/2 and f(1/2) = 0. This has a maximum and a minimum.
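Examples 2 and 3 can be illustrated numerically. The following Python sketch is my own (the helper `supremum_on_grid` is hypothetical, not from the notes): it samples each domain on a grid and shows that the supremum 1 is attained only in the compact-and-continuous case.

```python
# Illustration of the Weierstrass examples (my own sketch, not from the notes).
# On the compact domain [0, 1] with continuous f(x) = x the maximum is attained;
# on the open domain (0, 1), or with a discontinuous f, the supremum is not.

def supremum_on_grid(f, points):
    """Return the largest value of f over a finite sample of the domain."""
    return max(f(x) for x in points)

f = lambda x: x

# Compact domain [0, 1]: the maximum f(1) = 1 is attained.
closed_grid = [i / 1000 for i in range(1001)]      # includes both endpoints
assert supremum_on_grid(f, closed_grid) == 1.0

# Example 2: open domain (0, 1): sample points never reach 1,
# mirroring the fact that sup f = 1 is not attained.
open_grid = [i / 1000 for i in range(1, 1000)]     # excludes 0 and 1
assert supremum_on_grid(f, open_grid) < 1.0

# Example 3: compact domain but discontinuous f: f(x) = x for x != 1, f(1) = 0.
g = lambda x: 0.0 if x == 1.0 else x
assert supremum_on_grid(g, closed_grid) < 1.0      # sup = 1 is not attained
```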
Finding Interior Optima
Interior Optima
Let f : S → R, where S ⊆ R^N. We want to maximize (or minimize) f over S. We shall focus on maxima; note that a minimum of f is the same as a maximum of −f.
Instead of finding the global maximum directly, we start by finding interior local maxima and use them to find the global maximum.
x* is called an interior local maximum if
(i) x* is an interior point of S;
(ii) there exists ε > 0 such that f(x*) ≥ f(x) for all x ∈ B_ε(x*) ⊆ S.
Note: At this moment we are excluding points that are not interior points of S.
Note: It is a strict local maximum if the above inequality is strict for x ≠ x*.
First order necessary condition:
x* is an interior local maximum (or minimum) and f is differentiable at x* ⇒ Df(x*) = 0.
Proof: Take N = 1. Take two sequences {y_n} and {z_n} such that lim y_n = lim z_n = x*, with y_n < x* and z_n > x* for all n. Since x* is a local maximum, for all large n,

(f(y_n) − f(x*)) / (y_n − x*) ≥ 0 ≥ (f(z_n) − f(x*)) / (z_n − x*)

Taking limits, we get f′(x*) ≥ 0 ≥ f′(x*). Thus f′(x*) = 0.
When N > 1, apply the same argument coordinate by coordinate: ∂f/∂x_j (x*) = 0 for all j = 1, 2, . . . , N.

Observation: This is not a sufficient condition. Example: S = [−1, 1], f(x) = x³. There is a critical point x = 0, but it is not an optimum.
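The observation can be checked directly; the following sketch (mine, with a hypothetical `numerical_derivative` helper) confirms that x = 0 is a critical point of f(x) = x³ yet neither a maximum nor a minimum.

```python
# Sketch: the first order condition Df(x*) = 0 is necessary but not sufficient.
# For f(x) = x^3, x = 0 is a critical point yet not an optimum, since f takes
# both signs in every neighbourhood of 0.

def numerical_derivative(f, x, h=1e-6):
    """Central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 3

assert abs(numerical_derivative(f, 0.0)) < 1e-9   # f'(0) = 0: critical point
assert f(0.1) > f(0.0) > f(-0.1)                  # but 0 is not an optimum
```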
Digression: Useful Results
Rolle's Theorem: Let [a, b] ⊂ R and let f : [a, b] → R be continuous and have a derivative at each point of (a, b). If f(a) = f(b) then there is c ∈ (a, b) such that f′(c) = 0.
Proof: [a, b] is a compact set and f is a continuous function, so f has a global maximum and minimum on [a, b]. If the maximum or the minimum is attained at an interior point, we have obtained a c ∈ (a, b) such that f′(c) = 0. Otherwise both the maximum and the minimum are attained at a and b; since f(a) = f(b), this implies that f is constant on [a, b]. Hence for all c ∈ (a, b), f′(c) = 0.
Mean Value Theorem: Let [a, b] ⊂ R and let f : [a, b] → R be continuous and have a derivative at each point of (a, b). Then there is c ∈ (a, b) such that

(f(b) − f(a)) / (b − a) = f′(c)

Proof: Define h(x) = f(x)(b − a) − x[f(b) − f(a)]. Note that h is continuous on [a, b] and has a derivative at each point of (a, b). Moreover h(a) = h(b), since both equal f(a)b − f(b)a. By Rolle's Theorem, there is c ∈ (a, b) such that h′(c) = f′(c)(b − a) − [f(b) − f(a)] = 0.
Result: [a, b] ⊂ R. f : [a, b] → R is a continuous function and has derivative at each
point of (a, b).
(i) If f 0 (x) > 0 for all x ∈ (a, b) then f is strictly increasing on [a, b].
(ii) If f 0 (x) < 0 for all x ∈ (a, b) then f is strictly decreasing on [a, b].
(iii) If f 0 (x) = 0 for all x ∈ (a, b) then f is constant on [a, b].
Proof: We just prove part (i); the rest are similar. Take any two points x1 < x2 in the interval [a, b]. By the Mean Value Theorem, there exists c ∈ (x1, x2) such that f(x2) − f(x1) = (x2 − x1) f′(c). The right hand side is strictly positive because x2 > x1 and f′(c) > 0. Hence f(x2) > f(x1).
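A quick numerical companion (my own example): for f(x) = x² on [1, 3] the Mean Value Theorem point is the midpoint c = 2, and part (i) of the Result shows up as strict monotonicity.

```python
# Numerical check of the Mean Value Theorem for f(x) = x^2 on [1, 3]
# (an illustration of the results above, not part of the notes).

f = lambda x: x ** 2
df = lambda x: 2 * x

a, b = 1.0, 3.0
secant_slope = (f(b) - f(a)) / (b - a)   # (9 - 1) / 2 = 4
c = (a + b) / 2                          # for a quadratic, c is the midpoint
assert abs(df(c) - secant_slope) < 1e-12

# Part (i) of the Result: f'(x) = 2x > 0 on (1, 3), so f is strictly increasing.
xs = [1.0 + 0.1 * k for k in range(21)]
assert all(f(x1) < f(x2) for x1, x2 in zip(xs, xs[1:]))
```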
Second Order Conditions
Second order conditions: f ∈ C²
1. (necessary) x* is an interior local maximum (resp. minimum) ⇒ z^T [D²f(x*)] z ≤ 0 (resp. z^T [D²f(x*)] z ≥ 0) for all z ∈ R^N with z ≠ 0.
2. (sufficient) x* is a critical point and z^T [D²f(x*)] z < 0 (resp. z^T [D²f(x*)] z > 0) for all z ∈ R^N with z ≠ 0 ⇒ x* is a strict local maximum (resp. strict local minimum).


Proof (N = 1): First we show part 2. Take an interior point x* such that f′(x*) = 0 and f″(x*) < 0 (the Hessian is a scalar here). Since f ∈ C², f″ is a continuous function. Therefore we can find a neighbourhood B_r(x*) around x* such that f″(x) < 0 for all x ∈ B_r(x*). Hence f′ is strictly decreasing in B_r(x*). Since f′(x*) = 0, we must have
(i) f′(z) > 0 for z < x*, z ∈ B_r(x*); (ii) f′(z) < 0 for z > x*, z ∈ B_r(x*).
(i) ⇒ f is strictly increasing for z < x* in B_r(x*) ⇒ f(x*) > f(z) for all z < x*, z ∈ B_r(x*).
(ii) ⇒ f is strictly decreasing for z > x* in B_r(x*) ⇒ f(x*) > f(z) for all z > x*, z ∈ B_r(x*).
Combining the two cases, we obtain f(x*) > f(z) for all z ∈ B_r(x*) with z ≠ x*. Hence x* is a strict local maximum. Part 1 follows immediately from part 2.
(N > 1): We shall only prove part 1. Pick any z ∈ R^N, z ≠ 0. Define g(t) = f(x* + tz). Note that g is the composition of two functions: f (which is from R^N to R) and φ, defined as φ(t) = x* + tz. Note further that g(0) = f(x*) ≥ g(t) for all t such that x* + tz belongs to the neighbourhood on which x* is an interior local maximum. Since g is a function from R to R with a local maximum at t = 0, we conclude that g″(0) ≤ 0. Applying the chain rule twice, we get g″(0) = z^T [D²f(x*)] z. Thus z^T [D²f(x*)] z ≤ 0.
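In practice the sign condition z^T [D²f(x*)] z < 0 is checked via the eigenvalues of the Hessian. A small sketch (my own; the finite-difference `hessian` helper is hypothetical) classifies a critical point this way:

```python
# Sketch of the second order test: classify a critical point by the sign of
# z^T D^2 f z, checked via Hessian eigenvalues (my own illustration).
import numpy as np

def hessian(f, x, h=1e-4):
    """Finite-difference Hessian of f: R^N -> R at point x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

# f(x1, x2) = -x1^2 - 2*x2^2 has a strict local (indeed global) maximum at (0, 0).
f = lambda v: -v[0] ** 2 - 2 * v[1] ** 2
eigs = np.linalg.eigvalsh(hessian(f, np.array([0.0, 0.0])))
assert np.all(eigs < 0)   # negative definite Hessian => strict local maximum
```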
Pathway
Caution

Example (Caution 1): Be aware that these are only necessary conditions.
S = R², f(x1, x2) = x1³ − 3 x1 x2². The only critical point is x* = (0, 0), with f(0, 0) = 0 and D²f(x*) equal to the 2×2 zero matrix, which is both PSD and NSD. However x* is neither a local min nor a local max: for ε > 0, f(ε, 0) = ε³ > 0 and f(−ε, 0) = −ε³ < 0.
Example (Caution 2): A local maximum need not be a global maximum. The global maximum and minimum may not exist.
S = R, f(x) = 2x³ − 3x². Critical points are x* = 0, 1. The second order conditions imply that 0 is a strict local maximum and 1 is a strict local minimum.
Our algorithm will say that the global maximum is attained at 0 and the global minimum at 1.
However, there is no global maximum or minimum: f strictly increases for x > 1 and for x < 0, so f is unbounded in both directions. Of course a local maximum is attained at 0 and a local minimum at 1.
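Caution 2 can be verified directly (a numerical companion of my own, not from the notes):

```python
# f(x) = 2x^3 - 3x^2 has a strict local maximum at 0 and a strict local
# minimum at 1, but no global optimum on R.

f = lambda x: 2 * x ** 3 - 3 * x ** 2
df = lambda x: 6 * x ** 2 - 6 * x          # critical points: x = 0 and x = 1
d2f = lambda x: 12 * x - 6

assert df(0) == 0 and d2f(0) < 0           # second order: strict local max at 0
assert df(1) == 0 and d2f(1) > 0           # strict local min at 1

# Yet f is unbounded in both directions, so no global max or min exists.
assert f(100) > f(0) and f(-100) < f(1)
```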
Optimization with Equality Constraints
Suppose that the objective function is f : R^N → R and there are K (< N) constraint functions g^1, g^2, . . . , g^K : R^N → R. We want to maximize (or minimize) f(x) over all x ∈ R^N such that g^1(x) = 0, g^2(x) = 0, . . . , g^K(x) = 0.
Once again, we start from local maxima x* that satisfy the K constraints. That is,
(i) g^1(x*) = 0, g^2(x*) = 0, . . . , g^K(x*) = 0;
(ii) there exists an open set U containing x* such that f(x*) ≥ f(x) for all x ∈ U with g^1(x) = 0, g^2(x) = 0, . . . , g^K(x) = 0.
Note: K < N makes the constraint set non-trivial.
First order necessary condition (Theorem of Lagrange):
Suppose that f, g^1, . . . , g^K are C^1 functions.
x* is a local maximum (or minimum) of f ⇒ there exist λ*_1, λ*_2, . . . , λ*_K ∈ R such that

Df(x*) + Σ_{j=1}^{K} λ*_j Dg^j(x*) = 0

However, this holds only when rank(Dg(x*)) = K. This condition is called the constraint qualification (CQ). Here Dg is the K × N Jacobian matrix whose j-th row is Dg^j = (∂g^j/∂x_1, ∂g^j/∂x_2, . . . , ∂g^j/∂x_N).
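As a concrete instance of the Theorem of Lagrange (my own hand-solved example, not from the notes): maximize f(x1, x2) = x1 x2 subject to g(x1, x2) = x1 + x2 − 2 = 0. The solution is x* = (1, 1) with λ* = −1, and CQ holds since Dg = (1, 1) has rank 1 everywhere.

```python
# Verify the Lagrange first order condition Df(x*) + lambda* Dg(x*) = 0
# for max x1*x2 s.t. x1 + x2 - 2 = 0 (my own worked example).

f_grad = lambda x1, x2: (x2, x1)      # Df
g_grad = lambda x1, x2: (1.0, 1.0)    # Dg (rank 1 everywhere: CQ holds)

x_star = (1.0, 1.0)
lam_star = -1.0

# Check the first order condition coordinate by coordinate.
fx = f_grad(*x_star)
gx = g_grad(*x_star)
assert all(abs(fx[i] + lam_star * gx[i]) < 1e-12 for i in range(2))

# Check feasibility of x*.
assert abs(x_star[0] + x_star[1] - 2.0) < 1e-12
```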
Second Order Condition

Second order conditions: f, g^1, . . . , g^K ∈ C²
1-a) (necessary) x* is a local maximum ⇒ z^T [D²f(x*) + Σ_{j=1}^{K} λ*_j D²g^j(x*)] z ≤ 0 for all z ∈ R^N such that z ≠ 0 and Dg(x*) z = 0.
1-b) (necessary) x* is a local minimum ⇒ z^T [D²f(x*) + Σ_{j=1}^{K} λ*_j D²g^j(x*)] z ≥ 0 for all z ∈ R^N such that z ≠ 0 and Dg(x*) z = 0.
2-a) (sufficient) x* is a critical point and z^T [D²f(x*) + Σ_{j=1}^{K} λ*_j D²g^j(x*)] z < 0 for all z ∈ R^N such that z ≠ 0 and Dg(x*) z = 0 ⇒ x* is a strict local maximum.
2-b) (sufficient) x* is a critical point and z^T [D²f(x*) + Σ_{j=1}^{K} λ*_j D²g^j(x*)] z > 0 for all z ∈ R^N such that z ≠ 0 and Dg(x*) z = 0 ⇒ x* is a strict local minimum.
Pathway and Caution
Finding the global maximum:
Step 1: Find all candidate points using the first order condition and the constraints.
Step 2: Compare f(x*) at all points obtained in Step 1.
Note 1: There is no need for the second order condition in this procedure.
Note 2: Usually the constraint qualification holds at all points in the domain. If it does not, points where CQ fails must be examined separately, since the first order condition need not hold there.
Caution 1: Be aware that these are only necessary conditions.
Suppose f(x1, x2) = x1³ + x2³ and g^1(x1, x2) = x1 − x2. Note that CQ is met at all points.
First order conditions:

3x1² + λ1 = 0   (1)
3x2² − λ1 = 0   (2)
x1 − x2 = 0     (3)

The only solution is x1 = x2 = λ1 = 0. Even the second order conditions cannot rule out this point as a local maximum or minimum. However, (0, 0) is neither a local maximum nor a local minimum: for ε > 0, f(ε, ε) = 2ε³ > 0 = f(0, 0) and f(−ε, −ε) = −2ε³ < 0.
This also suggests that there is no global maximum or minimum, which is indeed true: f(ε, ε) = 2ε³ → ∞ as ε → ∞ and → −∞ as ε → −∞.

Caution 2: The algorithm may fail because CQ is not satisfied.
Suppose f(x1, x2) = −x2 and g^1(x1, x2) = x2³ − x1². This problem has a unique global maximum at (0, 0). Note that CQ fails at (0, 0).
The first order conditions have no solution:

−2λ1 x1 = 0        (4)
−1 + 3λ1 x2² = 0   (5)
x2³ − x1² = 0      (6)

(From (5), λ1 ≠ 0 and x2 ≠ 0; then (4) forces x1 = 0, and (6) forces x2 = 0, a contradiction.)
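The CQ failure in Caution 2 can be seen numerically (my own sketch): on the constraint set x2 = x1^(2/3) ≥ 0, so f = −x2 ≤ 0 with equality only at (0, 0), while Dg(0, 0) is the zero vector.

```python
# Caution 2 check: maximize f = -x2 subject to g = x2^3 - x1^2 = 0.
# (0, 0) is the global maximum, yet CQ fails there because Dg(0,0) has rank 0.

f = lambda x1, x2: -x2
g_grad = lambda x1, x2: (-2 * x1, 3 * x2 ** 2)   # Dg

# CQ fails at the optimum (0, 0): the Jacobian row is the zero vector.
assert g_grad(0.0, 0.0) == (0.0, 0.0)

# Sample points on the constraint set x2 = |x1|^(2/3): f <= f(0, 0) = 0 everywhere.
feasible = [(t, abs(t) ** (2.0 / 3.0)) for t in [-1.0, -0.5, 0.0, 0.5, 1.0]]
assert all(f(x1, x2) <= f(0.0, 0.0) for x1, x2 in feasible)
```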
Caution 3: Global maximum and minimum may not exist.
Suppose f(x1, x2) = (1/3)x1³ − (3/2)x2² + 2x1 and g^1(x1, x2) = x1 − x2. First order conditions:

x1² + 2 + λ1 = 0   (7)
−3x2 − λ1 = 0      (8)
x1 − x2 = 0        (9)

This system of equations has two solutions: x1 = x2 = 1, λ1 = −3 and x1 = x2 = 2, λ1 = −6, with f(1, 1) = 5/6 and f(2, 2) = 2/3. However, the global maximum and minimum do not exist: f(ε, ε) is an increasing function for ε > 2 and for ε < 1, so f is unbounded along the constraint. Of course (1, 1) is a local maximum and (2, 2) is a local minimum.
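A numerical companion to Caution 3 (mine, not from the notes), verifying the two solutions of (7)-(9), the function values, and the unboundedness:

```python
# f(x1, x2) = (1/3)x1^3 - (3/2)x2^2 + 2*x1 subject to x1 = x2.

f = lambda x1, x2: x1 ** 3 / 3 - 1.5 * x2 ** 2 + 2 * x1

def foc_residuals(x1, x2, lam):
    """The three first order equations (7)-(9)."""
    return (x1 ** 2 + 2 + lam, -3 * x2 - lam, x1 - x2)

for (x1, x2, lam) in [(1.0, 1.0, -3.0), (2.0, 2.0, -6.0)]:
    assert all(abs(r) < 1e-12 for r in foc_residuals(x1, x2, lam))

assert abs(f(1, 1) - 5 / 6) < 1e-12 and abs(f(2, 2) - 2 / 3) < 1e-12

# But f(t, t) is unbounded along the constraint, so no global optimum exists.
assert f(100, 100) > f(1, 1) and f(-100, -100) < f(2, 2)
```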
Proof of FOC
Sketch of a proof: We prove the special case N = 2 and K = 1. That is, we maximize f(x1, x2) subject to g(x1, x2) = 0, where f and g are C^1 functions.
CQ implies (∂g/∂x1, ∂g/∂x2) ≠ (0, 0) at the optimum (x1*, x2*). Assume that ∂g/∂x2 (x1*, x2*) ≠ 0.
By the Implicit Function Theorem, there exists h ∈ C^1 such that x2 = h(x1) in an open set containing (x1*, x2*). Thus:
(A) g(x1, h(x1)) = 0. By the chain rule, ∂g/∂x1 + (∂g/∂x2) h′(x1) = 0, so h′(x1*) = − (∂g/∂x1)(x1*, x2*) / (∂g/∂x2)(x1*, x2*).
(B) f(x1, h(x1)) is maximized at x1*. By the chain rule and the unconstrained FOC, ∂f/∂x1 (x1*, x2*) + (∂f/∂x2)(x1*, x2*) h′(x1*) = 0.
We want to show that we can find λ* ∈ R such that

∂f/∂x1 (x1*, x2*) + λ* ∂g/∂x1 (x1*, x2*) = 0   (10)
∂f/∂x2 (x1*, x2*) + λ* ∂g/∂x2 (x1*, x2*) = 0   (11)

Equation (11) gives λ* = − (∂f/∂x2)(x1*, x2*) / (∂g/∂x2)(x1*, x2*). It remains to show that Equation (10) holds for this λ*:

∂f/∂x1 (x1*, x2*) + λ* ∂g/∂x1 (x1*, x2*) = ∂f/∂x1 (x1*, x2*) + (∂f/∂x2)(x1*, x2*) h′(x1*)   (by A)
= 0   (by B)
Implicit Function Theorem

Take a C^1 function φ : D → R where D ⊆ R^N is an open set.
Suppose that φ(x1*, x2*, . . . , xN*) = 0 and ∂φ/∂x1 (x1*, x2*, . . . , xN*) ≠ 0.
Then we can find a neighbourhood V_δ(x2*, . . . , xN*) ⊆ R^(N−1) and an open set W ⊂ R containing x1* such that:
(i) there exists a C^1 function ψ : V_δ(x2*, . . . , xN*) → W with x1* = ψ(x2*, . . . , xN*) and φ(ψ(x2, . . . , xN), x2, . . . , xN) = 0 on V_δ;
(ii) moreover, ∂ψ/∂xk (x2*, . . . , xN*) = − (∂φ/∂xk)(x1*, . . . , xN*) / (∂φ/∂x1)(x1*, . . . , xN*) for all k = 2, . . . , N.
Optimization with Inequality Constraints
Suppose that the objective function is f : R^N → R and there are K constraint functions h^1, h^2, . . . , h^K : R^N → R. We want to maximize (or minimize) f(x) over all x ∈ R^N such that h^1(x) ≥ 0, h^2(x) ≥ 0, . . . , h^K(x) ≥ 0.
Once again, we start from local maxima x* that satisfy the K constraints. That is,
(i) h^1(x*) ≥ 0, h^2(x*) ≥ 0, . . . , h^K(x*) ≥ 0;
(ii) there exists an open set U containing x* such that f(x*) ≥ f(x) for all x ∈ U with h^1(x) ≥ 0, h^2(x) ≥ 0, . . . , h^K(x) ≥ 0.
First order necessary condition (Theorem of Kuhn and Tucker):
Suppose that f, h^1, . . . , h^K are C^1 functions.
x* is a local maximum of f ⇒ there exist λ*_1, λ*_2, . . . , λ*_K ∈ R such that
(KT1) λ*_j ≥ 0 and λ*_j h^j(x*) = 0 for all j = 1, 2, . . . , K;
(KT2) Df(x*) + Σ_{j=1}^{K} λ*_j Dh^j(x*) = 0,
under an appropriate CQ condition (which we shall ignore).
In case of a minimum, replace f by −f to obtain
(KT1) λ*_j ≥ 0 and λ*_j h^j(x*) = 0 for all j = 1, 2, . . . , K;
(KT2) Df(x*) − Σ_{j=1}^{K} λ*_j Dh^j(x*) = 0.

Finding global maximum:


Step 1: Find all local maxima using First order condition.
Step 2: Compare f (x ? ) at all points obtained in Step 1.
Note: The usual cautions apply.
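A worked Kuhn-Tucker instance (my own example, not from the notes): maximize f(x) = −(x − 2)² subject to h(x) = 1 − x ≥ 0. The constraint binds at the optimum x* = 1, with multiplier λ* = 2.

```python
# Verify (KT1) and (KT2) for max -(x - 2)^2 s.t. 1 - x >= 0.

f = lambda x: -(x - 2) ** 2
df = lambda x: -2 * (x - 2)    # Df
dh = lambda x: -1.0            # Dh
h = lambda x: 1 - x

x_star, lam_star = 1.0, 2.0

# (KT1): lambda* >= 0 and complementary slackness lambda* * h(x*) = 0.
assert lam_star >= 0 and lam_star * h(x_star) == 0

# (KT2): Df(x*) + lambda* Dh(x*) = 0.
assert abs(df(x_star) + lam_star * dh(x_star)) < 1e-12

# x* = 1 beats nearby feasible points (feasibility means x <= 1).
assert all(f(x_star) >= f(x) for x in [0.0, 0.5, 0.9, 1.0])
```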
Envelope Theorem
The objective function f : R^(N+1) → R is a function of the variables (x1, x2, . . . , xN) ∈ R^N and also of a parameter α ∈ R. The parameter is held constant when f is maximized over R^N subject to h(x1, x2, . . . , xN, α) ≥ 0.
Suppose that for every α there is a unique interior maximizer, so that x*(α) is the function that represents this relationship between parameter and maximizer. Suppose f ∈ C^1 and x*(α) is differentiable. Then V(α) = f(x*(α), α) is the maximum value of f given α and is called the value function.
The change of V(α) with respect to α is given by

V′(α) = ∂L/∂α (x*(α), λ*(α), α), where L = f(x, α) + λ h(x, α)

Proof (with a bit of hand-waving): V(α) = f(x*(α), α). By the chain rule,

V′(α) = ∂f/∂α (x*(α), α) + Σ_{j=1}^{N} ∂f/∂xj (x*(α), α) · dxj*/dα (α)

By (KT2), ∂f/∂xj (x*(α), α) = −λ* ∂h/∂xj (x*(α), α) for all j = 1, 2, . . . , N. Hence

V′(α) = ∂f/∂α (x*(α), α) − λ* Σ_{j=1}^{N} ∂h/∂xj (x*(α), α) · dxj*/dα (α)

By (KT1), either λ* = 0 or h(x*(α), α) = 0.
Case 1: If λ* = 0 then V′(α) = ∂f/∂α (x*(α), α) = ∂L/∂α (x*(α), λ*(α), α).
Case 2: Define H(α) = h(x*(α), α). Thus H(α) = 0 for all α. Differentiating with respect to α, we get

H′(α) = ∂h/∂α (x*(α), α) + Σ_{j=1}^{N} ∂h/∂xj (x*(α), α) · dxj*/dα (α) = 0

Therefore V′(α) = ∂f/∂α (x*(α), α) + λ* ∂h/∂α (x*(α), α) = ∂L/∂α (x*(α), λ*(α), α).
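A numerical check of the theorem in Case 1 (my own example, with a slack constraint so λ* = 0): for f(x, α) = −x² + αx the maximizer is x*(α) = α/2 and V(α) = α²/4, so V′(α) should equal ∂f/∂α evaluated at x*(α).

```python
# Envelope theorem check for f(x, a) = -x^2 + a*x with a slack constraint
# (lambda* = 0, Case 1 above): V'(a) = df/da (x*(a), a) = x*(a) = a/2.

V = lambda a: a ** 2 / 4          # value function f(x*(a), a)
df_da = lambda x, a: x            # partial of f(x, a) = -x^2 + a*x w.r.t. a

a = 1.5
h = 1e-6
V_prime = (V(a + h) - V(a - h)) / (2 * h)   # numerical V'(a)

x_star = a / 2
assert abs(V_prime - df_da(x_star, a)) < 1e-9   # V'(a) = dL/da at x*(a)
```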
