
LONDON SCHOOL OF ECONOMICS Professor Leonardo Felli

Department of Economics S.478; x7525


EC400 2010/11

Math for Microeconomics


September Course, Part II
Lecture Notes

Course Outline

Lecture 1: Tools for optimization (Quadratic forms).

Lecture 2: Tools for optimization (Taylor’s expansion) and Unconstrained
optimization.

Lecture 3: Concavity, convexity, quasi-concavity and economic applications.

Lecture 4: Constrained Optimization I: Equality Constraints, Lagrange Theorem.

Lecture 5: Constrained Optimization II: Inequality Constraints, Kuhn-Tucker
Theorem.

Lecture 6: Constrained optimization III: The Maximum Value Function, Envelope
Theorem, Implicit Function Theorem and Comparative Statics.

Lecture 1: Tools for optimization:
Quadratic Forms and Taylor’s formulation

What is a quadratic form?

• Quadratic forms are useful because: (i) they are the simplest functions after
linear ones; (ii) the second order conditions of optimization techniques are
stated in terms of quadratic forms; (iii) many economic optimization problems
have a quadratic objective function, such as risk minimization problems in
finance, where riskiness is measured by the (quadratic) variance of the returns
from investments.

• Among the functions of one variable, the simplest functions with a unique global
extremum are the pure quadratics $y = x^2$ and $y = -x^2$. The level curve of a
general quadratic form in $\mathbb{R}^2$ is

$$a_{11} x_1^2 + a_{12} x_1 x_2 + a_{22} x_2^2 = b$$

and can take the form of an ellipse, a hyperbola, a pair of lines or, possibly, the
empty set.

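• Definition: A quadratic form on $\mathbb{R}^n$ is a real valued function

$$Q(x_1, x_2, \dots, x_n) = \sum_{i \le j} a_{ij} x_i x_j$$

• The general quadratic form

$$a_{11} x_1^2 + a_{12} x_1 x_2 + a_{22} x_2^2$$

can be written as

$$\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
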
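• Each quadratic form can be represented as

$$Q(x) = x^T A x$$

where $A$ is a (unique) symmetric matrix, obtained by splitting each cross
coefficient $a_{ij}$, $i < j$, equally between the entries $(i, j)$ and $(j, i)$:

$$A = \begin{pmatrix} a_{11} & a_{12}/2 & \dots & a_{1n}/2 \\ a_{12}/2 & a_{22} & \dots & a_{2n}/2 \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n}/2 & a_{2n}/2 & \dots & a_{nn} \end{pmatrix}.$$

• Conversely, if $A$ is a symmetric matrix, then the real valued function
$Q(x) = x^T A x$ is a quadratic form.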
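
The passage from coefficients $a_{ij}$ to the symmetric matrix $A$ can be sketched in a few lines of Python. This is only an illustration: the helper names `symmetric_matrix` and `quadratic_form` and the numerical coefficients are assumptions made for the example, not part of the course material.

```python
def symmetric_matrix(coeffs, n):
    """Build the symmetric matrix A of Q(x) = sum_{i<=j} a_ij x_i x_j.

    coeffs maps index pairs (i, j) with i <= j (0-based) to a_ij.
    """
    A = [[0.0] * n for _ in range(n)]
    for (i, j), a in coeffs.items():
        if i == j:
            A[i][i] = a
        else:
            # split the cross coefficient equally between (i, j) and (j, i)
            A[i][j] = a / 2
            A[j][i] = a / 2
    return A


def quadratic_form(A, x):
    """Evaluate x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical example: Q(x1, x2) = 3 x1^2 + 4 x1 x2 + 5 x2^2
coeffs = {(0, 0): 3.0, (0, 1): 4.0, (1, 1): 5.0}
A = symmetric_matrix(coeffs, 2)
x = [1.0, 2.0]
direct = 3 * x[0] ** 2 + 4 * x[0] * x[1] + 5 * x[1] ** 2
print(A, quadratic_form(A, x), direct)
```

Both ways of evaluating $Q$ at $x = (1, 2)$ should agree, which is the content of the representation $Q(x) = x^T A x$.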

Definiteness of quadratic forms

• The function always takes the value 0 when x = 0.

• We focus on the question of whether $x = 0$ is a max, a min, or neither. For
example, when

$$y = a x^2$$

then if $a > 0$, $a x^2$ is non negative and equals 0 only when $x = 0$. This is
positive definite, and $x = 0$ is a global minimizer. If $a < 0$, then the function
is negative definite.

• In two dimensions,

$$x_1^2 + x_2^2$$

is positive definite,

$$-x_1^2 - x_2^2$$

is negative definite, and

$$x_1^2 - x_2^2$$

is indefinite, since it can take both positive and negative values.

• There are two intermediate cases: if the quadratic form is always non negative
but also equals 0 for some non zero $x$, it is positive semidefinite, such as

$$(x_1 + x_2)^2$$

which equals 0 at all points such that $x_1 = -x_2$. A quadratic form which is never
positive but can be zero at points other than the origin is called negative
semidefinite.

• We apply the same terminology for the symmetric matrix $A$, that is, the matrix
$A$ is positive semidefinite if $Q(x) = x^T A x$ is positive semidefinite and so on.

• Definition: let $A$ be an $(n \times n)$ symmetric matrix. Then $A$ is:
(a) positive definite if $x^T A x > 0$ for all $x \neq 0$ in $\mathbb{R}^n$,
(b) positive semidefinite if $x^T A x \ge 0$ for all $x \neq 0$ in $\mathbb{R}^n$,
(c) negative definite if $x^T A x < 0$ for all $x \neq 0$ in $\mathbb{R}^n$,
(d) negative semidefinite if $x^T A x \le 0$ for all $x \neq 0$ in $\mathbb{R}^n$,
(e) indefinite if $x^T A x > 0$ for some $x \neq 0$ in $\mathbb{R}^n$ and $x^T A x < 0$
for some $x \neq 0$ in $\mathbb{R}^n$.

• Application (later this week): a function $y = f(x)$ of one variable is concave
on an interval if its second derivative satisfies $f''(x) \le 0$ on that interval.
The generalization of this result to higher dimensions states that a function is
concave on some region if its second derivative matrix is negative semidefinite
for all $x$ in the region.

Testing the definiteness of a matrix:

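• Definition: The determinant of a matrix is a unique scalar associated with the
matrix.

• Computing the determinant of a matrix:

– For a $(2 \times 2)$ matrix $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, the determinant $\det A$ or $|A|$ is

$$a_{11} a_{22} - a_{12} a_{21}$$

– For $A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$ the determinant is:

$$a_{11} \det \begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix} - a_{12} \det \begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix} + a_{13} \det \begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}.$$
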
• Definition: Let $A$ be an $(n \times n)$ matrix. A $(k \times k)$ submatrix of $A$
formed by deleting $(n - k)$ columns, say columns $(i_1, i_2, \dots, i_{n-k})$, and the
same $(n - k)$ rows $(i_1, i_2, \dots, i_{n-k})$ from $A$ is called a $k$th order
principal submatrix of $A$. The determinant of a $(k \times k)$ principal submatrix
is called a $k$th order principal minor of $A$.

• Example: for a general $(3 \times 3)$ matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

there is one third order principal minor, which is $\det(A)$. There are three second
order principal minors and three first order principal minors.

• Definition: Let $A$ be an $(n \times n)$ matrix. The $k$th order principal
submatrix of $A$ obtained by deleting the last $(n - k)$ rows and columns from $A$ is
called the $k$th order leading principal submatrix of $A$, denoted by $A_k$. Its
determinant is called the $k$th order leading principal minor of $A$, denoted by
$|A_k|$.

• Let $A$ be an $(n \times n)$ symmetric matrix. Then

– $A$ is positive definite if and only if all its $n$ leading principal minors are
strictly positive.
– $A$ is negative definite if and only if its $n$ leading principal minors
alternate in sign as follows:

$$|A_1| < 0, \quad |A_2| > 0, \quad |A_3| < 0, \quad \text{etc.}$$

That is, the $k$th order leading principal minor should have the same sign as
$(-1)^k$.
– $A$ is positive semidefinite if and only if every principal minor of $A$ is non
negative.
– $A$ is negative semidefinite if and only if every principal minor of odd order
is non positive and every principal minor of even order is non negative.

• Diagonal matrices:

$$A = \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix}.$$

These also correspond to the simplest quadratic forms:

$$x^T A x = a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2.$$

This quadratic form will be positive (negative) definite if and only if all the
$a_i$'s are positive (negative). It will be positive semidefinite if and only if
all the $a_i$'s are non negative and negative semidefinite if and only if all the
$a_i$'s are non positive. If there are two $a_i$'s of opposite signs, it will be
indefinite.

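• Let $A$ be a symmetric $(2 \times 2)$ matrix. Then:

$$Q(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a x_1^2 + 2 b x_1 x_2 + c x_2^2$$

If $a = 0$, then $Q$ cannot be negative or positive definite since $Q(1, 0) = 0$. So
assume that $a \neq 0$ and add and subtract $b^2 x_2^2 / a$ to get:

$$\begin{aligned}
Q(x_1, x_2) &= a x_1^2 + 2 b x_1 x_2 + c x_2^2 + \frac{b^2}{a} x_2^2 - \frac{b^2}{a} x_2^2 \\
&= a \left( x_1^2 + \frac{2 b x_1 x_2}{a} + \frac{b^2}{a^2} x_2^2 \right) - \frac{b^2}{a} x_2^2 + c x_2^2 \\
&= a \left( x_1 + \frac{b}{a} x_2 \right)^2 + \frac{(ac - b^2)}{a} x_2^2
\end{aligned}$$
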
• If both coefficients, $a$ and $(ac - b^2)/a$, are positive, then $Q$ will never be
negative. It will equal 0 only when $x_1 + \frac{b}{a} x_2 = 0$ and $x_2 = 0$, in
other words, when $x_1 = 0$ and $x_2 = 0$. Hence, if

$$|A_1| = a > 0 \quad \text{and} \quad \det A = \begin{vmatrix} a & b \\ b & c \end{vmatrix} > 0$$

then $Q$ is positive definite. Conversely, if $Q$ is positive definite then both $a$
and $\det A = ac - b^2$ are positive.

Similarly, $Q$ will be negative definite if and only if both coefficients are
negative, which occurs if and only if $a < 0$ and $ac - b^2 > 0$, that is, when the
leading principal minors alternate in sign. If $ac - b^2 < 0$, then the two
coefficients will have opposite signs and $Q$ will be indefinite.

• Examples of $(2 \times 2)$ matrices:
– Consider $A = \begin{pmatrix} 2 & 3 \\ 3 & 7 \end{pmatrix}$. Since $|A_1| = 2$ and $|A_2| = 5$, $A$ is positive
definite.
– Consider $B = \begin{pmatrix} 2 & 4 \\ 4 & 7 \end{pmatrix}$. Since $|B_1| = 2$ and $|B_2| = -2$, $B$ is indefinite.
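
The leading principal minor test can be sketched in Python as follows. The function names (`det`, `leading_principal_minors`, `classify`) are chosen here for illustration only. The sketch reports definiteness only: when the leading minors fit neither sign pattern it answers "not definite", because telling semidefinite from indefinite in general requires all principal minors, not just the leading ones.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # delete row 0 and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def leading_principal_minors(A):
    """Return [|A_1|, |A_2|, ..., |A_n|]."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]


def classify(A):
    """Classify a symmetric matrix using its leading principal minors."""
    minors = leading_principal_minors(A)
    if all(m > 0 for m in minors):
        return "positive definite"
    # negative definite: |A_k| has the sign of (-1)^k
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite"


A = [[2, 3], [3, 7]]
B = [[2, 4], [4, 7]]
print(leading_principal_minors(A), classify(A))
print(leading_principal_minors(B), classify(B))
```

For the $(2 \times 2)$ matrix $B$, the negative determinant additionally tells us that it is indefinite, as shown by completing the square above.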

Taylor’s formulation:

• The second tool that we need for maximization is Taylor’s series.

• For functions from $\mathbb{R}^1$ to $\mathbb{R}^1$, the first order Taylor
approximation is

$$f(a + h) \approx f(a) + f'(a) h$$

The approximate equality holds in the following sense. Write $f(a + h)$ as

$$f(a + h) = f(a) + f'(a) h + R(h; a)$$

$R(h; a)$ is the difference between the two sides of the approximation and, by the
definition of the derivative $f'(a)$, we have $\frac{R(h; a)}{h} \to 0$ as $h \to 0$.

• Geometrically, this is the formalization of the approximation of the graph of $f$
by its tangent line at $(a, f(a))$. Analytically, it describes the best
approximation of $f$ by a polynomial of degree 1.

• Definition: the $k$th order Taylor polynomial of $f$ at $x = a$ is

$$P_k(a + h) = f(a) + f'(a) h + \frac{f''(a)}{2!} h^2 + \dots + \frac{f^{[k]}(a)}{k!} h^k$$

where

$$f(a + h) - P_k(a + h) = R_k(h; a) \quad \text{with} \quad \lim_{h \to 0} \frac{R_k(h; a)}{h^k} = 0$$

• Example: we compute the first and second order Taylor polynomials of the
exponential function $f(x) = e^x$ at $x = 0$. All the derivatives of $f$ at $x = 0$
equal 1. Then:

$$P_1(h) = 1 + h$$

$$P_2(h) = 1 + h + \frac{h^2}{2}$$

For $h = 0.2$, $P_1(0.2) = 1.2$ and $P_2(0.2) = 1.22$, compared with the actual value
of $e^{0.2}$, which is 1.22140.
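
The numbers in this example can be reproduced with a short Python sketch (the function name `taylor_exp` is ours, chosen for illustration). It also prints the scaled remainders $R_1/h$ and $R_2/h^2$, which the definition requires to vanish as $h \to 0$.

```python
import math


def taylor_exp(h, k):
    """k-th order Taylor polynomial of exp at a = 0, evaluated at h."""
    # every derivative of exp at 0 equals 1, so the j-th term is h^j / j!
    return sum(h ** j / math.factorial(j) for j in range(k + 1))


h = 0.2
p1 = taylor_exp(h, 1)
p2 = taylor_exp(h, 2)
exact = math.exp(h)
print(p1, p2, exact)

# scaled remainders R_k(h; 0) / h^k for shrinking h
for step in (0.2, 0.02, 0.002):
    r1 = (math.exp(step) - taylor_exp(step, 1)) / step
    r2 = (math.exp(step) - taylor_exp(step, 2)) / step ** 2
    print(step, r1, r2)
```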

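• For functions of several variables:

$$F(a + h) \approx F(a) + \frac{\partial F}{\partial x_1}(a) h_1 + \dots + \frac{\partial F}{\partial x_n}(a) h_n$$

where $\frac{R_1(h; a)}{\|h\|} \to 0$ as $h \to 0$. This is the approximation of order 1.
Alternatively

$$F(a + h) = F(a) + DF_a \cdot h + R_1(h; a)$$

where $DF_a = \left( \frac{\partial F}{\partial x_1}(a), \dots, \frac{\partial F}{\partial x_n}(a) \right)$.

For order two, the analogue of $\frac{f''(a)}{2!} h^2$ is

$$\frac{1}{2} h^T D^2 F_a \, h,$$

where $D^2 F_a$ is the Hessian matrix:

$$D^2 F_a = \begin{pmatrix} \frac{\partial^2 F}{\partial x_1^2}\Big|_{x=a} & \dots & \frac{\partial^2 F}{\partial x_n \partial x_1}\Big|_{x=a} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 F}{\partial x_1 \partial x_n}\Big|_{x=a} & \dots & \frac{\partial^2 F}{\partial x_n^2}\Big|_{x=a} \end{pmatrix}.$$

• The extension for order $k$ then trivially follows.
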
Lecture 2: Unconstrained optimization.

Optimization plays a crucial role in economic problems. We start with
unconstrained optimization problems.

Definition of extreme points

• Definition: The ball $B(x, r)$ centred at $x$ of radius $r$ is the set of all
vectors $y$ in $\mathbb{R}^n$ whose distance from $x$ is less than $r$, that is

$$B(x, r) = \{ y \in \mathbb{R}^n : \|y - x\| < r \}.$$

• Definition: suppose that $f(x)$ is a real valued function defined on a subset $C$
of $\mathbb{R}^n$. A point $x^*$ in $C$ is:
1. A global maximizer for $f(x)$ on $C$ if $f(x^*) \ge f(x)$ for all $x \in C$.
2. A strict global maximizer for $f(x)$ on $C$ if $f(x^*) > f(x)$ for all $x \in C$
such that $x \neq x^*$.
3. A local maximizer for $f(x)$ if there is a strictly positive number $\delta$ such
that $f(x^*) \ge f(x)$ for all $x \in C \cap B(x^*, \delta)$.
4. A strict local maximizer for $f(x)$ if there is a strictly positive number
$\delta$ such that $f(x^*) > f(x)$ for all $x \in C \cap B(x^*, \delta)$ with $x \neq x^*$.
5. A critical point for $f(x)$ if the first partial derivatives of $f(x)$ exist at
$x^*$ and

$$\frac{\partial f(x^*)}{\partial x_i} = 0 \quad \text{for } i = 1, 2, \dots, n.$$

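• Example: find the critical points of $F(x, y) = x^3 - y^3 + 9xy$. We set

$$\frac{\partial F}{\partial x} = 3x^2 + 9y = 0; \quad \frac{\partial F}{\partial y} = -3y^2 + 9x = 0$$

The first equation gives $y = -x^2/3$; substituting into the second gives
$x(27 - x^3) = 0$, so $x = 0$ or $x = 3$. The critical points are therefore $(0, 0)$
and $(3, -3)$.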
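
As a quick check, the gradient of $F$ can be evaluated at the two candidate points. This is a minimal sketch; the function name `grad_F` is an illustrative choice, and the point $(1, 1)$ is included only to show a point that is not critical.

```python
def grad_F(x, y):
    """Gradient of F(x, y) = x^3 - y^3 + 9xy."""
    return (3 * x ** 2 + 9 * y, -3 * y ** 2 + 9 * x)


for point in [(0, 0), (3, -3), (1, 1)]:
    print(point, grad_F(*point))
```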

Do extreme points exist?

• Theorem (Extreme Value Theorem): Suppose that $f(x)$ is a continuous function
defined on $C$, which is compact (closed and bounded) in $\mathbb{R}^n$. Then there
exists a point $x^*$ in $C$ at which $f$ has a maximum, and there exists a point
$x_*$ in $C$ at which $f$ has a minimum. Thus,

$$f(x_*) \le f(x) \le f(x^*)$$

for all $x \in C$.

Functions of one variable

• Necessary condition for a maximum in $\mathbb{R}$:

Suppose that $f(x)$ is a differentiable function on an interval $I$. If $x^*$ is a
local maximizer of $f(x)$, then either $x^*$ is an end point of $I$ or $f'(x^*) = 0$.

• Second order sufficient condition for a maximum in $\mathbb{R}$:

Suppose that $f(x)$, $f'(x)$, $f''(x)$ are all continuous on an interval $I$ and that
$x^*$ is a critical point of $f(x)$. Then:

1. If $f''(x) \le 0$ for all $x \in I$, then $x^*$ is a global maximizer of $f(x)$ on $I$.
2. If $f''(x) < 0$ for all $x \in I$ with $x \neq x^*$, then $x^*$ is a strict global
maximizer of $f(x)$ on $I$.
3. If $f''(x^*) < 0$ then $x^*$ is a strict local maximizer of $f(x)$ on $I$.

Functions of several variables

• First order necessary conditions for a maximum in $\mathbb{R}^n$:

Suppose that $f(x)$ is a real valued function for which all first partial
derivatives of $f(x)$ exist on a subset $C \subset \mathbb{R}^n$. If $x^*$ is an interior
point of $C$ that is a local maximizer of $f(x)$, then $x^*$ is a critical point of
$f(x)$, that is

$$\frac{\partial f(x^*)}{\partial x_i} = 0 \quad \text{for } i = 1, 2, \dots, n.$$

Can we say whether the critical points $(0, 0)$ and $(3, -3)$ of the example above
are local maxima or local minima? For this we have to consider the Hessian, that
is, the matrix of the second order partial derivatives. Note that this is a
symmetric matrix, since cross-partial derivatives are equal if the function has
continuous second order partial derivatives (Clairaut’s / Schwarz’s theorem).

• Second order sufficient conditions for a local maximum in $\mathbb{R}^n$:

Suppose that $f(x)$ is a real valued function for which all first and second
partial derivatives of $f(x)$ exist on a subset $C \subset \mathbb{R}^n$. Suppose that
$x^*$ is a critical point of $f$. Then: if $D^2 f(x^*)$ is negative (positive)
definite, then $x^*$ is a strict local maximizer (minimizer) of $f(x)$.

It is also true that if $x^*$ is an interior point and a local maximum (minimum)
of $f$, then $D^2 f(x^*)$ is negative (positive) semidefinite.

But it is not true that if $x^*$ is a critical point and $D^2 f(x^*)$ is negative
(positive) semidefinite, then $x^*$ is a local maximum (minimum). A counterexample
is $f(x) = x^3$, which has the property that $D^2 f(0) = 0$ is semidefinite, but
$x = 0$ is neither a maximum nor a minimum.

• Back to the example of $F(x, y) = x^3 - y^3 + 9xy$. Compute the Hessian:

$$D^2 F(x, y) = \begin{pmatrix} 6x & 9 \\ 9 & -6y \end{pmatrix}$$

The first order leading principal minor is $6x$ and the second order leading
principal minor is $\det(D^2 F(x, y)) = -36xy - 81$. At $(0, 0)$ these two minors are
0 and $-81$; since the determinant is negative, the matrix is indefinite and this
point is neither a local min nor a local max (it is a saddle point). At $(3, -3)$
these two minors are 18 and 243, both positive, and hence it is a strict local
minimum of $F$. Note that it is not a global minimum (why?).
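
The minors quoted above can be recomputed with a small Python sketch (the function names are illustrative assumptions). The last line also compares $F$ at the local minimum with its value at one far-away point, $(0, 10)$, chosen arbitrarily as a hint for the question just asked.

```python
def F(x, y):
    return x ** 3 - y ** 3 + 9 * x * y


def hessian_minors(x, y):
    """Leading principal minors of D^2 F = [[6x, 9], [9, -6y]]."""
    first = 6 * x
    second = (6 * x) * (-6 * y) - 9 * 9
    return first, second


print(hessian_minors(0, 0))    # minors at the origin
print(hessian_minors(3, -3))   # minors at (3, -3)
print(F(3, -3), F(0, 10))      # value at the local minimum vs. a far-away point
```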

• Sketch of proof:

$$F(x^* + h) = F(x^*) + DF(x^*) h + \frac{1}{2} h^T D^2 F(x^*) h + R(h)$$

Ignore $R(h)$ and set $DF(x^*) = 0$. Then

$$F(x^* + h) - F(x^*) \approx \frac{1}{2} h^T D^2 F(x^*) h$$

If $D^2 F(x^*)$ is negative definite, then for all small enough $h \neq 0$, the right
hand side is negative. Then

$$F(x^* + h) < F(x^*)$$

for small enough $h$ or, in other words, $x^*$ is a strict local maximizer of $F$.

Concavity and convexity

• Definition: A real valued function $f$ defined on a convex subset $U$ of
$\mathbb{R}^n$ is concave if for all $x, y$ in $U$ and for all $t \in [0, 1]$:

$$f(tx + (1 - t)y) \ge t f(x) + (1 - t) f(y)$$

A real valued function $g$ defined on a convex subset $U$ of $\mathbb{R}^n$ is convex
if for all $x, y$ in $U$ and for all $t \in [0, 1]$:

$$g(tx + (1 - t)y) \le t g(x) + (1 - t) g(y)$$

• Notice: f is concave if and only if −f is convex.

• Notice: linear functions are both convex and concave.

• A convex set:
Definition: A set $U$ is a convex set if for all $x \in U$ and $y \in U$, and for all
$t \in [0, 1]$:

$$tx + (1 - t)y \in U$$

• Concave and convex functions need to have convex sets as their domain.
Otherwise the point $tx + (1 - t)y$ may lie outside the domain, and the
inequalities above are not well defined.

• Let $f$ be a continuous and differentiable function on a convex subset $U$ of
$\mathbb{R}^n$. Then $f$ is concave on $U$ if and only if for all $x, y$ in $U$:

$$f(y) - f(x) \le Df(x)(y - x) = \frac{\partial f(x)}{\partial x_1}(y_1 - x_1) + \dots + \frac{\partial f(x)}{\partial x_n}(y_n - x_n)$$

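• Proof on $\mathbb{R}^1$: since $f$ is concave, for $t \in (0, 1]$ and $y \neq x$,

$$\begin{aligned}
t f(y) + (1 - t) f(x) \le f(ty + (1 - t)x) &\Leftrightarrow t(f(y) - f(x)) + f(x) \le f(x + t(y - x)) \\
&\Leftrightarrow f(y) - f(x) \le \frac{f(x + t(y - x)) - f(x)}{t} \\
&\Leftrightarrow f(y) - f(x) \le \frac{f(x + h) - f(x)}{h} (y - x)
\end{aligned}$$

for $h = t(y - x)$. Taking limits as $h \to 0$ (that is, as $t \to 0$), this becomes

$$f(y) - f(x) \le f'(x)(y - x).$$
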
• If $f$ is a continuous and differentiable concave function on a convex set $U$
and if $x_0 \in U$, then

$$Df(x_0)(y - x_0) \le 0$$

implies $f(y) \le f(x_0)$, and if this holds for all $y \in U$, then $x_0$ is a global
maximizer of $f$.

• Proof: from the characterization of concavity above we know that:

$$f(y) - f(x_0) \le Df(x_0)(y - x_0) \le 0$$

Hence also

$$f(y) - f(x_0) \le 0.$$

• Let $f$ be a continuous twice differentiable function whose domain is a convex
open subset $U$ of $\mathbb{R}^n$. If $f$ is a concave function on $U$ and
$Df(x_0) = 0$ for some $x_0$, then $x_0$ is a global maximum of $f$ on $U$.

• A continuous twice differentiable function $f$ on an open convex subset $U$ of
$\mathbb{R}^n$ is concave on $U$ if and only if the Hessian $D^2 f(x)$ is negative
semidefinite for all $x$ in $U$. The function $f$ is a convex function if and only if
$D^2 f(x)$ is positive semidefinite for all $x$ in $U$.

• Second order sufficient conditions for a global maximum (minimum) in
$\mathbb{R}^n$:

Suppose that $x^*$ is a critical point of a function $f(x)$ with continuous first
and second order partial derivatives on $\mathbb{R}^n$. Then $x^*$ is:

1. a global maximizer (minimizer) for $f(x)$ if $D^2 f(x)$ is negative (positive)
semidefinite on $\mathbb{R}^n$.
2. a strict global maximizer (minimizer) for $f(x)$ if $D^2 f(x)$ is negative
(positive) definite on $\mathbb{R}^n$.

The property that critical points of concave functions are global maximizers is
an important one in economic theory. For example, many economic principles,
such as "the marginal rate of substitution equals the price ratio" or "marginal
revenue equals marginal cost", are simply the first order necessary conditions of
the corresponding maximization problem, as we will see. Ideally, an economist
would like such a rule also to be a sufficient condition guaranteeing that utility
or profit is being maximized, so that it can provide a guideline for economic
behaviour. This situation does indeed occur when the objective function is
concave.

Lecture 3: Concavity, convexity, quasi-concavity and economic
applications

• Recall:
Definition: A set $U$ is a convex set if for all $x \in U$ and $y \in U$, and for all
$t \in [0, 1]$:

$$tx + (1 - t)y \in U$$

• Concave and convex functions need to have convex sets as their domain.

• Recall: A real valued function $f$ defined on a convex subset $U$ of $\mathbb{R}^n$
is concave if for all $x, y$ in $U$ and for all $t \in [0, 1]$:

$$f(tx + (1 - t)y) \ge t f(x) + (1 - t) f(y)$$

Why are concave functions so useful in economics?

• Let $f_1, \dots, f_k$ be concave functions, each defined on the same convex subset
$U$ of $\mathbb{R}^n$. Let $a_1, a_2, \dots, a_k$ be positive numbers. Then
$a_1 f_1 + a_2 f_2 + \dots + a_k f_k$ is a concave function on $U$.
(Proof: in class).

Consider the problem of maximizing profit for a firm whose production function
is $y = g(x)$, where $y$ denotes output and $x$ denotes the input bundle. If $p$
denotes the price of output and $w_i$ is the cost per unit of input $i$, then the
firm’s profit function is

$$\Pi(x) = p g(x) - (w_1 x_1 + w_2 x_2 + \dots + w_n x_n)$$

The profit function is a concave function if the production function is concave.
This follows from the result above, because $p g$ is concave (for $p > 0$) and
$-(w_1 x_1 + w_2 x_2 + \dots + w_n x_n)$ is linear and hence concave.

The first order conditions:

$$p \frac{\partial g}{\partial x_i} = w_i \quad \text{for } i = 1, 2, \dots, n$$

are both necessary and sufficient for an interior profit maximizer.
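
To illustrate, here is a minimal Python sketch for a single input. The production function $g(x) = \sqrt{x}$ and the prices $p = 2$, $w = 1$ are assumptions made for the example, and the function names are ours. The input level solving the first order condition is compared with a brute-force search over a grid.

```python
import math


def profit(x, p, w):
    """Profit with the (assumed) concave production function g(x) = sqrt(x)."""
    return p * math.sqrt(x) - w * x


def foc_input(p, w):
    """Solve p * g'(x) = w for g(x) = sqrt(x), i.e. p / (2 sqrt(x)) = w."""
    return (p / (2 * w)) ** 2


p, w = 2.0, 1.0                            # hypothetical prices
x_star = foc_input(p, w)
grid = [i / 100 for i in range(1, 501)]    # inputs 0.01, 0.02, ..., 5.00
best_on_grid = max(grid, key=lambda x: profit(x, p, w))
print(x_star, best_on_grid, profit(x_star, p, w))
```

Because the profit function is concave here, the critical point found from the first order condition coincides with the grid maximizer.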

Quasiconcave and quasiconvex functions

• Definition: a level set of a function $f$ defined on $U$ in $\mathbb{R}^n$ is:

$$X_a^f = \{ x \in U \mid f(x) = a \}$$

This could be a point, a curve, a plane.

• Definition: a function $f$ defined on a convex subset $U$ of $\mathbb{R}^n$ is
quasiconcave if for every real number $a$,

$$C_a^+ = \{ x \in U \mid f(x) \ge a \}$$

is a convex set.
Thus, the level sets of the function bound convex subsets from below.

• Definition: a function $f$ is quasiconvex if for every real number $a$,

$$C_a^- = \{ x \in U \mid f(x) \le a \}$$

is a convex set.
Thus, the level sets of the function bound convex subsets from above.

• Every concave function is quasiconcave and every convex function is
quasiconvex.

Proof: Let $x$ and $y$ be two points in $C_a^+$ so that $f(x) \ge a$ and $f(y) \ge a$.
Then

$$f(tx + (1 - t)y) \ge t f(x) + (1 - t) f(y) \ge t a + (1 - t) a = a$$

So $tx + (1 - t)y$ is in $C_a^+$ and hence this set is convex. We have shown that if
$f$ is concave, it is also quasi-concave. Try to show that every convex function
is quasi-convex.

• This is the second advantage of concave functions in economics. Concave
functions are quasi-concave. Quasi-concavity is simply a desirable property when
we talk about economic objective functions such as preferences (why?).

• The property that the set above any level set of a concave function is a convex
set is a natural requirement for utility and production functions. For example,
consider an indifference curve $C$ of the concave utility function $U$. Take two
bundles on this indifference curve. The set of bundles which are preferred to
them is a convex set. In particular, the bundles that mix their contents are
in this preferred set. Thus, given any two bundles on the same indifference
curve, a consumer with a concave utility function will always (weakly) prefer a
mixture of the bundles to either of them.

• A more important advantage of the shape of the indifference curve is that it
displays a diminishing marginal rate of substitution. As one moves left to right
along the indifference curve $C$, increasing consumption of good 1, the consumer
is willing to give up more and more units of good 1 to gain an additional unit
of good 2. This property is a property of concave utility functions because each
level set forms the boundary of a convex region.

• Any (positive) monotonic transformation of a concave function is quasiconcave.

• Let $y = f(x)$ be an increasing function on $\mathbb{R}^1$. It is easy to see
graphically that the function is both quasiconcave and quasiconvex. The same
applies for a decreasing function.

• A single peaked function is quasiconcave.

• Consider the following utility function Q(x, y) = min{x, y}.

• The region above and to the right of any of this function’s level sets is a convex
set and hence Q is quasi-concave.

• Let $f$ be a function defined on a convex set $U$ in $\mathbb{R}^n$. Then, the
following statements are equivalent:

(i) $f$ is a quasiconcave function on $U$

(ii) For all $x, y \in U$ and $t \in [0, 1]$,

$$f(x) \ge f(y) \text{ implies } f(tx + (1 - t)y) \ge f(y)$$

(iii) For all $x, y \in U$ and $t \in [0, 1]$,

$$f(tx + (1 - t)y) \ge \min\{f(x), f(y)\}$$

You will prove this in class.
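
Characterization (iii) is easy to probe numerically. The sketch below (all names are illustrative, and the sampling range and seed are arbitrary assumptions) tests the utility function $Q(x, y) = \min\{x, y\}$ on random pairs of bundles and, for contrast, exhibits one violation for the convex function $x^2 + y^2$. A random search can only fail to find a violation; it does not replace the proof.

```python
import random


def mix(x, y, t):
    """The convex combination t*x + (1 - t)*y of two points."""
    return tuple(t * xi + (1 - t) * yi for xi, yi in zip(x, y))


def violates_quasiconcavity(f, x, y, t, tol=1e-12):
    """True if f(tx + (1-t)y) < min{f(x), f(y)}, i.e. condition (iii) fails."""
    return f(mix(x, y, t)) < min(f(x), f(y)) - tol


def leontief(z):
    """Q(x, y) = min{x, y}."""
    return min(z)


def bowl(z):
    """x^2 + y^2: convex, but not quasiconcave."""
    return z[0] ** 2 + z[1] ** 2


random.seed(0)
samples = [
    (tuple(random.uniform(0, 10) for _ in range(2)),
     tuple(random.uniform(0, 10) for _ in range(2)),
     random.random())
    for _ in range(1000)
]
found = any(violates_quasiconcavity(leontief, x, y, t) for x, y, t in samples)
print(found)
print(violates_quasiconcavity(bowl, (-1.0, 0.0), (1.0, 0.0), 0.5))
```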

Lecture 4: Constrained Optimization I: The Lagrangian

• We now analyze optimal allocation in the presence of scarce resources; after all,
this is what economics is all about.

• Consider the following problem:

$$\max_{x_1, x_2, \dots, x_n} f(x_1, x_2, \dots, x_n)$$

where $(x_1, x_2, \dots, x_n) \in \mathbb{R}^n$ must satisfy:

$$g_1(x_1, x_2, \dots, x_n) \le b_1, \; \dots, \; g_k(x_1, x_2, \dots, x_n) \le b_k$$

and

$$h_1(x_1, x_2, \dots, x_n) = c_1, \; \dots, \; h_m(x_1, x_2, \dots, x_n) = c_m.$$

• The function $f$ is called the objective function, while the $g$ and $h$ functions
are the constraint functions: inequality constraints ($g$) and equality
constraints ($h$).

• An example: utility maximization:

$$\max_{x_1, x_2, \dots, x_n} U(x_1, x_2, \dots, x_n)$$

subject to

$$p_1 x_1 + p_2 x_2 + \dots + p_n x_n \le I$$

$$x_1 \ge 0, \; x_2 \ge 0, \; \dots, \; x_n \ge 0$$

In this case we can treat the latter constraints as $-x_i \le 0$.

Equality constraints:

• The simple case of two variables and one equality constraint:

$$\max_{x_1, x_2} f(x_1, x_2)$$

subject to

$$p_1 x_1 + p_2 x_2 = I$$

or, more generally, subject to $h(x_1, x_2) = c$.

• Geometrical representation: draw the constraint on the $(x_1, x_2)$ plane. Draw
representative samples of level curves of the objective function $f$. The goal is
to find the highest valued level curve of $f$ which meets the constraint set. At
the solution the level curve cannot cross the constraint set; it therefore must
be tangent to it.

We need to find the slope of the level set of $f$:

$$f(x_1, x_2) = a$$

Use total differentiation:

$$\frac{\partial f(x_1, x_2)}{\partial x_1} dx_1 + \frac{\partial f(x_1, x_2)}{\partial x_2} dx_2 = 0$$

Then:

$$\frac{dx_2}{dx_1} = - \frac{\partial f(x_1, x_2)}{\partial x_1} \Big/ \frac{\partial f(x_1, x_2)}{\partial x_2}$$

So, the slope of the level set of $f$ at $x^*$ is

$$- \frac{\partial f}{\partial x_1}(x^*) \Big/ \frac{\partial f}{\partial x_2}(x^*)$$

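The slope of the constraint at $x^*$ is

$$- \frac{\partial h}{\partial x_1}(x^*) \Big/ \frac{\partial h}{\partial x_2}(x^*)$$

and hence $x^*$ satisfies:

$$\frac{\frac{\partial f}{\partial x_1}(x^*)}{\frac{\partial f}{\partial x_2}(x^*)} = \frac{\frac{\partial h}{\partial x_1}(x^*)}{\frac{\partial h}{\partial x_2}(x^*)}$$

or:

$$\frac{\frac{\partial f}{\partial x_1}(x^*)}{\frac{\partial h}{\partial x_1}(x^*)} = \frac{\frac{\partial f}{\partial x_2}(x^*)}{\frac{\partial h}{\partial x_2}(x^*)}$$

Let us denote by $\mu$ this common value:

$$\mu = \frac{\frac{\partial f}{\partial x_1}(x^*)}{\frac{\partial h}{\partial x_1}(x^*)} = \frac{\frac{\partial f}{\partial x_2}(x^*)}{\frac{\partial h}{\partial x_2}(x^*)}$$

and then we can re-write these two equations:

$$\frac{\partial f}{\partial x_1}(x^*) - \mu \frac{\partial h}{\partial x_1}(x^*) = 0$$

$$\frac{\partial f}{\partial x_2}(x^*) - \mu \frac{\partial h}{\partial x_2}(x^*) = 0$$

We therefore have three equations with three unknowns:

$$\frac{\partial f}{\partial x_1}(x^*) - \mu \frac{\partial h}{\partial x_1}(x^*) = 0$$

$$\frac{\partial f}{\partial x_2}(x^*) - \mu \frac{\partial h}{\partial x_2}(x^*) = 0$$

$$h(x_1^*, x_2^*) = c$$

We can then form the Lagrangian function:

$$L(x_1, x_2, \mu) = f(x_1, x_2) - \mu (h(x_1, x_2) - c)$$
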
and then find the critical point of $L$, by setting:

$$\frac{\partial L}{\partial x_1} = 0, \quad \frac{\partial L}{\partial x_2} = 0, \quad \frac{\partial L}{\partial \mu} = 0$$

and this gives us the same equations as above.

• The variable $\mu$ is called the Lagrange multiplier.

• We have reduced a constrained problem in two variables to an unconstrained
problem in three variables.

A caveat: it cannot be that $\frac{\partial h}{\partial x_1}(x^*) = \frac{\partial h}{\partial x_2}(x^*) = 0$.
Thus, the constraint qualification is that $x^*$ is not a critical point of $h$.

• Formally, let $f$ and $h$ be continuously differentiable functions of two
variables. Suppose that $x^* = (x_1^*, x_2^*)$ is a solution to $\max f(x_1, x_2)$
subject to $h(x_1, x_2) = c$ and that $x^*$ is not a critical point of $h$. Then there
is a real number $\mu^*$ such that $(x_1^*, x_2^*, \mu^*)$ is a critical point of the
Lagrange function

$$L(x_1, x_2, \mu) = f(x_1, x_2) - \mu (h(x_1, x_2) - c).$$

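An example:

$$\max_{x_1, x_2} \; x_1 x_2$$

subject to

$$x_1 + 4 x_2 = 16$$

The constraint qualification is satisfied, since the gradient of the constraint,
$(1, 4)$, is never zero. The Lagrangian is

$$L(x_1, x_2, \mu) = x_1 x_2 - \mu (x_1 + 4 x_2 - 16)$$

and the first order conditions are:

$$x_2 - \mu = 0$$

$$x_1 - 4\mu = 0$$

$$x_1 + 4 x_2 - 16 = 0$$

and the only solution is $x_1 = 8$, $x_2 = 2$, $\mu = 2$.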
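
The solution of this example can be verified with a short Python sketch (function names are illustrative). It evaluates the three partial derivatives of the Lagrangian at $(8, 2, 2)$ and, as an independent check, substitutes the constraint into the objective and searches over a grid of values of $x_2$.

```python
def lagrangian_gradient(x1, x2, mu):
    """Partial derivatives of L = x1*x2 - mu*(x1 + 4*x2 - 16)."""
    return (x2 - mu, x1 - 4 * mu, -(x1 + 4 * x2 - 16))


def objective_on_constraint(x2):
    """x1*x2 with x1 = 16 - 4*x2 substituted from the constraint."""
    return (16 - 4 * x2) * x2


print(lagrangian_gradient(8, 2, 2))

grid = [i / 10 for i in range(0, 41)]      # x2 from 0.0 to 4.0
best = max(grid, key=objective_on_constraint)
print(best, objective_on_constraint(best))
```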


A similar analysis easily extends to the case of several equality constraints.

Inequality constraints:

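With equality constraints, we had the following equations:

$$\frac{\partial f}{\partial x_1}(x^*) - \mu \frac{\partial h}{\partial x_1}(x^*) = 0$$

$$\frac{\partial f}{\partial x_2}(x^*) - \mu \frac{\partial h}{\partial x_2}(x^*) = 0$$

Or:

$$\begin{pmatrix} \frac{\partial f}{\partial x_1}(x^*) \\ \frac{\partial f}{\partial x_2}(x^*) \end{pmatrix} = \mu \begin{pmatrix} \frac{\partial h}{\partial x_1}(x^*) \\ \frac{\partial h}{\partial x_2}(x^*) \end{pmatrix}$$

Or:

$$\nabla f(x^*) = \mu \nabla h(x^*).$$

And we had no restrictions on $\mu$.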

• The simple case of two variables and one inequality constraint:

$$\max_{x_1, x_2} f(x_1, x_2)$$

subject to

$$g(x_1, x_2) \le b$$

Graphical representation: In the graph, the solution is where the level curve of
$f$ meets the boundary of the constraint set. This means that the constraint is
binding. There is a tangency at the solution.

• So when the constraint is binding, is it the same as an equality constraint?

• Not quite. When we look graphically at the constrained optimization problem,
even when the constraint is binding, we now have a restriction on the Lagrange
multiplier. The gradients are again in line, so that one is a multiple of the
other:

$$\nabla f(x^*) = \lambda \nabla g(x^*).$$

But now the sign of $\lambda$ is important: the gradients must also point in the same
direction, because otherwise we could increase $f$ and still satisfy the
constraint. This means that $\lambda \ge 0$. This is the main difference between
inequality and equality constraints.
We still form the Lagrangian:

$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda (g(x_1, x_2) - b)$$

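and then find the critical point of $L$, by setting:

$$\frac{\partial L}{\partial x_1} = \frac{\partial f}{\partial x_1} - \lambda \frac{\partial g}{\partial x_1} = 0$$

$$\frac{\partial L}{\partial x_2} = \frac{\partial f}{\partial x_2} - \lambda \frac{\partial g}{\partial x_2} = 0$$

But what about $\frac{\partial L}{\partial \lambda}$?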
Suppose that the optimal solution is when $g(x_1, x_2) < b$. At this point, the
constraint is not binding, as the optimal solution is in the interior. The point
$x^*$ of the optimal solution is a local maximum (it is an unconstrained maximum).
Thus:

$$\frac{\partial f}{\partial x_1}(x^*) = \frac{\partial f}{\partial x_2}(x^*) = 0$$

We can still use the Lagrangian, provided that we set $\lambda = 0$!

In other words, either the constraint is binding so that $g(x_1, x_2) - b = 0$, or
it is not binding and then $\lambda = 0$. In short, the following complementary
slackness condition has to be satisfied:

$$\lambda (g(x_1, x_2) - b) = 0.$$

Lecture 5: Constrained Optimization II: Inequality Constraints

We describe formally the constrained optimization problem with inequality
constraints:
Let $f$ and $g$ be continuously differentiable functions of two variables. Suppose
that $x^* = (x_1^*, x_2^*)$ is a solution to $\max f(x_1, x_2)$ subject to
$g(x_1, x_2) \le b$ and that $x^*$ is not a critical point of $g$ if
$g(x_1^*, x_2^*) = b$. Then given the Lagrange function

$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda (g(x_1, x_2) - b),$$

there is a real number $\lambda^*$ such that:

$$\frac{\partial L(x^*, \lambda^*)}{\partial x_1} = 0$$

$$\frac{\partial L(x^*, \lambda^*)}{\partial x_2} = 0$$

$$\lambda^* (g(x_1^*, x_2^*) - b) = 0$$

$$\lambda^* \ge 0$$

$$g(x_1^*, x_2^*) \le b$$

An example:

ABC is a perfectly competitive, profit maximizing firm, producing $y$ from input
$x$ according to $y = x^{0.5}$. The price of output is 2, and of input is 1.
Negative levels of $x$ are impossible. Also, the firm cannot buy more than $a > 0$
units of input. The firm’s maximization problem is therefore

$$\max f(x) = 2 x^{0.5} - x$$

subject to $g(x) = x \le a$ (and $x \ge 0$, which we will ignore for now). The
Lagrangian is:

$$L(x, \lambda) = 2 x^{0.5} - x - \lambda [x - a]$$

The first order condition is:

$$x^{-0.5} - 1 - \lambda = 0$$

Let us write all the information that we have:

$$x^{-0.5} - 1 - \lambda = 0$$

$$\lambda (x - a) = 0$$

$$\lambda \ge 0$$

$$x \le a$$

and solve the system of equations.


It is easiest to divide it into two cases: $\lambda > 0$ and $\lambda = 0$.
Suppose that $\lambda > 0$. This means that the constraint is binding. Then we know
that $x = a$. The full solution is therefore:

$$x = a, \quad \lambda = \frac{1}{\sqrt{a}} - 1$$

When is this solution viable? We need to keep consistency, so if we assume that
$\lambda > 0$ then we need to ensure it:

$$\frac{1}{\sqrt{a}} - 1 > 0 \Leftrightarrow a < 1$$

What if $\lambda = 0$? This means that the constraint is not binding. From the first
order condition:

$$x^{-0.5} - 1 = 0 \Leftrightarrow x = 1$$

The full solution is therefore:

$$x = 1, \quad \lambda = 0$$

and this solution holds for all $a \ge 1$.
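
The case analysis can be written as a small Python sketch. The function names `solve_firm` and `check_conditions` are ours, and the tolerance is an arbitrary assumption. The second function checks the first order condition, complementary slackness, the sign of the multiplier and feasibility for a candidate pair $(x, \lambda)$.

```python
import math


def solve_firm(a):
    """Solution (x, lambda) of max 2*sqrt(x) - x subject to x <= a, for a > 0."""
    if a < 1:
        # case lambda > 0: the constraint binds, so x = a
        return a, 1 / math.sqrt(a) - 1
    # case lambda = 0: the unconstrained optimum x = 1 is feasible
    return 1.0, 0.0


def check_conditions(x, lam, a, tol=1e-12):
    """Check the four conditions listed in the text for a candidate (x, lambda)."""
    foc = abs(x ** -0.5 - 1 - lam) < tol
    slackness = abs(lam * (x - a)) < tol
    return foc and slackness and lam >= 0 and x <= a


for a in (0.25, 1.0, 4.0):
    x, lam = solve_firm(a)
    print(a, x, lam, check_conditions(x, lam, a))
```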

Several Inequality constraints:

The generalization is easy; however, now some constraints may be binding and
some may not be binding.

An example:
We have to maximize $f(x, y, z) = xyz$ subject to the constraints that
$x + y + z \le 1$ and that $x \ge 0$, $y \ge 0$ and $z \ge 0$. The Lagrangian is

$$xyz - \lambda_1 (x + y + z - 1) + \lambda_2 x + \lambda_3 y + \lambda_4 z$$

Solving the Lagrange problem will give us a set of critical points. The optimal
solution will be in this set. But we can already restrict this set of critical
points because it is obvious that $\lambda_2 = \lambda_3 = \lambda_4 = 0$. If one of
these is positive, for example $\lambda_2 > 0$, then by complementary slackness it
must be that $x = 0$. But then the value of $xyz$ is 0, and obviously we can do
better than that (for example, when $x = y = z = 0.1$).
Thus, the non-negativity constraints cannot bind. This leaves us with a problem
with one constraint, and we have to decide whether $\lambda_1 > 0$ or $\lambda_1 = 0$.
But obviously, the constraint must bind: if $x + y + z < 1$ we can increase one of
the variables, still satisfy the constraint, and increase the value of the
function. From the first order conditions:

$$xy - \lambda_1 = 0$$

$$zy - \lambda_1 = 0$$

$$xz - \lambda_1 = 0$$

we then find that $xy = yz = zx$ and hence it follows that $x = y = z = \frac{1}{3}$
at the optimal solution.
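
A brute-force check of this example is sketched below. The grid resolution `n = 30` is an arbitrary assumption, chosen so that the point $(1/3, 1/3, 1/3)$ lies exactly on the grid; only feasible grid points are visited. The multiplier $\lambda_1$ is then read off the first order condition $xy - \lambda_1 = 0$.

```python
def objective(x, y, z):
    return x * y * z


n = 30  # grid resolution: coordinates are multiples of 1/n
best_value, best_point = -1.0, None
for i in range(n + 1):
    for j in range(n + 1 - i):
        for k in range(n + 1 - i - j):
            # i + j + k <= n, so x + y + z <= 1 and all coordinates are >= 0
            x, y, z = i / n, j / n, k / n
            value = objective(x, y, z)
            if value > best_value:
                best_value, best_point = value, (x, y, z)

lambda_1 = (1 / 3) * (1 / 3)   # from xy - lambda_1 = 0 at x = y = 1/3
print(best_point, best_value, lambda_1)
```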

We have looked at: $\max f(x, y)$ subject to $g(x, y) \le b$.
We have characterized necessary conditions for a maximum: if $x^*$ is a solution
to a constrained optimization problem (it maximizes $f$ subject to some
constraints), then it is also a critical point of the Lagrangian. So we find the
critical points of the Lagrangian.

• Can we then say that these are the solutions for the constrained optimization
problem? In other words:

• Can we say that these are maximizers of the Lagrangian, and if these are max-
imizers of the Lagrangian, are these also maximizers of f (subject to the con-
straint)?

To determine the answer, let $(x', y', \lambda)$ satisfy all necessary conditions for
a maximum. It is clear that if $(x', y')$ is a maximizer of the Lagrangian, it also
maximizes $f$ subject to the constraint.
To see this note that $\lambda [g(x', y') - b] = 0$. Thus,
$f(x', y') = f(x', y') - \lambda (g(x', y') - b)$.
Since $\lambda \ge 0$ and $g(x, y) \le b$ for all other feasible $(x, y)$, we have
$f(x, y) - \lambda (g(x, y) - b) \ge f(x, y)$.
Since $(x', y')$ maximizes the Lagrangian, for all other $(x, y)$:

$$f(x', y') - \lambda (g(x', y') - b) \ge f(x, y) - \lambda (g(x, y) - b)$$

which implies that

$$f(x', y') \ge f(x, y)$$

So if $(x', y')$ maximizes the Lagrangian, it also maximizes $f(x, y)$ subject to
$g(x, y) \le b$.

• Recall the main results from unconstrained optimization:

• If $f$ is a concave function defined on a convex subset $X$ in $\mathbb{R}^n$ and
$x_0$ is a point in the interior of $X$ at which $Df(x_0) = 0$, then $x_0$ maximizes
$f(x)$ in $X$, that is, $f(x) \le f(x_0)$ for all $x$ in $X$.

• You have shown in class that in the constrained optimization problem, if $f$ is
concave and $g$ is convex, then the Lagrangian function is also concave in $x$
(for $\lambda \ge 0$). This means that the first order conditions are sufficient.

The Kuhn-Tucker Theorem:

Consider the problem of maximizing $f(x)$ subject to the constraint that
$g(x) \le b$. Assume that $f$ and $g$ are differentiable, $f$ is concave, $g$ is
convex, and that the constraint qualification holds. Then $x^*$ solves this problem
if and only if there is a scalar $\lambda$ such that

$$\frac{\partial L(x^*, \lambda)}{\partial x_i} = \frac{\partial}{\partial x_i} f(x^*) - \lambda \frac{\partial}{\partial x_i} g(x^*) = 0 \quad \text{for all } i$$

$$\lambda \ge 0$$

$$g(x^*) \le b$$

$$\lambda [b - g(x^*)] = 0$$

Mechanically (that is, without thinking...), one can solve constrained optimization
problems in the following way:

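• Form the Lagrangian $L(x, \lambda) = f(x) - \lambda(g(x) - b)$. With $m$ constraints $g_j(x) \le b_j$ this becomes $L(x, \lambda) = f(x) - \sum_{j=1}^{m} \lambda_j (g_j(x) - b_j)$.

• Suppose that there exist $x^*$ and $\lambda^*$ such that the first order conditions are satisfied, that is:

$$
\begin{aligned}
\frac{\partial L(x^*, \lambda^*)}{\partial x_i} &= 0 \quad \text{for all } i \\
\lambda^* &\ge 0 \\
\lambda_j^* (g_j(x^*) - b_j) &= 0 \quad \text{for all } j
\end{aligned}
$$

• Assume that $g_1$ to $g_e$ are binding and that $g_{e+1}$ to $g_m$ are not binding. Write $(g_1, \dots, g_e)$ as $g_E$. Assume also that the Hessian of $L$ with respect to $x$ at $(x^*, \lambda^*)$ is negative definite on the linear constraint set $\{v : Dg_E(x^*) v = 0\}$, that is:

$$v \ne 0, \; Dg_E(x^*) v = 0 \;\Rightarrow\; v^T (D_x^2 L(x^*, \lambda^*)) v < 0,$$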

• Then x∗ is a strict local constrained max of f on the constraint set.

• To check this condition, we form the bordered Hessian:

$$
Q = \begin{pmatrix} 0 & Dg_E(x^*) \\ Dg_E(x^*)^T & D_x^2 L(x^*, \lambda^*) \end{pmatrix}
$$

If the last $n - e$ leading principal minors of $Q$ alternate in sign, with the sign of the determinant of the largest matrix the same as the sign of $(-1)^n$, then sufficient second order conditions hold for a candidate point to be a solution of a constrained maximization problem.
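The minor test can be sketched in a few lines. The example below is an assumption made for illustration: $f(x, y) = xy$ with the binding constraint $x + y = 2$, candidate $(1, 1)$ and $\lambda = 1$, so that $n = 2$, $e = 1$ and only the determinant of $Q$ itself needs checking. The helpers `det` and `second_order_max_check` are hypothetical names, and the check reads "alternate in sign" as: the leading principal minor of order $e + r$ has the sign of $(-1)^r$ for $r = e + 1, \dots, n$.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def second_order_max_check(q, n, e):
    """Check the last n - e leading principal minors of the bordered Hessian q.

    The minor of order e + r must have the sign of (-1)^r, so that the
    largest one (r = n, the determinant of q itself) has the sign of (-1)^n.
    """
    for r in range(e + 1, n + 1):
        size = e + r
        minor = det([row[:size] for row in q[:size]])
        if minor * (-1) ** r <= 0:
            return False
    return True


# Assumed example: f = x*y, constraint x + y = 2, candidate (1, 1), lambda = 1.
# Dg_E = (1, 1) and D_x^2 L = [[0, 1], [1, 0]].
Q_MAX = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

# Contrast: f = x^2 + y^2 with the same constraint is a constrained minimum at (1, 1).
Q_MIN = [
    [0.0, 1.0, 1.0],
    [1.0, 2.0, 0.0],
    [1.0, 0.0, 2.0],
]

print(det(Q_MAX), second_order_max_check(Q_MAX, n=2, e=1))
print(det(Q_MIN), second_order_max_check(Q_MIN, n=2, e=1))
```

With $n = 2$ the required sign is $(-1)^2 > 0$, so the first candidate passes and the second does not.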

Lecture 6: Constrained Optimization III: Maximum value functions

Profit functions and indirect utility functions are examples of maximum value functions, whereas cost functions and expenditure functions are minimum value functions.

• Maximum value function, a definition:

If x(b) solves the problem of maximizing f (x) subject to g(x) ≤ b, the maximum
value function is v(b) = f (x(b)).

• The maximum value function is non-decreasing: raising b enlarges the set of feasible points, so the maximum cannot fall.
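A one-variable sketch of this property, under an assumed example not taken from the notes: maximize $f(x) = -(x - 3)^2$ subject to $x \le b$. For this problem the solution is $x(b) = \min(b, 3)$, which is used directly below; the function names are hypothetical.

```python
def f(x):
    return -(x - 3.0) ** 2


def x_of_b(b):
    # Solution of: maximize f(x) subject to x <= b (closed form for this example).
    return min(b, 3.0)


def v(b):
    # Maximum value function v(b) = f(x(b)).
    return f(x_of_b(b))


b_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
v_values = [v(b) for b in b_values]

# v rises while the constraint binds (b < 3) and is flat once it is slack.
non_decreasing = all(lo <= hi for lo, hi in zip(v_values, v_values[1:]))
print(v_values, non_decreasing)
```

The flat part for $b \ge 3$ shows why the property is "non-decreasing" rather than "increasing": relaxing a constraint that is not binding leaves the maximum value unchanged.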

Maximum value functions and the interpretation of the Lagrange multiplier

• Consider the problem of maximizing $f(x_1, x_2, \dots, x_n)$ subject to the $k$ inequality constraints

$$g_1(x_1, x_2, \dots, x_n) \le b_1^*, \;\dots,\; g_k(x_1, x_2, \dots, x_n) \le b_k^*$$

where $b^* = (b_1^*, \dots, b_k^*)$. Let $x_1^*(b^*), \dots, x_n^*(b^*)$ denote the optimal solution and let $\lambda_1(b^*), \dots, \lambda_k(b^*)$ be the corresponding Lagrange multipliers. Suppose that, as $b$ varies near $b^*$, $x_1^*(b), \dots, x_n^*(b)$ and $\lambda_1(b), \dots, \lambda_k(b)$ are differentiable functions of $b$ and that $x^*(b^*)$ satisfies the constraint qualification. Then for each $j = 1, 2, \dots, k$:

$$\lambda_j(b^*) = \frac{\partial}{\partial b_j} f(x^*(b^*))$$

• Proof: For simplicity, we do here the case of a single equality constraint $h(x, y) = b$, with $f$ and $h$ being functions of two variables. The Lagrangian is

$$L(x, y, \lambda; b) = f(x, y) - \lambda(h(x, y) - b)$$

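The solution satisfies:

$$
\begin{aligned}
0 &= \frac{\partial L}{\partial x}(x^*(b), y^*(b), \lambda^*(b); b) \\
  &= \frac{\partial f}{\partial x}(x^*(b), y^*(b)) - \lambda^*(b) \frac{\partial h}{\partial x}(x^*(b), y^*(b)), \\
0 &= \frac{\partial L}{\partial y}(x^*(b), y^*(b), \lambda^*(b); b) \\
  &= \frac{\partial f}{\partial y}(x^*(b), y^*(b)) - \lambda^*(b) \frac{\partial h}{\partial y}(x^*(b), y^*(b)),
\end{aligned}
$$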

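for all $b$. Furthermore, since $h(x^*(b), y^*(b)) = b$ for all $b$,

$$\frac{\partial h}{\partial x}(x^*, y^*) \frac{\partial x^*(b)}{\partial b} + \frac{\partial h}{\partial y}(x^*, y^*) \frac{\partial y^*(b)}{\partial b} = 1$$

for every $b$. Therefore, using the chain rule, we have:

$$
\begin{aligned}
\frac{d f(x^*(b), y^*(b))}{db} &= \frac{\partial f}{\partial x}(x^*, y^*) \frac{\partial x^*(b)}{\partial b} + \frac{\partial f}{\partial y}(x^*, y^*) \frac{\partial y^*(b)}{\partial b} \\
&= \lambda^*(b) \left[ \frac{\partial h}{\partial x}(x^*, y^*) \frac{\partial x^*(b)}{\partial b} + \frac{\partial h}{\partial y}(x^*, y^*) \frac{\partial y^*(b)}{\partial b} \right] \\
&= \lambda^*(b).
\end{aligned}
$$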
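The result $dv/db = \lambda^*(b)$ is easy to confirm numerically. The sketch below assumes an example that is not in the notes: maximize $f(x, y) = xy$ subject to $h(x, y) = x + y = b$, whose first order conditions give $x^* = y^* = \lambda^* = b/2$. The derivative of the value function is approximated by a central finite difference; all function names are hypothetical.

```python
def solve(b):
    """Closed-form solution of: maximize x*y subject to x + y = b (assumed example)."""
    x = b / 2.0
    y = b / 2.0
    lam = y  # first order condition: df/dx - lam * dh/dx = y - lam = 0
    return x, y, lam


def v(b):
    """Maximum value function v(b) = f(x*(b), y*(b))."""
    x, y, _ = solve(b)
    return x * y


def dv_db(b, step=1e-6):
    """Central finite-difference approximation of dv/db."""
    return (v(b + step) - v(b - step)) / (2.0 * step)


b0 = 4.0
multiplier = solve(b0)[2]
slope = dv_db(b0)
print(multiplier, slope)
```

Here the multiplier at $b = 4$ equals 2, and the finite-difference slope of $v(b) = b^2/4$ agrees with it up to rounding error.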

• The economic interpretation of the multiplier as a ‘shadow price’: For example, in the application for a firm maximizing profits, it tells us how valuable another unit of input would be to the firm’s profits, or how much the maximum value changes for the firm when the constraint is relaxed. In other words, it is the maximum amount the firm would be willing to pay to acquire another unit of input.

• Recall that

$$L(x, y, \lambda; b) = f(x, y) - \lambda(g(x, y) - b),$$

so that

$$\frac{d}{db} f(x(b), y(b); b) = \lambda(b) = \frac{\partial}{\partial b} L(x(b), y(b), \lambda(b); b)$$

Hence, what we have found above is simply a particular case of the envelope
theorem, which says that

$$\frac{d}{db} f(x(b), y(b); b) = \frac{\partial}{\partial b} L(x(b), y(b), \lambda(b); b)$$

Maximum value functions and Envelope theorem:

• Consider the problem of maximizing $f(x_1, x_2, \dots, x_n)$ subject to the $k$ equality constraints

$$h_1(x_1, x_2, \dots, x_n, c) = 0, \;\dots,\; h_k(x_1, x_2, \dots, x_n, c) = 0$$

Let $x_1^*(c), \dots, x_n^*(c)$ denote the optimal solution and let $\mu_1(c), \dots, \mu_k(c)$ be the corresponding Lagrange multipliers. Suppose that $x_1^*(c), \dots, x_n^*(c)$ and $\mu_1(c), \dots, \mu_k(c)$ are differentiable functions and that $x^*(c)$ satisfies the constraint qualification. Then:

$$\frac{d}{dc} f(x^*(c); c) = \frac{\partial}{\partial c} L(x^*(c), \mu(c); c)$$

• Note: if $h_i(x_1, x_2, \dots, x_n, c) = 0$ can be expressed as some $h_i'(x_1, x_2, \dots, x_n) - c = 0$, then we are back at the previous case, in which we have found that

$$\frac{d}{dc} f(x^*(c), c) = \frac{\partial}{\partial c} L(x^*(c), \mu(c); c) = \lambda_j(c)$$

But the statement is more general.

• We will prove this for the simple case of an unconstrained problem. Let $\phi(x; a)$ be a continuously differentiable function of $x \in R^n$ and the scalar $a$. For any $a$, consider the problem of maximizing $\phi(x; a)$ over $x$. Let $x^*(a)$ be the solution of this problem, and assume that it is a differentiable function of $a$. We will show that

$$\frac{d}{da} \phi(x^*(a); a) = \frac{\partial}{\partial a} \phi(x^*(a); a)$$
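We compute via the chain rule that

$$
\begin{aligned}
\frac{d}{da} \phi(x^*(a); a) &= \sum_i \frac{\partial \phi}{\partial x_i}(x^*(a); a) \frac{\partial x_i^*}{\partial a}(a) + \frac{\partial \phi}{\partial a}(x^*(a); a) \\
&= \frac{\partial \phi}{\partial a}(x^*(a); a)
\end{aligned}
$$

since $\frac{\partial \phi}{\partial x_i}(x^*(a); a) = 0$ for all $i$ by the first order conditions.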
• Intuitively, when we are already at a maximum, changing slightly the parameters of the problem or the constraints does not affect the value through changes in the solution $x^*(a)$, because $\frac{\partial \phi}{\partial x_i}(x^*(a); a) = 0$.
• When we use the envelope theorem we have to make sure though that we do
not jump to another solution in a discrete manner.
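The unconstrained envelope result can be illustrated with finite differences. The sketch assumes the example $\phi(x; a) = -x^2 + 2ax - a^3$, which is not from the notes; its first order condition gives $x^*(a) = a$. The total derivative lets $x^*$ adjust with $a$, while the partial derivative holds $x$ fixed at $x^*(a)$. Function names and the step size are hypothetical choices.

```python
def phi(x, a):
    # Assumed objective for illustration.
    return -x * x + 2.0 * a * x - a ** 3


def x_star(a):
    # Maximizer from the first order condition: -2x + 2a = 0.
    return a


def total_derivative(a, step=1e-6):
    """d/da of phi(x*(a); a): the solution is re-optimized as a changes."""
    up = phi(x_star(a + step), a + step)
    down = phi(x_star(a - step), a - step)
    return (up - down) / (2.0 * step)


def partial_derivative(a, step=1e-6):
    """Partial derivative of phi with respect to a, holding x fixed at x*(a)."""
    x = x_star(a)
    return (phi(x, a + step) - phi(x, a - step)) / (2.0 * step)


a0 = 0.5
total = total_derivative(a0)
partial = partial_derivative(a0)
print(total, partial)
```

Both numbers approximate $2a - 3a^2 = 0.25$ at $a = 0.5$: the indirect effect through $x^*(a)$ drops out, as the theorem says.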

Comparative Statics

More generally in economic theory, once we pin down an equilibrium or a solution to an optimization problem, we are interested in how the exogenous variables change the value of the endogenous variables.

We have been using the Implicit Function Theorem (IFT) throughout without stating it or explaining why we can use it. The IFT assures us that a set of simultaneous equations:

$$
\begin{aligned}
F^1(y_1, \dots, y_n; x_1, \dots, x_m) &= 0 \\
F^2(y_1, \dots, y_n; x_1, \dots, x_m) &= 0 \\
&\;\;\vdots \\
F^n(y_1, \dots, y_n; x_1, \dots, x_m) &= 0
\end{aligned}
$$

will define a set of implicit functions:

$$
\begin{aligned}
y_1 &= f^1(x_1, \dots, x_m) \\
y_2 &= f^2(x_1, \dots, x_m) \\
&\;\;\vdots \\
y_n &= f^n(x_1, \dots, x_m)
\end{aligned}
$$

In other words, what the conditions of the IFT serve to do is to assure that the n
equations can in principle be solved for the n variables, y1 , ..., yn , even if we may not
be able to obtain the solution in an explicit form.

• Given the set of simultaneous equations above, if the functions $F^1, \dots, F^n$ all have continuous partial derivatives with respect to all $x$ and $y$ variables, and if at a point $(y^0, x^0)$ that solves the set of simultaneous equations the determinant of the $(n \times n)$ Jacobian with respect to the $y$-variables is not 0:

$$
|J| = \begin{vmatrix}
\frac{\partial F^1}{\partial y_1} & \frac{\partial F^1}{\partial y_2} & \cdots & \frac{\partial F^1}{\partial y_n} \\
\frac{\partial F^2}{\partial y_1} & \frac{\partial F^2}{\partial y_2} & \cdots & \frac{\partial F^2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial F^n}{\partial y_1} & \frac{\partial F^n}{\partial y_2} & \cdots & \frac{\partial F^n}{\partial y_n}
\end{vmatrix} \ne 0
$$

then there exists an $m$-dimensional neighbourhood of $x^0$ in which the variables $y_1, \dots, y_n$ are functions of $x_1, \dots, x_m$ according to the $f^i$ functions defined above. These functions satisfy $y_i^0 = f^i(x^0)$. They also satisfy the set of simultaneous equations for every vector $x$ in the neighbourhood, thereby giving the set of simultaneous equations above the status of a set of identities in this neighbourhood. Moreover, the implicit functions $f^i$ are continuous and have continuous partial derivatives with respect to all the $x$ variables.

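• It is then possible to find the partial derivatives of the implicit functions without having to solve the equations for the $y$ variables. Taking advantage of the fact that in the neighbourhood of the solution the set of equations has the status of identities, we can take the total differential of each equation and write $dF^j = 0$. When considering only $dx_1 \ne 0$ and setting the rest $dx_i = 0$, the result, in matrix notation, is (we will go through an example later in class):

$$
\begin{pmatrix}
\frac{\partial F^1}{\partial y_1} & \frac{\partial F^1}{\partial y_2} & \cdots & \frac{\partial F^1}{\partial y_n} \\
\frac{\partial F^2}{\partial y_1} & \frac{\partial F^2}{\partial y_2} & \cdots & \frac{\partial F^2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial F^n}{\partial y_1} & \frac{\partial F^n}{\partial y_2} & \cdots & \frac{\partial F^n}{\partial y_n}
\end{pmatrix}
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} \\
\frac{\partial y_2}{\partial x_1} \\
\vdots \\
\frac{\partial y_n}{\partial x_1}
\end{pmatrix}
= -
\begin{pmatrix}
\frac{\partial F^1}{\partial x_1} \\
\frac{\partial F^2}{\partial x_1} \\
\vdots \\
\frac{\partial F^n}{\partial x_1}
\end{pmatrix}
$$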

• Finally, since $|J|$ is non-zero there is a unique nontrivial solution to this linear system, which by Cramer’s rule can be identified in the following way:

$$\frac{\partial y_j}{\partial x_1} = \frac{|J_j|}{|J|},$$

where $J_j$ is the matrix $J$ with its $j$-th column replaced by the right-hand side vector of the system.

This is for general problems. Optimization problems have a special feature: the condition $|J| \ne 0$ is indeed satisfied. (What is $J$? It is simply the matrix of second partial derivatives of $L$, or what we call the bordered Hessian.) We will see that later on.
This means that indeed we can take the maximum value function, or set of
equilibrium conditions, totally differentiate them and find how the endogenous
variables change with the exogenous ones in the neighbourhood of the solution.
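To make the Cramer’s rule step concrete, the sketch below works through a two-equation system chosen purely for illustration (it is not from the notes): $F^1 = y_1 y_2 - x = 0$ and $F^2 = y_1 - y_2 = 0$, whose solution for $x > 0$ is $y_1 = y_2 = \sqrt{x}$, so the answer can be checked against $d\sqrt{x}/dx = 1/(2\sqrt{x})$. The helper names are hypothetical.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def replace_column(m, j, col):
    """Return a copy of the 2x2 matrix m with column j replaced by col."""
    return [[col[i] if k == j else m[i][k] for k in range(2)] for i in range(2)]


def implicit_derivatives(y1, y2):
    """dy1/dx and dy2/dx for F1 = y1*y2 - x = 0, F2 = y1 - y2 = 0 (assumed system)."""
    # Jacobian of (F1, F2) with respect to (y1, y2).
    jac = [[y2, y1], [1.0, -1.0]]
    # Right-hand side: minus the partial derivatives of (F1, F2) w.r.t. x.
    rhs = [1.0, 0.0]
    det_j = det2(jac)
    if det_j == 0.0:
        raise ValueError("Jacobian is singular: the IFT does not apply here")
    # Cramer's rule: dy_j/dx = |J_j| / |J|.
    return [det2(replace_column(jac, j, rhs)) / det_j for j in range(2)]


x0 = 4.0
y0 = math.sqrt(x0)  # solution of the system at x0: y1 = y2 = 2
derivs = implicit_derivatives(y0, y0)
explicit = 1.0 / (2.0 * math.sqrt(x0))
print(derivs, explicit)
```

In this example the system can be solved explicitly, which is what makes the comparison possible; the point of the IFT is that the same calculation goes through when no explicit solution is available.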

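For example, for the case of optimization with one equality constraint, the system

$$
\begin{aligned}
F^1(\lambda, x, y; b) &= 0 \\
F^2(\lambda, x, y; b) &= 0 \\
F^3(\lambda, x, y; b) &= 0
\end{aligned}
$$

is given by

$$
\begin{aligned}
b - g(x, y) &= 0 \\
f_x - \lambda g_x &= 0 \\
f_y - \lambda g_y &= 0
\end{aligned}
$$

We need to ensure that the Jacobian determinant is not zero, and then we can use total differentiation.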

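Coming back to the condition about the Jacobian, we need to ensure that:

$$
|J| = \begin{vmatrix}
\frac{\partial F^1}{\partial \lambda} & \frac{\partial F^1}{\partial x} & \frac{\partial F^1}{\partial y} \\
\frac{\partial F^2}{\partial \lambda} & \frac{\partial F^2}{\partial x} & \frac{\partial F^2}{\partial y} \\
\frac{\partial F^3}{\partial \lambda} & \frac{\partial F^3}{\partial x} & \frac{\partial F^3}{\partial y}
\end{vmatrix} \ne 0
$$

or:

$$
\begin{vmatrix}
0 & -g_x & -g_y \\
-g_x & f_{xx} - \lambda g_{xx} & f_{xy} - \lambda g_{xy} \\
-g_y & f_{xy} - \lambda g_{xy} & f_{yy} - \lambda g_{yy}
\end{vmatrix} \ne 0
$$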

But the determinant of $J$ is that of the bordered Hessian $\bar{H}$. Whenever sufficient second order conditions are satisfied, we know that the determinant of the bordered Hessian is not zero (in fact, in this case of two variables and one constraint, it is positive).
Now we can totally differentiate the equations:

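$$
\begin{aligned}
g_x\,dx + g_y\,dy - 1\,db &= 0 \\
(f_{xx} - \lambda g_{xx})\,dx + (f_{xy} - \lambda g_{xy})\,dy - g_x\,d\lambda &= 0 \\
(f_{yx} - \lambda g_{yx})\,dx + (f_{yy} - \lambda g_{yy})\,dy - g_y\,d\lambda &= 0
\end{aligned}
$$

where at the equilibrium solution, one can then solve for $\frac{\partial x}{\partial b}$, $\frac{\partial y}{\partial b}$, $\frac{\partial \lambda}{\partial b}$.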
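As a worked illustration, the three differentiated equations can be solved by Cramer’s rule for a specific problem. The example is an assumption and not part of the notes: $f(x, y) = xy$ with constraint $g(x, y) = x + y = b$, for which $x = y = \lambda = b/2$, so each derivative with respect to $b$ should equal $1/2$. The unknowns are ordered $(\partial\lambda/\partial b, \partial x/\partial b, \partial y/\partial b)$ so that the coefficient matrix is the bordered Hessian $J$ written above; the helper names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def replace_column(m, j, col):
    """Return a copy of the 3x3 matrix m with column j replaced by col."""
    return [[col[i] if k == j else m[i][k] for k in range(3)] for i in range(3)]


# Assumed example: f = x*y, g = x + y, so g_x = g_y = 1, f_xy = 1,
# and all other second derivatives of f and g are zero.
g_x, g_y = 1.0, 1.0
l_xx, l_xy, l_yy = 0.0, 1.0, 0.0  # second derivatives of L with respect to x, y

# Bordered Hessian J, rows F1, F2, F3 and columns (lambda, x, y).
J = [
    [0.0, -g_x, -g_y],
    [-g_x, l_xx, l_xy],
    [-g_y, l_xy, l_yy],
]
# Right-hand side: minus the partials of (F1, F2, F3) with respect to b,
# where F1 = b - g(x, y) is the only equation in which b appears.
rhs = [-1.0, 0.0, 0.0]

det_j = det3(J)
dlam_db, dx_db, dy_db = [det3(replace_column(J, j, rhs)) / det_j for j in range(3)]
print(det_j, dlam_db, dx_db, dy_db)
```

The determinant is positive here, in line with the second order condition for two variables and one constraint, and the three derivatives match the closed-form solution $x = y = \lambda = b/2$.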
