1 Linear Algebra
Simultaneous Linear Equations:
Example:
3x1 + 2x2 = 6
6x1 + 7x2 = −2

Multiply equation 1 by 2:

6x1 + 4x2 = 12

and subtract it from equation 2:

3x2 = −14, so that x2 = −14/3.

Substitute this into equation 1:

3x1 + 2(−14/3) = 6
3x1 = 6 + 28/3 = (18 + 28)/3 = 46/3
x1 = 46/9.

The next two equations are equivalent to the first two.

3x1 + 2x2 = 6
     3x2 = −14

x1 + (2/3)x2 = 2, or x1 = 2 − (2/3)(−14/3) = 46/9
          x2 = −14/3
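The arithmetic above is easy to check numerically; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Coefficients and right-hand side of the example system
#   3x1 + 2x2 = 6
#   6x1 + 7x2 = -2
A = np.array([[3.0, 2.0],
              [6.0, 7.0]])
y = np.array([6.0, -2.0])

x = np.linalg.solve(A, y)  # should give x1 = 46/9, x2 = -14/3
```

np.linalg.solve performs essentially the elimination carried out by hand above.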
The last pair is said to be row reduced and in echelon form. We want to do this more generally:
a11 x1 + a12 x2 + ··· + a1N xN = y1
a21 x1 + a22 x2 + ··· + a2N xN = y2
  ⋮
aM1 x1 + aM2 x2 + ··· + aMN xN = yM
The amn and ym are numbers and x1, ..., xN are unknown. In order to be more systematic, we write the equations as:
Ax = y, where

A = ( a11 ··· a1N )
    (  ⋮        ⋮ )   is an M × N matrix,
    ( aM1 ··· aMN )

x = (x1, ..., xN)ᵀ is an N-vector of unknowns, and

y = (y1, ..., yM)ᵀ is an M-vector of numbers.

Ax is the M-vector

( a11 x1 + ··· + a1N xN )
( a21 x1 + ··· + a2N xN )
(           ⋮           )
( aM1 x1 + ··· + aMN xN )
Consider the following so called elementary operations on an M × N matrix A:
1. Multiply a row of A by a non-zero number.
2. Replace a row by that row plus c times another row, where c is a non-zero number.
3. Interchange two rows.
If the M × N matrix B is obtained from A by any one of these operations, then A and B are equivalent in the sense that the equations Bx = 0 and Ax = 0 have the same solutions. (Think this through.)
Similarly, if the M × (N+1) matrix (B | z) is obtained from the M × (N+1) matrix (A | y) by an elementary row operation, then the systems Bx = z and Ax = y have the same solutions.
Elementary row operations can transform any system Ax = y into a system Bx = z where
a) the first non-zero entry in any row of B is 1, and
b) each column of B that contains the leading non-zero entry of some row has all its other entries 0.
Example:

( 0 1 4 0 )
( 0 0 0 0 )
( 1 0 3 0 )
( 0 0 0 1 )

is row reduced.
Example:

( 3 2 1 ) ( x1 )   ( 3 )
( 6 4 2 ) ( x2 ) = ( 6 )
( 6 8 5 ) ( x3 )   ( 0 )

→

( 3 2 1 ) ( x1 )   (  3 )
( 0 0 0 ) ( x2 ) = (  0 )
( 0 4 3 ) ( x3 )   ( −6 )

→

( 1 2/3 1/3 ) ( x1 )   (    1 )
( 0  0   0  ) ( x2 ) = (    0 )
( 0  1  3/4 ) ( x3 )   ( −3/2 )

→

( 1 2/3 1/3 ) ( x1 )   (    1 )
( 0  1  3/4 ) ( x2 ) = ( −3/2 )
( 0  0   0  ) ( x3 )   (    0 )

→

( 1 0 −1/6 ) ( x1 )   (    2 )
( 0 1  3/4 ) ( x2 ) = ( −3/2 )
( 0 0   0  ) ( x3 )   (    0 )
The matrix

( 1 0 −1/6 )
( 0 1  3/4 )
( 0 0   0  )

is an example of a row reduced echelon matrix:
a) it is row reduced,
b) any row of zeros lies below all non-zero rows, and
c) if the first r rows are non-zero and the leading non-zero entry of row m is in column nm, for m = 1, ..., r, then n1 < n2 < ··· < nr.
Example: The matrix

( 1 0 3 0 )
( 0 1 4 0 )
( 0 0 0 1 )
( 0 0 0 0 )

is a row reduced echelon matrix.
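Row reduced echelon form can be computed mechanically; a sketch, assuming SymPy is available, applied to the augmented matrix of the earlier 3 × 3 example:

```python
from sympy import Matrix, Rational

# Augmented matrix (A | y) of the system solved above
M = Matrix([[3, 2, 1, 3],
            [6, 4, 2, 6],
            [6, 8, 5, 0]])

# rref() returns the row reduced echelon form and the pivot columns
R, pivots = M.rref()
```

R reproduces the final system of the worked example, with pivots in the first two columns.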
the non-zero rows of B. For 1 ≤ m ≤ r, let the leading non-zero entry of row m be in column nm, where n1 < n2 < ··· < nr. Since r < N, there is an n such that 1 ≤ n ≤ N and n ≠ nm, for any m. For such an n, let xn = 1. For k such that 1 ≤ k ≤ N, let xk = 0, if k ≠ n and k ≠ nm, for all m. It is now possible to solve for xn, for n = n1, ..., nr, so that Bx = 0. Then Ax = 0 and x ≠ 0.
2 Vector Spaces
Consider RN = {v = (v1, ..., vN) | vn ∈ R, for all n}, where R is the set of real numbers. We can define the following operations on RN. If v and w belong to RN, v + w = (v1, ..., vN) + (w1, ..., wN) = (v1 + w1, ..., vN + wN). If c ∈ R and v ∈ RN, cv = c(v1, ..., vN) = (cv1, ..., cvN). Let 0 = (0, 0, ..., 0) ∈ RN. Observe that
a) x + y ∈ RN, if x ∈ RN and y ∈ RN;
b) x + y = y + x;
c) there is a 0 ∈ RN such that 0 + x = x, for all x ∈ RN;
d) for all x ∈ RN, there is a unique −x ∈ RN such that x + (−x) = 0;
e) 1x = x, for x ∈ RN;
f) (c1 c2)x = c1(c2 x), for all numbers c1 and c2 and for all x ∈ RN;
g) (x + y) + z = x + (y + z), for all x, y, and z in RN;
h) c(x + y) = cx + cy, for numbers c and for x and y in RN;
i) (c1 + c2)x = c1 x + c2 x, for all numbers c1 and c2 and for all x ∈ RN.
Definition: If V is a vector space, v ∈ V is said to be a linear combination of w1, ..., wN ∈ V if there are numbers c1, ..., cN such that v = c1 w1 + ··· + cN wN.
Definitions: If w1, ..., wN ∈ V, their linear span is the set of all linear combinations of w1, ..., wN. The linear span of w1, ..., wN is a subspace of V and is the smallest subspace containing w1, ..., wN. The vectors w1, ..., wN are said to span V if V is the linear span of w1, ..., wN.
Definition: Vectors v1, ..., vN in V are linearly independent if they are not dependent.
Example: (1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (1, 1, 0)ᵀ are dependent in R3, since

(1, 0, 0)ᵀ + (0, 1, 0)ᵀ − (1, 1, 0)ᵀ = (0, 0, 0)ᵀ.

(1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (0, 0, 1)ᵀ are independent, since

(0, 0, 0)ᵀ = c1 (1, 0, 0)ᵀ + c2 (0, 1, 0)ᵀ + c3 (0, 0, 1)ᵀ = (c1, c2, c3)ᵀ ⟹ c1 = c2 = c3 = 0.
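Independence can be tested numerically: vectors are independent exactly when the matrix having them as columns has full column rank. A sketch, assuming NumPy is available:

```python
import numpy as np

# Columns of D are the dependent vectors from the example above
D = np.array([[1, 0, 1],
              [0, 1, 1],
              [0, 0, 0]])
# Columns of E are the independent standard basis vectors
E = np.eye(3)

dep_rank = np.linalg.matrix_rank(D)  # 2 < 3, so the columns are dependent
ind_rank = np.linalg.matrix_rank(E)  # 3 = 3, so the columns are independent
```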
Example: sin and cos are independent, for suppose that a sin + b cos = 0. Then 0 = a sin(π/2) + b cos(π/2) = a and 0 = a sin(0) + b cos(0) = b, so that a = b = 0. sin + 2 cos and −2 sin − 4 cos are dependent, since 2(sin + 2 cos) + (−2 sin − 4 cos) = 0.
Example: Let en = (0, ..., 0, 1, 0, ..., 0) ∈ RN, with the 1 in the nth slot. Then e1, ..., eN is the standard basis for RN.
Theorem: If v1 ; :::; vM span a vector space V , then any independent set of vectors
in V has no more than M elements.
Proof: I must show that if N > M and w1, ..., wN are in V, then w1, ..., wN are linearly dependent. Since v1, ..., vM span V, wn = Σ_{m=1}^M amn vm, for all n and for some numbers a1n, ..., aMn. If x1, ..., xN are numbers, then

x1 w1 + ··· + xN wN = Σ_{n=1}^N xn wn = Σ_{n=1}^N xn Σ_{m=1}^M amn vm = Σ_{n=1}^N Σ_{m=1}^M amn xn vm = Σ_{m=1}^M ( Σ_{n=1}^N amn xn ) vm.

Since N > M, a previous theorem implies that there exist numbers x1, ..., xN, not all zero, such that Σ_n amn xn = 0, for m = 1, ..., M. Hence, x1 w1 + ··· + xN wN = 0 and w1, ..., wN are linearly dependent.
Corollary: If V is a …nite dimensional vector space, then any two bases have the
same number of elements.
Theorem: If W is a subspace of the finite dimensional vector space V and W ≠ V, then dim W < dim V.
Proof: Let M = dim W and N = dim V. I must show that M < N. If W = {0}, then dim W = 0 < dim V. Suppose that W ≠ {0}. If w1, ..., wM are linearly independent vectors in W, they are linearly independent in V and so M ≤ N. Therefore, there is a linearly independent set of vectors in W with a largest number of elements, say w1, ..., wr. By the previous lemma, w1, ..., wr is a basis for W and r = dim W. Since W ≠ V, there is a v in V such that v ∉ W. By the previous lemma, w1, ..., wr, v are independent and hence N ≥ M + 1 > M.
Theorem: If v1, ..., vN is a basis for V and v ∈ V, then the numbers c1, ..., cN such that v = Σ_{n=1}^N cn vn are unique.

Proof: Σ_{n=1}^N cn vn = v = Σ_{n=1}^N an vn ⟹ Σ_{n=1}^N (cn − an) vn = 0 ⟹ cn − an = 0, for all n, since v1, ..., vN are independent.
MATH CAMP: Lecture 2
(T f)(s) = f(s + π), if 0 ≤ s ≤ π; f(s − π), if π < s ≤ 2π.
Let

y = (y1, ..., yM)ᵀ,  x = (x1, ..., xN)ᵀ,  A = ( a11 ··· a1N ; ⋮ ⋱ ⋮ ; aM1 ··· aMN ).

Then, y = Ax. The M × N matrix A represents T in that there is one and only one linear transformation T corresponding to A and one and only one matrix A corresponding to T, given the bases v1, ..., vN for V and w1, ..., wM for W.
Let S : W → Q be a linear transformation and let q1, ..., qJ be a basis for Q. Let the J × M matrix B = (bjm) represent S, so that

S(wm) = Σ_{j=1}^J bjm qj.
S ∘ T : V → Q is the linear transformation defined by S ∘ T(v) = S(T(v)). Then

S ∘ T(vn) = S(T vn) = S( Σ_{m=1}^M amn wm ) = Σ_{m=1}^M amn S(wm) = Σ_{m=1}^M Σ_{j=1}^J amn bjm qj = Σ_{j=1}^J ( Σ_{m=1}^M bjm amn ) qj = Σ_{j=1}^J cjn qj,

where cjn = Σ_{m=1}^M bjm amn, so that the J × N matrix C = (cjn) represents S ∘ T.
C = ( c11 ··· c1N ; ⋮ ⋱ ⋮ ; cJ1 ··· cJN ) = ( b11 ··· b1M ; ⋮ ⋱ ⋮ ; bJ1 ··· bJM )( a11 ··· a1N ; ⋮ ⋱ ⋮ ; aM1 ··· aMN ) = BA.
Example:

                ( 2 3 2 )
( 1 −1 0 )      ( 0 0 1 )   ( 2 3 1 )
( 0  1 0 )      ( 1 0 0 ) = ( 0 0 1 ).
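That the matrix of a composition is the matrix product can be verified directly; a sketch, assuming NumPy is available, using a 2 × 3 matrix B and a 3 × 3 matrix A:

```python
import numpy as np

B = np.array([[1, -1, 0],
              [0,  1, 0]])  # represents S with respect to the given bases
A = np.array([[2, 3, 2],
              [0, 0, 1],
              [1, 0, 0]])   # represents T

C = B @ A                   # represents the composition S o T

# Applying T and then S to a coordinate vector agrees with applying C
v = np.array([1.0, 2.0, 3.0])
```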
Note: The way in which a product of matrices is grouped does not affect the product. That is, if A is an M × N matrix, B is a J × M matrix, and C is a K × J matrix, then (CB)A = C(BA).
Definition: An N × N matrix A is invertible if there is an N × N matrix A⁻¹ such that

A⁻¹A = AA⁻¹ = I = ( 1 ··· 0 ; ⋮ ⋱ ⋮ ; 0 ··· 1 ).
I is called the N × N identity matrix and represents the identity function idV : V → V, where V is an N dimensional vector space and idV(v) = v, for all v ∈ V. Clearly, IA = AI = A, for any N × N matrix A.
Lemma: If A and B are invertible, then AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹.
Proof:

(B⁻¹A⁻¹)(AB) = B⁻¹(A⁻¹A)B = B⁻¹IB = B⁻¹B = I
(AB)(B⁻¹A⁻¹) = AIA⁻¹ = AA⁻¹ = I.
Remarks:
1. f : V → W is onto if and only if there exists f⁻¹ : W → V such that f(f⁻¹(w)) = w, for all w ∈ W.
3. f is one to one and onto if and only if f is invertible.
Theorem: If T : V → W is an invertible linear transformation, then T⁻¹ is linear.
Proof: Let v1 = T⁻¹(w1) and v2 = T⁻¹(w2). Then

c1 T⁻¹(w1) + c2 T⁻¹(w2) = c1 v1 + c2 v2 = T⁻¹(T(c1 v1 + c2 v2)) = T⁻¹(c1 w1 + c2 w2).
a) Multiplication of the rth row of A by c ≠ 0 corresponds to PA, where P is the identity matrix with its (r, r) entry replaced by c:

P = diag(1, ..., 1, c, 1, ..., 1), with c in row r, column r,

and

P⁻¹ = diag(1, ..., 1, c⁻¹, 1, ..., 1), with c⁻¹ in row r, column r.
b) Replacement of the rth row of A by that row plus c times the sth row corresponds to PA, where P is the identity matrix with the extra entry c in row r, column s:

P = I + c E_{rs}, where E_{rs} has a 1 in row r, column s and zeros elsewhere,

and

P⁻¹ = I − c E_{rs}, with −c in row r, column s.
c) Interchange of rows r and s corresponds to PA, where P is the identity matrix with rows r and s interchanged: P has a 1 in row r, column s, a 1 in row s, column r, 1s in the remaining diagonal positions (m, m) for m ≠ r, s, and zeros elsewhere.

P⁻¹ = P.
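The three elementary matrices and their inverses can be constructed explicitly; a sketch, assuming NumPy is available, where the size N, the rows r and s, and the scalar c are illustrative choices (rows are 0-indexed):

```python
import numpy as np

N, r, s, c = 4, 1, 3, 5.0  # hypothetical size, row indices, and scalar

P_scale = np.eye(N); P_scale[r, r] = c                # a) multiply row r by c
P_add   = np.eye(N); P_add[r, s] = c                  # b) add c times row s to row r
P_swap  = np.eye(N); P_swap[[r, s]] = P_swap[[s, r]]  # c) interchange rows r and s

inv_scale = np.eye(N); inv_scale[r, r] = 1.0 / c      # entry c replaced by 1/c
inv_add   = np.eye(N); inv_add[r, s] = -c             # entry c replaced by -c
# P_swap is its own inverse
```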
So, I = P1 P2 ··· PQ A, where Pq is invertible, for all q. Let P = P1 P2 ··· PQ. P⁻¹ = PQ⁻¹ ··· P1⁻¹, so that P is invertible. I = PA, since A is row reduced to I via left multiplication by the matrix P. P⁻¹ = P⁻¹PA = (P⁻¹P)A = IA = A. Therefore, I = P⁻¹P = AP. Since PA = I = AP, A is invertible.
Proof: By the theorem, A is invertible. Therefore, BA = I implies that (BA)A⁻¹ = IA⁻¹ = A⁻¹, so that B = BI = B(AA⁻¹) = (BA)A⁻¹ = A⁻¹.
De…nition: If T : V ! W is a linear transformation, the rank of T is the dimension
of the range of T and the nullity of T is the dimension of the null space of T .
I will later show that the row rank of A equals its column rank.
Proof: Let v1, ..., vK be a basis for the null space of T. Extend v1, ..., vK to a basis v1, ..., vK, vK+1, ..., vN of V. I show that T(vK+1), ..., T(vN) is a basis for the range of T. The vectors T(v1), ..., T(vN) span the range of T. Since T(vn) = 0, if n ≤ K, T(vK+1), ..., T(vN) span the range of T. I show that T(vK+1), ..., T(vN) are independent, so that T(vK+1), ..., T(vN) is a basis for the range of T and hence rank T = N − K.
Σ_{n=K+1}^N cn T(vn) = 0 = T( Σ_{n=K+1}^N cn vn ) ⟹ Σ_{n=K+1}^N cn vn is in the null space of T ⟹ Σ_{n=K+1}^N cn vn = Σ_{n=1}^K bn vn, for some numbers b1, ..., bK ⟹ Σ_{n=1}^K bn vn − Σ_{n=K+1}^N cn vn = 0 ⟹ cn = 0, for n = K+1, ..., N, since v1, ..., vN are independent.
The function T portrayed in this diagram may be thought of as a projection of R2 onto the vertical axis, followed by a linear function from the vertical axis onto R.
Theorem: Let T : V ! W be linear and suppose that dim V = dim W . Then, the
following are equivalent.
1) T is invertible.
2) T is non-singular.
3) T is onto.
4) If v1 ; :::; vN is a basis of V , then T (v1 ); :::; T (vN ) is a basis of W .
5) There is a basis v1 ; :::; vN of V such that T (v1 ); :::; T (vN ) is a basis of W .
Proof: 1 ⟹ 2. Obvious.
2 ⟹ 3. Suppose that T is non-singular. Let v1, ..., vN be a basis of V. By the previous lemma, T(v1), ..., T(vN) are independent. Since dim W = N, T(v1), ..., T(vN) is a basis of W. If w ∈ W, w = c1 T(v1) + ··· + cN T(vN) = T(c1 v1 + ··· + cN vN). Therefore, T is onto.
3 ⟹ 4. Let v1, ..., vN be a basis of V. Since these vectors span V and T is onto, T(v1), ..., T(vN) span W. Since dim W = N, T(v1), ..., T(vN) are independent. Therefore T(v1), ..., T(vN) is a basis of W.
4 ⟹ 5. Obvious.
5 ⟹ 1. Suppose that there is a basis v1, ..., vN of V such that T(v1), ..., T(vN) is a basis of W. Then, rank T = dim W = dim V. Therefore, by a previous theorem, nullity of T = 0. Therefore, T is one to one. Since rank T = dim W, T is onto. Therefore, T is invertible.
MATH CAMP: Lecture 3
implies that

0 = Σ_{m=1}^M (0) wm = Σ_{m=1}^M Σ_{k=1}^K ck amk wm = Σ_{k=1}^K ck Σ_{m=1}^M amk wm = Σ_{k=1}^K ck uk,

which in turn implies that c1 = c2 = ··· = cK = 0, since u1, ..., uK are independent. Hence the vectors

(a11, a21, ..., aM1)ᵀ, ..., (a1K, a2K, ..., aMK)ᵀ

are independent.
Suppose that

(a11, a21, ..., aM1)ᵀ, ..., (a1K, a2K, ..., aMK)ᵀ

are independent. To show that u1, ..., uK are independent, suppose that 0 = Σ_{k=1}^K ck uk = Σ_{k=1}^K ck Σ_{m=1}^M amk wm = Σ_{m=1}^M Σ_{k=1}^K ck amk wm, which implies that Σ_{k=1}^K ck amk = 0, for all m, since w1, ..., wM are independent. Therefore,

c1 (a11, a21, ..., aM1)ᵀ + ··· + cK (a1K, a2K, ..., aMK)ᵀ = 0,

so that c1 = c2 = ··· = cK = 0, since these vectors are independent.
Proof: Let v1 ; :::; vN be a basis for V and let w1 ; :::; wM be the basis for W , such
that A is the representation of T with respect to these bases. Let K be the rank of
T . Then, dim(span(T (v1 ); :::; T (vN ))) = K: There exist K of the vectors v1 ; :::; vN ,
say v1 ; :::; vK such that T (v1 ); :::; T (vK ) is a basis for the range of T; which equals the
span of T (v1 ); :::; T (vN ): By the lemma, the column vectors
(a11, a21, ..., aM1)ᵀ, ..., (a1K, a2K, ..., aMK)ᵀ

are independent. If n > K,

Σ_{m=1}^M amn wm = T(vn) = Σ_{k=1}^K ck T(vk) = Σ_{k=1}^K ck Σ_{m=1}^M amk wm = Σ_{m=1}^M Σ_{k=1}^K ck amk wm,

for some numbers c1, ..., cK. Since w1, ..., wM are independent, amn = Σ_{k=1}^K ck amk, for all m. That is,

(a1n, a2n, ..., aMn)ᵀ is in the linear span of (a11, a21, ..., aM1)ᵀ, ..., (a1K, a2K, ..., aMK)ᵀ,

and so these vectors are a basis for the linear span of the columns of A.
Duality
Definition: If V is a vector space, a linear functional on V is a linear function f : V → R. The set of all linear functionals on V is called the dual space of V and is denoted by V*.
Remark: V* is a vector space, where, if f ∈ V* and g ∈ V* and a and b are numbers, af + bg : V → R is defined by (af + bg)(v) = a f(v) + b g(v).
Let v1, ..., vN be a basis for V and let f ∈ V*. Then (f(v1), ..., f(vN)) is the matrix representation of f, so that if v = Σ_{n=1}^N cn vn ∈ V, then

f(v) = Σ_{n=1}^N cn f(vn) = (f(v1), ..., f(vN)) (c1, ..., cN)ᵀ.
Example: Let V = RN and let e1, ..., eN be the standard basis of RN. The dual basis of V* = (RN)* is f1, ..., fN, where, for all n and k, fn(ek) = δnk. If y ∈ V*, y = Σ_{n=1}^N yn fn, for some numbers y1, ..., yN. If x ∈ RN, y(x) = y( Σ_{k=1}^N xk ek ) = Σ_{n=1}^N yn fn( Σ_{k=1}^N xk ek ) = Σ_{n=1}^N yn fn(xn en) = Σ_{n=1}^N yn xn. Therefore, y may be identified with the vector (y1, ..., yN) and y(x) = Σ_{n=1}^N yn xn. Hence, V* may be identified with RN.
Remark: S° is a subspace of V*.
Example: If W = {(t, ..., t) ∈ RN | t ∈ R}, W° may be identified with {(y1, ..., yN) ∈ RN | Σ_{n=1}^N yn = 0}.
Proof: Let v1, ..., vK be a basis for W. Extend v1, ..., vK to a basis v1, ..., vK, vK+1, ..., vN of V. Let f1, ..., fN be the basis for V* dual to v1, ..., vN.
I show that fK+1, ..., fN is a basis for W°. If n ≥ K + 1, fn ∈ W°, since fn(vm) = 0, for m ≤ K, and for any w ∈ W, w = Σ_{m=1}^K am vm, for some numbers a1, ..., aK. The functions fK+1, ..., fN are linearly independent, since f1, ..., fN is a basis for V*. In order to show that fK+1, ..., fN is a basis for W°, it is sufficient to show that they span W°. If f ∈ V*, f = Σ_{n=1}^N f(vn) fn. If f ∈ W°, f(vn) = 0, for n ≤ K. Therefore, f = Σ_{n=K+1}^N f(vn) fn, and so fK+1, ..., fN span W°.
Proof: Let W ⊆ RN be the linear span of the rows of A and let K = dim W = row rank of A. Then dim W° = N − K. W° may be viewed as a subset of RN under the identification of (RN)* with RN. Under this identification, W° is the set of all solutions x of the equation Ax = 0.
Let T : RN → RM be the linear transformation with matrix representation A with respect to the standard bases of RN and RM. Then, W° is the null space of T, and the range of T is the linear span of the columns of A. Therefore, the column rank of A equals the rank of T. We know that rank of T + nullity of T = N. Therefore, the column rank of A = rank of T = N − nullity of T = N − dim W° = N − (N − K) = K = row rank of A.
Inner Product
Definition: The standard inner product on RN is the function · : RN × RN → R defined by x·y = Σ_{n=1}^N xn yn.
Remarks:
1. If a and b are numbers and x, y, and z belong to RN, then x·y = y·x and x·(ay + bz) = a x·y + b x·z. (These equations are easy to verify.)
2. If x ∈ RN and y ∈ RN and θ is the angle between x and y, then cos θ = x·y / (‖x‖ ‖y‖). (This equation is a little harder to verify.)
3. |x·y| ≤ ‖x‖ ‖y‖. This is called the Cauchy–Schwarz inequality. It follows from (2).
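These properties of the inner product are easy to test numerically on random vectors; a sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 5))  # three random vectors in R^5
a, b = 2.0, -3.0

sym_ok = np.isclose(x @ y, y @ x)                                   # x.y = y.x
lin_ok = np.isclose(x @ (a * y + b * z), a * (x @ y) + b * (x @ z)) # linearity
cs_ok = abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y)         # Cauchy-Schwarz
```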
MATH CAMP: Lecture 4
Orthogonal Projections
Let W be a subspace of V , which is a subspace of RN .
Figure 1
Proof: Let v1, ..., vK be a basis for W and let vK+1, ..., vM be a basis for W⊥ ∩ V, where M = dim V and K = dim W. I show that v1, ..., vM is a basis for V. First I show that v1, ..., vM are independent. Suppose that Σ_{n=1}^M cn vn = 0. Then c1 v1 + ··· + cK vK = −cK+1 vK+1 − ··· − cM vM. Hence, w = c1 v1 + ··· + cK vK ∈ W ∩ W⊥. Therefore, 0 = w·w, so that w = 0. Since v1, ..., vK are independent, c1 = ··· = cK = 0. Since 0 = −cK+1 vK+1 − ··· − cM vM and vK+1, ..., vM are independent, cK+1 = ··· = cM = 0. Therefore, v1, ..., vM are independent. Since M = dim V, v1, ..., vM is a basis for V.
If v ∈ V, then v = Σ_{n=1}^M cn vn. Let π(v) = Σ_{n=1}^K cn vn. Then, v − π(v) = Σ_{n=K+1}^M cn vn. Since vn ∈ W⊥, for n > K, v − π(v) ∈ W⊥. Hence, π(v) exists.
In order to show that π(v) is unique, suppose that v ∈ V and v = ŵ + (v − ŵ), where ŵ ∈ W and v − ŵ ∈ W⊥. Then, ŵ = Σ_{n=1}^K an vn and v − ŵ = Σ_{n=K+1}^M an vn, since v1, ..., vK is a basis for W and vK+1, ..., vM is a basis for W⊥. Therefore, v = ŵ + (v − ŵ) = Σ_{n=1}^M an vn. Since v1, ..., vM is a basis for V, the numbers a1, ..., aM are unique. Therefore, ŵ = Σ_{n=1}^K an vn = π(v).
Orthonormal Bases
De…nition: A set of vectors v1 ; :::; vM in RN is said to be orthogonal if vn vm = 0,
whenever n 6= m.
Remark: If v1, ..., vM is an orthonormal basis for V and v ∈ V, then Σ_{n=1}^K (v·vn) vn is the orthogonal projection of v onto the linear span of v1, ..., vK.
yk+1 and the projection of yk+1 onto the linear span of v1 ; :::; vk ; which equals the
linear span of y1 ; :::; yk :
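The construction referred to here, subtracting from each yk+1 its projection onto the span of the vectors already built and normalizing the difference, is the Gram–Schmidt process; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

def gram_schmidt(ys):
    """Orthonormalize the rows of ys, assumed linearly independent."""
    vs = []
    for y in ys:
        # subtract the projection of y onto the span of v1, ..., vk
        w = y - sum((y @ v) * v for v in vs)
        vs.append(w / np.linalg.norm(w))
    return np.array(vs)

V = gram_schmidt(np.array([[1.0, 1.0, 0.0],
                           [1.0, 0.0, 1.0],
                           [0.0, 1.0, 1.0]]))
# The rows of V are orthonormal, so V V^T is the identity
```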
Determinants
Definition: A permutation of {1, ..., N} is a one to one and onto function σ : {1, ..., N} → {1, ..., N}.
Figure 2
Say that σ is odd if the number of interchanges is odd. Otherwise, σ is even. Let the sign of σ be sgn σ = 1, if σ is even, and sgn σ = −1, if σ is odd.
Let A = (amn) be an N × N matrix. The determinant of A is

det A = Σ_σ (sgn σ) a1,σ(1) a2,σ(2) ··· aN,σ(N),

where the sum is over all permutations σ of {1, ..., N}.
That is, pick one entry from each row, every time from a di¤erent column, and
multiply these N numbers together. The choice of column de…nes a permutation of
f1; :::; N g. Multiply the product by the sign of this permutation. Add these products
over all possible permutations. The sum is the determinant.
If N = 1, A = (a11) and det A = a11. If N = 2, then A = ( a11 a12 ; a21 a22 ) and det A = a11 a22 − a21 a12.
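The permutation-sum definition can be implemented directly; a sketch in plain Python, computing the sign of a permutation by counting inversions:

```python
from itertools import permutations

def sgn(p):
    # sign = (-1)^(number of inversions, i.e. pairs out of order)
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(A):
    """Sum of sgn(s) * a[0][s(0)] * ... * a[N-1][s(N-1)] over all permutations s."""
    N = len(A)
    total = 0
    for p in permutations(range(N)):
        term = sgn(p)
        for m in range(N):
            term *= A[m][p[m]]
        total += term
    return total
```

For N = 2 this reduces to a11 a22 − a21 a12; the cost grows like N!, so this is a definition, not a practical algorithm.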
For any N, det I = 1, where I is the N × N identity matrix. That is,

I = ( 1 0 ··· 0 ; 0 1 ··· 0 ; ⋮ ⋱ ⋮ ; 0 ··· 0 1 ),

where I has N rows and N columns.
The determinant of an N × N matrix A may be considered to be a function of the N rows of A, each of which is a vector in RN. In order to describe this function, I need the following notation:

S^K = S × S × ··· × S (K times).
permutation of {1, ..., N} that interchanges n and k. Then

det A′ = Σ_σ (sgn σ) a′_{1,σ(1)} ··· a′_{N,σ(N)}
       = Σ_σ (sgn σ) a_{1,τσ(1)} ··· a_{N,τσ(N)}
       = Σ_μ sgn(τ⁻¹μ) a_{1,μ(1)} ··· a_{N,μ(N)}
       = −Σ_μ (sgn μ) a_{1,μ(1)} ··· a_{N,μ(N)} = −det A.
This theorem relates the determinant to elementary row operations, which can be
used to simplify a matrix and hence compute its determinant.
Of course, f (e1;:::; eN ) = det I; where I is the N N identity matrix and en is the
nth standard basis vector for RN : The following two theorems may be proved using
elementary row operations.
Theorem: The determinant is the unique alternating multilinear form f : (RN)^N → R such that f(e1, ..., eN) = 1.
Example:

( 6 −1 )ᵀ   (  6 4 −2 )
( 4  2 )  = ( −1 2  5 )
( −2 5 )
Definition: The (m, n)th cofactor of A is Cmn = (−1)^{m+n} det A(m|n).
Theorem: For n such that 1 ≤ n ≤ N, det A = Σ_{m=1}^N amn Cmn.
Remarks:
1 × 1: det(a) = a.
2 × 2: det( a11 a12 ; a21 a22 ) = a11 a22 − a21 a12.
3 × 3:

det( a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 )
= a11 det( a22 a23 ; a32 a33 ) − a21 det( a12 a13 ; a32 a33 ) + a31 det( a12 a13 ; a22 a23 ),

and so on.
3. Since det A = det Aᵀ, det A = Σ_{n=1}^N amn Cmn. This is the expansion of the determinant by cofactors along row m.
Theorem: If k ≠ n, then Σ_{m=1}^N amn Cmk = 0.
Proof: Replace the kth column of A by the nth column, obtaining the matrix B. Then, by the definition of B, B(m|k) = A(m|k). Since B has two equal columns,

0 = det B = Σ_{m=1}^N (−1)^{m+k} bmk det B(m|k) = Σ_{m=1}^N (−1)^{m+k} amn det A(m|k) = Σ_{m=1}^N amn Cmk.

Combining this theorem with the previous one, Σ_{m=1}^N amn Cmk = δnk det A, where

δnk = 1, if n = k, and δnk = 0, otherwise.
Definition: The adjoint matrix of A is the matrix adj A, the (m, n)th entry of which is

(adj A)mn = Cnm = (−1)^{n+m} det A(n|m),

for m and n such that 1 ≤ n, m ≤ N.
The adjoint matrix is the transpose of the matrix of cofactors of A, where the (m, n)th entry of the matrix of cofactors is Cmn. Notice that the (m, n)th entry of the product matrix (adj A)A is

Σ_{j=1}^N (adj A)mj ajn = Σ_{j=1}^N Cjm ajn = Σ_{j=1}^N ajn Cjm = δnm det A.

Therefore (adj A)A = (det A)I, where I is the N × N identity matrix. Hence

(1/det A)(adj A)A = I,

if det A ≠ 0. That is, if det A ≠ 0, A has a left inverse and so is invertible and

A⁻¹ = (1/det A) adj(A).

Conversely, if A is invertible, then A⁻¹A = I, so that 1 = det I = det(A⁻¹A) = (det A⁻¹)(det A), and so det A ≠ 0 and

det A⁻¹ = 1/det A.

This proves the following theorem.
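The formula A⁻¹ = (1/det A) adj A can be checked numerically; a sketch, assuming NumPy is available, with the cofactors computed from minors:

```python
import numpy as np

def adj(A):
    """Adjoint (transposed cofactor) matrix: (adj A)[m, n] = (-1)^(n+m) det A(n|m)."""
    N = A.shape[0]
    out = np.empty((N, N))
    for m in range(N):
        for n in range(N):
            # A(n|m): delete row n and column m
            minor = np.delete(np.delete(A, n, axis=0), m, axis=1)
            out[m, n] = (-1) ** (n + m) * np.linalg.det(minor)
    return out

A = np.array([[3.0, 2.0],
              [6.0, 7.0]])
A_inv = adj(A) / np.linalg.det(A)
```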
MATH CAMP: Lecture 5
Singular Matrices
De…nition: Let V be a …nite dimensional vector space. The linear transformation
T : V ! V is singular if T (v) = 0; for some v 6= 0:
a + bi = (c + di)(x + yi), that is,

cx − dy = a
dx + cy = b.
Notice that any real number, r; is also complex, in that it may be written as r +0i:
Normally a number is said to be complex if it is of the form a + bi; where b 6= 0:
The key property of complex numbers is that any polynomial equation

0 = p(x) = aN x^N + aN−1 x^{N−1} + ··· + a1 x + a0

with complex (or real) coefficients has a complex solution. Furthermore, if aN ≠ 0, then the polynomial p(x) may be written as

p(x) = aN (x − b1)(x − b2) ··· (x − bN),

where b1, ..., bN are the N roots of p, which means that they are solutions of the equation p(x) = 0. These roots may not all be distinct. A complex number b is said to be a multiple root of p if b = bn for more than one value of n. If aN ≠ 0, the positive integer N is said to be the degree of p.
Remark:
1. The same terminology applies to linear transformations T : V ! V:
Example: Let A = ( a −b ; b a ), where a and b are real numbers. The characteristic equation of A is

0 = det(xI − A) = det( x − a, b ; −b, x − a ) = (x − a)² + b² = x² − 2ax + a² + b².

Therefore,

x = ( 2a ± √(4a² − 4a² − 4b²) ) / 2 = ( 2a ± 2bi ) / 2 = a ± bi, where i = √−1.
Theorem: det(A) = λ1 λ2 ··· λN, where λ1, ..., λN are the characteristic values of A.
Theorem: Let A be an N × N matrix and suppose that λ1, ..., λN are the characteristic values of A. If |λn| < 1, for all n, then lim_{k→∞} A^k = 0, where A^k = A A ··· A (k times).
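The convergence of the powers can be observed numerically; a sketch, assuming NumPy is available, on an illustrative matrix whose characteristic values 0.5 ± 0.3i have modulus less than one:

```python
import numpy as np

A = np.array([[0.5, -0.3],
              [0.3,  0.5]])  # eigenvalues 0.5 +/- 0.3i, modulus about 0.58

moduli = np.abs(np.linalg.eigvals(A))
Ak = np.linalg.matrix_power(A, 60)  # A^60 is essentially the zero matrix
```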
Quadratic Forms
Definition: If V is a vector space, f : V × V → R is a bilinear form on V if, for each v ∈ V, f(v, w) and f(w, v) are linear functions of w.
Bilinear forms have a matrix representation. Let v1, ..., vN be an ordered basis of V and let f : V × V → R be a bilinear form. For each m and n, let amn = f(vm, vn). Let A be the N × N matrix with (m, n)th entry amn. If v = Σ_{m=1}^N bm vm and w = Σ_{n=1}^N cn vn belong to V, then

f(v, w) = f( Σ_m bm vm, Σ_n cn vn ) = Σ_m Σ_n bm cn f(vm, vn) = Σ_{m=1}^N Σ_{n=1}^N bm cn amn = (b1, ..., bN) A (c1, ..., cN)ᵀ.

The N × N matrix A represents f. That is, given a basis v1, ..., vN for V, there is one and only one matrix A such that f(v, w) = (b1, ..., bN) A (c1, ..., cN)ᵀ, where v = Σ_{n=1}^N bn vn and w = Σ_{n=1}^N cn vn.
Remark: The bilinear form f is symmetric if and only if the matrix A representing it is symmetric.
De…nition: The quadratic form associated with a symmetric bilinear form f is
q(v) = f (v; v):
Proof: I prove only the only if statement and that only for the positive de…nite
case. The proofs of the other cases are similar.
Let λ be a characteristic value of A and let x be a corresponding characteristic vector. Then,

(A − λI)x = 0,

so that

Ax = λx

and hence

xᵀ A x = λ xᵀ x.

It follows that if A is positive definite, then 0 < xᵀ A x = λ xᵀ x, so that λ > 0.
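The criterion just proved can be checked numerically: a symmetric matrix is positive definite exactly when all of its characteristic values are positive. A sketch, assuming NumPy is available, on an illustrative matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, with characteristic values 1 and 3

eigs = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix, ascending
pos_def = bool(np.all(eigs > 0))

# spot-check the quadratic form x'Ax on random nonzero vectors
rng = np.random.default_rng(1)
xs = rng.standard_normal((100, 2))
quad = np.einsum('ij,jk,ik->i', xs, A, xs)  # x'Ax for each row x of xs
```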
Real Analysis
We know what it means for numbers to be close to each other. The part of real
analysis we will use has to do with generalizations of the notion of closeness and
applications of it.
Examples:
1. RN is open.
2. The empty set, ;; is open in RN , because any assertion about nothing is true.
Examples:
2. [0; 1] is closed in R.
4. f(x0 ; 0) 2 R2 j 0 x0 1g is closed in R2 .
Proof: I show that A ∩ B is open if A and B are open. If x ∈ A ∩ B, there is εA > 0 such that B_{εA}(x) ⊆ A and there is εB > 0 such that B_{εB}(x) ⊆ B. Let ε = min(εA, εB). Then B_ε(x) ⊆ B_{εA}(x) ⊆ A and B_ε(x) ⊆ B_{εB}(x) ⊆ B, so that B_ε(x) ⊆ A ∩ B. Therefore, A ∩ B is open.
I show that ∪_{U∈U} U is open. If x ∈ ∪_{U∈U} U, then x ∈ U′, for some U′ ∈ U. Since U′ is open, there is ε > 0 such that B_ε(x) ⊆ U′ ⊆ ∪_{U∈U} U. Therefore, ∪_{U∈U} U is open.
I show that A ∪ B is closed if A and B are closed. RN \ (A ∪ B) = (RN \ A) ∩ (RN \ B). Since A and B are closed, RN \ A and RN \ B are open, so that (RN \ A) ∩ (RN \ B) is open, so that RN \ (A ∪ B) is open and hence A ∪ B is closed.
If C is closed, RN \ C is open, for every C ∈ C. Therefore, ∪_{C∈C}(RN \ C) is open and so RN \ (∩_{C∈C} C) is open. Therefore, ∩_{C∈C} C is closed.
Examples:
1. The intervals [1/n, 1] are closed, for n = 1, 2, ..., yet

∪_{n=1}^∞ [1/n, 1] = (0, 1] = {x ∈ R | 0 < x ≤ 1} is not closed.

2. The intervals (−1/n, 1 + 1/n) are open, for n = 1, 2, ..., yet

∩_{n=1}^∞ (−1/n, 1 + 1/n) = [0, 1] is not open.
Examples:
1. (0, 1] = {x | 0 < x ≤ 1} is open in [0, 1] = {x | 0 ≤ x ≤ 1}, but is not open in R.
2. (0, 1] is closed in (0, 2), but is not closed in R.
3. {(x0, 0) | 0 < x0 < 1} is open in {(x0, 0) | −1 < x0 < 1}, though it is not open in R2.
Definition: Let A ⊆ RN, B ⊆ RM, and f : A → B. Then, f is continuous if for every U ⊆ B that is open in B, f⁻¹(U) = {x ∈ A | f(x) ∈ U} is open in A.
Examples:
2. f : [0, ∞) → [0, ∞) defined by f(x) = 0, if x = 0, and f(x) = 1/x, if x > 0, is not continuous.
3. f : [0, 1] → R defined by f(x) = 1, if 0 ≤ x < 1/2, and f(x) = 0, if 1/2 ≤ x ≤ 1, is not continuous.
Theorem: f : A ! B is continuous at x if and only if for every sequence x1 ; x2 ; :::
in A that converges to x, limn!1 f (xn ) = f (x).
Proof: The argument should be clear, given what has been presented earlier.
MATH CAMP: Lecture 6
Proof: For each n, let xn = (xn1 ; :::; xnN ). If xn is Cauchy, then for each k =
1; :::; N , the sequence xnk is Cauchy. Therefore, there is a number yk such that
limn!1 xnk = yk . Let y = (y1 ; :::; yN ). Then limn!1 xn = y.
The completeness property of the real numbers may be expressed by saying that
every set of numbers with an upper bound has a least upper bound or every set of
numbers with a lower bound has a greatest lower bound.
The least upper bound for X is denoted by lub X or supX, which is read as “the
supremum of X.” In an analogous fashion, we may de…ne “bounded from below,”
“lower bound,” and “greatest lower bound.” The greatest lower bound is written as
glb X or as inf X, read as "the infimum of X." Clearly glb X = −lub(−X), where −X = {−x | x belongs to X}, so that a set that is bounded from below has a greatest
lower bound if and only if a set that is bounded from above has a least upper bound.
Least Upper Bound Property: Any set of numbers that is bounded from above has a least upper bound.
Theorem: The least upper bound property is equivalent to the completeness prop-
erty.
Theorem: A sequence in RN that converges is bounded.
Proof: Let x1, x2, ... be a sequence in RN that converges to x. There is a positive integer M such that ‖xn − x‖ ≤ 1, if n > M. Then ‖xn‖ ≤ max(‖x1‖, ..., ‖xM‖, ‖x‖ + 1), for all n.
Theorem: If x1, x2, ... is a sequence in RN that converges to x, then every subsequence of x1, x2, ... converges to x.
Proof: If ε > 0, let M be a positive integer such that ‖xn − x‖ < ε, if n ≥ M. If xn1, xn2, ... is a subsequence of x1, x2, ..., then nk ≥ k, for all k, so that ‖xnk − x‖ < ε, if k ≥ M. Therefore, xn1, xn2, ... converges to x.
some positive number b. Divide C1 in half along each dimension, obtaining 2^N subcubes, each with edges of length 2b/2 = b. One of these subcubes contains the point xn for infinitely many numbers n. Call this cube C2. Suppose that cubes C1, ..., CK have been defined, where C1 ⊇ C2 ⊇ ··· ⊇ CK and, for each k, Ck+1 has edges of length 2b/2^k = b·2^{−k+1} and Ck contains xn, for infinitely many integers n. Divide CK in half along each dimension, obtaining 2^N subcubes. One of these contains xn, for infinitely many n. Call this cube CK+1. I have defined by induction on K a sequence of cubes C1, C2, ... such that
2. C1 ⊇ C2 ⊇ ···, and
3. for all k, each edge of Ck+1 has length b·2^{−k+1}.
Example: The set of all open intervals in R is an open cover of [0; 1]:
Proof: Suppose that every open cover of A contains a finite subcover. I show that A is closed and bounded. In order to show that A is closed, let x ∈ RN \ A and, for each m = 1, 2, ..., let Um = {y ∈ RN | ‖y − x‖ > 1/m}. ∪_{m=1}^∞ Um = RN \ {x}, so that U1, U2, ... is an open cover of A. Therefore, for some M, A ⊆ ∪_{m=1}^M Um = UM. Therefore, B_{1/M}(x) ∩ A = ∅. Hence RN \ A is open and so A is closed.
In order to show that A is bounded, for m = 1, 2, ..., let Um = {x ∈ RN | ‖x‖ < m}. ∪_{m=1}^∞ Um = RN, so that U1, U2, ... is an open cover of A. Therefore, for some M, A ⊆ ∪_{m=1}^M Um = UM, so that ‖x‖ ≤ M, for all x ∈ A, and hence A is bounded.
Suppose now that A is compact. I show that every open cover, U, of A contains a finite subcover. Suppose U contains no finite subcover of A. Let C1 be a cube containing A. Divide C1 into 2^N subcubes of equal size that intersect only along sides. One of those subcubes, C2, is such that A ∩ C2 is not empty and A ∩ C2 is not covered by a finite subcover of U, for if there were no such subcube, the intersection of A with each subcube would have a finite subcover and the union of this finite number of subcovers would be a finite subcover of A, contrary to hypothesis. Suppose that C1, C2, ..., CK have been defined such that C1 ⊇ C2 ⊇ ··· ⊇ CK and, for each k > 1, Ck−1 is the union of 2^N cubes congruent to Ck that intersect only along sides, and Ck ∩ A is not empty and has no finite subcover. Divide CK into 2^N congruent subcubes that intersect only along sides. One of these subcubes, CK+1, is such that A ∩ CK+1 is not empty and has no finite subcover. By induction on K, I have defined cubes C1, C2, ... such that C1 ⊇ C2 ⊇ ···, lim_{K→∞} diam(CK) = 0, and, for all K, CK ∩ A is not empty and has no finite subcover.
By the completeness property of the real numbers, there is x ∈ ∩_{k=1}^∞ Ck. Also, for every k, there is xk ∈ Ck ∩ A. Because lim_{k→∞} diam(Ck) = 0, it follows that lim_{k→∞} xk = x. Since A is closed, x belongs to A. Since U covers A, there is a U in U such that U contains x. Since U is open, there is a positive number ε such that Bε(x) ⊆ U. Because xk ∈ Ck, lim_{k→∞} xk = x, and lim_{k→∞} diam(Ck) = 0, there is a positive integer K such that CK ⊆ Bε(x) ⊆ U. Therefore U covers CK ∩ A, contrary to hypothesis. This contradiction proves that every open cover of A contains a finite subcover.
4
Proof: Since A is compact, it is bounded and hence glb(A) and lub(A) exist. By the
de…nition of lub(A), there is a sequence x1 ; x2 ; ::: in A, such that limn!1 xn = lub(A).
Since A is closed, limn!1 xn 2 A. A similar argument proves that glb(A) 2 A.
Remark: This theorem says that a continuous function de…ned on a compact set
achieves its minimum and maximum.
Example: Let $X = \{(x_1, x_2) \in \mathbb{R}^2 \mid 0 \le x_1 \le 2,\ 0 \le x_2 \le 2\}$ and let $B = [0, \infty)$. Let p vary over B. Let $f(x_1, x_2, p) = x_1$ and $g(x_1, x_2, p) = p - p x_1 - x_2$. If $p > 0$, $h(p) = (1, 0)$. If $p = 0$, $h(p) = (2, 0)$.
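This jump can be checked with a small grid search (a sketch, not from the notes; the helper `h` below is hypothetical): maximize $f = x_1$ over $X = [0,2]^2$ subject to $p x_1 + x_2 \le p$.

```python
# Sketch (not from the notes): grid search for the example's solution h(p).
# For p > 0 the constraint p*x1 + x2 <= p forces x1 <= 1, so h(p) = (1, 0);
# at p = 0 every x1 in [0, 2] is feasible with x2 = 0, so h(0) = (2, 0).
def h(p, step=0.01):
    best = None
    n = int(2 / step) + 1
    for i in range(n):
        for j in range(n):
            x1, x2 = i * step, j * step
            if p * x1 + x2 <= p + 1e-12:      # feasibility
                if best is None or x1 > best[0]:
                    best = (x1, x2)
    return best

print(h(1.0), h(0.0))  # approximately (1, 0) and (2, 0): h jumps at p = 0
```

The jump at $p = 0$ is exactly a failure of condition 2 of the theorem below at $b = 0$.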
1. for all $b \in B$, $\{x \in X \mid g_k(x, b) \ge 0$, for $k = 1, \ldots, K\}$ is non-empty,
2. for all $(x, b) \in X \times B$ such that $g_k(x, b) \ge 0$, for all k, and for all $\varepsilon > 0$, there exists a $\delta > 0$ such that if $\|b_1 - b\| < \delta$, there exists an $x_1 \in X$ such that $g_k(x_1, b_1) \ge 0$, for $k = 1, \ldots, K$, and $\|x_1 - x\| < \varepsilon$,
3. for each $b \in B$, the problem
$$\max_{x \in X} f(x, b) \quad \text{s.t. } g_k(x, b) \ge 0, \text{ for } k = 1, \ldots, K \qquad (\ast)$$
has at most one solution.
Then the function $h : B \to X$, where $h(b)$ is the unique solution of problem $(\ast)$, exists and is continuous.
Proof: Since X is compact and the $g_k$ are continuous, $\{x \in X \mid g_k(x, b) \ge 0$, for $k = 1, \ldots, K\}$ is compact, for all $b \in B$. By condition 1, this set is non-empty. Since f is continuous, problem $(\ast)$ has a solution. By condition 3, this solution is unique. Hence, $h(b)$ is a well-defined function.
To show that h is continuous, let $b_1, b_2, \ldots$ be a sequence in B such that $\lim_{n \to \infty} b_n = \bar b$, where $\bar b \in B$. I must show that $\lim_{n \to \infty} h(b_n) = h(\bar b)$.
If $h(b_n)$ does not converge to $h(\bar b)$, then there exist an $\varepsilon > 0$ and a subsequence $n_j$, $j = 1, 2, \ldots$, such that $\|h(b_{n_j}) - h(\bar b)\| > \varepsilon$, for all j. Since X is compact, I may assume that $h(b_{n_j})$ converges, say to x. (That is, a subsequence of $h(b_{n_j})$ converges to x, and I call this subsequence $h(b_{n_j})$ again.) Since $\|x - h(\bar b)\| \ge \varepsilon > 0$, it follows that $x \ne h(\bar b)$.
I now derive a contradiction. Since $g_k(h(b_{n_j}), b_{n_j}) \ge 0$, for all k and j, and the functions $g_k$ are continuous and $\lim_{j \to \infty}(h(b_{n_j}), b_{n_j}) = (x, \bar b)$, it follows that $g_k(x, \bar b) \ge 0$, for all k. Therefore, $f(x, \bar b) \le f(h(\bar b), \bar b)$, by the definition of $h(\bar b)$. I prove that $f(x, \bar b) = f(h(\bar b), \bar b)$. Suppose that $f(x, \bar b) < f(h(\bar b), \bar b)$. Then, $f(x, \bar b) < f(h(\bar b), \bar b) - 2\delta$, for some $\delta > 0$. Since f is continuous, there exists a positive number $\eta$ such that $|f(x', b') - f(h(\bar b), \bar b)| < \delta$, if $\|x' - h(\bar b)\| < \eta$ and $\|b' - \bar b\| < \eta$. By condition 2 of the theorem, there is a $\gamma > 0$ such that if $\|b' - \bar b\| < \gamma$, then there exists an $x' \in X$ such that $g_k(x', b') \ge 0$, for all k, and $\|x' - h(\bar b)\| < \eta$. I may assume that $\gamma \le \eta$, so that $|f(x', b') - f(h(\bar b), \bar b)| < \delta$. Since $\lim_{j \to \infty} b_{n_j} = \bar b$ and $\lim_{j \to \infty} h(b_{n_j}) = x$, there is a positive integer J such that $\|b_{n_j} - \bar b\| < \gamma$ and $|f(h(b_{n_j}), b_{n_j}) - f(x, \bar b)| < \delta$, for $j \ge J$. By what has been argued, if $j \ge J$, there exists $x_{n_j} \in X$ such that $\|x_{n_j} - h(\bar b)\| < \eta$ and $g_k(x_{n_j}, b_{n_j}) \ge 0$, for all k. If $j \ge J$, $f(x_{n_j}, b_{n_j}) > f(h(\bar b), \bar b) - \delta > f(x, \bar b) + 2\delta - \delta = f(x, \bar b) + \delta > f(h(b_{n_j}), b_{n_j})$, which is impossible by the definition of $h(b_{n_j})$. This contradiction proves that $f(x, \bar b) = f(h(\bar b), \bar b)$. Condition 3 of the theorem now implies that $x = h(\bar b)$, which contradicts the inequality $\|x - h(\bar b)\| \ge \varepsilon$. This second contradiction implies that $\lim_{n \to \infty} h(b_n) = h(\bar b)$.
MATH CAMP: Lecture 7
if $|x - c| < \delta$ and $x \ne c$. That is, $\left|f(x) - f(c) - \frac{df(c)}{dx}(x - c)\right| \le \varepsilon|x - c|$, if $|x - c| < \delta$.
Let
$$g(x) = f(c) + \frac{df(c)}{dx}(x - c) = \left[f(c) - \frac{df(c)}{dx}c\right] + \frac{df(c)}{dx}x.$$
g is an affine function with constant $f(c) - \frac{df(c)}{dx}c$ and linear part $\frac{df(c)}{dx}x$.
Let $\delta$ be a small positive number and magnify the graphs of f and g by multiplying both coordinates by $\delta^{-1}$, which is a large number. Adjust the lens so that the part of the horizontal coordinate in the field of vision varies between $-1$ and $1$. Let $x - c = \delta\,\Delta x$, so that $g(x) = g(c + \delta\Delta x)$ and $f(x) = f(c + \delta\Delta x)$, where $-1 \le \Delta x \le 1$. What we see after magnification is the graphs of the functions of $\Delta x$, $\delta^{-1} f(c + \delta\Delta x)$ and $\delta^{-1} g(c + \delta\Delta x)$, as $\Delta x$ varies between $-1$ and $+1$. Let $\varepsilon > 0$ and choose $\delta > 0$ such that $\left|\frac{f(x) - f(c)}{x - c} - \frac{df(c)}{dx}\right| < \varepsilon$, if $0 < |x - c| < \delta$. Then, if $|\Delta x| \le 1$,
$$\left|\delta^{-1} f(c + \delta\Delta x) - \delta^{-1} g(c + \delta\Delta x)\right| = \delta^{-1}\left|f(c + \delta\Delta x) - f(c) - \frac{df}{dx}(c)\,\delta\Delta x\right| = \left|\frac{f(c + \delta\Delta x) - f(c)}{\delta\Delta x} - \frac{df}{dx}(c)\right| |\Delta x| < \varepsilon|\Delta x|.$$
That is, the graphs of $\delta^{-1} f(c + \delta\Delta x)$ and of $\delta^{-1} g(c + \delta\Delta x)$ as functions of $\Delta x$ are within $\varepsilon|\Delta x|$ of each other in the vertical direction as $\Delta x$ varies between $-1$ and $1$. In this sense, the affine function $g(x) = f(c) + \frac{df(c)}{dx}(x - c)$ approximates f near c.
Notice that if x and y are numbers, then $|x| = |x - y + y| \le |x - y| + |y|$, so that $|x| - |y| \le |x - y|$.
Lemma:
a) If f is differentiable at c and $df(c)/dx > 0$, then there exists a $\delta > 0$ such that $f(c) < f(x)$, if $c < x < c + \delta$.
b) If $df(c)/dx < 0$, there exists a $\delta > 0$ such that $f(c) < f(x)$, if $c - \delta < x < c$.
Proof: a) Let $\delta$ correspond to $\varepsilon = \frac{1}{2}\frac{df(c)}{dx}$ in the definition of differentiability of f at c. Then,
$$\frac{f(x) - f(c)}{x - c} > \frac{df(c)}{dx} - \frac{1}{2}\frac{df(c)}{dx} = \frac{1}{2}\frac{df(c)}{dx},$$
if $c < x < c + \delta$. Hence,
$$f(x) - f(c) > \frac{1}{2}\frac{df(c)}{dx}(x - c) > 0,$$
if $c < x < c + \delta$.
The proof of (b) is similar.
Definition: If $f : (a, b) \to \mathbb{R}$, where $a < b$, and c is such that $a < c < b$, then f achieves a relative maximum at c if there exists a $\delta > 0$ such that $f(c) \ge f(x)$, if $|x - c| < \delta$. A relative maximum is also called a local maximum.
A relative or local minimum for f is defined in the same way. f achieves a local minimum at c if and only if $-f$ achieves a local maximum at c. If f achieves a local minimum at c, $0 = d(-f(c))/dx = -[df(c)/dx]$, so that $df(c)/dx = 0$.
Proof: If $f(x) = 0$, for all x, then $df(c)/dx = 0$, for all c. So suppose $f(x) \ne 0$, for some x. Suppose $f(x) > 0$, for some x. Since $[a, b]$ is compact and f is continuous, there is c such that $a \le c \le b$ and $f(c) \ge f(x)$, for all x. Since $f(x) > 0$, for some x, $f(c) > 0$. Since $f(a) = f(b) = 0$, $a < c < b$. By the previous theorem, $df(c)/dx = 0$. Use a similar argument if $f(x) < 0$, for some x.
Proof: Let $\varphi : [a, b] \to \mathbb{R}$ be defined by
$$\varphi(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a).$$
Then $\varphi(a) = \varphi(b) = 0$, $\varphi$ is continuous on $[a, b]$ and differentiable on $(a, b)$. By Rolle's theorem, there is c such that $a < c < b$ and
$$0 = \frac{d\varphi(c)}{dx} = \frac{df(c)}{dx} - \frac{f(b) - f(a)}{b - a}.$$
Leibniz's Rule: If $f : (a, b) \to \mathbb{R}$ and $g : (a, b) \to \mathbb{R}$ are differentiable, then $\frac{d}{dx}[f(x)g(x)] = f(x)\frac{dg(x)}{dx} + \frac{df(x)}{dx}g(x)$.
$$f(\beta) = f(\alpha) + \frac{df(\alpha)}{dx}(\beta - \alpha) + \frac{1}{2}\frac{d^2 f(\alpha)}{dx^2}(\beta - \alpha)^2 + \cdots + \frac{1}{(n-1)!}\frac{d^{n-1} f(\alpha)}{dx^{n-1}}(\beta - \alpha)^{n-1} + \frac{1}{n!}\frac{d^n f(\gamma)}{dx^n}(\beta - \alpha)^n,$$
where $\gamma$ is a point between $\alpha$ and $\beta$.
Proof: Let the number r be defined by
$$\frac{(\beta - \alpha)^n}{n!}\, r = f(\beta) - \left[f(\alpha) + \frac{df(\alpha)}{dx}(\beta - \alpha) + \cdots + \frac{1}{(n-1)!}\frac{d^{n-1} f(\alpha)}{dx^{n-1}}(\beta - \alpha)^{n-1}\right].$$
Let
$$\varphi(x) = f(\beta) - \left[f(x) + \frac{df(x)}{dx}(\beta - x) + \frac{1}{2}\frac{d^2 f(x)}{dx^2}(\beta - x)^2 + \cdots + \frac{1}{(n-1)!}\frac{d^{n-1} f(x)}{dx^{n-1}}(\beta - x)^{n-1} + \frac{r}{n!}(\beta - x)^n\right].$$
$\varphi$ is continuous on $[a, b]$ because f and all its derivatives are continuous on $[a, b]$. Similarly, $\varphi$ is differentiable on $(a, b)$. $\varphi(\alpha) = 0$, by the definition of r. Certainly, $\varphi(\beta) = 0$. By Rolle's theorem, there is a $\gamma$ between $\alpha$ and $\beta$ such that $d\varphi(\gamma)/dx = 0$.
$$\frac{d\varphi(x)}{dx} = -\frac{df(x)}{dx} + \frac{df(x)}{dx} - \frac{d^2 f(x)}{dx^2}(\beta - x) + \frac{d^2 f(x)}{dx^2}(\beta - x) - \cdots + \frac{1}{(n-2)!}\frac{d^{n-1} f(x)}{dx^{n-1}}(\beta - x)^{n-2} - \frac{1}{(n-1)!}\frac{d^n f(x)}{dx^n}(\beta - x)^{n-1} + \frac{r}{(n-1)!}(\beta - x)^{n-1}$$
$$= \frac{1}{(n-1)!}\left[r - \frac{d^n f(x)}{dx^n}\right](\beta - x)^{n-1}.$$
Since $d\varphi(\gamma)/dx = 0$, $r = d^n f(\gamma)/dx^n$. The theorem follows from the definition of r.
Theorem: Suppose that $f : (a, b) \to \mathbb{R}$ is differentiable, where $a < b$, and that the first two derivatives of f exist and are continuous. If c is such that $df(c)/dx = 0$ and $d^2 f(c)/dx^2 < 0$ ($> 0$), then f achieves a local maximum (minimum) at c.
Proof: Because $\frac{d^2 f(x)}{dx^2}$ is continuous, there exists a $\delta > 0$ such that if $|x - c| < \delta$, then $d^2 f(x)/dx^2 < 0$. By Taylor's theorem, if $0 < |x - c| < \delta$, there exists a $\gamma$ between c and x such that
$$f(x) = f(c) + \frac{df(c)}{dx}(x - c) + \frac{1}{2}\frac{d^2 f(\gamma)}{dx^2}(x - c)^2 = f(c) + \frac{1}{2}\frac{d^2 f(\gamma)}{dx^2}(x - c)^2 < f(c),$$
since $d^2 f(\gamma)/dx^2 < 0$.
Similarly, if $d^2 f(c)/dx^2 > 0$, then f achieves a local minimum at c.
A local maximum. The bucket does not hold water and so d2 f (c)=dx2 < 0.
$$\frac{d}{dx}(g \circ f)(c) = \frac{dg}{dy}(f(c))\,\frac{df(c)}{dx}.$$
If $f(x) \ne f(c)$, then
$$\frac{g(f(x)) - g(f(c))}{x - c} - \frac{dg}{dy}(f(c))\frac{df(c)}{dx} = \left[\frac{g(f(x)) - g(f(c))}{f(x) - f(c)} - \frac{dg}{dy}(f(c))\right]\frac{f(x) - f(c)}{x - c} + \frac{dg}{dy}(f(c))\left[\frac{f(x) - f(c)}{x - c} - \frac{df(c)}{dx}\right].$$
Suppose there is an $\varepsilon > 0$ such that $f(x) \ne f(c)$, if $0 < |x - c| < \varepsilon$. The second term on the right-hand side converges to zero as x converges to c. Since $f(x) \ne f(c)$, if $|x - c|$ is small, the first term converges to zero as x goes to c, provided $f(x) \to f(c)$. Since f is differentiable, it is continuous and so $f(x) \to f(c)$ as $x \to c$.
Suppose there is no positive number $\varepsilon$ such that $f(x) \ne f(c)$, if $0 < |x - c| < \varepsilon$. Then, $df(c)/dx = 0$. If $f(x) \ne f(c)$, the argument of the previous paragraph applies. If $f(x) = f(c)$, then
$$\frac{g(f(x)) - g(f(c))}{x - c} = 0 = \frac{dg}{dy}(f(c))\,\frac{df(c)}{dx},$$
since $df(c)/dx = 0$.
Multivariate Calculus
Definition: Let U be an open subset of $\mathbb{R}^N$ and let $f : U \to \mathbb{R}^M$. f is differentiable at $c \in U$ if there exists a linear transformation $Df(c) : \mathbb{R}^N \to \mathbb{R}^M$, called the derivative of f at c, such that for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $\|f(x) - f(c) - Df(c)(x - c)\| < \varepsilon\|x - c\|$, if $0 < \|x - c\| < \delta$. That is, the affine function $f(c) + Df(c)(x - c)$ approximates $f(x)$ locally near c, that is, for x near c.
Lemma: A function $f : U \to \mathbb{R}^M$ has at most one derivative at a point.
Dividing by $|t|$, we see that $0 < \|S(v) - T(v)\| < 2\varepsilon$, for all $\varepsilon > 0$, which is impossible.
$$|y_m| = \left|\sum_{n=1}^{N} a_{mn} x_n\right| \le \|a_m\| \|x\|,$$
where $a_m$ is the mth row of A. Therefore,
$$|y_m| \le a\sqrt{N}\,\|x\|, \text{ and so } \|y\| = \sqrt{\sum_{m=1}^{M} y_m^2} \le \sqrt{M a^2 N \|x\|^2} = a\sqrt{MN}\,\|x\|.$$
Let $b = a\sqrt{MN}$.
MATH CAMP: Lecture 8
if $\|x - c\| < \delta$. Therefore,
$$\|f(x) - f(c)\| \le \|Df(c)(x - c)\| + \|x - c\|,$$
if $\|x - c\| < \delta$. By the last lemma of the previous lecture, there is a $b > 0$ such that $\|Df(c)(x - c)\| \le b\|x - c\|$. Therefore,
$$\|f(x) - f(c)\| \le (1 + b)\|x - c\|,$$
if $\|x - c\| < \delta$.
Remarks:
1. $\nabla_0 f(c) = 0$.
2. $\nabla_v f(c) = \frac{d}{dt} f(c + tv)\big|_{t=0} = \frac{dg}{dt}(0)$, where $g(t) = f(c + tv)$.
Proof: By the definition of the differentiability of f, for any $\varepsilon > 0$, there is a $\delta > 0$ such that $|f(c + tv) - f(c) - Df(c)(tv)| \le \varepsilon\|tv\|$, if $\|tv\| < \delta$. If $v = 0$, $Df(c)(v) = 0 = \nabla_v f(c)$. If $v \ne 0$ and $0 < |t| < \frac{\delta}{\|v\|}$, then
$$\left|\frac{1}{t}[f(c + tv) - f(c)] - Df(c)(v)\right| = \frac{1}{|t|}\left|f(c + tv) - f(c) - Df(c)(tv)\right| \le \frac{\varepsilon\|tv\|}{|t|} = \frac{\varepsilon|t|\|v\|}{|t|} = \varepsilon\|v\|.$$
Therefore, $Df(c)(v) = \nabla_v f(c)$, by the definition of $\nabla_v f(c)$.
Definition: If $f : U \to \mathbb{R}$, where $U \subset \mathbb{R}^N$ and U is open, then $\nabla_{e_n} f(c)$ is called the nth partial derivative of f at c and is written as $\partial f(c)/\partial x_n$, where $e_n$ is the nth standard basis vector of $\mathbb{R}^N$.
Remark:
$$\frac{\partial f}{\partial x_n}(c) = \frac{d}{dx_n} f(c_1, \ldots, c_{n-1}, x_n, c_{n+1}, \ldots, c_N)\Big|_{x_n = c_n}.$$
That is, all variables of f but the nth are held constant at their values in the vector c. The result is a function of the single variable $x_n$. The derivative of this function at $x_n = c_n$ equals $\frac{\partial f}{\partial x_n}(c)$.
Example:
$$f(x_1, x_2, x_3) = x_1 x_2^3 x_3^2$$
$$\frac{\partial f(2, 4, 5)}{\partial x_2} = 2 \cdot 3 \cdot 4^2 \cdot 5^2 = 6(16)(25) = 2400.$$
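As a quick check (a sketch, not from the notes), the partial derivative in this example can be verified numerically by differencing in $x_2$ alone, holding the other variables at their values in c, as in the remark above.

```python
# Sketch (not from the notes): check the partial derivative numerically.
# f(x1, x2, x3) = x1 * x2**3 * x3**2, so df/dx2 = 3 * x1 * x2**2 * x3**2.
def f(x1, x2, x3):
    return x1 * x2**3 * x3**2

c = (2.0, 4.0, 5.0)
analytic = 3 * c[0] * c[1]**2 * c[2]**2   # 3 * 2 * 16 * 25 = 2400

h = 1e-6  # central difference in x2 with x1, x3 held fixed at c
numeric = (f(c[0], c[1] + h, c[2]) - f(c[0], c[1] - h, c[2])) / (2 * h)
print(analytic, round(numeric, 2))  # both approximately 2400
```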
If $f : U \to \mathbb{R}^M$, where $U \subset \mathbb{R}^N$ and U is open, let $f_m : U \to \mathbb{R}$ be the mth component of f, for $m = 1, \ldots, M$.
Proof: Let $\varepsilon > 0$ and let $\delta > 0$ be such that $\|f(x) - f(c) - Df(c)(x - c)\| \le \varepsilon\|x - c\|$, if $\|x - c\| < \delta$. $Df(c) : \mathbb{R}^N \to \mathbb{R}^M$ is a linear transformation, so that
$$Df(c) = \begin{pmatrix} (Df(c))_1 \\ \vdots \\ (Df(c))_M \end{pmatrix},$$
where $(Df(c))_m : \mathbb{R}^N \to \mathbb{R}$ is linear, for all m, and is the mth component function of $Df(c)$. If $\|x - c\| < \delta$, then
$$|f_m(x) - f_m(c) - (Df(c))_m(x - c)| \le \|f(x) - f(c) - Df(c)(x - c)\| \le \varepsilon\|x - c\|,$$
for each m.
Theorem: The matrix
$$\begin{pmatrix} \frac{\partial f_1(c)}{\partial x_1} & \cdots & \frac{\partial f_1(c)}{\partial x_N} \\ \vdots & & \vdots \\ \frac{\partial f_M(c)}{\partial x_1} & \cdots & \frac{\partial f_M(c)}{\partial x_N} \end{pmatrix}$$
represents $Df(c)$, if f is differentiable at c and $f : U \to \mathbb{R}^M$, where $U \subset \mathbb{R}^N$ and U is open.
Proof: Let $v = (v_1, \ldots, v_N) \in \mathbb{R}^N$. Then $v = \sum_{n=1}^N v_n e_n$, so that
$$Df(c)(v) = Df(c)\left(\sum_{n=1}^N v_n e_n\right) = \sum_{n=1}^N v_n Df(c)(e_n) = \sum_{n=1}^N v_n \begin{pmatrix} Df_1(c) \\ \vdots \\ Df_M(c) \end{pmatrix}(e_n) = \sum_{n=1}^N v_n \begin{pmatrix} Df_1(c)(e_n) \\ \vdots \\ Df_M(c)(e_n) \end{pmatrix}$$
$$= \sum_{n=1}^N v_n \begin{pmatrix} \nabla_{e_n} f_1(c) \\ \vdots \\ \nabla_{e_n} f_M(c) \end{pmatrix} = \sum_{n=1}^N v_n \begin{pmatrix} \frac{\partial f_1(c)}{\partial x_n} \\ \vdots \\ \frac{\partial f_M(c)}{\partial x_n} \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1(c)}{\partial x_1} & \cdots & \frac{\partial f_1(c)}{\partial x_N} \\ \vdots & & \vdots \\ \frac{\partial f_M(c)}{\partial x_1} & \cdots & \frac{\partial f_M(c)}{\partial x_N} \end{pmatrix}\begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix}.$$
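A numeric sketch of the theorem (not from the notes; the function below is a made-up illustration): the matrix of partial derivatives agrees with finite-difference approximations of $Df(c)$.

```python
# Sketch (not from the notes): the matrix of partial derivatives represents
# Df(c). Example function f(x1, x2) = (x1*x2, x1**2) at c = (3, 2); compare
# the hand-computed Jacobian with central finite differences.
def f(x):
    x1, x2 = x
    return [x1 * x2, x1**2]

c = [3.0, 2.0]
jac_analytic = [[c[1], c[0]],      # row 1: d(x1*x2)/dx1, d(x1*x2)/dx2
                [2 * c[0], 0.0]]   # row 2: d(x1**2)/dx1, d(x1**2)/dx2

h = 1e-6
jac_numeric = []
for m in range(2):
    row = []
    for n in range(2):
        cp = list(c); cp[n] += h
        cm = list(c); cm[n] -= h
        row.append((f(cp)[m] - f(cm)[m]) / (2 * h))
    jac_numeric.append(row)
print(jac_numeric)  # approximately [[2, 3], [6, 0]]
```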
Theorem:
Parts 2 and 3 of this theorem generalize Leibniz’s rule for di¤erentiating products.
2. If A and B are $M \times N$ matrices and a and b are numbers, then $(aA + bB)^T = aA^T + bB^T$.
3. If A is a matrix, $(A^T)^T = A$.
where I treat $Df(c)$ and $Dg(c)$ as $M \times N$ matrices. The last equation holds because $v^T(Df(c))^T g(c)$ is a number and so equals its own transpose.
Now, I describe some useful special cases.
Definition: f has a local maximum at $c \in U$ if for some $\varepsilon > 0$, $f(c) \ge f(x)$, for all $x \in B_\varepsilon(c)$.
Proof: The restriction of f to any line through c has a local maximum at c. Therefore, $\nabla_v f(c) = 0$, for all $v \in \mathbb{R}^N$. In particular, $\partial f(c)/\partial x_n = 0$, for all n. Therefore, $Df(c) = 0$.
A local minimum for f may be defined in a similar way, and $Df(c) = 0$ if f has a local minimum at c.
Application (Least Squares Estimator):
Model:
$$y = \sum_{k=1}^K \beta_k x_k + e, \quad e = \text{error}.$$
Then
$$(y - Xb)^T (y - Xb) = y^T y - 2y^T Xb + b^T X^T X b,$$
where I have used the rules for matrix transposition and the fact that since $b^T X^T y$ is a number, $b^T X^T y = (b^T X^T y)^T = y^T X b$. The b that minimizes $(y - Xb)^T (y - Xb)$ is called the least squares estimator.
If $K = 1$, we have the following: the least squares estimate, b, minimizes the sum of the squares of the vertical distances from the data points $(x_n, y_n)$ to the line $y = bx$.
In order to calculate the least squares estimator, we set the derivative of $(y - Xb)^T(y - Xb) = y^T y - 2y^T Xb + b^T X^T X b$ with respect to b equal to zero. Let $D_b$ denote the derivative with respect to the vector b. Then
$$D_b\left[(y - Xb)^T(y - Xb)\right] = -2y^T X + 2b^T X^T X,$$
where I have used the fact that the matrix $X^T X$ is symmetric. $X^T X$ is symmetric because $(X^T X)^T = X^T (X^T)^T = X^T X$. Since $X^T X$ is symmetric, $D_b[b^T X^T X b] = 2b^T X^T X$, by a formula proved earlier. Setting $D_b[(y - Xb)^T(y - Xb)]$ equal to zero, we obtain the equation $0 = -2y^T X + 2b^T X^T X$, which implies that $b^T X^T X = y^T X$. Taking the transpose of both sides of this equation, we obtain $X^T X b = X^T y$. If the matrix $X^T X$ is invertible, then $b = (X^T X)^{-1} X^T y$. This is the formula for the least squares estimator.
The N-vector Xb is the projection of y onto the span of the columns of X. In order to see that this is so, we must show that $y - Xb$ is orthogonal to the columns of X. Since the columns of X are the rows of $X^T$, it is sufficient to show that $X^T(y - Xb) = 0$. However, $X^T(y - Xb) = X^T y - X^T X (X^T X)^{-1} X^T y = X^T y - X^T y = 0$.
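A numpy sketch of the formula and the orthogonality property (not from the notes; the data below are randomly generated for illustration):

```python
# Sketch (not from the notes): the least squares estimator b = (X^T X)^{-1} X^T y
# and the orthogonality X^T (y - Xb) = 0. Data are simulated here.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))        # 20 observations, K = 3 regressors
beta = np.array([1.0, -2.0, 0.5])   # made-up coefficients for the simulation
y = X @ beta + 0.1 * rng.normal(size=20)

b = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations X^T X b = X^T y

residual = y - X @ b
print(np.allclose(X.T @ residual, 0.0, atol=1e-8))  # True: Xb is the projection of y
```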
Proof: Let $\varphi : [0, 1] \to \mathbb{R}$ be defined by $\varphi(t) = f((1 - t)a + tb)$. $\varphi(0) = f(a)$. $\varphi(1) = f(b)$. By the chain rule, $d\varphi(t)/dt = Df((1 - t)a + tb)(b - a)$. By the mean value theorem for one variable, there exists a number $t_0$ such that $0 < t_0 < 1$ and $d\varphi(t_0)/dt = \varphi(1) - \varphi(0) = f(b) - f(a)$. Let $c = (1 - t_0)a + t_0 b$. Then $Df(c)(b - a) = f(b) - f(a)$.
MATH CAMP: Lecture 9
Terminology: Let $f : U \to \mathbb{R}^K$ be a differentiable function, where U is an open subset of $\mathbb{R}^N$. The matrix representation of $Df(x)$ is called the Jacobian matrix of f at x.
If $f : U \to \mathbb{R}$, the derivative $Df(x)$ is called the gradient of f at x, though the word "gradient" usually suggests the vector $\left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_N}(x)\right)$ that represents the linear functional $Df(x)$. This vector is sometimes denoted by $\nabla f(x)$.
We write
$$\frac{\partial}{\partial x_m}\left(\frac{\partial f}{\partial x_n}\right)(x) = \frac{\partial^2 f(x)}{\partial x_m \partial x_n}.$$
Then
$$\frac{\partial^2 f(x)}{\partial x_n \partial x_m}$$
exists and equals
$$\frac{\partial^2 f(x)}{\partial x_m \partial x_n}.$$
Remark: It follows that the matrix
$$\begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_N \partial x_1}(x) \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_N}(x) & \cdots & \frac{\partial^2 f}{\partial x_N \partial x_N}(x) \end{pmatrix}$$
is symmetric, if all the partial derivatives $\frac{\partial^2 f(x)}{\partial x_m \partial x_n}$ exist and are continuous functions of x.
Interpretation: Let $f : U \to \mathbb{R}$, where U is an open subset of $\mathbb{R}^N$. If $v \in \mathbb{R}^N$, $Df(x)(v) = \nabla_v f(x) = \sum_{n=1}^N v_n \frac{\partial f(x)}{\partial x_n}$ is the rate of change of f in the direction of v. We need an expression for the rate of change of $Df(x)(v)$ at $x = c$ in a direction $w \in \mathbb{R}^N$. This rate of change is
$$\nabla_w\left(\sum_{n=1}^N v_n \frac{\partial f(c)}{\partial x_n}\right) = \sum_{m=1}^N w_m \frac{\partial}{\partial x_m}\left(\sum_{n=1}^N v_n \frac{\partial f(c)}{\partial x_n}\right) = \sum_{m=1}^N \sum_{n=1}^N v_n w_m \frac{\partial^2 f(c)}{\partial x_m \partial x_n} = v^T D^2 f(c)\, w, \text{ where}$$
$$D^2 f(c) = \begin{pmatrix} \frac{\partial^2 f(c)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f(c)}{\partial x_N \partial x_1} \\ \vdots & & \vdots \\ \frac{\partial^2 f(c)}{\partial x_1 \partial x_N} & \cdots & \frac{\partial^2 f(c)}{\partial x_N \partial x_N} \end{pmatrix}.$$
This matrix is called the Hessian matrix. The function $v^T D^2 f(c) w$ of v and w is a bilinear form in v and w and may be written as $D^2 f(c)(v, w)$. The Hessian is the Jacobian of the function $Df : U \to \mathbb{R}^N$.
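A numeric sketch of the Hessian as a bilinear form (not from the notes; f is a made-up example): $v^T D^2 f(c)\, w$ should match a finite-difference approximation of the rate of change in direction w of the directional derivative in direction v.

```python
# Sketch (not from the notes): for the made-up function
# f(x1, x2) = x1**3 * x2 + x2**2, D2f(c) is computed by hand and
# v^T D2f(c) w is compared with a mixed finite difference.
import numpy as np

def f(x):
    return x[0]**3 * x[1] + x[1]**2

c = np.array([1.0, 2.0])
# second partials: 6*x1*x2, 3*x1**2 (mixed, equal both ways), 2
H = np.array([[6 * c[0] * c[1], 3 * c[0]**2],
              [3 * c[0]**2, 2.0]])

v = np.array([1.0, -1.0])
w = np.array([0.5, 2.0])
h = 1e-4
num = (f(c + h*v + h*w) - f(c + h*v) - f(c + h*w) + f(c)) / h**2
print(np.isclose(num, v @ H @ w, atol=1e-2))  # True
```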
$$D^3 f(c)(v, w, u) = \sum_{n=1}^N \sum_{m=1}^N \sum_{k=1}^N v_n w_m u_k \frac{\partial^3 f(c)}{\partial x_k \partial x_m \partial x_n}.$$
$$D^r f(c)(v_1, \ldots, v_r) = \sum_{n_1=1}^N \cdots \sum_{n_r=1}^N v_{1 n_1} \cdots v_{r n_r} \frac{\partial^r f(c)}{\partial x_{n_r} \cdots \partial x_{n_1}}.$$
Remarks:
1. When we think of $Df(x)$ as the gradient of f, we write it as the vector
$$\left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_N}(x)\right).$$
When we take the derivative of $Df(x)$, we think of $Df$ as a function from U to $\mathbb{R}^N$, and so write $Df(x)$ as a column vector
$$\begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_N} \end{pmatrix}.$$
The matrix representation of the derivative of this function is the Hessian matrix.
2. If the functions $\frac{\partial^2 f(x)}{\partial x_m \partial x_n}$ are continuous with respect to x, then the Hessian matrix is symmetric, so that $D^2 f(c)$ is a symmetric bilinear form.
Let $q = a + t(b - a)$ and substitute into the above equation. The theorem then follows.
Proof: $D^2 f(x)$ is negative definite if and only if the leading principal minors of $D^2 f(x)$ have determinants that alternate in sign, with the first being negative. These determinants are continuous functions of x. Therefore, if $D^2 f(c)$ is negative definite, $D^2 f(x)$ is negative definite for x close enough to c. That is, there is $\varepsilon > 0$ such that $D^2 f(x)$ is negative definite if $\|x - c\| < \varepsilon$. Suppose that $0 < \|x - c\| < \varepsilon$. By Taylor's theorem, there is a vector q on the line segment from c to x such that
$$f(x) = f(c) + Df(c)(x - c) + \frac{1}{2} D^2 f(q)(x - c, x - c).$$
Since $\|q - c\| \le \|x - c\| < \varepsilon$, $D^2 f(q)$ is negative definite and so $D^2 f(q)(x - c, x - c) < 0$. By assumption, $Df(c) = 0$. Therefore $f(x) < f(c)$.
Remark: Similarly, if $Df(c) = 0$ and $D^2 f(c)$ is positive definite, then f has a local minimum at c.
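A small sketch of the leading-principal-minor test used in the proof (not from the notes; the function is a made-up example):

```python
# Sketch (not from the notes): for f(x1, x2) = -x1**2 - 2*x2**2 the gradient
# vanishes at c = (0, 0) and the Hessian there, computed by hand, is below.
import numpy as np

hessian = np.array([[-2.0, 0.0],
                    [0.0, -4.0]])   # D^2 f(c) for this f

m1 = np.linalg.det(hessian[:1, :1])  # first leading principal minor: -2
m2 = np.linalg.det(hessian)          # second leading principal minor: 8
print(m1 < 0 and m2 > 0)  # True: signs alternate starting negative, so c is a local max
```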
This theorem says that f behaves locally like its derivative in that if the derivative
has an inverse at c, then f has an inverse on an open set containing c.
I now introduce the implicit function theorem by explaining what it says about linear functions. Let $T : \mathbb{R}^{N+K} \to \mathbb{R}^K$ be a linear function with $K \times (N + K)$ matrix representation C. Suppose that C has rank K. (Since the maximum rank of C is K, we may say that C has full rank.) Hence, C has K independent columns, so that $T(\mathbb{R}^{N+K}) = \mathbb{R}^K$ and T is onto. We may assume, without loss of generality, that the last K columns of C are independent. Write C as $C = (A \mid B)$, where A is a $K \times N$ matrix and B is a non-singular (that is, invertible) $K \times K$ matrix. Write a vector in $\mathbb{R}^{N+K}$ as $(x, y)$, where $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^K$. The equation $z = T(x, y)$ may be written as
$$z = C\begin{pmatrix} x \\ y \end{pmatrix} = (A \mid B)\begin{pmatrix} x \\ y \end{pmatrix} = Ax + By,$$
so that $By = z - Ax$. Solving for y, we obtain $y = B^{-1}(z - Ax)$. Let $H : \mathbb{R}^N \times \mathbb{R}^K \to \mathbb{R}^K$ be the linear transformation $H(x, z) = B^{-1}(z - Ax)$. Then $T(x, H(x, z)) = z$, since
$$T(x, H(x, z)) = (A \mid B)\begin{pmatrix} x \\ B^{-1}(z - Ax) \end{pmatrix} = Ax + BB^{-1}(z - Ax) = Ax + I(z - Ax) = Ax + z - Ax = z.$$
For each $z \in \mathbb{R}^K$, the set $T^{-1}(z)$ is the graph of $H(x, z)$ as a function of x with z fixed. The space $\mathbb{R}^{N+K}$ is thereby expressed as $\mathbb{R}^N \times \mathbb{R}^K$, and T is the projection of $\mathbb{R}^{N+K}$ onto $\mathbb{R}^K$ followed by an invertible linear transformation, with matrix representation B, from $\mathbb{R}^K$ to $\mathbb{R}^K$. In other words, $(x, y) \in \mathbb{R}^{N+K}$ equals $(x, H(x, z)) = (x, B^{-1}(z - Ax))$, for some $z \in \mathbb{R}^K$, and the point $(x, B^{-1}(z - Ax))$ projects onto $(0, B^{-1}z)$, which in turn is carried to $BB^{-1}z = z$. The following graphs may help you visualize this.
The implicit function theorem says that the same assertion applies locally to a differentiable function.
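The linear case can be checked directly with numpy (a sketch, not from the notes; the matrices and vectors are made-up numbers):

```python
# Sketch (not from the notes): the linear case of the implicit function
# theorem. With C = (A | B) and B invertible, H(x, z) = B^{-1}(z - Ax)
# satisfies T(x, H(x, z)) = Ax + B H(x, z) = z.
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # K x N block (N = K = 2 here)
B = np.array([[2.0, 0.0],
              [1.0, 1.0]])   # invertible K x K block

def H(x, z):
    return np.linalg.solve(B, z - A @ x)

x = np.array([1.0, -1.0])
z = np.array([3.0, 0.5])
y = H(x, z)
print(np.allclose(A @ x + B @ y, z))  # True: T(x, H(x, z)) = z
```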
of $Df(a, b)$ are linearly independent. Let x vary over $\mathbb{R}^N$ and y and z vary over $\mathbb{R}^K$. Then there is an open set U in $\mathbb{R}^N$ such that $a \in U$, there is an open set V in $\mathbb{R}^K$ such that $c = f(a, b) \in V$, and there is a $C^1$ function $h : U \times V \to \mathbb{R}^K$ such that $f(x, h(x, z)) = z$, for all $x \in U$ and $z \in V$, and $h(a, c) = b$. Also
$$D_x h(x, z) = -\left[D_y f(x, h(x, z))\right]^{-1} D_x f(x, h(x, z)).$$
MATH CAMP: Lecture 10
$$\max_{x \in \mathbb{R}^N} f(x, b) \quad \text{s.t. } (x, b) \in W,$$
where by $D_b f(h(b), b)$ I mean the derivative of $f(h(b), b)$ with respect to b with the first argument held fixed at $h(b)$. The equation $D[f(h(b), b)] = D_b f(h(b), b)$ is known as the envelope theorem.
Similarly, $\frac{\partial \pi}{\partial w}(p, w) = -L(p, w) = \frac{\partial}{\partial w}[p f(L) - wL]$ with L held fixed at $L = L(p, w)$.
Constrained Optimization
Let U be an open subset of $\mathbb{R}^N$ and let $f : U \to \mathbb{R}$ and, for $k = 1, \ldots, K$, $g_k : U \to \mathbb{R}$. Consider the problem:
$$\max_{x \in U} f(x) \quad \text{s.t. } g_k(x) = a_k, \text{ for } k = 1, \ldots, K.$$
Remark: The requirement that the vectors $Dg_1(x), \ldots, Dg_K(x)$ be linearly independent is called the constraint qualification.
Example: This example shows that the constraint qualification is necessary for the conclusion of the theorem. Let $N = 2$ and $f(x_1, x_2) = x_2$. Let $g_1(x_1, x_2) = x_1$ and $g_2(x_1, x_2) = x_1 - x_2^2$. Let $a_1 = a_2 = 0$. The only point satisfying the constraints $g_1(x_1, x_2) = 0 = g_2(x_1, x_2)$ is $(x_1, x_2) = (0, 0)$. This point therefore maximizes $f(x_1, x_2)$ subject to the constraints $g_1(x_1, x_2) = g_2(x_1, x_2) = 0$. Since $Df(0, 0) = (0, 1)$ and $Dg_1(0, 0) = Dg_2(0, 0) = (1, 0)$, it is clear that $Df(0, 0)$ is not a linear combination of $Dg_1(0, 0)$ and $Dg_2(0, 0)$.
Since $D_z g(y, z)$ is invertible, the implicit function theorem implies that there is an open set W in $\mathbb{R}^{N-K}$ such that $y \in W$ and there exists a $C^1$ function $h : W \to \mathbb{R}^K$ such that $h(y) = z$ and $g(y, h(y)) = a$, for all $y \in W$. Also
$$Dh(y) = -(D_z g(y, z))^{-1} D_y g(y, z),$$
so that
$$D_y f(y, z) = \lambda^T D_y g(y, z).$$
The equations $D_z f(y, z) = \lambda^T D_z g(y, z)$ and $D_y f(y, z) = \lambda^T D_y g(y, z)$ imply that $Df(x) = \lambda^T Dg(x)$, which is the same as $Df(x) = \sum_{k=1}^K \lambda_k Dg_k(x)$.
$Df(\bar x)$ must lie in the same line as $Dg(\bar x)$, where $\bar x$ is the constrained maximum, the constraint being $g(x) = 0$. This is so because the curve $g(x) = 0$ is tangent at $\bar x$ to the curve $f(x) = f(\bar x)$; $Dg(\bar x)$ is perpendicular at $\bar x$ to the curve $g(x) = 0$; and $Df(\bar x)$ is perpendicular at $\bar x$ to the curve $f(x) = f(\bar x)$.
The conditions
$$Df(\bar x) = \sum_{k=1}^K \lambda_k Dg_k(\bar x)$$
are called first-order conditions. The conditions $g_k(\bar x) = a_k$, for $k = 1, \ldots, K$, are called the constraints.
The function
$$L(x, \lambda, a) = f(x) - \sum_{k=1}^K \lambda_k [g_k(x) - a_k]$$
is called the Lagrangian. The conditions above may be written as
$$D_x L(x, \lambda, a) = 0$$
and
$$g_k(x_1, \ldots, x_N) = a_k, \text{ for } k = 1, \ldots, K,$$
which are the first order conditions and constraints, respectively.
The first order conditions are necessary conditions for optimality. I now state a theorem giving conditions that, together with the constraints and first order conditions, are sufficient for local optimality.
a) If $v^T D_x^2 L(\bar x, \lambda, a)v < 0$, for all $v \in Z(\bar x)$ such that $v \ne 0$, then f achieves a local maximum at $\bar x$ on $\{x \in \mathbb{R}^N \mid g(x) = a\}$.
b) If $v^T D_x^2 L(\bar x, \lambda, a)v > 0$, for all $v \in Z(\bar x)$ such that $v \ne 0$, then f achieves a local minimum at $\bar x$ on $\{x \in \mathbb{R}^N \mid g(x) = a\}$.
MATH CAMP: Lecture 11
I give an informal proof of the theorem giving second order conditions for a local optimum. Recall that $\bar x \in U$ is the optimum and $Z(\bar x) = \{v \in \mathbb{R}^N \mid Dg_k(\bar x)(v) = 0$, for $k = 1, \ldots, K\}$. The second order condition for a constrained local maximum is that $v^T D_x^2 L(\bar x, \lambda, a)v < 0$, for all $v \in Z(\bar x)$ such that $v \ne 0$. Let us accept that $v \in Z(\bar x)$, where $v \ne 0$, if and only if there is a twice differentiable function $\gamma : (-1, 1) \to U$ such that $\gamma(0) = \bar x$, $g_k(\gamma(t)) = a_k$, for $k = 1, \ldots, K$ and for all t, and $D\gamma(0) = v$.
Because $g_k(\gamma(t)) = a_k$, for all k and t, it follows that
$$Dg_k(\gamma(t))\, D\gamma(t) = 0,$$
for $k = 1, \ldots, K$. At $t = 0$, this equation becomes
$$Dg_k(\bar x)\, v = 0,$$
for $k = 1, \ldots, K$. Let us take the derivative with respect to t in the equation $Dg_k(\gamma(t))\,D\gamma(t) = 0$ and apply the chain rule and Leibniz's rule. Then,
$$D\gamma(t)^T D^2 g_k(\gamma(t))\, D\gamma(t) + Dg_k(\gamma(t))\, D^2\gamma(t) = 0,$$
where
$$D^2\gamma(t) = \begin{pmatrix} \frac{d^2}{dt^2}\gamma_1(t) \\ \vdots \\ \frac{d^2}{dt^2}\gamma_N(t) \end{pmatrix}.$$
We also know that
$$\frac{df(\gamma(t))}{dt} = Df(\gamma(t))\, D\gamma(t).$$
By the first order conditions for a constrained local optimum,
$$Df(\bar x) = \sum_{k=1}^K \lambda_k Dg_k(\bar x).$$
Sufficient conditions for a local maximum at $t = 0$ along the curve $\gamma(t)$ are that
$$\frac{df(\gamma(0))}{dt} = 0 \quad \text{and} \quad \frac{d^2 f(\gamma(0))}{dt^2} < 0.$$
However,
$$\frac{df(\gamma(0))}{dt} = Df(\gamma(0))\, D\gamma(0) = Df(\bar x)\, v = \sum_{k=1}^K \lambda_k Dg_k(\bar x)\, v = 0$$
and
$$\frac{d^2 f(\gamma(0))}{dt^2} = D\gamma(0)^T D^2 f(\gamma(0))\, D\gamma(0) + Df(\gamma(0))\, D^2\gamma(0)$$
$$= v^T D^2 f(\bar x)\, v + \sum_{k=1}^K \lambda_k Dg_k(\bar x)\, D^2\gamma(0)$$
$$= v^T D^2 f(\bar x)\, v - \sum_{k=1}^K \lambda_k v^T D^2 g_k(\bar x)\, v$$
$$= v^T \left[D^2 f(\bar x) - \sum_{k=1}^K \lambda_k D^2 g_k(\bar x)\right] v = v^T D_x^2 L(\bar x, \lambda, a)\, v.$$
Therefore, the inequality $v^T D_x^2 L(\bar x, \lambda, a)v < 0$ implies that $f(\gamma(t))$ has a local maximum at the point $t = 0$.
Let us assume that we know that f has a constrained local maximum at $\bar x$ if it has a local maximum along every curve $\gamma$ through $\bar x$ satisfying $g(\gamma(t)) = a$, for all t. This assertion would require proof, of course, in a rigorous argument. Despite the lack of rigor, I hope the preceding argument gives a good idea of why the theorem is true.
Remarks:
1. The condition $g(\bar x) = a$ is the constraint, where $g(x) = (g_1(x), \ldots, g_K(x))$.
2. The condition $Df(\bar x) = \lambda^T Dg(\bar x)$ is a first order condition for local optimality.
3. The constraint and first order conditions are necessary for local optimality.
4. The condition $v^T D_x^2 L(\bar x, \lambda, a)v < 0$, for all non-zero $v \in Z(\bar x)$, is a second order condition for a local maximum.
5. The condition $v^T D_x^2 L(\bar x, \lambda, a)v > 0$, for all non-zero $v \in Z(\bar x)$, is a second order condition for a local minimum.
6. The constraint, the first order condition and the second order condition together are sufficient for local optimality.
I next use the implicit function theorem to show that the optimal value of x and the value of the corresponding vector of Lagrange multipliers may be written as functions of the K-vector a. We begin by writing down the necessary conditions for a constrained optimum,
$$Df(x) - \lambda^T Dg(x) = 0$$
$$-g(x) + a = 0.$$
These may be written as the equation $F(x, \lambda, a) = 0$, where
$$F(x, \lambda, a) = \begin{pmatrix} Df(x) - \lambda^T Dg(x) \\ -g(x) + a \end{pmatrix}.$$
I apply the implicit function theorem to F to show that x and $\lambda$ are functions of a. In order to do so, I must show that $D_{(x,\lambda)} F(x, \lambda, a)$ has rank $N + K$. If we calculate this derivative, we see that
$$D_{(x,\lambda)} F(x, \lambda, a) = \begin{pmatrix} D^2 f(x) - \sum_{k=1}^K \lambda_k D^2 g_k(x) & -Dg(x)^T \\ -Dg(x) & 0 \end{pmatrix} = \begin{pmatrix} D_x^2 L(x, \lambda, a) & -Dg(x)^T \\ -Dg(x) & 0 \end{pmatrix}.$$
This matrix is called the bordered Hessian. It is the Jacobian of $F(x, \lambda, a)$ considered as a function of x and $\lambda$.
Assume the constraint qualification that $Dg(x)$ has rank K, for all x, and suppose that f achieves a local maximum at $\bar x$ on $\{x \in \mathbb{R}^N \mid g(x) = a\}$ and that $\bar x$ satisfies the second order condition for a local maximum. Let $\bar\lambda$ be the corresponding vector of Lagrange multipliers. Then, $v^T D_x^2 L(\bar x, \bar\lambda)v < 0$, if $Dg(\bar x)v = 0$, for any non-zero N-vector v. I show that under these conditions the bordered Hessian is non-singular, i.e., that it has rank $N + K$. I must show that if
$$\begin{pmatrix} D_x^2 L & -(Dg)^T \\ -Dg & 0 \end{pmatrix}\begin{pmatrix} v \\ w \end{pmatrix} = 0, \text{ then } \begin{pmatrix} v \\ w \end{pmatrix} = 0,$$
where $v \in \mathbb{R}^N$ and $w \in \mathbb{R}^K$. Since
$$\begin{pmatrix} D_x^2 L & -(Dg)^T \\ -Dg & 0 \end{pmatrix}\begin{pmatrix} v \\ w \end{pmatrix} = 0, \qquad (\star)$$
we have
$$0 = (v^T, w^T)\begin{pmatrix} D_x^2 L & -(Dg)^T \\ -Dg & 0 \end{pmatrix}\begin{pmatrix} v \\ w \end{pmatrix} = v^T(D_x^2 L)v - v^T(Dg)^T w - w^T(Dg)v = v^T(D_x^2 L)v - 2w^T(Dg)v = v^T(D_x^2 L)v.$$
The last equation holds because $Dg(\bar x)v = 0$. The second order conditions satisfied at $\bar x$ imply that $v^T D_x^2 L(\bar x, \bar\lambda)v < 0$, unless $v = 0$. Therefore, $v = 0$. Equation $(\star)$ implies that $(D_x^2 L)v - (Dg)^T w = 0$. Hence, $(Dg)^T w = 0$, so that $w^T Dg(\bar x) = 0$. Since $Dg(\bar x)$ has rank K, the K rows of $Dg(\bar x)$ are independent and so $w = 0$. Since I have shown that $(v, w) = 0$, it follows that the bordered Hessian has rank $N + K$.
By the implicit function theorem, there exists an open set V in $\mathbb{R}^K$ such that $\bar a \in V$ and there exist continuously differentiable functions $x(a)$ and $\lambda(a)$ such that $x(\bar a) = \bar x$, $\lambda(\bar a) = \bar\lambda$, $Df(x(a)) - \lambda(a)^T Dg(x(a)) = 0$, and $-g(x(a)) + a = 0$, for all $a \in V$.
Sufficient conditions for a local maximum at x are that $g(x) = a$, $Df(x) = \lambda^T Dg(x)$, and $v^T D_x^2 L(x, \lambda)v < 0$, for all $v \in \mathbb{R}^N$ such that $v \ne 0$ and $Dg(x)(v) = 0$. These conditions hold at $(\bar x, \bar\lambda)$. Therefore they hold for $(x, \lambda)$ close enough to $(\bar x, \bar\lambda)$ and such that $g(x) = a$, for some a, since $Dg(x)$ and $D_x^2 L(x, \lambda)$ depend continuously on x. Since $x(a)$ and $\lambda(a)$ are continuous functions, these conditions hold at $(x(a), \lambda(a))$, if a is close enough to $\bar a$. Therefore, we can assume that they hold for $a \in V$, by making V smaller, if necessary. Hence, we may assume that $x(a)$ is a local maximum, for all $a \in V$.
Observe that
$$g(x(a)) = a \implies D_x g(x(a))\, Dx(a) = I,$$
and so
$$D_a f(x(a)) = D_x f(x(a))\, Dx(a) = \lambda(a)^T D_x g(x(a))\, Dx(a) = \lambda(a)^T.$$
This shows that $\lambda_k(a)$ is the marginal value of increasing $a_k$, for all k.
The Envelope Theorem for Constrained Optimization: Consider the problem
$$\max_x f(x, b) \quad \text{s.t. } g(x, c) = a.$$
The constraints and first order conditions for arbitrary parameter values a, b, c are
$$D_x f(x, b) - \lambda^T D_x g(x, c) = 0$$
$$-g(x, c) + a = 0.$$
Let
$$F(x, \lambda, a, b, c) = \begin{pmatrix} D_x f(x, b) - \lambda^T D_x g(x, c) \\ -g(x, c) + a \end{pmatrix}.$$
We know that $D_{(x,\lambda)} F(\bar x, \bar\lambda, \bar a, \bar b, \bar c)$ is invertible, so that by the implicit function theorem applied to the equation $F(x, \lambda, a, b, c) = 0$ there exist locally defined $C^1$ functions $x(a, b, c)$ and $\lambda(a, b, c)$ such that $x(a, b, c)$ is a local maximum with corresponding Lagrange multipliers $\lambda(a, b, c) \in \mathbb{R}^K$. Also $x(\bar a, \bar b, \bar c) = \bar x$ and $\lambda(\bar a, \bar b, \bar c) = \bar\lambda$.
I will show that the derivative of the optimal value $f(x(a, b, c), b)$ with respect to each parameter equals the corresponding derivative of the Lagrangian with x and $\lambda$ held fixed at their optimal values. First, differentiating the identity
$$g(x(a, b, c), c) = a$$
with respect to a gives $D_x g(\bar x, \bar c)\, D_a x(\bar a, \bar b, \bar c) = I$, so that
$$D_a f(x(a, b, c), b)\big|_{a = \bar a} = D_x f(\bar x, \bar b)\, D_a x(\bar a, \bar b, \bar c) = \bar\lambda^T D_x g(\bar x, \bar c)\, D_a x(\bar a, \bar b, \bar c) = \bar\lambda^T.$$
Second, differentiating the same identity with respect to b gives $D_x g(\bar x, \bar c)\, D_b x(\bar a, \bar b, \bar c) = 0$, so that the indirect effect of b on the optimal value vanishes and only the direct effect $D_b f(\bar x, \bar b)$ remains. Third, differentiating the identity with respect to c gives
$$D_x g(\bar x, \bar c)\, D_c x(\bar a, \bar b, \bar c) = -D_c g(\bar x, \bar c),$$
so that
$$D_c f(x(a, b, c), b)\big|_{c = \bar c} = D_x f(\bar x, \bar b)\, D_c x(\bar a, \bar b, \bar c) = \bar\lambda^T D_x g(\bar x, \bar c)\, D_c x(\bar a, \bar b, \bar c) = -\bar\lambda^T D_c g(\bar x, \bar c).$$
These three steps together imply the envelope theorem for constrained maximization. The same equation applies if max is replaced by min in the problem above.
Example:
$$\max_{\substack{x_n > 0 \\ n = 1, \ldots, N}} [\alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N]$$
$$\text{s.t. } p_1 x_1 + \cdots + p_N x_N = w,$$
$$L(x_1, \ldots, x_N, \lambda) = \alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N - \lambda(p_1 x_1 + \cdots + p_N x_N - w).$$
The first order conditions are
$$\frac{\alpha_n}{x_n} - \lambda p_n = 0, \text{ for all } n.$$
Hence,
$$p_n x_n = \lambda^{-1}\alpha_n, \text{ for all } n.$$
$$\sum_{n=1}^N \lambda^{-1}\alpha_n = \sum_{n=1}^N p_n x_n = w.$$
$$\lambda^{-1} = \frac{w}{\sum_{k=1}^N \alpha_k}.$$
$$p_n x_n = \frac{\alpha_n}{\sum_{k=1}^N \alpha_k}\, w, \text{ for all } n.$$
$$x_n = \frac{\alpha_n}{\sum_{k=1}^N \alpha_k}\, \frac{w}{p_n}, \text{ for all } n.$$
$$\lambda = \frac{\sum_{k=1}^N \alpha_k}{w}.$$
$\lambda$ is known as the marginal utility of wealth.
The utility function
$$u(x_1, \ldots, x_N) = \alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N$$
and the utility function
$$v(x_1, \ldots, x_N) = x_1^{\alpha_1} \cdots x_N^{\alpha_N}$$
are known as Cobb–Douglas utility functions. u and v give rise to the same demand functions. A fraction $\alpha_n / \sum_{k=1}^N \alpha_k$ of wealth is spent on commodity n. These fractions add up to one.
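A quick sketch of the demand formula and the budget identity (not from the notes; the weights, prices, and wealth are made up for illustration):

```python
# Sketch (not from the notes): Cobb-Douglas demand
# x_n = (a_n / sum_k a_k) * (w / p_n), which exhausts the budget.
alpha = [1.0, 2.0, 1.0]   # made-up utility weights
p = [2.0, 1.0, 4.0]       # made-up prices
w = 60.0                  # made-up wealth

s = sum(alpha)
x = [a / s * w / pn for a, pn in zip(alpha, p)]

spending = sum(pn * xn for pn, xn in zip(p, x))
print(x, spending)  # spending equals w = 60; fraction a_n/s of wealth per good
```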
If N = 2, we have the following picture.
Let $F(p_1, \ldots, p_N, w, \alpha_1, \ldots, \alpha_N)$ denote the maximum value of the problem
$$\max [\alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N]$$
$$\text{s.t. } p_1 x_1 + p_2 x_2 + \cdots + p_N x_N - w = 0,$$
and let
$$L(x_1, \ldots, x_N, \lambda; p_1, \ldots, p_N, w, \alpha_1, \ldots, \alpha_N) = \alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N - \lambda(p_1 x_1 + \cdots + p_N x_N - w).$$
By the envelope theorem,
$$\frac{\partial F}{\partial w} = \frac{\partial L}{\partial w} = \lambda,$$
$$\frac{\partial F}{\partial p_n} = -\lambda x_n,$$
$$\frac{\partial F}{\partial \alpha_n} = \ln x_n.$$
Convex Analysis: So far, we have had necessary or sufficient conditions for local maxima or minima. Now we turn to conditions for global maxima or minima. First, we need some definitions.
MATH CAMP: Lecture 12
Therefore,
$$\frac{d}{d\lambda} f(a + \lambda(b - a))\Big|_{\lambda = 0} \ge f(b) - f(a) > 0.$$
However,
$$\frac{d}{d\lambda} f(a + \lambda(b - a))\Big|_{\lambda = 0} = Df(a)(b - a) = 0.$$
This contradiction proves that there is no $b \in A$ such that $f(b) > f(a)$.
Remark: $f : A \to \mathbb{R}$ is convex if and only if $-f$ is concave.
Proof: I prove only (1). Let $a \in A$ and $b \in A$, where $a < b$. I first prove (1) for $N = 1$. Since $\frac{d^2}{dx^2}f(x) \le 0$, $\frac{df(x)}{dx}$ is a non-increasing function. That is, if $x < y$, then $\frac{df(x)}{dx} \ge \frac{df(y)}{dx}$.
If f is not concave, then there is a $c = \alpha a + (1 - \alpha)b$, where $0 < \alpha < 1$, such that $f(c) < \alpha f(a) + (1 - \alpha)f(b)$. By the mean value theorem, there is a $c_1$ such that $a < c_1 < c$ and
$$\frac{df(c_1)}{dx} = \frac{f(c) - f(a)}{c - a} = \frac{f(c) - f(a)}{\alpha a + (1 - \alpha)b - a} < \frac{\alpha f(a) + (1 - \alpha)f(b) - f(a)}{(1 - \alpha)(b - a)} = \frac{(1 - \alpha)[f(b) - f(a)]}{(1 - \alpha)(b - a)} = \frac{f(b) - f(a)}{b - a}.$$
Similarly, there is a $c_2$ such that $c < c_2 < b$ and
$$\frac{df(c_2)}{dx} = \frac{f(b) - f(c)}{b - c} = \frac{f(b) - f(c)}{\alpha(b - a)} > \frac{f(b) - \alpha f(a) - (1 - \alpha)f(b)}{\alpha(b - a)} = \frac{\alpha[f(b) - f(a)]}{\alpha(b - a)} = \frac{f(b) - f(a)}{b - a}.$$
Therefore,
$$\frac{df(c_2)}{dx} > \frac{f(b) - f(a)}{b - a} > \frac{df(c_1)}{dx},$$
which is impossible since $c_2 > c_1$. This contradiction proves the theorem for the case $N = 1$.
The following diagram illustrates the argument:
Now, consider the case $N > 1$. Let
$$g(\alpha) = f(\alpha a + (1 - \alpha)b) = f(b + \alpha(a - b)).$$
Then
$$\frac{dg(\alpha)}{d\alpha} = Df(\alpha a + (1 - \alpha)b)(a - b)$$
and
$$\frac{d^2 g(\alpha)}{d\alpha^2} = (a - b)^T D^2 f(\alpha a + (1 - \alpha)b)(a - b) \le 0,$$
since $D^2 f(x)$ is negative semi-definite, for all x. Therefore, by what has been proved for the case $N = 1$, g is concave, and the concavity of f follows.
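A numeric sketch of the conclusion (not from the notes; f is a made-up concave function): a function with negative semi-definite Hessian lies above its chords.

```python
# Sketch (not from the notes): f(x1, x2) = -x1**2 - x2**2 + x1*x2 has
# Hessian [[-2, 1], [1, -2]], which is negative definite, so
# f(t*a + (1-t)*b) >= t*f(a) + (1-t)*f(b) for t in [0, 1].
def f(x1, x2):
    return -x1**2 - x2**2 + x1 * x2

a, b = (1.0, 2.0), (-3.0, 0.5)   # made-up endpoints
ok = True
for i in range(11):
    t = i / 10
    mid = f(t * a[0] + (1 - t) * b[0], t * a[1] + (1 - t) * b[1])
    chord = t * f(*a) + (1 - t) * f(*b)
    ok = ok and mid >= chord - 1e-12
print(ok)  # True: the graph lies above every chord
```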
Definition: $x \in \mathbb{R}^N$ is said to be feasible if $x \in C$ and $g_k(x) \le a_k$, for all k.
Suppose that $\bar x$ is feasible, that $\lambda \in \mathbb{R}_+^K$ is such that, for $k = 1, \ldots, K$, $\lambda_k = 0$ if $g_k(\bar x) < a_k$, and that $\bar x$ solves the problem $\max_{x \in C}\left[f(x) - \sum_{k=1}^K \lambda_k g_k(x)\right]$. Then $\bar x$ solves problem $(\ast)$.
Suppose that $\bar x$ solves problem $(\ast)$ and that the following constraint qualification is satisfied: there exists an $\hat x \in C$ such that $g_k(\hat x) < a_k$, for all k. Then there exists $\lambda \in \mathbb{R}_+^K$ such that, for all k, $\lambda_k = 0$ if $g_k(\bar x) < a_k$, and $\bar x$ solves the problem $\max_{x \in C}\left[f(x) - \sum_{k=1}^K \lambda_k g_k(x)\right]$.
The conditions that $\lambda_k = 0$ if $g_k(\bar x) < a_k$, for all k, are called the complementary slackness conditions.
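A tiny worked instance of the sufficiency direction (a sketch, not from the notes; the numbers are made up):

```python
# Sketch (not from the notes): maximize f(x) = -(x - 2)**2 subject to
# g(x) = x <= 1, with C = R. At the optimum x = 1 the constraint binds and
# the multiplier is lam = 2 >= 0, so complementary slackness holds and
# x = 1 maximizes the unconstrained function f(x) - lam * g(x).
lam = 2.0

def lagrangian(x):
    # f(x) - lam*g(x) = -(x - 2)**2 - 2x; derivative -2x + 2 vanishes at x = 1
    return -(x - 2.0)**2 - lam * x

xs = [i / 100 for i in range(-300, 300)]
best = max(xs, key=lagrangian)
print(abs(best - 1.0) < 0.01)  # True: the Lagrangian peaks at the constrained optimum
```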
Notice that the problem
$$\max_{x \in C}\left[f(x) - \sum_{k=1}^K \lambda_k g_k(x)\right]$$
is unconstrained except to the extent that x belongs to C. In this sense, the Kuhn–Tucker theorem converts a constrained maximization problem to an unconstrained one.
The function
$$L(x, \lambda) = f(x) - \sum_{k=1}^K \lambda_k g_k(x) \quad \text{or} \quad L(x, \lambda, a) = f(x) - \sum_{k=1}^K \lambda_k [g_k(x) - a_k]$$
is called the Lagrangian.
$$\max_{x \in \mathbb{R}_+^N} u(x_1, \ldots, x_N) \quad \text{s.t. } p_1 x_1 + \cdots + p_N x_N \le w.$$
Since u is increasing, the problem of maximizing $u(x) - \lambda p \cdot x$ has no solution unless $\lambda > 0$, and so $\lambda > 0$. The complementary slackness condition therefore implies that $p \cdot x = w$.
The units of $\lambda$ are utiles per dollar. The units of $u(x) - \lambda p \cdot x$ are utiles. $u(x) - \lambda p \cdot x$ is consumer's surplus measured in utiles.
In general, if the maximization problem is
\[
\max_{x \in C} f(x) \quad \text{s.t. } g_k(x) \le a_k, \text{ for } k = 1, \ldots, K,
\]
then f(x) − Σ_{k=1}^K λ_k g_k(x) is a kind of surplus. The quantity λ_k g_k(x) is the cost of the kth constraint.
Example:
\[
\max_{x \in R^2} x_2 \quad \text{s.t. } -x_1 + x_2^2 \le 0 \text{ and } x_1 + x_2^2 \le 0.
\]
This example does not satisfy the constraint qualification, because x = 0 is the only feasible vector. Because 0 is the only feasible point, it is also the optimum. There are no non-negative numbers λ₁ and λ₂ such that x = 0 and λ₁ and λ₂ maximize the Lagrangian, for suppose there were. Then, x = 0 would solve the problem
\[
\max_{x \in R^2}\,[x_2 - \lambda_1(-x_1 + x_2^2) - \lambda_2(x_1 + x_2^2)],
\]
and the derivative of the objective function of this problem would be zero at x = 0. Setting the partial derivative with respect to x₂ equal to zero at x₂ = 0, we find that 1 − 2(λ₁ + λ₂)·0 = 0, which is impossible.
The next figure should make clear why the Kuhn–Tucker theorem does not apply to this example. The region where −x₁ + x₂² ≤ 0 is labeled as g₁(x) ≤ 0, and the region where x₁ + x₂² ≤ 0 is labeled as g₂(x) ≤ 0. The Lagrangian is f(x) − λ₁g₁(x) − λ₂g₂(x). If the Lagrangian is maximized at x = 0, then the derivative of the Lagrangian is zero at x = 0. That is, Df(0) = λ₁Dg₁(0) + λ₂Dg₂(0). The derivative Df(0) points straight upward, and Dg₁(0) and Dg₂(0) are horizontal. Since a vertical vector cannot be a linear combination of horizontal vectors, the Lagrangian is not maximized at the optimal value of x.
The situation changes when the example is modified so as to satisfy the constraint qualification, as in the next example.
\[
\max_{x \in R^2} x_2 \quad \text{s.t. } -x_1 + x_2^2 \le 1 \text{ and } x_1 + x_2^2 \le 1.
\]
Proof of the Sufficiency of the Kuhn–Tucker conditions for optimality: The feasibility of x̄ and the complementary slackness conditions imply that
\[
f(\bar{x}) = f(\bar{x}) - \sum_{k=1}^{K} \lambda_k\,[g_k(\bar{x}) - a_k] = L(\bar{x}, \lambda, a).
\]
If x ∈ C is such that g_k(x) ≤ a_k, for all k, then because λ_k ≥ 0, for all k,
\[
f(x) \le f(x) - \sum_{k=1}^{K} \lambda_k\,[g_k(x) - a_k]
\le f(\bar{x}) - \sum_{k=1}^{K} \lambda_k\,[g_k(\bar{x}) - a_k] = f(\bar{x}),
\]
where the second inequality holds because x̄ maximizes the Lagrangian over C. All these inequalities together imply that f(x) ≤ f(x̄), if x is feasible. That is, x̄ solves problem (∗).
Proof: I show that A is convex. Let a and ā ∈ A and λ ∈ (0, 1). There exist an x and an x̄ in C such that g_k(x) ≤ a_k and g_k(x̄) ≤ ā_k, for all k. Because C is convex, λx + (1−λ)x̄ ∈ C. Because g_k is convex, for all k,
\[
g_k(\lambda x + (1-\lambda)\bar{x}) \le \lambda g_k(x) + (1-\lambda)g_k(\bar{x}) \le \lambda a_k + (1-\lambda)\bar{a}_k, \text{ for all } k.
\]
Therefore λa + (1−λ)ā ∈ A, and so A is convex.
I show that V is concave. Let a and ā belong to A, λ ∈ (0, 1), and let ε > 0. There exist x and x̄ in C such that g_k(x) ≤ a_k and g_k(x̄) ≤ ā_k, for all k, and V(a) − ε < f(x) ≤ V(a) and V(ā) − ε < f(x̄) ≤ V(ā). By the argument made above, λx + (1−λ)x̄ ∈ C and g_k(λx + (1−λ)x̄) ≤ λa_k + (1−λ)ā_k, for all k. Because f is concave,
\[
V(\lambda a + (1-\lambda)\bar{a}) \ge f(\lambda x + (1-\lambda)\bar{x}) \ge \lambda f(x) + (1-\lambda)f(\bar{x})
> \lambda(V(a) - \varepsilon) + (1-\lambda)(V(\bar{a}) - \varepsilon) = \lambda V(a) + (1-\lambda)V(\bar{a}) - \varepsilon.
\]
Since ε is arbitrarily small, V(λa + (1−λ)ā) ≥ λV(a) + (1−λ)V(ā), and so V is concave.
Remarks:
1. If f is differentiable and p is a subgradient of f at ā, then p = Df(ā).
2. If f is differentiable and concave, then Df(ā) is a subgradient of f at ā.
Proof of (2): The function g(a) = f(ā) + Df(ā)(a − ā) − f(a) is convex and Dg(ā) = Df(ā) − Df(ā) = 0. Therefore, g achieves a global minimum at ā. Clearly, g(ā) = 0, so that f(ā) + Df(ā)(a − ā) − f(a) = g(a) ≥ 0, for all a, and hence Df(ā) is a subgradient of f at ā.
3. If f is concave and not differentiable at ā, then f may have several subgradients at ā, as in the next figure.
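Remark 2 is easy to check numerically. The function in the sketch below is my own choice, not from the text: for the concave, differentiable function f(x) = log(1 + x), the tangent line at ā should lie above the graph everywhere on the domain — exactly the subgradient inequality.

```python
import math

# Check of Remark 2 on an example of my own: for the concave, differentiable
# function f(a) = log(1 + a), the gradient at a_bar is a subgradient:
#     f(a_bar) + f'(a_bar) * (a - a_bar) >= f(a)   for all a in the domain.

def f(a):
    return math.log(1 + a)

def df(a):
    return 1 / (1 + a)

a_bar = 1.0
points = [k / 100 for k in range(501)]          # a in [0, 5]
# smallest gap between the tangent line and the function; should be >= 0,
# with equality at a = a_bar
gap = min(f(a_bar) + df(a_bar) * (a - a_bar) - f(a) for a in points)
```

The minimum gap is (numerically) zero, attained at ā itself; it is strictly positive away from ā because f is strictly concave.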
Then, λ is a subgradient of V at ā.
Proof: I must show that if a ∈ A, then V(a) − Σ_{k=1}^K λ_k a_k ≤ V(ā) − Σ_{k=1}^K λ_k ā_k. By the sufficiency of the Kuhn–Tucker conditions for optimality, V(ā) = f(x̄). By assumption, for all k, g_k(x̄) ≤ ā_k and λ_k ≥ 0 and λ_k = 0, if g_k(x̄) < ā_k. It follows that Σ_{k=1}^K λ_k g_k(x̄) = Σ_{k=1}^K λ_k ā_k. Therefore,
\[
V(\bar{a}) - \sum_{k=1}^{K} \lambda_k \bar{a}_k = f(\bar{x}) - \sum_{k=1}^{K} \lambda_k g_k(\bar{x}).
\]
Because x̄ solves problem (∗∗), f(x̄) − Σ_{k=1}^K λ_k g_k(x̄) ≥ f(x) − Σ_{k=1}^K λ_k g_k(x), if x ∈ C. Suppose that x ∈ C is such that g_k(x) ≤ a_k, for all k. Then, f(x) − Σ_{k=1}^K λ_k g_k(x) ≥ f(x) − Σ_{k=1}^K λ_k a_k. Putting all these equations and inequalities together, we see that
\[
V(\bar{a}) - \sum_{k=1}^{K} \lambda_k \bar{a}_k \ge f(x) - \sum_{k=1}^{K} \lambda_k a_k,
\]
if x ∈ C is such that g_k(x) ≤ a_k, for all k. From the definition of V(a), it follows that V(ā) − Σ_{k=1}^K λ_k ā_k ≥ V(a) − Σ_{k=1}^K λ_k a_k.
Remark: The theorem says that the Kuhn–Tucker coefficient λ_k is a marginal value of increasing a_k, just as are the Lagrange multipliers in the differentiable case.
\[
\max_{x \in C} f(x) \quad \text{s.t. } g_k(x) \le a_k, \text{ for } k = 1, \ldots, K.
\]
Proof: First of all, I show that λ_k ≥ 0, for all k. If a and a′ are such that a_k ≥ a′_k, for all k, then V(a) ≥ V(a′), since any x ∈ C that satisfies g_k(x) ≤ a′_k, for all k, also satisfies g_k(x) ≤ a_k, for all k. Without loss of generality, I may assume that k = 1. Let ā be defined by ā₁ = a₁ + 1 and ā_k = a_k, if k ≥ 2. Because λ is a subgradient of V at a, V(ā) ≤ V(a) + λ·(ā − a) = V(a) + λ₁, and V(ā) ≥ V(a), where the last inequality follows because ā_k ≥ a_k, for all k. Hence λ₁ ≥ 0.
I next show that, for all k, λ_k = 0, if g_k(x̄) < a_k. Let ā_k = g_k(x̄), for all k. Because ā_k ≤ a_k, for all k, V(ā) ≤ V(a). Since g(x̄) = ā, V(ā) ≥ f(x̄) = V(a). Therefore, V(ā) = f(x̄) = V(a). Since λ is a subgradient of V at a, V(a) + λ·(ā − a) ≥ V(ā) = V(a), so that λ·(ā − a) ≥ 0. Since, for all k, λ_k ≥ 0 and ā_k − a_k ≤ 0, it follows that λ·(ā − a) ≤ 0, and so λ·(ā − a) = 0. Since λ_k ≥ 0 and ā_k ≤ a_k, for all k, it follows that λ_k = 0, if ā_k < a_k. That is, λ_k = 0, if g_k(x̄) < a_k.
I now show that x̄ solves the problem max_{x∈C}[f(x) − Σ_{k=1}^K λ_k g_k(x)]. Because λ is a subgradient of V at a, V(a′) ≤ V(a) + λ·(a′ − a), for all a′ ∈ A. Because λ_k = 0, if g_k(x̄) < a_k, and g_k(x̄) ≤ a_k, for all k, λ·g(x̄) = λ·a. Let x ∈ C and a′ = g(x). Then x is feasible for the problem
\[
\max_{x' \in C} f(x') \quad \text{s.t. } g(x') \le a',
\]
so that f(x) ≤ V(a′), and hence
\[
f(x) - \lambda \cdot g(x) = f(x) - \lambda \cdot a' \le V(a') - \lambda \cdot a' \le V(a) - \lambda \cdot a = f(\bar{x}) - \sum_{k=1}^{K} \lambda_k g_k(\bar{x}).
\]
Since x was an arbitrary element of C, x̄ solves the problem max_{x∈C}[f(x) − Σ_{k=1}^K λ_k g_k(x)].
Definition: If X is a set of N-vectors, the interior of X, written as int X, is the set of x ∈ X such that, for some ε > 0, X contains every N-vector within distance ε of x.
The Minkowski separation theorem gives conditions under which two sets of N-vectors may be separated by a non-zero N-vector.
In the next figure, the 2-vector p separates the sets X and Y. The dotted line H is a hyperplane perpendicular to p that comes between X and Y. A hyperplane in N-space is a set of the form a + W, where a is an N-vector and W is a subspace of dimension N − 1. If p separates X and Y, there is a hyperplane that comes between X and Y. For this reason, the Minkowski separation theorem is often referred to as the theorem of the separating hyperplane.
The next figure illustrates why the sets X and Y are assumed to be convex in the statement of the theorem. The set X is not convex, has non-empty interior, and clearly cannot be separated from Y.
The next figure illustrates why it is assumed in the theorem that one of the sets
to be separated has non-empty interior. Both X and Y are convex, but both have
empty interior, so that the interior of one set does not intersect the other, though
the sets themselves intersect. The sets clearly cannot be separated.
The Necessity of the Kuhn–Tucker Conditions: I now prove that if the constraint qualification applies at a and if x̄ solves the maximization problem
\[
\max_{x \in C} f(x) \quad \text{s.t. } g_k(x) \le a_k, \text{ for } k = 1, \ldots, K,
\]
then there exists a K-vector λ ≥ 0 such that x̄ and λ satisfy the Kuhn–Tucker conditions.
Proof: By an earlier theorem, it is sufficient to prove that the value function V has a subgradient λ = (λ₁, ..., λ_K) at a.
Let G = {(a′, t) | a′ ∈ A, t ≤ V(a′)} and let Q = {(a′, t) | a′ ∈ R^K, a′ ≤ a, and t ≥ V(a)}. The set G is the set of all points on or below the graph of V. This set is convex, since the function V is concave and the set A is convex. It is clear from the definition of Q that it is convex and has non-empty interior. Because V is non-decreasing, the set G does not intersect the interior of Q. The next figure should help you visualize these sets.
It follows from the Minkowski separation theorem that there exists a non-zero (K + 1)-vector v = (v₁, ..., v_K, s) such that
\[
v \cdot (a', t') \le v \cdot (a'', t''), \text{ for all } (a', t') \in Q \text{ and } (a'', t'') \in G. \tag{1}
\]
I show that v_k ≥ 0, for all k. Without loss of generality, I may let k = 1. Let ā = a + (1, 0, ..., 0). Clearly ā belongs to A. Because V is non-decreasing, V(ā) ≥ V(a), so that (ā, V(a)) belongs to G. Because (a, V(a)) belongs to Q, inequality 1 implies that
\[
v \cdot (a, V(a)) \le v \cdot (\bar{a}, V(a)).
\]
By the definition of ā,
\[
v \cdot (\bar{a}, V(a)) = v \cdot (a, V(a)) + v_1.
\]
Substituting this equation into the previous inequality, we see that v₁ ≥ 0.
I next show that s ≤ 0. Because (a, V(a) + 1) belongs to Q and (a, V(a)) belongs to G, inequality 1 implies that v·(a, V(a)) + s ≤ v·(a, V(a)), so that s ≤ 0. Suppose that s = 0. By the constraint qualification, there is an x̂ ∈ C such that g_k(x̂) < a_k, for all k. Let ā = g(x̂). Then ā ∈ A, and (ā, V(ā)) belongs to G, so that inequality 1 implies that
\[
\sum_k v_k a_k \le \sum_k v_k \bar{a}_k.
\]
Because a_k > ā_k, for all k, and v_k ≥ 0, for all k, it follows that v_k = 0, for k = 1, ..., K. Hence v = 0, since s = 0. This is impossible, since v ≠ 0. This contradiction proves that s < 0.
For k = 1, ..., K, let λ_k = −v_k/s. Then, λ_k ≥ 0, for all k, and the vector v may be replaced, as a separating vector, by the vector (−λ₁, ..., −λ_K, 1) = (1/s)v. That is,
\[
-\sum_k \lambda_k a'_k + t' \ge -\sum_k \lambda_k a''_k + t'', \text{ for all } (a', t') \in Q \text{ and } (a'', t'') \in G. \tag{2}
\]
In order to see that the subgradient inequality holds, notice that (a, V(a)) ∈ Q and (ā, V(ā)) ∈ G, for any ā ∈ A, so that by inequality 2
\[
-\sum_k \lambda_k a_k + V(a) \ge -\sum_k \lambda_k \bar{a}_k + V(\bar{a}).
\]
That is,
\[
\sum_k \lambda_k \bar{a}_k - V(\bar{a}) \ge \sum_k \lambda_k a_k - V(a),
\]
so that V(ā) ≤ V(a) + λ·(ā − a), for all ā ∈ A. Hence λ is a subgradient of V at a.
MATH CAMP: Lecture 13
I believe the creators of this subject found from experience that piecewise continuous
functions were the appropriate class of control functions. In many examples, the set U
is compact and the optimal control stays on the boundary of U, moving continuously
most of the time but flipping occasionally from one side of U to another.
[Figure: a piecewise continuous control u on [0, T] with jumps at t₁, t₂, and t₃; below it, the graph of the function f on [0, 2].]
\[
f(t) =
\begin{cases}
0, & \text{if } 0 \le t < 1, \\
\sin\dfrac{1}{t-1}, & \text{if } 1 < t \le 2.
\end{cases}
\]
Notation: Let A denote the set of admissible controls. That is, A = {u : [0, T] → U | T > 0 and u is piecewise continuous}. Note that U is fixed but T is variable. Assume that the function f : R^N × U → R and the functions g_n : R^N × U → R, for n = 1, ..., N, are continuously differentiable with respect to x₁, ..., x_N and continuous with respect to x₁, ..., x_N, u₁, ..., u_K. Let g(x, u) = (g₁(x, u), ..., g_N(x, u)). The problem under consideration is
\[
\max_{u \in A} \int_0^T f(x(t), u(t))\,dt
\]
\[
\text{s.t. } \frac{dx(t)}{dt} = g(x(t), u(t)), \text{ for all } t,\quad x(0) = x_0 \text{ and } x(T) = x_1.
\]
We seek necessary conditions for optimality. I will not prove that the conditions to be stated are necessary, but I will derive them in a suggestive but non-rigorous way using the Kuhn–Tucker theorem. This approach has the advantage of helping familiarize you with the Kuhn–Tucker theorem. Assume that f and the functions g_n are concave, as they often are in economic problems. Divide the time interval [0, T] into M short intervals of length Δt, where Δt = T/M. Assume that N = K = 1, so that x(t) and u(t) are real numbers. Let us look at the behavior of x and u at times mΔt, for m = 0, 1, ..., M. Our problem is approximately the following
\[
\max_{\substack{u(0), u(\Delta t), \ldots, u((M-1)\Delta t) \in U \\ x(0), x(\Delta t), \ldots, x(M\Delta t) \in \mathbb{R}}}\; \sum_{m=0}^{M-1} \Delta t\, f(x(m\Delta t), u(m\Delta t)) \tag{$**$}
\]
\[
\text{s.t. } x(m\Delta t) - x((m-1)\Delta t) \le \Delta t\, g(x((m-1)\Delta t), u((m-1)\Delta t)), \text{ for } m = 1, \ldots, M,
\]
\[
x(0) \le x_0, \text{ and } -x(M\Delta t) \le -x_1.
\]
Because the objective function is concave and the constraint functions are convex, the Kuhn–Tucker theorem implies that there exist non-negative numbers λ(0), ..., λ(M) and β such that the solution of problem (∗∗) maximizes the corresponding Lagrangian.
I now simplify the expression for the Lagrangian by using an analogue of integration by parts. Recall the fundamental theorem of calculus, which asserts that if the function f : [0, T] → R is differentiable, then ∫₀ᵗ (df(s)/ds) ds = f(t) − f(0), for all t ∈ [0, T]. Integration by parts is based on the equations
\[
F(T)G(T) - F(0)G(0) = \int_0^T \frac{d}{dt}[F(t)G(t)]\,dt
= \int_0^T \left[\frac{dF(t)}{dt}\,G(t) + F(t)\,\frac{dG(t)}{dt}\right] dt
= \int_0^T \frac{dF(t)}{dt}\,G(t)\,dt + \int_0^T F(t)\,\frac{dG(t)}{dt}\,dt.
\]
The first equation follows from the fundamental theorem of calculus. The second follows from Leibniz's rule for the differentiation of the product of two functions. Hence
\[
\int_0^T \frac{dF(t)}{dt}\,G(t)\,dt = F(T)G(T) - F(0)G(0) - \int_0^T F(t)\,\frac{dG(t)}{dt}\,dt.
\]
We are now dealing with differences; the derivative is replaced by the first difference, and the integral is replaced by a sum. If y₁, y₂, ... is a sequence, the first difference of this sequence is the sequence Δy₁, Δy₂, ..., where Δy_m = y_{m+1} − y_m, for all m. Leibniz's rule for first differences is that
\[
\Delta(x_m y_m) = x_m \Delta y_m + y_{m+1} \Delta x_m.
\]
Summing this equation from m = 1 to m = M gives
\[
x_{M+1}y_{M+1} - x_1 y_1 = \sum_{m=1}^{M} x_m \Delta y_m + \sum_{m=1}^{M} y_{m+1} \Delta x_m,
\]
which is the analogue of integration by parts. This equation implies that
\[
-\sum_{m=1}^{M} y_{m+1}\Delta x_m = \sum_{m=1}^{M} x_m \Delta y_m - x_{M+1}y_{M+1} + x_1 y_1.
\]
Let x_m = x((m − 1)Δt) and y_m = λ(m − 1). Then the equation just derived implies that
\[
-\sum_{m=1}^{M} \lambda(m)\,[x(m\Delta t) - x((m-1)\Delta t)]
= -\sum_{m=1}^{M} \lambda(m)\,\Delta x((m-1)\Delta t)
= \sum_{m=1}^{M} x((m-1)\Delta t)\,\Delta\lambda(m-1) - \lambda(M)x(M\Delta t) + \lambda(0)x(0).
\]
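The summation-by-parts identity can be verified directly on arbitrary sequences. The check below is a sketch of my own, not part of the text: it evaluates both sides on random sequences and confirms that they agree.

```python
import random

# Check of the summation-by-parts identity used above:
#   -sum_{m=1}^{M} y_{m+1} * (x_{m+1} - x_m)
#       = sum_{m=1}^{M} x_m * (y_{m+1} - y_m) - x_{M+1}*y_{M+1} + x_1*y_1

random.seed(0)
M = 50
x = [random.uniform(-1, 1) for _ in range(M + 2)]   # entries x[1..M+1] used
y = [random.uniform(-1, 1) for _ in range(M + 2)]   # entries y[1..M+1] used

lhs = -sum(y[m + 1] * (x[m + 1] - x[m]) for m in range(1, M + 1))
rhs = (sum(x[m] * (y[m + 1] - y[m]) for m in range(1, M + 1))
       - x[M + 1] * y[M + 1] + x[1] * y[1])
```

The two sides agree up to floating-point rounding, for any choice of the sequences.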
\[
\Delta\lambda(m) + \Delta t\,\frac{\partial}{\partial x}\bigl[f(x(m\Delta t), u(m\Delta t)) + \lambda(m+1)\,g(x(m\Delta t), u(m\Delta t))\bigr] = 0, \text{ for } m = 0, \ldots, M-1.
\]
Hence
\[
\frac{\lambda(m+1) - \lambda(m)}{\Delta t} = -\frac{\partial}{\partial x}\bigl[f(x(m\Delta t), u(m\Delta t)) + \lambda(m+1)\,g(x(m\Delta t), u(m\Delta t))\bigr].
\]
Think of λ as a function of the continuous time variable t, so that λ(t) is its value at time t. Replace λ(m) by λ(mΔt) and let Δt go to zero while increasing m so that mΔt converges to t. The previous equation becomes
\[
\frac{\lambda((m+1)\Delta t) - \lambda(m\Delta t)}{\Delta t} = -\frac{\partial}{\partial x}\bigl[f(x(m\Delta t), u(m\Delta t)) + \lambda((m+1)\Delta t)\,g(x(m\Delta t), u(m\Delta t))\bigr].
\]
Taking the limit as Δt goes to zero, we obtain the equation
\[
\frac{d\lambda(t)}{dt} = -\frac{\partial}{\partial x}\bigl[f(x(t), u(t)) + \lambda(t)\,g(x(t), u(t))\bigr], \text{ for all } t.
\]
Similarly, u(t) solves the problem
\[
\max_{u \in U}\,[f(x(t), u) + \lambda(t)\,g(x(t), u)], \text{ for all } t.
\]
Finally, if we assume that all the constraints hold with equality at the optimum, we see that
\[
\frac{dx(t)}{dt} = g(x(t), u(t)), \text{ for all } t.
\]
These conclusions may be summarized using the Hamiltonian function, which is the continuous time instantaneous analogue of the Lagrangian for problems with finitely many variables. (In continuous time, there are infinitely many variables, namely, x(t) and u(t) for all t.) The Hamiltonian function is defined to be
\[
H(x, u, \lambda) = f(x, u) + \lambda \cdot g(x, u).
\]
If we assume that x(t) and u(t) are numbers rather than vectors, then
\[
H(x, u, \lambda) = f(x, u) + \lambda\,g(x, u).
\]
That is, there is no inner product. In this case, we may summarize our intuitively derived findings as follows:
\[
\frac{dx(t)}{dt} = \frac{\partial}{\partial\lambda} H(x(t), u(t), \lambda(t)),
\]
\[
\frac{d\lambda(t)}{dt} = -\frac{\partial}{\partial x} H(x(t), u(t), \lambda(t)), \text{ and}
\]
u(t) maximizes H(x(t), u, λ(t)) with respect to u ∈ U, for all t.
The equations dx/dt = ∂H/∂λ and dλ/dt = −∂H/∂x are called the Hamiltonian system. The fact that u(t) maximizes H(x(t), u, λ(t)) with respect to u is called the maximum principle. The maximum principle implies that ∂H/∂u(x(t), u(t), λ(t)) = 0.
Moreover, if u(t) is differentiable,
\[
\frac{d}{dt}H(x(t), u(t), \lambda(t)) = \frac{\partial H}{\partial x}\frac{dx}{dt} + \frac{\partial H}{\partial u}\frac{du}{dt} + \frac{\partial H}{\partial\lambda}\frac{d\lambda}{dt}
= \frac{\partial H}{\partial x}\frac{\partial H}{\partial\lambda} + (0)\frac{du}{dt} + \frac{\partial H}{\partial\lambda}\left(-\frac{\partial H}{\partial x}\right) = 0,
\]
so that H is constant along the optimal path. (This is so because f and g do not depend directly on time.) This result is true even if the function u is not differentiable.
If x(t) and u(t) are vectors, then the above statements remain true. More precisely, a necessary condition for optimality is that there exist piecewise differentiable functions λ₁(t), ..., λ_N(t) such that (λ₁(t), ..., λ_N(t)) ≠ 0, for all t, and such that if
\[
H(x_1, \ldots, x_N, u_1, \ldots, u_K, \lambda_1, \ldots, \lambda_N)
= f(x_1, \ldots, x_N; u_1, \ldots, u_K) + \sum_{n=1}^{N} \lambda_n g_n(x_1, \ldots, x_N; u_1, \ldots, u_K),
\]
then
\[
\frac{dx_n(t)}{dt} = \frac{\partial}{\partial\lambda_n} H(x(t), u(t), \lambda(t)) \quad\text{and}\quad
\frac{d\lambda_n(t)}{dt} = -\frac{\partial}{\partial x_n} H(x(t), u(t), \lambda(t)),
\]
for n = 1, ..., N, and u(t) solves the problem max_{u∈U} H(x(t), u, λ(t)), for all t. Furthermore,
\[
\frac{d}{dt}H(x(t), u(t), \lambda(t)) = 0.
\]
These statements are true even if f and the g_n are not concave. The variables x₁(t), ..., x_N(t) are the state variables. The variables λ₁(t), ..., λ_N(t) are called dual, conjugate, costate, or auxiliary variables.
From Kuhn–Tucker theory we know that the Kuhn–Tucker coefficients are subgradients of the value function. Let
\[
W_0(x_0) = \max_{u \in A} \int_0^T f(x(s), u(s))\,ds \tag{$***$}
\]
\[
\text{s.t. } \frac{dx(s)}{ds} = g(x(s), u(s)), \text{ for all } s,\quad x(0) = x_0 \text{ and } x(T) = x_T.
\]
Then, λ(0) is the derivative of W₀, if W₀ is differentiable. Let (x̄, ū) solve the maximization problem (∗∗∗), and let λ̄(t), for 0 ≤ t ≤ T, be the conjugate function
corresponding to (x̄, ū). Let
\[
W_t(x) = \max_{u \in A} \int_t^T f(x(s), u(s))\,ds
\]
\[
\text{s.t. } \frac{dx(s)}{ds} = g(x(s), u(s)), \text{ for all } s,\quad x(t) = x \text{ and } x(T) = x_T.
\]
Then
\[
\bar{\lambda}(t) = DW_t(\bar{x}(t)).
\]
The maximum principle and Hamiltonian system determine the evolution of
(x(t), λ(t), u(t)) and hence determine (x(t), λ(t), u(t)), for all t, given appropriate
initial conditions for the differential equations governing the evolution of x(t) and
λ(t). Since x(t) and λ(t) each have N components, 2N initial conditions are re-
quired. These are provided by the N components of x(0) and the N components of
x(T ). These statements may become clearer when considering the following example.
The Hamiltonian is H(K, C, λ) = u(C) + λ[f(K) − C]. If λ(t) is the dual variable, then the evolution conditions are
\[
\frac{dK(t)}{dt} = \frac{\partial H}{\partial\lambda} = f(K(t)) - C(t),
\]
\[
\frac{d\lambda(t)}{dt} = -\frac{\partial H}{\partial K} = -\lambda(t)\,\frac{df(K(t))}{dK},
\]
and C = C(t) maximizes H(K(t), C, λ(t)) = u(C) + λ(t)[f(K(t)) − C], for all t. The first two conditions are the Hamiltonian system. The third condition follows from the maximum principle.
The third condition implies that if C(t) > 0, then
\[
0 = \frac{\partial H}{\partial C}(K(t), C(t), \lambda(t)) = \frac{du(C(t))}{dC} - \lambda(t).
\]
That is,
\[
\lambda(t) = \frac{du(C(t))}{dC}.
\]
The equations du(C(0))/dC = λ(0) and dK(0)/dt = f(K(0)) − C(0) tie the initial values of λ and dK/dt to the initial consumption C(0). Because the Hamiltonian is constant along the optimal path,
\[
u(C(t)) + \lambda(t)\,\frac{dK(t)}{dt} = \text{constant}.
\]
If C(t) > 0, so that λ(t) = du(C(t))/dC, then
\[
u(C(t)) + \frac{du(C(t))}{dC}\,\frac{dK(t)}{dt} = \text{constant}.
\]
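The constancy of the Hamiltonian can be checked by integrating the Hamiltonian system numerically. The parametrization below is my own, not from the text: u(C) = log C and f(K) = bK, so the maximum principle gives C = 1/λ, and a simple Euler scheme should hold H(K(t), C(t), λ(t)) nearly constant, up to discretization error.

```python
import math

# Numerical check (my own parametrization) that H is constant along a path
# of the Hamiltonian system.  With u(C) = log C and f(K) = b*K, the maximum
# principle gives du/dC = 1/C = lambda, hence C = 1/lambda, and
#     H(K, C, lambda) = log C + lambda * (b*K - C).

b = 0.3
K, lam = 1.0, 1.0
dt, steps = 1e-5, 100_000           # integrate on [0, 1]

def hamiltonian(K, lam):
    C = 1.0 / lam
    return math.log(C) + lam * (b * K - C)

H0 = hamiltonian(K, lam)
for _ in range(steps):
    C = 1.0 / lam
    K += dt * (b * K - C)           # dK/dt = dH/dlambda = f(K) - C
    lam += dt * (-lam * b)          # dlambda/dt = -dH/dK = -lambda * f'(K)
H1 = hamiltonian(K, lam)
drift = abs(H1 - H0)                # near zero: only Euler truncation error
```

With these initial conditions H(0) = −0.7, and the drift after integrating to t = 1 is on the order of the step size.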
MATH CAMP: Lecture 14
I now show that if C(0) > 0, then C(t) > 0, for all t, and C(t) is forever increasing. If C(0) > 0, then λ(0) = du(C(0))/dC > 0. Since dλ(t)/dt = −λ(t)[df(K(t))/dK] and df(K)/dK > 0, it follows that dλ(t)/dt < 0, as long as λ(t) > 0. The equation λ(t) = du(C(t))/dC implies that λ(t) > 0 and implies by implicit differentiation that
\[
\frac{dC(t)}{dt} = \left[\frac{d^2 u(C(t))}{dC^2}\right]^{-1}\frac{d\lambda(t)}{dt}.
\]
Because dλ(t)/dt < 0 and d²u(C(t))/dC² < 0, this equation implies that dC(t)/dt > 0.
I next show that if dK(t)/dt = 0, then dK(t)/dt is decreasing. Because dK(t)/dt = f(K(t)) − C(t), it follows that
\[
\frac{d^2 K(t)}{dt^2} = \frac{df(K(t))}{dK}\frac{dK(t)}{dt} - \frac{dC(t)}{dt} = -\frac{dC(t)}{dt} < 0, \text{ when } \frac{dK(t)}{dt} = 0.
\]
Consider now the example with u(C) = −(a/2)(C̄ − C)² and f(K) = bK. The constancy of the Hamiltonian along an optimal path implies that
\[
\dot{K}^2 - (bK - \bar{C})^2 = \mu,
\]
for some constant μ. This equation may be rewritten as
\[
\dot{K}^2 - b^2(K - \bar{K})^2 = \mu,
\]
where K̄ = C̄/b is the bliss capital stock. Since C̄ = bK̄, K̄ is the amount of capital needed to produce the bliss consumption C̄.
Suppose that μ = 0 and let y = b(K − K̄). Then ẏ = bK̇, where ẏ = dy(t)/dt. Hence
\[
\frac{\dot{y}^2}{b^2} = \dot{K}^2 = b^2(K - \bar{K})^2 = y^2,
\]
so that ẏ² = b²y², and so ẏ = by or ẏ = −by. If ẏ = by, then y(t) = y(0)e^{bt}. If ẏ = −by, then y(t) = y(0)e^{−bt}. Therefore, either
\[
K(t) = \bar{K} + (K(0) - \bar{K})e^{bt} \quad\text{or}\quad K(t) = \bar{K} + (K(0) - \bar{K})e^{-bt}.
\]
The arrows indicate the direction of motion. This may be determined by the sign of K̇, which gives the direction of motion of K. Notice that the solution on the upper left branch of the cross in the diagram corresponds to y(0) < 0, so that K(t) increases toward K̄ as t increases.
Now suppose that μ ≠ 0. The equation K̇² − b²(K − K̄)² = μ describes a hyperbola. The branches of the hyperbolas are shown in the diagram, and they approach the lines K̇ = ±b(K − K̄) asymptotically. The movement along the curves is indicated by the arrows. Again the direction of motion along a curve is determined by the sign of K̇. Curves that are farther from the abscissa correspond to larger values of |K̇| and hence to faster movement.
Now, fix K₀ and K₁ as in the next diagram. Five possible paths from K₀ to K₁ are shown, indicated by 1, 2, 3, 4, and 5. The initial and end points of these paths are indicated by heavy dots.
The higher is a path, the faster is the movement along the path and hence the more quickly it reaches K₁, unless the path overshoots, increases K above K₁, and then falls back to K₁. The path labeled 1 goes from K₀ to K₁ most quickly and among the labeled paths has the lowest value of T. The path 2 along the line K̇ = b(K̄ − K) is the next fastest and so has the next lowest value of T. The path 3 from the first to the second dot is the next fastest. The path 4 is next, and then what probably comes next is the path 3 from the first to the third dot. This path overshoots. A path can be made to last an arbitrarily long time by starting with K̇ sufficiently close to b(K̄ − K₀) but less than b(K̄ − K₀). These paths overshoot, spend a long time near K̄, and then fall back to K₁. The path 2 from the first dot to the third dot is an example of such a path. The graph of K versus time for such a path looks like the following diagram.
As T is increased, the initial value K̇(0) must approach the number b(K̄ − K₀) from below, and the proportion of the time spent near K̄ increases to one. This property of lingering near K̄ is not specific to this example, but applies quite generally to growth models. It is called the "turnpike property," where the term "turnpike" stems from the resemblance of the diagram to a map of a superhighway with entrance and exit ramps. In general models, the bliss level of capital is replaced by the stationary optimal level of capital, which is the level of capital such that an optimal program would stay at that level if it started there and was to end there.
The relation between T and the initial value of K̇ may be visualized as follows. Let K̇_A(0) and K̇_B(0) be as in the above figure. The next figure shows the time T to go from K₀ to K₁ along the hyperbolic paths shown in the above figure, as a function of the initial value of K̇, call it K̇(0).
There are two branches to this function. The lower branch is the time to the first time T such that K(T) = K₁. The upper branch is the time to the second time T such that K(T) = K₁. This branch goes to infinity as K̇(0) approaches K̇_B(0) from the left. Neither branch exists for K̇(0) to the left of K̇_A(0), since K₁ cannot be reached from K₀ if K̇(0) < K̇_A(0).
The question arises as to how a path such as that labeled 4 in the diagram could cross the abscissa from above, if K̇(t) = dK(t)/dt vanishes there. Recall that I have shown for the more general growth model that d²K(t)/dt² < 0, if dK(t)/dt = 0. This reasoning depended on the assumption that du(C)/dC > 0, whereas the utility function considered here, u(C) = −(a/2)(C̄ − C)², has negative slope if C > C̄. Notice, however, that if K < K̄, then total output, bK, is less than bK̄ = C̄. We know that
\[
C(t) + \frac{dK(t)}{dt} = bK(t).
\]
Therefore, if dK(t)/dt ≥ 0, then C(t) ≤ bK(t) < bK̄ = C̄, so that du(C(t))/dC > 0. Hence the reasoning made before applies along a path on which dK(t)/dt is non-negative, or even if it is negative but exceeds b(K(t) − K̄). We may therefore conclude that along a path such as that labeled 4 in the figure,
\[
\frac{dC(t)}{dt} > 0,
\]
and hence
\[
\frac{d^2 K(t)}{dt^2} = b\,\frac{dK(t)}{dt} - \frac{dC(t)}{dt} < b\,\frac{dK(t)}{dt},
\]
and hence d²K(t)/dt² < 0 at the time when the path crosses the abscissa.
Clearly a different reasoning applies to paths in the quadrant to the right of the point (K̄, 0) in the diagram, for the optimal paths there cross the abscissa from below, so that d²K(t)/dt² > 0 when a path crosses the axis. In this quadrant, however, K > K̄, so that consumption must exceed C̄ when the path is near the abscissa and hence dK(t)/dt is nearly zero. The utility function is decreasing at such high consumption levels, so that a reasoning opposite to that made earlier implies that if dK(t)/dt is nearly zero, then
\[
\lambda(t) = \frac{du(C(t))}{dC} < 0,
\]
and so
\[
\frac{d\lambda(t)}{dt} = -\lambda(t)\,\frac{df(K(t))}{dK} > 0,
\]
and so
\[
\frac{dC(t)}{dt} = \left[\frac{d^2 u(C(t))}{dC^2}\right]^{-1}\frac{d\lambda(t)}{dt} < 0,
\]
and hence
\[
\frac{d^2 K(t)}{dt^2} = b\,\frac{dK(t)}{dt} - \frac{dC(t)}{dt} > 0,
\]
when dK(t)/dt = 0.
The Contraction Mapping Theorem
I will soon discuss dynamic programming with discrete time. That theory uses the
contraction mapping theorem, which I now present.
Let X be a compact subset of R^N and let C(X) = {f : X → R | f is continuous}. If f ∈ C(X), let ‖f‖ = max_{x∈X} |f(x)|. Because X is compact and f is continuous, ‖f‖ exists.
Suppose that the functions f_K ∈ C(X) converge uniformly to f, that is, that lim_{K→∞} ‖f_K − f‖ = 0. Then f is continuous: if x_n → x and ε > 0, choose K so that ‖f_K − f‖ < ε/3; then, for n sufficiently large,
\[
|f(x_n) - f(x)| \le |f(x_n) - f_K(x_n)| + |f_K(x_n) - f_K(x)| + |f_K(x) - f(x)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\]
Since ε is arbitrarily small, lim_{n→∞} f(x_n) = f(x).
Lemma: C(X) with the norm ‖·‖ is complete. That is, if f_n is a sequence in C(X) that is Cauchy with respect to ‖·‖, then there is an f ∈ C(X) such that lim_{n→∞} f_n = f.
Proof: f_n is Cauchy if, for every ε > 0, there is an N such that ‖f_n − f_m‖ < ε, if n ≥ N and m ≥ N. If f_n is Cauchy, then for each x ∈ X, the sequence f_n(x) is Cauchy. Since the real numbers are complete, there exists an f(x) ∈ R such that lim_{n→∞} f_n(x) = f(x). Therefore, there is a function f : X → R such that lim_{n→∞} f_n(x) = f(x), for all x ∈ X. I show that lim_{n→∞} ‖f_n − f‖ = 0. Let ε > 0. Since f_n is Cauchy, there exists N such that ‖f_n − f_m‖ < ε/2, if n ≥ N and m ≥ N. If x ∈ X, there is a k depending on x such that k ≥ N and |f_k(x) − f(x)| < ε/2. If n ≥ N, then
\[
|f_n(x) - f(x)| \le |f_n(x) - f_k(x)| + |f_k(x) - f(x)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
Since x is an arbitrary member of X, ‖f_n − f‖ ≤ ε, if n ≥ N, so that lim_{n→∞} ‖f_n − f‖ = 0. By the previous argument, f is continuous, and hence f ∈ C(X).
Definition: If Q : C(X) → C(X), a fixed point of Q is an f ∈ C(X) such that Q(f) = f.
Definition: Q : C(X) → C(X) is a contraction with coefficient β, where 0 < β < 1, if ‖Q(f) − Q(g)‖ ≤ β‖f − g‖, for all f and g in C(X).
Contraction Mapping Theorem: If Q : C(X) → C(X) is a contraction, then Q has a unique fixed point.
Proof: Let f ∈ C(X). Since Q is a contraction with coefficient β, ‖Q^{m+1}(f) − Q^m(f)‖ ≤ β^m‖Q(f) − f‖, for all m. Hence, if n > m ≥ N,
\[
\|Q^n(f) - Q^m(f)\| \le \|Q^n(f) - Q^{n-1}(f)\| + \|Q^{n-1}(f) - Q^{n-2}(f)\| + \cdots + \|Q^{m+1}(f) - Q^m(f)\|
\]
\[
\le (\beta^{n-1} + \beta^{n-2} + \cdots + \beta^m)\,\|Q(f) - f\|
\le (\beta^{n-1} + \beta^{n-2} + \cdots + \beta^N)\,\|Q(f) - f\|
\]
\[
\le (\cdots + \beta^{N+1} + \beta^N)\,\|Q(f) - f\|
= \frac{\beta^N}{1-\beta}\,\|Q(f) - f\| \to 0, \text{ as } N \to \infty.
\]
Therefore, the sequence Qⁿ(f) is Cauchy with respect to ‖·‖. Therefore, by the previous lemma, there exists a g ∈ C(X) such that lim_{n→∞} ‖Qⁿ(f) − g‖ = 0. Since ‖Q^{n+1}(f) − Q(g)‖ ≤ β‖Qⁿ(f) − g‖, it follows that lim_{n→∞} ‖Q^{n+1}(f) − Q(g)‖ = 0, which means that lim_{n→∞} ‖Qⁿ(f) − Q(g)‖ = 0. Since lim_{n→∞} ‖Qⁿ(f) − g‖ = 0, it follows that g = Q(g), and hence g is a fixed point of Q.
I now show that the fixed point is unique. Suppose that f and g are fixed points of Q and that f ≠ g. Then ‖f − g‖ > 0, and because Q is a contraction with coefficient β,
\[
\|f - g\| = \|Q(f) - Q(g)\| \le \beta\,\|f - g\|,
\]
which is impossible since β < 1. This contradiction proves that Q cannot have two distinct fixed points.
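The geometric convergence in the proof is easy to see numerically. The operator below is a toy contraction on C([0, 1]) of my own choosing, not from the text: Q(f)(x) = sin(x) + f(x)/2 has contraction coefficient 1/2 and fixed point f*(x) = 2 sin x, and on a grid the sup-norm error halves at every iteration.

```python
import math

# Toy contraction on C([0, 1]) (my own example): Q(f)(x) = sin(x) + 0.5*f(x).
# ||Q(f) - Q(g)|| = 0.5 * ||f - g||, and the fixed point solves
# f*(x) = sin(x) + 0.5*f*(x), i.e. f*(x) = 2*sin(x).

xs = [i / 100 for i in range(101)]
f = [0.0] * len(xs)                 # start the iteration from f = 0
errors = []                         # sup-norm distance to the fixed point
for _ in range(40):
    f = [math.sin(x) + 0.5 * fx for x, fx in zip(xs, f)]
    errors.append(max(abs(fx - 2 * math.sin(x)) for x, fx in zip(xs, f)))
```

The recorded errors decrease by exactly the factor 1/2 each step (up to rounding), as the chain of inequalities in the proof predicts.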
Dynamic Programming
I do not explain dynamic programming in general, but present its ideas using a growth model, which I now define. Let there be N commodities. A vector x in R^N will be thought of as a commodity vector in that its nth component, x_n, represents a quantity of commodity n, for n = 1, ..., N. Assume that because the earth is bounded, there is a number b such that no more than b units of any commodity could ever be produced. Let B = {x ∈ R^N | 0 ≤ x_n ≤ b, for all n}. Let u : B → R be a utility function and F : B → B be a production function. A consumption vector is denoted by c, an output vector by y, and a capital vector by K. Assume that u and F are continuous. The optimal growth problem is
\[
\max_{(c_0, c_1, \ldots)} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}
\]
\[
\text{s.t., for some } y_1, y_2, \ldots \text{ in } B,\quad 0 \le c_t \le y_t \text{ and } y_{t+1} = F(y_t - c_t), \text{ for all } t,
\]
where y₀ in B is given and 0 < β < 1. The number β is a discount factor, and y₀ is the vector of initial stocks of goods.
The quantity y_t − c_t is the vector of production inputs or capital in period t and could be denoted by k_t. In the above problem, the consumptions c_t are control variables and the output vectors y_t are state variables.
The value function for the above problem is
\[
V(y_0) = \max_{(c_0, c_1, \ldots)} \sum_{t=0}^{\infty} \beta^t u(c_t)
\]
\[
\text{s.t., for some } y_1, y_2, \ldots \text{ in } B,\quad 0 \le c_t \le y_t \text{ and } y_{t+1} = F(y_t - c_t), \text{ for all } t \ge 0,
\]
where y₀ in B is given.
Observe that this maximization problem can be written as
\[
V(y_0) = \max_{c_0:\, 0 \le c_0 \le y_0}\Biggl[u(c_0) + \beta\Biggl[\max_{\substack{(c_1, c_2, \ldots):\, 0 \le c_t \le y_t, \text{ for } t \ge 1, \\ y_1 = F(y_0 - c_0), \text{ and} \\ y_t = F(y_{t-1} - c_{t-1}), \text{ for } t \ge 2}} \sum_{t=1}^{\infty} \beta^{t-1} u(c_t)\Biggr]\Biggr].
\]
That is, the maximization problem from time 1 on is just like that from time 0 on, except that the initial stock of goods may be different. Therefore,
\[
V(y_0) = \max_{c_0:\, 0 \le c_0 \le y_0}\,[u(c_0) + \beta V(F(y_0 - c_0))].
\]
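The last equation suggests computing V by iterating the map W ↦ max_c [u(c) + βW(F(y − c))], which is shown later in these notes to be a contraction. The sketch below is a discretized one-good example with choices of my own, not from the text: u(c) = √c, F(k) = √k, β = 0.9, and a grid on [0, 1]; the sup-norm gaps between successive iterates shrink at least by the factor β.

```python
# Value function iteration for a one-good, discretized version of the growth
# model (my own choices: u(c) = c**0.5, F(k) = k**0.5, beta = 0.9, grid on
# [0, 1]).  F is rounded to the nearest grid point so that the Bellman
# operator Q maps grid functions to grid functions.

N = 101
beta = 0.9
grid = [i / (N - 1) for i in range(N)]
Fmap = [round((k ** 0.5) * (N - 1)) for k in grid]   # grid index of F(k)

def u(c):
    return c ** 0.5

def Q(V):
    # Q(V)(y) = max over capital k on the grid, 0 <= k <= y, of
    #           u(y - k) + beta * V(F(k))
    return [max(u(grid[i] - grid[j]) + beta * V[Fmap[j]] for j in range(i + 1))
            for i in range(N)]

V = [0.0] * N
gaps = []                     # sup-norm distances between successive iterates
for _ in range(150):
    V_new = Q(V)
    gaps.append(max(abs(a - b) for a, b in zip(V_new, V)))
    V = V_new
```

The iterates converge geometrically, and the limit V is non-decreasing in y, consistent with the theory developed below.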
MATH CAMP: Lecture 15
\[
\max_{(c_0, c_1, \ldots)} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}
\]
\[
\text{s.t., for some } y_1, y_2, \ldots \text{ in } B,\quad 0 \le c_t \le y_t \text{ and } y_{t+1} = F(y_t - c_t), \text{ for all } t.
\]
Define the operator Q on C(B) by
\[
Q(W)(y) = \max_{c:\, 0 \le c \le y}\,[u(c) + \beta W(F(y - c))],
\]
where W ∈ C(B) and y ∈ B. It may then be shown that the unique fixed point V of Q is the value function for problem (1), so that this problem has a solution. I use this approach, though the first approach is just as good.
I must first show that Q maps C(B) to C(B). If W ∈ C(B), then u(c) + βW(F(y − c)) is a continuous function of c. Because c varies over the compact set {c | 0 ≤ c ≤ y}, it follows that the maximum appearing in the definition of Q exists.
I must show that Q(W)(y) is a continuous function of y. The proof I give uses the concept of uniform continuity.
Lemma: If B is compact and f : B → R is continuous, then f is uniformly continuous; that is, for every ε > 0, there is a δ > 0 such that |f(x) − f(y)| < ε whenever x and y belong to B and ‖x − y‖ < δ.
Proof: Let ε > 0. Because f is continuous, for each b in B, there is a positive number δ(b) such that |f(x) − f(b)| < ε/2, if ‖x − b‖ < 2δ(b). If B_{δ(b)}(b) = {x ∈ B | ‖x − b‖ < δ(b)}, then {B_{δ(b)}(b) | b ∈ B} is an open cover of B. By the Heine–Borel theorem, there is a finite subcover, B_{δ(b₁)}(b₁), ..., B_{δ(b_K)}(b_K). Let δ = min_{k=1,...,K} δ(b_k). Then δ > 0. If ‖x − y‖ < δ, where x and y both belong to B, then x ∈ B_{δ(b_k)}(b_k), for some k. Since ‖x − y‖ < δ ≤ δ(b_k), it follows that
\[
\|y - b_k\| \le \|y - x\| + \|x - b_k\| < \delta(b_k) + \delta(b_k) = 2\delta(b_k).
\]
Therefore |f(y) − f(b_k)| < ε/2. Similarly, since ‖x − b_k‖ < δ(b_k) < 2δ(b_k), it follows that |f(x) − f(b_k)| < ε/2. Hence
\[
|f(x) - f(y)| \le |f(x) - f(b_k)| + |f(b_k) - f(y)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
This completes the proof that the function Q maps C(B) into C(B).
Lemma: Q is a contraction with coefficient β.
Proof: I show that ‖Q(V) − Q(W)‖ ≤ β‖V − W‖, for all V and W in C(B). Let y ∈ B and let c_v be such that 0 ≤ c_v ≤ y and Q(V)(y) = u(c_v) + βV(F(y − c_v)). Similarly, let c_w be such that 0 ≤ c_w ≤ y and Q(W)(y) = u(c_w) + βW(F(y − c_w)). Then
\[
Q(V)(y) = u(c_v) + \beta V(F(y - c_v)) \ge u(c_w) + \beta V(F(y - c_w))
\ge u(c_w) + \beta W(F(y - c_w)) - \beta\|V - W\| = Q(W)(y) - \beta\|V - W\|.
\]
By the symmetric argument, Q(W)(y) ≥ Q(V)(y) − β‖V − W‖. Therefore
\[
|Q(V)(y) - Q(W)(y)| \le \beta\|V - W\|.
\]
Hence,
\[
\|Q(V) - Q(W)\| = \max_{y \in B} |Q(V)(y) - Q(W)(y)| \le \beta\|V - W\|.
\]
It follows from the contraction mapping theorem that there exists a unique V ∈ C(B) such that Q(V) = V. I next show that V is the value function for optimization problem (1).
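The Lipschitz bound ‖Q(V) − Q(W)‖ ≤ β‖V − W‖ can be spot-checked on a discretized stand-in for Q. The instance below is my own (u(c) = c, F(k) = √k rounded to a grid, β = 0.5) and is only a finite-dimensional approximation of the operator on C(B).

```python
import random

# Spot-check of the contraction property of a discretized Bellman operator
# (my own instance: u(c) = c, F(k) = k**0.5 rounded to the grid, beta = 0.5):
# for random grid functions V and W, ||Q(V) - Q(W)|| <= beta * ||V - W||.

random.seed(1)
N, beta = 51, 0.5
grid = [i / (N - 1) for i in range(N)]

def Q(V):
    return [max((grid[i] - grid[j]) + beta * V[round(grid[j] ** 0.5 * (N - 1))]
                for j in range(i + 1))
            for i in range(N)]

def sup_dist(V, W):
    return max(abs(v - w) for v, w in zip(V, W))

ok = True
for _ in range(20):
    V = [random.uniform(-1, 1) for _ in range(N)]
    W = [random.uniform(-1, 1) for _ in range(N)]
    if sup_dist(Q(V), Q(W)) > beta * sup_dist(V, W) + 1e-12:
        ok = False
```

The inequality holds for every random pair, exactly as in the proof: the maximizer for V is feasible for W, so the two maxima can differ by at most β times the sup-norm distance.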
Theorem: If V is the unique fixed point of Q, then
\[
V(y_0) = \max_{(c_0, c_1, \ldots)} \sum_{t=0}^{\infty} \beta^t u(c_t)
\]
\[
\text{s.t., for some } y_1, y_2, \ldots \text{ in } B,\quad 0 \le c_t \le y_t \text{ and } y_{t+1} = F(y_t - c_t), \text{ for all } t.
\]
The proof proceeds by backward induction on t. Statement (3) is that, for any y_t in B,
\[
V(y_t) = \max_{(c_t, \ldots, c_T)} \left[\sum_{s=t}^{T} \beta^{s-t} u(c_s) + \beta^{T-t+1} V(F(y_T - c_T))\right] \tag{3}
\]
\[
\text{s.t., for some } y_{t+1}, \ldots, y_T \text{ in } B,\quad 0 \le c_s \le y_s, \text{ for } s = t, t+1, \ldots, T, \text{ and}
\]
\[
y_{s+1} = F(y_s - c_s), \text{ for } s = t, t+1, \ldots, T-1.
\]
By the definition of Q and since Q(V) = V,
\[
V(y_T) = Q(V)(y_T) = \max_{c_T:\, 0 \le c_T \le y_T}\,[u(c_T) + \beta V(F(y_T - c_T))],
\]
which is statement (3) for t = T.
Now suppose by induction that statements (2) and (3) are true for t, where 0 ≤ t ≤ T. Then
\[
\sum_{s=t-1}^{T} \beta^{s-t+1} u(c_s) + \beta^{T-t+2} V(F(y_T - c_T))
= u(c_{t-1}) + \beta\left[\sum_{s=t}^{T} \beta^{s-t} u(c_s) + \beta^{T-t+1} V(F(y_T - c_T))\right]
\]
\[
= u(c_{t-1}) + \beta V(y_t) \quad\text{(by the induction assumption)}
\]
\[
= u(c_{t-1}) + \beta V(F(y_{t-1} - c_{t-1})) \quad\text{(because } y_t = F(y_{t-1} - c_{t-1})\text{)}
\]
\[
= \max_{c:\, 0 \le c \le y_{t-1}}\,[u(c) + \beta V(F(y_{t-1} - c))] \quad\text{(by the definition of } c_{t-1}\text{)}.
\]
Statement (3) for t − 1 is that, for any y_{t−1} in B,
\[
V(y_{t-1}) = \max_{c_{t-1}, \ldots, c_T} \left[\sum_{s=t-1}^{T} \beta^{s-t+1} u(c_s) + \beta^{T-t+2} V(F(y_T - c_T))\right]
\]
\[
\text{s.t., for some } y_t, \ldots, y_T \text{ in } B,\quad 0 \le c_s \le y_s, \text{ for } s = t-1, t, \ldots, T, \text{ and } y_{s+1} = F(y_s - c_s), \text{ for } s = t-1, \ldots, T-1.
\]
In order to prove this statement, observe that
\[
\max_{c_{t-1}, \ldots, c_T} \left[\sum_{s=t-1}^{T} \beta^{s-t+1} u(c_s) + \beta^{T-t+2} V(F(y_T - c_T))\right],
\]
subject to the constraints just listed, equals
\[
\max_{c_{t-1}:\, 0 \le c_{t-1} \le y_{t-1}} \left\{u(c_{t-1}) + \beta \max_{c_t, \ldots, c_T} \left[\sum_{s=t}^{T} \beta^{s-t} u(c_s) + \beta^{T-t+1} V(F(y_T - c_T))\right]\right\},
\]
where the inner maximization is subject to: for some y_t, ..., y_T in B, 0 ≤ c_s ≤ y_s, for s = t, t+1, ..., T, and y_{s+1} = F(y_s − c_s), for s = t, t+1, ..., T − 1. This in turn equals
\[
\max_{c_{t-1}:\, 0 \le c_{t-1} \le y_{t-1}}\,[u(c_{t-1}) + \beta V(F(y_{t-1} - c_{t-1}))] \quad\text{(by the induction assumption)}
= Q(V)(y_{t-1}) = V(y_{t-1}).
\]
Because V and F are continuous and C = {(c, y) | y ∈ B, 0 ≤ c ≤ y} is compact, the number b₁ = max_{(c,y)∈C} |V(F(y − c))| exists and is finite. Therefore,
\[
\left|V(y_0) - u(c_0) - \beta u(c_1) - \cdots - \beta^T u(c_T)\right| = \beta^{T+1}\,|V(F(y_T - c_T))| \le \beta^{T+1} b_1 \to 0, \text{ as } T \to \infty.
\]
Therefore,
\[
V(y_0) = \sum_{t=0}^{\infty} \beta^t u(c_t).
\]
I now show that (c₀, c₁, ...) solves problem (1), so that V is not just a fixed point of Q but is the value function for problem (1). Suppose that there exist (ĉ₀, ĉ₁, ...) and (ŷ₁, ŷ₂, ...) such that 0 ≤ ĉ_t ≤ ŷ_t and ŷ_{t+1} = F(ŷ_t − ĉ_t), for all t, and Σ_{t=0}^∞ βᵗu(ĉ_t) = Σ_{t=0}^∞ βᵗu(c_t) + ε, where ε > 0. Let b₂ = max_{c∈B} |u(c)| and let T be so large that
\[
\frac{\beta^T}{1-\beta}\,b_2 < \frac{\varepsilon}{5} \quad\text{and}\quad \beta^{T+1} b_1 < \frac{\varepsilon}{5}.
\]
Then
\[
\sum_{t=0}^{T} \beta^t u(\hat{c}_t) + \beta^{T+1} V(F(\hat{y}_T - \hat{c}_T))
\ge \sum_{t=0}^{T} \beta^t u(\hat{c}_t) - \beta^{T+1} b_1 > \sum_{t=0}^{T} \beta^t u(\hat{c}_t) - \frac{\varepsilon}{5}
\]
\[
\ge \sum_{t=0}^{\infty} \beta^t u(\hat{c}_t) - \frac{\beta^{T+1}}{1-\beta}\, b_2 - \frac{\varepsilon}{5} > \sum_{t=0}^{\infty} \beta^t u(\hat{c}_t) - \frac{2\varepsilon}{5}
\]
\[
= \sum_{t=0}^{\infty} \beta^t u(c_t) + \varepsilon - \frac{2\varepsilon}{5} = \sum_{t=0}^{\infty} \beta^t u(c_t) + \frac{3\varepsilon}{5}
\]
\[
\ge \sum_{t=0}^{T} \beta^t u(c_t) - \frac{\beta^{T+1}}{1-\beta}\, b_2 + \frac{3\varepsilon}{5} > \sum_{t=0}^{T} \beta^t u(c_t) + \frac{2\varepsilon}{5}
\]
\[
> \sum_{t=0}^{T} \beta^t u(c_t) + \beta^{T+1} b_1 + \frac{\varepsilon}{5}
\ge \sum_{t=0}^{T} \beta^t u(c_t) + \beta^{T+1} V(F(y_T - c_T)) + \frac{\varepsilon}{5},
\]
which is impossible, since (c₀, ..., c_T) maximizes Σ_{t=0}^T βᵗu(c_t) + β^{T+1}V(F(y_T − c_T)) over (c₀, ..., c_T) and (y₁, ..., y_T) such that 0 ≤ c₀ ≤ y₀, y₁ = F(y₀ − c₀), and 0 ≤ c_t ≤ y_t and y_{t+1} = F(y_t − c_t), for t = 0, ..., T − 1. This contradiction proves that (c₀, c₁, ...) solves problem (1).
Dynamic programming is the use of the value function to study problems with
a temporal structure like that in growth theory. The value function can be used to
derive properties of the optimal paths and to interpret them.
I next show that V inherits certain properties of u and F.
Theorem: If u and F are concave and non-decreasing, then V is concave and non-decreasing, where V is the unique fixed point of Q.
Suppose that W is concave and non-decreasing. Because u and F are non-decreasing,
it is clear that Q(W) is non-decreasing. I show that Q(W) is concave. Suppose that
ȳ_0 ∈ B and y_0 ∈ B, and 0 < α < 1. I must show that

Q(W)(αȳ_0 + (1 − α)y_0) ≥ αQ(W)(ȳ_0) + (1 − α)Q(W)(y_0).

Let c̄_0 and c_0 be such that

Q(W)(ȳ_0) = u(c̄_0) + βW(F(ȳ_0 − c̄_0))

and

Q(W)(y_0) = u(c_0) + βW(F(y_0 − c_0)),

where 0 ≤ c̄_0 ≤ ȳ_0 and 0 ≤ c_0 ≤ y_0. Then,
0 ≤ αc̄_0 + (1 − α)c_0 ≤ αȳ_0 + (1 − α)y_0

and

Q(W)(αȳ_0 + (1 − α)y_0)
≥ u(αc̄_0 + (1 − α)c_0) + βW(F(α(ȳ_0 − c̄_0) + (1 − α)(y_0 − c_0)))
(by the definition of Q(W))
≥ αu(c̄_0) + (1 − α)u(c_0) + βW(αF(ȳ_0 − c̄_0) + (1 − α)F(y_0 − c_0))
(because u and F are concave and W is non-decreasing)
≥ αu(c̄_0) + (1 − α)u(c_0) + αβW(F(ȳ_0 − c̄_0)) + (1 − α)βW(F(y_0 − c_0))
(because W is concave)
= α[u(c̄_0) + βW(F(ȳ_0 − c̄_0))] + (1 − α)[u(c_0) + βW(F(y_0 − c_0))]
= αQ(W)(ȳ_0) + (1 − α)Q(W)(y_0).
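This argument can be illustrated numerically. The snippet below is a sketch, not part of the notes: it applies the operator Q, specialized to u(c) = ln c and F(K) = 2√(K) with an arbitrary β = 0.9 and grid, to one concave, non-decreasing trial function W and checks that Q(W) is again non-decreasing and concave. The trial function W = ln(1 + y) is an arbitrary choice.

```python
import numpy as np

beta = 0.9
y_grid = np.linspace(0.05, 4.0, 200)

def Q(W):
    """Bellman operator for u(c) = ln c, F(K) = 2*sqrt(K), with W given by
    its values on y_grid and extended by linear interpolation."""
    QW = np.empty_like(y_grid)
    for i, yi in enumerate(y_grid):
        c = np.linspace(1e-4, yi, 400)                # feasible consumptions
        y_next = 2.0 * np.sqrt(yi - c)
        QW[i] = np.max(np.log(c) + beta * np.interp(y_next, y_grid, W))
    return QW

QW = Q(np.log(1.0 + y_grid))   # a concave, non-decreasing trial function W
```

Monotonicity can be checked on the whole grid; concavity is checked at widely spaced points, since on neighboring grid points it is obscured by interpolation error.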
Suppose that not only are u and F concave, but that V is strictly concave, where
again V is the unique fixed point of Q. Then, the problem

max_{c_0: 0 ≤ c_0 ≤ y_0} [u(c_0) + βV(F(y_0 − c_0))]

has at most one solution.
It need not be the case that concave functions have subgradients at points not in
the interior of their domain of definition. Consider the function f(x) = √x defined on
the set of non-negative numbers. This function is concave, since its second derivative
is negative on (0, ∞), but it has no derivative (or an infinite derivative) at zero, and
so has no subgradient there. Of course, zero is not in the interior of its domain of
definition.
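The claim about √x can be verified directly. A subgradient of f at 0 would be a slope m with √x ≤ f(0) + m·x = m·x for all x ≥ 0, and the check below (a sketch, not from the notes) shows that every candidate slope m fails at some x near 0, because √x / x = 1/√x → ∞.

```python
import math

def counterexample(m):
    """For a candidate slope m > 0, return an x >= 0 at which the subgradient
    inequality sqrt(x) <= m * x fails.  Any x in (0, 1/m**2) works, because
    there sqrt(x) / x = 1 / sqrt(x) > m."""
    return 1.0 / (2.0 * m * m)
```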
Theorem: Suppose that u and F are non-decreasing and concave and that u is
differentiable. Suppose that y_0 ∈ int B and let c_0 solve the problem

max_{c: 0 ≤ c ≤ y_0} [u(c) + βV(F(y_0 − c))].

Then V is differentiable at y_0 and DV(y_0) = Du(c_0).

Let λ be a subgradient of V at y_0; one exists because V is concave and y_0 is interior
to B. Because consuming c_0 + ȳ_0 − y_0 out of the initial stock ȳ_0 leaves the
investment ȳ_0 − (c_0 + ȳ_0 − y_0) = y_0 − c_0 unchanged, it follows that

V(ȳ_0) ≥ u(c_0 + ȳ_0 − y_0) + βV(F(y_0 − c_0)),

provided ȳ_0 ∈ B is so close to y_0 that c_0 + ȳ_0 − y_0 ≥ 0. (Clearly, c_0 + ȳ_0 − y_0 ≤ ȳ_0, since
c_0 ≤ y_0. The consumption vector c_0 + ȳ_0 − y_0 absorbs all the change in the initial stock
of goods.) The functions V(y_0) + λ·(ȳ_0 − y_0) and u(c_0 + ȳ_0 − y_0) + βV(F(y_0 − c_0))
are differentiable functions of ȳ_0. Since

u(c_0 + ȳ_0 − y_0) + βV(F(y_0 − c_0)) ≤ V(ȳ_0) ≤ V(y_0) + λ·(ȳ_0 − y_0),

for ȳ_0 near y_0, and all three functions have the same value at ȳ_0 = y_0, it follows that V is
differentiable at y_0 and that all three functions have the same derivative at y_0. That is,
DV(y_0) = Du(c_0) = λ. The argument should be made plausible by the following
picture.
Suppose that (c_0, c_1, ...) and (y_1, y_2, ...) are optimal for problem (1) and that 0 ≪
c_t ≤ y_t, for all t. Assume that u and F are non-decreasing, concave, and differentiable.
Then, the previous theorem implies that DV(y_{t+1}) = Du(c_{t+1}). Because c_t solves
the problem

max_{c: 0 ≤ c ≤ y_t} [u(c) + βV(F(y_t − c))]

and c_t is interior, the first-order condition

Du(c_t) = βDV(y_{t+1})DF(y_t − c_t) = βDu(c_{t+1})DF(y_t − c_t)

holds. This equation is known as Euler's equation. Euler's equation together with the
equation y_{t+1} = F(y_t − c_t) determine the evolution of the optimal path (c_t, y_t)_{t=0}^∞.
The truth of this remark may be seen as follows. You are given y_0. Choose c_0. Then,
y_1 = F(y_0 − c_0) is determined. The Euler equation Du(c_0) = βDu(c_1)DF(y_0 − c_0)
now determines c_1, provided this equation can be solved for c_1.
The equation Du(c_0) = βDu(c_1)DF(y_0 − c_0) determines c_1, provided D²u(c)
exists and is negative definite, for all c, and DF(K) is invertible. That is, the
solution c_1 to this equation is unique, if it exists. To see that this is so, suppose that

Du(c_0) = βDu(c_1)DF(y_0 − c_0)

and

Du(c_0) = βDu(ĉ_1)DF(y_0 − c_0),

where c_1 ≠ ĉ_1. Then,

Du(c_1) = β^{−1}Du(c_0)(DF(y_0 − c_0))^{−1} = Du(ĉ_1).

Hence,

(ĉ_1 − c_1)ᵀ Du(c_1) = (ĉ_1 − c_1)ᵀ Du(ĉ_1).

By the mean value theorem, there is an s with 0 < s < 1 such that

0 = (ĉ_1 − c_1)ᵀ Du(ĉ_1) − (ĉ_1 − c_1)ᵀ Du(c_1) = (ĉ_1 − c_1)ᵀ D²u(c_1 + s(ĉ_1 − c_1))(ĉ_1 − c_1).

Because D²u is negative definite,

(ĉ_1 − c_1)ᵀ D²u(c_1 + s(ĉ_1 − c_1))(ĉ_1 − c_1) < 0.

This contradiction proves that there can be at most one solution c_1 to the equation
Du(c_0) = βDu(c_1)DF(y_0 − c_0). In the case of one commodity, knowledge of
du(c_{t+1})/dc clearly determines c_{t+1}, provided d²u(c_{t+1})/dc² < 0.
In this discrete time optimization problem, Euler's equation and the equation
y_{t+1} = F(y_t − c_t) play the same role as do the maximum principle and the Hamiltonian
system in continuous time models. Both systems determine the evolution of the path.
These evolutionary systems do not determine an optimal path, however. Recall
that in the discussion above, I fixed c_0 as well as the initial output vector y_0. Once
c_0 is fixed, it might or might not be possible to continue defining c_t and y_t, for all t,
for it is possible that for some t, c_t > y_t. That is, the path could become infeasible
and so could not be continued. It is also possible that the path (c_t, y_t)_{t=0}^∞ could be
continued indefinitely, but is nevertheless not optimal because consumption converges
to zero at the same time that output converges to a high level. The next example will
illustrate these possibilities. The key problem, then, is to choose the initial value of
consumption, c_0, correctly. The purpose of transversality conditions is to guide this
choice.
Example: N = 1, u(c) = ln c, and

F(y − c) = 2√(y − c).

Euler's equation is

1/c_t = βDF(y_t − c_t)/c_{t+1} = β/(c_{t+1}√(y_t − c_t)),

so that c_{t+1} = βc_t/√K_t, where K_t = y_t − c_t. Along a path on which consumption
and capital converge, lim_{t→∞} K_t = K̄, where, by Euler's equation, K̄ = β√K̄, so
that √K̄ = β and hence K̄ = β². Let ȳ = 2√K̄ = 2β. Then, lim_{t→∞} c_t = c̄, where
c̄ = ȳ − K̄ = 2√K̄ − K̄ = 2β − β². Since c̄ = λȳ, where λ is the stationary consumption
rate, it follows that 2βλ = 2β − β² and hence λ = (2β − β²)/(2β) = 1 − β/2.
The calculated path, (c_t, K_t)_{t=0}^∞, is optimal, as I will show. One approach to proving
that it is optimal would be to calculate V(y_0) = Σ_{t=0}^∞ β^t u(c_t), where y_0 = 2√K_{−1},
for each value of K_{−1} or y_0, and then to show that Q(V) = V, so that V is the value
function. It would then follow that (c_t, K_t)_{t=0}^∞ is optimal. I do not know how to do
this calculation.
Instead, let a_t = c_t/y_t be the consumption rate, so that

a_{t+1} = βa_t/(2(1 − a_t)).

If we graph this difference equation, we obtain the next figure. Note that if
a = βa/(2(1 − a)), then a = 1 − β/2 = λ, where λ is the stationary consumption rate
calculated earlier.
This path has a_t = 1 − β/2, for all t, and is the path computed earlier. This ends the
discussion of the example.
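The figure's message can be reproduced without the figure. The sketch below is not from the notes: it iterates the consumption-rate recursion a_{t+1} = βa_t/(2(1 − a_t)) with an arbitrary β = 0.9. The fixed point a = 1 − β/2 is unstable: starting slightly above it, the path reaches a_t > 1, meaning c_t > y_t, and becomes infeasible; starting slightly below it, a_t → 0, the path on which consumption collapses while output stays high.

```python
def consumption_rates(a0, beta=0.9, T=60):
    """Iterate a_{t+1} = beta * a_t / (2 * (1 - a_t)), stopping early if the
    path becomes infeasible (a_t >= 1 means consumption exceeds output)."""
    path = [a0]
    for _ in range(T):
        a = path[-1]
        if a >= 1.0:
            break
        path.append(beta * a / (2.0 * (1.0 - a)))
    return path
```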
The fact that paths can be feasible, satisfy Euler's equation, and yet not be optimal
is called the Hahn problem, after Frank Hahn, who first noticed it. The Hahn
problem has led to the search for easy criteria that separate optimal from suboptimal
infinite horizon feasible paths that satisfy Euler's equation. It is not easy to find
conditions sufficient for optimality, but necessary conditions come to hand. One
condition follows from the following train of thought. Return now to the model with
perhaps more than one commodity. Assume that u and F are differentiable,
non-decreasing, and concave, and assume for the sake of presentation that the value
function V is differentiable. Let (c_t, y_t)_{t=0}^∞ be an optimal path and assume that
c_t ≫ 0, for all t. Let λ_t = DV(y_t) = Du(c_t). The β^t λ_t are like Kuhn-Tucker
coefficients. Because V is concave, β^t λ_t is a subgradient of β^t V(y) at y = y_t.
Because β^t λ_t is a subgradient of β^t V at y_t,

β^t V(0) ≤ β^t V(y_t) + β^t λ_t·(0 − y_t),

so that

β^t λ_t·y_t ≤ β^t [V(y_t) − V(0)].

Because y_t ∈ B, y_t is bounded and so V(y_t) is bounded too, and therefore
lim_{t→∞} β^t [V(y_t) − V(0)] = 0. Since λ_t ≥ 0 and y_t ≥ 0, β^t λ_t·y_t ≥ 0, and so
lim_{t→∞} β^t λ_t·y_t = 0. Thus, a necessary condition for optimality is that
lim_{t→∞} β^t λ_t·y_t = 0, where the λ_t are "Kuhn-Tucker coefficients" corresponding
to the infinite horizon problem
max Σ_{t=0}^∞ β^t u(c_t)

s.t. for some y_1, y_2, ... in B,

0 ≤ c_t ≤ y_t and y_{t+1} = F(y_t − c_t), for t ≥ 0,    (4)

with y_0 in B given. These coefficients may be defined even if u is not differentiable,
but in the differentiable case, λ_t = Du(c_t), for all t, provided c_t ≫ 0. Another
common condition derived from the one just stated is that lim_{t→∞} β^t λ_t·K_t = 0, where
K_t = y_t − c_t. These conditions are called transversality conditions, though that
term is more commonly applied to continuous time models.
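In the logarithmic example above, the transversality condition can be evaluated explicitly, since λ_t = Du(c_t) = 1/c_t and hence β^t λ_t y_t = β^t / a_t, where a_t = c_t/y_t is the consumption rate. The sketch below (not from the notes; β = 0.9 and the starting rates are arbitrary) shows that the condition holds along the stationary path, where a_t = 1 − β/2 and the term is β^t/(1 − β/2) → 0, and fails along an overaccumulation path, where a_t falls roughly like (β/2)^t so that β^t / a_t grows roughly like 2^t.

```python
beta = 0.9
star = 1 - beta / 2          # stationary consumption rate, 1 - beta/2

# Transversality terms along the optimal path, where a_t = star for all t.
on_path = [beta ** t / star for t in range(101)]

def off_path_terms(a0, T=100):
    """Transversality terms beta^t / a_t along a path of consumption rates
    a_{t+1} = beta * a_t / (2 * (1 - a_t)) that collapses toward zero."""
    a, terms = a0, []
    for t in range(T):
        terms.append(beta ** t / a)
        a = beta * a / (2.0 * (1.0 - a))
    return terms

off_path = off_path_terms(star - 0.01)
```

The divergence of the off-path terms is what the necessary condition lim β^t λ_t y_t = 0 rules out: that path is feasible and satisfies Euler's equation, yet fails transversality.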
Transversality Conditions
The Hahn problem arises also in continuous time growth models. Infinite horizon
paths that satisfy the maximum principle and the Hamiltonian system may be
suboptimal because consumption converges to zero over time and there is an
overaccumulation of capital. Conditions have been devised to exclude such paths, and
these conditions are called transversality conditions.
Just as in the case of discrete time growth models, there is a value function,

V(y_0) = max_{c:[0,∞)→[0,∞)} ∫_0^∞ e^{−rt} u(c(t)) dt

s.t. c(t) + dK(t)/dt = F(K(t)), for all t, and

c(0) + dK(0)/dt = y_0,

where y_0 is given and positive and r > 0. If λ(t) is the dual or conjugate vector at
time t, then λ(t) is a subgradient of e^{−rt} V(y) at y = y(t), where y(t) = F(K(t)) and
(c(t), K(t)) is an optimal path.
Because V is concave and λ(t) is a subgradient of e^{−rt} V(y) at y = y(t),

λ(t)·y(t) ≤ e^{−rt} [V(y(t)) − V(0)] → 0, as t → ∞.

Therefore, a necessary condition for optimality is that lim_{t→∞} λ(t)·y(t) = 0. A similar
condition is that lim_{t→∞} λ(t)·K(t) = 0. These are typical transversality conditions.
the optimization problem is the river, the value V is zero and hence constant there.
Since x(T) = h(s) is a point on the river, λ(T) is orthogonal to this level curve. That
is, λ(T)·Dh(s) = 0.
16