1 Linear Algebra
Simultaneous Linear Equations:
Example:
3x1 + 2x2 = 6
6x1 + 7x2 = −2

Multiply equation 1 by 2:

6x1 + 4x2 = 12

and subtract it from equation 2:

3x2 = −14, so that x2 = −14/3.

Substitute this into equation 1:

3x1 + 2(−14/3) = 6
3x1 = 6 + 28/3 = (18 + 28)/3 = 46/3
x1 = 46/9.

The next two equations are equivalent to the first two.

3x1 + 2x2 = 6
     3x2 = −14

x1 + (2/3)x2 = 2, or x1 = 2 − (2/3)(−14/3) = 46/9
          x2 = −14/3
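The arithmetic above is easy to check numerically; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Coefficients and right-hand side of the example system
#   3x1 + 2x2 = 6
#   6x1 + 7x2 = -2
A = np.array([[3.0, 2.0],
              [6.0, 7.0]])
y = np.array([6.0, -2.0])

x = np.linalg.solve(A, y)  # should give x1 = 46/9, x2 = -14/3
```

np.linalg.solve performs essentially the elimination carried out by hand above.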
The last pair is said to be row reduced and in echelon form. We want to do this more generally:
a11 x1 + a12 x2 + ··· + a1N xN = y1
a21 x1 + a22 x2 + ··· + a2N xN = y2
  ⋮
aM1 x1 + aM2 x2 + ··· + aMN xN = yM
The amn and ym are numbers and x1, ..., xN are unknown. In order to be more systematic, we write the equations as:
Ax = y, where

A = ( a11 ··· a1N )
    (  ⋮        ⋮ )   is an M × N matrix,
    ( aM1 ··· aMN )

x = (x1, ..., xN)ᵀ is an N-vector of unknowns, and

y = (y1, ..., yM)ᵀ is an M-vector of numbers.

Ax is the M-vector

( a11 x1 + ··· + a1N xN )
( a21 x1 + ··· + a2N xN )
(           ⋮           )
( aM1 x1 + ··· + aMN xN )
Consider the following so called elementary operations on an M × N matrix A:
1. Multiply a row of A by a non-zero number.
2. Replace a row by that row plus c times another row, where c is a non-zero number.
3. Interchange two rows.
If the M × N matrix B is obtained from A by any one of these operations, then A and B are equivalent in the sense that the equations Bx = 0 and Ax = 0 have the same solutions. (Think this through.)
Similarly, if the M × (N+1) matrix (B | z) is obtained from the M × (N+1) matrix (A | y) by an elementary row operation, then the systems Bx = z and Ax = y have the same solutions.
Elementary row operations can transform any system Ax = y into a system Bx = z where
a) the first non-zero entry in any row of B is 1, and
b) each column of B that contains the leading non-zero entry of some row has all its other entries 0.
Example:

( 0 1 4 0 )
( 0 0 0 0 )
( 1 0 3 0 )
( 0 0 0 1 )

is row reduced.
Example:

( 3 2 1 ) ( x1 )   ( 3 )
( 6 4 2 ) ( x2 ) = ( 6 )
( 6 8 5 ) ( x3 )   ( 0 )

→

( 3 2 1 ) ( x1 )   (  3 )
( 0 0 0 ) ( x2 ) = (  0 )
( 0 4 3 ) ( x3 )   ( −6 )

→

( 1 2/3 1/3 ) ( x1 )   (    1 )
( 0  0   0  ) ( x2 ) = (    0 )
( 0  1  3/4 ) ( x3 )   ( −3/2 )

→

( 1 2/3 1/3 ) ( x1 )   (    1 )
( 0  1  3/4 ) ( x2 ) = ( −3/2 )
( 0  0   0  ) ( x3 )   (    0 )

→

( 1 0 −1/6 ) ( x1 )   (    2 )
( 0 1  3/4 ) ( x2 ) = ( −3/2 )
( 0 0   0  ) ( x3 )   (    0 )
The matrix

( 1 0 −1/6 )
( 0 1  3/4 )
( 0 0   0  )

is an example of a row reduced echelon matrix:
a) it is row reduced,
b) any row of zeros lies below all non-zero rows, and
c) if the first r rows are non-zero and the leading non-zero entry of row m is in column nm, for m = 1, ..., r, then n1 < n2 < ··· < nr.
Example: The matrix

( 1 0 3 0 )
( 0 1 4 0 )
( 0 0 0 1 )
( 0 0 0 0 )

is a row reduced echelon matrix.
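Row reduced echelon form can be computed mechanically; a sketch, assuming SymPy is available, applied to the augmented matrix of the earlier 3 × 3 example:

```python
from sympy import Matrix, Rational

# Augmented matrix (A | y) of the system solved above
M = Matrix([[3, 2, 1, 3],
            [6, 4, 2, 6],
            [6, 8, 5, 0]])

# rref() returns the row reduced echelon form and the pivot columns
R, pivots = M.rref()
```

R reproduces the final system of the worked example, with pivots in the first two columns.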
the non-zero rows of B. For 1 ≤ m ≤ r, let the leading non-zero entry of row m be in column nm, where n1 < n2 < ··· < nr. Since r < N, there is an n such that 1 ≤ n ≤ N and n ≠ nm, for any m. For such an n, let xn = 1. For k such that 1 ≤ k ≤ N, let xk = 0, if k ≠ n and k ≠ nm, for all m. It is now possible to solve for xn, for n = n1, ..., nr, so that Bx = 0. Then Ax = 0 and x ≠ 0.
2 Vector Spaces
Consider RN = {v = (v1, ..., vN) | vn ∈ R, for all n}, where R is the set of real numbers. We can define the following operations on RN. If v and w belong to RN, v + w = (v1, ..., vN) + (w1, ..., wN) = (v1 + w1, ..., vN + wN). If c ∈ R and v ∈ RN, cv = c(v1, ..., vN) = (cv1, ..., cvN). Let 0 = (0, 0, ..., 0) ∈ RN. Observe that
a) x + y ∈ RN, if x ∈ RN and y ∈ RN;
b) x + y = y + x;
c) there is a 0 ∈ RN such that 0 + x = x, for all x ∈ RN;
d) for all x ∈ RN, there is a unique −x ∈ RN such that x + (−x) = 0;
e) 1x = x, for x ∈ RN;
f) (c1 c2)x = c1(c2 x), for all numbers c1 and c2 and for all x ∈ RN;
g) (x + y) + z = x + (y + z), for all x, y, and z in RN;
h) c(x + y) = cx + cy, for numbers c and for x and y in RN;
i) (c1 + c2)x = c1 x + c2 x, for all numbers c1 and c2 and for all x ∈ RN.
Definition: If V is a vector space, v ∈ V is said to be a linear combination of w1, ..., wN ∈ V if there are numbers c1, ..., cN such that v = c1 w1 + ··· + cN wN.
Definitions: If w1, ..., wN ∈ V, their linear span is the set of all linear combinations of w1, ..., wN. The linear span of w1, ..., wN is a subspace of V and is the smallest subspace containing w1, ..., wN. The vectors w1, ..., wN are said to span V if V is the linear span of w1, ..., wN.
Definition: Vectors v1, ..., vN in V are linearly independent if they are not dependent.
Example: (1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (1, 1, 0)ᵀ are dependent in R3, since

(1, 0, 0)ᵀ + (0, 1, 0)ᵀ − (1, 1, 0)ᵀ = (0, 0, 0)ᵀ.

(1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (0, 0, 1)ᵀ are independent, since

(0, 0, 0)ᵀ = c1 (1, 0, 0)ᵀ + c2 (0, 1, 0)ᵀ + c3 (0, 0, 1)ᵀ = (c1, c2, c3)ᵀ ⟹ c1 = c2 = c3 = 0.
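Independence can be tested numerically: vectors are independent exactly when the matrix having them as columns has full column rank. A sketch, assuming NumPy is available:

```python
import numpy as np

# Columns of D are the dependent vectors from the example above
D = np.array([[1, 0, 1],
              [0, 1, 1],
              [0, 0, 0]])
# Columns of E are the independent standard basis vectors
E = np.eye(3)

dep_rank = np.linalg.matrix_rank(D)  # 2 < 3, so the columns are dependent
ind_rank = np.linalg.matrix_rank(E)  # 3 = 3, so the columns are independent
```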
Example: sin and cos are independent, for suppose that a sin + b cos = 0. Then 0 = a sin(π/2) + b cos(π/2) = a and 0 = a sin(0) + b cos(0) = b, so that a = b = 0. sin + 2 cos and −2 sin − 4 cos are dependent, since 2(sin + 2 cos) + (−2 sin − 4 cos) = 0.
Example: Let en = (0, ..., 0, 1, 0, ..., 0) ∈ RN, with the 1 in the nth slot. Then e1, ..., eN is the standard basis for RN.
Theorem: If v1 ; :::; vM span a vector space V , then any independent set of vectors
in V has no more than M elements.
Proof: I must show that if N > M and w1, ..., wN are in V, then w1, ..., wN are linearly dependent. Since v1, ..., vM span V, wn = Σ_{m=1}^M amn vm, for all n and for some numbers a1n, ..., aMn. If x1, ..., xN are numbers, then

x1 w1 + ··· + xN wN = Σ_{n=1}^N xn wn = Σ_{n=1}^N xn Σ_{m=1}^M amn vm = Σ_{n=1}^N Σ_{m=1}^M amn xn vm = Σ_{m=1}^M ( Σ_{n=1}^N amn xn ) vm.

Since N > M, a previous theorem implies that there exist numbers x1, ..., xN, not all zero, such that Σ_n amn xn = 0, for m = 1, ..., M. Hence, x1 w1 + ··· + xN wN = 0 and w1, ..., wN are linearly dependent.
Corollary: If V is a …nite dimensional vector space, then any two bases have the
same number of elements.
Theorem: If W is a subspace of the finite dimensional vector space V and W ≠ V, then dim W < dim V.
Proof: Let M = dim W and N = dim V. I must show that M < N. If W = {0}, then dim W = 0 < dim V. Suppose that W ≠ {0}. If w1, ..., wM are linearly independent vectors in W, they are linearly independent in V and so M ≤ N. Therefore, there is a linearly independent set of vectors in W with a largest number of elements, say w1, ..., wr. By the previous lemma, w1, ..., wr is a basis for W and r = dim W. Since W ≠ V, there is a v in V such that v ∉ W. By the previous lemma, w1, ..., wr, v are independent and hence N ≥ M + 1 > M.
Theorem: If v1, ..., vN is a basis for V and v ∈ V, then the numbers c1, ..., cN such that v = Σ_{n=1}^N cn vn are unique.

Proof: Σ_{n=1}^N cn vn = v = Σ_{n=1}^N an vn ⟹ Σ_{n=1}^N (cn − an) vn = 0 ⟹ cn − an = 0, for all n, since v1, ..., vN are independent.
MATH CAMP: Lecture 2
(T f)(s) = f(s + π), if 0 ≤ s ≤ π; f(s − π), if π < s ≤ 2π.
Let

y = (y1, ..., yM)ᵀ,  x = (x1, ..., xN)ᵀ,  A = ( a11 ··· a1N ; ⋮ ⋱ ⋮ ; aM1 ··· aMN ).

Then, y = Ax. The M × N matrix A represents T in that there is one and only one linear transformation T corresponding to A and one and only one matrix A corresponding to T, given the bases v1, ..., vN for V and w1, ..., wM for W.
Let S : W → Q be a linear transformation and let q1, ..., qJ be a basis for Q. Let the J × M matrix B = (bjm) represent S, so that

S(wm) = Σ_{j=1}^J bjm qj.
S ∘ T : V → Q is the linear transformation defined by S ∘ T(v) = S(T(v)). Then

S ∘ T(vn) = S(T vn) = S( Σ_{m=1}^M amn wm ) = Σ_{m=1}^M amn S(wm) = Σ_{m=1}^M Σ_{j=1}^J amn bjm qj = Σ_{j=1}^J ( Σ_{m=1}^M bjm amn ) qj = Σ_{j=1}^J cjn qj,

where cjn = Σ_{m=1}^M bjm amn, so that the J × N matrix C = (cjn) represents S ∘ T.
C = ( c11 ··· c1N ; ⋮ ⋱ ⋮ ; cJ1 ··· cJN ) = ( b11 ··· b1M ; ⋮ ⋱ ⋮ ; bJ1 ··· bJM )( a11 ··· a1N ; ⋮ ⋱ ⋮ ; aM1 ··· aMN ) = BA.
Example:

                ( 2 3 2 )
( 1 −1 0 )      ( 0 0 1 )   ( 2 3 1 )
( 0  1 0 )      ( 1 0 0 ) = ( 0 0 1 ).
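That the matrix of a composition is the matrix product can be verified directly; a sketch, assuming NumPy is available, using a 2 × 3 matrix B and a 3 × 3 matrix A:

```python
import numpy as np

B = np.array([[1, -1, 0],
              [0,  1, 0]])  # represents S with respect to the given bases
A = np.array([[2, 3, 2],
              [0, 0, 1],
              [1, 0, 0]])   # represents T

C = B @ A                   # represents the composition S o T

# Applying T and then S to a coordinate vector agrees with applying C
v = np.array([1.0, 2.0, 3.0])
```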
Note: The way in which a product of matrices is grouped does not affect the product. That is, if A is an M × N matrix, B is a J × M matrix, and C is a K × J matrix, then (CB)A = C(BA).
Definition: An N × N matrix A is invertible if there is an N × N matrix A⁻¹ such that

A⁻¹A = AA⁻¹ = I = ( 1 ··· 0 ; ⋮ ⋱ ⋮ ; 0 ··· 1 ).
I is called the N × N identity matrix and represents the identity function idV : V → V, where V is an N dimensional vector space and idV(v) = v, for all v ∈ V. Clearly, IA = AI = A, for any N × N matrix A.
Lemma: If A and B are invertible, then AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹.
Proof:

(B⁻¹A⁻¹)(AB) = B⁻¹(A⁻¹A)B = B⁻¹IB = B⁻¹B = I
(AB)(B⁻¹A⁻¹) = AIA⁻¹ = AA⁻¹ = I.
Remarks:
1. f : V → W is onto if and only if there exists f⁻¹ : W → V such that f(f⁻¹(w)) = w, for all w ∈ W.
3. f is one to one and onto if and only if f is invertible.
Theorem: If T : V → W is an invertible linear transformation, then T⁻¹ is linear.
Proof: Let v1 = T⁻¹(w1) and v2 = T⁻¹(w2). Then

c1 T⁻¹(w1) + c2 T⁻¹(w2) = c1 v1 + c2 v2 = T⁻¹(T(c1 v1 + c2 v2)) = T⁻¹(c1 w1 + c2 w2).
a) Multiplication of the rth row of A by c ≠ 0 corresponds to PA, where P is the identity matrix with its (r, r) entry replaced by c:

P = diag(1, ..., 1, c, 1, ..., 1), with c in row r, column r,

and

P⁻¹ = diag(1, ..., 1, c⁻¹, 1, ..., 1), with c⁻¹ in row r, column r.
b) Replacement of the rth row of A by that row plus c times the sth row corresponds to PA, where P is the identity matrix with the extra entry c in row r, column s:

P = I + c E_{rs}, where E_{rs} has a 1 in row r, column s and zeros elsewhere,

and

P⁻¹ = I − c E_{rs}, with −c in row r, column s.
c) Interchange of rows r and s corresponds to PA, where P is the identity matrix with rows r and s interchanged: P has a 1 in row r, column s, a 1 in row s, column r, 1s in the remaining diagonal positions (m, m) for m ≠ r, s, and zeros elsewhere.

P⁻¹ = P.
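The three elementary matrices and their inverses can be constructed explicitly; a sketch, assuming NumPy is available, where the size N, the rows r and s, and the scalar c are illustrative choices (rows are 0-indexed):

```python
import numpy as np

N, r, s, c = 4, 1, 3, 5.0  # hypothetical size, row indices, and scalar

P_scale = np.eye(N); P_scale[r, r] = c                # a) multiply row r by c
P_add   = np.eye(N); P_add[r, s] = c                  # b) add c times row s to row r
P_swap  = np.eye(N); P_swap[[r, s]] = P_swap[[s, r]]  # c) interchange rows r and s

inv_scale = np.eye(N); inv_scale[r, r] = 1.0 / c      # entry c replaced by 1/c
inv_add   = np.eye(N); inv_add[r, s] = -c             # entry c replaced by -c
# P_swap is its own inverse
```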
So, I = P1 P2 ··· PQ A, where Pq is invertible, for all q. Let P = P1 P2 ··· PQ. P⁻¹ = PQ⁻¹ ··· P1⁻¹, so that P is invertible. I = PA, since A is row reduced to I via left multiplication by the matrix P. P⁻¹ = P⁻¹PA = (P⁻¹P)A = IA = A. Therefore, I = P⁻¹P = AP. Since PA = I = AP, A is invertible.
Proof: By the theorem, A is invertible. Therefore, BA = I implies that (BA)A⁻¹ = IA⁻¹ = A⁻¹, so that B = BI = B(AA⁻¹) = (BA)A⁻¹ = A⁻¹.
De…nition: If T : V ! W is a linear transformation, the rank of T is the dimension
of the range of T and the nullity of T is the dimension of the null space of T .
I will later show that the row rank of A equals its column rank.
Proof: Let v1, ..., vK be a basis for the null space of T. Extend v1, ..., vK to a basis v1, ..., vK, vK+1, ..., vN of V. I show that T(vK+1), ..., T(vN) is a basis for the range of T. The vectors T(v1), ..., T(vN) span the range of T. Since T(vn) = 0, if n ≤ K, T(vK+1), ..., T(vN) span the range of T. I show that T(vK+1), ..., T(vN) are independent, so that T(vK+1), ..., T(vN) is a basis for the range of T and hence rank T = N − K.
Σ_{n=K+1}^N cn T(vn) = 0 = T( Σ_{n=K+1}^N cn vn ) ⟹ Σ_{n=K+1}^N cn vn is in the null space of T ⟹ Σ_{n=K+1}^N cn vn = Σ_{n=1}^K bn vn, for some numbers b1, ..., bK ⟹ Σ_{n=1}^K bn vn − Σ_{n=K+1}^N cn vn = 0 ⟹ cn = 0, for n = K+1, ..., N, since v1, ..., vN are independent.
The function T portrayed in this diagram may be thought of as a projection of R2 onto the vertical axis, followed by a linear function from the vertical axis onto R.
Theorem: Let T : V ! W be linear and suppose that dim V = dim W . Then, the
following are equivalent.
1) T is invertible.
2) T is non-singular.
3) T is onto.
4) If v1 ; :::; vN is a basis of V , then T (v1 ); :::; T (vN ) is a basis of W .
5) There is a basis v1 ; :::; vN of V such that T (v1 ); :::; T (vN ) is a basis of W .
Proof: 1 ⟹ 2. Obvious.
2 ⟹ 3. Suppose that T is non-singular. Let v1, ..., vN be a basis of V. By the previous lemma, T(v1), ..., T(vN) are independent. Since dim W = N, T(v1), ..., T(vN) is a basis of W. If w ∈ W, w = c1 T(v1) + ··· + cN T(vN) = T(c1 v1 + ··· + cN vN). Therefore, T is onto.
3 ⟹ 4. Let v1, ..., vN be a basis of V. Since these vectors span V and T is onto, T(v1), ..., T(vN) span W. Since dim W = N, T(v1), ..., T(vN) are independent. Therefore T(v1), ..., T(vN) is a basis of W.
4 ⟹ 5. Obvious.
5 ⟹ 1. Suppose that there is a basis v1, ..., vN of V such that T(v1), ..., T(vN) is a basis of W. Then, rank T = dim W = dim V. Therefore, by a previous theorem, nullity of T = 0. Therefore, T is one to one. Since rank T = dim W, T is onto. Therefore, T is invertible.
MATH CAMP: Lecture 3
implies that

0 = Σ_{m=1}^M (0) wm = Σ_{m=1}^M Σ_{k=1}^K ck amk wm = Σ_{k=1}^K ck Σ_{m=1}^M amk wm = Σ_{k=1}^K ck uk,

which in turn implies that c1 = c2 = ··· = cK = 0, since u1, ..., uK are independent. Hence the vectors

(a11, a21, ..., aM1)ᵀ, ..., (a1K, a2K, ..., aMK)ᵀ

are independent.
Suppose that

(a11, a21, ..., aM1)ᵀ, ..., (a1K, a2K, ..., aMK)ᵀ

are independent. To show that u1, ..., uK are independent, suppose that 0 = Σ_{k=1}^K ck uk = Σ_{k=1}^K ck Σ_{m=1}^M amk wm = Σ_{m=1}^M Σ_{k=1}^K ck amk wm, which implies that Σ_{k=1}^K ck amk = 0, for all m, since w1, ..., wM are independent. Therefore,

c1 (a11, a21, ..., aM1)ᵀ + ··· + cK (a1K, a2K, ..., aMK)ᵀ = 0,

so that c1 = c2 = ··· = cK = 0, since these vectors are independent.
Proof: Let v1 ; :::; vN be a basis for V and let w1 ; :::; wM be the basis for W , such
that A is the representation of T with respect to these bases. Let K be the rank of
T . Then, dim(span(T (v1 ); :::; T (vN ))) = K: There exist K of the vectors v1 ; :::; vN ,
say v1 ; :::; vK such that T (v1 ); :::; T (vK ) is a basis for the range of T; which equals the
span of T (v1 ); :::; T (vN ): By the lemma, the column vectors
(a11, a21, ..., aM1)ᵀ, ..., (a1K, a2K, ..., aMK)ᵀ

are independent. If n > K,

Σ_{m=1}^M amn wm = T(vn) = Σ_{k=1}^K ck T(vk) = Σ_{k=1}^K ck Σ_{m=1}^M amk wm = Σ_{m=1}^M Σ_{k=1}^K ck amk wm,

for some numbers c1, ..., cK. Since w1, ..., wM are independent, amn = Σ_{k=1}^K ck amk, for all m. That is,

(a1n, a2n, ..., aMn)ᵀ is in the linear span of (a11, a21, ..., aM1)ᵀ, ..., (a1K, a2K, ..., aMK)ᵀ,

and so these vectors are a basis for the linear span of the columns of A.
Duality
Definition: If V is a vector space, a linear functional on V is a linear function f : V → R. The set of all linear functionals on V is called the dual space of V and is denoted by V*.
Remark: V* is a vector space, where, if f ∈ V* and g ∈ V* and a and b are numbers, af + bg : V → R is defined by (af + bg)(v) = a f(v) + b g(v).
Let v1, ..., vN be a basis for V and let f ∈ V*. Then (f(v1), ..., f(vN)) is the matrix representation of f, so that if v = Σ_{n=1}^N cn vn ∈ V, then

f(v) = Σ_{n=1}^N cn f(vn) = (f(v1), ..., f(vN)) (c1, ..., cN)ᵀ.
Example: Let V = RN and let e1, ..., eN be the standard basis of RN. The dual basis of V* = (RN)* is f1, ..., fN, where, for all n and k, fn(ek) = δnk. If y ∈ V*, y = Σ_{n=1}^N yn fn, for some numbers y1, ..., yN. If x ∈ RN, y(x) = y( Σ_{k=1}^N xk ek ) = Σ_{n=1}^N yn fn( Σ_{k=1}^N xk ek ) = Σ_{n=1}^N yn fn(xn en) = Σ_{n=1}^N yn xn. Therefore, y may be identified with the vector (y1, ..., yN) and y(x) = Σ_{n=1}^N yn xn. Hence, V* may be identified with RN.
Remark: S° is a subspace of V*.
Example: If W = {(t, ..., t) ∈ RN | t ∈ R}, W° may be identified with {(y1, ..., yN) ∈ RN | Σ_{n=1}^N yn = 0}.
Proof: Let v1, ..., vK be a basis for W. Extend v1, ..., vK to a basis v1, ..., vK, vK+1, ..., vN of V. Let f1, ..., fN be the basis for V* dual to v1, ..., vN.
I show that fK+1, ..., fN is a basis for W°. If n ≥ K + 1, fn ∈ W°, since fn(vm) = 0, for m ≤ K, and for any w ∈ W, w = Σ_{m=1}^K am vm, for some numbers a1, ..., aK. The functions fK+1, ..., fN are linearly independent, since f1, ..., fN is a basis for V*. In order to show that fK+1, ..., fN is a basis for W°, it is sufficient to show that they span W°. If f ∈ V*, f = Σ_{n=1}^N f(vn) fn. If f ∈ W°, f(vn) = 0, for n ≤ K. Therefore, f = Σ_{n=K+1}^N f(vn) fn, and so fK+1, ..., fN span W°.
Proof: Let W ⊆ RN be the linear span of the rows of A and let K = dim W = row rank of A. Then dim W° = N − K. W° may be viewed as a subset of RN under the identification of (RN)* with RN. Under this identification, W° is the set of all solutions x of the equation Ax = 0.
Let T : RN → RM be the linear transformation with matrix representation A with respect to the standard bases of RN and RM. Then, W° is the null space of T, and the range of T is the linear span of the columns of A. Therefore, the column rank of A equals the rank of T. We know that rank of T + nullity of T = N. Therefore, the column rank of A = rank of T = N − nullity of T = N − dim W° = N − (N − K) = K = row rank of A.
Inner Product
Definition: The standard inner product on RN is the function · : RN × RN → R defined by x·y = Σ_{n=1}^N xn yn.
Remarks:
1. If a and b are numbers and x, y, and z belong to RN, then x·y = y·x and x·(ay + bz) = a x·y + b x·z. (These equations are easy to verify.)
2. If x ∈ RN and y ∈ RN and θ is the angle between x and y, then cos θ = x·y / (‖x‖ ‖y‖). (This equation is a little harder to verify.)
3. |x·y| ≤ ‖x‖ ‖y‖. This is called the Cauchy–Schwarz inequality. It follows from (2).
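These properties of the inner product are easy to test numerically on random vectors; a sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 5))  # three random vectors in R^5
a, b = 2.0, -3.0

sym_ok = np.isclose(x @ y, y @ x)                                   # x.y = y.x
lin_ok = np.isclose(x @ (a * y + b * z), a * (x @ y) + b * (x @ z)) # linearity
cs_ok = abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y)         # Cauchy-Schwarz
```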
MATH CAMP: Lecture 4
Orthogonal Projections
Let W be a subspace of V , which is a subspace of RN .
Figure 1
Proof: Let v1, ..., vK be a basis for W and let vK+1, ..., vM be a basis for W⊥ ∩ V, where M = dim V and K = dim W. I show that v1, ..., vM is a basis for V. First I show that v1, ..., vM are independent. Suppose that Σ_{n=1}^M cn vn = 0. Then c1 v1 + ··· + cK vK = −cK+1 vK+1 − ··· − cM vM. Hence, w = c1 v1 + ··· + cK vK ∈ W ∩ W⊥. Therefore, 0 = w·w, so that w = 0. Since v1, ..., vK are independent, c1 = ··· = cK = 0. Since 0 = −cK+1 vK+1 − ··· − cM vM and vK+1, ..., vM are independent, cK+1 = ··· = cM = 0. Therefore, v1, ..., vM are independent. Since M = dim V, v1, ..., vM is a basis for V.
If v ∈ V, then v = Σ_{n=1}^M cn vn. Let π(v) = Σ_{n=1}^K cn vn. Then, v − π(v) = Σ_{n=K+1}^M cn vn. Since vn ∈ W⊥, for n > K, v − π(v) ∈ W⊥. Hence, π(v) exists.
In order to show that π(v) is unique, suppose that v ∈ V and v = ŵ + (v − ŵ), where ŵ ∈ W and v − ŵ ∈ W⊥. Then, ŵ = Σ_{n=1}^K an vn and v − ŵ = Σ_{n=K+1}^M an vn, since v1, ..., vK is a basis for W and vK+1, ..., vM is a basis for W⊥. Therefore, v = ŵ + (v − ŵ) = Σ_{n=1}^M an vn. Since v1, ..., vM is a basis for V, the numbers a1, ..., aM are unique. Therefore, ŵ = Σ_{n=1}^K an vn = π(v).
Orthonormal Bases
De…nition: A set of vectors v1 ; :::; vM in RN is said to be orthogonal if vn vm = 0,
whenever n 6= m.
Remark: If v1, ..., vM is an orthonormal basis for V and v ∈ V, then Σ_{n=1}^K (v·vn) vn is the orthogonal projection of v onto the linear span of v1, ..., vK.
yk+1 and the projection of yk+1 onto the linear span of v1 ; :::; vk ; which equals the
linear span of y1 ; :::; yk :
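The construction referred to here, subtracting from each yk+1 its projection onto the span of the vectors already built and normalizing the difference, is the Gram–Schmidt process; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

def gram_schmidt(ys):
    """Orthonormalize the rows of ys, assumed linearly independent."""
    vs = []
    for y in ys:
        # subtract the projection of y onto the span of v1, ..., vk
        w = y - sum((y @ v) * v for v in vs)
        vs.append(w / np.linalg.norm(w))
    return np.array(vs)

V = gram_schmidt(np.array([[1.0, 1.0, 0.0],
                           [1.0, 0.0, 1.0],
                           [0.0, 1.0, 1.0]]))
# The rows of V are orthonormal, so V V^T is the identity
```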
Determinants
Definition: A permutation of {1, ..., N} is a one to one and onto function σ : {1, ..., N} → {1, ..., N}.
Figure 2
Say that σ is odd if the number of interchanges is odd. Otherwise, σ is even. Let the sign of σ be sgn σ = 1, if σ is even, and sgn σ = −1, if σ is odd.
Let A = (amn) be an N × N matrix. The determinant of A is

det A = Σ_σ (sgn σ) a1,σ(1) a2,σ(2) ··· aN,σ(N),

where the sum is over all permutations σ of {1, ..., N}.
That is, pick one entry from each row, every time from a di¤erent column, and
multiply these N numbers together. The choice of column de…nes a permutation of
f1; :::; N g. Multiply the product by the sign of this permutation. Add these products
over all possible permutations. The sum is the determinant.
If N = 1, A = (a11) and det A = a11. If N = 2, then A = ( a11 a12 ; a21 a22 ) and det A = a11 a22 − a21 a12.
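The permutation-sum definition can be implemented directly; a sketch in plain Python, computing the sign of a permutation by counting inversions:

```python
from itertools import permutations

def sgn(p):
    # sign = (-1)^(number of inversions, i.e. pairs out of order)
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(A):
    """Sum of sgn(s) * a[0][s(0)] * ... * a[N-1][s(N-1)] over all permutations s."""
    N = len(A)
    total = 0
    for p in permutations(range(N)):
        term = sgn(p)
        for m in range(N):
            term *= A[m][p[m]]
        total += term
    return total
```

For N = 2 this reduces to a11 a22 − a21 a12; the cost grows like N!, so this is a definition, not a practical algorithm.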
For any N, det I = 1, where I is the N × N identity matrix. That is,

I = ( 1 0 ··· 0 ; 0 1 ··· 0 ; ⋮ ⋱ ⋮ ; 0 ··· 0 1 ),

where I has N rows and N columns.
The determinant of an N × N matrix A may be considered to be a function of the N rows of A, each of which is a vector in RN. In order to describe this function, I need the following notation:

S^K = S × S × ··· × S (K times).
permutation of {1, ..., N} that interchanges n and k. Then

det A′ = Σ_σ (sgn σ) a′_{1,σ(1)} ··· a′_{N,σ(N)}
       = Σ_σ (sgn σ) a_{1,τσ(1)} ··· a_{N,τσ(N)}
       = Σ_μ sgn(τ⁻¹μ) a_{1,μ(1)} ··· a_{N,μ(N)}
       = −Σ_μ (sgn μ) a_{1,μ(1)} ··· a_{N,μ(N)} = −det A.
This theorem relates the determinant to elementary row operations, which can be
used to simplify a matrix and hence compute its determinant.
Of course, f (e1;:::; eN ) = det I; where I is the N N identity matrix and en is the
nth standard basis vector for RN : The following two theorems may be proved using
elementary row operations.
Theorem: The determinant is the unique alternating multilinear form f : (RN)^N → R such that f(e1, ..., eN) = 1.
Example:

( 6 −1 )ᵀ   (  6 4 −2 )
( 4  2 )  = ( −1 2  5 )
( −2 5 )
Definition: The (m, n)th cofactor of A is Cmn = (−1)^{m+n} det A(m|n).
Theorem: For n such that 1 ≤ n ≤ N, det A = Σ_{m=1}^N amn Cmn.
Remarks:
1 × 1: det(a) = a.
2 × 2: det( a11 a12 ; a21 a22 ) = a11 a22 − a21 a12.
3 × 3:

det( a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 )
= a11 det( a22 a23 ; a32 a33 ) − a21 det( a12 a13 ; a32 a33 ) + a31 det( a12 a13 ; a22 a23 ),

and so on.
3. Since det A = det Aᵀ, det A = Σ_{n=1}^N amn Cmn. This is the expansion of the determinant by cofactors along row m.
Theorem: If k ≠ n, then Σ_{m=1}^N amn Cmk = 0.
Proof: Replace the kth column of A by the nth column, obtaining the matrix B. Then, by the definition of B, B(m|k) = A(m|k). Since B has two equal columns,

0 = det B = Σ_{m=1}^N (−1)^{m+k} bmk det B(m|k) = Σ_{m=1}^N (−1)^{m+k} amn det A(m|k) = Σ_{m=1}^N amn Cmk.

Combining this theorem with the previous one, Σ_{m=1}^N amn Cmk = δnk det A, where

δnk = 1, if n = k, and δnk = 0, otherwise.
Definition: The adjoint matrix of A is the matrix adj A, the (m, n)th entry of which is

(adj A)mn = Cnm = (−1)^{n+m} det A(n|m),

for m and n such that 1 ≤ n, m ≤ N.
The adjoint matrix is the transpose of the matrix of cofactors of A, where the (m, n)th entry of the matrix of cofactors is Cmn. Notice that the (m, n)th entry of the product matrix (adj A)A is

Σ_{j=1}^N (adj A)mj ajn = Σ_{j=1}^N Cjm ajn = Σ_{j=1}^N ajn Cjm = δnm det A.

Therefore (adj A)A = (det A)I, where I is the N × N identity matrix. Hence

(1/det A)(adj A)A = I,

if det A ≠ 0. That is, if det A ≠ 0, A has a left inverse and so is invertible and

A⁻¹ = (1/det A) adj(A).

Conversely, if A is invertible, then A⁻¹A = I, so that 1 = det I = det(A⁻¹A) = (det A⁻¹)(det A), and so det A ≠ 0 and

det A⁻¹ = 1/det A.

This proves the following theorem.
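The formula A⁻¹ = (1/det A) adj A can be checked numerically; a sketch, assuming NumPy is available, with the cofactors computed from minors:

```python
import numpy as np

def adj(A):
    """Adjoint (transposed cofactor) matrix: (adj A)[m, n] = (-1)^(n+m) det A(n|m)."""
    N = A.shape[0]
    out = np.empty((N, N))
    for m in range(N):
        for n in range(N):
            # A(n|m): delete row n and column m
            minor = np.delete(np.delete(A, n, axis=0), m, axis=1)
            out[m, n] = (-1) ** (n + m) * np.linalg.det(minor)
    return out

A = np.array([[3.0, 2.0],
              [6.0, 7.0]])
A_inv = adj(A) / np.linalg.det(A)
```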
MATH CAMP: Lecture 5
Singular Matrices
De…nition: Let V be a …nite dimensional vector space. The linear transformation
T : V ! V is singular if T (v) = 0; for some v 6= 0:
a + bi = (c + di)(x + yi), that is,

cx − dy = a
dx + cy = b.
Notice that any real number, r; is also complex, in that it may be written as r +0i:
Normally a number is said to be complex if it is of the form a + bi; where b 6= 0:
The key property of complex numbers is that any polynomial equation

0 = p(x) = aN x^N + aN−1 x^{N−1} + ··· + a1 x + a0

with complex (or real) coefficients has a complex solution. Furthermore, if aN ≠ 0, then the polynomial p(x) may be written as

p(x) = aN (x − b1)(x − b2) ··· (x − bN),

where b1, ..., bN are the N roots of p, which means that they are solutions of the equation p(x) = 0. These roots may not all be distinct. A complex number b is said to be a multiple root of p if b = bn for more than one value of n. If aN ≠ 0, the positive integer N is said to be the degree of p.
Remark:
1. The same terminology applies to linear transformations T : V ! V:
Example: Let A = ( a −b ; b a ), where a and b are real numbers. The characteristic equation of A is

0 = det(xI − A) = det( x − a, b ; −b, x − a ) = (x − a)² + b² = x² − 2ax + a² + b².

Therefore,

x = ( 2a ± √(4a² − 4a² − 4b²) ) / 2 = ( 2a ± 2bi ) / 2 = a ± bi, where i = √−1.
Theorem: det(A) = λ1 λ2 ··· λN, where λ1, ..., λN are the characteristic values of A.
Theorem: Let A be an N × N matrix and suppose that λ1, ..., λN are the characteristic values of A. If |λn| < 1, for all n, then lim_{k→∞} A^k = 0, where A^k = A A ··· A (k times).
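The convergence of the powers can be observed numerically; a sketch, assuming NumPy is available, on an illustrative matrix whose characteristic values 0.5 ± 0.3i have modulus less than one:

```python
import numpy as np

A = np.array([[0.5, -0.3],
              [0.3,  0.5]])  # eigenvalues 0.5 +/- 0.3i, modulus about 0.58

moduli = np.abs(np.linalg.eigvals(A))
Ak = np.linalg.matrix_power(A, 60)  # A^60 is essentially the zero matrix
```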
Quadratic Forms
Definition: If V is a vector space, f : V × V → R is a bilinear form on V if, for each v ∈ V, f(v, w) and f(w, v) are linear functions of w.
Bilinear forms have a matrix representation. Let v1, ..., vN be an ordered basis of V and let f : V × V → R be a bilinear form. For each m and n, let amn = f(vm, vn). Let A be the N × N matrix with (m, n)th entry amn. If v = Σ_{m=1}^N bm vm and w = Σ_{n=1}^N cn vn belong to V, then

f(v, w) = f( Σ_m bm vm, Σ_n cn vn ) = Σ_m Σ_n bm cn f(vm, vn) = Σ_{m=1}^N Σ_{n=1}^N bm cn amn = (b1, ..., bN) A (c1, ..., cN)ᵀ.

The N × N matrix A represents f. That is, given a basis v1, ..., vN for V, there is one and only one matrix A such that f(v, w) = (b1, ..., bN) A (c1, ..., cN)ᵀ, where v = Σ_{n=1}^N bn vn and w = Σ_{n=1}^N cn vn.
Remark: The bilinear form f is symmetric if and only if the matrix A representing it is symmetric.
De…nition: The quadratic form associated with a symmetric bilinear form f is
q(v) = f (v; v):
Proof: I prove only the only if statement and that only for the positive de…nite
case. The proofs of the other cases are similar.
Let λ be a characteristic value of A and let x be a corresponding characteristic vector. Then,

(A − λI)x = 0,

so that

Ax = λx

and hence

xᵀ A x = λ xᵀ x.

It follows that if A is positive definite, then 0 < xᵀ A x = λ xᵀ x, so that λ > 0.
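The criterion just proved can be checked numerically: a symmetric matrix is positive definite exactly when all of its characteristic values are positive. A sketch, assuming NumPy is available, on an illustrative matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, with characteristic values 1 and 3

eigs = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix, ascending
pos_def = bool(np.all(eigs > 0))

# spot-check the quadratic form x'Ax on random nonzero vectors
rng = np.random.default_rng(1)
xs = rng.standard_normal((100, 2))
quad = np.einsum('ij,jk,ik->i', xs, A, xs)  # x'Ax for each row x of xs
```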
Real Analysis
We know what it means for numbers to be close to each other. The part of real
analysis we will use has to do with generalizations of the notion of closeness and
applications of it.
Examples:
1. RN is open.
2. The empty set, ;; is open in RN , because any assertion about nothing is true.
Examples:
2. [0; 1] is closed in R.
4. f(x0 ; 0) 2 R2 j 0 x0 1g is closed in R2 .
Proof: I show that A ∩ B is open if A and B are open. If x ∈ A ∩ B, there is εA > 0 such that B_{εA}(x) ⊆ A and there is εB > 0 such that B_{εB}(x) ⊆ B. Let ε = min(εA, εB). Then B_ε(x) ⊆ B_{εA}(x) ⊆ A and B_ε(x) ⊆ B_{εB}(x) ⊆ B, so that B_ε(x) ⊆ A ∩ B. Therefore, A ∩ B is open.
I show that ∪_{U∈U} U is open. If x ∈ ∪_{U∈U} U, then x ∈ U′, for some U′ ∈ U. Since U′ is open, there is ε > 0 such that B_ε(x) ⊆ U′ ⊆ ∪_{U∈U} U. Therefore, ∪_{U∈U} U is open.
I show that A ∪ B is closed if A and B are closed. RN \ (A ∪ B) = (RN \ A) ∩ (RN \ B). Since A and B are closed, RN \ A and RN \ B are open, so that (RN \ A) ∩ (RN \ B) is open, so that RN \ (A ∪ B) is open and hence A ∪ B is closed.
If C is closed, RN \ C is open, for every C ∈ C. Therefore, ∪_{C∈C}(RN \ C) is open and so RN \ (∩_{C∈C} C) is open. Therefore, ∩_{C∈C} C is closed.
Examples:
1. The intervals [1/n, 1] are closed, for n = 1, 2, ..., yet

∪_{n=1}^∞ [1/n, 1] = (0, 1] = {x ∈ R | 0 < x ≤ 1} is not closed.

2. The intervals (−1/n, 1 + 1/n) are open, for n = 1, 2, ..., yet

∩_{n=1}^∞ (−1/n, 1 + 1/n) = [0, 1] is not open.
Examples:
1. (0, 1] = {x | 0 < x ≤ 1} is open in [0, 1] = {x | 0 ≤ x ≤ 1}, but is not open in R.
2. (0, 1] is closed in (0, 2), but is not closed in R.
3. {(x0, 0) | 0 < x0 < 1} is open in {(x0, 0) | −1 < x0 < 1}, though it is not open in R2.
Definition: Let A ⊆ RN, B ⊆ RM, and f : A → B. Then, f is continuous if for every U ⊆ B that is open in B, f⁻¹(U) = {x ∈ A | f(x) ∈ U} is open in A.
Examples:
2. f : [0, ∞) → [0, ∞) defined by f(x) = 0, if x = 0, and f(x) = 1/x, if x > 0, is not continuous.
3. f : [0, 1] → R defined by f(x) = 1, if 0 ≤ x < 1/2, and f(x) = 0, if 1/2 ≤ x ≤ 1, is not continuous.
Theorem: f : A ! B is continuous at x if and only if for every sequence x1 ; x2 ; :::
in A that converges to x, limn!1 f (xn ) = f (x).
Proof: The argument should be clear, given what has been presented earlier.
MATH CAMP: Lecture 6
Proof: For each n, let xn = (xn1 ; :::; xnN ). If xn is Cauchy, then for each k =
1; :::; N , the sequence xnk is Cauchy. Therefore, there is a number yk such that
limn!1 xnk = yk . Let y = (y1 ; :::; yN ). Then limn!1 xn = y.
The completeness property of the real numbers may be expressed by saying that
every set of numbers with an upper bound has a least upper bound or every set of
numbers with a lower bound has a greatest lower bound.
The least upper bound for X is denoted by lub X or supX, which is read as “the
supremum of X.” In an analogous fashion, we may de…ne “bounded from below,”
“lower bound,” and “greatest lower bound.” The greatest lower bound is written as
glb X or as inf X, read as "the infimum of X." Clearly glb X = −lub(−X), where −X = {−x | x belongs to X}, so that a set that is bounded from below has a greatest
lower bound if and only if a set that is bounded from above has a least upper bound.
Least Upper Bound Property: Any set of numbers that is bounded from above has a least upper bound.
Theorem: The least upper bound property is equivalent to the completeness prop-
erty.
Theorem: A sequence in RN that converges is bounded.
Proof: Let x1, x2, ... be a sequence in RN that converges to x. There is a positive integer M such that ‖xn − x‖ ≤ 1, if n > M. Then ‖xn‖ ≤ max(‖x1‖, ..., ‖xM‖, ‖x‖ + 1), for all n.
Theorem: If x1, x2, ... is a sequence in RN that converges to x, then every subsequence of x1, x2, ... converges to x.
Proof: If ε > 0, let M be a positive integer such that ‖xn − x‖ < ε, if n ≥ M. If xn1, xn2, ... is a subsequence of x1, x2, ..., then nk ≥ k, for all k, so that ‖xnk − x‖ < ε, if k ≥ M. Therefore, xn1, xn2, ... converges to x.
some positive number b. Divide C1 in half along each dimension, obtaining 2^N subcubes, each with edges of length 2b/2 = b. One of these subcubes contains the point xn for infinitely many numbers n. Call this cube C2. Suppose that cubes C1, ..., CK have been defined, where C1 ⊇ C2 ⊇ ··· ⊇ CK and, for each k, Ck+1 has edges of length 2b/2^k = b·2^{−k+1} and Ck contains xn, for infinitely many integers n. Divide CK in half along each dimension, obtaining 2^N subcubes. One of these contains xn, for infinitely many n. Call this cube CK+1. I have defined by induction on K a sequence of cubes C1, C2, ... such that
2. C1 ⊇ C2 ⊇ ···, and
3. for all k, each edge of Ck+1 has length b·2^{−k+1}.
Example: The set of all open intervals in R is an open cover of [0; 1]:
Proof: Suppose that every open cover of A contains a finite subcover. I show that A is closed and bounded. In order to show that A is closed, let x ∈ RN \ A and, for each m = 1, 2, ..., let Um = {y ∈ RN | ‖y − x‖ > 1/m}. ∪_{m=1}^∞ Um = RN \ {x}, so that U1, U2, ... is an open cover of A. Therefore, for some M, A ⊆ ∪_{m=1}^M Um = UM. Therefore, B_{1/M}(x) ∩ A = ∅. Hence RN \ A is open and so A is closed.
In order to show that A is bounded, for m = 1, 2, ..., let Um = {x ∈ RN | ‖x‖ < m}. ∪_{m=1}^∞ Um = RN, so that U1, U2, ... is an open cover of A. Therefore, for some M, A ⊆ ∪_{m=1}^M Um = UM, so that ‖x‖ ≤ M, for all x ∈ A, and hence A is bounded.
Suppose now that A is compact. I show that every open cover, U, of A contains a finite subcover. Suppose U contains no finite subcover of A. Let C1 be a cube containing A. Divide C1 into 2^N subcubes of equal size that intersect only along sides. One of those subcubes, C2, is such that A ∩ C2 is not empty and A ∩ C2 is not covered by a finite subcover of U, for if there were no such subcube, the intersection of A with each subcube would have a finite subcover and the union of this finite number of subcovers would be a finite subcover of A, contrary to hypothesis. Suppose that C1, C2, ..., CK have been defined such that C1 ⊇ C2 ⊇ ··· ⊇ CK and, for each k > 1, Ck−1 is the union of 2^N cubes congruent to Ck that intersect only along sides, and Ck ∩ A is not empty and has no finite subcover. Divide CK into 2^N congruent subcubes that intersect only along sides. One of these subcubes, CK+1, is such that A ∩ CK+1 is not empty and has no finite subcover. By induction on K, I have defined cubes C1, C2, ... such that C1 ⊇ C2 ⊇ ···, lim_{K→∞} diam(CK) = 0, and, for all K, CK ∩ A is not empty and has no finite subcover.
By the completeness property of the real numbers, there is x ∈ ∩_{k=1}^∞ Ck. Also, for every k, there is xk ∈ Ck ∩ A. Because lim_{k→∞} diam(Ck) = 0, it follows that lim_{k→∞} xk = x. Since A is closed, x belongs to A. Since U covers A, there is a U in U such that U contains x. Since U is open, there is a positive number ε such that Bε(x) ⊆ U. Because xk ∈ Ck, lim_{k→∞} xk = x, and lim_{k→∞} diam(Ck) = 0, there is a positive integer K such that CK ⊆ Bε(x) ⊆ U. Therefore U covers CK ∩ A, contrary to hypothesis. This contradiction proves that every open cover of A contains a finite subcover.
4
Proof: Since A is compact, it is bounded and hence glb(A) and lub(A) exist. By the
de…nition of lub(A), there is a sequence x1 ; x2 ; ::: in A, such that limn!1 xn = lub(A).
Since A is closed, limn!1 xn 2 A. A similar argument proves that glb(A) 2 A.
Remark: This theorem says that a continuous function de…ned on a compact set
achieves its minimum and maximum.
Example: Let $X = \{(x_1, x_2) \in \mathbb{R}^2 \mid 0 \le x_1 \le 2,\ 0 \le x_2 \le 2\}$ and let $B = [0, \infty)$. Let p vary over B. Let $f(x_1, x_2, p) = x_1$ and $g(x_1, x_2, p) = p - p x_1 - x_2$. If $p > 0$, $h(p) = (1, 0)$. If $p = 0$, $h(p) = (2, 0)$.
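This jump can be checked with a small grid search (a sketch, not from the notes; the helper `h` below is hypothetical): maximize $f = x_1$ over $X = [0,2]^2$ subject to $p x_1 + x_2 \le p$.

```python
# Sketch (not from the notes): grid search for the example's solution h(p).
# For p > 0 the constraint p*x1 + x2 <= p forces x1 <= 1, so h(p) = (1, 0);
# at p = 0 every x1 in [0, 2] is feasible with x2 = 0, so h(0) = (2, 0).
def h(p, step=0.01):
    best = None
    n = int(2 / step) + 1
    for i in range(n):
        for j in range(n):
            x1, x2 = i * step, j * step
            if p * x1 + x2 <= p + 1e-12:      # feasibility
                if best is None or x1 > best[0]:
                    best = (x1, x2)
    return best

print(h(1.0), h(0.0))  # approximately (1, 0) and (2, 0): h jumps at p = 0
```

The jump at $p = 0$ is exactly a failure of condition 2 of the theorem below at $b = 0$.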
1. for all $b \in B$, $\{x \in X \mid g_k(x, b) \ge 0$, for $k = 1, \ldots, K\}$ is non-empty,
2. for all $(x, b) \in X \times B$ such that $g_k(x, b) \ge 0$, for all k, and for all $\varepsilon > 0$, there exists a $\delta > 0$ such that if $\|b_1 - b\| < \delta$, there exists an $x_1 \in X$ such that $g_k(x_1, b_1) \ge 0$, for $k = 1, \ldots, K$, and $\|x_1 - x\| < \varepsilon$,
3. for each $b \in B$, the problem
$$\max_{x \in X} f(x, b) \quad \text{s.t. } g_k(x, b) \ge 0, \text{ for } k = 1, \ldots, K \qquad (\ast)$$
has at most one solution.
Then the function $h : B \to X$, where $h(b)$ is the unique solution of problem $(\ast)$, exists and is continuous.
Proof: Since X is compact and the $g_k$ are continuous, $\{x \in X \mid g_k(x, b) \ge 0$, for $k = 1, \ldots, K\}$ is compact, for all $b \in B$. By condition 1, this set is non-empty. Since f is continuous, problem $(\ast)$ has a solution. By condition 3, this solution is unique. Hence, $h(b)$ is a well-defined function.
To show that h is continuous, let $b_1, b_2, \ldots$ be a sequence in B such that $\lim_{n \to \infty} b_n = \bar b$, where $\bar b \in B$. I must show that $\lim_{n \to \infty} h(b_n) = h(\bar b)$.
If $h(b_n)$ does not converge to $h(\bar b)$, then there exist an $\varepsilon > 0$ and a subsequence $n_j$, $j = 1, 2, \ldots$, such that $\|h(b_{n_j}) - h(\bar b)\| > \varepsilon$, for all j. Since X is compact, I may assume that $h(b_{n_j})$ converges, say to x. (That is, a subsequence of $h(b_{n_j})$ converges to x, and I call this subsequence $h(b_{n_j})$ again.) Since $\|x - h(\bar b)\| \ge \varepsilon > 0$, it follows that $x \ne h(\bar b)$.
I now derive a contradiction. Since $g_k(h(b_{n_j}), b_{n_j}) \ge 0$, for all k and j, and the functions $g_k$ are continuous and $\lim_{j \to \infty}(h(b_{n_j}), b_{n_j}) = (x, \bar b)$, it follows that $g_k(x, \bar b) \ge 0$, for all k. Therefore, $f(x, \bar b) \le f(h(\bar b), \bar b)$, by the definition of $h(\bar b)$. I prove that $f(x, \bar b) = f(h(\bar b), \bar b)$. Suppose that $f(x, \bar b) < f(h(\bar b), \bar b)$. Then, $f(x, \bar b) < f(h(\bar b), \bar b) - 2\delta$, for some $\delta > 0$. Since f is continuous, there exists a positive number $\eta$ such that $|f(x', b') - f(h(\bar b), \bar b)| < \delta$, if $\|x' - h(\bar b)\| < \eta$ and $\|b' - \bar b\| < \eta$. By condition 2 of the theorem, there is a $\gamma > 0$ such that if $\|b' - \bar b\| < \gamma$, then there exists an $x' \in X$ such that $g_k(x', b') \ge 0$, for all k, and $\|x' - h(\bar b)\| < \eta$. I may assume that $\gamma \le \eta$, so that $|f(x', b') - f(h(\bar b), \bar b)| < \delta$. Since $\lim_{j \to \infty} b_{n_j} = \bar b$ and $\lim_{j \to \infty} h(b_{n_j}) = x$, there is a positive integer J such that $\|b_{n_j} - \bar b\| < \gamma$ and $|f(h(b_{n_j}), b_{n_j}) - f(x, \bar b)| < \delta$, for $j \ge J$. By what has been argued, if $j \ge J$, there exists $x_{n_j} \in X$ such that $\|x_{n_j} - h(\bar b)\| < \eta$ and $g_k(x_{n_j}, b_{n_j}) \ge 0$, for all k. If $j \ge J$, $f(x_{n_j}, b_{n_j}) > f(h(\bar b), \bar b) - \delta > f(x, \bar b) + 2\delta - \delta = f(x, \bar b) + \delta > f(h(b_{n_j}), b_{n_j})$, which is impossible by the definition of $h(b_{n_j})$. This contradiction proves that $f(x, \bar b) = f(h(\bar b), \bar b)$. Condition 3 of the theorem now implies that $x = h(\bar b)$, which contradicts the inequality $\|x - h(\bar b)\| \ge \varepsilon$. This second contradiction implies that $\lim_{n \to \infty} h(b_n) = h(\bar b)$.
MATH CAMP: Lecture 7
if $|x - c| < \delta$ and $x \ne c$. That is, $\left|f(x) - f(c) - \frac{df(c)}{dx}(x - c)\right| \le \varepsilon|x - c|$, if $|x - c| < \delta$.
Let
$$g(x) = f(c) + \frac{df(c)}{dx}(x - c) = \left[f(c) - \frac{df(c)}{dx}c\right] + \frac{df(c)}{dx}x.$$
g is an affine function with constant $f(c) - \frac{df(c)}{dx}c$ and linear part $\frac{df(c)}{dx}x$.
Let $\delta$ be a small positive number and magnify the graphs of f and g by multiplying both coordinates by $\delta^{-1}$, which is a large number. Adjust the lens so that the part of the horizontal coordinate in the field of vision varies between $-1$ and $1$. Let $x - c = \delta\,\Delta x$, so that $g(x) = g(c + \delta\Delta x)$ and $f(x) = f(c + \delta\Delta x)$, where $-1 \le \Delta x \le 1$. What we see after magnification is the graphs of the functions of $\Delta x$, $\delta^{-1} f(c + \delta\Delta x)$ and $\delta^{-1} g(c + \delta\Delta x)$, as $\Delta x$ varies between $-1$ and $+1$. Let $\varepsilon > 0$ and choose $\delta > 0$ such that $\left|\frac{f(x) - f(c)}{x - c} - \frac{df(c)}{dx}\right| < \varepsilon$, if $0 < |x - c| < \delta$. Then, if $|\Delta x| \le 1$,
$$\left|\delta^{-1} f(c + \delta\Delta x) - \delta^{-1} g(c + \delta\Delta x)\right| = \delta^{-1}\left|f(c + \delta\Delta x) - f(c) - \frac{df}{dx}(c)\,\delta\Delta x\right| = \left|\frac{f(c + \delta\Delta x) - f(c)}{\delta\Delta x} - \frac{df}{dx}(c)\right| |\Delta x| < \varepsilon|\Delta x|.$$
That is, the graphs of $\delta^{-1} f(c + \delta\Delta x)$ and of $\delta^{-1} g(c + \delta\Delta x)$ as functions of $\Delta x$ are within $\varepsilon|\Delta x|$ of each other in the vertical direction as $\Delta x$ varies between $-1$ and $1$. In this sense, the affine function $g(x) = f(c) + \frac{df(c)}{dx}(x - c)$ approximates f near c.
Notice that if x and y are numbers, then $|x| = |x - y + y| \le |x - y| + |y|$, so that $|x| - |y| \le |x - y|$.
Lemma:
a) If f is differentiable at c and $df(c)/dx > 0$, then there exists a $\delta > 0$ such that $f(c) < f(x)$, if $c < x < c + \delta$.
b) If $df(c)/dx < 0$, there exists a $\delta > 0$ such that $f(c) < f(x)$, if $c - \delta < x < c$.
Proof: a) Let $\delta$ correspond to $\varepsilon = \frac{1}{2}\frac{df(c)}{dx}$ in the definition of differentiability of f at c. Then,
$$\frac{f(x) - f(c)}{x - c} > \frac{df(c)}{dx} - \frac{1}{2}\frac{df(c)}{dx} = \frac{1}{2}\frac{df(c)}{dx},$$
if $c < x < c + \delta$. Hence,
$$f(x) - f(c) > \frac{1}{2}\frac{df(c)}{dx}(x - c) > 0,$$
if $c < x < c + \delta$.
The proof of (b) is similar.
Definition: If $f : (a, b) \to \mathbb{R}$, where $a < b$, and c is such that $a < c < b$, then f achieves a relative maximum at c if there exists a $\delta > 0$ such that $f(c) \ge f(x)$, if $|x - c| < \delta$. A relative maximum is also called a local maximum.
A relative or local minimum for f is defined in the same way. f achieves a local minimum at c if and only if $-f$ achieves a local maximum at c. If f achieves a local minimum at c, $0 = d(-f(c))/dx = -[df(c)/dx]$, so that $df(c)/dx = 0$.
Proof: If $f(x) = 0$, for all x, then $df(c)/dx = 0$, for all c. So suppose $f(x) \ne 0$, for some x. Suppose $f(x) > 0$, for some x. Since $[a, b]$ is compact and f is continuous, there is c such that $a \le c \le b$ and $f(c) \ge f(x)$, for all x. Since $f(x) > 0$, for some x, $f(c) > 0$. Since $f(a) = f(b) = 0$, $a < c < b$. By the previous theorem, $df(c)/dx = 0$. Use a similar argument if $f(x) < 0$, for some x.
Proof: Let $\varphi : [a, b] \to \mathbb{R}$ be defined by
$$\varphi(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a).$$
Then $\varphi(a) = \varphi(b) = 0$, $\varphi$ is continuous on $[a, b]$ and differentiable on $(a, b)$. By Rolle's theorem, there is c such that $a < c < b$ and
$$0 = \frac{d\varphi(c)}{dx} = \frac{df(c)}{dx} - \frac{f(b) - f(a)}{b - a}.$$
Leibniz's Rule: If $f : (a, b) \to \mathbb{R}$ and $g : (a, b) \to \mathbb{R}$ are differentiable, then $\frac{d}{dx}[f(x)g(x)] = f(x)\frac{dg(x)}{dx} + \frac{df(x)}{dx}g(x)$.
$$f(\beta) = f(\alpha) + \frac{df(\alpha)}{dx}(\beta - \alpha) + \frac{1}{2}\frac{d^2 f(\alpha)}{dx^2}(\beta - \alpha)^2 + \cdots + \frac{1}{(n-1)!}\frac{d^{n-1} f(\alpha)}{dx^{n-1}}(\beta - \alpha)^{n-1} + \frac{1}{n!}\frac{d^n f(\gamma)}{dx^n}(\beta - \alpha)^n,$$
where $\gamma$ is a point between $\alpha$ and $\beta$.
Proof: Let the number r be defined by
$$\frac{(\beta - \alpha)^n}{n!}\, r = f(\beta) - \left[f(\alpha) + \frac{df(\alpha)}{dx}(\beta - \alpha) + \cdots + \frac{1}{(n-1)!}\frac{d^{n-1} f(\alpha)}{dx^{n-1}}(\beta - \alpha)^{n-1}\right].$$
Let
$$\varphi(x) = f(\beta) - \left[f(x) + \frac{df(x)}{dx}(\beta - x) + \frac{1}{2}\frac{d^2 f(x)}{dx^2}(\beta - x)^2 + \cdots + \frac{1}{(n-1)!}\frac{d^{n-1} f(x)}{dx^{n-1}}(\beta - x)^{n-1} + \frac{r}{n!}(\beta - x)^n\right].$$
$\varphi$ is continuous on $[a, b]$ because f and all its derivatives are continuous on $[a, b]$. Similarly, $\varphi$ is differentiable on $(a, b)$. $\varphi(\alpha) = 0$, by the definition of r. Certainly, $\varphi(\beta) = 0$. By Rolle's theorem, there is a $\gamma$ between $\alpha$ and $\beta$ such that $d\varphi(\gamma)/dx = 0$.
$$\frac{d\varphi(x)}{dx} = -\frac{df(x)}{dx} + \frac{df(x)}{dx} - \frac{d^2 f(x)}{dx^2}(\beta - x) + \frac{d^2 f(x)}{dx^2}(\beta - x) - \cdots + \frac{1}{(n-2)!}\frac{d^{n-1} f(x)}{dx^{n-1}}(\beta - x)^{n-2} - \frac{1}{(n-1)!}\frac{d^n f(x)}{dx^n}(\beta - x)^{n-1} + \frac{r}{(n-1)!}(\beta - x)^{n-1}$$
$$= \frac{1}{(n-1)!}\left[r - \frac{d^n f(x)}{dx^n}\right](\beta - x)^{n-1}.$$
Since $d\varphi(\gamma)/dx = 0$, $r = d^n f(\gamma)/dx^n$. The theorem follows from the definition of r.
Theorem: Suppose that $f : (a, b) \to \mathbb{R}$ is differentiable, where $a < b$, and that the first two derivatives of f exist and are continuous. If c is such that $df(c)/dx = 0$ and $d^2 f(c)/dx^2 < 0$ ($> 0$), then f achieves a local maximum (minimum) at c.
Proof: Because $\frac{d^2 f(x)}{dx^2}$ is continuous, there exists a $\delta > 0$ such that if $|x - c| < \delta$, then $d^2 f(x)/dx^2 < 0$. By Taylor's theorem, if $0 < |x - c| < \delta$, there exists a $\gamma$ between c and x such that
$$f(x) = f(c) + \frac{df(c)}{dx}(x - c) + \frac{1}{2}\frac{d^2 f(\gamma)}{dx^2}(x - c)^2 = f(c) + \frac{1}{2}\frac{d^2 f(\gamma)}{dx^2}(x - c)^2 < f(c),$$
since $d^2 f(\gamma)/dx^2 < 0$.
Similarly, if $d^2 f(c)/dx^2 > 0$, then f achieves a local minimum at c.
A local maximum. The bucket does not hold water and so d2 f (c)=dx2 < 0.
$$\frac{d}{dx}(g \circ f)(c) = \frac{dg}{dy}(f(c))\,\frac{df(c)}{dx}.$$
If $f(x) \ne f(c)$, then
$$\frac{g(f(x)) - g(f(c))}{x - c} - \frac{dg}{dy}(f(c))\frac{df(c)}{dx} = \left[\frac{g(f(x)) - g(f(c))}{f(x) - f(c)} - \frac{dg}{dy}(f(c))\right]\frac{f(x) - f(c)}{x - c} + \frac{dg}{dy}(f(c))\left[\frac{f(x) - f(c)}{x - c} - \frac{df(c)}{dx}\right].$$
Suppose there is an $\varepsilon > 0$ such that $f(x) \ne f(c)$, if $0 < |x - c| < \varepsilon$. The second term on the right-hand side converges to zero as x converges to c. Since $f(x) \ne f(c)$, if $|x - c|$ is small, the first term converges to zero as x goes to c, provided $f(x) \to f(c)$. Since f is differentiable, it is continuous and so $f(x) \to f(c)$ as $x \to c$.
Suppose there is no positive number $\varepsilon$ such that $f(x) \ne f(c)$, if $0 < |x - c| < \varepsilon$. Then, $df(c)/dx = 0$. If $f(x) \ne f(c)$, the argument of the previous paragraph applies. If $f(x) = f(c)$, then
$$\frac{g(f(x)) - g(f(c))}{x - c} = 0 = \frac{dg}{dy}(f(c))\,\frac{df(c)}{dx},$$
since $df(c)/dx = 0$.
Multivariate Calculus
Definition: Let U be an open subset of $\mathbb{R}^N$ and let $f : U \to \mathbb{R}^M$. f is differentiable at $c \in U$ if there exists a linear transformation $Df(c) : \mathbb{R}^N \to \mathbb{R}^M$, called the derivative of f at c, such that for every $\varepsilon > 0$, there exists a $\delta > 0$ such that $\|f(x) - f(c) - Df(c)(x - c)\| < \varepsilon\|x - c\|$, if $0 < \|x - c\| < \delta$. That is, the affine function $f(c) + Df(c)(x - c)$ approximates $f(x)$ locally near c, that is, for x near c.
Lemma: A function $f : U \to \mathbb{R}^M$ has at most one derivative at a point.
Dividing by $|t|$, we see that $0 < \|S(v) - T(v)\| < 2\varepsilon$, for all $\varepsilon > 0$, which is impossible.
$$|y_m| = \left|\sum_{n=1}^{N} a_{mn} x_n\right| \le \|a_m\| \|x\|,$$
where $a_m$ is the mth row of A. Therefore,
$$|y_m| \le a\sqrt{N}\,\|x\|, \text{ and so } \|y\| = \sqrt{\sum_{m=1}^{M} y_m^2} \le \sqrt{M a^2 N \|x\|^2} = a\sqrt{MN}\,\|x\|.$$
Let $b = a\sqrt{MN}$.
MATH CAMP: Lecture 8
if $\|x - c\| < \delta$. Therefore,
$$\|f(x) - f(c)\| \le \|Df(c)(x - c)\| + \|x - c\|,$$
if $\|x - c\| < \delta$. By the last lemma of the previous lecture, there is a $b > 0$ such that $\|Df(c)(x - c)\| \le b\|x - c\|$. Therefore,
$$\|f(x) - f(c)\| \le (1 + b)\|x - c\|,$$
if $\|x - c\| < \delta$.
Remarks:
1. $\nabla_0 f(c) = 0$.
2. $\nabla_v f(c) = \frac{d}{dt} f(c + tv)\big|_{t=0} = \frac{dg}{dt}(0)$, where $g(t) = f(c + tv)$.
Proof: By the definition of the differentiability of f, for any $\varepsilon > 0$, there is a $\delta > 0$ such that $|f(c + tv) - f(c) - Df(c)(tv)| \le \varepsilon\|tv\|$, if $\|tv\| < \delta$. If $v = 0$, $Df(c)(v) = 0 = \nabla_v f(c)$. If $v \ne 0$ and $0 < |t| < \frac{\delta}{\|v\|}$, then
$$\left|\frac{1}{t}[f(c + tv) - f(c)] - Df(c)(v)\right| = \frac{1}{|t|}\left|f(c + tv) - f(c) - Df(c)(tv)\right| \le \frac{\varepsilon\|tv\|}{|t|} = \frac{\varepsilon|t|\|v\|}{|t|} = \varepsilon\|v\|.$$
Therefore, $Df(c)(v) = \nabla_v f(c)$, by the definition of $\nabla_v f(c)$.
Definition: If $f : U \to \mathbb{R}$, where $U \subset \mathbb{R}^N$ and U is open, then $\nabla_{e_n} f(c)$ is called the nth partial derivative of f at c and is written as $\partial f(c)/\partial x_n$, where $e_n$ is the nth standard basis vector of $\mathbb{R}^N$.
Remark:
$$\frac{\partial f}{\partial x_n}(c) = \frac{d}{dx_n} f(c_1, \ldots, c_{n-1}, x_n, c_{n+1}, \ldots, c_N)\Big|_{x_n = c_n}.$$
That is, all variables of f but the nth are held constant at their values in the vector c. The result is a function of the single variable $x_n$. The derivative of this function at $x_n = c_n$ equals $\frac{\partial f}{\partial x_n}(c)$.
Example:
$$f(x_1, x_2, x_3) = x_1 x_2^3 x_3^2$$
$$\frac{\partial f(2, 4, 5)}{\partial x_2} = 2 \cdot 3 \cdot 4^2 \cdot 5^2 = 6(16)(25) = 2400.$$
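As a quick check (a sketch, not from the notes), the partial derivative in this example can be verified numerically by differencing in $x_2$ alone, holding the other variables at their values in c, as in the remark above.

```python
# Sketch (not from the notes): check the partial derivative numerically.
# f(x1, x2, x3) = x1 * x2**3 * x3**2, so df/dx2 = 3 * x1 * x2**2 * x3**2.
def f(x1, x2, x3):
    return x1 * x2**3 * x3**2

c = (2.0, 4.0, 5.0)
analytic = 3 * c[0] * c[1]**2 * c[2]**2   # 3 * 2 * 16 * 25 = 2400

h = 1e-6  # central difference in x2 with x1, x3 held fixed at c
numeric = (f(c[0], c[1] + h, c[2]) - f(c[0], c[1] - h, c[2])) / (2 * h)
print(analytic, round(numeric, 2))  # both approximately 2400
```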
If $f : U \to \mathbb{R}^M$, where $U \subset \mathbb{R}^N$ and U is open, let $f_m : U \to \mathbb{R}$ be the mth component of f, for $m = 1, \ldots, M$.
Proof: Let $\varepsilon > 0$ and let $\delta > 0$ be such that $\|f(x) - f(c) - Df(c)(x - c)\| \le \varepsilon\|x - c\|$, if $\|x - c\| < \delta$. $Df(c) : \mathbb{R}^N \to \mathbb{R}^M$ is a linear transformation, so that
$$Df(c) = \begin{pmatrix} (Df(c))_1 \\ \vdots \\ (Df(c))_M \end{pmatrix},$$
where $(Df(c))_m : \mathbb{R}^N \to \mathbb{R}$ is linear, for all m, and is the mth component function of $Df(c)$. If $\|x - c\| < \delta$, then
$$|f_m(x) - f_m(c) - (Df(c))_m(x - c)| \le \|f(x) - f(c) - Df(c)(x - c)\| \le \varepsilon\|x - c\|,$$
for each m.
Theorem: The matrix
$$\begin{pmatrix} \frac{\partial f_1(c)}{\partial x_1} & \cdots & \frac{\partial f_1(c)}{\partial x_N} \\ \vdots & & \vdots \\ \frac{\partial f_M(c)}{\partial x_1} & \cdots & \frac{\partial f_M(c)}{\partial x_N} \end{pmatrix}$$
represents $Df(c)$, if f is differentiable at c and $f : U \to \mathbb{R}^M$, where $U \subset \mathbb{R}^N$ and U is open.
Proof: Let $v = (v_1, \ldots, v_N) \in \mathbb{R}^N$. Then $v = \sum_{n=1}^N v_n e_n$, so that
$$Df(c)(v) = Df(c)\left(\sum_{n=1}^N v_n e_n\right) = \sum_{n=1}^N v_n Df(c)(e_n) = \sum_{n=1}^N v_n \begin{pmatrix} Df_1(c) \\ \vdots \\ Df_M(c) \end{pmatrix}(e_n) = \sum_{n=1}^N v_n \begin{pmatrix} Df_1(c)(e_n) \\ \vdots \\ Df_M(c)(e_n) \end{pmatrix}$$
$$= \sum_{n=1}^N v_n \begin{pmatrix} \nabla_{e_n} f_1(c) \\ \vdots \\ \nabla_{e_n} f_M(c) \end{pmatrix} = \sum_{n=1}^N v_n \begin{pmatrix} \frac{\partial f_1(c)}{\partial x_n} \\ \vdots \\ \frac{\partial f_M(c)}{\partial x_n} \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1(c)}{\partial x_1} & \cdots & \frac{\partial f_1(c)}{\partial x_N} \\ \vdots & & \vdots \\ \frac{\partial f_M(c)}{\partial x_1} & \cdots & \frac{\partial f_M(c)}{\partial x_N} \end{pmatrix}\begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix}.$$
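A numeric sketch of the theorem (not from the notes; the function below is a made-up illustration): the matrix of partial derivatives agrees with finite-difference approximations of $Df(c)$.

```python
# Sketch (not from the notes): the matrix of partial derivatives represents
# Df(c). Example function f(x1, x2) = (x1*x2, x1**2) at c = (3, 2); compare
# the hand-computed Jacobian with central finite differences.
def f(x):
    x1, x2 = x
    return [x1 * x2, x1**2]

c = [3.0, 2.0]
jac_analytic = [[c[1], c[0]],      # row 1: d(x1*x2)/dx1, d(x1*x2)/dx2
                [2 * c[0], 0.0]]   # row 2: d(x1**2)/dx1, d(x1**2)/dx2

h = 1e-6
jac_numeric = []
for m in range(2):
    row = []
    for n in range(2):
        cp = list(c); cp[n] += h
        cm = list(c); cm[n] -= h
        row.append((f(cp)[m] - f(cm)[m]) / (2 * h))
    jac_numeric.append(row)
print(jac_numeric)  # approximately [[2, 3], [6, 0]]
```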
Theorem:
Parts 2 and 3 of this theorem generalize Leibniz’s rule for di¤erentiating products.
2. If A and B are $M \times N$ matrices and a and b are numbers, then $(aA + bB)^T = aA^T + bB^T$.
3. If A is a matrix, $(A^T)^T = A$.
where I treat $Df(c)$ and $Dg(c)$ as $M \times N$ matrices. The last equation holds because $v^T(Df(c))^T g(c)$ is a number and so equals its own transpose.
Now, I describe some useful special cases.
Definition: f has a local maximum at $c \in U$ if for some $\varepsilon > 0$, $f(c) \ge f(x)$, for all $x \in B_\varepsilon(c)$.
Proof: The restriction of f to any line through c has a local maximum at c. Therefore, $\nabla_v f(c) = 0$, for all $v \in \mathbb{R}^N$. In particular, $\partial f(c)/\partial x_n = 0$, for all n. Therefore, $Df(c) = 0$.
A local minimum for f may be defined in a similar way, and $Df(c) = 0$ if f has a local minimum at c.
Application (Least Squares Estimator):
Model:
$$y = \sum_{k=1}^K \beta_k x_k + e, \quad e = \text{error}.$$
Then
$$(y - Xb)^T (y - Xb) = y^T y - 2y^T Xb + b^T X^T X b,$$
where I have used the rules for matrix transposition and the fact that since $b^T X^T y$ is a number, $b^T X^T y = (b^T X^T y)^T = y^T X b$. The b that minimizes $(y - Xb)^T (y - Xb)$ is called the least squares estimator.
If $K = 1$, we have the following: the least squares estimate, b, minimizes the sum of the squares of the vertical distances from the data points $(x_n, y_n)$ to the line $y = bx$.
In order to calculate the least squares estimator, we set the derivative of $(y - Xb)^T(y - Xb) = y^T y - 2y^T Xb + b^T X^T X b$ with respect to b equal to zero. Let $D_b$ denote the derivative with respect to the vector b. Then
$$D_b\left[(y - Xb)^T(y - Xb)\right] = -2y^T X + 2b^T X^T X,$$
where I have used the fact that the matrix $X^T X$ is symmetric. $X^T X$ is symmetric because $(X^T X)^T = X^T (X^T)^T = X^T X$. Since $X^T X$ is symmetric, $D_b[b^T X^T X b] = 2b^T X^T X$, by a formula proved earlier. Setting $D_b[(y - Xb)^T(y - Xb)]$ equal to zero, we obtain the equation $0 = -2y^T X + 2b^T X^T X$, which implies that $b^T X^T X = y^T X$. Taking the transpose of both sides of this equation, we obtain $X^T X b = X^T y$. If the matrix $X^T X$ is invertible, then $b = (X^T X)^{-1} X^T y$. This is the formula for the least squares estimator.
The N-vector Xb is the projection of y onto the span of the columns of X. In order to see that this is so, we must show that $y - Xb$ is orthogonal to the columns of X. Since the columns of X are the rows of $X^T$, it is sufficient to show that $X^T(y - Xb) = 0$. However, $X^T(y - Xb) = X^T y - X^T X (X^T X)^{-1} X^T y = X^T y - X^T y = 0$.
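A numpy sketch of the formula and the orthogonality property (not from the notes; the data below are randomly generated for illustration):

```python
# Sketch (not from the notes): the least squares estimator b = (X^T X)^{-1} X^T y
# and the orthogonality X^T (y - Xb) = 0. Data are simulated here.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))        # 20 observations, K = 3 regressors
beta = np.array([1.0, -2.0, 0.5])   # made-up coefficients for the simulation
y = X @ beta + 0.1 * rng.normal(size=20)

b = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations X^T X b = X^T y

residual = y - X @ b
print(np.allclose(X.T @ residual, 0.0, atol=1e-8))  # True: Xb is the projection of y
```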
Proof: Let $\varphi : [0, 1] \to \mathbb{R}$ be defined by $\varphi(t) = f((1 - t)a + tb)$. $\varphi(0) = f(a)$. $\varphi(1) = f(b)$. By the chain rule, $d\varphi(t)/dt = Df((1 - t)a + tb)(b - a)$. By the mean value theorem for one variable, there exists a number $t_0$ such that $0 < t_0 < 1$ and $d\varphi(t_0)/dt = \varphi(1) - \varphi(0) = f(b) - f(a)$. Let $c = (1 - t_0)a + t_0 b$. Then $Df(c)(b - a) = f(b) - f(a)$.
MATH CAMP: Lecture 9
Terminology: Let $f : U \to \mathbb{R}^K$ be a differentiable function, where U is an open subset of $\mathbb{R}^N$. The matrix representation of $Df(x)$ is called the Jacobian matrix of f at x.
If $f : U \to \mathbb{R}$, the derivative $Df(x)$ is called the gradient of f at x, though the word "gradient" usually suggests the vector $\left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_N}(x)\right)$ that represents the linear functional $Df(x)$. This vector is sometimes denoted by $\nabla f(x)$.
We write
$$\frac{\partial}{\partial x_m}\left(\frac{\partial f}{\partial x_n}\right)(x) = \frac{\partial^2 f(x)}{\partial x_m \partial x_n}.$$
Then
$$\frac{\partial^2 f(x)}{\partial x_n \partial x_m}$$
exists and equals
$$\frac{\partial^2 f(x)}{\partial x_m \partial x_n}.$$
Remark: It follows that the matrix
$$\begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1}(x) & \cdots & \frac{\partial^2 f}{\partial x_N \partial x_1}(x) \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_N}(x) & \cdots & \frac{\partial^2 f}{\partial x_N \partial x_N}(x) \end{pmatrix}$$
is symmetric, if all the partial derivatives $\frac{\partial^2 f(x)}{\partial x_m \partial x_n}$ exist and are continuous functions of x.
Interpretation: Let $f : U \to \mathbb{R}$, where U is an open subset of $\mathbb{R}^N$. If $v \in \mathbb{R}^N$, $Df(x)(v) = \nabla_v f(x) = \sum_{n=1}^N v_n \frac{\partial f(x)}{\partial x_n}$ is the rate of change of f in the direction of v. We need an expression for the rate of change of $Df(x)(v)$ at $x = c$ in a direction $w \in \mathbb{R}^N$. This rate of change is
$$\nabla_w\left(\sum_{n=1}^N v_n \frac{\partial f(c)}{\partial x_n}\right) = \sum_{m=1}^N w_m \frac{\partial}{\partial x_m}\left(\sum_{n=1}^N v_n \frac{\partial f(c)}{\partial x_n}\right) = \sum_{m=1}^N \sum_{n=1}^N v_n w_m \frac{\partial^2 f(c)}{\partial x_m \partial x_n} = v^T D^2 f(c)\, w, \text{ where}$$
$$D^2 f(c) = \begin{pmatrix} \frac{\partial^2 f(c)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f(c)}{\partial x_N \partial x_1} \\ \vdots & & \vdots \\ \frac{\partial^2 f(c)}{\partial x_1 \partial x_N} & \cdots & \frac{\partial^2 f(c)}{\partial x_N \partial x_N} \end{pmatrix}.$$
This matrix is called the Hessian matrix. The function $v^T D^2 f(c) w$ of v and w is a bilinear form in v and w and may be written as $D^2 f(c)(v, w)$. The Hessian is the Jacobian of the function $Df : U \to \mathbb{R}^N$.
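A numeric sketch of the Hessian as a bilinear form (not from the notes; f is a made-up example): $v^T D^2 f(c)\, w$ should match a finite-difference approximation of the rate of change in direction w of the directional derivative in direction v.

```python
# Sketch (not from the notes): for the made-up function
# f(x1, x2) = x1**3 * x2 + x2**2, D2f(c) is computed by hand and
# v^T D2f(c) w is compared with a mixed finite difference.
import numpy as np

def f(x):
    return x[0]**3 * x[1] + x[1]**2

c = np.array([1.0, 2.0])
# second partials: 6*x1*x2, 3*x1**2 (mixed, equal both ways), 2
H = np.array([[6 * c[0] * c[1], 3 * c[0]**2],
              [3 * c[0]**2, 2.0]])

v = np.array([1.0, -1.0])
w = np.array([0.5, 2.0])
h = 1e-4
num = (f(c + h*v + h*w) - f(c + h*v) - f(c + h*w) + f(c)) / h**2
print(np.isclose(num, v @ H @ w, atol=1e-2))  # True
```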
$$D^3 f(c)(v, w, u) = \sum_{n=1}^N \sum_{m=1}^N \sum_{k=1}^N v_n w_m u_k \frac{\partial^3 f(c)}{\partial x_k \partial x_m \partial x_n}.$$
$$D^r f(c)(v_1, \ldots, v_r) = \sum_{n_1=1}^N \cdots \sum_{n_r=1}^N v_{1 n_1} \cdots v_{r n_r} \frac{\partial^r f(c)}{\partial x_{n_r} \cdots \partial x_{n_1}}.$$
Remarks:
1. When we think of $Df(x)$ as the gradient of f, we write it as the vector
$$\left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_N}(x)\right).$$
When we take the derivative of $Df(x)$, we think of $Df$ as a function from U to $\mathbb{R}^N$, and so write $Df(x)$ as a column vector
$$\begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_N} \end{pmatrix}.$$
The matrix representation of the derivative of this function is the Hessian matrix.
2. If the functions $\frac{\partial^2 f(x)}{\partial x_m \partial x_n}$ are continuous with respect to x, then the Hessian matrix is symmetric, so that $D^2 f(c)$ is a symmetric bilinear form.
Let $q = a + t(b - a)$ and substitute into the above equation. The theorem then follows.
Proof: $D^2 f(x)$ is negative definite if and only if the leading principal minors of $D^2 f(x)$ have determinants that alternate in sign, with the first being negative. These determinants are continuous functions of x. Therefore, if $D^2 f(c)$ is negative definite, $D^2 f(x)$ is negative definite for x close enough to c. That is, there is $\varepsilon > 0$ such that $D^2 f(x)$ is negative definite if $\|x - c\| < \varepsilon$. Suppose that $0 < \|x - c\| < \varepsilon$. By Taylor's theorem, there is a vector q on the line segment from c to x such that
$$f(x) = f(c) + Df(c)(x - c) + \frac{1}{2} D^2 f(q)(x - c, x - c).$$
Since $\|q - c\| \le \|x - c\| < \varepsilon$, $D^2 f(q)$ is negative definite and so $D^2 f(q)(x - c, x - c) < 0$. By assumption, $Df(c) = 0$. Therefore $f(x) < f(c)$.
Remark: Similarly, if $Df(c) = 0$ and $D^2 f(c)$ is positive definite, then f has a local minimum at c.
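A small sketch of the leading-principal-minor test used in the proof (not from the notes; the function is a made-up example):

```python
# Sketch (not from the notes): for f(x1, x2) = -x1**2 - 2*x2**2 the gradient
# vanishes at c = (0, 0) and the Hessian there, computed by hand, is below.
import numpy as np

hessian = np.array([[-2.0, 0.0],
                    [0.0, -4.0]])   # D^2 f(c) for this f

m1 = np.linalg.det(hessian[:1, :1])  # first leading principal minor: -2
m2 = np.linalg.det(hessian)          # second leading principal minor: 8
print(m1 < 0 and m2 > 0)  # True: signs alternate starting negative, so c is a local max
```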
This theorem says that f behaves locally like its derivative in that if the derivative
has an inverse at c, then f has an inverse on an open set containing c.
I now introduce the implicit function theorem by explaining what it says about linear functions. Let $T : \mathbb{R}^{N+K} \to \mathbb{R}^K$ be a linear function with $K \times (N + K)$ matrix representation C. Suppose that C has rank K. (Since the maximum rank of C is K, we may say that C has full rank.) Hence, C has K independent columns, so that $T(\mathbb{R}^{N+K}) = \mathbb{R}^K$ and T is onto. We may assume, without loss of generality, that the last K columns of C are independent. Write C as $C = (A \mid B)$, where A is a $K \times N$ matrix and B is a non-singular (that is, invertible) $K \times K$ matrix. Write a vector in $\mathbb{R}^{N+K}$ as $(x, y)$, where $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^K$. The equation $z = T(x, y)$ may be written as
$$z = C\begin{pmatrix} x \\ y \end{pmatrix} = (A \mid B)\begin{pmatrix} x \\ y \end{pmatrix} = Ax + By,$$
so that $By = z - Ax$. Solving for y, we obtain $y = B^{-1}(z - Ax)$. Let $H : \mathbb{R}^N \times \mathbb{R}^K \to \mathbb{R}^K$ be the linear transformation $H(x, z) = B^{-1}(z - Ax)$. Then $T(x, H(x, z)) = z$, since
$$T(x, H(x, z)) = (A \mid B)\begin{pmatrix} x \\ B^{-1}(z - Ax) \end{pmatrix} = Ax + BB^{-1}(z - Ax) = Ax + I(z - Ax) = Ax + z - Ax = z.$$
For each $z \in \mathbb{R}^K$, the set $T^{-1}(z)$ is the graph of $H(x, z)$ as a function of x with z fixed. The space $\mathbb{R}^{N+K}$ is thereby expressed as $\mathbb{R}^N \times \mathbb{R}^K$, and T is the projection of $\mathbb{R}^{N+K}$ onto $\mathbb{R}^K$ followed by an invertible linear transformation, with matrix representation B, from $\mathbb{R}^K$ to $\mathbb{R}^K$. In other words, $(x, y) \in \mathbb{R}^{N+K}$ equals $(x, H(x, z)) = (x, B^{-1}(z - Ax))$, for some $z \in \mathbb{R}^K$, and the point $(x, B^{-1}(z - Ax))$ projects onto $(0, B^{-1}z)$, which in turn is carried to $BB^{-1}z = z$. The following graphs may help you visualize this.
The implicit function theorem says that the same assertion applies locally to a differentiable function.
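The linear case can be checked directly with numpy (a sketch, not from the notes; the matrices and vectors are made-up numbers):

```python
# Sketch (not from the notes): the linear case of the implicit function
# theorem. With C = (A | B) and B invertible, H(x, z) = B^{-1}(z - Ax)
# satisfies T(x, H(x, z)) = Ax + B H(x, z) = z.
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])   # K x N block (N = K = 2 here)
B = np.array([[2.0, 0.0],
              [1.0, 1.0]])   # invertible K x K block

def H(x, z):
    return np.linalg.solve(B, z - A @ x)

x = np.array([1.0, -1.0])
z = np.array([3.0, 0.5])
y = H(x, z)
print(np.allclose(A @ x + B @ y, z))  # True: T(x, H(x, z)) = z
```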
of $Df(a, b)$ are linearly independent. Let x vary over $\mathbb{R}^N$ and y and z vary over $\mathbb{R}^K$. Then there is an open set U in $\mathbb{R}^N$ such that $a \in U$, there is an open set V in $\mathbb{R}^K$ such that $c = f(a, b) \in V$, and there is a $C^1$ function $h : U \times V \to \mathbb{R}^K$ such that $f(x, h(x, z)) = z$, for all $x \in U$ and $z \in V$, and $h(a, c) = b$. Also
$$D_x h(x, z) = -\left[D_y f(x, h(x, z))\right]^{-1} D_x f(x, h(x, z)).$$
MATH CAMP: Lecture 10
$$\max_{x \in \mathbb{R}^N} f(x, b) \quad \text{s.t. } (x, b) \in W,$$
where by $D_b f(h(b), b)$ I mean the derivative of $f(h(b), b)$ with respect to b with the first argument held fixed at $h(b)$. The equation $D[f(h(b), b)] = D_b f(h(b), b)$ is known as the envelope theorem.
Similarly, $\frac{\partial \pi}{\partial w}(p, w) = -L(p, w) = \frac{\partial}{\partial w}[p f(L) - wL]$ with L held fixed at $L = L(p, w)$.
Constrained Optimization
Let U be an open subset of $\mathbb{R}^N$ and let $f : U \to \mathbb{R}$ and, for $k = 1, \ldots, K$, $g_k : U \to \mathbb{R}$. Consider the problem:
$$\max_{x \in U} f(x) \quad \text{s.t. } g_k(x) = a_k, \text{ for } k = 1, \ldots, K.$$
Remark: The requirement that the vectors $Dg_1(x), \ldots, Dg_K(x)$ be linearly independent is called the constraint qualification.
Example: This example shows that the constraint qualification is necessary for the conclusion of the theorem. Let $N = 2$ and $f(x_1, x_2) = x_2$. Let $g_1(x_1, x_2) = x_1$ and $g_2(x_1, x_2) = x_1 - x_2^2$. Let $a_1 = a_2 = 0$. The only point satisfying the constraints $g_1(x_1, x_2) = 0 = g_2(x_1, x_2)$ is $(x_1, x_2) = (0, 0)$. This point therefore maximizes $f(x_1, x_2)$ subject to the constraints $g_1(x_1, x_2) = g_2(x_1, x_2) = 0$. Since $Df(0, 0) = (0, 1)$ and $Dg_1(0, 0) = Dg_2(0, 0) = (1, 0)$, it is clear that $Df(0, 0)$ is not a linear combination of $Dg_1(0, 0)$ and $Dg_2(0, 0)$.
Since $D_z g(y, z)$ is invertible, the implicit function theorem implies that there is an open set W in $\mathbb{R}^{N-K}$ such that $y \in W$ and there exists a $C^1$ function $h : W \to \mathbb{R}^K$ such that $h(y) = z$ and $g(y, h(y)) = a$, for all $y \in W$. Also
$$Dh(y) = -(D_z g(y, z))^{-1} D_y g(y, z),$$
so that
$$D_y f(y, z) = \lambda^T D_y g(y, z).$$
The equations $D_z f(y, z) = \lambda^T D_z g(y, z)$ and $D_y f(y, z) = \lambda^T D_y g(y, z)$ imply that $Df(x) = \lambda^T Dg(x)$, which is the same as $Df(x) = \sum_{k=1}^K \lambda_k Dg_k(x)$.
$Df(\bar x)$ must lie in the same line as $Dg(\bar x)$, where $\bar x$ is the constrained maximum, the constraint being $g(x) = 0$. This is so because the curve $g(x) = 0$ is tangent at $\bar x$ to the curve $f(x) = f(\bar x)$; $Dg(\bar x)$ is perpendicular at $\bar x$ to the curve $g(x) = 0$; and $Df(\bar x)$ is perpendicular at $\bar x$ to the curve $f(x) = f(\bar x)$.
The conditions
$$Df(\bar x) = \sum_{k=1}^K \lambda_k Dg_k(\bar x)$$
are called first-order conditions. The conditions $g_k(\bar x) = a_k$, for $k = 1, \ldots, K$, are called the constraints.
The function
$$L(x, \lambda, a) = f(x) - \sum_{k=1}^K \lambda_k [g_k(x) - a_k]$$
is called the Lagrangian. The conditions above may be written as
$$D_x L(x, \lambda, a) = 0$$
and
$$g_k(x_1, \ldots, x_N) = a_k, \text{ for } k = 1, \ldots, K,$$
which are the first order conditions and constraints, respectively.
The first order conditions are necessary conditions for optimality. I now state a theorem giving conditions that, together with the constraints and first order conditions, are sufficient for local optimality.
a) If $v^T D_x^2 L(\bar x, \lambda, a)v < 0$, for all $v \in Z(\bar x)$ such that $v \ne 0$, then f achieves a local maximum at $\bar x$ on $\{x \in \mathbb{R}^N \mid g(x) = a\}$.
b) If $v^T D_x^2 L(\bar x, \lambda, a)v > 0$, for all $v \in Z(\bar x)$ such that $v \ne 0$, then f achieves a local minimum at $\bar x$ on $\{x \in \mathbb{R}^N \mid g(x) = a\}$.
MATH CAMP: Lecture 11
I give an informal proof of the theorem giving second order conditions for a local optimum. Recall that $\bar x \in U$ is the optimum and $Z(\bar x) = \{v \in \mathbb{R}^N \mid Dg_k(\bar x)(v) = 0$, for $k = 1, \ldots, K\}$. The second order condition for a constrained local maximum is that $v^T D_x^2 L(\bar x, \lambda, a)v < 0$, for all $v \in Z(\bar x)$ such that $v \ne 0$. Let us accept that $v \in Z(\bar x)$, where $v \ne 0$, if and only if there is a twice differentiable function $\gamma : (-1, 1) \to U$ such that $\gamma(0) = \bar x$, $g_k(\gamma(t)) = a_k$, for $k = 1, \ldots, K$ and for all t, and $D\gamma(0) = v$.
Because $g_k(\gamma(t)) = a_k$, for all k and t, it follows that
$$Dg_k(\gamma(t))\, D\gamma(t) = 0,$$
for $k = 1, \ldots, K$. At $t = 0$, this equation becomes
$$Dg_k(\bar x)\, v = 0,$$
for $k = 1, \ldots, K$. Let us take the derivative with respect to t in the equation $Dg_k(\gamma(t))\,D\gamma(t) = 0$ and apply the chain rule and Leibniz's rule. Then,
$$D\gamma(t)^T D^2 g_k(\gamma(t))\, D\gamma(t) + Dg_k(\gamma(t))\, D^2\gamma(t) = 0,$$
where
$$D^2\gamma(t) = \begin{pmatrix} \frac{d^2}{dt^2}\gamma_1(t) \\ \vdots \\ \frac{d^2}{dt^2}\gamma_N(t) \end{pmatrix}.$$
We also know that
$$\frac{df(\gamma(t))}{dt} = Df(\gamma(t))\, D\gamma(t).$$
By the first order conditions for a constrained local optimum,
$$Df(\bar x) = \sum_{k=1}^K \lambda_k Dg_k(\bar x).$$
Sufficient conditions for a local maximum at $t = 0$ along the curve $\gamma(t)$ are that
$$\frac{df(\gamma(0))}{dt} = 0 \quad \text{and} \quad \frac{d^2 f(\gamma(0))}{dt^2} < 0.$$
However,
$$\frac{df(\gamma(0))}{dt} = Df(\gamma(0))\, D\gamma(0) = Df(\bar x)\, v = \sum_{k=1}^K \lambda_k Dg_k(\bar x)\, v = 0$$
and
$$\frac{d^2 f(\gamma(0))}{dt^2} = D\gamma(0)^T D^2 f(\gamma(0))\, D\gamma(0) + Df(\gamma(0))\, D^2\gamma(0)$$
$$= v^T D^2 f(\bar x)\, v + \sum_{k=1}^K \lambda_k Dg_k(\bar x)\, D^2\gamma(0)$$
$$= v^T D^2 f(\bar x)\, v - \sum_{k=1}^K \lambda_k v^T D^2 g_k(\bar x)\, v$$
$$= v^T \left[D^2 f(\bar x) - \sum_{k=1}^K \lambda_k D^2 g_k(\bar x)\right] v = v^T D_x^2 L(\bar x, \lambda, a)\, v.$$
Therefore, the inequality $v^T D_x^2 L(\bar x, \lambda, a)v < 0$ implies that $f(\gamma(t))$ has a local maximum at the point $t = 0$.
Let us assume that we know that f has a constrained local maximum at $\bar x$ if it has a local maximum along every curve $\gamma$ through $\bar x$ satisfying $g(\gamma(t)) = a$, for all t. This assertion would require proof, of course, in a rigorous argument. Despite the lack of rigor, I hope the preceding argument gives a good idea of why the theorem is true.
Remarks:
1. The condition $g(\bar x) = a$ is the constraint, where $g(x) = (g_1(x), \ldots, g_K(x))$.
2. The condition $Df(\bar x) = \lambda^T Dg(\bar x)$ is a first order condition for local optimality.
3. The constraint and first order conditions are necessary for local optimality.
4. The condition $v^T D_x^2 L(\bar x, \lambda, a)v < 0$, for all non-zero $v \in Z(\bar x)$, is a second order condition for a local maximum.
5. The condition $v^T D_x^2 L(\bar x, \lambda, a)v > 0$, for all non-zero $v \in Z(\bar x)$, is a second order condition for a local minimum.
6. The constraint, the first order condition and the second order condition together are sufficient for local optimality.
I next use the implicit function theorem to show that the optimal value of x and the value of the corresponding vector of Lagrange multipliers may be written as functions of the K-vector a. We begin by writing down the necessary conditions for a constrained optimum,
$$Df(x) - \lambda^T Dg(x) = 0$$
$$-g(x) + a = 0.$$
These may be written as the equation $F(x, \lambda, a) = 0$, where
$$F(x, \lambda, a) = \begin{pmatrix} Df(x) - \lambda^T Dg(x) \\ -g(x) + a \end{pmatrix}.$$
I apply the implicit function theorem to F to show that x and $\lambda$ are functions of a. In order to do so, I must show that $D_{(x,\lambda)} F(x, \lambda, a)$ has rank $N + K$. If we calculate this derivative, we see that
$$D_{(x,\lambda)} F(x, \lambda, a) = \begin{pmatrix} D^2 f(x) - \sum_{k=1}^K \lambda_k D^2 g_k(x) & -Dg(x)^T \\ -Dg(x) & 0 \end{pmatrix} = \begin{pmatrix} D_x^2 L(x, \lambda, a) & -Dg(x)^T \\ -Dg(x) & 0 \end{pmatrix}.$$
This matrix is called the bordered Hessian. It is the Jacobian of $F(x, \lambda, a)$ considered as a function of x and $\lambda$.
Assume the constraint qualification that $Dg(x)$ has rank K, for all x, and suppose that f achieves a local maximum at $\bar x$ on $\{x \in \mathbb{R}^N \mid g(x) = a\}$ and that $\bar x$ satisfies the second order condition for a local maximum. Let $\bar\lambda$ be the corresponding vector of Lagrange multipliers. Then, $v^T D_x^2 L(\bar x, \bar\lambda)v < 0$, if $Dg(\bar x)v = 0$, for any non-zero N-vector v. I show that under these conditions the bordered Hessian is non-singular, i.e., that it has rank $N + K$. I must show that if
$$\begin{pmatrix} D_x^2 L & -(Dg)^T \\ -Dg & 0 \end{pmatrix}\begin{pmatrix} v \\ w \end{pmatrix} = 0, \text{ then } \begin{pmatrix} v \\ w \end{pmatrix} = 0,$$
where $v \in \mathbb{R}^N$ and $w \in \mathbb{R}^K$. Since
$$\begin{pmatrix} D_x^2 L & -(Dg)^T \\ -Dg & 0 \end{pmatrix}\begin{pmatrix} v \\ w \end{pmatrix} = 0, \qquad (\star)$$
we have
$$0 = (v^T, w^T)\begin{pmatrix} D_x^2 L & -(Dg)^T \\ -Dg & 0 \end{pmatrix}\begin{pmatrix} v \\ w \end{pmatrix} = v^T(D_x^2 L)v - v^T(Dg)^T w - w^T(Dg)v = v^T(D_x^2 L)v - 2w^T(Dg)v = v^T(D_x^2 L)v.$$
The last equation holds because $Dg(\bar x)v = 0$. The second order conditions satisfied at $\bar x$ imply that $v^T D_x^2 L(\bar x, \bar\lambda)v < 0$, unless $v = 0$. Therefore, $v = 0$. Equation $(\star)$ implies that $(D_x^2 L)v - (Dg)^T w = 0$. Hence, $(Dg)^T w = 0$, so that $w^T Dg(\bar x) = 0$. Since $Dg(\bar x)$ has rank K, the K rows of $Dg(\bar x)$ are independent and so $w = 0$. Since I have shown that $(v, w) = 0$, it follows that the bordered Hessian has rank $N + K$.
By the implicit function theorem, there exists an open set V in $\mathbb{R}^K$ such that $\bar a \in V$ and there exist continuously differentiable functions $x(a)$ and $\lambda(a)$ such that $x(\bar a) = \bar x$, $\lambda(\bar a) = \bar\lambda$, $Df(x(a)) - \lambda(a)^T Dg(x(a)) = 0$, and $-g(x(a)) + a = 0$, for all $a \in V$.
Sufficient conditions for a local maximum at x are that $g(x) = a$, $Df(x) = \lambda^T Dg(x)$, and $v^T D_x^2 L(x, \lambda)v < 0$, for all $v \in \mathbb{R}^N$ such that $v \ne 0$ and $Dg(x)(v) = 0$. These conditions hold at $(\bar x, \bar\lambda)$. Therefore they hold for $(x, \lambda)$ close enough to $(\bar x, \bar\lambda)$ and such that $g(x) = a$, for some a, since $Dg(x)$ and $D_x^2 L(x, \lambda)$ depend continuously on x. Since $x(a)$ and $\lambda(a)$ are continuous functions, these conditions hold at $(x(a), \lambda(a))$, if a is close enough to $\bar a$. Therefore, we can assume that they hold for $a \in V$, by making V smaller, if necessary. Hence, we may assume that $x(a)$ is a local maximum, for all $a \in V$.
Observe that
$$g(x(a)) = a \implies D_x g(x(a))\, Dx(a) = I,$$
and so
$$D_a f(x(a)) = D_x f(x(a))\, Dx(a) = \lambda(a)^T D_x g(x(a))\, Dx(a) = \lambda(a)^T.$$
This shows that $\lambda_k(a)$ is the marginal value of increasing $a_k$, for all k.
The Envelope Theorem for Constrained Optimization: Consider the problem
$$\max_x f(x, b) \quad \text{s.t. } g(x, c) = a.$$
The constraints and first order conditions for arbitrary parameter values a, b, c are
$$D_x f(x, b) - \lambda^T D_x g(x, c) = 0$$
$$-g(x, c) + a = 0.$$
Let
$$F(x, \lambda, a, b, c) = \begin{pmatrix} D_x f(x, b) - \lambda^T D_x g(x, c) \\ -g(x, c) + a \end{pmatrix}.$$
We know that $D_{(x,\lambda)} F(\bar x, \bar\lambda, \bar a, \bar b, \bar c)$ is invertible, so that by the implicit function theorem applied to the equation $F(x, \lambda, a, b, c) = 0$ there exist locally defined $C^1$ functions $x(a, b, c)$ and $\lambda(a, b, c)$ such that $x(a, b, c)$ is a local maximum with corresponding Lagrange multipliers $\lambda(a, b, c) \in \mathbb{R}^K$. Also $x(\bar a, \bar b, \bar c) = \bar x$ and $\lambda(\bar a, \bar b, \bar c) = \bar\lambda$.
I will show that the derivative of the optimal value $f(x(a, b, c), b)$ with respect to each parameter equals the corresponding derivative of the Lagrangian with x and $\lambda$ held fixed at their optimal values. First, differentiating the identity
$$g(x(a, b, c), c) = a$$
with respect to a gives $D_x g(\bar x, \bar c)\, D_a x(\bar a, \bar b, \bar c) = I$, so that
$$D_a f(x(a, b, c), b)\big|_{a = \bar a} = D_x f(\bar x, \bar b)\, D_a x(\bar a, \bar b, \bar c) = \bar\lambda^T D_x g(\bar x, \bar c)\, D_a x(\bar a, \bar b, \bar c) = \bar\lambda^T.$$
Second, differentiating the same identity with respect to b gives $D_x g(\bar x, \bar c)\, D_b x(\bar a, \bar b, \bar c) = 0$, so that the indirect effect of b on the optimal value vanishes and only the direct effect $D_b f(\bar x, \bar b)$ remains. Third, differentiating the identity with respect to c gives
$$D_x g(\bar x, \bar c)\, D_c x(\bar a, \bar b, \bar c) = -D_c g(\bar x, \bar c),$$
so that
$$D_c f(x(a, b, c), b)\big|_{c = \bar c} = D_x f(\bar x, \bar b)\, D_c x(\bar a, \bar b, \bar c) = \bar\lambda^T D_x g(\bar x, \bar c)\, D_c x(\bar a, \bar b, \bar c) = -\bar\lambda^T D_c g(\bar x, \bar c).$$
These three steps together imply the envelope theorem for constrained maximization. The same equation applies if max is replaced by min in the problem above.
Example:
$$\max_{\substack{x_n > 0 \\ n = 1, \ldots, N}} [\alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N]$$
$$\text{s.t. } p_1 x_1 + \cdots + p_N x_N = w,$$
$$L(x_1, \ldots, x_N, \lambda) = \alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N - \lambda(p_1 x_1 + \cdots + p_N x_N - w).$$
The first order conditions are
$$\frac{\alpha_n}{x_n} - \lambda p_n = 0, \text{ for all } n.$$
Hence,
$$p_n x_n = \lambda^{-1}\alpha_n, \text{ for all } n.$$
$$\sum_{n=1}^N \lambda^{-1}\alpha_n = \sum_{n=1}^N p_n x_n = w.$$
$$\lambda^{-1} = \frac{w}{\sum_{k=1}^N \alpha_k}.$$
$$p_n x_n = \frac{\alpha_n}{\sum_{k=1}^N \alpha_k}\, w, \text{ for all } n.$$
$$x_n = \frac{\alpha_n}{\sum_{k=1}^N \alpha_k}\, \frac{w}{p_n}, \text{ for all } n.$$
$$\lambda = \frac{\sum_{k=1}^N \alpha_k}{w}.$$
$\lambda$ is known as the marginal utility of wealth.
The utility function
$$u(x_1, \ldots, x_N) = \alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N$$
and the utility function
$$v(x_1, \ldots, x_N) = x_1^{\alpha_1} \cdots x_N^{\alpha_N}$$
are known as Cobb–Douglas utility functions. u and v give rise to the same demand functions. A fraction $\alpha_n / \sum_{k=1}^N \alpha_k$ of wealth is spent on commodity n. These fractions add up to one.
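A quick sketch of the demand formula and the budget identity (not from the notes; the weights, prices, and wealth are made up for illustration):

```python
# Sketch (not from the notes): Cobb-Douglas demand
# x_n = (a_n / sum_k a_k) * (w / p_n), which exhausts the budget.
alpha = [1.0, 2.0, 1.0]   # made-up utility weights
p = [2.0, 1.0, 4.0]       # made-up prices
w = 60.0                  # made-up wealth

s = sum(alpha)
x = [a / s * w / pn for a, pn in zip(alpha, p)]

spending = sum(pn * xn for pn, xn in zip(p, x))
print(x, spending)  # spending equals w = 60; fraction a_n/s of wealth per good
```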
If N = 2, we have the following picture.
Let $F(p_1, \ldots, p_N, w, \alpha_1, \ldots, \alpha_N)$ denote the maximum value of the problem
$$\max [\alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N]$$
$$\text{s.t. } p_1 x_1 + p_2 x_2 + \cdots + p_N x_N - w = 0,$$
and let
$$L(x_1, \ldots, x_N, \lambda; p_1, \ldots, p_N, w, \alpha_1, \ldots, \alpha_N) = \alpha_1 \ln x_1 + \cdots + \alpha_N \ln x_N - \lambda(p_1 x_1 + \cdots + p_N x_N - w).$$
By the envelope theorem,
$$\frac{\partial F}{\partial w} = \frac{\partial L}{\partial w} = \lambda,$$
$$\frac{\partial F}{\partial p_n} = -\lambda x_n,$$
$$\frac{\partial F}{\partial \alpha_n} = \ln x_n.$$
Convex Analysis: So far, we have had necessary or sufficient conditions for local maxima or minima. Now we turn to conditions for global maxima or minima. First, we need some definitions.
MATH CAMP: Lecture 12
Therefore,
$$\frac{d}{d\lambda} f(a + \lambda(b - a))\Big|_{\lambda = 0} \ge f(b) - f(a) > 0.$$
However,
$$\frac{d}{d\lambda} f(a + \lambda(b - a))\Big|_{\lambda = 0} = Df(a)(b - a) = 0.$$
This contradiction proves that there is no $b \in A$ such that $f(b) > f(a)$.
Remark: $f : A \to \mathbb{R}$ is convex if and only if $-f$ is concave.
Proof: I prove only (1). Let $a \in A$ and $b \in A$, where $a < b$. I first prove (1) for $N = 1$. Since $\frac{d^2}{dx^2}f(x) \le 0$, $\frac{df(x)}{dx}$ is a non-increasing function. That is, if $x < y$, then $\frac{df(x)}{dx} \ge \frac{df(y)}{dx}$.
If f is not concave, then there is a $c = \alpha a + (1 - \alpha)b$, where $0 < \alpha < 1$, such that $f(c) < \alpha f(a) + (1 - \alpha)f(b)$. By the mean value theorem, there is a $c_1$ such that $a < c_1 < c$ and
$$\frac{df(c_1)}{dx} = \frac{f(c) - f(a)}{c - a} = \frac{f(c) - f(a)}{\alpha a + (1 - \alpha)b - a} < \frac{\alpha f(a) + (1 - \alpha)f(b) - f(a)}{(1 - \alpha)(b - a)} = \frac{(1 - \alpha)[f(b) - f(a)]}{(1 - \alpha)(b - a)} = \frac{f(b) - f(a)}{b - a}.$$
Similarly, there is a $c_2$ such that $c < c_2 < b$ and
$$\frac{df(c_2)}{dx} = \frac{f(b) - f(c)}{b - c} = \frac{f(b) - f(c)}{\alpha(b - a)} > \frac{f(b) - \alpha f(a) - (1 - \alpha)f(b)}{\alpha(b - a)} = \frac{\alpha[f(b) - f(a)]}{\alpha(b - a)} = \frac{f(b) - f(a)}{b - a}.$$
Therefore,
$$\frac{df(c_2)}{dx} > \frac{f(b) - f(a)}{b - a} > \frac{df(c_1)}{dx},$$
which is impossible since $c_2 > c_1$. This contradiction proves the theorem for the case $N = 1$.
The following diagram illustrates the argument:
Now, consider the case $N > 1$. Let
$$g(\alpha) = f(\alpha a + (1 - \alpha)b) = f(b + \alpha(a - b)).$$
Then
$$\frac{dg(\alpha)}{d\alpha} = Df(\alpha a + (1 - \alpha)b)(a - b)$$
and
$$\frac{d^2 g(\alpha)}{d\alpha^2} = (a - b)^T D^2 f(\alpha a + (1 - \alpha)b)(a - b) \le 0,$$
since $D^2 f(x)$ is negative semi-definite, for all x. Therefore, by what has been proved for the case $N = 1$, g is concave, and the concavity of f follows.
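A numeric sketch of the conclusion (not from the notes; f is a made-up concave function): a function with negative semi-definite Hessian lies above its chords.

```python
# Sketch (not from the notes): f(x1, x2) = -x1**2 - x2**2 + x1*x2 has
# Hessian [[-2, 1], [1, -2]], which is negative definite, so
# f(t*a + (1-t)*b) >= t*f(a) + (1-t)*f(b) for t in [0, 1].
def f(x1, x2):
    return -x1**2 - x2**2 + x1 * x2

a, b = (1.0, 2.0), (-3.0, 0.5)   # made-up endpoints
ok = True
for i in range(11):
    t = i / 10
    mid = f(t * a[0] + (1 - t) * b[0], t * a[1] + (1 - t) * b[1])
    chord = t * f(*a) + (1 - t) * f(*b)
    ok = ok and mid >= chord - 1e-12
print(ok)  # True: the graph lies above every chord
```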
Definition: $x \in \mathbb{R}^N$ is said to be feasible if $x \in C$ and $g_k(x) \le a_k$, for all k.
Suppose that $\bar x$ is feasible, that $\lambda \in \mathbb{R}_+^K$ is such that, for $k = 1, \ldots, K$, $\lambda_k = 0$ if $g_k(\bar x) < a_k$, and that $\bar x$ solves the problem $\max_{x \in C}\left[f(x) - \sum_{k=1}^K \lambda_k g_k(x)\right]$. Then $\bar x$ solves problem $(\ast)$.
Suppose that $\bar x$ solves problem $(\ast)$ and that the following constraint qualification is satisfied: there exists an $\hat x \in C$ such that $g_k(\hat x) < a_k$, for all k. Then there exists $\lambda \in \mathbb{R}_+^K$ such that, for all k, $\lambda_k = 0$ if $g_k(\bar x) < a_k$, and $\bar x$ solves the problem $\max_{x \in C}\left[f(x) - \sum_{k=1}^K \lambda_k g_k(x)\right]$.
The conditions that $\lambda_k = 0$ if $g_k(\bar x) < a_k$, for all k, are called the complementary slackness conditions.
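A tiny worked instance of the sufficiency direction (a sketch, not from the notes; the numbers are made up):

```python
# Sketch (not from the notes): maximize f(x) = -(x - 2)**2 subject to
# g(x) = x <= 1, with C = R. At the optimum x = 1 the constraint binds and
# the multiplier is lam = 2 >= 0, so complementary slackness holds and
# x = 1 maximizes the unconstrained function f(x) - lam * g(x).
lam = 2.0

def lagrangian(x):
    # f(x) - lam*g(x) = -(x - 2)**2 - 2x; derivative -2x + 2 vanishes at x = 1
    return -(x - 2.0)**2 - lam * x

xs = [i / 100 for i in range(-300, 300)]
best = max(xs, key=lagrangian)
print(abs(best - 1.0) < 0.01)  # True: the Lagrangian peaks at the constrained optimum
```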
Notice that the problem
$$\max_{x \in C}\left[f(x) - \sum_{k=1}^K \lambda_k g_k(x)\right]$$
is unconstrained except to the extent that x belongs to C. In this sense, the Kuhn–Tucker theorem converts a constrained maximization problem to an unconstrained one.
The function
$$L(x, \lambda) = f(x) - \sum_{k=1}^K \lambda_k g_k(x) \quad \text{or} \quad L(x, \lambda, a) = f(x) - \sum_{k=1}^K \lambda_k [g_k(x) - a_k]$$
is called the Lagrangian.
$$\max_{x \in \mathbb{R}_+^N} u(x_1, \ldots, x_N) \quad \text{s.t. } p_1 x_1 + \cdots + p_N x_N \le w.$$
Since u is increasing, the problem of maximizing $u(x) - \lambda p \cdot x$ has no solution unless $\lambda > 0$, and so $\lambda > 0$. The complementary slackness condition therefore implies that $p \cdot x = w$.
The units of $\lambda$ are utiles per dollar. The units of $u(x) - \lambda p \cdot x$ are utiles. $u(x) - \lambda p \cdot x$ is consumer's surplus measured in utiles.
In general, if the maximization problem is
\[
\max_{x \in C} f(x) \quad \text{s.t. } g_k(x) \le a_k, \text{ for } k = 1, \ldots, K,
\]
then f(x) − Σ_{k=1}^K λ_k g_k(x) is a kind of surplus. The quantity λ_k g_k(x) is the cost of the kth constraint.
Example:
\[
\max_{x \in R^2} x_2 \quad \text{s.t. } -x_1 + x_2^2 \le 0 \text{ and } x_1 + x_2^2 \le 0.
\]
This example does not satisfy the constraint qualification, because x = 0 is the only feasible vector. Because 0 is the only feasible point, it is also the optimum. There are no non-negative numbers λ₁ and λ₂ such that x = 0 and λ₁ and λ₂ maximize the Lagrangian, for suppose there were. Then, x = 0 would solve the problem
\[
\max_{x \in R^2}\,[x_2 - \lambda_1(-x_1 + x_2^2) - \lambda_2(x_1 + x_2^2)],
\]
and the derivative of the objective function of this problem would be zero at x = 0. Setting the partial derivative with respect to x₂ equal to zero at x₂ = 0, we find that 1 − 2(λ₁ + λ₂)·0 = 0, which is impossible.
The next figure should make clear why the Kuhn–Tucker theorem does not apply to this example. The region where −x₁ + x₂² ≤ 0 is labeled as g₁(x) ≤ 0, and the region where x₁ + x₂² ≤ 0 is labeled as g₂(x) ≤ 0. The Lagrangian is f(x) − λ₁g₁(x) − λ₂g₂(x). If the Lagrangian is maximized at x = 0, then the derivative of the Lagrangian is zero at x = 0. That is, Df(0) = λ₁Dg₁(0) + λ₂Dg₂(0). The derivative Df(0) points straight upward, and Dg₁(0) and Dg₂(0) are horizontal. Since a vertical vector cannot be a linear combination of horizontal vectors, the Lagrangian is not maximized at the optimal value of x.
The situation changes when the example is modified so as to satisfy the constraint qualification, as in the next example.
\[
\max_{x \in R^2} x_2 \quad \text{s.t. } -x_1 + x_2^2 \le 1 \text{ and } x_1 + x_2^2 \le 1.
\]
Proof of the Sufficiency of the Kuhn–Tucker conditions for optimality: The feasibility of x̄ and the complementary slackness conditions imply that
\[
f(\bar{x}) = f(\bar{x}) - \sum_{k=1}^{K} \lambda_k\,[g_k(\bar{x}) - a_k] = L(\bar{x}, \lambda, a).
\]
If x ∈ C is such that g_k(x) ≤ a_k, for all k, then because λ_k ≥ 0, for all k,
\[
f(x) \le f(x) - \sum_{k=1}^{K} \lambda_k\,[g_k(x) - a_k]
\le f(\bar{x}) - \sum_{k=1}^{K} \lambda_k\,[g_k(\bar{x}) - a_k] = f(\bar{x}),
\]
where the second inequality holds because x̄ maximizes the Lagrangian over C. All these inequalities together imply that f(x) ≤ f(x̄), if x is feasible. That is, x̄ solves problem (∗).
Proof: I show that A is convex. Let a and ā ∈ A and λ ∈ (0, 1). There exist an x and an x̄ in C such that g_k(x) ≤ a_k and g_k(x̄) ≤ ā_k, for all k. Because C is convex, λx + (1−λ)x̄ ∈ C. Because g_k is convex, for all k,
\[
g_k(\lambda x + (1-\lambda)\bar{x}) \le \lambda g_k(x) + (1-\lambda)g_k(\bar{x}) \le \lambda a_k + (1-\lambda)\bar{a}_k, \text{ for all } k.
\]
Therefore λa + (1−λ)ā ∈ A, and so A is convex.
I show that V is concave. Let a and ā belong to A, λ ∈ (0, 1), and let ε > 0. There exist x and x̄ in C such that g_k(x) ≤ a_k and g_k(x̄) ≤ ā_k, for all k, and V(a) − ε < f(x) ≤ V(a) and V(ā) − ε < f(x̄) ≤ V(ā). By the argument made above, λx + (1−λ)x̄ ∈ C and g_k(λx + (1−λ)x̄) ≤ λa_k + (1−λ)ā_k, for all k. Because f is concave,
\[
V(\lambda a + (1-\lambda)\bar{a}) \ge f(\lambda x + (1-\lambda)\bar{x}) \ge \lambda f(x) + (1-\lambda)f(\bar{x})
> \lambda(V(a) - \varepsilon) + (1-\lambda)(V(\bar{a}) - \varepsilon) = \lambda V(a) + (1-\lambda)V(\bar{a}) - \varepsilon.
\]
Since ε is arbitrarily small, V(λa + (1−λ)ā) ≥ λV(a) + (1−λ)V(ā), and so V is concave.
Remarks:
1. If f is differentiable and p is a subgradient of f at ā, then p = Df(ā).
2. If f is differentiable and concave, then Df(ā) is a subgradient of f at ā.
Proof of (2): The function g(a) = f(ā) + Df(ā)(a − ā) − f(a) is convex and Dg(ā) = Df(ā) − Df(ā) = 0. Therefore, g achieves a global minimum at ā. Clearly, g(ā) = 0, so that f(ā) + Df(ā)(a − ā) − f(a) = g(a) ≥ 0, for all a, and hence Df(ā) is a subgradient of f at ā.
3. If f is concave and not differentiable at ā, then f may have several subgradients at ā, as in the next figure.
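Remark 2 is easy to check numerically. The function in the sketch below is my own choice, not from the text: for the concave, differentiable function f(x) = log(1 + x), the tangent line at ā should lie above the graph everywhere on the domain — exactly the subgradient inequality.

```python
import math

# Check of Remark 2 on an example of my own: for the concave, differentiable
# function f(a) = log(1 + a), the gradient at a_bar is a subgradient:
#     f(a_bar) + f'(a_bar) * (a - a_bar) >= f(a)   for all a in the domain.

def f(a):
    return math.log(1 + a)

def df(a):
    return 1 / (1 + a)

a_bar = 1.0
points = [k / 100 for k in range(501)]          # a in [0, 5]
# smallest gap between the tangent line and the function; should be >= 0,
# with equality at a = a_bar
gap = min(f(a_bar) + df(a_bar) * (a - a_bar) - f(a) for a in points)
```

The minimum gap is (numerically) zero, attained at ā itself; it is strictly positive away from ā because f is strictly concave.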
Then, λ is a subgradient of V at ā.
Proof: I must show that if a ∈ A, then V(a) − Σ_{k=1}^K λ_k a_k ≤ V(ā) − Σ_{k=1}^K λ_k ā_k. By the sufficiency of the Kuhn–Tucker conditions for optimality, V(ā) = f(x̄). By assumption, for all k, g_k(x̄) ≤ ā_k and λ_k ≥ 0 and λ_k = 0, if g_k(x̄) < ā_k. It follows that Σ_{k=1}^K λ_k g_k(x̄) = Σ_{k=1}^K λ_k ā_k. Therefore,
\[
V(\bar{a}) - \sum_{k=1}^{K} \lambda_k \bar{a}_k = f(\bar{x}) - \sum_{k=1}^{K} \lambda_k g_k(\bar{x}).
\]
Because x̄ solves problem (∗∗), f(x̄) − Σ_{k=1}^K λ_k g_k(x̄) ≥ f(x) − Σ_{k=1}^K λ_k g_k(x), if x ∈ C. Suppose that x ∈ C is such that g_k(x) ≤ a_k, for all k. Then, f(x) − Σ_{k=1}^K λ_k g_k(x) ≥ f(x) − Σ_{k=1}^K λ_k a_k. Putting all these equations and inequalities together, we see that
\[
V(\bar{a}) - \sum_{k=1}^{K} \lambda_k \bar{a}_k \ge f(x) - \sum_{k=1}^{K} \lambda_k a_k,
\]
if x ∈ C is such that g_k(x) ≤ a_k, for all k. From the definition of V(a), it follows that V(ā) − Σ_{k=1}^K λ_k ā_k ≥ V(a) − Σ_{k=1}^K λ_k a_k.
Remark: The theorem says that the Kuhn–Tucker coefficient λ_k is a marginal value of increasing a_k, just as are the Lagrange multipliers in the differentiable case.
\[
\max_{x \in C} f(x) \quad \text{s.t. } g_k(x) \le a_k, \text{ for } k = 1, \ldots, K.
\]
Proof: First of all, I show that λ_k ≥ 0, for all k. If a and a′ are such that a_k ≥ a′_k, for all k, then V(a) ≥ V(a′), since any x ∈ C that satisfies g_k(x) ≤ a′_k, for all k, also satisfies g_k(x) ≤ a_k, for all k. Without loss of generality, I may assume that k = 1. Let ā be defined by ā₁ = a₁ + 1 and ā_k = a_k, if k ≥ 2. Because λ is a subgradient of V at a, V(ā) ≤ V(a) + λ·(ā − a) = V(a) + λ₁, and V(ā) ≥ V(a), where the last inequality follows because ā_k ≥ a_k, for all k. Hence λ₁ ≥ 0.
I next show that, for all k, λ_k = 0, if g_k(x̄) < a_k. Let ā_k = g_k(x̄), for all k. Because ā_k ≤ a_k, for all k, V(ā) ≤ V(a). Since g(x̄) = ā, V(ā) ≥ f(x̄) = V(a). Therefore, V(ā) = f(x̄) = V(a). Since λ is a subgradient of V at a, V(a) + λ·(ā − a) ≥ V(ā) = V(a), so that λ·(ā − a) ≥ 0. Since, for all k, λ_k ≥ 0 and ā_k − a_k ≤ 0, it follows that λ·(ā − a) ≤ 0, and so λ·(ā − a) = 0. Since λ_k ≥ 0 and ā_k ≤ a_k, for all k, it follows that λ_k = 0, if ā_k < a_k. That is, λ_k = 0, if g_k(x̄) < a_k.
I now show that x̄ solves the problem max_{x∈C}[f(x) − Σ_{k=1}^K λ_k g_k(x)]. Because λ is a subgradient of V at a, V(a′) ≤ V(a) + λ·(a′ − a), for all a′ ∈ A. Because λ_k = 0, if g_k(x̄) < a_k, and g_k(x̄) ≤ a_k, for all k, λ·g(x̄) = λ·a. Let x ∈ C and a′ = g(x). Then x is feasible for the problem
\[
\max_{x' \in C} f(x') \quad \text{s.t. } g(x') \le a',
\]
so that f(x) ≤ V(a′), and hence
\[
f(x) - \lambda \cdot g(x) = f(x) - \lambda \cdot a' \le V(a') - \lambda \cdot a' \le V(a) - \lambda \cdot a = f(\bar{x}) - \sum_{k=1}^{K} \lambda_k g_k(\bar{x}).
\]
Since x was an arbitrary element of C, x̄ solves the problem max_{x∈C}[f(x) − Σ_{k=1}^K λ_k g_k(x)].
Definition: If X is a set of N-vectors, the interior of X, written as int X, is the set of x ∈ X such that, for some ε > 0, X contains every N-vector within distance ε of x.
The Minkowski separation theorem gives conditions under which two sets of N-vectors may be separated by a non-zero N-vector.
In the next figure, the 2-vector p separates the sets X and Y. The dotted line H is a hyperplane perpendicular to p that comes between X and Y. A hyperplane in N-space is a set of the form a + W, where a is an N-vector and W is a subspace of dimension N − 1. If p separates X and Y, there is a hyperplane that comes between X and Y. For this reason, the Minkowski separation theorem is often referred to as the theorem of the separating hyperplane.
The next figure illustrates why the sets X and Y are assumed to be convex in the statement of the theorem. The set X is not convex, has non-empty interior, and clearly cannot be separated from Y.
The next figure illustrates why it is assumed in the theorem that one of the sets
to be separated has non-empty interior. Both X and Y are convex, but both have
empty interior, so that the interior of one set does not intersect the other, though
the sets themselves intersect. The sets clearly cannot be separated.
The Necessity of the Kuhn–Tucker Conditions: I now prove that if the constraint qualification applies at a and if x̄ solves the maximization problem
\[
\max_{x \in C} f(x) \quad \text{s.t. } g_k(x) \le a_k, \text{ for } k = 1, \ldots, K,
\]
then there exists a K-vector λ ≥ 0 such that x̄ and λ satisfy the Kuhn–Tucker conditions.
Proof: By an earlier theorem, it is sufficient to prove that the value function V has a subgradient λ = (λ₁, ..., λ_K) at a.
Let G = {(a′, t) | a′ ∈ A, t ≤ V(a′)} and let Q = {(a′, t) | a′ ∈ R^K, a′ ≤ a, and t ≥ V(a)}. The set G is the set of all points on or below the graph of V. This set is convex, since the function V is concave and the set A is convex. It is clear from the definition of Q that it is convex and has non-empty interior. Because V is non-decreasing, the set G does not intersect the interior of Q. The next figure should help you visualize these sets.
It follows from the Minkowski separation theorem that there exists a non-zero (K + 1)-vector v = (v₁, ..., v_K, s) such that
\[
v \cdot (a', t') \le v \cdot (a'', t''), \text{ for all } (a', t') \in Q \text{ and } (a'', t'') \in G. \tag{1}
\]
I show that v_k ≥ 0, for all k. Without loss of generality, I may let k = 1. Let ā = a + (1, 0, ..., 0). Clearly ā belongs to A. Because V is non-decreasing, V(ā) ≥ V(a), so that (ā, V(a)) belongs to G. Because (a, V(a)) belongs to Q, inequality 1 implies that
\[
v \cdot (a, V(a)) \le v \cdot (\bar{a}, V(a)).
\]
By the definition of ā,
\[
v \cdot (\bar{a}, V(a)) = v \cdot (a, V(a)) + v_1.
\]
Substituting this equation into the previous inequality, we see that v₁ ≥ 0.
I next show that s ≤ 0. Because (a, V(a) + 1) belongs to Q and (a, V(a)) belongs to G, inequality 1 implies that v·(a, V(a)) + s ≤ v·(a, V(a)), so that s ≤ 0. Suppose that s = 0. By the constraint qualification, there is an x̂ ∈ C such that g_k(x̂) < a_k, for all k. Let ā = g(x̂). Then ā ∈ A, and (ā, V(ā)) belongs to G, so that inequality 1 implies that
\[
\sum_k v_k a_k \le \sum_k v_k \bar{a}_k.
\]
Because a_k > ā_k, for all k, and v_k ≥ 0, for all k, it follows that v_k = 0, for k = 1, ..., K. Hence v = 0, since s = 0. This is impossible, since v ≠ 0. This contradiction proves that s < 0.
For k = 1, ..., K, let λ_k = −v_k/s. Then, λ_k ≥ 0, for all k, and the vector v may be replaced, as a separating vector, by the vector (−λ₁, ..., −λ_K, 1) = (1/s)v. That is,
\[
-\sum_k \lambda_k a'_k + t' \ge -\sum_k \lambda_k a''_k + t'', \text{ for all } (a', t') \in Q \text{ and } (a'', t'') \in G. \tag{2}
\]
In order to see that the subgradient inequality holds, notice that (a, V(a)) ∈ Q and (ā, V(ā)) ∈ G, for any ā ∈ A, so that by inequality 2
\[
-\sum_k \lambda_k a_k + V(a) \ge -\sum_k \lambda_k \bar{a}_k + V(\bar{a}).
\]
That is,
\[
\sum_k \lambda_k \bar{a}_k - V(\bar{a}) \ge \sum_k \lambda_k a_k - V(a),
\]
so that V(ā) ≤ V(a) + λ·(ā − a), for all ā ∈ A. Hence λ is a subgradient of V at a.
MATH CAMP: Lecture 13
I believe the creators of this subject found from experience that piecewise continuous
functions were the appropriate class of control functions. In many examples, the set U
is compact and the optimal control stays on the boundary of U, moving continuously
most of the time but flipping occasionally from one side of U to another.
[Figure: a piecewise continuous control u on [0, T] with jumps at t₁, t₂, and t₃; below it, the graph of the function f on [0, 2].]
\[
f(t) =
\begin{cases}
0, & \text{if } 0 \le t < 1, \\
\sin\dfrac{1}{t-1}, & \text{if } 1 < t \le 2.
\end{cases}
\]
Notation: Let A denote the set of admissible controls. That is, A = {u : [0, T] → U | T > 0 and u is piecewise continuous}. Note that U is fixed but T is variable. Assume that the function f : R^N × U → R and the functions g_n : R^N × U → R, for n = 1, ..., N, are continuously differentiable with respect to x₁, ..., x_N and continuous with respect to x₁, ..., x_N, u₁, ..., u_K. Let g(x, u) = (g₁(x, u), ..., g_N(x, u)). The problem under consideration is
\[
\max_{u \in A} \int_0^T f(x(t), u(t))\,dt
\]
\[
\text{s.t. } \frac{dx(t)}{dt} = g(x(t), u(t)), \text{ for all } t,\quad x(0) = x_0 \text{ and } x(T) = x_1.
\]
We seek necessary conditions for optimality. I will not prove that the conditions to be stated are necessary, but I will derive them in a suggestive but non-rigorous way using the Kuhn–Tucker theorem. This approach has the advantage of helping familiarize you with the Kuhn–Tucker theorem. Assume that f and the functions g_n are concave, as they often are in economic problems. Divide the time interval [0, T] into M short intervals of length Δt, where Δt = T/M. Assume that N = K = 1, so that x(t) and u(t) are real numbers. Let us look at the behavior of x and u at times mΔt, for m = 0, 1, ..., M. Our problem is approximately the following
\[
\max_{\substack{u(0), u(\Delta t), \ldots, u((M-1)\Delta t) \in U \\ x(0), x(\Delta t), \ldots, x(M\Delta t) \in \mathbb{R}}}\; \sum_{m=0}^{M-1} \Delta t\, f(x(m\Delta t), u(m\Delta t)) \tag{$**$}
\]
\[
\text{s.t. } x(m\Delta t) - x((m-1)\Delta t) \le \Delta t\, g(x((m-1)\Delta t), u((m-1)\Delta t)), \text{ for } m = 1, \ldots, M,
\]
\[
x(0) \le x_0, \text{ and } -x(M\Delta t) \le -x_1.
\]
Because the objective function is concave and the constraint functions are convex, the Kuhn–Tucker theorem implies that there exist non-negative numbers λ(0), ..., λ(M) and β such that the solution of problem (∗∗) maximizes the corresponding Lagrangian.
I now simplify the expression for the Lagrangian by using an analogue of integration by parts. Recall the fundamental theorem of calculus, which asserts that if the function f : [0, T] → R is differentiable, then ∫₀ᵗ (df(s)/ds) ds = f(t) − f(0), for all t ∈ [0, T]. Integration by parts is based on the equations
\[
F(T)G(T) - F(0)G(0) = \int_0^T \frac{d}{dt}[F(t)G(t)]\,dt
= \int_0^T \left[\frac{dF(t)}{dt}\,G(t) + F(t)\,\frac{dG(t)}{dt}\right] dt
= \int_0^T \frac{dF(t)}{dt}\,G(t)\,dt + \int_0^T F(t)\,\frac{dG(t)}{dt}\,dt.
\]
The first equation follows from the fundamental theorem of calculus. The second follows from Leibniz's rule for the differentiation of the product of two functions. Hence
\[
\int_0^T \frac{dF(t)}{dt}\,G(t)\,dt = F(T)G(T) - F(0)G(0) - \int_0^T F(t)\,\frac{dG(t)}{dt}\,dt.
\]
We are now dealing with differences; the derivative is replaced by the first difference, and the integral is replaced by a sum. If y₁, y₂, ... is a sequence, the first difference of this sequence is the sequence Δy₁, Δy₂, ..., where Δy_m = y_{m+1} − y_m, for all m. Leibniz's rule for first differences is that
\[
\Delta(x_m y_m) = x_m \Delta y_m + y_{m+1} \Delta x_m.
\]
Summing this equation from m = 1 to m = M gives
\[
x_{M+1}y_{M+1} - x_1 y_1 = \sum_{m=1}^{M} x_m \Delta y_m + \sum_{m=1}^{M} y_{m+1} \Delta x_m,
\]
which is the analogue of integration by parts. This equation implies that
\[
-\sum_{m=1}^{M} y_{m+1}\Delta x_m = \sum_{m=1}^{M} x_m \Delta y_m - x_{M+1}y_{M+1} + x_1 y_1.
\]
Let x_m = x((m − 1)Δt) and y_m = λ(m − 1). Then the equation just derived implies that
\[
-\sum_{m=1}^{M} \lambda(m)\,[x(m\Delta t) - x((m-1)\Delta t)]
= -\sum_{m=1}^{M} \lambda(m)\,\Delta x((m-1)\Delta t)
= \sum_{m=1}^{M} x((m-1)\Delta t)\,\Delta\lambda(m-1) - \lambda(M)x(M\Delta t) + \lambda(0)x(0).
\]
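The summation-by-parts identity can be verified directly on arbitrary sequences. The check below is a sketch of my own, not part of the text: it evaluates both sides on random sequences and confirms that they agree.

```python
import random

# Check of the summation-by-parts identity used above:
#   -sum_{m=1}^{M} y_{m+1} * (x_{m+1} - x_m)
#       = sum_{m=1}^{M} x_m * (y_{m+1} - y_m) - x_{M+1}*y_{M+1} + x_1*y_1

random.seed(0)
M = 50
x = [random.uniform(-1, 1) for _ in range(M + 2)]   # entries x[1..M+1] used
y = [random.uniform(-1, 1) for _ in range(M + 2)]   # entries y[1..M+1] used

lhs = -sum(y[m + 1] * (x[m + 1] - x[m]) for m in range(1, M + 1))
rhs = (sum(x[m] * (y[m + 1] - y[m]) for m in range(1, M + 1))
       - x[M + 1] * y[M + 1] + x[1] * y[1])
```

The two sides agree up to floating-point rounding, for any choice of the sequences.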
\[
\Delta\lambda(m) + \Delta t\,\frac{\partial}{\partial x}\bigl[f(x(m\Delta t), u(m\Delta t)) + \lambda(m+1)\,g(x(m\Delta t), u(m\Delta t))\bigr] = 0, \text{ for } m = 0, \ldots, M-1.
\]
Hence
\[
\frac{\lambda(m+1) - \lambda(m)}{\Delta t} = -\frac{\partial}{\partial x}\bigl[f(x(m\Delta t), u(m\Delta t)) + \lambda(m+1)\,g(x(m\Delta t), u(m\Delta t))\bigr].
\]
Think of λ as a function of the continuous time variable t, so that λ(t) is its value at time t. Replace λ(m) by λ(mΔt) and let Δt go to zero while increasing m so that mΔt converges to t. The previous equation becomes
\[
\frac{\lambda((m+1)\Delta t) - \lambda(m\Delta t)}{\Delta t} = -\frac{\partial}{\partial x}\bigl[f(x(m\Delta t), u(m\Delta t)) + \lambda((m+1)\Delta t)\,g(x(m\Delta t), u(m\Delta t))\bigr].
\]
Taking the limit as Δt goes to zero, we obtain the equation
\[
\frac{d\lambda(t)}{dt} = -\frac{\partial}{\partial x}\bigl[f(x(t), u(t)) + \lambda(t)\,g(x(t), u(t))\bigr], \text{ for all } t.
\]
Similarly, u(t) solves the problem
\[
\max_{u \in U}\,[f(x(t), u) + \lambda(t)\,g(x(t), u)], \text{ for all } t.
\]
Finally, if we assume that all the constraints hold with equality at the optimum, we see that
\[
\frac{dx(t)}{dt} = g(x(t), u(t)), \text{ for all } t.
\]
These conclusions may be summarized using the Hamiltonian function, which is the continuous time instantaneous analogue of the Lagrangian for problems with finitely many variables. (In continuous time, there are infinitely many variables, namely, x(t) and u(t) for all t.) The Hamiltonian function is defined to be
\[
H(x, u, \lambda) = f(x, u) + \lambda \cdot g(x, u).
\]
If we assume that x(t) and u(t) are numbers rather than vectors, then
\[
H(x, u, \lambda) = f(x, u) + \lambda\,g(x, u).
\]
That is, there is no inner product. In this case, we may summarize our intuitively derived findings as follows:
\[
\frac{dx(t)}{dt} = \frac{\partial}{\partial\lambda} H(x(t), u(t), \lambda(t)),
\]
\[
\frac{d\lambda(t)}{dt} = -\frac{\partial}{\partial x} H(x(t), u(t), \lambda(t)), \text{ and}
\]
u(t) maximizes H(x(t), u, λ(t)) with respect to u ∈ U, for all t.
The equations dx/dt = ∂H/∂λ and dλ/dt = −∂H/∂x are called the Hamiltonian system. The fact that u(t) maximizes H(x(t), u, λ(t)) with respect to u is called the maximum principle. The maximum principle implies that ∂H/∂u(x(t), u(t), λ(t)) = 0.
Moreover, if u(t) is differentiable,
\[
\frac{d}{dt}H(x(t), u(t), \lambda(t)) = \frac{\partial H}{\partial x}\frac{dx}{dt} + \frac{\partial H}{\partial u}\frac{du}{dt} + \frac{\partial H}{\partial\lambda}\frac{d\lambda}{dt}
= \frac{\partial H}{\partial x}\frac{\partial H}{\partial\lambda} + (0)\frac{du}{dt} + \frac{\partial H}{\partial\lambda}\left(-\frac{\partial H}{\partial x}\right) = 0,
\]
so that H is constant along the optimal path. (This is so because f and g do not depend directly on time.) This result is true even if the function u is not differentiable.
If x(t) and u(t) are vectors, then the above statements remain true. More precisely, a necessary condition for optimality is that there exist piecewise differentiable functions λ₁(t), ..., λ_N(t) such that (λ₁(t), ..., λ_N(t)) ≠ 0, for all t, and such that if
\[
H(x_1, \ldots, x_N, u_1, \ldots, u_K, \lambda_1, \ldots, \lambda_N)
= f(x_1, \ldots, x_N; u_1, \ldots, u_K) + \sum_{n=1}^{N} \lambda_n g_n(x_1, \ldots, x_N; u_1, \ldots, u_K),
\]
then
\[
\frac{dx_n(t)}{dt} = \frac{\partial}{\partial\lambda_n} H(x(t), u(t), \lambda(t)) \quad\text{and}\quad
\frac{d\lambda_n(t)}{dt} = -\frac{\partial}{\partial x_n} H(x(t), u(t), \lambda(t)),
\]
for n = 1, ..., N, and u(t) solves the problem max_{u∈U} H(x(t), u, λ(t)), for all t. Furthermore,
\[
\frac{d}{dt}H(x(t), u(t), \lambda(t)) = 0.
\]
These statements are true even if f and the g_n are not concave. The variables x₁(t), ..., x_N(t) are the state variables. The variables λ₁(t), ..., λ_N(t) are called dual, conjugate, costate, or auxiliary variables.
From Kuhn–Tucker theory we know that the Kuhn–Tucker coefficients are subgradients of the value function. Let
\[
W_0(x_0) = \max_{u \in A} \int_0^T f(x(s), u(s))\,ds \tag{$***$}
\]
\[
\text{s.t. } \frac{dx(s)}{ds} = g(x(s), u(s)), \text{ for all } s,\quad x(0) = x_0 \text{ and } x(T) = x_T.
\]
Then, λ(0) is the derivative of W₀, if W₀ is differentiable. Let (x̄, ū) solve the maximization problem (∗∗∗), and let λ̄(t), for 0 ≤ t ≤ T, be the conjugate function
corresponding to (x̄, ū). Let
\[
W_t(x) = \max_{u \in A} \int_t^T f(x(s), u(s))\,ds
\]
\[
\text{s.t. } \frac{dx(s)}{ds} = g(x(s), u(s)), \text{ for all } s,\quad x(t) = x \text{ and } x(T) = x_T.
\]
Then
\[
\bar{\lambda}(t) = DW_t(\bar{x}(t)).
\]
The maximum principle and Hamiltonian system determine the evolution of
(x(t), λ(t), u(t)) and hence determine (x(t), λ(t), u(t)), for all t, given appropriate
initial conditions for the differential equations governing the evolution of x(t) and
λ(t). Since x(t) and λ(t) each have N components, 2N initial conditions are re-
quired. These are provided by the N components of x(0) and the N components of
x(T ). These statements may become clearer when considering the following example.
The Hamiltonian is H(K, C, λ) = u(C) + λ[f(K) − C]. If λ(t) is the dual variable, then the evolution conditions are
\[
\frac{dK(t)}{dt} = \frac{\partial H}{\partial\lambda} = f(K(t)) - C(t),
\]
\[
\frac{d\lambda(t)}{dt} = -\frac{\partial H}{\partial K} = -\lambda(t)\,\frac{df(K(t))}{dK},
\]
and C = C(t) maximizes H(K(t), C, λ(t)) = u(C) + λ(t)[f(K(t)) − C], for all t. The first two conditions are the Hamiltonian system. The third condition follows from the maximum principle.
The third condition implies that if C(t) > 0, then
\[
0 = \frac{\partial H}{\partial C}(K(t), C(t), \lambda(t)) = \frac{du(C(t))}{dC} - \lambda(t).
\]
That is,
\[
\lambda(t) = \frac{du(C(t))}{dC}.
\]
The equations du(C(0))/dC = λ(0) and dK(0)/dt = f(K(0)) − C(0) tie the initial values of λ and dK/dt to the initial consumption C(0). Because the Hamiltonian is constant along the optimal path,
\[
u(C(t)) + \lambda(t)\,\frac{dK(t)}{dt} = \text{constant}.
\]
If C(t) > 0, so that λ(t) = du(C(t))/dC, then
\[
u(C(t)) + \frac{du(C(t))}{dC}\,\frac{dK(t)}{dt} = \text{constant}.
\]
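The constancy of the Hamiltonian can be checked by integrating the Hamiltonian system numerically. The parametrization below is my own, not from the text: u(C) = log C and f(K) = bK, so the maximum principle gives C = 1/λ, and a simple Euler scheme should hold H(K(t), C(t), λ(t)) nearly constant, up to discretization error.

```python
import math

# Numerical check (my own parametrization) that H is constant along a path
# of the Hamiltonian system.  With u(C) = log C and f(K) = b*K, the maximum
# principle gives du/dC = 1/C = lambda, hence C = 1/lambda, and
#     H(K, C, lambda) = log C + lambda * (b*K - C).

b = 0.3
K, lam = 1.0, 1.0
dt, steps = 1e-5, 100_000           # integrate on [0, 1]

def hamiltonian(K, lam):
    C = 1.0 / lam
    return math.log(C) + lam * (b * K - C)

H0 = hamiltonian(K, lam)
for _ in range(steps):
    C = 1.0 / lam
    K += dt * (b * K - C)           # dK/dt = dH/dlambda = f(K) - C
    lam += dt * (-lam * b)          # dlambda/dt = -dH/dK = -lambda * f'(K)
H1 = hamiltonian(K, lam)
drift = abs(H1 - H0)                # near zero: only Euler truncation error
```

With these initial conditions H(0) = −0.7, and the drift after integrating to t = 1 is on the order of the step size.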
MATH CAMP: Lecture 14
I now show that if C(0) > 0, then C(t) > 0, for all t, and C(t) is forever increasing. If C(0) > 0, then λ(0) = du(C(0))/dC > 0. Since dλ(t)/dt = −λ(t)[df(K(t))/dK] and df(K)/dK > 0, it follows that dλ(t)/dt < 0, as long as λ(t) > 0. The equation λ(t) = du(C(t))/dC implies that λ(t) > 0 and implies by implicit differentiation that
\[
\frac{dC(t)}{dt} = \left[\frac{d^2 u(C(t))}{dC^2}\right]^{-1}\frac{d\lambda(t)}{dt}.
\]
Because dλ(t)/dt < 0 and d²u(C(t))/dC² < 0, this equation implies that dC(t)/dt > 0.
I next show that if dK(t)/dt = 0, then dK(t)/dt is decreasing. Because dK(t)/dt = f(K(t)) − C(t), it follows that
\[
\frac{d^2 K(t)}{dt^2} = \frac{df(K(t))}{dK}\frac{dK(t)}{dt} - \frac{dC(t)}{dt} = -\frac{dC(t)}{dt} < 0, \text{ when } \frac{dK(t)}{dt} = 0.
\]
Consider now the example with u(C) = −(a/2)(C̄ − C)² and f(K) = bK. The constancy of the Hamiltonian along an optimal path implies that
\[
\dot{K}^2 - (bK - \bar{C})^2 = \mu,
\]
for some constant μ. This equation may be rewritten as
\[
\dot{K}^2 - b^2(K - \bar{K})^2 = \mu,
\]
where K̄ = C̄/b is the bliss capital stock. Since C̄ = bK̄, K̄ is the amount of capital needed to produce the bliss consumption C̄.
Suppose that μ = 0 and let y = b(K − K̄). Then ẏ = bK̇, where ẏ = dy(t)/dt. Hence
\[
\frac{\dot{y}^2}{b^2} = \dot{K}^2 = b^2(K - \bar{K})^2 = y^2,
\]
so that ẏ² = b²y², and so ẏ = by or ẏ = −by. If ẏ = by, then y(t) = y(0)e^{bt}. If ẏ = −by, then y(t) = y(0)e^{−bt}. Therefore, either
\[
K(t) = \bar{K} + (K(0) - \bar{K})e^{bt} \quad\text{or}\quad K(t) = \bar{K} + (K(0) - \bar{K})e^{-bt}.
\]
The arrows indicate the direction of motion. This may be determined by the sign of K̇, which gives the direction of motion of K. Notice that the solution on the upper left branch of the cross in the diagram corresponds to y(0) < 0, so that K(t) increases toward K̄ as t increases.
Now suppose that μ ≠ 0. The equation K̇² − b²(K − K̄)² = μ describes a hyperbola. The branches of the hyperbolas are shown in the diagram, and they approach the lines K̇ = ±b(K − K̄) asymptotically. The movement along the curves is indicated by the arrows. Again the direction of motion along a curve is determined by the sign of K̇. Curves that are farther from the abscissa correspond to larger values of |K̇| and hence to faster movement.
Now, fix K₀ and K₁ as in the next diagram. Five possible paths from K₀ to K₁ are shown, indicated by 1, 2, 3, 4, and 5. The initial and end points of these paths are indicated by heavy dots.
The higher is a path, the faster is the movement along the path and hence the more quickly it reaches K₁, unless the path overshoots, increases K above K₁, and then falls back to K₁. The path labeled 1 goes from K₀ to K₁ most quickly and among the labeled paths has the lowest value of T. The path 2 along the line K̇ = b(K̄ − K) is the next fastest and so has the next lowest value of T. The path 3 from the first to the second dot is the next fastest. The path 4 is next, and then what probably comes next is the path 3 from the first to the third dot. This path overshoots. A path can be made to last an arbitrarily long time by starting with K̇ sufficiently close to b(K̄ − K₀) but less than b(K̄ − K₀). These paths overshoot, spend a long time near K̄, and then fall back to K₁. The path 2 from the first dot to the third dot is an example of such a path. The graph of K versus time for such a path looks like the following diagram.
As T is increased, the initial value K̇(0) must approach the number b(K̄ − K₀) from below, and the proportion of the time spent near K̄ increases to one. This property of lingering near K̄ is not specific to this example, but applies quite generally to growth models. It is called the "turnpike property," where the term "turnpike" stems from the resemblance of the diagram to a map of a superhighway with entrance and exit ramps. In general models, the bliss level of capital is replaced by the stationary optimal level of capital, which is the level of capital such that an optimal program would stay at that level if it started there and was to end there.
The relation between T and the initial value of K̇ may be visualized as follows. Let K̇_A(0) and K̇_B(0) be as in the above figure. The next figure shows the time T to go from K₀ to K₁ along the hyperbolic paths shown in the above figure, as a function of the initial value of K̇, call it K̇(0).
There are two branches to this function. The lower branch is the time to the first time T such that K(T) = K₁. The upper branch is the time to the second time T such that K(T) = K₁. This branch goes to infinity as K̇(0) approaches K̇_B(0) from the left. Neither branch exists for K̇(0) to the left of K̇_A(0), since K₁ cannot be reached from K₀ if K̇(0) < K̇_A(0).
The question arises as to how a path such as that labeled 4 in the diagram could cross the abscissa from above, if K̇(t) = dK(t)/dt vanishes there. Recall that I have shown for the more general growth model that d²K(t)/dt² < 0, if dK(t)/dt = 0. This reasoning depended on the assumption that du(C)/dC > 0, whereas the utility function considered here, u(C) = −(a/2)(C̄ − C)², has negative slope if C > C̄. Notice, however, that if K < K̄, then total output, bK, is less than bK̄ = C̄. We know that
\[
C(t) + \frac{dK(t)}{dt} = bK(t).
\]
Therefore, if dK(t)/dt ≥ 0, then C(t) ≤ bK(t) < bK̄ = C̄, so that du(C(t))/dC > 0. Hence the reasoning made before applies along a path on which dK(t)/dt is non-negative, or even if it is negative but exceeds b(K(t) − K̄). We may therefore conclude that along a path such as that labeled 4 in the figure,
\[
\frac{dC(t)}{dt} > 0,
\]
and hence
\[
\frac{d^2 K(t)}{dt^2} = b\,\frac{dK(t)}{dt} - \frac{dC(t)}{dt} < b\,\frac{dK(t)}{dt},
\]
and hence d²K(t)/dt² < 0 at the time when the path crosses the abscissa.
Clearly a different reasoning applies to paths in the quadrant to the right of the point (K̄, 0) in the diagram, for the optimal paths there cross the abscissa from below, so that d²K(t)/dt² > 0 when a path crosses the axis. In this quadrant, however, K > K̄, so that consumption must exceed C̄ when the path is near the abscissa and hence dK(t)/dt is nearly zero. The utility function is decreasing at such high consumption levels, so that a reasoning opposite to that made earlier implies that if dK(t)/dt is nearly zero, then
\[
\lambda(t) = \frac{du(C(t))}{dC} < 0,
\]
and so
\[
\frac{d\lambda(t)}{dt} = -\lambda(t)\,\frac{df(K(t))}{dK} > 0,
\]
and so
\[
\frac{dC(t)}{dt} = \left[\frac{d^2 u(C(t))}{dC^2}\right]^{-1}\frac{d\lambda(t)}{dt} < 0,
\]
and hence
\[
\frac{d^2 K(t)}{dt^2} = b\,\frac{dK(t)}{dt} - \frac{dC(t)}{dt} > 0,
\]
when dK(t)/dt = 0.
The Contraction Mapping Theorem
I will soon discuss dynamic programming with discrete time. That theory uses the
contraction mapping theorem, which I now present.
Let X be a compact subset of R^N and let C(X) = {f : X → R | f is continuous}. If f ∈ C(X), let ‖f‖ = max_{x∈X} |f(x)|. Because X is compact and f is continuous, ‖f‖ exists.
Suppose that the functions f_K ∈ C(X) converge uniformly to f, that is, that lim_{K→∞} ‖f_K − f‖ = 0. Then f is continuous: if x_n → x and ε > 0, choose K so that ‖f_K − f‖ < ε/3; then, for n sufficiently large,
\[
|f(x_n) - f(x)| \le |f(x_n) - f_K(x_n)| + |f_K(x_n) - f_K(x)| + |f_K(x) - f(x)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\]
Since ε is arbitrarily small, lim_{n→∞} f(x_n) = f(x).
Lemma: C(X) with the norm ‖·‖ is complete. That is, if f_n is a sequence in C(X) that is Cauchy with respect to ‖·‖, then there is an f ∈ C(X) such that lim_{n→∞} f_n = f.
Proof: f_n is Cauchy if, for every ε > 0, there is an N such that ‖f_n − f_m‖ < ε, if n ≥ N and m ≥ N. If f_n is Cauchy, then for each x ∈ X, the sequence f_n(x) is Cauchy. Since the real numbers are complete, there exists an f(x) ∈ R such that lim_{n→∞} f_n(x) = f(x). Therefore, there is a function f : X → R such that lim_{n→∞} f_n(x) = f(x), for all x ∈ X. I show that lim_{n→∞} ‖f_n − f‖ = 0. Let ε > 0. Since f_n is Cauchy, there exists N such that ‖f_n − f_m‖ < ε/2, if n ≥ N and m ≥ N. If x ∈ X, there is a k depending on x such that k ≥ N and |f_k(x) − f(x)| < ε/2. If n ≥ N, then
\[
|f_n(x) - f(x)| \le |f_n(x) - f_k(x)| + |f_k(x) - f(x)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
Since x is an arbitrary member of X, ‖f_n − f‖ ≤ ε, if n ≥ N, so that lim_{n→∞} ‖f_n − f‖ = 0. By the previous argument, f is continuous, and hence f ∈ C(X).
Definition: If Q : C(X) → C(X), a fixed point of Q is an f ∈ C(X) such that Q(f) = f.
Definition: Q : C(X) → C(X) is a contraction with coefficient β, where 0 < β < 1, if ‖Q(f) − Q(g)‖ ≤ β‖f − g‖, for all f and g in C(X).
Contraction Mapping Theorem: If Q : C(X) → C(X) is a contraction, then Q has a unique fixed point.
Proof: Let f ∈ C(X). Since Q is a contraction with coefficient β, ‖Q^{m+1}(f) − Q^m(f)‖ ≤ β^m‖Q(f) − f‖, for all m. Hence, if n > m ≥ N,
\[
\|Q^n(f) - Q^m(f)\| \le \|Q^n(f) - Q^{n-1}(f)\| + \|Q^{n-1}(f) - Q^{n-2}(f)\| + \cdots + \|Q^{m+1}(f) - Q^m(f)\|
\]
\[
\le (\beta^{n-1} + \beta^{n-2} + \cdots + \beta^m)\,\|Q(f) - f\|
\le (\beta^{n-1} + \beta^{n-2} + \cdots + \beta^N)\,\|Q(f) - f\|
\]
\[
\le (\cdots + \beta^{N+1} + \beta^N)\,\|Q(f) - f\|
= \frac{\beta^N}{1-\beta}\,\|Q(f) - f\| \to 0, \text{ as } N \to \infty.
\]
Therefore, the sequence Qⁿ(f) is Cauchy with respect to ‖·‖. Therefore, by the previous lemma, there exists a g ∈ C(X) such that lim_{n→∞} ‖Qⁿ(f) − g‖ = 0. Since ‖Q^{n+1}(f) − Q(g)‖ ≤ β‖Qⁿ(f) − g‖, it follows that lim_{n→∞} ‖Q^{n+1}(f) − Q(g)‖ = 0, which means that lim_{n→∞} ‖Qⁿ(f) − Q(g)‖ = 0. Since lim_{n→∞} ‖Qⁿ(f) − g‖ = 0, it follows that g = Q(g), and hence g is a fixed point of Q.
I now show that the fixed point is unique. Suppose that f and g are fixed points of Q and that f ≠ g. Then ‖f − g‖ > 0, and because Q is a contraction with coefficient β,
\[
\|f - g\| = \|Q(f) - Q(g)\| \le \beta\,\|f - g\|,
\]
which is impossible since β < 1. This contradiction proves that Q cannot have two distinct fixed points.
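The geometric convergence in the proof is easy to see numerically. The operator below is a toy contraction on C([0, 1]) of my own choosing, not from the text: Q(f)(x) = sin(x) + f(x)/2 has contraction coefficient 1/2 and fixed point f*(x) = 2 sin x, and on a grid the sup-norm error halves at every iteration.

```python
import math

# Toy contraction on C([0, 1]) (my own example): Q(f)(x) = sin(x) + 0.5*f(x).
# ||Q(f) - Q(g)|| = 0.5 * ||f - g||, and the fixed point solves
# f*(x) = sin(x) + 0.5*f*(x), i.e. f*(x) = 2*sin(x).

xs = [i / 100 for i in range(101)]
f = [0.0] * len(xs)                 # start the iteration from f = 0
errors = []                         # sup-norm distance to the fixed point
for _ in range(40):
    f = [math.sin(x) + 0.5 * fx for x, fx in zip(xs, f)]
    errors.append(max(abs(fx - 2 * math.sin(x)) for x, fx in zip(xs, f)))
```

The recorded errors decrease by exactly the factor 1/2 each step (up to rounding), as the chain of inequalities in the proof predicts.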
Dynamic Programming
I do not explain dynamic programming in general, but present its ideas using a growth model, which I now define. Let there be N commodities. A vector x in R^N will be thought of as a commodity vector in that its nth component, x_n, represents a quantity of commodity n, for n = 1, ..., N. Assume that because the earth is bounded, there is a number b such that no more than b units of any commodity could ever be produced. Let B = {x ∈ R^N | 0 ≤ x_n ≤ b, for all n}. Let u : B → R be a utility function and F : B → B be a production function. A consumption vector is denoted by c, an output vector by y, and a capital vector by K. Assume that u and F are continuous. The optimal growth problem is
\[
\max_{(c_0, c_1, \ldots)} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}
\]
\[
\text{s.t., for some } y_1, y_2, \ldots \text{ in } B,\quad 0 \le c_t \le y_t \text{ and } y_{t+1} = F(y_t - c_t), \text{ for all } t,
\]
where y₀ in B is given and 0 < β < 1. The number β is a discount factor, and y₀ is the vector of initial stocks of goods.
The quantity y_t − c_t is the vector of production inputs or capital in period t and could be denoted by k_t. In the above problem, the consumptions c_t are control variables and the output vectors y_t are state variables.
The value function for the above problem is
\[
V(y_0) = \max_{(c_0, c_1, \ldots)} \sum_{t=0}^{\infty} \beta^t u(c_t)
\]
\[
\text{s.t., for some } y_1, y_2, \ldots \text{ in } B,\quad 0 \le c_t \le y_t \text{ and } y_{t+1} = F(y_t - c_t), \text{ for all } t \ge 0,
\]
where y₀ in B is given.
Observe that this maximization problem can be written as
\[
V(y_0) = \max_{c_0:\, 0 \le c_0 \le y_0}\Biggl[u(c_0) + \beta\Biggl[\max_{\substack{(c_1, c_2, \ldots):\, 0 \le c_t \le y_t, \text{ for } t \ge 1, \\ y_1 = F(y_0 - c_0), \text{ and} \\ y_t = F(y_{t-1} - c_{t-1}), \text{ for } t \ge 2}} \sum_{t=1}^{\infty} \beta^{t-1} u(c_t)\Biggr]\Biggr].
\]
That is, the maximization problem from time 1 on is just like that from time 0 on, except that the initial stock of goods may be different. Therefore,
\[
V(y_0) = \max_{c_0:\, 0 \le c_0 \le y_0}\,[u(c_0) + \beta V(F(y_0 - c_0))].
\]
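The last equation suggests computing V by iterating the map W ↦ max_c [u(c) + βW(F(y − c))], which is shown later in these notes to be a contraction. The sketch below is a discretized one-good example with choices of my own, not from the text: u(c) = √c, F(k) = √k, β = 0.9, and a grid on [0, 1]; the sup-norm gaps between successive iterates shrink at least by the factor β.

```python
# Value function iteration for a one-good, discretized version of the growth
# model (my own choices: u(c) = c**0.5, F(k) = k**0.5, beta = 0.9, grid on
# [0, 1]).  F is rounded to the nearest grid point so that the Bellman
# operator Q maps grid functions to grid functions.

N = 101
beta = 0.9
grid = [i / (N - 1) for i in range(N)]
Fmap = [round((k ** 0.5) * (N - 1)) for k in grid]   # grid index of F(k)

def u(c):
    return c ** 0.5

def Q(V):
    # Q(V)(y) = max over capital k on the grid, 0 <= k <= y, of
    #           u(y - k) + beta * V(F(k))
    return [max(u(grid[i] - grid[j]) + beta * V[Fmap[j]] for j in range(i + 1))
            for i in range(N)]

V = [0.0] * N
gaps = []                     # sup-norm distances between successive iterates
for _ in range(150):
    V_new = Q(V)
    gaps.append(max(abs(a - b) for a, b in zip(V_new, V)))
    V = V_new
```

The iterates converge geometrically, and the limit V is non-decreasing in y, consistent with the theory developed below.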
MATH CAMP: Lecture 15
\[
\max_{(c_0, c_1, \ldots)} \sum_{t=0}^{\infty} \beta^t u(c_t) \tag{1}
\]
\[
\text{s.t., for some } y_1, y_2, \ldots \text{ in } B,\quad 0 \le c_t \le y_t \text{ and } y_{t+1} = F(y_t - c_t), \text{ for all } t.
\]
Define the operator Q on C(B) by
\[
Q(W)(y) = \max_{c:\, 0 \le c \le y}\,[u(c) + \beta W(F(y - c))],
\]
where W ∈ C(B) and y ∈ B. It may then be shown that the unique fixed point V of Q is the value function for problem (1), so that this problem has a solution. I use this approach, though the first approach is just as good.
I must first show that Q maps C(B) to C(B). If W ∈ C(B), then u(c) + βW(F(y − c)) is a continuous function of c. Because c varies over the compact set {c | 0 ≤ c ≤ y}, it follows that the maximum appearing in the definition of Q exists.
I must show that Q(W)(y) is a continuous function of y. The proof I give uses the concept of uniform continuity.
Lemma: If B is compact and f : B → R is continuous, then f is uniformly continuous; that is, for every ε > 0, there is a δ > 0 such that |f(x) − f(y)| < ε whenever x and y belong to B and ‖x − y‖ < δ.
Proof: Let ε > 0. Because f is continuous, for each b in B, there is a positive number δ(b) such that |f(x) − f(b)| < ε/2, if ‖x − b‖ < 2δ(b). If B_{δ(b)}(b) = {x ∈ B | ‖x − b‖ < δ(b)}, then {B_{δ(b)}(b) | b ∈ B} is an open cover of B. By the Heine–Borel theorem, there is a finite subcover, B_{δ(b₁)}(b₁), ..., B_{δ(b_K)}(b_K). Let δ = min_{k=1,...,K} δ(b_k). Then δ > 0. If ‖x − y‖ < δ, where x and y both belong to B, then x ∈ B_{δ(b_k)}(b_k), for some k. Since ‖x − y‖ < δ ≤ δ(b_k), it follows that
\[
\|y - b_k\| \le \|y - x\| + \|x - b_k\| < \delta(b_k) + \delta(b_k) = 2\delta(b_k).
\]
Therefore |f(y) − f(b_k)| < ε/2. Similarly, since ‖x − b_k‖ < δ(b_k) < 2δ(b_k), it follows that |f(x) − f(b_k)| < ε/2. Hence
\[
|f(x) - f(y)| \le |f(x) - f(b_k)| + |f(b_k) - f(y)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
This completes the proof that the function Q maps C(B) into C(B).
Lemma: Q is a contraction with coefficient β.
Proof: I show that ‖Q(V) − Q(W)‖ ≤ β‖V − W‖, for all V and W in C(B). Let y ∈ B and let c_v be such that 0 ≤ c_v ≤ y and Q(V)(y) = u(c_v) + βV(F(y − c_v)). Similarly, let c_w be such that 0 ≤ c_w ≤ y and Q(W)(y) = u(c_w) + βW(F(y − c_w)). Then
\[
Q(V)(y) = u(c_v) + \beta V(F(y - c_v)) \ge u(c_w) + \beta V(F(y - c_w))
\ge u(c_w) + \beta W(F(y - c_w)) - \beta\|V - W\| = Q(W)(y) - \beta\|V - W\|.
\]
By the symmetric argument, Q(W)(y) ≥ Q(V)(y) − β‖V − W‖. Therefore
\[
|Q(V)(y) - Q(W)(y)| \le \beta\|V - W\|.
\]
Hence,
\[
\|Q(V) - Q(W)\| = \max_{y \in B} |Q(V)(y) - Q(W)(y)| \le \beta\|V - W\|.
\]
It follows from the contraction mapping theorem that there exists a unique V ∈ C(B) such that Q(V) = V. I next show that V is the value function for optimization problem (1).
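The Lipschitz bound ‖Q(V) − Q(W)‖ ≤ β‖V − W‖ can be spot-checked on a discretized stand-in for Q. The instance below is my own (u(c) = c, F(k) = √k rounded to a grid, β = 0.5) and is only a finite-dimensional approximation of the operator on C(B).

```python
import random

# Spot-check of the contraction property of a discretized Bellman operator
# (my own instance: u(c) = c, F(k) = k**0.5 rounded to the grid, beta = 0.5):
# for random grid functions V and W, ||Q(V) - Q(W)|| <= beta * ||V - W||.

random.seed(1)
N, beta = 51, 0.5
grid = [i / (N - 1) for i in range(N)]

def Q(V):
    return [max((grid[i] - grid[j]) + beta * V[round(grid[j] ** 0.5 * (N - 1))]
                for j in range(i + 1))
            for i in range(N)]

def sup_dist(V, W):
    return max(abs(v - w) for v, w in zip(V, W))

ok = True
for _ in range(20):
    V = [random.uniform(-1, 1) for _ in range(N)]
    W = [random.uniform(-1, 1) for _ in range(N)]
    if sup_dist(Q(V), Q(W)) > beta * sup_dist(V, W) + 1e-12:
        ok = False
```

The inequality holds for every random pair, exactly as in the proof: the maximizer for V is feasible for W, so the two maxima can differ by at most β times the sup-norm distance.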
Theorem: If V is the unique fixed point of Q, then
\[
V(y_0) = \max_{(c_0, c_1, \ldots)} \sum_{t=0}^{\infty} \beta^t u(c_t)
\]
\[
\text{s.t., for some } y_1, y_2, \ldots \text{ in } B,\quad 0 \le c_t \le y_t \text{ and } y_{t+1} = F(y_t - c_t), \text{ for all } t.
\]
The proof proceeds by backward induction on t. Statement (3) is that, for any y_t in B,
\[
V(y_t) = \max_{(c_t, \ldots, c_T)} \left[\sum_{s=t}^{T} \beta^{s-t} u(c_s) + \beta^{T-t+1} V(F(y_T - c_T))\right] \tag{3}
\]
\[
\text{s.t., for some } y_{t+1}, \ldots, y_T \text{ in } B,\quad 0 \le c_s \le y_s, \text{ for } s = t, t+1, \ldots, T, \text{ and}
\]
\[
y_{s+1} = F(y_s - c_s), \text{ for } s = t, t+1, \ldots, T-1.
\]
By the definition of Q and since Q(V) = V,
\[
V(y_T) = Q(V)(y_T) = \max_{c_T:\, 0 \le c_T \le y_T}\,[u(c_T) + \beta V(F(y_T - c_T))],
\]
which is statement (3) for t = T.
Now suppose by induction that statements (2) and (3) are true for t, where 0 ≤ t ≤ T. Then
\[
\sum_{s=t-1}^{T} \beta^{s-t+1} u(c_s) + \beta^{T-t+2} V(F(y_T - c_T))
= u(c_{t-1}) + \beta\left[\sum_{s=t}^{T} \beta^{s-t} u(c_s) + \beta^{T-t+1} V(F(y_T - c_T))\right]
\]
\[
= u(c_{t-1}) + \beta V(y_t) \quad\text{(by the induction assumption)}
\]
\[
= u(c_{t-1}) + \beta V(F(y_{t-1} - c_{t-1})) \quad\text{(because } y_t = F(y_{t-1} - c_{t-1})\text{)}
\]
\[
= \max_{c:\, 0 \le c \le y_{t-1}}\,[u(c) + \beta V(F(y_{t-1} - c))] \quad\text{(by the definition of } c_{t-1}\text{)}.
\]
Statement (3) for t − 1 is that, for any y_{t−1} in B,
\[
V(y_{t-1}) = \max_{c_{t-1}, \ldots, c_T} \left[\sum_{s=t-1}^{T} \beta^{s-t+1} u(c_s) + \beta^{T-t+2} V(F(y_T - c_T))\right]
\]
\[
\text{s.t., for some } y_t, \ldots, y_T \text{ in } B,\quad 0 \le c_s \le y_s, \text{ for } s = t-1, t, \ldots, T, \text{ and } y_{s+1} = F(y_s - c_s), \text{ for } s = t-1, \ldots, T-1.
\]
In order to prove this statement, observe that
\[
\max_{c_{t-1}, \ldots, c_T} \left[\sum_{s=t-1}^{T} \beta^{s-t+1} u(c_s) + \beta^{T-t+2} V(F(y_T - c_T))\right],
\]
subject to the constraints just listed, equals
\[
\max_{c_{t-1}:\, 0 \le c_{t-1} \le y_{t-1}} \left\{u(c_{t-1}) + \beta \max_{c_t, \ldots, c_T} \left[\sum_{s=t}^{T} \beta^{s-t} u(c_s) + \beta^{T-t+1} V(F(y_T - c_T))\right]\right\},
\]
where the inner maximization is subject to: for some y_t, ..., y_T in B, 0 ≤ c_s ≤ y_s, for s = t, t+1, ..., T, and y_{s+1} = F(y_s − c_s), for s = t, t+1, ..., T − 1. This in turn equals
\[
\max_{c_{t-1}:\, 0 \le c_{t-1} \le y_{t-1}}\,[u(c_{t-1}) + \beta V(F(y_{t-1} - c_{t-1}))] \quad\text{(by the induction assumption)}
= Q(V)(y_{t-1}) = V(y_{t-1}).
\]
Because V and F are continuous and C = {(c, y) | y ∈ B, 0 ≤ c ≤ y} is compact, the number b₁ = max_{(c,y)∈C} |V(F(y − c))| exists and is finite. Therefore,
\[
\left|V(y_0) - u(c_0) - \beta u(c_1) - \cdots - \beta^T u(c_T)\right| = \beta^{T+1}\,|V(F(y_T - c_T))| \le \beta^{T+1} b_1 \to 0, \text{ as } T \to \infty.
\]
Therefore,
\[
V(y_0) = \sum_{t=0}^{\infty} \beta^t u(c_t).
\]
I now show that (c₀, c₁, ...) solves problem (1), so that V is not just a fixed point of Q but is the value function for problem (1). Suppose that there exist (ĉ₀, ĉ₁, ...) and (ŷ₁, ŷ₂, ...) such that 0 ≤ ĉ_t ≤ ŷ_t and ŷ_{t+1} = F(ŷ_t − ĉ_t), for all t, and Σ_{t=0}^∞ βᵗu(ĉ_t) = Σ_{t=0}^∞ βᵗu(c_t) + ε, where ε > 0. Let b₂ = max_{c∈B} |u(c)| and let T be so large that
\[
\frac{\beta^T}{1-\beta}\,b_2 < \frac{\varepsilon}{5} \quad\text{and}\quad \beta^{T+1} b_1 < \frac{\varepsilon}{5}.
\]
Then
\[
\sum_{t=0}^{T} \beta^t u(\hat{c}_t) + \beta^{T+1} V(F(\hat{y}_T - \hat{c}_T))
\ge \sum_{t=0}^{T} \beta^t u(\hat{c}_t) - \beta^{T+1} b_1 > \sum_{t=0}^{T} \beta^t u(\hat{c}_t) - \frac{\varepsilon}{5}
\]
\[
\ge \sum_{t=0}^{\infty} \beta^t u(\hat{c}_t) - \frac{\beta^{T+1}}{1-\beta}\, b_2 - \frac{\varepsilon}{5} > \sum_{t=0}^{\infty} \beta^t u(\hat{c}_t) - \frac{2\varepsilon}{5}
\]
\[
= \sum_{t=0}^{\infty} \beta^t u(c_t) + \varepsilon - \frac{2\varepsilon}{5} = \sum_{t=0}^{\infty} \beta^t u(c_t) + \frac{3\varepsilon}{5}
\]
\[
\ge \sum_{t=0}^{T} \beta^t u(c_t) - \frac{\beta^{T+1}}{1-\beta}\, b_2 + \frac{3\varepsilon}{5} > \sum_{t=0}^{T} \beta^t u(c_t) + \frac{2\varepsilon}{5}
\]
\[
> \sum_{t=0}^{T} \beta^t u(c_t) + \beta^{T+1} b_1 + \frac{\varepsilon}{5}
\ge \sum_{t=0}^{T} \beta^t u(c_t) + \beta^{T+1} V(F(y_T - c_T)) + \frac{\varepsilon}{5},
\]
which is impossible, since (c₀, ..., c_T) maximizes Σ_{t=0}^T βᵗu(c_t) + β^{T+1}V(F(y_T − c_T)) over (c₀, ..., c_T) and (y₁, ..., y_T) such that 0 ≤ c₀ ≤ y₀, y₁ = F(y₀ − c₀), and 0 ≤ c_t ≤ y_t and y_{t+1} = F(y_t − c_t), for t = 0, ..., T − 1. This contradiction proves that (c₀, c₁, ...) solves problem (1).
Dynamic programming is the use of the value function to study problems with
a temporal structure like that in growth theory. The value function can be used to
derive properties of the optimal paths and to interpret them.
I next show that V inherits certain properties of u and F.
Theorem: If u and F are concave and non-decreasing, then V is concave and non-decreasing, where V is the unique fixed point of Q.
Suppose that W is concave and non-decreasing. Because u and F are non-decreasing,
it is clear that Q(W) is non-decreasing. I show that Q(W) is concave. Suppose that
ȳ_0 ∈ B and y_0 ∈ B, and 0 < α < 1. I must show that

Q(W)(αȳ_0 + (1 − α)y_0) ≥ αQ(W)(ȳ_0) + (1 − α)Q(W)(y_0).

Let c̄_0 and c_0 be such that

Q(W)(ȳ_0) = u(c̄_0) + βW(F(ȳ_0 − c̄_0))

and

Q(W)(y_0) = u(c_0) + βW(F(y_0 − c_0)),

where 0 ≤ c̄_0 ≤ ȳ_0 and 0 ≤ c_0 ≤ y_0. Then,
0 ≤ αc̄_0 + (1 − α)c_0 ≤ αȳ_0 + (1 − α)y_0

and

Q(W)(αȳ_0 + (1 − α)y_0)
≥ u(αc̄_0 + (1 − α)c_0) + βW(F(α(ȳ_0 − c̄_0) + (1 − α)(y_0 − c_0)))
(by the definition of Q(W))
≥ αu(c̄_0) + (1 − α)u(c_0) + βW(αF(ȳ_0 − c̄_0) + (1 − α)F(y_0 − c_0))
(because u and F are concave and W is non-decreasing)
≥ αu(c̄_0) + (1 − α)u(c_0) + αβW(F(ȳ_0 − c̄_0)) + (1 − α)βW(F(y_0 − c_0))
(because W is concave)
= α[u(c̄_0) + βW(F(ȳ_0 − c̄_0))] + (1 − α)[u(c_0) + βW(F(y_0 − c_0))]
= αQ(W)(ȳ_0) + (1 − α)Q(W)(y_0).
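This argument can be illustrated numerically. The snippet below is a sketch, not part of the notes: it applies the operator Q, specialized to u(c) = ln c and F(K) = 2√(K) with an arbitrary β = 0.9 and grid, to one concave, non-decreasing trial function W and checks that Q(W) is again non-decreasing and concave. The trial function W = ln(1 + y) is an arbitrary choice.

```python
import numpy as np

beta = 0.9
y_grid = np.linspace(0.05, 4.0, 200)

def Q(W):
    """Bellman operator for u(c) = ln c, F(K) = 2*sqrt(K), with W given by
    its values on y_grid and extended by linear interpolation."""
    QW = np.empty_like(y_grid)
    for i, yi in enumerate(y_grid):
        c = np.linspace(1e-4, yi, 400)                # feasible consumptions
        y_next = 2.0 * np.sqrt(yi - c)
        QW[i] = np.max(np.log(c) + beta * np.interp(y_next, y_grid, W))
    return QW

QW = Q(np.log(1.0 + y_grid))   # a concave, non-decreasing trial function W
```

Monotonicity can be checked on the whole grid; concavity is checked at widely spaced points, since on neighboring grid points it is obscured by interpolation error.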
Suppose that not only are u and F concave, but that V is strictly concave, where
again V is the unique fixed point of Q. Then, the problem

max_{c_0: 0 ≤ c_0 ≤ y_0} [u(c_0) + βV(F(y_0 − c_0))]

has at most one solution.
It need not be the case that concave functions have subgradients at points not in
the interior of their domain of definition. Consider the function f(x) = √x defined on
the set of non-negative numbers. This function is concave, since its second derivative
is negative on (0, ∞), but it has no derivative (or an infinite derivative) at zero, and
so has no subgradient there. Of course, zero is not in the interior of its domain of
definition.
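The claim about √x can be verified directly. A subgradient of f at 0 would be a slope m with √x ≤ f(0) + m·x = m·x for all x ≥ 0, and the check below (a sketch, not from the notes) shows that every candidate slope m fails at some x near 0, because √x / x = 1/√x → ∞.

```python
import math

def counterexample(m):
    """For a candidate slope m > 0, return an x >= 0 at which the subgradient
    inequality sqrt(x) <= m * x fails.  Any x in (0, 1/m**2) works, because
    there sqrt(x) / x = 1 / sqrt(x) > m."""
    return 1.0 / (2.0 * m * m)
```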
Theorem: Suppose that u and F are non-decreasing and concave and that u is
differentiable. Suppose that y_0 ∈ int B and let c_0 solve the problem

max_{c: 0 ≤ c ≤ y_0} [u(c) + βV(F(y_0 − c))].

Then V is differentiable at y_0 and DV(y_0) = Du(c_0).

Let λ be a subgradient of V at y_0; one exists because V is concave and y_0 is interior
to B. Because consuming c_0 + ȳ_0 − y_0 out of the initial stock ȳ_0 leaves the
investment ȳ_0 − (c_0 + ȳ_0 − y_0) = y_0 − c_0 unchanged, it follows that

V(ȳ_0) ≥ u(c_0 + ȳ_0 − y_0) + βV(F(y_0 − c_0)),

provided ȳ_0 ∈ B is so close to y_0 that c_0 + ȳ_0 − y_0 ≥ 0. (Clearly, c_0 + ȳ_0 − y_0 ≤ ȳ_0, since
c_0 ≤ y_0. The consumption vector c_0 + ȳ_0 − y_0 absorbs all the change in the initial stock
of goods.) The functions V(y_0) + λ·(ȳ_0 − y_0) and u(c_0 + ȳ_0 − y_0) + βV(F(y_0 − c_0))
are differentiable functions of ȳ_0. Since

u(c_0 + ȳ_0 − y_0) + βV(F(y_0 − c_0)) ≤ V(ȳ_0) ≤ V(y_0) + λ·(ȳ_0 − y_0),

for ȳ_0 near y_0, and all three functions have the same value at ȳ_0 = y_0, it follows that V is
differentiable at y_0 and that all three functions have the same derivative at y_0. That is,
DV(y_0) = Du(c_0) = λ. The argument should be made plausible by the following
picture.
Suppose that (c_0, c_1, ...) and (y_1, y_2, ...) are optimal for problem (1) and that 0 ≪
c_t ≤ y_t, for all t. Assume that u and F are non-decreasing, concave, and differentiable.
Then, the previous theorem implies that DV(y_{t+1}) = Du(c_{t+1}). Because c_t solves
the problem

max_{c: 0 ≤ c ≤ y_t} [u(c) + βV(F(y_t − c))]

and c_t is interior, the first-order condition

Du(c_t) = βDV(y_{t+1})DF(y_t − c_t) = βDu(c_{t+1})DF(y_t − c_t)

holds. This equation is known as Euler's equation. Euler's equation together with the
equation y_{t+1} = F(y_t − c_t) determine the evolution of the optimal path (c_t, y_t)_{t=0}^∞.
The truth of this remark may be seen as follows. You are given y_0. Choose c_0. Then,
y_1 = F(y_0 − c_0) is determined. The Euler equation Du(c_0) = βDu(c_1)DF(y_0 − c_0)
now determines c_1, provided this equation can be solved for c_1.
The equation Du(c_0) = βDu(c_1)DF(y_0 − c_0) determines c_1, provided D²u(c)
exists and is negative definite, for all c, and DF(K) is invertible. That is, the
solution c_1 to this equation is unique, if it exists. To see that this is so, suppose that

Du(c_0) = βDu(c_1)DF(y_0 − c_0)

and

Du(c_0) = βDu(ĉ_1)DF(y_0 − c_0),

where c_1 ≠ ĉ_1. Then,

Du(c_1) = β^{−1}Du(c_0)(DF(y_0 − c_0))^{−1} = Du(ĉ_1).

Hence,

(ĉ_1 − c_1)ᵀ Du(c_1) = (ĉ_1 − c_1)ᵀ Du(ĉ_1).

By the mean value theorem, there is an s with 0 < s < 1 such that

0 = (ĉ_1 − c_1)ᵀ Du(ĉ_1) − (ĉ_1 − c_1)ᵀ Du(c_1) = (ĉ_1 − c_1)ᵀ D²u(c_1 + s(ĉ_1 − c_1))(ĉ_1 − c_1).

Because D²u is negative definite,

(ĉ_1 − c_1)ᵀ D²u(c_1 + s(ĉ_1 − c_1))(ĉ_1 − c_1) < 0.

This contradiction proves that there can be at most one solution c_1 to the equation
Du(c_0) = βDu(c_1)DF(y_0 − c_0). In the case of one commodity, knowledge of
du(c_{t+1})/dc clearly determines c_{t+1}, provided d²u(c_{t+1})/dc² < 0.
In this discrete time optimization problem, Euler's equation and the equation
y_{t+1} = F(y_t − c_t) play the same role as do the maximum principle and the Hamiltonian
system in continuous time models. Both systems determine the evolution of the path.
These evolutionary systems do not determine an optimal path, however. Recall
that in the discussion above, I fixed c_0 as well as the initial output vector y_0. Once
c_0 is fixed, it might or might not be possible to continue defining c_t and y_t, for all t,
for it is possible that for some t, c_t > y_t. That is, the path could become infeasible
and so could not be continued. It is also possible that the path (c_t, y_t)_{t=0}^∞ could be
continued indefinitely, but is nevertheless not optimal because consumption converges
to zero at the same time that output converges to a high level. The next example will
illustrate these possibilities. The key problem, then, is to choose the initial value of
consumption, c_0, correctly. The purpose of transversality conditions is to guide this
choice.
Example: N = 1, u(c) = ln c, and

F(y − c) = 2√(y − c).

Euler's equation is

1/c_t = βDF(y_t − c_t)/c_{t+1} = β/(c_{t+1}√(y_t − c_t)),

so that c_{t+1} = βc_t/√K_t, where K_t = y_t − c_t. Along a path on which consumption
and capital converge, lim_{t→∞} K_t = K̄, where, by Euler's equation, K̄ = β√K̄, so
that √K̄ = β and hence K̄ = β². Let ȳ = 2√K̄ = 2β. Then, lim_{t→∞} c_t = c̄, where
c̄ = ȳ − K̄ = 2√K̄ − K̄ = 2β − β². Since c̄ = λȳ, where λ is the stationary consumption
rate, it follows that 2βλ = 2β − β² and hence λ = (2β − β²)/(2β) = 1 − β/2.
The calculated path, (c_t, K_t)_{t=0}^∞, is optimal, as I will show. One approach to proving
that it is optimal would be to calculate V(y_0) = Σ_{t=0}^∞ β^t u(c_t), where y_0 = 2√K_{−1},
for each value of K_{−1} or y_0, and then to show that Q(V) = V, so that V is the value
function. It would then follow that (c_t, K_t)_{t=0}^∞ is optimal. I do not know how to do
this calculation.
Instead, let a_t = c_t/y_t be the consumption rate, so that

a_{t+1} = βa_t/(2(1 − a_t)).

If we graph this difference equation, we obtain the next figure. Note that if
a = βa/(2(1 − a)), then a = 1 − β/2 = λ, where λ is the stationary consumption rate
calculated earlier.
This path has a_t = 1 − β/2, for all t, and is the path computed earlier. This ends the
discussion of the example.
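The figure's message can be reproduced without the figure. The sketch below is not from the notes: it iterates the consumption-rate recursion a_{t+1} = βa_t/(2(1 − a_t)) with an arbitrary β = 0.9. The fixed point a = 1 − β/2 is unstable: starting slightly above it, the path reaches a_t > 1, meaning c_t > y_t, and becomes infeasible; starting slightly below it, a_t → 0, the path on which consumption collapses while output stays high.

```python
def consumption_rates(a0, beta=0.9, T=60):
    """Iterate a_{t+1} = beta * a_t / (2 * (1 - a_t)), stopping early if the
    path becomes infeasible (a_t >= 1 means consumption exceeds output)."""
    path = [a0]
    for _ in range(T):
        a = path[-1]
        if a >= 1.0:
            break
        path.append(beta * a / (2.0 * (1.0 - a)))
    return path
```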
The fact that paths can be feasible, satisfy Euler's equation, and yet not be optimal
is called the Hahn problem, after Frank Hahn, who first noticed it. The Hahn
problem has led to the search for easy criteria that separate optimal from suboptimal
infinite horizon feasible paths that satisfy Euler's equation. It is not easy to find
conditions sufficient for optimality, but necessary conditions come to hand. One
condition follows from the following train of thought. Return now to the model with
perhaps more than one commodity. Assume that u and F are differentiable,
non-decreasing, and concave, and assume for the sake of presentation that the value
function V is differentiable. Let (c_t, y_t)_{t=0}^∞ be an optimal path and assume that
c_t ≫ 0, for all t. Let λ_t = DV(y_t) = Du(c_t). The β^t λ_t are like Kuhn-Tucker
coefficients. Because V is concave, β^t λ_t is a subgradient of β^t V(y) at y = y_t.
Because β^t λ_t is a subgradient of β^t V at y_t,

β^t V(0) ≤ β^t V(y_t) + β^t λ_t·(0 − y_t),

so that

β^t λ_t·y_t ≤ β^t [V(y_t) − V(0)].

Because y_t ∈ B, y_t is bounded and so V(y_t) is bounded too, and therefore
lim_{t→∞} β^t [V(y_t) − V(0)] = 0. Since λ_t ≥ 0 and y_t ≥ 0, β^t λ_t·y_t ≥ 0, and so
lim_{t→∞} β^t λ_t·y_t = 0. Thus, a necessary condition for optimality is that
lim_{t→∞} β^t λ_t·y_t = 0, where the λ_t are "Kuhn-Tucker coefficients" corresponding
to the infinite horizon problem
max Σ_{t=0}^∞ β^t u(c_t)

s.t. for some y_1, y_2, ... in B,

0 ≤ c_t ≤ y_t and y_{t+1} = F(y_t − c_t), for t ≥ 0,    (4)

with y_0 in B given. These coefficients may be defined even if u is not differentiable,
but in the differentiable case, λ_t = Du(c_t), for all t, provided c_t ≫ 0. Another
common condition derived from the one just stated is that lim_{t→∞} β^t λ_t·K_t = 0, where
K_t = y_t − c_t. These conditions are called transversality conditions, though that
term is more commonly applied to continuous time models.
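In the logarithmic example above, the transversality condition can be evaluated explicitly, since λ_t = Du(c_t) = 1/c_t and hence β^t λ_t y_t = β^t / a_t, where a_t = c_t/y_t is the consumption rate. The sketch below (not from the notes; β = 0.9 and the starting rates are arbitrary) shows that the condition holds along the stationary path, where a_t = 1 − β/2 and the term is β^t/(1 − β/2) → 0, and fails along an overaccumulation path, where a_t falls roughly like (β/2)^t so that β^t / a_t grows roughly like 2^t.

```python
beta = 0.9
star = 1 - beta / 2          # stationary consumption rate, 1 - beta/2

# Transversality terms along the optimal path, where a_t = star for all t.
on_path = [beta ** t / star for t in range(101)]

def off_path_terms(a0, T=100):
    """Transversality terms beta^t / a_t along a path of consumption rates
    a_{t+1} = beta * a_t / (2 * (1 - a_t)) that collapses toward zero."""
    a, terms = a0, []
    for t in range(T):
        terms.append(beta ** t / a)
        a = beta * a / (2.0 * (1.0 - a))
    return terms

off_path = off_path_terms(star - 0.01)
```

The divergence of the off-path terms is what the necessary condition lim β^t λ_t y_t = 0 rules out: that path is feasible and satisfies Euler's equation, yet fails transversality.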
Transversality Conditions
The Hahn problem arises also in continuous time growth models. Infinite horizon
paths that satisfy the maximum principle and the Hamiltonian system may be
suboptimal because consumption converges to zero over time and there is an
overaccumulation of capital. Conditions have been devised to exclude such paths, and
these conditions are called transversality conditions.
Just as in the case of discrete time growth models, there is a value function,

V(y_0) = max_{c:[0,∞)→[0,∞)} ∫_0^∞ e^{−rt} u(c(t)) dt

s.t. c(t) + dK(t)/dt = F(K(t)), for all t, and

c(0) + dK(0)/dt = y_0,

where y_0 is given and positive and r > 0. If λ(t) is the dual or conjugate vector at
time t, then λ(t) is a subgradient of e^{−rt} V(y) at y = y(t), where y(t) = F(K(t)) and
(c(t), K(t)) is an optimal path.
Because V is concave and λ(t) is a subgradient of e^{−rt} V(y) at y = y(t),

λ(t)·y(t) ≤ e^{−rt} [V(y(t)) − V(0)] → 0, as t → ∞.

Therefore, a necessary condition for optimality is that lim_{t→∞} λ(t)·y(t) = 0. A similar
condition is that lim_{t→∞} λ(t)·K(t) = 0. These are typical transversality conditions.
the optimization problem is the river, the value V is zero and hence constant there.
Since x(T) = h(s) is a point on the river, λ(T) is orthogonal to this level curve. That
is, λ(T)·Dh(s) = 0.
16