
Functions of Several Variables

1.1 Euclidean Spaces


Definition 1.1.1. For each positive integer n, let Rn be the set of all ordered n-tuples x = (x1, x2, ..., xn), where x1, x2, ..., xn are real numbers, called the coordinates of x.

The elements of Rn are called points, or vectors, especially when n > 1.


If y = (y1, y2, ..., yn) and if c is a real number, put

x + y = (x1 + y1, x2 + y2, ..., xn + yn),
cx = (cx1, cx2, ..., cxn)

so that x + y ∈ Rn and cx ∈ Rn. This defines addition of vectors, as well as multiplication of a vector by a real number. These two operations satisfy the commutative, associative, and distributive laws and make Rn into a vector space over the real field. The zero element of Rn is the point 0, all of whose coordinates are 0.
We also define the inner product or scalar product of x and y by

x · y = ∑_{i=1}^{n} xi yi

and the norm of x by

|x| = (x · x)^{1/2} = ( ∑_{i=1}^{n} xi² )^{1/2}
The structure now defined (the vector space Rn with the above inner product and norm) is called Euclidean n-space.

Example 1.1.2. If x = (1, 0, −1, 2) and y = (3, 2, 1, 4), then x + y = (4, 2, 0, 6), 2x = (2, 0, −2, 4), and |x| = 6^{1/2}.

Result 1.1.3. Suppose x, y ∈ Rn and c is real. Then

(a) |x| ≥ 0

(b) |x| = 0 if and only if x = 0

(c) |cx| = |c||x|

(d) |x · y| ≤ |x||y| (Schwarz inequality)

(e) |x + y| ≤ |x| + |y| (Triangle inequality)
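These properties lend themselves to a quick numerical sanity check. The following Python sketch (standard library only; the vectors are taken from Example 1.1.2) tests the Schwarz and triangle inequalities, (d) and (e):

```python
# Sanity check of Result 1.1.3 (d) and (e); a sketch, not a proof.
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x = (1.0, 0.0, -1.0, 2.0)
y = (3.0, 2.0, 1.0, 4.0)

assert abs(dot(x, y)) <= norm(x) * norm(y)                            # (d)
assert norm(tuple(a + b for a, b in zip(x, y))) <= norm(x) + norm(y)  # (e)
print(norm(x))  # 6^(1/2), as in Example 1.1.2
```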

1.2 Basis and dimension


Definition 1.2.1. A nonempty set X ⊂ Rn is a vector space if x + y ∈ X and cx ∈ X for all x ∈ X, y ∈ X and for all scalars c.

Definition 1.2.2. If x1, x2, ..., xk ∈ Rn and c1, c2, ..., ck are scalars, then the vector

c1x1 + c2x2 + ... + ckxk

is called a linear combination of x1, x2, ..., xk. If S ⊂ Rn and if E is the set of all linear combinations of elements of S, we say that S spans E, or that E is the span of S.

Exercise 1.2.3. If S is a nonempty subset of a vector space X, prove that the span of S is a vector space.

Example 1.2.4. If x1 = (1, 2, 4) and x2 = (−1, 3, 1), then x = (1, 12, 14) is a linear combination of the vectors x1 and x2, since x = 3x1 + 2x2.

Definition 1.2.5. A set of vectors {x1, x2, ..., xk} is said to be independent if the relation c1x1 + c2x2 + ... + ckxk = 0 implies that c1 = c2 = ... = ck = 0. Otherwise {x1, x2, ..., xk} is said to be dependent.

Observe that no independent set contains the null vector.


Example 1.2.6. A = {(1, 0, 0), (1, 1, 0), (1, 1, 1)} is an independent set whereas
B = {(1, 2), (3, 4), (5, 6)} is a dependent set.
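A convenient numerical test: {x1, ..., xk} is independent exactly when the matrix having these vectors as its rows has rank k. A small sketch for the two sets above, assuming numpy is available:

```python
# Independence test via matrix rank (numpy assumed).
import numpy as np

A = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1]])  # the set A above
B = np.array([[1, 2], [3, 4], [5, 6]])           # the set B above

print(np.linalg.matrix_rank(A) == len(A))  # True:  A is independent
print(np.linalg.matrix_rank(B) == len(B))  # False: B is dependent
```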
Exercise 1.2.7. Prove that every subset of a linearly independent set is linearly independent.
Definition 1.2.8. If a vector space X contains an independent set of r vectors but contains no independent set of r + 1 vectors, we say that X has dimension r and write dim X = r.
Observe that if X contains an independent set of n elements, then dim X ≥ n.
Exercise 1.2.9. Prove that if dim X = r, then we can find linearly independent subsets of X with fewer than r elements.
Definition 1.2.10. An independent subset of a vector space X which spans
X is called a basis of X.
Observe that if B = {x1, x2, ..., xr} is a basis of X, then every x ∈ X has a unique representation of the form x = ∑ cj xj. Such a representation exists since B spans X, and it is unique since B is independent. The numbers c1, ..., cr are called the coordinates of x with respect to the basis B.
The most familiar example of a basis is the set {e1, ..., en}, where ej is the vector in Rn whose jth coordinate is 1 and whose other coordinates are all 0. If x = (x1, ..., xn), then x = ∑_{j=1}^{n} xj ej. We shall call {e1, ..., en} the standard basis of Rn.
Theorem 1.2.11. Let r be a positive integer. If a vector space X is spanned by a set of r vectors, then dim X ≤ r.
Proof. Assume that the result is false. Then there exists a vector space X which is spanned by a set of r vectors, say S0 = {x1, ..., xr}, with dim X > r. Since dim X > r, X contains an independent set Q = {y1, ..., yr+1} of r + 1 elements.
Suppose 0 ≤ i < r and suppose a set Si has been constructed which spans X and which consists of all yj with 1 ≤ j ≤ i plus a certain collection of r − i members of S0, say x1, ..., xr−i. (In other words, Si is obtained from S0 by replacing i of its elements by members of Q, without altering the span.) Since Si spans X, yi+1 is in the span of Si; hence there are scalars a1, ..., ai+1, b1, ..., br−i, with ai+1 = 1, such that

∑_{j=1}^{i+1} aj yj + ∑_{k=1}^{r−i} bk xk = 0.

If all bk's were 0, the independence of Q would force all aj's to be 0, a contradiction. It follows that some xk ∈ Si is a linear combination of the other members of Ti = Si ∪ {yi+1}. Remove this xk from Ti and call the remaining set Si+1. Then Si+1 spans the same set as Ti, namely X. Starting with S0, we thus construct sets S1, ..., Sr. The set Sr consists of y1, ..., yr, and our construction shows that it spans X. But Q is independent; hence yr+1 is not in the span of Sr, which is a contradiction.

(Let’s try to understand the construction of the Si's.
Span of S0 = X and y1 ∈ X imply that there exist scalars a1, b1, b2, ..., br with a1 = 1 such that a1y1 + b1x1 + ... + br−1xr−1 + brxr = 0. If all bk's were 0, the independence of Q would force a1 = 0, a contradiction. So without loss of generality assume that br ≠ 0.
It follows that xr ∈ S0 is a linear combination of {x1, ..., xr−1, y1}. Let T0 = S0 ∪ {y1} = {x1, ..., xr−1, xr, y1} and let S1 = {x1, ..., xr−1, y1}. Since xr is a linear combination of {x1, ..., xr−1, y1}, the span of T0 is the same as the span of S1. But the span of S0 is X, and T0 = S0 ∪ {y1}, so the span of T0 is X. Hence the span of S1 is X.
Span of S1 = X and y2 ∈ X imply that there exist scalars a1, a2, b1, b2, ..., br−1 with a2 = 1 such that a1y1 + a2y2 + b1x1 + ... + br−1xr−1 = 0. If all bk's were 0, the independence of Q would force a2 = 0, a contradiction. So without loss of generality assume that br−1 ≠ 0. It follows that xr−1 ∈ S1 is a linear combination of {x1, ..., xr−2, y1, y2}.
Let T1 = S1 ∪ {y2} = {x1, ..., xr−1, y1, y2} and let S2 = {x1, ..., xr−2, y1, y2}. Then the span of T1 is the same as the span of S2. But the span of T1 is X, so the span of S2 is X; and so on.)

Corollary 1.2.12. dim Rn = n.

Proof. Since {e1, ..., en} spans Rn, Theorem 1.2.11 shows that dim Rn ≤ n. Since {e1, ..., en} is independent, Definition 1.2.8 shows that dim Rn ≥ n.

Theorem 1.2.13. Suppose X is a vector space and dim X = n.

(a) A set E of n vectors in X spans X if and only if E is independent.

(b) X has a basis and every basis consists of n vectors.

(c) If 1 ≤ r ≤ n and {y1, ..., yr} is an independent set in X, then X has a basis containing {y1, ..., yr}.

Proof.
(a) Suppose E = {x1, ..., xn}. Since dim X = n, the set {x1, ..., xn, y} is dependent for every y ∈ X.
Let y ∈ X. If E is independent, then the dependence of {x1, ..., xn, y} gives a relation a1y + b1x1 + ... + bnxn = 0 with a1 ≠ 0 (if a1 were 0, the independence of E would force all the bi to be 0). So y is in the span of E; hence E spans X.
Conversely, if E is dependent, one of its members can be removed without changing the span of E (by the construction of Si+1 from Ti in the proof of Theorem 1.2.11). Hence the span of E equals the span of E′ for some set E′ ⊂ E containing n − 1 elements. By Theorem 1.2.11, dim(span of E′) ≤ n − 1. Hence E cannot span X.
(b) Since dim X = n, X contains an independent set of n vectors, and (a) shows that every such set is a basis of X. Suppose E′ is a basis of X. By the definition of a basis, E′ is an independent set whose span is X. Since dim X = n, every independent set has at most n elements by Definition 1.2.8, so E′ has at most n elements. If E′ had fewer than n elements, then by Theorem 1.2.11, dim X ≤ n − 1, which is impossible. So E′ has exactly n elements.
(c) Let {x1, ..., xn} be a basis of X. The set

S = {y1, ..., yr, x1, ..., xn}

spans X and is dependent, since it contains more than n vectors. The argument used in the proof of Theorem 1.2.11 shows that one of the xi's is a linear combination of the other members of S. If we remove this xi from S, the remaining set still spans X. This process can be repeated r times and leads to a basis of X which contains {y1, ..., yr}, by (a).

1.3 Linear Transformations and Matrices


Definition 1.3.1. A mapping A of a vector space X into a vector space Y is
said to be a linear transformation if

A(x1 + x2 ) = A(x1 ) + A(x2 ) , A(cx) = cA(x)

for all x, x1 , x2 ∈ X and all scalars c. Note that one often writes Ax instead
of A(x) if A is linear.
Exercise 1.3.2. Prove that if the mapping A of a vector space X into a vector
space Y is linear then A(0) = 0.
Observe that a linear transformation A of X into Y is completely determined by its action on any basis. If {x1, ..., xn} is a basis of X, then every x ∈ X has a unique representation of the form

x = ∑_{i=1}^{n} ci xi,

and the linearity of A allows us to compute Ax from the vectors Ax1, ..., Axn and the coordinates c1, ..., cn by the formula

Ax = ∑_{i=1}^{n} ci Axi.    (1.1)

Linear transformations of X into X are often called linear operators on X.


If A is a linear operator on X which is one-to-one and maps X onto X, we
say that A is invertible. In this case we can define an operator A−1 on X by
requiring that A−1(Ax) = x for all x ∈ X.
Exercise 1.3.3. Prove that if A is a linear operator on X which is one-to-one and maps X onto X, then A−1 is linear and invertible.
Exercise 1.3.4. Assume that A is a linear transformation of X into Y and that Ax = 0 only when x = 0. Prove that A is one-to-one.
Theorem 1.3.5. A linear operator A on a finite-dimensional vector space X
is one-to-one if and only if the range of A is all of X.
Proof. Let {x1, ..., xn} be a basis of X. The linearity of A shows that its range R(A) is the span of the set Q = {Ax1, ..., Axn} (by equation 1.1). Therefore, by Theorem 1.2.13(a), Q spans X if and only if Q is independent; that is, R(A) = X if and only if Q is independent. We are going to prove that this happens if and only if A is one-to-one.
Suppose A is one-to-one and ∑ ci Axi = 0. Then by the linearity of A, A(∑ ci xi) = 0. Since A(0) = 0 and A is one-to-one, ∑ ci xi = 0. Since {x1, ..., xn} is a basis of X, we get c1 = ... = cn = 0. So we conclude that Q is independent.
Conversely, suppose Q is independent and A(∑ ci xi) = 0. Then ∑ ci Axi = 0. Hence c1 = ... = cn = 0, and we conclude: Ax = 0 only if x = 0. If Ax = Ay, then A(x − y) = 0, so that x − y = 0, and this says that A is one-to-one.
Thus we have proved that A is one-to-one if and only if Q is independent. From Theorem 1.2.13(a) we know that Q is independent if and only if the span of Q is X, and the span of Q is R(A). We conclude that A is one-to-one if and only if R(A) = X.
Definition 1.3.6.
(a) Let L(X, Y ) be the set of all linear transformations of the vector space
X into the vector space Y .
Instead of L(X, X) we shall simply write L(X). If A1 , A2 ∈ L(X, Y ) and
if c1 , c2 are scalars, define c1 A1 + c2 A2 by
(c1 A1 + c2 A2 )x = c1 A1 x + c2 A2 x (x ∈ X)
It is then clear that c1 A1 + c2 A2 ∈ L(X, Y ).
(b) If X, Y, Z are vector spaces, and if A ∈ L(X, Y) and B ∈ L(Y, Z), we define their product BA to be the composition of A and B:

(BA)x = B(Ax)    (x ∈ X).

Then BA ∈ L(X, Z). Note that BA need not be the same as AB, even if X = Y = Z.
(c) For A ∈ L(Rn, Rm), define the norm ‖A‖ of A to be the sup of all numbers |Ax|, where x ranges over all vectors in Rn with |x| ≤ 1.

Exercise 1.3.7. Prove that |Ax| ≤ ‖A‖|x| holds for all x ∈ Rn.
(Hint: for x ≠ 0, let y = x/|x|; then |Ay| ≤ ‖A‖.)

Exercise 1.3.8. If λ is such that |Ax| ≤ λ|x| for all x ∈ Rn, then ‖A‖ ≤ λ.
(Hint: for |x| ≤ 1, |Ax| ≤ λ; use the defining property of the supremum.)
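For a matrix acting between Euclidean spaces, the norm of Definition 1.3.6 (c) is what numerical libraries call the spectral norm (the largest singular value). The sketch below, assuming numpy, checks the inequality of Exercise 1.3.7 on random vectors; the matrix M is an arbitrary choice:

```python
# ||A|| for a matrix is its largest singular value: np.linalg.norm(M, 2).
import numpy as np

M = np.array([[2.0, 3.0], [4.0, -1.0]])
opnorm = np.linalg.norm(M, 2)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(2)
    # Exercise 1.3.7: |Mx| <= ||M|| |x| (small tolerance for rounding)
    assert np.linalg.norm(M @ x) <= opnorm * np.linalg.norm(x) + 1e-9
print(opnorm)
```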


Example 1.3.9. To every A ∈ L(Rn, R1) there corresponds a unique y ∈ Rn such that Ax = x · y; moreover, ‖A‖ = |y|.
Let {e1 , . . . , en } be the standard basis of Rn , and let
y = A (e1 ) e1 + · · · + A (en ) en
Then for any x = c1 e1 + · · · + cn en we have
A(x) = c1 A (e1 ) + · · · + cn A (en )
=y·x
There can be at most one such y: if also A(x) = z · x for all x, then

|y − z|² = y·y − y·z − z·y + z·z = A(y) − A(z) − A(y) + A(z) = 0.

By the Schwarz inequality (Result 1.1.3 (d)) we have

|A(x)| = |y · x| ≤ |y||x|

for all x, so by Exercise 1.3.8, ‖A‖ ≤ |y|.
On the other hand, A(y) = y · y = |y|², so by Exercise 1.3.7, |y|² = |Ay| ≤ ‖A‖|y|, which implies that ‖A‖ ≥ |y|.

Theorem 1.3.10.

(a) If A ∈ L(Rn, Rm), then ‖A‖ < ∞ and A is a uniformly continuous mapping of Rn into Rm.

(b) If A, B ∈ L(Rn, Rm) and c is a scalar, then

‖A + B‖ ≤ ‖A‖ + ‖B‖,    ‖cA‖ = |c|‖A‖.

With the distance between A and B defined as ‖A − B‖, L(Rn, Rm) is a metric space.

(c) If A ∈ L(Rn, Rm) and B ∈ L(Rm, Rk), then

‖BA‖ ≤ ‖B‖‖A‖.

Proof.

(a) Let {e1, ..., en} be the standard basis in Rn and suppose x = ∑ ci ei with |x| ≤ 1, so that |ci| ≤ 1 for i = 1, ..., n. Then by Results 1.1.3 (c) and (e),

|Ax| = |∑ ci Aei| ≤ ∑ |ci||Aei| ≤ ∑ |Aei|,

so that

‖A‖ ≤ ∑_{i=1}^{n} |Aei| < ∞.

By Exercise 1.3.7, |Ax − Ay| = |A(x − y)| ≤ ‖A‖|x − y| for x, y ∈ Rn, so A is uniformly continuous.
(b) We have

|(A + B)x| = |Ax + Bx| ≤ |Ax| + |Bx| ≤ (‖A‖ + ‖B‖)|x|,

so by Exercise 1.3.8, ‖A + B‖ ≤ ‖A‖ + ‖B‖. The second part is proved in the same manner.
If A, B, C ∈ L(Rn, Rm), then we have:
(i) ‖A − B‖ ≥ 0; and ‖A − B‖ = 0 implies that A(x) = B(x) for every x ∈ Rn with |x| ≤ 1, so for 0 ≠ x ∈ Rn it gives A(x/|x|) = B(x/|x|), hence Ax = Bx. So A = B.
(ii) ‖A − B‖ = ‖B − A‖.
(iii) The triangle inequality:

‖A − C‖ = ‖(A − B) + (B − C)‖ ≤ ‖A − B‖ + ‖B − C‖.

So L(Rn, Rm) is a metric space with the norm metric.

(c) By Exercises 1.3.7 and 1.3.8, |BAx| = |B(Ax)| ≤ ‖B‖|Ax| ≤ ‖B‖‖A‖|x|, so

‖BA‖ ≤ ‖B‖‖A‖.

Theorem 1.3.11. Let Ω be the set of all invertible linear operators on Rn.

(a) If A ∈ Ω, B ∈ L(Rn), and

‖B − A‖ ‖A−1‖ < 1,

then B ∈ Ω.

(b) Ω is an open subset of L(Rn ) and the mapping A → A−1 is continuous


on Ω.
Proof. (a) Put ‖A−1‖ = 1/α and ‖B − A‖ = β; the hypothesis gives β < α. For every x ∈ Rn,

α|x| = α|A−1Ax| ≤ α‖A−1‖|Ax| = |Ax| ≤ |(A − B)x| + |Bx| ≤ β|x| + |Bx|,

so that

(α − β)|x| ≤ |Bx|    (x ∈ Rn).    (1.2)

Since α − β > 0, equation 1.2 shows that Bx ≠ 0 if x ≠ 0. Hence B is 1-1. By Theorem 1.3.5, B ∈ Ω.
(b) Let A ∈ Ω with ‖A−1‖ = 1/α, and let Nα(A) consist of all C ∈ L(Rn) with ‖C − A‖ < α. By (a), Nα(A) ⊂ Ω. So Ω is an open subset of L(Rn).
Next we prove that the function f : Ω → Ω defined by f(A) = A−1 is continuous. Replacing x by B−1y in equation 1.2, the resulting inequality

(α − β)|B−1y| ≤ |BB−1y| = |y|    (y ∈ Rn)

implies that

|B−1y| ≤ |y| / (α − β)    (y ∈ Rn).

So by Exercise 1.3.8, ‖B−1‖ ≤ 1/(α − β). The identity

B−1 − A−1 = B−1(A − B)A−1,

combined with Theorem 1.3.10 (c), therefore implies that

‖B−1 − A−1‖ ≤ ‖B−1‖ ‖A − B‖ ‖A−1‖ ≤ β / (α(α − β)).

If B → A, then β → 0, so ‖B−1 − A−1‖ → 0; that is, f(B) → f(A).
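The final bound can be observed numerically. In the sketch below (numpy assumed; A and the perturbation are arbitrary choices) both the hypothesis β < α and the inequality ‖B−1 − A−1‖ ≤ β/(α(α − β)) are checked:

```python
# Numerical illustration of Theorem 1.3.11; it observes the bound,
# it does not prove it.
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])
alpha = 1.0 / np.linalg.norm(np.linalg.inv(A), 2)  # ||A^{-1}|| = 1/alpha
B = A + np.array([[0.1, 0.0], [0.0, -0.1]])        # a small perturbation
beta = np.linalg.norm(B - A, 2)
assert beta < alpha                                # hypothesis of part (a)

lhs = np.linalg.norm(np.linalg.inv(B) - np.linalg.inv(A), 2)
print(lhs <= beta / (alpha * (alpha - beta)))      # True
```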

Matrices:
Suppose {x1 , ..., xn } and {y1 , ..., ym } are bases of vector spaces X and Y re-
spectively. Then every A ∈ L(X, Y ) determines a set of numbers aij such that

Axj = ∑_{i=1}^{m} aij yi    (1 ≤ j ≤ n)    (1.3)

It is convenient to visualize these numbers in a rectangular array of m rows


and n columns, called an m by n matrix
 
[A] = [ a11 a12 ··· a1n
        a21 a22 ··· a2n
        ···
        am1 am2 ··· amn ]

Observe that the coordinates aij of the vector Axj (with respect to the basis {y1, ..., ym}) appear in the jth column of [A]. The vectors Axj are therefore sometimes called the column vectors of [A]. With this terminology, the range of A is spanned by the column vectors of [A].
If x = ∑ cj xj, the linearity of A, combined with 1.3, shows that

Ax = ∑_{i=1}^{m} ( ∑_{j=1}^{n} aij cj ) yi.    (1.4)

Thus the coordinates of Ax are Σj aij cj . Note that in 1.3 the summation ranges
over the first subscript of aij , but that we sum over the second subscript when
computing coordinates.
Suppose next that an m by n matrix is given, with real entries aij . If A
is then defined by 1.4, it is clear that A ∈ L(X, Y ) and that [A] is the given
matrix. Thus there is a natural 1-1 correspondence between L(X, Y ) and the
set of all real m by n matrices.
Example 1.3.12. Let A : R2 → R2 be defined by A(x1, x2) = (2x1 + 3x2, 4x1 − x2). Then

[A] = [ 2  3
        4 −1 ]

Example 1.3.13. Let

[B′] = [ 2  4
         6  8 ]

Then 1.4 defines B : R2 → R2 by B(x1, x2) = (2x1 + 4x2, 6x1 + 8x2); then B ∈ L(R2) and [B] = [B′].
We emphasize that [A] depends not only on A but also on the choice of bases in X and Y. The same A may give rise to many different matrices if we change bases, and vice versa.
If Z is a third vector space, with basis {z1, ..., zp}, if A is given by 1.3, and if

Byi = ∑_k bki zk,    (BA)xj = ∑_k ckj zk,

then A ∈ L(X, Y), B ∈ L(Y, Z), BA ∈ L(X, Z), and since

B(Axj) = B( ∑_i aij yi ) = ∑_i aij Byi = ∑_i aij ( ∑_k bki zk ) = ∑_k ( ∑_i bki aij ) zk,

the independence of {z1, ..., zp} implies that

ckj = ∑_i bki aij    (1 ≤ k ≤ p, 1 ≤ j ≤ n).    (1.5)
This shows how to compute the p by n matrix [BA] from [B] and [A]. If we
define the product [B][A] to be [BA], then 1.5 describes the usual rule of matrix
multiplication.
Finally, suppose {x1, ..., xn} and {y1, ..., ym} are the standard bases of Rn and Rm, and A is given by 1.4. The Schwarz inequality shows that

|Ax|² = ∑_i ( ∑_j aij cj )² ≤ ∑_i ( ∑_j aij² )( ∑_j cj² ) = ( ∑_{i,j} aij² ) |x|².

Thus, by Exercise 1.3.8,

‖A‖ ≤ { ∑_{i,j} aij² }^{1/2}.    (1.6)
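In numerical-library terms, 1.6 says the operator norm is dominated by the Frobenius norm (the square root of the sum of the squared entries). A one-line check with numpy on a randomly chosen matrix:

```python
# Inequality 1.6: spectral norm <= Frobenius norm (numpy assumed).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
print(np.linalg.norm(A, 2) <= np.linalg.norm(A, 'fro'))  # True
```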

Theorem 1.3.14. If S is a metric space, if a11 , . . . , amn are real continuous


functions on S, and if, for each p ∈ S, Ap is the linear transformation of Rn into
Rm whose matrix has entries aij (p), then the mapping p → Ap is a continuous
mapping of S into L (Rn , Rm ) .
Proof. Here the aij's are continuous functions from S to R for 1 ≤ i ≤ m, 1 ≤ j ≤ n. For p ∈ S, consider the m by n matrix

[Ap] = [ a11(p) a12(p) ··· a1n(p)
         a21(p) a22(p) ··· a2n(p)
         ···
         am1(p) am2(p) ··· amn(p) ]

and let Ap ∈ L(Rn, Rm) be the corresponding linear transformation. If q → p, then aij(q) → aij(p), so aij(q) − aij(p) → 0. Applying equation 1.6 to Aq − Ap in place of A, we get ‖Aq − Ap‖ → 0 as q → p.

1.4 Differentiation
Definition 1.4.1. Suppose E is an open set in Rn, f maps E into Rm, and x ∈ E. If there exists a linear transformation A of Rn into Rm such that

lim_{h→0} |f(x + h) − f(x) − Ah| / |h| = 0,    (1.7)

then we say that f is differentiable at x, and we write

f′(x) = A.    (1.8)

If f is differentiable at every x ∈ E, we say that f is differentiable in E.


Remarks
(a) Equation 1.7 can be rewritten in the form

f(x + h) − f(x) = f′(x)h + r(h),    (1.9)

where the remainder r(h) satisfies

lim_{h→0} |r(h)| / |h| = 0.    (1.10)

(b) Suppose f and E are as in Definition 1.4.1 and f is differentiable in E. For every x ∈ E, f′(x) is then a function, namely, a linear transformation of Rn into Rm. But f′ is also a function: f′ maps E into L(Rn, Rm).

(c) Equation 1.9 shows that f is continuous at any point at which f is differentiable.
(Hint: every linear transformation is continuous by Theorem 1.3.10, and lim_{h→0} r(h) = 0.)

(d) The derivative defined by 1.7 or 1.9 is often called the differential of f at
x, or the total derivative of f at x.
Theorem 1.4.2. Suppose E and f are as in Definition 1.4.1, x ∈ E, and equation 1.7 holds with A = A1 and with A = A2. Then A1 = A2.
Proof. Put B = A1 − A2. The inequality

|Bh| ≤ |f(x + h) − f(x) − A1h| + |f(x + h) − f(x) − A2h|

shows that |Bh| / |h| → 0 as h → 0. For fixed h ≠ 0, th → 0 as t → 0, so it follows that

|B(th)| / |th| → 0 as t → 0.    (1.11)

The linearity of B shows that the left side of 1.11 equals |t||Bh| / (|t||h|) = |Bh| / |h|, which is independent of t. Thus Bh = 0 for every h ∈ Rn. Hence B = 0.
Example 1.4.3. If A ∈ L(Rn, Rm) and x ∈ Rn, then A′(x) = A.
Proof. A(x + h) − A(x) = Ah by the linearity of A. With f(x) = Ax, the numerator in equation 1.7 is thus 0 for every h ∈ Rn.
Theorem 1.4.4. Suppose E is an open set in Rn, f maps E into Rm, f is differentiable at x0 ∈ E, g maps an open set containing f(E) into Rk, and g is differentiable at f(x0). Then the mapping F of E into Rk defined by

F(x) = g(f(x))

is differentiable at x0, and

F′(x0) = g′(f(x0)) f′(x0).

Proof. Put y0 = f(x0), A = f′(x0), B = g′(y0), and define

u(h) = f(x0 + h) − f(x0) − Ah,
v(k) = g(y0 + k) − g(y0) − Bk,

for all h ∈ Rn and k ∈ Rm for which f(x0 + h) and g(y0 + k) are defined. Then

|u(h)| = ε(h)|h|,    |v(k)| = η(k)|k|,    (1.12)

where ε(h) → 0 as h → 0 and η(k) → 0 as k → 0.
Given h, put k = f(x0 + h) − f(x0). Then

|k| = |Ah + u(h)| ≤ [‖A‖ + ε(h)]|h|,    (1.13)

and

F(x0 + h) − F(x0) − BAh = g(f(x0 + h)) − g(f(x0)) − BAh
= g(y0 + k) − g(y0) − BAh
= B(k − Ah) + v(k)
= Bu(h) + v(k).

Hence equations 1.12 and 1.13 imply, for h ≠ 0, that

|F(x0 + h) − F(x0) − BAh| / |h| ≤ ‖B‖ε(h) + [‖A‖ + ε(h)]η(k).

Let h → 0. Then ε(h) → 0. Also, k → 0 by equation 1.13, so that η(k) → 0. So

lim_{h→0} |F(x0 + h) − F(x0) − BAh| / |h| = 0.

It follows that F′(x0) = BA.
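The conclusion F′(x0) = g′(f(x0)) f′(x0) is easy to verify by finite differences. In the Python/numpy sketch below, the maps f, g and the point x0 are arbitrary choices, and `jacobian` is a hypothetical helper that approximates the derivative of Definition 1.4.1 column by column:

```python
# Finite-difference check of the chain rule F'(x0) = g'(f(x0)) f'(x0).
import numpy as np

def f(x):    # R^2 -> R^2
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def g(y):    # R^2 -> R^1
    return np.array([np.sin(y[0]) + y[1] ** 2])

def jacobian(fun, x, eps=1e-6):
    # crude forward-difference approximation, one column per direction
    fx = fun(x)
    cols = [(fun(x + eps * e) - fx) / eps for e in np.eye(len(x))]
    return np.column_stack(cols)

x0 = np.array([0.5, -1.0])
lhs = jacobian(lambda x: g(f(x)), x0)        # F'(x0)
rhs = jacobian(g, f(x0)) @ jacobian(f, x0)   # g'(f(x0)) f'(x0)
print(np.allclose(lhs, rhs, atol=1e-4))      # True up to discretization
```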



Partial derivatives. We again consider a function f that maps an open set E ⊂ Rn into Rm. Let {e1, ..., en} and {u1, ..., um} be the standard bases of Rn and Rm. The components of f are the real functions f1, ..., fm defined by

f(x) = ∑_{i=1}^{m} fi(x) ui    (x ∈ E),    (1.14)

or, equivalently, by fi(x) = f(x) · ui, 1 ≤ i ≤ m.
For x ∈ E, 1 ≤ i ≤ m, 1 ≤ j ≤ n we define

(Dj fi)(x) = lim_{t→0} [fi(x + tej) − fi(x)] / t,    (1.15)

provided the limit exists; it is called the partial derivative of fi with respect to xj.

Theorem 1.4.5. Suppose f maps an open set E ⊂ Rn into Rm and f is differentiable at a point x ∈ E. Then the partial derivatives (Dj fi)(x) exist, and

f′(x)ej = ∑_{i=1}^{m} (Dj fi)(x) ui    (1 ≤ j ≤ n),    (1.16)

where {e1, ..., en} and {u1, ..., um} are the standard bases of Rn and Rm.

Proof. Fix j. Since f is differentiable at x, by equation 1.9

f(x + tej) − f(x) = f′(x)(tej) + r(tej),

where |r(tej)| / t → 0 as t → 0. The linearity of f′(x) shows that

lim_{t→0} [f(x + tej) − f(x)] / t = f′(x)ej.    (1.17)

If we now represent f in terms of its components, as in equation 1.14, then equation 1.17 becomes

lim_{t→0} ∑_{i=1}^{m} { [fi(x + tej) − fi(x)] / t } ui = f′(x)ej.    (1.18)

It follows that each quotient in this sum has a limit as t → 0, so that each (Dj fi)(x) exists, and then equation 1.16 follows from equation 1.18.
Let [f′(x)] be the matrix of the linear transformation f′(x) with respect to our standard bases. Then f′(x)ej is the jth column vector of [f′(x)], and 1.16 shows that the number (Dj fi)(x) occupies the spot in the ith row and jth column of [f′(x)]. Thus

[f′(x)] = [ (D1 f1)(x) ··· (Dn f1)(x)
            ···
            (D1 fm)(x) ··· (Dn fm)(x) ]

If h = ∑ hj ej is any vector in Rn, then equation 1.16 implies that

f′(x)h = ∑_{i=1}^{m} { ∑_{j=1}^{n} (Dj fi)(x) hj } ui.
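Concretely, once the partials are known, [f′(x)] is assembled entry by entry, and f(x + h) − f(x) ≈ [f′(x)]h for small h. A Python/numpy sketch for an arbitrarily chosen map:

```python
# For f(x1,x2) = (x1^2 x2, 5 x1 + sin x2), the matrix of partials is
# [[2 x1 x2, x1^2], [5, cos x2]]; the difference f(x+h)-f(x) should be
# close to [f'(x)] h for small h.
import numpy as np

def f(x):
    return np.array([x[0] ** 2 * x[1], 5 * x[0] + np.sin(x[1])])

def fprime(x):
    return np.array([[2 * x[0] * x[1], x[0] ** 2],
                     [5.0, np.cos(x[1])]])

x = np.array([1.0, 2.0])
h = np.array([1e-4, -2e-4])
print(f(x + h) - f(x))   # close to ...
print(fprime(x) @ h)     # ... the linear term [f'(x)] h
```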

Example 1.4.6. From Theorem 1.4.5, if f is differentiable at a point x ∈ E, then all its partial derivatives exist. The converse fails: define f on R2 by

f(x, y) = 0 if (x, y) = (0, 0),    f(x, y) = xy / (x² + y²) if (x, y) ≠ (0, 0).

By using equation 1.15,

(D1 f)(0, 0) = lim_{t→0} [f(t, 0) − f(0, 0)] / t = 0,
(D2 f)(0, 0) = lim_{t→0} [f(0, t) − f(0, 0)] / t = 0.

However, f(x, y) is not continuous at (0, 0): if (x, y) → (0, 0) along the line y = x, then f(x, y) = 1/2, while if (x, y) → (0, 0) along the x-axis, then f(x, y) = 0. So lim_{(x,y)→(0,0)} f(x, y) does not exist. Hence f is not continuous at (0, 0) and, by Remark (c), not differentiable at (0, 0).
Definition 1.4.7. Let f be a real valued differentiable function with domain E, and let x ∈ E. The gradient of f at x is defined by

∇f(x) = ∑_{i=1}^{n} (Di f)(x) ei.

Definition 1.4.8. Let f be a real valued differentiable function with domain E. Fix x ∈ E and let u ∈ Rn be a unit vector (that is, |u| = 1). Then

lim_{t→0} [f(x + tu) − f(x)] / t

is called the directional derivative of f at x in the direction of the unit vector u, and is denoted by (Du f)(x).
By a simple calculation we can show that

(Du f)(x) = (∇f)(x) · u.    (1.19)

If f and x are fixed but u varies, then 1.19 shows that (Du f)(x) attains its maximum when u is a positive scalar multiple of (∇f)(x).
If u = ∑ ui ei, then 1.19 shows that (Du f)(x) can be expressed in terms of the partial derivatives of f at x by the formula

(Du f)(x) = ∑_{i=1}^{n} (Di f)(x) ui.
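Equation 1.19 can be checked with a difference quotient. A Python/numpy sketch for the arbitrary choice f(x1, x2) = x1² + 3x1x2, whose gradient is (2x1 + 3x2, 3x1):

```python
# Difference-quotient check of (D_u f)(x) = grad f(x) . u.
import numpy as np

def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

x = np.array([1.0, 2.0])
grad = np.array([2 * x[0] + 3 * x[1], 3 * x[0]])  # (8, 3) at x = (1, 2)
u = np.array([0.6, 0.8])                          # a unit vector
t = 1e-6
print((f(x + t * u) - f(x)) / t)  # difference quotient ...
print(grad @ u)                   # ... tends to grad . u = 7.2
```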

Result 1.4.9. Suppose f is a continuous mapping of [a, b] into Rk and f is differentiable in (a, b). Then there exists x ∈ (a, b) such that

|f(b) − f(a)| ≤ (b − a)|f′(x)|.

Theorem 1.4.10. Suppose f maps a convex open set E ⊂ Rn into Rm, f is differentiable in E, and there is a real number M such that

‖f′(x)‖ ≤ M

for every x ∈ E. Then

|f(b) − f(a)| ≤ M|b − a|

for all a ∈ E, b ∈ E.
Proof. Fix a ∈ E, b ∈ E. Define

γ(t) = (1 − t)a + tb

for all t ∈ R1 such that γ(t) ∈ E. Since E is convex, γ(t) ∈ E if 0 ≤ t ≤ 1.


Put

g(t) = f(γ(t)).

Then

g′(t) = f′(γ(t))γ′(t) = f′(γ(t))(b − a),

so that

|g′(t)| ≤ ‖f′(γ(t))‖ |b − a| ≤ M|b − a|

for all t ∈ [0, 1]. By Result 1.4.9,

|g(1) − g(0)| ≤ M|b − a|.

But g(0) = f(a) and g(1) = f(b). This completes the proof.
Corollary 1.4.11. If f′(x) = 0 for all x ∈ E, then f is constant.
Proof. The hypotheses of the above theorem hold with M = 0, so

|f(b) − f(a)| ≤ 0 · |b − a|

for all a ∈ E, b ∈ E. This implies that |f(b) − f(a)| = 0; that is, f(b) = f(a) for all a ∈ E, b ∈ E.
Definition 1.4.12. A differentiable mapping f of an open set E ⊂ Rn into Rm is said to be continuously differentiable in E if f′ is a continuous mapping of E into L(Rn, Rm). More explicitly: to every x ∈ E and to every ε > 0 there corresponds a δ > 0 such that

‖f′(y) − f′(x)‖ < ε

if y ∈ E and |x − y| < δ. If this is so, we also say that f is a C′-mapping, or that f ∈ C′(E).
Result 1.4.13. Mean Value theorem: If f is a real continuous function on
[a, b] which is differentiable in (a, b), then there is a point x ∈ (a, b) at which

f (b) − f (a) = (b − a)f 0 (x)

Theorem 1.4.14. Suppose f maps an open set E ⊂ Rn into Rm. Then f ∈ C′(E) if and only if the partial derivatives Dj fi exist and are continuous on E for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
Proof. Assume first that f ∈ C′(E). By equation 1.16,

(Dj fi)(x) = (f′(x)ej) · ui

for all i, j, and for all x ∈ E. Hence

(Dj fi)(y) − (Dj fi)(x) = {[f′(y) − f′(x)] ej} · ui,

and since |ui| = |ej| = 1, it follows from Result 1.1.3 (d) and Exercise 1.3.7 that

|(Dj fi)(y) − (Dj fi)(x)| ≤ |[f′(y) − f′(x)] ej| ≤ ‖f′(y) − f′(x)‖.

Since f ∈ C′(E), to every ε > 0 there corresponds a δ > 0 such that

|(Dj fi)(y) − (Dj fi)(x)| < ε

if y ∈ E and |x − y| < δ. Hence Dj fi is continuous.


For the converse, consider the case m = 1. Fix x ∈ E and ε > 0. Since E is open, there is an open ball S ⊂ E, with center at x and radius r, and the continuity of the functions Dj f shows that r can be chosen so that

|(Dj f)(y) − (Dj f)(x)| < ε/n    (y ∈ S, 1 ≤ j ≤ n).    (1.20)

Suppose h = ∑ hj ej with |h| < r. Put v0 = 0 and vk = h1e1 + ··· + hkek for 1 ≤ k ≤ n. Then

f(x + h) − f(x) = ∑_{j=1}^{n} [f(x + vj) − f(x + vj−1)].    (1.21)

Since |vk| ≤ |h| < r for 1 ≤ k ≤ n and since S is convex, the segments with end points x + vj−1 and x + vj lie in S. Since vj = vj−1 + hjej, the mean value theorem (Result 1.4.13) shows that the jth summand in 1.21 is equal to

hj (Dj f)(x + vj−1 + θj hj ej)

for some θj ∈ (0, 1), and this differs from hj (Dj f)(x) by less than |hj| ε/n, using 1.20.
[To understand this, let's consider the case j = 2. Let x = (x1, ..., xn) and h = (h1, ..., hn). Then

f(x + v2) − f(x + v1)
= f((x1, x2, ..., xn) + (h1, h2, 0, ..., 0)) − f((x1, ..., xn) + (h1, 0, 0, ..., 0))
= f(x1 + h1, x2 + h2, x3, ..., xn) − f(x1 + h1, x2, x3, ..., xn).

So consider the function g2 : [x2, x2 + h2] → R defined by g2(t) = f(x1 + h1, t, x3, ..., xn); by the mean value theorem (Result 1.4.13) there exists c2 ∈ (x2, x2 + h2) such that

g2(x2 + h2) − g2(x2) = h2 g2′(c2),

and c2 ∈ (x2, x2 + h2) implies that c2 = x2 + θ2h2 for some θ2 ∈ (0, 1).]
By equation 1.21, it follows that

|f(x + h) − f(x) − ∑_{j=1}^{n} hj (Dj f)(x)| ≤ (1/n) ∑_{j=1}^{n} |hj| ε ≤ ε|h|

for all h such that |h| < r.
This says that f is differentiable at x and that f′(x) is the linear function which assigns the number ∑ hj (Dj f)(x) to the vector h = ∑ hj ej. The matrix [f′(x)] consists of the row (D1 f)(x), ..., (Dn f)(x); and since D1 f, ..., Dn f are continuous functions on E, by Theorem 1.3.14, f ∈ C′(E).
1.5 The Contraction Principle
Definition 1.5.1. Let X be a metric space, with metric d. If ϕ maps X into
X and if there is a number c < 1 such that

d(ϕ(x), ϕ(y)) ≤ c d(x, y) (1.22)

for all x, y ∈ X, then ϕ is said to be a contraction of X into X.

Theorem 1.5.2. If X is a complete metric space, and if ϕ is a contraction of


X into X, then there exists one and only one x ∈ X such that ϕ(x) = x.

Proof. The uniqueness is a triviality: if ϕ(x) = x and ϕ(y) = y, then equation 1.22 gives d(x, y) ≤ c d(x, y), which happens only when d(x, y) = 0.
Let’s prove the existence part.
Pick x0 ∈ X arbitrarily, and define {xn } recursively, by setting

xn+1 = ϕ (xn ) (n = 0, 1, 2, . . .)

Choose c < 1 so that equation 1.22 holds. For n ≥ 1 then we have

d (xn+1 , xn ) = d (ϕ (xn ) , ϕ (xn−1 )) ≤ c d (xn , xn−1 )

Hence induction gives

d(xn+1, xn) ≤ c^n d(x1, x0)    (n = 0, 1, 2, ...)

If n < m, it follows that

d(xn, xm) ≤ ∑_{i=n+1}^{m} d(xi, xi−1)
         ≤ (c^n + c^{n+1} + ··· + c^{m−1}) d(x1, x0)
         ≤ (1 − c)^{−1} c^n d(x1, x0).
 

Thus {xn} is a Cauchy sequence. Since X is complete, lim xn = x for some x ∈ X.
Since ϕ is a contraction, ϕ is continuous on X. Hence

ϕ(x) = lim_{n→∞} ϕ(xn) = lim_{n→∞} xn+1 = x.
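The proof is an algorithm: pick any x0 and iterate ϕ; the sequence converges to the unique fixed point. A Python sketch with the classical contraction ϕ(x) = cos x on [0, 1], where |ϕ′(x)| = |sin x| ≤ sin 1 < 1:

```python
# Fixed-point iteration as in the proof of Theorem 1.5.2.
import math

x = 0.0                    # any starting point in [0, 1] works
for _ in range(100):
    x = math.cos(x)        # x_{n+1} = phi(x_n)
print(x, math.cos(x) - x)  # ~0.7390851, residual ~0
```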
1.6 The Inverse Function Theorem
Theorem 1.6.1. Suppose f is a C′-mapping of an open set E ⊂ Rn into Rn, f′(a) is invertible for some a ∈ E, and b = f(a). Then

(a) there exist open sets U and V in Rn such that a ∈ U, b ∈ V, f is one-to-one on U, and f(U) = V;

(b) if g is the inverse of f [which exists, by (a)], defined in V by

g(f(x)) = x    (x ∈ U),

then g ∈ C′(V).
Proof. (a) Put f′(a) = A, and choose λ so that

2λ‖A−1‖ = 1.    (1.23)

Since f′ is continuous at a, there is an open ball U ⊂ E, with center at a, such that

‖f′(x) − A‖ < λ    (x ∈ U).    (1.24)

We associate to each y ∈ Rn a function ϕ, defined by

ϕ(x) = x + A−1(y − f(x))    (x ∈ E).    (1.25)

Note that f(x) = y if and only if x is a fixed point of ϕ.
Since ϕ′(x) = I − A−1f′(x) = A−1(A − f′(x)), 1.23 and 1.24 imply that

‖ϕ′(x)‖ < 1/2    (x ∈ U).

Hence

|ϕ(x1) − ϕ(x2)| ≤ (1/2)|x1 − x2|    (x1, x2 ∈ U)    (1.26)

by Theorem 1.4.10. It follows that ϕ has at most one fixed point in U, so that f(x) = y for at most one x ∈ U. Thus f is 1-1 in U.
Next, put V = f(U), and pick y0 ∈ V. Then y0 = f(x0) for some x0 ∈ U. Let B be an open ball with center at x0 and radius r > 0, so small that its closure B̄ lies in U. We will show that y ∈ V whenever |y − y0| < λr. This proves that V is open.
Fix y with |y − y0| < λr. With ϕ as in 1.25,

|ϕ(x0) − x0| = |A−1(y − y0)| < ‖A−1‖λr = r/2.

If x ∈ B̄, it therefore follows from 1.26 that

|ϕ(x) − x0| ≤ |ϕ(x) − ϕ(x0)| + |ϕ(x0) − x0| < (1/2)|x − x0| + r/2 ≤ r;

hence ϕ(x) ∈ B. Note that 1.26 holds if x1 ∈ B̄, x2 ∈ B̄.
Thus ϕ is a contraction of B̄ into B̄. Being a closed subset of Rn, B̄ is complete. Theorem 1.5.2 therefore implies that ϕ has a fixed point x ∈ B̄. For this x, f(x) = y. Thus y ∈ f(B̄) ⊂ f(U) = V.
This proves part (a) of the theorem.
(b) Pick y ∈ V, y + k ∈ V. Then there exist x ∈ U, x + h ∈ U so that y = f(x), y + k = f(x + h). With ϕ as in 1.25,

ϕ(x + h) − ϕ(x) = h + A−1[f(x) − f(x + h)] = h − A−1k.

By 1.26, |h − A−1k| ≤ (1/2)|h|. Hence |A−1k| ≥ (1/2)|h|, and

|h| ≤ 2‖A−1‖|k| = λ−1|k|.    (1.27)

By 1.23, 1.24, and Theorem 1.3.11, f′(x) has an inverse, say T. Since

g(y + k) − g(y) − Tk = h − Tk = −T[f(x + h) − f(x) − f′(x)h],

1.27 implies

|g(y + k) − g(y) − Tk| / |k| ≤ (‖T‖/λ) · |f(x + h) − f(x) − f′(x)h| / |h|.

As k → 0, 1.27 shows that h → 0. The right side of the last inequality thus tends to 0; hence the same is true of the left. We have thus proved that g′(y) = T. But T was chosen to be the inverse of f′(x) = f′(g(y)). Thus

g′(y) = {f′(g(y))}−1    (y ∈ V).    (1.28)

Finally, note that g is a continuous mapping of V onto U (since g is differentiable), that f′ is a continuous mapping of U into the set Ω of all invertible elements of L(Rn), and that inversion is a continuous mapping of Ω onto Ω, by Theorem 1.3.11. If we combine these facts with equation 1.28, we see that g ∈ C′(V). This completes the proof.
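The proof of part (a) is constructive: the point g(y) is the fixed point of the ϕ of equation 1.25, so it can be computed by iteration. A Python/numpy sketch for an arbitrarily chosen map with f′(0, 0) = I; this illustrates the scheme, not the theorem itself:

```python
# Solving f(x) = y near a = (0,0) by iterating phi(x) = x + A^{-1}(y - f(x)).
import numpy as np

def f(x):
    return np.array([x[0] + x[1] ** 2, x[0] ** 3 + x[1]])

A_inv = np.eye(2)               # A = f'(0,0) is the identity for this f

y = np.array([0.05, -0.02])     # a point near b = f(0,0) = 0
x = np.zeros(2)
for _ in range(50):
    x = x + A_inv @ (y - f(x))  # the contraction phi of equation 1.25
print(x, f(x) - y)              # x = g(y); the residual is ~0
```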

Theorem 1.6.2. If f is a C′-mapping of an open set E ⊂ Rn into Rn and if f′(x) is invertible for every x ∈ E, then f(W) is an open subset of Rn for every open set W ⊂ E.
1.7 The Implicit Function Theorem
Notation 1.7.1. If x = (x1 , . . . , xn ) ∈ Rn and y = (y1 , . . . , ym ) ∈ Rm , let us
write (x, y) for the point (or vector)
(x1 , . . . , xn , y1 , . . . , ym ) ∈ Rn+m
That is, the first entry in (x, y) will always be a vector in Rn and the second
will be a vector in Rm .
Every A ∈ L (Rn+m , Rn ) can be split into two linear transformations Ax
and Ay , defined by
Ax h = A(h, 0), Ay k = A(0, k)
for any h ∈ Rn , k ∈ Rm . Then Ax ∈ L (Rn ) , Ay ∈ L (Rm , Rn ), and
A(h, k) = Ax h + Ay k. (1.29)
Theorem 1.7.2. Linear version of the implicit function theorem.
If A ∈ L (Rn+m , Rn ) and if Ax is invertible, then there corresponds to every
k ∈ Rm a unique h ∈ Rn such that A(h, k) = 0. This h can be computed
from k by the formula
h = − (Ax )−1 Ay k (1.30)
Proof. By 1.29, A(h, k) = 0 if and only if Ax h + Ay k = 0, which is the same as 1.30 when Ax is invertible.
Theorem 1.7.3. Implicit function theorem.
Let f be a C′-mapping of an open set E ⊂ Rn+m into Rn, such that f(a, b) = 0 for some point (a, b) ∈ E. Put A = f′(a, b) and assume that Ax is invertible.
Then there exist open sets U ⊂ Rn+m and W ⊂ Rm, with (a, b) ∈ U and b ∈ W, having the following property:
To every y ∈ W corresponds a unique x such that

(x, y) ∈ U and f(x, y) = 0.

If this x is defined to be g(y), then g is a C′-mapping of W into Rn, g(b) = a,

f(g(y), y) = 0    (y ∈ W),    (1.31)

and

g′(b) = −(Ax)−1 Ay.    (1.32)
Proof. Define F by

F(x, y) = (f(x, y), y) ((x, y) ∈ E) (1.33)

Then F is a C′-mapping of E into Rn+m. We claim that F′(a, b) is an invertible element of L(Rn+m):
Since f(a, b) = 0, we have

f(a + h, b + k) = A(h, k) + r(h, k),

where r is the remainder that occurs in the definition of f′(a, b). Since

F(a + h, b + k) − F(a, b) = (f(a + h, b + k), k) = (A(h, k), k) + (r(h, k), 0),

it follows that F′(a, b) is the linear operator on Rn+m that maps (h, k) to (A(h, k), k). If this image vector is 0, then A(h, k) = 0 and k = 0, hence A(h, 0) = 0, and Theorem 1.7.2 implies that h = 0. It follows that F′(a, b) is 1-1; hence it is invertible (Theorem 1.3.5).
The inverse function theorem can therefore be applied to F. It shows that
there exist open sets U and V in Rn+m , with (a, b) ∈ U, (0, b) ∈ V such that
F is a 1-1 mapping of U onto V .
We let W be the set of all y ∈ Rm such that (0, y) ∈ V . Note that b ∈ W .
Consider the function L : Rm → Rn+m by L(y) = (0, y). Then L is continuous
and L−1 (V ) = W . So W is open .
If y ∈ W , then (0, y) = F(x, y) for some (x, y) ∈ U. By 1.33, f (x, y) = 0
for this x.
Suppose, with the same y, that (x′, y) ∈ U and f(x′, y) = 0. Then

F(x′, y) = (f(x′, y), y) = (f(x, y), y) = F(x, y).

Since F is 1-1 in U, it follows that x′ = x. This proves the first part of the theorem.
theorem.
For the second part, define g(y), for y ∈ W , so that (g(y), y) ∈ U and 1.31
holds. Then
F(g(y), y) = (0, y) (y ∈ W ) (1.34)
If G is the mapping of V onto U that inverts F, then G ∈ C′, by the inverse function theorem, and 1.34 gives

(g(y), y) = G(0, y)    (y ∈ W).    (1.35)

Since G ∈ C′, 1.35 shows that g ∈ C′.


Finally, to compute g′(b), put (g(y), y) = Φ(y). Then

Φ′(y)k = (g′(y)k, k)    (y ∈ W, k ∈ Rm).    (1.36)

By 1.31, f(Φ(y)) = 0 in W. The chain rule shows therefore that

f′(Φ(y))Φ′(y) = 0.

When y = b, then Φ(y) = (a, b) and f′(Φ(y)) = A. Thus

AΦ′(b) = 0.    (1.37)

It now follows from 1.37, 1.36, and 1.29 that

Ax g′(b)k + Ay k = A(g′(b)k, k) = AΦ′(b)k = 0

for every k ∈ Rm. Thus

Ax g′(b) + Ay = 0.

This is equivalent to 1.32, and completes the proof.

Example 1.7.4. Take n = 2, m = 3, and consider the mapping f = (f1, f2) of R5 into R2 given by

f1(x1, x2, y1, y2, y3) = 2e^{x1} + x2y1 − 4y2 + 3,
f2(x1, x2, y1, y2, y3) = x2 cos x1 − 6x1 + 2y1 − y3.

If a = (0, 1) and b = (3, 2, 7), then f(a, b) = 0. With respect to the standard bases, the matrix of the transformation A = f′(a, b) is

[A] = [ 2  3  1 −4  0
       −6  1  2  0 −1 ]

Hence

[Ax] = [ 2  3
        −6  1 ],

[Ay] = [ 1 −4  0
         2  0 −1 ].

We see that the column vectors of [Ax] are independent. Hence Ax is invertible, and the implicit function theorem asserts the existence of a C′-mapping g, defined in a neighborhood of (3, 2, 7), such that g(3, 2, 7) = (0, 1) and f(g(y), y) = 0. We can use 1.32 to compute g′(3, 2, 7): since

(Ax)−1 = [Ax]−1 = (1/20) [ 1 −3
                           6  2 ],

1.32 gives

[g′(3, 2, 7)] = −(1/20) [ 1 −3
                          6  2 ] [ 1 −4  0
                                   2  0 −1 ]

              = [  1/4   1/5  −3/20
                  −1/2   6/5   1/10 ]

In terms of partial derivatives, the conclusion is that

D1g1 = 1/4,  D2g1 = 1/5,  D3g1 = −3/20,
D1g2 = −1/2, D2g2 = 6/5,  D3g2 = 1/10

at the point (3, 2, 7).
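The matrix computation above is easy to confirm with numpy (an assumed dependency):

```python
# Check of Example 1.7.4: g'(3,2,7) = -(Ax)^{-1} Ay.
import numpy as np

Ax = np.array([[2.0, 3.0], [-6.0, 1.0]])
Ay = np.array([[1.0, -4.0, 0.0], [2.0, 0.0, -1.0]])
print(-np.linalg.inv(Ax) @ Ay)
# [[ 0.25  0.2  -0.15]
#  [-0.5   1.2   0.1 ]]
```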

1.8 Determinants
Definition 1.8.1. If (j1, ..., jn) is an ordered n-tuple of integers, define

s(j1, ..., jn) = ∏_{p<q} sgn(jq − jp),    (1.38)

where sgn x = 1 if x > 0, sgn x = −1 if x < 0, sgn x = 0 if x = 0. Then s(j1, ..., jn) = 1, −1, or 0, and it changes sign if any two of the j's are interchanged.
Example 1.8.2.
(a) s(2, 3, 1) = sgn(1−3). sgn(1−2). sgn(3−2) = sgn(−2). sgn(−1). sgn(1) =
(−1).(−1).1 = 1
(b) s(2, 3, 2) = 0
Let [A] be the matrix of a linear operator A on Rn , relative to the standard
basis {e1 , . . . , en }, with entries a(i, j) in the ith row and j th column. The
determinant of [A] is defined to be the number
det[A] = ∑ s(j1, ..., jn) a(1, j1) a(2, j2) ··· a(n, jn).    (1.39)

The sum in 1.39 extends over all ordered n-tuples of integers (j1, ..., jn) with 1 ≤ jr ≤ n.
The column vectors xj of [A] are

xj = ∑_{i=1}^{n} a(i, j) ei    (1 ≤ j ≤ n).    (1.40)

It will be convenient to think of det[A] as a function of the column vectors of [A]. If we write

det(x1, ..., xn) = det[A],

det is now a real function on the set of all ordered n-tuples of vectors in Rn.
Example 1.8.3. If

[A] = [ a(1,1) a(1,2)
        a(2,1) a(2,2) ],

then

det[A] = s(1,1)a(1,1)a(2,1) + s(1,2)a(1,1)a(2,2) + s(2,1)a(1,2)a(2,1) + s(2,2)a(1,2)a(2,2)
       = a(1,1)a(2,2) − a(1,2)a(2,1).
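Equations 1.38 and 1.39 translate directly into code: sum over all ordered n-tuples, weighting each product of entries by s(j1, ..., jn). A Python sketch (0-based indices; the sum has n^n terms, so this is only a sanity check for small n):

```python
# Direct implementation of det[A] from equations 1.38 and 1.39.
from itertools import product

def s(js):
    # s(j1,...,jn): product of sgn(jq - jp) over pairs p < q
    sign = 1
    for p in range(len(js)):
        for q in range(p + 1, len(js)):
            d = js[q] - js[p]
            sign *= (d > 0) - (d < 0)
    return sign

def det(a):
    n = len(a)
    total = 0
    for js in product(range(n), repeat=n):  # all ordered n-tuples
        term = s(js)
        for i in range(n):
            term *= a[i][js[i]]             # a(i, j_i)
        total += term
    return total

print(det([[1, 2], [3, 4]]))   # -2
print(det([[2, 3], [-6, 1]]))  # 20, the det[Ax] of Example 1.7.4
```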

Theorem 1.8.4.

(a) If I is the identity operator on Rn , then

det[I] = det (e1 , . . . , en ) = 1.

(b) det is a linear function of each of the column vectors xj, if the others are held fixed.
That is, det(x1, ..., cxj, ..., xn) = c det(x1, ..., xj, ..., xn) and

det(x1, ..., xj + yj, ..., xn) = det(x1, ..., xj, ..., xn) + det(x1, ..., yj, ..., xn).

(c) If [A]1 is obtained from [A] by interchanging two columns, then

det[A]1 = − det[A]

(d) If [A] has two equal columns, then det [A] = 0.

Proof.

(a) If A = I, then a(i, i) = 1 and a(i, j) = 0 for i ≠ j. Hence

det[I] = s(1, 2, ..., n) a(1, 1) a(2, 2) ··· a(n, n) = 1.

(b) By 1.38, s (j1 , . . . , jn ) = 0 if any two of the j’s are equal. Each of the
remaining n! products in 1.39 contains exactly one factor from each col-
umn. This proves (b).

(c) This is an immediate consequence of the fact that s(j1, ..., jn) changes sign if any two of the j's are interchanged.

(d) It is a corollary of (c).


Theorem 1.8.5. If [A] and [B] are n by n matrices, then

det([B][A]) = det[B] det[A].

Proof. If x1, ..., xn are the columns of [A], define

∆B(x1, ..., xn) = ∆B[A] = det([B][A]).    (1.41)

The columns of [B][A] are the vectors Bx1, ..., Bxn. Thus

∆B(x1, ..., xn) = det(Bx1, ..., Bxn).    (1.42)

By 1.42 and Theorem 1.8.4, ∆B also has properties 1.8.4 (b) to (d). By (b) and 1.40,

∆B[A] = ∆B( ∑_i a(i,1)ei, x2, ..., xn ) = ∑_i a(i,1) ∆B(ei, x2, ..., xn).

Repeating this process with x2, ..., xn, we obtain

∆B[A] = ∑ a(i1, 1) a(i2, 2) ··· a(in, n) ∆B(ei1, ..., ein),    (1.43)

the sum being extended over all ordered n-tuples (i1, ..., in) with 1 ≤ ir ≤ n.
By (c) and (d),

∆B(ei1, ..., ein) = t(i1, ..., in) ∆B(e1, ..., en),    (1.44)

where t = 1, 0, or −1, and since [B][I] = [B], 1.41 shows that

∆B(e1, ..., en) = det[B].    (1.45)

Substituting 1.45 and 1.44 into 1.43, we obtain

det([B][A]) = { ∑ a(i1, 1) ··· a(in, n) t(i1, ..., in) } det[B],

for all n by n matrices [A] and [B]. Taking B = I, we see that the sum in braces is det[A]. This proves the theorem.

Theorem 1.8.6. A linear operator A on Rn is invertible if and only if det[A] ≠ 0.
Proof. If A is invertible, Theorem 1.8.5 shows that

det[A] det[A−1] = det[AA−1] = det[I] = 1,

so that det[A] ≠ 0.
If A is not invertible, the columns x1, ..., xn of [A] are dependent (Theorem 1.3.5); hence there is one, say xk, such that

xk + ∑_{j≠k} cj xj = 0    (1.46)

for certain scalars cj. By Theorem 1.8.4 (b) and (d), xk can be replaced by xk + cjxj without altering the determinant, if j ≠ k. Repeating, we see that xk can be replaced by the left side of 1.46, that is, by 0, without altering the determinant. But a matrix which has 0 for one column has determinant 0. Hence det[A] = 0.
Theorem 1.8.7. The determinant of a linear operator A on Rn is independent of the choice of basis in Rn.
Proof. Suppose {e1, ..., en} and {u1, ..., un} are bases in Rn. Every linear operator A on Rn determines matrices [A] and [A]U, with entries aij and αij, given by

Aej = ∑_i aij ei,    Auj = ∑_i αij ui.

If uj = Bej = ∑_i bij ei, then the matrix [B] is invertible, and Auj is equal to

∑_k αkj Bek = ∑_k αkj ( ∑_i bik ei ) = ∑_i ( ∑_k bik αkj ) ei,

and also to

ABej = A( ∑_k bkj ek ) = ∑_i ( ∑_k aik bkj ) ei.

Thus ∑_k bik αkj = ∑_k aik bkj, or

[B][A]U = [A][B].    (1.47)

Since B is invertible, det[B] ≠ 0. Hence 1.47, combined with Theorem 1.8.5, shows that

det[A]U = det[A].
