
Self-Study Notes

Marcus Tornea
August 5, 2020

1 Multivariable Calculus
Definition 1.1. A function $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at a point $a \in A$ iff $a$ is in the interior of $A$ and there exists a linear transformation $\lambda : \mathbb{R}^n \to \mathbb{R}^m$ such that
$$\lim_{h \to 0} \frac{\|f(a + h) - f(a) - \lambda(h)\|}{\|h\|} = 0.$$
The function $\lambda$ is denoted $Df(a)$ and is called the derivative of $f$ at $a$. It is simply called differentiable iff $Df(a)$ exists for all $a \in A$.
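A minimal numerical sketch of Definition 1.1, assuming a hypothetical example map $f(x, y) = (x^2 y, \sin x)$ whose derivative matrix was computed by hand and is used as $\lambda$. It evaluates the quotient $\|f(a + h) - f(a) - \lambda(h)\|/\|h\|$ along $h = tv$ for shrinking $t$; the helper names (`Df`, `apply`, `remainder_ratio`) are illustrative. This is a sanity check, not a proof: the ratios should shrink roughly in proportion to $t$.

```python
import math


def f(x, y):
    # Hypothetical example map f : R^2 -> R^2.
    return (x * x * y, math.sin(x))


def Df(a):
    # Hand-computed derivative matrix of f at a (plays the role of lambda).
    x, y = a
    return [[2 * x * y, x * x],
            [math.cos(x), 0.0]]


def apply(matrix, h):
    # Apply a matrix (given as a list of rows) to the vector h.
    return [sum(m * hj for m, hj in zip(row, h)) for row in matrix]


def norm(v):
    # Euclidean norm.
    return math.sqrt(sum(t * t for t in v))


def remainder_ratio(a, h):
    # ||f(a + h) - f(a) - lambda(h)|| / ||h||, the quotient in Definition 1.1.
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    lam_h = apply(Df(a), h)
    r = [fah[i] - fa[i] - lam_h[i] for i in range(2)]
    return norm(r) / norm(h)


a = (1.0, 2.0)
v = (0.6, -0.8)  # a unit direction
ratios = [remainder_ratio(a, (t * v[0], t * v[1])) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)
```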
It is justified to call $\lambda$ the derivative of $f$, since it is the only linear transformation that satisfies the above property, if one exists.
Theorem 1.2. If $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at $a \in A$, then its derivative at $a$ is unique.
Proof. Suppose that $\lambda, \psi : \mathbb{R}^n \to \mathbb{R}^m$ are both derivatives of $f$ at $a$. For all $h \neq 0$ with $a + h \in A$, let $d(h) = f(a + h) - f(a)$; we obtain
$$0 \le \frac{\|\lambda(h) - \psi(h)\|}{\|h\|} = \frac{\|\lambda(h) - d(h) + d(h) - \psi(h)\|}{\|h\|} \le \frac{\|d(h) - \lambda(h)\|}{\|h\|} + \frac{\|d(h) - \psi(h)\|}{\|h\|}.$$
Since $a$ is an interior point of $A$, there exists an open set containing $0$, all of whose members $h$ yield $a + h \in A$, and therefore the above inequality (except at $h = 0$); thus, by the squeeze theorem,
$$\lim_{h \to 0} \frac{\|\lambda(h) - \psi(h)\|}{\|h\|} = 0.$$
If $x \in \mathbb{R}^n$ is nonzero, then substituting $h = tx$, where $t$ is a scalar variable, gives
$$0 = \lim_{t \to 0} \frac{\|\lambda(tx) - \psi(tx)\|}{\|tx\|} = \lim_{t \to 0} \frac{|t|\,\|\lambda(x) - \psi(x)\|}{|t|\,\|x\|} = \frac{\|\lambda(x) - \psi(x)\|}{\|x\|}.$$
Thus, $\lambda(x) = \psi(x)$. Finally, since both maps are linear, $\lambda(0) = \psi(0) = 0$. We therefore conclude that $\lambda = \psi$.

The following lemma shows that, whenever the function of interest is a real function of one real variable, the notion of differentiability presented here agrees with the one from single-variable calculus.
Lemma 1.3. A function $f : A \to \mathbb{R}$ with $A \subseteq \mathbb{R}$ is differentiable at $a$ iff $a$ is an interior point of $A$ and the limit
$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$
exists.
Proof. Recall that a linear transformation $\lambda : \mathbb{R} \to \mathbb{R}$ only scales its argument by a constant: $\lambda(h) = ch$ for some $c \in \mathbb{R}$. Hence, if $f$ is differentiable at $a$ with $Df(a)(h) = ch$, then
$$0 = \lim_{h \to 0} \frac{|f(a + h) - f(a) - ch|}{|h|} = \lim_{h \to 0} \left| \frac{f(a + h) - f(a)}{h} - c \right| \implies c = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
The converse is similar: if the latter limit exists and equals $c$, reading the equalities backwards shows that $h \mapsto ch$ satisfies Definition 1.1.
Remark. In this case, the matrix of $Df(a)$ has exactly one entry, namely $f'(a)$, by the latter equality.
Theorem 1.4. If $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at $a \in A$, then it is continuous there.
Proof. By assumption, and after the change of variables $t = a + h$ in the limit of Definition 1.1, there exists a linear $\lambda : \mathbb{R}^n \to \mathbb{R}^m$ satisfying
$$\lim_{t \to a} \frac{\|f(t) - f(a) - \lambda(t - a)\|}{\|t - a\|} = 0.$$
In other words, for all $t \in A$ sufficiently close to $a$, we have
$$0 \le \|f(t) - f(a) - \lambda(t - a)\| \le \|t - a\|.$$
Because $a$ is an interior point of $A$, there exists an open ball around $a$ small enough that all its elements $t$ satisfy the above inequality; therefore, by the squeeze theorem,
$$\lim_{t \to a} \|f(t) - f(a) - \lambda(t - a)\| = 0 \implies \lim_{t \to a} \big(f(t) - \lambda(t - a)\big) = f(a). \tag{1}$$
Now, since $\lambda$ is a linear transformation, there exists a sufficiently large $M > 0$ such that $0 \le \|\lambda(t - a)\| \le M\|t - a\|$ for all $t \in \mathbb{R}^n$. The squeeze theorem again gives
$$\lim_{t \to a} \|\lambda(t - a)\| = 0 \implies \lim_{t \to a} \lambda(t - a) = 0,$$
which by (1) yields
$$\lim_{t \to a} f(t) = f(a).$$

Theorem 1.5. Chain Rule: if $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at $a \in A$ and $g : B \to \mathbb{R}^p$ with $B \subseteq \mathbb{R}^m$ is differentiable at $f(a)$, then the composition $g \circ f$ is differentiable at $a$, with derivative
$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$
Proof. Since $f(a)$ is an interior point of $B$, there exists a subset $W$ of $B$, open in $\mathbb{R}^m$, containing $f(a)$. Because $f$ is continuous at $a$ (Theorem 1.4; see also Lemma ??), there exists an open set $V$ in $\mathbb{R}^n$ containing $a$ such that $f(V \cap A) \subseteq W$. Similarly, there is a subset $U$ of $A$, open in $\mathbb{R}^n$, containing $a$, since $a$ is in the interior of $A$. Both sets $U$ and $V$ contain $a$, and so does their intersection $U \cap V$, which is open in $\mathbb{R}^n$.
Since $U \subseteq A$, we have $U \cap V = (U \cap A) \cap V = U \cap (V \cap A)$ by associativity of set intersection, so $f(U \cap V) \subseteq f(V \cap A) \subseteq W \subseteq B$. Hence $g \circ f$ is defined on the open set $U \cap V$ containing $a$, which proves that $a$ is an interior point of $\mathrm{dom}(g \circ f)$.
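Now, let $b = f(a)$, $\lambda = Df(a)$, and $\mu = Dg(b)$; define $\varphi : A \to \mathbb{R}^m$, $\psi : B \to \mathbb{R}^p$, and $\rho : \mathrm{dom}(g \circ f) \to \mathbb{R}^p$ by
$$\begin{aligned}
\varphi(x) &= f(x) - f(a) - \lambda(x - a), \\
\psi(y) &= g(y) - g(b) - \mu(y - b), \\
\rho(x) &= g \circ f(x) - g \circ f(a) - \mu \circ \lambda(x - a).
\end{aligned}$$
The differentiability of $f$ at $a$ and of $g$ at $b$ implies that
$$\lim_{x \to a} \frac{\|\varphi(x)\|}{\|x - a\|} = 0, \tag{2}$$
$$\lim_{y \to b} \frac{\|\psi(y)\|}{\|y - b\|} = 0. \tag{3}$$
Moreover, for all $x \in \mathrm{dom}(g \circ f)$, using $\lambda(x - a) = f(x) - f(a) - \varphi(x)$ and the linearity of $\mu$,
$$\begin{aligned}
\rho(x) &= g(f(x)) - g(b) - \mu(\lambda(x - a)) \\
&= g(f(x)) - g(b) - \mu(f(x) - f(a) - \varphi(x)) \\
&= g(f(x)) - g(b) - \mu(f(x) - b) + \mu(\varphi(x)) \\
&= \psi(f(x)) + \mu(\varphi(x)).
\end{aligned} \tag{4}$$
Since $\mu$ is a linear transformation, some large enough $M > 0$ yields $0 \le \|\mu(\varphi(x))\| \le M\|\varphi(x)\|$. This holds for all $x$ in the set $U \subseteq A$ introduced earlier; therefore, by (2) and the squeeze theorem,
$$\lim_{x \to a} \frac{\|\mu(\varphi(x))\|}{\|x - a\|} = 0. \tag{5}$$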

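Let $\varepsilon > 0$, and fix $N > 0$ large enough that $\|\lambda(x - a)\| \le N\|x - a\|$ for all $x \in \mathbb{R}^n$. By (3), for some $\delta_1 > 0$, all $y \in B$ satisfying $0 < \|y - b\| < \delta_1$ also satisfy $\|\psi(y)\| < \frac{\varepsilon}{N + 1}\|y - b\|$; since $b$ is an interior point of $B$, we may also shrink $\delta_1$ so that every $y$ with $\|y - b\| < \delta_1$ lies in $B$. Clearly, if $y = b$, then $\|\psi(y)\| = 0$, so we can relax the above statement as follows:
$$\|y - b\| < \delta_1 \implies \|\psi(y)\| \le \frac{\varepsilon}{N + 1}\|y - b\|. \tag{6}$$
From (2), we also obtain that for some $\delta_2 > 0$, all $x \in A$ that satisfy $0 < \|x - a\| < \delta_2$ also satisfy $\|\varphi(x)\| < \|x - a\|$.
Finally, by continuity of $f$ at $a$, some $\delta_3 > 0$ yields $\|f(x) - b\| < \delta_1$ for all $x \in A$ satisfying $\|x - a\| < \delta_3$. Gathering all these facts, let $\delta = \min\{\delta_2, \delta_3\}$ and assume that $x \in A$ with $0 < \|x - a\| < \delta$. Then $\|f(x) - b\| < \delta_1$, which by (6) implies that
$$\begin{aligned}
\|\psi(f(x))\| &\le \frac{\varepsilon}{N + 1}\|f(x) - b\| = \frac{\varepsilon}{N + 1}\|\varphi(x) + \lambda(x - a)\| \\
&\le \frac{\varepsilon}{N + 1}\big(\|\varphi(x)\| + \|\lambda(x - a)\|\big) \\
&< \frac{\varepsilon}{N + 1}\big(\|x - a\| + N\|x - a\|\big) = \varepsilon\|x - a\|.
\end{aligned}$$
Therefore,
$$\lim_{x \to a} \frac{\|\psi(f(x))\|}{\|x - a\|} = 0. \tag{7}$$
Now, for all $x \in U \cap V$ except $a$ itself, we obtain from (4) that
$$0 \le \frac{\|\rho(x)\|}{\|x - a\|} = \frac{\|\psi(f(x)) + \mu(\varphi(x))\|}{\|x - a\|} \le \frac{\|\psi(f(x))\|}{\|x - a\|} + \frac{\|\mu(\varphi(x))\|}{\|x - a\|};$$
thus, since by (5) and (7) the right side of this inequality tends to zero as $x \to a$, we ultimately deduce that
$$\lim_{x \to a} \frac{\|\rho(x)\|}{\|x - a\|} = \lim_{x \to a} \frac{\|g \circ f(x) - g \circ f(a) - \mu \circ \lambda(x - a)\|}{\|x - a\|} = 0;$$
i.e., $g \circ f$ is differentiable at $a$, and its derivative $D(g \circ f)(a)$ is
$$\mu \circ \lambda = Dg(f(a)) \circ Df(a).$$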
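The chain rule can also be checked numerically by representing each derivative by its matrix. The sketch below assumes two hypothetical example maps, $f : \mathbb{R}^2 \to \mathbb{R}^3$ and $g : \mathbb{R}^3 \to \mathbb{R}^2$, approximates derivative matrices by central differences through an illustrative helper `jacobian`, and compares the matrix of $D(g \circ f)(a)$ with the product of the matrices of $Dg(f(a))$ and $Df(a)$. Agreement is only up to finite-difference error.

```python
import math


def jacobian(F, a, eps=1e-6):
    # Central-difference approximation of the matrix of DF(a).
    fa = F(a)
    m, n = len(fa), len(a)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = F(plus), F(minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J


def matmul(A, B):
    # Product of matrices given as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def f(v):
    # Hypothetical f : R^2 -> R^3.
    x, y = v
    return [x * y, math.exp(x), x + y * y]


def g(w):
    # Hypothetical g : R^3 -> R^2.
    u, v, s = w
    return [u * s, math.sin(v) + u]


a = [0.5, -1.0]
lhs = jacobian(lambda v: g(f(v)), a)             # matrix of D(g o f)(a)
rhs = matmul(jacobian(g, f(a)), jacobian(f, a))  # matrix of Dg(f(a)) o Df(a)
print(lhs)
print(rhs)
```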

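Theorem 1.6. The following are true for the derivative:

1. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is a constant function, then for all $a \in \mathbb{R}^n$,
$$Df(a) = 0.$$

2. If $f : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, then for all $a \in \mathbb{R}^n$,
$$Df(a) = f.$$

3. A function $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at $a \in A$ if and only if each component function $f_i$ with $1 \le i \le m$ is, and in that case
$$Df(a) = (Df_1(a), \ldots, Df_m(a)).$$

4. If $s : \mathbb{R}^2 \to \mathbb{R}$ is defined by $s(x, y) = x + y$, then for all $a \in \mathbb{R}^2$,
$$Ds(a) = s.$$

5. If $p : \mathbb{R}^2 \to \mathbb{R}$ is defined by $p(x, y) = xy$, then for all $a \in \mathbb{R}^2$,
$$Dp(a)(x, y) = a_2 x + a_1 y.$$

6. If $q : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ is defined by $q(x) = 1/x$, then for all $a \in \mathbb{R} \setminus \{0\}$,
$$Dq(a)(x) = -\frac{x}{a^2}.$$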

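Proof. Let $a \in \mathbb{R}^n$ in items 1 and 2, and $a \in A$ in item 3. Proving individually, we have:

1. Since $f(x) = c \in \mathbb{R}^m$ for all $x \in \mathbb{R}^n$,
$$\lim_{h \to 0} \frac{\|f(a + h) - f(a) - 0(h)\|}{\|h\|} = \lim_{h \to 0} \frac{\|c - c\|}{\|h\|} = 0.$$

2. Since $f$ is linear, $f(a + h) = f(a) + f(h)$, so
$$\lim_{h \to 0} \frac{\|f(a + h) - f(a) - f(h)\|}{\|h\|} = \lim_{h \to 0} \frac{\|f(a) + f(h) - f(a) - f(h)\|}{\|h\|} = 0.$$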

3. If each $f_i$ is differentiable at $a$, let $\lambda = (Df_1(a), \ldots, Df_m(a))$. Observe that $\lambda$ is linear, since for all $u, v \in \mathbb{R}^n$ and scalars $c \in \mathbb{R}$,
$$\begin{aligned}
\lambda(u + v) &= (Df_1(a)(u + v), \ldots, Df_m(a)(u + v)) \\
&= (Df_1(a)(u) + Df_1(a)(v), \ldots, Df_m(a)(u) + Df_m(a)(v)) \\
&= (Df_1(a)(u), \ldots, Df_m(a)(u)) + (Df_1(a)(v), \ldots, Df_m(a)(v)) \\
&= \lambda(u) + \lambda(v),
\end{aligned}$$
and
$$\begin{aligned}
\lambda(cu) &= (Df_1(a)(cu), \ldots, Df_m(a)(cu)) \\
&= (c\,Df_1(a)(u), \ldots, c\,Df_m(a)(u)) \\
&= c\,(Df_1(a)(u), \ldots, Df_m(a)(u)) \\
&= c\,\lambda(u).
\end{aligned}$$
Now, for all $h \neq 0$ with $a + h \in A$, since the norm of a vector is at most the sum of the absolute values of its components,
$$0 \le \frac{\|f(a + h) - f(a) - \lambda(h)\|}{\|h\|} \le \sum_{i=1}^m \frac{|f_i(a + h) - f_i(a) - Df_i(a)(h)|}{\|h\|}.$$
Since $a$ is an interior point of $A$, there exists an open set containing $0$ whose elements $h$, except $h = 0$, yield $a + h \in A$ and therefore the above inequality. Each term on the right tends to $0$ as $h \to 0$, so the squeeze theorem yields that $f$ is differentiable at $a$ with $Df(a) = \lambda$.
Conversely, assume that $f$ is differentiable at $a$. By definition, $f_i = \pi_i \circ f$, where $\pi_i$ is the $i$th projection transformation (and thus linear); hence the chain rule and item 2 give that each $f_i$ is differentiable at $a$; namely, for all $i$ with $1 \le i \le m$,
$$Df_i(a) = D(\pi_i \circ f)(a) = D\pi_i(f(a)) \circ Df(a) = \pi_i \circ Df(a).$$
Thus, each $Df_i(a)$ exists and
$$Df(a) = (Df_1(a), \ldots, Df_m(a)).$$

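4. By item 2, it suffices to show that $s$ is linear; then $Ds(a) = s$ for all $a \in \mathbb{R}^2$. Let $u, v \in \mathbb{R}^2$ and $c \in \mathbb{R}$; then
$$s(u + v) = s(u_1 + v_1, u_2 + v_2) = (u_1 + u_2) + (v_1 + v_2) = s(u) + s(v),$$
$$s(cv) = s(cv_1, cv_2) = cv_1 + cv_2 = c\,s(v).$$

5. Let $a \in \mathbb{R}^2$ and define $\lambda : \mathbb{R}^2 \to \mathbb{R}$ by $\lambda(x, y) = a_2 x + a_1 y$. By construction, $\lambda$ is linear: for all $u, v \in \mathbb{R}^2$ and scalars $c \in \mathbb{R}$,
$$\begin{aligned}
\lambda(u + v) &= \lambda(u_1 + v_1, u_2 + v_2) = a_2(u_1 + v_1) + a_1(u_2 + v_2) \\
&= a_2 u_1 + a_1 u_2 + a_2 v_1 + a_1 v_2 = \lambda(u) + \lambda(v), \\
\lambda(cu) &= \lambda(cu_1, cu_2) = a_2 c u_1 + a_1 c u_2 = c(a_2 u_1 + a_1 u_2) = c\,\lambda(u).
\end{aligned}$$
Now,
$$\begin{aligned}
\lim_{h \to 0} \frac{|p(a + h) - p(a) - \lambda(h)|}{\|h\|}
&= \lim_{h \to 0} \frac{|p(a_1 + h_1, a_2 + h_2) - p(a_1, a_2) - \lambda(h_1, h_2)|}{\|h\|} \\
&= \lim_{h \to 0} \frac{|a_1 a_2 + a_2 h_1 + a_1 h_2 + h_1 h_2 - a_1 a_2 - a_2 h_1 - a_1 h_2|}{\|h\|} \\
&= \lim_{h \to 0} \frac{|h_1 h_2|}{\|h\|}.
\end{aligned}$$
Since
$$|h_1 h_2| \le \begin{cases} h_1^2 & \text{if } |h_2| \le |h_1|, \\ h_2^2 & \text{if } |h_1| < |h_2|, \end{cases}$$
we obtain
$$0 \le \frac{|h_1 h_2|}{\|h\|} \le \frac{h_1^2 + h_2^2}{\|h\|} = \|h\|,$$
whose limit as $h \to 0$ is zero. Thus, the squeeze theorem yields
$$\lim_{h \to 0} \frac{|h_1 h_2|}{\|h\|} = 0.$$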

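6. Let $a \in \mathbb{R} \setminus \{0\}$ and let $\lambda : \mathbb{R} \to \mathbb{R}$ be defined by $\lambda(x) = -x/a^2$, which is clearly linear. Then
$$\lim_{h \to 0} \frac{|q(a + h) - q(a) - \lambda(h)|}{|h|} = \lim_{h \to 0} \frac{\left| \frac{1}{a + h} - \frac{1}{a} + \frac{h}{a^2} \right|}{|h|} = \lim_{h \to 0} \frac{|h|}{|a^2(a + h)|}.$$
Since the limit of the denominator as $h \to 0$ exists and is nonzero (it equals $|a|^3$), while the numerator tends to $0$, the limit is $0$.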

Theorem 1.7. If $f : A \to \mathbb{R}$ and $g : B \to \mathbb{R}$ with $A, B \subseteq \mathbb{R}^n$ are differentiable at $a \in A \cap B$, then
1. $D(f + g)(a) = Df(a) + Dg(a)$.
2. $D(fg)(a) = g(a)Df(a) + f(a)Dg(a)$.
3. Additionally, if $g(a) \neq 0$, then
$$D(f/g)(a) = \frac{g(a)Df(a) - f(a)Dg(a)}{g(a)^2}.$$

Proof. Proving individually, we have:

1. Let $s$ be the function defined in item 4 of the previous theorem; then $f + g = s \circ (f, g)$. Following through with the chain rule,
$$\begin{aligned}
D(s \circ (f, g))(a) &= Ds((f, g)(a)) \circ D(f, g)(a) \\
&= s \circ D(f, g)(a) \\
&= s \circ (Df(a), Dg(a)) \\
&= Df(a) + Dg(a),
\end{aligned}$$
referring to items 3 and 4 of the previous theorem.

2. The result follows in a similar fashion, with the function $p$ defined in item 5 of the previous theorem:
$$\begin{aligned}
D(fg)(a) &= D(p \circ (f, g))(a) \\
&= Dp((f, g)(a)) \circ D(f, g)(a) \\
&= Dp(f(a), g(a))(Df(a), Dg(a)) \\
&= g(a)Df(a) + f(a)Dg(a).
\end{aligned}$$

3. If $g(a) \neq 0$, then the function $q$ from item 6 of the previous theorem is differentiable at $g(a)$, so $q \circ g$ is differentiable at $a$ by the chain rule. Applying item 2 to $f$ and $q \circ g$,
$$\begin{aligned}
D(f/g)(a) &= D(p \circ (f, q \circ g))(a) \\
&= (q \circ g)(a)Df(a) + f(a)D(q \circ g)(a) \\
&= \frac{Df(a)}{g(a)} + f(a)\big(Dq(g(a)) \circ Dg(a)\big) \\
&= \frac{Df(a)}{g(a)} - \frac{f(a)Dg(a)}{g(a)^2} \\
&= \frac{g(a)Df(a) - f(a)Dg(a)}{g(a)^2}.
\end{aligned}$$
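For real-valued functions the matrix of $Df(a)$ is a single row, so items 2 and 3 can be compared entry by entry. The following sketch uses hypothetical example functions $f(x, y) = x^2 + \sin y$ and $g(x, y) = 2 + xy$ (which is nonzero at the chosen point) and an illustrative central-difference helper `grad`, and checks the product and quotient rules numerically at one point.

```python
import math


def grad(F, a, eps=1e-6):
    # Central-difference approximation of the 1 x n matrix of DF(a).
    out = []
    for j in range(len(a)):
        plus, minus = list(a), list(a)
        plus[j] += eps
        minus[j] -= eps
        out.append((F(plus) - F(minus)) / (2 * eps))
    return out


def f(v):
    # Hypothetical f : R^2 -> R.
    return v[0] ** 2 + math.sin(v[1])


def g(v):
    # Hypothetical g : R^2 -> R, equal to 2.5 at the point a below.
    return 2.0 + v[0] * v[1]


a = [1.0, 0.5]
Df, Dg = grad(f, a), grad(g, a)
fa, ga = f(a), g(a)

# Item 2: D(fg)(a) = g(a)Df(a) + f(a)Dg(a).
product_lhs = grad(lambda v: f(v) * g(v), a)
product_rhs = [ga * p + fa * q for p, q in zip(Df, Dg)]

# Item 3: D(f/g)(a) = (g(a)Df(a) - f(a)Dg(a)) / g(a)^2.
quotient_lhs = grad(lambda v: f(v) / g(v), a)
quotient_rhs = [(ga * p - fa * q) / ga ** 2 for p, q in zip(Df, Dg)]
print(product_lhs, product_rhs)
print(quotient_lhs, quotient_rhs)
```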

Definition 1.8. If $f : A \to \mathbb{R}$ where $A \subseteq \mathbb{R}^n$, then the partial derivative of $f$ with respect to the $i$th variable, denoted by $D_i f$, is defined for all $x$ in the interior of $A$ by
$$D_i f(x) = \lim_{h \to 0} \frac{f(x_1, \ldots, x_{i-1}, x_i + h, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_n)}{h}$$
if the limit exists.

Remark. If $n = 1$, this becomes the usual derivative of the single-variable case.
Moreover, partial derivatives can be taken in succession, since each has a domain and codomain similar to those of $f$. We call expressions such as $D_j D_i f$ second order partial derivatives of $f$. Additionally, for notational convenience, partial derivatives taken twice with respect to the same variable, as in $D_i D_i f$, are further denoted $D_i^2 f$. The same applies to higher order partial derivatives.
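Definition 1.8 and the remark on successive partial derivatives translate directly into code. In this sketch, the hypothetical helper `partial(f, i)` returns a function approximating $D_i f$ by a symmetric difference quotient (with a 0-based index $i$), so applying it twice approximates a second order partial derivative $D_j D_i f$. The example $f(x_1, x_2) = x_1^2 x_2$ has $D_1 f = 2x_1 x_2$, $D_2 f = x_1^2$, and $D_2 D_1 f = 2x_1$.

```python
def partial(f, i, eps=1e-5):
    # Returns a function approximating D_i f (0-based index i) by a
    # symmetric difference quotient; like D_i f, it takes the same inputs as f.
    def d_i_f(x):
        plus, minus = list(x), list(x)
        plus[i] += eps
        minus[i] -= eps
        return (f(plus) - f(minus)) / (2 * eps)
    return d_i_f


def f(x):
    # Hypothetical example f(x1, x2) = x1^2 * x2.
    return x[0] ** 2 * x[1]


D1f = partial(f, 0)      # approximates D_1 f = 2 x1 x2
D2f = partial(f, 1)      # approximates D_2 f = x1^2
D2D1f = partial(D1f, 1)  # approximates D_2 D_1 f = 2 x1
print(D1f([1.0, 2.0]), D2f([1.0, 2.0]), D2D1f([1.0, 2.0]))
```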

Definition 1.9. A function $f : U \to \mathbb{R}$ with $U \subseteq \mathbb{R}^n$ is said to be of differentiability class $C^k$, for a nonnegative integer $k$, iff all partial derivatives
$$D_1^{\alpha_1} D_2^{\alpha_2} \cdots D_n^{\alpha_n} f$$
exist and are continuous on $U$, for all nonnegative integers $\alpha_1, \ldots, \alpha_n$ such that $\alpha_1 + \cdots + \alpha_n \le k$.
In multi-index notation, this reads: for every multi-index $\alpha$ of $n$ nonnegative integers such that $|\alpha| \le k$, the $|\alpha|$th order partial derivative
$$D^\alpha f$$
exists and is continuous on $U$.

In particular, $C^0$ is the class of all continuous functions, and $C^1$ is the class of all continuously differentiable functions, whose first order partial derivatives exist and are continuous. Moreover, $f$ is said to be smooth if it belongs to every differentiability class; the class of all such functions is denoted $C^\infty$.
Continuous differentiability at a point, however, is slightly different: $f$ is continuously differentiable at a point $a$ iff all its first order partial derivatives exist in an open set containing $a$ and are continuous at $a$.
Differentiability classes extend to vector-valued functions as well: $f$ belongs to $C^k$ iff each component function $f_i$ belongs to $C^k$.

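Theorem 1.10. If $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is continuously differentiable at $a$, then $f$ is differentiable at $a$.

Proof. Let $1 \le i \le m$, and let $C$ be an open rectangle containing $a$ on which every first order partial derivative $D_j f_i$ exists (such a rectangle exists, since these partial derivatives exist on an open set containing $a$). Consider the set $C' = \{h : a + h \in C\}$.
Claim. There exist functions $c_1, \ldots, c_n : C' \to \mathbb{R}^n$ such that for all $h \in C'$,
$$f_i(a + h) - f_i(a) = \sum_{j=1}^n h_j D_j f_i(c_j(h)),$$
and, for each $c_j$,
$$\lim_{h \to 0} c_j(h) = a.$$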

Let $1 \le j \le n$; we first specify a function $k : C' \to \mathbb{R}$ by the following procedure. Let $h \in C'$ and define $g$ by
$$g(x) = f_i(a_1 + h_1, \ldots, a_{j-1} + h_{j-1}, x, a_{j+1}, \ldots, a_n)$$
for all $x$ such that the resulting point belongs to $C$. The derivative of $g$ at each such $x$ is the first order partial derivative $D_j f_i$ evaluated at a point of $C$, which exists; hence $g$ is differentiable. Moreover, since $C$ is an open rectangle and $a, a + h \in C$, each coordinate of the point $(a_1 + h_1, \ldots, a_{j-1} + h_{j-1}, x, a_{j+1}, \ldots, a_n)$ lies in the corresponding interval of $C$ whenever $x$ lies between $a_j$ and $a_j + h_j$ inclusive; so the domain of $g$ contains $a_j$, $a_j + h_j$, and all points between the two. Consequently, the mean value theorem applies to $g$; i.e., we obtain a point $p$ on this interval such that
$$h_j g'(p) = g(a_j + h_j) - g(a_j). \tag{8}$$
(If $h_j = 0$, any $p$ works; take $p = a_j$.) We define $k(h) = p$ and
$$c_j(h) = (a_1 + h_1, \ldots, a_{j-1} + h_{j-1}, k(h), a_{j+1}, \ldots, a_n).$$
Consequently, for all $h \in C'$, either $a_j \le k(h) \le a_j + h_j$ or $a_j + h_j \le k(h) \le a_j$; in either case, the squeeze theorem proves that
$$\lim_{h \to 0} k(h) = a_j \implies \lim_{h \to 0} c_j(h) = a,$$
by Lemma (??). Moreover, the difference $f_i(a + h) - f_i(a)$ telescopes:
$$\begin{aligned}
f_i(a + h) - f_i(a) &= f_i(a_1 + h_1, \ldots, a_n + h_n) - f_i(a_1, \ldots, a_n) \\
&= f_i(a_1 + h_1, a_2, \ldots, a_n) - f_i(a_1, \ldots, a_n) \\
&\quad + f_i(a_1 + h_1, a_2 + h_2, a_3, \ldots, a_n) - f_i(a_1 + h_1, a_2, \ldots, a_n) \\
&\quad + \cdots \\
&\quad + f_i(a_1 + h_1, \ldots, a_n + h_n) - f_i(a_1 + h_1, \ldots, a_{n-1} + h_{n-1}, a_n).
\end{aligned}$$
If we rewrite (8) in terms of $f_i$, we get
$$h_j D_j f_i(c_j(h)) = f_i(a_1 + h_1, \ldots, a_j + h_j, a_{j+1}, \ldots, a_n) - f_i(a_1 + h_1, \ldots, a_{j-1} + h_{j-1}, a_j, \ldots, a_n),$$
which is exactly the $j$th line of the telescoping sum; thus, finally,
$$f_i(a + h) - f_i(a) = \sum_{j=1}^n h_j D_j f_i(c_j(h)).$$
This proves the claim.
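Now, define $\lambda_i : \mathbb{R}^n \to \mathbb{R}$ by the $1 \times n$ matrix $\begin{pmatrix} D_1 f_i(a) & \cdots & D_n f_i(a) \end{pmatrix}$; we have for all nonzero $h \in C'$ that
$$\begin{aligned}
0 \le \frac{|f_i(a + h) - f_i(a) - \lambda_i(h)|}{\|h\|}
&= \frac{\left| \sum_{j=1}^n D_j f_i(c_j(h)) h_j - \sum_{j=1}^n D_j f_i(a) h_j \right|}{\|h\|} \\
&\le \sum_{j=1}^n |D_j f_i(c_j(h)) - D_j f_i(a)| \frac{|h_j|}{\|h\|} \\
&\le \sum_{j=1}^n |D_j f_i(c_j(h)) - D_j f_i(a)|.
\end{aligned}$$
By continuity of the first order partial derivatives of $f_i$ at $a$,
$$\lim_{h \to 0} D_j f_i(c_j(h)) = D_j f_i\Big(\lim_{h \to 0} c_j(h)\Big) = D_j f_i(a) \implies \lim_{h \to 0} |D_j f_i(c_j(h)) - D_j f_i(a)| = 0,$$
which, by the squeeze theorem, proves that each $f_i$ is differentiable at $a$. By item 3 of Theorem 1.6, the same is true of $f$.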
We now prove a seemingly trivial result. Although it is immediate for real functions of one variable, it is not so in the context of functions of several variables, and thus requires a proof.
Corollary 1.11. If $f : A \to \mathbb{R}$ with $A \subseteq \mathbb{R}^n$ is a $C^k$ function, then it also belongs to each of the differentiability classes $C^0, C^1, \ldots, C^{k-1}$.
Proof. By assumption, each $(k-1)$th order partial derivative
$$D_1^{\alpha_1} D_2^{\alpha_2} \cdots D_n^{\alpha_n} f$$
with $\alpha_1 + \cdots + \alpha_n = k - 1$ is continuously differentiable; thus, each $(k-1)$th order partial derivative is differentiable by Theorem 1.10, and therefore continuous by Theorem 1.4. In other words, $f$ is $C^{k-1}$. The same procedure shows that $f$ is $C^{k-2}$, and so on.
Definition 1.12. If each component function of $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ has all its first order partial derivatives at $a$, the Jacobian matrix of $f$ at $a$, denoted by $J_f(a)$, is the $m \times n$ matrix whose $(i, j)$th entry is $D_j f_i(a)$; i.e.,
$$J_f(a) = \begin{pmatrix} D_1 f_1(a) & \cdots & D_n f_1(a) \\ \vdots & \ddots & \vdots \\ D_1 f_m(a) & \cdots & D_n f_m(a) \end{pmatrix}.$$
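The layout of $J_f(a)$ (row $i$ for the component $f_i$, column $j$ for the variable $x_j$) can be made concrete as follows. The sketch approximates each entry $D_j f_i(a)$ by a symmetric difference quotient; the helper names and the example $f(x, y) = (xy, x + y, x^2)$, for which $J_f(2, 3)$ has rows $(3, 2)$, $(1, 1)$, and $(4, 0)$, are illustrative assumptions.

```python
def partial(f_i, j, a, eps=1e-6):
    # Symmetric difference quotient approximating D_j f_i(a); j is 0-based.
    plus, minus = list(a), list(a)
    plus[j] += eps
    minus[j] -= eps
    return (f_i(plus) - f_i(minus)) / (2 * eps)


def jacobian_matrix(components, a):
    # Row i holds D_1 f_i(a), ..., D_n f_i(a), matching Definition 1.12.
    return [[partial(f_i, j, a) for j in range(len(a))] for f_i in components]


# Hypothetical example: f(x, y) = (xy, x + y, x^2), so m = 3 and n = 2.
components = [
    lambda v: v[0] * v[1],
    lambda v: v[0] + v[1],
    lambda v: v[0] ** 2,
]
J = jacobian_matrix(components, [2.0, 3.0])
print(J)  # approximately [[3, 2], [1, 1], [4, 0]]
```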

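Lemma 1.13. If $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at $a$, then all first order partial derivatives of each $f_i$ exist at $a$, and the matrix of $Df(a)$ is exactly $J_f(a)$.
Proof. Let $1 \le i \le m$ and $1 \le j \le n$, and define $h : \mathbb{R} \to \mathbb{R}^n$ by $h(x) = (a_1, \ldots, a_{j-1}, x, a_{j+1}, \ldots, a_n)$. Observe that $f_i \circ h$ is a real-valued function of one variable and that $D_j f_i(a) = (f_i \circ h)'(a_j)$ whenever either side exists. By item 3 of Theorem 1.6, $f_i$ is differentiable at $a$; moreover, $h(x) = (a - a_j e_j) + x e_j$ is a constant plus a linear map, so $Dh(a_j)(x) = x e_j$, where $e_j$ is the $j$th standard basis vector. By the chain rule, $f_i \circ h$ is differentiable at $a_j$ with
$$D(f_i \circ h)(a_j) = Df_i(a) \circ Dh(a_j),$$
so by Lemma 1.3, $D_j f_i(a) = (f_i \circ h)'(a_j)$ exists and is the single entry of the matrix of $D(f_i \circ h)(a_j)$. Writing the matrix of $Df_i(a)$ as $\begin{pmatrix} y_1 & \cdots & y_n \end{pmatrix}$, this reads in matrix form
$$D_j f_i(a) = \begin{pmatrix} y_1 & \cdots & y_n \end{pmatrix} \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = y_j,$$
where the $1$ is in the $j$th place. Hence $D_j f_i(a)$ is the $j$th entry of the $1 \times n$ matrix of $Df_i(a)$, namely the $(i, j)$th entry of the matrix of $Df(a)$.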
Theorem 1.14. Multivariable Chain Rule: if $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is continuously differentiable at $a$ and $g : B \to \mathbb{R}$ with $B \subseteq \mathbb{R}^m$ is differentiable at $f(a)$, then for all integers $i$ with $1 \le i \le n$,
$$D_i(g \circ f)(a) = \sum_{j=1}^m D_j g(f(a)) D_i f_j(a).$$

Proof. By Theorem 1.10, $f$ is differentiable at $a$; thus, by the second premise and the chain rule, $g \circ f$ is differentiable at $a$ and
$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$
By Lemma 1.13, in matrix form,
$$\begin{aligned}
\begin{pmatrix} D_1(g \circ f)(a) & \cdots & D_n(g \circ f)(a) \end{pmatrix}
&= \begin{pmatrix} D_1 g(f(a)) & \cdots & D_m g(f(a)) \end{pmatrix}
\begin{pmatrix} D_1 f_1(a) & \cdots & D_n f_1(a) \\ \vdots & \ddots & \vdots \\ D_1 f_m(a) & \cdots & D_n f_m(a) \end{pmatrix} \\
&= \begin{pmatrix} \sum_{j=1}^m D_j g(f(a)) D_1 f_j(a) & \cdots & \sum_{j=1}^m D_j g(f(a)) D_n f_j(a) \end{pmatrix}.
\end{aligned}$$
Comparing the matrix entries yields the result.
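As a worked instance of the formula in Theorem 1.14, take the hypothetical example $f(r, \theta) = (r\cos\theta, r\sin\theta)$ and $g(x, y) = x^2 + y^2$, so that $g \circ f(r, \theta) = r^2$, and hence $D_1(g \circ f) = 2r$ and $D_2(g \circ f) = 0$. The sketch below evaluates the sum $\sum_j D_j g(f(a)) D_i f_j(a)$ from hand-computed partial derivatives (the function names are illustrative) and recovers these values.

```python
import math


def f_components(r, theta):
    # Hypothetical f(r, theta) = (r cos theta, r sin theta).
    return (r * math.cos(theta), r * math.sin(theta))


def partials_of_f(r, theta):
    # Hand-computed partials; entry [j][i] is D_i f_j (0-based indices).
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]


def partials_of_g(x, y):
    # Hand-computed partials D_j g(x, y) of g(x, y) = x^2 + y^2.
    return [2 * x, 2 * y]


def chain_rule_partial(i, r, theta):
    # D_i (g o f)(r, theta) = sum_j D_j g(f(r, theta)) * D_i f_j(r, theta).
    Dg = partials_of_g(*f_components(r, theta))
    Df = partials_of_f(r, theta)
    return sum(Dg[j] * Df[j][i] for j in range(2))


r, theta = 1.5, 0.7
# Since g o f (r, theta) = r^2, the exact partials are 2r and 0.
print(chain_rule_partial(0, r, theta), chain_rule_partial(1, r, theta))
```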

Lemma 1.15. If $\lambda : \mathbb{R}^n \to \mathbb{R}$ is a linear transformation, then for all $x \in \mathbb{R}^n$, $D_j \lambda(x) = \lambda(e_j)$, where $e_j$ is the $j$th standard basis vector of $\mathbb{R}^n$.
Proof. Since $he_j = (0, \ldots, h, \ldots, 0)$ with $h$ in the $j$th place, linearity gives
$$\begin{aligned}
D_j \lambda(x) &= \lim_{h \to 0} \frac{\lambda(x_1, \ldots, x_{j-1}, x_j + h, x_{j+1}, \ldots, x_n) - \lambda(x_1, \ldots, x_n)}{h} \\
&= \lim_{h \to 0} \frac{\lambda(x_1, \ldots, x_n) + \lambda(he_j) - \lambda(x_1, \ldots, x_n)}{h} \\
&= \lim_{h \to 0} \lambda(e_j) = \lambda(e_j).
\end{aligned}$$
Remark. In particular, every linear transformation is trivially smooth, since all of its first order partial derivatives are constant functions.
Theorem 1.16. If two functions $f : A \to \mathbb{R}$ and $g : B \to \mathbb{R}$ with $A, B \subseteq \mathbb{R}^n$ are $C^k$ functions, then $f + g$ and $fg$ are also $C^k$. Additionally, if $g(x) \neq 0$ for all $x \in A \cap B$, then $f/g$ is also $C^k$.
Proof. Recall the functions $s$, $p$, and $q$ from Theorem 1.6. Since $s$ is linear, Lemma 1.15 gives
$$D_1 s(x, y) = D_2 s(x, y) = 1.$$
Likewise, for $p$, we have
$$D_1 p(x, y) = \lim_{h \to 0} \frac{(x + h)y - xy}{h} = \lim_{h \to 0} \frac{xy + hy - xy}{h} = y;$$
similarly, one can show that $D_2 p(x, y) = x$. Taking the partial derivative of $q$, on the other hand, is equivalent to taking the ordinary derivative, so $D_1 q(x) = -1/x^2$ for nonzero $x$.
Let $1 \le i \le n$. Since $f$ and $g$ are continuously differentiable by Corollary 1.11, the multivariable chain rule yields for all $x \in A \cap B$ that
$$\begin{aligned}
D_i(f + g)(x) &= D_i(s \circ (f, g))(x) \\
&= D_1 s(f(x), g(x)) D_i f(x) + D_2 s(f(x), g(x)) D_i g(x) \\
&= D_i f(x) + D_i g(x), \\
D_i(fg)(x) &= D_i(p \circ (f, g))(x) \\
&= D_1 p(f(x), g(x)) D_i f(x) + D_2 p(f(x), g(x)) D_i g(x) \\
&= g(x) D_i f(x) + f(x) D_i g(x).
\end{aligned}$$
Moreover, if $g(x) \neq 0$, then $q$ is differentiable at $g(x)$; thus,
$$D_i(q \circ g)(x) = D_1 q(g(x)) D_i g(x) = -\frac{D_i g(x)}{g(x)^2},$$
which is clearly a continuous function; therefore, $q \circ g$ is continuously differentiable, and we obtain
$$\begin{aligned}
D_i(f/g)(x) &= D_i((q \circ g) f)(x) \\
&= (q \circ g)(x) D_i f(x) + f(x) D_i(q \circ g)(x) \\
&= \frac{D_i f(x)}{g(x)} - \frac{f(x) D_i g(x)}{g(x)^2} \\
&= \frac{g(x) D_i f(x) - f(x) D_i g(x)}{g(x)^2}.
\end{aligned}$$
Now that we have proven some of the classic rules of partial differentiation, we prove the theorem by induction on $k$.
Base case. Let $k = 1$; namely, $f$ and $g$ are continuously differentiable. For $a \in A \cap B$, we have
$$\lim_{x \to a} \big(D_i f(x) + D_i g(x)\big) = D_i f(a) + D_i g(a) = D_i(f + g)(a).$$
Thus, $f + g$ is $C^1$. Using similar reasoning together with the above product and quotient rules, one can show that $fg$, and $f/g$ if $g(x) \neq 0$ everywhere, are $C^1$.
Induction step. Assume the claim holds for some $k \ge 1$, and suppose that $f$ and $g$ are $C^{k+1}$ functions. For $1 \le i \le n$, this implies that $D_i f$ and $D_i g$ are $C^k$, which by the induction hypothesis further implies that $D_i f + D_i g$ is $C^k$; but this equals $D_i(f + g)$, so $f + g$ is $C^{k+1}$.
Next, by Corollary 1.11, $f$ and $g$ are themselves $C^k$, and by the induction hypothesis, sums, products, and quotients (with nonvanishing denominators) of $C^k$ functions are $C^k$. In similar fashion, one can show that $g D_i f + f D_i g$, which equals $D_i(fg)$, and the quotient $(g D_i f - f D_i g)/g^2$, which equals $D_i(f/g)$, are also $C^k$; therefore, $fg$ and $f/g$ are $C^{k+1}$.

Corollary 1.17. If $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ and $g : B \to \mathbb{R}$ with $B \subseteq \mathbb{R}^m$ are of class $C^k$ with $k \ge 1$, then the composition $g \circ f$ is $C^k$.

Proof. We proceed by induction on $k$.
Base case. Suppose that $k = 1$. Then $f$ is continuously differentiable, while $g$ is continuously differentiable and hence differentiable by Theorem 1.10; thus, the multivariable chain rule applies. Since each $D_j g$ is continuous at $f(a)$ and $f$ is continuous at $a$ by Theorem 1.4, for all $a \in \mathrm{dom}(g \circ f)$,
$$\begin{aligned}
\lim_{x \to a} D_i(g \circ f)(x) &= \sum_{j=1}^m \lim_{x \to a} D_j g(f(x)) \lim_{x \to a} D_i f_j(x) \\
&= \sum_{j=1}^m D_j g\Big(\lim_{x \to a} f(x)\Big) D_i f_j(a) \\
&= \sum_{j=1}^m D_j g(f(a)) D_i f_j(a) \\
&= D_i(g \circ f)(a)
\end{aligned}$$
for each $i$ with $1 \le i \le n$; thus, $g \circ f$ is $C^1$.
Induction step. Suppose that the corollary holds for some $k \ge 1$, and assume that $f$ and $g$ are $C^{k+1}$. Each first order partial derivative $D_j g$ with $1 \le j \le m$ is $C^k$, and $f$ is $C^k$ by Corollary 1.11, so the composition $D_j g \circ f$ is $C^k$ by the induction hypothesis. Each first order partial derivative $D_i f_j$ with $1 \le i \le n$ is also $C^k$; hence, by Theorem 1.16, the same is true of
$$D_i(g \circ f) = \sum_{j=1}^m (D_j g \circ f)\, D_i f_j,$$
which is a sum of products of these $C^k$ functions. Therefore, $g \circ f$ is $C^{k+1}$.

Definition 1.18. For any function $f : A \to \mathbb{R}^m$ where $A \subseteq \mathbb{R}^n$, the directional derivative $\nabla_a f : \mathbb{R}^n \to \mathbb{R}^m$ is defined for all interior points $a$ of $A$ by
$$\nabla_a f(v) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t},$$
if the limit exists.

Lemma 1.19. If $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at $a$, then $\nabla_a f(v)$ exists for every $v \in \mathbb{R}^n$ and $Df(a) = \nabla_a f$.

Proof. For all nonzero $v \in \mathbb{R}^n$, substituting $h = tv$ with a scalar variable $t$, we obtain
$$\begin{aligned}
0 &= \lim_{h \to 0} \frac{\|f(a + h) - f(a) - Df(a)(h)\|}{\|h\|}
= \lim_{t \to 0} \frac{\|f(a + tv) - f(a) - Df(a)(tv)\|}{\|tv\|} \\
&= \lim_{t \to 0} \frac{\|f(a + tv) - f(a) - t\,Df(a)(v)\|}{|t|\,\|v\|}
= \frac{1}{\|v\|} \lim_{t \to 0} \left\| \frac{f(a + tv) - f(a)}{t} - Df(a)(v) \right\|,
\end{aligned}$$
so that
$$\nabla_a f(v) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t} = Df(a)(v).$$
Although subtle, one must note that substituting $tv$ for $h$ is possible only because $a$ lies in the interior of $A$; otherwise, since the choice of $v$ is arbitrary, $a + tv$ might not fall in the domain of $f$.
If $v = 0$, then trivially $\nabla_a f(v) = Df(a)(v) = 0$. Thus, we conclude that $\nabla_a f = Df(a)$.
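Lemma 1.19 says the difference quotient of Definition 1.18 approaches $Df(a)(v)$. A minimal numerical sketch, assuming the hypothetical example $f(x, y) = (e^x y, x^2 + \sin y)$ with a hand-computed derivative matrix, compares the quotient at a small $t$ with the matrix-vector product $Df(a)(v)$; the two should agree up to an error of order $t$.

```python
import math


def f(x, y):
    # Hypothetical example f : R^2 -> R^2.
    return (math.exp(x) * y, x * x + math.sin(y))


def Df_matrix(x, y):
    # Hand-computed matrix of Df(x, y), i.e. its Jacobian.
    return [[math.exp(x) * y, math.exp(x)],
            [2 * x, math.cos(y)]]


def directional_quotient(a, v, t):
    # (f(a + t v) - f(a)) / t, the quotient in Definition 1.18.
    fa = f(*a)
    fat = f(a[0] + t * v[0], a[1] + t * v[1])
    return [(fat[i] - fa[i]) / t for i in range(2)]


a, v = (0.3, 1.2), (2.0, -1.0)
J = Df_matrix(*a)
Dfa_v = [J[i][0] * v[0] + J[i][1] * v[1] for i in range(2)]
approx = directional_quotient(a, v, 1e-7)
print(approx, Dfa_v)
```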

Theorem 1.20. If $A \subseteq \mathbb{R}^n$ is open and convex and $f : A \to \mathbb{R}$ is differentiable, then for any two points $a, b \in A$, there exists a point $s$ lying on the segment joining $a$ and $b$ such that
$$f(b) - f(a) = Df(s)(b - a).$$

Remark. This is the mean value theorem extended to higher dimensions.

Proof. Define the function $h : [0, 1] \to \mathbb{R}^n$ by $h(t) = (1 - t)a + tb$, and let $g = f \circ h$. Notice that $h$ parameterizes the line segment joining $a$ and $b$ for $t \in [0, 1]$; thus, since $A$ is convex, the range of $h$ lies entirely in $A$. Also, $h$ is differentiable on $(0, 1)$, so the chain rule shows that $g$ is differentiable there as well.
Since $f$ is differentiable, and hence continuous by Theorem 1.4, on all of $A$, the composition $g$ is continuous on $[0, 1]$. As a result, $g$ satisfies the hypotheses of the mean value theorem; therefore, for some $c \in (0, 1)$,
$$g'(c) = g(1) - g(0) = f(b) - f(a).$$
Let $s = (1 - c)a + cb$; then
$$\begin{aligned}
g'(c) &= \lim_{t \to 0} \frac{g(c + t) - g(c)}{t}
= \lim_{t \to 0} \frac{f((1 - (c + t))a + (c + t)b) - f((1 - c)a + cb)}{t} \\
&= \lim_{t \to 0} \frac{f((1 - c)a + cb + t(b - a)) - f(s)}{t}
= \lim_{t \to 0} \frac{f(s + t(b - a)) - f(s)}{t} \\
&= \nabla_s f(b - a) = Df(s)(b - a)
\end{aligned}$$
by Lemma 1.19, since $f$ is differentiable at $s$.
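The proof reduces the theorem to the single-variable function $g(c) = f((1 - c)a + cb)$. The sketch below follows that reduction for the hypothetical example $f(x, y) = x^2 + \sin y$ with $a = (0, 0)$ and $b = (1, 1)$: it locates a point $s = (1 - c)a + cb$ by bisecting on $c$ for a root of $Df((1 - c)a + cb)(b - a) - (f(b) - f(a))$. Bisection is an assumption of this sketch: it needs that expression to change sign on $[0, 1]$, which holds for this example but is not guaranteed in general.

```python
import math


def f(p):
    # Hypothetical differentiable f : R^2 -> R.
    x, y = p
    return x * x + math.sin(y)


def grad_f(p):
    # Hand-computed 1 x 2 matrix of Df(p).
    x, y = p
    return (2 * x, math.cos(y))


def mean_value_point(a, b, tol=1e-12):
    # phi(c) = Df(h(c))(b - a) - (f(b) - f(a)), where h(c) = (1 - c)a + cb.
    d = [bi - ai for ai, bi in zip(a, b)]
    target = f(b) - f(a)

    def phi(c):
        s = [(1 - c) * ai + c * bi for ai, bi in zip(a, b)]
        return sum(gj * dj for gj, dj in zip(grad_f(s), d)) - target

    lo, hi = 0.0, 1.0
    if phi(lo) * phi(hi) > 0:
        raise ValueError("phi does not change sign on [0, 1]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(lo) * phi(mid) <= 0:
            hi = mid
        else:
            lo = mid
    c = (lo + hi) / 2
    return [(1 - c) * ai + c * bi for ai, bi in zip(a, b)]


a, b = [0.0, 0.0], [1.0, 1.0]
s = mean_value_point(a, b)
lhs = f(b) - f(a)
rhs = sum(gj * (bi - ai) for gj, ai, bi in zip(grad_f(s), a, b))
print(s, lhs, rhs)
```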

Lemma 1.21. If a function $f : A \to \mathbb{R}$ with $A \subseteq \mathbb{R}^n$ has a local maximum (or minimum) at a point $a$ in the interior of $A$, and $D_i f(a)$ with $1 \le i \le n$ exists, then $D_i f(a) = 0$.

Proof. Let $g(x) = f(a_1, \ldots, a_{i-1}, x, a_{i+1}, \ldots, a_n)$. It is clear that $g$ has a local maximum (or minimum) at $a_i$, which is in the interior of the domain of $g$, and that $g$ is differentiable there, since $g'(a_i) = D_i f(a)$. Fermat's theorem from single-variable calculus implies that $g'(a_i) = 0$, and the result follows.

Lemma 1.22. If $f : A \to \mathbb{R}^n$ with $A \subseteq \mathbb{R}^n$ is continuously differentiable, then $g : A \to \mathbb{R}$ defined by $g(x) = \det J_f(x)$ is a continuous function.

Proof. By definition of the determinant, we have for each $x \in A$ that
$$\det J_f(x) = \sum_{\tau \in S_n} \operatorname{sgn}(\tau) \prod_{i=1}^n D_i f_{\tau(i)}(x),$$
so that for all $a \in A$,
$$\lim_{x \to a} \det J_f(x) = \sum_{\tau \in S_n} \operatorname{sgn}(\tau) \prod_{i=1}^n \lim_{x \to a} D_i f_{\tau(i)}(x) = \sum_{\tau \in S_n} \operatorname{sgn}(\tau) \prod_{i=1}^n D_i f_{\tau(i)}(a) = \det J_f(a),$$
by continuity of each first order partial derivative of $f$.
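The permutation sum used in this proof can be evaluated directly. The following sketch computes a determinant by that formula, taking the $(\tau(i), i)$ entry of the matrix for each $i$, just as $D_i f_{\tau(i)}(x)$ is the $(\tau(i), i)$ entry of $J_f(x)$; the sign of a permutation is computed from its number of inversions. The helper names are illustrative, and since the running time grows factorially in $n$, it is meant only for small matrices.

```python
from itertools import permutations
from math import prod


def sign(perm):
    # Sign of a permutation of 0..n-1, from the parity of its inversion count.
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def leibniz_det(M):
    # det M = sum over tau in S_n of sgn(tau) * prod_i M[tau(i)][i].
    n = len(M)
    return sum(sign(tau) * prod(M[tau[i]][i] for i in range(n))
               for tau in permutations(range(n)))


print(leibniz_det([[1, 2], [3, 4]]))                   # -2
print(leibniz_det([[2, 0, 1], [1, 3, 2], [1, 1, 2]]))  # 6
```

The same function also illustrates the fact, used later in the inverse function theorem, that a matrix and its transpose have the same determinant.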

Lemma 1.23. If $A \subseteq \mathbb{R}^n$ is an open ball, $f : A \to \mathbb{R}^n$ is continuously differentiable, and for some real $M$ all first order partial derivatives of each $f_i$ satisfy $|D_j f_i(s)| \le M$ for all $s \in A$, then
$$\|f(y) - f(x)\| \le n^2 M \|y - x\|$$
for all $x, y \in A$.
Proof. Let $i$ be an integer with $1 \le i \le n$. Since $f$ is continuously differentiable, it is differentiable on $A$ by Theorem 1.10, and so is $f_i$ by item 3 of Theorem 1.6. As $A$ is open and convex, Theorem 1.20 gives some $s \in A$ (depending on $i$) such that
$$\begin{aligned}
|f_i(y) - f_i(x)| &= |Df_i(s)(y - x)|
= \left| \begin{pmatrix} D_1 f_i(s) & \cdots & D_n f_i(s) \end{pmatrix} \begin{pmatrix} y_1 - x_1 \\ \vdots \\ y_n - x_n \end{pmatrix} \right| \\
&= \left| \sum_{j=1}^n D_j f_i(s)(y_j - x_j) \right|
\le \sum_{j=1}^n M|y_j - x_j| \le nM\|y - x\|,
\end{aligned}$$
since each $|y_j - x_j| \le \|y - x\|$. Thus,
$$\|f(y) - f(x)\| \le \sum_{i=1}^n |f_i(y) - f_i(x)| \le n^2 M \|y - x\|.$$
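A numerical sketch of Lemma 1.23, under illustrative assumptions: for the hypothetical map $f(x, y) = (\sin x \cos y, x/2 + \sin y)$, every $|D_j f_i|$ is at most $M = 1$ on all of $\mathbb{R}^2$, so for $n = 2$ the lemma bounds $\|f(y) - f(x)\|/\|y - x\|$ by $n^2 M = 4$ on any open ball. The code samples random pairs of points in $[-1, 1]^2$ (a square inside the open ball of radius $2$ about the origin) with a fixed seed and reports the largest ratio observed; the bound is typically far from tight.

```python
import math
import random


def f(p):
    # Hypothetical example; all first order partials are bounded by M = 1.
    x, y = p
    return (math.sin(x) * math.cos(y), x / 2 + math.sin(y))


def norm(v):
    # Euclidean norm.
    return math.sqrt(sum(t * t for t in v))


n, M = 2, 1.0
random.seed(0)
worst = 0.0
for _ in range(10000):
    # Points in [-1, 1]^2, which lies inside the open ball of radius 2.
    x = [random.uniform(-1, 1) for _ in range(n)]
    y = [random.uniform(-1, 1) for _ in range(n)]
    dist = norm([yi - xi for xi, yi in zip(x, y)])
    if dist == 0:
        continue
    gap = norm([fy - fx for fx, fy in zip(f(x), f(y))])
    worst = max(worst, gap / dist)
print(worst, n * n * M)  # largest observed ratio vs. the bound n^2 M
```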

Theorem 1.24. Inverse Function Theorem: if $f : A \to \mathbb{R}^n$ with $A \subseteq \mathbb{R}^n$ is continuously differentiable and $\det J_f(a) \neq 0$, then there exists an open set $V$ containing $a$ such that the restriction $f|_V$ has an open range $W$ and a differentiable inverse with derivative
$$D(f|_V^{-1})(y) = Df(f|_V^{-1}(y))^{-1}$$
for all $y \in W$.
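Proof. Since $f$ is continuously differentiable, it is differentiable by Theorem 1.10. For convenience, let $\lambda = Df(a)$; by Lemma 1.13, its matrix is $J_f(a)$, whose determinant is nonzero, so $\lambda$ has an inverse $\lambda^{-1}$, which is again linear. Consider the function $\lambda^{-1} \circ f$; one can show that it is continuously differentiable (see Lemma 1.15 and Corollary 1.17), and its derivative at $a$ turns out to be the identity transformation:
$$D(\lambda^{-1} \circ f)(a) = D(\lambda^{-1})(f(a)) \circ \lambda = \lambda^{-1} \circ \lambda = I.$$
Now, if the theorem holds for all continuously differentiable functions whose derivative at $a$ is the identity, then it holds for $\lambda^{-1} \circ f$; namely, for some open set $V$ containing $a$, the restriction $(\lambda^{-1} \circ f)|_V = \lambda^{-1} \circ f|_V$ has an open range $\lambda^{-1}(f(V))$ and a differentiable inverse $f|_V^{-1} \circ \lambda$, which for all $z \in \lambda^{-1}(f(V))$ satisfies
$$\begin{aligned}
D(f|_V^{-1} \circ \lambda)(z) &= D(\lambda^{-1} \circ f)\big(f|_V^{-1}(\lambda(z))\big)^{-1} \\
&= \Big(\lambda^{-1} \circ Df\big(f|_V^{-1}(\lambda(z))\big)\Big)^{-1} \\
&= Df\big(f|_V^{-1}(\lambda(z))\big)^{-1} \circ \lambda,
\end{aligned}$$
by the chain rule. The map $f|_V^{-1} = (f|_V^{-1} \circ \lambda) \circ \lambda^{-1}$ is differentiable, being a composition of differentiable maps, so applying the chain rule to the left side as well, we get
$$D(f|_V^{-1} \circ \lambda)(z) = D(f|_V^{-1})(\lambda(z)) \circ \lambda;$$
therefore, composing both expressions with $\lambda^{-1}$ on the right,
$$D(f|_V^{-1})(\lambda(z)) = Df\big(f|_V^{-1}(\lambda(z))\big)^{-1}.$$
Moreover, $f(V) = \lambda(\lambda^{-1}(f(V)))$ is open, being the preimage of the open set $\lambda^{-1}(f(V))$ under the continuous map $\lambda^{-1}$.
Now, let $y \in f(V)$. Since $\lambda$ is a bijection, $\lambda(z) = y$ for some $z \in \lambda^{-1}(f(V))$; thus,
$$D(f|_V^{-1})(y) = D(f|_V^{-1})(\lambda(z)) = Df\big(f|_V^{-1}(\lambda(z))\big)^{-1} = Df\big(f|_V^{-1}(y)\big)^{-1};$$
in other words, the theorem holds for $f$.
Note that all of this assumes that the theorem holds for all continuously differentiable functions whose derivative at $a$ is the identity, which we have yet to prove. Essentially, we have only reduced the problem to a less cumbersome one.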
Proceeding further, assume from this point onward that $\lambda = Df(a)$ is the identity $I$. Since
$$\lim_{h \to 0} \frac{\|f(a + h) - f(a) - \lambda(h)\|}{\|h\|} = 0,$$
we know that for all nonzero $h$ sufficiently close to $0$,
$$\frac{\|f(a + h) - f(a) - \lambda(h)\|}{\|h\|} < 1,$$
so we cannot have $f(a + h) = f(a)$, since otherwise the left side would evaluate to exactly $\|\lambda(h)\|/\|h\| = \|h\|/\|h\| = 1$. Therefore, for some closed ball $U \subseteq A$ containing $a$ in its interior,
$$f(x) \neq f(a) \quad \text{if } x \in U \text{ and } x \neq a. \tag{9}$$
Since $f$ is continuously differentiable, Lemma 1.22 shows that, provided $U$ is small enough, all $x \in U$ satisfy
$$|\det J_f(x) - \det J_f(a)| = |\det J_f(x) - 1| < 1,$$
because $J_f(a)$ is the matrix of the identity transformation $\lambda$, whose determinant is exactly $1$. It follows that
$$\det J_f(x) \neq 0 \quad \text{for all } x \in U. \tag{10}$$
Shrinking $U$ further if necessary, the continuity of the first order partial derivatives of $f$ at $a$ gives
$$|D_j f_i(x) - D_j f_i(a)| < \frac{1}{2n^2} \quad \text{for all } i, j, \text{ and } x \in U.$$
Let $g : \operatorname{int}(U) \to \mathbb{R}^n$ be defined by $g(x) = f(x) - \lambda(x)$. By Lemma 1.15, $D_j g_i(x) = D_j f_i(x) - \lambda_i(e_j)$, where $\lambda_i(e_j)$ is the $(i, j)$th entry of the matrix of $\lambda$, that is, of $J_f(a)$; therefore,
$$|D_j g_i(x)| = |D_j f_i(x) - D_j f_i(a)| < \frac{1}{2n^2},$$
and so, since $g$ is continuously differentiable on the open ball $\operatorname{int}(U)$, Lemma 1.23 (with $M = 1/(2n^2)$) gives, for all $x_1, x_2 \in \operatorname{int}(U)$,
$$\|f(x_1) - x_1 - (f(x_2) - x_2)\| \le \frac{1}{2}\|x_1 - x_2\|.$$
By the reverse triangle inequality,
$$\|x_1 - x_2\| - \|f(x_1) - f(x_2)\| \le \frac{1}{2}\|x_1 - x_2\|,$$
and so
$$\|x_1 - x_2\| \le 2\|f(x_1) - f(x_2)\| \quad \text{for all } x_1, x_2 \in \operatorname{int}(U). \tag{11}$$
By Corollary ??, $f(\partial U)$ is compact, and by (9) it does not contain $f(a)$; therefore, for some $d > 0$, we have $\|f(x) - f(a)\| \ge d$ for all $x \in \partial U$.

Let $W = \{y \in \mathbb{R}^n : \|y - f(a)\| < d/2\}$; observe that $W$ is an open ball. If $x \in \partial U$ and $y \in W$, we have
$$d \le \|f(x) - f(a)\| \le \|y - f(x)\| + \|y - f(a)\| < \|y - f(x)\| + \frac{d}{2},$$
and so $d/2 < \|y - f(x)\|$; therefore,
$$\|y - f(a)\| < \|y - f(x)\| \quad \text{for all } x \in \partial U \text{ and } y \in W. \tag{12}$$
Now, let $y \in W$ and define $g_y : U \to \mathbb{R}$ by
$$g_y(x) = \|y - f(x)\|^2 = \sum_{i=1}^n (y_i - f_i(x))^2.$$
Clearly, $g_y$ is continuous on the closed and bounded set $U$, and therefore attains a minimum by Corollary ??. Moreover, if $x \in \partial U$, then $g_y(a) < g_y(x)$ by (12), so the minimum cannot occur on $\partial U$. By Lemma 1.21, this means that for some $x$ in the interior of $U$, we have for all $j$ that
$$D_j g_y(x) = -\sum_{i=1}^n 2(y_i - f_i(x)) D_j f_i(x) = 0.$$
Though subtle, this system can be written in matrix form:
$$\begin{pmatrix} D_1 f_1(x) & \cdots & D_1 f_n(x) \\ \vdots & \ddots & \vdots \\ D_n f_1(x) & \cdots & D_n f_n(x) \end{pmatrix} \begin{pmatrix} 2(y_1 - f_1(x)) \\ \vdots \\ 2(y_n - f_n(x)) \end{pmatrix} = 0.$$
Examined more carefully, the $n \times n$ matrix shown is the transpose of $J_f(x)$. Recall that the determinant of the transpose equals that of the original matrix; therefore, by (10), the above matrix has an inverse, which, when applied to both sides, yields
$$\begin{pmatrix} 2(y_1 - f_1(x)) \\ \vdots \\ 2(y_n - f_n(x)) \end{pmatrix} = 0;$$
therefore, $y = f(x)$. Now, suppose that $f(x_1) = f(x_2)$ for $x_1, x_2 \in \operatorname{int}(U)$. From (11), we immediately obtain that $x_1 = x_2$.
Now, let V = int(U ) ∩ f −1 (W ). What we’ve essentially shown is that f |V
is a bijection and thus has an inverse, specifically with a range W . Moreover,
f is continuous and W is an open ball, and so f −1 (W ) equals B ∩ A for
some open set B. Since int(U ) ∩ B = (int(U ) ∩ A) ∩ B, we obtain that V is
open by associativity of set intersection. One can also verify that a ∈ V by
construction of W . Proceeding further, we can rewrite (11) as

kf |−1 −1
V (y1 ) − f |V (y2 )k ≤ 2ky1 − y2 k for all y1 , y2 ∈ W ; (13)

in other words, if we let  > 0 and δ = /2, then ky1 − y2 k < δ implies that
kf |−1 −1 −1
V (y1 ) − f |V (y2 )k < , namely that f |V is continuous. Fix x1 , x2 ∈ V
and let µ = Df (x1 ); define a function ψ by

ψ(x) = f |V (x1 + x) − f |V (x1 ) − µ(x)

for all x such that x1 + x ∈ V . Clearly, x2 − x1 satisfies this condition, and


so by plugging x = x2 − x1 we obtain

ψ(x2 − x1 ) = f |V (x2 ) − f |V (x1 ) − µ(x2 − x1 );

also, by differentiability of f ,

kψ(x − x1 )k
lim = 0. (14)
x→x1 kx − x1 k

By (10), µ is invertible, and so the former equation can be rewritten as

µ−1 (ψ(x2 − x1 )) = x1 − x2 + µ−1 (f |V (x2 ) − f |V (x1 )).

Moreover, for all y, t ∈ W , this translates to

µ−1 (ψ(f |−1 −1 −1 −1 −1


V (t) − f |V (y))) = f |V (y) − f |V (t) + µ (t − y)

where µ−1 = Df (f |−1 V (y))


−1
by definition of µ. Imposing an additional re-
striction that t 6= y, we have

kf |−1 −1 −1
V (t) − f |V (y) − µ (t − y)k kµ−1 (ψ(f |−1 −1
V (t) − f |V (y)))k
0≤ =
kt − yk kt − yk
M kψ(f |V (t) − f |−1
−1
V (y))k

kt − yk

23
for some large enough M > 0. Continuing further,
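$$\begin{aligned}
\frac{M\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|t - y\|}
&= \frac{M\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|f|_V^{-1}(t) - f|_V^{-1}(y)\|} \cdot \frac{\|f|_V^{-1}(t) - f|_V^{-1}(y)\|}{\|t - y\|} \\
&\le \frac{2M\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|f|_V^{-1}(t) - f|_V^{-1}(y)\|},
\end{aligned}$$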

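where dividing by $\|f|_V^{-1}(t) - f|_V^{-1}(y)\|$ is legitimate because $f|_V^{-1}(t) \neq f|_V^{-1}(y)$ whenever $t \neq y$ (by contraposition: applying $f$ to both sides of $f|_V^{-1}(t) = f|_V^{-1}(y)$ gives $t = y$), and (13) was used to derive the last step. Since $f|_V^{-1}$ is continuous, $f|_V^{-1}(t) \to f|_V^{-1}(y)$ as $t \to y$, so (14), applied with $x_1 = f|_V^{-1}(y)$ and $x = f|_V^{-1}(t)$, gives

$$\lim_{t \to y} \frac{\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|f|_V^{-1}(t) - f|_V^{-1}(y)\|} = 0.$$

By the squeeze theorem, the quotient $\|f|_V^{-1}(t) - f|_V^{-1}(y) - \mu^{-1}(t - y)\| / \|t - y\|$ therefore also tends to $0$ as $t \to y$. In conclusion, $f|_V^{-1}$ is differentiable at $y$ with derivative

$$\mu^{-1} = Df(f|_V^{-1}(y))^{-1}$$

for all $y \in W$.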
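As a numerical sanity check of the formula $D(f|_V^{-1})(y) = Df(f|_V^{-1}(y))^{-1}$, the Python sketch below uses a hypothetical map $f(x_1, x_2) = (x_1 + \tfrac12\sin x_2,\ x_2 + \tfrac12\sin x_1)$, chosen only because $\det J_f(x) = 1 - \tfrac14\cos x_1\cos x_2 \ge \tfrac34$ never vanishes. It approximates $f^{-1}$ with Newton's method (a standard numerical technique, not the minimization argument used in the proof above), estimates $D(f^{-1})(y)$ by central differences, and compares the result with the inverse of the Jacobian at $f^{-1}(y)$. All helper names are illustrative assumptions.

```python
import math

# Hypothetical example map (not from the notes): f(x1, x2) = (x1 + sin(x2)/2, x2 + sin(x1)/2).
# Its Jacobian determinant 1 - cos(x1)cos(x2)/4 is at least 3/4, so it never vanishes.
def f(x):
    x1, x2 = x
    return (x1 + 0.5 * math.sin(x2), x2 + 0.5 * math.sin(x1))

def jacobian(x):
    """Jf(x) as a list of rows; entry [i][j] is D_{j+1} f_{i+1}(x)."""
    x1, x2 = x
    return [[1.0, 0.5 * math.cos(x2)],
            [0.5 * math.cos(x1), 1.0]]

def inv2(m):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product for a 2x2 matrix."""
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))

def f_inverse(y, x0=(0.0, 0.0), steps=50):
    """Approximate f^{-1}(y) with Newton's method: x <- x - Jf(x)^{-1} (f(x) - y)."""
    x = x0
    for _ in range(steps):
        residual = tuple(fi - yi for fi, yi in zip(f(x), y))
        step = matvec(inv2(jacobian(x)), residual)
        x = tuple(xi - si for xi, si in zip(x, step))
    return x

def inverse_derivative_fd(y, h=1e-6):
    """Central-difference estimate of D(f^{-1})(y), returned as a list of rows."""
    columns = []
    for j in range(2):
        shift = [0.0, 0.0]
        shift[j] = h
        plus = f_inverse((y[0] + shift[0], y[1] + shift[1]))
        minus = f_inverse((y[0] - shift[0], y[1] - shift[1]))
        columns.append([(p - q) / (2 * h) for p, q in zip(plus, minus)])
    return [[columns[j][i] for j in range(2)] for i in range(2)]

y = (0.3, -0.2)
x = f_inverse(y)
predicted = inv2(jacobian(x))         # Df(f^{-1}(y))^{-1}
estimated = inverse_derivative_fd(y)  # D(f^{-1})(y), estimated numerically
max_error = max(abs(predicted[i][j] - estimated[i][j]) for i in range(2) for j in range(2))
print(max_error)
```

The two matrices should agree up to a small finite-difference error; this illustrates, but does not prove, the conclusion of the theorem.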
Corollary 1.25. If $f : A \to \mathbb{R}^n$ with $A \subseteq \mathbb{R}^n$ is a continuously differentiable injective function and $\det J_f(x) \neq 0$ everywhere, then $f$ is an open map with a differentiable inverse.
Proof. Since $A$ is open, as implied by the continuous differentiability of $f$, every set in the subspace topology on $A$ is open in $\mathbb{R}^n$. Let $B$ be such a set and let $y \in f(B)$, so that $f(x) = y$ for some $x \in B$. By the inverse function theorem applied to $f|_B$, there exists an open set $V \subseteq B$ containing $x$ such that $f(V)$ is open; since $y \in f(V) \subseteq f(B)$, every point of $f(B)$ is an interior point. Hence $f(B)$ is open, and $f$ is therefore an open map.
Since $f$ is injective, it has an inverse $f^{-1}$ with domain $f(A)$. Moreover, for every $y \in f(A)$, there exists an open set $V$ containing $f^{-1}(y)$ such that $f|_V$ has a differentiable inverse whose domain $W$ is open and contains $y$. In fact, this inverse is exactly $f^{-1}|_W$: for each $y' \in W$, the point $f|_V^{-1}(y')$ satisfies $f(f|_V^{-1}(y')) = y'$, and by injectivity $f^{-1}(y')$ is the only point of $A$ with this property, so $f|_V^{-1}(y') = f^{-1}(y')$. Thus $f^{-1}$ agrees with a differentiable function on an open set containing $y$, which suffices to show that $f^{-1}$ is differentiable everywhere on its domain.
Theorem 1.26. Implicit Function Theorem: if $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^{m+n}$ is continuously differentiable, $f(a, b) = 0$ for some $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, and the matrix

$$M = \begin{pmatrix} D_{n+1} f_1(a, b) & \cdots & D_{n+m} f_1(a, b) \\ \vdots & \ddots & \vdots \\ D_{n+1} f_m(a, b) & \cdots & D_{n+m} f_m(a, b) \end{pmatrix}$$

has a nonzero determinant, then there exist an open set $B \subseteq \mathbb{R}^n$ containing $a$ and a unique differentiable function $g : B \to \mathbb{R}^m$ such that $g(a) = b$ and $f(x, g(x)) = 0$ for all $x \in B$.

Remark. Be mindful that $(a, b)$, for the purposes of the proof, should be taken not as an ordered pair but as notational shorthand for $(a_1, \ldots, a_n, b_1, \ldots, b_m)$. The same applies to the “product” $U \times V$, whose points, strictly speaking, are likewise obtained by appending the coordinates of $V$’s elements to those of $U$, in apparent deviation from the usual Cartesian product.
Proof. Define a function $F : A \to \mathbb{R}^{m+n}$ by $F(x, y) = (x, f(x, y))$ with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. More precisely, $F = (\pi_1, \ldots, \pi_n, f_1, \ldots, f_m)$, and so one can easily observe that $F$ is continuously differentiable. We also have, whenever $1 \le j \le n + m$ and $1 \le i \le n$,

$$D_j F_i(a, b) = \begin{cases} 0 & i \neq j, \\ 1 & i = j; \end{cases}$$

therefore, expanding the determinant along the first $n$ rows of $J_F(a, b)$,

$$\det(J_F(a, b)) = D_1 F_1(a, b) \cdots D_n F_n(a, b) \det(M) = \det(M) \neq 0,$$

since $D_{n+j} F_{n+i}(a, b) = D_{n+j} f_i(a, b)$ whenever $1 \le i, j \le m$.
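Written out in block form (the name $C$ for the lower-left block is introduced here only for display), the Jacobian of $F$ at $(a, b)$ is block lower triangular, which is why only $\det(M)$ survives:

$$J_F(a, b) = \begin{pmatrix} I_n & 0 \\ C & M \end{pmatrix}, \qquad C = \bigl(D_j f_i(a, b)\bigr)_{1 \le i \le m,\ 1 \le j \le n},$$

$$\det(J_F(a, b)) = \det(I_n)\det(M) = \det(M).$$

Here $I_n$ is the $n \times n$ identity matrix, the $0$ block is $n \times m$, and $M$ is the $m \times m$ matrix from the statement of the theorem.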


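We see that the inverse function theorem applies to $F$; namely, there exists an open set $U \times V$ containing $(a, b)$, where $U$ and $V$ are open in $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, such that the restriction $F|_{U \times V}$ has an open range $W$ and a differentiable inverse $h$. Although the inverse function theorem does not prescribe any particular form for the open set containing $(a, b)$, for the purposes of this proof we restrict it to such a product $U \times V$. This is harmless: the open set contains a product of open balls centered at $a$ and $b$, and since the inverse supplied by the theorem is continuous on an open set, the preimage of $U \times V$ under it, which is exactly $F(U \times V)$, is again open.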
Proceeding further, for all $(x, y) \in W$, we have $h(x, y) = (h_1(x, y), h_2(x, y))$ for some differentiable $h_1 : W \to U$ and $h_2 : W \to V$ that denote the first $n$ and the last $m$ component functions of $h$, respectively; thus,

$$\begin{aligned}
(x, y) &= F(h(x, y)) \\
&= F(h_1(x, y), h_2(x, y)) \\
&= (h_1(x, y), f(h_1(x, y), h_2(x, y)));
\end{aligned}$$

therefore,

$$x = h_1(x, y), \tag{15}$$

$$y = f(h_1(x, y), h_2(x, y)) = f(x, h_2(x, y)). \tag{16}$$

Define $G : \mathbb{R}^n \to \mathbb{R}^{n+m}$ by $G(x) = (x, 0)$. Clearly, $G$ is continuous; therefore, $G^{-1}(W)$ is open in $\mathbb{R}^n$. Moreover, $G^{-1}(W)$ is nonempty, as it contains $a$: since $(a, b) \in U \times V$, we have $F(a, b) = (a, f(a, b)) = (a, 0) \in W$. We can therefore take $B \subseteq G^{-1}(W)$ to be any connected open subset containing $a$ (for instance, an open ball centered at $a$). Observe that $B \subseteq U$ by (15), since for each $x \in B$ we have $(x, 0) \in W$, hence $x = h_1(x, 0) \in U$.
Now, define $g : B \to \mathbb{R}^m$ by $g(x) = h_2(x, 0)$. Since $F(a, b) = (a, 0)$ and $h$ inverts $F|_{U \times V}$, we have $h(a, 0) = (a, b)$, and so $g(a) = b$. Furthermore, if $x \in B$, then $(x, 0) \in W$; thus, $f(x, g(x)) = 0$ by (16). The function $g$ is also differentiable by the chain rule, since $g = (h_2 \circ G)|_B$ and $G$ is linear, hence differentiable. Moreover, by construction, the range of $g$ lies entirely in $V$.
We shall now prove that $g$ is unique. Suppose that $g'$ is a function satisfying the same conditions that the theorem specifies for $g$. We want to prove that the set of all points in $B$ at which $g'$ agrees with $g$ is, in fact, open. Suppose further that $g(a') = g'(a')$ for some $a' \in B$. Consequently, $g'$ maps $a'$ into $V$, since $g(B) \subseteq V$; therefore, because $g'$ is differentiable and thus continuous, there exists an open ball $B'$ centered at $a'$ whose image under $g'$ lies entirely in $V$, and since $B$ is open we may shrink $B'$ so that $B' \subseteq B$. Then $B' \subseteq B \subseteq U$, and so $(x, g'(x)) \in U \times V$ for all $x \in B'$. We also have $f(x, g'(x)) = 0$ by assumption, and so

$$\begin{aligned}
F(x, g'(x)) = (x, 0) \in W \implies (x, g'(x)) &= h(x, 0) = (h_1(x, 0), h_2(x, 0)) \\
&= (x, h_2(x, 0)) \\
&= (x, g(x));
\end{aligned}$$

therefore, $g'$ agrees with $g$ on $B'$. This suffices to show that the subset of $B$ on which $g$ agrees with $g'$ is open. Its complement in $B$, namely the set of all points of $B$ where $g'$ and $g$ disagree, is also open: if $g(x) \neq g'(x)$, then by continuity of $g$ and $g'$ there is an open ball around $x$ on which $g - g'$ does not vanish. Thus $B$ is the disjoint union of these two open sets, and since $B$ is connected, the set of disagreement must be either the empty set or $B$ itself.
The final piece is the fact that $g'(a) = g(a) = b$ by definition; in other words, the set of disagreement cannot be all of $B$, since $a$ belongs to $B$ but not to this set. Therefore, it is empty, and so $g = g'$.
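For a concrete instance, the Python sketch below takes the hypothetical choice $n = m = 1$, $f(x, y) = x^2 + y^2 - 1$ and $(a, b) = (0.6, 0.8)$, so that $M = D_2 f(a, b) = 2b = 1.6 \neq 0$. Evaluating $g(x) = h_2(x, 0)$ amounts to solving $f(x, y) = 0$ for $y$ near $b$; the sketch does this with Newton's method in $y$ started at $b$ (a numerical stand-in for $h$, not part of the proof) and compares the result with the branch $\sqrt{1 - x^2}$ passing through $(a, b)$.

```python
import math

# Hypothetical example (not from the notes): n = m = 1, f(x, y) = x^2 + y^2 - 1,
# (a, b) = (0.6, 0.8), so M = D_2 f(a, b) = 2b = 1.6 is nonzero.
def f(x, y):
    return x * x + y * y - 1.0

def d2f(x, y):
    """Partial derivative of f with respect to its second argument y."""
    return 2.0 * y

a, b = 0.6, 0.8

def g(x, steps=60):
    """Approximate g(x) = h_2(x, 0) by solving f(x, y) = 0 for y with Newton's method from y = b."""
    y = b
    for _ in range(steps):
        y -= f(x, y) / d2f(x, y)
    return y

for x in (0.3, 0.5, 0.6, 0.7, 0.9):
    # Columns: x, numerical g(x), closed-form branch sqrt(1 - x^2), residual f(x, g(x))
    print(x, g(x), math.sqrt(1.0 - x * x), f(x, g(x)))
```

The other branch, $-\sqrt{1 - x^2}$, also satisfies $f(x, y) = 0$ but not $g(a) = b$, which is consistent with the uniqueness part of the theorem.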

Theorem 1.27. For every continuously differentiable function $f : A \to \mathbb{R}^p$ with $A \subseteq \mathbb{R}^n$, where $n \ge p$, if $f(a) = 0$ and $J_f(a)$ has rank $p$, then there exist open sets $U, V \subseteq \mathbb{R}^n$ with $a \in U$, and a differentiable function $h : V \to U$ with a differentiable inverse, such that

$$f(h(x_1, \ldots, x_n)) = (x_{n-p+1}, \ldots, x_n)$$

for all $x = (x_1, \ldots, x_n) \in V$.
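A small instance may help in reading the statement. Take the hypothetical case $n = 2$, $p = 1$, $f(x_1, x_2) = x_1^2 + x_2^2 - 1$ and $a = (1, 0)$, where $J_f(a) = (2\ \ 0)$ has rank $1$. One valid choice, worked out by hand for this example and not taken from the notes, is $U = \{x \in \mathbb{R}^2 : x_1 > 0\}$, $V = \{(u, v) : 1 + v - u^2 > 0\}$ and $h(u, v) = (\sqrt{1 + v - u^2},\ u)$, whose inverse is $h^{-1}(x_1, x_2) = (x_2,\ f(x_1, x_2))$; then $f(h(u, v)) = v$, the last $p = 1$ coordinate. The Python sketch below checks this at a few points of $V$.

```python
import math

# Hypothetical instance of Theorem 1.27 (not from the notes): n = 2, p = 1,
# f(x1, x2) = x1^2 + x2^2 - 1 and a = (1, 0).
def f(x1, x2):
    return x1 * x1 + x2 * x2 - 1.0

def h(u, v):
    """h : V -> U with V = {(u, v) : 1 + v - u^2 > 0} and U = {(x1, x2) : x1 > 0}."""
    return (math.sqrt(1.0 + v - u * u), u)

def h_inverse(x1, x2):
    """Inverse of h on U: (x1, x2) -> (x2, f(x1, x2))."""
    return (x2, f(x1, x2))

points = [(0.0, 0.0), (0.5, -0.2), (-1.0, 0.3), (2.0, 4.0)]  # all satisfy 1 + v - u^2 > 0
for u, v in points:
    x = h(u, v)
    # f(h(u, v)) should reproduce v, and h_inverse should recover (u, v)
    print((u, v), x, f(*x), h_inverse(*x))
```

Note that $h(0, 0) = (1, 0) = a$, so $a \in U$ as the theorem requires.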
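Proof. Since $J_f(a)$ has rank $p$, it has $p$ linearly independent columns, so there exist $p$ distinct integers $j_1, \ldots, j_p \in \{1, \ldots, n\}$ such that the $p \times p$ matrix

$$M = \begin{pmatrix} D_{j_1} f_1(a) & \cdots & D_{j_p} f_1(a) \\ \vdots & \ddots & \vdots \\ D_{j_1} f_p(a) & \cdots & D_{j_p} f_p(a) \end{pmatrix}$$

has a nonzero determinant. Since reordering the columns of $M$ changes its determinant only by a sign, we may also label the $j_k$ so that $j_k = n - p + k$ whenever $j_k \ge n - p + 1$; this makes the transpositions below pairwise disjoint (those with $j_k = n - p + k$ being trivial). We define a permutation $\tau \in S_n$, in cycle notation, by

$$\tau = (j_1\ \ n-p+1)(j_2\ \ n-p+2)\cdots(j_p\ \ n)$$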
and the permutation matrix

$$G = \begin{pmatrix} e_{\tau(1)} \\ \vdots \\ e_{\tau(n)} \end{pmatrix},$$

whose $i$th row is the standard basis vector $e_{\tau(i)}$, defining a linear transformation $g : \mathbb{R}^n \to \mathbb{R}^n$. Observe that

$$g(x_1, \ldots, x_n) = (x_{\tau(1)}, \ldots, x_{\tau(n)}),$$

which amounts to swapping the $j_k$th coordinate with the $(n - p + k)$th for each $k$ with $1 \le k \le p$. Since permutation matrices are invertible, let $a' = g^{-1}(a)$, and consider the continuously differentiable function $f \circ g$, defined on the open set $g^{-1}(A)$. By the multivariable chain rule, we have
$$\begin{aligned}
D_{n-p+k}(f_i \circ g)(a') &= \sum_{j=1}^{n} D_j f_i(g(a'))\, D_{n-p+k}\, g_j(a') \\
&= \sum_{j=1}^{n} D_j f_i(a)\, g_j(e_{n-p+k}) \\
&= D_{j_k} f_i(a),
\end{aligned}$$
where the second equality uses $g(a') = a$ and the linearity of $g$, and the third uses $g_j(e_{n-p+k}) = 1$ if $j = j_k$ and $0$ otherwise. This implies that the matrix

$$\begin{pmatrix} D_{n-p+1}(f \circ g)_1(a') & \cdots & D_n (f \circ g)_1(a') \\ \vdots & \ddots & \vdots \\ D_{n-p+1}(f \circ g)_p(a') & \cdots & D_n (f \circ g)_p(a') \end{pmatrix}$$

is exactly identical to $M$ and thus also has a nonzero determinant. Just like with the previous theorem, define $F : g^{-1}(A) \to \mathbb{R}^n$ by $F(x, y) = (x, (f \circ g)(x, y))$ with $x \in \mathbb{R}^{n-p}$ and $y \in \mathbb{R}^p$. Using arguments similar to those in the previous theorem, $F$ is continuously differentiable and $\det(J_F(a')) \neq 0$, and so by the inverse function theorem, there exists an open set $U' \subseteq \mathbb{R}^n$ containing $a'$ such that the restriction $F|_{U'}$ has an open range $V$ and a differentiable inverse $k$, which (as in the derivation of (16)) satisfies

$$y = (f \circ g)(k(x, y))$$

for all $(x, y) \in V$; in other words,

$$((f \circ g) \circ k)(x_1, \ldots, x_n) = (x_{n-p+1}, \ldots, x_n).$$

We thus define $U$ as the preimage of $U'$ under $g^{-1}$, that is, $U = g(U')$; it is open, since $g^{-1}$ is continuous, and it contains $g(a') = a$. The differentiable function $h = g \circ k$ then has domain $V$ and range $U$, and it is invertible with differentiable inverse $h^{-1} = F|_{U'} \circ g^{-1}$. Finally, $f \circ h = (f \circ g) \circ k$, so the equation in the theorem statement follows immediately from the previous display.
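The permutation step of this proof can be checked mechanically. The Python sketch below, assuming a hypothetical $2 \times 4$ Jacobian $J_f(a)$, builds $\tau$ from chosen column indices, forms the permutation matrix $G$ whose $i$th row is $e_{\tau(i)}$, and checks that the last $p$ columns of $J_f(a)\,G$ (which equals $J_{f \circ g}(a')$ by the chain rule, since $g$ is linear and $g(a') = a$) coincide with $M$. Before building $\tau$, it reorders the indices so that any $j_k$ already in $\{n-p+1, \ldots, n\}$ equals $n - p + k$; this keeps the transpositions disjoint, so that $\tau$ really swaps $j_k$ with $n - p + k$. The function names are illustrative, and indices are 1-based to match the notes.

```python
# Illustrative check of the permutation step (hypothetical data, 1-based indices as in the notes).

def reorder(js, n, p):
    """Place any j already in {n-p+1, ..., n} at position k with j = n-p+k; the rest fill the gaps."""
    slots = [None] * p
    rest = []
    for j in js:
        if j >= n - p + 1:
            slots[j - (n - p + 1)] = j
        else:
            rest.append(j)
    remaining = iter(rest)
    return [s if s is not None else next(remaining) for s in slots]

def make_tau(js, n, p):
    """tau = (j_1 n-p+1)(j_2 n-p+2)...(j_p n), assuming the transpositions are disjoint."""
    tau = {i: i for i in range(1, n + 1)}
    for k, j in enumerate(js, start=1):
        t = n - p + k
        tau[j], tau[t] = t, j
    return tau

def permutation_matrix(tau, n):
    """G with i-th row e_{tau(i)}, so that (G x)_i = x_{tau(i)}."""
    return [[1 if c == tau[i] else 0 for c in range(1, n + 1)] for i in range(1, n + 1)]

def mat_mul(A, B):
    """Plain matrix product of lists of rows."""
    return [[sum(A[r][i] * B[i][c] for i in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

n, p = 4, 2
J = [[1, 2, 0, 3],
     [0, 1, 0, 1]]              # hypothetical Jf(a), a p x n matrix of rank p
js = reorder([4, 1], n, p)      # columns 4 and 1 form a minor with determinant -1
tau = make_tau(js, n, p)
G = permutation_matrix(tau, n)
JG = mat_mul(J, G)              # chain rule: J_{f o g}(a') = Jf(a) G
last_cols = [row[n - p:] for row in JG]
M = [[row[j - 1] for j in js] for row in J]
print(js, last_cols, M)         # prints [1, 4] [[1, 3], [0, 1]] [[1, 3], [0, 1]]
```

Here the chosen columns are $4$ and $1$; after reordering they become $j_1 = 1$ and $j_2 = 4$, so $\tau$ is the single transposition $(1\ \ 3)$, and both printed matrices equal $\begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}$.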
