Marcus Tornea
August 5, 2020
1 Multivariable Calculus
Definition 1.1. A function $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at a point $a \in A$ iff $a$ is in the interior of $A$ and there exists a linear transformation $\lambda : \mathbb{R}^n \to \mathbb{R}^m$ such that
\[
\lim_{h \to 0} \frac{\|f(a + h) - f(a) - \lambda(h)\|}{\|h\|} = 0.
\]
The function $\lambda$ is denoted $Df(a)$ and is called the derivative of $f$ at $a$. The function $f$ is simply called differentiable iff $Df(a)$ exists for all $a \in A$.
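As an illustrative numerical check of Definition 1.1 (not part of the original notes), the sketch below evaluates the quotient $\|f(a+h) - f(a) - \lambda(h)\| / \|h\|$ for shrinking $h$. The sample map `f`, the point `a`, and the helper names `candidate_derivative` and `quotient` are assumptions chosen for illustration only.

```python
import math

def f(x, y):
    # Sample map f : R^2 -> R^2 (an assumption, chosen for illustration).
    return (x * x, x * y)

def candidate_derivative(a, h):
    # Linear map h -> J h, where J = [[2*a1, 0], [a2, a1]] is the
    # Jacobian matrix of the sample f at a = (a1, a2).
    a1, a2 = a
    h1, h2 = h
    return (2 * a1 * h1, a2 * h1 + a1 * h2)

def quotient(a, h):
    # ||f(a + h) - f(a) - lambda(h)|| / ||h||, as in Definition 1.1.
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    lam = candidate_derivative(a, h)
    num = math.hypot(fah[0] - fa[0] - lam[0], fah[1] - fa[1] - lam[1])
    return num / math.hypot(*h)

a = (1.0, 2.0)
for k in range(1, 6):
    h = (10.0 ** -k, -(10.0 ** -k))
    # The printed quotient should shrink roughly in proportion to |h1|.
    print(k, quotient(a, h))
```

For this particular `f` the remainder is $(h_1^2, h_1 h_2)$, so the quotient equals $|h_1|$ up to rounding, which is why it tends to zero.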
Calling $\lambda$ the derivative of $f$ is justified because it is the only linear transformation that satisfies the above property, if one exists.
Theorem 1.2. If $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at $a \in A$, then its derivative at $a$ is unique.
Proof. Suppose that $\lambda, \psi : \mathbb{R}^n \to \mathbb{R}^m$ are derivatives of $f$ at $a$. For all $h \neq 0$ where $f(a + h)$ is defined, let $d(h) = f(a + h) - f(a)$ and we obtain:
\[
0 \le \frac{\|\lambda(h) - \psi(h)\|}{\|h\|}
= \frac{\|\lambda(h) - d(h) + d(h) - \psi(h)\|}{\|h\|}
\le \frac{\|d(h) - \lambda(h)\|}{\|h\|} + \frac{\|d(h) - \psi(h)\|}{\|h\|}.
\]
Since $a$ is an interior point of $A$, there exists some open set containing $0$, all of whose members $h$ yield $a + h \in A$ and therefore satisfy the above inequality (except at $h = 0$); thus, by the squeeze theorem,
\[
\lim_{h \to 0} \frac{\|\lambda(h) - \psi(h)\|}{\|h\|} = 0.
\]
If $x \in \mathbb{R}^n$ is nonzero, then a change of variables can be performed as follows:
\[
0 = \lim_{t \to 0} \frac{\|\lambda(tx) - \psi(tx)\|}{\|tx\|}
= \lim_{t \to 0} \frac{|t|\,\|\lambda(x) - \psi(x)\|}{|t|\,\|x\|}
= \frac{\|\lambda(x) - \psi(x)\|}{\|x\|},
\]
where $t$ is a scalar variable. Thus, $\lambda(x) = \psi(x)$. Finally, since every linear transformation maps $0$ to $0$, the same can be said for $x = 0$. We therefore conclude that $\lambda = \psi$.
In other words, for all $t \in A$ sufficiently close to $a$, we have that
\[
0 \le \|f(t) - f(a) - \lambda(t - a)\| < \|t - a\|.
\]
Because $a$ is an interior point of $A$, there exists some open ball small enough that all its elements $t$ satisfy the above inequality; therefore, by squeeze theorem,
\[
\lim_{t \to a} \|f(t) - f(a) - \lambda(t - a)\| = 0
\;\Rightarrow\; \lim_{t \to a} \bigl(f(t) - \lambda(t - a)\bigr) = f(a). \tag{1}
\]
by which the differentiability of $f$ and $g$ at $a$ and $b$ respectively, implies that
\[
\lim_{x \to a} \frac{\|\varphi(x)\|}{\|x - a\|} = 0 \tag{2}
\]
\[
\lim_{y \to b} \frac{\|\psi(y)\|}{\|y - b\|} = 0 \tag{3}
\]
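Let $\varepsilon > 0$. (3) implies that for some $\delta_1 > 0$, all $y \in B$ satisfying $0 < \|y - b\| < \delta_1$ also satisfy $\|\psi(y)\| < \frac{\varepsilon}{N+1}\|y - b\|$ for some $N > 0$ suitably large enough so that $\|\lambda(x - a)\| \le N\|x - a\|$ for all $x \in \mathbb{R}^n$. Clearly, if $y = b$, then $\|\psi(y)\| = 0$, and so we can tweak the above statement as follows:
\[
\|y - b\| < \delta_1 \;\Rightarrow\; \|\psi(y)\| \le \frac{\varepsilon}{N+1}\|y - b\|. \tag{6}
\]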
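From (2), we also obtain that for some $\delta_2 > 0$, all $x \in A$ that satisfy $0 < \|x - a\| < \delta_2$ also satisfy $\|\varphi(x)\| < \|x - a\|$.
Finally, by continuity of $f$ at $a$, some $\delta_3 > 0$ yields $\|f(x) - b\| < \delta_1$ for all $x \in A$ satisfying $\|x - a\| < \delta_3$. Gathering all these facts, define $\delta = \min\{\delta_2, \delta_3\}$ and assume that $0 < \|x - a\| < \delta$. We obtain the inequality $\|f(x) - b\| < \delta_1$, which implies by (6) that
\[
\begin{aligned}
\|\psi(f(x))\| &\le \frac{\varepsilon}{N+1}\|f(x) - b\| \\
&= \frac{\varepsilon}{N+1}\|\varphi(x) + \lambda(x - a)\| \\
&\le \frac{\varepsilon}{N+1}\bigl(\|\varphi(x)\| + \|\lambda(x - a)\|\bigr) \\
&< \frac{\varepsilon}{N+1}\bigl(\|x - a\| + N\|x - a\|\bigr) \\
&= \varepsilon\|x - a\|.
\end{aligned}
\]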
Therefore,
\[
\lim_{x \to a} \frac{\|\psi(f(x))\|}{\|x - a\|} = 0. \tag{7}
\]
Now, for all x ∈ A (and thus all x in U ) except at a itself, we obtain from
(4) that:
thus, from (5) and (7), since the limit of the right side of the above inequality
as x → a is zero, we ultimately deduce that
Df (a) = 0.
Df (a) = f.
Ds(a) = s.
\[
Dp(a)(x, y) = a_2 x + a_1 y.
\]
6. If $q : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ is defined by $q(x) = 1/x$, then for all $a \in \mathbb{R} \setminus \{0\}$,
\[
Dq(a)(x) = -\frac{x}{a^2}.
\]
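The formula for $Dq(a)$ can be sanity-checked numerically. The following is a minimal sketch, assuming the helper names `q`, `Dq`, and `remainder_ratio` (all introduced here for illustration); it evaluates the one-variable version of the quotient from Definition 1.1.

```python
def q(x):
    # The reciprocal function from item 6.
    return 1.0 / x

def Dq(a, x):
    # The derivative of q at a, applied to x: -x / a^2.
    return -x / (a * a)

def remainder_ratio(a, h):
    # |q(a + h) - q(a) - Dq(a)(h)| / |h|, which should tend to 0 with h.
    return abs(q(a + h) - q(a) - Dq(a, h)) / abs(h)

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, remainder_ratio(2.0, h))
```

Algebraically the remainder equals $h^2 / (a^2 (a + h))$, so the ratio behaves like $|h| / a^3$ for small $h$.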
\[
\lim_{h \to 0} \frac{\|f(a + h) - f(a) - 0h\|}{\|h\|} = \lim_{h \to 0} \frac{\|c - c\|}{\|h\|} = 0.
\]
2. Since f is linear,
and
Conversely, assume that f is differentiable at a. Since by definition,
fi = πi ◦ f , where each πi is the projection transformation (and thus
linear), the chain rule gives that each fi is differentiable at a; namely,
for all i with 1 ≤ i ≤ m,
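Now,
\[
\begin{aligned}
\lim_{h \to 0} \frac{|p(a + h) - p(a) - \lambda(h)|}{\|h\|}
&= \lim_{h \to 0} \frac{|p(a_1 + h_1, a_2 + h_2) - p(a_1, a_2) - \lambda(h_1, h_2)|}{\|h\|} \\
&= \lim_{h \to 0} \frac{|a_1 a_2 + a_2 h_1 + a_1 h_2 + h_1 h_2 - a_1 a_2 - a_2 h_1 - a_1 h_2|}{\|h\|} \\
&= \lim_{h \to 0} \frac{|h_1 h_2|}{\|h\|}.
\end{aligned}
\]
Since
\[
|h_1 h_2| \le
\begin{cases}
h_1^2 & |h_2| \le |h_1| \\
h_2^2 & |h_1| < |h_2|
\end{cases},
\]
we obtain
\[
0 \le \frac{|h_1 h_2|}{\|h\|} \le \frac{h_1^2 + h_2^2}{\|h\|} = \|h\|,
\]
whose limit as $h \to 0$ is obviously zero. Thus, squeeze theorem yields
\[
\lim_{h \to 0} \frac{|h_1 h_2|}{\|h\|} = 0.
\]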
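The bound $|h_1 h_2| / \|h\| \le \|h\|$ used in the squeeze argument can be spot-checked on random samples. This is only an illustrative sketch; the function name `ratio`, the sample count, and the random seed are assumptions made here.

```python
import math
import random

def ratio(h1, h2):
    # |h1 h2| / ||h|| for a nonzero h = (h1, h2).
    return abs(h1 * h2) / math.hypot(h1, h2)

random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]

# Collect any sample where the claimed bound |h1 h2| / ||h|| <= ||h|| fails.
violations = [
    h for h in samples
    if h != (0.0, 0.0) and ratio(*h) > math.hypot(*h)
]
print(len(violations))
```

A random search cannot prove the inequality, but an empty `violations` list is consistent with the case analysis above.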
2. The result follows in a similar fashion:
3. If $g(a) \neq 0$ then
Definition 1.9. A function $f : U \to \mathbb{R}$ with $U \subseteq \mathbb{R}^n$ is said to be of differentiability class $C^k$ with nonnegative integer $k$, iff all partial derivatives $D^\alpha f$
Let $1 \le j \le n$; we first specify a function $k : C' \to \mathbb{R}$ by the following procedure: let $h \in C'$ and define $g$ by
for all x so that the resultant set of arguments of fi belongs to C. Clearly, the
derivative of g is a first order partial derivative of fi which exists everywhere
on C, hence differentiable. Moreover, since C is an open rectangle, thus
convex, and a + h ∈ C, the domain of g contains aj and aj + hj , as well as
all points between the two. Consequently, the mean value theorem applies
to g; i.e. we obtain a point p on this interval so that
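thus, finally,
\[
f_i(a + h) - f_i(a) = \sum_{j=1}^{n} h_j D_j f_i(c_j(h)).
\]
Now, define $\lambda_i : \mathbb{R}^n \to \mathbb{R}$ by the $1 \times n$ matrix $\begin{pmatrix} D_1 f_i(a) & \cdots & D_n f_i(a) \end{pmatrix}$ and we have for all nonzero $h \in C'$ that
\[
\begin{aligned}
0 \le \frac{|f_i(a + h) - f_i(a) - \lambda_i(h)|}{\|h\|}
&= \frac{\left|\sum_{j=1}^{n} D_j f_i(c_j(h)) h_j - \sum_{j=1}^{n} D_j f_i(a) h_j\right|}{\|h\|} \\
&\le \sum_{j=1}^{n} |D_j f_i(c_j(h)) - D_j f_i(a)| \frac{|h_j|}{\|h\|} \\
&\le \sum_{j=1}^{n} |D_j f_i(c_j(h)) - D_j f_i(a)|.
\end{aligned}
\]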
Lemma 1.13. If $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is differentiable at $a$, then all first order partial derivatives of each $f_i$ exist at $a$ and the matrix of $Df(a)$ is exactly $Jf(a)$.
Proof. Let $1 \le i \le m$ and $1 \le j \le n$; define a function $h : \mathbb{R} \to \mathbb{R}^n$ by $h(x) = (a_1, \dots, a_{j-1}, x, a_{j+1}, \dots, a_n)$. Observe that $f_i \circ h$ is a real valued function of one variable and $D_j f_i(a) = (f_i \circ h)'(a_j)$, which means that the matrix of $D(f_i \circ h)(a_j)$ has only the single entry $D_j f_i(a)$ (by Lemma 2.3). By the chain rule, we obtain that
\[
D(f_i \circ h)(a_j) = Df_i(a) \circ Dh(a_j).
\]
In matrix form, writing $\begin{pmatrix} y_1 & \cdots & y_n \end{pmatrix}$ for the matrix of $Df_i(a)$,
\[
D_j f_i(a) = \begin{pmatrix} y_1 & \cdots & y_n \end{pmatrix}
\begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = y_j,
\]
where the entry $1$ sits in the $j$th place. It follows that $D_j f_i(a)$ is the $j$th entry of the $1 \times n$ matrix of $Df_i(a)$, namely the $(i, j)$th entry of the matrix of $Df(a)$. In particular, $D_j f_i(a)$ must exist, by differentiability of $f$ at $a$.
Theorem 1.14. Multivariable Chain Rule: if $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ is continuously differentiable at $a$ and $g : B \to \mathbb{R}$ with $B \subseteq \mathbb{R}^m$ is differentiable at $f(a)$, then for all integers $i$ with $1 \le i \le n$,
\[
D_i(g \circ f)(a) = \sum_{j=1}^{m} D_j g(f(a)) \, D_i f_j(a).
\]
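To illustrate the formula in Theorem 1.14, the sketch below compares both sides numerically using central differences. The sample functions `f` and `g`, the step size, and the helper names `partial`, `composite`, and `chain_rule_partial` are assumptions made for this example, and indices in the code are 0-based.

```python
import math

def f(x, y):
    # Sample inner map f : R^2 -> R^2 (an assumption for illustration).
    return (x * y, x + y)

def g(u, v):
    # Sample outer map g : R^2 -> R (an assumption for illustration).
    return u * u + math.sin(v)

def partial(func, point, i, eps=1e-6):
    # Central-difference approximation of the i-th partial of func at point.
    up, dn = list(point), list(point)
    up[i] += eps
    dn[i] -= eps
    return (func(*up) - func(*dn)) / (2 * eps)

def composite(x, y):
    return g(*f(x, y))

def chain_rule_partial(a, i):
    # Right-hand side of the theorem: sum over j of D_j g(f(a)) * D_i f_j(a).
    fa = f(*a)
    total = 0.0
    for j in range(2):
        dj_g = partial(g, fa, j)
        di_fj = partial(lambda x, y, j=j: f(x, y)[j], a, i)
        total += dj_g * di_fj
    return total

a = (0.5, -1.2)
for i in range(2):
    print(i, partial(composite, a, i), chain_rule_partial(a, i))
```

The two printed columns should agree up to finite-difference error; agreement on one sample point is a consistency check, not a proof.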
Comparing the matrix entries yields the result.
\[
\begin{aligned}
D_i(fg)(x) &= D_i(p \circ (f, g))(x) \\
&= D_1 p(f(x), g(x)) \, D_i f(x) + D_2 p(f(x), g(x)) \, D_i g(x) \\
&= g(x) D_i f(x) + f(x) D_i g(x).
\end{aligned}
\]
Corollary 1.17. If any two functions $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^n$ and $g : B \to \mathbb{R}$ with $B \subseteq \mathbb{R}^m$ are of class $C^k$ with $k \ge 1$ then the composition $g \circ f$ is $C^k$.
= Di (g ◦ f )(a),
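\[
\begin{aligned}
0 &= \lim_{h \to 0} \frac{\|f(a + h) - f(a) - Df(a)(h)\|}{\|h\|} \\
&= \lim_{t \to 0} \frac{\|f(a + tv) - f(a) - Df(a)(tv)\|}{\|tv\|} \\
&= \lim_{t \to 0} \frac{\|f(a + tv) - f(a) - t\,Df(a)(v)\|}{|t|\,\|v\|} \\
&= \frac{1}{\|v\|} \lim_{t \to 0} \left\| \frac{f(a + tv) - f(a)}{t} - Df(a)(v) \right\|
\end{aligned}
\]
\[
\Rightarrow \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t} = \nabla_a f(v) = Df(a)(v).
\]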
Although subtle, one must note that substituting tv for h was possible only
because a lies in the interior of A, since a + tv may not fall in the domain of
f otherwise as the choice of v was arbitrary.
Obviously if v = 0, then ∇a f (v) = Df (a)(v) = 0. Thus, we conclude
that ∇a f = Df (a).
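A small numerical illustration of the identity $\nabla_a f(v) = Df(a)(v)$ follows. It is a sketch under stated assumptions: the sample function `f`, its hand-computed derivative `Df`, and the helper `difference_quotient` are not from the original notes.

```python
def f(x, y):
    # Sample real-valued function on R^2 (an assumption for illustration).
    return x * x * y + y ** 3

def Df(a, v):
    # Df(a)(v) for the sample f, computed from its partial derivatives:
    # D_1 f = 2xy and D_2 f = x^2 + 3y^2.
    x, y = a
    return 2 * x * y * v[0] + (x * x + 3 * y * y) * v[1]

def difference_quotient(a, v, t):
    # (f(a + t v) - f(a)) / t, whose limit as t -> 0 is the
    # directional derivative of f at a along v.
    return (f(a[0] + t * v[0], a[1] + t * v[1]) - f(*a)) / t

a, v = (1.0, 2.0), (3.0, -1.0)
for t in (1e-2, 1e-4, 1e-6):
    print(t, difference_quotient(a, v, t), Df(a, v))
```

Note that `v` need not be a unit vector here, matching the arbitrary choice of $v$ in the derivation.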
Let $s = (1 - c)a + cb$, then
\[
\begin{aligned}
g'(c) &= \lim_{t \to 0} \frac{g(c + t) - g(c)}{t} \\
&= \lim_{t \to 0} \frac{f((1 - (c + t))a + (c + t)b) - f((1 - c)a + cb)}{t} \\
&= \lim_{t \to 0} \frac{f((1 - c)a + cb + t(b - a)) - f(s)}{t} \\
&= \lim_{t \to 0} \frac{f(s + t(b - a)) - f(s)}{t} \\
&= \nabla_s f(b - a) \\
&= Df(s)(b - a)
\end{aligned}
\]
since $f$ is differentiable at $s$.
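The identity $g'(c) = Df(s)(b - a)$ for $g(c) = f((1 - c)a + cb)$ can be checked on a concrete case. The sketch below assumes a sample function `f`, its hand-computed derivative `Df`, and a central-difference helper `g_prime`; none of these names come from the original notes.

```python
def f(x, y):
    # Sample real-valued function on R^2 (an assumption for illustration).
    return x * y * y

def Df(s, v):
    # Df(s)(v) for the sample f: D_1 f = y^2 and D_2 f = 2xy.
    x, y = s
    return y * y * v[0] + 2 * x * y * v[1]

def g(c, a, b):
    # Restriction of f to the line through a and b.
    return f((1 - c) * a[0] + c * b[0], (1 - c) * a[1] + c * b[1])

def g_prime(c, a, b, eps=1e-6):
    # Central-difference approximation of g'(c).
    return (g(c + eps, a, b) - g(c - eps, a, b)) / (2 * eps)

a, b, c = (0.0, 1.0), (2.0, 3.0), 0.25
s = ((1 - c) * a[0] + c * b[0], (1 - c) * a[1] + c * b[1])
direction = (b[0] - a[0], b[1] - a[1])
print(g_prime(c, a, b), Df(s, direction))
```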
Proof. Let $g(x) = f(a_1, \dots, a_{i-1}, x, a_{i+1}, \dots, a_n)$. It is clear that $g$ has a local maximum (or minimum) at $a_i$, which is in the interior of the domain of $g$, and is differentiable there since $g'(a_i) = D_i f(a)$. Fermat’s theorem in single-variable calculus implies that $g'(a_i)$ is zero, and the result clearly follows.
= det Jf (a),
Lemma 1.23. If $A \subseteq \mathbb{R}^n$ is an open ball and $f : A \to \mathbb{R}^n$ is continuously differentiable, then if for some real $M$, all first order partial derivatives of each $f_i$ yield $|D_j f_i(s)| \le M$ for all $s \in A$, then
\[
\|f(y) - f(x)\| \le n^2 M \|y - x\|
\]
for all $x, y \in A$.
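The Lipschitz-type bound of Lemma 1.23 can be illustrated on a sample map over the open unit ball in $\mathbb{R}^2$. This is a sketch only: the map `f`, the bound `M = 1` on its partial derivatives, the sampling scheme, and the helper names are assumptions introduced here.

```python
import math
import random

def f(p):
    # Sample map on R^2 whose first order partials (cos x, 1, 0, -sin y)
    # are all bounded in absolute value by M = 1 (an assumption).
    x, y = p
    return (math.sin(x) + y, math.cos(y))

def random_point_in_ball(rng):
    # Rejection sampling from the open unit ball in R^2.
    while True:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if math.hypot(*p) < 1:
            return p

def worst_ratio(num_pairs=1000, seed=0):
    # Largest observed value of ||f(y) - f(x)|| / ||y - x|| over random pairs.
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_pairs):
        x, y = random_point_in_ball(rng), random_point_in_ball(rng)
        dist = math.hypot(y[0] - x[0], y[1] - x[1])
        if dist == 0.0:
            continue
        fx, fy = f(x), f(y)
        worst = max(worst, math.hypot(fy[0] - fx[0], fy[1] - fx[1]) / dist)
    return worst

n, M = 2, 1.0
print(worst_ratio(), n * n * M)
```

The observed ratio should stay below $n^2 M = 4$; the lemma's constant is not claimed to be sharp.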
Proof. Let $i$ be an integer with $1 \le i \le n$. Since $f$ is continuously differentiable, it must be differentiable on $A$, which is open and convex; thus, by Theorem 2.20, there exists some $s \in A$ where
≤ nM |y − x|
\[
D(f|_V^{-1})(y) = Df(f|_V^{-1}(y))^{-1}
\]
for all $y \in W$.
Proof. Clearly, $f$ is differentiable, so for convenience, let $\lambda = Df(a)$, whose matrix has a nonzero determinant, thereby yielding an inverse $\lambda^{-1}$. Consider the function $\lambda^{-1} \circ f$; one can show that this function is continuously differentiable (see Lemma 2.15 and Corollary 2.17), and its corresponding derivative at $a$ turns out to be the identity transformation;
Now, if the theorem holds for all continuously differentiable functions whose derivative at $a$ is the identity, then it holds for $\lambda^{-1} \circ f$; namely, for some open set $V$ containing $a$, the restriction $(\lambda^{-1} \circ f)|_V = \lambda^{-1} \circ f|_V$ has a differentiable inverse $f|_V^{-1} \circ \lambda$ which for all $z \in \lambda^{-1}(f(V))$ satisfies
\[
\begin{aligned}
D(f|_V^{-1} \circ \lambda)(z) &= D(\lambda^{-1} \circ f)(f|_V^{-1}(\lambda(z)))^{-1} \\
&= \bigl(\lambda^{-1} \circ Df(f|_V^{-1}(\lambda(z)))\bigr)^{-1} \\
&= Df(f|_V^{-1}(\lambda(z)))^{-1} \circ \lambda,
\end{aligned}
\]
by the chain rule. Repeating the same to the left side, we get
\[
D(f|_V^{-1} \circ \lambda)(z) = Df|_V^{-1}(\lambda(z)) \circ \lambda;
\]
therefore,
\[
Df|_V^{-1}(\lambda(z)) = Df(f|_V^{-1}(\lambda(z)))^{-1}.
\]
\[
Df|_V^{-1}(y) = Df|_V^{-1}(\lambda(z)) = Df(f|_V^{-1}(\lambda(z)))^{-1} = Df(f|_V^{-1}(y))^{-1};
\]
\[
\lim_{h \to 0} \frac{\|f(a + h) - f(a) - \lambda(h)\|}{\|h\|} = 0,
\]
\[
\frac{\|f(a + h) - f(a) - \lambda(h)\|}{\|h\|} < 1
\]
by which we can’t have f (a + h) = f (a), since the left side would evaluate to
exactly 1, otherwise. Therefore, we have for some closed ball U containing a
in its interior, that
seeing that $Jf(a)$ is the matrix of the identity transformation $\lambda$ whose determinant is exactly 1, from which it can be shown that
\[
|D_j g_i(x)| = |D_j f_i(x) - D_j f_i(a)| < \frac{1}{2n^2}
\]
and so by Lemma 2.23, since $g$ is continuously differentiable, we have for all $x_1, x_2 \in \operatorname{int}(U)$ that
\[
\|f(x_1) - x_1 - (f(x_2) - x_2)\| \le \frac{1}{2}\|x_1 - x_2\|.
\]
By the reverse triangle inequality,
\[
\|x_2 - x_1\| - \|f(x_1) - f(x_2)\| \le \frac{1}{2}\|x_1 - x_2\|,
\]
and so
\[
\|x_1 - x_2\| \le 2\|f(x_1) - f(x_2)\| \quad \text{for all } x_1, x_2 \in \operatorname{int}(U). \tag{11}
\]
By Corollary ??, f (∂U ) is compact which by (9) does not contain f (a);
therefore, we have for some d > 0 that kf (x) − f (a)k ≥ d for all x ∈ ∂U .
Let $W = \{y \in \mathbb{R}^n : \|y - f(a)\| < d/2\}$; observe that $W$ is an open ball. If $x \in \partial U$ and $y \in W$, we have:
therefore, y = f (x). Now, suppose for x1 , x2 ∈ int(U ) that f (x1 ) = f (x2 ).
From (11), we immediately obtain that x1 = x2 .
Now, let V = int(U ) ∩ f −1 (W ). What we’ve essentially shown is that f |V
is a bijection and thus has an inverse, specifically with a range W . Moreover,
f is continuous and W is an open ball, and so f −1 (W ) equals B ∩ A for
some open set B. Since int(U ) ∩ B = (int(U ) ∩ A) ∩ B, we obtain that V is
open by associativity of set intersection. One can also verify that a ∈ V by
construction of $W$. Proceeding further, we can rewrite (11) as
\[
\|f|_V^{-1}(y_1) - f|_V^{-1}(y_2)\| \le 2\|y_1 - y_2\| \quad \text{for all } y_1, y_2 \in W; \tag{13}
\]
in other words, if we let $\varepsilon > 0$ and $\delta = \varepsilon/2$, then $\|y_1 - y_2\| < \delta$ implies that $\|f|_V^{-1}(y_1) - f|_V^{-1}(y_2)\| < \varepsilon$, namely that $f|_V^{-1}$ is continuous. Fix $x_1, x_2 \in V$ and let $\mu = Df(x_1)$; define a function $\psi$ by
also, by differentiability of $f$,
\[
\lim_{x \to x_1} \frac{\|\psi(x - x_1)\|}{\|x - x_1\|} = 0. \tag{14}
\]
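\[
\begin{aligned}
0 \le \frac{\|f|_V^{-1}(t) - f|_V^{-1}(y) - \mu^{-1}(t - y)\|}{\|t - y\|}
&= \frac{\|\mu^{-1}(\psi(f|_V^{-1}(t) - f|_V^{-1}(y)))\|}{\|t - y\|} \\
&\le \frac{M\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|t - y\|}
\end{aligned}
\]
for some large enough $M > 0$. Continuing further,
\[
\begin{aligned}
\frac{M\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|t - y\|}
&= \frac{M\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|f|_V^{-1}(t) - f|_V^{-1}(y)\|} \cdot \frac{\|f|_V^{-1}(t) - f|_V^{-1}(y)\|}{\|t - y\|} \\
&\le \frac{2M\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|f|_V^{-1}(t) - f|_V^{-1}(y)\|},
\end{aligned}
\]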
since $f|_V^{-1}(t) \neq f|_V^{-1}(y)$ for all $t \neq y$, by contraposition, whereas (13) was used to derive the last step. As necessitated by (14) and the continuity of $f|_V^{-1}$, we have
\[
\lim_{t \to y} \frac{\|\psi(f|_V^{-1}(t) - f|_V^{-1}(y))\|}{\|f|_V^{-1}(t) - f|_V^{-1}(y)\|} = 0.
\]
In conclusion, $f|_V^{-1}$ is differentiable at $y$ with derivative $\mu^{-1} = Df(f|_V^{-1}(y))^{-1}$ for all $y \in W$.
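The conclusion $D(f|_V^{-1})(y) = Df(f|_V^{-1}(y))^{-1}$ can be illustrated on a map with a known local inverse. The sketch below is an assumption-laden example: the map `f`, the branch `f_inv` (valid where the first output coordinate is positive), and the helpers `jacobian` and `inverse_2x2` are all introduced here and are not part of the original notes.

```python
import math

def f(x, y):
    # Sample continuously differentiable map on R^2 (an assumption).
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def f_inv(u, v):
    # A local inverse of f, valid where u > 0 (branch chosen for this example).
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def jacobian(func, p, eps=1e-6):
    # Central-difference Jacobian of func : R^2 -> R^2 at p.
    # Entry [i][j] approximates the j-th partial of the i-th component.
    cols = []
    for j in range(2):
        up, dn = list(p), list(p)
        up[j] += eps
        dn[j] -= eps
        fu, fd = func(*up), func(*dn)
        cols.append([(fu[i] - fd[i]) / (2 * eps) for i in range(2)])
    return [[cols[j][i] for j in range(2)] for i in range(2)]

def inverse_2x2(m):
    # Inverse of a 2x2 matrix with nonzero determinant.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

a = (0.3, 0.5)
y = f(*a)
lhs = jacobian(f_inv, y)             # matrix of D(f^{-1})(y)
rhs = inverse_2x2(jacobian(f, a))    # matrix of Df(f^{-1}(y))^{-1}
print(lhs)
print(rhs)
```

The two printed matrices should agree entrywise up to finite-difference error.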
Corollary 1.25. If $f : A \to \mathbb{R}^n$ with $A \subseteq \mathbb{R}^n$ is a continuously differentiable injective function and $\det Jf(x) \neq 0$ everywhere, then $f$ is an open map with a differentiable inverse.
Proof. Since A is open as implied by the continuous differentiability of f , all
sets in the subspace topology on A are open in Rn ; let B be such a set and
y ∈ f (B), corresponding to some x ∈ A such that f (x) = y. By the inverse
function theorem applied to f |B , there exists an open set V ⊆ B containing
x and an open f (V ) ⊆ f (B) containing y. This suffices to show that f (B)
is open and f therefore an open map.
Since $f$ is injective, it has an inverse with domain $f(A)$. Moreover, for every $y \in f(A)$, there exists an open set $V$ containing $f^{-1}(y)$ so that $f|_V$ has a differentiable inverse with an open domain $W$ containing $y$. In fact, this inverse is exactly $f^{-1}|_W$; since $f|_V^{-1}(y_1) = f^{-1}|_W(y_2)$ for distinct $y_1, y_2 \in W$ implies $f(y_1) = f(y_2)$, a contradiction to the fact that $f$ is injective. This suffices to show that $f^{-1}$ is everywhere differentiable.
Theorem 1.26. Implicit Function Theorem: if $f : A \to \mathbb{R}^m$ with $A \subseteq \mathbb{R}^{m+n}$ is continuously differentiable where $f(a, b) = 0$ for some $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, and the matrix
\[
M = \begin{pmatrix}
D_{n+1} f_1(a, b) & \cdots & D_{n+m} f_1(a, b) \\
\vdots & \ddots & \vdots \\
D_{n+1} f_m(a, b) & \cdots & D_{n+m} f_m(a, b)
\end{pmatrix}
\]
has a nonzero determinant, then there exist an open set $B \subseteq \mathbb{R}^n$ containing $a$ and a unique differentiable function $g : B \to \mathbb{R}^m$ such that $g(a) = b$ and $f(x, g(x)) = 0$ for all $x \in B$.
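As a concrete instance of the statement with $n = m = 1$, consider the unit circle. The sketch below is illustrative only: the function `f`, the point $(a, b) = (0, 1)$, the explicit solution `g` on $B = (-1, 1)$, and the helper `D2_f` are assumptions chosen here, not taken from the original notes.

```python
import math

def f(x, y):
    # Sample constraint: the unit circle is the zero set of f.
    return x * x + y * y - 1.0

def D2_f(x, y):
    # Partial derivative of f in the y variable. For m = 1 the matrix M
    # in the theorem is the 1 x 1 matrix holding this value at (a, b).
    return 2.0 * y

def g(x):
    # Explicit solution of f(x, g(x)) = 0 near (0, 1), defined on (-1, 1).
    return math.sqrt(1.0 - x * x)

a, b = 0.0, 1.0
print(f(a, b), D2_f(a, b), g(a))
for k in range(-9, 10):
    x = k / 10
    # Each residual f(x, g(x)) should vanish up to rounding error.
    print(x, f(x, g(x)))
```

At points such as $(1, 0)$ the partial derivative in $y$ vanishes, so the hypothesis on $M$ fails there and the theorem gives no such $g$ around that point.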
Remark. Be mindful that $(a, b)$, for the purposes of the proof, should be taken not as an ordered pair, but as notational shorthand for $(a_1, \dots, a_n, b_1, \dots, b_m)$. The same applies to the “product” $U \times V$, whose points, in the technical sense, are likewise obtained by appending the coordinates of $V$’s elements to those of $U$, in seeming deviation from the usual Cartesian product.
Proof. Define a function $F : A \to \mathbb{R}^{m+n}$ by $F(x, y) = (x, f(x, y))$ with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. More precisely, $F = (\pi_1, \dots, \pi_n, f_1, \dots, f_m)$, and so one can easily observe that $F$ is continuously differentiable. We also have, whenever $1 \le j \le n + m$ and $1 \le i \le n$, that
\[
D_j F_i(a, b) =
\begin{cases}
0 & i \neq j \\
1 & i = j
\end{cases};
\]
therefore,
therefore,
Define $G : \mathbb{R}^n \to \mathbb{R}^{n+m}$ by $G(x) = (x, 0)$. Clearly, $G$ is continuous; therefore, $G^{-1}(W)$ is open in $\mathbb{R}^n$. Moreover, $G^{-1}(W)$ is nonempty as it contains $a$, since $F(a, b) = (a, f(a, b)) = (a, 0) \in W$; and so we can define $B \subseteq G^{-1}(W)$ to be any connected open subset containing $a$. Observe that $B \subseteq U$ by (15), since for each $x \in B$, hence $(x, 0) \in W$, we have $x = h_1(x, 0) \in U$.
Now, define g : B → Rm by g(x) = h2 (x, 0). Clearly, by the previous
argument, h(a, 0) = (a, b) and so g(a) = b. Furthermore, if x ∈ B, we have
(x, 0) ∈ W ; thus, f (x, g(x)) = 0 by (16). It can also be shown that g is
differentiable by the chain rule, since g = (h2 ◦ G)|B and G is linear, hence
differentiable. Moreover, observe that by construction, the range of g lies
completely in V .
We shall now proceed to prove that g is unique; suppose that g0 is a
function satisfying the same conditions specified by the theorem for g. We
will want to prove that the set of all points in B at which g0 agrees with
g is in fact, open. Suppose further that g(a0 ) = g0 (a0 ) for some a0 ∈ B.
Consequently, g0 maps a0 to V , since g(B) ⊆ V ; therefore, because g0 is
differentiable, and thus continuous, there exists an open ball B0 containing
a0 whose image under g0 lies completely in V ; moreover, B0 ⊆ U and so
(x, g0 (x)) ∈ U × V for all x ∈ B0 . We also have f (x, g0 (x)) = 0 by definition,
and so
open sets $U, V \subseteq \mathbb{R}^n$ where $a \in U$, and a differentiable function $h : V \to U$ with a differentiable inverse such that
\[
f(h(x_1, \dots, x_n)) = (x_{n-p+1}, \dots, x_n)
\]
for all $x \in V$.
Proof. Since $Jf(a)$ has rank $p$, there exist $p$ distinct integers $j_1, \dots, j_p$ such that the $p \times p$ matrix
\[
M = \begin{pmatrix}
D_{j_1} f_1(a) & \cdots & D_{j_p} f_1(a) \\
\vdots & \ddots & \vdots \\
D_{j_1} f_p(a) & \cdots & D_{j_p} f_p(a)
\end{pmatrix}
\]
has a nonzero determinant. We define a permutation $\tau \in S_n$, in cycle notation, by
\[
(j_1 \;\; n - p + 1)(j_2 \;\; n - p + 2) \cdots (j_p \;\; n)
\]
and the permutation matrix
\[
G = \begin{pmatrix} e_{\tau(1)} \\ \vdots \\ e_{\tau(n)} \end{pmatrix}
\]
defining a linear transformation $g : \mathbb{R}^n \to \mathbb{R}^n$. Observe that
\[
g(x_1, \dots, x_n) = (x_{\tau(1)}, \dots, x_{\tau(n)})
\]
and it corresponds to swapping the $j_k$th coordinate with the $(n - p + k)$th for each $k$ with $1 \le k \le p$. Since permutation matrices are invertible, let $a' = g^{-1}(a)$; consider the continuously differentiable function $f \circ g$. By the multivariable chain rule, we have
\[
\begin{aligned}
D_{n-p+k}(f_i \circ g)(a') &= \sum_{j=1}^{n} D_j f_i(g(a')) \, D_{n-p+k} g_j(a') \\
&= \sum_{j=1}^{n} D_j f_i(a) \, g_j(e_{n-p+k}) \\
&= D_{j_k} f_i(a)
\end{aligned}
\]
which implies that the matrix
\[
\begin{pmatrix}
D_{n-p+1}(f \circ g)_1(a') & \cdots & D_n(f \circ g)_1(a') \\
\vdots & \ddots & \vdots \\
D_{n-p+1}(f \circ g)_p(a') & \cdots & D_n(f \circ g)_p(a')
\end{pmatrix}
\]
is exactly identical to $M$ and thus also has a nonzero determinant. Just like with the previous theorem, define $F : A \to \mathbb{R}^n$ by $F(x, y) = (x, (f \circ g)(x, y))$ with $x \in \mathbb{R}^{n-p}$ and $y \in \mathbb{R}^p$. Using similar arguments from the previous theorem, $F$ is continuously differentiable and $\det(JF(a')) \neq 0$, and so by the inverse function theorem, there exists an open set $U' \subseteq \mathbb{R}^n$ containing $a'$ so that the restriction $F|_{U'}$ has an open range $V$ and a differentiable inverse $k$ which for all $(x, y) \in V$ (with reference to (16)) satisfies
\[
y = (f \circ g)(k(x, y));
\]
in other words,