
Chapter 3

Differentiability

“Most important part of doing physics is the knowledge of approximation.”

Lev Landau

3.1. Linear Approximation


In this chapter we will focus exclusively on functions $F : \mathbb{R}^m \to \mathbb{R}^n$, and we assume all $\mathbb{R}^n$'s are equipped with the standard Euclidean metric. The goal of this chapter is to extend the theory of differentiation of single-variable functions $f : \mathbb{R} \to \mathbb{R}$ to higher dimensions.

Given a function $f : (a,b) \to \mathbb{R}$, we say $f$ is differentiable at $x_0 \in (a,b)$ if the following limit exists:
\[
f'(x_0) := \lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h} = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}.
\]
By manipulating the above limit, we also get:
\[
(3.1) \qquad \lim_{x \to x_0} \frac{f(x) - \big(f(x_0) + f'(x_0)(x - x_0)\big)}{x - x_0} = \lim_{x \to x_0} \left(\frac{f(x) - f(x_0)}{x - x_0} - f'(x_0)\right) = 0.
\]
Also, one can easily show that if there exists a constant $A \in \mathbb{R}$ such that
\[
\lim_{x \to x_0} \frac{f(x) - \big(f(x_0) + A(x - x_0)\big)}{x - x_0} = 0,
\]
then $f$ must be differentiable at $x_0$, and $f'(x_0) = A$.

The function $T(x) := f(x_0) + f'(x_0)(x - x_0)$ is an affine linear function¹ with $T(x_0) = f(x_0)$, so (3.1) essentially says that when $x$ is very close to $x_0$, the function $f(x)$ is very close to $T(x)$ in the sense that their gap is much smaller than $|x - x_0|$. We often say "$f(x)$ can be linearly approximated by $T(x)$ near $x_0$", and using little-$o$ notation, we can write:
\[
f(x) = f(x_0) + f'(x_0)(x - x_0) + o(|x - x_0|) \quad \text{as } x \to x_0.
\]
The two notions – linear approximations and derivatives – are equivalent for first
order differentiation for functions on R. However, in MATH 1023, we have seen that


1“Affine” linear simply means linear plus a constant.



the existence of the $n$-th derivative $f^{(n)}(x_0)$ implies (by repeatedly applying L'Hôpital's rule) that $f(x)$ can be approximated by an $n$-th degree polynomial, i.e. $f(x) = f(x_0) + T_n(x) + o(|x - x_0|^n)$ as $x \to x_0$, where
\[
T_n(x) = \frac{f'(x_0)}{1!}(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n,
\]
and we call $T_n(x)$ the $n$-th degree Taylor polynomial for $f(x)$ at $x_0$.

However, the converse is not true – there are functions that can be approximated by an $n$-th degree polynomial near $x_0$, where $n \geq 2$, but $f^{(n)}(x_0)$ does not exist. Such functions are of the form $x^n \chi_{\mathbb{Q}}(x)$, where
\[
\chi_{\mathbb{Q}}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{otherwise} \end{cases}.
\]

Now we consider functions $F : \mathbb{R}^n \to \mathbb{R}$ with $n \geq 2$. It turns out the existence of first-order (partial) derivatives does not imply that $F$ can be linearly approximated either!

3.1.1. Partial derivatives. Throughout this chapter, we assume the domain $\Omega \subset \mathbb{R}^n$ of a function $F : \Omega \to \mathbb{R}^m$ is a non-empty open set, so that whenever $x_0 \in \Omega$, there exists an open ball $B_\varepsilon(x_0) \subset \Omega$, and we can approach $x_0$ from any direction. Taking $\mathbb{R}^2$ as an example, one can measure the rate of change of $f(x,y)$ along the $x$-direction leaving $y$ fixed, or along the $y$-direction leaving $x$ fixed. Furthermore, we can also measure the rate of change along any unit direction $(u_1, u_2) \in \mathbb{R}^2$, where $\sqrt{u_1^2 + u_2^2} = 1$.

Definition 3.1 (Partial derivatives, directional derivatives). Given $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$, $a = (a_1, \cdots, a_n) \in \Omega$, and a unit vector $u = (u_1, \cdots, u_n) \in \mathbb{R}^n$, we denote and define the directional derivative of $f$ along $u$ at $a$ by:
\[
D_u f(a) := \lim_{h \to 0} \frac{f(a + hu) - f(a)}{h} \quad \text{whenever the limit exists.}
\]
Denote $e_i := (0, \cdots, \underbrace{1}_{i\text{-th slot}}, \cdots, 0)$; then in particular we call
\[
D_{e_i} f(a) = \lim_{h \to 0} \frac{f(a_1, \cdots, a_i + h, \cdots, a_n) - f(a_1, \cdots, a_i, \cdots, a_n)}{h}
\]
the partial derivative of $f$ with respect to $x_i$ at $a$, usually denoted by $\frac{\partial f}{\partial x_i}(a)$, $\left.\frac{\partial f}{\partial x_i}\right|_a$, or simply $\frac{\partial f}{\partial x_i}$ if the point $a$ is understood from the context or is not relevant.

Calculation of partial derivatives is straightforward: given a function, say $f(x,y)$, one simply treats $y$ as the variable and $x$ as a constant when computing $\frac{\partial f}{\partial y}$; whereas one treats $x$ as the variable and $y$ as a constant when computing $\frac{\partial f}{\partial x}$.
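For instance, if $f(x,y) = x^2 y + \sin(xy)$, then treating $y$ as a constant gives $\frac{\partial f}{\partial x} = 2xy + y\cos(xy)$, while treating $x$ as a constant gives $\frac{\partial f}{\partial y} = x^2 + x\cos(xy)$.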

Exercise 3.1. Find the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ where $f(x,y)$ equals:
\[
x^{x^x},\quad x^{x^y},\quad x^{y^x},\quad y^{x^x},\quad x^{y^y},\quad y^{x^y},\quad y^{y^x},\quad y^{y^y}.
\]

Example 3.2. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
\[
f(x,y) = \begin{cases} \dfrac{x^2 y}{x^4 + y^2} & \text{if } (x,y) \neq (0,0) \\ 0 & \text{if } (x,y) = (0,0) \end{cases}.
\]

To find $\frac{\partial f}{\partial x}$ at $(1,1)$, we may simply ignore the value of $f$ at $(0,0)$, since derivatives of $f$ depend only on its values in a small open ball $B_\varepsilon(1,1)$ around $(1,1)$. We may simply compute:
\[
\frac{\partial f}{\partial x}(1,1) = \left.\frac{\partial}{\partial x}\frac{x^2 y}{x^4 + y^2}\right|_{(1,1)} = \left.\frac{(x^4 + y^2)\cdot 2xy - x^2 y(4x^3)}{(x^4 + y^2)^2}\right|_{(1,1)} = 0.
\]
Similarly for $\frac{\partial f}{\partial y}$ at $(1,1)$.
Let's demonstrate how to compute the directional derivative from the definition. Suppose $u = \big(\tfrac{1}{2}, \tfrac{\sqrt{3}}{2}\big)$; then
\[
D_u f(1,1) = \lim_{h \to 0} \frac{f\big((1,1) + h(1/2, \sqrt{3}/2)\big) - f(1,1)}{h} = \lim_{h \to 0} \frac{\dfrac{(1+h/2)^2(1+\sqrt{3}h/2)}{(1+h/2)^4 + (1+\sqrt{3}h/2)^2} - \dfrac{1}{2}}{h}.
\]
We leave it as an exercise for the reader's MATH 1013 friends to calculate the above limit.
However, when we handle the partial/directional derivatives at $(0,0)$, we need to be more careful, as $f((0,0) + hu)$ and $f(0,0)$ are defined differently when $h \neq 0$. When letting $h \to 0$, we regard $h$ as non-zero yet very close to $0$, so
\[
f((0,0) + hu) = f(hu_1, hu_2) = \frac{(hu_1)^2(hu_2)}{(hu_1)^4 + (hu_2)^2} = \frac{h u_1^2 u_2}{h^2 u_1^4 + u_2^2}.
\]
Therefore, we have
\[
D_u f(0,0) = \lim_{h \to 0} \frac{f(0 + hu) - f(0)}{h} = \lim_{h \to 0} \frac{\frac{h u_1^2 u_2}{h^2 u_1^4 + u_2^2} - 0}{h} = \frac{u_1^2}{u_2} \quad \text{when } u_2 \neq 0.
\]
When $u_2 = 0$, we then have $f((0,0) + h(u_1, u_2)) = f(hu_1, 0) = 0$ for any $h$. Therefore,
\[
D_u f(0,0) = \lim_{h \to 0} \frac{f(hu_1, 0) - f(0,0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0.
\]
In other words (taking $u = e_1$), this means $\frac{\partial f}{\partial x}(0,0) = 0$.

In the above example, we see that $f$ has directional derivatives (including partial derivatives) along any unit direction $u$ at the point $(0,0)$. However, it can be shown that $f$ is not even continuous at $(0,0)$. Along the parabola $y = x^2$, which can be parametrized by $\gamma(t) = (t, t^2)$, we have
\[
\lim_{t \to 0} f(\gamma(t)) = \lim_{t \to 0} \frac{t^4}{t^4 + t^4} = \frac{1}{2}.
\]
However, along the line $y = 0$, we have $f(x,0) = 0$ for any $x$, and so
\[
\lim_{x \to 0} f(x,0) = 0.
\]
This shows $f$ is not continuous at $(0,0)$, as $\lim_{(x,y)\to(0,0)} f(x,y)$ does not exist.

Therefore, existence of directional derivatives in every unit direction does not imply that the function is continuous!
Exercise 3.2. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
\[
f(x,y) = \begin{cases} \dfrac{x^2 y}{x^2 + y^2} & \text{if } x + y \neq 0 \\ 0 & \text{if } x + y = 0 \end{cases}.
\]

Find the directional derivative of f along any unit direction at the points (0, 0), (1, 0),
and (1, 1); or show that it does not exist in some directions.

3.1.2. Differentiable functions of several variables. The example in the previous subsection told us that even if a function has directional derivatives along every unit direction, it could still behave badly around that point – such as being discontinuous there. The notion of differentiability for multivariable functions $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ is much stronger than the mere existence of partial/directional derivatives. It concerns the existence of a "good" linear approximation:

Definition 3.3 (Differentiable functions). Let $\Omega$ be a non-empty open set in $\mathbb{R}^n$, and $a = (a_1, \cdots, a_n) \in \Omega$. A function $f(x_1, \cdots, x_n) : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ is said to be differentiable at $a$ if
(1) $\frac{\partial f}{\partial x_i}(a)$ exists for every $i = 1, \cdots, n$, and
(2) the following limit exists and equals zero:
\[
\lim_{x \to a} \frac{f(x) - \big(f(a) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a)(x_i - a_i)\big)}{|x - a|} = 0.
\]
Condition (2) can be rewritten as:
\[
f(x) = f(a) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a)(x_i - a_i) + o\big(|x - a|\big) \quad \text{as } x \to a.
\]
Therefore, we call $f(a) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a)(x_i - a_i)$ the linear approximation of $f$ at $a$.

For a vector-valued function $F : \Omega \to \mathbb{R}^m$, one can write $F$ as
\[
F(x) = \big(f^1(x), \cdots, f^m(x)\big),
\]
where each $f^i : \Omega \to \mathbb{R}$ is a scalar function. We say $F$ is differentiable at $a$ if each component $f^i$ is differentiable at $a$.

Remark 3.4. Recall in MATH 2023 we defined the gradient vector of $f$ as:
\[
\nabla f(a) = \left(\frac{\partial f}{\partial x_1}(a), \cdots, \frac{\partial f}{\partial x_n}(a)\right).
\]
Using this notation, one can express the linear approximation of $f$ by:
\[
f(a) + \nabla f(a) \cdot (x - a)
\]
where $\cdot$ is the standard dot product in $\mathbb{R}^n$.
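For instance, for $f(x,y) = e^x \sin y$ we have $\nabla f = (e^x \sin y, e^x \cos y)$, so $\nabla f(0,0) = (0,1)$, and the linear approximation of $f$ at $a = (0,0)$ is
\[
f(0,0) + \nabla f(0,0) \cdot (x,y) = 0 + (0,1)\cdot(x,y) = y.
\]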

Exercise 3.3. Show that if there exist constants $c_1, \cdots, c_n \in \mathbb{R}$ such that
\[
f(x) = f(a) + \sum_{i=1}^n c_i (x_i - a_i) + o\big(|x - a|\big) \quad \text{as } x \to a,
\]
then it is necessary that all directional (including partial) derivatives of $f$ exist at $a$, and $\frac{\partial f}{\partial x_i}(a) = c_i$ for every $i = 1, \cdots, n$.
[Remark: Therefore, some textbooks define differentiability of $f$ in a more succinct way: $f$ is differentiable at $a$ if there exists a constant vector $c \in \mathbb{R}^n$ such that $f(x) = f(a) + c \cdot (x - a) + o(|x - a|)$ as $x \to a$. This exercise shows it is equivalent to our definition.]

Example 3.5. Let $f(x,y) = |xy|$. We will prove that $f$ is differentiable at $(0,0)$. First, by observing that $f(x,0) = 0$ and $f(0,y) = 0$ for any $x, y \in \mathbb{R}$, we conclude that
\[
\frac{\partial f}{\partial x}(0,0) = \frac{\partial f}{\partial y}(0,0) = 0.
\]
Next we work out the limit in Condition (2):
\[
\lim_{(x,y)\to(0,0)} \frac{f(x,y) - \big(f(0,0) + \frac{\partial f}{\partial x}(0,0)(x-0) + \frac{\partial f}{\partial y}(0,0)(y-0)\big)}{|(x,y) - (0,0)|} = \lim_{(x,y)\to(0,0)} \frac{|xy| - 0 - 0\cdot x - 0\cdot y}{\sqrt{x^2 + y^2}} = \lim_{(x,y)\to(0,0)} \frac{|xy|}{\sqrt{x^2 + y^2}}.
\]
Since
\[
\frac{|xy|}{\sqrt{x^2 + y^2}} = \frac{|x|}{\sqrt{x^2 + y^2}} \cdot |y| \leq |y| \to 0 \quad \text{as } (x,y) \to (0,0),
\]
we conclude that Condition (2) holds, and hence $f$ is differentiable at $(0,0)$.
Example 3.6. Let's revisit the function $f : \mathbb{R}^2 \to \mathbb{R}$ we defined earlier:
\[
f(x,y) = \begin{cases} \dfrac{x^2 y}{x^4 + y^2} & \text{if } (x,y) \neq (0,0) \\ 0 & \text{if } (x,y) = (0,0) \end{cases}.
\]
We have already proved that $\frac{\partial f}{\partial x}(0,0) = \frac{\partial f}{\partial y}(0,0) = 0$. Let's check if Condition (2) holds:
\[
\lim_{(x,y)\to(0,0)} \frac{f(x,y) - f(0,0) - \frac{\partial f}{\partial x}(0,0)(x-0) - \frac{\partial f}{\partial y}(0,0)(y-0)}{\sqrt{x^2 + y^2}} = \lim_{(x,y)\to(0,0)} \frac{\frac{x^2 y}{x^4 + y^2}}{\sqrt{x^2 + y^2}}.
\]
Along the parabolic path $(x,y) = (t, t^2)$, we have
\[
\frac{\frac{x^2 y}{x^4 + y^2}}{\sqrt{x^2 + y^2}} = \frac{\frac{t^2 \cdot t^2}{t^4 + t^4}}{\sqrt{t^2 + t^4}} = \frac{1}{2\sqrt{t^2 + t^4}} \geq \frac{1}{2\sqrt{2}\,|t|} \to +\infty \quad \text{as } t \to 0
\]
(using $t^2 + t^4 \leq 2t^2$ for $|t| \leq 1$). Hence the limit does not exist, and so $f$ is not differentiable at $(0,0)$.

Exercise 3.4. Denote the unit circle in $\mathbb{R}^2$ by $\partial B_1(0) := \{x \in \mathbb{R}^2 : |x| = 1\}$. Let $g : \partial B_1(0) \to \mathbb{R}$ be a continuous function on $\partial B_1(0)$ such that $g(0,1) = g(1,0) = 0$, and $g(-x) = -g(x)$ for any $x \in \partial B_1(0)$. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
\[
f(x) = \begin{cases} |x| \cdot g\!\left(\dfrac{x}{|x|}\right) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}
\]
(a) Determine whether (or under what condition) the directional derivative $D_u f(0)$ exists, where $u$ is a unit vector in $\mathbb{R}^2$.
(b) Determine whether (or under what condition) $f$ is differentiable at $0$.

Exercise 3.5. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a function satisfying $|f(x)| \leq |x|^2$ for any $x \in \mathbb{R}^n$. Show that $f$ is differentiable at $0$.

3.1.3. Jacobian matrix. Recall that a vector-valued function
\[
F(x,y) = \big(u(x,y), v(x,y)\big) : \Omega \subset \mathbb{R}^2 \to \mathbb{R}^2
\]
is differentiable at $(a,b)$ if both $u$ and $v$ are differentiable scalar-valued functions at $(a,b)$, meaning that all partial derivatives of $u$ and $v$ exist at $(a,b)$, and
\[
u(x,y) = u(a,b) + \left.\frac{\partial u}{\partial x}\right|_{(a,b)}(x-a) + \left.\frac{\partial u}{\partial y}\right|_{(a,b)}(y-b) + o\big(|(x,y) - (a,b)|\big)
\]
\[
v(x,y) = v(a,b) + \left.\frac{\partial v}{\partial x}\right|_{(a,b)}(x-a) + \left.\frac{\partial v}{\partial y}\right|_{(a,b)}(y-b) + o\big(|(x,y) - (a,b)|\big)
\]
Expressing the above in matrix form, we get:
\[
\begin{bmatrix} u(x,y) \\ v(x,y) \end{bmatrix} = \begin{bmatrix} u(a,b) \\ v(a,b) \end{bmatrix} + \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}_{(x,y)=(a,b)} \begin{bmatrix} x - a \\ y - b \end{bmatrix} + \begin{bmatrix} h_1(x,y) \\ h_2(x,y) \end{bmatrix}
\]
where $h_i \in o\big(|(x,y) - (a,b)|\big)$ for $i = 1, 2$. The matrix
\[
\left.\frac{\partial(u,v)}{\partial(x,y)}\right|_{(x,y)=(a,b)} = \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}_{(x,y)=(a,b)}
\]
is called the Jacobian matrix of $F$ at $(a,b)$, also commonly denoted by $DF(a,b)$.

In higher dimensions, we can rewrite the definition of differentiability for a function $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ as follows: $F$ is differentiable at $a \in \Omega$ if
\[
F(x) = F(a) + DF(a) \cdot (x - a) + h(x)
\]
where $h(x) \in o(|x - a|)$ as $x \to a$. Here $F = (f^1, \cdots, f^m)$, and $DF(a)$ is the Jacobian matrix of $F$ at $a$, whose $(i,j)$-th entry is $\frac{\partial f^i}{\partial x_j}(a)$. The vector $x - a$ is regarded as a column vector, and $\cdot$ means the matrix-vector product. We say a vector-valued function $h(x)$ is of $o(|x - a|)$ if each of its components is of $o(|x - a|)$. The (affine) linear map $x \mapsto F(a) + DF(a) \cdot (x - a)$ is called the linear approximation of $F$ at $a$.
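For instance, for $F(x,y) = (x^2 - y,\ xy) : \mathbb{R}^2 \to \mathbb{R}^2$ we have
\[
DF(x,y) = \begin{bmatrix} 2x & -1 \\ y & x \end{bmatrix},
\]
so, since $F(1,1) = (0,1)$, the linear approximation of $F$ at $(1,1)$ is
\[
\begin{bmatrix} 0 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x - 1 \\ y - 1 \end{bmatrix}.
\]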

Exercise 3.6. Prove that if $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $a \in \Omega$, then it is continuous at $a$.

3.1.4. $C^1$ functions. Although the existence of partial derivatives does not imply differentiability of multivariable functions, if we impose a further condition on the partial derivatives, namely continuity, then it does imply differentiability.

Definition 3.7 ($C^1$ functions). A function $F = (f^1, \cdots, f^m) : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ is said to be $C^1$ at $a \in \Omega$ if
(1) there exists $\varepsilon > 0$ such that all partial derivatives $\frac{\partial f^i}{\partial x_j}$ exist on $B_\varepsilon(a)$, and
(2) all partial derivatives $\frac{\partial f^i}{\partial x_j}$ are continuous at $a$.
If $F$ is $C^1$ at every point in the open domain $\Omega$, we simply say $F$ is $C^1$ on $\Omega$.

Example 3.8. The function $f(x,y) = |xy|$ is not $C^1$ at $(0,0)$, because $\frac{\partial f}{\partial x}$ does not exist at any $(0,b)$ where $b \neq 0$:
\[
\frac{\partial f}{\partial x}(0,b) = \lim_{t \to 0} \frac{f(0+t, b) - f(0,b)}{t - 0} = \lim_{t \to 0} \frac{|t|}{t} \cdot |b|,
\]
which does not exist. Any open ball $B_\varepsilon(0,0)$ centered at $(0,0)$ must contain some points $(0,b)$ with $b \neq 0$, so Condition (1) fails.

Example 3.9. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
\[
f(x,y) = \begin{cases} \dfrac{x^2 y}{x^2 + y^2} & \text{if } x + y \neq 0 \\ 0 & \text{if } x + y = 0 \end{cases}.
\]
Denote the diagonal $x + y = 0$ by $\Delta := \{(a,b) : a + b = 0\}$. For any $(a,b) \notin \Delta$, the function $f$ equals $\frac{x^2 y}{x^2 + y^2}$ on a small open ball $B_\varepsilon(a,b)$. Partial derivatives of $f$ at this $(a,b)$ depend only on its values on this ball, and we can simply take partial derivatives as usual:
\[
\frac{\partial f}{\partial x} = \frac{(x^2 + y^2)(2xy) - x^2 y(2x)}{(x^2 + y^2)^2} = \frac{2xy^3}{(x^2 + y^2)^2}
\]
\[
\frac{\partial f}{\partial y} = \frac{(x^2 + y^2)(x^2) - x^2 y(2y)}{(x^2 + y^2)^2} = \frac{x^2(x^2 - y^2)}{(x^2 + y^2)^2}
\]
As the partial derivatives are rational functions whose denominators are non-zero away from the origin, they are continuous at every $(a,b) \notin \Delta$. Therefore, $f$ is $C^1$ on $\mathbb{R}^2 \setminus \Delta$.

Now we discuss the case when $(a,b) \in \Delta$, i.e. $b = -a$. As $f$ is defined differently at $(a,b)$ and at $(x,y) \notin \Delta$, we need to find its partial derivatives from the definition:
\[
\frac{\partial f}{\partial x}(a,b) = \lim_{t \to 0} \frac{f(a+t, b) - f(a,b)}{t - 0} = \lim_{t \to 0} \frac{\frac{(a+t)^2(-a)}{(a+t)^2 + a^2} - 0}{t} = \begin{cases} \text{does not exist} & \text{if } a \neq 0 \\ 0 & \text{if } a = 0 \end{cases}
\]
\[
\frac{\partial f}{\partial y}(a,b) = \lim_{t \to 0} \frac{f(a, -a+t) - f(a,-a)}{t - 0} = \lim_{t \to 0} \frac{\frac{a^2(-a+t)}{a^2 + (-a+t)^2} - 0}{t} = \begin{cases} \text{does not exist} & \text{if } a \neq 0 \\ 0 & \text{if } a = 0 \end{cases}
\]
Therefore, the partial derivatives exist at $(a,-a)$ only when $a = 0$. In particular, $f$ is not $C^1$ at any $(a,-a)$ with $a \neq 0$.

We are left to determine whether $f$ is $C^1$ at $(0,0)$, i.e. whether $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are continuous at $(0,0)$. It can easily be seen that along the two paths $(x,y) = (t,t)$ and $(x,y) = (t,0)$ (both avoiding $\Delta$ for $t \neq 0$), the limits of $\frac{\partial f}{\partial y}$ are different as $t \to 0$:
\[
\lim_{t \to 0}\left.\frac{\partial f}{\partial y}\right|_{(x,y)=(t,t)} = \lim_{t \to 0} \frac{t^2(t^2 - t^2)}{(t^2 + t^2)^2} = 0
\]
\[
\lim_{t \to 0}\left.\frac{\partial f}{\partial y}\right|_{(x,y)=(t,0)} = \lim_{t \to 0} \frac{t^2(t^2 - 0)}{(t^2 + 0)^2} = 1.
\]
Therefore, $\frac{\partial f}{\partial y}$ is not continuous at $(0,0)$, and so $f$ is not $C^1$ at $(0,0)$. Alternatively, we can argue that any open ball $B_\varepsilon(0,0)$ must contain some points $(a,-a)$ with $a \neq 0$; both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ fail to exist at those points, so $f$ is not $C^1$ at $(0,0)$.

The reason why we discuss $C^1$ functions is that they are also differentiable (but not vice versa).

Proposition 3.10 ($C^1$ implies differentiability). If $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ is $C^1$ at $a \in \Omega$, then it is also differentiable at $a$.

Proof. Because both differentiability and continuity can be verified componentwise, it suffices to consider only the case of scalar functions $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$. For simplicity, we give the proof when $n = 2$. The higher-dimensional case is similar – just with more variables and terms to write down – and we leave it as an exercise for readers.

Consider $f(x,y) : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ and $(a,b) \in \Omega$. If $f$ is $C^1$ at $(a,b)$, then $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist on some open ball $B_\varepsilon(a,b)$, and they are continuous at $(a,b)$. The key idea of the proof is to make use of the mean value theorem from single-variable calculus. First we rewrite:
\[
f(x,y) - f(a,b) = \big(f(x,y) - f(x,b)\big) + \big(f(x,b) - f(a,b)\big) = \left.\frac{\partial f}{\partial y}\right|_{(x,u)}(y - b) + \left.\frac{\partial f}{\partial x}\right|_{(v,b)}(x - a)
\]
where $u$ is between $y$ and $b$ (and it could depend on $x$ too), and $v$ is between $x$ and $a$. As $f$ is $C^1$ at $(a,b)$, both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are continuous at $(a,b)$. As $(x,y) \to (a,b)$, by the squeeze theorem we have $(x,u) \to (a,b)$ and $(v,b) \to (a,b)$ too. Therefore, by continuity we get
\[
\lim_{(x,y)\to(a,b)} \left.\frac{\partial f}{\partial x}\right|_{(v,b)} = \left.\frac{\partial f}{\partial x}\right|_{(a,b)} \quad \text{and} \quad \lim_{(x,y)\to(a,b)} \left.\frac{\partial f}{\partial y}\right|_{(x,u)} = \left.\frac{\partial f}{\partial y}\right|_{(a,b)}.
\]
In other words, we have
\[
\left.\frac{\partial f}{\partial x}\right|_{(v,b)} = \left.\frac{\partial f}{\partial x}\right|_{(a,b)} + o(1), \qquad \left.\frac{\partial f}{\partial y}\right|_{(x,u)} = \left.\frac{\partial f}{\partial y}\right|_{(a,b)} + o(1)
\]
as $(x,y) \to (a,b)$. Combining the above results, we conclude that
\[
f(x,y) - f(a,b) = \left(\left.\frac{\partial f}{\partial x}\right|_{(a,b)} + o(1)\right)(x - a) + \left(\left.\frac{\partial f}{\partial y}\right|_{(a,b)} + o(1)\right)(y - b) = \left.\frac{\partial f}{\partial x}\right|_{(a,b)}(x - a) + \left.\frac{\partial f}{\partial y}\right|_{(a,b)}(y - b) + o(x - a) + o(y - b).
\]
Since $o(x - a) \in o\big(|(x,y) - (a,b)|\big)$ and $o(y - b) \in o\big(|(x,y) - (a,b)|\big)$, we conclude that
\[
f(x,y) = f(a,b) + \left.\frac{\partial f}{\partial x}\right|_{(a,b)}(x - a) + \left.\frac{\partial f}{\partial y}\right|_{(a,b)}(y - b) + o\big(|(x,y) - (a,b)|\big)
\]
as desired. $\square$

Exercise 3.7. Write down the proof of Proposition 3.10 for any $n \in \mathbb{N}$.

Exercise 3.8. Consider a function $F(x_1, \cdots, x_n) = (f^1, \cdots, f^m) : B_R \subset \mathbb{R}^n \to \mathbb{R}^m$ whose domain $B_R$ is an open ball. Suppose the first partial derivatives $\frac{\partial f^j}{\partial x_i}$ are all bounded on $B_R$. Prove that there exists a constant $C > 0$, depending only on $m$, $n$ and the bounds of the $\frac{\partial f^j}{\partial x_i}$'s, such that
\[
|F(x) - F(y)| \leq C|x - y|
\]
for any $x, y \in B_R$.

Exercise 3.9. Let $A$ be a real $m \times n$ matrix, and we define:
\[
\|A\| := \sup\{|Ax| : x \in \mathbb{R}^n \text{ and } |x| = 1\}.
\]
(a) Prove that $\|\cdot\|$ is a norm on the vector space $M_{m \times n}(\mathbb{R})$, the space of $m \times n$ real matrices.
(b) Denote the $(i,j)$-th entry of $A$ by $a_{ij}$. Show that $|a_{ij}| \leq \|A\|$ for any $i, j$.
(c) Show that there exists a constant $C_{m,n} > 0$, depending only on $m$ and $n$, such that $\|A\| \leq C_{m,n}\sqrt{\sum_{i,j} a_{ij}^2}$.
(d) Show that for any $x \in \mathbb{R}^n$, we have $|Ax| \leq \|A\|\,|x|$.

Remark 3.11. Note that the function $f(x,y) = |xy|$ is differentiable at $(0,0)$, but it is not $C^1$ at $(0,0)$. Therefore, the converse of Proposition 3.10 does not hold.

3.1.5. Chain rule. From the linear approximation viewpoint on differentiation, one can formulate the multivariable chain rule as matrix products of Jacobian matrices.

Theorem 3.12 (Chain rule). Consider two functions $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ and $G : U \subset \mathbb{R}^m \to \mathbb{R}^k$, where $F(\Omega) \subset U$ and $U$ is open in $\mathbb{R}^m$. Suppose $F$ is differentiable at $a \in \Omega$, and $G$ is differentiable at $F(a)$. Then $G \circ F : \Omega \to \mathbb{R}^k$ is differentiable at $a$, and the Jacobian matrices satisfy:
\[
D(G \circ F)(a) = DG\big(F(a)\big) \cdot DF(a).
\]
Here $\cdot$ denotes the matrix product.

Proof. The key idea is to write $F$ and $G$ in linear approximation form, i.e. there exist functions $h : B_\eta(a) \subset \mathbb{R}^n \to \mathbb{R}^m$ and $k : B_\rho(F(a)) \subset \mathbb{R}^m \to \mathbb{R}^k$, defined near $a$ and $F(a)$ respectively, such that:
\[
F(x) = F(a) + DF(a) \cdot (x - a) + h(x)
\]
\[
G(y) = G\big(F(a)\big) + DG\big(F(a)\big) \cdot \big(y - F(a)\big) + k(y)
\]
where $h(x) \in o(|x - a|)$ as $x \to a$, and $k(y) \in o\big(|y - F(a)|\big)$ as $y \to F(a)$. Since $F(x) \to F(a)$ as $x \to a$, by substituting $F(x)$ into $y$, we have that when $x$ is sufficiently close to $a$,
\[
G\big(F(x)\big) = G\big(F(a)\big) + DG\big(F(a)\big) \cdot \big(F(a) + DF(a)(x - a) + h(x) - F(a)\big) + k\big(F(x)\big) = G\big(F(a)\big) + DG\big(F(a)\big) \cdot DF(a) \cdot (x - a) + DG\big(F(a)\big) \cdot h(x) + k\big(F(x)\big).
\]
Our remaining task is to prove that $DG\big(F(a)\big) \cdot h(x) + k\big(F(x)\big) \in o(|x - a|)$ as $x \to a$.

Since $DG\big(F(a)\big)$ is a constant matrix, we have $DG\big(F(a)\big) \cdot h(x) \in o(|x - a|)$. For the term $k\big(F(x)\big)$: since $k(y) \in o\big(|y - F(a)|\big)$, for any $\varepsilon' > 0$ there exists $\delta > 0$ such that if $|y - F(a)| < \delta$, we have
\[
|k(y)| \leq \varepsilon'\,|y - F(a)|.
\]
Here $\varepsilon' > 0$ is to be chosen. Furthermore, since $F(x) \to F(a)$ as $x \to a$, there exists $\theta > 0$ sufficiently small such that if $|x - a| < \theta$, we have $|F(x) - F(a)| < \delta$, and so from the above we have
\[
\big|k\big(F(x)\big)\big| \leq \varepsilon'\,|F(x) - F(a)| = \varepsilon'\,|DF(a) \cdot (x - a) + h(x)|.
\]
Recall that $h \in o(|x - a|)$, so there exists $\theta' > 0$ such that $|h(x)| \leq |x - a|$ when $|x - a| < \theta'$.

Combining the above results, when $|x - a| < \min\{\theta, \theta'\} =: \theta''$, we have
\[
\big|k\big(F(x)\big)\big| \leq \varepsilon'\,\|DF(a)\|\,|x - a| + \varepsilon'\,|x - a|.
\]
Now picking
\[
\varepsilon' = \frac{\varepsilon}{1 + \|DF(a)\|},
\]
we have proven that for any $\varepsilon > 0$, there exists $\theta'' > 0$ such that if $|x - a| < \theta''$, then $\big|k\big(F(x)\big)\big| \leq \varepsilon\,|x - a|$. In other words, we conclude that $k\big(F(x)\big) \in o(|x - a|)$ as $x \to a$, completing the proof. $\square$

In terms of partial derivatives, the chain rule, say in two dimensions, can be written in the following form. Let $F(x,y) = \big(u(x,y), v(x,y)\big)$ and $G(u,v) = \big(w(u,v), z(u,v)\big)$; then the chain rule for $(w,z) = G \circ F(x,y)$ is
\[
\begin{bmatrix} \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} \\ \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \end{bmatrix} = \begin{bmatrix} \frac{\partial w}{\partial u} & \frac{\partial w}{\partial v} \\ \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} \end{bmatrix} \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}.
\]
In particular, we have
\[
\frac{\partial w}{\partial x} = \frac{\partial w}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial w}{\partial v}\frac{\partial v}{\partial x},
\]
and similarly for other combinations and also in higher dimensions. It is exactly what we have learned in MATH 2023.
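For instance, if $w = u^2 v$ with $u = x + y$ and $v = xy$, then
\[
\frac{\partial w}{\partial x} = \frac{\partial w}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial w}{\partial v}\frac{\partial v}{\partial x} = 2uv \cdot 1 + u^2 \cdot y = 2(x+y)xy + (x+y)^2 y,
\]
which agrees with differentiating $w = (x+y)^2 xy$ directly.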
Exercise 3.10. Show that the directional derivative $D_u f(a)$ of a differentiable function $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ can be written as the single-variable derivative:
\[
D_u f(a) = \left.\frac{d}{dt}\right|_{t=0} f(a + tu).
\]
Hence, prove using the chain rule that
\[
D_u f(a) = \nabla f(a) \cdot u.
\]

Exercise 3.11. Using the chain rule, generalize the result of Exercise 3.8 so that the domain of $F$ can be any convex open set $\Omega \subset \mathbb{R}^n$, not just open balls. Hint: for any $x, y \in \Omega$, construct a straight path joining $x$ and $y$.

3.2. Higher-Order Differentiability

3.2.1. Higher-order partial derivatives. In single-variable calculus, the derivative of the derivative $f'(x)$ of a function $f$ is denoted by $f''(x) := \frac{d}{dx}f'(x)$ (alternatively $f^{(2)}(x)$). If $f''(x)$ exists, then we say $f$ is twice differentiable, and call $f''$ the second-order derivative of $f$. Inductively, we say $f$ is $n$-times differentiable if $f$ is $(n-1)$-times differentiable and its $(n-1)$-th order derivative $f^{(n-1)}$ is differentiable, and we denote $f^{(n)} = \frac{d}{dx}f^{(n-1)}$; in this case we call $f^{(n)}$ the $n$-th order derivative of $f$.

In MATH 1023, we saw that if $f^{(n)}$ exists at $x = a$, then $f$ can be approximated by an $n$-th degree polynomial $T_n(x)$ in the sense that
\[
f(x) = T_n(x) + o\big((x - a)^n\big) \quad \text{as } x \to a.
\]
Note that the converse is not true (see the footnote remark²).
Now for multivariable calculus, we can also take second-order partial derivatives by differentiating the first partial derivatives. For instance, for a function $f(x,y) : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$, we can define
\[
\frac{\partial^2 f}{\partial x^2} := \frac{\partial}{\partial x}\!\left(\frac{\partial f}{\partial x}\right), \qquad \frac{\partial^2 f}{\partial x \partial y} := \frac{\partial}{\partial x}\!\left(\frac{\partial f}{\partial y}\right), \qquad \frac{\partial^2 f}{\partial y \partial x} := \frac{\partial}{\partial y}\!\left(\frac{\partial f}{\partial x}\right), \qquad \frac{\partial^2 f}{\partial y^2} := \frac{\partial}{\partial y}\!\left(\frac{\partial f}{\partial y}\right)
\]
whenever they exist. Alternatively, we may denote $\frac{\partial^2 f}{\partial x^2}$ by $f_{xx}$, and $\frac{\partial^2 f}{\partial x \partial y}$ by $f_{yx}$, etc. Although $f_{yx}$ and $f_{xy}$ are defined in different ways, we will later prove that they are the same if $f_{xy}$ and $f_{yx}$ are continuous. However, let's assume they are different for now, until that result is proved.

Generally speaking, we denote:
\[
\frac{\partial^k f}{\partial x_{i_k} \partial x_{i_{k-1}} \cdots \partial x_{i_1}} := \frac{\partial}{\partial x_{i_k}}\,\frac{\partial}{\partial x_{i_{k-1}}} \cdots \frac{\partial}{\partial x_{i_2}}\,\frac{\partial f}{\partial x_{i_1}}
\]
and call it a $k$-th order partial derivative. It can also be denoted by $f_{x_{i_1} x_{i_2} \cdots x_{i_k}}$.

Exercise 3.12. For a function $f : \mathbb{R}^n \to \mathbb{R}$, how many different $k$-th order partial derivatives are there?

Exercise 3.13. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
\[
f(x,y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x,y) \neq (0,0) \\ 0 & \text{if } (x,y) = (0,0) \end{cases}.
\]
Find $f_x$ and $f_y$ at any $(x,y) \in \mathbb{R}^2$, and hence find $f_{xy}(0,0)$ and $f_{yx}(0,0)$ (which are not equal for this example).

3.2.2. Higher-order differentiability. Next we give the definition of higher-order differentiability for multivariable functions. Just as for first-order differentiability, the existence of partial derivatives is not sufficient to claim the function is differentiable.

²In this course, we define "$f$ is $n$-times/order differentiable" to mean that the $n$-th order derivative $f^{(n)}$ exists. Some authors define it to mean that $f$ can be approximated by an $n$-th degree polynomial. Note that these are not equivalent.

Definition 3.13 (Higher-order differentiable functions). Consider a function $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$. We say that $f$ is $k$-times differentiable at $a \in \Omega$ if
(1) all $k$-th order partial derivatives of $f$ exist at $a$; and
(2) all $(k-1)$-th order partial derivatives of $f$ are differentiable at $a$.
Condition (2) means that for any $(k-1)$-th order partial derivative $f_{x_{i_1} \cdots x_{i_{k-1}}}$, we have
\[
f_{x_{i_1} \cdots x_{i_{k-1}}}(x) = f_{x_{i_1} \cdots x_{i_{k-1}}}(a) + \sum_{j=1}^n \frac{\partial f_{x_{i_1} \cdots x_{i_{k-1}}}}{\partial x_j}(a)(x_j - a_j) + o\big(|x - a|\big) \quad \text{as } x \to a.
\]
We say a vector-valued function $F = (f^1, \cdots, f^m) : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ is $k$-times differentiable at $a \in \Omega$ if each component $f^i : \Omega \to \mathbb{R}$ is $k$-times differentiable at $a$.

Remark 3.14. In order for the $k$-th order partial derivatives to exist at $a$, it is necessary that all $(k-1)$-th order partial derivatives exist on an open ball $B_\varepsilon(a)$ centered at $a$. This condition is implicitly assumed whenever we say the $k$-th order partial derivatives exist at $a$.

Recall that for first-order derivatives we have a stronger condition called $C^1$ (which additionally requires the first partial derivatives to be continuous). For higher orders, we also have a stronger notion called $C^k$:

Definition 3.15 ($C^k$ functions). A function $f : \Omega \to \mathbb{R}$ is said to be $C^k$ at $a \in \Omega$ if all $k$-th order partial derivatives of $f$ exist on some ball $B_\varepsilon(a)$, and they are continuous at $a$.

Recall that $C^1$ functions are differentiable. Therefore, if $f : \Omega \to \mathbb{R}$ is $C^k$ at $a \in \Omega$, then all $(k-1)$-th order partial derivatives of $f$ are $C^1$ at $a$, and so they are differentiable at $a$. This implies $f$ is $k$-times differentiable at $a$. To conclude, we have the following result:

Corollary 3.16 (to Proposition 3.10). If a function $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ is $C^k$ at $a \in \Omega$, then it is $k$-times differentiable at $a$.

Example 3.17. Again, the converse of the above corollary is not true. Here is a counterexample:
\[
f(x,y) = \left(\int_0^x |s|\,ds\right)\left(\int_0^y |t|\,dt\right).
\]
We will show that $f$ is twice differentiable at $(0,0)$, but is not $C^2$ at $(0,0)$.

By the Fundamental Theorem of Calculus (note that the absolute-value function is continuous), we have
\[
\frac{\partial f}{\partial x} = |x|\int_0^y |t|\,dt, \qquad \frac{\partial f}{\partial y} = |y|\int_0^x |s|\,ds.
\]
As $x \mapsto |x|$ is not differentiable at $0$, the second partial derivative $\frac{\partial^2 f}{\partial x^2}$ does not exist at any $(0,b)$ where $b \neq 0$. This already shows $f$ is not $C^2$ at $(0,0)$, as any small open ball $B_\varepsilon(0,0)$ must contain points $(0,b)$ with $b \neq 0$.

However, one can still prove that $f$ is twice differentiable at $(0,0)$. We first compute the second partial derivatives:
\[
\frac{\partial^2 f}{\partial x^2}(0,0) = \lim_{h \to 0} \frac{\frac{\partial f}{\partial x}(h,0) - \frac{\partial f}{\partial x}(0,0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0
\]
\[
\frac{\partial^2 f}{\partial y \partial x}(0,0) = \lim_{h \to 0} \frac{\frac{\partial f}{\partial x}(0,h) - \frac{\partial f}{\partial x}(0,0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0.
\]
Similarly, one can also show $\frac{\partial^2 f}{\partial y^2}(0,0) = \frac{\partial^2 f}{\partial x \partial y}(0,0) = 0$. Now we are left to check that:
\[
\lim_{(x,y)\to(0,0)} \frac{\frac{\partial f}{\partial x}(x,y) - \frac{\partial f}{\partial x}(0,0) - \frac{\partial^2 f}{\partial x^2}(0,0)\cdot(x - 0) - \frac{\partial^2 f}{\partial y \partial x}(0,0)\cdot(y - 0)}{\sqrt{x^2 + y^2}} = \lim_{(x,y)\to(0,0)} \frac{|x|\int_0^y |t|\,dt}{\sqrt{x^2 + y^2}} = 0
\]
by the facts that $\frac{|x|}{\sqrt{x^2 + y^2}} \leq 1$ and $\lim_{(x,y)\to(0,0)} \int_0^y |t|\,dt = 0$. Hence, we conclude $\frac{\partial f}{\partial x}$ is differentiable at $(0,0)$. Similarly, one can also check that $\frac{\partial f}{\partial y}$ is differentiable at $(0,0)$ too. This concludes that $f$ is twice differentiable at $(0,0)$.

Exercise 3.14. Determine whether or not the function in Exercise 3.13 is twice differentiable at $(0,0)$.

The $C^k$ condition of a multivariable function is easier to verify, as it only involves partial derivatives. Therefore, many theorems in analysis in higher dimensions assume the function is $C^k$ instead of $k$-times differentiable (even though the former is more restrictive).

One benefit of being $C^k$ is that the partial derivatives up to $k$-th order commute, e.g. $f_{xyx} = f_{yxx} = f_{xxy}$ for a $C^3$ function $f$.
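For instance, for the $C^\infty$ function $f(x,y) = x^3 y^2$ we have $f_x = 3x^2 y^2$ and $f_y = 2x^3 y$, so that
\[
f_{xy} = 6x^2 y = f_{yx},
\]
as the theorem below guarantees; Exercise 3.13 shows that this equality can fail when the second partial derivatives are not continuous.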

Theorem 3.18 (Clairaut). Suppose $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ is $C^2$ at $a \in \Omega$. Then we have
\[
\frac{\partial^2 f}{\partial x_i \partial x_j}(a) = \frac{\partial^2 f}{\partial x_j \partial x_i}(a)
\]
for any $i, j = 1, 2, \cdots, n$.

Proof. It suffices to prove the case $n = 2$, as the theorem only involves two directions $x_i$ and $x_j$. Also, for simplicity we denote $x_i$ by $x$, $x_j$ by $y$, and $a = (a,b)$. We need to prove that $\frac{\partial^2 f}{\partial x \partial y}(a,b) = \frac{\partial^2 f}{\partial y \partial x}(a,b)$.

Denote $\Delta(h,k) := f(a+h, b+k) - f(a+h, b) - f(a, b+k) + f(a,b)$. We first apply the mean value theorem to $\Delta$ in two different ways. Define:
\[
g_1(x) := f(x, b+k) - f(x, b)
\]
\[
g_2(y) := f(a+h, y) - f(a, y)
\]
We apply the mean value theorem by grouping the terms as:
\[
\Delta(h,k) = \big(f(a+h, b+k) - f(a+h, b)\big) - \big(f(a, b+k) - f(a, b)\big) = g_1(a+h) - g_1(a) = g_1'(c_1) \cdot h = \big(f_x(c_1, b+k) - f_x(c_1, b)\big)h
\]
for some $c_1$ between $a$ and $a+h$. Note that $c_1$ depends on $a, b, h, k$, but by the squeeze theorem we have $c_1 \to a$ as $(h,k) \to (0,0)$. Applying the mean value theorem to $f_x(c_1, \cdot)$ above, we then get
\[
\Delta(h,k) = f_{xy}(c_1, c_2)\,hk
\]
for some $c_2$ between $b$ and $b+k$, and as $(h,k) \to (0,0)$ we have $c_2 \to b$.

On the other hand, we also have
\[
\Delta(h,k) = \big(f(a+h, b+k) - f(a, b+k)\big) - \big(f(a+h, b) - f(a, b)\big) = g_2(b+k) - g_2(b) = g_2'(c_3) \cdot k = \big(f_y(a+h, c_3) - f_y(a, c_3)\big)k
\]
for some $c_3$ between $b$ and $b+k$, and as $(h,k) \to (0,0)$ we have $c_3 \to b$. Similarly, applying the mean value theorem to $f_y(\cdot, c_3)$ we get
\[
\Delta(h,k) = f_{yx}(c_4, c_3)\,hk
\]
for some $c_4$ between $a$ and $a+h$, and as $(h,k) \to (0,0)$ we have $c_4 \to a$.

Equating both expressions, we then have
\[
f_{xy}(c_1, c_2) = f_{yx}(c_4, c_3)
\]
for any $(h,k)$ such that $h \neq 0$ and $k \neq 0$.

Since $(0,0)$ is a limit point of the set $\{(h,k) : h \neq 0 \text{ and } k \neq 0\}$, we can let $(h,k) \to (0,0)$ on both sides of the above equation. Noting that $f_{xy}$ and $f_{yx}$ are continuous at $(a,b)$ (as $f$ is $C^2$), and $(c_1, c_2) \to (a,b)$ and $(c_4, c_3) \to (a,b)$ as $(h,k) \to (0,0)$, we finally conclude that
\[
f_{xy}(a,b) = f_{yx}(a,b)
\]
as desired. $\square$

By induction, one can then conclude that if $f$ is $C^k$ at $a$, then all partial derivatives up to and including $k$-th order commute. For instance, for a $C^4$ function $f(x,y,z)$ we have
\[
f_{xyy} = f_{yyx} = f_{yxy}, \qquad f_{xyzx} = f_{zxyx} = f_{yzxx}, \quad \text{etc.}
\]
As such, we typically work with $C^k$ multivariable functions instead of $k$-times differentiable functions; in many theorems in analysis, PDE, and related fields, the $C^k$ condition is the much more commonly used hypothesis.

Exercise 3.15. Modify the proof of Clairaut's Theorem to prove a stronger result: Suppose $f(x,y) : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ is a function such that $f_x$, $f_y$ and $f_{xy}$ exist on some ball $B_\varepsilon(a)$, and are all continuous at $a$. Then $f_{yx}(a)$ exists and $f_{xy}(a) = f_{yx}(a)$.

3.2.3. Taylor approximation. Similar to single-variable calculus, we can approximate higher-order differentiable functions by polynomials.

Proposition 3.19. If $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ is $k$-times differentiable at $a \in \Omega$, then as $x \to a$ we have:
\[
f(x) = f(a) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(a)(x_i - a_i) + \frac{1}{2!}\sum_{j,k=1}^n \frac{\partial^2 f}{\partial x_j \partial x_k}(a)(x_j - a_j)(x_k - a_k) + \cdots + \frac{1}{k!}\sum_{i_1, \cdots, i_k = 1}^n \frac{\partial^k f}{\partial x_{i_k} \cdots \partial x_{i_1}}(a)(x_{i_1} - a_{i_1}) \cdots (x_{i_k} - a_{i_k}) + o\big(|x - a|^k\big).
\]
The $k$-th degree polynomial
\[
T_k(x) := f(a) + \sum_{j=1}^k \sum_{i_1, \cdots, i_j = 1}^n \frac{1}{j!}\frac{\partial^j f}{\partial x_{i_j} \cdots \partial x_{i_1}}(a)(x_{i_1} - a_{i_1}) \cdots (x_{i_j} - a_{i_j})
\]
is called the $k$-th degree Taylor approximation of $f$ at $a$.

Proof. We prove the case $k = 2$ as an example, and leave the general case as an exercise for readers. First consider a small ball $B_\varepsilon(a) \subset \Omega$. For each $x \in B_\varepsilon(a)$, we consider the function $g : [0,1] \to \mathbb{R}$ defined by
\[
g(t) := f\big((1-t)a + tx\big),
\]
and the "error" function $E(t) : [0,1] \to \mathbb{R}$ defined by
\[
E(t) := g(t) - \left(g(0) + g'(0)t + \frac{g''(0)}{2!}t^2\right).
\]
Note that both $g$ and $E$ depend on $x$, even though we do not explicitly write them as $g(x,t)$ and $E(x,t)$.

Using the chain rule, we can easily calculate that
\[
g'(t) = \sum_{i=1}^n \left.\frac{\partial f}{\partial x_i}\right|_{(1-t)a+tx}(x_i - a_i)
\]
\[
g''(t) = \sum_{i,j=1}^n \left.\frac{\partial^2 f}{\partial x_j \partial x_i}\right|_{(1-t)a+tx}(x_i - a_i)(x_j - a_j).
\]
Therefore, $E(1)$ is given by
\[
E(1) = f(x) - \left(f(a) + \sum_{i=1}^n \left.\frac{\partial f}{\partial x_i}\right|_a (x_i - a_i) + \frac{1}{2!}\sum_{i,j=1}^n \left.\frac{\partial^2 f}{\partial x_j \partial x_i}\right|_a (x_i - a_i)(x_j - a_j)\right),
\]
which measures the error between $f(x)$ and $T_2(x)$. Therefore, our goal is to prove that $E(1) \in o\big(|x - a|^2\big)$ as $x \to a$.

Applying Cauchy's mean value theorem to the functions $E(t)$ and $t^2$, there exists $c \in (0,1)$ such that
\[
\frac{E(1) - E(0)}{1^2 - 0^2} = \frac{E'(c)}{2c} \implies E(1) = \frac{g'(c) - g'(0) - g''(0)c}{2c}.
\]
This shows
\[
E(1) = \frac{1}{2c}\sum_{i=1}^n \left(\left.\frac{\partial f}{\partial x_i}\right|_{(1-c)a+cx} - \left.\frac{\partial f}{\partial x_i}\right|_a\right)(x_i - a_i) - \frac{1}{2}\sum_{i,j=1}^n \left.\frac{\partial^2 f}{\partial x_j \partial x_i}\right|_a (x_i - a_i)(x_j - a_j).
\]
Next we consider the linear approximation of $\frac{\partial f}{\partial x_i}$ at $a$ (recall $f$ is twice differentiable at $a$, so $\frac{\partial f}{\partial x_i}$ is differentiable at $a$):
\[
\left.\frac{\partial f}{\partial x_i}\right|_{(1-c)a+cx} = \left.\frac{\partial f}{\partial x_i}\right|_a + \sum_{j=1}^n \left.\frac{\partial^2 f}{\partial x_j \partial x_i}\right|_a \big((1-c)a_j + cx_j - a_j\big) + \varepsilon_i\big((1-c)a + cx\big),
\]
where each $\varepsilon_i$ is a function defined near $a$ such that $\varepsilon_i(z) \in o(|z - a|)$ as $z \to a$. Substituting this back in, and noting $(1-c)a_j + cx_j - a_j = c(x_j - a_j)$ so that the second-derivative terms cancel, we get:
\[
E(1) = \frac{1}{2c}\sum_{i=1}^n \varepsilon_i\big((1-c)a + cx\big)(x_i - a_i).
\]
Since
\[
\varepsilon_i\big((1-c)a + cx\big) \in o\big(|(1-c)a + cx - a|\big) = o\big(c\,|x - a|\big),
\]
we have $\frac{1}{c}\,\varepsilon_i\big((1-c)a + cx\big) \in o\big(|x - a|\big)$ (uniformly in $c \in (0,1)$), and multiplying by $|x_i - a_i| \leq |x - a|$ we conclude that $E(1) \in o\big(|x - a|^2\big)$, as desired. $\square$

Exercise 3.16. Complete the proof for any $k \geq 3$.

Example 3.20. For a function $f(x,y) : \mathbb{R}^2 \to \mathbb{R}$ which is twice differentiable at $(a,b)$, its Taylor approximation at $(a,b)$ is given by
\[
f(x,y) = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b) + \frac{1}{2!}\Big(f_{xx}(a,b)(x-a)^2 + f_{xy}(a,b)(x-a)(y-b) + f_{yx}(a,b)(y-b)(x-a) + f_{yy}(a,b)(y-b)^2\Big) + o\big((x-a)^2 + (y-b)^2\big)
\]
as $(x,y) \to (a,b)$. If we assume further that $f$ is $C^2$ at $(a,b)$, then the quadratic term can be simplified as:
\[
\frac{1}{2!}\Big(f_{xx}(a,b)(x-a)^2 + 2f_{xy}(a,b)(x-a)(y-b) + f_{yy}(a,b)(y-b)^2\Big).
\]

Exercise 3.17. Write down the $k$-th order Taylor approximation of a $C^k$ function $f : \mathbb{R}^n \to \mathbb{R}$ at $a = (a_1, \cdots, a_n)$ when
(a) $n = 3$ and $k = 2$
(b) $n = 2$ and $k = 3$

The Taylor approximation of multivariable functions can be used to explain the second derivative test in higher dimensions. For example, when a $C^2$ function $f : \mathbb{R}^2 \to \mathbb{R}$ achieves an interior local maximum or minimum at $(a,b)$, we first have $\frac{\partial f}{\partial x}(a,b) = \frac{\partial f}{\partial y}(a,b) = 0$, as $f$ also achieves a local max/min at $(a,b)$ when restricted to the coordinate directions. Therefore, the second-order Taylor approximation of $f$ at $(a,b)$ is given by only the constant and quadratic terms:
\[
f(x,y) = f(a,b) + \frac{1}{2!}\Big(f_{xx}(a,b)(x-a)^2 + 2f_{xy}(a,b)(x-a)(y-b) + f_{yy}(a,b)(y-b)^2\Big) + o\big((x-a)^2 + (y-b)^2\big).
\]

Therefore, whether $f$ achieves a local maximum or minimum at $(a,b)$ depends only on whether the quadratic form
\[
Q(x,y) := f_{xx}(a,b)(x-a)^2 + 2f_{xy}(a,b)(x-a)(y-b) + f_{yy}(a,b)(y-b)^2
\]
is negative-definite or positive-definite respectively. Using elementary facts about the discriminant, we know that
• if $f_{xx} > 0$ and $f_{xy}^2 - f_{xx}f_{yy} < 0$ at $(a,b)$, then $Q(x,y)$ is positive-definite, and so $f(x,y) \geq f(a,b)$ near $(a,b)$. This implies $f$ achieves a local minimum at $(a,b)$.
• if $f_{xx} < 0$ and $f_{xy}^2 - f_{xx}f_{yy} < 0$ at $(a,b)$, then $Q(x,y)$ is negative-definite, and so $f(x,y) \leq f(a,b)$ near $(a,b)$. This implies $f$ achieves a local maximum at $(a,b)$.
• if $f_{xy}^2 - f_{xx}f_{yy} > 0$ at $(a,b)$, then $Q$ can be positive at some points near $(a,b)$ and negative at some other points near $(a,b)$. In such a case, the graph $z = f(x,y)$ behaves like a saddle near $(a,b)$.
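For instance, $f(x,y) = x^2 + xy + y^2$ has a critical point at $(0,0)$ with $f_{xx} = 2 > 0$ and $f_{xy}^2 - f_{xx}f_{yy} = 1 - 4 < 0$, so $f$ achieves a local minimum there; whereas $f(x,y) = x^2 - y^2$ has $f_{xy}^2 - f_{xx}f_{yy} = 0 - (2)(-2) = 4 > 0$ at $(0,0)$, so its graph is a saddle near the origin.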
In higher dimensions, the quadratic approximation of a $C^2$ function $f : \mathbb{R}^n \to \mathbb{R}$ at $a$ can be expressed as:
\[
f(a) + \nabla f(a) \cdot (x - a) + \frac{1}{2!}(x - a)^T \cdot \nabla^2 f(a) \cdot (x - a)
\]
where $\nabla^2 f(a)$ is the Hessian matrix³, an $n \times n$ matrix whose $(i,j)$-th entry is given by $\frac{\partial^2 f}{\partial x_i \partial x_j}(a)$. It is a symmetric matrix by the commutativity of partial derivatives. Therefore, the positivity/negativity of the quadratic term
\[
(x - a)^T \cdot \nabla^2 f(a) \cdot (x - a)
\]
can be determined by the eigenvalues of $\nabla^2 f(a)$. This is further discussed in the linear algebra course.
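For instance, $f(x,y) = x^2 + 4xy + y^2$ has a critical point at $(0,0)$ with Hessian
\[
\nabla^2 f(0,0) = \begin{bmatrix} 2 & 4 \\ 4 & 2 \end{bmatrix},
\]
whose eigenvalues are $6$ and $-2$. Since the eigenvalues have opposite signs, the quadratic form is indefinite, and the origin is a saddle point of $f$.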

³A note about notation: physicists often use $\nabla^2 f$ for the Laplacian of $f$, which is the trace of the Hessian matrix, whereas mathematicians (especially geometers) often use $\nabla^2 f$ for the Hessian matrix.

3.3. Inverse and Implicit Function Theorems

3.3.1. Motivations. Suppose $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^n$ is a $C^1$ map on $\Omega$ satisfying:
(1) $F$ is injective, and
(2) the image $F(\Omega)$ is open in $\mathbb{R}^n$, and the inverse map $F^{-1} : F(\Omega) \to \Omega$ is $C^1$ on $F(\Omega)$.
Then, applying the chain rule to
\[
F \circ F^{-1} = \mathrm{id} \quad \text{and} \quad F^{-1} \circ F = \mathrm{id},
\]
we get:
\[
DF_x \cdot D(F^{-1})_{F(x)} = I \quad \text{and} \quad D(F^{-1})_{F(x)} \cdot DF_x = I
\]
where $I$ is the $n \times n$ identity matrix. Here we write $DF_x$ for $DF(x)$ to avoid having too many confusing brackets. Therefore, the matrices $DF_x$ and $D(F^{-1})_{F(x)}$ are inverses of each other. In particular, $DF_x$ is invertible if $F$ satisfies both (1) and (2) above.
A natural question is whether the converse is true, i.e. if $DF_x$ is invertible, does $F$ satisfy both (1) and (2) above? This is not true, and here is a counterexample: Let $\Omega = \{(r,\theta) : r > 0 \text{ and } \theta \in \mathbb{R}\}$, which is an open set in $\mathbb{R}^2$. Let $F(r,\theta) : \Omega \to \mathbb{R}^2$ be defined by
\[
F(r,\theta) = (r\cos\theta, r\sin\theta).
\]
Then its Jacobian matrix $DF$ is given by:
\[
DF(r,\theta) = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix},
\]
which has determinant $r > 0$, so it is always invertible. However, it is clear that $F$ is not injective, as $F(r, \theta + 2\pi) = F(r, \theta)$.

However, one can still claim that $F$ has a "local inverse": if $DF_a$ is invertible, one can find a small ball $B_\varepsilon(a)$ such that the restricted map $F|_{B_\varepsilon(a)} : B_\varepsilon(a) \to F(B_\varepsilon(a))$ is injective and has a $C^1$ inverse. This result is known as the inverse function theorem, which is an important result not only in analysis but also in other fields, especially differential geometry and differential topology.
One corollary of the inverse function theorem is the implicit function theorem. Consider the curve $x^2 + xy^3 + y^4 = 0$ in $\mathbb{R}^2$. In single-variable calculus we probably learned how to find the slope of the tangent at a point $(a,b)$ on the curve: regard $y$ as a "function" of $x$, and differentiate both sides of the equation $x^2 + xy^3 + y^4 = 0$ with respect to $x$. This yields:
\[
2x + y^3 + 3xy^2\frac{dy}{dx} + 4y^3\frac{dy}{dx} = 0.
\]
One can then solve for $\frac{dy}{dx}$ by plugging $(a,b)$ into $(x,y)$. However, this method – called implicit differentiation – needs to be justified. First, $y$ may not genuinely be a function of $x$, and in fact this fails even for a simple curve like $x^2 + y^2 = 1$. Also, even if $y$ is a function of $x$, one needs to justify why $y$ is differentiable in order to make sense of $\frac{dy}{dx}$. The implicit function theorem justifies that, under a certain condition, the method of implicit differentiation is legitimate.
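For instance, the point $(-4, 2)$ lies on the curve, since $(-4)^2 + (-4)(2)^3 + 2^4 = 16 - 32 + 16 = 0$. Solving the displayed equation for $\frac{dy}{dx}$ gives
\[
\frac{dy}{dx} = -\frac{2x + y^3}{3xy^2 + 4y^3} = -\frac{-8 + 8}{-48 + 32} = 0
\]
at $(-4,2)$; this computation is legitimate here precisely because $3xy^2 + 4y^3 = \frac{\partial}{\partial y}(x^2 + xy^3 + y^4) = -16 \neq 0$ at that point, which is the hypothesis of the implicit function theorem discussed below.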

3.3.2. Banach contraction mapping theorem. The proof of the inverse function theorem requires an important tool in analysis, known as the Banach contraction mapping theorem. It is such an important tool that it is used not only to prove the inverse function theorem, but also other results such as the existence theorem for ODEs.

Theorem 3.21 (Banach contraction mapping). Let $(X,d)$ be a complete metric space, and $f : X \to X$ be a map such that there exists a constant $\alpha \in (0,1)$ with
\[
d\big(f(x), f(y)\big) \leq \alpha\, d(x,y) \quad \forall x, y \in X.
\]
Then there exists a unique $x^* \in X$ such that $f(x^*) = x^*$.

Proof. Pick any $x_1 \in X$, and define the sequence recursively by:
\[
x_n := f(x_{n-1}) \quad \forall n \geq 2.
\]
We argue that $\{x_n\}$ is a Cauchy sequence in $(X,d)$, and hence it converges to a certain limit $x^* \in X$. We then argue that the limit $x^*$ is the desired fixed point of $f$.

Observe that for any $n \geq 2$, we have
\[
d(x_{n+1}, x_n) = d\big(f(x_n), f(x_{n-1})\big) \leq \alpha\, d(x_n, x_{n-1}).
\]
By induction, one can see that
\[
d(x_{n+1}, x_n) \leq \alpha^{n-1}\, d(x_2, x_1).
\]
Hence, by the triangle inequality and the sum of a geometric series, $\{x_n\}$ is a Cauchy sequence in $(X,d)$ – we leave the details as an exercise for readers. Note that we need the condition $0 < \alpha < 1$ here.

Given that $(X,d)$ is complete, $\{x_n\}$ converges to a limit $x^* \in X$. Note that $f$ is continuous by the given condition
\[
d\big(f(x), f(y)\big) \leq \alpha\, d(x,y) \quad \forall x, y \in X,
\]
so $f(x_{n-1}) \to f(x^*)$ as $n \to \infty$. Since $x_n = f(x_{n-1})$ for any $n \geq 2$, letting $n \to \infty$ on both sides we conclude that $x^* = f(x^*)$, as desired.

The uniqueness of $x^*$ follows directly from the contraction condition. Suppose $x^{**} = f(x^{**})$ for some $x^{**} \in X$ as well. By the contraction condition, we get
\[
d\big(f(x^*), f(x^{**})\big) \leq \alpha\, d(x^*, x^{**}) \implies d(x^*, x^{**}) \leq \alpha\, d(x^*, x^{**}).
\]
Since $0 < \alpha < 1$, this can only happen when $d(x^*, x^{**}) = 0$, hence $x^* = x^{**}$. $\square$
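For instance, take $X = [1,2]$ with the usual metric (complete, being a closed subset of $\mathbb{R}$) and $f(x) = \frac{x}{2} + \frac{1}{x}$. Since $f$ attains its minimum $\sqrt{2}$ at $x = \sqrt{2}$ and $f(1) = f(2) = \frac{3}{2}$, we have $f([1,2]) \subset [\sqrt{2}, \frac{3}{2}] \subset [1,2]$; moreover, $|f'(x)| = \big|\frac{1}{2} - \frac{1}{x^2}\big| \leq \frac{1}{2}$ on $[1,2]$, so by the mean value theorem $f$ is a contraction with $\alpha = \frac{1}{2}$. The unique fixed point solves $x = \frac{x}{2} + \frac{1}{x}$, i.e. $x^* = \sqrt{2}$, and the iteration $x_{n+1} = f(x_n)$ starting from any $x_1 \in [1,2]$ converges to $\sqrt{2}$.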

3.3.3. Inverse Function Theorem. For simplicity, from now on we will denote a vector in $\mathbb{R}^n$ simply by $x$, $y$, $a$, etc. whenever it is clear from the context that they are vectors. Now we are ready to state the inverse function theorem, which we will prove using the Banach contraction mapping theorem.

Theorem 3.22 (Inverse function theorem). Let $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^n$ be a $C^k$ ($k \geq 1$) function on $\Omega$. If $\det DF(a) \neq 0$ at some $a \in \Omega$, then there exists an open set $U \subset \Omega$ containing $a$ such that $\widetilde{F} := F|_U : U \to F(U)$ is bijective, $F(U)$ is open in $\mathbb{R}^n$, and the inverse map $\widetilde{F}^{-1} : F(U) \to U$ is $C^k$.

Before we proceed to the proof, let's first prove an analogue of the "mean value theorem" for vector-valued functions, which is helpful for estimating $|F(x) - F(y)|$ for any two points $x, y$. Given an $m \times n$ matrix $A$ whose $(i,j)$-th entry is denoted by $a_{ij}$, one can define the "Euclidean norm" of $A$ by
\[
\|A\|_2 := \sqrt{\sum_{i,j} (a_{ij})^2}.
\]
One can check easily that $\|\cdot\|_2$ is a norm on the vector space $M_{m \times n}(\mathbb{R})$ of $m \times n$ matrices. Similar to $\mathbb{R}^n$, one can also define the sup-norm of the matrix $A$ by:
\[
\|A\|_\infty := \max_{i,j} |a_{ij}|.
\]

We leave the proofs of the following as exercises:

Exercise 3.18. For any $n \times n$ real matrix $A$ and a vector $x \in \mathbb{R}^n$, we have
\[
|Ax| \leq \|A\|_2\,|x|.
\]
Hint: Cauchy–Schwarz inequality.

Exercise 3.19. Prove that for any $m \times n$ real matrix $A$, we have
\[
\|A\|_\infty \leq \|A\|_2 \leq \sqrt{mn}\,\|A\|_\infty.
\]

Remark 3.23. By the above exercise, one can use either norm interchangeably. If we have a sequence of matrices $\{A_j\}_{j=1}^\infty$, then $A_j$ converges to a matrix $B$ with respect to the $\|\cdot\|_\infty$-norm if and only if it does so with respect to the $\|\cdot\|_2$-norm. Note that the latter is also equivalent to saying that each entry of $A_j$ converges, as a sequence of real numbers, to the corresponding entry of $B$.

Lemma 3.24. Let $G : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ be a $C^1$ function on a convex domain $\Omega$ in $\mathbb{R}^n$. Then, for any $x, y \in \Omega$, we have
\[
|G(x) - G(y)| \leq \sqrt{mn}\,\sup_{z \in \Omega}\|DG(z)\|_\infty \cdot |x - y|.
\]

Proof. Consider the straight path $\gamma(t) = (1-t)x + ty$, which connects $x$ and $y$ and is contained in $\Omega$ by convexity. Write $G = (g_1, \cdots, g_m)$, $x = (x_1, \cdots, x_n)$, and $y = (y_1, \cdots, y_n)$. By the single-variable mean value theorem, we have for each $j$:
\[
|g_j(x) - g_j(y)| = |g_j(\gamma(0)) - g_j(\gamma(1))| = \left|\frac{d(g_j \circ \gamma)}{dt}(s)\right| \cdot |1 - 0|
\]
for some $s \in (0,1)$. The chain rule applied to $g_j \circ \gamma$ shows
\[
\frac{d(g_j \circ \gamma)}{dt} = \sum_{i=1}^n \frac{\partial g_j}{\partial x_i}\,\frac{d\big((1-t)x_i + ty_i\big)}{dt} = \sum_{i=1}^n \frac{\partial g_j}{\partial x_i} \cdot (y_i - x_i).
\]
By Cauchy–Schwarz, we have
\[
\left|\frac{d(g_j \circ \gamma)}{dt}(s)\right| \leq \sqrt{\sum_{i=1}^n \left(\frac{\partial g_j}{\partial x_i}(\gamma(s))\right)^2}\,\sqrt{\sum_{i=1}^n |y_i - x_i|^2} \leq \sqrt{n}\,\|DG(\gamma(s))\|_\infty\,|x - y|.
\]
This proves, for each $j$,
\[
|g_j(x) - g_j(y)| \leq \sqrt{n}\,\|DG(\gamma(s))\|_\infty\,|x - y| \leq \sqrt{n}\,\sup_{z \in \Omega}\|DG(z)\|_\infty\,|x - y|,
\]
which implies
\[
|G(x) - G(y)| = \sqrt{\sum_{j=1}^m |g_j(x) - g_j(y)|^2} \leq \sqrt{mn}\,\sup_{z \in \Omega}\|DG(z)\|_\infty\,|x - y|. \qquad \square
\]



Proof of Inverse Function Theorem. Given that $DF(a)$ is invertible, we will pick $\varepsilon, \delta$ sufficiently small such that every $y \in B_\varepsilon(F(a))$ has a unique $x \in B_{\delta/2}(a)$ with $F(x) = y$. That will show $F$ is injective when restricted to $U := F^{-1}\big(B_\varepsilon(F(a))\big) \cap B_{\delta/2}(a)$.

Step 1: Define a contraction map.
In order to apply the contraction mapping theorem, we consider for each $y \in B_\varepsilon(F(a))$ the following map:
\[
T_y(x) := x + DF(a)^{-1}\big(y - F(x)\big).
\]
We will show that, by choosing $\varepsilon$ and $\delta$ sufficiently small, such a map $T_y$ is a contraction from $\overline{B_{\delta/2}(a)}$ to itself. Then, by the Banach contraction mapping theorem, $T_y$ has a unique fixed point $\bar{x}(y) \in \overline{B_{\delta/2}(a)}$, i.e. $T_y(\bar{x}(y)) = \bar{x}(y)$. One can easily check that this implies $y = F(\bar{x}(y))$, as desired.

To verify that $T_y$ is a contraction, we consider
\[
|T_y(x_1) - T_y(x_2)| = \big|x_1 - DF(a)^{-1}F(x_1) - x_2 + DF(a)^{-1}F(x_2)\big| = \Big|DF(a)^{-1}\big[(DF(a)x_1 - F(x_1)) - (DF(a)x_2 - F(x_2))\big]\Big| \leq \|DF(a)^{-1}\|_2\,|G(x_1) - G(x_2)|
\]
where $G(x) := DF(a) \cdot x - F(x)$. Note that $DG(x) = DF(a)\,I - DF(x) = DF(a) - DF(x)$. Suppose $x_1, x_2 \in B_\delta(a)$; then we have by Lemma 3.24 that
\[
|G(x_1) - G(x_2)| \leq n \sup_{z \in B_\delta(a)}\|DG(z)\|_\infty\,|x_1 - x_2|.
\]
Substituting this back in, we get:
\[
|T_y(x_1) - T_y(x_2)| \leq n\,\|DF(a)^{-1}\|_2 \sup_{z \in B_\delta(a)}\|DF(a) - DF(z)\|_\infty\,|x_1 - x_2|
\]
for any $x_1, x_2 \in B_\delta(a)$. By continuity of the map $T_y(x)$, the above inequality also holds for any $x_1, x_2 \in \overline{B_\delta(a)}$.

Since $F$ is $C^1$, each entry of $DF(z)$ approaches the corresponding entry of $DF(a)$ as $z \to a$. By Remark 3.23, $\|DF(z) - DF(a)\|_\infty \to 0$ as $z \to a$. Hence, one can choose $\delta > 0$ small so that
\[
\|DF(a) - DF(z)\|_\infty \leq \frac{1}{2n\,\|DF(a)^{-1}\|_2}
\]
whenever $z \in B_\delta(a)$, and consequently we have
\[
(3.2) \qquad |T_y(x_1) - T_y(x_2)| \leq \frac{1}{2}|x_1 - x_2|
\]
for any $x_1, x_2 \in \overline{B_\delta(a)}$. This proves $T_y : \overline{B_\delta(a)} \to \mathbb{R}^n$ is a contraction map.

In order to use the Banach contraction mapping theorem, we will show $T_y$ maps $\overline{B_{\delta/2}(a)}$ to itself. For that we need $y$ to be sufficiently close to $F(a)$: for any $x \in \overline{B_{\delta/2}(a)}$, we estimate:
\[
|T_y(x) - a| \leq |T_y(x) - T_y(a)| + |T_y(a) - a| \leq \frac{1}{2}|x - a| + \big|DF(a)^{-1}(y - F(a))\big| \leq \frac{\delta}{4} + \|DF(a)^{-1}\|_2\,|y - F(a)|.
\]
Let $\varepsilon < \frac{\delta}{4\|DF(a)^{-1}\|_2}$; then when $y \in B_\varepsilon(F(a))$, we have
\[
|T_y(x) - a| \leq \frac{\delta}{4} + \frac{\delta}{4} = \frac{\delta}{2}.
\]
In other words, we have $T_y(x) \in \overline{B_{\delta/2}(a)}$ whenever $y \in B_\varepsilon(F(a))$ and $x \in \overline{B_{\delta/2}(a)}$.

Now, by the Banach contraction mapping theorem applied to $T_y : \overline{B_{\delta/2}(a)} \to \overline{B_{\delta/2}(a)}$ where $y \in B_\varepsilon(F(a))$, there exists a unique fixed point $\bar{x} \in \overline{B_{\delta/2}(a)}$ such that $T_y(\bar{x}) = \bar{x}$, or equivalently, $y = F(\bar{x})$. To summarize, we have proved that by taking $U := F^{-1}\big(B_\varepsilon(F(a))\big) \cap B_{\delta/2}(a)$, the map
\[
\widetilde{F} := F|_U : U \to F(U)
\]
is injective, and one can define an inverse $\widetilde{F}^{-1} : F(U) \to U$.
Step 2: Prove that $F(U)$ is open in $\mathbb{R}^n$.
For any $y_0 \in F(U) \subset B_\varepsilon(F(a))$, one can write $y_0 = F(x_0)$ for some $x_0 \in U$. As $U$ is open, there exists $\theta > 0$ such that $\overline{B_\theta(x_0)} \subset U \subset B_{\delta/2}(a)$. Also, as $y_0 \in B_\varepsilon(F(a))$, there exists $\eta > 0$ such that $B_\eta(y_0) \subset B_\varepsilon(F(a))$.

By (3.2), we have for any $x \in \overline{B_\theta(x_0)}$ and $y \in B_{\eta'}(y_0)$, where $\eta' \in (0,\eta)$ is to be chosen:
\[
|T_y(x) - T_y(x_0)| \leq \frac{1}{2}|x - x_0|
\]
\[
\implies |T_y(x) - x_0| \leq |T_y(x) - T_y(x_0)| + |T_y(x_0) - x_0| = \frac{1}{2}|x - x_0| + \big|DF(a)^{-1}\big(y - F(x_0)\big)\big| \leq \frac{1}{2}|x - x_0| + \|DF(a)^{-1}\|_2\,|y - y_0| \leq \frac{\theta}{2} + \|DF(a)^{-1}\|_2\,\eta'.
\]
Therefore, if one chooses $\eta' < \min\left\{\dfrac{\theta}{2\|DF(a)^{-1}\|_2 + 1},\, \eta\right\}$, then we have $T_y(x) \in \overline{B_\theta(x_0)}$ for any $x \in \overline{B_\theta(x_0)}$ and $y \in B_{\eta'}(y_0)$. Then $T_y : \overline{B_\theta(x_0)} \to \overline{B_\theta(x_0)}$ is a contraction map whenever $y \in B_{\eta'}(y_0)$, and hence it has a fixed point $x^* \in \overline{B_\theta(x_0)}$, which is a point such that $y = F(x^*)$. Since $x^* \in \overline{B_\theta(x_0)} \subset U$, this shows $y \in F(U)$ whenever $y \in B_{\eta'}(y_0)$. In other words, $B_{\eta'}(y_0) \subset F(U)$. This concludes that $F(U)$ is open.
Step 3: Prove that $\widetilde{F}^{-1}$ is continuous on $F(U)$.
(We will see in Step 4 why this is needed, where we in fact prove $\widetilde{F}^{-1}$ is even differentiable on $F(U)$.) We use the contraction inequality (3.2) again. For any $y_1, y_2 \in F(U)$, write $y_i = F(x_i)$ for some $x_i \in U$. Then, by (3.2), we get
\[
\frac{1}{2}|x_1 - x_2| \geq |T_{y_1}(x_1) - T_{y_1}(x_2)| = \big|x_1 - x_2 - DF(a)^{-1}\big(y_1 - F(x_2)\big)\big| \geq |x_1 - x_2| - \|DF(a)^{-1}\|_2\,|y_1 - y_2|,
\]
where we used $T_{y_1}(x_1) = x_1$ (as $F(x_1) = y_1$) and $y_1 - F(x_2) = y_1 - y_2$. Therefore, we have
\[
|x_1 - x_2| \leq 2\,\|DF(a)^{-1}\|_2\,|y_1 - y_2|,
\]
or in other words,
\[
\big|\widetilde{F}^{-1}(y_1) - \widetilde{F}^{-1}(y_2)\big| \leq 2\,\|DF(a)^{-1}\|_2\,|y_1 - y_2|.
\]
This shows $\widetilde{F}^{-1}$ is continuous on $F(U)$.

Step 4: Prove that $\widetilde{F}^{-1}$ is differentiable on $F(U)$, with Jacobian matrix given by $(DF)^{-1}$.
Consider $y_0 = F(x_0) \in F(U)$. (By shrinking $\delta$ if necessary, we may assume $\det DF \neq 0$ on $B_{\delta/2}(a)$, since $\det DF$ is continuous and non-zero at $a$; in particular, $DF(x_0)$ is invertible.) By bijectivity of $\widetilde{F}$, we can write every $y \in F(U)$ as $F(x)$, and by continuity of $F$ and $\widetilde{F}^{-1}$ we have $x \to x_0$ if and only if $y \to y_0$. Then one can check that as $y \to y_0$ (and hence as $x \to x_0$), we have:
\[
\widetilde{F}^{-1}(y) - \widetilde{F}^{-1}(y_0) - DF(x_0)^{-1}(y - y_0) = \widetilde{F}^{-1}(F(x)) - \widetilde{F}^{-1}(F(x_0)) - DF(x_0)^{-1}\big(F(x) - F(x_0)\big) = x - x_0 - DF(x_0)^{-1}\big(DF(x_0)(x - x_0) + E(x)\big) = -DF(x_0)^{-1}E(x) \in o\big(|x - x_0|\big),
\]
where $E(x) \in o(|x - x_0|)$. By Step 3 we have $|x - x_0| \leq 2\|DF(a)^{-1}\|_2\,|y - y_0|$, so any quantity in $o(|x - x_0|)$ is also in $o(|y - y_0|)$. Hence $\widetilde{F}^{-1}$ is differentiable at any $y_0 \in F(U)$, with $D(\widetilde{F}^{-1})(y_0) = \big(DF(x_0)\big)^{-1}$.
Step 5: Prove that $\widetilde{F}^{-1}$ is $C^k$ on $F(U)$.
By the previous step, the first-order partial derivatives of $\widetilde{F}^{-1}$ are the entries of $(DF)^{-1}$ (evaluated at $\widetilde{F}^{-1}(y)$). By Cramer's rule, each entry of $(DF)^{-1}$ is a rational function, with non-vanishing denominator $\det DF$, of the partial derivatives of $F$ (which are $C^{k-1}$ as $F$ is $C^k$). Therefore, each entry of $(DF)^{-1}$ is $C^{k-1}$ too, so all the first-order partial derivatives of $\widetilde{F}^{-1}$ are $C^{k-1}$, completing the proof that $\widetilde{F}^{-1}$ is $C^k$. $\square$

Example 3.25. Consider the system of equations
\[
x - y^2 = a
\]
\[
x^2 + y + y^3 = b
\]
Unlike in linear algebra, it is rather difficult to determine for which values of $a$ and $b$ the system has a solution. However, the inverse function theorem can be used to give a partial answer to this question. For instance, one can prove that when $\sqrt{a^2 + b^2}$ is sufficiently small, the system always has a solution.

Define $F(x,y) = (x - y^2, x^2 + y + y^3)$; then $F(0,0) = (0,0)$. The Jacobian matrix $DF(0,0)$ is given by
\[
DF(0,0) = \begin{bmatrix} 1 & -2y \\ 2x & 1 + 3y^2 \end{bmatrix}_{(x,y)=(0,0)} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\]
which is clearly invertible.

The inverse function theorem asserts that there exist open sets $U$ containing $(0,0)$ and $V$ containing $F(0,0) = (0,0)$ such that $F|_U : U \to V$ is bijective. In other words, for any $(a,b) \in V$, there exists $(x_0, y_0) \in U$ such that $F(x_0, y_0) = (a,b)$; that is, the system is solvable whenever $(a,b) \in V$.
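Concretely, since $DF(0,0) = I$, the contraction map $T_y$ constructed in the proof of the inverse function theorem takes, for $y = (a,b)$, the simple form
\[
T_{(a,b)}(x,y) = (x,y) + \big((a,b) - F(x,y)\big) = \big(a + y^2,\; b - x^2 - y^3\big),
\]
and for $(a,b)$ sufficiently small, iterating $T_{(a,b)}$ from $(0,0)$ converges to the solution $(x_0, y_0)$ of the system.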

Exercise 3.20. Find another pair $(a_0, b_0)$, apart from $(0,0)$, such that the system in the above example is solvable when $(a,b)$ is sufficiently close to $(a_0, b_0)$. Justify your claim.

Example 3.26. Another typical use of the inverse function theorem (especially in differential geometry) is to prove that a local inverse is $C^k$ or smooth without actually finding the inverse explicitly. For instance, consider the spherical coordinate map $F(\rho, \theta, \phi) : (0,\infty) \times (0, 2\pi) \times (0, \pi) \to \mathbb{R}^3$ defined by:
\[
F(\rho, \theta, \phi) = (\rho\sin\phi\cos\theta,\ \rho\sin\phi\sin\theta,\ \rho\cos\phi).
\]
Using elementary trigonometry, it is not hard to prove that $F$ is injective on this specific domain. However, it is rather difficult to write down $F^{-1}$ explicitly – at least it takes a different form in each octant, and checking continuity and differentiability of $F^{-1}$ can be tedious. However, one can easily check that
\[
\det DF(\rho, \theta, \phi) = \rho^2\sin\phi \neq 0
\]
for any $(\rho, \theta, \phi)$ in our domain of $F$. The inverse function theorem not only proves that a local inverse exists (which is a redundant fact, since we already know $F^{-1}$ exists by the injectivity of $F$), but it also shows that the inverse $F^{-1}$ is $C^\infty$ (meaning that it is $C^k$ for any $k \in \mathbb{N}$), as $F$ is $C^\infty$.
3.3.4. Lagrange's Multiplier. The inverse function theorem can also be used to rigorously explain why the method of Lagrange's multiplier in MATH 2023 works. Given two $C^1$ functions $f(x,y)$ and $g(x,y)$, we want to maximize and minimize $f$ subject to the constraint $g(x,y) = 0$. In MATH 2023, we learned that we need to solve for $(x, y, \lambda)$ in the system:
\[
\nabla f(x,y) = \lambda\,\nabla g(x,y) \quad \text{or} \quad \nabla g(x,y) = (0,0)
\]
\[
g(x,y) = 0
\]
Then those solutions for $(x,y)$ are the candidate points of this optimization problem. One can plug each of them into $f$ to see which gives the largest/smallest value; these are then the maximum and minimum of $f$ subject to the constraint $g = 0$.

In MATH 2023, you probably learned the rationale behind this method: at the point $(x_0, y_0)$ at which $f$ achieves its maximum or minimum under the constraint $g = 0$, the level curves of $f$ and $g$ passing through $(x_0, y_0)$ are tangent to each other. Since $\nabla f(x_0, y_0)$ and $\nabla g(x_0, y_0)$ are perpendicular to the level sets of $f$ and $g$ respectively⁴, the two gradient vectors must be parallel to each other, which is equivalent to saying that $\nabla f = \lambda\,\nabla g$ (or $\nabla g = 0$).

The above is a rather "intuitive" explanation of the Lagrange's multiplier method. To rigorously justify that $\nabla f$ and $\nabla g$ are parallel at those candidate points, we need to use the inverse function theorem too. We will argue that if $\nabla f$ and $\nabla g$ are linearly independent at $(x_0, y_0)$, then $f$ cannot achieve a maximum or minimum at $(x_0, y_0)$ under the constraint $g(x,y) = 0$. To argue this, we define a map $F(x,y) : \mathbb{R}^2 \to \mathbb{R}^2$ by
\[
F(x,y) = \big(f(x,y), g(x,y)\big).
\]
Note that $F$ maps the level curves $f = c$ and $g = c'$ into coordinate lines in the target $\mathbb{R}^2$. Suppose $(x_0, y_0)$ is a point on the level curve $g = 0$ such that $\nabla f(x_0, y_0)$ and $\nabla g(x_0, y_0)$ are linearly independent. Then
\[
DF(x_0, y_0) = \begin{bmatrix} \frac{\partial f}{\partial x}(x_0, y_0) & \frac{\partial f}{\partial y}(x_0, y_0) \\ \frac{\partial g}{\partial x}(x_0, y_0) & \frac{\partial g}{\partial y}(x_0, y_0) \end{bmatrix}
\]
has linearly independent rows, and so is invertible. The inverse function theorem shows there exist an open set $U$ containing $(x_0, y_0)$ and an open set $V$ containing $F(x_0, y_0) = \big(f(x_0, y_0), 0\big)$ such that $F|_U : U \to V$ is bijective.

Therefore, as $(f(x_0, y_0), 0) \in V$ which is open, the points $(f(x_0, y_0) + t, 0)$ lie in $V$ too when $-\varepsilon < t < \varepsilon$ for some $\varepsilon > 0$. This line segment in $V$ corresponds to a curve $(x(t), y(t))$ in $U$ such that
\[
F(x(t), y(t)) = (f(x_0, y_0) + t, 0) \implies \big(f(x(t), y(t)),\ g(x(t), y(t))\big) = (f(x_0, y_0) + t, 0).
\]
Therefore, the curve $\gamma(t) := (x(t), y(t))$ lies in the constraint set $g = 0$, and when $t \in (0, \varepsilon)$, we have $f(\gamma(t)) = f(x_0, y_0) + t > f(x_0, y_0)$; whereas when $t \in (-\varepsilon, 0)$, we have $f(\gamma(t)) < f(x_0, y_0)$. This shows $f$ achieves neither the maximum nor the minimum at $(x_0, y_0)$, even along the curve $\gamma(t) \subset \{g = 0\}$.

⁴Remark: this can be justified by the implicit function theorem later.

3.3.5. Implicit Function Theorem. Now we address the issue about the method of implicit differentiation discussed in the opening paragraphs of this section. Given a $C^1$ function $f(x,y)$, consider the level set $f(x,y) = 0$. Typically, it is a curve in $\mathbb{R}^2$, and we want to determine the slope of the tangent at a point $(x_0, y_0)$ on this curve, without explicitly solving $y$ in terms of $x$.

However, we need to justify whether $y$ can really be regarded as a $C^1$ function of $x$ – at least locally near $(x_0, y_0)$, as the slope of the tangent only depends on the part of the curve near $(x_0, y_0)$. This can be justified by the inverse function theorem, and the resulting statement is often called the implicit function theorem. We will discuss and prove the two-variable version; then we will state the version in general dimensions and leave the proof as an exercise for readers.

We will prove that if $f(x_0, y_0) = 0$ and $\frac{\partial f}{\partial y}(x_0, y_0) \neq 0$, then the level curve $f = 0$ can locally be regarded as the graph of a $C^1$ function of $x$ near $(x_0, y_0)$. Precisely, there exists a $C^1$ function $g(x) : (x_0 - \varepsilon, x_0 + \varepsilon) \to \mathbb{R}$ such that $g(x_0) = y_0$ and $f(x, g(x)) = 0$ for any $x \in (x_0 - \varepsilon, x_0 + \varepsilon)$.
We define the map $F(x,y) : \mathbb{R}^2 \to \mathbb{R}^2$ by
\[
F(x,y) = \big(x, f(x,y)\big).
\]
This function maps the family of level curves $\{f = c\}_{c \in \mathbb{R}}$ into horizontal lines in the target $\mathbb{R}^2$. By computing its Jacobian, we get:
\[
DF(x_0, y_0) = \begin{bmatrix} 1 & 0 \\ \frac{\partial f}{\partial x}(x_0, y_0) & \frac{\partial f}{\partial y}(x_0, y_0) \end{bmatrix}
\]
whose determinant is given by $\frac{\partial f}{\partial y}(x_0, y_0) \neq 0$. The inverse function theorem asserts that there exist an open set $U$ containing $(x_0, y_0)$ and an open set $V$ containing $F(x_0, y_0) = (x_0, 0)$ such that $F|_U : U \to V$ is bijective, and $F^{-1} : V \to U$ is $C^1$. Then, each point $(x,y) \in U \cap \{f = 0\}$ is determined by its $x$-coordinate. To argue this, suppose there are points $(x, y_1)$ and $(x, y_2)$ in $U \cap \{f = 0\}$. Then $F(x, y_i) = (x, 0)$, as $f(x, y_i) = 0$ for $i = 1, 2$. Since $F$ is injective on $U$, we must have $(x, y_1) = (x, y_2)$. This proves that the level set $\{f = 0\}$ is a graph over $x$ inside $U$.

Next we construct the function $g(x)$ so that $f(x,y) = 0$ is locally the graph $y = g(x)$ near $(x_0, y_0)$. Take $\varepsilon > 0$ small so that $B_\varepsilon(x_0, 0) \subset V$; then for any $x \in (x_0 - \varepsilon, x_0 + \varepsilon)$, the point $(x, 0) \in B_\varepsilon(x_0, 0) \subset V$, so it is in the domain of $F^{-1}$. The point $F^{-1}(x, 0)$ must be of the form $(x, *)$ for some unique value $*$ (uniqueness follows from the above paragraph). We denote this unique value by $g(x)$; then $\forall x \in (x_0 - \varepsilon, x_0 + \varepsilon)$, we have
\[
F^{-1}(x, 0) = (x, g(x)) \implies (x, 0) = \big(x, f(x, g(x))\big) \implies f(x, g(x)) = 0.
\]
Therefore the part of the level set $U \cap \{f = 0\}$ can be written as the graph $\{(x, g(x)) : x \in (x_0 - \varepsilon, x_0 + \varepsilon)\}$. As $F^{-1}$ is $C^1$ and $(x, g(x)) = F^{-1}(x, 0)$, $g$ is $C^1$ as well.
Remark 3.27. If instead we have $\frac{\partial f}{\partial x}(x_0, y_0) \neq 0$, then one can argue that the level set $f = 0$ is locally a graph over $y$ near $(x_0, y_0)$. One may consider the map $F(x,y) = (f(x,y), y)$ instead and repeat almost the same argument.

Exercise 3.21. Prove the three-variable version of the implicit function theorem: given a $C^1$ function $f(x,y,z) : \mathbb{R}^3 \to \mathbb{R}$, suppose $(x_0, y_0, z_0)$ is a point on the level set $\{f = 0\}$ and $\frac{\partial f}{\partial z}(x_0, y_0, z_0) \neq 0$. Prove that the level set $\{f = 0\}$ is locally a graph $z = g(x,y)$ near $(x_0, y_0, z_0)$ for some $C^1$ function $g(x,y)$ defined on a ball $B_\varepsilon(x_0, y_0)$, i.e. $f\big(x, y, g(x,y)\big) = 0$.

From the implicit function theorem, one can prove that $\nabla f(p)$, if it is non-zero, is perpendicular to the level set $\{f = c\}$ at $p$. Consider a $C^1$ function $f : \mathbb{R}^3 \to \mathbb{R}$, and let $p$ be a point on the level set $\{f = c\}$. Suppose $\nabla f(p) \neq 0$; then at least one of $\frac{\partial f}{\partial x}(p)$, $\frac{\partial f}{\partial y}(p)$ and $\frac{\partial f}{\partial z}(p)$ is non-zero. Without loss of generality, assume it is $\frac{\partial f}{\partial z}(p)$. Then by Exercise 3.21 (applied to $f - c$), the level set $f = c$ can locally be regarded as a graph $z = g(x,y)$ near $p$ for some $C^1$ function $g(x,y)$, i.e. $f\big(x, y, g(x,y)\big) = c$ for $(x,y)$ near $(x_0, y_0)$. Here we write $p = (x_0, y_0, z_0)$.

As $f$ and $g$ are $C^1$, differentiating $f\big(x, y, g(x,y)\big) = c$ gives:
\[
0 = \frac{\partial}{\partial x} f\big(x, y, g(x,y)\big) = \frac{\partial f}{\partial x}(p) + \frac{\partial f}{\partial z}(p)\frac{\partial g}{\partial x}(x_0, y_0)
\]
\[
0 = \frac{\partial}{\partial y} f\big(x, y, g(x,y)\big) = \frac{\partial f}{\partial y}(p) + \frac{\partial f}{\partial z}(p)\frac{\partial g}{\partial y}(x_0, y_0)
\]
Rewriting these equations in vector form, we get:
\[
\nabla f(p) \cdot \left(1, 0, \frac{\partial g}{\partial x}(x_0, y_0)\right) = 0
\]
\[
\nabla f(p) \cdot \left(0, 1, \frac{\partial g}{\partial y}(x_0, y_0)\right) = 0.
\]
Note that $\left(1, 0, \frac{\partial g}{\partial x}(x_0, y_0)\right)$ and $\left(0, 1, \frac{\partial g}{\partial y}(x_0, y_0)\right)$ are tangent vectors of the graph $z = g(x,y)$ at $p$, which locally coincides with the level set $f = c$ near $p$. Therefore, $\nabla f(p)$ is perpendicular to two basis tangent vectors of $f = c$ at $p$, or equivalently, it is perpendicular to the level set $f = c$ at $p$.
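For instance, for $f(x,y,z) = x^2 + y^2 + z^2$ and a point $p$ on the sphere $\{f = 1\}$, we get $\nabla f(p) = 2p$, the outward radial vector, which is indeed perpendicular to the sphere at $p$.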
Here we state the general (i.e. higher-dimensional) version of the implicit function theorem for vector-valued functions $F : \mathbb{R}^n \to \mathbb{R}^m$. Its proof is a slight modification of the case $f : \mathbb{R}^3 \to \mathbb{R}$, and we again leave it as an exercise. The higher-dimensional case $F : \mathbb{R}^n \to \mathbb{R}^m$ may not be of use in this course, but it will become a handy tool in differential geometry – for instance, for showing that the intersection of two surfaces is a $C^1$ curve.

Theorem 3.28 (Implicit Function Theorem). Let $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ be a $C^k$ function with $n > m$. Express the points of $\mathbb{R}^n$ as $x = (x_1, \cdots, x_m, x_{m+1}, \cdots, x_n) \in \mathbb{R}^n$, and the components of $F$ by $F(x) = (f^1(x), \cdots, f^m(x))$. Suppose there exists $p = (a_1, \cdots, a_m, a_{m+1}, \cdots, a_n) \in \Omega$ such that $F(p) = 0 \in \mathbb{R}^m$, and
\[
\det\frac{\partial(f^1, \cdots, f^m)}{\partial(x_1, \cdots, x_m)}(p) \neq 0.
\]
Then the level set $\{F = 0\}$ is locally a graph over $(x_{m+1}, \cdots, x_n)$ near $p$. Precisely, there exists a $C^k$ function $g : U \subset \mathbb{R}^{n-m} \to \mathbb{R}^m$, defined on an open set $U$ in $\mathbb{R}^{n-m}$ containing $(a_{m+1}, \cdots, a_n)$, such that
\[
F\big(g(x_{m+1}, \cdots, x_n), x_{m+1}, \cdots, x_n\big) = 0
\]
for any $(x_{m+1}, \cdots, x_n) \in U$.

Exercise 3.22. Prove Theorem 3.28.
