Differentiability
the $n$-th derivative $f^{(n)}(x_0)$ exists would imply (by repeatedly applying L'Hôpital's rule) that $f(x)$ can be approximated by an $n$-th degree polynomial $T_n(x)$, i.e. $f(x) = f(x_0) + T_n(x) + o(|x-x_0|^n)$ as $x \to x_0$, then we must have:
$$T_n(x) = \frac{f'(x_0)}{1!}(x-x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n$$
and we call $T_n(x)$ the $n$-th degree Taylor polynomial for $f(x)$ at $x_0$.
However, the converse is not true – there are functions that can be approximated by an $n$-th degree polynomial near $x_0$, where $n \ge 2$, but $f^{(n)}(x_0)$ does not exist. The functions are of the form $x^n \chi_{\mathbb{Q}}(x)$, where
$$\chi_{\mathbb{Q}}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{otherwise.} \end{cases}$$
Exercise 3.1. Find the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ where $f(x,y)$ equals:
$$x^{x^x},\quad x^{x^y},\quad x^{y^x},\quad y^{x^x},\quad x^{y^y},\quad y^{x^y},\quad y^{y^x},\quad y^{y^y}.$$
To find $\frac{\partial f}{\partial x}$ at $(1,1)$, we may simply ignore the value of $f$ at $(0,0)$, since derivatives of $f$ at $(1,1)$ depend only on its values in a small open ball $B_\varepsilon(1,1)$ around $(1,1)$. We may simply compute:
$$\frac{\partial f}{\partial x}(1,1) = \left.\frac{\partial}{\partial x}\frac{x^2 y}{x^4+y^2}\right|_{(1,1)} = \left.\frac{(x^4+y^2)\cdot 2xy - x^2 y(4x^3)}{(x^4+y^2)^2}\right|_{(1,1)} = 0.$$
Similarly for $\frac{\partial f}{\partial y}$ at $(1,1)$.
Let's demonstrate how to compute the directional derivative from the definition. Suppose $u = \left(\frac12, \frac{\sqrt{3}}{2}\right)$; then
$$D_u f(1,1) = \lim_{h\to 0}\frac{f\big((1,1)+h(1/2,\sqrt{3}/2)\big) - f(1,1)}{h} = \lim_{h\to 0}\frac{\dfrac{(1+h/2)^2(1+\sqrt{3}h/2)}{(1+h/2)^4+(1+\sqrt{3}h/2)^2} - \dfrac12}{h}.$$
We leave it as an exercise for the reader's MATH 1013 friends to calculate the above limit.
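A quick numerical sanity check (not part of the original text): since both partial derivatives of $f(x,y) = x^2y/(x^4+y^2)$ vanish at $(1,1)$ and $f$ is differentiable there, the limit above should be $0$, and the difference quotient should shrink with $h$.

```python
import math

# f(x, y) = x^2 y / (x^4 + y^2) away from the origin
def f(x, y):
    return x**2 * y / (x**4 + y**2)

u = (0.5, math.sqrt(3) / 2)   # the unit vector u = (1/2, sqrt(3)/2)

def quotient(h):
    return (f(1 + h * u[0], 1 + h * u[1]) - f(1, 1)) / h

# The difference quotient shrinks as h -> 0.
vals = [quotient(10.0 ** (-k)) for k in range(3, 7)]
print(vals)
```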
However, when we handle the partial/directional derivatives at $(0,0)$, we then need to be more careful, as $f((0,0)+hu)$ and $f(0,0)$ are defined differently when $h \neq 0$. When letting $h \to 0$, we regard $h$ as non-zero yet very close to $0$, so
$$f((0,0)+hu) = f(hu_1, hu_2) = \frac{(hu_1)^2(hu_2)}{(hu_1)^4+(hu_2)^2} = \frac{h u_1^2 u_2}{h^2 u_1^4 + u_2^2}.$$
Therefore, we have
$$D_u f(0,0) = \lim_{h\to 0}\frac{f(0+hu)-f(0)}{h} = \lim_{h\to 0}\frac{\frac{h u_1^2 u_2}{h^2 u_1^4 + u_2^2} - 0}{h} = \frac{u_1^2}{u_2}\quad\text{when } u_2 \neq 0.$$
When $u_2 = 0$, we then have $f((0,0)+h(u_1,u_2)) = f(hu_1, 0) = 0$ for any $h$. Therefore,
$$D_u f(0,0) = \lim_{h\to 0}\frac{f(hu_1,0)-f(0,0)}{h} = \lim_{h\to 0}\frac{0-0}{h} = 0.$$
In other words, it means $\frac{\partial f}{\partial x}(0,0) = 0$.
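The formula $D_u f(0,0) = u_1^2/u_2$ can be sanity-checked numerically; this is an illustration, not a proof.

```python
import math

# f(x, y) = x^2 y / (x^4 + y^2) for (x,y) != (0,0), and f(0,0) = 0
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x**2 * y / (x**4 + y**2)

u1, u2 = 0.5, math.sqrt(3) / 2
h = 1e-6
quotient = (f(h * u1, h * u2) - f(0.0, 0.0)) / h
expected = u1**2 / u2             # = 1/(2 sqrt(3))
print(quotient, expected)
```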
In the above example, we see that $f$ has directional derivatives (including partial derivatives) along every unit direction $u$ at the point $(0,0)$. However, it can be shown that $f$ is not even continuous at $(0,0)$. Along the parabola $y = x^2$, which can be parametrized by $\gamma(t) = (t, t^2)$, we have
$$\lim_{t\to 0} f(\gamma(t)) = \lim_{t\to 0}\frac{t^4}{t^4+t^4} = \frac12 \neq 0 = f(0,0).$$
Therefore, existence of directional derivatives in every unit direction does not imply that the function is continuous!
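The failure of continuity is easy to see numerically: along the parabola $y = x^2$, the function is identically $1/2$, while along the $x$-axis it is identically $0$.

```python
# f(x, y) = x^2 y / (x^4 + y^2) off the origin, f(0,0) = 0
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x**2 * y / (x**4 + y**2)

along_parabola = [f(t, t**2) for t in (0.1, 0.01, 0.001)]   # all equal 1/2
along_axis = [f(t, 0.0) for t in (0.1, 0.01, 0.001)]        # all equal 0
print(along_parabola, along_axis)
```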
Exercise 3.2. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x,y) = \begin{cases} \dfrac{x^2 y}{x^2+y^2} & \text{if } x+y \neq 0 \\ 0 & \text{if } x+y = 0. \end{cases}$$
Find the directional derivative of f along any unit direction at the points (0, 0), (1, 0),
and (1, 1); or show that it does not exist in some directions.
Remark 3.4. Recall that in MATH 2023 we defined the gradient vector of $f$ as:
$$\nabla f(a) = \left(\frac{\partial f}{\partial x_1}(a), \cdots, \frac{\partial f}{\partial x_n}(a)\right).$$
Using this notation, one can express the linear approximation of $f$ by:
$$f(a) + \nabla f(a)\cdot(x-a)$$
where $\cdot$ is the standard dot product in $\mathbb{R}^n$.
Example 3.5. Let $f(x,y) = |xy|$. We will prove that $f$ is differentiable at $(0,0)$. First, by observing that $f(x,0) = 0$ and $f(0,y) = 0$ for any $x, y \in \mathbb{R}$, we conclude that
$$\frac{\partial f}{\partial x}(0,0) = \frac{\partial f}{\partial y}(0,0) = 0.$$
Next we work out the limit in Condition (2):
$$\lim_{x\to a}\frac{f(x) - f(a) - \sum_{i=1}^{2}\frac{\partial f}{\partial x_i}(a)(x_i-a_i)}{|x-a|} = \lim_{(x,y)\to(0,0)}\frac{|xy| - 0 - 0(x-0) - 0(y-0)}{|(x,y)-(0,0)|} = \lim_{(x,y)\to(0,0)}\frac{|xy|}{\sqrt{x^2+y^2}}.$$
Noting that
$$\frac{|xy|}{\sqrt{x^2+y^2}} = \frac{|x|}{\sqrt{x^2+y^2}}\cdot|y| \le |y| \to 0 \quad\text{as } (x,y)\to(0,0),$$
we conclude that Condition (2) holds, and hence $f$ is differentiable at $(0,0)$.
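The squeeze estimate above can be watched numerically along the worst-case direction $x = y$, where the error quotient equals $|t|/\sqrt{2}$ and visibly tends to $0$.

```python
import math

# Error quotient |xy| / sqrt(x^2 + y^2) from Condition (2) for f(x,y) = |xy|
def quotient(x, y):
    return abs(x * y) / math.hypot(x, y)

vals = [quotient(t, t) for t in (0.1, 0.01, 0.001)]   # equals t / sqrt(2)
print(vals)
```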
Example 3.6. Let's revisit the function $f : \mathbb{R}^2 \to \mathbb{R}$ we defined earlier:
$$f(x,y) = \begin{cases} \dfrac{x^2 y}{x^4+y^2} & \text{if } (x,y) \neq (0,0) \\ 0 & \text{if } (x,y) = (0,0). \end{cases}$$
We have already proved that $\frac{\partial f}{\partial x}(0,0) = \frac{\partial f}{\partial y}(0,0) = 0$. Let's check if Condition (2) holds:
$$\lim_{(x,y)\to(0,0)}\frac{f(x,y) - f(0,0) - \frac{\partial f}{\partial x}(0,0)(x-0) - \frac{\partial f}{\partial y}(0,0)(y-0)}{\sqrt{x^2+y^2}} = \lim_{(x,y)\to(0,0)}\frac{\frac{x^2 y}{x^4+y^2}}{\sqrt{x^2+y^2}}.$$
Along the parabolic path $(x,y) = (t,t^2)$, we have
$$\frac{\frac{x^2 y}{x^4+y^2}}{\sqrt{x^2+y^2}} = \frac{\frac{t^2\cdot t^2}{t^4+t^4}}{\sqrt{t^2+t^4}} = \frac{1}{2|t|\sqrt{1+t^2}} \to +\infty \quad\text{as } t \to 0.$$
Hence the limit does not exist, and so $f$ is not differentiable at $(0,0)$.
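The blow-up along the parabola is easy to observe numerically; the quotient grows roughly like $1/(2|t|)$.

```python
import math

# Condition (2) quotient for f(x,y) = x^2 y / (x^4 + y^2) along (t, t^2)
def f(x, y):
    return x**2 * y / (x**4 + y**2)

def quotient(x, y):
    return f(x, y) / math.hypot(x, y)

vals = [quotient(t, t**2) for t in (0.1, 0.01, 0.001)]
print(vals)   # grows roughly like 1/(2|t|)
```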
Exercise 3.4. Denote the unit circle in $\mathbb{R}^2$ by $\partial B_1(0) := \{x \in \mathbb{R}^2 : |x| = 1\}$. Let $g : \partial B_1(0) \to \mathbb{R}$ be a continuous function on $\partial B_1(0)$ such that $g(0,1) = g(1,0) = 0$, and $g(-x) = -g(x)$ for any $x \in \partial B_1(0)$. Define $f : \mathbb{R}^2 \to \mathbb{R}$ by
$$f(x) = \begin{cases} |x|\cdot g\left(\dfrac{x}{|x|}\right) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases}$$
(a) Determine whether (or under what condition) the directional derivative $D_u f(0)$ exists, where $u$ is a unit vector in $\mathbb{R}^2$.
(b) Determine whether (or under what condition) $f$ is differentiable at $0$.
Exercise 3.5. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a function satisfying $|f(x)| \le |x|^2$ for any $x \in \mathbb{R}^n$. Show that $f$ is differentiable at $0$.
is called the Jacobian matrix of $F$ at $(a,b)$, which is also commonly denoted by $DF(a,b)$.
In higher dimensions, we can rewrite the definition of differentiability for a function $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^m$ as follows: $F$ is differentiable at $a \in \Omega$ if
$$F(x) = F(a) + DF(a)\cdot(x-a) + h(x)$$
where $h(x) \in o(|x-a|)$ as $x \to a$. Here $F = (f^1, \cdots, f^m)$, and $DF(a)$ is the Jacobian matrix of $F$ at $a$ whose $(i,j)$-th component is $\frac{\partial f^i}{\partial x_j}(a)$. The vector $x-a$ is regarded as a column vector, and $\cdot$ means the matrix-vector product. We say a vector-valued function $h(x)$ is of $o(|x-a|)$ if each of its components is of $o(|x-a|)$. The (affine) linear map $x \mapsto F(a) + DF(a)\cdot(x-a)$ is called the linear approximation of $F$ at $a$.
3.1.4. $C^1$ functions. Although existence of partial derivatives does not imply differentiability of multivariable functions, if we impose a further condition on the partial derivatives, namely continuity, then it does imply differentiability.
Example 3.8. The function $f(x,y) = |xy|$ is not $C^1$ at $(0,0)$, because $\frac{\partial f}{\partial x}$ does not exist at any $(0,b)$ where $b \neq 0$:
$$\frac{\partial f}{\partial x}(0,b) = \lim_{t\to 0}\frac{f(0+t,b)-f(0,b)}{t-0} = \lim_{t\to 0}\frac{|t|}{t}\cdot|b|,$$
which does not exist when $b \neq 0$. Any open ball $B_\varepsilon(0,0)$ centered at $(0,0)$ must include some points $(0,b)$, $b \neq 0$, so condition (1) fails.
3.1. Linear Approximation 73
depend only on its values on this ball, and we can simply take partial derivatives as usual. (Recall $f$ is the function of Exercise 3.2: $f(x,y) = \frac{x^2 y}{x^2+y^2}$ off the line $\Delta := \{x+y=0\}$, and $f = 0$ on $\Delta$.) We compute:
$$\frac{\partial f}{\partial x} = \frac{(x^2+y^2)(2xy) - x^2 y(2x)}{(x^2+y^2)^2} = \frac{2xy^3}{(x^2+y^2)^2}$$
$$\frac{\partial f}{\partial y} = \frac{(x^2+y^2)(x^2) - x^2 y(2y)}{(x^2+y^2)^2} = \frac{x^2(x^2-y^2)}{(x^2+y^2)^2}$$
As the partial derivatives are rational functions whose denominator $x^2+y^2$ vanishes only at the origin (which lies in $\Delta$), they are continuous on $\mathbb{R}^2 \setminus \Delta$. Therefore, $f$ is $C^1$ on $\mathbb{R}^2 \setminus \Delta$.
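The two quotient-rule formulas above can be sanity-checked by central finite differences at a point off the line $x+y=0$, here $(1,2)$ (our choice of test point).

```python
# f(x, y) = x^2 y / (x^2 + y^2) away from the line x + y = 0
def f(x, y):
    return x**2 * y / (x**2 + y**2)

def fx_formula(x, y):
    return 2 * x * y**3 / (x**2 + y**2)**2

def fy_formula(x, y):
    return x**2 * (x**2 - y**2) / (x**2 + y**2)**2

x0, y0, h = 1.0, 2.0, 1e-6
fx_num = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
fy_num = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
print(fx_num, fx_formula(x0, y0))   # both ~ 16/25 = 0.64
print(fy_num, fy_formula(x0, y0))   # both ~ -3/25 = -0.12
```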
Now we discuss the case when $(a,b) \in \Delta$, i.e. $b = -a$. As $f$ is defined differently at $(a,b)$ and at $(x,y) \notin \Delta$, we need to find its partial derivatives from the definition:
$$\frac{\partial f}{\partial x}(a,b) = \lim_{t\to 0}\frac{f(a+t,b)-f(a,b)}{t-0} = \lim_{t\to 0}\frac{f(a+t,-a)-f(a,-a)}{t-0} = \lim_{t\to 0}\frac{\frac{(a+t)^2(-a)}{(a+t)^2+a^2} - 0}{t} = \begin{cases} \text{does not exist} & \text{if } a \neq 0 \\ 0 & \text{if } a = 0 \end{cases}$$
$$\frac{\partial f}{\partial y}(a,b) = \lim_{t\to 0}\frac{f(a,-a+t)-f(a,-a)}{t-0} = \lim_{t\to 0}\frac{\frac{a^2(-a+t)}{a^2+(-a+t)^2} - 0}{t} = \begin{cases} \text{does not exist} & \text{if } a \neq 0 \\ 0 & \text{if } a = 0 \end{cases}$$
Therefore, the partial derivatives exist at $(a,-a)$ only when $a = 0$. In particular, $f$ is not $C^1$ at any $(a,-a)$ with $a \neq 0$.
We are left to determine whether $f$ is $C^1$ at $(0,0)$, i.e. whether $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are continuous at $(0,0)$. It can be easily seen that along the two paths $(x,y) = (t,t)$ and $(x,y) = (t,0)$, the limits of $\frac{\partial f}{\partial y}$ are different as $t \to 0$:
$$\left.\frac{\partial f}{\partial y}\right|_{(x,y)=(t,t)} = \lim_{t\to 0}\frac{t^2(t^2-t^2)}{(t^2+t^2)^2} = 0$$
$$\left.\frac{\partial f}{\partial y}\right|_{(x,y)=(t,0)} = \lim_{t\to 0}\frac{t^2(t^2-0)}{(t^2+0)^2} = 1.$$
Therefore, $\frac{\partial f}{\partial y}$ is not continuous at $(0,0)$, and so $f$ is not $C^1$ there. Alternatively, we can argue that any open ball $B_\varepsilon(0,0)$ must contain some points $(a,-a)$, where $a \neq 0$. Both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ do not exist at those points, so $f$ is not $C^1$ at $(0,0)$.
The reason why we discuss $C^1$ functions is that they are also differentiable (but not vice versa).
Proof. Because both differentiability and continuity can be verified componentwise, it suffices to consider only the case of scalar functions $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$. For simplicity, we give the proof when $n = 2$. The higher dimensional case is similar – just that we have more variables and terms to write down – and we leave it as an exercise for readers.
Consider $f(x,y) : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ and $(a,b) \in \Omega$. If $f$ is $C^1$ at $(a,b)$, then $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist on some open ball $B_\varepsilon(a,b)$, and they are continuous at $(a,b)$. The key idea of the proof is to make use of the mean value theorem in single-variable calculus. First we rewrite:
$$f(x,y) - f(a,b) = \big(f(x,y)-f(x,b)\big) + \big(f(x,b)-f(a,b)\big) = \left.\frac{\partial f}{\partial y}\right|_{(x,u)}(y-b) + \left.\frac{\partial f}{\partial x}\right|_{(v,b)}(x-a)$$
where $u$ is between $y$ and $b$ (and it could depend on $x$ too), and $v$ is between $x$ and $a$. As $f$ is $C^1$ at $(a,b)$, both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are continuous at $(a,b)$. As $(x,y) \to (a,b)$, by the squeeze theorem we have $(x,u) \to (a,b)$ and $(v,b) \to (a,b)$ too. Therefore, by continuity we get
$$\lim_{(x,y)\to(a,b)}\left.\frac{\partial f}{\partial x}\right|_{(v,b)} = \left.\frac{\partial f}{\partial x}\right|_{(a,b)}\quad\text{and}\quad\lim_{(x,y)\to(a,b)}\left.\frac{\partial f}{\partial y}\right|_{(x,u)} = \left.\frac{\partial f}{\partial y}\right|_{(a,b)}.$$
Since $o(x-a) \in o\big(|(x,y)-(a,b)|\big)$ and $o(y-b) \in o\big(|(x,y)-(a,b)|\big)$, we conclude that
$$f(x,y) = f(a,b) + \left.\frac{\partial f}{\partial x}\right|_{(a,b)}(x-a) + \left.\frac{\partial f}{\partial y}\right|_{(a,b)}(y-b) + o\big(|(x,y)-(a,b)|\big)$$
as desired. $\square$
Exercise 3.7. Write down the proof of Proposition 3.10 for any $n \in \mathbb{N}$.
(b) Denote the $(i,j)$-th entry of $A$ by $a_{ij}$. Show that $|a_{ij}| \le \|A\|$ for any $i, j$.
(c) Show that there exists a constant $C_{m,n} > 0$, depending only on $m$ and $n$, such that $\|A\| \le C_{m,n}\sqrt{\sum_{i,j} a_{ij}^2}$.
Remark 3.11. Note that the function $f(x,y) = |xy|$ is differentiable at $(0,0)$, but it is not $C^1$ at $(0,0)$. Therefore, the converse of Proposition 3.10 does not hold.
Proof. The key idea is to write $F$ and $G$ in linear approximation form, i.e. there exist functions $h : B_\eta(a) \subset \mathbb{R}^n \to \mathbb{R}^m$ and $k : B_\rho(F(a)) \subset \mathbb{R}^m \to \mathbb{R}^k$ defined near $a$ and $F(a)$ respectively, such that:
$$F(x) = F(a) + DF(a)\cdot(x-a) + h(x)$$
$$G(y) = G(F(a)) + DG(F(a))\cdot\big(y - F(a)\big) + k(y)$$
where $h(x) \in o(|x-a|)$ as $x \to a$, and $k(y) \in o(|y-F(a)|)$ as $y \to F(a)$. Since $F(x) \to F(a)$ as $x \to a$, by substituting $y = F(x)$, we have that when $x$ is sufficiently close to $a$,
$$G(F(x)) = G(F(a)) + DG(F(a))\cdot\big(DF(a)\cdot(x-a) + h(x)\big) + k(F(x)) = G(F(a)) + DG(F(a))\cdot DF(a)\cdot(x-a) + DG(F(a))\cdot h(x) + k(F(x)).$$
Our remaining task is to prove that $DG(F(a))\cdot h(x) + k(F(x)) \in o(|x-a|)$ as $x \to a$. Since $DG(F(a))$ is a constant matrix, we have $DG(F(a))\cdot h(x) \in o(|x-a|)$. For the term $k(F(x))$: since $k(y) \in o(|y-F(a)|)$, for any $\varepsilon' > 0$ there exists $\delta > 0$ such that if $|y - F(a)| < \delta$, we have
$$|k(y)| \le \varepsilon'\,|y - F(a)|.$$
Here $\varepsilon' > 0$ is to be chosen. Furthermore, since $F(x) \to F(a)$ as $x \to a$, there exists $\theta > 0$ sufficiently small such that if $|x-a| < \theta$, we have $|F(x) - F(a)| < \delta$, and so from the above we have
$$|k(F(x))| \le \varepsilon'\,|F(x)-F(a)| = \varepsilon'\,|DF(a)\cdot(x-a) + h(x)| \le \varepsilon'\big(\|DF(a)\| + 1\big)|x-a|$$
when $x$ is close enough to $a$ (using $h(x) \in o(|x-a|)$). Choosing $\varepsilon' = \varepsilon/(\|DF(a)\|+1)$, we have proven that for any $\varepsilon > 0$, there exists $\theta'' > 0$ such that if $|x-a| < \theta''$, then $|k(F(x))| \le \varepsilon|x-a|$. In other words, we conclude that $k(F(x)) \in o(|x-a|)$ as $x \to a$, completing the proof. $\square$
In terms of partial derivatives, the chain rule, say in two dimensions, can be written in the following form. Let $F(x,y) = (u(x,y), v(x,y))$ and $G(u,v) = (w(u,v), z(u,v))$; then the chain rule for $(w,z) = G \circ F(x,y)$ is
$$\begin{bmatrix} \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} \\ \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \end{bmatrix} = \begin{bmatrix} \frac{\partial w}{\partial u} & \frac{\partial w}{\partial v} \\ \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} \end{bmatrix} \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} \end{bmatrix}.$$
In particular, we have
$$\frac{\partial w}{\partial x} = \frac{\partial w}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial w}{\partial v}\frac{\partial v}{\partial x}$$
and similarly for other combinations, and also in higher dimensions. It is exactly what we have learned in MATH 2023.
Exercise 3.10. Show that the directional derivative $D_u f(a)$ of a differentiable function $f : \Omega \subset \mathbb{R}^n \to \mathbb{R}$ can be written as the single-variable derivative:
$$D_u f(a) = \left.\frac{d}{dt}\right|_{t=0} f(a+tu).$$
Hence, prove using the chain rule that
$$D_u f(a) = \nabla f(a)\cdot u.$$
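The identity in this exercise can be illustrated numerically with a smooth function of our own choosing, $f(x,y) = x^2 y$: the difference quotient of $t \mapsto f(a+tu)$ at $t = 0$ matches $\nabla f(a)\cdot u$.

```python
# Illustration of D_u f(a) = grad f(a) . u for the smooth f(x,y) = x^2 y
def f(x, y):
    return x**2 * y

def grad_f(x, y):
    return (2 * x * y, x**2)

a = (1.5, -0.5)
u = (3 / 5, 4 / 5)                 # a unit vector
h = 1e-6
quotient = (f(a[0] + h * u[0], a[1] + h * u[1]) - f(*a)) / h
g = grad_f(*a)
dot = g[0] * u[0] + g[1] * u[1]    # = -0.9 + 1.8 = 0.9
print(quotient, dot)
```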
Exercise 3.11. Using the chain rule, generalize the result of Exercise 3.8 so that the domain of $F$ can be any convex open set $\Omega \subset \mathbb{R}^n$, not just open balls. Hint: for any $x, y \in \Omega$, construct a straight path joining $x$ and $y$.
3.2. Higher-Order Differentiability 77
exists, then we say $f$ is twice differentiable, and call $f''$ the second-order derivative of $f$. Inductively, we say $f$ is $n$-times differentiable if $f$ is $(n-1)$-times differentiable and its $(n-1)$-th order derivative $f^{(n-1)}$ is differentiable, and we denote $f^{(n)} = \frac{d}{dx}f^{(n-1)}$. In this case we call $f^{(n)}$ the $n$-th order derivative of $f$.
In MATH 1023, we saw that if $f^{(n)}$ exists at $x = a$, then $f$ can be approximated by an $n$-th degree polynomial $T_n(x)$ in the sense that
$$f(x) = T_n(x) + o\big((x-a)^n\big) \quad\text{as } x \to a.$$
Note that the converse is not true (see footnote 2).
Now for multivariable calculus, we could also take second-order partial derivatives by differentiating the first partial derivatives. For instance, for a function $f(x,y) : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$, we can define
$$\frac{\partial^2 f}{\partial x^2} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) \qquad \frac{\partial^2 f}{\partial x\,\partial y} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)$$
$$\frac{\partial^2 f}{\partial y\,\partial x} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) \qquad \frac{\partial^2 f}{\partial y^2} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right)$$
whenever they exist. Alternatively, we may denote $\frac{\partial^2 f}{\partial x^2}$ by $f_{xx}$, and $\frac{\partial^2 f}{\partial x\,\partial y}$ by $f_{yx}$, etc. Although $f_{yx}$ and $f_{xy}$ are defined in different ways, we will later prove that they are the same if $f_{xy}$ and $f_{yx}$ are continuous. However, let's regard them as different for now until that result is proved.
Generally speaking, we denote:
$$\frac{\partial^k f}{\partial x_{i_k}\,\partial x_{i_{k-1}}\cdots\partial x_{i_1}} := \frac{\partial}{\partial x_{i_k}}\frac{\partial}{\partial x_{i_{k-1}}}\cdots\frac{\partial}{\partial x_{i_2}}\frac{\partial f}{\partial x_{i_1}}$$
and call it a $k$-th order partial derivative. It can also be denoted by $f_{x_{i_1}x_{i_2}\cdots x_{i_k}}$.
Exercise 3.12. For a function $f : \mathbb{R}^n \to \mathbb{R}$, how many different $k$-th order partial derivatives are there?
²In this course, we define "$f$ is $n$-times/order differentiable" to mean the $n$-th order derivative $f^{(n)}$ exists. Some authors define it to mean $f$ can be approximated by an $n$-th degree polynomial. Note that the two are not equivalent.
Remark 3.14. In order for the $k$-th order partial derivatives to exist at $a$, it is necessary that all $(k-1)$-th order partial derivatives exist on an open ball $B_\varepsilon(a)$ centered at $a$. This condition is implicitly assumed whenever we say $k$-th order partial derivatives exist at $a$.
Recall that for first-order derivatives we have a stronger condition called $C^1$ (which also requires that the first partial derivatives be continuous). In higher order, we also have a stronger notion called $C^k$:
Example 3.17. Again, the converse to the above corollary is not true. Here is a counterexample:
$$f(x,y) = \left(\int_0^x |s|\,ds\right)\left(\int_0^y |t|\,dt\right).$$
We will show that $f$ is twice differentiable at $(0,0)$, but is not $C^2$ at $(0,0)$.
By the Fundamental Theorem of Calculus (note that the absolute-value function is continuous), we have
$$\frac{\partial f}{\partial x} = |x|\int_0^y |t|\,dt \qquad \frac{\partial f}{\partial y} = |y|\int_0^x |s|\,ds.$$
As $x \mapsto |x|$ is not differentiable at $0$, the second partial derivative $\frac{\partial^2 f}{\partial x^2}$ does not exist at any $(0,b)$ where $b \neq 0$. This already shows $f$ is not $C^2$ at $(0,0)$, as any small open ball around $(0,0)$ contains such points.
However, one can still prove that $f$ is twice differentiable at $(0,0)$. We first compute the second partial derivatives:
$$\frac{\partial^2 f}{\partial x^2}(0,0) = \lim_{h\to 0}\frac{\frac{\partial f}{\partial x}(h,0) - \frac{\partial f}{\partial x}(0,0)}{h} = \lim_{h\to 0}\frac{0-0}{h} = 0$$
$$\frac{\partial^2 f}{\partial y\,\partial x}(0,0) = \lim_{h\to 0}\frac{\frac{\partial f}{\partial x}(0,h) - \frac{\partial f}{\partial x}(0,0)}{h} = \lim_{h\to 0}\frac{0-0}{h} = 0.$$
Similarly, one can also show $\frac{\partial^2 f}{\partial y^2}(0,0) = \frac{\partial^2 f}{\partial x\,\partial y}(0,0) = 0$. Now we are left to check that:
$$\lim_{(x,y)\to(0,0)}\frac{\frac{\partial f}{\partial x}(x,y) - \frac{\partial f}{\partial x}(0,0) - \frac{\partial^2 f}{\partial x^2}(0,0)\cdot(x-0) - \frac{\partial^2 f}{\partial y\,\partial x}(0,0)\cdot(y-0)}{\sqrt{x^2+y^2}} = \lim_{(x,y)\to(0,0)}\frac{|x|\int_0^y |t|\,dt}{\sqrt{x^2+y^2}} = 0$$
by the facts that $\frac{|x|}{\sqrt{x^2+y^2}} \le 1$ and $\lim_{(x,y)\to(0,0)}\int_0^y |t|\,dt = 0$. Hence, we conclude $\frac{\partial f}{\partial x}$ is differentiable at $(0,0)$; similarly, $\frac{\partial f}{\partial y}$ is differentiable at $(0,0)$, and so $f$ is twice differentiable there.
Exercise 3.14. Determine whether or not the function in Exercise 3.13 is twice differentiable.
The $C^k$ condition of a multivariable function is easier to verify, as it only involves partial derivatives. Therefore, many theorems in analysis in higher dimensions assume the function is $C^k$ instead of $k$-times differentiable (even though the former is more restrictive).
One benefit of being $C^k$ is that the partial derivatives up to $k$-th order commute, e.g. $f_{xyx} = f_{yxx} = f_{xxy}$ for a $C^3$ function $f$.
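The commutativity can be watched numerically: the mixed difference quotient $\Delta(h,k)/(hk)$ used in the proof below approximates both $f_{xy}$ and $f_{yx}$. We try it on the smooth function $f(x,y) = x^3 y^2$ (our choice), where $f_{xy} = f_{yx} = 6x^2 y$ exactly.

```python
# Mixed difference quotient Delta(h,k)/(hk) for f(x,y) = x^3 y^2
def f(x, y):
    return x**3 * y**2

def delta(a, b, h, k):
    return f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b)

x0, y0, h = 1.2, 0.7, 1e-4
approx = delta(x0, y0, h, h) / (h * h)
exact = 6 * x0**2 * y0      # = 6.048
print(approx, exact)
```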
Proof. It suffices to prove the case $n = 2$, as the theorem only involves two directions $x_i$ and $x_j$. Also, for simplicity we denote $x_i$ by $x$, $x_j$ by $y$, and $a = (a,b)$. We need to prove that
$$\frac{\partial^2 f}{\partial x\,\partial y}(a,b) = \frac{\partial^2 f}{\partial y\,\partial x}(a,b).$$
Denote $\Delta(h,k) := f(a+h,b+k) - f(a+h,b) - f(a,b+k) + f(a,b)$. We first apply the mean value theorem on $\Delta$ in two different ways. Define:
$$g_1(x) := f(x, b+k) - f(x, b)$$
$$g_2(y) := f(a+h, y) - f(a, y)$$
We apply the mean value theorem by grouping the terms as:
$$\Delta(h,k) = \big(f(a+h,b+k) - f(a+h,b)\big) - \big(f(a,b+k) - f(a,b)\big) = g_1(a+h) - g_1(a) = g_1'(c_1)\cdot h = \big(f_x(c_1, b+k) - f_x(c_1, b)\big)h$$
for some $c_1$ between $a$ and $a+h$.
as desired. $\square$
By induction, one can then conclude that if $f$ is $C^k$ at $a$, then all partial derivatives up to and including $k$-th order commute. For instance, for a $C^4$ function $f(x,y,z)$ we have $f_{xyzx} = f_{xxyz} = f_{zxxy}$, and in general any two partial derivatives that differentiate the same number of times with respect to each variable are equal.
Exercise 3.15. Modify the proof of Clairaut's Theorem, and prove a stronger result: Suppose $f(x,y) : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$ is a function such that $f_x$, $f_y$ and $f_{xy}$ exist on some ball $B_\varepsilon(a)$, and are all continuous at $a$. Then $f_{yx}(a)$ exists and $f_{xy}(a) = f_{yx}(a)$.
Proof. We prove the case $k = 2$ as an example, and leave the general case as an exercise for readers. First consider a small ball $B_\varepsilon(a) \subset \Omega$. For each $x \in B_\varepsilon(a)$, we consider the function $g : [0,1] \to \mathbb{R}$ defined by
$$g(t) := f\big((1-t)a + tx\big),$$
and the "error" function $E : [0,1] \to \mathbb{R}$ defined by
$$E(t) := g(t) - \left(g(0) + g'(0)t + \frac{g''(0)}{2!}t^2\right).$$
Note that both $g$ and $E$ depend on $x$ even though we do not explicitly write them as $g(x,t)$ and $E(x,t)$.
Using the chain rule, we can easily calculate that
$$g'(t) = \sum_{i=1}^n \left.\frac{\partial f}{\partial x_i}\right|_{(1-t)a+tx}(x_i - a_i)$$
$$g''(t) = \sum_{i,j=1}^n \left.\frac{\partial^2 f}{\partial x_j\,\partial x_i}\right|_{(1-t)a+tx}(x_i - a_i)(x_j - a_j).$$
In particular, $E(1) = f(x) - T_2(x)$, which measures the error between $f(x)$ and $T_2(x)$. Therefore, our goal is to prove that $E(1) \in o\big(|x-a|^2\big)$ as $x \to a$.
Applying Cauchy's mean value theorem to the functions $E(t)$ and $t^2$, there exists $c \in (0,1)$ such that
$$\frac{E(1)-E(0)}{1^2-0^2} = \frac{E'(c)}{2c} \implies E(1) = \frac{g'(c) - g'(0) - g''(0)c}{2c}.$$
This shows
$$E(1) = \frac{1}{2c}\sum_{i=1}^n\left(\left.\frac{\partial f}{\partial x_i}\right|_{(1-c)a+cx} - \left.\frac{\partial f}{\partial x_i}\right|_{a}\right)(x_i-a_i) - \frac{1}{2}\sum_{i,j=1}^n\left.\frac{\partial^2 f}{\partial x_j\,\partial x_i}\right|_{a}(x_i-a_i)(x_j-a_j).$$
Next we consider the linear approximation of $\frac{\partial f}{\partial x_i}$ at $a$ (recall $f$ is twice differentiable at $a$, so $\frac{\partial f}{\partial x_i}$ is differentiable at $a$):
$$\left.\frac{\partial f}{\partial x_i}\right|_{(1-c)a+cx} = \left.\frac{\partial f}{\partial x_i}\right|_{a} + \sum_{j=1}^n\left.\frac{\partial^2 f}{\partial x_j\,\partial x_i}\right|_{a}\big((1-c)a_j + cx_j - a_j\big) + \varepsilon\big((1-c)a+cx\big),$$
where $(1-c)a_j + cx_j - a_j = c(x_j - a_j)$. Substituting this into the expression for $E(1)$, the two double-sum terms cancel, leaving
$$E(1) = \frac{1}{2c}\sum_{i=1}^n \varepsilon\big((1-c)a+cx\big)(x_i-a_i).$$
Since
$$\varepsilon\big((1-c)a+cx\big) \in o\big(|(1-c)a+cx-a|\big) = o\big(c|x-a|\big) \subset o\big(c\,|x-a|\big),$$
the factor $c$ cancels the $\frac{1}{2c}$, and each $|x_i - a_i| \le |x-a|$, so we conclude that $E(1) \in o\big(|x-a|^2\big)$, as desired. $\square$
Example 3.20. For a function $f(x,y) : \mathbb{R}^2 \to \mathbb{R}$ which is twice differentiable at $(a,b)$, its Taylor approximation at $(a,b)$ is given by
$$\begin{aligned} f(x,y) &= f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b)\\ &\quad + \frac{1}{2!}\Big(f_{xx}(a,b)(x-a)^2 + f_{xy}(a,b)(x-a)(y-b) + f_{yx}(a,b)(y-b)(x-a) + f_{yy}(a,b)(y-b)^2\Big)\\ &\quad + o\big((x-a)^2+(y-b)^2\big) \end{aligned}$$
as $(x,y) \to (a,b)$. If we assume further that $f$ is $C^2$ at $(a,b)$, then the quadratic term can be simplified as:
$$\frac{1}{2!}\Big(f_{xx}(a,b)(x-a)^2 + 2f_{xy}(a,b)(x-a)(y-b) + f_{yy}(a,b)(y-b)^2\Big).$$
Therefore, whether $f$ achieves a local maximum or minimum at $(a,b)$ depends only on whether the quadratic form
$$Q(x,y) := f_{xx}(a,b)(x-a)^2 + 2f_{xy}(a,b)(x-a)(y-b) + f_{yy}(a,b)(y-b)^2$$
is negative-definite or positive-definite respectively. Using elementary facts about the discriminant, we know that
• if $f_{xx} > 0$ and $f_{xy}^2 - f_{xx}f_{yy} < 0$ at $(a,b)$, then $Q(x,y)$ is positive-definite, and so $f(x,y) \ge f(a,b)$ near $(a,b)$. This implies $f$ achieves a local minimum at $(a,b)$.
• if $f_{xx} < 0$ and $f_{xy}^2 - f_{xx}f_{yy} < 0$ at $(a,b)$, then $Q(x,y)$ is negative-definite, and so $f(x,y) \le f(a,b)$ near $(a,b)$. This implies $f$ achieves a local maximum at $(a,b)$.
• if $f_{xy}^2 - f_{xx}f_{yy} > 0$ at $(a,b)$, then $Q$ can be positive at some points near $(a,b)$ and negative at some other points near $(a,b)$. In such a case, the graph $z = f(x,y)$ behaves like a saddle near $(a,b)$.
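The three bullet cases above can be packaged into a small classifier and checked on our own standard test functions $x^2+y^2$ (min), $-x^2-y^2$ (max) and $x^2-y^2$ (saddle) at their critical point $(0,0)$.

```python
# Second-derivative test via the discriminant fxy^2 - fxx*fyy
def classify(fxx, fxy, fyy):
    disc = fxy**2 - fxx * fyy
    if disc < 0:
        return "local min" if fxx > 0 else "local max"
    if disc > 0:
        return "saddle"
    return "inconclusive"

# f = x^2 + y^2:  fxx = 2,  fxy = 0, fyy = 2
# f = -x^2 - y^2: fxx = -2, fxy = 0, fyy = -2
# f = x^2 - y^2:  fxx = 2,  fxy = 0, fyy = -2
print(classify(2, 0, 2), classify(-2, 0, -2), classify(2, 0, -2))
```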
In higher dimensions, the quadratic approximation of a $C^2$ function $f : \mathbb{R}^n \to \mathbb{R}$ at $a$ can be expressed as:
$$f(a) + \nabla f(a)\cdot(x-a) + \frac{1}{2!}(x-a)^T\cdot\nabla^2 f(a)\cdot(x-a)$$
where $\nabla^2 f(a)$ is the Hessian matrix³, an $n \times n$ matrix whose $(i,j)$-th entry is given by $\frac{\partial^2 f}{\partial x_i\,\partial x_j}(a)$. It is a symmetric matrix by the commutativity of partial derivatives. Therefore, the positivity/negativity of the quadratic term
$$(x-a)^T\cdot\nabla^2 f(a)\cdot(x-a)$$
can be determined by the eigenvalues of $\nabla^2 f(a)$. This is further discussed in the linear algebra course.
³Note about notations: physicists often use $\nabla^2 f$ for the Laplacian of $f$, which is the trace of the Hessian matrix, whereas mathematicians (especially geometers) often use $\nabla^2 f$ for the Hessian matrix.
3.3.2. Banach contraction mapping theorem. The proof of the inverse function theorem requires an important tool in analysis, known as the Banach contraction mapping theorem. It is such an important tool that it is used not only to prove the inverse function theorem, but also other results such as the existence theorem for ODEs.
3.3. Inverse and Implicit Function Theorems 85
Theorem 3.21 (Banach contraction mapping). Let $(X,d)$ be a complete metric space, and $f : X \to X$ be a map for which there exists a constant $\alpha \in (0,1)$ such that
$$d\big(f(x), f(y)\big) \le \alpha\, d(x,y) \quad \forall x, y \in X.$$
Then, there exists a unique $x^* \in X$ such that $f(x^*) = x^*$.
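A concrete instance of the theorem (our own example): $x \mapsto \cos x$ is a contraction on the complete metric space $[0,1]$, since $|\cos x - \cos y| \le \sin(1)\,|x-y|$ there and $\sin(1) < 1$. Iterating from any starting point converges to the unique fixed point (the so-called Dottie number, $\approx 0.739085$).

```python
import math

# Fixed-point iteration of the contraction x -> cos(x) on [0, 1]
x = 0.5
for _ in range(200):
    x = math.cos(x)
print(x, abs(math.cos(x) - x))   # fixed point, residual ~ 0
```

The proof of the theorem is exactly this procedure: the iterates form a Cauchy sequence because each step shrinks distances by the factor $\alpha$.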
3.3.3. Inverse Function Theorem. For simplicity, from now on we will denote a
vector in Rn by simply x, y, a etc. whenever it is clear from the context that they are
vectors. Now we are ready to state the inverse function theorem, and will prove it using
Banach contraction mapping.
Before we proceed to the proof, let's first prove an analogue of the "mean value theorem" for vector-valued functions, which can be helpful for estimating $|F(x) - F(y)|$ for any two points $x, y$. Given an $m \times n$ matrix $A$ whose $(i,j)$-th entry is denoted by $a_{ij}$, one can define the "Euclidean norm" of $A$ by
$$\|A\|_2 := \sqrt{\sum_{1\le i\le m,\ 1\le j\le n} a_{ij}^2}.$$
One can check easily that $\|\cdot\|_2$ is a norm on the vector space $M_{m\times n}(\mathbb{R})$ of $m \times n$ matrices. Similar to $\mathbb{R}^n$, one can also define the sup-norm of the matrix $A$ by:
$$\|A\|_\infty := \max_{1\le i\le m,\ 1\le j\le n} |a_{ij}|.$$
Remark 3.23. By the above exercise result, one can use either norm interchangeably: if we have a sequence of matrices $\{A_j\}_{j=1}^\infty$, then $A_j$ converges to a matrix $B$ with respect to $\|\cdot\|_\infty$ if and only if it does so with respect to $\|\cdot\|_2$. Note that the latter is also equivalent to saying that each entry of $A_j$ converges, as a real-number sequence, to the corresponding entry of $B$.
Proof. Consider the straight path $\gamma(t) = (1-t)x + ty$ which connects $x$ and $y$. The path is contained in $\Omega$ by convexity. Write $G = (g_1, \cdots, g_m)$, $x = (x_1, \cdots, x_n)$, and $y = (y_1, \cdots, y_n)$; then by the single-variable mean value theorem, we have for each $j$:
$$|g_j(x) - g_j(y)| = |g_j(\gamma(0)) - g_j(\gamma(1))| = \left|\frac{d(g_j\circ\gamma)}{dt}(s)\right| |1-0|$$
for some $s \in (0,1)$. The chain rule applied to $g_j \circ \gamma$ shows
$$\frac{d(g_j\circ\gamma)}{dt} = \sum_{i=1}^n \frac{\partial g_j}{\partial x_i}\frac{d\big((1-t)x_i + ty_i\big)}{dt} = \sum_{i=1}^n \frac{\partial g_j}{\partial x_i}\cdot(y_i - x_i).$$
By the Cauchy–Schwarz inequality, we have
$$\left|\frac{d(g_j\circ\gamma)}{dt}(s)\right| \le \sqrt{\sum_{i=1}^n\left(\frac{\partial g_j}{\partial x_i}(\gamma(s))\right)^2}\ \sqrt{\sum_{i=1}^n |y_i-x_i|^2} \le \sqrt{n}\,\|DG(\gamma(s))\|_\infty\,|x-y|,$$
which implies
$$|G(x) - G(y)| = \sqrt{\sum_{j=1}^m |g_j(x) - g_j(y)|^2} \le \sqrt{mn}\,\sup_{z\in\Omega}\|DG(z)\|_\infty\,|x-y|. \qquad\square$$
Proof of Inverse Function Theorem. Given that $DF(a)$ is invertible, we will pick $\varepsilon, \delta$ sufficiently small such that any $y \in B_\varepsilon(F(a))$ has a unique $x \in B_{\delta/2}(a)$ so that $F(x) = y$. That will show that $F$ is injective when restricted to $U := F^{-1}\big(B_\varepsilon(F(a))\big) \cap B_{\delta/2}(a)$.
Step 1: Define a contraction map.
In order to apply the contraction mapping theorem, we consider for each $y \in B_\varepsilon(F(a))$ the following map:
$$T_y(x) := x + DF(a)^{-1}\big(y - F(x)\big).$$
We will show that by choosing $\varepsilon$ and $\delta$ sufficiently small, such a map $T_y$ is a contraction map from $B_{\delta/2}(a)$ to itself. Then by Banach's contraction mapping theorem, such $T_y$ has a unique fixed point $\bar{x}(y) \in B_{\delta/2}(a)$, i.e. $T_y(\bar{x}(y)) = \bar{x}(y)$. One can easily check that this implies $y = F(\bar{x}(y))$, as desired.
To verify that $T_y$ is a contraction, we consider
$$|T_y(x_1) - T_y(x_2)| = \left|x_1 - DF(a)^{-1}F(x_1) - x_2 + DF(a)^{-1}F(x_2)\right| = \left|DF(a)^{-1}\Big(\big(DF(a)x_1 - F(x_1)\big) - \big(DF(a)x_2 - F(x_2)\big)\Big)\right| \le \|DF(a)^{-1}\|_2\,|G(x_1) - G(x_2)|$$
where $G(x) := DF(a)\cdot x - F(x)$. Note that $DG(x) = DF(a) - DF(x)$. Suppose $x_1, x_2 \in B_\delta(a)$; then we have by Lemma 3.24 that
$$|G(x_1) - G(x_2)| \le n \sup_{z\in B_\delta(a)} \|DG(z)\|_\infty\,|x_1 - x_2|.$$
Since $F$ is $C^1$, each entry of $DF(z)$ approaches the corresponding entry of $DF(a)$ as $z \to a$. By Remark 3.23, $\|DF(z) - DF(a)\|_\infty \to 0$ as $z \to a$. Hence, one can choose $\delta > 0$ small so that
$$\|DF(a) - DF(z)\|_\infty \le \frac{1}{2n\,\|DF(a)^{-1}\|_2}$$
whenever $z \in B_\delta(a)$, and consequently we have
$$(3.2)\qquad |T_y(x_1) - T_y(x_2)| \le \frac12 |x_1 - x_2|$$
for any $x_1, x_2 \in B_\delta(a)$. By continuity of the map $T_y(x)$, the above inequality also holds for any $x_1, x_2 \in \overline{B_\delta(a)}$. This proves $T_y : B_\delta(a) \to \mathbb{R}^n$ is a contraction map.
In order to use Banach's contraction mapping theorem, we will show $T_y$ maps $B_{\delta/2}(a)$ to itself. For that we need $y$ to be sufficiently close to $F(a)$: for any $x \in B_{\delta/2}(a)$, we estimate:
$$|T_y(x) - a| \le |T_y(x) - T_y(a)| + |T_y(a) - a| \le \frac12|x-a| + \left|DF(a)^{-1}\big(y - F(a)\big)\right| \le \frac{\delta}{4} + \|DF(a)^{-1}\|_2\,|y - F(a)|.$$
Let $\varepsilon < \dfrac{\delta}{4\|DF(a)^{-1}\|_2}$; then when $y \in B_\varepsilon(F(a))$, we have
$$|T_y(x) - a| < \frac{\delta}{4} + \frac{\delta}{4} = \frac{\delta}{2}.$$
In other words, we have $T_y(x) \in B_{\delta/2}(a)$ whenever $y \in B_\varepsilon(F(a))$ and $x \in B_{\delta/2}(a)$.
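The fixed-point scheme of Step 1 can be run numerically on a toy example of our own: $F(x_1,x_2) = (x_1 + x_2^2,\ x_2 + x_1^2)$ has $DF(0,0) = I$, so $T_y(x) = x + (y - F(x))$, and for $y$ close to $0$ the iteration converges to a solution of $F(x) = y$.

```python
# Iterating T_y(x) = x + (y - F(x)) for F(x1,x2) = (x1 + x2^2, x2 + x1^2)
def F(x):
    return (x[0] + x[1]**2, x[1] + x[0]**2)

y = (0.1, -0.05)            # a target value near F(0,0) = (0,0)
x = (0.0, 0.0)
for _ in range(100):
    Fx = F(x)
    x = (x[0] + (y[0] - Fx[0]), x[1] + (y[1] - Fx[1]))
residual = max(abs(F(x)[0] - y[0]), abs(F(x)[1] - y[1]))
print(x, residual)
```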
Step 2: Show that $F(U)$ is open.
For any $y_0 \in F(U) \subset B_\varepsilon(F(a))$, one can write it as $y_0 = F(x_0)$ for some $x_0 \in U$. As $U$ is open, there exists $\theta > 0$ such that $B_\theta(x_0) \subset U \subset B_{\delta/2}(a)$. Also, as $y_0 \in B_\varepsilon(F(a))$, there exists $\eta > 0$ such that $B_\eta(y_0) \subset B_\varepsilon(F(a))$.
By (3.2), we have for any $x \in B_\theta(x_0)$ and $y \in B_{\eta'}(y_0)$, where $\eta' \in (0,\eta)$ is to be chosen:
$$|T_y(x) - T_y(x_0)| \le \frac12|x - x_0|$$
$$\implies |T_y(x) - x_0| \le |T_y(x) - T_y(x_0)| + |T_y(x_0) - T_{y_0}(x_0)| \le \frac12|x-x_0| + \left|DF(a)^{-1}\big(y - F(x_0)\big)\right| \le \frac12|x-x_0| + \|DF(a)^{-1}\|_2\,|y - y_0| \le \frac{\theta}{2} + \|DF(a)^{-1}\|_2\,\eta',$$
where we used $T_{y_0}(x_0) = x_0$ (since $F(x_0) = y_0$). Therefore, if one chooses $\eta' < \min\left\{\dfrac{\theta}{2\|DF(a)^{-1}\|_2 + 1},\ \eta\right\}$, then we have $T_y(x) \in B_\theta(x_0)$ for any $x \in B_\theta(x_0)$ and $y \in B_{\eta'}(y_0)$. Then $T_y : B_\theta(x_0) \to B_\theta(x_0)$ is a contraction map whenever $y \in B_{\eta'}(y_0)$, and hence it has a fixed point $x^* \in B_\theta(x_0)$, which is a point such that $y = F(x^*)$. Since $x^* \in B_\theta(x_0) \subset U$, this shows $y \in F(U)$ whenever $y \in B_{\eta'}(y_0)$. In other words, $B_{\eta'}(y_0) \subset F(U)$. This concludes that $F(U)$ is open.
Step 3: Prove that $\widetilde{F}^{-1}$ is continuous on $F(U)$.
We will see why this is needed later, although we will eventually prove that it is even differentiable on $F(U)$. For this we use the contraction inequality (3.2) again. For any $y_1, y_2 \in F(U)$, we write $y_i = F(x_i)$ for some $x_i \in U$. Then, by (3.2), we get
$$\frac12|x_1 - x_2| \ge |T_{y_1}(x_1) - T_{y_1}(x_2)| = \left|x_1 - x_2 + DF(a)^{-1}\big(y_2 - y_1\big)\right| \ge |x_1 - x_2| - \|DF(a)^{-1}\|_2\,|y_1 - y_2|.$$
Therefore, we have
$$|x_1 - x_2| \le 2\,\|DF(a)^{-1}\|_2\,|y_1 - y_2|,$$
or in other words,
$$\left|\widetilde{F}^{-1}(y_1) - \widetilde{F}^{-1}(y_2)\right| \le 2\,\|DF(a)^{-1}\|_2\,|y_1 - y_2|.$$
This shows $\widetilde{F}^{-1}$ is continuous on $F(U)$.
Exercise 3.20. Find another pair (a0 , b0 ), apart from (0, 0), such that the system in
the above example is solvable when (a, b) is sufficiently close to (a0 , b0 ). Justify your
claim.
Example 3.26. Another typical use of the inverse function theorem (especially in differential geometry) is to prove that a local inverse is $C^k$ or smooth without actually finding the inverse explicitly. For instance, consider the spherical coordinate map $F(\rho,\theta,\varphi) : (0,\infty)\times(0,2\pi)\times(0,\pi) \to \mathbb{R}^3$ defined by:
$$F(\rho,\theta,\varphi) = (\rho\sin\varphi\cos\theta,\ \rho\sin\varphi\sin\theta,\ \rho\cos\varphi).$$
Using elementary trigonometry, it is not hard to prove that $F$ is injective on this specific domain. However, it is rather difficult to write down $F^{-1}$ explicitly – at least it takes some work.
has linearly independent rows, and so is invertible. The inverse function theorem shows there exists an open set $U$ containing $(x_0,y_0)$, and an open set $V$ containing $F(x_0,y_0) = \big(f(x_0,y_0), 0\big)$, such that $F|_U : U \to V$ is bijective.
Therefore, as $(f(x_0,y_0), 0) \in V$ which is open, the line segment $(f(x_0,y_0)+t, 0) \in V$ too when $-\varepsilon < t < \varepsilon$ for some $\varepsilon > 0$. The line segment in the set $V$ corresponds to a curve $(x(t), y(t))$ in $U$ such that
$$F(x(t), y(t)) = (f(x_0,y_0)+t,\ 0) \implies \big(f(x(t),y(t)),\ g(x(t),y(t))\big) = (f(x_0,y_0)+t,\ 0).$$
Therefore, the curve $\gamma(t) := (x(t), y(t))$ lies in the constraint set $\{g = 0\}$, and when $t \in (0,\varepsilon)$, we have $f(\gamma(t)) = f(x_0,y_0)+t > f(x_0,y_0)$; whereas when $t \in (-\varepsilon,0)$, we have $f(\gamma(t)) < f(x_0,y_0)$. This shows $f$ achieves neither a maximum nor a minimum at $(x_0,y_0)$, even along the curve $\gamma(t) \subset \{g = 0\}$.
⁴Remark: this can be justified by the implicit function theorem later.
3.3.5. Implicit Function Theorem. Now we address the issue about the method of implicit differentiation discussed in the beginning paragraphs of this section. Given a $C^1$ function $f(x,y)$, consider the level set $f(x,y) = 0$. Typically, it is a curve in $\mathbb{R}^2$, and we want to determine the slope of the tangent at a point $(x_0,y_0)$ on this curve without explicitly solving for $y$ in terms of $x$.
However, we need to justify whether $y$ can really be regarded as a $C^1$ function of $x$ – at least locally near $(x_0,y_0)$, as the slope of the tangent only depends on the part of the curve near $(x_0,y_0)$. This can be justified by the inverse function theorem, and the result is often called the implicit function theorem. We will discuss and prove the two-variable version; then we will state the general-dimension version and leave the proof as an exercise for readers.
We will prove that if $f(x_0,y_0) = 0$ and $\frac{\partial f}{\partial y}(x_0,y_0) \neq 0$, then the level curve $f = 0$ can locally be regarded as the graph of a $C^1$ function of $x$ near $(x_0,y_0)$. Precisely, there exists a $C^1$ function $g(x) : (x_0-\varepsilon, x_0+\varepsilon) \to \mathbb{R}$ such that $g(x_0) = y_0$ and $f(x, g(x)) = 0$ for any $x \in (x_0-\varepsilon, x_0+\varepsilon)$.
We define the map $F(x,y) : \mathbb{R}^2 \to \mathbb{R}^2$ by
$$F(x,y) = \big(x,\ f(x,y)\big).$$
This function maps the family of level curves $\{f = c\}_{c\in\mathbb{R}}$ into horizontal lines in the target $\mathbb{R}^2$. By computing its Jacobian, we get:
$$DF(x_0,y_0) = \begin{bmatrix} 1 & 0 \\ \frac{\partial f}{\partial x}(x_0,y_0) & \frac{\partial f}{\partial y}(x_0,y_0) \end{bmatrix}$$
whose determinant is given by $\frac{\partial f}{\partial y}(x_0,y_0) \neq 0$. The inverse function theorem asserts that there exists an open set $U$ containing $(x_0,y_0)$ and an open set $V$ containing $F(x_0,y_0) = (x_0,0)$ such that $F|_U : U \to V$ is bijective, and $F^{-1} : V \to U$ is $C^1$. Then, each point of $U \cap \{f = 0\}$ is determined by its $x$-coordinate. To argue this, we suppose there are points $(x,y_1)$ and $(x,y_2)$ in $U \cap \{f = 0\}$. Then $F(x,y_i) = (x,0)$, as $f(x,y_i) = 0$ for $i = 1,2$. Since $F$ is injective on $U$, we must have $(x,y_1) = (x,y_2)$. This proves that the level set $\{f = 0\}$ is a graph over $x$ inside $U$.
Next we construct the function $g(x)$ so that $f(x,y) = 0$ is locally the graph $y = g(x)$ near $(x_0,y_0)$. Take $\varepsilon > 0$ small so that $B_\varepsilon(x_0,0) \subset V$; then for any $x \in (x_0-\varepsilon, x_0+\varepsilon)$, the point $(x,0) \in B_\varepsilon(x_0,0) \subset V$, so it is in the domain of $F^{-1}$. The point $F^{-1}(x,0)$ must be of the form $(x, *)$ for some unique value $*$ (uniqueness follows from the above paragraph). We denote this unique value by $g(x)$; then for all $x \in (x_0-\varepsilon, x_0+\varepsilon)$, we have
$$F^{-1}(x,0) = (x, g(x)) \implies (x,0) = \big(x,\ f(x,g(x))\big) \implies f(x, g(x)) = 0.$$
Therefore the part of the level set $U \cap \{f = 0\}$ can be written as the graph $\{(x, g(x)) : x \in (x_0-\varepsilon, x_0+\varepsilon)\}$. As $F^{-1}$ is $C^1$ and $(x, g(x)) = F^{-1}(x,0)$, $g$ is $C^1$ as well. See Figure ?? as a reference (to be added).
Remark 3.27. If instead we have $\frac{\partial f}{\partial x}(x_0,y_0) \neq 0$, then one can argue that the level set $f = 0$ is locally a graph over $y$ near $(x_0,y_0)$. One may consider the map $F(x,y) = (f(x,y), y)$ instead and almost repeat the above argument.
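A numerical illustration of the theorem's conclusion (our own example): for $f(x,y) = x^2 + y^2 - 1$ at $(x_0,y_0) = (0.6, 0.8)$, the implicit graph is $g(x) = \sqrt{1-x^2}$, and implicit differentiation of $f(x,g(x)) = 0$ predicts the slope $g'(x_0) = -f_x/f_y = -x_0/y_0 = -0.75$.

```python
import math

# Level set x^2 + y^2 = 1 as a graph y = g(x) near (0.6, 0.8)
def g(x):
    return math.sqrt(1 - x**2)

x0, y0 = 0.6, 0.8
h = 1e-6
slope_num = (g(x0 + h) - g(x0 - h)) / (2 * h)   # finite-difference slope
slope_formula = -x0 / y0                        # implicit differentiation
print(slope_num, slope_formula)
```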
Exercise 3.21. Prove the three-variable version of the implicit function theorem: given a $C^1$ function $f(x,y,z) : \mathbb{R}^3 \to \mathbb{R}$, suppose $(x_0,y_0,z_0)$ is a point on the level set $\{f = 0\}$ and $\frac{\partial f}{\partial z}(x_0,y_0,z_0) \neq 0$. Prove that the level set $\{f = 0\}$ is locally a graph $z = g(x,y)$ near $(x_0,y_0,z_0)$ for some $C^1$ function $g(x,y)$ defined on a ball $B_\varepsilon(x_0,y_0)$, i.e. $f\big(x, y, g(x,y)\big) = 0$.
From the implicit function theorem, one can prove that $\nabla f(p)$, if it is non-zero, is perpendicular to the level set $\{f = c\}$ at $p$. Consider a $C^1$ function $f : \mathbb{R}^3 \to \mathbb{R}$, and let $p$ be a point on the level set $\{f = c\}$. Suppose $\nabla f(p) \neq 0$; then at least one of $\frac{\partial f}{\partial x}(p)$, $\frac{\partial f}{\partial y}(p)$ and $\frac{\partial f}{\partial z}(p)$ is non-zero. Without loss of generality, assume it is $\frac{\partial f}{\partial z}(p)$. Then by Exercise 3.21 (applied to $f - c$), the level set $f = c$ can locally be regarded as a graph $z = g(x,y)$ near $p$ for some $C^1$ function $g(x,y)$, i.e. $f\big(x, y, g(x,y)\big) = c$ for $(x,y)$ near $(x_0,y_0)$. Here we let $p = (x_0,y_0,z_0)$.
As $f$ and $g$ are $C^1$, by differentiating $f\big(x,y,g(x,y)\big)$ we get:
$$0 = \frac{\partial}{\partial x} f\big(x,y,g(x,y)\big) = \frac{\partial f}{\partial x}(p) + \frac{\partial f}{\partial z}(p)\frac{\partial g}{\partial x}(x_0,y_0)$$
$$0 = \frac{\partial}{\partial y} f\big(x,y,g(x,y)\big) = \frac{\partial f}{\partial y}(p) + \frac{\partial f}{\partial z}(p)\frac{\partial g}{\partial y}(x_0,y_0)$$
Rewriting these equations in vector form, we get:
$$\nabla f(p)\cdot\left(1,\ 0,\ \frac{\partial g}{\partial x}(x_0,y_0)\right) = 0$$
$$\nabla f(p)\cdot\left(0,\ 1,\ \frac{\partial g}{\partial y}(x_0,y_0)\right) = 0.$$
Note that $\left(1, 0, \frac{\partial g}{\partial x}(x_0,y_0)\right)$ and $\left(0, 1, \frac{\partial g}{\partial y}(x_0,y_0)\right)$ are tangent vectors of the graph $z = g(x,y)$ at $p$, which locally coincides with the level set $f = c$ near $p$. Therefore, $\nabla f(p)$ is perpendicular to two basis tangent vectors of $\{f = c\}$ at $p$, or equivalently, it is perpendicular to the level set $\{f = c\}$ at $p$.
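The orthogonality can be verified numerically on a concrete example of our own: $f(x,y,z) = x^2+y^2+z^2$ and $p = (1,1,1)$ on the level set $\{f = 3\}$, where locally $z = g(x,y) = \sqrt{3-x^2-y^2}$ and $\nabla f(p) = (2,2,2)$.

```python
import math

# Check that grad f(p) is orthogonal to (1,0,gx) and (0,1,gy)
def g(x, y):
    return math.sqrt(3 - x**2 - y**2)

x0, y0 = 1.0, 1.0
h = 1e-6
gx = (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h)   # exact value: -x/z = -1
gy = (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)   # exact value: -y/z = -1
grad = (2.0, 2.0, 2.0)
t1 = (1.0, 0.0, gx)
t2 = (0.0, 1.0, gy)
dots = (sum(a * b for a, b in zip(grad, t1)),
        sum(a * b for a, b in zip(grad, t2)))
print(dots)   # both ~ 0
```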
Here we state the general (i.e. higher-dimensional) version of the implicit function theorem for vector-valued functions $F : \mathbb{R}^n \to \mathbb{R}^m$. Its proof is only slightly modified from the case $f : \mathbb{R}^3 \to \mathbb{R}$, and we again leave it as an exercise. The higher-dimensional case $F : \mathbb{R}^n \to \mathbb{R}^m$ may not be of much use in this course, but it will become a handy tool in differential geometry – such as for showing that the intersection of two surfaces is a $C^1$ curve.