
Multivariable Calculus

1. The Derivative

Definition: Derivative at a Point


Let U be an open subset of Rm , let F : U → Rn , and let a ∈ U . We say that F is
differentiable at a if there exists an n × m matrix DFa satisfying

\[
\lim_{h \to 0} \frac{F(a + h) - F(a) - DF_a(h)}{\|h\|} = 0.
\]

In this case, DFa is called the derivative of F at a.

A few notes about this definition:


1. The matrix DFa is uniquely determined. Specifically, if A and B are two matrices satisfying the definition of DFa , then subtracting the two limits gives
\[
\lim_{h \to 0} \frac{Ah - Bh}{\|h\|} = 0,
\]
which implies that A = B: taking h = t ei and letting t → 0 shows that (A − B)ei = 0 for each i.
2. Though we have stated the definition using the variable h, we could alternatively use the variable x = a + h. In this case, the limit definition of DFa becomes
\[
\lim_{x \to a} \frac{F(x) - F(a) - DF_a(x - a)}{\|x - a\|} = 0.
\]

3. The limit definition for DFa is equivalent to the following condition: for every ε > 0, there exists a δ > 0 so that
\[
\|h\| < \delta \;\Rightarrow\; \|F(a + h) - F(a) - DF_a(h)\| \le \epsilon\,\|h\|.
\]



If we use x = a + h instead, this implication becomes

\[
\|x - a\| < \delta \;\Rightarrow\; \|F(x) - F(a) - DF_a(x - a)\| \le \epsilon\,\|x - a\|.
\]

4. The definition of the derivative is closely related to a linear approximation formula. Specifically, the definition states that the error in the approximation
\[
F(a + h) \approx F(a) + DF_a(h)
\]
is much smaller than h as h → 0. This is sometimes written
\[
F(a + h) = F(a) + DF_a(h) + o(h),
\]
where o(h) denotes an arbitrary function satisfying
\[
\lim_{h \to 0} \frac{o(h)}{\|h\|} = 0.
\]
A numerical illustration of this approximation is sketched below.
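For instance, the following minimal numerical sketch (not part of the original development; the map F(x, y) = (x² + y, xy) and the point a = (1, 2) are illustrative choices) uses the matrix of partial derivatives described in Section 2 as the candidate for DFa and checks that the error ratio tends to 0:

```python
import numpy as np

# Illustrative map F(x, y) = (x^2 + y, x*y) and base point a = (1, 2).
def F(p):
    x, y = p
    return np.array([x**2 + y, x * y])

a = np.array([1.0, 2.0])
# Candidate derivative DF_a: the matrix of partial derivatives at a
# (this anticipates the Jacobian matrix of Section 2).
DFa = np.array([[2 * a[0], 1.0],
                [a[1],     a[0]]])

# The error in F(a + h) ~ F(a) + DF_a h should be o(h): the ratio
# ||F(a+h) - F(a) - DF_a h|| / ||h|| should tend to 0 as h -> 0.
rng = np.random.default_rng(0)
direction = rng.standard_normal(2)
direction /= np.linalg.norm(direction)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    err = np.linalg.norm(F(a + h) - F(a) - DFa @ h) / np.linalg.norm(h)
    print(f"||h|| = {t:.0e},  error ratio = {err:.2e}")
```

The printed ratio shrinks roughly in proportion to ‖h‖, consistent with the o(h) behaviour described above.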

2. Partial Derivatives

Definition: Partial Derivatives


Let U ⊂ Rm be open, let F : U → Rn , and let a ∈ U . If i ∈ {1, . . . , m}, the partial
derivative of F with respect to xi at a is defined as follows:

\[
\frac{\partial F}{\partial x_i}(a) = \lim_{h \to 0} \frac{F(a + h e_i) - F(a)}{h}.
\]
Here ei denotes a unit vector in the xi direction.

A few notes about this definition:


1. If f : R → R, then (∂f /∂x)(a) is the same as f ′(a).
2. As we have defined it, the partial derivative (∂F/∂xi )(a) is a vector. Specifically, if F (x) = ( f1 (x), . . . , fn (x) ), then
\[
\frac{\partial F}{\partial x_i}(a) = \left( \frac{\partial f_1}{\partial x_i}(a), \ldots, \frac{\partial f_n}{\partial x_i}(a) \right).
\]

3. The partial derivatives for a function F fit into a matrix:
\[
\begin{pmatrix}
\dfrac{\partial F_1}{\partial x_1}(a) & \cdots & \dfrac{\partial F_1}{\partial x_m}(a) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial F_n}{\partial x_1}(a) & \cdots & \dfrac{\partial F_n}{\partial x_m}(a)
\end{pmatrix}
\]
This is called the Jacobian matrix for F at a. Each column of this matrix is equal to ∂F/∂xi for some i.
4. The definition of a partial derivative closely resembles the limit definition of the derivative of a single-variable function. Indeed, the partial derivative could alternatively be defined by
\[
\frac{\partial F}{\partial x_i}(a) = (F \circ \gamma)'(0)
\]
where γ : R → Rm is the path defined by γ(t) = a + tei . That is, the partial derivative is the velocity vector in Rn of the image of a path that moves in the xi direction in Rm .
5. More generally, one can define the directional derivative of F at a in the direction of a vector v ∈ Rm by the formula
\[
\frac{\partial F}{\partial v}(a) = \lim_{h \to 0} \frac{F(a + hv) - F(a)}{h}.
\]
Note then that the partial derivatives of F are simply the directional derivatives of F in the directions of the standard basis vectors e1 , . . . , em . Many theorems about partial derivatives can be generalized to directional derivatives.
The following theorem describes the relationship between the partial derivatives and
the derivative of a differentiable function:

Theorem 1 Partial Derivatives of a Differentiable Function


Let U ⊂ Rm be open, let F : U → Rn , and let a ∈ U . If F is differentiable at a,
then all of the partial derivatives of F exist at a, and
\[
\frac{\partial F}{\partial x_i}(a) = DF_a(e_i)
\]
for all i ∈ {1, . . . , m}, where ei is the i'th standard basis vector.

PROOF Let ε > 0. Since F is differentiable at a, there exists a δ > 0 so that
\[
\|h\| < \delta \;\Rightarrow\; \|F(a + h) - F(a) - DF_a(h)\| \le \epsilon\,\|h\|.
\]
Let h ∈ R, and suppose that 0 < |h| < δ. Then ‖h ei ‖ < δ, so
\[
\left\| \frac{F(a + h e_i) - F(a)}{h} - DF_a(e_i) \right\|
= \frac{\bigl\| F(a + h e_i) - F(a) - h\, DF_a(e_i) \bigr\|}{|h|}
= \frac{\bigl\| F(a + h e_i) - F(a) - DF_a(h e_i) \bigr\|}{|h|}
\le \frac{\epsilon\, \|h e_i\|}{|h|} = \epsilon.
\]
We conclude that
\[
\lim_{h \to 0} \frac{F(a + h e_i) - F(a)}{h} = DF_a(e_i). \qquad \blacksquare
\]

As a generalization of this theorem, one can prove that
\[
\frac{\partial F}{\partial v}(a) = DF_a(v)
\]
for any vector v, where ∂F/∂v denotes the directional derivative of F in the direction of v.
The following corollary provides a formula for the derivative DFa of a differentiable
function at a point a.

Corollary 2 The Jacobian Matrix

Let U ⊂ Rm be open, and let a ∈ U . Let F : U → Rn be the function defined by
\[
F(x) = \bigl( f_1(x), \ldots, f_n(x) \bigr)
\]
where each fi : U → R. If F is differentiable at a, then
\[
DF_a = \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(a) & \cdots & \dfrac{\partial f_1}{\partial x_m}(a) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_n}{\partial x_1}(a) & \cdots & \dfrac{\partial f_n}{\partial x_m}(a)
\end{pmatrix}
\]

PROOF By Theorem 1, we know that DFa (ei ) is the partial derivative (∂F/∂xi )(a). But for any matrix A, the vector Aei is simply the i'th column of A, and therefore DFa is the matrix whose i'th column is (∂F/∂xi )(a). From the definition of the partial derivative, it is easy to see that
\[
\frac{\partial F}{\partial x_i}(a) = \left( \frac{\partial f_1}{\partial x_i}(a), \ldots, \frac{\partial f_n}{\partial x_i}(a) \right)
\]
and the corollary follows. ∎
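As a quick numerical illustration of Corollary 2 (a sketch, with an arbitrarily chosen map F : R2 → R3 that does not appear in the notes), one can approximate each column ∂F/∂xi (a) by a central difference and compare the result with the analytic Jacobian:

```python
import numpy as np

# Illustrative map F : R^2 -> R^3, F(x, y) = (x*y, sin x, x + y^2), and a = (0.5, -1.0).
def F(p):
    x, y = p
    return np.array([x * y, np.sin(x), x + y**2])

a = np.array([0.5, -1.0])

def jacobian_fd(F, a, eps=1e-6):
    """Approximate the Jacobian: column i is the central difference for dF/dx_i."""
    m = len(a)
    cols = []
    for i in range(m):
        e = np.zeros(m); e[i] = 1.0
        cols.append((F(a + eps * e) - F(a - eps * e)) / (2 * eps))
    return np.column_stack(cols)

# Analytic Jacobian from Corollary 2 (rows = component functions, columns = variables).
x, y = a
DFa = np.array([[y,         x    ],
                [np.cos(x), 0.0  ],
                [1.0,       2 * y]])

print(np.round(jacobian_fd(F, a), 6))
print("max abs difference:", np.abs(jacobian_fd(F, a) - DFa).max())
```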

EXAMPLE 1 Parametric Curves


If f : (a, b) → R is a differentiable function, it follows from Corollary 2 that
\[
Df_x = f'(x)
\]
for any x ∈ (a, b). Indeed, it can be shown that f ′(x) exists if and only if Dfx exists.
More generally, a parametric curve is any function γ : (a, b) → Rn . The derivative of such a curve is the vector γ ′(t) defined by the formula
\[
\gamma'(t) = \lim_{h \to 0} \frac{\gamma(t + h) - \gamma(t)}{h}.
\]
That is, γ ′(t) is equal to ∂γ/∂t. Again it follows from Corollary 2 that
\[
D\gamma_t = \gamma'(t)
\]
for any t ∈ (a, b). Moreover, one can show that γ ′(t) exists if and only if Dγt exists. ∎

EXAMPLE 2 The Gradient


Let U ⊂ Rm , and let f : U → R. If f is differentiable at a point a, then the derivative of f at a is the matrix
\[
Df_a = \begin{pmatrix} \dfrac{\partial f}{\partial x_1}(a) & \cdots & \dfrac{\partial f}{\partial x_m}(a) \end{pmatrix}.
\]
Note that Dfa is a row vector, not an ordinary (column) vector. As a linear transformation, Dfa is the linear functional Rm → R defined by
\[
Df_a(e_i) = \frac{\partial f}{\partial x_i}(a).
\]
The transpose of Dfa is a vector, known as the gradient of f :
\[
\nabla f(a) = (Df_a)^{T} = \left( \frac{\partial f}{\partial x_1}(a), \ldots, \frac{\partial f}{\partial x_m}(a) \right)
\]

The gradient satisfies
\[
Df_a(v) = \nabla f(a) \cdot v
\]
for every vector v, where · denotes the dot product. In particular,
\[
\frac{\partial f}{\partial x_i}(a) = \nabla f(a) \cdot e_i
\]
for each i ∈ {1, . . . , m}. ∎
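To make Example 2 concrete, here is a small sketch (the function f and the point a are illustrative choices) that assembles ∇f (a) from the partial derivatives and checks the identity Dfa (v) = ∇f (a) · v against a finite difference:

```python
import numpy as np

# Illustrative scalar function f(x, y) = x^2 * y + 3*y at a = (1.0, 2.0).
def f(p):
    x, y = p
    return x**2 * y + 3 * y

a = np.array([1.0, 2.0])
grad = np.array([2 * a[0] * a[1],    # df/dx = 2xy
                 a[0]**2 + 3.0])     # df/dy = x^2 + 3

v = np.array([0.3, -0.7])
eps = 1e-6
# Directional derivative of f at a in the direction v, by a central difference.
dir_fd = (f(a + eps * v) - f(a - eps * v)) / (2 * eps)

print("grad . v    =", grad @ v)   # Df_a(v) = grad f(a) . v
print("finite diff =", dir_fd)     # should agree to ~1e-6
```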

From Corollary 2, we might be tempted to think that the Jacobian matrix of partial derivatives always serves as the derivative DFa , but this is not always the case. In particular, it is possible for the partial derivatives of F to exist without F being differentiable:

EXAMPLE 3 A Non-Differentiable Function


Let ϕ : [0, 2π] → R be any continuous function with ϕ(0) = ϕ(2π). Define a function
f : R2 → R by
f (x, y) = r ϕ(θ)
where (r, θ) denote polar coordinates on R2 . The following statements are easy to
verify:

1. The function f is continuous at (0, 0).

2. (∂f /∂x)(0, 0) exists if and only if ϕ(0) = −ϕ(π).

3. (∂f /∂y)(0, 0) exists if and only if ϕ(π/2) = −ϕ(3π/2).

4. The function f is differentiable at (0, 0) if and only if f is linear, i.e. if and only
if ϕ has the form ϕ(θ) = a cos θ + b sin θ for some constants a, b ∈ R.

In particular, if ϕ satisfies conditions (2) and (3) but not (4), then the partial deriva-
tives of f will exist but f will not be differentiable.
For a specific example, consider the case where ϕ(θ) = sin(3θ). In Cartesian
coordinates, the resulting function f can be written

\[
f(x, y) = \frac{3x^2 y - y^3}{x^2 + y^2},
\]

where f (0, 0) = 0. A graph of this function is shown in Figure 1. Since

f (x, 0) = 0 and f (0, y) = −y



Figure 1: A non-differentiable function whose partial derivatives exist everywhere.

we have (∂f /∂x)(0, 0) = 0 and (∂f /∂y)(0, 0) = −1, so both partial derivatives exist at (0, 0). However, it is not true that Df(0,0) = ( 0 −1 ). For example, along the line y = x we have f (x, x) = x, so the directional derivative in this direction satisfies
\[
\frac{\partial f}{\partial (1,1)} = 1 \;\ne\; \begin{pmatrix} 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = -1.
\]
An interesting feature of this example is that the partial derivatives of f are not continuous at the point (0, 0). For example, one can show that
\[
\frac{\partial f}{\partial y}(x, 0) =
\begin{cases}
3 & \text{if } x \ne 0 \\
-1 & \text{if } x = 0,
\end{cases}
\]
so ∂f /∂y is not continuous at the point (0, 0). ∎
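The failure of differentiability in Example 3 is easy to observe numerically. The sketch below uses the same f (x, y) = (3x²y − y³)/(x² + y²) and compares the directional derivative along v = (1, 1) at the origin with the value that the row vector of partial derivatives would predict if f were differentiable:

```python
import numpy as np

def f(p):
    x, y = p
    if x == 0.0 and y == 0.0:
        return 0.0
    return (3 * x**2 * y - y**3) / (x**2 + y**2)

a = np.array([0.0, 0.0])
partials = np.array([0.0, -1.0])   # (df/dx)(0,0) = 0, (df/dy)(0,0) = -1
v = np.array([1.0, 1.0])

# Directional derivative along v at the origin: lim_{h->0} [f(hv) - f(0)] / h.
for h in [1e-1, 1e-3, 1e-5]:
    print(h, (f(a + h * v) - f(a)) / h)   # tends to 1, since f(x, x) = x

# If f were differentiable, the limit would have to equal partials . v = -1.
print("predicted by partials:", partials @ v)
```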

The following theorem gives conditions on the partial derivatives that guarantee
the differentiability of a function. We shall not prove this theorem here:

Theorem 3 Continuous Partials Imply Differentiability

Let U ⊂ Rm be open, and let F : U → Rn . Suppose that the partial derivatives


\[
\frac{\partial F}{\partial x_1}, \; \ldots, \; \frac{\partial F}{\partial x_m}
\]
exist at each point of U , and are continuous as functions U → Rn . Then F is
differentiable on U .

In general, a function F : U → Rn is called continuously differentiable if each


of its partial derivatives ∂F/∂xi exists and is continuous as a function U → Rn .
Equivalently, F is continuously differentiable if it is differentiable and the derivative
DF is continuous as a function U → Rn×m , where Rn×m is the space of all n × m
matrices.

3. The Chain Rule


We shall now state and prove the Chain Rule for multivariable functions. Our proof
will require the following definition of the norm of a matrix:

Definition: Norm of a Matrix


The norm of an n × m matrix A is defined by

\[
\|A\| = \max_{\|u\| = 1} \|Au\|,
\]
where the max is taken over all unit vectors u ∈ Rm .

Note that the maximum in the above definition always exists since the unit sphere in Rm is compact. Though we shall not prove it here, the norm ‖A‖ is equal to the square root of the largest eigenvalue of the symmetric matrix A^T A, i.e. the largest singular value of A.
The most important property of the matrix norm is the following inequality:

Proposition 4 Matrix Norm Inequality

Let A be an n × m matrix, and let v ∈ Rm . Then

\[
\|Av\| \le \|A\| \, \|v\|.
\]

PROOF Let u be a unit vector such that v = ‖v‖u. Then
\[
\|Av\| = \bigl\| A(\|v\|u) \bigr\| = \bigl\| \|v\|(Au) \bigr\| = \|v\| \, \|Au\| \le \|v\| \, \|A\|. \qquad \blacksquare
\]
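As a hedged numerical sanity check of the definition of the matrix norm (the matrix A below is an arbitrary choice), one can sample many unit vectors and compare the sampled maximum of ‖Au‖ with the largest singular value of A:

```python
import numpy as np

# An arbitrary 3x2 matrix.
A = np.array([[1.0,  2.0],
              [0.0,  1.0],
              [3.0, -1.0]])

# Estimate max_{||u||=1} ||A u|| by sampling many unit vectors in R^2.
rng = np.random.default_rng(0)
U = rng.standard_normal((2, 100000))
U /= np.linalg.norm(U, axis=0)
sampled_max = np.linalg.norm(A @ U, axis=0).max()

# The exact value: the largest singular value of A,
# i.e. the square root of the largest eigenvalue of A^T A.
sigma_max = np.linalg.norm(A, 2)

print(sampled_max, sigma_max)   # sampled_max approaches sigma_max from below
```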

We will also need the following inequality, which is similar in spirit to the Mean
Value Theorem from single variable calculus. Its proof is left as an exercise to the
reader:

Lemma 5 Mean Value Inequality

Let U ⊂ Rm be open, let F : U → Rn , and suppose that F is differentiable at a


point a. Then for any M > ‖DFa ‖, there exists a δ > 0 so that
\[
\|x - a\| < \delta \;\Rightarrow\; \|F(x) - F(a)\| \le M \|x - a\|.
\]

As a trivial consequence of this inequality, we see that a function which is differ-


entiable at a must be continuous at a.

Theorem 6 Chain Rule


Let U ⊂ Rm and V ⊂ Rn be open, let F : U → V and G : V → Rp be continuous,
and suppose that F is differentiable at a and G is differentiable at b = F (a).
Then G ◦ F is differentiable at a, and

D(G ◦ F )a = DGb DFa .

PROOF Let ε > 0. Let LF : Rm → Rn and LG : Rn → Rp be the functions
\[
L_F(x) = F(a) + DF_a(x - a) \qquad \text{and} \qquad L_G(y) = G(b) + DG_b(y - b).
\]
We must show that
\[
\|(G \circ F)(x) - (L_G \circ L_F)(x)\| \le \epsilon \|x - a\|
\]
when x is sufficiently close to a. By the triangle inequality,
\[
\|(G \circ F)(x) - (L_G \circ L_F)(x)\|
\le \|(G \circ F)(x) - (L_G \circ F)(x)\| + \|(L_G \circ F)(x) - (L_G \circ L_F)(x)\|.
\]
We shall handle these two terms separately:

1. Fix a positive constant M > ‖DFa ‖. Since G is differentiable at b, there exists an r > 0 so that
\[
\|y - b\| < r \;\Rightarrow\; \|G(y) - L_G(y)\| \le \frac{\epsilon}{2M} \|y - b\|.
\]
Since F is continuous at a, we can find a δ1 > 0 so that
\[
\|x - a\| < \delta_1 \;\Rightarrow\; \|F(x) - F(a)\| < r.
\]



Since F is differentiable at a, the Mean Value Inequality gives us a δ2 > 0 so that
\[
\|x - a\| < \delta_2 \;\Rightarrow\; \|F(x) - F(a)\| \le M \|x - a\|.
\]
Combining these, we find that
\[
\|(G \circ F)(x) - (L_G \circ F)(x)\| \le \frac{\epsilon}{2M} \|F(x) - F(a)\| \le \frac{\epsilon}{2} \|x - a\|
\]
whenever ‖x − a‖ < min(δ1 , δ2 ).

2. For the second term, fix a positive constant N > ‖DGb ‖. Since F is differentiable at a, there exists a δ3 > 0 so that
\[
\|x - a\| < \delta_3 \;\Rightarrow\; \|F(x) - L_F(x)\| \le \frac{\epsilon}{2N} \|x - a\|.
\]
If ‖x − a‖ < δ3 , it follows that
\[
\|(L_G \circ F)(x) - (L_G \circ L_F)(x)\| = \bigl\| DG_b\bigl( F(x) - L_F(x) \bigr) \bigr\|
\le N \|F(x) - L_F(x)\|
\le \frac{\epsilon}{2} \|x - a\|.
\]

Combining our two bounds together, we find that
\[
\|(G \circ F)(x) - (L_G \circ L_F)(x)\| \le \frac{\epsilon}{2} \|x - a\| + \frac{\epsilon}{2} \|x - a\| = \epsilon \|x - a\|
\]
whenever ‖x − a‖ < min(δ1 , δ2 , δ3 ). ∎
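The Chain Rule is easy to test numerically. The sketch below (the maps F and G are illustrative choices, and the Jacobians are approximated by central differences rather than computed exactly) compares D(G ◦ F )a with the product DGb DFa :

```python
import numpy as np

# Illustrative maps: F(x, y) = (x*y, x + y^2),  G(u, v) = sin(u) + u*v.
def F(p):
    x, y = p
    return np.array([x * y, x + y**2])

def G(q):
    u, v = q
    return np.array([np.sin(u) + u * v])

def jacobian_fd(H, p, eps=1e-6):
    """Finite-difference Jacobian of H at p (columns are partial derivatives)."""
    cols = []
    for i in range(len(p)):
        e = np.zeros(len(p)); e[i] = 1.0
        cols.append((H(p + eps * e) - H(p - eps * e)) / (2 * eps))
    return np.column_stack(cols)

a = np.array([0.7, -0.4])
b = F(a)

lhs = jacobian_fd(lambda p: G(F(p)), a)       # D(G o F)_a
rhs = jacobian_fd(G, b) @ jacobian_fd(F, a)   # DG_b DF_a

print(np.round(lhs, 6))
print(np.round(rhs, 6))   # the two matrices agree up to finite-difference error
```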

4. Diffeomorphisms
One of the most important types of differentiable maps is the diffeomorphism:

Definition: Diffeomorphism
Let U and V be open subsets of Rn . A diffeomorphism from U to V is a map
F : U → V with the following properties:

1. F is bijective.

2. F is differentiable, and

3. F −1 is differentiable.

Figure 2: A diffeomorphism F : U → V between two open subsets of R2 .

That is, a diffeomorphism is an invertible differentiable map whose inverse is also


differentiable. Diffeomorphisms play roughly the same role in multivariable calculus
that isomorphisms do in group theory, or homeomorphisms do in topology. (To be
precise, diffeomorphisms are the invertible morphisms in the category of open sets
and differentiable maps.)
Geometrically, a diffeomorphism corresponds to a smooth “distortion” of an open
subset of Rn , as shown in Figure 2. The picture for a diffeomorphism in R3 would be
similar: each coordinate plane in the domain would map to a curved surface in the
range.

Theorem 7 Derivative of the Inverse


Let ϕ : U → V be a diffeomorphism, let a ∈ U , and let b = ϕ(a). Then Dϕa is
nonsingular, and
(Dϕa )−1 = D(ϕ−1 )b .

PROOF By the Chain Rule,
\[
D(\varphi^{-1})_b \, D\varphi_a = D(\varphi^{-1} \circ \varphi)_a = D(\mathrm{id})_a = I,
\]
where I is the n × n identity matrix. The theorem follows. ∎

EXAMPLE 4 Diffeomorphisms on R
Let f : R → R be a diffeomorphism. According to the above theorem, the derivative f ′ for such a function must satisfy f ′(a) ≠ 0 for each a ∈ R. Moreover, the derivatives of f and f −1 are related by the formula
\[
(f^{-1})'(b) = \frac{1}{f'(a)}
\]

for each a ∈ R, where b = f (a). Indeed, it follows from the Inverse Function Theorem (see below) that any differentiable bijection f : R → R with nonzero derivative is a diffeomorphism.
If f has a critical point, then it cannot be a diffeomorphism. For example, the map f : R → R defined by f (x) = x³ is a differentiable bijection, but is not a diffeomorphism, since the inverse function f −1 (x) = ∛x is not differentiable at 0. ∎
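For a concrete numerical check of the formula (f −1 )′(b) = 1/f ′(a), consider the illustrative diffeomorphism f (x) = x³ + x (chosen, unlike x³, to have nowhere-vanishing derivative); the sketch below inverts f by bisection and differentiates the inverse by a finite difference:

```python
import numpy as np

# Illustrative diffeomorphism of R: f(x) = x^3 + x, with f'(x) = 3x^2 + 1 > 0.
f = lambda x: x**3 + x
fprime = lambda x: 3 * x**2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, iters=80):
    """Invert f by bisection (f is strictly increasing on [lo, hi])."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a = 1.3
b = f(a)
h = 1e-5
inv_deriv_fd = (f_inverse(b + h) - f_inverse(b - h)) / (2 * h)

print(inv_deriv_fd)       # numerical (f^{-1})'(b)
print(1.0 / fprime(a))    # prediction from Theorem 7 / Example 4: 1 / f'(a)
```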

We can extend the notion of critical points to functions on Rn :

Definition: Critical Points for a Function Rn → Rn


Let U and V be open subsets of Rn , and let F : U → V . A point a ∈ U is called a
critical point for F if either

1. F is not differentiable at a, or

2. F is differentiable at a, but DFa is a singular matrix.

Note that, if F is differentiable at a, then a is a critical point for F if and only if


the determinant of the derivative matrix DFa is zero. This gives a practical test for
finding the critical points of a function.
According to Theorem 7, a diffeomorphism cannot have any critical points. The
converse is also true, but it will require the Inverse Function Theorem to prove:

Theorem 8 Characterization of Diffeomorphisms

Let U and V be open subsets of Rn , and let F : U → V be a bijection. Then F


is a diffeomorphism if and only if F has no critical points.

Curvilinear Coordinates
Though diffeomorphisms might seem like an entirely new concept, they are closely
related to the familiar notion of curvilinear coordinates.

Figure 3: The polar coordinates diffeomorphism ϕ : U → V .

Definition: Curvilinear Coordinates


Let V be an open subset of Rn . A set {u1 , . . . , un } of real-valued functions on
V are called curvilinear coordinates if there exists an open set U ⊂ Rn and a
diffeomorphism ϕ : U → V such that

\[
\varphi\bigl( u_1(x), \ldots, u_n(x) \bigr) = x
\]

for all x ∈ V .

Note that the coordinate functions u1 , . . . , un are essentially the components of the inverse diffeomorphism ϕ−1 : V → U . That is,
\[
\varphi^{-1}(x) = \bigl( u_1(x), \ldots, u_n(x) \bigr)
\]




for all x ∈ V . Thus, we could alternatively define curvilinear coordinates as the


components of a diffeomorphism with domain V . For applications, however, the map
ϕ is usually more important, so we will stick with the definition given above.

EXAMPLE 5 Polar Coordinates


Let V be the complement of the negative real axis (−∞, 0] × {0} in R2 . Let U =
(0, ∞) × (−π, π), and define a function ϕ : U → V by

ϕ(r, θ) = (r cos θ, r sin θ).

This function is shown in Figure 3. Clearly ϕ is a bijection, and
\[
\det D\varphi_{(r,\theta)} =
\begin{vmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{vmatrix}
= r \ne 0
\]
for all (r, θ) ∈ U , so ϕ is a diffeomorphism. The resulting curvilinear coordinates r, θ : V → R are the familiar polar coordinates. ∎

EXAMPLE 6 Spherical Coordinates


Let V be the complement of the half-plane (−∞, 0] × {0} × R in R3 . Let U be the
region (0, ∞) × (−π, π) × (−π/2, π/2), and define a function ϕ : U → V by

ϕ(ρ, θ, φ) = (ρ cos θ cos φ, ρ sin θ cos φ, ρ sin φ).

Then ϕ is a bijection, and
\[
\det D\varphi_{(\rho,\theta,\phi)} =
\begin{vmatrix}
\cos\theta \cos\phi & -\rho \sin\theta \cos\phi & -\rho \cos\theta \sin\phi \\
\sin\theta \cos\phi & \rho \cos\theta \cos\phi & -\rho \sin\theta \sin\phi \\
\sin\phi & 0 & \rho \cos\phi
\end{vmatrix}
= \rho^2 \cos\phi \ne 0
\]
so ϕ is a diffeomorphism. The resulting curvilinear coordinates ρ, θ, φ : V → R are called spherical coordinates. ∎

In general, if ϕ : U → V is a diffeomorphism with (x1 , . . . , xn ) = ϕ(u1 , . . . , un ), then the derivative of ϕ is a matrix of partial derivatives:
\[
D\varphi = \begin{pmatrix}
\dfrac{\partial x_1}{\partial u_1} & \cdots & \dfrac{\partial x_1}{\partial u_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial x_n}{\partial u_1} & \cdots & \dfrac{\partial x_n}{\partial u_n}
\end{pmatrix}
\]
Similarly, since (u1 , . . . , un ) = ϕ−1 (x1 , . . . , xn ), the derivative of ϕ−1 is also a matrix of partial derivatives:
\[
D(\varphi^{-1}) = \begin{pmatrix}
\dfrac{\partial u_1}{\partial x_1} & \cdots & \dfrac{\partial u_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial u_n}{\partial x_1} & \cdots & \dfrac{\partial u_n}{\partial x_n}
\end{pmatrix}
\]
By Theorem 7, these two matrices should be inverses at each pair of corresponding points in U and V , as the sketch below illustrates for polar coordinates.
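For polar coordinates (Example 5), this inverse relationship can be checked directly. In the sketch below (the sample point (r, θ) = (2, 0.8) is an arbitrary choice), the partial derivatives of r = √(x² + y²) and θ = atan2(y, x) are written down by hand and multiplied against Dϕ:

```python
import numpy as np

# Polar coordinates (Example 5): phi(r, theta) = (r cos(theta), r sin(theta)).
r, theta = 2.0, 0.8
x, y = r * np.cos(theta), r * np.sin(theta)

# D(phi) at (r, theta): partials of (x, y) with respect to (r, theta).
Dphi = np.array([[np.cos(theta), -r * np.sin(theta)],
                 [np.sin(theta),  r * np.cos(theta)]])

# D(phi^{-1}) at (x, y): partials of (r, theta) = (sqrt(x^2 + y^2), atan2(y, x))
# with respect to (x, y).
Dphi_inv = np.array([[ x / r,     y / r   ],
                     [-y / r**2,  x / r**2]])

# By Theorem 7 these should be inverse matrices.
print(np.round(Dphi @ Dphi_inv, 10))    # ~ identity matrix
print(np.round(Dphi_inv, 10))
print(np.round(np.linalg.inv(Dphi), 10))
```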

Change of Variables
In this section we discuss the change of variables formula for integration. We begin
by reviewing the geometric meaning of determinants.

Figure 4: The parallelepiped P (a, v1 , v2 , v3 ).

Definition: Parallelepiped
A parallelepiped in Rn is a set of the form

P (a, v1 , . . . , vn ) = {a + t1 v1 + · · · + tn vn | t1 , . . . , tn ∈ [0, 1]}

where a, v1 , . . . , vn ∈ Rn (see Figure 4).

Parallelepipeds are the n-dimensional analogues of parallelograms. In the case


where the vectors v1 , . . . , vn are orthogonal, the parallelepiped P (a, v1 , . . . , vn ) is
simply a rectangular box.
The following geometric formula is basic to any development of the theory of
determinants:

Theorem 9 Volume of a Parallelepiped

Let a, v1 , . . . , vn ∈ Rn , and let A be the matrix whose columns are the vectors
v1 , . . . , vn . Then the volume of the parallelepiped P (a, v1 , . . . , vn ) is |det A|.

Here “volume” refers to area in the case where n = 2, or length in the case where
n = 1. As a result of the above theorem, we obtain the following interpretation for
the determinant of a linear transformation:

Theorem 10 Geometric Meaning of the Determinant

Let T : Rn → Rn be a linear transformation, and let P be a parallelepiped in Rn .


Then T (P ) is also a parallelepiped, and the volume of T (P ) is |det T | times the
volume of P .

PROOF Suppose that P = P (a, v1 , . . . , vn ). Then clearly
\[
T(P) = P\bigl( T(a), T(v_1), \ldots, T(v_n) \bigr).
\]
Let A be the matrix whose columns are v1 , . . . , vn , and let B be the matrix whose columns are T (v1 ), . . . , T (vn ). Then B is equal to T A, the product of the matrix for T with A, so
\[
|\det B| = |(\det T)(\det A)| = |\det T|\,|\det A|. \qquad \blacksquare
\]
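A short numerical sketch (the vectors and the transformation T below are arbitrary choices) illustrates Theorems 9 and 10: the volume of a parallelepiped is |det A|, and applying T scales that volume by |det T |:

```python
import numpy as np

# Arbitrary vectors spanning a parallelepiped in R^3.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 2.0, 0.0])
v3 = np.array([0.0, 1.0, 3.0])
A = np.column_stack([v1, v2, v3])

vol_P = abs(np.linalg.det(A))       # volume of P(a, v1, v2, v3); here 1*2*3 = 6

# A linear transformation T, applied to the spanning vectors.
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
B = T @ A                            # columns are T(v1), T(v2), T(v3)

vol_TP = abs(np.linalg.det(B))
print(vol_P, vol_TP, abs(np.linalg.det(T)) * vol_P)   # vol(T(P)) = |det T| vol(P)
```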

In fact, a much more general statement is true:

Theorem 11 Linear Transformations and Measure


Let T : Rn → Rn be a linear transformation. Then

\[
\mu\bigl( T(S) \bigr) = |\det T| \, \mu(S)
\]
for any measurable set S ⊂ Rn , where µ denotes Lebesgue measure on Rn . Furthermore, if det T ≠ 0, then
\[
\int_{\mathbb{R}^n} f \, d\mu = \int_{\mathbb{R}^n} (f \circ T)\, |\det T| \, d\mu
\]
for any L1 function f : Rn → R.

Sketch of Proof The first statement follows from the fact that
\[
\mu(S) = \inf \Bigl\{ \, \sum_{k} \mu(P_k) \;\Bigm|\; P_1, P_2, \ldots \text{ are parallelepipeds in } \mathbb{R}^n \text{ and } S \subset \bigcup_{k} P_k \Bigr\}
\]
for any measurable set S. Then the second statement is clearly true for simple functions, and can be extended to any L1 function using the Dominated Convergence Theorem. ∎

We now consider the determinant of a differentiable map at each point:

Definition: Jacobian
Let U be an open subset of Rn , and let F : U → Rn be a differentiable map. The Jacobian of F is the function JF : U → R defined by
\[
J_F(a) = \det DF_a.
\]



Roughly speaking, the Jacobian measures the amount by which F multiplies volumes at a point a. That is, if S is a measurable set in a small neighborhood of a, then
\[
\mu\bigl( F(S) \bigr) \approx |J_F(a)| \, \mu(S).
\]
Incidentally, one can prove that the Jacobian is always a measurable function.
We are now ready to state the change of variables formula, though we are not in
a position to prove it:

Theorem 12 Change of Variables

Let ϕ : U → V be a diffeomorphism, and let f : V → R be an L1 function. Then


\[
\int_V f \, d\mu = \int_U (f \circ \varphi)\, |J_\varphi| \, d\mu,
\]

where µ denotes Lebesgue measure on Rn .

EXAMPLE 7 Integration in Polar Coordinates


Let ϕ : U → V be the diffeomorphism
\[
\varphi(r, \theta) = (r\cos\theta, \, r\sin\theta)
\]
defined in Example 5, where U = (0, ∞) × (−π, π) and V is the complement of the negative real axis. Then the Jacobian of ϕ is given by the formula
\[
J_\varphi(r, \theta) = \det D\varphi_{(r,\theta)} =
\begin{vmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{vmatrix}
= r.
\]
Therefore, by the change of variables formula,
\[
\int_V f(x, y)\, d\mu(x, y) = \int_U (f \circ \varphi)(r, \theta)\, r \, d\mu(r, \theta)
\]
for any L1 function f : V → R. Since R2 − V has measure zero, it follows that
\[
\int_{\mathbb{R}^2} f(x, y)\, d\mu(x, y) = \int_U f(r\cos\theta, \, r\sin\theta)\, r \, d\mu(r, \theta). \qquad \blacksquare
\]
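As a hedged numerical check of this formula (the integrand f (x, y) = e^{−(x²+y²)} is an illustrative choice whose integral over R² is π, and both integrals are truncated to a bounded region and approximated by the midpoint rule), the Cartesian and polar computations should agree:

```python
import numpy as np

# f(x, y) = exp(-(x^2 + y^2)), whose integral over R^2 is pi.
f = lambda x, y: np.exp(-(x**2 + y**2))

# Cartesian: midpoint rule over the square [-6, 6]^2 (the tail beyond is negligible).
n = 1500
xs = np.linspace(-6, 6, n, endpoint=False) + 6.0 / n
X, Y = np.meshgrid(xs, xs)
cartesian = f(X, Y).sum() * (12.0 / n)**2

# Polar: midpoint rule over U = (0, 6) x (-pi, pi), with the Jacobian factor r.
rs = np.linspace(0, 6, n, endpoint=False) + 3.0 / n
ts = np.linspace(-np.pi, np.pi, n, endpoint=False) + np.pi / n
R, T = np.meshgrid(rs, ts)
polar = (f(R * np.cos(T), R * np.sin(T)) * R).sum() * (6.0 / n) * (2 * np.pi / n)

print(cartesian, polar, np.pi)   # all three agree to several decimal places
```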
