
Multivariable Calculus

1. The Derivative

Definition: Derivative at a Point


Let U be an open subset of Rm , let F : U → Rn , and let a ∈ U . We say that F is
differentiable at a if there exists an n × m matrix DFa satisfying

\[
\lim_{h \to 0} \frac{F(a + h) - F(a) - DF_a(h)}{\|h\|} = 0.
\]

In this case, DFa is called the derivative of F at a.

A few notes about this definition:


1. The matrix DFa is uniquely determined. Specifically, if A and B are two matrices satisfying the definition of DFa , then subtracting the two limits gives
\[
\lim_{h \to 0} \frac{Ah - Bh}{\|h\|} = 0,
\]
which implies that A = B: taking h = t ei and letting t → 0 shows that (A − B)ei = 0 for each i.
2. Though we have stated the definition using the variable h, we could alternatively use the variable x = a + h. In this case, the limit definition of DFa becomes
\[
\lim_{x \to a} \frac{F(x) - F(a) - DF_a(x - a)}{\|x - a\|} = 0.
\]

3. The limit definition for DFa is equivalent to the following condition: for every ε > 0, there exists a δ > 0 so that
\[
\|h\| < \delta \;\Rightarrow\; \|F(a + h) - F(a) - DF_a(h)\| \le \epsilon\,\|h\|.
\]



If we use x = a + h instead, this implication becomes

\[
\|x - a\| < \delta \;\Rightarrow\; \|F(x) - F(a) - DF_a(x - a)\| \le \epsilon\,\|x - a\|.
\]

4. The definition of the derivative is closely related to a linear approximation formula. Specifically, the definition states that the error in the approximation
\[
F(a + h) \approx F(a) + DF_a(h)
\]
is much smaller than h as h → 0. This is sometimes written
\[
F(a + h) = F(a) + DF_a(h) + o(h),
\]
where o(h) denotes an arbitrary function satisfying
\[
\lim_{h \to 0} \frac{o(h)}{\|h\|} = 0.
\]
A numerical illustration of this approximation is sketched below.
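For instance, the following minimal numerical sketch (not part of the original development; the map F(x, y) = (x² + y, xy) and the point a = (1, 2) are illustrative choices) uses the matrix of partial derivatives described in Section 2 as the candidate for DFa and checks that the error ratio tends to 0:

```python
import numpy as np

# Illustrative map F(x, y) = (x^2 + y, x*y) and base point a = (1, 2).
def F(p):
    x, y = p
    return np.array([x**2 + y, x * y])

a = np.array([1.0, 2.0])
# Candidate derivative DF_a: the matrix of partial derivatives at a
# (this anticipates the Jacobian matrix of Section 2).
DFa = np.array([[2 * a[0], 1.0],
                [a[1],     a[0]]])

# The error in F(a + h) ~ F(a) + DF_a h should be o(h): the ratio
# ||F(a+h) - F(a) - DF_a h|| / ||h|| should tend to 0 as h -> 0.
rng = np.random.default_rng(0)
direction = rng.standard_normal(2)
direction /= np.linalg.norm(direction)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    err = np.linalg.norm(F(a + h) - F(a) - DFa @ h) / np.linalg.norm(h)
    print(f"||h|| = {t:.0e},  error ratio = {err:.2e}")
```

The printed ratio shrinks roughly in proportion to ‖h‖, consistent with the o(h) behaviour described above.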

2. Partial Derivatives

Definition: Partial Derivatives


Let U ⊂ Rm be open, let F : U → Rn , and let a ∈ U . If i ∈ {1, . . . , m}, the partial
derivative of F with respect to xi at a is defined as follows:

\[
\frac{\partial F}{\partial x_i}(a) = \lim_{h \to 0} \frac{F(a + h e_i) - F(a)}{h}.
\]
Here ei denotes a unit vector in the xi direction.

A few notes about this definition:


1. If f : R → R, then (∂f /∂x)(a) is the same as f ′(a).
2. As we have defined it, the partial derivative (∂F/∂xi )(a) is a vector. Specifically, if F (x) = ( f1 (x), . . . , fn (x) ), then
\[
\frac{\partial F}{\partial x_i}(a) = \left( \frac{\partial f_1}{\partial x_i}(a), \ldots, \frac{\partial f_n}{\partial x_i}(a) \right).
\]

3. The partial derivatives for a function F fit into a matrix:
\[
\begin{pmatrix}
\dfrac{\partial F_1}{\partial x_1}(a) & \cdots & \dfrac{\partial F_1}{\partial x_m}(a) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial F_n}{\partial x_1}(a) & \cdots & \dfrac{\partial F_n}{\partial x_m}(a)
\end{pmatrix}
\]
This is called the Jacobian matrix for F at a. Each column of this matrix is equal to ∂F/∂xi for some i.
4. The definition of a partial derivative closely resembles the limit definition of the derivative of a single-variable function. Indeed, the partial derivative could alternatively be defined by
\[
\frac{\partial F}{\partial x_i}(a) = (F \circ \gamma)'(0)
\]
where γ : R → Rm is the path defined by γ(t) = a + tei . That is, the partial derivative is the velocity vector in Rn of the image of a path that moves in the xi direction in Rm .
5. More generally, one can define the directional derivative of F at a in the direction of a vector v ∈ Rm by the formula
\[
\frac{\partial F}{\partial v}(a) = \lim_{h \to 0} \frac{F(a + hv) - F(a)}{h}.
\]
Note then that the partial derivatives of F are simply the directional derivatives of F in the directions of the standard basis vectors e1 , . . . , em . Many theorems about partial derivatives can be generalized to directional derivatives.
The following theorem describes the relationship between the partial derivatives and
the derivative of a differentiable function:

Theorem 1 Partial Derivatives of a Differentiable Function


Let U ⊂ Rm be open, let F : U → Rn , and let a ∈ U . If F is differentiable at a,
then all of the partial derivatives of F exist at a, and
\[
\frac{\partial F}{\partial x_i}(a) = DF_a(e_i)
\]
for all i ∈ {1, . . . , m}, where ei is the i'th standard basis vector.

PROOF Let ε > 0. Since F is differentiable at a, there exists a δ > 0 so that
\[
\|h\| < \delta \;\Rightarrow\; \|F(a + h) - F(a) - DF_a(h)\| \le \epsilon\,\|h\|.
\]
Let h ∈ R, and suppose that 0 < |h| < δ. Then ‖h ei ‖ < δ, so
\[
\left\| \frac{F(a + h e_i) - F(a)}{h} - DF_a(e_i) \right\|
= \frac{\bigl\| F(a + h e_i) - F(a) - h\, DF_a(e_i) \bigr\|}{|h|}
= \frac{\bigl\| F(a + h e_i) - F(a) - DF_a(h e_i) \bigr\|}{|h|}
\le \frac{\epsilon\, \|h e_i\|}{|h|} = \epsilon.
\]
We conclude that
\[
\lim_{h \to 0} \frac{F(a + h e_i) - F(a)}{h} = DF_a(e_i). \qquad \blacksquare
\]

As a generalization of this theorem, one can prove that
\[
\frac{\partial F}{\partial v}(a) = DF_a(v)
\]
for any vector v, where ∂F/∂v denotes the directional derivative of F in the direction of v.
The following corollary provides a formula for the derivative DFa of a differentiable
function at a point a.

Corollary 2 The Jacobian Matrix

Let U ⊂ Rm be open, and let a ∈ U . Let F : U → Rn be the function defined by
\[
F(x) = \bigl( f_1(x), \ldots, f_n(x) \bigr)
\]
where each fi : U → R. If F is differentiable at a, then
\[
DF_a = \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(a) & \cdots & \dfrac{\partial f_1}{\partial x_m}(a) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_n}{\partial x_1}(a) & \cdots & \dfrac{\partial f_n}{\partial x_m}(a)
\end{pmatrix}
\]

PROOF By Theorem 1, we know that DFa (ei ) is the partial derivative (∂F/∂xi )(a). But for any matrix A, the vector Aei is simply the i'th column of A, and therefore DFa is the matrix whose i'th column is (∂F/∂xi )(a). From the definition of the partial derivative, it is easy to see that
\[
\frac{\partial F}{\partial x_i}(a) = \left( \frac{\partial f_1}{\partial x_i}(a), \ldots, \frac{\partial f_n}{\partial x_i}(a) \right)
\]
and the corollary follows. ∎
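As a quick numerical illustration of Corollary 2 (a sketch, with an arbitrarily chosen map F : R2 → R3 that does not appear in the notes), one can approximate each column ∂F/∂xi (a) by a central difference and compare the result with the analytic Jacobian:

```python
import numpy as np

# Illustrative map F : R^2 -> R^3, F(x, y) = (x*y, sin x, x + y^2), and a = (0.5, -1.0).
def F(p):
    x, y = p
    return np.array([x * y, np.sin(x), x + y**2])

a = np.array([0.5, -1.0])

def jacobian_fd(F, a, eps=1e-6):
    """Approximate the Jacobian: column i is the central difference for dF/dx_i."""
    m = len(a)
    cols = []
    for i in range(m):
        e = np.zeros(m); e[i] = 1.0
        cols.append((F(a + eps * e) - F(a - eps * e)) / (2 * eps))
    return np.column_stack(cols)

# Analytic Jacobian from Corollary 2 (rows = component functions, columns = variables).
x, y = a
DFa = np.array([[y,         x    ],
                [np.cos(x), 0.0  ],
                [1.0,       2 * y]])

print(np.round(jacobian_fd(F, a), 6))
print("max abs difference:", np.abs(jacobian_fd(F, a) - DFa).max())
```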

EXAMPLE 1 Parametric Curves


If f : (a, b) → R is a differentiable function, it follows from Corollary 2 that
\[
Df_x = f'(x)
\]
for any x ∈ (a, b). Indeed, it can be shown that f ′(x) exists if and only if Dfx exists.
More generally, a parametric curve is any function γ : (a, b) → Rn . The derivative of such a curve is the vector γ ′(t) defined by the formula
\[
\gamma'(t) = \lim_{h \to 0} \frac{\gamma(t + h) - \gamma(t)}{h}.
\]
That is, γ ′(t) is equal to ∂γ/∂t. Again it follows from Corollary 2 that
\[
D\gamma_t = \gamma'(t)
\]
for any t ∈ (a, b). Moreover, one can show that γ ′(t) exists if and only if Dγt exists. ∎

EXAMPLE 2 The Gradient


Let U ⊂ Rm , and let f : U → R. If f is differentiable at a point a, then the derivative of f at a is the matrix
\[
Df_a = \begin{pmatrix} \dfrac{\partial f}{\partial x_1}(a) & \cdots & \dfrac{\partial f}{\partial x_m}(a) \end{pmatrix}.
\]
Note that Dfa is a row vector, not an ordinary (column) vector. As a linear transformation, Dfa is the linear functional Rm → R defined by
\[
Df_a(e_i) = \frac{\partial f}{\partial x_i}(a).
\]
The transpose of Dfa is a vector, known as the gradient of f :
\[
\nabla f(a) = (Df_a)^{T} = \left( \frac{\partial f}{\partial x_1}(a), \ldots, \frac{\partial f}{\partial x_m}(a) \right)
\]

The gradient satisfies
\[
Df_a(v) = \nabla f(a) \cdot v
\]
for every vector v, where · denotes the dot product. In particular,
\[
\frac{\partial f}{\partial x_i}(a) = \nabla f(a) \cdot e_i
\]
for each i ∈ {1, . . . , m}. ∎
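To make Example 2 concrete, here is a small sketch (the function f and the point a are illustrative choices) that assembles ∇f (a) from the partial derivatives and checks the identity Dfa (v) = ∇f (a) · v against a finite difference:

```python
import numpy as np

# Illustrative scalar function f(x, y) = x^2 * y + 3*y at a = (1.0, 2.0).
def f(p):
    x, y = p
    return x**2 * y + 3 * y

a = np.array([1.0, 2.0])
grad = np.array([2 * a[0] * a[1],    # df/dx = 2xy
                 a[0]**2 + 3.0])     # df/dy = x^2 + 3

v = np.array([0.3, -0.7])
eps = 1e-6
# Directional derivative of f at a in the direction v, by a central difference.
dir_fd = (f(a + eps * v) - f(a - eps * v)) / (2 * eps)

print("grad . v    =", grad @ v)   # Df_a(v) = grad f(a) . v
print("finite diff =", dir_fd)     # should agree to ~1e-6
```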

From Corollary 2, we might be tempted to think that the Jacobian matrix of partial derivatives always serves as the derivative DFa , but this is not always the case. In particular, it is possible for the partial derivatives of F to exist without F being differentiable:

EXAMPLE 3 A Non-Differentiable Function


Let ϕ : [0, 2π] → R be any continuous function with ϕ(0) = ϕ(2π). Define a function
f : R2 → R by
f (x, y) = r ϕ(θ)
where (r, θ) denote polar coordinates on R2 . The following statements are easy to
verify:

1. The function f is continuous at (0, 0).

2. (∂f /∂x)(0, 0) exists if and only if ϕ(0) = −ϕ(π).

3. (∂f /∂y)(0, 0) exists if and only if ϕ(π/2) = −ϕ(3π/2).

4. The function f is differentiable at (0, 0) if and only if f is linear, i.e. if and only
if ϕ has the form ϕ(θ) = a cos θ + b sin θ for some constants a, b ∈ R.

In particular, if ϕ satisfies conditions (2) and (3) but not (4), then the partial deriva-
tives of f will exist but f will not be differentiable.
For a specific example, consider the case where ϕ(θ) = sin(3θ). In Cartesian
coordinates, the resulting function f can be written

\[
f(x, y) = \frac{3x^2 y - y^3}{x^2 + y^2},
\]

where f (0, 0) = 0. A graph of this function is shown in Figure 1. Since

f (x, 0) = 0 and f (0, y) = −y



Figure 1: A non-differentiable function whose partial derivatives exist everywhere.

we have (∂f /∂x)(0, 0) = 0 and (∂f /∂y)(0, 0) = −1, so both partial derivatives exist at (0, 0). However, it is not true that Df(0,0) = ( 0 −1 ). For example, along the line y = x we have f (x, x) = x, so the directional derivative in this direction satisfies
\[
\frac{\partial f}{\partial (1,1)} = 1 \;\ne\; \begin{pmatrix} 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = -1.
\]
An interesting feature of this example is that the partial derivatives of f are not continuous at the point (0, 0). For example, one can show that
\[
\frac{\partial f}{\partial y}(x, 0) =
\begin{cases}
3 & \text{if } x \ne 0 \\
-1 & \text{if } x = 0,
\end{cases}
\]
so ∂f /∂y is not continuous at the point (0, 0). ∎
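The failure of differentiability in Example 3 is easy to observe numerically. The sketch below uses the same f (x, y) = (3x²y − y³)/(x² + y²) and compares the directional derivative along v = (1, 1) at the origin with the value that the row vector of partial derivatives would predict if f were differentiable:

```python
import numpy as np

def f(p):
    x, y = p
    if x == 0.0 and y == 0.0:
        return 0.0
    return (3 * x**2 * y - y**3) / (x**2 + y**2)

a = np.array([0.0, 0.0])
partials = np.array([0.0, -1.0])   # (df/dx)(0,0) = 0, (df/dy)(0,0) = -1
v = np.array([1.0, 1.0])

# Directional derivative along v at the origin: lim_{h->0} [f(hv) - f(0)] / h.
for h in [1e-1, 1e-3, 1e-5]:
    print(h, (f(a + h * v) - f(a)) / h)   # tends to 1, since f(x, x) = x

# If f were differentiable, the limit would have to equal partials . v = -1.
print("predicted by partials:", partials @ v)
```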

The following theorem gives conditions on the partial derivatives that guarantee
the differentiability of a function. We shall not prove this theorem here:

Theorem 3 Continuous Partials Imply Differentiability

Let U ⊂ Rm be open, and let F : U → Rn . Suppose that the partial derivatives


\[
\frac{\partial F}{\partial x_1}, \; \ldots, \; \frac{\partial F}{\partial x_m}
\]
exist at each point of U , and are continuous as functions U → Rn . Then F is
differentiable on U .

In general, a function F : U → Rn is called continuously differentiable if each


of its partial derivatives ∂F/∂xi exists and is continuous as a function U → Rn .
Equivalently, F is continuously differentiable if it is differentiable and the derivative
DF is continuous as a function U → Rn×m , where Rn×m is the space of all n × m
matrices.

3. The Chain Rule


We shall now state and prove the Chain Rule for multivariable functions. Our proof
will require the following definition of the norm of a matrix:

Definition: Norm of a Matrix


The norm of an n × m matrix A is defined by

\[
\|A\| = \max_{\|u\| = 1} \|Au\|,
\]
where the max is taken over all unit vectors u ∈ Rm .

Note that the maximum in the above definition always exists since the unit sphere in Rm is compact. Though we shall not prove it here, the norm ‖A‖ is equal to the square root of the largest eigenvalue of the symmetric matrix A^T A, i.e. the largest singular value of A.
The most important property of the matrix norm is the following inequality:

Proposition 4 Matrix Norm Inequality

Let A be an n × m matrix, and let v ∈ Rm . Then

\[
\|Av\| \le \|A\| \, \|v\|.
\]

PROOF Let u be a unit vector such that v = ‖v‖u. Then
\[
\|Av\| = \bigl\| A(\|v\|u) \bigr\| = \bigl\| \|v\|(Au) \bigr\| = \|v\| \, \|Au\| \le \|v\| \, \|A\|. \qquad \blacksquare
\]
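As a hedged numerical sanity check of the definition of the matrix norm (the matrix A below is an arbitrary choice), one can sample many unit vectors and compare the sampled maximum of ‖Au‖ with the largest singular value of A:

```python
import numpy as np

# An arbitrary 3x2 matrix.
A = np.array([[1.0,  2.0],
              [0.0,  1.0],
              [3.0, -1.0]])

# Estimate max_{||u||=1} ||A u|| by sampling many unit vectors in R^2.
rng = np.random.default_rng(0)
U = rng.standard_normal((2, 100000))
U /= np.linalg.norm(U, axis=0)
sampled_max = np.linalg.norm(A @ U, axis=0).max()

# The exact value: the largest singular value of A,
# i.e. the square root of the largest eigenvalue of A^T A.
sigma_max = np.linalg.norm(A, 2)

print(sampled_max, sigma_max)   # sampled_max approaches sigma_max from below
```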

We will also need the following inequality, which is similar in spirit to the Mean
Value Theorem from single variable calculus. Its proof is left as an exercise to the
reader:

Lemma 5 Mean Value Inequality

Let U ⊂ Rm be open, let F : U → Rn , and suppose that F is differentiable at a


point a. Then for any M > ‖DFa ‖, there exists a δ > 0 so that
\[
\|x - a\| < \delta \;\Rightarrow\; \|F(x) - F(a)\| \le M \|x - a\|.
\]

As a trivial consequence of this inequality, we see that a function which is differ-


entiable at a must be continuous at a.

Theorem 6 Chain Rule


Let U ⊂ Rm and V ⊂ Rn be open, let F : U → V and G : V → Rp be continuous,
and suppose that F is differentiable at a and G is differentiable at b = F (a).
Then G ◦ F is differentiable at a, and

D(G ◦ F )a = DGb DFa .

PROOF Let ε > 0. Let LF : Rm → Rn and LG : Rn → Rp be the functions
\[
L_F(x) = F(a) + DF_a(x - a) \qquad \text{and} \qquad L_G(y) = G(b) + DG_b(y - b).
\]
We must show that
\[
\|(G \circ F)(x) - (L_G \circ L_F)(x)\| \le \epsilon \|x - a\|
\]
when x is sufficiently close to a. By the triangle inequality,
\[
\|(G \circ F)(x) - (L_G \circ L_F)(x)\|
\le \|(G \circ F)(x) - (L_G \circ F)(x)\| + \|(L_G \circ F)(x) - (L_G \circ L_F)(x)\|.
\]
We shall handle these two terms separately:

1. Fix a positive constant M > ‖DFa ‖. Since G is differentiable at b, there exists an r > 0 so that
\[
\|y - b\| < r \;\Rightarrow\; \|G(y) - L_G(y)\| \le \frac{\epsilon}{2M} \|y - b\|.
\]
Since F is continuous at a, we can find a δ1 > 0 so that
\[
\|x - a\| < \delta_1 \;\Rightarrow\; \|F(x) - F(a)\| < r.
\]



Since F is differentiable at a, the Mean Value Inequality gives us a δ2 > 0 so that
\[
\|x - a\| < \delta_2 \;\Rightarrow\; \|F(x) - F(a)\| \le M \|x - a\|.
\]
Combining these, we find that
\[
\|(G \circ F)(x) - (L_G \circ F)(x)\| \le \frac{\epsilon}{2M} \|F(x) - F(a)\| \le \frac{\epsilon}{2} \|x - a\|
\]
whenever ‖x − a‖ < min(δ1 , δ2 ).

2. For the second term, fix a positive constant N > ‖DGb ‖. Since F is differentiable at a, there exists a δ3 > 0 so that
\[
\|x - a\| < \delta_3 \;\Rightarrow\; \|F(x) - L_F(x)\| \le \frac{\epsilon}{2N} \|x - a\|.
\]
If ‖x − a‖ < δ3 , it follows that
\[
\|(L_G \circ F)(x) - (L_G \circ L_F)(x)\| = \bigl\| DG_b\bigl( F(x) - L_F(x) \bigr) \bigr\|
\le N \|F(x) - L_F(x)\|
\le \frac{\epsilon}{2} \|x - a\|.
\]

Combining our two bounds together, we find that
\[
\|(G \circ F)(x) - (L_G \circ L_F)(x)\| \le \frac{\epsilon}{2} \|x - a\| + \frac{\epsilon}{2} \|x - a\| = \epsilon \|x - a\|
\]
whenever ‖x − a‖ < min(δ1 , δ2 , δ3 ). ∎
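The Chain Rule is easy to test numerically. The sketch below (the maps F and G are illustrative choices, and the Jacobians are approximated by central differences rather than computed exactly) compares D(G ◦ F )a with the product DGb DFa :

```python
import numpy as np

# Illustrative maps: F(x, y) = (x*y, x + y^2),  G(u, v) = sin(u) + u*v.
def F(p):
    x, y = p
    return np.array([x * y, x + y**2])

def G(q):
    u, v = q
    return np.array([np.sin(u) + u * v])

def jacobian_fd(H, p, eps=1e-6):
    """Finite-difference Jacobian of H at p (columns are partial derivatives)."""
    cols = []
    for i in range(len(p)):
        e = np.zeros(len(p)); e[i] = 1.0
        cols.append((H(p + eps * e) - H(p - eps * e)) / (2 * eps))
    return np.column_stack(cols)

a = np.array([0.7, -0.4])
b = F(a)

lhs = jacobian_fd(lambda p: G(F(p)), a)       # D(G o F)_a
rhs = jacobian_fd(G, b) @ jacobian_fd(F, a)   # DG_b DF_a

print(np.round(lhs, 6))
print(np.round(rhs, 6))   # the two matrices agree up to finite-difference error
```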

4. Diffeomorphisms
One of the most important types of differentiable maps is the diffeomorphism:

Definition: Diffeomorphism
Let U and V be open subsets of Rn . A diffeomorphism from U to V is a map
F : U → V with the following properties:

1. F is bijective.

2. F is differentiable, and

3. F −1 is differentiable.

Figure 2: A diffeomorphism F : U → V between two open subsets of R2 .

That is, a diffeomorphism is an invertible differentiable map whose inverse is also


differentiable. Diffeomorphisms play roughly the same role in multivariable calculus
that isomorphisms do in group theory, or homeomorphisms do in topology. (To be
precise, diffeomorphisms are the invertible morphisms in the category of open sets
and differentiable maps.)
Geometrically, a diffeomorphism corresponds to a smooth “distortion” of an open
subset of Rn , as shown in Figure 2. The picture for a diffeomorphism in R3 would be
similar: each coordinate plane in the domain would map to a curved surface in the
range.

Theorem 7 Derivative of the Inverse


Let ϕ : U → V be a diffeomorphism, let a ∈ U , and let b = ϕ(a). Then Dϕa is
nonsingular, and
(Dϕa )−1 = D(ϕ−1 )b .

PROOF By the Chain Rule,
\[
D(\varphi^{-1})_b \, D\varphi_a = D(\varphi^{-1} \circ \varphi)_a = D(\mathrm{id})_a = I,
\]
where I is the n × n identity matrix. The theorem follows. ∎

EXAMPLE 4 Diffeomorphisms on R
Let f : R → R be a diffeomorphism. According to the above theorem, the derivative f ′ for such a function must satisfy f ′(a) ≠ 0 for each a ∈ R. Moreover, the derivatives of f and f −1 are related by the formula
\[
(f^{-1})'(b) = \frac{1}{f'(a)}
\]

for each a ∈ R, where b = f (a). Indeed, it follows from the Inverse Function Theorem (see below) that any differentiable bijection f : R → R with nonzero derivative is a diffeomorphism.
If f has a critical point, then it cannot be a diffeomorphism. For example, the map f : R → R defined by f (x) = x³ is a differentiable bijection, but is not a diffeomorphism, since the inverse function f −1 (x) = ∛x is not differentiable at 0. ∎
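For a concrete numerical check of the formula (f −1 )′(b) = 1/f ′(a), consider the illustrative diffeomorphism f (x) = x³ + x (chosen, unlike x³, to have nowhere-vanishing derivative); the sketch below inverts f by bisection and differentiates the inverse by a finite difference:

```python
import numpy as np

# Illustrative diffeomorphism of R: f(x) = x^3 + x, with f'(x) = 3x^2 + 1 > 0.
f = lambda x: x**3 + x
fprime = lambda x: 3 * x**2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, iters=80):
    """Invert f by bisection (f is strictly increasing on [lo, hi])."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a = 1.3
b = f(a)
h = 1e-5
inv_deriv_fd = (f_inverse(b + h) - f_inverse(b - h)) / (2 * h)

print(inv_deriv_fd)       # numerical (f^{-1})'(b)
print(1.0 / fprime(a))    # prediction from Theorem 7 / Example 4: 1 / f'(a)
```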

We can extend the notion of critical points to functions on Rn :

Definition: Critical Points for a Function Rn → Rn


Let U and V be open subsets of Rn , and let F : U → V . A point a ∈ U is called a
critical point for F if either

1. F is not differentiable at a, or

2. F is differentiable at a, but DFa is a singular matrix.

Note that, if F is differentiable at a, then a is a critical point for F if and only if


the determinant of the derivative matrix DFa is zero. This gives a practical test for
finding the critical points of a function.
According to Theorem 7, a diffeomorphism cannot have any critical points. The
converse is also true, but it will require the Inverse Function Theorem to prove:

Theorem 8 Characterization of Diffeomorphisms

Let U and V be open subsets of Rn , and let F : U → V be a bijection. Then F


is a diffeomorphism if and only if F has no critical points.

Curvilinear Coordinates
Though diffeomorphisms might seem like an entirely new concept, they are closely
related to the familiar notion of curvilinear coordinates.

Figure 3: The polar coordinates diffeomorphism ϕ : U → V .

Definition: Curvilinear Coordinates


Let V be an open subset of Rn . A set {u1 , . . . , un } of real-valued functions on
V are called curvilinear coordinates if there exists an open set U ⊂ Rn and a
diffeomorphism ϕ : U → V such that

\[
\varphi\bigl( u_1(x), \ldots, u_n(x) \bigr) = x
\]

for all x ∈ V .

Note that the coordinate functions u1 , . . . , un are essentially the components of the inverse diffeomorphism ϕ−1 : V → U . That is,
\[
\varphi^{-1}(x) = \bigl( u_1(x), \ldots, u_n(x) \bigr)
\]




for all x ∈ V . Thus, we could alternatively define curvilinear coordinates as the


components of a diffeomorphism with domain V . For applications, however, the map
ϕ is usually more important, so we will stick with the definition given above.

EXAMPLE 5 Polar Coordinates


Let V be the complement of the negative real axis (−∞, 0] × {0} in R2 . Let U =
(0, ∞) × (−π, π), and define a function ϕ : U → V by

ϕ(r, θ) = (r cos θ, r sin θ).

This function is shown in Figure 3. Clearly ϕ is a bijection, and
\[
\det D\varphi_{(r,\theta)} =
\begin{vmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{vmatrix}
= r \ne 0
\]
for all (r, θ) ∈ U , so ϕ is a diffeomorphism. The resulting curvilinear coordinates r, θ : V → R are the familiar polar coordinates. ∎

EXAMPLE 6 Spherical Coordinates


Let V be the complement of the half-plane (−∞, 0] × {0} × R in R3 . Let U be the
region (0, ∞) × (−π, π) × (−π/2, π/2), and define a function ϕ : U → V by

ϕ(ρ, θ, φ) = (ρ cos θ cos φ, ρ sin θ cos φ, ρ sin φ).

Then ϕ is a bijection, and
\[
\det D\varphi_{(\rho,\theta,\phi)} =
\begin{vmatrix}
\cos\theta \cos\phi & -\rho \sin\theta \cos\phi & -\rho \cos\theta \sin\phi \\
\sin\theta \cos\phi & \rho \cos\theta \cos\phi & -\rho \sin\theta \sin\phi \\
\sin\phi & 0 & \rho \cos\phi
\end{vmatrix}
= \rho^2 \cos\phi \ne 0
\]
so ϕ is a diffeomorphism. The resulting curvilinear coordinates ρ, θ, φ : V → R are called spherical coordinates. ∎

In general, if ϕ : U → V is a diffeomorphism with (x1 , . . . , xn ) = ϕ(u1 , . . . , un ), then the derivative of ϕ is a matrix of partial derivatives:
\[
D\varphi = \begin{pmatrix}
\dfrac{\partial x_1}{\partial u_1} & \cdots & \dfrac{\partial x_1}{\partial u_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial x_n}{\partial u_1} & \cdots & \dfrac{\partial x_n}{\partial u_n}
\end{pmatrix}
\]
Similarly, since (u1 , . . . , un ) = ϕ−1 (x1 , . . . , xn ), the derivative of ϕ−1 is also a matrix of partial derivatives:
\[
D(\varphi^{-1}) = \begin{pmatrix}
\dfrac{\partial u_1}{\partial x_1} & \cdots & \dfrac{\partial u_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial u_n}{\partial x_1} & \cdots & \dfrac{\partial u_n}{\partial x_n}
\end{pmatrix}
\]
By Theorem 7, these two matrices should be inverses at each pair of corresponding points in U and V , as the sketch below illustrates for polar coordinates.
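For polar coordinates (Example 5), this inverse relationship can be checked directly. In the sketch below (the sample point (r, θ) = (2, 0.8) is an arbitrary choice), the partial derivatives of r = √(x² + y²) and θ = atan2(y, x) are written down by hand and multiplied against Dϕ:

```python
import numpy as np

# Polar coordinates (Example 5): phi(r, theta) = (r cos(theta), r sin(theta)).
r, theta = 2.0, 0.8
x, y = r * np.cos(theta), r * np.sin(theta)

# D(phi) at (r, theta): partials of (x, y) with respect to (r, theta).
Dphi = np.array([[np.cos(theta), -r * np.sin(theta)],
                 [np.sin(theta),  r * np.cos(theta)]])

# D(phi^{-1}) at (x, y): partials of (r, theta) = (sqrt(x^2 + y^2), atan2(y, x))
# with respect to (x, y).
Dphi_inv = np.array([[ x / r,     y / r   ],
                     [-y / r**2,  x / r**2]])

# By Theorem 7 these should be inverse matrices.
print(np.round(Dphi @ Dphi_inv, 10))    # ~ identity matrix
print(np.round(Dphi_inv, 10))
print(np.round(np.linalg.inv(Dphi), 10))
```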

Change of Variables
In this section we discuss the change of variables formula for integration. We begin
by reviewing the geometric meaning of determinants.

Figure 4: The parallelepiped P (a, v1 , v2 , v3 ).

Definition: Parallelepiped
A parallelepiped in Rn is a set of the form

P (a, v1 , . . . , vn ) = {a + t1 v1 + · · · + tn vn | t1 , . . . , tn ∈ [0, 1]}

where a, v1 , . . . , vn ∈ Rn (see Figure 4).

Parallelepipeds are the n-dimensional analogues of parallelograms. In the case


where the vectors v1 , . . . , vn are orthogonal, the parallelepiped P (a, v1 , . . . , vn ) is
simply a rectangular box.
The following geometric formula is basic to any development of the theory of
determinants:

Theorem 9 Volume of a Parallelepiped

Let a, v1 , . . . , vn ∈ Rn , and let A be the matrix whose columns are the vectors
v1 , . . . , vn . Then the volume of the parallelepiped P (a, v1 , . . . , vn ) is |det A|.

Here “volume” refers to area in the case where n = 2, or length in the case where
n = 1. As a result of the above theorem, we obtain the following interpretation for
the determinant of a linear transformation:

Theorem 10 Geometric Meaning of the Determinant

Let T : Rn → Rn be a linear transformation, and let P be a parallelepiped in Rn .


Then T (P ) is also a parallelepiped, and the volume of T (P ) is |det T | times the
volume of P .

PROOF Suppose that P = P (a, v1 , . . . , vn ). Then clearly
\[
T(P) = P\bigl( T(a), T(v_1), \ldots, T(v_n) \bigr).
\]
Let A be the matrix whose columns are v1 , . . . , vn , and let B be the matrix whose columns are T (v1 ), . . . , T (vn ). Then B is equal to T A, the product of the matrix for T with A, so
\[
|\det B| = |(\det T)(\det A)| = |\det T|\,|\det A|. \qquad \blacksquare
\]
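A short numerical sketch (the vectors and the transformation T below are arbitrary choices) illustrates Theorems 9 and 10: the volume of a parallelepiped is |det A|, and applying T scales that volume by |det T |:

```python
import numpy as np

# Arbitrary vectors spanning a parallelepiped in R^3.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 2.0, 0.0])
v3 = np.array([0.0, 1.0, 3.0])
A = np.column_stack([v1, v2, v3])

vol_P = abs(np.linalg.det(A))       # volume of P(a, v1, v2, v3); here 1*2*3 = 6

# A linear transformation T, applied to the spanning vectors.
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
B = T @ A                            # columns are T(v1), T(v2), T(v3)

vol_TP = abs(np.linalg.det(B))
print(vol_P, vol_TP, abs(np.linalg.det(T)) * vol_P)   # vol(T(P)) = |det T| vol(P)
```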

In fact, a much more general statement is true:

Theorem 11 Linear Transformations and Measure


Let T : Rn → Rn be a linear transformation. Then

\[
\mu\bigl( T(S) \bigr) = |\det T| \, \mu(S)
\]
for any measurable set S ⊂ Rn , where µ denotes Lebesgue measure on Rn . Furthermore, if det T ≠ 0, then
\[
\int_{\mathbb{R}^n} f \, d\mu = \int_{\mathbb{R}^n} (f \circ T)\, |\det T| \, d\mu
\]
for any L1 function f : Rn → R.

Sketch of Proof The first statement follows from the fact that
\[
\mu(S) = \inf \Bigl\{ \, \sum_{k} \mu(P_k) \;\Bigm|\; P_1, P_2, \ldots \text{ are parallelepipeds in } \mathbb{R}^n \text{ and } S \subset \bigcup_{k} P_k \Bigr\}
\]
for any measurable set S. Then the second statement is clearly true for simple functions, and can be extended to any L1 function using the Dominated Convergence Theorem. ∎

We now consider the determinant of a differentiable map at each point:

Definition: Jacobian
Let U be an open subset of Rn , and let F : U → Rn be a differentiable map. The Jacobian of F is the function JF : U → R defined by
\[
J_F(a) = \det DF_a.
\]



Roughly speaking, the Jacobian measures the amount by which F multiplies volumes at a point a. That is, if S is a measurable set in a small neighborhood of a, then
\[
\mu\bigl( F(S) \bigr) \approx |J_F(a)| \, \mu(S).
\]
Incidentally, one can prove that the Jacobian is always a measurable function.
We are now ready to state the change of variables formula, though we are not in
a position to prove it:

Theorem 12 Change of Variables

Let ϕ : U → V be a diffeomorphism, and let f : V → R be an L1 function. Then


\[
\int_V f \, d\mu = \int_U (f \circ \varphi)\, |J_\varphi| \, d\mu,
\]

where µ denotes Lebesgue measure on Rn .

EXAMPLE 7 Integration in Polar Coordinates


Let ϕ : U → V be the diffeomorphism
\[
\varphi(r, \theta) = (r\cos\theta, \, r\sin\theta)
\]
defined in Example 5, where U = (0, ∞) × (−π, π) and V is the complement of the negative real axis. Then the Jacobian of ϕ is given by the formula
\[
J_\varphi(r, \theta) = \det D\varphi_{(r,\theta)} =
\begin{vmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{vmatrix}
= r.
\]
Therefore, by the change of variables formula,
\[
\int_V f(x, y)\, d\mu(x, y) = \int_U (f \circ \varphi)(r, \theta)\, r \, d\mu(r, \theta)
\]
for any L1 function f : V → R. Since R2 − V has measure zero, it follows that
\[
\int_{\mathbb{R}^2} f(x, y)\, d\mu(x, y) = \int_U f(r\cos\theta, \, r\sin\theta)\, r \, d\mu(r, \theta). \qquad \blacksquare
\]
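As a hedged numerical check of this formula (the integrand f (x, y) = e^{−(x²+y²)} is an illustrative choice whose integral over R² is π, and both integrals are truncated to a bounded region and approximated by the midpoint rule), the Cartesian and polar computations should agree:

```python
import numpy as np

# f(x, y) = exp(-(x^2 + y^2)), whose integral over R^2 is pi.
f = lambda x, y: np.exp(-(x**2 + y**2))

# Cartesian: midpoint rule over the square [-6, 6]^2 (the tail beyond is negligible).
n = 1500
xs = np.linspace(-6, 6, n, endpoint=False) + 6.0 / n
X, Y = np.meshgrid(xs, xs)
cartesian = f(X, Y).sum() * (12.0 / n)**2

# Polar: midpoint rule over U = (0, 6) x (-pi, pi), with the Jacobian factor r.
rs = np.linspace(0, 6, n, endpoint=False) + 3.0 / n
ts = np.linspace(-np.pi, np.pi, n, endpoint=False) + np.pi / n
R, T = np.meshgrid(rs, ts)
polar = (f(R * np.cos(T), R * np.sin(T)) * R).sum() * (6.0 / n) * (2 * np.pi / n)

print(cartesian, polar, np.pi)   # all three agree to several decimal places
```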
