
MULTIVARIABLE CALCULUS

T.K.SUBRAHMONIAN MOOTHATHU

Contents

1. A few remarks about Rn
2. Multivariable differentiation: definitions
3. Multivariable differentiation: properties
4. Higher order partial derivatives
5. Inverse function theorem and Implicit function theorem
6. Tangent spaces and Lagrange’s multiplier method
7. Multivariable Riemann integration over a box
8. Iterated integrals and Fubini’s theorem
9. Multivariable Riemann integration over a Jordan measurable set
10. Change of variable
11. Polar, cylindrical, and spherical coordinates
12. Line integrals
13. Circulation density and Green’s theorem
14. Surface integrals
15. Divergence and curl
16. Stokes’ theorem
17. Gauss’ divergence theorem

The basic idea in Calculus is to approximate smooth objects locally by linear objects.
Suggested textbooks for additional reading:
1. T.M. Apostol, Calculus, Vol. II, 1969.
2. T.M. Apostol, Mathematical Analysis, 1974.
3. J.J. Callahan, Advanced Calculus, 2010.
4. S.R. Ghorpade and B.V. Limaye, A Course in Multivariable Calculus and Analysis, 2010.
5. J.H. Hubbard and B.B. Hubbard, Vector Calculus, Linear Algebra, and Differential Forms, 1999.
6. S. Lang, Calculus of Several Variables, 1987.
7. P.D. Lax and M.S. Terrell, Multivariable Calculus with Applications, 2017.

8. J.R. Munkres, Analysis on Manifolds, 1991.


9. C.C. Pugh, Real Mathematical Analysis, 2015.
10. M. Spivak, Calculus on Manifolds, 1965.

1. A few remarks about Rn

A general remark about notations: We do not wish to complicate notations unnecessarily; hence
certain notations have to be understood based on the context. For example, the notation ‘x ∈ Rn ’
means x = (x1 , . . . , xn ), where each xj ∈ R; on the other hand, the notation ‘v1 , . . . , vk ∈ Rn ’
means each vi is an n-tuple vi = (vi1 , . . . , vin ) with vij ∈ R.

Recall the following from Linear Algebra:

Definition: Let K = R or C, and X be a vector space over K. A map ∥ · ∥ : X → [0, ∞) is called a


norm on X if the following conditions are satisfied for every x, y ∈ X:
(i) ∥x∥ = 0 iff x = 0;
(ii) ∥αx∥ = |α|∥x∥ for every α ∈ K;
(iii) [triangle inequality] ∥x + y∥ ≤ ∥x∥ + ∥y∥.
If this holds, then (X, ∥ · ∥) is called a normed space. Note that any norm ∥ · ∥ on X induces
a metric on X by the condition d(x, y) := ∥x − y∥ for x, y ∈ X. Thus every normed space is in
particular a metric space.
Example: Let K = R or C. If 1 ≤ p < ∞, then the p-norm ∥ · ∥p on Kn given by ∥x∥p = (∑nj=1 |xj|ᵖ)1/p is a norm on Kn, where the triangle inequality is nothing but Minkowski’s inequality. When p = 2, we get the Euclidean norm ∥ · ∥2 on Kn defined as ∥x∥2 = (∑nj=1 |xj|²)1/2. The metric induced by the Euclidean norm is the Euclidean metric dE on Kn, where dE(x, y) = (∑nj=1 |xj − yj|²)1/2. Two other commonly used norms on Kn are ∥ · ∥1 (which is the p-norm for p = 1) and ∥ · ∥∞, defined respectively as ∥x∥1 = ∑nj=1 |xj| and ∥x∥∞ = max{|xj| : 1 ≤ j ≤ n} for x = (x1, . . . , xn) ∈ Kn.
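As a concrete illustration, here is a small Python sketch (our own code, not part of the notes; the function names p_norm and sup_norm are ours) evaluating these norms for x = (3, −4) ∈ R2:

```python
# Numerical illustration of the norms just defined (illustrative code only).
def p_norm(x, p):
    """The p-norm (sum_j |x_j|^p)^(1/p) for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def sup_norm(x):
    """The max-norm ||x||_inf = max_j |x_j|."""
    return max(abs(t) for t in x)

x = (3.0, -4.0)
print(p_norm(x, 1))   # ||x||_1 = 7.0
print(p_norm(x, 2))   # Euclidean norm ||x||_2 = 5.0
print(sup_norm(x))    # ||x||_inf = 4.0

# Triangle inequality (Minkowski) spot-check:
y = (1.0, 2.0)
xy = tuple(a + b for a, b in zip(x, y))
assert p_norm(xy, 2) <= p_norm(x, 2) + p_norm(y, 2)
```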

Remark: (i) If ∥ · ∥ is a norm on a vector space X, then |∥x∥ − ∥y∥| ≤ ∥x − y∥ for every x, y ∈ X
(to see this, note by triangle inequality that ∥x∥ ≤ ∥y∥ + ∥x − y∥ and ∥y∥ ≤ ∥x∥ + ∥y − x∥); and
consequently, ∥ · ∥ : X → R is Lipschitz continuous. (ii) Our primary interest is in the normed
space (Rn , ∥ · ∥2 ). As a metric space, Cn can be identified with R2n in a natural manner.

Exercise-1: [Recall from Real Analysis] With respect to dE , the following are true:
(i) Rn is complete and (path) connected.
(ii) Qn is a countable dense subset of Rn .
(iii) Every bounded subset of Rn is totally bounded.
(iv) Let A ⊂ Rn. Then A is compact ⇔ A is closed and bounded in Rn ⇔ A is sequentially compact ⇔ every infinite subset of A has a limit point in A.
(v) A sequence in Rn converges iff it converges coordinatewise.
(vi) If X is a metric space, then a function f = (f1 , . . . , fn ) : X → Rn is continuous iff each fj is
continuous.
(vii) Every linear map f : Rn → Rm is continuous for n, m ∈ N. Consequently, every invertible
linear map f : Rn → Rn is a homeomorphism.

Notation: When K = R or C, let e1, . . . , en be the standard basis vectors in Kn, where ej has 1 at the jth coordinate and zeroes elsewhere. Note that if x = (x1, . . . , xn) ∈ Kn, then x = ∑nj=1 xj ej.

Definition: Two norms ∥ · ∥ and ∥ · ∥0 on a vector space X are said to be equivalent if there are
0 < a < b such that a∥x∥ ≤ ∥x∥0 ≤ b∥x∥ for every x ∈ X. Note that this is equivalent to saying
that the identity map I : (X, ∥ · ∥) → (X, ∥ · ∥0 ) is a homeomorphism.

[101] Any two norms on Rn are equivalent (similarly, any two norms on Cn are equivalent).

Proof. The equivalence of norms is an equivalence relation on the collection of all norms on Rn. Therefore, it suffices to show that an arbitrary norm ∥ · ∥ on Rn is equivalent to the Euclidean norm ∥ · ∥2 on Rn. Let b = ∑nj=1 ∥ej∥. For x = ∑nj=1 xj ej ∈ Rn, we have |xj| ≤ ∥x∥2 for every j and hence ∥x∥ ≤ ∑nj=1 |xj|∥ej∥ ≤ ∥x∥2 ∑nj=1 ∥ej∥ = b∥x∥2. From this, we also note that |∥x∥ − ∥y∥| ≤ ∥x − y∥ ≤ b∥x − y∥2, and thus ∥ · ∥ : Rn → R is Lipschitz continuous with respect to the Euclidean norm ∥ · ∥2. Next, to find a > 0 such that a∥x∥2 ≤ ∥x∥ for every x ∈ Rn, we argue as follows. Consider the unit sphere S = {y ∈ Rn : ∥y∥2 = 1}, which is compact, and define f : S → R as f(y) = ∥y∥. Since f is (Lipschitz) continuous and positive on the compact set S, there is a > 0 such that f(y) ≥ a for every y ∈ S. Now for any x ∈ Rn \ {0}, we have y := x/∥x∥2 ∈ S, and therefore a ≤ f(y) = ∥y∥ = ∥x∥/∥x∥2, which means a∥x∥2 ≤ ∥x∥ as required. □

Remark: Recall the norms ∥ · ∥1 and ∥ · ∥∞ on Rn mentioned earlier. They induce the metrics d1 and d∞ on Rn, where d1(x, y) = ∑nj=1 |xj − yj| and d∞(x, y) = max{|xj − yj| : 1 ≤ j ≤ n} for x, y ∈ Rn. A consequence of [101] is the following:

Exercise-2: (i) For a function f : Rn → R, we have that f : (Rn, dE) → (R, dE) is continuous ⇔ f : (Rn, d1) → (R, dE) is continuous ⇔ f : (Rn, d∞) → (R, dE) is continuous.
(ii) If ∥ · ∥ is a norm on Rn, then ∥ · ∥ : (Rn, ∥ · ∥2) → R is Lipschitz continuous.
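The norm equivalence behind these statements can be spot-checked numerically. For the pair ∥ · ∥1, ∥ · ∥2 one may take a = 1 and b = √n, since ∥x∥2 ≤ ∥x∥1 ≤ √n ∥x∥2 (the second inequality by the Cauchy-Schwarz inequality). A minimal Python sketch (our own code) testing this on random vectors:

```python
# Spot-check of norm equivalence (cf. [101]) for ||.||_1 and ||.||_2 on R^n:
#   ||x||_2 <= ||x||_1 <= sqrt(n) * ||x||_2.   (Illustrative code only.)
import math, random

def norm1(x):
    return sum(abs(t) for t in x)

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert norm2(x) <= norm1(x) + 1e-9
    assert norm1(x) <= math.sqrt(n) * norm2(x) + 1e-9
print("norm equivalence verified on random samples")
```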

Definition: Let K = R or C, and X be a vector space over K. A function ⟨·, ·⟩ : X × X → K is said


to be an inner product on X if the following conditions are satisfied:
(i) ⟨x, x⟩ ≥ 0 for every x ∈ X; and ⟨x, x⟩ = 0 iff x = 0.
(ii) [Conjugate symmetry] ⟨y, x⟩ is the complex conjugate of ⟨x, y⟩ for every x, y ∈ X.

(iii) [Linearity in the first variable] For each y ∈ X, the map ⟨·, y⟩ : X → K is K-linear, i.e.,
⟨c1 x1 + c2 x2 , y⟩ = c1 ⟨x1 , y⟩ + c2 ⟨x2 , y⟩ for every x1 , x2 ∈ X and c1 , c2 ∈ K.
Any inner product ⟨·, ·⟩ on X induces a norm ∥ · ∥ on X by the rule ∥x∥ := ⟨x, x⟩1/2 . Then we
have the Cauchy-Schwarz inequality |⟨x, y⟩| ≤ ∥x∥∥y∥ for every x, y ∈ X.

Remark: If K = R, then condition (ii) becomes ⟨y, x⟩ = ⟨x, y⟩ for every x, y ∈ X, and then it
follows by (iii) that ⟨·, ·⟩ is linear in each variable separately (in other words, any inner product on
a real vector space is in particular a bilinear map).

Example: The standard inner product on Rn is defined as ⟨x, y⟩ = ∑nj=1 xj yj for x, y ∈ Rn, and the standard inner product on Cn is defined as ⟨x, y⟩ = ∑nj=1 xj ȳj for x, y ∈ Cn. The norm induced by this inner product is nothing but the Euclidean norm ∥ · ∥2 on Rn (respectively, Cn).

Remark: Let ⟨·, ·⟩ be the standard inner product on Rn, let ⟨·, ·⟩0 be an arbitrary inner product on Rn, and let A be the n × n matrix whose ijth entry is ⟨ei, ej⟩0. Then we see by the bilinearity of the inner product that ⟨x, y⟩0 = ∑ni=1 ∑nj=1 xi yj ⟨ei, ej⟩0 = ⟨Ax, y⟩ for every x, y ∈ Rn. Note that A is a symmetric real matrix which is positive-definite (i.e., ⟨Av, v⟩ > 0 for every v ∈ Rn \ {0}). Conversely, it can be verified that if A is an n × n positive-definite real symmetric matrix, then (x, y) ↦ ⟨Ax, y⟩ is an inner product on Rn.
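The converse direction is easy to test numerically. The Python sketch below (our own code; the matrix A is just a sample) forms ⟨x, y⟩0 = ⟨Ax, y⟩ for a positive-definite symmetric A and checks symmetry and positivity at a few points:

```python
# An inner product induced by a positive-definite symmetric matrix A,
# as in the remark above.  (Illustrative code only.)
def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def inner0(A, x, y):
    # <x, y>_0 := <Ax, y>
    return dot(mat_vec(A, x), y)

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, positive-definite
x, y = [1.0, 2.0], [4.0, -1.0]
assert inner0(A, x, y) == inner0(A, y, x)   # symmetry
assert inner0(A, x, x) > 0                  # positivity at x != 0
print(inner0(A, x, y))
```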

Exercise-3: Let ⟨·, ·⟩ be the standard inner product on Rn .


(i) Let u, v ∈ Rn \ {0}. Then ⟨u, v⟩ = ∥u∥∥v∥ cos θ, where θ ∈ [0, π] is the angle between u and v.
In particular, ⟨u, v⟩ = 0 iff u and v are orthogonal.
(ii) ⟨·, ·⟩ : Rn × Rn → R is continuous, i.e., if (xk ) → x and (yk ) → y in Rn , then (⟨xk , yk ⟩) → ⟨x, y⟩.
(iii) If f : Rn → R is linear and y = (f (e1 ), . . . , f (en )), then f (x) = ⟨x, y⟩ for every x ∈ Rn .
[Hint: (i) Assume ∥u∥ ≤ ∥v∥. Also suppose θ ∈ [0, π/2] (else, replace v with −v). Let w be the
point where the perpendicular from u meets the line passing through the origin and v. The right-
angled triangle with vertices at u, w, and v has side-lengths ∥v − u∥, ∥u∥ sin θ, and ∥v∥ − ∥u∥ cos θ.
Hence ∥v − u∥2 = ∥u∥2 sin2 θ + (∥v∥ − ∥u∥ cos θ)2 = ∥u∥2 + ∥v∥2 − 2∥u∥∥v∥ cos θ. On the other
hand, ∥v − u∥2 = ⟨v − u, v − u⟩ = ∥u∥2 + ∥v∥2 − 2⟨u, v⟩ since ⟨u, v⟩ = ⟨v, u⟩. (ii) |⟨x, y⟩ − ⟨xk , yk ⟩| ≤
|⟨x, y⟩ − ⟨x, yk ⟩| + |⟨x, yk ⟩ − ⟨xk , yk ⟩| = |⟨x, y − yk ⟩| + |⟨x − xk , yk ⟩| ≤ ∥x∥∥y − yk ∥ + ∥x − xk ∥∥yk ∥
by Cauchy-Schwarz inequality, and observe that (yk ) is bounded since it is convergent.]

Exercise-4: [Verify the claims] Let a ∈ Rn . (i) For 1 ≤ k ≤ n, any k-dimensional plane in Rn
containing a has the form a + span{v1 , . . . , vk }, where v1 , . . . , vk ∈ Rn are linearly independent
(here, an (n − 1)-dimensional plane in Rn is called a hyperplane). In particular, any 2-dimensional
plane containing a has the form {a + su + tv : s, t ∈ R}, where u, v ∈ Rn are linearly independent.

(ii) If y ∈ Rn \ {0}, then the hyperplane H passing through a and orthogonal to y is given by
H = {x ∈ Rn : ⟨x − a, y⟩ = 0}. Letting r = ⟨a, y⟩, we may also write H = {x ∈ Rn : ⟨x, y⟩ = r}.
For any b ∈ Rn, the line passing through b and orthogonal to H is {b + ty : t ∈ R}. If c ∈ H is the point where this line intersects H, then c = b + t0y and ⟨b + t0y − a, y⟩ = 0, so that t0 = ⟨a − b, y⟩/∥y∥². If ∥y∥ = 1, then we get c = b + ⟨a − b, y⟩y, and dist(b, H) = ∥b − c∥ = ∥⟨b − a, y⟩y∥ = |⟨b − a, y⟩|.
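These formulas from part (ii) can be checked numerically. The following Python sketch (our own code; the points a, b and normal y are sample values) computes the foot c of the perpendicular from b to the hyperplane H:

```python
# Foot of the perpendicular from b to H = {x : <x - a, y> = 0},
# using t0 = <a - b, y>/||y||^2 and c = b + t0*y.  (Illustrative code only.)
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def foot_of_perpendicular(a, b, y):
    t0 = dot([p - q for p, q in zip(a, b)], y) / dot(y, y)
    return [p + t0 * q for p, q in zip(b, y)]

a, y = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]   # H is the plane z = 0 through a
b = [5.0, -2.0, 7.0]
c = foot_of_perpendicular(a, b, y)
print(c)   # [5.0, -2.0, 0.0], the projection of b onto H
# c lies in H, i.e. <c - a, y> = 0:
assert abs(dot([p - q for p, q in zip(c, a)], y)) < 1e-12
# dist(b, H) = |<b - a, y>| since ||y|| = 1:
assert abs(dot([p - q for p, q in zip(b, a)], y)) == 7.0
```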

A little bit of geometric visualization is helpful in grasping various concepts in Multivariable Calculus. Kindle your geometric thinking with the following:

Exercise-5: The graph of a function f : Rn → Rk is the subset {(x, f (x)) : x ∈ Rn } of Rn+k .


Visualize the graphs of f1, f2, f3, f4 : R2 → R and g1, g2, g3, g4 : R → R2 as subsets of R3, where
(i) f1(x, y) = x + y, f2(x, y) = xy, f3(x, y) = x² + y², and f4(x, y) = x² − y².
(ii) g1(x) = (x + 1, 0), g2(x) = (x, 2x), g3(x) = (x, x²), and g4(x) = (|x|, x²).

Definition: Any sequence in Rn converging to the origin (0, . . . , 0) ∈ Rn will be called a null
sequence. In particular, a null sequence in R means a sequence converging to 0.

Remark: f : Rn → Rm need not be continuous even if it is continuous in each variable separately.
(i) Let f : R2 → R be f(0, 0) = 0 and f(x, y) = 2xy/(x² + y²) for (x, y) ≠ (0, 0). Then f is continuous in each variable separately when the other variable is fixed. But f(ak, ak) = 1 ↛ 0 = f(0, 0) if (ak) is a null sequence in R \ {0}.
(ii) For a more striking example, consider f : R2 → R given by f(0, y) = 0 and f(x, y) = xy²/(x² + y⁴) for x ≠ 0. Then f is continuous in each variable separately. Moreover, if c ∈ R and (xk) is a null sequence in R \ {0}, then f(xk, cxk) = c²xk/(1 + c⁴xk²) → 0 = f(0, 0), which means f(xk, yk) → f(0, 0) whenever (xk, yk) → (0, 0) along a straight line. In spite of this, f is not continuous at (0, 0); for any null sequence (ak) in R \ {0}, we have f(ak², ak) = 1/2 ↛ 0 = f(0, 0). When we study differentiability, we will see that a function f : Rn → Rm may fail to be differentiable even if it is differentiable along all linear directions.
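The behavior in example (ii) can be seen numerically. The Python sketch below (our own code) evaluates f along a straight line through the origin, where it tends to 0, and along the parabola x = y², where it stays near 1/2:

```python
# f(x, y) = x*y^2/(x^2 + y^4): zero limit along lines, 1/2 along x = y^2.
# (Illustrative code only.)
def f(x, y):
    return 0.0 if x == 0 else x * y * y / (x * x + y ** 4)

for k in range(1, 6):
    t = 10.0 ** (-k)
    print(f(t, 3 * t))        # along the line y = 3x: values tend to 0
print(f(1e-8, 1e-4))          # along x = y^2: approximately 1/2
```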

Remark: Some care must be taken while considering limits in Rn. Let U ⊂ R2 be open, f : U → R be a function and (a, b) ∈ U. The three expressions ‘lim(x,y)→(a,b) f(x, y)’, ‘limx→a limy→b f(x, y)’, ‘limy→b limx→a f(x, y)’ mean three different things. If f is continuous in a neighborhood of the point (a, b), then the three expressions give the same value, namely f(a, b) (check). If f is not continuous, then some of the limits may not exist, and even if they exist, they may not be equal.
(i) Let f : R2 → R be f(0, 0) = 0 and f(x, y) = |x|/(|x| + |y|) for (x, y) ≠ (0, 0). Then we have limx→0 (limy→0 f(x, y)) = 1 ≠ 0 = limy→0 (limx→0 f(x, y)), and (hence) lim(x,y)→(0,0) f(x, y) does not exist (for the last assertion, we may also note that f(1/k, 0) = 1 and f(0, 1/k) = 0).

(ii) Let f : R2 → R be f(x, y) = g(y)x + g(x)y, where g(x) = 1 for x ≥ 0 and g(x) = −1 for x < 0. Then |f(x, y)| ≤ |x| + |y| and hence by Exercise-2, f is continuous at (0, 0) with lim(x,y)→(0,0) f(x, y) = 0 = f(0, 0). If x ∈ (0, 1), then f(x, 1/k) = x + 1/k and f(x, −1/k) = −x − 1/k. Hence limy→0 f(x, y) does not exist. A similar observation holds with x and y interchanged. Thus the two iterated limits limx→0 limy→0 f(x, y) and limy→0 limx→0 f(x, y) do not exist. Moreover, it follows that f fails to be continuous in every neighborhood of (0, 0).

Remark: Often, we will denote the Euclidean norm ∥ · ∥2 on Rn simply as ∥ · ∥ when no other norm
is being considered. Similarly, the notation ⟨·, ·⟩ will mean the standard inner product.

2. Multivariable differentiation: definitions

General tip: Keep track of the dimension: throughout this course, while considering elements of
the Euclidean space and functions between Euclidean spaces, make a mental note of the dimension
of the relevant Euclidean space(s), i.e., observe clearly whether the space is R or Rn or Rm , etc.
This will help you to reduce notational as well as conceptual errors.

Definition: Let U ⊂ Rn be open, f : U → Rm be a function, and a ∈ U.
(i) If v ∈ Rn, then the directional derivative of f at a in the direction of v is defined as f′(a; v) = limt→0 (f(a + tv) − f(a))/t ∈ Rm, if the limit exists (this makes sense even if v = 0 ∈ Rn). Note that if f = (f1, . . . , fm), then f′(a; v) = (f1′(a; v), . . . , fm′(a; v)); moreover, if ε > 0 is chosen with a + tv ∈ U for every t ∈ (−ε, ε), and gj : (−ε, ε) → R is defined as gj(t) = fj(a + tv), then fj′(a; v) = gj′(0), if the derivative exists.

(ii) The directional derivatives f′(a; e1), . . . , f′(a; en) in the direction of the standard basis vectors e1, . . . , en ∈ Rn are called the partial derivatives of f at a. We will write f′(a; ej) as (∂f/∂xj)(a), and call it the partial derivative of f at a with respect to xj, or the jth partial derivative of f at a. Note that (∂f/∂x1)(a), if it exists, is the derivative of the one-variable function x ↦ f(x, a2, . . . , an) at x = a1, and similarly for the other partial derivatives. Thus the jth partial derivative of f measures the rate of change of f with respect to the jth variable, when the other variables are kept fixed.
∂f
Remark: To determine (a), often the following method is used: formally differentiate f with
∂xj
respect to xj and substitute a in the resulting expression. This works provided the partial derivative
exists in a neighborhood of a and is continuous at a. If we cannot see the continuity of the partial
∂f
derivative at a in advance, then the existence or value of (a) should be determined by directly
∂xj
f (a + tej ) − f (a)
studying the limit limt→0 .
t

Example: (i) Let f : R2 → R be f(x, y) = x²y. Then for a = (a1, a2), we have (∂f/∂x)(a) = 2a1a2 and (∂f/∂y)(a) = a1². Moreover, if v = (3, 5), then f′(a; v) = limt→0 (f(a1 + 3t, a2 + 5t) − f(a1, a2))/t = limt→0 ((a1 + 3t)²(a2 + 5t) − a1²a2)/t = 6a1a2 + 5a1² = 3(∂f/∂x)(a) + 5(∂f/∂y)(a).
(ii) The existence of partial derivatives does not ensure the existence of directional derivatives. Let f : R2 → R be f(x, y) = x if 0 ≤ x ≤ y, f(x, y) = y if 0 ≤ y ≤ x, and f(x, y) = 0 otherwise. Check that f is continuous. For a = (0, 0), we have (∂f/∂x)(a) = f′(a; e1) = 0 and (∂f/∂y)(a) = f′(a; e2) = 0. But if v = (1, 1), then limt→0+ (f(a + tv) − f(a))/t = limt→0+ f(t, t)/t = limt→0+ t/t = 1 ≠ 0 = limt→0− (f(a + tv) − f(a))/t, and hence the directional derivative f′(a; v) does not exist (geometrically, the graph of f along the line x = y has a sharp turning at (0, 0)).

(iii) The existence of all directional derivatives does not imply continuity of the function. Let f : R2 → R be f(0, 0) = 0 and f(x, y) = xy²/(x² + y⁴) for (x, y) ≠ (0, 0). We saw earlier that f is not continuous at (0, 0) because f(1/n², 1/n) = 1/2 ↛ 0. But if a = (0, 0) and v = (v1, v2), then f′(a; v) = 0 if v1 = 0 and f′(a; v) = v2²/v1 if v1 ≠ 0; thus all directional derivatives exist at a = (0, 0). Note that the map v ↦ f′(a; v) from R2 to R is not linear in this example.
(iv) Let f : R2 → R be f(0, 0) = 0 and f(x, y) = xy³/(x² + y⁶) for (x, y) ≠ (0, 0) =: a. Consider v = (v1, v2) ≠ (0, 0). Note that (f(a + tv) − f(a))/t = tv1v2³/(v1² + t⁴v2⁶). If v1 = 0 and v2 ≠ 0, then (f(a + tv) − f(a))/t = 0, and hence f′(a; v) = 0. If v1 ≠ 0, then f′(a; v) = limt→0 tv1v2³/(v1² + t⁴v2⁶) = 0. Thus all directional derivatives exist at a, and the map v ↦ f′(a; v) from R2 to R is the zero map, which is linear. However, f(1/n³, 1/n) = 1/2 ↛ 0 = f(a), and therefore f is not continuous at a.
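A finite-difference experiment makes example (iv) tangible. The Python sketch below (our own code; the step size t is an arbitrary small number) approximates (f(a + tv) − f(a))/t at a = (0, 0) for several directions v, and then evaluates f along the curve x = y³:

```python
# Example (iv): all directional derivatives at (0,0) are 0, yet f = 1/2
# along x = y^3, so f is not continuous there.  (Illustrative code only.)
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y ** 3 / (x * x + y ** 6)

def directional_derivative(v, t=1e-6):
    # difference quotient (f(a + t v) - f(a)) / t at a = (0, 0)
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t

for v in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -3.0)]:
    print(abs(directional_derivative(v)) < 1e-3)   # all difference quotients ~ 0
print(f(1e-9, 1e-3))   # approximately 0.5 along x = y^3
```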

Discussion: In the case of one dimension, we think of the derivative as the ‘rate of change’; moreover, if f is differentiable at a, then limx→a (f(x) − f(a))/(x − a) = f′(a), which is equivalent to saying that limx→a (f(x) − f(a) − L(x − a))/(x − a) = 0, where L : R → R is the linear map y ↦ f′(a)y. To define differentiability in higher dimensions, we need to consider the rate of change in different directions. The whole information about the rate of change along various directions is encoded in the map v ↦ f′(a; v). It is nice if this map is a linear map, say L. But this map being linear is not sufficient to guarantee the continuity of f at a (as noted in the previous example). For differentiability to imply continuity, we should also demand that the limit limt→0 (f(a + tv) − f(a))/t exists uniformly for all unit vectors v. This is ensured by demanding that limx→a ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = 0.
∥x − a∥
[102] Let U ⊂ Rn be open, f : U → Rm, and a ∈ U. Then the following are equivalent:
(i) There is a linear map L : Rn → Rm such that limx→a ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = 0.
(ii) All directional derivatives of f exist at a, the map v ↦ f′(a; v) from Rn to Rm is linear, and for every ε > 0, there is δ > 0 such that ∥(f(a + tv) − f(a))/t − f′(a; v)∥ < ε for every t ∈ (−δ, δ) \ {0} and every v ∈ Rn with ∥v∥ = 1.
Moreover, if (i) and (ii) hold, then f′(a; v) = L(v) for every v ∈ Rn.

Proof. Let S = {v ∈ Rn : ∥v∥ = 1}. (i) ⇒ (ii): Let ε > 0. Choose δ > 0 such that ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ < ε whenever 0 < ∥x − a∥ < δ. Then for every t ∈ (−δ, δ) \ {0} and every v ∈ S, putting x = a + tv and noting tL(v) = L(tv), we get that
∥(f(a + tv) − f(a))/t − L(v)∥ = ∥f(a + tv) − f(a) − L(tv)∥/|t| = ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ < ε.
This means (ii) holds with f′(a; v) = L(v).
(ii) ⇒ (i): Let L : Rn → Rm be L(v) = f′(a; v), which is a linear map by (ii). Given ε > 0, choose δ > 0 such that ∥(f(a + tv) − f(a))/t − f′(a; v)∥ < ε for every t ∈ (−δ, δ) \ {0} and every v ∈ S. Then for any x ∈ U with 0 < ∥x − a∥ < δ, putting t = ∥x − a∥, v = (x − a)/∥x − a∥ and noting f′(a; tv) = tf′(a; v), we get that
∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = ∥f(a + tv) − f(a) − f′(a; tv)∥/|t| = ∥(f(a + tv) − f(a))/t − f′(a; v)∥ < ε.
This establishes (i). □

Definition: Let U ⊂ Rn be open, f : U → Rm be a function, and a ∈ U. We say f is differentiable at a if there is a linear map L : Rn → Rm such that limx→a ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = 0. If this condition holds, then the linear map L must be unique (because we must have L(v) = f′(a; v) for every v ∈ Rn by [102]), and L is called the total derivative or differential of f at a, and we denote L either as f′(a; ·) or as Df(a; ·) (so that L(v) = f′(a; v) = Df(a; v) for every v ∈ Rn). Other notations for L found in textbooks are Dfa, Df(a), Da f, df(a), dfa, etc. We say f is differentiable in U if f is differentiable at every a ∈ U.

Example: (i) If f : Rn → Rm is a constant map f ≡ c, then clearly f is differentiable with f′(a; ·) ≡ 0 (the zero map from Rn to Rm) for every a ∈ Rn. (ii) Let L : Rn → Rm be linear, y ∈ Rm, and f : Rn → Rm be the affine map f(x) = L(x) + y. Then for each a ∈ Rn, we have f(x) − f(a) − L(x − a) = 0 ∈ Rm and consequently, f is differentiable at a, and the differential of f at a is L, i.e., f′(a; v) = L(v) for every v ∈ Rn. In particular (taking y = 0), the differential of a linear map L : Rn → Rm at any a ∈ Rn is the map L itself.

Remark: We emphasize that the total derivative at a ∈ Rn of a differentiable function f : Rn → Rm is a linear map L : Rn → Rm and not a real number. In Mathematics, a transition from a lower dimensional theory to a higher dimensional theory often demands such modifications in one’s perspective. For instance, a real polynomial f : R → R of degree n ≥ 1 has at most n zeroes, but a polynomial f : R2 → R in two variables of degree n ≥ 1 can have uncountably many zeroes (example: f(x, y) = xy); so the correct perspective in higher dimension is the ‘dimension’ of the zero-set and not the number of zeroes.

Definition: Let U ⊂ Rn be open and a ∈ U. (i) If f : U → R is a function such that the partial derivative (∂f/∂xj)(a) exists for every j ∈ {1, . . . , n}, then the gradient vector ∇f(a) ∈ Rn of f at a is defined as ∇f(a) = ((∂f/∂x1)(a), . . . , (∂f/∂xn)(a)). For example, if f : R3 → R is f(x, y, z) = 2x³y − yz⁴, then ∇f(x, y, z) = (6x²y, 2x³ − z⁴, −4yz³).
(ii) If f = (f1, . . . , fm) : U → Rm is a function such that the partial derivative (∂fi/∂xj)(a) exists for every i ∈ {1, . . . , m} and every j ∈ {1, . . . , n}, then the Jacobian matrix Jf(a) of f at a is defined as the m × n matrix whose ijth entry is (∂fi/∂xj)(a). Note that the ith row of Jf(a) is ∇fi(a).
Exercise-6: [Directional derivative as a linear combination of partial derivatives] Let U ⊂ Rn be open, a ∈ U, and suppose f : U → Rm is differentiable at a. Then,
(i) f′(a; v) = ∑nj=1 vj f′(a; ej) = ∑nj=1 vj (∂f/∂xj)(a) for every v = (v1, . . . , vn) = ∑nj=1 vj ej ∈ Rn.
(ii) If m = 1, then f′(a; v) = ⟨∇f(a), v⟩ for every v ∈ Rn.
(iii) If n = 1 and f = (f1, . . . , fm), then f′(a) = (f1′(a), . . . , fm′(a)).
(iv) In the general case, f′(a; v) = (⟨∇f1(a), v⟩, . . . , ⟨∇fm(a), v⟩), where f = (f1, . . . , fm).
[Hint: (i) v ↦ f′(a; v) is linear by [102]. (ii) This follows from (i).]

Remark: Let U ⊂ Rn be open, and f : U → R be differentiable at a ∈ U with ∇f(a) ≠ 0. Let S = {v ∈ Rn : ∥v∥ = 1} and w = ∇f(a)/∥∇f(a)∥. The function v ↦ f′(a; v) = ⟨∇f(a), v⟩ = ∥∇f(a)∥⟨w, v⟩ from S to R attains its maximum at v = w by the Cauchy-Schwarz inequality (or more precisely, by the fact ⟨v, w⟩ = ∥v∥∥w∥ cos θ, where θ is the angle between v and w). Thus the gradient vector ∇f(a) indicates the direction in which the directional derivative f′(a; v) (with ∥v∥ = 1) is maximum.
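This steepest-ascent property is easy to observe numerically. The Python sketch below (our own code; the test function f(x, y) = x²y and the point a = (1, 2) are sample choices) scans difference quotients over unit directions on a 1° grid and compares the maximizing direction with ∇f(a)/∥∇f(a)∥:

```python
# Gradient direction as direction of maximal directional derivative.
# For f(x, y) = x^2 * y at a = (1, 2), grad f(a) = (4, 1).  (Illustrative code.)
import math

def f(x, y):
    return x * x * y

def grad_f(x, y):
    return (2 * x * y, x * x)

a = (1.0, 2.0)
g = grad_f(*a)

def deriv(v, t=1e-7):
    # difference quotient approximating f'(a; v)
    return (f(a[0] + t * v[0], a[1] + t * v[1]) - f(*a)) / t

vals = []
for k in range(360):
    th = math.pi * k / 180.0
    vals.append((deriv((math.cos(th), math.sin(th))), th))
best_th = max(vals)[1]
w = (g[0] / math.hypot(*g), g[1] / math.hypot(*g))
# the maximizing unit direction agrees with w within the 1-degree grid:
print((math.cos(best_th), math.sin(best_th)), w)
```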

In the one-dimensional case, differentiability of a function is characterized in terms of the so-called Caratheodory lemma¹. Since division by x − a becomes meaningless in Rn for n ≥ 2, we need to modify the Caratheodory lemma in higher dimensions. We will give various modified forms: one as the equivalence (i) ⇔ (iv) in [103] below, and two others as the results [104] and [105].

[103] Let U ⊂ Rn be open, f = (f1, . . . , fm) : U → Rm be a function, and a ∈ U. Then the following are equivalent:
(i) f is differentiable at a.
(ii) fi is differentiable at a for every i ∈ {1, . . . , m}.
¹See my notes on Real Analysis.


(iii) The partial derivative (∂fi/∂xj)(a) exists for every i ∈ {1, . . . , m} and every j ∈ {1, . . . , n}, and limx→a |fi(x) − fi(a) − ⟨∇fi(a), x − a⟩|/∥x − a∥ = 0 for each i ∈ {1, . . . , m}.
(iv) There exist a linear map L : Rn → Rm and a function F : U → Rm with limx→a F(x) = 0 = F(a) such that f has the Caratheodory representation f(x) − f(a) = L(x − a) + ∥x − a∥F(x) for every x ∈ U (and if this holds, then f′(a; ·) = L).

Proof. (i) ⇔ (ii): Let L : Rn → Rm be a linear map, and write L = (L1, . . . , Lm). Since convergence in Rm is determined coordinatewise, we note that limx→a ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = 0 iff limx→a |fi(x) − fi(a) − Li(x − a)|/∥x − a∥ = 0 for every i ∈ {1, . . . , m}.
(ii) ⇒ (iii): If Li = fi′(a; ·), then Li(v) = fi′(a; v) = ⟨∇fi(a), v⟩ by [102] and Exercise-6.
(iii) ⇒ (ii): Let Li : Rn → R be the linear map defined as Li(v) = ⟨∇fi(a), v⟩. Then we obtain limx→a |fi(x) − fi(a) − Li(x − a)|/∥x − a∥ = 0 by (iii), and hence fi is differentiable at a.
(i) ⇒ (iv): Let L be as in the definition of differentiability, and define F : U → Rm as F(a) = 0 and F(x) = (f(x) − f(a) − L(x − a))/∥x − a∥ for x ≠ a.
(iv) ⇒ (i): If f(x) − f(a) = L(x − a) + ∥x − a∥F(x), then limx→a ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ = limx→a ∥F(x)∥ = 0. □

[104] [Caratheodory lemma for multivariable real-valued functions] Let U ⊂ Rn be open, f : U → R be a function, and a ∈ U. Then the following are equivalent:
(i) f is differentiable at a.
(ii) There is a vector-valued function F : U → Rn (called the Caratheodory function of f at a) such that f(x) − f(a) = ⟨F(x), x − a⟩ for every x ∈ U, and F is continuous at a.
Moreover, if (i) and (ii) hold, then F(a) = ∇f(a); and the identity f(x) − f(a) = ⟨F(x), x − a⟩ will also be called the Caratheodory representation of f.

Proof. (i) ⇒ (ii): Define F : U → Rn as F(x) = ∇f(a) + ((f(x) − f(a) − ⟨∇f(a), x − a⟩)/∥x − a∥²)(x − a) for x ≠ a and F(a) = ∇f(a). Then ∥F(x) − F(a)∥ = |f(x) − f(a) − ⟨∇f(a), x − a⟩|/∥x − a∥ → 0 as x → a by (i). Also, we have ⟨F(x), x − a⟩ = ⟨∇f(a), x − a⟩ + (f(x) − f(a) − ⟨∇f(a), x − a⟩)∥x − a∥²/∥x − a∥² = f(x) − f(a) for x ≠ a.
(ii) ⇒ (i): Let L : Rn → R be L(v) = ⟨F(a), v⟩, which is a linear map. Then for x ≠ a, using (ii) and the Cauchy-Schwarz inequality, we see that |f(x) − f(a) − L(x − a)| = |⟨F(x), x − a⟩ − ⟨F(a), x − a⟩| = |⟨F(x) − F(a), x − a⟩| ≤ ∥F(x) − F(a)∥∥x − a∥. Hence |f(x) − f(a) − L(x − a)|/∥x − a∥ ≤ ∥F(x) − F(a)∥ → 0 as x → a by the continuity of F at a. □

To generalize [104] to vector-valued functions, we need a little preparation.

Exercise-7: Let L(Rn, Rm) = {L : Rn → Rm : L is a linear map}.
(i) L(Rn, Rm) is a vector space over R with respect to pointwise addition and scalar multiplication.
(ii) ∥L∥ := sup{∥L(v)∥ : v ∈ Rn and ∥v∥ ≤ 1} for L ∈ L(Rn, Rm) defines a norm, called the operator norm, on the vector space L(Rn, Rm).
(iii) ∥L(v)∥ ≤ ∥L∥∥v∥ for every L ∈ L(Rn, Rm) and every v ∈ Rn.
(iv) Since a norm induces a metric, L(Rn, Rm) becomes a metric space, and hence we can talk about continuity of functions from Rn to L(Rn, Rm). Note that L(Rn, Rm) can be identified with the space of all m × n real matrices, which in turn can be identified with Rmn. With this identification, it follows by [101] that convergence in L(Rn, Rm) with respect to the operator norm is equivalent to convergence in Rmn with respect to the Euclidean norm.

[105] [Caratheodory lemma for vector-valued functions] Let U ⊂ Rn be open, f : U → Rm be a function, and a ∈ U. Then the following are equivalent:
(i) f is differentiable at a.
(ii) There is a function Φ : U → L(Rn, Rm), Φ(x) = Lx, such that f(x) − f(a) = Lx(x − a) for every x ∈ U, and Φ is continuous at a (which means limx→a ∥Lx − La∥ = 0 with respect to the operator norm).
Moreover, if (i) and (ii) hold, then La = f′(a; ·); and the identity f(x) − f(a) = Lx(x − a) will also be called the Caratheodory representation of f.

Proof. Write f = (f1, . . . , fm). (i) ⇒ (ii): If f is differentiable at a, then each fi is differentiable at a. Hence by [104], there are functions Fi : U → Rn such that for each i ∈ {1, . . . , m}, Fi is continuous at a and fi(x) − fi(a) = ⟨Fi(x), x − a⟩ for every x ∈ U. Define Φ : U → L(Rn, Rm) as Φ(x) = Lx, where Lx : Rn → Rm is defined as Lx(v) = (⟨F1(x), v⟩, . . . , ⟨Fm(x), v⟩). Note that each Lx is indeed linear because each of its coordinates is a linear map. Since the ith coordinate of Lx(x − a) is equal to ⟨Fi(x), x − a⟩ = fi(x) − fi(a), we have f(x) − f(a) = Lx(x − a) for every x ∈ U. If ∥v∥ ≤ 1, then ∥Lx(v) − La(v)∥ ≤ (∑mi=1 ∥Fi(x) − Fi(a)∥²)1/2 by the Cauchy-Schwarz inequality, and hence (the operator norm) ∥Lx − La∥ ≤ (∑mi=1 ∥Fi(x) − Fi(a)∥²)1/2. Since each Fi is continuous at a, it follows that Φ is continuous at a.
(ii) ⇒ (i): Given Φ as in (ii) with Φ(x) = Lx, put L = La. Then f(x) − f(a) − L(x − a) = Lx(x − a) − La(x − a). Hence ∥f(x) − f(a) − L(x − a)∥ ≤ ∥Lx − La∥∥x − a∥ by Exercise-7(iii). So, ∥f(x) − f(a) − L(x − a)∥/∥x − a∥ ≤ ∥Lx − La∥ → 0 as x → a by the continuity of Φ at a. □
∥x − a∥

The following sufficient condition is practically useful to check whether a multivariable function
is differentiable. For u, v ∈ Rn , let [u, v] denote the line segment joining u and v.

[106] Let U ⊂ Rn be open, f = (f1, . . . , fm) : U → Rm be a function, and a ∈ U. If all the partial derivatives ∂fi/∂xj exist and are continuous in a neighborhood of a, then f is differentiable at a.

Proof. By considering each fi separately, we may suppose f is real-valued (i.e., m = 1). Choose r > 0 such that B(a, r) ⊂ U and all the partial derivatives of f are continuous in B(a, r). Fix x ∈ B(a, r), and note that x − a = ∑nj=1 (xj − aj)ej. Define vectors u0, u1, . . . , un ∈ B(a, r) as follows: u0 = a, u1 = u0 + (x1 − a1)e1, u2 = u1 + (x2 − a2)e2, . . . , un = un−1 + (xn − an)en = x. Observe that uj−1 and uj differ only in the jth coordinate. Define gj : [0, 1] → R as gj(t) = f(uj−1 + t(xj − aj)ej). Applying the one-variable Mean value theorem to gj, we may find a vector vj = uj−1 + tj(xj − aj)ej on the line segment [uj−1, uj] such that f(uj) − f(uj−1) = gj(1) − gj(0) = gj′(tj) = (∂f/∂xj)(vj)(xj − aj). Define F : B(a, r) → Rn as F(x) = ((∂f/∂x1)(v1), . . . , (∂f/∂xn)(vn)). Then we have f(x) − f(a) = ∑nj=1 (f(uj) − f(uj−1)) = ∑nj=1 (∂f/∂xj)(vj)(xj − aj) = ⟨F(x), x − a⟩ for every x ∈ B(a, r). Moreover, F(x) → ∇f(a) = F(a) as x → a by the continuity of the partial derivatives of f. Hence f is differentiable at a by [104]. □
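As a numerical sanity check of [106], the sketch below (our own code; the test function f(x, y) = x²y and the point a = (1, 2) are sample choices) verifies that the error ratio |f(x) − f(a) − ⟨∇f(a), x − a⟩|/∥x − a∥ tends to 0 as x → a for a function with continuous partials:

```python
# [106] in action: f(x, y) = x^2 * y has continuous partials, so the
# first-order error divided by ||x - a|| tends to 0.  (Illustrative code only.)
import math

def f(x, y):
    return x * x * y

def grad_f(x, y):
    return (2 * x * y, x * x)

a = (1.0, 2.0)
ga = grad_f(*a)
for k in range(1, 7):
    h = 10.0 ** (-k)
    x = (a[0] + h, a[1] - h)   # approach a along a fixed direction
    err = abs(f(*x) - f(*a) - (ga[0] * (x[0] - a[0]) + ga[1] * (x[1] - a[1])))
    print(err / math.hypot(x[0] - a[0], x[1] - a[1]))   # ratios tend to 0
```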

3. Multivariable differentiation: properties

[107] [Basic properties] Let U ⊂ Rn be open, and a ∈ U.
(i) Suppose f : U → Rm is differentiable at a. Then f is locally Lipschitz at a in the sense that there are λ, δ > 0 with B(a, δ) ⊂ U and ∥f(x) − f(a)∥ ≤ λ∥x − a∥ for every x ∈ B(a, δ) (this is weaker than saying f is Lipschitz continuous in B(a, δ)). In particular, f is continuous at a.
(ii) If f = (f1, . . . , fm) : U → Rm is differentiable at a, then the matrix of the linear map f′(a; ·) : Rn → Rm is the m × n (not n × m) Jacobian matrix Jf(a) whose ijth entry is (∂fi/∂xj)(a).
(iii) [Linearity] If f, g : U → Rm are differentiable at a and c1, c2 ∈ R, then c1f + c2g is differentiable at a and (c1f + c2g)′(a; ·) = c1f′(a; ·) + c2g′(a; ·).
(iv) [Product rule] Let f, g : U → R be differentiable at a. Then their product fg is differentiable at a, and (fg)′(a; ·) = f(a)g′(a; ·) + g(a)f′(a; ·). Consequently², ∇(fg)(a) = f(a)∇g(a) + g(a)∇f(a).
(v) [Quotient rule] Let f, g : U → R be differentiable at a, and suppose g is non-vanishing in U (or in a neighborhood of a). Then f/g is differentiable at a, and (f/g)′(a; ·) = (g(a)f′(a; ·) − f(a)g′(a; ·))/g(a)². Consequently, ∇(f/g)(a) = (g(a)∇f(a) − f(a)∇g(a))/g(a)².
g(a)2

Proof. (i) By [103](iv), f has a Caratheodory representation f (x)−f (a) = L(x−a)+∥x−a∥F (x) for
x ∈ U , where L : Rn → Rm is linear and limx→a F (x) = 0 = F (a). Since L is linear, there is M > 0
such that ∥L(x) − L(y)∥ ≤ M ∥x − y∥ for every x, y ∈ Rn . Since limx→a ∥F (x)∥ = 0, there is δ > 0

²The existence of ∇(fg)(a) by itself does not imply the differentiability of fg at a.

such that B(a, δ) ⊂ U and ∥F (x)∥ ≤ 1 for every x ∈ B(a, δ). Then ∥f (x) − f (a)∥ ≤ (M + 1)∥x − a∥
for every x ∈ B(a, δ).

(ii) The jth column of the matrix of the linear map f′(a; ·) is specified by the vector f′(a; ej) =
(∂f/∂xj)(a) = ((∂f1/∂xj)(a), . . . , (∂fm/∂xj)(a)).
(iii) If f(x) − f(a) = L1(x − a) + ∥x − a∥F(x) and g(x) − g(a) = L2(x − a) + ∥x − a∥G(x) are the Caratheodory
representations as in [103](iv) of f and g respectively, then (c1f + c2g)(x) − (c1f + c2g)(a) =
(c1L1 + c2L2)(x − a) + ∥x − a∥(c1F + c2G)(x). Again use [103].

(iv) Let L1 = f ′ (a; ·), L2 = g ′ (a; ·), and put L = f (a)L2 + g(a)L1 . By adding and subtracting the
quantity f (x)g(a) − f (x)L2 (x − a), we may note that ∥f (x)g(x) − f (a)g(a) − L(x − a)∥ ≤
∥f (x)∥∥g(x) − g(a) − L2 (x − a)∥ + ∥f (x) − f (a)∥∥L2 (x − a)∥ + ∥g(a)∥∥f (x) − f (a) − L1 (x − a)∥.
Hence limx→a ∥f(x)g(x) − f(a)g(a) − L(x − a)∥/∥x − a∥ = 0 by hypothesis, by part (i), and by the fact
that limx→a L2(x − a) = 0.
We can give another (easier) proof using [104] as follows. Let f (x) − f (a) = ⟨F (x), x − a⟩ and
g(x) − g(a) = ⟨G(x), x − a⟩ be the Caratheodory representations of f and g given by [104]. Then
f (x)g(x) − f (a)g(a) = f (x)g(x) − f (x)g(a) + f (x)g(a) − f (a)g(a) = f (x)(g(x) − g(a)) + g(a)(f (x) −
f (a)) = f (x)⟨G(x), x − a⟩ + g(a)⟨F (x), x − a⟩ = ⟨H(x), x − a⟩, where H(x) := f (x)G(x) + g(a)F (x).
Then H is continuous at a since f, G, F are continuous at a. Hence by [104], f g is differentiable at
a and ∇(f g)(a) = H(a) = f (a)G(a) + g(a)F (a) = f (a)∇g(a) + g(a)∇f (a).

(v) Let f(x) − f(a) = ⟨F(x), x − a⟩ and g(x) − g(a) = ⟨G(x), x − a⟩ be the Caratheodory representations
of f and g given by [104]. Then (f/g)(x) − (f/g)(a) = (f(x)g(a) − f(a)g(x))/(g(x)g(a)). But f(x)g(a) −
f(a)g(x) = (f(x) − f(a))g(a) − f(a)(g(x) − g(a)) = g(a)⟨F(x), x − a⟩ − f(a)⟨G(x), x − a⟩. Hence
(f/g)(x) − (f/g)(a) = ⟨H(x), x − a⟩, where H(x) := (g(a)F(x) − f(a)G(x))/(g(x)g(a)). Clearly, H is continuous
at a. Therefore f/g is differentiable at a and ∇(f/g)(a) = H(a) = (g(a)∇f(a) − f(a)∇g(a))/g(a)². □
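The product and quotient rules for gradients in [107](iv)-(v) lend themselves to a quick numerical sanity check against central finite differences. A sketch, with hypothetical scalar fields f(x, y) = x² + y and g(x, y) = xy + 3 (chosen so that g is non-vanishing near the base point):

```python
# hypothetical scalar fields with hand-computed gradients
f  = lambda x, y: x * x + y
df = lambda x, y: (2 * x, 1.0)
g  = lambda x, y: x * y + 3.0
dg = lambda x, y: (y, x)

def num_grad(F, x, y, h=1e-6):
    # central finite-difference gradient of F at (x, y)
    return ((F(x + h, y) - F(x - h, y)) / (2 * h),
            (F(x, y + h) - F(x, y - h)) / (2 * h))

a = (1.2, -0.7)
fa, ga = f(*a), g(*a)

# product rule: grad(fg) = f grad g + g grad f
prod_rule = tuple(fa * dgi + ga * dfi for dfi, dgi in zip(df(*a), dg(*a)))
prod_num  = num_grad(lambda x, y: f(x, y) * g(x, y), *a)
assert all(abs(p - q) < 1e-5 for p, q in zip(prod_rule, prod_num))

# quotient rule: grad(f/g) = (g grad f - f grad g) / g^2
quot_rule = tuple((ga * dfi - fa * dgi) / ga ** 2 for dfi, dgi in zip(df(*a), dg(*a)))
quot_num  = num_grad(lambda x, y: f(x, y) / g(x, y), *a)
assert all(abs(p - q) < 1e-5 for p, q in zip(quot_rule, quot_num))
```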

Exercise-8: Let f : Rn → R be a polynomial in n variables, i.e., f has the form f(x1, . . . , xn) =
∑pk=0 ck x1^q(1,k) · · · xn^q(n,k), where p ∈ N ∪ {0}, ck ∈ R, and q(j, k) ∈ N ∪ {0}. Then,
(i) f is differentiable in Rn.
(ii) ∇f(a) = ((∂f/∂x1)(a), · · · , (∂f/∂xn)(a)), and (∂f/∂xj)(a) is obtained by formally differentiating f with
respect to xj by keeping the other variables fixed.
[Hint: (i) For 1 ≤ j ≤ n, let πj : Rn → R be the projection x ↦ xj, which is linear and hence
differentiable. Let π0 : Rn → R be the constant map π0 ≡ 1, which is also differentiable. Now note
that f is a linear combination of products of πj's for 0 ≤ j ≤ n, and use [107](iii) and [107](iv).]

Remark: Identify M(n, R) := {all n × n real matrices} with Rn×n. Then A ↦ det(A) and A ↦
trace(A) from Rn×n to R are polynomials in the n² matrix entries, and hence differentiable by Exercise-8.

[108] (i) [Chain rule] Let U ⊂ Rn and V ⊂ Rm be open, f : U → V and g : V → Rk be functions,


and let a ∈ U . If f is differentiable at a, and g is differentiable at f (a), then g ◦ f : U → Rk is
differentiable at a, and (g ◦ f )′ (a; ·) = g ′ (f (a); ·) ◦ f ′ (a; ·) = g ′ (f (a); f ′ (a; ·)) (in terms of Jacobian
matrices, this means that Jg◦f (a) = Jg (f (a))Jf (a)).

(ii) Suppose k = 1 and f = (f1, . . . , fm) in (i). Then we have that (g ◦ f)′(a; v) = g′(f(a); f′(a; v)) =
⟨∇g(f(a)), f′(a; v)⟩ = ∑mi=1 (∂g/∂yi)(f(a))⟨∇fi(a), v⟩ ∀ v ∈ Rn. So, ∇(g ◦ f)(a) = ∑mi=1 (∂g/∂yi)(f(a))∇fi(a).
Moreover, (∂(g ◦ f)/∂xj)(a) = ∑mi=1 (∂g/∂yi)(f(a))(∂fi/∂xj)(a) = ⟨∇g(f(a)), (∂f/∂xj)(a)⟩ for 1 ≤ j ≤ n.
(iii) If n = k = 1 in (i), then (g ◦ f )′ (a) = ⟨∇g(f (a)), f ′ (a)⟩.

(iv) If m = k = 1 in (i), then (g ◦ f )′ (a; v) = g ′ (f (a))f ′ (a; v) = g ′ (f (a))⟨∇f (a), v⟩ for every v ∈ Rn .

(v) If n = m = 1 in (i), then (g ◦ f)′(a) = g′(f(a))f′(a) = (g1′(f(a))f′(a), . . . , gk′(f(a))f′(a)), where
g = (g1, . . . , gk).

Proof. (i) By [103], f and g have Caratheodory representations f (x)−f (a) = L1 (x−a)+∥x−a∥F (x)
and g(y) − g(f (a)) = L2 (y − f (a)) + ∥y − f (a)∥G(y), where L1 = f ′ (a; ·) : Rn → Rm and L2 =
g ′ (f (a); ·) : Rm → Rk are linear, limx→a F (x) = 0 ∈ Rm and limy→f (a) G(y) = 0 ∈ Rk . Let
L = L2 ◦ L1 : Rn → Rk , which is again a linear map. We have g(f (x)) − g(f (a))
= L2 (f (x) − f (a)) + ∥f (x) − f (a)∥G(f (x)) = L2 (L1 (x − a) + ∥x − a∥F (x)) + ∥f (x) − f (a)∥G(f (x))
= L(x − a) + ∥x − a∥H(x), where H(x) := L2(F(x)) + (∥f(x) − f(a)∥/∥x − a∥)G(f(x)) for x ≠ a and
H(a) := 0 ∈ Rk. Since limx→a ∥H(x)∥ = 0 by [107](i) and by the properties of F, G, and L2, it
follows that g ◦ f has the Caratheodory representation g(f(x)) − g(f(a)) = L(x − a) + ∥x − a∥H(x)
for x ∈ U . Hence by [103], g ◦ f is differentiable at a with (g ◦ f )′ (a; ·) = L = L2 ◦ L1 .

Another proof : By Caratheodory lemma [105], there are functions Φ1 : U → L(Rn , Rm ) and
Φ2 : V → L(Rm , Rk ), Φ1 (x) = L1,x and Φ2 (y) = L2,y , such that Φ1 and Φ2 are continuous at
a, f (a) respectively, f (x) − f (a) = L1,x (x − a) for every x ∈ U , and g(y) − g(f (a)) = L2,y (y − f (a))
for every y ∈ V . Let Φ : U → L(Rn , Rk ) be Φ(x) = Lx := L2,f (x) ◦ L1,x . Then g(f (x)) − g(f (a)) =
L2,f (x) (f (x) − f (a)) = L2,f (x) (L1,x (x − a)) = Lx (x − a). Moreover, using Exercise-7(iii), we may
observe that the operator norm ∥Lx − La ∥ = ∥L2,f (x) ◦ L1,x − L2,f (a) ◦ L1,a ∥ = ∥L2,f (x) ◦ (L1,x −
L1,a ) + (L2,f (x) − L2,f (a) ) ◦ L1,a ∥ ≤ ∥L2,f (x) ∥∥L1,x − L1,a ∥ + ∥L2,f (x) − L2,f (a) ∥∥L1,a ∥ → 0 as x → a
by the continuity of f and Φ1 at a, the continuity of Φ2 at f (a), and the continuity of the operator
norm (see Exercise-2). Thus Φ is continuous at a. Therefore by [105], g ◦ f is differentiable at a,
and (g ◦ f )′ (a; ·) = La = L2,f (a) ◦ L1,a = g ′ (f (a); ·) ◦ f ′ (a; ·).

(ii) The first assertion is evident. For the last assertion, argue as follows. We have Jg◦f(a) =
Jg(f(a))Jf(a) by (i). Now note that Jg(f(a)) is the 1 × m matrix whose ith entry is (∂g/∂yi)(f(a)),
and Jf(a) is an m × n matrix whose ijth entry is (∂fi/∂xj)(a).
For (iii) and (iv), use part (ii) and Exercise-6(ii). For (v), use part (i) and Exercise-6(iii). 
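The matrix identity Jg◦f(a) = Jg(f(a))Jf(a) can be tested numerically by comparing a finite-difference Jacobian of g ◦ f with the product of the finite-difference Jacobians of g and f. A sketch with hypothetical maps f, g : R² → R² (the maps and the base point are arbitrary choices):

```python
import math

def f(u):   # hypothetical map R^2 -> R^2
    x, y = u
    return (x * y, x + math.sin(y))

def g(u):   # hypothetical map R^2 -> R^2
    x, y = u
    return (math.exp(x) - y, x * x * y)

def num_jacobian(F, a, h=1e-6):
    # central finite-difference Jacobian of F at a
    m = len(F(a))
    J = [[0.0] * len(a) for _ in range(m)]
    for j in range(len(a)):
        ap, am = list(a), list(a)
        ap[j] += h
        am[j] -= h
        Fp, Fm = F(ap), F(am)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a = (0.3, 0.8)
lhs = num_jacobian(lambda u: g(f(u)), a)          # J_{g o f}(a)
rhs = matmul(num_jacobian(g, f(a)), num_jacobian(f, a))  # Jg(f(a)) Jf(a)
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-4
           for i in range(2) for j in range(2))
```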

Remark: (i) Suppose n = m = 2 and k = 1 in [108](i). Write f(x, y) = (u(x, y), v(x, y)) and put
h = g ◦ f. Then h(x, y) = g(u(x, y), v(x, y)). So the Chain rule as expressed in [108](ii) takes the
form ∂h/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂v)(∂v/∂x) and ∂h/∂y = (∂g/∂u)(∂u/∂y) + (∂g/∂v)(∂v/∂y), an expression usually found in Calculus
textbooks. (ii) Two examples illustrating the Chain rule are given below. However, in practical
problems, it is often easier to compute (g ◦ f)′(a; ·) directly after determining g ◦ f. The Chain rule
is mostly useful in theoretical proofs.

Example: In the two examples below, the functions are differentiable by [106].
(i) [n = k = 1 and m = 2] Let f : R → R² be f(t) = (t, t²) and g : R² → R be g(x, y) = x³ + y⁴.
Then Jg◦f(t) = Jg(f(t))Jf(t) = Jg(t, t²)Jf(t) = [3t² 4t⁶] [1; 2t] = [3t² + 8t⁷]. Hence (g ◦ f)′(t) =
3t² + 8t⁷. Direct computation is easier: (g ◦ f)(t) = t³ + t⁸ and hence (g ◦ f)′(t) = 3t² + 8t⁷.

(ii) [n = m = 2 and k = 1] Let f : R² → R² be f(r, θ) = (r cos θ, r sin θ), and g : R² → R
be g(x, y) = x² + y². Then Jg◦f(r, θ) = Jg(f(r, θ))Jf(r, θ) = Jg(r cos θ, r sin θ)Jf(r, θ) =
[2r cos θ 2r sin θ] [cos θ, −r sin θ; sin θ, r cos θ] = [2r 0]. Hence (∂(g ◦ f)/∂r)(r, θ) = 2r and (∂(g ◦ f)/∂θ)(r, θ) = 0.
Consequently, (g ◦ f)′((r, θ); (v1, v2)) = 2rv1. Again, a direct computation is easier: (g ◦ f)(r, θ) = r²
and hence ∇(g ◦ f)(r, θ) = (2r, 0).
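Example (ii) can be verified numerically: the partial derivatives of (g ◦ f)(r, θ) = r² should be 2r and 0. A quick finite-difference sketch (the sample point (r, θ) = (1.7, 0.9) is arbitrary):

```python
import math

def f(u):                      # (r, theta) -> (x, y), as in example (ii)
    r, t = u
    return (r * math.cos(t), r * math.sin(t))

def g(u):                      # g(x, y) = x^2 + y^2
    x, y = u
    return x * x + y * y

def partial(F, a, j, h=1e-6):  # central difference in coordinate j
    ap, am = list(a), list(a)
    ap[j] += h
    am[j] -= h
    return (F(ap) - F(am)) / (2 * h)

r, t = 1.7, 0.9
h_rt = lambda u: g(f(u))       # (g o f)(r, theta) = r^2
assert abs(partial(h_rt, (r, t), 0) - 2 * r) < 1e-6   # d/dr = 2r
assert abs(partial(h_rt, (r, t), 1) - 0.0) < 1e-6     # d/dtheta = 0
```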

Exercise-9: Let ϕ, ψ : Rn → R be ϕ(x) = ∥x∥2 , ψ(x) = ∥x∥, and ρ : Rn ×Rn → R be ρ(x, y) = ⟨x, y⟩.
(i) ϕ is differentiable in Rn , ∇ϕ(a) = 2a, and ϕ′ (a; v) = 2⟨a, v⟩ for every a, v ∈ Rn .
(ii) ψ is differentiable in Rn \ {0}, ∇ψ(a) = a/∥a∥, and ψ′(a; v) = ⟨a, v⟩/∥a∥ for every a ∈ Rn \ {0} and
v ∈ Rn. But ψ is not differentiable at 0 ∈ Rn.
(iii) ρ is differentiable in Rn × Rn , ∇ρ(a, b) = (b, a), and ρ′ ((a, b); (u, v)) = ⟨b, u⟩ + ⟨a, v⟩ for every
(a, b), (u, v) ∈ Rn × Rn .
[Hint: (i) ϕ(x) = ∑nj=1 xj². So ϕ is differentiable by Exercise-8, and ∇ϕ(a) = 2a. (ii) Let ϕ :
Rn \ {0} → (0, ∞) be ϕ(x) = ∥x∥², and g : (0, ∞) → R be g(x) = √x. Then ϕ, g are differentiable
and ψ = g ◦ ϕ on Rn \ {0}. Hence ψ is differentiable on Rn \ {0}, and ψ′(a; v) = g′(ϕ(a))⟨∇ϕ(a), v⟩.
To see ψ is not differentiable at 0 ∈ Rn, note that t ↦ |t| = ψ(t, 0, . . . , 0) is not differentiable at
0 ∈ R. (iii) ρ is differentiable by Exercise-8 since it is a multivariable polynomial.]

Exercise-10: (i) If f : Rn → Rm \ {0} is differentiable at a ∈ Rn, then h : Rn → R defined
as h(x) = ∥f(x)∥ is differentiable at a, h′(a; v) = ⟨f(a), f′(a; v)⟩/∥f(a)∥ for every v ∈ Rn, and ∇h(a) =
(1/∥f(a)∥) ∑mi=1 fi(a)∇fi(a). If n = 1, then h′(a) = ⟨f(a), f′(a)⟩/∥f(a)∥.
(ii) Let f, g : Rn → Rm be differentiable at a, b ∈ Rn respectively. Then h : Rn × Rn → R defined as
h(x, y) = ⟨f(x), g(y)⟩ is differentiable at (a, b) and h′((a, b); (u, v)) = ⟨g(b), f′(a; u)⟩ + ⟨f(a), g′(b; v)⟩
for every (u, v) ∈ Rn × Rn. Also, ∇h(a, b) = (∑mi=1 gi(b)∇fi(a), ∑mi=1 fi(a)∇gi(b)). If n = 1, then
h′(a, b) = ⟨g(b), f′(a)⟩ + ⟨f(a), g′(b)⟩.
(iii) [Generalization of product rule] Let f, g : Rn → Rm be differentiable at a ∈ Rn. Then
h : Rn → R defined as h(x) = ⟨f(x), g(x)⟩ is differentiable at a and h′(a; v) = ⟨g(a), f′(a; v)⟩ +
⟨f(a), g′(a; v)⟩ for every v ∈ Rn. Also, ∇h(a) = ∑mi=1 gi(a)∇fi(a) + ∑mi=1 fi(a)∇gi(a). If f = g,
then h(x) = ∥f(x)∥², h′(a; v) = 2⟨f(a), f′(a; v)⟩ for every v ∈ Rn, and ∇h(a) = 2∑mi=1 fi(a)∇fi(a).
[Hint: Let ϕ, ψ, ρ be as in the previous exercise. (i) We have h = ψ ◦ f, where ψ(y) = ∥y∥.
Use Exercise-9(ii) after noting that h′(a; v) = ψ′(f(a); f′(a; v)) by Chain rule. (iii) We have h =
ρ ◦ (f, g), where ρ : Rm × Rm → R is ρ(x, y) = ⟨x, y⟩. Use Exercise-9(iii) after noting that
h′(a; v) = ρ′((f(a), g(a)); (f′(a; v), g′(a; v))).]
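Exercise-10(i) for n = 1 says h′(a) = ⟨f(a), f′(a)⟩/∥f(a)∥. A numerical sketch with a hypothetical curve f : R → R³ \ {0} and its hand-computed derivative:

```python
import math

def f(t):                       # hypothetical curve, never passing through 0
    return (math.cos(t), math.sin(t), t + 2.0)

def fp(t):                      # its derivative, computed by hand
    return (-math.sin(t), math.cos(t), 1.0)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

a = 0.4
h = lambda t: norm(f(t))
# claimed formula: h'(a) = <f(a), f'(a)> / ||f(a)||
formula = dot(f(a), fp(a)) / norm(f(a))
eps = 1e-6
numeric = (h(a + eps) - h(a - eps)) / (2 * eps)
assert abs(formula - numeric) < 1e-8
```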

[109] Let U ⊂ Rn be open, and a, b ∈ U be distinct vectors such that the line segment [a, b] ⊂ U .
(i) [Mean value theorem for real valued functions] If f : U → R is differentiable, then there is
c ∈ (a, b) such that f (b) − f (a) = f ′ (c; b − a) = ⟨∇f (c), b − a⟩.
(ii) [Mean value theorem for vector-valued functions] If f : U → Rm is differentiable, then for each
z ∈ Rm , there is c ∈ (a, b) (where c depends on z) such that ⟨f (b) − f (a), z⟩ = ⟨f ′ (c; b − a), z⟩.
(iii) [Mean value inequalities for multivariable functions] Let f = (f1, . . . , fm) : U → Rm be
differentiable. Let M1 = sup{∥f′(c; ·)∥ : c ∈ [a, b]}, where ∥f′(c; ·)∥ is the operator norm of the
linear map f′(c; ·), and M2 = sup{∑mi=1 ∥∇fi(c)∥ : c ∈ [a, b]}. Then ∥f(b) − f(a)∥ ≤ M1∥b − a∥
and ∥f(b) − f(a)∥ ≤ M2∥b − a∥.

Proof. (i) Let g : [0, 1] → R be g(t) = f (a + t(b − a)), which is differentiable, being the composition
of two differentiable functions. Applying the one-variable Mean value theorem to g, we may find
t0 ∈ (0, 1) with f (b) − f (a) = g(1) − g(0) = g ′ (t0 )(1 − 0) = f ′ (c; b − a), where c := a + t0 (b − a).

(ii) Fix z ∈ Rm and define g : [0, 1] → R as g(t) = ⟨f (a + t(b − a)), z⟩, which is differentiable since f
and the maps t 7→ a + t(b − a), y 7→ ⟨y, z⟩ are differentiable. Applying the one-variable Mean value
theorem to g, we may find t0 ∈ (0, 1) with ⟨f (b) − f (a), z⟩ = g(1) − g(0) = g ′ (t0 )(1 − 0) = g ′ (t0 ).
Let c = a + t0(b − a). By Chain rule and the linearity of the inner product in the first variable, we
may observe that g′(t0) = ⟨f′(c; b − a), z⟩ (or directly calculate the limit limt→t0 (g(t) − g(t0))/(t − t0)).

(iii) Taking z = f (b)−f (a) in (ii) and applying Cauchy-Schwarz inequality, we get ∥f (b)−f (a)∥2 =
|⟨f (b)−f (a), f (b)−f (a)⟩| = |⟨f ′ (c; b−a), f (b)−f (a)⟩| ≤ ∥f ′ (c; b−a)∥∥f (b)−f (a)∥ for some c ∈ (a, b),
and hence ∥f (b) − f (a)∥ ≤ ∥f ′ (c; b − a)∥. Since ∥f ′ (c; b − a)∥ ≤ ∥f ′ (c; ·)∥∥b − a∥, it follows that

∥f(b) − f(a)∥ ≤ M1∥b − a∥. Moreover, as f′(c; b − a) = ∑mi=1 ⟨∇fi(c), b − a⟩ei, another application
of Cauchy-Schwarz inequality yields ∥f′(c; b − a)∥ ≤ ∑mi=1 |⟨∇fi(c), b − a⟩| ≤ ∑mi=1 ∥∇fi(c)∥∥b − a∥.
This implies ∥f(b) − f(a)∥ ≤ M2∥b − a∥. □
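The inequality ∥f(b) − f(a)∥ ≤ M2∥b − a∥ of [109](iii) can be illustrated numerically; in the sketch below M2 is only estimated by sampling the segment [a, b], and the map f and the endpoints are hypothetical choices:

```python
import math

def f(u):                        # hypothetical differentiable map R^2 -> R^2
    x, y = u
    return (math.sin(x) + y, x * y)

def grads(u):                    # gradients of f1, f2, computed by hand
    x, y = u
    return [(math.cos(x), 1.0), (y, x)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

a, b = (0.0, 0.0), (1.0, 2.0)
# M2 = sup over [a,b] of sum_i ||grad f_i(c)||, estimated by sampling the segment
M2 = max(sum(norm(g) for g in grads((a[0] + t * (b[0] - a[0]),
                                     a[1] + t * (b[1] - a[1]))))
         for t in [k / 100.0 for k in range(101)])
diff = norm([p - q for p, q in zip(f(b), f(a))])
assert diff <= M2 * norm([p - q for p, q in zip(b, a)])
```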

Example: Let U ⊂ Rn be open, f : U → Rm be differentiable, and [a, b] ⊂ U. Then there may not
exist c ∈ (a, b) with f(b) − f(a) = f′(c; b − a). Consider f : R² → R² defined as f(x, y) = (x², y³). Let
a = (0, 0) and b = (1, 1). Since b − a = (1, 1) and Jf(c1, c2) = [2c1, 0; 0, 3c2²] for any c = (c1, c2) ∈ R²,
we get f′(c; b − a) = (2c1, 3c2²). If f′(c; b − a) = f(b) − f(a) = (1, 1), then c1 ≠ c2 and hence c ∉ [a, b].

Exercise-11: Let U ⊂ Rn be open and connected.


(i) If a, b ∈ U , then there is a polygonal path in U from a to b (polygonal path means a continuous
path consisting of finitely many line segments).
(ii) If f : U → Rm is differentiable and f ′ (c; ·) ≡ 0 for every c ∈ U , then f is constant.
[Hint: (i) Let Y = {y ∈ U : there is a polygonal path in U from a to y}. Then a ∈ Y. Check that
Y is both open and closed in U. (ii) Writing f = (f1, . . . , fm) and considering each fi separately,
assume m = 1. If [a, b] ⊂ U, then f(a) = f(b) by Mean value theorem [109](i) (or, apply [109](ii)
directly to f with z = f(b) − f(a)). It now follows by (i) that f(a) = f(b) for every a, b ∈ U.]

Exercise-12: Let U ⊂ Rn be open, and f : U → R be differentiable. Suppose f has either a local


maximum or a local minimum at a ∈ U , i.e., there is r > 0 with B(a, r) ⊂ U such that either
f (b) ≥ f (a) for every b ∈ B(a, r) or f (b) ≤ f (a) for every b ∈ B(a, r). Then f ′ (a; ·) ≡ 0. [Hint:
Assume B(a, r) ⊂ U and f (b) ≥ f (a) for every b ∈ B(a, r). Fix v ∈ Rn with ∥v∥ = 1 and define
g : (−r, r) → R as g(t) = f (a + tv). Then g has a local maximum at 0. Hence by one-variable
theory, 0 = g ′ (0) = f ′ (a; v).]

Exercise-13: Let f : Rn → R be differentiable.


(i) If f has continuous partial derivatives, then f(x) = f(0) + ∑nj=1 xj ∫₀¹ (∂f/∂xj)(tx) dt ∀ x ∈ Rn.
(ii) Let k ∈ N. If f(tx) = tᵏf(x) for every t ∈ R and x ∈ Rn, then kf(x) = ⟨∇f(x), x⟩ ∀ x ∈ Rn.
(iii) If f(tx) = tf(x) for every t ∈ R and x ∈ Rn, then f(x) = ⟨∇f(0), x⟩ ∀ x ∈ Rn.
[Hint: (i) Fix x ∈ Rn and define F : R → R as F(t) = f(tx). Then F′(t) = ⟨∇f(tx), d(tx)/dt⟩ =
∑nj=1 xj (∂f/∂xj)(tx). Applying the Fundamental theorem of calculus to F, we see f(x) − f(0) =
F(1) − F(0) = ∫₀¹ F′(t) dt. (ii) Differentiate tᵏf(x) = f(tx) with respect to t, and put t = 1. (iii)
Differentiate tf(x) = f(tx) with respect to t, and put t = 0.]
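Exercise-13(ii) is Euler's relation for homogeneous functions. For a homogeneous polynomial the identity kf(x) = ⟨∇f(x), x⟩ is exact; a check with the hypothetical degree-3 example f(x, y) = x³ + xy²:

```python
# Euler's relation for the hypothetical degree-3 homogeneous polynomial
# f(x, y) = x^3 + x*y^2: here f(tx) = t^3 f(x), so 3 f(x) = <grad f(x), x>.
def f(x, y):
    return x ** 3 + x * y ** 2

def grad_f(x, y):
    # partial derivatives computed by hand
    return (3 * x ** 2 + y ** 2, 2 * x * y)

x, y = 1.3, -0.4          # arbitrary test point
gx, gy = grad_f(x, y)
assert abs(3 * f(x, y) - (gx * x + gy * y)) < 1e-12
```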

4. Higher order partial derivatives

Definition and Example: Let U ⊂ Rn be open and f : U → Rm be a function for which the partial
derivatives ∂f/∂xj : U → Rm exist in U. If ∂²f/∂xi∂xj := (∂/∂xi)(∂f/∂xj) exists, then it is called a second order
partial derivative of f. Repeating this process, for any k ∈ N, we may define kth order partial
derivatives of f, if they exist. For example, let f : R² → R be f(x, y) = (x³y − xy³)/(x² + y²) if (x, y) ≠ (0, 0)
and f(0, 0) = 0. Then (∂f/∂x)(x, y) = (x⁴y + 4x²y³ − y⁵)/(x² + y²)² for (x, y) ≠ (0, 0) and (∂f/∂x)(0, 0) = 0. Since
(∂f/∂x)(0, y) = −y, we get (∂²f/∂y∂x)(0, y) = −1 for every y ∈ R. Similarly, (∂f/∂y)(x, y) = (x⁵ − 4x³y² − xy⁴)/(x² + y²)²
for (x, y) ≠ (0, 0) and (∂f/∂y)(0, 0) = 0. Since (∂f/∂y)(x, 0) = x, we get (∂²f/∂x∂y)(x, 0) = 1 for every x ∈ R.
Hence (∂²f/∂x∂y)(0, 0) = 1 ≠ −1 = (∂²f/∂y∂x)(0, 0).
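The failure of symmetry in this example can also be seen numerically by nesting central differences at the origin (the two step sizes below are ad hoc, with the inner step much smaller than the outer one):

```python
# Numerical check of the example: for f(x,y) = (x^3 y - x y^3)/(x^2 + y^2),
# the two mixed partials at the origin differ.
def f(x, y):
    return (x ** 3 * y - x * y ** 3) / (x ** 2 + y ** 2) if (x, y) != (0, 0) else 0.0

def fx(x, y, h=1e-6):           # numerical df/dx via central difference
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):           # numerical df/dy via central difference
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

h = 1e-4
dydx = (fx(0.0, h) - fx(0.0, -h)) / (2 * h)   # d/dy (df/dx) at (0,0): should be -1
dxdy = (fy(h, 0.0) - fy(-h, 0.0)) / (2 * h)   # d/dx (df/dy) at (0,0): should be +1
assert abs(dydx - (-1.0)) < 1e-3
assert abs(dxdy - 1.0) < 1e-3
```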
[110] [Equality of mixed partial derivatives] Let U ⊂ Rn be open, f : U → Rm be a function, and
1 ≤ j, k ≤ n. If ∂²f/∂xj∂xk and ∂²f/∂xk∂xj exist and are continuous in U, then ∂²f/∂xj∂xk = ∂²f/∂xk∂xj in U.

Proof. Writing f = (f1, . . . , fm) and considering each fi separately, we may suppose m = 1. Also we
may assume n = 2 since we deal with only two variables at a time. So now the function under
consideration is f : U ⊂ R² → R. Let (a, b) ∈ U. Then
(∂²f/∂y∂x)(a, b) = limy→b (1/(y − b)) [(∂f/∂x)(a, y) − (∂f/∂x)(a, b)]
= limy→b limx→a [(f(x, y) − f(a, y)) − (f(x, b) − f(a, b))]/[(y − b)(x − a)]
= limy→b limx→a [g(x, y) − g(x, b)]/[(y − b)(x − a)] (where g(x, y) := f(x, y) − f(a, y))
= limy→b limx→a (∂g/∂y)(x, y0)/(x − a) (for some y0 ∈ (b, y) by Mean value theorem)
= limy→b limx→a [(∂f/∂y)(x, y0) − (∂f/∂y)(a, y0)]/(x − a)
= limy→b limx→a (∂²f/∂x∂y)(x0, y0) (for some x0 ∈ (a, x) by Mean value theorem)
= (∂²f/∂x∂y)(a, b), since ∂²f/∂x∂y is continuous and (x0, y0) → (a, b) as (x, y) → (a, b).
Note that in this proof, we used only the continuity of ∂²f/∂x∂y. □

Definition: Let U ⊂ Rn be open and f : U → Rm be a function. For k ∈ N, we say f is a


C k -function (or a C k -map) if all the kth order partial derivatives of f exist and are continuous in
U . We say f is a C ∞ -function (or a smooth function/map) if f is a C k -function for every k ∈ N.

Remark: Let U ⊂ Rn be open and f : U → Rm be a function. Since we have the identification
L(Rn, Rm) ≅ {all m × n real matrices} ≅ Rm×n and since the entries of the Jacobian matrix are the
(first order) partial derivatives, it follows from [106] that f is a C¹-function ⇔ f is differentiable in
U and the map x ↦ f′(x; ·) from U to L(Rn, Rm) is continuous with respect to the operator norm
on L(Rn, Rm).

The following types of results are of importance in Differential Topology:

[111] (i) Let a ∈ Rn and 0 < r < s. Then there exist smooth functions (i.e., C ∞ -functions)
f, g : Rn → R with the following properties: 0 ≤ f, g ≤ 1, f (x) > 0 iff ∥a − x∥ < s, f (x) = 0 iff
∥a − x∥ ≥ s, g(x) > 0 iff ∥a − x∥ > r, and g(x) = 0 iff ∥a − x∥ ≤ r.
(ii) Let a ∈ Rn and 0 < r < s. Then there is a smooth function h : Rn → R with 0 ≤ h ≤ 1 such
that h(x) = 1 for every x ∈ B(a, r), and h(x) = 0 for every x ∈ Rn \ B(a, s).
(iii) Let A ⊂ U ⊂ Rn , where A is a nonempty compact set and U is open in Rn . Then there is a
smooth function h : Rn → R with 0 ≤ h ≤ 1 such that h(x) = 1 for every x ∈ A, and h(x) = 0 for
every x ∈ Rn \ U .

Proof. (i) We know from Real Analysis that there is a smooth function ψ : R → R such that
0 ≤ ψ ≤ 1, ψ(x) = 0 for x ≤ 0, and ψ(x) > 0 for x > 0. For example, we may take ψ to be the
function ψ(x) = 0 if x ≤ 0 and ψ(x) = e^(−1/x) if x > 0. Note that the function y ↦ ∥y∥² from Rn to
R is a multivariable polynomial and hence smooth. Therefore, the functions f, g : Rn → R defined
as f(x) = ψ(s² − ∥a − x∥²) and g(x) = ψ(∥a − x∥² − r²) are smooth. Verify the required properties.

(ii) Let f, g be as in the above proof. Note that f(x) + g(x) > 0 for every x ∈ Rn. Define h : Rn → R
as h(x) = f(x)/(f(x) + g(x)). Check that this works.
(iii) For each a ∈ A, there is ra > 0 with B(a, 2ra) ⊂ U. As {B(a, ra) : a ∈ A} is an open cover for
the compact set A, there are finitely many vectors a1, . . . , ap ∈ A with A ⊂ ∪pj=1 B(aj, raj). First
proof: By (ii), there are smooth functions h1, . . . , hp : Rn → R such that 0 ≤ hj ≤ 1, hj(x) = 1 for
every x ∈ B(aj, raj), and hj(x) = 0 for every x ∈ Rn \ B(aj, 2raj), 1 ≤ j ≤ p. Let h : Rn → R be
h(x) = 1 − ∏pj=1 (1 − hj(x)), and check that this works. Second proof: Let U1 = ∪pj=1 B(aj, raj)
and U2 = ∪pj=1 B(aj, 2raj). Then A ⊂ U1 ⊂ cl(U1) ⊂ U2 ⊂ U. For each j, we may choose smooth
functions fj, gj : Rn → R by (i) such that fj(x) > 0 iff ∥aj − x∥ < 2raj, fj(x) = 0 iff ∥aj − x∥ ≥ 2raj,
gj(x) > 0 iff ∥aj − x∥ > raj, and gj(x) = 0 iff ∥aj − x∥ ≤ raj. Let F, G : Rn → R be F = ∑pj=1 fj
and G = ∏pj=1 gj. Then F, G are smooth, F(x) > 0 iff x ∈ U2, F(x) = 0 iff x ∈ Rn \ U2, G(x) > 0
iff x ∈ Rn \ cl(U1), and G(x) = 0 iff x ∈ cl(U1). In particular, F + G > 0 in Rn. Define h : Rn → R as
h(x) = F(x)/(F(x) + G(x)). Check that this works. □
Remark: Using [111](iii), we may construct smooth partitions of unity; read this from textbooks.
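The construction in [111](i)-(ii) is concrete enough to implement directly. A Python sketch of the bump function h = f/(f + g) built from ψ(x) = e^(−1/x) (the center a and the radii r, s below are arbitrary choices):

```python
import math

def psi(x):
    # smooth glue function: 0 for x <= 0, exp(-1/x) > 0 for x > 0
    return math.exp(-1.0 / x) if x > 0 else 0.0

def bump(x, a, r, s):
    # h from [111](ii) for points x, a in R^n (given as tuples), 0 < r < s:
    # h = 1 on B(a, r), h = 0 outside B(a, s), smooth in between
    d2 = sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    fval = psi(s * s - d2)      # > 0 iff ||x - a|| < s
    gval = psi(d2 - r * r)      # > 0 iff ||x - a|| > r
    return fval / (fval + gval)

a = (0.0, 0.0)
assert bump((0.5, 0.0), a, 1.0, 2.0) == 1.0     # inside B(a, r)
assert bump((3.0, 0.0), a, 1.0, 2.0) == 0.0     # outside B(a, s)
assert 0.0 < bump((1.5, 0.0), a, 1.0, 2.0) < 1.0  # transition region
```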

Definition: If v = (v1, . . . , vn) ∈ Rn, then we define the directional derivative operator ⟨v, ∇⟩
as ⟨v, ∇⟩ = ∑nj=1 vj ∂/∂xj. Note that if U ⊂ Rn is open and f : U → R is a function for
which the relevant partial derivatives exist, then ⟨v, ∇⟩f(a) = ∑nj=1 vj (∂f/∂xj)(a), ⟨v, ∇⟩²f(a) =
∑ni=1 ∑nj=1 vivj (∂²f/∂xi∂xj)(a), ⟨v, ∇⟩³f(a) = ∑ni=1 ∑nj=1 ∑nk=1 vivjvk (∂³f/∂xi∂xj∂xk)(a), and so on.
[112] [Multivariable Taylor's theorem] Let U ⊂ Rn be open, [a, b] ⊂ U, q ∈ N, and f : U → R be a
C^(q+1)-function. Then,
(i) There is c ∈ (a, b) such that f(b) = f(a) + ∑qk=1 ⟨b − a, ∇⟩^k f(a)/k! + ⟨b − a, ∇⟩^(q+1) f(c)/(q + 1)!.
(ii) [Integral form] f(b) = f(a) + ∑qk=1 ⟨b − a, ∇⟩^k f(a)/k! + ∫₀¹ ((1 − t)^q/q!) ⟨b − a, ∇⟩^(q+1) f(a + t(b − a)) dt.

Proof. (i) Define g : [0, 1] → R as g(t) = f(a + t(b − a)), which is a C^(q+1)-function. Note that
g′(t) = f′(a + t(b − a); b − a) = ⟨∇f(a + t(b − a)), b − a⟩ = (⟨b − a, ∇⟩f)(a + t(b − a)), and hence we may
verify inductively that g^(k)(t) = (⟨b − a, ∇⟩^k f)(a + t(b − a)) for 1 ≤ k ≤ q + 1,
where g^(k) denotes the kth derivative of g. Applying the one-variable Taylor's theorem to g, we
may find t0 ∈ (0, 1) with g(1) = g(0) + ∑qk=1 g^(k)(0)(1 − 0)^k/k! + g^(q+1)(t0)(1 − 0)^(q+1)/(q + 1)!. This means
f(b) = f(a) + ∑qk=1 ⟨b − a, ∇⟩^k f(a)/k! + ⟨b − a, ∇⟩^(q+1) f(c)/(q + 1)!, where c := a + t0(b − a).
(ii) Let g be as above. Then the integral form of one-variable Taylor's theorem says that g(1) =
g(0) + ∑qk=1 g^(k)(0)/k! + ∫₀¹ ((1 − t)^q/q!) g^(q+1)(t) dt. This yields the required result. □
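A numerical illustration of [112] with q = 1: the degree-≤2 Taylor polynomial of the hypothetical function f(x, y) = eˣ cos y at a = (0, 0) (here f(a) = 1, ∇f(a) = (1, 0), and the Hessian is diag(1, −1), all computed by hand) should have error O(∥h∥³):

```python
import math

def f(x, y):
    return math.exp(x) * math.cos(y)

def taylor2(x, y):
    # f(a) + <grad f(a), h> + (1/2) <Hf(a) h, h> with a = (0,0) and h = (x, y)
    return 1.0 + x + 0.5 * (x * x - y * y)

for h in (0.1, 0.05, 0.01):
    err = abs(f(h, h) - taylor2(h, h))
    # remainder of order q + 1 = 2 means the error is O(||h||^3)
    assert err < 5.0 * h ** 3
```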

Definition: Let U ⊂ Rn be open and f : U → R be a function for which all the second order partial
derivatives exist at a ∈ U. Then the n × n matrix Hf(a) whose ijth entry is (∂²f/∂xi∂xj)(a) is called
the Hessian matrix of f at a. Note that if the second order partial derivatives of f are continuous in a
neighborhood of a, then Hf(a) is a symmetric matrix by [110].

Remark: Let U ⊂ Rn be open and f : U → R be differentiable. Then f′(a; ·) ∈ L(Rn, R) ≅ Rn
for every a ∈ U. Under this identification, g := f′ can be thought of as a map from U to Rn,
where g(a) = ∇f(a). Now assume that the second order partial derivatives of f exist and are
continuous in U. Writing g = (g1, . . . , gn), we have gi = ∂f/∂xi and hence the ijth entry of Jg(a) is
(∂gi/∂xj)(a) = (∂²f/∂xj∂xi)(a) = (∂²f/∂xi∂xj)(a), where the last equality is by [110]. Thus Jg(a) = Hf(a). We
can think of g′ as the 'second derivative' of f, and hence the behavior of the second derivative of
f at a is controlled by the Hessian matrix Hf(a). In particular, the Hessian matrix is useful in
studying the local minima and local maxima of f. To state the results precisely, we need a few
facts from Linear Algebra.

Definition: Let A be an n × n real symmetric matrix. For x ∈ Rn, let Ax = A(x) ∈ Rn be the vector
obtained by applying the linear map specified by A (with respect to the standard basis) to x. Since
A is symmetric, Ax can be obtained in terms of matrix multiplication either by multiplying the
column vector x on the left by A, or the row vector x on the right by A. We say A is positive definite
if ⟨Ax, x⟩ > 0 for every x ∈ Rn \ {0}, and negative definite if ⟨Ax, x⟩ < 0 for every x ∈ Rn \ {0}.

Example: When n = 2, we have that [1, 0; 0, 1] is positive definite, [−1, 0; 0, −1] is negative definite,
and [1, 0; 0, −1] is a symmetric matrix which is neither positive definite nor negative definite.
[113] [Facts from Linear Algebra] Let A be an n × n real symmetric matrix. Then,
(i) The eigenvalues, say λ1 , . . . , λn , of A are all real, and there is an orthonormal basis {u1 , . . . , un }
of Rn such that Auj = λj uj for 1 ≤ j ≤ n.
(ii) A is positive definite ⇔ all eigenvalues of A are positive.
(iii) A is negative definite ⇔ all eigenvalues of A are negative.
(iv) Suppose n = 2 and A = [aij ]2×2 . Then, A is positive definite ⇔ det(A) > 0 and a11 > 0.
Similarly, A is negative definite ⇔ det(A) > 0 (warning: not negative) and a11 < 0. If det(A) < 0,
then the two eigenvalues of A have opposite signs because det(A) is equal to the product of the
two eigenvalues of A.

Proof. For (i), see a suitable textbook in Linear Algebra.

(ii) Let λj ∈ R and uj ∈ Rn be as in (i). If A is positive definite, then λj = λj⟨uj, uj⟩ =
⟨λjuj, uj⟩ = ⟨Auj, uj⟩ > 0 for 1 ≤ j ≤ n. Conversely, assume λj > 0 for 1 ≤ j ≤ n. For any
x = ∑nj=1 cjuj ∈ Rn \ {0}, we have cj ≠ 0 for some j, and hence we see by the orthonormality of
the uj's that ⟨Ax, x⟩ = ∑ni=1 ∑nj=1 cicj⟨Aui, uj⟩ = ∑ni=1 ∑nj=1 cicjλi⟨ui, uj⟩ = ∑nj=1 cj²λj∥uj∥² > 0.

(iii) Note that A is negative definite iff −A is positive definite, or imitate the proof of (ii).

(iv) Note that a12 = a21 since A is symmetric. Since ⟨Ae1, e1⟩ = a11, we may suppose a11 ≠ 0 for
proving both the implications. Now for (x, y) ∈ R², observe that ⟨A(x, y), (x, y)⟩
= a11x² + 2a12xy + a22y² = a11(x + a12y/a11)² + (a22 − a12²/a11)y² = a11(x + a12y/a11)² + det(A)y²/a11.
From this expression, it is clear that A is positive definite if a11 > 0 and det(A) > 0. Conversely,
if A is positive definite, then from the above expression, a11 = ⟨Ae1, e1⟩ > 0 and det(A)a11 =
⟨A(a12, −a11), (a12, −a11)⟩ > 0 so that det(A) > 0. □

Remark: If A ∈ Rn×n , then the function Q : Rn → R given by Q(x) = ⟨Ax, x⟩ is called a quadratic
form. Some of the assertions in [113] can also be stated in the language of a quadratic form.
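The 2 × 2 criterion of [113](iv) can be cross-checked against the eigenvalue characterization of [113](ii), since for a symmetric 2 × 2 matrix the eigenvalues come directly from the quadratic formula. A sketch over a few arbitrary test matrices:

```python
import math

def eigenvalues_sym2(a11, a12, a22):
    # eigenvalues of the symmetric matrix [[a11, a12], [a12, a22]]
    tr, det = a11 + a22, a11 * a22 - a12 * a12
    disc = math.sqrt(tr * tr - 4 * det)   # always real for a symmetric matrix
    return ((tr - disc) / 2, (tr + disc) / 2)

def positive_definite_test(a11, a12, a22):
    # criterion of [113](iv): det(A) > 0 and a11 > 0
    return a11 * a22 - a12 * a12 > 0 and a11 > 0

# the criterion should agree with "both eigenvalues positive"
for (a11, a12, a22) in [(2, 1, 3), (1, 2, 1), (-2, 1, -2), (1, 0, -1)]:
    lo, hi = eigenvalues_sym2(a11, a12, a22)
    assert positive_definite_test(a11, a12, a22) == (lo > 0)
```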

Definition: Let U ⊂ Rn be open, f : U → R be differentiable, and a ∈ U . We say a is a critical


point of f if ∇f (a) = 0 ∈ Rn (equivalently, if f ′ (a; ·) ≡ 0). If a ∈ U is a critical point of f , and is
neither a local maximum nor a local minimum of f , then a is called a saddle point of f .

Example: Let f : R2 → R be f (x, y) = x2 − y 2 . Then (0, 0) is a critical point of f because


∇f (0, 0) = (0, 0). But f (x, y) > 0 for x > y > 0 and f (x, y) < 0 for y > x > 0, and hence (0, 0) is
neither a local minimum nor a local maximum of f . Thus (0, 0) is a saddle point of f .

Exercise-14: Let U ⊂ Rn be open, f : U → R be a C 2 -function, and a ∈ U be a critical point


of f . If r > 0 is with B(a, r) ⊂ U , then for every b ∈ B(a, r) \ {a}, there is c ∈ (a, b) such
that 2(f (b) − f (a)) = ⟨Hf (c)(b − a), b − a⟩, where Hf (c) is the Hessian matrix of f at c. [Hint:
By [112], there is c ∈ (a, b) such that f (b) − f (a) = ⟨b − a, ∇⟩f (a) + (1/2)⟨b − a, ∇⟩2 f (c). But
⟨b − a, ∇⟩f (a) = ⟨∇f (a), b − a⟩ = 0 since ∇f (a) = 0. Also, ⟨b − a, ∇⟩2 f (c) = ⟨Hf (c)(b − a), b − a⟩.]

[114] [Test for local extrema] Let U ⊂ Rn be open, f : U → R be a C 2 -function, and a ∈ U be a


critical point of f , i.e., ∇f (a) = 0 ∈ Rn . Let Hf (x) be the Hessian matrix of f at x ∈ U .
(i) If Hf (a) is positive definite (equivalently, if all eigenvalues of Hf (a) are positive), then a is a
strict local minimum of f .
(ii) If Hf (a) is negative definite (equivalently, if all eigenvalues of Hf (a) are negative), then a is a
strict local maximum of f .
(iii) If there are x, y ∈ Rn with ⟨Hf (a)(x), x⟩ < 0 < ⟨Hf (a)(y), y⟩, or if Hf (a) has two eigenvalues
λ1 and λ2 with λ1 < 0 < λ2 , then a is a saddle point of f .

Proof. Let S = {v ∈ Rn : ∥v∥ = 1}, which is compact.


(i) If Hf (a) is positive definite, there is δ > 0 such that ⟨Hf (a)v, v⟩ ≥ δ for every v ∈ S because the
function v 7→ ⟨Hf (a)v, v⟩ from S to R is continuous. As f is a C 2 -function, we may choose r > 0
such that B(a, r) ⊂ U and the operator norm ∥Hf (c) − Hf (a)∥ < δ for every c ∈ B(a, r). Consider
b ∈ B(a, r) \ {a}. Then 2(f (b) − f (a)) = ⟨Hf (c)(b − a), b − a⟩ for some c ∈ (a, b) by Exercise-14.
Taking v = (b − a)/∥b − a∥ ∈ S, we see that 2(f(b) − f(a))/∥b − a∥² = ⟨Hf(c)v, v⟩ = ⟨Hf(a)v, v⟩ + ⟨(Hf(c) −
Hf(a))v, v⟩ > 0 because ⟨Hf(a)v, v⟩ ≥ δ and ⟨(Hf(c) − Hf(a))v, v⟩ ≤ ∥Hf(c) − Hf(a)∥∥v∥² < δ.
This shows that f (b) > f (a) for every b ∈ B(a, r) \ {a}.

(ii) This is similar to the proof of (i), or apply (i) to −f .

(iii) If Hf (a) has two eigenvalues λ1 and λ2 with λ1 < 0 < λ2 , then for the corresponding eigen-
vectors x, y ∈ Rn \ {0}, we have that ⟨Hf (a)x, x⟩ < 0 < ⟨Hf (a)y, y⟩. So assume this inequality
holds. Let v = y/∥y∥. We claim that f (a + εv) > f (a) for every sufficiently small ε > 0. Since
δ := ⟨Hf (a)v, v⟩ is positive, we may choose r > 0 such that B(a, r) ⊂ U and the operator norm
∥Hf (c) − Hf (a)∥ < δ for every c ∈ B(a, r). Now consider any ε ∈ (0, r) and put b = a + εv ∈

B(a, r)\{a}. By Exercise-14, there is c ∈ (a, b) ⊂ B(a, r) with 2(f (b)−f (a)) = ⟨Hf (c)(b−a), b−a⟩.
As in the proof of (i), we conclude that 2(f(b) − f(a))/∥b − a∥² = ⟨Hf(a)v, v⟩ + ⟨(Hf(c) − Hf(a))v, v⟩ > 0,
where the last inequality is deduced using the choice of δ and r. This shows that f (a + εv) − f (a) =
f (b) − f (a) > 0 for every ε ∈ (0, r). Similarly, by taking u = x/∥x∥, we may show that
f (a + εu) − f (a) < 0 for every sufficiently small ε > 0. Hence a is a saddle point of f . 

Example: (i) Let f : R² → R be f(x, y) = xy − x² − y². Then we see that ∇f(x, y) = (y − 2x, x − 2y),
and hence ∇f(0, 0) = (0, 0). Now, A = [aij] := Hf(0, 0) = [−2, 1; 1, −2]. The eigenvalues of this
matrix are the roots of its characteristic polynomial, which is λ² + 4λ + 3. Hence the eigenvalues
are −1 and −3. Thus (0, 0) is a strict local maximum of f by [114](ii). Alternately, one may use
[114](ii) and [113](iv) to conclude the same since det(A) > 0 and a11 = −2 < 0.
(ii) Let f : R² → R be f(x, y) = e^(x²+y²). Directly, we may see that (0, 0) is the unique minimum of
f. Now ∇f(x, y) = (2x e^(x²+y²), 2y e^(x²+y²)) so that ∇f(0, 0) = (0, 0), and Hf(0, 0) = [2, 0; 0, 2], whose
eigenvalues are positive. Thus (0, 0) is a local minimum of f by our test.

(iii) Let f : R² → R be f(x, y) = eˣy². Since eˣ > 0 and y² ≥ 0, we see directly that (x, 0) is a local
minimum of f for every x ∈ R. However, this does not follow from our test. Note that ∇f(x, y) =
(eˣy², 2eˣy) so that ∇f(0, 0) = (0, 0). Since Hf(0, 0) = [0, 0; 0, 2], we have det(Hf(0, 0)) = 0, and
the tests [113] and [114] are not applicable.

(iv) Let f : R² → R be f(x, y) = cos x + y sin x. Then ∇f(x, y) = (−sin x + y cos x, sin x) so that
∇f(0, 0) = (0, 0). We have Hf(0, 0) = [−1, 1; 1, 0]. Since det(Hf(0, 0)) < 0, we conclude by [113](iv)
and [114](iii) that (0, 0) is a saddle point of f.
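Examples (i) and (iv) can be checked by computing the eigenvalues of the 2 × 2 Hessians directly from the quadratic formula:

```python
import math

def eig_sym2(a11, a12, a22):
    # eigenvalues of the symmetric 2x2 matrix [[a11, a12], [a12, a22]]
    tr, det = a11 + a22, a11 * a22 - a12 * a12
    disc = math.sqrt(tr * tr - 4 * det)
    return ((tr - disc) / 2, (tr + disc) / 2)

# Example (i): f(x,y) = xy - x^2 - y^2, Hf(0,0) = [[-2, 1], [1, -2]]
lo, hi = eig_sym2(-2.0, 1.0, -2.0)
assert abs(lo - (-3.0)) < 1e-9 and abs(hi - (-1.0)) < 1e-9
assert hi < 0                      # negative definite: strict local maximum

# Example (iv): f(x,y) = cos x + y sin x, Hf(0,0) = [[-1, 1], [1, 0]]
lo, hi = eig_sym2(-1.0, 1.0, 0.0)
assert lo < 0 < hi                 # eigenvalues of opposite signs: saddle point
```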

5. Inverse function theorem and Implicit function theorem

Under suitable hypotheses, we will show that if the Jacobian matrix Jf(a) of f at a is invertible,
then f is locally invertible near a. We need a little preparation.

Exercise-15: Let L(Rn, Rn) = {all linear maps L : Rn → Rn}, equipped with the operator norm.
Recall that L(Rn, Rn) ≅ Rn×n, and the operator norm on L(Rn, Rn) is equivalent to the Euclidean
norm on Rn×n. Let U := {L ∈ L(Rn, Rn) : L is invertible} = {A ∈ Rn×n : det(A) ≠ 0}. Then,
(i) U is open in L(Rn, Rn) (equivalently, in Rn×n) because det : Rn×n → R is continuous.
(ii) The map A ↦ A⁻¹ from U to U is continuous.

[Hint: (ii) Consider A ∈ U. Choose 0 < r < 1/(2∥A⁻¹∥) by (i) such that C ∈ U whenever ∥A − C∥ < r.
Now consider C ∈ U with ∥A − C∥ < r. Note that C⁻¹ − A⁻¹ = C⁻¹(A − C)A⁻¹. Hence ∥C⁻¹∥ ≤
∥A⁻¹∥ + ∥C⁻¹∥∥A − C∥∥A⁻¹∥ < ∥A⁻¹∥ + ∥C⁻¹∥/2, and hence ∥C⁻¹∥ ≤ 2∥A⁻¹∥. Therefore,
∥C⁻¹ − A⁻¹∥ ≤ ∥C⁻¹∥∥A − C∥∥A⁻¹∥ < 2∥A⁻¹∥²∥A − C∥, which gives continuity at A.]
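The bound ∥C⁻¹ − A⁻¹∥ ≤ 2∥A⁻¹∥²∥A − C∥ from the hint can be illustrated numerically for a small 2 × 2 perturbation; the Frobenius norm used below is submultiplicative, so the same estimate applies (the matrices A and C are arbitrary choices with ∥A − C∥ small):

```python
# Numerical illustration of Exercise-15(ii) for 2x2 matrices.
def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def frob(M):
    # Frobenius norm (submultiplicative, dominates the operator norm)
    return sum(x * x for row in M for x in row) ** 0.5

A = [[2.0, 1.0], [0.0, 3.0]]
C = [[2.0 + 1e-3, 1.0], [1e-3, 3.0 - 1e-3]]     # small perturbation of A
Ainv, Cinv = inv2(A), inv2(C)
diff = [[Cinv[i][j] - Ainv[i][j] for j in range(2)] for i in range(2)]
AmC  = [[A[i][j] - C[i][j] for j in range(2)] for i in range(2)]
assert frob(diff) <= 2 * frob(Ainv) ** 2 * frob(AmC)
```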

Exercise-16: [Matrix form of Mean value theorem] Let U ⊂ Rn be open, f = (f1, . . . , fn) : U → Rn
be differentiable, and a, b ∈ U be distinct with [a, b] ⊂ U. Then there are vectors c1, . . . , cn ∈ (a, b)
such that f (b) − f (a) = [∂fi(ci)/∂xj]n×n (b − a), where f (b) − f (a) and b − a are to be considered as
column vectors. [Hint: Apply [109](i) to each fi separately.]

[115] [Inverse function theorem] Let U ⊂ Rn be open, f : U → Rn be a C 1 -function, and a ∈ U be


with det(Jf (a)) ̸= 0. Then there is r > 0 with B(a, r) ⊂ U such that the following are true:
(i) f is injective on B(a, r) and det(Jf (x)) ̸= 0 for every x ∈ B(a, r).
(ii) f restricted to B(a, r) is an open map; and in particular, V := f (B(a, r)) is open in Rn .
(iii) Let g : V → B(a, r) be g = (f |B(a,r) )−1 . Then g is a C 1 -function and Jg (v) = Jf (u)−1 for every
v = f (u) ∈ V , where u ∈ B(a, r). Moreover, if f is a C k -function, then g is also a C k -function.

Proof. (i) Let Un = U × · · · × U ⊂ Rn×n. Define ϕ : Un → R as ϕ(u1, . . . , un) = det([∂fi(ui)/∂xj]n×n).
Then ϕ is continuous because f is a C1-function and det : Rn×n → R is continuous. Since
ϕ(a, . . . , a) = det(Jf (a)) ≠ 0, we may find r > 0 with B(a, r) ⊂ U such that ϕ(c1, . . . , cn) ≠ 0 for ev-
ery (c1, . . . , cn) ∈ B(a, r) × · · · × B(a, r) = B(a, r)^n. In particular, det(Jf (x)) = ϕ(x, . . . , x) ≠ 0 for
every x ∈ B(a, r). Now suppose u, x ∈ B(a, r) are distinct. By Exercise-16, there are c1, . . . , cn ∈
(u, x) with f (x) − f (u) = [∂fi(ci)/∂xj](x − u). Since x − u ≠ 0 and det([∂fi(ci)/∂xj]) = ϕ(c1, . . . , cn) ≠ 0,
it follows that f (x) − f (u) ≠ 0. Thus f is injective on B(a, r).

(ii) Let U0 ⊂ B(a, r) be open and u ∈ U0. We have to find δ > 0 with B(f (u), δ) ⊂ f (U0). Choose
ε > 0 with the closed ball B̄(u, ε) ⊂ U0. Since the boundary ∂B(u, ε) is compact, its image f (∂B(u, ε)) is also
compact by the continuity of f. Moreover, f (u) ∉ f (∂B(u, ε)) by injectivity. Hence there is δ > 0
such that ∥f (x) − f (u)∥ ≥ 2δ for every x ∈ ∂B(u, ε). Fix y ∈ B(f (u), δ) and define ψ : B̄(u, ε) → R
as ψ(x) = ∥f (x) − y∥^2 = ∑_{i=1}^n (fi(x) − yi)^2. Then ψ is continuous and attains its minimum at
some z ∈ B̄(u, ε) by the compactness of B̄(u, ε). Observe that if x ∈ ∂B(u, ε), then ψ(x) ≥
(∥f (x) − f (u)∥ − ∥f (u) − y∥)^2 ≥ (2δ − δ)^2 = δ^2 > ψ(u). Therefore, ψ cannot attain its minimum
on the boundary of B̄(u, ε). In other words, we must have z ∈ B(u, ε). Since ψ is differentiable,
we conclude by Exercise-12 that ∇ψ(z) = 0. This means 0 = ∂ψ/∂xj(z) = 2∑_{i=1}^n (fi(z) − yi) ∂fi/∂xj(z)
for 1 ≤ j ≤ n. In matrix form, this is equivalent to (f (z) − y)Jf (z) = 0 ∈ Rn. As det(Jf (z)) ≠ 0
by (i), the only possibility is f (z) = y. This shows B(f (u), δ) ⊂ f (B(u, ε)) ⊂ f (U0).

(iii) Fix v = f (u) ∈ V, where u ∈ B(a, r). We claim that g is differentiable at v with Jg(v) =
Jf (u)^{-1}. Consider y ∈ V \ {v}. By (i), there is x ∈ B(a, r) \ {u} with f (x) = y. Then by
Exercise-16, there are c1, . . . , cn ∈ (u, x) with f (x) − f (u) = [∂fi(ci)/∂xj](x − u). This means y − v =
[∂fi(ci)/∂xj](g(y) − g(v)), and hence g(y) − g(v) = [∂fi(ci)/∂xj]^{-1}(y − v). As y → v, we have x → u since g
is continuous by (ii); and consequently [∂fi(ci)/∂xj]^{-1} → Jf (u)^{-1} in Rn×n by the C1-property of f and
Exercise-15(ii). Therefore, it follows by [105] (or by direct calculation) that g is differentiable at v
with Jg(v) = Jf (u)^{-1}. This proves the claim, and thus g is differentiable in V with Jg(v) = Jf (u)^{-1}
for every v = f (u), where u ∈ B(a, r). Since f is a C1-function, it now follows by Exercise-15(ii)
that g is also a C1-function (or note that the entries of Jg(v) are rational functions of the entries
of Jf (u)). Similarly, if f is a Ck-function, we may deduce that g is a Ck-function. □

Remark: (i) The Inverse function theorem guarantees only a local inverse for f. Even if det(Jf (a)) ≠ 0
for every a ∈ U, f may not have a global inverse. For example, let f : R2 → R2 be f (x, y) =
(e^x cos y, e^x sin y). Then f (x, y + 2π) = f (x, y) so that f fails to be injective, and hence f has no
global inverse. But det(Jf (x, y)) = det([ e^x cos y  −e^x sin y ; e^x sin y  e^x cos y ]) = e^{2x} ≠ 0 for every (x, y) ∈ R2.
(ii) For another proof of [115] using Banach fixed point theorem for contraction maps, see Theorem
9.24 in Rudin, Principles of Mathematical Analysis.
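The local-inverse formula Jg(v) = Jf (u)^{-1} from [115](iii) can be checked numerically for this very map. The sketch below is an illustration added here (not part of the original notes); it assumes the explicit local inverse g(u, v) = ((1/2) log(u^2 + v^2), atan2(v, u)), valid on the right half plane, and compares finite-difference Jacobians.

```python
import math

def f(x, y):
    # the map from the remark: f(x, y) = (e^x cos y, e^x sin y)
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def g(u, v):
    # an explicit local inverse of f, valid for u > 0
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def jacobian(F, p, h=1e-6):
    # central-difference approximation of the 2x2 Jacobian of F at p
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        qp, qm = list(p), list(p)
        qp[j] += h
        qm[j] -= h
        Fp, Fm = F(*qp), F(*qm)
        for i in range(2):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

def inv2(J):
    # inverse of a 2x2 matrix
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [[J[1][1] / det, -J[0][1] / det],
            [-J[1][0] / det, J[0][0] / det]]

p = (0.3, 0.2)
Jg = jacobian(g, f(*p))          # Jacobian of the local inverse at v = f(p)
Jf_inv = inv2(jacobian(f, p))    # inverse of the Jacobian of f at p
err = max(abs(Jg[i][j] - Jf_inv[i][j]) for i in range(2) for j in range(2))
# err is tiny: the two matrices agree, as [115](iii) predicts
```

Here g was written down by hand; the theorem guarantees such a g near any point where det(Jf ) ≠ 0, but, as the 2π-periodicity in y shows, it says nothing global.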

Notation: We write (x, y) ∈ Rn+m to mean x = (x1, . . . , xn) ∈ Rn and y = (y1, . . . , ym) ∈ Rm. If
f = (f1, . . . , fm) : Rn+m → Rm is differentiable, let [∂f/∂x](x, y) denote the m × n matrix whose ijth
entry is ∂fi/∂xj(x, y), and [∂f/∂y](x, y) denote the m × m matrix whose ijth entry is ∂fi/∂yj(x, y). The
Jacobian matrix of f is then expressed as a block matrix: Jf (x, y) = [ [∂f/∂x](x, y)  [∂f/∂y](x, y) ]m×(n+m).

To motivate our next result, we ask the following question: if f : Rn+m → Rm is a function,
then from the expression f (x, y) = 0 (where x ∈ Rn and y ∈ Rm ), can we solve y as a function of
x, at least locally? In other words, can we represent the zero set {(x, y) ∈ Rn+m : f (x, y) = 0}, at
least locally, as the graph G(g) of a function g from an open set in Rn to Rm ? This question about
the ‘implicit’ existence of a function g is answered by the Implicit function theorem stated as [116]
below. First let us consider some examples.

Example: (i) Let f : R2 → R be f (x, y) = x + 2y − 3. If f (x, y) = 0, then y = (3 − x)/2.

(ii) Let f : R2 → R be f (x, y) = xy. Here, y cannot be solved as a function of x globally from
‘f (x, y) = 0’ since f (0, y) = 0 for every y ∈ R. However, if x ̸= 0, then we have y = 0 whenever
f (x, y) = 0, and hence y may be thought of as the zero function of x for x ∈ R \ {0}.

(iii) Let f : R2 → R be f (x, y) = x^2 + y^2 − 1. Then {(x, y) : f (x, y) = 0} is the unit circle in R2. The
full circle is not the graph of any function. But any arc of the unit circle which projects injectively
to the x-axis is clearly the graph of a function; for example, {(x, y) : f (x, y) = 0 and y > 0} is the
graph of the function g : (−1, 1) → R defined as g(x) = √(1 − x^2). If an open arc of the unit circle
contains either (1, 0) or (−1, 0), then that arc is not the graph of any function. Here, note that (1, 0)
and (−1, 0) are precisely the points where ∂f/∂y vanishes.
(iv) Let f : Rn+m → Rm be linear. Let L1 : Rn → Rm and L2 : Rm → Rm be L1(x) = f (x, 0)
and L2(y) = f (0, y) so that f (x, y) = L1(x) + L2(y). Now from ‘f (x, y) = 0’, we can solve y as
y = −L2^{-1}(L1(x)) provided L2 is invertible. Let A be the m × (n + m) matrix of the linear map f
with respect to the standard bases of Rn+m and Rm. Then Jf (x, y) = A for every (x, y) ∈ Rn+m.
Note that the (n + j)th column of A is same as the jth column of the matrix of the linear map L2
since L2(y) = f (0, y). Therefore L2 is invertible iff the m × m matrix [∂f/∂y](x, y) is invertible.
[116] [Implicit function theorem] Let U ⊂ Rn+m be open, f : U → Rm be a C1-function, and
(a, b) ∈ U be such that f (a, b) = 0 ∈ Rm and the m × m matrix [∂f/∂y](a, b) is invertible. Then,
there exist r > 0, an open neighborhood A ⊂ Rn of a, and a C1-function g : A → Rm such that
(i) B((a, b), r) ⊂ U, g(a) = b, and {(x, y) ∈ B((a, b), r) : f (x, y) = 0} = {(x, g(x)) : x ∈ A}.
(ii) Jg(x) = −[∂f/∂y](x, g(x))^{-1} [∂f/∂x](x, g(x)) for every x ∈ A.

Proof. (i) Let F : U → Rn+m be F (x, y) = (x, f (x, y)), which is a C1-function with F (a, b) =
(a, 0). As JF (x, y) = [ In 0 ; [∂f/∂x](x, y) [∂f/∂y](x, y) ], we see det(JF (a, b)) = det([∂f/∂y](a, b)) ≠ 0. By the
Inverse function theorem [115], there exist r > 0, an open set V ⊂ Rn+m, and a C1-function
G : V → B((a, b), r) such that B((a, b), r) ⊂ U, det(JF (x, y)) = det([∂f/∂y](x, y)) ≠ 0 for every
(x, y) ∈ B((a, b), r), F |B((a,b),r) : B((a, b), r) → V is a bijective open map, and G is the inverse of
F |B((a,b),r). Moreover, G must be of the form G(x, y) = (x, h(x, y)) since F (x, y) = (x, f (x, y)).

Let A = {x ∈ Rn : (x, 0) ∈ V}, and g : A → Rm be g(x) = Π(G(x, 0)), where Π : Rn+m → Rm
is the projection Π(x, y) = y. Then A is an open neighborhood of a (since (a, 0) = F (a, b) ∈ V), and
g is a C1-function with g(x) = h(x, 0). For (x, y) ∈ U, we observe that (x, y) ∈ B((a, b), r) and
f (x, y) = 0 ⇔ (x, y) ∈ B((a, b), r) and F (x, y) = (x, 0) ⇔ (x, 0) ∈ V and G(x, 0) = (x, y) ⇔ x ∈ A
and g(x) = y. Hence {(x, y) ∈ B((a, b), r) : f (x, y) = 0} = {(x, g(x)) : x ∈ A}, and g(a) = b.

(ii) Let ϕ : A → Rn+m be ϕ(x) = (x, g(x)) = G(x, 0), which is a C1-function with f ◦ ϕ ≡ 0.
We deduce by the Chain rule that 0m×n = Jf◦ϕ(x) = Jf (ϕ(x))Jϕ(x) for every x ∈ A. But
Jf (ϕ(x)) = [ [∂f/∂x](x, g(x))  [∂f/∂y](x, g(x)) ]m×(n+m) and Jϕ(x) = [ In ; Jg(x) ](n+m)×n. Hence we get
0m×n = [∂f/∂x](x, g(x)) + [∂f/∂y](x, g(x)) Jg(x). So, Jg(x) = −[∂f/∂y](x, g(x))^{-1} [∂f/∂x](x, g(x)). □

Remark: The required function g in [116] can be produced in a simpler way when n = m = 1.
Let U ⊂ R2 be open, f : U → R be a C1-function, and (a, b) ∈ U be such that f (a, b) = 0 and
∂f/∂y(a, b) ≠ 0. Replacing f with −f if necessary, assume ∂f/∂y(a, b) > 0. By the C1-property of f, we
may choose δ > 0 with ∂f/∂y > 0 in [a − δ, a + δ] × [b − δ, b + δ]. Then f (x, ·) is strictly increasing
in [b − δ, b + δ] for each x ∈ [a − δ, a + δ]. In particular, f (a, b − δ) < 0 < f (a, b + δ). Therefore,
we may choose ε ∈ (0, δ) such that f (x, b − δ) < 0 < f (x, b + δ) whenever |x − a| < ε. For each
x ∈ (a − ε, a + ε), applying the intermediate value property to the strictly increasing function f (x, ·),
we may find a unique y ∈ (b − δ, b + δ) with f (x, y) = 0. Define g : (a − ε, a + ε) → R as g(x) = y.
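This one-variable picture is easy to simulate. The sketch below is an illustration (not part of the original notes); it assumes f (x, y) = x^2 + y^2 − 1 near (a, b) = (0, 1), solves f (x, y) = 0 for y by Newton's method instead of the bisection implicit in the intermediate value property, and compares the derivative of the resulting g with the formula Jg(x) = −[∂f/∂y]^{-1}[∂f/∂x] from [116](ii).

```python
import math

def f(x, y):
    return x * x + y * y - 1.0

def g(x):
    # solve f(x, y) = 0 for y near b = 1 by Newton's method in the y-variable
    y = 1.0
    for _ in range(30):
        y -= f(x, y) / (2.0 * y)   # 2y = df/dy
    return y

x0 = 0.3
# implicit-function formula: Jg(x) = -(df/dy)^{-1}(df/dx) = -2x/(2y) = -x/g(x)
dg_formula = -x0 / g(x0)
# central-difference derivative of the numerically computed g
h = 1e-6
dg_numeric = (g(x0 + h) - g(x0 - h)) / (2 * h)
# g agrees with sqrt(1 - x^2), and the two derivatives agree
```

Newton's method is just one convenient way to produce the y guaranteed by the monotonicity argument above; any root finder confined to (b − δ, b + δ) would do.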

We mention two examples to point out some subtle aspects of [116]:


Example: (i) Let f : R2 → R be f (x, y) = x^2 − y^2 = (x − y)(x + y). Then f (0, 0) = 0, ∂f/∂y(0, 0) = 0,
and the set {(x, y) : f (x, y) = 0} is the union of the lines x = y and x = −y. If we define g : R → R
as g(x) = x (or, g(x) = −x), then g is a C1-function and f (x, g(x)) = 0 for every x ∈ R. But in every
neighborhood of (0, 0), the set {(x, y) : f (x, y) = 0} contains points that are not on the graph of g.

(ii) Let f : R2 → R be f (x, y) = x^2 − y^3, and g : R → R be g(x) = (x^2)^{1/3}. Then f (0, 0) = 0,
∂f/∂y(0, 0) = 0, and {(x, y) ∈ R2 : f (x, y) = 0} = {(x, g(x)) : x ∈ R}. But g is not differentiable at 0.

6. Tangent spaces and Lagrange’s multiplier method

To find local extrema of a function f restricted to a subset of its original domain (i.e., under
some constraints), a technique called Lagrange’s Multiplier Method (LMM) is often useful. This
method does not pinpoint the local extrema, but helps us to narrow down our search to possible
solutions. First we will explain theoretically why this method works. In this context, we will briefly
mention what a tangent space to a level set is; to learn more about tangent vectors and tangent
spaces, see a textbook on Differential Geometry.

[117] Let U ⊂ Rn+m be open, f : U → Rm be a C 1 -function, c ∈ Rm , and S = f −1 (c) = {(x, y) ∈


U : f (x, y) = c} (which is called a level set of f ). Let p = (a, b) ∈ S be such that the linear
map f ′ (p ; ·) : Rn+m → Rm is surjective, or equivalently, Jf (p) has full rank m. Then for any
v = (v1 , v2 ) ∈ Rn+m , the following are equivalent:
(i) f ′ (p ; v) = 0 ∈ Rm .

(ii) There exist δ > 0 and a C 1 -path α : (−δ, δ) → Rn+m such that α(t) ∈ S for every t ∈ (−δ, δ),
α(0) = p, and α′ (0) = v.

Proof. Replacing f with f − c, we may suppose c = 0 ∈ Rm .


(ii) ⇒ (i): Since f ◦ α : (−δ, δ) → Rm is constant (f ◦ α ≡ c = 0), we get by the Chain rule that
0 = (f ◦ α)′(0) = f ′(α(0); α′(0)) = f ′(p; v).

(i) ⇒ (ii): Since Jf (p) = Jf (a, b) has full rank m, we may suppose after a permutation of the
coordinates of Rn+m that the m × m matrix [∂f/∂y](a, b) is invertible. Then by the Implicit function
theorem, there exist r > 0 with B((a, b), r) ⊂ U, an open neighborhood A ⊂ Rn of a, and a
C1-function g : A → Rm such that
(*) g(a) = b, and S ∩ B((a, b), r) = {(x, y) ∈ B((a, b), r) : f (x, y) = 0} = {(x, g(x)) : x ∈ A}.
(**) Jg(a) = −[∂f/∂y](a, b)^{-1} [∂f/∂x](a, b).
By (i), we have f ′((a, b); (v1, v2)) = 0, or in other words [∂f/∂x](a, b) v1 + [∂f/∂y](a, b) v2 = 0 ∈ Rm.
Hence v2 = −[∂f/∂y](a, b)^{-1} [∂f/∂x](a, b) v1 = Jg(a)v1 by (**), and thus g′(a; v1) = v2. Since A is an
open neighborhood of a, there is δ > 0 with a + tv1 ∈ A for every t ∈ (−δ, δ). Define α : (−δ, δ) →
Rn+m as α(t) = (a + tv1, g(a + tv1)). Then α is a C1-path with α(0) = (a, g(a)) = (a, b) = p, and
α(t) ∈ S for every t ∈ (−δ, δ) by (*). Also, α′(0) = (v1, g′(a; v1)) = (v1, v2) = v. □

Definition: Let U ⊂ Rn+m be open, f : U → Rm be a C 1 -function, c ∈ Rm , and S = f −1 (c). Let


p = (a, b) ∈ S be such that the linear map f ′ (p ; ·) : Rn+m → Rm is surjective, or equivalently,
Jf (p) has full rank m (when m = 1, this condition means simply that ∇f (p) ̸= 0). We say v =
(v1 , v2 ) ∈ Rn+m is a tangent vector to the level set S at p if there is a C 1 -path α : (−δ, δ) → Rn+m
(for some δ > 0) such that α(t) ∈ S for every t ∈ (−δ, δ), α(0) = p, and α′ (0) = v. The set of all
tangent vectors of S at p is called the tangent space of S at p and it is denoted as Tp S.

[118] Let U ⊂ Rn+m be open, f = (f1 , . . . , fm ) : U → Rm be a C 1 -function, c ∈ Rm , and


S = f −1 (c). Let p ∈ S be such that the linear map f ′ (p ; ·) : Rn+m → Rm is surjective, or
equivalently, Jf (p) has full rank m. Then,
(i) The tangent space Tp S is an n-dimensional vector subspace of Rn+m .
(ii) The vectors ∇f1 (p), . . . , ∇fm (p) in Rn+m are linearly independent.
(iii) For each i ∈ {1, . . . , m}, ∇fi(p) ⊥ Tp S, i.e., ⟨∇fi(p), v⟩ = 0 for every v ∈ Tp S (in particular,
if m = 1, then ∇f (p) is normal to Tp S).
(iv) The orthogonal complement of Tp S is equal to the span of the vectors ∇f1(p), . . . , ∇fm(p),
and therefore we have the orthogonal decomposition Rn+m = span{∇fi(p) : 1 ≤ i ≤ m} ⊕ Tp S.

Proof. (i) By [117], Tp S is equal to the kernel of the surjective linear map f ′(p ; ·) : Rn+m → Rm.

(ii) These vectors are the rows of Jf (p), which has rank m since f ′ (p ; ·) : Rn+m → Rm is surjective.

(iii) Let v ∈ Tp S. Then f ′ (p ; v) = (⟨∇f1 (p), v⟩, . . . , ⟨∇fm (p), v⟩) = 0 ∈ Rm by [117]. Or argue as
follows. Choose α as in [117](ii). Since f ◦ α is constant, we have that fi ◦ α is constant for each i.
Hence by Chain rule, 0 = (fi ◦ α)′ (0) = ⟨∇fi (α(0)), α′ (0)⟩ = ⟨∇fi (p), v⟩.

(iv) This follows from (i), (ii), and (iii). 


Example: (i) Let f : Rn+1 → R be f (x) = ∑_{j=1}^{n+1} xj^2, and S = f^{-1}(1), which is the unit sphere in
Rn+1. Let p ∈ S, and note that ∇f (p) = 2p ≠ 0 since p ∈ S. Hence by [118], the tangent space
Tp S is an n-dimensional vector subspace of Rn+1, and every geometric tangent vector to S at p can
be realized as the velocity vector of a C1-path in S as described by [117](ii). Moreover, ∇f (p) is
normal to Tp S at p by [118](iii).
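The orthogonality ∇f (p) ⊥ Tp S is easy to check numerically on the sphere in R3. The sketch below is an illustration (not part of the original notes); it takes the C1-path α(t) = (cos t, sin t, 0) lying in S with α(0) = p = (1, 0, 0), and computes ⟨∇f (p), α′(0)⟩ with a finite-difference velocity.

```python
import math

def alpha(t):
    # a C^1-path lying in the unit sphere S of R^3, with alpha(0) = (1, 0, 0)
    return (math.cos(t), math.sin(t), 0.0)

h = 1e-6
# central-difference velocity vector alpha'(0), a tangent vector to S at p
v = tuple((alpha(h)[i] - alpha(-h)[i]) / (2 * h) for i in range(3))
p = alpha(0.0)
grad = tuple(2.0 * c for c in p)   # grad f(p) = 2p for f(x) = sum of x_j^2
inner = sum(grad[i] * v[i] for i in range(3))
# inner is (numerically) 0: grad f(p) is normal to the tangent space T_p S
```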

(ii) Let f : R2 → R be f (x, y) = xy and S = f −1 (0). Then S is equal to the union of x-axis and
y-axis. Geometrically, the collection of all tangent vectors to S at (0, 0) is equal to S itself, and this
is not a vector subspace of R2. Here, the problem is that ∇f (0, 0) = (0, 0), and therefore the linear
map f ′((0, 0) ; ·) : R2 → R is not surjective; in fact, this linear map is the zero map, and hence
its kernel is the whole of R2 , which is strictly larger than the collection of all geometric tangent
vectors to S at (0, 0).

Now we will state the result justifying Lagrange's Multiplier Method (LMM), and then we will
illustrate its use through several examples.

[119] [LMM theorem] Let U ⊂ Rn+m be open, g : U → R be differentiable, and f = (f1 , . . . , fm ) :


U → Rm be a C 1 -function. Let c ∈ Rm , S = f −1 (c), and p ∈ S be a local extremum of the restricted
function g|S . If the linear map f ′ (p ; ·) : Rn+m → Rm is surjective, or equivalently, if Jf (p) has full

rank m, then there are λ1, . . . , λm ∈ R (called multipliers) such that ∇g(p) = ∑_{i=1}^m λi ∇fi(p).

Proof. We claim that ∇g(p) ⊥ Tp S. To prove the claim, consider v ∈ Tp S. Then by [117],
there is a C 1 -path α : (−δ, δ) → Rn+m such that α(t) ∈ S for every t ∈ (−δ, δ), α(0) = p, and
α′ (0) = v. Then g ◦ α : (−δ, δ) → R is a differentiable function having a local extremum at 0. Hence
0 = (g ◦ α)′ (0) = g ′ (α(0); α′ (0)) = g ′ (p; v) = ⟨∇g(p), v⟩. This proves the claim. By the claim and
[118](iv), it follows that ∇g(p) ∈ span{∇f1 (p), . . . , ∇fm (p)}. 

The Lagrange’s Multiplier Method (LMM) may be explained roughly as follows. Suppose g is a
real-valued function defined on an open subset U of Rn+m , and we need to find the (local) extrema of
g|S , where S := f −1 (c) is the level set of a function f = (f1 , . . . , fm ) : U → Rm . If the assumptions
of [119] are satisfied, then any local extremum p ∈ S of g|S must satisfy the following: (i) f (p) = c,
and (ii) ∇g(p) = ∑_{i=1}^m λi ∇fi(p) for some λ1, . . . , λm ∈ R. Let S0 = {p ∈ U : p satisfies (i) and (ii)}.
Based on the given problem, we may be able to identify the subset S0 of S (often, S0 is a finite
subset). In this manner, LMM helps us to narrow down our search. Next, by examining each
p ∈ S0 , we need to determine using other considerations whether p is a (local) extremum of g|S .

Example: (i) We wish to find the maximum/minimum of x + y subject to the constraint that
x^2 + y^2 = 1. Define g, f : R2 → R as g(x, y) = x + y and f (x, y) = x^2 + y^2, which are C1-functions.
Let S = f^{-1}(1), and note that ∇f (p) = 2p ≠ 0 for every p ∈ S. If p = (a, b) ∈ S is a local
extremum of g|S, then ∇g(p) = λ∇f (p) for some λ ∈ R by [119]. This gives (1, 1) = 2λ(a, b). Also
a^2 + b^2 = 1 since (a, b) ∈ S. Therefore, λ = ±1/√2 and (a, b) = ±(1/√2, 1/√2). LMM gives only this
much information. Now by direct examination, we deduce that x + y attains its maximum and
minimum subject to the constraint x^2 + y^2 = 1 respectively at (1/√2, 1/√2) and (−1/√2, −1/√2).
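The conclusion can also be confirmed by brute force. The sketch below is an illustration (not part of the original notes): it samples the constraint circle through its parametrization (cos t, sin t) and locates the maximum of x + y directly.

```python
import math

# maximize g(x, y) = x + y on the circle x^2 + y^2 = 1 by dense sampling
# of the parametrization t -> (cos t, sin t)
N = 400_000
best_t, best_val = 0.0, float("-inf")
for k in range(N):
    t = 2.0 * math.pi * k / N
    val = math.cos(t) + math.sin(t)
    if val > best_val:
        best_t, best_val = t, val

# the maximum value is sqrt(2), attained near (1/sqrt(2), 1/sqrt(2)),
# matching the point singled out by the multiplier equations
```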
(ii) We wish to find the unique point on the line y = x + 5 at a minimum distance to (2, 1). Define
g, f : R2 → R as g(x, y) = (x − 2)2 + (y − 1)2 , f (x, y) = y − x (which are C 1 -functions), and
put S = f −1 (5). We know geometrically that g|S has a unique minimum. Note that ∇f (x, y) =
(−1, 1) ̸= (0, 0) for every (x, y) ∈ S. Hence if (a, b) ∈ S is where g|S attains its minimum, then
∇g(a, b) = λ∇f (a, b) for some λ ∈ R by [119]. Solving the equations 2((a − 2), (b − 1)) = λ(−1, 1)
and b = a + 5, we get (a, b) = (−1, 4), which must be the required point.

(iii) We wish to maximize x + y + z subject to x − y = 1 and x + z^2 = 1. Let g : R3 → R be
g(x, y, z) = x + y + z and f = (f1, f2) : R3 → R2 be f (x, y, z) = (x − y, x + z^2). Let S = f^{-1}(1, 1).
Note that Jf (x, y, z) = [ 1 −1 0 ; 1 0 2z ] has rank 2 for every (x, y, z) ∈ S. If p = (x, y, z) ∈ S is a local
extremum of g|S, then by [119], ∇g(p) = λ1∇f1(p) + λ2∇f2(p) and f (p) = (1, 1). The first equation
implies λ1 = −1, λ2 = 2, and z = 1/4. Then using f (p) = (1, 1), we get x = 1 − z^2 = 15/16 and
y = x − 1 = −1/16. Thus p = (15/16, −1/16, 1/4), and g(p) = 9/8. Observe that any point q ∈ S
must be of the form q = (1 − z^2, −z^2, z). If |z| > 1, then g(q) = 1 + z − 2z^2 < 0 < g(p). Moreover,
{(x, y, z) ∈ S : |z| ≤ 1} is compact, and g must have a maximum on this compact set. Therefore,
g|S must attain its maximum at p, and max g(S) = g(p) = 9/8.

(iv) We wish to find the maximum volume of a 3-dimensional rectangular box A with surface
area c > 0. Suppose that A = {∑_{j=1}^3 tj ej : 0 ≤ t1 ≤ x; 0 ≤ t2 ≤ y; 0 ≤ t3 ≤ z}, where
x, y, z are positive. Then the volume of A is xyz and the surface area of A is 2(xy + yz + xz). Let
U = {(x, y, z) ∈ R3 : x > 0, y > 0, z > 0}. Define g, f : U → R as g(x, y, z) = xyz and
f (x, y, z) = xy + yz + xz, and put S = f^{-1}(c/2). For (x, y, z) ∈ S, from xy + yz + xz = c/2,
we have xy < c/2 and xz < c/2, and therefore g(x, y, z) = xyz < c^2/(4x) → 0 as x → ∞. Similarly,
g(x, y, z) → 0 when y → ∞, and also when z → ∞, for (x, y, z) ∈ S. Moreover, xyz = 0 if one
of x, y, z is 0. These observations imply that g|S must have a maximum, say at p = (x0, y0, z0).
Since ∇f (p) = (y0 + z0, x0 + z0, x0 + y0) ≠ 0 ∈ R3 by the definition of U, we deduce by [119] that
∇g(p) = λ∇f (p) for some λ ∈ R. Then y0 = λx0/(x0 − λ) = z0, and similarly, x0 = λy0/(y0 − λ) = z0. This
means the volume is maximum when the box is a cube. Using f (p) = c/2, we get x0 = (c/6)^{1/2}
and hence the maximum volume is (c/6)^{3/2}.

(v) Let n ≥ 2 and A ∈ Rn×n be a symmetric matrix. We will show A has a real eigenvalue using
LMM. Define g, f : Rn → R as g(x) = ⟨Ax, x⟩, f (x) = ∥x∥2 = ⟨x, x⟩ (which are C 1 -functions),
and put S = f −1 (1), which is the unit sphere in Rn . By Exercise-10(iii) and the symmetry of
A, we have ⟨∇g(x), v⟩ = g ′ (x; v) = ⟨Ax, v⟩ + ⟨x, Av⟩ = 2⟨Ax, v⟩, and hence ∇g(x) = 2Ax. Also,
∇f (x) = 2x ̸= 0 for x ∈ S. Since S is compact, there is x ∈ S where g|S attains its maximum. By
[119], we get ∇g(x) = λ∇f (x) for some λ ∈ R, which implies Ax = λx. Then λ is an eigenvalue of
A because x is a unit vector (∵ x ∈ S). We remark that this type of argument can be continued to
show A is diagonalizable, and all its eigenvalues are real (for the next step, take the unit sphere in
Y := {y ∈ Rn : ⟨x, y⟩ = 0} in the place of S).
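For a concrete 2 × 2 instance of (v), maximizing the quadratic form ⟨Ax, x⟩ over the unit circle by sampling recovers the largest eigenvalue, and the maximizer is approximately an eigenvector. The sketch below is an illustration with a sample matrix A chosen here (not part of the original notes).

```python
import math

A = [[2.0, 1.0], [1.0, 3.0]]      # a sample symmetric matrix
N = 200_000
best_q, best_x = float("-inf"), (1.0, 0.0)
for k in range(N):
    t = 2.0 * math.pi * k / N
    x = (math.cos(t), math.sin(t))              # a point of the unit circle S
    Ax = (A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1])
    q = Ax[0]*x[0] + Ax[1]*x[1]                 # g(x) = <Ax, x>
    if q > best_q:
        best_q, best_x = q, x

# best_q approximates the largest eigenvalue (5 + sqrt(5))/2 of A, and
# A best_x is approximately best_q * best_x, i.e. best_x is almost an eigenvector
Ax = (A[0][0]*best_x[0] + A[0][1]*best_x[1], A[1][0]*best_x[0] + A[1][1]*best_x[1])
residual = math.hypot(Ax[0] - best_q*best_x[0], Ax[1] - best_q*best_x[1])
```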
∏n 2
Exercise-17: (i) Maximize ∏_{j=1}^n xj^2 subject to ∑_{j=1}^n xj^2 = 1.
(ii) [Geometric mean ≤ Arithmetic mean] If a1, . . . , an ∈ (0, ∞), then (∏_{j=1}^n aj)^{1/n} ≤ (∑_{j=1}^n aj)/n.
[Hint: (i) Let g(x1, . . . , xn) = ∏_{j=1}^n xj^2, f (x1, . . . , xn) = ∑_{j=1}^n xj^2, and S = f^{-1}(1). Then g has a
positive maximum on the compact set S, say at p = (x1, . . . , xn), where xj ≠ 0 for every j. Since
∇f (p) = 2p ≠ 0, LMM gives ∇g(p) = λ∇f (p) for some λ ∈ R. This implies g(x) = λxj^2 for every
j, and hence ng(x) = ∑_{j=1}^n λxj^2 = λ. Therefore, g(x) = ng(x)xj^2, or xj^2 = 1/n for every j. Thus
g(p) = 1/n^n. (ii) Let bj = √aj and b = (b1, . . . , bn). Then 1/n^n = g(p) ≥ g(b/∥b∥) = ∏_{j=1}^n aj / (∑_{j=1}^n aj)^n.]
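A quick random spot-check of the AM-GM inequality from (ii); this is an illustration added here (not part of the original notes), with sample sizes and ranges chosen arbitrarily.

```python
import math
import random

# spot-check: (a1 * ... * an)^(1/n) <= (a1 + ... + an)/n for random positive inputs
random.seed(1)
for _ in range(1000):
    a = [random.uniform(0.01, 10.0) for _ in range(5)]
    gm = math.exp(sum(math.log(x) for x in a) / len(a))   # geometric mean
    am = sum(a) / len(a)                                  # arithmetic mean
    assert gm <= am + 1e-12, (gm, am)
# equality holds exactly when all the a_j coincide
```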

7. Multivariable Riemann integration over a box



Definition: (i) We say A ⊂ Rn is an n-box if A is a product of n closed intervals, A = ∏_{j=1}^n [aj, bj].
Its n-dimensional volume (Lebesgue measure) is µ(A) = µn(A) := ∏_{j=1}^n (bj − aj). If there is r > 0
with bj − aj = r for 1 ≤ j ≤ n, then we say A is an n-cube with side length r; in this case, µ(A) = r^n.

(ii) Recall from Real analysis that P = {a0 ≤ a1 ≤ · · · ≤ ak} is a partition of [a, b] if a0 = a
and ak = b. We say P = ∏_{j=1}^n Pj is a partition of an n-box A = ∏_{j=1}^n [aj, bj] if Pj is a partition
of [aj, bj] for each j ∈ {1, . . . , n}. Observe that if Pj = {aj,0 ≤ aj,1 ≤ · · · ≤ aj,kj}, then the
partition P = ∏_{j=1}^n Pj divides the n-box A into k := ∏_{j=1}^n kj sub n-boxes D1, . . . , Dk with pairwise
disjoint interiors, and µ(A) = ∑_{i=1}^k µ(Di). In this case, we define the norm (or mesh) ∥P∥ of P as
∥P∥ = max{diam(Di) : 1 ≤ i ≤ k}.
(iii) Let P = ∏_{j=1}^n Pj and Q = ∏_{j=1}^n Qj be two partitions of an n-box A ⊂ Rn. If Pj ⊂ Qj for
1 ≤ j ≤ n, then we say Q is a refinement of P.

Definition: Let A ⊂ Rn be an n-box and f : A → R be a bounded function.


(i) Let P be a partition of A and D1, . . . , Dk be the sub n-boxes of A determined by P. The
lower Riemann sum L(f, P) and upper Riemann sum U(f, P) of f with respect to P are defined as
L(f, P) = ∑_{i=1}^k (inf f (Di))µ(Di) and U(f, P) = ∑_{i=1}^k (sup f (Di))µ(Di). As in one-variable theory,
we may show L(f, P) ≤ L(f, Q) ≤ U(f, Q) ≤ U(f, P) whenever Q is a partition of A refining P,
and consequently L(f, P) ≤ U(f, P′) for any two partitions P, P′ of A.

(ii) The lower Riemann integral L(f) and upper Riemann integral U(f) of f over A are defined as
L(f) = sup{L(f, P) : P is a partition of A} and U(f) = inf{U(f, P) : P is a partition of A}. By
the last sentence of (i) above, we have that L(f) ≤ U(f) always. If L(f) = U(f) =: y ∈ R, then
we say f is Riemann integrable over A and we write ∫A f (x)dx = y (or simply ∫A f dx = y).

(iii) Let P be a partition of A, and D1, . . . , Dk be the sub n-boxes of A determined by P. A
finite set T = {t1, . . . , tk} ⊂ A is called a tag for P if ti ∈ Di for 1 ≤ i ≤ k. If this holds, then
S(f, P, T) := ∑_{i=1}^k f (ti)µ(Di) is called a Riemann sum of f with respect to the partition P. Clearly,
L(f, P) ≤ S(f, P, T) ≤ U(f, P). Also note that L(f, P) = inf{S(f, P, T) : T is a tag for P}, and
U(f, P) = sup{S(f, P, T) : T is a tag for P}.
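For a concrete feel for these sums, the sketch below (an illustration added here, not part of the original notes) takes f (x, y) = xy on the 2-box A = [0, 1]^2 with the uniform partition into k^2 sub-boxes. Since f is increasing in each variable, the inf and sup over each sub-box occur at its lower-left and upper-right corners, and ∫A f dx = 1/4.

```python
def f(x, y):
    return x * y

def riemann_sums(k):
    # lower sum, upper sum, and a tagged sum (midpoint tags) on [0,1]^2
    h = 1.0 / k
    L = U = S = 0.0
    for i in range(k):
        for j in range(k):
            mu = h * h                                  # volume of sub-box D_ij
            L += f(i * h, j * h) * mu                   # inf of f on D_ij
            U += f((i + 1) * h, (j + 1) * h) * mu       # sup of f on D_ij
            S += f((i + 0.5) * h, (j + 0.5) * h) * mu   # a Riemann sum
    return L, U, S

L, U, S = riemann_sums(50)
# L <= S <= U, and all three approach the integral 1/4 as k grows
```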

Notation: Let R(A) denote the collection of all Riemann integrable functions f : A → R.

The results about Riemann integration in the multivariable case are analogous to the results in
the one-variable case. Some of the results are stated below and the proofs are left to the student3.

Exercise-18: Let A ⊂ Rn be an n-box and f : A → R be a bounded function.

(i) Let P be a partition of A and D1, . . . , Dk be the sub n-boxes of A determined by P. Then
U(f, P) − L(f, P) = ∑_{i=1}^k diam(f (Di))µ(Di).
(ii) f ∈ R(A) ⇔ for every ε > 0, there is a partition P of A with 0 ≤ U(f, P) − L(f, P) ≤ ε.
(iii) If f is continuous, then f ∈ R(A).
(iv) If there is c ∈ R such that f (x) = c for every x ∈ int(A), then f ∈ R(A) and ∫A f dx = cµ(A).
In particular, ∫A 0 dx = 0.
(v) If µ(A) = 0, then f ∈ R(A) and ∫A f dx = 0.
[Hint: (iii) Consider ε > 0. Since f is uniformly continuous on the compact set A, there is δ > 0
such that |f (x) − f (y)| < ε/(µ(A) + 1) for every x, y ∈ A with ∥x − y∥ < δ. Let P be a partition of A
³See my notes Real Analysis for the proofs in the one-variable case.
with ∥P∥ < δ, and let D1, . . . , Dk be the sub n-boxes of A determined by P. Then diam(f (Di)) ≤
ε/(µ(A) + 1) for every i by the choice of δ. Now (i) and (ii) may be applied.]
Exercise-19: Let A ⊂ Rn be an n-box and f ∈ R(A). Then for every ε > 0, there is a δ > 0 such
that the following are true for every partition P of A with ∥P∥ < δ:
(i) 0 ≤ U(f, P) − L(f, P) ≤ ε and L(f, P) ≤ ∫A f dx ≤ U(f, P).
(ii) |∫A f dx − S(f, P, T)| ≤ ε for any Riemann sum S(f, P, T) of f with respect to P.
Consequently, for any f ∈ R(A) and any sequence (Pk) of partitions of A with (∥Pk∥) → 0, the
following are true: ∫A f dx = lim_{k→∞} L(f, Pk) = lim_{k→∞} U(f, Pk) = lim_{k→∞} S(f, Pk, Tk) for any
choice of tags Tk of Pk.
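The last statement can be watched numerically. The sketch below is an illustration (not part of the original notes): for f (x, y) = x^2 y on [0, 1]^2, whose integral is 1/6, tagged Riemann sums over uniform partitions with shrinking mesh, here with upper-right-corner tags, approach the integral.

```python
def tagged_sum(f, n):
    # Riemann sum over the uniform n x n partition of [0,1]^2,
    # tagging each sub-box by its upper-right corner
    h = 1.0 / n
    s = 0.0
    for i in range(n):
        for j in range(n):
            s += f((i + 1) * h, (j + 1) * h) * h * h
    return s

f = lambda x, y: x * x * y
errors = [abs(tagged_sum(f, n) - 1.0 / 6.0) for n in (4, 16, 64)]
# the errors decrease as the mesh tends to 0
```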

Exercise-20: Let A ⊂ Rn be an n-box and f, g ∈ R(A). Then,

(i) [Linearity] f + g ∈ R(A) and ∫A (f + g)dx = ∫A f dx + ∫A g dx.
Moreover, cf ∈ R(A) and ∫A (cf)dx = c ∫A f dx for every c ∈ R.
(ii) f g ∈ R(A), where f g(x) := f (x)g(x).
(iii) If g is non-vanishing in A and 1/g is bounded, then 1/g, f /g ∈ R(A).
(iv) If f ≥ 0, then ∫A f dx ≥ 0. If f ≤ g, then ∫A f dx ≤ ∫A g dx.
(v) If s, t ∈ R are such that s ≤ f ≤ t, then sµ(A) ≤ ∫A f dx ≤ tµ(A).
(vi) |f| ∈ R(A) and |∫A f dx| ≤ ∫A |f| dx.
(vii) max{f, g}, min{f, g} ∈ R(A). So, f⁺ := max{f, 0} and f⁻ := − min{f, 0} belong to R(A).

Exercise-21: Let A ⊂ Rn be an n-box. (i) Let f : A → R be a bounded function, and D1, . . . , Dk
be the sub n-boxes of A determined by some partition P of A. Then f ∈ R(A) ⇔ f |Di ∈ R(Di)
for 1 ≤ i ≤ k. Moreover, when both sides hold, then ∫A f dx = ∑_{i=1}^k ∫Di f dx.
(ii) Let f : A → R be a bounded function with only finitely many discontinuities. Then f ∈ R(A).
(iii) If f ∈ R(A) and g : A → R is a bounded function such that {x ∈ A : f (x) ≠ g(x)} is finite,
then g ∈ R(A) and ∫A g dx = ∫A f dx.

Exercise-22: Let A ⊂ Rn be an n-box. (i) [Mean value theorem of Riemann integration] If f : A → R
is continuous, there is a ∈ A with f (a)µ(A) = ∫A f dx.
(ii) Let (fm) be a sequence in R(A) converging uniformly to a function f : A → R. Then f ∈ R(A)
and ∫A f dx = lim_{m→∞} ∫A fm dx.
(iii) If f : A → [u, v] is Riemann integrable and g : [u, v] → R is continuous, then g ◦ f ∈ R(A).
(iv) If f ∈ R(A) and f ≥ 0, then f^{1/k} ∈ R(A) for every k ∈ N.
∫ ∫
[Hint: (i) Assume µ(A) > 0. Let u = min f (A) and v = max f (A). Then ∫A u dx ≤ ∫A f dx ≤
∫A v dx, or u ≤ (∫A f dx)/µ(A) ≤ v. Choose x, y ∈ A with f (x) = u and f (y) = v. Let α : [0, 1] → A be
α(t) = x + t(y − x). Applying the intermediate value property to f ◦ α, find t0 ∈ [0, 1] with f (α(t0)) =
(∫A f dx)/µ(A). Take a = α(t0). (ii) Verify that f is bounded. Now let d∞ be the supremum metric
on the collection of all bounded real-valued functions on A. Let C > max{3, 3µ(A)}. Given ε > 0,
choose k ∈ N such that d∞(f, fm) < ε/C for every m ≥ k. Since fk ∈ R(A), there is a partition P
of A with U(fk, P) − L(fk, P) ≤ ε/C. Since U(f, P) ≤ U(fk, P) + εµ(A)/C and L(f, P) ≥ L(fk, P) − εµ(A)/C,
it follows by the choice of C that U(f, P) − L(f, P) ≤ ε. Hence f ∈ R(A). Moreover, for every
m ≥ k we have |∫A f dx − ∫A fm dx| ≤ ∫A |f − fm| dx ≤ ∫A (ε/C)dx ≤ ε. (iii) Approximate g with
polynomials by the Weierstrass approximation theorem, and use (ii).]

For further discussion of Riemann integration, we need the notion of a null set (a set of Lebesgue
measure zero) in Rn - defined below - and some of the results pertaining to null sets.

Definition: We say X ⊂ Rn is a null set (or a set of Lebesgue measure zero) in Rn if for every
ε > 0, there is a sequence (Ak) of n-boxes with X ⊂ ∪_{k=1}^∞ Ak and ∑_{k=1}^∞ µ(Ak) ≤ ε. For example,
if A ⊂ Rn is an n-box, then ∂A is a null set (because each face of A is an n-box of zero volume).

Exercise-23: Let X ⊂ Rn. (i) If X is a null set, then every subset of X is a null set.
(ii) X is a null set ⇔ for every ε > 0, there is a sequence (Ak) of n-boxes in Rn such that
X ⊂ ∪_{k=1}^∞ int(Ak) and ∑_{k=1}^∞ µ(Ak) ≤ ε ⇔ for every ε > 0, there is a sequence (Ak) of n-cubes in
Rn such that X ⊂ ∪_{k=1}^∞ Ak and ∑_{k=1}^∞ µ(Ak) ≤ ε.
(iii) If X is a compact null set, then for every ε > 0, there are finitely many n-boxes A1, . . . , Ap in
Rn with X ⊂ ∪_{k=1}^p int(Ak) and ∑_{k=1}^p µ(Ak) ≤ ε.
(iv) If X is equal to a countable union of null sets in Rn, then X is a null set.
(v) If X is countable, then X is a null set by (iv) because every singleton is a null set.

Exercise-24: (i) Let X ⊂ Rn be compact and f : X → Rm be continuous. Then its graph G(f ) is
a null set in Rm+n .
(ii) Let f : Rn → Rm be continuous. Then its graph G(f ) is a null set in Rm+n .
[Hint: (i) Let A be an n-box with X ⊂ A and µn(A) > 0. Consider ε > 0. Choose ε0 > 0
with (2ε0)^m < ε/µn(A), and let δ > 0 be such that ∥f (x) − f (y)∥ < ε0 for every x, y ∈ X with
∥x − y∥ < δ. Let P be a partition of A with ∥P∥ < δ. Then A gets divided into finitely many sub
n-boxes of diameter < δ. Let D1, . . . , Dk be a listing of those sub n-boxes intersecting X. Since
diam(f (Di ∩ X)) ≤ ε0, there is an m-cube Ei ⊂ Rm of side length 2ε0 with f (Di ∩ X) ⊂ Ei.
Note that µm(Ei) = (2ε0)^m < ε/µn(A). Now G(f) ⊂ ∪_{i=1}^k (Di × Ei), and ∑_{i=1}^k µn+m(Di × Ei) =
∑_{i=1}^k µn(Di)µm(Ei) ≤ (∑_{i=1}^k µn(Di)) · ε/µn(A) ≤ µn(A) · ε/µn(A) = ε. (ii) Write Rn as a countable
union of compact sets, and use (i) and Exercise-23(iv).]

[120] Let U ⊂ Rn be a nonempty open set. Then, (i) There is a sequence (Kj) of compact sets in
Rn such that U = ∪_{j=1}^∞ Kj and Kj ⊂ int(Kj+1) for every j ∈ N.
(ii) In addition, we may choose Kj ’s in such a way that each Kj is a finite union of n-cubes with
pairwise disjoint interiors.

Proof. (i) If U = Rn, let Kj = B̄(0, j), the closed ball. Else, let Kj = B̄(0, j) ∩ {x ∈ Rn : dist(x, Rn \ U) ≥ 1/j}.

(ii) Choose Kj's as in (i). Now fix j ∈ N. Choose δ > 0 such that the δ-neighborhood Nδ(Kj) :=
{x ∈ Rn : dist(x, Kj) < δ} of Kj is included in int(Kj+1). Let A be an n-cube containing Kj, and
P be a partition of A with ∥P∥ < δ. The partition P divides A into sub n-boxes. Let Y1, . . . , Yk
be a listing of those sub n-boxes of A intersecting Kj, and put Ej = ∪_{i=1}^k Yi. Then Ej is compact,
and Kj ⊂ Ej ⊂ Nδ(Kj) ⊂ int(Kj+1). Carry out this construction for each j. Then U = ∪_{j=1}^∞ Ej
and Ej ⊂ int(Ej+1). Thus the new collection (Ej) of compact sets satisfies the requirement. □

[121] Let U ⊂ Rn be open, and f : U → Rn be a C^1-function.
(i) Let A ⊂ U be a compact convex set, and λ = sup{∥f′(c; ·)∥ : c ∈ A}, where ∥f′(c; ·)∥ denotes the operator norm of f′(c; ·). Then λ < ∞, and f|A is λ-Lipschitz, i.e., ∥f(b) − f(a)∥ ≤ λ∥b − a∥ for every a, b ∈ A. Moreover, if D ⊂ A is an n-cube with side-length δ, then f(D) is contained in an n-cube of side-length 2δλ√n.
(ii) If X ⊂ U is a null set in Rn, then f(X) is also a null set in Rn.
Proof. (i) We have λ < ∞ since f is a C^1-function and A is compact. Consider a, b ∈ A. Then the line segment [a, b] ⊂ A because A is convex. By the Mean value inequality [109](iii), we see that ∥f(b) − f(a)∥ ≤ λ∥b − a∥. Next, consider an n-cube D ⊂ A with side-length δ. Then diam(D) = δ√n, and hence diam(f(D)) ≤ δλ√n. Consequently, f(D) can be put inside some n-cube of side-length 2δλ√n.
(ii) First suppose there is an n-cube A with X ⊂ A ⊂ U. Since U is open, by enlarging A a little bit we may suppose that there is δ > 0 such that the δ-neighborhood Nδ(X) of X is included in A. By part (i), f|A is λ-Lipschitz for some λ > 0. Consider ε > 0, and choose ε0 > 0 with (2λ√n)^n ε0 ≤ ε. By Exercise-23(ii), there is a sequence (Ck) of n-cubes with X ⊂ ∪_{k=1}^∞ Ck and ∑_{k=1}^∞ µ(Ck) ≤ ε0. By partitioning the Ck's into smaller cubes if necessary, we may suppose Ck ⊂ Nδ(X) ⊂ A for every k ∈ N. Let δk be the side-length of Ck. By (i), there is an n-cube Ek ⊂ Rn of side-length 2δkλ√n with f(Ck) ⊂ Ek. Hence f(X) ⊂ ∪_{k=1}^∞ Ek and ∑_{k=1}^∞ µ(Ek) = ∑_{k=1}^∞ (2δkλ√n)^n = (2λ√n)^n ∑_{k=1}^∞ µ(Ck) ≤ (2λ√n)^n ε0 ≤ ε. Thus f(X) is a null set.
In the general case, using [120], first write U as a countable union of n-cubes, U = ∪_{i=1}^∞ Ai. By what is proved above, f(X ∩ Ai) is a null set for each i ∈ N. Since a countable union of null sets is again a null set, it follows that f(X) = ∪_{i=1}^∞ f(X ∩ Ai) is also a null set. □
Definition: Let X ⊂ Rn and f : X → R be a function. The oscillation ω(f, x) of f at a point x ∈ X is defined as ω(f, x) = inf_{δ>0} diam(f(X ∩ B(x, δ))) = lim_{δ→0} diam(f(X ∩ B(x, δ))). It is easy to verify that f is continuous at a point x ∈ X iff ω(f, x) = 0.
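Since the diameters diam(f(X ∩ B(x, δ))) decrease as δ shrinks, ω(f, x) can be approximated by sampling f on small balls. The Python sketch below is our own illustration (the step function, the radii, and the sampling resolution are arbitrary choices, not from the text):

```python
# Numerical sketch of oscillation: omega(f, x) = inf_{delta>0} diam(f(X ∩ B(x, delta))).
# The step function, the radii, and the sampling resolution are illustrative choices.

def oscillation(f, x, X=(-1.0, 1.0), deltas=(0.5, 0.1, 0.01, 0.001), samples=10001):
    """Approximate inf over delta of diam(f(X ∩ B(x, delta))) by sampling each ball."""
    diams = []
    for d in deltas:
        lo, hi = max(X[0], x - d), min(X[1], x + d)
        vals = [f(lo + (hi - lo) * k / (samples - 1)) for k in range(samples)]
        diams.append(max(vals) - min(vals))
    return min(diams)  # the diameters decrease as delta shrinks

step = lambda t: 1.0 if t > 0 else 0.0

print(oscillation(step, 0.0))  # jump of size 1 at 0, so the estimate is 1.0
print(oscillation(step, 0.5))  # step is continuous at 0.5, so the estimate is 0.0
```

In line with the definition, the estimate is positive exactly at the jump point, where the function is discontinuous.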

[122] Let A ⊂ Rn be an n-box, f : A → R be a bounded function, and X = {x ∈ A : f is not continuous at x}. Then,
(i) [Lebesgue’s criterion for Riemann integrability] f ∈ R(A) ⇔ X is a null set in Rn.
(ii) If X is countable, then f ∈ R(A).
Proof. (i) ⇒: Note that X = {x ∈ A : ω(f, x) > 0} = ∪_{q=1}^∞ Xq, where Xq := {x ∈ A : ω(f, x) ≥ 1/q}. Since a countable union of null sets is a null set, it suffices to show each Xq is a null set in Rn. So fix q ∈ N and consider ε > 0. Since f ∈ R(A), there is a partition P of A with U(f, P) − L(f, P) ≤ ε/(2q). Let D1, . . . , Dk be the sub n-boxes of A determined by P, and Γ = {1 ≤ i ≤ k : Xq ∩ int(Di) ≠ ∅}. Write Xq = X′q ∪ X′′q, where X′q = ∪_{i∈Γ}(Xq ∩ int(Di)) and X′′q = Xq ∩ (∪_{i=1}^k ∂Di). Choose finitely many n-boxes E1, . . . , Em such that ∪_{i=1}^k ∂Di ⊂ ∪_{j=1}^m Ej and ∑_{j=1}^m µ(Ej) ≤ ε/2 (in fact, we may choose the Ej's to be the (n − 1)-dimensional faces of the Di's, and then µ(Ej) = 0 for each j). If i ∈ Γ, then Xq ∩ int(Di) ≠ ∅, and therefore diam(f(Di)) ≥ 1/q. Hence ε/(2q) ≥ U(f, P) − L(f, P) ≥ ∑_{i∈Γ} diam(f(Di))µ(Di) ≥ (1/q) ∑_{i∈Γ} µ(Di), which implies ∑_{i∈Γ} µ(Di) ≤ ε/2. Thus Xq = X′q ∪ X′′q ⊂ (∪_{i∈Γ} Di) ∪ (∪_{j=1}^m Ej) and ∑_{i∈Γ} µ(Di) + ∑_{j=1}^m µ(Ej) ≤ ε/2 + ε/2 = ε.
(i) ⇐: Let ε > 0 be given. We need to find a partition P of A with U(f, P) − L(f, P) ≤ ε. Let M > |f| and ε0 = ε/(2M + µ(A)). Since X is a null set, there is a sequence (Aj) of n-boxes with X ⊂ ∪_{j=1}^∞ int(Aj) and ∑_{j=1}^∞ µ(Aj) ≤ ε0 by Exercise-23(ii). For each a ∈ A \ X, choose (by the continuity of f at a) an n-box Ea such that a ∈ int(Ea) and diam(f(Ea)) < ε0. Then {int(Aj) : j ∈ N} ∪ {int(Ea) : a ∈ A \ X} is an open cover for the compact set A. Hence there exist m ∈ N and a1, . . . , am ∈ A such that {int(Aj) : 1 ≤ j ≤ m} ∪ {int(Eaj) : 1 ≤ j ≤ m} is an open cover for A. Let A′j = A ∩ Aj and E′j = A ∩ Eaj. Then we have A = (∪_{j=1}^m A′j) ∪ (∪_{j=1}^m E′j), ∑_{j=1}^m µ(A′j) ≤ ε0, and diam(f(E′j)) < ε0 for 1 ≤ j ≤ m. Write each A′j and each E′j as products of closed intervals, and use the endpoints of those closed intervals to define a partition P of A. Then the sub n-boxes D1, . . . , Dk of A determined by P satisfy the following: for each i ∈ {1, . . . , k}, there is j ∈ {1, . . . , m} such that either Di ⊂ A′j or Di ⊂ E′j. Let Γ1 = {1 ≤ i ≤ k : Di ⊂ A′j for some j} and Γ2 = {1 ≤ i ≤ k : Di ⊂ E′j for some j}. Then U(f, P) − L(f, P) ≤ ∑_{i∈Γ1} diam(f(Di))µ(Di) + ∑_{i∈Γ2} diam(f(Di))µ(Di) ≤ 2M ∑_{i∈Γ1} µ(Di) + ε0 ∑_{i∈Γ2} µ(Di) ≤ 2M ∑_{j=1}^m µ(A′j) + ε0 ∑_{i=1}^k µ(Di) ≤ 2Mε0 + ε0µ(A) = ε.
(ii) This is a corollary of (i) because any countable set in Rn is a null set. □
Remark: (i) Let A ⊂ Rn be an n-box. For a bounded function f : A → R, let Xf = {x ∈ A : f is not continuous at x}. Then Xf+g ⊂ Xf ∪ Xg and Xfg ⊂ Xf ∪ Xg. Combining this observation with [122](i) gives another proof of the fact that f + g, fg ∈ R(A) whenever f, g ∈ R(A). Similarly, we can give another reasoning for Exercise-22(iii) using [122](ii) because Xg◦f ⊂ Xf when g is continuous.
(ii) We wish to point out that the continuity of g is necessary in Exercise-22(iii). Let f : [0, 1] → [0, 1]
be f (0) = 1, f (x) = 0 if x is irrational, and f (p/q) = 1/q if p, q ∈ N are coprime with p ≤ q. Then
{x ∈ [0, 1] : f is not continuous at x} = [0, 1] ∩ Q, which is a countable set, and hence f is Riemann
integrable by [122]. Let g : [0, 1] → R be g(0) = 0 and g(x) = 1 if x > 0. Then g is also Riemann
integrable by [122]. But g ◦ f : [0, 1] → R is the indicator function 1[0,1]∩Q of [0, 1] ∩ Q, which is
discontinuous at every point of [0, 1]. Since [0, 1] is not a null set, we see by [122](i) (or by directly
calculating U (g ◦ f, P ) and L(g ◦ f, P )) that g ◦ f is not Riemann integrable.

Exercise-25: Let A ⊂ Rn be an n-box. (i) Let f ∈ R(A) and Y = {y ∈ A : f(y) ≠ 0}. If Y is a null set, then ∫_A f dx = 0. Conversely, if f ≥ 0 and ∫_A f dx = 0, then Y is a null set.
(ii) Let f, g ∈ R(A). If {x ∈ A : f(x) ≠ g(x)} is a null set in Rn, then ∫_A f dx = ∫_A g dx. In particular, if g(x) = f(x) for every x ∈ int(A), then ∫_A f dx = ∫_A g dx.
(iii) Let f ∈ R(A). If g : A → R is a bounded function such that {x ∈ A : f(x) ≠ g(x)} is a null set in Rn, should g be Riemann integrable?
(iv) Let f ∈ R(A). If g : A → R is a bounded function such that {x ∈ A : f(x) ≠ g(x)} is a closed null set in Rn, then g ∈ R(A) and ∫_A g = ∫_A f.
[Hint: (i) Assume Y is a null set. Let D1, . . . , Dk be the sub n-boxes of A determined by a partition P of A. If µ(Di) > 0, then Di cannot be a subset of Y, and hence inf f(Di) ≤ 0 ≤ sup f(Di). Therefore L(f, P) ≤ 0 ≤ U(f, P). This is true for every partition P. Since f ∈ R(A), we must have ∫_A f dx = 0. For the converse part, we may suppose µ(A) > 0. Let X = {x ∈ A : f is not continuous at x}. By [122], it suffices to show Y ⊂ X. Consider y ∈ Y. Then f(y) > 0. If y ∉ X, there is δ > 0 such that f(z) > f(y)/2 for every z ∈ A with ∥y − z∥ < δ. Let P be a partition of A with ∥P∥ < δ, and D1, . . . , Dk be the sub n-boxes of A determined by P. There is i0 such that µ(Di0) > 0 and y ∈ Di0. Then inf(f(Di0)) ≥ f(y)/2, and inf(f(Di)) ≥ 0 for every i, so that ∫_A f dx ≥ L(f, P) ≥ inf(f(Di0))µ(Di0) ≥ f(y)µ(Di0)/2 > 0, a contradiction. (ii) Apply (i) to f − g. (iii) Need not be. Let f, g : [0, 1] → R be f ≡ 0 and g = 1[0,1]∩Q. (iv) {x ∈ A : g is not continuous at x} ⊂ {x ∈ A : f is not continuous at x} ∪ {x ∈ A : f(x) ≠ g(x)}.]

Remark: Those who know Lebesgue measure theory can give an easier proof for the converse part

of Exercise-25(i) by writing Y = ∞ k=1 Yk , where Yk := {y ∈ A : f (y) ≥ 1/k}, and noting that
∫ ∫ ∫
A f dx = A f dµ ≥ Yk f dµ ≥ µ(Yk )/k.
Seminar topic: Let A ⊂ R2 be a 2-box (rectangle), and f : A → R be a bounded function which is monotone in each variable separately. Then f ∈ R(A) (see Proposition 5.12 in Ghorpade and Limaye, A Course in Multivariable Calculus and Analysis).

8. Iterated integrals and Fubini’s theorem
Exercise-26: Let A = A1 × A2 ⊂ Rn+m be an (n+m)-box, where A1 ⊂ Rn is an n-box and A2 ⊂ Rm is an m-box. Write (x, y) ∈ Rn+m to mean x ∈ Rn and y ∈ Rm. Let f : A → R be a bounded function. Let fL,1, fU,1 : A1 → R be defined as fL,1(x) = L(f(x, ·)) and fU,1(x) = U(f(x, ·)), the lower and upper integrals of f(x, ·) over A2. Similarly, let fL,2, fU,2 : A2 → R be fL,2(y) = L(f(·, y)) and fU,2(y) = U(f(·, y)), the lower and upper integrals of f(·, y) over A1. Let P = P1 × P2 be a partition of A, where P1 is a partition of A1 and P2 is a partition of A2. Then,
(i) L(f, P) ≤ L(fL,1, P1) and U(fU,1, P1) ≤ U(f, P).
(ii) L(f, P) ≤ L(fL,2, P2) and U(fU,2, P2) ≤ U(f, P).
[Hint: (i) Let R1, . . . , Rr be the sub n-boxes of A1 determined by P1, and S1, . . . , Ss be the sub m-boxes of A2 determined by P2. Then the Ri × Sj for 1 ≤ i ≤ r and 1 ≤ j ≤ s are the sub (n + m)-boxes of A determined by P. Fix i ∈ {1, . . . , r} and x ∈ Ri. Then for any j, inf f(Ri × Sj) ≤ inf{f(x, y) : y ∈ Sj}. Multiplying both sides with µm(Sj) and summing over j, we get ∑_{j=1}^s inf f(Ri × Sj)µm(Sj) ≤ L(f(x, ·), P2) ≤ L(f(x, ·)) = fL,1(x). Since this is true for every x ∈ Ri, we see that ∑_{j=1}^s inf f(Ri × Sj)µm(Sj) ≤ inf fL,1(Ri). Multiplying both sides with µn(Ri) and summing over i, we obtain L(f, P) ≤ L(fL,1, P1).]

[123] [Fubini’s theorem] Let A = A1 × A2 ⊂ Rn+m be an (n + m)-box, where A1 ⊂ Rn is an n-box and A2 ⊂ Rm is an m-box. Write (x, y) ∈ Rn+m to mean x ∈ Rn and y ∈ Rm.
(i) Let f ∈ R(A). Then the functions x ↦ L(f(x, ·)) and x ↦ U(f(x, ·)) are integrable over A1, the functions y ↦ L(f(·, y)) and y ↦ U(f(·, y)) are integrable over A2, and
∫_A f = ∫_{A1} L(f(x, ·))dx = ∫_{A1} U(f(x, ·))dx = ∫_{A2} L(f(·, y))dy = ∫_{A2} U(f(·, y))dy.
(ii) Let f ∈ R(A). Suppose that f(x, ·) ∈ R(A2) for each x ∈ A1 and f(·, y) ∈ R(A1) for each y ∈ A2. Then the iterated integrals ∫_{A1}(∫_{A2} f(x, y)dy)dx and ∫_{A2}(∫_{A1} f(x, y)dx)dy exist and
∫_A f = ∫_{A1}(∫_{A2} f(x, y)dy)dx = ∫_{A2}(∫_{A1} f(x, y)dx)dy.
(iii) If f : A → R is continuous, then ∫_{A1}(∫_{A2} f(x, y)dy)dx and ∫_{A2}(∫_{A1} f(x, y)dx)dy exist and
∫_A f(x, y)d(x, y) = ∫_{A1}(∫_{A2} f(x, y)dy)dx = ∫_{A2}(∫_{A1} f(x, y)dx)dy.

Proof. (i) As in Exercise-26, let fL,1 (x) = L(f (x, ·)), fU,1 (x) = U (f (x, ·)), fL,2 (y) = L(f (·, y)), and
fU,2 (y) = U (f (·, y)). Let P = P1 × P2 be a partition of A, where P1 , P2 are partitions of A1 , A2
respectively. Using Exercise-26(i) and the inequality fL,1 ≤ fU,1 , we get
(*) L(f, P ) ≤ L(fL,1 , P1 ) ≤ U (fL,1 , P1 ) ≤ U (fU,1 , P1 ) ≤ U (f, P ), and
MULTIVARIABLE CALCULUS 39

(**) L(f, P ) ≤ L(fL,1 , P1 ) ≤ L(fU,1 , P1 ) ≤ U (fU,1 , P1 ) ≤ U (f, P ).


This implies U (fL,1 , P1 ) − L(fL,1 , P1 ) ≤ U (f, P ) − L(f, P ) and U (fU,1 , P1 ) − L(fU,1 , P1 ) ≤
U (f, P ) − L(f, P ). As f ∈ R(A) and P is an arbitrary partition of A, it follows that fL,1 , fU,1 ∈
∫ ∫ ∫
R(A1 ). Moreover, (*) and (**) imply that A f (x, y)d(x, y) = A1 fL,1 (x)dx = A1 fU,1 (x)dx.

Similarly, we may use Exercise-26(ii) to show that fL,2 , fU,2 ∈ R(A2 ) and A f (x, y)d(x, y) =
∫ ∫
A2 fL,2 (y)dy = A2 fU,2 (y)dy.

(ii) By hypothesis L(f (x, ·)) = U (f (x, ·)) for each x ∈ A1 , and L(f (·, y)) = U (f (·, y)) for each
y ∈ A2 . So this is a corollary of (i).

(iii) If f is continuous, then the hypothesis of (ii) is satisfied. 


Exercise-27: Let A = ∏_{j=1}^n [aj, bj] be an n-box, f : A → R be continuous, and σ be any permutation of {1, . . . , n}. Then ∫_A f = ∫_{a_σ(1)}^{b_σ(1)}(· · · (∫_{a_σ(n)}^{b_σ(n)} f dx_σ(n)) · · · )dx_σ(1) (which means we can integrate f in any order over the intervals [aj, bj]). [Hint: Repeated application of [123](iii).]

Remark: Let A = A1 × A2, where A1 ⊂ Rn is an n-box and A2 ⊂ Rm is an m-box. Let g : A1 → R, h : A2 → R be continuous and f : A → R be f(x, y) = g(x)h(y). If (xn) → x in A1 and (yn) → y in A2, then (g(xn)h(yn)) → g(x)h(y), and hence f is continuous. By Fubini’s theorem, ∫_A f = ∫_{A1}(∫_{A2} g(x)h(y)dy)dx = (∫_{A1} g(x)dx)(∫_{A2} h(y)dy).
Example: (i) Let A = [0, 2] × [0, 3] and f : A → R be f(x, y) = xy^2. Then ∫_0^2(∫_0^3 xy^2 dy)dx = ∫_0^2 9x dx = 18 and ∫_0^3(∫_0^2 xy^2 dx)dy = ∫_0^3 2y^2 dy = 18, so that ∫_A f = 18.
(ii) Let A = [0, π]^2 and f : A → R be f(x, y) = (sin x cos y)/(e^x + x^4 + cos^2 x). Then the iterated integral ∫_0^π(∫_0^π f(x, y)dx)dy is difficult to calculate. By [123](iii), the above iterated integral is equal to ∫_0^π(∫_0^π f(x, y)dy)dx, whose value is easily seen to be 0 because ∫_0^π f(x, y)dy = 0.

(iii) Let A = [0, 1]^2 and f : A → R be f(x, y) = 1 if x = 0 and y ∈ Q, and f(x, y) = 0 otherwise. Then f is continuous on (0, 1] × [0, 1] (where f ≡ 0), whose complement {0} × [0, 1] is a null set in R2. Hence f ∈ R(A) by [122] and ∫_A f = 0. If y ∈ [0, 1] is fixed, then f(·, y) is continuous on (0, 1], whose complement {0} is a null set in R. Hence ∫_0^1 f(x, y)dx exists and is equal to 0. Therefore ∫_0^1(∫_0^1 f(x, y)dx)dy exists and is equal to 0. But if we fix x = 0, then f(0, ·) fails to be continuous at every point of [0, 1]. Hence the integrals ∫_0^1 f(0, y)dy and ∫_0^1(∫_0^1 f(x, y)dy)dx do not exist.

(iv) Let A = [0, 1]^2, and S ⊂ A be a dense subset of A with the property that S contains at most one point from each horizontal line and at most one point from each vertical line. Such a set S can be constructed as follows. Let D1, D2, . . . ⊂ A be a listing of all sub-rectangles of A having rational coordinates for the vertices and having nonempty interiors. Let (x1, y1) ∈ D1. Having chosen (xj, yj) ∈ Dj for 1 ≤ j ≤ n, we choose (xn+1, yn+1) ∈ Dn+1 in such a way that xn+1 ≠ xj for 1 ≤ j ≤ n and yn+1 ≠ yj for 1 ≤ j ≤ n. Then the set S := {(xn, yn) : n ∈ N} has the required properties. Now define f : A → R as the indicator function 1S of S. Since both S and A \ S are dense in A, f fails to be continuous at every point of A, and hence ∫_A f does not exist by [122](i). If x ∈ [0, 1] is fixed, then f(x, y) = 0 for every y ∈ [0, 1] with at most one exception. Hence ∫_0^1 f(x, y)dy = 0, and therefore ∫_0^1(∫_0^1 f(x, y)dy)dx = 0. Similarly, the iterated integral ∫_0^1(∫_0^1 f(x, y)dx)dy exists and is equal to 0.
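Example (i) above can be sanity-checked numerically: midpoint Riemann sums for the two iterated orders should both approach the common value 18, in line with Fubini's theorem. The grid size below is an arbitrary choice for this sketch:

```python
# Midpoint Riemann sums for Example (i): f(x, y) = x*y^2 on A = [0, 2] x [0, 3].
# Both iterated orders should approach 18; the grid size is an illustrative choice.

def midpoints(a, b, n):
    h = (b - a) / n
    return [a + (k + 0.5) * h for k in range(n)], h

def iterated_sum(f, ax, bx, ay, by, n, x_first):
    xs, hx = midpoints(ax, bx, n)
    ys, hy = midpoints(ay, by, n)
    if x_first:  # integrate in x for each fixed y, then in y
        return sum(sum(f(x, y) * hx for x in xs) * hy for y in ys)
    return sum(sum(f(x, y) * hy for y in ys) * hx for x in xs)

f = lambda x, y: x * y * y
s1 = iterated_sum(f, 0, 2, 0, 3, 400, x_first=True)
s2 = iterated_sum(f, 0, 2, 0, 3, 400, x_first=False)
print(s1, s2)  # both close to 18
```

The two sums use exactly the same sample points, so their agreement here only illustrates (rather than proves) the interchange of the two iterated integrals.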

Remark: (i) Let U ⊂ R2 be open and f : U → R be a C^2-function. We can give another proof of the fact ∂²f/∂x∂y = ∂²f/∂y∂x using Fubini’s theorem. Let F = ∂²f/∂x∂y and G = ∂²f/∂y∂x, which are continuous since f is a C^2-function. Consider A = [a, b] × [c, d] ⊂ U. Then ∫_A F = ∫_a^b(∫_c^d F dy)dx = ∫_c^d(∫_a^b F dx)dy and ∫_A G = ∫_a^b(∫_c^d G dy)dx = ∫_c^d(∫_a^b G dx)dy by Fubini’s theorem. Now, using the Fundamental theorem of calculus, we note that ∫_c^d(∫_a^b F dx)dy = ∫_c^d((∂f/∂y)(b, y) − (∂f/∂y)(a, y))dy = f(b, d) − f(b, c) − f(a, d) + f(a, c), and ∫_a^b(∫_c^d G dy)dx = ∫_a^b((∂f/∂x)(x, d) − (∂f/∂x)(x, c))dx = f(b, d) − f(a, d) − f(b, c) + f(a, c). Hence ∫_A F = ∫_A G for every 2-box (rectangle) A ⊂ U. If F(x0, y0) ≠ G(x0, y0) for some (x0, y0) ∈ U, say F > G at (x0, y0), then we can find a rectangle A ⊂ U with (x0, y0) ∈ int(A) and ε > 0 such that F(x, y) > G(x, y) + ε for every (x, y) ∈ A; then ∫_A F ≥ ∫_A G + εµ(A) > ∫_A G, a contradiction to what we have already proved.

(ii) Conversely, Fubini’s theorem for a continuous real-valued function f defined on a rectangle A = [a, b] × [c, d] ⊂ U can be deduced using the equality of mixed partial derivatives of suitable functions defined in terms of certain integral expressions of f (see A. Aksoy and M. Martelli, Mixed partial derivatives and Fubini’s theorem, College Math. J., 33 (2002)).
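The equality of mixed partials in remark (i) can also be observed numerically with nested central finite differences; the test function, evaluation point, and step size below are our own illustrative choices, not from the text:

```python
import math

# Central finite differences to check that d²f/dxdy = d²f/dydx for a sample C^2 function.
# The test function, point, and step size are illustrative choices.

f = lambda x, y: x**3 * y**2 + math.sin(x * y)
h = 1e-4

def mixed_xy(x, y):  # d/dx (d/dy f), via nested central differences
    dfy = lambda u: (f(u, y + h) - f(u, y - h)) / (2 * h)
    return (dfy(x + h) - dfy(x - h)) / (2 * h)

def mixed_yx(x, y):  # d/dy (d/dx f)
    dfx = lambda v: (f(x + h, v) - f(x - h, v)) / (2 * h)
    return (dfx(y + h) - dfx(y - h)) / (2 * h)

x0, y0 = 0.7, 1.3
a, b = mixed_xy(x0, y0), mixed_yx(x0, y0)
# Hand-computed mixed partial: 6x^2 y + cos(xy) - xy sin(xy).
exact = 6 * x0**2 * y0 + math.cos(x0 * y0) - x0 * y0 * math.sin(x0 * y0)
print(a, b, exact)  # all three agree to several decimal places
```

Both difference quotients converge to the same value as h shrinks, which is exactly what the C^2 hypothesis guarantees.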

[124] [Interchanging differentiation and integration] (i) Let A ⊂ Rn be an n-box and f : A × [c, d] → R be a function such that f(·, t) ∈ R(A) for each t ∈ [c, d], and ∂f/∂t : A × [c, d] → R is continuous. Then t ↦ ∫_A f(x, t)dx from [c, d] to R is differentiable and (d/dt)(∫_A f(x, t)dx) = ∫_A (∂f/∂t)(x, t)dx.
(ii) Let A ⊂ Rn be an n-box and f : [a, b] × A → R be a function such that f(s, ·) ∈ R(A) for each s ∈ [a, b], and ∂f/∂s : [a, b] × A → R is continuous. Then s ↦ ∫_A f(s, x)dx from [a, b] to R is differentiable and (d/ds)(∫_A f(s, x)dx) = ∫_A (∂f/∂s)(s, x)dx.
(iii) Let U ⊂ Rn be open, and f : U × [c, d] → R be a C^1-function such that f(x, ·) is Riemann integrable over [c, d] for each x ∈ U. Then F : U → R defined as F(x) = ∫_c^d f(x, t)dt has the property that all partial derivatives of F exist and (∂F/∂xj)(x) = ∫_c^d (∂f/∂xj)(x, t)dt for every j ∈ {1, . . . , n} and every x ∈ U.
(iv) Let U ⊂ Rn be open, and f : [a, b] × U → R be a C^1-function such that f(·, x) is Riemann integrable over [a, b] for each x ∈ U. Then F : U → R defined as F(x) = ∫_a^b f(s, x)ds has the property that all partial derivatives of F exist and (∂F/∂xj)(x) = ∫_a^b (∂f/∂xj)(s, x)ds for every j ∈ {1, . . . , n} and every x ∈ U.
Proof. (i) Fix w ∈ [c, d], and let Q(x, t) = (f(x, t) − f(x, w))/(t − w) for t ≠ w. We need to show that lim_{t→w} ∫_A Q(x, t)dx = ∫_A (∂f/∂t)(x, w)dx. Consider ε > 0. Since ∂f/∂t is uniformly continuous on the compact set A × [c, d], there is δ > 0 such that |(∂f/∂t)(x, t) − (∂f/∂t)(x, w)| < ε/(µ(A) + 1) for every x ∈ A and every t ∈ [c, d] with |t − w| < δ. Now consider t ∈ [c, d] with 0 < |t − w| < δ. For each x ∈ A, applying the Mean value theorem to f(x, ·), we may find tx between t and w with Q(x, t) = (∂f/∂t)(x, tx). Hence |∫_A Q(x, t)dx − ∫_A (∂f/∂t)(x, w)dx| ≤ ∫_A |(∂f/∂t)(x, tx) − (∂f/∂t)(x, w)|dx ≤ ∫_A ε/(µ(A) + 1) dx < ε.
(ii) The proof is similar to that of (i).
(iii) Fix j ∈ {1, . . . , n} and x ∈ U. Choose δ > 0 such that x + sej ∈ U for every s ∈ [−δ, δ], and define g : [−δ, δ] × [c, d] → R as g(s, t) = f(x + sej, t). Note that (∂g/∂s)(s, t) = (∂f/∂xj)(x + sej, t). Hence ∂g/∂s : [−δ, δ] × [c, d] → R is continuous by the C^1-property of f, and moreover (∂g/∂s)(0, t) = (∂f/∂xj)(x, t). Applying (ii) to g, we see that (d/ds)(∫_c^d g(s, t)dt) = ∫_c^d (∂g/∂s)(s, t)dt. At s = 0, the right hand side is equal to ∫_c^d (∂f/∂xj)(x, t)dt, and the left hand side is equal to lim_{s→0} ∫_c^d (g(s, t) − g(0, t))/s dt = lim_{s→0} ∫_c^d (f(x + sej, t) − f(x, t))/s dt = lim_{s→0} (F(x + sej) − F(x))/s = (∂F/∂xj)(x).
(iv) The proof is similar to that of (iii). □
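The conclusion of [124](i) can be illustrated numerically in one variable: differentiate ∫_0^1 sin(xt) dx in t by a central difference and compare with ∫_0^1 x cos(xt) dx. The particular f, the point t0, and the grid sizes are arbitrary choices for this sketch:

```python
import math

# Numerical illustration of [124](i): d/dt of the integral of f(x,t) = sin(x t)
# over A = [0,1] equals the integral of the t-partial x cos(x t).
# The integrand, the point t0, and all grid sizes are illustrative choices.

def integral(g, a, b, n=2000):  # midpoint rule
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

f = lambda x, t: math.sin(x * t)
F = lambda t: integral(lambda x: f(x, t), 0.0, 1.0)

t0, dt = 0.8, 1e-5
lhs = (F(t0 + dt) - F(t0 - dt)) / (2 * dt)                # d/dt of the integral
rhs = integral(lambda x: x * math.cos(x * t0), 0.0, 1.0)  # integral of df/dt
print(lhs, rhs)  # the two values agree to several decimal places
```

The same midpoint grid is used on both sides, so the discretization bias largely cancels in the comparison.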

9. Multivariable Riemann integration over a Jordan measurable set
As we go through the finer details of Riemann integration theory (such as the theory of Jordan
measurable sets), we will also see some of its disadvantages. To a certain extent, these disadvantages
are rectified in Lebesgue integration theory by the use of Lebesgue measurable sets (which we will
not discuss here).

Exercise-28: Let A, E ⊂ Rn be n-boxes with A ∩ E ≠ ∅. Let f : A ∪ E → R be a bounded function such that f|A ∈ R(A) and f(x) = 0 for every x ∈ A∆E = (A ∪ E) \ (A ∩ E). Then f|E ∈ R(E) and ∫_E f = ∫_A f. [Hint: Observe that ∂(A ∩ E) is a closed null set in Rn. In view of Exercise-25(iv), we may modify f on ∂(A ∩ E) and also suppose that f(x) = 0 for every x ∈ ∂(A ∩ E). Choose a sequence (Pn) of partitions of A such that lim_{n→∞} L(f|A, Pn) = lim_{n→∞} U(f|A, Pn) = ∫_A f. By refining the Pn's, assume that there is a sequence (Qn) of partitions of A ∩ E such that Pn is an extension of Qn. Now choose a sequence (P̃n) of partitions of E such that P̃n is an extension of Qn. Since f ≡ 0 outside int(A ∩ E), we get L(f|E, P̃n) = L(f|A∩E, Qn) = L(f|A, Pn) and U(f|E, P̃n) = U(f|A∩E, Qn) = U(f|A, Pn) for every n ∈ N. It follows that L(f|E) = ∫_A f = U(f|E).]
Definition: Let X ⊂ Rn be a bounded set and f : X → R be a function.
(i) We define fX : Rn → R as fX(x) = f(x) for x ∈ X, and fX(x) = 0 otherwise.
(ii) We say f is Riemann integrable over X if there is an n-box A ⊂ Rn such that X ⊂ A and fX ∈ R(A). If this holds, we define ∫_X f := ∫_A fX and we write f ∈ R(X). Note that this definition is independent of the particular choice of the n-box A because of Exercise-28.

Exercise-29: Let X ⊂ Rn be a bounded set, and f, g ∈ R(X). Then,
(i) af + bg ∈ R(X) and ∫_X(af + bg) = a∫_X f + b∫_X g for every a, b ∈ R.
(ii) fg ∈ R(X).
(iii) If f ≥ 0, then ∫_X f ≥ 0; and if f ≥ g, then ∫_X f ≥ ∫_X g.
(iv) If f ≥ 0, then ∫_X f ≥ ∫_{X0} f whenever X0 ⊂ X and f ∈ R(X0).
(v) |f| ∈ R(X) and |∫_X f| ≤ ∫_X |f|.
(vi) max{f, g}, min{f, g} ∈ R(X). In particular, f+, f− ∈ R(X).
(vii) If X is a null set in Rn, then ∫_X f = 0.
(viii) If {x ∈ X : f(x) ≠ g(x)} is a null set in Rn, then ∫_X f = ∫_X g.
[Hint: Consider an n-box A ⊃ X. Apply the usual properties of integration to fX and gX over A.]

Definition: Let X ⊂ Rn be a bounded set and A ⊂ Rn be an n-box containing X. Let D1, . . . , Dk be the sub n-boxes of A determined by a partition P of A. Put Γ1 = {1 ≤ i ≤ k : Di ⊂ X} and Γ2 = {1 ≤ i ≤ k : Di ∩ X ≠ ∅}. Define LJ(X, P) = ∑_{i∈Γ1} µ(Di) and UJ(X, P) = ∑_{i∈Γ2} µ(Di) (where J stands for Jordan). Next define LJ(X) and UJ(X) with respect to A as LJ(X) = sup{LJ(X, P) : P is a partition of A} and UJ(X) = inf{UJ(X, P) : P is a partition of A}. These quantities attempt to approximate the n-dimensional volume of X from inside and outside. Observe that LJ(X, P) = L(1X, P), UJ(X, P) = U(1X, P), LJ(X) = L(1X), and UJ(X) = U(1X).
[125] Let X ⊂ Rn be a bounded set. Then the following are equivalent:
(i) 1X ∈ R(A) for some n-box A ⊂ Rn containing X.
(ii) 1X ∈ R(A) for every n-box A ⊂ Rn containing X.
(iii) LJ(X) = UJ(X) with respect to some n-box A ⊂ Rn containing X.
(iv) LJ(X) = UJ(X) with respect to every n-box A ⊂ Rn containing X.
(v) ∂X is a null set in Rn.

Proof. We get (i) ⇔ (iii) and (ii) ⇔ (iv) because LJ(X) = L(1X) and UJ(X) = U(1X). Moreover, we have (i) ⇔ (ii) because of Exercise-28.
(ii) ⇒ (v) ⇒ (i): Let A be an n-box with X ⊂ int(A). Then {x ∈ A : 1X is not continuous at x} = ∂X. Now use Lebesgue’s criterion [122]. □
Definition: Let X ⊂ Rn be a bounded set. If the constant function 1 is Riemann integrable over X, i.e., if the indicator function 1X ∈ R(A) for some n-box A containing X, then we say the set X is Jordan measurable (in some textbooks, [127](ii) is taken as the definition of Jordan measurability), and we define the Jordan measure µ(X) of X as µ(X) = ∫_X 1 dx = ∫_A 1X dx.
[126] (i) The definition of Jordan measurability (and Jordan measure) of a bounded set X ⊂ Rn is independent of the particular choice of an n-box A containing X because of Exercise-28.
(ii) By [125], a bounded set X ⊂ Rn is Jordan measurable ⇔ ∂X is a null set in Rn ⇔ LJ(X) = UJ(X) with respect to some/every n-box A ⊂ Rn containing X.
(iii) If X itself is an n-box, then (by taking A = X, we may see that) X is Jordan measurable and its Jordan measure coincides with its n-dimensional volume.
(iv) If X ⊂ Rn is Jordan measurable, then ∫_X c = c∫_A 1X = cµ(X) for every c ∈ R; in particular, by taking c = 1, we observe using Exercise-25(i) that µ(X) = 0 iff X is a null set (i.e., its Lebesgue measure is zero) in Rn.

[127] Let X ⊂ Rn be a bounded set.
(i) If X is Jordan measurable, then so are int(X) and X̄, and µ(int(X)) = µ(X) = µ(X̄).
(ii) X is Jordan measurable ⇔ LJ(int(X)) = UJ(X) with respect to some/every n-box A ⊃ X.
(iii) Suppose X = X1 ∪ X2. Let f : X → R be Riemann integrable over X1 and X2. Then f is Riemann integrable over X and X1 ∩ X2, and ∫_X f = ∫_{X1} f + ∫_{X2} f − ∫_{X1∩X2} f. In particular (by taking f : X → R to be f ≡ 1), if there are Jordan measurable sets X1, X2 ⊂ Rn with X = X1 ∪ X2, then X and X1 ∩ X2 are Jordan measurable and µ(X) = µ(X1) + µ(X2) − µ(X1 ∩ X2).
(iv) Suppose X = ∪_{i=1}^k Xi, a finite union, and Xi ∩ Xj is a null set in Rn for every i ≠ j. If f : X → R is Riemann integrable over each Xi, then f ∈ R(X) and ∫_X f = ∑_{i=1}^k ∫_{Xi} f.
(v) Assume X is Jordan measurable, and let Y ⊂ Rn be Jordan measurable. Then X ∩ Y and X \ Y are Jordan measurable. Moreover, if f ∈ R(X), then fX ∈ R(Y).

Proof. Let A ⊂ Rn be an n-box with X ⊂ A.
(i) int(X) and X̄ are Jordan measurable by [125] because ∂(int(X)) ⊂ ∂X and ∂(X̄) ⊂ ∂X. Next note that ∫_A 1int(X) = ∫_A 1X = ∫_A 1X̄ by Exercise-25(ii).
(ii) ⇒: By (i) and [125], LJ(int(X)) = ∫_A 1int(X) = ∫_A 1X = UJ(X). ⇐: Clear by [125].
(iii) By considering f+ and f− separately, assume f ≥ 0. Then fX = max{fX1, fX2}, fX1∩X2 = min{fX1, fX2}, and fX = fX1 + fX2 − fX1∩X2. Now use Exercise-29(vi) and Exercise-29(i).
(iv) We may suppose k = 2; the general case can be proved by a repeated application of this case. When k = 2, the result follows from (iii) and Exercise-29(viii).
(v) X ∩ Y is Jordan measurable by (iii). Since ∂(X \ Y) ⊂ ∂X ∪ ∂Y, it follows by [126](ii) that X \ Y is also Jordan measurable. Now suppose f ∈ R(X). Let A be an n-box with X ∪ Y ⊂ int(A). Then fX ∈ R(A) and hence {x ∈ A : fX is not continuous at x} is a null set by [122]. Since (fX)Y = fX∩Y, and since {x ∈ A : fX∩Y is not continuous at x} ⊂ ∂X ∪ ∂Y ∪ {x ∈ X : fX is not continuous at x}, it follows that (fX)Y ∈ R(A) by [122]. Hence fX ∈ R(Y). □
[128] Let X ⊂ Rn be a bounded set.
(i) Let (fn) be a sequence in R(X) converging uniformly to a function f : X → R. Then f ∈ R(X) and ∫_X f = lim_{n→∞} ∫_X fn.
(ii) If X is Jordan measurable and f : X → R is a bounded continuous function, then f ∈ R(X). In particular, if X is a Jordan measurable compact set and f : X → R is continuous, then f ∈ R(X).
(iii) Let f ∈ R(X), g : X → R be a bounded function and X0 = {x ∈ X : f(x) ≠ g(x)}. If X0 is Jordan measurable with µ(X0) = 0, then g ∈ R(X) and ∫_X f = ∫_X g.

Proof. Let A ⊂ Rn be an n-box with X ⊂ A.
(i) Apply Exercise-22(ii) to fX over A.
(ii) ∂X is a null set by hypothesis, and {x ∈ A : fX is not continuous at x} ⊂ ∂X. Hence fX ∈ R(A) by Lebesgue’s criterion [122](i).
(iii) By [127](i), X̄0 is also Jordan measurable and µ(X̄0) = µ(X0) = 0. Note that {x ∈ A : fX(x) ≠ gX(x)} ⊂ X̄0 ∪ ∂X, and the set on the right hand side is a closed null set in Rn. Now apply Exercise-25(iv). □
Example: (i) X := [0, 1] ∩ Q is not Jordan measurable because ∂X = [0, 1] is not a null set in R. This example shows also that a bounded set which is a countable union of Jordan measurable sets need not be Jordan measurable.
(ii) We will construct a bounded open set X in R which is not Jordan measurable. Let ε ∈ (0, 1/2) and {xn : n ∈ N} ⊂ (0, 1) be a dense subset of [0, 1]. For each n ∈ N, choose an open interval Jn ⊂ (0, 1) containing xn with µ(Jn) < ε/2^n, and put X = ∪_{n=1}^∞ Jn, which is an open set in R with X̄ = [0, 1]. If ∂X is a null set, there is a sequence (J̃n) of open intervals with ∂X ⊂ ∪_{n=1}^∞ J̃n and ∑_{n=1}^∞ µ(J̃n) < ε. Then {Jn : n ∈ N} ∪ {J̃n : n ∈ N} is an open cover for the compact set X ∪ ∂X = [0, 1]. Extract a finite subcover {Jn : 1 ≤ n ≤ p} ∪ {J̃n : 1 ≤ n ≤ p}. Then 1 = µ([0, 1]) ≤ ∑_{n=1}^p µ(Jn) + ∑_{n=1}^p µ(J̃n) ≤ ε + ε = 2ε < 1, a contradiction (here we used: if J1, . . . , Jp are intervals covering an interval J, then µ(J) ≤ ∑_{n=1}^p µ(Jn)).
(iii) If X is as in (ii) above, and K = [0, 1] \ X, then K is a compact set which is not Jordan measurable because ∂K = ∂X.
(iv) There are path connected Jordan measurable sets which are not Borel sets. Let n ≥ 2, and
A ⊂ Rn be an n-box with int(A) ̸= ∅. Since ∂A is an uncountable compact set, there is a non-Borel
set Y ⊂ ∂A (∵ the cardinality of the collection of Borel subsets of Rn is equal to that of R whereas
the cardinality of the power set P(∂A) is equal to that of P(R)). Let X = int(A) ∪ Y , which is not
a Borel set. But X is Jordan measurable because ∂X ⊂ ∂A, and clearly X is path connected.

Remark: (i) Every Jordan measurable set X ⊂ Rn is a (bounded) Lebesgue measurable set because X = int(X) ∪ (X ∩ ∂X), where int(X) is an open set and hence Lebesgue measurable, and X ∩ ∂X is also Lebesgue measurable because it is a subset of the null set ∂X.
(ii) Let X ⊂ Rn be a bounded set. The existence of some f in R(X) does not imply the Jordan measurability of X: trivially, 0 ∈ R(X) for every bounded set X.

Definition: Let U, V ⊂ Rn be open. A function g : U → V is called a C^1-diffeomorphism if g is bijective and both g and g−1 are C^1-functions. Note that if g : U → V is a bijective C^1-function with det(Jg(x)) ≠ 0 for every x ∈ U, then g is a C^1-diffeomorphism by the Inverse function theorem.

Exercise-30: Let U, V ⊂ Rn be open, and g : U → V be a C^1-diffeomorphism.
(i) Let X ⊂ Rn be Jordan measurable with X̄ ⊂ U. Then g(int(X)) = int(g(X)), g(∂X) = ∂g(X), g(X̄) is the closure of g(X), and g(X) is a Jordan measurable set whose closure is contained in V.
(ii) If U, V are Jordan measurable and f ∈ R(V), then f ◦ g ∈ R(U).
[Hint: (i) Since g is in particular a homeomorphism, we get that g(X̄) is the closure of g(X) and is contained in V, g(int(X)) = int(g(X)), and g(∂X) = ∂g(X). The first assertion implies that the closure of g(X) is compact, and hence g(X) is bounded. By [121](ii), we see ∂g(X) = g(∂X) is a null set in Rn. Now [126](ii) can be used. (ii) Let Y = {y ∈ V : f is not continuous at y}. Note that {x ∈ Rn : (f ◦ g)U is not continuous at x} ⊂ ∂U ∪ g−1(Y) and use [122].]

Exercise-31: Let X = {(x, y) ∈ R2 : a ≤ x ≤ b and ϕ(x) ≤ y ≤ ψ(x)}, where ϕ, ψ : [a, b] → R are continuous with ϕ ≤ ψ. Then, (i) X is compact and Jordan measurable.
(ii) If f : X → R is continuous, then f ∈ R(X) and ∫_X f = ∫_a^b(∫_{ϕ(x)}^{ψ(x)} f(x, y)dy)dx.
[Hint: (i) ∂X is a null set in R2 because the graphs of ϕ and ψ are null sets in R2. (ii) f ∈ R(X) by [128](ii). Now consider a 2-box (rectangle) A = [a, b] × [c, d] ⊃ X. For each x ∈ [a, b], the map y ↦ fX(x, y) is Riemann integrable over [c, d] (because it can be discontinuous at at most two points, ϕ(x) and ψ(x)), and ∫_c^d fX(x, y)dy = ∫_{ϕ(x)}^{ψ(x)} f(x, y)dy. By Fubini’s theorem [123](ii), ∫_X f = ∫_A fX = ∫_a^b(∫_c^d fX(x, y)dy)dx = ∫_a^b(∫_{ϕ(x)}^{ψ(x)} f(x, y)dy)dx.]

Remark: Exercise-31 is useful in evaluating certain integrals. For example, let X = {(x, y) ∈ R2 : 0 ≤ x ≤ 1 and 0 ≤ y ≤ x^2} and f : X → R be f(x, y) = xy^2. Then ∫_X f = ∫_0^1(∫_0^{x^2} xy^2 dy)dx = ∫_0^1 (x^7/3) dx = 1/24. In certain cases, instead of bounding y with functions of x, we may bound x with functions of y and interchange the order of integration. For example, let X = {(x, y) ∈ R2 : 0 ≤ x ≤ 1 and x ≤ y ≤ 1} and f : X → R be f(x, y) = e^{y^2}. Then ∫_X f = ∫_0^1(∫_x^1 e^{y^2} dy)dx, but the inner integral is not easy to evaluate. However, note that X = {(x, y) ∈ R2 : 0 ≤ y ≤ 1 and 0 ≤ x ≤ y}, and hence ∫_X f = ∫_0^1(∫_0^y e^{y^2} dx)dy = ∫_0^1 y e^{y^2} dy = (1/2)∫_0^1 e^t dt = (e − 1)/2 by the substitution t = y^2.
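The interchange of order in the last computation can be verified by a crude Riemann sum over the triangle; the grid resolution is an arbitrary choice for this sketch:

```python
import math

# Riemann-sum check of the remark's computation: the integral of e^{y^2} over the
# triangle {0 <= x <= y <= 1} equals (e - 1)/2. The resolution is an illustrative choice.

def triangle_sum(n):
    """Midpoint sum over the cells of [0,1]^2 whose midpoints satisfy x <= y."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if x <= y:
                total += math.exp(y * y) * h * h
    return total

approx = triangle_sum(1000)
exact = (math.e - 1) / 2
print(approx, exact)  # the two values agree to about two decimal places
```

No order interchange is needed for the sum itself, which is what makes it a useful independent check of the value obtained by iterating dx first.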
Exercise-32: Let X ⊂ Rn be a Jordan measurable compact set, and f : X × [c, d] → R be a function such that f(·, y) ∈ R(X) for each y ∈ [c, d], and ∂f/∂y : X × [c, d] → R is continuous. Then the function y ↦ ∫_X f(x, y)dx from [c, d] to R is differentiable and (d/dy)(∫_X f(x, y)dx) = ∫_X (∂f/∂y)(x, y)dx. [Hint: (∂f/∂y)(·, y) ∈ R(X) by [128](ii). Now imitate the proof of [124].]
10. Change of variable
We may write the Change of variable formula in one-variable theory in the following form:

Exercise-33: Let f : [a, b] → R be Riemann integrable and g : [c, d] → [a, b] be a surjective C^1-function with g′ non-vanishing. Then ∫_a^b f(t)dt = ∫_c^d f(g(x))|g′(x)|dx. [Hint: We have ∫_{g(c)}^{g(d)} f(t)dt = ∫_c^d f(g(x))g′(x)dx by the standard Change of variable formula. Since g′ is non-vanishing, either g′ > 0 or g′ < 0 on [c, d]. If g′ > 0, then g(c) = a, g(d) = b, and |g′(x)| = g′(x). If g′ < 0, then g(c) = b, g(d) = a, and |g′(x)| = −g′(x).]
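Exercise-33 with a decreasing g can be checked numerically; the particular f, g, and grid below are our own illustrative choices:

```python
# Numerical check of Exercise-33 with a decreasing substitution: f(t) = t^2 on [0,1]
# and g(x) = 1 - x on [0,1], so g' = -1 < 0. Both sides should approach 1/3.
# The midpoint grid size is an illustrative choice.

def midpoint_integral(fun, a, b, n=20000):
    h = (b - a) / n
    return sum(fun(a + (k + 0.5) * h) for k in range(n)) * h

f = lambda t: t * t
g = lambda x: 1.0 - x          # g maps [0,1] onto [0,1] with g' = -1
abs_gprime = lambda x: 1.0     # |g'(x)|

lhs = midpoint_integral(f, 0.0, 1.0)                                  # integral of f(t) dt
rhs = midpoint_integral(lambda x: f(g(x)) * abs_gprime(x), 0.0, 1.0)  # integral of f(g(x))|g'(x)| dx
print(lhs, rhs)  # both close to 1/3
```

Without the absolute value, the standard formula would produce −1/3 here (since g(c) = 1 > 0 = g(d)), which is exactly the sign issue the exercise's |g′| removes.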

Our aim is to generalize Exercise-33 to higher dimensions by replacing the ‘local magnification
factor’ |g ′ (x)| with | det(Jg (x))|. The reason for | det(Jg (x))| to be the ‘local magnification factor’
in higher dimensions stems from the result [129] stated below.
Definition: An invertible linear map E : Rn → Rn is said to be an elementary linear map if it is one of the following three types:
Type-1: ∃ j ∈ {1, . . . , n} and λ ∈ R \ {0} with E(ej) = λej and E(ek) = ek for every k ≠ j.
Type-2: ∃ i ≠ j in {1, . . . , n} with E(ei) = ej, E(ej) = ei, and E(ek) = ek for every k ≠ i, j.
Type-3: ∃ i ≠ j in {1, . . . , n} with E(ej) = ei + ej and E(ek) = ek for every k ≠ j.
Exercise-34: [Fact from Linear Algebra] Every invertible linear map L : Rn → Rn can be written
as a finite product of elementary linear maps. [Hint: For the corresponding result in terms of
matrices, see Theorem 12 in Section 1.6 of Hoffman and Kunze, Linear Algebra.]
Convention: Let L ∈ L(Rn , Rn ). Then L′ (x; ·) = L, and hence JL (x) is equal to the matrix of L
with respect to the standard basis of Rn for each x ∈ Rn . Identifying L with its matrix, we will
write det(L) to mean the determinant of the matrix of L; with this convention, det(L) = det(JL (x))
for every x ∈ Rn .
[129] Let X ⊂ Rn be Jordan measurable.
(i) If E ∈ L(Rn, Rn) is an elementary linear map, then µ(E(X)) = |det(E)|µ(X).
(ii) If L ∈ L(Rn, Rn) is invertible, then µ(L(X)) = |det(L)|µ(X).
Proof. Note that E(X) and L(X) are Jordan measurable because of Exercise-30(i).
(i) Since X is Jordan measurable, we have LJ(X) = UJ(X) with respect to any n-cube containing X by [126](ii). Hence X can be approximated with a finite union of n-cubes with pairwise disjoint interiors, and therefore we may suppose X itself is an n-cube. Since E(cx + y) = cE(x) + E(y) for c ∈ R \ {0} and x, y ∈ Rn, we may also suppose after a scaling and translation that X is the unit cube in Rn. In particular, µ(X) = 1, and thus we need to just show µ(E(X)) = |det(E)|. Keep in mind that E(X) = {∑_{k=1}^n ck E(ek) : ck ∈ [0, 1] for every k} since X is the unit n-cube.

If E is of type-1, then there are λ ∈ R \ {0} and j ∈ {1, . . . , n} such that E(ej) = λej and E(ek) = ek for every k ≠ j. Hence |det(E)| = |λ|. As E(X) is an n-box whose jth edge has length |λ| and whose other edges have unit length, we conclude µ(E(X)) = |λ| = |det(E)|. If E is of type-2, then |det(E)| = 1 and E(X) = X so that µ(E(X)) = 1 = µ(X). If E is of type-3, then there are i ≠ j in {1, . . . , n} such that E(ej) = ei + ej and E(ek) = ek for every k ≠ j. Then |det(E)| = 1. In the xixj-plane, E maps the unit square to the parallelogram with vertices 0, ei, ei + ej, and 2ei + ej, whose area is 1. Consequently, µ(E(X)) = 1 = |det(E)| in this case also.
(ii) Write L = E1 ··· Ep, a finite product of elementary linear maps; we use induction on p. The case p = 1 is covered by (i). Put L0 = E1 ··· Ep−1 so that L = L0Ep. Since Y := Ep(X) is also Jordan measurable, we get by the induction assumption for p − 1 that µ(L(X)) = µ(L0(Y)) = |det(L0)|µ(Y). Now, µ(Y) = µ(Ep(X)) = |det(Ep)|µ(X) by (i), and det(L0) = det(E1)···det(Ep−1). It follows that µ(L(X)) = ∏_{i=1}^p |det(Ei)|µ(X) = |det(L)|µ(X). □
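The scaling law µ(L(X)) = |det(L)|µ(X) can be watched numerically. The sketch below (an illustration, not part of the proof; the matrix L is an arbitrary choice) takes X to be the unit square in R², samples a bounding box of L(X), and tests membership via L⁻¹:

```python
import numpy as np

rng = np.random.default_rng(0)
L = np.array([[2.0, 1.0], [0.0, 3.0]])   # invertible, det(L) = 6
Linv = np.linalg.inv(L)

# Bounding box of L(X) for X = [0,1]²: by convexity the corners suffice.
corners = np.array([[0, 0], [1, 0], [0, 1], [1, 1]]) @ L.T
lo, hi = corners.min(axis=0), corners.max(axis=0)
box_area = float(np.prod(hi - lo))

pts = rng.uniform(lo, hi, size=(200_000, 2))   # uniform samples in the box
pre = pts @ Linv.T                             # L⁻¹(p) for each sample p
inside = np.all((pre >= 0) & (pre <= 1), axis=1)
est = inside.mean() * box_area
print(est, abs(np.linalg.det(L)))              # est ≈ |det(L)| · µ(X) = 6
```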
[130] [Linear change of variable] Let L ∈ L(Rn, Rn) be invertible, and X ⊂ Rn be Jordan measurable. If f : L(X) → R is Riemann integrable, then F : X → R defined as F(x) = f(L(x))|det(L)| is Riemann integrable over X, and ∫_{L(X)} f = ∫_X F.
Proof. By considering f⁺ and f⁻ separately, assume f ≥ 0. Let A ⊂ Rn be an n-box with X ⊂ int(A). Observe that Y := L(X) is Jordan measurable by Exercise-30(i). Let fY, FX : Rn → R be the extended functions which are zero respectively outside Y and X. The set C := {c ∈ Rn : fY is not continuous at c} is a null set in Rn by [122] because f ∈ R(Y) by hypothesis. Since {x ∈ A : FX is not continuous at x} ⊂ L⁻¹(C), and L⁻¹(C) is a null set in Rn by [121](ii) (applied to L⁻¹), we deduce that F ∈ R(X) by [122].

Let D1, . . . , Dk be the sub n-boxes of A determined by a partition P of A. Then L(Di) is Jordan measurable for every i. Moreover, if i ≠ j, then Di ∩ Dj is a null set in Rn, and therefore L(Di) ∩ L(Dj) = L(Di ∩ Dj) is also a null set in Rn by the invertibility of L and [121](ii). Now using [127](iv) and [129], we see that ∫_Y f = ∫_{L(A)} fY = ∑_{i=1}^k ∫_{L(Di)} fY ≤ ∑_{i=1}^k sup(fY(L(Di)))µ(L(Di)) = ∑_{i=1}^k sup(fY(L(Di)))|det(L)|µ(Di) = ∑_{i=1}^k sup(FX(Di))µ(Di) = U(FX, P). Similarly, L(FX, P) ≤ ∫_Y f. As P is an arbitrary partition of A, and F ∈ R(X), it follows that ∫_X F = ∫_Y f. □
One advantage of n-cubes over n-dimensional balls is that subsets of Rn can be covered with finitely many or countably many n-cubes of the same size with pairwise disjoint interiors (whereas any covering by balls will in general have overlapping balls, which makes it difficult to add up estimates from different balls). We need to make an estimate about the volume of the image of an n-box under a C1-map. For this purpose, it is convenient to use certain special norms:

Definition: The supremum norm of x ∈ Rn is defined as ∥x∥∞ = max{|xj | : 1 ≤ j ≤ n}. For a



linear map T : Rn → Rn with matrix [tij ], let ∥T ∥0 = max{ nj=1 |tij | : 1 ≤ i ≤ n}.

Exercise-35: The quantity ∥·∥0 defined above is a norm on the vector space L(Rn, Rn) ≅ Rn×n with ∥−T∥0 = ∥T∥0 and ∥I∥0 = 1. Moreover,
(i) ∥Tx∥∞ ≤ ∥T∥0∥x∥∞ for every T ∈ L(Rn, Rn) and x ∈ Rn.
(ii) For every T ∈ L(Rn, Rn), there is x ∈ Rn with ∥x∥∞ = 1 and ∥Tx∥∞ = ∥T∥0.
(iii) ∥S ∘ T∥0 ≤ ∥S∥0∥T∥0 for every S, T ∈ L(Rn, Rn).
[Hint: (i) |∑_{j=1}^n tij xj| ≤ ∑_{j=1}^n |tij| ∥x∥∞ for 1 ≤ i ≤ n. (ii) Let i ∈ {1, . . . , n} be with ∑_{j=1}^n |tij| = ∥T∥0. Define x ∈ Rn as xj = 1 if tij ≥ 0 and xj = −1 if tij < 0. (iii) Choose x ∈ Rn with ∥x∥∞ = 1 and ∥(S ∘ T)x∥∞ = ∥S ∘ T∥0. Then ∥S ∘ T∥0 ≤ ∥S∥0∥Tx∥∞ ≤ ∥S∥0∥T∥0∥x∥∞ = ∥S∥0∥T∥0.]
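In concrete terms, ∥T∥0 is the maximum absolute row sum of the matrix of T, i.e., the operator norm of T on (Rn, ∥·∥∞); numpy exposes it as `np.linalg.norm(T, np.inf)`. A small check of Exercise-35(ii), with a randomly chosen T (an illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4))

norm0 = np.abs(T).sum(axis=1).max()   # ∥T∥0 = maximum absolute row sum
i = np.abs(T).sum(axis=1).argmax()    # a row attaining the maximum
x = np.where(T[i] >= 0, 1.0, -1.0)    # xj = ±1 matching the signs of row i, ∥x∥∞ = 1
print(norm0, np.abs(T @ x).max())     # the two numbers agree: ∥Tx∥∞ = ∥T∥0
```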
Exercise-36: Let U ⊂ Rn be open, f = (f1, . . . , fn) : U → Rn be a C1-function, D ⊂ U be a compact convex set, and λ = sup{∥Jf(c)∥0 : c ∈ D}. Then,
(i) λ < ∞, and ∑_{j=1}^n |(∂fi/∂xj)(c)| ≤ λ for every i ∈ {1, . . . , n} and every c ∈ D.
(ii) ∥f(b) − f(a)∥∞ ≤ λ∥b − a∥∞ for every a, b ∈ D.
[Hint: (i) λ < ∞ since f is a C1-function and D is compact. (ii) By the Mean value theorem and the convexity of D, there are c1, . . . , cn ∈ [a, b] ⊂ D with fi(b) − fi(a) = ⟨∇fi(ci), b − a⟩ = ∑_{j=1}^n (∂fi/∂xj)(ci)·(bj − aj) for 1 ≤ i ≤ n. Hence |fi(b) − fi(a)| ≤ ∑_{j=1}^n |(∂fi/∂xj)(ci)||bj − aj| ≤ λ∥b − a∥∞.]
[131] Let U, V ⊂ Rn be open, g : U → V be a C 1 -diffeomorphism (i.e., g is bijective and g, g −1
are C 1 -functions), and A ⊂ U be compact. Then for every ε > 0, there is δ > 0 such that
µ(g(D)) ≤ (1 + ε)n | det(Jg (x))|µ(D) for every n-cube D ⊂ A with side-length < δ and every x ∈ D.
Proof. Define Φ : A × A → Rn×n ≅ L(Rn, Rn) as Φ(x, y) = Jg(x)⁻¹Jg(y). Then Φ is continuous with respect to the operator norm of L(Rn, Rn) because g is a C1-function and the operation of inversion is continuous in L(Rn, Rn) by Exercise-15(ii). Since any two norms on a finite dimensional space are equivalent by [101], the map Φ is also continuous with respect to the supremum norm ∥·∥∞ on Rn and the norm ∥·∥0 on Rn×n defined above (recall that ∥[tij]∥0 = max{∑_{j=1}^n |tij| : 1 ≤ i ≤ n}). Since Φ(x, x) = I and since A × A is compact also with respect to the supremum norm, we may find δ > 0 such that ∥Jg(x)⁻¹Jg(y)∥0 < 1 + ε for every x, y ∈ A with ∥x − y∥∞ < δ.
Consider an n-cube D ⊂ A whose side-length (say) r is < δ, and fix x ∈ D. Now, L := Jg(x) : Rn → Rn is an invertible linear map, f := L⁻¹ ∘ g : U → L⁻¹(V) is a C1-diffeomorphism, and g = L ∘ f. Applying [129](ii) to the Jordan measurable set X := f(D), we obtain µ(g(D)) = µ(L(f(D))) = |det(L)|µ(f(D)) = |det(Jg(x))|µ(f(D)).

Since the side-length r of D is < δ, we get ∥x − y∥∞ ≤ r < δ for every y ∈ D. Hence ∥Jg(x)⁻¹Jg(y)∥0 < 1 + ε for every y ∈ D by the choice of δ. Observe that Jf(y) = Jg(x)⁻¹Jg(y) by the definition of f, and hence λ := sup{∥Jf(y)∥0 : y ∈ D} ≤ 1 + ε. Let a ∈ D be the center of D, and consider b ∈ D. Note that ∥b − a∥∞ ≤ r/2. By Exercise-36, ∥f(b) − f(a)∥∞ ≤ λ∥b − a∥∞ ≤ (1 + ε)r/2. Therefore, f(D) is contained in an n-cube with center f(a) and side-length (1 + ε)r. This implies µ(f(D)) ≤ (1 + ε)ⁿrⁿ = (1 + ε)ⁿµ(D). Combining this with the estimate of the previous paragraph, we conclude µ(g(D)) ≤ (1 + ε)ⁿ|det(Jg(x))|µ(D). □
[132] Let U, V ⊂ Rn be open sets and g : U → V be a C1-diffeomorphism.
(i) [Preparatory step] Let A ⊂ Rn be an n-cube, f : g(A) → R be Riemann integrable, and f ≥ 0. Then F : A → R defined as F(x) = f(g(x))|det(Jg(x))| is Riemann integrable over A and ∫_{g(A)} f ≤ ∫_A F.
(ii) [Change of variable theorem - version 1] Let X ⊂ Rn be a Jordan measurable set with X̄ ⊂ U and f : g(X) → R be Riemann integrable. Then F : X → R defined as F(x) = f(g(x))|det(Jg(x))| is Riemann integrable over X and ∫_{g(X)} f = ∫_X F.
(iii) [Change of variable theorem - version 2] Assume the open sets U, V are Jordan measurable, x ↦ det(Jg(x)) is bounded on U, and f : V → R is Riemann integrable. Then F : U → R defined as F(x) = f(g(x))|det(Jg(x))| is Riemann integrable over U and ∫_V f = ∫_U F.
Proof. (i) As in the proof of [130], we may see that g(A) is Jordan measurable and F ∈ R(A). Let ε > 0. By [131], there is δ > 0 such that µ(g(D)) ≤ (1 + ε)ⁿ|det(Jg(x))|µ(D) for every n-cube D ⊂ A with side-length < δ and every x ∈ D. Choose a partition P of A such that the sub n-boxes D1, . . . , Dk of A determined by P are n-cubes with side-length < δ, and such that U(F, P) ≤ ε + ∫_A F. Since g is a C1-diffeomorphism, g(Di) is Jordan measurable for every i, and g(Di) ∩ g(Dj) = g(Di ∩ Dj) is a null set for every i ≠ j by [121](ii). Therefore, ∫_{g(A)} f = ∑_{i=1}^k ∫_{g(Di)} f ≤ ∑_{i=1}^k sup(f(g(Di)))µ(g(Di)) ≤ ε + ∑_{i=1}^k f(g(xi))µ(g(Di)) for some choice of points xi ∈ Di.

By the choice of δ, we have µ(g(Di)) ≤ (1 + ε)ⁿ|det(Jg(xi))|µ(Di) for 1 ≤ i ≤ k. This can be combined with the previous inequality because f ≥ 0. Thus we get
∫_{g(A)} f ≤ ε + (1 + ε)ⁿ∑_{i=1}^k f(g(xi))|det(Jg(xi))|µ(Di) = ε + (1 + ε)ⁿ∑_{i=1}^k F(xi)µ(Di) ≤ ε + (1 + ε)ⁿU(F, P) ≤ ε + (1 + ε)ⁿ(ε + ∫_A F).
Since ε > 0 is arbitrary, we deduce that ∫_{g(A)} f ≤ ∫_A F.
(ii) By considering f⁺ and f⁻ separately, assume f ≥ 0. Let Y = g(X). The continuous map x ↦ |det(Jg(x))| is bounded on the compact set X̄, and therefore F is bounded. Let fY, FX : Rn → R be the extended functions which are zero respectively outside Y and X. Since X̄ ⊂ U, there are finitely many n-cubes A1, . . . , Ak with pairwise disjoint interiors such that the set K := ∪_{i=1}^k Ai satisfies X ⊂ K ⊂ U. Since X and the Ai are Jordan measurable, the sets Y = g(X) and g(Ai) are Jordan measurable. As f ∈ R(Y), it follows that fY ∈ R(g(Ai)) by [127](v). Applying part (i) to fY and FX, we get that FX ∈ R(Ai) and ∫_{g(Ai)} fY ≤ ∫_{Ai} FX for 1 ≤ i ≤ k. It follows that FX ∈ R(K) and ∫_{g(K)} fY ≤ ∫_K FX since Ai ∩ Aj and g(Ai) ∩ g(Aj) are null sets for i ≠ j (see [127](iv)). This implies F ∈ R(X) and ∫_Y f ≤ ∫_X F. Now observe that if x ∈ X and y = g(x), then det(Jg⁻¹(y)) = 1/det(Jg(x)), and hence f(y) = F(g⁻¹(y))|det(Jg⁻¹(y))|. This allows us to interchange the roles of f and F (and interchange g and g⁻¹) to establish the reverse inequality ∫_X F ≤ ∫_Y f. Thus ∫_Y f = ∫_X F.
(iii) By considering f⁺ and f⁻ separately, assume f ≥ 0. By Exercise-30(ii), f ∘ g ∈ R(U). The bounded continuous function x ↦ |det(Jg(x))| is Riemann integrable over U by [128](ii). Hence F ∈ R(U), and also F ≥ 0. Using [120], we may write U = ∪_{j=1}^∞ Kj, where the Kj's are Jordan measurable compact sets with Kj ⊂ int(Kj+1) for every j ∈ N. Let Yj = g(Kj). Then V = ∪_{j=1}^∞ Yj, where the Yj's are Jordan measurable compact sets with Yj ⊂ int(Yj+1) for every j ∈ N. By (ii), we have f ∈ R(Yj), F ∈ R(Kj), and ∫_{Yj} f = ∫_{Kj} F for every j ∈ N. From this we may deduce ∫_V f = lim_{j→∞} ∫_{Yj} f = lim_{j→∞} ∫_{Kj} F = ∫_U F as follows.

Consider ε > 0. Choose a partition P of an n-box A ⊃ U with (∫_U F) − ε < L(FU, P). If D1, . . . , Dk are the sub n-boxes of A determined by P and Γ = {1 ≤ i ≤ k : Di ⊂ U}, then inf(FU(Di)) = 0 for i ∉ Γ, and therefore L(FU, P) = ∑_{i∈Γ} inf(F(Di))µ(Di) ≤ ∑_{i∈Γ} ∫_{Di} F = ∫_K F, where K := ∪_{i∈Γ} Di. Since the compact set K ⊂ U = ∪_{j=1}^∞ int(Kj+1), there is j with K ⊂ int(Kj+1). Hence (∫_U F) − ε < L(FU, P) ≤ ∫_K F ≤ ∫_{Kj+1} F. As ε > 0 is arbitrary and F ≥ 0, we deduce ∫_U F ≤ lim_{j→∞} ∫_{Kj} F. Clearly, we also have ∫_U F ≥ ∫_{Kj} F for every j, and hence ∫_U F ≥ lim_{j→∞} ∫_{Kj} F. Thus ∫_U F = lim_{j→∞} ∫_{Kj} F. Similarly, ∫_V f = lim_{j→∞} ∫_{Yj} f. □
Remark: For two other proofs of the Change of variable theorem see (i) Chapter 4 of Munkres,
Analysis on Manifolds, and (ii) P.D. Lax, Change of variables in multiple integrals, American
Mathematical Monthly, 1999.
[133] Let U, V ⊂ Rn be open sets and g : U → V be a C1-diffeomorphism.
(i) Let X ⊂ Rn be a Jordan measurable set with X̄ ⊂ U. Then x ↦ |det(Jg(x))| is Riemann integrable over X and µ(g(X)) = ∫_X |det(Jg(x))|dx.
(ii) Assume the open sets U, V are Jordan measurable, and x ↦ det(Jg(x)) is bounded on U. Then µ(V) = ∫_U |det(Jg(x))|dx.

Proof. (i) Let f : g(X) → R be f ≡ 1, and note that f ∈ R(g(X)) by [128](ii). Now by [132](ii), we have µ(g(X)) = ∫_{g(X)} 1 = ∫_{g(X)} f = ∫_X |det(Jg(x))|dx.
(ii) Apply [132](iii) to f : V → R defined as f ≡ 1. □
11. Polar, cylindrical, and spherical coordinates
One important use of the Change of variable theorem is in transforming Euclidean coordinates to polar, cylindrical, or spherical coordinates.
Definition: (i) [Polar coordinates in R2] Let U = {(r, θ) ∈ R2 : r > 0 and 0 < θ < 2π}, and g : U → R2 be g(r, θ) = (r cos θ, r sin θ). Then V := g(U) = R2 \ {(x, 0) : x ≥ 0}, where {(x, 0) : x ≥ 0} is a closed null set in R2. The function g : U → V is a bijective C1-function with Jg(r, θ) = [cos θ, −r sin θ; sin θ, r cos θ], so that det(Jg(r, θ)) = r ≠ 0 for every (r, θ) ∈ U. Hence g : U → V is a C1-diffeomorphism by the Inverse function theorem. If (x, y) ∈ V and (x, y) = g(r, θ), then (r cos θ, r sin θ) is said to be the polar coordinate representation of (x, y). Here note that r² = x² + y², and θ is the angle (measured in the anticlockwise direction) from the positive x-axis to the line segment joining (0, 0) and (x, y).
(ii) [Cylindrical coordinates in R3] Let U = {(r, θ, z) ∈ R3 : r > 0, 0 < θ < 2π, and z ∈ R} and g : U → R3 be g(r, θ, z) = (r cos θ, r sin θ, z) (this means using polar coordinates in the xy-plane and keeping the z-coordinate unchanged). Let V = R3 \ {(x, 0, z) : x ≥ 0 and z ∈ R}. Then {(x, 0, z) : x ≥ 0 and z ∈ R} is a closed null set in R3, and g : U → V is a bijective C1-function with Jg(r, θ, z) = [cos θ, −r sin θ, 0; sin θ, r cos θ, 0; 0, 0, 1], so that det(Jg(r, θ, z)) = r ≠ 0 for every (r, θ, z) ∈ U. Hence g : U → V is a C1-diffeomorphism. If (x, y, z) ∈ V and (x, y, z) = g(r, θ, z), then (r cos θ, r sin θ, z) is said to be the cylindrical coordinate representation of (x, y, z). Here note that r² = x² + y², and θ is the angle (measured in the anticlockwise direction) from the positive x-axis to the line segment joining (0, 0, 0) and (x, y, 0).
(iii) [Spherical coordinates in R3] Note that A := {(x, 0, z) : x ≥ 0 and z ∈ R} is a closed null set in R3. Let V = R3 \ A, and consider (x, y, z) ∈ V. Define r > 0 and t > 0 by the conditions r² = x² + y² + z² and t² = x² + y². In the xy-plane, we may use polar coordinates and write (x, y, 0) = (t cos θ, t sin θ), where θ ∈ (0, 2π) is the angle (measured in the anticlockwise direction) from the positive x-axis to the line segment joining (0, 0, 0) and (x, y, 0). Let η ∈ (0, π) be the angle between the positive z-axis and the line segment joining (0, 0, 0) and (x, y, z). Then z = r cos η and t = r sin η so that (x, y, z) = (r cos θ sin η, r sin θ sin η, r cos η).
Let U = {(r, θ, η) ∈ R3 : r > 0, 0 < θ < 2π, and 0 < η < π} and g : U → V be g(r, θ, η) = (r cos θ sin η, r sin θ sin η, r cos η). Then g is a bijective C1-function with Jg(r, θ, η) = [cos θ sin η, −r sin θ sin η, r cos θ cos η; sin θ sin η, r cos θ sin η, r sin θ cos η; cos η, 0, −r sin η], so that det(Jg(r, θ, η)) = −r² sin η ≠ 0 for every (r, θ, η) in U. Hence g : U → V is a C1-diffeomorphism. If (x, y, z) ∈ V and (x, y, z) = g(r, θ, η), then (r cos θ sin η, r sin θ sin η, r cos η) is said to be the spherical coordinate representation of (x, y, z).
Exercise-37: (i) Use polar coordinates to see that the area of B(0, λ) ⊂ R2 is πλ².
(ii) Use spherical coordinates to see that the volume of B(0, λ) ⊂ R3 is 4πλ³/3.
[Hint: (i) Let U = (0, λ) × (0, 2π) and V = B(0, λ) \ {(x, 0) : x ≥ 0}. Note that {(x, 0) : x ≥ 0} is a closed null set in R2 and g : U → V given by g(r, θ) = (r cos θ, r sin θ) is a C1-diffeomorphism with det(Jg(r, θ)) = r. Hence by [133](ii), µ(B(0, λ)) = µ(V) = ∫_U |det(Jg(r, θ))| = ∫_0^λ ∫_0^{2π} r dθdr = πλ².
(ii) Let U = (0, λ) × (0, 2π) × (0, π), g : U → R3 be g(r, θ, η) = (r cos θ sin η, r sin θ sin η, r cos η), and V = g(U). Then V is equal to B(0, λ) minus a closed null set, and g : U → V is a C1-diffeomorphism with |det(Jg(r, θ, η))| = r² sin η. Hence by [133](ii), µ(B(0, λ)) = µ(V) = ∫_U r² sin η = ∫_0^λ ∫_0^{2π} ∫_0^π r² sin η dηdθdr = 4πλ³/3.]
Example: We wish to evaluate ∫_V f, where V = {(x, y) ∈ R2 : x > 0, y > 0, and x² + y² < λ²}, and f : V → R is f(x, y) = x²y. Let U = (0, λ) × (0, π/2) and note that g : U → V given by g(r, θ) = (r cos θ, r sin θ) is a C1-diffeomorphism with det(Jg(r, θ)) = r. By [133](ii), ∫_V f = ∫_U f(g(r, θ))|det(Jg(r, θ))| = ∫_0^λ ∫_0^{π/2} r⁴ cos²θ sin θ dθdr = ∫_0^λ ∫_0^1 r⁴t² dtdr = λ⁵/15 (where t = cos θ).
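The computation above is easy to confirm numerically; the following sketch (with an arbitrarily chosen λ = 2, not part of the notes) sums the transformed integrand f(g(r, θ))|det Jg| over a midpoint grid in the (r, θ)-rectangle U:

```python
import math

lam = 2.0                       # an arbitrary radius
n = 500
dr, dth = lam / n, (math.pi / 2) / n
total = 0.0
for i in range(n):
    r = (i + 0.5) * dr
    for j in range(n):
        th = (j + 0.5) * dth
        # f(g(r, θ)) |det Jg(r, θ)| = (r cos θ)² (r sin θ) · r
        total += (r * math.cos(th))**2 * (r * math.sin(th)) * r * dr * dth
print(total, lam**5 / 15)       # the two numbers agree
```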
[134] For n ∈ N and λ > 0, let v(n, λ) denote the n-dimensional volume of B(0, λ) ⊂ Rn. Then,
(i) v(n, λ) = λⁿv(n, 1).
(ii) v(n + 2, 1) = 2πv(n, 1)/(n + 2).
(iii) v(3, λ) = 4πλ³/3 and v(4, λ) = π²λ⁴/2.
Proof. (i) Let g : Rn → Rn be g(x) = λx. Then g is an invertible linear map (in particular a C1-diffeomorphism) with g(B(0, 1)) = B(0, λ). The matrix of g is a diagonal matrix where all diagonal entries are equal to λ, and hence |det(Jg(x))| = λⁿ for every x ∈ Rn. By [133](ii), v(n, λ) = ∫_{B(0,1)} |det(Jg(x))|dx = ∫_{B(0,1)} λⁿ = λⁿv(n, 1).

(ii) In R^{n+2}, put y = x_{n+1} and z = x_{n+2}. With the help of part (i), we see v(n + 2, 1)
= ∫_{y²+z²<1} (∫_{x₁²+···+x_n²<1−(y²+z²)} 1) dydz
= ∫_{y²+z²<1} v(n, √(1 − (y² + z²))) dydz = v(n, 1) ∫_{y²+z²<1} (1 − (y² + z²))^{n/2} dydz.
Now applying [133](ii) to the polar coordinates in the yz-plane, note that
∫_{y²+z²<1} (1 − (y² + z²))^{n/2} dydz = ∫_0^1 ∫_0^{2π} (1 − r²)^{n/2} r dθdr = 2π/(n + 2) by putting t = 1 − r².

(iii) Since v(1, 1) = 2, we get by (i) and (ii) that v(3, λ) = λ³v(3, 1) = λ³ × 2πv(1, 1)/3 = 4πλ³/3. Since v(2, 1) = π, we get by (i) and (ii) that v(4, λ) = λ⁴v(4, 1) = λ⁴ × 2πv(2, 1)/4 = π²λ⁴/2. □
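The recursion in [134](ii) determines every v(n, 1) from v(1, 1) = 2 and v(2, 1) = π. The sketch below runs it and, as an outside check (the closed form v(n, 1) = π^{n/2}/Γ(n/2 + 1) is well known but not derived in these notes), compares the two:

```python
import math

v = {1: 2.0, 2: math.pi}          # v(1,1) = 2, v(2,1) = π
for n in range(1, 9):             # v(n+2,1) = 2π v(n,1)/(n+2)
    v[n + 2] = 2 * math.pi * v[n] / (n + 2)

for n in sorted(v):
    closed = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    print(n, round(v[n], 6), round(closed, 6))
```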
12. Line integrals
Line integral refers to the integral of a function f over a path α, and there are two types:
(i) line integrals of scalar fields (i.e., real-valued) f , and this line integral will be independent of
the orientation of the path α, and
(ii) line integrals of vector fields (i.e., vector-valued) f , and this line integral will be sensitive to
the orientation of the path α.
Discussion: Let U ⊂ Rn be open, f : U → R be continuous, and α : [a, b] → U be a C1-path. We would like to define ∫_α f in such a way that ∫_α f is approximately equal to ∑_{j=1}^k f(α(aj))∥α(aj) − α(aj−1)∥ whenever P = {a0 ≤ a1 ≤ ··· ≤ ak} is a sufficiently refined partition of [a, b]. If α = (α1, . . . , αn), then by the Mean value theorem there are cij ∈ (aj−1, aj) with αi(aj) − αi(aj−1) = αi′(cij)(aj − aj−1). Since α is a C1-path, we deduce that α(aj) − α(aj−1) ∼ α′(aj)(aj − aj−1). Hence ∫_α f ∼ ∑_{j=1}^k f(α(aj))∥α′(aj)∥(aj − aj−1), where the right hand side is a Riemann sum of the continuous real-valued function t ↦ f(α(t))∥α′(t)∥ on [a, b]. Hence we define:
Definition: [Line integral of a scalar field] Let U ⊂ Rn be open and f : U → R be continuous.
(i) If α : [a, b] → U is a C1-path, then we define ∫_α f = ∫_a^b f(α(t))∥α′(t)∥dt.
(ii) If α : [a, b] → U is a continuous path with the property that there is a partition P = {a0 ≤ a1 ≤ ··· ≤ ak} of [a, b] such that α[j] := α|[aj−1, aj] is a C1-path for each j, then we say α := ∑_{j=1}^k α[j] is a piecewise C1 path (for example, the parametrization of the boundary of a rectangle), and in this case we define ∫_α f = ∑_{j=1}^k ∫_{α[j]} f.

Definition: The length l(α) of a C1-path α : [a, b] → Rn is defined as l(α) = ∫_α 1 = ∫_a^b ∥α′(t)∥dt (i.e., take f ≡ 1 in the line integral defined above). If α := ∑_{j=1}^k α[j] : [a, b] → Rn is a piecewise C1 path, its length is defined as l(α) = ∑_{j=1}^k l(α[j]).
Remark: Let U ⊂ R2 be open, and f : U → R be continuous with f ≥ 0. If α : [a, b] → U is a C1-path, then the quantity ∫_α f = ∫_a^b f(α(t))∥α′(t)∥dt gives the area of the 'curtain-shaped' region {(α(t), z) ∈ R3 : a ≤ t ≤ b and 0 ≤ z ≤ f(α(t))} bounded below by the image of α in the xy-plane and above by the projection of this image on the graph of f (it depends also on the 'speed' of α).
Example: (i) Let α : [0, π/2] → R2 be α(t) = (2 cos t, 2 sin t) and f : R2 → R be f(x, y) = x + 5y. Then f(α(t)) = 2 cos t + 10 sin t and ∥α′∥ ≡ 2. Hence ∫_α f = ∫_0^{π/2} (4 cos t + 20 sin t)dt = 24.
(ii) Let α, β : [0, 2π] → R2 be α(t) = (cos t, sin t) and β(t) = (cos 3t, sin 3t). Then α and β have the same image (the unit circle), but l(α) = 2π ≠ 6π = l(β) because ∥α′∥ ≡ 1 and ∥β′∥ ≡ 3. Moreover, if f : R2 → R is f(x, y) = x² + y², then f ∘ α ≡ 1 ≡ f ∘ β, and therefore ∫_α f = 2π ≠ 6π = ∫_β f.
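Example (i) can be reproduced numerically with a midpoint sum (a sketch; the helper name `line_integral_scalar` is an arbitrary choice, not from the notes):

```python
import math

def line_integral_scalar(f, alpha, alpha_prime, a, b, n=20000):
    """Midpoint approximation of ∫_α f = ∫_a^b f(α(t)) ∥α′(t)∥ dt."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        x, y = alpha(t)
        dx, dy = alpha_prime(t)
        total += f(x, y) * math.hypot(dx, dy) * dt
    return total

# Example (i): f(x, y) = x + 5y along α(t) = (2cos t, 2sin t), t ∈ [0, π/2].
val = line_integral_scalar(lambda x, y: x + 5 * y,
                           lambda t: (2 * math.cos(t), 2 * math.sin(t)),
                           lambda t: (-2 * math.sin(t), 2 * math.cos(t)),
                           0.0, math.pi / 2)
print(val)  # ≈ 24
```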
Exercise-38: Let U ⊂ Rn be open, f, g : U → R be continuous, and α : [a, b] → U be a piecewise C1 path. Then,
(i) ∫_α (c1f + c2g) = c1∫_α f + c2∫_α g for every c1, c2 ∈ R.
(ii) If f ≥ g, then ∫_α f ≥ ∫_α g. In particular, if f ≥ 0, then ∫_α f ≥ 0.
[135] [Line integral of a scalar field remains invariant under an equivalent reparametrization of the path] Let U ⊂ Rn be open, and f : U → R be continuous.
(i) Let α : [a, b] → U be a piecewise C1 path, g : [c, d] → [a, b] be a C1-diffeomorphism, and β : [c, d] → U be β = α ∘ g. Then ∫_α f = ∫_β f.
(ii) Let α : [a, b] → U be a piecewise C1 path, and define −α : [a, b] → Rn as (−α)(t) = α(a + b − t) (the path in the reverse direction). Then ∫_α f = ∫_{−α} f.
Proof. (i) By the additivity of the integral, we may suppose that α is a C1-path; then β is also a C1-path. Let h : [a, b] → R be h(t) = f(α(t))∥α′(t)∥. Then by the Change of variable theorem, ∫_α f = ∫_a^b h(t)dt = ∫_c^d h(g(s))|g′(s)|ds. By the Chain rule, β′(s) = g′(s)α′(g(s)), and therefore ∫_β f = ∫_c^d f(β(s))∥β′(s)∥ds = ∫_c^d f(β(s))∥α′(g(s))∥|g′(s)|ds = ∫_c^d h(g(s))|g′(s)|ds = ∫_α f.
(ii) This follows from (i) by taking g : [a, b] → [a, b] to be g(s) = a + b − s. □
Remark: When we have to integrate a scalar field over a circle or the boundary of a rectangle, etc., we should consider the natural parametrization in the anticlockwise direction. For example, let A = [a, b] × [c, d], and suppose we wish to evaluate ∫_{∂A} 1. Let α, β : [a, b] → R2 be α(t) = (t, c), β(t) = (t, d); and γ, σ : [c, d] → R2 be γ(t) = (a, t), σ(t) = (b, t). Then the anticlockwise parametrization of ∂A is given by the path α + σ − β − γ. By [135](ii), ∫_{∂A} 1 = ∫_α 1 + ∫_σ 1 + ∫_β 1 + ∫_γ 1 = ∫_a^b (∥α′(t)∥ + ∥β′(t)∥)dt + ∫_c^d (∥σ′(t)∥ + ∥γ′(t)∥)dt = ∫_a^b 2 dt + ∫_c^d 2 dt = 2((b − a) + (d − c)), which is the perimeter of the rectangle A.
Discussion: Let U ⊂ Rn be open, f : U → Rn be continuous, and α : [a, b] → U be a C1-path. Let P = {a0 ≤ a1 ≤ ··· ≤ ak} be a sufficiently refined partition of [a, b]. Think of f as a force field. The work done by f in moving a particle from α(aj−1) to α(aj) along the image of α is approximately equal to ⟨f(α(aj)), α(aj) − α(aj−1)⟩. But α(aj) − α(aj−1) ∼ (aj − aj−1)α′(aj) by the Mean value theorem and the C1-property of α. Hence the work done by f in moving a particle along the image of α from α(a) to α(b) is approximately equal to ∑_{j=1}^k ⟨f(α(aj)), α′(aj)⟩(aj − aj−1), which is a Riemann sum of the continuous function t ↦ ⟨f(α(t)), α′(t)⟩ from [a, b] to R. Motivated by this observation, we define:
Definition: [Line integral of a vector field] Let U ⊂ Rn be open and f : U → Rn be continuous.
(i) If α : [a, b] → U is a C1-path, then we define ∫_α f = ∫_a^b ⟨f(α(t)), α′(t)⟩dt. This integral is also denoted as ∫ f · dα, where the dot in the middle indicates the dot product (inner product). If f = (f1, . . . , fn) and α = (α1, . . . , αn), then ∫ f · dα = ∫_a^b ∑_{i=1}^n fi(α(t))αi′(t)dt. Moreover, if xi(t) = αi(t), then dxi = αi′(t)dt, and hence the following expression is also used for the line integral of a vector field: ∫ f · dα = ∫_α (f1dx1 + ··· + fndxn).
(ii) If α = ∑_{j=1}^k α[j] : [a, b] → U is a piecewise C1 path (where each α[j] is a C1-path), we define ∫_α f = ∑_{j=1}^k ∫_{α[j]} f.
Example: (i) Let α : [0, 1] → R3 be α(t) = (t, t², t³). Then we have ∫_α (xdx − 2ydy + zdz) = ∫_0^1 (t − 4t³ + 3t⁵)dt = 0.
(ii) Let α : [0, π/2] → R2 be α(t) = (2 cos t, 2 sin t) and f : R2 → R2 be f(x, y) = (3x, 5y). Then ∫_α f = ∫_0^{π/2} ⟨f(α(t)), α′(t)⟩dt = ∫_0^{π/2} ⟨(6 cos t, 10 sin t), (−2 sin t, 2 cos t)⟩dt = 8∫_0^{π/2} cos t sin t dt = 8∫_0^1 s ds = 4 by putting s = sin t.
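Both examples can be checked by a midpoint sum of t ↦ ⟨f(α(t)), α′(t)⟩ (a sketch; the helper name is an arbitrary choice, not from the notes):

```python
import math

def line_integral_vector(f, alpha, alpha_prime, a, b, n=20000):
    """Midpoint approximation of ∫_α f = ∫_a^b ⟨f(α(t)), α′(t)⟩ dt."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        fx = f(*alpha(t))
        ap = alpha_prime(t)
        total += sum(u * v for u, v in zip(fx, ap)) * dt
    return total

# Example (i): α(t) = (t, t², t³) with the field (x, −2y, z); exact value 0.
v1 = line_integral_vector(lambda x, y, z: (x, -2 * y, z),
                          lambda t: (t, t**2, t**3),
                          lambda t: (1.0, 2 * t, 3 * t**2), 0.0, 1.0)
# Example (ii): exact value 4.
v2 = line_integral_vector(lambda x, y: (3 * x, 5 * y),
                          lambda t: (2 * math.cos(t), 2 * math.sin(t)),
                          lambda t: (-2 * math.sin(t), 2 * math.cos(t)),
                          0.0, math.pi / 2)
print(v1, v2)  # ≈ 0 and ≈ 4
```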
Exercise-39: Let U ⊂ Rn be open, f, g : U → Rn be continuous, and α : [a, b] → U be a piecewise C1 path. Then ∫_α (c1f + c2g) = c1∫_α f + c2∫_α g for every c1, c2 ∈ R.
[136] [Line integral of a vector field is sensitive to the orientation of the path; but is invariant under an orientation-preserving equivalent reparametrization of the path] Let U ⊂ Rn be open, f : U → Rn be continuous, and α : [a, b] → U be a piecewise C1 path. Then,
(i) ∫_{−α} f = −∫_α f, where −α : [a, b] → U is given by (−α)(t) = α(a + b − t).
(ii) Let g : [c, d] → [a, b] be a C1-diffeomorphism, and β : [c, d] → U be β = α ∘ g. Then ∫_α f = ∫_β f if g′ > 0; and ∫_α f = −∫_β f if g′ < 0.
Proof. (i) Since (−α)′(t) = −α′(a + b − t), we have ∫_{−α} f = −∫_a^b ⟨f(α(a + b − t)), α′(a + b − t)⟩dt = −∫_a^b ⟨f(α(s)), α′(s)⟩ds = −∫_α f by putting s = a + b − t.
(ii) Let h : [a, b] → R be h(t) = ⟨f(α(t)), α′(t)⟩. Then by the Change of variable, ∫_α f = ∫_a^b h(t)dt = ∫_c^d h(g(s))|g′(s)|ds. Also, ∫_β f = ∫_c^d ⟨f(β(s)), β′(s)⟩ds = ∫_c^d h(g(s))g′(s)ds since β′(s) = g′(s)α′(g(s)) by the Chain rule. Now it is clear that ∫_α f = ∫_β f if g′ > 0, and ∫_α f = −∫_β f if g′ < 0. □

Example: Let A = [a, b]×[c, d]. We wish to evaluate ∂A ((x+y)dx+(x−y)dy). Let α, β : [a, b] → R2
be α(t) = (t, c), β(t) = (t, d); and γ, σ : [c, d] → R2 be γ(t) = (a, t), σ(t) = (b, t). Then the
anticlockwise parametrization of ∂A is given by the path α + σ − β − γ. Moreover, observe that

dy = 0 along α and β; and dx = 0 along γ and σ. Therefore by [136](i), ∂A (x + y)dx + (x − y)dy
56 T.K.SUBRAHMONIAN MOOTHATHU
∫ ∫ ∫ ∫
= α (x + y)dx + σ (x − y)dy − β (x + y)dx − γ (x − y)dy
∫b ∫d ∫b ∫d ∫b ∫d
= a (t + c)dt + c (b − t)dt − a (t + d)dt − c (a − t)dt = a (c − d)dt + c (b − a)dt = 0.

Definition: Let U ⊂ Rn be a connected open set and f : U → Rn be continuous. We say f has path independent line integral in U if for any two piecewise C1 paths α, β : [a, b] → U with α(a) = β(a) and α(b) = β(b), we have ∫_α f = ∫_β f.
Recall from Exercise-11 that if U ⊂ Rn is a connected open set, then for every x, y ∈ U , there is
a polygonal path (i.e., a continuous path consisting of finitely many line segments, and in particular
a piecewise C 1 path) in U from x to y. Here note that a path α : [a, b] → U is said to be a path
from x to y if α(a) = x and α(b) = y.
[137] [Fundamental theorem of Calculus for line integrals of a vector field] Let U ⊂ Rn be a connected open set, and f : U → Rn be continuous.
(i) Assume there is a function F : U → R with ∇F = f. Then ∫_α f = F(α(b)) − F(α(a)) for any piecewise C1 path α : [a, b] → U.
(ii) Assume f has path independent line integral in U. Fix z ∈ U. Define F : U → R as F(x) = ∫_α f, where α is any piecewise C1 path in U from z to x. Then F is a C1-function with ∇F = f.
Proof. (i) Since ∇F = f and f is continuous, it follows that F is a C1-function, and in particular differentiable. Now, ∫_α f = ∫_a^b ⟨f(α(t)), α′(t)⟩dt = ∫_a^b ⟨∇F(α(t)), α′(t)⟩dt = ∫_a^b (F ∘ α)′(t)dt = F(α(b)) − F(α(a)) by the Chain rule [108](iii) and the Fundamental theorem of Calculus of one-variable theory.

(ii) It suffices to show ∇F = f, and then the continuity of f will imply that F is a C1-function. Let f = (f1, . . . , fn). Fix x ∈ U and j ∈ {1, . . . , n}. We need to show that lim_{t→0} (F(x + tej) − F(x))/t = fj(x). Let α be a piecewise C1 path in U from z to x. Then F(x) = ∫_α f. Choose an open ball B ⊂ U centered at x and consider t ≠ 0 with x + tej ∈ B. Let β : [0, 1] → U be β(s) = x + stej, i.e., β is a parametrization of the line segment joining x and x + tej. Then α + β is a path in U from z to x + tej, and therefore F(x + tej) = ∫_{α+β} f = ∫_α f + ∫_β f = F(x) + ∫_β f. Hence F(x + tej) − F(x) = ∫_β f = ∫_0^1 ⟨f(x + stej), tej⟩ds = ∫_0^t ⟨f(x + λej), ej⟩dλ = ∫_0^t fj(x + λej)dλ by putting λ = st. Moreover, fj(x) = (1/t)∫_0^t fj(x)dλ. Consequently, |(F(x + tej) − F(x))/t − fj(x)| ≤ (∫_0^t |fj(x + λej) − fj(x)|dλ)/|t|, and the right hand side goes to 0 as t → 0 by the continuity of fj. □
A path α : [a, b] → Rn is said to be a closed path if α(a) = α(b).
[138] Let U ⊂ Rn be a connected open set, and f : U → Rn be continuous. Then the following are equivalent:
(i) ∫_α f = 0 for every piecewise C1 closed path α in U.
(ii) f has path independent line integral in U.
(iii) There is a C1-function F : U → R with ∇F = f.
Proof. (i) ⇒ (ii): Let α, β : [a, b] → U be piecewise C1 paths with α(a) = β(a) and α(b) = β(b). Then α − β is a piecewise C1 closed path, and hence 0 = ∫_{α−β} f = ∫_α f − ∫_β f by (i) and [136](i).
The implication '(ii) ⇒ (iii)' is established in [137](ii), and '(iii) ⇒ (i)' follows from [137](i). □
13. Circulation density and Green’s theorem
Discussion: Let U ⊂ R2 be open. A C1-function f : U → R2 can be thought of as representing a flow in the planar set U, where f(x, y) is the velocity vector at (x, y) ∈ U. Some examples are:
(i) If c ∈ R2, then f : R2 → R2 given by f ≡ c represents the flow moving in the direction of the vector c with constant velocity c. If c = (0, 0), then f represents a stationary flow.
(ii) f : R2 → R2 given by f(x, y) = (x, y) represents a flow moving outward from (0, 0) in all directions with increasing speed (expansion). Draw a picture to see this.
(iii) f : R2 → R2 given by f(x, y) = (−y, x) represents a flow rotating by an angle π/2 in the anticlockwise direction around the origin.
∂f2 ∂f1
[139] [Interpretation of − as circulation density] Let U ⊂ R2 be open, and f : U → R2
∂x ∂y
be a C 1 -function. The circulation density of f at (a, b) ∈ U may be defined as the quantity
1 ∫ 1 ∫
limA→{(a,b)} ∂A f = limε→0 2 ∂A f , where A ⊂ U is a small square with side-length ε > 0
area(A) ε
∂f2 ∂f1
centered at (a, b). Then the circulation density of f at (a, b) is equal to (a, b) − (a, b).
∂x ∂y
Consequently, we have the following:
(i) If Jf (a, b) is a symmetric matrix, then the circulation density of f at (a, b) is zero.
(ii) If there is a function F : U → R with ∇F = f , then the circulation density of f at (a, b) is zero
for every (a, b) ∈ U .

Proof. Write f = (f1, f2), and define α, β, γ, σ : [0, ε] → U as α(t) = (a − ε/2 + t, b − ε/2), β(t) = (a − ε/2 + t, b + ε/2), γ(t) = (a − ε/2, b − ε/2 + t), and σ(t) = (a + ε/2, b − ε/2 + t). Then ∫_{∂A} f = ∫_{α+σ−β−γ} (f1 dx + f2 dy) = ∫_α f1 dx + ∫_σ f2 dy − ∫_β f1 dx − ∫_γ f2 dy because dy = 0 along α and β, and dx = 0 along γ and σ.
Since the midpoint of the side of A represented by α is (a, b − ε/2), we have that ∫_α f1 dx = ∫_0^ε f1(α(t)) × 1 dt ∼ ∫_0^ε f1(a, b − ε/2) dt = εf1(a, b − ε/2). Similarly, ∫_β f1 dx ∼ εf1(a, b + ε/2), ∫_γ f2 dy ∼ εf2(a − ε/2, b), and ∫_σ f2 dy ∼ εf2(a + ε/2, b). Therefore,
(1/ε²) ∫_{∂A} f = [∫_α f1 dx − ∫_β f1 dx]/ε² + [∫_σ f2 dy − ∫_γ f2 dy]/ε²
∼ [f1(a, b − ε/2) − f1(a, b + ε/2)]/ε + [f2(a + ε/2, b) − f2(a − ε/2, b)]/ε

→ −(∂f1/∂y)(a, b) + (∂f2/∂x)(a, b) as ε → 0. This proves the main assertion.
Statement (i) is an immediate corollary. To deduce (ii) from (i), note that Jf (a, b) is the transpose
of the Hessian matrix HF (a, b), and HF (a, b) is symmetric by [110] because F is C 2 (as f is C 1 ). 

Example: (i) Let f : R2 → R2 be f(x, y) = (−y, x). We know that this flow represents a rotation. Since ∂f2/∂x − ∂f1/∂y ≡ 2, the circulation density of f at (a, b) is 2 for every (a, b) ∈ R2. From [139](ii), we deduce that there does not exist any C2-function F : R2 → R with ∇F = f .
(ii) Let f : R2 → R2 be f(x, y) = (x, y), which represents a flow expanding in all directions from the origin with increasing speed. Here, ∂f2/∂x − ∂f1/∂y ≡ 0, and thus the circulation density of f at (a, b) is 0 for every (a, b) ∈ R2. If F : R2 → R is F(x, y) = (x² + y²)/2, then ∇F = f .
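The two examples above can be checked numerically. Below is an illustrative sketch (the function names are hypothetical, not from the notes) that approximates (1/ε²)∫_{∂A} f for the rotation field f(x, y) = (−y, x) around a small square centered at an arbitrary point, using midpoint-rule quadrature on each edge; [139] predicts the value 2.

```python
# Hypothetical numerical check of [139]: circulation density of the
# rotation field f(x, y) = (-y, x) at a point (a, b), computed as
# (1/eps^2) * (anticlockwise line integral of f over a small square).

def f(x, y):
    return (-y, x)

def circulation_density(f, a, b, eps, n=200):
    total = 0.0
    h = eps / n
    for k in range(n):
        t = -eps / 2 + (k + 0.5) * h
        # bottom edge: (a+t, b-eps/2), direction (1, 0)
        total += f(a + t, b - eps / 2)[0] * h
        # right edge: (a+eps/2, b+t), direction (0, 1)
        total += f(a + eps / 2, b + t)[1] * h
        # top edge: (a-t, b+eps/2), direction (-1, 0)
        total += -f(a - t, b + eps / 2)[0] * h
        # left edge: (a-eps/2, b-t), direction (0, -1)
        total += -f(a - eps / 2, b - t)[1] * h
    return total / eps**2

# [139] predicts (d f2/dx - d f1/dy)(a, b) = 1 - (-1) = 2
density = circulation_density(f, a=1.3, b=-0.7, eps=1e-3)
```

For this linear field the midpoint rule is exact, so the quadrature recovers the density 2 up to rounding.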
If f : R2 → R2 is a C1-function with ∂f2/∂x − ∂f1/∂y ≡ 0, we may ask whether there is a C2-function F : R2 → R with ∇F = f , or equivalently whether f has path independent line integral in R2. The affirmative answer is given by [140] below. Another related result is Green’s theorem (stated as [141] below), which is true for regions bounded by piecewise C1 paths, but we will prove only a special case of Green’s theorem.

[140] Let U ⊂ Rn be open and f : U → Rn be a C1-function. Suppose that U is convex6. Then there is a C2-function F : U → R with ∇F = f ⇔ Jf(a) is symmetric for every a ∈ U (i.e., (∂fi/∂xj)(a) = (∂fj/∂xi)(a) for every i, j ∈ {1, . . . , n} and every a ∈ U ).
Proof. ⇒: This is similar to the proof of [139](ii): Jf(a) is the transpose of HF(a), and HF(a) is symmetric by [110].
⇐: After a translation, assume 0 ∈ U . Define F : U → R as F(x) = ∫_0^1 ⟨f(sx), x⟩ ds. That is, F(x) = ∫_α f , where α : [0, 1] → U given by α(s) = sx parametrizes the line segment [0, x] (here we use the fact that U is convex). We will show ∇F = f , and then the C1-property of f will imply that F is a C2-function. Fix j ∈ {1, . . . , n} and x ∈ U . By [124](iii), we have that (∂F/∂xj)(x) = ∫_0^1 (∂⟨f(sx), x⟩/∂xj) ds. The assumption ∂fi/∂xj = ∂fj/∂xi implies that ∂f/∂xj = ∇fj. Hence
∂⟨f(sx), x⟩/∂xj = ⟨∂(f(sx))/∂xj, x⟩ + ⟨f(sx), ∂x/∂xj⟩ = s⟨(∂f/∂xj)(sx), x⟩ + ⟨f(sx), ej⟩
= s⟨∇fj(sx), x⟩ + fj(sx) = sg′(s) + g(s), where g : [0, 1] → R is g(s) := fj(sx). Therefore, (∂F/∂xj)(x) = ∫_0^1 (∂⟨f(sx), x⟩/∂xj) ds = ∫_0^1 (sg′(s) + g(s)) ds = sg(s)|_0^1 = g(1) = fj(x). 
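The potential construction F(x) = ∫_0^1 ⟨f(sx), x⟩ ds can be tested numerically. A minimal sketch, assuming the sample field f(x, y) = (y, x) (which has symmetric Jacobian, with known potential F(x, y) = xy; the helper names are illustrative):

```python
# Numerical sketch of the construction in [140], for the test field
# f(x, y) = (y, x); the expected potential is F(x, y) = x*y.

def f(p):
    x, y = p
    return (y, x)

def F(p, n=1000):
    # midpoint-rule approximation of integral_0^1 <f(s*p), p> ds
    total = 0.0
    for k in range(n):
        s = (k + 0.5) / n
        fs = f((s * p[0], s * p[1]))
        total += (fs[0] * p[0] + fs[1] * p[1]) / n
    return total

val = F((1.5, -2.0))                 # should be close to 1.5 * (-2.0) = -3
h = 1e-5                             # central differences to check grad F = f
gx = (F((1.5 + h, -2.0)) - F((1.5 - h, -2.0))) / (2 * h)   # expect f1 = -2
gy = (F((1.5, -2.0 + h)) - F((1.5, -2.0 - h))) / (2 * h)   # expect f2 = 1.5
```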
Definition: A compact set A ⊂ R2 is said to be an elementary region if there are piecewise C1 paths ϕ, ψ : [a, b] → R and ϕ̃, ψ̃ : [c, d] → R such that A has both of the following two representations:
6or more generally that U has no ‘holes’.

A = {(x, y) : a ≤ x ≤ b and ϕ(x) ≤ y ≤ ψ(x)} = {(x, y) : c ≤ y ≤ d and ϕ̃(y) ≤ x ≤ ψ̃(y)}.
Clearly, every rectangle is an elementary region. On the other hand, the set {(x, y) ∈ R2 : −1 ≤ x ≤ 1 and x² ≤ y ≤ x² + 1} is not an elementary region because the second representation fails.

Exercise-40: (i) If A ⊂ R2 is a solid triangle, then A is an elementary region.
(ii) If A ⊂ R2 is a compact convex set bounded by a polygonal path (i.e., if A is a compact convex polygon), then A is an elementary region.
[Hint: (i) For example, suppose A has vertices (−1, 2), (0, 0), and (1, 1). Take ϕ to be a parametrization of [(−1, 2), (0, 0)] + [(0, 0), (1, 1)], ψ to be a parametrization of [(−1, 2), (1, 1)], ϕ̃ to be a parametrization of [(−1, 2), (0, 0)], and ψ̃ to be a parametrization of [(−1, 2), (1, 1)] + [(1, 1), (0, 0)].
(ii) Let [a, b] × [c, d] be the smallest rectangle enclosing A. Let y1 = min{y : (a, y) ∈ A}, y2 = max{y : (a, y) ∈ A}, y3 = min{y : (b, y) ∈ A}, and y4 = max{y : (b, y) ∈ A}. Let ϕ : [a, b] → R be a parametrization of the ‘lower’ portion of the boundary of A from (a, y1) to (b, y3), and ψ : [a, b] → R be a parametrization of the ‘upper’ portion of the boundary of A from (a, y2) to (b, y4). Similarly, define ϕ̃, ψ̃ : [c, d] → R.]

[141] [Green’s theorem] Let U ⊂ R2 be open and f : U → R2 be a C1-function.
(i) If A ⊂ U is an elementary region, then ∫_A (∂f2/∂x − ∂f1/∂y) = ∫_{∂A} f (where the integral over ∂A is taken with the anticlockwise orientation).
(ii) Suppose A ⊂ U is such that A = ∪_{j=1}^p Aj, where p ∈ N and Aj’s are elementary regions with pairwise disjoint interiors. Then ∫_A (∂f2/∂x − ∂f1/∂y) = ∫_{∂A} f .

Proof. (i) Let f = (f1, f2). Then ∫_{∂A} f = ∫_{∂A} (f1 dx + f2 dy). We will show that ∫_A (−∂f1/∂y) = ∫_{∂A} f1 dx and ∫_A ∂f2/∂x = ∫_{∂A} f2 dy. Choose piecewise C1 paths ϕ, ψ, ϕ̃, and ψ̃ such that
A = {(x, y) : a ≤ x ≤ b and ϕ(x) ≤ y ≤ ψ(x)} = {(x, y) : c ≤ y ≤ d and ϕ̃(y) ≤ x ≤ ψ̃(y)}.
The graphs of ϕ, ψ, ϕ̃, ψ̃ are null sets in R2. Since ∂A consists of these graphs and at most two horizontal and at most two vertical line segments, ∂A is also a null set in R2. Hence the compact set A is Jordan measurable, and therefore the continuous function ∂f2/∂x − ∂f1/∂y is indeed Riemann integrable over A by [128](ii).
We have ∫_A (−∂f1/∂y) = ∫_a^b (∫_{ϕ(x)}^{ψ(x)} (−∂f1/∂y) dy) dx = ∫_a^b (f1(x, ϕ(x)) − f1(x, ψ(x))) dx by the first representation of A. Let α, β : [a, b] → R2 be α(t) = (t, ϕ(t)) and β(t) = (t, ψ(t)). Then dx = dt along both α and β. Note that the vertical line segments of ∂A (if any) do not contribute to the integral ∫_{∂A} f1 dx since dx = 0 along vertical lines. Therefore, by the first representation of A, we get ∫_{∂A} f1 dx = ∫_α f1 dx − ∫_β f1 dx = ∫_a^b (f1(t, ϕ(t)) − f1(t, ψ(t))) dt = ∫_A (−∂f1/∂y).

Next, ∫_A ∂f2/∂x = ∫_c^d (∫_{ϕ̃(y)}^{ψ̃(y)} (∂f2/∂x) dx) dy = ∫_c^d (f2(ψ̃(y), y) − f2(ϕ̃(y), y)) dy by the second representation of A. Let γ, σ : [c, d] → R2 be γ(t) = (ϕ̃(t), t) and σ(t) = (ψ̃(t), t). Then dy = dt along both γ and σ. Note that the horizontal line segments of ∂A (if any) do not contribute to the integral ∫_{∂A} f2 dy since dy = 0 along horizontal lines. Therefore, by the second representation of A, we get ∫_{∂A} f2 dy = ∫_σ f2 dy − ∫_γ f2 dy = ∫_c^d (f2(ψ̃(t), t) − f2(ϕ̃(t), t)) dt = ∫_A ∂f2/∂x.

(ii) We have ∫_A (∂f2/∂x − ∂f1/∂y) = Σ_{j=1}^p ∫_{Aj} (∂f2/∂x − ∂f1/∂y) by [127](iv) because Ai ∩ Aj is a null set in R2 for every i ̸= j. We also have ∫_{∂A} f = Σ_{j=1}^p ∫_{∂Aj} f because the integrals over the common portions of ∂Ai and ∂Aj (if any) for i ̸= j are in opposite directions and cancel each other. Hence the result follows by applying part (i) to each Aj. 


Remark: The equality ‘∫_A (∂f2/∂x − ∂f1/∂y) = ∫_{∂A} f ’ in Green’s theorem may be interpreted as follows: the net amount of anticlockwise rotation of a 2-dimensional flow f in a planar region A is equal to the net amount of the flow f along the boundary of A in the anticlockwise direction.
Exercise-41: Let A ⊂ R2 be the region bounded by the ellipse x²/a² + y²/b² = r², where a, b > 0. Then µ(A) = πabr² by an application of Green’s theorem. [Hint: Choose a simple enough C1-function f : R2 → R2 with ∂f2/∂x − ∂f1/∂y ≡ 1, say f(x, y) = (0, x). Then µ(A) = ∫_A 1 = ∫_A (∂f2/∂x − ∂f1/∂y) = ∫_{∂A} f by [141]. Parametrizing ∂A with α : [0, 2π] → R2 given by α(t) = (ar cos t, br sin t), we see ∫_{∂A} f = ∫_0^{2π} ⟨f(α(t)), α′(t)⟩ dt = ∫_0^{2π} abr² cos²t dt = abr² ∫_0^{2π} ((1 + cos(2t))/2) dt = πabr².]
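The hint’s line-integral computation can be confirmed numerically. A sketch with illustrative parameter values (a = 2, b = 3, r = 1 are just a test case), approximating ∫_{∂A} f by the midpoint rule:

```python
import math

# Numerical companion to Exercise-41: area of the ellipse region via
# Green's theorem, with f(x, y) = (0, x) and the boundary parametrization
# alpha(t) = (a r cos t, b r sin t).

def ellipse_area_by_green(a, b, r, n=10000):
    total = 0.0
    h = 2 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * h
        x = a * r * math.cos(t)       # alpha_1(t)
        dy = b * r * math.cos(t)      # alpha_2'(t)
        total += x * dy * h           # <f(alpha(t)), alpha'(t)> = x * alpha_2'(t)
    return total

area = ellipse_area_by_green(2.0, 3.0, 1.0)
expected = math.pi * 2.0 * 3.0 * 1.0**2   # pi * a * b * r^2
```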

14. Surface integrals

As in the case of line integrals, we will define two types of surface integrals - (i) for scalar fields,
and (ii) for vector fields. We will think of a surface as a function rather than as a set (as in the
case of a path). First, we need to recall the notion of a cross product (vector product).

Definition: The cross product (also called vector product) is a binary operation on R3 defined by
the following conditions: (i) e1 × e2 = e3 , e2 × e3 = e1 , and e3 × e1 = e2 .
(ii) ej × ei = −(ei × ej ) for 1 ≤ i, j ≤ 3, and in particular ej × ej = 0 ∈ R3 for 1 ≤ j ≤ 3.
(iii) u × v = Σ_{i=1}^3 Σ_{j=1}^3 ui vj (ei × ej) for every u = (u1, u2, u3) and v = (v1, v2, v3) in R3.

Exercise-42: In R3, we have: (i) v × u = −(u × v); and hence u × u = 0 ∈ R3.
(ii) The cross product is a bilinear map, i.e., it is linear in each variable.
(iii) Symbolically, u × v = det of the 3×3 matrix with rows (e1, e2, e3), (u1, u2, u3), (v1, v2, v3), which equals det of its transpose, the matrix with columns (e1, e2, e3)ᵗ, (u1, u2, u3)ᵗ, (v1, v2, v3)ᵗ (since det(Aᵗ) = det(A)).
(iv) u × v ̸= 0 ⇔ {u, v} is linearly independent (this follows from (iii)).

(v) ⟨u × v, w⟩ = det of the 3×3 matrix with rows (w1, w2, w3), (u1, u2, u3), (v1, v2, v3), which equals det of the matrix with rows (u1, u2, u3), (v1, v2, v3), (w1, w2, w3), and also det of the matrix with columns u, v, w.
(vi) ⟨u × v, w⟩ ̸= 0 ⇔ {u, v, w} is linearly independent (this follows from (v)). In particular, u × v is perpendicular to both u and v, i.e., ⟨u × v, u⟩ = 0 = ⟨u × v, v⟩.
(vii) ∥u × v∥² = ∥u∥²∥v∥² − |⟨u, v⟩|² by (iii). It follows that ∥u × v∥ = ∥u∥∥v∥ sin θ if θ ∈ [0, π] is the angle between u and v because ⟨u, v⟩ = ∥u∥∥v∥ cos θ.
(viii) ∥u × v∥ is the area of the parallelogram in R3 with vertices 0, u, v, and u + v by (vii).
(ix) |⟨u × v, w⟩| is the volume of the parallelepiped in R3 specified by the three vectors u, v, and w (to see this, note that if η is the angle between u × v and w, then |⟨u × v, w⟩| = ∥u × v∥∥w∥| cos η|).
(x) u × (v × w) = ⟨u, w⟩v − ⟨u, v⟩w.
(xi) [Jacobi identity] (u × v) × w + (v × w) × u + (w × u) × v = 0 ∈ R3.
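Identities (vii), (x), and (xi) are easy to confirm numerically on sample vectors; the helper names below are illustrative, not from the notes:

```python
# Numerical sanity check of Exercise-42 (vii), (x), (xi) on fixed vectors.

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(ui*vi for ui, vi in zip(u, v))

def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

def scale(c, u):
    return tuple(c*ui for ui in u)

u, v, w = (1.0, 2.0, -1.0), (0.5, -3.0, 2.0), (4.0, 0.0, 1.0)

# (vii): |u x v|^2 = |u|^2 |v|^2 - <u, v>^2
lagrange_gap = dot(cross(u, v), cross(u, v)) - (dot(u, u)*dot(v, v) - dot(u, v)**2)

# (x): u x (v x w) = <u, w> v - <u, v> w
bac_cab_gap = sub(cross(u, cross(v, w)),
                  sub(scale(dot(u, w), v), scale(dot(u, v), w)))

# (xi): (u x v) x w + (v x w) x u + (w x u) x v = 0
jacobi = tuple(a + b + c for a, b, c in zip(cross(cross(u, v), w),
                                            cross(cross(v, w), u),
                                            cross(cross(w, u), v)))
```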

There are different approaches to the definition of a surface. We will consider only the restricted
notion of a parametric surface in R3 . As in the case of a path, we will define a parametric surface as
a function; and the image of this function will be what we geometrically think of as a surface. There
will be a little bit of ambiguity in the definition of a parametric surface since minor modifications
will be needed depending on the context.

Definition: A closed path α : [a, b] → Rn is said to be simple if α is injective on [a, b).

Definition: Let X ⊂ R2 be a compact connected set whose boundary can be parametrized by a


piecewise C 1 simple closed path (this implies in particular that X is Jordan measurable). Then a
C 1 -function P : X → R3 will be called a parametric surface (ambiguity: define partial derivatives
of P by considering one-sided limits or assume P is defined and is C 1 in a neighborhood of X).
Geometrically, the image set P (X) is to be thought of as a surface in R3 . To ensure that P (X) is
indeed a ‘two-dimensional’ figure, it is desirable to assume the following:
(i) P is injective on X, or at least on int(X).
(ii) JP(a) has full rank 2 for every a ∈ X, or for every a ∈ int(X). Note that JP(a) has rank 2 ⇔ the columns (∂P/∂x)(a) and (∂P/∂y)(a) of JP(a) are linearly independent ⇔ (∂P/∂x)(a) × (∂P/∂y)(a) ̸= 0 ∈ R3.
∂x ∂y ∂x ∂y
Example: (i) [Sphere] Let X = [0, 2π] × [0, π], r > 0, and define a C1-function P : X → R3 as P(x, y) = (r cos x sin y, r sin x sin y, r cos y), which is injective on int(X). Note that the image P(X) is the sphere with radius r centered at the origin of R3. We have that
(∂P/∂x)(x, y) × (∂P/∂y)(x, y) = det of the matrix with rows (e1, e2, e3), (−r sin x sin y, r cos x sin y, 0), (r cos x cos y, r sin x cos y, −r sin y) = −r sin y P(x, y) ̸= (0, 0, 0) for every (x, y) ∈ int(X). Thus P is a parametric surface. In this example, (∂P/∂x)(x, y) × (∂P/∂y)(x, y) is the inward normal to the sphere at P(x, y) because of the negative sign in ‘−r sin y P(x, y)’, and ∥(∂P/∂x)(x, y) × (∂P/∂y)(x, y)∥ = r² sin y because ∥P(x, y)∥ = r and sin y ≥ 0 for y ∈ [0, π].
(ii) [Cylinder] Let r > 0, h > 0, X = [0, 2π] × [0, h], and define a C1-function P : X → R3 as P(x, y) = (r cos x, r sin x, y), which is injective on int(X). Note that the image P(X) is a vertical cylinder of height h and radius r (without the top and bottom discs) with the center of the bottom disc placed at the origin of R3. We have that
(∂P/∂x)(x, y) × (∂P/∂y)(x, y) = det of the matrix with rows (e1, e2, e3), (−r sin x, r cos x, 0), (0, 0, 1) = (r cos x, r sin x, 0) = P(x, 0) ̸= (0, 0, 0) for every (x, y) ∈ X. Thus P is a parametric surface. In this example, (∂P/∂x)(x, y) × (∂P/∂y)(x, y) is the outward normal to the cylinder at P(x, y), and ∥(∂P/∂x)(x, y) × (∂P/∂y)(x, y)∥ = ∥P(x, 0)∥ = r.
Exercise-43: (i) [Observation] If P = (P1, P2, P3) : X → R3 is a parametric surface and a ∈ X, then by Exercise-42(iii), we see that
(∂P/∂x)(a) × (∂P/∂y)(a) = det of the 3×3 matrix whose columns are (e1, e2, e3)ᵗ, (∂P/∂x)(a), and (∂P/∂y)(a); symbolically, this is det[E JP(a)], where E := (e1, e2, e3)ᵗ is the 3×1 symbolic column.
This suggests that ∥(∂P/∂x)(a) × (∂P/∂y)(a)∥ is the ‘local magnification factor’ of P at a ∈ X.
(ii) Let P : X → R3 and P̃ : X̃ → R3 be parametric surfaces, and suppose g : X → X̃ is a C1-diffeomorphism with P = P̃ ◦ g. Then for every a ∈ X and b := g(a) ∈ X̃, we have that
(∂P/∂x)(a) × (∂P/∂y)(a) = det(Jg(a)) ((∂P̃/∂x)(b) × (∂P̃/∂y)(b)).
[Hint: (ii) JP(a) = JP̃(b)Jg(a) by the Chain rule. Now use (i).]

Discussion: (i) Let P : X → R3 be a parametric surface, and D ⊂ X be a small rectangle. Let a, a + εe1, a + δe2, and a + εe1 + δe2 be the vertices of D, where ε, δ > 0 are small. Then P(D) is approximately equal to a parallelogram, three of whose four vertices are P(a), P(a + εe1), and P(a + δe2). Using the C1-property of P and the Mean value theorem, we see that the area of this parallelogram is ∼ ∥(P(a + εe1) − P(a)) × (P(a + δe2) − P(a))∥ ∼ ∥(∂P/∂x)(a) × (∂P/∂y)(a)∥εδ. Thus area(P(D)) ∼ ∥(∂P/∂x)(a) × (∂P/∂y)(a)∥µ(D).

(ii) Let U ⊂ R3 be open, f : U → R be continuous, and P : X → U be a parametric surface. We wish to define the integral ∫_P f . Since X is Jordan measurable, we may approximate X from inside by a finite union ∪_{i=1}^q Di of rectangles with pairwise disjoint interiors. If Di’s are sufficiently small and ai ∈ Di, then ∫_P f should be approximately equal to Σ_{i=1}^q f(P(ai)) × area(P(Di)). By part (i), this requirement becomes ∫_P f ∼ Σ_{i=1}^q f(P(ai)) ∥(∂P/∂x)(ai) × (∂P/∂y)(ai)∥ µ(Di) if we take ai to be the lower-left vertex of Di. The sum on the right hand side is a Riemann sum approximating the Riemann integral over (the Jordan measurable set) X of the continuous function (f ◦ P)∥(∂P/∂x) × (∂P/∂y)∥ from X to R. Hence we define:

Definition: (i) [Surface integral of a scalar field] Let U ⊂ R3 be open, f : U → R be continuous, and P : X → U be a parametric surface. We define
∫_P f = ∫_X (f ◦ P) ∥(∂P/∂x) × (∂P/∂y)∥.
(ii) If P : X → R3 is a parametric surface, then we define
area(P(X)) = ∫_P 1 = ∫_X ∥(∂P/∂x) × (∂P/∂y)∥.
Remark: The surface integral ∫_P f is also denoted as ∫_S f dS, where S := P(X).

Example: (i) Recall the examples of the sphere and the cylinder from the previous page. In the case of the sphere with radius r, we have ∥(∂P/∂x)(x, y) × (∂P/∂y)(x, y)∥ = r² sin y, and hence the surface area of this sphere is = ∫_0^{2π} (∫_0^π r² sin y dy) dx = ∫_0^{2π} 2r² dx = 4πr². In the case of the cylinder with height h and radius r, we have ∥(∂P/∂x) × (∂P/∂y)∥ ≡ r, and hence the surface area of this cylinder (without the top and bottom discs) is = ∫_0^{2π} (∫_0^h r dy) dx = 2πrh.
(ii) Let h > 0, r > 0, X = [0, π/2] × [0, h], and P : X → R3 be P(x, y) = (r cos x, r sin x, y). Note that ∥(∂P/∂x) × (∂P/∂y)∥ ≡ r. If f : R3 → R is f(x, y, z) = x + y + z, then ∫_P f = ∫_X (f ◦ P) ∥(∂P/∂x) × (∂P/∂y)∥ = ∫_0^{π/2} (∫_0^h (r² cos x + r² sin x + ry) dy) dx = ∫_0^{π/2} (r²h cos x + r²h sin x + rh²/2) dx = 2r²h + πrh²/4.
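Example (ii) can be verified with a two-dimensional Riemann sum, exactly as in the definition above. A sketch with illustrative values r = 1.5, h = 2 (the function name is hypothetical):

```python
import math

# Riemann-sum check of example (ii): scalar surface integral of
# f(x, y, z) = x + y + z over the quarter cylinder
# P(x, y) = (r cos x, r sin x, y) on [0, pi/2] x [0, h], where the
# magnification factor ||dP/dx x dP/dy|| is identically r.

def quarter_cylinder_integral(r, h, nx=400, ny=400):
    total = 0.0
    dx = (math.pi / 2) / nx
    dy = h / ny
    for i in range(nx):
        x = (i + 0.5) * dx
        for j in range(ny):
            y = (j + 0.5) * dy
            fval = r * math.cos(x) + r * math.sin(x) + y   # f(P(x, y))
            total += fval * r * dx * dy                    # (f o P) * r
    return total

r, h = 1.5, 2.0
approx = quarter_cylinder_integral(r, h)
exact = 2 * r**2 * h + math.pi * r * h**2 / 4
```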
Exercise-44: Let P : X → R3 be a parametric surface of the form P(x, y) = (x, y, ϕ(x, y)), where ϕ : X → R is a C1-function. Then P(X) is the graph of ϕ, P is injective on X, and (∂P/∂x) × (∂P/∂y) = det of the matrix with rows (e1, e2, e3), (1, 0, ∂ϕ/∂x), (0, 1, ∂ϕ/∂y) = (−∂ϕ/∂x, −∂ϕ/∂y, 1) ̸= 0 ∈ R3 in X. Hence area(P(X)) = ∫_X √(1 + (∂ϕ/∂x)² + (∂ϕ/∂y)²).
Exercise-45: Let U ⊂ R3 be open, f, g : U → R be continuous, and P : X → U be a parametric surface. Then, (i) ∫_P (c1f + c2g) = c1 ∫_P f + c2 ∫_P g for every c1, c2 ∈ R.
(ii) If f ≥ g, then ∫_P f ≥ ∫_P g. In particular, if f ≥ 0, then ∫_P f ≥ 0.

[142] Let U ⊂ R3 be open, f : U → R be continuous, and P : X → U and P̃ : X̃ → U be parametric surfaces. Suppose there is a C1-diffeomorphism g : X → X̃ with P = P̃ ◦ g (if necessary assume g is defined in a neighborhood of X). Then, ∫_P f = ∫_{P̃} f .
Proof. If a ∈ X and b = g(a) ∈ X̃, then ∥(∂P/∂x)(a) × (∂P/∂y)(a)∥ = ∥(∂P̃/∂x)(b) × (∂P̃/∂y)(b)∥ |det(Jg(a))| by Exercise-43(ii). Let h : X̃ → R be h(b) = (f ◦ P̃)(b) ∥(∂P̃/∂x)(b) × (∂P̃/∂y)(b)∥. Then by the Change of variable theorem and the initial observation, we see that ∫_{P̃} f = ∫_{X̃} h = ∫_X (h ◦ g)|det(Jg(·))| = ∫_X (f ◦ P) ∥(∂P/∂x) × (∂P/∂y)∥ = ∫_P f . 

[143] (i) Let 0 ≤ a < b, and ϕ : [a, b] → R be a C1-function. Assume that the graph of ϕ lies in the xz-plane in R3. Then the area of the ‘surface of revolution’ obtained by rotating the graph of ϕ around the z-axis is 2π ∫_a^b x√(1 + (ϕ′(x))²) dx.
(ii) The surface area of the cone with height h > 0 and radius r > 0 (without the disc) is πr√(r² + h²).
Proof. (i) Let X = [a, b] × [0, 2π]. The ‘surface of revolution’ is parametrized by P : X → R3 given by P(x, y) = (x cos y, x sin y, ϕ(x)). Now, (∂P/∂x)(x, y) × (∂P/∂y)(x, y) = det of the matrix with rows (e1, e2, e3), (cos y, sin y, ϕ′(x)), (−x sin y, x cos y, 0) = (−xϕ′(x) cos y, −xϕ′(x) sin y, x) so that ∥(∂P/∂x)(x, y) × (∂P/∂y)(x, y)∥ = x√(1 + (ϕ′(x))²). Therefore,
area(P(X)) = ∫_a^b (∫_0^{2π} x√(1 + (ϕ′(x))²) dy) dx = 2π ∫_a^b x√(1 + (ϕ′(x))²) dx.
(ii) The surface of the cone (without the disc) can be obtained as a ‘surface of revolution’ as described in (i) if we take ϕ : [0, r] → R as ϕ(x) = hx/r. Hence by (i), the area of the cone = 2π ∫_0^r x√(1 + (ϕ′(x))²) dx = 2π ∫_0^r x√(1 + (h/r)²) dx = πr²√(1 + (h/r)²) = πr√(r² + h²). 
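Part (ii) can be checked numerically from the surface-of-revolution formula in part (i); the values r = 3, h = 4 below are an arbitrary test case, and the function name is illustrative:

```python
import math

# Numerical check of [143](ii): lateral area of a cone of radius r and
# height h, via 2*pi*integral_0^r x*sqrt(1 + (phi'(x))^2) dx with
# phi(x) = h*x/r, so phi'(x) = h/r is constant.

def cone_area(r, h, n=10000):
    slope2 = (h / r)**2
    total = 0.0
    dx = r / n
    for k in range(n):
        x = (k + 0.5) * dx
        total += x * math.sqrt(1 + slope2) * dx
    return 2 * math.pi * total

r, h = 3.0, 4.0
approx = cone_area(r, h)
exact = math.pi * r * math.sqrt(r**2 + h**2)
```

Since the integrand is linear in x, the midpoint rule here is exact up to rounding.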

Our next aim is to define surface integrals for vector fields in R3 .

Discussion: Let U ⊂ R3 be open, f : U → R3 be a C1-function, and P : X → U be a parametric surface. We wish to define the integral ∫_P f . Assume f describes a flow (think of f(u) as the velocity vector of the flow at u ∈ U ); the value ∫_P f should give the total flow across P(X) in the direction specified by (∂P/∂x) × (∂P/∂y).
(i) Let D ⊂ X be a small rectangle, a ∈ D, and suppose (∂P/∂x)(a) × (∂P/∂y)(a) ̸= 0 ∈ R3. Then, observe that ((∂P/∂x)(a) × (∂P/∂y)(a))/∥(∂P/∂x)(a) × (∂P/∂y)(a)∥ is a unit normal to the surface P(X). Hence the flow f across P(D) in the direction of this unit normal is
∼ ⟨f(P(a)), unit normal⟩ × area(P(D)) ∼ ⟨f(P(a)), (∂P/∂x)(a) × (∂P/∂y)(a)⟩µ(D) because area(P(D)) ∼ ∥(∂P/∂x)(a) × (∂P/∂y)(a)∥µ(D).
(ii) Since X is Jordan measurable, we may approximate X from inside by a finite union ∪_{i=1}^q Di of rectangles with pairwise disjoint interiors. If Di’s are sufficiently small and ai ∈ Di, then by (i), the total flow f across P(X) in the direction specified by (∂P/∂x) × (∂P/∂y) is approximately equal to Σ_{i=1}^q ⟨f(P(ai)), (∂P/∂x)(ai) × (∂P/∂y)(ai)⟩µ(Di). The right hand side is a Riemann sum of the continuous real-valued function ⟨f ◦ P, (∂P/∂x) × (∂P/∂y)⟩ over the Jordan measurable compact set X. This function is Riemann integrable by [128](ii). Hence we define:

Definition: [Surface integral of a vector field] Let U ⊂ R3 be open, f : U → R3 be a C1-function, and P : X → U be a parametric surface. The surface integral of f over P (or over S := P(X)) is defined as
∫_P f = ∫_X ⟨f ◦ P, (∂P/∂x) × (∂P/∂y)⟩.
Remark: Sometimes we are interested in calculating the flow f across S := P(X) in the direction of −((∂P/∂x) × (∂P/∂y)) (depends on the context). Then we consider −∫_X ⟨f ◦ P, (∂P/∂x) × (∂P/∂y)⟩ as the surface integral. The surface integral is also denoted as ∫_P f · n̂, or as ∫_S f · n̂ dS, where S = P(X) and n̂ = ±((∂P/∂x) × (∂P/∂y))/∥(∂P/∂x) × (∂P/∂y)∥ (the unit normal).
Exercise-46: Let U ⊂ R3 be open, f, g : U → R3 be C1-functions, and P : X → U be a parametric surface. Then ∫_P (c1f + c2g) = c1 ∫_P f + c2 ∫_P g for every c1, c2 ∈ R.

Example: Let h, r > 0, X = [0, 2π] × [0, h], and P : X → R3 be P(x, y) = (r cos x, r sin x, y). We know that P(X) is a cylinder with height h and radius r whose axis is the z-axis. (i) Let f : R3 → R3 be f(x, y, z) = (−y, x, 0). Then f represents a rotation around the z-axis, and hence there is no flow out of P(X) so that we expect ∫_P f = 0. Indeed ∫_P f = ∫_X ⟨(−r sin x, r cos x, 0), (r cos x, r sin x, 0)⟩ = ∫_X 0 = 0. (ii) Let f : R3 → R3 be f(x, y, z) = (x, y, 0); then there is flow out of P(X); in fact, ∫_P f = ∫_X ⟨(r cos x, r sin x, 0), (r cos x, r sin x, 0)⟩ = ∫_X r² = r²µ(X) = 2πr²h.

Another notation for the surface integral of a vector field: Let U ⊂ R3 be open, f = (f1, f2, f3) : U → R3 be continuous, and P = (P1, P2, P3) : X → U be a parametric surface. Write elements of X as (x, y) and elements of U as (u1, u2, u3). Then f(P(x, y)) = f(u1, u2, u3), where u1 = P1(x, y), u2 = P2(x, y), and u3 = P3(x, y). For distinct j, k ∈ {1, 2, 3}, letting
duj ∧ duk = det of the 2×2 matrix with rows (∂Pj/∂x, ∂Pj/∂y), (∂Pk/∂x, ∂Pk/∂y),
we see ⟨f ◦ P, (∂P/∂x) × (∂P/∂y)⟩ = f1 du2 ∧ du3 + f2 du3 ∧ du1 + f3 du1 ∧ du2. Also, let S = P(X). Then the surface integral may be written as
∫_P f = ∫_S (f1 du2 ∧ du3 + f2 du3 ∧ du1 + f3 du1 ∧ du2).
Often, duj ∧ duk is written simply as duj duk (but it should be noted that this does not mean a simple double integral as in Fubini’s theorem). Since the calculation of the surface integral does not involve the Chain rule, it is not essential to use disjoint sets of variables for f and P : we may write the variables of f as x, y, z also. Then the notation for the surface integral becomes
∫_P f = ∫_S (f1 dydz + f2 dzdx + f3 dxdy).

Exercise-47: Let f : R3 → R3 be f(x, y, z) = (x, y, 0), and S ⊂ R3 be the upper half of the sphere with radius r > 0 centered at the origin. Compute the surface integral ∫_S f with respect to the outward unit normal to S by considering the following parametrizations:
(i) P : [0, 2π] × [0, π/2] → R3, P(x, y) = (r cos x sin y, r sin x sin y, r cos y).
(ii) P : B(0, r) ⊂ R2 → R3, P(x, y) = (x, y, √(r² − x² − y²)).
[Hint: (i) We know (∂P/∂x)(x, y) × (∂P/∂y)(x, y) = −r sin y P(x, y), which is an inward normal. Let X = [0, 2π] × [0, π/2]. Then ∫_S f = −∫_X ⟨f ◦ P, (∂P/∂x) × (∂P/∂y)⟩
= −∫_X ⟨(r cos x sin y, r sin x sin y, 0), −r sin y (r cos x sin y, r sin x sin y, r cos y)⟩
= ∫_0^{π/2} ∫_0^{2π} r³ sin³y dx dy = 2πr³ ∫_0^{π/2} sin³y dy = 2πr³ ∫_0^{π/2} (1 − cos²y) sin y dy = 2πr³ ∫_0^1 (1 − λ²) dλ = 4πr³/3 by putting λ = cos y.
(ii) Letting ϕ(x, y) = √(r² − x² − y²) and using Exercise-44, we have (∂P/∂x) × (∂P/∂y) = (−∂ϕ/∂x, −∂ϕ/∂y, 1) = P/ϕ, which is the outward normal. Let X = B(0, r) ⊂ R2. Then ∫_S f = ∫_X ⟨f ◦ P, (∂P/∂x) × (∂P/∂y)⟩ = ∫_X ⟨(x, y, 0), (x/√(r² − x² − y²), y/√(r² − x² − y²), 1)⟩ = ∫_X (x² + y²)/√(r² − x² − y²). Using the polar coordinates (x, y) = (t cos θ, t sin θ) and the Change of variable theorem, this integral is equal to ∫_0^r ∫_0^{2π} (t² × t)/√(r² − t²) dθ dt = ∫_0^r 2πt³/√(r² − t²) dt = ∫_0^r 2π(r² − λ²) dλ = 4πr³/3 by putting λ = √(r² − t²) (then dλ = −t dt/√(r² − t²) and t² = r² − λ²).]
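The value 4πr³/3 from hint (i) can be reproduced with a Riemann sum of the integrand r³ sin³y over [0, 2π] × [0, π/2]; r = 1.25 below is an arbitrary test value, and the function name is illustrative:

```python
import math

# Numerical companion to Exercise-47(i): outward flux of
# f(x, y, z) = (x, y, 0) through the upper hemisphere of radius r,
# using the reduced integrand r^3 sin^3(y) from the hint.

def hemisphere_flux(r, nx=200, ny=200):
    total = 0.0
    dx = 2 * math.pi / nx
    dy = (math.pi / 2) / ny
    for i in range(nx):
        for j in range(ny):
            y = (j + 0.5) * dy
            total += r**3 * math.sin(y)**3 * dx * dy
    return total

r = 1.25
flux = hemisphere_flux(r)
expected = 4 * math.pi * r**3 / 3
```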
[144] Let U ⊂ R3 be open, f : U → R3 be continuous, and P : X → U and P̃ : X̃ → U be parametric surfaces. Suppose there is a C1-diffeomorphism g : X → X̃ with P = P̃ ◦ g (if necessary assume g is defined in a neighborhood of X).
(i) If det(Jg(a)) > 0 for every a ∈ X, then ∫_P f = ∫_{P̃} f .
(ii) If det(Jg(a)) < 0 for every a ∈ X, then ∫_P f = −∫_{P̃} f .
Proof. If a ∈ X and b = g(a) ∈ X̃, then (∂P/∂x)(a) × (∂P/∂y)(a) = det(Jg(a)) ((∂P̃/∂x)(b) × (∂P̃/∂y)(b)) by Exercise-43(ii). Let h : X̃ → R be h(b) = ⟨(f ◦ P̃)(b), (∂P̃/∂x)(b) × (∂P̃/∂y)(b)⟩. Then by the Change of variable theorem and the initial observation, we see that ∫_{P̃} f = ∫_{X̃} h = ∫_X (h ◦ g)|det(Jg(·))| = ±∫_X ⟨f ◦ P, (∂P/∂x) × (∂P/∂y)⟩ = ±∫_P f , where we have the plus sign if det(Jg(·)) > 0 and the minus sign if det(Jg(·)) < 0. 

15. Divergence and curl

We will introduce the notions of divergence and curl for a vector field: divergence measures
expansion/compression (positive divergence indicates expansion and negative divergence indicates
compression), and curl measures the circulation density (the direction of the curl vector indicates
the axis around which maximal rotation happens and the magnitude of the curl vector measures
the speed of rotation).

Definition: (i) Let U ⊂ Rn be open and f = (f1, . . . , fn) : U → Rn be a C1-function (or just assume that all the first order partial derivatives exist). Then the divergence of f is the scalar-valued function divf : U → R defined as divf = ⟨∇, f⟩ = ∇ · f = Σ_{i=1}^n ∂fi/∂xi. Note that if f is a Ck-function, then divf is a Ck−1-function.
(ii) Let U ⊂ R3 be open and f = (f1, f2, f3) : U → R3 be a C1-function (or just assume that all the first order partial derivatives exist). Then the curl of f is the vector-valued function curlf : U → R3 defined as curlf = ∇ × f = det of the matrix with rows (e1, e2, e3), (∂/∂x, ∂/∂y, ∂/∂z), (f1, f2, f3) = (∂f3/∂y − ∂f2/∂z, ∂f1/∂z − ∂f3/∂x, ∂f2/∂x − ∂f1/∂y). Note that if f is a Ck-function, then curlf is a Ck−1-function. Also observe that if a ∈ U and Jf(a) is symmetric, then (curlf)(a) = 0.
(iii) Let U ⊂ Rn be open. If f : U → R is a C2-function, then the Laplacian ∇²f of f is the function ∇²f : U → R defined as ∇²f = ⟨∇, ∇f⟩ = div(∇f) = Σ_{i=1}^n ∂²f/∂xi². If ∇²f ≡ 0, then f is said to be a harmonic function. For example, the real part and imaginary part of a holomorphic function from C to C are known to be harmonic. If f = (f1, . . . , fn) : U → Rn is a C2-function, then we define ∇²f = (∇²f1, . . . , ∇²fn).

Remark: [Meaning of divergence in R2] Let U ⊂ R2 be open, f = (f1, f2) : U → R2 be a C1-function, and (a, b) ∈ U . Consider a small square A ⊂ U centered at (a, b) with side-length ε > 0. The flow out of A through an edge of A is approximately equal to
⟨f(midpoint of the edge), outward unit normal of the edge⟩ × length of the edge. Hence,
(net amount of flow out of A)/area(A)
∼ (1/ε²)[⟨f(a, b − ε/2), −e2⟩ε + ⟨f(a + ε/2, b), e1⟩ε + ⟨f(a, b + ε/2), e2⟩ε + ⟨f(a − ε/2, b), −e1⟩ε]
= [f1(a + ε/2, b) − f1(a − ε/2, b)]/ε + [f2(a, b + ε/2) − f2(a, b − ε/2)]/ε → (∂f1/∂x)(a, b) + (∂f2/∂y)(a, b)
= (divf)(a, b) as ε → 0. A similar explanation can be given in higher dimensions.

Remark: [Meaning of curl] Let U ⊂ R3 be open, f = (f1, f2, f3) : U → R3 be a C1-function, and consider a ∈ U . By definition, (curlf)(a) = ((∂f3/∂y)(a) − (∂f2/∂z)(a), (∂f1/∂z)(a) − (∂f3/∂x)(a), (∂f2/∂x)(a) − (∂f1/∂y)(a)). The third coordinate (∂f2/∂x)(a) − (∂f1/∂y)(a) gives the circulation density of f at a in the plane passing through a and parallel to the xy-plane (equivalently, around the line passing through a and parallel to the z-axis). Similar explanations can be given for the first and second coordinates of (curlf)(a).

Notation: Let U ⊂ Rn be open, and f, g : U → Rn be functions. Write f = (f1, . . . , fn) and g = (g1, . . . , gn). Then ⟨f, g⟩ : U → R is defined as ⟨f, g⟩(a) = Σ_{i=1}^n fi(a)gi(a). If n = 3, then f × g : U → R3 is defined as (f × g)(a) = (f(a)) × (g(a)).

Exercise-48: Let U ⊂ R3 be open and f, g, h : U → R3 be functions. Then,


(i) f × (g × h) = ⟨f, h⟩g − ⟨f, g⟩h by Exercise-42(x).
(ii) (f × g) × h = −(h × (f × g)) = ⟨h, f ⟩g − ⟨h, g⟩f = ⟨f, h⟩g − ⟨g, h⟩f by (i).

Warning: Let U ⊂ Rn be open, f : U → Rn be a C1-function. Then ⟨∇, f⟩ ̸= ⟨f, ∇⟩: the right hand side is the partial differential operator Σ_{i=1}^n fi ∂/∂xi.
[145] Let U ⊂ Rn be open. (i) Let f, g : U → Rn be C 1 -functions, and c1 , c2 ∈ R.
Then div(c1 f + c2 g) = c1 divf + c2 divg. If n = 3, then curl(c1 f + c2 g) = c1 curlf + c2 curlg.
(ii) Let f : U → Rn and ϕ : U → R be C 1 -functions. Then
div(ϕf ) = ⟨∇, ϕf ⟩ = ⟨∇ϕ, f ⟩ + ϕ⟨∇, f ⟩ = ⟨∇ϕ, f ⟩ + ϕ divf . If n = 3, then
curl (ϕf ) = ∇ × (ϕf ) = ∇ϕ × f + ϕ(∇ × f ) = ∇ϕ × f + ϕ curlf .
(iii) Assume n = 3, and let f, g : U → R3 be C 1 -functions. Then
div(f × g) = ⟨∇, f × g⟩ = ⟨g, ∇ × f ⟩ − ⟨f, ∇ × g⟩ = ⟨g, curlf ⟩ − ⟨f, curlg⟩, and
curl(f × g) = ∇ × (f × g) = ⟨g, ∇⟩f − ⟨f, ∇⟩g + ⟨∇, g⟩f − ⟨∇, f ⟩g
= ⟨g, ∇⟩f − ⟨f, ∇⟩g + (divg)f − (divf )g.

Proof. The verifications are left to the student. 

[146] Let U ⊂ R3 be open, and f : U → R3 be a C 1 -function.


(i) If f = ∇F for some F : U → R, then curlf ≡ 0. That is, curl(grad) ≡ 0.
(ii) If f = curlg for some C 2 -function g : U → R3 , then divf ≡ 0. That is, div(curl) ≡ 0.
(iii) If f = curlg for some C 2 -function g : U → R3 , then curlf = curl(curlg) = ∇(divg) − ∇2 g.

Proof. (i) Suppose f = ∇F , and consider a ∈ U . Then Jf (a) is equal to the transpose of the
Hessian matrix HF (a). But F is a C 2 -function (since f is C 1 ), and hence HF (a) is symmetric by
[110]. Thus Jf (a) is symmetric, which implies (curlf )(a) = 0 by the definition of curlf .

(ii) Use the equality of second order mixed partial derivatives of g given by [110].

(iii) The proof is similar to that of Exercise-42(x). 
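[146](i) and [146](ii) can be illustrated with nested central finite differences on arbitrary smooth test fields (F and g below are not from the notes; the step H and sample point are arbitrary):

```python
import math

# Finite-difference illustration of [146](i) curl(grad F) = 0 and
# [146](ii) div(curl g) = 0, on arbitrary smooth test fields.

H = 1e-4

def partial(func, i, p):
    q1, q2 = list(p), list(p)
    q1[i] += H
    q2[i] -= H
    return (func(q1) - func(q2)) / (2 * H)

def grad(F):
    return lambda p: tuple(partial(F, i, p) for i in range(3))

def curl(f):
    def c(p):
        d = lambda i, j: partial(lambda q, i=i: f(q)[i], j, p)  # d f_i / d x_j
        return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))
    return c

def div(f, p):
    return sum(partial(lambda q, i=i: f(q)[i], i, p) for i in range(3))

F = lambda p: math.sin(p[0]) * p[1] + p[2]**2             # test scalar field
g = lambda p: (p[1] * p[2], math.cos(p[0]), p[0] * p[1])  # test vector field

p0 = [0.4, 1.1, -0.6]
cg = curl(grad(F))(p0)   # expect approximately (0, 0, 0)
dc = div(curl(g), p0)    # expect approximately 0
```

The symmetric difference formulas make the mixed partials cancel exactly, mirroring the role of [110] in the proofs.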



Remark: (i) Compare [146](i) with [139](i) and [139](ii). Another way to understand [146](i) is: if U is connected and f = ∇F , then ∫_α f = 0 by [138] for every piecewise C1 closed path α in U , and therefore the ‘circulation density’ of f is zero everywhere in U ; so curlf ≡ 0.
(ii) At a formal level, [146](ii) says ⟨∇, ∇ × g⟩ = 0, which is similar to the fact ⟨u, u × v⟩ = 0. Further intuition about [146](ii) is given by Gauss’ divergence theorem [149] (see Exercise-51).

Example: (i) Let f : R3 → R3 be f(x, y, z) = (x, y, z). This represents a flow originating from (0, 0, 0) and spreading outwards with increasing speed. There is no rotation involved. We see that divf ≡ 3 and curlf ≡ 0 ∈ R3.
(ii) Let f : R3 → R3 be f(x, y, z) = (−y, x, 0), which is a rotation around the z-axis. There is neither expansion nor compression. We see that divf ≡ 0 and (curlf)(x, y, z) = (0, 0, 2).
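Both examples can be confirmed with central finite differences (an illustration; the step H and the sample point are arbitrary choices):

```python
# Finite-difference check of the two examples: for f(x, y, z) = (x, y, z)
# we expect div f = 3 and curl f = 0; for f(x, y, z) = (-y, x, 0) we
# expect div f = 0 and curl f = (0, 0, 2).

H = 1e-5

def partial(func, i, p):
    q1, q2 = list(p), list(p)
    q1[i] += H
    q2[i] -= H
    return (func(q1) - func(q2)) / (2 * H)

def div(f, p):
    return sum(partial(lambda q, i=i: f(q)[i], i, p) for i in range(3))

def curl(f, p):
    d = lambda i, j: partial(lambda q, i=i: f(q)[i], j, p)  # d f_i / d x_j
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

radial = lambda p: (p[0], p[1], p[2])
rotation = lambda p: (-p[1], p[0], 0.0)

p0 = [0.3, -1.2, 0.8]
div_radial = div(radial, p0)     # expect ~3
curl_radial = curl(radial, p0)   # expect ~(0, 0, 0)
div_rot = div(rotation, p0)      # expect ~0
curl_rot = curl(rotation, p0)    # expect ~(0, 0, 2)
```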

Exercise-49: Let U ⊂ R3 be open.


(i) If F : U → R is a harmonic C 2 -function and f = ∇F , then divf ≡ 0 ∈ R and curlf ≡ 0 ∈ R3 .
(ii) If f, g : U → R3 are C 2 -functions with divf = divg and curlf = curlg, then ∇2 (f − g) ≡ 0 ∈ R3 .
(iii) Assume U is connected. Then there is a non-constant C 2 -function f : U → R3 with divf ≡
0 ∈ R and curlf ≡ 0 ∈ R3 .
[Hint: (i) divf = ∇2 F ≡ 0 since F is harmonic. And [146](i) gives curlf = 0. (ii) Let h = f − g,
and note by [146](iii) that curl(curlh) = ∇(divh)−∇2 h. (iii) Let F : R3 → R be F (x, y, z) = x2 −y 2
which is (the real part of a holomorphic function and) harmonic. Let f = ∇F . Then divf ≡ 0 ∈ R
and curlf ≡ 0 ∈ R3 by (i), but f is not a constant: f (x, y, z) = (2x, −2y, 0).]

The converse parts of [146](i) and [146](ii) are true in certain special cases:

[147] Let U ⊂ R3 be open and f : U → R3 be a C 1 -function.


(i) Assume U is convex7. Then there is F : U → R with ∇F = f ⇔ curlf ≡ 0 ∈ R3 .
(ii) Assume U is the interior of a 3-box. Then there is g : U → R with curlg = f ⇔ divf ≡ 0 ∈ R.

Proof. (i) The implication ⇒ is given by [146](i). And the reverse implication follows from [140]
because Jf (a) is symmetric for every a ∈ U if curlf ≡ 0.

(ii) The implication ⇒ is given by [146](ii). For the reverse implication, see Theorem 12.5 of
Apostol, Calculus-II (left as a reading assignment). 

16. Stokes’ theorem

The result analogous to Green’s theorem in dimension 3 is called Stokes’ theorem. Roughly
speaking, it says that if P : X ⊂ R2 → R3 is a parametric surface, then the net amount of rotation
7 or more generally, an open set without ‘holes’.
of a flow tangential to P (X) is equal to the net amount of anticlockwise flow along the boundary
of P (X). A technical point: if necessary assume P is defined in a neighborhood of X.

[148] [Stokes’ theorem] Let U ⊂ R3 be open, f : U → R3 be a C1-function, P : X ⊂ R2 → R3 be an injective C2 parametric surface, and let α : [c, d] → R2 be a piecewise C1 simple closed path parametrizing ∂X in the anticlockwise direction. Then ∫_P curlf = ∫_{P◦α} f.
Proof. Write f = (f1, f2, f3). Then f = (f1, 0, 0) + (0, f2, 0) + (0, 0, f3). Since the curl and the integrals (the surface integral ∫_P and the line integral ∫_{P◦α}) are linear operators, it suffices to prove the result for each of the three functions (f1, 0, 0), (0, f2, 0), (0, 0, f3) separately. We will prove the result for the function (0, 0, f3); the proofs for the other two functions are similar. So assume f = (0, 0, f3) for the rest of the proof. We will use Green’s theorem in the proof.

Write P = (P1 , P2 , P3 ) and α = (α1 , α2 ). We will denote the members of U as u = (u1 , u2 , u3 ).
Since f = (0, 0, f3), we see that ∫_{P◦α} f = ∫_c^d ⟨(f ◦ P ◦ α)(t), (P ◦ α)′(t)⟩ dt
= ∫_c^d (f3 ◦ P ◦ α)(t) [(∂P3/∂x)(α(t)) α1′(t) + (∂P3/∂y)(α(t)) α2′(t)] dt
= ∫_c^d [g1(α(t)) α1′(t) + g2(α(t)) α2′(t)] dt = ∫_c^d ⟨g(α(t)), α′(t)⟩ dt = ∫_α g = ∫_{∂X} g,
where g : X → R2 is defined as g(a) = (g1(a), g2(a)) = (f3(P(a)) (∂P3/∂x)(a), f3(P(a)) (∂P3/∂y)(a)).
Note that g is a C1-function because f is C1 and P is C2. Applying Green’s theorem to g, we conclude that ∫_{P◦α} f = ∫_{∂X} g = ∫_X (∂g2/∂x − ∂g1/∂y). (*)
For a ∈ X and u = P(a) ∈ U, observe by the definition of g and the Chain rule that
(∂g2/∂x)(a) = Σ_{i=1}^{3} (∂f3/∂ui)(u) (∂Pi/∂x)(a) (∂P3/∂y)(a) + f3(u) (∂^2 P3/∂x∂y)(a), and
(∂g1/∂y)(a) = Σ_{i=1}^{3} (∂f3/∂ui)(u) (∂P3/∂x)(a) (∂Pi/∂y)(a) + f3(u) (∂^2 P3/∂y∂x)(a).
The final terms are equal by the equality of mixed partial derivatives of P3 given by [110] since P3 is a C2-function. Moreover, the terms corresponding to i = 3 are also equal. Therefore,
(∂g2/∂x − ∂g1/∂y)(a) = (∂f3/∂u1)(u) ((∂P1/∂x)(∂P3/∂y) − (∂P3/∂x)(∂P1/∂y))(a) + (∂f3/∂u2)(u) ((∂P2/∂x)(∂P3/∂y) − (∂P3/∂x)(∂P2/∂y))(a). (**)
Next observe that curlf = det of the matrix with rows (e1, e2, e3), (∂/∂u1, ∂/∂u2, ∂/∂u3), (0, 0, f3), which equals (∂f3/∂u2, −∂f3/∂u1, 0), and
∂P/∂x × ∂P/∂y = ((∂P2/∂x)(∂P3/∂y) − (∂P3/∂x)(∂P2/∂y), (∂P3/∂x)(∂P1/∂y) − (∂P1/∂x)(∂P3/∂y), (∂P1/∂x)(∂P2/∂y) − (∂P2/∂x)(∂P1/∂y)).
Therefore, for a ∈ X and u = P(a), we may deduce using (**) that
⟨(curlf)(P(a)), (∂P/∂x)(a) × (∂P/∂y)(a)⟩ = (∂g2/∂x − ∂g1/∂y)(a).
Hence (*) implies that ∫_P curlf = ∫_X ⟨(curlf) ◦ P, ∂P/∂x × ∂P/∂y⟩ = ∫_X (∂g2/∂x − ∂g1/∂y) = ∫_{P◦α} f. □
Remark: (i) Stokes’ theorem can be extended to ‘surfaces’ obtained by ‘pasting together’ finitely
many (images of) parametric surfaces provided on any common boundary of two distinct parametric
surfaces, the line integrals are in opposite directions and cancel each other. On the other hand,
Stokes’ theorem cannot be extended to ‘non-orientable’ surfaces such as the Möbius band (which has only ‘one side’). For more details on this topic, see Section 12.8 of Apostol, Calculus-II.
(ii) Stokes’ theorem provides an insight into the identity curl(grad) ≡ 0 as follows. Suppose f = ∇F. Then by taking β = P ◦ α in [148], we see ∫_P curlf = ∫_β f = F(β(d)) − F(β(c)) = 0 by [137](i) since β is a closed path. As P is an arbitrary parametric surface, we may deduce that curlf ≡ 0.
(iii) Let U ⊂ R3 be open, f : U → R3 be a C1-function, P : X ⊂ R2 → R3 be an injective C2 parametric surface, and let α : [a, b] → R2 be a piecewise C1 simple closed path parametrizing ∂X in the anticlockwise direction. If f = curlg, then the surface integral ∫_P f simplifies to the line integral ∫_{P◦α} g by applying [148] to g.
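The equality in [148] can be illustrated numerically (a sketch, not from the notes; the field f(x, y, z) = (−y, x, 0) with hand-computed curlf = (0, 0, 2), the paraboloid cap P(x, y) = (x, y, x^2 + y^2) over the closed unit disk, and the grid sizes are arbitrary choices). Both integrals come out ≈ 2π.

```python
import math

f = lambda u: (-u[1], u[0], 0.0)        # sample field; curlf = (0, 0, 2) by hand
P = lambda x, y: (x, y, x*x + y*y)      # paraboloid cap over the closed unit disk X
curlf = lambda u: (0.0, 0.0, 2.0)

# surface integral ∫_P curlf = ∫_X ⟨(curlf)∘P, ∂P/∂x × ∂P/∂y⟩, in polar coordinates
n = 200
surf = 0.0
for i in range(n):
    r = (i + 0.5) / n
    for j in range(n):
        th = 2 * math.pi * (j + 0.5) / n
        x, y = r * math.cos(th), r * math.sin(th)
        normal = (-2*x, -2*y, 1.0)      # ∂P/∂x × ∂P/∂y for this P, by hand
        c = curlf(P(x, y))
        dot = c[0]*normal[0] + c[1]*normal[1] + c[2]*normal[2]
        surf += dot * r * (1.0 / n) * (2 * math.pi / n)

# line integral ∫_{P∘α} f with α(t) = (cos t, sin t), t ∈ [0, 2π]
k = 2000
line = 0.0
for j in range(k):
    t = 2 * math.pi * (j + 0.5) / k
    beta = P(math.cos(t), math.sin(t))        # β = P∘α, here (cos t, sin t, 1)
    dbeta = (-math.sin(t), math.cos(t), 0.0)  # β′(t), since z ≡ 1 on the boundary circle
    fb = f(beta)
    line += (fb[0]*dbeta[0] + fb[1]*dbeta[1] + fb[2]*dbeta[2]) * (2 * math.pi / k)

print(surf, line)   # both ≈ 2π ≈ 6.2832
```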
Exercise-50: (i) Let r > 0 and S ⊂ R3 be the sphere with radius r centered at the origin, with upper half S1 and lower half S2. Let X = {(x, y) ∈ R2 : x^2 + y^2 ≤ r^2}, and P, Q : X → R3 be P(x, y) = (x, y, ϕ(x, y)) and Q(x, y) = (x, −y, −ϕ(x, y)), where ϕ(x, y) = √(r^2 − x^2 − y^2). Then P parametrizes S1, Q parametrizes S2, and the common boundary ∂S1 ∩ ∂S2 (which is the equator of S) is parametrized by P and Q in opposite directions (because of the minus sign for y in the expression for Q). Moreover, ∂P/∂x × ∂P/∂y = (−∂ϕ/∂x, −∂ϕ/∂y, 1) and ∂Q/∂x × ∂Q/∂y = (−∂ϕ/∂x, ∂ϕ/∂y, −1) give outward normals to S1 and S2 respectively (in the first case, the z-coordinate is positive, which indicates an upward normal; in the second case, the z-coordinate is negative, which indicates a downward normal). Hence if U ⊂ R3 is an open neighborhood of S and f : U → R3 is a C1-function, then ∫_S f = ∫_P f + ∫_Q f.
(ii) Let U ⊂ R3 be open, f : U → R3 be a C1-function, and suppose f = curlg for some g : U → R3. Then ∫_S f = 0 for any sphere S ⊂ U when the integral ∫_S f is considered with respect to the outward unit normal to S.
[Hint: (ii) For simplicity assume S = {(x, y, z) ∈ R3 : x^2 + y^2 + z^2 = r^2}. Write S = S1 ∪ S2, where S1 is the upper half and S2 is the lower half of S. Let P, Q be as in part (i). Since the common boundary ∂S1 ∩ ∂S2 is parametrized in opposite directions by P and Q, and since ∂P/∂x × ∂P/∂y and ∂Q/∂x × ∂Q/∂y give outward normals to S1 and S2 respectively, we may extend Stokes’ theorem to S for the function g and conclude ∫_S f = ∫_{S1} f + ∫_{S2} f = ∫_P curlg + ∫_Q curlg = ∫_{∂S1} g + ∫_{∂S2} g = 0.]
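Part (ii) can be illustrated numerically (a sketch, not from the notes; the sample field g(x, y, z) = (z^2, x^2, y^2) with hand-computed curlg = (2y, 2z, 2x) and the grid size are arbitrary choices). The flux of curlg through the unit sphere comes out ≈ 0.

```python
import math

g = lambda x, y, z: (z*z, x*x, y*y)        # sample C^2 field (not from the notes)
curlg = lambda x, y, z: (2*y, 2*z, 2*x)    # curl of g, computed by hand

def sphere_flux(h, n=300):
    # midpoint-rule flux of the field h through the unit sphere, outward normal;
    # the outward unit normal at a point p of the unit sphere is p itself
    total = 0.0
    for i in range(n):
        phi = math.pi * (i + 0.5) / n
        for j in range(2 * n):
            th = math.pi * (j + 0.5) / n
            nx = math.sin(phi) * math.cos(th)
            ny = math.sin(phi) * math.sin(th)
            nz = math.cos(phi)
            v = h(nx, ny, nz)
            total += (v[0]*nx + v[1]*ny + v[2]*nz) * math.sin(phi) * (math.pi / n) ** 2
    return total

print(sphere_flux(curlg))   # ≈ 0
```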
Let U ⊂ R3 be open, and f : U → R3 be a C1-function. By [146](ii), a necessary condition for the existence of a C2-function g : U → R3 with f = curlg is divf ≡ 0. The following example shows that this condition is not sufficient.
Example: Let U = R3 \ {0}, and f : U → R3 be f(x, y, z) = (x, y, z)/(x^2 + y^2 + z^2)^{3/2}. We see that
∂f1/∂x = (y^2 + z^2 − 2x^2)/(x^2 + y^2 + z^2)^{5/2}, ∂f2/∂y = (z^2 + x^2 − 2y^2)/(x^2 + y^2 + z^2)^{5/2}, and ∂f3/∂z = (x^2 + y^2 − 2z^2)/(x^2 + y^2 + z^2)^{5/2}.
Hence f is a C1-function, and divf = ∂f1/∂x + ∂f2/∂y + ∂f3/∂z ≡ 0.
To prove f ≠ curlg for any g, it suffices to show in view of Exercise-50(ii) that ∫_S f ≠ 0 for the unit sphere S ⊂ R3, where the integral ∫_S f is taken with respect to the outward unit normal of S. Let S1 and S2 be respectively the upper and lower halves of the unit sphere S. Let X = {(x, y) ∈ R2 : x^2 + y^2 ≤ 1}, and P, Q : X → U be P(x, y) = (x, y, √(1 − x^2 − y^2)) and Q(x, y) = (x, −y, −√(1 − x^2 − y^2)). Then P parametrizes S1, and Q parametrizes S2. Using the polar coordinates (x, y) = (r cos θ, r sin θ) and the Change of variable theorem (where the determinant of the Jacobian is r), we see that
∫_P f = ∫_X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ = ∫_X ⟨(x, y, √(1 − x^2 − y^2)), (x/√(1 − x^2 − y^2), y/√(1 − x^2 − y^2), 1)⟩
= ∫_X 1/√(1 − x^2 − y^2) = ∫_0^1 ∫_0^{2π} (r/√(1 − r^2)) dθ dr = 2π ∫_0^1 (r/√(1 − r^2)) dr = π ∫_0^1 t^{−1/2} dt = 2π (by putting t = 1 − r^2).
Similarly, ∫_Q f = ∫_X ⟨f ◦ Q, ∂Q/∂x × ∂Q/∂y⟩
= ∫_X ⟨(x, −y, −√(1 − x^2 − y^2)), (x/√(1 − x^2 − y^2), −y/√(1 − x^2 − y^2), −1)⟩ = ∫_X 1/√(1 − x^2 − y^2) = 2π.
Both ∂P/∂x × ∂P/∂y and ∂Q/∂x × ∂Q/∂y give outward normals to the unit sphere; see Exercise-50(i). Hence ∫_S f = ∫_{S1} f + ∫_{S2} f = ∫_P f + ∫_Q f = 4π ≠ 0. Thus we conclude by Exercise-50(ii) that f ≠ curlg for any C2-function g : U → R3. (Remark: In this example, the failure of the existence of any g with curlg = f is essentially due to the fact that the open set U has a ‘hole’.)
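The value ∫_S f = 4π can be cross-checked numerically (a sketch; the spherical parametrization and grid size below are arbitrary choices, not part of the notes).

```python
import math

def f(u):
    # f(u) = u/|u|^3 on R^3 \ {0}
    r = math.sqrt(u[0]**2 + u[1]**2 + u[2]**2)
    return (u[0] / r**3, u[1] / r**3, u[2] / r**3)

def flux(n=400):
    # flux of f through the unit sphere with outward unit normal,
    # using S(θ, φ) = (sinφ cosθ, sinφ sinθ, cosφ), area element sinφ dφ dθ;
    # the outward unit normal at a point p of the unit sphere is p itself
    total = 0.0
    for i in range(n):
        phi = math.pi * (i + 0.5) / n
        for j in range(2 * n):
            th = math.pi * (j + 0.5) / n
            p = (math.sin(phi) * math.cos(th), math.sin(phi) * math.sin(th), math.cos(phi))
            v = f(p)
            total += (v[0]*p[0] + v[1]*p[1] + v[2]*p[2]) * math.sin(phi) * (math.pi / n) ** 2
    return total

val = flux()
print(val)   # ≈ 4π ≈ 12.566, not 0, so f cannot be a curl
```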
17. Gauss’ divergence theorem
Recall the interpretation of divergence as measuring expansion/compression of a flow, and think
of compression as negative expansion. Then the divergence theorem of Gauss ([149] below) says
roughly that the net amount of expansion of a flow in a 3-dimensional solid region V is equal to
the net amount of flow out of V through the ‘surface’ ∂V . We will present Gauss’ theorem only
for a special type of solid region (similar to the ‘elementary region’ in Green’s theorem).
Remark: In the theorems of Green, Stokes, and Gauss, we have an equality between an integral over an n-dimensional region (n = 2 or n = 3) and an integral over the boundary of the region. In this sense, all three theorems can be thought of as generalizations of the Fundamental theorem of calculus (which says ∫_a^b f(x) dx = F(b) − F(a) if F′ = f and f is continuous).
Definition: We say V ⊂ R3 is an elementary solid if there are
(i) compact connected sets X, X̃, X̂ ⊂ R2 bounded by piecewise C1 simple closed paths, and
(ii) C1-functions ϕ, ψ : X → R, ϕ̃, ψ̃ : X̃ → R, and ϕ̂, ψ̂ : X̂ → R with ϕ < ψ on int(X), ϕ̃ < ψ̃ on int(X̃), and ϕ̂ < ψ̂ on int(X̂), such that
V = {(x, y, z) ∈ R3 : (x, y) ∈ X and ϕ(x, y) ≤ z ≤ ψ(x, y)}
= {(x, y, z) ∈ R3 : (x, z) ∈ X̃ and ϕ̃(x, z) ≤ y ≤ ψ̃(x, z)}
= {(x, y, z) ∈ R3 : (y, z) ∈ X̂ and ϕ̂(y, z) ≤ x ≤ ψ̂(y, z)}.
Note that when this holds, the graphs of ϕ, ψ, ϕ̃, ψ̃, ϕ̂, ψ̂ are (images of) parametric surfaces, and V is the region between pairs of such surfaces. Examples of elementary solids are 3-boxes, solid spheres, etc.
[149] [Gauss’ divergence theorem] Let U ⊂ R3 be open, f : U → R3 be a C1-function, and V ⊂ U be an elementary solid. Then ∫_V divf = ∫_{∂V} f, where the ‘surface integral’ ∫_{∂V} f is taken with respect to the outward unit normal to ∂V.
Proof. Write f = (f1 , f2 , f3 ). Then f = (f1 , 0, 0) + (0, f2 , 0) + (0, 0, f3 ). Since div and the integrals
are linear operators, it is enough to prove the result for each of the three functions (f1 , 0, 0),
(0, f2, 0), (0, 0, f3) separately. We will prove the result for the function (0, 0, f3); the proofs for the other two functions are similar. So assume f = (0, 0, f3) for the rest of the proof.
As per the definition, the elementary solid V has three representations, out of which we choose the following: V = {(x, y, z) ∈ R3 : (x, y) ∈ X and ϕ(x, y) ≤ z ≤ ψ(x, y)}, where X ⊂ R2 is a compact connected set bounded by a piecewise C1 simple closed path, and ϕ, ψ : X → R are C1-functions with ϕ < ψ on int(X). Since f = (0, 0, f3), we get by Fubini’s theorem that
∫_V divf = ∫_V ∂f3/∂z = ∫_{(x,y)∈X} (∫_{ϕ(x,y)}^{ψ(x,y)} (∂f3/∂z) dz) d(x, y) = ∫_X (f3(x, y, ψ(x, y)) − f3(x, y, ϕ(x, y))). (*)
Let P, Q : X → U be P (x, y) = (x, y, ψ(x, y)) and Q(x, y) = (x, y, ϕ(x, y)). Then V is the region
between the images of the parametric surfaces P (upper part) and Q (lower part). We may write
∂V = P (X) ∪ S ∪ Q(X), where S is the part of ∂V between P (X) and Q(X). Note the following:
(i) ∂P/∂x × ∂P/∂y = (−∂ψ/∂x, −∂ψ/∂y, 1), which is an outward normal to the upper part P(X) of ∂V because the z-coordinate is positive (upward).
(ii) ∂Q/∂x × ∂Q/∂y = (−∂ϕ/∂x, −∂ϕ/∂y, 1), which is an inward normal to the lower part Q(X) of ∂V because the z-coordinate is positive (upward).
(iii) Any outward normal to the ‘middle part’ S of ∂V is parallel to the xy-plane and hence has z-coordinate zero. Consequently, ∫_S f = 0 because f = (0, 0, f3) by assumption.
By the above observations, the value of ∫_{∂V} f with respect to the outward unit normal is:
∫_{∂V} f = ∫_P f − ∫_Q f = ∫_X ⟨f ◦ P, ∂P/∂x × ∂P/∂y⟩ − ∫_X ⟨f ◦ Q, ∂Q/∂x × ∂Q/∂y⟩ = ∫_X ((f3 ◦ P) · 1) − ∫_X ((f3 ◦ Q) · 1)
= ∫_X (f3(x, y, ψ(x, y)) − f3(x, y, ϕ(x, y))) = ∫_V divf by (*). □
Remark: Gauss’ divergence theorem can be extended to more general 3-dimensional solid regions
which are formed by ‘pasting together’ finitely many elementary solids V1 , . . . , Vk provided on any
shared boundary of Vi and Vj (for i ̸= j), the outward unit normals are in opposite directions (so
that the respective parts of surface integrals over ∂Vi and ∂Vj cancel each other).
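A quick numeric sanity check of [149] on the unit ball, which is an elementary solid (a sketch; the sample field f(x, y, z) = (x^3, y^3, z^3) with hand-computed divf = 3(x^2 + y^2 + z^2) and the grid sizes are arbitrary choices, not part of the notes). Both sides come out ≈ 12π/5.

```python
import math

f = lambda x, y, z: (x**3, y**3, z**3)          # sample field (not from the notes)
divf = lambda x, y, z: 3 * (x*x + y*y + z*z)    # divergence, computed by hand

def ball_integral(n=60):
    # ∫_V divf over the unit ball, spherical coordinates, midpoint rule;
    # volume element is r^2 sinφ dr dφ dθ
    total = 0.0
    for i in range(n):
        r = (i + 0.5) / n
        for j in range(n):
            phi = math.pi * (j + 0.5) / n
            for k in range(2 * n):
                th = math.pi * (k + 0.5) / n
                x = r * math.sin(phi) * math.cos(th)
                y = r * math.sin(phi) * math.sin(th)
                z = r * math.cos(phi)
                total += divf(x, y, z) * r*r*math.sin(phi) * (1.0/n) * (math.pi/n)**2
    return total

def sphere_flux(n=200):
    # ∫_{∂V} f with outward unit normal; on the unit sphere the normal at p is p
    total = 0.0
    for j in range(n):
        phi = math.pi * (j + 0.5) / n
        for k in range(2 * n):
            th = math.pi * (k + 0.5) / n
            p = (math.sin(phi)*math.cos(th), math.sin(phi)*math.sin(th), math.cos(phi))
            v = f(*p)
            total += (v[0]*p[0] + v[1]*p[1] + v[2]*p[2]) * math.sin(phi) * (math.pi/n)**2
    return total

vol, flx = ball_integral(), sphere_flux()
print(vol, flx)   # both ≈ 12π/5 ≈ 7.5398
```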
Exercise-51: Derive ‘div(curl) ≡ 0’ using Gauss’ divergence theorem. [Hint: Let U ⊂ R3 be open, f : U → R3 be a C1-function, and assume f = curlg for some g : U → R3. If (divf)(u) ≠ 0 for some u ∈ U, we may assume (divf)(u) > 0. Choose δ > 0 and a small solid sphere V ⊂ U centered at u such that divf ≥ δ in V. Then ∫_V divf ≥ δµ(V) > 0. Hence by [149], ∫_{∂V} f = ∫_V divf > 0. On the other hand, ∫_{∂V} f = 0 by Exercise-50(ii), a contradiction.]
[150] [A coordinate-free expression for ‘div’] Let U ⊂ R3 be open, f : U → R3 be a C1-function, and w ∈ U. Then (divf)(w) = lim_{r→0+} (1/volume(B(w, r))) ∫_{∂B(w,r)} f = lim_{r→0+} (3/(4πr^3)) ∫_{∂B(w,r)} f.
Proof. Let δ0 > 0 be such that B(w, δ0) ⊂ U, and let Mr = µ(B(w, r)) for r > 0. Consider ε > 0. We have to find δ ∈ (0, δ0) such that |(divf)(w) − (1/Mr) ∫_{∂B(w,r)} f| < ε for every r ∈ (0, δ). As f is a C1-function, divf is continuous, and so there is δ ∈ (0, δ0) such that |(divf)(u) − (divf)(w)| < ε/2 whenever u ∈ B(w, δ). Now consider any r ∈ (0, δ). We may write (divf)(w) as (1/Mr) ∫_{B(w,r)} (divf)(w). Moreover, ∫_{∂B(w,r)} f = ∫_{B(w,r)} divf by [149]. Hence |(divf)(w) − (1/Mr) ∫_{∂B(w,r)} f| ≤ (1/Mr) ∫_{B(w,r)} |(divf)(w) − (divf)(u)| du ≤ (1/Mr) ∫_{B(w,r)} (ε/2) = ε/2 < ε. □
Remark: Similarly, ‘curl’ is independent of coordinates: recall the explanation given earlier in terms
of circulation density.
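The limit in [150] can be illustrated numerically (a sketch; the sample field f(x, y, z) = (xy, yz, zx) with hand-computed divf = y + z + x, the point w, and the radius r below are arbitrary choices, not part of the notes).

```python
import math

f = lambda x, y, z: (x*y, y*z, z*x)    # sample field; divf = y + z + x by hand
w = (1.0, 2.0, 3.0)                    # so (divf)(w) = 6

def flux_over_volume(r, n=200):
    # (1/volume(B(w, r))) ∫_{∂B(w,r)} f with outward unit normal, midpoint rule
    total = 0.0
    for i in range(n):
        phi = math.pi * (i + 0.5) / n
        for j in range(2 * n):
            th = math.pi * (j + 0.5) / n
            nx = math.sin(phi) * math.cos(th)
            ny = math.sin(phi) * math.sin(th)
            nz = math.cos(phi)
            v = f(w[0] + r*nx, w[1] + r*ny, w[2] + r*nz)
            total += (v[0]*nx + v[1]*ny + v[2]*nz) * (r*r*math.sin(phi)) * (math.pi/n)**2
    return total / ((4.0/3.0) * math.pi * r**3)

val = flux_over_volume(0.05)
print(val)   # ≈ (divf)(w) = 6 already at this small radius
```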
*****