Professional Documents
Culture Documents
Multivariable Calculus
January 2018
MATH6060 Multivariable Calculus
The following is general information relating to MATH1060 for the academic year 2018/19.
Lecturer:
The purpose of the course is to introduce you to the techniques of multivariable calculus.
The module builds on the work in MATH1059 Calculus. The module covers four major
areas: multidimensional functions, partial differentials, multiple integrals and line integrals.
The ability to work with functions that depend on multiple inputs and or produce multidi-
mensional outputs is a vital skill irrespective of whether you choose to take later modules in
applied mathematics, operational research, pure mathematics or statistics.
i
ii MATH6060 MULTIVARIABLE CALCULUS
The course will last for all 12 weeks of the semester. Each week there will be
• three lectures,
• a problems class and
• the opportunity to attend two timetabled workshops.
The lecturer of the weekly problems class will go through problems related to the material
in the lectures.
Questions relating to the course which cannot be resolved in lectures or problems classes
may be raised in the workshops with the tutors assigned to them.
You are expected to attend ALL lectures AND the problems class to which you have
been assigned in your personal timetable.
Teaching Materials
The lecture notes contained in this book for this module are “skeletal notes”, i.e., they have
some of the diagrams, text and examples removed. This missing components will be given
to you to copy down as we work through the notes during the lectures. This is designed
to help you to concentrate on understanding key concepts, without the worry of trying to
write everything down. On the other hand, mathematics is a subject that is best learnt by
doing, so the skeletal nature of the notes will also give you experience of working through
example calculations with the lecturer. Naturally, you are encouraged to attend lectures.
The course materials can (only) be found on the University Blackboard system at:
http://blackboard.soton.ac.uk/
At this site you will find electronic copies of the lecture notes, handouts, homework, feedback
sheets (i.e., full worked solutions), past exam papers (with full solutions) as well as videos.
Communication
All course announcements will be made on Blackboard and emails will be issued only to
those enrolled on the site. You can also check your homework and class test grades in the
on-line grade-book there. Students should therefore check the site and their email boxes
regularly.
0.1. GENERAL INFORMATION iii
Assessment
The course has three elements of formal assessment. The final mark will be composed of
composed of:
• 70% from the final exam, which will be in the following format:
– closed book;
– no formula sheet;
– allow the use of the university-approved calculator only;
– answer all questions (4 question of 25 marks each);
– 2 hours in duration.
Homeworks
• What: In week n, the lecturer will announce a set of questions Q(n) based on the ma-
terial that has been covered in lectures. The questions will be listed on the MATH1060
Blackboard site. Only some of the questions will be marked. Feedback will be given
through the mechanism of handing back the marked homeworks, together with model
answers. You are strongly advised to look at the solutions on the feedback sheets as the
questions may be similar in style to those on the final exam. The homeworks can be
discussed, preferably (for logistical reasons) at workshops, or with the course lecturer.
• How: You should ensure that the homework you hand in each week is securely
stapled together. Elegant origami techniques or paper clips should not be used.
This is for your own benefit: with the number of students taking the course it is
extremely easy to lose loose work during the marking process. A stapler is provided
next to the hand-in trays.
• Who with: Whilst you are encouraged to discuss the course together for mutual
educational benefit, you are reminded of the existence of the plagiarism policy of
the University concerning submitted work. Details of the Academic Integrity policy,
regulations and penalties may be found here:
http://www.southampton.ac.uk/quality/assessment/academic_integrity.page
iv MATH6060 MULTIVARIABLE CALCULUS
Course Test
A class test (worth 10% of your final mark) will take place mid-way during the module.
The test will be multiple choice in format and closed book, with no formula sheet. Marks
will not be deducted for incorrect answers.
If a student does not attend the test (even if they know they will be absent beforehand),
a mark of 0 will be recorded by the module lecturer. The student will need to fill out a
special considerations form, saying why they were unable to attend the test. The relevant
special considerations panel will decide whether the circumstances raised by the student are
sufficient to warrant the test being discounted, and will (in liaison with the module lecturer)
decide the appropriate adjustment to be made to the module mark in these cases.
Feedback
• General written feedback (i.e., line-by line model solutions) will be supplied electroni-
cally on Blackboard for all homework problems.
• Specific written feedback may be given on your script by the markers of the homeworks.
• Verbal feedback may be obtained in person from tutors assigned to the weekly work-
shops.
• Written feedback on the performance of the cohort in the final exam will be posted to
Blackboard.
Quantifiers
∀ – for all
∃ – there exists
Terminology
Sets
R – real numbers √
(rational numbers and irrational numbers e.g. π = 3.14159 . . . , 2 = 1.41421 . . . )
√
C – complex numbers a + ib where a, b are real and i = −1
⊆ – is a subset of subset (is contained in, meaning one set is inside another)
– is a subset of proper subset (is strictly contained in, meaning the sets are not equal)
vi MATH6060 MULTIVARIABLE CALCULUS
In this module and throughout mathematics you will encounter numerous Greek letters.
Here is a table so that you know what they all are and how they are called.
A B Γ ∆ E Z
α β γ δ or ε ζ
Alpha Beta Gamma Delta Epsilon Zeta
H Θ I K Λ M
η θ ι κ λ µ
Eta Theta Iota Kappa Lambda Mu
N Ξ O Π P Σ
ν ξ o π ρ σ
Nu Xi Omicron Pi Rho Sigma
T Y Φ X Ψ Ω
τ υ φ or ϕ χ ψ ω
Tau Upsilon Phi Chi Psi Omega
Contents
1 Multivariable Functions 1
1.3 Contours . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
i
ii CONTENTS
1.8 Differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2 Multiple Integration 93
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Multivariable Functions
All your previous work has been on functions of one variable, for example
f (x) = ex + x2
f : x → y = f (x)
y y = f (x)
f (x)
x
O x
In this module we will generalise this to allow functions between finite real dimensional
(Euclidean) spaces Rn and Rm for integer n, m ≥ 1, where R is the set of all real numbers.
1
2 CHAPTER 1. MULTIVARIABLE FUNCTIONS
The set D is called the domain of F and the set of all points F (x1 , . . . , xn ) obtained from
the domain is called the range of F .
Example: Suppose p
f (x, y) = 4 − x2 − y 2 .
If we restrict ourselves to only real values of the square root, the domain D is the disk of
radius 2 centred at the origin:
D = {(x, y) ∈ R2 | x2 + y 2 6 4}.
x2 + y 2 4
D ) 4 x2 y2 0
x p
so if z = 4 x2 y2
) z2R
p
Figure 1.2: Domain of z = 4 − x2 − y 2
Example: Suppose
1
f (x, y) = .
x2 + y2
The function is well defined (i.e., has finite real values) for all points except (x, y) = (0, 0),
where the function has no finite value. Hence the domain D is
Example: If x is the radius of a right cylinder and y is its height, the volume is given
by a function F : (x, y) ∈ R2 → z ∈ R such that
z = πx2 y.
Clearly in the context of radius and height measurements, x and y are both here positive,
so the domain D is D = {(x, y) ∈ R2 | x, y ≥ 0}. However the formula for z is valid for
1.1. FUNCTIONS OF MORE THAN ONE VARIABLE 3
all real values of x and y, so, in a mathematical context, we are able to drop reference to a
subdomain of R2
y
t = 9⇡/2 15
10
t -10 -5
t=0 5 10
x
t=0 t = 9⇡/2
-5
|{z} -10
R R2
Figure 1.3: Plot of F : R → R2 curve with F (t) = (t cos(t), t sin(t)) (for part of the domain
of R).
Example 1.1.1. If you take a Newtonian classical view of the world (see MATH1057) you
may specify a particle at any point in time t by its spatial position in 3-D space (x, y, z)
relative to some origin and its associated momenta (= mass × velocity) in each of three
directions (px , py , pz ).
To simplify matters (and because humans can only visualise easily up to three dimen-
sions), many of the examples in this module will focus on R2 → R, or R3 → R.
4 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Explicit Function
More generally an explicit function is one where we have an explicit description of how to
calculate the value of f (x) given a value of the input x.
Note that sometimes this does not mean that there is an obvious formula. For example,
consider the greatest integer function, where we set bxc to be the largest integer that is less
than or equal to x. We have an explicit description for bxc given x from which we can
calculate the function but no formula (e.g., given an input x = 25.46, the output bxc is
obviously 25).
Implicit Function
A simple example comes from the equation of a circle in the plane with centre the origin
and radius 1, namely x2 +y 2 = 1. Here, we cannot solve for y as a function of x in a √way that
is valid for all appropriate values of x; specifically, if we solve for y, we get y = ± x2 − 1,
and so for each −1 < x < 1, there are two possible values of y and this violates the definition
of function as discussed (albeit briefly) above.
The same loose split of functions between implicit functions and explicit functions holds
for functions of more than one variable just as it does for functions of one variable.
Graphically we can represent y = f (x) by a set of points (x, f (x)) in the (x, y) plane.
y y = f (x)
f (x)
x
O x
By contrast z = f (x, y) can represent the height z of a surface above the (x, y) plane in
the (x, y, z) space
Figure 1.5: Plot of a multivariate function where the height of the function is z = f (x, y)
and is given by f (x, y) = sin(x) + sin(y)
f (x, y) = 1 − x − y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.
z=1 x y z
(0, 0, 1)
z=1 x z=1 y
(y = 0) (x = 0)
(1, 0, 0) (0, 1, 0)
x y
0=1 x y
(z = 0)
p
f (x, y) = 9 − x2 − y 2 ,
D = {(x, y) | x2 + y 2 6 9},
p
Now if we set z = f (x, y), then we have to plot the function z = 9 − x2 − y 2 . Squaring
both sides and rearranging, we have
x2 + y 2 + z 2 = 9.
This says that the square of the distance from the origin of all the points on this surface in
(x, y, z) space must be 9. Hence this is the equation of the surface of a sphere of radius 3.
p
Now reverting back to the actual equation z = 9 − x2 − y 2 , if we assume that the
square root function is positive only, then f (x, y) represents the surface of a hemisphere,
centred the origin, radius 3 above, the (x, y) plane.
1.2. GRAPHICAL REPRESENTATIONS OF MULTIVARIABLE FUNCTIONS 7
z p
z= 9 x2 y2
O
y
x
p
Figure 1.7: z = 9 − x2 − y 2 where the positive square root is taken.
O
y
x p
z= 9 x2 y2
p
Figure 1.8: z = − 9 − x2 − y 2 where the negative square root has been taken.
Graphs in Rn , n > 3
All the examples above are R2 → R. We were able to plot these as 2D surfaces in 2+1=3D
space.
Consider a function
w = f (x, y, z) = x2 + y 2 + z 2
This is a mapping R3 → R. If we wanted to draw both the domain and range on the
same plot, we would need to be able to visualise a 3 dimensional hypersurface in 3+1=4
dimensional space. Clearly this is not possible (in our concept of the world!).
8 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Hence at best we might need to consider only partial representations of the graph. For
example, we might consider what the surface w = 9 looks like and plotting it in 4 − 1 = 3
dimensional space (x, y, z). We could then re-plot the graph for different chosen values of w.
w = x2 + y 2 + z 2 = 9
Hence, to avoid brain overload, when we want to visualise graphs, we shall henceforth
largely stick to mappings of R2 → R.
1.3 Contours
An alternative way of representing surfaces of the form z = f (x, y) is to use a contour plot.
This is exactly what is used to represent the height of the ground on an Ordnance Survey
map1 .
The following diagram is based on data from Mathematica and depicts a relief and
contour plot of Schielhallion. Schielhallion is a mountain in Perthshire in Scotland. Due to
its distinctive shape, size and (lack of) proximity to other hills/mountains, it was chosen
(initially by by the astronomer the Reverend Nevil Maskylene) in the 1770-1790s to verify
the mean density of the earth . Those experiments required the calculation of the volume of
the mountain, which in turn required the development of two vital mathematical techniques.
Contour lines were invented (by Charles Hutton) to map out the level heights around the
mountain so that it could be approximated as a series of column-like structures. The volume
of these columns was calculated by the new technique of numerical integration.2
1
the OS are actually based in Southampton https://www.ordnancesurvey.co.uk/
2
You can read more about the mathematical aspects of the experiment here. Details of the theory
underpinning the calculation can be found here. The original experiment was accurate to around 20% of
the currently accepted value for the density of the earth. Subsequent experiments on the same mountain
achieve estimates within 1%.
1.3. CONTOURS 9
���
���
���
��
�
� �� ��� ��� ��� ��� ��� ���
Figure 1.10: Left: Relief plot of Schielhallion. Right: Contour plot of the same region.
Definition: (Contour): A contour (or level set) of a graph of z = f (x, y) is the set of
points (x, y) ∈ R2 satisfying f (x, y) = c for a chosen c ∈ R:
In words, the contour (or level set of height c) is the set of points in the domain at which
the function takes the value c.
One immediate consequence of this is that level sets of different heights are necessarily
disjoint: by the definition of function, there cannot exist a point at which the function takes
on two different z-values simultaneously.
A contour plot is then the union of the contours for different values of c ∈ R. It is a 2-D
plot of a 3-D surface.
f (x, y) = 1 − x − y
• Hence we have c = 1 − x − y.
• These are straight lines in the (x, y) plane of slope -1 and y-intercept 1 − c.
z
y
(0, 0, 1) 1 x y=c (0, 1)
c decreasing
(0, 1, 0)
y y=1 x
c=0
O x
(1, 0, 0) (1, 0)
c=1
Figure 1.11: Level height contours of z = 1 − x − y in the positive (x, y)-plane (and plotted
on the surface in the positive octant). These are curves where z = c where c is a constant,
i.e., 1 − x − y = c, or y = −x + 1 − c.
In practice we would only draw the 2-D plot on the right, since the whole point is to
avoid having to cope with drawing 3-D surfaces. However in these first examples, we plot
both the contours on the 3-D surface and in the 2-D (x, y) plane, just to give an idea of what
the contours are and how they relate to the surface.
f (x, y) = x2 + y 2
• Hence we have c = x2 + y 2 .
y
z c increasing
z = x2 + y 2 x2 + y 2 = c
x
O
O y
• Note from the analysis above, the function f (x, y) = x2 + y 2 clearly has a local mini-
mum value at (0, 0). If you walk away from the origin in whichever direction on the
surface z = x2 + y 2 , you will always walk uphill.
• Clearly if you consider the function f (x, y) = −(x2 +y 2 ), you will have similar shapes of
contours, but taking the corresponding negative values of c. Hence, for that function,
the origin is a maximum. If you walk away from the origin in whichever direction on
the surface z = −(x2 + y 2 ), you will always walk downhill.
y c<0
z c decreasing
z= (x2 + y 2 )
O
x
y O
x
x2 + y 2 = c
Figure 1.13: Level height contours of z = −(x2 + y 2 ). These are curves where z = c where c
is a constant, i.e., x2 + y 2 = −c, but clearly√c < 0 and decreases as x2 + y 2 increases. They
are still circles, centre the origin, of radius −c.
Remark: If the contour levels c are equally spaced, clearly the closer the contour lines
are together on the (x, y) plot, the “steeper” the surface is in that direction.
• Hence we have c = x2 − y 2 .
• For c = 0 we have x2 = y 2 or y = ±x. These are straight lines through the origin.
c=0 c=0
yc < 0
z
c>0 c>0
x
x
x2 y2 = c c<0
Figure 1.14: Level height contours of z = x2 − y 2 . These are still curves where z = c where c
is a constant, i.e., x2 −y 2 = c. These are (for c 6= 0) hyperbolae. The contours corresponding
to c = 0 are the lines y = ±x. The regions y < |x| have c > 0, the regions y > |x| have
c < 0.
• This is an example of a simple saddle point. The region around the origin is separated
into four quadrants.
• Walking away from the origin on the surface z = x2 − y 2 into two of these quadrants
(|y| > |x|) will result in you going downhill.
• Walking away from the origin on the surface z = x2 − y 2 into the other two quadrants
(|y| < |x|) will result in you walking uphill.
• Walking away from the origin on the surface z = x2 − y 2 on the lines y = ±x and you
will keep the same height of 0 and neither walk up nor downhill.
More complicated contour plots can be drawn using computer packages, e.g., Maple,
Mathematica or Matlab. Further examples are available on Blackboard.
1.4. LIMITS AND CONTINUITY 13
lim f (x)
x→a
(i) every neighbourhood of the point (a, b) contains a point of the domain of f different
from (a, b), and
(ii) for every ε > 0, there exists δ > 0 such that if (x, y) is in the domain and satisfies
p
0< (x − a)2 + (y − b)2 < δ,
(a, b)
O x
Figure 1.15: ∀ (x, y) in this disk, we we must have |(f (x, y) − L| ≤ if the limit of f at (a, b)
exists.
In words this means that for a limit to exist at (a, b), the value of the function f (x, y)
must approach the same value L as (x, y) → (a, b) from whichever route you take.
Note that as (x, y) tends to (0, 0), the function approaches the value 7. So, we shall try to
prove that this is the limit. The following steps gives the standard algorithm for proving the
limit.
• The domain is the (x, y) plane. So every neighbourhood of the point (0, 0) is in the
domain of f (condition (i) above).
p
• We need to show that for any ε > 0, there exists δ > 0 such that if 0 < x2 + y 2 < δ,
then
|x2 + y 2 + 7 − 7| < ε ⇒ |x2 + y 2 | < ε
√
• Obviously, if we now set δ = ε for any ε > 0 we have found the δ > 0 to satisfy the
inequality condition (ii) in the definition above.
• Hence
lim (x2 + y 2 + 7) = 7.
(x,y)→(0,0)
This looks like a rather obvious answer. In the next example, the value of the limit at the
limit point is not so obvious.
1.4. LIMITS AND CONTINUITY 15
• The domain is the whole (x, y)-plane, excluding the origin, where the functional form
is 0/0, so ambiguous.
• A shortcut that you can often deploy in these types of examples is to convert to polar
coordinates: x = r cos θ, y = r sin θ. For then, since | cos θ| ≤ 1 and | sin θ| ≤ 1 and
r ≥ 0, we have
2 3
xy r cos2 θ sin θ
− 0= = r cos2 θ sin θ < |r| = r
x2 + y 2 r2
2
xy
⇒ − 0<r
x2 + y 2
2
xy p
⇒ − 0 < x2 + y 2
x2 + y 2
• Hence comparing with the definition of the limit above, if we see thatpif there has to
be a δ > 0 such that the LHS has to be less than > 0 whenever 0 < x2 + y 2 < δ, a
natural choice is δ = .
The above formal ε, δ definition is usually too cumbersome to use in practice. Fortunately,
to save a lot of pain, the following lemma holds. It speeds up limit-finding when functions
are well behaved (proof of the lemma is not required).
16 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Then
(iv) Given a function h : R → R that is continuous at L ∈ R, then lim h(f (x, y)) =
(x,y)→(a,b)
h(L).
• The domain contains all points of the (x, y)-plane, so the function is expected to exist
at the limit point. Also we then have neighbourhood of the limit point that is in the
domain. Therefore we expect that a limit may exist.
• f is composed of the sum of the functions x2 = x × x, y = x × y and 5.
• We start with the obvious results that
lim x = 1, lim y=0
(x,y)→(1,0) (x,y)→(1,0)
This means that the limits exist for the basic building blocks of the function. We can
now try to apply Lemma 1.4.1 as follows.
• We can say:
2
lim x = lim (x × x) = lim x × lim x =1×1=1
(x,y)→(1,0) (x,y)→(1,0) (x,y)→(1,0) (x,y)→(1,0)
| {z }
(ii) of Lemma 1.4.1
lim xy = lim (x × y) = lim x × lim y =1×0=0
(x,y)→(1,0) (x,y)→(1,0) (x,y)→(1,0) (x,y)→(1,0)
| {z }
(ii) of Lemma 1.4.1
lim 5=5
(x,y)→(1,0)
3x
Example: Find the limit of f (x, y) = as (x, y) approaches (1, 1).
x + 4y
• The locus of points where the function is not defined is the straight line y = −x/4.
This line passes nowhere near the limit point. Hence, we then have neighbourhood of
the limit point that is in the domain.
• So the function is expected to exist at the limit point. We can now try to apply Lemma
1.4.1 as follows.
• We obviously have:
lim x = 1, lim y=1
(x,y)→(1,1) (x,y)→(1,1)
lim 3x = 3 × lim x = 3 × 1 = 3
(x,y)→(1,1) (x,y)→(1,1)
| {z }
(ii) of Lemma 1.4.1
3x lim(x,y)→(1,1) 3x 3
lim = =
(x,y)→(1,1) x + 4y lim(x,y)→(1,1) x + 4y 5
| {z }
(iii) of Lemma 1.4.1
Sometimes, you have to decide whether the limit exists or not. The following examples
illustrate how the requirement that the f (x, y) approaches the same limit, no matter how
the limit point is approached, is actually incredibly restrictive.
Example: If
x2 /(x2 + y 2 ) (x, y) 6= 0
f (x, y) =
1 (x, y) = 0
does lim(x,y)→(0,0) exist?
• The functional form x2 /(x2 + y 2 ) is not defined when (x, y) = (0, 0), since it is of the
form 0/0 (and we can’t use Lemma 2.4.1 part iii, since M = 0).
• However the function is defined to have the value 1 at the limit point.
18 CHAPTER 1. MULTIVARIABLE FUNCTIONS
• Hence the function f is defined for the entire (x, y) plane, so it makes sense to ask if
a limit exists.
• The key thing we have to test is whether the limit is the same regardless of how
we approach the limit point. We proceed as follows.
x2
f (x, 0) = =1 ⇒ lim f (x, 0) = 1 = f (0, 0)
x2 + 0 2 x→0
This suggests that the limit may exist and be 1. However we need to check every
possible approach!
02
f (0, y) = 2 =0 ⇒ lim f (0, y) = 0
0 + y2 y→0
• Hence
lim f (0, y) 6= lim f (x, 0)
y→0 x→0
and so the limit depends on how the limit point is approached, so the limit is not
unique and so the limit does not exist!
• In fact, consider an approach to the limit point at the origin along the straight lines
y = αx. Inserting this into the function we have:
x2 1
f (x, αx) = 2 2 2
= , x 6= 0
x +α x 1 + α2
Hence if we approach along any straight line, the limit will be different, and so does
not exist. In practice it suffices to show that there is ambiguity between just two
different ways of approaching.
• A graph of the function is shown below, which helps to explain the ambiguity in the
limit. Basically, the surface is vertical at the origin. Any curve you chose to take to
walk towards the origin will therefore intersect at different heights, depending on how
it enters the origin.
1.4. LIMITS AND CONTINUITY 19
z
z
x
x
y y
2
x 1
z = f (x, y) = z = f (x, ↵x) =
x2 + y2 1 + ↵2
Figure 1.16: Surface z = f (x, y) = x2 /(x2 + y 2 ) (left). Values of function f (x, αx) along the
lines y = αx (right), which take different values α2 /(1 + α2 ) at the origin (x, y) = (0, 0),
depending on the value of α.
This explains why the limit does not exist in this case. The definition of a limit above
basically says that the value of a function must be unique at a point. Clearly in this case
the function can take any value you like at (x, y) = (0, 0) since it is vertical there.
This next example shows the danger of relying on approaches to the limit point along
straight lines to determine whether a limit exists or not.
2x2 y
f (x, y) =
x4 + y 2
Investigate the limiting behaviour as (x, y) → (0, 0). We can’t use Lemma 2.4.1 since at
the limit point we have 0/0, i.e., M = 0.
• Suppose we consider the behaviour of the function along any path with y = αx. We
have
2x2 αx 2αx3 2αx
f (x, αx) = 4 2 2
= 2 2 2
= 2
x +α x x (x + α ) (x + α2 )
Hence approaching the limit point along any straight line (α 6= 0) into the origin has
2αx 0
lim f (x, αx) = lim = =0
x→0 x→0 (x2 + α2 ) α2
• However, now consider approaching the limit point along the curve y = x2 . Then we
have
2x2 × x2 2x4
f (x, x2 ) = 4 = = 1.
x + (x2 )2 x4 (1 + 1)
Hence approaching the limit point along any y = x2 into the origin has
• Hence there is not a unique limit! This underlines the fact that for the limit to exist
formally, it must be the same, no matter how the limit point is approached (or how
complicated the path you take to approach the limit point).
• What is actually going on with the surface near to (x, y) = (0, 0) is illustrated below.
The surface is vertical at the origin, but folded around the curve y = x2 .
z z = f (x, x2 ) = 1
2
lim f (0, y) = 0 lim f (x, x2 ) = 2
y!0 x!0 1+
Figure 1.17: 3-D plot of z = f (x, y) = 2x2 y/(x4 + y 2 ). If we approach the origin along,
say x = 0, limx→0 f (0, y) = 0. However, if we approach the origin along the curve y = x2
we find limx→0 f (x, x2 ) = 1. In fact if we approach the origin along any curve y = βx2 , we
find limx→0 f (x, βx2 ) = limx→0 2βx4 /(x4 + β 2 x4 ) = 2β/(1 + β 2 ). There is no unique value of
f (x, y) at the origin (x, y) = (0, 0), since the surface is vertical there.
Clearly finding contradictions based on the choice of path taken to approach a limit point
is a way of disproving the existence of a limit.3
3
Don’t panic! if you have a problem like this in an exam, you will be given a hint.
Partial Derivatives 21
Having defined the concept of a limit, we may now define continuity in multivariable functions
It follows that sums and products of continuous functions are again continuous. That is,
there are analogous statements to parts (i) and (ii) of Lemma 1.4.1 for continuity at a point.
For quotients one needs to proceed with a bit of care as in part (iii) of Lemma 1.4.1.
• x2 + y 2 + 7 is continuous at (0, 0) since the limit is the value of the function there.
• 2x2 y/(x4 + y 2 ) is not continuous at (0, 0) since its value at (0, 0) is not unique.
Given a curve y = F (x) we can work out the gradient (slope) of the curve at each point by
differentiating F . The formal mathematical definition of a derivative of a function of one
variable is
dF F (x + h) − F (x)
Slope = = lim
dx h→0 h
For a smooth function with no kinks, this limit exists and is unique at each point x. There
is only one unique slope at each point of a smooth curve.
Now, suppose you are standing on a hill. What is the slope of the hill at that point? A
little thought should tell you that it is not unique: It depends on the direction in which you
are looking.
For example, you could walk around the hill, on a level contour and keeping the same
height. In that case the slope is zero in the direction you walk at each point. You could also
aim for the summit, in which case you have to go uphill and the gradient will be far from
zero!
A hill can be represented as a function of two variables z = f (x, y) (see figure 3.17)
below.
22 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Figure 1.18: Plot of a multivariate function where the height of the function is z = f (x, y)
and is given by f (x, y) = sin(x) + sin(y)
Hence when we ask for the slope on a hill, this translates to asking: what is meant by
the derivative of a multivariate function?
The standard way to proceed is as follows. All but one of the variables (directions of
walking on the hill) are kept constant and we differentiate with respect to the remaining
one. Usually we pick two orthogonal directions and work out the slope in these directions.
(You will learn in a later course how to find the slope in all other directions from these two).
In two dimensions this corresponds to working out the slope looking at the slope either
As these derivatives only involve differentiation with respect to one of the possible variables,
they are called partial derivatives. They are formally defined as follows:
Definition: (Partial Derivatives) For the function z = f (x, y), the partial derivatives
are defined as:
∂f f (x + h, y) − f (x, y)
Slope in x direction = = lim
∂x y=constant h→0 h
∂f f (x, y + k) − f (x, y)
Slope in y direction = = lim
∂y x=constant
k→0 k
provided that these limits exist.
1.5. PARTIAL DERIVATIVES 23
NB: Note the curly ∂ in the derivative, rather than the usual d. This is to denote that
the derivative is of a multivariate function and so is partial with respect to only one of the
variables, rather than with respect to all of them. It is incorrect to use an ordinary d in a
partial derivative. Practice drawing the ∂!!!
In practice you compute the partial derivatives in the same way as an ordinary derivative,
treating all but the relevant variable as constants. Due to this, very similar rules exist for
differentiating combinations of functions.
Partial derivatives of sums, products, quotients, compositions obey similar results derivatives
involving single variables.
(i) Suppose ∂f
∂x
(x, y) exists. Then for a constant c ∈ R,
∂ ∂ ∂f
(cf (x, y)) = c f (x, y) = c (x, y),
∂x ∂x ∂x
∂ ∂f ∂g
(f (x, y) + g(x, y)) = (x, y) + (x, y),
∂x ∂x ∂x
and similarly for the partial derivative with respect to y.
(iii) Suppose ∂f
∂x
(x, y) and ∂g
∂x
(x, y) exist. For two functions f (x, y) and g(x, y), we have the
product rule:
∂ ∂f ∂g
(f (x, y)g(x, y)) = (x, y) g(x, y) + f (x, y) (x, y) ,
∂x ∂x ∂x
(iv) Suppose ∂f
∂x
(x, y) and ∂g
∂x
(x, y) exist. For two functions f (x, y) and g(x, y), we have the
quotient rule:
∂f
∂g
∂ f (x, y) ∂x
(x, y) g(x, y) − f (x, y) ∂x
(x, y)
= ,
∂x g(x, y) (g(x, y))2
Proof. We only prove part (i) to show that the proofs are analogous to those when the
function has only one variable. Proofs for this lemma are not required.
∂ cf (x + h, y) − cf (x, y)
(cf (x, y)) = lim
∂x h→0
h
f (x + h, y) − f (x, y)
= c lim
h→0 h
∂f
=c (x, y) .
∂x
∂f ∂ ∂ ∂ ∂
= x2 − (1) + (y) + (sin(xy)) = 0 − 0 + 1 + x cos(xy) = 1 + x cos(xy).
∂y ∂y ∂y ∂y ∂y
∂f d d
(x, y) = cos(x) sin(y) + y exp(x) = cos(x) cos(y) + exp(x).
∂y dy dy
∂f x ∂ (cos(y)) − cos(y) ∂x
∂
(x) x × 0 − cos(y) × 1 − cos(y)
(x, y) = ∂x 2
= 2
= .
∂x x x x2
∂ ∂
∂f x ∂y (cos(y)) − cos(y) ∂y (x) x × (− sin y) − cos(y) × 0 − sin(y)
(x, y) = 2
= 2
= .
∂y x x x
1.5. PARTIAL DERIVATIVES 25
When a partial derivative of a function z = f (x, y) is evaluated at a point (x, y) = (a, b),
then you often see the following equivalent notation:
∂z ∂
or f (x, y) or ∂x f (x, y)|(a,b) or zx (a, b)
∂x (a,b) ∂x
(a,b)
Example: For the function f (x, y) = x2 − 1 + y + sin(xy), calculate the slope in the
x-direction and y-direction at the point (x, y) = (1, 2).
∂f
= 2x + y cos(xy)|(x,y)=(1,2) = 2 + 2 cos(2) = 2(1 + cos 2).
∂x (1,2)
∂f
= 1 + x cos(xy)|(x,y)=(1,2) = 1 + cos(2).
∂y (1,2)
We don’t normally use the definition to evaluate the partial derivatives of particular
functions, any more than we use the definition for a function of one variable to calculate the
derivative of that function.
∂ ∂f
(cf (x1 , . . . , xn )) = c (x1 , . . . , xn );
∂xj ∂xj
∂ ∂f ∂g
(f (x1 , . . . , xn ) + g(x1 , . . . , xn )) = (x1 , . . . , xn ) + (x1 , . . . , xn )
∂xj ∂xj ∂xj
for each 1 ≤ j ≤ n;
∂
(f (x1 , . . . , xn )g(x1 , . . . , xn ))
∂xj
∂f ∂g
= (x1 , . . . , xn ) g(x1 , . . . , xn ) + f (x1 , . . . , xn ) (x1 , . . . , xn )
∂xj ∂xj
for each 1 ≤ j ≤ n;
Really the only difference between this and the lemma for 2 variables (above), is the
number of variables in the arguments of the function, and you remember that you treat all
the variables as constants, except the one you are differentiating with respect to.
f (x, y, z) = 5x3 y 2 z − y 2 z 2 + 3x − 4
at (−1, 0, 2).
Recall that when we differentiate with respect to x, we treat all other variables as con-
stants and then differentiate normally. So, to calculate ∂f
∂x
, we treat y and z as constants
and see that
1.5. PARTIAL DERIVATIVES 27
∂f ∂ ∂ 2 2 ∂ ∂
(x, y, z) = 5 (x3 y 2 z) − (y z ) + 3 (x) − (4)
∂x ∂x ∂x ∂x ∂x
= 15x2 y 2 z + 3.
∂f
Similarly, to calculate ∂y
, we treat x and z as constants and see that
∂f ∂ ∂ 2 2 ∂ ∂
(x, y, z) = 5 (x3 y 2 z) − (y z ) + 3 (x) − (4)
∂y ∂y ∂y ∂y ∂y
= 10x3 yz − 2yz 2 .
∂f
Similarly, to calculate ∂y
, we treat x and y as constants and see that
∂f ∂ ∂ ∂ ∂
(x, y, z) = 5 (x3 y 2 z) − (y 2 z 2 ) + 3 (x) − (4)
∂z ∂z ∂z ∂z ∂z
= 5x3 y 2 − 2y 2 z.
∂f ∂f ∂f
Then ∂x
(−1, 0, 2) = 3, ∂y
(−1, 0, 2) = 0, and ∂z
(−1, 0, 2) = 0.
Recall that in the single variable case, if we were given a function z = z(x) where x = x(t),
then we could express the derivative of the composite function z = z(x(t)) with respect to t
using the chain rule:
dz z(x(t + h)) − z(x(t)) z(x(t + h)) − z(x(t)) x(t + h) − x(t) dz dx
= lim = lim = .
dt h→0 h h→0 x(t + h) − x(t) h dx dt
(This follows from the formula for the product of individually well-defined limits.)
What is the analogous situation for multivariable functions? We can illustrate this with
a definite example.
Example: Suppose you are walking in a mountainous region with a map. Your horizontal
coordinates on the map are given by (x, y) and your vertical height above sea level is z =
f (x, y).
Now suppose you take a path so that your horizontal coordinates are given at time t by
(x, y) = (x(t), y(t)). Your height at time t is then given by
28 CHAPTER 1. MULTIVARIABLE FUNCTIONS
z = f (x(t), y(t)).
z = f (x, y)
z(t)
x(t) y(t)
y
x
Figure 1.19: A schematic representation of the motion of a point on the surface z = f (x, y)
that is given when we allow the coordinates of the point (x, y, z) to depend on an additional
variable t so that the x coordinate moves as x = x(t) and y = y(t). The chain rule can then
b used to calculate how the rate of change of z with respect to t may be expressed. For
example if t is time, then dz/dt gives the vertical speed.
dz z(t + h) − z(t)
= lim
dt h→0 h
z(x(t + h), y(t + h)) − z(x(t), y(t))
= lim
h→0 h
z(x(t + h), y(t + h)) + [−z(x(t), y(t + h) + z(x(t), y(t + h))] − z(x(t), y(t))
= lim
h→0 h
z(x(t + h), y(t + h)) − z(x(t), y(t + h)) z(x(t), y(t + h)) − z(x(t), y(t))
= lim + lim
h→0 h h→0 h
∂z dx ∂z dy
= (x(t), y(t)) (t) + (x(t), y(t)) (t).
∂x dt ∂y dt
The last line follows by comparison with the 1-d chain rule above.
4
Note that in this example, when z is expressed in terms of the ultimate independent variables we find
that there is only 1 indepdendent variable. Therefore the final derivative with respect to that variable is a
total one.
1.5. PARTIAL DERIVATIVES 29
Of course the result we have derived here is general and not dependent on the mountain
walk context.
So, the Chain Rule says that if z = f (x, y) and x = x(t) and y = y(t) for differentiable
f, x and y, and we compose the functions as R(t) → R2(x,y) → R(z) , we have the following
result:
Lemma 1.5.3. If z = z(x, y) and x = x(t) and y = y(t) for differentiable f, x and y, we
have
dz ∂z dx ∂z dy
= + . (1.1)
dt ∂x dt ∂y dt
We have
dx dy
= 2t, = et
dt dt
∂z ∂z
= cos(x + y 2 ), = 2y cos(x + y 2 )
∂x ∂y
Hence
dz ∂z dx ∂z dy
= + = 2t cos(x + y 2 ) + 2yet cos(x + y 2 ) = cos(t2 + e2t ) 2t + 2e2t
dt ∂x dt ∂y dt
Note that the last result could have been obtained by directly substituting first for x = t2
and y = et to get z(t) = sin(t2 + et ) and then differentiating z as a simple 1-d function with
respect to t.
So in this type of problem we see that x and y are intermediate variables and t is the
actual independent variable.
Suppose now that we have a composition R2(s,t) → R2(x,y) → R(z) , i.e., we have a function
z = f (x, y), but that we have intermediate variable x = x(s, t) and y = y(s, t) with the
independent variables being now s and t, so that
A derivation of the chain rule (not required) delivers the following results:
30 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Lemma 1.5.4. If z = z(x, y) and x = x(s, t) and y = y(s, t) for differentiable f, x and y,
we have
∂z ∂z ∂x ∂z ∂y ∂z ∂z ∂x ∂z ∂y
= + and = + . (1.2)
∂s ∂x ∂s ∂y ∂s ∂t ∂x ∂t ∂y ∂t
z(x, y) = xey ,
with
x = s + 3t, y = s − 2t.
• We have:
∂z ∂z
= ey , = xey ,
∂x ∂y
∂x ∂x ∂y ∂y
= 1, = 3, = 1, = −2.
∂s ∂t ∂s ∂t
= es−2t (1 + s + 3t);
• and
∂z ∂z ∂x ∂z ∂y
= +
∂t ∂x ∂t ∂y ∂t
∂z ∂z
= 3 −2 = 3ey − 2xey = ey (3 − 2x)
∂x ∂y
= es−2t (3 − 2s − 6t).
1.5. PARTIAL DERIVATIVES 31
We can use the chain rule from equation (1.2) with s = r and t = θ. Then we have
∂z ∂z ∂x ∂z ∂y
= +
∂r ∂x ∂r ∂y ∂r
and
∂z ∂z ∂x ∂z ∂y
= +
∂θ ∂x ∂θ ∂y ∂θ
But
∂x ∂y
= cos θ = sin θ
∂r ∂r
∂x ∂y
= −r sin θ = r cos θ
∂θ ∂θ
Hence the r and θ partial derivatives of z may be expressed in terms of the x and y partial
derivatives of z as
∂z ∂z ∂z
= cos θ + sin θ
∂r ∂x ∂y
and
∂z ∂z ∂z
=− r sin θ + r cos θ
∂θ ∂x ∂y
Note: A convenient way to express the chain rule (1.2), that will be useful later, is to
recognise that it can be written as a matrix multiplication:
∂x ∂x
∂z ∂z ∂z ∂z ∂s ∂t
= (1.3)
∂s ∂t ∂x ∂y ∂y ∂y
∂s ∂t
There is a simplified graphical way to remember the chain rules for arbitrary compositions
R(t1 ,t2 ,...tm ) → R(x1 ,x2 ,...,xn ) · · · → R(z) . For the cases studied above we can construct a chart
of chain links that indicates how all the variables depend on one another.
32 CHAPTER 1. MULTIVARIABLE FUNCTIONS
z(x, y)
@z @z
@x @y
x y
dx dy
dt dt
t t
dz @z dx @z dy
= +
dt @x dt @y dt
Figure 1.20: Chain rule for z = z(x, y), when x = x(t) and y = y(t).
z(x, y)
@z @z
@x @y
x y
x
dz @z @z dy
= +
<latexit sha1_base64="XPTezdOs65fnJt1+XnHI1q523pc=">AAACB3icbZC7SgNBGIX/jbcYb1FLmyFBsAq7NiaFELCxjOCaQBLD7OxsMmT2wsysGJd9AXvBVp/ASsXOx/ABLH0HZ5M0ufwwcDjnDBw+J+JMKtP8MXIrq2vrG/nNwtb2zu5ecf/gRoaxINQmIQ9Fy8GSchZQWzHFaSsSFPsOp01neJHlzTsqJAuDazWKaNfH/YB5jGClrduOJzBJ3Ic0ce/T816xbFbM8aFFYU1FuW4+WaWPr79Gr/jbcUMS+zRQhGMp25YZqW6ChWKE07TQiSWNMBniPk3GW1N0rC0XeaHQL1Bo7M70sC/lyHd008dqIOezzFyaZY6QnlwWtmPlVbsJC6JY0YBMVngxRypEGRXkMkGJ4iMtMBFMz0dkgDUbpdkVNBdrnsKisE8rtYp5pflUYXJ5OIISnIAFZ1CHS2iADQQEPMMLvBqPxpvxbnxOqjlj+ucQZs74/gcnWJ36</latexit>
sha1_base64="0fG7hjphWGlyu0yMa0qERqjN7eY=">AAACB3icbZC7SgNBGIX/9RrjLWppMxgEq7BrYyyEgI1lBNcEkhhmZ2eTIbMXZv4V45IXsBds9QmsxNbH8AF8D2eTNLn8MHA45wwcPi+RQqNt/1orq2vrG5uFreL2zu7efung8F7HqWLcZbGMVdOjmksRcRcFSt5MFKehJ3nDG1zneeORKy3i6A6HCe+EtBeJQDCKxnpoB4qyzH8eZf7T6KpbKtsVe3xkUThTUa7Zrw4BgHq39Nf2Y5aGPEImqdYtx06wk1GFgkk+KrZTzRPKBrTHs/HWETk1lk+CWJkXIRm7Mz0aaj0MPdMMKfb1fJabS7PcUTrQy8JWikG1k4koSZFHbLIiSCXBmORUiC8UZyiHRlCmhJlPWJ8aNmjYFQ0XZ57ConDPK5cV+9bwqcLkCnAMJ3AGDlxADW6gDi4wUPAG7/BhvVif1pf1PamuWNM/RzBz1s8/0E6bmg==</latexit>
dx @x @y dx
Note that on substituting y = cos x into z = f (x, y) we end up with z = f (x, cos x) = z(x)
only. In other words, if we make the substitution for y in the first place, we find that z is
just a function of a single variable x. Hence the final derivative we seek using the chain rule
will just be a total derivative dz/dx.
However, if we just use the chain rule, on the way we do need to differentiate with respect
to each of the x and y variables partially. Hence we will see ∂z/∂x appear within the chain
rule: this just means that we must differentiate with respect to the first of the original
variables x, keeping the other one, y, as a constant. However since y = y(x) only, any
derivatives of y with respect to x will also be total and so we can expect dy/dx to appear
too.
Hence, when we apply the diagram to derive the chain rule, we find:
dz ∂z ∂z dy
= + .
dx ∂x ∂y dx
Thus we have
∂z ∂z
= 2xy, = x2 + 3y 2 .
∂x ∂y
We also have
dy
= − sin x.
dx
So
dz ∂z ∂z dy
= + = 2xy(x) + x2 + 3y(x)2 (− sin x) = 2x cos x − x2 + 3 cos2 x sin x.
dx ∂x ∂y dx
Exercise 4 U: Check that if you do make the substitution y = cos x and then differen-
tiate z(x) = f (x, cos x) with respect to x, that you obtain the same answer as above.
Example: Suppose we have z = z(x, y), but x = x(s, t, u) but y = y(s, u) only, and
u = u(v). We can represent the functional relationships using the chain diagram below:
z(x, y)
@z @z
@x @y
x y
@x @x @x @y @y
@s @t @u @s @u
s t u s u
du du
<latexit sha1_base64="VWctzs9DFui+yTKDIoums8+j6RI=">AAACBnicbZDLSsNAGIX/1Futt6pLN0OL4Kokbmx3BTcuKxhbaEOZTCbt0MnFmUmhhDyADyBu9QlcScGVr+EDuPQdnLTd9PLDwOGcM3D43JgzqUzzxyhsbe/s7hX3SweHR8cn5dOzRxklglCbRDwSHRdLyllIbcUUp51YUBy4nLbd0W2et8dUSBaFD2oSUyfAg5D5jGClLafnC0xSL8lSb5z1y1WzZs4OrQtrIapN88WqTL/+Wv3yb8+LSBLQUBGOpexaZqycFAvFCKdZqZdIGmMywgOazqZm6FJbHvIjoV+o0Mxd6uFAykng6maA1VCuZrm5McsdIX25Kewmyq87KQvjRNGQzFf4CUcqQjkU5DFBieITLTARTM9HZIg1GqXRlTQXa5XCurCva42aea/51GF+RbiAClyBBTfQhDtogQ0EnuAV3uDdeDY+jE9jOq8WjMWfc1g64/sfkKadrA==</latexit>
sha1_base64="6miuQy9uYU3XvNxkLTsDiUW7Tkw=">AAACBnicbZDLSsNAGIX/eK31VnXpZrAIrkrixroruHFZwdhCG8pkMmmHTi7O/CmUkAfwAcStPoErcetr+AC+h9PLppcfBg7nnIHD56dSaLTtX2tjc2t7Z7e0V94/ODw6rpycPukkU4y7LJGJavtUcyli7qJAydup4jTyJW/5w7tJ3hpxpUUSP+I45V5E+7EIBaNoLK8bKsryICvyYFT0KlW7Zk+PrApnLqoN+9UhANDsVf66QcKyiMfIJNW649gpejlVKJjkRbmbaZ5SNqR9nk+nFuTSWAEJE2VejGTqLvRopPU48k0zojjQy9nEXJtNHKVDvS7sZBjWvVzEaYY8ZrMVYSYJJmQChQRCcYZybARlSpj5hA2oQYMGXdlwcZYprAr3unZbsx8MnzrMrgTncAFX4MANNOAemuACg2d4g3f4sF6sT+vL+p5VN6z5nzNYOOvnHzmrm0w=</latexit>
dv <latexit sha1_base64="VWctzs9DFui+yTKDIoums8+j6RI=">AAACBnicbZDLSsNAGIX/1Futt6pLN0OL4Kokbmx3BTcuKxhbaEOZTCbt0MnFmUmhhDyADyBu9QlcScGVr+EDuPQdnLTd9PLDwOGcM3D43JgzqUzzxyhsbe/s7hX3SweHR8cn5dOzRxklglCbRDwSHRdLyllIbcUUp51YUBy4nLbd0W2et8dUSBaFD2oSUyfAg5D5jGClLafnC0xSL8lSb5z1y1WzZs4OrQtrIapN88WqTL/+Wv3yb8+LSBLQUBGOpexaZqycFAvFCKdZqZdIGmMywgOazqZm6FJbHvIjoV+o0Mxd6uFAykng6maA1VCuZrm5McsdIX25Kewmyq87KQvjRNGQzFf4CUcqQjkU5DFBieITLTARTM9HZIg1GqXRlTQXa5XCurCva42aea/51GF+RbiAClyBBTfQhDtogQ0EnuAV3uDdeDY+jE9jOq8WjMWfc1g64/sfkKadrA==</latexit>
sha1_base64="6miuQy9uYU3XvNxkLTsDiUW7Tkw=">AAACBnicbZDLSsNAGIX/eK31VnXpZrAIrkrixroruHFZwdhCG8pkMmmHTi7O/CmUkAfwAcStPoErcetr+AC+h9PLppcfBg7nnIHD56dSaLTtX2tjc2t7Z7e0V94/ODw6rpycPukkU4y7LJGJavtUcyli7qJAydup4jTyJW/5w7tJ3hpxpUUSP+I45V5E+7EIBaNoLK8bKsryICvyYFT0KlW7Zk+PrApnLqoN+9UhANDsVf66QcKyiMfIJNW649gpejlVKJjkRbmbaZ5SNqR9nk+nFuTSWAEJE2VejGTqLvRopPU48k0zojjQy9nEXJtNHKVDvS7sZBjWvVzEaYY8ZrMVYSYJJmQChQRCcYZybARlSpj5hA2oQYMGXdlwcZYprAr3unZbsx8MnzrMrgTncAFX4MANNOAemuACg2d4g3f4sF6sT+vL+p5VN6z5nzNYOOvnHzmrm0w=</latexit>
dv
v
<latexit sha1_base64="Koif59WNmiN1/l1rhK4z+XE6FB4=">AAAB+nicbVC7SgNBFL0bXzG+opaCDAbBKuzaGCsDNpYJuCaQhDA7uZsMmX0wMxsIIaWVrX6BldhapPI//AArf8LZJE0eBy4czjkXDseLBVfatn+szMbm1vZOdje3t39weJQ/PnlSUSIZuiwSkax7VKHgIbqaa4H1WCINPIE1r3+f+rUBSsWj8FEPY2wFtBtynzOqjVQdtPMFu2hPQVaJMyeFu+9J9e/5fFJp53+bnYglAYaaCapUw7Fj3RpRqTkTOM41E4UxZX3axdG03phcGqlD/EiaCzWZqgs5Gig1DDyTDKjuqWUvFdd6qSKVr9aZjUT7pdaIh3GiMWSzFn4iiI5IOgTpcIlMi6EhlElu6hPWo5IybebKmV2c5RVWiXtdvC3aVbtQLsEMWTiDC7gCB26gDA9QARcYILzAK7xZY+vd+rA+Z9GMNf85hQVYX/9jTpil</latexit>
<latexit sha1_base64="ZpqLkXV98NOzeTuZT2jlt5Dqwhk=">AAAB+nicbVC7TgJBFL2LL8QXamkzkZhYkVkbsZLExhKMKyRAyOxwFybMPjIzS0IIX2CrX2BlbC2s/A8/wP9wFmx4nOQmJ+ecm5wcP5FCG0p/nNzG5tb2Tn63sLd/cHhUPD550nGqOHo8lrFq+kyjFBF6RhiJzUQhC32JDX94l/mNESot4ujRjBPshKwfiUBwZqxUH3WLJVqmM5BV4v6T0u331wNY1LrF33Yv5mmIkeGSad1yaWI6E6aM4BKnhXaqMWF8yPo4mdWbkgsr9UgQK3uRITN1IcdCrcehb5MhMwO97GXiWi9TlA70OrOVmqDSmYgoSQ1GfN4iSCUxMcmGID2hkBs5toRxJWx9wgdMMW7sXAW7i7u8wirxrso3ZVqnpWoF5sjDGZzDJbhwDVW4hxp4wAHhGV7g1Zk6b8678zGP5pz/n1NYgPP5BzB5lmA=</latexit>
v
<latexit sha1_base64="Koif59WNmiN1/l1rhK4z+XE6FB4=">AAAB+nicbVC7SgNBFL0bXzG+opaCDAbBKuzaGCsDNpYJuCaQhDA7uZsMmX0wMxsIIaWVrX6BldhapPI//AArf8LZJE0eBy4czjkXDseLBVfatn+szMbm1vZOdje3t39weJQ/PnlSUSIZuiwSkax7VKHgIbqaa4H1WCINPIE1r3+f+rUBSsWj8FEPY2wFtBtynzOqjVQdtPMFu2hPQVaJMyeFu+9J9e/5fFJp53+bnYglAYaaCapUw7Fj3RpRqTkTOM41E4UxZX3axdG03phcGqlD/EiaCzWZqgs5Gig1DDyTDKjuqWUvFdd6qSKVr9aZjUT7pdaIh3GiMWSzFn4iiI5IOgTpcIlMi6EhlElu6hPWo5IybebKmV2c5RVWiXtdvC3aVbtQLsEMWTiDC7gCB26gDA9QARcYILzAK7xZY+vd+rA+Z9GMNf85hQVYX/9jTpil</latexit>
<latexit sha1_base64="ZpqLkXV98NOzeTuZT2jlt5Dqwhk=">AAAB+nicbVC7TgJBFL2LL8QXamkzkZhYkVkbsZLExhKMKyRAyOxwFybMPjIzS0IIX2CrX2BlbC2s/A8/wP9wFmx4nOQmJ+ecm5wcP5FCG0p/nNzG5tb2Tn63sLd/cHhUPD550nGqOHo8lrFq+kyjFBF6RhiJzUQhC32JDX94l/mNESot4ujRjBPshKwfiUBwZqxUH3WLJVqmM5BV4v6T0u331wNY1LrF33Yv5mmIkeGSad1yaWI6E6aM4BKnhXaqMWF8yPo4mdWbkgsr9UgQK3uRITN1IcdCrcehb5MhMwO97GXiWi9TlA70OrOVmqDSmYgoSQ1GfN4iSCUxMcmGID2hkBs5toRxJWx9wgdMMW7sXAW7i7u8wirxrso3ZVqnpWoF5sjDGZzDJbhwDVW4hxp4wAHhGV7g1Zk6b8678zGP5pz/n1NYgPP5BzB5lmA=</latexit>
Figure 1.22: Chain rule for z = z(x, y), when x = x(s, t, u) and y = y(s, u).
34 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Hence following the above shortcut rules, it is easy to see that ultimately z = z(s, t, v)
and that.
∂z ∂z ∂x ∂z ∂y
= +
∂s ∂x ∂s ∂y ∂s
∂z ∂z ∂x
=
∂t ∂x ∂t
∂z ∂z ∂x ∂u ∂z ∂y ∂u
= +
∂v ∂x ∂u ∂v ∂y ∂u ∂v
When we integrate a partial derivative, we get something similar, but the role of the con-
stant is taken by an arbitrary function of the variable(s) which has (have) been held constant
in the partial differentiation. Consider a function z(x, y) which has partial derivatives:
ˆ
∂z
dx = z(x, y) + g(y),
∂x y=const
where g(y) is an arbitrary function of y. This is because, when we differentiate the right hand
side, z(x, y) + g(y), partially with respect to x, keeping y constant, any arbitrary function
of y has a zero partial derivative with respect to x:
∂ ∂z ∂z
(z(x, y) + g(y)) = +0= .
∂x y=const ∂x y=const ∂x y=const
Similarly we have: ˆ
∂z
dy = z(x, y) + f (x),
∂y x=const
where f (x) is an arbitrary function of x. This is because, when we differentiate the right
hand side, z(x, y) + f (x), partially with respect to y, keeping x constant, any arbitrary
function of x has a zero partial derivative with respect to y:
∂ ∂z ∂z
(z(x, y) + f (x)) = +0= .
∂y x=const ∂y x=const ∂y x=const
1.5. PARTIAL DERIVATIVES 35
Note that when we are given two partial derivatives of the same function z we can
integrate both derivatives separately as above with respect to the relevant variable to obtain
two apparently different representations for z. The resolution of this paradox is that by
exploit the arbitrary nature of the functions g(y) and f (x) to “pattern match” the two
expressions z(x, y)+g(y) and z(x, y)+f (x) to obtain identical expressions for both integrals:
Example: Suppose you are given that a function z(x, y) exists such that
∂z ∂z
= 2x + 2y, = 3y 2 + 2x.
∂x y=const ∂y x=const
ˆ
∂z
dx = x2 + 2xy + g(y),
∂x y=const
ˆ
∂z
dy = y 3 + 2xy + f (x).
∂y x=const
Given that we are told that z(x, y) exists, both of these expressions should be the same.
Clearly both have a 2xy, but otherwise they look different. However, a careful term-by-term
comparison shows that can be made the same if we match up using the arbitrary functions
as:
f (x) = x2 + C, g(y) = y 3 + C,
where C is an arbitrary constant. Note that the arbitrariness of the the constant C
maintains the arbitrariness of g and f . Whence we have a single expression for z(x, y),
unique up to arbitrary constants (as in the single variable integration case):
z(x, y) = x2 + y 3 + 2xy + C.
You can check for yourselves that if you differentiate this function z(x, y) partially with
respect to x and y independently that you cover the partial derivatives that you started off
with above.
We can use integration of partial derivatives to arrive at the following neat result:
36 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Theorem 1.5.5. Let u(x, y) be a function of the variables x and y, and assume that
∂u ∂u
(x, y) = (x, y) = 0
∂x ∂y
• Integrating with respect to x, we see that u(x, y) = g(y), since (from the way integra-
tion works above) the constant of integration with respect to x must depend only on
y.
∂u ∂u
• Using now the assumption that ∂y
(x, y) = 0 everywhere, this means that ∂y
(x, y) =
g 0 (y) = 0 everywhere.
• Hence, integrating g 0 (y) = 0 with respect to y, since g 0 (y) is just a function of y, the
ordinary 1-D rules of integration apply. Hence g(y)=constant everywhere, and hence
that u(x, y) must be constant everywhere.
The chain rule is used in the solution of first order exact ordinary differential equations.
Chain rules are also useful in the solution of partial differential equations (PDE) by con-
verting the equations form something complicated when expressed in one set of variables, to
something simpler to solve in another set.
Example: Suppose you are looking for a solution to the first order PDE:5
∂z ∂z
+c = 0,
∂t ∂x
where c is a constant.
ξ = x − ct, η = x + ct
5
DON’T PANIC: you won’t be expected to do this until MATH2038! This is just an example to show
you how the chain rule is useful.
1.5. PARTIAL DERIVATIVES 37
so that z(x, t) = z(x(ξ, η), t(ξ, η)) = z(ξ, η). We can use the chain rule to rewrite the
partial derivatives with respect to x and t in the following form,
∂z ∂ξ ∂z ∂η ∂z dz dz
= + = −c + c ,
∂t ∂t ∂ξ ∂t ∂η dξ dη
and
∂z ∂ξ ∂z ∂η ∂z ∂z ∂z
= + = + .
∂x ∂x ∂ξ ∂x ∂η ∂ξ ∂η
∂z ∂z dz dz ∂z ∂z ∂z
+c =−−c +c +c + = 2c = 0.
∂t ∂x dξ dη ∂ξ ∂η ∂η
∂z
= 0.
∂η
Thus if we integrate with respect to η and remember that z also depends on ξ we see
that
z = F (ξ),
In other words z is just an arbitrary function of ξ only.
Converting back to the original (x, t) coordinates we have z(x, t) = F (x − ct). The actual
function F could be determined by specifying the initial profile of the wave at t = 0. If we
set this to be z(x, 0) = z0 (x) we would then find z(x, 0) = F (x) = z0 (x). Hence the solution
would then be
z(x, t) = z0 (x − ct).
The solution of this equation represents a wave of amplitude z traveling to the right (positive
x) with speed c at time t.
In this section we consider the so-called Jacobian matrices of partial derivatives and show
how they can be used to encode and to represent the chain rule. We then on to examine the
relationship between the inverses of these matrices and so establish conditions for an inverse
of a multivariable function to have a local inverse.
38 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Here f is called a vector function, since the result is an ordered set of two functions
f1 (x, y) and f2 (x, y). These could be the component of a vector whose direction and magni-
tude depends on the (x, y) position in space, e.g.,
The function f could represent, say, the wind velocity at sea level at any point on a map.
This is a matrix of partial derivatives. Note the order of the entries. The independent
variables (x, y) go on the bottom of each partial derivative as appropriate. Partial derivatives
of the function fi are in the ith row. The j th column should contain the same independent
variable.
A little bit of thought shows that we can combine these two equations and represent
them as components in the following matrix multiplication:
∂x ∂x
∂f1 ∂f1 ∂f1 ∂f1 ∂s ∂t
= (1.6)
∂s ∂t ∂x ∂y ∂y ∂y
∂s ∂t
∂x ∂x
∂f2 ∂f2 ∂f2 ∂f2 ∂s ∂t
= (1.7)
∂s ∂t ∂x ∂y ∂y ∂y
∂s ∂t
Due to the identity of the final 2 × 2 matrix in both these expressions, we could again
combine these results and write them as a 2 × 2 matrix multiplication:
∂f1 ∂f1 ∂f1 ∂f1 ∂x ∂x
∂s
∂t ∂x
∂y ∂s ∂t
= . (1.8)
∂f ∂f2 ∂f2 ∂f2 ∂y ∂y
2
∂s ∂t ∂x ∂y ∂s ∂t
Or in terms of the definition of the Jacobian Matrix the chain rule for the mapping
g f
R2(x,y) → R2(s,t) → R2 ,
we set
∂f1 ∂f1
∂x1
(x1 , . . . , xn ) ··· ∂xn
(x1 , . . . , xn )
· ·
∂fi
JF (x1 , . . . , xn ) = =
· · .
∂xj · ·
∂fp ∂fp
∂x1
(x 1 . . . , xn ) · · ·
, ∂xn
(x 1 . . . , xn )
,
Example: Jacobians need not be square matrices! For example, if we have a function:
for arbitrary integer p, q, n, not necessarily equal, is then the chain rule is the matrix product
∂g1 ∂g1 ∂g1 ∂g1 ∂f1 ∂f1
···
∂x1 · · · ∂xn ∂f1 ∂f p ∂x1
···
∂xn
. .. .. . . . . . ..
.. .
. = .. ..
.. .. .. .
∂gq ∂gq ∂gq ∂gq ∂fp ∂fp
··· ··· ···
∂x1 ∂xn ∂f1 ∂fp ∂x1 ∂xn
| {z } | {z } | {z }
q×n q×p p×n
Recall that in 1 dimension, if y = y(x), then the inverse function, if it exists, may be
denoted by x = x(y). Inserting the definition of y into the inverse function, we obtain the
(tautological) result:
x = x(y(x)).
Assuming both the function and its inverse to be differentiable, if we differentiate both sides
with respect to x we end up with
dx dx dy dx dy dy dx
= 1= =1 .
dx dy dx dy dx dx dy
Does the same result hold for partial derivatives, i..e, given a differentiable z = f (x, y),
does
∂z ∂x
=1 ?
∂x ∂z
The analogous answer (and question!) is obtained by considering the following example:
Now, substituting the inverse function into the function (as in the 1-D case) we have
x = x(f (x)).
∂x ∂x ∂x ∂x ∂f1 ∂f1
∂x
∂y ∂f1
∂f2 ∂x ∂y
=
∂y ∂y ∂y ∂y ∂f2 ∂f2
∂x ∂y ∂f1 ∂f2 ∂x ∂y
Now since x and y are independent variables,
∂x ∂x
∂x ∂y 1 0
.
∂y ∂y =
0 1
∂x ∂y
Hence we have
∂x ∂x ∂f1 ∂f1
1 0 ∂f2 ∂y
= ∂f1 ∂x
∂y ∂y ∂f2 ∂f2
0 1
∂f1 ∂f2 ∂x ∂y
Since the LHS is the identity matrix, the matrices on the RHS must be inverses of each
other, i.e.,
∂x ∂x ∂f1 ∂f1 −1
∂f1 ∂f2 ∂y
= ∂x . (1.10)
∂y ∂y ∂f2 ∂f2
∂f1 ∂f2 ∂x ∂y
Hence the matrices of the Jacobians of the inverse functions are inverses of each other. This
is a general result for n = p = q maps leading to square Jacobian matrices.
So in general, for multivariable functions, you can see that not much can be said about
the relationship between individual partial derivatives ∂f /∂x and ∂x/∂f .
What can be said is that the Jacobian matrices of the function and the inverse function
are inverses of each other!
Remark: Note that in the case of n = p = q = 1 for single variable functions, the
matrices collapse to just as single entry, so the matrix inverse is just the inverse of a scalar
42 CHAPTER 1. MULTIVARIABLE FUNCTIONS
(i.e., the reciprocal), the partials become total derivatives and so we recover the single
variable ratio result above:
dx df
=1 .
df dx
Very Sketchy Proof: (Not required, but outlined here for information. Note that
we are using x1 = x, x2 = y in the R2 → R2 version of the theorem.) The result (1.10)
assumed that inverse functions existed. Hence in that instance, the Jacobian matrices will
have inverses. For a 2 × 2 matrix, a necessary condition for the inverse to exist is that its
determinant does not vanish. Hence if the Jacobian determinant does not vanish, the inverse
matrix will exist, and so by implication the inverse functions.
Note that the theorem does not find the inverse functions, it just guarantees that they
exist. They may be complicated or even impossible to write down explicitly. Hence the use
of the term implicit function theorem.
Example: Given x = r cos θ and y = r sin θ, find where the inverse functions r(x, y) and
θ(x, y) exist.
From the Implicit Function Theorem, we require the Jacobian determinant to be non-
vanishing. We have
∂x ∂x
cos θ −r sin θ
∂(x, y) ∂r ∂θ
=r
= =
∂(r, θ) ∂y ∂y sin θ r cos θ
∂r ∂θ
Hence we see that inverse functions exist locally wherever r 6= 0. This is the polar origin, at
which θ is ambiguous: it can take any value from 0 to 2π.
The determinant of a Jacobian matrix will return again later when we consider changes
of variables in integrals over multivariate functions.
Definition of Gradient 43
And finally, Jacobians should not be confused with Jacobites (who were 18th supporters
of the claim of the descendants of James II of England (or VII of Scotland) to the throne
of England and Scotland). :-) Jacobians are in fact, named after the mathematician Carl
Gustav Jacobi (1804-1851).
One important thing to note is that, while f : R2 → R, its gradient ∇f (x, y) is a vector
function ∇f : R2 → R2 . For this reason in R2 , if we assign unit vectors i and j to the positive
x, y directions respectively, we can write ∇f as:
∂f ∂f
∇f (x, y) = i+ j,
∂x ∂y
rf (xP , yP ) @f
j
yP @y P
P @f
i
@x P
i
j
O xP x
Figure 1.23: Representation of the vector which is the gradient of a function f (x, y) in the
(x, y) plane at a point P , (xp , yp ).
∂f ∂f ∂f
∇f (x, y, z) = i+ j+ k,
∂x ∂y ∂z
rf (xP , yP , zP )
zP
P @f j
@y P
@f
k
@z P @f
i
@x P
O
xP
yP
k
x y
i j
Figure 1.24: Representation of the vector which is the gradient of a function f (x, y, z) in
(x, y, z) space at a point P , (xp , yp , zp ).
A differentiable curve in one dimension has a tangent line at each point. This is the straight
line that passes through the chosen point on the curve, but which has the same slope of the
curve at that point.
1.7. TANGENT PLANES 45
y y = f (x)
n
t
f (x)
P
i
j
x
O x
Figure 1.25: Tangent line (dotted), tangent vector t and normal vector n at a point P on
the (differentiable) curve y = f (x).
Note that a vector that is tangential to the curve at a chosen point P is given by (see
figure 1.25),
df
t=i+ j.
dx P
A vector n normal to this curve satisfies t.n = 0. A little bit of thought gives this as
df
n=− i + j.
dx P
Suppose we now consider the smooth surface z = f (x, y) (we return to what we mean by
“smooth” later, but for now just think of it has having no kinks or creases). At each point
on this surface, there will be two normal directions to the surface (parallel, but in opposite
directions to each other).
It is then possible to define a tangent plane T at a chosen point P on the surface. This
is the plane that passes through the point P , but is everywhere perpendicular to either of
the normal directions ±n (see figure 1.26).
46 CHAPTER 1. MULTIVARIABLE FUNCTIONS
P
r OP
(x, y, z)
r y
Figure 1.26: Tangent plane T at the point P : (a, b, f (a, b) that is everywhere normal to one
of the normal vectors ±n. Also show is a general point r in the tangent plane and the vector
r − OP that lies wholly within the tangent plane.
• Now suppose that the chosen point P on the surface has (x, y, z) coordinates given by
(a, b, f (a, b)). It will have a position vector relative to the origin given by OP as
OP = ai + bj + f (a, b)k.
• We know the slope of the surface in the x direction. It is given by (∂f /∂x)p . This is
the rate of change of the height of the surface in the x direction. Hence, by analogy
with the 1-D case above, a vector tangential to the surface in the x direction is given
by
∂f
tx = i + k
∂x P
• We also know the slope of the surface in the y direction. It is given by ∂f /∂y. This
is the rate of change of the height of the surface in the y direction. Hence, by analogy
with the 1-D case above, a vector tangential to the surface in the y direction is given
by
∂f
ty = j + k
∂y P
1.7. TANGENT PLANES 47
• We can now find a normal n to the surface by taking the vector product of the two
tangent vectors.
i j k
∂f
1 0 ∂f ∂f
∂x
n = tx × ty = =− i− j+k
P ∂x P ∂y P
∂f
0 1
∂y P
• Now suppose that the chosen point P on the surface has (x, y, z) coordinates given by
(a, b, f (a, b)). It will have a position vector relative to the origin given by OP as
OP = ai + bj + f (a, b)k.
• Likewise, suppose a general point on the tangent plane is given by the position vector
r where
r = xi + yj + zk
• The vector r-OP is a vector that joins the point P to any other general point on the
tangent plane at P . The vector r-OP therefore lies wholly within the tangent plane.
Consequently it must be perpendicular to either of the normals ±n at P . Hence we
must have
(r − OP) · n = 0.
• Substituting for r, OP and n this equation becomes
{xi + yj + zk − (ai + bj + f (a, b)k)} · n = 0.
{(x − a)i + (y − b)j + (z − f (a, b))k} · n = 0.
∂f ∂f
{(x − a)i + (y − b)j + (z − f (a, b))k} · − i− j + k = 0.
∂x P ∂y P
∂f ∂f
−(x − a) − (y − b) + (z − f (a, b)) = 0.
∂x P ∂y P
∂f ∂f
z = f (a, b) + (x − a) + (y − b) .
∂x P ∂y P
Note that we can express the equation of the plane in terms of the gradient of the function
evaluated at P in the following way:
Example: Compute the tangent plane to the surface z = x2 + y 2 at the point P above
(x, y) = (1, 2).
48 CHAPTER 1. MULTIVARIABLE FUNCTIONS
.
• Hence the ingredients for the formula for the tangent surface are:
∂f ∂f
= 2x, ⇒ =2×1=2
∂x ∂x P
∂f ∂f
= 2y, ⇒ =2×2=4
∂y ∂x P
⇒ z = 5 + 2(x − 1) + 4(y − 2)
⇒ z = 2x + 4y − 5
z
z z = x2 + y 2 z P = (1, 2, 5)
z = x2 + y 2
y
P = (1, 2, 5)
P = (1, 2, 5)
z = 2x + 4y 5 z = 2x + 4y 5
y y
z = x2 + y 2
x
x x
z = 2x + 4y 5
Recall that for a curve y = F (x) in 1-D, the tangent line at the point (a, F (a)) is given
by
df
y = F (a) + (x − a).
dx x=a
Differentiability 49
The tangent line gives a linear approximation to the line and is the first two terms of the
Taylor series expansion of y = F (x) about (a, F (a)).
It is then not too difficult to see that the tangent plane is a linear expansion of the
function z = f (x, y) about the point P = (a, b, f (a, b)). In fact it is the first two terms in
the multivariable Taylor series expansion of z = f (x, y) at P , which we will study later.
1.8 Differentiability
We also know that the derivative f 0 (a) has a geometric interpretation as the slope of the
line tangent to the graph of the function at the point (a, f (a)) ∈ R2 . So, the function is
differentiable when such a tangent line exists and hence its slope is well-defined.
If we now consider a function of two variables f : R2 → R instead, then the tangent line
is replaced by the tangent plane at a given point (a, b, f (a, b)). So, the function would be
differentiable at (a, b) when such a tangent plane exists.
The tangent plane does not have a single slope but rather many slopes coming from
different directions. We then compute the slopes in the x and y directions, that is, when
we fix y = b and x = a respectively and consider the slopes of the lines (x, b, f (x, b)) and
(a, y, f (a, y)) in the tangent plane. These slopes are precisely the partial derivatives of f (x, y)
at (a, b).
So, for a function f (x, y) to be differentiable at (a, b), the gradient ∇f (a, b) needs to
exist.6
6
One should expect the converse not to be true as we only choose two specific lines on the tangent plane
in order to describe the gradient vector. But as we shall see, this is almost true.
50 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Now, let us make the precise definition of differentiability. We will only do this for a function
f : R2 → R:
f (a + h, b + k) − f (a, b) − v · (h, k)
lim √ = 0. (1.12)
(h,k)→(0,0) h2 + k 2
The vector v is a bit mysterious. However, the following theorem shows why it is natural
in the definition of differentiability and explains precisely what it is.
Proof:
√
• To see the first assertion note that lim h2 + k 2 = 0. This together with (1.12)
(h,k)→(0,0)
and the limit rules in lemma 2.4.1, gives us
√ f (a + h, b + k) − f (a, b) − v · (h, k)
lim h2 + k 2 · lim √ = 0,
(h,k)→(0,0) (h,k)→(0,0) h2 + k 2
⇒ lim f (a + h, b + k) − f (a, b) = 0,
(h,k)→(0,0)
This is the definition of continuity (see section 2.4.4) and so a differentiable function
at a point is continuous there also. (NB the converse is not true! Just fold a piece
of paper and make a crease: the surface is continuous, but not differentiable for the
reason we now see below.)
1.8. DIFFERENTIABILITY 51
• To see the second assertion, note that we get the same limit if we let k = 0 in (1.12),
since for the limit to exist, which it does by assumption, the value must be the same,
regardless of the direction of approach. We can pick to approach the limit from a
direction where k = 0, without loss of generality. Hence we have
Since the limit exists by assumption and equals zero by definition, it does not depend
on whether h > 0 or h < 0 (any extra negative sign (−1) from the modulus may be
divided away into the zero). Hence we can write:
f (a + h, b) − f (a, b)
⇒ lim − v1 = 0,
h→0 h
f (a + h, b) − f (a, b) ∂f
⇒ v1 = lim = (a, b).
h→0 h ∂x
Similarly, by letting h = 0, we get
f (a, b + k) − f (a, b) ∂f
v2 = lim = (a, b).
k→0 h ∂y
There is a useful criteria that almost states the converse, guaranteeing that a function is
differentiable at a given point. The proof of the theorem is not required, but we will use it
in the subsequent examples.
∂f ∂f
Theorem 1.8.2. If both partial derivatives and of a function f : R2 → R exist and
∂x ∂y
are continuous on a neighbourhood of a point (a, b) then f is differentiable at (a, b).
52 CHAPTER 1. MULTIVARIABLE FUNCTIONS
is differentiable.
∂f
(x, y) = 2x + y exp (x2 + y 2 ) + 2x2 y exp (x2 + y 2 ),
∂x
∂f
(x, y) = 2y + x exp (x2 + y 2 ) + 2xy 2 exp (x2 + y 2 ).
∂y
As these partial derivatives are formed from continuous functions via products, sums and
compositions they are also continuous. So, by Theorem 1.8.2, f (x, y) is differentiable.
For a function of two or more variables, we have already seen that its partial derivatives
describe the rate of change of the function in either x or y directions depending which
partial derivative we consider. In this section we discuss how one can measure the rate of
change of such a function in any chosen direction.
We can appeal to the definitions of partial derivatives of a function f (x, y) in the x and
y directions and write down a definition of the derivative of a function in the direction of
a unit vector w emerging from the point (a, b, f (a, b)) and tangential to the surface in that
direction.
w1
P
w2 w3
u1 u3 x
Q
u2
C
Figure 1.28: The tangent plane T to surface z = f (x, y) at point P : (a, b, f (a, b)) contains an
infinite number of vectors wi . The derivative of the function f (x, y) is, in general, different
in each of these directions. The projection of these vectors wi onto a unit disk C centred at
Q : (a, b) are unit vectors ui . (x, y) plane are unit vectors ui
Actually, by working through the definition of the derivative, we can come up with a
much simpler formula for the directional derivative that is easy to use in practice.
Theorem 1.9.1. Suppose f : R2 → R is differentiable at (a, b) ∈ R2 and u is a unit vector.
Then the directional derivative of f in the direction u at (a, b) is given by
Du (f )(a, b) = ∇f (a, b) · u.
We are now in a position to deduce the geometric meaning of the gradient that we introduced
above, from the definition of the directional derivative. We use the property of dot products
that
a · b = |a| |b| cos(θ)
where θ is the angle between a and b. Letting θ be the angle between ∇f (a, b) and u, we
see that
Du (f )(a, b) = ∇f (a, b) · u = |∇f (a, b)| |u| cos(θ) = |∇f (a, b)| cos(θ),
since u is a unit vector.
We see that since ∇f (a, b) is given constant vector when evaluated at the point (a, b), to
maximise Du (f )(a, b), we only need to maximise cos(θ). Therefore, we obtain the following
geometric properties of the gradient.
rf (a, b)
�
✓ Du (f )(a, b) = rf (a, b) · u
u
�
(a, b)
�
� � � � �
Figure 1.29: Geometric meaning of directional derivative relative to a unit vector u and the
gradient vector ∇f at a point (a, b), superimposed on a contour plot of a R2 → R function.
(i) The function f increase the most rapidly at (a, b) when θ = 0, which is when u is
pointing in the same direction as ∇f (a, b).
(ii) The function f decrease the most rapidly at (a, b) when θ = π, which is when u is
pointing in the opposite direction as ∇f (a, b).
π
(iii) The rate of change of the function f is zero (a, b) when θ = 2
, which is when u is
perpendicular to the direction of ∇f (a, b).
(iv) Choosing u=i we obtain
∂f
Di (f )(x, y) = ∇f · i =
∂x
1.9. DIRECTIONAL DERIVATIVE 55
For a given function f (x, y), the sorts of calculations we do with the directional derivative
involve solving the equation Du (f )(a, b) = c when two of the parameters u, (a, b) and c are
specified.
Examples:
2 +y 2
Evaluate the directional derivative of the function z = f (x, y) where f (x, y) = ex at
the point (x, y) = (1, 1) in the direction
(a) i + j.
• Remember that to find the directional derivative u has to be a unit vector. Hence
we must have that
i+j 1
u= = √ (i + j) .
|i + j| 2
∂ x2 +y2 ∂ x2 +y2 2 2 2 2
∇f (x, y) = e i+ e j = 2xex +y i + 2yex +y j.
∂x ∂y
• Therefore we have
2 2 2 2
1 √ 2 2 √ 2 2
Du (f )(x, y) = ∇f (x, y)·u = 2xex +y i + 2yex +y j · √ (i + j) = 2xex +y + 2yex +y .
2
(b) i − j
• Remember that to find the directional derivative u has to be a unit vector. Hence
we must have that
i+j 1
u= = √ (i − j) .
|i − j| 2
• Therefore we have
1 √ √ x2 +y2
x2 +y 2 x2 +y 2 x2 +y 2
Du (f )(x, y) = ∇f (x, y)·u = 2xe i + 2ye j · √ (i − j) = 2xe − 2ye .
2
y ���
���
(1, 1)
���
rf (1, 1)
���
1
u = p (i j)
2
���
��� ��� ��� ��� ���
x
2 2
Figure 1.30: The level contour lines of f (x, y) = e−(x +2y ) with gradient vectors at the point
(x, y) = (1, 1) superimposed. The unit vector u= (i − j)/Sqrt2 extending from (1, 1) is also
shown. Since ∇f (1, 1) · u = 0, the vector u is perpendicular to the gradient vector and is
thus locally parallel to the level contour line.
1.9. DIRECTIONAL DERIVATIVE 57
Example:
Evaluate the Directional derivative of the above function at (x, y) = (0, 0) in every
direction. Comment on the result:
• We know that
2 +y 2 2 +y 2
∇f (x, y) = 2xex i + 2yex j
• Hence
2 2 2 2
2 2 2 2
Du (f )(x, y) = 2xex +y i + 2yex +y j · (αi + βj) = 2αxex +y + 2βyex +y
Hence the directional derivative at (0, 0) is everywhere zero in no matter what direction
you look in.
• Does this mean the function is everywhere constant? Clearly from the previous ex-
amples, we know that the directional derivatives do vary, depending on the evaluation
point. The actual reason is easily seen by realising that at (x, y) = (0, 0) the surface
has a local maximum. This gives a hint towards how we are going to find local maxima
and minima in the next couple of sections.
y
�
(0, 0)
-�
-�
-� -� � � �
x
2 2
Figure 1.31: Value of directional derivative of z = e−(x +y ) at (x, y) = (0, 0) in any direction
u is zero, since the function is locally horizontal at that point, and is in fact a local maximum.
This can also be seen from the fact that the contour at that point is just an isolated point
and not a line.
58 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Example:
Given the function z = f (x, y) = x2 + 2y 2 , work out the direction of the steepest ascent
and the steepest descent at the point (x, y) = (2, 1).
• We know that the function increases most rapidly at (2, 1) when u is pointing in the
same direction as ∇f (2, 1).
• We also know that at that point f decreases most rapidly in the direction −∇f (2, 1).
• Hence we have
∇f (2, 1) = (2 × 2)i + (4 × 1)j = 4i + 4j.
• Hence the
√ path of steepest ascent at (x, y) = (2, 1) is in the direction 4i + 4j, or
(i + j)/ 2 if you want to express it as a unit vector.
• Hence the√ path of steepest descent at (x, y) = (2, 1) is in the direction −4i − 4j or
−(i + j)/ 2 if you want to express it as a unit vector.
y y
� �
rf (2, 1)
� �
� � (2, 1)
� (2, 1) �
rf (2, 1)
-� -�
|rf (2, 1)|
-� -�
-� -�
-� -� -� � � � �
x -� -� -� � � � �
x
Figure 1.32: The level contour lines of f (x, y) = x2 + 2y 2 with gradient vectors at the point
(x, y) = (2, 1) superimposed. Left: The vector ∇f (2, 1) gives the direction of steepest ascent
at the point (2, 1), i.e., the direction perpendicular to the level contours lines of f (x, y) at
(2, 1) with a positive gradient. The value of |∇f (2, 1)| gives the slope of the surface at that
point in the steepest ascent direction. Right: The unit vector −∇f (2, 1)/|∇f (2, 1)| gives
the direction of steepest descent at the point (2, 1), i.e., the direction perpendicular to the
level contours lines of f (x, y) at (2, 1) with a negative gradient. The value of −|∇f (2, 1)|
gives the slope of the surface at that point in the steepest descent direction (although we
have chosen only to draw the unit vector on the diagram).
Computation of Steepest Paths and Contour Lines 59
Example:
For the same function calculate the direction of the contour lines through (x, y) = (2, 1).
∇f (2, 1) · u = 0,
• Let u = cos θi + sin θj, where θ is the angle in the horizontal (x, y) plane to the positive
real axis of the level lines at (x, y) = (2, 1).
• We thus have
y
���
���
���
u= p1 ( i + j)
2
���
���
(2, 1)
���
u= p1 (i j)
2
���
� � � � �
x
2 2
Figure 1.33:
√ The level contour lines of f (x, y) = x + 2y with the direction vectors u =
±(i − j)/ 2 superimposed as (x, y) = (2, 1), where they are parallel to the level contours
through that point.
60 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Having seen that we can derive the direction of steepest ascent/descent or level lines at
any particular point, we now examine how to compute the corresponding paths of steepest
ascent/descent or level lines as you take an extended walk away from any particular point.
A stream starts at the point (1, 1, f (1, 1)) follows the path of steepest descent away from
that point. Compute the trajectory of stream.
• Let the path of steepest descent in projection be given by y = y(x). The slope y 0 (x)
must then be parallel at each point to −∇f .
• Hence we have
dy y component of(−∇f ) −2y (1 + x2 + 2y 2 )3/2 2y
= = × = .
dx x component of(−∇f ) (1 + x2 + 2y 2 )3/2 −x x
1 = C × 22 ⇒ C = 1/4.
• Hence the steepest descent path from (1, 1) follows the path
y = x2 /4
y = x2 /4
y�
� p
(2, 1, 1/ 7)
<latexit sha1_base64="/RFtm6sr/NOKaqAP7NGSSnaGCxw=">AAACDHicbVDLTsJAFL3FF+Kr6tJNAzHBSLDFBS6JblxiIo+EEjIdpjBh+nBmStI0/IIrl271C9wZt/4DH+B/OAU2PE5yk5Nzzk1OjhMyKqRpTrXM1vbO7l52P3dweHR8op+eNUUQcUwaOGABbztIEEZ90pBUMtIOOUGew0jLGT2kfmtMuKCB/yzjkHQ9NPCpSzGSSurperFSskrWjS1euEyqk6ueXjDL5gzGOrEWpFDL29dv01pc7+l/dj/AkUd8iRkSomOZoewmiEuKGZnk7EiQEOERGpBk1ndiXCqpb7gBV+dLY6Yu5ZAnROw5KukhORSrXipu9FKFC1dsMjuRdO+6CfXDSBIfz1u4ETNkYKTLGH3KCZYsVgRhTlV9Aw8RR1iq/XJqF2t1hXXSrJSt27L5pAa6hzmycAF5KIIFVajBI9ShARjG8A4f8Km9al/at/Yzj2a0xc85LEH7/QfXppyz</latexit>
(2, 1)
�
z
<latexit sha1_base64="pqhLvSpUcfCuq0Y2urDB5+Wy/Oo=">AAAB+3icbVBLSgNBFHwTf3H8RV26aQyCqzCji7gRg25cJmA+kITQ0+lJmvR86H4jxJATuNUTZCduPYNHEA/gPexJssmn4EFRVQ+K8mIpNDrOr5XZ2Nza3snu2nv7B4dHueOTmo4SxXiVRTJSDY9qLkXIqyhQ8kasOA08yeve4CH1689caRGFTziMeTugvVD4glE0UuWlk8s7BWcKskrcOcnffdu38eTHLndyf61uxJKAh8gk1brpOjG2R1ShYJKP7VaieUzZgPb4aFpvTC6M1CV+pMyFSKbqQo4GWg8DzyQDin297KXiWi9VlPb1OrOZoH/THokwTpCHbNbCTyTBiKRDkK5QnKEcGkKZEqY+YX2qKEMzl212cZdXWCW1q4J7XXAqTr50DzNk4QzO4RJcKEIJHqEMVWDA4RXe4N0aWxPrw/qcRTPW/OcUFmB9/QOPRJfX</latexit>
-�
x y
-�
-� -� � � �
x
x =x,
y =x2 /4,
1 1
z =f (x, y(x)) = p =p .
<latexit sha1_base64="9zLAGZSifIcNRzrQLta9M4Replg=">AAACY3icbZHLSgMxGIXT8VZbL7W6EyFYLC1KnakFuykU3bisYC/QaUsmzdRg5mKSkRmHeUQfwAfoE7hVML1srP0hcHLOF/g5sXxGhdT1z5S2sbm1vZPezWT39g8Oc0f5jvACjkkbe8zjPQsJwqhL2pJKRno+J8ixGOlaL/ezvPtGuKCe+yQjnwwcNHGpTTGSyhrlJmGjGF4VoWnCSKlh9bq2uL03inYpvIpKYbncMG2OcGwksSleuYyNSwVeVktzvDysJsl6Ig6HteS6niSV4ihX0Cv6fOB/YSxFASynNcpNzbGHA4e4EjMkRN/QfTmIEZcUM5JkzEAQH+EXNCHxvIYEXihrDG2Pq+NKOHf/cMgRInIsRTpIPovVbGauzWYOF7ZYF/YDadcHMXX9QBIXL7awAwalB2eFwzHlBEsWKYEwp2p9iJ+Rakuqb8moXozVFv6LTrVi3FT0x1qhebdsKA1OwTkoAQPcgiZ4AC3QBhh8gC/wDX5SUy2r5bWTBaqllm+OwZ/Rzn4Bv9+1Og==</latexit>
1 + x2 + 2(x2 /4)2 1 + x2 + x4 /8
1
Figure 1.34: Steepest descent path from point above (2, 1) on surface z = √ . The
1+x2 +2y 2
left hand diagram shows the contour plot of f (x, y) = constant, together with the projection
of the steepest descent path from (2, 1) in the (x, y) plane. The right hand diagram is a
plot of the surface (x, y, z = f (x, y)) in R3 . The steepest descent path from (2.1) is also
show. It is the three dimensional curve with coordinates given by (x, y(x), z = f (x, y(x)))
as explained in the text.
• In (x, y, z) ∈ R3 , the steepest descent curve of course sits on the surface z = f (x, y).
Its coordinates are thus given parametrically by (x, y(x), z = f (x, y(x))), where
x = x,
y = y(x) = x2 /4,
1 1
z = f (x, y(x)) = p =p .
2 2
1 + x + 2(x /4)2 1 + x2 + x4 /8
Find the equation of the level contours, i.e., z(x, y) = c (without doing the obvious!).
• Let the level contour in projection be given by y = y(x). The slope y 0 (x) must then
be parallel at each point to v.
• Hence we have
2 2
dy y component of v −2xe−(x +3y ) −x
= = −(x 2 +3y 2 ) = .
dx x component of v 6ye 3y
y�
(1, 1)
x2 3y 2
� + =1
4 4
-�
-�
-� -� � � �
x
2 3 2 2 +3y 2 )
Figure 1.35: Level contours 4Cx
+ 4C y = 1 of surface z = e−(x . The bold contour is the
2
level contour through (1, 1), i.e., x4 + 43 y 2 = 1.
Of course, we could have just spotted what the level contours were by just observing that
z = constant when x2 + 3y 2 =constant. However this example is to demonstrate how curves
other than steepest descent ones may be calculated.
1.11. HIGHER ORDER PARTIAL DERIVATIVES 63
So far we have just looks at the first partial derivatives of multivariable functions. Now we
consider higher order partial derivatives. Again we initially focus on f : R2 → R.
f (x, y)
@f @f
@x @y
✓ ◆ ✓ ◆ ✓ ◆ ✓ ◆
@ @f @2f @ @f @2f @ @f @2f @ @f @2f
= = = =
@x @x @x2 @y @x @y@x @x @y @x@y @y @y @y 2
Thus we have up to four second-order partial derivatives. At higher orders the pyramid
structure propagates. (Question: How many partial derivatives are there at order n?)
∂ 2f ∂
= (2x + y cos(xy)) = cos(xy) − xy sin(xy).
∂y∂x ∂y
∂ 2f ∂
= (1 + x cos(xy)) = cos(xy) − xy sin(xy).
∂x∂y ∂x
∂ 2f ∂
= (1 + x cos(xy)) = −x2 sin(xy).
∂y 2 ∂y
64 CHAPTER 1. MULTIVARIABLE FUNCTIONS
∂ 3w ∂ 3w
Example: Given w = f (x, y, z) = x cos(z) + sin(xy), calculate , , and
∂z∂x2 ∂x∂z∂x
∂ 3w
.
∂x2 ∂z
∂ 3w ∂ 2w ∂w
2
= (cos(z) + y cos(xy)) = (−y 2 sin(x)) = 0,
∂z∂x ∂z∂x ∂z
∂ 3w ∂ 2w ∂w
= (cos(z) + y cos(xy)) = (− sin(z)) = 0,
∂x∂z∂x ∂x∂z ∂x
∂ 3w ∂ 2w ∂w
2
= 2
(−x sin(z)) = (− sin(z)) = 0.
∂x ∂z ∂x ∂x
The above examples suggest that under certain circumstances the order in which the
partial derivatives are performed is irrelevant. In fact the following theorem holds.
Theorem 1.11.1. Let f : Rn → R be a function. Suppose for some positive integer k, there
exists a neighbourhood U of a point P ∈ Rn such that f and all its partial derivatives of
order less than k are continuous on U . Then any two mixed partial derivatives of order k
involving the same variables but in different sequential orders are equal at P , provided those
partial derivatives are continuous at P .
Boiled down to something digestible for f : R2 → R and for second order partial deriva-
tives, we also have the theorem:
Theorem: If fxy and fyx are continuous at (x, y) = (a, b) then they are equal there.
So provided the mixed partial derivatives of order up to k are continuous at a point, then
all the mixed partial derivatives of order k will all be equal. This reduces dramatically the
number of partial derivatives at higher order, so that for example, sufficiently continuous
functions and their partial derivatives, fxxy = fxyx = fyxx etc. Hence for such functions, the
order in which you do the partial derivatives doesn’t matter.
1.12. CHANGES OF VARIABLES IN PDES 65
We have seen above that combining the chain rule and changes of variables can be used to
simplify first order (partial) differential equations and make them easier to solve. It may be
that when expressed in a new set of coordinates that respect the symmetry of the geometry,
such an approach can also either simplify the form of PDEs or express the equation in a
form that is more amenable to solution in certain geometries.
ξ = x − ct, η = x + ct,
to write
u = u(ξ(x, t), η(x, t))
∂u ∂u ∂ξ ∂u ∂η ∂u ∂u
= + = + ,
∂x ∂ξ ∂x ∂η ∂x ∂ξ ∂η
∂ 2u ∂ ∂u ∂ ∂ ∂u ∂u
⇒ = = + +
∂x2 ∂x ∂x ∂ξ ∂η ∂ξ ∂η
2
∂ u ∂ 2u ∂ 2u
= +2 +
∂ξ 2 ∂η∂ξ ∂η 2
66 CHAPTER 1. MULTIVARIABLE FUNCTIONS
So now substituting these two expressions back into the wave equation we obtain:
∂ 2u ∂ 2u
= c
∂t2 ∂x2
2 2
2 ∂ u ∂ 2u ∂ 2u 2 ∂ u ∂ 2u ∂ 2u
c −2 + = c +2 +
∂ξ 2 ∂η∂ξ ∂η 2 ∂ξ 2 ∂η∂ξ ∂η 2
∂ 2u
⇒ = 0
∂η∂ξ
for arbitrary functions F and G. These might be determined by giving the initial values of
the function u(x, 0) = F (x) + G(x) and ux (x, 0) = F 0 (x) + G0 (x). You will learn about how
to solve such PDEs in second year modules.
∂ 2u ∂ 2u
+ = 0.
∂x2 ∂y 2
Laplace’s equation is widespread in mathematics and its applications. Its solution in 2-
D represents, for example, the shape formed by a soap bubble, the electrostatic potential
away from a charge distribution, the gravitational potential away from a mass, the real (or
imaginary) part of an analytic complex function, the components of velocity of a steady 2-D
incompressible, irrotational fluid flow, etc. In this example we transform the equation from
cartesian to plane polar coordinates. This is used to solve the equation whenever there is
cylindrical symmetry in the problem. 7
The chain rule is then about transforming from (x, y) to (r, θ). We have
∂u ∂u ∂r ∂u ∂θ
= + ,
∂x ∂r ∂x ∂θ ∂x
∂u ∂u ∂r ∂u ∂θ
= + .
∂y ∂r ∂y ∂θ ∂y
7
Such problems will be discussed next year, but the transformation here serves as a good exercise in use
of the chain rule.
1.12. CHANGES OF VARIABLES IN PDES 67
∂r 2x x r cos θ
= p = = = cos θ.
∂x 2 x2 + y 2 r r
∂r 2y y r sin θ
= p = = = sin θ.
∂y 2
2 x +y 2 r r
∂(tan θ) ∂θ ∂(y/x) y
= sec2 θ but = − 2,
∂x ∂x ∂x x
Combining these two identities, using the fact that y/x = tan θ, we have in (r, θ) coordinates:
∂θ y r sin θ sin θ
= − cos2 θ 2 = − cos2 θ 2 2
=− .
∂x x r cos θ r
Similarly we have:
∂(tan θ) ∂θ ∂(y/x) 1
= sec2 θ but = ,
∂y ∂y ∂y x
∂u ∂u ∂r ∂u ∂θ ∂u sin θ ∂u
= + = cos θ − ,
∂x ∂r ∂x ∂θ ∂x ∂r r ∂θ
∂u ∂u ∂r ∂u ∂θ ∂u cos θ ∂u
= + = sin θ + .
∂y ∂r ∂y ∂θ ∂y ∂r r ∂θ
Hence in polar coordinates we can see that the partial derivatives can be written as the
following operators:
∂ ∂ sin θ ∂
= cos θ − ,
∂x ∂r r ∂θ
∂ ∂ cos θ ∂
= sin θ + .
∂y ∂r r ∂θ
Thus we can now write down the second derivatives with respect to x and y:
68 CHAPTER 1. MULTIVARIABLE FUNCTIONS
∂ 2u ∂ ∂u ∂ sin θ ∂ ∂u sin θ ∂u
= = cos θ − cos θ − ,
∂x2 ∂x ∂x ∂r r ∂θ ∂r r ∂θ
∂ ∂u sin θ ∂u sin θ ∂ ∂u sin θ ∂u
= cos θ cos θ − − cos θ − ,
∂r ∂r r ∂θ r ∂θ ∂r r ∂θ
2
2 ∂ u ∂ 1 ∂u sin θ ∂ ∂u sin θ ∂ ∂u
= cos θ 2 − cos θ sin θ − cos θ + 2 sin θ ,
∂r ∂r r ∂θ r ∂θ ∂r r ∂θ ∂θ
∂ 2 u cos θ sin θ ∂u cos θ sin θ ∂ 2 u
2 sin2 θ ∂u cos θ sin θ ∂ 2 u
= cos θ 2 + − + −
∂r r2 ∂θ r ∂r∂θ r ∂r r ∂r∂θ
sin θ cos θ ∂u sin2 θ ∂ 2 u
+ + 2
r2 ∂θ r ∂θ2
∂ 2u cos θ sin θ ∂u cos θ sin θ ∂ 2 u sin2 θ ∂u sin2 θ ∂ 2 u
= cos2 θ + 2 − 2 + + 2 .
∂r2 r2 ∂θ r ∂r∂θ r ∂r r ∂θ2
Similarly we have:
∂ 2u ∂ ∂u ∂ cos θ ∂ ∂u cos θ ∂u
= = sin θ + sin θ + ,
∂y 2 ∂y ∂y ∂r r ∂θ ∂r r ∂θ
∂ ∂u cos θ ∂u cos θ ∂ ∂u cos θ ∂u
= sin θ sin θ + + sin θ + ,
∂r ∂r r ∂θ r ∂θ ∂r r ∂θ
2
2 ∂ u ∂ 1 ∂u cos θ ∂ ∂u cos θ ∂ ∂u
= sin θ 2 + sin θ cos θ + sin θ + 2 cos θ ,
∂r ∂r r ∂θ r ∂θ ∂r r ∂θ ∂θ
∂ 2 u sin θ cos θ ∂u sin θ cos θ ∂ 2 u cos2 θ ∂u cos θ sin θ ∂ 2 u
= sin2 θ − + + +
∂r2 r2 ∂θ r ∂r∂θ r ∂r r ∂θ∂r
cos θ sin θ ∂u cos2 θ ∂ 2 u
− +
r2 ∂θ r2 ∂θ2
∂ 2u cos θ sin θ ∂u cos θ sin θ ∂ 2 u cos2 θ ∂u cos2 θ ∂ 2 u
= sin2 θ − 2 + 2 + + .
∂r2 r2 ∂θ r ∂r∂θ r ∂r r2 ∂θ2
1.12. CHANGES OF VARIABLES IN PDES 69
This result actually has a simpler form: multiply first through by r, then note that the
sum of the partial derivatives in r can be written as a derivative
∂ 2 u ∂u ∂ ∂u
r 2 + = r .
∂r ∂r ∂r ∂r
Hence Laplace’s equation can be written as
∂ ∂u ∂ 2u
r r + 2 = 0,
∂r ∂r ∂θ
or
∂ ∂u ∂ 2u
r r = − 2.
∂r ∂r ∂θ
The key point here is that everything on the LHS involves derivatives wrt r and everything
on the RHS, derivatives wrt to θ. This is a key step in solving the equation by separation of
variables in plane polar coordinates: see second year modules!
∂u ∂ 2u
= 2 = 0.
∂θ ∂θ
Laplace’s equation then becomes:
d du
r r = 0,
dr dr
i.e., all reference to θ vanishes and the partial derivatives with respect to r become total.
70 CHAPTER 1. MULTIVARIABLE FUNCTIONS
After dividing by r (for r 6= 0) we can integrate this, now, ODE, twice to obtain:
du
r = A,
dr
du A
=
dr r
u(r) = A log r + B.
for arbitrary constants A and B. This shows, for example, that the electrostatic potential
due to a straight line source of charge decays logarithmically with radial distance r from the
line source.
∂ 2u ∂ 2u ∂ 2u
+ + = 0.
∂x2 ∂y 2 ∂z 2
An exercise for you is to show that a radially symmetric solution in 3-D is given by:
A
u(r) = +B
r
p
where r = x2 + y 2 + z 2 .
∇2 u = 0.
The symbol “∇” is pronounced “del” (short for “upside down delta”), so that ∇2 is pro-
nounced “del-squared”. It can be thought of ∇·∇, i.e., the result of taking the scalar product
∂ ∂
of two gradient operators. (Try this for yourself using the cartesian definition ∇ = ∂x i + ∂y j.
More of this type of thing will come in second year modules.
We saw above that the tangent plane could be interpreted as a linear approximation to a
multivariate function around a given point P . We can improve on that approximation using
the chain rule and higher order derivatives.
Recall that for a single-variable function g(t), if all the derivatives g exist at a point, say
t = 0 (that is g(0), g 0 (0), g 00 (0),... all exist), then g can be expanded as a power series in a
neighbourhood of t = 0 via the Taylor-Maclaurin Series expansion
∞
X g (k) (0) g 00 (0) 2 g 000 (0) 3
g(t) = tr = g(0) + g 0 (0)t + t + t + ....
r=0
r! 2! 3!
1.13. MULTIVARIABLE TAYLOR SERIES 71
But if we want to write down a Taylor series for g(t) about t = 0, then we need to
compute the derivatives of g at t = 0. To do that we will need to use the chain rule:
d ∂f dx ∂f dy ∂f ∂f
g 0 (t) = f (x(t), y(t)) = + =h +k .
dt ∂x dt ∂y dt ∂x ∂y
So far so good, now we need the higher derivatives. From above we can see that a
derivative with respect to t is, by the chain rule, in this case equivalent to the following
linear combination of partial derivatives with respect to x and y acting on f :
d ∂ ∂
=h +k
dt ∂x ∂y
We can apply this to g 0 (t) to get g 00 (t) as follows:
00 d 0 ∂ ∂ ∂f ∂f
g (t) = g (t) = h +k h +k
dt ∂x ∂y ∂x ∂y
2 2 2
∂ f ∂ f ∂f
= h2 2 + 2hk + k2 2
∂x ∂x∂y ∂y
We can repeat this for g 000 (t) and we find
2
000 d 00 ∂ ∂ 2∂ f ∂ 2f 2
2∂ f
g (t) = g (t) = h +k h + 2hk +k
dt ∂x ∂y ∂x2 ∂x∂y ∂y 2
∂ 3f ∂ 3f ∂ 3f 3
3∂ f
= h3 3 + 3h2 k 2 + 3hk 2 + k
∂x ∂ x∂y ∂x∂ 2 y ∂y 3
Going one step further we find,
4
(iv) 4∂ f 3 ∂ 4f 4
2 2 ∂ f
4
3 ∂ f
4
4∂ f
g (t) = h + 4h k 3 + 6h k + 4hk +k
∂x4 ∂ x∂y ∂x2 ∂ 2 y ∂x∂ 3 y ∂y 4
The pattern should now be a bit clear:
The nth derivative of g(t) = f (a + ht, b + kt) can be represented as the sum of the
th
n mixed partial derivatives of f , with respect to x and y, weighted by the Pascal triangle
combinatorial coefficients multiplied by the powers of h and k corresponding to the respective
order of the partial derivatives (h with x and k with y).
72 CHAPTER 1. MULTIVARIABLE FUNCTIONS
The final bit is to realise that if we insert this into the Taylor series for g(t), then we
have to evaluate all the derivatives at t = 0. Hence we have
g 000 (0) = . . . .
So finally we can insert the above expressions into the Taylor series for g(t), but then
chose to set t = 1 on the LHS. We have g(1) = f (a + h, b + k) so the Taylor series becomes
∂f ∂f
f (a + h, b + k) = f (a, b) + h +k +
∂x (a,b) ∂y (a,b)
2 2 2 !
1 ∂ f ∂ f ∂ f
+ h2 2
+ 2hk + k2
2! ∂x (a,b) ∂x∂y (a,b) ∂y 2 (a,b)
3 3 3 3 !
1 ∂ f ∂ f ∂ f ∂ f
+ h3 3
+ 3h2 k 2
+ 3hk 2 2
+ k3
3! ∂x (a,b) ∂ x∂y (a,b) ∂x∂ y (a,b) ∂y 3 (a,b)
+ ....
We may substitute for h = x − a and k = y − b and use the more compact subscript notation
partial derivatives to obtain
Notes:
• The first line of the formula is equivalent to the tangent plane approximation of the
function z = f (x, y) at the point (x, y, z) = (a, b, f (a, b)).
• The “. . . ” denotes an infinite sum of terms of increasingly higher order partial deriva-
tives. We make no statement about whether this sum converges (i.e., adds up to
something finite). This type of question will be explored in MATH2039.
Critical Points of Multivariate Functions 73
• Nevertheless for the functions we shall deal with here, we can assume that the more
terms we take in the multivariate Taylor series, the better the approximation to the
function f (x, y), given the data at (x, y) = (a, b).
• An analogous argument may be deployed to find the Taylor series for a function f :
Rn → R about a point (x1 , x2 , . . . , xn ) = (a1 , a2 , . . . an ), but we will restrict ourselves
in this module to only n = 2.
Example: Calculate the Taylor series expansion of f (x, y) = ex sin y about the point
(1, 0) up to third order in x and y:
Substituting into the Taylor series and retaining terms up to third order in x and y we have
1
+ 0(x − 1)2 + 2e(x − 1)(y − 0) + 0(y − 0)2 +
2!
1
+ 0(x − 1)3 + 3e(x − 1)2 (y − 0) + 3.0(x − 1)(y − 0)2 − e(y − 1)3 + . . .
3!
x 1 2 1 3
e sin y = e (y + 2(x − 1)y + (x − 1) y − y + . . . .
2 6
74 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Recall that for a function of one variable, y = f (x) the stationary points are located at the
solutions of
df
=0
dx
.
The type of stationary point is characterised according to the value of the second deriva-
tive.
d2 f
dx2 =0
d2 f d3 f
dx2 =0 dx3 =0
y d2 f y d2 f y d3 f y d4 f
>0 dx2 <0 >0 >0
dx2 dx3 dx4
���������
x x x x
Figure 1.36: Different types of stationary points for R → R functions.
d2 f d2 f d2 f
> 0 : Minimum < 0 : Maximum = 0 : Test Further
dx2 dx2 dx2
What is the corresponding set of tests for the existence of a minimum or maximum in
multivariate functions? We shall again focus on f : R2 → R, but the condition we derive is
valid for f : Rn → R.
Suppose we assume there is a local isolated maximum of z = f (x, y) at (x, y) = (a, b) and
that the partial derivatives of f exist there. Let us consider the directional derivatives at
(a, b).
∂f
Du (f ) = = ∇f · u,
∂u
in the direction u = cos θi + sin θj, where 0 ≤ θ ≤ 2π.
Let us suppose that ∇f 6= 0 at this maximum. Then for θ such that u is in the direction
of ∇f we have u = ∇f /|∇f | and
∂f ∇f |∇f |2
= ∇f · = = |∇f | > 0.
∂u |∇f | |∇f |
1.15. CONDITIONS FOR A LOCAL ISOLATED CRITICAL POINT 75
Thus in the direction u = ∇f /|∇f |, the function is increasing, which contradicts that
the assumption that (a, b) was a local maximum. Hence if the partial derivates exist at a
maximum, then the only other possible situation is that ∇f = 0 there.
An analogous argument holds for a local minimum by considering the directional deriva-
tive in the direction of −∇f .
A little more work (not required here) leads to a theorem that holds for f : Rn → R:
−2y = 0, ⇒ y = 0,
i.e., when (x, y) = (0, 0), see figure 1.37.
z
(x, y, z) = (0, 0, 1)
Figure 1.37: Plot of z = f (x, y) = 1 − x2 − y 2 showing the point (x, y, z) = (0, 0, 1) at which
∇f = 0: which is horizontal at that point and a maximum.
76 CHAPTER 1. MULTIVARIABLE FUNCTIONS
• Part (ii) in the theorem holds, for example if there is no critical point, but the function
is singular and diverges.
Example: Consider the function z = f (x, y) = 1/(x2 + y 2 ) when evaluated over any
domain D : (x, y) including (x, y) = (0, 0).
We have
2x 2y
∇f (x, y) = i+ 2 j.
(x2 2
+y ) 2 (x + y 2 )2
At the origin, (x, y) = (0, 0) a little limit exercise8 shows that both components in
∇f (0, 0) diverges and so does not exist.
A sketch of the function z = 1/(x2 + y 2) shows that it too diverges to +∞ at (x, y) =
(0, 0) but nowhere else, and so it is a local maximum (see figure 1.38).
y
x
Figure 1.38: Plot of z = 1/(x2 + y 2 ).
• Part (iii) in the theorem holds, for example if there is no critical point, but there is a
finite bounded domain.
Example: Consider the plane
z = f (x, y) = 1 − x − y
max z
z=1
(0, 0, 1)
z=1 x y
(0, 1, 0)
y
(1, 0, 0)
Note that critical points are not just local maxima or minima!
z = f (x, y) = x2 − y 2 .
∇f = 2xi − 2yj = 0.
Note that this implies that both components have to be equal to 0 simultaneously . Hence
we have a critical point when both
2x = 0, ⇒ x = 0,
2y = 0, ⇒ y = 0,
i.e., when (x, y) = (0, 0). However if we plot the function z = f (x, y) near to the origin, see
figure 1.40, we see that even though ∇f = 0 there, that point is neither a maximum not a
minimum.
78 CHAPTER 1. MULTIVARIABLE FUNCTIONS
c=0 c=0
yc < 0
z
c>0 c>0
x
x
x2 y2 = c c<0
Figure 1.40: Level height contours of z = x2 − y 2 . These are still curves where z = c where c
is a constant, i.e., x2 −y 2 = c. These are (for c 6= 0) hyperbolae. The contours corresponding
to c = 0 are the lines y = ±x. The regions y < |x| have c > 0, the regions y > |x| have
c < 0.
For:
• If you take a vertical section through the origin along the x axis, the origin looks like
a minimum, z = f (x, 0) = x2 .
• If you take a vertical section through the origin along the y axis, the origin looks like
a maximum, z = f (0, y) = −y 2
The origin is the typical example of the third type of critical point that satisfies ∇f = 0,
namely a saddle point. It is called a saddle point because the local shape of the surface
z = f (x, y) looks like a horse saddle.
Note that is also possible to have non-isolated critical points. For these situations,
∇f = 0 along whole curves, not just at single points. An easy visualisation is to take a piece
of paper and curve it over to create a “∪” or “∩” shape. We do not consider such cases in
this module.
In higher dimensions the same condition for the existence of a critical point holds.
f (x, y, z) = 5 + x2 + y 2 + z 2 + xy.
1.15. CONDITIONS FOR A LOCAL ISOLATED CRITICAL POINT 79
∇f = fx i + fy j + fz k = 0.
fx = 2x + y = 0, fy = 2y + x = 0, fz = 2z = 0.
The third equation immediate gives z = 0. Solving the first two as simultaneous equations
for x and y gives x = 0, y = 0. hence there is a critical point at (x, y, z) = (0, 0, 0).
We focus on case (i) in the theorem above, i.e., critical points for which ∇f = 0. Having
seen that this can lead to a maximum, a minimum or a saddle, how can we systematically
tell the difference between them algebraically? As in the 1-d case, this will use the second
(here partial) derivatives. We shall again restrict an explicit discussion to R2 → R.
Consider the Taylor expansion of the function z = f (x, y) around an isolated interior
critical point P at the point (x, y) = (a, b). From above, we know that ∇f (a, b) = 0. So the
Taylor series expansion about P becomes
( )
=0 =0
f (x, y) = f (a, b) + x (a, b)(x − a) +
f y (a, b)(y − b)
f +
7
1
+ fxx (a, b)(x − a)2 + 2fxy (a, b)(x − a)(y − b) + fyy (a, b)(x − a)2 + . . .
2!
Thus whether the critical point is a maximum or minimum will depend on the sign of
the quadratic terms, or quadratic form:
Q = h2 fxx (a, b) + 2hkfxy (a, b) + k 2 fyy (a, b) .
Clearly we have:
We don’t really want to have to check this for all (h, k) ∈ N as we would be here forever.
Fortunately, we can rearrange the formula to put it in terms of sums of squared quantities.
1 2
Q= (hfxx + kfxy )2 + fxx fyy − fxy
2
k
fxx
It’s now not too difficult to see that the sign of Q depends on the sign of fxx and the
2
sign of the combination fxx fyy − fxy , since all the other terms in the expression are squared
and so are positive. Since these are independent of h and k we don;t have to check the sign
of Q throughout the neighbourhood, but just by evaluating these quantities at P .
2
• fxx (a, b) > 0 and fxx (a, b)fyy (a, b) − fxy (a, b) > 0 ⇒ Q > 0, so minimum;
2
• fxx (a, b) < 0 and fxx (a, b)fyy (a, b) − fxy (a, b) > 0 ⇒ Q < 0, so maximum.
2
What happens if fxx (a, b)fyy (a, b) − fxy (a, b) < 0?
2
For the sake of brevity, if this is the case we define fxx fyy − fxy = −D2 , where D is real,
so that −D2 is always negative. Then the equation Q = 0 has two real roots and Q may be
rearranged as (Gory and we’ll just quote and use the result!):
fxx fxx fxx − D2
Q = γ(k − αh)(k − βh); α= , β = , γ = .
fxy − D2 fxy + D2 2fxx
Recalling the definitions of h = x − a and k = y − b, we can sketch the following diagrams
in the neighbourhood of P , i.e, (h, k) = (0, 0):
✓ ◆ k k = ↵h ✓ ◆
k > ↵h k < ↵h
k> h k> h
Q>0
Q<0
h
Q<0 k= h
Q>0
✓ ◆ ✓ ◆
k > ↵h k < ↵h
k< h k< h
Figure 1.41: Schematic representation of the domains giving the signs of Q in the neigh-
bourhood of h = x − a and k = y − b.
1.15. CONDITIONS FOR A LOCAL ISOLATED CRITICAL POINT 81
The diagram shows that in two sectors, f increases away from the critical point, whereas
in two others, it decreases. The critical point is neither a maximum nor a minimum, but the
saddle point we saw above.
2
Notice that the quantity fxx fyy − fxy is a major player in determining the nature of the
critical point. Notice that it can also be written as the determinant of a matrix of partial
derivatives. This is called a Hessian determinant:
Definition: Assuming that the function z = f (x.y) possesses continuous second partial
derivatives and that all the first partial derivatives exist. We define the Hessian matrix of
f as
2
∂ f ∂ 2f
∂x2 ∂x∂y
H(f )(x, y) = ∂ 2f
∂ 2f
∂y∂x ∂y 2
The Hessian determinant ∆ is defined as
2
∂ f ∂ 2f
∂x2 ∂x∂y
∆(x, y) = detH(f )(x, y) = 2 .
∂ f ∂ 2f
∂y∂x ∂y 2
All of the above can now be summarised in the following theorem (no proof required):
∂2f
(i) If ∆(a, b) > 0 and ∂x2
(a, b) > 0, then f (x, y) has a local minimum at (a, b).
∂2f
(ii) If ∆(a, b) > 0 and ∂x2
(a, b) < 0, then f (x, y) has a local maximum at (a, b).
NB: You may be worrying about what happens when ∆(a, b) > 0 and ∂ 2 f /∂x2 = 0. The
answer is that it can never occur because the condition on the continuity of of the second
order partial derivatives means that the mixed partials are equal: fxy (a, b) = fyx (a, b). Hence
∆(a, b) = fxx (a, b)fy (a, b) − fxy (a, b)2 and so if fxx = 0, ∆(a, b) ≤ 0.
82 CHAPTER 1. MULTIVARIABLE FUNCTIONS
f (x, y) = x2 + y 2
. From an example above, we know that there is a single critical point at (x, y) = (0, 0).
Evaluating the second partial derivatives we have:
so,
2
fxx (0, 0) = 2 > 0, ∆(0, 0) = fxx (0, 0)fyy (0, 0) − fxy (0, 0) = 2.2 − 0 = 4 > 0,
so (0, 0) is a minimum.
f (x, y) = 1 − x2 − xy − y 2
−2x − y = 0 ⇒ y = −2x
−x − 2y = 0 ⇒ y = −x/2
• Solving these simultaneously we obtain that f has a critical point at (x, y) = (0, 0).
• To determine the nature of the critical point we evaluate the second partial derivatives:
so (0, 0) is a maximum.
1.15. CONDITIONS FOR A LOCAL ISOLATED CRITICAL POINT 83
f (x, y) = x + y − x2 − xy
∇f = fx i + fy j = (1 − 2x − y)i + (1 − x)j = 0.
1 − 2x − y = 0 ⇒ y = −2x + 1
1−x=0 ⇒ x=1
• Solving these simultaneously we obtain that f has a critical point at (x, y) = (1, −1)
• To determine the nature of the critical point we evaluate the second partial derivatives:
f (x, y) = x + y 2 − x3 /3
∇f = fx i + fy j = (1 − x2 )i + (2y)j = 0.
1 − x2 = 0 ⇒ x = ±1
y=0 ⇒ y=0
84 CHAPTER 1. MULTIVARIABLE FUNCTIONS
• Solving these simultaneously we obtain that f has a critical point at (x, y) = (1, 0)
and (x, y) = (−1, 0)
• To determine the nature of the critical point we evaluate the second partial derivatives:
– At (1,0):
fxx (x, y) = −2x, fxy (x, y) = 0, fyy (x, y) = 2,
so,
2
fxx (1, 0) = −2 < 0, ∆(1, 0) = fxx (1, 0)fyy (1, 0)−fxy (1, 0) = −2.2−(0)2 = −4 < 0,
so (1, 0) is a saddlepoint.
– At (-1,0):
fxx (x, y) = −2x, fxy (x, y) = 0, fyy (x, y) = 2,
so,
2
fxx (−1, 0) = 2 > 0, ∆(−1, 0) = fxx (−1, 0)fyy (−1, 0)−fxy (−1, 0) = 2.2−(0)2 = +4 > 0,
so (−1, 0) is a minimum.
z
saddle
y (1, 0)
min x
( 1, 0)
f (x, y) = x2 − 3xy 2 + 2y 4
∇f = fx i + fy j = (2x − 3y 2 )i + (−6xy + 8y 3 )j = 0.
1.15. CONDITIONS FOR A LOCAL ISOLATED CRITICAL POINT 85
2x − 3y 2 = 0 ⇒ y 2 = 2x/3
−6xy + 8y 3 = 0 ⇒ y = 0, y 2 = 3x/4
• Solving these simultaneously we obtain that f has a critical point at (x, y) = (0, 0).
• To determine the nature of the critical point we evaluate the second partial derivatives:
– At (0,0):
so,
2
fxx (0, 0) = 2 > 0, ∆(0, 0) = fxx (0, 0)fyy (0, 0)−fxy (0, 0) = 2.(−6.0+24.0)−(−6.0)2 = 0,
so the Hessian matrix test fails at (0, 0). More information is needed. (In
fact looking at contour plots around (0, 0) suggests that the critical point is a
saddlpoint.)
The classification of critical points in higher dimensions follows from a simple extension of
the above results. It proceeds by calculation of successive sub-determinants of the relevant
Hessian matrix.
a11 · · · a1k
a
a .. . . ..
∆0 = 1; ∆1 = a11 ; ∆2 = 11 12 ;
... ∆k = . . . .
a21 a22
ak1 · · · akk
Given this definition, the following theorem can be proved (proof not required):
86 CHAPTER 1. MULTIVARIABLE FUNCTIONS
∂f ∂f ∂f
∇f = i+ j+ k = (2x + y)i + (2y + x)j + 2zk = 0.
∂x ∂y ∂z
2x + y = 0, x + 2y = 0, z = 0,
Hence all the ∆i , i = 0, 1, 2, 3 are greater than zero. Hence the critical point P is a
minimum.
Maximisation and Minimisation with a Constraint: Lagrange Multipliers 87
Hessians should not be confused with mercenaries used in the American War of Indepen-
dence, or Deputy-Führers of the same names.
Example: Find the dimensions and area of the largest rectangle inside the ellipse
x2
+ y 2 = 1.
4
y
(1, 0)
( x, y) (x, y)
A
( 2, 0) (2, 0) x
( x, y) (x, y)
( 1, 0)
• Consider a rectangle that lies within the ellipse with the top right corner being at the
point (x, y) that lies on the ellipse (see diagram). By symmetry the other three corners
of the rectangle are at (x, −y), (−x, y) and (−x, −y).
• The lengths of the sides are 2x and 2y, so the area of the rectangle within the ellipse
is A = 4xy.
9
We take the positive square root as y is a length.
88 CHAPTER 1. MULTIVARIABLE FUNCTIONS
• The finding of the critical point x corresponding to the maximum of A would then
take place by just differentiating A with respect to x and setting
dA √ 2x2 √
= 2 4 − x2 − √ =0 ⇒ x= 2.
dx 4 − x2
where we have taken the positive value of x since it is the length of a rectangle.
√ √ √ √
• Hence we have y = 4 − 2/2 = 1/ 2 and so A = 4 2/ 2 = 4.
In this case we were able to reduce the problem to one of single variable calculus by being
able to solve the constraint equation (x2 /4 + y 2 = 1) exactly for one of the variables. What
happens when the constraint is not solvable for one of the variables, e.g., (x + y) = e−xy ? In
the general case, we shall use the Method of Lagrange Multipliers.
• We seek to optimise L with respect to all three of its variables x, y, λ. The critical
point is when ∇x,y,λ L = 0, i.e., when
∂L ∂f ∂g
= +λ = 0,
∂x ∂x ∂x
∂L ∂f ∂g
= +λ = 0,
∂y ∂y ∂y
∂L
= g(x, y) = 0.
∂λ
• Clearly from the third line, if we find the critical point of L we automatically satisfy
the constraint g(x, y) = 0. Hence the critical points of L lie on C.
• Hence if we are on C and we find the critical point of L from the above three equations,
we are also finding the critical point of f on C, i.e., we are finding the max/min of
f (x, y) subject to the constraint g(x, y) = 0.
• We can go further, and eliminate the Lagrange multiplier λ from the first two equations:
∂f ∂g ∂f ∂g
+λ = 0, ⇒ λ=−
∂x ∂x ∂x ∂x
∂f ∂g ∂f ∂g
+λ =0 ⇒ λ=−
∂y ∂y ∂y ∂y
∂f ∂g ∂f ∂g
⇒ =
∂x ∂x ∂y ∂y
which may be rearranged to give:
∂f ∂g ∂g ∂f
− =0
∂x ∂y ∂x ∂y
or, in terms of the Jacobian determinant,
∂(f, g)
= 0.
∂(x, y)
• Note that the actual value of the Lagrange multiplier is therefore irrelevant!
f (x, y) = 4xy
g(x, y) = x2 /4 + y 2 − 1 = 0
• Finally since x and y are meant to be the lengths of sides of a rectangle, they both
have to be positive. So the critical point is when
√ √
(x, y) = ( 2, 1/ 2)
and the area is √ √
4 2/ 2 = 4.
Example: A farmer wants to enclose a rectangle of land next to a stream (see diagram)
She will build 3 fences, the two perpendicular to the stream will be of length x, the one
parallel to the stream will be of length y. If the farmer has only 100m length of fencing,
what is the maximum area that she may enclose?
x A x
Figure 1.44: Area of enclosed rectangular field is A = xy. Since it is adjacent to the stream
only three fences are built with total perimeter 2x + y = 100m.
1.16. MAXIMISATION AND MINIMISATION WITH A CONSTRAINT:LAGRANGE MULTIPLIERS
y = 2x
g(x, y) = 2x + y − 100 = 0.
2x + 2x − 100 = 0 ⇒ x = 25.
Other types of problems involving optimisation with a constraint are possible (e.g., there
are multiple constraints and/or at least on of the constraints is an inequality), but we will
leave these for later modules.
One interesting use of optimisation with a set of algebraic constraints is in the selection
of an optimal portfolio of stocks to maximise the expected return. This problem is dealt
with in MATH3022.
92 CHAPTER 1. MULTIVARIABLE FUNCTIONS
Chapter 2
Multiple Integration
2.1 Introduction
Having dealt with the differentiation of functions of multiple variables, we now consider the
integration of functions Rn → R.
You have already sen the integral I of a (continuous) function of one variable y = f (x)
over and interval a ≤ x ≤ b:
ˆ b
I= f (x)dx.
a
The geometrical meaning of this symbol is the area of the plane region bounded by the
curve y = f (x), the x-axis and the line x = a and x = b.
y = f (x)
Z b
I= f (x) dx
a
x
x=a x=b
93
94 CHAPTER 2. MULTIPLE INTEGRATION
z = f (x, y)
D y
x
Figure 2.2: Volume of vertical cylinder with base D in (x, y) plane, under surface z = f (x, y)
and above z = 0 is given by ˆ
V= f (x, y) dx dy
D
By analogy, this is the volume of a three dimensional region V, bounded above by the
surface x = f (x, y), below by the (x, y)-plane (z = 0), and the vertical (irregular) cylinder
sides parallel to the z-axis, emerging from the boundary of the domain D (see figure).
The region D is called the domain of integration. We will represent the integral with
the symbol ˆ
V= f (x, y) dA.
D
The symbol dA is the elemental area element in the (x, y)-plane. We can consider the
volume integral as a limit in the following way.
• If we are using cartesian coordinates, (x, y) and the infinitesimal elements are rectan-
gles, we can write
dA = dx dy,
where dx and dy are the sides of the infinitesimal rectangles.
• Now dA at the point (xi , yj ) can be regarded as the base of a vertical rectangular
column of height f (xi , yj )
Properties of the Multiple Integral 95
z z
x y x y
)
X
V = lim f (xi , yj )dA
dA!0
i,j
Figure 2.3: Volume under surface is the limit of the sum of the volumes of the columns of
height f (xi , yi ) each with base area dA as dA → 0.
When we are evaluating an integral explicitly in terms of (x, y), it is normal to write the
integral in the form ¨
V = f (x, y)dx dy.
D
The “double” integration signs denote that we integrate both in the x direction and also
the y direction. The order in which you carry out the integrations will be discussed later.
At first site this looks like the volume of an irregularly bounded vertical cylinder of
with base D and height 1, which indeed it is. But we also know that the general
formula for the volume of a vertical cylinder of constant height is
Since the height of the cylinder is 1, the numerical value of the volume (forgetting the
units!) is the same as the area of D.
96 CHAPTER 2. MULTIPLE INTEGRATION
Therefore we may use double integrals to measure areas of regions of the plane.
ˆ ˆ
Area of D = 1 dA = dA.
D D
D
x y
Figure 2.4: Right cylinder of height 1 and base area D also has (numerical) volume D.
This just says that if have multiple copies of two different volume cylinders, the total
volume is just the sum of all the copies.
3. Inequalities are preserved: If we have two integrable functions f , g and that
f (x, y) ≤ g(x, y) over the integration domain D then
ˆ ˆ
f (x, y) dA ≤ g(x, y) dA.
D D
This just says that if you have two irregular cylinders, one of which has a height smaller
than or equal to the other, then the overall taller cylinder will have the greater volume.
4. Modulus Property: ˆ ˆ
f (x, y) dA ≤ |f (x, y)| dA
D D
This just reflects the fact that if f (x, y) dips below the horizontal (x, y) axis and
becomes negative in parts of the domain D, then the volume becomes negative (cf.
1-D integrals and negative areas).
5. Additivity of Domains: If D1 , D2 , D3 , . . . Dk are non-overlapping domains in the
(x, y)-plane, on each of which f (x, y) is integrable, then f (x, y) is integrable over the
union of these domains D = D1 ∪ D2 ∪ D3 ∪ . . . ∪Dk .
ˆ Xk ˆ
f (x, y) dA = f (x, y) dA.
D j=1 Dj
Evaluation of Double Integrals 97
This just says the volume of a series of cylinders of height f (x, y) over bases Di is just
the sum of the individual volumes.
z
z = f (x, y)
D3
x D2
D1 y
´ P3 ´
Figure 2.5: If D = D1 ∪ D2 ∪ D3 then D
f (x, y)dA = j=1 Di
f (x, y)dA.
Consider an integral over a rectangular domain R in the (x, y) plane. How do we know
whether to integrate in the x direction or the y direction first? Fortunately we are guided
by a theorem due to Fubini1 (proof not required):
Theorem 2.3.1. (Fubini) Let R = [a, b] × [c, d] be a rectangle in R2 whose sides are parallel
to the coordinate axes and let f (x, y) be a function that is continuous on R. Then,
ˆ bˆ d ˆ dˆ b
f (x, y) dy dx = f (x, y) dx dy.
a c c a
Example 1: Find the volume of the solid lying above the square in the (x, y)-plane
defined by 0 ≤ x ≤ 1, 1 ≤ y ≤ 2 and below the plane z = 4 − x − y.
1
Surprisingly, given the long history of calculus, dating from only 1907!)
98 CHAPTER 2. MULTIPLE INTEGRATION
• We will evaluate the x-integral first, treating y as a constant, then integrate with
respect to y, as follows:
z=4 x y
x=0
x=1 D y
y=1 y=2
x
ˆ y=2 ˆ x=1
V = dy dx(4 − x − y),
y=1 x=0
y=2 x=1
x2
ˆ
= dy 4x − − yx ,
y=1 2 x=0
ˆ y=2
(1)2 (0)2
= dy 4 × (1) − − y × (1) − 4 × (0) − − y × (0) ,
y=1 2 2
ˆ y=2
1
= dy 4 − − y ,
y=1 2
ˆ y=2
7
= dy −y ,
y=1 2
y=2
7 y2
= y− ,
2 2 y=1
7 (2)2 7 (1)2
= × (2) − − × (1) − ,
2 2 2 2
= 2.
2
since x is continuous y is continuous, 4 − x − y is continuous etc.
2.3. EVALUATION OF DOUBLE INTEGRALS 99
y=2
y=2
x=0 x=1
y=1
y=1
x x
O O
x=0 x=1
ˆ x=1 ˆ y=2
V = dx dy(4 − x − y),
x=0 y=1
x=1 y=2
y2
ˆ
= dx 4y − xy − ,
x=0 2 y=1
ˆ x=1
(2)2 (1)2
= dx 4 × (2) − 2x − − 4 × (1) − x − ,
x=0 2 2
ˆ x=1
1
= dx 8 − 2 − 4 + − 2x + x ,
x=0 2
ˆ x=1
5
= dx −x ,
x=0 2
x=1
5 x2
= x− ,
2 2 x=0
5 (1)2 5 (0)2
= × (1) − − × (0) − ,
2 2 2 2
= 2.
Note: The limits on the inner integral do not depend here on the integration variable of
the outer limit. This is because, in this example, the domain of integration D is a rectangle.
If D were not a rectangle, the inner limits would depend on the outer integration variable.
100 CHAPTER 2. MULTIPLE INTEGRATION
Not all domains of integration are rectangular. However when we have non-rectangular
domains, evaluation of an integral is easiest when the domain of integration is said to be
simple, see diagram.
y y
y=d
y = d(x)
D
D x = b(y)
x = a(y)
y = c(x)
y=c
x x
x=a x=b
• y-Simple: A y-simple domain is one that is bounded by two vertical lines x = a and
x = b and two continuous graphs y = c(x) and y = d(x).
x = a, x = b, y = c(x), y = d(x),
where c(x) and d(x) are both continuous functions. The integral
ˆ
I= f (x, y) dA
D
is the volume of the vertical cylinder of base D and bounded at the top by z = f (x, y).
We shall now consider how to evaluate this multiple integral using an approach akin to
the very thin slicing of a loaf of bread. The volume of the loaf is the sum of the volume all
the slices (assuming no crumbs).
• Consider a very thin slice of the solid, perpendicular to the x-axis as the point x of
width dx (see diagram).
z z
↵(x) (y)
x y x y
dx
dy
Z y=d(x) Z x=b(y)
↵(x) = f (x, y) dy (y) = f (x, y) dx
y=c(x) x=a(y)
x simple y simple
Figure 2.9: Left: x-slicing. Right: y-slicing
c(x) ≤ y ≤ d(x)
.
• The area of the vertical slice is thus
ˆ y=d(x)
α(x) = f (x, y)dy.
y=c(x)
102 CHAPTER 2. MULTIPLE INTEGRATION
• But remembering the definitions of the volume I and α(x) above, we have
ˆ ˆ x=b "ˆ y=d(x) #
I= f (x, y) dA = f (x, y)dy dx
D x=a y=c(x)
We usually drop the square brackets with the understanding that the first d-something
is the inner integral that is done first. Hence we can write
ˆ ˆ x=b ˆ y=d(x)
I= f (x, y) dA = f (x, y)dy dx
D x=a y=c(x)
• Note that since the volume should not ultimately contain terms in y or x, the limits
are often the key to understanding which of the integrals should be performed first.
Hence, after practice, it is usually possible to leave out the explicit references to x =
and y = in the limits and write the integral as.
ˆ ˆ b ˆ d(x)
I= f (x, y) dA = f (x, y)dy dx
D a c(x)
The advantage of writing the dx and the dy next to the relevant integral sign is that
it reminds you what what has to be substituted in as the limits after each of the
integrations. The implication here is that the y integral must still be done first, since
it has limits that depend on x, the integration that is here done last.
Under those circumstances we could then evaluate the integral as (see above diagram):
ˆ ˆ y=d ˆ x=b(y)
I= f (x, y) dA = β(y) dy, β(y) = f (x, y)dx
D y=c x=a(y)
or equivalently
ˆ ˆ d ˆ b(y)
I= f (x, y) dA = f (x, y)dx dy,
D c a(y)
or equivalently
ˆ ˆ d ˆ b(y)
I= f (x, y) dA = dy dx f (x, y).
D c a(y)
2.3. EVALUATION OF DOUBLE INTEGRALS 103
Definition (Regular Domain): A domain that is not simple, but which can be split up
into a series of x-simple or y-simple domains is called regular. An integral over a regular
domain is evaluated as the sum of integrals over each constituent simple domain (property
5 above).
p
Example: Consider a bagel-shaped domain, 1 ≤ x2 + y 2 ≤ 2. This is not a simple
domain as it has a hole in the middle and the functions that define the boundary are multi-
valued. However by careful slicing, the bagel can me turned into the sum of simple domains,
each of which have boundaries that may be written in terms of single-valued functions.
� �
D D2
� �
y �
) y �
D1 D4
-� -�
D3
-� -�
-� -� � � � -� -� � � �
x x
� � � �
D2
� � � �
= y �
D1 + y �
+ y �
+ y �
D4
-� -� -� -�
D3
-� -� -� -�
-� -� � � � -� -� � � � -� -� � � � -� -� � � �
x x x x
p p p p p p p p
4 x2 y 4 x2 1 x2 y 4 x2 4 x2 y 1 x2 4 x2 y 4 x2
2x 1 1 x +1 1 x +1 +1 x +2
Figure 2.10: Slicing and dicing your regular bagel domain D, to obtain, e.g., a representation
of y-simple domains, such that D = D1 + D2 + D3 + D3 .
The integral of a function z = f (x, y) over the bagel-domain can therefore be evaluated
as follows:
¨ X4 ¨
f (x, y) dxdy = f (x, y) dxdy.
D i=1 Di
104 CHAPTER 2. MULTIPLE INTEGRATION
The following examples stress the importance of drawing the domain of integration in the
(x, y)-plane. It gives geometrical insight that can lead to an understanding of what is going
on and in some cases, considerable simplications.
Example 2: Evaluate
ˆ
I= xy dA,
T
over the triangle T in the (x, y)-plane, with vertices (0, 0), (1, 0) and (1, 1).
The triangle is both x-simple and y-simple (see diagram), so again we can chose the order
in which we perform the integrations.
z = xy
y=x
O
T
x=1 x
• Consider the diagram. For each constant slice of x, we need to integrate y from 0 to
the sloping side of the triangle T . The sloping side of T has the equation y = x The
coordinate of the intersection of the slice at a general point x with the sloping side of
T is (x, y(x)) = (x, x)
• Having integrated over each of these x-slices, we then have to integrate over the range
0 ≤ x ≤ 1 (see diagram).
2.3. EVALUATION OF DOUBLE INTEGRALS 105
• Consider the diagram. For each constant slice of y, we need to integrate x from the
sloping side of the triangle T to x = 1. The sloping side of T has the equation y = x
The coordinate of the intersection of the slice at a general point y with the sloping side
of T is (x(y), y) = (y, y)
• Having integrated over each of these y-slices, we then have to integrate over the range
0 ≤ y ≤ 1 (see diagram).
y y
y=1
y=x T T
x=y x=1
y=0 y=0
x x
x=0 x=1
Figure 2.12: Left: y-Simple evaluation: Right: x-Simple evaluation.
106 CHAPTER 2. MULTIPLE INTEGRATION
y=1 x=1
x2
ˆ
= dy y ,
y=0 2 x=y
ˆ y=1 2 2
(1) (y)
= dy y − y ,
y=0 2 2
ˆ y=1
y y3
= dy − ,
y=0 2 2
2 y=1
y y4
= − ,
4 8 y=0
1
= .
8
Sometimes altering the order of the integration can make the integral easier to evaluate.
Example 3: Evaluate ˆ
3
I= ey dA,
D
√
where D is the domain in the figure below, bounded as 0 ≤ x ≤ 1, x ≤ y ≤ 1.
y y=1 y y=1
��� ���
D D
��� ��� x=0
x = y2
��� ���
��� p ���
y= x
��� ���
However this leads to an immediate problem: we have no idea what the indefinite integral
3
of ey is. (If you don’t believe me, try to find it in a table of integrals, or ask WolframAlpha
to do it.)
The trick here is to reverse the order of integration and do the x integral first, and
then the y-integral. To do this we need to work out the new limits for the integral. Hence
we now consider the same integral but evaluate it as x-simple.
• We are now going to integrate first with respect to x. The limits for a constant y-slice
are (see diagram) 0 ≤ x ≤ x(y) = y 2 . The limits for the second, y-integral are just
0 ≤ y ≤ 1.
• We then have
ˆ y=1 ˆ x=y 2
3
I= dy dx ey .
y=0 x=0
• The point of swapping the order of integration is that we can now easily do at least
the first x integral:
ˆ y=1 h 3 ix=y2
I = dy xey ,
y=0 x=0
ˆ y=1 h i
2 y3 y3
= dy (y )e − (0)e ,
y=0
ˆ y=1
3
= dy y 2 ey .
y=0
• Now the key point here is that we are left with a y-integral that we can do. By
inspection we see that the integrand is an exact derivative:
3
!
2 y3 d ey
y e = .
dy 3
108 CHAPTER 2. MULTIPLE INTEGRATION
1 y=1
d y3
ˆ
= dy e
3 y=0 dy
1h 3
iy=1
= ey
3 y=0
1 1
= e − e0
3
1
= (e − 1)
3
Polar Coordinates
Some integrals are much simpler t evaluate if they are expressed in alternative coordinate
systems to cartesians (x, y). For example, when you have a circular domain D, or a radially-
symmetric integrand, it is often easier to evaluate the integral in terms of polar coordinates.
Recall that in polar coordinates there is an origin called a pole and denoted by O and a
polar axis which is a half-line extending from O.
P
r
✓
O x
Figure 2.14: Plane polar coordinates
The position of a point P in the plane is determined by its polar coordinates (r, θ) where:
2.3. EVALUATION OF DOUBLE INTEGRALS 109
• θ is the angle that the vector OP make with the polar axis, counterclockwise angles
being counted as positive.
Example 1: Find the volume V of the solid region lying above the (x, y) plane and
beneath the paraboloid z = 1 − x2 − y 2 .
Figure 2.15: The volume we want is that above the grey horizontal (x, y)-plane, but below
the meshed surface. The black circle is x2 + y 2 = 1 and denotes the intersection of the two
and so is the boundary of D for this example. It looks rather like the O2 in London....
110 CHAPTER 2. MULTIPLE INTEGRATION
p
y=+ 1 x2
���
���
D
y ���
x= 1 x = +1
-���
-���
p
y= 1 x2
-��� -��� ��� ��� ���
The paraboloid intersects the (x, y)-plane in the circle z = 0 ⇒ x2 + y 2 = 1. The domain
D is thus x-simple or y-simple. We can then pick, say, the y-simple route and so in cartesian
coordinates the required volume is
ˆ ˆ x=+1 ˆ y=+√1−x2
V = (1 − x2 − y 2 ) dA = dx √
dy (1 − x2 − y 2 ).
D x=−1 y=− 1−x2
The integral can be done, but it involves a lot of messy algebra. It is much neater to convert
the problem to polar coordinates.
x2 + y 2 ≤ 1 ⇒ r ≤ 1, 0 ≤ θ ≤ 2π.
Now it is vitally important to realise that just because dA = dxdy, it does not mean that
when expressed in polars, the equivalent infinitesimal area element dA0 is equal to drdθ. For
2.3. EVALUATION OF DOUBLE INTEGRALS 111
a start this is dimensionally wrong: both dx and dy have units of length, so dA has units of
(length)2 . While dr has a unit of length, dθ is just an angle, with (numerically) no units.
Hence drdθ only has units of length, not (length)2 .
If we sketch the respective elemental areas in cartesian and polar coordinates we can
deduce graphically what the representation for dA is in polar coordinates.
y y
dA0
y + dy
dA dy dr
rd✓
y
d✓
dx
✓
O x O x
x x + dx
Figure 2.17: Left: dA = dx dy in cartesians. Right: In the limit as dr and dθ → 0,
dA0 = rdr dθ in plane polars.
We find that in the limits that dx, dy → 0 and dr, dθ → 0 that both elemental areas are
to leading order rectangles. Hence we have:
Note the extra r in the integrand that comes from the change of variable!
Modulo that complication, look now at how easy the integral is to evaluate. The limits
have decoupled and the integrand only depends on one of the variables r. This is because
of the radial symmetry of both the integrand (it only depends on r) and the domain of
integration (it’s a circle). Therefore the double integral is effectively just the product of two
separate, independent integrals. We have:
112 CHAPTER 2. MULTIPLE INTEGRATION
ˆ r=1 ˆ θ=2π
V = dr dθ r(1 − r2 )
r=0 θ=0
ˆ θ=2π ˆ r=1
= dθ dr (r − r3 )
θ=0 r=0
θ=2π r=1
r2 r4
ˆ
= dθ −
θ=0 2 4 r=0
ˆ θ=2π
1 1
= dθ −
θ=0 2 4
1 θ=2π
ˆ
= dθ
4 θ=0
1 θ=2π
= [θ]
4 θ=0
π
= .
2
Sometimes changes of variables can be used to evaluate integrals that we otherwise would
not know how to evaluate.
This integral comes up everywhere in probability√theory as the normalisation term for the
normal distribution. You may know the value as π, but how do you obtain that? There is
2
no antiderivative for e−x in terms of simple functions. One trick is to use polar coordinates
as follows. We consider the integral I 2 and write it as
ˆ +∞ ˆ +∞
2 −x2 −y 2
I = e dx × e dy .
−∞ −∞
2.3. EVALUATION OF DOUBLE INTEGRALS 113
The second integral is exactly the same as the first, it is just that we are using a different
dummy integration variable. If we do that we may rewrite the integrals as a double integral
over R2
ˆ +∞ ˆ +∞
2 2
2
I = dx dy e−(x +y ) .
−∞ −∞
Now look at the integrand. It has the tell-tale radial symmetry of polar coordinates, sincex
and y only appear together in combinations of the type x2 + y 2 .
y +1 y
"
1 ! +1 r ! +1
O x O x
0 ✓ 2⇡
#
1
Figure 2.18: Double Gaussian integral domain in 2-D cartesians (left) and plane polars
(right).
Summary:
2 +y 2 ) 2
1. Express the integrand as a function of r and θ: e−(x → e−r .
2. Express the domain of integration (limits) in polar coordinates: the R2 plane can (with
a bit of care) be expressed as 0 ≤ r ≤ +∞, 0 ≤ θ ≤ 2π.
Once again the integrals have decoupled: the limits do not depend on each other and the
114 CHAPTER 2. MULTIPLE INTEGRATION
2
!
θ=2π r=+∞
d e−r
ˆ ˆ
= dθ dr −
θ=0 r=0 dr 2
" 2
#r=+∞
θ=2π
−e−r
ˆ
= dθ
θ=0 2
r=0
ˆ θ=2π
1
= dθ
θ=0 2
1 θ=2π
ˆ
= dθ
2 θ=0
= π.
√
Hence, since I 2 = π, we have that I = π.
Other examples of changes of variables to polar coordinates are in the problem sheets.
The transformation of double integrals to polar coordinates is just a special case of a general
change of variables formula for double integrals. Often a general change of variables can
help to simplify the algebraic representation of a domain of integration, or to deliver a
simplification of the integrand.
that give a relationship between the old variables (x, y) and the new variables (u, v).
We assume that this transformation is locally invertible, i.e., that there exists a pair of
inverse functions
x = X(u, v), y = Y (u, v).
These functions map the domain D of integration in the (x, y) domain onto a domain D0
in the (u, v) plane.
2.3. EVALUATION OF DOUBLE INTEGRALS 115
y v
u = U (x, y)
v = V (x, y)
)
D D0
(
x = X(u, v)
y = Y (u, v)
x u
Figure 2.19: Transformations between domain D in (x, y) plane and domain D0 in (u, v)
plane.
y v
C0
D0
D C
u = U (x, y)
dA0
dy dA )
v = V (x, y)
B0
A dx B
j A0
i
x u
Figure 2.20: Effect of transformation u = U (x, y), v = V (x, y) on dA = dx dy
Under the transformation u = U (x, y), v = V (x, y), these corners map respectively to the
points A0 , B 0 , C 0 , D0 in the (u, v) plane.
116 CHAPTER 2. MULTIPLE INTEGRATION
Due to the infinitesimal size, of the original rectangle in the (x, y) plane, the new in-
finitesimal patchwork dA0 will approximately be a parallelogram. The coordinates of its
vertices will be
A : (x, y) → A0 : (U (x, y), V (x, y))
Suppose we set up unit vectors i and j parallel to the u and v axes in the (u, v) plane.
Now, from the diagram above, the area of the parallelogram dA0 in the (u, v) plane is
given by the standard vector product result of
dA0 = |A0 B0 × A0 D0 |,
where from the coordinates above,
A0 B0 = (U (x + dx, y) − U (x, y)) i + (V (x + dx, y) − V (x, y)) j
A0 D0 = (U (x, y + dy) − U (x, y)) i + (V (x, y + dy) − V (x, y)) j
These can be expanded about (x, y) using Taylor’s theorem (or use the definition of partial
derivatives) to yield
0 0 ∂U ∂V
AB = dx i + dx j + O(dx)2 ,
∂x ∂x
0 0 ∂U ∂V
AD = dy i + dy j + O(dy)2 .
∂y ∂y
Hence in the limit as dx and dy → 0, we have (using subscripts to denote partial derivatives
to save space)
i j k
Ux Uy Ux Uy ∂(U, V )
0
dA = Ux dx Vx dx 0 =
V V dx dy k = Vx Vy dx dy = ∂(x, y) dA
x y
Uy dy Vy dy 0
Thus the transformed infinitesimal element depends on the modulus of the Jacobian of the
transformation functions at that point. Our initial assumption that a local inversion of the
functions existed means that the Jacobian is non-vanishing. Hence, using the reciprocity
property of Jacobians, we have
∂(X, Y )
dx dy = du dv.
∂(u, v)
2.3. EVALUATION OF DOUBLE INTEGRALS 117
Example: We can verify that this result is consistent with the geometrical result we
obtained for plane polar coordinates:
x = r cos θ, y = r sin θ.
Using the formula above, we have:
∂(x, y) xr xθ
dx dy = dr dθ = dr dθ = cos θ −r sin θ dr dθ = r(cos2 θ + sin2 θ) dr dθ.
∂(r, θ) yr yθ sin θ r cos θ
1. Identify the transformations between the old variables (x, y) and the new variables
(u, v): x = X(u, v), y = Y (u, v).
2. Express the integrand as a function of the new variables (u, v).
3. Write the boundaries of the domain of integration in terms of the new variables (u, v).
4. Write the area element as dA = |J(u, v)| du dv where
∂x ∂x
∂u ∂v
J(u, v) = .
∂y ∂y
∂u ∂v
5. Evaluate the integral!
The motivation for changing the variable could be to exploit a symmetry to simplify the
integrand, or to simplify or decouple the limits, so as to make it easier to perform the
integral.
Example:
Use an appropriate change of variables to find the area of an ellipse with shape given by
x2 y 2
+ 2 = 1,
a2 b
where a, b are positive constants.
118 CHAPTER 2. MULTIPLE INTEGRATION
1. The key thing here is to realise that an ellipse is just a squashed circle. We can convert
the ellipse to a circle by using the coordinate transformation
x = X(u, v) = au, y = Y (u, v) = bv.
For then we have
x2 y 2 (au)2 (bv)2
+ = 1 ⇒ + 2 =1 ⇒ u2 + v 2 = 1.
a2 b2 a2 b
In other words, in the new (u, v) plane, the ellipse becomes a circle of radius 1.
y v
D u = x/a
D0
x
) u
v = y/b
x2 y2
+ 1 u2 + v 2 1
a2 b2
since f (x, y) = 1 will give the numerical value of the area of the ellipse. Obviously we
then have
f (X(u, v), Y (u, v)) = 1,
i.e., the integrand in the case is a constant and so will not change under the transfor-
mation.
3. As we have seen in 1), the domain of integration changes from the ellipse in the (x, y)
plane to the circle (u, v): u2 + v 2 ≤ 1.
4. The Jacobian transformation is: dA = dxdy = |J(u, v)| du dv where
∂x ∂x
∂u ∂v a 0
J(u, v) = = ab.
∂y ∂y =
0 b
∂u ∂v
Hence we have
dx dy = ab du dv.
2.3. EVALUATION OF DOUBLE INTEGRALS 119
This next example shows how sometimes you are given a transformation which is easier
to work with backwards.
Example:
D xy
where D is the region:
x + y ≤ 3, x + y ≥ 1, y ≥ x, y ≤ 2x
1. Note that we have been given the transformation here (and in terms of u = U (x, y)
and v = V (x, y)). (In a difficult exam problem, the exact transformation, or a strong
hint would be given to you!)
The purpose of the transformation is to convert a regular, but complicated, domain D
in the (x, y)-plane into a much simpler, rectangular domain D0 in the (u, v) plane.
y v
x+y =3
y = 2x
��� ���
y=x v=2
���
u=x+y ���
���
D
) ��� u=1 D0 u=3
���
v = y/x ���
v=1
��� ���
x+y =1
��� ��� ���
x � � �
u
Note that we don’t have explicit forms for X(u, v) and Y (u, v) yet. We could invert
the above given transformations u = U (x, y) = x + y, v = V (x, y) = y/x, but this
might be messy. Sometimes it pays to wait and see what form the Jacobian J(u, v)
takes.
3. As we have seen in 1), the domain of integration changes from the polygon in the (x, y)
plane to a rectangle (u, v): 1 ≤ u ≤ 3, 1 ≤ v ≤ 2.
4. The Jacobian transformation is: dA = dxdy = |J(u, v)| du dv where
∂x ∂x
∂u ∂v
J(u, v) = .
∂y ∂y
∂u ∂v
However, here, since we have been given the transformation as u = U (x, y), v =
V (x, y), it’s easier to do partial derivatives of u and v with respect to x and y. Hence
we instead carry out the transformation in the backwards direction 3
∂u ∂u 1
1
∂x ∂y x + y x + y
du dv = dx dy =
y 1 dx dy = x2 dx dy = dx dy,
∂v ∂v x 2
∂x ∂y − 2
x x
since both x > 0 and y > 0.
Solving this for dx dy gives
x2
dx dy = du dv.
x+y
On the face of it this looks stupid as we have mixed up x, y and u, v on the RHS.
But look what happens when we substitute in to the original integral:
2
dx dy 1 x x 1 1 1
¨ ¨ ¨ ¨
= du dv = du dv = du dv.
D xy D0 xy x+y D0 y x+y D0 v u
The form of the integrand and the Jacobian combine to give something that can be
simply expressed in terms of the new variables u, v.
5. The integral is thus
ˆ 3
du 2 dv
ˆ u=3 ˆ v=2
dx dy 1
¨ ˆ
= du dv = = [ln u]31 [ln v]21 = (ln 3)(ln 2).
D xy u=1 v=1 uv 1 u 1 v
3
or by the inversion property of Jacobians from Chapter 1:
∂(X, Y ) ∂(U, V )
J(u, v) = =1 .
∂(u, v) ∂(x, y)
Evaluation of Triple Integrals 121
Once it is clear to extend definite integration from one to two-dimensional domains, the
extension to three (or more) dimensions is relatively straightforward.
dV = dx dy dz.
dx
dz
dV
dy
x
Figure 2.23: Volume element dV = dx dy dz.
The procedure for evaluating the integrals in 3-D is effectively the same as that studied for
2-D; we integrate the three variables in succession.
122 CHAPTER 2. MULTIPLE INTEGRATION
z = constant
c
x=0
y = constant
x=a
b
O y
a
x
Figure 2.24: Slice with z =constant, then y =constant, then integrate with respect to x.
• We slice the box with planes perpendicular to the z-axis, i.e., a constant z-slice.
• On each constant z-slice, we consider a straight line perpendicular to the to the y-axis
and integrate first along x on that y =constant line.
• Hence we have:
z=c y=b x=a
ˆ ˆ ˆ
I = dz dy dx xy 2 + z 3
z=0 y=0 x=0
c b x=a
x2 y 2
ˆ ˆ
3
= dz dy + xz ,
0 0 2 x=0
ˆ c ˆ b 2 2 a
ay 3
= dz dy + az ,
0 0 2 0
ˆ c 2 3 y=b
ay 3
= dz + az y ,
0 2×3 y=0
ˆ c 2 3
ab 3
= dz + abz ,
0 6
2 3 z=c
abz z4
= + ab ,
6 4 z=0
2 3
abc c4
= + ab ,
6 4
abc ab2 c3
= +b .
2 3 2
• Of course, we could alter the order of integration and we would still obtain the same
answer.
Example 2: If T is the tetrahedron with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0) and (0, 0, 1),
evaluate ˆ
y dV.
T
• From the vertices of the tetrahedron, the domain of integration is the volume under
z = 1 − x − y, for x > 0, y > 0, z > 0.
• We slice the pyramid with slices perpendicular to the y-axis, i.e., a constant y-slice.
• On each constant y-slice, we cut along strips perpendicular to the x-axis, i.e,, integrate
first along x on that y =constant line.
124 CHAPTER 2. MULTIPLE INTEGRATION
z=1 x y
y=0 y = constant
x=0
z=0 y=1
y
x x=1 y
Figure 2.25: Domain T . Take triangle slices: y =constant, x =constant: first integrate
0 ≤ z ≤ 1 − x − y, then integrate 0 ≤ x ≤ y, then integrate 0 ≤ y ≤ 1.
• Hence we have:
ˆ y=1 ˆ x=1−y ˆ z=1−x−y
J = dy dx dz y,
y=0 x=0 z=0
ˆ y=1 ˆ x=1−y
= dy dx [yz]z=1−x−y
z=0 ,
y=0 x=0
ˆ y=1 ˆ x=1−y
= dy dx [y(1 − x − y)] ,
y=0 x=0
y=1 x=1−y
x2
ˆ
= dy y(x − − yx) ,
y=0 2 x=0
ˆ y=1
(1 − y)2
= dy y((1 − y) − − y(1 − y)) ,
y=0 2
ˆ y=1
y 2 y3
= dy −y + ,
y=0 2 2
2 y=1
y y3 y4
= − + ,
4 3 8 y=0
1 1 1
= − + ,
4 3 8
1
= .
24
2.4. EVALUATION OF TRIPLE INTEGRALS 125
There are several commonly occurring alternative coordinate systems in R3 which may be
used to simplify the evaluation of triple integrals when the domain of integration and/or the
integrand possess certain symmetries.
The cylindrical polar coordinate system is a straightforward extension of the 2-D polar
coordinate system to 3-D. It uses the 2-D plane polar coordinate system in planes parallel to
th (x, y) plane, while retaining the cartesian z-coordinate for measuring vertical distances,
hence the name “cylindrical”. The relationship between cartesian (x, y, z) and cylindrical
polar coordinates (r, θ, z) is given by
x = r cos θ, y = r sin θ, z = z.
z z
r
(x,y,z)
y k
r
0 0
θ
i
x
Figure 2.26: The relationship between cartesians and cylindrical polar coordinates.
As the above diagram might suggest, the cylindrical polar system is very useful for
problems that have some radial symmetry about a central axis.
By comparison with plane polar coordinates, when expressed in cylindrical polars the
volume element dV , it is not too difficult to see that in the limit as dr, dθ, dz → 0 it
transforms as:
126 CHAPTER 2. MULTIPLE INTEGRATION
rd✓
dr
r dV 0
dz
y
✓
d✓
x
dz0 in
Figure 2.27: dV 0 = r dr dθdV = rthe d✓ dzas dr, dθ, dz → 0.
dr limit
Example 1: Evaluate
ˆ
I= (x2 + y 2 )dV
D
• The form of the integrand x2 + y 2 suggests using cylindrical polar coordinates, since it
becomes x2 + y 2 = r2 .
• We must then check if the domain D can be represented nicely in terms of cylindrical
polar coordinates. (If not, there is no real point in converting coordinate systems.)
2.4. EVALUATION OF TRIPLE INTEGRALS 127
r=1
r=2
z=1
x ✓ = ⇡/2
D
✓ = ⇡/4 y
z=0
x2 + y 2 = 1 ⇒ r=1 (x2 + y 2 = r2 )
x2 + y 2 = 4 ⇒ r=2 (x2 + y 2 = r2 )
z=0 ⇒ z=0 (z = z)
z=1 ⇒ z=1 (z = z)
Note that all the limits will then decouple and so evaluating the integral in cylindricals
in this example means just evaluating the triple integral as the product of three one-
dimensional integrals.
1 ≤ r ≤ 2, π/4 ≤ θ ≤ π/2, 0 ≤ z ≤ 1.
128 CHAPTER 2. MULTIPLE INTEGRATION
z z
y
xx x
• Consider the diagram above, in cylindrical polars, the boundaries of the volume become
0 ≤ r ≤ 1 − z 2 : consider w.l.o.g. the x−plane, the distance of curve r from the z−axis is x.
−1 ≤ z ≤ +1 : consider w.l.o.g. the x−plane, the curve intersects the z−axis at z = ±1.
4
Some of you may have treated similar problems at A-level using other methods, but we are using this
example as a way to illustrate how to compute volumes of solids of revolution using cylindrical polars.
130 CHAPTER 2. MULTIPLE INTEGRATION
The spherical polar coordinate system is similar to (but not exactly the same as5 the coor-
dinate system that locates places on the surface of the earth, namely latitude and longitude.
A point P in R3 is represented in spherical polar coordinates by (r, θ, φ) where
• θ is the angle that the radial line OP makes with the positive z axis in the (x, z) plane,
i.e., π/2−latitude.
• φ is the angle between the plane containing P , the z axis and the (x, z) plane, i.e.,
longitude.
P (x,y,z)
ρr y
θ
0
φ
Figure 2.30: The relationship between cartesians and spherical polar coordinates.
5
the earth is not actually spherical but squashed at the poles to form a “oblate spheroid”.
2.4. EVALUATION OF TRIPLE INTEGRALS 131
Example: Find the cartesian coordinates of P with spherical coordinates (2, π/3, π/2).
r = 2, θ = π/3, φ = π/2.
√ √
y = r sin θ sin φ = 2 sin(π/3) sin(π/2) = 2 × ( 3/2) × 1 = 3;
√
• Hence the cartesian coordinates are (0, 3, 1).
√
Example: Find the spherical polar coordinates of Q with cartesian coordinates (1, 1, 2).
p √ √
tan θ = x2 + y 2 /z = 12 + 12 / 2 = 1 ⇒ θ = π/4;
Spherical polars are very useful for problems which have a spherical symmetry, for xample
in the study of the gravitational force of a planet. In spherical polars the volume element
dV can be represented by considering the diagram below in the limit as dr, dθ, dφ → 0.
132 CHAPTER 2. MULTIPLE INTEGRATION
z z
d r sin ✓d
dV 0 r sin ✓ dV 0
dr dr
✓
r d✓
O
rd✓
O
y y
x x
We conclude that
dV = r2 sin θ dr dθ dφ.
O
y
x
Figure 2.32: Half ball H: x2 + y 2 + z 2 ≤ 1, z ≥ 0.
0 ≤ x 2 + y 2 + z 2 ≤ a2 ⇒ 0≤r≤a (x2 + y 2 + z 2 = r2 );
We need to know how to change the elemental volume for arbitrary changes of variables in
R3 .
134 CHAPTER 2. MULTIPLE INTEGRATION
The formulae we studied in R2 can be easily extended to R3 . In this case the cartesian
cuboid dx dy dz maps onto a parallelepided, whose volume is given by a 3 × 3 determinant.
The idea is as follows.
where
∂x ∂x ∂x
∂u ∂v ∂w
∂(x, y, z) ∂y ∂y ∂y
J(u, v, w) = =
∂(u, v, w) ∂u ∂v ∂w
∂z ∂z ∂z
∂u ∂v ∂w
• We have
x = r sin θ cos θ, y = r sin θ sin θ, z = r cos θ.
= ...,
= r2 sin θ.
Chapter 3
At A-level and in MATH1059 you learned how to integrate a function y(x) along the x-axis.
In the previous chapter of this module, you learned how to integrate a function f (x, y) over
a 2-d domain D in the (x, y) plane.
In this chapter we will learn how to integrate real valued R2 → R functions P (x, y) or
Q(x, y) over a curve C given by a portion of the curve y = f (x) in the (x, y) plane. These
types of integral are called “line integrals” and can be written as:
ˆ ˆ
P (x, y) dx and Q(x, y) dy,
C C
or more generally as ˆ
{P (x, y) dx + Q(x, y) dy} .
C
y B
y = f (x)
C
A
x
Figure 3.1: The curve C over which the line integral is taken is the portion of the curve
y = f (x) between endpoints A and B.
135
136 CHAPTER 3. LINE INTEGRALS AND GREEN’S THEOREM
• The integral
ˆ x=b
y(x)dx
x=a
corresponds to the area of paint you need to paint one side of a straight fence of height
y(x) between x = a and x = b.
• Let C denote the curve along the base of a fence in the (x, y) plane given by the equation
y = f (x). Let ds be the elemental arc length distance along the tangential direction
of a curve C.
y B
y = f (x)
ds
<latexit sha1_base64="MnnttU+7uaHaa+278s9sZ8oKN4s=">AAAB/HicbVDJSgNBFHwdtxi3qEcvjUHwFGa8mGPAi8coZoEkhJ6enqRJz0L3m0AI8Qu86hd4E6/+ix/gf9iTzCVLwYOiqh4U5SVKGnScX1LY2d3bPygelo6OT07PyucXLROnmosmj1WsOx4zQslINFGiEp1ECxZ6SrS98UPmtydCGxlHLzhNRD9kw0gGkjO00rNvBuWKU3UWoJvEzUkFcjQG5b+eH/M0FBFyxYzpuk6C/RnTKLkS81IvNSJhfMyGYrboN6c3VvJpEGt7EdKFupJjoTHT0LPJkOHIrHuZuNXLFG0Cs83sphjU+jMZJSmKiC9bBKmiGNNsCepLLTiqqSWMa2nrUz5imnG0e5XsLu76CpukdVd1nar75FTqtXyhIlzBNdyCC/dQh0doQBM4BPAG7/BBXskn+SLfy2iB5D+XsALy8w+mNJTx</latexit>
sha1_base64="p9YMhLONrAaQpjJc4MwKIgC0SIc=">AAAB/HicbVDJSgNBFHwTtxi3qEcvjUHwFHq8mJsBLx5jMAskIfT09CRNeha63wghxC/wql/gTbyKJ//DD/A/7Em8ZCl4UFTVg6K8REmDlP44uY3Nre2d/G5hb//g8Kh4fNI0caq5aPBYxbrtMSOUjEQDJSrRTrRgoadEyxvdZn7rUWgj4+gBx4nohWwQyUByhlaq+6ZfLNEynYGsEveflG6+v+pgUesXf7t+zNNQRMgVM6bj0gR7E6ZRciWmhW5qRML4iA3EZNZvSi6s5JMg1vYiJDN1IcdCY8ahZ5Mhw6FZ9jJxrZcp2gRmndlJMaj0JjJKUhQRn7cIUkUwJtkSxJdacFRjSxjX0tYnfMg042j3Kthd3OUVVknzquzSsntPS9UKzJGHMziHS3DhGqpwBzVoAIcAnuEFXp0n5815dz7m0Zzz/3MKC3A+/wCLRpb4</latexit>
sha1_base64="59lX8GO48ALcFb4BvO3h+DhonWg=">AAAB/HicbVC7SgNBFL0bXzG+opaCDAbBKuzamM6AjWUi5gFJCLOzs8mQ2Qczd4UQYmdnq19gJ7aSyv/wA6z8CWcTmzwOXDiccy4cjhtLodG2v63M2vrG5lZ2O7ezu7d/kD88qusoUYzXWCQj1XSp5lKEvIYCJW/GitPAlbzhDm5Sv/HAlRZReI/DmHcC2guFLxhFI915upsv2EV7CrJMnH9SuP6aVH+fTieVbv6n7UUsCXiITFKtW44dY2dEFQom+TjXTjSPKRvQHh9N+43JuZE84kfKXIhkqs7laKD1MHBNMqDY14teKq70UkVpX68yWwn6pc5IhHGCPGSzFn4iCUYkXYJ4QnGGcmgIZUqY+oT1qaIMzV45s4uzuMIyqV8WHbvoVO1CuQQzZOEEzuACHLiCMtxCBWrAwIdneIFX69F6s96tj1k0Y/3/HMMcrM8/wGCZPQ==</latexit>
dy
C <latexit sha1_base64="lWqNSvH00ESeqbg4WGEDqmB2ftU=">AAAB/HicbVBLSgNBFHwdfzH+oi7dNAbBVZhxY5YBNy6jmA8kIfT09CRNej50vxGGEE/gVk/gTtx6Fw/gPexJZpOYggdFVT0oykuUNOg4P6S0tb2zu1ferxwcHh2fVE/POiZONRdtHqtY9zxmhJKRaKNEJXqJFiz0lOh607vc7z4LbWQcPWGWiGHIxpEMJGdopUc/G1VrTt1ZgP4nbkFqUKA1qv4O/JinoYiQK2ZM33USHM6YRsmVmFcGqREJ41M2FrNFvzm9spJPg1jbi5Au1JUcC43JQs8mQ4YTs+7l4kYvV7QJzCazn2LQGM5klKQoIr5sEaSKYkzzJagvteCoMksY19LWp3zCNONo96rYXdz1Ff6Tzk3dderug1NrNoqFynABl3ANLtxCE+6hBW3gEMArvME7eSEf5JN8LaMlUvycwwrI9x+vvpT3</latexit>
sha1_base64="bwQyCOHhm/Nn0i2TJaTq2dbUW20=">AAAB/HicbVC7SgNBFL0bXzG+oiltBkPAKuzamDJgYxnFPCAJYXZ2Nhky+2DmbmAJsbK01S+wE1v/xc7G/3A2SZPHgQuHc86Fw3FjKTTa9o+V29nd2z/IHxaOjk9Oz4rnFy0dJYrxJotkpDou1VyKkDdRoOSdWHEauJK33fFd5rcnXGkRhU+Yxrwf0GEofMEoGunRSwfFsl215yCbxFmScr1UefkFgMag+NfzIpYEPEQmqdZdx46xP6UKBZN8VuglmseUjemQT+f9ZqRiJI/4kTIXIpmrKzkaaJ0GrkkGFEd63cvErV6mKO3rbWY3Qb/Wn4owTpCHbNHCTyTBiGRLEE8ozlCmhlCmhKlP2IgqytDsVTC7OOsrbJLWTdWxq86DGagGC+ThEq7gGhy4hTrcQwOawMCHV3iDd+vZ+rA+ra9FNGctf0qwAuv7H1TmltM=</latexit>
sha1_base64="I8kHGAjLQdT6g9t7fXnAGT5JR1g=">AAAB/HicbVC7SgNBFL0bXzG+oiltBkPAKuzamDJgYxnFPCAJYXYymwyZfTBzV1hCrCxt9QvsxNZP8B+01y/wA5xN0uRx4MLhnHPhcNxICo22/WVlNja3tneyu7m9/YPDo/zxSUOHsWK8zkIZqpZLNZci4HUUKHkrUpz6ruRNd3SV+s17rrQIgztMIt716SAQnmAUjXTbT3r5ol22pyCrxJmTYrVQevz7/P2u9fI/nX7IYp8HyCTVuu3YEXbHVKFgkk9ynVjziLIRHfDxtN+ElIzUJ16ozAVIpupCjvpaJ75rkj7FoV72UnGtlypKe3qd2Y7Rq3THIohi5AGbtfBiSTAk6RKkLxRnKBNDKFPC1CdsSBVlaPbKmV2c5RVWSeOi7Nhl58YMVIEZsnAKZ3AODlxCFa6hBnVg4METPMOL9WC9Wm/W+yyaseY/BViA9fEPAGCZbQ==</latexit>
A dx
<latexit sha1_base64="yzxDn/SDyi1mZkiABfJ7vW9Gcmo=">AAAB/HicbVDJSgNBFHwTtxi3qEcvjUHwFHq8mGPAi8coZoEkhJ6enqRJz0L3GzGE+AVe9Qu8iVf/xQ/wP+xJ5pKl4EFRVQ+K8hIlDVL66xS2tnd294r7pYPDo+OT8ulZy8Sp5qLJYxXrjseMUDISTZSoRCfRgoWeEm1vfJf57WehjYyjJ5wkoh+yYSQDyRla6dF/GZQrtErnIOvEzUkFcjQG5b+eH/M0FBFyxYzpujTB/pRplFyJWamXGpEwPmZDMZ33m5ErK/kkiLW9CMlcXcqx0JhJ6NlkyHBkVr1M3OhlijaB2WR2Uwxq/amMkhRFxBctglQRjEm2BPGlFhzVxBLGtbT1CR8xzTjavUp2F3d1hXXSuqm6tOo+0Eq9li9UhAu4hGtw4RbqcA8NaAKHAN7gHT6cV+fT+XK+F9GCk/+cwxKcn3+uJ5T2</latexit>
sha1_base64="T0ZUFd8jIlQUzyLGCfALkr1GciE=">AAAB/HicbVDJSgNBFHwTtxi3UY9eGoPgKcx4MTcDXjzGYBZIQujp6Uma9Cx0vxFDiF/gVb/Am3gVT/6HH+B/2JPkkqXgQVFVD4ryEik0Os6vldvY3Nreye8W9vYPDo/s45OGjlPFeJ3FMlYtj2ouRcTrKFDyVqI4DT3Jm97wNvObj1xpEUcPOEp4N6T9SASCUTRSzX/q2UWn5ExBVok7J8Wbn+8aGFR79l/Hj1ka8giZpFq3XSfB7pgqFEzySaGTap5QNqR9Pp72m5ALI/kkiJW5CMlUXcjRUOtR6JlkSHGgl71MXOtlitKBXme2UwzK3bGIkhR5xGYtglQSjEm2BPGF4gzlyBDKlDD1CRtQRRmavQpmF3d5hVXSuCq5Tsm9d4qVMsyQhzM4h0tw4RoqcAdVqAODAF7gFd6sZ+vd+rA+Z9GcNf85hQVYX/+TOZb9</latexit>
sha1_base64="KFVxXf7gfsWG1eB2Ea89EJntd0U=">AAAB/HicbVC7SgNBFL0bXzG+opaCDAbBKuzamM6AjWUi5gFJCLOzs8mQ2Qczd8UQYmdnq19gJ7aSyv/wA6z8CSePJo8DFw7nnAuH48ZSaLTtHyu1tr6xuZXezuzs7u0fZA+PqjpKFOMVFslI1V2quRQhr6BAyeux4jRwJa+5vZuxX3vgSosovMd+zFsB7YTCF4yike68x3Y2Z+ftCcgycWYkd/09Kv89n45K7exv04tYEvAQmaRaNxw7xtaAKhRM8mGmmWgeU9ajHT6Y9BuScyN5xI+UuRDJRJ3L0UDrfuCaZECxqxe9sbjSGytK+3qV2UjQL7QGIowT5CGbtvATSTAi4yWIJxRnKPuGUKaEqU9YlyrK0OyVMbs4iyssk+pl3rHzTtnOFQswRRpO4AwuwIErKMItlKACDHx4gVd4s56sd+vD+pxGU9bs5xjmYH39A8hTmUI=</latexit>
Pythagoras shows that ds and elemental steps along the curve in the x and y directions,
dx and dy, respectively, are related by:
s 2
p dy
ds = (dx)2 + (dy)2 ⇒ ds = 1+ dx.
dx
The integral
s 2
dy
ˆ ˆ ˆ
R(x, y)ds = R(x, y) 1 + dx = P (x, y)dx,
C C dx C
say, corresponds to the area of paint you need to paint one side of a fence of height
R(x, y)
Evaluation of Line Integrals in a Plane 137
z = R(x, y)
<latexit sha1_base64="qbduznD1GtJzhMUWEGl5BYhbOV4=">AAACAnicbVDJSgNBFHzjGuMW9eilMQgRJMx40YsQ9OIxilkgCaGn05M06VnofiOOITe/wKt+gTfx6o/4Af6HPclcshQ8KKrqQVFuJIVG2/61VlbX1jc2c1v57Z3dvf3CwWFdh7FivMZCGaqmSzWXIuA1FCh5M1Kc+q7kDXd4m/qNJ660CINHTCLe8Wk/EJ5gFI3UfLl+KD2fJ2fdQtEu2xOQReJkpAgZqt3CX7sXstjnATJJtW45doSdEVUomOTjfDvWPKJsSPt8NGk5JqdG6hEvVOYCJBN1Jkd9rRPfNUmf4kDPe6m41EsVpT29zGzF6F11RiKIYuQBm7bwYkkwJOkepCcUZygTQyhTwtQnbEAVZWhWy5tdnPkVFkn9ouzYZefeLlZusoVycAwnUAIHLqECd1CFGjCQ8Abv8GG9Wp/Wl/U9ja5Y2c8RzMD6+QclmZbX</latexit>
sha1_base64="eGmVAWg/2QnCNc4M+7a1DXESOgU=">AAACAnicbVC7SgNBFL0bXzG+opY2Q4IQUcKujTZC0MYyinlAEsLsZDYZMvtg5q4YQzq/wFYLazux9Uf8AP/D2SRNHgcuHM45Fw7HjaTQaNu/VmpldW19I72Z2dre2d3L7h9UdRgrxisslKGqu1RzKQJeQYGS1yPFqe9KXnP7N4lfe+RKizB4wEHEWz7tBsITjKKR6s9X94Wns8FJO5u3i/YYZJE4U5Iv5ZqnHwBQbmf/mp2QxT4PkEmqdcOxI2wNqULBJB9lmrHmEWV92uXDccsROTZSh3ihMhcgGaszOeprPfBdk/Qp9vS8l4hLvURR2tPLzEaM3mVrKIIoRh6wSQsvlgRDkuxBOkJxhnJgCGVKmPqE9aiiDM1qGbOLM7/CIqmeFx276NyZga5hgjQcQQ4K4MAFlOAWylABBhJe4Q3erRfr0/qyvifRlDX9OYQZWD//VUOYYA==</latexit>
sha1_base64="KDoYLApo7VfTLhShoJ8AMhFj8Fc=">AAACAnicbVDJSgNBFHwTtxi3qEcvTYIQUcKMF70IQS8eo5gFkhB6Oj1Jk56F7jfiGHLzC8SbfoE38eqP5AP8DzuJlywFD4qqelCUG0mh0bZHVmpldW19I72Z2dre2d3L7h9UdRgrxisslKGqu1RzKQJeQYGS1yPFqe9KXnP7N2O/9siVFmHwgEnEWz7tBsITjKKR6s9X94Wns+Sknc3bRXsCskicf5Iv5Zqnb6NSUm5nf5udkMU+D5BJqnXDsSNsDahCwSQfZpqx5hFlfdrlg0nLITk2Uod4oTIXIJmoMznqa534rkn6FHt63huLS72xorSnl5mNGL3L1kAEUYw8YNMWXiwJhmS8B+kIxRnKxBDKlDD1CetRRRma1TJmF2d+hUVSPS86dtG5MwNdwxRpOIIcFMCBCyjBLZShAgwkvMI7fFgv1qf1ZX1Poynr/+cQZmD9/AF7n5nm</latexit>
z z
z = y(x)
y
B
A C
x=a x=b x
x=a x=b x
Figure 3.3: Vertical fences that need painting: one with a straight line for a base (left) and
one with a curved base (right).
Note that if you set R(x, y) = 1 you end up with an expression for working out the length
of the curve C between A and B:
ˆ ˆ
1ds = ds = [Length of curve]B
A = Length of C between A and B.
C C
over the portion of the differentiable curve y = f (x) in the (x, y) plane between x = a and
x = b.
and ˆ ˆ y=d
Q(x, y) dy = Q(g(y), y)dy.
C y=c
O (1, 0) x
x + y = x + x2 .
For these examples, we can equally evaluate these integrals by representing the curve of
integration C in terms of equivalent functional representations x = g(y).
x+y =1−y+y =1
.
• If x = 1 − y, then dx = −dy.
• We thus have
ˆ ˆ y=0 ˆ y=0
P (x, y) dx = (1 − y + y) (−1)dy = − 1 dy = − [y]x=0
y=1 = 1.
C1 y=1 y=1
√
• The curve y = x2 may be rearranged as x = + y, since we are in the positive quadrant.
• The limit of integration x = 0 becomes y = 02 = 0.
• The limit of integration x = 1 becomes y = 12 = 1.
√
• The integrand on the curve C2 when written as x = g(y) = + y is given by
√
x+y =+ y−y
.
140 CHAPTER 3. LINE INTEGRALS AND GREEN’S THEOREM
√
• If x = + y, then dx = 21 y −1/2 dy.
• We thus have
between the points (0, 1) and (2, 3) along the curve C given by y = x + 1.
• Hence we have
ˆ x=2
I = (x2 + 2(x + 1)) dx + (x + (x + 1)2 ) dx,
ˆx=0
x=2
= (x2 + 2x + 2) dx + (x + x2 + 2x + 1) dx,
ˆx=0
x=2
= (2x2 + 5x + 3) dx,
x=0
x=2
2 3 5 2
= x + x + 3x dx,
3 2 x=0
64
= .
3
• From the question statement, we see the limit x = 0 corresponds to y = 1 and the
limit x = 1 corresponds to y = 3.
Properties of Line Integrals 141
Hence the sign of the line integral changes if the direction of traversal of C is reversed.
3. Addition of Paths
Suppose the curve of integration C is given by y = f (x) possessing end points with
x-coordinates a and b, but is divided into two parts with a common point (xc , yc ).
Then we have
ˆ ˆ x=b ˆ x=xc ˆ x=b
P (x, y) dx = P (x, f (x)) dx = P (x, f (x)) dx + P (x, f (x)) dx.
C x=a x=a x=xc
y
B
y = f2 (x)
A y = f1 (x)
x=a x=b x = xc x
Figure 3.5: The curves C is multivalued for xb ≤ x ≤ xc (where the curve, as traversed from
A to B is first vertical). It must be split up into two single-valued portions, the lower curve
given by y = f1 (x), the upper curve given by y = f2 (x).
(0, 1)
A p
y=+ 1 x2
C1
O x
(1, 0)
C2 p
y= 1 x2
B
(0, 1)
• The issue, from point 4 above, is that the equation of the path of integration C is not
single-valued. For every value of x on the path, there are two corresponding values of
y.
√
– C1 : y = + 1 − x2 between (x, y) = (0, 1) and (x, y) = (1, 0).
√
– C2 : y = − 1 − x2 between (x, y) = (1, 0) and (x, y) = (0, −1).
144 CHAPTER 3. LINE INTEGRALS AND GREEN’S THEOREM
A natural question arises: if the endpoints of the line integral are the same, is the value
of the line integral the same regardless of which path is taken between the endpoints? In
general the answer is “no”, as the following example illustrates.
Both these integrations have the same integrands and endpoints. They just differ by the
curves over which the integrals are performed.
y
B : (1, 1)
C1
C3
A : (0, 0) C2 (1, 0) x
Figure 3.7: Different paths between the endpoints A and B, either the direct path C1 or the
joined path via the intermediate point (1, 0), C2 ∪ C3 .
3.3. PROPERTIES OF LINE INTEGRALS 145
(b) From (0, 0) to (1, 0) along the curve C2 , y = 0, then from (1, 0) to (1, 1) along the curve
C3 , x = 1.
• On the first leg C2 along y = 0 we have dy = 0. Hence we have:
ˆ
IC2 = (x + y) dx + xy dy,
C2
ˆ x=1
= (x + 0) dx + 0
x=0
1
1 2
= x ,
2 0
1
= .
2
Hence we seeHence
that, we
even
seethough the limits
that, even thoughof the
the limits
integrals are integrals
of the the same,are
thethe
values
same,ofthe
thevalues of the
• Hence
integrals acrossthe
the total line
di↵erent integral
paths di↵er along
and I C 6
=
2 ∪I C
integrals across the di↵erent pathsC1di↵erC2and 3 is
. given by
[C3 IC1 6= IC2 [C3 .
For arbitraryFor
general integrands
arbitrary generalIP (x, y) and P
integrands Q(x, 1 integrals
y) and
(x, y) the 1 between the between
same
C2 ∪C3 = IC2 + IC3 = +Q(x,=y)1.the integrals the same
endpoints will depend on the paths taken. Later we will 2
examine 2the conditions on
endpoints will depend on the paths taken. Later we will examine the conditions P (x, y) on P (x, y)
and Q(x, y) under which
and Q(x, integrals
y) under are integrals
which independent of the particular
are independent path
of the taken between
particular the between the
path taken
same endpoints.
same endpoints.
Hence we see that, even though the limits of the integrals are the same, the values of the
4.4 across
integrals Line
4.4 Integrals
the different Around
Line Integrals
paths Closed
differ and IC1 6= IC2Curves
Around Closed
∪C 3. Curves
For arbitrary general integrands P (x, y) and Q(x, y) the integrals between the same
We now consider the consider
line integral
We now the around a closed,
line integral non-intersecting,
around curve, which curve,
a closed, non-intersecting, starts which
and starts and
endpoints will
finishes at the depend
same on
point. the paths
finishes at the same point. taken. Later we will examine the conditions on P (x, y)
and Q(x, y) under which integrals are independent of the particular path taken between the
We need to specify
same endpoints. We needthetodirection in which
specify the we traverse
direction in which the
we closed curve.
traverse The convention
the closed curve. The convention
adopted is that a counterclockwise
adopted traversal is said
is that a counterclockwise to be in
traversal is the
saidpositive
to be indirection, a clockwise
the positive direction, a clockwise
traversal is said to be in the negative direction.
traversal is said to be in the negative direction.
We now consider the line integral around a closed, non-intersecting, curve, which starts and
finishes at the same point.
We need to specify the direction in which we traverse the closed curve. The convention
adopted is that a counterclockwise traversal is said to be in the positive direction, a clockwise
Figure
traversal is said to be in the negative 4.7: TBC
direction.Figure 4.7: TBC
y y
When integrals
When are integrals
evaluatedare
counterclockwise (respectively (respectively
evaluated counterclockwise clockwise) around a closed
clockwise) around a closed
curve C we denote
curve Cthis
weby denote this by
‰ ‰ ⇢ ⇢
, , or respectively
or respectively . .
C C C C
C C
Note that youNote
mightthattempted
you might tempted
to think thattosince
think
thethat since the
endpoints areendpoints
the same,arethethe same, the value
value
of line
of line intgrals over intgrals over closed
closed paths paths
is always is always
zero. They canzero.beThey
zero, can
but be
as zero, but as the following
the following
examples
examples suggest, theysuggest,
are not they areso.
always notxalways so. x
FigureTo3.8:
evaluate To
Left: suchevaluate
integralssuch
overintegrals
counterclockwise, closed overtraversal
paths,
positive closed paths,
we need to weRight:
need
of break
C. the to break thecontour
integration
clockwise, integration
negative contour C
C traver-
intoparts
into component component parts
that each havethat each have a single-valued
a single-valued representation.
representation.
sal of C.
For example For example
for the for the
left hand left hand
(convex) (convex)
curve in the curve
diagramin the diagram
below, below,
we can breakwe C break the C
thecan
up intoby
up into C1 denoted C1ydenoted
= f1 (x) by
andy =
C2 fdenoted
1 (x) andbyC2ydenoted
= f2 (x).by y = f2 (x).
Note that you might tempted to think that since the endpoints are the same, the value
of line intgrals over closed paths is always zero. They can be zero, but as the following
examples suggest, they are not always so.
To evaluate such integrals over closed paths, we need to break the integration contour C
into component parts that each have a single-valued representation.
For example for the left hand (convex) curve in the diagram below, we can break the
C up into C1 denoted by y = f1 (x) and C2 denoted by y = f2 (x), with endpoints A and B
(where the curve is vertical).
For the right hand (non-convex) curve in the diagram below, we can break the C up
into four parts Ci , i = 1, 2, 3, 4 each separately denoted by y = fi (x), i = 1, 2, 3, 4, along the
portions AB, BC, CD and DA, respectively.
y y
C2
D
C3
B C4 B
C
C2
A A
C1 C1
x x
where C is made up of the sides of the triangle with vertices O = (0, 0), A = (1, 0), B = (1, 1).
y
B : (1, 1)
O : (0, 0) A : (1, 0) x
y p
C1 : y = + a2 x2
B A
x
p
C2 : y = a2 x2
√
• The representation of C, y = a2 − x2 is not single-valued.
• We split the contour up into two halves on which the contour is a single-valued function
of x:
√
– Upper semi-circle C1 : y = + a2 − x2 with + a ≥ x ≥ −a,
√
– Lower semi-circle C2 : y = − a2 − x2 with − a ≤ x ≤ a.
Note that, up to a sign, this line integral gives the area of the enclosed circle. We could
also have found the same result by integrating
‰
xdy.
C
We now introduce the concept of connectivity of a planar region. This will be used in later
sections.
150 CHAPTER 3. LINE INTEGRALS AND GREEN’S THEOREM
y y y
R C2
C3
C2
C1
C R
C1
R
x x x
Figure 3.12: Left: Simply connected region R. Middle: Doubly connected region R. Right:
Triply connected region R.
In the middle diagram above the curve C1 may be shrunk to a point without leaving R.
However the curve C2 cannot be shrunk to a point in R. There are thus two types of curves
in R, those like C1 and those like C2 . R is this not simply connected. Due to the existence
of two different types of curves in R it is, however, said to be doubly connected.
Similarly in the right hand diagram above, the region R is said to be triply connected as
there are three different types of curves:
In general if there are n − 1 holes in region R, the region is said to be n−fold connected.
We now deduce the conditions under which the value of a line integral with the same end-
points is independent of the path between them.
Suppose F (x, y) is a single valued continuous function of (x, y) with single valued and
continuous first partial derivatives in a region R of the (x, y)−plane.
3.6. INDEPENDENCE OF PATH 151
Let C be a curve lying entirely within R with endpoints A : (x1 , y1 ) and B : (x1 , y1 ). We
define C parametrically as:
C : x = f (t), y = g(t),
for t1 ≤ t ≤ t2 with f (ti ) = xi , g(ti ) = yi , i = 1, 2.
= F (x2 , y2 ) − F (x1 , y1 ),
where we have used the chain rule to go between the second and third lines.
Note that the final result only depends on the endpoints and not on the actual contour
of integration C.
Thus we can now see that the condition for the value of a line integral of the form
ˆ
I = {P (x, y)dx + Q(x, y)dy} ,
C
A caveat to this is that if R is not simply connected, this result does not hold.
152 CHAPTER 3. LINE INTEGRALS AND GREEN’S THEOREM
Consider the following path integral around a closed curve in a simply connected region R:
‰
I = {P (x, y)dx + Q(x, y)dy} .
C
If the integral is independent of path, then we have I = 0 for all simple closed curves in R.
Consider two paths between the points A and B in a simply connected region R. Let
the endpoints be connected by two paths C1 and C2 as in the diagram
y
R
C2
B
A C1
x
Figure 3.13: Two different paths C1 and C2 between the endpoints A and B in a simply
connectd region R.
Then we have
‰
I = {P (x, y)dx + Q(x, y)dy}
C
ˆ ˆ
= {P (x, y)dx + Q(x, y)dy} − {P (x, y)dx + Q(x, y)dy}
C1 C2
= 0,
since, by assumption the integrals are independent of path, and so the terms in the second
line are identical.
for all simple closed curves C in the simply connected region R, then the integral
ˆ
{P (x, y)dx + Q(x, y)dy}
Green’s Theorem 153
When you come to study complex analysis, the path independence of integrals of complex
analytical functions is called Cauchy’s theorem.
This integral is independent of the path jointing A and B in the (x, y) plane, since if we
set
P (x, y) = y cos x, Q(x, y) = sin x,
we observe that
∂P ∂Q
= cos x = .
∂y ∂x
Consequently, from above, the line integral can be written as an exact derivative of a
function F (x, y). We can find what F (x, y) is by integration both partial derivatives and
comparing the results (see Chapter 1):
∂F
ˆ
= y cos x ⇒ F (x, y) = y cos x dx = y sin x + g(y); (3.1)
∂x
∂F
ˆ
= sin x ⇒ F (x, y) = sin x dy = y sin x + h(y);
∂y
where g(y) and h(x) are arbitrary functions. A comparison of the two expressions shows
that they can only be equal if g(y) = h(x) = constant. Hence up to an arbitrary constant
(which will vanish in the final definite integration below) it is not too difficult to see that:
F (x, y) = y cos x.
Green’s theorem is one of the most fundamental and beautiful theorems involving line inte-
grals.1 It links the value of an integral over a curve to that of an integral over the region
the curve bounds. It is of fundamental importance in many fields of the physical and math-
ematical sciences. It is a two-dimensional special case of the Kelvin- Stokes and divergence
theorems and can also be used to prove results involving integrals in complex analysis, all of
which you may learn next year.
Theorem: (Green’s Theorem) Suppose P (x, y) and Q(x, y) are two functions which are
finite and continuous inside and on the non- self-intersecting boundary C of some simply
connected region R of the (x, y)-plane. If the first partial derivatives of the these functions
are continuous inside and on the boundary of R, then
¨
∂Q(x, y) ∂P (x, y)
‰
{P (x, y) dx + Q(x, y) dy} = − dx dy.
C R ∂x ∂y
The proof of the theorem when the boundary C is a simple (non-self-intersecting) closed
curve and R is simply connected, is relatively straightforward. The following diagram will
be helpful.
1
Green’s Theorem is named after George Green (1793-1841). He was self-taught and had minimal school-
ing as a child. His father was a baker and Green spent much of his adult life running a windmill in Sneinton
in Nottinghamshire. Nevertheless, in 1828 he privately published a text on the application of mathematics
to electricity and magnetism. At the time, only 51 people bought it, but it contains results of fundamental
importance in many fields of mathematics and science. Some of the subscribers encouraged Green to go to
Cambridge to do an undergraduate degree. He entered in 1832, aged 39 and finally graduated in 1838 with
a BA, some 10 years after his research work (thereby proving that you don’t need to have been anywhere
near Cambridge to change the world)! He died only 3 years later, with some rumours of alcoholism. The
picture on the back of these lecture notes is of your lecturer pondering some maths on Green’s grave in the
churchyard in Sneinton. Green’s mill still stands today and can be visited both physically and electonically
3.7. GREEN’S THEOREM 155
y y
y = y2 (x)
y=d
C2 x = x4 (y)
x = x3 (y)
R R
C4
C3
C1
y=c
y = y1 (x)
x=a x=b x x
C = C1 [ C 2 C = C3 [ C 4
Figure 3.14: The region R is the same in both diagrams. The boundary curve C is also the
same. In the left-hand diagram, C decomposed into two curves C1 , C2 , with single valued
representations y = y1 (x) and y = y2 (x), respectively with C = C1 ∪ C2 . In the right-hand
diagram, the same boundary C is now decomposed into two curves C3 , C4 , with single valued
representations x = x3 (y) and x = x4 (y), respectively with C = C3 ∪ C4 .
• Subtracting the first result from the second delivers the following result, as required:
¨
∂Q(x, y) ∂P (x, y)
‰
{P (x, y) dx + Q(x, y) dy} = − dx dy.
C R ∂x ∂y
A special case of Green’s theorem leads to a surprising and immediately useful result.
1
‰ ¨
⇒ {x dy − y dx} = dx dy = A.
2 C R
1
‰
⇒ A = {x dy − y dx} .
2 C
3.7. GREEN’S THEOREM 157
In other words, the area of the simply-connected region enclosed by the non-self-intersecting
curve C can be given by a simple line integral. This result underpins the principle of the
planimeter, a mechanical device for measuring the area of an arbitrary two dimensional
shape.
Note that the above area result can also be simplified by making the same substitutions
P (x, y) = y, Q(x, y) = −x into the two integrals used to prove Green’s theorem above. For
then they generate:
¨ ‰
A = dx dy = − y dx,
R C
¨ ‰
A = dx dy = x dy.
R C
‰
I= (x2 + y) dx + (x + sin y) dy ,
C
where C is the boundary of the unit square with vertices (0, 0), (1, 0), (1, 1), (0, 1).
y
(1, 1)
(0, 1)
R C
O (1, 0) x
• The functions P and Q are continuous and possess continuous first partial derivatives
(obvious).
158 CHAPTER 3. LINE INTEGRALS AND GREEN’S THEOREM
• The integration curve C is not self intersecting and encircles a region R that is simply
connected.
• Hence we have
¨
2 ∂Q(x, y) ∂P (x, y)
‰
I= (x + y) dx + (x + sin y) dy = − dx dy,
C R ∂x ∂y
¨
∂(x + sin y) ∂(x2 + y)
= − dx dy,
R ∂x ∂y
ˆ x=1 ˆ y=1
= dx dy {1 − 1} ,
x=0 y=0
= 0.
x
‰
I= (e − 2y) dx + (x + cos(1457y 956 )) dy ,
C
where C is the boundary of the circle, centred at the origin and radius a.
C
R a
O x
• The functions P and Q are continuous and possess continuous first partial derivatives
(obvious).
• The integration curve C is not self intersecting and encircles a region R that is simply
connected.
• Hence we may apply Green’s theorem.
• Hence we have
¨
x ∂Q(x, y) ∂P (x, y)
‰
956
I= (e − 2y) dx + x + cos(1457y ) dy = − dx dy,
C R ∂x ∂y
¨
∂(x + cos(1457y 956 )) ∂(ex − 2y)
= − dx d
R ∂x ∂y
¨
= {1 + 2} dx dy,
R
¨
= 3 dx dy,
R
= 3πa2 .
Example: Use Green’s theorem to find the area of an ellipse whose boundary is given
by given by
x = a cos t, y = b sin t, 0 ≤ t ≤ 2π.
C
b
a
O x
R
• On C we have
x = a cos t ⇒ dx = −a sin t,
y = b sin t ⇒ dy = b cos t.
• The integration curve C is not self intersecting and encircles a region R that is simply
connected.
• Hence we have
1
‰
A = {x dy − y dx} ,
2 C
1
‰
= {a cos t dy − b sin t dx} ,
2 C
t=2π
1
ˆ
= {(a cos t) × (b cos t)dt − b sin t × (−a sin t)dt} ,
2 t=0
1 t=2π
ˆ
= (ab cos2 t + ab sin2 t)dt ,
2 t=0
ab t=2π
ˆ
= (cos2 t + sin2 t)dt ,
2 t=0
t=2π
ab
ˆ
= dt,
2 t=0
ab
= × 2π,
2
= πab.
Professor Chris Howls took a First in Joint Honours
Mathematics and Physics before obtaining his PhD with
Professor Sir Michael Berry FRS at the University of
Bristol. He has published numerous papers in the area of
asymptotics and its applications, with research collabora-
tions in the UK, US, Europe and Asia.
Multivariable Calculus is one of the core modules taken by first-year students in all of the
Honours Mathematics programmes at the University of Southampton. The module provides an
introduction to the use of multivariable functions that forms the basis of many of the later modules
in the mathematical sciences degree programmes.