Lectures Compressed (3985) MAT135

Dmitry Panchenko
Calculus I and II
Essentials
ISBN-13: 978-1-9994190-7-3
ISBN-10: 1-9994190-7-3
1st edition © 2022 Dmitriy Panchenko
Acknowledgement
This text grows out of the author’s experience teaching MAT135 & MAT136 at the
University of Toronto during the 2021-2022 academic year. This course in its current form
was developed between 2017 and 2021 under the direction of Sarah Mayes-Tang, with
input from many members of a large teaching team, notably including Bernardo Galvão-Sousa,
who also coordinated the course in the 2021-2022 academic year. The material in these
notes reflects the structure of the course as designed by Professor Mayes-Tang, for example
through the emphasis on a flipped classroom and in the selection of topics. The presentation
is intended to complement the treatment of these topics as found in the standard MAT135
& MAT136 course material.
I want to thank the entire MAT135 & MAT136 teaching teams from the previous several
years, and also MAT187 from last year, who created many of the exercises. I also want to
thank all the students in these classes, whose feedback was very important to me, and whose
positive energy made the classes a real pleasure to teach.
Contents
1 Functions
1.1 Introduction
1.2 Linear functions
1.3 Exponential functions
1.4 Logarithmic functions
1.5 Logarithmic scales
1.6 Trigonometric functions
1.7 Polynomials and rational functions
1.8 Inverse functions
1.9 Limits and continuity
2 Derivatives
2.1 Practical interpretation of derivatives
2.2 Formal definition of derivative
2.3 Derivatives and graphs
2.4 Differentiation rules
2.5 First applications: old and new
2.6 Critical points
2.7 Optimization problems
2.8 Parametric families of functions
2.9 Related rates
3 Integrals
3.1 Definite integrals: the case of velocity
3.2 Definite integrals: general case
3.3 Fundamental Theorem of Calculus
3.4 Application of FTC: differential equations
3.5 Techniques of integration: substitution rule
3.6 Techniques of integration: integration by parts
3.7 Approximating integrals using Taylor polynomials
3.8 CAS: computer algebra systems
3.9 Improper integrals
3.10 Slicing problems: geometry
3.11 Slicing problems: densities
4 Differential equations
4.1 Differential equations: qualitative analysis
4.2 Differential equations: approximations
4.3 Separable differential equations
4.4 Lotka-Volterra predator-prey model
4.5 The SIR model
4.6 Approximating solutions by Taylor polynomials
5 Taylor polynomials and series
5.1 From Taylor polynomials to Taylor series
5.2 Transformations of Taylor series
5.3 Ratio test and the radius of convergence
5.4 Applications of Taylor series
Chapter 1
Functions
1.1 Introduction
In this chapter we will study various basic functions that appear frequently in
Calculus, such as linear, exponential, logarithmic, power, polynomial, rational,
and trigonometric functions. Of course, we will often combine these functions by
adding, subtracting, multiplying, dividing, taking compositions and inverses. The
functions themselves, but also various properties of functions and operations we
can do with functions, can be:
• described in words, both in plain English or using mathematical terminology;
• expressed with mathematical formulas and notation;
• depicted and observed in figures via their graphs;
• represented by tables of values.
For this reason, learning Calculus is a lot like learning a new language, and even if
you feel that you understood a new concept, it is important to be able to express it
in different ways and translate it between words, formulas, graphs, and sometimes
recognize it from a table. Let us show an example of how the same information can
be expressed or observed in these different ways.
Example 1. Suppose that a function T =
f (t) describes temperature T changing over
time, and suppose that (in plain English)
the temperature is growing but the growth
is slowing down. Then we can express this
using mathematical terminology by saying
that the function is increasing and concave
down. We can also observe this behaviour
from the graph of y = f (t), or use formulas
and write that its first derivative is positive,
f ′ (t) > 0, and second derivative is negative,
f ′′ (t) < 0 (as we will learn later). Finally, if
we are given a table of temperature values at a few equally spaced points in time,
for example,
t 0 1 2 3 4 5
T 0 3.75 7 9.75 12 13.75
we can also see that the values are increasing, but the gaps between two consecutive
temperatures are decreasing: 3.75, 3.25, 2.75, 2.25, 1.75. Of course, in this case we
do not know what happens for all times t but, from what we can see in the table,
we can guess that the temperature is growing but it is growing slower and slower,
so the function T = f (t) is probably increasing and concave down.
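The reasoning from the table can also be checked numerically. The following is a minimal Python sketch, not part of the course material; the helper name `differences` is our own. It computes the gaps between consecutive temperatures and then the changes in those gaps.

```python
def differences(values):
    """Return the gaps between consecutive entries of a list."""
    return [b - a for a, b in zip(values, values[1:])]

# Temperatures from the table in Example 1 (t = 0, 1, 2, 3, 4, 5).
T = [0, 3.75, 7, 9.75, 12, 13.75]

first = differences(T)       # gaps between consecutive temperatures
second = differences(first)  # how the gaps themselves change

increasing = all(d > 0 for d in first)        # values go up
gaps_shrinking = all(d < 0 for d in second)   # growth slows down

print(first)                       # [3.75, 3.25, 2.75, 2.25, 1.75]
print(increasing, gaps_shrinking)  # True True
```

Positive gaps suggest an increasing function and shrinking gaps suggest concave down, but, as noted above, a table only tells us about the listed times.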
Exercise 1. By looking at the table of values, is the function T = f (t) increasing

or decreasing? Concave up or concave down?
t 0 1 2 3 4 5
T 1 1.5 3 5.5 9 13.5
Domain and range. We will discuss terminology associated with functions all
throughout this chapter, but here let us briefly discuss the domain and range of
a function $y = f(x)$. Generally speaking, the domain of a function is the set of all
possible inputs $x$ that we are allowed to plug into the function $f$, and the range is the
set of all possible outputs $y$. For example, $y = \sqrt{x}$ has the domain $[0, \infty)$ because we
are only allowed to plug in non-negative values $0 \le x < \infty$ into the square root, and the
range is also $[0, \infty)$. However, in some cases the domain may be narrower for
various reasons. For example, if in a given problem we are only interested in the
function $y = \sqrt{x}$ on the interval $[1, 2]$, then for the purpose of that problem the domain
will be $[1, 2]$ and the range will be $[1, \sqrt{2}]$. In applied problems the domain may be
limited by the physical constraints of the problem.
Example 2. Suppose we have a 10′′ × 10′′ cardboard, and we cut off the four corners of size
$x \times x$ and then fold the sides to make a box. The
volume of the box $V = V(x)$ will depend on $x$.
What is $V(x)$ and what is its domain?
Solution: The dimensions of the box ($\ell \times w \times h$)
will be $(10 - 2x) \times (10 - 2x) \times x$, so the volume
will be $V(x) = (10 - 2x)^2 x$. The domain is $[0, 5]$,
because we cannot physically cut off the corners
of size bigger than 5′′ × 5′′.
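To see how the physical constraint enters a computation, here is a small Python sketch of the volume formula from Example 2. The function name `box_volume` and the choice to raise an error outside the domain are our own assumptions, made only for illustration.

```python
def box_volume(x, side=10.0):
    """Volume of the box from Example 2 when corners of size x by x are cut off."""
    # The formula only makes physical sense on the domain [0, side/2].
    if not 0 <= x <= side / 2:
        raise ValueError("x must lie in the domain [0, side/2]")
    return (side - 2 * x) ** 2 * x

# The volume vanishes at both ends of the domain and is positive in between.
for x in [0, 1, 2.5, 5]:
    print(x, box_volume(x))
```

Evaluating at the endpoints $x = 0$ and $x = 5$ gives zero volume, which matches the picture: either nothing is folded up or nothing is left of the base.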
Exercise 2. A ball is tossed straight up with initial speed 10m/s and initial height
above ground of 2m. The height of the ball at time $t$ is given by $h = -5t^2 + 10t + 2$
meters. What are the domain and range of the function that describes the height of
the ball until the time it hits the ground? Hint: Recall the quadratic formula or see
Section 1.7.
Answer to Exercise 1. It is increasing and the gaps between consecutive values

are also increasing: 0.5, 1.5, 2.5, 3.5, 4.5, so the function is increasing and concave
up.
Answer to Exercise 2. The domain is
$[0, 1 + \sqrt{7/5}]$. As we can see in the figure,
the main point here is that the height
$h(t)$ is negative after time $t = 1 + \sqrt{7/5}$,
so the formula is no longer applicable
there. The maximum height will be 7m
at time $t = 1$, so the range is $[0, 7]$.
1.2 Linear functions

Linear functions are functions of the form
y = b+m·x
where constant b is the y-intercept and constant m is the slope, x is the independent
variable (input of the function) and y is the dependent variable (output of the
function). Linear functions may be the simplest functions in Calculus, but they play
a fundamental role because the notion of a derivative f ′ (a) of a function y = f (x) at
a point x = a will be based on approximating f (x) at that point by linear functions.
Let us take a look at a graph of a linear function and consider any two points
$(x_1, y_1)$ and $(x_2, y_2)$ on this graph. We can see that:
• $y_1 = b + m \cdot x_1$ – first point.
• $y_2 = b + m \cdot x_2$ – second point.
• $\Delta x = x_2 - x_1$ is called run.
• $\Delta y = y_2 - y_1$ is called rise.
• $m = \tan(\theta) = \Delta y / \Delta x$ is called slope.
• $b = b + m \cdot 0$ is y-intercept.
The notation $\Delta$ is used often in Calculus and means increment. For example,
above $\Delta x$ is the increment $x_2 - x_1$ of the variable $x$, and $\Delta y$ is the increment $y_2 - y_1$ of
the variable $y$. The slope $m = \Delta y / \Delta x$ represents how much the output variable changes,
$\Delta y$, relative to how much the input variable changes, $\Delta x$. We can think of it as a
rate of change of $y$ with respect to $x$, and for linear functions it is always the same
no matter what the interval $[x_1, x_2]$ is.
Example 1. If a car drives with a constant speed of 60 km/h, what is the distance
d it covers in t hours? If we think of the distance as a function of time, what is the
meaning of its slope?
Solution: Since Distance = Speed × Time when the speed is constant, in this case,
Distance $d = 60 \cdot t$, measured in km/h × h = km. It is a linear function with
y-intercept 0 and slope 60, so the meaning of the slope $m = \Delta d / \Delta t$ is speed. Constant
speed means that distance is a linear function of time.
Exercise 1. Suppose that during photosynthesis at temperature 10°C in direct sunlight
a leaf of some plant produces 30 µmol of glucose and oxygen per hour.
What is the amount of glucose and oxygen produced during t hours? If we think of
this amount as a function of time, what is the meaning of its slope?
Example 2. What is the linear function whose graph passes through points (1, 3)
and (5, 2)?
Solution: Since $\Delta x = 5 - 1 = 4$ and $\Delta y = 2 - 3 = -1$, the slope is $m = \Delta y / \Delta x = -1/4 = -0.25$.
To find the intercept, we can use any one of the two points, for example the
first one: $3 = b + m \cdot 1 = b - 0.25 \cdot 1 = b - 0.25$, so $b = 3 + 0.25 = 3.25$. The linear
function is $y = 3.25 - 0.25x$.
Exercise 2. What is the linear function whose graph passes through points (4, 1)
and (0, 3)?
Line from slope and one point. In Example 2 above, once we found the
slope $m$, we computed the intercept $b$ by plugging in the value of one point. Actually,
if we know the slope $m$ and a point $(x_0, y_0)$ on the graph of a linear function,
we can write the formula for this linear function directly:
$$y = y_0 + m \cdot (x - x_0).$$
Indeed, if $x = x_0$ then $y = y_0 + m \cdot (x_0 - x_0) = y_0 + m \cdot 0 = y_0$. It is very important to
remember this formula, because it will be used frequently to write the equations of
secant lines and tangent lines later on when we study derivatives. Let us see how
we could have used it in the second half of Example 2.
Example 3. What is the linear function whose slope is −0.25 and whose graph
passes through the point (1, 3)?
Solution: The function is y = 3 − 0.25(x − 1). Of course, we can multiply out the
second term and rewrite this as y = 3 − 0.25(−1) − 0.25x = 3.25 − 0.25x.
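The two steps used in Examples 2 and 3 (slope from rise over run, then the point-slope formula) can be written as a short Python sketch. The function name `line_through` and the returned pair `(b, m)` are our own conventions, not notation from the course.

```python
def line_through(p, q):
    """Return (b, m) for the line y = b + m*x through the points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("a vertical line is not of the form y = b + m*x")
    m = (y2 - y1) / (x2 - x1)   # slope = rise / run
    b = y1 - m * x1             # point-slope form y = y1 + m*(x - x1) at x = 0
    return b, m

# Points from Example 2.
b, m = line_through((1, 3), (5, 2))
print(b, m)   # 3.25 -0.25
```

The intercept comes from evaluating $y = y_1 + m \cdot (x - x_1)$ at $x = 0$, which is the same multiplication carried out by hand in Example 3.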
Exercise 3. What is the linear function whose slope is $7/11$ and whose graph passes
through the point $(2, 1)$?
Tables and trend lines. If we are given a table of (x, y) values, it is easy to check
if they all lie on a graph of a linear function y = b + m · x. We only need to check
that all slopes between two consecutive values of x are equal. This is especially
easy if all increments ∆x are the same, in which case we only need to check that all
increments ∆y are also the same.
Example 4. Do all the points in the table lie on the graph of a linear function? If
yes, which one?
x 0 2 4 6 8 10
y 0 3 6 9 12 15
Solution: We can see that all increments $\Delta x$ between consecutive points are equal
to 2, so we only need to check that all $\Delta y$ are also the same. Indeed, all $\Delta y$ are equal
to 3, so the points lie on the graph of one linear function. Its slope is $\Delta y / \Delta x = 3/2 = 1.5$,
and since it passes through the point $(0, 0)$, the formula is $y = 0 + 1.5(x - 0) = 1.5x$.
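The check described above (all slopes between consecutive points must agree) can be sketched in Python as follows. The function names and the tolerance `tol` are our own assumptions; the tolerance only guards against rounding in floating-point division.

```python
def consecutive_slopes(xs, ys):
    """Slopes of the segments joining consecutive points of a table."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def is_linear(xs, ys, tol=1e-9):
    """True if all consecutive slopes agree, up to a small tolerance."""
    slopes = consecutive_slopes(xs, ys)
    return all(abs(s - slopes[0]) <= tol for s in slopes)

# Table from Example 4: every slope equals 1.5.
xs = [0, 2, 4, 6, 8, 10]
ys = [0, 3, 6, 9, 12, 15]
print(consecutive_slopes(xs, ys))
print(is_linear(xs, ys))   # True

# Temperature table from Example 1 of Section 1.1: the slopes differ.
print(is_linear([0, 1, 2, 3, 4, 5], [0, 3.75, 7, 9.75, 12, 13.75]))   # False
```

Working with slopes rather than increments means the same check also applies when the increments $\Delta x$ are not all equal.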
Exercise 4. Do all the points in the table lie on the graph of a linear function? If
yes, which one?
x −2 0 3 6 8 11
y −3 1 7 13 17 23
Example 5. Sometimes the points might not be exactly on a straight line but pretty
close to a straight line, as the points in the following table
x 0 2 4 6 8 10
y 0 3.4 5.5 8.8 12.6 15.1
shown as blue dots in the figure below. One way to find the trend line (shown in
red in the figure below) is to use the so-called least squares regression, which can
be solved using optimization techniques studied later in this course. For now, we
will simply mention how to find this line using Google Sheets. The equation for
the line is shown at the top of the chart, in this case $y = 1.52x - 0.0333$.
• Enter x-values and y-values into two

columns, and select those columns.
• Go to Insert > Chart.
• Under Chart Type select Scatter.
• Under Customize select Series.
• Scroll and check Trendline.
• A little below, under Label choose
Use Equation.
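For readers who prefer code to a spreadsheet, the same trend line can be computed directly. The sketch below uses the standard closed-form least squares formulas for a line; the function name `least_squares_line` is our own, and we only assume (we have not checked the Google Sheets internals) that the chart trendline is the ordinary least squares line.

```python
def least_squares_line(xs, ys):
    """Return (b, m) of the least squares line y = b + m*x for the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x   # the line passes through (mean_x, mean_y)
    return b, m

# Data from Example 5.
xs = [0, 2, 4, 6, 8, 10]
ys = [0, 3.4, 5.5, 8.8, 12.6, 15.1]
b, m = least_squares_line(xs, ys)
print(f"slope {m:.3g}, intercept {b:.3g}")
```

For this table the slope is about 1.52 and the intercept about −0.0333, in agreement with the equation shown on the chart.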
Exercise 5. Find the trend line (least squares regression) for the following data
points:
x −2 0 3 6 8 11
y −3.3 1.1 7.5 12.4 16.6 23.4
Answer to Exercise 1. $y = 30t$ µmol. The slope 30 µmol/h is the rate of photosynthesis,
or the rate of production of glucose and oxygen.
Answer to Exercise 2. $y = 3 - \frac{1}{2}x$.
Answer to Exercise 3. $y = 1 + \frac{7}{11}(x - 2) = -\frac{3}{11} + \frac{7}{11}x$.
Answer to Exercise 4. Yes, y = 2x + 1.
Answer to Exercise 5. y = 2.01x + 0.914.

1.3 Exponential functions

Let us recall basic algebra rules involving powers:
$$a^b a^c = a^{b+c}, \quad (a^b)^c = a^{bc}, \quad (ab)^c = a^c b^c,$$
$$\frac{1}{a^c} = a^{-c}, \quad \frac{a^b}{a^c} = a^{b-c}, \quad \left(\frac{a}{b}\right)^c = \frac{a^c}{b^c}.$$
Here we assume that all the terms make sense, i.e. we do not divide by zero, etc.
Example 1. Let us check that $2^{-x} = 0.5^x$.
Solution: Using the above rules, we can write
$$2^{-x} = \frac{1}{2^x} = \left(\frac{1}{2}\right)^x = 0.5^x.$$
Another way is to write
$$2^{-x} = 2^{(-1)x} = (2^{-1})^x = \left(\frac{1}{2}\right)^x = 0.5^x.$$
This means that $2^{-x}$ and $0.5^x$ are the same function of $x$.
Exercise 1. Check that (a) $100^x = 10^{2x}$, (b) $25^{-x/2} = 0.2^x$, (c) $2^{-2x} = 0.25^x$.
Exponential functions are functions of the form
$$P = P_0 \cdot a^t.$$
• $a > 0$ is a positive constant called (exponential) base.
• $t$ is an independent variable which often represents time.
• $P$ is a dependent variable that sometimes represents a population.
• $P_0$ is the initial value of $P$ at time 0. Indeed, $P(0) = P_0 \cdot a^0 = P_0$.
The notation for both variables $t$ and $P$ can change depending on the situation, so
$$y = 0.5 \cdot 2^x, \quad h = 3 \cdot 0.5^{-t}, \quad P = 100 \cdot 2^{-t}, \quad z = 2 \cdot 3^{-t/2},$$
are all possible examples of exponential functions. Typical examples of processes
described or modelled by exponential functions are:
• decay of a radioactive material;
• compound interest in a bank account;
• increase or decrease of a population;
• concentration of a drug in a patient’s body;
• probability of a lifetime of high quality products.
Let us take a look at the graph of several exponential functions.
Exponential growth: $a > 1$. When the exponential base $a$ is bigger than 1,
for example $y = 2^x$ in the above graph, the function is increasing as the variable
$x$ increases. For example, if the variable increases by one unit then the function
$y = 2^{x+1} = 2 \cdot 2^x$ doubles compared to $2^x$. On the same unit interval the function
$y = 4^x$ quadruples, because $y = 4^{x+1} = 4 \cdot 4^x$, so it grows faster than $y = 2^x$ as we
can see in the above figure. The bigger $a$ is, the faster the function $a^x$ grows.
Exponential decay: $0 < a < 1$. When the exponential base $a$ is less than 1, for
example $y = 0.5^x$ in the above graph, the function is decreasing as the variable
$x$ increases. For example, if the variable increases by one unit then the function
$y = 0.5^{x+1} = 0.5 \cdot 0.5^x$ is half of $0.5^x$. On the same unit interval $y = 0.25^{x+1} = 0.25 \cdot 0.25^x$
is a quarter of $0.25^x$, so the function $y = 0.25^x$ decays faster than $y = 0.5^x$.
The smaller $a$ is, the faster the function $a^x$ decays.
Example 2. In the figure we see
the graphs of three exponential
functions $y = p \cdot a^x$, $y = q \cdot b^x$, and
$y = r \cdot c^x$.
(a) Compare constants $a$, $b$ and $c$.
(b) Compare constants $p$, $q$ and $r$.
Solution: (a) The constant $a$ is the
smallest, because $a$ is less than
1 (exponential decay), and both $b$
and $c$ are bigger than 1 (exponential
growth). Also, $b > c$ because
the growth of $y = q \cdot b^x$ is faster. So, $a < c < b$.
(b) The constants $p$, $q$ and $r$ are initial values at $x = 0$, so we are comparing the
y-intercepts: $q = r < p$.
Exercise 2. Sketch the graphs of functions $y = 2 \cdot 1.2^x$, $y = 3 \cdot 1.1^x$ and $y = 2 \cdot 0.8^x$
on the same plot. (Check your answer using, for example, geogebra.org).
Rate of growth/decay. Given an exponential function $P = P_0 \cdot a^t$, let us express
the base $a$ relative to 1 as $a = 1 + r$:
$$P = P_0 \cdot a^t = P_0 \cdot (1 + r)^t.$$
In the case of exponential growth, when $a > 1$, the constant $r = a - 1 > 0$ is called
the (exponential) growth rate. In the case of exponential decay, when $0 < a < 1$,
the constant $-r = 1 - a > 0$ is called the (exponential) decay rate.
Example 3. Find the exponential growth/decay rate of the functions $y = 2 \cdot 1.1^x$
and $y = 3 \cdot 0.98^x$.
Solution: Since $y = 2 \cdot 1.1^x = 2 \cdot (1 + 0.1)^x$, the growth rate is $r = 0.1$. Since
$y = 3 \cdot 0.98^x = 3 \cdot (1 - 0.02)^x$, the decay rate is $-r = 0.02$.
Exercise 3. Write down an exponential function with the growth rate 0.01 and
initial value 1.5. Write down an exponential function with the decay rate 0.04 and
initial value 2.
Example 4. We deposit D dollars into a savings account with annual interest 2%.
How much money is in the account after t years?
Solution: If we start with $D$ dollars, in one year we accumulate interest $0.02D$, so
the total will be $D + 0.02D = D \cdot 1.02$. After two years the total will be $D \cdot 1.02 \cdot 1.02 = D(1.02)^2$,
after three years $D(1.02)^3$, and after $t$ years $D(1.02)^t$. Actually, a typical
savings account accumulates interest continuously, so if we close the account after
$t$ years, where $t$ is not necessarily an integer, we will have $D(1.02)^t$ dollars.
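The year-by-year argument in Example 4 can be compared with the closed formula in a few lines of Python. This is only a sketch: the deposit of 1000 dollars and the function name `balance` are hypothetical choices made for illustration.

```python
def balance(deposit, years, annual_rate=0.02):
    """Closed formula D * (1 + rate)**t from Example 4."""
    return deposit * (1 + annual_rate) ** years

D = 1000.0   # hypothetical deposit, not a value from the text

# Add interest one year at a time, as in the solution above.
total = D
for year in range(3):
    total = total + 0.02 * total

print(total, balance(D, 3))   # the two values agree up to rounding
print(balance(D, 2.5))        # the formula also accepts non-integer t
```

Both computations give about 1061.21 dollars after three years.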
Exercise 4. Suppose a radioactive material decays at the rate of 2.5% per year.
What percent of the original will remain after 100 years?
Example 5. Suppose that annual sales at a bakery are growing at 2% per year. How
can we model the annual sales?
Solution: If the sales during the current year total $A$ dollars, time $t = 1$ denotes next
year, $t = 2$ is the year after next, etc., then as in the previous example the sales
during year $t$ will be $A(1.02)^t$. If one prefers to denote the current year by $t = 1$ instead
of $t = 0$ then we need to shift time by 1 so that the sales during year $t$ will be
$A(1.02)^{t-1}$.
Also, the difference with the previous example is that, for non-integer $t$, this
formula does not have a particular meaning; for example, $A(1.02)^{2.5}$ does not directly
represent anything related to sales at $t = 2.5$ years. Instead of annual sales,
we could model the rate of sales using an exponential function, but this will be studied much
later because, in this case, sales within any interval of time would be computed
using integrals.
Exercise 5. Suppose that annual sales at a bakery are decreasing at 1% per year.
How can we model the annual sales?
Continuous rate of growth/decay. Recall Euler’s number $e = 2.718281828\ldots$
and the natural logarithm function $\ln(x)$. We will review logarithms in the next
section, but for now we only need to recall that any positive number $a > 0$ can be
written as $a = e^{\kappa}$, and this equation can be solved for $\kappa$ using the natural logarithm:
$$a = e^{\kappa} \Longrightarrow \kappa = \ln(a).$$
It means that, given an exponential function $P = P_0 \cdot a^t$, we can always express the
base $a$ as $a = e^{\kappa}$ with $\kappa = \ln(a)$, and rewrite this function as
$$P = P_0 \cdot a^t = P_0 \cdot (e^{\kappa})^t = P_0 \cdot e^{\kappa t}.$$
In the case of exponential growth, when $a > 1$, the constant $\kappa = \ln(a) > 0$ is called
the continuous growth rate. In the case of exponential decay, when $0 < a < 1$, the
constant $-\kappa = -\ln(a) > 0$ is called the continuous decay rate.
Example 6. Find the continuous growth/decay rate of the functions $y = 2 \cdot 1.1^x$ and
$y = 3 \cdot 0.98^x$.
Solution: In the first case, $\kappa = \ln(1.1) = 0.0953\ldots$ is the continuous growth rate,
and we can write the function $y = 2 \cdot 1.1^x$ as $y = 2 \cdot e^{0.0953x}$. In the second case,
$\kappa = \ln(0.98) = -0.0202\ldots$, so the continuous decay rate is 0.0202, and we can
write the function $y = 3 \cdot 0.98^x$ as $y = 3 \cdot e^{-0.0202x}$. Notice that the decay rate itself
is positive, but the fact that the function is decreasing (decay) instead of increasing
(growth) is expressed by the minus sign in the exponent $e^{-0.0202x}$.
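The two ways of describing the same base, $a = 1 + r$ and $a = e^{\kappa}$, can be compared side by side with a short Python sketch. The function name `rates` is our own; `math.log` is Python's natural logarithm.

```python
import math

def rates(a):
    """For a base a > 0 return (r, kappa) with a = 1 + r = e**kappa."""
    if a <= 0:
        raise ValueError("the base must be positive")
    return a - 1, math.log(a)

# Bases from Examples 3 and 6.
for a in (1.1, 0.98):
    r, kappa = rates(a)
    kind = "growth" if a > 1 else "decay"
    print(f"a = {a}: {kind}, rate {abs(r):.4f}, continuous rate {abs(kappa):.4f}")
```

For $a = 1.1$ this gives the rates 0.1 and about 0.0953, and for $a = 0.98$ the decay rates 0.02 and about 0.0202, as in the two examples. The two kinds of rate are close but not equal.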
Exercise 6. Find the continuous growth/decay rate of the functions $y = 2 \cdot 1.02^x$
and $y = 7 \cdot 0.9^x$.
The reason why in Calculus we prefer to write
exponential functions in the form $P = P_0 \cdot e^{\kappa t}$ is
that the base $e$ is quite special and has nice
properties, which will make it more convenient to
use when dealing with derivatives and integrals
later on. For example, both the derivative and
integral of $e^x$ will be $e^x$ itself. For more about
continuous rates: youtu.be/sbLWLvSfvwk.
Half-life. Many exponential decay processes are described by their half-life,
which is the interval of time $H$ required for the quantity to decrease to one-half of
its initial value. It is easy to guess the formula
$$P = P_0 \cdot \left(\frac{1}{2}\right)^{t/H} = P_0 \cdot 2^{-t/H}$$
because, in this case, $P(H)$ is equal to $P_0 \cdot \left(\frac{1}{2}\right)^{H/H} = \frac{P_0}{2}$. We can also find this
formula by taking an exponential function $P = P_0 \cdot a^t$, making sure that $P(H)$ is
equal to $\frac{P_0}{2}$, and solving for $a$:
$$P(H) = P_0 \cdot a^H = \frac{P_0}{2} \Longrightarrow a^H = \frac{1}{2} \Longrightarrow a = \left(\frac{1}{2}\right)^{1/H}.$$
Then $P = P_0 \cdot a^t = P_0 \cdot \left(\frac{1}{2}\right)^{t/H}$, which matches the formula above.
Example 7. Aspirin has a half-life of 20 minutes in a patient’s body (once absorbed
in the upper gastrointestinal tract). How long does it take for 100mg of aspirin to
be reduced to 30mg?
Solution: We want to find time $t$ such that $P = 100 \cdot \left(\frac{1}{2}\right)^{t/20} = 100 \cdot (0.5)^{t/20} = 30$.
Dividing both sides by 100 and then taking the natural logarithm of both sides:
$$(0.5)^{t/20} = 0.3 \Longrightarrow \ln (0.5)^{t/20} = \ln 0.3 \Longrightarrow \frac{t}{20} \ln 0.5 = \ln 0.3,$$
so $t = 20 \, \frac{\ln 0.3}{\ln 0.5} = 34.74$ min $\approx$ 34 minutes and 44 seconds. In the above calculation
we used the property of logarithm that
$$\ln(a^b) = b \ln(a).$$
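The calculation in Example 7 translates directly into Python. In this sketch the function name `time_to_reach` is our own; it solves $P_0 \cdot 0.5^{t/H} = P$ for $t$ exactly as in the solution above.

```python
import math

def time_to_reach(initial, target, half_life):
    """Time t at which initial * 0.5**(t / half_life) equals target."""
    return half_life * math.log(target / initial) / math.log(0.5)

# Aspirin example: 100mg reduced to 30mg with a half-life of 20 minutes.
t = time_to_reach(100, 30, 20)
minutes = int(t)
seconds = round((t - minutes) * 60)
print(f"{t:.2f} min = {minutes} min {seconds} s")   # 34.74 min = 34 min 44 s

# Plugging t back into the decay formula recovers 30mg (up to rounding).
print(100 * 0.5 ** (t / 20))
```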
Exercise 7. If it takes 60 minutes for 100mg of a drug to be reduced to 30mg in a

patient’s body, what is the half-life of this drug?
Key property of exponentials. The key property of an exponential function is
that, on any interval of the same length $h$, it changes by the same factor. Indeed,
$$P(t + h) = P_0 \cdot a^{t+h} = P_0 \cdot a^t \cdot a^h = a^h \cdot P(t).$$
So the factor $\frac{P(t+h)}{P(t)} = a^h$ depends only on $h$ but not on $t$. One way we can use this is
to find the formula for an exponential function if we know two points on its graph.
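Example 8. Find the exponential function
$y = y_0 \cdot a^x$ that passes through the points
$(2, 3)$ and $(5, 7)$.
Solution: Since $3 = y_0 \cdot a^2$ and $7 = y_0 \cdot a^5$,
we get $\frac{7}{3} = \frac{y_0 \cdot a^5}{y_0 \cdot a^2} = a^3$, so $a = \left(\frac{7}{3}\right)^{1/3}$ and
$y = y_0 \cdot \left(\frac{7}{3}\right)^{x/3}$. To find $y_0$, we can use any one
of the two points, for example, $3 = y(2) = y_0 \cdot \left(\frac{7}{3}\right)^{2/3}$,
so $y_0 = 3 \left(\frac{3}{7}\right)^{2/3}$.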
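The method of Example 8 works for any two points with different $x$-values. Below is a Python sketch; the function name `exponential_through` is our own, and we assume positive $y$-values so that the base $a$ is positive.

```python
def exponential_through(p, q):
    """Return (y0, a) such that y = y0 * a**x passes through p and q."""
    (x1, y1), (x2, y2) = p, q
    # Assumption of this sketch: distinct x-values and positive y-values.
    if x1 == x2 or y1 <= 0 or y2 <= 0:
        raise ValueError("need distinct x-values and positive y-values")
    # Over an interval of length x2 - x1 the function changes by the factor y2/y1,
    # so over an interval of length 1 it changes by the factor a.
    a = (y2 / y1) ** (1 / (x2 - x1))
    y0 = y1 / a ** x1
    return y0, a

# Points from Example 8.
y0, a = exponential_through((2, 3), (5, 7))
print(y0, a)
print(y0 * a ** 2, y0 * a ** 5)   # approximately 3 and 7
```

Numerically $a \approx 1.326$ and $y_0 \approx 1.705$, which are the decimal values of $(7/3)^{1/3}$ and $3 (3/7)^{2/3}$.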
Exercise 8. Find the exponential function $y = y_0 \cdot a^x$ that passes through the points
$(2, 5)$ and $(6, 1)$.
We can also use the above property to check if the points in a table correspond
to some exponential function.
Example 9. Do the values in the table correspond to some exponential function?
x 0 2 4 6 8
y 4 6 9 13.5 20.25
Solution: We see that the increments of $x$ in the table are all equal to $h = 2$, so the
ratio $\frac{y(x+2)}{y(x)}$ should be the same for all $x$, equal to $a^2$ if $y = y_0 \cdot a^x$. Indeed,
$$\frac{6}{4} = \frac{9}{6} = \frac{13.5}{9} = \frac{20.25}{13.5} = 1.5,$$
so the values in the table correspond to an exponential function with $a^2 = 1.5$, or
$a = \sqrt{1.5}$. We can find $y_0$ using any point in the table, for example the first one,
$4 = y(0) = y_0 \cdot a^0 = y_0$, so $y = 4 \cdot (1.5)^{x/2}$.
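The ratio test from Example 9 can be sketched in Python as follows. The function names and the tolerance `tol` are our own assumptions, and the second table in the usage lines is a hypothetical one made up for contrast. The check is only valid when the $x$-values are equally spaced, so the sketch refuses other tables.

```python
def consecutive_ratios(ys):
    """Ratios y(x + h) / y(x) of consecutive table values."""
    return [b / a for a, b in zip(ys, ys[1:])]

def looks_exponential(xs, ys, tol=1e-9):
    """True if equally spaced x-values give equal consecutive ratios of y."""
    steps = [b - a for a, b in zip(xs, xs[1:])]
    if any(abs(s - steps[0]) > tol for s in steps):
        raise ValueError("this check assumes equally spaced x-values")
    ratios = consecutive_ratios(ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)

# Table from Example 9: every ratio equals 1.5.
xs = [0, 2, 4, 6, 8]
ys = [4, 6, 9, 13.5, 20.25]
print(consecutive_ratios(ys))
print(looks_exponential(xs, ys))   # True

# A hypothetical linear table: the ratios 2 and 1.5 differ.
print(looks_exponential([0, 1, 2], [3, 6, 9]))   # False
```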
Exercise 9. Do the values in the table correspond to some exponential function?
x −5 −2 1 4 7
y 8 12 18 28 40.5
Later on we will discuss the so-called logarithmic scale, which will allow us
to observe more easily whether the points lie on the graph of some exponential
function, even in the case when the increments of $x$ are not necessarily equal.
Average exponential growth. If the rate of growth constantly changes, how do

we define the average growth rate over a given period of time? Let us answer this
by looking at the example of world population growth.
Example 10. The world population
in 1928 was 2 billion and in 1950 it
was 2.5 billion. What was the average
growth rate during this period?
Solution: If we had a constant rate of
growth $r$ over the period of 22 years,
we would have $2.5 = 2 \cdot (1 + r)^{22}$.
Solving for $r$ we get $r = (1.25)^{1/22} - 1 \approx 0.01$.
This means that the average
rate of growth was about 1%.
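Solving $\text{end} = \text{start} \cdot (1 + r)^{n}$ for $r$ is a one-line computation. The Python sketch below repeats the calculation of Example 10; the function name `average_growth_rate` is our own.

```python
def average_growth_rate(start, end, years):
    """Constant yearly rate r with end = start * (1 + r)**years."""
    return (end / start) ** (1 / years) - 1

# World population in billions: 2 in 1928 and 2.5 in 1950.
r = average_growth_rate(2.0, 2.5, 1950 - 1928)
print(f"{100 * r:.2f}% per year")   # 1.02% per year
```

Keeping one more digit than in the example shows that the average rate is slightly above 1%.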
Exercise 10. The world population in 1950 was 2.5 billion and in 1987 it was 5
billion. What was the average growth rate during this period?
Double exponentials. Functions of the form
$$y = a e^{-b e^{-ct}} \quad \text{and} \quad y = a e^{-b e^{ct}},$$
where $a > 0$, $b > 0$ and $c > 0$ are
positive constants (parameters), can
be used to describe various growth
and decay processes, such as tumour
growth or survival probabilities in a
population. We will see examples of
such models later on. Such functions,
where one exponential appears inside
another exponential, are also called
Gompertz functions. Notice that $y = a e^{-b e^{-ct}}$
is increasing and $y = a e^{-b e^{ct}}$ is decreasing with $t$. Here we will only
use these Gompertz functions to practice some basic calculations.
Example 11. What value does $y = a e^{-b e^{-ct}}$ approach as $t$ gets bigger and bigger?
Solution: First of all, as $t$ gets bigger, $e^{-ct}$ approaches 0 because it is an exponential
decay function. This means that $e^{-b e^{-ct}}$ approaches $e^{-b \cdot 0} = e^0 = 1$ and,
finally, $a e^{-b e^{-ct}}$ approaches $a$. This means that $y = a$ is the horizontal asymptote as
$t$ approaches infinity.
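The limit in Example 11 can be observed numerically. In the Python sketch below the parameter values $a = 3$, $b = 2$, $c = 0.5$ are hypothetical, chosen only for illustration, and the function name `gompertz` is our own.

```python
import math

def gompertz(t, a, b, c):
    """Increasing Gompertz function y = a * exp(-b * exp(-c * t))."""
    return a * math.exp(-b * math.exp(-c * t))

a, b, c = 3.0, 2.0, 0.5   # hypothetical parameters, not values from the text

# The values increase with t and get closer and closer to a = 3.
for t in (0, 5, 10, 20, 40):
    print(t, gompertz(t, a, b, c))
```

At $t = 0$ the value is $a e^{-b}$, and by $t = 40$ it agrees with $a$ to many decimal places, in line with the horizontal asymptote $y = a$.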
Exercise 11. At what time $t$ is the function $y = a e^{-b e^{-ct}}$ equal to $\frac{1}{2}$? For such time
to exist, what do we need to assume about $a$?
Answer to Exercise 3. $y = 1.5 \cdot (1.01)^x$, $y = 2 \cdot (0.96)^x$.
Answer to Exercise 4. $y = y_0 \cdot (0.975)^t$ so $y(100) = y_0 \cdot (0.975)^{100} = y_0 \cdot 0.0795\ldots$,
which is about 7.95% of the original amount.
Answer to Exercise 5. If the sales during the current year total $A$ dollars, time $t = 1$
denotes next year, $t = 2$ is the year after next, etc., then the sales during year $t$ will
be $A(0.99)^t$.
Answer to Exercise 6. Continuous growth rate of $y = 2 \cdot 1.02^x$ is $\kappa = \ln(1.02) = 0.0198$,
and continuous decay rate of $y = 7 \cdot 0.9^x$ is $-\kappa = -\ln(0.9) = 0.10536$.
Answer to Exercise 7. $30 = 100 \cdot 2^{-60/H}$, and solving for $H$ we get $H = -\frac{60 \ln(2)}{\ln(0.3)} \approx 34.54$ min.
See next section if you need to review properties of logarithms that are
used in such calculations.
Answer to Exercise 8. $y = 5^{3/2} \cdot 5^{-x/4} = 5^{3/2 - x/4}$.
Answer to Exercise 9. No, because all increments of $x$ are equal to 3, but the ratio
$\frac{12}{8} = 1.5$ is different from $\frac{28}{18} = 1.555\ldots$.
Answer to Exercise 10. $r = 2^{1/37} - 1 = 0.01891$ or about 1.89%. For more about
this example, see youtu.be/9_VJ2PvZBuo.
Answer to Exercise 11. There are two ways to answer this question. First, we
showed in the previous example that the top asymptote in the figure is $y = a$, so the
function takes values between 0 and $a$. If we want the function to be equal to 0.5 at
some time $t$, we must have $a > 0.5$. Another way to solve this is to set $a e^{-b e^{-ct}} = 0.5$
and start solving for $t$:
$$e^{-b e^{-ct}} = \frac{1}{2a} \Longrightarrow -b e^{-ct} = \ln \frac{1}{2a} = -\ln(2a) \Longrightarrow e^{-ct} = \frac{\ln(2a)}{b}.$$
Before we take logarithms of both sides again, we must notice that the exponential
$e^{-ct}$ on the left hand side is always positive, so the right hand side must also be
positive. Since we agreed that $b > 0$, the numerator $\ln(2a)$ must be positive. This
means that $2a$ should be bigger than 1 or, again, $a > 0.5$. So $a$ must be bigger than
0.5, otherwise, such $t$ does not exist. If $a > 0.5$ then we can take logarithms on both
sides of the last equation above to get that $t = -\frac{1}{c} \ln \frac{\ln(2a)}{b}$.
1.4 Logarithmic functions

Suppose we want to find a number $a$ such that $10^a$ is equal to 4. In other words,
we want to solve $10^a = 4$ for $a$. Such a number $a$ is called $\log(4) = 0.602\ldots$, where
the function $\log$ is called the logarithm base 10. More generally, because any positive
number $b > 0$ can be an output of the exponential function $10^x$, we can also solve
$10^a = b$ to find $a$ given any $b > 0$. Such $a$ is denoted $\log(b)$. To summarize:
$$b = 10^a \Longleftrightarrow a = \log(b).$$
If base 10 in the exponent is replaced by Euler’s number $e = 2.71828\ldots$,
$$b = e^a \Longleftrightarrow a = \ln(b),$$
then $\log$ is replaced by the natural logarithm $\ln(b)$, again assuming that $b > 0$.
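Both logarithms are available in Python's standard `math` module, where `math.log10` is the logarithm base 10 and `math.log` is the natural logarithm. The following short sketch checks the two equivalences above for $b = 4$; the variable names are our own.

```python
import math

b = 4

a10 = math.log10(b)   # solves 10**a == b
ae = math.log(b)      # solves e**a == b (natural logarithm)

print(a10, 10 ** a10)      # about 0.602, and back to 4 up to rounding
print(ae, math.exp(ae))    # about 1.386, and back to 4 up to rounding
```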
Remark. One can similarly define a logarithm with any positive base other than 1, but we will
only use $\log(x)$ and $\ln(x)$. In fact, most of the time we will use $\ln(x)$, and even
$\log(x)$ might appear only occasionally. We mentioned in the previous section that
the exponential function $e^x$ with the base $e$ is the most commonly used exponential
function in Calculus because of its nice properties (that we will learn later on).
As a consequence, the natural logarithm $\ln(x)$ is the most commonly used logarithmic
function and, by default, ‘logarithm’ refers to $\ln(x)$.
Basic properties of logarithms. Let us take a look at the graph of y = ln(x) and
discuss some of its properties after stating them first.
• The graph of $\ln(x)$ is the graph of $e^x$ flipped around the diagonal $y = x$.
• $\ln(x)$ is defined only for positive values $x > 0$.
• $\ln(x)$ approaches $-\infty$ when $x$ approaches 0 (vertical asymptote).
• $\ln(1) = 0$, i.e. the x-intercept is 1.
• $\ln(x)$ slowly approaches $+\infty$ as $x$ goes to infinity.

• The first property means that if $(a, b)$ is on the graph of $y = e^x$ then $(b, a)$ is on
the graph of $y = \ln(x)$ (the two coordinates are flipped). This is true because,
if $(a, b)$ is on the graph of $y = e^x$, this means that $b = e^a$, which means that
$a = \ln(b)$, which means that $(b, a)$ is on the graph of $y = \ln(x)$.
• The second property is true because $e^a$ takes only positive values, so we can
solve $e^a = x$ for $a$ only if $x$ is positive. This is a good time to mention the
terminology of the domain and range of a function $y = f(x)$. Domain means the
set of all allowed inputs of a function, and range means the set of all possible
outputs. For example, the domain of $e^x$ is the set of all real numbers $\mathbb{R} = (-\infty, \infty)$,
while the range is the set of all positive numbers $(0, \infty)$. For $\ln(x)$, it is exactly
the opposite: the domain is $(0, \infty)$ and the range is $\mathbb{R}$.
• The third property we can see from the graph, but we can also think of it this
way. If $x > 0$ is small and $e^a = x$ then $a = \ln(x)$ must be a large negative number,
because the exponential growth function $e^a$ takes small values $x$ when the input
$a$ is approaching negative infinity.
• The fourth property is clear because $e^0 = 1$ implies that $0 = \ln(1)$.
• The last property has two parts: the first is that $\ln(x)$ never stops growing (so it
does not have a horizontal asymptote!) and the second is that it grows slowly.
First, why will $\ln(x)$ eventually be equal to any large number we want, for
example, 100? That is because $\ln(x) = 100$ means that $x = e^{100}$ and, in this case,
‘eventually’ simply means $e^{100}$. This also illustrates how slowly the logarithm
grows. Even though it will reach 100, we have to ‘wait’ until $e^{100}$, which is
approximately equal to 26881171418161354484126255515800135873611118.
Example 1. Can the values in the following table correspond to an exponential or
logarithmic function?
x   −1   1   3   6   8     11
y   0    2   4   5   5.5   5.9
Solution: The answer is no to both. It cannot be a logarithm because we cannot
plug a negative number $x = -1$ into a logarithm (it is not in the domain). It
cannot be an exponential because an exponential cannot take the value $y = 0$ (it is
not in the range).
Exercise 1. What is the domain of $y = \ln(-x)$? What does its graph look like?
Algebraic properties of logarithms. From the definition above, one can derive
several important algebraic properties of the logarithm:
$$\ln(ab) = \ln(a) + \ln(b), \qquad \ln\left(\frac{a}{b}\right) = \ln(a) - \ln(b),$$
$$\ln(a^c) = c \cdot \ln(a), \qquad \ln(e^x) = x, \qquad e^{\ln x} = x.$$
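These identities can be spot-checked numerically. The sketch below is only a sanity check at a few arbitrary positive values (the numbers 3, 5, 2.5 and 1.7 are assumptions chosen for illustration), not a proof.

```python
import math

a, b, c, x = 3.0, 5.0, 2.5, 1.7   # arbitrary positive test values

# Each entry pairs the left-hand side with the right-hand side of one rule.
checks = {
    "ln(ab) = ln(a) + ln(b)": (math.log(a * b), math.log(a) + math.log(b)),
    "ln(a/b) = ln(a) - ln(b)": (math.log(a / b), math.log(a) - math.log(b)),
    "ln(a^c) = c ln(a)": (math.log(a ** c), c * math.log(a)),
    "ln(e^x) = x": (math.log(math.exp(x)), x),
    "e^(ln x) = x": (math.exp(math.log(x)), x),
}

for name, (lhs, rhs) in checks.items():
    print(name, math.isclose(lhs, rhs))

all_hold = all(math.isclose(lhs, rhs) for lhs, rhs in checks.values())
```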
Example 2. Find $x$ such that $4^{2x} = 7 \cdot 5^{-x/4}$.
Solution: Take the logarithm of both sides, $\ln(4^{2x}) = \ln(7 \cdot 5^{-x/4})$, and then apply the
above rules:
$$2x \ln(4) = \ln(7) + \ln(5^{-x/4}) = \ln(7) - \frac{x}{4} \ln(5).$$
This is now a linear equation in $x$, so group all the terms with $x$ on one side:
$$\ln(7) = 2x \ln(4) + \frac{x}{4} \ln(5) = x \left( 2 \ln(4) + \frac{\ln(5)}{4} \right) = x \cdot \frac{8 \ln(4) + \ln(5)}{4},$$
so $x = \frac{4 \ln(7)}{8 \ln(4) + \ln(5)} = 0.612895\ldots$.
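As a quick check of Example 2, one can plug the closed-form answer back into both sides of the original equation. This is a small Python sketch; the variable names are illustrative.

```python
import math

# Closed-form solution from Example 2.
x = 4 * math.log(7) / (8 * math.log(4) + math.log(5))

# Both sides of the original equation 4^(2x) = 7 * 5^(-x/4).
lhs = 4 ** (2 * x)
rhs = 7 * 5 ** (-x / 4)

print(x)          # about 0.612895
print(lhs, rhs)   # the two sides agree up to rounding
```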
Exercise 2. If the room temperature is 20°C and the temperature of a cup of coffee
is 80°C at time t = 0 then the coffee will cool down according to the formula
$T = 20 + 60 \cdot e^{-\kappa t}$ for some constant $\kappa$. At what time will the temperature reach
70°C? Your answer may depend on κ.
Example 3. Simplify $y = 3e^{-2 \ln x}$.
Solution: We can write $3e^{-2 \ln x} = 3(e^{\ln x})^{-2} = 3x^{-2} = \frac{3}{x^2}$. We can also take different
steps: $3e^{-2 \ln x} = 3e^{\ln(x^{-2})} = 3x^{-2} = \frac{3}{x^2}$.
Exercise 3. Simplify $y = 4e^{-3 \ln(1/x)}$.
Example 4. Suppose that the temperature $T = T(t)$ of a cup of coffee is initially
$T(0) = 90$°C and, if the room temperature is 20°C, it is decreasing according to
the equation $-\ln(T - 20) = 0.1t + c$. Find $T(t)$.
Solution: First, we plug $T(0) = 90$ into the equation, $-\ln(90 - 20) = 0.1 \cdot 0 + c$,
which gives $c = -\ln(70)$. The equation becomes $-\ln(T - 20) = 0.1t - \ln(70)$
or $\ln(T - 20) = -0.1t + \ln(70)$. Exponentiating both sides, we get
$T - 20 = e^{-0.1t + \ln(70)} = e^{-0.1t} e^{\ln(70)} = 70e^{-0.1t}$. Finally, $T = 20 + 70e^{-0.1t}$.
Exercise 4. Suppose that the number of people $N = N(t)$ that have heard a rumour
spreading at a party is initially $N(0) = 1$ and, if there are 100 people attending the
party, it is increasing according to the equation
$\frac{1}{100} \ln\frac{N}{100 - N} = \frac{t}{50} + c$. Find $N(t)$.
Logarithmic scales. There are many quantities that are conventionally mea-
sured on logarithmic scales when the original (more physical) measurement can
cover a wide range of values of very different orders of magnitude, from very
small to very large. We will give three examples below.
The Richter magnitude $R$ of an earthquake is defined as
$$R = \log\frac{A}{A_0},$$
where $A$ is the maximum amplitude (measured in millimetres) recorded on a standard
seismograph (the Wood-Anderson seismograph) at a distance of 100 km from
the earthquake epicentre, and $A_0 = 0.001$ mm is the amplitude corresponding to
the so-called ‘standard earthquake’. The amplitude $A$ is one empirical parameter
describing the strength of the earthquake. We can rewrite the above formula as
$$A = A_0 \cdot 10^R = 0.001 \cdot 10^R = 10^{R-3}.$$

This means that if the Richter magnitude $R$ increases by 1, the amplitude $A$ (or the
strength of the earthquake) increases 10 times. For example, a magnitude 9 earthquake
would correspond to $A = 10^6$ mm $= 1$ km, which means that in practice the
measurements are not as simple as the definition suggests.
Example 5. How much stronger is the earthquake of magnitude 5.8 compared
to the earthquake of magnitude 5.3? Is it $\frac{5.8}{5.3} \approx 1.094$ times stronger?
Solution: Because
$$\frac{A(5.8)}{A(5.3)} = \frac{A_0 \cdot 10^{5.8}}{A_0 \cdot 10^{5.3}} = 10^{0.5} = \sqrt{10},$$
the earthquake of magnitude 5.8 is $\sqrt{10} \approx 3.16$ times stronger than the earthquake
of magnitude 5.3. It is not $\frac{5.8}{5.3} \approx 1.094$ times stronger, because the Richter
magnitude measures strength on the logarithmic scale.
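The comparison in Example 5 can be sketched in a few lines of Python. The helper name `amplitude_ratio` is hypothetical; it only uses the relation $A = A_0 \cdot 10^R$ from the text, in which $A_0$ cancels.

```python
def amplitude_ratio(r1, r2):
    """Ratio A(r1) / A(r2) of amplitudes for Richter magnitudes r1 and r2.

    Uses A = A0 * 10**R, so the reference amplitude A0 cancels out.
    """
    return 10 ** (r1 - r2)

ratio = amplitude_ratio(5.8, 5.3)   # correct comparison, on the log scale
naive = 5.8 / 5.3                   # misleading ratio of the magnitudes

print(round(ratio, 2))   # 3.16
print(round(naive, 3))   # 1.094
```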
Exercise 5. How much stronger is the earthquake of magnitude 7.1 compared to
the earthquake of magnitude 5?
In chemistry, the pH scale is used to measure the acidity or alkalinity of a solution
in water according to the formula
$$\mathrm{pH} = \log\frac{1}{H^+} = -\log H^+,$$
where $H^+$ is the number of moles of hydrogen ions per litre of solution.

In acoustics, sound pressure level $L$ is measured in decibels (dB) according to
the formula
$$L = 20 \cdot \log\frac{p}{p_0},$$
where $p$ is the sound pressure measured in pascals (Pa) and $p_0 = 20\ \mu\mathrm{Pa} =
20 \cdot 10^{-6}$ Pa is the reference sound pressure considered as the threshold of human
hearing (according to Wikipedia, roughly the sound of a mosquito flying 3 m
away).
Example 6. If your earphones can output 110 dB and your friend’s earphones can
output 100 dB, how much more damage can you do to your ears? Is it only 10%
more?
Solution: If $p_1$ is the maximum sound pressure of your earphones and $p_2$ is the
maximum sound pressure of your friend’s earphones then the above formula gives
$$110 = 20 \cdot \log\frac{p_1}{p_0}, \qquad 100 = 20 \cdot \log\frac{p_2}{p_0}.$$
From here we can solve it in two ways. First, we can subtract the two equations
and use properties of the logarithm,
$$110 - 100 = 20 \cdot \log\frac{p_1}{p_0} - 20 \cdot \log\frac{p_2}{p_0} = 20 \cdot \log\frac{p_1}{p_2},$$
which implies that $\log\frac{p_1}{p_2} = \frac{10}{20} = 0.5$, so $\frac{p_1}{p_2} = \sqrt{10} \approx 3.16$. This means that your
earphones are 3.16 times as noisy in terms of sound pressure. Again, it is not just
10% more, because the dB measurement is on the logarithmic scale. Another way
is first to solve the above pressure level equation for $p$,
$$p = p_0 \cdot 10^{L/20},$$
which gives that $p_1 = p_0 \cdot 10^{110/20}$ and $p_2 = p_0 \cdot 10^{100/20}$. Dividing the two equations,
we again get $\frac{p_1}{p_2} = 10^{110/20 - 100/20} = \sqrt{10}$.
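The second approach in Example 6 translates directly into code. The sketch below assumes the reference pressure $p_0 = 20 \cdot 10^{-6}$ Pa from the text; the function names are hypothetical.

```python
import math

P0 = 20e-6  # reference sound pressure in Pa (threshold of hearing, from the text)

def pressure_from_level(level_db):
    """Invert L = 20 * log10(p / p0) to get p = p0 * 10**(L / 20)."""
    return P0 * 10 ** (level_db / 20)

def level_from_pressure(p):
    """Sound pressure level in dB for a pressure p given in Pa."""
    return 20 * math.log10(p / P0)

p1 = pressure_from_level(110)   # your earphones
p2 = pressure_from_level(100)   # your friend's earphones
ratio = p1 / p2

print(round(ratio, 2))                     # 3.16
print(round(level_from_pressure(p1), 6))   # 110.0
```

Converting back with `level_from_pressure` is a simple way to confirm that the two formulas are inverses of each other.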
Exercise 6. If the sound pressure level of a jackhammer is 100 dB and of the jet
engine is 140 dB, how much louder is the jet engine in terms of sound pressure?
Answer to Exercise 1. Because we can only plug positive numbers into a logarithm,
$-x$ must be positive, so $-x > 0$, or $x < 0$. The domain is all negative numbers,
$(-\infty, 0)$. The graph will be the same as $\ln(x)$ flipped around the y-axis. It is
always the case that the graph of $y = f(-x)$ is the graph of $y = f(x)$ flipped around the
y-axis.
Answer to Exercise 2. Set $70 = 20 + 60 \cdot e^{-\kappa t}$, so $e^{-\kappa t} = \frac{5}{6}$, and taking logarithms,
$-\kappa t = \ln\frac{5}{6} = -\ln\frac{6}{5}$ or $t = \frac{1}{\kappa} \ln\frac{6}{5} = \frac{\ln 1.2}{\kappa}$.
Answer to Exercise 3. $y = 4x^3$.
Answer to Exercise 4. Plugging in $t = 0$ and $N = 1$ we get that $c = \frac{1}{100} \ln\frac{1}{99}$. Then
$$\frac{1}{100} \ln\frac{N}{100 - N} = \frac{t}{50} + \frac{1}{100} \ln\frac{1}{99} \implies \ln\frac{N}{100 - N} = 2t + \ln\frac{1}{99}$$
$$\implies \frac{N}{100 - N} = \frac{1}{99} e^{2t} \implies 99N = 100e^{2t} - Ne^{2t}$$
$$\implies N(99 + e^{2t}) = 100e^{2t} \implies N = \frac{100e^{2t}}{99 + e^{2t}}.$$
Answer to Exercise 5. $10^{2.1} \approx 126$ times.
Answer to Exercise 6. $10^{140/20 - 100/20} = 10^2 = 100$.

1.5 Logarithmic scales

In the last section we have seen several examples of measurements on logarithmic
scales, and in this section we will look at logarithmic scales from a different angle.
Namely, we will use log-transformations to help us decide if some data points
follow an exponential trend of the form $y = y_0 \cdot a^x$ or a power function trend of the
form $y = c \cdot x^\kappa$.
Log scale. Let us begin with an exponential trend of the form $y = y_0 \cdot a^x$.
Example 1. Bacteria are grown in a petri dish and their surface area $A$ (cm$^2$) is
measured at various times $t$ (days).
t 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
A 2.5 2.6 3.9 5.3 5.9 7.0 9.4 12.1 14.4 17.6
The values are given in the above table. It is natural
to assume that the growth is exponential, but it may
be hard to tell just by looking at the graph of these
points (shown in blue in the figure below). Before we
discuss how log-transformation can be used to see
the exponential trend more clearly, let us first mention
how to plot the exponential trend line (shown
in red in the figure) in Google Sheets.

• Under Type select Exponential.
• Under Label choose Use Equation.
The exponential trend line $A = 1.98e^{1.1t}$ looks like a good fit, but how could we
expect this just by looking at data points? One way is to transform the dependent
variable, in this case the area $A$, into $\log(A)$ or $\ln(A)$. That is because if $A = A_0 \cdot a^t$
then, taking logarithms,
$$A = A_0 \cdot a^t \implies \log(A) = \log(A_0) + (\log a) \cdot t.$$
For example, if $A = 3.5 \cdot 2^t$ then $\log(A) = \log(3.5) + \log(2) \cdot t = 0.544 + 0.301 \cdot t$.

This means that log(A) is a linear function of t and, of course, it is easier to see
visually if a function is linear. For example, let us add log(A) values to the above
table:
t 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
A 2.5 2.6 3.9 5.3 5.9 7.0 9.4 12.1 14.4 17.6
log(A) 0.39 0.42 0.59 0.73 0.77 0.84 0.97 1.08 1.16 1.24
If we plot the values log(A) in the

last row vs. time t (shown in blue in
the figure), we can see with the naked
eye that these points follow a linear
trend. By the way, in Google Sheets
this figure can be produced from the
above figure by a simple extra step:
• Under Customize select Vertical
axis and select option Log scale.
Remark. If you look at the y-axis in this graph, you might notice something
strange: the increment between 4 and 2 looks bigger than the increment between 6
and 4 which looks bigger than the increment between 8 and 6, etc. This is because
the values 2, 4, 6, 8 etc. marked on the y-axis are the values of the original area A,
while the actual values plotted on the graph are log(2), log(4), log(6), log(8) etc.
This is just a convention with log-scale plots that allows us to see possible linear
trend of log(A) vs. t, while at the same time see the original values A.
Exercise 1. Bacteria are grown in a petri dish and their surface area $A$ (cm$^2$) is
measured at various times $t$ (days):
t 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
A 1.9 2.2 3.3 4.2 4.9 7.8 9.4 11.7 15.8 20.4
Plot the data on log-scale and fit the exponential trend line.
To summarize the above discussion, if on the log-scale A is a linear function of t
(or approximately follows a linear trend) then A is an exponential function of t (or
approximately follows an exponential trend). Using formulas:
$$\log(A) = b + m \cdot t \implies A = 10^{b + m \cdot t} = 10^b \cdot (10^m)^t.$$
Although base 10 is commonly used in log-scale plots, the same would hold if we
used base $e$ and natural logarithm:
$$\ln(A) = b + m \cdot t \implies A = e^{b + m \cdot t} = e^b \cdot e^{mt}.$$
Notice that in both cases if the slope on the log-scale is positive, m > 0, then we
have an exponential growth on the original scale and, if the slope on the log-scale
is negative, m < 0, then we have an exponential decay on the original scale.
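The log-transformation idea can be sketched in code using the data of Example 1: fit a straight line to $\ln(A)$ versus $t$ by ordinary least squares, then convert the line back to an exponential. This is a minimal illustration in plain Python; the helper `fit_line` is hypothetical, and whether a spreadsheet fits its trend line in exactly this way is an assumption. The result should land in the neighbourhood of the trend line quoted in Example 1, though not necessarily digit for digit.

```python
import math

# Data from Example 1: time t (days) and surface area A (cm^2).
t = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
A = [2.5, 2.6, 3.9, 5.3, 5.9, 7.0, 9.4, 12.1, 14.4, 17.6]

def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Linear fit on the log scale: ln(A) = b + m * t.
m, b = fit_line(t, [math.log(a) for a in A])

# Back on the original scale: A = e^b * e^(m t).
A0 = math.exp(b)
print(f"A is roughly {A0:.2f} * exp({m:.2f} * t)")
```

A positive slope `m` on the log scale corresponds to exponential growth, in line with the remark above.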
Example 2. Find $y$ as a function of $x$ if (a) $\log(y) = 2 + \frac{x}{2}$, (b) $\ln(y) = 1 - 2x$. What
is the continuous growth/decay rate in both cases?
Solution: (a) $y = 10^{2 + x/2} = 10^2 \cdot 10^{x/2} = 100 \cdot (10^{1/2})^x$. This is an exponential
growth function with the base $10^{1/2} \approx 3.1623$ and the continuous growth rate
$\ln(10^{1/2}) = \frac{1}{2} \ln(10) \approx 1.151$.
(b) $y = e^{1 - 2x} = e \cdot e^{-2x}$. This is an exponential decay function with the
continuous decay rate 2.
Exercise 2. Find $y$ as a function of $x$ if on the log-scale it is given by $-1 - \frac{2x}{3}$. What
is the continuous growth/decay rate?
Log-log scale. Next we will discuss how to use logarithmic scales to observe a
trend given by the power function of the form $y = c \cdot x^\kappa$. In this case we will be
using log-log scale, because both input variable and output variable will be
transformed by the logarithm. There are many examples of a power law relationship
between variables: see, for example, wikipedia.org/wiki/Power_law. We
will start by explaining the idea on a simple example.
Example 3. Consider the following data points (x, y) shown in blue in the figure
below:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
y 0.003 0.024 0.070 0.196 0.347 0.607 0.934 1.656 2.178 2.941
We can see that the power function $y = 2.9 \cdot x^{3.03}$ (shown in red) is a good fit, and
it can be found in the same way as above, only selecting Power Series instead of
Exponential under the trend line type.

• Under Type select Power Series.
• Under Label choose Use Equation.
How can we guess that a power function might be a good fit? This can be done
by transforming both variables $x$ and $y$ into $\log(x)$ and $\log(y)$. The reason is that
if $y = c \cdot x^\kappa$ then, taking logarithms,
$$y = c \cdot x^\kappa \implies \log(y) = \log(c) + \kappa \cdot \log(x).$$
For example, if $y = 3.5 \cdot x^2$ then $\log(y) = \log(3.5) + 2 \cdot \log(x) = 0.544 + 2 \cdot \log(x)$.

This means that log(y) is a linear function of log(x) and, again, it should be easier
to see visually if a function is linear. For example, let us replace all the values in
the above table by their logarithms:
log(x) -1.00 -0.70 -0.52 -0.40 -0.30 -0.22 -0.15 -0.10 -0.05 0.00
log(y) -2.46 -1.68 -1.06 -0.75 -0.42 -0.23 0.02 0.20 0.34 0.46
If we plot the values log(y) vs.

log(x) (shown in blue in the figure),
we can see with the naked eye that
these points follow a linear trend
(shown in red). To produce this fig-
ure in Google Sheets:
• Select Log scale option under
Vertical axis.
• Select Log scale option under
Horizontal axis.
Again, notice that the labels on the two axes are from the first (x, y) table above,
although the actual plot uses log(x) and log(y).
Exercise 3. Consider the following data points (x, y) shown in blue in the figure
below:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
y 0.003 0.021 0.060 0.134 0.277 0.488 0.718 1.410 1.851 2.541
Plot the data on the log-log-scale and fit the power trend line.
To summarize the above discussion, if on the log-log-scale y is a linear function
of x (or approximately follows a linear trend) then y is a power function of x (or
approximately follows a power trend). Using formulas:
$$\log(y) = b + \kappa \cdot \log(x) \implies y = 10^{b + \kappa \cdot \log(x)} = 10^b \cdot x^\kappa.$$
Although base 10 is commonly used in log-scale plots, the same would hold if we
used base $e$ and natural logarithm:
$$\ln(y) = b + \kappa \cdot \ln(x) \implies y = e^{b + \kappa \cdot \ln(x)} = e^b \cdot x^\kappa.$$
Notice that in both cases the slope $\kappa$ of the line on the log-log-scale is the power in
$c \cdot x^\kappa$ on the original scale. We can see in the figure that the behaviour of such power
functions is quite different depending on whether $\kappa < 0$, $0 < \kappa < 1$, or $\kappa > 1$. As a
result, the slope of a linear trend on the log-log scale tells us about the shape of the
function on the original scale.
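The log-log idea can be sketched the same way with the data of Example 3: fit a straight line to $\ln(y)$ versus $\ln(x)$ and read off the power $\kappa$ as the slope. This is a plain-Python illustration with hypothetical variable names; the fitted values should come out close to, but not necessarily identical to, the trend line quoted in Example 3.

```python
import math

# Data from Example 3.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y = [0.003, 0.024, 0.070, 0.196, 0.347, 0.607, 0.934, 1.656, 2.178, 2.941]

# Transform both variables by the natural logarithm.
lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]

# Ordinary least-squares line ln(y) = b + kappa * ln(x).
n = len(lx)
mean_lx = sum(lx) / n
mean_ly = sum(ly) / n
kappa = (sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
         / sum((u - mean_lx) ** 2 for u in lx))
b = mean_ly - kappa * mean_lx

# Back on the original scale: y = e^b * x^kappa.
c = math.exp(b)
print(f"y is roughly {c:.2f} * x^{kappa:.2f}")
```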
Example 4. Find $y$ as a function of $x$ if (a) $\log(y) = 2 + \frac{1}{2} \log(x)$, (b) $\ln(y) =
1 - 2 \ln(x)$.
Solution: (a) $y = 10^{2 + \frac{1}{2} \log(x)} = 10^2 x^{1/2} = 100 \cdot \sqrt{x}$. (b) $y = e^{1 - 2 \ln(x)} = e \cdot x^{-2} = \frac{e}{x^2}$.
Exercise 4. Find $y$ as a function of $x$ if on the log-log-scale their relationship is
linear with y-intercept $-1$ and slope $-\frac{2}{3}$.
Exercise 5. Match the lines 1, 2, 3 on the log-log-scale to the power functions
A, B,C on the original scale.
Finally, let us take a look at one example of application of the log-log-scale in biology.
Here is an example from the paper¹. The figure shows log-log-plots of body mass $M$
(in kg) vs. body length $L$ (in m) of 66 plant species (white dots) and 67 animal species
(black dots). This data shows that the mass of animals follows the power trend of the
form $M = c \cdot L^{2.81}$ and the mass of plants follows the power trend of the form
$M = c \cdot L^{2.95}$, so the slopes of the two lines in the figure are 2.81 and 2.95. Since
mass $M$ is proportional to volume $V$, which should be roughly proportional to $L^3$, it
seems natural that the law should be $M = c \cdot L^3$; indeed, both exponents 2.81 and 2.95
are close to 3. However, things are not so simple. The same paper mentions that the law
for 20 year old human males is closer to $M = c \cdot L^5$ and another paper² shows that
within the same species of plants the law is closer to $M = c \cdot L^4$.
¹ “The scaling of plant and animal body mass, length, and diameter” by K.J. Niklas.
² “Invariant scaling relationships for interspecific plant biomass production rates and body size” by K.J.
Niklas and B.J. Enquist.
Exercise 6. The figure below is from a New York Times article about bear markets³.
On the figure it says that “the vertical scale is adjusted so that percentage
changes are comparable”. What is the vertical scale, and why does it make
percentage changes comparable?
³ https://www.nytimes.com/2022/06/13/business/bear-market-timeline-stocks.html
Summary of log-scales. To summarize why log-scales are so useful, let us compare
the Dow Jones index on the original scale and on the logarithmic scale.
• Many quantities take values on vastly different scales. For example, for several
decades the Dow Jones was below or around 100, while in the last two decades
it was around or above 10000. In addition to examples from the last section
(strength of earthquakes, acidity or alkalinity of a solution in water, sound
pressure), other examples include mass and size of plants, or luminosities of stars⁴.
In the first figure above we can see that, if we plot values of different magnitude
on the same graph, we can barely distinguish the values that are relatively small.
For example, we can barely see what is going on with the Dow Jones between
1900 and 1980. The logarithm grows slowly, so it can make very small and very
large values comparable to each other.
• As we discussed in this section, log-scale transforms an exponential function
into a linear one. If we observe a linear trend on the log-scale, it suggests an
exponential trend on the original scale. Although the Dow Jones index has large
fluctuations, we can see that long term it follows a roughly linear trend on the
log-scale.
• As we saw in the last exercise, log-scale makes relative changes comparable.
For example, when we look at the Dow Jones on the original scale, we might
think that the Financial crisis of 2007–2008 was the worst stock market crash,
while on the log-scale we can clearly see that the Wall Street Crash of 1929 was
much worse.
• The same considerations apply to the log-log-scale, except that it turns a power
trend into a linear one.
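The point about relative changes can be illustrated with a short sketch: the same percentage change produces the same vertical step on a log-scale, whatever the starting value. The index levels 100 and 10000 echo the Dow Jones discussion above, while the 30% drop and the function name are hypothetical choices for illustration.

```python
import math

def log_increment(price, r):
    """Vertical step on a log-scale plot after a relative change r.

    The price goes from `price` to `price * (1 + r)`.
    """
    return math.log10(price * (1 + r)) - math.log10(price)

drop = -0.30  # hypothetical 30% fall

# On the original scale the two falls look very different in size ...
points_lost_small = 100 * -drop        # about 30 points
points_lost_large = 10_000 * -drop     # about 3000 points

# ... but on the log-scale they are the same step, log10(1 + r).
step_small = log_increment(100, drop)
step_large = log_increment(10_000, drop)

print(points_lost_small, points_lost_large)
print(step_small, step_large, math.log10(1 + drop))
```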
Answer to Exercise 1. The exponential trend line is $A = 1.44e^{1.33t}$.
Answer to Exercise 2. $y = 10^{-1 - 2x/3} = 10^{-1} \cdot 10^{-2x/3} = 0.1 \cdot (10^{-2/3})^x$. This is
an exponential decay function with the base $10^{-2/3} \approx 0.2154$ and the continuous
decay rate $-\ln(10^{-2/3}) = \frac{2}{3} \ln(10) \approx 1.535$.
Answer to Exercise 3. The power function trend line is $y = 2.33 \cdot x^{2.96}$.
Answer to Exercise 4. $y = 10^{-1 - \frac{2}{3} \log(x)} = 10^{-1} x^{-2/3} = \frac{0.1}{x^{2/3}}$.
Answer to Exercise 5. A3, B1, C2.
Answer to Exercise 6. This is a log-scale. In other words, the y-axis reflects the
logarithm of the S&P 500 price P(t). If the stock market changed by 100 · r%
between time t and t + ∆t then the price P(t + ∆t) = P(t) · (1 + r). After taking
logarithms, we get log P(t + ∆t) − log P(t) = log(1 + r), and since the graph shows
y(t) = log P(t), this means that y(t + ∆t) − y(t) = log(1 + r). In other words, the
same increment on the log-scale corresponds to the same percentage change. That
is why log-scale makes percentage changes comparable no matter what the actual
value is.
4 https://en.wikipedia.org/wiki/Hertzsprung-Russell_diagram
1.6 Trigonometric functions

Basic transformations of functions. First, let us discuss what happens to the graph
of a function $y = f(x)$ when we multiply and add a constant to its input and output:
$$y = b + a f\big(c(x - d)\big)$$
for some constants $a$, $b$, $c$ and $d$. Let us break this down into steps. At the same time,
you can visualize by dragging sliders in this app: www.geogebra.org/m/csuqnyhc.
• Vertical stretching and flipping: y = ±a · f (x).

Multiplying the output f (x) of a function by a constant a stretches the graph
by a factor of a in the vertical direction, since output is depicted on the y-axis.
For example, the graph of y = 2 f (x) is stretched two times vertically, and the
graph of y = 0.5 f (x) is stretched 0.5 times (or shrunk 2 times) vertically. If we
multiply by −a, the graph is also flipped upside down.
• Horizontal shrinking and flipping: y = f (±c · x).
Multiplying the input x of a function by a constant c shrinks the graph c times in
the horizontal direction. For example, the graph of y = f (2x) is shrunk 2 times
horizontally, and the graph of y = f (0.5x) is shrunk 0.5 times (or stretched 2
times) horizontally. This might sound a bit counterintuitive, but it is true. For
example, if c = 2 then a point (2x, f (2x)) on the original graph of y = f (x)
becomes (x, f (2x)) on the graph of y = f (2x), so the first coordinate 2x becomes
x, so it is shrunk by a factor of 2. If we multiply by −c, the graph is also flipped
horizontally around the y-axis.
• Shifting up or down: y = f (x) ± b.
Adding b to or subtracting b from f (x) simply shifts the graph up or down by b.
• Shifting right or left: y = f (x ± d).
Subtracting d from the input x shifts the graph to the right by d. Again, this
might sound counterintuitive, but a point (x − d, f (x − d)) becomes (x, f (x − d))
so x − d is shifted to the right by d. Adding d shifts the graph to the left by d.
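The four steps above can be sketched as a small Python helper that builds $g(x) = b + a f(c(x - d))$ from any function $f$. The name `transform` and the choice of sine as a test function are assumptions made for illustration. The checks follow the two "counterintuitive" arguments: a point $(x_0, f(x_0))$ of the original graph ends up at $(x_0 + d, f(x_0))$ after a shift to the right and at $(x_0 / c, f(x_0))$ after a horizontal shrink.

```python
import math

def transform(f, a=1.0, b=0.0, c=1.0, d=0.0):
    """Return the function g with g(x) = b + a * f(c * (x - d))."""
    def g(x):
        return b + a * f(c * (x - d))
    return g

f = math.sin                     # any function works; sine is an arbitrary choice

shifted = transform(f, d=2)      # graph moved to the right by 2
shrunk = transform(f, c=2)       # graph shrunk horizontally 2 times
raised = transform(f, a=3, b=1)  # stretched vertically 3 times, then moved up by 1

x0 = 0.7
moved_right = math.isclose(shifted(x0 + 2), f(x0))   # point moved to x0 + 2
moved_closer = math.isclose(shrunk(x0 / 2), f(x0))   # point moved to x0 / 2
scaled = math.isclose(raised(x0), 1 + 3 * f(x0))

print(moved_right, moved_closer, scaled)
```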
Example 1. If we stretch the graph of $y = f(x)$ both horizontally and vertically
2 times, shift it up by 1 and shift it to the left by 2, what function will this graph
correspond to?
Solution: $y = 2f(x)$ stretches vertically, then $y = 2f\left(\frac{x}{2}\right)$ stretches horizontally
(shrinking by a factor of $\frac{1}{2}$ is stretching by a factor of 2), then $y = 1 + 2f\left(\frac{x}{2}\right)$ shifts
up by 1, and finally $y = 1 + 2f\left(\frac{x+2}{2}\right)$ shifts to the left by 2.
Exercise 1. If we shrink the graph of y = f (x) vertically 2 times and shrink it

horizontally 3 times, shift it down by 1 and shift it to the right by 7, what function
will this graph correspond to?
Example 2. If we shift the graph of $y = f(x)$ up by 1 and to the left by 2, and then
stretch it both horizontally and vertically 2 times, what function will this graph
correspond to?
Solution: $y = 1 + f(x)$ shifts up by 1, then $y = 1 + f(x + 2)$ shifts to the left by 2,
then $y = 2(1 + f(x + 2))$ stretches vertically 2 times and, finally, $y = 2\left(1 + f\left(\frac{x}{2} + 2\right)\right)$
stretches horizontally 2 times (shrinking by a factor of $\frac{1}{2}$ is stretching by a factor
of 2). Notice: compared to Example 1 we simply changed the order, but we got a
very different answer.
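To see concretely that the order matters, the two answers can be compared for a specific function. In the sketch below the cubic $f(x) = x^3 - x$ is an arbitrary test function (an assumption, not from the text). Each formula is also checked against the geometric description: a point $(p, f(p))$ of the original graph should land at $(2p - 2, 2f(p) + 1)$ in Example 1 and at $(2(p - 2), 2(f(p) + 1))$ in Example 2.

```python
import math

def f(x):
    return x ** 3 - x            # arbitrary test function

def example1(x):
    # Stretch 2 times in both directions, then shift up by 1 and left by 2.
    return 1 + 2 * f((x + 2) / 2)

def example2(x):
    # Shift up by 1 and left by 2, then stretch 2 times in both directions.
    return 2 * (1 + f(x / 2 + 2))

p = 1.5                          # a point (p, f(p)) on the original graph

# Example 1: stretching gives (2p, 2f(p)), shifting gives (2p - 2, 2f(p) + 1).
ok1 = math.isclose(example1(2 * p - 2), 2 * f(p) + 1)
# Example 2: shifting gives (p - 2, f(p) + 1), stretching doubles both coordinates.
ok2 = math.isclose(example2(2 * (p - 2)), 2 * (f(p) + 1))

print(ok1, ok2)
print(example1(0.0), example2(0.0))   # 1.0 14.0
```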
Exercise 2. If we shift the graph of y = f (x) down by 1 and to the right by 7, and
then stretch it vertically 2 times and shrink it horizontally 3 times, what function
will this graph correspond to?
Example 3. If the graph of y = f (x) is given
by the solid green curve, what is the function
whose graph is given by the dashed blue curve?
Hint: Use the grey dotted curve as a guideline.
Solution: Dashed curve looks like the dotted
curve shifted by 2 to the right, and dotted curve
looks like the solid curve shrunk vertically by a
factor of 2. So we shrink first, y = 0.5 f (x), and
then shift to the right, y = 0.5 f (x − 2).
Exercise 3. If the graph of y = f (x) is given

by the solid green curve, what is the function
whose graph is given by the dashed blue curve?
Hint: Use the grey dotted curve as a guideline.
Sine and cosine. Let us recall the graphs of functions sin(x) and cos(x), and
recall some of their basic properties.
• Both fluctuate between −1 and +1. We say
that their amplitude is equal to 1.
• Both are periodic functions with the period
2π. This means that sin(x + 2π) = sin(x) and
cos(x + 2π) = cos(x), and the graphs of these
functions repeat the same pattern every 2π.
• Many natural phenomena are (approxi-
mately) periodic, for example, daylight
hours, average monthly temperatures, ocean
tides, circadian rhythms, heart beat, etc. Cosine and sine can be used to describe
(or model) some of them.
Just like sine and cosine, a function y = f (x) is said to be periodic with the
period T if f (x + T ) = f (x) for all x, in which case the graph of this function
repeats every T (units of x).
Compared with the above general transformation $y = b + a f\big(c(x - d)\big)$, when
dealing with $\sin(x)$ and $\cos(x)$ we will write this transformation as
$$y = b + a \cdot \cos\left(\frac{2\pi}{T}(x - d)\right)$$
by replacing the constant $c$ with $\frac{2\pi}{T}$, for some constant $T > 0$ (and, similarly, for
$\sin(x)$). This means that we shrink the graph of $\cos(x)$ horizontally by a factor $\frac{2\pi}{T}$,
which is the same as stretching it by a factor $\frac{T}{2\pi}$, which means that the period $2\pi$
becomes $T$. In other words, we express the horizontal stretch factor in terms of the
period $T$, which has an important practical meaning and which is often easier to
see visually. Let us look at the graph of this function.
• Vertical stretch factor $a$ is called amplitude.
• Vertical shift $b$ is also called average, since the function fluctuates around this level.
• Horizontal shift $d$ is also called phase shift.
• $T$ is the period.
• $\frac{2\pi}{T}$ is called angular frequency and $\frac{1}{T}$ is called frequency.
Example 4. What is the amplitude, average, phase shift, period, and frequency of
$y = 2(0.5 + 1.5 \cos(3x + 1))$?
Solution: If we rewrite it as $y = 1 + 3 \cos\left(3\left(x + \frac{1}{3}\right)\right)$, we see that the amplitude is 3,
the average is 1, and the phase shift is $-\frac{1}{3}$ (minus because $+\frac{1}{3}$ means that we shift to the
left). Since $\frac{2\pi}{T} = 3$, the period is $T = \frac{2\pi}{3}$, and the frequency is $\frac{1}{T} = \frac{3}{2\pi}$.
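The rewriting in Example 4 can be checked numerically: the original formula and the standard form $b + a \cos\left(\frac{2\pi}{T}(x - d)\right)$ with $b = 1$, $a = 3$, $T = \frac{2\pi}{3}$, $d = -\frac{1}{3}$ should agree everywhere, and the function should repeat after one period. This is a small sketch; the sample grid of points is an arbitrary choice.

```python
import math

def original(x):
    return 2 * (0.5 + 1.5 * math.cos(3 * x + 1))

# Parameters read off in Example 4.
b, a, T, d = 1.0, 3.0, 2 * math.pi / 3, -1 / 3

def standard_form(x):
    return b + a * math.cos(2 * math.pi / T * (x - d))

xs = [k / 10 for k in range(-20, 21)]   # sample points from -2 to 2

same = all(math.isclose(original(x), standard_form(x), abs_tol=1e-9) for x in xs)
periodic = all(math.isclose(original(x + T), original(x), abs_tol=1e-9) for x in xs)

print(same, periodic)
print("frequency:", 1 / T)   # equals 3 / (2 pi)
```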
Exercise 4. What is the amplitude, average, phase shift, period, and frequency of
$y = -3 + 4 \cos\left(\frac{x - 2}{3}\right)$?
Example 5. What is the function in the figure on the left?

Solution: We see that the minimum is 0.2 and maximum is 2.2, so the average
is 1.2. Then the amplitude is the difference between the maximum and average,
2.2 − 1.2 = 1. The function looks like cosine shifted to the right by 0.4, so phase
shift is 0.4. We can also see that the period is 1, for example by looking at the
difference between consecutive peaks 1.4 − 0.4 = 1. Therefore, the function is y =
1.2 + cos(2π(x − 0.4)).
Exercise 5. What is the function in the figure above on the right?
Daylight hours in Toronto. The length of daylight in Toronto can be approximated
by the function
$$12.175 - 3.255 \cos\left(\frac{2\pi}{365}(t + 10)\right),$$
where $t \le 365$ is measured in days. See: youtu.be/0ht56fyCIFQ?t=186.
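The daylight formula is easy to evaluate directly. In the sketch below the function name is hypothetical, and the unit of the result is taken to be hours, which the text implies but does not state. In the notation of this section the average is 12.175, the amplitude is 3.255 and the period is 365 days, so the extremes are $12.175 \mp 3.255$.

```python
import math

def daylight_hours(t):
    """Approximate length of daylight in Toronto on day t (formula from the text)."""
    return 12.175 - 3.255 * math.cos(2 * math.pi / 365 * (t + 10))

# The cosine equals 1 when t + 10 is a full period, giving the shortest day.
shortest = daylight_hours(355)
# The cosine equals -1 half a period later (t + 10 = 182.5), giving the longest day.
longest = daylight_hours(172.5)

print(round(shortest, 2), round(longest, 2))   # 8.92 15.43
```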
Varying average and amplitude. The figure above⁵ depicts the monthly number of
sunspots over 400 years of sunspot observations, which follow an approximately
11 year solar cycle. We can think of $T = 11$ as the period of this cycle, but the
amplitude and average are varying over time. Let us look at a couple of simple
examples of functions of this form:
$$y = b(x) + a(x) \cdot \cos\left(\frac{2\pi}{T}(x - d)\right),$$
where $b = b(x)$ is the average and $a = a(x)$ is the amplitude, both varying with $x$.
⁵ Robert A. Rohde, commons.wikimedia.org/wiki/File:Sunspot_Numbers.png
Example 6. Write down a possible model for the function below on the left.
Solution: The distance between consecutive peaks appears to be about 10, so we
can take the period of one cycle to be $T = 10$. The curve is bounded above by the
line that passes through $(0, 0)$ and $(90, 60)$, so its slope is $\frac{60}{90} = \frac{2}{3}$ and the line is
$y = \frac{2x}{3}$. The curve is bounded below by the x-axis $y = 0$. That means that the average
(line through the middle) is $b(x) = \frac{x}{3}$, and the amplitude is $a(x) = \frac{2x}{3} - \frac{x}{3} = \frac{x}{3}$. As
a result, we can guess that the curve is $y = \frac{x}{3} + \frac{x}{3} \cos\left(\frac{2\pi}{10} x\right)$. It appears that the first
peak is shifted from zero, so there might be a phase shift, but this is just an artifact
of scaling cosine by a varying amplitude which is 0 at $x = 0$. The peaks away from
zero appear to be near 10, 20, 30, etc., so there is no phase shift.
Exercise 6. Write down a possible model for the function above on the right.
Example 7. The diameter of a Ferris Wheel is 20 meters, and at the lowest point
it is 2 meters above the ground. It takes the Ferris Wheel 3 minutes to complete one
revolution. What is the height $y = h(t)$ of a rider starting at time $t = 0$ at the lowest
point?
Solution: Let us first find the coordinates $(x, y)$ of the rider in terms of the angle $\theta$
in the figure. Since the radius of the Ferris wheel is 10 m, the vertical side of the
right triangle is $10 \cos(\theta)$ and the horizontal side is $10 \sin(\theta)$. This means that
$x = 10 \sin(\theta)$ and, because the center of the Ferris wheel is at height 12, the height
$y$ is $12 - 10 \cos(\theta)$. Now, what is the angle $\theta$ as a function of time, $\theta(t)$? Since the
wheel is revolving at constant angular speed, $\theta(0) = 0$ and $\theta(3) = 2\pi$ (full revolution),
we see that $\theta(t) = \frac{2\pi}{3} t$. Finally, we get that the height is $y = 12 - 10 \cos\left(\frac{2\pi}{3} t\right)$.
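The height formula from Example 7 can be sanity-checked at a few moments of the ride: the lowest point at $t = 0$, the height of the centre after a quarter revolution, the top after half a revolution, and the lowest point again after 3 minutes. The function name below is illustrative.

```python
import math

def height(t):
    """Height in metres, t minutes after the rider leaves the lowest point."""
    theta = 2 * math.pi / 3 * t          # angle after t minutes, one turn per 3 min
    return 12 - 10 * math.cos(theta)     # centre at 12 m, radius 10 m

for t in [0, 0.75, 1.5, 3]:
    print(t, round(height(t), 3))
# 0 2.0
# 0.75 12.0
# 1.5 22.0
# 3 2.0
```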
Exercise 7. The diameter of a Ferris Wheel is 30 meters, and at the lowest point it
is 5 meters above the ground. It takes the Ferris Wheel 2 minutes to complete one
revolution. What is the height y = h(t) of a rider starting at time t = 0 at the highest
point?
Answer to Exercise 1. $y = -1 + \frac{1}{2} f(3(x - 7))$.
Answer to Exercise 2. $y = 2(f(3x - 7) - 1)$.
Answer to Exercise 3. $y = 0.5 f(x + 1)$.
Answer to Exercise 4. Amplitude is 4, average is $-3$, phase shift is 2, period is
$T = 6\pi$, and frequency is $\frac{1}{6\pi}$.
Answer to Exercise 5. $y = -2 + 5 \cos\left(\frac{\pi}{3}(x - 3)\right)$. We can also view it as cosine
flipped upside down: $y = -2 - 5 \cos\left(\frac{\pi}{3} x\right)$.
Answer to Exercise 6. $y = \left(15 - \frac{x}{4}\right) + \left(15 - \frac{x}{4}\right) \cos\left(\frac{2\pi}{15} x\right)$.
Answer to Exercise 7. $y = 20 - 15 \cos(\pi(1 + t))$.

1.7 Polynomials and rational functions

Polynomials. Polynomials are functions of the type
$$y = a_0 + a_1 x + a_2 x^2 + \ldots + a_{n-1} x^{n-1} + a_n x^n,$$
where constants $a_0, a_1, a_2, \ldots, a_n$ are called the coefficients of this polynomial. In
other words, a polynomial is a sum of power functions $c x^\kappa$ with integer powers
$\kappa \ge 0$. The largest power $n$ is called the degree of a polynomial. For example:
• $1 - 2x + 2x^2$ is a polynomial of degree 2, also called a quadratic polynomial;
• $-3x + x^3$ is a polynomial of degree 3, also called a cubic polynomial.
Motivation. Polynomials play a fundamental role in Calculus. On the one hand,
they arise naturally in applications (perhaps, the most famous example is the height
of a projectile changing over time due to gravity, such as an apple falling from
a tree). On the other hand, they also serve as a very useful tool, because of the
combination of the following two factors:
1. Polynomials are easy to work with, e.g. easy to differentiate and integrate.
2. Other functions can be approximated by polynomials.
This means that we can often replace a more complicated function by a polynomial
if it helps us solve the problem.
For example, here is a figure showing how \(y = \cos(x)\) can be approximated near \(x = 0\) by
polynomials \(P_n(x)\) of degree \(n = 2, 4, 6\) and \(8\). We can see that the approximations get
better on wider and wider intervals as the degree \(n\) increases. This is a preview of the
so-called Taylor polynomials that will be studied later on and that have numerous
applications. The degree 8 polynomial is given by
\[P_8(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!}.\]
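To see the improvement numerically, the following Python sketch compares \(\cos(x)\) with polynomials of the same pattern, \(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots\), cut off at degree \(n\). The helper name `cos_poly` and the sample points are our own choices for illustration.

```python
import math

def cos_poly(x, n):
    """Polynomial 1 - x^2/2! + x^4/4! - ... with terms up to degree n (n even)."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(n // 2 + 1))

# Approximation error at a few points for degrees 2, 4, 6 and 8.
for n in (2, 4, 6, 8):
    errors = [abs(cos_poly(x, n) - math.cos(x)) for x in (1.0, 2.0, 3.0)]
    print(n, [f"{e:.4f}" for e in errors])
```

For each fixed point the error shrinks as the degree grows, and higher degrees stay accurate further away from \(x = 0\).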
Here is another example. In Chapter 1.5, Example 1, we considered a table of data points
and, using Google Sheets, fit an exponential trend curve \(y = 1.98e^{1.1x}\). In the figure on
the left, instead of Exponential, we now select a Polynomial trend curve, which turns out
to be a quadratic polynomial \(y = 2.5 - 0.525x + 4.01x^2\). We see that it fits the data very
well and, as a result, this quadratic polynomial could also be used to describe the data if
it is easier to work with or, especially, if this polynomial and its coefficients have some
physical meaning.
Basic properties. Let us review some basic properties of polynomials.
• The graph of a polynomial of degree \(n\) can change direction (increasing or decreasing)
at most \(n\) times. For example, the polynomial of degree 4 in the figure is decreasing,
then increasing, then decreasing, then increasing again. (Here we count every increasing
or decreasing stretch of the graph, so the number of turning points is at most \(n - 1\).)
All the examples in the figure change direction exactly \(n\) times, but it could be
fewer than \(n\) times.
• A polynomial \(p(x)\) of degree \(n\) can have at most \(n\) roots, i.e. points \(x\) such that
\(p(x) = 0\) (where the graph crosses the \(x\)-axis). In the figure above, the polynomials
of degrees 1, 3 and 4 have 1, 3 and 4 roots respectively, but the polynomial
of degree 2 has no roots.
• When \(x\) goes to \(+\infty\) or \(-\infty\) (when \(x\) takes very large positive or negative values),
the polynomial \(p(x)\) also goes to \(+\infty\) or \(-\infty\), and the sign can be determined by
looking at the term \(a_n x^n\) with the largest degree \(n\), because this term dominates
all the other terms for large \(x\) in the sense that it grows faster. For example, if
\(p(x) = 1 - 2x + x^2 - 3x^3\) then \(p(x)\) goes to \(+\infty\) when \(x\) goes to \(-\infty\), because the
term \(-3x^3\) becomes positive, e.g. \(-3(-10)^3 = 3 \cdot 10^3 > 0\).
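The dominance of the leading term can be checked numerically. The sketch below evaluates \(p(x) = 1 - 2x + x^2 - 3x^3\) and its leading term \(-3x^3\) at increasingly large negative \(x\); the function names `p` and `leading` are ours, chosen for illustration.

```python
def p(x):
    """The polynomial 1 - 2x + x^2 - 3x^3 from the example."""
    return 1 - 2 * x + x ** 2 - 3 * x ** 3

def leading(x):
    """Its leading term -3x^3."""
    return -3 * x ** 3

# As x becomes large and negative, p(x) is positive and the ratio
# p(x) / leading(x) gets closer and closer to 1.
for x in (-10, -100, -1000):
    print(x, p(x), leading(x), p(x) / leading(x))
```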
Example 1. For the polynomials (a) and (b) in the figure, determine the smallest possible
degree \(n\), whether \(n\) is even or odd, and the sign of the coefficient \(a_n\).
Solution: (a) The graph changes direction 5 times, so the smallest possible degree is \(n = 5\).
The degree \(n\) must be odd, because \(p(x)\) goes to \(+\infty\) when \(x\) goes to \(-\infty\), and to \(-\infty\)
when \(x\) goes to \(+\infty\). The coefficient \(a_n\) must be negative, because \(a_n x^n\) dominates and
becomes negative when \(x\) goes to \(+\infty\).
(b) The graph changes direction 2 times, so the smallest possible degree is \(n = 2\).
The degree \(n\) must be even, because \(p(x)\) goes to \(-\infty\) both when \(x\) goes to \(-\infty\) and when
\(x\) goes to \(+\infty\). The coefficient \(a_n\) must be negative, because \(a_n x^n\) dominates and becomes
negative when \(x\) goes to \(+\infty\). Of course, this graph looks like a familiar upside-down parabola.
Exercise 1. For the polynomials (c) and (d) in the above figure, determine the
smallest possible degree \(n\), whether the degree \(n\) is even or odd, and the sign of the
coefficient \(a_n\).
If we know that a polynomial \(p(x)\) of degree \(n\) has \(n\) roots \(x_1, \ldots, x_n\) then we can
write this polynomial as
\[y = p(x) = c(x - x_1)(x - x_2) \cdots (x - x_n)\]
for some constant \(c\). If we also know the value of \(p(x)\) at some point \(x\) other than
the roots, we can use it to determine \(c\).
Example 2. Determine the cubic polynomial (a) in the figure.
Solution: We can see from the graph that the roots are \(-1\), \(1\) and \(3\), so
\(p(x) = c(x + 1)(x - 1)(x - 3)\). We can also see from the graph that \(p(0) = 6\), so
\(6 = c(0 + 1)(0 - 1)(0 - 3) = 3c\) and \(c = 2\), so \(p(x) = 2(x + 1)(x - 1)(x - 3)\).
If we need to, we can multiply this out to get \(p(x) = 6 - 2x - 6x^2 + 2x^3\).
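The same two steps (write the product over the roots, then use one extra point to find \(c\)) can be sketched in Python. The helper `cubic_from_roots` is a hypothetical name introduced for illustration; it takes the roots \(-1, 1, 3\) and the point \(p(0) = 6\) read off the graph.

```python
def cubic_from_roots(roots, x0, y0):
    """Return c and p, where p(x) = c (x - r1)(x - r2)(x - r3) and p(x0) = y0."""
    prod = 1.0
    for r in roots:
        prod *= (x0 - r)
    c = y0 / prod  # x0 must not be one of the roots

    def p(x):
        value = c
        for r in roots:
            value *= (x - r)
        return value

    return c, p

c, p = cubic_from_roots([-1, 1, 3], 0, 6)
print(c)

# Compare with the multiplied-out form 6 - 2x - 6x^2 + 2x^3 at a few points.
for x in (-2, 0.5, 4):
    print(x, p(x), 6 - 2 * x - 6 * x ** 2 + 2 * x ** 3)
```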
Exercise 2. Determine the cubic polynomial
(b) in the above figure.
Example 3. In the figure, the graphs of two polynomials are visible in a limited region.
The dashed red line is a cubic polynomial with negative leading coefficient, \(a_3 < 0\). How
many zeros does this polynomial have, and what can we say about their location?
Solution: We can see two roots, \(x_1 = -3\) and \(x_2 = +3\). Since \(a_3 < 0\) and the leading term
\(a_3 x^3\) dominates, the polynomial must go to \(-\infty\) when \(x\) goes to \(+\infty\), so the graph must
start decreasing eventually. The graph can change direction at most 3 times, which means
that it must start decreasing somewhere to the right of the observed region, and so there
will be another root \(x_3 > 7\).
Exercise 3. In the above figure, the solid blue line is a quartic polynomial (of degree
4) with negative leading coefficient, \(a_4 < 0\). How many zeros does this polynomial
have, and what can we say about their location?
Quadratic polynomials. Polynomials of degree 2, \(y = ax^2 + bx + c\), appear frequently
in Calculus, so let us recall their basic properties. Their graphs are given by parabolas,
which open upwards when \(a > 0\) and open downwards when \(a < 0\). If the discriminant
\(D = b^2 - 4ac\) is nonnegative, \(D \ge 0\), then this polynomial has roots
\[x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}.\]
If the discriminant is equal to zero, \(D = 0\), then the roots are the same. If \(D < 0\)
then there are no roots and the parabola is entirely above or below the \(x\)-axis. The
extreme point of the parabola (minimum or maximum) is called the vertex and
its coordinates are \(x = -\frac{b}{2a}\) and \(y = c - \frac{b^2}{4a}\). Later we will learn how to find this
extreme point by setting the derivative to zero (since the slope of the parabola is
zero at that point), but one can also find it using simple algebra, by completing the
square.
Example 4. Find the roots (if they exist) and vertex of \(y = 2x^2 - 2x - 4\) by completing
the square.
Solution: First factor out the leading coefficient, \(y = 2(x^2 - x - 2)\). Next, to complete
the square for \(x^2 - x - 2\), we want to create something that looks like
\((x \pm r)^2 = x^2 \pm 2rx + r^2\). In this case, we want \(-x\) to look like \(-2rx\), so we must take
\(r = 0.5\). Then we add and subtract \(r^2 = 0.5^2\) and rewrite
\[x^2 - x - 2 = x^2 - 2 \cdot 0.5 \cdot x + 0.5^2 - 0.5^2 - 2 = (x - 0.5)^2 - 0.5^2 - 2 = (x - 0.5)^2 - 2.25.\]
This finishes completing the square: \(y = 2(x - 0.5)^2 - 4.5\). The vertex is at the point
\((0.5, -4.5)\) since the parabola opens upwards and will have a minimum when the
term \((x - 0.5)^2\) is as small as possible, i.e. when \(x = 0.5\). Another way to see this
is by noticing that \(y = 2(x - 0.5)^2 - 4.5\) is obtained from \(y = x^2\) by stretching it
vertically by a factor of 2, then shifting to the right by 0.5 and down by 4.5. We can also
find the roots once we have completed the square:
\[(x - 0.5)^2 - 2.25 = 0 \;\Rightarrow\; (x - 0.5)^2 = 2.25 \;\Rightarrow\; x - 0.5 = \pm 1.5 \;\Rightarrow\; x = -1, 2.\]
Of course, we could have also used the formulas above.
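The completed-square form \(a(x - h)^2 + k\) can also be computed directly from the vertex formulas \(h = -\frac{b}{2a}\) and \(k = c - \frac{b^2}{4a}\). The Python sketch below does this for \(y = 2x^2 - 2x - 4\); the function names are our own, introduced for illustration.

```python
import math

def complete_square(a, b, c):
    """Rewrite a x^2 + b x + c as a (x - h)^2 + k and return the vertex (h, k)."""
    h = -b / (2 * a)
    k = c - b ** 2 / (4 * a)
    return h, k

def roots_from_vertex(a, h, k):
    """Solve a (x - h)^2 + k = 0; returns () when there are no real roots."""
    rhs = -k / a
    if rhs < 0:
        return ()
    d = math.sqrt(rhs)
    return (h - d, h + d)

h, k = complete_square(2, -2, -4)
print(h, k)
print(roots_from_vertex(2, h, k))
```

Solving \(a(x - h)^2 + k = 0\) is the same last step as in the example: move \(k\) to the other side, divide by \(a\), and take the square root.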

Exercise 4. Find the roots (if they exist) and vertex of \(y = 2x^2 + 4x - 6\) by completing
the square.
Rational functions. Rational functions are functions of the type
\[y = \frac{p(x)}{q(x)}\]
where both the numerator \(p(x)\) and denominator \(q(x)\) are polynomials. Such functions
are undefined whenever \(q(x) = 0\), so the roots of the denominator are not in
the domain. Also, these roots are often vertical asymptotes, for example, when the
numerator \(p(x) \neq 0\) at the same point, so the function will approach \(-\infty\) or \(+\infty\)
as the variable \(x\) approaches the root of \(q(x)\). Rational functions can sometimes
have horizontal asymptotes as \(x\) approaches \(-\infty\) and \(+\infty\), and we will see in the
examples how these asymptotes can be determined.
Example 5. Find all vertical and horizontal asymptotes of \(y = \frac{2x^2 - 4x - 96}{x^2 - x - 30}\).
Solution: First, we rewrite the function as
\[y = \frac{2(x^2 - 2x - 48)}{x^2 - x - 30} = \frac{2(x + 6)(x - 8)}{(x + 5)(x - 6)}\]
by finding the roots of the quadratic polynomials in the numerator and denominator.
The denominator has roots \(-5\) and \(6\), and the numerator is not zero at these points,
so the function will approach \(-\infty\) or \(+\infty\) as \(x\) approaches \(-5\) and \(6\), i.e. the lines
\(x = -5\) and \(x = 6\) are vertical asymptotes. Of course, the easiest way to see how the
function approaches these vertical asymptotes is to graph it in any graphing calculator.
If we want to sketch it without a graphing calculator, we can check the sign of \(\pm\infty\)
by considering values of \(x\) near a root. For example, if \(x\) is slightly bigger than
6 then the function will be negative, because the signs of all the factors will be
\(\frac{2(+)(-)}{(+)(+)} < 0\), so the function approaches \(-\infty\) from the right side of \(x = 6\). Similarly,
if \(x\) is slightly smaller than 6 then the function will be positive, because the signs
of all the factors will be \(\frac{2(+)(-)}{(+)(-)} > 0\), so the function approaches \(+\infty\) from the left
side of \(x = 6\). We can check what happens near \(x = -5\) similarly.
To find the horizontal asymptote at infinity (if it exists), heuristically we can
remember that the polynomial \(2x^2 - 4x - 96\) is dominated by the leading term \(2x^2\)
when \(x\) is large, and \(x^2 - x - 30\) is dominated by \(x^2\), so for large \(x\) their ratio behaves
like \(\frac{2x^2}{x^2} = 2\) and so the horizontal asymptote will be \(y = 2\). To make this heuristic
argument more precise, we take the leading term (here \(x^2\)) in the
denominator and divide both the numerator and denominator by it:
\[\frac{2x^2 - 4x - 96}{x^2 - x - 30} = \frac{\frac{2x^2 - 4x - 96}{x^2}}{\frac{x^2 - x - 30}{x^2}} = \frac{2 - \frac{4}{x} - \frac{96}{x^2}}{1 - \frac{1}{x} - \frac{30}{x^2}} \to \frac{2 - 0 - 0}{1 - 0 - 0} = 2\]
as \(x\) goes to infinity, because the terms where we divide by \(x\) or \(x^2\) all go to zero.
Finally, the function crosses the \(x\)-axis at \(x = -6\) and \(x = 8\), which are the roots of the
polynomial in the numerator.
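These conclusions are easy to check numerically. The Python sketch below evaluates the rational function from Example 5 far from the origin, on both sides of \(x = 6\), and at the roots of the numerator; the name `r` and the sample points are our own choices.

```python
def r(x):
    """The rational function (2x^2 - 4x - 96) / (x^2 - x - 30)."""
    return (2 * x ** 2 - 4 * x - 96) / (x ** 2 - x - 30)

# Far from the origin the values settle near the horizontal asymptote y = 2.
for x in (1e2, 1e4, -1e4):
    print(x, r(x))

# Just to the left and right of the vertical asymptote x = 6 the signs differ.
print(r(6 - 1e-6), r(6 + 1e-6))

# The graph crosses the x-axis at the roots of the numerator.
print(r(-6), r(8))
```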
Exercise 5. Find all vertical and horizontal asymptotes of \(y = \frac{x^2 - 3x - 2}{2x^2 - 8}\). Sketch the
graph.
The calculation of the horizontal asymptote in the above example can be used
to show the following.
• If the degrees of p(x) and q(x) are equal then the horizontal asymptote is the
ratio of their leading coefficients.
• If the degree of p(x) is smaller than the degree of q(x) then the horizontal asymp-
tote is y = 0.
• If the degree of p(x) is bigger than the degree of q(x) then there is no horizontal
asymptote at infinity.
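For instance, in the above example, if we replace \(x^2\) in the denominator by \(x^3\) then
\[\frac{2x^2 - 4x - 96}{x^3 - x - 30} = \frac{\frac{2x^2 - 4x - 96}{x^3}}{\frac{x^3 - x - 30}{x^3}} = \frac{\frac{2}{x} - \frac{4}{x^2} - \frac{96}{x^3}}{1 - \frac{1}{x^2} - \frac{30}{x^3}} \to \frac{0 - 0 - 0}{1 - 0 - 0} = 0,\]
so the asymptote is \(y = 0\). On the other hand, if we remove the term \(x^2\) from the
denominator altogether then
\[\frac{2x^2 - 4x - 96}{-x - 30} = \frac{\frac{2x^2 - 4x - 96}{x}}{\frac{-x - 30}{x}} = \frac{2x - 4 - \frac{96}{x}}{-1 - \frac{30}{x}} \approx -(2x - 4),\]
in which case there is no horizontal asymptote.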
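The three degree rules above can be summarized in a few lines of Python. This is only a sketch: the function name `horizontal_asymptote` and the convention of listing coefficients from the highest power down (with a nonzero leading coefficient) are our own assumptions.

```python
def horizontal_asymptote(p_coeffs, q_coeffs):
    """Horizontal asymptote of p(x)/q(x), or None if there is none.

    Coefficients are listed from the highest power down, and the
    leading coefficient of each polynomial is assumed to be nonzero.
    """
    deg_p = len(p_coeffs) - 1
    deg_q = len(q_coeffs) - 1
    if deg_p == deg_q:
        return p_coeffs[0] / q_coeffs[0]  # ratio of leading coefficients
    if deg_p < deg_q:
        return 0.0
    return None  # numerator has the bigger degree

print(horizontal_asymptote([2, -4, -96], [1, -1, -30]))     # equal degrees
print(horizontal_asymptote([2, -4, -96], [1, 0, -1, -30]))  # denominator x^3 - x - 30
print(horizontal_asymptote([2, -4, -96], [-1, -30]))        # denominator -x - 30
```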
Example 6. Match the blue solid curve to one of the following rational functions:
• \(y = \frac{x}{(x-2)(x-3)}\),
• \(y = \frac{x}{(x+2)(x-3)}\),
• \(y = \frac{x^2}{(x+2)(x-3)}\),
• \(y = \frac{x^2}{(x-2)(x-3)}\).
Solution: Because the vertical asymptotes in the figure are \(x = -2\) and \(x = 3\), the
denominator should be \((x - (-2))(x - 3) = (x + 2)(x - 3)\). Since the horizontal asymptote
is \(y = 1\), the degrees of the numerator and denominator should be equal, so the numerator
should be \(x^2\) and not \(x\). This means that \(y = \frac{x^2}{(x+2)(x-3)}\).
Exercise 6. Match the dashed green curve to one of the rational functions in the
above example.
Answer to Exercise 1. (c) The degree \(n\) must be odd, because \(p(x)\) goes to \(+\infty\)
when \(x\) goes to \(+\infty\), and to \(-\infty\) when \(x\) goes to \(-\infty\). The coefficient \(a_n\) must be positive,
again, because \(a_n x^n\) dominates and becomes positive when \(x\) goes to \(+\infty\). The graph
does not change direction, which by itself would allow degree \(n = 1\), but because the
graph is not linear, the smallest possible degree is \(n = 3\).
(d) The degree \(n\) must be even, because \(p(x)\) goes to \(+\infty\) both when \(x\) goes to \(+\infty\)
and when \(x\) goes to \(-\infty\). The coefficient \(a_n\) must be positive, because \(a_n x^n\) dominates and
becomes positive when \(x\) goes to \(+\infty\). The graph changes direction 2 times, which by itself
would allow degree \(n = 2\), but because the graph does not look like a parabola, the
smallest possible degree is \(n = 4\).
Answer to Exercise 2. \(y = -4(x + 0.5)(x - 0.5)(x - 2) = -2 + x + 8x^2 - 4x^3\).
Answer to Exercise 3. We can see one root, \(x_1 = 6\). Since \(a_4 < 0\) and the leading
term \(a_4 x^4\) dominates, the polynomial must go to \(-\infty\) when \(x\) goes to \(-\infty\), so the graph
must eventually go down as we move to the left. The graph can change direction at most
4 times, and from what we can see there will be one more root somewhere to the
left of \(-6\), \(x_2 < -6\).
Answer to Exercise 4. \(y = 2(x + 1)^2 - 8\). Roots are \(-3\) and \(1\). Vertex is \((-1, -8)\).
Answer to Exercise 5. Vertical asymptotes are \(x = -2\) and \(x = 2\). Horizontal asymptote
is \(y = 0.5\). We can check that the function approaches \(+\infty\) from the left side of
\(x = -2\), \(-\infty\) from the right side of \(x = -2\), \(+\infty\) from the left side of
\(x = 2\), and \(-\infty\) from the right side of \(x = 2\).
Answer to Exercise 6. \(y = \frac{x}{(x+2)(x-3)}\). The vertical asymptotes are the same, so the
denominator should be the same. The horizontal asymptote is \(y = 0\), so the degree of the
numerator should be smaller, and the only such choice given was \(x\). Another way to see
this is to notice that the function changes sign as we cross \(x = 0\). If the numerator
were \(x^2\), it would not change sign at \(x = 0\), so the behaviour would be as in the
previous example, where the blue curve does not change sign at \(x = 0\).
1.8 Inverse functions

Recall the definition of the natural logarithm:
\[y = e^x \iff x = \ln(y).\]
This is one example of an inverse function. Before we give a general definition of
an inverse function, let us consider some examples.
Example 1. Consider the function \(y = \sqrt{x} + 1\) in the figure. What are the domain and
range of this function? Solve this equation for \(x\).
Solution: We are allowed to plug in only nonnegative numbers into \(\sqrt{x}\), so the domain
of this function is \([0, \infty)\). The output of \(\sqrt{x}\) is also a nonnegative number, but since
we add \(+1\), the range is \([1, \infty)\). To solve for \(x\), we first write \(y - 1 = \sqrt{x}\) and,
squaring both sides, we get \(x = (y - 1)^2\). We can think of this answer \(x\) as a function
of \(y\), i.e. \(y\) is the input and \(x = (y - 1)^2\) is the output. This function
\(x = (y - 1)^2\) is called the inverse function of \(y = \sqrt{x} + 1\), but only if we restrict \(y\)
to be in \([1, \infty)\). Although the formula \((y - 1)^2\) also makes sense for \(y < 1\), we can
solve the original equation \(y = \sqrt{x} + 1\) only when \(y\) is in the range of \(y = \sqrt{x} + 1\),
i.e. \(y \ge 1\). As we see in the figure, the input \(x\) in the domain \([0, \infty)\) of the original
function produces output \(\sqrt{x} + 1\) in the range \([1, \infty)\), while the input \(y\) in \([1, \infty)\) in
the domain of the inverse function \(x = (y - 1)^2\) produces output \((y - 1)^2\) in its range
\([0, \infty)\). The domain and range switch between the original function and the inverse
function.
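The following Python sketch mirrors this example for \(y = \sqrt{x} + 1\) and its inverse \(x = (y - 1)^2\). The names `f` and `f_inv`, and the choice to raise an error for inputs outside \([1, \infty)\), are ours and only illustrate the domain restriction.

```python
import math

def f(x):
    """Original function sqrt(x) + 1, defined for x >= 0."""
    return math.sqrt(x) + 1

def f_inv(y):
    """Inverse function (y - 1)^2, only meaningful for y in the range [1, inf) of f."""
    if y < 1:
        raise ValueError("y must lie in the range [1, infinity) of f")
    return (y - 1) ** 2

# Applying f and then f_inv returns the original input.
for x in (0, 4, 9):
    print(x, f(x), f_inv(f(x)))
```

Without the check, the formula \((y - 1)^2\) would happily return a value for \(y < 1\), even though no input \(x\) of the original function produces such an output.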
Exercise 1. Consider the function \(y = \sqrt{x - 1}\). What are its domain and range? Solve
this equation for \(x\). What are the domain and range of the inverse function?
Definition. Let us now define the inverse function for any function \(y = f(x)\) whenever
it is possible, and also explain when this is not possible. If we look at the graph of the
function \(y = f(x)\) in the figure, as we well know, given any input \(a\) in the domain of the
function \(f\) depicted on the \(x\)-axis, the output is a point \(y = f(a)\) in the range of this
function depicted on the \(y\)-axis. Now, let us take any point \(b\) in the range of this
function, which means that it is an output \(b = f(x)\) for some point \(x\) in the domain.
There may be more than one such \(x\) as we will see in the examples below, but suppose for
now that there is only one solution \(x\), as in the figure. This \(x\) is denoted \(f^{-1}(b)\)
and it is pronounced “\(f\) inverse of \(b\)”. In other words, if we can solve the equation
\(b = f(x)\) for the unknown \(x\) and the solution is unique, it is denoted \(x = f^{-1}(b)\).
Suppose that such a solution \(x = f^{-1}(b)\) is unique for every point \(b\) in the range of \(f\).
If so, the function \(f\) is called invertible and \(x = f^{-1}(y)\) is called the inverse
function of \(y = f(x)\). Notice how the output \(y\) of \(f\) becomes the input of its inverse
\(f^{-1}\) and the input \(x\) of \(f\) becomes the output of \(f^{-1}\). This means that:
• The domain of \(f\) is the range of \(f^{-1}\) and the range of \(f\) is the domain of \(f^{-1}\).
Range and domain switch when taking an inverse.
To summarize in plain English, a function \(y = f(x)\) is invertible if, for any possible
output \(y\) in the range of \(f\), we can determine exactly what the input \(x\) was.
Warning! The superscript \(-1\) in \(f^{-1}\) is just a notation for the inverse, and it does
not mean the reciprocal \(\frac{1}{f}\). For example, in Example 1 above, \(f(x) = \sqrt{x} + 1\) and
\(f^{-1}(y) = (y - 1)^2\), not \(\frac{1}{\sqrt{x} + 1}\).
Example 2. Suppose \(p\) is the price (in dollars) a bakery sets for its plain bagel. Let
\(q = f(p)\) depicted in the figure be the average monthly demand \(q\) for plain bagels when
the price \(p\) is between $1 and $5. Is this function invertible? What are the domain and
range of \(f^{-1}\)? What is the meaning of \(f(2)\), and what are the units of 2? What is the
meaning of \(f^{-1}(1000)\), and what are the units of 1000?
Solution: The domain of \(f\) is \([1, 5]\) and the range is \([0, 3000]\). The function \(f\) is
invertible because we can see from the graph that for every \(q\) in the range, there is
a unique \(p\) in the domain such that \(q = f(p)\). The domain of the inverse function
\(p = f^{-1}(q)\) is \([0, 3000]\) and its range is \([1, 5]\). The meaning of \(f(2)\) is the average
monthly demand for plain bagels if the price is $2, with the units of 2 being dollars.
The meaning of \(f^{-1}(1000)\) is the price of a bagel at which the average monthly
demand is 1000, with the units of 1000 being the number of bagels.
Exercise 2. After taking 100 mg of aspirin, the amount of aspirin in a patient’s body
is \(m = f(t) = 100 \cdot (0.5)^{t/20}\) mg, where time \(t\) is measured in minutes. The therapeutic
effect of aspirin becomes negligible after 4 half-lives, so we only consider this
function on the domain between \(t = 0\) and 4 half-lives. Is this function invertible?
What are the domain and range of \(f^{-1}\)? What is the meaning of \(f(60)\), and what are
the units of 60? What is the meaning of \(f^{-1}(25)\), and what are the units of 25?
Horizontal line test. If we are given the graph of a function \(y = f(x)\), we can
see that \(f\) is invertible if every horizontal line \(y = b\) intersects the graph at most
once. If the line does not intersect the graph then \(b\) is not in the range of \(f\). If it
intersects the graph more than once, such \(b\) is the output \(f(x)\) of more than one input
\(x\), so we cannot determine \(f^{-1}(b)\), and \(f\) is not invertible.
For example, if we look at the graph of \(y = \sin(x)\), any horizontal line \(y = b\) for \(b\)
in its range \([-1, 1]\) intersects the graph many times, so the function is not invertible.
However, if we consider \(y = \sin(x)\) only on the interval \(\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]\) (solid blue piece
in the figure) then the function becomes invertible, since it is strictly increasing on
that interval and passes the horizontal line test. This inverse is called arcsine and is
denoted
\[x = \sin^{-1}(y) \quad \text{or} \quad x = \arcsin(y).\]
The domain of arcsine is \([-1, 1]\) and the range is \(\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]\). As we can see here, the
domain is very important when deciding if the function is invertible.
Example 3. Is it true that \(\sin^{-1}(\sin(x)) = x\) for all \(-\infty < x < \infty\)?
Solution: The answer is no. For example, if we take \(x = \pi\), then \(\sin(\pi) = 0\) and
\(\sin^{-1}(0) = 0\), so \(\sin^{-1}(\sin(\pi)) = 0 \neq \pi\). The problem here is that \(\pi\) is not in the
interval \(\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]\) which was used to define the arcsine function. If we restrict to
\(-\frac{\pi}{2} \le x \le \frac{\pi}{2}\) then it is true that \(\sin^{-1}(\sin(x)) = x\).
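This is easy to observe with the arcsine function from Python's standard `math` module (`math.asin`), which returns values in \(\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]\). The sample points below are our own choices.

```python
import math

# Inside [-pi/2, pi/2] the composition returns x itself (up to rounding);
# outside that interval it returns a different angle with the same sine.
for x in (0.5, -1.2, math.pi, 2.5):
    print(x, math.asin(math.sin(x)))
```

For \(x = \pi\) the result is essentially 0, as in the example, and for \(x = 2.5\) it is \(\pi - 2.5\), the angle in \(\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]\) that has the same sine.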
Exercise 3. Is it true that \(\sin(\sin^{-1}(y)) = y\) for all \(-1 \le y \le 1\)?
Cancelling inverses. The above example shows that if \(f\) is invertible then the
inverses cancel each other,
\[f^{-1}(f(x)) = x \quad \text{and} \quad f(f^{-1}(y)) = y,\]
but \(x\) must be in the domain of \(f\) and \(y\) must be in the range of \(f\) that appear in the
definition of the inverse \(f^{-1}\).
Monotonic functions. The most common reason for a function to be invertible is when it
is monotone, which means that it is strictly increasing or strictly decreasing, as in the
above examples. A couple of points to keep in mind.
• The function (a) in the figure is increasing, but not strictly increasing, because it is
equal to 0.2 on the interval \(0 \le x \le 0.4\). It does not pass the horizontal line test and
is not invertible, so the ‘strictly’ part is important.
• A function does not have to be monotone to be invertible. For example, the
function (b) in the figure is equal to \(x + 1\) for \(-1 \le x \le 0\) and equal to \(-x\) for
\(0 < x \le 1\). It is not monotone, but it passes the horizontal line test. Of course,
most common examples are monotone.
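The function (b) can be probed numerically. The Python sketch below samples it on a grid and checks three things: whether the sampled values are strictly increasing, whether they are strictly decreasing, and whether any output value repeats. Sampling on a grid only suggests, and does not prove, that the horizontal line test is passed; the grid and the name `b` are our own choices.

```python
def b(x):
    """Function (b): x + 1 on [-1, 0] and -x on (0, 1]."""
    return x + 1 if x <= 0 else -x

xs = [i / 100 for i in range(-100, 101)]  # grid on [-1, 1]
ys = [b(x) for x in xs]

increasing = all(u < v for u, v in zip(ys, ys[1:]))
decreasing = all(u > v for u, v in zip(ys, ys[1:]))
distinct = len(set(ys)) == len(ys)  # no output value is hit twice on the grid

print(increasing, decreasing, distinct)
```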
Example 4. Arnold is taking a 100 mg tablet of ibuprofen every night before going to bed.
If the half-life of ibuprofen is 20 minutes, is the amount of ibuprofen in Arnold’s body
invertible as a function of time over a 72 hour period? What about over a 5 hour
period starting right after he takes a pill?
Solution: The amount of ibuprofen in Arnold’s body after taking a 100 mg pill is
\(m = f(t) = 100 \cdot (0.5)^{t/20}\) mg, which is strictly decreasing, so it will be invertible
over a 5 hour period, i.e. if we restrict the domain of \(t\) to \([0, 300]\) minutes. However,
after 24 hours Arnold will take another pill and the amount will
jump from essentially zero to 100 mg, and then start decreasing again, as in the
figure. So over a 72 hour period the function will not be invertible.
By the way, the reason we restricted to 5 hours is that when \(t = 5 \cdot 60\) min
(since the time \(t\) in the formula is measured in minutes), the amount \(m\) will be
\(100 \cdot (0.5)^{15} \approx 0.003\) mg, which is essentially zero. Mathematically speaking, the
function will be invertible up to 24 hours, but in practical terms the amount will
eventually become zero and the formula will no longer be applicable.
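Both claims can be checked with a short Python sketch. The function `amount` implements \(100 \cdot (0.5)^{t/20}\) with \(t\) in minutes. The function `daily` is a simplified model of our own for the nightly doses: it restarts the same formula every 24 hours and ignores the (essentially zero) residue from earlier pills.

```python
def amount(t_minutes):
    """Amount (mg) left t minutes after a single 100 mg dose, half-life 20 minutes."""
    return 100 * 0.5 ** (t_minutes / 20)

def daily(t_minutes):
    """Simplified model of nightly doses: leftover from earlier pills is ignored."""
    return amount(t_minutes % (24 * 60))

# After 5 hours = 15 half-lives almost nothing is left.
print(amount(5 * 60))

# On [0, 300] minutes the amount is strictly decreasing (checked every 20 minutes).
values = [amount(t) for t in range(0, 301, 20)]
print(all(a > b for a, b in zip(values, values[1:])))

# Over several days the same amount occurs at two different times,
# so the amount cannot be inverted to recover the time.
print(daily(20), daily(24 * 60 + 20))
```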
Exercise 4. The price of mailing a letter by Canada Post in 2022 is:
• $1.07 up to 30 g,
• $1.30 over 30 g up to 50 g,
• $1.94 over 50 g up to 100 g,
• $3.19 over 100 g up to 200 g,
• $4.44 over 200 g up to 300 g,
• $5.09 over 300 g up to 400 g,
• $5.47 over 400 g up to 500 g.
Is the price invertible as a function of weight?
Graphs of inverse functions. If a function \(f\) is invertible, once we solve \(y = f(x)\)
for \(x\) to find the inverse function \(x = f^{-1}(y)\), we can use this function \(f^{-1}\) with
any variable as an input, such as \(x\), \(t\), etc. If we plot the graph of \(y = f^{-1}(x)\) on the
same \(x\)-\(y\) plane as the graph of the original function \(y = f(x)\), the two graphs will be
mirror images of each other across the diagonal \(y = x\). We explained why this is so when we
introduced logarithmic functions, but let us look at one more example. If we look at
the graph of \(y = \tan(x)\), the function is not invertible because \(\tan(x)\) is periodic, so it
does not pass the horizontal line test. However, if we limit the domain to
\(\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)\), tangent is strictly increasing there, so we can define the inverse
function, called \(x = \tan^{-1}(y)\) or \(x = \arctan(y)\). If we plot it on the same \(x\)-\(y\) plane,
i.e. we plot \(y = \tan^{-1}(x)\), we can see that, indeed, the graph is a mirror image of the
original graph across the diagonal. The domain of \(\tan^{-1}(x)\) is the entire real line,
\(-\infty < x < \infty\), and the range is \(\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)\). Since \(y = \tan(x)\) has vertical asymptotes
at \(x = \frac{\pi}{2}\) and \(x = -\frac{\pi}{2}\), \(y = \tan^{-1}(x)\) has a horizontal asymptote \(y = \frac{\pi}{2}\) as \(x\)
approaches \(+\infty\) and a horizontal asymptote \(y = -\frac{\pi}{2}\) as \(x\) approaches \(-\infty\). These
functions will arise naturally in applications, but they are also good examples to keep in
mind whenever we need a function with vertical or horizontal asymptotes.
Example 5. Give an example of a function that has a horizontal asymptote \(y = 2\)
as \(x\) approaches \(+\infty\) and a horizontal asymptote \(y = 0\) as \(x\) approaches \(-\infty\).
Solution: If we shrink \(y = \tan^{-1}(x)\) vertically by a factor \(\pi/2\) and then shift it up
by 1, we will get what we want, so \(y = \frac{2}{\pi}\tan^{-1}(x) + 1\) is one such example.
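A numerical check of this example, as a minimal Python sketch: the function name `g` is ours, and `math.atan` is the arctangent from Python's standard library.

```python
import math

def g(x):
    """The example function (2/pi) * arctan(x) + 1."""
    return (2 / math.pi) * math.atan(x) + 1

# Values approach 0 far to the left and 2 far to the right.
for x in (-1e6, -10, 0, 10, 1e6):
    print(x, g(x))
```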
Exercise 5. Give an example of a function that has a horizontal asymptote \(y = -1\)
as \(x\) approaches \(+\infty\) and a horizontal asymptote \(y = 0\) as \(x\) approaches \(-\infty\).
Answer to Exercise 1. Domain is \([1, \infty)\), range is \([0, \infty)\). Solution is \(x = y^2 + 1\). If
we think of this solution as the inverse function, then its domain is \([0, \infty)\) and the
range is \([1, \infty)\).
Answer to Exercise 2. The function \(f(t)\) is an exponential decay with the half-life
of 20 minutes, so the domain is \([0, 80]\) and the range is \([6.25, 100]\), because \(f(0) = 100\)
and \(f(80) = 6.25\). It is invertible and, in fact, we can find the inverse \(f^{-1}\)
explicitly by solving \(m = 100 \cdot (0.5)^{t/20}\) for \(t\):
\[t = \frac{20 \ln\frac{m}{100}}{\ln 0.5} = \frac{20 \ln\frac{100}{m}}{\ln 2}.\]
The domain of the inverse is \([6.25, 100]\) and the range is \([0, 80]\). The meaning of \(f(60)\)
is the amount of aspirin left in the body after 60 minutes, so the units of 60 are minutes.
The meaning of \(f^{-1}(25)\) is the number of minutes until there is only 25 mg left in a
patient’s body, so the units of 25 are mg.
Answer to Exercise 3. Yes, because \(y\) is in the range of \(\sin(x)\).
Answer to Exercise 4. No, because it is not strictly increasing.
Answer to Exercise 5. If we flip \(y = \tan^{-1}(x)\) upside down, shrink it vertically by a
factor \(\pi\) and then shift it down by \(\frac{1}{2}\), we will get what we want, so
\(y = -\frac{1}{\pi}\tan^{-1}(x) - \frac{1}{2}\) is one such example.
1.9 Limits and continuity

We have already implicitly encountered the idea of a limit when we discussed
vertical and horizontal asymptotes. Now we will consider a more general situation
and introduce new terminology and notation that will be more compact and very
convenient, especially, once we start studying derivatives. For example, instead of
saying that a function \(y = f(x)\) has a horizontal asymptote \(y = 2\) as \(x\) approaches
\(+\infty\), which means that \(f(x)\) approaches 2 as \(x\) approaches \(+\infty\), we can also say that
the limit of \(f(x)\) as \(x\) goes to \(+\infty\) is equal to 2, and we will express this by writing
\[\lim_{x \to +\infty} f(x) = 2.\]
Before we summarize all the definitions and notation, let us demonstrate them on
a specific example.
Example 1. For the function in the figure:
• What are the left and right limits of \(f(x)\) at \(x = 0\)? Does the limit of \(f(x)\) at
\(x = 0\) exist? Is the function continuous at \(x = 0\)?
• What are the left and right limits at \(x = 2\)? Does the limit of \(f(x)\) at \(x = 2\)
exist? Is the function continuous at \(x = 2\)?
• What are the left and right limits at \(x = 3\)? Does the limit of \(f(x)\) at \(x = 3\) exist?
Is the function continuous at \(x = 3\)?
• What are the left and right limits of \(f(x)\) at \(x = -2\)? Does the limit of \(f(x)\) at
\(x = -2\) exist? Is the function continuous at \(x = -2\)?
Solution: We see that as \(x\) approaches 0 from the right side (i.e. \(x > 0\) is getting
close to 0), the value of the function \(f(x)\) approaches 8. In this case we say: the
right limit of \(f(x)\) at \(x = 0\) is equal to 8, which is expressed using mathematical
notation as:
\[\lim_{x \to 0^+} f(x) = 8.\]
Here, the notation \(x \to 0^+\) means that \(x\) goes to 0 from the right. Similarly, we see that
as \(x\) approaches 0 from the left side (i.e. \(x < 0\) is getting close to 0), the value of
the function \(f(x)\) also approaches 8. In this case we say: the left limit of \(f(x)\) at
\(x = 0\) is equal to 8, and we write:
\[\lim_{x \to 0^-} f(x) = 8.\]
Here, the notation \(x \to 0^-\) means that \(x\) goes to 0 from the left. When the function
approaches the same value from both sides, we say that the limit exists and, in this
particular case, the limit of \(f(x)\) at \(x = 0\) is equal to 8,
\[\lim_{x \to 0} f(x) = 8.\]
Here, the notation \(x \to 0\) means that \(x\) goes to 0 from both sides. From the graph we
can see that \(f(0) = 5\), indicated by the solid dot at \((0, 5)\), so the limit of \(f(x)\) at
\(x = 0\) is not equal to \(f(0)\),
\[\lim_{x \to 0} f(x) = 8 \neq 5 = f(0).\]
In this case, we say that the function \(f(x)\) is discontinuous (or not continuous) at
\(x = 0\).
In the second case of \(x = 2\), we see that
\[\lim_{x \to 2^-} f(x) = 6, \qquad \lim_{x \to 2^+} f(x) = 4,\]
and we see that \(f(2) = 6\), indicated by the solid dot at \((2, 6)\). Since the function
approaches different values from the left and right sides, we say that the limit does
not exist. In this case, we again say that the function \(f(x)\) is discontinuous at \(x = 2\).
In the third case of \(x = 3\), we see that
\[\lim_{x \to 3} f(x) = 2,\]
because the function approaches the same value 2 from both sides, so the limit
exists and is equal to 2. However, the function \(f(x)\) is undefined at \(x = 3\), because
the white open dot at \((3, 2)\) indicates that the value of \(f(3)\) is not 2, and there is
no solid dot anywhere on the line \(x = 3\), which is a way to indicate that \(x = 3\) is
not in the domain of \(f\). In this case, again, the function \(f(x)\) is discontinuous at
\(x = 3\) simply because we cannot compare the limit with any value \(f(3)\). Whenever
a point \(x = a\) is not in the domain of \(f(x)\), the function cannot be continuous at that
point.
Finally, in the last case of \(x = -2\), we see that
\[\lim_{x \to -2} f(x) = 6,\]
because the function approaches the same value 6 from both sides, so the limit
exists and is equal to 6. The value of the function at that point is \(f(-2) = 6\), so
\[\lim_{x \to -2} f(x) = 6 = f(-2).\]
In this case, we say that the function \(f(x)\) is continuous at \(x = -2\).

Exercise 1. For the function in the figure:
• What are the left and right limits of \(f(x)\) at \(x = 0\)? Does the limit of \(f(x)\) at
\(x = 0\) exist? Is the function continuous at \(x = 0\)?
• What are the left and right limits at \(x = 1\)? Does the limit of \(f(x)\) at \(x = 1\)
exist? Is the function continuous at \(x = 1\)?
• What are the left and right limits at \(x = 2\)? Does the limit of \(f(x)\) at \(x = 2\) exist?
Is the function continuous at \(x = 2\)?
• What are the left and right limits of \(f(x)\) at \(x = -1\)? Does the limit of \(f(x)\) at
\(x = -1\) exist? Is the function continuous at \(x = -1\)?
Definitions. To summarize the definitions in the above example:
• If a function \(f(x)\) approaches some value when \(x\) approaches \(a\) from the right,
this value is called the right limit of \(f(x)\) at \(x = a\) and is denoted \(\lim_{x \to a^+} f(x)\).
• If a function \(f(x)\) approaches some value when \(x\) approaches \(a\) from the left,
this value is called the left limit of \(f(x)\) at \(x = a\) and is denoted \(\lim_{x \to a^-} f(x)\).
• If both right and left limits of \(f(x)\) at \(x = a\) exist and are equal to each other,
their value is called the limit of \(f(x)\) at \(x = a\) and is denoted \(\lim_{x \to a} f(x)\).
• If \(\lim_{x \to a} f(x) = f(a)\), i.e. the limit at \(x = a\) exists and is equal to the value of
the function at that point, we say that the function is continuous at \(x = a\).
• If the limit \(\lim_{x \to a} f(x)\) does not exist, or it is not equal to \(f(a)\), or the function
is undefined at \(x = a\), we say that the function is discontinuous at \(x = a\).
Examples. Typical functions, such as linear, power functions, polynomials,
exponentials, logarithms, sine and cosine, are all continuous on their domains.
What happens if we add, subtract, multiply, or divide two continuous functions
\(f(x)\) and \(g(x)\)? If \(f(x)\) approaches \(f(a)\) and \(g(x)\) approaches \(g(a)\) then, obviously,
\(f(x) \pm g(x)\) approaches \(f(a) \pm g(a)\), \(f(x)g(x)\) approaches \(f(a)g(a)\), and \(\frac{f(x)}{g(x)}\)
approaches \(\frac{f(a)}{g(a)}\) if \(g(a) \neq 0\), so:
• The sum, difference, product, and also ratio if \(g(a) \neq 0\), are all continuous at
\(x = a\) if \(f(x)\) and \(g(x)\) are continuous at \(x = a\).
Of course, dividing by zero can create a problem.
Example 2. Are \(f(x) = \frac{1}{x - 2}\) and \(g(x) = x + \frac{2}{x}\) continuous for all \(-1 \le x \le 1\)?
Solution: \(f(x) = \frac{1}{x - 2}\) is continuous for all \(-1 \le x \le 1\), because we divide by zero
only when \(x = 2\), which is not in the interval \([-1, 1]\). \(g(x) = x + \frac{2}{x}\) is not continuous
for all \(-1 \le x \le 1\), because we divide by 0 when \(x = 0\), so 0 is not in the domain
of this function.
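A small Python sketch of the two functions makes the difference visible. It assumes the reading \(f(x) = \frac{1}{x - 2}\) and \(g(x) = x + \frac{2}{x}\); the sample points are our own.

```python
def f(x):
    """1 / (x - 2): the only problem point, x = 2, lies outside [-1, 1]."""
    return 1 / (x - 2)

def g(x):
    """x + 2/x: undefined at x = 0, which lies inside [-1, 1]."""
    return x + 2 / x

# f takes ordinary values everywhere on [-1, 1].
print([round(f(x), 4) for x in (-1, 0, 1)])

# g blows up with opposite signs on the two sides of x = 0.
for x in (0.1, 0.001, -0.001, -0.1):
    print(x, g(x))

# At x = 0 itself the formula for g cannot be evaluated at all.
try:
    g(0)
except ZeroDivisionError:
    print("g(0) is undefined")
```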
Exercise 2. Are \(f(x) = \frac{1}{2x - 1}\) and \(g(x) = \frac{1}{\cos(x)}\) continuous for all \(0 \le x \le 1\)?
In the next two problems we will use the following functions:
\[
f(x) = \begin{cases} -\frac{x^2}{2} + 8, & x \le -2 \\ 6, & -2 < x \le 2 \\ -2x + 8, & 2 < x \end{cases}
\qquad
p(x) = \begin{cases} -\frac{x^2}{2} + 8, & x \le -2 \\ 6, & -2 < x \le 2 \\ \kappa x + 8, & 2 < x \end{cases}
\]
\[
g(x) = \begin{cases} 13x - x^3, & x < 0 \\ x, & 0 \le x < 1 \\ -2x + 3, & 1 \le x \end{cases}
\qquad
q(x) = \begin{cases} 13x - x^3, & x < 0 \\ \kappa, & 0 \le x < 1 \\ -2x + 3, & 1 \le x \end{cases}
\]
The list notation means that each function is defined by different formulas on three
different intervals. For example, if we want to find \(f(3)\), we see that \(2 < 3\), so \(x = 3\)
belongs to the last interval and \(f(3) = -2 \cdot 3 + 8 = 2\).
Example 3. Find where the function f (x) is discontinuous and explain why. For
which value of the constant κ will the function p(x) be continuous?
Solution: On each interval the function is a polynomial, so it is continuous. We
need to check if the function value jumps when the interval changes. The first time
the interval changes is at x = −2. When x is approaching −2 from the left, where x < −2,
the function is \( -\frac{x^2}{2} + 8 \), so it approaches \( -\frac{(-2)^2}{2} + 8 = 6 \). We can write this using
formulas instead of words:
\[ \lim_{x\to -2^-} f(x) = \lim_{x\to -2^-} \left( -\frac{x^2}{2} + 8 \right) = -\frac{(-2)^2}{2} + 8 = 6. \]
When x is approaching −2 from the right, x now belongs to the second interval
−2 < x ≤ 2 where the function is constant, 6, so it approaches 6:
\[ \lim_{x\to -2^+} f(x) = \lim_{x\to -2^+} 6 = 6. \]
The left and right limits are the same, 6, which means that the limit exists and is
equal to 6, and the function at x = −2 is also \( f(-2) = -\frac{(-2)^2}{2} + 8 = 6 \) (we are using
the first formula because x = −2 belongs to the first interval x ≤ −2), so the function is
continuous at x = −2.
The second time the interval changes is at x = 2. Again, let us compute the left and
right limits using the corresponding formulas from the list:
\[ \lim_{x\to 2^-} f(x) = \lim_{x\to 2^-} 6 = 6, \qquad \lim_{x\to 2^+} f(x) = \lim_{x\to 2^+} (-2x + 8) = -2 \cdot 2 + 8 = 4. \]
Since the two limits are not equal, the limit does not exist, and the function is
discontinuous at x = 2. Everywhere else the function is continuous.
To find the constant κ which ensures that p(x) is continuous at x = 2, let us
compute the left and right limits using the corresponding formulas from the list:
\[ \lim_{x\to 2^-} p(x) = \lim_{x\to 2^-} 6 = 6, \qquad \lim_{x\to 2^+} p(x) = \lim_{x\to 2^+} (\kappa x + 8) = \kappa \cdot 2 + 8. \]
The limits must be equal, so we must have that 6 = κ · 2 + 8. Solving for κ we get
that κ = −1. For κ = −1 the limit exists and is equal to 6, and the function value is p(2) = 6,
so now the function p(x) is also continuous at x = 2.
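The one-sided limits in this example can also be checked numerically. The following is only a rough sketch in Python: the helper `one_sided_values` and the step size `h` are our own illustrative choices, and evaluating a function very close to a point is a stand-in for a limit, not a proof.

```python
def f(x):
    """Piecewise function f from the list above."""
    if x <= -2:
        return -x**2 / 2 + 8
    if x <= 2:
        return 6.0
    return -2 * x + 8


def p(x, kappa):
    """Piecewise function p from the list above, with kappa as a parameter."""
    if x <= -2:
        return -x**2 / 2 + 8
    if x <= 2:
        return 6.0
    return kappa * x + 8


def one_sided_values(func, a, h=1e-6):
    """Values of func just to the left and just to the right of a."""
    return func(a - h), func(a + h)


# At x = -2 both one-sided values agree with f(-2), so there is no jump.
print(one_sided_values(f, -2), f(-2))

# At x = 2 the one-sided values differ, so f has a jump there.
print(one_sided_values(f, 2), f(2))

# Continuity of p at x = 2 requires 6 = 2 * kappa + 8.
kappa = (6 - 8) / 2
print(kappa, one_sided_values(lambda x: p(x, kappa), 2))
```

With the value of κ solved from 6 = 2κ + 8, the two printed one-sided values of p at x = 2 should agree up to the step size.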
Exercise 3. Find where the function g(x) is discontinuous and explain why. For
which value of the constant κ will the function q(x) be continuous?
Example 4. The price of mailing a letter by Canada Post in 2022 is (where ‘up to’
means ‘including’):
• $1.07 up to 30 g,
• $1.30 over 30 g up to 50 g,
• $1.94 over 50 g up to 100 g,
• $3.19 over 100 g up to 200 g,
• $4.44 over 200 g up to 300 g,
• $5.09 over 300 g up to 400 g,
• $5.47 over 400 g up to 500 g.
At which points is the price as a function of weight discontinuous? What are the
right and left limits at those points?
Solution: The domain of the price p = p(w) of a letter as a function of weight is
0 < w ≤ 500. Beyond that weight it is no longer considered a letter. The first jump
(discontinuity) is at 30g, where the left limit is $1.07 and the right limit is $1.30.
Other discontinuities are similar, at 50g, 100g, etc.
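To see all the jumps at once, we can encode the price list as a step function. This is a minimal sketch: the names `LIMITS`, `PRICES`, `price` and `jump_at` are our own, and the values just to the left and right of a point are used as a numerical stand-in for the left and right limits.

```python
from bisect import bisect_left

# Upper weight limits (grams, inclusive) and prices (dollars) from the list above.
LIMITS = [30, 50, 100, 200, 300, 400, 500]
PRICES = [1.07, 1.30, 1.94, 3.19, 4.44, 5.09, 5.47]


def price(w):
    """Price of mailing a letter of weight w grams, for 0 < w <= 500."""
    if not 0 < w <= 500:
        raise ValueError("weight is outside the domain 0 < w <= 500")
    # bisect_left finds the first bracket whose upper limit is >= w,
    # so a weight of exactly 30 g still falls in the first bracket.
    return PRICES[bisect_left(LIMITS, w)]


def jump_at(w, h=1e-9):
    """Prices just to the left and just to the right of weight w."""
    return price(w - h), price(w + h)


# The last limit, 500 g, is the end of the domain, so there is no right side to check.
for w in LIMITS[:-1]:
    print(w, jump_at(w))
```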
Exercise 4. 50 mg of a drug is injected into a patient at a constant rate for 12
seconds. After that the plasma concentration of the drug decreases exponentially
with a half-life of 10 minutes. Express the quantity q = q(t) of the drug in the
patient’s body as a continuous function of time t, measured in minutes. Make sure
it is continuous at 12 seconds.
Example 5. What are the left and right limits of \( f(x) = \frac{|x|}{x} \) at x = 0? Does the limit
at x = 0 exist? Is the function continuous at this point?
Solution: If x < 0 then \( f(x) = \frac{|x|}{x} = -1 \). For example, if x = −1 then \( \frac{|-1|}{-1} = -1 \).
Similarly, if x > 0 then \( f(x) = \frac{|x|}{x} = 1 \). This means that the left limit \( \lim_{x\to 0^-} f(x) = -1 \)
and the right limit \( \lim_{x\to 0^+} f(x) = 1 \). We can conclude that the limit \( \lim_{x\to 0} f(x) \)
does not exist, and the function is not continuous. Also, x = 0 is not in the domain
since we cannot divide by zero, so the function could not be continuous at x = 0 even if the
limit existed.
Exercise 5. What are the left and right limits of \( f(x) = \frac{|x-2|}{x-2} \) at x = 2? Does the
limit at x = 2 exist? Is the function continuous at this point?
Example 6. Consider a rational function \( f(x) = \frac{x^2-3x+2}{x^2+x-2} \). Compute \( \lim_{x\to 1} f(x) \). Is
the function continuous at x = 1?
Solution: At x = 1 both the numerator and denominator are zero, so the function
is undefined at x = 1 which means it cannot be continuous no matter if the limit
exists or not. Finding the roots of quadratic polynomials, we can write
\[ \frac{x^2-3x+2}{x^2+x-2} = \frac{(x-1)(x-2)}{(x-1)(x+2)}. \]
When we take the limit x → 1, we consider x that approach 1 but are not equal to
1, so x − 1 ≠ 0 and we can cancel it:
\[ \lim_{x\to 1} \frac{x^2-3x+2}{x^2+x-2} = \lim_{x\to 1} \frac{(x-1)(x-2)}{(x-1)(x+2)} = \lim_{x\to 1} \frac{x-2}{x+2} = \frac{1-2}{1+2} = -\frac{1}{3}. \]
Exercise 6. Give an example of a rational function that is not continuous at x = 2
but has a limit at x = 2.
Intermediate Value Theorem. If we know that a function is continuous on some
interval [a, b] then it must cross any value y between f(a) and f(b) at some point x
inside the interval [a, b], because it is not allowed to jump over this value. Of course,
there could be more than one such x (for example, there are three in the figure), but
we know there is at least one. This is very useful, because we know that it is possible
to solve the equation y = f(x) for x on the interval a ≤ x ≤ b, even if it might not be obvious.
Example 7. Show that there is a number c ∈ [0, 1] such that \( e^c - 3c = 0 \).
Solution: The function \( f(x) = e^x - 3x \) is continuous. We see that f(0) = 1 and
f(1) = e − 3 = −0.2817... < 0, so y = 0 is in between these two values. By the
Intermediate Value Theorem (IVT for short), there must be a point c on the interval
[0, 1] such that \( f(c) = e^c - 3c = 0 \). Using a computer we can find that c = 0.619... .
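One simple way a computer can locate such a point c is the bisection method, which applies the IVT repeatedly to smaller and smaller intervals. The text does not say which method was used, so the sketch below is only one possible approach; the function name `bisect_root` and the tolerance are our own choices.

```python
import math


def bisect_root(f, a, b, tol=1e-10):
    """Find a root of a continuous f on [a, b], assuming f(a) and f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid)
        # Keep the half-interval on which the sign change remains;
        # by the IVT that half still contains a root.
        if fa * fm <= 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return (a + b) / 2


c = bisect_root(lambda x: math.exp(x) - 3 * x, 0.0, 1.0)
print(c, math.exp(c) - 3 * c)
```

Each step halves the interval, so about 34 steps are enough to reduce an interval of length 1 below the tolerance used here.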
Exercise 7. Show that there is a number c ∈ [0, 1] such that \( c^7 + c^2 - 1 = 0 \).
In the next two problems we will use the following two functions.
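• \( f(x) = \begin{cases} -\frac{x^2}{2} + 8, & 0 \le x \le 2 \\ -2x + 8, & 2 < x \le 4 \end{cases} \)
• \( g(x) = \begin{cases} \sin(x), & 0 \le x \le \frac{\pi}{2} \\ x - \frac{1}{2}, & \frac{\pi}{2} < x \le 2 \end{cases} \)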
Example 8. Can we apply the IVT to the function f (x) on the interval [0, 4]? If not,
find the value y between f (0) and f (4) such that y = f (x) has no solution x ∈ [0, 4].
Solution: We can check that \( \lim_{x\to 2^-} f(x) = 6 \) and \( \lim_{x\to 2^+} f(x) = 4 \), so the function is not
continuous at x = 2, and the IVT cannot be applied. On the first interval [0, 2], the
parabola \( -\frac{x^2}{2} + 8 \) is decreasing, and on the second interval (2, 4], the linear function
−2x + 8 is also decreasing, so when the function f(x) jumps from 6 to 4 as we
cross x = 2 it skips the values in between. This means that, for example, f(x) = 5 has no
solutions for 0 ≤ x ≤ 4. The graph of this function is in Example 1 above.
Exercise 8. Can we apply the IVT to the function g(x) on the interval [0, 2]? If not,
find the value y between g(0) and g(2) such that y = g(x) has no solution x ∈ [0, 2].
Growth and limits at infinity. When we discussed rational functions, we saw
how we could find their horizontal asymptotes. For example, using the limit
notation introduced above, we can write
\[ \lim_{x\to\infty} \frac{2x^2 - 4x - 96}{x^2 - x - 30} = \lim_{x\to\infty} \frac{2 - \frac{4}{x} - \frac{96}{x^2}}{1 - \frac{1}{x} - \frac{30}{x^2}} = \frac{2-0-0}{1-0-0} = 2 \]
and the main idea in this calculation was to take the leading term in the denominator
(here \( x^2 \)) and divide both the numerator and denominator by it. The reason why this
idea worked was that we could easily compare which function grows faster, x
or \( x^2 \), by dividing and cancelling. For example, \( \frac{x}{x^2} = \frac{1}{x} \to 0 \) means that \( x^2 \) grows
faster than x.
To generalize this idea, let us compare how various functions grow at infinity.
We will say that a function g(x) grows faster than f (x) when x goes to infinity, or
f (x) grows slower than g(x), if
\[ \lim_{x\to\infty} \frac{f(x)}{g(x)} = 0. \]
We can also express this more compactly by saying that g(x) dominates f (x) and
write f (x) ≪ g(x) as x → ∞.
• Logarithms grow slower than power functions: \( \ln(x) \ll x^p \) for p > 0.
• Power functions grow slower when the power is smaller: \( x^p \ll x^q \) if p < q.
• Power functions grow slower than any exponential growth: \( x^p \ll a^x \) if a > 1.
• Exponentials grow slower when the base is smaller: \( a^x \ll b^x \) if a < b.
The second one we already know, and the last one is true because \( \frac{a^x}{b^x} = \left(\frac{a}{b}\right)^x \to 0 \)
since \( \left(\frac{a}{b}\right)^x \) is an exponential decay function when the base \( \frac{a}{b} < 1 \). We will not
spend time discussing why the first and third cases are true but, using a change of
variables, both statements can be reduced to showing that x grows slower than \( e^x \),
which is much more obvious. Remember also that an exponential growth function
\( a^x \) for a > 1 can always be written as \( e^{\kappa x} \) for κ = ln(a) > 0. Now that we know
how to compare the growth of basic functions, we can consider more complicated
examples.
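Before moving on, the four comparisons in the list can be illustrated numerically by tabulating the ratio f(x)/g(x) at a few increasing values of x. The specific powers and bases below (0.5, 2 and 3, 5, and the bases 2 and 3) are arbitrary illustrative choices, not values from the text, and a few sample points only suggest the trend rather than prove it.

```python
import math

# Ratios f(x)/g(x) for pairs where f(x) << g(x) as x -> infinity.
pairs = {
    "ln(x) / x^0.5": lambda x: math.log(x) / x**0.5,
    "x^2 / x^3": lambda x: x**2 / x**3,
    "x^5 / 2^x": lambda x: x**5 / 2.0**x,
    "2^x / 3^x": lambda x: (2.0 / 3.0) ** x,
}

for name, ratio in pairs.items():
    values = [ratio(x) for x in (10.0, 100.0, 1000.0)]
    print(name, values)
```

Note that a ratio such as \( x^5 / 2^x \) can first increase for small x before it starts to decay, which is why the comparison is only about what happens as x goes to infinity.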
Example 9. Compute the limit \( \lim_{x\to+\infty} \frac{7^x + x^2}{5 \cdot 7^x + \ln(x)} \).
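Solution: The fastest growing function in the denominator is \( 7^x \) so, if we divide
both the numerator and denominator by it, we get that
\[ \lim_{x\to+\infty} \frac{7^x + x^2}{5 \cdot 7^x + \ln(x)} = \lim_{x\to+\infty} \frac{\frac{7^x + x^2}{7^x}}{\frac{5 \cdot 7^x + \ln(x)}{7^x}} = \lim_{x\to+\infty} \frac{1 + \frac{x^2}{7^x}}{5 + \frac{\ln(x)}{7^x}} = \frac{1+0}{5+0} = 0.2. \]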
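As a sanity check, we can evaluate the expression at a few moderately large values of x. This is only a sketch with our own function name `ratio`; the sample points are kept small enough that \( 7^x \) does not overflow floating-point arithmetic.

```python
import math


def ratio(x):
    """The expression from Example 9, evaluated at a finite x > 0."""
    return (7.0**x + x**2) / (5 * 7.0**x + math.log(x))


for x in (1.0, 5.0, 10.0, 20.0):
    print(x, ratio(x))
```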
Exercise 9. Compute the limit \( \lim_{x\to+\infty} \frac{\ln(x) + \sqrt{x}}{x - \sqrt{x}} \).
Example 10. For which values of κ > 0 does the limit \( \lim_{x\to+\infty} \frac{7^x + x^2}{e^{\kappa x} + \ln(x)} \) exist?
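Solution: In words, the fastest growing term in the numerator is \( 7^x \) and in the
denominator it is \( e^{\kappa x} \), so the limit will exist if the denominator grows at least as fast,
so \( e^{\kappa} \ge 7 \) or κ ≥ ln(7). To make this explanation more precise, let us divide both
the numerator and denominator by \( e^{\kappa x} \),
\[ \lim_{x\to+\infty} \frac{7^x + x^2}{e^{\kappa x} + \ln(x)} = \lim_{x\to+\infty} \frac{\frac{7^x}{e^{\kappa x}} + \frac{x^2}{e^{\kappa x}}}{1 + \frac{\ln(x)}{e^{\kappa x}}} = \lim_{x\to+\infty} \frac{\left(\frac{7}{e^{\kappa}}\right)^x + 0}{1 + 0} = \lim_{x\to+\infty} \left(\frac{7}{e^{\kappa}}\right)^x. \]
This limit can exist only if \( \frac{7}{e^{\kappa}} \le 1 \); otherwise, it will grow exponentially. If \( \frac{7}{e^{\kappa}} = 1 \)
then the limit is 1 and if \( \frac{7}{e^{\kappa}} < 1 \) then the limit is zero, because it will decay
exponentially.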
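The three cases can be seen numerically by evaluating the expression for one value of κ below ln(7), for κ = ln(7), and for one value above it. In this sketch the values 1.5 and 2.5 and the function name `ratio` are our own illustrative choices.

```python
import math


def ratio(x, kappa):
    """The expression from Example 10 for a given kappa > 0 and finite x > 0."""
    return (7.0**x + x**2) / (math.exp(kappa * x) + math.log(x))


# kappa below, equal to, and above ln(7) = 1.9459...
for kappa in (1.5, math.log(7), 2.5):
    print(kappa, [ratio(x, kappa) for x in (10.0, 50.0, 100.0)])
```

For κ = 1.5 the values blow up, for κ = ln(7) they settle near 1, and for κ = 2.5 they shrink towards 0.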
Exercise 10. For which values of κ > 0 does the limit \( \lim_{x\to+\infty} \frac{x^{\kappa} + x^2}{2x^2 - \ln(x)} \) exist?
Some famous limits. We conclude this section with a list of a few famous limits
that, in particular, will be useful when we study the derivatives of exponential and
trigonometric functions:
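\[ \lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n = e^x, \qquad \lim_{x\to 0} \frac{e^x - 1}{x} = 1, \qquad \lim_{x\to 0} \frac{\sin x}{x} = 1, \qquad \lim_{x\to 0} \frac{\cos x - 1}{x} = 0. \]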
There is no need to memorize these limits at this point but, if you have time, you
can learn more about them in the videos in the footnotes.6 The first limit is, in fact,
the definition of Euler’s number e and the exponential function \( e^x \). The second limit
is a consequence of the first one, and it will be used to compute the derivative
of \( e^x \). The last two limits are relatively easy consequences of the definition of sine
and cosine, and will also be used to compute the derivatives of sin x and cos x.
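These limits are easy to explore numerically. The sketch below uses our own helper names `compound` and `quotients`; in particular, it suggests that the quotient \( \frac{e^h - 1}{h} \) approaches 1 as h approaches 0, while \( \frac{\cos h - 1}{h} \) approaches 0.

```python
import math


def compound(x, n):
    """(1 + x/n)^n, which approaches e^x as n grows."""
    return (1 + x / n) ** n


def quotients(h):
    """The three quotients from the list above, evaluated at a small nonzero h."""
    return (math.exp(h) - 1) / h, math.sin(h) / h, (math.cos(h) - 1) / h


for n in (10, 1000, 100000):
    print(n, compound(1.0, n), math.exp(1.0))

for h in (0.1, 0.001, 0.00001):
    print(h, quotients(h))
```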
6 https://youtu.be/sbLWLvSfvwk, https://youtu.be/IX1cZHz-bc0, https://youtu.be/dLXal60n3JQ.
Answer to Exercise 1. limx→0 f (x) = 1 and the function is continuous at x = 0.

limx→1− f (x) = 4, limx→1+ f (x) = 1, limit does not exist, the function is discontinu-
ous. limx→2 f (x) = −1, but the function is undefined at x = 2 and so discontinuous.
limx→−1 f (x) = −2 ̸= −1 = f (−1), the function is discontinuous.
Answer to Exercise 2. \( f(x) = \frac{1}{2x-1} \) is not continuous for all 0 ≤ x ≤ 1, because we
divide by zero when x = 0.5, so 0.5 is not in the domain of this function. \( g(x) = \frac{1}{\cos(x)} \)
is continuous for all 0 ≤ x ≤ 1, because we divide by 0 only when cos(x) = 0,
that is when \( x = \frac{\pi}{2}, \frac{3\pi}{2}, \ldots \), which are not on the interval 0 ≤ x ≤ 1.
Answer to Exercise 3. g(x) is always continuous. It is impossible to choose κ to
make q(x) continuous.
Answer to Exercise 4. Since the time is measured in minutes, we need to write
12 seconds as 0.2 min. The quantity of drug increases linearly from 0 mg at time
0 min to 50 mg at time 0.2 min, so the slope is \( \frac{50}{0.2} = 250 \), and q(t) = 250t for 0 ≤
t ≤ 0.2. After that it starts decaying exponentially with a half-life of 10 minutes,
so a common mistake is to write it as \( 50 \cdot 0.5^{t/10} \). However, this formula would be
correct if the decay started at t = 0, while now exponential decay starts at t = 0.2.
This means we need to shift this exponential decay function to the right by 0.2, so
it will be \( q(t) = 50 \cdot 0.5^{(t-0.2)/10} \) for 0.2 < t. We can check that
\[ q(t) = \begin{cases} 250t, & 0 \le t \le 0.2 \\ 50 \cdot 0.5^{(t-0.2)/10}, & 0.2 < t \end{cases} \]
is continuous at all times including t = 0.2.

Answer to Exercise 5. The left limit limx→2− f (x) = −1 and the right limit
limx→2+ f (x) = 1, and the function is not continuous at x = 2.
Answer to Exercise 6. For example, \( \frac{x-2}{x-2} \).
Answer to Exercise 7. f(0) = −1 and f(1) = 1, so the IVT implies that such c
exists. Using a computer, c = 0.8398... .
Answer to Exercise 8. We cannot apply the IVT because the function is not continuous
at \( x = \frac{\pi}{2} \). Both sin(x) and \( x - \frac{1}{2} \) are increasing on their corresponding intervals, and
at \( x = \frac{\pi}{2} \) the function jumps from 1 to \( \frac{\pi - 1}{2} = 1.07\ldots \), so for example g(x) = 1.03
has no solution x ∈ [0, 2].
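Answer to Exercise 9. The fastest growing function in the denominator is x so, if
we divide both the numerator and denominator by it, we get that
\[ \lim_{x\to+\infty} \frac{\ln(x) + \sqrt{x}}{x - \sqrt{x}} = \lim_{x\to+\infty} \frac{\frac{\ln(x) + \sqrt{x}}{x}}{\frac{x - \sqrt{x}}{x}} = \lim_{x\to+\infty} \frac{\frac{\ln(x)}{x} + \frac{1}{\sqrt{x}}}{1 - \frac{1}{\sqrt{x}}} = \frac{0+0}{1-0} = 0. \]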
Answer to Exercise 10. κ ≤ 2. If κ = 2 then the limit is 1, if κ < 2 then the limit
is 0.5.
Chapter 2
Derivatives
2.1 Practical interpretation of derivatives

In this section, we will start discussing derivatives rather informally and learn what
it means that “the derivative f′(a) of a function y = f(x) at a point x = a is equal
to m”. We will also look at the definition of the derivative using graphs, again very
informally. Jumping ahead, we will think of the derivative f ′ (a) as
• the slope of the tangent line to the graph of y = f (x) at x = a, or
• the rate of change of the function near x = a.
However, our main goal will be to express the meaning of the derivative in plain
English in practical situations, also paying attention to the units of the variables x
and y. Of course, to learn how to actually compute derivatives we will need a more
formal definition that will be discussed in the later sections.
Linear functions. Let us first discuss linear functions and, as an illustration, let
us use two examples that we have seen before:
• If a car drives with constant speed of 60 km/h then the distance it covers in t
hours is d = 60 · t km.
• If during photosynthesis in direct sunlight at temperature 10°C a leaf of some
plant produces 30 µmol of glucose and oxygen per hour then the amount of
glucose and oxygen produced during t hours is y = 30 · t µmol.
Given a linear function
\[ y = b + m \cdot x \]
and any two points \( (x_1, y_1) \) and \( (x_2, y_2) \) on its graph, we recall that the
slope m can be computed as
\[ m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}. \]
Notice that for a linear function the ratio \( \frac{\Delta y}{\Delta x} \) is the same constant m no matter what
the two points \( (x_1, y_1) \) and \( (x_2, y_2) \) are and, when we learn a general definition of
the derivative, we will see that, for a linear function, the slope m also happens to
be its derivative.
How can we think about this quantity m? If the input x changes by ∆x and the
output of our linear function changes by ∆y, the ratio \( \frac{\Delta y}{\Delta x} \) tells us how the output
changes relative to the input, so it has the meaning of the rate of change of y with
respect to x. For example:
• In the car example above, if the time changes by ∆t = 0.2 hours then the distance
changes by ∆d = 12 km, and the ratio \( \frac{12}{0.2} = 60 \) km/h is the change of distance
relative to time, better known as speed.
• In the photosynthesis example, if the time changes by ∆t = 0.2 hours then the
amount of glucose and oxygen changes by ∆y = 6 µmol, and the ratio \( \frac{6}{0.2} = 30 \)
µmol/h is the change of glucose and oxygen relative to time, which is the rate
of photosynthesis.
Let us rephrase the same thing in a different way. What does it mean that the
derivative (or slope) of a linear function is equal to m? It means that:
If the input changes by ∆x then the output changes by ∆y = m · ∆x.
The derivative 60 km/h means that, for example, between time 0.1 and 0.3 hours,
the distance will change by 60 · 0.2 = 12 km. The derivative 30 µmol/h means that
between time 0.1 and 0.3 hours, the amount of glucose and oxygen produced will
change by 30 · 0.2 = 6 µmol.
Derivative of a general function. Next, let us look at an informal definition of
the derivative for a general function y = f (x).
If the function is not linear then its slope may be constantly changing and the
ratio \( \frac{\Delta y}{\Delta x} \) can depend on the points \( (x_1, y_1) \) and \( (x_2, y_2) \). However, if we zoom in very
close to a particular point (a, f(a)) on the graph of the function (figure above on
the right), the graph looks almost linear and we can draw a so-called tangent line
that passes through the same point (a, f (a)) and has the same slope at that point.
Of course, this zooming in procedure is a very informal definition of this slope, and
we will make it more formal and precise later on, but for now:
• the slope of the tangent line at the point (a, f (a)) is called the derivative of a
function y = f (x) at the point x = a and it is denoted f ′ (a).
By analogy with the linear functions, what does it mean that the derivative of a
function y = f (x) at a point x = a is equal to f ′ (a)? It means that:
If ∆x is small then between x = a and x = a + ∆x
the output y = f (x) will change by approximately f ′ (a)∆x.
In contrast with linear functions, here the change of y is only approximately
equal (≈) to f′(a)∆x, not exactly, and only if ∆x is small. Of course, we can rephrase
the above statement slightly depending on the setting of the problem. For example,
for the sake of clarity we will always select some specific small increment ∆x.
Example 1. The rate of photosynthesis depends on temperature, and suppose that

at 10°C a leaf of some plant produces glucose and oxygen at a rate of 30 µmol/h.
Let f (t) be the total amount of glucose and oxygen produced by the leaf since 12
pm, where time t is measured in hours. If the temperature steadily rises and at 3
pm it reaches 10°C, what is f ′ (3) and what is its practical meaning?
Solution: Since t = 3 h corresponds to 3 pm and the temperature at that time is
10°C, the rate of photosynthesis is f′(3) = 30 µmol/h. Supposing that the temperature
changes slowly, f′(3) = 30 µmol/h means in plain English that, for example,
between 3:00 pm and 3:10 pm the leaf will produce approximately \( 30 \times \frac{1}{6} = 5 \) µmol
of glucose and oxygen. Our choice of the 10 minute time interval is reasonable because
the temperature is unlikely to change much on the scale of 10 minutes.
Exercise 1. Suppose that the average monthly sales S at a bakery, in dollars, are a
function S = f (A) of its monthly spending on advertisement A, also in dollars. If
f ′ (100) = 4.5, what are the units and practical meaning of this derivative?
Example 2. In the figure1 and table below we see the average finish time in 2009
New York marathon by age group and sex. Let T = f (A) be the average finish time
for men of age A. In each age group, take the middle age (for example, in the 45-49
age group the middle age is 47) and suppose that the average finish time for men
of that age is the same as for the group. For example, f (47) = 4h 13min, ignoring
seconds. Estimate the derivative f ′ (52), give its units and describe its meaning.
The table of values is:
1 https://www.runtri.com/2010/11/new-york-city-marathon-average-finish.html
A (years)   22    27    32    37    42    47    52    57    62    67
T (h:min)   4:12  4:06  4:08  4:10  4:09  4:13  4:22  4:36  4:47  5:12
Solution: From the figure, we see that f(47) = 4h 13min, f(52) = 4h 22min,
and f(57) = 4h 36min. We also see that the slope of the line connecting the values
at A = 47 and A = 57 is a good candidate for the slope at A = 52, which
is in the middle of those values. Since ∆A = 10 years and ∆T = 4h 36min −
4h 13min = 23min, the slope of this line is \( \frac{\Delta T}{\Delta A} = 2.3 \) min/year. Of course,
this is only an approximation, but if indeed f′(52) = 2.3 min/year then its meaning
is the following: the average finish time of 53 year old runners is approximately
2.3 minutes slower than that of 52 year old runners.
Exercise 2. In the figure2 we see the graph of the energy economy E, in miles/kWh,
as a function E = f(S) of speed S, in mph, for the 2021 Porsche Taycan electric
vehicle. Estimate the derivative f′(60), give its units and describe its meaning.
From the graph: f(55) = 4.02, f(60) = 3.64, f(65) = 3.36.
Tangent line. By our definition, the derivative f ′ (a) of a function y = f (x) at

x = a is the slope of the tangent line to this function f (x) at the point x = a. Also,
the tangent line passes through the same point (a, f (a)). We know the formula
y = y0 + m(x − x0 ) of a linear function that has slope m and passes through the
point (x0 , y0 ) so, according to this formula, the tangent line is
y = f (a) + f ′ (a) · (x − a).
Indeed, the slope of this line is f ′ (a) and, if we plug in x = a into this formula, the
second term becomes zero and we get y = f (a), so the line passes through the point
(a, f (a)). We can also write this equation in terms of the increments ∆y = y − f (a)
and ∆x = x − a: ∆y = f ′ (a) · ∆x.
2 https://www.cleanmpg.com//community/index.php?media/35360/full
Example 3. We will later learn that the derivative of the function \( y = \sqrt{x} \) at any
positive value x > 0 is equal to \( \frac{1}{2\sqrt{x}} \). What is its tangent line at x = 121? Using the
tangent line, approximate \( \sqrt{132} \) and compare with the actual value.
Solution: The function at x = 121 is \( \sqrt{121} = 11 \) and the derivative at x = 121 is
\( \frac{1}{2\sqrt{121}} = \frac{1}{22} \), so the tangent line is \( y = 11 + \frac{1}{22}(x - 121) \). When x = 132, the tangent
line gives \( y = 11 + \frac{11}{22} = 11.5 \). The actual value is \( \sqrt{132} = 11.4891\ldots \), and we
see that the function and the tangent line are quite close in this case even when the
increment ∆x = 132 − 121 = 11 is not very small.
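The same computation can be written as a short program. This is a minimal sketch: the helper `tangent_line` is our own name, and the derivative \( \frac{1}{2\sqrt{x}} \) is taken as given, exactly as in the example.

```python
import math


def tangent_line(f, df, a):
    """Return the tangent line y = f(a) + f'(a) * (x - a) as a function of x."""
    return lambda x: f(a) + df(a) * (x - a)


# Tangent line to y = sqrt(x) at x = 121, using the derivative 1 / (2 * sqrt(x)).
line = tangent_line(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 121.0)

approx = line(132.0)
actual = math.sqrt(132.0)
print(approx, actual, approx - actual)
```

The printed difference shows how far the tangent line is from the function at x = 132; trying points further from 121 shows the approximation getting worse.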
Exercise 3. We will later learn that the derivative of the function y = cos x is equal
to −sin(x). What is its tangent line at \( x = \frac{\pi}{4} \)? Using the tangent line, approximate
\( \cos(\frac{\pi}{5}) \) and compare with the actual value. Recall that \( \cos(\frac{\pi}{4}) = \sin(\frac{\pi}{4}) = \frac{1}{\sqrt{2}} \).
Estimating derivatives from tables. In the two examples above about New
York marathon and Porsche Taycan we actually estimated the derivative using
nearby values before interpreting its practical meaning. Right now we will do a
couple of similar examples, but we will spell out a bit more explicitly how we can
estimate the derivatives in such cases.
Example 4. In the table below we see the average finish time (in hours) in 2009
New York marathon by age group among women.
A (years)   22    27    32    37    42    47    52    57    62    67
T (hours)   4.62  4.53  4.60  4.67  4.65  4.75  4.88  5.15  5.47  5.57
If T = f (A) is the average finish time for women of age A, estimate f ′ (22), f ′ (67)
and f ′ (52).
Solution: In the figure on the right, we plot the data points from the table and
a dotted curve that interpolates smoothly between those points and could
hypothetically represent the graph of the function T = f(A). If we knew this function, we
could find the slope of the tangent line at any age A, which would give us the
derivative f′(A). The problem is that we do not know this function, so we have
to use the values given in the table. Normally, if we could zoom in on the actual
function, we could estimate the slope of the tangent line by the ratio \( \frac{\Delta y}{\Delta x} \) using
nearby points. Right now, the closest points are the neighbouring points in the table,
so we will use those values.
For example, to estimate the derivative f′(22), we can use the points (22, 4.62)
and (27, 4.53), for which ∆x = 27 − 22 = 5 years and ∆y = 4.53 − 4.62 = −0.09
hours. (Notice that we subtract the values in the same order.) As a result, \( -\frac{0.09}{5} = -0.018 \)
hour/year is our estimate for the derivative f′(22). We could also change
the units from hours to minutes, using that 1 hour = 60 minutes, to get −0.018
hour/year = −60 · 0.018 min/year = −1.08 min/year.
Similarly, we can estimate f′(67) using the points (62, 5.47) and (67, 5.57), in
which case we get \( \frac{5.57 - 5.47}{67 - 62} = 0.02 \) hour/year = 1.2 min/year.
Finally, to estimate f′(52), we have several choices. We can use a point to the
right to estimate \( f'(52) \approx \frac{5.15 - 4.88}{57 - 52} = 0.054 \) hour/year = 3.24 min/year. We can use
a point to the left to estimate \( f'(52) \approx \frac{4.75 - 4.88}{47 - 52} = 0.026 \) hour/year = 1.56 min/year.
Or we can take the average of the two estimates, which would be \( \frac{3.24 + 1.56}{2} = 2.4 \)
min/year. Of course, taking the average is not strictly necessary, but it would often
give a better estimate. In the case when the increments ∆x are the same to the left
and right of our point, averaging the two values is the same as computing the slope
between those two neighbouring points, in this case \( \frac{5.15 - 4.75}{57 - 47} = 0.04 \) hour/year
= 2.4 min/year.
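The recipe used in this solution can be summarized in a few lines of code. This is a sketch under our own naming (`estimate_derivative`): it uses the only available neighbour at the two ends of the table and the slope between the two neighbours at interior points.

```python
ages = [22, 27, 32, 37, 42, 47, 52, 57, 62, 67]
times = [4.62, 4.53, 4.60, 4.67, 4.65, 4.75, 4.88, 5.15, 5.47, 5.57]


def estimate_derivative(xs, ys, i):
    """Estimate the derivative at xs[i] from neighbouring table entries.

    At the ends of the table only one neighbour exists, so a one-sided
    slope is used; in the interior the slope between the two neighbours is used.
    """
    lo = max(i - 1, 0)
    hi = min(i + 1, len(xs) - 1)
    return (ys[hi] - ys[lo]) / (xs[hi] - xs[lo])


for age in (22, 67, 52):
    slope = estimate_derivative(ages, times, ages.index(age))
    print(age, round(slope, 3), "hour/year =", round(60 * slope, 2), "min/year")
```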
Exercise 4. The table below shows the energy economy E, in miles/kWh, at various
speeds S, in mph, for 2022 Audi GT RS. If E = f (S), estimate the derivatives
f ′ (50), f ′ (60), and f ′ (75).
S (mph)         50    55    60    65    70    75
E (miles/kWh)   3.65  3.40  3.30  2.95  2.70  2.30
When derivatives do not exist. It is important to remember that derivatives

do not always exist, and here are some examples when the derivative f ′ (a) is not
defined.
• If the function is not continuous at a point x = a then the derivative f ′ (a) does
not exist, as in the figure on the left at x = 0 where the function has a jump.
• If the function has a corner (also called a kink) at x = a then the derivative f ′ (a)
does not exist, as in the case of y = |x| in the middle figure, because the slope
on the right of x = 0 is different than on the left of x = 0.
• A less common example is in the figure on the right, where the function
\( y = x \sin(\frac{1}{x}) \) keeps fluctuating between two lines y = x and y = −x as it approaches
x = 0 so, again, there is no tangent line at x = 0.
Exercise. Draw two examples of graphs when derivatives do not exist, and give
one example of a formula y = f (x) when a derivative does not exist. Specify at
what points x the derivative is not defined and explain why.
Derivatives of inverse functions. Suppose that a function y = f(x) is invertible,
so \( x = f^{-1}(y) \). In the figure on the right, we flipped the figure at the beginning of
this section around the diagonal, so now it shows the inverse function \( x = f^{-1}(y) \).
Notice that the x-axis and y-axis switched, b = f(a) is now on the horizontal axis,
and the slope of the tangent line at y = b is the derivative of this inverse function
\( f^{-1} \) at the point y = b, which is \( (f^{-1})'(b) \). We will study how to calculate
derivatives later on, including derivatives of inverse functions, but for now let us
practice interpreting the meaning of this derivative \( (f^{-1})'(b) \). Since the roles of x
and y switch for the inverse function, we can say:
If ∆y is small then between y = b and y = b + ∆y
the output \( x = f^{-1}(y) \) will change by approximately \( (f^{-1})'(b) \cdot \Delta y \).
Again, this is just another way to state that \( (f^{-1})'(b) \approx \frac{\Delta x}{\Delta y} \), where the increments
∆x and ∆y switched their roles, but should still be computed for two points close to
\( (b, f^{-1}(b)) = (b, a) \).
Example 5. The temperature of a cup of coffee is decreasing from 90°C at time
t = 0 min according to the function T = f(t). Suppose that \( (f^{-1})'(70) = -0.8 \).
What are the units of 70 and −0.8, and what is the practical interpretation of this
derivative?
Solution: The inverse function is \( t = f^{-1}(T) \) so 70 must be in °C and −0.8 must
be in min/°C, since this derivative has units of \( \frac{\Delta t}{\Delta T} \). The meaning of this derivative
is that if the coffee temperature decreased from 70°C to 65°C then the time it took
is approximately −0.8 · (65 − 70) = 4 min.
Exercise 5. Suppose that r = f(t) is the number of centimetres of rainfall since
midnight, where t is in hours. Suppose that the rainfall accumulated by 6 am is 12
cm. If f(t) is strictly increasing, is it invertible? Suppose that \( (f^{-1})'(12) = 0.5 \).
What are the units of 12 and 0.5, and what is the practical interpretation of this
derivative?
Example 6. Let T = f(A) be the average finish time for women of age A in the 2009
New York marathon that appeared in Examples 2 and 4 above. Until about the
age of 40 this function is not monotone, so not invertible, but if we restrict the
domain to ages of 40 and above then it looks increasing and invertible. Estimate
the derivative \( (f^{-1})'(4.88) \) and give its units.
Solution: Since \( A = f^{-1}(T) \), the units of the derivative will be the units of \( \frac{\Delta A}{\Delta T} \), so
year/hour. From the table we see that \( f^{-1}(4.88) = 52 \), so we can use nearby points
(4.75, 47) and (5.15, 57) to estimate the slope. Notice how we changed the roles
of A and T and now write the average time T first, because it is the input of the
inverse function. Since the increments between those two points are ∆A = 57 −
47 = 10 years and ∆T = 5.15 − 4.75 = 0.4 hours, our estimate of the derivative is
\( (f^{-1})'(4.88) \approx \frac{\Delta A}{\Delta T} = \frac{10}{0.4} = 25 \) year/hour. In Example 4 we found that f′(52) ≈ 0.04
hour/year, and our estimate is exactly the reciprocal \( \frac{1}{0.04} = 25 \). This is not surprising because
for the inverse function we used \( \frac{\Delta A}{\Delta T} \) instead of \( \frac{\Delta T}{\Delta A} \) for the original function T =
f(A). Remember this example when studying later on how to compute derivatives
of inverse functions, which will be based on the formula:
\[ b = f(a) \implies (f^{-1})'(b) = \frac{1}{f'(a)}. \]
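The reciprocal relationship can be checked directly on the table values. In this sketch we keep only the part of the table from age 42 onwards, where the finish times are increasing, and the variable names `f_prime` and `f_inv_prime` are our own.

```python
# Part of the table for women from Example 4 where T = f(A) is increasing.
ages = [42, 47, 52, 57, 62, 67]
times = [4.65, 4.75, 4.88, 5.15, 5.47, 5.57]

i = times.index(4.88)  # position of the point (A, T) = (52, 4.88)

# Slope of f at A = 52: change in T over change in A between the two neighbours.
f_prime = (times[i + 1] - times[i - 1]) / (ages[i + 1] - ages[i - 1])

# Slope of the inverse at T = 4.88: the same two points with the roles of A and T swapped.
f_inv_prime = (ages[i + 1] - ages[i - 1]) / (times[i + 1] - times[i - 1])

print(f_prime, "hour/year")
print(f_inv_prime, "year/hour")
print(1 / f_prime)
```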
Exercise 6. Let E = f(S) be the energy economy function from Exercise 4
above. Estimate the derivative \( (f^{-1})'(3.30) \) and give its units. How does it relate
to f′(60) in Exercise 4?
Answer to Exercise 1. The meaning of f′(100) = 4.5 is that, if the bakery spends
$101 instead of $100 on monthly advertisement then its average sales will increase
approximately by 4.5 · 1 = 4.5 dollars. The units of f′(100) are $/$. One can also
cancel the $ units and think of the derivative as a unitless quantity.
Answer to Exercise 2. It looks like the slope of the line connecting values at
55mph and 65mph would be a good approximation for the slope of the function at
60mph. Since ∆S = 10 mph and ∆E = 3.36 − 4.02 = −0.66 miles/kWh, the slope
of this line is \( \frac{\Delta E}{\Delta S} = -0.066 \) (miles/kWh)/mph. Since mph is miles/hour, we can
cancel miles in the units and use h/kWh as the units of this derivative. However,
when interpreting its meaning we will keep using the increments of the original
variables S and E, which have units mph and miles/kWh, so from this point of
view there is no need to simplify the units. Again, the above calculation was only an
approximation of the derivative, but if indeed f ′ (60) = −0.066 (miles/kWh)/mph
then its meaning is the following: driving a car at the speed of 61 mph decreases
the energy efficiency approximately by 0.066 miles/kWh compared to driving it at
60 mph.
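Answer to Exercise 3. Since \( \cos(\frac{\pi}{4}) = \frac{1}{\sqrt{2}} \) and the derivative of cosine at \( \frac{\pi}{4} \) is
\( -\sin(\frac{\pi}{4}) = -\frac{1}{\sqrt{2}} \), the tangent line is \( y = \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}(x - \frac{\pi}{4}) \). When \( x = \frac{\pi}{5} \), the tangent
line gives \( y = \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}(\frac{\pi}{5} - \frac{\pi}{4}) = 0.818\ldots \). The actual value is \( \cos(\frac{\pi}{5}) = 0.809\ldots \),
and the difference is about 0.009.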
Answer to Exercise 4. f ′ (50) ≈ −0.05, f ′ (75) ≈ −0.08, and f ′ (60) ≈ −0.045. In

the last one, the average was used. The units are (miles/kWh)/mph.
Answer to Exercise 5. Since \( t = f^{-1}(r) \), 12 is in cm and 0.5 is in hour/cm. The
interpretation of \( (f^{-1})'(12) = 0.5 \) is that, between 12 cm and 12.1 cm of rainfall
accumulation it should take approximately 0.5 · 0.1 = 0.05 hours = 3 min. Since
we know that 12 cm accumulated at 6 am, we can say that at 6 am it will take
approximately 3 more minutes to accumulate another 1 mm of rainfall.
Answer to Exercise 6. \( (f^{-1})'(3.30) \approx -22.22 \) mph/(miles/kWh). This is the reciprocal
of f′(60) ≈ −0.045 from Exercise 4, because f(60) = 3.30.
2.2 Formal definition of derivative

In this section, we will translate the zooming in procedure we used to define the
tangent line and its slope into formulas that we will later use to actually compute
derivatives. Along the way, we will also discuss secant lines and their slopes, which
have an important meaning of the average rate of change.
Secant lines and average rate of change. When we estimated the slope f′(a)
of the tangent line, we used the ratio \( \frac{\Delta y}{\Delta x} \) of the increments of the input and output
of our function y = f(x), but we said that this approximation works well only if we
zoom in, which means that the two points should be pretty close to each other. It
turns out that this ratio \( \frac{\Delta y}{\Delta x} \) has an important meaning and special name even if two
points are not close to each other.
In the figure on the right we pick two points (a, f(a)) and (b, f(b)) on
the graph of the function y = f(x) and draw a line through those points.
This line is called a secant line and its slope
\[ \frac{\Delta y}{\Delta x} = \frac{f(b) - f(a)}{b - a} \]
is called the average rate of change of the function f(x) on the interval [a, b].
• Average rate of change has the same units and the same general meaning as the derivative, because the above ratio $\frac{\Delta y}{\Delta x}$ describes how much the output changes, $f(b) - f(a)$, relative to how much the input changes, $b - a$.
• To understand why this rate of change is called average, think that the function $f(x)$ represents the position of a car at time $x$. The car might travel from position $f(a)$ to position $f(b)$ between times $a$ and $b$ with constantly changing velocity, but if it travelled with constant velocity $v$ then $v$ must be $\frac{\text{distance}}{\text{time}} = \frac{f(b) - f(a)}{b - a}$. The secant line represents a car travelling at constant speed from position $f(a)$ to position $f(b)$ between times $a$ and $b$.
• In specific examples, when “rate of change” has a specific name then we can
replace it by that name. For example, in the car example, we can say “average
velocity” instead of “average rate of change”.
Example 1. What is the average rate of change of $\cos(x)$ on the interval $[0, \pi]$?
Solution: Because $\cos(0) = 1$ and $\cos(\pi) = -1$, the average rate of change equals
$$\frac{\cos(\pi) - \cos(0)}{\pi - 0} = \frac{-1 - 1}{\pi - 0} = -\frac{2}{\pi} = -0.6366\ldots.$$
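The secant slope can also be checked numerically. The following is a minimal sketch in Python; the helper name `average_rate_of_change` is our own choice for illustration and is not part of the course material.

```python
import math


def average_rate_of_change(f, a, b):
    """Slope of the secant line through (a, f(a)) and (b, f(b))."""
    return (f(b) - f(a)) / (b - a)


# Example 1: cos(x) on the interval [0, pi]; the exact value is -2/pi.
slope = average_rate_of_change(math.cos, 0.0, math.pi)
print(slope)
print(-2 / math.pi)
```

Both printed numbers should agree up to rounding, which matches the value $-\frac{2}{\pi}$ found by hand.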
Exercise 1. What is the average rate of change of $e^x$ on the interval $[a, a + 1]$?
Example 2. Draw a graph of any concave down function on some interval $[a, b]$ and compare the derivatives $f'(a)$, $f'(b)$ at the endpoints and the average rate of change $\frac{f(b) - f(a)}{b - a}$.
Solution: One example of a graph of a concave down function is in the figure on the right. We can see that, as we move left to right, the slope of the tangent line is getting smaller and smaller. This means that $f'(a) > f'(b)$, and the average rate of change is somewhere in between:
$$f'(a) > \frac{f(b) - f(a)}{b - a} > f'(b).$$
What happens if the function is concave up? See https://youtu.be/U7GajesEPCo.
Exercise 2. For the functions in the

two figures, compare the following
quantities: 0, f ′ (1), f (3) − f (2),
and f ′ (4). Hint: 3 − 2 = 1.
Definition of derivative. Until now

we have thought about the derivative
f ′ (a) as the slope of the tangent line or
the rate of change near the point x = a,
but we have justified this informally by
observing that a function looks almost
like a straight line if we zoom in close
enough. How can we turn this zooming
in procedure into something more formal
that can be used to calculate f ′ (a)? We
can notice that:
• If we move the point $b$ closer and closer to $a$, the slope of the secant line $\frac{f(b) - f(a)}{b - a}$ will get closer and closer to the slope of the tangent line $f'(a)$.
This is a geometric definition of the derivative. The process of moving $b$ closer and closer to $a$ should remind us of the concept of taking a limit and, using the language of limits, we can write
$$f'(a) = \lim_{b \to a} \frac{f(b) - f(a)}{b - a}.$$
Another way to write this is to rename the increment $b - a$ as $h$, so $b = a + h$, and let this increment $h$ become smaller and smaller,
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
This is an algebraic definition of the derivative that translates the above geometric
definition into formulas.
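To see the algebraic definition in action, here is a small Python sketch that evaluates the quotient $\frac{f(a+h) - f(a)}{h}$ for shrinking values of $h$. The function $f(x) = x^2$ and the point $a = 3$ are chosen only for illustration; for this choice the quotient simplifies algebraically to $6 + h$, so the printed values should approach 6.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


# As h shrinks, the secant slopes approach the slope of the tangent line at a = 3.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, difference_quotient(square, 3.0, h))
```

A computer can only suggest what the limit is; the algebraic simplification is what actually proves it.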
For example, what is the derivative of a constant function y = f (x) = c? Since
its graph is a horizontal line, the slope is equal to 0 everywhere, so f ′ (a) = 0. Now
we can also see this using the algebraic definition,
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} 0 = 0.$$
Let us take a look at a few more examples of using this definition.
Example 3. Write down and simplify the definition of the derivative of $y = e^x$ at $x = 2$.
Solution: Using the above formula and properties of exponentials, the derivative is
$$\lim_{h \to 0} \frac{e^{2+h} - e^2}{h} = \lim_{h \to 0} \frac{e^2 e^h - e^2}{h} = \lim_{h \to 0} \frac{e^2 (e^h - 1)}{h} = e^2 \cdot \lim_{h \to 0} \frac{e^h - 1}{h}.$$
In the last step, we took the factor $e^2$ outside of the limit, because it is just a constant that does not depend on $h$. Here, we practiced using the definition of the derivative and simplified it a little bit, but we will come back to the last limit in a second.
Exercise 3. If we know that f (1) = 0, write down and simplify the definition of
the derivative of y = f (cos(x)) at x = 0.
Example 4. Write down and compute the derivative of $y = x^2$ at $x = 1$.
Solution: Before taking the limit, let us first simplify the slope of the secant line,
$$\frac{(1 + h)^2 - 1^2}{h} = \frac{(1 + 2h + h^2) - 1}{h} = \frac{2h + h^2}{h} = 2 + h.$$
When $h$ gets small, this slope approaches 2 because $2 + h \to 2 + 0 = 2$, so the derivative of $y = x^2$ at $x = 1$ is 2.
Exercise 4. Write down and compute the derivative of $y = x^3$ at $x = 1$. Hint: you can use that $(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3$.
Example 5. Write down and compute the derivative of $y = \sqrt{x}$ at $x = 9$.
Solution: We will use a special trick of multiplying and dividing by the so-called conjugate and then use the identity $(a - b)(a + b) = a^2 - b^2$ to simplify the numerator, $(\sqrt{9 + h} - \sqrt{9})(\sqrt{9 + h} + \sqrt{9}) = (\sqrt{9 + h})^2 - (\sqrt{9})^2 = (9 + h) - 9 = h$:
$$\frac{\sqrt{9 + h} - \sqrt{9}}{h} = \frac{\sqrt{9 + h} - \sqrt{9}}{h} \cdot \frac{\sqrt{9 + h} + \sqrt{9}}{\sqrt{9 + h} + \sqrt{9}} = \frac{h}{h(\sqrt{9 + h} + \sqrt{9})} = \frac{1}{\sqrt{9 + h} + \sqrt{9}}.$$
When we take the limit $h \to 0$, we get $\frac{1}{\sqrt{9 + 0} + \sqrt{9}} = \frac{1}{6}$.
Exercise 5. Write down and compute the derivative of $y = \frac{1}{x}$ at $x = 3$.
Derivative as a function. If we can compute the derivative f ′ (a) for all points
x = a where the derivative exists, then we can think of the derivative as a new
function y = f ′ (x). In the examples and exercises above, instead of choosing some
specific value of a to compute the derivative f ′ (a), such as a = 2, 1, 9 or 3, we
could have chosen an arbitrary x and the same calculations would have given us
f ′ (x). Let us see how this works on a couple of examples.
Example 6. Show that $(e^x)' = e^x$.
Solution: Setting up the derivative using the definition is the same as above:
$$(e^x)' = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = \lim_{h \to 0} \frac{e^x e^h - e^x}{h} = \lim_{h \to 0} \frac{e^x (e^h - 1)}{h} = e^x \cdot \lim_{h \to 0} \frac{e^h - 1}{h}.$$
Let us plug in smaller and smaller values of $h$ to see what number $\frac{e^h - 1}{h}$ approaches. For example, $\frac{e^{0.001} - 1}{0.001} = 1.0005\ldots$, $\frac{e^{0.0001} - 1}{0.0001} = 1.00005\ldots$, $\frac{e^{0.00001} - 1}{0.00001} = 1.000005\ldots$. We can see that this gets closer and closer to 1, so
$$\lim_{h \to 0} \frac{e^h - 1}{h} = 1$$
and this shows what we wanted, $(e^x)' = e^x$. From now on we no longer need to calculate the derivative of $e^x$ at a specific point $a$, since we have a formula that works for all $x$.
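The calculator experiment from this solution is easy to repeat on a computer. Below is a short Python sketch; the helper name `exp_quotient` is ours, and the values of $h$ are the ones used in the text.

```python
import math


def exp_quotient(h):
    """The ratio (e^h - 1) / h whose limit as h -> 0 we want to guess."""
    return (math.exp(h) - 1) / h


# The same increments as in the text: 0.001, 0.0001, 0.00001.
for h in [0.001, 0.0001, 0.00001]:
    print(h, exp_quotient(h))
```

The printed ratios should start with 1.0005, 1.00005 and 1.000005, in agreement with the values above. For extremely small $h$ rounding errors eventually spoil this naive computation, so the experiment suggests the limit but does not replace the argument in the comment that follows.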
Comment. The truth is that, although we could see using a calculator that the above limit $\lim_{h \to 0} \frac{e^h - 1}{h}$ was equal to 1, Euler's number $e = 2.718281828\ldots$ is actually chosen in such a way that this limit is 1. If you recall, this limit was mentioned at the end of Section 1.9 as one of the famous limits. In the footnote links³ you can learn more about where Euler's number $e$ comes from and how its definition implies the above limit. In Chapter 1, we also mentioned that $e$ is a very special base of an exponential function, and what makes it special is exactly that the derivative of the function $y = e^x$ is the function $e^x$ itself.
3 https://youtu.be/sbLWLvSfvwk, https://youtu.be/IX1cZHz-bc0.
Exercise 6. In the above examples and exercises replace a specific $x = 1$, $9$ and $3$ by a general $x$ and show that
$$(x^2)' = 2x, \quad (x^3)' = 3x^2, \quad (\sqrt{x})' = \frac{1}{2\sqrt{x}} \quad \text{and} \quad \Big(\frac{1}{x}\Big)' = -\frac{1}{x^2}.$$
All the functions in Exercise 6 are power functions of the form $y = x^n$ for $n = 2, 3, \frac{1}{2}$ and $-1$, and all the derivatives are given by the following power rule:
$$(x^n)' = nx^{n-1}.$$
Example 7. Using the power rule, compute the derivative of $\sqrt{x}$. What is the domain of this derivative function?
Solution: Using the power rule with $n = \frac{1}{2}$,
$$(\sqrt{x})' = (x^{\frac{1}{2}})' = \frac{1}{2} x^{\frac{1}{2} - 1} = \frac{1}{2} x^{-\frac{1}{2}} = \frac{1}{2\sqrt{x}}.$$
Of course, the power rule applies only where the function and the derivative are well defined, so the domain of this derivative is $x > 0$.
Exercise 7. Using the power rule, compute the derivative of $\frac{1}{x^2}$. What is the domain of this derivative function?
Example 8. Given any power function $y = cx^n$, show that near any point $x$ in its domain,
$$\frac{\Delta y}{y} \approx n \frac{\Delta x}{x}.$$
If $n = 2$ and $x$ changes by 1%, by what percentage approximately does $y$ change?
Solution: If we move $\Delta x$ and $y$ to the opposite sides of the equation, what we want to show is that
$$\frac{\Delta y}{\Delta x} \approx n \frac{y}{x} = n \frac{cx^n}{x} = cnx^{n-1}.$$
But $cnx^{n-1}$ is the derivative of $y = cx^n$, which by definition of the derivative can be approximated by $\frac{\Delta y}{\Delta x}$. It means that the above equation is just a rephrasing of the usual meaning of the derivative in the case of power functions. If $n = 2$ and $x$ changes by 1% then $\Delta x = 0.01x$, so $\frac{\Delta x}{x} = 0.01$ and $\frac{\Delta y}{y} \approx 2 \frac{\Delta x}{x} = 0.02$. So the above equation shows that, for a power function $y = cx^2$, if the input $x$ changes by 1% then the output changes by approximately 2%. Of course, we can change 2 to any other power $n$.
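The approximation $\frac{\Delta y}{y} \approx n \frac{\Delta x}{x}$ can be tested numerically. In the Python sketch below, the constant $c = 5$ and the point $x = 3$ are arbitrary choices made for illustration, and the helper names are our own.

```python
def power_function(c, n):
    """Return the function y = c * x**n."""
    return lambda x: c * x ** n


def relative_change(f, x, relative_dx):
    """Relative change of the output when the input changes by the fraction relative_dx."""
    dx = relative_dx * x
    return (f(x + dx) - f(x)) / f(x)


# n = 2: a 1% change in x gives a change in y of roughly 2%.
y = power_function(5.0, 2)
print(relative_change(y, 3.0, 0.01))

# n = 1/2: a 1% change in x gives a change in y of roughly 0.5%.
root = power_function(5.0, 0.5)
print(relative_change(root, 3.0, 0.01))
```

For $n = 2$ the exact relative change is $1.01^2 - 1 = 0.0201$, slightly more than 2%, which is why the statement is only approximate. The constant $c$ cancels, so any other choice gives the same percentages.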
Exercise 8. If the radius of the sphere changes by 1%, by approximately what

percentage does the volume change?
A list of important derivatives. It is very important to remember the algebraic and geometric definitions of the derivative above, just like it is important to know the practical meaning of the derivative. However, once we learn the derivatives of basic functions and after we learn a few rules of differentiation,⁴ we will be able to compute derivatives of pretty much any function in a mechanical way. This is a good time to memorize the following derivatives:
$$(x^n)' = nx^{n-1}, \quad (e^x)' = e^x, \quad (\ln x)' = \frac{1}{x},$$
$$(\sin x)' = \cos x, \quad (\cos x)' = -\sin x.$$
We have already explained above the formula $(e^x)' = e^x$, and have checked several special cases of the power rule $(x^n)' = nx^{n-1}$. The derivatives of sine and cosine follow from some trigonometric identities and the famous limits mentioned at the end of Section 1.9; if you are interested you can learn more about it in the footnote link.⁵ The derivative of $\ln x$ and the general case of the power rule (for arbitrary power $n$) will be explained later when we discuss derivatives of inverse functions.
The famous limits at the end of Section 1.9 were used to compute the derivatives of $e^x$, $\sin x$ and $\cos x$, but once we know these derivatives we can reinterpret those limits as derivatives of these functions at zero.
Example 9. Compute the limit $\lim_{h \to 0} \frac{e^h - 1}{h}$ using that $(e^x)' = e^x$.
Solution: Since $\frac{e^h - 1}{h} = \frac{e^{0+h} - e^0}{h}$, the limit $\lim_{h \to 0} \frac{e^{0+h} - e^0}{h}$ is the definition of the derivative of $e^x$ at $x = 0$. Since $(e^x)' = e^x$, the derivative at $x = 0$ is equal to $e^0 = 1$, so the limit is 1.
Exercise 9. Compute the limit $\lim_{h \to 0} \frac{\sin h}{h}$ using that $(\sin x)' = \cos x$, and the limit $\lim_{h \to 0} \frac{\ln(1 + h)}{h}$ using that $(\ln x)' = \frac{1}{x}$.
First rules of differentiation. If the function $f(x)$ changes by 5 between $x = 0$ and $x = 1$ and another function $g(x)$ changes by 7 between $x = 0$ and $x = 1$ then the sum $f(x) + g(x)$ will change by $5 + 7 = 12$. This means that if we add two functions $f(x) + g(x)$ then the increment $\Delta y$ of the sum will be equal to the sum of their increments. Similarly, if we subtract two functions $f(x) - g(x)$ then the increment $\Delta y$ of the difference will be equal to the difference of their increments. Since we compute derivatives by looking at $\frac{\Delta y}{\Delta x}$ and then taking limits, this means that
$$\big(f(x) + g(x)\big)' = f'(x) + g'(x), \quad \big(f(x) - g(x)\big)' = f'(x) - g'(x).$$
4 To differentiate a function means to take its derivative, and differentiation means taking a derivative.
5 https://youtu.be/buqwRTJcEmw.
If the function $f(x)$ changes by 5 between $x = 0$ and $x = 1$ then the function $3f(x)$ will change by $3 \cdot 5 = 15$. This means that if we multiply our function by a constant $c$ then the increment $\Delta y$ will be multiplied by $c$, which implies that
$$\big(cf(x)\big)' = cf'(x).$$
The above two rules together are called the linearity of differentiation. They can
also be called the sum rule, difference rule and constant multiple rule.
Example 10. Compute the derivative of $f(x) = \frac{x(2x + 7\sqrt{x}) - 1}{x^{5/2}}$.
Solution: First, using that $x^a x^b = x^{a+b}$ and $\frac{x^a}{x^b} = x^{a-b}$, we can simplify,
$$\frac{x(2x + 7\sqrt{x}) - 1}{x^{5/2}} = \frac{2x^2 + 7x^{3/2} - 1}{x^{5/2}} = 2x^{2 - 5/2} + 7x^{3/2 - 5/2} - x^{-5/2} = 2x^{-1/2} + 7x^{-1} - x^{-5/2}.$$
Then, using the above two rules and the power rule,
$$(2x^{-1/2} + 7x^{-1} - x^{-5/2})' = 2(x^{-1/2})' + 7(x^{-1})' - (x^{-5/2})'$$
$$= 2(-1/2)x^{-1/2 - 1} + 7(-1)x^{-1 - 1} - (-5/2)x^{-5/2 - 1}$$
$$= -x^{-3/2} - 7x^{-2} + (5/2)x^{-7/2} = -\frac{1}{x^{3/2}} - \frac{7}{x^2} + \frac{5}{2x^{7/2}}.$$
The last step is not strictly necessary and the answer could be left in the form $-x^{-3/2} - 7x^{-2} + (5/2)x^{-7/2}$. Both the original function and the derivative have domain $x > 0$.
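A hand computation like this one can be sanity-checked numerically. The Python sketch below compares the formula from the solution with a symmetric difference quotient $\frac{f(x+h) - f(x-h)}{2h}$. Note that the symmetric quotient is our own choice of a numerical check, not the definition used in this section, and the sample points are arbitrary.

```python
import math


def f(x):
    """The original function from Example 10."""
    return (x * (2 * x + 7 * math.sqrt(x)) - 1) / x ** 2.5


def f_prime(x):
    """The derivative obtained in the solution with the power rule."""
    return -x ** -1.5 - 7 * x ** -2 + 2.5 * x ** -3.5


def symmetric_quotient(g, x, h=1e-6):
    """Numerical estimate of g'(x); an assumption of this sketch, not the text's definition."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in [0.5, 1.0, 4.0]:
    print(x, f_prime(x), symmetric_quotient(f, x))
```

At each sample point the two printed values should agree to several decimal places. Agreement at a few points does not prove the formula, but a disagreement would reveal an algebra mistake.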
Exercise 10. Compute the derivative of $f(x) = \frac{\sqrt{x}(1 - x) + \frac{2}{x}}{x^{5/2}}$. What is the tangent line to this function at $x = 1$?
Example 11. Compute the derivative of $f(x) = \cos x + 2\ln\frac{1}{\sqrt{x}} - 3e^x$.
Solution: First, let us simplify
$$2\ln\frac{1}{\sqrt{x}} = 2\ln(x^{-1/2}) = 2(-1/2)\ln(x) = -\ln(x)$$
and then use the rules of differentiation,
$$\big(\cos x - \ln(x) - 3e^x\big)' = (\cos x)' - (\ln(x))' - 3(e^x)' = -\sin x - \frac{1}{x} - 3e^x.$$
Exercise 11. Compute the derivative of $f(x) = -2\sin x - \ln(x^2) + 5e^x$.

Common notation for derivatives. Given a function $y = f(x)$, its derivative can be written in a number of ways, for example,
$$f'(x), \quad \frac{df}{dx}, \quad \frac{d}{dx} f(x), \quad y'(x), \quad \frac{dy}{dx}.$$
The last two, $y'(x)$ and $\frac{dy}{dx}$, can be used for any function, but it must be clear from the context which specific function $f(x)$ we are talking about. If we want to write a derivative at some specific point $x = a$ then we can use the following notation:
$$f'(a), \quad \frac{df}{dx}\Big|_{x=a}, \quad \frac{d}{dx} f(x)\Big|_{x=a}, \quad y'(a), \quad \frac{dy}{dx}\Big|_{x=a}.$$
For example, $\frac{d}{dx}(1 + x^2 + \cos x)\big|_{x=1}$ means that we first want to compute the derivative of the function $y = 1 + x^2 + \cos x$ and then plug in $x = 1$. We could also write $y'(1)$ since we know what the function is. However, we cannot write $(1 + 1^2 + \cos 1)'$, because it looks like we are taking the derivative of the constant $1 + 1^2 + \cos 1 = 2.5403\ldots$, which is zero.
Higher order derivatives. Since the derivative $f'(x)$ of a function $y = f(x)$ is a function itself, we can also compute its derivative $(f'(x))'$, which is called the second derivative of $f(x)$ and is denoted $f''(x)$. If we take another derivative, we will get the third derivative $f'''(x)$. We can continue taking higher order derivatives as long as they are well defined. The derivative of order $n$ can be written in a number of ways, for example,
$$f^{(n)}(x), \quad \frac{d^n f}{dx^n}, \quad \frac{d^n}{dx^n} f(x), \quad y^{(n)}(x), \quad \frac{d^n y}{dx^n}.$$
As before, the last two can be used for any function, but it must be clear from the context which specific function $f(x)$ we are talking about. If we want to write a derivative at some specific point $x = a$ then we can use the following notation:
$$f^{(n)}(a), \quad \frac{d^n f}{dx^n}\Big|_{x=a}, \quad \frac{d^n}{dx^n} f(x)\Big|_{x=a}, \quad y^{(n)}(a), \quad \frac{d^n y}{dx^n}\Big|_{x=a}.$$
For the first three derivatives, when $n = 1$, $2$ or $3$, instead of writing $f^{(n)}$ we write $f'$, $f''$, $f'''$ and, instead of writing $y^{(n)}$, we write $y'$, $y''$, $y'''$. Of course, the linearity rules of differentiation apply to higher order derivatives because they apply at each step,
$$\frac{d^n}{dx^n}\big(f(x) \pm g(x)\big) = f^{(n)}(x) \pm g^{(n)}(x), \quad \frac{d^n}{dx^n}\big(cf(x)\big) = cf^{(n)}(x).$$
Example 12. Compute the eighth derivative of $f(x) = x^7 - 4x^6 + x^5 - 2x^2 + 1$.
Solution: First of all, by linearity of differentiation,
$$\frac{d^8}{dx^8}(x^7 - 4x^6 + x^5 - 2x^2 + 1) = \frac{d^8}{dx^8} x^7 - 4\frac{d^8}{dx^8} x^6 + \frac{d^8}{dx^8} x^5 - 2\frac{d^8}{dx^8} x^2 + \frac{d^8}{dx^8} 1.$$
The derivative of the constant 1 is 0, so all higher derivatives of a constant will be zero. Every time we take a derivative of a power function, by the power rule, the power will decrease by 1. For example, the derivative of $x^2$ will become $2x$, then 2, then 0, so the third and higher derivatives of $x^2$ will be zero. For the same reason, if we take the derivative of $x^7$ eight times it will also become zero, and the same is true for $x^6$ and $x^5$. So the answer is zero.
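The argument in this solution can be mimicked with a few lines of Python. In the sketch below a polynomial is stored as a list of coefficients, with `coeffs[k]` being the coefficient of $x^k$; this representation and the helper names are our own illustrative choices.

```python
def differentiate(coeffs):
    """Apply the power rule term by term: c * x**k becomes k * c * x**(k - 1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def nth_derivative(coeffs, n):
    """Differentiate n times in a row."""
    for _ in range(n):
        coeffs = differentiate(coeffs)
    return coeffs


# f(x) = x^7 - 4x^6 + x^5 - 2x^2 + 1, listed from the constant term upwards.
f = [1, 0, -2, 0, 0, 1, -4, 1]

print(nth_derivative(f, 7))  # [5040], the constant 7 * 6 * 5 * 4 * 3 * 2 * 1
print(nth_derivative(f, 8))  # [], no terms are left, so the eighth derivative is zero
```

Each differentiation shortens the list by one entry, which is the coefficient-list version of the statement that the power decreases by 1 at every step.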
Exercise 12. Compute the fourth derivative of cos x. Can you think of any other
function that has the same fourth derivative as cos x?
Answer to Exercise 1. $\frac{e^{a+1} - e^a}{(a + 1) - a} = \frac{e^a e^1 - e^a}{1} = \frac{e^a (e - 1)}{1} = (e - 1)e^a$.
Answer to Exercise 2. The left figure: 0 < f ′ (4) < f (3) − f (2) < f ′ (1), because
the slope is positive and decreasing as we move left to right, and because f (3) −
f (2) is the average rate of change on the interval [2, 3]. The right figure: f ′ (1) <
f (3) − f (2) < f ′ (4) < 0, because the slope is negative and increasing as we move
left to right, and because f (3) − f (2) is the average rate of change on the interval
[2, 3].
Answer to Exercise 3. $\lim_{h \to 0} \frac{f(\cos(0 + h)) - f(\cos(0))}{h} = \lim_{h \to 0} \frac{f(\cos(h))}{h}$ because the second term is $f(\cos(0)) = f(1) = 0$.
Answer to Exercise 4. Before taking the limit, let us first simplify the slope of the secant line,
$$\frac{(1 + h)^3 - 1^3}{h} = \frac{(1 + 3h + 3h^2 + h^3) - 1}{h} = \frac{3h + 3h^2 + h^3}{h} = 3 + 3h + h^2.$$
When $h$ gets small, this slope approaches 3 because $3 + 3h + h^2 \to 3 + 0 + 0 = 3$, so the derivative of $y = x^3$ at $x = 1$ is 3.
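Answer to Exercise 5. The answer is $-\frac{1}{9}$ because
$$\frac{\frac{1}{3+h} - \frac{1}{3}}{h} = \frac{\frac{3 - (3 + h)}{(3 + h)3}}{h} = \frac{-\frac{h}{(3 + h)3}}{h} = -\frac{1}{(3 + h)3} \to -\frac{1}{(3 + 0)3} = -\frac{1}{9}.$$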
Answer to Exercise 6. We will not repeat all the calculations and only show the case of $\sqrt{x}$:
$$\frac{\sqrt{x + h} - \sqrt{x}}{h} = \frac{\sqrt{x + h} - \sqrt{x}}{h} \cdot \frac{\sqrt{x + h} + \sqrt{x}}{\sqrt{x + h} + \sqrt{x}} = \frac{(x + h) - x}{h(\sqrt{x + h} + \sqrt{x})} = \frac{1}{\sqrt{x + h} + \sqrt{x}}.$$
When we take the limit $h \to 0$, we get $\frac{1}{\sqrt{x + 0} + \sqrt{x}} = \frac{1}{2\sqrt{x}}$. Of course, this only works when $x > 0$, so the derivative $(\sqrt{x})'$ exists only when $x > 0$.
Answer to Exercise 7. Using the power rule with $n = -2$, $\big(\frac{1}{x^2}\big)' = (x^{-2})' = -2x^{-2-1} = -2x^{-3} = -\frac{2}{x^3}$. The function and derivative are defined when $x \neq 0$.
Answer to Exercise 8. Volume is proportional to the radius cubed, $V = \frac{4\pi}{3} r^3$, so this is just like Example 8 with $n = 3$, and the volume will change by approximately 3%.
Answer to Exercise 9. Since $\frac{\sin h}{h} = \frac{\sin(0 + h) - \sin(0)}{h}$, the limit $\lim_{h \to 0} \frac{\sin(0 + h) - \sin(0)}{h}$ is the definition of the derivative of $\sin x$ at $x = 0$. Since $(\sin x)' = \cos x$, the derivative at $x = 0$ is equal to $\cos(0) = 1$, so the first limit is 1. Since $\frac{\ln(1 + h)}{h} = \frac{\ln(1 + h) - \ln(1)}{h}$, the limit $\lim_{h \to 0} \frac{\ln(1 + h) - \ln(1)}{h}$ is the definition of the derivative of $\ln x$ at $x = 1$. Since $(\ln x)' = \frac{1}{x}$, the derivative at $x = 1$ is equal to 1, so the second limit is also 1.
Answer to Exercise 10. First we simplify the function as $x^{-2} - x^{-1} + 2x^{-7/2}$ and then take the derivative,
$$(x^{-2} - x^{-1} + 2x^{-7/2})' = (x^{-2})' - (x^{-1})' + 2(x^{-7/2})'$$
$$= (-2)x^{-2-1} - (-1)x^{-1-1} + 2(-7/2)x^{-7/2-1}$$
$$= -2x^{-3} + x^{-2} - 7x^{-9/2} = -\frac{2}{x^3} + \frac{1}{x^2} - \frac{7}{x^{9/2}}.$$
Since $f(1) = 2$ and $f'(1) = -8$, the tangent line is $y = 2 - 8(x - 1) = 10 - 8x$.
Answer to Exercise 11. $-2\cos x - \frac{2}{x} + 5e^x$.
Answer to Exercise 12. Consecutive derivatives of $\cos x$ will be $-\sin x$, $-\cos x$, $\sin x$ and $\cos x$. So the fourth derivative of $\cos x$ is $\cos x$ itself. Any function of the form $\cos x + ax^3 + bx^2 + cx + d$ will also have the fourth derivative equal to $\cos x$, because all the power functions will disappear after taking four derivatives, just like in the previous example.
2.3 Derivatives and graphs

Because the derivative f ′ (x) represents the rate of change of the function y = f (x)
near a point x and, at the same time, it is the slope of the tangent line at this point,
we can relate the behaviour of the rate of change to the behaviour of the graph. You
can watch the footnote link for a quick summary.6
• If f ′ (x) > 0 then the slope is positive and the function is increasing.
• If f ′ (x) < 0 then the slope is negative and the function is decreasing.
• If f ′ (x) = 0 then the tangent line is horizontal. Such points are examples of the so-called critical points, but we will discuss them in later sections when we study optimization problems.
The sign of the second derivative f ′′ (x) tells us whether the first derivative f ′ (x) is
increasing or decreasing.
• If f ′′ (x) > 0 then the slope is increasing and the function is concave up.
• If f ′′ (x) < 0 then the slope is decreasing and the function is concave down.
Basic building blocks are the following four examples.
To help us talk about these four cases, let us imagine that the function y =
f (t) describes a position (or coordinate) y of a car moving along a straight line
as a function of time t. The straight line has a positive and negative direction, so
the car can move forward or backward. The derivative f ′ (t) represents velocity at
time t, which can be positive (if the car is moving forward) or negative (if the car
is moving backward). Absolute value of the velocity | f ′ (t)| is called speed. The
second derivative f ′′ (t) represents acceleration.
(a) The graph is increasing and concave up. If f ′ (x) > 0 then velocity is positive and the car is moving in the positive direction. If f ′′ (x) > 0 then velocity is increasing and, in this case, the car is moving faster and faster. The fact that the slope is increasing means that the graph is concave up.
(b) The graph is increasing and concave down. If f ′ (x) > 0 then velocity is positive and the car is moving in the positive direction. If f ′′ (x) < 0 then velocity is decreasing and, in this case, the car is moving slower and slower. The fact that the slope is decreasing means that the graph is concave down.
6 https://youtu.be/tCs5DK951Js.
(c) The graph is decreasing and concave up. If f ′ (x) < 0 then velocity is negative and the car is moving in the negative direction. If f ′′ (x) > 0 then velocity is increasing and, again, increasing slope means that the graph is concave up. However, in this case, the speed is decreasing so the car is moving slower and slower. That is because, if the velocity increases from −3 to −1 then the speed decreases from 3 to 1.
(d) The graph is decreasing and concave down. If f ′ (x) < 0 then velocity is negative and the car is moving in the negative direction. If f ′′ (x) < 0 then velocity is decreasing and, again, decreasing slope means that the graph is concave down. However, in this case, the speed is increasing so the car is moving faster and faster. That is because, if the velocity decreases from −1 to −3 then the speed increases from 1 to 3.⁷
Example 1. A cup of hot coffee left at room temperature will cool down, but the
rate of cooling will slow down. Translate this into a statement about derivatives
and graph of some function.
Solution: The function here is the temperature T (t) of a cup of coffee as a function
of time. Cooling down means that T ′ (t) < 0, and the rate of cooling slowing down
means that T ′′ (t) > 0. The graph of T (t) will be decreasing and concave up.
Exercise 1. Between January and May, 2021, the decline in Lake Mead water levels has accelerated. Translate this into a statement about derivatives and the graph of some function.
The next example and exercise will refer to the following two figures.
Example 2. In the figure on the left, given the graph of y = f (x) (solid black line),
determine which curve is the graph of its derivative f ′ (x), (a), (b), or (c).
Solution: The function is decreasing up to about x = −4.5 and right after that it starts increasing. This means that the derivative should be negative up to −4.5, so the graph should be below the x-axis, and right after −4.5 it should become positive, so the graph should be above the x-axis. The only graph with such behaviour is (b). Similar behaviour happens at x = 0 and about x = 4.5, where the function f (x) changes direction and (b) changes sign by crossing the x-axis. Such points are called critical points.
7 https://youtu.be/6NaUJ6OGcLU.
Notice also that when (b) is decreasing (between about −2.5 and 2.5), the func-
tion f (x) is concave down (the slope is decreasing), and when (b) is increasing,
the function f (x) is concave up (the slope is increasing). These points where the
derivative changes direction (here about −2.5 and 2.5) and the original function
changes concavity are called inflection points.
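The same reading of a graph can be imitated numerically: critical points show up where sampled values of f ′ change sign, and inflection points where sampled values of f ′′ change sign. Since the function in the figure is not given by a formula, the Python sketch below uses a hypothetical example, f (x) = x^3 − 3x, with f ′ (x) = 3x^2 − 3 and f ′′ (x) = 6x; the helper name and the sampling grid are also our own choices.

```python
def sign_changes(g, xs):
    """Midpoints of consecutive sample points between which g changes sign."""
    found = []
    for left, right in zip(xs, xs[1:]):
        if g(left) * g(right) < 0:
            found.append((left + right) / 2)
    return found


# Hypothetical example: f(x) = x^3 - 3x.
def f_prime(x):
    return 3 * x * x - 3


def f_second(x):
    return 6 * x


# Sample points from -2.9 to 2.85 in steps of 0.25, chosen so that none is a root.
xs = [-2.9 + 0.25 * k for k in range(24)]

critical_points = sign_changes(f_prime, xs)     # near x = -1 and x = 1
inflection_points = sign_changes(f_second, xs)  # near x = 0
print(critical_points)
print(inflection_points)
```

The locations are only as precise as the spacing of the sample points, much like the "about −4.5" read off the figure by eye.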
Exercise 2. In the figure on the right, given the graph of y = f (x) (solid black line),
determine which curve is the graph of its derivative f ′ (x), (a), (b), or (c). Where
are the critical points, and inflection points of f (x)?
The next two examples will refer to the following two figures. Notice that the
solid black line is the graph of the derivative y = f ′ (x), not the original function.
Example 3. In the figure on the left, given the graph of y = f ′ (x) (solid black line),
determine which curve is the graph of f (x), (a), (b), or (c).
Solution: The derivative is positive between about −4 and 2 and negative outside
of that interval, so the function f (x) should be increasing between −4 and 2 and
decreasing outside. The answer is (b). The derivative changes direction at about
x = −1.2, so the function (b) has an inflection point there, where it switches from
concave up to concave down.
Exercise 3. In the figure on the right, given the graph of y = f ′ (x) (solid black
line), determine which curve is the graph of f (x), (a), (b), or (c).
Example 4. Search “pole vault” on YouTube and watch some videos. Which of the two graphs below more accurately describes the horizontal position x = x(t) of the vaulter as a function of time t?
Solution: During the approach be-

fore the jump, the athlete is ac-
celerating, so the graph is increas-
ing and concave up. Starting from
plant and take-off stage, the vaulter
starts moving in the vertical direc-
tion but the horizontal movement
slows down, and the distance be-
tween take off and landing is relatively small. So the graph on the left is a more
accurate description of the horizontal position x(t) of the vaulter over time.
Exercise 4. Sketch a graph of the posi-
tion of a tennis ball along the tennis court
ℓ = f (t) as a function of time t when the
players hit the ball back and forth from
the baseline.8
Example 5. A skydiver free falling in a belly-to-earth (face down) position will

approach the so called terminal speed of about 56 m/s. The speed is increasing
due to gravity, but it is increasing slower and slower due to the drag force (air
resistance). Sketch the graph of velocity as a function of time when the distance to
the skydiver is measured (a) from the height of the jump or (b) from the ground.
Solution: (a) When the height f (t) of
the skydiver is measured from the orig-
inal point of the jump, it is increas-
ing and the derivative (velocity) f ′ (t) is
positive. Also velocity will be increas-
ing, concave down (because it is in-
creasing slower and slower due to the
drag force), and it will have a hori-
zontal asymptote equal to the terminal
speed. (b) When the height f (t) of the
skydiver is measured from the ground,
it is decreasing and the derivative f ′ (t)
is negative. In this case the velocity is equal to minus the speed, so velocity will be decreasing, concave up (because it is decreasing slower and slower due to the drag force), and it will have a horizontal asymptote equal to minus the terminal speed. The two cases differ only by a minus sign, so the graph is simply flipped across the horizontal axis.
8 Photo from https://www.pexels.com/photo/two-person-playing-tennis-1619860/
Exercise 5. Sketch a graph of a function that satisfies the following properties:
• $\lim_{x \to -\infty} f(x) = 2$
• $f(-3) = 2$
• $\lim_{x \to -3} f(x)$ does not exist
• $f'(0) < \frac{f(1) - f(-1)}{1 - (-1)}$
• $f(x)$ has a vertical asymptote at $x = 2$
• $f(2) = 2$
• $f(x)$ is continuous at $x = 3$
• $f(x)$ is not differentiable at $x = 3$
• $f'(x) > 0$ for $x > 3$
• $f''(x) < 0$ for $x > 3$.
Example 6. Suppose that two functions y = f (x)

and y = g(x) are equal at x = a, i.e. f (a) = g(a),
and f ′ (a) < g′ (a). Which function is bigger imme-
diately to the right of x = a?
Solution: We can see from the figure that the graphs
of y = f (x) and y = g(x) intersect at x = a and the
slope of g(x) is bigger than the slope of f (x) at the
point of intersection. As a result, g(x) is bigger im-
mediately to the right of x = a.
Exercise 6. Suppose that two functions y = f (x) and y = g(x) are equal at x = a,
i.e. f (a) = g(a), and f ′ (a) < g′ (a). Which function is bigger immediately to the
left of x = a?
Example 7. If f ′′ (x) > 0 on some interval then f ′ (x) is ______ and f (x) is ______ on that interval.
Solution: If f ′′ (x) > 0 on some interval then f ′ (x) is increasing and f (x) is concave up on that interval.
Exercise 7. If f ′′ (x) < 0 on some interval then f ′ (x) is ______ and f (x) is ______ on that interval.
The next example and exercise will refer to the following two figures.
Example 8. In the figure above on the left, determine which graph corresponds to
f (x), f ′ (x), and f ′′ (x). Where are the inflection points of f (x), and what happens
to f ′ (x) and f ′′ (x) at those points?
Solution: We can see that at the points x where the dotted blue curve (a) crosses the x-axis, the solid green curve (b) changes direction from increasing to decreasing or vice versa. This means that (a) is the derivative of (b). Similarly, where the solid green curve (b) crosses the x-axis, the dashed red curve (c) changes direction, so (b) is the derivative of (c). This means that (c) is the graph of y = f (x), (b) is the graph of y = f ′ (x), and (a) is the graph of y = f ′′ (x). Inflection points are where f ′ (x) changes direction, which is at around x = 0, 0.75, 1.85, 3.2. At inflection points f ′ (x) changes direction, and f ′′ (x) crosses the x-axis.
Exercise 8. In the figure above on the right, determine which graph corresponds to
f (x), f ′ (x), and f ′′ (x). Where are the inflection points of f (x), and what happens
to f ′ (x) and f ′′ (x) at those points? What happens at x = 0?
Example 9. Give an example of a function

f (x) such that f ′′ (0) = 0 but x = 0 is not
an inflection point of f (x).
Solution: One example is $f(x) = x^4$ in the figure on the right. In this case, $f'(x) = 4x^3$ and $f''(x) = 12x^2$, so $f''(0) = 0$, but $f'(x)$ does not change direction at $x = 0$ and $f(x)$ does not change concavity at $x = 0$.
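A quick numerical look at the second derivative makes the point concrete. The Python sketch below evaluates f ′′ (x) = 12x^2 at a few sample points around zero; the sample points are an arbitrary choice made for illustration.

```python
def f_second(x):
    """Second derivative of f(x) = x**4."""
    return 12 * x ** 2


samples = [-0.1, -0.01, 0.0, 0.01, 0.1]
values = [f_second(x) for x in samples]
print(values)

# The second derivative touches zero at x = 0 but is never negative nearby,
# so the concavity does not change there.
never_negative = all(v >= 0 for v in values)
print(never_negative)
```

The values are positive on both sides of zero, so the graph is concave up on both sides and x = 0 is not an inflection point.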
Exercise 9. Given y = f ′ (x), sketch a continuous function y = f (x) in the two

figures below. Notice that in the second case, the derivative f ′ (x) is not defined at
x = 0, where it has a jump.
Answer to Exercise 1. The function here is the water level h(t) of Lake Mead as a function of time. Water level declining means that h′ (t) < 0, and the decline accelerating means that h′′ (t) < 0. The graph of h(t) will be decreasing and concave down. Of course, water levels might fluctuate slightly, so we should talk about averages over a certain period of time.
Answer to Exercise 2. The answer is (b). Critical points are about −4.8, −0.4, 4.2
where the function f (x) changes direction and f ′ (x) crosses the x-axis, and inflec-
tion points are about −3.5 and 1.5, where the derivative f ′ (x) changes direction.
Answer to Exercise 3. The answer is (b).
Answer to Exercise 4. https://youtu.be/aewfFlVg-MU
Answer to Exercise 5. You can check one by one that the graph below satisfies all the above properties. Notice that $\frac{f(1) - f(-1)}{1 - (-1)}$ is the slope of the line connecting the points $(-1, f(-1))$ and $(1, f(1))$, which is bigger than $f'(0) = 0$ in the figure. Also, although $f(x)$ has a vertical asymptote, the dot at $(2, 2)$ indicates that we chose $f(2)$ to be equal to 2; this is a legal move although the function will be discontinuous at $x = 2$. $f(x)$ is continuous but not differentiable at $x = 3$ because it has a corner there.
Answer to Exercise 6. f (x) is bigger immediately to the left of x = a.

Answer to Exercise 7. If f ′′ (x) < 0 on some interval then f ′ (x) is decreasing and
f (x) is concave down on that interval.
Answer to Exercise 8. (a) is the graph of y = f (x), (b) is the graph of y = f ′ (x), and (c) is the graph of y = f ′′ (x). Inflection points are where f ′ (x) changes direction, which is at around x = 1.1, 2.3, 3.6. This is where (b) changes direction, and (c) crosses the x-axis. Notice that x = 0 is not an inflection point despite the fact that f ′′ (0) = 0, because f ′′ only touches the x-axis but does not cross it and, as a result, f ′ (x) does not change direction at x = 0. So it is possible that f ′′ (x) = 0 but x is not an inflection point.
Answer to Exercise 9. Possible sketches are in the figures above. The functions
y = f (x) could be shifted vertically, because adding a constant y = f (x) + c does
not affect the derivative, ( f (x) + c)′ = f ′ (x). In the second figure, the function f (x)
has a corner at x = 0, so f ′ (0) is undefined. The derivative jumps from a positive
to a negative value, so the increasing function suddenly becomes decreasing (like a ball
bouncing off a wall and changing direction suddenly). The derivative is increasing on
both sides of x = 0, so the function is concave up on both sides. The function is
increasing exactly when f ′ (x) > 0, i.e. the graph of y = f ′ (x) is above the x-axis.
2.4 Differentiation rules

In this section we will learn how to use four rules of differentiation.
• Product rule: the derivative of the product f (x)g(x) of two functions is
\[\big(f(x)g(x)\big)' = f'(x)g(x) + f(x)g'(x).\]
• Quotient rule: the derivative of the ratio \(\frac{f(x)}{g(x)}\) of two functions is
\[\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.\]
• Chain rule: the derivative of the composition f (g(x)) of two functions is
\[\big(f(g(x))\big)' = f'(g(x))\,g'(x).\]
• Inverse function rule: the derivative of the inverse function \(f^{-1}(x)\) is
\[\big(f^{-1}(x)\big)' = \frac{1}{f'(f^{-1}(x))}.\]
A more convenient way to phrase the inverse function rule is that, if b = f (a) then
\[(f^{-1})'(b) = \frac{1}{f'(a)}.\]
Of course, in all these rules we assume that everything is well defined on the right
hand side of each equation. For example, we never divide by zero, etc.
Where the formulas come from. Below we will focus on learning how to use
these rules, but if you are interested to learn where they come from, all the rules can
be derived by simple manipulations from the algebraic definition of the derivative.
You can learn more about the derivation of the chain rule⁹ and the inverse function
rule¹⁰ in the footnote links. Here we will only show how to derive the product rule.
We want to see what happens to the ratio \(\frac{\Delta y}{\Delta x}\) when the increment ∆x gets smaller
and smaller and y = f (x)g(x). If ∆f and ∆g are the increments of f and g then
\[f(x+h) = f(x) + \Delta f, \qquad g(x+h) = g(x) + \Delta g,\]
and we can rewrite the increment ∆y = ∆( f g) of the product as

9 https://youtu.be/iJ94gm_-vsE
10 https://youtu.be/y9jzS-sUeM8
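\[\begin{aligned}
\Delta y = \Delta(fg) &= f(x+h)g(x+h) - f(x)g(x)\\
&= \big(f(x)+\Delta f\big)\big(g(x)+\Delta g\big) - f(x)g(x)\\
&= \Delta f\cdot g(x) + f(x)\cdot\Delta g + \Delta f\cdot\Delta g.
\end{aligned}\]
After dividing by the increment ∆x (which we also call h),
\[\frac{\Delta y}{\Delta x} = \frac{\Delta f}{\Delta x}\,g(x) + f(x)\,\frac{\Delta g}{\Delta x} + \frac{\Delta f}{\Delta x}\,\Delta g \;\to\; f'(x)g(x) + f(x)g'(x) + f'(x)\cdot 0\]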
as ∆x → 0, which is exactly the formula in the product rule. The quotient rule can
be shown by a similar calculation.
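The derivation above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the functions f(x) = x² and g(x) = sin(x), the point x = 1, and the helper name `difference_quotient` are illustrative assumptions. It prints the ratio ∆y/∆x for shrinking increments together with the cross term (∆f/∆x)·∆g, which should shrink towards 0.

```python
import math

def difference_quotient(func, x, h):
    """Ratio Δy/Δx for the increment Δx = h."""
    return (func(x + h) - func(x)) / h

# Illustrative choice of functions (an assumption, not from the text).
f, f_prime = (lambda x: x**2), (lambda x: 2 * x)
g, g_prime = math.sin, math.cos

def product(x):
    return f(x) * g(x)

x = 1.0
# Value predicted by the product rule f'(x)g(x) + f(x)g'(x).
product_rule_value = f_prime(x) * g(x) + f(x) * g_prime(x)

for h in (0.1, 0.01, 0.001):
    delta_f = f(x + h) - f(x)
    delta_g = g(x + h) - g(x)
    cross_term = delta_f * delta_g / h  # the term (Δf/Δx)·Δg
    print(h, difference_quotient(product, x, h), cross_term, product_rule_value)
```

As h gets smaller, the second printed column approaches the last one and the cross term approaches 0, in line with the limit computed above.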
Chain rule. We will start with the chain rule, because it is the most basic build-
ing block, and because it will give us a much richer collection of functions to
play with when we use the product rule and quotient rule. The chain rule is some-
times called the outside-inside rule, because when we compute the derivative of
the composition f (g(x)) we first take derivative of the outside function f ′ , plug in
the inside function g(x), f ′ (g(x)), and then multiply by the derivative of the inside
function g′ (x).
Example 1. State what the outside function f (x) and inside function g(x) are, and
compute the derivative using the chain rule.
(a) \(e^{2x+5}\)   (d) \(\ln\sqrt{x}\)
(b) \(\cos(x^2)\)   (e) \(\sqrt{e^{x^2}+\cos(x)}\)
(c) \(\cos^2(x)\)   (f) \(\frac{e^{\cos(x)}}{e^{2x}}\)
Solution: (a) In \(e^{2x+5}\), the outside function is \(f(x) = e^x\) and the inside function is
\(g(x) = 2x+5\), so \(e^{2x+5} = f(g(x))\). Because \(f'(x) = e^x\) and \(g'(x) = 2\), the chain rule
\(f'(g(x))g'(x)\) gives \(e^{2x+5}\cdot 2 = 2e^{2x+5}\).
(b) In \(\cos(x^2)\), the outside function is \(f(x) = \cos x\) and the inside function is
\(g(x) = x^2\), so \(\cos(x^2) = f(g(x))\). Because \(f'(x) = -\sin x\) and \(g'(x) = 2x\), the chain
rule \(f'(g(x))g'(x)\) gives \(-\sin(x^2)\cdot 2x = -2x\sin(x^2)\).
(c) In \(\cos^2(x) = (\cos x)^2\), the outside function is \(f(x) = x^2\) and the inside function
is \(g(x) = \cos x\), so \(\cos^2(x) = f(g(x))\). Because \(f'(x) = 2x\) and \(g'(x) = -\sin x\), the
chain rule \(f'(g(x))g'(x)\) gives \(2\cos x\cdot(-\sin x) = -2\cos x\sin x\).
(d) In \(\ln\sqrt{x}\), the outside function is \(f(x) = \ln x\) and the inside function is
\(g(x) = \sqrt{x} = x^{1/2}\), so \(\ln\sqrt{x} = f(g(x))\). Because \(f'(x) = \frac{1}{x}\) and \(g'(x) = \frac{1}{2\sqrt{x}}\), the chain
rule \(f'(g(x))g'(x)\) gives \(\frac{1}{\sqrt{x}}\cdot\frac{1}{2\sqrt{x}} = \frac{1}{2x}\). There is a much easier way to compute
this derivative if we first simplify the function \(\ln\sqrt{x} = \ln(x^{1/2}) = \frac{1}{2}\ln x\); then the
derivative is immediately \(\frac{1}{2x}\).
(e) In \(\sqrt{e^{x^2}+\cos(x)}\), the outside function is \(f(x) = \sqrt{x} = x^{1/2}\) and the inside
function is \(g(x) = e^{x^2}+\cos(x)\). The derivative of the outside function is \(f'(x) = \frac{1}{2\sqrt{x}}\). The
derivative of the inside function \((e^{x^2}+\cos(x))' = (e^{x^2})' - \sin x\) requires us to use the
chain rule one more time to compute \((e^{x^2})' = e^{x^2}\cdot(x^2)' = e^{x^2}\cdot 2x = 2xe^{x^2}\). Finally,
we get that
\[f'(g(x))g'(x) = \frac{2xe^{x^2} - \sin x}{2\sqrt{e^{x^2}+\cos(x)}}.\]
(f) This problem might look like we need to use the quotient rule, but, in fact,
we can simplify the function as \(e^{\cos(x)-2x}\) and apply the chain rule. The outside
function is \(f(x) = e^x\) and the inside function is \(g(x) = \cos(x) - 2x\). Because \(f'(x) = e^x\)
and \(g'(x) = -\sin(x) - 2 = -(\sin(x)+2)\), the chain rule \(f'(g(x))g'(x)\) gives
\(-e^{\cos(x)-2x}(\sin(x)+2)\).
The case when the inside function is linear, as in (a) above, is so common that it
is worth stating it explicitly as a special case of the chain rule:
\[\big(f(mx+b)\big)' = m f'(mx+b).\]
One particularly important case is an exponential function \(y = a^x\) with the general
base a > 0. Recall that we can always rewrite it as \(a^x = e^{\kappa x}\) with κ = ln(a), and we
already know that \((e^{\kappa x})' = \kappa e^{\kappa x} = \ln(a)a^x\). This shows that
\[(a^x)' = \ln(a)\,a^x.\]
It is worth remembering this formula instead of repeating the same argument.
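One way to gain confidence in chain rule answers is to compare them with a numerical slope. The sketch below is an illustration in Python and not part of the original notes: the helper `numerical_derivative`, the test point `x0 = 0.7`, and the dictionary layout are assumptions made for this check. It compares the answers to Example 1 (a)-(f) and the formula for the derivative of 2ˣ with central difference quotients.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Central difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Each entry: (function, derivative obtained with the chain rule).
examples = {
    "a": (lambda x: math.exp(2 * x + 5), lambda x: 2 * math.exp(2 * x + 5)),
    "b": (lambda x: math.cos(x**2), lambda x: -2 * x * math.sin(x**2)),
    "c": (lambda x: math.cos(x)**2, lambda x: -2 * math.cos(x) * math.sin(x)),
    "d": (lambda x: math.log(math.sqrt(x)), lambda x: 1 / (2 * x)),
    "e": (lambda x: math.sqrt(math.exp(x**2) + math.cos(x)),
          lambda x: (2 * x * math.exp(x**2) - math.sin(x))
                    / (2 * math.sqrt(math.exp(x**2) + math.cos(x)))),
    "f": (lambda x: math.exp(math.cos(x)) / math.exp(2 * x),
          lambda x: -math.exp(math.cos(x) - 2 * x) * (math.sin(x) + 2)),
    "2^x": (lambda x: 2**x, lambda x: math.log(2) * 2**x),
}

x0 = 0.7  # arbitrary test point inside the domain of every function above
for label, (func, derivative) in examples.items():
    print(label, numerical_derivative(func, x0), derivative(x0))
```

The two printed numbers in each row should agree to several decimal places; a mismatch would point to a mistake in the hand computation.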

Exercise 1. State what the outside function f (x) and inside function g(x) are, and
compute the derivative using the chain rule.
(a) \(2^{\sqrt{x}}\)   (d) \(\ln(\cos(x))\)
(b) \(\ln(x^2)\)   (e) \(\sqrt{(2^{3x}3^{2x})^7}\)
(c) \(\sqrt{\sin(x)}\)   (f) \(\frac{2^{\cos(x)}}{3^{\sin(x)}}\)
Example 2. Given the graph of a function y = f (x) in the figure and the table of
values for the function y = g(x), compute \(\frac{d}{dx} f(g(x))|_{x=1}\).
x       0    1    2    3
g(x)    0    3    0.5  1
g′(x)   2    0.5  −1   0
Solution: If we use the chain rule, \(\frac{d}{dx} f(g(x)) = f'(g(x))g'(x)\), and then plug in x = 1,
we get \(\frac{d}{dx} f(g(x))|_{x=1} = f'(g(1))g'(1)\). From the table, g(1) = 3 and g′(1) = 0.5,
so \(f'(g(1))g'(1) = f'(3)\cdot 0.5\). From the graph we see that f ′(3) = −2 because
between x = 2 and x = 4 the graph is a line connecting the points (2, 4) and (4, 0),
so it has slope −2. This gives \(\frac{d}{dx} f(g(x))|_{x=1} = f'(g(1))g'(1) = -2\cdot 0.5 = -1\).
Exercise 2. In the setting of the previous problem, compute \(\frac{d}{dx} g(f(x))|_{x=1}\).
Example 3. If the slope of f (x) is always positive and the slope of g(x) is always
negative, are the following functions increasing or decreasing?
(a) f (g(x)) (b) g( f (x)) (c) f ( f (x)) (d) g(g(x))
Solution: To decide if each function is increasing or decreasing, we will compute

its derivative and check if it is positive or negative. We will use symbols ⊕ and ⊖
to indicate if a quantity is positive or negative.
(a) ( f (g(x)))′ = f ′ (g(x))g′ (x) = ⊕ × ⊖ = ⊖, so decreasing.
(b) (g( f (x)))′ = g′ ( f (x)) f ′ (x) = ⊖ × ⊕ = ⊖, so decreasing.
(c) ( f ( f (x)))′ = f ′ ( f (x)) f ′ (x) = ⊕ × ⊕ = ⊕, so increasing.
(d) (g(g(x)))′ = g′ (g(x))g′ (x) = ⊖ × ⊖ = ⊕, so increasing.
Exercise 3. Let f (t) be the quantity (measured in kg) of a chemical produced by

some chemical reaction up to time t (measured in minutes). What is the rate of
production of this chemical in g/sec at time t seconds?
Product and quotient rule. Now, we will add the product and quotient rules
into the mix.
Example 4. Compute the derivatives
\[\frac{d}{dx}\, f(x)g(x)\Big|_{x=2} \quad\text{and}\quad \frac{d}{dx}\,\frac{f(x)}{g(x)}\Big|_{x=2}\]
given the table of values:
x   f(x)   g(x)   f′(x)   g′(x)
2   0.5    3      −1      1
Solution: By the product rule,
\[\frac{d}{dx}\, f(x)g(x)\Big|_{x=2} = f'(2)g(2) + f(2)g'(2) = (-1)\cdot 3 + 0.5\cdot 1 = -2.5.\]
By the quotient rule,
\[\frac{d}{dx}\,\frac{f(x)}{g(x)}\Big|_{x=2} = \frac{f'(2)g(2) - f(2)g'(2)}{g(2)^2} = \frac{(-1)\cdot 3 - 0.5\cdot 1}{3^2} = -\frac{3.5}{9}.\]
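The arithmetic in Example 4 only needs the four table entries at x = 2, which the following short Python sketch makes explicit. The variable names are illustrative assumptions; the numbers are the ones from the table.

```python
# Values at x = 2 taken from the table in Example 4.
f_val, g_val = 0.5, 3.0        # f(2), g(2)
f_slope, g_slope = -1.0, 1.0   # f'(2), g'(2)

# Product rule: f'g + fg'
product_derivative = f_slope * g_val + f_val * g_slope

# Quotient rule: (f'g - fg') / g^2
quotient_derivative = (f_slope * g_val - f_val * g_slope) / g_val**2

print(product_derivative)   # -2.5
print(quotient_derivative)  # equals -3.5/9
```

Note that neither rule needs a formula for f or g: values and slopes at the single point x = 2 are enough.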
Exercise 4. Compute the derivatives
\[\frac{d}{dx}\, f(x)g(x)\Big|_{x=2.3} \quad\text{and}\quad \frac{d}{dx}\,\frac{f(x)}{g(x)}\Big|_{x=2.3}\]
given the table of values:
x      1     2     3    4
f(x)   1.5   0.5   0    0.3
g(x)   −1    0.25  0    −0.35
Example 5. Compute the derivatives of the following functions.
(a) \(e^{-2x+5}\cos(3x)\)   (c) \(\frac{1}{\sin^2(x)}\)
(b) \(\tan(x)\)   (d) \(x^2e^x\sin(x)\)
Solution: (a) We use the product rule and then the chain rule,
\[\begin{aligned}
\big(e^{-2x+5}\cos(3x)\big)' &= \big(e^{-2x+5}\big)'\cos(3x) + e^{-2x+5}\big(\cos(3x)\big)'\\
&= e^{-2x+5}(-2)\cos(3x) + e^{-2x+5}\big(-\sin(3x)\big)(3)\\
&= -2e^{-2x+5}\cos(3x) - 3e^{-2x+5}\sin(3x)\\
&= -e^{-2x+5}\big(2\cos(3x) + 3\sin(3x)\big).
\end{aligned}\]
(b) We first rewrite \(\tan(x) = \frac{\sin(x)}{\cos(x)}\) and then use the quotient rule,
\[\begin{aligned}
\left(\frac{\sin(x)}{\cos(x)}\right)' &= \frac{(\sin(x))'\cos(x) - \sin(x)(\cos(x))'}{\cos^2(x)}\\
&= \frac{\cos(x)\cos(x) + \sin(x)\sin(x)}{\cos^2(x)}\\
&= \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)} = \sec^2(x).
\end{aligned}\]
It is good to remember that \(\tan'(x) = \sec^2(x)\).

(c) One way to compute the derivative of \(\frac{1}{\sin^2(x)}\) is to use the quotient rule
(combined with the chain rule when taking the derivative of \(\sin^2(x)\)),
\[\begin{aligned}
\left(\frac{1}{\sin^2(x)}\right)' &= \frac{(1)'\sin^2(x) - (1)(\sin^2(x))'}{(\sin^2(x))^2}\\
&= \frac{(0)\sin^2(x) - 2\sin(x)\cos(x)}{\sin^4(x)}\\
&= -\frac{2\sin(x)\cos(x)}{\sin^4(x)} = -\frac{2\cos(x)}{\sin^3(x)}.
\end{aligned}\]
Another way to compute the derivative is to rewrite \(\frac{1}{\sin^2(x)} = (\sin(x))^{-2}\) and use
the chain rule with the outside function \(f(x) = x^{-2}\) and the inside function \(g(x) = \sin(x)\),
\[\big((\sin(x))^{-2}\big)' = -2(\sin(x))^{-2-1}\cos(x) = -2(\sin(x))^{-3}\cos(x) = -\frac{2\cos(x)}{\sin^3(x)}.\]
(d) In this case, we need to apply the product rule twice. First, we can think of
\(x^2e^x\sin(x)\) as the product of \(x^2\) and \(e^x\sin(x)\), so that
\[\big(x^2e^x\sin(x)\big)' = (x^2)'e^x\sin(x) + x^2\big(e^x\sin(x)\big)' = 2xe^x\sin(x) + x^2\big(e^x\sin(x)\big)'.\]
In the second term, we need to use the product rule again to compute \((e^x\sin(x))' =
(e^x)'\sin(x) + e^x(\sin(x))' = e^x\sin(x) + e^x\cos(x)\), and then plug in above to get the
final answer,
\[\begin{aligned}
\big(x^2e^x\sin(x)\big)' &= 2xe^x\sin(x) + x^2e^x\sin(x) + x^2e^x\cos(x)\\
&= xe^x\big(2\sin(x) + x\sin(x) + x\cos(x)\big).
\end{aligned}\]
In the last problem when we computed the derivative of the product of three
functions and used the product rule twice, the two steps can be combined into one
easy-to-remember formula:
\[\big(f(x)g(x)h(x)\big)' = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x).\]
The same rule will work with four or more factors. We have to apply the derivative to
each factor separately and then add up all the terms.
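The pattern "differentiate one factor at a time and add up the terms" translates directly into a loop. Below is a minimal Python sketch under the assumption that we already know the value and the slope of every factor at the point of interest; the function name `product_rule` and the test point x = 1.3 are illustrative and not from the text. It is checked against the factored answer from Example 5(d).

```python
import math

def product_rule(values, slopes):
    """Derivative of a product of several factors at one point.

    values[i] is the value of the i-th factor, slopes[i] is its derivative.
    """
    total = 0.0
    for i in range(len(values)):
        term = slopes[i]              # differentiate the i-th factor ...
        for j in range(len(values)):
            if j != i:
                term *= values[j]     # ... and keep all the other factors
        total += term
    return total

x = 1.3  # arbitrary test point
values = [x**2, math.exp(x), math.sin(x)]
slopes = [2 * x, math.exp(x), math.cos(x)]

from_rule = product_rule(values, slopes)
from_example = x * math.exp(x) * (2 * math.sin(x) + x * math.sin(x) + x * math.cos(x))
print(from_rule, from_example)
```

With two factors the same function reduces to the usual product rule, and it works unchanged for four or more factors.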
Exercise 5. Compute the derivatives of the following functions.
(a) \(e^{-x}(\cos(x) + \sin(x))\)   (c) \(\frac{1}{(2^x+1)^2}\)
(b) \(\frac{\ln x}{1+x^2}\)   (d) \(x\,2^{2x}\ln(1+x^2)\)
Inverse function rule. An explanation of the inverse function rule can be found
in the footnote link¹¹, but the basic idea is quite simple and we have already seen
it in Example 6 in Section 2.1. Basically, in the inverse function the roles of the
variables x and y switch, so the derivative is approximated by \(\frac{\Delta x}{\Delta y}\) instead of \(\frac{\Delta y}{\Delta x}\).
The only subtle point is that if b = f (a) then the same increments ∆x and ∆y are
used at x = a for f or y = b for \(f^{-1}\); that is why the derivative of the inverse function
at b is the reciprocal of the derivative of the original function at a. Before we look
at the examples of using the inverse function rule, let us first review the practical
meaning of the derivative of an inverse function.
11 https://youtu.be/y9jzS-sUeM8
Example 6. A chocolate store sells 150 Sacher tortes (small size, serves 10 people)
each month for the price of $40. If they lowered the price to $35, they would sell
160 per month. If N = S(p) is the number N of tortes sold each month when the price
is p dollars, what formula does the above information correspond to?
(a) \((S^{-1})'(40) \approx -5\)   (c) \(S'(150) \approx 10\)
(b) \(S'(40) \approx -0.5\)   (d) \((S^{-1})'(150) \approx -0.5\)
Solution: When solving this type of problem, it is helpful to visualize it using the
following diagram that indicates the input variable and output variable for both
N = S(p) and its inverse \(p = S^{-1}(N)\).
Price p (in $)  --S-->  Tortes sold N (in #)
Price p (in $)  <--S^{-1}--  Tortes sold N (in #)
• The input of S and S′ should be the price p, in this case p = 40, which eliminates
(c).
• The input of \(S^{-1}\) and \((S^{-1})'\) should be the number N of tortes sold, in this case
N = 150, which eliminates (a).
• The derivative S′(40) can be approximated by \(\frac{\Delta N}{\Delta p}\) (increment of the output over
increment of input). In our case, ∆N = 160 − 150 = 10 and ∆p = 35 − 40 = −5,
so \(S'(40) \approx \frac{10}{-5} = -2\). So (b) is not correct.
• The derivative \((S^{-1})'(150)\) can be approximated by \(\frac{\Delta p}{\Delta N}\) (again, increment of
the output over increment of input), so \((S^{-1})'(150) \approx \frac{-5}{10} = -0.5\). So (d) is the
correct answer.
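The two estimates in this solution are reciprocals of each other, which is the inverse function rule in action. A small Python sketch of the arithmetic follows; the variable names are illustrative assumptions and the numbers are the ones from Example 6.

```python
# Data from Example 6: (price in dollars, tortes sold per month).
price_now, sold_now = 40, 150
price_new, sold_new = 35, 160

delta_p = price_new - price_now   # increment of the price
delta_N = sold_new - sold_now     # increment of the number sold

# S takes a price as input, so its slope is (change in N) / (change in p).
S_prime_at_40 = delta_N / delta_p

# The inverse takes a number sold as input, so the ratio flips.
S_inverse_prime_at_150 = delta_p / delta_N

print(S_prime_at_40)            # -2.0 tortes per dollar
print(S_inverse_prime_at_150)   # -0.5 dollars per torte
```

The units also flip: S′ is measured in tortes per dollar, while the derivative of the inverse is measured in dollars per torte.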
Exercise 6. The total cost of owning a car depends on the APR (annual percentage
rate) of the auto loan. Suppose that when the APR is 4%, the total cost is $30,000, and
lowering the APR to 3.5% will decrease the total cost to $29,500. If c = f (r) is the
total cost c at the rate r, what formula does the above information correspond to?
(a) \((f^{-1})'(4) \approx 0.5\)   (c) \(f'(30{,}000) \approx 500\)
(b) \((f^{-1})'(30{,}000) \approx 0.001\)   (d) \(f'(4) \approx -500\)
Next, let us practice the inverse function rule: if b = f (a) then \((f^{-1})'(b) = \frac{1}{f'(a)}\).
Example 7. Given the graph of a function y = f (x) in the figure and the table of
values for the function y = g(x), which is invertible, compute (a) \((g^{-1})'(3)\)
and (b) \(\frac{d}{dx}\, g^{-1}(f(x))|_{x=1}\).
x       0      1    2      3
g(x)    3      2    1.5    1.25
g′(x)   −1.5   −1   −0.5   −0.25
Solution: (a) The inverse function rule tells us that \((g^{-1})'(3) = \frac{1}{g'(a)}\), where a is such
that 3 = g(a). Looking at the table, we see that g(0) = 3, so a = 0. This means that
\((g^{-1})'(3) = \frac{1}{g'(0)} = \frac{1}{-1.5} = -\frac{2}{3}\).
(b) First, by the chain rule, \(\frac{d}{dx}\, g^{-1}(f(x))|_{x=1} = (g^{-1})'(f(1))\,f'(1)\). Looking at the
graph, we see that f (1) = 2 and f ′(1) = 2, so \(\frac{d}{dx}\, g^{-1}(f(x))|_{x=1} = 2(g^{-1})'(2) = \frac{2}{g'(a)}\),
where 2 = g(a). Looking at the table, g(1) = 2, so a = 1 and \(\frac{2}{g'(1)} = \frac{2}{-1} = -2\).
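The two-step recipe used here (first find a with g(a) = b, then take the reciprocal of g′(a)) can be sketched in Python as a table lookup. The function name `inverse_derivative` is an illustrative assumption, and the values f(1) = 2 and f′(1) = 2 are the ones read off the graph in the solution above.

```python
# Table of values for g from Example 7.
xs = [0, 1, 2, 3]
g_values = [3, 2, 1.5, 1.25]
g_slopes = [-1.5, -1, -0.5, -0.25]

def inverse_derivative(b):
    """(g^{-1})'(b) = 1 / g'(a), where a is the table entry with g(a) = b."""
    index = g_values.index(b)   # raises ValueError if b is not in the table
    return 1 / g_slopes[index]

# (a) (g^{-1})'(3)
part_a = inverse_derivative(3)

# (b) chain rule: (g^{-1})'(f(1)) * f'(1), with f(1) = 2 and f'(1) = 2
f_at_1, f_slope_at_1 = 2, 2
part_b = inverse_derivative(f_at_1) * f_slope_at_1

print(part_a)   # equals -2/3
print(part_b)   # -2.0
```

The lookup only works for values of b that appear in the table; for other values one would first have to estimate a, for example by interpolation.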
Exercise 7. Given the functions in Example 7, compute (a) \((g^{-1})'(1.5)\) and (b)
\(\frac{d}{dx}\, g^{-1}(f(x))|_{x=3}\).
When we discussed inverse functions, we defined three classic ones: ln(x) as the
inverse of \(e^x\), arcsin(x) as the inverse of sin(x) on \([-\frac{\pi}{2}, \frac{\pi}{2}]\), and arctan(x) as the
inverse of tan(x) on \((-\frac{\pi}{2}, \frac{\pi}{2})\). One can use the inverse function rule to show that:
\[\big(\ln(x)\big)' = \frac{1}{x}, \qquad \big(\arctan(x)\big)' = \frac{1}{1+x^2}, \qquad \big(\arcsin(x)\big)' = \frac{1}{\sqrt{1-x^2}}.\]
The first one is explained in the footnote link¹², so let us show the second one here.
Example 8. Show that
\[\big(\arctan(x)\big)' = \frac{1}{1+x^2}.\]
Solution: Before solving the problem, recall the graph of arctan(x). Its range is
\((-\frac{\pi}{2}, \frac{\pi}{2})\), and it has horizontal asymptotes at +∞ and −∞, so the slope is
approaching 0 there. We can see that \(\frac{1}{1+x^2} \to 0\) as x → +∞ or −∞, so the formula
matches the behaviour of the slope. Also, arctan(x) is increasing, which matches
that \(\frac{1}{1+x^2} > 0\).
12 https://youtu.be/-W7r1Ug062w
Now, let us prove the formula. Recall that we already proved that \(\tan'(x) = \sec^2(x)\).
If b = tan(a) for some \(a \in (-\frac{\pi}{2}, \frac{\pi}{2})\) then
\[\arctan'(b) = \frac{1}{\tan'(a)} = \frac{1}{\sec^2(a)}.\]
However, the answer should be in terms of b, so we need to express \(\sec^2(a)\) in terms
of b = tan(a), which can be done by finding the relationship between sec(a) and
tan(a) among the Pythagorean trigonometric identities¹³: \(\sec^2(a) = 1 + \tan^2 a =
1 + b^2\), so \(\arctan'(b) = \frac{1}{1+b^2}\), which is exactly what we wanted.
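Both ingredients of this proof can be sanity-checked numerically. The Python sketch below is an illustration only: the helper `numerical_derivative` and the list of sample points are assumptions, not part of the notes. For each b it compares a central difference slope of arctan with 1/(1 + b²), and it also checks the identity sec²(a) = 1 + b² for a = arctan(b).

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Central difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def arctan_slope(b):
    """The formula from Example 8."""
    return 1 / (1 + b**2)

sample_points = [-10.0, -1.0, 0.0, 0.5, 3.0]
for b in sample_points:
    a = math.atan(b)                   # so that b = tan(a)
    sec_squared = 1 / math.cos(a)**2   # sec^2(a)
    print(b,
          numerical_derivative(math.atan, b),
          arctan_slope(b),
          sec_squared - (1 + b**2))    # should be very close to 0
```

The slopes are largest at b = 0 and shrink for large |b|, which matches the horizontal asymptotes of the graph.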
Exercise 8. Show that
\[\big(\arcsin(x)\big)' = \frac{1}{\sqrt{1-x^2}}.\]
Hint: you can use the identity \(\cos(x) = \sqrt{1-\sin^2(x)}\) for \(x \in [-\frac{\pi}{2}, \frac{\pi}{2}]\).
Beyond essentials: logarithmic differentiation. Although this rule is a bit
more advanced, it is actually quite simple and very useful. It says that
\[f'(x) = f(x)\cdot\big(\ln f(x)\big)'\]
whenever f (x) > 0. The main point of this rule is that sometimes it is easier to
calculate the derivative of the logarithm ln f (x) of a function f (x) instead of
calculating f ′(x) directly. The reason why we need f (x) to be positive is that we are
only allowed to plug positive values into ln(x). When f (x) is negative, we can
use a more general rule
\[f'(x) = f(x)\cdot\big(\ln|f(x)|\big)'.\]
The logarithmic differentiation rule can be used, for example, to prove the power
rule \((x^n)' = nx^{n-1}\) for all n. If you recall, we only checked this rule in a few special
cases, but for general n it can be obtained by logarithmic differentiation. For the
explanation of the logarithmic differentiation rule and the demonstration of the
power rule, see the footnote link.¹⁴
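As a rough numerical illustration of the rule (not a proof), the Python sketch below applies f′(x) = f(x)·(ln f(x))′ to the power function xⁿ with the non-integer power n = π and compares the result with the power rule. The helper names, the choice n = π, and the point x₀ = 2 are assumptions made for this example; the derivative of ln f is approximated by a central difference.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Central difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def log_derivative_rule(f, x):
    """f'(x) computed as f(x) * (ln f(x))'; requires f(x) > 0 near x."""
    return f(x) * numerical_derivative(lambda t: math.log(f(t)), x)

n = math.pi  # a power that is not an integer

def power(x):
    return x**n

x0 = 2.0
via_log = log_derivative_rule(power, x0)
via_power_rule = n * x0**(n - 1)
print(via_log, via_power_rule)
```

Here ln(xⁿ) = n ln(x) has the simple derivative n/x, so the rule gives xⁿ · n/x = n xⁿ⁻¹, which is what the two printed numbers reflect.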
13 https://en.wikipedia.org/wiki/Pythagorean_trigonometric_identity
14 https://youtu.be/hwrTON7VAGw
Answer to Exercise 1.
(a) \(f(x) = 2^x\), \(g(x) = \sqrt{x}\), \((f(g(x)))' = \ln(2)\,2^{\sqrt{x}}\,\frac{1}{2\sqrt{x}}\).
(b) \(f(x) = \ln(x)\), \(g(x) = x^2\), \((f(g(x)))' = \frac{2x}{x^2} = \frac{2}{x}\).
(c) \(f(x) = \sqrt{x}\), \(g(x) = \sin(x)\), \((f(g(x)))' = \frac{\cos(x)}{2\sqrt{\sin(x)}}\).
(d) \(f(x) = \ln(x)\), \(g(x) = \cos(x)\), \((f(g(x)))' = -\frac{\sin(x)}{\cos(x)} = -\tan(x)\).
(e) Since \(2^{3x} = (2^3)^x = 8^x\) and \(3^{2x} = (3^2)^x = 9^x\), we can simplify
\(2^{3x}3^{2x} = 8^x9^x = (8\cdot 9)^x = 72^x\) and
\(\sqrt{(2^{3x}3^{2x})^7} = \sqrt{(72^x)^7} = (72^x)^{7/2} = (72^{7/2})^x\). This means
that we can apply the rule \((a^x)' = \ln(a)a^x\) with \(a = 72^{7/2}\) to get the derivative
\(\ln(72^{7/2})(72^{7/2})^x = \frac{7}{2}\ln(72)(72^{7/2})^x\). It is important to try to simplify, if possible,
before taking derivatives.
(f) This might look like we need the quotient rule. However, we can rewrite
\[\frac{2^{\cos(x)}}{3^{\sin(x)}} = \frac{e^{\ln(2)\cos(x)}}{e^{\ln(3)\sin(x)}} = e^{\ln(2)\cos(x) - \ln(3)\sin(x)}\]
and use the chain rule with \(f(x) = e^x\) and \(g(x) = \ln(2)\cos(x) - \ln(3)\sin(x)\) to get
the derivative
\[e^{\ln(2)\cos(x) - \ln(3)\sin(x)}\big(-\ln(2)\sin(x) - \ln(3)\cos(x)\big) = -\frac{2^{\cos(x)}}{3^{\sin(x)}}\big(\ln(2)\sin(x) + \ln(3)\cos(x)\big).\]
Answer to Exercise 2. \(\frac{d}{dx}\, g(f(x))|_{x=1} = g'(f(1))f'(1) = g'(2)f'(1) = -1\cdot 2 = -2\).
Answer to Exercise 3. Notice that time changed to seconds from minutes and
quantity changed to grams from kilograms. First, the quantity (in kg) of the chemical
produced up to time t seconds will be equal to \(f(\frac{t}{60})\), because t sec = \(\frac{t}{60}\)
min. The rate of production is its derivative and, using the chain rule,
\((f(\frac{t}{60}))' = f'(\frac{t}{60})\frac{1}{60}\), which is measured in kg/sec. Since we want g/sec, we translate
\(f'(\frac{t}{60})\frac{1}{60}\) kg/sec \(= f'(\frac{t}{60})\frac{1}{60}\times 1000\) g/sec \(= f'(\frac{t}{60})\frac{100}{6}\) g/sec.
Answer to Exercise 4. We could estimate the first derivative of the product in two
ways. First, since x = 2.3 is between 2 and 3, we could estimate by \(\frac{\Delta y}{\Delta x}\) between
these two points with y = f (x)g(x), so
\[\frac{d}{dx}\, f(x)g(x)\Big|_{x=2.3} \approx \frac{f(3)g(3) - f(2)g(2)}{3-2} = \frac{0\cdot 0 - 0.5\cdot 0.25}{1} = -0.125.\]
Another way is to use the product rule f ′(2.3)g(2.3) + f (2.3)g′(2.3) first and then
estimate f (2.3), g(2.3), f ′(2.3) and g′(2.3). To estimate the derivatives, we could
use \(\frac{\Delta y}{\Delta x}\) between x = 2 and x = 3,
\[f'(2.3) \approx \frac{f(3) - f(2)}{3-2} = -0.5, \qquad g'(2.3) \approx \frac{g(3) - g(2)}{3-2} = -0.25.\]
To estimate f (2.3) we could use the straight line connecting (2, 0.5) and (3, 0).
We already computed its slope, m = −0.5, and it passes through the point (3, 0),
so the line is y = 0 − 0.5(x − 3) and f (2.3) ≈ −0.5(2.3 − 3) = 0.35. Similarly,
to estimate g(2.3) we could use the straight line connecting (2, 0.25) and (3, 0),
which is y = −0.25(x − 3), so g(2.3) ≈ −0.25(2.3 − 3) = 0.175. Finally, plugging
in all the estimates,
\[f'(2.3)g(2.3) + f(2.3)g'(2.3) \approx (-0.5)\,0.175 + 0.35\,(-0.25) = -0.175.\]
For the derivative \(\frac{d}{dx}\frac{f(x)}{g(x)}\big|_{x=2.3}\), the first method will not work, because the ratio
\(\frac{0}{0}\) is undefined at x = 3, so let us use the second method via the quotient rule:
\[\frac{d}{dx}\,\frac{f(x)}{g(x)}\Big|_{x=2.3} = \frac{f'(2.3)g(2.3) - f(2.3)g'(2.3)}{g(2.3)^2} \approx \frac{(-0.5)\,0.175 - 0.35\,(-0.25)}{0.175^2} = 0.\]
Actually, whenever we have \(\frac{0}{0}\) at one of the endpoints, we will always get the
estimate for the derivative of \(\frac{f(x)}{g(x)}\) equal to 0 if we use straight lines to approximate
each function. This is because, given our line estimates f (x) ≈ −0.5(x − 3)
and g(x) ≈ −0.25(x − 3) between x = 2 and x = 3, if we divide them we will get
\(\frac{f(x)}{g(x)} \approx \frac{-0.5(x-3)}{-0.25(x-3)} = 2\), so our estimate of the ratio is constant, and its derivative is
0.
Answer to Exercise 5. (a) \(-2e^{-x}\sin(x)\). (b) \(\frac{1+x^2-2x^2\ln(x)}{x(1+x^2)^2}\). (c) \(-\frac{(2\ln 2)\,2^x}{(1+2^x)^3}\).
(d) \(2^{2x}\ln(1+x^2) + 2\ln(2)\,x\,2^{2x}\ln(1+x^2) + \frac{2x^2\,2^{2x}}{1+x^2}\). If we write \(2^{2x} = 4^x\) then we can
also write this as \(4^x\ln(1+x^2) + \ln(4)\,x\,4^x\ln(1+x^2) + \frac{2x^2\,4^x}{1+x^2}\).
Answer to Exercise 6. (b) \((f^{-1})'(30{,}000) \approx \frac{\Delta r}{\Delta c} = \frac{-0.5}{-500} = 0.001\) % per dollar.
Answer to Exercise 7. (a) \((g^{-1})'(1.5) = \frac{1}{g'(2)} = \frac{1}{-0.5} = -2\).
(b) \(\frac{d}{dx}\, g^{-1}(f(x))|_{x=3} = (g^{-1})'(f(3))\,f'(3) = -2(g^{-1})'(2) = \frac{-2}{g'(1)} = \frac{-2}{-1} = 2\).
Answer to Exercise 8. If b = sin(a) then
\[\arcsin'(b) = \frac{1}{\sin'(a)} = \frac{1}{\cos(a)} = \frac{1}{\sqrt{1-\sin^2(a)}} = \frac{1}{\sqrt{1-b^2}}.\]
2.5 First applications: old and new

We have learned how to calculate the derivatives of many functions, and now we
will begin to apply this skill. We will start with two topics that are already familiar
to us on a conceptual level – tangent lines and shapes of graphs – and we will
combine our conceptual understanding with explicit calculations of derivatives.
After that we will take a look at a new topic – implicit differentiation.
Tangent line, again. We know the formula
y = f (a) + f ′ (a)(x − a)
for the tangent line to y = f (x) at x = a. In addition to being called the tangent line,
this linear function is sometimes also called:
• local linearization of f (x) near x = a;
• best linear approximation to f (x) near x = a.
The names reflect that f (x) is well approximated by its tangent line locally near
x = a, i.e. f (x) ≈ f (a) + f ′ (a)(x − a) near x = a. It is a good idea to remember a
few special cases:
\[e^x \approx 1 + x, \qquad \sin(x) \approx x, \qquad \ln(1+x) \approx x, \qquad (1+x)^\kappa \approx 1 + \kappa x,\]
all near x = 0. In the last one, κ is any constant power.
Example 1. Check that \(e^x \approx 1 + x\) and sin(x) ≈ x near x = 0.
Solution: Because \((e^x)' = e^x\) and \(e^0 = 1\), the local linearization of \(e^x\) near x = 0 is
1 + 1 · (x − 0) = 1 + x, so \(e^x \approx 1 + x\) there. Similarly, because (sin x)′ = cos x and
sin 0 = 0, cos 0 = 1, the local linearization of sin x near x = 0 is 0 + 1 · (x − 0) = x,
and sin x ≈ x there.
Exercise 1. Check that ln(1 + x) ≈ x and \((1+x)^\kappa \approx 1 + \kappa x\) near x = 0.
Looking at the figure we see that:
• If the function is concave up then the tangent line is below the graph, so using
the tangent line to approximate the function will underestimate. We can see that
the function is concave up by looking at its graph, or by checking that f ′′ (x) > 0.
• If the function is concave down then the tangent line is above the graph, so using
the tangent line to approximate the function will overestimate. We can see that the
function is concave down by looking at its graph, or by checking that f ′′ (x) < 0.
• The error of approximation at a point x is the absolute value of the difference

between f (x) and its tangent line approximation y = f (a) + f ′ (a)(x − a).
Example 2. Without using a calculator decide which number is bigger, \(e^{0.1}\) or 1.1?
Solution: 1.1 = 1 + 0.1 is the tangent line y = 1 + x to the exponential function
\(y = e^x\) at x = 0 evaluated at x = 0.1, so to compare 1 + 0.1 with \(e^{0.1}\) we need
to determine whether 1 + x is smaller or bigger than \(e^x\). We know that the graph
of \(y = e^x\) is concave up, but now we can also check it using formulas, because
\((e^x)'' = e^x > 0\). This implies that the tangent line underestimates, so \(1.1 < e^{0.1}\).
The error of approximation is \(|e^{0.1} - 1.1| = 0.00517\ldots\).
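The comparison in Example 2 can be reproduced with a few lines of Python. This is only a sketch for checking the numbers; the helper name `tangent_line` is an illustrative assumption.

```python
import math

def tangent_line(value_at_a, slope_at_a, a):
    """Return the function x -> f(a) + f'(a)(x - a)."""
    return lambda x: value_at_a + slope_at_a * (x - a)

# Tangent line to y = e^x at x = 0: f(0) = 1 and f'(0) = 1, so y = 1 + x.
line = tangent_line(math.exp(0.0), math.exp(0.0), 0.0)

approximation = line(0.1)    # 1.1
exact = math.exp(0.1)
error = abs(exact - approximation)

print(approximation, exact, error)
print(approximation < exact)  # concave up, so the tangent line underestimates
```

Moving the evaluation point further from x = 0 (for example to x = 1) makes the error noticeably larger, which is why the approximation is called local.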
Exercise 2. (a) Without using a calculator decide which number is bigger, \(\sqrt{1.1}\) or
1.05? (b) The speed of sound in dry (0% humidity) air at temperature T °C can be
calculated as
\[331.3\sqrt{1 + \frac{T}{273.15}} \ \text{m/s}.\]
Simplify this assuming that the temperature is not far from 0°C. What is the error
of approximation at 25°C?
Example 3. The equation \(\ln(1+x) + x = \frac{1}{4}\) has a solution near x = 0. Find an
approximate solution by local linearization near x = 0.
Solution: We know that local linearization near 0 of ln(1 + x) is x, so we can say
that \(\frac{1}{4} = \ln(1+x) + x \approx x + x = 2x\). This means that \(\frac{1}{4} \approx 2x\) or \(x \approx \frac{1}{8} = 0.125\).
Using the calculator we can find that the actual solution is x = 0.1288 . . ., so our
approximation 0.125 is pretty close.
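The "actual solution" quoted here can be reproduced without a special solver. The Python sketch below uses bisection, which is a swapped-in numerical technique and not the method of the notes; the function names `equation` and `bisect` and the search interval [0, 1] are illustrative assumptions.

```python
import math

def equation(x):
    """Left side minus right side of ln(1 + x) + x = 1/4."""
    return math.log(1 + x) + x - 0.25

def bisect(func, lo, hi, tol=1e-10):
    """Find a root of func in [lo, hi], assuming func changes sign there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid   # the sign change is in the left half
        else:
            lo = mid   # the sign change is in the right half
    return (lo + hi) / 2

linear_guess = 0.25 / 2              # from 1/4 ≈ 2x
root = bisect(equation, 0.0, 1.0)    # equation(0) < 0 < equation(1)

print(linear_guess, root, abs(root - linear_guess))
```

The linearization gives a good first guess with no computation at all, and the numerical method then refines it.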
Exercise 3. The equation \(\sqrt{1+x+x^3} - \frac{x^3}{2} = 0.1\) has a solution near x = 0. Find an
approximate solution by local linearization \(\sqrt{1+a} \approx 1 + \frac{a}{2}\) near a = 0.
Shapes of graphs. We know that the sign of f ′ (x) determines whether the func-
tion y = f (x) is increasing or decreasing, and the sign of f ′′ (x) determines whether
the function is concave up or concave down. Let us now combine this information
with explicit calculations of derivatives.
Example 4. Find where the function \(y = 2x^3 - 3x^2 - 12x + 1\) is increasing, decreasing,
concave up, and concave down. Sketch its graph.
Solution: The first derivative is \(y'(x) = 6x^2 - 6x - 12 = 6(x^2 - x - 2)\). We can check
that \(x^2 - x - 2 = 0\) when x = −1 and x = 2. Between −1 and 2 (for example at x = 0)
\(x^2 - x - 2\) is negative, and it is positive outside of [−1, 2]. This means that the
function \(2x^3 - 3x^2 - 12x + 1\) is decreasing on (−1, 2) and increasing on (−∞, −1)
and (2, ∞). The second derivative is \(y''(x) = 6(2x - 1)\); it is equal to zero at x = 0.5,
negative to the left and positive to the right of 0.5. So the function is concave
down on (−∞, 0.5) and concave up on (0.5, ∞). In particular, x = 0.5 is an inflection
point. With this information, we can sketch the general shape of this function as in
the figure.
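The sign analysis in Example 4 amounts to testing one sample point in each interval cut out by x = −1, x = 0.5 and x = 2. A minimal Python sketch of that bookkeeping follows; the function names and the chosen sample points are illustrative assumptions.

```python
def y_prime(x):
    """First derivative of y = 2x^3 - 3x^2 - 12x + 1."""
    return 6 * x**2 - 6 * x - 12

def y_double_prime(x):
    """Second derivative of the same function."""
    return 12 * x - 6

def describe(x):
    """Report the behaviour of the graph at x from the signs of y' and y''."""
    trend = "increasing" if y_prime(x) > 0 else "decreasing"
    shape = "concave up" if y_double_prime(x) > 0 else "concave down"
    return trend, shape

# One sample point inside each of (-inf,-1), (-1,0.5), (0.5,2), (2,inf).
for sample in (-2, 0, 1, 3):
    print(sample, describe(sample))
```

The sample points must avoid x = −1, 0.5 and 2 themselves, where one of the derivatives is zero and the simple sign test above does not apply.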
Exercise 4. Find where the function \(\ln(1+x^2)\) is increasing, decreasing, concave
up, and concave down. Sketch its graph.
Implicit differentiation. Sometimes the rela-

tionship between variables x and y is not given by
a simple formula y = f (x) but, instead, by some
complicated equation in terms of x and y that we
cannot solve for y explicitly. For example, consider
an equation
sin(x + y) − cos(xy) + 1 = 0.
The blue curves in the figure show all the points

(x, y) that satisfy this equation in a part of the x-y plane, and we can see that for
a given x there could be many possible y. More importantly, we cannot solve this
equation for y explicitly. Suppose that we are interested in a particular point on this
curve, for example, A = (2, 2.383 . . .) in the figure. A piece of the curve near this
point is given by some function y = y(x). It is called an implicit function because
we do not know it explicitly and only know that it satisfies the equation
sin(x + y(x)) − cos(xy(x)) + 1 = 0.
Can we find the derivative y′ (2), which is the slope of the tangent line in the figure,
without knowing this function? The answer is yes, using implicit differentiation.
Example 5. Compute the derivative y′(2) at the point A = (2, 2.383 . . .) in the
figure.
Solution: What implicit differentiation means is that we differentiate the above
equation sin(x + y(x)) − cos(xy(x)) + 1 = 0, pretending that we know y(x) and
using the chain rule, and then solve for y′(x) at the end. If this equation is true
then
\[\big(\sin(x+y(x)) - \cos(xy(x)) + 1\big)' = (0)' = 0\]
is also true. First, we use the chain rule,
\[\cos(x+y(x))\cdot(x+y(x))' + \sin(xy(x))\cdot(xy(x))' = 0.\]
Then we use whatever rule is necessary for the remaining derivatives, in this case
the sum rule and the product rule,
\[\cos(x+y(x))\cdot(1+y'(x)) + \sin(xy(x))\cdot(y(x)+xy'(x)) = 0.\]
Notice that, at this step, we simply write y′(x) for the derivative of y(x) formally,
without knowing what it is. However, the good news is that we can now solve
the above equation for y′(x) by multiplying out, collecting all the terms with y′(x)
together, and moving all the other terms to the other side of the equation,
\[\big(\cos(x+y(x)) + \sin(xy(x))\,x\big)\,y'(x) = -\cos(x+y(x)) - \sin(xy(x))\,y(x).\]
Now we can divide by cos(x + y(x)) + sin(xy(x))x to get
\[y'(x) = -\frac{\cos(x+y(x)) + \sin(xy(x))\,y(x)}{\cos(x+y(x)) + \sin(xy(x))\,x}.\]
Finally, since we are interested in the point A = (2, 2.383 . . .), this means that x = 2
and y(2) = 2.383 . . ., so we can plug in these values into the formula
\[y'(2) = -\frac{\cos(2+2.383) + \sin(2\cdot 2.383)\,2.383}{\cos(2+2.383) + \sin(2\cdot 2.383)\,2} = -1.1648\ldots.\]
The calculation we just did will be a bit cleaner if we write y instead of y(x) and y′
instead of y′(x), keeping in mind that y and y′ depend on x:
\[\begin{aligned}
\big(\sin(x+y) - \cos(xy) + 1\big)' &= (0)' = 0\\
\cos(x+y)\cdot(x+y)' + \sin(xy)\cdot(xy)' &= 0\\
\cos(x+y)\cdot(1+y') + \sin(xy)\cdot(y+xy') &= 0\\
\big(\cos(x+y) + \sin(xy)\,x\big)\,y' &= -\cos(x+y) - \sin(xy)\,y\\
y' &= -\frac{\cos(x+y) + \sin(xy)\,y}{\cos(x+y) + \sin(xy)\,x},
\end{aligned}\]
and then plug in x = 2 and y = 2.383. Also, we could plug in the values x = 2 and
y = 2.383 before solving for y′, which would actually make solving for y′ much
easier. Make sure to take advantage of this in the next two exercises.
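The final formula for y′ is easy to evaluate by machine. The Python sketch below is an illustration under stated assumptions: the function names `F`, `slope` and `bisect` are made up for this example, and the bracket [2.38, 2.39] used to refine the y-coordinate of A is chosen by hand around the value 2.383 quoted in the text. It first locates the point on the curve with x = 2 and then evaluates the slope formula there.

```python
import math

def F(x, y):
    """Left side of the equation sin(x + y) - cos(xy) + 1 = 0."""
    return math.sin(x + y) - math.cos(x * y) + 1

def slope(x, y):
    """y' from implicit differentiation, valid at points (x, y) on the curve."""
    numerator = math.cos(x + y) + math.sin(x * y) * y
    denominator = math.cos(x + y) + math.sin(x * y) * x
    return -numerator / denominator

def bisect(func, lo, hi, tol=1e-12):
    """Find a root of func in [lo, hi], assuming func changes sign there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Refine the y-coordinate of A = (2, 2.383...) by solving F(2, y) = 0.
y_A = bisect(lambda y: F(2.0, y), 2.38, 2.39)
slope_A = slope(2.0, y_A)

print(y_A, F(2.0, y_A), slope_A)
```

The formula for `slope` only makes sense at points that satisfy the equation, which is why the y-coordinate is computed first rather than typed in with three decimals.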
Exercise 5. A bagel with an inner radius r and an outer radius R has volume
\(V = \frac{1}{4}\pi^2(R+r)(R-r)^2\). Given a fixed amount of dough, we can change the
shape of the bagel. If the volume of the bagel is \(V = 4\pi^2\), find the derivative
\(\frac{dR}{dr}\) when r = 1 and R = 3.
Exercise 6. Find the tangent line to the curve \(x^2 + 2xy - y^3 = 7\) at (2, 1).
Answer to Exercise 1. Because \(\frac{d}{dx}\ln(1+x)|_{x=0} = \frac{1}{1+x}|_{x=0} = 1\) and ln(1 + 0) = 0,
the best linear approximation of ln(1 + x) near x = 0 is 0 + 1 · (x − 0) = x, so
ln(1 + x) ≈ x there. Because \(\frac{d}{dx}(1+x)^\kappa|_{x=0} = \kappa(1+x)^{\kappa-1}|_{x=0} = \kappa\) and \((1+0)^\kappa = 1\),
the best linear approximation of \((1+x)^\kappa\) near x = 0 is 1 + κ · (x − 0) = 1 + κx,
and \((1+x)^\kappa \approx 1 + \kappa x\) there.
Answer to Exercise 2. (a) We need to use the formula \((1+x)^\kappa \approx 1 + \kappa x\) near
x = 0 with \(\kappa = \frac{1}{2}\), so \(\sqrt{1+x} \approx 1 + \frac{x}{2}\) near x = 0. In other words, \(1 + \frac{x}{2}\) is the
tangent line to \(\sqrt{1+x}\) at x = 0. We know that the graph of \(\sqrt{x}\) is concave down,
and \(\sqrt{1+x}\) is \(\sqrt{x}\) shifted to the left by 1, so it is also concave down. We can also
check this using formulas: \((\sqrt{1+x})' = \frac{1}{2}(1+x)^{-1/2}\) and, therefore, \((\sqrt{1+x})'' =
-\frac{1}{4}(1+x)^{-3/2} < 0\). For a concave down function the tangent line overestimates, so
\(\sqrt{1.1} = \sqrt{1+0.1} < 1 + \frac{0.1}{2} = 1.05\). The error of approximation is \(|\sqrt{1.1} - 1.05| =
0.00119\ldots\).
(b) Since \(\sqrt{1+x} \approx 1 + \frac{x}{2}\) near 0,
\[331.3\sqrt{1 + \frac{T}{273.15}} \approx 331.3\left(1 + \frac{T}{2\cdot 273.15}\right) = 331.3 + 0.606T.\]
At T = 25, the original formula gives 346.1292, while the tangent line gives
346.461, so the error of approximation is 0.3318 m/s.
Answer to Exercise 3. We can think of \(a = x + x^3\) inside \(\sqrt{1+x+x^3}\) as one number
and use local linearization \(\sqrt{1+a} \approx 1 + \frac{a}{2}\) to write
\[0.1 = \sqrt{1+x+x^3} - \frac{x^3}{2} \approx 1 + \frac{x+x^3}{2} - \frac{x^3}{2} = 1 + \frac{x}{2}.\]
This gives that \(0.1 \approx 1 + \frac{x}{2}\) and, solving for x, we get x ≈ 1.8. Using the calculator
we can find that the actual solution is x = 0.177379 . . ., so our approximation 1.8
is pretty close.
Answer to Exercise 4. The first derivative is \(y'(x) = \frac{2x}{1+x^2}\). It is equal to 0 at
x = 0, negative to the left and positive to the right of x = 0. This means that
the function \(\ln(1+x^2)\) is decreasing on (−∞, 0) and increasing on (0, ∞). The
second derivative is \(y''(x) = \frac{2(1-x^2)}{(1+x^2)^2}\); it is equal to zero at −1 and 1, positive
in between on the interval (−1, 1) and negative outside. The function is concave
up on (−1, 1) and concave down on (−∞, −1) and (1, ∞). In particular, x = −1 and
x = 1 are inflection points. With this
information, we can sketch the general shape of this function as in the figure.
Answer to Exercise 5. If \(\frac{1}{4}\pi^2(R+r)(R-r)^2 = 4\pi^2\) then \((R+r)(R-r)^2 = 16\).
Keeping in mind that R = R(r) is a function of r and R′ means R′(r), if we differentiate
the equation, \(((R+r)(R-r)^2)' = (16)' = 0\), we get
\[(R'+1)(R-r)^2 + (R+r)\,2(R-r)(R'-1) = 0.\]
Before solving for R′, if we plug in r = 1, R = 3 first, we get 4(R′ + 1) + 16(R′ − 1) = 0,
or 20R′ − 12 = 0. Solving for R′, we get R′ = 0.6.
Answer to Exercise 6. Differentiating \((x^2 + 2xy - y^3)' = (7)' = 0\) we get
\(2x + 2y + 2xy' - 3y^2y' = 0\). Plugging in x = 2 and y = 1 we get 4 + 2 + 4y′ − 3y′ = 0, so
y′ = −6. The tangent line at (2, 1) is therefore y = 1 − 6(x − 2).
2.6 Critical points

In the next section we will focus on solving optimization problems, which is one
of the most important applications of derivatives, but first we need to learn about
critical points and how to use them to find local minima and local maxima. A
point x in the domain of a function y = f (x) is called a critical point if one of the
following two things happens:
• the derivative f ′ (x) = 0,
• the derivative f ′ (x) is undefined.
A quick summary about the critical points can be found in the footnote
link¹⁵ and the figure on the right. Points
B,C and D in the figure are where
f ′ (x) = 0 (so the tangent line is horizon-
tal), and points A and E are where the
derivative is undefined. Point A is a cor-
ner and the function has different slopes
to the right and left of it, and point E
has a vertical tangent line, whose slope
is infinite, so undefined.
• A point x in the domain of a function y = f (x) is called a local maximum or
local minimum if nearby x the maximum or minimum value is reached at this
point x. For example, the points A and D in the figure are local minima, point B
is a local maximum, and points C and E are neither.
• To find local maxima and minima we usually need to find critical points first,
because local minima or maxima must be critical points when they are inside
the domain (not the endpoints). This is why the critical points are so important.
We will discuss what happens at the endpoints in the examples below and next
section.
When we find a critical point, how do we decide if it is a local minimum, or
local maximum, or neither? We can use the first derivative test (FDT) or second
derivative test (SDT). We have already seen them implicitly when we looked at
graphs of functions, but now we will spell them out explicitly. We will always
assume that a function is at least continuous at the critical point x = a.
• (Local max) Critical point x = a is a local maximum if (FDT) f ′ (x) > 0 to the
left of a and f ′ (x) < 0 to the right of a, or (SDT) f ′ (a) = 0 and f ′′ (a) < 0.
• (Local min) Critical point x = a is a local minimum if (FDT) f ′ (x) < 0 to the
left of a and f ′ (x) > 0 to the right of a, or (SDT) f ′ (a) = 0 and f ′′ (a) > 0.
In the case of SDT, the second derivative f ′′ (a) is defined only if the first derivative
is defined, so the critical point must be of the first type, f ′ (a) = 0. If the derivative is
undefined, we cannot use the SDT. The meaning of FDT is simple: if the function
is increasing up to and decreasing after x = a then the point is a local maximum
(like point B in the above figure). In the case of SDT, if f ′′ (a) < 0 then f ′ (x) is
decreasing at x = a and, because f ′ (a) = 0, the derivative changes from positive to
negative, so the point again must be a local maximum. If f ′′ (a) = 0 then the SDT
is inconclusive and we need to use the FDT. Similar reasoning works for a local
minimum.
15 https://youtu.be/URm5AQwOkLQ
Example 1. Find all critical points and determine which ones are local minima or
maxima for (a) \( f(x) = 3x^5 - 5x^3 \), (b) \( f(x) = e^{-x^2} \).
Solution: (a) Since \( f'(x) = 15x^4 - 15x^2 \), it is defined everywhere. To find critical
points we need to solve \( 15x^4 - 15x^2 = 0 \), or \( x^2(x^2 - 1) = x^2(x - 1)(x + 1) = 0 \). The
solutions are x = −1, 0, 1, so the function has three critical points where the tangent
line is horizontal. To decide which ones are local minima or maxima, let us start
with the first derivative test. By checking the sign of the derivative on each interval,
x         (−∞, −1)   (−1, 0)   (0, 1)   (1, ∞)
f ′ (x)       +          −         −        +
we see that −1 is a local maximum, +1 is a local minimum. The point x = 0 is
neither a local maximum nor a local minimum, because the function is decreasing
before and after this point.
We can also try a second derivative test. Since \( f''(x) = 15(4x^3 - 2x) = 30(2x^3 - x) \),
we can see that f ′′ (−1) < 0, f ′′ (+1) > 0 and f ′′ (0) = 0. So the second derivative
test tells us that −1 is a local maximum, +1 is a local minimum, and it is
inconclusive (does not tell us anything) at x = 0. So at x = 0 we have to use the
first derivative test to see what is happening.
(b) Since \( f'(x) = e^{-x^2}(-2x) = -2xe^{-x^2} \), it is defined everywhere. To find critical
points we need to solve \( -2xe^{-x^2} = 0 \). There is only one solution x = 0, so the
function has one critical point. The derivative is positive on (−∞, 0) and negative
on (0, ∞), so x = 0 is a local maximum. We can also use the second derivative test:
\( f''(x) = -2e^{-x^2} + 4x^2e^{-x^2} \) and f ′′ (0) = −2 < 0, so again we see that x = 0 is a
local maximum.
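The first derivative test in part (a) can be mimicked on a computer by sampling the sign of the derivative slightly to the left and right of each critical point. This is only an illustrative sketch: the function name classify and the offset h = 0.001 are our own assumptions, and the approach works only when no other critical point lies within h of the one being tested.

```python
def fprime(x):
    # derivative of f(x) = 3x^5 - 5x^3 from part (a)
    return 15 * x**4 - 15 * x**2

def classify(df, a, h=1e-3):
    """First derivative test: compare the signs of df just left and right of a."""
    left, right = df(a - h), df(a + h)
    if left > 0 and right < 0:
        return "local max"
    if left < 0 and right > 0:
        return "local min"
    return "neither"

results = {a: classify(fprime, a) for a in (-1, 0, 1)}
print(results)
```

The classification agrees with the sign table: a local maximum at −1, neither at 0, and a local minimum at 1.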
Exercise 1. Find all critical points and determine which ones are local minima or
maxima for (a) f (x) = x − 3 ln(x), (b) f (x) = x4 − 18x2 + 1.
Example 2. If a > 0 and b > 0 are positive constants, find all critical points of
\( f(x) = ax + b\sqrt{x} \) and determine which ones are local minima or maxima.
Solution: Since \( f'(x) = a + \frac{b}{2\sqrt{x}} \) and a and b are positive, the derivative is
always positive and cannot equal 0. However, it is undefined at x = 0 where we divide
by zero, so x = 0 is the only critical point. To determine whether it is a local
maximum or minimum, here we cannot use the first or second derivative test, because
x = 0 is not inside the domain. The domain of the function \( ax + b\sqrt{x} \) is x ≥ 0, so
x = 0 is the left endpoint. Because the function is increasing to the right of it,
x = 0 is a local minimum. As we will discuss in the next section, endpoints are often
local or global minima or maxima even if they are not critical points, so they
require special attention.
Exercise 2. If a > 0 and b > 0 are positive constants, find all critical points of
\( f(t) = ae^t + be^{-t} \) and determine which ones are local minima or maxima.
In the next four problems we will refer to the following figures.
Example 3. In the figure above on the left, we see a graph of the derivative f ′ (x) of
some continuous function f (x). List all critical points of f (x) and determine which
ones are local minima or maxima. Also identify all inflection points of f (x).
Solution: Critical points are where the derivative is zero, x = −6, −1, 4 and 9, and
where it is undefined, x = −2. By the first derivative test, local maxima are where
the derivative changes from positive to negative, which happens at x = −2 and 9.
Local minima are where the derivative changes from negative to positive, which
happens at x = −6 and −1. Critical point x = 4 is neither a local minimum nor a
maximum, because the derivative is positive on both sides, so the function f (x)
increases before and after x = 4. Notice that the local maximum x = −2 is a corner
(kink) of f (x) because the slope f ′ (x) jumps suddenly from +2 to −2. Finally,
inflection points of f (x) are where the derivative f ′ (x) changes direction from in-
creasing to decreasing or vice versa, so x = −4.4, −2, 0.6, 4, 7.6 (some values are
approximate because it is hard to see exactly from the figure).
Exercise 3. In the figure above on the right, we see a graph of the derivative f ′ (x) of
some continuous function f (x). List all critical points of f (x) and determine which
ones are local minima or maxima. Also identify all inflection points of f (x).
Example 4. In the figure above on the left, we see a graph of the derivative f ′ (x)
of some continuous function f (x). On the interval [0, 8], where does the function
f (x) grow most rapidly and decay most rapidly?
Solution: That the function f (x) grows most rapidly means that its derivative is
as large as possible on this interval [0, 8], which happens at about x = 7.6. The
function f (x) decays most rapidly means that its derivative is as small as possible
on this interval [0, 8], which happens at x = 4. Notice that these points are inflection
points of the original function, because f ′ (x) changes direction here.
Exercise 4. In the figure above on the right, we see a graph of the derivative f ′ (x)
of some continuous function f (x). On the interval [−2, 8], where does the function
f (x) grow most rapidly and decay most rapidly?
Example 5. Suppose that f (x) has a continuous derivative and we know its values
in the following table:
x 0 1 2 3 4 5 6 7
f ′ (x) 1 −0.5 −0.25 0.5 1 1.5 0.5 −1
Estimate the coordinates of the critical points of f (x) on the interval [0, 7] and
determine which ones are local minima or maxima.
Solution: We see that the derivative changes sign (so crosses 0) somewhere in
between 0 and 1, 2 and 3, and 6 and 7. If we connect two neighbouring points
in the table by a line, between 0 and 1 the slope is \( \frac{-0.5-1}{1-0} = -1.5 \), so the line
is y = 1 − 1.5(x − 0). We want to know where it crosses zero, so we solve the
equation 1 − 1.5(x − 0) = 0 and get \( x = \frac{1}{1.5} = \frac{2}{3} \). This point is a local
maximum, because f ′ (x) changes from positive to negative, so the function changes
from increasing to decreasing. Between 2 and 3 the slope is 0.75, so the line is
y = −0.25 + 0.75(x − 2). Solving −0.25 + 0.75(x − 2) = 0 we get \( x = 2\frac{1}{3} \). This
point is a local minimum, because f ′ (x) changes from negative to positive. Finally,
between 6 and 7 the slope is −1.5, so the line is y = 0.5 − 1.5(x − 6). Solving
0.5 − 1.5(x − 6) = 0 we get \( x = 6\frac{1}{3} \). This point is a local maximum, because f ′ (x)
changes from positive to negative. Of course, all critical points are only estimates,
because we do not know the function exactly.
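The interpolation procedure in this solution is mechanical enough to sketch in a few lines of code. The names xs, dfs and estimate_critical_points below are hypothetical, and the sketch assumes, as in this table, that no tabulated value of the derivative is exactly zero.

```python
xs = [0, 1, 2, 3, 4, 5, 6, 7]
dfs = [1, -0.5, -0.25, 0.5, 1, 1.5, 0.5, -1]  # tabulated values of f'(x)

def estimate_critical_points(xs, dfs):
    """Find sign changes of f' and locate the zero on each connecting line."""
    points = []
    for x0, x1, d0, d1 in zip(xs, xs[1:], dfs, dfs[1:]):
        if d0 * d1 < 0:  # the derivative changes sign between x0 and x1
            slope = (d1 - d0) / (x1 - x0)
            root = x0 - d0 / slope  # zero of the line y = d0 + slope * (x - x0)
            kind = "local max" if d0 > 0 else "local min"
            points.append((root, kind))
    return points

for root, kind in estimate_critical_points(xs, dfs):
    print(round(root, 4), kind)
```

The three estimates agree with the values 2/3, 2 1/3 and 6 1/3 found by hand.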
Exercise 5. Suppose that f (x) has a continuous derivative and we know its values
in the following table:
x         0    0.5    1    1.5    2    2.5    3    3.5
f ′ (x)   1    0.5   −1    −2    −3    −1     2     1
Estimate the coordinates of the critical points of f (x) on the interval [0, 3.5] and
determine which ones are local minima or maxima.
Example 6. Given the graphs of functions f (x) (solid green curve) and g(x)
(dashed blue curve) below, find the critical points of f (g(x)).
Solution: First of all, using the chain rule,

( f (g(x)))′ = f ′ (g(x))g′ (x). This derivative is equal
to zero if either g′ (x) = 0 or f ′ (g(x)) = 0. From the
graph of y = g(x) we see that g′ (x) = 0 at x = 0,
where its slope is zero. From the graph of y = f (x)
we see that f ′ (x) = 0 at x = −2 and x = 2. This
means that f ′ (g(x)) = 0 when g(x) = −2 or g(x) =
2. We see that g(x) is never −2, but g(x) = 2 at
x = −3 and x = 3. So ( f (g(x)))′ = 0 at x = −3, 0, 3.
Bonus. If we were asked to determine local maxima and minima, we could use
the second derivative test. The second derivative is equal to
\[ (f(g(x)))'' = (f'(g(x))g'(x))' = f''(g(x))(g'(x))^2 + f'(g(x))g''(x). \]
Plugging in x = 0, \( f''(g(0))(g'(0))^2 + f'(g(0))g''(0) = f'(0)g''(0) < 0 \), because
f ′ (0) < 0 and g′′ (0) > 0. So x = 0 is a local maximum. Plugging in x = −3,
\[ f''(g(-3))(g'(-3))^2 + f'(g(-3))g''(-3) = f''(2)(g'(-3))^2 + f'(2)g''(-3) = f''(2)(g'(-3))^2 > 0, \]
because f ′′ (2) > 0 and \( (g'(-3))^2 > 0 \). So x = −3 is a local minimum. Similarly,
we can check that x = 3 is also a local minimum.
Exercise 6. Given the graphs of functions f (x) and g(x) in the above example, find
the critical points of g( f (x)).
Example 7. After a dog jumps into a pool, a beach ball starts floating up and down
on the waves and its height is given by
\[ h(t) = e^{-t}(\cos(t) + \sin(t)). \]
Find the critical points for t ≥ 0 and explain what happens at those points.
Solution: Using the product rule, the derivative h′ (t) is
\[ -e^{-t}(\cos(t) + \sin(t)) + e^{-t}(-\sin(t) + \cos(t)) = -2e^{-t}\sin(t). \]
It is equal to zero when sin(t) = 0, so when t = 0, π, 2π, 3π, . . . . At these times
the ball will be at the top or bottom of the wave. For example, at t = π, sin(t)
changes sign from positive to negative, so \( -2e^{-t}\sin(t) \) changes sign from negative
to positive, meaning that t = π is a local minimum. We can check other points
similarly.
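A quick way to double-check a derivative computed by hand is to compare it with a finite difference quotient. The sketch below is our own illustration (the function names, the sample times and the step size are assumptions) and compares the formula −2e^(−t) sin(t) with a central difference of h(t).

```python
import math

def h(t):
    return math.exp(-t) * (math.cos(t) + math.sin(t))

def h_prime_formula(t):
    # derivative obtained with the product rule in the solution
    return -2 * math.exp(-t) * math.sin(t)

def central_difference(f, t, step=1e-6):
    return (f(t + step) - f(t - step)) / (2 * step)

sample_times = [0.5, 1.0, 2.0, math.pi, 4.0]
max_gap = max(abs(central_difference(h, t) - h_prime_formula(t)) for t in sample_times)
print(max_gap)
# sign of h' just before and just after t = pi
print(h_prime_formula(math.pi - 0.1), h_prime_formula(math.pi + 0.1))
```

The gap should be tiny, and the derivative should be negative just before t = π and positive just after it, which is the sign change of a local minimum.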
Exercise 7. Suppose that constants a and b are positive and not equal to each other,
a ̸= b. Show that \( y = e^{-ax} - e^{-bx} \) has a unique critical point. Is this critical point
positive or negative?
Answer to Exercise 1. (a) x = 3 is the only critical point, and it is a local minimum.
(b) The critical points are −3, 0, 3; −3 and 3 are local minima, and 0 is a local
maximum.
Answer to Exercise 2. \( f'(t) = ae^t - be^{-t} = 0 \) when \( e^{2t} = \frac{b}{a} \), or \( t = \frac{1}{2}\ln\frac{b}{a} \). In this
case, it is easier to use the second derivative test, because \( f''(t) = ae^t + be^{-t} > 0 \),
so the critical point \( t = \frac{1}{2}\ln\frac{b}{a} \) is a local minimum. We could also use the first
derivative test. When t → −∞, the term \( ae^t \) gets small and the term \( be^{-t} \) gets large,
so \( f'(t) = ae^t - be^{-t} \) gets large and negative. When t → +∞, the term \( ae^t \) gets
large and the term \( be^{-t} \) gets small, so \( f'(t) = ae^t - be^{-t} \) gets large and positive.
The derivative changes from negative to positive, so the critical point must be a
local minimum.
Answer to Exercise 3. Critical points are x = −6, −3, −2, 4, 8. Local maxima are
x = −6, −2, local minima are x = −3, 4. Point x = 8 is neither. Inflection points
are x = −4.5, −3, 0.5, 5.5, 8.
Answer to Exercise 4. The function f (x) grows most rapidly means that its deriva-
tive is as large as possible on this interval [−2, 8], which happens at about x = 5.5.
The function f (x) decays most rapidly means that its derivative is as small as pos-
sible on this interval [−2, 8], which happens at x = 0.5.
Answer to Exercise 5. Between 0.5 and 1 the slope is \( \frac{-1-0.5}{1-0.5} = -3 \), so the line is
y = 0.5 − 3(x − 0.5). Solving 0.5 − 3(x − 0.5) = 0 we get \( x = \frac{2}{3} \). This point is a local
maximum, because f ′ (x) changes from positive to negative. Between 2.5 and 3 the
slope is \( \frac{2+1}{3-2.5} = 6 \), so the line is y = −1 + 6(x − 2.5). Solving −1 + 6(x − 2.5) = 0
we get \( x = 2\frac{2}{3} \). This point is a local minimum, because f ′ (x) changes from negative
to positive.
Answer to Exercise 6. Using the chain rule, (g( f (x)))′ = g′ ( f (x)) f ′ (x). This
derivative is equal to zero if either f ′ (x) = 0 or g′ ( f (x)) = 0. From the graph
of y = f (x) we see that f ′ (x) = 0 at x = −2 and 2, where its slope is zero.
From the graph of y = g(x) we see that g′ (x) = 0 at x = 0. This means that
g′ ( f (x)) = 0 when f (x) = 0, which happens at x = −3.5, 0, 3.5. So (g( f (x)))′ = 0
at x = −3.5, −2, 0, 2, 3.5.
Answer to Exercise 7. \( y'(x) = -ae^{-ax} + be^{-bx} = 0 \) when \( e^{(a-b)x} = \frac{a}{b} \), or
\( x = \frac{1}{a-b}\ln\frac{a}{b} \). If a < b then a − b is negative, and \( \ln\frac{a}{b} \) is also negative because
\( \frac{a}{b} < 1 \), so the critical point is positive. If a > b then a − b is positive, and \( \ln\frac{a}{b} \)
is also positive because \( \frac{a}{b} > 1 \), so the critical point is positive. No matter what a
and b are (as long as they are positive and not equal), the critical point is positive.
2.7 Optimization problems

In this section we will study optimization problems where the goal is to find the
maximum or minimum value of some function y = f (x) on some given domain. In
many problems we will have to translate a word problem into a formula f (x) first.
The domain of f (x) will depend on the problem, but the most common case will
be when the domain is a closed interval [a, b], in which case we need to
• find all critical points on the interval [a, b],
• compare the values of y = f (x) at critical points and endpoints a and b.
The reason why we need to check the endpoints is because they can be local or
global maximum or minimum even if they are not critical points.
Example 1. Find global minima and maxima of \( f(x) = \frac{1}{2} + \frac{(x-1)^4}{8} - (x-1)^2 \) on
the interval [−2, 2].
Solution: First we find critical points:
\[ f'(x) = \frac{(x-1)^3}{2} - 2(x-1) = 0 \]
when x − 1 = 0, i.e. x = 1, or \( \frac{(x-1)^2}{2} - 2 = 0 \). We can rewrite this as
\( (x-1)^2 = 4 \), or x − 1 = ±2, so x = −1 and x = 3. The point x = 3 is outside of the
interval [−2, 2], so x = −1 and x = 1 are critical points inside (−2, 2). We could check
whether they are local minima or maxima, but this is not necessary because we are
looking for global min and max. We can simply plug in these critical points and the
endpoints into f (x) and compare the values:
f (−1) = −1.5, f (1) = 0.5, f (−2) = 1.625, f (2) = −0.375.
This means that x = −1 is the global minimum and x = −2 is the global maximum on
the interval [−2, 2]. This matches the above figure. Notice that the endpoint x = −2
is the global maximum on [−2, 2], even though it is not a critical point.
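The closed-interval recipe (evaluate the function at the critical points and the endpoints, then compare) is easy to express in code. The following is a small sketch under the assumption that the candidate points have already been found by hand, as in the solution; the variable names are our own.

```python
def f(x):
    return 0.5 + (x - 1) ** 4 / 8 - (x - 1) ** 2

# endpoints of [-2, 2] together with the critical points inside the interval
candidates = [-2, -1, 1, 2]
values = {x: f(x) for x in candidates}

x_min = min(values, key=values.get)  # candidate with the smallest value
x_max = max(values, key=values.get)  # candidate with the largest value
print(values)
print("global min at", x_min, "global max at", x_max)
```

Only the candidate points are compared, so the sketch is correct only if the list of critical points is complete.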
Exercise 1. Find global minima and maxima of \( g(x) = -\frac{1}{2} + 2(x-1)e^{-\frac{(x-1)^2}{2}} \) on
[−2, 2].
Example 2. A restaurant sells pizza of diameter d between 10′′ and 20′′ for the price
of \( P(d) = \frac{\pi}{40}\bigl(d^2 - 100\ln(d/10)\bigr) \) dollars. Which diameter pizza is the best deal in
terms of price per square inch?
Solution: Area of pizza is \( A(d) = \pi r^2 = \frac{1}{4}\pi d^2 \), so dividing the price P(d) by
area A(d), we get that the price per square inch is
\[ \frac{P(d)}{A(d)} = \frac{d^2 - 100\ln(d/10)}{10d^2} = \frac{1}{10} - 10\,\frac{\ln(d/10)}{d^2}. \]
Since ln(d/10) = ln(d) − ln(10), its derivative is \( (\ln(d/10))' = \frac{1}{d} \), so using the
quotient rule
\[ \left(\frac{P(d)}{A(d)}\right)' = 0 - 10\cdot\frac{\frac{1}{d}\cdot d^2 - \ln(d/10)\cdot 2d}{d^4} = -10\cdot\frac{1 - 2\ln(d/10)}{d^3}. \]
Derivative is undefined at d = 0, because we divide by 0, but this is outside the
domain [10, 20]. Derivative is equal to zero when
\[ 1 - 2\ln\frac{d}{10} = 0 \;\Rightarrow\; \ln\frac{d}{10} = \frac{1}{2} \;\Rightarrow\; \frac{d}{10} = e^{1/2} = \sqrt{e} \;\Rightarrow\; d = 10\sqrt{e} = 16.487, \]
which is inside the domain. Plugging into \( \frac{P(d)}{A(d)} \), we get that the price per square
inch is 0.0816 when d = 16.487, 0.1 when d = 10, and 0.0826 when d = 20. So
the best deal is the 16.487′′ pizza.
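The three prices per square inch can be reproduced with a short computation. This sketch assumes the price formula P(d) = (π/40)(d^2 − 100 ln(d/10)), which is the reading consistent with the price per square inch derived in the solution; the function name is our own.

```python
import math

def price_per_sq_inch(d):
    price = math.pi / 40 * (d**2 - 100 * math.log(d / 10))  # P(d) in dollars
    area = math.pi * d**2 / 4                                # A(d) in square inches
    return price / area

d_star = 10 * math.sqrt(math.e)  # critical point d = 10 * sqrt(e)
for d in (10, d_star, 20):
    print(round(d, 3), round(price_per_sq_inch(d), 4))
```

The critical point gives the smallest of the three values, so it is the best deal on [10, 20].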
Exercise 2. Suppose that the distribution of grades in Calculus I follows a special
case of the so called Beta distribution with the shape
\[ y = \frac{1}{168}x^5(1-x)^2 \]
on the interval [0, 1], shown in the figure. The grade is measured as the proportion
out of 100 and, e.g. x = 0.81 represents grade 81. What grade is the most likely, or
most common one?
In the next few problems we will have two variables, but they will be related
through some given constraint and, as a result, we will be able to eliminate one
variable by expressing it in terms of the other and then optimize as usual. The next
two problems will refer to the following figures.
Example 3. In the figure above on the left, a region consisting of an x × y rectangle

and two semicircles is enclosed by 100 meters of fence. What shape will maximize
the area of this region?
Solution: First of all, recall that the perimeter of a circle of radius r is 2πr, and its
area is \( \pi r^2 \). The radius of each semicircle in the figure is \( \frac{y}{2} \), so the perimeter P and
the area A of the region are
\[ P = 2x + 2\pi\frac{y}{2} = 2x + \pi y, \qquad A = xy + \pi\left(\frac{y}{2}\right)^2 = xy + \frac{\pi y^2}{4}. \]
The perimeter is 100 meters, so 2x + πy = 100. This means that one of the variables
is completely determined by the other, for example, \( x = 50 - \frac{\pi y}{2} \), and we can write
the area in terms of y only,
\[ A = xy + \frac{\pi y^2}{4} = \left(50 - \frac{\pi y}{2}\right)y + \frac{\pi y^2}{4} = 50y - \frac{\pi y^2}{4}. \]
Now we need to maximize this, keeping in mind that 2x + πy = 100 means that
πy must be between 0 and 100, so \( y \in [0, \frac{100}{\pi}] \). Since \( A'(y) = 50 - \frac{\pi y}{2} = 0 \) when
\( y = \frac{100}{\pi} \), the only critical point is the endpoint of our domain. This means that we
need to compare area at the endpoints, A(0) = 0, and
\[ A\left(\frac{100}{\pi}\right) = 50\cdot\frac{100}{\pi} - \frac{\pi}{4}\left(\frac{100}{\pi}\right)^2 = 795.7747\ldots. \]
The shape that maximizes the area corresponds to \( y = \frac{100}{\pi} \) and \( x = 50 - \frac{\pi}{2}\cdot\frac{100}{\pi} = 0 \),
which is a circle of diameter \( y = \frac{100}{\pi} \). Actually, among all shapes with the same
perimeter, a circle will always have the largest area, because it is always
advantageous to “stretch the perimeter in all directions”. Here, a circle was one
allowed shape and we saw that, indeed, it maximized the area.
Exercise 3. In the figure above on the right, a region consisting of an x × y rectangle
and one semicircle is enclosed by 100 meters of fence. What shape will maximize
the area of this region? Notice that a circle is not an option here.
Example 4. Inside a hemisphere of radius 1 we
want to place a cylinder vertically, as in the figure,
with the largest possible volume. What is the height
h and the radius of the base r of this cylinder?
Solution: The top of the cylinder should touch the
sphere, so the hypotenuse of the right triangle in the
figure is the radius of the sphere, which is 1. This
means that \( r^2 + h^2 = 1 \). On the other hand, the volume of the cylinder is equal to
\( V = \pi r^2 h \), which is the area of the base \( \pi r^2 \) times the height h. From \( r^2 + h^2 = 1 \)
we get that \( r^2 = 1 - h^2 \) and we can rewrite the volume as
\[ V = \pi r^2 h = \pi(1 - h^2)h = \pi(h - h^3). \]
The domain is [0, 1] because the height h cannot be bigger than 1, so we want
to maximize \( \pi(h - h^3) \) on the interval [0, 1]. Since \( V'(h) = \pi(1 - 3h^2) = 0 \) when
\( h^2 = \frac{1}{3} \), or \( h = \frac{1}{\sqrt{3}} \), this is the only critical point in the domain. We can see that the
volume V is zero at the endpoints h = 0 or h = 1, and V = 1.2091 when \( h = \frac{1}{\sqrt{3}} \),
so the cylinder with the largest volume has height \( h = \frac{1}{\sqrt{3}} \) and radius \( r = \sqrt{\frac{2}{3}} \).
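As a sanity check, we can also search for the best height by brute force on a fine grid of values of h in [0, 1]. This is a rough sketch of our own (the grid size n is an arbitrary choice), not a replacement for the calculus argument.

```python
import math

def volume(h):
    # volume of the cylinder after substituting r^2 = 1 - h^2
    return math.pi * (h - h**3)

n = 100_000
best_h = max((i / n for i in range(n + 1)), key=volume)  # grid point with largest volume
best_r = math.sqrt(1 - best_h**2)
print(best_h, best_r, volume(best_h))
```

The grid maximum should land next to h = 1/√3 ≈ 0.577 with r = √(2/3) ≈ 0.816.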
Exercise 4. We want to make one round enclosure and one square enclosure using
ℓ meters of fence total. What part of the fence should we spend on each enclosure
to maximize the total area? What if we wanted to minimize the total area?
In the next example, we will encounter a situation when the domain is not a
finite closed interval [a, b], and so we have to argue a bit more carefully instead of
just comparing critical points and the endpoints.
Example 5. Suppose that in a family of similar drugs, the price of a drug with a
half-life h hours in a patient’s body is \( h^2 \) dollars per mg. If we have $100 to spend,
what drug should we buy if our goal is to maximize the amount of drug remaining
in a patient’s body 2 hours after administering it?
Solution: $100 is the price of \( \frac{100}{h^2} \) mg of drug with the half-life h hours. Half-life h
means that the amount of drug in a patient’s body is decaying exponentially and
the amount remaining after t hours is proportional to \( 2^{-t/h} \). In our case, the amount
left will be \( \frac{100}{h^2}2^{-t/h} \) after t hours, and after 2 hours it will be
\[ a = \frac{100}{h^2}\,2^{-\frac{2}{h}}. \]
We want to maximize this over h > 0. We can of course take the derivative to find
critical points, but there is one useful trick that can simplify the calculations. We
see that we divide by h everywhere, so if we rename \( \frac{1}{h} \) as x then \( a = 100x^2 2^{-2x} \).
Let us maximize this function over x > 0 and then find optimal \( h = \frac{1}{x} \). First,
\[ a'(x) = 100(2x)2^{-2x} + 100x^2\ln(2)2^{-2x}(-2) = 2^{-2x}\,200x\,\bigl(1 - \ln(2)x\bigr) = 0 \]
when x = 0 or \( x = \frac{1}{\ln(2)} = 1.44 \). If we plug these points into \( a(x) = 100x^2 2^{-2x} \),
we get a(0) = 0 and a(1.44) = 28.16. Does this mean that the global maximum is
at \( x = \frac{1}{\ln(2)} \)? Since our domain here is all x ∈ (0, ∞), which is not a finite closed
interval, what do we do about the endpoints? We already checked what happens to
a(x) at x = 0, a(0) = 0, but what about x = ∞?
Of course, we can graph the function and see that \( x = \frac{1}{\ln(2)} \) is the global maximum.
However, without a graphing calculator, there are several ways we can proceed. One
way to decide if x = 1.44 is a global maximum is to look at the derivative a′ (x). We
see that a′ (x) < 0 when \( x > \frac{1}{\ln(2)} \) because 1 − ln(2)x < 0, so the function a(x) is
decreasing after this critical point. This shows that indeed \( x = \frac{1}{\ln(2)} \) is a global
maximum. Another way would be to notice that
\[ a(x) = 100x^2 2^{-2x} = \frac{100x^2}{2^{2x}} \to 0 \]
when x → ∞ because exponential growth \( 2^{2x} \) dominates the power function \( x^2 \) at
infinity. This also shows that \( x = \frac{1}{\ln(2)} \) is a global maximum. Finally, we recall
that the original problem was in terms of the half-life h, and the optimal choice is
\( h = \frac{1}{x} = \ln(2) \).
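When the domain is not a closed interval, a numerical experiment can support (but not prove) the argument about the behaviour at infinity. The sketch below is our own illustration: the function name amount, the sampling grid on (0, 20] and the test point x = 50 are arbitrary choices.

```python
import math

def amount(x):
    """Drug left after 2 hours, a(x) = 100 x^2 2^(-2x), where x = 1/h."""
    return 100 * x**2 * 2 ** (-2 * x)

x_star = 1 / math.log(2)                    # critical point from the solution
grid = [k / 100 for k in range(1, 2001)]    # sample points in (0, 20]
largest_on_grid = max(amount(x) for x in grid)

print(x_star, amount(x_star))   # critical point and the amount there
print(largest_on_grid)          # no sampled point should do better
print(amount(50.0))             # a(x) is already negligible for large x
print(1 / x_star)               # optimal half-life h = ln(2)
```

No sampled value exceeds the value at the critical point, and a(x) is practically zero for large x, in line with the two arguments given in the solution.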
Exercise 5. A swimmer is 20 meters

from the closest point A on the beach
and her beach umbrella is at the point
B, which is 60 meters from point A. A
swimmer swims at 1 m/s and runs on
the beach at 4 m/s. Toward which point
C between A and B should the swimmer
aim to minimize the total time to reach
her umbrella?
Example 6. Suppose we have a 10′′ × 10′′ cardboard, and we cut off the four corners
of size x × x and then fold the sides to make a box. What is the maximum volume of
the box V = V (x)?
Solution: The dimensions of the box will be (10 − 2x) × (10 − 2x) × x, and the volume
is \( V(x) = (10 - 2x)^2 x \). The domain is 0 ≤ x ≤ 5. Let us find critical points:
\[ V'(x) = 2(10 - 2x)(-2)x + (10 - 2x)^2 = (10 - 2x)(-4x + (10 - 2x)) = (10 - 2x)(10 - 6x) = 0 \]
when x = 5, \( x = \frac{10}{6} \). Since V (0) = V (5) = 0, the maximum volume is \( V(\frac{10}{6}) \approx 74 \).
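Since V (x) increases up to the critical point and decreases after it on [0, 5], the maximum can also be located by a ternary search, which repeatedly discards a third of the interval. This is a swapped-in numerical technique for illustration only, not the method of the solution; the function names and the number of iterations are our own assumptions.

```python
def volume(x):
    return (10 - 2 * x) ** 2 * x

def ternary_search_max(f, lo, hi, iterations=200):
    """Locate the maximum of a function that first increases and then decreases on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the maximum cannot be to the left of m1
        else:
            hi = m2  # the maximum cannot be to the right of m2
    return (lo + hi) / 2

x_best = ternary_search_max(volume, 0.0, 5.0)
print(x_best, volume(x_best))
```

The search should return a value close to x = 10/6 with volume close to 2000/27 ≈ 74.07.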
Exercise 6. The Statue of Liberty

is 46 meters high and it stands on
a pedestal which is also 46 meters
high. At what distance d from the
base is the angle of view of the
statue (angle θ in the figure) as large
as possible?
If the domain is an infinite or open interval, for example [0, ∞) or (0, 1), then a
function might not have a global maximum or minimum, as we will see in the next
two problems.
Example 7. Give an example of a continuous function that (a) does not have a
global minimum on [0, ∞), (b) does not have a global maximum on (−2, 2).
Solution: (a) For example, an exponential decay function \( y = e^{-x} \) does not have a
global minimum on [0, ∞), because it is decreasing and approaching 0 as x → ∞,
but it never actually reaches 0, so there is no point x where \( e^{-x} \) takes the smallest
value.
(b) For example, \( y = x^2 \) does not have a global maximum on (−2, 2), because it
is approaching 4 as x approaches −2 or 2, but it never actually reaches 4 because
−2 and 2 are not in the domain, so there is no point x on (−2, 2) where \( x^2 \) takes
the largest value.
Exercise 7. Sketch a graph of a differentiable function y = f (x) on the open interval
(−4, 4) such that
• f (x) has at least one local min on (−4, 4)
• f (x) has at least one local max on (−4, 4)
• f (x) has a global max on (−4, 4)
• f (x) does not have a global min on (−4, 4)
• f (x) has a critical point at x = 3 which is not a local max or min
• f (x) has an inflection point at x = −2
The next two problems will refer to the following two figures.
Example 8. In the figure above on the left we are

given the graph of the second derivative f ′′ (x) of
some function y = f (x) on the interval [−4, 4]. If
f ′ (−1) = 0, where is the global maximum and min-
imum of f (x) on this interval?
Solution: First, we will use that f ′′ (x) is the deriva-
tive of f ′ (x). Because f ′′ (x) in the figure is nega-
tive for x < −1, f ′ (x) is decreasing for x < −1, and
because f ′′ (x) is positive for x > −1, f ′ (x) is in-
creasing for x > −1. This means that x = −1 is the global minimum of f ′ (x). We
are given that f ′ (−1) = 0, so f ′ (x) is never negative, as in the figure on the right.
Once we know that f ′ (x) ≥ 0, this means that f (x) is increasing, so its global max-
imum on the interval [−4, 4] is at x = 4, and its global minimum is at x = −4, as in
the figure.
Exercise 8. In the figure above on the right we are given the graph of the second
derivative f ′′ (x) of some function y = f (x) on the interval [−4, 4]. If f ′ (1) = 0,
where is the global maximum and minimum of f (x) on this interval?
Answer to Exercise 1. First we find critical points:
\[ g'(x) = 2e^{-\frac{(x-1)^2}{2}} - 2(x-1)^2 e^{-\frac{(x-1)^2}{2}} = 0 \]
when \( 2 - 2(x-1)^2 = 0 \), or \( (x-1)^2 = 1 \), or x − 1 = ±1, so x = 0 and x = 2. These
points are inside the interval so it remains to compare the values:
g(0) = −1.713, g(2) = 0.713, g(−2) = −0.566.
This means that x = 0 is the global minimum and x = 2 is the global maximum on the
interval [−2, 2]. This matches the above figure.
Answer to Exercise 2. Let us find critical points first:
\[ y'(x) = \frac{1}{168}5x^4(1-x)^2 - \frac{1}{168}2x^5(1-x) = \frac{x^4(1-x)}{168}\bigl(5(1-x) - 2x\bigr) = 0 \]
when x = 0, x = 1, or when 5(1 − x) − 2x = 0, i.e. \( x = \frac{5}{7} = 0.714 \). At the endpoints
y(0) = y(1) = 0, while y(0.714) = 2.5499, so the most likely grade is x = 0.714, or
about 71. By the way, if you were wondering, the constant \( \frac{1}{168} \) was chosen in such
a way that the area under the curve is equal to 1, representing all students.
Answer to Exercise 3. The perimeter P and the area A of the region are
\[ P = 2x + y + \pi\frac{y}{2} = 2x + \frac{2+\pi}{2}y, \qquad A = xy + \frac{1}{2}\pi\left(\frac{y}{2}\right)^2 = xy + \frac{\pi y^2}{8}. \]
The perimeter is 100 meters, so \( 2x + \frac{2+\pi}{2}y = 100 \), and \( x = 50 - \frac{2+\pi}{4}y \), so we can
write the area in terms of y as
\[ A = \left(50 - \frac{2+\pi}{4}y\right)y + \frac{\pi y^2}{8} = 50y - \frac{4+\pi}{8}y^2. \]
Since \( A'(y) = 50 - \frac{4+\pi}{4}y = 0 \) when \( y = \frac{200}{4+\pi} = 28.0049 \), this is the only critical
point. Since \( 2x + \frac{2+\pi}{2}y = 100 \), \( \frac{2+\pi}{2}y \) must be between 0 and 100, so
\( y \in [0, \frac{200}{2+\pi}] = [0, 38.8984] \), and the critical point 28.0049 is inside this domain. It
remains to compare the values: A = 0 at y = 0, A = 700.1239 at y = 28.0049, and
A = 594.1889 at y = 38.8984. Maximal area is A = 700.1239 when y = 28.0049 and
x = 14.0024.
Answer to Exercise 4. If we spend x meters on a round enclosure, we will have
ℓ − x meters left for a square enclosure. If the radius of the round enclosure is r
then 2πr = x, so \( r = \frac{x}{2\pi} \). Then its area is \( \pi r^2 = \frac{x^2}{4\pi} \). Similarly, if the side of the
square enclosure is s then 4s = ℓ − x and \( s = \frac{\ell-x}{4} \). Then its area is \( s^2 = \frac{(\ell-x)^2}{16} \). This
means that the total area will be
\[ A = \frac{x^2}{4\pi} + \frac{(\ell-x)^2}{16}. \]
The domain of this function is [0, ℓ] because we can spend anywhere between 0
and ℓ meters on the round enclosure. Let us find critical points:
\[ A'(x) = \frac{x}{2\pi} - \frac{\ell-x}{8} = 0 \;\Rightarrow\; \frac{\ell}{8} = \frac{x}{2\pi} + \frac{x}{8} = \frac{4+\pi}{8\pi}x \;\Rightarrow\; x = \frac{\pi}{4+\pi}\ell \approx 0.44\ell. \]
If we plug this into A, we will get \( 0.035\ell^2 \). At the endpoints, \( A(0) = 0.0625\ell^2 \) and
\( A(\ell) = 0.0795\ell^2 \). This means that the largest total area is when we spend all of the
fence ℓ on one round enclosure, and the smallest total area is when we spend ≈ 0.44ℓ
on the round enclosure and ≈ 0.56ℓ on the square enclosure. Notice that, as in
Example 3, the maximum area is achieved when we make one big circle enclosure,
which was a possible option in this problem.
Answer to Exercise 5. If the distance between A and C is x then the swimming
distance between the swimmer and C is \( \sqrt{20^2 + x^2} \) and the running distance between
C and B is 60 − x. Since the swimming speed is 1 m/s and running speed is 4 m/s,
the total time to reach her umbrella is
\[ T = \frac{\sqrt{20^2 + x^2}}{1} + \frac{60-x}{4} = \sqrt{400 + x^2} + \frac{60-x}{4}. \]
We want to minimize this for x between 0 and 60. Let us find critical points:
\[ T'(x) = \frac{2x}{2\sqrt{400 + x^2}} - \frac{1}{4} = \frac{x}{\sqrt{400 + x^2}} - \frac{1}{4} = 0 \]
when \( 4x = \sqrt{400 + x^2} \) and, squaring both sides,
\[ 16x^2 = 400 + x^2 \;\Longrightarrow\; x^2 = \frac{400}{15} \;\Longrightarrow\; x = \sqrt{\frac{400}{15}} = 5.16. \]
This critical value is in the domain [0, 60] and plugging it in, we get T = 34.36
seconds. We then check the endpoints, T (0) = 35 and T (60) = 63.24, and we
see that the optimal point C is at the distance 5.16 meters from A.
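The three times being compared are quick to recompute. In this sketch the function name total_time is our own, and math.hypot(20, x) is used for the swimming distance √(20^2 + x^2).

```python
import math

def total_time(x):
    swim = math.hypot(20, x) / 1   # swimming distance at 1 m/s
    run = (60 - x) / 4             # running distance at 4 m/s
    return swim + run

x_crit = math.sqrt(400 / 15)       # critical point from T'(x) = 0
for x in (0, x_crit, 60):
    print(round(x, 2), round(total_time(x), 2))
```

The critical point gives the smallest of the three times, so aiming about 5.16 meters from A is optimal.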
Answer to Exercise 6. Let α be the angle between the horizontal line and line of
sight to the bottom of the Statue of Liberty, and let β be the angle between the
horizontal line and line of sight to the top of the Statue of Liberty, as in the
figure. Then θ = β − α. From the right triangles, we see that
\[ \tan(\alpha) = \frac{46}{d} \quad\text{and}\quad \tan(\beta) = \frac{92}{d}, \]
so \( \alpha = \arctan(\frac{46}{d}) \), \( \beta = \arctan(\frac{92}{d}) \), and \( \theta = \arctan(\frac{92}{d}) - \arctan(\frac{46}{d}) \). We want to
maximize this angle θ over all distances d > 0. First, let us find the critical points.
Recall that \( (\arctan(x))' = \frac{1}{1+x^2} \). Then
\[ \theta'(d) = \frac{1}{1+(\frac{92}{d})^2}\left(-\frac{92}{d^2}\right) - \frac{1}{1+(\frac{46}{d})^2}\left(-\frac{46}{d^2}\right) = \frac{46}{d^2+46^2} - \frac{92}{d^2+92^2}. \]
If we set this equal to zero and solve for d, we get \( d^2 = 92 \cdot 46 \), so d = 65.05.
To check that this is the global maximum, we can notice that θ (d) approaches 0
when both d → 0 and d → ∞. This can be seen from the figure, or checked using that
arctan(0) = 0 and \( \arctan(x) \to \frac{\pi}{2} \) as x → ∞. The best viewing angle of the Statue
of Liberty is at the distance of ≈ 65 meters.
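A short numerical experiment can illustrate that the viewing angle really peaks near 65 meters. The sketch below is our own: the function name view_angle and the comparison distances (whole meters from 1 to 999) are arbitrary choices.

```python
import math

def view_angle(d):
    # theta = arctan(92/d) - arctan(46/d), in radians
    return math.atan(92 / d) - math.atan(46 / d)

d_star = math.sqrt(92 * 46)  # critical point from d^2 = 92 * 46
best_integer_d = max(range(1, 1000), key=view_angle)
print(d_star, math.degrees(view_angle(d_star)), best_integer_d)
```

Among whole-meter distances, the best one should be 65, right next to the critical point.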
Answer to Exercise 7. The graph is mostly

self-explanatory. The reason there is no global
minimum is that the smallest values of the
function are near x = 4 where f (x) is decreas-
ing but never reaches the smallest value be-
cause x = 4 is not in the domain.
Answer to Exercise 8. Because f ′′ (x) in the figure is positive for x < 1, f ′ (x) is
increasing for x < 1, and because f ′′ (x) is negative for x > 1, f ′ (x) is decreasing
for x > 1. This means that x = 1 is the global maximum of f ′ (x). We are given that
f ′ (1) = 0, so f ′ (x) is never positive. Once we know that f ′ (x) ≤ 0, this means that
f (x) is decreasing, so its global maximum on the interval [−4, 4] is at x = −4, and
its global minimum is at x = 4.
2.8 Parametric families of functions

The goal of this section is twofold. We will introduce several families of functions
that appear in applications and can be used, for example, for modelling various
growth and decay processes. On the other hand, understanding the shape of these
functions will give us an opportunity to use derivatives, in addition to reviewing
scaling of functions. As a motivation, let us keep in mind the following question,
which will be answered after we introduce these families of functions.
Motivating Question. Match the following families of functions
1. bell curve \( y = ce^{-\frac{(t-a)^2}{2b^2}} \);
2. logistic curve \( y = \frac{c}{1+e^{-b(t-a)}} \);
3. exponential with a limit curve \( y = c(1 - e^{-bt}) \);
4. surge function \( y = cte^{-bt} \);
5. quadratic polynomial \( y = -at^2 + bt + c \);
with the following behaviour of bacterial growth and decay,
(a) the number of bacteria in a Petri dish grows quickly from the beginning, but then
runs out of food and dies out quickly;
(b) the number of bacteria in a Petri dish grows faster and faster initially and then
stabilizes;
(c) the number of bacteria in a Petri dish grows quickly from the beginning until it
stabilizes;
(d) the number of bacteria in a Petri dish grows faster and faster, but then runs out
of food and dies out over time;
(e) the number of bacteria in a Petri dish grows quickly from the beginning, but then
runs out of food and dies out over time.
Bell curve. First, let us introduce the so-called bell curve given by
$$y = ce^{-\frac{(x-a)^2}{2b^2}}$$
where a ∈ R is any real number, and b > 0, c > 0 are any positive numbers. Various
features of these curves are summarized in the figure below, and we will check
some of them in the next two problems. This family of curves is most famously
used to describe (or model) the distribution of many quantities occurring in nature,
physical experiments, finance, etc.16 Here, we are simply interested in its shape
and basic properties. In the first example we will see how we can obtain all bell
curves by rescaling one of them.
16 https://en.wikipedia.org/wiki/Normal_distribution#Occurrence_and_applications
Example 1. Given a standard bell curve $y = e^{-x^2/2}$ with parameters a = 0, b = 1 and
c = 1, if we stretch its graph horizontally b times, stretch it vertically c times and
shift it to the right by a, what is the function corresponding to the resulting curve?
Solution: Recall that stretching a graph of f(x) horizontally b times corresponds to
$f(x/b)$, then stretching the result vertically c times corresponds to $c f(x/b)$ and, finally,
shifting to the right by a corresponds to $c f((x-a)/b)$. When $f(x) = e^{-x^2/2}$, we will get
$y = ce^{-(x-a)^2/(2b^2)}$, which is the general bell curve above.
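The rescaling in this example can also be checked numerically. The following Python sketch is our own illustration: the function names and the parameter values a = 1.5, b = 2, c = 3 are arbitrary assumptions. It compares the general bell curve with the rescaled standard one on a grid of points.

```python
import math

def standard_bell(x):
    """Standard bell curve with a = 0, b = 1, c = 1."""
    return math.exp(-x**2 / 2)

def general_bell(x, a, b, c):
    """General bell curve y = c * exp(-(x - a)^2 / (2 b^2))."""
    return c * math.exp(-(x - a)**2 / (2 * b**2))

def rescaled_bell(x, a, b, c):
    """Stretch horizontally b times, vertically c times, shift right by a."""
    return c * standard_bell((x - a) / b)

a, b, c = 1.5, 2.0, 3.0  # arbitrary illustrative values
points = [k / 10 for k in range(-100, 101)]
max_gap = max(abs(general_bell(x, a, b, c) - rescaled_bell(x, a, b, c))
              for x in points)
print(max_gap)  # expected to be zero up to rounding error
```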
Exercise 1. Show that $y = e^{-x^2/2}$ has the global maximum y = 1 at x = 0 and two
inflection points at x = −1, x = +1. Explain how this, together with the previous
example, implies the location of the maximum (y = c at x = a) and inflection points
(x = a ± b) for the general bell curve above.
Logistic curve. Next, let us consider the so-called logistic function given by
$$y = \frac{c}{1+e^{-b(x-a)}} = \frac{c}{1+\kappa e^{-bx}}$$
where parameter a is any real number, b > 0, c > 0, κ > 0 are any positive numbers,
and where parameters a and κ are interchangeable and related by $\kappa = e^{ba}$ or
$a = \frac{1}{b}\ln(\kappa)$. Logistic curves have many applications, for example, in modelling various
growth processes.17
The two formulas above are just slightly different representations of the same
function because
$$e^{-b(x-a)} = e^{-bx+ba} = e^{ba}e^{-bx} = \kappa e^{-bx}$$
if we set $\kappa = e^{ba}$. In other words, parameter a can be replaced by κ or vice versa.
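To see the interchangeability of a and κ in practice, here is a small Python sketch. The function names and the sample values a = 2, b = 0.7, c = 5 are our assumptions, chosen only for illustration. It converts a to κ and back and checks that the two formulas give the same values.

```python
import math

def logistic_shift(x, a, b, c):
    """Logistic curve written with the horizontal shift a."""
    return c / (1 + math.exp(-b * (x - a)))

def logistic_kappa(x, kappa, b, c):
    """The same curve written with the parameter kappa = exp(b * a)."""
    return c / (1 + kappa * math.exp(-b * x))

a, b, c = 2.0, 0.7, 5.0      # arbitrary illustrative values
kappa = math.exp(b * a)      # kappa = e^{ba}
a_back = math.log(kappa) / b # a = (1/b) ln(kappa)

gap = max(abs(logistic_shift(x, a, b, c) - logistic_kappa(x, kappa, b, c))
          for x in range(-20, 21))
print(kappa, a_back, gap)
```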

Various features of these curves are summarized in the figure. For example, $e^{-bx}$
goes to 0 as x → +∞, so the function approaches $\frac{c}{1+0} = c$ as in the figure, and $e^{-bx}$
goes to ∞ as x → −∞, so the function approaches $\frac{c}{1+\infty} = 0$. In the next example we
will see how we can obtain all logistic curves by rescaling one of them.
Example 2. Given a standard logistic curve $y = \frac{1}{1+e^{-x}}$ with parameters a = 0, b = 1
and c = 1, if we shrink its graph horizontally b times, stretch it vertically c times
and shift it to the right by a, what is the function corresponding to the resulting
curve?
Solution: Shrinking a graph of f(x) horizontally b times corresponds to f(bx), then
stretching the result vertically c times corresponds to c f(bx) and, finally, shifting
to the right by a corresponds to c f(b(x − a)). When $f(x) = \frac{1}{1+e^{-x}}$, we will get
$y = \frac{c}{1+e^{-b(x-a)}}$, which is the general logistic curve above.
17 https://en.wikipedia.org/wiki/Logistic_function#Applications
Exercise 2. Show that $y = \frac{1}{1+e^{-x}}$ is increasing and has one inflection point at x = 0,
where $y(0) = \frac{1}{2}$. Discuss how this, together with the previous example, explains
the shape of the general logistic curve above.
Exponential with a limit. Next, we will introduce the exponential with a limit
function given by
$$y = c(1 - e^{-bx})$$
for x ≥ 0, where b > 0 and c > 0 are any positive numbers. Notice that here our
domain starts at x = 0 and we do not shift the function horizontally, so we do not
have a parameter a as in the above two families. We think of x = 0 as the starting
point of the process, although we could, of course, introduce a horizontal shift if
we wanted to.
The function $y = c(1 - e^{-bx})$ is a familiar exponential decay function $e^{-bx}$
flipped around the x-axis by a minus sign, $-e^{-bx}$, then shifted up by one, $1 - e^{-bx}$,
and stretched vertically c times. As a result, it is increasing, and we can say that it
is approaching a horizontal asymptote y = c exponentially fast, as in the figure. For
this reason, these functions can be used to describe processes that approach some
limiting value c exponentially fast (see Exercise 3 below for a real life example).
We can also choose a standard one in the family and obtain all these curves by
rescaling one of them.
Example 3. Given an exponential with a limit $y = 1 - e^{-x}$ with parameters b = 1
and c = 1, if we shrink its graph horizontally b times and stretch it vertically c
times, what is the function corresponding to the resulting curve? Show that, as in
the figure above, $y = c(1 - e^{-bx})$ is increasing and concave down if parameters
b > 0 and c > 0 are positive, and the derivative $y'(0) = cb$.
Solution: Shrinking a graph of f(x) horizontally b times corresponds to f(bx), then
stretching the result vertically c times corresponds to c f(bx). When $f(x) = 1 - e^{-x}$,
we will get $y = c(1 - e^{-bx})$, which is the general exponential with a limit curve. The
derivative $y'(x) = cbe^{-bx}$ is positive, so the function is increasing, and $y'(0) = cb$.
The second derivative $y''(x) = -cb^2e^{-bx}$ is negative, so the function is concave down.
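The claims in this example can be tested numerically. The sketch below is an illustration under our own assumptions: the parameter values b = 0.5, c = 4 and the finite-difference step are arbitrary. It estimates the slope at 0 and checks on a grid that the values increase while the increments shrink.

```python
import math

def exp_limit(x, b, c):
    """Exponential with a limit, y = c * (1 - exp(-b * x))."""
    return c * (1 - math.exp(-b * x))

b, c = 0.5, 4.0  # arbitrary illustrative values

# Finite-difference estimate of y'(0); it should be close to c * b.
h = 1e-6
slope_at_0 = (exp_limit(h, b, c) - exp_limit(0.0, b, c)) / h

# Sample the curve on [0, 10] with step 0.1.
xs = [k / 10 for k in range(0, 101)]
ys = [exp_limit(x, b, c) for x in xs]
steps = [y2 - y1 for y1, y2 in zip(ys, ys[1:])]

increasing = all(s > 0 for s in steps)                          # values go up
concave_down = all(s2 < s1 for s1, s2 in zip(steps, steps[1:])) # increments shrink
print(slope_at_0, increasing, concave_down)
```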
Exercise 3. A pie at room temperature of 20 °C is put into an oven at temperature
200 °C, after which its temperature T starts to change according to the formula
$T = k + c(1 - e^{-bt})$, where time t is measured in minutes. What are k and c? If the
pie temperature is initially increasing at 18 °C per minute, what is b?
Surge function. Next, we will consider the so-called surge function
$$y = cxe^{-bx}$$
for x ≥ 0, where b > 0, c > 0 are any positive numbers. Similarly to the exponential
with a limit, the domain here starts at zero and the function grows quickly
from the beginning, but eventually starts decreasing. The function has a horizontal
asymptote y = 0 as x → +∞ because $cxe^{-bx} = \frac{cx}{e^{bx}}$ and exponential growth $e^{bx}$
dominates the power function x at infinity. The constant c here does not quite play
the same role as before (stretching vertically by c) for the reason explained in the
next example.
Example 4. Given a standard surge function $y = xe^{-x}$ with parameters b = 1 and
c = 1, if we shrink its graph horizontally b times and stretch it vertically c times,
what is the function corresponding to the resulting curve?
Solution: As before, shrinking a graph of f(x) horizontally b times corresponds
to f(bx), then stretching the result vertically c times corresponds to c f(bx). When
$f(x) = xe^{-x}$, we will get $y = cbxe^{-bx}$. It does not really make sense to write a
constant cb in front of $xe^{-bx}$, because it is just one constant, so we wrote it simply
as c in the definition of the general surge function. Of course, given $y = cxe^{-bx}$, if
we rewrite it as $y = \left(\frac{c}{b}\right)bxe^{-bx}$ then $\frac{c}{b}$ is the vertical stretch factor; that is why in
the figure above the global maximum $\frac{c}{be} = \frac{c}{b} \cdot \frac{1}{e}$ is proportional to $\frac{c}{b}$.
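The location and height of the maximum can be confirmed with a short Python sketch. This is our own check, with arbitrary sample values b = 2 and c = 3 and an arbitrary search grid.

```python
import math

def surge(x, b, c):
    """Surge function y = c * x * exp(-b * x) for x >= 0."""
    return c * x * math.exp(-b * x)

b, c = 2.0, 3.0  # arbitrary illustrative values

# Grid search for the maximum on [0, 5] with step 0.001.
xs = [k / 1000 for k in range(0, 5001)]
x_best = max(xs, key=lambda x: surge(x, b, c))

peak = surge(1 / b, b, c)        # value at the critical point x = 1/b
predicted_peak = c / (b * math.e)  # (c/b) * (1/e)
print(x_best, peak, predicted_peak)
```

Doubling c doubles the peak, while doubling b both halves the peak and moves it closer to zero, which is why c alone is not the vertical stretch factor.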
Exercise 4. Show that $y = xe^{-x}$ has global maximum $y = \frac{1}{e}$ at x = 1, and one
inflection point at x = 2. If a surge function $y = cxe^{-bx}$ has a global maximum at
x = 0.75, where is its inflection point?
Now that we discussed all the families of functions in the Motivating Question
above (except for the quadratic polynomial that we are already familiar with), go
back to that question and try to match different shapes with the description of
bacterial growth or decay. After that, check your answers with the answers at the
end of the section.
We will introduce one more family of functions below, called the Gompertz
functions, but first let us solve a couple of quick problems about parametric families
of functions.
Example 5. Find the global minimum of $y = x + \frac{a^2}{x}$ for x > 0, where parameter a
is positive, a > 0. Can we write this family as a horizontal and vertical stretching
of $y = x + \frac{1}{x}$?
Solution: We see that $y'(x) = 1 - \frac{a^2}{x^2} = 0$ when $x^2 = a^2$ or x = ±a. Also, the derivative
is undefined at x = 0, because we divide by zero. However, since our domain is x > 0,
the only critical point is x = a. We can see that the derivative is negative for
0 < x < a and positive for x > a, so x = a is the global minimum where the value is
y(a) = 2a. The function $y = x + \frac{a^2}{x}$ approaches +∞ when x approaches 0 or +∞.
Check that this matches the graphs for a = 0.5, 1 and 2 in the figure.
If we start with $y = x + \frac{1}{x}$, then stretch it horizontally b times and vertically c
times, we will get
$$y = c\left(\frac{x}{b} + \frac{b}{x}\right) = \frac{c}{b}x + \frac{cb}{x}.$$
Can we choose b and c in such a way as to get $y = x + \frac{a^2}{x}$? In other words, we
need that $\frac{c}{b} = 1$ and $cb = a^2$, so b = c and $b^2 = a^2$. Yes, we can take b = c = a, so
this family of functions is $y = x + \frac{1}{x}$ stretched horizontally a times and vertically a
times.
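Both conclusions of this example, the minimum at x = a and the rescaling with b = c = a, can be checked with the following Python sketch. The function names, the value a = 2 and the grid are our own illustrative assumptions.

```python
def family(x, a):
    """The family y = x + a^2 / x for x > 0."""
    return x + a**2 / x

def standard(x):
    """The standard member y = x + 1 / x."""
    return x + 1 / x

a = 2.0  # arbitrary illustrative value

# Grid search for the minimum on (0, 10] with step 0.01.
xs = [k / 100 for k in range(1, 1001)]
x_min = min(xs, key=lambda x: family(x, a))

# Compare with the standard curve stretched a times in both directions.
gap = max(abs(family(x, a) - a * standard(x / a)) for x in xs)
print(x_min, family(x_min, a), gap)
```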
Exercise 5. Find a function of the form $y = ae^{-x} + bx$ with the global minimum at
(1, 2).
Gompertz curve. Finally, we consider the Gompertz function
$$y = ce^{-e^{-b(x-a)}} = ce^{-\kappa e^{-bx}}$$
where κ > 0, b > 0, c > 0 are any positive numbers, a ∈ R is any real number, and
where $\kappa = e^{ba}$, or $a = \frac{1}{b}\ln(\kappa)$. Gompertz functions are used to model growth of
tumours, adoption of technology (e.g. cellphones), etc.18 As in the case of the logistic
function, the two formulas above are just different representations of the same
function, where the parameter a can be replaced by κ or vice versa. All Gompertz
functions are rescalings of one of them.
Example 6. Given a standard Gompertz curve $y = e^{-e^{-x}}$ with parameters a = 0,
b = 1 and c = 1, if we shrink its graph horizontally b times, stretch it vertically
c times and shift it to the right by a, what is the function corresponding to the
resulting curve? What are the horizontal asymptotes of $y = e^{-e^{-x}}$?
Solution: Shrinking a graph of f(x) horizontally b times corresponds to f(bx), then
stretching the result vertically c times corresponds to c f(bx) and, finally, shifting
to the right by a corresponds to c f(b(x − a)). When $f(x) = e^{-e^{-x}}$, we will get
$y = ce^{-e^{-b(x-a)}}$, which is the general Gompertz curve above. Because $e^{-x} \to 0$ as
x → ∞, we see that $e^{-e^{-x}} \to e^{-0} = 1$, and because $e^{-x} \to \infty$ as x → −∞, we see
that $e^{-e^{-x}} \to e^{-\infty} = 0$. This matches the horizontal asymptotes of the general
Gompertz function in the figure above after vertical rescaling by c.
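The asymptotes and the two equivalent formulas for the Gompertz curve can be explored numerically. The Python sketch below is our own illustration, with arbitrary parameter values a = 1, b = 0.8, c = 10 and arbitrary evaluation points far to the left and right.

```python
import math

def gompertz(x, a, b, c):
    """Gompertz curve written with the horizontal shift a."""
    return c * math.exp(-math.exp(-b * (x - a)))

def gompertz_kappa(x, kappa, b, c):
    """The same curve written with kappa = exp(b * a)."""
    return c * math.exp(-kappa * math.exp(-b * x))

a, b, c = 1.0, 0.8, 10.0  # arbitrary illustrative values
kappa = math.exp(b * a)

far_left = gompertz(-20.0, a, b, c)   # should be close to the asymptote y = 0
far_right = gompertz(40.0, a, b, c)   # should be close to the asymptote y = c

gap = max(abs(gompertz(x, a, b, c) - gompertz_kappa(x, kappa, b, c))
          for x in range(-5, 11))
print(far_left, far_right, gap)
```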
18 https://en.wikipedia.org/wiki/Gompertz_function#Example_uses
Exercise 6. Show that $y = e^{-e^{-x}}$ is increasing and has one inflection point at x = 0,
where $y(0) = \frac{1}{e} \approx 0.37$. Discuss how this, together with the previous example,
explains the shape of the general Gompertz curve above.
Bonus: Gompertz decay function. If we change b to −b in the Gompertz curve
above, we will get another version of the Gompertz function,
$$y = ce^{-e^{b(x-a)}} = ce^{-\kappa e^{bx}}$$
where κ > 0, b > 0, c > 0 are any positive numbers, a ∈ R is any real number,
and where $\kappa = e^{-ba}$, or $a = -\frac{1}{b}\ln(\kappa)$. Changing b to −b flips
the Gompertz growth functions horizontally and turns them into Gompertz decay
functions. Most famously, these functions can be used to model human survival
chances with age, based on the empirical Gompertz law of mortality.19 We will
not discuss this further here, but you can learn more about it in the video in the
footnote link.20 Similarly, by changing b to −b, logistic growth above can be turned
into logistic decay.
Answer to the Motivating Question. 1(d), 2(b), 3(c), 4(e), 5(a).

Answer to Exercise 1. Since $f'(x) = e^{-x^2/2}(-x) = 0$ when x = 0, and f′(x) is negative
for x > 0 and positive for x < 0, we see that the critical point x = 0 is the global
maximum. Since
$$f''(x) = e^{-x^2/2}(-x)^2 + e^{-x^2/2}(-1) = e^{-x^2/2}(x^2 - 1),$$
we can see that f′′(x) = 0 when x = −1 or +1, it is negative in between x = −1 and
x = +1 and positive elsewhere, so the function is concave down between x = −1
and x = +1 and concave up outside of this interval. This means that x = −1 and
x = +1 are inflection points. If we stretch the graph of $y = e^{-x^2/2}$ horizontally b
times, stretch it vertically c times and shift it to the right by a, the maximum will
move to y = c at x = a, and the inflection points will move to a − b and a + b, just
like in the figure of the general bell curve.
19 https://en.wikipedia.org/wiki/Gompertz-Makeham_law_of_mortality
20 https://youtu.be/6Lyv53YTPDU
Answer to Exercise 2. Since $f'(x) = \frac{e^{-x}}{(1+e^{-x})^2}$, the derivative is always positive so
the function is always increasing. Since
$$f''(x) = \frac{-e^{-x}(1+e^{-x})^2 - e^{-x}\cdot 2(1+e^{-x})(-e^{-x})}{(1+e^{-x})^4} = \frac{e^{-x}(e^{-x}-1)}{(1+e^{-x})^3},$$
we can see that f′′(x) = 0 when $e^{-x} = 1$, or x = 0, it is positive for x < 0 and
negative for x > 0, so the function is concave up for x < 0 and concave down for
x > 0. This means that x = 0 is the only inflection point. If we shrink the graph of
$y = \frac{1}{1+e^{-x}}$ horizontally b times, stretch it vertically c times and shift it to the right
by a, the inflection point will move to x = a where the value will be $y = \frac{c}{2}$, just like in
the figure of the general logistic curve.
Answer to Exercise 3. Since $T(0) = k + c(1 - e^{-b\cdot 0}) = k$ and at time t = 0 the pie
is at room temperature 20 °C, this means that k = 20. The limit of $T = k + c(1 - e^{-bt})$
as t → ∞ is k + c(1 − 0) = k + c, which should be the oven temperature, so
k + c = 200 and c = 200 − k = 200 − 20 = 180. So $T = 20 + 180(1 - e^{-bt})$. The
derivative T′(0) = 180b is exactly the initial rate of increase of the pie temperature,
so 180b = 18 and b = 0.1. We finally get that $T = 20 + 180(1 - e^{-0.1t})$.
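The final formula can be checked against the three facts in the exercise. The Python sketch below is our own illustration: the function name, the finite-difference step and the sample times in the loop are arbitrary choices.

```python
import math

k, c, b = 20.0, 180.0, 0.1  # values derived in the answer above

def pie_temperature(t):
    """Pie temperature (in degrees C) after t minutes in the oven."""
    return k + c * (1 - math.exp(-b * t))

# Finite-difference estimate of the initial rate T'(0), in degrees per minute.
h = 1e-6
initial_rate = (pie_temperature(h) - pie_temperature(0.0)) / h

print(pie_temperature(0.0))  # starting temperature
print(initial_rate)          # should be close to 18
for t in (10, 30, 60):       # the temperature climbs toward the oven's 200
    print(t, round(pie_temperature(t), 1))
```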
Answer to Exercise 4. Since $y'(x) = (xe^{-x})' = e^{-x} - xe^{-x} = (1 - x)e^{-x}$, we can
see that y′(x) = 0 when x = 1, it is positive for x < 1 and negative for x > 1, so
x = 1 is the global maximum. If we plug in x = 1 we get $y = 1 \cdot e^{-1} = \frac{1}{e}$. The
second derivative
$$y''(x) = ((1 - x)e^{-x})' = -e^{-x} - (1 - x)e^{-x} = (-2 + x)e^{-x}$$
is equal to zero at x = 2, where it changes sign from negative to positive, so x = 2
is an inflection point, the function is concave down for x < 2 and concave up for
x > 2 as in the figure above. A surge function $y = cxe^{-bx}$ is the standard one shrunk
horizontally b times, so its maximum is at $x = \frac{1}{b}$ and its inflection point is at $x = \frac{2}{b}$.
If it has a global maximum at x = 0.75, it means that $\frac{1}{b} = 0.75$ and, as a result, the
inflection point is at $\frac{2}{b} = 2 \times 0.75 = 1.5$.
Answer to Exercise 5. Since the global minimum is at (1, 2), x = 1 must be a
critical point. Since $y'(x) = -ae^{-x} + b$, we must have $y'(1) = -ae^{-1} + b = 0$, or
$a = e \cdot b$. Also, y(1) = 2 so $ae^{-1} + b = 2$. Because $a = e \cdot b$, we get $ae^{-1} + b = b + b = 2$,
so b = 1 and, finally, a = e.
Answer to Exercise 6. Since $y'(x) = e^{-e^{-x}}e^{-x} = e^{-e^{-x}-x} > 0$, the function is always
increasing. Since $y''(x) = e^{-e^{-x}-x}(e^{-x} - 1) = 0$ when $e^{-x} = 1$ or x = 0, and y′′(x)
is positive for x < 0 and negative for x > 0, the function is concave up for x < 0,
concave down for x > 0 and has an inflection point at x = 0, where $y(0) = \frac{1}{e}$. After
rescaling and horizontal shift as in the previous example, we get exactly the shape
as in the figure.
2.9 Related rates

In this section we will take a look at problems where
• several quantities are related through some geometric or physical constraint,
• these quantities are changing at the same time.
If we have information about the rate of change of one (or more) of them, we
can find the rate of change of the other by using the relationship between these
quantities. For example, if variables x = x(t) and y = y(t) are related through some
equation and we know y′ (t), we can find x′ (t) by using this equation. We will
see that, typically, we can solve the equation for x and then take the derivative,
but sometimes we can use implicit differentiation by taking the derivative of the
equation and then solving for x′ (t).
Example 1. During takeoff an airplane is climbing (gaining altitude) at 900 meters
per minute, and the temperature outside is dropping at 7◦ C per 1000 meters. How
fast is the temperature outside the plane changing?
Solution: Let us start by writing all the variables, the given information about the
variables, and the quantity we want to find.
• Variables mentioned in the problem are: altitude y (in meters), temperature T (in
°C), and time t (in minutes).
• We are given the following information:
$$y'(t) = \frac{dy}{dt} = 900 \text{ m/min} \quad\text{and}\quad T'(y) = \frac{dT}{dy} = 7\ {}^\circ\text{C}/1000\text{ m} = 0.007\ {}^\circ\text{C/m}.$$
Notice that, because we were given the rate of change of temperature in degrees
per 1000 meters, it means that at this moment we view temperature as a function
of altitude, T = T(y).
• The question “How fast is the temperature outside the plane changing?” means
that we want to find the rate of change $T'(t) = \frac{dT}{dt}$ of temperature with respect
to time. In other words, we are interested in temperature as a function of time,
T = T(t).
To summarize, we have three variables y, T and t, we have some information about
y(t) and T (y), and we want to learn something about T (t). For this, we need to
find the relationship between all the functions involved, y(t), T (y) and T (t). In this
case, the key equation that relates all the functions is
T (t) = T (y(t)).
In other words, we plug in y(t) into T (y) to get T (y(t)) – temperature as a function
of time. Sometimes the relationship is given simply by composition of functions.
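Since we want to find T′(t), we differentiate the key equation, in this case using
the chain rule: T′(t) = (T(y(t)))′ = T′(y(t)) y′(t) = 0.007 °C/m × 900 m/min =
6.3 °C/min. Because the temperature decreases with altitude, this means that the
temperature outside the plane is dropping at 6.3 °C per minute.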
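The chain rule computation can be imitated numerically by composing two functions. The Python sketch below is only an illustration under our own assumptions: a steady climb starting at ground level, a linear temperature profile, and a made-up ground temperature of 15 °C. The rate comes out negative because the temperature is dropping.

```python
def altitude(t):
    """Altitude in meters after t minutes, assuming a steady climb of 900 m/min."""
    return 900.0 * t

def temperature(y, ground_temp=15.0):
    """Temperature in degrees C at altitude y; ground_temp is a made-up value."""
    return ground_temp - 0.007 * y  # drops 7 degrees per 1000 meters

def temperature_at_time(t):
    """Composition T(y(t)): temperature outside the plane at time t."""
    return temperature(altitude(t))

# Central finite difference for the rate of change at t = 1 minute.
h = 1e-3
rate = (temperature_at_time(1 + h) - temperature_at_time(1 - h)) / (2 * h)
print(rate)  # in degrees C per minute; negative means the temperature is dropping
```

The result does not depend on the assumed ground temperature, which only shifts T(y) by a constant.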
Exercise 1. A colony of bacteria in a Petri dish is growing in a circular shape.

When the radius is 5 mm, it is growing at the rate of 1 mm per day. How fast does
the area change at that moment?
Example 2. The plane is climbing at 400 meters per minute, and the temperature
outside the plane is dropping at 2 °C per minute. At that moment, how fast is the
temperature changing with altitude?
Solution: This example is almost the same as the first example above, and the
solution is the same up to the derivative of the key equation: T′(t) = T′(y(t)) y′(t).
This relationship allows us to find one of the rates T′(t), T′(y), y′(t) given the
other two. For example, right now we are given that y′(t) = 400 m/min and T′(t) =
2 °C/min, and we can find
$$T'(y(t)) = \frac{T'(t)}{y'(t)} = \frac{2}{400}\ {}^\circ\text{C/m} = 5\ {}^\circ\text{C/km}.$$
If, instead, we knew how fast the temperature outside is changing, T′(t), and how
temperature changes with altitude, T′(y), we could find how fast the airplane is
climbing at that moment, solving for $y'(t) = \frac{T'(t)}{T'(y(t))}$. This is an example of implicit
differentiation, which will also be useful in the next exercise.
Exercise 2. Bread dough is rising in the oven during
the first few minutes of baking. Its shape is (roughly)
hemispherical and, at the moment when the radius
is r = 10 cm, its volume is increasing at the rate of
200 cm³/min. How fast is the radius changing at the
same moment?
Example 3. A 1.3 meter broom leaning against the wall starts sliding away from the
wall. When the broom head is 1 meter from the wall (i.e. x = 1 in the figure) and it is
sliding at a rate of 0.5 m/s, how fast is the top of the handle sliding along the wall?
Solution: The variables mentioned in the problem are x and y depicted in the figure,
and time t. We are given that x′(t) = 0.5 m/s when x(t) = 1 m, and we want to
find y′(t). What is the relationship between x and y?
We are also given that the broom is 1.3 meters long, which we can view geometrically
as the hypotenuse of the right triangle with the other two sides x and y, so the
key equation is $x^2 + y^2 = 1.3^2$, or $x(t)^2 + y(t)^2 = 1.69$. Since we want to find y′(t),
we can solve for $y(t) = \sqrt{1.69 - x(t)^2}$ and take the derivative,
$$y'(t) = \frac{1}{2\sqrt{1.69 - x(t)^2}}(-2x(t))\,x'(t) = \frac{1}{2\sqrt{1.69 - 1^2}}(-2 \cdot 1)(0.5) \approx -0.6 \text{ m/s}.$$
Another way is to use implicit differentiation. Taking the derivative of the equation
$x(t)^2 + y(t)^2 = 1.69$ we get 2x(t)x′(t) + 2y(t)y′(t) = 0, and then solving for y′(t),
$$y'(t) = -\frac{x(t)\,x'(t)}{y(t)}.$$
We know that x(t) = 1 and x′(t) = 0.5, and we can find y(t) from the key equation
again, $y(t) = \sqrt{1.69 - x(t)^2} = \sqrt{1.69 - 1^2} \approx 0.83$, so $y'(t) \approx -\frac{0.5}{0.83} \approx -0.6$ m/s.
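The two approaches in this example can be compared numerically. In the Python sketch below, which is our own illustration, the names `top_height` and `BROOM_LENGTH` and the step size are assumptions. We evaluate the formula from implicit differentiation and compare it with a finite-difference estimate obtained by moving the broom head a tiny bit.

```python
import math

BROOM_LENGTH = 1.3  # meters

def top_height(x):
    """Height y of the top of the handle when the broom head is x meters from the wall."""
    return math.sqrt(BROOM_LENGTH**2 - x**2)

x, dx_dt = 1.0, 0.5  # position (m) and speed (m/s) of the broom head
y = top_height(x)

# Formula from implicit differentiation: y' = -x x' / y.
dy_dt_implicit = -x * dx_dt / y

# Finite-difference estimate: over a short time h the head moves by dx_dt * h.
h = 1e-6
dy_dt_numeric = (top_height(x + dx_dt * h) - top_height(x - dx_dt * h)) / (2 * h)

print(y, dy_dt_implicit, dy_dt_numeric)
```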
Exercise 3. A person standing on the Cherry beach in Toronto is watching an air-

plane take off from the Billy Bishop airport. When the airplane’s altitude is y = 550
meters and the person’s angle of view is θ = π/6, the plane’s altitude is increasing
at 400 meters per minute and the angle of view is decreasing at 1.8 radians per
minute. How fast is the horizontal distance x to the plane changing at that moment,
in km/h?
Exercise 4. The Statue of Liberty is 46 meters high and it stands on a pedestal

which is also 46 meters high. When a cruise boat is at the distance d = 100 meters
and is approaching the statue with the speed 5 m/s, how fast does the angle of view
of the statue θ change?
To summarize, the first steps in solving related rates problems are:

• Start by writing all the variables, the given information about the variables, and
the quantity we want to find.
• Find the key equation relating all the quantities involved.
After that we can use different strategies:
• If possible, solve the key equation for the variable of interest and then take its
derivative.
• Use implicit differentiation: take the derivative of the key equation first and then
solve for the derivative we want to find.
Answer to Exercise 1. Variables are: radius r, area A and time t. We are given that
r′(t) = 1 mm/day when r(t) = 5 mm. We want to know A′(t). The key equation
relating the radius and area is $A = \pi r^2$, so $A(t) = \pi (r(t))^2$. Differentiating this
equation we get that A′(t) = 2πr(t) · r′(t) = 2π · 5 · 1 = 10π mm²/day.
Answer to Exercise 2. Variables are: radius r, volume V and time t. We are given
that V′(t) = 200 cm³/min when r(t) = 10 cm. We want to know r′(t). The key
equation relating the radius and volume of half the sphere is $V = \frac{2\pi}{3}r^3$, so
$V(t) = \frac{2\pi}{3}(r(t))^3$. We want to know r′(t), so we can solve for r(t) first and then take the
derivative, or we can use implicit differentiation. Let us use implicit differentiation.
Differentiating the equation $V(t) = \frac{2\pi}{3}(r(t))^3$ we get that $V'(t) = 2\pi (r(t))^2 \cdot r'(t)$.
Solving for r′(t) we get
$$r'(t) = \frac{V'(t)}{2\pi (r(t))^2} = \frac{200}{2\pi (10)^2} \approx 0.3183 \text{ cm/min}.$$
Answer to Exercise 3. The variables are x, y, θ and time t. We are given that
y′(t) = 400 m/min and θ′(t) = −1.8 rad/min when y = 550 and θ = π/6. We want
to find x′(t) at the same moment. The relationship between variables from the right
triangle is $\tan(\theta) = \frac{y}{x}$. Solving for x we get x(t) = y(t) cot(θ(t)), and taking the
derivative (also recall or check that $(\cot(x))' = -\csc^2(x) = -\frac{1}{\sin^2(x)}$),
$$x'(t) = y'(t)\cot(\theta(t)) + y(t)\left(-\csc^2(\theta(t))\right)\theta'(t)$$
$$= 400\cot(\pi/6) + 550\left(-\csc^2(\pi/6)\right)(-1.8) \approx 4652.82 \text{ m/min},$$
which is about 279 km/h.
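The arithmetic in this answer is easy to get wrong by hand, so here is a short Python sketch that evaluates the same expression. It is our own check and the variable names are assumptions; recall that cot(π/6) = √3 and csc²(π/6) = 4.

```python
import math

y, dy_dt = 550.0, 400.0                 # altitude (m) and its rate (m/min)
theta, dtheta_dt = math.pi / 6, -1.8    # angle (rad) and its rate (rad/min)

cot_theta = 1 / math.tan(theta)
csc2_theta = 1 / math.sin(theta)**2

# Product rule applied to x = y * cot(theta).
dx_dt = dy_dt * cot_theta + y * (-csc2_theta) * dtheta_dt   # in m/min
dx_dt_kmh = dx_dt * 60 / 1000                               # in km/h

print(dx_dt, dx_dt_kmh)
```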

Answer to Exercise 4. We are given that d′(t) = −5 m/s (minus sign because the
boat is moving toward the statue, so the distance is decreasing) when d(t) = 100
m, and we want to know θ′(t), so we need to find the equation relating d and θ.
Let α be the angle between the horizontal line and line of sight to the bottom of
the Statue of Liberty, and let β be the angle between the horizontal line and line of
sight to the top of the Statue of Liberty, as in the figure. Then θ = β − α. From the
right triangles, we see that
$$\tan(\alpha) = \frac{46}{d} \quad\text{and}\quad \tan(\beta) = \frac{92}{d},$$
so $\alpha = \arctan\left(\frac{46}{d}\right)$, $\beta = \arctan\left(\frac{92}{d}\right)$, and
$\theta(t) = \arctan\left(\frac{92}{d(t)}\right) - \arctan\left(\frac{46}{d(t)}\right)$. This is
our key equation. Differentiating it, we get
$$\theta'(t) = \frac{1}{1+\left(\frac{92}{d(t)}\right)^2}\left(-\frac{92}{d(t)^2}\right)d'(t) - \frac{1}{1+\left(\frac{46}{d(t)}\right)^2}\left(-\frac{46}{d(t)^2}\right)d'(t)$$
$$= \left(\frac{46}{d(t)^2+46^2} - \frac{92}{d(t)^2+92^2}\right)d'(t)$$
$$= \left(\frac{46}{100^2+46^2} - \frac{92}{100^2+92^2}\right)(-5) \approx 0.0059 \text{ rad/s}.$$
Chapter 3
Integrals
3.1 Definite integrals: the case of velocity

The main question that we will study in this chapter is the following. If we know the rate
of change of some quantity, how do we calculate how much the quantity changes
on some interval? To spell out this question more precisely:
• If we know the rate of change f (x) of some quantity F(x) on some interval [a, b],
which means that f (x) is the derivative of this quantity, f (x) = F ′ (x), how do
we calculate how much the quantity changes between a and b? In other words,
how do we calculate F(b) − F(a)?
To make this question more concrete, consider a car driving on a mountain road, and
suppose that at time t = a the car is at point A and at time t = b it is at point B. At
time t, let D(t) be the distance of the car along this road from the highway exit O.
Notice that this distance D(t) can increase or decrease depending on the direction in
which the car is moving. What is D′(t)? It is the velocity v(t) at time t, which is
the speed of the car with the plus or minus sign depending on the direction. For
example, when the car is moving towards the highway exit, D(t) is decreasing and its
derivative will be negative, i.e. minus the speed. In this concrete setup, the question
above becomes:
• If we know the velocity v(t) on some time interval [a, b], how do we calculate
how much the position D changes between time a and time b? In other words,
how do we calculate the distance D(b) − D(a) between points A and B along
this road depicted in the figure above?
Notice also that the change in position D(b) − D(a) would be negative (minus the
distance) if B was closer to the exit O than A, in which case D(b) < D(a).
Main formula. Before we give the answer to the above question, let us look at the graph
of the velocity on the interval a ≤ t ≤ b. The graph in the figure on the right is not a
very realistic description of a car moving on a mountain road, but as an illustration it
has the right features. For example, up to a certain point in time the car is moving
away from the highway exit (towards the peak of the mountain), because the velocity
v(t) is positive and its graph is above the x-axis. Then the car turns around and starts
moving back towards the highway exit, so the velocity is negative and its graph is
below the x-axis. At time t = b the car is passing point B. Let A1 be the area below
the graph of velocity y = v(t) and above the x-axis, when the car is moving in the
‘positive direction’ and D(t) is increasing. Let A2 be the area above the graph of
velocity y = v(t) and below the x-axis, when the car is moving in the ‘negative
direction’ and its position D(t) is decreasing. Then the answer to the above question is:
• Given velocity v(t) = D′ (t), the change of the position between time a and b is
D(b) − D(a) = A1 − A2 .
A more detailed answer is

• The car will cover distance A1 in the positive direction and distance A2 in the
negative direction.
So the total distance travelled by the car will be A1 + A2 but, taking into account
direction, the change in the car’s position on the road will be A1 − A2 . The minus sign
in the second term in A1 − A2 simply takes into account the direction, or the fact
that D(t) can increase and decrease, and our main goal now is to understand why
the area between the graph of velocity v(t) and the x-axis represents the distance
travelled. Actually, the reason for this is quite simple and will become clear through
several examples.
Example 1. A car drives in a ‘positive direction’ on some road with the speed of
20 mph between 2 p.m. and 5 p.m., then turns around and drives in the opposite
direction with the speed of 40 mph between 5 p.m. and 9 p.m., and then turns
around again and drives in the original direction with the speed of 50 mph between
9 p.m. and 11 p.m. What is the total distance travelled by the car, and what is the
change in its position along this road? Draw the graph of the velocity and check
that the calculation matches the above formula in terms of areas.
Solution: Using the formula
Distance = Speed × Time
we can calculate that the distance travelled between 2 p.m. and 5 p.m. is
20 mph × 3 h = 60 miles, the distance travelled between 5 p.m. and 9 p.m. is
40 × 4 = 160 miles, and the distance travelled between 9 p.m. and 11 p.m. is
50 × 2 = 100 miles. The total distance travelled is 60 + 160 + 100 = 320 miles, and
the position change taking into account the direction is 60 − 160 + 100 = 0 miles.
If we take a look at the graph of velocity between 2 p.m. and 11 p.m., the areas
we need to compute correspond to three rectangles, because velocity is constant
during each time period and the sides of each rectangle are exactly Speed (height)
and Time (width), so each Area = Speed × Time, which is the same as distance
travelled during that time period. In the case when velocity is constant on each
interval, this explains why we use the areas in the formula A1 − A2 above. Notice
that A1 here consists of two disjoint rectangles, so it is okay that the region above
or below the x-axis consists of several pieces.
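The bookkeeping in this example is simple enough to write as a few lines of Python. The sketch below is our own illustration; the list name `legs` and the encoding of each leg as a (velocity, duration) pair are assumptions. It adds up unsigned rectangle areas for the total distance and signed areas for the change in position.

```python
# Each leg of the trip: (velocity in mph, duration in hours).
# A negative velocity means driving in the opposite direction.
legs = [(20, 3), (-40, 4), (50, 2)]

# Total distance: every rectangle counts with a plus sign (A1 + A2).
total_distance = sum(abs(v) * dt for v, dt in legs)

# Position change: rectangles below the axis count with a minus sign (A1 - A2).
position_change = sum(v * dt for v, dt in legs)

print(total_distance, position_change)  # prints: 320 0
```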
Exercise 1. The table below gives the velocity of the car driving along some road
during different time intervals. Compute the total distance travelled and the change
in position.
t       1 to 3 p.m.    3 to 5 p.m.    5 to 7 p.m.    7 to 9 p.m.
v(t)    60 mph         −55 mph        50 mph         −60 mph
Sketch the graph of the velocity and check that various distances are exactly the
areas in your figure.
For the future, it is convenient to modify the formula Distance = Speed × Time
slightly and rewrite it as
Position Change = Velocity × Time
where we simply take into account the positive or negative direction. When the
position D(t) is decreasing, Position Change = −Distance and Velocity = −Speed,
so it is still the same formula just with a minus sign. It is more convenient because
we do not need to mention the direction explicitly, since it is reflected by the plus
or minus sign.
Riemann sum approximation. In the two problems above, the time interval
[a, b] was divided into several subintervals where the velocity was constant. If the
velocity is not constant, we can still use the same calculation to approximate the
distance travelled by the car and its change in position. To do that, we can divide the
time interval [a, b] into many small subintervals and, because the velocity cannot
change much over a very short period of time, the velocity is almost constant on
each subinterval. This means that if we measure the velocity at any particular time
on a small subinterval, we will get an approximation
∆D ≈ Velocity × ∆t
where ∆D is the increment of position and ∆t is the increment of time.

• If we sum up the increments of position ∆D over all subintervals we will get the
total change of position D(b) − D(a).
• Because Velocity × ∆t = ±Area of Rectangle (see figure below), their sum
will approximate A1 − A2 .
This explains the formula D(b) − D(a) = A1 − A2 in the general case, because
when subintervals get smaller and smaller, the approximation of the areas A1 and
A2 by rectangles gets better and better.
In the figures above we divided the interval [a, b] into 20 subintervals and then
made two possible choices of velocity on each subinterval: at the left endpoint or
at the right endpoint. The sum corresponding to the left endpoint is called the left
Riemann sum, and to the right endpoint is called the right Riemann sum.
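To see how the left and right Riemann sums close in on the change of position as the subintervals get smaller, here is a minimal Python sketch. Everything in it is our own illustration: the helper `riemann_sums` is a hypothetical name, and the velocity v(t) = 3t² on [0, 2] is a made-up example chosen because the position D(t) = t³ satisfies D′(t) = 3t², so the exact change is D(2) − D(0) = 8.

```python
def riemann_sums(v, a, b, n):
    """Return (left, right) Riemann sums of v on [a, b] with n equal subintervals."""
    dt = (b - a) / n
    left = sum(v(a + i * dt) for i in range(n)) * dt         # left endpoints
    right = sum(v(a + (i + 1) * dt) for i in range(n)) * dt  # right endpoints
    return left, right

def velocity(t):
    """Hypothetical velocity; the position D(t) = t**3 has D'(t) = 3 * t**2."""
    return 3 * t**2

exact_change = 2**3 - 0**3  # D(2) - D(0) = 8

for n in (20, 200, 2000):
    left, right = riemann_sums(velocity, 0.0, 2.0, n)
    print(n, left, right, exact_change)
```

Since this velocity is increasing, the left sums stay below the exact value and the right sums stay above it, and the gap between them shrinks in proportion to ∆t.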
Example 2. The following table shows the velocity v(t) of the bowling ball (in meters
per second) at time t (in seconds), from the moment it was released until it hit the pins.
t       0      0.5    1      1.5    2      2.5
v(t)    8.6    7.7    7.15   6.8    6.6    6.5
Estimate the length of the bowling lane from above and below.
Solution: Because of friction, the bowling ball is
slowing down, which is also reflected in the above
table, because the values v(t) are decreasing. As a
result, the speed is the highest at the beginning of
each interval (left endpoint), and it is lowest at the
end of each interval (right endpoint). This means
that the left Riemann sum, in this case corresponding
to 5 intervals of length ∆t = 0.5 each,
8.6 × 0.5 + 7.7 × 0.5 + 7.15 × 0.5 + 6.8 × 0.5 + 6.6 × 0.5 = 18.425,
overestimates the total distance travelled by the ball. The right Riemann sum
7.7 × 0.5 + 7.15 × 0.5 + 6.8 × 0.5 + 6.6 × 0.5 + 6.5 × 0.5 = 17.375
underestimates the total distance travelled by the ball. We conclude that the length
of the lane is between 17.375 and 18.425 meters. Notice how in the figure above,
because the function is decreasing, the green rectangles with height given by the
left endpoints are above the graph of y = v(t) and the blue rectangles with height
given by the right endpoints are below the graph, so the area below the graph is,
indeed, in between the right and left Riemann sums.
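The two estimates in Example 2 can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the function names left_riemann_sum and right_riemann_sum are our own choices and are not part of the course material.

```python
def left_riemann_sum(values, dt):
    """Sum v(t) * dt using the value at the left endpoint of each subinterval."""
    return sum(values[:-1]) * dt


def right_riemann_sum(values, dt):
    """Sum v(t) * dt using the value at the right endpoint of each subinterval."""
    return sum(values[1:]) * dt


# Velocity of the bowling ball (m/s) at t = 0, 0.5, ..., 2.5 s (table in Example 2).
v = [8.6, 7.7, 7.15, 6.8, 6.6, 6.5]
dt = 0.5

upper = left_riemann_sum(v, dt)   # v is decreasing, so this overestimates
lower = right_riemann_sum(v, dt)  # and this underestimates
print(f"lane length is between {lower:.3f} and {upper:.3f} meters")
```

Six table values give five subintervals, which is why each sum drops exactly one endpoint value.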
What we have learned in the last problem is that

• If the function is decreasing then the left Riemann sums overestimate and right
Riemann sums underestimate the change in position D(b) − D(a).
This is true even when velocity can become negative (check it!). Similarly, we can
see that
• If the function is increasing then the left Riemann sums underestimate and right
Riemann sums overestimate the change in position D(b) − D(a).
Exercise 2. The following table shows the speed v(t) of the bowling ball (in meters
per second) at time t (in seconds), from the moment it was dropped from the
roof of a building until it hit the ground.
t     0  0.25  0.5  0.75  1    1.25   1.5   1.75   2
v(t)  0  2.45  4.9  7.35  9.8  12.25  14.7  17.15  19.6
(a) Estimate the height of the building from above and below using four equal
subintervals.
(b) Estimate the height of the building from above and below using eight equal
subintervals.
(c) If the speed is v(t) = 9.8t, plot its graph and deduce what the exact height of the
building is using the area formula.
Example 3. Given the following table of velocity v(t) (in meters per second) at
time t (in seconds) between t = 0 and t = 3,
t     0    0.5  1     1.5   2     2.5   3
v(t)  1.6  0.7  0.15  −0.2  −0.4  −0.5  −0.55
which of the following expressions is not a Riemann sum estimate of the position
change D(3) − D(0), and why?
(a) (1.6 + 0.15 − 0.4) × 1
(b) (0.7 + 0.15 − 0.2 − 0.4 − 0.5 − 0.55) × 0.5
(c) (1.6 + 0.7 + 0.15 − 0.2 − 0.4 − 0.5 − 0.55) × 0.5
(d) (1.6 + 0.7 + 0.15 − 0.2 − 0.4 − 0.5) × 0.5
Solution: (a) Notice that here we multiply the velocity values by 1, not 0.5, so
the increment of time is ∆t = 1, which means that this sum could correspond to a
Riemann sum with 3 subintervals on the interval [0, 3]. This means that we need
to look at the values of v(t) at t = 0, 1, 2 and 3, and we can recognize that (1.6 +
0.15 − 0.4) × 1 is a left Riemann sum v(0) × ∆t + v(1) × ∆t + v(2) × ∆t.
(b) This is a right Riemann sum with ∆t = 0.5 and 6 subintervals. The number
of terms matches the number of subintervals.
(d) This is a left Riemann sum with ∆t = 0.5 and 6 subintervals.
(c) Here, it looks like ∆t = 0.5, but this is not a Riemann sum with ∆t = 0.5 and
6 subintervals, because the number of terms is 7 and it does not match the number
of subintervals. By using all the values in the table we are overcounting, or more
precisely, we are counting one of the intervals twice, at the left and right endpoints.
Exercise 3. Given velocity $v(t) = e^{-t}$ between t = 0 and t = 4, which of the
following is not a Riemann sum estimate of the position change, and why?
(a) $(e^{-2} + e^{-4}) \times 2$
(b) $(e^{-1} + e^{-2} + e^{-3} + e^{-4}) \times 1$
(c) $(1 + e^{-0.5} + e^{-1} + e^{-1.5} + e^{-2} + e^{-2.5} + e^{-3} + e^{-3.5}) \times 0.5$
(d) $(e^{0} + e^{-1} + e^{-2} + e^{-3} + e^{-4}) \times 1$
Constant acceleration. Let us now consider several examples with constant
acceleration, which makes the areas particularly easy to compute.
Example 4. An object is moving along a straight line
with initial velocity 4 m/s and acceleration −1 m/s².
What are the total distance travelled and the position
change at time t = 9 s?
Solution: Acceleration is the derivative of velocity,
so constant acceleration means that velocity v(t) is a
linear function with the slope equal to acceleration.
In our case, the slope is −1 and initial velocity is
v(0) = 4, so the function is v(t) = 4 − t.
Because velocity v(9) = −5 at time t = 9 is negative, the object changes direction
somewhere before that, and we need to find where: 4 − t = 0, or t = 4. Looking
at the figure, we can now use the areas of the triangles to compute the distances.
The area above the x-axis is $\frac{1}{2} \cdot 4 \cdot 4 = 8$, which is the distance travelled in the
positive direction, and the area below the x-axis is $\frac{1}{2} \cdot 5 \cdot 5 = 12.5$, which is the
distance travelled in the negative direction. So the change in position is 8 − 12.5 = −4.5
meters and the total distance travelled is 8 + 12.5 = 20.5 meters. Notice that
multiplying the units m/s × s gives meters.
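As a numerical cross-check of Example 4, the sketch below approximates both areas with Riemann sums over many short subintervals. Sampling at the midpoint of each subinterval is our own choice (any point of the subinterval would do), and the helper name midpoint_sum is hypothetical.

```python
def midpoint_sum(f, a, b, n):
    """Riemann sum of f on [a, b] with n equal subintervals, sampled at midpoints."""
    dt = (b - a) / n
    return sum(f(a + (i + 0.5) * dt) for i in range(n)) * dt


def velocity(t):
    """Velocity from Example 4: initial velocity 4 m/s, acceleration -1 m/s^2."""
    return 4 - t


def speed(t):
    """Speed is the absolute value of velocity."""
    return abs(velocity(t))


n = 9000  # subinterval width 0.001 s
displacement = midpoint_sum(velocity, 0, 9, n)  # approximates A1 - A2
distance = midpoint_sum(speed, 0, 9, n)         # approximates A1 + A2

print(f"change in position: {displacement:.3f} m")
print(f"total distance:     {distance:.3f} m")
```

Summing the velocity gives the signed result, while summing the speed counts the motion in the negative direction as positive distance.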
Exercise 4. A baseball is thrown from height 0 directly upwards with speed 29.4
m/s. What is the height of the baseball at time t = 5 seconds?
Example 5. A baseball is thrown from height 0 directly
upwards and it reaches its peak at 78.4 meters.
What are its initial velocity v0 and the time t0 it takes
to reach the peak?
Solution: Acceleration due to gravity is −9.8 m/s²,
which is the derivative of velocity, so velocity v(t)
is a linear function with the slope −9.8. In our case,
the initial velocity v0 and time t0 to reach the peak
are unknown, but we know that the velocity at the
peak should be zero, v(t0 ) = 0, as in the figure.
In addition to the slope, we also know the area A1 under the graph, because it is
exactly the distance to the peak, A1 = 78.4. In terms of v0 and t0 the area is $\frac{1}{2} v_0 t_0$.
Also, since the line passes through points (0, v0) and (t0, 0), its slope is
$\frac{0 - v_0}{t_0 - 0} = -\frac{v_0}{t_0}$.
This gives us two equations,
\[ \text{Slope} = -\frac{v_0}{t_0} = -9.8, \qquad \text{Area} = \frac{1}{2} v_0 t_0 = 78.4. \]
From the first one, we get $v_0 = 9.8\,t_0$ and, plugging this into the second equation,
we get $\frac{1}{2} \cdot 9.8\,t_0^2 = 78.4$, or $t_0^2 = 16$, or $t_0 = 4$. Then $v_0 = 9.8\,t_0 = 9.8 \times 4 = 39.2$ m/s.
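The substitution in Example 5 can be checked in a few lines. This is a sketch of the same algebra, with variable names (g, peak) chosen here only for illustration.

```python
import math

g = 9.8      # magnitude of the acceleration due to gravity, m/s^2
peak = 78.4  # height of the peak, m

# Slope equation: v0 / t0 = g.  Area equation: (1/2) * v0 * t0 = peak.
# Substituting v0 = g * t0 into the area equation gives (1/2) * g * t0**2 = peak.
t0 = math.sqrt(2 * peak / g)  # time to reach the peak, s
v0 = g * t0                   # initial velocity, m/s

print(f"t0 = {t0:.2f} s, v0 = {v0:.2f} m/s")
```

The same two equations, with 35 m and 7 m/s² in place of 78.4 m and 9.8 m/s², are the ones suggested by the hint in Exercise 5.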
Exercise 5. After spotting a police officer in a school zone with a 10 m/s speed
limit, a surprised driver slams on the brakes and comes to a complete stop. The
police officer was not equipped with a speed radar to see how fast the driver was
going. However, by looking at the tire skid marks the police officer determined that
(a) It took the driver 35 m to come to a complete stop.
(b) The driver was braking (decelerating) at 7 m/s².
Does the officer have enough information to issue a speeding ticket? Hint: Sketch
the graph of velocity and express unknown initial velocity v0 and time to stop t0 in
terms of given information.
Notation and terminology. The change of position D(b) − D(a) is often called
total displacement, which is A1 − A2, as opposed to total distance travelled, A1 + A2.
Now that we understand that the total displacement D(b) − D(a) can be computed
from velocity v(t) = D′(t) using the areas of the Riemann sum approximations, it
is time to introduce important notation and terminology.
First of all, the difference of areas A1 − A2 above
and below the x-axis on the interval [a, b] is called
the definite integral of v(t) on the interval [a, b],
and it is denoted
\[ \int_a^b v(t)\,dt. \]
This notation will become clearer below.

We have seen in the examples above that we can
approximate the difference A1 − A2 by the sum of
areas of rectangles (of course, also with a plus or
minus sign), which is called a Riemann sum. For
example, the figure on the right depicts rectangles
corresponding to the left Riemann sum. When the
interval is divided into n subintervals, the notation
for a Riemann sum of the function v(t) is
\[ \sum_{i=1}^{n} v(t_i^*)\,\Delta t \]
where v(t) can be replaced by a specific formula, and where the meaning of each
symbol is as follows:
• The symbol Σ (sigma) represents the sum.
• If we divide the interval [a, b] into n small subintervals, the letter i represents the
index enumerating these subintervals from 1 to n. For example, i = 3 means that we
are looking at the third subinterval starting from t = a. Other letters can be used
instead of i, for example, k, ℓ, m, etc. Below Σ we write i = 1 to indicate that the
first interval index is 1, and above Σ we write n to indicate that the last interval
index is n. Sometimes 1 and n can change depending on the context.
• The factor ∆t represents the increment of time t, which is also the width of the
subintervals.
• The point $t_i^*$ represents some specific choice of a point inside the subinterval #i.
For example, it could be a left endpoint or right endpoint in the case of the left
or right Riemann sums. Because $v(t_i^*)$ is the ±height of rectangle #i and ∆t is
its width, $v(t_i^*)\Delta t$ is exactly the ±Area of this rectangle and the notation $\sum_{i=1}^{n}$
means that we are adding up these terms, just like we did in the above problems.
Example 6. Given a function v(t) = cos(t):
(a) How do we denote its definite integral on the interval [0, π]?
(b) What is the definition of the definite integral on the interval [0, π]? Using the
definition, what is this integral?
(c) What is another meaning of the definite integral on the interval [0, π]?
(d) How can we compute the definite integral on the interval [0, π]?
(e) How do we denote a general Riemann sum for v(t) on the interval [0, π]?
(f) Write down the right Riemann sum on the interval [0, π] with four subintervals.
Solution: (a) The definite integral of y = cos(t)
on the interval [0, π] is denoted $\int_0^{\pi} \cos(t)\,dt$.
(b) Its definition is the difference A1 − A2 of
areas above and below the x-axis on this interval,
as in the figure. In the case of cosine, these
two areas are the same so they cancel out and
$\int_0^{\pi} \cos(t)\,dt = 0$.
(c) Another meaning of this definite integral is
the total displacement D(π) − D(0) of an object
moving along a straight line with this velocity
v(t) = cos(t). In other words, if D′(t) = cos(t) then $D(\pi) - D(0) = \int_0^{\pi} \cos(t)\,dt = 0$.
(d) Above, we noticed that this definite integral is zero, because the areas A1
and A2 cancel out, but we can also compute this integral by approximating it with
Riemann sums.
(e) A general Riemann sum in this case is $\sum_{i=1}^{n} \cos(t_i^*)\,\Delta t$. We can be a bit more
specific and replace ∆t with $\frac{\pi}{n}$, because the interval has length π and we divide it
into n equal subintervals: $\sum_{i=1}^{n} \cos(t_i^*)\,\frac{\pi}{n}$.
(f) If we divide [0, π] into four subintervals then the right endpoints will be
$t_1^* = \frac{\pi}{4}$, $t_2^* = \frac{\pi}{2}$, $t_3^* = \frac{3\pi}{4}$ and $t_4^* = \pi$, so the right Riemann sum will be
\[ \left(\cos\frac{\pi}{4} + \cos\frac{\pi}{2} + \cos\frac{3\pi}{4} + \cos(\pi)\right)\frac{\pi}{4} = -\frac{\pi}{4} \approx -0.7854. \]
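To see part (f) of Example 6 in action, and how the right Riemann sum behaves with more subintervals, here is a small Python sketch. The function name right_riemann_sum is our own and is only meant as an illustration.

```python
import math


def right_riemann_sum(f, a, b, n):
    """Right Riemann sum of f on [a, b] with n equal subintervals."""
    dt = (b - a) / n
    # Right endpoints are a + dt, a + 2*dt, ..., a + n*dt = b.
    return sum(f(a + i * dt) for i in range(1, n + 1)) * dt


for n in (4, 40, 400, 4000):
    print(n, right_riemann_sum(math.cos, 0, math.pi, n))
```

With n = 4 this is the sum from part (f), and as n grows the values approach the definite integral found in part (b).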
Exercise 6. Given a function v(t) = 1 − |t|:
(a) How do we denote its definite integral on the interval [−1, 1]?
(b) What is the definition of the definite integral on the interval [−1, 1]? Using the
definition, what is this integral?
(c) What is another meaning of the definite integral on the interval [−1, 1]?
(d) How can we compute the definite integral on the interval [−1, 1]?
(e) How do we denote a general Riemann sum for v(t) on the interval [−1, 1]?
(f) Write down the left Riemann sum on the interval [−1, 1] with four subintervals.
Example 7. A cat is chasing a mouse, both running along a straight wall. The
mouse never stops running but can reverse direction. At time t ≥ 0, the velocity
of the mouse is v(t) and the distance travelled by the mouse is d(t). What is the
relationship between v(t) and d(t)? Which one of the functions v(t) and d(t) is
invertible?
Solution: The total distance travelled is the sum of areas A1 + A2. When velocity is
negative, if we flip it around the x-axis then it will become speed |v(t)| but the
area A2 will stay the same, so the area below the speed function |v(t)| is exactly
A1 + A2. This means that we can express the total distance travelled at time t as
\[ d(t) = \int_0^t |v(s)|\,ds. \]
Notice how we used a different name for a variable of integration, s instead of t,
because t was already reserved for the time t up to which we integrate. This is a
typical convention not to use the same name for more than one variable or object.
Because the mouse never stops running, its speed |v(s)| is positive (except possibly
at the instants when it reverses direction), so d(t) keeps increasing and is therefore
invertible, while v(t) does not have to be.
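The difference between displacement and distance travelled in Example 7 can be illustrated numerically. In the sketch below the velocity v(s) = cos(s) is a hypothetical choice borrowed from Example 6 (the object reverses direction at s = π/2), and midpoint_sum is our own helper name.

```python
import math


def midpoint_sum(f, a, b, n=10_000):
    """Riemann sum of f on [a, b] with n equal subintervals, sampled at midpoints."""
    dt = (b - a) / n
    return sum(f(a + (i + 0.5) * dt) for i in range(n)) * dt


def v(s):
    """Hypothetical velocity, chosen only for illustration."""
    return math.cos(s)


def d(t):
    """Total distance travelled up to time t: area under the speed |v(s)|."""
    return midpoint_sum(lambda s: abs(v(s)), 0, t)


def displacement(t):
    """Change of position up to time t: signed area under v(s)."""
    return midpoint_sum(v, 0, t)


for t in (1.0, 2.0, math.pi):
    print(f"t = {t:.3f}: displacement = {displacement(t):+.4f}, distance = {d(t):.4f}")
```

The distance d(t) keeps growing with t, while the displacement comes back to zero at t = π.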
Exercise 7. A cat is chasing a mouse, both running along a straight wall. The mouse
never stops running but can reverse direction. At time t, the velocity of the mouse
is vm (t) and velocity of the cat is vc (t). Also, at time t, the distance travelled by the
mouse is dm (t) and distance travelled by the cat is dc (t). Write down the formulas
for the total displacement and total distance travelled by the cat in terms of the total
distance x travelled by the mouse.
Taking a limit. We saw that as we increase the number of subintervals n and the
width of the rectangles gets smaller and smaller, Riemann sums get closer and closer
to the difference of areas A1 − A2, which we called the definite integral and denoted
$\int_a^b v(t)\,dt$. This statement can be written using the limit n → ∞ notation:
\[ \lim_{n\to\infty} \sum_{i=1}^{n} v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt. \]
In particular, this explains the notation $\int_a^b v(t)\,dt$, which resembles the Riemann
sum, with the sum Σ now replaced by the integral sign ∫ and the interval indices
replaced by the endpoints a and b. Since the definite integral is also the change
of position D(b) − D(a) on the interval [a, b], a more complete summary of this
section is the following formula: if v(t) = D′(t) then
\[ \lim_{n\to\infty} \sum_{i=1}^{n} v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt = D(b) - D(a). \]
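The limit statement can be observed numerically. The sketch below assumes a hypothetical position D(t) = sin(t) with velocity v(t) = cos(t) on the interval [0, 2], and compares left Riemann sums with D(b) − D(a) as n grows; the names are ours.

```python
import math


def left_riemann_sum(f, a, b, n):
    """Left Riemann sum of f on [a, b] with n equal subintervals."""
    dt = (b - a) / n
    return sum(f(a + i * dt) for i in range(n)) * dt


def D(t):
    """Hypothetical position function, chosen only for illustration."""
    return math.sin(t)


def v(t):
    """Velocity, the derivative of D."""
    return math.cos(t)


a, b = 0.0, 2.0
exact = D(b) - D(a)
for n in (10, 100, 1000, 10_000):
    approx = left_riemann_sum(v, a, b, n)
    print(f"n = {n:>6}: sum = {approx:.6f}, error = {abs(approx - exact):.6f}")
```

Each tenfold increase in n should shrink the error by roughly a factor of ten, in line with the idea that narrower rectangles approximate the area better.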
Answer to Exercise 1. Change in position is 60 × 2 − 55 × 2 + 50 × 2 − 60 × 2 =
−10 miles. Total distance travelled is 60 × 2 + 55 × 2 + 50 × 2 + 60 × 2 = 450 miles.
Answer to Exercise 2. The speed is increasing so the left Riemann sums will
underestimate and right Riemann sums will overestimate the distance travelled by
the bowling ball, i.e. the height of the building. (a) Left Riemann sum is (0 + 4.9 +
9.8 + 14.7) × 0.5 = 14.7 and right Riemann sum is (4.9 + 9.8 + 14.7 + 19.6) × 0.5 =
24.5, so the height is between 14.7 and 24.5 meters.
(b) Left Riemann sum is (0 + 2.45 + 4.9 + 7.35 + 9.8 + 12.25 + 14.7 + 17.15) ×
0.25 = 17.15 and right Riemann sum is (2.45 + 4.9 + 7.35 + 9.8 + 12.25 + 14.7 +
17.15 + 19.6) × 0.25 = 22.05, so the height is between 17.15 and 22.05 meters. Notice
how the estimates improved compared to part (a) when we increased the number
of intervals. Compare the figures below with 4 and 8 intervals.
(c) Since v(t) = 9.8t is linear, the region under its graph is a triangle with sides 2
and 19.6, so its area is $\frac{1}{2} \cdot 2 \cdot 19.6 = 19.6$. This means that the exact height of the
building is 19.6 meters.
Answer to Exercise 3. (a) is a right Riemann sum with ∆t = 2, (b) is a right
Riemann sum with ∆t = 1, (c) is a left Riemann sum with ∆t = 0.5, and (d) is not
a Riemann sum because ∆t = 1 and we are overcounting one interval.
Answer to Exercise 4. Acceleration due to gravity is −9.8 m/s², which is the
derivative of velocity, so velocity v(t) is a linear function with the slope −9.8. In
our case, the initial velocity is v(0) = 29.4, so the function is v(t) = 29.4 − 9.8t.
At time t = 5, the velocity v(5) = 29.4 − 9.8 · 5 = −19.6 is negative, so the baseball
is on its way down. It reaches the peak and changes direction when 29.4 − 9.8t = 0, or
$t = \frac{29.4}{9.8} = 3$. Looking at the figure, we can now use the areas of the triangles to
compute the distances on the way up and on the way down. The area above the x-axis is
$A_1 = \frac{1}{2} \cdot 3 \cdot 29.4 = 44.1$, which is the distance on the way up, and the area below
the x-axis is $\frac{1}{2} \cdot 2 \cdot 19.6 = 19.6$, which is the distance on the way down. So the
change in position is 44.1 − 19.6 = 24.5 meters, which is the height of the baseball at
time t = 5.
Answer to Exercise 5. Decelerating at 7 m/s² means that acceleration is −7 m/s²,
i.e. the slope is −7. Of course, this agrees with the fact that the speed is decreasing
during braking. Total braking distance is the area under this graph, and it is given
to us as 35 m. In terms of v0 and t0 the area is $35 = \frac{1}{2} v_0 t_0$. Since the line
passes through points (0, v0) and (t0, 0), its slope is
$-7 = \frac{0 - v_0}{t_0 - 0} = -\frac{v_0}{t_0}$. This gives us two equations,
\[ \frac{v_0}{t_0} = 7, \qquad v_0 t_0 = 70. \]
From the first one, we get $v_0 = 7t_0$ and, plugging this into the second equation,
we get $(7t_0)t_0 = 70$, or $t_0^2 = 10$, or $t_0 = \sqrt{10}$. Then $v_0 = 7t_0 = 7\sqrt{10} \approx 22.14$ m/s,
which is above the speed limit.
Answer to Exercise 6. (a) The definite integral of y = 1 − |t| on the interval [−1, 1]
is denoted $\int_{-1}^{1} (1 - |t|)\,dt$.
(b) Its definition is the difference A1 − A2 of areas above and below the x-axis on
this interval, as in the figure. In the case of 1 − |t|, $A_1 = \frac{1}{2} \cdot 1 \cdot 2 = 1$ and
A2 = 0, so $\int_{-1}^{1} (1 - |t|)\,dt = 1$.
(c) Another meaning of this definite integral is the total displacement D(1) − D(−1)
of an object moving along a straight line with this velocity v(t) = 1 − |t|. In other
words, if D′(t) = 1 − |t| then $D(1) - D(-1) = \int_{-1}^{1} (1 - |t|)\,dt = 1$.
(d) Above, we noticed that this definite integral is 1, but we can also compute
this integral by approximating it with Riemann sums.
(e) A general Riemann sum in this case is $\sum_{i=1}^{n} (1 - |t_i^*|)\,\Delta t$. We can be more
specific and replace ∆t with $\frac{2}{n}$, because the interval has length 2 and we divide it
into n equal subintervals: $\sum_{i=1}^{n} (1 - |t_i^*|)\,\frac{2}{n}$.
(f) If we divide [−1, 1] into four subintervals then the left endpoints will be
$t_1^* = -1$, $t_2^* = -0.5$, $t_3^* = 0$ and $t_4^* = 0.5$, so the left Riemann sum will be
\[ \big((1 - |-1|) + (1 - |-0.5|) + (1 - |0|) + (1 - |0.5|)\big)\,\frac{2}{4} = 1. \]
Answer to Exercise 7. Since the distance x travelled by the mouse is $d_m(t)$, we can
solve $x = d_m(t)$ for t as $t = d_m^{-1}(x)$, where we assume that the function is invertible
because the mouse never stops running. Then the distance travelled by the cat at that
time is $d_c(t) = d_c(d_m^{-1}(x))$. The total displacement of the cat at time t is
\[ \int_0^t v_c(s)\,ds = \int_0^{d_m^{-1}(x)} v_c(s)\,ds. \]
3.2 Definite integrals: general case

Summary of last section. In the last section we considered a position D(t) of an
object changing with time t, typically moving along some road or along a straight
line. The rate of change of position is velocity v(t) = D′ (t), and the content of the
last section can be summarized in the following formula:
\[ D(b) - D(a) = \lim_{n\to\infty} \sum_{i=1}^{n} v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt. \]
In a few words, the formula says that if we break the change of position D(b) − D(a)
into small steps, each step ∆D is approximately $v(t_i^*)\Delta t$ and, if we interpret this
as an area of a rectangle, in the limit we get the difference of areas A1 − A2, which
was denoted $\int_a^b v(t)\,dt$ and called the definite integral. This means that we can
compute D(b) − D(a) by finding the areas A1 and A2 if possible, or by using Riemann
sums as an approximation. Of course, if we can use a calculator we can make this
approximation very precise by dividing into many rectangles, see for example the link
in the footnotes.1
Switching the point of view. Everything we discussed about a position D(t) and
velocity v(t) = D′ (t) as a function of time t can be applied to any quantity F(x)
and its derivative f (x) = F ′ (x) as a function of any input variable x. By looking at
the following table
Variable  Function  Rate of change  Small step              Riemann sum                       Definite integral
t         D(t)      v(t) = D′(t)    $\Delta D \approx v(t_i^*)\Delta t$  $\sum_{i=1}^{n} v(t_i^*)\Delta t$  $\int_a^b v(t)\,dt$
x         F(x)      f(x) = F′(x)    $\Delta F \approx f(x_i^*)\Delta x$  $\sum_{i=1}^{n} f(x_i^*)\Delta x$  $\int_a^b f(x)\,dx$
we can see that the analogue of the above formula in the general case will be
\[ F(b) - F(a) = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x = \int_a^b f(x)\,dx. \]
Of course, the key step is $\Delta F \approx f(x_i^*)\Delta x$, which is true because the rate of change
on a small interval should be almost constant, so the rate of change $f(x_i^*)$ at any
point $x_i^*$ on this interval can be approximated by the average rate of change
$\frac{\Delta F}{\Delta x}$. Let us add that when we write a definite integral $\int_a^b f(x)\,dx$, the
endpoints a and b are also called the limits of integration and the function f(x) is
called the integrand.
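The key step ∆F ≈ f(x_i*)∆x can be made concrete with a short Python sketch. The quantity F(x) = x³ with derivative f(x) = 3x² on the interval [1, 2] is a hypothetical example chosen here; it is not taken from the text.

```python
def F(x):
    """Hypothetical quantity, chosen only for illustration."""
    return x ** 3


def f(x):
    """Its rate of change, the derivative F'(x)."""
    return 3 * x ** 2


a, b, n = 1.0, 2.0, 1000
dx = (b - a) / n
xs = [a + i * dx for i in range(n + 1)]  # endpoints of the subintervals

exact_steps = [F(xs[i + 1]) - F(xs[i]) for i in range(n)]  # the increments of F
approx_steps = [f(xs[i]) * dx for i in range(n)]           # f at left endpoints times dx

total_exact = sum(exact_steps)    # the small steps add up to F(b) - F(a)
total_approx = sum(approx_steps)  # this is the left Riemann sum

print(f"F(b) - F(a) = {F(b) - F(a)}")
print(f"sum of exact steps = {total_exact:.6f}, Riemann sum = {total_approx:.6f}")
```

The exact steps add up to F(b) − F(a) no matter how large n is; the Riemann sum replaces each step by its approximation and gets close only when the steps are small.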
1 https://youtu.be/DcgiBBhGreY.
Example 1. The concentration of medication in a patient's blood is changing at a
rate of r(t) mg/L per hour at time t (measured in hours). If $\int_2^3 r(t)\,dt = -5$, what
are the units of 2, 3 and −5? What is the meaning of $\int_2^3 r(t)\,dt$?
Solution: Since the variable of integration t is time, both limits 2 and 3 represent
time, so their units are hours. If C(t) is the concentration of medication at time t
measured in mg/L, then the rate r(t) is its derivative C′(t), and the meaning of the
definite integral $\int_2^3 r(t)\,dt$ is the change of concentration C(3) − C(2) between 2
and 3 hours. It means that the units of −5 are mg/L, so the concentration decreased
between 2 and 3 hours by 5 mg/L.
Exercise 1. Average daily profit of a restaurant is increasing at a rate of f(x) dollars
per customer. If $\int_{40.5}^{60.5} f(x)\,dx = 150$, what are the units of 40.5, 60.5 and 150? Do
the limits 40.5 and 60.5 make sense? What is the meaning of $\int_{40.5}^{60.5} f(x)\,dx$?
Example 2. The following figure shows
the velocity y = v(t) (in m/s) of two cars
driving on the same road between t = 1
and t = 7 minutes. At time t = 1 min, the
first car is ahead by 10 meters. Express
how far ahead or behind the first car is
at time t = 7 min in terms of the areas
A1 , A2 , A3 and A4 in the figure.
Solution: Let us start by comparing the
distances travelled by the two cars between
1 and 7 minutes. Let us do this in two different ways.
First, Car 1 travelled the distance given by the area under its graph, which is
A2 + A3 + A4 , and Car 2 travelled the distance given by the area under its graph,
which is A1 + A3 + A4 . If we subtract the two, we get that
(A2 + A3 + A4) − (A1 + A3 + A4) = A2 − A1
is how much more Car 1 travelled compared to Car 2. If this is negative, it means
that Car 1 travelled less. Notice that we do not need to know A3 and A4 .
Another way to get the same answer is to divide the interval in the middle point
where the two velocities are equal, near 3.8. Up to that point, the velocity of Car 2
is bigger and the extra distance it travels compared to Car 1 is exactly the area A1 in
between the two graphs. If this is not clear, notice that on this interval Car 1 travels
A3 and Car 2 travels A1 + A3, so Car 2 travels extra distance A1. Similarly, in the second
half the velocity of Car 1 is bigger and it travels extra distance A2 . Combining the
two intervals, we get that Car 1 travels A2 − A1 extra distance compared to Car 2.
Again, if this is negative, it means that Car 1 travelled less.
Since Car 1 was 10 meters ahead at time t = 1 min and it travelled extra A2 − A1
between 1 and 7 min, does this mean that it is ahead by 10 + A2 − A1 ? The answer
is no! We need to pay attention to the units of the variables and ‘area’ in the figure.
Time on the x-axis has minutes as its units, while the velocity on the y-axis has
m/s as its units, so the units of area or distance v(t)∆t are m/s × min = m/s × 60s
= 60m. So one unit of ‘area’ in this figure is equal to 60 meters, which means
that the extra distance travelled by Car 1 is (A2 − A1 ) × 60m once we take units
into account. At time t = 7 minutes Car 1 is 10 + 60(A2 − A1 ) meters ahead. If
this number is negative, ahead by a negative number means that Car 1 is actually
behind.
Exercise 2. The following figure shows the growth rates (in cm/year) of boys and
girls in the United States between the ages of 6 and 17. The average height of boys
at age 6 is 116 cm, and the average height of girls at age 6 is 115 cm. Express the
difference between the average height of boys and girls at age 17 in terms of the
areas A1 and A2 in the figure. Estimate the difference between the average height of
boys and girls at age 18 using the data in the following table.
Age 6 7 8 9 10 11 12 13 14 15 16 17
Boys 6.70 6.18 5.90 5.60 5.51 5.68 6.54 7.64 6.73 4.46 2.58 1.10
Girls 6.70 6.27 6.00 5.98 6.33 6.68 6.04 4.29 2.42 1.13 0.64 0.33
Some properties of definite integrals. When solving the above problems, we
implicitly used some basic properties of areas. For example, if we divide some
interval [a, b] into two subintervals [a, c] and [c, b] then adding the areas on these
subintervals we get the total area on the entire interval. This means that
\[ \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx. \]
Example 3. If a function f(x) is even, and we know that $\int_{-3}^{7} f(x)\,dx = 5$ and
$\int_{3}^{7} f(x)\,dx = 1$, what is the integral $\int_{0}^{7} f(x)\,dx$?
Solution: By the above property,
\[ \int_{-3}^{7} f(x)\,dx = \int_{-3}^{3} f(x)\,dx + \int_{3}^{7} f(x)\,dx, \]
so $\int_{-3}^{3} f(x)\,dx = \int_{-3}^{7} f(x)\,dx - \int_{3}^{7} f(x)\,dx = 5 - 1 = 4$. Because f(x) is even,
as we can see in the figure, the integral $\int_{-3}^{3} f(x)\,dx$ consists of two equal parts on
subintervals [−3, 0] and [0, 3], so $\int_{0}^{3} f(x)\,dx = \frac{4}{2} = 2$.
Using the above property again, $\int_{0}^{7} f(x)\,dx = \int_{0}^{3} f(x)\,dx + \int_{3}^{7} f(x)\,dx = 2 + 1 = 3$.
Exercise 3. If a function f(x) is odd, and we know that $\int_{-3}^{7} f(x)\,dx = 5$ and
$\int_{0}^{7} f(x)\,dx = 7$, what is the integral $\int_{0}^{3} f(x)\,dx$?
Another important property of definite integrals is the following convention. We
will agree that
\[ \int_b^a f(x)\,dx = -\int_a^b f(x)\,dx. \]
In other words, swapping the limits of integration a and b changes the sign of the
integral. For example, $\int_{1}^{-3} f(x)\,dx = -\int_{-3}^{1} f(x)\,dx$. Let us explain why this convention
is very convenient.
• First, for convenience, we want the formula $\int_a^b f(x)\,dx = F(b) - F(a)$ to be true no
matter which number is bigger, a or b. This way we don’t need to worry which
number is bigger when we use this formula. But if we swap a and b, the right
hand side will become F(a) − F(b) = −(F(b) − F(a)), so only the sign will
change. That is why we agree that $\int_a^b f(x)\,dx$ will only change the sign if we swap
a and b.
• Another reason is that we want the formula $\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx$
to be true no matter what the numbers a, b and c are. If we take b equal to a then this
formula becomes
\[ \int_a^c f(x)\,dx + \int_c^a f(x)\,dx = \int_a^a f(x)\,dx = 0. \]
The integral on the right hand side is equal to 0, because the area from a to a is
zero, so this formula will be true if we agree that $\int_c^a f(x)\,dx = -\int_a^c f(x)\,dx$.
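The sign convention also appears naturally in Riemann sums: if the limits are written in decreasing order, the increment ∆x = (b − a)/n is negative, so every term changes sign. The sketch below uses a hypothetical function f(x) = x² and our own helper midpoint_sum to illustrate both bullet points.

```python
def midpoint_sum(f, a, b, n=1000):
    """Riemann sum from a to b with n equal steps, sampled at midpoints."""
    dx = (b - a) / n  # negative when b < a
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    """Hypothetical integrand, chosen only for illustration."""
    return x ** 2


forward = midpoint_sum(f, 1, 3)   # limits in increasing order
backward = midpoint_sum(f, 3, 1)  # limits swapped

# Splitting at a point c = 5 outside of [1, 3] still works with the convention.
split = midpoint_sum(f, 1, 5) + midpoint_sum(f, 5, 3)

print(forward, backward, split)
```

Here backward is the negative of forward, and split agrees with forward up to the approximation error, even though c = 5 lies outside the interval [1, 3].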
Example 4. If $\int_{-1}^{3} f(x)\,dx = 4$ and $\int_{2}^{-1} f(x)\,dx = 2$, what is $\int_{2}^{3} f(x)\,dx$?
Solution: Let us start with the usual property
$\int_{-1}^{3} f(x)\,dx = \int_{-1}^{2} f(x)\,dx + \int_{2}^{3} f(x)\,dx$.
We know that $\int_{-1}^{3} f(x)\,dx = 4$ and we want to find $\int_{2}^{3} f(x)\,dx$, so we need
$\int_{-1}^{2} f(x)\,dx$. But, by our convention, $\int_{-1}^{2} f(x)\,dx = -\int_{2}^{-1} f(x)\,dx = -2$, so we can
write the above property as $4 = -2 + \int_{2}^{3} f(x)\,dx$, and $\int_{2}^{3} f(x)\,dx = 4 + 2 = 6$.
Exercise 4. If $\int_{-2}^{-1} f(x)\,dx = 4$, $\int_{-2}^{2} f(x)\,dx = 6$ and $\int_{2}^{1} f(x)\,dx = -2$, what is
$\int_{-1}^{1} f(x)\,dx$?
Two more properties of definite integrals are that the integral of the sum is
equal to the sum of integrals,
\[ \int_a^b \big(f(x) + g(x)\big)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx, \]
and
\[ \int_a^b \kappa f(x)\,dx = \kappa \int_a^b f(x)\,dx, \]
i.e. we can take a constant factor κ outside of the integral. The first property is true
because we can add the areas vertically, and the second property is true because
stretching a function κ times vertically multiplies the area by κ. The two properties
together are called the linearity of integral.
Example 5. If $\int_{1}^{3} f(x)\,dx = -2$ and $\int_{1}^{3} g(x)\,dx = -1$, what is $\int_{1}^{3} (5f(x) - 7g(x))\,dx$?
Solution: By the linearity of integral,
\[ \int_{1}^{3} (5f(x) - 7g(x))\,dx = 5\int_{1}^{3} f(x)\,dx - 7\int_{1}^{3} g(x)\,dx = 5(-2) - 7(-1) = -3. \]
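Linearity can be checked numerically on concrete functions. In the sketch below, f(x) = 1 − x and g(x) = x − 2.5 are hypothetical choices made here so that their integrals on [1, 3] match the values −2 and −1 given in Example 5; the example itself does not specify f and g.

```python
def midpoint_sum(f, a, b, n=1000):
    """Riemann sum of f on [a, b] with n equal subintervals, sampled at midpoints."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    """Hypothetical f whose integral on [1, 3] is -2."""
    return 1 - x


def g(x):
    """Hypothetical g whose integral on [1, 3] is -1."""
    return x - 2.5


int_f = midpoint_sum(f, 1, 3)
int_g = midpoint_sum(g, 1, 3)
combined = midpoint_sum(lambda x: 5 * f(x) - 7 * g(x), 1, 3)

print(int_f, int_g, combined, 5 * int_f - 7 * int_g)
```

The last two printed numbers agree, which is linearity applied to this particular pair of functions.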
Exercise 5. If $\int_a^b \big(f(x) + g(x)\big)\,dx = -2$ and $\int_a^b \big(f(x) - g(x)\big)\,dx = 4$, what is
$\int_a^b g(x)\,dx$?
Another property of definite integrals is that
\[ \text{if } f(x) \le g(x) \text{ then } \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]
If a function is bigger then its definite integral is also bigger, which is obvious by
looking at the areas. This is called monotonicity of integral. Notice that this is only
true if a ≤ b! Unlike other properties, if we flip a and b the minus sign will also
flip the inequality. For example, if $2 = \int_1^2 f(x)\,dx \le \int_1^2 g(x)\,dx = 7$ then flipping the
limits we get the opposite inequality $-2 = \int_2^1 f(x)\,dx \ge \int_2^1 g(x)\,dx = -7$. So the
order of limits is important for this property.
Example 6. If f(x) ≤ −2 then which number is bigger, $\int_{6}^{1} f(x)\,dx$ or 10?
Solution: If we use the limits in the increasing order, 1 to 6, then the monotonicity
of integral tells us that f(x) ≤ −2 implies that
\[ \int_{1}^{6} f(x)\,dx \le \int_{1}^{6} (-2)\,dx = (-2)(6 - 1) = -10. \]
Flipping the limits changes the sign, so $\int_{6}^{1} f(x)\,dx \ge 10$.
Exercise 6. Order the following definite integrals from the smallest to the largest
without any calculations first. After that find their values.
(a) $\int_{-1}^{1} 1\,dx$   (b) $\int_{-1}^{1} (1 - |x|)\,dx$   (c) $\int_{-1}^{1} \sqrt{1 - x^2}\,dx$.
Example 7. Which number is bigger,
\[ \int_0^{\pi/2} \cos(x)\,dx \quad\text{or}\quad \frac{\pi}{8}\left(\cos\Big(0.25\,\frac{\pi}{2}\Big) + \cos\Big(0.5\,\frac{\pi}{2}\Big) + \cos\Big(0.75\,\frac{\pi}{2}\Big) + \cos\Big(\frac{\pi}{2}\Big)\right)? \]
Solution: We should recognize that the second number is the right Riemann sum of
$\int_0^{\pi/2} \cos(x)\,dx$ on the interval $[0, \frac{\pi}{2}]$ with n = 4 subintervals, each of length $\frac{\pi}{8}$. Since
cos(x) is decreasing on this interval, we already discussed in the last section that
the right Riemann sum is underestimating the integral, so it is smaller. Of course,
this is the same idea as monotonicity, because the rectangles with height at the right
endpoints will be below the function when it is decreasing.
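Exercise 7. Which number is bigger,
\[ \int_0^{\pi/2} \sin(x)\,dx \quad\text{or}\quad \frac{\pi}{8}\left(\sin\Big(0.25\,\frac{\pi}{2}\Big) + \sin\Big(0.5\,\frac{\pi}{2}\Big) + \sin\Big(0.75\,\frac{\pi}{2}\Big) + \sin\Big(\frac{\pi}{2}\Big)\right)? \]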
The next two problems will refer to the following two figures.
Example 8. The definite integral $\int_{-1}^{1} f(x)\,dx$ corresponding to the left figure above
is approximately equal to which of the following?
(a) 1.12 (b) 1.58 (c) 2.02 (d) 2.57
Solution: It looks like the curve is closely tracing half a circle $y = \sqrt{1 - x^2}$, whose
definite integral is $\frac{\pi}{2} \approx 1.57$, so the best guess would be (b) 1.58.
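The value π/2 used in Example 8 comes from the area formula for a circle, but it can also be approximated with a Riemann sum. The sketch below is one way to do this; midpoint_sum and half_circle are names introduced here for illustration.

```python
import math


def midpoint_sum(f, a, b, n=10_000):
    """Riemann sum of f on [a, b] with n equal subintervals, sampled at midpoints."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def half_circle(x):
    """Top half of the unit circle, y = sqrt(1 - x^2)."""
    # max(..., 0.0) guards against tiny negative values from rounding near x = -1 and 1.
    return math.sqrt(max(1 - x * x, 0.0))


area = midpoint_sum(half_circle, -1, 1)
print(f"Riemann sum: {area:.4f}, pi / 2: {math.pi / 2:.4f}")
```

The Riemann sum agrees with π/2 to several decimal places, which supports choosing the answer closest to 1.57.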
Exercise 8. The definite integral $\int_{-1}^{1} f(x)\,dx$ corresponding to the right figure above
is approximately equal to which of the following?
(a) 1 (b) 1.5 (c) 2 (d) 2.5
Answer to Exercise 1. Since the quantity we are talking about is the average daily
profit (in dollars) and the rate f(x) is given in dollars per customer, the variable
x must be the number of customers, so the unit of 40.5 and 60.5 is the number
of customers. It might look strange to consider half a customer but, because we
are talking about the average daily profit over a long period of time, x can also
be thought of as the average daily number of customers, so the limits 40.5 and
60.5 make sense. If P(x) is the average daily profit when x is the average number
of customers, the rate f(x) is its derivative P′(x), and the meaning of the definite
integral $\int_{40.5}^{60.5} f(x)\,dx$ is the extra average daily profit P(60.5) − P(40.5) when the
average number of customers increases from 40.5 to 60.5. It means that the units
of 150 are dollars.
Answer to Exercise 2. The difference between the average height of boys and
girls at age 17 is 1 + A2 − A1 , similarly to Example 2 and because the units of area
are cm/year × year = cm. Using the values in the table, (6.70 + 6.27 + 6.00 +
5.98 + 6.33 + 6.68 + 6.04 + 4.29 + 2.42 + 1.13 + 0.64 + 0.33) × 1 = 52.81 is the
left Riemann sum for the growth of girls on the interval [6, 18], and (6.70 + 6.18 +
5.90 + 5.60 + 5.51 + 5.68 + 6.54 + 7.64 + 6.73 + 4.46 + 2.58 + 1.10) × 1 = 64.62
is the left Riemann sum for the growth of boys on the interval [6, 18]. Although age
18 is not in the table, using the twelve values from age 6 to 17 estimates growth
over the twelve year period from 6 to 18. The estimate of the difference of average
heights at age 18 is 1 + 64.62 − 52.81 = 12.81 cm.
Answer to Exercise 3. By the property of integrals,
\[ \int_{-3}^{7} f(x)\,dx = \int_{-3}^{0} f(x)\,dx + \int_{0}^{7} f(x)\,dx, \]
so $\int_{-3}^{0} f(x)\,dx = \int_{-3}^{7} f(x)\,dx - \int_{0}^{7} f(x)\,dx = 5 - 7 = -2$. Because f(x) is odd, as we
can see in the figure, the integral $\int_{0}^{3} f(x)\,dx$ differs from $\int_{-3}^{0} f(x)\,dx$ only
by a sign, so it is equal to 2.
Answer to Exercise 4. Here we divide the interval [−2, 2] into [−2, −1], [−1, 1]
and [1, 2], so
\[ \int_{-2}^{2} f(x)\,dx = \int_{-2}^{-1} f(x)\,dx + \int_{-1}^{1} f(x)\,dx + \int_{1}^{2} f(x)\,dx. \]
We know the first two integrals, and $\int_{2}^{1} f(x)\,dx = -2$ gives that $\int_{1}^{2} f(x)\,dx = 2$, so
$6 = 4 + \int_{-1}^{1} f(x)\,dx + 2$ and $\int_{-1}^{1} f(x)\,dx = 0$.
Answer to Exercise 5. If we denote $p = \int_a^b f(x)\,dx$ and $q = \int_a^b g(x)\,dx$, we know
that
$$p + q = \int_a^b \big(f(x) + g(x)\big)\,dx = -2,$$
$$p - q = \int_a^b \big(f(x) - g(x)\big)\,dx = 4.$$
Solving these two equations for p and q, we get that p = 1 and q = −3, so
$\int_a^b g(x)\,dx = -3$.
Answer to Exercise 6. The key here is to remember that $y = \sqrt{1 - x^2}$ is the top half of the circle of radius 1 centered at the origin, because squaring both sides we get $y^2 = 1 - x^2$, or $x^2 + y^2 = 1$. The function $y = 1 - |x|$ has a corner shape consisting of two lines: 1 − x when x is positive and 1 + x when x is negative. Sketching all three functions we can see that
$$1 - |x| \le \sqrt{1 - x^2} \le 1$$
so, by monotonicity, the integrals are arranged in the same order, (b) ≤ (c) ≤ (a).
Using geometry, their values are: (a) 2 × 1 = 2, (b) $\frac{1}{2} \times 2 \times 1 = 1$ and (c) $\frac{\pi}{2}$.
Answer to Exercise 7. The sum is the right Riemann sum of $\int_0^{\pi/2} \sin(x)\,dx$ on the
interval $[0, \frac{\pi}{2}]$ with n = 4 subintervals. Since sin(x) is increasing on this interval,
the right Riemann sum overestimates the integral, so it is bigger.
Answer to Exercise 8. It looks like the curve is closely tracing a linear function
1 + x on the interval [−1, 0] and a constant function 1 on the interval [0, 1], so the
best guess would be $\frac{1}{2} + 1 = 1.5$, i.e. (b).
3.3 Fundamental Theorem of Calculus

In the previous sections we learned that: if $f(x) = F'(x)$ then
$$\int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a).$$
This statement is called the Fundamental Theorem of Calculus (FTC for short).
The notation in the middle, $F(x)\big|_a^b$, is another way to write F(b) − F(a), which
will be very convenient when using this formula. So far we have mostly used this
formula to find the change F(b) − F(a) of some quantity F(x) by either computing
the definite integral $\int_a^b f(x)\,dx$ using areas (for simple enough graphs of f(x)) or
approximating it by Riemann sums. Below we will start discussing a different way
to use the FTC, by systematically guessing what the function F(x) could be if we
know its derivative f (x). But first let us solve a couple of simple problems using
the FTC and emphasizing the relationship f (x) = F ′ (x).
Example 1. If $f(x) = F'(x)$ and
$$F(4.5) = -3, \qquad \int_{1.5}^{4.5} f(x)\,dx = 3.2, \qquad F(7) = 1,$$
what is F(1.5) and $\int_{1.5}^{7} f(x)\,dx$?
Solution: Using that $\int_{1.5}^{4.5} f(x)\,dx = F(4.5) - F(1.5)$ and the information given, we
get that 3.2 = −3 − F(1.5) and so F(1.5) = −6.2. As for the integral $\int_{1.5}^{7} f(x)\,dx$,
by FTC it is equal to F(7) − F(1.5) = 1 − (−6.2) = 7.2.
If we haven’t already computed F(1.5), we could also compute $\int_{1.5}^{7} f(x)\,dx$ by
breaking it into two integrals:
$$\int_{1.5}^{7} f(x)\,dx = \int_{1.5}^{4.5} f(x)\,dx + \int_{4.5}^{7} f(x)\,dx.$$
The first integral is given to us, 3.2, and the second can be computed using the FTC
and the given values: $\int_{4.5}^{7} f(x)\,dx = F(7) - F(4.5) = 1 - (-3) = 4$, so $\int_{1.5}^{7} f(x)\,dx = 3.2 + 4 = 7.2$.
Exercise 1. Suppose $f(x) = F'(x)$. Given the
graph of y = F(x) in the figure, describe the
geometric meaning of:
(a) $f(a)$
(b) $\int_a^b f(x)\,dx$
(c) $\frac{1}{b-a}\int_a^b f(x)\,dx$
Average value. The quantity that appeared in the last exercise,
$$\frac{1}{b-a}\int_a^b f(x)\,dx,$$
is called the average of f(x) on the interval [a, b]. If we look at the graph of
f(x) in the figure, the rectangle with the height equal to the average $\frac{1}{b-a}\int_a^b f(x)\,dx$
will have the same area as the area under f(x). Also, when f(x) is the rate of change of F(x) then
$\frac{1}{b-a}\int_a^b f(x)\,dx = \frac{F(b)-F(a)}{b-a}$
is the slope of the secant line of F(x), which was called the average rate of change
of F(x) on [a, b]. In the next two problems, we will compute averages of two functions
f(x) while at the same time trying to guess F(x) such that $f(x) = F'(x)$.
Example 2. The Gateway Arch in St. Louis
is 630 ft wide and 630 ft tall. Its height can
be well approximated by the function
$$y = 704 - 37e^{0.0093x} - 37e^{-0.0093x}$$
for x between −315 and 315 feet, shown by
the yellow dashed line in the figure.² (Its
shape is an example of a weighted catenary
curve.) What is the average height of the
Gateway Arch?
Solution: First, by linearity of the integral, the definite integral of the height y(x) on
[−315, 315] can be broken into three parts:
$$\int_{-315}^{315} 704\,dx - 37\int_{-315}^{315} e^{0.0093x}\,dx - 37\int_{-315}^{315} e^{-0.0093x}\,dx.$$
The first term is just 704 × 630 = 443520, the area of a rectangle. To compute the
second and third integrals, we would like to use the FTC and write them as F(315) −
F(−315). Let us start with the second term. Can we guess what function F(x) has
the derivative $F'(x) = e^{0.0093x}$? We know that $(e^x)' = e^x$, so we can try $e^{0.0093x}$. Its
derivative $(e^{0.0093x})' = 0.0093e^{0.0093x}$ is almost what we want, but it has an extra
factor 0.0093 because of the chain rule. This is not a big problem, because we
can divide by 0.0093 and take $F(x) = \frac{1}{0.0093}e^{0.0093x}$. Now we can see that $F'(x) = e^{0.0093x}$
as we wanted. Therefore, by the FTC,
$$\int_{-315}^{315} e^{0.0093x}\,dx = \frac{e^{0.0093x}}{0.0093}\Big|_{-315}^{315} = \frac{e^{0.0093\times 315}}{0.0093} - \frac{e^{-0.0093\times 315}}{0.0093} = 2006.97.$$
2 Background photo by Sam Valadi, https://www.flickr.com/photos/132084522N05/17275578342/.
Similarly, we can guess that $e^{-0.0093x}$ is the derivative of $-\frac{1}{0.0093}e^{-0.0093x}$ and
$$\int_{-315}^{315} e^{-0.0093x}\,dx = -\frac{e^{-0.0093x}}{0.0093}\Big|_{-315}^{315} = -\frac{e^{-0.0093\times 315}}{0.0093} - \Big(-\frac{e^{0.0093\times 315}}{0.0093}\Big),$$
which is also equal to 2006.97. Combining all three integrals, we get 443520 −
37 × 2006.97 − 37 × 2006.97 = 295004.22. To get the average of y(x) we need to
divide this by the length 630 of the interval [−315, 315], and we get that the average
height is ≈ 468 feet.
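As a sanity check on this computation, here is a minimal Python sketch (the function names `height`, `antiderivative` and `midpoint_sum` are our own illustrative choices, not part of the course material). It evaluates the average once with the antiderivative guessed above and once with a midpoint Riemann sum; the two values should agree and come out close to 468 feet.

```python
import math

K = 0.0093          # exponent coefficient from the height formula
HALF_WIDTH = 315.0  # the arch spans x = -315 ft to x = 315 ft


def height(x):
    """Approximate height of the arch (in feet) at horizontal position x."""
    return 704 - 37 * math.exp(K * x) - 37 * math.exp(-K * x)


def antiderivative(x):
    """An antiderivative of height(x), guessed term by term as in the text."""
    return 704 * x - 37 * math.exp(K * x) / K + 37 * math.exp(-K * x) / K


def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


a, b = -HALF_WIDTH, HALF_WIDTH
integral_ftc = antiderivative(b) - antiderivative(a)    # F(b) - F(a)
integral_numeric = midpoint_sum(height, a, b, 10_000)   # independent check

average_ftc = integral_ftc / (b - a)
average_numeric = integral_numeric / (b - a)
print(average_ftc, average_numeric)
```

The Riemann sum does not use the guessed antiderivative at all, so agreement between the two numbers is evidence that the guess was right.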
Exercise 2. A pizza pie at room temperature
of 20°C is put into a brick oven, after which
the temperature T of the pizza top starts to
increase according to the formula
$$T = 100 - 80e^{-t}$$
where time t is in minutes. What is the average
temperature during the first two minutes?
Antiderivatives, a.k.a. indefinite integrals. If f(x) is the derivative of F(x)
then F(x) is called an antiderivative of f(x). When we were guessing F(x) in
the problems above, we were looking for an antiderivative. Another name for an
antiderivative is indefinite integral. The notation for an antiderivative/indefinite
integral of f(x) is $\int f(x)\,dx$, which is similar to a definite integral only without
the limits a and b. In other words:
$$\text{If } f(x) = F'(x) \text{ then } \int f(x)\,dx = F(x) + C$$
where C is any constant. The reason we need to add a constant +C is because, if
$F'(x) = f(x)$ then $(F(x) + C)' = f(x) + 0 = f(x)$, so given one antiderivative F(x)
any shift F(x) + C will also be an antiderivative. The constant C will be important
below when we look for an antiderivative passing through a specific point.
Since we know derivatives of many basic functions, we can reverse the direction
to make a list of a few basic indefinite integrals:
$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \quad \text{if } n \ne -1,$$
$$\int \frac{1}{x}\,dx = \ln|x| + C,$$
$$\int e^x\,dx = e^x + C,$$
$$\int \cos(x)\,dx = \sin(x) + C,$$
$$\int \sin(x)\,dx = -\cos(x) + C.$$
The reason we put |x| inside the logarithm in the second integral is because $\frac{1}{x}$ is
defined for negative x while ln(x) is defined only for positive x and, for negative
x, an antiderivative of $\frac{1}{x}$ will be ln(−x), which is the same as ln|x|. All the other
antiderivatives can be checked by taking derivatives. We can also combine these
examples with the following simple rule:
$$\text{If } \int f(x)\,dx = F(x) + C \text{ then } \int f(b + mx)\,dx = \frac{1}{m}F(b + mx) + C.$$
We have already used this in the above example when we guessed that an
antiderivative of $e^{mx}$ is $\frac{1}{m}e^{mx}$. Here we can similarly check that, if $F'(x) = f(x)$ then
$$\Big(\frac{1}{m}F(b + mx)\Big)' = \frac{1}{m}\big(F(b + mx)\big)' = \frac{1}{m}F'(b + mx)\,m = f(b + mx),$$
by using the chain rule in the middle, so indeed $\frac{1}{m}F(b + mx)$ is an antiderivative of
f(b + mx). This rule is the simplest case of the so-called integration by substitution
that we will study later. Another obvious rule is the linearity of indefinite integrals
saying that an antiderivative of the sum is the sum of antiderivatives.
Example 3. Compute the following indefinite integrals:
(a) $\int (3 + 4t + \frac{1}{t})\,dt$ (b) $\int (6e^{-z/2} - e^{3z})\,dz$ (c) $\int (\sin(2x + 1) - \frac{4}{x^2})\,dx$.
Solution: (a) By linearity and the above list:
$$\int \Big(3 + 4t + \frac{1}{t}\Big)\,dt = \int 3\,dt + 4\int t\,dt + \int \frac{1}{t}\,dt = 3t + 4\frac{t^2}{2} + \ln|t| + C.$$
Notice that we do not need to add +C to each indefinite integral separately, because
all those unknown constants can be combined into one constant. Of course, we can
simplify the answer as $3t + 2t^2 + \ln|t| + C$.
(b) Using linearity and the above (b + mx) rule:
$$\int (6e^{-z/2} - e^{3z})\,dz = 6\int e^{-z/2}\,dz - \int e^{3z}\,dz = \frac{6e^{-z/2}}{-1/2} - \frac{e^{3z}}{3} + C$$
which can be simplified as $-12e^{-z/2} - \frac{e^{3z}}{3} + C$.
(c) By linearity, using the above list, and using the (b + mx) rule:
$$\int \Big(\sin(2x + 1) - \frac{4}{x^2}\Big)\,dx = \int \sin(2x + 1)\,dx - 4\int x^{-2}\,dx = -\frac{\cos(2x + 1)}{2} - \frac{4x^{-2+1}}{-2+1} + C$$
which can be simplified to $-\frac{\cos(2x+1)}{2} + \frac{4}{x} + C$.
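Any antiderivative can be checked by differentiating it. The following Python sketch does this numerically for the three answers of Example 3, using a central-difference approximation of the derivative; the helper names (`derivative`, `pairs`, `max_error`) and the sample points are our own illustrative assumptions.

```python
import math


def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Each entry is (integrand, antiderivative from Example 3 with C = 0).
pairs = {
    "a": (lambda t: 3 + 4 * t + 1 / t,
          lambda t: 3 * t + 2 * t**2 + math.log(abs(t))),
    "b": (lambda z: 6 * math.exp(-z / 2) - math.exp(3 * z),
          lambda z: -12 * math.exp(-z / 2) - math.exp(3 * z) / 3),
    "c": (lambda x: math.sin(2 * x + 1) - 4 / x**2,
          lambda x: -math.cos(2 * x + 1) / 2 + 4 / x),
}

# Avoid 0, where 1/t and 4/x^2 are undefined; negative points test ln|t|.
sample_points = [-1.5, -0.5, 0.7, 1.3]


def max_error(label):
    """Largest gap between F'(x) and f(x) over the sample points."""
    f, F = pairs[label]
    return max(abs(derivative(F, x) - f(x)) for x in sample_points)


for label in pairs:
    print(label, max_error(label))
```

All three errors should be tiny (limited only by the finite-difference approximation), including at the negative sample points, which is where the absolute value in ln|t| matters.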
Exercise 3. Compute the following indefinite integrals:
(a) $\int (\sqrt{3u + 1} - \frac{2}{2u+1})\,du$ (b) $\int (1 - 6e^{-s/3})\,ds$ (c) $\int 2\cos(\frac{r}{2} - 1)\,dr$.
Example 4. Find the total area between the graph
of y = cos(x) and the x-axis on the interval [0, 2.5].
Solution: We know that cosine is positive on
$[0, \frac{\pi}{2}]$ and negative on $[\frac{\pi}{2}, \pi]$, and the right
endpoint 2.5 is somewhere in the second interval.
We want to compute the sum of areas A1 + A2 in the
figure. The first area is just the definite integral
which can be computed using the FTC:
$$A_1 = \int_0^{\pi/2} \cos(x)\,dx = \sin(x)\Big|_0^{\pi/2} = \sin(\pi/2) - \sin(0) = 1 - 0 = 1.$$
Notice that when using the FTC we did not write +C in (sin(x) + C), because the
constant will cancel out anyway: (F(b) + C) − (F(a) + C) = F(b) − F(a).
To find A2 , we know that −A2 is the definite integral on $[\frac{\pi}{2}, 2.5]$, so
$$-A_2 = \int_{\pi/2}^{2.5} \cos(x)\,dx = \sin(x)\Big|_{\pi/2}^{2.5} = \sin(2.5) - \sin(\pi/2) = -0.4015.$$
This means that A2 = 0.4015 and the total area is A1 + A2 = 1.4015.
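The same bookkeeping (split the interval where the function changes sign, apply the FTC on each piece, add absolute values) can be sketched in Python. The function names `total_area` and `midpoint_sum` are hypothetical helpers introduced only for this illustration.

```python
import math


def total_area(F, a, b, zeros):
    """Total area between y = F'(x) and the x-axis on [a, b].

    F is an antiderivative and `zeros` lists the points where F' changes sign.
    On each piece the FTC gives the signed area F(right) - F(left).
    """
    points = [a] + [z for z in zeros if a < z < b] + [b]
    return sum(abs(F(right) - F(left))
               for left, right in zip(points, points[1:]))


def midpoint_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Example 4: y = cos(x) on [0, 2.5], with the sign change at pi/2.
area = total_area(math.sin, 0.0, 2.5, zeros=[math.pi / 2])

# Independent check: integrate |cos(x)| directly with a Riemann sum.
area_numeric = midpoint_sum(lambda x: abs(math.cos(x)), 0.0, 2.5)
print(area, area_numeric)
```

Both numbers should be close to the value 1.4015 found above. Note that integrating cos(x) over [0, 2.5] without splitting would give sin(2.5) ≈ 0.5985, the net signed area, not the total area.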
Exercise 4. Find the total area between the graph of $y = \frac{1}{2}x(x - 1)(x - 3)$ and the
x-axis on the interval [0, 2].
Initial Value Problem. Given the rate of change f (x) of some quantity F(x),
sometimes we want to find F(x) given additional information that F(x0 ) = y0 ,
which is also called the initial value problem. If we can find any antiderivative
G(x) of f (x) then G(x) +C is also an antiderivative so, if we want an antiderivative
to pass through a point (x0 , y0 ), we can plug it in, y0 = G(x0 ) + C, and find the
constant C = y0 − G(x0 ).
Example 5. The velocity of a ball oscillating on
a vertical spring is v(t) = cos(πt). If at time t = 0
the height of the ball is 1, what is its height h(t)
at time t?
Solution: Since $h'(t) = v(t)$, the height h(t) is an
antiderivative of cos(πt) such that h(0) = 1. We
can take a general antiderivative $\frac{1}{\pi}\sin(\pi t) + C$ of
the velocity and make sure that $\frac{1}{\pi}\sin(0) + C = 1$, so $C = 1 - \frac{1}{\pi}\sin(0) = 1$. It means
that the height at time t is $h(t) = \frac{1}{\pi}\sin(\pi t) + 1$.
Exercise 5. Blood alcohol content (BAC) is measured in
grams per 100ml, so 1 BAC = 0.01 g/ml. An average person
metabolizes alcohol at a rate of 0.015 BAC/hour. If a person
stops drinking at 2 a.m. and at 8 a.m. their BAC level is 0.09,
what is their BAC level at time t after 2 a.m., measured in
hours?³
FTC-2: Reconstruction Theorem. Another way to solve the initial value problem
$F'(x) = f(x)$, $F(x_0) = y_0$, is to take the statement $\int_a^b f(x)\,dx = F(b) - F(a)$ of
the FTC and replace a by x0 and b by x:
$$\int_{x_0}^{x} f(t)\,dt = F(x) - F(x_0).$$
Notice that we also replaced the variable of integration by t in f(t) dt because x is
now reserved for the upper limit b = x. Of course, instead of t we can use any other
name that is not reserved, such as s, u, etc. If we want F(x0) to be equal to y0 , we
can replace F(x0) by y0 and rewrite this formula as
$$F(x) = y_0 + \int_{x_0}^{x} f(t)\,dt.$$
The FTC rewritten in this form is known as the second form of the FTC or the
Reconstruction Theorem. In this statement, we are thinking of the upper limit b as
an independent variable x, and the right hand side $y_0 + \int_{x_0}^{x} f(t)\,dt$ gives us a specific
antiderivative with the initial value y0 at x = x0 .

Example 6. Blood alcohol content (BAC) is measured in grams per 100ml, so
1 BAC = 0.01 g/ml. An average person metabolizes alcohol at a rate of 0.015
BAC/hour. If a person stops drinking at 2 a.m. and at 8 a.m. their BAC level is
0.09, what is their BAC level at time t after 2 a.m., measured in hours?
Solution: If A(t) is the BAC level at time t, then its derivative is −0.015 (minus sign
because the level is decreasing) and A(6) = 0.09. By the Reconstruction theorem,
$A(t) = 0.09 + \int_6^t (-0.015)\,ds = 0.09 + (-0.015)(t - 6) = 0.18 - 0.015t$. Of course,
this formula only works until A(t) reaches zero, 0.18 − 0.015t = 0, i.e. until t = 12
hours, or 2 p.m.
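The Reconstruction Theorem translates almost word for word into a small numerical routine. The Python sketch below is only an illustration under our own naming (`reconstruct`, `bac`, `spring_height` are hypothetical helpers); it approximates the integral by a midpoint Riemann sum and applies it to Example 6 and to the spring from Example 5.

```python
import math


def reconstruct(rate, x0, y0, x, n=1000):
    """Approximate F(x) = y0 + integral from x0 to x of rate(s) ds.

    Uses a midpoint Riemann sum; x may lie on either side of x0,
    in which case dx is negative and the integral changes sign.
    """
    dx = (x - x0) / n
    return y0 + sum(rate(x0 + (i + 0.5) * dx) for i in range(n)) * dx


def bac(t):
    """BAC level t hours after 2 a.m. (Example 6): A' = -0.015, A(6) = 0.09."""
    return reconstruct(lambda s: -0.015, 6.0, 0.09, t)


def spring_height(t):
    """Height of the ball in Example 5: h' = cos(pi t), h(0) = 1."""
    return reconstruct(lambda s: math.cos(math.pi * s), 0.0, 1.0, t)


print(bac(0), bac(12))                                    # level at 2 a.m. and 2 p.m.
print(spring_height(0.5), math.sin(math.pi * 0.5) / math.pi + 1)
```

The first line should reproduce A(0) = 0.18 and A(12) = 0 up to rounding, and the second line compares the reconstructed height with the formula h(t) = sin(πt)/π + 1 at t = 0.5.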
Exercise 6. A patient is given some medication in a 50 mg immediate release pill
and 100 mg delayed-release pill which is released at a rate $1600te^{-4t}$ mg/h, where
time t is measured in hours. What amount of medication has been delivered by
time t? The answer can be in an integral form.
Example 7. Given a graph of the derivative y = f ′ (x) in the left figure below of
some continuous function f (x) on the interval [−4, 4], sketch the graph of y = f (x)
if f (1) = 4.
3 Illustration from www.houstondwiattorney.net.
Solution: First of all, notice that here we departed from the convention to call the
derivative f (x) and antiderivative F(x). Instead, here f ′ (x) is the derivative of f (x)
and f (x) is the antiderivative of f ′ (x) such that f (1) = 4. We see that f ′ (x) = −2
on the interval [−4, −2) which means that f (x) has slope −2 there, f ′ (x) = 3 on
the interval (−2, 1) where the slope of f (x) is 3, and f ′ (x) = −1 on the interval
(1, 4) where the slope of f (x) is −1.
In the middle figure above we sketch a continuous function f (x) with such
slopes on these three intervals. We specified that f (x) is continuous, so it does
not have jumps at x = −2 and x = 1, but the derivative is not defined at those
points. We labelled this graph by y = f (x) +C in the middle figure, because it can
be shifted vertically by any constant C if we only know its derivative f ′ (x).
However, because we are given that f (1) = 4, we can now fix the position of the
function f (x) as depicted in the right figure above. In this case, each piece of the
function is linear so we could easily find its formula if we wanted to: f (x) = −x + 5
on [1, 4], f (x) = 3x + 1 on [−2, 1], and f (x) = −2x − 9 on [−4, −2].
Exercise 7. Given f (x) in the figure on the right,
which function below could be its antiderivative
such that F(0) = 1?
(a) (b)
(c) (d)
The FTC tells us that, if $f(x) = F'(x)$ then $\int_a^x f(t)\,dt = F(x) - F(a)$. As above,
we replaced the upper limit b by a variable x, because, for example, we want to
compute this definite integral for any upper limit x. If we take the derivative of
both sides with respect to x, we get $\frac{d}{dx}\int_a^x f(t)\,dt = (F(x) - F(a))' = F'(x) = f(x)$.
This gives us yet another consequence of the FTC:
$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$
This formula is very useful, because even if we do not know what the integral
$\int_a^x f(t)\,dt$ is, we know that its derivative is f(x), so we can deduce some of the
properties of this integral $\int_a^x f(t)\,dt$.
The next two problems will refer to the following figures.
Example 8. Suppose that $h(x) = \int_0^x f(t)\,dt$ with f(t) in the left figure above.
(a) What is h(0) and $h'(1)$?
(b) On what interval is h(x) concave up?
(c) Where is the global minimum and maximum of h(x) on [0, 2]?
Solution: (a) $h(0) = \int_0^0 f(t)\,dt = 0$, because the interval [0, 0] has length zero. By
the above formula, $h'(x) = \big(\int_0^x f(t)\,dt\big)' = f(x)$ so $h'(1) = f(1) = 0$.
(b) Since $h'(x) = f(x)$ and h(x) is concave up on the interval where its derivative
is increasing, this happens on the interval [0, 0.5] where f(x) is increasing.
(c) Since h′ (x) = f (x) is positive on [0, 1] and negative on [1, 2] in the figure,
h(x) is increasing on [0, 1] and decreasing on [1, 2]. It means that the maximum is
at x = 1 and minimum is at one of the endpoints x = 0 or x = 2. By FTC, h(x)
increases on [0, 1] by the area above the x-axis in the figure and decreases on [1, 2]
by the area below the x-axis in the figure. Since the area below the x-axis is bigger,
h(x) will decrease more on [1, 2] than it will increase on [0, 1]. This means that the
minimum will be at x = 2.
Exercise 8. Suppose that $h(x) = \int_0^x f(t)\,dt$ with f(t) in the right figure above.
(a) What is h(0) and $h'(-1)$?
(b) On what interval is h(x) concave up?
(c) Where is the global minimum and maximum of h(x) on [−1, 1]?
There is a more general formula:
$$\frac{d}{dx}\int_{a(x)}^{b(x)} f(t)\,dt = f\big(b(x)\big)\,b'(x) - f\big(a(x)\big)\,a'(x).$$
First of all, in this formula we replaced both the lower limit a and upper limit b by
some functions a(x) and b(x). It might look intimidating, but all we did was apply
the chain rule. Indeed, the FTC tells us that $\int_{a(x)}^{b(x)} f(t)\,dt = F(b(x)) - F(a(x))$ and
when we differentiate this equation, we simply apply the chain rule twice,
$$\frac{d}{dx}\int_{a(x)}^{b(x)} f(t)\,dt = \big(F(b(x)) - F(a(x))\big)' = F'\big(b(x)\big)\,b'(x) - F'\big(a(x)\big)\,a'(x) = f\big(b(x)\big)\,b'(x) - f\big(a(x)\big)\,a'(x).$$
Example 9. Compute the derivative of $\int_{2x}^{x^2} e^{-t^4}\,dt$.
Solution: By the above formula,
$$\frac{d}{dx}\int_{2x}^{x^2} e^{-t^4}\,dt = e^{-(x^2)^4}(x^2)' - e^{-(2x)^4}(2x)' = 2xe^{-x^8} - 2e^{-16x^4}.$$
dx 2x
Exercise 9. Compute the derivative of $\int_{\cos(x)}^{\sin(x)} \ln(1 + t^2)\,dt$.
A playlist with an overview of the FTC can be found in the footnote link.4
You can also play around with the following Geogebra example to make sure you
understand how the integral changes as a function of the upper limit.5
4 https://www.youtube.com/playlist?list=PLYxPH73Uem-QmJI2fdsCtRww-oYMIXUOx.
5 https://www.geogebra.org/m/fa2w8qjy
Answer to Exercise 1. (a) Because $f(a) = F'(a)$
it is the slope of y = F(x) at x = a.
(b) Because
$$\int_a^b f(x)\,dx = F(b) - F(a),$$
it is the distance between F(a) and F(b) on the y-axis.
(c) Because
$$\frac{1}{b-a}\int_a^b f(x)\,dx = \frac{F(b) - F(a)}{b-a},$$
it is the slope of the secant line connecting the points (a, F(a)) and (b, F(b)).
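Answer to Exercise 2. $e^{-t}$ is the derivative of $-e^{-t}$ so, by FTC,
$$\int_0^2 (100 - 80e^{-t})\,dt = \int_0^2 100\,dt - 80\int_0^2 e^{-t}\,dt = 100 \times 2 - 80\big(-e^{-2} - (-e^{0})\big)$$
which is approximately 130.83. Dividing by the length 2 of the interval [0, 2], we get
that the average temperature is ≈ 65.4°C.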
Answer to Exercise 3.
(a) $\int (\sqrt{3u + 1} - \frac{2}{2u+1})\,du = \frac{2}{9}(3u + 1)^{3/2} - \ln|2u + 1| + C$.
(b) $\int (1 - 6e^{-s/3})\,ds = s + 18e^{-s/3} + C$.
(c) $\int 2\cos(\frac{r}{2} - 1)\,dr = 4\sin(\frac{r}{2} - 1) + C$.
Answer to Exercise 4. We know that the polynomial
$\frac{1}{2}x(x - 1)(x - 3)$ has zeros at x = 0, 1 and 3
and we can easily check that it is positive on [0, 1]
and negative on [1, 3], so the graph looks like in
the figure. To find definite integrals on [0, 1] and
[1, 2], we first multiply out
$$\frac{1}{2}x(x - 1)(x - 3) = \frac{x^3}{2} - 2x^2 + \frac{3x}{2}.$$
Its indefinite integral is
$$\int \Big(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\Big)\,dx = \frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4} + C$$
so
$$A_1 = \int_0^1 \Big(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\Big)\,dx = \Big(\frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4}\Big)\Big|_0^1 = \frac{1}{8} - \frac{2}{3} + \frac{3}{4} = \frac{5}{24},$$
$$-A_2 = \int_1^2 \Big(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\Big)\,dx = \Big(\frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4}\Big)\Big|_1^2 = -\frac{13}{24},$$
and the total area is $A_1 + A_2 = \frac{5}{24} + \frac{13}{24} = \frac{3}{4} = 0.75$.
Answer to Exercise 5. The rate of change of BAC level is −0.015 because it is

decreasing, so a general antiderivative is −0.015t +C. At t = 6 hours after 2 a.m.
BAC level is −0.015 × 6 +C = 0.09, so C = 0.09 + 0.015 × 6 = 0.18. BAC level at
time t after 2 a.m. is 0.18 − 0.015t until it reaches 0, i.e. when 0.18 − 0.015t = 0 or
t = 12, which is 2 p.m. So between 2 a.m. and 2 p.m. the BAC level is 0.18−0.015t.
Answer to Exercise 6. If A(t) is the total amount of medication by time t, then
the rate $1600te^{-4t}$ is its derivative and the initial amount is A(0) = 50. By the
Reconstruction theorem, $A(t) = 50 + \int_0^t 1600se^{-4s}\,ds$. The units are: mg/h × h =
mg. This integral can be computed using integration by parts that we will learn
later on, but for any specific t we could use Riemann sums to approximate this
amount if we needed to.
Answer to Exercise 7. First of all, we can eliminate (b), because the derivative
$F'(0) = f(0)$ is defined at x = 0 so the function cannot have a jump there. Next, we can
eliminate (d), because the function is not equal to 1 at x = 0, but we must have
F(0) = 1. Between (a) and (c), the difference is the slope between x = −4 and
x = −2. Since f(x) = −2 on that interval, the slope should be −2, which means
that (a) could be the answer. Why did we say “could be”? Because in this problem
we did not ask for F(x) to be continuous and the derivative f(x) is not defined at
x = −2, so if F(x) could have a jump then the linear piece on the interval [−4, −2]
could also be shifted vertically. If we asked for a continuous antiderivative then
(a) is the answer.
Answer to Exercise 8. (a) h(0) = 0 and $h'(-1) = f(-1) = 0$.
(b) $h'(x) = f(x)$ is increasing on [−0.5, 1], so h(x) is concave up there.
(c) $h'(x) = f(x)$ is negative on [−1, 0] and positive on [0, 1], so h(x) is decreasing
on [−1, 0] and increasing on [0, 1]. The minimum is at x = 0, and the maximum is
at x = 1 because the area above the x-axis is bigger, so h(x) will increase more than
it will decrease.
Answer to Exercise 9.
$$\frac{d}{dx}\int_{\cos(x)}^{\sin(x)} \ln(1 + t^2)\,dt = \ln(1 + \sin^2(x))\cos(x) - \ln(1 + \cos^2(x))(-\sin(x)) = \ln(1 + \sin^2(x))\cos(x) + \ln(1 + \cos^2(x))\sin(x).$$
3.4 Application of FTC: differential equations

In this section we will continue using the Fundamental Theorem of Calculus in the
form of the Reconstruction theorem, which tells us that the antiderivative F(x) of
f(x) with the initial value F(x0) = y0 can be written as $F(x) = y_0 + \int_{x_0}^{x} f(t)\,dt$.
What will be new is a slightly different point of view and some terminology that
will connect us to a more general topic of differential equations. In the future, we
will discuss in some detail the differential equations of the form
$$\frac{dy}{dx} = f(x, y).$$
A differential equation with additional information about some initial value
$$\frac{dy}{dx} = f(x, y), \qquad y(x_0) = y_0$$
is called an initial value problem (IVP). This equation is like a puzzle where an
unknown function y = y(x) could appear on both sides of the equation. On the left
hand side its derivative $\frac{dy}{dx} = y'(x)$ appears, and on the right hand side y(x) appears
inside some formula f(x, y) = f(x, y(x)) that can also depend on x. Here are some
examples of such equations:
$$\frac{dy}{dx} = xy, \quad \frac{dy}{dx} = \frac{x}{y}, \quad \frac{dy}{dx} = x^2 + y^2, \quad \frac{dy}{dx} = 1 + y^2, \quad \frac{dy}{dx} = \cos(x).$$
To solve such an equation means to find possible functions y = y(x) that satisfy it,
meaning that if we plug it in on both sides of the equation we will get equality.
Example 1. Show that y = tan(x) on the interval $-\frac{\pi}{2} < x < \frac{\pi}{2}$ is the solution to the
initial value problem $\frac{dy}{dx} = 1 + y^2$, y(0) = 0.
Solution: First of all, y(0) = tan(0) = 0, so the initial value matches. Next, let us
plug in y = tan(x) into the differential equation and check that the two sides are
equal. On the left hand side, the derivative $\frac{dy}{dx} = (\tan(x))' = \sec^2(x)$. On the right
hand side, $1 + y^2 = 1 + \tan^2(x) = \sec^2(x)$, a famous trigonometric identity. The
two sides are equal, so y = tan(x) is indeed the solution of this initial value problem.
The reason the interval was limited to $-\frac{\pi}{2} < x < \frac{\pi}{2}$ is because tan(x) has vertical
asymptotes at $x = -\frac{\pi}{2}$ and $\frac{\pi}{2}$, so the equation does not make sense at those points.
Exercise 1. Show that x = sin(t) on the interval $\frac{\pi}{2} \le t \le \frac{3\pi}{2}$ is the solution to the
initial value problem $\frac{dx}{dt} = -\sqrt{1 - x^2}$, x(π) = 0.
In general, equations $\frac{dy}{dx} = f(x, y)$ are difficult to solve, but there is one equation
in the above list which is much easier than others. In the last equation $\frac{dy}{dx} = \cos(x)$,
the right hand side does not have y in it, so the equation simply tells us that the
derivative of y(x) is equal to cos(x) or, in other words, y(x) is an antiderivative of
cos(x), so y(x) = sin(x) + C. Given a differential equation of this easier form
$$\frac{dy}{dx} = f(x)$$
if we can find one particular antiderivative F(x) of f(x) then y = F(x) + C is called
a general solution of this equation. Such an equation with additional information
about some initial value
$$\frac{dy}{dx} = f(x), \qquad y(x_0) = y_0$$
is also called an initial value problem (IVP), and the function $F(x) = y_0 + \int_{x_0}^{x} f(t)\,dt$
given by the Reconstruction theorem is called the solution of this initial value problem.
Example 2. A patient is given some medication in a 50 mg immediate release pill
and 100 mg delayed-release pill which is released at a rate $1600te^{-4t}$ mg/h, where
time t is measured in hours. What is the initial value problem that the amount of
medication A(t) delivered by time t satisfies? Show that
$$A(t) = 150 - 100(1 + 4t)e^{-4t}$$
is the solution to this initial value problem.
Solution: The initial value A(0) = 50 comes from the immediate release pill, and
the rate of change after that is due to the delayed-release pill, so the amount of
medication delivered by time t satisfies the following initial value problem:
$$\frac{dA}{dt} = 1600te^{-4t}, \qquad A(0) = 50.$$
Let us show that $A(t) = 150 - 100(1 + 4t)e^{-4t}$ is the solution of this IVP. The initial
value matches: $A(0) = 150 - 100(1 + 0)e^{0} = 50$. By the product rule,
$$\frac{dA}{dt} = 0 - 100(4)e^{-4t} - 100(1 + 4t)e^{-4t}(-4) = 1600te^{-4t},$$
so the differential equation is also satisfied. Of course, this solution comes from
the Reconstruction theorem, $A(t) = 50 + 1600\int_0^t se^{-4s}\,ds$, and we will later learn
how to find this integral using the integration by parts technique, so we will learn how
to solve this problem without somebody giving us the answer to check.
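Checking a proposed solution of an initial value problem always comes down to the same two steps: compare the initial value, and compare the derivative with the right hand side. As an illustration only, the Python sketch below performs both checks numerically for Example 2; the helper names `A`, `rate`, `derivative` and `residuals`, and the choice of sample times, are our own assumptions.

```python
import math


def A(t):
    """Proposed solution: amount of medication (mg) delivered by time t (hours)."""
    return 150 - 100 * (1 + 4 * t) * math.exp(-4 * t)


def rate(t):
    """Right hand side of the differential equation dA/dt = 1600 t e^(-4t)."""
    return 1600 * t * math.exp(-4 * t)


def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Step 1: the initial value A(0) should be 50 (the immediate release pill).
print(A(0))

# Step 2: A'(t) should match the release rate at every time t.
sample_times = [0.1, 0.5, 1.0, 2.0]
residuals = [abs(derivative(A, t) - rate(t)) for t in sample_times]
print(max(residuals))

# Long after the pill is taken, the total should approach 50 + 100 = 150 mg.
print(A(10))
```

The last line is a plausibility check rather than part of the IVP: once the delayed-release pill has fully dissolved, all 150 mg should have been delivered.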
Exercise 2. Find the solution to the following initial value problems.
(a) $\frac{dy}{dx} = x^2 + 4x^3$, y(0) = 5. (b) $\frac{dx}{dt} = -10t + 3$, x(0) = 25.
Motion with constant acceleration. In the first section in this chapter we have
solved several problems about a linear motion with constant acceleration, in which
case the velocity v(t) was a linear function of time. We have used the area under
the graph of velocity to determine the position change, or displacement, but now
we can also find a general formula for position y(t) using the FTC.
Example 3. A coin is tossed straight up into the air from
the height of 1 meter with the speed of 5 m/s. Suppose that
gravity g = 10 m/s².
(a) Find the formula for velocity and height of the coin at
time t seconds.
(b) When will the coin reach the highest point? What is the
maximum height?
(c) When will the coin hit the ground? What is its speed at
that moment?
Solution: (a) The acceleration is due to gravity, so it is −g = −10 m/s². The minus
sign is because we consider upward as the positive direction. Because acceleration
is the derivative of velocity, $\frac{dv}{dt} = -g$, and because the initial velocity is v(0) = 5,
we can solve this initial value problem by using the FTC,
$$v(t) = 5 + \int_0^t (-g)\,ds = 5 - gt = 5 - 10t.$$
Then, because vertical velocity is the derivative of height, $\frac{dy}{dt} = 5 - gt$, and the
initial height is y(0) = 1, we can again solve this initial value problem using the
FTC-2,
$$y(t) = 1 + \int_0^t (5 - gs)\,ds = 1 + 5t - g\frac{t^2}{2} = 1 + 5t - 5t^2.$$
Notice how in both integrals above we used s as a variable of integration, because
t was reserved for the upper limit.
(b) At the highest point the velocity of the coin will be 0, so v(t) = 5 − 10t = 0.
Solving for t we get that t = 0.5 seconds. At that moment, the height is
$y(0.5) = 1 + 5(0.5) - 5(0.5)^2 = 2.25$ meters.
(c) The coin will hit the ground when the height is 0, so $y(t) = 1 + 5t - 5t^2 = 0$.
Solving for t and keeping the positive root, we get $t = \frac{5 + \sqrt{45}}{10} = 1.1708$. At that moment the velocity is equal to
v(1.1708) = −6.7082, so the speed is ≈ 6.7 m/s.
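The formulas from this example are easy to evaluate by machine, which is handy for checking the arithmetic or trying other initial conditions. The following is a minimal Python sketch; the constant and function names (`G`, `V0`, `Y0`, `velocity`, `height`) are our own, and the landing time uses the quadratic formula for the positive root of y(t) = 0.

```python
import math

G = 10.0   # gravitational acceleration in m/s^2, as assumed in the example
V0 = 5.0   # initial upward velocity in m/s
Y0 = 1.0   # initial height in meters


def velocity(t):
    """v(t) = v0 - g t, the solution of dv/dt = -g with v(0) = v0."""
    return V0 - G * t


def height(t):
    """y(t) = y0 + v0 t - g t^2 / 2, the solution of dy/dt = v(t) with y(0) = y0."""
    return Y0 + V0 * t - G * t**2 / 2


# (b) Highest point: velocity is zero.
t_peak = V0 / G
max_height = height(t_peak)

# (c) Landing: positive root of y0 + v0 t - g t^2 / 2 = 0.
t_land = (V0 + math.sqrt(V0**2 + 2 * G * Y0)) / G
impact_speed = abs(velocity(t_land))

print(t_peak, max_height)
print(t_land, impact_speed)
```

With the values from the example this should reproduce t = 0.5 s and 2.25 m for the peak, and about 1.17 s and 6.7 m/s for the landing.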
Exercise 3. A coin is tossed straight up into the air from the height of 1 meter
and it reaches the maximum height of 3 meters. What was the original velocity v0 ?
Suppose that gravity g = 10 m/s².
Exercise 4. The lift-off speed of Airbus A380⁶
is approximately 90 m/s and it takes about 65
seconds to reach that speed. How long should
the runway be? Assume constant acceleration.
What if the lift-off speed was given as 200 mph?
Answer to Exercise 1. First, the initial value sin(π) = 0 matches. Plugging x =
sin(t) into the equation, the left hand side is $(\sin(t))' = \cos(t)$, and the right hand
side is also
$$-\sqrt{1 - \sin^2(t)} = -\sqrt{\cos^2(t)} = -|\cos(t)| = \cos(t).$$
The last equality −|cos(t)| = cos(t) is true because we limit ourselves to the
interval $\frac{\pi}{2} \le t \le \frac{3\pi}{2}$ where cos(t) is negative or zero.
Answer to Exercise 2. (a) The notation $\frac{dy}{dx}$ indicates that y = y(x) is a function
of x, so x is the independent variable. This means that the differential equation
$\frac{dy}{dx} = x^2 + 4x^3$ tells us that the derivative $y'(x)$ is precisely $x^2 + 4x^3$, so y(x) must
be an antiderivative of $x^2 + 4x^3$. A general antiderivative of $x^2 + 4x^3$ is
$y(x) = \frac{x^3}{3} + x^4 + C$, which can be also called the general solution of the above differential
equation. Since we are given the initial value y(0) = 5, we can plug it in,
$y(0) = \frac{0^3}{3} + 0^4 + C = C = 5$, so C = 5 and $y(x) = \frac{x^3}{3} + x^4 + 5$.
(b) Here, t is the independent variable, and x = x(t) is an antiderivative of −10t +
3. A general antiderivative is $-5t^2 + 3t + C$, and x(0) = 25 = C, so C = 25 and
$x(t) = -5t^2 + 3t + 25$.
Answer to Exercise 3. As in the previous problem,
$$v(t) = v_0 - 10t, \qquad y(t) = 1 + v_0 t - 5t^2.$$
The coin reaches the maximum height when $v(t) = v_0 - 10t = 0$, so $t = \frac{v_0}{10}$. At
that time the height must be y(t) = 3 meters, so plugging $t = \frac{v_0}{10}$ into the equation
$y(t) = 1 + v_0 t - 5t^2$, we get
$$3 = 1 + v_0\frac{v_0}{10} - 5\Big(\frac{v_0}{10}\Big)^2 = 1 + \frac{v_0^2}{10} - \frac{v_0^2}{20} = 1 + \frac{v_0^2}{20}.$$
Solving for v0 , we get that $v_0^2 = 40$, or $v_0 = \sqrt{40} \approx 6.32$ m/s.
6 Image by Bill Abbott, https://www.flickr.com/photos/wbaiv/51673118672/
Answer to Exercise 4. Since $\frac{dv}{dt} = a$, where a is unknown acceleration, and the
starting velocity is v(0) = 0, the velocity at time t is v(t) = at. We know that at
time t = 65 s velocity should be 90 m/s, so v(65) = 65a = 90 and $a = \frac{90}{65}$ m/s².
Since $\frac{dx}{dt} = v(t) = at$ and the starting position is x(0) = 0, the distance at time t is
$$x(t) = a\frac{t^2}{2} = \frac{90}{65}\cdot\frac{t^2}{2} = \frac{9t^2}{13}.$$
Then at lift-off time t = 65 the distance is $x(65) = \frac{9(65)^2}{13} = 2925$ meters, so the
runway should be at least this long.
If the lift-off speed was given as 200 mph, the units of speed (miles per hour)
and time (seconds) would not match, so we would have to first convert 200 mph to
distance per second, for example, 89.4 m/s or 293.3 ft/s.
3.5 Techniques of integration: substitution rule

In this section we will learn about integration by substitution, which is the reverse
of the chain rule of differentiation. If $F'(x) = f(x)$ then the chain rule tells us that
$$\frac{d}{dx}F(g(x)) = F'(g(x))\,g'(x) = f(g(x))\,g'(x).$$
This can be rephrased in the language of antiderivatives: if $F'(x) = f(x)$ then
$$\int f(g(x))\,g'(x)\,dx = F(g(x)) + C.$$
In the case of definite integrals, this formula looks like this: if $F'(x) = f(x)$ then
$$\int_a^b f(g(x))\,g'(x)\,dx = F(g(x))\Big|_{x=a}^{x=b} = F(g(b)) - F(g(a)).$$
How do we know that these formulas are applicable in a particular problem? The
key is to recognize the presence of some function g(x) and its derivative $g'(x)$
inside the integral, which takes some practice, but luckily there are several typical
patterns.
Power substitution. Our first example will be when the function g(x) is of the
following type:
$$g(x) = mx^p + k \quad \text{for some constants } p, m \text{ and } k.$$
Let us illustrate on a concrete example how the substitution rule works in practice.
Example 1. Compute the following indefinite and definite integral.
(a) $\int \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx$ (b) $\int_1^4 \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx$
Solution: Once we choose the right substitution, the solution is usually relatively
short. However, in this first example let us explain how things work step by step.
• First, we need to guess what the function $g(x)$ could be and call it by a new
name, for example, $u$, $w$, $y$, etc. We write
\[ u = g(x), \]
which is called making a substitution, because this new variable $u$ substitutes
for $g(x)$. In this case, it is a good idea to try the substitution $u = 2\sqrt{x} + 1$, because
we are looking for a function $g(x)$ inside another function $f(x)$, i.e. $f(g(x))$, and
here we have $\cos(2\sqrt{x} + 1)$.
• Second, we compute the derivative $\frac{du}{dx} = g'(x)$ and rewrite this formally as
\[ du = g'(x)\,dx. \]
In this problem, $\frac{du}{dx} = (2\sqrt{x} + 1)' = \frac{1}{\sqrt{x}}$, so $du = \frac{1}{\sqrt{x}}\,dx$. We can see the presence
of $\frac{1}{\sqrt{x}}$ in the integral, which is an indication that we are on the right track.
• We replace all the appearances of $g(x)$ by $u$ and replace $g'(x)\,dx$ by $du$:
\[ \int f\big(\underbrace{g(x)}_{u}\big)\,\underbrace{g'(x)\,dx}_{du} = \int f(u)\,du. \]
In this particular problem,
\[ \int \frac{\cos(2\sqrt{x}+1)}{3\sqrt{x}}\,dx = \int \cos\big(\underbrace{2\sqrt{x}+1}_{u}\big)\cdot\frac{1}{3}\cdot\underbrace{\frac{1}{\sqrt{x}}\,dx}_{du} = \int \frac{1}{3}\cos(u)\,du. \]
In the case of the definite integral, we also replace the limits: $x = a$ becomes
$u = g(a)$ and $x = b$ becomes $u = g(b)$:
\[ \int_a^b f\big(\underbrace{g(x)}_{u}\big)\,\underbrace{g'(x)\,dx}_{du} = \int_{g(a)}^{g(b)} f(u)\,du. \]
Here, $x = 1$ becomes $u = 2\sqrt{1} + 1 = 3$ and $x = 4$ becomes $u = 2\sqrt{4} + 1 = 5$, so
\[ \int_1^4 \cos\big(\underbrace{2\sqrt{x}+1}_{u}\big)\cdot\frac{1}{3}\cdot\underbrace{\frac{1}{\sqrt{x}}\,dx}_{du} = \int_3^5 \frac{1}{3}\cos(u)\,du. \]
It is important that, after we made this substitution, there is no more x left in the
integral. Everything is now in terms of the new variable u.
• Now that the integral has been simplified, hopefully at this step we can find the
antiderivative $F(u)$ of $f(u)$:
\[ \int f(u)\,du = F(u) + C. \]
In the case of the definite integral, we can also apply the FTC:
\[ \int_{g(a)}^{g(b)} f(u)\,du = F(u)\Big|_{u=g(a)}^{u=g(b)} = F(g(b)) - F(g(a)). \]
In this particular problem,
\[ \int \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u) + C \]
and, in the case of the definite integral,
\[ \int_3^5 \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u)\Big|_{u=3}^{u=5} = \frac{1}{3}\sin(5) - \frac{1}{3}\sin(3). \]
• The definite integral has already been computed in the last step, but in the case
of the indefinite integral it is very important to substitute $u = g(x)$ back:
\[ F(u) + C = F(g(x)) + C, \]
because our original integral $\int f(g(x))\,g'(x)\,dx$ was in terms of the $x$ variable, so
the answer should also be in terms of $x$. In this problem,
\[ \frac{1}{3}\sin(u) + C = \frac{1}{3}\sin(2\sqrt{x}+1) + C. \]
• Now that we solved this problem step by step, let us show what a concise version
of the solution would look like. First, we make a substitution
\[ u = 2\sqrt{x} + 1, \quad\text{so that}\quad \frac{du}{dx} = (2\sqrt{x}+1)' = \frac{1}{\sqrt{x}} \quad\text{and}\quad du = \frac{1}{\sqrt{x}}\,dx. \]
With this substitution, we solve part (a) as follows:
\[ \int \frac{\cos(2\sqrt{x}+1)}{3\sqrt{x}}\,dx = \int \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u) + C = \frac{1}{3}\sin(2\sqrt{x}+1) + C. \]
In part (b), we compute the definite integral using this substitution as follows:
\[ \int_1^4 \frac{\cos(2\sqrt{x}+1)}{3\sqrt{x}}\,dx = \int_3^5 \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u)\Big|_{u=3}^{u=5} = \frac{1}{3}\sin(5) - \frac{1}{3}\sin(3). \]
Comment. In the definite integral, a common mistake is not to change the limits,
\[ \int_1^4 \frac{\cos(2\sqrt{x}+1)}{3\sqrt{x}}\,dx = \int_1^4 \frac{1}{3}\cos(u)\,du = \ldots \]
or to stop writing the limits in the intermediate steps,
\[ \int_1^4 \frac{\cos(2\sqrt{x}+1)}{3\sqrt{x}}\,dx = \int_{?}^{?} \frac{1}{3}\cos(u)\,du = \ldots. \]
On the other hand, it is totally okay to solve the indefinite integral by substitution
first without writing any limits,
\[ \int \frac{\cos(2\sqrt{x}+1)}{3\sqrt{x}}\,dx = \int \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u) + C = \frac{1}{3}\sin(2\sqrt{x}+1) + C, \]
and then apply the FTC in the definite integral directly,
\[ \int_1^4 \frac{\cos(2\sqrt{x}+1)}{3\sqrt{x}}\,dx = \frac{1}{3}\sin(2\sqrt{x}+1)\Big|_{x=1}^{x=4} = \frac{1}{3}\sin(5) - \frac{1}{3}\sin(3), \]
skipping all the intermediate substitution steps. □
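As a sanity check on Example 1, the short sketch below compares a numerical estimate of the original integral in $x$, a numerical estimate of the substituted integral in $u$, and the closed-form answer. The helper `midpoint_sum` and the function names are illustrative choices made here, not part of the course material.

```python
import math

def midpoint_sum(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integrand_x(x):
    # Original integrand of Example 1(b), integrated over 1 <= x <= 4
    return math.cos(2 * math.sqrt(x) + 1) / (3 * math.sqrt(x))

def integrand_u(u):
    # Integrand after the substitution u = 2*sqrt(x) + 1, integrated over 3 <= u <= 5
    return math.cos(u) / 3

value_in_x = midpoint_sum(integrand_x, 1.0, 4.0)
value_in_u = midpoint_sum(integrand_u, 3.0, 5.0)
closed_form = (math.sin(5) - math.sin(3)) / 3

print(value_in_x, value_in_u, closed_form)
```

All three printed numbers should agree to several decimal places. If the limits were left as 1 and 4 in the $u$ integral, the second number would no longer match, which is the common mistake described in the comment above.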
Exercise 1. Compute the following indefinite and definite integrals.
\[ \text{(a)}\ \int x e^{-2x^2+3}\,dx \qquad \text{(b)}\ \int_0^1 x e^{-2x^2+3}\,dx \]
Example 2. Given the graph of a function $y = f(x)$ in the figure, estimate the
integral $\int_0^1 f(2x^2-1)\,x\,dx$.
Solution: First, we need to notice that the integral $\int_0^1 f(2x^2-1)\,x\,dx$ can be
simplified if we make a substitution $u = 2x^2 - 1$. Then $\frac{du}{dx} = 4x$ and $du = 4x\,dx$,
which can also be written as $x\,dx = \frac{1}{4}\,du$. The limits $x = 0$ and $x = 1$ become
$u = -1$ and $u = 1$. With this substitution, we can rewrite the integral as
\[ \int_0^1 f(2x^2-1)\,x\,dx = \frac{1}{4}\int_{-1}^{1} f(u)\,du. \]
It remains to estimate the integral $\int_{-1}^{1} f(u)\,du$, which is the area under its graph
on the interval $[-1, 1]$. We do not have all the information to compute this area
exactly but, looking at the graph, we see that $f(x)$ approximately follows a straight
line $y = 1 + x$ between $x = -1$ and $x = 0$, and it is approximately constant $y = 1$
between $x = 0$ and $x = 1$. This means that the area is approximately $\frac{1}{2} + 1 = \frac{3}{2}$ and
the original integral is $\int_0^1 f(2x^2-1)\,x\,dx \approx \frac{1}{4}\times\frac{3}{2} = \frac{3}{8}$.
Exercise 2. Estimate the integral $\int_0^1 x f(3x^2-1)\,dx$ given the following table:
x −1 −0.5 0 0.5 1 1.5
f (x) 0 -1.75 1 3.75 5 3.75
Trigonometric substitution. Our next substitution will be of the type
g(x) = sin(x) or g(x) = cos(x).
Example 3. Compute the integral $\int_0^{\pi/2} \sin^2(x)\cos(x)\,dx$.
Solution: Here we see $\sin(x)$ squared (i.e. it is inside the square function) and we
see its derivative $\cos(x)$, so it is a good idea to make a substitution $u = \sin(x)$. Then
$\frac{du}{dx} = \cos(x)$ and $du = \cos(x)\,dx$. With this substitution,
\[ \int \sin^2(x)\cos(x)\,dx = \int u^2\,du = \frac{u^3}{3} + C = \frac{\sin^3(x)}{3} + C. \]
Since we already found the indefinite integral, for the definite integral we can skip
the intermediate substitution steps and use the FTC directly:
\[ \int_0^{\pi/2} \sin^2(x)\cos(x)\,dx = \frac{\sin^3(x)}{3}\Big|_{x=0}^{x=\pi/2} = \frac{1}{3} - \frac{0}{3} = \frac{1}{3}. \]
We could also use the substitution for the definite integral if we also replaced the
limits by $u = \sin(0) = 0$ and $u = \sin(\pi/2) = 1$, so that
\[ \int_0^{\pi/2} \sin^2(x)\cos(x)\,dx = \int_0^1 u^2\,du = \frac{u^3}{3}\Big|_{u=0}^{u=1} = \frac{1}{3} - 0 = \frac{1}{3}. \]
Exercise 3. Compute the integral $\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx$.
Exponential substitution. Next, we will consider a couple of examples when
\[ g(x) = \text{some combination of exponentials } e^{\kappa x}. \]
Example 4. Compute the indefinite integral $\int \frac{e^x - e^{-x}}{e^x + e^{-x}}\,dx$.
Solution: If we make a substitution $u = e^x + e^{-x}$ then $\frac{du}{dx} = (e^x + e^{-x})' = e^x - e^{-x}$
and $du = (e^x - e^{-x})\,dx$, so
\[ \int \frac{e^x - e^{-x}}{e^x + e^{-x}}\,dx = \int \frac{1}{u}\,du = \ln|u| + C = \ln(e^x + e^{-x}) + C. \]
We do not need to write $|e^x + e^{-x}|$ because $e^x + e^{-x}$ is already positive.
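One way to double-check an antiderivative found by substitution is to differentiate it numerically and compare with the integrand. The sketch below does this for Example 4 at a few sample points; the function names and the step size `h` are arbitrary choices made for illustration.

```python
import math

def F(x):
    # Antiderivative found in Example 4 (taking C = 0)
    return math.log(math.exp(x) + math.exp(-x))

def f(x):
    # Integrand of Example 4
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def central_diff(g, x, h=1e-5):
    # Symmetric difference quotient approximating g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

sample_points = [-2.0, -0.5, 0.0, 1.0, 3.0]
worst_error = max(abs(central_diff(F, x) - f(x)) for x in sample_points)
print(worst_error)  # should be very small if F'(x) = f(x)
```

A small `worst_error` does not prove the formula, but a large one would reliably signal a mistake in the substitution.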
Exercise 4. Compute the indefinite integral $\int \frac{e^{2x}}{4 + e^{2x}}\,dx$.
Exercise 5. Compute the integral $\int_0^1 \cos\big(e^{h(x)}\big)\,e^{h(x)}\,h'(x)\,dx$ given the table:
x      −1    −0.5     0                  0.5     1                  1.5
h(x)   0     −1.75    $\ln\frac{\pi}{4}$    3.75    $\ln\frac{\pi}{2}$    3.75
Exercise 6. What differentiation technique does the substitution rule come from?
(a) power rule (b) product rule (c) chain rule (d) quotient rule
Answer to Exercise 1. We make a substitution $u = -2x^2 + 3$ so that $\frac{du}{dx} = -4x$ and
$du = -4x\,dx$. We can also write $x\,dx = -\frac{1}{4}\,du$. With this substitution, we solve part
(a) as follows:
\[ \int x e^{-2x^2+3}\,dx = -\frac{1}{4}\int e^u\,du = -\frac{1}{4}e^u + C = -\frac{1}{4}e^{-2x^2+3} + C. \]
In part (b), we compute the definite integral using this substitution as follows:
\[ \int_0^1 x e^{-2x^2+3}\,dx = -\frac{1}{4}\int_3^1 e^u\,du = -\frac{e^u}{4}\Big|_{u=3}^{u=1} = -\frac{e}{4} + \frac{e^3}{4}. \]
Answer to Exercise 2. First, we make a substitution $u = 3x^2 - 1$. Then $\frac{du}{dx} = 6x$
and $du = 6x\,dx$, which can also be written as $x\,dx = \frac{1}{6}\,du$. With this substitution,
\[ \int_0^1 f(3x^2-1)\,x\,dx = \frac{1}{6}\int_{-1}^{2} f(u)\,du. \]
It remains to estimate the integral $\int_{-1}^{2} f(u)\,du$. Given the table, we can use the left
Riemann sum with $n = 6$ subintervals on $[-1, 2]$ of length $0.5$ each:
\[ \int_{-1}^{2} f(u)\,du \approx (0 - 1.75 + 1 + 3.75 + 5 + 3.75)\times 0.5 = 5.875. \]
The original integral is $\approx \frac{1}{6}\cdot 5.875 = 0.97916\ldots$.
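The left Riemann sum in this answer is easy to reproduce in a few lines. The sketch below hard-codes the table from Exercise 2; the variable names are chosen here for illustration only.

```python
# Table from Exercise 2: f sampled every 0.5, starting at u = -1
u_values = [-1, -0.5, 0, 0.5, 1, 1.5]
f_values = [0, -1.75, 1, 3.75, 5, 3.75]
du = 0.5

# Left Riemann sum for the integral of f(u) over [-1, 2]:
# each table value is the height on the subinterval that starts at that u
left_sum = sum(f_values) * du

# The factor 1/6 comes from x dx = du / 6 in the substitution u = 3x^2 - 1
estimate = left_sum / 6

print(left_sum, estimate)
```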

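Answer to Exercise 3. We make a substitution $u = \cos(x)$, so that $\frac{du}{dx} = -\sin(x)$ and
$du = -\sin(x)\,dx$:
\[ \int \frac{\sin(x)}{\cos^2(x)}\,dx = -\int \frac{1}{u^2}\,du = \frac{1}{u} + C = \frac{1}{\cos(x)} + C. \]
Using the FTC:
\[ \int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx = \frac{1}{\cos(x)}\Big|_{x=0}^{x=\pi/4} = \sqrt{2} - 1. \]
Answer to Exercise 4. If we make a substitution $u = 4 + e^{2x}$ then $\frac{du}{dx} = 2e^{2x}$ and
$e^{2x}\,dx = \frac{1}{2}\,du$, so
\[ \int \frac{e^{2x}}{4 + e^{2x}}\,dx = \frac{1}{2}\int \frac{1}{u}\,du = \frac{1}{2}\ln|u| + C = \frac{1}{2}\ln(4 + e^{2x}) + C. \]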
Answer to Exercise 5. We can substitute $u = h(x)$ first, but this will require us to do
another substitution later on (try it). Instead, if we make a substitution $u = e^{h(x)}$ then
$\frac{du}{dx} = e^{h(x)}h'(x)$ and $du = e^{h(x)}h'(x)\,dx$. Also, from the table, $x = 0$ will be replaced
by $u = e^{h(0)} = e^{\ln\frac{\pi}{4}} = \frac{\pi}{4}$, and $x = 1$ will be replaced by $u = e^{h(1)} = e^{\ln\frac{\pi}{2}} = \frac{\pi}{2}$, so
\[ \int_0^1 \cos\big(e^{h(x)}\big)\,e^{h(x)}\,h'(x)\,dx = \int_{\pi/4}^{\pi/2} \cos(u)\,du = \sin(u)\Big|_{u=\pi/4}^{u=\pi/2} = \sin(\pi/2) - \sin(\pi/4) = 1 - \frac{1}{\sqrt{2}}. \]
Answer to Exercise 6. Substitution rule is the reverse of the chain rule, so (c).
3.6 Techniques of integration: integration by parts

In this section we will learn about integration by parts, which is the reverse of the
product rule of differentiation. The product rule tells us that
\[ \big(u(x)v(x)\big)' = u'(x)v(x) + u(x)v'(x). \]
Integrating both sides, we can write
\[ \int \big(u(x)v(x)\big)'\,dx = \int u'(x)v(x)\,dx + \int u(x)v'(x)\,dx. \]
The left-hand side is equal to $u(x)v(x)$ because the antiderivative of a derivative
is the function itself. If we replace the left-hand side by $u(x)v(x)$ and move the
integral $\int u'(x)v(x)\,dx$ to the other side of the equation, we get
\[ \int u(x)v'(x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx. \]
This formula is called integration by parts, and its definite integral version is
\[ \int_a^b u(x)v'(x)\,dx = u(x)v(x)\Big|_{x=a}^{x=b} - \int_a^b u'(x)v(x)\,dx. \]
This formula relates one integral $\int_a^b u(x)v'(x)\,dx$ to another integral $\int_a^b u'(x)v(x)\,dx$,
and the idea is that in some cases the second integral is much easier to compute
than the first one. How do we know that these formulas are applicable in a given
problem? As usual, it takes some practice, but we will only focus on a few common
examples of the form (possibly with some constants)
\[ x^n\sin(x), \quad x^n\cos(x), \quad x^n e^x, \quad \ln(x), \quad x^n\ln(x). \]
The choice of $u(x)$ and $v(x)$ is summarized in the following table:
u     $x^n$                               $\ln(x)$
u′    $n x^{n-1}$                         $\frac{1}{x}$
v′    $\sin(x)$, $\cos(x)$, $e^x$         $1$, $x^n$
v     $-\cos(x)$, $\sin(x)$, $e^x$        $x$, $\frac{x^{n+1}}{n+1}$
and we will see that in all these cases $u'(x)v(x)$ will be simpler than $u(x)v'(x)$.

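Example 1. Compute the following indefinite and definite integral.
\[ \text{(a)}\ \int x\cos(2x)\,dx \qquad \text{(b)}\ \int_0^{\pi/4} x\cos(2x)\,dx \]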
Solution: According to the above table, we take u(x) = x and v′ (x) = cos(2x). To
use the integration by parts formula, we need to find u′ (x) and v(x). First, u′ (x) = 1.
To find v(x) we need to find an antiderivative of v′ (x) = cos(2x). Let us recall the
following substitution formula that will help us many times below:
\[ \text{if an antiderivative of } f(x) \text{ is } F(x)\text{, then an antiderivative of } f(kx) \text{ is } \frac{1}{k}F(kx). \]
Since an antiderivative of $\cos(x)$ is $\sin(x)$, an antiderivative of $\cos(2x)$ is $\frac{1}{2}\sin(2x)$,
so $v(x) = \frac{1}{2}\sin(2x)$. Here we do not need to write $+C$, because integration by parts
will work with any specific antiderivative. Using the integration by parts formula
with $u(x) = x$, $v'(x) = \cos(2x)$, $u'(x) = 1$ and $v(x) = \frac{1}{2}\sin(2x)$:
\[ \int x\cos(2x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx = \frac{x}{2}\sin(2x) - \int \frac{1}{2}\sin(2x)\,dx. \]
Notice how the integral simplified, and we know how to integrate
\[ -\int \frac{1}{2}\sin(2x)\,dx = -\frac{1}{2}\Big(-\frac{1}{2}\cos(2x)\Big) + C = \frac{1}{4}\cos(2x) + C. \]
Do not forget $+C$ at this step! Plugging this into the above formula,
\[ \int x\cos(2x)\,dx = \frac{x}{2}\sin(2x) + \frac{1}{4}\cos(2x) + C. \]
For the definite integral, we can use the FTC after we computed the indefinite
integral:
\[ \int_0^{\pi/4} x\cos(2x)\,dx = \Big(\frac{x}{2}\sin(2x) + \frac{1}{4}\cos(2x)\Big)\Big|_{x=0}^{x=\pi/4} = \frac{\pi}{8} - \frac{1}{4}. \]
Alternatively, we could also carry the limits from the beginning:
\[ \begin{aligned}
\int_0^{\pi/4} x\cos(2x)\,dx &= \frac{x}{2}\sin(2x)\Big|_{x=0}^{x=\pi/4} - \int_0^{\pi/4} \frac{1}{2}\sin(2x)\,dx \\
&= \frac{\pi}{8} - \int_0^{\pi/4} \frac{1}{2}\sin(2x)\,dx \\
&= \frac{\pi}{8} + \frac{1}{4}\cos(2x)\Big|_{x=0}^{x=\pi/4} = \frac{\pi}{8} - \frac{1}{4}.
\end{aligned} \]
When using integration by parts, it is often easier to find the indefinite integral first
and plug in the limits at the very end.
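The definite version of the integration by parts formula can also be checked numerically. The sketch below evaluates both sides for $u(x) = x$ and $v'(x) = \cos(2x)$ on $[0, \pi/4]$ with a midpoint Riemann sum; the helper `midpoint_sum` and all function names are illustrative assumptions, not notation from the text.

```python
import math

def midpoint_sum(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, b = 0.0, math.pi / 4

def u(x):
    return x

def u_prime(x):
    return 1.0

def v(x):
    return math.sin(2 * x) / 2

def v_prime(x):
    return math.cos(2 * x)

# Left-hand side: integral of u * v'
lhs = midpoint_sum(lambda x: u(x) * v_prime(x), a, b)

# Right-hand side: boundary term minus the integral of u' * v
boundary = u(b) * v(b) - u(a) * v(a)
rhs = boundary - midpoint_sum(lambda x: u_prime(x) * v(x), a, b)

exact = math.pi / 8 - 0.25
print(lhs, rhs, exact)
```

The three printed values should agree closely, matching the answer $\frac{\pi}{8} - \frac{1}{4}$ obtained above.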

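Exercise 1. Compute the following indefinite and definite integral.
\[ \text{(a)}\ \int x\sin(3x)\,dx \qquad \text{(b)}\ \int_0^{\pi/3} x\sin(3x)\,dx \]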

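Example 2. Compute the following indefinite and definite integral.
\[ \text{(a)}\ \int t e^{2t}\,dt \qquad \text{(b)}\ \int_0^1 t e^{2t}\,dt \]
Solution: According to the above table, we take $u(t) = t$ and $v'(t) = e^{2t}$. Then
$u'(t) = 1$ and $v(t) = \frac{1}{2}e^{2t}$ and, using integration by parts,
\[ \int t e^{2t}\,dt = \frac{t}{2}e^{2t} - \int \frac{1}{2}e^{2t}\,dt = \frac{t}{2}e^{2t} - \frac{1}{4}e^{2t} + C. \]
For the definite integral, we can use the FTC after we computed the indefinite
integral:
\[ \int_0^1 t e^{2t}\,dt = \Big(\frac{t}{2}e^{2t} - \frac{1}{4}e^{2t}\Big)\Big|_{t=0}^{t=1} = \frac{e^2}{4} + \frac{1}{4} = 2.0973\ldots \]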

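Exercise 2. Compute the following indefinite and definite integral.
\[ \text{(a)}\ \int t e^{-t/4}\,dt \qquad \text{(b)}\ \int_0^4 t e^{-t/4}\,dt \]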

Example 3. Compute the following indefinite and definite integral.
\[ \text{(a)}\ \int \ln(x)\,dx \qquad \text{(b)}\ \int_1^2 \ln(x)\,dx \]
Solution: In this integral we do not have a product of two functions as in typical
integration by parts examples, but we can nevertheless use integration by parts.
If we take $u(x) = \ln(x)$ then what is $v'(x)$? In this case it is simply $v'(x) = 1$, so
$u(x)v'(x) = \ln(x)\times 1 = \ln(x)$. Then $u'(x) = \frac{1}{x}$, $v(x) = x$ and the integration by parts
formula gives
\[ \int \ln(x)\,dx = \ln(x)\,x - \int \frac{1}{x}\,x\,dx = x\ln(x) - \int 1\,dx = x\ln(x) - x + C. \]
The definite integral is
\[ \int_1^2 \ln(x)\,dx = \big(x\ln(x) - x\big)\Big|_{x=1}^{x=2} = (2\ln 2 - 2) - (1\ln 1 - 1) = 2\ln 2 - 1. \]

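Exercise 3. Compute the following indefinite and definite integral.
\[ \text{(a)}\ \int x^2\ln(x)\,dx \qquad \text{(b)}\ \int_1^2 x^2\ln(x)\,dx \]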
Comment. We did not include any constant inside the logarithm ln(x) in the above
problems because ln(kx) = ln(k) + ln(x) so we can separate the constant factor k
into a separate term. □
Example 4. Find $\int_0^2 f(x)g'(x)\,dx$ given that $\int_0^2 f'(x)g(x)\,dx = -4$ and
x    f(x)    f′(x)    g(x)    g′(x)
0    1       −1.5     0.5     3.5
2    2       1        −1      1
Solution: Using the integration by parts formula,
\[ \begin{aligned}
\int_0^2 f(x)g'(x)\,dx &= f(x)g(x)\Big|_{x=0}^{x=2} - \int_0^2 f'(x)g(x)\,dx \\
&= f(2)g(2) - f(0)g(0) - \int_0^2 f'(x)g(x)\,dx.
\end{aligned} \]
Using the table and given integral, this is equal to $2(-1) - 1(0.5) - (-4) = 1.5$.
Exercise 4. Find $\int_1^3 x f''(x)\,dx$ given the following table
x f (x) f ′ (x)
1 1 −1.5
3 2 1
Example 5. Compute $\int \sin(x)\cos(x)\,dx$ using:
(a) substitution (b) integration by parts (c) trig identity
Solution: (a) Let us take $u = \sin(x)$. Then $\frac{du}{dx} = \cos(x)$, $du = \cos(x)\,dx$, and
\[ \int \sin(x)\cos(x)\,dx = \int u\,du = \frac{u^2}{2} + C = \frac{\sin^2(x)}{2} + C. \]
(b) If we take $u(x) = \sin(x)$, $v'(x) = \cos(x)$, $u'(x) = \cos(x)$, $v(x) = \sin(x)$ then
the integration by parts formula gives
\[ \int \sin(x)\cos(x)\,dx = \sin^2(x) - \int \cos(x)\sin(x)\,dx. \]
We see the same indefinite integral $\int \sin(x)\cos(x)\,dx$ on both sides of the equation,
so solving for it we get $\int \sin(x)\cos(x)\,dx = \frac{1}{2}\sin^2(x)$. We found one antiderivative,
and the general antiderivative is $\int \sin(x)\cos(x)\,dx = \frac{1}{2}\sin^2(x) + C$ as in (a).
(c) Using the trigonometric identity $2\sin(x)\cos(x) = \sin(2x)$,
\[ \int \sin(x)\cos(x)\,dx = \frac{1}{2}\int \sin(2x)\,dx = -\frac{1}{4}\cos(2x) + C. \]
Why does this answer look different from (a) and (b)? The two answers are
related by another trigonometric identity, $\cos(2x) = 1 - 2\sin^2(x)$, so we can rewrite
this answer as $-\frac{1}{4} + \frac{1}{2}\sin^2(x) + C = \frac{1}{2}\sin^2(x) + C'$, where $C' = C - \frac{1}{4}$ can be any
constant, because $C$ can be any constant. Now the answers are the same.
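The claim that the two answers differ only by a constant can be checked directly. The sketch below evaluates both antiderivatives (each with its own constant set to 0) at a few arbitrarily chosen points; the names `A` and `B` are labels introduced here for illustration.

```python
import math

def A(x):
    # Answer from parts (a) and (b), with C = 0
    return math.sin(x) ** 2 / 2

def B(x):
    # Answer from part (c), with C = 0
    return -math.cos(2 * x) / 4

sample_points = [0.0, 0.7, 1.9, -3.2]
differences = [A(x) - B(x) for x in sample_points]
print(differences)  # every entry should be the same constant, up to rounding
```

By the identity $\cos(2x) = 1 - 2\sin^2(x)$, the common value of the differences is $\frac{1}{4}$, which is exactly the shift $C' = C - \frac{1}{4}$ between the two constants.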
Exercise 5. Find the mistake in the following argument that shows that 0 = 1.
Proof: If we take $u(x) = \sin(x)$, $v'(x) = \cos(x)$, $u'(x) = \cos(x)$, $v(x) = \sin(x)$ then
the integration by parts formula gives
\[ \int \sin(x)\cos(x)\,dx = \sin^2(x) - \int \cos(x)\sin(x)\,dx. \]
Next, let us integrate the last integral $\int \cos(x)\sin(x)\,dx$ by parts taking
\[ u(x) = \cos(x), \quad v'(x) = \sin(x), \quad u'(x) = -\sin(x), \quad v(x) = -\cos(x). \]
Integration by parts gives
\[ \int \cos(x)\sin(x)\,dx = -\cos^2(x) - \int \sin(x)\cos(x)\,dx. \]
If we plug this into the first integration by parts step above, we get
\[ \int \sin(x)\cos(x)\,dx = \sin^2(x) - \Big(-\cos^2(x) - \int \sin(x)\cos(x)\,dx\Big), \]
\[ \int \sin(x)\cos(x)\,dx = \underbrace{\sin^2(x) + \cos^2(x)}_{1} + \int \sin(x)\cos(x)\,dx. \]
If we cancel the integrals $\int \sin(x)\cos(x)\,dx$ on the left and right hand side, we get
$0 = 1$. Where is the mistake?
Exercise 6. What differentiation technique does the integration by parts formula
come from?
(a) power rule (b) product rule (c) chain rule (d) quotient rule
Answer to Exercise 1. Take $u(x) = x$ and $v'(x) = \sin(3x)$, so that $u'(x) = 1$ and
$v(x) = -\frac{1}{3}\cos(3x)$. Integration by parts gives
\[ \int x\sin(3x)\,dx = -\frac{x}{3}\cos(3x) + \frac{1}{3}\int \cos(3x)\,dx = -\frac{x}{3}\cos(3x) + \frac{1}{9}\sin(3x) + C. \]
\[ \int_0^{\pi/3} x\sin(3x)\,dx = \Big(-\frac{x}{3}\cos(3x) + \frac{1}{9}\sin(3x)\Big)\Big|_{x=0}^{x=\pi/3} = \frac{\pi}{9}. \]
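Answer to Exercise 2. According to the above table, we take $u(t) = t$ and $v'(t) = e^{-t/4}$.
Then $u'(t) = 1$ and $v(t) = -4e^{-t/4}$ and, using integration by parts,
\[ \int t e^{-t/4}\,dt = -4te^{-t/4} + \int 4e^{-t/4}\,dt = -4te^{-t/4} - 16e^{-t/4} + C. \]
For the definite integral, we can use the FTC after we computed the indefinite
integral:
\[ \int_0^4 t e^{-t/4}\,dt = \big(-4te^{-t/4} - 16e^{-t/4}\big)\Big|_{t=0}^{t=4} = -\frac{32}{e} + 16 = 4.2279\ldots \]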
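Answer to Exercise 3. If we take $u(x) = \ln(x)$ and $v'(x) = x^2$ then $u'(x) = \frac{1}{x}$ and
$v(x) = \frac{x^3}{3}$. The integration by parts formula gives
\[ \int x^2\ln(x)\,dx = \frac{x^3}{3}\ln(x) - \int \frac{1}{x}\cdot\frac{x^3}{3}\,dx = \frac{x^3}{3}\ln(x) - \int \frac{x^2}{3}\,dx = \frac{x^3}{3}\ln(x) - \frac{x^3}{9} + C. \]
\[ \int_1^2 x^2\ln(x)\,dx = \Big(\frac{x^3}{3}\ln(x) - \frac{x^3}{9}\Big)\Big|_{x=1}^{x=2} = \frac{8}{3}\ln(2) - \frac{7}{9}. \]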
Answer to Exercise 4. Take $u(x) = x$ and $v'(x) = f''(x)$. Then $u'(x) = 1$ and $v(x) = f'(x)$.
Using the integration by parts formula,
\[ \int x f''(x)\,dx = x f'(x) - \int f'(x)\,dx = x f'(x) - f(x) + C. \]
In the second equality we used that an antiderivative of $f'(x)$ is $f(x)$. Then the
definite integral is
\[ \int_1^3 x f''(x)\,dx = \big(x f'(x) - f(x)\big)\Big|_{x=1}^{x=3} = \big(3f'(3) - f(3)\big) - \big(f'(1) - f(1)\big). \]
Using the table, this is equal to $(3(1) - 2) - (-1.5 - 1) = 3.5$.

Answer to Exercise 5. After two integration by parts steps we arrived at
\[ \int \sin(x)\cos(x)\,dx = \sin^2(x) + \cos^2(x) + \int \sin(x)\cos(x)\,dx, \]
which is the same as
\[ \int \sin(x)\cos(x)\,dx = 1 + \int \sin(x)\cos(x)\,dx. \]
The mistake was in cancelling the two integrals. Recall that indefinite integrals,
a.k.a. antiderivatives, are defined up to a constant $+C$. The above equation simply
says that adding 1 to any antiderivative of $\sin(x)\cos(x)$ gives another antiderivative
of $\sin(x)\cos(x)$. When we say that an indefinite integral $\int f(x)\,dx$ is equal to something,
there is always a hidden $+C$ in the statement.
Answer to Exercise 6. Integration by parts is based on the product rule, so (b).

3.7 Approximating integrals using Taylor polynomials

In this section we will learn another way to approximate a function y = f (x) by
simple functions, called Taylor polynomials, and then use them to approximate
integrals. To introduce these polynomials, let us start by recalling a formula for the
tangent line to y = f (x) at x = a:
P1 (x) = f (a) + f ′ (a)(x − a).
The tangent line is also called the Taylor polynomial of degree $n = 1$ (centered) at
$x = a$ and is often denoted $P_1(x)$.
What do we know about the tangent line?
• Tangent line has the same value at x = a as our
function: P1 (a) = f (a).
• Tangent line has the same derivative (velocity)
at x = a: P1′ (a) = f ′ (a).
As a result, the tangent line P1 (x) approximates
f (x) near x = a.
Taylor polynomials of degree 2. What if we also want our approximation of
$y = f(x)$ to have the same second derivative (acceleration) at $x = a$? It turns out
that we can do that if instead of a line we use a parabola
\[ P_2(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2, \]
which is called the Taylor polynomial of degree $n = 2$ (centered) at $x = a$ and is
denoted $P_2(x)$.
Similarly to the Taylor polynomial of degree 1:
• $P_2(x)$ has the same value at $x = a$ as our function: $P_2(a) = f(a)$.
• $P_2(x)$ has the same derivative (velocity) at $x = a$: $P_2'(a) = f'(a)$.
• In addition, $P_2(x)$ has the same second derivative (acceleration) at $x = a$:
$P_2''(a) = f''(a)$.
As a result, P2 (x) also approximates f (x) near x = a, often better than P1 (x) and in
a bigger neighbourhood of the point x = a, as we can see in the figure above.
Example 1. Explain why we divide by 2 in the coefficient $\frac{f''(a)}{2}$ of the last term
$\frac{f''(a)}{2}(x - a)^2$ in the definition of the Taylor polynomial $P_2(x)$.
Solution: We want the first derivative to be the same, $P_2'(a) = f'(a)$, and the second
derivative to be the same, $P_2''(a) = f''(a)$, at $x = a$. If we compute the first two
derivatives of the above parabola $P_2(x)$:
\[ P_2'(x) = f'(a) + \frac{f''(a)}{2}\,2(x - a) = f'(a) + f''(a)(x - a), \qquad P_2''(x) = f''(a), \]
we can see that $P_2'(a) = f'(a) + f''(a)(a - a) = f'(a)$ and $P_2''(a) = f''(a)$. Notice
how the 2 in the denominator cancelled the 2 that came from the derivative of $(x - a)^2$,
which explains why we divided by 2 in the last term of $P_2(x)$. The derivatives
would not match otherwise.
Exercise 1. Suppose that a function $f(x)$ is approximated near $x = 0$ by a Taylor
polynomial of degree 2 given by $P_2(x) = -3 + 2x - x^2$. Find $f(0)$, $f'(0)$ and $f''(0)$.
Example 2. Compute and graph the first and second degree Taylor polynomial
centered at x = 0 for the functions y = ex and y = cos(x).
Solution: If $f(x) = e^x$ then $f'(x) = e^x$ and $f''(x) = e^x$, so $f(0) = f'(0) = f''(0) = 1$.
By definition, $P_1(x) = 1 + x$ and $P_2(x) = 1 + x + \frac{x^2}{2}$. The graph is in the left figure
below.
If $f(x) = \cos(x)$ then $f'(x) = -\sin(x)$ and $f''(x) = -\cos(x)$, so $f(0) = 1$,
$f'(0) = 0$ and $f''(0) = -1$. By definition, $P_1(x) = 1$ and $P_2(x) = 1 - \frac{x^2}{2}$. The graph
is in the right figure below.
Notice that, if a function is concave up at x = a then its Taylor polynomial P2 (x)

of degree 2 is also concave up, and if a function is concave down at x = a then
its Taylor polynomial P2 (x) of degree 2 is also concave down. This is because the
coefficient that determines if the parabola opens upward or downward is f ′′ (a)/2: if
f ′′ (a) > 0 then f (x) is concave up and the parabola opens upward, and if f ′′ (a) < 0
then f (x) is concave down and the parabola opens downward.
Exercise 2. Suppose that $P_2(x) = p + qx + rx^2$ is the Taylor polynomial of degree 2
centered at $x = 0$ for the function $f(x)$. For each of the functions below determine
the sign of $p$, $q$ and $r$.
[Figure: four graphs of functions, panels (a), (b), (c) and (d).]
Example 3. Estimate the coefficients of the second degree Taylor polynomial
$P_2(x) = p + q(x - 0.5) + r(x - 0.5)^2$ centered at $x = 0.5$ for the function $f(x)$ given
the values in the table:
x 0 0.25 0.5 0.75 1
f (x) 3 1.75 1 0.75 1
Solution: By definition, $p = f(0.5)$, $q = f'(0.5)$, and $r = \frac{f''(0.5)}{2}$, so $p = f(0.5) = 1$
and we need to estimate $f'(0.5)$ and $f''(0.5)$. We can estimate the first derivative
$f'(0.5) \approx \frac{\Delta y}{\Delta x}$ in several ways:
• By considering the increments $\Delta y$ and $\Delta x$ between $x = 0.5$ and $x = 0.75$:
$f'(0.5) \approx \frac{\Delta y}{\Delta x} = \frac{0.75 - 1}{0.75 - 0.5} = -1$.
• By considering the increments $\Delta y$ and $\Delta x$ between $x = 0.25$ and $x = 0.5$:
$f'(0.5) \approx \frac{\Delta y}{\Delta x} = \frac{1 - 1.75}{0.5 - 0.25} = -3$.
• By taking the average of the above two approximations: $f'(0.5) \approx \frac{-1 - 3}{2} = -2$.
This is the same as considering the increments $\Delta y$ and $\Delta x$ between $x = 0.25$ and
$x = 0.75$: $f'(0.5) \approx \frac{\Delta y}{\Delta x} = \frac{0.75 - 1.75}{0.75 - 0.25} = -2$.
We have seen when estimating derivatives that taking the average is often more
accurate, so let us take the estimate $q = f'(0.5) \approx -2$. Finally, we need to estimate
the second derivative $f''(0.5)$. Let us use the following estimate:
\[ f''(x) \approx \frac{\frac{f(x+h) - f(x)}{h} - \frac{f(x) - f(x-h)}{h}}{h} = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}, \]
where $h$ is the increment $\Delta x$, in this case $h = 0.25$. Before we explain this formula,
let us use it in this problem:
\[ f''(0.5) \approx \frac{f(0.75) - 2f(0.5) + f(0.25)}{0.25^2} = \frac{0.75 - 2\cdot 1 + 1.75}{0.25^2} = 8, \]
so our estimate of the last coefficient is $r = \frac{f''(0.5)}{2} \approx \frac{8}{2} = 4$. We estimate that the
Taylor polynomial is $P_2(x) \approx 1 - 2(x - 0.5) + 4(x - 0.5)^2$.
Finally, the reason behind the above estimate of the second derivative $f''(x)$
is that $\frac{f(x+h) - f(x)}{h}$ can be viewed as an estimate of $f'(x + \frac{h}{2})$, $\frac{f(x) - f(x-h)}{h}$ can be
viewed as an estimate of $f'(x - \frac{h}{2})$, and then the above formula becomes
\[ f''(x) \approx \frac{f'(x + \frac{h}{2}) - f'(x - \frac{h}{2})}{h}, \]
which is exactly how we would estimate the second derivative $f''(x)$ if we knew
the values of the first derivative.
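The estimates in Example 3 follow a fixed recipe, so they are easy to script. The sketch below uses the table from Example 3 together with the averaged first-derivative estimate and the second-difference formula from the solution; the variable names are illustrative.

```python
# Table from Example 3
xs = [0, 0.25, 0.5, 0.75, 1]
fs = [3, 1.75, 1, 0.75, 1]

i = xs.index(0.5)          # position of the center x = 0.5 in the table
h = xs[i + 1] - xs[i]      # spacing between neighbouring table points

p = fs[i]                                              # f(0.5)
q = (fs[i + 1] - fs[i - 1]) / (2 * h)                  # averaged estimate of f'(0.5)
second = (fs[i + 1] - 2 * fs[i] + fs[i - 1]) / h ** 2  # estimate of f''(0.5)
r = second / 2

def P2(x):
    # Estimated Taylor polynomial of degree 2 centered at 0.5
    return p + q * (x - 0.5) + r * (x - 0.5) ** 2

print(p, q, r)
```

With these table values the script reproduces $p = 1$, $q \approx -2$ and $r \approx 4$. Changing `xs`, `fs` and the center gives the corresponding estimates for other tables of the same kind.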
Exercise 3. Estimate the coefficients of the second degree Taylor polynomial
$P_2(x) = p + q(x - 2) + r(x - 2)^2$ centered at $x = 2$ for the function $f(x)$ given
the values in the table:
x      1.5     1.75    2       2.25    2.5
f(x)   1.75    2       1.75    1       −0.25
Taylor polynomials of general degree. Can we generalize the Taylor polynomials
of degree 1 and 2 to match not only the first and second derivatives at $x = a$,
but also the third, fourth derivative, and so on? The answer is yes, we can match
the first $n$ derivatives, if we use a polynomial of degree $n$:
\[ P_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \ldots + \frac{f^{(n)}(a)}{n!}(x - a)^n, \]
which is called the Taylor polynomial of degree $n$ (centered) at $x = a$ and is
denoted $P_n(x)$. Here $n! = 1 \times 2 \times \ldots \times n$ is called $n$-factorial. All the coefficients
are chosen in such a way that the first $n$ derivatives match:
\[ P_n(a) = f(a), \quad P_n'(a) = f'(a), \quad P_n''(a) = f''(a), \quad \ldots, \quad P_n^{(n)}(a) = f^{(n)}(a). \]
The following figures show several Taylor polynomials centered at x = 0 for three
functions: ex , cos(x), and sin(x). We can see that the approximations get better and
better on wider and wider intervals as the degree n increases.
Let us now consider several classic examples of Taylor polynomials. First, let
us take a look at the exponential function f (x) = ex . Because all the derivatives of
ex are equal to ex , we get that f (n) (x) = ex and f (n) (0) = e0 = 1, and the Taylor
polynomial of ex of degree n at x = 0 is
\[ P_n(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots + \frac{x^n}{n!}. \]
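To see how quickly these polynomials approach $e^x$, the following sketch evaluates $P_n(1)$ for a few degrees and compares with $e$. The function name `exp_taylor` and the chosen degrees are illustrative assumptions.

```python
import math

def exp_taylor(x, n):
    """Taylor polynomial of e^x of degree n centered at 0, evaluated at x."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

degrees = (1, 2, 4, 8)
errors = [abs(exp_taylor(1.0, n) - math.e) for n in degrees]

for n, err in zip(degrees, errors):
    print(n, exp_taylor(1.0, n), err)
```

The printed errors shrink as the degree grows, in line with the general observation that higher-degree Taylor polynomials approximate the function better near the center.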
Next, let us compute Taylor polynomials for cos(x) and sin(x).
Example 4. Compute Taylor polynomials of all degrees at x = 0 for f (x) = cos(x).

Solution: When we write consecutive derivatives of f (x) = cos(x),
f = cos(x), f ′ = − sin(x), f ′′ = − cos(x), f ′′′ = sin(x), f (4) = cos(x), . . .
we see that the fourth derivative is equal to the original function cos(x), so after
that the same pattern of cos(x), − sin(x), − cos(x), sin(x) will keep repeating. When
we plug in x = 0, we see that cos(0) = 1, − sin(0) = 0, − cos(0) = −1, sin(0) = 0,
so the derivatives at x = 0 will follow a repeating pattern of 1, 0, −1, 0, etc. That
means that the Taylor polynomials of cos(x) at x = 0 will have a pattern
\[ 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \frac{x^{10}}{10!} + \ldots \]
Notice that all odd powers of $x$ are missing because of the coefficients $\pm\sin(0) = 0$.
We can stop at any degree to get the Taylor polynomial $P_n(x)$ of that degree $n$:
\[ P_1(x) = 1, \quad P_2(x) = 1 - \frac{x^2}{2!}, \quad P_3(x) = 1 - \frac{x^2}{2!}, \quad P_4(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!}, \quad \ldots \]
Notice how P3 (x) is equal to P2 (x). Again, that is because the coefficient in front
of x3 is zero. That is why in the above figure we plotted only even degree Taylor
polynomials P2 (x), P4 (x) and P6 (x).
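The pattern for $\cos(x)$ translates into a short loop over even powers. The sketch below is one possible way to code it; `cos_taylor` is a name introduced here, not something defined in the text.

```python
import math

def cos_taylor(x, n):
    """Taylor polynomial of cos(x) of degree n centered at 0, evaluated at x."""
    total = 0.0
    for k in range(0, n + 1, 2):           # only even powers contribute
        sign = (-1) ** (k // 2)            # signs alternate: +, -, +, ...
        total += sign * x ** k / math.factorial(k)
    return total

x = 1.0
for n in (1, 2, 3, 4, 6):
    print(n, cos_taylor(x, n), abs(cos_taylor(x, n) - math.cos(x)))
```

Because the odd-degree coefficients are zero, `cos_taylor(x, 3)` returns the same value as `cos_taylor(x, 2)`, which mirrors the remark that $P_3(x)$ is equal to $P_2(x)$.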
Exercise 4. Show that the Taylor polynomials of $f(x) = \sin(x)$ at $x = 0$ follow a
pattern
\[ x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \frac{x^{11}}{11!} + \ldots. \]
What is $P_5(x)$ and $P_6(x)$?
Example 5. Write down and simplify the Taylor polynomial of degree 5 centered
at $x = 0$ for $y = f(x)$ given that
f(0)    f′(0)    f′′(0)    f′′′(0)    f⁽⁴⁾(0)    f⁽⁵⁾(0)
−2      2        −1        2          −3         12
Solution: By definition,
\[ P_5(x) = -2 + 2x - \frac{1}{2!}x^2 + \frac{2}{3!}x^3 - \frac{3}{4!}x^4 + \frac{12}{5!}x^5. \]
We can simplify the last three coefficients,
\[ \frac{2}{3!} = \frac{2}{1\cdot 2\cdot 3} = \frac{1}{3}; \qquad -\frac{3}{4!} = -\frac{3}{1\cdot 2\cdot 3\cdot 4} = -\frac{1}{8}; \qquad \frac{12}{5!} = \frac{3\cdot 4}{1\cdot 2\cdot 3\cdot 4\cdot 5} = \frac{1}{10}, \]
so the simplified form of the Taylor polynomial is
\[ P_5(x) = -2 + 2x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{8} + \frac{x^5}{10}. \]
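The simplification of the coefficients in Example 5 can be reproduced with exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the list name `derivatives` is an illustrative choice.

```python
from fractions import Fraction
from math import factorial

# Values f(0), f'(0), ..., f^(5)(0) from the table in Example 5
derivatives = [-2, 2, -1, 2, -3, 12]

# The coefficient of x^k in the Taylor polynomial is f^(k)(0) / k!
coefficients = [Fraction(d, factorial(k)) for k, d in enumerate(derivatives)]

for k, c in enumerate(coefficients):
    print(f"x^{k}: {c}")
```

`Fraction` reduces each quotient automatically, so the output lists the same simplified coefficients as in the solution.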
Exercise 5. Find $f^{(5)}(-1)$ and $f^{(7)}(-1)$ given the Taylor polynomial of $f(x)$ of
degree 7 centered at $x = -1$: $P_7(x) = -2 + (x+1) - 3(x+1)^3 + (x+1)^4 - \frac{(x+1)^7}{6!}$.
Approximating integrals using Taylor polynomials. Next, we will use Taylor

polynomials to approximate definite integrals. As we will see, Taylor polynomials
are very useful because they are very easy to integrate.
Example 6. Write down a Taylor polynomial of $\cos(2x^2)$ at $x = 0$ with three
non-zero terms and use it to approximate $\int_{-1}^{1} \cos(2x^2)\,dx$.
Solution: In problems of this type, instead of using the definition of a Taylor
polynomial and computing the derivatives of $\cos(2x^2)$, what we can do is
take a Taylor polynomial of $\cos(x)$ and then replace $x$ by $2x^2$. This will
automatically give us a Taylor polynomial we want without doing extra
calculations. We already know that
\[ \cos(x) \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} \]
near $x = 0$, where we wrote the polynomial of degree 4 because we were asked to
use a polynomial with three non-zero terms. If we now replace $x$ by $2x^2$, we get
\[ \cos(2x^2) \approx 1 - \frac{(2x^2)^2}{2!} + \frac{(2x^2)^4}{4!} = 1 - 2x^4 + \frac{2}{3}x^8. \]
Actually, this approximation is very good on the interval $[-1, 1]$ as we can see in
the figure above, where $\cos(2x^2)$ is the black solid line and $1 - 2x^4 + \frac{2}{3}x^8$ is the
blue dashed line. Integrating this approximation gives
\[ \begin{aligned}
\int_{-1}^{1} \cos(2x^2)\,dx &\approx \int_{-1}^{1} \Big(1 - 2x^4 + \frac{2}{3}x^8\Big)\,dx \\
&= \Big(x - \frac{2x^5}{5} + \frac{2x^9}{27}\Big)\Big|_{x=-1}^{x=1} \\
&= \Big(1 - \frac{2}{5} + \frac{2}{27}\Big) - \Big(-1 + \frac{2}{5} - \frac{2}{27}\Big) = 1.3481\ldots
\end{aligned} \]
Actually, one can check using a computer or a graphing calculator that the original
integral is $\int_{-1}^{1} \cos(2x^2)\,dx = 1.3351\ldots$, so the approximation we obtained is pretty
good. To make it even better we could have used a Taylor polynomial with a few
more terms, which would give a better approximation of our function.
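The comparison in Example 6 can be reproduced with a short script. The sketch below integrates the Taylor polynomial exactly using fractions and estimates the original integral with a midpoint Riemann sum; the helpers `integrate_poly` and `midpoint_sum` are written here for illustration and are not part of the text.

```python
import math
from fractions import Fraction

# Taylor polynomial 1 - 2x^4 + (2/3)x^8 stored as {power: coefficient}
poly = {0: Fraction(1), 4: Fraction(-2), 8: Fraction(2, 3)}

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of a polynomial given as {power: coefficient}."""
    total = Fraction(0)
    for power, c in coeffs.items():
        # The antiderivative of c * x^power is c * x^(power + 1) / (power + 1)
        total += c * (Fraction(b) ** (power + 1) - Fraction(a) ** (power + 1)) / (power + 1)
    return total

def midpoint_sum(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with a midpoint Riemann sum."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

taylor_estimate = integrate_poly(poly, -1, 1)
numeric_value = midpoint_sum(lambda x: math.cos(2 * x * x), -1.0, 1.0)

print(taylor_estimate, float(taylor_estimate), numeric_value)
```

Adding further entries to `poly` (the next term would be $-\frac{(2x^2)^6}{6!}$) is one way to experiment with how extra terms improve the estimate.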
Exercise 6. Write down a Taylor polynomial of $e^{-x^2}$ at $x = 0$ with four non-zero
terms and use it to approximate $\int_0^1 e^{-x^2}\,dx$.
Example 7. Use a Taylor polynomial of $\sin(x)$ at $x = 0$ with two non-zero terms
to approximate $\int_0^1 \frac{\sin(x)}{x}\,dx$.
Solution: As the problem suggests, we start with a Taylor approximation
\[ \sin(x) \approx x - \frac{x^3}{3!} = x - \frac{x^3}{6}. \]
When we divide both sides by $x$, we get
\[ \frac{\sin(x)}{x} \approx 1 - \frac{x^2}{6}. \]
In the figure above, $\frac{\sin(x)}{x}$ is the black solid line and $1 - \frac{x^2}{6}$ is the blue dashed line,
and the approximation looks pretty good. Integrating this approximation gives
\[ \int_0^1 \frac{\sin(x)}{x}\,dx \approx \int_0^1 \Big(1 - \frac{x^2}{6}\Big)\,dx = \Big(x - \frac{x^3}{18}\Big)\Big|_{x=0}^{x=1} = 1 - \frac{1}{18} - 0 = \frac{17}{18} = 0.9444\ldots \]
One can check using a computer or a graphing calculator that the original integral
is $\int_0^1 \frac{\sin(x)}{x}\,dx = 0.9460\ldots$, so the approximation is pretty good. There is one subtle
point in this problem: $\frac{\sin(x)}{x}$ is not defined at $x = 0$ because we divide by 0. However,
according to the above figure, $\frac{\sin(x)}{x}$ approaches 1 as $x$ approaches 0, so we
implicitly assumed that the function we integrate is equal to 1 at $x = 0$.
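Since each term of the Taylor polynomial of $\sin(x)$ divided by $x$ is a power of $x$, the approximation in Example 7 can be integrated term by term. The sketch below does this exactly for any number of non-zero terms; the function name `si_partial` is made up for this illustration.

```python
from fractions import Fraction
from math import factorial

def si_partial(terms):
    """Integrate over [0, 1] the first `terms` non-zero Taylor terms of sin(x)/x."""
    total = Fraction(0)
    for k in range(terms):
        n = 2 * k + 1  # sin(x) contributes the term (-1)^k x^n / n!
        # Dividing by x gives x^(n-1) / n!, whose integral over [0, 1] is 1 / (n * n!)
        total += Fraction((-1) ** k, n * factorial(n))
    return total

two_terms = si_partial(2)
three_terms = si_partial(3)
print(two_terms, float(two_terms))
print(three_terms, float(three_terms))
```

With two terms this reproduces $\frac{17}{18}$, and one more term already moves the estimate noticeably closer to the value $0.9460\ldots$ quoted above.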
Exercise 7. Use a Taylor polynomial of $\cos(x)$ at $x = 0$ with three non-zero terms
to approximate $\int_0^1 \frac{1 - \cos(x)}{x^2}\,dx$.
Exercise 8. A bacterial colony starts growing in a Petri dish at time $t = 0$ (time is
measured in days). Every hour between 22 and 26 hours you measure the area of
the colony (in cm²) and from the data you estimate $f(1)$, $f'(1)$ and $f''(1)$, where $f(t)$
is the growth rate (in cm²/day) of the area occupied by the colony. However, the
next day you realize that you did not save the area data and only know that
$f(1) = 2$, $f'(1) = -0.6$, and $f''(1) = 0.18$. How can you estimate the area at $t = 1$
given this information?
Answer to Exercise 1. There are two ways we can solve this problem. Since we
know that $P_2(x)$ and $f(x)$ have the same value and first two derivatives at $x = 0$ (so
in this case $a = 0$), we can just compute $P_2'(x) = 2 - 2x$ and $P_2''(x) = -2$ and plug
in $x = 0$ to get $f(0) = P_2(0) = -3$, $f'(0) = P_2'(0) = 2$, and $f''(0) = P_2''(0) = -2$.
A better way to solve this problem without any calculations is to compare the
definition of $P_2(x) = f(0) + f'(0)x + \frac{f''(0)}{2}x^2$ centered at $x = 0$ with the formula
given to us, $P_2(x) = -3 + 2x - 1x^2$, and make sure that all the coefficients match:
\[ f(0) = -3, \quad f'(0) = 2, \quad \frac{f''(0)}{2} = -1. \]
This immediately gives us $f(0) = -3$, $f'(0) = 2$, and $f''(0) = -2$.
Answer to Exercise 2. (a) p > 0 because p = f (0) > 0, q > 0 because q = f ′ (0) > 0
since the slope is positive, r < 0 because r = f ′′ (0)/2 < 0 since the function is
concave down at x = 0. (b) p > 0, q < 0, r < 0. (c) p > 0, q > 0, r > 0. (d) p > 0,
q < 0, r > 0.
Answer to Exercise 3. $p = f(2) = 1.75$, $q = f'(2) \approx \frac{f(2.25) - f(1.75)}{0.5} = \frac{1 - 2}{0.5} = -2$,
and $r = \frac{f''(2)}{2}$, where
\[ f''(2) \approx \frac{f(2.25) - 2f(2) + f(1.75)}{0.25^2} = \frac{1 - 2\cdot 1.75 + 2}{0.25^2} = -8, \]
so $r \approx -\frac{8}{2} = -4$. The Taylor polynomial is $P_2(x) \approx 1.75 - 2(x - 2) - 4(x - 2)^2$.
Answer to Exercise 4. When we write consecutive derivatives of f (x) = sin(x),

\[ f = \sin(x), \quad f' = \cos(x), \quad f'' = -\sin(x), \quad f''' = -\cos(x), \quad f^{(4)} = \sin(x), \quad \ldots \]
we see that the fourth derivative is equal to the original function $\sin(x)$, so after that
the same pattern of $\sin(x)$, $\cos(x)$, $-\sin(x)$, $-\cos(x)$ will keep repeating. When we
plug in $x = 0$, we see that $\sin(0) = 0$, $\cos(0) = 1$, $-\sin(0) = 0$, $-\cos(0) = -1$, so
the derivatives at $x = 0$ will follow a repeating pattern of $0, 1, 0, -1$, etc. That means
that Taylor polynomials will have a pattern
\[ x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \frac{x^{11}}{11!} + \ldots \]
Notice that all even powers of $x$ are missing because of the coefficients $\pm\sin(0) = 0$.
We can stop at any degree to get the Taylor polynomial $P_n(x)$ of that degree $n$.
For example,
\[ P_5(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}, \qquad P_6(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}. \]
Notice how P6 (x) is equal to P5 (x). Again, that is because the coefficient in front
of x6 is zero.
Answer to Exercise 5. Because we do not have the term with $(x - (-1))^5 = (x + 1)^5$,
it means that the coefficient $\frac{f^{(5)}(-1)}{5!}$ is 0, so $f^{(5)}(-1) = 0$. The coefficient in
front of $(x - (-1))^7 = (x + 1)^7$ is $-\frac{1}{6!}$, which by definition should be $\frac{f^{(7)}(-1)}{7!}$, so
\[ \frac{f^{(7)}(-1)}{7!} = -\frac{1}{6!} \implies f^{(7)}(-1) = -\frac{7!}{6!} = -7. \]
Answer to Exercise 6. We start with the Taylor polynomial for $e^x$ with four terms,
\[ e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}, \]
and replace $x$ by $-x^2$,
\[ e^{-x^2} \approx 1 + (-x^2) + \frac{(-x^2)^2}{2!} + \frac{(-x^2)^3}{3!} = 1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}. \]
In the figure above, $e^{-x^2}$ is the black solid line and $1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}$ is the
blue dashed line. Integrating this approximation gives
\[ \int_0^1 e^{-x^2}\,dx \approx \int_0^1 \Big(1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}\Big)\,dx
= \Big(x - \frac{x^3}{3} + \frac{x^5}{10} - \frac{x^7}{42}\Big)\Big|_{x=0}^{x=1}
= 1 - \frac{1}{3} + \frac{1}{10} - \frac{1}{42} - 0 = 0.7428\ldots. \]
One can check using a computer or a graphical calculator that the original integral
is $\int_0^1 e^{-x^2}\,dx = 0.7468\ldots$, so the approximation we obtained is pretty good.
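The comparison in the answer to Exercise 6 can be reproduced with a few lines of Python. This is only a sketch: it assumes the identity $\int_0^1 e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}\,\mathrm{erf}(1)$, which follows from the definition of the error function, and the function names `taylor_estimate` and `reference_value` are made up for this illustration.

```python
import math

def taylor_estimate():
    # Term-by-term integral of 1 - x^2 + x^4/2 - x^6/6 over [0, 1].
    return 1 - 1 / 3 + 1 / 10 - 1 / 42

def reference_value():
    # Assumed identity: integral of e^(-x^2) over [0, 1] = (sqrt(pi)/2) * erf(1).
    return math.sqrt(math.pi) / 2 * math.erf(1)

estimate = taylor_estimate()
reference = reference_value()
print(f"Taylor estimate: {estimate:.4f}")
print(f"erf-based value: {reference:.4f}")
print(f"difference:      {abs(reference - estimate):.4f}")
```

The difference is in the third decimal place, which matches the remark that the approximation is pretty good.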
Answer to Exercise 7. As the problem suggests, we start with a Taylor approximation
\[ \cos(x) \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} = 1 - \frac{x^2}{2} + \frac{x^4}{24}. \]
When we subtract both sides from 1 and divide by $x^2$, we get
\[ \frac{1 - \cos(x)}{x^2} \approx \frac{1}{2} - \frac{x^2}{24}. \]
In the figure above, $\frac{1 - \cos(x)}{x^2}$ is the black solid line and $\frac{1}{2} - \frac{x^2}{24}$ is the blue dashed
line. Integrating this approximation gives
\[ \int_0^1 \frac{1 - \cos(x)}{x^2}\,dx \approx \int_0^1 \Big(\frac{1}{2} - \frac{x^2}{24}\Big)\,dx
= \Big(\frac{x}{2} - \frac{x^3}{72}\Big)\Big|_{x=0}^{x=1}
= \frac{1}{2} - \frac{1}{72} - 0 = 0.4861\ldots \]
One can check using a computer or a graphical calculator that the original integral
is $\int_0^1 \frac{1 - \cos(x)}{x^2}\,dx = 0.4863\ldots$. Again, there is one subtle point in this problem:
$\frac{1 - \cos(x)}{x^2}$ is not defined at $x = 0$ because we divide by 0. However, according to the
approximation $\frac{1 - \cos(x)}{x^2} \approx \frac{1}{2} - \frac{x^2}{24}$, this function approaches $\frac{1}{2}$ as $x$ approaches 0, so
we can implicitly assume that the function we integrate is equal to $\frac{1}{2}$ at $x = 0$.
Answer to Exercise 8. If $A(t)$ is the area at time $t$ then $f(t) = A'(t)$ and, by the FTC,
\[ A(1) = A(0) + \int_0^1 f(t)\,dt = \int_0^1 f(t)\,dt, \]
because the area was 0 at $t = 0$. Given $f(1) = 2$, $f'(1) = -0.6$, and $f''(1) = 0.18$,
we can estimate $f(t)$ using the second degree Taylor polynomial centered at $t = 1$,
\[ f(t) \approx P_2(t) = f(1) + f'(1)(t - 1) + \frac{f''(1)}{2}(t - 1)^2 = 2 - 0.6(t - 1) + 0.09(t - 1)^2 \]
and use it to estimate the integral
\[ A(1) = \int_0^1 f(t)\,dt \approx \int_0^1 \big(2 - 0.6(t - 1) + 0.09(t - 1)^2\big)\,dt
= \Big(2t - 0.6\,\frac{(t - 1)^2}{2} + 0.09\,\frac{(t - 1)^3}{3}\Big)\Big|_{t=0}^{t=1}
= 2 - \Big(-\frac{0.6}{2} - \frac{0.09}{3}\Big) = 2.33\ \text{cm}^2. \]
3.8 CAS: computer algebra systems

When it comes to computing indefinite and definite integrals, there are many com-
puter algebra systems (advanced online calculators) available, such as Wolfram
Alpha, Symbolab, Geogebra, Desmos, etc. In this section we will go over several
examples of computing indefinite and definite integrals, as well as Taylor polyno-
mials, using Wolfram Alpha. It is quite flexible in terms of interpreting queries in
natural language, so in this sense it is very convenient.
Indefinite integrals. To find an indefinite integral $\int f(x)\,dx$ of some function $f(x)$, one can
simply enter “integral of f(x)” into the input bar. Depending on the function $f(x)$, the output
may vary, as we will see in the examples below. In the example on the right, the function $f(x)$
is $e^x \cos(x)$, and the first line of the output is the answer
\[ \int e^x \cos(x)\,dx = \frac{e^x}{2}(\cos(x) + \sin(x)) + C. \]
In this particular case, below some plots of $\frac{e^x}{2}(\cos(x) + \sin(x))$, it also gives
alternative forms of the integral, for example,
\[ \frac{e^x}{\sqrt{2}} \sin\Big(x + \frac{\pi}{4}\Big) + C, \]
which is just another way to rewrite this function using trigonometric identities.
Below that, the Taylor polynomial of degree 4 is given,
\[ \frac{1}{2} + x + \frac{x^2}{2} - \frac{x^4}{12}. \]
It says “series expansion of the integral at x = 0” because Taylor polynomials give
rise to the so-called Taylor series that will be discussed in a later chapter. As
we can see, the output contains a lot of useful information without us even asking
for it. If we are simply looking for an antiderivative, it is useful to take a look at
alternative forms of the integral.
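An antiderivative reported by a CAS can be sanity-checked numerically, without any CAS at all. The sketch below is an illustration, not part of the Wolfram Alpha workflow: it differentiates $\frac{e^x}{2}(\cos(x) + \sin(x))$ with a central difference and compares the result with $e^x\cos(x)$, and it also checks that the alternative form agrees with the first one. The helper name `central_difference` and the step size `h` are choices made for this example.

```python
import math

def f(x):
    # The integrand from the example.
    return math.exp(x) * math.cos(x)

def F(x):
    # Antiderivative from the first line of the output (with C = 0).
    return math.exp(x) / 2 * (math.cos(x) + math.sin(x))

def F_alt(x):
    # Alternative form of the same antiderivative.
    return math.exp(x) / math.sqrt(2) * math.sin(x + math.pi / 4)

def central_difference(g, x, h=1e-5):
    # Numerical estimate of g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (-1.0, 0.0, 0.5, 2.0):
    print(f"x = {x:5.2f}  f(x) = {f(x): .8f}  F'(x) ~ {central_difference(F, x): .8f}  "
          f"F(x) - F_alt(x) = {F(x) - F_alt(x): .1e}")
```

If the two middle columns agree and the last column is essentially zero, both forms are antiderivatives of the same function.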
Exercise 1. Using www.wolframalpha.com, find the indefinite integral
\[ \int \frac{\sin(x)}{\cos^2(x)}\,dx. \]
What is an alternative form of the integral?
Definite integrals. To find a definite integral $\int_a^b f(x)\,dx$ of some function $f(x)$, we can
enter “integral of f(x) from a to b” into the input bar. In the example on the right, we entered
$\int_0^1 e^x \cos(x)\,dx$ and the output gives
\[ \frac{1}{2}\big(e(\sin(1) + \cos(1)) - 1\big) \approx 1.3780 \]
One can click on the answer to see a more accurate decimal approximation
$1.378024613547\ldots$. The exact answer $\frac{1}{2}(e(\sin(1) + \cos(1)) - 1)$ actually
comes from an application of the FTC using the indefinite integral found in the example above:
\[ \frac{e^x}{2}(\cos(x) + \sin(x))\Big|_{x=0}^{x=1} = \frac{1}{2}\big(e(\sin(1) + \cos(1)) - 1\big). \]
This indefinite integral is also given in the output, if you scroll down.
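The exact value above can be compared with a purely numerical estimate. The following sketch uses composite Simpson's rule, which is one of the "other techniques of numerical integration" and is not derived in this text; the function name `simpson` and the choice of 1000 subintervals are assumptions of this example.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weights 4 and 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return math.exp(x) * math.cos(x)

numeric = simpson(integrand, 0.0, 1.0)
exact = (math.e * (math.sin(1) + math.cos(1)) - 1) / 2
print(f"Simpson's rule: {numeric:.10f}")
print(f"FTC formula:    {exact:.10f}")
```

Both numbers should agree with the decimal approximation quoted in the text to many digits.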
Exercise 2. Using www.wolframalpha.com, find the definite integral
\[ \int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx. \]
Give the answer up to ten digits.
The good thing about definite integrals is that they can be estimated using Riemann sums
(and other techniques of numerical integration) even if an antiderivative cannot be found
explicitly, so that we cannot apply the FTC. In the example on the right, Wolfram Alpha
was unable to find the indefinite integral of $\sqrt{x + 1}\,e^{-x^2}$ in terms of standard
mathematical functions, but it had no problem computing the definite integral
from 0 to 1 with high accuracy. We will discuss below how to use specific numerical
methods, such as the familiar left and right Riemann sums.
Exercise 3. Find the definite integral $\int_0^2 \cos(x^4)\sin(x^3)\,dx$. What about the indefinite
integral? Does this function have an antiderivative?

Unfamiliar outputs. Often when we try to find an indefinite integral, the output might look
unfamiliar. In the example on the right, we see that
\[ \int \frac{1}{\sqrt{1 + x^2}}\,dx = \sinh^{-1}(x) + C. \]
We can see if there is an alternative form of the integral that might look more familiar.
For example, in this case an alternative form is
\[ \ln\big(\sqrt{x^2 + 1} + x\big). \]
Actually, the answer says log instead of ln, but if you look in the corner under it
(this part was cut off in this figure), it says “log(x) is the natural logarithm”.
In Wolfram Alpha log(x) denotes the natural logarithm ln(x), which is a common
convention in Mathematics in general. In any case, this alternative form is
more familiar in this particular case. You can also look up what “sinh” means and
you will find that the function sinh(x) is the so-called hyperbolic sine defined by
$\sinh(x) = \frac{e^x - e^{-x}}{2}$, and $\sinh^{-1}(x)$ is its inverse. We could use either form to find a
definite integral, for example,
\[ \int_0^1 \frac{1}{\sqrt{1 + x^2}}\,dx = \sinh^{-1}(1) - \sinh^{-1}(0) = \ln\big(\sqrt{2} + 1\big) \approx 0.88137\ldots. \]
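That the two forms describe the same function can also be checked numerically. In the sketch below, `math.asinh` is Python's name for $\sinh^{-1}$, and `log_form` is a name chosen here for the alternative form $\ln(\sqrt{x^2 + 1} + x)$.

```python
import math

def log_form(x):
    # Alternative form of the antiderivative: ln(sqrt(x^2 + 1) + x).
    return math.log(math.sqrt(x * x + 1) + x)

# The definite integral from 0 to 1, computed with each form.
via_asinh = math.asinh(1) - math.asinh(0)
via_log = log_form(1) - log_form(0)
print(f"{via_asinh:.5f}  {via_log:.5f}")

# The two forms agree at other points as well.
for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, math.asinh(x), log_form(x))
```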
Sometimes the output might look unfamiliar, and there is no alternative form given. For
example, in the example on the right
\[ \int e^{-x^2}\,dx = \frac{1}{2}\sqrt{\pi}\,\mathrm{erf}(x) + C. \]
In this case, you can look up what erf(x) means, and you will find that it
is the so-called error function defined by
\[ \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt. \]
In other words, it is a specific antiderivative of $\frac{2}{\sqrt{\pi}}\,e^{-x^2}$ equal to 0 at $x = 0$.
Exercise 4. Find an indefinite integral $\int \frac{\sin(x)}{x}\,dx$. Investigate what the output
means. Find the definite integral $\int_0^1 \frac{\sin(x)}{x}\,dx$.
Riemann sums. We can find a left Riemann sum with $n$ subintervals on the interval $[a, b]$
by writing “integral of f(x) from a to b using left endpoint rule with n intervals”. We can
replace “integral” by “Riemann sum”, and “left” by “right” if we want the right Riemann sum.
The output gives the result of the Riemann sum, its symbolic representation using the ∑
notation, the graph of the function illustrating how rectangles approximate the actual
function, and the exact result for the definite integral $\int_a^b f(x)\,dx$ so we can compare
how well the Riemann sum approximates the actual integral.
If we scroll all the way down, the output also gives the method comparison for various
numerical methods. We have only discussed the left and right Riemann sums but, as we can
see, there are many other methods, some of which are much more accurate than simple left
or right Riemann sums. Absolute error means how far the approximation is from the actual
integral. Relative error means the absolute error divided by the actual integral (ignoring
± sign) or, in other words, the error measured as a proportion of the actual answer.
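The same quantities can be computed by hand in a few lines of code. The sketch below is an illustration only: it uses the earlier example $\int_0^1 e^x\cos(x)\,dx$, whose exact value is known from the FTC, and the function name `riemann_sum` and its `rule` argument are made up for this example.

```python
import math

def riemann_sum(f, a, b, n, rule="left"):
    """Left or right Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    offset = 0 if rule == "left" else 1  # which endpoint of each subinterval to use
    return sum(f(a + (i + offset) * dx) for i in range(n)) * dx

def f(x):
    return math.exp(x) * math.cos(x)

# Exact value from the FTC, as in the definite integral example.
exact = (math.e * (math.sin(1) + math.cos(1)) - 1) / 2

for rule in ("left", "right"):
    approx = riemann_sum(f, 0.0, 1.0, 100, rule)
    abs_err = abs(approx - exact)    # distance from the actual integral
    rel_err = abs_err / abs(exact)   # error as a proportion of the actual answer
    print(f"{rule:5s}  sum = {approx:.6f}  absolute error = {abs_err:.6f}  "
          f"relative error = {rel_err:.6f}")
```

Increasing `n` shrinks both errors roughly in proportion to $1/n$, which is why the more accurate methods in the comparison table are attractive.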
Exercise 5. Compute the definite integral $\int_1^2 \frac{\sin(x)}{x}\,dx$ using the right Riemann sum
with 100 intervals. What is the absolute error of this approximation?
Taylor polynomials. We can find the Taylor

polynomial Pn (x) of f (x) of degree n centered at
x = a by writing “Taylor polynomial of degree
n of f (x) at a”. We can replace “Taylor poly-
nomial” by “series”. The answer is given under
“series expansion at x = a” .
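For a function whose derivatives are known, the Taylor polynomial can also be assembled directly from the coefficients $\frac{f^{(k)}(a)}{k!}$. The sketch below does this for $\sin(x)$, assuming the repeating derivative pattern $\sin, \cos, -\sin, -\cos$; the function names are hypothetical and the center $a = 0$ is chosen only for the demonstration.

```python
import math

def sin_taylor_coefficients(a, n):
    """Coefficients f^(k)(a)/k! of sin(x) at the center a, for k = 0..n."""
    # Derivatives of sin repeat with period 4: sin, cos, -sin, -cos.
    cycle = [math.sin(a), math.cos(a), -math.sin(a), -math.cos(a)]
    return [cycle[k % 4] / math.factorial(k) for k in range(n + 1)]

def evaluate(coeffs, a, x):
    """Value of the Taylor polynomial with the given coefficients at x."""
    return sum(c * (x - a) ** k for k, c in enumerate(coeffs))

coeffs = sin_taylor_coefficients(0.0, 5)
print(coeffs)  # the even-degree coefficients vanish at a = 0
print(evaluate(coeffs, 0.0, 0.3), math.sin(0.3))
```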
Exercise 6. Find the Taylor polynomial of
$\sin(x)$ of degree 3 centered at $\frac{\pi}{4}$.
Answer to Exercise 1. $\int \frac{\sin(x)}{\cos^2(x)}\,dx = \sec(x) + C$. An alternative form of the answer
is $\frac{1}{\cos(x)} + C$, which in this case is just the definition of $\sec(x)$.
Answer to Exercise 2. $\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx = \sqrt{2} - 1 \approx 0.41421\ldots$. Ten digit answer is
$0.4142135623\ldots$.
Answer to Exercise 3. $\int_0^2 \cos(x^4)\sin(x^3)\,dx \approx 0.04745\ldots$. The indefinite integral cannot
be written in terms of standard mathematical functions. Nevertheless, the
function $\cos(x^4)\sin(x^3)$ is a nice continuous function so it does have an antiderivative.
For example, by the FTC-2, Reconstruction theorem, we know that
$\int_0^x \cos(t^4)\sin(t^3)\,dt$ is one such antiderivative. Simply, there is no way to write this
antiderivative using functions that have already been defined somewhere and given
some standard name.
Answer to Exercise 4. Wolfram Alpha outputs $\int \frac{\sin(x)}{x}\,dx = \mathrm{Si}(x) + C$. Looking
up the function Si(x) shows that it is called the sine integral and is defined by
$\mathrm{Si}(x) = \int_0^x \frac{\sin(t)}{t}\,dt$. In other words, it is a specific antiderivative of $\frac{\sin(x)}{x}$. The definite
integral is $\int_0^1 \frac{\sin(x)}{x}\,dx = \mathrm{Si}(1) - \mathrm{Si}(0) \approx 0.946083\ldots$.
Answer to Exercise 5. The Riemann sum equals 0.657395. Absolute error is

0.00193523.
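Answer to Exercise 6. $P_3(x) = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\big(x - \frac{\pi}{4}\big) - \frac{1}{2\sqrt{2}}\big(x - \frac{\pi}{4}\big)^2 - \frac{1}{6\sqrt{2}}\big(x - \frac{\pi}{4}\big)^3$.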
3.9 Improper integrals

Improper integrals are definite integrals $\int_a^b f(x)\,dx$ where
• either the interval [a, b] is infinite,
• or f (x) has some vertical asymptotes on the interval [a, b],
• or a combination of both.
One example is the integral $\int_{-\infty}^{\infty} f(x)\,dx$ in the left figure below, where the interval
is infinite, −∞ < x < ∞, and the function has a vertical asymptote at x = 3 where
it blows up to infinity from both left and right side. In this case the area between
y = f (x) and the x-axis could be infinite, or it could be finite. The question is, how
do we decide if this area is finite or infinite, and how do we calculate it?
The procedure is illustrated in the above two figures.

• First, we slice the area into several pieces, where each piece corresponds to one
potential issue that can cause the area to be infinite. For example, pieces A1 and
A4 correspond to the interval stretching to infinity in one direction, and pieces
A2 and A3 correspond to the function blowing up to infinity from one side of
the vertical asymptote or the other. The specific choice of the points x = 2 and
x = 4 that divided the regions is not important, and they could be replaced by
any other points to the left and right of the vertical asymptote.
• Next, we need to decide if these four regions have finite or infinite areas. We will
agree to calculate their areas using the following method, illustrated in the right
figure above. In the case of A1 and A4 , we will integrate up to some finite points
a or d first, and then let those points get closer and closer to infinity. Using the
limit notation, we can express this mathematically as
\[ A_1 = \lim_{a \to -\infty} \int_a^2 f(x)\,dx, \qquad A_4 = \lim_{d \to +\infty} \int_4^d f(x)\,dx. \]
In the case of A2 and A3 that border the vertical asymptote, we will integrate
up to some point $b$ before the vertical asymptote or from some point $c$ after the
vertical asymptote and then let those points $b$ and $c$ get closer and closer to this
vertical asymptote from the corresponding side:
\[ A_2 = \lim_{b \to 3-} \int_2^b f(x)\,dx, \qquad A_3 = \lim_{c \to 3+} \int_c^4 f(x)\,dx. \]
Notice how we wrote b → 3− and c → 3+, which is the notation for the left
and right limits. This notation is very important because it indicates that we
approach the vertical asymptote from a specific side without crossing it.
• If at least one of these pieces is infinite, we say that the integral $\int_{-\infty}^{\infty} f(x)\,dx$
diverges. If all of these pieces are finite, we say that the integral $\int_{-\infty}^{\infty} f(x)\,dx$
converges and is equal to $A_1 + A_2 + A_3 + A_4$. Of course, if a function was
negative on some interval, the corresponding piece could have a minus sign.
Before we consider more concrete examples, let us practice the above definition
first.
Example 1. If a function $f(x)$ has a vertical asymptote on both sides of $x = 1$ and
is continuous everywhere else, how do we define $\int_0^\infty f(x)\,dx$?
Solution: We divide the interval [0, ∞) into three “problematic” regions, [0, 1], [1, 2]
and [2, ∞), where the choice of 2 can be replaced by any other point bigger than 1.
Then the definite integral on each piece is defined as
\[ \lim_{a \to 1-} \int_0^a f(x)\,dx, \qquad \lim_{b \to 1+} \int_b^2 f(x)\,dx, \qquad \lim_{c \to +\infty} \int_2^c f(x)\,dx. \]
The integral $\int_0^\infty f(x)\,dx$ diverges if at least one of these limits is not finite. If all
three are well defined and finite then we just add them up,
\[ \int_0^\infty f(x)\,dx = \lim_{a \to 1-} \int_0^a f(x)\,dx + \lim_{b \to 1+} \int_b^2 f(x)\,dx + \lim_{c \to +\infty} \int_2^c f(x)\,dx. \]
Exercise 1. If a function $f(x)$ has vertical asymptotes on both sides of $x = 1$ and
$x = 2$ and is continuous everywhere else, how do we define $\int_0^3 f(x)\,dx$?
Simple power functions. Our main concrete family of examples will be the
power functions
\[ f(x) = \frac{1}{x^p} \quad \text{where } p > 0. \]
These functions have a vertical asymptote at x = 0 and a horizontal asymptote
y = 0 as x → +∞ (see figures below), so we will consider separately the case of
the vertical asymptote on a finite interval [0, 1] and the case of the infinite interval
[1, ∞).
Example 2. Consider the improper integral $\int_0^1 \frac{1}{x^p}\,dx$. Determine for which $p > 0$ it
converges and for which $p > 0$ it diverges.
Solution: By the definition above, we need to find the limit
\[ \int_0^1 \frac{1}{x^p}\,dx = \lim_{a \to 0+} \int_a^1 \frac{1}{x^p}\,dx \]
and determine when this limit is finite and when it is infinite. Let us start with the
case $p = 1$:
\[ \int_a^1 \frac{1}{x}\,dx = \ln(x)\Big|_{x=a}^{x=1} = \ln(1) - \ln(a) = -\ln(a). \]
We know that ln(a) → −∞ when a approaches 0 from the right, so − ln(a) → +∞.
The area is infinite and the integral diverges when p = 1.
When p is not equal to 1, we can use the power rule:
\[ \int_a^1 \frac{1}{x^p}\,dx = \int_a^1 x^{-p}\,dx = \frac{x^{-p+1}}{-p+1}\Big|_{x=a}^{x=1} = \frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1}. \]
To see what happens when a → 0+, we have to separate into two cases: p > 1 and
p < 1. When p > 1 then p − 1 is positive and
\[ \frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1} = \frac{1}{-(p-1)} - \frac{a^{-(p-1)}}{-(p-1)} = -\frac{1}{p-1} + \frac{1}{(p-1)a^{p-1}} \to +\infty \]
as a → 0+, because a p−1 → 0 in the denominator. This means that the integral
diverges when p > 1. Actually, we can see this without integrating because the
function $\frac{1}{x^p}$ is bigger than $\frac{1}{x}$ between $x = 0$ and $x = 1$ (blue line is above the green
line in the left figure) so the area will be bigger. We already computed that the area
is infinite when p = 1, so the area must also be infinite when p > 1.
Finally, when p < 1 then 1 − p is positive and
\[ \frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1} = \frac{1}{1-p} - \frac{a^{1-p}}{1-p} \to \frac{1}{1-p} - 0 = \frac{1}{1-p} \]
as $a \to 0+$, because $a^{1-p} \to 0$ in the numerator. So the area is finite in this case
and is equal to $\frac{1}{1-p}$. To summarize:
• The integral $\int_0^1 \frac{1}{x^p}\,dx$ diverges when $p = 1$ or $p > 1$.
• The integral $\int_0^1 \frac{1}{x^p}\,dx = \frac{1}{1-p}$ when $p < 1$.
This is easy to remember if we keep in mind that the case p = 1 is in the middle
and it diverges. Then we only need to remember that the case of p > 1 is above,
so it also diverges. To remember this, take e.g. $p = 2$ and $x = 0.1$, and notice that
$x^2 = 0.1^2 = 0.01 < 0.1$, so when we divide we get $\frac{1}{x} = \frac{1}{0.1} < \frac{1}{0.01} = \frac{1}{x^2}$. Notice how
the order of the functions reverses in the right figure above when $x \ge 1$. That is
because if we take $x = 2$ then $x^2 = 4$ and $\frac{1}{x^2} = \frac{1}{4} < \frac{1}{2} = \frac{1}{x}$. The answer will also
reverse in this case.
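The limit in Example 2 can be watched numerically. The sketch below evaluates the formulas derived in the solution, $-\ln(a)$ for $p = 1$ and $\frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1}$ otherwise, for a lower limit $a$ shrinking towards 0; the function name `partial_integral` and the sample values of $p$ and $a$ are choices made for this illustration.

```python
import math

def partial_integral(p, a):
    """Value of the integral of 1/x^p over [a, 1], for 0 < a < 1."""
    if p == 1:
        return -math.log(a)
    # Same as 1/(-p+1) - a^(-p+1)/(-p+1) from the solution.
    return (1 - a ** (1 - p)) / (1 - p)

for p in (0.5, 1, 2):
    values = [partial_integral(p, 10.0 ** -k) for k in (2, 4, 6, 8)]
    print(f"p = {p}: " + ", ".join(f"{v:.4g}" for v in values))
```

For $p = 0.5$ the values settle near $\frac{1}{1-p} = 2$, while for $p = 1$ and $p = 2$ they keep growing as $a$ gets smaller.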
Exercise 2. Consider the improper integral $\int_1^\infty \frac{1}{x^p}\,dx$. Check that:
• The integral $\int_1^\infty \frac{1}{x^p}\,dx$ diverges when $p = 1$ or $p < 1$.
• The integral $\int_1^\infty \frac{1}{x^p}\,dx = \frac{1}{p-1}$ when $p > 1$.
Comparison of integrals. In the above example, we could conclude without any
calculations that the integral $\int_0^1 \frac{1}{x^p}\,dx$ diverges (or is infinite) when $p > 1$ because,
in this case, $\frac{1}{x^p}$ is bigger than $\frac{1}{x}$ on this interval, so the area will also be bigger. If we
want to know if an integral converges or diverges, we can often compare to simpler
integrals, or integrals that we have already computed.
• If area A1 is finite and A2 is smaller then it is also finite.
• If area A1 is finite and A2 is bigger then we cannot tell if it is finite or infinite.
• If area A1 is infinite and A2 is smaller then we cannot tell if it is finite or infinite.
• If area A1 is infinite and A2 is bigger then it is also infinite.
Below we will take a look at several concrete examples of comparison with the
p-integrals $\int_1^\infty \frac{1}{x^p}\,dx$ or $\int_0^1 \frac{1}{x^p}\,dx$ that we computed above, but first let us take a look
at some comparisons by looking at the graphs of functions.
Example 3. Suppose that the functions in the figure do not intersect anywhere besides
the two points shown. If we know that $\int_{-\infty}^{\infty} f(x)\,dx < \infty$, $\int_0^1 k(x)\,dx = \infty$ and
$\int_1^\infty k(x)\,dx = \infty$, do the following integrals converge, diverge, or we cannot tell?
(a) $\int_5^\infty h(x)\,dx$. (b) $\int_0^5 h(x)\,dx$.
(c) $\int_{-\infty}^0 g(x)\,dx$. (d) $\int_{-\infty}^{\infty} g(x)\,dx$.
Solution: Let us start with a very important observation

that will be useful in this and other similar problems. If
a function is continuous on some interval [a, b], as in the
figure, then the area on this interval is finite and moving
the starting point from a to b does not change whether
the improper integral is finite or infinite. In other words,
if an integral $\int_a^\infty$ is finite then the integral $\int_b^\infty$ is also
finite, and if an integral $\int_a^\infty$ is infinite then the integral $\int_b^\infty$
is also infinite.
(a) We see from the figure that $h(x)$ is bigger than $k(x)$ after the point where they
intersect. If we call this intersection point $x = a$ then, by comparison, $\int_a^\infty h(x)\,dx$ is
bigger than $\int_a^\infty k(x)\,dx$. But we know that $\int_1^\infty k(x)\,dx = \infty$, so $\int_a^\infty k(x)\,dx$ is also $\infty$
because of the above comment. Since $\int_a^\infty h(x)\,dx$ is bigger, it is also infinite. Finally,
by the above comment that the starting point of the integral does not affect whether
it is finite or infinite, we get that $\int_5^\infty h(x)\,dx = \infty$.
(b) Near the vertical asymptote $x = 0$, the function $h(x)$ is below $k(x)$, so its area
is smaller. However, because $\int_0^1 k(x)\,dx = \infty$ and the area below $k(x)$ is infinite, this
does not give us any information. Area less than infinity could be finite or infinite,
so we cannot tell whether $\int_0^5 h(x)\,dx$ converges or diverges. As in part (a), it does
not matter if the upper limit 5 is different from the intersection point $x = a$.
(c) The answer is $\int_{-\infty}^0 g(x)\,dx < \infty$ and the integral converges. That is because
$g(x)$ is below $f(x)$ as they approach $-\infty$, and the area below $f(x)$ is finite. The
region between the point where they intersect and $x = 0$ does not matter because
the areas there are finite.
(d) The answer is that we cannot tell whether $\int_{-\infty}^{\infty} g(x)\,dx$ is finite or infinite. We
know from part (c) that the part between $-\infty$ and 0 is finite, so we only need to
decide whether $\int_0^\infty g(x)\,dx$ is finite or infinite. Here, $g(x)$ is above $f(x)$ and below
$k(x)$ but, unfortunately, it does not give us any useful information, because the
integral is bigger than a finite number and smaller than infinity, so it can be finite
or infinite.
Exercise 3. Which of the following statements are true if the functions in the figure
do not intersect?
(a) If $\int_5^\infty h(x)\,dx$ converges then $\int_3^\infty f(x)\,dx$ converges.
(b) If $\int_6^\infty h(x)\,dx$ diverges then $\int_3^\infty f(x)\,dx$ diverges.
(c) If $\int_7^\infty g(x)\,dx$ diverges then $\int_3^\infty f(x)\,dx$ diverges.
(d) If $\int_8^\infty g(x)\,dx$ converges then $\int_3^\infty f(x)\,dx$ converges.
Comparisons with p-integrals. Next, we will consider several examples of
comparison with p-integrals of the form $\int_0^1 \frac{1}{x^p}\,dx$ or $\int_1^\infty \frac{1}{x^p}\,dx$ for $p > 0$ that we
computed above.
Example 4. Does the improper integral $\int_0^5 \frac{3 - 2\cos(x)}{x^2}\,dx$ converge or diverge?
Solution: The function $\frac{3 - 2\cos(x)}{x^2}$ has a vertical asymptote at $x = 0$ where we divide
by 0 in the denominator, so this is indeed an improper integral. To get an idea
of what to do we should notice that the numerator $3 - 2\cos(x)$ ‘behaves like a
constant’, because $\cos(x)$ is always in between $-1$ and $+1$, so $3 - 2\cos(x)$ is always
in between $3 - 2 = 1$ and $3 + 2 = 5$. If the numerator was truly a constant, we know
that the integral $\int_0^5 \frac{1}{x^2}\,dx$ diverges because $p = 2 > 1$, so we should be aiming to
conclude that our integral also diverges. For this purpose, logically we need to
compare from below by an infinite area, so we use that
\[ \frac{3 - 2\cos(x)}{x^2} \ge \frac{1}{x^2}. \]
Because $\int_0^5 \frac{1}{x^2}\,dx$ diverges, $\int_0^5 \frac{3 - 2\cos(x)}{x^2}\,dx$ also diverges.
Exercise 4. Does the improper integral $\int_0^2 \frac{4 + 2\sin(t)}{\sqrt{t}}\,dt$ converge or diverge?
Example 5. Does the improper integral $\int_3^\infty \frac{3}{x(x+1)}\,dx$ converge or diverge?
Solution: The function $\frac{3}{x(x+1)}$ has vertical asymptotes at $x = 0$ and $x = -1$ where we
divide by 0, but these points are outside of the interval $[3, \infty)$, so the only issue here
is the infinite interval. Heuristically, we can think of the denominator
$x(x + 1) = x^2 + x$ as roughly $x^2$, because the $x^2$ term dominates the $x$ term when $x$ is large.
If we ignore the $x$ term then we know that $\int_3^\infty \frac{3}{x^2}\,dx$ converges, because it is the p-integral
with $p > 1$. So we should be aiming to show that our integral converges, and we
want to compare the function from above:
\[ \frac{3}{x(x + 1)} \le \frac{3}{x^2}. \]
This is true because we decreased the denominator from $x^2 + x$ to $x^2$. Since $\int_3^\infty \frac{3}{x^2}\,dx$
converges, $\int_3^\infty \frac{3}{x(x+1)}\,dx$ also converges, by comparison.
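The comparison in Example 5 can be illustrated numerically by looking at the partial integrals over $[3, d]$ as $d$ grows. The sketch below relies on the identity $\frac{3}{x(x+1)} = \frac{3}{x} - \frac{3}{x+1}$, which is not used in the text and is brought in here only to evaluate the partial integrals exactly; the function names are hypothetical.

```python
import math

def partial_integral(d):
    """Integral of 3/(x(x+1)) over [3, d], via 3/(x(x+1)) = 3/x - 3/(x+1)."""
    # An antiderivative is 3*ln(x/(x+1)); evaluate it between 3 and d.
    return 3 * (math.log(d / (d + 1)) - math.log(3 / 4))

def upper_bound(d):
    """Integral of 3/x^2 over [3, d], which equals 1 - 3/d."""
    return 1 - 3 / d

for d in (10, 100, 1000, 10**6):
    print(f"d = {d:>7}: integral = {partial_integral(d):.6f}  bound = {upper_bound(d):.6f}")

limit = 3 * math.log(4 / 3)
print(f"limiting value: {limit:.6f}")
```

The partial integrals increase with $d$ but always stay below the bound, which itself stays below 1, so they approach a finite limit.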
Exercise 5. Does the improper integral $\int_{10}^\infty \frac{2}{\sqrt{x}(\sqrt{x} - 1)}\,dx$ converge or diverge?
Answer to Exercise 1. This improper integral is defined as
\[ \lim_{a \to 1-} \int_0^a f(x)\,dx + \lim_{b \to 1+} \int_b^{1.5} f(x)\,dx + \lim_{c \to 2-} \int_{1.5}^c f(x)\,dx + \lim_{d \to 2+} \int_d^3 f(x)\,dx \]
if all the limits exist and are finite. Here 1.5 can be replaced by any point in between
1 and 2. If even one of these four limits is not finite then the integral diverges.
Answer to Exercise 2. By the definition above, we need to find the limit
\[ \int_1^\infty \frac{1}{x^p}\,dx = \lim_{a \to \infty} \int_1^a \frac{1}{x^p}\,dx. \]
Let us start with the case $p = 1$:
\[ \int_1^a \frac{1}{x}\,dx = \ln(x)\Big|_{x=1}^{x=a} = \ln(a) - \ln(1) = \ln(a). \]
We know that $\ln(a) \to \infty$ when $a \to \infty$, so the area is again infinite
and the integral diverges when $p = 1$.
When $p$ is not equal to 1, we can again use the power rule:
\[ \int_1^a \frac{1}{x^p}\,dx = \int_1^a x^{-p}\,dx = \frac{x^{-p+1}}{-p+1}\Big|_{x=1}^{x=a} = \frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1}. \]
When $p > 1$ then $p - 1$ is positive and
\[ \frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1} = \frac{a^{-(p-1)}}{-(p-1)} - \frac{1}{-(p-1)} = -\frac{1}{(p-1)a^{p-1}} + \frac{1}{p-1} \to \frac{1}{p-1} \]
as $a \to \infty$, because $a^{p-1} \to \infty$ in the denominator. This means that the integral
converges to $\frac{1}{p-1}$ when $p > 1$. Finally, when $p < 1$ then $1 - p$ is positive and
\[ \frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1} = \frac{a^{1-p}}{1-p} - \frac{1}{1-p} \to \infty \]
as $a \to \infty$, because $a^{1-p} \to \infty$ in the numerator.
Answer to Exercise 3. (a) False, (b) True, (c) False, (d) True. Since the functions
are below the x-axis, all integrals will have a negative sign, but we still compare
the areas between these functions and the x-axis.
Answer to Exercise 4. Because $4 + 2\sin(t)$ is always in between $4 - 2 = 2$ and
$4 + 2 = 6$, we can use that
\[ \frac{4 + 2\sin(t)}{\sqrt{t}} \le \frac{6}{\sqrt{t}}. \]
We know that the integral $\int_0^2 \frac{6}{\sqrt{t}}\,dt$ converges because it is a p-integral with $p = 0.5 < 1$,
so we conclude that $\int_0^2 \frac{4 + 2\sin(t)}{\sqrt{t}}\,dt$ also converges.
Answer to Exercise 5. The denominator $\sqrt{x}(\sqrt{x} - 1) = x - \sqrt{x}$ is dominated by $x$,
which grows faster than $\sqrt{x}$, so our function $\frac{2}{\sqrt{x}(\sqrt{x} - 1)} = \frac{2}{x - \sqrt{x}}$ behaves like $\frac{2}{x}$. We
know that $\int_{10}^\infty \frac{2}{x}\,dx$ diverges as the p-integral with $p = 1$, so if we aim to show that
the original integral diverges, we need to bound it from below. Since
\[ \frac{2}{\sqrt{x}(\sqrt{x} - 1)} = \frac{2}{x - \sqrt{x}} \ge \frac{2}{x}, \]
this gives us the comparison we want, so $\int_{10}^\infty \frac{2}{\sqrt{x}(\sqrt{x} - 1)}\,dx$ also diverges.
3.10 Slicing problems: geometry

In this section we will focus on computing volumes, areas, and lengths. In the next
section we will consider quantities that may be distributed unevenly throughout
some geometric region, and computing their total amount will involve one more
computational step. Nevertheless, the general approach will be very similar, so the
purely geometric problems of this section will serve as a foundation for further
applications. We have already computed areas between graphs of functions by approximating
them with rectangles, so here we will focus on volumes and lengths.
Computing volumes by slicing. Let us consider a problem of computing the

volume of a loaf of bread, depicted in the above figure. Suppose that the loaf is
aligned along the x-axis, it starts at x = a and ends at x = b. Let A(x) be the area
of the slice at a point x.
Step 1. Let us slice the bread vertically into n thin slices along the length of the
loaf. In other problems, we will also slice an object horizontally, because it will
make the calculation easier.
Step 2. Suppose that the slice number $i$ is between points $x_i$ and $x_{i+1}$, as shown
in the figure, and the width of one slice is $\Delta x$. If the slice is thin enough then its
cross-section does not change much between the two cuts $x_i$ and $x_{i+1}$ and the area of the
slice can be evaluated at any point $x_i^*$ in between $x_i$ and $x_{i+1}$. In other words, the
area of the slice is approximately equal to $A(x_i^*)$.
Step 3. As a result, the volume $\Delta V_i$ of the slice number $i$ is approximately equal to
\[ \Delta V_i \approx A(x_i^*)\,\Delta x. \]
We denote the volume of one slice by ∆V to emphasize that this volume represents
a small increment of volume when we add another slice of small width ∆x. We can
remember this formula more informally as ∆V ≈ A(x)∆x.
Step 4. The total volume is the sum of volumes of $n$ slices, so
\[ V = \sum_{i=1}^n \Delta V_i \approx \sum_{i=1}^n A(x_i^*)\,\Delta x. \]
Step 5. The approximation will get better and better when our slices get thinner
and thinner or, in other words, when the number of slices n gets bigger and bigger.
Using the language of limits,
\[ V = \lim_{n \to \infty} \sum_{i=1}^n A(x_i^*)\,\Delta x = \int_a^b A(x)\,dx. \]
Where did the last integral come from? It appears because we recognized that the
sum in the middle is the Riemann sum corresponding to the function A(x) on
the interval [a, b]. This is how integrals appear in applications where the total
Quantity (in this case Volume) can be approximated by a sum of small pieces that
looks like a Riemann sum of some integral.
Comment. In the problems where we will use the above formula $V = \int_a^b A(x)\,dx$,
it will be easy to compute A(x) because the cross-sections will be simple, such as
circles, or rectangles, or triangles. However, it is very important to write down all 5
steps each time, because in applications in the next section the formula will not be
applicable directly. Instead, Volume will be replaced by some other Quantity and
the crucial Step 3 above will be replaced by a different calculation specific to each
problem. Of course, the formula ∆V ≈ A(x)∆x will typically be used as a building
block in the calculations involving volumes.
Example 1. Compute the volume of a so-called solid of revolution obtained by rotating
the graph of $y = 3e^{-x/2}$ on the interval $[0, 5]$ around the x-axis, as shown in the figure.
Solution: We want to compute the volume enclosed by this surface, and we notice that
vertical slices along the x-axis look like circles, so their areas can be computed using
the formula $\pi r^2$. The radius of a cross-section at the position $x$ is $3e^{-x/2}$, so the
area of the cross-section is $A(x) = \pi(3e^{-x/2})^2 = 9\pi e^{-x}$.
Step 1. We slice the region vertically into n thin slices along the x-axis on the
interval [0, 5].
Step 2. If the slice number $i$ between points $x_i$ and $x_{i+1}$ is thin enough then its
cross-section does not change much between the two cuts and the area of the slice
is approximately equal to $A(x_i^*) = 9\pi e^{-x_i^*}$, for any point $x_i^*$ between $x_i$ and $x_{i+1}$.
Step 3. As a result, the volume of the slice number $i$ is approximately equal to
$\Delta V_i \approx 9\pi e^{-x_i^*}\,\Delta x$, where $\Delta x$ is the width of one slice.
Step 4. The total volume is the sum of $\Delta V_i$, so $V \approx \sum_{i=1}^n 9\pi e^{-x_i^*}\,\Delta x$.
Step 5. The approximation will get better as the number of slices gets bigger, so
\[ V = \lim_{n \to \infty} \sum_{i=1}^n 9\pi e^{-x_i^*}\,\Delta x = \int_0^5 9\pi e^{-x}\,dx. \]
The last integral appears because the sum in the middle is the Riemann sum corre-
sponding to the function 9πe−x on the interval [0, 5]. The interval [0, 5] is implicit
in the notation of the sum, but we should remember that in the first step we were
slicing this interval [0, 5], so the Riemann sum is defined on this interval.
In this particular case we can compute the integral using the FTC,
\[ \int_0^5 9\pi e^{-x}\,dx = -9\pi e^{-x}\Big|_{x=0}^{x=5} = -9\pi e^{-5} - (-9\pi e^{-0}) = -9\pi e^{-5} + 9\pi, \]
so $V = -9\pi e^{-5} + 9\pi \approx 28.0838\ldots$.
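Steps 1 to 5 of Example 1 can be mimicked directly in code: slice $[0, 5]$ into $n$ pieces, add up the slice volumes $A(x_i^*)\,\Delta x$, and watch the sum approach the integral. In this sketch the sample point $x_i^*$ is taken to be the midpoint of each slice, which is just one admissible choice, and the function names are made up for the illustration.

```python
import math

def area(x):
    # Cross-sectional area A(x) = 9*pi*e^(-x) from the example.
    return 9 * math.pi * math.exp(-x)

def volume_by_slicing(n):
    """Sum of slice volumes A(x_i*) * dx over [0, 5], with x_i* the slice midpoint."""
    dx = 5 / n
    return sum(area((i + 0.5) * dx) for i in range(n)) * dx

exact = 9 * math.pi * (1 - math.exp(-5))  # value obtained from the FTC
for n in (10, 100, 1000):
    print(f"n = {n:5d}: sliced volume = {volume_by_slicing(n):.6f}")
print(f"FTC value:  {exact:.6f}")
```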
Exercise 1. Compute the volume of a solid of revolution obtained by rotating the graph
of $y = 3 - \frac{x}{2}$ on the interval $[0, 6]$ around the x-axis, as shown in the figure.
Example 2. A vase of height H has radius r(h) at height

h. How much water will fit inside the vase?7
Solution: We will assume that the walls of the vase are
thin, or the radius r(h) refers to the inner radius of the
vase. We notice that horizontal slices along the h-axis
look like circles, so their areas can be computed using
the formula πr(h)2 .
Step 1. We slice the vase horizontally into n thin
slices along the h-axis on the interval [0, H].
Step 2. If the slice number i between height hi and
hi+1 is thin enough then its cross-section does not
change much between the two cuts and the area of the
slice is approximately equal to πr(h∗i )2 , for any height h∗i between hi and hi+1 .
Step 3. As a result, the volume of the slice number i is approximately equal to
∆Vi ≈ πr(h∗i )2 ∆h, where ∆h is the width of one slice.
7 Image from rawpixel.com.
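Step 4. The total volume is the sum of $\Delta V_i$, so $V \approx \sum_{i=1}^n \pi r(h_i^*)^2\,\Delta h$.
Step 5. The approximation will get better as the number of slices gets bigger, so
\[ V = \lim_{n \to \infty} \sum_{i=1}^n \pi r(h_i^*)^2\,\Delta h = \int_0^H \pi r(h)^2\,dh. \]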
i=1
The last integral appears because the sum in the middle is the Riemann sum corre-
sponding to the function πr(h)2 on the interval [0, H].
Exercise 2. A vase of height H has inner radius r(h) and
outer radius R(h) at height h. Set up an integral for the
volume of the sidewall of the vase.
Computing lengths by slicing.

Next, we will derive an integral for-
mula for the length of the graph of a
function y = f (x) on the interval [a, b],
again using the slicing method. The
only real difference will be that, instead
of the increment ∆V of the volume, we
will now have to compute (or more pre-
cisely, approximate) the increment ∆L
of the length of the curve in terms of
the function y = f (x).
Step 1. Let us slice the curve vertically into n small arcs along the x-axis, as in
the figure.
Step 2. Suppose that the arc number $i$ is between points $x_i$ and $x_{i+1}$, and the width
of one slice is $\Delta x$. If $\Delta x$ is small then the slope does not change much between $x_i$
and $x_{i+1}$ and the length $\Delta L_i$ of the arc is approximately equal to the length of the
secant (hypotenuse of the right triangle in the figure):
\[ \Delta L_i \approx \sqrt{\Delta x^2 + \Delta y^2} = \sqrt{\Big(1 + \frac{\Delta y^2}{\Delta x^2}\Big)\Delta x^2} = \sqrt{1 + \Big(\frac{\Delta y}{\Delta x}\Big)^2}\,\Delta x. \]
The ratio $\frac{\Delta y}{\Delta x}$ approximates the slope $f'(x)$ of $y = f(x)$ between $x_i$ and $x_{i+1}$ and,
because the slope does not change much, it can be evaluated at any point $x_i^*$ in
between.
Step 3. As a result, the length $\Delta L_i$ of the arc is approximately equal to
\[ \Delta L_i \approx \sqrt{1 + f'(x_i^*)^2}\,\Delta x. \]
We can remember this formula more informally as $\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x$.
Step 4. The total length is the sum of lengths of $n$ small arcs, so
\[ L = \sum_{i=1}^n \Delta L_i \approx \sum_{i=1}^n \sqrt{1 + f'(x_i^*)^2}\,\Delta x. \]
Step 5. The approximation will get better and better when our arcs get smaller
and smaller or, in other words, when the number of slices n gets bigger and bigger.
Using the language of limits,
n q Z bq
L = lim ∑ 1+ f ′ (xi∗ )2 ∆x = 1 + f ′ (x)2 dx.
n→∞ a
i=1
because the sum in the middle is the Riemann sum

Again, the last integral appearsp
corresponding to the function 1 + f ′ (x)2 on the interval [a, b]. The key step that
should be memorized p and does not need to be derived every time is the element of
length formula: ∆L ≈ 1 + f ′ (x)2 ∆x.
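To see the element of length formula at work numerically, here is a minimal Python sketch of the Riemann sum from Step 4. The function name arc_length and the choice of the midpoint of each slice as the sample point \(x_i^*\) are illustrative assumptions, not part of the text; any point in the slice would do.

```python
import math

def arc_length(df, a, b, n=100_000):
    """Approximate the length of y = f(x) on [a, b], given the derivative df.

    The midpoint of each slice is used as the sample point x_i^*.
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_star = a + (i + 0.5) * dx                    # a point inside slice number i
        total += math.sqrt(1 + df(x_star) ** 2) * dx   # element of length
    return total

# Sanity check on the straight line y = 2x over [0, 3]: by the Pythagorean
# theorem its length is sqrt(3^2 + 6^2) = 3*sqrt(5).
print(arc_length(lambda x: 2.0, 0.0, 3.0))
print(3 * math.sqrt(5))
```

For a straight line the slope is constant, so the sum reproduces the exact length up to rounding; for curved graphs the sum only approaches the integral as n grows.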
Example 3. The main span of the Brooklyn Bridge is approximately 480 meters long.
If we place the origin in the center of the bridge, the shape of the main cable can
be approximated by the graph of \(y = 0.0008x^2\), as shown in the figure. Derive the
formula for the length of this cable using the slicing method and evaluate it in
Wolfram Alpha.
Solution: Step 1. We slice the cable vertically into n small arcs along the x-axis on
the interval [−240, 240]. Since we placed the origin in the middle of the bridge, the
main span of the bridge is between the coordinates −240 and 240.
Step 2. Arc number i is between points xi and xi+1 , and the width of one slice is
∆x. If ∆x is small then the slope does not change much between xi and xi+1 .
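Step 3. Because \(f'(x) = 0.0016x\), from the right triangle calculation, the length
\(\Delta L_i\) of the arc is approximately equal to
\[
\Delta L_i \approx \sqrt{1 + f'(x_i^*)^2}\,\Delta x = \sqrt{1 + (0.0016x_i^*)^2}\,\Delta x.
\]
Step 4. The total length is the sum of the lengths of the n small arcs, so
\[
L = \sum_{i=1}^{n} \Delta L_i \approx \sum_{i=1}^{n} \sqrt{1 + (0.0016x_i^*)^2}\,\Delta x.
\]
Step 5. The approximation will get better when the number of slices gets bigger, so
\[
L = \lim_{n\to\infty} \sum_{i=1}^{n} \sqrt{1 + (0.0016x_i^*)^2}\,\Delta x = \int_{-240}^{240} \sqrt{1 + (0.0016x)^2}\,dx.
\]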
Evaluating this integral in Wolfram Alpha gives L ≈ 491.5 meters. One can actually
find the antiderivative and apply the FTC using a special substitution, but this
is more advanced material that is beyond what we have studied so far.
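If Wolfram Alpha is not at hand, the same number can be approximated directly from the Riemann sum in Step 4. The sketch below is only an illustration; the function name cable_length, the number of slices and the midpoint sample points are assumptions made for this example.

```python
import math

def cable_length(n=200_000):
    """Riemann sum for the length of y = 0.0008 x^2 on [-240, 240] (meters)."""
    a, b = -240.0, 240.0
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_star = a + (i + 0.5) * dx            # midpoint of slice number i
        slope = 0.0016 * x_star                # f'(x) = 0.0016 x
        total += math.sqrt(1 + slope ** 2) * dx
    return total

print(cable_length())   # should be close to the 491.5 meters quoted above
```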
Exercise 3. Compute the length of the graph of \(y = x^{3/2}\) on the interval [0, 1] by
first setting up the integral using the slicing method and then evaluating it.
Answer to Exercise 1. A cross-section at the position x is a circle of radius \(3 - \frac{x}{2}\),
so its area is \(A(x) = \pi\bigl(3 - \frac{x}{2}\bigr)^2\).
Step 1. We slice the region vertically into n thin slices along the x-axis on the
interval [0, 6].
Step 2. If the slice number i between points \(x_i\) and \(x_{i+1}\) is thin enough then its
cross-section does not change much between the two cuts and the area of the slice
is approximately equal to \(A(x_i^*) = \pi\bigl(3 - \frac{x_i^*}{2}\bigr)^2\) for any point \(x_i^*\) between \(x_i\) and \(x_{i+1}\).
Step 3. As a result, the volume of the slice number i is approximately equal to
\(\Delta V_i \approx \pi\bigl(3 - \frac{x_i^*}{2}\bigr)^2 \Delta x\), where \(\Delta x\) is the width of one slice.
Step 4. The total volume is the sum of \(\Delta V_i\), so \(V \approx \sum_{i=1}^{n} \pi\bigl(3 - \frac{x_i^*}{2}\bigr)^2 \Delta x\).
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
V = \lim_{n\to\infty} \sum_{i=1}^{n} \pi\Bigl(3 - \frac{x_i^*}{2}\Bigr)^2 \Delta x = \int_0^6 \pi\Bigl(3 - \frac{x}{2}\Bigr)^2 dx.
\]
We can compute the integral using the FTC,

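\[
\int_0^6 \pi\Bigl(3 - \frac{x}{2}\Bigr)^2 dx = \frac{\pi}{(-1/2)} \times \frac{1}{3}\Bigl(3 - \frac{x}{2}\Bigr)^3 \Big|_{x=0}^{x=6} = 0 - \frac{\pi}{(-1/2)} \times \frac{1}{3} \times 3^3 = 18\pi.
\]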
Answer to Exercise 2. We could subtract the inner volume from the outer volume
to get the volume of the vase itself:
\[
V = \int_0^H \pi R(h)^2\,dh - \int_0^H \pi r(h)^2\,dh = \int_0^H \pi\bigl(R(h)^2 - r(h)^2\bigr)\,dh.
\]
However, to practice the slicing method it is better to follow the usual slicing steps
as in the previous example. The only difference here is that the cross-section of a slice
is a ring between two circles of radius R(h) and r(h), so the area of one slice is
\(A(h) = \pi R(h)^2 - \pi r(h)^2 = \pi\bigl(R(h)^2 - r(h)^2\bigr)\).
Answer to Exercise 3. Step 1. We slice the graph
vertically into n small arcs along the x-axis on the
interval [0, 1].
Step 2. Arc number i is between points \(x_i\) and \(x_{i+1}\),
and the width of one slice is \(\Delta x\). If \(\Delta x\) is small then
the slope does not change much between \(x_i\) and \(x_{i+1}\).
Step 3. Because \(f'(x) = (3/2)x^{1/2}\), from the right
triangle calculation, the length \(\Delta L_i\) of the arc is
approximately equal to
\[
\Delta L_i \approx \sqrt{1 + f'(x_i^*)^2}\,\Delta x = \sqrt{1 + (9/4)x_i^*}\,\Delta x.
\]
Step 4. The total length is the sum of the lengths of the n small arcs, so
\[
L = \sum_{i=1}^{n} \Delta L_i \approx \sum_{i=1}^{n} \sqrt{1 + (9/4)x_i^*}\,\Delta x.
\]
Step 5. The approximation will get better when the number of slices gets bigger, so
\[
L = \lim_{n\to\infty} \sum_{i=1}^{n} \sqrt{1 + (9/4)x_i^*}\,\Delta x = \int_0^1 \sqrt{1 + (9/4)x}\,dx.
\]
This integral can be easily computed,
\[
L = \int_0^1 \sqrt{1 + (9/4)x}\,dx = \frac{1}{9/4} \times \frac{1}{3/2} \times \bigl(1 + (9/4)x\bigr)^{3/2} \Big|_{x=0}^{x=1} = 1.4397\ldots.
\]
3.11 Slicing problems: densities

We have learned how to compute areas, volumes and lengths of some geometric
regions, and now we will learn how to compute various quantities that may
be distributed over those regions. These quantities will typically be distributed
unevenly, so we will first need to compute the quantity in a small region (slice) and
then add these up to get the total. In other words, we will be using the slicing
method again, only with some extra calculations.
Linear densities. If Q denotes some Quantity of interest then the general idea
will be to use formulas of the form:
\[
\text{Quantity} = \frac{\text{Quantity}}{\text{Length}} \times \text{Length} \quad\text{or}\quad \Delta Q = \frac{\Delta Q}{\Delta L} \times \Delta L.
\]
The ratio \(\frac{\text{Quantity}}{\text{Length}}\) (or \(\frac{\Delta Q}{\Delta L}\)) is often called a linear density, because it tells us how
densely the quantity is distributed per unit of length. Examples of the units of
linear density are kg/meter, $/mile, #/km etc., where the numerator has units of our quantity
of interest, and the denominator has units of length. When we use the slicing method:
• if the quantity is distributed over a straight line, for example the x-axis, then the
increment of length in the slicing method will simply be \(\Delta L = \Delta x\);
• if the quantity is distributed over some curve described by the graph of a
function y = f(x) then the increment of length in the slicing method will be
\(\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x\).
The density \(\frac{\Delta Q}{\Delta L}\) will be given to us either explicitly or implicitly, which might
require some calculation of the quantity per unit of length locally on a given slice.
Example 1. Suppose that the traffic on
a 1.4 km stretch of Queen St between
Spadina and Yonge is moving at the
speed of v(x) km/h, where x (in km) is
the distance from the Spadina and Queen
intersection. Derive a formula for the
time it takes to drive from Spadina
to Yonge in this traffic. Is the integral
formula obtained a proper or improper
integral?
Solution: Recall that speed = distance/time. If the speed was constant, v km/h, to find the time
we could simply divide the distance of 1.4 km by the speed, \(T = \frac{1.4}{v}\) hours. However,
because the speed is changing depending on the location x, we need to use the
slicing method and apply the formula time = distance/speed locally.
Step 1. We slice the given stretch of Queen St into n small subintervals of length
∆x each along the interval [0, 1.4] km.
Step 2. If the interval number i between points xi and xi+1 is small enough then
the speed is almost constant on this interval and is approximately equal to v(xi∗ ),
for any point xi∗ between xi and xi+1 .
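Step 3. Since the speed is almost constant, we can apply the above
formula to write the increment of time between points \(x_i\) and \(x_{i+1}\) as
\[
\Delta T_i \approx \frac{\Delta x}{v(x_i^*)}.
\]
Step 4. The total time is the sum of \(\Delta T_i\), so \(T \approx \sum_{i=1}^{n} \frac{\Delta x}{v(x_i^*)}\).
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
T = \lim_{n\to\infty} \sum_{i=1}^{n} \frac{\Delta x}{v(x_i^*)} = \int_0^{1.4} \frac{dx}{v(x)}
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(\frac{1}{v(x)}\) on the interval [0, 1.4].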
Since the car can stop at a traffic light, the speed v(x) can approach 0 and \(\frac{1}{v(x)}\)
can have a vertical asymptote at that point. As a result, the answer could be an
improper integral. It is a subtle point, but in the above calculation we should avoid
points where \(v(x_i^*)\) is equal to zero. Obviously, this improper integral would be
convergent because the time cannot be infinite. In a problem like this, for simplicity,
in a Calculus class it would probably be assumed that v(x) is always positive.
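As a small numerical illustration of the formula for T, the sketch below evaluates the Riemann sum for one made-up speed profile. The profile v(x) = 20 + 10x km/h is purely hypothetical (it is not given in the text) and was chosen because its integral has a simple closed form to compare against; the function name travel_time is also an assumption.

```python
import math

def travel_time(v, length=1.4, n=100_000):
    """Riemann sum for T = integral of dx / v(x) over [0, length], in hours."""
    dx = length / n
    # The midpoint of each subinterval plays the role of x_i^*.
    return sum(dx / v((i + 0.5) * dx) for i in range(n))

def v(x):
    # Hypothetical speed profile: 20 km/h at Spadina, 34 km/h at Yonge.
    return 20 + 10 * x

approx_hours = travel_time(v)
# An antiderivative of 1/(20 + 10x) is ln(20 + 10x)/10.
exact_hours = math.log(34 / 20) / 10
print(approx_hours * 60, exact_hours * 60)   # both converted to minutes
```

Because this v(x) is always positive, the integral here is a proper one; a profile that reaches zero would need the extra care described above.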
Exercise 1. A mountain goat is walking
straight east on a mountain path tracing
an altitude y = f(x) (in km), where x
is the horizontal coordinate (longitude,
also in km). Suppose that the goat’s speed
is always positive and depends only on
the altitude y, i.e. v = v(y) km/h. Find
the time it takes to walk from longitude
x = a to x = b.
Example 2. In the setting of the previous problem, the density of plants is changing
with altitude y, so the amount of food the goat consumes along the way is A(y)
kg/km. What is the total amount of food the goat eats between x = a and x = b?
Solution: Although the word “density” was not mentioned explicitly in the prob-
lem, notice how the units kg/km tell us that A(y) is actually the density of the
amount of food per unit of distance. If this density was constant, we could simply
multiply it by the length of the path to get the total amount of food. However, since
A(y) changes with altitude, we need to use the slicing method.
Step 1. We slice the path into n small subintervals of width ∆x along the x-axis
on the interval [a, b] km.
Step 2. If the interval number i between points xi and xi+1 is small then the
altitude does not change much and so the density of food is almost constant on this
interval and is approximately equal to
A(y) = A( f (xi∗ ))
for any point xi∗ between xi and xi+1 . Although A(y) depends on the altitude y, this
altitude should be evaluated at a point xi∗ and the quantity should be expressed in
terms of x when we slice along the x-axis.
Step 3. We need to multiply the density of food in kg/km by the distance in km
to get the amount of food in kg. As a result, the amount of food between points \(x_i\)
and \(x_{i+1}\) is
\[
\Delta F_i \approx A(f(x_i^*))\,\Delta L_i \approx A(f(x_i^*)) \sqrt{1 + f'(x_i^*)^2}\,\Delta x.
\]
Here the distance \(\Delta L\) is not the horizontal increment \(\Delta x\), but the increment along
the mountain path y = f(x): \(\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x\).
Step 4. The total amount of food is the sum of \(\Delta F_i\), so
\[
F = \sum_{i=1}^{n} \Delta F_i \approx \sum_{i=1}^{n} A(f(x_i^*)) \sqrt{1 + f'(x_i^*)^2}\,\Delta x.
\]
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
F = \lim_{n\to\infty} \sum_{i=1}^{n} A(f(x_i^*)) \sqrt{1 + f'(x_i^*)^2}\,\Delta x = \int_a^b A(f(x)) \sqrt{1 + f'(x)^2}\,dx
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(A(f(x)) \sqrt{1 + f'(x)^2}\) on the interval [a, b].
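The formula for F combines a density evaluated along the path with the element of length. The following sketch evaluates the corresponding Riemann sum; the function name total_food and the sample path y = x with density A(y) = 2y kg/km are hypothetical choices made only so that the answer can be checked by hand.

```python
import math

def total_food(A, f, df, a, b, n=100_000):
    """Riemann sum for the integral of A(f(x)) * sqrt(1 + f'(x)^2) over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x_star = a + (i + 0.5) * dx
        density = A(f(x_star))                        # kg/km at altitude f(x_i^*)
        dL = math.sqrt(1 + df(x_star) ** 2) * dx      # km travelled along the path
        total += density * dL
    return total

# Hypothetical data: path y = x (slope 1), density A(y) = 2y, from x = 0 to x = 1.
# By hand: integral of 2x * sqrt(2) dx from 0 to 1 equals sqrt(2).
print(total_food(lambda y: 2 * y, lambda x: x, lambda x: 1.0, 0.0, 1.0))
print(math.sqrt(2))
```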
Exercise 2. A hiker is walking straight east on a mountain path which is tracing an
altitude y = f(x) (in km), where x (in km) is the horizontal coordinate (longitude).
Suppose that the hiker’s speed is always positive and depends only on the longitude,
v = v(x) km/h. The hiker is breathing air at a rate
r = 15 + 100|slope| litres per minute,
where slope means the slope of the graph of y = f(x). Find the total volume of air
the hiker breathes between x = a and x = b. Pay attention to units!
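Volume densities. Next, we will consider quantities distributed over some volume,
so we will be using:
\[
\text{Quantity} = \frac{\text{Quantity}}{\text{Volume}} \times \text{Volume} \quad\text{or}\quad \Delta Q = \frac{\Delta Q}{\Delta V} \times \Delta V.
\]
The ratio \(\frac{\text{Quantity}}{\text{Volume}}\) (or \(\frac{\Delta Q}{\Delta V}\)) is called a volume density, because it tells us how densely
the quantity is distributed per unit of volume.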
Example 3. Trees “lift” water from roots to shoots by a
mechanism of a decrease in hydrostatic (water) pressure
created by transpiration, which is the evaporation of water
from leaves. The work done by a tree against the gravity
force to lift m kg of water to height h meters is given by the
formula
W = Force × Distance = mg × h
(measured in J (joule) = Newton × meter = kg × m²/s²).
A palm tree trunk contains 75% water by volume. If a palm
tree trunk is H meters high and has radius r(h) at height h
meters, how much work does the tree do to “refill” its trunk with water? Denote by
ρ the density of water.
Solution: When using the formula W = mg × h, we will compute the mass of water
as ρ × (Volume of water) = ρ × 0.75 × (Volume of tree trunk), because water is 75%
of the tree trunk by volume. However, in Step 3 of the slicing method below we will
have to apply this formula locally at height h because the work done changes with
height.
Step 1. We slice the tree horizontally into n narrow disks of width ∆h along the
h-axis (y-axis) on the interval [0, H], i.e. between height 0 and height H.
Step 2. If the interval number i between height hi and hi+1 is small enough
then the radius of the cross-section r(h) and height h do not change much and are
approximately equal to r(h∗i ) and h∗i for any point h∗i between hi and hi+1 .
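Step 3. A palm tree trunk looks like a body of revolution so we can think of its
cross-section as a circle with the area \(A \approx \pi r(h_i^*)^2\) and, as a result, the volume of one
slice is \(\Delta V_i \approx \pi r(h_i^*)^2 \Delta h\). Because 75% of the tree trunk is water, the mass of water
in this slice is \(0.75\rho\,\Delta V_i \approx 0.75\rho\pi r(h_i^*)^2 \Delta h\). Using the formula for work stated in
the problem, the work that the tree does to lift this much water to height \(h_i^*\) is
\[
\Delta W_i \approx 0.75\rho\pi r(h_i^*)^2 \Delta h \times g \times h_i^* = 0.75 g\pi\rho\, r(h_i^*)^2 h_i^* \Delta h.
\]
The units will be J (joule), because all the units were consistent.
Step 4. The total work is the sum of \(\Delta W_i\), so
\[
W = \sum_{i=1}^{n} \Delta W_i \approx \sum_{i=1}^{n} 0.75 g\pi\rho\, r(h_i^*)^2 h_i^* \Delta h.
\]
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
W = \lim_{n\to\infty} \sum_{i=1}^{n} 0.75 g\pi\rho\, r(h_i^*)^2 h_i^* \Delta h = \int_0^H 0.75 g\pi\rho\, r(h)^2 h\,dh \quad\text{(in J)}
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(0.75 g\pi\rho\, r(h)^2 h\) on the interval [0, H].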
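The chain of multiplications in Step 3 (slice volume, then mass of water, then work) can be mirrored line by line in code. This is only a sketch under stated assumptions: the function name lifting_work, the values ρ = 1000 kg/m³ and g = 9.8 m/s², and the cylindrical trunk of radius 0.2 m and height 10 m are all hypothetical and chosen so that the integral can be checked by hand.

```python
import math

def lifting_work(r, H, rho=1000.0, g=9.8, n=100_000):
    """Riemann sum for W = integral of 0.75*g*pi*rho*r(h)^2*h over [0, H], in joules."""
    dh = H / n
    total = 0.0
    for i in range(n):
        h_star = (i + 0.5) * dh                           # height of slice number i
        slice_volume = math.pi * r(h_star) ** 2 * dh      # m^3
        water_mass = 0.75 * rho * slice_volume            # kg, 75% of the slice is water
        total += water_mass * g * h_star                  # work = m * g * h
    return total

# Hypothetical trunk: constant radius 0.2 m, height 10 m.
# By hand: 0.75 * g * pi * rho * 0.2^2 * H^2 / 2.
W = lifting_work(lambda h: 0.2, 10.0)
print(W, 0.75 * 9.8 * math.pi * 1000.0 * 0.2 ** 2 * 10.0 ** 2 / 2)
```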
Exercise 3. The amount of heat H stored in a piece of material is
H = cρTV
where c is a constant called the specific heat in J/(kg·K) that depends on the
material, ρ is the mass density in kg/m³, V is the volume in m³, and T is the
temperature in K.
Consider a bar of length L meters made of aluminium with constant density \(\rho_A\) and
specific heat \(c_A\), and varying radius r(x) (in meters) and temperature T(x) (in K).
What is the total heat stored in the bar?
Example 4. A glass bowl depicted in the figure has
semi-circular sides of radius R cm, and it has length
L cm. When it is filled with potato chips they tend
to break and smaller pieces accumulate towards the
bottom, so the density of chips changes with depth
according to the function C = C(d) g/cm³, where
depth d is 0 at the top and R at the bottom. Find
the total mass of potato chips in the bowl.
Solution: The main formula we want to use is, of course, mass = density × volume,
but we need to combine it with the slicing method to make sure that the density is
constant or almost constant on each slice. Since the density changes with depth, we slice along the depth:
Step 1. We slice the bowl horizontally into n slices
of height \(\Delta d\) along the depth axis (y-axis) on the interval [0, R],
i.e. between depth 0 and R.
Step 2. When \(\Delta d\) is small, the density of chips is
almost constant on any given slice and is approximately
equal to \(C(d_i^*)\) for any point \(d_i^*\) between \(d_i\)
and \(d_{i+1}\). Also we can see by looking at the side view
that the width of the slice at depth d is \(2\sqrt{R^2 - d^2}\), so the volume of the rectangular
slice is approximately \(\Delta V_i \approx 2\sqrt{R^2 - (d_i^*)^2} \times L \times \Delta d\).
Step 3. Using the formula mass = density × volume when the density is constant:
\[
\Delta m_i \approx C(d_i^*) \times 2\sqrt{R^2 - (d_i^*)^2}\, L\, \Delta d.
\]
The units will be grams, because all the dimensions are given in cm and the density
is given in g/cm³.
Step 4. The total mass is the sum of \(\Delta m_i\), so
\[
m = \sum_{i=1}^{n} \Delta m_i \approx \sum_{i=1}^{n} (2L)\,C(d_i^*) \sqrt{R^2 - (d_i^*)^2}\,\Delta d.
\]
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
m = \lim_{n\to\infty} \sum_{i=1}^{n} (2L)\,C(d_i^*) \sqrt{R^2 - (d_i^*)^2}\,\Delta d = \int_0^R (2L)\,C(x) \sqrt{R^2 - x^2}\,dx \quad\text{grams}
\]
because the sum in the middle is the Riemann sum corresponding to the function
\((2L)\,C(x)\sqrt{R^2 - x^2}\) on the interval [0, R]. The reason we replaced the depth variable
d by the variable x in the integral is that \(dd\) would look confusing in place of \(dx\).
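One way to test a set-up like this is to feed it a constant density, for which the answer must be density times the volume of the half-cylinder, \(\frac{1}{2}\pi R^2 L\). The sketch below does that with a Riemann sum; the function name chips_mass and the numbers C = 0.1 g/cm³, R = 10 cm, L = 30 cm are hypothetical values chosen for the check.

```python
import math

def chips_mass(C, R, L, n=200_000):
    """Riemann sum for m = integral of 2*L*C(d)*sqrt(R^2 - d^2) over [0, R], in grams."""
    dd = R / n
    total = 0.0
    for i in range(n):
        d_star = (i + 0.5) * dd                          # depth of slice number i
        width = 2 * math.sqrt(R ** 2 - d_star ** 2)      # width of the bowl at that depth
        total += C(d_star) * width * L * dd              # density * volume of the slice
    return total

# Hypothetical check with constant density: the mass should equal
# density * (volume of half a cylinder) = 0.1 * pi * R^2 / 2 * L.
R, L = 10.0, 30.0
print(chips_mass(lambda d: 0.1, R, L))
print(0.1 * math.pi * R ** 2 / 2 * L)
```

A depth-dependent C(d) can be passed in the same way once a concrete formula for it is known.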
Exercise 4. Average oxygen concentration
in the Atlantic Ocean at depth d km is given
by the function C = C(d) mg/litre (depicted
in the figure for depths between 0
and 5 km).⁸ According to the ocean depth
chart, the area of the ocean at depth d is
given by A = A(d) (in millions of km²).
How much oxygen is stored in the Atlantic
Ocean up to the depth of 5 km?
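Area densities. Finally, we will consider quantities distributed over some area,
so we will be using:
\[
\text{Quantity} = \frac{\text{Quantity}}{\text{Area}} \times \text{Area} \quad\text{or}\quad \Delta Q = \frac{\Delta Q}{\Delta A} \times \Delta A.
\]
The ratio \(\frac{\text{Quantity}}{\text{Area}}\) (or \(\frac{\Delta Q}{\Delta A}\)) is called an area density, because it tells us how densely
the quantity is distributed per unit of area.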
8 https://bit.ly/3QFCMNA
Example 5. The circular city of Ecbatana
has radius R km and population density of
d(r) people per km² at distance r km
from the center. What is its total population?
Solution: Step 1. We slice the city into n
circular strips of width \(\Delta r\) between radius
0 and R. One such slice is shown
in the figure.
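Step 2. When \(\Delta r\) is small, the radius
does not change much on one strip so
the population density is almost constant
and is approximately equal to
\(d(r_i^*)\), evaluated at any radius \(r_i^*\) between \(r_i\) and \(r_{i+1}\), on the strip number i. The
perimeter of the strip is approximately \(2\pi r_i^*\) and the width is \(\Delta r\), so its area is
approximately \(\Delta A_i \approx 2\pi r_i^* \Delta r\).
Step 3. This means that the population on the strip number i is
\[
\Delta P_i = \text{density} \times \text{area} \approx d(r_i^*) \times 2\pi r_i^* \Delta r.
\]
Step 4. The total population is the sum of \(\Delta P_i\), so \(P \approx \sum_{i=1}^{n} d(r_i^*)\, 2\pi r_i^* \Delta r\).
Step 5. Taking the limit as the number of strips n goes to infinity,
\[
P = \lim_{n\to\infty} \sum_{i=1}^{n} d(r_i^*)\, 2\pi r_i^* \Delta r = \int_0^R 2\pi d(r)\, r\,dr \quad\text{people}
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(2\pi d(r)\, r\) on the interval [0, R].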
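The ring-by-ring sum in Step 4 is easy to sanity check: with a uniform density the total must be density times the area of the disk, \(\pi R^2\). The sketch below is illustrative only; the function name population and the values of 1000 people per km² and R = 5 km are hypothetical.

```python
import math

def population(d, R, n=100_000):
    """Riemann sum for P = integral of 2*pi*d(r)*r over [0, R]."""
    dr = R / n
    total = 0.0
    for i in range(n):
        r_star = (i + 0.5) * dr                     # radius of strip number i
        strip_area = 2 * math.pi * r_star * dr      # perimeter times width
        total += d(r_star) * strip_area             # density times area
    return total

# Hypothetical check: uniform density of 1000 people per km^2 in a city of radius 5 km.
print(population(lambda r: 1000.0, 5.0))
print(1000.0 * math.pi * 5.0 ** 2)
```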
Exercise 5. Trees transform carbon dioxide (CO2)
and water into glucose and oxygen using sunlight.
This process is called photosynthesis. According to a
paper,⁹ during 1 hour the leaves of the plant Plantago
asiatica produce glucose and oxygen at a rate
\[
r(T) = 0.36\bigl(11 + 0.95(T - 10) - 0.025(T - 10)^2\bigr)
\]
measured in µmol/cm², where T is the temperature
of the leaf in °C. The leaf in the figure (where x and y are in cm) is in partial shade
and the temperature of different parts of the leaf varies according to the formula
\(T(x) = 10 + 2\sin\bigl(\frac{x^2}{2}\bigr)\) °C. What is the total amount of glucose and oxygen produced
by this leaf during 1 hour?
9 https://doi.org/10.1093/jxb/erj049
Answer to Exercise 1. Step 1. We slice the path into n small subintervals of width
∆x along the x-axis on the interval [a, b] km.
Step 2. If the interval number i between points \(x_i\) and \(x_{i+1}\) is small then
the altitude does not change much and so the speed is almost constant on this
interval and is approximately equal to
\(v(y) = v(f(x_i^*))\)
for any point \(x_i^*\) between \(x_i\) and \(x_{i+1}\). Although speed v(y) depends on the altitude y,
this altitude should be evaluated at a point \(x_i^*\) and the quantity should be expressed
in terms of x when we slice along the x-axis.
Step 3. Since the speed is almost constant, we can write the increment
of time between points \(x_i\) and \(x_{i+1}\) as
\[
\Delta T_i = \frac{\text{distance}}{\text{speed}} \approx \frac{\Delta L_i}{v(f(x_i^*))} \approx \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(f(x_i^*))}.
\]
Notice that, compared to the previous example, here the distance \(\Delta L\) is not the
horizontal increment \(\Delta x\), but the increment along the mountain path y = f(x), which
we found in the last section: \(\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x\). This calculation is the main
step and the biggest difference from the previous example.
Step 4. The total time is the sum of \(\Delta T_i\), so \(T \approx \sum_{i=1}^{n} \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(f(x_i^*))}\).
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
T = \lim_{n\to\infty} \sum_{i=1}^{n} \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(f(x_i^*))} = \int_a^b \frac{\sqrt{1 + f'(x)^2}}{v(f(x))}\,dx
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(\frac{\sqrt{1 + f'(x)^2}}{v(f(x))}\) on the interval [a, b].
Answer to Exercise 2. Step 1. We slice the path into n small subintervals of width
∆x along the x-axis on the interval [a, b] km.
Step 2. If the interval number i between points \(x_i\) and \(x_{i+1}\) is small then
the speed does not change much and is approximately equal to \(v(x_i^*)\) km/h for
any point \(x_i^*\) between \(x_i\) and \(x_{i+1}\). The slope also does not change much, so the
breathing rate is approximately equal to \(15 + 100|\text{slope}| = 15 + 100|f'(x_i^*)|\) L/min,
because the slope of y = f(x) is \(f'(x)\).
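Step 3. How many litres of air does the hiker breathe between points \(x_i\) and \(x_{i+1}\)?
Here the units can help us. To get litres we need to multiply L/min by min, so we
need to multiply the rate \(15 + 100|f'(x_i^*)|\) L/min by time \(\Delta T\) in minutes. We can
compute time as in the previous problems as
\[
\Delta T_i = \frac{\text{distance}}{\text{speed}} \approx \frac{\Delta L_i}{v(x_i^*)} \approx \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)}.
\]
The only subtle issue is that time is in hours, because the distance was given in km
and speed was given in km/h, so when we multiply the breathing rate by time,
\[
\bigl(15 + 100|f'(x_i^*)|\bigr) \times \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)},
\]
the units are L/min × hour = L/min × 60 min = 60 L. So the amount of air is:
\[
\Delta A_i \approx \bigl(15 + 100|f'(x_i^*)|\bigr) \times \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)} \times 60 \ \text{litres}.
\]
Step 4. The total amount of air is the sum of \(\Delta A_i\), so
\[
A \approx \sum_{i=1}^{n} 60\bigl(15 + 100|f'(x_i^*)|\bigr) \times \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)}.
\]
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
A = \lim_{n\to\infty} \sum_{i=1}^{n} 60\bigl(15 + 100|f'(x_i^*)|\bigr) \times \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)}
= \int_a^b 60\bigl(15 + 100|f'(x)|\bigr) \times \frac{\sqrt{1 + f'(x)^2}}{v(x)}\,dx \ \text{litres}
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(60\bigl(15 + 100|f'(x)|\bigr)\frac{\sqrt{1 + f'(x)^2}}{v(x)}\) on the interval [a, b].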
Answer to Exercise 3. Step 1. We slice the bar vertically into n narrow disks of
width ∆x along the x-axis on the interval [0, L].
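Step 2. If the interval number i between points \(x_i\) and \(x_{i+1}\) is small enough then
the radius of the cross-section and temperature do not change much and are
approximately equal to \(r(x_i^*)\) and \(T(x_i^*)\) for any point \(x_i^*\) between \(x_i\) and \(x_{i+1}\).
Step 3. As we saw in the last section, for a body of revolution the cross-section
is a circle so its area is \(A \approx \pi r(x_i^*)^2\) and, as a result, the volume is \(\Delta V_i \approx \pi r(x_i^*)^2 \Delta x\).
Using the formula stated in the problem, the heat stored in one slice is
\[
\Delta H_i \approx c_A \rho_A T(x_i^*)\,\Delta V_i = c_A \rho_A T(x_i^*)\,\pi r(x_i^*)^2 \Delta x.
\]
The units will be J (joule), because all the units were consistent.
Step 4. The total heat is the sum of \(\Delta H_i\), so \(H \approx \sum_{i=1}^{n} \pi c_A \rho_A T(x_i^*)\, r(x_i^*)^2 \Delta x\).
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
H = \lim_{n\to\infty} \sum_{i=1}^{n} \pi c_A \rho_A T(x_i^*)\, r(x_i^*)^2 \Delta x = \int_0^L \pi c_A \rho_A T(x)\, r(x)^2\,dx \quad\text{J}
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(\pi c_A \rho_A T(x)\, r(x)^2\) on the interval [0, L].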
Answer to Exercise 4. Step 1. We slice the Atlantic ocean into n slices of depth
∆d between depth 0 and 5 km.
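Step 2. When \(\Delta d\) is small, oxygen concentration is almost constant on any given
slice and is approximately equal to \(C(d_i^*)\) for any point \(d_i^*\) on the interval number i
between \(d_i\) and \(d_{i+1}\). The area of the slice is approximately \(A(d_i^*)\), so its volume is
\(\Delta V_i \approx A(d_i^*)\,\Delta d\).
Step 3. Using the formula Mass = Concentration × Volume when concentration
is constant, we get that \(\Delta m_i \approx C(d_i^*) A(d_i^*)\,\Delta d\). The area was given in millions of
km², depth in km, and concentration in mg/litre. There are \(10^{12}\) litres in a cubic
kilometer, so when we multiply the units we get \(10^6 \times 10^{12}\) litres × mg/litre = \(10^{18}\) mg = \(10^{15}\) grams.
Converting this to tonnes (1 tonne = \(10^6\) grams) we get that the units will be billions (\(10^9\)) of tonnes.
Step 4. The total mass is the sum of \(\Delta m_i\), so
\[
m = \sum_{i=1}^{n} \Delta m_i \approx \sum_{i=1}^{n} C(d_i^*) A(d_i^*)\,\Delta d.
\]
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
m = \lim_{n\to\infty} \sum_{i=1}^{n} C(d_i^*) A(d_i^*)\,\Delta d = \int_0^5 C(x) A(x)\,dx \quad\text{billions of tonnes}
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(C(x)A(x)\) on the interval [0, 5] km.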
Answer to Exercise 5. The rate of photosynthesis r(T) will depend on the
coordinate x on the leaf, because temperature T = T(x) varies with x. Let us record
that
\[
r(T(x)) = 0.36\bigl(11 + 1.9\sin(x^2/2) - 0.1\sin^2(x^2/2)\bigr).
\]
Step 1. We slice the leaf into n slices of width ∆x between x = 0 and x = 10 cm.
Step 2. When ∆x is small, the rate of photosynthesis is almost constant on one
slice and is approximately equal to r(T (xi∗ )) for any point xi∗ on the interval number
i between xi and xi+1 . The height of the slice is approximately f (xi∗ ) − g(xi∗ ), so its
area is ∆Ai ≈ ( f (xi∗ ) − g(xi∗ ))∆x.
Step 3. Notice that the rate of photosynthesis was given per unit of area,
so we can use that Amount = Rate × Area when the rate is constant. As a result,
the amount of glucose and oxygen produced during 1 hour in one slice is
\(\Delta A_i \approx r(T(x_i^*))\bigl(f(x_i^*) - g(x_i^*)\bigr)\Delta x\). The area in the figure is in cm², and the rate is in
µmol/cm², so the amount here is in µmol.
Step 4. The total amount is the sum of \(\Delta A_i\), so
\[
A = \sum_{i=1}^{n} \Delta A_i \approx \sum_{i=1}^{n} r(T(x_i^*))\bigl(f(x_i^*) - g(x_i^*)\bigr)\Delta x.
\]
Step 5. Taking the limit as the number of slices n goes to infinity,
\[
A = \lim_{n\to\infty} \sum_{i=1}^{n} r(T(x_i^*))\bigl(f(x_i^*) - g(x_i^*)\bigr)\Delta x = \int_0^{10} r(T(x))\bigl(f(x) - g(x)\bigr)\,dx \quad\text{µmol}
\]
because the sum in the middle is the Riemann sum corresponding to the function
\(r(T(x))(f(x) - g(x))\) on the interval [0, 10] cm. At this stage we can replace r(T(x))
by the specific formula above:
\[
A = \int_0^{10} 0.36\bigl(11 + 1.9\sin(x^2/2) - 0.1\sin^2(x^2/2)\bigr)\bigl(f(x) - g(x)\bigr)\,dx \quad\text{µmol}.
\]
Chapter 4
Differential equations
4.1 Differential equations: qualitative analysis

In this chapter we will study differential equations of the form
\[
y' = f(x, y).
\]
In this equation, x is an independent variable and y is a function y = y(x). In other
words, the equation can be written more precisely as
\[
y'(x) = f\bigl(x, y(x)\bigr)
\]
but we will often write it simply as \(y' = f(x, y)\), keeping in mind that y here actually
means a function y(x). This function is usually unknown to us and we want to solve
this equation to find y(x). A function y(x) is a solution of the equation y′ = f (x, y)
if it satisfies this equation, which means that if we plug it into the two sides of the
equation we get equality. As we will see in the examples, an equation like this will
have many solutions depending on the starting point y(x0 ) = y0 , which is called the
initial condition.
• In this section we will focus on understanding some qualitative behaviour of
solutions of some equations.
• In the next section we will see how we can approximate solutions using Taylor
polynomials and the so-called Euler’s method.
• In the section after that we will learn how to find solutions for the so-called
separable equations.
• After that, we will do some modelling using differential equations and consider
examples of systems of equations.
Let us introduce some additional definitions in the context of an applied example.
Example 1. By Newton’s law of cooling,
the temperature of a cup of coffee
in a room with temperature 20°C can be
modelled by the differential equation
y′ = κ(20 − y)
for some positive constant κ > 0, where
time t is the independent variable and
y = y(t) is the temperature changing
with time. We will take κ = 0.02 in this
example, as in the figure.
(a) The figure shows several solutions of this equation. What distinguishes these
solutions?
(b) One of the solutions is y(t) = 20. Verify that it is indeed a solution.
(c) Why is it sensible to call y(t) = 20 an equilibrium solution?
(d) Using common sense, would you call y(t) = 20 a stable or unstable equilibrium?
Why?
Solution: (a) One of the things that distinguishes these solutions is the initial
temperature y(0) at time t = 0 or, in other words, the initial condition. We see that
the solutions corresponding to initial temperatures 30, 50 and 70°C are decreasing over
time to 20°C, while solutions corresponding to initial temperatures 10 and −10°C are
increasing to 20°C.
(b) To verify that y(t) = 20 is a solution of y′ = κ(20 − y), we check that y′ (t) =
(20)′ = 0 and κ(20 − y(t)) = κ(20 − 20) = 0, so the two sides are indeed equal.
(c) It is sensible to call y(t) = 20 an equilibrium solution, because it reflects that
coffee at room temperature will stay at room temperature, so its temperature is in
a state of rest or balance.
(d) The equilibrium solution y(t) = 20 is a stable equilibrium because solutions
that start close to 20°C do not move away and always stay close, in fact, getting
closer and closer to 20°C over time. We will see examples below when a solution
starting close to an equilibrium moves away from it over time. Such an equilibrium
will be called an unstable equilibrium. If we are only given the equation
y′ = κ(20 − y) and do not see the figure, how can we check that y(t) = 20 is a
stable equilibrium? We can use the following diagram:
[Sign diagram on the y-axis: y′ = 0.02(20 − y) is positive (+) to the left of 20 and negative (−) to the right of 20.]
Since the derivative y′ is 0.02(20 − y), so it depends only on y, we draw an arrow
representing the y-axis and we mark 20 on this axis, which is our equilibrium point.
Then we check the sign of the derivative to the left and right of 20. For example,
if we plug in y = 10, we get 0.02(20 − 10) = 0.2 which is positive and, if we plug
in y = 30, we get 0.02(20 − 30) = −0.2 which is negative. To the left of 20 we draw
an arrow in the positive direction to indicate that solutions increase there, because
y′(t) > 0, and to the right of 20 we draw an arrow in the negative direction to
indicate that solutions decrease there. The diagram shows that the solutions starting
near 20 will move towards 20, so it must be a stable equilibrium.
Comment. In the above example, the differential equation was of the type
y′ = f (y).
In other words, the right hand side depends only on y and does not depend on
the independent variable x or t explicitly. Such equations are called autonomous
equations. Otherwise, the equation is non-autonomous. For example, y′ = y + t
is non-autonomous, because of the term +t. We will discuss autonomous vs non-
autonomous equations in the examples below. To find equilibrium solutions of an
autonomous equation, we simply find all the points where f(y) = 0. For example,
in the above example 0.02(20 − y) = 0 when y = 20, so this was the only equilibrium
solution.
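The sign check from the diagram (evaluate the right hand side slightly to the left and slightly to the right of an equilibrium) can be written as a few lines of Python. This is a minimal sketch; the function name classify_equilibrium, the offset eps and the label "inconclusive" for the remaining cases are assumptions made for illustration.

```python
def classify_equilibrium(f, y_eq, eps=1e-3):
    """Classify an equilibrium of y' = f(y) from the sign of f on either side of it."""
    left = f(y_eq - eps)     # sign of y' just below the equilibrium
    right = f(y_eq + eps)    # sign of y' just above the equilibrium
    if left > 0 and right < 0:
        return "stable"      # solutions on both sides move towards y_eq
    if left < 0 and right > 0:
        return "unstable"    # solutions on both sides move away from y_eq
    return "inconclusive"    # this simple check does not decide the case

# The coffee example: y' = 0.02 (20 - y) with equilibrium y = 20.
print(classify_equilibrium(lambda y: 0.02 * (20 - y), 20))
```

The offset eps has to be small enough that no other equilibrium lies between y_eq − eps and y_eq + eps.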
Exercise 1. Consider a differential equation y′ = 0.3(y − 2).

(a) Is this an autonomous equation?
(b) Find all equilibrium solutions. Verify that they are indeed solutions.
(c) Are they stable or unstable?
(d) Roughly sketch some examples of solutions.
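Example 2. Consider a differential equation y′ = y(1 − y).
(a) Is this an autonomous equation?
(b) Find all equilibrium solutions. Verify that they are indeed solutions.
(c) Are they stable or unstable?
(d) Roughly sketch some examples of solutions.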
Solution: (a) This is an autonomous equation, because the right hand side y(1 − y)
depends only on y.
(b) The right hand side y(1 − y) is equal to 0 when y = 0 or y = 1, so there
are two equilibrium solutions. To check that y(t) = 1 is a solution, we plug it into
y′ = y(1 − y) and we get (1)′ = 1(1 − 1), because both sides are zero. Similarly,
y(t) = 0 is a solution because (0)′ = 0(1 − 0).
(c) To decide if these equilibria are stable or unstable, we draw a diagram:
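[Sign diagram on the y-axis: y′ = y(1 − y) is negative (−) to the left of 0, positive (+) between 0 and 1, and negative (−) to the right of 1.]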
From this diagram we see that y = 0 is an unstable equilibrium because solutions
that start nearby are moving away from it, and y = 1 is a stable equilibrium because
solutions are moving towards it.
(d) In the figure on the right we

sketched two equilibrium solutions
y = 0 and y = 1, and three more so-
lutions above, below, and in between
these points. According to the above
diagram, the solutions above and be-
low are decreasing because y′ < 0, and
the solution in between is increasing
because y′ > 0 there.
This figure contains a new feature
called the slope field, which is a bunch
of arrows that describe in which direc-
tion a solution would flow at various points on the plane. Before we jump to the
next problem, let us explain how slope fields work exactly.
Slope fields. The figure to the right
explains how the slope field is con-
structed. Given a differential equation
y′ = f (x, y):
• We take some regular grid on the xy-plane.
In the figure, a 3 × 3 grid is
illustrated by red dots.
• At each point (a, b) on the grid, we
draw a small segment of a line with
the slope f (a, b). For example, if
f (x, y) = y(1 − y) as in the previous
example then, at the point (1.5, 0.5)
we draw a segment of the line with the slope f (1.5, 0.5) = 0.5(1 − 0.5) = 0.25.
Sometimes arrows are added like in the above example, but this is not necessary.
This construction of the slope field is very natural, because a solution y(x) of the
equation y′ = f (x, y) passing through a point (a, b) must have the slope f (a, b). As
a result, each piece of the slope field represents a segment of the tangent line of
a solution passing through that point, so the slope field helps us visualize how the
solutions would move at various points of the plane. We can draw a slope field in
GeoGebra using the command
SlopeField( f(x, y), n, a, Min x, Min y, Max x, Max y).
Here Min x, Min y, Max x, Max y are the sides of the rectangle where we want to
draw the slope field, number n is the number of points in the grid in both the
horizontal and vertical directions, and length multiplier a controls the length of
each segment, which can be adjusted for visual impact. The slope field in the above
example was constructed using SlopeField(y(1 − y), 60, 0.75, 0, −5, 10, 5). We can
also draw a solution starting at a point (x0 , y0 ) using the commands
SolveODE( f (x, y), (x0 , y0 )) or SolveODE( f (x, y), x0 , y0 , End x, Step).
Here End x means up to which point we want the solution to be drawn, and Step
means that the solution is drawn in steps of this size. The smaller the step, the
smoother the solution looks, although any small enough step like 0.1 will look
perfectly smooth.
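The grid construction described above can also be reproduced outside GeoGebra. The sketch below only computes the slopes at the grid points and does not draw anything; the function name slope_field and the particular 3 × 3 grid on the rectangle 1 ≤ x ≤ 2, 0 ≤ y ≤ 1 are assumptions for illustration (the grid was chosen to contain the point (1.5, 0.5) from the example above).

```python
def slope_field(f, x_min, x_max, y_min, y_max, n):
    """Return a list of (x, y, slope) for an n-by-n regular grid (n >= 2)."""
    points = []
    for i in range(n):
        for j in range(n):
            x = x_min + i * (x_max - x_min) / (n - 1)
            y = y_min + j * (y_max - y_min) / (n - 1)
            points.append((x, y, f(x, y)))   # slope of the short segment drawn at (x, y)
    return points

# Slopes for y' = y(1 - y) on a 3 x 3 grid.
for x, y, slope in slope_field(lambda x, y: y * (1 - y), 1.0, 2.0, 0.0, 1.0, 3):
    print(x, y, slope)
```

Since this equation is autonomous, the printed slope depends only on y, which is the same observation the text makes about the slope field staying the same along horizontal lines.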
Exercise 2. Consider a differential equation y′ = y(y − 1).
(d) Roughly sketch some examples of solutions. Draw the slope field along the line
y = 2. Draw the slope field in GeoGebra.
Non-autonomous equations. In the next two problems we will take a look at a

couple of non-autonomous equations y′ = f (t, y) where the right hand side depends
on the independent variable t. In fact, the equations will be of the special form
y′ = g(t)h(y), which are called separable equations, and which we will learn how
to solve later on. In this case we can still find equilibrium solutions by solving
h(y) = 0. The next two problems will refer to the following figures. Notice how
the slope field changes along horizontal lines, compared to the examples above
where the slope field stayed the same along any horizontal line. That is because
the slope y′ now changes with t even if y stays constant, because the equation is
non-autonomous.
Example 3. Consider the slope field in the left figure above corresponding to the
differential equation y′ = ty(1 − y).
(a) By looking at the slope field, find all equilibrium solutions. Check that they are
indeed solutions by plugging them into the equation.
(b) Sketch some solutions following along the slope field.
(c) If the initial condition y(0) > 0 is positive, what is the limit limt→∞ y(t)?
Solution: (a) By looking at the slope field, we can see that some solutions are flowing
towards y = 1 and some solutions are flowing away from y = 0, so it looks like y(t) = 1
and y(t) = 0 could be equilibrium solutions. If we plug y(t) = 1 into the equation
y′ = ty(1 − y), we see that (1)′ = t × 1(1 − 1) holds because both sides are equal to zero.
Similarly, (0)′ = t × 0(1 − 0) holds because both sides are zero, so y(t) = 0 is also an
equilibrium solution. Without looking at the slope field, we could have found these
solutions by setting y′ = ty(1 − y) = 0, so y(1 − y) = 0 and y = 0 or y = 1.
(b) We sketched some solutions flowing along the slope field in the above figure.
(c) From the slope field we see that any solution starting above the x-axis, that is,
with y(0) > 0, flows towards the equilibrium y = 1, so limt→∞ y(t) = 1 in this
case.
Exercise 3. Consider the slope field in the right figure above corresponding to the
differential equation y′ = sin(πt) sin(πy).
(a) By looking at the slope field, find all equilibrium solutions. Check that they are
indeed solutions by plugging them into the equation.
(b) Sketch some solutions following along the slope field.
(c) If the initial condition is y(0) = 0.5, does the solution have a limit limt→∞ y(t)?
[Figure: four slope fields, labelled 1, 2, 3 and 4.]
The next two problems will refer to the above figures.

Example 4. Which of the slope fields above correspond to autonomous differential
equations and why?
Solution: Slope fields 2 and 3 correspond to autonomous differential equations of
the form y′ = f (y), because the slope looks the same along any horizontal line
y = c. This is exactly what should happen, because on a horizontal line y = c the
slope is also constant, y′ = f (y) = f (c), so the segments should all look the same
in the horizontal direction. Notice that slope fields 1 and 4 change along horizontal
lines, which is an indication that the right hand side of the equation y′ = f (t, y)
depends on t.
Exercise 4. Match the slope fields in the above figure with the following four dif-
ferential equations:
(a) y′ = (2y + 1)²    (c) y′ = 1/(2y + 1)
(b) y′ = (2t + 1)²    (d) y′ = y − t
Exercise 5. A cup of coffee is sitting in

a room with temperature which is initially
20◦ C and then starts decreasing after the
AC has been turned on. By looking at the
slope field in the figure which describes
the temperature of a cup of coffee, at what
time was the AC turned on? The units on
the x-axis are minutes. Sketch some solu-
tions flowing along this slope field.
Answer to Exercise 1. (a) This is an autonomous equation, because the right hand
side 0.3(y − 2) depends only on y.
(b) 0.3(y − 2) = 0 when y = 2. To check that y(t) = 2 is a solution, we plug it into
y′ = 0.3(y − 2) and we get (2)′ = 0.3(2 − 2), which holds because both sides are zero.
(c) We draw a diagram:
sign of y′ = 0.3(y − 2):   −  for y < 2,   +  for y > 2.
From this diagram we see that y = 2 is an unstable equilibrium, because solutions
that start nearby are moving away from it.
(d) In the figure we sketched an equilibrium solution y = 2 and two solutions
above and below moving away from it. The solution above y = 2 is increasing,
because we saw in the diagram that the derivative is positive there, and the solution
below y = 2 is decreasing because y′ < 0. Any increasing and decreasing shapes
would be fine here, but in the next section we will learn how to check if the solution
is concave up or concave down. As a quick preview: y′ = 0.3(y − 2) implies that
y′′ = (y′ )′ = (0.3(y − 2))′ = 0.3y′ = 0.3 × 0.3(y − 2) = 0.09(y − 2). We see that
y′′ > 0 when y > 2 where solutions must be concave up and y′′ < 0 when y < 2
where solutions must be concave down.
Answer to Exercise 2. (a) This is an autonomous equation, because the right hand
side y(y − 1) depends only on y.
(b) The right hand side y(y − 1) is equal to 0 when y = 0 or y = 1, so there
are two equilibrium solutions. To check that y(t) = 1 is a solution, we plug it into
y′ = y(y − 1) and we get (1)′ = 1(1 − 1), which holds because both sides are zero.
Similarly, y(t) = 0 is a solution because (0)′ = 0(0 − 1).
(c) To decide if these equilibria are stable or unstable, we draw a diagram:
sign of y′ = y(y − 1):   +  for y < 0,   −  for 0 < y < 1,   +  for y > 1.
From this diagram we see that y = 1 is an unstable equilibrium because solutions
that start nearby are moving away from it, and y = 0 is a stable equilibrium because
solutions are moving towards it.
(d) In the figure on the right we sketched two equilibrium solutions y = 0 and y = 1,
and three more solutions above, below, and in between these points. According to the
above diagram, the solutions above and below are increasing because y′ > 0, and the
solution in between is decreasing because y′ < 0 there. We drew the slope field in
Geogebra, but the slope field around level y = 2 could be drawn by hand by computing
y′ = y(y − 1) = 2(2 − 1) = 2 and drawing a bunch of line segments with the slope 2 along
the horizontal line y = 2. In the figure the region near y = 2 is emphasized by the red
dashed lines.
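The sign diagrams used in the last two answers can also be checked numerically. The
following is a minimal sketch, assuming that the step eps is small enough that no other
equilibrium lies within eps of the one being tested; the function name
classify_equilibrium is ours.

```python
def classify_equilibrium(f, y_eq, eps=1e-3):
    """Classify an equilibrium of y' = f(y) from the sign of f on both sides."""
    below = f(y_eq - eps)
    above = f(y_eq + eps)
    if below > 0 and above < 0:
        return "stable"      # solutions on both sides move towards y_eq
    if below < 0 and above > 0:
        return "unstable"    # solutions on both sides move away from y_eq
    return "neither"         # the signs on the two sides do not disagree

# Exercise 2: y' = y(y - 1) with equilibria y = 0 and y = 1.
print(classify_equilibrium(lambda y: y * (y - 1), 0))
print(classify_equilibrium(lambda y: y * (y - 1), 1))
# Exercise 1: y' = 0.3(y - 2) with the equilibrium y = 2.
print(classify_equilibrium(lambda y: 0.3 * (y - 2), 2))
```

This is the same reasoning as in the diagram: only the sign of y′ just below and just above
the equilibrium matters.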
Answer to Exercise 3. (a) By looking at the slope field, we can see that the slope
is horizontal at y = −2, −1, 0, 1, 2 and 3, so these look like equilibrium solutions.
For example, if we plug y(t) = 1 into the equation y′ = sin(πt) sin(πy), we see that
(1)′ = sin(πt) sin(π) holds because both sides are equal to zero. Without looking at the
slope field, we see that y′ = sin(πt) sin(πy) = 0 when sin(πy) = 0, which holds
when πy is of the form 0, ±π, ±2π, etc. In other words, when y is any integer
number 0, ±1, ±2, etc.
(b) We sketched the equilibrium solutions and one non-equilibrium solution flowing
along the slope field in the figure. We see that solutions can now fluctuate by
decreasing and increasing.
(c) For the same reason it looks like the limit limt→∞ y(t) does not exist, unless we
start exactly at one of the equilibrium solutions.
Answer to Exercise 4. From the previous example we know that 2 and 3 are
autonomous equations, so they must be (a) or (c). We can see from the figure that
in 2 the slope is decreasing as y increases and in 3 the slope is increasing as y
increases, so we can match 2(c) and 3(a). We could also match by noticing that 3 has
a horizontal slope line, aka equilibrium solution, below the x-axis, while the right
hand side 1/(2y + 1) in (c) can never be equal to zero.
Next, we can notice that the slope field 4 is the same along vertical lines. This
is an indication that it corresponds to an equation of the form y′ = f (t), because
on a vertical line t is a constant, t = c. This allows us to match 4(b), because (b)
y′ = (2t + 1)² is the only equation where the right hand side depends only on t.
This leaves 1(d).
Another way we could match 4(b) and 1(d) is to check when the slope is zero in
(b) and (d). In (b), the slope is zero when y′ = (2t + 1)² = 0, so t = −0.5. In (d),
the slope is zero when y′ = y − t = 0, so y = t. We can see that in the slope field
4, the slopes are zero around t = −0.5, and in the slope field 1 the slopes are zero
around the diagonal line y = t.
Answer to Exercise 5. The AC was turned on at t = 30 minutes, because that is when
the slope field starts changing along the horizontal lines. A couple of solutions are
sketched in the figure.
4.2 Differential equations: approximations

In this section we will learn how to approximate the solution of the initial value
problem
y′ = f (x, y), y(a) = b.
Here, y(a) = b is the initial condition.
Euler’s method. First, we will use
Euler’s method to approximate the solution
of the above initial value problem. The
idea is to move along the tangent line
in small steps ∆x = h as follows. Since
we want to approximate the solution y =
y(x) of the above differential equation,
let us recall that
T (x) = y(a) + y′ (a)(x − a)
is its tangent line at x = a. We already

know y(a) because we are given the initial value y(a) = b. Because y(x) is the
solution of the equation y′ = f (x, y), we can use it to compute the slope y′ (a) =
f (a, y(a)) = f (a, b) at x = a. So the tangent line at x = a is:
T (x) = b + f (a, b)(x − a).
As the above figure illustrates, if we move a small step h from x = a to x = a + h along
this tangent line then y = b will become y = b + f (a, b)h. If the step h is small then
the tangent line should be a good approximation of the actual solution and our new
point (a + h, b + f (a, b)h) is close to the solution at the new point x = a + h. Then we
can think of this point as our new initial condition and repeat the process, i.e. again
move along another tangent line at that point. In the figure above, we started at
some point on the y-axis and then made 4 steps, each time updating the next slope
according to our position. Let us show how this is done on a specific problem.
Example 1. The above figure comes from the example of a cup of coffee cool-
ing down at room temperature. The starting temperature is y(0) = 70◦ C and the
differential equation is y′ = 0.02(20 − y). Describe the first four steps of Euler’s
method with the step h = 20 minutes. If the actual solution at time t = 80 is equal
to 30.09◦ C, does the Euler method underestimate or overestimate it?
Solution: In the above figure, Euler’s method steps are given by the solid lines and
the actual solution is the dashed curve. In this problem, f (t, y) = 0.02(20 − y).
Step 1. Our starting point is (0, 70) and the slope at this point is f (0, 70) =
0.02(20 − 70) = −1. This means that on the first interval from t = 0 to t = 20, the
tangent line is y = 70 − t. If we move along this line, at time t = 20 we end up at
y = 70 − 20 = 50.
Step 2. After the first step we are at the point (20, 50). The slope at this point
is f (20, 50) = 0.02(20 − 50) = −0.6. This means that on the second interval from
t = 20 to t = 40, the tangent line is y = 50 − 0.6(t − 20). If we move along this
line, at time t = 40 we end up at y = 50 − 0.6(40 − 20) = 38.
Step 3. After the second step we are at the point (40, 38). The slope at this point
is f (40, 38) = 0.02(20 − 38) = −0.36. This means that on the third interval from
t = 40 to t = 60, the tangent line is y = 38 − 0.36(t − 40). If we move along this
line, at time t = 60 we end up at y = 38 − 0.36(60 − 40) = 30.8.
Step 4. After the third step we are at the point (60, 30.8). The slope at this point
is f (60, 30.8) = 0.02(20 − 30.8) = −0.216. This means that on the fourth interval
from t = 60 to t = 80, the tangent line is y = 30.8 − 0.216(t − 60). If we move
along this line, at time t = 80 we end up at y = 30.8 − 0.216(80 − 60) = 26.48.
Since the actual solution at time t = 80 is equal to 30.09, Euler’s method under-
estimated it, which agrees with the figure above.
Comment. Euler’s method approximation above at t = 80 was not very accurate,
but the steps of h = 20 we took along the tangent lines were quite large. If we took
steps of size h = 1, our Euler method approximation would give us 29.93, which is
much better even though the step h = 1 is not so small. With the step size h = 0.1,
Euler method would produce 30.07 at time t = 80, so the approximation gets better
and better as the step size gets smaller. In Wolfram Alpha, one can use the following
command to see the answer 30.07 at time t = 80 with the step size h = 0.1:
Euler’s method y′ = 0.02(20 − y), y(0) = 70, from 0 to 80, stepsize = 0.1
We can, of course, change the equation, the interval and the step size.
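The same computation can be sketched in Python. The function name euler and its arguments
below are our own and are not part of Wolfram Alpha; the sketch simply repeats the step
"move along the tangent line for a step h" described above.

```python
def euler(f, t0, y0, h, n_steps):
    """Euler's method for y' = f(t, y), y(t0) = y0; returns the list of points."""
    points = [(t0, y0)]
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + f(t, y) * h   # move along the tangent line for a step h
        t = t + h
        points.append((t, y))
    return points

def coffee(t, y):
    return 0.02 * (20 - y)    # right hand side of y' = 0.02(20 - y)

# The four steps of Example 1 with h = 20.
for t, y in euler(coffee, 0, 70, 20, 4):
    print(t, round(y, 2))

# Smaller steps up to time t = 80.
for h, n_steps in [(1, 80), (0.1, 800)]:
    t_end, y_end = euler(coffee, 0, 70, h, n_steps)[-1]
    print(h, round(y_end, 2))
```

With h = 20 this reproduces the values 50, 38, 30.8 and 26.48 from Example 1, and with
smaller steps the final value agrees, up to rounding in the last digit, with the values
quoted in the comment above.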
Exercise 1. A cup of iced coffee is sitting at room

temperature. The starting temperature is y(0) = 0◦ C
and the differential equation is y′ = 0.02(20 − y).
Describe the first four steps of Euler’s method with
the step h = 5 minutes. If the actual solution at time
t = 20 is equal to 6.59◦ C, does the Euler method un-
derestimate or overestimate it? Use Wolfram Alpha
to find the approximation at time t = 20 with the step
size h = 0.1.
Comment. In the above two problems we saw that Euler’s method underestimated
or overestimated the actual solution. The reason was quite simple.
• In the region where solutions are concave up, the tangent lines are below, so
moving along the tangent lines will underestimate the actual solutions.
• In the region where solutions are concave down, the tangent lines are above, so
moving along the tangent lines will overestimate the actual solutions.
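As a quick check of this rule on the two coffee problems, here is a small Python sketch.
It uses the second derivative y′′ = (0.02(20 − y))′ = −0.02y′, computed in the same way as
in the preview at the end of the previous section, and compares Euler’s method with the
actual values 30.09 and 6.59 quoted in the problems. The helper names are our own.

```python
def f(y):
    return 0.02 * (20 - y)          # right hand side of y' = 0.02(20 - y)

def second_derivative(y):
    return -0.02 * f(y)             # y'' = (0.02(20 - y))' = -0.02 y'

def euler_final(y0, h, n_steps):
    """Value produced by Euler's method after n_steps steps of size h."""
    y = y0
    for _ in range(n_steps):
        y = y + f(y) * h
    return y

# (initial value, step, number of steps, actual value quoted in the text)
for y0, h, n_steps, actual in [(70, 20, 4, 30.09), (0, 5, 4, 6.59)]:
    shape = "concave up" if second_derivative(y0) > 0 else "concave down"
    estimate = euler_final(y0, h, n_steps)
    verdict = "underestimate" if estimate < actual else "overestimate"
    print(y0, shape, round(estimate, 3), verdict)
```

The hot coffee starts in the region y > 20 where solutions are concave up, and the iced
coffee starts in the region y < 20 where they are concave down.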
Example 2. In the figure on the right, will

Euler’s method overestimate or underestimate
the solution starting from y(0) = 4.5?
Solution: That solution seems to be concave down for a while and then becomes
concave up. The solutions following the slope field in the region below the
equilibrium solution y = 6 also seem to be concave down or concave up in different
parts. As a result, we cannot say for sure whether Euler’s method will overestimate or
underestimate this solution.
Exercise 2. In the figure above, will Euler’s method overestimate or underestimate

the solution starting from y(0) = 7 in the visible region?
Taylor polynomial approximation. Next, we will use Taylor polynomials to

approximate the solution of the initial value problem near the starting point x = a.
In this section, we will only use Taylor polynomials of degree n = 2, but the same
procedure can be iterated to find higher degree Taylor polynomial approximations
if necessary. Since we want to approximate the solution y = y(x) of the above
differential equation, let us recall that
P2(x) = y(a) + y′(a)(x − a) + (y′′(a)/2)(x − a)²
is its Taylor polynomial of degree n = 2 centered at x = a. We already know y(a)

because we are given the initial value y(a) = b. Because y(x) is the solution of the
equation y′ = f (x, y), we can use it to compute y′ (a) = f (a, y(a)) = f (a, b). So
automatically we know that
y(a) = b, y′ (a) = f (a, b)
and it only remains to compute y′′ (a). Again, because y′ (x) = f (x, y(x)), we can
differentiate this equation to find y′′ (x) = (y′ (x))′ . Let us illustrate this on a specific
example.
Example 3. If y(t) is the solution of
y′ = sin(π(t + y)), y(1/4) = 1/2,
find its Taylor polynomial of degree 2 centred at 1/4.
Solution: We are given that y(1/4) = 1/2 and, using the differential equation, we can
compute the first derivative
y′(1/4) = sin(π(1/4 + y(1/4))) = sin(π(1/4 + 1/2)) = sin(3π/4) = 1/√2.
To find y′′(1/4), we differentiate the equation using the chain rule and keeping in
mind that y is a function of t:
y′′(t) = (y′(t))′ = (sin(π(t + y)))′ = cos(π(t + y)) (π(t + y))′ = cos(π(t + y)) π(1 + y′).
We now plug in t = 1/4 and use that y = 1/2 and y′ = 1/√2 at t = 1/4:
y′′(1/4) = cos(π(1/4 + 1/2)) π(1 + 1/√2) = cos(3π/4) π(1 + 1/√2)
= −(1/√2) π(1 + 1/√2) = −π(1/√2 + 1/2).
As a result, the Taylor polynomial of degree 2 centred at t = 1/4 is
P2(t) = 1/2 + (1/√2)(t − 1/4) − (π/2)(1/√2 + 1/2)(t − 1/4)².
In the above figure, this Taylor polynomial is graphed as a dashed green curve,
while the actual solution is graphed as a solid blue line, both for t ≥ 1/4. We can
see that the approximation works well only if we are not far from the starting time
t = 1/4.
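The last observation can be checked numerically. In the sketch below, the derivatives
are computed exactly as in the solution, and the polynomial is compared with a reference
solution obtained by the classical Runge-Kutta method with a very small step. The
Runge-Kutta method is not covered in this text and is used here only as a stand-in for
the actual solution; all function names are our own.

```python
import math

t0, y0 = 0.25, 0.5

def f(t, y):
    return math.sin(math.pi * (t + y))   # right hand side of the equation

d1 = f(t0, y0)                                            # y'(1/4)
d2 = math.cos(math.pi * (t0 + y0)) * math.pi * (1 + d1)   # y''(1/4)

def p2(t):
    """Taylor polynomial of degree 2 centred at t0."""
    return y0 + d1 * (t - t0) + d2 / 2 * (t - t0) ** 2

def reference(t_end, n=1000):
    """Reference solution by the classical Runge-Kutta method (an assumption)."""
    t, y = t0, y0
    h = (t_end - t0) / n
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

for t in (0.3, 0.5, 1.0):
    print(t, p2(t), reference(t))
```

Near t = 1/4 the two columns agree closely, and the gap grows as t moves away from the
starting time.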
Comment. Here we only compute the second degree Taylor polynomials but, once
we computed the second derivative y′′ (x), we can differentiate it once again to find
y′′′ (x) and to find the third degree Taylor polynomial. We can repeat this process to
find any degree Taylor polynomials. In another direction, we could use the Taylor
polynomial of degree 2 only on a small interval of step h, which would give us a
more accurate version of Euler’s method.
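To illustrate the last remark, here is a rough sketch for the coffee equation
y′ = 0.02(20 − y) from Example 1, where y′′ = (0.02(20 − y))′ = −0.02y′. On each step of
size h we move along the Taylor polynomial of degree 2 instead of the tangent line. The
function names are our own, and this is only one possible way to organize such a method.

```python
def taylor2_final(y0, h, n_steps):
    """Steps along degree 2 Taylor polynomials for y' = 0.02(20 - y)."""
    y = y0
    for _ in range(n_steps):
        d1 = 0.02 * (20 - y)   # y' from the equation
        d2 = -0.02 * d1        # y'' = (0.02(20 - y))' = -0.02 y'
        y = y + d1 * h + d2 / 2 * h ** 2
    return y

def euler_final(y0, h, n_steps):
    """Steps along tangent lines (Euler's method) for the same equation."""
    y = y0
    for _ in range(n_steps):
        y = y + 0.02 * (20 - y) * h
    return y

print(euler_final(70, 20, 4))    # the value 26.48 from Example 1, up to rounding
print(taylor2_final(70, 20, 4))  # closer to the actual value 30.09
```

Even with the large step h = 20, the second approximation is much closer to the actual
temperature 30.09 at time t = 80.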
Exercise 3. If y(x) is the solution of the differential equation y′ = sin(x²y) with the
initial condition y(1) = π, find its Taylor polynomial of degree 2 centred at x = 1.
Exercise 4. Consider a differential equation
y′ = f (y)
where the plot of f (y) is given in the figure. Notice

that the horizontal axis corresponds to the variable y.
If y(5) = 3, is the solution y(t) concave up or down
at time t = 5?
Answer to Exercise 1. Step 1. Our starting point is (0, 0) and the slope at this point
is f (0, 0) = 0.02(20 − 0) = 0.4. This means that on the first interval from t = 0 to
t = 5, the tangent line is y = 0.4t. If we move along this line, at time t = 5 we end
up at y = 2.
Step 2. After the first step we are at the point (5, 2). The slope at this point is
f (5, 2) = 0.02(20 − 2) = 0.36. This means that on the second interval from t = 5
to t = 10, the tangent line is y = 2 + 0.36(t − 5). If we move along this line, at time
t = 10 we end up at y = 2 + 0.36(10 − 5) = 3.8.
Step 3. After the second step we are at the point (10, 3.8). The slope at this point
is f (10, 3.8) = 0.02(20 − 3.8) = 0.324. This means that on the third interval from
t = 10 to t = 15, the tangent line is y = 3.8 + 0.324(t − 10). If we move along this
line, at time t = 15 we end up at y = 3.8 + 0.324(15 − 10) = 5.42.
Step 4. After the third step we are at the point (15, 5.42). The slope at this point
is f (15, 5.42) = 0.02(20 − 5.42) = 0.2916. This means that on the fourth interval
from t = 15 to t = 20, the tangent line is y = 5.42 + 0.2916(t − 15). If we move
along this line, at time t = 20 we end up at y = 5.42 + 0.2916(20 − 15) = 6.878.
Since the actual solution at time t = 20 is 6.5935, Euler’s method overestimated
it. Using the following command in Wolfram Alpha
Euler’s method y′ = 0.02(20 − y), y(0) = 0, from 0 to 20, stepsize = 0.1
we get that Euler’s method with the step size h = 0.1 gives 6.5989, which is a much
better approximation.
Answer to Exercise 2. The slope field in the region above the equilibrium solution
y = 6 seems to be concave up everywhere, and the solution looks concave up. As a
result, Euler’s method will underestimate this solution in this region.
Answer to Exercise 3. We are given that y(1) = π and, using the differential
equation, we can compute y′(1) = sin(1² · π) = sin(π) = 0. To find y′′(1), we differentiate
the equation using the chain rule and keeping in mind that y is a function of x:
y′′(x) = (y′(x))′ = (sin(x²y))′ = cos(x²y) (x²y)′ = cos(x²y) (2xy + x²y′).
We now plug in x = 1 and use that y = π and y′ = 0 at x = 1:
y′′(1) = cos(1² · π) (2 · 1 · π + 1² · 0) = −2π.
As a result, the Taylor polynomial of degree 2 centred at x = 1 is
P2(x) = π + 0(x − 1) − (2π/2)(x − 1)² = π − π(x − 1)².
In the figure, this Taylor polynomial is graphed as a dashed green curve, while
the actual solution is graphed as a solid blue line, both for x ≥ 1. We can see that
this approximation works well if we are not far from x = 1.
Answer to Exercise 4. To decide if y(t) is concave up or down at time t = 5,

we need to compute the second derivative y′′ (t). Since y′ = f (y), differentiating
this equation gives y′′ = ( f (y))′ = f ′ (y)y′ = f ′ (y) f (y), where in the last step we
replaced y′ by f (y). At time t = 5, y′′ (5) = f ′ (y(5)) f (y(5)) = f ′ (3) f (3), because
we are given that y(5) = 3. Looking at the figure, we see that f (3) is negative, but
its slope f ′ (3) is positive, so y′′ (5) < 0 and y(t) is concave down at t = 5.
4.3 Separable differential equations

In this section we will learn how to solve separable equations of the form
y′ = f(x)g(y) or y′ = f(x)/g(y) or y′ = g(y)/f(x).
In other words, on the right hand side the variables x and y are separated into a
product or ratio of two functions, and one of the functions depends only on x and
another one depends only on y. Examples of such differential equations are
y′ = xy, y′ = x/(1 + y), y′ = e^(x+y) = e^x · e^y,
y′ = 0.5y, y′ = x(1 − y)/(y(1 + x)) = (x/(1 + x)) · ((1 − y)/y).
Let us illustrate on a specific example how we can solve such equations.
Example 1. Find the solution of a separable differential equation y′ = xy with the

initial condition y(0) = −2. What if the initial condition is not given?
Solution: First, let us show how to solve such an equation formally and after that we
will explain why these steps make sense even though they might look a bit strange
at first.
Step 1. We write the derivative as y′ = dy/dx and rewrite the equation as
dy/dx = xy.
Step 2. We move all the terms containing y variable to one side and all the terms
containing x variable to the other side:
dy/y = x dx.
Here we treat dy and dx formally as numbers, so we can move them around using
algebra rules.
Step 3. We now integrate both sides formally and find antiderivatives:
∫ dy/y = ∫ x dx =⇒ ln |y| = x²/2 + C.
We do not need to write +C on both sides, because those indeterminate constants
can be combined into one constant anyway. At this step, it is very important to
remember that ∫ dy/y = ln |y| + C and not ln(y) + C, i.e. we should not forget the
absolute value |y|. For example, the initial condition is y(0) = −2, so if we forgot
the absolute value and wrote ln(y) + C, ln(−2) would be undefined because we
cannot plug in negative values into the logarithm.
Step 4 (with initial condition). First, we plug in the initial condition y(0) = −2:
ln |−2| = 0²/2 + C.
This gives us that the constant C = ln 2, so the equation is
ln |y| = x²/2 + ln 2.
Then we solve for y if possible. In some cases, the equation may be too complicated
to solve for y, so we can leave it as is and think of y = y(x) as an implicit solution
of this equation. In this particular problem, we can solve for y:
ln |y| = x²/2 + ln 2 =⇒ |y| = e^(x²/2) e^(ln 2) = 2e^(x²/2).
Now, we recall that |y| = y if y is positive and |y| = −y if y is negative. In this
case, we start with the negative initial condition y(0) = −2 which tells us that y is
negative, so we can finally write
−y = e^(x²/2) e^(ln 2) = 2e^(x²/2) =⇒ y = −2e^(x²/2).
We found the solution y = −2e^(x²/2).
Step 4 (without initial condition). If we do not have the initial condition then
we leave the constant C in Step 3 indeterminate. Again, we can try to solve for y
if possible, or leave it as is and think of y = y(x) as an implicit solution. In this
particular problem, we can solve for y:
ln |y| = x²/2 + C =⇒ |y| = e^C e^(x²/2)
=⇒ y = (±e^C) e^(x²/2)
=⇒ y = Be^(x²/2).
In the last step we renamed ±e^C as another indeterminate constant B to keep the
expression simple. If we have the initial condition then we can determine B at this
step. For example, we know that y(0) = −2, so −2 = Be^0 = B, so B = −2 and
y = −2e^(x²/2) again.
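It is always a good idea to check the answer by plugging it back into the equation. One
way to do this numerically is sketched below in Python: we approximate the derivative of
y = −2e^(x²/2) by a difference quotient and compare it with the right hand side xy. The
helper name derivative and the step h are our own choices.

```python
import math

def y(x):
    return -2 * math.exp(x ** 2 / 2)   # the solution found in Example 1

def derivative(g, x, h=1e-6):
    """Central difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

print(y(0))  # -2.0, the initial condition
for x in (-1.0, 0.5, 2.0):
    # Both numbers should agree up to a small numerical error.
    print(derivative(y, x), x * y(x))
```

The two printed columns are the left and right hand sides of y′ = xy at a few sample
points.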
Comment. The first three steps above are easy to follow in practice, but they hide
an implicit integration by substitution. To make sense of these steps, we could have
rewritten the original equation y′ = xy as
y′(x) = xy(x) or y′(x)/y(x) = x
and, since both sides are functions of x, their indefinite integrals are the same:
∫ (y′(x)/y(x)) dx = ∫ x dx.
On the left hand side, if we make the substitution y = y(x), we can rewrite it as
∫ dy/y, so we arrive at Step 3 above in a way that makes more sense. Of course, in
practice we can skip this substitution step and follow the above formal steps.
Exercise 1. A cup of coffee is sitting at room temperature of 20◦ C. The coffee

temperature satisfies the equation y′ = 0.1(20 − y) and its initial temperature is
y(0) = 90◦ C. What is the temperature y(t) at time t?
Example 2. A common model of tumour growth is described by a differential equation
y′ = bce^(−ct) y for some positive constants b > 0 and c > 0. Find the solutions of this
equation. Write down the solution corresponding to b = c = 1 and the initial condition
y(0) = 1/e.
Solution: This is a separable equation, so we go through the standard steps:
dy/dt = bce^(−ct) y =⇒ dy/y = bce^(−ct) dt
=⇒ ∫ dy/y = ∫ bce^(−ct) dt
=⇒ ln |y| = (bc/(−c)) e^(−ct) + C = −be^(−ct) + C.
Exponentiating both sides, we get |y| = e^C e^(−be^(−ct)). Since the size of the tumour y
is positive, |y| = y and so y = e^C e^(−be^(−ct)). We can also rename e^C as a and write the
general solution as y = ae^(−be^(−ct)). This family of functions is called the Gompertz
growth functions.
Constant a could be determined from the initial condition. If b = c = 1 then the
general solution is y = ae^(−e^(−t)). Plugging in the initial condition y(0) = 1/e we get
1/e = ae^(−e^(−0)) = ae^(−1) = a/e, so a = 1 and the solution is y = e^(−e^(−t)).
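As a sanity check, the following Python sketch plugs the Gompertz function back into the
equation y′ = bce^(−ct) y numerically, using a difference quotient for the derivative. The
function names and the sample times are our own choices.

```python
import math

def gompertz(t, a=1.0, b=1.0, c=1.0):
    return a * math.exp(-b * math.exp(-c * t))   # y = a e^(-b e^(-ct))

def derivative(g, t, h=1e-6):
    """Central difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

# Initial condition y(0) = 1/e for a = b = c = 1.
print(gompertz(0), 1 / math.e)

# Left and right hand sides of y' = b c e^(-ct) y with b = c = 1.
for t in (0.0, 1.0, 3.0):
    print(derivative(gompertz, t), math.exp(-t) * gompertz(t))
```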
Exercise 2. When a crow sitting on a tree drops a piece of cheese, the angle θ of the
fox’s line of sight of the cheese is changing according to the equation
dθ/dt = −(gt/d) cos²(θ)
where g = −9.8 m/s² and d is the distance from the fox to where the cheese will land.
Find θ(t) if θ(0) = π/4. Hint: (tan(θ))′ = sec²(θ).
Exercise 3. A rumour¹ is spreading among 100 people at a party starting from one
person at time t = 0. The rumour spreads at a rate proportional to the number y of
people who already know the rumour and proportional to the number 100 − y of people
who do not yet know the rumour, namely:
dy/dt = κy(100 − y)
where κ > 0 is some constant, and where time t is measured in minutes.
(a) Find y(t). Hint: use that 1/(y(100 − y)) = (1/100)(1/y + 1/(100 − y)).
(b) If 3 people know the rumour after 1 minute, find κ.
(c) At what time will 95 people know the rumour?
Exercise 4. Water is flowing through a hole at the bottom of a cylindrical container,
as shown in the figure. According to Torricelli’s law, when the water level is at
height h, the speed v of flow of water through the hole is given by the formula
v = √(2gh).
Suppose that the hole has area a and the area of the horizontal cross-section of the
container is A. Let us derive and solve the differential equation for the water level
h(t) at time t.
1 Image from freepik.com
(a) If between time t and t + ∆t the water level changed by ∆h, using that the
cross-section area of the container is A, what is the volume ∆V of water that escaped
during this time? Warning: water level is decreasing so the change ∆h is negative!
(b) By Torricelli’s law, at time t the water was flowing out at the speed
v = √(2gh(t)). Using that the water hole has area a and the speed does not change much
over a very short period ∆t, what is the approximate volume ∆V of water jet that
escaped during this time?
(c) Set the volumes ∆V in (a) and (b) equal to each other and derive a differential
equation for h(t) by taking the limit ∆t → 0.
(d) Solve this equation with the initial condition h(0) = H. How long will it take
for the container to empty?
Answer to Exercise 1. We start with the usual steps:
dy/dt = 0.1(20 − y) =⇒ dy/(20 − y) = 0.1 dt =⇒
∫ dy/(20 − y) = ∫ 0.1 dt =⇒ − ln |20 − y| = 0.1t + C.
The minus sign in front of − ln |20 − y| is because of our usual rule
∫ f(m + κx) dx = (1/κ)F(m + κx) + C if F is an antiderivative of f. We plug in the
initial value y(0) = 90 to find C: − ln |20 − 90| = 0.1(0) + C, so C = − ln 70 and the
equation becomes − ln |20 − y| = 0.1t − ln 70 or ln |20 − y| = −0.1t + ln 70.
Exponentiating both sides: |20 − y| = 70e^(−0.1t). Finally, we need to decide if
|20 − y| = 20 − y or |20 − y| = −(20 − y) = y − 20. Since initial temperature is 90, the
correct choice is |20 − y| = y − 20, so y − 20 = 70e^(−0.1t) or y = 20 + 70e^(−0.1t).
Answer to Exercise 2. We start with the usual steps (and use 1/cos²(θ) = sec²(θ)):
dθ/dt = −(gt/d) cos²(θ) =⇒ dθ/cos²(θ) = sec²(θ) dθ = −(gt/d) dt =⇒
∫ sec²(θ) dθ = −∫ (gt/d) dt =⇒ tan(θ) = −gt²/(2d) + C.
Plugging in the initial condition θ(0) = π/4, we get tan(π/4) = −g · 0²/(2d) + C, so 1 = C
and tan(θ) = 1 − gt²/(2d). We can solve this for θ by taking the inverse tangent:
θ = tan⁻¹(1 − gt²/(2d)).
Answer to Exercise 3. (a) We start with the usual steps:
dy/dt = κy(100 − y) =⇒ dy/(y(100 − y)) = κ dt
=⇒ ∫ dy/(y(100 − y)) = ∫ κ dt.
Using the hint 1/(y(100 − y)) = (1/100)(1/y + 1/(100 − y)), we can easily find the
integral on the left hand side, so
(1/100)(ln |y| − ln |100 − y|) = κt + C or ln |y| − ln |100 − y| = 100κt + 100C.
Plugging in the initial condition y(0) = 1, we get that 100C = − ln 99, so the
equation can be rewritten as
ln |y| − ln |100 − y| = ln(|y|/|100 − y|) = 100κt − ln 99.
Exponentiating both sides, we get
|y|/|100 − y| = (1/99)e^(100κt).
Because the number of people who know the rumour is always between 0 and 100,
both y and 100 − y are positive, so we can forget about the absolute values and
write the equation as y/(100 − y) = (1/99)e^(100κt). It remains to solve it for y:
y = (1/99)e^(100κt)(100 − y) = (100/99)e^(100κt) − (1/99)e^(100κt) y
=⇒ y + (1/99)e^(100κt) y = (100/99)e^(100κt)
=⇒ y(1 + (1/99)e^(100κt)) = (100/99)e^(100κt)
=⇒ y = 100e^(100κt)/(99 + e^(100κt)).
This solution is an example of the so called logistic function.
(b) If y(1) = 3 then plugging this into our solution above gives
3 = 100e^(100κ)/(99 + e^(100κ)). Solving this for κ:
3(99 + e^(100κ)) = 100e^(100κ) =⇒ 297 + 3e^(100κ) = 100e^(100κ)
=⇒ e^(100κ) = 297/97
=⇒ κ = (1/100) ln(297/97) = 0.0111 . . . .
(c) 95 people will know the rumour when 100e^(100κt)/(99 + e^(100κt)) = 95, so
95(99 + e^(100κt)) = 100e^(100κt) =⇒ 9405 + 95e^(100κt) = 100e^(100κt)
=⇒ e^(100κt) = 9405/5 = 1881
=⇒ t = ln(1881)/(100κ) ≈ 6.7 min.
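The numbers in parts (b) and (c) are easy to double check with a few lines of Python.
This is only a verification sketch; the names kappa, y and t95 are our own.

```python
import math

kappa = math.log(297 / 97) / 100          # part (b): from y(1) = 3

def y(t):
    """Logistic solution y = 100 e^(100 kappa t) / (99 + e^(100 kappa t))."""
    e = math.exp(100 * kappa * t)
    return 100 * e / (99 + e)

t95 = math.log(9405 / 5) / (100 * kappa)  # part (c): from e^(100 kappa t) = 9405/5

print(kappa)           # about 0.0112
print(y(0), y(1))      # 1 person at t = 0 and 3 people at t = 1, up to rounding
print(t95, y(t95))     # about 6.7 minutes, when 95 people know the rumour
```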
Answer to Exercise 4. (a) Since ∆h is negative, −∆h is the height by which the
water level decreased. Since the cross section area is A, the volume of water that
escaped is Area × Height = A × (−∆h) = −A∆h.
(b) If we imagine the water jet flowing from the hole as a small cylinder, its
cross-section is the area of the hole a. During a short time period ∆t, the speed is
almost constant, v(t), so the jet moved by
Height = Speed × Time ≈ v(t)∆t = √(2gh(t)) ∆t,
where in the last step we used Torricelli’s law. Multiplying this by the area a gives
us the volume of the water jet, ∆V ≈ a√(2gh(t)) ∆t.
(c) The volumes we found in (a) and (b) should be equal so −A∆h ≈ a√(2gh(t)) ∆t.
Dividing both sides by ∆t we get −A ∆h/∆t ≈ a√(2gh(t)). When we take the limit
∆t → 0, the approximation will get better and better and ∆h/∆t will become h′(t), so
we finally obtain the differential equation
−Ah′(t) = a√(2gh(t)) = a√(2g) √(h(t)).
(d) To solve this equation, we follow the usual steps:
dh/dt = −(a√(2g)/A) √h =⇒ dh/√h = −(a√(2g)/A) dt =⇒
∫ dh/√h = −∫ (a√(2g)/A) dt =⇒ 2√h = −(a√(2g)/A) t + C.
Using the initial condition h(0) = H, we find the constant C = 2√H, so
2√h = −(a√(2g)/A) t + 2√H =⇒ h = (−(a√(2g)/(2A)) t + √H)².
The container will become empty when h = 0 so when −(a√(2g)/(2A)) t + √H = 0, or
t = (2A/(a√(2g))) √H.
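To get a feeling for these formulas, here is a short Python sketch with made-up numbers.
The values of H, a and A below are hypothetical, and we assume g = 9.8 m/s² with lengths
measured in metres and time in seconds.

```python
import math

def water_level(t, H, a, A, g=9.8):
    """h(t) = (sqrt(H) - a sqrt(2g) t / (2A))^2 until the container is empty."""
    root = math.sqrt(H) - a * math.sqrt(2 * g) * t / (2 * A)
    return max(root, 0.0) ** 2

def emptying_time(H, a, A, g=9.8):
    """Time t = 2A sqrt(H) / (a sqrt(2g)) at which h(t) becomes zero."""
    return 2 * A * math.sqrt(H) / (a * math.sqrt(2 * g))

# Hypothetical container: water level 1 m, hole of 1 cm^2, cross-section 0.1 m^2.
H, a, A = 1.0, 1e-4, 0.1
T = emptying_time(H, a, A)
print(T)                          # emptying time in seconds
print(water_level(0, H, a, A))    # 1.0, the initial level H
print(water_level(T, H, a, A))    # essentially zero
```

Notice that the emptying time is proportional to √H, so a container filled four times
higher takes only twice as long to empty.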
4.4 Lotka-Volterra predator-prey model

In this section we will investigate the famous Lotka-Volterra predator-prey model.
We imagine a simplified situation when:
• one predator species (for example, foxes) shares the same territory with one prey
species (for example, rabbits);
• without rabbits, foxes would not have enough food (berries, other prey, etc.) and
would generally die out;
• rabbits have unlimited resources (food) and, in the absence of foxes, would
multiply exponentially.
Let us denote by x(t) the population of rabbits and by y(t) the population of foxes
at time t. We should remember that the units can be in hundreds, or thousands, etc.,
so x(t) = 0.5 does not mean half a rabbit.
Without encounters between foxes and rabbits (in other words, if foxes did not
eat rabbits), these populations would satisfy the equations:
x′ (t) = ax, y′ (t) = −cy
for some constants a > 0 and c > 0. The population of rabbits would grow at a rate
proportional to the number of rabbits, and the population of foxes would die out at
a rate proportional to the number of foxes. We know that these correspond to the
exponential growth and exponential decay.
Since foxes and rabbits share the same territory, it is reasonable to assume that
the rate of encounters is proportional to the product of populations xy.
Indeed, imagine that a rabbit encounters 1 fox per month. If the population of
foxes doubles then it is reasonable that the number of encounters per month would
also double to 2. So the number of encounters of one rabbit with foxes per month
is proportional to y, and the total number of encounters is proportional to xy. The
Lotka-Volterra model modifies the above equations to account for these encounters:
x′ (t) = ax − bxy, y′ (t) = −cy + dxy
for some positive constants a, b, c and d. The second term −bxy in the rate x′ (t) of
the rabbit population change is there because some proportion of the encounters
will lead to rabbits being killed and eaten. The second term +dxy in the rate y′ (t)
of the fox population change is there because those encounters yield food for foxes
so they allow them to survive and procreate. Simply put, encounters are bad for
rabbits and good for foxes.
In the rest of this section we will analyze this model and, for simplicity, will
take the coefficients to be all equal to 1, a = b = c = d = 1, so
x′ (t) = x − xy, y′ (t) = −y + xy.
We will break the analysis of the model into two exercises with relatively simple
steps. In the exercises, we always assume that the derivatives x′ (t) and y′ (t) satisfy
the above Lotka-Volterra equations. In the first exercise, we will draw the pair of
populations x(t) and y(t) as a point (x(t), y(t)) on the xy-plane, and will try to
imagine what kind of trajectory this point (x(t), y(t)) will follow over time. Before
we begin, let us give a simple example.
Example 1. Suppose that an ant is walking on a flat surface and its coordinates
at time t are given by
x(t) = cos(t) and y(t) = sin(t).
(a) Describe the trajectory of the ant on the plane.
(b) Notice that
x′ (t) = −y and y′ (t) = x.
If we only knew these equations, what could we say about the trajectory of the ant?
Solution: (a) We know that cos(θ ) and sin(θ ) are the coordinates of a point on a
unit circle corresponding to the angle θ , so if we think of time t as an angle, we
know that the ant is moving along this unit circle. Another way to see this is to
notice that $x^2(t) + y^2(t) = \cos^2(t) + \sin^2(t) = 1$, so the position (x(t), y(t)) of the
ant at time t is on the unit circle $x^2 + y^2 = 1$.
(b) Now imagine that we only know that x′ (t) = −y and y′ (t) = x. In the first
quadrant where x > 0 and y > 0, the signs of the derivatives are
x′ (t) = −y < 0 and y′ (t) = x > 0
so the coordinate x(t) is decreasing and y(t) is increasing with time. This means
that the ant is moving in the north-west direction, as indicated by the arrow in the
first quadrant. Similarly, we can check that in the second quadrant it is moving in
the south-west direction, in the third quadrant in the south-east direction, and in
the fourth quadrant in the north-east direction. The equations do not immediately
tell us that the ant is walking on a circle, but we get a general idea that the ant is
moving counterclockwise around the origin.
We will now apply a similar analysis to the fox and rabbit populations using the
Lotka-Volterra equations. After that we will see how we could figure out that the
ant is moving along a circle using only the equations x′ (t) = −y and y′ (t) = x.
Exercise 1. (a) Find all the points (x, y) on the xy-plane such that both
x − xy = 0 and − y + xy = 0.
What happens if the initial populations (x(0), y(0)) at time t = 0 are at one of
those points?
(b) In each of the four regions in the figure (separated by the lines x = 1
and y = 1), determine the signs of the derivatives x′ (t) and y′ (t) using the
Lotka-Volterra equations. As time t increases, in which direction on the plane
will the pair (x(t), y(t)) be moving: north-east, north-west, south-east, south-west?
Draw an arrow indicating the direction in each region. What do these arrows tell us
about how the populations are changing over time?
Next, we will consider a different question. Both populations x(t) and y(t) de-
pend on time t, but
how does the population of foxes y depend on the population of rabbits x?
In other words, can we eliminate time t and find y = y(x) as a function of x? Before
we address this question, let us make a useful observation.
Comment. If we plug in x = x(t) and y = y(t) into y = y(x) we get that
\[ y(t) = y\bigl(x(t)\bigr). \]
Taking the derivative of both sides, by the chain rule, $y'(t) = y'(x)\,x'(t)$, so
\[ y'(x) = \frac{y'(t)}{x'(t)}. \]
This formula makes sense, because $y'(t) \approx \frac{\Delta y}{\Delta t}$ and $x'(t) \approx \frac{\Delta x}{\Delta t}$, so
\[ \frac{y'(t)}{x'(t)} \approx \frac{\Delta y / \Delta t}{\Delta x / \Delta t} = \frac{\Delta y}{\Delta x} \approx y'(x). \]
If we know that the coordinate x(t) moves at a rate x′(t) and the coordinate y(t) moves
at a rate y′(t), then the pair moves along the curve with the slope $y'(x) = \frac{y'(t)}{x'(t)}$. This
is how the slope field was graphed in the previous exercise, using the formula for
y′(x) that will be computed in the next exercise.
Example 2. In the setting of the ant example above, if we know that
x′ (t) = −y and y′ (t) = x,
find an equation for y′ (x) and solve it to show that the ant is moving along a circle.
Solution: Using the chain rule formula in the comment above,
\[ y'(x) = \frac{y'(t)}{x'(t)} = -\frac{x}{y}, \]
where in the last step we used the given equations x′ (t) = −y and y′ (t) = x. This is
a separable equation so we can solve it following standard steps:
\[ \frac{dy}{dx} = -\frac{x}{y} \implies y\,dy = -x\,dx \implies \int y\,dy = -\int x\,dx \implies \frac{y^2}{2} = -\frac{x^2}{2} + C. \]
If we rewrite this as $x^2 + y^2 = 2C$, we can see that this is an equation of the
circle centred at the origin. The constant C can be found if we are given the initial
condition (x(0), y(0)) or any point on the trajectory.
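The conclusion of Example 2 can also be checked numerically. The following is only a sketch and not part of the original example: it integrates x′ = −y, y′ = x with a hand-written classical Runge-Kutta step and records x² + y² along the way. The helper names (rk4_step, ant_velocity, simulate_ant), the starting point (1, 0) and the step size are all assumptions made for this illustration.

```python
import math

def rk4_step(f, state, dt):
    """Advance state' = f(state) by one classical Runge-Kutta (RK4) step."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def ant_velocity(state):
    """Right-hand side of the ant equations x' = -y, y' = x."""
    x, y = state
    return [-y, x]

def simulate_ant(x0=1.0, y0=0.0, t_end=2 * math.pi, dt=0.001):
    """Return the final position and the min/max of x^2 + y^2 along the path."""
    state = [x0, y0]
    squared_radii = [x0 ** 2 + y0 ** 2]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(ant_velocity, state, dt)
        squared_radii.append(state[0] ** 2 + state[1] ** 2)
    return state, min(squared_radii), max(squared_radii)

if __name__ == "__main__":
    final, r2_min, r2_max = simulate_ant()
    print("final position:", final)
    print("x^2 + y^2 stays between", r2_min, "and", r2_max)
```

If x² + y² stays essentially constant along the computed path, this agrees with the circle found by solving the separable equation; the numerical check does not replace that derivation.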
Exercise 2. (a) Recall the Lotka-Volterra equations describing the populations of

foxes and rabbits:
x′ (t) = x − xy, y′ (t) = −y + xy.
Find a differential equation for y′ (x) of the form y′ (x) = f (x, y).
(b) Solve the equation you found in (a). The answer can be written as an implicit
function.
Exercise 3. Match the following systems of equations

(a) x′(t) = −x + 0.5xy,     y′(t) = 0.5y − 0.01xy
(b) x′(t) = 0.1x − 0.5xy,   y′(t) = 0.2y − 0.1xy
(c) x′(t) = 0.05x,          y′(t) = −0.5y + 2xy
(d) x′(t) = −x + 0.5xy,     y′(t) = −0.5y + 0.01xy

with the following pairs of species. Which species is x and which one is y?
1. Humans and gut bacteria; both need each other to survive.
2. Owls and trees; owls need trees for nesting and food.
3. Polar bears and seals.
4. Ducks and geese; in competition for food.
Answer to Exercise 1. (a) First, x − xy = x(1 − y) = 0 when either x = 0 or y = 1.

Similarly, −y + xy = y(−1 + x) = 0 when either x = 1 or y = 0. This means that
both will be equal to zero when x = 0 and y = 0, or x = 1 and y = 1. In other words,
at the points (0, 0) and (1, 1). At these points
x′ (t) = x − xy = 0 and y′ (t) = −y + xy = 0,
so these are equilibrium solutions of this system of equations, and populations
starting out at one of these points will always stay there, i.e. the populations will be
constant. Of course, the case (0, 0) makes sense, but the case of (1, 1) is a feature
of the Lotka-Volterra model. We remind the reader again that 1 here could be in arbitrary
units, so it does not mean one fox and one rabbit.
(b) The directions of how the pair of populations (x(t), y(t)) changes over time
are described by the arrows in the left figure above. Indeed, x′ (t) = x−xy = x(1−y)
is positive when y < 1 and negative when y > 1, so x(t) is increasing (moving right,
or east) below the line y = 1, and decreasing above this line (moving left, or west).
This makes sense, because this tells us that the population of rabbits increases
when there are not too many foxes around, and it decreases when the population of
foxes exceeds a certain threshold. Similarly, y′ (t) = −y+xy = y(−1+x) is positive
when x > 1 and negative when x < 1, so y(t) is increasing (moving up, or north)
to the right of the line x = 1 and decreasing to the left of this line (moving down,
or south). Over time, the point (x(t), y(t)) will be moving around the equilibrium
point (1, 1). Again, this makes sense, because this tells us that the population of
foxes decreases when there are not enough rabbits around, and it increases when
the population of rabbits exceeds a certain threshold.
In the figure above on the right, we show the slope field y′ (x) and one trajectory
starting at a point (0.5, 0.5). We will discuss the slope field in the next example,
and the trajectory can be graphed in Geogebra using the command:
SolveODE(−y + xy, x − xy, 0.5, 0.5, 10, 0.01).
Here −y + xy and x − xy are the formulas for y′ (t) and x′ (t), in that order, (0.5, 0.5) is the initial
position at time t = 0, 10 is the time t until the equation is solved numerically (you
can try other values of t), and 0.01 is the step size. We will find the formula for this
trajectory in the next exercise.
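For readers without Geogebra, a comparable trajectory can be sketched in a few lines of Python. This is an illustration under stated assumptions: it uses a hand-written classical Runge-Kutta step (it is not assumed that Geogebra uses the same numerical method), and the names lotka_volterra, rk4_step, trajectory and direction are hypothetical helpers introduced here. The function direction reports the compass direction from the signs of x′ and y′, as in part (b).

```python
def lotka_volterra(state):
    """Right-hand side of x' = x - xy, y' = -y + xy (all coefficients equal to 1)."""
    x, y = state
    return [x - x * y, -y + x * y]

def rk4_step(f, state, dt):
    """Advance state' = f(state) by one classical Runge-Kutta (RK4) step."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def trajectory(x0=0.5, y0=0.5, t_end=10.0, dt=0.01):
    """List of points (x(t), y(t)) from t = 0 to t = t_end."""
    state = [x0, y0]
    points = [(x0, y0)]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(lotka_volterra, state, dt)
        points.append((state[0], state[1]))
    return points

def direction(x, y):
    """Compass direction of motion at (x, y), away from the lines x = 1, y = 1."""
    dx, dy = lotka_volterra([x, y])
    north_south = "north" if dy > 0 else "south"
    east_west = "east" if dx > 0 else "west"
    return north_south + "-" + east_west

if __name__ == "__main__":
    points = trajectory()
    print("start:", points[0], "end:", points[-1])
    for x, y in [(0.5, 0.5), (2, 0.5), (2, 2), (0.5, 2)]:
        print((x, y), "->", direction(x, y))
```

The list of points can be passed to any plotting tool; the four sample points printed at the end are one per region of the figure.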
Answer to Exercise 2. Using the chain rule as in the previous example,
\[ y'(x) = \frac{y'(t)}{x'(t)} = \frac{-y + xy}{x - xy} = \frac{y(-1 + x)}{x(1 - y)}, \]
where in the middle step we applied the Lotka-Volterra equations. This gives us a
separable differential equation
\[ y'(x) = \frac{x-1}{x} \cdot \frac{y}{1-y}. \]
We can solve it using the standard steps:
\[ \frac{dy}{dx} = \frac{x-1}{x} \cdot \frac{y}{1-y} \implies \frac{1-y}{y}\,dy = \frac{x-1}{x}\,dx \implies \int \Bigl(\frac{1}{y} - 1\Bigr)\,dy = \int \Bigl(1 - \frac{1}{x}\Bigr)\,dx \implies \ln|y| - y = x - \ln|x| + C. \]
Since the populations are positive, we can drop the absolute values and
write the last equation as ln(y) − y = x − ln(x) + C. We cannot solve this for y explicitly, so
we leave it as an implicit solution. The constant C can be found if we are given the
initial condition (x(0), y(0)) or any point on the trajectory.
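One way to sanity-check the implicit solution is to differentiate it implicitly and compare the resulting slope with the slope y′(t)/x′(t) coming from the system. The sketch below does this at a few sample points; the function names and the sample points are assumptions chosen for illustration, and the points avoid x = 0 and y = 1, where the slope formula is undefined.

```python
import math

def slope_from_system(x, y):
    """y'(x) = y'(t) / x'(t) for x' = x - xy, y' = -y + xy."""
    return (-y + x * y) / (x - x * y)

def slope_from_implicit(x, y):
    """Implicit differentiation of ln(y) - y = x - ln(x) + C:
    (1/y - 1) y' = 1 - 1/x, so y' = (1 - 1/x) / (1/y - 1)."""
    return (1 - 1 / x) / (1 / y - 1)

def constant_C(x0, y0):
    """C determined by a point (x0, y0) on the trajectory."""
    return math.log(y0) - y0 - x0 + math.log(x0)

if __name__ == "__main__":
    for x, y in [(0.5, 0.5), (2.0, 0.5), (1.5, 2.0), (0.3, 1.7)]:
        print((x, y), slope_from_system(x, y), slope_from_implicit(x, y))
    print("C for the trajectory through (0.5, 0.5):", constant_C(0.5, 0.5))
```

The two slope columns should agree up to rounding, which is consistent with the separable equation having been integrated correctly.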
Answer to Exercise 3. 1(d) – cannot tell which one is which. 2(c) – x is trees and y
is owls. 3(a) – x is polar bears and y is seals. 4(b) – cannot tell which one is which.
4.5 The SIR model

In this section we will take a look at a simple deterministic model of the spread of
disease in a population, for example a seasonal viral infection such as flu. At any
time t, the population will be divided into three groups.
• Let S(t) be the proportion of susceptible individuals at time t, meaning that
they have not yet been infected and do not have immunity;
• Let I(t) be the proportion of currently infected individuals at time t;
• Let R(t) be the proportion of recovered individuals at time t. We will assume
that recovered individuals have full immunity and cannot be reinfected.
Proportion here means the proportion of the entire population, so it is a number
between 0 and 1. The spread of disease is driven by encounters between infected
and susceptible individuals and, as with foxes and rabbits, we will assume that the
rate of encounters between infected and susceptible individuals is proportional to S(t)I(t).
Then we can model the rates of change of the three groups by the equations:
S′ (t) = −aSI, I ′ (t) = aSI − bI, R′ (t) = bI.
• Constant a > 0 is the rate of infection (more precisely, the rate of potential
transmissions by an infected individual), defined as the average number of con-
tacts of one person per unit of time, multiplied by the probability of disease
transmission in a contact between a susceptible and an infectious subject.
• Constant b > 0 is the rate of recovery, which is defined as $b = \frac{1}{D}$, where D is the
average time period an individual is infectious.
The term aSI represents new infections (per unit of time) resulting from interactions
between susceptible and infectious individuals, so it is added to the infected group
and subtracted from the susceptible group. The term bI represents newly recovered
individuals (per unit of time), so it is subtracted from the infected and added to the
recovered group. Since the term R(t) does not appear in the first two equations, we
can for now forget about the last equation R′ (t) = bI and first focus on analyzing
and solving the system of the first two equations:
S′ (t) = −aSI, I ′ (t) = aSI − bI.
In the next problem there will appear a constant
\[ R_0 = \frac{a}{b} = aD \]
which is called the basic reproduction number. Since it is the rate a of potential
transmissions by an infectious individual per unit of time multiplied by the number
of days D an individual is typically infected, R0 represents the average number
of new cases generated by one infected individual, assuming that everyone else is
susceptible. When the proportion of susceptible in the population is S, the actual
average number of new cases generated by one infected individual is R0 S.
Notice that S′ (t) = −aSI is negative, so the number of susceptible individuals
is always decreasing (obviously). In the first exercise we will analyze when the
infectious subpopulation is increasing and decreasing.
Exercise 1. In the first quadrant S > 0, I > 0 on the SI-plane (with S on the x-
axis and I on the y-axis) find the regions where I ′ (t) is positive, negative, or zero.
Express the regions in terms of the basic reproduction number R0 . In each region,
sketch in which direction the pair (S(t), I(t)) is moving as time t increases. Explain
the behaviour in terms of the basic reproduction number R0 .
Exercise 2. Suppose that an individual is typically infected for D = 4 days, and the
basic reproduction number is R0 = 2.
(a) Write down the SIR model corresponding to these parameters.
(b) Find a differential equation for I ′ (S).
(c) Solve this equation with the initial conditions S(0) = 0.95 and I(0) = 0.05.
Spread of disease in time. If we want to see how the disease spreads over
time, we could try to find the function S = S(t), which tells us how the susceptible
population decreased as a function of time. Then 1−S(t) will give us the proportion
of the population infected up to time t. For example, in the setting of the previous
exercise, if we take the solution I = 0.5 ln S − S + 1.025 we found in part (c) and
plug it into the equation S′ (t) = −0.5SI found in part (a), we get
\[ \frac{dS}{dt} = -0.5\,S\,(0.5 \ln S - S + 1.025). \]
This is a separable equation, but we cannot solve it explicitly because we cannot integrate
\[ \int \frac{dS}{S\,(0.5 \ln S - S + 1.025)} \]
explicitly. However, we can solve it numerically, for example, by using the following command in
Geogebra:
SolveODE(−0.5y(0.5 ln(y) − y + 1.025), 0, 0.95, 30, 0.1).
This produced the graph in the above figure. Here the initial condition is S(0) =
0.95, and we solve the equation up to time t = 30.
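The same numerical solution can be sketched in Python. This is an illustration rather than a reproduction of what Geogebra does internally: it applies a hand-written classical Runge-Kutta step to dS/dt = −0.5S(0.5 ln S − S + 1.025) with S(0) = 0.95, end time 30 and step 0.1, as in the command above. The function names dS_dt and solve_S are assumptions introduced for this sketch.

```python
import math

def dS_dt(S):
    """Right-hand side of dS/dt = -0.5 * S * (0.5 ln S - S + 1.025)."""
    return -0.5 * S * (0.5 * math.log(S) - S + 1.025)

def solve_S(S0=0.95, t_end=30.0, dt=0.1):
    """Return a list of (t, S(t)) pairs computed with RK4 steps of size dt."""
    S = S0
    values = [(0.0, S0)]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        k1 = dS_dt(S)
        k2 = dS_dt(S + 0.5 * dt * k1)
        k3 = dS_dt(S + 0.5 * dt * k2)
        k4 = dS_dt(S + dt * k3)
        S += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        values.append((i * dt, S))
    return values

if __name__ == "__main__":
    values = solve_S()
    for t, S in values[::50]:
        # 1 - S is the proportion infected up to time t
        print(f"t = {t:5.1f}   S = {S:.4f}   1 - S = {1 - S:.4f}")
```

Printing every 50th value gives the susceptible proportion every 5 time units; the full list can be plotted to compare with the figure.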
Answer to Exercise 1. The derivative I′(t) is zero,
\[ I'(t) = aSI - bI = (aS - b)I = 0 \]
when aS − b = 0 (assuming that I > 0), or when $S = \frac{b}{a} = \frac{1}{R_0}$. The derivative is
positive when $S > \frac{b}{a} = \frac{1}{R_0}$ and it is negative when $S < \frac{b}{a} = \frac{1}{R_0}$, so the regions where
the proportion I(t) of infected individuals is increasing or decreasing are separated
by the vertical line $S = \frac{1}{R_0}$. Since S(t) is always decreasing as time t increases, the
pair (S(t), I(t)) is moving in the north-west direction when $S > \frac{1}{R_0}$, and it is moving
in the south-west direction when $S < \frac{1}{R_0}$. This is depicted in the left figure below.
Recall that $R_0 S$ represents the average number of new cases generated by one
infected individual when the proportion of susceptible individuals in the population is S. If
we rewrite $S > \frac{1}{R_0}$ as $R_0 S > 1$, and $S < \frac{1}{R_0}$ as $R_0 S < 1$, we see that the proportion
of infected individuals I(t) is increasing when this average number of new cases
is bigger than 1, and is decreasing when it is smaller than 1. This makes perfect
sense.
In the figure below on the right, we show the slope field I ′ (S) and one trajectory
for the pair (S(t), I(t)). We will discuss the slope field in the next exercise, and the
trajectory can be graphed in Geogebra using the command:
SolveODE(axy − by, −axy, S(0), I(0),t, stepsize).
We have to write x instead of S and y instead of I in axy − by, −axy, for Geogebra
to understand what we want.
Answer to Exercise 2. (a) $b = \frac{1}{D} = 0.25$ and $a = \frac{R_0}{D} = 0.5$, so the equations are
S′ (t) = −0.5SI, I′ (t) = 0.5SI − 0.25I.

(b) Using the chain rule as in the last section,
\[ I'(S) = \frac{I'(t)}{S'(t)} = \frac{0.5SI - 0.25I}{-0.5SI} = -1 + \frac{0.5}{S}, \]
where in the middle step we applied the SIR equations. This is the slope field that
was graphed in the right figure above. The right hand side depends only on the
independent variable S, so we can simply integrate this to find I(S):
\[ I(S) = \int \Bigl(\frac{0.5}{S} - 1\Bigr)\,dS = 0.5 \ln S - S + C. \]
The constant C can be found from the initial condition (S, I) = (0.95, 0.05), so 0.05 =
0.5 ln 0.95 − 0.95 + C and C = 1.025 . . . . We get I(S) = 0.5 ln S − S + 1.025.
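The arithmetic for the constant C can be checked with a short computation. The sketch below assumes the starting point S = 0.95, I = 0.05, which is the value that reproduces C ≈ 1.025; the function names are hypothetical. It also evaluates I at S = 1/R0 = 0.5, where the answer to Exercise 1 says the infected proportion stops increasing and starts decreasing.

```python
import math

def constant_C(S0, I0):
    """C in I(S) = 0.5 ln S - S + C, fixed by the point (S0, I0)."""
    return I0 - 0.5 * math.log(S0) + S0

def infected(S, C):
    """I(S) = 0.5 ln S - S + C for the model with a = 0.5, b = 0.25."""
    return 0.5 * math.log(S) - S + C

if __name__ == "__main__":
    C = constant_C(0.95, 0.05)
    print("C =", C)
    # S = 1/R0 = 0.5 separates the increasing and decreasing regimes of I
    print("I at S = 0.5:", infected(0.5, C))
```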
4.6 Approximating solutions by Taylor polynomials

We have already seen how to approximate solutions of differential equations by
Taylor polynomials of degree 1 (Euler’s method), and Taylor polynomials of degree
2, by differentiating the equation. We also mentioned that the same procedure can
be iterated to find higher order derivatives of solutions y(x) and, as a result, higher
order Taylor polynomial approximations. In this section, we will try a different
procedure to find higher order Taylor polynomial approximations, called the method of
undetermined coefficients. Let us illustrate this method with an example.
Example 1. Consider the initial value problem:
y′ (x) + 4y(x) = 8, y(0) = −1.
Find the Taylor polynomial of degree 4 for y(x) centred at x = 0 using the method
of undetermined coefficients.
Solution: Let us write the unknown solution as
\[ y(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \dots \]
The terms $c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4$ on the right hand side represent the Taylor
polynomial we are looking for, so the constants $c_0, c_1, c_2, c_3$ and $c_4$ are unknown
and are called undetermined coefficients. Since the Taylor polynomial is only an
approximation of y(x), the terms . . . represent the missing difference, or error. We can
also think of . . . as a placeholder: if we continue finding higher order derivatives,
we can keep writing more and more terms in the Taylor polynomial approximation,
and this open ended process is expressed by the dots. We will discuss this more in
the next chapter on Taylor series.
First of all, we know that the constant $c_0$ of the Taylor polynomial should be the
value y(0), so using the initial condition y(0) = −1 gives us that $c_0 = -1$ and
\[ y(x) = -1 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \dots \]
Next, we want to plug this expression into the equation y′(x) + 4y(x) = 8, which
means that we need to take the derivative first:
\[ y'(x) = c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \dots \]
If we add this to $4y(x) = -4 + 4c_1 x + 4c_2 x^2 + 4c_3 x^3 + 4c_4 x^4 + \dots$, we get
\[ \begin{aligned}
y'(x) + 4y(x) &= c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \dots \\
&\quad - 4 + 4c_1 x + 4c_2 x^2 + 4c_3 x^3 + 4c_4 x^4 + \dots \\
&= (c_1 - 4) + (2c_2 + 4c_1)x + (3c_3 + 4c_2)x^2 + (4c_4 + 4c_3)x^3 + \dots
\end{aligned} \]
Notice how the last term $4c_4 x^4$ in the second line disappeared. That is because it
had nothing to be matched with, so it was absorbed by the dots . . ..
We wrote the left hand side of the equation y′(x) + 4y(x) = 8 as a polynomial of
degree 3 plus some dots. Next, we need to write the right hand side as a polynomial
of degree 3 plus some dots. In the next example, this will require a little bit of work,
but in this example the right hand side is very simple, just a constant 8, so we can
formally write
\[ 8 = 8 + 0x + 0x^2 + 0x^3. \]
Because we want the two sides to be equal, we need to make sure that
\[ (c_1 - 4) + (2c_2 + 4c_1)x + (3c_3 + 4c_2)x^2 + (4c_4 + 4c_3)x^3 = 8 + 0x + 0x^2 + 0x^3. \]
This means that the coefficients in front of each power of x must be equal, so
\[ c_1 - 4 = 8, \quad 2c_2 + 4c_1 = 0, \quad 3c_3 + 4c_2 = 0, \quad 4c_4 + 4c_3 = 0. \]
The first equation gives us $c_1 = 12$. Plugging it into the second equation gives us
that $2c_2 + 4(12) = 0$, so $c_2 = -24$. Plugging it into the third equation gives us that
$3c_3 + 4(-24) = 0$, so $c_3 = 32$. Plugging it into the fourth equation gives us that
$4c_4 + 4(32) = 0$, so $c_4 = -32$. We found the Taylor polynomial approximation of
degree 4 for the solution y(x) of this equation:
\[ -1 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 = -1 + 12x - 24x^2 + 32x^3 - 32x^4. \]
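The coefficient matching above follows a simple pattern: for an equation of the form y′ + p·y = r(x) with a constant p, comparing the coefficients of xⁿ gives (n + 1)c_{n+1} + p·c_n = r_n, where r_n is the nth Taylor coefficient of the right hand side. The sketch below automates this recurrence with exact fractions. The function name taylor_coefficients and its argument layout are assumptions for this illustration, and it only covers equations of this specific form.

```python
from fractions import Fraction

def taylor_coefficients(p, rhs, y0, degree):
    """Coefficients c_0, ..., c_degree for y' + p*y = r(x), y(0) = y0.

    rhs lists the Taylor coefficients r_0, r_1, ... of the right hand side
    (missing entries are treated as 0). Matching the coefficient of x^n
    gives (n + 1) * c_{n+1} + p * c_n = r_n.
    """
    c = [Fraction(y0)]
    for n in range(degree):
        r_n = Fraction(rhs[n]) if n < len(rhs) else Fraction(0)
        c.append((r_n - p * c[n]) / (n + 1))
    return c

if __name__ == "__main__":
    # Example 1: y' + 4y = 8, y(0) = -1
    coeffs = taylor_coefficients(4, [8], -1, 4)
    print([str(c) for c in coeffs])
```

For Example 1 the call uses p = 4, right hand side coefficients [8] and y(0) = −1, and the resulting list can be compared with the coefficients found by hand.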
Comment. The above equation can be rewritten as y′ = 4(2 − y). It is a separable
equation and can be solved following the usual steps:
\[ \frac{dy}{dx} = 4(2 - y) \implies \frac{dy}{2 - y} = 4\,dx \implies \int \frac{dy}{2 - y} = \int 4\,dx \implies -\ln|2 - y| = 4x + C. \]
From the initial condition y(0) = −1 we get −ln|2 − (−1)| = 4(0) + C, so C = −ln 3
and the equation is −ln|2 − y| = 4x − ln 3, or ln|2 − y| = −4x + ln 3. Exponentiating
both sides, we get $|2 - y| = 3e^{-4x}$. Because 2 − y is positive at x = 0,
we can drop the absolute value and write $2 - y = 3e^{-4x}$, or $y = 2 - 3e^{-4x}$.
Since we found the exact formula for the solution, we can check the Taylor polynomial
approximation above. Recall the pattern of the Taylor polynomials for $e^x$:
\[ e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!}. \]
Replacing x with −4x:
\[ e^{-4x} \approx 1 + (-4x) + \frac{(-4x)^2}{2!} + \frac{(-4x)^3}{3!} + \frac{(-4x)^4}{4!} = 1 - 4x + 8x^2 - \frac{32}{3}x^3 + \frac{32}{3}x^4. \]
Finally, $y(x) = 2 - 3e^{-4x} \approx -1 + 12x - 24x^2 + 32x^3 - 32x^4$, which matches the
answer above.
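Exercise 1. Consider the initial value problem:
y′ (x) − 3y(x) = 10, y(0) = 2.
Find the Taylor polynomial of degree 4 for y(x) centred at x = 0 using the method
of undetermined coefficients.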
Example 2. Consider the initial value problem:
\[ y'(x) - 2y(x) = \frac{\sin(x)}{x}, \qquad y(0) = 0. \]
Find the Taylor polynomial of degree 4 for y(x) centred at x = 0 using the method
of undetermined coefficients.
Solution: We handle the left hand side exactly the same way as in the first example.
The initial condition gives us $c_0 = y(0) = 0$, so
\[ \begin{aligned}
y(x) &= c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \dots \\
y'(x) &= c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \dots \\
-2y(x) &= -2c_1 x - 2c_2 x^2 - 2c_3 x^3 - 2c_4 x^4 + \dots \\
y'(x) - 2y(x) &= c_1 + (2c_2 - 2c_1)x + (3c_3 - 2c_2)x^2 + (4c_4 - 2c_3)x^3 + \dots
\end{aligned} \]
What is different in this example is that the right hand side $\frac{\sin(x)}{x}$ is not a simple
constant anymore and, to match the coefficients, we first need to find its Taylor
polynomial. For this, we need to recall the pattern of Taylor polynomials for sin(x):
\[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots \]
Dividing both sides by x, we get
\[ \frac{\sin(x)}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \dots = 1 + 0x - \frac{x^2}{3!} + 0x^3 + \frac{x^4}{5!} - \dots \]
Equating the coefficients of y′(x) − 2y(x) above with the coefficients of $\frac{\sin(x)}{x}$:
\[ c_1 = 1, \quad 2c_2 - 2c_1 = 0, \quad 3c_3 - 2c_2 = -\frac{1}{3!} = -\frac{1}{6}, \quad 4c_4 - 2c_3 = 0. \]
Solving them sequentially we get $c_1 = 1$, $c_2 = 1$, $c_3 = \frac{11}{18}$, and $c_4 = \frac{11}{36}$. We found
the Taylor polynomial approximation of degree 4:
\[ x + x^2 + \frac{11}{18}x^3 + \frac{11}{36}x^4. \]
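A quick way to test a Taylor polynomial found this way is to plug it back into the equation and look at the leftover (the residual) near x = 0. Since the coefficients were matched up to degree 3, the residual should shrink roughly like x⁴. The sketch below checks this for the polynomial of Example 2; the function names are hypothetical, and x = 0 itself is avoided because sin(x)/x is not defined there.

```python
import math

def taylor_poly(x):
    """Degree-4 polynomial from Example 2: x + x^2 + (11/18)x^3 + (11/36)x^4."""
    return x + x ** 2 + (11 / 18) * x ** 3 + (11 / 36) * x ** 4

def taylor_poly_derivative(x):
    """Term-by-term derivative: 1 + 2x + (11/6)x^2 + (11/9)x^3."""
    return 1 + 2 * x + (11 / 6) * x ** 2 + (11 / 9) * x ** 3

def residual(x):
    """Left side minus right side of y' - 2y = sin(x)/x, for x != 0."""
    return taylor_poly_derivative(x) - 2 * taylor_poly(x) - math.sin(x) / x

if __name__ == "__main__":
    for x in [0.2, 0.1, 0.05]:
        print(f"x = {x:<5} residual = {residual(x):.3e}")
```

Halving x should reduce the residual by a factor of about 16, which is what a leftover of order x⁴ looks like.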
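Exercise 2. Consider the initial value problem:
y′ (x) + y(x) = x cos(x), y(0) = 0.
Find the Taylor polynomial of degree 4 for y(x) centred at x = 0 using the method
of undetermined coefficients.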
Answer to Exercise 1. The initial condition gives us $c_0 = y(0) = 2$, so
\[ \begin{aligned}
y(x) &= 2 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \dots \\
y'(x) &= c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \dots \\
-3y(x) &= -6 - 3c_1 x - 3c_2 x^2 - 3c_3 x^3 - 3c_4 x^4 + \dots \\
y'(x) - 3y(x) &= (c_1 - 6) + (2c_2 - 3c_1)x + (3c_3 - 3c_2)x^2 + (4c_4 - 3c_3)x^3 + \dots
\end{aligned} \]
This must be equal to $10 + 0x + 0x^2 + 0x^3$, so equating the coefficients in front of
the same powers:
\[ c_1 - 6 = 10, \quad 2c_2 - 3c_1 = 0, \quad 3c_3 - 3c_2 = 0, \quad 4c_4 - 3c_3 = 0. \]
Solving these equations sequentially, we get $c_1 = 16$, $c_2 = 24$, $c_3 = 24$, $c_4 = 18$, so
the Taylor polynomial approximation of degree 4 is
\[ 2 + 16x + 24x^2 + 24x^3 + 18x^4. \]
Answer to Exercise 2. The initial condition gives us $c_0 = y(0) = 0$, so
\[ \begin{aligned}
y(x) &= c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \dots \\
y'(x) &= c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \dots \\
y'(x) + y(x) &= c_1 + (2c_2 + c_1)x + (3c_3 + c_2)x^2 + (4c_4 + c_3)x^3 + \dots
\end{aligned} \]
To handle the right hand side, recall the pattern of Taylor polynomials for cos(x):
\[ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots \]
Multiplying both sides by x, we get
\[ x\cos(x) = x - \frac{x^3}{2!} + \frac{x^5}{4!} - \dots = 0 + x + 0x^2 - \frac{x^3}{2!} + 0x^4 + \frac{x^5}{4!} - \dots \]
Equating the coefficients of y′(x) + y(x) with the coefficients of x cos(x):
\[ c_1 = 0, \quad 2c_2 + c_1 = 1, \quad 3c_3 + c_2 = 0, \quad 4c_4 + c_3 = -\frac{1}{2}. \]
Solving these equations sequentially, we get $c_1 = 0$, $c_2 = \frac{1}{2}$, $c_3 = -\frac{1}{6}$, $c_4 = -\frac{1}{12}$,
so the Taylor polynomial approximation of degree 4 is
\[ \frac{1}{2}x^2 - \frac{1}{6}x^3 - \frac{1}{12}x^4. \]
Chapter 5
Taylor polynomials and series
5.1 From Taylor polynomials to Taylor series

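\[ \begin{aligned}
e^x &= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!} && (R = \infty) \\
\cos(x) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\,x^{2n} && (R = \infty) \\
\sin(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\,x^{2n+1} && (R = \infty) \\
\frac{1}{1-x} &= 1 + x + x^2 + x^3 + \dots = \sum_{n=0}^{\infty} x^n && (R = 1) \\
\ln(1-x) &= -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots = -\sum_{n=1}^{\infty} \frac{x^n}{n} && (R = 1) \\
\ln(1+x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \dots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,x^n && (R = 1)
\end{aligned} \]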
The above table contains a list of several classic examples of Taylor series, as
well as the radius of convergence R for each of them. We will discuss the meaning
of what is written in the table in this section and subsequent sections.
First of all, let us recall that in Section 3.7 we introduced and discussed Taylor
polynomials of degree n centered at x = a, which can be used to approximate a
function y = f (x) near a point x = a:
\[ P_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x - a)^n. \]
In this section, we will push this definition of a Taylor polynomial to the limit,
where it will become the Taylor series.
Example 1. Let us discuss the meaning of the Taylor series for $e^x$ centred at a = 0:
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!} \qquad (R = \infty) \]
Solution: First of all, let us recall the Taylor polynomials of $f(x) = e^x$ centred at a = 0.
Because all the derivatives of $e^x$ are equal to $e^x$ itself, we get
that $f^{(n)}(x) = e^x$ and $f^{(n)}(0) = e^0 = 1$. For example, the Taylor polynomial of
degree 7 at a = 0 is
\[ P_7(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \frac{x^6}{6!} + \frac{x^7}{7!} \]
and we see in the figure that it approximates $e^x$
very well on the interval between −2 and 2. What
happens if we do not stop at the degree 7 and keep adding more and more terms?
It turns out that the Taylor polynomials will get closer and closer to our function
$e^x$. Depending on the function f(x), this approximation will work on some interval
of the form a − R < x < a + R:
[Figure: number line with the points a − R, a and a + R marked.]
or, in other words, for x in between a − R and a + R for some number R which is
called the radius of convergence. (Sometimes things are a bit more subtle than this,
but we will not encounter such unusual examples.) For example, the table above
says that, in the case of the exponential function $e^x$, the radius of convergence
is R = ∞. This means that for any −∞ < x < ∞, the Taylor polynomials $P_n(x)$
will eventually get closer and closer to $e^x$ if we keep increasing the degree n. In
mathematical language, we can say that $P_n(x)$ converges to $e^x$ for all x and write
\[ e^x = \lim_{n \to \infty} P_n(x). \]
Another common way to express the same thing is to write
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!}. \]
In the middle, the dots . . . express that if we continue adding more and more terms
to the Taylor polynomials we will get closer and closer to $e^x$. On the right hand side,
a more sophisticated notation $\sum_{n=0}^{\infty}$ also expresses that we keep adding the terms
of degree n indefinitely, up to infinite degree n = ∞. This infinite sum is called the
Taylor series of $e^x$ centred at a = 0.
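The statement that the Taylor polynomials get closer and closer to the exponential function can be observed numerically. The sketch below computes the partial sums P_n(x) term by term and compares them with Python's math.exp; the function name exp_taylor and the sample values of x and n are choices made for this illustration.

```python
import math

def exp_taylor(x, n):
    """Taylor polynomial P_n(x) = sum of x^k / k! for k = 0, ..., n."""
    total = 0.0
    term = 1.0  # x^0 / 0!
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)  # turns x^k / k! into x^(k+1) / (k+1)!
    return total

if __name__ == "__main__":
    for x in [2.0, 5.0]:
        for n in [5, 7, 10, 20, 40]:
            error = abs(math.exp(x) - exp_taylor(x, n))
            print(f"x = {x}, n = {n:2d}, |e^x - P_n(x)| = {error:.3e}")
```

The further x is from the centre 0, the larger the degree n needed before the error becomes small, but for every fixed x the error eventually shrinks.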
The notation $\sum_{n=0}^{\infty}$ is very important, and it is called the Sigma notation. The
most important thing about this notation is that we are able to find the formula $\frac{x^n}{n!}$
that expresses the term number n of the Taylor series. In many applications we can
simply write out a few terms at the beginning, but sometimes we will need the
formula for the nth term, so it is important to remember those. Let us check that
this formula indeed encodes correctly the terms of the series $1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$.
We need to recall the convention that 0! = 1. Then,
\[ n = 0 \implies \frac{x^0}{0!} = 1, \quad n = 1 \implies \frac{x^1}{1!} = x, \quad n = 2 \implies \frac{x^2}{2!}, \quad \text{etc.} \]
We see that the formula $\frac{x^n}{n!}$ matches the pattern of the series as the degree n changes
from 0, 1, 2, etc.
Exercise 1. Discuss the meaning of the Taylor series for cos(x) and sin(x) centred
at a = 0:
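\[ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\,x^{2n} \qquad (R = \infty) \]
\[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\,x^{2n+1} \qquad (R = \infty) \]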
What is the meaning of R = ∞? Check that the ∑-notation matches the pattern in
each case. Does the index n represent the degree of the Taylor polynomial in these
formulas?
Geometric series. Next, we will discuss the so called geometric series:

\[ \frac{1}{1-x} = 1 + x + x^2 + x^3 + \dots = \sum_{n=0}^{\infty} x^n \qquad (R = 1) \]
Example 2. Check that the series written above is the Taylor series centred at a = 0
of the function $f(x) = \frac{1}{1-x}$.
Solution: Let us take a few derivatives of f(x):
\[ f'(x) = \frac{1}{(1-x)^2}, \quad f''(x) = \frac{1 \cdot 2}{(1-x)^3}, \quad f'''(x) = \frac{1 \cdot 2 \cdot 3}{(1-x)^4}, \quad f^{(4)}(x) = \frac{1 \cdot 2 \cdot 3 \cdot 4}{(1-x)^5}, \]
etc. We notice the pattern: $f^{(n)}(x) = \frac{n!}{(1-x)^{n+1}}$, so $f^{(n)}(0) = n!$. By definition, the
Taylor series will be
\[ f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \dots = 1 + x + \frac{2!}{2!}x^2 + \frac{3!}{3!}x^3 + \dots \]
which is exactly $1 + x + x^2 + x^3 + \dots = \sum_{n=0}^{\infty} x^n$.
In the next exercise, we will check that the geometric series $1 + x + x^2 + x^3 + \dots = \sum_{n=0}^{\infty} x^n$
indeed converges to $\frac{1}{1-x}$ when −1 < x < 1, and it does not converge
outside of this interval. This is precisely what it means for the radius of convergence of
the geometric series to be R = 1.
Exercise 2. (a) If x is outside of the interval −1 < x < 1, i.e. if x ≤ −1 or x ≥ 1,
does $x^n$ get smaller and smaller when n gets bigger? What can we conclude about
the series $1 + x + x^2 + x^3 + \dots$ and why?
(b) Using simple algebra, check that the difference between the function $\frac{1}{1-x}$
and its Taylor polynomial $1 + x + x^2 + x^3 + \dots + x^n$ of degree n can be simplified as
\[ \frac{1}{1-x} - (1 + x + x^2 + x^3 + \dots + x^n) = \frac{x^{n+1}}{1-x}. \]
(c) What happens to the difference in part (b) when −1 < x < 1 and n goes to
infinity?
Comment. It is a well-known feature of Taylor series that they approximate a function f(x)
on a symmetric interval a − R < x < a + R around the centre a. For example, if we look
at the function $\frac{1}{1-x}$ (black solid curve in the figure), it has a vertical asymptote at x = 1
where the denominator becomes 0, so the Taylor series cannot possibly approximate it to
the right of a = 0 any further than x = 1. On the left side of x = 0, the function $\frac{1}{1-x}$ is well
defined all the way to −∞, but the Taylor series fails to approximate it beyond x = −1.
We see how Taylor polynomials $P_{20}(x)$ and $P_{21}(x)$ in the figure start growing fast around
x = −1. Because we have some obstacle for convergence on the right of the centre,
it automatically limits us to the left of the centre as well.
Taylor series for the logarithm. Next, we will discuss the series:
\[ \ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots = -\sum_{n=1}^{\infty} \frac{x^n}{n} \qquad (R = 1) \]
Example 3. Show that $\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots$ for any −1 < x < 1.
Solution: Let us take the formula from part (b) in the previous exercise:
\[ \frac{1}{1-t} - (1 + t + t^2 + t^3 + \dots + t^n) = \frac{t^{n+1}}{1-t} \]
and integrate it between 0 and x:
\[ \int_0^x \frac{1}{1-t}\,dt - \int_0^x (1 + t + t^2 + t^3 + \dots + t^n)\,dt = \int_0^x \frac{t^{n+1}}{1-t}\,dt. \]
The left hand side is easy to integrate and we get
\[ -\ln(1-x) - \Bigl(x + \frac{x^2}{2} + \frac{x^3}{3} + \dots + \frac{x^{n+1}}{n+1}\Bigr) = \int_0^x \frac{t^{n+1}}{1-t}\,dt. \]
Our goal is to show that, for −1 < x < 1, the two terms on the left hand side are
close to each other,
\[ -\ln(1-x) \approx x + \frac{x^2}{2} + \frac{x^3}{3} + \dots + \frac{x^{n+1}}{n+1}, \]
so all we need to show is that the integral $\int_0^x \frac{t^{n+1}}{1-t}\,dt$ is small when n is large. To
show this, for concreteness, take x = 0.9. Then, for t between 0 and 0.9, the numerator
$t^{n+1} < 0.9^n$ and the denominator $1 - t \ge 0.1$ is not too small (we do not divide by
something close to zero), so the function we integrate is pretty small:
$\frac{t^{n+1}}{1-t} \le \frac{0.9^n}{0.1} \to 0$ as n → ∞.
As a result, the integral will be small, so the series will indeed approximate the
logarithmic function ln(1 − x).
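The estimate in this solution can be checked with numbers. The sketch below compares ln(1 − x) with the partial sums of the series and, for x = 0.9, with the bound obtained by multiplying the estimate 0.9ⁿ/0.1 for the integrand by the length 0.9 of the integration interval. The function names are hypothetical and introduced only for this illustration.

```python
import math

def log_series(x, m):
    """Partial sum -(x + x^2/2 + ... + x^m/m) of the series for ln(1 - x)."""
    return -sum(x ** k / k for k in range(1, m + 1))

def series_error(x, m):
    """Distance between ln(1 - x) and the partial sum of degree m."""
    return abs(math.log(1 - x) - log_series(x, m))

def integral_bound(n):
    """Bound on the integral for x = 0.9: interval length times 0.9^n / 0.1."""
    return 0.9 * (0.9 ** n) / 0.1

if __name__ == "__main__":
    for n in [10, 50, 100, 200]:
        # the partial sum of degree n + 1 corresponds to the integral with t^(n+1)
        print(n, series_error(0.9, n + 1), integral_bound(n))
```

The actual error is typically much smaller than the bound, which is fine: the bound only needs to go to zero as n grows.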
Exercise 3. Show that, for any −1 < x < 1,
\[ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\,x^n. \]
Hint: use the series from the previous example.
Answer to Exercise 1. In both cases, the series notation expresses that as we add
more and more terms, the sum will get closer and closer to our function, cos(x)
or sin(x). The fact that the radius of convergence R is equal to ∞ means that this
approximation will work for all x, between −∞ < x < ∞. Of course, the further x
is from the centre a = 0, the more terms we might have to add before this approxi-
mation gets good.
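Let us check the ∑-notation. In the case of cos(x), the general term is $\frac{(-1)^n}{(2n)!}\,x^{2n}$,
so:
\[ n = 0 \implies \frac{(-1)^0}{(0)!}\,x^0 = 1, \qquad n = 1 \implies \frac{(-1)^1}{(2)!}\,x^2 = -\frac{x^2}{2!}, \]
\[ n = 2 \implies \frac{(-1)^2}{(4)!}\,x^4 = \frac{x^4}{4!}, \qquad n = 3 \implies \frac{(-1)^3}{(6)!}\,x^6 = -\frac{x^6}{6!}, \]
etc. We see that the formula matches the pattern correctly. In the case of sin(x), the
general term is $\frac{(-1)^n}{(2n+1)!}\,x^{2n+1}$, so:
\[ n = 0 \implies \frac{(-1)^0}{(1)!}\,x^1 = x, \qquad n = 1 \implies \frac{(-1)^1}{(3)!}\,x^3 = -\frac{x^3}{3!}, \]
\[ n = 2 \implies \frac{(-1)^2}{(5)!}\,x^5 = \frac{x^5}{5!}, \qquad n = 3 \implies \frac{(-1)^3}{(7)!}\,x^7 = -\frac{x^7}{7!}, \]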
etc. Again, we see that the formula matches the pattern correctly. In these two
cases, the index n does not represent the degree of the polynomial. It represents the
term number, and the degree is either 2n in the case of cosine, or 2n + 1 in the case
of sine. Notice how the degree increases by 2 in both cases, which explains why
we needed to multiply n by 2 in these formulas.
Answer to Exercise 2. (a) If $|x| \ge 1$ then $|x^n| \ge 1$, so $x^n$ does not get small when $n$ gets large. When we start adding numbers $x^n$ which do not get smaller and smaller, we cannot approach any limit, so the Taylor polynomials $P_n(x) = 1 + x + \dots + x^n$ will not converge to anything as the degree $n$ gets bigger. This explains why the geometric series does not converge outside of the interval $-1 < x < 1$.
(b) Writing the difference with the common denominator,
\[
\frac{1}{1-x} - (1 + x + x^2 + x^3 + \dots + x^n) = \frac{1 - (1-x)(1 + x + x^2 + x^3 + \dots + x^n)}{1-x}.
\]
Let us multiply out the second term $(1-x)(1 + x + x^2 + x^3 + \dots + x^n)$ in the numerator:
\begin{align*}
&(1 + x + x^2 + x^3 + \dots + x^n) - x(1 + x + x^2 + x^3 + \dots + x^n) \\
&\quad = (1 + x + x^2 + x^3 + \dots + x^n) - (x + x^2 + x^3 + x^4 + \dots + x^n + x^{n+1}) \\
&\quad = 1 + (x + x^2 + x^3 + \dots + x^n) - (x + x^2 + x^3 + x^4 + \dots + x^n) - x^{n+1}.
\end{align*}
The two identical sums in parentheses cancel, so this is $1 - x^{n+1}$.
The numerator is $1 - (1 - x^{n+1}) = x^{n+1}$, as promised.
(c) If $-1 < x < 1$, which means that the absolute value $|x|$ is smaller than 1, then $|x|^n \to 0$ as $n \to \infty$, because $a^n$ is a geometric decay function when the base $a < 1$. For example, $0.5^{10} = 0.0009765625$. This means that the difference we found in part (b),
\[
\frac{x^{n+1}}{1-x} \to 0,
\]
will get smaller and smaller as $n$ gets bigger, when $x$ is between $-1$ and $1$. This shows that the geometric series $1 + x + x^2 + x^3 + \dots$ converges to $\frac{1}{1-x}$ for $-1 < x < 1$, so the radius of convergence is $R = 1$, as promised.
Answer to Exercise 3. If we take the series from the previous example,
\[
\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots = -\sum_{n=1}^{\infty} \frac{x^n}{n},
\]
and replace $x$ by $-x$, we get
\[
\ln(1-(-x)) = -(-x) - \frac{(-x)^2}{2} - \frac{(-x)^3}{3} - \dots = -\sum_{n=1}^{\infty} \frac{(-x)^n}{n}.
\]
Since $-(-x)^n = (-1)\big((-1)(x)\big)^n = (-1)(-1)^n x^n = (-1)^{n+1} x^n$, the above can be simplified as
\[
\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\, x^n,
\]
which is what we wanted. Notice that if $x$ is between $-1$ and $1$ then $-x$ is also between $-1$ and $1$, so this series converges on the interval $-1 < x < 1$.
5.2 Transformations of Taylor series

In this section we will learn how to combine the classic Taylor series from the last
section with some simple algebraic manipulations to obtain Taylor series for more
complicated functions. This is a very useful skill, because in many cases it makes
finding Taylor polynomials and series much easier than using the definition directly
by computing derivatives. At the end of the section we will give one application
and find Padé approximations to the functions $e^x$ and $\cos(x)$ near $x = 0$.
Example 1. Find the Taylor series of $f(x) = \frac{\sin(x^2)}{x}$ centered at $x = 0$. Write the answer using the Σ-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
Solution: Step 1. Because sine appears in the function $f(x)$, we start by recalling its Taylor series, using the Σ-notation:
\[
\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1}.
\]
Step 2. We want to think of the variable $x$ in the above series as a placeholder that can be replaced by any other expression. To emphasize this, let us replace this placeholder by a banana:
\[
\sin(\text{🍌}) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, \text{🍌}^{2n+1}.
\]
In this problem, we will replace 🍌 by $x^2$ and then simplify using algebra:
\[
\sin(x^2) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, (x^2)^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{4n+2}.
\]
Step 3. Finally, we divide both sides by $x$ and simplify. In the Taylor series, we can divide term by term, just like a regular sum:
\[
\frac{\sin(x^2)}{x} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, \frac{x^{4n+2}}{x} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{4n+1}.
\]
This is the answer in the Σ-notation. Writing out the first few terms:
\[
\frac{(-1)^0}{1!}\, x^1 + \frac{(-1)^1}{3!}\, x^5 + \frac{(-1)^2}{5!}\, x^9 + \frac{(-1)^3}{7!}\, x^{13} + \dots = x - \frac{x^5}{3!} + \frac{x^9}{5!} - \frac{x^{13}}{7!} + \dots
\]
How can we decide where this series converges? We know from the last section that the original series for $\sin(x)$ converges everywhere, because its radius of convergence is $R = \infty$. In other words, $-\infty < x < \infty$ or $-\infty < \text{🍌} < \infty$, so we can replace 🍌 by any number we want, which means that the new series also converges for all $x$, so its radius of convergence is $R = \infty$.
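As a sanity check on the series obtained in Example 1, here is a short Python sketch; the function name `series_sin_x2_over_x`, the number of terms and the sample points are our own assumptions. It sums the first terms $\frac{(-1)^n}{(2n+1)!}x^{4n+1}$ and compares the result with $\frac{\sin(x^2)}{x}$ computed directly.

```python
import math

def series_sin_x2_over_x(x, terms=20):
    """Partial sum of sum_{n>=0} (-1)^n x^(4n+1) / (2n+1)!."""
    return sum((-1)**n * x**(4 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

# Compare with the function itself at a few sample points.
for x in (0.5, 1.5, -2.0):
    exact = math.sin(x**2) / x
    print(x, series_sin_x2_over_x(x), exact)
```

The two columns agree to many digits at every sample point, which is consistent with the radius of convergence $R = \infty$.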
Warning. The reason we could divide each term by $x$ is that the series for $\sin(x^2)$ did not have a constant term $c_0$, and all the terms had at least one power of $x$ that could be cancelled out. If, for example, the series started with $1 + \dots$, dividing by $x$ would give $\frac{1}{x} + \dots$, which would not be a Taylor series. Also, in Step 2 above, 🍌 should be something that gives us a Taylor series again at the end. As we will see from the examples below, it does not always have to be a power of $x$, but it should be simple enough.
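Exercise 1. Find the Taylor series of $f(x) = xe^{-x^2}$ centered at $x = 0$. Write the answer using the Σ-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
Example 2. Find the Taylor series of $f(x) = \frac{1}{1+2x^2}$ centered at $x = 0$. Write the answer using the Σ-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
Solution: The function $\frac{1}{1+2x^2}$ looks similar to $\frac{1}{1-x}$, so we should start with the geometric series:
\[
\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n \qquad\text{or}\qquad \frac{1}{1-\text{🍌}} = \sum_{n=0}^{\infty} (\text{🍌})^n.
\]
In the denominator, our function $f(x)$ has $1 + 2x^2$, but we want to see something like $1 - \text{🍌}$. In this case, we simply write $1 + 2x^2 = 1 - (-2x^2)$, which means that we should replace 🍌 with $-2x^2$:
\[
\frac{1}{1+2x^2} = \frac{1}{1-(-2x^2)} = \sum_{n=0}^{\infty} (-2x^2)^n = \sum_{n=0}^{\infty} (-1)^n 2^n x^{2n}.
\]
In this case we do not multiply by anything, so we can simply write out a few terms at the beginning of the series:
\[
\frac{1}{1+2x^2} = \sum_{n=0}^{\infty} (-1)^n 2^n x^{2n} = 1 - 2x^2 + 4x^4 - 8x^6 + 16x^8 - \dots.
\]
How can we decide where this series converges if we know that the original geometric series converges when $-1 < x < 1$, or $-1 < \text{🍌} < 1$? Since we replaced 🍌 by $-2x^2$, the new series converges when $-1 < -2x^2 < 1$, or $-1 < 2x^2 < 1$. Solving this for $x$:
\[
2x^2 < 1 \implies x^2 < \frac{1}{2} \implies |x| < \frac{1}{\sqrt{2}} \implies -\frac{1}{\sqrt{2}} < x < \frac{1}{\sqrt{2}}.
\]
This means that the radius of convergence of the new series is $R = \frac{1}{\sqrt{2}}$.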
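The radius $R = \frac{1}{\sqrt{2}} \approx 0.707$ found in Example 2 can be illustrated numerically. The sketch below is only an illustration; the function name `partial_sum` and the two sample points are our own choices. It evaluates partial sums of $\sum (-1)^n 2^n x^{2n}$ at one point inside the interval and one point outside it.

```python
def partial_sum(x, terms):
    """Partial sum of sum_{n>=0} (-1)^n 2^n x^(2n)."""
    return sum((-1)**n * 2**n * x**(2 * n) for n in range(terms))

inside = partial_sum(0.5, 60)    # |x| < 1/sqrt(2)
outside = partial_sum(0.8, 60)   # |x| > 1/sqrt(2)

print(inside, 1 / (1 + 2 * 0.5**2))  # partial sum vs. the function value
print(outside)                       # very large in absolute value
```

Inside the interval the partial sum matches $\frac{1}{1+2x^2}$, while outside the terms $(2x^2)^n$ grow and the partial sums do not settle down.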
Exercise 2. Find the Taylor series for $f(x) = \frac{1}{1+3(x-1)^2}$ centered around $x = 1$. Write the answer using the Σ-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
Example 3. Find the Taylor series of $f(x) = \ln(2-x)$ centered at $x = 0$. Write the answer using the Σ-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
Solution: The function $\ln(2-x)$ looks similar to $\ln(1-x)$, so we should start with the series:
\[
\ln(1-x) = -\sum_{n=1}^{\infty} \frac{x^n}{n} \qquad\text{or}\qquad \ln(1-\text{🍌}) = -\sum_{n=1}^{\infty} \frac{(\text{🍌})^n}{n}.
\]
To make $2 - x$ look like $1 - \text{🍌}$, we could rewrite it as $2 - x = 1 + 1 - x = 1 - (x-1)$ and replace 🍌 by $x - 1$. However, this will give us a series with powers $(x-1)^n$, which is centered at $x = 1$, while we want the series centered at $x = 0$. We need to do something a bit different. The trick is to write $2 - x = 2\big(1 - \frac{x}{2}\big)$, so that
\[
\ln(2-x) = \ln\Big(2\Big(1 - \frac{x}{2}\Big)\Big) = \ln 2 + \ln\Big(1 - \frac{x}{2}\Big).
\]
Now, we can replace 🍌 above by $\frac{x}{2}$, so
\[
\ln(2-x) = \ln 2 + \ln\Big(1 - \frac{x}{2}\Big) = \ln 2 - \sum_{n=1}^{\infty} \frac{(x/2)^n}{n} = \ln 2 - \sum_{n=1}^{\infty} \frac{x^n}{n 2^n}.
\]
The first few terms at the beginning of the series are
\[
\ln(2) - \frac{x}{2} - \frac{x^2}{8} - \frac{x^3}{24} - \dots.
\]
From the last section we know that the original series for $\ln(1-x)$ converges when $-1 < x < 1$, or $-1 < \text{🍌} < 1$. Since we replaced 🍌 by $\frac{x}{2}$, the new series converges when $-1 < \frac{x}{2} < 1$, or $-2 < x < 2$. This means that the radius of convergence is $R = 2$.
Exercise 3. Find the Taylor series of $f(x) = \ln(10 + x^2)$ centered at $x = 0$. Write the answer using the Σ-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
The next two problems will be slightly trickier, because they require both a shift and a rescaling of the argument.
Example 4. Find the Taylor series of $f(x) = \frac{1}{x}$ centered at $x = 5$. Write the answer using the Σ-notation. Where does this series converge? What is its radius of convergence $R$?
Solution: The function $\frac{1}{x}$ looks similar to $\frac{1}{1-x}$, so we should start with the geometric series:
\[
\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n \qquad\text{or}\qquad \frac{1}{1-\text{🍌}} = \sum_{n=0}^{\infty} (\text{🍌})^n.
\]
We want the series to be centered at 5, so at the end we want the powers of $x - 5$. That means that we should add and subtract 5 and rewrite our function as
\[
\frac{1}{x} = \frac{1}{5 + (x-5)}.
\]
This does not quite look like $\frac{1}{1-\text{🍌}}$ yet, but we can factor out 5:
\[
\frac{1}{x} = \frac{1}{5 + (x-5)} = \frac{1}{5\big(1 + \frac{x-5}{5}\big)} = \frac{1}{5} \cdot \frac{1}{1 + \frac{x-5}{5}} = \frac{1}{5} \cdot \frac{1}{1 - \big(-\frac{x-5}{5}\big)}.
\]
Now it looks like what we want, and if we replace 🍌 in the geometric series above by $-\frac{x-5}{5}$, we get
\[
\frac{1}{5} \cdot \frac{1}{1 - \big(-\frac{x-5}{5}\big)} = \frac{1}{5} \sum_{n=0}^{\infty} \Big(-\frac{x-5}{5}\Big)^n = \sum_{n=0}^{\infty} \frac{(-1)^n}{5^{n+1}}\, (x-5)^n.
\]
We can see that this series is centered at $a = 5$ because all the powers are of the form $(x-5)^n$. This series converges when $-1 < -\frac{x-5}{5} < 1$, or $0 < x < 10$, which means that the radius of convergence is $R = 5$.
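The interval $0 < x < 10$ can be probed numerically. The following Python sketch (the name `reciprocal_series` and the sample points are hypothetical choices made for illustration) sums the first terms of $\sum \frac{(-1)^n}{5^{n+1}}(x-5)^n$ at a point inside the interval and at a point outside it.

```python
def reciprocal_series(x, terms=200):
    """Partial sum of sum_{n>=0} (-1)^n (x-5)^n / 5^(n+1)."""
    return sum((-1)**n * (x - 5)**n / 5**(n + 1) for n in range(terms))

inside = reciprocal_series(7.0)    # 0 < 7 < 10
outside = reciprocal_series(12.0)  # 12 > 10

print(inside, 1 / 7.0)  # partial sum vs. 1/x
print(outside)          # huge: the partial sums blow up
```

At $x = 7$ the partial sum reproduces $\frac{1}{7}$, while at $x = 12$ the terms grow like $1.4^n$ and the partial sums do not approach $\frac{1}{12}$.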
Exercise 4. Find the Taylor series of f (x) = ln(x) centered at x = 10. Write the
answer using the Σ-notation. Where does this series converge? What is its radius
of convergence R?
Multiplying two series. If we want to multiply two Taylor series,
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n \qquad\text{and}\qquad g(x) = \sum_{n=0}^{\infty} b_n x^n,
\]
we can multiply them out term by term, just like regular sums, and then collect the terms with the same powers. In fact, there is a general formula for multiplying two Taylor series using the Σ-notation, but it is a bit too complicated for our purposes, so we will stick with simpler examples where we only want to find a few terms of the product $f(x)g(x)$. Let us illustrate this with an example.
Example 5. Find the first few terms of the Taylor series of $f(x)g(x)$ centered at $x = 0$ if
\[
f(x) = 1 - 2x + x^2 + 7x^3 + \dots \qquad\text{and}\qquad g(x) = 3 + x + x^2 - 4x^3 + \dots.
\]
Solution: One subtle point to remember when multiplying out the product of two series
\[
f(x)g(x) = (1 - 2x + x^2 + 7x^3 + \dots) \times (3 + x + x^2 - 4x^3 + \dots)
\]
is that the terms $+\dots$ could contain powers of $x$ starting from $x^4$, $x^5$, etc., and in the problem we are not told exactly what those terms are. This means the following. Suppose we make the multiplication table for $f(x)g(x)$, writing all the terms of $f(x)$ in the first row, all the terms of $g(x)$ in the first column, and their products in the other entries of the table:
\[
\begin{array}{c|ccccc}
 & 1 & -2x & x^2 & 7x^3 & \cdots \\ \hline
3 & 3 & -6x & 3x^2 & 21x^3 & \cdots \\
x & x & -2x^2 & x^3 & 7x^4 & \cdots \\
x^2 & x^2 & -2x^3 & x^4 & 7x^5 & \cdots \\
-4x^3 & -4x^3 & 8x^4 & -4x^5 & -28x^6 & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots
\end{array}
\]
In this table the terms $+\dots$ could contain powers starting from $x^4$ and, as we said, we do not know what they are. This means that the terms in the lower right corner of the table, those with powers $x^4$, $x^5$ and $x^6$, should not be collected, because they could potentially be modified by the missing terms $+\dots$. In other words, when we multiply out two series, we should completely ignore the terms whose degree is at least the degree of the terms hidden in $+\dots$, and our multiplication table could look like this:
\[
\begin{array}{c|cccc}
 & 1 & -2x & x^2 & 7x^3 \\ \hline
3 & 3 & -6x & 3x^2 & 21x^3 \\
x & x & -2x^2 & x^3 & \\
x^2 & x^2 & -2x^3 & & \\
-4x^3 & -4x^3 & & &
\end{array}
\]
Then we collect the terms with the same powers,
\[
f(x)g(x) = 3 + (x - 6x) + (x^2 - 2x^2 + 3x^2) + (-4x^3 - 2x^3 + x^3 + 21x^3) + \dots,
\]
and, after simplifying, we see that the first few terms of the product are
\[
f(x)g(x) = 3 - 5x + 2x^2 + 16x^3 + \dots.
\]
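The table method of Example 5 amounts to multiplying coefficient lists and discarding every product whose degree exceeds the last known degree. Below is a minimal Python sketch of that idea; the function name `multiply_truncated` and the list representation (index = power of $x$) are our own conventions, not notation from the text.

```python
def multiply_truncated(a, b, degree):
    """Multiply two coefficient lists, keeping only powers up to `degree`.

    a[i] is the coefficient of x^i in the first series, b[j] of x^j in the second.
    """
    product = [0] * (degree + 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            if i + j <= degree:  # ignore entries affected by the unknown "+ ..." terms
                product[i + j] += a_i * b_j
    return product

f = [1, -2, 1, 7]   # 1 - 2x + x^2 + 7x^3 + ...
g = [3, 1, 1, -4]   # 3 + x + x^2 - 4x^3 + ...
result = multiply_truncated(f, g, 3)
print(result)  # [3, -5, 2, 16]
```

The output lists the coefficients of $1$, $x$, $x^2$, $x^3$, matching $3 - 5x + 2x^2 + 16x^3 + \dots$.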
Exercise 5. Find the first few terms of the Taylor series of $f(x)g(x)$ centered at $x = 0$ if
\[
f(x) = 7 - x^2 + 2x^4 + x^6 + \dots \qquad\text{and}\qquad g(x) = 1 + 2x^2 + 4x^4 + 8x^6 + \dots.
\]
Padé approximation. Next, we will give one application of what we have learned so far in this section to find some novel approximations of $e^x$ and $\cos(x)$ near $x = 0$: the so-called Padé approximation.
Example 6. In this problem we are going to find an approximation for $e^x$ near $x = 0$ by a simple function of the form $\frac{1+ax}{1+bx}$ (the red dashed curve in the figure). The goal of this question is to find the best parameters $a$ and $b$.
(a) Find the first three terms of the Taylor series for $\frac{1}{1+bx}$.
(b) Find the first three terms of the Taylor series for the product
\[
\frac{1+ax}{1+bx} = (1+ax) \cdot \frac{1}{1+bx}.
\]
(c) Make sure that the terms in part (b) match the first three terms of the Taylor series for $e^x$ to find $a$ and $b$.
Solution: (a) Using the geometric series $\frac{1}{1-x} = 1 + x + x^2 + \dots$, we get that
\[
\frac{1}{1+bx} = \frac{1}{1-(-bx)} = 1 + (-bx) + (-bx)^2 + \dots = 1 - bx + b^2x^2 + \dots.
\]
(b) Using what we found in part (a), we want to multiply out
\begin{align*}
(1+ax) \cdot \frac{1}{1+bx} &= (1+ax) \times (1 - bx + b^2x^2 + \dots) \\
&= 1 - bx + b^2x^2 + ax - abx^2 + ab^2x^3 + \dots.
\end{align*}
However, we should remember that the $+\dots$ term contains powers starting with $x^3$, so we must ignore the term $ab^2x^3$. The correct multiplication will actually be:
\begin{align*}
(1+ax) \cdot \frac{1}{1+bx} &= (1+ax) \times (1 - bx + b^2x^2 + \dots) \\
&= 1 - bx + b^2x^2 + ax - abx^2 + \dots \\
&= 1 + (a-b)x + (b^2 - ab)x^2 + \dots.
\end{align*}
Luckily, forgetting to ignore the term $ab^2x^3$ would not affect our next step, but it was worth emphasizing this point once again.
(c) We want our series in part (b) to be a good approximation of $e^x$ near $x = 0$, which has the Taylor series $e^x = 1 + x + \frac{x^2}{2} + \dots$. For this purpose, we want the coefficients in part (b) to match $1 + x + \frac{x^2}{2} + \dots$, so
\[
a - b = 1 \qquad\text{and}\qquad b^2 - ab = \frac{1}{2}.
\]
The first equation gives $a = b + 1$ and plugging it into the second equation gives $b^2 - (b+1)b = \frac{1}{2}$, or $-b = \frac{1}{2}$, or $b = -\frac{1}{2}$. Then $a = b + 1 = -\frac{1}{2} + 1 = \frac{1}{2}$. The approximation we were looking for is
\[
e^x \approx \frac{1+ax}{1+bx} = \frac{1 + 0.5x}{1 - 0.5x}.
\]
This is the function graphed by the red dashed curve in the figure above.
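To get a feeling for how well this works, the Padé approximation can be compared with $e^x$ and with the degree 2 Taylor polynomial at a sample point. This is only an illustrative sketch; the function names and the choice of the point $x = 0.1$ are ours.

```python
import math

def pade_exp(x):
    """The approximation (1 + 0.5x) / (1 - 0.5x) found in Example 6."""
    return (1 + 0.5 * x) / (1 - 0.5 * x)

def taylor_exp_2(x):
    """Degree 2 Taylor polynomial of e^x at 0, for comparison."""
    return 1 + x + x**2 / 2

x = 0.1
pade_error = abs(pade_exp(x) - math.exp(x))
taylor_error = abs(taylor_exp_2(x) - math.exp(x))
print(pade_error, taylor_error)
```

At this sample point both errors are small, and the Padé error is the smaller of the two; the comparison at other points, especially far from $0$ or near $x = 2$ where the denominator vanishes, may look different.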
Exercise 6. In this problem we are going to find an approximation for $\cos(x)$ near $x = 0$ by a simple function of the form $\frac{1+ax^2}{1+bx^2}$ (the red dashed curve in the figure). The goal of this question is to find the best parameters $a$ and $b$.
(a) Find the first three terms of the Taylor series for $\frac{1}{1+bx^2}$.
(b) Find the first three terms of the Taylor series for the product
\[
\frac{1+ax^2}{1+bx^2} = (1+ax^2) \cdot \frac{1}{1+bx^2}.
\]
(c) Make sure that the terms in part (b) match the first three terms of the Taylor series for $\cos(x)$ to find $a$ and $b$.
Answer to Exercise 1. Because $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, replacing $x$ with $-x^2$ gives
\[
e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{((-1)x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{n!}.
\]
Multiplying this by $x$ term by term, like a regular sum:
\[
xe^{-x^2} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}\, x}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{n!} = x - x^3 + \frac{x^5}{2!} - \frac{x^7}{3!} + \dots
\]
The original series for $e^x$ converges everywhere, so $x$ can be replaced by any number. This means that the new series also converges for all $x$ and its radius of convergence is $R = \infty$.
Answer to Exercise 2. As in the previous example, we write $1 + 3(x-1)^2 = 1 - (-3(x-1)^2)$ and use the geometric series:
\[
\frac{1}{1 + 3(x-1)^2} = \frac{1}{1 - (-3(x-1)^2)} = \sum_{n=0}^{\infty} (-3(x-1)^2)^n = \sum_{n=0}^{\infty} (-1)^n 3^n (x-1)^{2n}.
\]
This series is centered at $a = 1$, because all the terms are powers of $(x-1)$. A few terms at the beginning of the series are
\[
1 - 3(x-1)^2 + 9(x-1)^4 - 27(x-1)^6 + 81(x-1)^8 - \dots.
\]
Since we replaced $x$ by $-3(x-1)^2$, the new series converges when $-1 < -3(x-1)^2 < 1$, or $-1 < 3(x-1)^2 < 1$. Solving this for $x$:
\[
3(x-1)^2 < 1 \implies (x-1)^2 < \frac{1}{3} \implies |x-1| < \frac{1}{\sqrt{3}} \implies -\frac{1}{\sqrt{3}} < x - 1 < \frac{1}{\sqrt{3}}.
\]
We can also write this as $1 - \frac{1}{\sqrt{3}} < x < 1 + \frac{1}{\sqrt{3}}$, so the radius of convergence is $R = \frac{1}{\sqrt{3}}$.
Answer to Exercise 3. We write $10 + x^2 = 10\big(1 + \frac{x^2}{10}\big) = 10\big(1 - \big(-\frac{x^2}{10}\big)\big)$, so that
\[
\ln(10 + x^2) = \ln\Big(10\Big(1 - \Big(-\frac{x^2}{10}\Big)\Big)\Big) = \ln 10 + \ln\Big(1 - \Big(-\frac{x^2}{10}\Big)\Big).
\]
Using the series for $\ln(1-x)$:
\[
\ln 10 + \ln\Big(1 - \Big(-\frac{x^2}{10}\Big)\Big) = \ln 10 - \sum_{n=1}^{\infty} \frac{(-x^2/10)^n}{n} = \ln 10 + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n 10^n}\, x^{2n}.
\]
The first few terms at the beginning of the series are
\[
\ln 10 + \frac{x^2}{10} - \frac{x^4}{200} + \frac{x^6}{3000} - \dots.
\]
The original series for $\ln(1-x)$ converges when $-1 < x < 1$, so the new series converges when $-1 < -\frac{x^2}{10} < 1$. Solving for $x$, we get $-\sqrt{10} < x < \sqrt{10}$, so the radius of convergence is $R = \sqrt{10}$.
Answer to Exercise 4. We begin by writing
\[
\ln(x) = \ln(10 + x - 10) = \ln\Big(10\Big(1 + \frac{x-10}{10}\Big)\Big) = \ln 10 + \ln\Big(1 - \Big(-\frac{x-10}{10}\Big)\Big).
\]
Using the series for $\ln(1-x)$, this equals
\[
\ln 10 - \sum_{n=1}^{\infty} \frac{\big(-\frac{x-10}{10}\big)^n}{n} = \ln 10 + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n 10^n}\, (x-10)^n.
\]
The original series for $\ln(1-x)$ converges when $-1 < x < 1$, so the new series converges when $-1 < -\frac{x-10}{10} < 1$. Solving for $x$, we get $0 < x < 20$, so the series is centered at $a = 10$ and the radius of convergence is $R = 10$.
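Answer to Exercise 5. Because the $+\dots$ term could contain powers $x^7$ or above, we should ignore those powers when multiplying things out:
\[
\begin{array}{c|cccc}
 & 7 & -x^2 & 2x^4 & x^6 \\ \hline
1 & 7 & -x^2 & 2x^4 & x^6 \\
2x^2 & 14x^2 & -2x^4 & 4x^6 & \\
4x^4 & 28x^4 & -4x^6 & & \\
8x^6 & 56x^6 & & &
\end{array}
\]
Collecting the terms with the same powers, for example $-x^2 + 14x^2 = 13x^2$, we see that $f(x)g(x) = 7 + 13x^2 + 28x^4 + 57x^6 + \dots$.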
Answer to Exercise 6. (a) Using the geometric series $\frac{1}{1-x} = 1 + x + x^2 + \dots$, we get that
\[
\frac{1}{1+bx^2} = \frac{1}{1-(-bx^2)} = 1 + (-bx^2) + (-bx^2)^2 + \dots = 1 - bx^2 + b^2x^4 + \dots.
\]
(b) Using what we found in part (a), we want to multiply out
\begin{align*}
(1+ax^2) \cdot \frac{1}{1+bx^2} &= (1+ax^2) \times (1 - bx^2 + b^2x^4 + \dots) \\
&= 1 - bx^2 + b^2x^4 + ax^2 - abx^4 + \dots \\
&= 1 + (a-b)x^2 + (b^2 - ab)x^4 + \dots.
\end{align*}
We did not write the term $+ab^2x^6$ because it is absorbed by the dots $+\dots$.
(c) We want our series in part (b) to be a good approximation of $\cos(x)$ near $x = 0$, which has the Taylor series $\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$. For this purpose, we want the coefficients in part (b) to match $1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$, so
\[
a - b = -\frac{1}{2} \qquad\text{and}\qquad b^2 - ab = \frac{1}{4!} = \frac{1}{24}.
\]
The first equation gives $a = b - \frac{1}{2}$ and plugging it into the second equation gives $b^2 - \big(b - \frac{1}{2}\big)b = \frac{1}{24}$, or $\frac{b}{2} = \frac{1}{24}$, or $b = \frac{1}{12}$. Then $a = \frac{1}{12} - \frac{1}{2} = -\frac{5}{12}$. The approximation we were looking for is
\[
\cos(x) \approx \frac{1+ax^2}{1+bx^2} = \frac{1 - \frac{5}{12}x^2}{1 + \frac{1}{12}x^2}.
\]
This is the function graphed by the red dashed curve in the figure in the statement of the problem.
5.3 Ratio test and the radius of convergence

In this section we will discuss the radius of convergence $R$ of Taylor series in more detail, and our main tool will be the so-called Ratio Test. To state this test, let us first consider an arbitrary series
\[
\sum_{n=0}^{\infty} a_n = a_0 + a_1 + a_2 + a_3 + a_4 + a_5 + \dots
\]
that consists of adding a sequence of numbers $a_0, a_1, a_2, a_3, a_4, a_5, \dots$, indefinitely.
We want to know whether this addition process gets closer and closer to some limiting number as we keep adding more and more terms, and the Ratio Test gives us one very useful criterion, as follows.
• Compute the limit of the ratio of two consecutive numbers in the sequence,
\[
\rho := \lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|},
\]
where we ignore their signs by taking the absolute values $|a_{n+1}|$ and $|a_n|$.
• Then the Ratio Test tells us that:
If $\rho > 1$ then the series diverges. If $\rho < 1$ then the series converges.
If $\rho = 1$ then the Ratio Test is inconclusive.

The reason why the Ratio Test works is quite simple.
• If $\rho > 1$, this indicates that, after a certain step $n$, the next number $|a_{n+1}|$ gets bigger than the previous number $|a_n|$, in absolute value. If we keep adding bigger and bigger numbers, we cannot hope to get closer and closer to some limiting number, so the series must diverge.
• On the other hand, if $\rho < 1$, this indicates that, after a certain step $n$, the next number $|a_{n+1}|$ gets smaller than the previous number $|a_n|$, and it gets smaller by about a factor of $\rho < 1$. For example, if $\rho = 0.5$ and $a_n = 1$ then (ignoring possible $\pm$ signs) the next number is about 0.5, the next number is about 0.25, the next number is about 0.125, etc. These numbers are getting small quickly, so when we add them up we do get closer and closer to some limiting number.
• If $\rho = 1$ then the numbers $a_n$ might decrease, but they do not decrease fast enough for us to be able to conclusively tell whether their sum converges to something. Usually, a more careful analysis is needed in this case, but for the purpose of studying the radius of convergence, the Ratio Test will be enough.
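The ratio in the test is easy to explore numerically. The sketch below is a rough illustration only: the function name `ratio_estimate` and the two sample sequences are our own choices, and evaluating the ratio at a single large $n$ merely suggests the limit $\rho$ rather than proving it.

```python
def ratio_estimate(term, n):
    """Return |a_(n+1)| / |a_n| for a sequence given by the function `term`."""
    return abs(term(n + 1)) / abs(term(n))

# Two sample sequences chosen for illustration:
# a_n = 1/2^n (ratio 0.5) and a_n = 2^n / n^2 (ratio close to 2).
decaying = ratio_estimate(lambda n: 1 / 2**n, 50)
growing = ratio_estimate(lambda n: 2**n / n**2, 50)

print(decaying)  # 0.5, so rho < 1 and the series converges
print(growing)   # about 1.92, approaching 2 > 1, so the series diverges
```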
We recall that the Taylor series of a function $f(x)$ centred at $a$ converges on some interval of the form $a - R < x < a + R$
[Figure: number line for $x$ with the points $a - R$, $a$ and $a + R$ marked]
or, in other words, for $x$ in between $a - R$ and $a + R$ for some number $R$ which is called the radius of convergence. In the examples below we will see that such behaviour is, indeed, a consequence of the Ratio Test.
We have stated in Section 5.1 that the radius of convergence of the Taylor series of the exponential function $e^x$ is $R = \infty$. In our first example, we will check this using the Ratio Test.
Example 1. Find the radius of convergence of the Taylor series:
\[
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!}.
\]
Solution: To use the Ratio Test, we consider two consecutive terms in the series,
\[
|a_n| = \frac{|x|^n}{n!} \qquad\text{and}\qquad |a_{n+1}| = \frac{|x|^{n+1}}{(n+1)!},
\]
(do not forget the absolute values!) and then compute their ratio,
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{n+1}}{(n+1)!}\;}{\frac{|x|^n}{n!}} = \frac{|x|^{n+1}}{(n+1)!} \cdot \frac{n!}{|x|^n} = \frac{|x|^{n+1}}{|x|^n} \cdot \frac{n!}{(n+1)!} = |x|\, \frac{n!}{(n+1)!}.
\]
Notice how dividing $|x|^{n+1}$ by $|x|^n$ cancels $|x|^n$, so we are left with one power of $|x|$. This will be a typical feature when applying the Ratio Test to Taylor series. Notice that we can also simplify the ratio of factorials
\[
\frac{n!}{(n+1)!} = \frac{1 \cdot 2 \cdots (n-1) \cdot n}{1 \cdot 2 \cdots (n-1) \cdot n \cdot (n+1)} = \frac{1}{n+1}
\]
because we could cancel out $1 \cdot 2 \cdots (n-1) \cdot n$ in the numerator and denominator. As a result, the ratio is $\frac{|x|}{n+1}$ and its limit is
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{|x|}{n+1} = 0.
\]
Since this limit is less than 1, the Ratio Test tells us that the series converges. Notice that the limit was 0 no matter what $x$ was, so this conclusion works for all $x$, which means that the radius of convergence is $R = \infty$.
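The simplification of the ratio to $\frac{|x|}{n+1}$ can be double-checked with exact arithmetic. The following Python sketch uses the standard `fractions` module; the helper name `exp_term` and the sample value $x = 10$ are assumptions made for illustration.

```python
from fractions import Fraction
import math

def exp_term(x, n):
    """Exact value of the n-th term x^n / n! of the series for e^x."""
    return Fraction(x)**n / math.factorial(n)

x = 10
ratios = [abs(exp_term(x, n + 1)) / abs(exp_term(x, n)) for n in (10, 100, 1000)]
print(ratios)  # [Fraction(10, 11), Fraction(10, 101), Fraction(10, 1001)]
```

Each ratio equals $\frac{10}{n+1}$ exactly, and these values shrink toward 0 as $n$ grows, even though $x = 10$ is far from the centre.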
Exercise 1. Find the radius of convergence of the Taylor series:
\[
\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots = -\sum_{n=1}^{\infty} \frac{x^n}{n}.
\]
Example 2. Find the radius of convergence of the Taylor series:
\[
\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\, x^{2n}.
\]
Solution: To use the Ratio Test, we consider two consecutive terms in the series,
\[
|a_n| = \frac{|x|^{2n}}{(2n)!} \qquad\text{and}\qquad |a_{n+1}| = \frac{|x|^{2(n+1)}}{(2(n+1))!} = \frac{|x|^{2n+2}}{(2n+2)!}.
\]
Notice how $2n$ does not become $2n + 1$, but $2(n+1) = 2n + 2$. That is because we have to replace each appearance of $n$ by $n + 1$. Their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{2n+2}}{(2n+2)!}\;}{\frac{|x|^{2n}}{(2n)!}} = \frac{|x|^{2n+2}}{(2n+2)!} \cdot \frac{(2n)!}{|x|^{2n}} = \frac{|x|^{2n+2}}{|x|^{2n}} \cdot \frac{(2n)!}{(2n+2)!} = |x|^2\, \frac{(2n)!}{(2n+2)!}.
\]
Notice how dividing $|x|^{2n+2}$ by $|x|^{2n}$ cancels $|x|^{2n}$, so we are left with $|x|^2$, not $|x|$. Factorials here also simplify differently:
\[
\frac{(2n)!}{(2n+2)!} = \frac{1 \cdot 2 \cdots (2n-1) \cdot 2n}{1 \cdot 2 \cdots (2n-1) \cdot 2n \cdot (2n+1)(2n+2)} = \frac{1}{(2n+1)(2n+2)}
\]
because we could cancel out $1 \cdot 2 \cdots (2n-1) \cdot 2n$ in the numerator and denominator. Because $2n$ increased by 2, there are two extra factors left in this case. As a result, the ratio is $\frac{|x|^2}{(2n+1)(2n+2)}$ and its limit is
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{|x|^2}{(2n+1)(2n+2)} = 0.
\]
Since this limit is less than 1, the Ratio Test tells us that the series converges for all $x$, which means that the radius of convergence is $R = \infty$.
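Exercise 2. Find the radius of convergence of the Taylor series:
\[
\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1}.
\]
In the examples above, we checked that the radius of convergence is what was claimed in the table at the beginning of Section 5.1. Now, let us try some new series.
Example 3. Find the radius of convergence of the series:
\[
\sum_{n=1}^{\infty} \frac{(-2)^n}{n^2}\, (x+5)^{2n}.
\]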
Solution: We consider two consecutive terms in the series,
\[
|a_n| = \frac{2^n}{n^2}\, |x+5|^{2n} \qquad\text{and}\qquad |a_{n+1}| = \frac{2^{n+1}}{(n+1)^2}\, |x+5|^{2(n+1)}.
\]
Their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\frac{2^{n+1}}{(n+1)^2}\, |x+5|^{2(n+1)}}{\frac{2^n}{n^2}\, |x+5|^{2n}} = 2|x+5|^2\, \frac{n^2}{(n+1)^2}.
\]
To compute the limit, we divide the numerator and denominator by the highest power of $n$, which is $n^2$ in this case, so
\begin{align*}
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} &= \lim_{n\to\infty} 2|x+5|^2\, \frac{n^2}{(n+1)^2} = 2|x+5|^2 \lim_{n\to\infty} \frac{\frac{n^2}{n^2}}{\big(\frac{n+1}{n}\big)^2} \\
&= 2|x+5|^2 \lim_{n\to\infty} \frac{1}{\big(1 + \frac{1}{n}\big)^2} = 2|x+5|^2\, \frac{1}{(1+0)^2} = 2|x+5|^2.
\end{align*}
By the Ratio Test, the series converges when $2|x+5|^2 < 1$. Solving this for $x$,
\begin{align*}
|x+5|^2 < \frac{1}{2} &\implies |x+5| < \frac{1}{\sqrt{2}} \implies -\frac{1}{\sqrt{2}} < x + 5 < \frac{1}{\sqrt{2}} \\
&\implies -5 - \frac{1}{\sqrt{2}} < x < -5 + \frac{1}{\sqrt{2}}.
\end{align*}
This means that the centre is $a = -5$, as it should be because the series is written in terms of powers of $(x+5)^n = (x-(-5))^n$, which looks like $(x-a)^n$ with $a = -5$. The radius of convergence is $R = \frac{1}{\sqrt{2}}$.
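The limit $2|x+5|^2$ can be illustrated by evaluating the ratio of consecutive terms at a fairly large $n$. This is a minimal Python sketch; the function names `term` and `ratio`, the value $n = 200$ and the two sample points are our own choices.

```python
def term(x, n):
    """|a_n| = 2^n / n^2 * |x + 5|^(2n) for the series in Example 3."""
    return 2**n / n**2 * abs(x + 5)**(2 * n)

def ratio(x, n):
    """Ratio |a_(n+1)| / |a_n| of two consecutive terms."""
    return term(x, n + 1) / term(x, n)

inside = ratio(-4.5, 200)   # |x + 5| = 0.5 < 1/sqrt(2), limit is 2 * 0.25 = 0.5
outside = ratio(-4.2, 200)  # |x + 5| = 0.8 > 1/sqrt(2), limit is 2 * 0.64 = 1.28

print(inside, outside)
```

The first ratio is close to $0.5 < 1$ and the second is close to $1.28 > 1$, consistent with the boundary $|x+5| = \frac{1}{\sqrt{2}} \approx 0.707$.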
Comment. In Exercise 1 and Example 3, we had some polynomial $P(n)$ in the denominator, namely, $n$ or $n^2$. In both cases we saw that, if we ignore other factors, the limit of the ratio of polynomial factors converged to 1:
\[
\lim_{n\to\infty} \frac{P(n+1)}{P(n)} = 1.
\]
This is always true for any polynomial, because when we divide the numerator and denominator by the highest power of $n$, it makes all the lower degree terms disappear. If we are looking for a radius of convergence, for example, on a multiple choice question on the exam where we do not need to show our work, we can simply erase any polynomial factors from the beginning. For example, the series
\[
\sum_{n=1}^{\infty} \frac{(-2)^n (n^4 - 7n + 3)}{n^2 + 3n + 1}\, (x+5)^{2n} \qquad\text{and}\qquad \sum_{n=1}^{\infty} (-2)^n (x+5)^{2n}
\]
have the same interval and radius of convergence, because the factors $n^4 - 7n + 3$ and $n^2 + 3n + 1$ in the numerator and denominator of the first series will not affect the limit. The second one is much easier to work with. Of course, other factors involving factorials like $n!$ and exponentials like $5^n$ cannot be ignored.
Exercise 3. Find the radius of convergence of the series:
\[
\sum_{n=0}^{\infty} \frac{n^3 + n}{5^n}\, (x-4)^n.
\]
Pretend that this is a multiple choice question on the exam.

Example 4. (a) If the series $\sum_{n=0}^{\infty} c_n \big(\frac{x-1}{4}\big)^n$ has the interval of convergence $(-7, 9)$, where does the series $\sum_{n=0}^{\infty} c_n \big(\frac{2x+1}{5}\big)^n$ converge? (b) Is it ever possible that the series $\sum_{n=0}^{\infty} c_n \big(\frac{2x+1}{5}\big)^n$ converges on the interval $(0, 2)$?
Solution: (a) The key step here is to use the given information to determine where the series $\sum_{n=0}^{\infty} c_n (\text{🍌})^n$ converges. The original series $\sum_{n=0}^{\infty} c_n \big(\frac{x-1}{4}\big)^n$ converges when $-7 < x < 9$ and $\text{🍌} = \frac{x-1}{4}$, so we need to rewrite $-7 < x < 9$ in terms of $\frac{x-1}{4}$.
First, we subtract 1, so $-7 < x < 9$ becomes $-8 < x - 1 < 8$. Then we divide by 4 and get $-2 < \frac{x-1}{4} < 2$. As a result, the given information can be rephrased like this: $\sum_{n=0}^{\infty} c_n (\text{🍌})^n$ converges when $-2 < \text{🍌} < 2$.
In the second series, $\text{🍌} = \frac{2x+1}{5}$, so the series $\sum_{n=0}^{\infty} c_n \big(\frac{2x+1}{5}\big)^n$ converges when $-2 < \frac{2x+1}{5} < 2$ and, solving this for $x$, we get $-10 < 2x + 1 < 10$, or $-\frac{11}{2} < x < \frac{9}{2}$. The center of this series is in the middle of $-\frac{11}{2}$ and $\frac{9}{2}$, which is $a = \frac{1}{2}\big(-\frac{11}{2} + \frac{9}{2}\big) = -\frac{1}{2}$. The radius of convergence is $R = \frac{9}{2} - \big(-\frac{1}{2}\big) = 5$.
(b) The series $\sum_{n=0}^{\infty} c_n \big(\frac{2x+1}{5}\big)^n$ is written in terms of the powers of $\frac{2x+1}{5}$, so the centre must always be where $\frac{2x+1}{5} = 0$, or $x = -\frac{1}{2}$. It is not possible that the interval of convergence of this series is $(0, 2)$, because its center would be at 1 and not $-\frac{1}{2}$.
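The algebra in part (a) is a small linear inequality, and it can be reproduced with exact fractions. In the sketch below, the function name `solve_interval` and its parameters are hypothetical; it assumes the substituted expression has the form $\frac{px+q}{r}$ with $p > 0$ and $r > 0$.

```python
from fractions import Fraction

def solve_interval(lo, hi, p, q, r):
    """Solve lo < (p*x + q)/r < hi for x, assuming p > 0 and r > 0."""
    left = Fraction(lo * r - q, p)
    right = Fraction(hi * r - q, p)
    return left, right

# The series in the placeholder converges for -2 < placeholder < 2,
# and the placeholder is (2x + 1)/5.
left, right = solve_interval(-2, 2, 2, 1, 5)
center = (left + right) / 2
radius = (right - left) / 2
print(left, right, center, radius)  # -11/2 9/2 -1/2 5
```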
Exercise 4. (a) If the series $\sum_{n=0}^{\infty} c_n \big(\frac{x-1}{2}\big)^n$ has the radius of convergence $R = 2$, where does the series $\sum_{n=0}^{\infty} c_n \big(\frac{x+2}{4}\big)^n$ converge? (b) Is it ever possible that the series $\sum_{n=0}^{\infty} c_n \big(\frac{x+2}{4}\big)^n$ converges on the interval $(-4, 0)$?
Using symmetry. In the next two problems we will use the fact that the interval of convergence must be symmetric around the center:
[Figure: number line for $x$ with the points $a - R$, $a$ and $a + R$ marked]
Convergence at the endpoints $a - R$ and $a + R$ of the interval cannot be determined from the Ratio Test, because this is exactly where it is inconclusive. So we cannot say anything about the endpoints and, usually, they need to be handled separately. Sometimes the series might converge at one or both endpoints, but sometimes it might diverge at both endpoints.
Example 5. Suppose we know that the series $\sum_{n=0}^{\infty} c_n (x-2)^n$ converges at $x = 3$ but diverges at $x = -1$. What can we tell about the series convergence at $x = 0$, $x = 1$, $x = 0.5$, $x = 5$ and $x = 7$?
Solution: The center of the series is where $x - 2 = 0$, or at $x = 2$. The statement about convergence or divergence must be symmetric around the center. Because we know that the series converges at $x = 3$ but diverges at $x = -1$, this leads to the following diagram:
[Diagram: number line for $x$ with marks at $-1$, $1$, $2$, $3$, $5$; the series diverges to the left of $-1$, is undetermined (???) between $-1$ and $1$, converges between $1$ and $3$, is undetermined (???) between $3$ and $5$, and diverges to the right of $5$.]
Because the series converges at $x = 3$, it definitely converges in between the center 2 and 3 and, by symmetry, it definitely converges between 1 and 2. Because it diverges at $x = -1$, it must also diverge to the left of $-1$ and, by symmetry, it must diverge to the right of $x = 5$ ($-1$ and $5$ are at the same distance from the center). This leaves the intervals in between $-1$ and $1$, and in between $3$ and $5$. Here we have no information to tell whether the series converges or diverges. As a result, we can tell that at $x = 7$ the series diverges, and at $x = 0$ and $x = 0.5$, which both lie between $-1$ and $1$, we cannot tell whether it converges or diverges given what we know. We also cannot tell what happens at $x = 1$ and $x = 5$, because they could be the endpoints of the interval of convergence.
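The symmetry argument depends only on distances from the center, so it can be written as a short decision rule. The Python sketch below is our own illustration; the function name `classify` and its return strings are hypothetical.

```python
def classify(x, center, converges_at, diverges_at):
    """Decide what is known at x from one convergent and one divergent point."""
    if x == converges_at:
        return "converges"   # given directly
    if x == diverges_at:
        return "diverges"    # given directly
    distance = abs(x - center)
    if distance < abs(converges_at - center):
        return "converges"   # strictly closer to the center than a convergent point
    if distance > abs(diverges_at - center):
        return "diverges"    # strictly farther from the center than a divergent point
    return "cannot tell"     # in between, or a possible endpoint

# Data of Example 5: center 2, converges at x = 3, diverges at x = -1.
for x in (2.5, 0, 1, 5, 7):
    print(x, classify(x, 2, 3, -1))
```

For the data of Example 5 this reports convergence at $x = 2.5$, divergence at $x = 7$, and "cannot tell" at $x = 0$, $x = 1$ and $x = 5$.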
Exercise 5. Suppose we know that the Taylor series $\sum_{n=0}^{\infty} \frac{f^{(n)}(-2)}{n!}\, (x+2)^n$ converges at $x = 2$ but diverges at $x = 4$. What can we tell about the series convergence at $x = -9$, $x = -6$, $x = -5$, $x = -4$, $x = 1$ and $x = 3$?
Answer to Exercise 1. Two consecutive terms, with absolute values, are
\[
|a_n| = \frac{|x|^n}{n} \qquad\text{and}\qquad |a_{n+1}| = \frac{|x|^{n+1}}{n+1},
\]
and their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{n+1}}{n+1}\;}{\frac{|x|^n}{n}} = |x|\, \frac{n}{n+1}.
\]
To compute the limit, we divide the numerator and denominator of $\frac{n}{n+1}$ by the highest power of $n$, in this case $n$ itself, and we get
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} |x|\, \frac{n}{n+1} = |x| \lim_{n\to\infty} \frac{\frac{n}{n}}{\frac{n+1}{n}} = |x| \lim_{n\to\infty} \frac{1}{1 + \frac{1}{n}} = |x|\, \frac{1}{1+0} = |x|.
\]
The Ratio Test tells us that the series converges when this limit is smaller than 1, i.e. $|x| < 1$, or $-1 < x < 1$. This means that the center is $a = 0$ and the radius of convergence is $R = 1$.
Answer to Exercise 2. To use the Ratio Test, we consider two consecutive terms in the series,
\[
|a_n| = \frac{|x|^{2n+1}}{(2n+1)!} \qquad\text{and}\qquad |a_{n+1}| = \frac{|x|^{2(n+1)+1}}{(2(n+1)+1)!} = \frac{|x|^{2n+3}}{(2n+3)!}.
\]
Their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{2n+3}}{(2n+3)!}\;}{\frac{|x|^{2n+1}}{(2n+1)!}} = \frac{|x|^{2n+3}}{(2n+3)!} \cdot \frac{(2n+1)!}{|x|^{2n+1}} = \frac{|x|^{2n+3}}{|x|^{2n+1}} \cdot \frac{(2n+1)!}{(2n+3)!} = |x|^2\, \frac{(2n+1)!}{(2n+3)!}.
\]
Factorials simplify to
\[
\frac{(2n+1)!}{(2n+3)!} = \frac{1 \cdot 2 \cdots 2n \cdot (2n+1)}{1 \cdot 2 \cdots 2n \cdot (2n+1) \cdot (2n+2)(2n+3)} = \frac{1}{(2n+2)(2n+3)}.
\]
As a result, the ratio is $\frac{|x|^2}{(2n+2)(2n+3)}$ and its limit is
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{|x|^2}{(2n+2)(2n+3)} = 0.
\]
Since this limit is less than 1, the Ratio Test tells us that the series converges for all $x$, which means that the radius of convergence is $R = \infty$.
Answer to Exercise 3. On a multiple choice question, we can pretend that the polynomial factor $n^3 + n$ is not there and that our series is
\[
\sum_{n=0}^{\infty} \frac{1}{5^n}\, (x-4)^n = \sum_{n=0}^{\infty} \Big(\frac{x-4}{5}\Big)^n.
\]
From here, we can proceed in two ways. The fastest way is to remember that the geometric series $\sum_{n=0}^{\infty} (\text{🍌})^n$ converges when $-1 < \text{🍌} < 1$. In this case 🍌 is $\frac{x-4}{5}$, so the above series converges when $-1 < \frac{x-4}{5} < 1$, or $-5 < x - 4 < 5$, or $-1 < x < 9$. So the center is the midpoint $a = 4$ and the radius of convergence is $R = 5$.
Another way is to use the Ratio Test. Two consecutive terms are
\[
|a_n| = \Big|\frac{x-4}{5}\Big|^n \qquad\text{and}\qquad |a_{n+1}| = \Big|\frac{x-4}{5}\Big|^{n+1},
\]
and their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\big|\frac{x-4}{5}\big|^{n+1}}{\big|\frac{x-4}{5}\big|^n} = \Big|\frac{x-4}{5}\Big|.
\]
This does not depend on $n$, so the limit is
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \Big|\frac{x-4}{5}\Big|.
\]
By the Ratio Test, the series converges when $\big|\frac{x-4}{5}\big| < 1$ and, solving it for $x$, we get $-5 < x - 4 < 5$, or $-1 < x < 9$, as above.
Answer to Exercise 4. (a) The series $\sum_{n=0}^{\infty} c_n \big(\frac{x-1}{2}\big)^n$ has the centre where $\frac{x-1}{2} = 0$, or $x = 1$. Since the radius of convergence is $R = 2$, it means that it converges when $1 - 2 < x < 1 + 2$. We can rewrite it in terms of $\text{🍌} = \frac{x-1}{2}$ as $-1 < \text{🍌} < 1$. This means that the series $\sum_{n=0}^{\infty} c_n \big(\frac{x+2}{4}\big)^n$ converges when $-1 < \frac{x+2}{4} < 1$, or $-4 < x + 2 < 4$, or $-6 < x < 2$. The center is $a = -2$ and the radius of convergence is $R = 4$.
(b) The center of the interval $(-4, 0)$ is $-2$, so it is possible that it could be the interval of convergence of the series $\sum_{n=0}^{\infty} c_n \big(\frac{x+2}{4}\big)^n$, which must have the center where $\frac{x+2}{4} = 0$, or $x = -2$.
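Answer to Exercise 5. The diagram here will be
[Diagram: number line for $x$ with marks at $-8$, $-6$, $-2$, $2$, $4$; the series diverges to the left of $-8$, is undetermined (???) between $-8$ and $-6$, converges between $-6$ and $2$, is undetermined (???) between $2$ and $4$, and diverges to the right of $4$.]
At $x = -5$, $x = -4$ and $x = 1$ the series converges. At $x = -9$ it diverges. And at $x = 3$, we cannot tell if it converges or diverges. The point $x = -6$ could be the endpoint of the interval of convergence, so we cannot tell either if the series converges or diverges there.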
5.4 Applications of Taylor series

In this section we will go over some applications of Taylor polynomials and series.
We have already seen applications of Taylor polynomials to approximating inte-
grals, and approximating solutions of some differential equations. In this section
we will consider some new applications to computing derivatives, integrals, limits,
and comparing functions near some point. We will also quickly review some old
applications.
The first two problems will emphasize the connection between the coefficients
of the Taylor series $\sum_{n=0}^{\infty} c_n (x-a)^n$ centered at $x = a$ of some function $f(x)$ and
the derivatives of this function, namely,
$$c_n = \frac{f^{(n)}(a)}{n!} \quad\text{or}\quad f^{(n)}(a) = n!\,c_n.$$
Of course, this is how the coefficients cn of the Taylor series are defined, but if we
can compute the Taylor series first, we can use the coefficients cn to compute the
derivatives.
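To see the relation $f^{(n)}(a) = n!\,c_n$ in action, the following Python sketch differentiates a polynomial in powers of $(x - a)$ term by term and compares the result with $n!\,c_n$. The function name `derivative_at_center` and the sample coefficients are invented for this illustration; a polynomial stands in for a truncated Taylor series.

```python
from fractions import Fraction
from math import factorial


def derivative_at_center(coeffs, n):
    """n-th derivative at x = a of sum_k coeffs[k] * (x - a)**k.

    Differentiates the coefficient list n times, then reads off the
    constant term, which is the value at x = a.
    """
    poly = [Fraction(c) for c in coeffs]
    for _ in range(n):
        # d/dx of c_k (x - a)^k is k * c_k (x - a)^(k - 1)
        poly = [k * c for k, c in enumerate(poly)][1:]
    return poly[0] if poly else Fraction(0)


# Arbitrary sample coefficients c_0, ..., c_4 (an assumption for the demo).
coeffs = [3, -1, Fraction(1, 2), 5, Fraction(-2, 7)]
for n, c_n in enumerate(coeffs):
    assert derivative_at_center(coeffs, n) == factorial(n) * c_n
```

The loop passes silently because each derivative at the center equals $n!$ times the matching coefficient; derivatives of order higher than the degree come out as zero.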
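Example 1. Suppose that
$$f(x) = \sum_{n=0}^{\infty} c_n (x-2)^n \quad\text{and}\quad f^{(n)}(2) = \begin{cases} \dfrac{n!}{2^n} & \text{if } n \text{ is even,} \\[4pt] \dfrac{1}{n!} & \text{if } n \text{ is odd.} \end{cases}$$
Find the formula for the coefficient $c_n$.

Solution: Using the formula above,
$$c_n = \frac{f^{(n)}(2)}{n!} = \begin{cases} \dfrac{1}{2^n} & \text{if } n \text{ is even,} \\[4pt] \dfrac{1}{(n!)^2} & \text{if } n \text{ is odd.} \end{cases}$$
Here, we simply applied the definition of the coefficients of the Taylor series. For
example, $c_{99} = \frac{1}{(99!)^2}$, because 99 is odd.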
Exercise 1. Consider the function
$$f(x) = \sum_{n=20}^{\infty} \frac{5^n}{\sqrt{n+3}} (x+4)^n.$$
Which of the following are true?
(a) $f^{(19)}(-4) = 0$
(b) $f^{(22)}(-4) = \frac{5^{22}}{5}$
(c) $f^{(22)}(-4) = 5^{21} \cdot 22!$
(d) $f^{(25)}(-4) = 5^{25}$
Computing derivatives. In the next two problems we will have to compute the
series first using some basic transformations of classic series. We will also use a
convenient notation for the derivative $f^{(n)}(a)$, namely,
$$\frac{d^n}{dx^n} f(x) \Big|_{x=a}.$$
The notation expresses that we compute the $n$th derivative $\frac{d^n}{dx^n}$ of the function $f(x)$ and
then evaluate it at $x = a$.
Example 2. Compute
$$\frac{d^{20}}{dx^{20}}\, x e^{-x^2} \Big|_{x=0} \quad\text{and}\quad \frac{d^{21}}{dx^{21}}\, x e^{-x^2} \Big|_{x=0}.$$
Solution: First, we need to find the Taylor series for $x e^{-x^2}$ centered at 0. Starting
with the exponential series $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, we replace $x$ by $-x^2$,
$$e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n},$$
and then multiply it by $x$,
$$x e^{-x^2} = x \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n+1}.$$
To find the derivative $\frac{d^{20}}{dx^{20}}\, x e^{-x^2} \big|_{x=0}$, we need to find the coefficient in this series in
front of $x^{20}$ or, in other words, when the power $2n + 1 = 20$, or $n = \frac{19}{2} = 9.5$. This
is not an integer, so there is no term in the series with the power $x^{20}$. Another way to
see that $2n + 1$ cannot be equal to 20 is that it is always odd. Since the power
$x^{20}$ is not in the series, the coefficient $c_{20} = 0$ and, as a result,
$$\frac{d^{20}}{dx^{20}}\, x e^{-x^2} \Big|_{x=0} = 0.$$
To find the derivative $\frac{d^{21}}{dx^{21}}\, x e^{-x^2} \big|_{x=0}$, we need to find the coefficient $c_{21}$ in this
series in front of $x^{21}$ or, in other words, when the power $2n + 1 = 21$, or $n = 10$.
The coefficient $c_{21}$ in front of $x^{21}$ is $\frac{(-1)^{10}}{10!} = \frac{1}{10!}$, so the derivative is
$$\frac{d^{21}}{dx^{21}}\, x e^{-x^2} \Big|_{x=0} = 21!\,c_{21} = 21! \cdot \frac{1}{10!} = \frac{21!}{10!} = 14079294028800.$$
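The arithmetic in this example can be double-checked with exact fractions. The Python sketch below reads the coefficient of $x^k$ off the series $\sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n+1}$ and multiplies it by $k!$; the function names `coefficient` and `derivative_at_zero` are hypothetical helpers, not notation from these notes.

```python
from fractions import Fraction
from math import factorial


def coefficient(k):
    """Coefficient of x**k in x*exp(-x**2) = sum_n (-1)**n / n! * x**(2n + 1)."""
    if k % 2 == 0:
        return Fraction(0)      # only odd powers appear in the series
    n = (k - 1) // 2            # solve 2n + 1 = k for n
    return Fraction((-1) ** n, factorial(n))


def derivative_at_zero(k):
    """k-th derivative of x*exp(-x**2) at x = 0, using f^(k)(0) = k! * c_k."""
    return factorial(k) * coefficient(k)


print(derivative_at_zero(20))   # 0
print(derivative_at_zero(21))   # 14079294028800
```

Using `Fraction` keeps every step exact, so the large value $\frac{21!}{10!}$ is not affected by floating-point rounding.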
Exercise 2. Compute
$$\frac{d^{11}}{dx^{11}}\, x \sin(x) \Big|_{x=0} \quad\text{and}\quad \frac{d^{12}}{dx^{12}}\, x \sin(x) \Big|_{x=0}.$$
Computing limits. In the next two problems, we will apply Taylor series to
compute some limits.
Example 3. Compute the limit
$$\lim_{x\to 0} \frac{e^{-x^2} - 1 + x^2}{x^4}.$$
Solution: We cannot just plug in $x = 0$, because we will get $\frac{0}{0}$. Instead, we will need
to simplify first using Taylor series. As in the previous example, starting with the
exponential series $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, we replace $x$ by $-x^2$,
$$e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n} = 1 - x^2 + \frac{x^4}{2} + \dots.$$
If we move $1 - x^2$ to the left hand side, we get
$$e^{-x^2} - 1 + x^2 = \frac{x^4}{2} + \dots.$$
The dots . . . have powers at least $x^5$ (actually, in this case, at least $x^6$), so after we
divide both sides by $x^4$, we get
$$\frac{e^{-x^2} - 1 + x^2}{x^4} = \frac{1}{2} + \dots$$
where the dots . . . now have at least one power of $x$, because $\frac{x^5}{x^4} = x$. When we let
$x$ go to zero, all those . . . terms will disappear and so
$$\lim_{x\to 0} \frac{e^{-x^2} - 1 + x^2}{x^4} = \frac{1}{2}.$$
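A numerical experiment can make this limit more believable. The sketch below evaluates the quotient at a few small values of $x$; it is only a sanity check, not a proof. The name `quotient` is made up here, and `math.expm1` (which computes $e^{y} - 1$ accurately for small $y$) is used as a practical choice to limit round-off error.

```python
import math


def quotient(x):
    """(exp(-x**2) - 1 + x**2) / x**4, written with expm1 to limit round-off."""
    return (math.expm1(-x * x) + x * x) / x**4


for x in (0.5, 0.1, 0.01):
    # The printed values should get closer to 1/2 as x shrinks.
    print(x, quotient(x))
```

The deviation from $\frac{1}{2}$ shrinks roughly like $\frac{x^2}{6}$, which is the next term of the series after dividing by $x^4$.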
Exercise 3. Compute the limit
$$\lim_{x\to 0} \frac{\sin(x^3) - x^3}{x^9}.$$
Comparing functions near a point. In the last two problems we used that the
. . . terms in the Taylor series disappeared in the limit x → 0 as long as they had
at least one power of x. Next, we will use a similar idea to compare two functions
near x = 0. The next two problems will refer to the following figures.
[Figures not reproduced: a left and a right panel, each showing a blue solid graph and a red dashed graph near x = 0.]
Example 4. In the left figure above, which graph corresponds to:

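(a) $\cos(x)$ (b) $1 - \sin(x^2)$
Solution: Recall that $\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$ and $\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$ and,
replacing $x$ by $x^2$ in $\sin(x)$,
$$\sin(x^2) = x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \dots \quad\text{and}\quad 1 - \sin(x^2) = 1 - x^2 + \frac{x^6}{3!} - \frac{x^{10}}{5!} + \dots.$$
Comparing $\cos(x)$ and $1 - \sin(x^2)$ is equivalent to comparing
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots \quad\text{and}\quad 1 - x^2 + \frac{x^6}{3!} - \frac{x^{10}}{5!} + \dots.$$
First, we can cancel 1 on both sides, so we need to compare
$$-\frac{x^2}{2!} + \frac{x^4}{4!} - \dots \quad\text{and}\quad -x^2 + \frac{x^6}{3!} - \frac{x^{10}}{5!} + \dots.$$
Then we can divide both sides by $x^2$ and compare
$$-\frac{1}{2!} + \frac{x^2}{4!} - \dots \quad\text{and}\quad -1 + \frac{x^4}{3!} - \frac{x^8}{5!} + \dots.$$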
Near $x = 0$, the terms that have at least one power of $x$ will get smaller and smaller,
so near 0 the main contribution is $-\frac{1}{2}$ on the left hand side and $-1$ on the right
hand side. Since $-\frac{1}{2} > -1$, the left hand side is bigger near $x = 0$. As a result, we
conclude that $\cos(x) > 1 - \sin(x^2)$ near $x = 0$, so the blue solid graph corresponds
to $\cos(x)$ and the red dashed graph corresponds to $1 - \sin(x^2)$.
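The comparison can also be checked numerically at a few points near 0. In the sketch below, `f` and `g` are just local names for the two functions from the example; the sample points are an arbitrary choice.

```python
import math


def f(x):
    return math.cos(x)


def g(x):
    return 1 - math.sin(x**2)


for x in (-0.3, -0.1, 0.1, 0.3):
    # After cancelling 1 and dividing by x**2, the leading terms are
    # -1/2 and -1, so (f - g) / x**2 should be close to 1/2.
    print(x, f(x) > g(x), (f(x) - g(x)) / x**2)
```

At each sample point the first printed flag is `True`, in agreement with $\cos(x) > 1 - \sin(x^2)$ near $x = 0$ (for $x \neq 0$).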
Exercise 4. In the right figure above, which graph corresponds to:

(a) $1 - \cos(x)$ (b) $\sqrt{1 + x^2} - 1$
Hint: find three terms of the Taylor series for $\sqrt{1 + x}$ centered at $x = 0$ first.
Computing integrals. In Chapter 3 we used Taylor polynomials to approximate
integrals. Here we will compute some integrals exactly by representing
them as a series. The main fact to remember: we can integrate Taylor series term
by term inside the interval of convergence.
Example 5. Using the series $\ln(1 + x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n$, compute the integral
$$\int_1^{1.5} \ln(x)\, dx.$$
Solution: The integral can actually be computed using integration by parts, but here
we will try to use Taylor series. The function we integrate is $\ln(x)$, while the series
is for $\ln(1 + x)$. We can either change variables in the integral or in the series, so
let us make the substitution $x = 1 + t$, $dx = dt$ in the integral and rewrite it as
$$\int_1^{1.5} \ln(x)\, dx = \int_0^{0.5} \ln(1 + t)\, dt.$$
Recall that the radius of convergence of the above series is $R = 1$ and, as a series in $t$, its center is
0, so the interval of integration $[0, 0.5]$ is inside the interval of convergence and we
can integrate term by term:
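$$\begin{aligned}
\int_0^{0.5} \ln(1 + t)\, dt &= \int_0^{0.5} \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} t^n \, dt \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \int_0^{0.5} t^n \, dt \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \cdot \frac{t^{n+1}}{n+1} \Big|_{t=0}^{t=0.5} \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \cdot \frac{0.5^{n+1}}{n+1} \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}\, 0.5^{n+1}}{n(n+1)} \\
&= \frac{0.5^2}{2} - \frac{0.5^3}{6} + \frac{0.5^4}{12} - \dots.
\end{aligned}$$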
For example, if we sum the first three terms written above, we get 0.109375, while
the actual integral is 0.108198.
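To watch the partial sums approach the integral, here is a short Python sketch. The function name `partial_sum` is an invention for this illustration, and the exact value is taken from the antiderivative $x\ln(x) - x$ that integration by parts produces.

```python
import math


def partial_sum(terms):
    """First `terms` terms of sum_{n>=1} (-1)**(n+1) * 0.5**(n+1) / (n*(n+1))."""
    return sum((-1) ** (n + 1) * 0.5 ** (n + 1) / (n * (n + 1))
               for n in range(1, terms + 1))


# Exact value of the integral of ln(x) from 1 to 1.5, via x*ln(x) - x.
exact = 1.5 * math.log(1.5) - 1.5 + 1.0

for terms in (3, 5, 10, 20):
    print(terms, partial_sum(terms), exact)
```

With three terms the sum is 0.109375, as stated above, and each additional term shrinks the error by a factor of roughly one half.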
Exercise 5. Compute the following integral by representing it as a series:
$$\int_0^x \frac{\sin(t)}{t}\, dt.$$
Shape of graphs. Let us recall how the coefficients in front of the powers $(x - a)$
and $(x - a)^2$ in the Taylor series correspond to the properties of the graph of a
function $y = f(x)$.
Exercise 6. Which function among the four figures (a)-(d) has the Taylor series
$$f(x) = (x - 4) + \frac{1}{2}(x - 4)^2 + \dots.$$
[Figures (a), (b), (c), (d) not reproduced.]
Answer to Exercise 1. The center of the series is $a = -4$, so its coefficients allow
us to compute derivatives at $x = -4$ using the formula $f^{(n)}(-4) = n!\,c_n$.
(a) True. Notice that the series starts with index $n = 20$. In other words, the
lowest power is $(x + 4)^{20}$. Since there is no term $(x + 4)^{19}$, the coefficient in front
of it is zero, $c_{19} = 0$, so the derivative $f^{(19)}(-4) = 0$.
(b) False. Using the above formula,
$$f^{(22)}(-4) = 22!\,c_{22} = 22! \cdot \frac{5^{22}}{\sqrt{22 + 3}} = 22! \cdot \frac{5^{22}}{5} = 22! \cdot 5^{21}.$$
(c) True. It matches what we computed in part (b).
(d) False.
$$f^{(25)}(-4) = 25!\,c_{25} = 25! \cdot \frac{5^{25}}{\sqrt{25 + 3}} = 25! \cdot \frac{5^{25}}{\sqrt{28}} \neq 5^{25}.$$
Answer to Exercise 2. Multiplying the series for $\sin(x)$ by $x$,
$$x \sin(x) = x \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+2}.$$
Because the powers $2n + 2$ are always even, $\frac{d^{11}}{dx^{11}}\, x \sin(x) \big|_{x=0} = 0$.
To find $\frac{d^{12}}{dx^{12}}\, x \sin(x) \big|_{x=0}$, we need to find the coefficient $c_{12}$ in front of $x^{12}$. This
happens when $2n + 2 = 12$, or $n = 5$, so the coefficient is $c_{12} = \frac{(-1)^5}{(2 \cdot 5 + 1)!} = -\frac{1}{11!}$.
Then the derivative is
$$\frac{d^{12}}{dx^{12}}\, x \sin(x) \Big|_{x=0} = 12!\,c_{12} = 12! \cdot \left(-\frac{1}{11!}\right) = -12.$$
Answer to Exercise 3. Plugging $x^3$ into the series for $\sin(x)$:
$$\sin(x^3) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} (x^3)^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{6n+3} = x^3 - \frac{x^9}{3!} + \dots$$
where the dots . . . have at least 10 (actually 15) powers of $x$. Subtracting $x^3$ and
dividing by $x^9$, we get
$$\frac{\sin(x^3) - x^3}{x^9} = -\frac{1}{3!} + \dots$$
where the dots have at least one power of $x$. When $x$ goes to zero, those terms
disappear and we get
$$\lim_{x\to 0} \frac{\sin(x^3) - x^3}{x^9} = -\frac{1}{3!} = -\frac{1}{6}.$$
Answer to Exercise 4. From the Taylor series for $\cos(x)$, we know that
$$1 - \cos(x) = \frac{x^2}{2!} - \frac{x^4}{4!} + \dots = \frac{x^2}{2} - \frac{x^4}{24} + \dots.$$
Next, let us find the first three terms of the series for $f(x) = \sqrt{1 + x} = (1 + x)^{1/2}$.
We compute
$$f'(x) = \frac{1}{2}(1 + x)^{-1/2} = \frac{1}{2\sqrt{1 + x}},$$
$$f''(x) = -\frac{1}{2} \cdot \frac{1}{2}(1 + x)^{-3/2} = -\frac{1}{4(1 + x)^{3/2}},$$
so $f(0) = 1$, $f'(0) = \frac{1}{2}$ and $f''(0) = -\frac{1}{4}$. This gives the first three terms of the
Taylor series:
$$\sqrt{1 + x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \dots.$$
Plugging in $x^2$ and then subtracting 1 gives:
$$\sqrt{1 + x^2} - 1 = 1 + \frac{x^2}{2} - \frac{x^4}{8} + \dots - 1 = \frac{x^2}{2} - \frac{x^4}{8} + \dots.$$
So, comparing $1 - \cos(x)$ and $\sqrt{1 + x^2} - 1$ is equivalent to comparing
$$\frac{x^2}{2} - \frac{x^4}{24} + \dots \quad\text{and}\quad \frac{x^2}{2} - \frac{x^4}{8} + \dots.$$
Cancelling $\frac{x^2}{2}$ on both sides and then dividing by $x^4$ leads to comparing
$$-\frac{1}{24} + \dots \quad\text{and}\quad -\frac{1}{8} + \dots.$$
On both sides the dots . . . contain at least one power of $x$, so they become negligible
near $x = 0$, so we compare $-\frac{1}{24} > -\frac{1}{8}$. This means that $1 - \cos(x) > \sqrt{1 + x^2} - 1$
near $x = 0$, so the blue solid graph corresponds to $1 - \cos(x)$ and the red dashed graph
corresponds to $\sqrt{1 + x^2} - 1$.
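The three coefficients of $\sqrt{1 + x}$ can be cross-checked in Python. Note that the sketch below does not repeat the derivative computation above; it swaps in the standard binomial-series recurrence $c_k = c_{k-1} \cdot \frac{\alpha - (k-1)}{k}$ for $(1 + x)^{\alpha}$, which is an assumption brought in from outside this section. The function name `binomial_series_coeffs` is made up for the illustration.

```python
from fractions import Fraction
import math


def binomial_series_coeffs(alpha, count):
    """First `count` Taylor coefficients of (1 + x)**alpha at x = 0.

    Uses c_0 = 1 and the recurrence c_k = c_{k-1} * (alpha - (k - 1)) / k.
    """
    coeffs = [Fraction(1)]
    for k in range(1, count):
        coeffs.append(coeffs[-1] * (alpha - (k - 1)) / k)
    return coeffs


# Coefficients 1, 1/2, -1/8 of sqrt(1 + x), as exact fractions.
print(binomial_series_coeffs(Fraction(1, 2), 3))

for x in (0.1, 0.3):
    left = 1 - math.cos(x)
    right = math.sqrt(1 + x**2) - 1
    print(x, left, right, left > right)
```

The numerical comparison at the two sample points agrees with the conclusion that $1 - \cos(x)$ is the larger function near $x = 0$.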
Answer to Exercise 5. Since all the terms in the Taylor series for $\sin(t)$ have at
least one power of $t$, we can divide the series by $t$ to represent
$$\frac{\sin(t)}{t} = \frac{1}{t} \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} t^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} t^{2n}.$$
The series for $\sin(t)$ has the radius of convergence $R = \infty$, so we are allowed to
integrate it over any interval, term by term. Then
$$\begin{aligned}
\int_0^x \frac{\sin(t)}{t}\, dt &= \int_0^x \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} t^{2n} \, dt \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \int_0^x t^{2n} \, dt \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \cdot \frac{t^{2n+1}}{2n+1} \Big|_{t=0}^{t=x} \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \cdot \frac{x^{2n+1}}{2n+1} \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!\,(2n+1)}.
\end{aligned}$$
In this case, the integral cannot be computed by finding an antiderivative in terms of elementary functions, so using
a series representation is a great alternative.
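As a rough check of this series, the Python sketch below compares its partial sums with a direct numerical integration. Composite Simpson's rule is used here purely as an independent check and is not a method from this section; the names `si_series` and `si_simpson`, and the default numbers of terms and steps, are arbitrary choices for the illustration.

```python
import math


def si_series(x, terms=20):
    """Partial sum of sum_{n>=0} (-1)**n x**(2n+1) / ((2n+1)! (2n+1))."""
    return sum((-1) ** n * x ** (2 * n + 1)
               / (math.factorial(2 * n + 1) * (2 * n + 1))
               for n in range(terms))


def si_simpson(x, steps=1000):
    """Composite Simpson's rule for the same integral (steps must be even)."""
    def integrand(t):
        return 1.0 if t == 0 else math.sin(t) / t   # limit value 1 at t = 0
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


for x in (0.5, 1.0, 2.0):
    print(x, si_series(x), si_simpson(x))
```

The two columns should agree to many decimal places, which supports the term-by-term integration carried out above.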
Answer to Exercise 6. The coefficient 1 in front of $(x - 4)$ tells us that $f'(4) = 1$,
so the slope is 1, and the coefficient $\frac{1}{2}$ in front of $(x - 4)^2$ tells us that $f''(4) = 2! \cdot \frac{1}{2} = 1$,
so the function is concave up. Since there is no free constant $c_0$, we also know that
$f(4) = 0$.
We can eliminate (b) because it is concave down at x = 4. We can eliminate (d)
because it is not equal to 0 at x = 4. Finally, to choose between (a) and (c), we
observe that the slope at x = 4 in (c) looks closer to 2 than 1, so the answer is (a).
