UNIVERSITY PRESS OF FLORIDA
Florida A&M University, Tallahassee
Florida Atlantic University, Boca Raton
Florida Gulf Coast University, Ft. Myers
Florida International University, Miami
Florida State University, Tallahassee
New College of Florida, Sarasota
University of Central Florida, Orlando
University of Florida, Gainesville
University of North Florida, Jacksonville
University of South Florida, Tampa
University of West Florida, Pensacola
Orange Grove Texts Plus
Concepts in Calculus I
Beta Version
Miklos Bona and Sergei Shabanov
University of Florida Department of
Mathematics
University Press of Florida
Gainesville • Tallahassee • Tampa • Boca Raton
Pensacola • Orlando • Miami • Jacksonville • Ft. Myers • Sarasota
Copyright 2011 by the University of Florida Board of Trustees on behalf of the University of
Florida Department of Mathematics
This work is licensed under a modified Creative Commons Attribution-Noncommercial-No
Derivative Works 3.0 Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-nd/3.0/. You are free to electronically copy, distribute, and
transmit this work if you attribute authorship. However, all printing rights are reserved by the
University Press of Florida (http://www.upf.com). Please contact UPF for information about
how to obtain copies of the work for print distribution. You must attribute the work in the
manner speciﬁed by the author or licensor (but not in any way that suggests that they endorse
you or your use of the work). For any reuse or distribution, you must make clear to others the
license terms of this work. Any of the above conditions can be waived if you get permission from
the University Press of Florida. Nothing in this license impairs or restricts the author’s moral
rights.
ISBN 9781616101558
Orange Grove Texts Plus is an imprint of the University Press of Florida, which is the scholarly
publishing agency for the State University System of Florida, comprising Florida A&M
University, Florida Atlantic University, Florida Gulf Coast University, Florida International
University, Florida State University, New College of Florida, University of Central Florida,
University of Florida, University of North Florida, University of South Florida, and University of
West Florida.
University Press of Florida
15 Northwest 15th Street
Gainesville, FL 32611-2079
http://www.upf.com
Contents
Chapter 1. Functions
1. Functions
2. Classes of Functions
3. Operations on Functions
4. Viewing the Graphs of Functions
5. Inverse Functions
6. The Velocity Problem and the Tangent Problem
Chapter 2. Limits and Derivatives
7. The Limit of a Function
8. Limit Laws
9. Continuous Functions
10. Limits at Infinity
11. Derivatives
12. The Derivative as a Function
Chapter 3. Rules of Differentiation
13. Derivatives of Polynomial and Exponential Functions
14. The Product and Quotient Rules
15. Derivatives of Trigonometric Functions
16. The Chain Rule
17. Implicit Differentiation
18. Derivatives of Logarithmic Functions
19. Applications of Rates of Change
20. Related Rates
21. Linear Approximations and Differentials
Chapter 4. Applications of Differentiation
22. Minimum and Maximum Values
23. The Mean Value Theorem
24. The First and Second Derivative Tests
25. Taylor Polynomials and the Local Behavior of a Function
26. L’Hospital’s Rule
27. Analyzing the Shape of a Graph
28. Optimization Problems
29. Newton’s Method
30. Antiderivatives
Chapter 5. Integration
31. Areas and Distances
32. The Definite Integral
33. The Fundamental Theorem of Calculus
34. Indefinite Integrals and the Net Change
35. The Substitution Rule
CHAPTER 1
Functions
1. Functions
A function f is a rule that associates to each element x in a set D
a unique element f(x) of another set R. Here the set D is called the
domain of f, while the set R is called the range of f. The fact that f
associates to each element of D an element of R is represented by the
symbol f : D → R. Instead of saying that f associates f(x) to x, we
often say that f sends x to f(x), which is shorter.
If the sets mentioned in the previous deﬁnition are sets of numbers,
then it is often easier to describe f by an algebraic expression. Let N
be the set of all natural numbers (which are the nonnegative integers).
Then the function f : N → N given by the rule f(x) = 2x + 3 is
the function that sends each nonnegative integer n to the nonnegative
integer 2n+3. For instance, it sends 0 to 3, 1 to 5, 17 to 37, and so on.
In this case, the algebraic description is simpler than actually saying
“f is the function that sends n to 2n + 3.”
The rule that describes f may be simple or complicated. It could
be that a function is deﬁned by cases such as
\[
f(x) =
\begin{cases}
0.1x & \text{if } 0 \le x \le 40, \\
4 + 0.15(x - 40) & \text{if } 40 < x \le 80, \\
10 + 0.2(x - 80) & \text{if } x > 80.
\end{cases}
\]
This example could describe an income tax code. The first $40,000
of income is taxed at a rate of 10%, income above $40,000 but at most
$80,000 is taxed at a rate of 15%, and income above $80,000 is taxed
at a rate of 20%. The value of f(x) is the amount of tax, in thousands
of dollars, to be paid on an income of x thousand dollars for any
nonnegative real number x.
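A function defined by cases translates directly into a chain of conditions. The following is a minimal Python sketch of the tax rule; the name `tax` and the error raised for negative incomes are assumptions made for illustration and do not come from the text.

```python
def tax(x: float) -> float:
    """Tax, in thousands of dollars, on an income of x thousand dollars."""
    if x < 0:
        # Assumption: the rule is only meant for nonnegative incomes.
        raise ValueError("income must be nonnegative")
    if x <= 40:
        return 0.1 * x                 # 10% on the first 40 thousand
    if x <= 80:
        return 4 + 0.15 * (x - 40)     # 4 thousand plus 15% above 40 thousand
    return 10 + 0.2 * (x - 80)         # 10 thousand plus 20% above 80 thousand
```

Note that the three cases agree at the break points x = 40 and x = 80, so the tax does not jump when the income crosses a bracket boundary.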
There are times when the rules that apply in various cases are
closely connected to each other. A classic example is the absolute value
function, that is,
\[
f(x) = |x| =
\begin{cases}
x & \text{if } 0 \le x, \\
-x & \text{if } x < 0.
\end{cases}
\]
In this case, f(x) = f(−x) for all x. When that happens, we say that f
is an even function. For instance, \(g(x) = \cos x\) and \(h(x) = x^2\) are even
functions. There are also functions for which −f(x) = f(−x) holds for
all x. Then we say that f is an odd function. Examples of odd functions
include \(g(x) = \sin x\) and \(h(x) = x^3\).
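The two symmetry conditions can be spot-checked numerically. The sketch below is only an illustration: the helper names `looks_even` and `looks_odd`, the sample points, and the tolerance are all assumptions. Passing the check at finitely many points does not prove that a function is even or odd, but failing it at a single point shows that it is not.

```python
import math

def looks_even(f, points, tol=1e-12):
    """True if f(x) = f(-x) at every sample point, up to rounding error."""
    return all(abs(f(x) - f(-x)) <= tol for x in points)

def looks_odd(f, points, tol=1e-12):
    """True if -f(x) = f(-x) at every sample point, up to rounding error."""
    return all(abs(f(-x) + f(x)) <= tol for x in points)

samples = [0.0, 0.5, 1.0, 2.0, 3.7]

def cube(x):
    return x ** 3

even_results = [looks_even(math.cos, samples), looks_even(abs, samples)]
odd_results = [looks_odd(math.sin, samples), looks_odd(cube, samples)]
```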
There are times when a plain English description of a function is
simpler than an algebraic one. For instance, “let g be the function
that sends each integer that is at least 2 into its largest prime divisor”
is simpler than describing that function with algebraic symbols (and
symbols of formal logic). If the sets D and R are not sets of numbers,
an algebraic description may not even be possible. An example of this
is when D and R are both sets of people and f(x) is the biological
father of person x. Note that it is not by accident that we said that
f(x) is the father (and not the son) of x. Indeed, a function must send
x to a unique f(x). While a person has only one biological father, he
or she may have several sons.
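Although the function g that sends an integer to its largest prime divisor has no convenient formula, it is easy to describe as a procedure. The following Python sketch uses trial division; the name `largest_prime_divisor` is an assumption chosen for illustration.

```python
def largest_prime_divisor(n: int) -> int:
    """Return the largest prime divisor of an integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    largest = 1
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out every copy of the divisor d
            largest = d
            n //= d
        d += 1
    if n > 1:               # whatever remains is a prime larger than any d tried
        largest = n
    return largest
```

Each input n has exactly one output, so this procedure does define a function in the sense of the text.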
Sometimes the rule that sends x to f(x) can only be given by listing
the value of f(x) for each x, as opposed to a general rule. For instance,
let D be the set of 200 speciﬁc cities in the United States, let R be the
set of all nonnegative real numbers, and for a city x, let f(x) be the
amount of precipitation that x had in 2009. Then f is a function since
it sends each x ∈ D into an element of R. This function is given by its
list of values, not by a rule that would specify how to compute f(x) if
given x.
Finally, functions can also be represented by their graphs. If
f : D → R is a function, then let us consider a two-dimensional coordinate
system such that the horizontal axis corresponds to elements of
D, and the vertical axis corresponds to elements of R. The graph of f
is the set of all points with coordinates (x, f(x)) such that x ∈ D. The
requirement that f(x) is unique for each x will ensure that no vertical
line intersects the graph of f more than once. This is called the vertical
line test.
2. Classes of Functions
2.1. Power Functions. A power function is a function f given by the
rule \(f(x) = x^a\), where a is a fixed real number. Note that \(x^{-a} = 1/x^a\),
so, for instance, \(x^{-3} = 1/x^3\). The special case of a = −1, that is, the
function f(x) = 1/x, is called the reciprocal function. Note that the
rule g(x) = 1 for all real numbers x also defines a power function, one
in which a = 0. If a = 1/n, where n is a positive integer, then the
power function f given by the rule
\[
f(x) = x^a = x^{1/n} = \sqrt[n]{x}
\]
is also called a root function.
2.2. Polynomials. A polynomial function is the sum of a finite number
of constant multiples of power functions with nonnegative integer exponents,
such as the function f given by the rule \(f(x) = 3x^4 + 2x^2 + 7x - 5\).
The domain of these functions is the set of all real numbers.
The largest exponent that is present in a polynomial function is called
the degree of the polynomial. So the degree of f in the last example is
4. The real numbers that multiply the power functions in a polynomial
are called the coeﬃcients of the polynomial. In the last example, they
are 3, 2, 7, and −5.
Some subclasses of polynomial functions have their own names as
follows:
• Polynomials of degree 0, such as f(x) = 6, are called constant
functions.
• Polynomials of degree 1, such as g(x) = 3x − 2, are called
linear functions.
• Polynomials of degree 2, such as \(h(x) = x^2 - 4x - 21\), are
called quadratic functions.
• Polynomials of degree 3, such as \(p(x) = x^3 - x^2 + 6x - 2\), are
called cubic functions.
2.3. Rational Functions. A rational function is the ratio of two polynomial
functions such as
\[
R(x) = \frac{3x^2 + 4x - 7}{x^3 - 8}.
\]
The domain of a rational function is the set of all real numbers, except
for the numbers that make the polynomial in the denominator 0. In
the preceding example, the only such number is x = 2.
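One way to confirm that x = 2 is the only real number excluded from the domain is to factor the denominator. The short derivation below is a supplementary sketch, not part of the original argument.

```latex
\[
x^3 - 8 = (x - 2)(x^2 + 2x + 4),
\qquad
x^2 + 2x + 4 = (x + 1)^2 + 3 > 0 \quad \text{for all real } x .
\]
```

Since the quadratic factor is never 0, the denominator vanishes only when x − 2 = 0.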
2.4. Trigonometric Functions. Periodicity. The reader has surely encountered
the trigonometric functions sin, cos, tan, cot, sec, and csc in
earlier courses. We will discuss these functions, and their inverses, later
in the text. For now, we mention one of their interesting properties,
their periodicity. A function f is called periodic with period T > 0 if
f(x) = f(x + T) for all x and T is the smallest positive real number
with this property.
For example, sin and cos are both periodic with period 2π, and
tan and cot are periodic with period π. The reader will be asked in
Exercise 2.7(1) about the periodicity of sec and csc.
2.5. Algebraic Functions. An algebraic function is a function that contains
only addition, subtraction, multiplication, division, and taking
roots. For instance, power functions with integer exponents are algebraic
functions, since they only use multiplication (and, for negative
exponents, division), though possibly many times. Therefore, polynomials
are algebraic functions as well since they are sums of constant multiples
of power functions. This implies that rational functions are also
algebraic since they are obtained by dividing a polynomial (also an
algebraic function) by another one.
The preceding list did not contain all algebraic functions since it
did not contain any functions in which roots were involved. So we get
additional examples if we include roots, such as the functions given by
the rules \(f(x) = \sqrt{x + 3}\), \(g(x) = \sqrt[3]{x}\), and \(h(x) = \sqrt{(x + 1)/(x - 1)}\).
2.6. Transcendental Functions. Functions that are not algebraic are
called transcendental functions. These include trigonometric functions
and their inverses, exponential functions, which are functions that contain
a variable in the exponent, such as \(f(x) = 2^x\), and their inverses,
which are called logarithmic functions. We will discuss these functions
in later sections of this chapter. There are many additional examples,
which do not have their own names.
2.7. Exercises.
(1) Are sec and csc periodic functions? If yes, what is their period?
(2) Let \(f(x) = x^{2/3}\). Is f an algebraic function?
3. Operations on Functions
3.1. Transformations of a Function. We have seen the basic mathematical
functions and their graphs in the last section. In this section, we
will look at their transformations.
It is easy to see what happens to the graph of a function if we
increase or decrease each value of a function by a constant. Indeed, the
graph of the function g given by g(x) = f(x) + 5 for all x is simply the
graph of the function f translated by five units to the north. Similarly,
the graph of the function h given by h(x) = f(x) − 7 is the graph of f
translated by seven units to the south.
Horizontal translations are a little bit trickier. The reader is invited
to verify that if g is the function given by g(x) = f(x − 2), then the
graph of g is the graph of f translated by two units to the east, that is,
in the positive direction. Indeed, we must substitute a larger number
into g to get the same value as from f. For instance, g(8) = f(6).
Similarly, if h is the function given by h(x) = f(x + 3) for all x, then
the graph of h is the graph of f translated by three units to the west,
that is, in the negative direction.
The effect of multiplication and division on functions can be described
similarly. If f is a function and g is the function given by
g(x) = c · f(x), where c > 1 is a real number, then the graph of g
is simply the graph of f “stretched” vertically by a factor of c. That
is, each point on the graph of g is c times as far away from the horizontal
axis as the corresponding point on the graph of f. It goes
without saying that dividing by c > 1 has the opposite effect. That is,
if h(x) = f(x)/c, then the graph of h is a vertically compressed version
of the graph of f. In other words, each point on the graph of h is c
times as close to the horizontal axis as the corresponding point on the
graph of f.
At this point, the reader should stop and think about what happens
if c < −1 is a negative constant. As the reader probably figured out, the
stretching or compressing effect will not change (it will only depend on
|c|), but each point on the graph will be reflected through the horizontal
axis.
Horizontal transformations involving multiplication and division are
similar to their counterparts involving addition and subtraction in that
their effect is the opposite of what one might think at first. If c > 1
and g is the function obtained from f by the rule g(x) = f(cx), then
the graph of g is the graph of f compressed horizontally by a factor
of c. That is, each point on the graph of g is c times as close to the
vertical axis as the corresponding point on the graph of f. In other
words, if (x, y) is a point on the graph of f, then (x/c, y) is a point on
the graph of g. On the other hand, if h is obtained from f by the rule
h(x) = f(x/c), then the graph of h is a horizontally stretched version
of the graph of f. That is, each point on the graph of h is c times
as far from the vertical axis as the corresponding point on the graph
of f. So if (x, y) is a point on the graph of f, then (cx, y) is a point on
the graph of h. Again, the reader should stop for a minute and think
about the graphs of the functions f(cx) and f(x/c) when c < −1 is a
negative constant.
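The point correspondences described in this section can be checked on a concrete function. The Python sketch below is an illustration only: the sample function f(x) = x², the factor c = 2, and the names `shifted`, `compressed`, and `stretched` are assumptions.

```python
def f(x):
    return x * x   # sample function, chosen only for illustration

c = 2.0

def shifted(x):
    return f(x - 2)    # graph of f moved two units to the east

def compressed(x):
    return f(c * x)    # graph of f compressed horizontally by the factor c

def stretched(x):
    return f(x / c)    # graph of f stretched horizontally by the factor c

# Points (x, y) on the graph of f, and where they land on the other graphs.
points_on_f = [(x, f(x)) for x in (-3.0, -1.0, 0.0, 2.0, 6.0)]
on_shifted = [(x + 2, y) for x, y in points_on_f]
on_compressed = [(x / c, y) for x, y in points_on_f]
on_stretched = [(c * x, y) for x, y in points_on_f]
```

Evaluating each transformed function at the first coordinate of its listed points returns the second coordinate, which is the claim made in the text.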
3.2. Combining Two Functions. If f and g are two functions, then their
sum, diﬀerence, and product are deﬁned wherever both f and g are
deﬁned. That is, the domain of f +g, f −g, and fg is the intersection
of the domains of f and g. Furthermore, (f + g)(x) = f(x) + g(x),
(f −g)(x) = f(x)−g(x), and (fg)(x) = f(x)g(x). We have to be just a
little bit more careful with f/g, since this function is not deﬁned when
g(x) = 0, even if x is in the domain of both f and g. So the domain of
f/g is the intersection of the domain of f and the domain of g, with
the exception of the points x satisfying g(x) = 0. For each point of this
domain, (f/g)(x) = f(x)/g(x).
If the range of f is part of the domain of g, then we can compose f
and g by ﬁrst applying f and then g. The function we obtain in this
way sends x to g(f(x)) and is called the composition of f and g. It is
denoted by g ◦ f. Note that in g ◦ f, ﬁrst f, and then g is applied.
Example 1. Let R be the set of all real numbers. If f and g are
both functions from R to R and \(f(x) = x^2\) and g(x) = x + 1, then
\[
(g \circ f)(x) = g(f(x)) = x^2 + 1,
\]
while
\[
(f \circ g)(x) = f(g(x)) = (x + 1)^2 = x^2 + 2x + 1.
\]
Note that f ◦ g and g ◦ f are, in general, diﬀerent functions.
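Composition is itself an operation that takes two functions and returns a new one. A minimal Python sketch of Example 1 follows; the helper name `compose` is an assumption chosen for illustration.

```python
def compose(outer, inner):
    """Return the function x -> outer(inner(x)), that is, outer ∘ inner."""
    return lambda x: outer(inner(x))

def f(x):
    return x ** 2

def g(x):
    return x + 1

g_after_f = compose(g, f)   # x -> x^2 + 1
f_after_g = compose(f, g)   # x -> (x + 1)^2
```

Evaluating both compositions at the same input, for example x = 3, gives different values, which confirms that the order of composition matters.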
3.3. Exercises.
(1) Sketch the graph of \(f(x) = x^2\), \(g(x) = (x - 3)^2\), and
\(h(x) = (2x + 5)^2\).
(2) Sketch the graph of f(x) = cos 2x, g(x) = sin(x − 2), and
h(x) = 3 tan x.
(3) Show examples for f and g when g ◦ f is deﬁned for all real
numbers, but f ◦ g is not.
(4) Show examples when f ◦ g = g ◦ f.
4. Viewing the Graphs of Functions
The graph of a function f is the set {(x, f(x)) | x ∈ D(f)}, where
D(f) denotes the domain of f. It is a good way of visually describing
what a function does. Today, we have plenty of advanced tools, such as
computer software packages and graphing calculators, to study the graphs
of functions. In this section, we point out a few of the common mistakes
in using these tools.
In order to facilitate the discussion, let us agree on some terminology.
If the domain of f contains an interval I and for all real numbers
x and x′ in I, it is true that x < x′ implies f(x) < f(x′), then we say
that f is increasing on I. Visually, this means that the graph of f goes
roughly from the southwest to the northeast while x ∈ I. Similarly,
if, for all real numbers x and x′ in I, it is true that x < x′ implies
f(x) > f(x′), then we say that f is decreasing on I. In terms of the
graph of f, this means that the graph goes roughly from the northwest
to the southeast.
If we simply ask a computer or graphing calculator to plot the
graph of a function without specifying the interval \([x_1, x_2]\) in which
the value of x can range, we may get an error message, or the computer
may simply substitute default values for \(x_1\) and \(x_2\). For example,
the software package Maple 13 uses the default values \(x_1 = -10\) and
\(x_2 = 10\). The interval \([x_1, x_2]\) is often called the viewing window.
We have to be careful, however, since not all viewing windows are
appropriate for all functions, and choosing an inappropriate viewing
window may cause misleading results.
For functions like f(x) = x, g(x) = |x|, or \(h(x) = x^2 + 3\), the viewing
window [−10, 10] is appropriate as the behavior of these functions
outside that window is similar to their behavior inside the window.
Now let \(f(x) = (x + 10)^2\). In this case, using the viewing window
[−10, 10], we get the graph of an increasing function. That is misleading
since f is decreasing on the interval (−∞, −10]. So, in this case, a
viewing window that starts at a point \(x_1 < -10\) is necessary.
This problem becomes more difficult if we are dealing with functions
that change from increasing to decreasing many times, perhaps in an
irregular fashion and perhaps far away from the origin. For this reason,
it is worth noting that if f is a polynomial function of degree n, then it
cannot change direction more than n − 1 times. If we have found n − 1
direction changes, then we can be sure that we did not miss any of
them. We will return to this topic in a later chapter, when we discuss
the derivative of a function.
The preceding example showed why selecting a viewing window
that is too small can be misleading. The next example shows why a
viewing window that is too large can also mislead us. Plot the graph
of the function \(g(x) = 4x^3 + 9x^2 + 6x + 1\). Using the default viewing
window [−10, 10], or some window containing that one, many software
packages will show a graph that increases everywhere and disappears in
a small interval to the left of 0. This should raise our suspicion that the
program does not properly display the graph of g around 0. Indeed,
g is deﬁned for all real numbers, so its graph should not disappear
anywhere. Taking a closer look, that is, changing the viewing window
to [−1, 1], we see a function that is actually decreasing between x = −1
and x = −1/2.
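The decreasing stretch that a wide viewing window hides can be confirmed without any plotting software by tabulating exact values of g. The Python sketch below uses exact rational arithmetic; the grid of sample points is an assumption chosen for illustration.

```python
from fractions import Fraction

def g(x):
    return 4 * x**3 + 9 * x**2 + 6 * x + 1

# Eleven equally spaced points from -1 to -1/2, kept as exact fractions.
xs = [Fraction(-1) + Fraction(k, 20) for k in range(11)]
values = [g(x) for x in xs]

# Each value is smaller than the one before it, so g decreases along the grid.
decreasing_on_grid = all(a > b for a, b in zip(values, values[1:]))
total_drop = values[0] - values[-1]
```

The total drop from x = −1 to x = −1/2 is only 1/4, which is tiny compared with the range of values g takes on [−10, 10]; this is why the dip is invisible in the default window.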
Trigonometric functions, with their periodicity, are particularly
good examples to demonstrate what software packages can and cannot
do. The reader is encouraged to plot the graphs of the functions sin x,
cos 2x, tan(x/4), and, finally, sin(1/x) and explain the obtained graphs.
In particular, the reader should try to explain why, for sin(1/x), the
choice of the viewing window is not important as long as it contains
x = 0.
5. Inverse Functions
The inverse \(f^{-1}\) of a function f : A → B “undoes” what f did.
That is, if f(x) = y, then \(f^{-1}(y) = x\), so f sends x to y, while \(f^{-1}\)
sends y back to x. It goes without saying that this \(f^{-1}\) will only be
a function if \(f^{-1}(y)\) is unambiguous, that is, when there is only one
x ∈ A so that f(x) = y. In that case, and only in that case, it is clear
that \(f^{-1}(y) = x\).
Let us now formalize these concepts.
Definition 1. A function f : A → B is called one-to-one if it sends
different elements into different elements, that is, if x ≠ x′ implies that
f(x) ≠ f(x′).
One-to-one functions are also called injective functions or injections.
Visually, no horizontal line can intersect the graph of a one-to-one
function more than once.
For instance, if A and B are both the set of real numbers, then
f(x) = x and \(g(x) = x^3\) are both one-to-one, but \(h(x) = x^2\) is not.
Definition 2. Let f be a one-to-one function with domain A and
range B. Then the inverse of f is the function \(f^{-1} : B \to A\) given by
\(f^{-1}(y) = x\) if f(x) = y.
Example 2. Let A and B both be the set of all real numbers. Let
f : A → B be given by f(x) = 2x + 7. Then \(f^{-1}(y) = (y - 7)/2\).
Solution: If f(x) = y, then y = 2x + 7, so y − 7 = 2x and
(y − 7)/2 = x. As \(x = f^{-1}(y)\), it follows that \(f^{-1}(y) = (y - 7)/2\). □
The preceding example shows a general strategy for finding the inverse
of a function. Write the equation f(x) = y, with the appropriate
algebraic expression replacing f(x). Then solve for x. If there is more
than one solution, then f is not one-to-one, and so it has no inverse
function. If there is one solution, then that expression is the value of
\(f^{-1}(y)\).
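A candidate inverse found by this strategy can be sanity-checked by composing it with the original function in both orders. The Python sketch below does this for f(x) = 2x + 7 from Example 2, using exact fractions; the name `f_inverse` and the sample inputs are assumptions for illustration.

```python
from fractions import Fraction

def f(x):
    return 2 * x + 7

def f_inverse(y):
    # Obtained by solving y = 2x + 7 for x.
    return (y - 7) / 2

inputs = [Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(17, 5)]
round_trip_x = [f_inverse(f(x)) for x in inputs]   # should give back the inputs
round_trip_y = [f(f_inverse(y)) for y in inputs]   # should give back the inputs
```

A check at finitely many points cannot replace the algebraic solution, but it does catch sign and arithmetic slips quickly.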
Example 3. If A is the set of positive real numbers, B is the set
of real numbers that are larger than 1, and f : A → B is given by
\(f(x) = x^2 + 1\), then \(f^{-1}(y) = \sqrt{y - 1}\).
Solution: We have \(f(x) = x^2 + 1 = y\). So \(x^2 = y - 1\), and because we
know that x is positive and y > 1, we can take the square root of both
sides, leading to \(x = \sqrt{y - 1}\). Hence, \(f^{-1}(y) = \sqrt{y - 1}\). □
Finally, we point out that if f is a one-to-one function with domain
A and range B, then \(f^{-1} \circ f\) is the identity function of A and \(f \circ f^{-1}\)
is the identity function of B.
For instance, using the functions of Example 3, for all positive real
numbers x, the identity
\[
(f^{-1} \circ f)(x) = \sqrt{(x^2 + 1) - 1} = \sqrt{x^2} = x
\]
holds, and for all y > 1, the identity
\[
(f \circ f^{-1})(y) = (\sqrt{y - 1})^2 + 1 = y - 1 + 1 = y
\]
holds.
5.1. Logarithmic Functions. If a function contains only additions, subtractions,
multiplications, and divisions, then its inverse is often easy
to compute. Power functions, that is, functions of the form \(f(x) = x^{\alpha}\),
where α is a real number, are not much more difficult. However, what
is the inverse of an exponential function?
Let \(f(x) = 2^x\). It is easy to see, by plotting the graph of f or
otherwise, that f is a one-to-one function whose domain is the set of
all real numbers and whose range is the set of all positive real numbers.
So the inverse of f is a function from the set of positive reals to the set
of all reals. But what is that inverse function \(f^{-1}\)? By the definition
of inverse functions in general, this is the function that sends \(2^x\) to x
for all positive real numbers \(2^x\). In particular, \(f^{-1}(2) = 1\), \(f^{-1}(4) = 2\),
\(f^{-1}(32) = 5\), and \(f^{-1}(1/2) = -1\). That is, \(f^{-1}(y)\) tells us to what power
we have to raise 2 if the result is to be y. This important concept has
its own name.
Definition 3. Let m be a positive real number with m ≠ 1. Then the inverse
of the function \(f(x) = m^x\) is called the logarithmic function with base
m, and is denoted by \(\log_m\).
So if \(f(x) = m^x = y\), then \(\log_m(y) = x\). For instance, \(\log_2(64) = 6\),
\(\log_3(81) = 4\), \(\log_5(1/25) = -2\), and \(\log_{0.5}(16) = -4\).
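Because a logarithm is defined as the inverse of exponentiation, each of the listed values can be verified by raising the base to the claimed power. The Python sketch below does so with exact fractions; the list name `examples` is an assumption for illustration.

```python
from fractions import Fraction

# (base m, value y, claimed logarithm x), taken from the examples in the text.
examples = [
    (Fraction(2), Fraction(64), 6),
    (Fraction(3), Fraction(81), 4),
    (Fraction(5), Fraction(1, 25), -2),
    (Fraction(1, 2), Fraction(16), -4),
]

# log_m(y) = x means exactly that m raised to the power x equals y.
checks = [m ** x == y for m, y, x in examples]
```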
Logarithmic functions satisfy certain rules that are very similar to
those satisfied by exponential functions and can, in fact, be deduced
from them. These are
(I) \(\log(xy) = \log x + \log y\).
(II) \(\log(x/y) = \log x - \log y\).
(III) \(\log(x^a) = a \log x\).
(IV) \(\log \sqrt[b]{x} = (\log x)/b\).
(V) \(a^{\log_a x} = x\).
(VI) \(\log_a(a^x) = x\).
The last two rules simply express the fact that the functions \(f(x) = a^x\)
and \(f^{-1}(y) = \log_a(y)\) are inverses of each other, so their composition
is an identity function.
If we know the logarithm of a number in a base and want to compute
it in another base, we can do so using the following theorem.
Theorem 1. For positive real numbers a, b, and x, with a ≠ 1 and b ≠ 1, we have
\[
\log_a x = \frac{\log_b(x)}{\log_b(a)}.
\]
Proof. Start with the identity
\[
x = a^{\log_a x}.
\]
Now take the logarithm of base b of both sides and apply rule (III) to get
\[
\log_b x = \log_a x \cdot \log_b a.
\]
Now divide both sides by \(\log_b a\) to get the identity of the theorem. □
Example 4. We can use Theorem 1 to compute \(\log_{16}(256)\) from
\(\log_2(256)\) as follows:
\[
\log_{16}(256) = \frac{\log_2(256)}{\log_2(16)} = \frac{8}{4} = 2.
\]
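Theorem 1 is how a logarithm in an arbitrary base is typically computed in practice. The Python sketch below builds it from the natural logarithm `math.log`; the function name `log_base` and its input checks are assumptions for illustration.

```python
import math

def log_base(a: float, x: float) -> float:
    """Compute log_a(x) from natural logarithms, using Theorem 1 with b = e."""
    if a <= 0 or a == 1 or x <= 0:
        raise ValueError("need a > 0, a != 1, and x > 0")
    return math.log(x) / math.log(a)

value = log_base(16, 256)                        # close to 2
ratio = log_base(2, 256) / log_base(2, 16)       # the quotient from Example 4
```

Floating-point results may differ from the exact answer in the last few digits, so comparisons should allow a small tolerance.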
So if a calculator or computer can provide the logarithm of all
positive real numbers in one base, we can compute the logarithm of any
positive real number in any base. For this reason, many calculators and
computers are programmed to work primarily with logarithms of one
given base, namely of base e, where e ≈ 2.718 is an irrational number
that will be formally defined in Chapter 2.
The logarithm of base e is so important that it has its own name,
natural logarithm, and its own notation, ln. So \(\ln x = \log_e x\).
5.2. Inverses of Trigonometric Functions. Basic trigonometric functions,
such as sin, cos, and tan, are very important in calculus, so it is no surprise
that their inverse functions are important as well. However, we
have to be precise when we define them since trigonometric functions
are not one-to-one. In fact, they are periodic, of period 2π or π, and
so they take every value in their range infinitely often.
In order to get around this difficulty, we will restrict our trigonometric
functions to just a short interval, in which they are one-to-one,
and define their inverses based on that restriction.
For instance, consider sin as a function whose domain is [−π/2, π/2].
In that interval, sin is a one-to-one function (since it is increasing),
and its range is the interval [−1, 1]. So its inverse is the function
\(\sin^{-1} : [-1, 1] \to [-\pi/2, \pi/2]\). That is, if y ∈ [−1, 1], then \(\sin^{-1} y\)
is the (only) x ∈ [−π/2, π/2] for which sin x = y. For instance,
\(\sin^{-1}(1/2) = \pi/6\), while \(\sin^{-1}(0) = 0\) and \(\sin^{-1}(\sqrt{2}/2) = \pi/4\).
The inverses of the other trigonometric functions are defined similarly;
only the intervals to which we restrict the functions (in order to
make them one-to-one) can change.
That is, \(\cos^{-1}\) is the inverse function of the cos function that is
restricted to the interval [0, π]. So \(\cos^{-1}\) is a function with domain
[−1, 1] and range [0, π]. Similarly, \(\tan^{-1}\) is the inverse function of the
tan function that is restricted to the interval (−π/2, π/2). Its domain
is the set of all real numbers, and its range is the interval (−π/2, π/2).
The inverse functions of cot, sec, and csc, while not used often, can
also be defined analogously.
5.3. Exercises.
(1) Is there a function f defined on all positive real numbers for
which \(f^{-1} = f\)?
(2) If we are given \(\log_a x\), how can we compute \(\log_{1/a}(x)\)?
(3) For which values of a is \(\log_a\) an increasing function, and for
which values of a is it a decreasing function?
(4) What is the geometric connection between the graphs of f and
\(f^{-1}\)?
(5) Is it true that if g is the inverse function of the one-to-one
function f, then g is one-to-one?
6. The Velocity Problem and the Tangent Problem
6.1. The Velocity Problem. Let us assume that a car was on the road
from 3:00 p.m. to 5:00 p.m. on a given afternoon, and it traveled a
distance of 100 miles, all due west. From the data, it is easy to compute
the average speed of the car by the formula
\[
v = \frac{s}{t}, \tag{1.1}
\]
where t is the time passed, s is the distance covered in time t, and v
is the average speed for the given time period. In physics, when the
direction in which an object is moving is taken into account, we talk
about velocity instead of speed, hence the abbreviation v. In the given
example, all travel was in one direction (west), so there is no danger
of confusion, and we can use either word. Let us assume that time is
measured in hours and distance is measured in miles.
Then Equation (1.1) yields
\[
v = \frac{100 \text{ mi}}{2 \text{ hr}} = 50 \, \frac{\text{mi}}{\text{hr}},
\]
so the average velocity of the car for the given two-hour period is 50
miles per hour.
The car probably did not cover the entire distance at its average
velocity. For various traffic-related or other reasons, it sometimes may
have gone faster or slower. If we want to know its average velocity
for the time period between 4:00 p.m. and 4:10 p.m., then we need to
know the distance it covered in that time period. If that distance is
10 miles, then we conclude that in that 10-minute time period, the
average velocity of the car was
\[
v = \frac{10 \text{ mi}}{1/6 \text{ hr}} = 60 \, \frac{\text{mi}}{\text{hr}}.
\]
If we want more precise information, like the average velocity of
the car between 4:02 p.m. and 4:05 p.m., we can proceed similarly,
decreasing the value of both the numerator and the denominator of
the fraction s/t. However, what if we want to know the instantaneous
velocity of the car in a given moment, such as exactly at 4:02:23 p.m.
(and not in the second that passed between 4:02:23 p.m. and 4:02:24
p.m.)? In that case, a direct application of Equation (1.1) is impossible,
because the denominator t is equal to 0. The numerator s is also equal
to 0, since the car needs time to cover any distance; if it is given no
time, it will cover no distance.
In this section, we will not give a completely formal answer to the
problem of deﬁning instantaneous velocity; we will leave that task to
an upcoming section. However, we will say the following. The instantaneous
velocity of a car in a given moment m can be approximated by
choosing smaller and smaller time periods containing m and computing
the average speed of the car for those time periods. These averages will
approximate the instantaneous velocity.
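The approximation idea can be tried out numerically. The Python sketch below assumes a purely hypothetical position function s(t) = 40t + 5t² (miles covered t hours after 3:00 p.m.), chosen only because it gives 100 miles in 2 hours; the text does not specify how the car actually moved. It computes average velocities over shorter and shorter periods starting at the moment m = 1, that is, 4:00 p.m.

```python
from fractions import Fraction

def s(t):
    """Hypothetical distance (miles) covered t hours after 3:00 p.m."""
    return 40 * t + 5 * t * t

def average_velocity(t1, t2):
    """Average velocity over the time period from t1 to t2, as in Equation (1.1)."""
    return (s(t2) - s(t1)) / (t2 - t1)

m = Fraction(1)  # the moment of interest, one hour after the start

# Periods [m, m + h] of length h = 1, 1/10, 1/100, 1/1000 hours.
averages = [average_velocity(m, m + Fraction(1, 10 ** k)) for k in range(4)]
whole_trip = average_velocity(Fraction(0), Fraction(2))
```

For this particular s, the averages are 55, 50.5, 50.05, and 50.005 miles per hour, so they settle toward 50 as the period shrinks.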
6.2. The Tangent Problem. The problem of ﬁnding the instantaneous
velocity of a moving object is simply a special case of a much more
general problem, that of ﬁnding the slope of a tangent line to a curve
at a given point.
In the previous problem, the distance the car covered can be viewed
as a function of the time that passed since the car started moving. So
s(t) is the distance covered from the moment when the car started
moving to the moment t hours later. In order to compute the average
velocity for the time period from \(t_1\) to \(t_2\), we simply compute the value
of the fraction
\[
\frac{s(t_2) - s(t_1)}{t_2 - t_1}.
\]
This fraction is precisely the slope of the line that intersects the graph
of the function s at points \((t_1, s(t_1))\) and \((t_2, s(t_2))\). If we choose \(t_1\)
and \(t_2\) closer and closer together, then these points will get closer and
closer together as well. Finally, if we set \(t_1 = t_2\), then we will not
immediately know the slope of the line that touches the graph of s at
the point \((t_1, s(t_1))\) since we will know only one point of this
line, not two. However, and this will be made more precise in the next section,
the slope we are looking for will be approximated by the sequence of
slopes of the lines that we got when we chose \(t_1\) and \(t_2\) closer and closer
together.
Finally, we point out that there is nothing magical about the function
s(t) here. We could consider any function f : R → R, and ask
what the slope of the tangent line to this curve is at the point (x, f(x)).
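The same computation works for any function once the slope formula is written generically. The Python sketch below is an illustration under stated assumptions: the helper name `secant_slope`, the sample function x², and the base point x = 3 are not taken from the text.

```python
from fractions import Fraction

def secant_slope(f, x1, x2):
    """Slope of the line through (x1, f(x1)) and (x2, f(x2))."""
    if x1 == x2:
        raise ValueError("two distinct points are needed to determine a line")
    return (f(x2) - f(x1)) / (x2 - x1)

def square(x):
    return x * x

x = Fraction(3)
# Second points at distance h = 1/10, 1/100, 1/1000, 1/10000 from x.
slopes = [secant_slope(square, x, x + Fraction(1, 10 ** k)) for k in range(1, 5)]
```

For this sample function the slopes are 6 + h, so they approach 6 as the two points move together, while the attempt to use h = 0 directly is rejected because one point does not determine a line.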
6.3. Exercises.
(1) A car travels one hour at a speed of 60 miles per hour, then
two hours at a speed of 45 miles per hour. What is the average
speed of the car during this three-hour period?
(2) Consider the car of the previous exercise. What is its average
speed during the first two hours of its trip?
(3) I drove at 40 miles per hour for two hours. How fast do I have
to drive in my third hour if I want to reach an average speed
of 45 miles per hour for my three-hour drive?
(4) Consider the function \(f(x) = x^2\). Can you find two points P
and Q on the graph of f such that the slope of the line PQ is
between 0 and 0.01?
(5) Consider the function \(g(x) = x^3\). Let P = (1, 1). Can you find
a point Q on the graph of g such that the slope of the line PQ
is between 1 and 1.01?
CHAPTER 2
Limits and Derivatives
7. The Limit of a Function
7.1. Two-Sided Limits. Consider the function given by the rule f(x) =
1/(1 + x). Let us compute the values of f(x) for various real numbers
x that are close to 0. We ﬁnd that
• f(1) = 1/2,
• f(1/2) = 2/3,
• f(1/3) = 3/4, and, in general,
• f(1/n) = n/(n + 1).
Similarly, for negative values of x, we get
• f(−1/2) = 2,
• f(−1/3) = 3/2,
• f(−1/4) = 4/3, and, in general,
• f(−1/n) = n/(n −1).
What we see is that if x gets close to 0 (from either side), then
f(x) gets close to f(0) = 1. In fact, we can get f(x) to be as close
to f(0) = 1 as we want; all we need to do is to choose x suﬃciently
close to 0. Indeed, looking at the previous examples, we conclude that
if 0 < x < 1/n, then n/(n + 1) < f(x) < 1, and if −1/n < x < 0, then
1 < f(x) < n/(n − 1). So for instance, if we want f(x) to be closer
than $\frac{1}{1000}$ to 1, then any choice of x in the interval
$[0, \frac{1}{999})$ or any choice of x in the interval
$(-\frac{1}{1001}, 0]$ will work. That is, any choice of x in the
interval $(-\frac{1}{1001}, \frac{1}{999})$ will imply that
|f(x) − f(0)| < 0.001.
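The interval found above can be spot-checked numerically. This is a minimal sketch, not a proof: it samples points strictly inside (−1/1001, 1/999) and records the largest observed distance between f(x) and 1. The variable names are chosen here for illustration.

```python
def f(x):
    return 1.0 / (1.0 + x)


left, right = -1.0 / 1001, 1.0 / 999

# Sample points strictly inside the open interval (left, right).
samples = [left + (right - left) * k / 1000 for k in range(1, 1000)]

# Largest distance |f(x) - 1| observed over the sample points.
worst = max(abs(f(x) - 1.0) for x in samples)
print(worst)
```

A point outside the interval, such as x = 0.002, already gives a distance larger than 0.001, so the interval cannot be enlarged much.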
This phenomenon, that is, the fact that there exists an interval such
that, for each real number x in that interval, the value of f(x) is closer
to f(0) than a prescribed bound, is so important in mathematics that
it has its own name.
Definition 4. Let f : R → R be a function and let a be a real
number. We say that the limit of f in a is the real number L if the
values of f(x) get arbitrarily close to L and stay arbitrarily close to L
when x is suitably close to a without being equal to a.
The fact that the limit of f in a is L is expressed by the notation
\[ \lim_{x \to a} f(x) = L. \]
So, if f is the starting example of this section, then
$\lim_{x \to 0} f(x) = 1$.
Note that the definition of $\lim_{x \to a} f(x)$ requires that f(x) stay
close to L when x is close to a, regardless of which of x or a is larger.
That is, f(x) has to be close to L if x is a little bit less than a, and
f(x) has to be close to L if x is a little bit more than a, though f(x)
does not have to be close to L if x = a.
Several comments are in order. First, $\lim_{x \to a} g(x)$ does not
always exist.
Example 5. Let
\[ g(x) = \begin{cases} 1 & \text{if } 0 \le x, \\ 0 & \text{if } x < 0. \end{cases} \]
Then the limit of g at a = 0 does not exist. Indeed, no matter how
small an interval I we take around the point a = 0, that interval I
will contain some positive and some negative real numbers. Hence, the
values of g(x) will sometimes equal 1 and sometimes equal 0 for x ∈ I,
no matter how small I is. There is no number L such that both 0 and
1 are arbitrarily close to it; in fact, there is no number such that
0 and 1 are both closer than 0.5 to it. So $\lim_{x \to 0} g(x)$ does
not exist.
Second, if $\lim_{x \to a} f(x)$ exists, it is unique; that is, f cannot
have two different limits at any given point a. Let us illustrate this
using the introductory example of this section, the function
f(x) = 1/(1 + x). We have seen that $\lim_{x \to 0} f(x) = 1$. Indeed, we
saw that the values of f(x) can get arbitrarily close to 1 if the real
numbers x are chosen from a suitably small interval around 0. At this
point, one could ask the following question. If 1 satisfies the
requirements to be $\lim_{x \to 0} f(x)$, why does 1.0001 not? After all,
what is close to 1 is also close to 1.0001.
In order to answer this question, we must have a good understanding
of the definition of limits. That definition says that if
$\lim_{x \to 0} f(x) = L$, then the values of f(x) will get arbitrarily
close to L if x is chosen from a suitably small interval around 0. The key
word in the previous sentence is arbitrarily. While 1.0001 is close to 1,
it is not arbitrarily close to 1; it is exactly 0.0001 away. And that is a
problem, since we have seen at the beginning of this chapter that, as x
approaches 0, the values of f(x) will get arbitrarily close to 1. In
particular, if x is close enough to 0, then f(x) will be closer than
$\frac{1}{10^6}$ to 1, but then it cannot also be closer than
$\frac{1}{10^6}$ to 1.0001.
An analogous argument shows that no function can have two different
limits at any one point.
Sometimes it can happen that h is not even defined in a, but
$\lim_{x \to a} h(x)$ still exists. Note that the fact that h(a) is not
defined is not a problem since the definition of limits specifically
states that x should not be equal to a anyway.
Example 6. Let $h(x) = (x^2 - 9)/(x - 3)$. Then h is defined for
all real numbers except x = 3. Still, $\lim_{x \to 3} h(x) = 6$. In
particular, $\lim_{x \to 3} h(x)$ exists.
Solution: If x ≠ 3, then
\[ h(x) = \frac{x^2 - 9}{x - 3} = \frac{(x + 3)(x - 3)}{x - 3} = x + 3. \]
So if we want h(x) = x + 3 to be closer to 6 than a given distance d > 0,
then all we have to do is to choose x ≠ 3 such that |x − 3| < d. □
At this point, the reader should test his or her understanding of the
material by finding $\lim_{x \to -2} \frac{x^2 + 3x + 2}{x + 2}$.
Sometimes, limits are not easy to determine. Plotting the graph of
the function h(x) = (sin x)/x, we are led to believe that
\[ \lim_{x \to 0} \frac{\sin x}{x} = 1. \]
However, we have not yet learned the techniques to rigorously prove
this. Plotting the graph of the function or producing more numerical
data should not be considered as a complete answer, since, as x
approaches 0, eventually x and sin x will get so small that the computer
will no longer manipulate them, or their ratio, accurately.
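The kind of numerical data mentioned above can be produced with a few lines of code. This is a sketch for building intuition only; as the text stresses, a table of values is evidence and not a proof. The helper name ratio and the sample points are choices made here.

```python
import math


def ratio(x):
    # Defined only for x != 0; x = 0 itself would divide by zero.
    return math.sin(x) / x


# Values of sin(x)/x at a few points approaching 0 from the right.
values = {x: ratio(x) for x in (1.0, 0.1, 0.01, 0.001)}

for x, r in values.items():
    print(f"x = {x:<6} sin(x)/x = {r:.10f}")
```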
Finally, we point out that in the definition of the limit, the requirement
that f(x) get close to L and stay close to L is important. Consider
the function f(x) = sin(1/x) around x = 0. As x approaches 0 from the
right, the value of 1/x will increase very fast, and so it will equal a
multiple of π many times. All those times, f(x) = 0 will hold, so f(x)
will be as close to 0 as possible. However, $\lim_{x \to 0} f(x)$ does not
exist, since f(x) will take all other values in the interval [−1, 1]
infinitely often as well as x approaches 0. So the value of f(x) will not
stay arbitrarily close to 0, no matter how close x is to 0.
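The failure to "stay close" can be observed numerically. The sketch below, with point sets chosen here for illustration, evaluates sin(1/x) along two sequences that both approach 0: one where 1/x is a multiple of π and one where 1/x equals π/2 plus a multiple of 2π.

```python
import math


def f(x):
    return math.sin(1.0 / x)


ns = (1, 10, 100, 1000)

# Points approaching 0 where 1/x = n*pi, so f(x) is 0 up to rounding error.
zeros = [1.0 / (n * math.pi) for n in ns]

# Points approaching 0 where 1/x = pi/2 + 2*n*pi, so f(x) is 1 up to rounding error.
peaks = [1.0 / (math.pi / 2 + 2 * n * math.pi) for n in ns]

for z, p in zip(zeros, peaks):
    print(f"f({z:.6f}) = {f(z):+.2e}    f({p:.6f}) = {f(p):.12f}")
```

Both sequences get arbitrarily close to 0, yet the function values along them stay near 0 and near 1 respectively, so no single number L can serve as the limit.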
7.2. The Precise Definition of Limits. It is time for us to give a precise
mathematical definition of limits. The advantage of this formal definition
is that we can finally do away with the words arbitrarily close and
sufficiently close. The price to pay for that is that we have to use more
notation.
Definition 5. Let f be a function defined on some open interval
that contains the real number a, with the possible exception of a itself.
Then we say that the limit of f at a is L, denoted by
$\lim_{x \to a} f(x) = L$, if, for all ε > 0, there exists δ > 0 such that
if 0 < |x − a| < δ, then |f(x) − L| < ε.
Example 7. We have $\lim_{x \to 0} 2x \sin x = 0$.
Solution: Let ε be any positive real number. Then let δ = ε/2. We
know that |sin x| ≤ 1 for all x. So if |x − 0| = |x| < δ = ε/2, then
|f(x) − 0| = |f(x)| = |2x sin x| ≤ 2|x| < 2δ = ε, as required. □
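The choice δ = ε/2 from Example 7 can be sanity-checked by sampling. This is a sketch under stated assumptions, not a substitute for the proof: it tests only finitely many points in (−δ, δ), and the function names delta_for and check are introduced here for illustration.

```python
import math


def f(x):
    return 2.0 * x * math.sin(x)


def delta_for(eps):
    # The choice made in the solution of Example 7.
    return eps / 2.0


def check(eps, samples=1000):
    """True if |f(x) - 0| < eps at every sampled x with 0 < |x| < delta."""
    d = delta_for(eps)
    xs = [-d + 2 * d * k / samples for k in range(1, samples)]
    return all(abs(f(x)) < eps for x in xs if x != 0.0)


for eps in (1.0, 0.1, 1e-3, 1e-6):
    print(eps, check(eps))
```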
7.3. One-Sided Limits. There are functions that behave in a certain
way up to a point a, and then behave very differently after that. We
have seen such a function in Example 5. The function g of that example
satisfied g(x) = 0 for negative values of x, and g(x) = 1 for positive
values of x. We have seen that $\lim_{x \to 0} g(x)$ does not exist, since
no real number L is arbitrarily close to both 0 and 1.
Nevertheless, there are weaker, one-sided notions of limits that are
relevant in this example.
Definition 6. Let f : R → R be a function and let a be a real
number. We say that the left-hand limit of f in a is the real number L
if the values of f(x) get arbitrarily close to L and stay arbitrarily close
to L when x is suitably close to a and x < a.
The fact that L is the left-hand limit of f in a is denoted by
\[ \lim_{x \to a^-} f(x) = L. \]
For instance, if g is the function defined in Example 5, then
\[ \lim_{x \to 0^-} g(x) = 0. \]
Indeed, if we choose x close to 0 but less than 0, then g(x) = 0, so g(x)
is arbitrarily close (in fact, equal) to 0.
Definition 7. Let f : R → R be a function and let a be a real
number. We say that the right-hand limit of f in a is the real number
L if the values of f(x) get arbitrarily close to L and stay arbitrarily
close to L when x is suitably close to a and x > a.
The fact that L is the right-hand limit of f in a is denoted by
\[ \lim_{x \to a^+} f(x) = L. \]
For instance, if g is the function defined in Example 5, then
\[ \lim_{x \to 0^+} g(x) = 1. \]
Indeed, if we choose x close to 0 but more than 0, then g(x) = 1, so
g(x) is arbitrarily close (in fact, equal) to 1.
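The two one-sided limits of the step function of Example 5 can be illustrated by evaluating g at points approaching 0 from each side. This is a small sketch; the sample points 10^(-k) are a choice made here.

```python
def g(x):
    # The step function of Example 5: 1 for x >= 0, 0 for x < 0.
    return 1 if x >= 0 else 0


# Points approaching 0 from the left and from the right.
left = [g(-10.0 ** (-k)) for k in range(1, 8)]
right = [g(10.0 ** (-k)) for k in range(1, 8)]

print("from the left: ", left)
print("from the right:", right)
```

Every value sampled from the left is 0 and every value sampled from the right is 1, matching the two one-sided limits stated above.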
At this point, the reader should compare the definitions of limit,
left-hand limit, and right-hand limit. The definition of limit
(Definition 4) imposes the strongest requirements on the values of f.
Indeed, the values of f(x) have to be close to L when x is close to a and
x < a and also when x is close to a and x > a. The definitions of the
left-hand and right-hand limits impose weaker requirements in that each
definition only requires that f(x) be close to L when x is on a given
side of a and close to a.
It then follows (and the reader should spend a minute verifying it) that
if $\lim_{x \to a} f(x) = L$, then $\lim_{x \to a^-} f(x) = L$ and
$\lim_{x \to a^+} f(x) = L$.
Conversely, if both the left-hand limit and the right-hand limit of
f in a are equal to L, then the limit of f in a exists and is equal to L.
At this point, the reader should check his or her understanding of
the material by considering the function
\[ h(x) = \frac{|x|}{x} \]
as x approaches 0 and deciding if the limits $\lim_{x \to 0} h(x)$,
$\lim_{x \to 0^-} h(x)$, and $\lim_{x \to 0^+} h(x)$ exist.
7.4. Inﬁnite Limits. In our deﬁnitions of limits in this section, the limit
L was always a real number. In this section, we extend those deﬁnitions
to the cases of inﬁnite limits. If L = ∞, then the values of f have to
get arbitrarily close to ∞; that is, they have to get as large as we want.
This is the content of the following deﬁnition.
Definition 8. Let f : R → R be a function. We say that the limit
of f in a is ∞ if we can get f(x) arbitrarily large and keep it arbitrarily
large if we choose x suitably close to a without being equal to a.
Similarly, if g : R → R is a function, we say that the limit of g in a
is −∞ if we can make g(x) a negative number with an arbitrarily large
absolute value and keep g(x) that way if we choose x suitably close to a
without being equal to a.
The fact that the limit of f in a is ∞ is denoted by
\[ \lim_{x \to a} f(x) = \infty. \]
Example 8. Let $f(x) = 1/x^2$. Then $\lim_{x \to 0} f(x) = \infty$.
Solution: If we want f(x) to be larger than an arbitrary positive
real number N, all we need to do is to choose x ≠ 0 from the interval
$(-1/\sqrt{N}, 1/\sqrt{N})$. Then $x^2 < 1/N$ will hold, implying that
$f(x) = 1/x^2 > N$. □
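The argument of Example 8 can be spot-checked by sampling. The sketch below assumes the interval half-width 1/sqrt(N) that makes x^2 < 1/N hold; the names radius_for and check are introduced here, and only finitely many points are tested.

```python
import math


def f(x):
    return 1.0 / (x * x)


def radius_for(N):
    # Half-width of the interval around 0 on which x*x < 1/N.
    return 1.0 / math.sqrt(N)


def check(N, samples=201):
    """True if f(x) > N at every sampled nonzero x in (-r, r)."""
    r = radius_for(N)
    # An odd number of subdivisions keeps the midpoint x = 0 out of the sample.
    xs = [-r + 2 * r * k / samples for k in range(1, samples)]
    return all(f(x) > N for x in xs if x != 0.0)


for N in (1, 100, 1e6):
    print(N, check(N))
```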
Similarly, if $g(x) = -1/x^4$, then $\lim_{x \to 0} g(x) = -\infty$. Note
that if the limit of a function at a given point a is ∞ or −∞, then, as x
approaches a, the graph of the function will approach a vertical line
intersecting the horizontal axis at x = a. This phenomenon is referred
to by saying that f has a vertical asymptote at a.
7.4.1 The Precise Deﬁnition of Inﬁnite Limits. The formal deﬁnition
of inﬁnite limits is similar to that of ﬁnite limits. The diﬀerence lies in
the fact that it is not the same to be close to ∞ or to be close to a real
number.
Definition 9. Let f : R → R be a function. We say that the limit
of f in a is ∞ if, for all positive real numbers N, there exists δ > 0
such that if 0 < |x − a| < δ, then f(x) > N.
Similarly, let g : R → R be a function. We say that the limit of g
in a is −∞ if, for all negative real numbers M, there exists δ > 0 such
that if 0 < |x − a| < δ, then g(x) < M.
7.4.2 One-Sided Infinite Limits. One-sided infinite limits are defined
in an analogous way, as we can see in the following definition.
Definition 10. Let f : R → R be a function and let a be a real
number. We say that the left-hand limit of f in a is ∞ if the values of
f(x) get arbitrarily large and stay arbitrarily large when x is suitably
close to a and x < a.
Similarly, we say that the right-hand limit of f in a is ∞ if the
values of f(x) get arbitrarily large and stay arbitrarily large when x is
suitably close to a and x > a.
Example 9. Let f(x) = 1/x. Then f is not defined in 0. Furthermore,
$\lim_{x \to 0^-} f(x) = -\infty$ and $\lim_{x \to 0^+} f(x) = \infty$. As
the two one-sided limits are different, $\lim_{x \to 0} f(x)$ does not
exist.
Solution: We can make f(x) = 1/x smaller than any given negative
number M by choosing x from the interval (1/M, 0). We can make
f(x) larger than any positive number P by choosing x from the interval
(0, 1/P). □
7.5. Exercises.
(1) Find $\lim_{x \to 3} \frac{x^2 - 4x + 3}{x - 3}$.
(2) Does $\lim_{x \to 0} \frac{1}{x}$ exist?
(3) Give an example of a function f such that $\lim_{x \to 0^-} f(x) = 0$
and $\lim_{x \to 0^+} f(x) = \infty$.
(4) Does $\lim_{x \to 0} \left( \frac{1}{x^3} + \frac{1}{x^2} \right)$ exist?
(5) Give an example of a function f such that
$\lim_{x \to 1^-} f(x) = \infty$, $\lim_{x \to 1^+} f(x) = -\infty$, and
f(1) is a real number.
8. Limit Laws
8.1. Basic Limit Laws. If f and g are two functions and we know the
limit of each of them at a given point a, then we can easily compute
the limit at a of their sum, diﬀerence, product, constant multiple, and
quotient. The rules that provide this limit are given below, and they
are very similar to the ways in which the sum, diﬀerence, product,
constant multiple, and quotient of two functions are deﬁned. Indeed,
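(I) $\lim_{x \to a} (f + g)(x) = \lim_{x \to a} f(x) + \lim_{x \to a} g(x)$,
(II) $\lim_{x \to a} (f - g)(x) = \lim_{x \to a} f(x) - \lim_{x \to a} g(x)$,
(III) $\lim_{x \to a} (f \cdot g)(x) = \lim_{x \to a} f(x) \cdot \lim_{x \to a} g(x)$,
(IV) $\lim_{x \to a} (c \cdot f)(x) = c \cdot \lim_{x \to a} f(x)$,
where c is a real number, and
(V) $\lim_{x \to a} \left( \frac{f}{g} \right)(x) = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)}$
if $\lim_{x \to a} g(x) \neq 0$.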
It is not difficult to believe that these rules are valid. For instance,
if f(x) gets arbitrarily close to L as x approaches a and g(x) gets
arbitrarily close to L' as x approaches a, then, as x approaches a, the
value of f(x) + g(x), that is, the value of (f + g)(x), will get
arbitrarily close to L + L'. This intuitive argument can be made formal
using the precise definition of limits.
Example 10. Let f(x) = x and let $g(x) = x^2$. Find the limits of
f + g, f − g, fg, 3f + 2g, and f/g at a = 2.
Solution: Based on the five limit laws given earlier, it makes sense to
first compute the limits of f and g at 2. The reader is invited to verify
that
\[ \lim_{x \to 2} f(x) = \lim_{x \to 2} x = 2, \]
and
\[ \lim_{x \to 2} g(x) = \lim_{x \to 2} x^2 = \lim_{x \to 2} x \cdot \lim_{x \to 2} x = 2 \cdot 2 = 4, \]
where we used the fact that $g(x) = x^2 = x \cdot x$, so law III can be
applied to compute the limit of g at 2.
Now it is simply a matter of basic algebra to compute the five limits
that we have been asked to find. Indeed, applying the five limit laws,
we get that
(I) $\lim_{x \to 2} (f + g)(x) = \lim_{x \to 2} f(x) + \lim_{x \to 2} g(x) = 2 + 4 = 6$,
(II) $\lim_{x \to 2} (f - g)(x) = \lim_{x \to 2} f(x) - \lim_{x \to 2} g(x) = 2 - 4 = -2$,
(III) $\lim_{x \to 2} (f \cdot g)(x) = \lim_{x \to 2} f(x) \cdot \lim_{x \to 2} g(x) = 2 \cdot 4 = 8$,
(IV) $\lim_{x \to 2} (3f + 2g)(x) = 3 \lim_{x \to 2} f(x) + 2 \lim_{x \to 2} g(x) = 3 \cdot 2 + 2 \cdot 4 = 14$
(note that here we applied limit law IV first to f, then to g, and
then we applied law I to 3f and 2g), and
(V) $\lim_{x \to 2} \left( \frac{f}{g} \right)(x) = \frac{\lim_{x \to 2} f(x)}{\lim_{x \to 2} g(x)} = \frac{2}{4} = \frac{1}{2}$.
□
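The five values found in Example 10 can be compared with function values taken very close to 2. This is a numerical sketch only; the helper near and the step size h = 1e-6 are choices made here, and agreement at two nearby points illustrates the limit laws without proving them.

```python
def f(x):
    return x


def g(x):
    return x * x


combos = {
    "f + g": lambda x: f(x) + g(x),
    "f - g": lambda x: f(x) - g(x),
    "f * g": lambda x: f(x) * g(x),
    "3f + 2g": lambda x: 3 * f(x) + 2 * g(x),
    "f / g": lambda x: f(x) / g(x),
}

# Limits at a = 2 as computed in Example 10.
expected = {"f + g": 6, "f - g": -2, "f * g": 8, "3f + 2g": 14, "f / g": 0.5}


def near(func, a, h=1e-6):
    """Values of func just left and just right of a; a itself is never used."""
    return func(a - h), func(a + h)


for name, func in combos.items():
    print(name, near(func, 2.0), "expected", expected[name])
```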
8.2. Frequently Used Special Cases of Limit Laws. A few special cases
of limit laws I–V are used so frequently that it is worth mentioning
them separately. First, if we repeatedly multiply a function by itself,
we get a power of that function. Applying law III each time, we get
that for all positive integers n,
(2.1)  $\lim_{x \to a} (f(x))^n = \left( \lim_{x \to a} f(x) \right)^n.$
Note that we have essentially applied this rule in the special case
of n = 2 when we computed $\lim_{x \to 2} x^2$ in Example 10.
The reader is invited to verify that the limits of the constant
function f(x) = c and the identity function f(x) = x are given by
$\lim_{x \to a} c = c$ for all a and $\lim_{x \to a} x = a$. Formal proofs
will be given in the next section.
Applying Equation (2.1) to the identity function f(x) = x yields
the equation
(2.2)  $\lim_{x \to a} x^n = a^n.$
It turns out (though it is not obvious) that in Equation (2.1) the
exponent n can be replaced by 1/n; in other words, powers can be
replaced by roots, yielding
(2.3)  $\lim_{x \to a} \sqrt[n]{f(x)} = \sqrt[n]{\lim_{x \to a} f(x)}.$
(Here f(x) has to be nonnegative if n is even.) So, in particular, if
f(x) = x, then
\[ \lim_{x \to a} \sqrt[n]{x} = \sqrt[n]{a}. \]
8.3. Other Useful Facts About Limits. In this section, we discuss a few
facts about limits that are often used to compute limits, but are slightly
diﬀerent in nature from the limit laws we discussed so far.
First, let us recall that the definition of $L = \lim_{x \to a} f(x)$
requires that f(x) get arbitrarily close to L if x is sufficiently close
to a but not equal to a. That is, the value of f(a) does not have to
satisfy any requirements. In fact, we can change f(a) to anything we want,
and $L = \lim_{x \to a} f(x)$ will not change. What matters is what happens
at points other than a. Hence, we can conclude that if f(x) = g(x) for all
points x ≠ a, then $\lim_{x \to a} f(x) = \lim_{x \to a} g(x)$ as long as
these limits exist. For instance, let $f(x) = (x^2 - 4)/(x - 2)$ for all
real numbers x ≠ 2 and let f(2) = 2010. Let g(x) = x + 2 for all real
numbers. Then f(x) = g(x) unless x = 2, and hence
$\lim_{x \to 2} f(x) = \lim_{x \to 2} g(x) = 4$.
The statement that if f(x) = g(x) for all points x ≠ a, then
$\lim_{x \to a} f(x) = \lim_{x \to a} g(x)$ as long as these limits exist
can be significantly strengthened. See Exercise 8.4(1) for a possible
direction for that.
Second, Equation (2.2) can be interpreted by saying that the limit
of a power function $f(x) = x^n$ at any point a is simply the value of
f(a). Now note that polynomials are nothing else but sums of constant
multiples of power functions with nonnegative integer exponents.
Hence, using limit laws I and IV, we get the following theorem.
Theorem 2. Let p be a polynomial function. Then, for any real
number a, we have
\[ \lim_{x \to a} p(x) = p(a). \]
Now recall that a rational function is just the ratio of two polynomials.
Hence, using limit law V, we get the following statement from
Theorem 2.
Corollary 1. Let R(x) be a rational function and let a be a real
number such that R(a) is defined. Then
\[ \lim_{x \to a} R(x) = R(a). \]
Proof. If R(x) = p(x)/q(x), where p and q are polynomials, then
by first applying limit law V, and then Theorem 2, we get
\[ \lim_{x \to a} R(x) = \lim_{x \to a} \frac{p(x)}{q(x)} = \frac{\lim_{x \to a} p(x)}{\lim_{x \to a} q(x)} = \frac{p(a)}{q(a)} = R(a). \]
□
So far, all the relationships that we discussed for limits involved
equations. We will now discuss two rules that involve inequalities.
Theorem 3. Let f and g be two functions and assume that, for all
real numbers x, the inequality f(x) ≤ g(x) holds. Then
(2.4)  $\lim_{x \to a} f(x) \le \lim_{x \to a} g(x)$
for any real number a as long as both limits exist.
Proof. If (2.4) did not hold, then
\[ L_f = \lim_{x \to a} f(x) = D + \lim_{x \to a} g(x) = D + L_g \]
would hold for some positive real number D. That would lead to a
contradiction, since if x is so close to a that $|f(x) - L_f| < D/3$,
then, in particular, $f(x) > L_f - D/3$, and because g(x) ≥ f(x), we get
\[ g(x) > L_f - \frac{D}{3} = L_g + \frac{2D}{3}. \]
This inequality says that no matter how close x is to a, the distance
between g(x) and $L_g$ is more than 2D/3. This contradicts the definition
of $L_g$, since if $L_g$ exists, then the values of g(x) should get
arbitrarily close to it, provided that x is sufficiently close to a. □
Note that in Theorem 3, the fact that the inequalities are not strict
is important. See Exercise 8.4(2) for a relevant question.
Corollary 2 (Squeeze Principle). If f, g, and h are functions
such that, for all real numbers x, the inequality f(x) ≤ g(x) ≤ h(x)
holds and
\[ \lim_{x \to a} f(x) = \lim_{x \to a} h(x) = L, \]
then $\lim_{x \to a} g(x)$ exists and $\lim_{x \to a} g(x) = L$.
Proof. If $\lim_{x \to a} g(x)$ exists, then by applying Theorem 3 to f
and g, it follows that $L \le \lim_{x \to a} g(x)$, and by applying
Theorem 3 to g and h, it follows that $\lim_{x \to a} g(x) \le L$. So if
$\lim_{x \to a} g(x)$ exists, it is equal to L. In Exercise 8.4(3) you are
asked to prove that this limit exists. □
The squeeze principle is very useful since it allows us to compute
the limits of rather complicated functions as long as we can squeeze
them between two functions with identical limits.
Example 11. Let g(x) = x cos(log |x|). Then $\lim_{x \to 0} g(x) = 0$.
Solution: Indeed, let f(x) = −|x| and h(x) = |x|. Then, since cos(log |z|)
is always a real number in the interval [−1, 1], the inequality f(x) ≤
g(x) ≤ h(x) holds for all real numbers x ≠ 0, which is all that matters
for a limit at 0. Furthermore,
$\lim_{x \to 0} f(x) = \lim_{x \to 0} h(x) = 0$, so we can apply
Corollary 2 to prove our claim. □
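The squeeze in Example 11 can be observed numerically. The sketch below assumes the reading g(x) = x cos(log |x|), so that g is defined on both sides of 0, with bounding functions −|x| and |x|; this reading and the sample points are assumptions made here.

```python
import math


def g(x):
    # Assumed reading of Example 11: x * cos(log|x|), defined for x != 0.
    return x * math.cos(math.log(abs(x)))


# Points approaching 0 from both sides.
xs = [sign * 10.0 ** (-k) for k in range(1, 9) for sign in (1, -1)]

# g(x) should always lie between -|x| and |x|.
squeezed = all(-abs(x) <= g(x) <= abs(x) for x in xs)
print(squeezed)

for x in xs[:6]:
    print(f"x = {x:+.0e}   g(x) = {g(x):+.3e}")
```

Because both bounds shrink to 0, the sampled values of g are forced toward 0 as well, even though cos(log |x|) itself keeps oscillating.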
We could not have used limit law III to compute $\lim_{x \to 0} g(x)$
since $\lim_{x \to 0} \cos(\log |x|)$ does not exist. You are asked to
prove this in Exercise 8.4(4).
8.4. Exercises.
(1) Let f(x) and g(x) be two functions that only differ for a
finite number of values of the variable x. Is it true that
$\lim_{x \to a} f(x) = \lim_{x \to a} g(x)$ as long as these limits exist?
Why or why not?
(2) Find an example of two functions f and g such that f(x) <
g(x) for all real numbers x, but there exists a real number a
such that $\lim_{x \to a} f(x) = \lim_{x \to a} g(x)$.
(3) Explain why $\lim_{x \to a} g(x)$ exists if the conditions of
Corollary 2 hold.
(4) Prove that $\lim_{x \to 0} \cos(\log |x|)$ does not exist.
(5) Prove that $\lim_{x \to 0} x \sin(\sqrt{x}) = 0$.
9. Continuous Functions
Intuitively speaking, a function is called continuous at a point x = a
if its graph in a neighborhood of x = a can be drawn without lifting
the pencil from the paper, that is, by a “continuous” line. The formal
deﬁnition of continuity is as follows.
Definition 11. A function f is called continuous at a if the equality
\[ \lim_{x \to a} f(x) = f(a) \]
holds.
Note that Deﬁnition 11 really requires three things. The limit of
f at a must exist, the function f must be deﬁned in a such that f(a)
exists, and the value of f(a) must agree with the limit of f at a.
If all these conditions hold, then the behavior of f at a is very
similar to the behavior of f around a; in particular, the graph of f can
be drawn without lifting the pencil from the paper. If we had to lift the
pencil from the paper, that would mean that some kind of “gap” would
exist in the graph of f, so the requirements of Deﬁnition 11 would not
be satisﬁed.
If a function f : R → R is continuous at all a ∈ R, then it is called
continuous. If f is continuous at each point of the open interval (c, d),
then we say that f is continuous on (c, d). Finally, if you really want
a formal definition, a neighborhood of a is a set S that contains an
open interval (c, d) containing a.
9.0.1. The Precise Definition of Continuity. As the informal definition
of continuity is very close to that of limits, it is not surprising that
their precise definitions are also similar.
Definition 12. Let f be defined in an open interval containing a.
We say that f is continuous in a if, for all ε > 0, there exists δ > 0
such that if |x − a| < δ, then |f(x) − f(a)| < ε.
9.1. Examples of Continuous Functions. Let us consider some of the
most frequently used continuous functions.
Example 12. Polynomial functions are continuous.
Solution: This is a direct consequence of Theorem 2, which we discussed
in the last section. Theorem 2 stated that the limit of a polynomial
function at a is equal to the value of the polynomial at a, which
is precisely what the definition of continuity requires. □
There are many classes of functions that are continuous at every
point where they are deﬁned. If they are not deﬁned somewhere, then,
of course, they cannot be continuous there.
Example 13. The following are examples of functions that are
continuous in every point where they are defined.
(I) Rational functions
(II) Exponential functions
(III) Trigonometric functions
(IV) Logarithmic functions
(V) Inverse trigonometric functions
The reader is invited to recall the graphs of each of these functions
and verify that they consist of continuous lines as long as they are
deﬁned.
9.2. Functions That Are Not Continuous. It is time to stop for a moment
and think about functions that are not continuous at a given point a.
There can be three reasons for this. First, it could be that f(a) is not
defined, for instance, when f is a rational function whose denominator
becomes 0 when x = a. Or it could be that g is defined at a, but
$\lim_{x \to a} g(x)$ does not exist. An example of this is the function
defined by g(x) = 1 if x ≥ 0 and g(x) = 0 if x < 0. As we have seen
before, the limit of this function does not exist in a = 0, even though
g(0) is defined. So g is not continuous at 0. Finally, it could happen
that h is defined in a and the limit of h at a exists, but h(a) is not
equal to this limit. That happens, for example, if
$h(x) = (x + 3)/(x^2 - 9)$ if x ≠ −3 and h(x) = 1 if x = −3. Let a = −3.
Then
\[ h(a) = 1 \neq \lim_{x \to a} h(x) = -\frac{1}{6}. \]
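The third kind of discontinuity can be seen numerically. The sketch below assumes the reading h(x) = (x + 3)/(x^2 − 9) for x ≠ −3 and h(−3) = 1, and compares h(−3) with values of h at nearby points; the offsets used are choices made here.

```python
def h(x):
    # Assumed reading: the formula away from -3, and the value 1 at -3 itself.
    if x == -3:
        return 1.0
    return (x + 3) / (x * x - 9)


# Values of h at points on both sides of a = -3, getting closer.
offsets = (1e-2, -1e-2, 1e-4, -1e-4)
nearby = [h(-3 + d) for d in offsets]

print("h(-3) =", h(-3))
for d, v in zip(offsets, nearby):
    print(f"h(-3 {d:+.0e}) = {v:.6f}")
```

The nearby values cluster around −1/6 while h(−3) itself is 1, so the value at the point disagrees with the limit there.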
The interested reader is invited to think about the following
example.
Excursion 1. The following function is not continuous anywhere.
Let f(x) = 1 if x is rational and let f(x) = 0 if x is irrational.
9.3. New Continuous Functions from Old. It follows from the limit laws
that several transformations preserve the continuous property of
functions.
Theorem 4. Let f and g be two functions that are continuous at a
and let c be a real number. Then all of the following are also continuous
functions at a:
(I) f + g,
(II) f −g,
(III) f · g,
(IV) cf, and
(V) f/g as long as g(a) ≠ 0.
Example 14. It follows from successive applications of the previous
theorem that $h(x) = e^x \cdot \sin x + 3 \ln x - \sqrt{x}$ is continuous
at all positive real numbers a.
The following important theorem also holds, though it is not a
direct consequence of our limit laws.
Theorem 5. Let f and g be two functions such that f is continuous
at a and g is continuous at f(a). Then the composition function g ◦ f
is continuous at a.
This theorem is important since it enables us to prove the continuity
of functions that would otherwise be cumbersome to handle.
Example 15. The function $h(x) = \sqrt{2 + \sin x}$ is continuous at all
real numbers a.
Solution: Let f(x) = 2 + sin x and let $g(x) = \sqrt{x}$. Then f is
continuous everywhere, and g is continuous at all positive real numbers.
As f(x) is always a positive real number, the statement follows. □
9.4. One-Sided Continuity. A function may happen to be continuous in
only one direction, either from the “left” or from the “right.” Formally,
this means the following.
Definition 13. We say that the function f is left-continuous at a
if $f(a) = \lim_{x \to a^-} f(x)$. Similarly, we say that f is
right-continuous at a if $f(a) = \lim_{x \to a^+} f(x)$.
Example 16. Let g be the function defined by g(x) = 1 if x ≥ 0
and g(x) = 0 if x < 0. Then $\lim_{x \to 0^-} g(x) = 0 \neq 1 = g(0)$, so
g is not left-continuous at 0. On the other hand,
$\lim_{x \to 0^+} g(x) = 1 = g(0)$, so g is right-continuous at 0.
The reader is invited to verify that f is continuous at a if and only
if f is both left-continuous and right-continuous at a.
We say that a function is continuous on an interval [a, b] if it is
continuous at all points of (a, b), right-continuous at a, and
left-continuous at b.
9.5. Intermediate Value Theorem. Perhaps the most important property
of continuous functions is that they do not skip any values between
two values that they actually take. For instance, if a tree grows from
3 feet to 6 feet, then there is a time in between when the tree is exactly
4.47 feet tall. The intuitive reason for this is that if there were a value
in between that is not taken by the function, then there would be a
gap in the graph of the function, contradicting the requirement that
the function be continuous. This is the content of the next theorem.
Theorem 6 (Intermediate Value Theorem). Let f be a function
that is continuous on the interval [a, b]. Then, if $f(a) = y_1$ and
$f(b) = y_2$ and y is a real number that is between $y_1$ and $y_2$, then
there exists x ∈ [a, b] such that f(x) = y.
In other words, f takes all values between $y_1$ and $y_2$ on the interval
[a, b].
Example 17. There is a real number x in the interval [0, 1] such
that $x + e^x = 2$.
Solution: Let $f(x) = x + e^x$. Then f is continuous everywhere, f(0) =
1, and f(1) = 1 + e > 3.71. So, by the intermediate value theorem,
we get that f takes all values between 1 and 1 + e on that interval,
including y = 2. □
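The intermediate value theorem only guarantees that a solution exists; it does not say where it is. One common way to locate it numerically is bisection, which is not discussed in the text and is shown here only as an illustration. The function name bisect and the tolerance are choices made for this sketch.

```python
import math


def f(x):
    return x + math.exp(x)


def bisect(func, target, lo, hi, tol=1e-10):
    """Approximate x in [lo, hi] with func(x) = target.

    Assumes func is continuous on [lo, hi] and func(lo) <= target <= func(hi),
    so the intermediate value theorem guarantees a solution in every
    half-interval that is kept.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid  # the target value is taken somewhere in [mid, hi]
        else:
            hi = mid  # the target value is taken somewhere in [lo, mid]
    return (lo + hi) / 2


root = bisect(f, 2.0, 0.0, 1.0)
print(root, f(root))
```

Each step halves the interval while keeping the target value between the function values at its endpoints, which is the same reasoning as in the solution above, applied repeatedly.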
9.6. Exercises.
(1) Where is tan x continuous?
(2) Where is 1/x not continuous?
(3) Where is $\frac{3x + 2}{5x^2 - 6x + 1}$ continuous?
(4) Where is $\sin(x^2)$ continuous?
(5) Prove that the equation $x^5 - x - 1 = 0$ has a root in the interval
(−1, 2).
(6) Prove that the equation $x^3 - 3x - 1 = 0$ has at least two roots
in the interval (−1, 2).
10. Limits at Inﬁnity
10.1. Finite Limits at Inﬁnity. In Section 7, we deﬁned what it meant
for a function to have a limit L at a real number a. In this section, we
extend that deﬁnition and deﬁne what it means for a function to have
a limit L at ∞ or at −∞.
Definition 14. Let f : R → R be a function that is defined on
some interval (b, ∞). We say that the limit of f at ∞ is the real number
L if the values of f(x) get arbitrarily close to L and stay arbitrarily
close to L when x is suitably large.
The fact that the limit of f at ∞ is L is expressed by the notation
\[ \lim_{x \to \infty} f(x) = L. \]
This definition follows the idea of the definition of limits at finite
points. Indeed, in order for $\lim_{x \to \infty} f(x) = L$ to hold, we
require that
the values of f(x) get arbitrarily close to L and stay arbitrarily close
to L if x is large enough. Here “x is large enough” means that x is
in a suitably selected neighborhood of ∞, in other words, in an open
interval (c, ∞). Recall that this is analogous to what we required in the
finite case. There we said that $\lim_{x \to a} f(x) = L$ if f(x) got
arbitrarily close to L and stayed arbitrarily close to L once x was
suitably close to a, that is, when x was in a suitably selected
neighborhood of a.
Example 18. Let f(x) = 1/x. Then
\[ \lim_{x \to \infty} f(x) = 0. \]
Solution: If we want the value of f(x) to be closer than ε to 0, all we
have to do is to select x such that x > 1/ε holds. Once x gets past 1/ε,
the values of f(x) will stay between 0 and ε. □
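The threshold 1/ε from Example 18 can be spot-checked numerically. This is a sketch that tests only a handful of points beyond the threshold; the names threshold and check and the list of multipliers are choices made here.

```python
def f(x):
    return 1.0 / x


def threshold(eps):
    # The cutoff from the solution of Example 18: any x > 1/eps works.
    return 1.0 / eps


def check(eps, factors=(1.001, 2, 10, 1e3, 1e6)):
    """True if 0 < f(x) < eps at several points x beyond the threshold."""
    n = threshold(eps)
    return all(0 < f(n * c) < eps for c in factors)


for eps in (0.5, 1e-3, 1e-9):
    print(eps, threshold(eps), check(eps))
```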
The deﬁnition of limits at −∞ is what the reader probably expects.
Definition 15. Let f : R → R be a function defined on some
interval (−∞, b). We say that the limit of f at −∞ is the real number
L if the values of f(x) get arbitrarily close to L and stay arbitrarily
close to L when x is a negative number with a suitably large absolute
value.
The fact that the limit of f at −∞ is L is expressed by the notation
\[ \lim_{x \to -\infty} f(x) = L. \]
Example 19. Let $f(x) = 1/x^2$. Then
\[ \lim_{x \to -\infty} f(x) = 0. \]
Solution: If we want to get f(x) closer than ε to 0 and keep it there,
it suffices to choose x such that $x < -1/\sqrt{\epsilon}$. Then
$x^2 > 1/\epsilon$, and hence $f(x) = 1/x^2 < \epsilon$. □
10.1.1. The Formal Definition of Limits at Infinity. The formal definition
of limits at infinity is very similar to that of limits at finite points.
The only difference is in the formal description of what it means to be
in a neighborhood of infinity versus what it means to be in a neighborhood
of a real number.
Definition 16. Let f : R → R be a function defined on some
interval (b, ∞). We say that $\lim_{x \to \infty} f(x) = L$ if, for all
positive real numbers ε, there exists a positive real number N such that
if x > N, then |f(x) − L| < ε.
The formal deﬁnition of limits at negative inﬁnity is analogous. The
only diﬀerence is again in the formal description of what it means for x
to be in a neighborhood of −∞. It means to be in an interval (−∞, c).
Definition 17. Let f : R → R be a function defined on some
interval (−∞, b). We say that $\lim_{x \to -\infty} f(x) = L$ if, for all
positive real numbers ε, there exists a negative real number N such that
if x < N, then |f(x) − L| < ε.
10.1.2. The Graphical Meaning of a Finite Limit at Infinity. If a function
f has limit L at ∞ or −∞, then the graph of the function will
approach the horizontal line y = L at that infinity. The graph may or
may not actually touch that line or even become that line. The line
y = L is called a horizontal asymptote of the graph of y = f(x) when
$\lim_{x \to \infty} f(x) = L$ or $\lim_{x \to -\infty} f(x) = L$ holds.
10.2. Inﬁnite Limits at Inﬁnity. It can happen that the limit of a function
at ∞ is not a real number but rather ∞ or −∞.
Definition 18. Let f : R → R be a function defined on some
interval (b, ∞). We say that the limit of f at ∞ is ∞, denoted by
\[ \lim_{x \to \infty} f(x) = \infty, \]
if f(x) gets arbitrarily large and stays arbitrarily large if x gets
sufficiently large.
Example 20. Let $f(x) = e^x$. Then $\lim_{x \to \infty} f(x) = \infty$.
Solution: In order to get f(x) to be larger than some given positive
real number M, it suffices to choose $x > \ln M$. □
The following notation is deﬁned in an analogous way:
(I) $\lim_{x \to \infty} f(x) = -\infty$.
(II) $\lim_{x \to -\infty} g(x) = \infty$.
(III) $\lim_{x \to -\infty} h(x) = -\infty$.
Each of these definitions refers to the fact that the values of a function
get arbitrarily far away from 0 and stay arbitrarily far away from 0 (in
the appropriate direction) if x gets sufficiently far away from 0 (in the
appropriate direction). The reader should test his or her understanding
of these concepts by verifying that $\lim_{x \to \infty} (1 - x) = -\infty$, while
$\lim_{x \to -\infty} x^2 = \infty$, and $\lim_{x \to -\infty} x^3 = -\infty$.
10.2.1. The Formal Deﬁnition of Inﬁnite Limits at Inﬁnity. By now, the
formal deﬁnition of inﬁnite limits at inﬁnity probably does not come
as a surprise. We are providing a formal deﬁnition for one of the four
possible scenarios that can occur due to changes in sign. The other
three cases are analogous.
Definition 19. Let f : R → R be a function defined on some
interval (b, ∞). We say that $\lim_{x \to \infty} f(x) = \infty$ if, for all positive real
numbers M, there exists a positive real number N such that if x > N,
then f(x) > M.
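Definition 19 can be illustrated with the function of Example 20. The sketch below assumes $f(x) = e^x$ and uses the threshold $\ln M$ from that example; since the definition asks for a positive N, the hypothetical helper `threshold` never returns less than 1. This is a numerical illustration only.

```python
import math

def threshold(M):
    """For f(x) = e^x and M > 0, return a positive N with f(x) > M whenever x > N."""
    return max(math.log(M), 1.0)

def check(M, samples=100):
    """Test f(x) > M at sample points x > N."""
    N = threshold(M)
    return all(math.exp(N + 0.5 * k) > M for k in range(1, samples + 1))

for M in (0.5, 10.0, 1e6, 1e12):
    print(M, threshold(M), check(M))
```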
10.3. Computing Limits at Inﬁnity. The limit laws that we learned for
limits at ﬁnite points stay true for limits at inﬁnity as well, provided,
of course, that they make sense. Here are a few examples.
Example 21. We have
\[ \lim_{x \to \infty} \frac{x + 3}{x - 4} = 1. \]
It would be wrong to argue as follows: “The numerator is the function
f(x) = x + 3, and the denominator is the function g(x) = x − 4.
At ∞, they both have limit ∞, so, by the limit law for quotients, the
limit of their quotient is 1.”
The problem with this argument is that ∞ is not a number, so
∞/∞ is not defined. It is possible for f and g both to have limit ∞
at ∞, and for f/g to have limit c at ∞, for any given positive real
number c. Indeed, let f(x) = cx and let g(x) = x.
Instead, we can solve Example 21 as follows.
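Solution:
\[
\begin{aligned}
\lim_{x \to \infty} \frac{x + 3}{x - 4}
  &= \lim_{x \to \infty} \frac{(x - 4) + 7}{x - 4} \\
  &= \lim_{x \to \infty} \left( 1 + \frac{7}{x - 4} \right) \\
  &= 1 + \lim_{x \to \infty} \frac{7}{x - 4} \\
  &= 1 + 0 \\
  &= 1.
\end{aligned}
\]
□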
We would like to point out other pitfalls when dealing with the
application of limit laws and infinite limits. The following expressions
are not defined:
(I) $\infty + (-\infty)$
(II) $\infty \cdot 0$ and $-\infty \cdot 0$
(III) $\frac{1}{\infty}$ and $\frac{1}{-\infty}$
The following theorem is very useful when dealing with limits at ∞.
Theorem 7. Let r be a positive rational number. Then
\[ \lim_{x \to \infty} \frac{1}{x^r} = 0. \]
If r is an integer, then this statement follows from the fact that
$\lim_{x \to \infty} 1/x = 0$ by applying limit law III (for products) r times. If
r = p/q, where p and q are positive integers, then we can first prove
the theorem for $x^p$, and then, using the root law, for
$x^{p/q} = \sqrt[q]{x^p}$.
Many limits can be computed with the help of this theorem.
Example 22. We have
\[ \lim_{x \to \infty} \frac{x^2 + 3x + 1}{x^3} = 0. \]
Solution: We have
\[ \frac{x^2 + 3x + 1}{x^3} = \frac{x^2}{x^3} + \frac{3x}{x^3} + \frac{1}{x^3}, \]
and each of the three summands has limit 0 at ∞ by the preceding
theorem. Hence, by the limit law for sums, so does their sum. □
Note that the limit would not change if we changed the denominator
from $x^3$ to $x^3 + 3x^2 + 4x + 5$. For positive x, this change decreases
the value of our function but still keeps it positive. Hence, by the squeeze
principle, we can conclude that
\[ \lim_{x \to \infty} \frac{x^2 + 3x + 1}{x^3 + 3x^2 + 4x + 5} = 0. \]
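A quick numerical look at the three rational functions of Examples 21 and 22 may help build intuition. The sketch below simply evaluates them at increasingly large x; the names `r1`, `r2`, and `r3` are ours. Such a table suggests the limits but does not replace the arguments given above.

```python
def r1(x):
    """Example 21: (x + 3)/(x - 4), with limit 1 at infinity."""
    return (x + 3) / (x - 4)

def r2(x):
    """Example 22: (x^2 + 3x + 1)/x^3, with limit 0 at infinity."""
    return (x**2 + 3*x + 1) / x**3

def r3(x):
    """The same numerator over the larger denominator x^3 + 3x^2 + 4x + 5."""
    return (x**2 + 3*x + 1) / (x**3 + 3*x**2 + 4*x + 5)

for x in (1e1, 1e2, 1e4, 1e6):
    # For positive x we expect 0 < r3(x) < r2(x), as used in the squeeze argument.
    print(x, r1(x), r2(x), r3(x))
```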
10.4. Exercises.
(1) Find $\lim_{x \to \infty} \frac{x + 1}{x^2 + 4}$.
(2) Find $\lim_{x \to \infty} \frac{3x^2 + 4x + 1}{x^2 + 4}$.
(3) Find $\lim_{x \to \infty} \frac{x^3 + 2x}{x^2 + 4}$.
(4) Find $\lim_{x \to -\infty} \frac{3x^2 + 4x + 1}{x^2 + 4}$.
(5) Let R(x) = p(x)/q(x) be a rational function. Explain how
$\lim_{x \to \infty} R(x)$ depends on p(x) and q(x).
11. Derivatives
11.1. Tangent Lines. Let us consider a function, such as $f(x) = x^2$, and
its graph. Let us choose a point on the graph, say the point P = (3, 9).
Now let us look for the slope of the tangent line to the graph at that
point.
That is, consider a sequence of points $P_1, P_2, \ldots$ that are all on the
graph of f and are closer and closer to P. For each of these points,
draw the line $P_iP$. The slopes of these lines will approach a certain
slope, and so the lines $P_iP$ will approach a certain line. That line is
called the tangent line of f at P.
Definition 20. Let f be a function and let P = (a, f(a)) be a
point on the graph of f. Then the tangent line to f at P is the line
that contains P and has slope
\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a}, \tag{2.5} \]
provided that this limit exists.
Note that in the preceding deﬁnition, (f(x)−f(a))/(x−a) is simply
the slope of the line connecting the points P and (x, f(x)).
Example 23. In our running example, that is, when $f(x) = x^2$ and
P = (3, 9), the tangent line is the line that goes through P and has slope
\[ \lim_{x \to 3} \frac{f(x) - f(3)}{x - 3} = \lim_{x \to 3} \frac{x^2 - 9}{x - 3} = \lim_{x \to 3} (x + 3) = 6. \]
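The convergence of the secant slopes in Example 23 can be observed numerically. The following sketch evaluates the quotient from (2.5) for points closer and closer to 3, from both sides; the helper name `secant_slope` is our own.

```python
def f(x):
    """The running example f(x) = x^2."""
    return x * x

def secant_slope(f, a, x):
    """Slope of the line through (a, f(a)) and (x, f(x)), for x != a."""
    return (f(x) - f(a)) / (x - a)

for h in (1.0, 0.1, 0.01, 0.001):
    # For this f the slope equals x + 3, so we expect values near 6 + h and 6 - h.
    print(h, secant_slope(f, 3.0, 3.0 + h), secant_slope(f, 3.0, 3.0 - h))
```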
11.2. Velocities. Recall that in Section 6, we mentioned that the average
velocity of a moving object, such as a car, can be computed by
the rule v = s/t. That is, the average velocity is equal to the distance
covered divided by the time needed to cover that distance. However,
what can be said about the instantaneous velocity, that is, the velocity
in a given moment?
We could not answer that question in Section 6 since we did not
have the tools to handle the fact that when only a given moment is
considered, both the numerator and the denominator of the formula
v = s/t are 0. Now that we have learned about limits, we can overcome
that diﬃculty as follows.
Definition 21. Let f(t) be a function such that f(t) is the distance
covered by a moving object in t units of time. Then the instantaneous
velocity of the object a units of time after it starts moving is
\[ v(a) = \lim_{t \to a} \frac{f(t) - f(a)}{t - a}, \]
provided that this limit exists.
Example 24. A car starts out by accelerating for 10 seconds so
that the distance covered in the first t seconds is obtained (in meters)
by the function $f(t) = \frac{1}{2} t^2$ if $t \le 10$. What is the instantaneous velocity
of the car after 4 seconds?
Solution: By the definition of instantaneous velocity, we must
compute
\[ v(4) = \lim_{t \to 4} \frac{f(t) - f(4)}{t - 4} = \lim_{t \to 4} \frac{t^2 - 16}{2(t - 4)} = \lim_{t \to 4} \frac{t + 4}{2} = 4. \]
So, at the end of the fourth second (exactly 4 seconds after starting
out), the car will move at a rate of 4 meters per second. □
11.3. The Derivative of a Function. The fact that the last two concepts,
the tangent line and the instantaneous velocity, led to very similar
deﬁnitions suggests that there is a very general principle at work and
we have seen two special cases of that principle.
This is indeed the case.
Definition 22. Let f be a function. The derivative of f at a is
the limit
\[ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]
if this limit exists and is finite.
So, in particular, $f'(a)$ is the slope of the tangent line of f at a
(unless that tangent line is vertical). Furthermore, the instantaneous
velocity at time a is the derivative of the distance covered (as a function
of the time t needed to cover that distance) at t = a.
In other words, the derivative is a common generalization of the
concepts of tangent line and instantaneous velocity.
11.4. Exercises.
(1) Find the slope of the tangent line to the curve $f(x) = 3x^2 - 7$
at the point (2, 5).
(2) Find the slope of the tangent line to the curve $f(x) = x^3$ at x = 0.
(3) Show an example of a curve that does not have a tangent line
at some point a because the limit defined in (2.5) does not
exist.
(4) The distance covered by a car in a certain time period is
described by the function
\[ f(t) = tm + \frac{t^2 (b - m)}{2}, \]
where b and m are positive constants. Let us assume that
t ∈ [0, 1]. Find the instantaneous velocity of the car at a
given moment t = a.
(5) Find the derivative of the function f(x) = x + 5 at a = 3.
12. The Derivative as a Function
12.1. Rates of Change. In the last section, we saw that the derivative
of a function at a given point was a common generalization of the
concepts of tangent lines and instantaneous velocities. We will now
further elaborate on that, in order to understand how far-reaching the
concept of derivatives is.
If f is a function and f(x) = y, then the quantity denoted by y
depends on the quantity denoted by x. This is sometimes expressed
by saying that x is the independent variable and y is the dependent
variable. If x changes, then the change in y can be described in terms
of the change in x.
In particular, if x changes from $x_1$ to $x_2$, then y = f(x) changes
from $y_1 = f(x_1)$ to $y_2 = f(x_2)$. The average rate of change for the
interval $(x_1, x_2)$ is then the ratio
\[ \frac{y_2 - y_1}{x_2 - x_1} = \frac{\Delta y}{\Delta x}, \]
where Δx is the change (or increment) of x. We have to use the word
“average” since we only have information about the values of y at the
endpoints of the interval $(x_1, x_2)$; we do not know how f(x) = y behaves
in the rest of the interval. If we want more precise information, such
as the instantaneous rate of change of f(x) = y at a given point, then
we have to use the notion of limits again, just as we have done twice
in the last section. That is, at a given point x = a, we define the
instantaneous rate of change of f(x) = y as
\[ \lim_{x_2 \to a} \frac{f(x_2) - f(a)}{x_2 - a} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}. \]
12.2. The Derivative of the Function f. Recall that, at a given point a,
the derivative of the function f is defined as the limit
\[ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}. \]
Note that this definition associates the real number $f'(a)$ to the
real number a. That is, $f' : R \to R$ is a function. The function $f'$ is
called the derivative of f. The operation that takes f into $f'$ is called
differentiation. This explains the following definition.
Definition 23. A function f is called differentiable at a if $f'(a)$
exists.
We say that f is differentiable on the interval (a, b) if f is
differentiable at d for all d ∈ (a, b).
Example 25. The function $f(x) = x^3$ is differentiable at every real
number a, and $f'(a) = 3a^2$.
Solution: We have
\[
\begin{aligned}
\lim_{x \to a} \frac{f(x) - f(a)}{x - a}
  &= \lim_{x \to a} \frac{x^3 - a^3}{x - a} \\
  &= \lim_{x \to a} \frac{(x - a)(x^2 + xa + a^2)}{x - a} \\
  &= \lim_{x \to a} (x^2 + xa + a^2) \\
  &= 3a^2.
\end{aligned}
\]
□
The functions we have considered so far had only one independent
variable, usually the variable x. The dependent variable was usually
denoted by y, so y = f(x) held. So it was always clear that the
derivative was taken with respect to x. However, there are circumstances
when this is not so clear, usually when f depends on more than one
variable. Therefore, there are additional ways to denote the function
$f'$, such as
• $\frac{dy}{dx}$,
• $\frac{df}{dx}$,
• $\frac{d}{dx} f(x)$,
• $D_x f(x)$, or
• $Df(x)$.
12.3. Differentiability Versus Continuity. The definitions of
differentiability and continuity are similar. Which one imposes stronger
requirements on a function at a given point? The following theorem shows
that differentiability is the stronger requirement.
Theorem 8. If f is diﬀerentiable at a, then f is continuous at a.
Proof. If f is differentiable at a, then
\[ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}; \]
in particular, the limit shown on the right-hand side exists. Multiplying
both sides by the function g(x) = x − a, we get
\[ f'(a)(x - a) = (x - a) \lim_{x \to a} \frac{f(x) - f(a)}{x - a}. \]
Now, taking limits at a on both sides, we obtain
\[ f'(a) \cdot \lim_{x \to a} (x - a) = \lim_{x \to a} (f(x) - f(a)), \tag{2.6} \]
since we can apply the limit law for products on the right-hand side to
get that
\[
\lim_{x \to a} (x - a) \cdot \lim_{x \to a} \frac{f(x) - f(a)}{x - a}
  = \lim_{x \to a} \left( (x - a) \cdot \frac{f(x) - f(a)}{x - a} \right)
  = \lim_{x \to a} (f(x) - f(a)).
\]
Finally, note that the left-hand side of (2.6) is equal to 0 since
$f'(a)(x - a)$ is a polynomial in x that takes value 0 when x = a. Hence, the
right-hand side of (2.6) is equal to 0 as well, that is,
\[ 0 = \lim_{x \to a} (f(x) - f(a)) = \left( \lim_{x \to a} f(x) \right) - f(a). \]
Adding f(a) to both the far left and far right sides, we get that
\[ f(a) = \lim_{x \to a} f(x), \]
which means that f is continuous at a. □
The converse of Theorem 8 is not true. Indeed, the function
$f(x) = |x|$ is continuous at a = 0, but it is not differentiable there.
The reader is invited to prove this by showing that
\[
\lim_{x \to 0^-} \frac{|x| - 0}{x - 0}
  = \lim_{x \to 0^-} \frac{|x|}{x}
  \neq \lim_{x \to 0^+} \frac{|x|}{x}
  = \lim_{x \to 0^+} \frac{|x| - 0}{x - 0},
\]
and hence
\[ f'(0) = \lim_{x \to 0} \frac{|x| - 0}{x - 0} \]
does not exist.
In general, there are several reasons a continuous function may fail
to be differentiable at a given point. It could be that the graph of the
function has a “corner,” like that of $|x|$ at 0, and hence the slope of
the tangent line cannot be defined because the left-hand limit and the
right-hand limit of the slopes of the lines approaching the purported
tangent line are not equal. Or, it could be that the function has a vertical
tangent line at the given point. See Exercise 12.5(5) for an example of this.
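The "corner" of $|x|$ at 0 shows up clearly when the two one-sided difference quotients are tabulated. The sketch below is a numerical illustration only; the helper `one_sided_quotients` is a name we introduce here.

```python
def one_sided_quotients(f, a, steps):
    """Difference quotients (f(x) - f(a))/(x - a) for x = a - h (left) and x = a + h (right)."""
    left = [(f(a - h) - f(a)) / (-h) for h in steps]
    right = [(f(a + h) - f(a)) / h for h in steps]
    return left, right

left, right = one_sided_quotients(abs, 0.0, [0.1, 0.01, 0.001])
print(left)    # every quotient from the left is -1
print(right)   # every quotient from the right is 1
```

Since the quotients from the left are constantly −1 and those from the right are constantly 1, the two one-sided limits differ, in agreement with the claim that the two-sided limit does not exist.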
12.4. Higher-Order Derivatives. In upcoming chapters, it will often be
useful to consider not only the derivative of a function but also the
derivative of the derivative and even the derivative of the derivative
of the derivative. These functions appear so often that they have their
own names.
If f is a differentiable function on an interval (a, b) and its derivative
$f'$ is also differentiable on (a, b), then the derivative of $f'$ is called
the second derivative of f and is denoted by $f''$. Similarly, if $f''$ is
differentiable on (a, b), then its derivative is called the third derivative
of f and is denoted by $f'''$. Higher-order derivatives are defined in
an analogous way, but denoted slightly differently. For instance, the
seventh derivative of f is denoted by $f^{(7)}$, and, in general, the nth
derivative is denoted by $f^{(n)}$.
Example 26. We have seen in Example 25 that if $f(x) = x^3$, then
$f'(x) = 3x^2$. Therefore,
\[
\begin{aligned}
f''(a) &= \lim_{x \to a} \frac{f'(x) - f'(a)}{x - a} \\
  &= \lim_{x \to a} \frac{3(x^2 - a^2)}{x - a} \\
  &= \lim_{x \to a} 3(x + a) \\
  &= 6a.
\end{aligned}
\]
So $f''(x) = 6x$.
In Exercise 12.5(1), you are asked to prove that $f'''(x) = 6$ for
all x, and in Exercise 12.5(2), you are asked to compute higher-order
derivatives of f.
12.5. Exercises.
(1) Let $f(x) = x^3$. Prove that $f'''(x) = 6$ for all real numbers x.
(2) Let $f(x) = x^3$. Compute $f^{(4)}(x)$. What can be said about
higher-order derivatives of f?
(3) Let $f(x) = \sqrt{x}$. Compute $f'(a)$ at some point a > 0.
(4) Let f(x) = 1/x. Compute $f'(a)$ at some point $a \neq 0$.
(5) Let f be defined on the interval [0, 2] by $f(x) = \sqrt{1 - x^2}$ if
$0 \le x \le 1$, and $f(x) = -\sqrt{1 - (x - 2)^2}$ if $1 < x \le 2$. So
the graph of f(x) is the union of two quarters of a unit circle.
Prove that, at x = 1, the graph of f has a vertical tangent
line, that is,
\[ \lim_{x \to 1} \frac{f(x) - f(1)}{x - 1} = -\infty. \]
CHAPTER 3
Rules of Differentiation
13. Derivatives of Polynomial and Exponential Functions
13.1. Polynomials. Let us recall that polynomials are sums of power
functions with nonnegative integer exponents, such as the function
$f(x) = 3x^2 + 4x + 6$. In this section, we will deduce general rules for
the derivatives of polynomial functions. We start with their “building
blocks,” power functions. The simplest of these is the class of constant
functions.
Theorem 9. Let c be a real number and let f(x) = c for all x.
Then $f'(a) = 0$ for all real numbers a.
Before we prove the theorem, we point out that, intuitively, it makes
perfect sense. The derivative of a function f describes the rate of
change of f, but if f is a constant function, then f never changes (it
has zero change).
Proof of Theorem 9. We have
\[ f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{c - c}{x - a} = 0. \]
Note that $\lim_{x \to a} (c - c)/(x - a) = 0$ since $(c - c)/(x - a) = 0$ for all
values $x \neq a$. □
We now turn our attention to a more general class of power functions,
those of the form $f(x) = x^n$, where n is a positive integer. Let
us recall the algebraic identity
\[ x^n - a^n = (x - a) \cdot (x^{n-1} + x^{n-2} a + \cdots + x a^{n-2} + a^{n-1}). \]
Theorem 10. Let n be a positive integer and let $f(x) = x^n$. Then
$f'(a) = n a^{n-1}$.
Proof. We have
\[
\begin{aligned}
f'(a) &= \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \\
  &= \lim_{x \to a} \frac{x^n - a^n}{x - a} \\
  &= \lim_{x \to a} \frac{(x - a) \cdot (x^{n-1} + x^{n-2} a + \cdots + x a^{n-2} + a^{n-1})}{x - a} \\
  &= \lim_{x \to a} (x^{n-1} + x^{n-2} a + \cdots + x a^{n-2} + a^{n-1}) \\
  &= n a^{n-1},
\end{aligned}
\]
since each of the n summands in the last limit approaches $a^{n-1}$ as x
approaches a. □
Note that this agrees with our result from the last section that
showed that if $f(x) = x^3$, then $f'(x) = 3x^2$.
It turns out that Theorem 10 holds even if n is not a positive integer.
That is, for all real numbers α, if $f(x) = x^\alpha$, then $f'(x) = \alpha x^{\alpha - 1}$. We
will see a formal proof of this fact later. In the exercises, you are asked
to prove two special cases of this general result.
13.1.1. Three Simple Rules. Derivatives are limits of certain functions,
so it is not surprising that some of the laws governing their
computation are very similar to limit laws. That is, if we know the
derivative of f and g, then we can easily compute the derivative of
f + g, f − g, and cf, where c is a given real number. The rules are as
follows.
Theorem 11. Let f and g be two functions that are differentiable
at a. Then f + g is differentiable at a, and
\[ (f + g)'(a) = f'(a) + g'(a). \]
Proof. We have
\[
\begin{aligned}
(f + g)'(a) &= \lim_{x \to a} \frac{(f + g)(x) - (f + g)(a)}{x - a} \\
  &= \lim_{x \to a} \left( \frac{f(x) - f(a)}{x - a} + \frac{g(x) - g(a)}{x - a} \right) \\
  &= f'(a) + g'(a).
\end{aligned}
\]
□
The other two rules and their proofs are so similar that they are
left as exercises.
Theorem 12. Let f and g be two functions that are differentiable
at a. Then f − g is differentiable at a, and
\[ (f - g)'(a) = f'(a) - g'(a). \]
Theorem 13. Let f be a function that is differentiable at a, and
let c be a real number. Then cf is differentiable at a, and
\[ (cf)'(a) = c f'(a). \]
It is very important to point out that the other limit laws do not
carry over to derivatives in the same fashion. That is, in general,
$(fg)' \neq f'g'$, and $(f/g)' \neq f'/g'$. We will learn some more complicated
rules to compute the derivatives of fg and f/g in the next section.
Theorems 11, 12, and 13 enable us to compute the derivative of any
polynomial function.
Example 27. Let $p(x) = 3x^3 + 5x^2 - 6x + 8$. Find $p'(x)$.
Solution: Note that p(x) is just a sum (and difference) of constant
multiples of power functions. The derivatives of power functions are
computed in Theorem 10. Then we can apply Theorems 11, 12, and
13 to get
\[
\begin{aligned}
p'(x) &= (3x^3)' + (5x^2)' - (6x)' + (8)' \\
  &= 3(x^3)' + 5(x^2)' - 6(x)' + (8)' \\
  &= 9x^2 + 10x - 6.
\end{aligned}
\]
□
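Theorems 9 through 13 together give a mechanical procedure for differentiating any polynomial. A minimal sketch of that procedure follows, under the assumption (ours, for this illustration) that a polynomial is stored as a list whose entry at index k is the coefficient of $x^k$; the function names are hypothetical.

```python
def poly_derivative(coeffs):
    """Differentiate term by term: c * x^k becomes k * c * x^(k-1).

    coeffs[k] is the coefficient of x^k. The constant term contributes 0
    (Theorem 9), so its slot is dropped from the result.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Evaluate the polynomial at x by Horner's scheme."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

p = [8, -6, 5, 3]            # 8 - 6x + 5x^2 + 3x^3, the polynomial of Example 27
dp = poly_derivative(p)
print(dp)                    # [-6, 10, 9], that is, 9x^2 + 10x - 6
print(poly_eval(dp, 2))      # 50
```

Applying `poly_derivative` repeatedly yields the higher-order derivatives; for a polynomial of degree n, the list becomes empty (the zero polynomial) after n + 1 steps.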
13.2. Exponential Functions. Let us now compute the derivative of the
exponential function $f(x) = b^x$, where b is some positive constant. By
the definition of derivatives, we get
\[
\begin{aligned}
f'(a) &= \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \\
  &= \lim_{x \to a} \frac{b^x - b^a}{x - a} \\
  &= \lim_{z \to 0} \frac{b^{a+z} - b^a}{z} \\
  &= b^a \lim_{z \to 0} \frac{b^z - 1}{z} \\
  &= b^a f'(0).
\end{aligned}
\]
Several comments are in order. First, note the substitution z = x − a
in the third line. Second, note that $b^a$ is a constant that does not
depend on z; hence, the limit law for constant multiples was used in
the fourth line. Third, in the special case when a = 0, the definition
of the derivative yields $f'(0) = \lim_{z \to 0} (b^z - 1)/z$. We used this fact in
the last line.
In other words,
\[ f'(x) = f'(0) b^x = f'(0) f(x). \tag{3.1} \]
That is, the derivative of the function f is a constant multiple of f.
The constant in question is $f'(0)$, that is, $\lim_{z \to 0} (b^z - 1)/z$. Numerical
experimentation suggests that the larger b is, the larger this limit is.
Graphical experimentation suggests this as well. Indeed, $f'(0)$ is the
slope of the tangent line to the curve of $f(x) = b^x$ at the point x = 0,
and plotting f for various values of b suggests that the larger b is, the
larger this slope is.
In particular, it can be proved that there exists a real number e,
approximately equal to 2.718, such that
\[ \lim_{z \to 0} \frac{e^z - 1}{z} = 1. \]
Definition 24. Let e be the real number such that
\[ \lim_{z \to 0} \frac{e^z - 1}{z} = 1. \]
So, in the special case of b = e, Equation (3.1) takes the form
\[ (e^x)' = e^x, \]
since
\[ f'(0) = \lim_{z \to 0} \frac{e^z - 1}{z} = 1. \]
That is, the derivative of $f(x) = e^x$ is $f(x) = e^x$ itself. In Section 16,
we will see what this implies for the derivatives of exponential
functions with bases different from e.
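The numerical experimentation mentioned in this section can be carried out in a few lines. The sketch below approximates $f'(0) = \lim_{z \to 0} (b^z - 1)/z$ by evaluating the quotient at a single small value of z; both the helper name `slope_at_zero` and the choice z = 10⁻⁶ are ours, and the output only approximates the limit.

```python
import math

def slope_at_zero(b, z=1e-6):
    """Approximate lim_{z -> 0} (b^z - 1)/z by the quotient at one small z."""
    return (b ** z - 1.0) / z

for b in (2.0, 2.5, math.e, 3.0):
    # The values increase with b: roughly 0.69 for b = 2, 1.00 for b = e, 1.10 for b = 3.
    print(b, slope_at_zero(b))
```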
13.3. Exercises.
(1) Prove that if $f(x) = x^{1/2}$ and $a > 0$, then $f'(a) = \frac{1}{2\sqrt{a}}$.
(2) Prove that if $f(x) = 1/x$ and $a \neq 0$, then $f'(a) = -\frac{1}{a^2}$.
(3) Prove Theorem 12.
(4) Prove Theorem 13.
(5) Let $f(x) = 3x^3 - 4x^2 + x - 2 + 4e^x$. Compute $f'(x)$.
14. The Product and Quotient Rules
14.1. The Product Rule. We mentioned in the last section that, in
general, $(fg)' \neq f'g'$. For instance, if f(x) = 2x + 1 and g(x) = x + 2, then
$(fg)(x) = 2x^2 + 5x + 2$, so $(fg)'(x) = (2x^2 + 5x + 2)' = 4x + 5$, while
$f'(x) = 2$ and $g'(x) = 1$, so $f'(x)g'(x) = 2$.
It turns out that there is a rule to compute the derivative of a
product; it is just a little bit more complicated than the limit law for
products. This is the focus of our ﬁrst theorem in this section.
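Theorem 14. Let f and g be two functions that are differentiable
at a. Then fg is differentiable at a, and
\[ (fg)'(a) = f(a)g'(a) + f'(a)g(a). \]
Proof. By definition, we have
\[ (fg)'(a) = \lim_{x \to a} \frac{f(x)g(x) - f(a)g(a)}{x - a}. \tag{3.2} \]
The crucial idea is to decompose the difference f(x)g(x) − f(a)g(a) as
(f(x)g(x) − f(x)g(a)) + (f(x)g(a) − f(a)g(a)) in the numerator of the
right-hand side of (3.2).
Using this idea, we obtain from Equation (3.2)
\[
\begin{aligned}
(fg)'(a) &= \lim_{x \to a} \left( \frac{f(x)g(x) - f(x)g(a)}{x - a} + \frac{f(x)g(a) - f(a)g(a)}{x - a} \right) \\
  &= \lim_{x \to a} \frac{f(x)g(x) - f(x)g(a)}{x - a} + \lim_{x \to a} \frac{f(x)g(a) - f(a)g(a)}{x - a} \\
  &= \lim_{x \to a} \left( f(x) \cdot \frac{g(x) - g(a)}{x - a} \right) + \lim_{x \to a} \left( g(a) \cdot \frac{f(x) - f(a)}{x - a} \right) \\
  &= f(a)g'(a) + g(a)f'(a).
\end{aligned}
\]
In the last step, we used that $\lim_{x \to a} f(x) = f(a)$, which holds because f
is differentiable, and hence continuous, at a (Theorem 8).
□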
Example 28. The derivative of $h(x) = x^2 e^x$ can be computed as
follows. Let $f(x) = x^2$ and $g(x) = e^x$. Then
\[
\begin{aligned}
h'(x) &= (fg)'(x) \\
  &= f(x)g'(x) + f'(x)g(x) \\
  &= x^2 (e^x)' + (x^2)' e^x \\
  &= x^2 e^x + 2x e^x \\
  &= e^x (x^2 + 2x).
\end{aligned}
\]
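The result of Example 28 can be sanity-checked numerically. The sketch below evaluates the product rule formula of Theorem 14 and compares it with the closed form $e^x(x^2 + 2x)$ and with a symmetric difference quotient; the helper names and the step size are assumptions of this illustration, and the difference quotient is only an approximation of the derivative.

```python
import math

def product_rule(f, df, g, dg, x):
    """Evaluate (fg)'(x) = f(x) g'(x) + f'(x) g(x), as in Theorem 14."""
    return f(x) * dg(x) + df(x) * g(x)

f, df = (lambda x: x * x), (lambda x: 2 * x)   # f(x) = x^2 and its derivative
g, dg = math.exp, math.exp                     # g(x) = e^x is its own derivative

def closed_form(x):
    """The simplified answer from Example 28."""
    return math.exp(x) * (x * x + 2 * x)

def difference_quotient(x, step=1e-5):
    """Symmetric difference quotient of h(x) = x^2 e^x."""
    h = lambda t: f(t) * g(t)
    return (h(x + step) - h(x - step)) / (2 * step)

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, product_rule(f, df, g, dg, x), closed_form(x), difference_quotient(x))
```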
14.2. The Quotient Rule. The rule for the derivative of the quotient of
two functions is a little bit more complicated than that for the derivative
of the product of two functions. Though more complex, both the
rule and its proof bear some similarity to the rule given in Theorem 14.
Theorem 15. Let f and g be two functions that are differentiable
at a and let us assume that $g(a) \neq 0$. Then f/g is differentiable at a,
and we have
\[ \left( \frac{f}{g} \right)'(a) = \frac{g(a)f'(a) - f(a)g'(a)}{g(a)^2}. \]
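Proof. By definition, we have
\[ \left( \frac{f}{g} \right)'(a) = \lim_{x \to a} \frac{\frac{f(x)}{g(x)} - \frac{f(a)}{g(a)}}{x - a}. \tag{3.3} \]
Let us multiply both the numerator and the denominator of the
right-hand side by g(x)g(a) to get
\[ \left( \frac{f}{g} \right)'(a) = \lim_{x \to a} \frac{f(x)g(a) - f(a)g(x)}{(x - a)g(x)g(a)}. \]
Now transform the numerator of the right-hand side by subtracting
and then adding g(a)f(a) to get
\[
\begin{aligned}
\left( \frac{f}{g} \right)'(a)
  &= \lim_{x \to a} \frac{f(x)g(a) - g(a)f(a) + g(a)f(a) - f(a)g(x)}{(x - a)g(x)g(a)} \\
  &= \lim_{x \to a} \left( \frac{g(a)}{g(x)g(a)} \cdot \frac{f(x) - f(a)}{x - a} \right)
   - \lim_{x \to a} \left( \frac{f(a)}{g(x)g(a)} \cdot \frac{g(x) - g(a)}{x - a} \right) \\
  &= \frac{g(a)f'(a) - f(a)g'(a)}{g(a)^2}.
\end{aligned}
\]
□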
Theorem 15 now enables us to compute the derivative of rational
functions.
Example 29. Let $h(x) = (x + 3)/(x^2 + 1)$. Find $h'(x)$.
Solution: Let f(x) = x + 3 and let $g(x) = x^2 + 1$. Then $f'(x) = 1$
and $g'(x) = 2x$. So, by Theorem 15, we have
\[
\begin{aligned}
h'(x) &= \frac{g(x)f'(x) - f(x)g'(x)}{g(x)^2} \\
  &= \frac{x^2 + 1 - (x + 3) \cdot 2x}{x^4 + 2x^2 + 1} \\
  &= \frac{-x^2 - 6x + 1}{x^4 + 2x^2 + 1}.
\end{aligned}
\]
□
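As a check on the algebra of Example 29, the following sketch evaluates the quotient rule formula of Theorem 15 directly and compares it with the simplified answer and with a symmetric difference quotient. The helper names and the step size are assumptions made for this illustration only.

```python
def quotient_rule(f, df, g, dg, x):
    """Evaluate (f/g)'(x) by the formula of Theorem 15; requires g(x) != 0."""
    return (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2

f, df = (lambda x: x + 3), (lambda x: 1.0)      # numerator and its derivative
g, dg = (lambda x: x * x + 1), (lambda x: 2 * x)  # denominator and its derivative

def closed_form(x):
    """The simplified answer obtained in Example 29."""
    return (-x * x - 6 * x + 1) / (x ** 4 + 2 * x * x + 1)

def difference_quotient(x, step=1e-6):
    """Symmetric difference quotient of h = f/g."""
    h = lambda t: f(t) / g(t)
    return (h(x + step) - h(x - step)) / (2 * step)

for x in (-2.0, 0.0, 1.5):
    print(x, quotient_rule(f, df, g, dg, x), closed_form(x), difference_quotient(x))
```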
14.3. Exercises.
(1) Let $h(x) = e^x x^3$. Find $h'(x)$ and $h''(x)$.
(2) Find a rule to compute $(f^2)'(x)$.
(3) Find a rule to compute $(1/f)'(x)$.
(4) Use the result of the previous exercise to prove a formula for
$g'(x)$ if $g(x) = x^n$ for a negative integer n.
(5) Let $g(x) = e^{-x}$. Find $g'(x)$.
(6) Let $h(x) = x/e^x$. Find $h'(x)$.
15. Derivatives of Trigonometric Functions
In this section, we show how to compute the derivatives of trigonometric
functions. First, we compute $(\sin x)'$. This will be a somewhat
lengthy procedure, due to the fact that this is the first trigonometric
function we will differentiate and we will have to apply new methods.
However, once we know the derivatives of sin x and cos x, it will be
much simpler to deduce the derivatives of other trigonometric functions,
since those functions can be obtained from sin and cos, and then
the various differentiation rules can be used.
Theorem 16. We have $(\sin x)' = \cos x$.
Proof. Recall the identity sin(a + b) = sin a cos b + sin b cos a. We
have
\[
\begin{aligned}
(\sin x)' &= \lim_{h \to 0} \frac{\sin(x + h) - \sin x}{h} \\
  &= \lim_{h \to 0} \frac{\sin x \cos h + \sin h \cos x - \sin x}{h} \\
  &= \lim_{h \to 0} \left( \sin x \cdot \frac{(\cos h) - 1}{h} + \frac{\sin h \cos x}{h} \right) \\
  &= \sin x \lim_{h \to 0} \frac{(\cos h) - 1}{h} + \cos x \lim_{h \to 0} \frac{\sin h}{h}.
\end{aligned}
\]
Note that, as h approaches 0, we certainly have $\lim_{h \to 0} \sin x = \sin x$
and $\lim_{h \to 0} \cos x = \cos x$, since these functions do not even depend on
h.
There remains the task of computing the two nontrivial limits
\[ \lim_{h \to 0} \frac{(\cos h) - 1}{h} \quad \text{and} \quad \lim_{h \to 0} \frac{\sin h}{h}. \]
We will carry out this task in two lemmas.
Lemma 1. We have
\[ \lim_{h \to 0} \frac{\sin h}{h} = 1. \]
Proof. Let us consider a circle with unit radius and a regular n-gon
whose center is at the center O of the circle and whose n vertices
are all on the unit circle. Then the area of the circle is π, and the area
of the n-gon is $n \cdot \frac{1}{2} \cdot \sin \alpha$, where α = 2π/n is the angle AOB, with A
and B being adjacent vertices of our n-gon.
Considering just 1/n of both the circle and the n-gon, we see that
the area of the triangle AOB is (sin α)/2, and the area of 1/n of the
circle bordered by the lines AO, BO, and the arc AB is π · α/(2π) =
α/2. So the ratio of the two areas is
\[ \frac{(\sin \alpha)/2}{\alpha/2} = \frac{\sin \alpha}{\alpha}. \]
On the other hand, as n gets larger and larger, α gets smaller and
smaller, while the area of the n-gon gets closer and closer to the area
of the circle. Hence, their ratio, sin α/α, will get arbitrarily close to 1
and stay arbitrarily close to 1. □
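The geometric argument of Lemma 1 can be followed numerically by computing the ratio of the area of the inscribed regular n-gon to the area of the unit circle for growing n. In the sketch below, the function name `polygon_to_circle_ratio` is our own; the computation illustrates the lemma but is not part of its proof.

```python
import math

def polygon_to_circle_ratio(n):
    """Area of the inscribed regular n-gon divided by the area of the unit circle.

    The n-gon has area n * (1/2) * sin(alpha) with alpha = 2*pi/n, and the
    circle has area pi, so the ratio simplifies to sin(alpha)/alpha.
    """
    alpha = 2 * math.pi / n
    return math.sin(alpha) / alpha

for n in (6, 60, 600, 6000):
    # The ratio increases toward 1 as n grows, that is, as alpha shrinks.
    print(n, 2 * math.pi / n, polygon_to_circle_ratio(n))
```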
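Lemma 2. The equality
\[ \lim_{h \to 0} \frac{(\cos h) - 1}{h} = 0 \tag{3.4} \]
holds.
Proof. We will manipulate the expression ((cos h) − 1)/h so that
we can use the result of Lemma 1. First, we multiply both the numerator
and the denominator by cos h + 1 to get
\[ \frac{(\cos h) - 1}{h} = \frac{(\cos^2 h) - 1}{h(1 + \cos h)} = \frac{-\sin^2 h}{h(1 + \cos h)}. \]
Therefore, we have
\[
\begin{aligned}
\lim_{h \to 0} \frac{(\cos h) - 1}{h}
  &= -\lim_{h \to 0} \frac{\sin^2 h}{h(1 + \cos h)} \\
  &= -\lim_{h \to 0} \left( \frac{\sin h}{h} \cdot \frac{\sin h}{1 + \cos h} \right) \\
  &= -\left( \lim_{h \to 0} \frac{\sin h}{h} \right) \cdot \left( \lim_{h \to 0} \frac{\sin h}{1 + \cos h} \right) \\
  &= (-1) \cdot 0 = 0.
\end{aligned}
\]
□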
We can now finish the proof of Theorem 16. At the end of the first
displayed chain of equations in that proof, we saw that
\[ (\sin x)' = \sin x \lim_{h \to 0} \frac{(\cos h) - 1}{h} + \cos x \lim_{h \to 0} \frac{\sin h}{h}. \]
The previous two lemmas showed that, on the right-hand side, the first
limit is 0 and the second limit is 1, so $(\sin x)' = \cos x$ as claimed. □
The following theorem can be proved by very similar methods.
Theorem 17. The equality $(\cos x)' = -\sin x$ holds.
You are asked to prove this theorem in Exercise 15.1(1).
Now that we have the derivatives of sin and cos, the derivatives
of other trigonometric functions can be obtained by simply using the
quotient rule. The next theorem shows an example of this.
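Theorem 18. We have $(\tan x)' = \sec^2 x$.
Proof. Note that tan x = sin x/cos x, so we can apply the quotient
rule. This leads to
\[
\begin{aligned}
(\tan x)' &= \left( \frac{\sin x}{\cos x} \right)' \\
  &= \frac{\cos x \cdot (\sin x)' - \sin x \cdot (\cos x)'}{\cos^2 x} \\
  &= \frac{\cos^2 x + \sin^2 x}{\cos^2 x} \\
  &= \frac{1}{\cos^2 x} \\
  &= \sec^2 x.
\end{aligned}
\]
□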
The derivatives of the other three trigonometric functions are given
in the exercises.
15.1. Exercises.
(1) Prove that $(\cos x)' = -\sin x$.
(2) Prove that $(\cot x)' = -\csc^2 x$.
(3) Prove that $(\csc x)' = -\csc x \cdot \cot x$.
(4) Prove that $(\sec x)' = \sec x \tan x$.
(5) Let $h(x) = e^x \cos x$. Find $h'(x)$.
(6) Let $h(x) = e^x / \sin x$. Find $h'(x)$.
16. The Chain Rule
16.1. The Derivative of the Composition of Two Functions. In previous
sections, we learned how to compute the derivative of the sum,
difference, product, and quotient of two functions. We still do not know
how to compute the derivative of the composition of functions, such as
$h(x) = \sin(3x)$, $t(x) = \sqrt{x^2 + 1}$, or $r(x) = e^{4x}$. In this section, we will
learn a rule, called the chain rule, that applies in these situations.
Theorem 19 (Chain Rule). Let h(x) = f(g(x)), where g is
differentiable at x and f is differentiable at g(x). Then h is differentiable at
x, and we have
\[ h'(x) = f'(g(x)) g'(x). \]
The proof of the chain rule is somewhat technical, so we will postpone
it until the end of this section. Now we will discuss some examples
of the applications of the chain rule.
Example 30. Find the derivative of h(x) = sin 3x.
Solution: Let f(x) = sin x and let g(x) = 3x. Then h(x) = f(g(x)),
so, by the chain rule, we have
\[ h'(x) = f'(g(x)) \cdot g'(x) = (\cos 3x) \cdot 3 = 3 \cos 3x. \]
□
Example 31. Let $h(x) = \sqrt{x^2 + 1}$. Find $h'(x)$.
Solution: Let $f(x) = \sqrt{x}$ and let $g(x) = x^2 + 1$. Then h(x) = f(g(x)),
so, by the chain rule, we have
\[ h'(x) = f'(g(x)) g'(x) = \frac{1}{2\sqrt{x^2 + 1}} \cdot 2x = \frac{x}{\sqrt{x^2 + 1}}. \]
□
Sometimes the chain rule is written in the Leibniz notation, that
is, as
\[ \frac{dh}{dx} = \frac{dh}{dg} \cdot \frac{dg}{dx}. \]
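The two examples above can be checked numerically. The sketch below evaluates the chain rule formula $f'(g(x)) g'(x)$ for Examples 30 and 31 and compares each value with a symmetric difference quotient of the composition. The helper names and the step size are assumptions of this illustration.

```python
import math

def chain_rule(df, g, dg, x):
    """Evaluate f'(g(x)) * g'(x), the formula of Theorem 19."""
    return df(g(x)) * dg(x)

def difference_quotient(h, x, step=1e-6):
    """Symmetric difference quotient, used here only as a numerical check."""
    return (h(x + step) - h(x - step)) / (2 * step)

# Example 30: h(x) = sin(3x), with f = sin, f' = cos, g(x) = 3x, g'(x) = 3.
ex30 = lambda x: chain_rule(math.cos, lambda t: 3 * t, lambda t: 3.0, x)
# Example 31: h(x) = sqrt(x^2 + 1), with f = sqrt, f'(u) = 1/(2 sqrt(u)), g(x) = x^2 + 1.
ex31 = lambda x: chain_rule(lambda u: 0.5 / math.sqrt(u),
                            lambda t: t * t + 1, lambda t: 2 * t, x)

for x in (-1.0, 0.0, 0.7, 2.0):
    print(x,
          ex30(x), difference_quotient(lambda t: math.sin(3 * t), x),
          ex31(x), difference_quotient(lambda t: math.sqrt(t * t + 1), x))
```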
16.2. Two Applications of the Chain Rule.
16.2.1. A Simple Way of Obtaining $(\cos x)'$. Recall that in the last
section, it took considerable time and effort to prove that $(\sin x)' =
\cos x$. Finding $(\cos x)'$ with similar methods is just as time-consuming.
On the other hand, the chain rule enables us to compute $(\cos x)'$ faster.
Recall that $\cos x = \sin\left(x + \frac{\pi}{2}\right)$. So we can write cos x as the
composition of two functions, namely cos x = f(g(x)), with f(x) = sin x
and $g(x) = x + \frac{\pi}{2}$. So the chain rule applies, and we get
\[
\begin{aligned}
(\cos x)' &= f'(g(x)) \cdot g'(x) \\
  &= \cos\left(x + \frac{\pi}{2}\right) \cdot 1 \\
  &= \cos x \cos \frac{\pi}{2} - \sin x \sin \frac{\pi}{2} \\
  &= 0 - \sin x \\
  &= -\sin x.
\end{aligned}
\]
16.2.2. The Derivatives of Exponential Functions. Recall that we defined
the number e such that the derivative of the exponential function
$f(x) = e^x$ was f(x) itself. Now the chain rule enables us to compute
the derivatives of exponential functions with any base.
Theorem 20. Let a be a positive real number and let $h(x) = a^x$.
Then we have
\[ h'(x) = a^x \ln a. \]
Proof. Note that
\[ h(x) = a^x = (e^{\ln a})^x = e^{x \ln a}. \]
So we have succeeded in writing h as the composition of two functions,
namely h(x) = f(g(x)), where $f(x) = e^x$ and $g(x) = x \ln a$. Therefore,
the chain rule applies, and we get
\[ h'(x) = f'(g(x)) \cdot g'(x) = e^{x \ln a} \cdot \ln a = a^x \ln a. \]
□
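Theorem 20 can be compared against a numerical approximation for a few bases. In the sketch below, `exp_derivative` implements the formula of the theorem and `difference_quotient` is a symmetric difference quotient; both names, the sample bases, and the step size are our own choices for this illustration.

```python
import math

def exp_derivative(a, x):
    """Theorem 20: the derivative of a^x is a^x * ln(a), for a > 0."""
    return a ** x * math.log(a)

def difference_quotient(a, x, step=1e-6):
    """Symmetric difference quotient of h(x) = a^x."""
    return (a ** (x + step) - a ** (x - step)) / (2 * step)

for a in (0.5, 2.0, math.e, 10.0):
    # For a < 1 the derivative is negative, since ln(a) < 0.
    print(a, exp_derivative(a, 1.0), difference_quotient(a, 1.0))
```

In terms of Equation (3.1), the constant $f'(0)$ for the base a is therefore ln a, which equals 1 exactly when a = e.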
16.3. Proof of the Chain Rule. It is time that we proved the chain rule.
Proof of Theorem 19. As g is differentiable at x, we know that
\[ \lim_{r \to 0} \left( \frac{g(x + r) - g(x)}{r} - g'(x) \right) = 0. \tag{3.5} \]
Set
\[ t = \frac{g(x + r) - g(x)}{r} - g'(x). \]
Note that t depends on r, and as r approaches 0, t approaches 0.
Similarly, let y = g(x). As f is differentiable at y, we have
\[ \lim_{s \to 0} \left( \frac{f(y + s) - f(y)}{s} - f'(y) \right) = 0. \tag{3.6} \]
Set
\[ u = \frac{f(y + s) - f(y)}{s} - f'(y). \]
Again, note that u depends on s and that u approaches 0 as s
approaches 0.
Now we undertake a series of manipulations of the preceding two
equations. Our goal is to express
\[ \big( f(g(x)) \big)' = \lim_{r \to 0} \frac{f(g(x + r)) - f(g(x))}{r} \]
in terms of $f'(g(x))$ and $g'(x)$.
Rearranging the equation that defines the variable t that we just
introduced, we get
\[ g(x + r) = g(x) + (g'(x) + t) r. \tag{3.7} \]
Similarly, rearranging the equation that defines the variable u, we get
\[ f(y + s) = f(y) + (f'(y) + u) s. \tag{3.8} \]
Now apply the function f to both sides of (3.7) to get
\[ f(g(x + r)) = f\big( g(x) + (g'(x) + t) r \big). \tag{3.9} \]
Observe that (3.8) holds for all y and s, so, in particular, it holds
when y = g(x) and $s = (g'(x) + t) r$. Making these substitutions in
(3.8), Equation (3.9) yields
\begin{align}
f(g(x + r)) &= f\big( g(x) + (g'(x) + t) r \big) \tag{3.10} \\
  &= f(g(x)) + (f'(g(x)) + u) \cdot (g'(x) + t) r. \tag{3.11}
\end{align}
We can now express the quotient (f(g(x + r)) − f(g(x)))/r from
the equality of the left-hand side of (3.10) and the expression in (3.11)
as
\[
\frac{f(g(x + r)) - f(g(x))}{r}
  = \frac{(f'(g(x)) + u)(g'(x) + t) r}{r}
  = (f'(g(x)) + u)(g'(x) + t).
\]
Finally, we are in a position to compute the derivative we were
looking for as the limit of the left-hand side as r approaches 0. We get
\[
\begin{aligned}
\lim_{r \to 0} \frac{f(g(x + r)) - f(g(x))}{r}
  &= \lim_{r \to 0} (f'(g(x)) + u)(g'(x) + t) \\
  &= \left( \lim_{r \to 0} f'(g(x)) + \lim_{r \to 0} u \right) \cdot \left( \lim_{r \to 0} g'(x) + \lim_{r \to 0} t \right) \\
  &= f'(g(x)) \cdot g'(x),
\end{aligned}
\]
since both t and u approach 0 as r approaches 0. (For u, note that
$s = (g'(x) + t) r$ approaches 0 as r approaches 0.) □
16.4. Exercises.
(1) Let $h(x) = (x^2 + 1)^5$. Find $h'(x)$.
(2) Let $h(x) = \sin(x^2)$. Find $h'(x)$.
(3) Let $h(x) = \tan 3x$. Find $h'(x)$.
(4) Let $h(x) = e^{\sin x}$. Find $h'(x)$.
(5) Let $h(x) = e^{x^2} \sin x$. Find $h'(x)$.
17. Implicit Differentiation
In the last several sections, we computed the derivatives of many
diﬀerent functions. Although these functions were diﬀerent, they had
one important feature in common. They were explicitly given. That
is, they were given by a rule that directly described how f(x) = y is
obtained from x.
17.1. Tangent Lines to Implicitly Deﬁned Curves. Sometimes we have to
deal with curves that are given by a diﬀerent kind of rule. Consider
the curve given by the equation
$$x^3+y^3=4xy. \tag{3.12}$$
Let us say that we want to compute the slope of the tangent line
to this curve at the point (2, 2). If we could express y as a function
of x, we could simply take the derivative of that function at x = 2.
However, it is not clear how to write y explicitly in terms of x, even if
(3.12) implicitly describes this dependence.
It is in these situations that we resort to implicit diﬀerentiation.
Keep in mind that we do not need to explicitly know how y depends
on x, that is, we do not need an explicit expression for the function
y(x); we only need to know the derivative dy/dx of that function at
x = 2.
Consider Equation (3.12), and differentiate both sides with respect
to the variable x to get
$$\frac{d}{dx}\left(x^3+y^3\right)=\frac{d}{dx}(4xy).$$
Now recall that y = y(x) is a function of x. So, when computing
$(d/dx)\,y^3$ on the left-hand side, we need to use the chain rule. On the
right-hand side, we need to use the product rule. Using these rules,
we get
$$3x^2+3y^2\frac{dy}{dx}=4y+4x\frac{dy}{dx}.$$
Solving this equation for dy/dx, we get
$$\frac{dy}{dx}=\frac{4y-3x^2}{3y^2-4x}.$$
At the point (2, 2), the right-hand side is (8 − 12)/(12 − 8) = −4/4 = −1,
so the slope of the tangent line at (2, 2) is −1.
Note that the fact that the tangent line at (2, 2) has slope −1 makes
(intuitively) perfect sense, since the curve in question is symmetric in
x and y. That is, if (x, y) is on the curve, then (y, x) is also on the
curve.
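The slope found above can be cross-checked without ever solving for y explicitly. The sketch below is an illustration under stated assumptions: the helper names slope and y_on_curve are hypothetical, and Newton's method is used here only as a convenient way to locate points of the curve near (2, 2). It compares the implicit-differentiation formula with a secant slope through two nearby points of the curve.

```python
def slope(x, y):
    """dy/dx on the curve x**3 + y**3 = 4*x*y, from implicit differentiation."""
    return (4 * y - 3 * x**2) / (3 * y**2 - 4 * x)

def y_on_curve(x, y_guess, steps=50):
    """Newton's method in y for F(x, y) = x**3 + y**3 - 4*x*y = 0."""
    y = y_guess
    for _ in range(steps):
        F = x**3 + y**3 - 4 * x * y
        dF_dy = 3 * y**2 - 4 * x
        y -= F / dF_dy
    return y

h = 1e-5
# Secant slope through the curve points with x = 2 - h and x = 2 + h.
secant = (y_on_curve(2 + h, 2.0) - y_on_curve(2 - h, 2.0)) / (2 * h)
print(slope(2, 2), secant)
```

Both numbers agree closely, which supports the value −1 obtained from the formula for dy/dx.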
17.2. Derivatives of Inverse Trigonometric Functions. One place where
implicit diﬀerentiation is a very powerful tool is in the computation of
the derivatives of inverse trigonometric functions. Recall that
$y=\tan^{-1}x$ is the function that is the inverse of the restriction of
the function tan x to the interval (−π/2, π/2). That is, if
$$\tan^{-1}(x)=y,$$
then
$$x=\tan y, \tag{3.13}$$
where y ∈ (−π/2, π/2).
Our goal is to determine
$$\frac{d}{dx}\tan^{-1}(x)=\frac{dy}{dx}.$$
To that end, let us take the derivative of both sides of (3.13) with
respect to x. Recalling that
$$\frac{d}{dz}\tan z=\sec^2 z\quad\text{and}\quad y=y(x),$$
we get
$$1=\sec^2 y\cdot\frac{dy}{dx}.$$
Solving for dy/dx and recalling the identity $\sec^2 z=1+\tan^2 z$, we
obtain
$$\frac{dy}{dx}=\frac{1}{\sec^2 y}=\frac{1}{1+\tan^2 y}=\frac{1}{1+x^2}.$$
In other words, we proved the formula
$$(\tan^{-1}x)'=\frac{1}{1+x^2}.$$
This formula is interesting for two reasons. First, it is surprisingly
simple. Second, it does not even contain trigonometric functions. Imagine
trying to get this result without implicit differentiation, using just the
definition of derivatives.
You will be asked to compute the derivatives of the other inverse
trigonometric functions in the exercises.
17.3. Exercises.
(1) Let C be the circle given by the equation $x^2+y^2=169$. Use
implicit differentiation to find the slope of the tangent line to
C at the point (5, 12).
(2) Prove that $(\sin^{-1}x)'=\dfrac{1}{\sqrt{1-x^2}}$.
(3) Prove that $(\cos^{-1}x)'=-\dfrac{1}{\sqrt{1-x^2}}$.
(4) Prove that $(\cot^{-1}x)'=-\dfrac{1}{1+x^2}$.
(5) Prove that $(\sec^{-1}x)'=\dfrac{1}{x\sqrt{x^2-1}}$.
(6) Prove that $(\csc^{-1}x)'=-\dfrac{1}{x\sqrt{x^2-1}}$.
18. Derivatives of Logarithmic Functions
18.1. The Formula for $(\log_a x)'$. As another powerful application of
implicit differentiation, we compute the derivative of the function
f(x) = ln x.
Theorem 21. We have
$$(\ln x)'=\frac{1}{x}.$$
Proof. Set y = ln x. Then $e^y=x$. Differentiating both sides with
respect to x, we get
$$e^y\cdot\frac{dy}{dx}=1,\qquad \frac{dy}{dx}=\frac{1}{e^y}.$$
However, $e^y=x$ by definition, so
$$\frac{dy}{dx}=\frac{1}{x}$$
as claimed. □
It is now a breeze to determine the derivative of logarithmic
functions of any base.
Corollary 3. Let a ≠ 1 be a fixed positive real number. Then
$$(\log_a x)'=\frac{1}{x\ln a}.$$
Proof. Note that
$$x=\left(e^{\ln a}\right)^{\log_a x}=e^{(\ln a)(\log_a x)}.$$
So $\ln x=(\ln a)\cdot(\log_a x)$ and
$$f(x)=\log_a x=\frac{\ln x}{\ln a}.$$
As ln a is a constant, it follows that
$$f'(x)=\frac{1}{\ln a}\,(\ln x)'=\frac{1}{x\ln a}$$
as claimed. □
18.2. The Chain Rule and ln x. An interesting consequence of Theorem
21 is the following.
Corollary 4. Let f(x) be a differentiable function that takes
positive values only. Then
$$\frac{d}{dx}\ln f(x)=\frac{f'(x)}{f(x)}.$$
Proof. By the chain rule,
$$\frac{d}{dx}\ln f=\frac{d\ln f}{df}\cdot\frac{df}{dx}=\frac{1}{f(x)}\cdot f'(x)=\frac{f'(x)}{f(x)}. \qquad\square$$
Example 32. Let f(x) = cos x. Then, on any interval where cos x > 0,
$$\frac{d}{dx}\ln(\cos x)=\frac{-\sin x}{\cos x}=-\tan x.$$
18.3. Logarithmic Differentiation. Sometimes we need to compute the
derivative of a complicated product. This is often easier if we first take
the logarithm of the product, which turns it into a sum, and then use
implicit differentiation. This procedure, which is called logarithmic
differentiation, has the inherent advantage that it deals with sums instead
of products, and sums are much easier to differentiate than products.
Example 33. Let
$$y=\frac{x^3\sqrt{x+1}}{\sqrt{x-2}}.$$
Compute dy/dx.
Solution: Taking logarithms, we get
$$\ln y=3\ln x+\frac12\ln(x+1)-\frac12\ln(x-2).$$
Now taking derivatives with respect to x and using Corollary 4, we
have
$$\frac{dy}{dx}\cdot\frac{1}{y}=\frac{3}{x}+\frac{1}{2(x+1)}-\frac{1}{2(x-2)}.$$
Finally, we can solve this equation for dy/dx to get
$$\frac{dy}{dx}=y\left(\frac{3}{x}+\frac{1}{2(x+1)}-\frac{1}{2(x-2)}\right)
=\frac{x^3\sqrt{x+1}}{\sqrt{x-2}}\cdot\left(\frac{3}{x}+\frac{1}{2(x+1)}-\frac{1}{2(x-2)}\right).$$
□
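The result of Example 33 can be sanity-checked numerically. This is a minimal sketch under stated assumptions: the sample point x = 3 and the step h are arbitrary choices, and the function names are hypothetical. It compares the formula obtained by logarithmic differentiation with a central difference quotient.

```python
import math

def y(x):
    """The function of Example 33, defined for x > 2."""
    return x**3 * math.sqrt(x + 1) / math.sqrt(x - 2)

def dy_dx(x):
    """Derivative from logarithmic differentiation: y times (ln y)'."""
    return y(x) * (3 / x + 1 / (2 * (x + 1)) - 1 / (2 * (x - 2)))

x0, h = 3.0, 1e-6
central = (y(x0 + h) - y(x0 - h)) / (2 * h)  # numerical estimate of y'(3)
print(dy_dx(x0), central)
```

At x = 3 the formula gives 54 · (1 + 1/8 − 1/2) = 33.75, and the difference quotient should agree to several digits.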
18.4. Power Functions Revisited. Recall that in an earlier section, we
proved that if n is a fixed positive integer, then $(x^n)'=nx^{n-1}$. We
stated that this was the case for all real numbers n, not just positive
integers, but we have not proved that claim. Now we have the tools,
namely logarithmic differentiation, to prove it.
Theorem 22. Let n be any real number. Then we have
$$\frac{d}{dx}x^n=nx^{n-1}.$$
Proof. Set $y=x^n$. Let us assume for the sake of simplicity that
x is positive. Taking logarithms, we have
$$\ln y=n\ln x.$$
Differentiating both sides with respect to x, we get
$$\frac{dy}{dx}\cdot\frac{1}{y}=\frac{n}{x}.$$
Solving for dy/dx yields
$$\frac{dy}{dx}=\frac{ny}{x}=\frac{nx^n}{x}=nx^{n-1}$$
as claimed. □
18.5. The Number e Revisited. Recall that we have defined the number
e, the base of the natural logarithm, as the number for which
$\lim_{h\to 0}(e^h-1)/h=1$. Our new knowledge lets us express e more
directly, as a limit.
Note that if f(x) = ln x, then $f'(x)=1/x$, so $f'(1)=1$. By the
definition of derivatives, this means that
$$\lim_{h\to 0}\frac{\ln(1+h)-\ln 1}{h}=1.$$
Observing that ln 1 = 0 and using the power rule of logarithms, we get
$$\lim_{h\to 0}\ln(1+h)^{1/h}=1,$$
or, applying the exponential function $e^z$ to both sides, we have
$$\lim_{h\to 0}(1+h)^{1/h}=e.$$
Equivalently, setting x = 1/h, we get
$$\lim_{x\to\infty}\left(1+\frac{1}{x}\right)^{x}=e.$$
Either of the last two formulas can help to determine the approximate
value 2.71828 of e.
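To see how quickly the last limit approaches e, one can simply evaluate the expression for growing x. The sketch below is illustrative only; the function name e_estimate and the sample values of x are assumptions.

```python
import math

def e_estimate(x):
    """Value of (1 + 1/x)**x, which tends to e as x grows."""
    return (1 + 1 / x) ** x

for x in (10, 1_000, 100_000):
    # Print x, the estimate, and its distance from math.e.
    print(x, e_estimate(x), math.e - e_estimate(x))
```

The convergence is slow: the error shrinks roughly in proportion to 1/x, so this formula is better suited to understanding e than to computing many of its digits.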
18.6. Exercises.
(1) Compute $\dfrac{d}{dx}\ln\left(\sqrt{x+1}\right)$.
(2) Compute $(\ln x)'$.
(3) Compute $(x^x)'$.
(4) Compute $f'(x)$ if $f(x)=x^4\sqrt[3]{\dfrac{x+4}{x+1}}$.
(5) Compute $\lim_{x\to\infty}\left(1+\dfrac{1}{x}\right)^{2x}$.
19. Applications of Rates of Change
In this section, we consider a few applications of derivatives in
various disciplines.
19.1. Physics. Recall that if an object moves along a line and the
distance it covers in time t is described by the function s(t), then
$$v(t)=\frac{ds}{dt}=s'(t)=\lim_{h\to 0}\frac{s(t+h)-s(t)}{h} \tag{3.14}$$
is the instantaneous velocity of the object at time t.
We can take this concept one step further. If the object moves at
a changing velocity, then the rate of change of the velocity itself can
be important information. For instance, when considering a vehicle’s
performance, we may be interested in how fast it can reach its top
speed, not only what its top speed is.
The corresponding notion in physics is called acceleration, and is
denoted by a(t). That is, keeping the previous notation, we have
$$a(t)=v'(t)=\frac{dv}{dt}=s''(t). \tag{3.15}$$
Example 34. The position of a particle is described by the equation
$$s(t)=\frac13t^3-3t^2+5t. \tag{3.16}$$
Here s is measured in meters and t in seconds.
(I) What is the velocity of the particle after 3 seconds?
(II) Find the acceleration of the particle after 10 seconds.
(III) When does the particle move backward?
Additional questions about the movement of this particle will be
given in the exercises.
Solution:
(I) The velocity of the particle is described by the function
$v(t)=s'(t)=t^2-6t+5$. This yields v(3) = 9 − 18 + 5 = −4. So
the velocity of the particle after 3 seconds is −4 m/s, meaning
that the particle is moving backward at a speed of 4 meters
per second after 3 seconds.
(II) The acceleration of the particle is given by the formula
$a(t)=v'(t)=2t-6$. So, after 10 seconds, the particle is
accelerating at 14 m/s².
(III) The particle is moving backward when its velocity v(t) is
negative. That happens when $v(t)=t^2-6t+5=(t-1)(t-5)<0$,
that is, when t ∈ (1, 5). In other words, the particle is moving
backward between the first and fifth seconds. □
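The three answers of Example 34 can be reproduced with a few lines of code. This is a sketch under stated assumptions: the derivatives are entered by hand rather than computed symbolically, and the sampling grid of 0.1 s on [0, 8] is an arbitrary illustrative choice.

```python
def s(t):
    return t**3 / 3 - 3 * t**2 + 5 * t   # position in meters

def v(t):
    return t**2 - 6 * t + 5              # velocity s'(t) in m/s

def a(t):
    return 2 * t - 6                     # acceleration v'(t) in m/s^2

print(v(3), a(10))
# Sample times (in steps of 0.1 s) at which the particle moves backward.
backward = [t / 10 for t in range(0, 81) if v(t / 10) < 0]
print(backward[0], backward[-1])
```

The first and last sampled times with negative velocity lie just inside the interval (1, 5), in agreement with part (III).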
19.2. Economics. Let us say that a company estimates that it costs
C(x) dollars to produce x units of a new product. It is often the case
that C(x), which is called the cost function, can be described by a
polynomial function, such as
$$C(x)=a+bx+cx^2+dx^3.$$
The reason for this is as follows. There will be some costs, such as
designing the product and obtaining permits, that will be present
regardless of the number of units produced. These will be represented
by the constant term a. Then there will be costs, such as renting a
location and buying supplies, that will be more or less in direct
proportion to the number of units produced. These will be represented
by the linear term bx. Then there will be other factors, such as hiring
workers, marketing the product, and organizing production, that will
be in direct proportion to a higher power of x as the differences in size
turn into differences in kind. Taxes may factor in at an even higher
rate.
Because the cost function C(x) is not a linear polynomial, producing
the 1001st unit does not cost the same as producing the first unit
or the 5001st unit. The cost of increasing production from n units to
n + 1 units, in other words, the cost of producing the (n + 1)th unit,
can be computed by the formula
$$M(n)=C(n+1)-C(n).$$
The marginal cost function $C'(x)$ describes how the cost function
changes. In that, $C'(x)$ and M(n) are similar. There is one important
difference. As we know, the derivative $C'(x)$ is given by
$$\lim_{\Delta x\to 0}\frac{C(x+\Delta x)-C(x)}{\Delta x}. \tag{3.17}$$
However, it could well be that the smallest meaningful positive value
of Δx is 1, in case the products are such that fractional units do not
make sense (e.g., automobiles). In that case, Δx → 0 is impossible
in its precise mathematical meaning; the closest that Δx can get to
0 is when Δx = 1. In that case, however, the expression after the
limit symbol in (3.17) simplifies to C(x + 1) − C(x), justifying the
approximation
$$M(x)=C(x+1)-C(x)\approx C'(x). \tag{3.18}$$
Example 35. The cost function of a bottle of a new medication is
given by $C(x)=10^6+20x+0.001x^2+0.000001x^3$. Find the approximate
cost of producing the 101st and the 1001st bottles.
Solution: By the preceding discussion, we need to compute the
function $C'(x)$. By the rules of differentiating a polynomial function, we
get $C'(x)=0.000003x^2+0.002x+20$. So the 101st bottle costs
approximately $C'(100)=0.000003\cdot100^2+0.002\cdot100+20=20.23$ dollars
to produce, while the 1001st bottle costs approximately
$C'(1000)=0.000003\cdot1000^2+0.002\cdot1000+20=25$ dollars to
produce. □
It is important to note that the result of the previous example, that
is, the fact that it costs more to produce the 1001st bottle than the
101st bottle, does not mean that the more bottles are produced, the
more expensive it is to produce the average bottle. This is because
the cost of producing the first bottle is astronomical, since $C(1)>10^6$.
Compared to that, the cost of each of the first thousand, or even the
first ten thousand, bottles is very small, so the production of each of
them will bring the cost of producing the average bottle down. (The cost
of producing the average bottle if n bottles are produced is of course
C(n)/n.)
In the exercises, you are asked to compare these results to the results
obtained by using the formula C(n + 1) −C(n).
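The contrast between the marginal cost and the average cost discussed above can be tabulated directly. The following is a minimal sketch; the function names are hypothetical, and the derivative is entered by hand from the cost function of Example 35.

```python
def cost(x):
    """Cost function C(x) of Example 35, in dollars."""
    return 1e6 + 20 * x + 0.001 * x**2 + 0.000001 * x**3

def marginal_cost(x):
    """Derivative C'(x), the approximate cost of the (x + 1)th bottle."""
    return 0.000003 * x**2 + 0.002 * x + 20

def average_cost(n):
    """Average cost per bottle when n bottles are produced."""
    return cost(n) / n

for n in (100, 1000, 10000):
    print(n, marginal_cost(n), average_cost(n))
```

In this range the marginal cost grows with n while the average cost per bottle keeps falling, because the fixed cost of one million dollars is spread over more bottles.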
19.3. Exercises.
(1) Consider the particle of Example 34. After 6 seconds, how far
from its starting point is that particle? In what direction?
(2) Consider the particle of the previous exercise. Are there any
moments when the particle is not moving?
(3) The location of an object moving vertically is described by the
function $s(t)=t-\dfrac{t^2}{5}$, for t ∈ [0, 5], where time is measured
in seconds and distance is measured in meters. When will the
object have an instantaneous velocity of 0.2 m/s?
(4) Consider the object of the previous exercise. When does it
have the greatest speed going up? When does it have the
greatest speed going down?
(5) Use the formula M(n) = C(n + 1) − C(n) to find the cost of
producing the 101st and 1001st units in Example 35. Compare
your results with the estimates that we found using the
function $C'(x)$.
20. Related Rates
20.1. Preliminaries. An intuitive idea of the notion of related rates
comes from a simple fact of everyday life: If there are two related
quantities that are changing with time, then their rates of change should
also be related. For example, the volume V of water in a pool of area
20 m² is related to the water level h (the pool depth in meters) as
V = 20h. Suppose the water level is low and needs to be increased. A
hose is put into the pool that can pump water at a rate of 0.2 m³/h.
At what rate does the water level increase? The volume and the
water level are both functions of time, V = V(t) and h = h(t). For
every instant of time t, their values are related as V(t) = 20h(t), and
so must be their derivatives or rates of change:
$$V(t)=20h(t)\;\Longrightarrow\;V'(t)=20h'(t). \tag{3.19}$$
Now the question is easy to answer. Since $V'(t)=0.2$ m³/h,
$h'(t)=V'(t)/20=0.01$ m/h = 1 cm/h. The water level rises by 1 cm every
hour. A somewhat practical estimate! You would know exactly when
to come back and turn off the water if you needed an inch or so of
water level increase. Evidently, the same idea of related rates would
work for lowering the water level after rain.
20.2. Units. It is important to bring all the quantities to the same
system of units. For example, in the above problem the pool area is
often given in square feet, for example, 200 ft², while the pumping rate
is given in gallons per hour, for example, V′ = 60 gal/h. One gallon
is 3.785 · 10⁻³ m³ and therefore V′ = 60 · 3.785 · 10⁻³ = 0.2271 m³/h.
One square foot is 9.29 · 10⁻² m², so the pool area is
200 · 9.29 · 10⁻² = 18.58 m². Hence, h′ = 0.2271/18.58 ≈ 0.0122 m/h
≈ 1.2 cm/h. In 1999, NASA lost a 125-million-dollar Mars orbiter because
a Lockheed Martin engineering team used English units of measurement
while the agency’s team used the more conventional metric system for a
key spacecraft operation.
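One way to avoid unit mix-ups of this kind is to convert every input to a single system in one place before doing any calculus. The sketch below is an illustration only; the constant and function names are assumptions, and the conversion factors are the rounded values quoted in the text.

```python
GALLON_IN_M3 = 3.785e-3      # one gallon in cubic meters (value from the text)
SQFT_IN_M2 = 9.29e-2         # one square foot in square meters (value from the text)

def level_rate_cm_per_hour(pump_gal_per_hour, area_sqft):
    """Rate of rise of the water level, h' = V' / A, returned in cm/h."""
    volume_rate = pump_gal_per_hour * GALLON_IN_M3   # m^3/h
    area = area_sqft * SQFT_IN_M2                    # m^2
    return 100 * volume_rate / area                  # m/h converted to cm/h

print(level_rate_cm_per_hour(60, 200))
```

Keeping the conversions inside one function means the relation between the rates is always applied to quantities in meters and hours.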
20.3. Formal Deﬁnition of Related Rates. The very notion of related
quantities can be stated in proper mathematical terms.
Definition 25. Two quantities y and x are said to be related if
there is a function f such that y = f(x).
In the previous example, V = f(h) = 20h. Suppose now that the
quantities y and x are functions of another variable t (e.g., t is time):
x = x(t) and y = y(t). Then the rate of change of x or y with respect to
t is nothing but the derivative $x'(t)$ or $y'(t)$. The problem of “related
rates” can now be cast in proper mathematical terms: What is the
relationship between the derivatives $x'(t)$ and $y'(t)$ if the values of x(t)
and y(t) are related by y = f(x)? The values of the functions x(t) and y(t)
are related as y(t) = f(x(t)) for any t. Taking the derivative of both
sides with respect to t by means of the chain rule (Theorem 19), we
obtain a generalization of (3.19):
$$y(t)=f(x(t))\;\Longrightarrow\;y'(t)=f'(x(t))\,x'(t). \tag{3.20}$$
Equation (3.20) establishes the sought-after relation between the rates
$y'$ and $x'$. However, it seems somewhat different from (3.19): The rates
are still proportional to one another, but the proportionality coefficient
$f'(x)$ is no longer a constant, but a function. How do we use it? Take
a particular value $t=t_0$. Let the values of x and y at $t=t_0$ be
$x_0=x(t_0)$ and $y_0=y(t_0)$. The number $a=f'(x_0)$ can be calculated.
Then the equality $y'(t_0)=a\,x'(t_0)$ determines the relation between the
rates $y'$ and $x'$ at the instant when x has the value $x_0$ (or y has the
value $y_0=f(x_0)$).
Example 36. Let a laser pointer be positioned at a distance D = 1
m from a wall. The pointer can be rotated so that the bright spot created
by the laser beam travels horizontally on the wall.
(I) At what speed does the bright spot travel along the wall if the
pointer revolves at a constant rate ω = π rad/s?
(II) At what direction of the laser beam does the bright spot travel
at the speed v = 4π m/s?
Solution:
(I) The analysis of any problem on related rates must begin with
defining the quantities whose rates are being studied. In other
words, one has to answer the question: How are these quantities
measured? The orientation of the laser beam can be
described by the angle ϕ between the perpendicular to the
wall and the laser beam. The position of the bright spot may
be set by the distance y traveled by it from the point on the
wall when ϕ = 0, that is, when the laser beam is perpendicular
to the wall. If the pointer rotates, the angle becomes a
function of time, ϕ = ϕ(t), and so does the position of the
bright spot, y = y(t). Thus, the question is about the relation
between the rates $y'(t)=v$ (the speed at which the bright spot
travels) and $\varphi'(t)=\omega$ (the rate at which the pointer rotates).
(II) The next step is to find a function that determines the relation
between the quantities of interest, that is, between the distance
y and the angle ϕ: y = f(ϕ). It is clear that D and y are
related as the catheti (legs) of the right triangle whose hypotenuse
is the laser beam: y = D tan ϕ = f(ϕ).
(III) Once the relation between the quantities of interest has been
established, the relation between their rates can be found.
Since $(\tan\varphi)'=1/\cos^2\varphi$, Equation (3.20) yields
$$y=D\tan\varphi\;\Longrightarrow\;y'=\frac{D}{\cos^2\varphi}\,\varphi'. \tag{3.21}$$
The first question is answered by setting $\varphi'=\omega=\pi$ rad/s
and D = 1 m: the bright spot travels at the speed
$v=y'=\pi/\cos^2\varphi$, measured in meters per second.
(IV) Note that the rate $y'=v$ is not constant even if the rate
$\varphi'=\omega$ is constant. To answer the second question, one has to
find the value of ϕ when v = 4π m/s, D = 1 m, and ω = π rad/s.
It follows from Equation (3.21) that
$$\cos^2\varphi=\frac{D\omega}{v}=\frac14\;\Longrightarrow\;\varphi=\frac{\pi}{3};$$
that is, the bright spot moves at the speed 4π m/s when the
laser beam makes an angle of 60° with the perpendicular to the wall.
□
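Both parts of Example 36 amount to evaluating Equation (3.21) in one direction or the other. The following is a minimal sketch; the function names and default arguments are assumptions chosen to match the data of the example (D = 1 m, ω = π rad/s).

```python
import math

def spot_speed(phi, D=1.0, omega=math.pi):
    """Speed y' = D * omega / cos(phi)**2 of the bright spot, in m/s."""
    return D * omega / math.cos(phi) ** 2

def angle_for_speed(v, D=1.0, omega=math.pi):
    """Angle phi in [0, pi/2) at which the spot moves at speed v >= D * omega."""
    return math.acos(math.sqrt(D * omega / v))

print(spot_speed(0.0))                               # speed when the beam is perpendicular
print(math.degrees(angle_for_speed(4 * math.pi)))    # direction for v = 4*pi m/s
```

The slowest motion of the spot occurs at ϕ = 0, where its speed equals Dω, so angle_for_speed is only meaningful for speeds of at least that value.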
20.4. Can Anything Travel Faster Than Light? The solution (3.21) has
an interesting feature. When ϕ approaches 90°, that is, the laser beam
is getting closer to being parallel to the wall, the cosine, cos ϕ, tends to
0 in Equation (3.21), and hence the rate $y'=v$ grows unboundedly. It
seems that with merely a laser pointer, a superluminal object can
be created in a lecture hall! Let us investigate this. The speed of light is
c ≈ 300,000 km/s ≈ 186,000 mi/s. Light can make a trip around
the world in merely 0.13 seconds! Example 36 is now supplemented by
two additional questions:
(I) Is it possible that v can exceed the speed of light? If so, at
which direction of the laser beam does it happen?
(II) At which position of the bright spot does it happen?
The answers read:
(I) Setting D = 1 m = 10⁻³ km (watch the units: all distances
are now in kilometers!) and v = c = 3 · 10⁵ km/s, the angle
at which the bright spot reaches the speed of light satisfies
the equation $\cos^2\varphi=D\omega/c\approx1.05\cdot10^{-8}$, and hence
ϕ ≈ 89.99414°. So the bright spot becomes superluminal if
ϕ > 89.99414°!
(II) Since y = D tan ϕ, v > c if y > 9772 m. Well, a lecture
hall appears to be a “bit” small for this experiment! Take
a Dremel miniature grinder (sold in Lowe’s stores) for which
ω ≈ π · 10³ rad/s (it can be used to rotate the pointer), and set
D = 0.1 m; then v > c if y > 98 m. This is not yet exactly a lecture
hall experiment, but it can be managed on the campus!
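The two thresholds quoted in these answers follow from Equation (3.21) with v = c. Here is a small sketch for recomputing them; the function name is hypothetical, all lengths are kept in meters (so c is entered as 3 · 10⁸ m/s, the same rounded value as in the text), and the inputs are the data of the two setups above.

```python
import math

C_LIGHT = 3.0e8   # speed of light in m/s (rounded, as in the text)

def superluminal_threshold(D, omega):
    """Return (angle in degrees, spot position y in meters) at which v = c."""
    cos_phi = math.sqrt(D * omega / C_LIGHT)   # from cos^2(phi) = D * omega / c
    phi = math.acos(cos_phi)
    return math.degrees(phi), D * math.tan(phi)

print(superluminal_threshold(1.0, math.pi))         # hand-rotated pointer, D = 1 m
print(superluminal_threshold(0.1, math.pi * 1e3))   # grinder-driven pointer, D = 0.1 m
```

Working in one unit system throughout (meters here, kilometers in the text) gives the same angle and position; only the mixing of the two would not.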
Einstein’s theory of relativity states that no material object can
travel faster than light. Has a counterexample to Einstein’s theory just
been found? The answer is “no.” In the motion of the bright spot,
no material object actually moves along the wall. Bright spots at y
and y + Δy are created by diﬀerent portions of the laser beam that
are emitted by the laser at two distinct moments of time. A lump of
light that arrived at y was reﬂected by the wall (that is why we see
the bright spot!), and hence it could not appear at the next position
y +Δy (at this position arrived a diﬀerent lump of light emitted by the
laser at a later time). So the rate Δy/Δt cannot possibly be associated
with the motion of any material object.
20.5. Related Problem. The next time you watch a Florida sunset, look
at your shadow. Does there exist a position of the Sun above the
horizon at which your shadow extends faster than the speed of light?
20.6. More Than Two Related Rates. There are situations when several
quantities are related among themselves. If these quantities become
functions of a variable t, then their rates are linearly related. A proof of
this statement is given in a more advanced course, where the functions
of several variables are studied. However, the basic idea of ﬁnding
relations between the rates has not changed: They are obtained by
diﬀerentiating the relations between the quantities in question with
respect to t. The procedure is illustrated in the following example.
Example 37. Consider a rectangle with sides x and y. Suppose that
x and y change with time. Find their rates of change when x = 3 cm
and y = 1 cm if, at that moment, the area of the rectangle decreases at
the rate 2 cm²/s while the perimeter does not change.
Solution:
(I) There are four quantities involved: the rectangle dimensions x
and y, the area S, and the perimeter P.
(II) There are two relations between them:
$$S=xy,\qquad P=2(x+y).$$
(III) If x = x(t) and y = y(t), then S(t) = x(t)y(t) and P(t) =
2(x(t) + y(t)). Using the rules for the derivative of a product and
of a sum of two functions, the linear relations between the rates are
$$S'=x'y+xy',\qquad P'=2(x'+y').$$
(IV) Since $P'=0$ (the perimeter does not change), $x'=-y'$ and
$S'=(x-y)\,y'$. Now let $S'=-2$ cm²/s because S decreases
($S'$ must be negative). With x = 3 cm and y = 1 cm, one
has $-2=(3-1)\,y'$ and $y'=-1$ cm/s. It then follows that
$x'=-y'=1$ cm/s.
□
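The elimination carried out in step (IV) works for any given rates of the area and the perimeter, not only for P' = 0. The sketch below is a hedged generalization: the function name and argument order are assumptions, and the formula requires x ≠ y.

```python
def side_rates(x, y, area_rate, perimeter_rate):
    """Solve S' = x'*y + x*y' and P' = 2*(x' + y') for (x', y')."""
    total = perimeter_rate / 2                   # this is x' + y'
    # Substituting x' = total - y' into S' = x'*y + x*y' gives
    # S' = total*y + (x - y)*y'.
    y_rate = (area_rate - total * y) / (x - y)   # requires x != y
    x_rate = total - y_rate
    return x_rate, y_rate

print(side_rates(3, 1, -2, 0))   # data of Example 37
```

For the data of Example 37 this reproduces x' = 1 cm/s and y' = −1 cm/s.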
21. Linear Approximations and Differentials
21.1. Tangent Line Approximation. The derivative of a function f(x)
at a point $x=x_0$ defines the slope of the line tangent to the graph
y = f(x) at the point $(x_0,f(x_0))$ (see Equation (2.5)). The equation
of the tangent line is
$$\frac{y-f(x_0)}{x-x_0}=f'(x_0)\quad\text{or}\quad y=f(x_0)+f'(x_0)(x-x_0).$$
Definition 26. Suppose f(x) is differentiable at $x=x_0$. The
linear function
$$L(x)=f(x_0)+f'(x_0)(x-x_0) \tag{3.22}$$
is called the linearization of f(x) in a neighborhood of $x_0$.
Since the values of f and L coincide at $x=x_0$, one might expect
that the difference f(x) − L(x) is small, provided x is close enough to
$x_0$. So the linear function L(x) may be used to approximate f(x) in a
small neighborhood of $x_0$, that is, f(x) ≈ L(x). This approximation is
called the linear approximation or tangent line approximation.
Example 38. Use the linear approximation to estimate the value
$\sqrt{3.92}$.
Solution:
(I) Consider $f(x)=\sqrt{x}$. The closest value of x to 3.92 at which
the square root can be evaluated without a calculator is $x_0=4$:
$f(x_0)=2$. Note the two important steps here: the choice of
f(x) suitable for the problem and the choice of $x_0$ near which
the linear approximation is to be used.
(II) Since $f'(x)=(\sqrt{x})'=1/(2\sqrt{x})$ and $f'(4)=1/4$, by Equation
(3.22) the linearization of $\sqrt{x}$ near x = 4 is
$$L(x)=2+\frac14(x-4).$$
(III) The linear approximation means that the value
$f(3.92)=\sqrt{3.92}$ is approximated by the value L(3.92):
$$\sqrt{3.92}\approx L(3.92)=2+\frac14(3.92-4)=1.98.$$
□
A calculator gives $\sqrt{3.92}\approx1.97990$. So the approximation error is
$|\sqrt{3.92}-L(3.92)|<1.02\cdot10^{-4}$. It is easy to see that L(4.08) = 2.02 and
$|\sqrt{4.08}-L(4.08)|<1.02\cdot10^{-4}$. This observation can be summarized
by the following inequality:
$$|\sqrt{x}-L(x)|<1.02\cdot10^{-4}\quad\text{if}\quad|x-4|\le0.08.$$
In other words, the values of $\sqrt{x}$ and its linearization differ by less
than $1.02\cdot10^{-4}$ for all 3.92 ≤ x ≤ 4.08. Naturally, a decrease (increase)
in the upper bound for the error would lead to a decrease (increase) in
the size of a neighborhood of x = 4 where the linear approximation is
accurate.
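The numbers in this discussion are easy to reproduce. The following is a minimal sketch of Definition 26 in code; the helper name linearization is an assumption, and the derivative is supplied by hand.

```python
import math

def linearization(f, f_prime, x0):
    """Return the function L(x) = f(x0) + f'(x0) * (x - x0)."""
    return lambda x: f(x0) + f_prime(x0) * (x - x0)

# Linearization of sqrt(x) near x0 = 4, as in Example 38.
L = linearization(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 4.0)
for x in (3.92, 4.0, 4.08):
    # Columns: x, L(x), sqrt(x), and the approximation error.
    print(x, L(x), math.sqrt(x), abs(math.sqrt(x) - L(x)))
```

The error column vanishes at x = 4 and stays below 1.02 · 10⁻⁴ at both ends of the interval 3.92 ≤ x ≤ 4.08.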
21.2. Accuracy of the Linear Approximation. The previous example leads
to a problem that is extremely important in applications: Given an
upper bound ε for the error of the linear approximation of a function f(x)
near $x_0$, find δ such that
$$|f(x)-L(x)|\le\varepsilon\quad\text{if}\quad|x-x_0|\le\delta,$$
or, alternatively, given δ, i.e., the neighborhood $x_0-\delta\le x\le x_0+\delta$,
estimate the error ε of the linear approximation. The following theorem
is useful to answer these questions.
Theorem 23. Suppose a function f(x) is twice differentiable in
(a, b) and $|f''(x)|\le M$ for all x ∈ (a, b) and some number M.
Let L(x) be the linearization of f(x) at $x_0\in(a,b)$. Then
$$|f(x)-L(x)|\le\frac12M(x-x_0)^2,\qquad x\in(a,b).$$
This theorem is a simpler version of the Taylor theorem, which is
proved in advanced calculus courses. The following example illustrates
the use of this theorem to assess the accuracy of the linear
approximation.
Example 39. Consider the linearization of sin x at x = 0. Find
an interval |x| ≤ δ in which the error of the linear approximation does
not exceed $\varepsilon=0.5\cdot10^{-3}$.
Solution:
(I) Since $f'(x)=(\sin x)'=\cos x$, $f'(0)=1$, and f(0) = 0, the
linearization is L(x) = x.
(II) In Theorem 23, let a = −δ and b = δ. Next, one has to
find M. The simplest way to do this is to take the maximal
value of $|f''(x)|$ in the interval |x| ≤ δ. Note that δ must be
less than π/2 because L(π/2) − sin(π/2) = π/2 − 1 exceeds
the given error ε. So sin x is monotonic in |x| ≤ δ, and hence
$|(\sin x)''|=|\sin x|\le\sin\delta=M$ for all |x| ≤ δ. By Theorem 23,
$$|\sin x-x|\le\frac12Mx^2\le\frac12M\delta^2=\varepsilon\quad\text{if}\quad|x|\le\delta. \tag{3.23}$$
With M = sin δ, the solution of the equation
$\delta^2\sin\delta=2\varepsilon=10^{-3}$ determines δ. An analytic solution of this
equation is impossible. So a value of δ has to be found numerically
(actually, δ ≈ 0.100056).
(III) Otherwise, one can choose a larger M, for example, |sin x| ≤
1 for any x. So M = 1 is acceptable, too. This simplifies
Equation (3.23): $\delta^2=10^{-3}$ and hence δ ≈ 0.0316. This value
of δ is smaller than that in the case M = sin δ. It
follows from Equation (3.23) that a larger value of M leads to
a smaller δ. So this option should not be “abused.” A good
M is not too large and yet is simple enough to solve Equation
(3.23). Finding one requires some skill.
(IV) A good compromise is to use the inequality sin δ ≤ δ. So
the choice M = δ also fulfills the conditions of Theorem 23.
Equation (3.23) becomes $\delta^3=10^{-3}$ and δ = 0.1, which is to
be compared with δ ≈ 0.0316 when M = 1 and δ ≈ 0.100056
when M = sin δ. □
The converse problem is simpler: Find an upper bound for the error
of the linear approximation of sin x at x = 0 in the interval |x| ≤ 0.2.
One has $|\sin x-x|\le\varepsilon=\frac12M\delta^2=0.5\cdot\sin(0.2)\cdot(0.2)^2\approx3.9734\cdot10^{-3}$.
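The three choices of M in Example 39 lead to three equations for δ, one of which has to be solved numerically. The sketch below uses bisection for that case; this particular method, the bracket [0, 1], and the function name are assumptions made for illustration, since the text does not say how its numerical value was obtained.

```python
import math

def solve_delta(eps, lo=0.0, hi=1.0, iterations=60):
    """Bisection for delta**2 * sin(delta) = 2 * eps on [lo, hi].

    The left-hand side is increasing on [0, 1], so the root is unique there.
    """
    target = 2 * eps
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid**2 * math.sin(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

eps = 0.5e-3
print(solve_delta(eps))          # choice M = sin(delta)
print(math.sqrt(2 * eps))        # choice M = 1
print((2 * eps) ** (1 / 3))      # choice M = delta
```

The cruder bound M = 1 gives a noticeably smaller interval, while M = δ and M = sin δ give almost the same δ, which is why M = δ is a good compromise.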
21.3. Differential. For a real variable x, the differential dx is defined as
an increment of x. It can be given the value of any real number
independently of the value of x; that is, dx is considered as an independent
variable. So, with every real variable, one can associate another real
variable, called the differential. If two real variables are related, the
following rule postulates the relation between their differentials.
Definition 27. Let two variables y and x be related as y = f(x),
where f is a differentiable function. The differential dy = df(x) is
defined by the linear transformation of dx:
$$dy=df(x)=f'(x)\,dx. \tag{3.24}$$
Note that the variables x and dx on the right-hand side are independent
variables. Equation (3.24) states that, if the variables y and x are
related, then the differential dy is no longer an independent variable
and is determined by x and dx; specifically, dy depends linearly on dx.
21.4. Geometrical Significance of the Differential. Put dx = Δx, where
Δx is a real number. Consider an increment of the variable y between
x and x + Δx: Δy = f(x + Δx) − f(x). For any x, the differential
of a function $df(x)=dy=f'(x)\,\Delta x$ does not generally coincide with
its increment Δy. For example, put $f(x)=x^2$, x = 1, Δx = 0.2; then
$\Delta y=(1+0.2)^2-1=0.44$, whereas $dy=f'(1)\,\Delta x=2\cdot0.2=0.4$.
Since the derivative $f'(x)$ determines the slope of the tangent line to
the graph y = f(x), the differential is the increment of y along the
tangent line through the point (x, f(x)) in the interval [x, x + Δx].
Thus, dy ≠ Δy in general because the tangent line does not generally
coincide with the graph.
An intuitive understanding of the differential stems from its
geometrical interpretation. Let Δx tend to 0. The ratio
$$\frac{\Delta y-dy}{\Delta x}=\frac{\Delta y}{\Delta x}-f'(x)$$
tends to 0 by the existence of $f'(x)$. This means that the difference
Δy − dy must go to 0 faster than Δx. An increment Δx is said to
be infinitesimally small if $(\Delta x)^n$, n > 1, can always be neglected. So
one might think of differentials as infinitesimal variations of variables.
From this point of view, the definition (3.24) looks rather natural:
Infinitesimal variations of two related variables must be related linearly
as their higher powers can always be neglected. Thus, the concept of
the differential becomes rather practical: to establish relations between
variations of related quantities in situations when these variations may
be viewed as infinitesimal.
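The difference between the increment Δy and the differential dy, and the way it disappears relative to Δx, can be tabulated for the function x² at x = 1 used in this section. This is a small illustrative sketch; the helper name and the list of increments are assumptions.

```python
def increment_and_differential(f, f_prime, x, dx):
    """Return (delta_y, dy) for y = f(x) and the increment dx."""
    delta_y = f(x + dx) - f(x)   # increment along the graph
    dy = f_prime(x) * dx         # increment along the tangent line
    return delta_y, dy

square = lambda x: x * x
for dx in (0.2, 0.02, 0.002):
    delta_y, dy = increment_and_differential(square, lambda x: 2 * x, 1.0, dx)
    # The last column is (delta_y - dy) / dx, which tends to 0 with dx.
    print(dx, delta_y, dy, (delta_y - dy) / dx)
```

For this particular function the last column equals Δx itself (up to rounding), since Δy − dy = (Δx)².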
21.5. Related Errors. Every physical quantity is known only with a
certain degree of accuracy. Errors are inherent in the very process of
taking measurements. As a matter of fact, a value of a physical
quantity given without its measurement error does not make much sense;
neither should one draw any conclusion from data without a proper
analysis of the errors. One of the important practical applications of
the differential lies in the error analysis of related quantities.
Suppose there is a relation between two quantities y and x, y =
f(x). Let x be measured with an error: x ± Δx; that is, Δx is known.
Using the measured values of x, the values of y are computed by
y = f(x). What is the accuracy of the values of y? Evidently, x
and Δx are independent variables, as the error Δx depends on the way
in which the variable is measured (there are more and less accurate
methods, which lead to smaller and larger values of Δx
independently of the actual value of x). Naturally, one might assume that the
errors are small; that is, they are infinitesimal variations of measured
quantities. Then the errors of the related quantities must be related as
their differentials! That is, y ± dy, where $dy=f'(x)\,dx$ and dx = Δx.
The differential dy represents an absolute error of y. The quantity dy/y
is called a relative error.
Example 40. What are the absolute and relative errors of the
volume of a cube if its side is 10 ± 0.1 cm?
A typical ruler has a grid spacing of 1 mm, which determines the
length measurement error of 0.1 cm.
Solution: The volume V and side x are related as $V=x^3$. So
$dV=3x^2\,dx$. Setting dx = 0.1 cm and x = 10 cm, dV = 30 cm³
and V = 1000 ± 30 cm³. The relative error is dV/V = 0.03 or 3% (note
that dx/x = 0.01, i.e., only 1%). □
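The same recipe applies to any relation y = f(x) with a known derivative. The sketch below is one possible way to package it; the function name is hypothetical, and taking absolute values (so that the reported errors are nonnegative) is an assumption of this sketch rather than something stated in the text.

```python
def propagated_error(f, f_prime, x, dx):
    """Return (value, absolute error, relative error) for y = f(x)."""
    y = f(x)
    dy = abs(f_prime(x)) * dx    # differential dy = f'(x) dx, taken as a magnitude
    return y, dy, dy / abs(y)

# Cube of Example 40: side 10 cm measured with an error of 0.1 cm.
volume, dV, rel = propagated_error(lambda x: x**3, lambda x: 3 * x**2, 10.0, 0.1)
print(volume, dV, rel)
```

For a power law $y=x^n$ the relative error of y is n times the relative error of x, which is why 1% in the side becomes 3% in the volume.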
The error analysis for several related quantities is given in a more
advanced course; it is based on the diﬀerential of a function of several
variables.
CHAPTER 4
Applications of Differentiation
22. Minimum and Maximum Values
Some of the most important applications of calculus are optimization problems. An example of an ancient optimization problem: A man can throw a stone at a speed of $v_0$. At what angle should the stone be thrown in order to get the maximal range? An example of a modern optimization problem: How can one optimize the information flow in the World Wide Web to avoid crashes of servers? Many of these problems can be reduced to finding the maximal and minimal values of a given function.
Definition 28 (Absolute Maximum and Minimum). A function f
has an absolute maximum at c if f(x) ≤ f(c) for all x in the domain
D of f. Similarly, the value f(c) is called the maximum value of f. A
function f has an absolute minimum at c if f(x) ≥ f(c) for all x in
the domain D of f. The value f(c) is called the minimum value of f.
The maximum and minimum values of f are called the extreme values
of f.
For example, the function f(x) = cos x attains its maximum value 1 at x = 2πn, where n = 0, ±1, ±2, ..., and its minimum value −1 at x = π + 2πn. A function does not always have a maximum or minimum. For instance, the function f(x) = 1/x defined for all real x ≠ 0 has neither a maximum nor a minimum because, for any real M > 0, one can always find x such that f(x) > M (namely, any 0 < x < 1/M). So no real number can be the maximum of f(x). Similarly, for any real M > 0, f(x) < −M if −1/M < x < 0; that is, no minimum exists. The function $f(x) = x^2$ has no maximum on the real axis, but it does have an absolute minimum at x = 0 because $x^2 \ge 0$ for all x and f(0) = 0, that is, f(x) ≥ f(0).
Definition 29 (Local Maximum and Minimum). A function f
has a local (or relative) maximum at c if f(x) ≤ f(c) for all x in
some open interval containing c. Similarly, a function f has a local (or
relative) minimum at c if f(x) ≥ f(c) for all x in some open interval
containing c.
Example 41. Does the function $f(x) = x^3 - x = x(x^2 - 1)$ have an absolute maximal (minimal) value and relative maxima (minima) on the real axis?
Solution:
(I) The function has neither an absolute maximum nor an absolute minimum because it grows unboundedly with increasing x and it decreases unboundedly as x attains larger negative values.
(II) The function vanishes at three points x = 0, ±1. It can have relative minima and maxima between its zeros because the values of f are bounded from above and below there: $|f(x)| \le |x|^3 + |x| \le 2$ for $|x| \le 1$, that is, −2 ≤ f(x) ≤ 2 if −1 ≤ x ≤ 1.
(III) Consider the open interval x ∈ (0, 1). The function is strictly negative in it and bounded from below: M < f(x) < 0 for all x ∈ (0, 1) (e.g., M = −3). By increasing M, one can eventually reach the situation when there is 0 < c < 1 such that M = f(c) ≤ f(x) for all 0 < x < 1. This happens when the horizontal line y = M touches the graph y = f(x). Thus, f must have a relative minimum in (0, 1).
Remark. The actual value is $c = 1/\sqrt{3}$. How is it obtained? There is a technique to find c, which will be studied shortly.
(IV) Similarly, f is strictly positive in (−1, 0) and bounded from above: 0 < f(x) < M for some M. By lowering the horizontal line y = M (that is, decreasing M) to the point when it touches the graph y = f(x), one can find a point c ∈ (−1, 0) such that f(x) ≤ f(c) for all x ∈ (−1, 0); that is, f has a relative maximum in (−1, 0).
Remark. The actual value is $c = -1/\sqrt{3}$ (see below). □
One of the lessons that can be learned from this example is that one
can think of a relative minimum (maximum) as an absolute minimum
(maximum) when f is restricted to a suﬃciently small subset in its
domain. This observation is accurately stated by the following theorem.
Theorem 24 (The Extreme Value Theorem). If f is a continuous function on a closed interval [a, b], then f attains its absolute maximum and minimum values in [a, b]; that is, there exist $c_1$ and $c_2$ in [a, b] such that $f(c_1) \le f(x) \le f(c_2)$ for all x in [a, b].
The continuity hypothesis is essential. In fact, the continuity of $f(x) = x^3 - x$ was implicitly used in Example 41 to establish the existence of its relative maximum and minimum! The following example illustrates the point. Consider the function f(x) = 2x if x ∈ [0, 1) and f(x) = 1 if x ∈ [1, 2]. So the function is defined on the closed interval [0, 2] and bounded from above, f(x) < M (e.g., M > 2). An attempt to establish the existence of a maximal value of f by lowering M fails! Indeed, the lowest upper bound is M = 2, but there is no c such that f(c) = 2. The values of f approach 2 as x approaches 1 from the left, but f(1) = 1! For any ε > 0, f(1 − ε) < f(x) for x ∈ (1 − ε, 1), no matter how small ε is. Thus, f does not have an absolute maximum value because of its discontinuity at x = 1. The absolute minimum exists: f(0) ≤ f(x). Note that the function f(x) = 2x when x ∈ [0, 1] and f(x) = 1 when x ∈ (1, 2] has an absolute maximum and minimum, f(0) ≤ f(x) ≤ f(1), despite its discontinuity at x = 1. So the continuity hypothesis is a sufficient condition, but not a necessary one.
The hypothesis of the closedness of the interval is also a sufficient condition, but not a necessary one. The continuous function f(x) = x does not attain its absolute maximum or minimum value on any open interval (a, b). But it does so if the interval becomes closed: f(a) ≤ f(x) ≤ f(b) for any x ∈ [a, b]. On the other hand, the continuous function $f(x) = x^3 - x$ on the open interval (−1, 1) attains its absolute maximum and minimum values (see Example 41).
The second observation resulting from Example 41 is that at the point where a continuous function attains its local minimum or maximum value there is a horizontal line that touches the graph of this function. So, if the function is differentiable, then this horizontal line is a tangent line with vanishing slope; that is, the derivative of the function vanishes at points where the function attains its local maximum or minimum value.
Theorem 25 (Fermat's Theorem). If f has a local maximum or minimum at c, and if $f'(c)$ exists, then $f'(c) = 0$.
Proof. By the existence of $f'(c)$,
$$\lim_{h\to 0}\frac{f(c+h) - f(c)}{h} = f'(c).$$
Therefore, the right-hand and left-hand limits must coincide with $f'(c)$ (see Section 2):
$$\lim_{h\to 0^-}\frac{f(c+h) - f(c)}{h} = f'(c) = \lim_{h\to 0^+}\frac{f(c+h) - f(c)}{h}. \tag{4.1}$$
Let f have a local maximum (the case of a local minimum can be treated similarly). Then f(c) ≥ f(x), or f(x) − f(c) ≤ 0, in some open interval a < x < b containing c. In particular, [f(c + h) − f(c)]/h ≤ 0 for any positive h > 0 such that c < c + h < b. By Theorem 3,
$$\frac{f(c+h) - f(c)}{h} \le 0 \;\Rightarrow\; \lim_{h\to 0^+}\frac{f(c+h) - f(c)}{h} \le 0. \tag{4.2}$$
Similarly, for any negative h < 0 such that a < c + h < c, one has f(c + h) − f(c) ≤ 0 and [f(c + h) − f(c)]/h ≥ 0. Hence,
$$0 \le \frac{f(c+h) - f(c)}{h} \;\Rightarrow\; 0 \le \lim_{h\to 0^-}\frac{f(c+h) - f(c)}{h}. \tag{4.3}$$
By inequalities (4.2) and (4.3), it follows from (4.1) that
$$0 \le f'(c) \le 0,$$
which is only possible if $f'(c) = 0$. □
This theorem provides a powerful tool to determine the actual positions of local maxima and minima. Let us go back to Example 41 ($f(x) = x^3 - x$). The slope $f'(x) = 3x^2 - 1$ vanishes at two points $x = \pm 1/\sqrt{3}$. According to the analysis carried out in Example 41, f has a local maximum at $x = -1/\sqrt{3} \in (-1, 0)$ and a local minimum at $x = 1/\sqrt{3} \in (0, 1)$.
Definition 30. A number c in the domain of a function f is said to be a critical point of f if either $f'(c) = 0$ or $f'(c)$ does not exist.
Does the equation $f'(x) = 0$ determine all local maxima and minima of f?
(I) A function may have a local minimum or maximum at a point where the derivative does not exist. A simple example is the function f(x) = |x|. It has an absolute minimum at x = 0, but $f'(x)$ does not exist at x = 0. So this minimum cannot be found from $f'(x) = 0$.
(II) If f is differentiable everywhere, then, by solving $f'(x) = 0$, all local minima and maxima can be found. However, not all the solutions necessarily correspond to a local maximum or a local minimum. The function $f(x) = x^3$ has no minimum or maximum, but its derivative $f'(x) = 3x^2$ vanishes at x = 0. In other words, the converse of Fermat's theorem is false.
(III) If all critical points of a function are found, then their type (local maximum, local minimum, or none of the above) can be analyzed by comparing the values f(c ± h) with f(c), where c is a critical point (cf. Definition 29). If $f''(c)$ exists, then the second derivative test can be used, which is discussed later.
(IV) A function defined on a closed interval [a, b] can have its absolute maximum or minimum at the endpoints. When finding the absolute maximum and minimum values, the values of f at the critical points must be compared with f(a) and f(b). The largest (smallest) of them is the absolute maximum (minimum) value.
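The comparison of critical-point and endpoint values can be sketched in Python as follows. This is a minimal illustration, not a general-purpose optimizer: the helper name `absolute_extrema` and the interval [−1, 2] are assumptions made for the example, and the critical points of $f(x) = x^3 - x$ are supplied by hand as the roots of $3x^2 - 1 = 0$.

```python
import math


def absolute_extrema(f, critical_points, a, b):
    """Compare f at the interior critical points and at the endpoints a, b."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = [(f(x), x) for x in candidates]
    return min(values), max(values)


def f(x):
    return x**3 - x


# Critical points of f: roots of f'(x) = 3x^2 - 1.
critical = [-1 / math.sqrt(3), 1 / math.sqrt(3)]

# The interval [-1, 2] is an arbitrary choice for illustration.
(min_value, min_at), (max_value, max_at) = absolute_extrema(f, critical, -1.0, 2.0)
print(f"absolute minimum {min_value:.4f} at x = {min_at:.4f}")
print(f"absolute maximum {max_value:.4f} at x = {max_at:.4f}")
```

On this interval the absolute minimum is attained at the interior critical point $1/\sqrt{3}$, while the absolute maximum is attained at the endpoint x = 2, which the equation $f'(x) = 0$ alone would miss.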
Example 42. If a stone is thrown at a speed $v_0$ m/s and an angle θ with the horizontal line, then its trajectory is a parabola:
$$y = x\tan\theta - \frac{g\,x^2}{2v_0^2\cos^2\theta}, \tag{4.4}$$
where y is the stone height (vertical position), x is the horizontal position (all the positions are in meters), and g = 9.8 m/s² is a constant universal for all objects near the surface of the Earth (the free-fall acceleration). This is a consequence of Newton's second law. At what angle should one throw a stone to reach the maximal range at a given speed $v_0$?
Solution:
(I) The range as a function of the angle θ has to be found first. The stone lands when its height y vanishes. This happens at x = 0 (naturally, this is where the stone was thrown) and at x = L(θ), where
$$L(\theta) = \frac{2v_0^2}{g}\tan\theta\,\cos^2\theta = \frac{2v_0^2}{g}\sin\theta\,\cos\theta = \frac{v_0^2}{g}\sin(2\theta).$$
(II) The range L(θ) is a differentiable function of θ, so the values of θ at which L attains its extreme values may be found from the equation
$$L'(\theta) = 0 \;\Rightarrow\; \frac{v_0^2}{g}\,2\cos(2\theta) = 0 \;\Rightarrow\; \cos(2\theta) = 0.$$
This equation has countably many solutions 2θ = π/2 + πn, where n is any integer. But in the interval of the physical values of θ ∈ [0, π/2], it has only one solution θ = π/4. Since sin(2π/4) = 1 (the absolute maximum of the sine), L attains its maximal value at θ = π/4; that is, the range is maximal, $L_{\max} = v_0^2/g$, when a stone is thrown at 45°. □
Remark. The conclusion in the preceding example is independent of the stone's mass and its initial speed $v_0$. In reality, for larger values of $v_0$, as for a projectile shot by a gun, the trajectory would deviate a bit from the parabola (due to friction with the air). So the optimal angle would deviate a bit from π/4. The deviation would also depend on the mass and the initial speed. The range optimization problem would be more involved and would require the theory of differential equations. It should also be noted that the angle at which the maximal range is attained depends on the initial height at which the stone is thrown. So, the angle would be different from 45° when the stone is thrown, for example, from a cliff.
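The result of Example 42 can be checked numerically with a brute-force scan over the launch angle. The sketch below assumes the idealized range formula $L(\theta) = (v_0^2/g)\sin 2\theta$ (no air friction, launch from ground level); the speed of 20 m/s and the helper names are arbitrary choices for illustration.

```python
import math

G = 9.8  # free-fall acceleration in m/s^2, as in Example 42


def projectile_range(v0, theta):
    """Range L(theta) = v0^2 / g * sin(2 theta) on flat ground without air friction."""
    return v0**2 / G * math.sin(2 * theta)


def best_angle(v0, samples=9001):
    """Scan theta over [0, pi/2] on a uniform grid and return the angle of largest range."""
    angles = [i * (math.pi / 2) / (samples - 1) for i in range(samples)]
    return max(angles, key=lambda theta: projectile_range(v0, theta))


v0 = 20.0  # hypothetical initial speed in m/s
theta_star = best_angle(v0)
print(f"optimal angle: {math.degrees(theta_star):.2f} degrees")
print(f"maximal range: {projectile_range(v0, theta_star):.2f} m, v0^2/g = {v0**2 / G:.2f} m")
```

Repeating the scan with a different `v0` gives the same optimal angle, in agreement with the remark that the conclusion does not depend on the initial speed in this idealized model.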
23. The Mean Value Theorem
Theorem 26 (Rolle's Theorem). Let f be a function that satisfies the following three hypotheses:
(I) f is continuous on the closed interval [a, b].
(II) f is differentiable on the open interval (a, b).
(III) f(a) = f(b).
Then there is a number c in (a, b) such that $f'(c) = 0$.
This theorem provides a useful method to prove the existence of a local maximum or minimum of a function f when analytic solutions of the equation $f'(x) = 0$ are hard to find. In fact, it has already been used in Example 41: The function $f(x) = x^3 - x$ on the intervals [−1, 0], [0, 1], [−1, 1] satisfies the hypotheses of Rolle's theorem because f(±1) = f(0) = 0. The proof follows closely the arguments of Example 41.
Proof of Theorem 26.
(I) If f(x) = f(a) = k is a constant function, then $f'(x) = 0$ everywhere.
(II) Let f(x) > f(a) for some x ∈ (a, b) (cf. Example 41 for x ∈ [−1, 0]). Since f is continuous, the extreme value theorem applies, and therefore f has a maximum in [a, b]. Since f(a) = f(b), the maximal value must be attained at some c ∈ (a, b). By Fermat's theorem, $f'(c) = 0$ because f is differentiable in (a, b).
(III) If f(x) < f(a) for some x ∈ (a, b) (cf. Example 41 for x ∈ [−1, 1] or x ∈ [0, 1]), then, by the extreme value theorem, f has a minimum at some c ∈ (a, b), and, by Fermat's theorem, $f'(c) = 0$. □
Rolle’s theorem is also useful to analyze the root pattern of a
function.
Example 43. How many real roots does the equation $x^5 + x^3 + x - 1 = 0$ have?
Solution:
(I) Let $f(x) = x^5 + x^3 + x - 1$. Evidently, f(−1) = −4 < 0 and f(1) = 2 > 0. By continuity, f has to take all intermediate values between −4 and 2 (the intermediate value theorem). So f has at least one root in (−1, 1).
(II) Suppose it has two roots a and b, that is, f(a) = f(b) = 0. Then, by Rolle's theorem, $f'(x)$ has to vanish somewhere in (a, b). But this is not possible because $f'(x) = 5x^4 + 3x^2 + 1 > 0$ for any x. Thus, f has exactly one real root. □
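Since the sign change of $f(x) = x^5 + x^3 + x - 1$ on [−1, 1] guarantees a root, the root can be located numerically by bisection. The following Python sketch is one simple way to do so; the helper name `bisect_root` and the tolerance are assumptions chosen for the illustration.

```python
def f(x):
    return x**5 + x**3 + x - 1


def bisect_root(func, lo, hi, tol=1e-12):
    """Halve [lo, hi] repeatedly, keeping the half on which func changes sign."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must have opposite signs at lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


root = bisect_root(f, -1.0, 1.0)
print(f"root = {root:.6f}, f(root) = {f(root):.2e}")
```

Because the derivative is strictly positive, the function is increasing everywhere, so the value found by bisection is the only real root.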
Theorem 27 (The Mean Value Theorem). Let f be a function that satisfies the following hypotheses:
(I) f is continuous on the closed interval [a, b].
(II) f is differentiable on the open interval (a, b).
Then there is a number c ∈ (a, b) such that
$$f'(c) = \frac{f(b) - f(a)}{b - a} \quad\text{or}\quad f(b) - f(a) = f'(c)(b - a). \tag{4.5}$$
The geometrical interpretation of the theorem is simple. Consider the line through the points (a, f(a)) and (b, f(b)). Its slope is (f(b) − f(a))/(b − a). The theorem asserts the existence of a point where the graph y = f(x) has a tangent line with the same slope (cf. Equation (4.5)), as $f'(c)$ is the slope of the tangent line at x = c. Let us turn to a formal proof.
Proof of Theorem 27.
(I) Consider the line through the points (a, f(a)) and (b, f(b)). Its equation is
$$y = L(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a), \qquad L(a) = f(a), \quad L(b) = f(b). \tag{4.6}$$
Next, consider the function
$$h(x) = f(x) - L(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a). \tag{4.7}$$
Its values determine the deviation of the graph y = f(x) from the secant line y = L(x) on the closed interval [a, b].
(II) The function h(x) satisfies the three hypotheses of Rolle's theorem. First, it is continuous on [a, b] as the sum of two continuous functions f(x) and −L(x) (a linear function is continuous). Second, it is differentiable on (a, b) as the sum of two differentiable functions:
$$h'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}, \qquad x \in (a, b). \tag{4.8}$$
Finally, by (4.6) and (4.7), h(a) = f(a) − L(a) = 0 and h(b) = f(b) − L(b) = 0, that is, h(a) = h(b).
(III) By Rolle's theorem, there is a number c ∈ (a, b) such that
$$h'(c) = 0 \;\Rightarrow\; f'(c) = \frac{f(b) - f(a)}{b - a},$$
where Equation (4.8) has been used. □
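The theorem only asserts that a point c exists; for a concrete function it can be located numerically. The sketch below applies bisection to $h'(x) = f'(x) - (f(b) - f(a))/(b - a)$ for $f(x) = x^3 - x$ on [0, 2]. The function and interval are assumptions chosen for illustration, and the sketch works only when $h'$ has opposite signs at the two endpoints, which the theorem itself does not guarantee.

```python
import math


def mean_value_point(f, df, a, b, tol=1e-12):
    """Find c in (a, b) with df(c) equal to the secant slope, by bisection."""
    slope = (f(b) - f(a)) / (b - a)

    def g(x):
        return df(x) - slope  # derivative of the deviation from the secant line

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("this sketch needs f' - slope to change sign between a and b")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


c = mean_value_point(lambda x: x**3 - x, lambda x: 3 * x**2 - 1, 0.0, 2.0)
print(f"c = {c:.6f}, sqrt(4/3) = {math.sqrt(4 / 3):.6f}")
```

Here the secant slope is 3, so the condition $3c^2 - 1 = 3$ gives $c = \sqrt{4/3}$, which the bisection reproduces.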
Example 44. A speeding car was pulled over on an interstate road and a state trooper gave a warning to the driver. Forty-five minutes and 65 miles later, the car stopped at a rest area. Another state trooper approached the driver and issued a speeding ticket, claiming that the driver had exceeded 86 mi/hr. Was the trooper's claim correct?
Solution: Let s(t) be the distance traveled by the car after it was pulled over the first time. The rate of change $s'(t) = v(t)$ is the speed of the car at any moment of time. The function s(t) is defined between t = 0 and t = 45 min = 0.75 hr so that s(0) = 0 and s(0.75) = 65 mi. It is differentiable as $s'(t)$ is the car speed! By the mean value theorem, there is a time moment t = c ∈ (0, 0.75) when
$$s'(c) = v(c) = \frac{s(0.75) - s(0)}{0.75 - 0} = \frac{65}{0.75} \approx 86.7 \text{ mi/hr}.$$
The speeding ticket is justified. □
For any two moments of time a and b, the ratio (s(b) −s(a))/(b−a)
is the average speed on the time interval [a, b]. The mean value theorem
simply states that a moving object always attains its average speed at
least at one moment of time between a and b. So, if at time moment b
the object appears to be traveling slower than its average speed, prior
to that it must have been traveling faster than its average speed.
Example 45. Suppose the derivative $f'$ exists and is bounded on (a, b), that is, $m \le f'(x) \le M$. If f(a) is given, how small and how large can f(b) possibly be?
Solution: By the mean value theorem, there is a c ∈ (a, b) such that $f(b) = f(a) + f'(c)(b - a)$. Since $m \le f'(c) \le M$,
$$f(a) + m(b - a) \le f(b) \le f(a) + M(b - a).$$
This inequality is easy to understand with the help of a mechanical analogy: How far can a car travel in time b − a if its speed is not lower than m, but cannot exceed M? □
The derivative of a constant function vanishes. How about the
converse? The following theorem answers this question.
Theorem 28. If $f'(x) = 0$ for all x in an interval (a, b), then f is constant on (a, b).
Proof. Take any two numbers $x_1$ and $x_2$ between a and b. By the mean value theorem, there is a number c between $x_1$ and $x_2$ such that $f(x_1) - f(x_2) = f'(c)(x_1 - x_2)$. By hypothesis, $f'(c) = 0$ for any c. Thus, $f(x_1) - f(x_2) = 0$ or $f(x_1) = f(x_2)$ for any $x_1$ and $x_2$ in (a, b); that is, f is constant. □
The hypothesis that $f'(x) = 0$ in a single interval is crucial. For example, the sign function f(x) = 1 if x > 0, and f(x) = −1 if x < 0, has zero derivative at any point of its domain, but it is not constant. The key point to note is that the domain is not a single interval, but a union of two disjoint intervals (−∞, 0) and (0, ∞). So the mean value theorem is not applicable to any interval containing x = 0. This example is easily extended to the case when the domain is any collection of disjoint intervals and f takes different constant values on different intervals.
Corollary 5. If $f'(x) = g'(x)$ for all x in an interval (a, b), then f − g is constant, that is, f(x) = g(x) + k, where k is a constant.
Proof. Let h(x) = f(x) − g(x). Since $h' = f' - g' = 0$ in (a, b), h is constant, and the conclusion follows. □
24. The First and Second Derivative Tests
Suppose the critical points of a function f are known. If f is differentiable, then all critical points can be found by solving the equation $f'(x) = 0$. How can one figure out the nature of a critical point, that is, whether it is a local maximum, local minimum, or none of the above? It turns out that this question can be answered by studying the derivatives $f'$ and $f''$. In addition, many qualitative features of the graph y = f(x) can be deduced from properties of the derivatives of f.
24.1. Properties of the First Derivative.
Theorem 29 (Increasing-Decreasing Test).
(I) If $f' > 0$ on an interval, then f is increasing on that interval.
(II) If $f' < 0$ on an interval, then f is decreasing on that interval.
Proof. Take any two numbers $x_1$ and $x_2$ in the interval so that $x_1 < x_2$. A function is increasing if $f(x_1) < f(x_2)$ and decreasing if $f(x_1) > f(x_2)$. Since f is differentiable, the mean value theorem states that there is a number c between $x_1$ and $x_2$ such that
$$f(x_2) - f(x_1) = f'(c)(x_2 - x_1). \tag{4.9}$$
If $f' > 0$, then it follows from (4.9) that $f(x_2) - f(x_1) > 0$ because, by assumption, $x_2 > x_1$; that is, the function is increasing. Similarly, for $f' < 0$, $f(x_2) - f(x_1) < 0$, and the function is decreasing. □
Suppose $f'$ is continuous and $f'(a) = m$ and $f'(b) = M$. Then, on the interval [a, b], $f'$ must take all intermediate values between m and M. Suppose m < 0 and M > 0, or m > 0 and M < 0; that is, the derivative changes its sign on the interval [a, b]. Then $f'$ must vanish between a and b. This means that f has a critical point c, a < c < b, with $f'(c) = 0$. More to the point, if the derivative $f'$ changes from positive to negative at c, then, according to the increasing-decreasing test, the function f changes from increasing to decreasing at c, that is, f(c − h) < f(c) and f(c) > f(c + h) for all sufficiently small positive h. We can then conclude that f attains a local maximum at c. Similarly, if the derivative $f'$ changes from negative to positive at c, then f changes from decreasing to increasing at c, f(c − h) > f(c) and f(c) < f(c + h), and hence f attains a local minimum at c. Naturally, there is a possibility that $f'(c) = 0$ but $f'(x)$ does not change its sign at c. In such a situation, the increasing-decreasing test yields f(c − h) < f(c) < f(c + h) or f(c − h) > f(c) > f(c + h); that is, in either case the function f has neither a local minimum nor a local maximum. The findings are summarized in the following theorem.
Theorem 30 (The First Derivative Test). Suppose that c is a critical point of a continuous function f.
(I) If $f'$ changes from positive to negative at c, then f has a local maximum at c.
(II) If $f'$ changes from negative to positive at c, then f has a local minimum at c.
(III) If $f'$ does not change its sign at c, then f has neither a local maximum nor a local minimum at c.
It is important to note that the very existence of $f'$ at c is not required in the first derivative test. Recall the definition of a critical point ($f'(c) = 0$ or $f'(c)$ does not exist). In fact, in the preceding derivation of the first derivative test, the condition $f'(c) = 0$ can easily be dropped because all that is really needed to apply the increasing-decreasing test is the sign of the derivative $f'(x)$ for x < c and x > c. For example, let f(x) = |x|. Then $f'(x) = -1$ for x < 0 (the function is decreasing) and $f'(x) = 1$ for x > 0 (the function is increasing). Hence, f(x) has a minimum at x = 0, even though $f'$ does not exist at x = 0. The continuity hypothesis is also crucial. Consider the function $f(x) = 1/x^2$ for x ≠ 0 and f(0) = 0. Then $f'(x) = -2/x^3$ for x ≠ 0 and $f'(0)$ does not exist. So x = 0 is a critical point. The function is increasing for x < 0 because $f' > 0$, and it is decreasing for x > 0 because $f' < 0$. However, f has no maximum at x = 0 because f is discontinuous at x = 0. In fact, it attains its absolute minimum at x = 0!
There are plenty of mechanical analogies of the first derivative test. Let H(t) be the height (relative to the ground) of a stone thrown upward as a function of time t. At the beginning, the stone moves upward so $H' > 0$ (the height is increasing). When the stone comes back to the ground, it moves downward so $H' < 0$ (the height is decreasing). Naturally, at some moment of time, the stone has to reach the maximal height. Analyze the motion of a pendulum (or a seesaw) from this point of view! The height would have two maxima and one minimum.
Example 46 (Example 41 Revisited). Find all local maxima and minima of $f(x) = x^3 - x$ and the intervals on which the function is increasing or decreasing.
Solution:
(I) Since f is differentiable (it is a polynomial), all its critical points satisfy the equation
$$f'(x) = 3x^2 - 1 = 3\left(x - 1/\sqrt{3}\right)\left(x + 1/\sqrt{3}\right) = 0.$$
Hence, the critical points are $c_1 = -1/\sqrt{3}$ and $c_2 = 1/\sqrt{3}$.
(II) For $x < c_1$, the product $(x - c_1)(x - c_2)$ is positive (as the product of two negative numbers), and hence $f' > 0$ (f is increasing on $(-\infty, c_1)$). For $c_1 < x < c_2$, the product $(x - c_1)(x - c_2)$ is negative (as the product of a positive and a negative number), and hence $f' < 0$ (f is decreasing on $(c_1, c_2)$). For $x > c_2$, the product $(x - c_1)(x - c_2)$ is positive (as the product of two positive numbers), and hence $f' > 0$ (f is increasing on $(c_2, \infty)$).
(III) The derivative changes from positive to negative at $c_1$. Therefore, f has a local maximum at $c_1$. The derivative changes from negative to positive at $c_2$. Therefore, f has a local minimum at $c_2$. □
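The sign analysis of Example 46 can be mimicked numerically by sampling the derivative slightly to the left and to the right of each critical point. The Python sketch below is only a heuristic illustration: the helper name `classify_by_first_derivative` and the step `h` are assumptions, and a fixed step can mislead if another critical point lies within distance h.

```python
import math


def classify_by_first_derivative(df, c, h=1e-3):
    """Classify a critical point c from the signs of df just left and right of c."""
    left, right = df(c - h), df(c + h)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "neither"


def df(x):
    return 3 * x**2 - 1  # derivative of f(x) = x^3 - x


c1, c2 = -1 / math.sqrt(3), 1 / math.sqrt(3)
print("c1:", classify_by_first_derivative(df, c1))
print("c2:", classify_by_first_derivative(df, c2))
# For x^3 the derivative 3x^2 does not change sign at 0.
print("x^3 at 0:", classify_by_first_derivative(lambda x: 3 * x**2, 0.0))
```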
24.2. Properties of the Second Derivative: Inﬂection Points.
Definition 31 (Concavity). The graph of a function f is called
concave upward on an interval I if it lies above all of its tangent lines
on I. The graph is called concave downward on I if it lies below all of
its tangent lines on I.
Note that the notion of concavity implies that f is differentiable (otherwise, the tangent lines do not exist). If f is twice differentiable, then the concavity is determined by the sign of the second derivative $f''$. Suppose that the graph of f is concave upward on I. Consider the tangent lines at two points c and c + h in I:
$$L_1(x) = f(c) + f'(c)(x - c), \qquad L_2(x) = f(c+h) + f'(c+h)(x - c - h).$$
The graph of f lies above the lines $L_1$ and $L_2$, that is, $f(x) - L_1(x) > 0$ and $f(x) - L_2(x) > 0$ for all x in I other than the respective point of tangency. Putting x = c in the last inequality and x = c + h in the former one, we obtain
$$f(c) - L_2(c) = f(c) - f(c+h) + f'(c+h)h > 0,$$
$$f(c+h) - L_1(c+h) = f(c+h) - f(c) - f'(c)h > 0.$$
The sum of the right-hand sides of these equalities is positive as the sum of two positive numbers:
$$h\,[f'(c+h) - f'(c)] > 0 \;\Rightarrow\; \frac{f'(c+h) - f'(c)}{h} > 0, \tag{4.10}$$
where the first inequality has been divided by the positive number $h^2$. The latter inequality is true for any h ≠ 0. Therefore, by taking the limit h → 0, we can conclude that $f''(c) \ge 0$ if the graph is concave upward (a strict inequality may become nonstrict in the limit). Inequality (4.10) shows that $f'(c+h) > f'(c)$ for h > 0 and $f'(c) > f'(c+h)$ for h < 0. In other words, the derivative $f'$, or the slope of the tangent line of the graph of f, increases for the upward concavity, and hence $(f')' = f''$ cannot be negative. Similarly, the downward concavity implies that $f''$ cannot be positive. It turns out that a converse is also true.
Theorem 31 (The Concavity Test). Let f be twice differentiable on an interval I.
(I) If $f''(x) > 0$ for all x in I, then the graph of f is concave upward on I.
(II) If $f''(x) < 0$ for all x in I, then the graph of f is concave downward on I.
How does the graph of f look near a point c where $f''(c) = 0$? There are four possibilities. First, $f''(c \pm h) > 0$ for all small h > 0. This means that the graph is concave upward to the left and right of c. As an example, consider $f(x) = x^4$. Second, $f''(c \pm h) < 0$. This implies that the graph is concave downward to the left and right of c. As an example, take $f(x) = -x^4$. Third, $f''(c - h) > 0$ and $f''(c + h) < 0$, that is, the concavity changes from upward to downward (e.g., $f(x) = -x^3$). Fourth, $f''(c - h) < 0$ and $f''(c + h) > 0$, that is, the concavity changes from downward to upward (e.g., $f(x) = x^3$).
Definition 32 (Inﬂection Point). A point P on the graph y =
f(x) is called an inﬂection point if f is continuous there and the graph
changes from concave upward to concave downward or from concave
downward to concave upward.
Let c be a critical point of f. Suppose $f''$ is continuous near c. What can $f''(c)$ tell us about the nature of the critical point (local minimum or maximum)? There are three possibilities. First, $f''(c) > 0$. This means that $f''(x) > 0$ for all x in some neighborhood of c (by the continuity of $f''$). Hence, f is concave upward near c; that is, its graph lies above the tangent line at c, which is a horizontal line because $f'(c) = 0$. So f must have a local minimum. Similarly, the condition $f''(c) < 0$ implies that the concavity is downward near c and f has a local maximum. If $f''(c) = 0$, then the concavity may or may not change at c as discussed earlier. The function may have a local maximum, a local minimum, or an inflection point; that is, no conclusion about the nature of the critical point can be reached.
Theorem 32 (The Second Derivative Test). Suppose $f''$ is continuous near c.
(I) If $f'(c) = 0$ and $f''(c) > 0$, then f has a local minimum at c.
(II) If $f'(c) = 0$ and $f''(c) < 0$, then f has a local maximum at c.
(III) If $f'(c) = 0$ and $f''(c) = 0$, then f may have a local maximum, a local minimum, or an inflection point.
In Example 46, the function $f(x) = x^3 - x$ is shown to have two critical points: $x = \pm 1/\sqrt{3}$. Since $f''(x) = 6x$, $f''(-1/\sqrt{3}) = -2\sqrt{3} < 0$ (a local maximum) and $f''(1/\sqrt{3}) = 2\sqrt{3} > 0$ (a local minimum). The function also has an inflection point at x = 0: $f''(x) = 6x < 0$ if x < 0 and $f''(x) = 6x > 0$ if x > 0. Note that an inflection point need not be related to a critical point! In other words, the tangent line at an inflection point can have any slope. In the previous example, $f'(0) = -1$.
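The second derivative test translates directly into a small Python check. The helper name `classify_by_second_derivative` is hypothetical, and the caller is assumed to pass a point c at which the first derivative already vanishes; the sketch does not verify that.

```python
import math


def classify_by_second_derivative(d2f, c):
    """Second derivative test at a point c where the first derivative vanishes."""
    value = d2f(c)
    if value > 0:
        return "local minimum"
    if value < 0:
        return "local maximum"
    return "inconclusive"


def d2f(x):
    return 6 * x  # second derivative of f(x) = x^3 - x


for c in (-1 / math.sqrt(3), 1 / math.sqrt(3)):
    print(f"c = {c:+.4f}: {classify_by_second_derivative(d2f, c)}")

# For x^4 the second derivative 12 x^2 vanishes at 0, so the test says nothing,
# although x^4 does have a minimum there.
print("x^4 at 0:", classify_by_second_derivative(lambda x: 12 * x**2, 0.0))
```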
25. Taylor Polynomials and the Local Behavior of a Function
The tangent line approximation L(x) is the best linear approximation of f(x) near x = a because L(x) and f(x) have the same rate of change at a. In the previous section, it was shown that the second derivative at a provides important information about the behavior of f(x) near a, namely the concavity. The tangent line L(x) has no concavity as $L''(x) = 0$. The question arises whether there is a systematic method to improve the accuracy of the tangent line approximation to capture more essential features of the behavior of f(x) near a (i.e., the local behavior of f).
25.1. Taylor Polynomials. The function L(x) is a polynomial of the first degree. Consider the second-degree polynomial
$$T_2(x) = f(a) + f'(a)(x - a) + c_2(x - a)^2 = L(x) + c_2(x - a)^2,$$
where $c_2$ is an arbitrary coefficient. This polynomial has the same features as L(x), that is, $T_2(a) = L(a) = f(a)$ and $T_2'(a) = L'(a) = f'(a)$ because $T_2'(x) = f'(a) + 2c_2(x - a)$. So it might provide a better approximation of f(x) than L(x) near a if the coefficient $c_2$ is chosen so that $T_2(x)$ has the same concavity as f(x) near a. By the concavity test, it is then reasonable to require that $T_2''(a) = f''(a)$, which yields $2c_2 = f''(a)$ or $c_2 = f''(a)/2$. The idea can be extended to a polynomial of degree n:
$$T_n(x) = c_0 + c_1(x - a) + c_2(x - a)^2 + \cdots + c_n(x - a)^n,$$
where the coefficients are fixed by the conditions
$$T_n(a) = f(a), \quad T_n'(a) = f'(a), \quad T_n''(a) = f''(a), \quad \ldots, \quad T_n^{(n)}(a) = f^{(n)}(a).$$
The resulting polynomial is called the nth-degree Taylor polynomial:
$$T_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n.$$
25.2. Accuracy of Taylor Polynomials. The accuracy of the tangent line approximation is assessed in Theorem 23. Let us compare it with the accuracy of higher-degree Taylor polynomials. Consider Taylor polynomials of the exponential function $e^x$ near x = 0. Since $(e^x)' = e^x$ and $e^0 = 1$, the Taylor polynomials are
$$f(x) = e^x: \quad T_n(x) = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \cdots + \frac{1}{n!}x^n.$$
Let us take a few values of x near x = 0 and compare the values of the Taylor polynomials with the value of the function:
x        f(x)     T_1(x)   T_2(x)   T_3(x)
1        2.718    2.000    2.500    2.667
−0.5     0.607    0.500    0.625    0.604
0.25     1.284    1.250    1.281    1.284
Two observations can be made from this table. First, the accuracy increases with the degree of the Taylor polynomial (reading the rows of the table). Second, lower-degree Taylor polynomials become more accurate as the argument gets closer to the point at which the Taylor polynomials are constructed (reading the columns of the table). For example, the approximation $e^x \approx T_3(x)$ is accurate to about four significant digits if $|x| \le 1/4$. So the accuracy of the approximation $e^x \approx T_2(x)$ is determined by the difference $T_2 - T_3 = -x^3/6$, that is, by the next monomial to be added to $T_2$ to get the next Taylor polynomial. This observation is a characteristic feature of Taylor polynomials:
Theorem 33. Let f be continuously differentiable n + 1 times on an open interval I containing a. Let $f^{(n+1)}$ be bounded on I, $|f^{(n+1)}(x)| \le M$. Then
$$|f(x) - T_n(x)| \le \frac{M}{(n+1)!}\,|x - a|^{n+1}, \tag{4.11}$$
where $T_n$ is the Taylor polynomial at a.
Theorem 23 is a particular case of this theorem for n = 1. Inequality (4.11) is a consequence of the Taylor theorem, whose proof goes beyond the scope of this course. For example, what is the accuracy of the Taylor polynomial $T_5(x)$ near a = 0 for the exponential $e^x$ on the interval [−1, 1]? To get an upper bound on the errors, one should take the maximal value of the right-hand side of (4.11) for n = 5 on the interval, that is, $|(e^x)^{(n+1)}| = e^x \le M = e$ and $|x| \le 1$, so the errors cannot exceed $e/6! \approx 0.0038$ (an absolute error below 0.004).
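The values of the Taylor polynomials of $e^x$ and the error bound $e/6!$ can be reproduced with a few lines of Python. This is a sketch for checking the numbers; the helper name `taylor_exp` and the sampling grid on [−1, 1] are assumptions made for the illustration.

```python
import math


def taylor_exp(x, n):
    """nth-degree Taylor polynomial of e^x at a = 0."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


# Function value followed by T_1, T_2, T_3 at a few sample points.
for x in (1.0, -0.5, 0.25):
    row = [math.exp(x)] + [taylor_exp(x, n) for n in (1, 2, 3)]
    print(f"x = {x:5.2f}: " + "  ".join(f"{v:.5f}" for v in row))

# Bound (4.11) for n = 5 on [-1, 1] with M = e, compared with the largest
# error observed on a uniform grid of 201 points.
bound = math.e / math.factorial(6)
grid = [i / 100 - 1 for i in range(201)]
worst = max(abs(math.exp(x) - taylor_exp(x, 5)) for x in grid)
print(f"bound = {bound:.5f}, largest sampled error = {worst:.5f}")
```

The sampled error stays below the bound, as it must; the bound is not sharp because M = e overestimates the sixth derivative on most of the interval.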
25.3. Taylor Polynomials near Critical Points. Let a be a critical point of f. Provided f is differentiable enough times, Taylor polynomials can be constructed near a. The linear term vanishes because $f'(a) = 0$. The second derivative test is easy to understand by looking at $T_2(x) = f(a) + f''(a)(x - a)^2/2$. If $f''(a) > 0$, then f looks like an upward parabola near a (a local minimum). If $f''(a) < 0$, then f looks like a downward parabola near a (a local maximum). For example, cos x has a local maximum at a = 0, and it behaves near a = 0 as $\cos x \approx T_2(x) = 1 - x^2/2$. The second derivative test is inconclusive if $f''(a) = 0$. In this case, f(x) behaves near a as $T_3(x) = f(a) + f'''(a)(x - a)^3/6$. So, if $f'''(a) \ne 0$, f has an inflection point at a. If $f'''(a) = 0$, one should look at $T_4(x) = f(a) + f^{(4)}(a)(x - a)^4/24$. A function has a local minimum (maximum) at a if $f^{(4)}(a) > 0$ ($f^{(4)}(a) < 0$). Thus, the local behavior of f near its critical point is determined by the Taylor polynomial that has the first nonvanishing correction to f(a).
Example 47. Investigate $f(x) = x - \tan x$ near $x = 0$.
Solution: Find a Taylor polynomial for $\tan x$ with two nontrivial terms. In this case, it is $T_3$: $\tan x \approx T_3(x) = x + x^3/3$. Therefore, $f(x) \approx x - T_3(x) = -x^3/3$. So there is an inflection point at $x = 0$. □
25.4. Asymptotes. Taylor polynomials provide a powerful technique to
investigate the local behavior of a diﬀerentiable function. How can one
analyze the behavior of a function near a if it is not diﬀerentiable at
a, or not even deﬁned at a, or how does it behave in the asymptotic
regions x →±∞?
Definition 33 (Vertical Asymptotes). The line $x = a$ is a vertical asymptote of the graph $y = f(x)$ if at least one of the limits $\lim_{x\to a^\pm} f(x)$ is infinite ($\infty$ or $-\infty$).
In other words, the function $f(x)$ increases (decreases) unboundedly as $x$ approaches $a$ from either the left or the right. For example, the function
(4.12)   $f(x) = \dfrac{x(x^2+3)}{x^2-1} = \dfrac{x(x^2+3)}{(x-1)(x+1)}$
has two vertical asymptotes because the denominator vanishes at $x = 1$ and $x = -1$. When $x$ approaches $-1$ from the left, $f(x)$ tends to $-\infty$, while it tends to $\infty$ if $-1$ is approached from the right. Similarly, $f(x) \to -\infty$ as $x \to 1^-$ and $f(x) \to \infty$ as $x \to 1^+$.
Suppose f has a vertical asymptote at a. How does it behave near
a? How “fast” does it diverge when x gets closer to a?
Definition 34 (Asymptotic Behavior). The functions $f(x)$ and $g(x)$ on an open interval $x > a$ (including $x > -\infty$) or $x < a$ (including $x < \infty$) are said to have the same asymptotic behavior if
(4.13)   $\lim_{x\to a^+} (f(x) - g(x)) = 0$ or $\lim_{x\to a^-} (f(x) - g(x)) = 0$.
In particular, if $x \to \pm\infty$ and $g(x) = mx + b$, then $f$ is said to have a slant asymptote, and for $m = 0$, the slant asymptote is called a horizontal asymptote.
For a given $f$, there are many $g$ that have the same asymptotic behavior because one can always change $g$ by adding $h$ such that $h(x) \to 0$ as $x \to a^\pm$. A practical problem is to find as simple a $g$ as possible with the property (4.13). In other words, one looks for a simple way to estimate the values of $f(x)$ near $a$.
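Example 48. Find the asymptotic behavior of the function (4.12).
Solution: The function has to be investigated near $x = \pm 1$ and also when $x \to \pm\infty$.
(I) Near $x = -1$, the unbounded growth of $f(x)$ is associated with the divergent factor $1/(x+1)$ so that $f(x) = h(x)/(x+1)$, where $h(x)$ is finite near $x = -1$. Then $f(x) \approx h(-1)/(x+1) = g(x)$:
$f(x) = \dfrac{1}{x+1}\cdot\dfrac{x(x^2+3)}{x-1} \approx \dfrac{2}{x+1} = g(x)$.
The ratio $f(x)/g(x) = h(x)/h(-1)$ tends to 1 as $x \to -1^\pm$, so $g(x) = 2/(x+1)$ captures the divergent part of $f$ and the graphs of $f$ and $g$ have the same shape near $x = -1$. The difference $f(x) - g(x) = (h(x) - h(-1))/(x+1)$ tends to the finite number $h'(-1) = -2$ rather than to 0, so it is $g(x) - 2$ that satisfies (4.13) exactly.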
(II) Similarly, near $x = 1$,
$f(x) = \dfrac{1}{x-1}\cdot\dfrac{x(x^2+3)}{x+1} \approx \dfrac{2}{x-1} = g(x)$.
(III) When $x$ is a large negative or positive number,
$f(x) = \dfrac{x\cdot x^2\,(1 + 3/x^2)}{x^2\,(1 - 1/x^2)} \approx \dfrac{x^3}{x^2} = x = g(x)$,
where $1/x^2$ and $3/x^2$ are small as compared to 1 for large $x^2$ and can be neglected, that is, $\lim_{x\to\pm\infty}(f(x) - x) = 0$. So the graph of $f$ asymptotically approaches the line $y = x$. Since $1 + 3/x^2 > 1 - 1/x^2 > 0$, the ratio $(1 + 3/x^2)/(1 - 1/x^2) > 1$ for all $x^2 > 1$, and hence $f(x) > x$ for $x > 1$ and $f(x) < x$ for $x < -1$. This means that the graph of $f$ approaches the slant asymptote $y = x$ from above when $x$ is a large positive number and from below when $x$ is a large negative number. □
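The asymptotic estimates for the function (4.12) can be probed numerically. The sketch below is illustrative only; the helper `near_pole` is a name introduced here, and the sample points are arbitrary.

```python
def f(x):
    """The function (4.12)."""
    return x * (x**2 + 3) / (x**2 - 1)

def near_pole(x, pole):
    """Leading divergent term 2/(x - pole) near the pole at +1 or -1."""
    return 2 / (x - pole)

# Ratio of f to its leading divergent term near each vertical asymptote.
for pole in (-1.0, 1.0):
    for dx in (1e-2, 1e-4, -1e-4):
        x = pole + dx
        print(pole, dx, f(x) / near_pole(x, pole))

# Signed distance from the graph to the line y = x far from the origin.
for x in (1e2, 1e4, -1e4):
    print(x, f(x) - x)
```

The printed ratios show how well the term $2/(x \mp 1)$ reproduces $f$ near the poles, and the sign of $f(x) - x$ shows from which side the line $y = x$ is approached.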
The following example illustrates the use of Taylor polynomials to find the asymptotic behavior.
Example 49. Investigate $f(x) = x^{-8/3}(1 - \cos x)$ near $x = 0$.
Solution: The factor $x^{-8/3}$ diverges as $x \to 0$, but $\cos x$ is smooth near $x = 0$ and can be approximated by the Taylor polynomial $T_4 = 1 - x^2/2 + x^4/24$:
$f(x) \approx x^{-8/3}(1 - T_4(x)) = x^{-8/3}\left(\tfrac{1}{2}x^2 - \tfrac{1}{24}x^4\right) = \tfrac{1}{2}x^{-2/3} - \tfrac{1}{24}x^{4/3}$.
Therefore, $f(x) - \tfrac{1}{2}x^{-2/3} \approx -\tfrac{1}{24}x^{4/3} \to 0$ as $x \to 0$. Note that the use of $T_2$ in place of $T_4$ would not be enough to establish the asymptotic behavior of $f$. □
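A quick numerical check of Example 49 for $x > 0$ is sketched below. It is an illustration under the assumption that only positive $x$ are sampled; the identity $1 - \cos x = 2\sin^2(x/2)$ is used to avoid round-off for small $x$.

```python
import math

def f(x):
    """x^(-8/3) (1 - cos x) for x > 0, with 1 - cos x = 2 sin^2(x/2)."""
    return x ** (-8 / 3) * 2 * math.sin(x / 2) ** 2

def g(x):
    """Leading term x^(-2/3) / 2."""
    return 0.5 * x ** (-2 / 3)

# Both f and g diverge as x -> 0+, while their difference shrinks.
for x in (1e-1, 1e-2, 1e-3):
    print(x, f(x), g(x), f(x) - g(x))
```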
26. L’Hospital’s Rule
If a function f is not deﬁned at a, then its behavior depends on the
limit of f as x →a, whether it is ﬁnite, inﬁnite, or does not even exist.
So this question is of importance when investigating a function. There
is a special technique to answer it.
26.1. Indeterminate Forms $\frac{0}{0}$ and $\frac{\infty}{\infty}$. Consider the behavior of the following functions:
(4.14)   $\dfrac{e^x - 1}{x}$,   $\dfrac{1 - \cos(x)}{x^2}$,   $\dfrac{\tan(x) - x}{x^3}$   as $x \to 0$.
Do they have a vertical asymptote at $x = 0$? These functions have a common feature. They are ratios $f/g$ of two functions $f$ and $g$ such that $f(x) \to 0$ and $g(x) \to 0$ as $x \to 0$. Similarly, one can make ratios where the limits of the numerator and denominator at a particular point are infinite:
(4.15)   $\dfrac{\ln(x)}{x^{-1}}$   as $x \to 0^+$.
In general, a limit of the form
$\lim_{x\to a} \dfrac{f(x)}{g(x)}$
is called an indeterminate form of type $\frac{0}{0}$ if both $f(x) \to 0$ and $g(x) \to 0$ as $x \to a$; it is called an indeterminate form of type $\frac{\infty}{\infty}$ if both $f(x) \to \infty$ (or $-\infty$) and $g(x) \to \infty$ (or $-\infty$). The limit itself may or may not exist. The following theorem provides a powerful method to study the indeterminate forms of these types.
Theorem 34 (L’Hospital’s Rule). Suppose $f$ and $g$ are differentiable and $g'(x) \ne 0$ on an open interval that contains $a$ (except possibly at $a$). Suppose that
$\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0$
or that
$\lim_{x\to a} f(x) = \pm\infty$ and $\lim_{x\to a} g(x) = \pm\infty$.
Then
(4.16)   $\lim_{x\to a} \dfrac{f(x)}{g(x)} = \lim_{x\to a} \dfrac{f'(x)}{g'(x)}$
if the limit on the right-hand side exists (or is infinite).
For the special case in which $f(a) = g(a) = 0$, the derivatives $f'$ and $g'$ are continuous, and $g'(a) \ne 0$, it is not difficult to see why l’Hospital’s rule (4.16) holds:
$\lim_{x\to a} \dfrac{f(x)}{g(x)} = \lim_{x\to a} \dfrac{\frac{f(x) - f(a)}{x - a}}{\frac{g(x) - g(a)}{x - a}} = \dfrac{\lim_{x\to a} \frac{f(x) - f(a)}{x - a}}{\lim_{x\to a} \frac{g(x) - g(a)}{x - a}} = \dfrac{f'(a)}{g'(a)} = \lim_{x\to a} \dfrac{f'(x)}{g'(x)}$.
The first equality follows from $f(a) = g(a) = 0$, the second and third equalities are the consequence of the limit laws and the assumption that $g'(a) \ne 0$, and the last equality follows from the continuity of the derivatives. This simplified version of l’Hospital’s rule can be understood geometrically. The functions $f$ and $g$ can be approximated by their tangent lines at $a$, $f(x) \approx f'(a)(x - a)$ and $g(x) \approx g'(a)(x - a)$, so that $f(x)/g(x) \approx f'(a)/g'(a)$ near $a$.
It is not so easy to prove the general version of l’Hospital’s rule (the proof is omitted here). L’Hospital’s rule is also valid for one-sided limits $x \to a^\pm$ and for the limits at $\pm\infty$. The conditions of l’Hospital’s rule must be verified for the corresponding limits.
What happens if $f'(a) = g'(a) = 0$? Apparently, the conditions of l’Hospital’s rule are satisfied for the derivatives $f'(x)$ and $g'(x)$ in this case. So, l’Hospital’s rule may be applied again to the ratio $f'(x)/g'(x)$. For functions differentiable many times, l’Hospital’s rule is easy to understand via the Taylor polynomials:
$\dfrac{f(x)}{g(x)} \approx \dfrac{f(a) + f'(a)(x - a) + \frac{1}{2}f''(a)(x - a)^2 + \cdots}{g(a) + g'(a)(x - a) + \frac{1}{2}g''(a)(x - a)^2 + \cdots}$.
If $f(a) = g(a) = 0$, then the limit of the ratio is determined by $f'(a)/g'(a)$. If $f(a) = g(a) = 0$ and $f'(a) = g'(a) = 0$, then the limit is determined by $f''(a)/g''(a)$ and so on.
Example 50. Investigate the indeterminate forms (4.14) and (4.15).
Solution:
(I) Let $f(x) = e^x - 1$ and $g(x) = x$. Then $f(0) = g(0) = 0$ (the conditions of l’Hospital’s rule are fulfilled). Hence,
$\lim_{x\to 0} \dfrac{e^x - 1}{x} = \lim_{x\to 0} \dfrac{(e^x - 1)'}{(x)'} = \lim_{x\to 0} \dfrac{e^x}{1} = 1$.
(II) Let $f(x) = 1 - \cos(x)$ and $g(x) = x^2$ so that $f(0) = g(0) = 0$. Then $f'(x) = \sin(x)$ and $g'(x) = 2x$. Since $f'(0) = 0$ and $g'(0) = 0$, l’Hospital’s rule can be applied again:
$\lim_{x\to 0} \dfrac{1 - \cos(x)}{x^2} = \lim_{x\to 0} \dfrac{\sin(x)}{2x} = \lim_{x\to 0} \dfrac{(\sin(x))'}{(2x)'} = \lim_{x\to 0} \dfrac{\cos(x)}{2} = \dfrac{1}{2}$.
(III) Let $f(x) = \tan(x) - x$ and $g(x) = x^3$ so that $f(0) = g(0) = 0$. The derivatives $f'(x) = \sec^2(x) - 1$ and $g'(x) = 3x^2$ vanish at $x = 0$. L’Hospital’s rule can be used again to resolve the indeterminate form. For complicated functions, taking higher-order derivatives might be quite an algebraic exercise. Sometimes, simple algebraic transformations of an indeterminate form in combination with basic limit laws may lead to the answer faster than a successive use of l’Hospital’s rule:
$\lim_{x\to 0} \dfrac{\tan(x) - x}{x^3} = \lim_{x\to 0} \dfrac{\sec^2(x) - 1}{3x^2} = \lim_{x\to 0} \dfrac{1 - \cos^2(x)}{3x^2\cos^2(x)} = \lim_{x\to 0} \dfrac{\sin^2(x)}{3x^2} = \dfrac{1}{3}\lim_{x\to 0}\left(\dfrac{\sin(x)}{x}\right)^2 = \dfrac{1}{3}$.
The third equality follows from $\cos(x) \to 1$ as $x \to 0$, and therefore $\cos^2(x)$ in the denominator can be replaced by 1 in accord with the basic limit laws.
(IV) Let $f(x) = \ln(x)$ and $g(x) = x^{-1}$ so that $f(x) \to -\infty$ and $g(x) \to \infty$ as $x \to 0^+$. So the conditions of l’Hospital’s rule are fulfilled. Therefore,
$\lim_{x\to 0^+} \dfrac{\ln(x)}{x^{-1}} = \lim_{x\to 0^+} \dfrac{(\ln(x))'}{(x^{-1})'} = \lim_{x\to 0^+} \dfrac{x^{-1}}{-x^{-2}} = -\lim_{x\to 0^+} x = 0$.
□
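The four limits of Example 50 can be sanity-checked by evaluating the ratios at small arguments. This is only a numerical illustration, not a proof; the function names are introduced here, and `math.expm1` and the half-angle identity are used to limit round-off error.

```python
import math

def ratio_I(x):
    """(e^x - 1) / x."""
    return math.expm1(x) / x

def ratio_II(x):
    """(1 - cos x) / x^2, via 1 - cos x = 2 sin^2(x/2)."""
    return 2 * math.sin(x / 2) ** 2 / x**2

def ratio_III(x):
    """(tan x - x) / x^3."""
    return (math.tan(x) - x) / x**3

def ratio_IV(x):
    """ln(x) / x^(-1) for x > 0."""
    return math.log(x) / (1 / x)

for x in (1e-1, 1e-2, 1e-3):
    print(x, ratio_I(x), ratio_II(x), ratio_III(x), ratio_IV(x))
```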
26.2. Indeterminate Products $0 \cdot \infty$. Suppose that $f(x) \to \infty$ and $g(x) \to 0$ as $x \to a$. How can the indeterminate product $f(x)g(x)$ be investigated when $x \to a$? It turns out the indeterminate product can be transformed into one of the indeterminate forms to which l’Hospital’s rule is applicable:
(4.17)   $fg = \dfrac{f}{1/g}$ ($\infty \cdot 0 \to \frac{\infty}{\infty}$)   or   $fg = \dfrac{g}{1/f}$ ($\infty \cdot 0 \to \frac{0}{0}$).
The function $x\ln(x)$ is an indeterminate product of the type $0 \cdot \infty$ as $x \to 0^+$. It can be transformed into an indeterminate form of the type $\frac{\infty}{\infty}$ as in (4.15), which is then resolved by l’Hospital’s rule (see Example 50). Note that, although either of the transformations in (4.17) may be applied with the subsequent use of l’Hospital’s rule, the technicalities involved might differ substantially. For instance, if the second option in (4.17) is applied to $x\ln(x) = x/(1/\ln(x))$, then
$\lim_{x\to 0^+} x\ln(x) = \lim_{x\to 0^+} \dfrac{x}{\frac{1}{\ln(x)}} = \lim_{x\to 0^+} \dfrac{1}{-\frac{1}{\ln^2(x)}\cdot\frac{1}{x}} = -\lim_{x\to 0^+} x\ln^2(x)$.
Although our goal has not been achieved, our effort has not been in vain. Since the left-hand side vanishes by Example 50, it follows that $x\ln^2(x) \to 0$ as $x \to 0^+$. By repeating this procedure recursively, one can infer that $x\ln^n(x) \to 0$ as $x \to 0^+$ for any $n = 1, 2, \ldots$.
26.3. Indeterminate Powers $0^0$, $\infty^0$, and $1^\infty$. Several indeterminate forms arise from the limits of $[f(x)]^{g(x)}$ as $x \to a$:
$0^0$ ($f(x) \to 0$, $g(x) \to 0$);   $\infty^0$ ($f(x) \to \infty$, $g(x) \to 0$);   $1^\infty$ ($f(x) \to 1$, $g(x) \to \infty$).
Note $c^0 = 1$ if $c \ne 0$ and $c \ne \infty$. Similarly, $c^\infty = 0$ if $0 \le c < 1$ and $c^\infty = \infty$ if $c > 1$. The indeterminate powers can be transformed into an indeterminate product with the help of the identity $y = e^{\ln(y)}$:
$\lim_{x\to a} [f(x)]^{g(x)} = \lim_{x\to a} e^{\ln([f(x)]^{g(x)})} = \lim_{x\to a} e^{g(x)\ln(f(x))} = e^{\lim_{x\to a} g(x)\ln(f(x))}$.
The limit of $g(x)\ln(f(x))$ is of type $0 \cdot \infty$ and can be treated by the rule (4.17). The procedure is illustrated with an example of the type $\infty^0$ indeterminate power:
$\lim_{x\to\infty} x^{1/x} = \lim_{x\to\infty} e^{\ln(x^{1/x})} = \lim_{x\to\infty} e^{\ln(x)/x} = e^{\lim_{x\to\infty} \ln(x)/x} = e^0 = 1$.
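The exponential rewriting can be mirrored directly in a short computation. The following sketch (with a helper name, `power_limit`, chosen here) evaluates $x^{1/x}$ as $e^{\ln(x)/x}$ for increasing $x$.

```python
import math

def power_limit(x):
    """x^(1/x) computed as exp(ln(x)/x), following y = e^{ln y}."""
    return math.exp(math.log(x) / x)

# The exponent ln(x)/x shrinks to 0, so the power approaches e^0 = 1.
for x in (1e1, 1e3, 1e6):
    print(x, math.log(x) / x, power_limit(x))
```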
26.4. Indeterminate Differences $\infty - \infty$. Suppose $f(x) \to \infty$ and $g(x) \to \infty$ as $x \to a$. The limit of $f(x) - g(x)$ as $x \to a$ is called an indeterminate difference. The following transformations might be helpful to investigate it:
$f - g = f\left(1 - \dfrac{g}{f}\right) = \dfrac{1 - g/f}{1/f}$   or   $f - g = g\left(\dfrac{f}{g} - 1\right) = \dfrac{f/g - 1}{1/g}$.
If $f(x)/g(x) \to 1$, then the indeterminate difference is equivalent to an indeterminate form of type $0/0$ and can be investigated by l’Hospital’s rule. The limit of $f/g$ is an indeterminate form of type $\infty/\infty$ and can also be investigated by l’Hospital’s rule. Suppose that $f(x)/g(x) \to k$ as $x \to a$, where $k$ can be either a nonnegative number or $k = \infty$. If $k < 1$, then $f - g = g(f/g - 1) \to \infty \cdot (k - 1) = -\infty$; that is, $g$ increases faster than $f$ as $x \to a$. If $k > 1$ or $k = \infty$, then $f - g = g(f/g - 1) \to \infty \cdot (k - 1) = \infty$; that is, $f$ increases faster than $g$ as $x \to a$. For example,
$\lim_{x\to 0^+} \left(\ln(x) + \dfrac{1}{x}\right) = \lim_{x\to 0^+} \dfrac{1}{x}\left(1 + x\ln(x)\right) = \lim_{x\to 0^+} \dfrac{1}{x}(1 + 0) = \infty$.
If $k = 1$, then it is also possible that $f - g \to c$, where $c$ is a number. In this case, $f$ and $g$ increase asymptotically at the same rate: $f' - g' \to 0$. If $c = 0$, the functions $f$ and $g$ have the same asymptotic behavior. For example,
$\lim_{x\to 0} \left(\dfrac{1}{\sin(x)} - \cot(x)\right) = \lim_{x\to 0} \dfrac{1 - \cos(x)}{\sin(x)} = \lim_{x\to 0} \dfrac{\sin(x)}{\cos(x)} = 0$,
where l’Hospital’s rule has been used in the second equality. Note that Taylor polynomials allow us to find the local behavior of this function near $x = 0$. Use $T_2$ to approximate $\cos(x)$ and $T_3$ for $\sin(x)$:
$\dfrac{1 - \cos(x)}{\sin(x)} \approx \dfrac{x^2/2}{x - x^3/6} = \dfrac{x/2}{1 - x^2/6} \approx \dfrac{x}{2}$,
where $x^2/6$ is small as compared to 1 when $x$ is close enough to 0 and can therefore be neglected in the denominator.
27. Analyzing the Shape of a Graph
To analyze the shape of a graph $y = f(x)$, it is useful to have a clear idea of how the basic functions behave. For example, $\sin(x)$ and $\cos(x)$ are regular everywhere, bounded (e.g., $|\sin(x)| \le 1$), and periodic with a period of $2\pi$. In addition, $\sin(x)$ has zeros at $x = \pi n$, $n = 0, \pm 1, \pm 2, \ldots$, while $\cos(x)$ vanishes at $\pi/2 + \pi n$. The function $\sin(x)$ is odd, while $\cos(x)$ is even. Their ratio $\tan(x) = \sin(x)/\cos(x)$ is not defined at roots of $\cos(x)$. How does $\tan(x)$ behave, say, near $x = \pi/2$? Since both $\sin(x)$ and $\cos(x)$ are smooth near $x = \pi/2$, the behavior of $\tan(x)$ near $\pi/2$ can be understood with the help of Taylor polynomials. Let us approximate $\sin(x)$ by $T_1(x) = 1$ (the linear term vanishes because $\cos(\pi/2) = 0$) and $\cos(x)$ by $T_3(x) = -(x - \pi/2) + (x - \pi/2)^3/6$. To simplify the notation, write $\Delta x = x - \pi/2$ (the deviation of $x$ from $\pi/2$). Then
$\tan(x) \approx \dfrac{1}{-\Delta x + (\Delta x)^3/6} = -\dfrac{1}{\Delta x}\cdot\dfrac{1}{1 - (\Delta x)^2/6} \approx -\dfrac{1}{\Delta x} = -\dfrac{1}{x - \pi/2}$,
where the second factor in the product has been approximated by 1 because $\Delta x$ is small. Since $\tan(x + \pi) = \tan(x)$, this behavior repeats itself near every root of $\cos(x)$.
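The approximation $\tan(x) \approx -1/(x - \pi/2)$ can be tested by multiplying $\tan(x)$ by $x - \pi/2$ and checking that the product is close to $-1$. The sketch below is illustrative; `scaled_tan` is a name introduced here.

```python
import math

def scaled_tan(dx):
    """tan(x) * (x - pi/2) at x = pi/2 + dx, for small nonzero dx."""
    return math.tan(math.pi / 2 + dx) * dx

# Sample both sides of the vertical asymptote x = pi/2.
for dx in (1e-1, 1e-2, 1e-3, -1e-3):
    print(dx, scaled_tan(dx))
```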
27.1. Growth of the Power, Exponential, and Logarithmic Functions. Let us compare the growth of the power function $x^n$, the exponential function $e^x$, and the logarithmic function $\ln(x)$ as $x \to \infty$. The exponential function grows faster than the power function. Let $f(x) = e^x$ and $g(x) = x^n$, where $n$ is a positive integer. Let us analyze the ratio $f/g$ as $x \to \infty$. The conditions of l’Hospital’s rule are satisfied: $e^x \to \infty$ and $x^n \to \infty$ as $x \to \infty$. L’Hospital’s rule can successively be applied until the indeterminate form is resolved:
$\lim_{x\to\infty} \dfrac{e^x}{x^n} = \lim_{x\to\infty} \dfrac{e^x}{nx^{n-1}} = \lim_{x\to\infty} \dfrac{e^x}{n(n-1)x^{n-2}} = \cdots = \lim_{x\to\infty} \dfrac{e^x}{n!} = \infty$.
The conclusion is true for any real $n$. For any real $n$, there exists a positive integer $N$ such that $n < N$, so $x^n < x^N$ for $x > 1$. But $e^x$ grows faster than $x^N$. Similarly, it is straightforward to show that the logarithmic function grows slower than any power function:
$\lim_{x\to\infty} \dfrac{\ln(x)}{x^n} = \lim_{x\to\infty} \dfrac{(\ln(x))'}{(x^n)'} = \lim_{x\to\infty} \dfrac{1/x}{nx^{n-1}} = \lim_{x\to\infty} \dfrac{1}{nx^n} = 0$
for any $n > 0$ ($n$ may be any positive real number here).
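The two growth comparisons can be illustrated numerically. In the sketch below, the exponents $n = 10$ and $n = 0.1$ and the sample points are arbitrary choices made here; the ratio $e^x/x^n$ is evaluated through logarithms to avoid overflow.

```python
import math

def exp_over_power(x, n):
    """e^x / x^n for x > 0, computed as exp(x - n ln x)."""
    return math.exp(x - n * math.log(x))

def log_over_power(x, n):
    """ln(x) / x^n for x > 0 and n > 0."""
    return math.log(x) / x**n

# The power x^10 dominates at first, but e^x eventually wins.
for x in (1e1, 1e2, 5e2):
    print(x, exp_over_power(x, 10))

# Even the slowly growing power x^0.1 eventually outgrows ln(x).
for x in (1e10, 1e30, 1e100):
    print(x, log_over_power(x, 0.1))
```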
27.2. Asymptotes at $x \to \pm\infty$. The asymptotic behavior of rational functions is easily determined by the highest powers of the numerator and denominator, as in Example 48. In general, if $\lim_{x\to\infty} f(x)$ is infinite, then the limit of $f/g$ can be studied for trial $g$’s with different growth, $g = mx$ (for slant asymptotes), $g = x^n$, $g = \ln(x)$, and so on. Suppose $g(x)$ is found such that $f(x)/g(x) \to 1$ as $x \to \infty$. Does this mean that $g$ and $f$ have the same asymptotic behavior? The answer is “no.” If the indeterminate form $f(x) - g(x)$ of type $\infty - \infty$ converges to 0 as $x \to \infty$, then the indeterminate form $f(x)/g(x)$ of type $\frac{\infty}{\infty}$ converges to 1. Indeed, it follows from $1/g(x) \to 0$ and $f(x) - g(x) \to 0$ that $(1/g(x))(f(x) - g(x)) = f(x)/g(x) - 1 \to 0$. The converse is not true. Consider the following simple example: $f(x) = x + \sin(x)$ and $g(x) = x$. Evidently, $f(x)/g(x) = 1 + \sin(x)/x \to 1$ as $x \to \infty$. But the limit $\lim_{x\to\infty}(f(x) - g(x)) = \lim_{x\to\infty} \sin(x)$ does not exist. So, even if $g$ is found to have the property $f(x)/g(x) \to 1$ as $x \to \infty$, the indeterminate form $f - g$ of type $\infty - \infty$ must still be investigated in order to determine whether or not $g$ has the same asymptotic behavior as $f$.
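The counterexample $f(x) = x + \sin(x)$, $g(x) = x$ is easy to observe numerically. In this sketch, the sample points $\pi/2 + 2\pi k$ and $3\pi/2 + 2\pi k$ are chosen here because $\sin$ equals $+1$ and $-1$ there.

```python
import math

def f(x):
    return x + math.sin(x)

def g(x):
    return x

k = 1000
x_up = math.pi / 2 + 2 * math.pi * k        # sin(x_up) = 1
x_down = 3 * math.pi / 2 + 2 * math.pi * k  # sin(x_down) = -1

# The ratio is close to 1 at both points, yet the difference keeps oscillating.
print(f(x_up) / g(x_up), f(x_down) / g(x_down))
print(f(x_up) - g(x_up), f(x_down) - g(x_down))
```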
27.3. Guidelines for Analyzing the Shape of a Graph. The following guidelines are useful for sketching the graph of a function. It should be noted that not all the steps can always be carried out. This depends very much on the complexity of the function in question. So these are really guidelines, not a “must-do” algorithm. Given a function $f$, find:
(I) Domain.
The domain consists of all values of $x$ at which $f(x)$ is defined. Typically, it is a collection of intervals. If $f$ is defined for $x > a$ or $x < a$, or both, but not at $a$, then the local behavior of $f$ near $a$ must be studied (see below).
(II) Roots of $f$ and the value $f(0)$.
Roots of $f(x)$ define the intercepts of the graph $y = f(x)$ with the $x$ axis. They are not always easy to find. The value $f(0)$ (if $x = 0$ is in the domain of $f$) defines the intercept of $y = f(x)$ with the $y$ axis.
(III) Symmetry and periodicity.
If $f(-x) = f(x)$ (an even function) for all $x$ in the domain, then the graph $y = f(x)$ is symmetric about the $y$ axis. If $f(-x) = -f(x)$ (an odd function) for all $x$ in the domain, then the graph $y = f(x)$ is symmetric about the origin (that is, under the rotation through 180° about the origin). If there is a number $p$ such that $f(x + p) = f(x)$, then $f$ is periodic and $p$ is its period. The graph $y = f(x)$ repeats itself on intervals of length $p$, for example $[a, a + p]$, $[a + p, a + 2p]$, and so on for any $a$. Examples are $\sin(x)$, $p = 2\pi$; $\tan(x)$, $p = \pi$; $\cos(4x)$, $p = 2\pi/4 = \pi/2$.
(IV) Asymptotes and asymptotic behavior of $f$.
If $f$ is a ratio $f = h/g$, then vertical asymptotes are $x = c$, where $c$ solves $g(c) = 0$ and $h(c) \ne 0$. If $h(c) = 0$, find the limits $\lim_{x\to c^\pm} f(x)$. If one of the limits or both is infinite, investigate the local behavior of $f$ near $c$ (e.g., with the help of Taylor polynomials if possible). The asymptotic behavior of $f(x)$ near $c$ and for large positive and negative $x$ determines the shape of $y = f(x)$ near the vertical asymptotes and the asymptotic shape of the graph when $x \to \pm\infty$.
(V) Critical numbers of $f$.
Critical numbers are solutions of $f'(x) = 0$ or the values of $x$ where $f'(x)$ does not exist. If, for example, $f'(x)$ tends to $\infty$ (or $-\infty$) as $x$ approaches $c$, then the line tangent to the graph $y = f(x)$ at $x = c$ is vertical. For example, $f(x) = x^{1/3}$ and $f'(x) = 1/(3x^{2/3})$. So $f'(x)$ diverges as $x \to 0$. The graph $y = x^{1/3}$ has a vertical tangent line at $x = 0$.
(VI) Intervals of positive and negative values of $f$.
These are the intervals where the graph $y = f(x)$ lies above or below the $x$ axis. Roots of $f$ generally separate the intervals of positive and negative values of $f$. However, this is not always the case. Let $c$ be a root of $f$. If $f'(c) \ne 0$, then the function $f$ is increasing or decreasing at $c$ and hence must change its sign. If $f'(c) = 0$ or $f'$ does not exist at $c$, that is, a root of $f$ coincides with its critical point, then $f$ is negative near $c$ if $f$ has a local maximum at $c$ and $f$ is positive near $c$ if it has a local minimum at $c$. So the sign of the derivative $f'$ must be investigated near $c$ (the first derivative test). Vertical asymptotes can also separate intervals of positive and negative values of $f$. For example, the function (4.12) has one root $x = 0$ and two vertical asymptotes at $x = -1$ and $x = 1$. So $f$ is negative on $(-\infty, -1)$, positive on $(-1, 0)$, negative on $(0, 1)$, and positive on $(1, \infty)$.
(VII) Intervals of increase ($f' > 0$) and decrease ($f' < 0$).
If $f' > 0$ ($f' < 0$) on an interval, then $f$ increases (decreases) on it (the increasing/decreasing test). These intervals are generally separated by critical numbers and vertical asymptotes. As a consequence of this study, the nature of each critical point is established by the first derivative test.
(VIII) Intervals of upward and downward concavity.
These intervals are separated by inflection points and vertical asymptotes. The sign of $f''(x)$ must be studied. Yet, the second derivative test and Taylor polynomials can be used to establish the nature of a critical point of $f$.
(IX) Values of $f$ at critical points and inflection points.
These values set relative scales of the graph (e.g., they show how much the function increases between two critical points).
Example 51. Sketch the graph of $f(x) = x^{1/3}(x - 6)^{2/3}$.
Solution: Following the preceding guidelines:
(I) The domain is the whole real line.
(II) The roots of $f$ are $x = 0$ and $x = 6$ (the intercepts with the $x$ axis). The intercept with the $y$ axis is $f(0) = 0$.
(III) There is no vertical asymptote. For large values $x \to \pm\infty$, one has $f(x) = x(1 - 6/x)^{2/3} \approx x(1 - 4/x) = x - 4$, where the linear approximation $(1 - u)^{2/3} \approx 1 - 2u/3$ has been used for $u = 6/x$. So the graph has the slant asymptote $y = x - 4$. The next correction is $-4/x$, so the graph approaches the line $y = x - 4$ from below as $x \to \infty$ and from above as $x \to -\infty$.
(IV) The function is not periodic, and it is neither odd nor even.
(V) The derivative reads
$f'(x) = \dfrac{x - 2}{x^{2/3}(x - 6)^{1/3}}$.
It vanishes at $x = 2$ and does not exist at $x = 0$ and $x = 6$. The critical points are 0, 2, and 6. In particular, $f'(x) \to \infty$ as $x \to 0$ and it tends to $\pm\infty$ as $x \to 6^\pm$, respectively. Therefore, the graph has vertical tangent lines at $x = 0$ and $x = 6$. Near $x = 0$, the graph looks like $y = f(x) \approx 6^{2/3}x^{1/3}$, while near $x = 6$, it has a downward cusp $y = f(x) \approx 6^{1/3}(x - 6)^{2/3}$.
(VI) The graph lies below the $x$ axis on $(-\infty, 0)$ ($f < 0$) and above it on $(0, 6)$ and $(6, \infty)$ ($f > 0$). There is no sign change at the root $x = 6$ ($f$ must have a local minimum at 6, which is also verified by the first derivative test below).
(VII) The derivative is a product of three factors $x - 2$, $x^{-2/3}$, and $(x - 6)^{-1/3}$. By investigating the signs of these factors on the intervals separated by the critical points, we can conclude that $f' > 0$ ($f$ is increasing) on $(-\infty, 0)$, $f' > 0$ ($f$ is increasing) on $(0, 2)$, $f' < 0$ ($f$ is decreasing) on $(2, 6)$, and $f' > 0$ ($f$ is increasing) on $(6, \infty)$. Also, $f$ has a local maximum at $x = 2$ and a local minimum at $x = 6$ by the first derivative test.
(VIII) The second derivative reads
$f''(x) = -\dfrac{8}{x^{5/3}(x - 6)^{4/3}}$.
The factor $(x - 6)^{4/3}$ cannot be negative, so the sign of $f''$ is determined only by that of $x^{5/3}$. Thus, $f'' > 0$ on $(-\infty, 0)$ (the graph is upward concave) and $f'' < 0$ on $(0, 6)$ and $(6, \infty)$ (the graph is downward concave). So $x = 0$ is the inflection point. Also, near $x = 2$, the graph looks like the downward parabola $y = T_2(x) = f(2) + f''(2)(x - 2)^2/2 = 2^{5/3} - 2^{-7/3}(x - 2)^2 \approx 3.17 - 0.20(x - 2)^2$.
□
In the age of graphing calculators, the preceding guidelines might look rather obsolete because finding the shape of a graph can be done just by hitting the right calculator buttons. But what a calculator cannot do is to provide details of the local behavior of a function near points of interest (e.g., critical points, asymptotes, etc.). In science and engineering, this is often much more important than the overall shape of a graph. In the previous example, a calculator would show that there is a slant asymptote, a cusp at $x = 6$, and a local maximum at $x = 2$, but it would not be able to determine the local behavior of the function near the cusp, or at the local maximum, or in the asymptotic region. Here a good working knowledge of calculus becomes indispensable, while a graphing calculator is just a useful tool that greatly facilitates the study of a function.
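In the same spirit of using a computational tool as an aid rather than a substitute, the sketch below cross-checks Example 51 numerically: it locates the point inside $(0, 6)$ where a finite-difference estimate of $f'$ changes sign, and it measures the offset of the graph from the line $y = x$ for large $x$. The step size, bracket, and tolerance are ad hoc choices made here, and the helper names are hypothetical.

```python
import math

def cbrt(t):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(t) ** (1 / 3), t)

def f(x):
    """f(x) = x^(1/3) (x - 6)^(2/3) with real-valued roots."""
    return cbrt(x) * cbrt(x - 6) ** 2

def derivative(x, h=1e-6):
    """Central-difference estimate of f'(x); h is an ad hoc step size."""
    return (f(x + h) - f(x - h)) / (2 * h)

def interior_critical_point(lo=0.5, hi=5.5, tol=1e-8):
    """Bisection on the sign of the numerical derivative inside (0, 6)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if derivative(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x_max = interior_critical_point()
print(x_max, f(x_max))

# Offset of the graph from the line y = x far to the right.
print(f(1e6) - 1e6)
```

Such a check confirms or refutes a hand computation quickly, but it says nothing about why the graph behaves as it does near the cusp or in the asymptotic region.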
28. Optimization Problems
Suppose that a quantity $Q$ depends on some variables. The problem of optimizing $Q$ implies finding the values of the variables at which the quantity $Q$ attains its maximal or minimal value. The simplest optimization problem arises when $Q$ depends on a single variable $x$ such that $Q$ is a function $f(x)$. Then the optimization problem is reduced to the problem of finding extreme values of $f(x)$. The latter problem has been analyzed in Section 22. To determine extreme values of $f$, one has to:
(I) Find all critical points of $f$.
(II) Investigate the nature of the critical points (local minima and local maxima). The first or second derivative tests can be used for this purpose.
(III) Calculate the values of $f$ at the endpoints of the interval $[a, b]$ (if extreme values are sought only in $[a, b]$) and compare them with values of $f$ at its local maxima and minima to determine absolute extreme values of $f$.
The following test can also be used to find absolute extreme values of a function.
Theorem 35 (First Derivative Test for Absolute Extreme Values). Suppose $c$ is a critical point of a continuous function $f$ defined on an interval.
(I) If $f'(x) > 0$ for all $x < c$ and $f'(x) < 0$ for all $x > c$, then $f(c)$ is the absolute maximum value of $f$.
(II) If $f'(x) < 0$ for all $x < c$ and $f'(x) > 0$ for all $x > c$, then $f(c)$ is the absolute minimum value of $f$.
The conclusion of the theorem is easy to understand. Consider case (I). Since $f'(x) > 0$ for all $x < c$, the function increases for all $x < c$. Since $f'(x) < 0$ for all $x > c$, the function decreases for all $x > c$. By continuity of $f$, the number $f(c)$ must be the largest value of $f$. Case (II) is proved similarly.
Recall Example 42. This is a typical optimization problem. Its
solution is rather straightforward, provided Equation (4.4) is given.
Without it, the problem of ﬁnding an optimal angle for a projectile
becomes far more diﬃcult. Its major part now involves a derivation
of Equation (4.4)! This is quite typical for optimization problems. As
a rule, they arise in various disciplines, and their formulation as the
mathematical problem of extreme values requires a speciﬁc knowledge
outside mathematics, for example, the laws of physics as in Example
42, chemistry, biology, economics, and so on. A typical optimization
problem may be split into three basic steps:
(I) Identify a variable with respect to which a quantity Q is to be
optimized.
(II) Use the laws of a speciﬁc discipline to express Q as a function
f of that variable, Q = f(x).
(III) Solve the mathematical problem of extreme values of f.
Example 52. An aluminum can has the shape of a cylinder of radius $r$ and height $h$. Design an aluminum can of volume $V = 300$ cm$^3$ to minimize the cost (or the amount) of material needed to make the can.
Solution: Following the preceding guidelines:
(I) Apparently, the least amount of material is used when the surface area of the can is minimal. So one has to minimize the surface area $S$, which depends on $r$ and $h$. But the variables $r$ and $h$ are not independent because the volume is fixed.
(II) The surface area is the sum of the areas of the side, top, and bottom of the can: $S = 2\pi rh + \pi r^2 + \pi r^2 = 2\pi rh + 2\pi r^2$. The volume is $V = \pi r^2 h$. Since the volume is fixed, the variables $r$ and $h$ are related as $h = V/(\pi r^2)$. Hence, $S$ can be written as a function of the radius $r$ only:
$S(r) = 2\pi r\,\dfrac{V}{\pi r^2} + 2\pi r^2 = \dfrac{2V}{r} + 2\pi r^2$.
One has to find the value of $r > 0$ at which $S(r)$ attains its absolute minimum. The corresponding value of $h$ is then found from the relation $h = V/(\pi r^2)$.
(III) The function $S(r)$ is differentiable for all $r > 0$. Therefore, all its critical points are roots of the derivative:
$S'(r) = -\dfrac{2V}{r^2} + 4\pi r = \dfrac{4\pi}{r^2}\left(r^3 - \dfrac{V}{2\pi}\right) = 0$.
So the critical point is
$r_c = \left(\dfrac{V}{2\pi}\right)^{1/3}$.
Since $S'(r) < 0$ for all $0 < r < r_c$ and $S'(r) > 0$ for all $r > r_c$, the function $S(r)$ attains its absolute minimum at $r_c$ by the first derivative test for absolute extreme values. The dimensions of the can with minimal costs of material for a given volume $V$ are
$r = \left(\dfrac{V}{2\pi}\right)^{1/3} \approx 3.6$ cm,   $h = \dfrac{V}{\pi r_c^2} = \left(\dfrac{4V}{\pi}\right)^{1/3} = 2r_c \approx 7.3$ cm.
The analysis has shown that the height and diameter of a can of a given volume must be equal in order to minimize the cost of material (or the surface area of the can). Check out a local supermarket to see if manufacturers use this fact! □
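The result of Example 52 is easy to reproduce numerically. The sketch below is illustrative; the function names are introduced here, and the comparison with radii 10% smaller and larger is just a spot check of minimality.

```python
import math

def surface_area(r, V):
    """S(r) = 2V/r + 2 pi r^2 for a closed cylinder of volume V."""
    return 2 * V / r + 2 * math.pi * r**2

def optimal_can(V):
    """Critical radius r_c = (V / (2 pi))^(1/3) and height h = V / (pi r_c^2)."""
    r_c = (V / (2 * math.pi)) ** (1 / 3)
    h = V / (math.pi * r_c**2)
    return r_c, h

V = 300.0  # cm^3
r_c, h = optimal_can(V)
print(round(r_c, 2), round(h, 2), round(surface_area(r_c, V), 1))

# Spot check: nearby radii give a larger surface area.
print(surface_area(0.9 * r_c, V), surface_area(1.1 * r_c, V))
```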
Remark. In the previous example, $S$ has been expressed as a function of $r$. The same conclusion could be reached if $S$ is expressed as a function of the height $h$ only, that is, when the relation $r = \sqrt{V/(\pi h)}$ is substituted into the expression for the surface area to obtain $S(h)$. The critical point of $S(h)$ can be shown to be $h_c = 2r_c$. Verify this!
A Curious Fact. The preceding problem is essential to reduce waste from plastic, glass, and aluminum containers. It can be stated more generally. What is the shape of a container that has the smallest surface area at a given volume? It can be proved by the calculus of variations that such a container must be a sphere. Even in the example of an aluminum can, the optimal dimensions appear to be as close to those of a sphere as the cylindrical geometry would allow: The height and diameter are the same. Should only spherical containers be used to “go green”? To answer this question, a far more complicated optimization problem must be studied. For example, spheres are not optimal for storage and hence for transportation; rectangular containers are far better. Storage maintenance and transportation require energy (hence carbon emissions). The production waste for containers of different shapes is different. Finally, what about consumers’ reaction to spherical Coke cans in a vending machine or spherical aluminum cans in the supermarket?
28.1. Applications to Economics. In Section 19, we introduced the cost function $C(x)$, which is the cost of producing $x$ units of a certain product. The derivative $C'(x)$ is the marginal cost. It determines the cost of increasing production from $x$ units to $x + 1$ units. Let $p(x)$ be the price per unit that a company can charge if it sells $x$ units. The function $p(x)$ is also called the price function. Naturally, it is generally expected to be a decreasing function because the price per unit usually goes down when a larger number of units is sold. The total revenue $R(x) = xp(x)$ is called the revenue function. The derivative $R'(x)$ is called the marginal revenue function. It determines the change in the revenue when the number of units sold increases from $x$ to $x + 1$. Finally, the profit function
$P(x) = R(x) - C(x) = xp(x) - C(x)$
determines the total profit if $x$ units are sold. Its derivative $P'(x)$ determines the change in the total profit when the number of units sold increases from $x$ to $x + 1$. The standard optimization problem here is to minimize costs and maximize revenues and profit.
Example 53. A small store sells jeans at a price of $80 per pair. Ev
ery week 60 units are sold. The cost to the store for 60 units is $2500,
including the cost of transportation. A market survey indicates that, for
each $10 rebate oﬀered to buyers, the number of units sold will increase
by 20 a week. Also, the purchase and transportation costs will go down
by $10 per each weekly order increase of 5 units. How large a rebate
should the store oﬀer to maximize its proﬁt?
Solution:
(I) What is known about the price function p(x)? First, its value
at a particular number of sold units x = x
0
= 60 is p
0
=
p(60) = 80. Also, if x increases by an amount of Δx = 20,
the price function decreases by Δp = 10 (the rebate). Thus,
the ratio m = −Δp/Δx = −1/2 is the rate of change of p(x)
29. NEWTON’S METHOD 101
(the minus sign indicates the decrease in p(x)). So the price
function is
p(x) = p
0
+ m(x −x
0
) = 80 −
1
2
(x −60) = 110 −
1
2
x.
(II) What is known about the cost function C(x)? First, its value
at a particular number of supplied units x = x
0
= 60 is C
0
=
C(60) = 2500. Also, the cost function decreases by ΔC = 20
if x increases by Δx = 5. So the ratio M = −ΔC/Δx = −4 is
the rate of change of C or the marginal cost. Therefore,
C(x) = C
0
+ M(x −x
0
) = 2500 −4(x −60) = 2740 −4x.
(III) One has to maximize the proﬁt function:
P(x) = xp(x) −C(x) = 114x −
1
2
x
2
−2740.
Since P
(x) = 114 − x, the function has one critical point
x = 114 at which P(x) attains its absolute maximal value by
the ﬁrst derivative test for absolute extreme values.
(IV) If $x = 114$ units can be sold, the price per unit is $p(114) = 110 - 57 = 53$; that is, the rebate should be $p(60) - p(114) = 80 - 53 = 27$. Thus, the store should offer a rebate of $27 to maximize its profit. Note also the increase in the weekly profit: $P(60) = \$2{,}300$ whereas $P(114) = \$3{,}758$.
□
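The arithmetic of Example 53 can be checked numerically. The following is a minimal sketch, assuming the linear price and cost models derived in steps (I) and (II); the function names and the search range of 0 to 220 units are illustrative choices, not part of the example.

```python
# Linear models from steps (I) and (II) of Example 53.
def price(x):
    return 110 - 0.5 * x          # p(x) = 110 - x/2

def cost(x):
    return 2740 - 4 * x           # C(x) = 2740 - 4x

def profit(x):
    return x * price(x) - cost(x)  # P(x) = x p(x) - C(x)

# Brute-force search over whole numbers of units (assumed range).
best_x = max(range(0, 221), key=profit)
rebate = price(60) - price(best_x)

print("units:", best_x)
print("rebate:", rebate)
print("weekly profit before and after:", profit(60), profit(best_x))
```

The brute-force search is used only to confirm the critical point found from $P'(x) = 0$; it is not a substitute for the derivative test.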
Remark. In fact, the linear (tangent line) approximation has been used to get the unknown price and cost functions in the previous example. This is a benefit of market surveys: They estimate the derivatives (or trends) of the price functions. Naturally, an increase in sales leads to a decrease in the demand for that particular item. So, after a successful rebate campaign, the store would need a new market survey to estimate $p'(114)$ and get the linear approximation at $x = 114$. The price may go up then. Similarly, the cost function is generally highly nonlinear. Its linearization near a particular $x = x_0$ cannot be valid for all $x > x_0$. Indeed, in the previous example, it vanishes at $x = 685$ and becomes negative after that, which cannot possibly be true.
29. Newton's Method
Finding roots of a function $f(x)$ is an important problem in various applications. Unfortunately, an analytic solution of the equation $f(x) = 0$ is impossible in many practical cases. For example, consider $f(x) = x - e^{-x}$. The equation $f(x) = 0$ is equivalent to $x = e^{-x}$. The graphs $y = x$ and $y = e^{-x}$ intersect at some $x$ between 0 and 1. So $f(x)$ has a root. But how can it be calculated? Here we present one of the simplest methods, known as Newton's method. It provides a recurrence relation that allows us to compute a root of a differentiable function with any desired accuracy.
29.1. Newton's Recurrence Relation for Finding a Root. Suppose $f(x)$ has a root near $x_1$. Consider the tangent line approximation of $f$ near $x_1$: $L(x) = f(x_1) + f'(x_1)(x - x_1)$. It is easy to find the root of $L(x)$, which is denoted by $x_2$:
$$L(x) = 0 \;\longrightarrow\; x = x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.$$
Note that the root of $L(x)$ exists if $f'(x_1) \neq 0$ (otherwise, the tangent line is horizontal and cannot have any root). Since $L(x)$ is only an approximation to $f(x)$, the number $x_2$ is expected to be closer to the root of $f$ than $x_1$, but does not coincide with it. In other words, the value $f(x_2)$ is closer to 0 than $f(x_1)$: $0 < |f(x_2)| < |f(x_1)|$ (the absolute value is necessary if the function takes negative values). Therefore, the tangent line constructed at $x = x_2$, $L(x) = f(x_2) + f'(x_2)(x - x_2)$, can be expected to approximate $f(x)$ even better near its root because $x_2$ is closer to the root than $x_1$. The root of the new tangent line is given by the same expression as before where $x_1$ should be replaced by $x_2$: $x_3 = x_2 - f(x_2)/f'(x_2)$. The procedure may be recursively repeated to generate a sequence of values $x_n$:
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \quad n = 1, 2, \ldots, \tag{4.18}$$
provided $f'(x_n) \neq 0$.
Theorem 36. If $f$ has a single root $r$ in an open interval and $f'(x) \neq 0$ on the interval, then there exists $x_1$ sufficiently close to $r$ such that the sequence (4.18) converges to the root:
$$\lim_{n\to\infty} x_n = r.$$
In practical terms, the sequence elements are calculated with a particular number of significant digits (decimal places). Newton's recurrence is applied until $x_{n+1}$ and $x_n$ agree to all the relevant decimal places. Then $r = x_{n+1}$ is correct to the relevant decimal places.
Example 54. Find the root of $f(x) = x - e^{-x}$ that is correct to six decimal places.
Solution:
(I) Determine the position of the root first. The graphs $y = x$ and $y = e^{-x}$ intersect between 0 and 1. So the root lies in the interval $(0, 1)$.
(II) Verify the condition $f'(x) \neq 0$: $f'(x) = 1 + e^{-x} > 0$ for all $x$.
(III) Pick an initial value of $x_1 = 0$. Then Newton's sequence for six decimal places is:
$$x_1 = 0, \quad x_2 = 0.5, \quad x_3 = 0.566311, \quad x_4 = 0.567143, \quad x_5 = 0.567143.$$
So the root $r = 0.567143$ is correct to six decimal places (in fact, $f(0.567143) = -4.5 \times 10^{-7}$).
□
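The recurrence (4.18) is straightforward to program. The sketch below is one possible implementation, not a definitive one: the function name `newton`, the stopping tolerance `tol`, and the iteration cap `max_iter` are assumptions made for illustration. It is applied to the function of Example 54.

```python
import math

def newton(f, df, x1, tol=1e-7, max_iter=50):
    """Return the list x_1, x_2, ... generated by the recurrence (4.18).

    Iteration stops when two successive elements differ by less than tol
    or after max_iter steps (both values are illustrative assumptions).
    """
    xs = [x1]
    for _ in range(max_iter):
        x = xs[-1]
        slope = df(x)
        if slope == 0:
            # Horizontal tangent line: the recurrence is undefined.
            raise ZeroDivisionError("f'(x_n) = 0; choose another initial point")
        x_next = x - f(x) / slope
        xs.append(x_next)
        if abs(x_next - x) < tol:
            break
    return xs

# Example 54: f(x) = x - exp(-x), f'(x) = 1 + exp(-x), x_1 = 0.
seq = newton(lambda x: x - math.exp(-x), lambda x: 1 + math.exp(-x), 0.0)
for n, x in enumerate(seq, start=1):
    print(f"x_{n} = {x:.6f}")
```

The first elements printed agree with the sequence listed in Example 54 when rounded to six decimal places.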
29.2. Pitfalls in Newton's Method. Unfortunately, there is no unique recipe for choosing an initial point in Newton's sequence. The choice depends very much on the function in question. In practice, it is determined by trying different values. A few possible bad behaviors of Newton's sequence are useful to keep in mind.
(I) A bad choice of the initial point $x_1$ can produce a value $x_2$ that is a worse approximation to the root than $x_1$. Consider, for example, the function $f(x) = x^3 - 3x^2 + 2$ in the interval $[0, 2]$, with $f(x) = 2$ when $x < 0$ and $f(x) = -2$ when $x > 2$. The function is continuously differentiable because $f'(x) = 3x^2 - 6x$ approaches 0 as $x \to 0^+$ and $x \to 2^-$. The function has the root $x = 1$ and $f'(x) < 0$ in the open interval $(0, 2)$. If $0 < x_1 < 2$ is close enough to either $x = 0$ or $x = 2$, then $x_2$ would be outside the interval $(0, 2)$. Note that the actual behavior of $f(x)$ outside the interval $[0, 2]$ is not relevant for the conclusion. The essential point here is that such a situation is likely to occur when $f'(x_1)$ is close to 0.
(II) A poor choice of the initial point may lead to a cycle in Newton's sequence. Take $f(x) = x^3 - 2x + 2$ and $x_1 = 0$. Since $f'(x) = 3x^2 - 2$, the next elements are $x_2 = 0 - 2/(-2) = 1$, $x_3 = 1 - 1/1 = 0 = x_1$. That is, Newton's sequence is a cyclic sequence, which never converges. The initial point must be taken closer to the root.
(III) If $f'(x) \to \pm\infty$ as $x$ approaches a root $r$ (the graph $y = f(x)$ has a vertical tangent line at the root), Newton's sequence may oscillate around $r$, never converging to it, or it may diverge for any initial point. To understand this phenomenon, suppose $f(x)$ behaves near its root $r$ as $f(x) \approx a(x - r)^{\nu}$, where $\nu > 0$ and $a$ is a constant. Since $f(x)/f'(x) = \nu^{-1}(x - r)$, Newton's sequence (4.18), $x_{n+1} = x_n - \nu^{-1}(x_n - r)$, can also be written as $x_{n+1} - r = q(x_n - r)$, where $q = 1 - \nu^{-1}$. Apparently, $x_n \to r$ is equivalent to $y_n = x_n - r \to 0$. But the sequence $y_{n+1} = q y_n = q^2 y_{n-1} = \cdots = q^n y_1$ does not converge to 0 if $|q| = |1 - \nu^{-1}| \geq 1$, that is, if $\nu \leq 1/2$, unless $y_1 = 0$ (i.e., if the root is accurately guessed!). For example, for $f(x) = x^{1/3}$ ($\nu = 1/3$), Newton's sequence diverges: $x_{n+1} = (1 - 3)x_n = -2x_n$ for any choice of the initial point $x_1 \neq 0$. For $f(x) = x^{1/2}$ ($\nu = 1/2$), Newton's sequence oscillates: $x_{n+1} = (1 - 2)x_n = -x_n$.
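Pitfalls (II) and (III) are easy to observe numerically. The following sketch, under the assumption that a fixed number of Newton steps is taken without any convergence test, reproduces the cycle for $f(x) = x^3 - 2x + 2$ and the divergence for $f(x) = x^{1/3}$. The helper names `newton_steps`, `cbrt`, and `dcbrt` are hypothetical, and the real cube root is extended to negative arguments by hand.

```python
import math

def newton_steps(f, df, x1, steps):
    """Apply the recurrence (4.18) a fixed number of times and return all x_n."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

# Pitfall (II): the sequence 0, 1, 0, 1, ... never converges.
cycle = newton_steps(lambda x: x**3 - 2*x + 2, lambda x: 3*x**2 - 2, 0.0, 6)
print(cycle)

# Pitfall (III): real cube root and its derivative (nu = 1/3).
def cbrt(x):
    return math.copysign(abs(x) ** (1 / 3), x)

def dcbrt(x):
    return abs(x) ** (-2 / 3) / 3

# Each step multiplies x_n by about -2, so the sequence moves away from the root 0.
diverging = newton_steps(cbrt, dcbrt, 0.1, 6)
print(diverging)
```

In both cases the initial point is not at fault in the usual sense: in (II) a different starting value closer to the root helps, whereas in (III) no starting value other than the root itself does.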
29.3. Understanding Money Loans. Suppose that one takes a loan of $P$ dollars (the principal) for $n$ months with an annual interest rate $I$ (written as a decimal fraction). What is the monthly payment? It is calculated as follows. The interest rate per month is $x = I/12$. For example, an annual interest rate of 6% means that $I = 0.06$ and $x = 0.06/12 = 0.005$. Each payment includes the payment toward the principal and the interest. Let $F_k$ be the amount yet to be paid after $k$ monthly payments. It is called the future value of the loan. The sequence $F_k$ satisfies the conditions $F_0 = P$ and $F_n = 0$ (the loan and interest are paid off after $n$ payments). Let $A$ be the monthly payment. Then
$$F_1 = P + Px - A, \quad F_2 = F_1 + F_1 x - A, \quad \ldots, \quad F_k = F_{k-1} + F_{k-1}x - A.$$
Here $F_1$ is the future value of the loan after one payment, which is the loan $P$ plus the monthly interest $Px$ minus the payment $A$. After one payment, the loan value is $F_1$. So, after one more payment, its value is the value $F_1$ plus interest $F_1 x$ minus the payment $A$, and so on. After $n$ payments,
$$\begin{aligned}
F_n &= F_{n-1}(1 + x) - A \\
&= F_{n-2}(1 + x)^2 - A[(1 + x) + 1] = \cdots \\
&= F_0(1 + x)^n - A[(1 + x)^{n-1} + (1 + x)^{n-2} + \cdots + (1 + x) + 1] \\
&= P(1 + x)^n - A\,\frac{(1 + x)^n - 1}{x},
\end{aligned}$$
where, in the last equality, the geometric sum formula $1 + q + q^2 + \cdots + q^{n-1} = (q^n - 1)/(q - 1)$ has been used with $q = 1 + x$ (it can be proved by dividing the polynomial $q^n - 1$ by the polynomial $q - 1$). Since $F_n = 0$, the monthly payment is
$$A = \frac{Px}{1 - (1 + x)^{-n}}. \tag{4.19}$$
For example, a loan of $200,000 for 10 years at a fixed annual interest rate of 6% implies 120 monthly payments of $2,220.41. Indeed, in Equation (4.19), substitute $x = 0.06/12 = 0.005$, $n = 120$, and $P = 200{,}000$; then $A \approx 2220.41004$. The total amount paid after 10 years is $120 \times A = \$266{,}449.20$. The interest paid is $nA - P = \$66{,}449.20$.
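Equation (4.19) and the recurrence for $F_k$ can be cross-checked against each other. The sketch below is illustrative only; the function names `monthly_payment` and `remaining_balance` are assumptions, and the annual rate is passed as a decimal fraction ($I = 0.06$ for 6%).

```python
def monthly_payment(principal, annual_rate, months):
    """Monthly payment A from Equation (4.19), with x = annual_rate / 12."""
    x = annual_rate / 12
    return principal * x / (1 - (1 + x) ** (-months))

def remaining_balance(principal, annual_rate, payment, k):
    """Future value F_k obtained by iterating F_k = F_{k-1}(1 + x) - A."""
    x = annual_rate / 12
    balance = principal
    for _ in range(k):
        balance = balance * (1 + x) - payment
    return balance

A = monthly_payment(200_000, 0.06, 120)
print(f"monthly payment: {A:.2f}")
print(f"total paid: {120 * round(A, 2):.2f}")
print(f"balance after 60 payments: {remaining_balance(200_000, 0.06, A, 60):.2f}")
```

Iterating the recurrence 120 times with the payment from (4.19) should bring the balance to zero up to rounding error, which confirms the condition $F_n = 0$ used in the derivation.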
When selling a car, a dealer might offer a monthly payment for a few years if a customer cannot afford to pay the price in full. In this case, the loan amount $P$ is the price of the car; the monthly payment $A$ and the number of payments $n$ are known. To assess the deal (or to pick one among a few offered at different dealerships), one has to know the interest rate before signing up. It might be the case that the loan for a higher-quality car, meaning a higher price and higher monthly payments, has a lower interest rate than the loan for a cheaper car (smaller monthly payments). If $A$, $P$, and $n$ are given, then $x$ can be found by solving Equation (4.19), which can be written in a more convenient form as
$$f(x) = Px(1 + x)^n - A(1 + x)^n + A = 0. \tag{4.20}$$
In other words, this is the root-finding problem! It can be solved by Newton's method. Note that $x = 0$ always satisfies Equation (4.20); the interest rate is the positive root, so the initial point should be taken away from 0. The number $x$ should be found up to five decimal places, which is sufficient for our purposes.
Example 55. A dealer oﬀers a car at a price of $10,000. It can
also be sold for payments of $217.42 per month for 5 years. There is
another car being oﬀered at a price of $15,000, which can also be sold
for payments of $311.38 per month for 5 years. Which loan has a lower
interest rate?
Solution:
(I) For the first car, one has to find the root of Equation (4.20) if $A = 217.42$, $P = 10{,}000$, and $n = 5 \times 12 = 60$. It is convenient to initiate Newton's sequence at $x_1 = 0.01$, which corresponds to an annual interest rate of 12% (i.e., $I = 0.12$ and $x = 0.12/12 = 0.01$). Up to five decimal places, Newton's method yields $x = 0.00917$, which corresponds to $I = 12x = 0.11004$, or an annual interest rate of 11%.
(II) For the second car, one has to find the root of Equation (4.20) if $A = 311.38$, $P = 15{,}000$, and $n = 5 \times 12 = 60$. Newton's method, initiated again at $x_1 = 0.01$, yields the root $x = 0.00750$ (up to five decimal places). This corresponds to an annual interest rate of 9%. So the second loan has a lower interest rate.
□
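The two root-finding problems of Example 55 can be carried out as follows. This is a sketch under stated assumptions: the function name `implied_monthly_rate`, the tolerance, and the iteration cap are illustrative, and the derivative of (4.20) is computed by the product rule as $f'(x) = P(1+x)^n + n(Px - A)(1+x)^{n-1}$.

```python
def implied_monthly_rate(P, A, n, x1=0.01, tol=1e-10, max_iter=100):
    """Solve Equation (4.20) for the monthly rate x by Newton's method.

    f(x)  = (P x - A)(1 + x)^n + A
    f'(x) = P (1 + x)^n + n (P x - A)(1 + x)^(n - 1)
    The starting point x1 must be away from the trivial root x = 0.
    """
    x = x1
    for _ in range(max_iter):
        g = (1 + x) ** n
        f = (P * x - A) * g + A
        df = P * g + n * (P * x - A) * g / (1 + x)
        x_next = x - f / df
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("Newton's sequence did not converge")

x_first = implied_monthly_rate(10_000, 217.42, 60)
x_second = implied_monthly_rate(15_000, 311.38, 60)
print(f"first car:  x = {x_first:.5f}, annual rate = {12 * x_first:.2%}")
print(f"second car: x = {x_second:.5f}, annual rate = {12 * x_second:.2%}")
```

Starting at $x_1 = 0.01$, as in the example, keeps the sequence on the side of the positive root; a starting point too close to 0 could instead converge to the trivial root $x = 0$.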
It is interesting to note that the car prices differ by 50%. Similarly, the monthly payments appear in a similar proportion, $311.38/217.42 \approx 1.43$. The offers might look like nearly the same deal. In fact, they are not!
30. Antiderivatives
In many practical problems, a function is to be recovered from its derivative. For example, if the velocity is given as a function of time, $v = v(t)$, one might want to find the position as a function of time, $s = s(t)$, where $s'(t) = v(t)$. What is $s(t)$?
Definition 35. A function $F$ is called an antiderivative of $f$ on an interval $I$ if $F'(x) = f(x)$ for all $x$ in $I$.
For many basic functions, it is not difficult to find the corresponding antiderivative. For example, from the rule $(x^{n+1})' = (n+1)x^n$, it follows that if $f(x) = x^n$, $n \neq -1$, an antiderivative is $F(x) = x^{n+1}/(n+1)$. It has also been proved that $(\ln|x|)' = 1/x$. So the function $F(x) = \ln|x|$ is an antiderivative of $f(x) = 1/x$ for all $x \neq 0$.
30.1. Uniqueness of the Antiderivative. Suppose $F'(x) = f(x)$ for all $x$ in an interval $(a, b)$. Is such an $F(x)$ unique? This question is answered by Corollary 5 given at the end of Section 23. Indeed, let $F(x)$ and $G(x)$ be antiderivatives of $f(x)$, that is, $F'(x) = G'(x) = f(x)$ on $(a, b)$. By Corollary 5, $F$ and $G$ may only differ by a constant: $G(x) = F(x) + C$. Recall that Corollary 5 does not hold for the union of disjoint intervals. Thus, any two antiderivatives of the same function may differ at most by a constant on an interval.
Theorem 37. If F is an antiderivative of f on an interval I, then
the most general antiderivative of f on I is
F(x) + C,
where C is an arbitrary constant.
For example, the general antiderivative of the power function $f(x) = x^n$, $n \neq -1$, is $F(x) = x^{n+1}/(n + 1) + C$, and for $f(x) = 1/x$, it is $F(x) = \ln|x| + C$ (on an interval that does not contain 0). This nonuniqueness of the antiderivative is not a drawback of the concept but rather a great advantage. This is explained by the following example. The velocity of a piece of chalk thrown vertically upward with a velocity of $v_0$ is $v(t) = v_0 - gt$, where $g = 9.8$ m/s$^2$ is the acceleration of a free fall. At $t = 0$, the chalk has a velocity of $v(0) = v_0$. Then it begins to slow down ($v(t)$ decreases because of gravity). Eventually, at $t = v_0/g$, the chalk stops and begins to fall back. If $h(t)$ is the height of the chalk relative to the floor, then $h'(t) = v(t)$; that is, the height is an antiderivative of $v(t)$. It is easy to find a particular antiderivative of $v(t)$ using the antiderivative of the power function: $h(t) = v_0 t - gt^2/2$ (indeed, $h'(t) = v_0 - gt$). What is the physical significance of the general antiderivative $h(t) = C + v_0 t - gt^2/2$? It appears as if the position of the chalk relative to the floor is not uniquely determined. In particular, $h(0) = C$ is the height at the very moment when the chalk was thrown upward. But the chalk could be thrown upward at 1 m above the floor or 2 m above it with the very same initial velocity. So, in both cases, $v(t)$ is the same, while the $h(t)$ are not. In the first case, $h(0) = 1$, whereas in the second case, $h(0) = 2$. Thus, the constant $C$ can be fixed by specifying the value of the antiderivative at a particular point.
This feature of the general antiderivative can also be visualized by plotting the graphs $y = F(x) + C$ for different values of $C$. All such graphs are obtained from the graph $y = F(x)$ by rigid translations along the $y$ axis. If one demands that the graph $y = F(x) + C$ should pass through a particular point $(x_0, y_0)$, then $C$ is fixed: $y_0 = F(x_0) + C$ or $C = y_0 - F(x_0)$. For example, find $f(x)$ if $f'(x) = 3x^2$ and $f(2) = 1$. The general antiderivative of $3x^2$ is $f(x) = x^3 + C$. From $f(2) = 1$, it follows that $f(2) = 8 + C = 1$ or $C = -7$. Therefore, $f(x) = x^3 - 7$.
30.2. Linearity of the Antiderivative. Let $F$ and $G$ be antiderivatives of $f$ and $g$, respectively. Then an antiderivative of $f + g$ is $F + G$. An antiderivative of $kf$, where $k$ is an arbitrary constant, is $kF$. These properties are easily verified. Indeed, $(F + G)' = F' + G' = f + g$ and $(kF)' = kF' = kf$, where the linearity of the derivative has been used. In other words, antidifferentiation is a linear operation just like differentiation itself.
30.3. Antiderivatives of Basic Functions. An antiderivative of the power function has been found by studying the derivative of the power and logarithmic functions. The idea is useful for other basic functions. Their antiderivatives can be found by reading the table of derivatives of basic functions backward, that is, from right to left. For example,
$$(\sin x)' = \cos x, \quad (-\cos x)' = \sin x, \quad (e^x)' = e^x,$$
$$(\tan x)' = \sec^2 x, \quad (\sin^{-1} x)' = \frac{1}{\sqrt{1 - x^2}}, \quad (\tan^{-1} x)' = \frac{1}{1 + x^2}.$$
In particular, this table says that the general antiderivative of $f(x) = 1/(1 + x^2)$ is $F(x) = \tan^{-1}(x) + C$. The table of derivatives of basic functions combined with the linearity of antidifferentiation is a good source of antiderivatives of more complicated functions.
Example 56. Find the general antiderivative of $f(x) = e^{-2x} + \cos(4x) + x^2/(1 + x^2)$.
Solution:
(I) By the linearity of the antiderivative, it is sufficient to find antiderivatives of $e^{-2x}$, $\cos(4x)$, and $x^2/(1 + x^2)$. The general antiderivative is obtained by adding a general constant to the sum of the particular antiderivatives of the previous three functions.
(II) From $(e^{-2x})' = -2e^{-2x}$, it follows that $(-e^{-2x}/2)' = e^{-2x}$. Hence, an antiderivative of $e^{-2x}$ is $-e^{-2x}/2$.
(III) Similarly, from $(\sin(4x))' = 4\cos(4x)$, it follows that an antiderivative of $\cos(4x)$ is $\sin(4x)/4$.
(IV) The table of derivatives does not appear helpful in the case of $x^2/(1 + x^2)$. However, a simple algebraic manipulation leads to the goal:
$$\frac{x^2}{1 + x^2} = \frac{1 + x^2 - 1}{1 + x^2} = 1 - \frac{1}{1 + x^2}.$$
So its antiderivative is $x - \tan^{-1}(x)$. Thus, the general antiderivative reads:
$$F(x) = -\frac{1}{2}e^{-2x} + \frac{1}{4}\sin(4x) + x - \tan^{-1}(x) + C.$$
□
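A quick numerical sanity check of Example 56 is to differentiate the answer and compare it with $f$. The sketch below uses a central difference quotient as an approximation of $F'(x)$; the step size `h`, the sample points, and the helper name `central_difference` are illustrative assumptions.

```python
import math

def f(x):
    return math.exp(-2 * x) + math.cos(4 * x) + x**2 / (1 + x**2)

def F(x, C=0.0):
    # General antiderivative found in Example 56.
    return -math.exp(-2 * x) / 2 + math.sin(4 * x) / 4 + x - math.atan(x) + C

def central_difference(g, x, h=1e-5):
    """Approximate g'(x) by (g(x + h) - g(x - h)) / (2h)."""
    return (g(x + h) - g(x - h)) / (2 * h)

sample_points = [-2.0, -0.5, 0.0, 1.0, 3.0]
max_err = max(abs(central_difference(F, x) - f(x)) for x in sample_points)
print(f"largest deviation of F' from f at the sample points: {max_err:.2e}")
```

A small deviation at a handful of points does not prove that $F' = f$, but a large one would immediately expose an algebraic slip. The constant $C$ drops out of the difference quotient, as expected.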
30.4. Antiderivatives of Higher Order. What is $F(x)$ if $F''(x) = f(x)$ for a given $f(x)$? Or, more generally, what is $F(x)$ if $F^{(n)}(x) = f(x)$? A function $F$ that satisfies the latter condition is called an antiderivative of $f$ of the $n$th order. To find it, one has to antidifferentiate $f$ $n$ times. For example, let $F''(x) = 6x$. Taking the first antiderivative of $f(x) = 6x$, one gets $F'(x) = 3x^2$. Taking the antiderivative one more time yields $F(x) = x^3$. What about the uniqueness of higher-order antiderivatives? To find the general antiderivative of a higher order, each time antidifferentiation is carried out, the corresponding general antiderivative must be used. In the preceding example, the general antiderivative of $f(x) = 6x$ is $3x^2 + C_1$, where $C_1$ is an arbitrary constant. Hence, $F'(x) = 3x^2 + C_1$. Its general antiderivative reads $F(x) = x^3 + C_1 x + C_2$, where $C_2$ is another arbitrary constant. Thus, the general second antiderivative can be obtained from a particular one by adding a general function whose second derivative is 0, which is a general linear function: $(C_1 x + C_2)'' = 0$. Similarly, if $F(x)$ is a particular function that satisfies the condition $F^{(n)}(x) = f(x)$, then the general antiderivative of the $n$th order is $F(x) + C_1 x^{n-1} + C_2 x^{n-2} + \cdots + C_{n-1}x + C_n$, where $C_1, \ldots, C_n$ are arbitrary constants. Indeed, the $n$th derivative of a polynomial of degree $n - 1$ is 0. Note that this analysis applies only when $f$ is defined on an interval. Why?
The following example illustrates the significance of arbitrary constants in general higher-order antiderivatives.
Example 57. Any free-falling object near the surface of the Earth has the free-fall acceleration of 9.8 m/s$^2$. A piece of chalk is thrown vertically upward at a speed of 7 m/s and at 1.5 m above the floor. When does the chalk hit the floor?
Solution:
(I) Let $h(t)$ be the height of the chalk relative to the floor. Then its velocity is $v(t) = h'(t)$, and its acceleration is $a(t) = v'(t) = h''(t)$. Since all free-falling objects have an acceleration of 9.8 m/s$^2$, one has $h''(t) = -9.8$. The minus sign indicates that the acceleration is directed downward.
(II) The general second antiderivative of the constant function $-9.8$ is $h(t) = -9.8t^2/2 + C_1 t + C_2$, where $C_1$ and $C_2$ are arbitrary constants.
(III) To fix $C_1$ and $C_2$, the initial conditions of the motion must be used. The initial velocity is $v(0) = 7$. Since $v(t) = h'(t) = -9.8t + C_1$, one can infer that $v(0) = C_1 = 7$. The initial height is $h(0) = 1.5$. Hence, $h(0) = C_2 = 1.5$.
(IV) The height is $h(t) = -9.8t^2/2 + 7t + 1.5$. The chalk hits the floor when its height vanishes, that is, at the time moment $t > 0$ when $h(t) = 0$. The positive root of the quadratic equation $-9.8t^2/2 + 7t + 1.5 = 0$ is $t \approx 1.62$ s. The maximal height reached by the chalk is 4 m. Why?
□
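The numbers in Example 57 can be verified with a few lines of code. This is a minimal sketch, assuming the height function found in step (IV); the variable names are illustrative. The maximal height is evaluated at the moment $t = v_0/g$ when the velocity $h'(t) = v_0 - gt$ vanishes.

```python
import math

g, v0, h0 = 9.8, 7.0, 1.5   # acceleration, initial speed, initial height

def height(t):
    return -g * t**2 / 2 + v0 * t + h0

# Positive root of -g t^2 / 2 + v0 t + h0 = 0 (quadratic formula).
t_hit = (v0 + math.sqrt(v0**2 + 2 * g * h0)) / g

# The chalk is highest when its velocity v0 - g t is zero.
t_top = v0 / g
h_max = height(t_top)

print(f"time to hit the floor: {t_hit:.2f} s")
print(f"maximal height: {h_max:.2f} m at t = {t_top:.2f} s")
```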
CHAPTER 5
Integration
31. Areas and Distances
Consider the linear function $f(x) = x$. What is the area below the graph $y = f(x)$ and above the interval $0 \le x \le 1$? This question is easy to answer because the area in question is the area of the right triangle with catheti of unit length: $A = 1/2$. Let $f(x) = x^2$. What is the area now? To calculate it, consider a partition of the interval $[0, 1]$ by $n$ segments of length $1/n$. The partition is defined by the set of points $x_0 = 0$, $x_1 = 1/n$, $x_2 = 2/n$, ..., $x_{n-1} = (n - 1)/n$, and $x_n = n/n = 1$, that is, $x_k = k/n$, where $k = 0, 1, 2, \ldots, n$. The area under the parabola $y = x^2$ over the interval $[0, 1]$ is the sum of the areas $S_k$ under the parabola over the partition intervals $[x_{k-1}, x_k]$, where $k = 1, 2, \ldots, n$:
$$A = S_1 + S_2 + \cdots + S_n.$$
The area $S_k$ cannot exceed the area of a rectangle with base $1/n$ and height $f(x_k) = (k/n)^2$. Let us denote this upper bound by $S^U_k = k^2/n^3$. The area $S_k$ is greater than the area of a rectangle with base $1/n$ and height $f(x_{k-1}) = (k - 1)^2/n^2$. The lower bound is denoted by $S^L_k = (k - 1)^2/n^3$. Thus,
$$S^L_k = \frac{(k - 1)^2}{n^3} < S_k < \frac{k^2}{n^3} = S^U_k.$$
Therefore, the area $A$ is bounded above by the sum of $S^U_k$ and below by the sum of $S^L_k$:
$$S^L_1 + S^L_2 + \cdots + S^L_n = A^L_n \le A \le A^U_n = S^U_1 + S^U_2 + \cdots + S^U_n$$
for any number $n$ of partition segments. Let us calculate the difference
$$0 < S^U_k - S^L_k = \frac{2k - 1}{n^3} \le \frac{2n - 1}{n^3} < \frac{2}{n^2}$$
for any $k = 1, 2, \ldots, n$; in the second inequality, the condition $k \le n$ has been used. This inequality allows us to estimate the difference $A^U_n - A^L_n$:
$$0 < A^U_n - A^L_n = (S^U_1 - S^L_1) + (S^U_2 - S^L_2) + \cdots + (S^U_n - S^L_n) < n\,\frac{2}{n^2} = \frac{2}{n}.$$
Thus, if the limit $\lim_{n\to\infty} A^U_n$ exists, then $\lim A^U_n = \lim A^L_n$ because $0 < A^U_n - A^L_n < 2/n \to 0$ as $n \to \infty$. On the other hand, $A^L_n \le A \le A^U_n$ for any $n$. Taking the limit $n \to \infty$ in this inequality yields
$$\lim_{n\to\infty} A^L_n = A = \lim_{n\to\infty} A^U_n.$$
From a geometrical point of view, when $n$ gets larger, the area $A^U_n$ approaches $A$ from above while $A^L_n$ does so from below. For $n$ large enough, both $A^U_n$ and $A^L_n$ may serve as a good approximation of $A$. In fact, the error of either of the approximations does not exceed $2/n$ because $0 < A^U_n - A^L_n < 2/n$ and $A^L_n \le A \le A^U_n$. It appears that the limit $\lim_{n\to\infty} A^U_n$ can actually be calculated by means of the formula for the sum of squares of the first $n$ positive integers:
$$1^2 + 2^2 + \cdots + n^2 = \frac{1}{6}n(n + 1)(2n + 1) = \frac{n}{6}(2n^2 + 3n + 1). \tag{5.1}$$
Indeed, by making use of this formula, one can infer that
$$A^U_n = \frac{1}{n^3}(1^2 + 2^2 + \cdots + n^2) = \frac{2n^2 + 3n + 1}{6n^2} = \frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2} \to \frac{1}{3}$$
as $n \to \infty$. So the area is $A = \frac{1}{3}$.
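The squeeze of the area between lower and upper sums can be observed numerically. The sketch below, with the hypothetical function name `lower_upper_sums`, computes $A^L_n$ and $A^U_n$ for $f(x) = x^2$ on $[0, 1]$ directly from the definitions $S^L_k = (k-1)^2/n^3$ and $S^U_k = k^2/n^3$.

```python
def lower_upper_sums(n):
    """Return (A_n^L, A_n^U) for f(x) = x^2 on [0, 1] with n equal segments."""
    lower = sum((k - 1) ** 2 for k in range(1, n + 1)) / n**3
    upper = sum(k**2 for k in range(1, n + 1)) / n**3
    return lower, upper

for n in (10, 100, 1000):
    lo, up = lower_upper_sums(n)
    print(f"n = {n:4d}: A_L = {lo:.6f}, A_U = {up:.6f}, A_U - A_L = {up - lo:.6f}")
```

For each $n$ the two sums bracket $1/3$, and their difference stays below the bound $2/n$ derived above.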
Let $x^*_k$ be a number in the interval $[x_{k-1}, x_k]$. Then the area $S_k$ can also be approximated by the area $S^*_k$ of a rectangle with base $1/n$ and height $f(x^*_k) = (x^*_k)^2$, that is, $S^*_k = f(x^*_k)/n$. Then the total area under the graph is approximated by the sum $A^*_n$ of all $S^*_k$. Since $S^L_k \le S^*_k \le S^U_k$ (owing to the monotonicity of the function $x^2$ in each interval $[x_{k-1}, x_k]$), the following inequality holds for any $n$:
$$A^L_n \le S^*_1 + S^*_2 + \cdots + S^*_n = A^*_n \le A^U_n.$$
Taking the limit $n \to \infty$ in this inequality leads to a remarkable result,
$$\lim_{n\to\infty} A^*_n = A;$$
that is, the limit of $A^*_n$ does not depend on the choice of sample points $x^*_k$. The area could have been approximated by, for example, $A^*_n$ with the sample points as the midpoints $x^*_k = (x_k + x_{k-1})/2$, or any other convenient choice. This analysis can be extended to any continuous function.
31.1. The Area Under the Graph of a Continuous Function. Let $f(x)$ be continuous on $[a, b]$. Consider a partition of $[a, b]$ by $n$ segments of length $\Delta x = (b - a)/n$. The endpoints of the partition segments are $x_k = a + k\,\Delta x$ with $k = 0, 1, 2, \ldots, n$, such that $x_0 = a$ and $x_n = b$. Let $x^*_k$ be a sample point in the interval $[x_{k-1}, x_k]$.
Definition 36. The area $A$ of the region that lies under the graph of a continuous function $f(x) \ge 0$ on an interval $[a, b]$ is
$$A = \lim_{n\to\infty} A^*_n = \lim_{n\to\infty} \bigl[ f(x^*_1)\,\Delta x + f(x^*_2)\,\Delta x + \cdots + f(x^*_n)\,\Delta x \bigr] \tag{5.2}$$
for any choice of sample points $x^*_k$.
Let us assess this definition. Any continuous function attains its maximal and minimal values on a closed interval. Let $M_k$ and $m_k$ be the maximal and minimal values of $f(x)$ on the interval $[x_{k-1}, x_k]$. If $S_k$ is the area under the graph $y = f(x)$ on the interval $[x_{k-1}, x_k]$, then $S^L_k = m_k\,\Delta x \le S_k \le S^U_k = M_k\,\Delta x$. The area $S^*_k = f(x^*_k)\,\Delta x$ is a continuous function of $x^*_k$ on the interval $[x_{k-1}, x_k]$. Therefore, $S^*_k$ must take all the values between its minimal value $S^L_k$ and its maximal value $S^U_k$. In particular, $S^*_k = S_k$ for some $x^*_k$. So, for any $n$, there is a choice of sample points such that $A^*_n = A$. Thus, one can always find a set of sample points for which Equation (5.2) gives the area under the graph of a continuous function. A nontrivial fact is that Equation (5.2) holds for any choice of $x^*_k$. Continuing the analogy with the example of $f(x) = x^2$, the limit (5.2) is independent of the choice of sample points if the lower sums $A^L_n = S^L_1 + \cdots + S^L_n$ and the upper sums $A^U_n = S^U_1 + \cdots + S^U_n$ converge to the same number as $n \to \infty$. Indeed, from the inequality $A^L_n \le A \le A^U_n$ it would then follow that $\lim A^L_n = \lim A^U_n = A$; that is, if $A^L_n$ and $A^U_n$ converge to the same number, then this number must be the area under the graph. On the other hand, $A^L_n \le A^*_n \le A^U_n$ for any choice of $x^*_k$ and for any $n$. Therefore, $\lim_{n\to\infty} A^*_n = A$ independently of the choice of sample points. It will be shown in the next section that for continuous functions the limits of $A^L_n$ and $A^U_n$ exist and coincide. This fact justifies the previous definition.
In practice, Equation (5.2) can be used to find the area under the graph that is correct to any desired number of decimal places. Take a partition of the interval $[a, b]$, that is, fix some $n$ and $\Delta x$. Set sample points $x^*_k$. Convenient choices might be the left points $x^*_k = x_{k-1}$, the right points $x^*_k = x_k$, or the midpoints $x^*_k = (x_{k-1} + x_k)/2$. Calculate the sum $A^*_n$, keeping the desired number of decimal places. Refine the partition by, for example, doubling the number of segments, and calculate $A^*_{2n}$. If $A^*_n$ and $A^*_{2n}$ coincide in the desired number of decimal places, then $A \approx A^*_{2n}$ is taken to be correct to that number of decimal places. If not, refine the partition further, compute $A^*_{4n}$, compare it with $A^*_{2n}$, and so on, until the needed accuracy is reached.
31.2. Sigma Notation for Sums. To avoid writing lengthy expressions for sums of an arbitrary number of terms, it is convenient to adopt the following notation:
$$A^*_n = S^*_1 + S^*_2 + \cdots + S^*_n = \sum_{k=1}^{n} S^*_k,$$
where the index $k$ is called the summation index. The symbol $\sum$ means adding all $S^*_k$, starting with $k = 1$ up to $k = n$. For example, the geometric sum formula can now be written as
$$1 + q + q^2 + \cdots + q^n = \sum_{k=0}^{n} q^k = \frac{q^{n+1} - 1}{q - 1}. \tag{5.3}$$
31.3. The Distance Problem. If an object moves with a constant velocity $v$ during a time interval $a \le t \le b$, then the distance traveled by the object is $D = v(b - a)$. How does one calculate the distance if the speed is a nonconstant function of time $v = v(t) \ge 0$? If $D(t)$ is the distance as a function of time, then $v(t) = D'(t)$. The function $D(t)$ is an antiderivative of $v(t)$. If it is known, then the distance traveled is $D(b) - D(a)$. It turns out that the problem can also be solved by finding the area under the graph of $v(t)$! This implies that there is a relation between an antiderivative of a function and the area under its graph.
The time interval $[a, b]$ can be partitioned into $n$ time intervals of length $\Delta t = (b - a)/n$. Let $t_k = a + k\,\Delta t$, $k = 0, 1, \ldots, n$, be the endpoints of the partition intervals. The distance $\Delta D_k = D(t_k) - D(t_{k-1})$ traveled by the object in the time interval $[t_{k-1}, t_k]$ can be found by the mean value theorem: $D(t_k) - D(t_{k-1}) = v(t^*_k)\,\Delta t$ for some $t^*_k$ in $[t_{k-1}, t_k]$. Recall that the value $v(t^*_k)$ is the average velocity over the time interval $[t_{k-1}, t_k]$. Thus, the total distance is $D = \Delta D_1 + \cdots + \Delta D_n$. In a plot showing the graph of $v(t)$, the distance $\Delta D_k = v(t^*_k)\,\Delta t$ is nothing but the area under the graph over the interval $[t_{k-1}, t_k]$. Recall that one can always find a particular sample point $t^*_k$ such that this area coincides with the area of a rectangle $v(t^*_k)\,\Delta t$. Therefore, $D$ is the area under the graph of $v(t)$. This implies, in particular, that in order to calculate $D$, any sample points $t^*_k$ can be used, not necessarily those at which $v$ coincides with the average velocity in each partition interval. But there is a price for this generalization. Namely, the limit (5.2) must be computed:
$$D = \lim_{n\to\infty} \sum_{k=1}^{n} v(t^*_k)\,\Delta t,$$
where $t^*_k$ is any set of sample points.
Example 58. A moving object slows down so that its velocity is $v(t) = e^{-2t}$. What is the distance traveled by the object during the time interval $0 \le t \le 1$?
Solution: Let $\Delta t = 1/n$ so that $t_k = k/n$, $k = 0, 1, \ldots, n$. Take $t^*_k = (k - 1)/n$, $k = 1, 2, \ldots, n$ (the left points of partition intervals). Then $\Delta D_k = v(t^*_k)\,\Delta t = q^{k-1}/n$, where $q = e^{-2/n}$. The distance traveled is
$$D = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} q^k = \lim_{n\to\infty} \frac{1}{n}\,\frac{q^n - 1}{q - 1} = \frac{1 - e^{-2}}{\lim_{n\to\infty} n(1 - e^{-2/n})},$$
where the sum formula (5.3) and $q^n = e^{-2}$ have been used. To compute the limit in the denominator, let $x = 1/n$, that is, $x \to 0$. The limit becomes the indeterminate form $(1 - e^{-2x})/x$ of type $0/0$, which can be resolved by l'Hospital's rule: $(1 - e^{-2x})'/(x)' = 2e^{-2x}/1 \to 2$ as $x \to 0$. Thus, the distance traveled is $D = (1 - e^{-2})/2$.
□
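The limit in Example 58 can be illustrated by evaluating the left-point sums for increasing $n$ and comparing them with the closed form $(1 - e^{-2})/2$. The function name `left_sum_distance` and the chosen values of $n$ are illustrative assumptions.

```python
import math

def left_sum_distance(n):
    """Left-point sum of v(t) = exp(-2t) over [0, 1] with n equal intervals."""
    dt = 1.0 / n
    return sum(math.exp(-2 * (k - 1) * dt) for k in range(1, n + 1)) * dt

exact = (1 - math.exp(-2)) / 2
for n in (10, 100, 1000, 10_000):
    approx = left_sum_distance(n)
    print(f"n = {n:6d}: sum = {approx:.6f}, error = {approx - exact:.2e}")
```

Because $v(t)$ is decreasing, the left-point sums overestimate the distance, and the error shrinks roughly in proportion to $1/n$.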
Remark. An alternative solution uses the general antiderivative of $e^{-2t}$: $D(t) = -e^{-2t}/2 + C$. The distance traveled is $D(1) - D(0) = (1 - e^{-2})/2$. Note that the result is independent of the arbitrary constant $C$. When compared to the previous solution, this one looks like cheating! More to the point, take $v(t) = t^2$ (the example discussed at the beginning of this section). Its antiderivative is $D(t) = t^3/3 + C$. So the distance traveled, or the area under the graph of $t^2$, is $D(1) - D(0) = 1/3$. It is so simple, isn't it? Thus, the concept of the antiderivative and its relation to the area (5.2) must be further investigated.
32. The Definite Integral
A generalization of the concept of the area under a graph leads to one of the most fundamental concepts in calculus, the definite integral. Let $f$ be a function on an interval $[a, b]$. Consider a partition of $[a, b]$ by $n$ intervals of length $\Delta x = (b - a)/n$ so that the endpoints of the partition intervals are $x_k = a + k\,\Delta x$, $k = 0, 1, \ldots, n$. Let $I_k = [x_{k-1}, x_k]$, $k = 1, 2, \ldots, n$, denote the partition intervals. Let $M_k = \max_{I_k} f(x)$ (the maximal value of $f(x)$ in the interval $I_k$) and $m_k = \min_{I_k} f(x)$ (the minimal value of $f(x)$ in the interval $I_k$). By analogy with the area under a graph, define the lower sum $A^L_n$ and the upper sum $A^U_n$ for $f$ by
$$A^L_n = \sum_{k=1}^{n} m_k\,\Delta x, \qquad A^U_n = \sum_{k=1}^{n} M_k\,\Delta x$$
for every such partition of $[a, b]$.
Definition 37 (The Definite Integral). A function $f$ on an interval $[a, b]$ is said to be integrable if the sequences of its lower and upper sums converge to the same number. This number is called the definite integral of $f$ from $a$ to $b$ and is denoted by
$$\int_a^b f(x)\,dx = \lim_{n\to\infty} A^L_n = \lim_{n\to\infty} A^U_n;$$
the numbers $a$ and $b$ are called the lower and upper integration limits, respectively, and the function $f$ is called the integrand.
Apparently, for a continuous and nonnegative $f$ on $[a, b]$, the definite integral coincides with the area under the graph of $f$. The geometrical significance of the definite integral in general will be discussed later after establishing its basic properties.
Let $f$ be integrable on $[a, b]$ and let $x^*_k$ be a sample point in $I_k$ for each $k = 1, 2, \ldots, n$. For any number $\varepsilon > 0$, there exists an integer $N$ such that
$$\left| \int_a^b f(x)\,dx - \sum_{k=1}^{n} f(x^*_k)\,\Delta x \right| < \varepsilon$$
for every integer $n > N$ and for every choice of $x^*_k$ in $I_k$. Indeed, for any $x^*_k$, $m_k \le f(x^*_k) \le M_k$, and therefore
$$A^L_n \le \sum_{k=1}^{n} f(x^*_k)\,\Delta x \le A^U_n.$$
By the squeeze principle, the sequence of the sums in the middle of this inequality must converge to the definite integral of $f$ from $a$ to $b$ as $n \to \infty$:
$$\lim_{n\to\infty} \sum_{k=1}^{n} f(x^*_k)\,\Delta x = \int_a^b f(x)\,dx. \tag{5.4}$$
Hence, by the definition of the limit, no matter how small $\varepsilon$ is, there is always a large enough integer $N$ such that the deviation of the values of the sequence elements from the limit does not exceed $\varepsilon$ for all $n > N$. The sum in (5.4) is called the Riemann sum after the German mathematician Bernhard Riemann (1826–1866). It follows from the preceding analysis that the sequence of Riemann sums for an integrable function converges to the definite integral. Since the limit is independent of the choice of sample points, any choice convenient to calculate the limit (5.4) can be made.
32.1. Continuity and Integrability. The relation (5.4) holds and can be
used to calculate the definite integral, provided the function $f$ is
integrable. The question of integrability requires investigating the
convergence of the sequences of the upper and lower sums, which might be a
tedious task even for such simple functions as, for example, $f(x) = x^2$,
as discussed in the previous section. The following theorem is helpful
when studying the question of integrability.
Theorem 38. If $f$ is continuous on $[a, b]$, or if $f$ has only a finite
number of jump discontinuities, then $f$ is integrable on $[a, b]$; that is,
the definite integral $\int_a^b f(x)\,dx$ exists.
This theorem justifies the definition of the area under the graph of
a continuous function introduced in the previous section.
Let $f(x)$ be defined on $[0, 1]$ such that $f(x) = 1$ if $x$ is a rational
number, and $f(x) = 0$ otherwise (i.e., if $x$ is irrational). The function is
not continuous anywhere in $[0, 1]$. For example, $f(1/2) = 1$, but when
$x$ approaches $1/2$, the value $f(x)$ keeps jumping from 0 to 1 and back,
no matter how close $x$ is to $1/2$ because, for any $\delta > 0$, the interval
$(\tfrac12 - \delta, \tfrac12 + \delta)$ always contains both rational and irrational numbers.
This function gives an example of a nonintegrable function. Indeed,
take a partition $x_k = k/n$, $k = 0, 1, ..., n$. Any partition interval
$[(k-1)/n, k/n]$ contains both rational and irrational numbers. Therefore,
$m_k = 0$ and $M_k = 1$. Hence, the lower sum vanishes for any partition,
$A_n^L = 0$, whereas the upper sum is $A_n^U = \sum_{k=1}^{n} \Delta x = 1$, that is,
$\lim_{n\to\infty} A_n^L = 0$ while $\lim_{n\to\infty} A_n^U = 1$. The two limits differ, so the
function is not integrable and the integral does not exist. Note that the
Riemann sum can still be defined, but its limit would depend on the choice
of sample points (e.g., take all $x_k^*$ to be rational numbers or take all
$x_k^*$ to be irrational numbers; both options are possible since any
partition interval always contains rational and irrational numbers).
32.2. Properties of the Definite Integral. Suppose $f(x) = c$, where $c$ is
a constant. In this case, for any partition interval $I_k$, $M_k = m_k = c$
and $A_n^U = A_n^L = c \sum_{k=1}^{n} \Delta x = c\,n\,\Delta x = c(b - a)$. In other words, a
constant function is integrable and its integral is $c(b - a)$:
\[ \int_a^b c\,dx = c(b - a). \tag{5.5} \]
For any two integrable functions $f(x)$ and $g(x)$ and constants $c_1$ and
$c_2$, it follows from the convergence of the Riemann sums (5.4) for $f$ and
$g$ that
\begin{align*}
\int_a^b [c_1 f(x) + c_2 g(x)]\,dx
  &= \lim_{n\to\infty} \sum_{k=1}^{n} [c_1 f(x_k^*) + c_2 g(x_k^*)]\,\Delta x \\
  &= c_1 \lim_{n\to\infty} \sum_{k=1}^{n} f(x_k^*)\,\Delta x
   + c_2 \lim_{n\to\infty} \sum_{k=1}^{n} g(x_k^*)\,\Delta x \\
  &= c_1 \int_a^b f(x)\,dx + c_2 \int_a^b g(x)\,dx. \tag{5.6}
\end{align*}
So integration is a linear operation. In particular, the integral of
the sum of two functions is the sum of their integrals. The integral of
a function multiplied by a constant is the product of the constant and
the integral of the function. If the integration limits are reversed, then
$\Delta x$ changes its sign: $\Delta x = (b - a)/n \to (a - b)/n$. Therefore,
\[ \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx \tag{5.7} \]
and, in particular,
\[ \int_a^a f(x)\,dx = 0 . \]
It can be proved that
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \tag{5.8} \]
for $f$ integrable on $[a, b]$ and any $a \le c \le b$. The proof is rather
technical and is omitted. If $f$ is continuous and positive on $[a, b]$, then
the property (5.8) is geometrically evident: The area under the graph of $f$
on $[a, b]$ is the sum of the areas under the graph of $f$ on $[a, c]$ and $[c, b]$.
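Properties (5.5) through (5.8) can be spot-checked with Riemann sums. The Python sketch below is only a numerical illustration, not a proof; the helper `midpoint`, the test functions $\sin$ and $\exp$, and the numbers $a$, $b$, $c$, $c_1$, $c_2$ are all choices made here for the example.

```python
import math

def midpoint(f, a, b, n=10_000):
    """Midpoint Riemann sum of f from a to b over n equal subintervals.

    dx = (b - a) / n is negative when b < a, which mirrors the sign change
    of Delta x under reversed integration limits.
    """
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

f, g = math.sin, math.exp
a, b, c = 0.0, 2.0, 0.7
c1, c2 = 3.0, -1.5

const = midpoint(lambda x: 5.0, a, b)                          # (5.5): 5 * (b - a)
lin_left = midpoint(lambda x: c1 * f(x) + c2 * g(x), a, b)     # (5.6), left side
lin_right = c1 * midpoint(f, a, b) + c2 * midpoint(g, a, b)    # (5.6), right side
whole = midpoint(f, a, b)
reversed_limits = midpoint(f, b, a)                            # (5.7): equals -whole
split = midpoint(f, a, c) + midpoint(f, c, b)                  # (5.8): close to whole

print(const, lin_left - lin_right, whole + reversed_limits, split - whole)
```

The last three printed numbers are differences that should be close to zero; the additivity check is only approximate because the two sides use different partitions.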
32.3. Geometrical Significance of the Definite Integral. As already noted,
the definite integral of $f$ from $a$ to $b$ coincides with the area under the
graph of $f$ for a continuous and positive $f$. Suppose $f$ is continuous
and negative on $[a, b]$. Consider the function $g(x) = -f(x)$. The integral
of $g$ is the area $A$ under the graph of $g$. By the linearity of the
integral, $\int_a^b f(x)\,dx = -\int_a^b g(x)\,dx = -A$. But the area $A$ coincides
with the area above the graph of $f$ and below the $x$ axis. So, for a
negative $f$, the integral of $f$ coincides with the negative area of the
region bounded below by the graph of $f$ and above by the $x$ axis. Now
let $f$ be continuous on $[a, b]$. Let it be positive on $[a, c)$ and negative
on $(c, b]$, so that $f(c) = 0$. Then it follows from the property (5.8) that
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = A_1 - A_2 , \]
where $A_1$ is the area under the graph of $f$ on $[a, c]$ and $A_2$ is the area
above the graph of $f$ on $[c, b]$.
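As a small worked illustration (the function is chosen here for convenience and does not come from the text), take $f(x) = 1 - x$ on $[0, 2]$, which is positive on $[0, 1)$ and negative on $(1, 2]$, so $c = 1$. Both regions are right triangles with legs of length 1:

```latex
\[
A_1 = \tfrac12 \cdot 1 \cdot 1 = \tfrac12 , \qquad
A_2 = \tfrac12 \cdot 1 \cdot 1 = \tfrac12 , \qquad
\int_0^2 (1 - x)\,dx = A_1 - A_2 = 0 .
\]
```

The integral vanishes even though the total area of the two regions is 1.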
32.4. Comparison Properties of the Integral. The following additional
properties of the definite integral can be established:
\[ \int_a^b f(x)\,dx \ge 0, \quad \text{if } f(x) \ge 0 \text{ in } [a, b] , \tag{5.9} \]
\[ \int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx, \quad \text{if } f(x) \ge g(x) \text{ in } [a, b] , \tag{5.10} \]
\[ m(b - a) \le \int_a^b f(x)\,dx \le M(b - a), \quad \text{if } m \le f(x) \le M \text{ in } [a, b] . \tag{5.11} \]
The property (5.9) follows directly from the definition: $0 \le m_k \le M_k$
for any partition if $f(x) \ge 0$; that is, the upper and lower sums
are nonnegative and so must be the integral. If $f$ is continuous, the
property (5.9) states the obvious fact that the area under the graph of $f$
is nonnegative. The property (5.10) follows from (5.9) for the function
$f(x) - g(x) \ge 0$ and the linearity of the integral (5.6). The property
(5.11) is also a consequence of the definition. Indeed, for any partition,
$m \le m_k \le M_k \le M$. Hence, $m(b - a) \le A_n^L \le A_n^U \le M(b - a)$ for any
$n$. In the limit $n \to \infty$, this inequality turns into (5.11).
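For instance (an illustrative estimate, not taken from the text), on $[0, 1]$ the function $e^{-x^2}$ decreases from $1$ to $e^{-1}$, so (5.11) with $m = e^{-1}$ and $M = 1$ gives

```latex
\[
e^{-1} \cdot (1 - 0) \;\le\; \int_0^1 e^{-x^2}\,dx \;\le\; 1 \cdot (1 - 0),
\qquad\text{that is,}\qquad
0.367\ldots \le \int_0^1 e^{-x^2}\,dx \le 1 .
\]
```

The bounds are crude, but they require nothing beyond the extreme values of the integrand.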
32.5. Evaluation of the Integral by the Riemann Sum. If the integral
exists ($f$ is integrable), then it can be evaluated as the limit of the
Riemann sum (5.4). The limit is independent of the choice of sample
points. The following choices are often used in practice:
$x_k^* = x_{k-1}$ (the left-point rule),
$x_k^* = x_k$ (the right-point rule),
$x_k^* = (x_{k-1} + x_k)/2$ (the midpoint rule),
in combination with the basic properties of the integral. The evaluation
of the Riemann sum is rather technical. Formulas like (5.1), (5.3), and
\[ \sum_{k=1}^{n} k = \frac{n(n+1)}{2} , \qquad \sum_{k=1}^{n} k^3 = \left[ \frac{n(n+1)}{2} \right]^2 \tag{5.12} \]
can be helpful. However, the Riemann sum is mostly used to calculate
the integral approximately with some designated accuracy by means of
computer simulations, similarly to approximate calculations of the area
discussed in the previous section.
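The three sampling rules differ only in where $f$ is evaluated inside each partition interval. A minimal Python sketch follows; the function name `riemann_sum`, its `rule` parameter, and the test integrand $x^3$ on $[0, 1]$ are assumptions made for illustration.

```python
def riemann_sum(f, a, b, n, rule="mid"):
    """Riemann sum of f on [a, b] with n equal subintervals.

    rule selects the sample point in I_k = [x_{k-1}, x_k]:
    "left" -> x_{k-1}, "right" -> x_k, "mid" -> (x_{k-1} + x_k) / 2.
    """
    dx = (b - a) / n
    offset = {"left": 0.0, "mid": 0.5, "right": 1.0}[rule]
    return sum(f(a + (k - 1 + offset) * dx) for k in range(1, n + 1)) * dx

cube = lambda x: x ** 3
for n in (10, 100, 1000):
    print(n, [riemann_sum(cube, 0.0, 1.0, n, r) for r in ("left", "mid", "right")])
```

For $x^3$ on $[0, 1]$ the left and right sums have the closed forms $(n-1)^2/(4n^2)$ and $(n+1)^2/(4n^2)$, so both approach $1/4$ with an error of order $1/n$, while the midpoint sum typically converges faster.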
Example 59. Find the definite integral of $f(x) = e^{-2x} - 2x^2 + 4x^3$
from 0 to 1.
Solution:
(I) The function is continuous on $[0, 1]$ and hence integrable; that
is, Equation (5.4) applies for any choice of $x_k^*$. The left-point
rule will be used.
(II) By the linearity of the integral,
\[ \int_0^1 f(x)\,dx = \int_0^1 e^{-2x}\,dx - 2 \int_0^1 x^2\,dx + 4 \int_0^1 x^3\,dx. \]
The first integral is $(1 - e^{-2})/2$ by Example 58 (where the
area under the graph of $e^{-2x}$ in $[0, 1]$ was calculated). The
area under the graph of $x^2$ in $[0, 1]$ can be found at the beginning
of the previous section and is equal to $1/3$. The area under the
graph of $x^3$ can be found with the help of the second relation
in (5.12). Let $\Delta x = 1/n$ and $x_k^* = (k - 1)/n$ (the left-point
rule); then the Riemann sum (5.4) becomes
\[ \int_0^1 x^3\,dx = \lim_{n\to\infty} \frac{1}{n^4} \sum_{k=1}^{n} (k-1)^3
   = \lim_{n\to\infty} \frac{1}{n^4} \cdot \frac{n^2 (n-1)^2}{4} = \frac{1}{4} . \]
(III) Thus,
\[ \int_0^1 f(x)\,dx = \frac{1 - e^{-2}}{2} - 2 \cdot \frac{1}{3} + 4 \cdot \frac{1}{4} = \frac{5 - 3e^{-2}}{6} . \]
□
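A numerical cross-check of this example is sketched below in Python. It evaluates the three integrals of step (II) by midpoint Riemann sums and combines them with the coefficients 1, −2, and 4 of the integrand; the helper `midpoint` and the number of subintervals are assumptions made here, and the comparison value is the sum $(1 - e^{-2})/2 - 2/3 + 4 \cdot (1/4)$.

```python
import math

def midpoint(f, a, b, n=10_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

i_exp = midpoint(lambda x: math.exp(-2 * x), 0.0, 1.0)   # close to (1 - e^-2)/2
i_sq = midpoint(lambda x: x ** 2, 0.0, 1.0)              # close to 1/3
i_cube = midpoint(lambda x: x ** 3, 0.0, 1.0)            # close to 1/4

by_linearity = i_exp - 2 * i_sq + 4 * i_cube
direct = midpoint(lambda x: math.exp(-2 * x) - 2 * x ** 2 + 4 * x ** 3, 0.0, 1.0)
closed_form = (1 - math.exp(-2)) / 2 - 2 / 3 + 4 * (1 / 4)

print(by_linearity, direct, closed_form)
```

The three printed numbers should agree to several decimal places, which also illustrates the linearity property (5.6).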
33. The Fundamental Theorem of Calculus
In this section, the relation between the definite integral of a function
and its antiderivative will be established. This relation provides
a powerful method for calculating the definite integral that avoids the
use of the Riemann sum (5.4).
33.1. Integration and Differentiation. Consider the definite integral of
$f(t) = t$ from 0 to $x$ for some $x > 0$. This integral represents the area
under the graph of $f(t) = t$ in the interval $[0, x]$, which is the area of a
right triangle:
\[ A(x) = \int_0^x t\,dt = \frac{x^2}{2} . \]
The area $A(x)$ can be viewed as a function of the variable $x$, which
is the length of each leg (cathetus) of the triangle. This function has an
interesting property:
\[ A'(x) = x = f(x) . \]
In other words, the derivative of the definite integral with respect to its
upper limit equals the value of the integrand at the upper limit. Recall
that if $v(t) \ge 0$ is the speed of a moving object, then the distance
traveled by the object in time $T$ is given by the area under the graph
of $v(t)$, that is,
\[ s(T) = \int_0^T v(t)\,dt . \]
On the other hand, the speed is the rate of change of $s(T)$, and therefore
$s'(T) = v(T)$ should hold; that is, the derivative of the integral
with respect to its upper limit is again the value of the integrand at
the upper limit. How general is this property? Does it hold for all
integrable functions? The following theorem answers these questions.
Theorem 39. If $f$ is continuous on $[a, b]$, then the function defined
by
\[ g(x) = \int_a^x f(t)\,dt , \quad a \le x \le b, \]
is continuous on $[a, b]$ and differentiable on $(a, b)$, and $g'(x) = f(x)$.
Proof. By the definition of the derivative, one has to prove that
\[ \lim_{h\to 0} \frac{g(x + h) - g(x)}{h} = f(x) \tag{5.13} \]
for $a < x < b$. The ratio in the limit can be transformed as follows:
\begin{align*}
\frac{g(x + h) - g(x)}{h}
  &= \frac{1}{h} \left[ \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \right] \\
  &= \frac{1}{h} \left[ \left( \int_a^x f(t)\,dt + \int_x^{x+h} f(t)\,dt \right) - \int_a^x f(t)\,dt \right] \\
  &= \frac{1}{h} \int_x^{x+h} f(t)\,dt ,
\end{align*}
where the property (5.8) has been used. Note that since $a < x < b$
(i.e., $x \ne a$ and $x \ne b$), for a sufficiently small $h \ne 0$, both $x$ and $x + h$
($h$ can be positive or negative) always lie in the interval $(a, b)$ so that
the interval $[x, x + h]$ is contained in $(a, b)$. By the continuity of $f(t)$
on the interval $[x, x + h]$, the function $f$ attains its absolute maximal
and minimal values in $[x, x + h]$. Let $M = f(v)$ and $m = f(u)$ be
the absolute maximal and minimal values, respectively, where $v$ and
$u$ are in $[x, x + h]$. Suppose that $h > 0$. Then $m \le f(t) \le M$ for
$x \le t \le x + h$ and, by the property (5.11),
\[ mh = f(u)h \le \int_x^{x+h} f(t)\,dt \le f(v)h = Mh. \tag{5.14} \]
Since $h > 0$, dividing this inequality by $h$ gives
\[ f(u) \le \frac{1}{h} \int_x^{x+h} f(t)\,dt \le f(v) \tag{5.15} \]
for some $u$ and $v$ in $[x, x + h]$. Inequality (5.15) can be established
for $h < 0$ in a similar manner. Indeed, inequality (5.14), with $h$ replaced
by $-h > 0$, holds for the integral $\int_{x+h}^{x} f(t)\,dt$. After dividing it by
$-h > 0$, inequality (5.15) is obtained but with the minus sign at the integral.
By the property (5.7), the sign is reversed, yielding (5.15). Thus,
\[ f(u) \le \frac{g(x + h) - g(x)}{h} \le f(v) . \]
Since $u$ and $v$ lie between $x$ and $x + h$ and $f$ is continuous,
\[ \lim_{h\to 0} f(u) = f(x) , \qquad \lim_{h\to 0} f(v) = f(x) . \]
Then the relation (5.13) follows from the squeeze principle:
\[ f(x) = \lim_{h\to 0} f(u) \le \lim_{h\to 0} \frac{g(x + h) - g(x)}{h} \le \lim_{h\to 0} f(v) = f(x) . \]
□
This theorem basically states that if a continuous function is first
integrated and then differentiated, then it remains unchanged:
\[ \frac{d}{dx} \int_a^x f(t)\,dt = f(x) , \quad a < x < b . \tag{5.16} \]
In other words, $F(x) = \int_a^x f(t)\,dt$ is an antiderivative of $f(x)$ in an
open interval $(a, b)$. The continuity of $f$ on $[a, b]$ is essential for this
relation to hold. Take, for example, $f(t) = 0$ if $t < 1$ and $f(t) = 1$ if
$t \ge 1$. Let $a = 0$. Then $f$ has a jump discontinuity at $t = 1$. By the
property (5.5), $g(x) = \int_0^x f(t)\,dt = 0$ if $x < 1$. For $x \ge 1$, one has
\[ g(x) = \int_0^x f(t)\,dt = \int_0^1 f(t)\,dt + \int_1^x f(t)\,dt = 0 + (x - 1) = x - 1 , \quad x \ge 1 . \]
Therefore, $g'(x) = 0$ if $x < 1$ and $g'(x) = 1$ if $x > 1$. But $g'(1)$ does
not exist because the one-sided derivatives of $g$ at $x = 1$ are 0 and 1.
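Both halves of this discussion can be illustrated numerically. In the Python sketch below (an illustration under assumptions: the integrand $\cos$, the helper names, and the step sizes are chosen here), the derivative of $g(x) = \int_0^x \cos(t)\,dt$ is approximated by a central difference and compared with $\cos(x)$, and the one-sided difference quotients of the step-function integral $g(x) = \max(0, x - 1)$ are computed at $x = 1$.

```python
import math

def integral(f, a, b, n=2000):
    """Midpoint Riemann sum approximating the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

def g(x):
    """g(x) = integral of cos(t) from 0 to x (continuous integrand)."""
    return integral(math.cos, 0.0, x)

def central_difference(func, x, h=1e-4):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def g_step(x):
    """Closed form of the integral from 0 to x of the step function
    f(t) = 0 for t < 1 and f(t) = 1 for t >= 1."""
    return max(0.0, x - 1.0)

h = 1e-4
left_quotient = (g_step(1.0 - h) - g_step(1.0)) / (-h)   # slope from the left
right_quotient = (g_step(1.0 + h) - g_step(1.0)) / h     # slope from the right

print(central_difference(g, 1.0), math.cos(1.0))
print(left_quotient, right_quotient)
```

For the continuous integrand the two numbers on the first line nearly coincide, as (5.16) predicts, while the two one-sided quotients on the second line stay near 0 and 1 however small $h$ is taken.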
Example 60. Let $g(x) = \int_x^b e^{-t^2}\,dt$. Find $g'(x)$.
Solution: The function $e^{-t^2}$ is continuous everywhere as a
composition of two continuous functions, the exponential and power
functions. By the property (5.7), $g(x) = -\int_b^x e^{-t^2}\,dt$. Therefore,
$g'(x) = -e^{-x^2}$ by (5.16). □
This example is a particular case of the general property
\[ \frac{d}{dx} \int_x^b f(t)\,dt = -\frac{d}{dx} \int_b^x f(t)\,dt = -f(x) \]
for a continuous $f$.
33.2. The Definite Integral and Antiderivative. The following theorem
establishes the relation between the definite integral of a function and
its antiderivative.
Theorem 40. (The Fundamental Theorem of Calculus). If $f$ is
continuous on $[a, b]$, then
\[ \int_a^b f(x)\,dx = F(b) - F(a) , \]
where $F$ is any antiderivative of $f$, that is, a function such that $F' = f$.
Proof. Let $g(x) = \int_a^x f(t)\,dt$. By (5.16), the function $g(x)$ is an
antiderivative of $f(x)$ in the open interval $(a, b)$. If $F$ is any other
antiderivative of $f$, then $F$ and $g$ may differ only by a constant,
\[ F(x) = g(x) + C , \quad a < x < b . \]
Also, by the definition of $g(x)$, $g(a) = 0$ and $g(b) = \int_a^b f(t)\,dt$. The
function $g(x)$ is continuous on $[a, b]$ because $\lim_{x\to a^+} g(x) = g(a) = 0$
and $\lim_{x\to b^-} g(x) = g(b)$. Therefore, $F(x)$ is also continuous on $[a, b]$
(as the sum of two continuous functions). Hence,
\[ F(b) - F(a) = \bigl( g(b) + C \bigr) - \bigl( g(a) + C \bigr) = g(b) = \int_a^b f(t)\,dt . \]
The proof is complete. □
The fundamental theorem of calculus provides a powerful analytic
tool to evaluate definite integrals.
Example 61. Evaluate $\int_0^1 (1 + x^2)^{-1}\,dx$.
Solution: An antiderivative of $(1 + x^2)^{-1}$ is $F(x) = \tan^{-1}(x)$. Therefore,
\[ \int_0^1 \frac{1}{1 + x^2}\,dx = \tan^{-1}(1) - \tan^{-1}(0) = \frac{\pi}{4} - 0 = \frac{\pi}{4} . \]
□
Example 62. Evaluate $\int_1^4 (1 + x)/\sqrt{x}\,dx$.
Solution: By the linearity of the integral,
\[ \int_1^4 \frac{1 + x}{\sqrt{x}}\,dx = \int_1^4 (x^{-1/2} + x^{1/2})\,dx
   = \int_1^4 x^{-1/2}\,dx + \int_1^4 x^{1/2}\,dx. \]
An antiderivative of $x^n$ is $x^{n+1}/(n + 1)$ for any real $n \ne -1$. By taking
$n = -1/2$ and $n = 1/2$, an antiderivative is obtained: $F(x) = 2x^{1/2} + 2x^{3/2}/3$.
Hence,
\[ \int_1^4 \frac{1 + x}{\sqrt{x}}\,dx = F(4) - F(1)
   = \left( 4 + \frac{16}{3} \right) - \left( 2 + \frac{2}{3} \right) = \frac{20}{3} . \]
□
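The values obtained from antiderivatives in Examples 61 and 62 can be compared with direct Riemann-sum approximations. The Python sketch below is a sanity check only; the helper `midpoint` and the number of subintervals are choices made here.

```python
import math

def midpoint(f, a, b, n=10_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

# Example 61: integrand 1/(1 + x^2) on [0, 1], antiderivative arctan(x)
ex61_sum = midpoint(lambda x: 1.0 / (1.0 + x * x), 0.0, 1.0)
ex61_ftc = math.atan(1.0) - math.atan(0.0)

# Example 62: integrand (1 + x)/sqrt(x) on [1, 4], F(x) = 2 x^(1/2) + 2 x^(3/2) / 3
F = lambda x: 2.0 * math.sqrt(x) + 2.0 * x ** 1.5 / 3.0
ex62_sum = midpoint(lambda x: (1.0 + x) / math.sqrt(x), 1.0, 4.0)
ex62_ftc = F(4.0) - F(1.0)

print(ex61_sum, ex61_ftc)
print(ex62_sum, ex62_ftc)
```

In each pair the Riemann sum needs thousands of function evaluations to match what the fundamental theorem delivers from two evaluations of $F$.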
34. Indefinite Integrals and the Net Change
As has been shown in the previous section, the derivative of the
definite integral of a continuous function $f$ with respect to the upper
limit equals the value of $f$ at the upper limit. So integration and
differentiation appear as operations inverse to one another. To further stress
this relation between integration and differentiation, the notion of
an indefinite integral is introduced.
Definition 38 (Indefinite Integral). The function $F$ is called an
indefinite integral of $f$ and is denoted by
\[ F(x) = \int f(x)\,dx \quad \text{if} \quad F'(x) = f(x) . \]
It follows from this definition that an indefinite integral is nothing
but the general antiderivative of $f$. The reason for introducing
the integral symbol into the antiderivative notation is the fundamental
theorem of calculus:
\[ \int_a^b f(x)\,dx = F(b) - F(a) , \]
where $F$ is any antiderivative of $f$. Since all antiderivatives differ only
by a constant, which always cancels in the difference $F(b) - F(a)$,
the definite integral is the difference in values of the indefinite
integral at the upper and lower limits of the definite integral. The
indefinite integral has the same properties as the antiderivative. It is
linear:
\[ \int \bigl[ c_1 f(x) + c_2 g(x) \bigr]\,dx = c_1 \int f(x)\,dx + c_2 \int g(x)\,dx \tag{5.17} \]
for any constants $c_1$ and $c_2$ and any functions $f$ and $g$.
Using the table of antiderivatives of basic functions, one can make
a table of indefinite integrals of basic functions. Let $C$ be an arbitrary
constant. Then it is easy to verify the following relationships:
\begin{align*}
\int x^n\,dx &= \frac{x^{n+1}}{n + 1} + C , \quad n \ne -1 , &
\int \frac{1}{x}\,dx &= \ln|x| + C , \\
\int \sin(ax)\,dx &= -\frac{\cos(ax)}{a} + C , &
\int \cos(ax)\,dx &= \frac{\sin(ax)}{a} + C , \\
\int e^x\,dx &= e^x + C , &
\int a^x\,dx &= \frac{a^x}{\ln(a)} + C , \quad a > 0 ,\ a \ne 1 , \\
\int \frac{1}{1 + x^2}\,dx &= \tan^{-1}(x) + C , &
\int \frac{1}{\sqrt{1 - x^2}}\,dx &= \sin^{-1}(x) + C , \\
\int \sec^2(ax)\,dx &= \frac{\tan(ax)}{a} + C , &
\int \csc^2(ax)\,dx &= -\frac{\cot(ax)}{a} + C , \\
\int \sec(x) \tan(x)\,dx &= \sec(x) + C , &
\int \csc(x) \cot(x)\,dx &= -\csc(x) + C .
\end{align*}
Recall that the general antiderivative on a given interval is obtained
from a particular antiderivative by adding an arbitrary constant. This
does not hold for a domain that is a disjoint union of two or more
intervals (review the properties of antiderivatives). So, in the preceding
table, the convention is used that the given expressions for indefinite
integrals are valid only in an interval.
Example 63. Find a general indefinite integral for $x^{-3}$.
Solution: The function $x^{-3}$ is not defined at $x = 0$. So its domain
is the union of two disjoint intervals $(-\infty, 0)$ and $(0, \infty)$. By the first
equality in the preceding table ($n = -3$),
\[ \int x^{-3}\,dx = -\frac{x^{-2}}{2} + C_1 , \quad x > 0; \qquad
   \int x^{-3}\,dx = -\frac{x^{-2}}{2} + C_2 , \quad x < 0, \]
where $C_1$ and $C_2$ are arbitrary constants. □
The following notation is used in the fundamental theorem of calculus:
\[ \int_a^b f(x)\,dx = F(x) \Big|_a^b = F(b) - F(a) , \tag{5.18} \]
where $F(x) = \int f(x)\,dx$.
Example 64. Evaluate $\int_0^1 [3x^2 - x + 4(1 + x^2)^{-1}]\,dx$.
Solution: By the linearity of the indefinite integral (5.17), an indefinite
integral of the integrand is $x^3 - x^2/2 + 4 \tan^{-1}(x)$. An arbitrary
constant in the indefinite integral may be omitted here because, as already
noted, it always cancels in the definite integral. Therefore,
\[ \int_0^1 \left( 3x^2 - x + \frac{4}{1 + x^2} \right) dx
   = \left( x^3 - \frac{x^2}{2} + 4 \tan^{-1}(x) \right) \Big|_0^1
   = 1 - \frac{1}{2} + \pi = \frac{1}{2} + \pi , \]
where $\tan^{-1}(1) = \pi/4$ has been used. □
34.1. The Net Change Theorem. Put $f(x) = F'(x)$ in the fundamental
theorem of calculus (5.18). The result obtained is known as the net
change theorem.
Theorem 41. The integral of a continuous rate of change is the
net change:
\[ \int_a^b F'(x)\,dx = F(b) - F(a) . \]
Note that $F'(x)$ may be positive and negative in the interval $[a, b]$ so
that the quantity $y = F(x)$ may increase and decrease. The difference
$F(b) - F(a)$ represents the net change of $y$ when $x$ changes from $a$
to $b$. The net change vanishes if $F(b) - F(a) = 0$. This does not mean
that the quantity $y$ does not change at all; it might mean,
for example, that $y$ increases from the value $F(a)$, then,
at some $c$ in $[a, b]$, begins to decrease, returning to its initial value
when $x = b$ so that its net change vanishes.
An analogy with an object moving along a straight line can be
made to illustrate the net change. Let $s(t)$ be a position function of
the object relative to some point on the line. Then $s'(t) = v(t)$ is its
velocity (note that the velocity can be negative so that the object can
move back and forth). The net change of the position over the time
interval $[t_1, t_2]$ is
\[ \int_{t_1}^{t_2} v(t)\,dt = s(t_2) - s(t_1). \]
Example 65. Suppose an object travels along a straight line with
a velocity of $v(t) = 1 - 2t$. Find the net change of its position over the
time interval $[0, 1]$ and the total distance traveled by the object over the
same time interval.
Solution:
(I) The indefinite integral of $v(t)$ is $s(t) = t - t^2 + C$. So the net
change of the object position is
\[ \int_0^1 v(t)\,dt = \int_0^1 s'(t)\,dt = s(1) - s(0) = 0 . \]
(II) Note that the velocity changes its sign at $t = 1/2$. So, in the
interval $[0, 1/2]$, it is positive (i.e., the object moves to the right
from its initial position), then the velocity becomes negative in
$[1/2, 1]$ (i.e., the object goes back to the initial point). To find
the distance traveled by the object, the absolute value $|v(t)|$
must be integrated over the interval $[0, 1]$. Think of $|v(t)|$ as
the speed shown on the speedometer of your car; it is always
nonnegative regardless of the direction in which the car is
moving.
\[ \int_0^1 |1 - 2t|\,dt = \int_0^{1/2} (1 - 2t)\,dt - \int_{1/2}^{1} (1 - 2t)\,dt
   = [s(1/2) - s(0)] - [s(1) - s(1/2)] = 1/2, \]
where the definition $|v| = v$ if $v \ge 0$ and $|v| = -v$ if $v < 0$ has
been used. □
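The distinction between net change and distance traveled is easy to reproduce numerically. In the Python sketch below (illustrative; the helper `midpoint` and the choice of 1000 subintervals are assumptions), the velocity and its absolute value are integrated over $[0, 1]$.

```python
def midpoint(f, a, b, n=1000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

def v(t):
    """Velocity from Example 65."""
    return 1.0 - 2.0 * t

net_change = midpoint(v, 0.0, 1.0)                  # integral of v: displacement
distance = midpoint(lambda t: abs(v(t)), 0.0, 1.0)  # integral of |v|: path length

print(net_change, distance)
```

With an even number of subintervals the sign change at $t = 1/2$ falls on a partition point, so the midpoint sums reproduce the values 0 and 1/2 up to rounding.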
Other examples of the net change include the volume $V(t)$ of water
in a reservoir between two moments of time,
\[ \int_{t_1}^{t_2} V'(t)\,dt = V(t_2) - V(t_1), \]
where $V'(t)$ is the rate of change of the volume; the net change of
a population,
\[ \int_{t_1}^{t_2} n'(t)\,dt = n(t_2) - n(t_1), \]
where $n'(t)$ is the growth rate; the relation between the cost and
marginal cost functions,
\[ \int_{t_1}^{t_2} C'(t)\,dt = C(t_2) - C(t_1); \]
and similarly for many other quantities.
35. The Substitution Rule
An indefinite integral of the derivative $F'(x)$ is the function $F(x)$
itself. Let $u = F(x)$, where $u$ is a new variable defined as a differentiable
function of $x$. Consider the differential $du = F'(x)\,dx$. Then the
following equalities hold:
\[ \int F'(x)\,dx = F(x) + C = u + C = \int du, \]
where $C$ is an arbitrary constant and the last equality follows from the
fact that an indefinite integral of $f(u) = 1$ is $u$. So we can conclude
that $\int F'(x)\,dx = \int du$, provided the variables $u$ and $x$ are related as
$u = F(x)$. This also shows that it is permissible to operate with $dx$
and $du$ after the integral sign as if they were differentials. This
observation leads to a neat technical trick to calculate indefinite integrals.
For example,
\[ \int \frac{1}{\sqrt{x + 1}}\,dx = \int d\bigl( 2 \sqrt{x + 1} \bigr) = 2 \sqrt{x + 1} + C , \]
where the substitution $u = 2 \sqrt{x + 1}$ has been used. This trick can be
generalized.
Let $F(u)$ be an indefinite integral of a continuous function $f(u)$ on
an interval $I$. Let $u = g(x)$, where $g$ is differentiable and its range is
the interval $I$. By the chain rule,
\[ \bigl( F(g(x)) \bigr)' = F'(g(x))\,g'(x) = f(g(x))\,g'(x) . \]
In other words, $F(g(x)) + C$ is an indefinite integral of $f(g(x))\,g'(x)$. On
an interval, the most general indefinite integral of $f(u)$ is
$\int f(u)\,du = F(u) + C$. Therefore, $F(g(x))$ and $\int f(u)\,du$ can differ at most
by an additive constant. This proves the following theorem.
Theorem 42. (The Substitution Rule). If $u = g(x)$ is a differentiable
function whose range is an interval $I$ and $f$ is continuous on $I$,
then
\[ \int f(g(x))\,g'(x)\,dx = \int f(g(x))\,dg(x) = \int f(u)\,du. \tag{5.19} \]
The substitution rule is often referred to as a change of the integration
variable. It is a powerful method to calculate indefinite integrals.
Example 66. Find $\int x \sin(x^2 + 1)\,dx$.
Solution:
\begin{align*}
\int x \sin(x^2 + 1)\,dx &= \int \sin(x^2 + 1)\,\tfrac{1}{2}\,d(x^2 + 1) = \frac{1}{2} \int \sin(u)\,du \\
  &= -\frac{1}{2} \cos(u) + C = -\frac{1}{2} \cos(x^2 + 1) + C,
\end{align*}
where the substitution $u = x^2 + 1$ has been used. □
Example 67. Find $\int \tan(x)\,dx$.
Solution:
\begin{align*}
\int \tan(x)\,dx &= \int \frac{\sin(x)}{\cos(x)}\,dx = -\int \frac{d(\cos(x))}{\cos(x)} = -\int \frac{du}{u} \\
  &= -\ln|u| + C = -\ln|\cos(x)| + C = \ln|\sec(x)| + C ,
\end{align*}
where the substitution $u = \cos(x)$ and the logarithm property
$\ln(1/a) = -\ln(a)$ have been used. □
The substitution rule can be used to evaluate definite integrals by
means of the fundamental theorem of calculus.
Example 68. Evaluate $\int_0^2 x e^{x^2}\,dx$.
Solution: First, find an indefinite integral:
\[ F(x) = \int x e^{x^2}\,dx = \frac{1}{2} \int e^{x^2}\,d(x^2) = \frac{1}{2} \int e^u\,du
   = \frac{1}{2} e^u + C = \frac{1}{2} e^{x^2} + C , \]
where $u = x^2$. By the fundamental theorem of calculus,
\[ \int_0^2 x e^{x^2}\,dx = F(2) - F(0) = \frac{1}{2} (e^4 - 1). \]
□
Note that, when evaluating the integral, the original variable $x$ has
been restored in the indefinite integral in order to apply the fundamental
theorem of calculus. The fundamental theorem of calculus can also
be applied directly in the new variable $u$, provided the range of $u$ is
properly changed. Indeed, in the previous example, the answer could
have been recovered from the indefinite integral $\tfrac{1}{2} e^u + C$ if $u = x^2$
ranges from $0 = 0^2$ to $4 = 2^2$ as $x$ ranges from 0 to 2. This is especially
useful when a calculation of a definite integral requires several changes
of the integration variable.
Theorem 43. (The Substitution Rule for Definite Integrals). If $g'$
is continuous on $[a, b]$ and $f$ is continuous on the range of $u = g(x)$,
then
\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du. \tag{5.20} \]
Proof. Let $F$ be an antiderivative of $f$. Then $F(g(x))$ is an
antiderivative of $\bigl( F(g(x)) \bigr)' = F'(g(x))\,g'(x) = f(g(x))\,g'(x)$. By the
fundamental theorem of calculus,
\[ \int_a^b f(g(x))\,g'(x)\,dx = F(g(x)) \Big|_a^b = F(g(b)) - F(g(a)) . \]
On the other hand, since $F(u)$ is an antiderivative of $f(u)$, the
fundamental theorem of calculus yields
\[ \int_{g(a)}^{g(b)} f(u)\,du = F(u) \Big|_{g(a)}^{g(b)} = F(g(b)) - F(g(a)) . \]
Since the right-hand sides of these equalities coincide, so must their
left-hand sides, which implies (5.20). □
Example 69. Evaluate $\int_1^e \ln(x)/x\,dx$.
Solution: The integrand can be transformed as
\[ \frac{\ln(x)}{x}\,dx = \ln(x)\,d \ln(x). \]
So the substitution $u = \ln(x)$ can be made. The range of the new
integration variable $u$ is determined by the range of the old one: $u = 0$
when $x = 1$ and $u = 1$ when $x = e$. Thus,
\[ \int_1^e \frac{\ln(x)}{x}\,dx = \int_0^1 u\,du = \frac{u^2}{2} \Big|_0^1 = \frac{1}{2} . \]
□
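Relation (5.20) says that the same number is obtained whether one integrates $f(g(x))\,g'(x)$ over $[a, b]$ or $f(u)$ over $[g(a), g(b)]$. The Python sketch below checks this numerically for the data of Examples 68 and 69; the helpers `midpoint` and `both_sides` are hypothetical names introduced here for illustration.

```python
import math

def midpoint(f, a, b, n=20_000):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

def both_sides(f, g, dg, a, b):
    """Return (left, right) sides of (5.20) for u = g(x), where dg = g'."""
    left = midpoint(lambda x: f(g(x)) * dg(x), a, b)   # integral in x
    right = midpoint(f, g(a), g(b))                    # integral in u
    return left, right

# Example 68: x e^{x^2} = f(g(x)) g'(x) with f(u) = e^u / 2, g(x) = x^2, g'(x) = 2x
ex68 = both_sides(lambda u: 0.5 * math.exp(u), lambda x: x * x, lambda x: 2.0 * x, 0.0, 2.0)

# Example 69: ln(x)/x = f(g(x)) g'(x) with f(u) = u, g(x) = ln(x), g'(x) = 1/x
ex69 = both_sides(lambda u: u, math.log, lambda x: 1.0 / x, 1.0, math.e)

print(ex68, 0.5 * (math.exp(4.0) - 1.0))
print(ex69, 0.5)
```

Changing the limits together with the variable, as in the second component of each pair, avoids having to restore $x$ before applying the fundamental theorem.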
35.1. Symmetry. The calculation of a definite integral over a symmetric
interval can be simplified if the integrand possesses symmetry
properties.
Theorem 44. Suppose $f$ is continuous on a symmetric interval
$[-a, a]$. Then
\[ \int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx \quad \text{if } f(-x) = f(x) \ (f \text{ is even}), \tag{5.21} \]
\[ \int_{-a}^{a} f(x)\,dx = 0 \quad \text{if } f(-x) = -f(x) \ (f \text{ is odd}). \tag{5.22} \]
Proof. The integral can be split into two integrals:
\[ \int_{-a}^{a} f(x)\,dx = \int_{-a}^{0} f(x)\,dx + \int_0^a f(x)\,dx
   = -\int_0^{-a} f(x)\,dx + \int_0^a f(x)\,dx. \]
In the first integral on the far right-hand side, the substitution $u = -x$
is made so that $u = 0$ when $x = 0$ and $u = a$ when $x = -a$ and
$dx = -du$. Hence,
\[ -\int_0^{-a} f(x)\,dx = \int_0^a f(-u)\,du \]
and
\[ \int_{-a}^{a} f(x)\,dx = \int_0^a f(-u)\,du + \int_0^a f(x)\,dx. \]
Now, if $f$ is even, then $f(-u) = f(u)$ and (5.21) follows. If $f$ is odd,
then $f(-u) = -f(u)$ and (5.22) follows. □
The geometrical interpretation of this theorem is transparent.
Suppose $f(x) \ge 0$ for $0 \le x \le a$. The integral $\int_0^a f(x)\,dx = A$ is the
area under the graph of $f$ on $[0, a]$. If $f$ is even, then, by symmetry,
the graph of $f$ on $[-a, 0]$ is obtained from that on $[0, a]$ by a reflection
about the $y$ axis. Therefore, the area $\int_{-a}^{0} f(x)\,dx$ must coincide with
$A$. If $f$ is odd, then its graph on $[-a, 0]$ is obtained by the
reflection through the origin so that the area $A$ appears beneath the $x$
axis. Hence, $\int_{-a}^{0} f(x)\,dx = -A$.
Example 70. Evaluate $\int_{-\pi}^{\pi} \sin(x^3)\,dx$.
Solution: Unfortunately, an antiderivative of $\sin(x^3)$ cannot be
expressed in elementary functions, and the fundamental theorem of
calculus cannot be used. One can always evaluate the integral by taking
the limit of the sequence of Riemann sums. An alternative solution
is due to a simple symmetry argument. Note that $\sin(x^3)$ is an odd
function, $\sin((-x)^3) = \sin(-x^3) = -\sin(x^3)$. The integration interval
is also symmetric, $[-\pi, \pi]$. Thus, by property (5.22),
\[ \int_{-\pi}^{\pi} \sin(x^3)\,dx = 0 . \]
□
Remark. In the previous example, take a partition of $[-\pi, \pi]$ by
points $x_k = k\,\Delta x$, $k = -N, -N + 1, ..., -1, 0, 1, ..., N - 1, N$, where
$\Delta x = \pi/N$. Consider the Riemann sum with sample points being the
midpoints. It is then straightforward to show that the Riemann sum
vanishes because $\sin\bigl( (x_{-k}^*)^3 \bigr) = -\sin\bigl( (x_k^*)^3 \bigr)$ for $k = 1, 2, ..., N$.
Contents
Chapter 1. Functions
1. Functions
2. Classes of Functions
3. Operations on Functions
4. Viewing the Graphs of Functions
5. Inverse Functions
6. The Velocity Problem and the Tangent Problem
Chapter 2. Limits and Derivatives
7. The Limit of a Function
8. Limit Laws
9. Continuous Functions
10. Limits at Infinity
11. Derivatives
12. The Derivative as a Function
Chapter 3. Rules of Differentiation
13. Derivatives of Polynomial and Exponential Functions
14. The Product and Quotient Rules
15. Derivatives of Trigonometric Functions
16. The Chain Rule
17. Implicit Differentiation
18. Derivatives of Logarithmic Functions
19. Applications of Rates of Change
20. Related Rates
21. Linear Approximations and Differentials
Chapter 4. Applications of Differentiation
22. Minimum and Maximum Values
23. The Mean Value Theorem
24. The First and Second Derivative Tests
25. Taylor Polynomials and the Local Behavior of a Function
26. L'Hospital's Rule
27. Analyzing the Shape of a Graph
28. Optimization Problems
29. Newton's Method
30. Antiderivatives
Chapter 5. Integration
31. Areas and Distances
32. The Definite Integral
33. The Fundamental Theorem of Calculus
34. Indefinite Integrals and the Net Change
35. The Substitution Rule
f (x) = x = x if 0 ≤ x. and income above $80. that is. ⎨ 0. which is shorter. Here the set D is called the domain of f . −x if x < 0. Instead of saying that f associates f (x) to x. income above $40. 1 to 5.000 is taxed at a rate of 15%. If the sets mentioned in the previous deﬁnition are sets of numbers.2(x − 80) if x > 80. 1 . it sends 0 to 3. There are times when the rules that apply in various cases are closely connected to each other.15(x − 40) if 40 < x ≤ 80. It could be that a function is deﬁned by cases such as ⎧ if 0 ≤ x ≤ 40.” The rule that describes f may be simple or complicated. This example could describe an income tax code. Let N be the set of all natural numbers (which are the nonnegative integers). The fact that f associates to each element of D an element of R is represented by the symbol f : D → R. The value of f (x) is the amount of tax to be paid after an income of x thousand dollars for any positive real number x.CHAPTER 1 Functions 1. then it is often easier to describe f by an algebraic expression. For instance. while the set R is called the range of f . the algebraic description is simpler than actually saying “f is the function that sends n to 2n + 3. 17 to 37.000 is taxed at a rate of 20%.000 but below $60. and so on. we often say that f sends x to f (x). The ﬁrst $40. f (x) = ⎩ 10 + 0.000 of income is taxed at a rate of 10%.1x 4 + 0. Then the function f : N → N given by the rule f (x) = 2x + 3 is the function that sends each nonnegative integer n to the nonnegative integer 2n + 3. A classic example is the absolute value function. Functions A function f is a rule that associates to each element x in a set D a unique element f (x) of another set R. In this case.
There are times when a plain English description of a function is simpler than an algebraic one. For instance, "let g be the function that sends each integer that is at least 2 into its largest prime divisor" is simpler than describing that function with algebraic symbols (and symbols of formal logic).

If the sets D and R are not sets of numbers, an algebraic description may not even be possible. An example of this is when D and R are both sets of people and f(x) is the biological father of person x. Then f is a function since it sends each x ∈ D into an element of R. Note that it is not by accident that we said that f(x) is the father (and not the son) of x. Indeed, a function must send x to a unique f(x). While a person has only one biological father, he or she may have several sons.

Sometimes the rule that sends x to f(x) can only be given by listing the value of f(x) for each x, as opposed to a general rule. For instance, let D be the set of 200 specific cities in the United States, let R be the set of all nonnegative real numbers, and for a city x, let f(x) be the amount of precipitation that x had in 2009. This function is given by its list of values, not by a rule that would specify how to compute f(x) if given x.

Finally, functions can also be represented by their graphs. If f : D → R is a function, then let us consider a two-dimensional coordinate system such that the horizontal axis corresponds to elements of D, and the vertical axis corresponds to elements of R. The graph of f is the set of all points with coordinates (x, f(x)) such that x ∈ D. The requirement that f(x) is unique for each x will ensure that no vertical line intersects the graph of f more than once. This is called the vertical line test.

For some functions, f(x) = f(−x) for all x. When that happens, we say that f is an even function. For instance, g(x) = cos x and h(x) = x^2 are even functions. There are also functions for which −f(x) = f(−x) holds for all x. Then we say that f is an odd function. Examples of odd functions include g(x) = sin x and h(x) = x^3.

2. Classes of Functions

2.1. Power Functions. A power function is a function f given by the rule f(x) = x^a, where a is a fixed real number. Note that the rule g(x) = 1 for all real numbers x also defines a power function, one in which a = 0. Note that x^(−a) = 1/x^a, so, for instance, x^(−3) = 1/x^3. The special case of a = −1, that is, the function f(x) = 1/x, is called the reciprocal function. If a = 1/n, where n is a positive integer, then the power function f given by the rule f(x) = x^a = x^(1/n), the n-th root of x, is also called a root function.

2.2. Polynomials. A polynomial function is the sum of a finite number of constant multiples of power functions with nonnegative integer exponents, such as the function f given by the rule f(x) = 3x^4 + 2x^2 + 7x − 5. The real numbers that multiply the power functions in a polynomial are called the coefficients of the polynomial. In the preceding example, they are 3, 2, 7, and −5. The largest exponent that is present in a polynomial function is called the degree of the polynomial. So the degree of f in the last example is 4. The domain of these functions is the set of all real numbers. Some subclasses of polynomial functions have their own names as follows:
• Polynomials of degree 0, such as f(x) = 6, are called constant functions.
• Polynomials of degree 1, such as g(x) = 3x − 2, are called linear functions.
• Polynomials of degree 2, such as h(x) = x^2 − 4x − 21, are called quadratic functions.
• Polynomials of degree 3, such as p(x) = x^3 − x^2 + 6x − 2, are called cubic functions.

2.3. Rational Functions. A rational function is the ratio of two polynomial functions such as

R(x) = (3x^2 + 4x − 7) / (x^3 − 8).

The domain of a rational function is the set of all real numbers, except for the numbers that make the polynomial in the denominator 0. In the last example, the only such number is x = 2.

2.4. Trigonometric Functions. The reader has surely encountered the trigonometric functions sin, cos, tan, cot, sec, and csc in earlier courses. We will discuss these functions, and their inverses, later in the text. For now, we mention one of their interesting properties, their periodicity. A function f is called periodic with period T > 0 if f(x) = f(x + T) for all x and T is the smallest positive real number with this property.
For example, sin and cos are both periodic with period 2π, and tan and cot are periodic with period π. The reader will be asked in Exercise 2.7(1) about the periodicity of sec and csc.

2.5. Algebraic Functions. An algebraic function is a function that contains only addition, subtraction, multiplication, division, and taking roots. For instance, power functions with integer exponents are algebraic functions, since they only use multiplication and division. Therefore, polynomials are algebraic functions as well since they are sums of constant multiples of power functions. This implies that rational functions are also algebraic since they are obtained by dividing a polynomial (also an algebraic function) by another one. The preceding list did not contain all algebraic functions since it did not contain any functions in which roots were involved. So we get additional examples if we include roots, such as the functions given by the rules f(x) = sqrt(x) + 3, g(x) = x^(1/3), and h(x) = sqrt((x + 1)/(x − 1)).

2.6. Transcendental Functions. Functions that are not algebraic are called transcendental functions. These include trigonometric functions and their inverses, exponential functions, which are functions that contain a variable in the exponent, such as f(x) = 2^x, and their inverses, which are called logarithmic functions. We will discuss these functions in later sections of this chapter. There are many additional examples, which do not have their own names.

2.7. Exercises.
(1) Are sec and csc periodic functions? If yes, what is their period?
(2) Let f(x) = x^(2/3). Is f an algebraic function?

3. Operations on Functions

3.1. Transformations of a Function. We have seen the basic mathematical functions and their graphs in the last section. In this section, we will look at their transformations.

It is easy to see what happens to the graph of a function if we increase or decrease each value of a function by a constant. For instance, the graph of the function g given by g(x) = f(x) + 5 for all x is simply the graph of the function f translated by five units to the north. Similarly, the graph of the function h given by h(x) = f(x) − 7 is the graph of f translated by seven units to the south.

Horizontal translations are a little bit trickier. The reader is invited to verify that if g is the function given by g(x) = f(x − 2), then the graph of g is the graph of f translated by two units to the east, that is,
in the positive direction. Indeed, g(8) = f(6), that is, we must substitute a larger number into g to get the same value as from f. Similarly, if h is the function given by h(x) = f(x + 3) for all x, then the graph of h is the graph of f translated by three units to the west, that is, in the negative direction.

The effect of multiplication and division on functions can be described similarly. If f is a function and g is the function given by g(x) = c · f(x), where c > 1 is a real number, then the graph of g is simply the graph of f "stretched" vertically by a factor of c. That is, each point on the graph of g is c times as far away from the horizontal axis as the corresponding point on the graph of f. It goes without saying that dividing by c > 1 has the opposite effect. That is, if h(x) = f(x)/c, then the graph of h is a vertically compressed version of the graph of f. In other words, each point on the graph of h is c times as close to the horizontal axis as the corresponding point on the graph of f.

At this point, the reader should stop and think about what happens if c < −1 is a negative constant. As the reader probably figured out, the stretching or compressing effect will not change (it will only depend on |c|), but each point on the graph will be reflected through the horizontal axis.

Horizontal transformations involving multiplication and division are similar to their counterparts involving addition and subtraction in that their effect is the opposite of what one might think at first. If c > 1 and g is the function obtained from f by the rule g(x) = f(cx), then the graph of g is the graph of f compressed horizontally by a factor of c. Indeed, if (x, y) is a point on the graph of f, then (x/c, y) is a point on the graph of g. In other words, each point on the graph of g is c times as close to the vertical axis as the corresponding point on the graph of f. On the other hand, if h is obtained from f by the rule h(x) = f(x/c), then the graph of h is a horizontally stretched version of the graph of f. So if (x, y) is a point on the graph of f, then (cx, y) is a point on the graph of h. That is, each point on the graph of h is c times as far from the vertical axis as the corresponding point on the graph of f. Again, the reader should stop for a minute and think about the graphs of the functions f(cx) and f(x/c) when c < −1 is a negative constant.

3.2. Combining Two Functions. If f and g are two functions, then their sum, difference, and product are defined wherever both f and g are defined. That is, the domain of f + g, f − g, and fg is the intersection of the domains of f and g. Furthermore, (f + g)(x) = f(x) + g(x),
(f − g)(x) = f(x) − g(x), and (fg)(x) = f(x)g(x). We have to be just a little bit more careful with f/g, since this function is not defined when g(x) = 0, even if x is in the domain of both f and g. So the domain of f/g is the intersection of the domain of f and the domain of g, with the exception of the points x satisfying g(x) = 0. For each point of this domain, (f/g)(x) = f(x)/g(x).

If the range of f is part of the domain of g, then we can compose f and g by first applying f and then g. The function we obtain in this way sends x to g(f(x)) and is called the composition of f and g. It is denoted by g ◦ f. Note that in g ◦ f, first f, and then g is applied. Note that f ◦ g and g ◦ f are, in general, different functions.

Example 1. Let R be the set of all real numbers. If f and g are both functions from R to R and f(x) = x^2 and g(x) = x + 1, then (g ◦ f)(x) = g(f(x)) = x^2 + 1, while (f ◦ g)(x) = f(g(x)) = (x + 1)^2 = x^2 + 2x + 1.

3.3. Exercises.
(1) Sketch the graph of f(x) = x^2, g(x) = (x − 3)^2, and h(x) = (2x + 5)^2.
(2) Sketch the graph of f(x) = cos 2x, g(x) = sin(x − 2), and h(x) = 3 tan x.
(3) Show examples for f and g when g ◦ f is defined for all real numbers, but f ◦ g is not.
(4) Show examples when f ◦ g = g ◦ f.

4. Viewing the Graphs of Functions

The graph of a function f is the set {(x, f(x)) | x ∈ D(f)}. It is a good way of visually describing what a function does. Today, we have plenty of advanced tools, such as computer software packages and graphing calculators, to study the graph of functions. In this section, we point out a few of the common mistakes in using these tools.

In order to facilitate the discussion, let us agree on some terminology. If the domain of f contains an interval I and for all real numbers x and x' in I, it is true that x < x' implies f(x) < f(x'), then we say that f is increasing on I. Visually, this means that the graph of f goes roughly from the southwest to the northeast while x ∈ I. Similarly, if, for all real numbers x and x' in I, it is true that x < x' implies f(x) > f(x'), then we say that f is decreasing on I. In terms of the graph of f, this means that the graph goes roughly from the northwest to the southeast.
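Returning briefly to Example 1 of Section 3.2, the order of composition can be made concrete in a few lines of Python. This is a minimal sketch; the helper name `compose` is an assumption for illustration and is not notation from the text.

```python
def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


def f(x):
    return x ** 2


def g(x):
    return x + 1


g_after_f = compose(g, f)   # (g o f)(x) = x^2 + 1
f_after_g = compose(f, g)   # (f o g)(x) = (x + 1)^2

if __name__ == "__main__":
    for x in (-2, 0, 1, 3):
        print(x, g_after_f(x), f_after_g(x))
```

The two compositions agree at x = 0 but differ elsewhere (for example at x = 3), which is what "in general, different functions" means.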
If we simply ask a computer or graphing calculator to plot the graph of a function without specifying the interval [x1, x2] in which the value of x can range, we may get an error message, or the computer may simply substitute default values for x1 and x2. For example, the software package Maple 13 uses the default values x1 = −10 and x2 = 10. The interval [x1, x2] is often called the viewing window. We have to be careful, however, since not all viewing windows are appropriate for all functions, and choosing an inappropriate viewing window may cause misleading results.

For functions like f(x) = x, g(x) = |x|, or h(x) = x^2 + 3, the viewing window [−10, 10] is appropriate as the behavior of these functions outside that window is similar to their behavior inside the window.

Now let f(x) = (x + 10)^2. Using the default viewing window [−10, 10], we get the graph of an increasing function. That is misleading since f is decreasing on the interval (−∞, −10]. So, in this case, a viewing window that starts at a point x1 < −10 is necessary.

The preceding example showed why selecting a viewing window that is too small can be misleading. The next example shows why a viewing window that is too large can also mislead us. Plot the graph of the function g(x) = 4x^3 + 9x^2 + 6x + 1, using the viewing window [−10, 10], or some window containing that one. In this case, many software packages will show a graph that increases everywhere and disappears in a small interval to the left of 0. This should raise our suspicion that the program does not properly display the graph of g around 0. Indeed, g is defined for all real numbers, so its graph should not disappear anywhere. Taking a closer look, that is, changing the viewing window to [−1, 1], we see a function that is actually decreasing between x = −1 and x = −1/2.

This problem becomes more difficult if we are dealing with functions that change from increasing to decreasing many times, perhaps in an irregular fashion and perhaps far away from the origin. For this reason, it is worth noting that if f is a polynomial function of degree n, then it cannot change directions more than n − 1 times. If we found all n − 1 direction changes, then we can be sure that we did not miss any of them. We will return to this topic in a later chapter, when we discuss the derivative of a function.

Trigonometric functions, with their periodicity, are particularly good examples to demonstrate what software packages can and cannot do.
tan(x/4). while f 1 sends y back to x. no horizontal line can intersect the graph of a onetoone function more than once. but h(x) = x2 is not. If there is more than one solution.8 1. and only in that case. sin(1/x) and explain the obtained graphs. Onetoone functions are also called injective functions or injections. and so it has no inverse function. If A is the set of positive real numbers. Let A and B both be the set of all real numbers. As x = f −1 (y). for sin(1/x). ﬁnally. Definition 1. Visually. then f −1 (y) = x. it is clear that f −1 (y) = x. Example 2. then f is not onetoone. B is the set of real numbers that are larger than 1. then that expression is the value of f −1 (y). that is. the choice of the viewing window is not important as long as it contains x = 0. 2 The preceding example shows a general strategy for ﬁnding the inverse of a function. That is. then f (x) = x and g(x) = x3 are both onetoone. when there is only one x ∈ A so that f (x) = y. If there is one solution. that is. so f sends x to y. In particular. then f −1 (y) = y − 1. and. the reader should try to explain why. Let f : A → B be given by f (x) = 2x + 7. if f (x) = y. Inverse Functions The inverse f −1 of a function f : A → B “undoes” what f did. Definition 2. Then the inverse of f is the function f −1 : B → A given by f −1 (y) = x if f (x) = y. it follows that f −1 (y) = (y − 7)/2. It goes without saying that this f −1 will only be a function if f −1 (y) is unambiguous. Then solve for x. Let us now formalize these concepts. then y = 2x + 7. Example 3. FUNCTIONS cos 2x. Then f −1 (y) = (y − 7)/2. so y − 7 = 2x and (y − 7)/2 = x. In that case. A function f : A → B is called onetoone if it sends diﬀerent elements into diﬀerent elements. and f : A → B is given by √ f (x) = x2 + 1. . Let f be a onetoone function with domain A and range B. with the appropriate algebraic expression replacing f (x). Write the equation f (x) = y. Solution: If f (x) = y. 5. 
if A and B are both the set of real numbers. if x = x implies that f (x) = f (x ). For instance.
Solution: We have f(x) = x^2 + 1 = y. So x^2 = y − 1, and because we know that x is positive and y > 1, we can take the square root of both sides, leading to x = sqrt(y − 1). Hence, f^(−1)(y) = sqrt(y − 1).

Finally, we point out that if f is a one-to-one function with domain A and range B, then f^(−1) ◦ f is the identity function of A and f ◦ f^(−1) is the identity function of B. For instance, using the functions of Example 3, for all positive real numbers x, the identity (f^(−1) ◦ f)(x) = sqrt((x^2 + 1) − 1) = sqrt(x^2) = x holds, and for all y > 1, the identity (f ◦ f^(−1))(y) = (sqrt(y − 1))^2 + 1 = y − 1 + 1 = y holds.

5.1. Logarithmic Functions. If a function contains only additions, subtractions, multiplications, and divisions, then its inverse is often easy to compute. Power functions, that is, functions of the form f(x) = x^α, where α is a real number, are not much more difficult. However, what is the inverse of an exponential function?

Let f(x) = 2^x. It is easy to see, by plotting the graph of f or otherwise, that f is a one-to-one function whose domain is the set of all real numbers and whose range is the set of all positive real numbers. So the inverse of f is a function from the set of positive reals to the set of all reals. But what is that inverse function f^(−1)? By the definition of inverse functions in general, this is the function that sends 2^x to x for all positive real numbers 2^x. In particular, f^(−1)(2) = 1, f^(−1)(4) = 2, f^(−1)(32) = 5, and f^(−1)(1/2) = −1. That is, f^(−1)(y) tells us to what power we have to raise 2 if the result is to be y. This important concept has its own name.

Definition 3. Let m be a positive real number different from 1. Then the inverse of the function f(x) = m^x is called the logarithmic function with base m, and is denoted by log_m.

So if f(x) = m^x = y, then log_m(y) = x. For instance, log_2(64) = 6, log_3(81) = 4, log_5(1/25) = −2, and log_0.5(16) = −4.

Logarithmic functions satisfy certain rules that are very similar to those satisfied by exponential functions and can, in fact, be deduced from them. These are
(I) log(xy) = log x + log y,
(II) log(x/y) = log x − log y,
(III) log(x^a) = a log x,
(IV) log(x^(1/b)) = (log x)/b,
(V) a^(log_a x) = x,
(VI) log_a(a^x) = x.
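Rules (I) through (VI) can be spot-checked numerically. The Python sketch below does so for one arbitrary choice of values; the sample numbers and the helper name `log_base` are assumptions for illustration, and a numerical check of this kind is of course not a proof.

```python
import math


def log_base(base, value):
    """Logarithm of value in the given base (wraps math.log(value, base))."""
    return math.log(value, base)


# Arbitrary positive sample values, chosen only for illustration.
x, y, a, b = 8.0, 2.0, 3.0, 4.0

checks = {
    "I":   math.isclose(math.log(x * y), math.log(x) + math.log(y)),
    "II":  math.isclose(math.log(x / y), math.log(x) - math.log(y)),
    "III": math.isclose(math.log(x ** a), a * math.log(x)),
    "IV":  math.isclose(math.log(x ** (1 / b)), math.log(x) / b),
    "V":   math.isclose(a ** log_base(a, x), x),
    "VI":  math.isclose(log_base(a, a ** x), x),
}

if __name__ == "__main__":
    for rule, holds in checks.items():
        print(rule, holds)
```

Comparisons use `math.isclose` rather than `==` because floating-point logarithms are only approximate.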
The last two rules simply express the fact that the functions f(x) = a^x and f^(−1)(y) = log_a(y) are inverses of each other, so their composition is an identity function.

If we know the logarithm of a number in a base and want to compute it in another base, we can do so using the following theorem.

Theorem 1. For positive real numbers a, b, and x, with a and b different from 1, we have

log_a x = log_b(x) / log_b(a).

Proof. Start with the identity x = a^(log_a x). Now take the logarithm of base b of both sides and apply rule (III) to get log_b x = log_a x · log_b a. Now divide both sides by log_b a to get the identity of the theorem.

Example 4. We can use Theorem 1 to compute log_16(256) from log_2(256) as follows:

log_16(256) = log_2(256) / log_2(16) = 8/4 = 2.

So if a calculator or computer can provide the logarithm of all positive real numbers in one base, we can compute the logarithm of any positive real number in any base. For this reason, many calculators and computers are programmed to work primarily with logarithms of one given base, namely of base e, where e ≈ 2.718 is an irrational number that will be formally defined in Chapter 2. The logarithm of base e is so important that it has its own name, natural logarithm, and its own notation, ln. So ln x = log_e x.

5.2. Inverses of Trigonometric Functions. Basic trigonometric functions, such as sin, cos, and tan, are very important in calculus, so it is no surprise that their inverse functions are important as well. However, we have to be precise when we define them since trigonometric functions are not one-to-one. In fact, they are periodic, of period 2π or π, and so they take every value in their range infinitely often. In order to get around this difficulty, we will restrict our trigonometric functions to just a short interval, in which they are one-to-one, and define their inverses based on that restriction.

For instance, consider sin as a function whose domain is [−π/2, π/2], and its range is the interval [−1, 1]. In that interval, sin is a one-to-one function (since it is increasing). So its inverse is the function
sin^(−1) : [−1, 1] → [−π/2, π/2]. That is, if y ∈ [−1, 1], then sin^(−1) y is the (only) x ∈ [−π/2, π/2] for which sin x = y. For instance, sin^(−1)(1/2) = π/6, while sin^(−1)(0) = 0 and sin^(−1)(sqrt(2)/2) = π/4.

The inverses of the other trigonometric functions are defined similarly, just the intervals to which we restrict the functions (in order to make them one-to-one) can change. That is, cos^(−1) is the inverse function of the cos function that is restricted to the interval [0, π]. So cos^(−1) is a function with domain [−1, 1] and range [0, π]. Similarly, tan^(−1) is the inverse function of the tan function that is restricted to the interval (−π/2, π/2). Its domain is the set of all real numbers, and its range is the interval (−π/2, π/2). The inverse functions of cot, sec, and csc, while not used often, can also be defined analogously.

5.3. Exercises.
(1) Is there a function f defined on all positive real numbers for which f^(−1) = f?
(2) If we are given log_a x, how can we compute log_(1/a)(x)?
(3) For which values of a is log_a an increasing function, and for which values of a is it a decreasing function?
(4) What is the geometric connection between the graphs of f and f^(−1)?
(5) Is it true that if g is the inverse function of the one-to-one function f, then g is one-to-one?

6. The Velocity Problem and the Tangent Problem

6.1. The Velocity Problem. Let us assume that a car was on the road from 3:00 p.m. to 5:00 p.m. on a given afternoon, and it traveled a distance of 100 miles, all due west. From the data, it is easy to compute the average speed of the car by the formula

(1.1)    v = s/t,

where t is the time passed, s is the distance covered in time t, and v is the average speed for the given time period. In physics, when the direction in which an object is moving is taken into account, we talk about velocity instead of speed, hence the abbreviation v. In the given example, all travel was in one direction (west), so there is no danger of confusion, and we can use either word. Let us assume that time is measured in hours and distance is measured in miles.
Then Equation (1.1) yields

v = 100 mi / 2 hr = 50 mi/hr,

so the average velocity of the car for the given two-hour period is 50 miles per hour.

The car probably did not cover the entire distance at its average velocity. For various traffic-related or other reasons, it sometimes may have gone faster or slower. If we want to know its average velocity for the time period between 4:00 p.m. and 4:10 p.m., then we need to know the distance it covered in that time period. If that distance is 10 miles, then we conclude that in that 10-minute time period, the average velocity of the car was

v = 10 mi / (1/6 hr) = 60 mi/hr.

If we want more precise information, like the average velocity of the car between 4:02 p.m. and 4:05 p.m., we can proceed similarly.

However, what if we want to know the instantaneous velocity of the car in a given moment, such as exactly at 4:02:23 p.m. (and not in the second that passed between 4:02:23 p.m. and 4:02:24 p.m.)? In that case, a direct application of Equation (1.1) is impossible, because the denominator t is equal to 0. The numerator s is also equal to 0, since the car needs time to cover any distance; if it is given no time, it will cover no distance.

In this section, we will not give a completely formal answer to the problem of defining instantaneous velocity; we will leave that task to an upcoming section. However, we will say the following. The instantaneous velocity of a car in a given moment m can be approximated by choosing smaller and smaller time periods containing m and computing the average speed of the car for those time periods, decreasing the value of both the numerator and the denominator of the fraction s/t. These averages will approximate the instantaneous velocity.

6.2. The Tangent Problem. The problem of finding the instantaneous velocity of a moving object is simply a special case of a much more general problem, that of finding the slope of a tangent line to a curve at a given point.

In the previous problem, the distance the car covered can be viewed as a function of the time that passed since the car started moving. So s(t) is the distance covered from the moment when the car started moving to the moment t hours later. In order to compute the average
velocity for the time period from t1 to t2, we simply compute the value of the fraction

(s(t2) − s(t1)) / (t2 − t1).

This fraction is precisely the slope of the line that intersects the graph of the function s at points (t1, s(t1)) and (t2, s(t2)). If we choose t1 and t2 closer and closer together, then these points will get closer and closer together as well. Finally, if we set t1 = t2, then we will not immediately know the slope of the line that touches the graph of s at the point (t1, s(t1)) since we will know only one, not two, points of this line. However, the slope we are looking for will be approximated by the sequence of slopes of the lines that we got when we chose t1 and t2 closer and closer together, and this will be made more precise in the next section.

Finally, we point out that there is nothing magical about the function s(t) here. We could consider any function f : R → R, and ask what the slope of the tangent line to this curve is at the point (x, f(x)).

6.3. Exercises.
(1) A car travels one hour at a speed of 60 miles per hour, then two hours at a speed of 45 miles per hour. What is the average speed of the car during this three-hour period?
(2) Consider the car of the previous exercise. What is its average speed during the first two hours of its trip?
(3) I drove at 40 miles per hour for two hours. How fast do I have to drive in my third hour if I want to reach an average speed of 45 miles per hour for my three-hour drive?
(4) Consider the function f(x) = x^2. Can you find two points P and Q on the graph of f such that the slope of the line PQ is between 0 and 0.01?
(5) Consider the function g(x) = x^3. Let P = (1, 1). Can you find a point Q on the graph of g such that the slope of the line PQ is between 1 and 1.01?
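The approximation idea of Sections 6.1 and 6.2 can be tried out numerically. The Python sketch below uses a hypothetical position function s(t) = 30t^2 + 20t (miles after t hours); this function does not come from the text and was chosen only because its average velocities are easy to check by hand. It computes the average velocity over shorter and shorter time periods starting at t = 1.

```python
def s(t):
    """Hypothetical distance (miles) covered after t hours; an assumption
    made for illustration, not a function from the text."""
    return 30 * t**2 + 20 * t


def average_velocity(position, t1, t2):
    """Slope of the line through (t1, position(t1)) and (t2, position(t2))."""
    if t1 == t2:
        raise ValueError("t1 and t2 must differ; the fraction would be 0/0")
    return (position(t2) - position(t1)) / (t2 - t1)


if __name__ == "__main__":
    for h in (1, 0.1, 0.01, 0.001):
        print(h, average_velocity(s, 1, 1 + h))
```

For this particular s, the average velocity over [1, 1 + h] works out to 80 + 30h, so the printed values settle toward 80 miles per hour as h shrinks. Setting h = 0 directly is rejected, mirroring the observation that Equation (1.1) cannot be applied to a single moment.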
CHAPTER 2

Limits and Derivatives

7. The Limit of a Function

7.1. Two-Sided Limits. Consider the function given by the rule f(x) = 1/(1 + x). Let us compute the values of f(x) for various real numbers x that are close to 0. We find that
• f(1) = 1/2,
• f(1/2) = 2/3,
• f(1/3) = 3/4,
and, in general,
• f(1/n) = n/(n + 1).

Similarly, for negative values of x, we get
• f(−1/2) = 2,
• f(−1/3) = 3/2,
• f(−1/4) = 4/3,
and, in general,
• f(−1/n) = n/(n − 1).

What we see is that if x gets close to 0 (from either side), then f(x) gets close to f(0) = 1. In fact, we can get f(x) to be as close to f(0) = 1 as we want; all we need to do is to choose x sufficiently close to 0. Indeed, looking at the previous examples, we conclude that if 0 < x < 1/n, then n/(n + 1) < f(x) < 1, and if −1/n < x < 0, then 1 < f(x) < n/(n − 1). So for instance, if we want f(x) to be closer than 1/1000 to 1, then any choice of x in the interval [0, 1/999) or any choice of x in the interval (−1/1001, 0] will work. That is, any choice of x in the interval (−1/1001, 1/999) will imply that |f(x) − f(0)| < 0.001.

This phenomenon, that is, the fact that there exists an interval such that, for each real number in that interval, the value of f(x) is closer to f(0) than a prescribed bound, is so important in mathematics that it has its own name.

Definition 4. Let f : R → R be a function and let a be a real number. We say that the limit of f in a is the real number L if the values of f(x) get arbitrarily close to L and stay arbitrarily close to L when x is suitably close to a without being equal to a.

The fact that the limit of f in a is L is expressed by the notation

lim_{x→a} f(x) = L.
So, if f is the starting example of this section, then lim_{x→0} f(x) = 1, since we have seen at the beginning of this chapter that, as x approaches 0, the values of f(x) will get arbitrarily close to 1.

Several comments are in order. First, if lim_{x→a} f(x) exists, it is unique, that is, f cannot have two different limits at any given point a. Let us illustrate this using the introductory example of this section, the function f(x) = 1/(1 + x). We have seen that lim_{x→0} f(x) = 1. At this point, one could ask the following question. If 1 satisfies the requirements to be the lim_{x→0} f(x), why does 1.0001 not? After all, what is close to 1 is also close to 1.0001.

In order to answer this question, we must have a good understanding of the definition of limits. That definition says that if lim_{x→0} f(x) = L, then the values of f(x) will get arbitrarily close to L if x is chosen from a suitably small interval around 0. The key word in the previous sentence is arbitrarily. While 1.0001 is close to 1, it is not arbitrarily close to 1; it is exactly 0.0001 away. Indeed, we saw that the values of f(x) can get arbitrarily close to 1 if the real numbers x are chosen from a suitably small interval around 0. In particular, if x is close enough to 0, then f(x) will be closer than 1/10^6 to 1, but then it cannot also be closer than 1/10^6 to 1.0001. An analogous argument shows that no function can have two different limits at any one point.

Second, lim_{x→a} g(x) does not always exist. Note that the definition of lim_{x→a} f(x) requires that f(x) stay close to L when x is close to a, regardless of which of x or a is larger. That is, f(x) has to be close to L if x is a little bit less than a, and f(x) has to be close to L if x is a little bit more than a, though f(x) does not have to be close to L if x = a.

Example 5. Let

g(x) = 1    if 0 ≤ x,
g(x) = 0    if x < 0.

Then the limit of g at a = 0 does not exist. Indeed, no matter how small an interval I we take around the point a = 0, that interval I will contain some positive and some negative real numbers. Hence, the values of g(x) will sometimes equal 1 and sometimes equal 0 for x ∈ I, no matter how small I is. And that is a problem. There is no number L such that both 0 and 1 are arbitrarily close to it; in fact, there is no number such that both 0 and 1 are closer than 0.5 to it. So lim_{x→0} g(x) does not exist.
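The contrast between f(x) = 1/(1 + x) and the step function g of Example 5 can be seen numerically. The Python sketch below measures how far the values of a function stray from a candidate limit on sample points near 0; the helper name `max_distance`, the radii, and the sample count are assumptions for illustration, and sampling can suggest but never prove a limit.

```python
def f(x):
    return 1 / (1 + x)


def g(x):
    return 1 if x >= 0 else 0


def max_distance(func, target, radius, samples=1000):
    """Largest |func(x) - target| over sample points with 0 < |x| <= radius."""
    worst = 0.0
    for i in range(1, samples + 1):
        h = radius * i / samples
        for x in (-h, h):               # look on both sides of 0
            worst = max(worst, abs(func(x) - target))
    return worst


if __name__ == "__main__":
    for radius in (0.1, 0.01, 0.0009):
        print(radius, max_distance(f, 1, radius))
    for candidate in (0, 0.5, 1):
        print(candidate, max_distance(g, candidate, 1e-6))
```

For f, the worst distance from 1 shrinks along with the radius. For g, every candidate limit is at least 0.5 away from some sampled value, however small the radius, which matches the argument in Example 5.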
Sometimes it can happen that h is not even defined in a, but lim_{x→a} h(x) still exists.

Example 6. Let h(x) = (x^2 − 9)/(x − 3). Then h is defined for all real numbers except x = 3. Still, lim_{x→3} h(x) exists.

Solution: If x ≠ 3, then

h(x) = (x^2 − 9)/(x − 3) = (x + 3)(x − 3)/(x − 3) = x + 3.

So if we want h(x) = x + 3 to be closer to 6 than a given distance d, then all we have to do is to choose x such that 0 < |x − 3| < d. In particular, lim_{x→3} h(x) = 6. Note that the fact that h(3) is not defined is not a problem since the definition of limits specifically states that x should not be equal to a anyway.

At this point, the reader should test his or her understanding of the material by finding lim_{x→−2} ((x^2 + 3x + 2)/(x + 2)).

Sometimes, limits are not easy to determine. Plotting the graph of the function h(x) = (sin x)/x, we are led to believe that

lim_{x→0} (sin x)/x = 1.

However, we have not yet learned the techniques to rigorously prove this. Plotting the graph of the function or producing more numerical data should not be considered as a complete answer. In particular, as x approaches 0, eventually x and sin x will get so small that the computer will no longer manipulate them, or their ratio, accurately.

Finally, we point out that in the definition of the limit, the requirement that f(x) get close to L and stay close to L is important. Consider the function f(x) = sin(1/x) around x = 0. As x approaches 0, the value of 1/x will increase very fast, and so it will equal a multiple of π many times. All those times, f(x) = 0 will hold, so f(x) will be as close to 0 as possible. However, lim_{x→0} f(x) does not exist, since f(x) will take all other values in the interval [−1, 1] infinitely often as well as x approaches 0. So the value of f(x) will not stay arbitrarily close to 0, no matter how close x is to 0.

7.2. The Precise Definition of Limits. It is time for us to give a precise mathematical definition of limits. The advantage of this formal definition is that we can finally do away with the words arbitrarily close and sufficiently close. The price to pay for that is that we have to use more notation.
Definition 5. Let f be a function defined on some open interval that contains the real number a, with the possible exception of a itself. Then we say that the limit of f at a is L, denoted by lim_{x→a} f(x) = L, if, for all ε > 0, there exists δ > 0 such that if 0 < |x − a| < δ, then |f(x) − L| < ε.

Example 7. We have lim_{x→0} 2x sin x = 0.

Solution: Let ε be any positive real number. Then let δ = ε/2. We know that |sin x| ≤ 1 for all x. So if |x − 0| = |x| < δ = ε/2, then

|f(x) − 0| = |f(x)| = |2x sin x| ≤ 2|x| < 2δ = ε,

as required.

7.3. One-Sided Limits. There are functions that behave in a certain way up to a point a, and then behave very differently after that. We have seen such a function in Example 5. The function g of that example satisfied g(x) = 0 for negative values of x, and g(x) = 1 for positive values of x. We have seen that lim_{x→0} g(x) does not exist, since no real number L is arbitrarily close to both 0 and 1. Nevertheless, there are weaker, one-sided notions of limits that are relevant in this example.

Definition 6. Let f : R → R be a function and let a be a real number. We say that the left-hand limit of f in a is the real number L if the values of f(x) get arbitrarily close to L and stay arbitrarily close to L when x is suitably close to a and x < a.

The fact that L is the left-hand limit of f in a is denoted by

lim_{x→a−} f(x) = L.

For instance, if g is the function defined in Example 5, then

lim_{x→0−} g(x) = 0.

Indeed, if we choose x close to 0 but less than 0, then g(x) = 0, so g(x) is arbitrarily close (in fact, equal) to 0.

Definition 7. Let f : R → R be a function and let a be a real number. We say that the right-hand limit of f in a is the real number L if the values of f(x) get arbitrarily close to L and stay arbitrarily close to L when x is suitably close to a and x > a.

The fact that L is the right-hand limit of f in a is denoted by

lim_{x→a+} f(x) = L.

For instance, if g is the function defined in Example 5, then

lim_{x→0+} g(x) = 1.
Indeed, if we choose x close to 0 but more than 0, then g(x) = 1, so g(x) is arbitrarily close (in fact, equal) to 1.

At this point, the reader should compare the definitions of limit, left-hand limit, and right-hand limit. The definition of limit (Definition 4) imposes the strongest requirements on the values of f. Indeed, the values of f(x) have to be close to L when x is close to a and x < a and also when x is close to a and x > a. The definitions of the left-hand and right-hand limits impose weaker requirements in that each definition only requires that f(x) be close to L when x is on a given side of a and close to a.

It then follows (and the reader should spend a minute verifying it) that if $\lim_{x\to a} f(x) = L$, then $\lim_{x\to a^-} f(x) = L$ and $\lim_{x\to a^+} f(x) = L$. Conversely, if both the left-hand limit and the right-hand limit of f at a are equal to L, then the limit of f at a exists and is equal to L.

At this point, the reader should check his or her understanding of the material by considering the function $h(x) = |x|/x$ as x approaches 0 and deciding if the limits $\lim_{x\to 0} h(x)$, $\lim_{x\to 0^-} h(x)$, and $\lim_{x\to 0^+} h(x)$ exist.

7.4. Infinite Limits. In our definitions of limits so far, the limit L was always a real number. In this section, we extend those definitions to the cases of infinite limits. If L = ∞, then the values of f have to get arbitrarily close to ∞, that is, they have to get as large as we want. This is the content of the following definition.

Definition 8. Let f : R → R be a function. We say that the limit of f at a is ∞ if we can get f(x) arbitrarily large and keep it arbitrarily large if we choose x suitably close to a without being equal to a. The fact that the limit of f at a is ∞ is denoted by
\[ \lim_{x\to a} f(x) = \infty. \]
Similarly, if g : R → R is a function, we say that the limit of g at a is −∞ if we can make g(x) a negative number with an arbitrarily large absolute value and keep g(x) that way if we choose x suitably close to a without being equal to a.
Example 8. Let $f(x) = 1/x^2$. Then $\lim_{x\to 0} f(x) = \infty$.

Solution: If we want f(x) to be larger than an arbitrary positive real number N, all we need to do is to choose a nonzero x from the interval $(-\sqrt{1/N}, \sqrt{1/N})$. Then $x^2 < 1/N$ will hold, implying that $f(x) = 1/x^2 > N$. □

Similarly, if $g(x) = -1/x^4$, then $\lim_{x\to 0} g(x) = -\infty$.

Note that if the limit of a function at a given point a is ∞ or −∞, then, as x approaches a, the graph of the function will approach a vertical line intersecting the horizontal axis at x = a. This phenomenon is referred to by saying that f has a vertical asymptote at a.

7.4.1 The Precise Definition of Infinite Limits. The formal definition of infinite limits is similar to that of finite limits. The difference lies in the fact that it is not the same to be close to ∞ or to be close to a real number, as we can see in the following definition.

Definition 9. Let f : R → R be a function and let a be a real number. We say that the limit of f at a is ∞ if, for all positive real numbers N, there exists $\epsilon > 0$ such that if $0 < |x - a| < \epsilon$, then f(x) > N. Similarly, let g : R → R be a function. We say that the limit of g at a is −∞ if for all negative real numbers M, there exists $\epsilon > 0$ such that if $0 < |x - a| < \epsilon$, then g(x) < M.

7.4.2 One-Sided Infinite Limits. One-sided infinite limits are defined in an analogous way.

Definition 10. Let f : R → R be a function. We say that the left-hand limit of f at a is ∞ if the values of f(x) get arbitrarily large and stay arbitrarily large when x is suitably close to a and x < a. Similarly, we say that the right-hand limit of f at a is ∞ if the values of f(x) get arbitrarily large and stay arbitrarily large when x is suitably close to a and x > a.

Example 9. Let f(x) = 1/x. Then f is not defined at 0. Furthermore, $\lim_{x\to 0^-} f(x) = -\infty$ and $\lim_{x\to 0^+} f(x) = \infty$. As the two one-sided limits are different, $\lim_{x\to 0} f(x)$ does not exist.

Solution: We can make f(x) = 1/x smaller than any given negative number M by choosing x from the interval (1/M, 0). Similarly, we can make f(x) larger than any positive number P by choosing x from the interval (0, 1/P). □
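The one-sided behavior of f(x) = 1/x near 0 can also be observed numerically. The following is a minimal sketch, not a proof: the step sizes, the bounds P and M, and the sample points x_right and x_left are arbitrary choices made for illustration only.

```python
def f(x):
    """The function f(x) = 1/x, which is not defined at x = 0."""
    return 1.0 / x

# Approach 0 from the right and from the left with shrinking step sizes.
steps = [10.0 ** (-k) for k in range(1, 7)]
right_values = [f(h) for h in steps]     # x > 0, x -> 0+
left_values = [f(-h) for h in steps]     # x < 0, x -> 0-

for h, r, l in zip(steps, right_values, left_values):
    print(f"h = {h:.0e}   f(h) = {r:.1e}   f(-h) = {l:.1e}")

# Intervals of the kind used in the solution (illustrative bounds):
# on (0, 1/P) we expect f(x) > P, and on (1/M, 0) we expect f(x) < M.
P, M = 1000.0, -1000.0
x_right = 0.5 / P    # a sample point in (0, 1/P)
x_left = 0.5 / M     # a sample point in (1/M, 0)
print(f(x_right) > P, f(x_left) < M)
```

The printed values grow without bound in opposite directions on the two sides of 0, which is consistent with the two one-sided limits being ∞ and −∞.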
7.5. Exercises.
(1) Find $\lim_{x\to 3} \frac{x^2 - 4x + 3}{x - 3}$.
(2) Does $\lim_{x\to 0} \frac{1}{x}$ exist?
(3) Give an example of a function f such that $\lim_{x\to 0^-} f(x) = 0$ and $\lim_{x\to 0^+} f(x) = \infty$.
(4) Does $\lim_{x\to 0} \left( \frac{1}{x^3} + \frac{1}{x^2} \right)$ exist?
(5) Give an example of a function f such that $\lim_{x\to 1^-} f(x) = \infty$, $\lim_{x\to 1^+} f(x) = -\infty$, and f(1) is a real number.

8. Limit Laws

8.1. Basic Limit Laws. If f and g are two functions and we know the limit of each of them at a given point a, then we can easily compute the limit at a of their sum, difference, product, constant multiple, and quotient. The rules that provide this limit are given below, and they are very similar to the ways in which the sum, difference, product, constant multiple, and quotient of two functions are defined.

(I) $\lim_{x\to a} (f + g)(x) = \lim_{x\to a} f(x) + \lim_{x\to a} g(x)$,
(II) $\lim_{x\to a} (f - g)(x) = \lim_{x\to a} f(x) - \lim_{x\to a} g(x)$,
(III) $\lim_{x\to a} (f \cdot g)(x) = \lim_{x\to a} f(x) \cdot \lim_{x\to a} g(x)$,
(IV) $\lim_{x\to a} (c \cdot f)(x) = c \cdot \lim_{x\to a} f(x)$, where c is a real number, and
(V) $\lim_{x\to a} \left( \frac{f}{g} \right)(x) = \frac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)}$ if $\lim_{x\to a} g(x) \neq 0$.

It is not difficult to believe that these rules are valid. For instance, if f(x) gets arbitrarily close to L as x approaches a and g(x) gets arbitrarily close to L' as x approaches a, then, as x approaches a, the value of (f + g)(x), that is, the value of f(x) + g(x), will get arbitrarily close to L + L'. This intuitive argument can be made formal using the precise definition of limits.
Formal proofs will be given in the next section.

Example 10. Let f(x) = x and let $g(x) = x^2$. Find the limits of f + g, f − g, fg, 3f + 2g, and f/g at a = 2.

Solution: Based on the five limit laws given earlier, it makes sense to first compute the limits of f and g at 2. The reader is invited to verify that
\[ \lim_{x\to 2} f(x) = \lim_{x\to 2} x = 2, \]
and
\[ \lim_{x\to 2} g(x) = \lim_{x\to 2} x^2 = \lim_{x\to 2} x \cdot \lim_{x\to 2} x = 2 \cdot 2 = 4, \]
where we used the fact that $g(x) = x^2 = x \cdot x$, so law III can be applied to compute the limit of g at 2. Now it is simply a matter of basic algebra to compute the five limits that we have been asked to find. Indeed, applying the five limit laws, we get that
(I) $\lim_{x\to 2} (f + g)(x) = \lim_{x\to 2} f(x) + \lim_{x\to 2} g(x) = 2 + 4 = 6$,
(II) $\lim_{x\to 2} (f - g)(x) = \lim_{x\to 2} f(x) - \lim_{x\to 2} g(x) = 2 - 4 = -2$,
(III) $\lim_{x\to 2} (f \cdot g)(x) = \lim_{x\to 2} f(x) \cdot \lim_{x\to 2} g(x) = 2 \cdot 4 = 8$,
(IV) $\lim_{x\to 2} (3f + 2g)(x) = 3 \lim_{x\to 2} f(x) + 2 \lim_{x\to 2} g(x) = 3 \cdot 2 + 2 \cdot 4 = 14$ (note that here we applied limit law IV to first f, then to g, and then we applied law I to 3f and 2g), and
(V) $\lim_{x\to 2} \left( \frac{f}{g} \right)(x) = \frac{\lim_{x\to 2} f(x)}{\lim_{x\to 2} g(x)} = \frac{2}{4} = \frac{1}{2}$. □

8.2. Frequently Used Special Cases of Limit Laws. A few special cases of limit laws I through V are used so frequently that it is worth mentioning them separately. First, if we repeatedly multiply a function by itself, we get a power of that function. Applying law III each time, we get that for all positive integers n,
\[ \lim_{x\to a} (f(x))^n = \left( \lim_{x\to a} f(x) \right)^n. \tag{2.1} \]
Note that we have essentially applied this rule in the special case of n = 2 when we computed $\lim_{x\to 2} x^2$ in Example 10. The reader is invited to verify that the limits of the constant function f(x) = c and the identity function f(x) = x are given by $\lim_{x\to a} c = c$ for all a and $\lim_{x\to a} x = a$.
Applying Equation (2.1) to the identity function f(x) = x yields the equation
\[ \lim_{x\to a} x^n = a^n. \tag{2.2} \]
It turns out (though it is not obvious) that in Equation (2.1) the exponent n can be replaced by 1/n, in other words, powers can be replaced by roots. That is,
\[ \lim_{x\to a} \sqrt[n]{f(x)} = \sqrt[n]{\lim_{x\to a} f(x)}. \tag{2.3} \]
(Here f(x) has to be nonnegative if n is even.) So, in particular, if f(x) = x, then
\[ \lim_{x\to a} \sqrt[n]{x} = \sqrt[n]{a}. \]

8.3. Other Useful Facts About Limits. In this section, we discuss a few facts about limits that are often used to compute limits, but are slightly different in nature from the limit laws we discussed so far.

First, let us recall that the definition of $L = \lim_{x\to a} f(x)$ requires that f(x) get arbitrarily close to L if x is sufficiently close to a but not equal to a. What matters is what happens at points other than a. In fact, the value of f(a) does not have to satisfy any requirements; we can change f(a) to anything we want, and $L = \lim_{x\to a} f(x)$ will not change. Hence, we can conclude that if f(x) = g(x) for all points $x \neq a$, then $\lim_{x\to a} f(x) = \lim_{x\to a} g(x)$ as long as these limits exist.

For instance, let $f(x) = (x^2 - 4)/(x - 2)$ for all real numbers $x \neq 2$ and let f(2) = 2010. Let g(x) = x + 2 for all real numbers. Then f(x) = g(x) unless x = 2, and hence $\lim_{x\to 2} f(x) = \lim_{x\to 2} g(x) = 4$.

The statement that if f(x) = g(x) for all points $x \neq a$, then $\lim_{x\to a} f(x) = \lim_{x\to a} g(x)$ as long as these limits exist can be significantly strengthened. See Exercise 8.4(1) for a possible direction for that.

Second, Equation (2.2) can be interpreted by saying that the limit of a power function $f(x) = x^n$ at any point a is simply the value of f(a). Now note that polynomials are nothing else but sums of constant multiples of power functions with nonnegative integer exponents. Hence, using limit laws I and IV, we get the following theorem.

Theorem 2. Let p be a polynomial function. Then, for any real number a, we have
\[ \lim_{x\to a} p(x) = p(a). \]
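The statement that the limit of a polynomial at a point equals its value there can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the polynomial p and the point a are arbitrary choices that do not come from the text.

```python
def p(x):
    """A sample polynomial, p(x) = 2x^3 - 5x + 1, chosen only for illustration."""
    return 2 * x**3 - 5 * x + 1

a = 1.5
gaps = []
for k in range(1, 7):
    h = 10.0 ** (-k)
    # Largest distance from p(a) among the two points at distance h from a.
    gaps.append(max(abs(p(a + h) - p(a)), abs(p(a - h) - p(a))))

for k, gap in enumerate(gaps, start=1):
    print(f"h = 1e-{k}   max |p(a ± h) - p(a)| = {gap:.3e}")
```

The gaps shrink as h shrinks, which is what one would expect if p(x) approaches p(a) as x approaches a.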
Now recall that a rational function is just the ratio of two polynomials. Hence, using limit law V, we get the following statement from Theorem 2.

Corollary 1. Let R(x) be a rational function and let a be a real number such that R(a) is defined. Then
\[ \lim_{x\to a} R(x) = R(a). \]

Proof. If R(x) = p(x)/q(x), where p and q are polynomials, then by first applying limit law V, and then Theorem 2, we get
\[ \lim_{x\to a} R(x) = \lim_{x\to a} \frac{p(x)}{q(x)} = \frac{\lim_{x\to a} p(x)}{\lim_{x\to a} q(x)} = \frac{p(a)}{q(a)} = R(a). \]
□

So far all the relationships that we discussed for limits involved equations. We will now discuss two rules that involve inequalities.

Theorem 3. Let f and g be two functions and assume that, for all real numbers x, the inequality $f(x) \le g(x)$ holds. Then
\[ \lim_{x\to a} f(x) \le \lim_{x\to a} g(x) \tag{2.4} \]
for any real number a as long as both limits exist.

Proof. Write $L_f = \lim_{x\to a} f(x)$ and $L_g = \lim_{x\to a} g(x)$. If (2.4) did not hold, then
\[ L_f = \lim_{x\to a} f(x) = D + \lim_{x\to a} g(x) = D + L_g \]
would hold, for some positive real number D. That would lead to a contradiction, since if x is so close to a that $|f(x) - L_f| < D/3$, then, in particular, $f(x) > L_f - (D/3)$, so
\[ g(x) \ge f(x) > L_f - \frac{D}{3} = L_g + \frac{2D}{3}. \]
This inequality says that no matter how close x is to a, the distance between g(x) and $L_g$ is more than 2D/3. This contradicts the definition of $L_g$, since if $L_g$ exists, then the values of g(x) should get arbitrarily close to it, provided that x is sufficiently close to a. □

Note that in Theorem 3, the fact that the inequalities are not strict is important. See Exercise 8.4(2) for a relevant question.

Corollary 2 (Squeeze Principle). If f, g, and h are functions such that, for all real numbers x, the inequality $f(x) \le g(x) \le h(x)$ holds and
\[ \lim_{x\to a} f(x) = \lim_{x\to a} h(x) = L, \]
then $\lim_{x\to a} g(x)$ exists and $\lim_{x\to a} g(x) = L$.
Proof. If $\lim_{x\to a} g(x)$ exists, then by applying Theorem 3 to f and g, it follows that $L \le \lim_{x\to a} g(x)$, and by applying Theorem 3 to g and h, it follows that $\lim_{x\to a} g(x) \le L$. So if $\lim_{x\to a} g(x)$ exists, it is equal to L. In Exercise 8.4(3) you are asked to prove that this limit exists. □

The squeeze principle is very useful since it allows us to compute the limits of rather complicated functions as long as we can squeeze them between two functions with identical limits.

Example 11. Let $g(x) = x \cos(\log x)$. Then $\lim_{x\to 0} g(x) = 0$.

Solution: Indeed, let f(x) = −x and h(x) = x. Then, since $\cos(\log x)$ is always a real number in the interval [−1, 1], the inequality $f(x) \le g(x) \le h(x)$ holds for all positive real numbers x, that is, wherever g is defined. Furthermore, $\lim_{x\to 0} f(x) = \lim_{x\to 0} h(x) = 0$, so we can apply Corollary 2 to prove our claim. □

We could not have used limit law III to compute $\lim_{x\to 0} g(x)$ since $\lim_{x\to 0} \cos(\log x)$ does not exist. You are asked to prove this in Exercise 8.4(4).

8.4. Exercises.
(1) Let f(x) and g(x) be two functions that only differ for a finite number of values of the variable x. Is it true that $\lim_{x\to a} f(x) = \lim_{x\to a} g(x)$ as long as these limits exist? Why or why not?
(2) Find an example of two functions f and g such that f(x) < g(x) for all real numbers x, but there exists a real number a such that $\lim_{x\to a} f(x) = \lim_{x\to a} g(x)$.
(3) Explain why $\lim_{x\to a} g(x)$ exists if the conditions of Corollary 2 hold.
(4) Prove that $\lim_{x\to 0} \cos(\log x)$ does not exist.
(5) Prove that $\lim_{x\to 0} x \sin(\sqrt{x}) = 0$.

9. Continuous Functions

Intuitively speaking, a function is called continuous at a point x = a if its graph in a neighborhood of x = a can be drawn without lifting the pencil from the paper, that is, by a "continuous" line. The formal definition of continuity is as follows.
Definition 11. A function f is called continuous at a if the equality
\[ \lim_{x\to a} f(x) = f(a) \]
holds. If f is continuous at each point of the open interval (c, d), then we say that f is continuous on (c, d). If a function f : R → R is continuous at all a ∈ R, then it is called continuous.

Note that Definition 11 really requires three things. The limit of f at a must exist, the function f must be defined at a so that f(a) exists, and the value of f(a) must agree with the limit of f at a. If all these conditions hold, then the behavior of f at a is very similar to the behavior of f around a; in particular, the graph of f can be drawn without lifting the pencil from the paper. If we had to lift the pencil from the paper, that would mean that some kind of "gap" would exist in the graph of f, so the requirements of Definition 11 would not be satisfied. Finally, if you really want a formal definition, the neighborhood of a is a set S that contains an open interval (c, d) containing a.

9.0.1. The Precise Definition of Continuity. As the informal definition of continuity is very close to that of limits, it is not surprising that their precise definitions are also similar.

Definition 12. Let f be defined in an open interval containing a. We say that f is continuous at a if, for all $\epsilon > 0$, there exists $\delta > 0$ such that if $|x - a| < \delta$, then $|f(x) - f(a)| < \epsilon$.

9.1. Examples of Continuous Functions. Let us consider some of the most frequently used continuous functions.

Example 12. Polynomial functions are continuous.

Solution: This is a direct consequence of Theorem 2, which we discussed in the last section. Theorem 2 stated that the limit of a polynomial function at a is equal to the value of the polynomial at a, which is precisely what the definition of continuity requires. □

There are many classes of functions that are continuous at every point where they are defined. If they are not defined somewhere, then, of course, they cannot be continuous there.

Example 13. The following are examples of functions that are continuous at every point where they are defined.
(I) Rational functions
(II) Exponential functions
(III) Trigonometric functions
(IV) Logarithmic functions
(V) Inverse trigonometric functions

The reader is invited to recall the graphs of each of these functions and verify that they consist of continuous lines as long as they are defined.

9.2. Functions That Are Not Continuous. It is time to stop for a moment and think about functions that are not continuous at a given point a. There can be three reasons for this.

First, it could be that f(a) is not defined. That happens, for instance, when f is a rational function whose denominator becomes 0 when x = a.

Or it could be that g is defined at a, but $\lim_{x\to a} g(x)$ does not exist. An example of this is the function defined by g(x) = 1 if $x \ge 0$ and g(x) = 0 if x < 0. As we have seen before, the limit of this function does not exist at a = 0. So g is not continuous at 0, even if g(0) is defined.

Finally, it could happen that h is defined at a and the limit of h at a exists, but h(a) is not equal to this limit, for example, if $h(x) = (x + 3)/(x^2 - 9)$ for $x \neq \pm 3$ and $h(-3) = 1$. Let a = −3. Then
\[ h(a) = 1 \neq \lim_{x\to a} h(x) = -\frac{1}{6}. \]

The interested reader is invited to think about the following example.

Excursion 1. The following function is not continuous anywhere. Let f(x) = 1 if x is rational and let f(x) = 0 if x is irrational.

9.3. New Continuous Functions from Old. It follows from the limit laws that several transformations preserve the continuous property of functions.

Theorem 4. Let f and g be two functions that are continuous at a and let c be a real number. Then all of the following are also continuous functions at a:
(I) f + g,
(II) f − g,
(III) f · g,
(IV) cf, and
(V) f/g as long as $g(a) \neq 0$.
Example 14. It follows from successive applications of the previous theorem that $h(x) = e^x \cdot \sin x + 3 \ln x - \sqrt{x}$ is continuous at all positive real numbers a.

The following important theorem also holds, though it is not a direct consequence of our limit laws.

Theorem 5. Let f and g be two functions such that f is continuous at a and g is continuous at f(a). Then the composition function g ◦ f is continuous at a.

This theorem is important since it enables us to prove the continuity of functions that would otherwise be cumbersome to handle.

Example 15. The function $h(x) = \sqrt{2 + \sin x}$ is continuous at all real numbers a.

Solution: Let $f(x) = 2 + \sin x$ and let $g(x) = \sqrt{x}$. Then f is continuous everywhere, and g is continuous at all positive real numbers. As f(x) is always a positive real number, the statement follows. □

9.4. One-Sided Continuity. A function may happen to be continuous in only one direction, either from the "left" or from the "right." Formally, this means the following.

Definition 13. We say that the function f is left-continuous at a if $f(a) = \lim_{x\to a^-} f(x)$. Similarly, we say that f is right-continuous at a if $f(a) = \lim_{x\to a^+} f(x)$.

Example 16. Let g be the function defined by g(x) = 1 if $x \ge 0$ and g(x) = 0 if x < 0. Then $\lim_{x\to 0^-} g(x) = 0 \neq 1 = g(0)$, so g is not left-continuous at 0. On the other hand, $\lim_{x\to 0^+} g(x) = 1 = g(0)$, so g is right-continuous at 0.

The reader is invited to verify that f is continuous at a if and only if f is both left-continuous and right-continuous at a. We say that a function is continuous on an interval [a, b] if it is continuous at all points of (a, b), right-continuous at a, and left-continuous at b.

9.5. Intermediate Value Theorem. Perhaps the most important property of continuous functions is that they do not skip any values between two values that they actually take. For instance, if a tree grows from 3 feet to 6 feet, then there is a time in between when the tree is exactly 4.47 feet tall. The intuitive reason for this is that if there were a value in between that is not taken by the function, then there would be a gap in the graph of the function, contradicting the requirement that the function be continuous. This is the content of the next theorem.

Theorem 6 (Intermediate Value Theorem). Let f be a function that is continuous on the interval [a, b]. If $f(a) = y_1$ and $f(b) = y_2$ and y is a real number that is between $y_1$ and $y_2$, then there exists x ∈ [a, b] such that f(x) = y. In other words, f takes all values between $y_1$ and $y_2$ on the interval [a, b].

Example 17. There is a real number x in the interval [0, 1] such that $x + e^x = 2$.

Solution: Let $f(x) = x + e^x$. Then f is continuous everywhere, f(0) = 1, and f(1) = 1 + e > 3.71. So, by the intermediate value theorem, we get that f takes all values between 1 and 1 + e on that interval, including y = 2. □

9.6. Exercises.
(1) Where is tan x continuous?
(2) Where is 1/x not continuous?
(3) Where is $\frac{3x + 2}{5x^2 - 6x + 1}$ continuous?
(4) Where is $\sin(x^2)$ continuous?
(5) Prove that the equation $x^5 - x - 1 = 0$ has a root in the interval (−1, 2).
(6) Prove that the equation $x^3 - 3x - 1 = 0$ has at least two roots in the interval (−1, 2).

10. Limits at Infinity

In Section 7, we defined what it meant for a function to have a limit L at a real number a. In this section, we extend that definition and define what it means for a function to have a limit L at ∞ or at −∞.

10.1. Finite Limits at Infinity.

Definition 14. Let f : R → R be a function that is defined on some interval (b, ∞). We say that the limit of f at ∞ is the real number L if the values of f(x) get arbitrarily close to L and stay arbitrarily close to L when x is suitably large. The fact that the limit of f at ∞ is L is expressed by the notation
\[ \lim_{x\to\infty} f(x) = L. \]

This definition follows the idea of the definition of limits at finite points. Indeed, in order for $\lim_{x\to\infty} f(x) = L$ to hold, we require that the values of f(x) get arbitrarily close to L and stay arbitrarily close to L if x is large enough. Here "x is large enough" means that x is in a suitably selected neighborhood of ∞, that is, in an open interval (c, ∞). Recall that this is analogous to what we required in the finite case. There we said that $\lim_{x\to a} f(x) = L$ if f(x) got arbitrarily close to L and stayed arbitrarily close to L once x was suitably close to a, in other words, when x was in a suitably selected neighborhood of a.

Example 18. Let f(x) = 1/x. Then
\[ \lim_{x\to\infty} f(x) = 0. \]

Solution: If we want to get f(x) closer than $\epsilon$ to 0 and keep it there, all we have to do is to select x such that $x > 1/\epsilon$ holds. Once x gets past $1/\epsilon$, the values of f(x) will stay between 0 and $\epsilon$. □

The definition of limits at −∞ is what the reader probably expects.

Definition 15. Let f : R → R be a function defined on some interval (−∞, b). We say that the limit of f at −∞ is the real number L if the values of f(x) get arbitrarily close to L and stay arbitrarily close to L when x is a negative number with a suitably large absolute value. The fact that the limit of f at −∞ is L is expressed by the notation
\[ \lim_{x\to-\infty} f(x) = L. \]

Example 19. Let $f(x) = 1/x^2$. Then
\[ \lim_{x\to-\infty} f(x) = 0. \]

Solution: If we want the value of f(x) to be closer than $\epsilon$ to 0, it suffices to choose x such that $x < -1/\sqrt{\epsilon}$. Then $x^2 > 1/\epsilon$, and hence $f(x) = 1/x^2 < \epsilon$. □

10.1.1. The Formal Definition of Limits at Infinity. The formal definition of limits at infinity is very similar to that of limits at finite points. The only difference is in the formal description of what it means to be in a neighborhood of infinity versus what it means to be in a neighborhood of a real number.

Definition 16. Let f : R → R be a function defined on some interval (b, ∞). We say that $\lim_{x\to\infty} f(x) = L$ if, for all positive real numbers $\epsilon$, there exists a positive real number N such that if x > N, then $|f(x) - L| < \epsilon$.
The formal definition of limits at negative infinity is analogous. The only difference is again in the formal description of what it means for x to be in a neighborhood of −∞. It means to be in an interval (−∞, c).

Definition 17. Let f : R → R be a function defined on some interval (−∞, b). We say that $\lim_{x\to-\infty} f(x) = L$ if, for all positive real numbers $\epsilon$, there exists a negative real number N such that if x < N, then $|f(x) - L| < \epsilon$.

10.1.2. The Graphical Meaning of a Finite Limit at Infinity. If a function f has limit L at ∞ or −∞, then the graph of the function will approach the horizontal line y = L at that infinity. The graph may or may not actually touch that line or even become that line. The line y = L is called a horizontal asymptote of the graph of y = f(x) when $\lim_{x\to\infty} f(x) = L$ or $\lim_{x\to-\infty} f(x) = L$ holds.

10.2. Infinite Limits at Infinity. It can happen that the limit of a function at ∞ is not a real number but rather ∞ or −∞.

Definition 18. Let f : R → R be a function defined on some interval (b, ∞). We say that the limit of f at ∞ is ∞, denoted by
\[ \lim_{x\to\infty} f(x) = \infty, \]
if f(x) gets arbitrarily large and stays arbitrarily large if x gets sufficiently large.

Example 20. Let $f(x) = e^x$. Then $\lim_{x\to\infty} f(x) = \infty$.

Solution: In order to get f(x) to be larger than some given positive real number M, it suffices to choose x > ln M. □

The following notation is defined in an analogous way:
(I) $\lim_{x\to\infty} f(x) = -\infty$,
(II) $\lim_{x\to-\infty} g(x) = \infty$,
(III) $\lim_{x\to-\infty} h(x) = -\infty$.
Each of these definitions refers to a fact that the values of a function get arbitrarily far away from 0 and stay arbitrarily far away from 0 (in the appropriate direction) if x gets sufficiently far away from 0 (in the appropriate direction).

The reader should test his or her understanding of these concepts by verifying that $\lim_{x\to\infty} (1 - x) = -\infty$, while $\lim_{x\to-\infty} x^2 = \infty$, and $\lim_{x\to-\infty} x^3 = -\infty$.
10.2.1. The Formal Deﬁnition of Inﬁnite Limits at Inﬁnity. By now, the formal deﬁnition of inﬁnite limits at inﬁnity probably does not come as a surprise. We are providing a formal deﬁnition for one of the four possible scenarios that can occur due to changes in sign. The other three cases are analogous.
Definition 19. Let f : R → R be a function deﬁned on some interval (b, ∞). We say that limx→∞ f (x) = ∞ if, for all positive real numbers M , there exists a positive real number N such that if x > N , then f (x) > M .
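The role of M and N in this definition can be made concrete for the function $f(x) = e^x$. The sketch below is an illustration rather than a proof; the helper names witness_N and check, and the sampled points, are assumptions made here and do not come from the text.

```python
import math

def witness_N(M):
    """For f(x) = e^x and a bound M > 0, return a positive N such that
    x > N implies e^x > M. The value ln M works whenever it is positive;
    the floor of 1.0 only keeps N positive, as the definition asks."""
    return max(math.log(M), 1.0)

def check(M, samples=5):
    """Sample a few points to the right of N and test f(x) > M at each one."""
    N = witness_N(M)
    return all(math.exp(N + j) > M for j in range(1, samples + 1))

for M in (0.5, 10.0, 1e6):
    print(M, witness_N(M), check(M))
```

A finite sample cannot establish the limit, but it shows how each bound M is answered by a threshold N, which is the shape of the argument the definition requires.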
10.3. Computing Limits at Infinity. The limit laws that we learned for limits at finite points stay true for limits at infinity as well, provided, of course, that they make sense. Here are a few examples.

Example 21. We have
\[ \lim_{x\to\infty} \frac{x + 3}{x - 4} = 1. \]

It would be wrong to argue as follows: "The numerator is the function f(x) = x + 3, and the denominator is the function g(x) = x − 4. At ∞, they both have limit ∞, so, by the limit law for quotients, the limit of their quotient is 1." The problem with this argument is that ∞ is not a number. So ∞/∞ is not defined. It is possible for f and g both to have limit ∞ at ∞, and for f/g to have limit c at ∞, for any given positive real number c. Indeed, let f(x) = cx and let g(x) = x. Instead, we can solve Example 21 as follows.

Solution:
\[ \lim_{x\to\infty} \frac{x + 3}{x - 4} = \lim_{x\to\infty} \frac{(x - 4) + 7}{x - 4} = \lim_{x\to\infty} \left( 1 + \frac{7}{x - 4} \right) = 1 + \lim_{x\to\infty} \frac{7}{x - 4} = 1 + 0 = 1. \]
□

We would like to point out other pitfalls when dealing with the application of limit laws and infinite limits. The following expressions are not defined:
(I) $\infty + (-\infty)$
(II) $\infty \cdot 0$ and $-\infty \cdot 0$
(III) $1^{\infty}$ and $1^{-\infty}$

The following theorem is very useful when dealing with limits at ∞.

Theorem 7. Let r be a positive rational number. Then
\[ \lim_{x\to\infty} \frac{1}{x^r} = 0. \]

If r is an integer, then this statement follows from the fact that $\lim_{x\to\infty} 1/x = 0$ by applying limit law III (for products) r times. If r = p/q, where p and q are positive integers, then we can first prove the theorem for $x^p$, and then, using the root law, for $x^{p/q} = \sqrt[q]{x^p}$.

Many limits can be computed with the help of this theorem.

Example 22. We have
\[ \lim_{x\to\infty} \frac{x^2 + 3x + 1}{x^3} = 0. \]

Solution: We have
\[ \frac{x^2 + 3x + 1}{x^3} = \frac{x^2}{x^3} + \frac{3x}{x^3} + \frac{1}{x^3}, \]
and each of the three summands has limit 0 at ∞ by the preceding theorem. Hence, by the limit law for sums, so does their sum. □

Note that the limit would not change if we changed the denominator from $x^3$ to $x^3 + 3x^2 + 4x + 5$. For positive x, this would have decreased the value of our function, but would have still kept it positive. Hence, by the squeeze principle, we can then conclude that
\[ \lim_{x\to\infty} \frac{x^2 + 3x + 1}{x^3 + 3x^2 + 4x + 5} = 0. \]
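The squeeze argument used for the modified denominator can be checked numerically at a few large values of x. This is a minimal sketch under stated assumptions: the function names g and h and the sample points are chosen here for illustration, and a finite table of values does not replace the proof.

```python
def g(x):
    """The quotient with denominator x^3."""
    return (x**2 + 3 * x + 1) / x**3

def h(x):
    """The same numerator over the larger denominator x^3 + 3x^2 + 4x + 5."""
    return (x**2 + 3 * x + 1) / (x**3 + 3 * x**2 + 4 * x + 5)

xs = [10.0 ** k for k in range(1, 7)]
for x in xs:
    # For positive x we expect 0 < h(x) < g(x), with both values shrinking.
    print(f"x = {x:.0e}   h(x) = {h(x):.3e}   g(x) = {g(x):.3e}")
```

At each sampled point h(x) sits between 0 and g(x), which is the inequality the squeeze principle needs for large positive x.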
10.4. Exercises.
(1) Find $\lim_{x\to\infty} \frac{x + 1}{x^2 + 4}$.
(2) Find $\lim_{x\to\infty} \frac{3x^2 + 4x + 1}{x^2 + 4}$.
(3) Find $\lim_{x\to\infty} \frac{x^3 + 2x}{x^2 + 4}$.
(4) Find $\lim_{x\to-\infty} \frac{3x^2 + 4x + 1}{x^2 + 4}$.
(5) Let R(x) = p(x)/q(x) be a rational function. Explain how $\lim_{x\to\infty} R(x)$ depends on p(x) and q(x).
11. Derivatives
11.1. Tangent Lines. Let us consider a function, such as $f(x) = x^2$, and its graph. Let us choose a point on the graph, say the point P = (3, 9). Now let us look for the slope of the tangent line to the graph at that point. That is, consider a sequence of points $P_1, P_2, \ldots$ that are all on the graph of f and are closer and closer to P. For each of these points, draw the line $P_i P$. The slope of these lines will approach a certain slope, and so the lines $P_i P$ will approach a certain line. That line is called the tangent line of f at P.

Definition 20. Let f be a function and let P = (a, f(a)) be a point on the graph of f. Then the tangent line to f at P is the line that contains P and has slope
\[ \lim_{x\to a} \frac{f(x) - f(a)}{x - a}, \tag{2.5} \]
provided that this limit exists.

Note that in the preceding definition, (f(x) − f(a))/(x − a) is simply the slope of the line connecting the points P and (x, f(x)).

Example 23. In our running example, that is, when $f(x) = x^2$ and P = (3, 9), the tangent line is the line that goes through P and has slope
\[ \lim_{x\to 3} \frac{f(x) - f(3)}{x - 3} = \lim_{x\to 3} \frac{x^2 - 9}{x - 3} = \lim_{x\to 3} (x + 3) = 6. \]

11.2. Velocities. Recall that in Section 6, we mentioned that the average velocity of a moving object, such as a car, can be computed by the rule v = s/t. That is, the average velocity is equal to the distance covered divided by the time needed to cover that distance. However, what can be said about the instantaneous velocity, that is, the velocity in a given moment? We could not answer that question in Section 6 since we did not have the tools to handle the fact that when only a given moment is considered, both the numerator and the denominator of the formula v = s/t are 0. Now that we have learned about limits, we can overcome that difficulty as follows.

Definition 21. Let f(t) be a function such that f(t) is the distance covered by a moving object in t units of time. Then the instantaneous velocity of the object a units of time after it starts moving is
\[ v(a) = \lim_{t\to a} \frac{f(t) - f(a)}{t - a}, \]
provided that this limit exists.
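The limit in (2.5) can be watched numerically by computing slopes of lines through P and nearby points of the graph. The following is a minimal sketch for the running example $f(x) = x^2$ at a = 3; the helper names difference_quotient and square are assumptions introduced here for illustration.

```python
def difference_quotient(f, a, x):
    """Slope of the line through (a, f(a)) and (x, f(x)), for x != a."""
    return (f(x) - f(a)) / (x - a)

def square(x):
    return x * x

# Slopes of the lines through P = (3, 9) and points slightly to the right of P.
slopes = [difference_quotient(square, 3.0, 3.0 + 10.0 ** (-k)) for k in range(1, 8)]
for k, slope in enumerate(slopes, start=1):
    print(f"x = 3 + 1e-{k}   slope = {slope:.8f}")

# A point slightly to the left of P gives a slope just below 6.
left_slope = difference_quotient(square, 3.0, 2.9)
print(left_slope)
```

The slopes settle near 6 from both sides, in agreement with the value computed in Example 23. The same helper applies to a distance function f(t), in which case the quotients are average velocities over shrinking time intervals.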
Example 24. A car starts out by accelerating for 10 seconds so that the distance covered in the first t seconds is obtained (in meters) by the function $f(t) = \frac{1}{2} t^2$ if $t \le 10$. What is the instantaneous velocity of the car after 4 seconds?

Solution: By the definition of instantaneous velocity, we must compute
\[ v(4) = \lim_{t\to 4} \frac{f(t) - f(4)}{t - 4} = \lim_{t\to 4} \frac{t^2 - 16}{2(t - 4)} = \lim_{t\to 4} \frac{t + 4}{2} = 4. \]
So, at the end of the fourth second (exactly 4 seconds after starting out), the car will move at a rate of 4 meters per second. □

11.3. The Derivative of a Function. The fact that the last two concepts, the tangent line and the instantaneous velocity, led to very similar definitions suggests that there is a very general principle at work and we have seen two special cases of that principle. This is indeed the case.

Definition 22. Let f be a function. The derivative of f at a is the limit
\[ f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a} \]
if this limit exists and is finite.

So, the derivative is a common generalization of the concepts of tangent line and instantaneous velocity. In other words, f'(a) is the slope of the tangent line of f at a (unless that tangent line is vertical). Furthermore, the instantaneous velocity at time a is the derivative of the distance covered (as a function of the time t needed to cover that distance) at t = a.

11.4. Exercises.
(1) Find the slope of the tangent line to the curve $f(x) = 3x^2 - 7$ at the point (2, 5).
(2) Find the slope of the tangent line to the curve $f(x) = x^3$ at x = 0.
(3) Show an example of a curve that does not have a tangent line at some point a because the limit defined in (2.5) does not exist.
(4) The distance covered by a car in a certain time period is described by the function
\[ f(t) = tm + \frac{t^2 (b - m)}{2}, \]
where b and m are positive constants. Let us assume that t ∈ [0, 1]. Find the instantaneous velocity of the car at a given moment t = a.
(5) Find the derivative of the function f(x) = x + 5 at a = 3.

12. The Derivative as a Function

12.1. Rates of Change. In the last section, we saw that the derivative of a function at a given point was a common generalization of the concepts of tangent lines and instantaneous velocities. We will now further elaborate on that, in order to understand how far-reaching the concept of derivatives is.

If f is a function and f(x) = y, then the quantity denoted by y depends on the quantity denoted by x. This is sometimes expressed by saying that x is the independent variable and y is the dependent variable. If x changes, then the change in y can be described in terms of the change in x. In particular, if x changes from $x_1$ to $x_2$, then y = f(x) changes from $y_1 = f(x_1)$ to $y_2 = f(x_2)$. The average rate of change for the interval $(x_1, x_2)$ is then the ratio
\[ \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}, \]
where Δx is the change (or increment) of x. We have to use the word "average" since we only have information about the values of y at the endpoints of the interval $(x_1, x_2)$; we do not know how f(x) = y behaves in the rest of the interval. If we want more precise information, such as the instantaneous rate of change of f(x) = y at a given point, then we have to use the notion of limits again, just as we have done twice in the last section. That is, at a given point x = a, we define the instantaneous rate of change of f(x) = y as
\[ \lim_{\Delta x\to 0} \frac{\Delta y}{\Delta x} = \lim_{x_2\to a} \frac{f(x_2) - f(a)}{x_2 - a}. \]

12.2. The Derivative of the Function f. Recall that, at a given point a, the derivative of the function f is defined as the limit
\[ f'(a) = \lim_{x\to a} \frac{f(x) - f(a)}{x - a}. \]
Note that this definition associates the real number f'(a) to the real number a. That is, f' : R → R is a function. The function f' is called the derivative of f. The operation that takes f into f' is called differentiation. This explains the following definition.
Definition 23. A function $f$ is called differentiable at $a$ if $f'(a)$ exists. We say that $f$ is differentiable on the interval $(a, b)$ if $f$ is differentiable at $d$ for all $d \in (a, b)$.

Example 25. The function $f(x) = x^3$ is differentiable at every real number $a$, and $f'(a) = 3a^2$.

Solution: We have
\[
\begin{aligned}
f'(a) &= \lim_{x\to a}\frac{f(x)-f(a)}{x-a} = \lim_{x\to a}\frac{x^3-a^3}{x-a} \\
&= \lim_{x\to a}\frac{(x-a)(x^2+xa+a^2)}{x-a} \\
&= \lim_{x\to a}\,(x^2+xa+a^2) = 3a^2.
\end{aligned}
\]
□

The functions we have considered so far had only one independent variable, usually the variable $x$. The dependent variable was usually denoted by $y$, so $y = f(x)$ held. So it was always clear that the derivative was taken with respect to $x$. However, there are circumstances when this is not so clear, usually when $f$ depends on more than one variable. Therefore, there are additional ways to denote the function $f'$, such as
• $\dfrac{dy}{dx}$,
• $\dfrac{df}{dx}$,
• $\dfrac{d}{dx}f(x)$,
• $D_x f(x)$, or
• $Df(x)$.

12.3. Differentiability Versus Continuity. The definitions of differentiability and continuity are similar. Which one imposes stronger requirements on a function at a given point? The following theorem shows that differentiability is the stronger requirement.

Theorem 8. If $f$ is differentiable at $a$, then $f$ is continuous at $a$.

Proof. If $f$ is differentiable at $a$, then
\[ f'(a) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a}; \]
in particular, the limit shown on the right-hand side exists. Multiplying both sides by the function $g(x) = x - a$, we get
\[ f'(a)(x-a) = (x-a)\lim_{x\to a}\frac{f(x)-f(a)}{x-a}. \]
Now, taking limits at $a$ on both sides, we obtain
\[ f'(a)\cdot\lim_{x\to a}(x-a) = \lim_{x\to a}\,(f(x)-f(a)), \tag{2.6} \]
since we can apply the limit law for products on the right-hand side to get that
\[ \lim_{x\to a}(x-a)\cdot\lim_{x\to a}\frac{f(x)-f(a)}{x-a} = \lim_{x\to a}\left[(x-a)\cdot\frac{f(x)-f(a)}{x-a}\right] = \lim_{x\to a}\,(f(x)-f(a)). \]
Finally, note that the left-hand side of (2.6) is equal to 0 since $f'(a)(x-a)$ is a polynomial that takes value 0 when $x = a$. Hence, the right-hand side of (2.6) is equal to 0 as well, that is,
\[ 0 = \lim_{x\to a}\,(f(x)-f(a)) = \left(\lim_{x\to a}f(x)\right) - f(a). \]
Adding $f(a)$ to both the far left and far right sides, we get that
\[ f(a) = \lim_{x\to a}f(x), \]
which means that $f$ is continuous at $a$. □

The converse of Theorem 8 is not true. Indeed, the function $f(x) = |x|$ is continuous at $a = 0$, but it is not differentiable there. The reader is invited to prove this by showing that
\[ \lim_{x\to 0^-}\frac{|x|-|0|}{x-0} = \lim_{x\to 0^-}\frac{|x|}{x} \ne \lim_{x\to 0^+}\frac{|x|}{x} = \lim_{x\to 0^+}\frac{|x|-|0|}{x-0}, \]
and hence
\[ f'(0) = \lim_{x\to 0}\frac{|x|-|0|}{x-0} \]
does not exist.

In general, there are several reasons a continuous function may fail to be differentiable at a given point. It could be that the graph of the function has a "corner," like that of $|x|$ at 0, and hence the slope of the tangent line cannot be defined because the left-hand limit and the right-hand limit of the lines approaching the purported tangent line are not equal. Or, it could be that the function has a vertical tangent line at the given point. See Exercise 12.5(5) for an example of this.
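The behavior of $|x|$ at 0 can also be explored numerically. The following Python sketch evaluates the difference quotient of $|x|$ at $0$ from the right and from the left; the helper name `difference_quotient` and the particular step sizes are illustrative choices, not part of the text. Such an experiment is only suggestive and does not replace the proof requested above.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line of f through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

# Step sizes shrinking toward 0 (an arbitrary illustrative choice).
steps = [10.0 ** (-k) for k in range(1, 8)]

# Approach a = 0 from the right (h > 0) and from the left (h < 0).
right_slopes = [difference_quotient(abs, 0.0, h) for h in steps]
left_slopes = [difference_quotient(abs, 0.0, -h) for h in steps]

print("from the right:", right_slopes)
print("from the left: ", left_slopes)
```

Every secant slope taken from the right equals 1 and every secant slope taken from the left equals −1, so the one-sided limits differ, in agreement with the "corner" described above.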
12.4. Higher-Order Derivatives. In upcoming chapters, it will often be useful to consider not only the derivative of a function but also the derivative of the derivative and even the derivative of the derivative of the derivative. These functions appear so often that they have their own names. If $f$ is a differentiable function on an interval $(a, b)$ and its derivative $f'$ is also differentiable on $(a, b)$, then the derivative of $f'$ is called the second derivative of $f$ and is denoted by $f''$. Similarly, if $f''$ is differentiable on $(a, b)$, then its derivative is called the third derivative of $f$ and is denoted by $f'''$. Higher-order derivatives are defined in an analogous way, but denoted slightly differently. For instance, the seventh derivative of $f$ is denoted by $f^{(7)}$, and, in general, the $n$th derivative is denoted by $f^{(n)}$.

Example 26. We have seen in Example 25 that if $f(x) = x^3$, then $f'(x) = 3x^2$. Therefore,
\[
\begin{aligned}
f''(a) &= \lim_{x\to a}\frac{f'(x)-f'(a)}{x-a} \\
&= \lim_{x\to a}\frac{3(x^2-a^2)}{x-a} \\
&= \lim_{x\to a} 3(x+a) \\
&= 6a.
\end{aligned}
\]
So $f''(x) = 6x$. In Exercise 12.5(1), you are asked to prove that $f'''(x) = 6$ for all $x$, and in Exercise 12.5(2), you are asked to compute higher-order derivatives of $f$.

12.5. Exercises.
(1) Let $f(x) = x^3$. Prove that $f'''(x) = 6$ for all real numbers $x$.
(2) Let $f(x) = x^3$. Compute $f^{(4)}(x)$. What can be said about higher-order derivatives of $f$?
(3) Let $f(x) = \sqrt{x}$. Compute $f'(a)$ at some point $a > 0$.
(4) Let $f(x) = 1/x$. Compute $f'(a)$ at some point $a \ne 0$.
(5) Let $f$ be defined on the interval $[0, 2]$ by $f(x) = \sqrt{1-x^2}$ if $0 \le x \le 1$, and $f(x) = -\sqrt{1-(x-2)^2}$ if $1 < x \le 2$. So the graph of $f(x)$ is the union of two quarters of a unit circle. Prove that, at $x = 1$, the graph of $f$ has a vertical tangent line, that is, the difference quotient is unbounded there:
\[ \lim_{x\to 1}\frac{f(x)-f(1)}{x-1} = -\infty. \]
CHAPTER 3
Rules of Differentiation

13. Derivatives of Polynomial and Exponential Functions

13.1. Polynomials. In this section, we will deduce general rules for the derivatives of polynomial functions, such as the function $f(x) = 3x^2 + 4x + 6$. Let us recall that polynomials are sums of power functions with nonnegative integer exponents. We start with their "building blocks," power functions. The simplest of these is the class of constant functions.

Theorem 9. Let $c$ be a real number and let $f(x) = c$ for all $x$. Then $f'(a) = 0$ for all real numbers $a$.

Before we prove the theorem, we point out that, intuitively, it makes perfect sense. The derivative of a function $f$ describes the rate of change of $f$, but if $f$ is a constant function, then $f$ never changes (it has zero change).

Proof of Theorem 9. We have
\[ f'(a) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a} = \lim_{x\to a}\frac{c-c}{x-a} = 0. \]
Note that $\lim_{x\to a}(c-c)/(x-a) = 0$ since $(c-c)/(x-a) = 0$ for all values $x \ne a$. □

We now turn our attention to a more general class of power functions, those of the form $f(x) = x^n$, where $n$ is a positive integer.

Theorem 10. Let $n$ be a positive integer and let $f(x) = x^n$. Then $f'(a) = na^{n-1}$.

Let us recall the algebraic identity
\[ x^n - a^n = (x-a)\cdot(x^{n-1} + x^{n-2}a + \cdots + xa^{n-2} + a^{n-1}). \]
Proof. We have
\[
\begin{aligned}
f'(a) &= \lim_{x\to a}\frac{f(x)-f(a)}{x-a} \\
&= \lim_{x\to a}\frac{x^n-a^n}{x-a} \\
&= \lim_{x\to a}\frac{(x-a)\cdot(x^{n-1}+x^{n-2}a+\cdots+xa^{n-2}+a^{n-1})}{x-a} \\
&= \lim_{x\to a}\,(x^{n-1}+x^{n-2}a+\cdots+xa^{n-2}+a^{n-1}) \\
&= na^{n-1}.
\end{aligned}
\]
□

Note that this agrees with our result from the last section that showed that if $f(x) = x^3$, then $f'(x) = 3x^2$.

It turns out that Theorem 10 holds even if $n$ is not a positive integer. That is, for all real numbers $\alpha$, if $f(x) = x^\alpha$, then $f'(x) = \alpha x^{\alpha-1}$. We will see a formal proof of this fact later. In the exercises, you are asked to prove two special cases of this general result.

13.1.1. Three Simple Rules. Derivatives are limits of certain functions, so it is not surprising that some of the laws governing their computation are very similar to limit laws. That is, if we know the derivative of $f$ and $g$, then we can easily compute the derivative of $f + g$, $f - g$, and $cf$, where $c$ is a given real number. The rules are as follows.

Theorem 11. Let $f$ and $g$ be two functions that are differentiable at $a$. Then $f + g$ is differentiable at $a$, and
\[ (f+g)'(a) = f'(a) + g'(a). \]

Proof. We have
\[
\begin{aligned}
(f+g)'(a) &= \lim_{x\to a}\frac{(f+g)(x)-(f+g)(a)}{x-a} \\
&= \lim_{x\to a}\left[\frac{f(x)-f(a)}{x-a} + \frac{g(x)-g(a)}{x-a}\right] \\
&= f'(a) + g'(a).
\end{aligned}
\]
□

The other two rules and their proofs are so similar that they are left as exercises.

Theorem 12. Let $f$ and $g$ be two functions that are differentiable at $a$. Then $f - g$ is differentiable at $a$, and
\[ (f-g)'(a) = f'(a) - g'(a). \]
Theorem 13. Let $f$ be a function that is differentiable at $a$, and let $c$ be a real number. Then $cf$ is differentiable at $a$, and
\[ (cf)'(a) = cf'(a). \]

The derivatives of power functions are computed in Theorem 10. Hence, Theorems 11, 12, and 13 enable us to compute the derivative of any polynomial function.

Example 27. Let $p(x) = 3x^3 + 5x^2 - 6x + 8$. Find $p'(x)$.

Solution: Note that $p(x)$ is just a sum (and difference) of constant multiples of power functions. Then we can apply Theorems 11, 12, and 13 to get
\[
\begin{aligned}
p'(x) &= (3x^3)' + (5x^2)' - (6x)' + (8)' \\
&= 3(x^3)' + 5(x^2)' - 6(x)' + (8)' \\
&= 9x^2 + 10x - 6.
\end{aligned}
\]
□

It is very important to point out that the other limit laws do not carry over to derivatives in the same fashion. That is, in general, $(fg)' \ne f'g'$, and $(f/g)' \ne f'/g'$. We will learn some more complicated rules to compute the derivatives of $fg$ and $f/g$ in the next section.

13.2. Exponential Functions. Let us now compute the derivative of the exponential function $f(x) = b^x$, where $b$ is some positive constant. By the definition of derivatives, we get
\[
\begin{aligned}
f'(a) &= \lim_{x\to a}\frac{f(x)-f(a)}{x-a} \\
&= \lim_{x\to a}\frac{b^x-b^a}{x-a} \\
&= \lim_{z\to 0}\frac{b^{a+z}-b^a}{z} \\
&= b^a\lim_{z\to 0}\frac{b^z-1}{z} \\
&= b^a f'(0).
\end{aligned}
\]
Several comments are in order. First, note the substitution $z = x - a$ in the third line. Second, note that $b^a$ is a constant that does not depend on $z$; hence, the limit law for constant multiples was used in the fourth line. Third, in the special case when $a = 0$, the definition of the derivative yields $f'(0) = \lim_{z\to 0}(b^z-1)/z$. We used this fact in the last line. In other words,
\[ f'(x) = f'(0)\,b^x = f'(0)f(x). \tag{3.1} \]
That is, the derivative of the function $f$ is a constant multiple of $f$. The constant in question is $f'(0)$, that is, $\lim_{z\to 0}(b^z-1)/z$. In particular, $f'(0)$ is the slope of the tangent line to the curve of $f(x) = b^x$ at the point $x = 0$. Numerical experimentation suggests that the larger $b$ is, the larger this limit is. Graphical experimentation suggests this as well, since plotting $f$ for various values of $b$ suggests that the larger $b$ is, the larger this slope is. Indeed, it can be proved that there exists a real number $e$, approximately 2.718, such that
\[ f'(0) = \lim_{z\to 0}\frac{e^z-1}{z} = 1. \]

Definition 24. Let $e$ be the real number such that
\[ \lim_{z\to 0}\frac{e^z-1}{z} = 1. \]

So, in the special case of $b = e$, Equation (3.1) takes the form
\[ (e^x)' = e^x. \]
That is, the derivative of $f(x) = e^x$ is $f'(x) = e^x$ itself. In a later section, we will see what this implies for the derivatives of exponential functions with bases different from $e$.

13.3. Exercises.
(1) Prove that if $f(x) = x^{1/2}$ and $a > 0$, then $f'(a) = \dfrac{1}{2\sqrt{a}}$.
(2) Prove that if $f(x) = 1/x$ and $a \ne 0$, then $f'(a) = -\dfrac{1}{a^2}$.
(3) Prove Theorem 12.
(4) Prove Theorem 13.
(5) Let $f(x) = 3x^3 - 4x^2 + x - 2 + 4e^x$. Compute $f'(x)$.

14. The Product and Quotient Rules

14.1. The Product Rule. We mentioned in the last section that, in general, $(fg)' \ne f'g'$. For instance, if $f(x) = 2x + 1$ and $g(x) = x + 2$, then $(fg)(x) = 2x^2 + 5x + 2$, so $(fg)'(x) = (2x^2 + 5x + 2)' = 4x + 5$, while $f'(x) = 2$ and $g'(x) = 1$, so $f'(x)g'(x) = 2$. It turns out that there is a rule to compute the derivative of a product; it is just a little bit more complicated than the limit law for products. This is the focus of our first theorem in this section.
Theorem 14. Let $f$ and $g$ be two functions that are differentiable at $a$. Then $fg$ is differentiable at $a$, and
\[ (fg)'(a) = f(a)g'(a) + f'(a)g(a). \]

Proof. By definition, we have
\[ (fg)'(a) = \lim_{x\to a}\frac{f(x)g(x)-f(a)g(a)}{x-a}. \tag{3.2} \]
The crucial idea is to decompose the difference $f(x)g(x) - f(a)g(a)$ as
\[ (f(x)g(x) - f(x)g(a)) + (f(x)g(a) - f(a)g(a)) \]
in the numerator of the right-hand side of (3.2). Using this idea, we obtain from Equation (3.2)
\[
\begin{aligned}
(fg)'(a) &= \lim_{x\to a}\left[\frac{f(x)g(x)-f(x)g(a)}{x-a} + \frac{f(x)g(a)-f(a)g(a)}{x-a}\right] \\
&= \lim_{x\to a}\frac{f(x)g(x)-f(x)g(a)}{x-a} + \lim_{x\to a}\frac{f(x)g(a)-f(a)g(a)}{x-a} \\
&= \lim_{x\to a} f(x)\cdot\frac{g(x)-g(a)}{x-a} + \lim_{x\to a} g(a)\cdot\frac{f(x)-f(a)}{x-a} \\
&= f(a)g'(a) + g(a)f'(a).
\end{aligned}
\]
In the last step, we used that $\lim_{x\to a} f(x) = f(a)$, which holds because $f$ is differentiable, and hence continuous, at $a$ (Theorem 8). □

Example 28. The derivative of $h(x) = x^2e^x$ can be computed as follows. Let $f(x) = x^2$ and $g(x) = e^x$. Then
\[
\begin{aligned}
h'(x) = (fg)'(x) &= f(x)g'(x) + f'(x)g(x) \\
&= x^2(e^x)' + (x^2)'e^x \\
&= x^2e^x + 2xe^x \\
&= e^x(x^2+2x).
\end{aligned}
\]

14.2. The Quotient Rule. The rule for the derivative of the quotient of two functions is a little bit more complicated than that for the derivative of the product of two functions. Though more complex, both the rule and its proof bear some similarity to the rule given in Theorem 14.

Theorem 15. Let $f$ and $g$ be two functions that are differentiable at $a$ and let us assume that $g(a) \ne 0$. Then $f/g$ is differentiable at $a$, and we have
\[ \left(\frac{f}{g}\right)'(a) = \frac{g(a)f'(a) - f(a)g'(a)}{g(a)^2}. \]
46 3. Find a rule to compute (f 2 ) (x). RULES OF DIFFERENTIATION Proof. Find h (x). (5) Let g(x) = e−x . we have (3. Exercises. g(a)2 2 Theorem 15 now enables us to compute the derivative of rational functions.3. Example 29. Find a rule to compute (1/f ) (x). Solution: Let f (x) = x + 3 and let g(x) = x2 + 1. (6) Let h(x) = x/ex . = g(x)2 x4 + 2x2 + 1 x + 2x2 + 1 2 14. we have h (x) = −x2 − 6x + 1 g(x)f (x) − f (x)g (x) x2 + 1 − (x + 3)2x = 4 . Find g (x). x→a (x − a)g(x)g(a) Now transform the numerator of the righthand side by subtracting and then adding g(a)f (a) to get f g (a) = lim f (x)g(a) − g(a)f (a) + g(a)f (a) − f (a)g(x) x→a (x − a)g(x)g(a) g(a) f (a) f (x) − f (a) g(x) − g(a) = lim · − lim · x→a g(x)g(a) x→a g(x)g(a) x−a x−a g(a)f (a) − f (a)g (a) = . (1) (2) (3) (4) Let h(x) = ex x3 . . Find h (x). Use the result of the previous exercise to prove a formula for g (x) if g(x) = xn for a negative integer n.3) f g (a) = lim f (x) g(x) − f (a) g(a) x→a x−a . Find h (x) and h (x). So. by Theorem 15. By deﬁnition. Let us multiply both the numerator and the denominator of the righthand side by g(x)g(a) to get f g (a) = lim f (x)g(a) − f (a)g(x) . Then f (x) = 1 and g (x) = 2x. Let h(x) = (x + 3)/(x2 + 1).
15. Derivatives of Trigonometric Functions

In this section, we show how to compute the derivatives of trigonometric functions. First, we compute $(\sin x)'$. This will be a somewhat lengthy procedure, due to the fact that this is the first trigonometric function we will differentiate and we will have to apply new methods. However, once we know the derivatives of $\sin x$ and $\cos x$, it will be much simpler to deduce the derivatives of other trigonometric functions, since those functions can be obtained from $\sin$ and $\cos$, and then the various differentiation rules can be used.

Theorem 16. We have $(\sin x)' = \cos x$.

Proof. Recall the identity $\sin(a+b) = \sin a\cos b + \sin b\cos a$. We have
\[
\begin{aligned}
(\sin x)' &= \lim_{h\to 0}\frac{\sin(x+h)-\sin x}{h} \\
&= \lim_{h\to 0}\frac{\sin x\cos h + \sin h\cos x - \sin x}{h} \\
&= \lim_{h\to 0}\left[\sin x\,\frac{(\cos h)-1}{h} + \cos x\,\frac{\sin h}{h}\right] \\
&= \sin x\lim_{h\to 0}\frac{(\cos h)-1}{h} + \cos x\lim_{h\to 0}\frac{\sin h}{h}.
\end{aligned}
\]
Note that as $h$ approaches 0, we certainly have $\lim_{h\to 0}\sin x = \sin x$ and $\lim_{h\to 0}\cos x = \cos x$, since these functions do not even depend on $h$. There remains the task of computing the two nontrivial limits
\[ \lim_{h\to 0}\frac{(\cos h)-1}{h} \quad\text{and}\quad \lim_{h\to 0}\frac{\sin h}{h}. \]
We will carry out this task in two lemmas.

Lemma 1. We have
\[ \lim_{h\to 0}\frac{\sin h}{h} = 1. \]

Proof. Let us consider a circle with unit radius and a regular $n$-gon whose center is at the center $O$ of the circle and whose $n$ vertices are all on the unit circle. Then the area of the circle is $\pi$, and the area of the $n$-gon is $n\cdot\frac{1}{2}\cdot\sin\alpha$, where $\alpha = 2\pi/n$ is the angle $AOB$, with $A$ and $B$ being adjacent vertices of our $n$-gon. Considering just $1/n$ of both the circle and the $n$-gon, we see that the area of the triangle $AOB$ is $(\sin\alpha)/2$, and the area of the $1/n$ of the
circle bordered by the lines $AO$, $BO$, and the arc $AB$ is $\pi\cdot\alpha/(2\pi) = \alpha/2$. So the ratio of the two areas is
\[ \frac{(\sin\alpha)/2}{\alpha/2} = \frac{\sin\alpha}{\alpha}. \]
On the other hand, as $n$ gets larger and larger, $\alpha$ gets smaller and smaller, while the area of the $n$-gon gets closer and closer to the area of the circle. Hence, their ratio, $\sin\alpha/\alpha$, will get arbitrarily close to 1 and stay arbitrarily close to 1. □

Lemma 2. The equality
\[ \lim_{h\to 0}\frac{(\cos h)-1}{h} = 0 \tag{3.4} \]
holds.

Proof. We will manipulate the expression $((\cos h)-1)/h$ so that we can use the result of Lemma 1. First, we multiply both the numerator and the denominator by $\cos h + 1$ to get
\[ \frac{(\cos h)-1}{h} = \frac{(\cos^2 h)-1}{h(1+\cos h)} = \frac{-\sin^2 h}{h(1+\cos h)}. \]
Therefore, we have
\[
\begin{aligned}
\lim_{h\to 0}\frac{(\cos h)-1}{h} &= -\lim_{h\to 0}\frac{\sin^2 h}{h(1+\cos h)} \\
&= -\lim_{h\to 0}\frac{\sin h}{h}\cdot\frac{\sin h}{1+\cos h} \\
&= -\lim_{h\to 0}\frac{\sin h}{h}\cdot\lim_{h\to 0}\frac{\sin h}{1+\cos h} \\
&= (-1)\cdot 0 = 0.
\end{aligned}
\]
□

We can now finish the proof of Theorem 16. At the end of the first displayed chain of equations in that proof, we saw that
\[ (\sin x)' = \sin x\lim_{h\to 0}\frac{(\cos h)-1}{h} + \cos x\lim_{h\to 0}\frac{\sin h}{h}. \]
The previous two lemmas showed that, on the right-hand side, the first limit is 0 and the second limit is 1, so $(\sin x)' = \cos x$ as claimed. □

The following theorem can be proved by very similar methods.

Theorem 17. The equality $(\cos x)' = -\sin x$ holds.
You are asked to prove this theorem in Exercise 15.1(1).

Now that we have the derivatives of $\sin$ and $\cos$, the derivatives of other trigonometric functions can be obtained by simply using the quotient rule. The next theorem shows an example of this.

Theorem 18. We have $(\tan x)' = \sec^2 x$.

Proof. Note that $\tan x = \sin x/\cos x$, so we can apply the quotient rule. This leads to
\[
\begin{aligned}
(\tan x)' &= \left(\frac{\sin x}{\cos x}\right)' \\
&= \frac{\cos x\cdot(\sin x)' - \sin x\,(\cos x)'}{\cos^2 x} \\
&= \frac{\cos^2 x + \sin^2 x}{\cos^2 x} \\
&= \frac{1}{\cos^2 x} \\
&= \sec^2 x.
\end{aligned}
\]
□

The derivatives of the other three trigonometric functions are given in the exercises.

15.1. Exercises.
(1) Prove that $(\cos x)' = -\sin x$.
(2) Prove that $(\cot x)' = -\csc^2 x$.
(3) Prove that $(\sec x)' = \sec x\tan x$.
(4) Prove that $(\csc x)' = -\csc x\cdot\cot x$.
(5) Let $h(x) = e^x\cos x$. Find $h'(x)$.
(6) Let $h(x) = e^x/\sin x$. Find $h'(x)$.

16. The Chain Rule

16.1. The Derivative of the Composition of Two Functions. In previous sections, we learned how to compute the derivative of the sum, difference, product, and quotient of two functions. We still do not know how to compute the derivative of the composition of functions, such as $h(x) = \sin(3x)$, $t(x) = \sqrt{x^2+1}$, or $r(x) = e^{4x}$. In this section, we will learn a rule, called the chain rule, that applies in these situations.

Theorem 19 (Chain Rule). Let $h(x) = f(g(x))$, where $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$. Then $h$ is differentiable at $x$, and we have
\[ h'(x) = f'(g(x))\,g'(x). \]
The proof of the chain rule is somewhat technical, so we will postpone it until the end of this section. Now we will discuss some examples of the applications of the chain rule.

Example 30. Find the derivative of $h(x) = \sin 3x$.

Solution: Let $f(x) = \sin x$ and let $g(x) = 3x$. Then $h(x) = f(g(x))$, so, by the chain rule, we have
\[ h'(x) = f'(g(x))\cdot g'(x) = (\cos 3x)\cdot 3 = 3\cos 3x. \]
□

Example 31. Let $h(x) = \sqrt{x^2+1}$. Find $h'(x)$.

Solution: Let $f(x) = \sqrt{x}$ and let $g(x) = x^2 + 1$. Then $h(x) = f(g(x))$, so, by the chain rule, we have
\[ h'(x) = f'(g(x))\,g'(x) = \frac{1}{2\sqrt{x^2+1}}\cdot 2x = \frac{x}{\sqrt{x^2+1}}. \]
□

Sometimes the chain rule is written in the Leibniz notation, that is, as
\[ \frac{dh}{dx} = \frac{dh}{dg}\cdot\frac{dg}{dx}. \]

16.2. Two Applications of the Chain Rule.

16.2.1. A Simple Way of Obtaining $(\cos x)'$. Recall that in the last section, it took considerable time and effort to prove that $(\sin x)' = \cos x$. Finding $(\cos x)'$ with similar methods is just as time-consuming. On the other hand, the chain rule enables us to compute $(\cos x)'$ faster. Recall that $\cos x = \sin\left(x + \frac{\pi}{2}\right)$. So we can write $\cos x$ as the composition of two functions, namely $\cos x = f(g(x))$, with $f(x) = \sin x$ and $g(x) = x + \frac{\pi}{2}$. So the chain rule applies, and we get
\[
\begin{aligned}
(\cos x)' &= f'(g(x))\cdot g'(x) \\
&= \cos\left(x + \frac{\pi}{2}\right)\cdot 1 \\
&= \cos x\cos\frac{\pi}{2} - \sin x\sin\frac{\pi}{2} \\
&= 0 - \sin x \\
&= -\sin x.
\end{aligned}
\]
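The chain rule computations above can be mirrored in a few lines of Python. The sketch below builds $h'(x) = f'(g(x))\,g'(x)$ directly from an outer derivative $f'$, an inner function $g$, and its derivative $g'$, and then compares the result with a symmetric difference quotient of the composite function. The helper names (`chain_rule_derivative`, `central_difference`), the step size, and the test points are illustrative assumptions, not part of the text.

```python
import math

def chain_rule_derivative(outer_prime, inner, inner_prime):
    """Return the function x -> outer_prime(inner(x)) * inner_prime(x)."""
    def composite_prime(x):
        return outer_prime(inner(x)) * inner_prime(x)
    return composite_prime

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) by the symmetric quotient (func(x+h) - func(x-h)) / (2h)."""
    return (func(x + step) - func(x - step)) / (2 * step)

# Example 30: h(x) = sin(3x), with f(x) = sin x and g(x) = 3x.
sin3x = lambda x: math.sin(3 * x)
sin3x_prime = chain_rule_derivative(math.cos, lambda x: 3 * x, lambda x: 3.0)

# Example 31: h(x) = sqrt(x^2 + 1), with f(x) = sqrt(x) and g(x) = x^2 + 1.
root = lambda x: math.sqrt(x**2 + 1)
root_prime = chain_rule_derivative(
    lambda y: 1 / (2 * math.sqrt(y)), lambda x: x**2 + 1, lambda x: 2 * x
)

# Section 16.2.1: cos x = sin(x + pi/2), with f(x) = sin x and g(x) = x + pi/2.
cos_prime = chain_rule_derivative(math.cos, lambda x: x + math.pi / 2, lambda x: 1.0)

for x in (-1.0, 0.3, 2.0):
    print(
        x,
        sin3x_prime(x) - central_difference(sin3x, x),
        root_prime(x) - central_difference(root, x),
        cos_prime(x) + math.sin(x),
    )
```

Each printed difference should be close to zero, up to rounding and the error of the difference quotient. Passing the three ingredients separately keeps the structure of Theorem 19 visible in the code.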
16.2.2. The Derivatives of Exponential Functions. Recall that we defined the number $e$ such that the derivative of the exponential function $f(x) = e^x$ was $f(x)$ itself. Now the chain rule enables us to compute the derivatives of exponential functions with any base.

Theorem 20. Let $a$ be a positive real number and let $h(x) = a^x$. Then we have
\[ h'(x) = a^x\ln a. \]

Proof. Note that $h(x) = a^x = (e^{\ln a})^x = e^{x\ln a}$. So we have succeeded in writing $h$ as the composition of two functions, namely $h(x) = f(g(x))$, where $f(x) = e^x$ and $g(x) = x\ln a$. Therefore, the chain rule applies, and we get
\[ h'(x) = f'(g(x))\cdot g'(x) = e^{x\ln a}\cdot\ln a = a^x\ln a. \]
□

16.3. Proof of the Chain Rule. It is time that we proved the chain rule.

Proof of Theorem 19. Our goal is to express
\[ (f(g(x)))' = \lim_{r\to 0}\frac{f(g(x+r)) - f(g(x))}{r} \]
in terms of $f'(g(x))$ and $g'(x)$. As $g$ is differentiable at $x$, we know that
\[ \lim_{r\to 0}\left[\frac{g(x+r)-g(x)}{r} - g'(x)\right] = 0. \tag{3.5} \]
Set
\[ t = \frac{g(x+r)-g(x)}{r} - g'(x). \]
Note that $t$ depends on $r$, and as $r$ approaches 0, $t$ approaches 0. Similarly, let $y = g(x)$. As $f$ is differentiable at $y$, we have
\[ \lim_{s\to 0}\left[\frac{f(y+s)-f(y)}{s} - f'(y)\right] = 0. \tag{3.6} \]
Set
\[ u = \frac{f(y+s)-f(y)}{s} - f'(y). \]
Again, note that $u$ depends on $s$ and that $u$ approaches 0 as $s$ approaches 0. Now we undertake a series of manipulations of the preceding two equations.
Rearranging the equation that defines the variable $t$ that we just introduced, we get
\[ g(x+r) = g(x) + (g'(x)+t)\,r. \tag{3.7} \]
Similarly, rearranging the equation that defines the variable $u$, we get
\[ f(y+s) = f(y) + (f'(y)+u)\,s. \tag{3.8} \]
Now apply the function $f$ to both sides of (3.7) to get
\[ f(g(x+r)) = f\big(g(x) + (g'(x)+t)\,r\big). \tag{3.9} \]
Observe that Equation (3.8) holds for all $y$ and $s$, so, in particular, it holds when $y = g(x)$ and $s = (g'(x)+t)\,r$. Making these substitutions in (3.9) yields
\[
\begin{aligned}
f(g(x+r)) &= f\big(g(x) + (g'(x)+t)\,r\big) &\quad (3.10) \\
&= f(g(x)) + (f'(g(x))+u)\cdot(g'(x)+t)\,r. &\quad (3.11)
\end{aligned}
\]
We can now express the quotient $(f(g(x+r)) - f(g(x)))/r$ from the equality of the left-hand side of (3.10) and the expression in (3.11) as
\[ \frac{f(g(x+r)) - f(g(x))}{r} = \frac{(f'(g(x))+u)(g'(x)+t)\,r}{r} = (f'(g(x))+u)\,(g'(x)+t). \]
Finally, we are in a position to compute the derivative we were looking for as the limit of the left-hand side as $r$ approaches 0. We get
\[
\begin{aligned}
\lim_{r\to 0}\frac{f(g(x+r)) - f(g(x))}{r} &= \lim_{r\to 0}\,(f'(g(x))+u)(g'(x)+t) \\
&= \left(\lim_{r\to 0}f'(g(x)) + \lim_{r\to 0}u\right)\cdot\left(\lim_{r\to 0}g'(x) + \lim_{r\to 0}t\right) \\
&= f'(g(x))\cdot g'(x)
\end{aligned}
\]
since both $t$ and $u$ approach 0 as $r$ approaches 0. □

16.4. Exercises.
(1) Let $h(x) = (x^2+1)^5$. Find $h'(x)$.
(2) Let $h(x) = \sin(x^2)$. Find $h'(x)$.
(3) Let $h(x) = e^{\sin x}$. Find $h'(x)$.
(4) Let $h(x) = \tan 3x$. Find $h'(x)$.
(5) Let $h(x) = e^{x^2}\sin x$. Find $h'(x)$.
That is. Consider Equation (3. they were given by a rule that directly described how f (x) = y is obtained from x. They were explicitly given. we do not need an explicit expression for the function y(x). so the slope of the tangent line at (2. Implicit Differentiation In the last several sections. even if (3. we need to use the chain rule.12). we computed the derivatives of many diﬀerent functions. we only need to know the derivative dy/dx of that function at x = 2. since the curve in question is symmetric in . Note that the fact that the tangent line at (2. Sometimes we have to deal with curves that are given by a diﬀerent kind of rule. dx dx Now recall that y = y(x) is a function of x. Tangent Lines to Implicitly Deﬁned Curves. dx (3y 2 − 4x) At the point (2. 3x2 + 3y 2 dx dx Expressing dy/dx from this equation. we could simply take the derivative of that function at x = 2. they had one important feature in common. and diﬀerentiate both sides with respect to the variable x to get d d x3 + y 3 = (4xy).17. we need to use the product rule.12) implicitly describes this dependence. 2). Using these rules.1. that is. 17. when computing (d/dx)y 3 on the lefthand side. On the righthand side. Let us say that we want to compute the slope of the tangent line to this curve at the point (2. 2) is −1. So. 2) has slope −1 makes (intuitively) perfect sense. Although these functions were diﬀerent. the righthand side is −4/4 = −1. However. it is not clear how to write y explicitly in terms of x. we get dy (4y − 3x2 ) = .12) x3 + y 3 = 4xy. 2). Keep in mind that we do not need to explicitly know how y depends on x. we get dy dy = 4y + 4x . It is in these situations that we resort to implicit diﬀerentiation. If we could express y as a function of x. IMPLICIT DIFFERENTIATION 53 17. Consider the curve given by the equation (3.
$x$ and $y$. That is, if $(x, y)$ is on the curve, then $(y, x)$ is also on the curve.

17.2. Derivatives of Inverse Trigonometric Functions. One place where implicit differentiation is a very powerful tool is in the computation of the derivatives of inverse trigonometric functions. Recall that $\tan^{-1}x = y$ is the function that is the inverse of the restriction of the function $\tan x$ to the interval $(-\pi/2, \pi/2)$. That is, if $\tan^{-1}(x) = y$, then
\[ x = \tan y, \tag{3.13} \]
where $y \in (-\pi/2, \pi/2)$. Our goal is to determine
\[ \frac{d}{dx}\tan^{-1}(x) = \frac{dy}{dx}. \]
To that end, let us take the derivative of both sides of (3.13) with respect to $x$. Recalling that
\[ \frac{d}{dz}\tan z = \sec^2 z \]
and $y = y(x)$, we get
\[ 1 = \sec^2 y\cdot\frac{dy}{dx}. \]
Solving for $dy/dx$ and recalling the identity $\sec^2 z = 1 + \tan^2 z$, we obtain
\[ \frac{dy}{dx} = \frac{1}{\sec^2 y} = \frac{1}{1+\tan^2 y} = \frac{1}{1+x^2}. \]
In other words, we proved the formula
\[ (\tan^{-1}x)' = \frac{1}{1+x^2}. \]
This formula is interesting for two reasons. First, it is surprisingly simple; it does not even contain trigonometric functions. Second, imagine trying to get this result without implicit differentiation, using just the definition of derivatives. You will be asked to compute the derivatives of the other inverse trigonometric functions in the exercises.
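The two results of this section can be checked with a short Python sketch. It evaluates the slope formula $dy/dx = (4y - 3x^2)/(3y^2 - 4x)$ from Section 17.1 at the point $(2, 2)$ and compares $1/(1+x^2)$ with a symmetric difference quotient of the arctangent. The function names, the tolerance, the step size, and the sample points are assumptions chosen for illustration; `math.atan` is Python's standard arctangent.

```python
import math

def on_curve(x, y, tolerance=1e-12):
    """Check whether (x, y) satisfies x^3 + y^3 = 4xy up to a small tolerance."""
    return abs(x**3 + y**3 - 4 * x * y) <= tolerance

def implicit_slope(x, y):
    """dy/dx on the curve x^3 + y^3 = 4xy, using the formula derived in Section 17.1."""
    return (4 * y - 3 * x**2) / (3 * y**2 - 4 * x)

def arctan_prime(x):
    """The formula (arctan x)' = 1 / (1 + x^2) derived in Section 17.2."""
    return 1 / (1 + x**2)

def central_difference(func, x, step=1e-6):
    """Approximate func'(x) by the symmetric quotient (func(x+h) - func(x-h)) / (2h)."""
    return (func(x + step) - func(x - step)) / (2 * step)

print(on_curve(2, 2), implicit_slope(2, 2))  # prints: True -1.0

arctan_errors = [
    abs(central_difference(math.atan, x) - arctan_prime(x))
    for x in (-2.0, 0.0, 0.5, 3.0)
]
print("largest discrepancy for arctan:", max(arctan_errors))
```

Note that `implicit_slope` is meaningful only at points that actually lie on the curve and where $3y^2 - 4x \ne 0$; the sketch checks the first condition with `on_curve` before using the formula.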
17.3. Exercises.
(1) Let $C$ be the circle given by the equation $x^2 + y^2 = 169$. Use implicit differentiation to find the slope of the tangent line to $C$ at the point $(5, 12)$.
(2) Prove that $(\sin^{-1}x)' = \dfrac{1}{\sqrt{1-x^2}}$.
(3) Prove that $(\cos^{-1}x)' = -\dfrac{1}{\sqrt{1-x^2}}$.
(4) Prove that $(\cot^{-1}x)' = -\dfrac{1}{1+x^2}$.
(5) Prove that $(\sec^{-1}x)' = \dfrac{1}{x\sqrt{x^2-1}}$.
(6) Prove that $(\csc^{-1}x)' = -\dfrac{1}{x\sqrt{x^2-1}}$.

18. Derivatives of Logarithmic Functions

18.1. The Formula for $(\log_a x)'$. As another powerful application of implicit differentiation, we compute the derivative of the function $f(x) = \ln x$.

Theorem 21. We have
\[ (\ln x)' = \frac{1}{x}. \]

Proof. Set $y = \ln x$. Then $e^y = x$. Differentiating both sides with respect to $x$, we get
\[ e^y\cdot\frac{dy}{dx} = 1, \qquad\text{so}\qquad \frac{dy}{dx} = \frac{1}{e^y}. \]
However, $e^y = x$ by definition, so
\[ \frac{dy}{dx} = \frac{1}{x} \]
as claimed. □

It is now a breeze to determine the derivative of logarithmic functions of any base.

Corollary 3. Let $a \ne 1$ be a fixed positive real number. Then
\[ (\log_a x)' = \frac{1}{x\ln a}. \]

Proof. Note that
\[ x = a^{\log_a x} = \left(e^{\ln a}\right)^{\log_a x} = e^{(\ln a)(\log_a x)}. \]
So $\ln x = (\ln a)\cdot(\log_a x)$ and
\[ f(x) = \log_a x = \frac{\ln x}{\ln a}. \]
As $\ln a$ is a constant, it follows that
\[ f'(x) = \frac{1}{\ln a}(\ln x)' = \frac{1}{x\ln a} \]
as claimed. □

18.2. The Chain Rule and ln x. An interesting consequence of Theorem 21 is the following.

Corollary 4. Let $f(x)$ be a differentiable function that takes positive values only. Then
\[ \frac{d}{dx}\ln f(x) = \frac{f'(x)}{f(x)}. \]

Proof. By the chain rule,
\[ \frac{d}{dx}\ln f = \frac{d\ln f}{df}\cdot\frac{df}{dx} = \frac{f'(x)}{f(x)}. \]
□

Example 32. Let $f(x) = \cos x$. Then, wherever $\cos x > 0$,
\[ \frac{d}{dx}\ln(\cos x) = \frac{-\sin x}{\cos x} = -\tan x. \]

18.3. Logarithmic Differentiation. Sometimes we need to compute the derivative of a complicated product. This is sometimes easier by taking the logarithm of the product, which will be a sum, and using implicit differentiation. This procedure, which is called logarithmic differentiation, has the inherent advantage that it deals with sums instead of products, and sums are much easier to differentiate than products.

Example 33. Let
\[ y = \frac{x^3\sqrt{x+1}}{\sqrt{x-2}}. \]
Compute $dy/dx$.

Solution: Taking logarithms, we get
\[ \ln y = 3\ln x + \frac{1}{2}\ln(x+1) - \frac{1}{2}\ln(x-2). \]
Now taking derivatives with respect to $x$ and using Corollary 4, we have
\[ \frac{1}{y}\cdot\frac{dy}{dx} = \frac{3}{x} + \frac{1}{2(x+1)} - \frac{1}{2(x-2)}. \]
Finally, we can solve this equation for $dy/dx$ to get
\[
\begin{aligned}
\frac{dy}{dx} &= y\left(\frac{3}{x} + \frac{1}{2(x+1)} - \frac{1}{2(x-2)}\right) \\
&= \frac{x^3\sqrt{x+1}}{\sqrt{x-2}}\cdot\left(\frac{3}{x} + \frac{1}{2(x+1)} - \frac{1}{2(x-2)}\right).
\end{aligned}
\]
□

18.4. Power Functions Revisited. Recall that in an earlier section, we proved that if $n$ is a fixed positive integer, then $(x^n)' = nx^{n-1}$. We stated that this was the case for all real numbers $n$, not just positive integers, but we have not proved that claim. Now we have the tools, namely logarithmic differentiation, to prove it.

Theorem 22. Let $n$ be any real number. Then we have
\[ \frac{d}{dx}x^n = nx^{n-1}. \]

Proof. Let us assume for the sake of simplicity that $x$ is positive. Set $y = x^n$. Taking logarithms, we have $\ln y = n\ln x$. Differentiating both sides with respect to $x$, we get
\[ \frac{1}{y}\cdot\frac{dy}{dx} = \frac{n}{x}. \]
Solving for $dy/dx$ yields
\[ \frac{dy}{dx} = \frac{ny}{x} = \frac{nx^n}{x} = nx^{n-1} \]
as claimed. □

18.5. The Number e Revisited. Recall that we have defined the number $e$, the base of the natural logarithm, as the number for which $\lim_{h\to 0}(e^h-1)/h = 1$. Our new knowledge lets us express $e$ more directly, as a limit. Note that if $f(x) = \ln x$, then $f'(x) = 1/x$, so $f'(1) = 1$. By the definition of derivatives, this means that
\[ \lim_{h\to 0}\frac{\ln(1+h) - \ln 1}{h} = 1. \]
Observing that $\ln 1 = 0$ and using the power rule of logarithms, we get
\[ \lim_{h\to 0}\ln(1+h)^{1/h} = 1, \]
or, applying the exponential function $e^z$ to both sides, we have
\[ \lim_{h\to 0}(1+h)^{1/h} = e. \]
Equivalently, setting $x = 1/h$, we get
\[ \lim_{x\to\infty}\left(1 + \frac{1}{x}\right)^x = e. \]
Either of the last two formulas can help to determine the approximate value 2.718282 of $e$.

18.6. Exercises.
(1) Compute $\dfrac{d}{dx}\ln(\sqrt{x}+1)$.
(2) Compute $f'(x)$ if $f(x) = x^4\,\dfrac{\sqrt[3]{x+4}}{x+1}$.
(3) Compute $(\ln x)''$.
(4) Compute $(x^x)'$.
(5) Compute $\lim_{x\to\infty}\left(1 + \dfrac{1}{2x}\right)^x$.

19. Applications of Rates of Change

In this section, we consider a few applications of derivatives in various disciplines.

19.1. Physics. Recall that if an object moves along a line and the distance it covers in time $t$ is described by the function $s(t)$, then
\[ v(t) = \frac{ds}{dt} = s'(t) = \lim_{h\to 0}\frac{s(t+h)-s(t)}{h} \tag{3.14} \]
is the instantaneous velocity of the object at time $t$.

We can take this concept one step further. If the object moves at a changing velocity, then the rate of change of the velocity itself can be important information. For instance, when considering a vehicle's performance, we may be interested in how fast it can reach its top speed, not only what its top speed is. The corresponding notion in physics is called acceleration, and is denoted by $a(t)$. That is, keeping the previous notation, we have
\[ a(t) = v'(t) = \frac{dv}{dt}. \tag{3.15} \]

Example 34. The position of a particle is described by the equation
\[ s(t) = \frac{1}{3}t^3 - 3t^2 + 5t. \tag{3.16} \]
Here $s$ is measured in meters and $t$ in seconds.
(I) What is the velocity of the particle after 3 seconds?
(II) Find the acceleration of the particle after 10 seconds.
(III) When does the particle move backward?
Solution: (I) The velocity of the particle is described by the function $v(t) = s'(t) = t^2 - 6t + 5$. This yields $v(3) = 9 - 18 + 5 = -4$. So the velocity of the particle after 3 seconds is $-4$ m/s, meaning that the particle is moving backward at a speed of 4 meters per second after 3 seconds.
(II) The acceleration of the particle is given by the formula $a(t) = v'(t) = 2t - 6$. So, after 10 seconds, the particle is accelerating at 14 m/s$^2$.
(III) The particle is moving backward when its velocity $v(t)$ is negative. That happens when $v(t) = t^2 - 6t + 5 = (t-1)(t-5) < 0$, that is, when $t \in (1, 5)$. In other words, the particle is moving backward between the first and fifth seconds. □

Additional questions about the movement of this particle will be given in the exercises.

19.2. Economics. Let us say that a company estimates that it costs $C(x)$ dollars to produce $x$ units of a new product. It is often the case that $C(x)$, which is called the cost function, can be described by a polynomial function, such as
\[ C(x) = a + bx + cx^2 + dx^3. \]
The reason for this is as follows. There will be some costs, such as designing the product and obtaining permits, that will be present regardless of the number of units produced. These will be represented by the constant term $a$. Then there will be costs, such as renting a location and buying supplies, that will be more or less in direct proportion to the number of units produced. These will be represented by the linear term $bx$. Then there will be other factors, such as hiring workers, marketing the product, and organizing production, that will be in direct proportion to a higher power of $x$ as the differences in size turn into differences in kind. Taxes may factor in at an even higher rate.

Because the cost function $C(x)$ is not a linear polynomial, producing the 1001st unit does not cost the same as producing the first unit or the 5001st unit. The cost of increasing production from $n$ units to $n + 1$ units, in other words, the cost of producing the $(n+1)$th unit, can be computed by the formula
\[ M(n) = C(n+1) - C(n). \]
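The formula $M(n) = C(n+1) - C(n)$ can be tried out on a small numerical sketch. The coefficients below are hypothetical and chosen only for illustration (they do not come from the text); the function names `cost` and `marginal_cost` are likewise illustrative.

```python
def cost(x, a=1000.0, b=5.0, c=0.01, d=0.0001):
    """Cubic cost model C(x) = a + b*x + c*x**2 + d*x**3.

    The default coefficients are hypothetical, not taken from the text.
    """
    return a + b * x + c * x ** 2 + d * x ** 3


def marginal_cost(n):
    """Cost of producing unit number n + 1: M(n) = C(n + 1) - C(n)."""
    return cost(n + 1) - cost(n)


# Cost of the 1st, 1001st, and 5001st units under this hypothetical model.
for n in (0, 1000, 5000):
    print(n + 1, round(marginal_cost(n), 2))
```

With a nonlinear cost function of this kind, the three printed values differ, which is the point made above: the cost of one more unit depends on how many units have already been produced.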
The marginal cost function $C'(x)$ describes how the cost function changes. In that, $C'(x)$ and $M(n)$ are similar. There is one important difference, however. As we know, the derivative $C'(x)$ is given by
\[ C'(x) = \lim_{\Delta x\to 0}\frac{C(x+\Delta x) - C(x)}{\Delta x}. \tag{3.17} \]
However, in case the products are such that fractional units do not make sense (e.g., automobiles), $\Delta x \to 0$ is impossible in its precise mathematical meaning. In that case, it could well be that the smallest meaningful positive value of $\Delta x$ is 1, so the closest that $\Delta x$ can get to 0 is when $\Delta x = 1$. In that case, the expression after the limit symbol in (3.17) simplifies to $C(x+1) - C(x)$, justifying the approximation
\[ M(x) = C(x+1) - C(x) \approx C'(x). \tag{3.18} \]

Example 35. The cost function of a bottle of a new medication is given by $C(x) = 10^6 + 20x + 0.001x^2 + 0.000001x^3$. Find the approximate cost of producing the 101st and the 1001st bottles.

Solution: By the preceding discussion, we need to compute the function $C'(x)$. By the rules of differentiating a polynomial function, we get
\[ C'(x) = 0.000003x^2 + 0.002x + 20. \]
So the 101st bottle costs $0.000003\cdot 100^2 + 0.002\cdot 100 + 20 = 20.23$ dollars to produce, while the 1001st bottle costs $0.000003\cdot 1000^2 + 0.002\cdot 1000 + 20 = 25$ dollars to produce. □

In the exercises, you are asked to compare these results to the results obtained by using the formula $C(n+1) - C(n)$.

It is important to note that the result of the previous example, that is, the fact that it costs more to produce the 1001st bottle than the 101st bottle, does not mean that the more bottles are produced, the more expensive it is to produce the average bottle. (The cost of producing the average bottle if $n$ bottles are produced is of course $C(n)/n$.) This is because the cost of producing the first bottle is astronomical, since $C(1) > 10^6$. Compared to that, the cost of each of the first thousand, or even first ten thousand, bottles is very small, so the production of each of them will bring the cost of producing the average bottle down.

19.3. Exercises.
(1) Consider the particle of Example 34. After 6 seconds, how far from its starting point is that particle? In what direction?
(2) Consider the particle of the previous exercise. Are there any moments when the particle is not moving?
(3) The location of an object moving vertically is described by the function $s(t) = t - t^2/5$, for $t \in [0, 5]$, where time is measured in seconds and distance is measured in meters. When will the object have an instantaneous velocity of 0.2 m/s?
(4) Consider the object of the previous exercise. When does it have the greatest speed going up? When does it have the greatest speed going down?
(5) Use the formula $M(n) = C(n+1) - C(n)$ to find the cost of producing the 101st and 1001st units in Example 35. Compare your results with the estimates that we found using the function $C'(x)$.

20. Related Rates

20.1. Preliminaries. An intuitive idea of the notion of related rates comes from a simple fact of everyday life: If there are two related quantities that are changing with time, then their rates of change should also be related. For example, the volume $V$ of water in a pool of area 20 m$^2$ is related to the water level $h$ (the pool depth in meters) as $V = 20h$. Suppose the water level is low and needs to be increased. A hose is put into the pool that can pump water at a rate of 0.2 m$^3$/h. At what rate does the water level increase? The volume and the water level are both functions of time, $V = V(t)$ and $h = h(t)$. For every instance of time $t$, their values are related as $V(t) = 20h(t)$ and so must be their derivatives or rates of change:
\[ V(t) = 20h(t) \quad\Longrightarrow\quad V'(t) = 20h'(t). \tag{3.19} \]
Now the question is easy to answer. Since $V'(t) = 0.2$ m$^3$/h, $h'(t) = V'(t)/20 = 0.01$ m/h $= 1$ cm/h. The water level rises by 1 cm every hour. A somewhat practical estimate! You would know exactly when to come back and turn off the water if you needed an inch or so of the water level increase. Apparently, the same idea of related rates would work for lowering the water level after rain.

20.2. Units. It is important to bring all the quantities to the same system of units. For example, in the above problem the pool area is often given in square feet, for example, 200 ft$^2$, while the pumping rate is given in gallons per hour, for example, $V' = 60$ gal/h. One square foot is $9.29\cdot 10^{-2}$ m$^2$, so the pool area is $200\cdot 9.29\cdot 10^{-2} = 18.58$ m$^2$. One gallon is $3.785\cdot 10^{-3}$ m$^3$ and therefore $V' = 60\cdot 3.785\cdot 10^{-3} = 0.2271$ m$^3$/h. Hence, $h' = 0.2271/18.58 \approx 1.2$ cm/h. In 1999, NASA lost a $125 million Mars orbiter because a Lockheed Martin engineering team used English units of measurement while the agency's team used the more conventional metric system for a key spacecraft operation.
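The unit bookkeeping in the pool problem can be sketched in a few lines. This is only an illustration: the function name `level_rate_cm_per_hour` is hypothetical, and the conversion factors are the rounded values quoted in the text.

```python
FT2_TO_M2 = 9.29e-2    # one square foot in square meters (value used in the text)
GAL_TO_M3 = 3.785e-3   # one gallon in cubic meters (value used in the text)


def level_rate_cm_per_hour(area_ft2, pump_gal_per_hour):
    """Rate of change of the water level from V'(t) = A * h'(t), all in SI units."""
    area_m2 = area_ft2 * FT2_TO_M2           # pool area in m^2
    dV_dt = pump_gal_per_hour * GAL_TO_M3    # pumping rate in m^3/h
    dh_dt = dV_dt / area_m2                  # level rate in m/h
    return 100.0 * dh_dt                     # converted to cm/h


print(round(level_rate_cm_per_hour(200, 60), 2))
```

Converting both inputs to SI units before dividing is the design choice that matters here; dividing gallons per hour by square feet directly would give a number with no useful meaning.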
20.3. Formal Definition of Related Rates. The very notion of related quantities can be stated in proper mathematical terms.

Definition 25. Two quantities $y$ and $x$ are said to be related if there is a function $f$ such that $y = f(x)$.

In the previous example, $V = f(h) = 20h$. Suppose now that the quantities $y$ and $x$ are functions of another variable $t$ (e.g., $t$ is time): $x = x(t)$ and $y = y(t)$. Then the rate of change of $x$ or $y$ with respect to $t$ is nothing but the derivative $x'(t)$ or $y'(t)$. The problem of "related rates" can now be cast in the proper mathematical terms: What is the relationship between the derivatives $x'(t)$ and $y'(t)$ if the values of $x(t)$ and $y(t)$ are related by $y = f(x)$?

The values of the functions $x(t)$ and $y(t)$ are related as $y(t) = f(x(t))$ for any $t$. Taking the derivative of both sides with respect to $t$ by means of the chain rule (Theorem 19), we obtain a generalization of (3.19):
\[ y(t) = f(x(t)) \quad\Longrightarrow\quad y'(t) = f'(x(t))\,x'(t). \tag{3.20} \]
Equation (3.20) establishes the sought-after relation between the rates $y'$ and $x'$. However, it seems somewhat different from (3.19): The rates are still proportional to one another, but the proportionality coefficient $f'(x)$ is no longer a constant, but a function. How do we use it? Take a particular value of $t = t_0$. Let the values of $x$ and $y$ at $t = t_0$ be $x_0 = x(t_0)$ and $y_0 = y(t_0)$. The number $a = f'(x_0)$ can be calculated. Then the equality $y'(t_0) = a\,x'(t_0)$ determines the relation between the rates $y'$ and $x'$ at the instance when $x$ has the value $x_0$ (or $y$ has the value $y_0 = f(x_0)$).

Example 36. Let a laser pointer be positioned at a distance $D = 1$ m from a wall. The pointer can be rotated so that the bright spot created by the laser beam travels horizontally on the wall.
(I) At what speed does the bright spot travel along the wall if the pointer revolves at a constant rate $\omega = \pi$ rad/s?
(II) At what direction of the laser beam does the bright spot travel at the speed $v = 4\pi$ m/s?

Solution: (I) The analysis of any problem on related rates must begin with defining the quantities whose rates are being studied.
In other words, one has to answer the question: How are these quantities measured? The orientation of the laser beam can be described by the angle $\varphi$ between the perpendicular to the wall and the laser beam. The position of the bright spot may be set by the distance $y$ traveled by it from the point on the wall when $\varphi = 0$, that is, when the laser beam is perpendicular to the wall. If the pointer rotates, the angle becomes a function of time, $\varphi = \varphi(t)$, and so does the position of the bright spot, $y = y(t)$. Thus, the question is about the relation between the rates $y'(t) = v$ (the speed at which the bright spot travels) and $\varphi'(t) = \omega$ (the rate at which the pointer rotates).
(II) The next step is to find a function that determines the relation between the quantities of interest, that is, between the distance $y$ and the angle $\varphi$: $y = f(\varphi)$. It is clear that $D$ and $y$ are related as the catheti of the right triangle whose hypotenuse is the laser beam: $y = D\tan\varphi = f(\varphi)$.
(III) Once the relation between the quantities of interest has been established, the relation between their rates can be found. Since $(\tan\varphi)' = 1/\cos^2\varphi$, Equation (3.20) yields
\[ y = D\tan\varphi \quad\Longrightarrow\quad y' = \frac{D\,\varphi'}{\cos^2\varphi}. \tag{3.21} \]
The first question is answered by setting $\varphi' = \omega = \pi$ rad/s and $D = 1$ m, which gives $v = \pi/\cos^2\varphi$, where $y' = v$ is measured in meters per second. To answer the second question, one has to find the value of $\varphi$ when $v = 4\pi$ m/s, $D = 1$ m, and $\omega = \pi$ rad/s. It follows from Equation (3.21) that
\[ \cos^2\varphi = \frac{D\omega}{v} = \frac{1}{4} \quad\Longrightarrow\quad \varphi = \frac{\pi}{3}, \]
that is, the bright spot moves at the speed $4\pi$ m/s when the laser beam makes $60^\circ$ with the perpendicular to the wall.
(IV) Note that the rate $y' = v$ is not constant even if the rate $\varphi' = \omega$ is constant. □

20.4. Can Anything Travel Faster Than Light? The solution (3.21) has an interesting feature. When $\varphi$ approaches $90^\circ$, that is, the laser beam is getting closer to being parallel to the wall, the cosine, $\cos\varphi$, tends to 0 in Equation (3.21), and hence the rate $y' = v$ grows unboundedly. It seems that with merely a laser pointer, a superluminal object can be created in a lecture hall! Let us investigate this. The speed of light is $c \approx 300{,}000$ km/s $\approx 186{,}000$ mi/s. Light can make a trip around the world in merely 0.13 seconds!
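The behavior of the bright spot in Example 36 can be explored numerically from relation (3.21). The sketch below is only an illustration with hypothetical function names; it assumes the values $D = 1$ m and $\omega = \pi$ rad/s of the example as defaults.

```python
import math


def spot_speed(phi, D=1.0, omega=math.pi):
    """Speed of the bright spot, v = D * omega / cos(phi)**2, from y = D * tan(phi)."""
    return D * omega / math.cos(phi) ** 2


def angle_for_speed(v, D=1.0, omega=math.pi):
    """Angle phi in [0, pi/2) at which the spot moves at speed v (requires v >= D * omega)."""
    return math.acos(math.sqrt(D * omega / v))


# The speed grows without bound as the beam approaches the direction parallel to the wall.
for degrees in (0, 60, 80, 89, 89.9):
    print(degrees, spot_speed(math.radians(degrees)))

# Angle (in degrees) at which the spot travels at 4*pi m/s.
print(math.degrees(angle_for_speed(4 * math.pi)))
```

The second function simply inverts (3.21) for $\varphi$, so it reproduces the $60^\circ$ answer of part (II) and can be reused for any other target speed.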
Example 36 is now supplemented by two additional questions:
(I) Is it possible that $v$ can exceed the speed of light? If so, at which direction of the laser beam does it happen?
(II) At which position of the bright spot does it happen?

The answers read:
(I) Setting $D = 1$ m $= 10^{-3}$ km (watch the units: all distances are now in kilometers!) and $v = c = 3\cdot 10^5$ km/s, the angle at which the bright spot exceeds the speed of light satisfies the equation $\cos^2\varphi = D\omega/c \approx 1.05\cdot 10^{-8}$, and hence $\varphi \approx 89.99414^\circ$. So the bright spot becomes superluminal if $\varphi > 89.99414^\circ$!
(II) Since $y = D\tan\varphi$, $v > c$ if $y > 9772$ m.

Well, a lecture hall appears to be a "bit" small for this experiment! Take a Dremel miniature grinder (sold in Lowe's stores) for which $\omega \approx \pi\cdot 10^3$ rad/s (it can be used to rotate the pointer), and set $D = 0.1$ m; then $v > c$ if $y > 98$ m. Not yet exactly a lecture hall experiment, but it can be managed on the campus!

Einstein's theory of relativity states that no material object can travel faster than light. Has a counterexample to Einstein's theory just been found? The answer is "no." In the motion of the bright spot, no material object actually moves along the wall. Bright spots at $y$ and $y + \Delta y$ are created by different portions of the laser beam that are emitted by the laser at two distinct moments of time. A lump of light that arrived at $y$ was reflected by the wall (that is why we see the bright spot!), and hence it could not appear at the next position $y + \Delta y$ (at this position arrived a different lump of light emitted by the laser at a later time). So the rate $\Delta y/\Delta t$ cannot possibly be associated with the motion of any material object.

20.5. Related Problem. The next time you watch a Florida sunset, look at your shadow. Does there exist a position of the Sun above the horizon at which your shadow extends faster than the speed of light?

20.6. More Than Two Related Rates. There are situations when several quantities are related among themselves. If these quantities become functions of a variable $t$, then their rates are linearly related. A proof of this statement is given in a more advanced course, where the functions of several variables are studied. However, the basic idea of finding relations between the rates has not changed: They are obtained by differentiating the relations between the quantities in question with respect to $t$.
The procedure is illustrated in the following example.

Example 37. Consider a rectangle with sides $x$ and $y$. Suppose that $x$ and $y$ change with time. Find their rates of change when $x = 3$ cm and $y = 1$ cm if, at that moment, the area of the rectangle decreases at the rate 2 cm$^2$/s while the perimeter does not change.

Solution: (I) There are four quantities involved: the rectangle dimensions $x$ and $y$, the area $S$, and the perimeter $P$.
(II) There are two relations between them: $S = xy$, $P = 2(x + y)$.
(III) If $x = x(t)$ and $y = y(t)$, then $S(t) = x(t)y(t)$ and $P(t) = 2(x(t) + y(t))$. Using the derivative of the product and the sum of two functions, the linear relations between the rates are
\[ S' = x'y + xy', \qquad P' = 2(x' + y'). \]
(IV) Since $P' = 0$ (the perimeter does not change), $x' = -y'$ and $S' = (x - y)y'$. Now let $S' = -2$ cm$^2$/s because $S$ decreases ($S'$ must be negative). With $x = 3$ cm and $y = 1$ cm, one has $-2 = (3 - 1)y'$ and $y' = -1$ cm/s. It then follows that $x' = -y' = 1$ cm/s. □

21. Linear Approximations and Differentials

21.1. Tangent Line Approximation. The derivative of a function $f(x)$ at a point $x = x_0$ defines the slope of the line tangent to the graph $y = f(x)$ at the point $(x_0, f(x_0))$ (see Equation (2.5)). The equation of the tangent line is
\[ \frac{y - f(x_0)}{x - x_0} = f'(x_0) \quad\text{or}\quad y = f(x_0) + f'(x_0)(x - x_0). \]

Definition 26. Suppose $f(x)$ is differentiable at $x = x_0$. The linear function
\[ L(x) = f(x_0) + f'(x_0)(x - x_0) \tag{3.22} \]
is called the linearization of $f(x)$ in a neighborhood of $x_0$.

Since the values of $f$ and $L$ coincide at $x = x_0$, one might expect that the difference $f(x) - L(x)$ is small, provided $x$ is close enough to $x_0$. So the linear function $L(x)$ may be used to approximate $f(x)$ in a small neighborhood of $x_0$.
The approximation $f(x) \approx L(x)$ near $x_0$ is called the linear approximation or tangent line approximation.

Example 38. Use the linear approximation to estimate the value $\sqrt{3.92}$.

Solution: (I) Consider $f(x) = \sqrt{x}$. The closest value of $x$ to 3.92 at which the square root can be evaluated without a calculator is $x_0 = 4$: $f(x_0) = 2$. Note the two important steps here: the choice of $f(x)$ suitable for the problem and the choice of $x_0$ near which the linear approximation is to be used.
(II) Since $f'(x) = (\sqrt{x})' = 1/(2\sqrt{x})$ and $f'(4) = 1/4$, by Equation (3.22) the linearization of $\sqrt{x}$ near $x = 4$ is
\[ L(x) = 2 + \tfrac{1}{4}(x - 4). \]
(III) The linear approximation means that the value $f(3.92) = \sqrt{3.92}$ is approximated by the value $L(3.92)$:
\[ \sqrt{3.92} \approx L(3.92) = 2 + \tfrac{1}{4}(3.92 - 4) = 1.98. \]
A calculator gives $\sqrt{3.92} \approx 1.97990$. So the approximation error is $|\sqrt{3.92} - L(3.92)| < 1.02\cdot 10^{-4}$. □

It is easy to see that $L(4.08) = 2.02$ and $|\sqrt{4.08} - L(4.08)| < 1.02\cdot 10^{-4}$. This observation can be summarized by the following inequality:
\[ |\sqrt{x} - L(x)| < 1.02\cdot 10^{-4} \quad\text{if}\quad |x - 4| \le 0.08. \]
In other words, the values of $\sqrt{x}$ and its linearization differ by no more than $1.02\cdot 10^{-4}$ for all $3.92 \le x \le 4.08$. Naturally, a decrease (increase) in the upper bound for the error would lead to a decrease (increase) in the size of a neighborhood of $x = 4$ where the linear approximation is accurate.

21.2. Accuracy of the Linear Approximation. The previous example leads to a problem that is extremely important in applications: Given an upper bound $\varepsilon$ for the error of the linear approximation of a function $f(x)$ near $x_0$, find $\delta$ such that $|f(x) - L(x)| \le \varepsilon$ if $|x - x_0| \le \delta$, that is, in the neighborhood $x_0 - \delta \le x \le x_0 + \delta$; or, alternatively, given $\delta$, estimate the error $\varepsilon$ of the linear approximation. Theorem 23 is useful for answering these questions.
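The numbers in Example 38 are easy to check by machine. The following is a minimal sketch, with a hypothetical helper `linearization_sqrt`, that evaluates the linearization of $\sqrt{x}$ at $x_0 = 4$ and compares it with the library square root at the two ends of the interval $3.92 \le x \le 4.08$.

```python
import math


def linearization_sqrt(x, x0=4.0):
    """L(x) = f(x0) + f'(x0) * (x - x0) for f(x) = sqrt(x), with f'(x0) = 1 / (2 * sqrt(x0))."""
    return math.sqrt(x0) + (x - x0) / (2.0 * math.sqrt(x0))


for x in (3.92, 4.08):
    approx = linearization_sqrt(x)
    error = abs(math.sqrt(x) - approx)
    print(x, approx, error)
```

Both printed errors stay below the bound $1.02\cdot 10^{-4}$ quoted in the example, with the larger error at the left endpoint.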
Theorem 23. Suppose a function $f(x)$ is twice differentiable in $(a, b)$ and $|f''(x)| \le M$ for all $x \in (a, b)$ and some number $M$. Let $L(x)$ be the linearization of $f(x)$ at $x_0 \in (a, b)$. Then
\[ |f(x) - L(x)| \le \tfrac{1}{2}M(x - x_0)^2, \qquad x \in (a, b). \tag{3.23} \]

This theorem is a simpler version of the Taylor theorem, which is proved in advanced calculus courses. The following example illustrates the use of this theorem to assess the accuracy of the linear approximation.

Example 39. Consider the linearization of $\sin x$ at $x = 0$. Find an interval $|x| \le \delta$ in which the error of the linear approximation does not exceed $\varepsilon = 0.5\cdot 10^{-3}$.

Solution: (I) Since $f'(x) = (\sin x)' = \cos x$, $f'(0) = 1$, and $f(0) = 0$, the linearization is $L(x) = x$.
(II) In Theorem 23, let $a = -\delta$ and $b = \delta$. By Theorem 23,
\[ |\sin x - x| \le \tfrac{1}{2}Mx^2 \le \tfrac{1}{2}M\delta^2 = \varepsilon \quad\text{if}\quad |x| \le \delta. \]
Next, one has to find $M$. The simplest way to do this is to take the maximal value of $|f''(x)|$ in the interval $|x| \le \delta$. Note that there should be $\delta < \pi/2$ because $L(\pi/2) - \sin(\pi/2) = \pi/2 - 1$ exceeds the given error $\varepsilon$. So $\sin x$ is monotonic in $|x| \le \delta$, and hence $|(\sin x)''| = |\sin x| \le \sin\delta = M$ for all $|x| \le \delta$. With $M = \sin\delta$, the solution of the equation $\delta^2\sin\delta = 2\varepsilon = 10^{-3}$ determines $\delta$. An analytic solution of this equation is impossible. So a value of $\delta$ has to be found numerically (actually, $\delta \approx 0.10006$).
(III) Otherwise, one can choose a larger $M$, for example, $|\sin x| \le 1$ for any $x$. So $M = 1$ is acceptable. This simplifies Equation (3.23): $\delta^2 = 10^{-3}$ and hence $\delta \approx 0.0316$. This value of $\delta$ appears to be smaller than that in the case $M = \sin\delta$. It follows from Equation (3.23) that a larger value of $M$ leads to a smaller $\delta$. So this option should not be "abused." A good $M$ is not too large and yet is simple enough to solve Equation (3.23). This requires some skills to achieve.
(IV) A good compromise is to use the inequality $\sin\delta \le \delta$. So the choice $M = \delta$ also fulfills the conditions of Theorem 23.
Equation (3.23) becomes $\delta^3 = 10^{-3}$ and $\delta = 0.1$, which is to be compared with $\delta \approx 0.0316$ when $M = 1$ and $\delta \approx 0.10006$ when $M = \sin\delta$. □

The converse problem is simpler: Find an upper bound for the error of the linear approximation of $\sin x$ at $x = 0$ in the interval $|x| \le 0.2$. One has
\[ |\sin x - x| \le \varepsilon = \tfrac{1}{2}M\delta^2 = 0.5\cdot\sin(0.2)\cdot(0.2)^2 \approx 3.9734\cdot 10^{-3}. \]

21.3. Differential. With every real variable, one can associate another real variable, called the differential. For a real variable $x$, the differential $dx$ is defined as an increment of $x$, that is, $dx = \Delta x$, where $\Delta x$ is a real number. Here $dx$ is considered as an independent variable: It can be given the value of any real number independently of the value of $x$. If two real variables are related, the following rule postulates the relation between their differentials.

Definition 27. Let two variables $y$ and $x$ be related as $y = f(x)$, where $f$ is a differentiable function. The differential $dy = df(x)$ is defined by the linear transformation of $dx$:
\[ dy = df(x) = f'(x)\,dx. \tag{3.24} \]

Note that the variables $x$ and $dx$ on the right-hand side are independent variables. For any $x$, $dy$ depends linearly on $dx$. Thus, Equation (3.24) states that, if the variables $y$ and $x$ are related, then the differential $dy$ is no longer an independent variable and is determined by $x$ and $dx$.

21.4. Geometrical Significance of the Differential. An intuitive understanding of the differential stems from its geometrical interpretation. Consider an increment of the variable $y$ between $x + \Delta x$ and $x$: $\Delta y = f(x + \Delta x) - f(x)$. Put $dx = \Delta x$. Since the derivative $f'(x)$ determines the slope of the tangent line to the graph $y = f(x)$, the differential is the increment of $y$ along the tangent line through the point $(x, f(x))$ in the interval $[x, x + \Delta x]$. So the differential of a function $df(x) = dy = f'(x)\,\Delta x$ does not generally coincide with its increment $\Delta y$; specifically, $dy \ne \Delta y$ because the tangent line does not generally coincide with the graph. For example, put $f(x) = x^2$, $x = 1$, $\Delta x = 0.2$; then $\Delta y = (1 + 0.2)^2 - 1 = 0.44$, whereas $dy = f'(1)\,\Delta x = 2\cdot 0.2 = 0.4$.

Let $\Delta x$ tend to 0. The ratio
\[ \frac{\Delta y - dy}{\Delta x} = \frac{\Delta y}{\Delta x} - f'(x) \]
tends to 0 by the existence of $f'(x)$. This means that the difference $\Delta y - dy$ must go to 0 faster than $\Delta x$. An increment $\Delta x$ is said to be infinitesimally small if $(\Delta x)^n$, $n > 1$, can always be neglected. So one might think of differentials as infinitesimal variations of variables.
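The statement that $\Delta y - dy$ goes to 0 faster than $\Delta x$ can be observed on the example $f(x) = x^2$, $x = 1$ used above. The sketch below is an illustration only; the helper name `increment_and_differential` is hypothetical.

```python
def increment_and_differential(f, df, x, dx):
    """Return (Delta y, dy), where Delta y = f(x + dx) - f(x) and dy = f'(x) * dx."""
    return f(x + dx) - f(x), df(x) * dx


def square(x):
    return x ** 2


def square_prime(x):
    return 2 * x


# The last column is (Delta y - dy) / dx, which should shrink together with dx.
for dx in (0.2, 0.02, 0.002):
    delta_y, dy = increment_and_differential(square, square_prime, 1.0, dx)
    print(dx, delta_y, dy, (delta_y - dy) / dx)
```

For this particular function $\Delta y - dy = (\Delta x)^2$ exactly, so the ratio in the last column equals $\Delta x$ up to rounding.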
From this point of view, the definition (3.24) looks rather natural: Infinitesimal variations of two related variables must be related linearly as their higher powers can always be neglected. Thus, the concept of the differential becomes rather practical: to establish relations between variations of related quantities in situations when these variations may be viewed as infinitesimal.

21.5. Related Errors. One of the important practical applications of the differential lies in the error analysis of related quantities. Every physical quantity is known only with a certain degree of accuracy. Errors are inherent in the very process of taking measurements. As a point of fact, a value of a physical quantity given without its measurement error does not make much sense; neither should one draw any conclusion from data without a proper analysis of the errors.

Suppose there is a relation between two quantities $y$ and $x$, $y = f(x)$. Let $x$ be measured with an error: $x \pm \Delta x$, that is, $\Delta x$ is known. Using the measured values of $x$, the values of $y$ are computed by $y = f(x)$. What is the accuracy of the values of $y$? Apparently, $x$ and $\Delta x$ are independent variables as the error $\Delta x$ depends on the way in which the variable is measured (there are more and less accurate methods which would lead to smaller and higher values of $\Delta x$ independently of the actual value of $x$). Naturally, one might assume that the errors are small, that is, they are infinitesimal variations of measured quantities. Then the errors of the related quantities must be related as their differentials! That is, the result is $y \pm dy$, where $dy = f'(x)\,dx$ and $dx = \Delta x$. The differential $dy$ represents an absolute error of $y$. The quantity $dy/y$ is called a relative error.

Example 40. What are the absolute and relative errors of the volume of a cube if its side is $10 \pm 0.1$ cm?

Solution: A typical ruler has a grid spacing of 1 mm, which determines the length measurement error of 0.1 cm. The volume $V$ and side $x$ are related as $V = x^3$. So $dV = 3x^2\,dx$. Setting $dx = 0.1$ cm and $x = 10$ cm, $dV = 30$ cm$^3$ and $V = 1000 \pm 30$ cm$^3$. The relative error is $dV/V = 0.03$ or 3% (note that $dx/x = 0.01$, i.e., only 1%). □

The error analysis for several related quantities is given in a more advanced course; it is based on the differential of a function of several variables.
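The error propagation in Example 40 amounts to evaluating $dV = 3x^2\,dx$. A minimal sketch follows; the function name `cube_volume_error` is hypothetical and is not part of the text.

```python
def cube_volume_error(x, dx):
    """Volume of a cube with side x, with absolute and relative errors from dV = 3 * x**2 * dx."""
    volume = x ** 3
    abs_error = 3 * x ** 2 * dx
    rel_error = abs_error / volume
    return volume, abs_error, rel_error


volume, abs_error, rel_error = cube_volume_error(10.0, 0.1)
print(volume, abs_error, rel_error)
```

Because $dV/V = 3\,dx/x$, the relative error of the volume is always three times the relative error of the side in this linearized estimate, whatever the side length is.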
CHAPTER 4
Applications of Differentiation

22. Minimum and Maximum Values

Some of the most important applications of calculus are optimization problems. An example of an ancient optimization problem: A man can throw a stone at a speed of $v_0$. At what angle should the stone be thrown in order to get the maximal range? An example of a modern optimization problem: How can one optimize the information flow in the World Wide Web to avoid crashes of servers? Many of these problems can be reduced to finding the maximal and minimal values of a given function.

Definition 28 (Absolute Maximum and Minimum). A function $f$ has an absolute maximum at $c$ if $f(x) \le f(c)$ for all $x$ in the domain $D$ of $f$; the value $f(c)$ is called the maximum value of $f$. Similarly, a function $f$ has an absolute minimum at $c$ if $f(x) \ge f(c)$ for all $x$ in the domain $D$ of $f$. The value $f(c)$ is called the minimum value of $f$. The maximum and minimum values of $f$ are called the extreme values of $f$.

For instance, the function $f(x) = \cos x$ attains its maximum value 1 at $x = 2\pi n$, where $n = 0, \pm 1, \pm 2, \ldots$, and its minimum value $-1$ at $x = \pi + 2\pi n$. A function does not always have a maximum or minimum. For example, the function $f(x) = 1/x$ defined for all real $x \ne 0$ has neither maximum nor minimum because, for any real $M > 0$, one can always find $x$ such that $f(x) > M$ (take $0 < x < 1/M$). So no real number can be the maximum of $f(x)$. Similarly, no minimum exists, that is, for any real $M < 0$, $f(x) < M$ if $-1/|M| < x < 0$. The function $f(x) = x^2$ has no maximum on the real axis, but it does have an absolute minimum at $x = 0$ because $x^2 \ge 0$ for all $x$ and $f(0) = 0$, that is, $f(x) \ge f(0)$.

Definition 29 (Local Maximum and Minimum). A function $f$ has a local (or relative) maximum at $c$ if $f(x) \le f(c)$ for all $x$ in some open interval containing $c$. Similarly, a function $f$ has a local (or relative) minimum at $c$ if $f(x) \ge f(c)$ for all $x$ in some open interval containing $c$.
Example 41. Does the function $f(x) = x^3 - x = x(x^2 - 1)$ have an absolute maximal (minimal) value and relative maxima (minima) on the real axis?

Solution: (I) The function has neither an absolute maximum nor an absolute minimum because it grows unboundedly with increasing $x$ and it decreases unboundedly as $x$ attains larger negative values.
(II) The function vanishes at three points $x = 0, \pm 1$. It can have relative minima and maxima between its zeros because the values of $f$ are bounded from above and below: $|f(x)| \le |x|^3 + |x| \le 2$ for $|x| \le 1$, that is, $-2 \le f(x) \le 2$ if $-1 \le x \le 1$.
(III) Consider the open interval $x \in (0, 1)$. The function is strictly negative in it and bounded from below: $M < f(x) < 0$ for all $x \in (0, 1)$ (e.g., $M = -3$). By increasing $M$, one can eventually reach the situation when there is $0 < c < 1$ such that $M = f(c) \le f(x)$ for all $0 < x < 1$. This happens when the horizontal line $y = M$ touches the graph $y = f(x)$. Thus, $f$ must have a relative minimum in $(0, 1)$.
Remark. The actual value is $c = 1/\sqrt{3}$. How is it obtained? There is a technique to find $c$, which will be studied shortly.
(IV) Similarly, $f$ is strictly positive in $(-1, 0)$ and bounded from above, $0 < f(x) < M$ for some $M$. By lowering the horizontal line $y = M$ (or decreasing $M$) to the point when it touches the graph $y = f(x)$, one can find a point $c \in (-1, 0)$ such that $f(x) \le f(c)$ for all $x \in (-1, 0)$, that is, $f$ has a relative maximum in $(-1, 0)$.
Remark. The actual value is $c = -1/\sqrt{3}$ (see below). □

One of the lessons that can be learned from this example is that one can think of a relative minimum (maximum) as an absolute minimum (maximum) when $f$ is restricted to a sufficiently small subset in its domain. This observation is accurately stated by the following theorem.

Theorem 24 (The Extreme Value Theorem). If $f$ is a continuous function on a closed interval $[a, b]$, then $f$ attains its absolute maximum and minimum values in $[a, b]$, that is, there exist $c_1$ and $c_2$ in $[a, b]$ such that $f(c_1) \le f(x) \le f(c_2)$ for all $x$ in $[a, b]$.

The continuity hypothesis is essential. In fact, the continuity of $f(x) = x^3 - x$ was implicitly used in Example 41 to establish the existence of its relative maximum and minimum!
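The extreme values that Theorem 24 guarantees for $f(x) = x^3 - x$ on the closed intervals $[0, 1]$ and $[-1, 0]$ can be located approximately by brute force. The grid search below is only a rough numerical sketch (the helper `grid_argextreme` is hypothetical), not the analytic technique developed in the text; it is compared with the values $\pm 1/\sqrt{3}$ quoted in the remarks of Example 41.

```python
import math


def f(x):
    return x ** 3 - x


def grid_argextreme(a, b, pick, n=20001):
    """Return the grid point in [a, b] where f is smallest (pick=min) or largest (pick=max)."""
    xs = [a + (b - a) * i / (n - 1) for i in range(n)]
    return pick(xs, key=f)


c_min = grid_argextreme(0.0, 1.0, min)    # approximate location of the minimum on [0, 1]
c_max = grid_argextreme(-1.0, 0.0, max)   # approximate location of the maximum on [-1, 0]
print(c_min, c_max, 1 / math.sqrt(3))
```

A grid search only returns a point within one grid step of the true location, so it illustrates the existence statement but does not replace solving for $c$ exactly.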
The following example illustrates why the continuity hypothesis is essential. Consider the function $f(x) = 2x$ if $x \in [0, 1)$ and $f(x) = 1$ if $x \in [1, 2]$. So the function is defined on the closed interval $[0, 2]$ and bounded from above, $f(x) < M$ (e.g., $M > 2$). The values of $f$ approach 2 as $x$ approaches 1 from the left, but $f(1) = 1$! An attempt to establish the existence of a maximal value of $f$ by lowering $M$ fails! Indeed, the lowest upper bound is $M = 2$, but there is no $c$ such that $f(c) = 2$. For any positive $\varepsilon > 0$, $f(1 - \varepsilon) < f(x)$ for $x \in (1 - \varepsilon, 1)$ no matter how small $\varepsilon$ is. Thus, $f$ does not have an absolute maximum value because of its discontinuity at $x = 1$. The absolute minimum exists: $f(0) \le f(x)$.

Note that the function $f(x) = 2x$ when $x \in [0, 1]$ and $f(x) = 1$ when $x \in (1, 2]$ has an absolute maximum and minimum, that is, $f(0) \le f(x) \le f(1)$, despite its discontinuity at $x = 1$. So the continuity hypothesis is a sufficient condition, but not necessary.

The hypothesis of the closedness of the interval is also a sufficient condition, but not necessary. The continuous function $f(x) = x$ does not attain its absolute maximum or minimum value on any open interval $(a, b)$. But it does so if the interval becomes closed: $f(a) \le f(x) \le f(b)$ for any $x \in [a, b]$. On the other hand, the continuous function $f(x) = x^3 - x$ in the open interval $(-1, 1)$ attains its absolute maximum and minimum value (see Example 41).

The second observation resulting from Example 41 is that at the point where a continuous function attains its local minimum or maximum value there is a horizontal line that touches the graph of this function. In particular, if the function is differentiable, then this horizontal line is a tangent line with the vanishing slope, that is, the derivative of the function vanishes at points where the function attains its local maximum or minimum value.

Theorem 25 (Fermat's Theorem). If $f$ has a local maximum or minimum at $c$, and if $f'(c)$ exists, then $f'(c) = 0$.

Proof. By the existence of $f'(c)$,
\[ \lim_{h\to 0}\frac{f(c+h) - f(c)}{h} = f'(c). \]
Therefore, the right-hand and left-hand limits must coincide with $f'(c)$ (see Section 2):
\[ \lim_{h\to 0^-}\frac{f(c+h) - f(c)}{h} = f'(c) = \lim_{h\to 0^+}\frac{f(c+h) - f(c)}{h}. \tag{4.1} \]
Let $f$ have a local maximum (the case of a local minimum can be treated similarly). Then $f(c) \ge f(x)$, or $f(x) - f(c) \le 0$, in some open interval $a < x < b$ containing $c$.
So $[f(c+h) - f(c)]/h \le 0$ for any positive $h > 0$ such that $c < c + h < b$. Hence,
\[ \frac{f(c+h) - f(c)}{h} \le 0 \quad\Longrightarrow\quad \lim_{h\to 0^+}\frac{f(c+h) - f(c)}{h} \le 0. \tag{4.2} \]
Similarly, for any negative $h < 0$ such that $a < c + h < c$, one has $f(c+h) - f(c) \le 0$ and $[f(c+h) - f(c)]/h \ge 0$. By Theorem 3,
\[ 0 \le \frac{f(c+h) - f(c)}{h} \quad\Longrightarrow\quad 0 \le \lim_{h\to 0^-}\frac{f(c+h) - f(c)}{h}. \tag{4.3} \]
By inequalities (4.2) and (4.3), it follows from (4.1) that $0 \le f'(c) \le 0$, which is only possible if $f'(c) = 0$. □

This theorem provides a powerful tool to determine the actual positions of local maxima and minima. Let us go back to Example 41 ($f(x) = x^3 - x$). The slope $f'(x) = 3x^2 - 1$ vanishes at two points $x = \pm 1/\sqrt{3}$. According to the analysis carried out in Example 41, $f$ has a local maximum at $x = -1/\sqrt{3} \in (-1, 0)$ and a local minimum at $x = 1/\sqrt{3} \in (0, 1)$.

Does the equation $f'(x) = 0$ determine all local maxima and minima of $f$?
(I) A function may have a local minimum or maximum at a point where the derivative does not exist. A simple example is the function $f(x) = |x|$. It has an absolute minimum at $x = 0$, but $f'(x)$ does not exist at $x = 0$. So this minimum cannot be found from $f'(x) = 0$.
(II) If $f$ is differentiable everywhere, then, by solving $f'(x) = 0$, all local minima and maxima can be found. However, not all the solutions generally correspond to either a local maximum or a local minimum. In other words, the converse of Fermat's theorem is false. The function $f(x) = x^3$ has no minimum or maximum, but its derivative $f'(x) = 3x^2$ vanishes at $x = 0$.

Definition 30. A number $c$ in the domain of a function $f$ is said to be a critical point of $f$ if either $f'(c) = 0$ or $f'(c)$ does not exist.

(III) If all critical points of a function are found, then their type (local maximum, local minimum, or none of the above) can be analyzed by comparing values $f(c \pm h)$ with $f(c)$, where $c$ is a critical point (cf. Definition 29). If $f''(c)$ exists, then the second derivative test can be used, which is discussed later.
(IV) A function defined on a closed interval $[a, b]$ can have its absolute maximum or minimum at the endpoints. When finding the absolute maximum and minimum values, the values of $f$ at the critical points must be compared with $f(a)$ and $f(b)$. The largest (smallest) of them is the absolute maximum (minimum) value.

Example 42. At what angle should one throw a stone to reach the maximal range at a given speed $v_0$?

Solution: (I) The range as a function of the angle $\theta$ has to be found first. If a stone is thrown at a speed $v_0$ m/s and an angle $\theta$ with the horizontal line, then its trajectory is a parabola:
\[ y = x\tan\theta - \frac{g}{2v_0^2\cos^2\theta}\,x^2, \tag{4.4} \]
where $y$ is the stone height (vertical position), $x$ is the horizontal position (all the positions are in meters), and $g = 9.8$ m/s$^2$ is a constant universal for all objects near the surface of the Earth (the free-fall acceleration). This is a consequence of Newton's second law. The stone lands when its height $y$ vanishes. This happens at $x = 0$ (naturally, this is where the stone was thrown) and at $x = L(\theta)$, where
\[ L(\theta) = \frac{2v_0^2}{g}\tan\theta\cos^2\theta = \frac{2v_0^2}{g}\sin\theta\cos\theta = \frac{v_0^2}{g}\sin(2\theta). \]
(II) The range $L(\theta)$ is a differentiable function of $\theta$, so the values of $\theta$ at which $L$ attains its extreme values may be found from the equation
\[ L'(\theta) = 0 \quad\Longrightarrow\quad \frac{v_0^2}{g}\,2\cos(2\theta) = 0 \quad\Longrightarrow\quad \cos(2\theta) = 0. \]
This equation has countably many solutions $2\theta = \pi/2 + \pi n$, where $n$ is any integer. But in the interval of the physical values of $\theta \in [0, \pi/2]$, it has only one solution $\theta = \pi/4$. Since $\sin(2\pi/4) = 1$ (the absolute maximum of the sine), $L$ attains its maximal value at $\theta = \pi/4$, $L_{\max} = v_0^2/g$, that is, when a stone is thrown at $45^\circ$, the range is maximal. □

Remark. The conclusion in the preceding example is independent of the stone's mass and its initial speed $v_0$.
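The independence of the optimal angle from $v_0$ in the idealized model of Example 42 can be checked numerically. The sketch below is an illustration under that model only (no air resistance, launch from ground level); the function names and the sample speeds are hypothetical.

```python
import math

G = 9.8  # free-fall acceleration in m/s^2, as in the text


def stone_range(theta, v0):
    """Range L(theta) = v0**2 * sin(2 * theta) / G for the parabolic trajectory (4.4)."""
    return v0 ** 2 * math.sin(2.0 * theta) / G


def best_angle(v0, n=9001):
    """Grid search over [0, pi/2] for the launch angle that maximizes the range."""
    thetas = [0.5 * math.pi * i / (n - 1) for i in range(n)]
    return max(thetas, key=lambda theta: stone_range(theta, v0))


# Sample launch speeds in m/s, chosen only for illustration.
for v0 in (5.0, 10.0, 20.0):
    theta = best_angle(v0)
    print(v0, math.degrees(theta), stone_range(theta, v0))
```

The maximal range itself does depend on the speed, through $L_{\max} = v_0^2/g$; only the angle at which it is reached stays the same.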
bit from the parabola (due to friction with the air). So the optimal angle would deviate a bit from $\pi/4$. The deviation would also depend on the mass and the initial speed. The range optimization problem would be more involved and would require the theory of differential equations. It should also be noted that the angle at which the maximal range is attained depends on the initial height at which the stone is thrown. So, for example, the angle would be different from $45^\circ$ when the stone is thrown from a cliff.

23. The Mean Value Theorem

Theorem 26 (Rolle's Theorem). Let $f$ be a function that satisfies the following three hypotheses:
(I) $f$ is continuous on the closed interval $[a, b]$.
(II) $f$ is differentiable on the open interval $(a, b)$.
(III) $f(a) = f(b)$.
Then there is a number $c$ in $(a, b)$ such that $f'(c) = 0$.

This theorem provides a useful method to prove the existence of a local maximum or minimum of a function $f$ when analytic solutions of the equation $f'(x) = 0$ are hard to find. In fact, it has already been used in Example 41: The function $f(x) = x^3 - x$ on the intervals $[-1, 1]$, $[-1, 0]$, $[0, 1]$ satisfies the hypotheses of Rolle's theorem because $f(\pm 1) = f(0) = 0$.

Proof of Theorem 26. The proof follows closely the arguments of Example 41.
(I) If $f(x) = f(a) = k$ is a constant function, then $f'(x) = 0$ everywhere.
(II) Let $f(x) > f(a)$ for some $x \in (a, b)$ (cf. Example 41 for $x \in [-1, 0]$). Since $f$ is continuous, the extreme value theorem applies, and therefore $f$ has a maximum in $[a, b]$. Since $f(a) = f(b)$, the maximal value must be attained at $c \in (a, b)$. By Fermat's theorem, $f'(c) = 0$ because $f$ is differentiable in $(a, b)$.
(III) If $f(x) < f(a)$ for some $x \in (a, b)$ (cf. Example 41 for $x \in [-1, 1]$ or $x \in [0, 1]$), then, by the extreme value theorem, $f$ has a minimum at $c \in (a, b)$, and, by Fermat's theorem, $f'(c) = 0$. □

Rolle's theorem is also useful to analyze the root pattern of a function.

Example 43. How many real roots does the equation $x^5 + x^3 + x - 1 = 0$ have?
Solution: (I) Let $f(x) = x^5 + x^3 + x - 1$. Evidently, $f(-1) = -4 < 0$ and $f(1) = 2 > 0$. By continuity, $f$ has to take all intermediate values between $-4$ and $2$ (the intermediate value theorem). So $f$ has at least one root in $(-1, 1)$.
(II) Suppose it has two roots $a$ and $b$, that is, $f(a) = f(b) = 0$. Then, by Rolle's theorem, $f'(x)$ has to vanish somewhere in $(a, b)$. But this is not possible because $f'(x) = 5x^4 + 3x^2 + 1 > 0$ for any $x$. Thus, $f$ has exactly one real root. □

Theorem 27 (The Mean Value Theorem). Let $f$ be a function that satisfies the following hypotheses:
(I) $f$ is continuous on the closed interval $[a, b]$.
(II) $f$ is differentiable on the open interval $(a, b)$.
Then there is a number $c \in (a, b)$ such that

(4.5)    $f'(c) = \frac{f(b) - f(a)}{b - a}$    or    $f(b) - f(a) = f'(c)(b - a).$

The geometrical interpretation of the theorem is simple. Consider the line through the points $(a, f(a))$ and $(b, f(b))$. Its slope is $(f(b) - f(a))/(b - a)$. The theorem asserts the existence of a point where the graph $y = f(x)$ has a tangent line with the same slope (cf. Equation (4.5)), as $f'(c)$ is the slope of the tangent line at $x = c$. Let us turn to a formal proof.

Proof of Theorem 27. (I) Consider the line through the points $(a, f(a))$ and $(b, f(b))$. Its equation is

(4.6)    $y = L(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a), \quad L(a) = f(a), \quad L(b) = f(b).$

Next, consider the function

(4.7)    $h(x) = f(x) - L(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a).$

Its values determine the deviation of the graph $y = f(x)$ from the secant line $y = L(x)$ on the closed interval $[a, b]$.
(II) The function $h(x)$ satisfies the three hypotheses of Rolle's theorem. First, it is continuous on $[a, b]$ as the sum of two
continuous functions $f(x)$ and $-L(x)$ (a linear function is continuous). Second, it is differentiable on $(a, b)$ as the sum of two differentiable functions:

(4.8)    $h'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}, \quad x \in (a, b).$

Finally, by (4.6) and (4.7), $h(a) = f(a) - L(a) = 0$ and $h(b) = f(b) - L(b) = 0$, that is, $h(a) = h(b)$.
(III) By Rolle's theorem, there is a number $c \in (a, b)$ such that

$h'(c) = 0 \quad\to\quad f'(c) = \frac{f(b) - f(a)}{b - a},$

where Equation (4.8) has been used. □

Example 44. A speeding car was pulled over on an interstate road and a state trooper gave a warning to the driver. Forty-five minutes and 65 miles later, the car stopped at a rest area. Another state trooper approached the driver and issued a speeding ticket, claiming that the driver exceeded 86 mi/hr. Was the trooper's claim correct?

Solution: Let $s(t)$ be the distance traveled by the car after it was pulled over the first time. The function $s(t)$ is defined between $t = 0$ and $t = 45$ min $= 0.75$ hr so that $s(0) = 0$ and $s(0.75) = 65$ mi. It is differentiable as $s'(t)$ is the car speed! By the mean value theorem, there is a time moment $t = c \in (0, 0.75)$ when

$s'(c) = v(c) = \frac{s(0.75) - s(0)}{0.75 - 0} = \frac{65}{0.75} \approx 86.7$ mi/hr.

The speeding ticket is justified. □

For any two moments of time $a$ and $b$, the ratio $(s(b) - s(a))/(b - a)$ is the average speed on the time interval $[a, b]$. The rate of change $s'(t) = v(t)$ is the speed of the car at any moment of time. The mean value theorem simply states that a moving object always attains its average speed at least at one moment of time between $a$ and $b$. So, if at time moment $b$ the object appears to be traveling slower than its average speed, prior to that it must have been traveling faster than its average speed.

Example 45. Suppose the derivative $f'$ exists and is bounded on $(a, b)$, that is, $m \le f'(x) \le M$. If $f(a)$ is given, how small and how large can $f(b)$ possibly be?

Solution: By the mean value theorem, there is a $c \in (a, b)$ such that $f(b) = f(a) + f'(c)(b - a)$. Since $m \le f'(c) \le M$,

$f(a) + m(b - a) \le f(b) \le f(a) + M(b - a).$
This inequality is easy to understand with the help of a mechanical analogy: How far can a car travel in time $b - a$ if its speed is not lower than $m$, but cannot exceed $M$? □

The derivative of a constant function vanishes. How about the converse? The following theorem answers this question.

Theorem 28. If $f'(x) = 0$ for all $x$ in an interval $(a, b)$, then $f$ is constant on $(a, b)$.

Proof. Take any two numbers $x_1$ and $x_2$ between $a$ and $b$. By the mean value theorem, there is a number $c$ between $x_1$ and $x_2$ such that $f(x_1) - f(x_2) = f'(c)(x_1 - x_2)$. By hypothesis, $f'(c) = 0$ for any $c$. Thus, $f(x_1) - f(x_2) = 0$ or $f(x_1) = f(x_2)$ for any $x_1$ and $x_2$ in $(a, b)$, that is, $f$ is constant. □

The hypothesis that $f'(x) = 0$ in a single interval is crucial. For example, the sign function, $f(x) = 1$ if $x > 0$ and $f(x) = -1$ if $x < 0$, has zero derivative at any point of its domain, but it is not constant. The key point to note is that the domain is not a single interval, but a union of two disjoint intervals $(-\infty, 0)$ and $(0, \infty)$. So the mean value theorem is not applicable to any interval containing $x = 0$. This example is easily extended to the case when the domain is any collection of disjoint intervals and $f$ takes different constant values on different intervals.

Corollary 5. If $f'(x) = g'(x)$ for all $x$ in an interval $(a, b)$, then $f - g$ is constant, that is, $f(x) = g(x) + k$, where $k$ is a constant.

Proof. Let $h(x) = f(x) - g(x)$. Since $h' = f' - g' = 0$ in $(a, b)$, $h$ is constant, and the conclusion follows. □

24. The First and Second Derivative Tests

Suppose the critical points of a function $f$ are known. If $f$ is differentiable, then all critical points can be found by solving the equation $f'(x) = 0$. How can one figure out the nature of a critical point, that is, whether it is a local maximum, local minimum, or none of the above? It turns out that this question can be answered by studying the derivatives $f'$ and $f''$. In addition, many qualitative features of the graph $y = f(x)$ can be deduced from properties of the derivatives of $f$.
24.1. Properties of the First Derivative. A function is increasing on an interval if $f(x_1) < f(x_2)$, and decreasing if $f(x_1) > f(x_2)$, whenever $x_1 < x_2$ in that interval.

Theorem 29 (Increasing-Decreasing Test).
(I) If $f' > 0$ on an interval, then $f$ is increasing on that interval.
(II) If $f' < 0$ on an interval, then $f$ is decreasing on that interval.

Proof. Take any two numbers $x_1$ and $x_2$ in the interval so that $x_1 < x_2$. Since $f$ is differentiable, the mean value theorem states that there is a number $c$ between $x_1$ and $x_2$ such that

(4.9)    $f(x_2) - f(x_1) = f'(c)(x_2 - x_1).$

If $f' > 0$, then it follows from (4.9) that $f(x_2) - f(x_1) > 0$ because, by assumption, $x_2 > x_1$, that is, the function is increasing. Similarly, for $f' < 0$, $f(x_2) - f(x_1) < 0$, and the function is decreasing. □

Suppose $f'$ is continuous on the interval $[a, b]$ with $f'(a) = m$ and $f'(b) = M$. Suppose $m < 0$ and $M > 0$, or $m > 0$ and $M < 0$, that is, the derivative changes its sign on the interval $[a, b]$. By the intermediate value theorem, $f'$ must take all intermediate values between $m$ and $M$, and hence $f'$ must vanish between $a$ and $b$. This means that $f$ has a critical point $a < c < b$. More to the point, if the derivative $f'$ changes from positive to negative at $c$, then, according to the increasing-decreasing test, the function $f$ changes from increasing to decreasing at $c$, that is, $f(c - h) < f(c)$ and $f(c) > f(c + h)$ for some small positive $h$. We can then conclude that $f$ attains its local maximum at $c$. Similarly, if the derivative $f'$ changes from negative to positive at $c$, then $f$ changes from decreasing to increasing at $c$, that is, $f(c - h) > f(c)$ and $f(c) < f(c + h)$, and hence $f$ attains its local minimum at $c$. Naturally, there is a possibility that $f'(c) = 0$ but $f'(x)$ does not change its sign at $c$. In such a situation, the increasing-decreasing test yields $f(c - h) < f(c) < f(c + h)$ or $f(c - h) > f(c) > f(c + h)$, that is, in either case the function $f$ has neither a local minimum nor a local maximum. The findings are summarized in the following theorem.

Theorem 30 (The First Derivative Test). Suppose that $c$ is a critical point of a continuous function $f$.
(I) If $f'$ changes from positive to negative at $c$, then $f$ has a local maximum at $c$.
(II) If $f'$ changes from negative to positive at $c$, then $f$ has a local minimum at $c$.
(III) If $f'$ does not change its sign at $c$, then $f$ has neither a local maximum nor a local minimum at $c$.
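The first derivative test can be imitated numerically by sampling the sign of $f'$ slightly to the left and to the right of a critical point. The following is only a minimal sketch: the helper name `classify_critical_point` and the step `h` are illustrative choices, and checking two sample points is a heuristic that assumes $f'$ keeps its sign on each side within distance `h`.

```python
def classify_critical_point(df, c, h=1e-4):
    """Heuristic first derivative test: compare the signs of df at c - h and c + h."""
    left, right = df(c - h), df(c + h)
    if left > 0 and right < 0:
        return "local maximum"   # f' changes from positive to negative
    if left < 0 and right > 0:
        return "local minimum"   # f' changes from negative to positive
    return "neither"             # no sign change detected


def df(x):
    # derivative of f(x) = x**3 - x
    return 3 * x**2 - 1


c1, c2 = -(1 / 3) ** 0.5, (1 / 3) ** 0.5
results = [
    classify_critical_point(df, c1),
    classify_critical_point(df, c2),
    # f(x) = x**3 has f'(x) = 3x**2, which does not change sign at 0
    classify_critical_point(lambda x: 3 * x**2, 0.0),
]
print(results)
```

For $f(x) = x^3 - x$ the sketch reports a maximum at $-1/\sqrt{3}$ and a minimum at $1/\sqrt{3}$, while for $f(x) = x^3$ it reports no extremum at $0$.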
It is important to note that the very existence of $f'$ at $c$ is not required in the first derivative test. Recall the definition of a critical point ($f'(c) = 0$ or $f'(c)$ does not exist). In fact, in the preceding proof of the first derivative test, the condition $f'(c) = 0$ can easily be dropped because all that is really needed to apply the increasing-decreasing test is the sign of the derivative $f'(x)$ for $x < c$ and $x > c$. For example, take $f(x) = |x|$. Then $f'(x) = -1$ for $x < 0$ (the function is decreasing) and $f'(x) = 1$ for $x > 0$ (the function is increasing). Hence, $f(x)$ has a minimum at $x = 0$, even though $f'$ does not exist at $x = 0$.

The continuity hypothesis is also crucial. Consider the function $f(x) = 1/x^2$ for $x \ne 0$ and $f(0) = 0$. Then $f'(x) = -2/x^3$ for $x \ne 0$ and $f'(0)$ does not exist. So $x = 0$ is a critical number. The function is increasing for $x < 0$ because $f' > 0$, and it is decreasing for $x > 0$ because $f' < 0$. However, $f$ has no maximum at $x = 0$ because $f$ is discontinuous at $x = 0$. In fact, it attains its absolute minimum at $x = 0$!

There are plenty of mechanical analogies of the first derivative test. Let $H(t)$ be the height (relative to the ground) of a stone thrown upward as a function of time $t$. At the beginning, the stone moves upward so $H' > 0$ (the height is increasing). When the stone comes back to the ground, it moves downward so $H' < 0$ (the height is decreasing). Hence, at some moment of time, the stone has to reach the maximal height. Analyze the motion of a pendulum (or a seesaw) from this point of view! The height would have two maxima and one minimum.

Example 46 (Example 41 Revisited). Find all local maxima and minima of $f(x) = x^3 - x$ and the intervals on which the function is increasing or decreasing.

Solution: (I) Since $f$ is differentiable (it is a polynomial), all its critical points satisfy the equation

$f'(x) = 3x^2 - 1 = 3\left(x - 1/\sqrt{3}\right)\left(x + 1/\sqrt{3}\right) = 0.$

Hence, the critical points are $c_1 = -1/\sqrt{3}$ and $c_2 = 1/\sqrt{3}$.
(II) For $x < c_1$, the product $(x - c_1)(x - c_2)$ is positive (as the product of two negative numbers), and hence $f' > 0$ ($f$ is increasing on $(-\infty, c_1)$). For $c_1 < x < c_2$, the product $(x - c_1)(x - c_2)$ is negative (as the product of a negative and positive number), and hence $f' < 0$ ($f$ is decreasing on $(c_1, c_2)$). For $x > c_2$, the product $(x - c_1)(x - c_2)$ is positive (as the product
of two positive numbers), and hence $f' > 0$ ($f$ is increasing on $(c_2, \infty)$).
(III) The derivative changes from positive to negative at $c_1$. Therefore, $f$ has a local maximum at $c_1$. The derivative changes from negative to positive at $c_2$. Therefore, $f$ has a local minimum at $c_2$. □

24.2. Properties of the Second Derivative: Inflection Points.

Definition 31 (Concavity). The graph of a function $f$ is called concave upward on an interval $I$ if it lies above all of its tangent lines on $I$. The graph is called concave downward on $I$ if it lies below all of its tangent lines on $I$.

Note that the notion of concavity implies that $f$ is differentiable (otherwise, the tangent lines do not exist). If $f$ is twice differentiable, then the concavity is determined by the sign of the second derivative $f''$.

Suppose that the graph of $f$ is concave upward on $I$. Consider the tangent lines at two points $c$ and $c + h$ in $I$:

$L_1(x) = f(c) + f'(c)(x - c), \qquad L_2(x) = f(c + h) + f'(c + h)(x - c - h).$

The graph of $f$ lies above the lines $L_1$ and $L_2$, that is, $f(x) - L_1(x) > 0$ and $f(x) - L_2(x) > 0$ for all $x$ in $I$ other than the point of tangency. Putting $x = c$ in the last inequality and $x = c + h$ in the former one, we obtain

$f(c) - L_2(c) = f(c) - f(c + h) + f'(c + h)h > 0,$
$f(c + h) - L_1(c + h) = f(c + h) - f(c) - f'(c)h > 0.$

Adding these two inequalities gives a positive number as the sum of two positive numbers:

(4.10)    $h[f'(c + h) - f'(c)] > 0 \quad\to\quad \frac{f'(c + h) - f'(c)}{h} > 0,$

where the first inequality has been divided by a positive number $h^2$. The latter inequality is true for any $h$. Therefore, by taking the limit $h \to 0$, we can conclude that $f''(c) \ge 0$ if the graph is concave upward (a strict inequality need not survive the limit). Inequality (4.10) shows that $f'(c + h) > f'(c)$ for $h > 0$ and $f'(c) > f'(c + h)$ for $h < 0$. In other words, the derivative $f'$, or the slope of the tangent line of the graph of $f$, increases for the upward concavity, and hence $(f')' = f''$ cannot be negative. Similarly, the downward concavity implies that $f''$ cannot be positive. It turns out that the converse is also true.
Theorem 31 (The Concavity Test). Let $f$ be twice differentiable on an interval $I$.
(I) If $f''(x) > 0$ for all $x$ in $I$, then the graph of $f$ is concave upward on $I$.
(II) If $f''(x) < 0$ for all $x$ in $I$, then the graph of $f$ is concave downward on $I$.

How does the graph of $f$ look near a point $c$ where $f''(c) = 0$? There are four possibilities. First, $f''(c \pm h) > 0$ for some small $h > 0$. This means that the graph is concave upward to the left and right of $c$. As an example, consider $f(x) = x^4$. Second, $f''(c \pm h) < 0$. This implies that the graph is concave downward to the left and right of $c$. As an example, take $f(x) = -x^4$. Third, $f''(c - h) < 0$ and $f''(c + h) > 0$, that is, the concavity changes from downward to upward (e.g., $f(x) = x^3$). Fourth, $f''(c - h) > 0$ and $f''(c + h) < 0$, that is, the concavity changes from upward to downward (e.g., $f(x) = -x^3$).

Definition 32 (Inflection Point). A point $P$ on the graph $y = f(x)$ is called an inflection point if $f$ is continuous there and the graph changes from concave upward to concave downward or from concave downward to concave upward.

Let $c$ be a critical point of $f$. What can $f''(c)$ tell us about the nature of the critical number (local minimum or maximum)? Suppose $f''$ is continuous near $c$. There are three possibilities. First, $f''(c) > 0$. This means that $f''(x) > 0$ for all $x$ in some neighborhood of $c$ (by the continuity of $f''$), that is, $f$ is concave upward near $c$. Hence, its graph lies above the tangent line at $c$, which is a horizontal line because $f'(c) = 0$. So $f$ must have a local minimum. Similarly, the condition $f''(c) < 0$ implies that the concavity is downward near $c$ and $f$ has a local maximum. If $f''(c) = 0$, then the concavity may or may not change at $c$ as discussed earlier. The function may have a local maximum, a local minimum, or an inflection point.

Theorem 32 (The Second Derivative Test). Suppose $f''$ is continuous near $c$.
(I) If $f'(c) = 0$ and $f''(c) > 0$, then $f$ has a local minimum at $c$.
(II) If $f'(c) = 0$ and $f''(c) < 0$, then $f$ has a local maximum at $c$.
(III) If $f'(c) = 0$ and $f''(c) = 0$, then $f$ may have a local maximum, a local minimum, or an inflection point, that is, no conclusion about the nature of the critical point can be reached.

In Example 46, the function $f(x) = x^3 - x$ is shown to have two critical points: $x = \pm 1/\sqrt{3}$. Since $f''(x) = 6x$, $f''(-1/\sqrt{3}) = -2\sqrt{3} <$
$0$ (a local maximum) and $f''(1/\sqrt{3}) = 2\sqrt{3} > 0$ (a local minimum). The function also has an inflection point at $x = 0$: $f''(x) = 6x < 0$ if $x < 0$ and $f''(x) = 6x > 0$ if $x > 0$. Note that an inflection point may not be related to a critical point! In the previous example, $f'(0) = -1$. In other words, the tangent line at an inflection point can have any slope.

25. Taylor Polynomials and the Local Behavior of a Function

The tangent line approximation $L(x)$ is the best linear approximation of $f(x)$ near $x = a$ because $L(x)$ and $f(x)$ have the same rate of change at $a$. The question arises whether there is a systematic method to improve the accuracy of the tangent line approximation to capture more essential features of the behavior of $f(x)$ near $a$ (i.e., the local behavior of $f$).

25.1. Taylor Polynomials. In the previous section, it was shown that the second derivative at $a$ provides important information about the behavior of $f(x)$ near $a$, namely the concavity. The tangent line $L(x)$ has no concavity as $L''(x) = 0$. The function $L(x)$ is a polynomial of the first degree. Consider the second-degree polynomial

$T_2(x) = f(a) + f'(a)(x - a) + c_2(x - a)^2 = L(x) + c_2(x - a)^2,$

where $c_2$ is an arbitrary coefficient. This polynomial has the same features as $L(x)$, that is, $T_2(a) = L(a) = f(a)$ and $T_2'(a) = L'(a) = f'(a)$ because $T_2'(x) = f'(a) + 2c_2(x - a)$. So it might provide a better approximation of $f(x)$ than $L(x)$ near $a$ if the coefficient $c_2$ is chosen so that $T_2(x)$ has the same concavity as $f(x)$ near $a$. By the concavity test, it is then reasonable to assume that $T_2''(a) = f''(a)$, which yields $2c_2 = f''(a)$ or $c_2 = f''(a)/2$.

The idea can be extended to a polynomial of degree $n$:

$T_n(x) = c_0 + c_1(x - a) + c_2(x - a)^2 + \cdots + c_n(x - a)^n,$

where the coefficients are fixed by the conditions

$T_n(a) = f(a), \quad T_n'(a) = f'(a), \quad T_n''(a) = f''(a), \quad \ldots, \quad T_n^{(n)}(a) = f^{(n)}(a).$

The resulting polynomial is called the $n$th-degree Taylor polynomial:

$T_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n.$
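The construction of $T_n$ from the derivative values $f(a), f'(a), \ldots, f^{(n)}(a)$ can be sketched in a few lines of Python. The helper name `taylor_poly` and the choice of $\cos x$ at $a = 0$ as a test function are illustrative assumptions, not part of the text; the derivative values are supplied by hand rather than computed.

```python
from math import factorial


def taylor_poly(derivs_at_a, a):
    """Return T_n as a callable, given the list [f(a), f'(a), ..., f^(n)(a)]."""
    # c_k = f^(k)(a) / k!
    coeffs = [d / factorial(k) for k, d in enumerate(derivs_at_a)]

    def T(x):
        # Horner evaluation in powers of (x - a)
        total = 0.0
        for c in reversed(coeffs):
            total = total * (x - a) + c
        return total

    return T


# cos x at a = 0: the derivatives cycle through 1, 0, -1, 0, 1, ...
T2 = taylor_poly([1.0, 0.0, -1.0], 0.0)
T4 = taylor_poly([1.0, 0.0, -1.0, 0.0, 1.0], 0.0)
print(T2(0.5), T4(0.5))
```

Here `T2` is the polynomial $1 - x^2/2$ and `T4` adds the term $x^4/24$, so `T4(0.5)` should lie closer to $\cos(0.5)$ than `T2(0.5)` does.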
25.2. Accuracy of Taylor Polynomials. The accuracy of the tangent line approximation is assessed in Theorem 23. Let us compare it with the accuracy of higher-degree Taylor polynomials. Consider Taylor polynomials of the exponential function $e^x$ near $x = 0$. Since $(e^x)' = e^x$ and $e^0 = 1$, the Taylor polynomials are

$f(x) = e^x: \quad T_n(x) = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \cdots + \frac{1}{n!}x^n.$

Let us take a few values of $x$ near $x = 0$ and compare the values of the Taylor polynomials with the value of the function:

x = 1:       f = 2.718    T1 = 2.000    T2 = 2.500    T3 = 2.667
x = -0.5:    f = 0.607    T1 = 0.500    T2 = 0.625    T3 = 0.604
x = 0.25:    f = 1.284    T1 = 1.250    T2 = 1.281    T3 = 1.284

Two observations can be made from this table. First, the accuracy increases with the degree of the Taylor polynomial (reading the rows of the table). Second, lower-degree Taylor polynomials become more accurate as the argument gets closer to the point at which the Taylor polynomials are constructed (reading the columns of the table). For example, the approximation $e^x \approx T_3(x)$ is accurate up to four significant digits if $|x| \le 1/4$. So the accuracy of the approximation $e^x \approx T_2(x)$ is determined by the difference $T_2 - T_3 = -x^3/6$, that is, by the next monomial to be added to $T_2$ to get the next Taylor polynomial. This observation is a characteristic feature of Taylor polynomials:

Theorem 33. Let $f$ be continuously differentiable $n + 1$ times on an open interval $I$ containing $a$. Let $f^{(n+1)}$ be bounded on $I$, that is, $|f^{(n+1)}(x)| \le M$. Then

(4.11)    $|f(x) - T_n(x)| \le \frac{M}{(n+1)!}|x - a|^{n+1},$

where $T_n$ is the Taylor polynomial at $a$.

Theorem 23 is a particular case of this theorem for $n = 1$. Inequality (4.11) is a consequence of the Taylor theorem whose proof goes beyond the scope of this course. For example, what is the accuracy of the Taylor polynomial $T_5(x)$ near $a = 0$ for the exponential $e^x$ in the interval $[-1, 1]$? To get the upper bound on errors, one should take the maximal value of the right-hand side of (4.11) for $n = 5$ in the interval, that is, $|(e^x)^{(6)}| = e^x \le M = e$ and $|x| \le 1$, so errors cannot exceed $e/6! \approx 0.0038$ (an absolute error below 0.004).
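The comparison table and the error bound for $T_5$ are easy to recompute. The sketch below uses only the Python standard library; the function name `T`, the sampling grid, and the variable names are illustrative. Printed values may differ from the table in the last digit because of rounding.

```python
from math import e, exp, factorial


def T(n, x):
    """Taylor polynomial of degree n for exp(x) at a = 0."""
    return sum(x**k / factorial(k) for k in range(n + 1))


# Rows of the comparison table: x, f(x), T1, T2, T3
for x in (1.0, -0.5, 0.25):
    values = [exp(x)] + [T(n, x) for n in (1, 2, 3)]
    print(f"x = {x:5.2f}: " + "  ".join(f"{v:.3f}" for v in values))

# Bound (4.11) with n = 5, M = e, |x| <= 1, versus the error seen on a grid
bound = e / factorial(6)
grid = [i / 100 for i in range(-100, 101)]
worst = max(abs(exp(x) - T(5, x)) for x in grid)
print(f"bound = {bound:.4f}, largest sampled error = {worst:.4f}")
```

The largest sampled error stays below the bound, as (4.11) guarantees; the bound itself is not sharp because $M = e$ overestimates $e^x$ on most of the interval.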
25.3. Taylor Polynomials near Critical Points. Taylor polynomials provide a powerful technique to investigate the local behavior of a differentiable function. Provided $f$ is differentiable enough times, Taylor polynomials can be constructed near $a$. Let $a$ be a critical point of $f$ at which $f'(a) = 0$. The second derivative test is easy to understand by looking at $T_2(x) = f(a) + f''(a)(x - a)^2/2$. The linear term vanishes because $f'(a) = 0$. If $f''(a) > 0$, then $f$ looks like an upward parabola near $a$ (a local minimum). If $f''(a) < 0$, then $f$ looks like a downward parabola near $a$ (a local maximum). For example, $\cos x$ has a local maximum at $a = 0$, and it behaves near $a = 0$ as $\cos x \approx T_2(x) = 1 - x^2/2$. The second derivative test is inconclusive if $f''(a) = 0$. In this case, $f(x)$ behaves near $a$ as $T_3(x) = f(a) + f'''(a)(x - a)^3/6$. So, if $f'''(a) \ne 0$, $f$ has an inflection point at $a$. Similarly, if $f'''(a) = 0$, one should look at $T_4(x) = f(a) + f^{(4)}(a)(x - a)^4/24$. A function has a local minimum (maximum) at $a$ if $f^{(4)}(a) > 0$ ($f^{(4)}(a) < 0$). In other words, the local behavior of $f$ near its critical point is determined by a Taylor polynomial that has the first nonvanishing correction to $f(a)$.

Example 47. Investigate $f(x) = x - \tan x$ near $x = 0$.

Solution: Find a Taylor polynomial for $\tan x$ with two nontrivial terms. In this case, it is $T_3$: $\tan x \approx T_3(x) = x + x^3/3$. Therefore, $f(x) \approx x - T_3(x) = -x^3/3$. So there is an inflection point at $x = 0$. □

25.4. Asymptotes. How can one analyze the behavior of a function near $a$ if it is not differentiable at $a$, or not even defined at $a$, or how does it behave in the asymptotic regions $x \to \pm\infty$?

Definition 33 (Vertical Asymptotes). The line $x = a$ is a vertical asymptote of the graph $y = f(x)$ if at least one of the limits $\lim_{x \to a^\pm} f(x)$ is infinite ($\infty$ or $-\infty$).

In other words, the function $f(x)$ increases (decreases) unboundedly as $x$ approaches $a$ from either the left or the right. For example, the function

(4.12)    $f(x) = \frac{x(x^2 + 3)}{x^2 - 1} = \frac{x(x^2 + 3)}{(x - 1)(x + 1)}$

has two vertical asymptotes because the denominator vanishes at $x = 1$ and $x = -1$. When $x$ approaches $-1$ from the left, $f(x)$ tends to $-\infty$, while it tends to $\infty$ if $-1$ is approached from the right. Similarly, $f(x) \to -\infty$ as $x \to 1^-$ and $f(x) \to \infty$ as $x \to 1^+$.

Suppose $f$ has a vertical asymptote at $a$. How does it behave near $a$? How "fast" does it diverge when $x$ gets closer to $a$?
Definition 34 (Asymptotic Behavior). The functions $f(x)$ and $g(x)$ on an open interval $x > a$ (including $x > -\infty$) or $x < a$ (including $x < \infty$) are said to have the same asymptotic behavior if

(4.13)    $\lim_{x \to a^+} (f(x) - g(x)) = 0$    or    $\lim_{x \to a^-} (f(x) - g(x)) = 0.$

In particular, if $x \to \pm\infty$ and $g(x) = mx + b$, then $f$ is said to have a slant asymptote, and for $m = 0$, the slant asymptote is called a horizontal asymptote.

For a given $f$, there are many $g$ that have the same asymptotic behavior because one can always change $g$ by adding $h$ such that $h(x) \to 0$ as $x \to a^\pm$. A practical problem is to find as simple a $g$ as possible with the property (4.13). In other words, one looks for a simple way to estimate the values of $f(x)$ near $a$.

Example 48. Find the asymptotic behavior of the function (4.12).

Solution: The function has to be investigated near $x = \pm 1$ and also when $x \to \pm\infty$.
(I) Near $x = -1$, the unbounded growth of $f(x)$ is associated with the divergent factor $1/(x + 1)$ so that $f(x) = h(x)/(1 + x)$, where $h(x)$ is finite near $x = -1$. Then $f(x) \approx h(-1)/(x + 1) = g(x)$:

$f(x) = \frac{x(x^2 + 3)}{x - 1} \cdot \frac{1}{x + 1} \approx \frac{2}{x + 1} = g(x).$

The graphs of $f(x)$ and $g(x) = 2/(x + 1)$ diverge in the same way near $x = -1$: the ratio $f(x)/g(x) \to 1$. Strictly speaking, $f(x) - g(x) = x + 2/(x - 1) \to -2$ as $x \to -1^\pm$, so the difference stays bounded rather than vanishing; condition (4.13) is met exactly by $2/(x + 1) - 2$.
(II) Similarly, near $x = 1$,

$f(x) = \frac{x(x^2 + 3)}{x + 1} \cdot \frac{1}{x - 1} \approx \frac{2}{x - 1} = g(x),$

where now $f(x) - g(x) = x + 2/(x + 1) \to 2$ as $x \to 1^\pm$.
(III) When $x$ is a large negative or positive number,

$f(x) = \frac{x \cdot x^2 (1 + 3/x^2)}{x^2 (1 - 1/x^2)} \approx \frac{x^3}{x^2} = x = g(x),$

where $1/x^2$ and $3/x^2$ are small as compared to 1 for large $x^2$ and can be neglected. In other words, $\lim_{x \to \pm\infty} (f(x) - x) = 0$. So the graph of $f$ asymptotically approaches the line $y = x$. Since $1 + 3/x^2 > 1 - 1/x^2$, the ratio $(1 + 3/x^2)/(1 - 1/x^2) > 1$ for all $x^2 > 1$, and hence $f(x) > x$ for $x > 1$ and $f(x) < x$ for $x < -1$. This means that the graph of $f$ approaches the slant asymptote $y = x$ from above when $x$ is a large positive number and from below when $x$ is a large negative number. □
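The asymptotic estimates of Example 48 can be probed numerically. The following sketch evaluates $f$ from (4.12) close to $x = \pm 1$ and at large $|x|$; the helper names `near_minus_one` and `near_plus_one` are illustrative. Sampling a few points only suggests the limits and does not prove them.

```python
def f(x):
    # the function (4.12)
    return x * (x**2 + 3) / (x**2 - 1)


def near_minus_one(x):
    return 2 / (x + 1)


def near_plus_one(x):
    return 2 / (x - 1)


for eps in (1e-2, 1e-4):
    x = -1 + eps
    print(eps, f(x) / near_minus_one(x), f(x) - near_minus_one(x))
    x = 1 + eps
    print(eps, f(x) / near_plus_one(x), f(x) - near_plus_one(x))

for x in (1e2, -1e2, 1e4, -1e4):
    # f(x) - x equals 4x/(x**2 - 1), which tends to 0 as |x| grows
    print(x, f(x) - x)
```

In these samples the ratios approach 1, the differences near $x = -1$ and $x = 1$ settle close to $-2$ and $2$ respectively, and $f(x) - x$ is small and positive for large positive $x$ but small and negative for large negative $x$.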
The following example illustrates the use of Taylor polynomials to find the asymptotic behavior.

Example 49. Investigate $f(x) = x^{-8/3}(1 - \cos x)$ near $x = 0$.

Solution: The factor $x^{-8/3}$ diverges as $x \to 0$, but $\cos x$ is smooth near $x = 0$ and can be approximated by the Taylor polynomial $T_4 = 1 - x^2/2 + x^4/24$:

$f(x) \approx x^{-8/3}(1 - T_4(x)) = x^{-8/3}\left(\frac{1}{2}x^2 - \frac{1}{24}x^4\right) = \frac{1}{2}x^{-2/3} - \frac{1}{24}x^{4/3}.$

Therefore, $f(x) - \frac{1}{2}x^{-2/3} \approx -\frac{1}{24}x^{4/3} \to 0$ as $x \to 0$. Note that the use of $T_2$ in place of $T_4$ would not be enough to establish the asymptotic behavior of $f$. □

26. L'Hospital's Rule

If a function $f$ is not defined at $a$, then its behavior depends on the limit of $f$ as $x \to a$, whether it is finite, infinite, or does not even exist. So this question is of importance when investigating a function. There is a special technique to answer it.

26.1. Indeterminate Forms $\frac{0}{0}$ and $\frac{\infty}{\infty}$. Consider the behavior of the following functions:

(4.14)    $\frac{e^x - 1}{x}, \quad \frac{1 - \cos(x)}{x^2}, \quad \frac{\tan(x) - x}{x^3}$    as $x \to 0$.

Do they have a vertical asymptote at $x = 0$? These functions have a common feature. They are ratios $f/g$ of two functions $f$ and $g$ such that $f(x) \to 0$ and $g(x) \to 0$ as $x \to 0$. Similarly, one can make ratios where the limits of the numerator and denominator at a particular point are infinite:

(4.15)    $\frac{\ln(x)}{x^{-1}}$    as $x \to 0^+$.

In general, a limit of the form

$\lim_{x \to a} \frac{f(x)}{g(x)}$

is called an indeterminate form of type $\frac{0}{0}$ if both $f(x) \to 0$ and $g(x) \to 0$ as $x \to a$. The limit itself may or may not exist. Similarly, it is called an indeterminate form of type $\frac{\infty}{\infty}$ if both $f(x) \to \infty$ (or $-\infty$) and $g(x) \to \infty$ (or $-\infty$). The following theorem provides a powerful method to study the indeterminate forms of these types.
Theorem 34 (L'Hospital's Rule). Suppose $f$ and $g$ are differentiable and $g'(x) \ne 0$ on an open interval that contains $a$ (except possibly at $a$). Suppose that

$\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$

or that

$\lim_{x \to a} f(x) = \pm\infty$ and $\lim_{x \to a} g(x) = \pm\infty.$

Then

(4.16)    $\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$

if the limit on the right-hand side exists (or is infinite).

L'Hospital's rule is also valid for one-sided limits $x \to a^\pm$ and for the limits at $\pm\infty$. It is not so easy to prove the general version of l'Hospital's rule (the proof is omitted here). For the special case in which $f(a) = g(a) = 0$, the derivatives $f'$ and $g'$ are continuous, and $g'(a) \ne 0$, it is not difficult to see why l'Hospital's rule (4.16) holds:

$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{\frac{f(x) - f(a)}{x - a}}{\frac{g(x) - g(a)}{x - a}} = \frac{\lim_{x \to a} \frac{f(x) - f(a)}{x - a}}{\lim_{x \to a} \frac{g(x) - g(a)}{x - a}} = \frac{f'(a)}{g'(a)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}.$

The first equality follows from $f(a) = g(a) = 0$, the second and third equalities are the consequence of the limit laws, the definition of the derivative, and the assumption that $g'(a) \ne 0$, and the last equality follows from the continuity of the derivatives.

This simplified version of l'Hospital's rule can be understood geometrically. The functions $f$ and $g$ can be approximated by their tangent lines at $a$, $f(x) \approx f'(a)(x - a)$ and $g(x) \approx g'(a)(x - a)$, so that $f(x)/g(x) \approx f'(a)/g'(a)$ near $a$. What happens if $f'(a) = g'(a) = 0$? Apparently, the conditions of l'Hospital's rule are satisfied for the derivatives $f'(x)$ and $g'(x)$ in this case. So, l'Hospital's rule may be applied again to the ratio $f'(x)/g'(x)$. The conditions of l'Hospital's rule must be verified for the corresponding limits. For functions differentiable many times, l'Hospital's rule is easy to understand via the Taylor polynomials:

$\frac{f(x)}{g(x)} \approx \frac{f(a) + f'(a)(x - a) + \frac{1}{2}f''(a)(x - a)^2 + \cdots}{g(a) + g'(a)(x - a) + \frac{1}{2}g''(a)(x - a)^2 + \cdots}.$

If $f(a) = g(a) = 0$, then the limit of the ratio is determined by $f'(a)/g'(a)$. If $f(a) = g(a) = 0$ and $f'(a) = g'(a) = 0$, then the limit is determined by $f''(a)/g''(a)$ and so on.
Example 50. Investigate the indeterminate forms (4.14) and (4.15).

Solution: (I) Let $f(x) = e^x - 1$ and $g(x) = x$. Then $f(0) = g(0) = 0$ (the conditions of l'Hospital's rule are fulfilled). Therefore,

$\lim_{x \to 0} \frac{e^x - 1}{x} = \lim_{x \to 0} \frac{(e^x - 1)'}{(x)'} = \lim_{x \to 0} \frac{e^x}{1} = 1.$

(II) Let $f(x) = 1 - \cos(x)$ and $g(x) = x^2$ so that $f(0) = g(0) = 0$. So the conditions of l'Hospital's rule are fulfilled. Then $f'(x) = \sin(x)$ and $g'(x) = 2x$. Since $f'(0) = 0$ and $g'(0) = 0$, l'Hospital's rule can be applied again:

$\lim_{x \to 0} \frac{1 - \cos(x)}{x^2} = \lim_{x \to 0} \frac{\sin(x)}{2x} = \lim_{x \to 0} \frac{(\sin(x))'}{(2x)'} = \lim_{x \to 0} \frac{\cos(x)}{2} = \frac{1}{2}.$

(III) Let $f(x) = \tan(x) - x$ and $g(x) = x^3$ so that $f(0) = g(0) = 0$. The derivatives $f'(x) = \sec^2(x) - 1$ and $g'(x) = 3x^2$ vanish at $x = 0$. L'Hospital's rule can be used again to resolve the indeterminate form. For complicated functions, taking higher-order derivatives might be quite an algebraic exercise. Sometimes, simple algebraic transformations of an indeterminate form in combination with basic limit laws may lead to the answer faster than a successive use of l'Hospital's rule:

$\lim_{x \to 0} \frac{\tan(x) - x}{x^3} = \lim_{x \to 0} \frac{\sec^2(x) - 1}{3x^2} = \lim_{x \to 0} \frac{1 - \cos^2(x)}{3x^2 \cos^2(x)} = \lim_{x \to 0} \frac{\sin^2(x)}{3x^2} = \frac{1}{3}\left(\lim_{x \to 0} \frac{\sin(x)}{x}\right)^2 = \frac{1}{3}.$

The third equality follows from $\cos(x) \to 1$ as $x \to 0$, and therefore $\cos^2(x)$ in the denominator can be replaced by 1 in accord with the basic limit laws.

(IV) Let $f(x) = \ln(x)$ and $g(x) = x^{-1}$ so that $f(x) \to -\infty$ and $g(x) \to \infty$ as $x \to 0^+$. Hence,

$\lim_{x \to 0^+} \frac{\ln(x)}{x^{-1}} = \lim_{x \to 0^+} \frac{(\ln(x))'}{(x^{-1})'} = \lim_{x \to 0^+} \frac{x^{-1}}{-x^{-2}} = -\lim_{x \to 0^+} x = 0.$ □
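The four limits of Example 50 can be cross-checked by evaluating each ratio at a few small arguments. This is only a numerical sanity check under the assumption that the chosen sample points are small enough to show the trend but not so small that floating-point cancellation dominates; the dictionary `forms` and its labels are illustrative.

```python
from math import cos, exp, log, tan

# Each entry maps a label to the ratio f(x)/g(x) from (4.14) and (4.15)
forms = {
    "(e^x - 1)/x": lambda x: (exp(x) - 1) / x,
    "(1 - cos x)/x^2": lambda x: (1 - cos(x)) / x**2,
    "(tan x - x)/x^3": lambda x: (tan(x) - x) / x**3,
    "ln(x)/x^(-1)": lambda x: log(x) / x**-1,
}

for name, ratio in forms.items():
    # sample from the right, since ln(x) requires x > 0
    print(name, [ratio(h) for h in (1e-1, 1e-2, 1e-3)])
```

The sampled values drift toward $1$, $1/2$, $1/3$, and $0$, in agreement with parts (I) to (IV).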
26.2. Indeterminate Products $0 \cdot \infty$. Suppose that $f(x) \to \infty$ and $g(x) \to 0$ as $x \to a$. How can the indeterminate product $f(x)g(x)$ be investigated when $x \to a$? It turns out the indeterminate product can be transformed into one of the indeterminate forms to which l'Hospital's rule is applicable:

(4.17)    $fg = \frac{f}{1/g}: \ \infty \cdot 0 \to \frac{\infty}{\infty}$    or    $fg = \frac{g}{1/f}: \ \infty \cdot 0 \to \frac{0}{0}.$

The function $x \ln(x)$ is an indeterminate product of the type $0 \cdot \infty$ as $x \to 0^+$. It can be transformed into an indeterminate form of the type $\frac{\infty}{\infty}$ as in (4.15), which is then resolved by l'Hospital's rule (see Example 50). Note that, although either of the transformations in (4.17) may be applied with the subsequent use of l'Hospital's rule, the technicalities involved might differ substantially. For instance, if the second option in (4.17) is applied to $x \ln(x) = x/(1/\ln(x))$, then

$\lim_{x \to 0^+} x \ln(x) = \lim_{x \to 0^+} \frac{x}{\frac{1}{\ln(x)}} = \lim_{x \to 0^+} \frac{1}{-\frac{1}{x \ln^2(x)}} = -\lim_{x \to 0^+} x \ln^2(x).$

Although our goal has not been achieved, our effort has not been in vain. Since the left-hand side vanishes by Example 50, it follows that $x \ln^2(x) \to 0$ as $x \to 0^+$. By repeating this procedure recursively, one can infer that $x \ln^n(x) \to 0$ as $x \to 0^+$ for any $n = 1, 2, \ldots$.

26.3. Indeterminate Powers $0^0$, $\infty^0$, and $1^\infty$. Several indeterminate forms arise from the limits of $[f(x)]^{g(x)}$ as $x \to a$:

$0^0$    ($f(x) \to 0$, $g(x) \to 0$),
$\infty^0$    ($f(x) \to \infty$, $g(x) \to 0$),
$1^\infty$    ($f(x) \to 1$, $g(x) \to \pm\infty$).

Note $c^0 = 1$ if $c \ne 0$ and $c \ne \infty$. Similarly, $c^\infty = 0$ if $0 \le c < 1$ and $c^\infty = \infty$ if $c > 1$. The indeterminate powers can be transformed into an indeterminate product with the help of the identity $y = e^{\ln(y)}$:

$\lim_{x \to a} [f(x)]^{g(x)} = \lim_{x \to a} e^{\ln([f(x)]^{g(x)})} = \lim_{x \to a} e^{g(x)\ln(f(x))} = e^{\lim_{x \to a} g(x)\ln(f(x))}.$

The limit of $g(x)\ln(f(x))$ is of type $0 \cdot \infty$ and can be treated by the rule (4.17). The procedure is illustrated with an example of the type $\infty^0$ indeterminate power:

$\lim_{x \to \infty} x^{1/x} = \lim_{x \to \infty} e^{\ln(x^{1/x})} = \lim_{x \to \infty} e^{\ln(x)/x} = e^{\lim_{x \to \infty} \ln(x)/x} = e^0 = 1.$
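Both conclusions of this section, $x \ln^n(x) \to 0$ as $x \to 0^+$ and $x^{1/x} \to 1$ as $x \to \infty$, can be observed numerically. The sketch below is illustrative only; the function names `x_ln_power` and `root_power` are hypothetical helpers, and the convergence is slow, so the sample points are spread over several orders of magnitude.

```python
from math import log


def x_ln_power(x, n):
    """The indeterminate product x * ln(x)**n for x > 0."""
    return x * log(x) ** n


def root_power(x):
    """The indeterminate power x**(1/x) for x > 0."""
    return x ** (1 / x)


for x in (1e-2, 1e-4, 1e-8):
    print(x, x_ln_power(x, 1), x_ln_power(x, 2))

for x in (1e1, 1e3, 1e6):
    print(x, root_power(x))
```

The products shrink toward zero as $x$ decreases, with $x \ln^2(x)$ shrinking more slowly than $x \ln(x)$, and the powers decrease toward 1 as $x$ grows.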
26.4. Indeterminate Differences ∞ − ∞.
Suppose \(f(x) \to \infty\) and \(g(x) \to \infty\) as \(x \to a\). The limit of \(f(x) - g(x)\) as \(x \to a\) is called an indeterminate difference. The following transformations might be helpful to investigate it:
\[ f - g = f\left(1 - \frac{g}{f}\right) = \frac{1 - g/f}{1/f} \qquad\text{or}\qquad f - g = g\left(\frac{f}{g} - 1\right) = \frac{f/g - 1}{1/g} . \]
The limit of \(f/g\) is an indeterminate form of type ∞/∞ and can also be investigated by l’Hospital’s rule. Suppose that \(f(x)/g(x) \to k\) as \(x \to a\), where \(k\) can be either a nonnegative number or \(k = \infty\). If \(k > 1\) or \(k = \infty\), then \(f - g = g(f/g - 1) \to \infty\cdot(k - 1) = \infty\), that is, \(f\) increases faster than \(g\) as \(x \to a\). If \(k < 1\), then \(f - g = g(f/g - 1) \to \infty\cdot(k - 1) = -\infty\), that is, \(g\) increases faster than \(f\) as \(x \to a\). For example,
\[ \lim_{x\to 0^+}\left(\frac{1}{x} + \ln(x)\right) = \lim_{x\to 0^+} \frac{1}{x}\bigl(1 + x\ln(x)\bigr) = \lim_{x\to 0^+} \frac{1}{x}\,(1 + 0) = \infty . \]
If \(k = 1\), that is, \(f(x)/g(x) \to 1\), then the indeterminate difference is equivalent to an indeterminate form of type 0/0 and can be investigated by l’Hospital’s rule. In this case, it is also possible that \(f - g \to c\), where \(c\) is a number. If \(c = 0\), the functions \(f\) and \(g\) have the same asymptotic behavior, that is, \(f\) and \(g\) increase asymptotically at the same rate: \(f - g \to 0\). For example,
\[ \lim_{x\to 0}\left(\frac{1}{\sin(x)} - \cot(x)\right) = \lim_{x\to 0} \frac{1 - \cos(x)}{\sin(x)} = \lim_{x\to 0} \frac{\sin(x)}{\cos(x)} = 0 , \]
where l’Hospital’s rule has been used in the second equality. Note that Taylor polynomials allow us to find the local behavior of this function near \(x = 0\). Use \(T_2\) to approximate \(\cos(x)\) and \(T_3\) for \(\sin(x)\):
\[ \frac{1 - \cos(x)}{\sin(x)} \approx \frac{x^2/2}{x - x^3/6} = \frac{x/2}{1 - x^2/6} \approx \frac{x}{2} , \]
where \(x^2/6\) is small as compared to 1 when \(x\) is close enough to 0 and can therefore be neglected in the denominator.

27. Analyzing the Shape of a Graph
To analyze the shape of a graph \(y = f(x)\), it is useful to have a clear idea of how the basic functions behave. For example, \(\sin(x)\) and \(\cos(x)\) are regular everywhere, bounded (e.g., \(|\sin(x)| \le 1\)), and periodic with a period of \(2\pi\). In addition, \(\sin(x)\) has zeros at \(x = \pi n\), \(n = 0, \pm 1, \pm 2, \ldots\), while \(\cos(x)\) vanishes at \(\pi/2 + \pi n\). The function \(\sin(x)\) is odd, while \(\cos(x)\) is even.
The ratio \(\tan(x) = \sin(x)/\cos(x)\) is not defined at roots of \(\cos(x)\). How does \(\tan(x)\) behave, say, near \(x = \pi/2\)? Since both \(\sin(x)\) and \(\cos(x)\) are smooth near \(x = \pi/2\), the behavior of \(\tan(x)\) near \(\pi/2\) can be understood with the help of Taylor polynomials. Let us approximate \(\sin(x)\) by \(T_1(x) = 1\) (the linear term vanishes because \(\cos(\pi/2) = 0\)) and \(\cos(x)\) by \(T_3(x) = -(x - \pi/2) + (x - \pi/2)^3/6\). To simplify the notation, write \(\Delta x = x - \pi/2\) (the deviation of \(x\) from \(\pi/2\)). Then
\[ \tan(x) \approx \frac{1}{-\Delta x + (\Delta x)^3/6} = -\frac{1}{\Delta x}\cdot\frac{1}{1 - (\Delta x)^2/6} \approx -\frac{1}{\Delta x} = -\frac{1}{x - \pi/2} , \]
where the second ratio in the product has been approximated by 1 because \(\Delta x\) is small. Since \(\tan(x + \pi) = \tan(x)\), this behavior repeats itself near every root of \(\cos(x)\).

27.1. Growth of the Power, Exponential, and Logarithmic Functions.
The asymptotic behavior of rational functions is easily determined by the highest powers of the numerator and denominator, as in Example 48. Let us compare the growth of the power function \(x^n\), the exponential function \(e^x\), and the logarithmic function \(\ln(x)\) as \(x \to \infty\). Let \(f(x) = e^x\) and \(g(x) = x^n\). Let us analyze the ratio \(f/g\) as \(x \to \infty\). The conditions of l’Hospital’s rule are satisfied: \(e^x \to \infty\) and \(x^n \to \infty\) as \(x \to \infty\). L’Hospital’s rule can successively be applied until the indeterminate form is resolved:
\[ \lim_{x\to\infty} \frac{e^x}{x^n} = \lim_{x\to\infty} \frac{e^x}{n x^{n-1}} = \lim_{x\to\infty} \frac{e^x}{n(n-1)x^{n-2}} = \cdots = \lim_{x\to\infty} \frac{e^x}{n!} = \infty . \]
The conclusion is true for any real \(n\). Indeed, for any real \(n\), there exists a positive integer \(N\) such that \(n < N\) or \(x^n < x^N\), \(x > 1\). But \(e^x\) grows faster than \(x^N\). The exponential function grows faster than the power function. Similarly, it is straightforward to show that the logarithmic function grows slower than any power function:
\[ \lim_{x\to\infty} \frac{\ln(x)}{x^n} = \lim_{x\to\infty} \frac{(\ln(x))'}{(x^n)'} = \lim_{x\to\infty} \frac{1/x}{n x^{n-1}} = \lim_{x\to\infty} \frac{1}{n x^n} = 0 \]
for any \(n > 0\) (\(n\) may be any positive real number here).

27.2. Asymptotes at x → ±∞.
In general, if \(\lim_{x\to\infty} f(x)\) is infinite, then the limit of \(f/g\) can be studied for trial \(g\)’s with different growth, say, \(g = mx\) (for slant asymptotes), \(g = x^n\), \(g = \ln(x)\), and so on. Suppose \(g(x)\) is found such that \(f(x)/g(x) \to 1\) as \(x \to \infty\). Does this mean that \(g\) and \(f\) have the same asymptotic behavior? The answer is “no.” If the indeterminate form \(f(x) - g(x)\) of type ∞ − ∞ converges to 0 as \(x \to \infty\), then the indeterminate form \(f(x)/g(x)\) of type ∞/∞ converges to 1.
Indeed, it follows from \(1/g(x) \to 0\) and \(f(x) - g(x) \to 0\) that \((1/g(x))(f(x) - g(x)) = f(x)/g(x) - 1 \to 0\). The converse is not true. Consider the following simple example: \(f(x) = x + \sin(x)\) and \(g(x) = x\). Evidently, \(f(x)/g(x) = 1 + \sin(x)/x \to 1\) as \(x \to \infty\). But the limit \(\lim_{x\to\infty}(f(x) - g(x)) = \lim_{x\to\infty} \sin(x)\) does not exist. So, even if \(g\) is found to have the property \(f(x)/g(x) \to 1\) as \(x \to \infty\), the indeterminate form \(f - g\) of type ∞ − ∞ must still be investigated in order to determine whether or not \(g\) has the same asymptotic behavior as \(f\).

27.3. Guidelines for Analyzing the Shape of a Graph.
The following guidelines are useful for sketching the graph of a function. It should be noted that not all the steps can always be carried out. This depends very much on the complexity of the function in question. So these are really guidelines, not a “must-do” algorithm. Given a function \(f\), find:

(I) Domain. The domain consists of all values of \(x\) at which \(f(x)\) is defined. Typically, it is a collection of intervals.

(II) Roots of \(f\) and the value \(f(0)\). Roots of \(f(x)\) define the intercepts of the graph \(y = f(x)\) with the \(x\) axis. They are not always easy to find. The value \(f(0)\) (if \(x = 0\) is in the domain of \(f\)) defines the intercept of \(y = f(x)\) with the \(y\) axis.

(III) Symmetry and periodicity. If \(f(-x) = f(x)\) (an even function) for all \(x\) in the domain, then the graph \(y = f(x)\) is symmetric about the \(y\) axis. If \(f(-x) = -f(x)\) (an odd function) for all \(x\) in the domain, then the graph \(y = f(x)\) is symmetric about the origin (or the rotation through 180° about the origin). If there is a number \(p\) such that \(f(x + p) = f(x)\), then \(f\) is periodic and \(p\) is its period. The graph \(y = f(x)\) repeats itself on intervals of length \(p\), for example \([a, a+p]\), \([a+p, a+2p]\), and so on for any \(a\). Examples are \(\sin(x)\), \(p = 2\pi\); \(\tan(x)\), \(p = \pi\); \(\cos(4x)\), \(p = 2\pi/4 = \pi/2\).

(IV) Asymptotes and asymptotic behavior of \(f\). If \(f\) is defined for \(x > a\) or \(x < a\), or both, but not at \(a\), then the local behavior of \(f\) near \(a\) must be studied (see below). If \(f\) is a ratio \(f = h/g\), find the limits \(\lim_{x\to c^\pm} f(x)\), where \(c\) solves \(g(c) = 0\). If one of the limits or both is infinite, then \(x = c\) is a vertical asymptote; this is the case when \(h(c) \neq 0\). If \(h(c) = 0\), investigate the local behavior of \(f\) near \(c\) (e.g., with the help of Taylor polynomials if possible). The asymptotic behavior of \(f(x)\) near \(c\) and for large positive and negative \(x\) determines the shape of \(y = f(x)\) near the vertical asymptotes and the asymptotic shape of the graph when \(x \to \pm\infty\).
(V) Critical numbers of \(f\). Critical numbers are solutions of \(f'(x) = 0\) or the values of \(x\) where \(f'(x)\) does not exist. If, for example, \(f'(x)\) tends to \(\infty\) (or \(-\infty\)) as \(x\) approaches \(c\), then the line tangent to the graph \(y = f(x)\) at \(x = c\) is vertical. For example, \(f(x) = x^{1/3}\) and \(f'(x) = 1/(3x^{2/3})\). So \(f'(x)\) diverges as \(x \to 0\). The graph \(y = x^{1/3}\) has a vertical tangent line at \(x = 0\).

(VI) Intervals of positive and negative values of \(f\). These are the intervals where the graph \(y = f(x)\) lies above or below the \(x\) axis. Roots of \(f\) generally separate the intervals of positive and negative values of \(f\). However, this is not always the case. Let \(c\) be a root of \(f\). If \(f'(c) \neq 0\), then the function \(f\) is increasing or decreasing at \(c\) and hence must change its sign. Yet, if a root of \(f\) coincides with its critical point, that is, \(f'(c) = 0\) or \(f'\) does not exist at \(c\), then \(f\) is negative near \(c\) if \(f\) has a local maximum at \(c\) and \(f\) is positive near \(c\) if it has a local minimum at \(c\). So the sign of the derivative \(f'\) must be investigated near \(c\) (the first derivative test). Vertical asymptotes can also separate intervals of positive and negative values of \(f\). For example, the function (4.12) has one root \(x = 0\) and two vertical asymptotes at \(x = -1\) and \(x = 1\). So \(f\) is negative on \((-\infty, -1)\), positive on \((-1, 0)\), negative on \((0, 1)\), and positive on \((1, \infty)\).

(VII) Intervals of increase (\(f' > 0\)) and decrease (\(f' < 0\)). If \(f' > 0\) (\(f' < 0\)) on an interval, then \(f\) increases (decreases) on it (the increasing-decreasing test). These intervals are generally separated by critical numbers and vertical asymptotes. As a consequence of this study, the nature of each critical point is established by the first derivative test.

(VIII) Intervals of upward and downward concavity. The sign of \(f''(x)\) must be studied. These intervals are separated by inflection points and vertical asymptotes. Also, the second derivative test and Taylor polynomials can be used to establish the nature of a critical point of \(f\).

(IX) Values of \(f\) at critical points and inflection points. These values set relative scales of the graph (e.g., they show how much the function increases between two critical points).
Example 51. Sketch the graph of \(f(x) = x^{1/3}(x - 6)^{2/3}\).

Solution: Following the preceding guidelines:

(I) The domain is the whole real line.

(II) The roots of \(f\) are \(x = 0\) and \(x = 6\) (the intercepts with the \(x\) axis). The intercept with the \(y\) axis is \(f(0) = 0\).

(III) The function is not periodic, and it is neither odd nor even.

(IV) There is no vertical asymptote. For large values \(x \to \pm\infty\), one has \(f(x) = x(1 - 6x^{-1})^{2/3} \approx x\). Since \((1 - 6x^{-1})^{2/3} < 1\) for large positive \(x\), the graph lies below the line \(y = x\) there. More precisely, \((1 - 6x^{-1})^{2/3} \approx 1 - 4x^{-1}\) for large \(|x|\), so \(f(x) - x \to -4\) and the graph approaches the slant asymptote \(y = x - 4\).

(V) The derivative reads
\[ f'(x) = \frac{1}{3}x^{-2/3}(x - 6)^{2/3} + \frac{2}{3}x^{1/3}(x - 6)^{-1/3} = \frac{x - 2}{x^{2/3}(x - 6)^{1/3}} . \]
It vanishes at \(x = 2\) and does not exist at \(x = 0\) and \(x = 6\). The critical points are 0, 2, and 6. Also, \(f'(x) \to \infty\) as \(x \to 0\) and it tends to \(\pm\infty\) as \(x \to 6^\pm\). Thus, the graph has vertical tangent lines at \(x = 0\) and \(x = 6\). In particular, near \(x = 0\), the graph looks like \(y = f(x) \approx 6^{2/3}x^{1/3}\), while near \(x = 6\), it has a downward cusp \(y = f(x) \approx 6^{1/3}(x - 6)^{2/3}\).

(VI) The graph lies below the \(x\) axis on \((-\infty, 0)\) (\(f < 0\)) and above it on \((0, 6)\) and \((6, \infty)\) (\(f > 0\)). There is no sign change at the root \(x = 6\) (\(f\) must have a local minimum at 6, which is also verified by the first derivative test below).

(VII) The derivative is a product of three factors \(x - 2\), \(x^{-2/3}\), and \((x - 6)^{-1/3}\). By investigating the signs of these factors on the intervals separated by the critical points, we can conclude that \(f' > 0\) (\(f\) is increasing) on \((-\infty, 0)\), \(f' > 0\) (\(f\) is increasing) on \((0, 2)\), \(f' < 0\) (\(f\) is decreasing) on \((2, 6)\), and \(f' > 0\) (\(f\) is increasing) on \((6, \infty)\). Therefore, \(f\) has a local maximum at \(x = 2\) and a local minimum at \(x = 6\) by the first derivative test.

(VIII) The second derivative reads
\[ f''(x) = -\frac{8}{x^{5/3}(x - 6)^{4/3}} . \]
The factor \((x - 6)^{4/3}\) cannot be negative. The sign of \(f''\) is determined only by that of \(x^{5/3}\). Thus, \(f'' > 0\) on \((-\infty, 0)\) (the graph is upward concave) and \(f'' < 0\) on \((0, 6)\) and \((6, \infty)\) (the graph is downward concave). So \(x = 0\) is the inflection point.
Also, near the local maximum \(x = 2\), the graph looks like the downward parabola
\[ y = T_2(x) = f(2) + \frac{f''(2)}{2}(x - 2)^2 = 2^{5/3} - 2^{-7/3}(x - 2)^2 . \]

In the age of graphing calculators, the preceding guidelines might look rather obsolete because finding the shape of a graph can be done just by hitting the right calculator buttons. But what a calculator cannot do is to provide details of the local behavior of a function near points of interest (e.g., critical points, asymptotes, etc.). In the previous example, a calculator would show that there is a slant asymptote, a cusp at \(x = 6\), and a local maximum at \(x = 2\), but it would not be able to determine the local behavior of the function near the cusp, or at the local maximum, or in the asymptotic region. In science and engineering, this is often much more important than the overall shape of a graph. Here a good working knowledge of calculus becomes indispensable, while a graphing calculator is just a useful tool that greatly facilitates the study of a function.

28. Optimization Problems
Suppose that a quantity \(Q\) depends on some variables. The problem of optimizing \(Q\) implies finding the values of the variables at which the quantity \(Q\) attains its maximal or minimal value. The simplest optimization problem arises when \(Q\) depends on a single variable \(x\) such that \(Q\) is a function \(f(x)\). Then the optimization problem is reduced to the problem of finding extreme values of \(f(x)\). The latter problem has been analyzed in Section 22. To determine extreme values of \(f\), one has to:

(I) Find all critical points of \(f\).

(II) Investigate the nature of the critical points (local minima and local maxima). The first or second derivative tests can be used for this purpose.

(III) Calculate the values of \(f\) at the endpoints of the interval \([a, b]\) (if extreme values are sought only in \([a, b]\)) and compare them with values of \(f\) at its local maxima and minima to determine absolute extreme values of \(f\).

The following test can also be used to find absolute extreme values of a function.
Theorem 35 (First Derivative Test for Absolute Extreme Values). Suppose \(c\) is a critical point of a continuous function \(f\) defined on an interval.
(I) If \(f'(x) > 0\) for all \(x < c\) and \(f'(x) < 0\) for all \(x > c\), then \(f(c)\) is the absolute maximum value of \(f\).
(II) If \(f'(x) < 0\) for all \(x < c\) and \(f'(x) > 0\) for all \(x > c\), then \(f(c)\) is the absolute minimum value of \(f\).

The conclusion of the theorem is easy to understand. Consider case (I). Since \(f'(x) > 0\) for all \(x < c\), the function increases for all \(x < c\). Since \(f'(x) < 0\) for all \(x > c\), the function decreases for all \(x > c\). By continuity of \(f\), the number \(f(c)\) must be the largest value of \(f\). Case (II) is proved similarly.

A typical optimization problem may be split into three basic steps:
(I) Identify a variable with respect to which a quantity \(Q\) is to be optimized.
(II) Use the laws of a specific discipline to express \(Q\) as a function \(f\) of that variable, \(Q = f(x)\).
(III) Solve the mathematical problem of extreme values of \(f\).

Recall Example 42. This is a typical optimization problem. Its solution is rather straightforward, provided Equation (4.4) is given. Without it, the problem of finding an optimal angle for a projectile becomes far more difficult. Its major part now involves a derivation of Equation (4.4)! This is quite typical for optimization problems. As a rule, they arise in various disciplines, for example, biology, chemistry, economics, and so on, and their formulation as the mathematical problem of extreme values requires a specific knowledge outside mathematics, for example, the laws of physics as in Example 42.

Example 52. Design an aluminum can of volume \(V = 300\ \mathrm{cm}^3\) to minimize the cost (or the amount) of material needed to make the can.

Solution: Following the preceding guidelines:
(I) Apparently, the least amount of material is used when the surface area of the can is minimal. An aluminum can has the shape of a cylinder of radius \(r\) and height \(h\). So one has to minimize the surface area \(S\), which depends on \(r\) and \(h\). But the variables \(r\) and \(h\) are not independent because the volume is fixed.
(II) The surface area is the sum of the areas of the side, top, and bottom of the can: \(S = 2\pi r h + \pi r^2 + \pi r^2 = 2\pi r h + 2\pi r^2\). The volume is \(V = \pi r^2 h\). Since the volume is fixed, the variables \(r\) and \(h\) are related as \(h = V/(\pi r^2)\).
Hence, with \(h = V/(\pi r^2)\), \(S\) can be written as a function of the radius \(r\) only:
\[ S(r) = 2\pi r\,\frac{V}{\pi r^2} + 2\pi r^2 = \frac{2V}{r} + 2\pi r^2 . \]
One has to find the value of \(r > 0\) at which \(S(r)\) attains its absolute minimum.
(III) The function \(S(r)\) is differentiable for all \(r > 0\). Therefore, all its critical points are roots of the derivative:
\[ S'(r) = -\frac{2V}{r^2} + 4\pi r = \frac{4\pi}{r^2}\left(r^3 - \frac{V}{2\pi}\right) = 0 . \]
So the critical point is \(r_c = (V/(2\pi))^{1/3}\). Since \(S'(r) < 0\) for all \(0 < r < r_c\) and \(S'(r) > 0\) for all \(r > r_c\), the function \(S(r)\) attains its absolute minimum at \(r_c\) by the first derivative test for absolute extreme values. The corresponding value of \(h\) is then found from the relation \(h = V/(\pi r^2)\). The dimensions of the can with minimal costs of material for a given volume \(V\) are
\[ r = \left(\frac{V}{2\pi}\right)^{1/3} \approx 3.6\ \mathrm{cm}, \qquad h = \frac{V}{\pi r_c^2} = \left(\frac{4V}{\pi}\right)^{1/3} = 2r_c \approx 7.3\ \mathrm{cm}. \]
The analysis has shown that the height and diameter of a can of a given volume must be equal in order to minimize the cost of material (or the surface area of the can).

Remark. In the previous example, \(S\) has been expressed as a function of \(r\). The same conclusion could be reached if \(S\) is expressed as a function of the height \(h\) only, that is, when the relation \(r = (V/(\pi h))^{1/2}\) is substituted into the expression for the surface area to obtain \(S(h)\). The critical point of \(S(h)\) can be shown to be \(h_c = 2r_c\). Verify this!

A Curious Fact. The preceding problem is essential to reduce waste from plastic, glass, and aluminum containers. It can be stated more generally. What is the shape of a container that has the smallest surface area at a given volume? It can be proved by the calculus of variations that such a container must be a sphere. Even in the example of an aluminum can, the optimal dimensions appear to be as close to those of a sphere as the cylindrical geometry would allow: The height and diameter are the same. Check out a local supermarket to see if manufacturers use this fact!
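Returning to Example 52, the optimal dimensions can be checked with a few lines of code. This is a minimal sketch under the example's assumptions (a closed cylinder of fixed volume V in cubic centimeters); the function names `optimal_can` and `surface_area` are illustrative, not taken from the text.

```python
import math

def surface_area(r, V):
    """S(r) = 2V/r + 2*pi*r^2 for a closed cylinder of volume V and radius r."""
    return 2.0 * V / r + 2.0 * math.pi * r ** 2

def optimal_can(V):
    """Radius and height that minimize the surface area at fixed volume V."""
    r_c = (V / (2.0 * math.pi)) ** (1.0 / 3.0)  # root of S'(r) = 0
    h_c = V / (math.pi * r_c ** 2)              # from V = pi * r^2 * h
    return r_c, h_c

r_c, h_c = optimal_can(300.0)
print(f"r = {r_c:.2f} cm, h = {h_c:.2f} cm, S = {surface_area(r_c, 300.0):.1f} cm^2")
```

Comparing `surface_area(r_c, V)` with the values at slightly smaller and larger radii gives a quick, informal confirmation that the critical point is a minimum.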
Should only spherical containers be used to “go green”? To answer this question, a far more complicated optimization problem must be studied. The production waste for containers of different shapes is different. Storage maintenance and transportation require energy (hence carbon emissions). Naturally, spheres are not optimal for storage and hence for transportation; rectangular containers are far better. Finally, what about consumers’ reaction to spherical Coke cans in a vending machine or spherical aluminum cans in the supermarket?

28.1. Applications to Economics.
In Section 19, we introduced the cost function \(C(x)\), which is the cost of producing \(x\) units of a certain product. The derivative \(C'(x)\) is the marginal cost. It determines the cost of increasing production from \(x\) units to \(x + 1\) units. Let \(p(x)\) be the price per unit that a company can charge if it sells \(x\) units. The function \(p(x)\) is also called the price function; it is generally expected to be a decreasing function because the price per unit usually goes down when a larger number of units is sold. The total revenue \(R(x) = x\,p(x)\) is called the revenue function. The derivative \(R'(x)\) is called the marginal revenue function. It determines the change in the revenue when the number of units sold increases from \(x\) to \(x + 1\). Finally, the profit function \(P(x) = R(x) - C(x) = x\,p(x) - C(x)\) determines the total profit if \(x\) units are sold. Its derivative \(P'(x)\) determines the change in the total profit when the number of units sold increases from \(x\) to \(x + 1\). The standard optimization problem here is to minimize costs and maximize revenues and profit.

Example 53. A small store sells jeans at a price of $80 per pair. Every week 60 units are sold. The cost to the store for 60 units is $2500, including the cost of transportation. A market survey indicates that, for each $10 rebate offered to buyers, the number of units sold will increase by 20 a week. Also, the purchase and transportation costs will go down by $20 per each weekly order increase of 5 units. How large a rebate should the store offer to maximize its profit?

Solution: (I) What is known about the price function \(p(x)\)? First, its value at a particular number of sold units \(x = x_0 = 60\) is \(p_0 = p(60) = 80\). Also, if \(x\) increases by an amount of \(\Delta x = 20\), the price function decreases by \(\Delta p = 10\) (the rebate). Thus, the ratio \(m = -\Delta p/\Delta x = -1/2\) is the rate of change of \(p(x)\) (the minus sign indicates the decrease in \(p(x)\)).
So the price function is
\[ p(x) = p_0 + m(x - x_0) = 80 - \tfrac{1}{2}(x - 60) = 110 - \tfrac{1}{2}x . \]
(II) What is known about the cost function \(C(x)\)? First, its value at a particular number of supplied units \(x = x_0 = 60\) is \(C_0 = C(60) = 2500\). Similarly, the cost function decreases by \(\Delta C = 20\) if \(x\) increases by \(\Delta x = 5\). So the ratio \(M = -\Delta C/\Delta x = -4\) is the rate of change of \(C\) or the marginal cost. Thus,
\[ C(x) = C_0 + M(x - x_0) = 2500 - 4(x - 60) = 2740 - 4x . \]
(III) One has to maximize the profit function:
\[ P(x) = x\,p(x) - C(x) = 114x - \tfrac{1}{2}x^2 - 2740 . \]
Since \(P'(x) = 114 - x\), the function has one critical point \(x = 114\) at which \(P(x)\) attains its absolute maximal value by the first derivative test for absolute extreme values.
(IV) If \(x = 114\) units can be sold, the price per unit is \(p(114) = 110 - 57 = 53\). So, the rebate should be \(p(60) - p(114) = 80 - 53 = 27\). Therefore, the store should offer a rebate of $27 to maximize its profit. Note also the increase in the weekly profit: \(P(60) = \$2{,}300\) whereas \(P(114) = \$3{,}758\).

Remark. The linear (tangent line) approximation has been used to get the unknown price and cost functions in the previous example. This is a benefit of market surveys: They estimate the derivatives (or trends) of the price functions. In fact, the cost function is generally highly nonlinear. Its linearization near a particular \(x = x_0\) cannot be valid for all \(x > x_0\). Indeed, in the previous example, it vanishes at \(x = 685\) and becomes negative after that, which cannot possibly be true. Also, an increase in sales naturally leads to a decrease in the demand for that particular item. The price may go up then. So, after a successful rebate campaign, the store would need a new market survey to estimate \(p'(114)\) and get the linear approximation at \(x = 114\).

29. Newton’s Method
Finding roots of a function \(f(x)\) is an important problem in various applications. Unfortunately, an analytic solution of the equation \(f(x) = 0\) is impossible in many practical cases.
For example, consider \(f(x) = x - e^{-x}\). The equation \(f(x) = 0\) is equivalent to \(x = e^{-x}\). The graphs \(y = x\) and \(y = e^{-x}\) intersect at some \(x\) between 0 and 1. So \(f(x)\) has a root. But how can it be calculated? Here we present one of the simplest methods, known as Newton’s method. It provides a recurrence relation that allows us to compute a root of a differentiable function with any desired accuracy.

29.1. Newton’s Recurrence Relation for Finding a Root.
Suppose \(f(x)\) has a root near \(x_1\). Consider the tangent line approximation of \(f\) near \(x_1\): \(L(x) = f(x_1) + f'(x_1)(x - x_1)\). It is easy to find the root of \(L(x)\), which is denoted by \(x_2\):
\[ L(x) = 0 \quad\longrightarrow\quad x = x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} . \]
Note that the root of \(L(x)\) exists if \(f'(x_1) \neq 0\) (otherwise, the tangent line is horizontal and cannot have any root). Since \(L(x)\) is only an approximation to \(f(x)\), the number \(x_2\) is closer to the root of \(f\) than \(x_1\), but does not coincide with it. In other words, the value \(f(x_2)\) is closer to 0 than \(f(x_1)\): \(0 < |f(x_2)| < |f(x_1)|\) (the absolute value is necessary if the function takes negative values). Therefore, the tangent line constructed at \(x = x_2\), \(L(x) = f(x_2) + f'(x_2)(x - x_2)\), can be expected to approximate \(f(x)\) even better near its root because \(x_2\) is closer to the root than \(x_1\). The root of the new tangent line is given by the same expression as before where \(x_1\) should be replaced by \(x_2\): \(x_3 = x_2 - f(x_2)/f'(x_2)\). The procedure may be recursively repeated to generate a sequence of values \(x_n\):
\[ (4.18)\qquad x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} , \qquad n = 1, 2, \ldots , \]
provided \(f'(x_n) \neq 0\).

Theorem 36. If \(f\) has a single root \(r\) in an open interval and \(f'(x) \neq 0\) on the interval, then there exists \(x_1\) sufficiently close to \(r\) such that the sequence (4.18) converges to the root:
\[ \lim_{n\to\infty} x_n = r . \]

In practical terms, the sequence elements are calculated with a particular number of significant digits (decimal places). Newton’s recurrence is applied until \(x_{n+1}\) and \(x_n\) agree to all the relevant decimal places. Then \(r = x_{n+1}\) is correct to the relevant decimal places.
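The recurrence (4.18) and the practical stopping rule translate directly into a short routine. The following is a minimal sketch, not a production root finder: the function name `newton`, the tolerance `tol`, and the iteration cap `max_iter` are assumptions made for illustration. The stopping rule "successive iterates agree to the relevant decimal places" is approximated by comparing their difference with `tol`.

```python
import math

def newton(f, fprime, x1, tol=1e-10, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n), starting from the initial point x1."""
    x = x1
    for _ in range(max_iter):
        slope = fprime(x)
        if slope == 0.0:
            # Horizontal tangent line: it has no root, so the recurrence breaks down.
            raise ZeroDivisionError("f'(x_n) = 0; choose another initial point")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:  # successive iterates agree to within tol
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")

# The function considered in the text: f(x) = x - e^{-x}, f'(x) = 1 + e^{-x}.
root = newton(lambda x: x - math.exp(-x), lambda x: 1.0 + math.exp(-x), x1=0.0)
print(round(root, 6))
```

The iteration cap matters: the recurrence is not guaranteed to converge for every initial point, so a routine of this kind should report failure instead of looping forever.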
Example 54. Find the root of \(f(x) = x - e^{-x}\) that is correct to six decimal places.

Solution: (I) Determine the position of the root first. The graphs \(y = x\) and \(y = e^{-x}\) intersect between 0 and 1. So the root lies in the interval \((0, 1)\).
(II) Verify the condition \(f'(x) \neq 0\): \(f'(x) = 1 + e^{-x} > 0\) for all \(x\).
(III) Pick an initial value of \(x_1 = 0\), for example. Then Newton’s sequence for six decimal places is:
\[ x_1 = 0 , \quad x_2 = 0.5 , \quad x_3 = 0.566311 , \quad x_4 = 0.567143 , \quad x_5 = 0.567143 . \]
So the root \(r = 0.567143\) is correct to six decimal places (in fact, \(f(0.567143) = -4.5 \times 10^{-7}\)).

29.2. Pitfalls in Newton’s Method.
Unfortunately, there is no unique recipe for choosing an initial point in Newton’s sequence. The choice depends very much on the function in question. In practice, it is determined by trying different values. A few possible bad behaviors of Newton’s sequence are useful to keep in mind.

(I) A bad choice of the initial point \(x_1\) can produce the value of \(x_2\) that is a worse approximation to the root than \(x_1\). Consider, for example, the function \(f(x) = x^3 - 3x^2 + 2\) in the interval \([0, 2]\) and \(f(x) = 2\) when \(x < 0\) and \(f(x) = -2\) when \(x > 2\). The function is continuously differentiable because \(f'(x) = 3x^2 - 6x\) approaches 0 as \(x \to 0^+\) and \(x \to 2^-\). The function has the root \(x = 1\) and \(f'(x) < 0\) in the open interval \((0, 2)\). If \(0 < x_1 < 2\) is close enough to either \(x = 0\) or \(x = 2\), then \(x_2\) would be outside the interval \((0, 2)\). The essential point here is that such a situation is likely to occur when \(f'(x_1)\) is close to 0. The initial point must be taken closer to the root. Note that the actual behavior of \(f(x)\) outside the interval \([0, 2]\) is not relevant for the conclusion.

(II) A poor choice of the initial point may lead to a cycle in Newton’s sequence, which never converges. Take \(f(x) = x^3 - 2x + 2\) and \(x_1 = 0\). Since \(f'(x) = 3x^2 - 2\), the next elements are \(x_2 = 0 - 2/(-2) = 1\), \(x_3 = 1 - 1/1 = 0 = x_1\). That is, Newton’s sequence is a cyclic sequence.

(III) If \(f'(x) \to \pm\infty\) as \(x\) approaches a root \(r\) (the graph \(y = f(x)\) has a vertical tangent line at the root), Newton’s sequence may oscillate around \(r\), never converging to it, or it may diverge for any initial point.
To understand this phenomenon, suppose \(f(x)\) behaves near its root \(r\) as \(f(x) \approx a(x - r)^\nu\), where \(\nu > 0\) and \(a\) is constant. Since \(f(x)/f'(x) = \nu^{-1}(x - r)\), Newton’s sequence (4.18), \(x_{n+1} = x_n - \nu^{-1}(x_n - r)\), can also be written as \(x_{n+1} - r = q(x_n - r)\), where \(q = 1 - \nu^{-1}\). Apparently, \(x_n \to r\) is equivalent to \(y_n = x_n - r \to 0\). But the sequence \(y_{n+1} = q y_n = q^2 y_{n-1} = \cdots = q^n y_1\) would not converge if \(|q| = |1 - \nu^{-1}| \ge 1\) or \(\nu \le 1/2\) unless \(y_1 = 0\) (i.e., if the root is accurately guessed!). For example, for \(f(x) = x^{1/3}\) (\(\nu = 1/3\)), Newton’s sequence diverges: \(x_{n+1} = (1 - 3)x_n = -2x_n\) for any choice of the initial point \(x_1 \neq 0\). For \(f(x) = |x|^{1/2}\) (\(\nu = 1/2\)), Newton’s sequence oscillates: \(x_{n+1} = (1 - 2)x_n = -x_n\).

29.3. Understanding Money Loans.
Suppose that one takes a loan of \(P\) dollars (the principal) for \(n\) months with an annual interest rate of \(I\)%. What is the monthly payment? It is calculated as follows. The interest rate per month is \(x = I/12\). For example, an annual interest rate of 6% means that \(I = 0.06\) and \(x = 0.06/12 = 0.005\). Let \(A\) be the monthly payment. Each payment includes the payment toward the principal and the interest. Let \(F_k\) be the amount yet to be paid after \(k\) monthly payments. It is called the future value of the loan. The sequence \(F_k\) satisfies the conditions: \(F_0 = P\) and \(F_n = 0\) (the loan and interest are paid off after \(n\) payments). After one payment, the loan value is
\[ F_1 = P + Px - A , \]
which is the loan \(P\) plus the monthly interest \(Px\) minus the payment \(A\). Here \(F_1\) is the future value of the loan after one payment. Then, after one more payment, its value is the value \(F_1\) plus interest \(F_1 x\) minus the payment \(A\),
\[ F_2 = F_1 + F_1 x - A , \]
and so on, \(F_k = F_{k-1} + F_{k-1}x - A\). After \(n\) payments,
\[ F_n = F_{n-1}(1 + x) - A = F_{n-2}(1 + x)^2 - A[(1 + x) + 1] = \cdots \]
\[ = F_0(1 + x)^n - A[(1 + x)^{n-1} + (1 + x)^{n-2} + \cdots + (1 + x) + 1] = P(1 + x)^n - A\,\frac{(1 + x)^n - 1}{x} , \]
where, in the last equality, the geometric sum formula \(1 + q + q^2 + \cdots + q^{n-1} = (q^n - 1)/(q - 1)\) has been used (it can be proved by dividing the polynomial \(q^n - 1\) by the polynomial \(q - 1\)). Since \(F_n = 0\), the monthly payment is
\[ (4.19)\qquad A = \frac{Px}{1 - (1 + x)^{-n}} . \]
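Formula (4.19) and the recurrence for the future value can be cross-checked against each other numerically. The sketch below assumes a nonzero interest rate given as a decimal fraction (0.06 for 6%); the function names `monthly_payment` and `future_value` are illustrative only.

```python
def monthly_payment(P, annual_rate, n):
    """Payment A from (4.19) for principal P, annual rate (e.g. 0.06), and n months."""
    x = annual_rate / 12.0  # monthly interest rate; assumed nonzero
    return P * x / (1.0 - (1.0 + x) ** (-n))

def future_value(P, annual_rate, n, k):
    """Amount F_k still owed after k payments, via F_k = F_{k-1}(1 + x) - A."""
    x = annual_rate / 12.0
    A = monthly_payment(P, annual_rate, n)
    F = P  # F_0 = P
    for _ in range(k):
        F = F * (1.0 + x) - A
    return F

A = monthly_payment(200_000, 0.06, 120)
print(f"monthly payment: {A:.2f}")
print(f"owed after the last payment: {future_value(200_000, 0.06, 120, 120):.6f}")
```

Running the recurrence to the last payment should return a value that is zero up to rounding error, which is exactly the condition that was used to derive (4.19).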
For example, a loan of $200,000 for 10 years at a fixed annual interest rate of 6% implies 120 monthly payments of $2,220.41. Indeed, substitute \(x = 0.06/12 = 0.005\), \(n = 120\), and \(P = 200{,}000\) in Equation (4.19); then \(A \approx 2220.41\). The total amount paid after 10 years is \(120 \times A = \$266{,}449.20\). The interest paid is \(nA - P = \$66{,}449.20\).

When selling a car, a dealer might offer a monthly payment for a few years if a customer cannot afford to pay the price in full. In this case, the loan amount \(P\) is the price of the car, and the monthly payment \(A\) and the number of payments \(n\) are known. To assess the deal (or to pick one among a few offered at different dealerships), one has to know the interest rate before signing up. It might be the case that the loan for a higher-quality car, meaning a higher price and higher monthly payments, might have a lower interest rate than the loan for a cheaper car (smaller monthly payments). If \(A\), \(P\), and \(n\) are given, then \(x\) can be found by solving Equation (4.19), which can be written in a more convenient form as
\[ (4.20)\qquad f(x) = Px(1 + x)^n - A(1 + x)^n + A = 0 . \]
In other words, this is the root-finding problem! It can be solved by Newton’s method.

Example 55. A dealer offers a car at a price of $10,000. It can also be sold for payments of $217.42 per month for 5 years. There is another car being offered at a price of $15,000, which can also be sold for payments of $311.38 per month for 5 years. Which loan has a lower interest rate?

Solution: (I) For the first car, one has to find the root of Equation (4.20) if \(A = 217.42\), \(P = 10{,}000\), and \(n = 5 \times 12 = 60\). It is convenient to initiate Newton’s sequence at \(x_1 = 0.01\), which corresponds to an annual interest rate of 12% (i.e., \(I = 0.12\) and \(x = 0.12/12 = 0.01\)). The number \(x\) should be found up to five decimal places, which is sufficient for our purposes. Up to five decimal places, Newton’s method yields \(x = 0.00917\), which corresponds to \(I = 12x = 0.11004\), or an annual interest rate of 11%.
(II) For the second car, one has to find the root of Equation (4.20) if \(A = 311.38\), \(P = 15{,}000\), and \(n = 5 \times 12 = 60\). Newton’s method, initiated again at \(x_1 = 0.01\), yields the root \(x = 0.00750\) (up to five decimal places). This corresponds to an annual interest rate of 9%. So the second loan has a lower interest rate.
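The computation in Example 55 can be reproduced by applying Newton's recurrence to Equation (4.20). The following is a sketch under the example's assumptions; the function name `loan_rate`, the tolerance, and the iteration cap are illustrative choices. The derivative is written out by hand from (4.20). Note that x = 0 is also a root of (4.20), so the initial point should be taken away from 0, as in the example.

```python
def loan_rate(P, A, n, x1=0.01, tol=1e-12, max_iter=100):
    """Monthly interest rate x solving P*x*(1+x)^n - A*(1+x)^n + A = 0 (Equation (4.20))."""
    x = x1
    for _ in range(max_iter):
        growth = (1.0 + x) ** n
        f = (P * x - A) * growth + A
        # Product rule applied to (P*x - A) * (1 + x)^n:
        fprime = P * growth + (P * x - A) * n * (1.0 + x) ** (n - 1)
        x_next = x - f / fprime
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")

x_first = loan_rate(P=10_000, A=217.42, n=60)
x_second = loan_rate(P=15_000, A=311.38, n=60)
print(f"first car:  x = {x_first:.5f}, annual rate = {12 * x_first:.2%}")
print(f"second car: x = {x_second:.5f}, annual rate = {12 * x_second:.2%}")
```

Because the monthly payments in the example are rounded to cents, the computed rates agree with 11% and 9% only to the number of decimal places quoted in the solution.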
It is interesting to note that the car prices differ by 50%, and the monthly payments appear in a similar proportion, \(311.38/217.42 \approx 1.43\). The offers might look like nearly the same deal. In fact, they are not!

30. Antiderivatives
In many practical problems, a function is to be recovered from its derivative. For example, if the velocity is given as a function of time, \(v = v(t)\), one might want to find the position as a function of time, \(s = s(t)\), where \(s'(t) = v(t)\). What is \(s(t)\)?

Definition 35. A function \(F\) is called an antiderivative of \(f\) on an interval \(I\) if \(F'(x) = f(x)\) for all \(x\) in \(I\).

For many basic functions, it is not difficult to find the corresponding antiderivative. For example, from the rule \((x^{n+1})' = (n + 1)x^n\), it follows that if \(f(x) = x^n\), \(n \neq -1\), the antiderivative is \(F(x) = x^{n+1}/(n + 1)\). It has also been proved that \((\ln|x|)' = 1/x\). So the function \(F(x) = \ln|x|\) is the antiderivative of \(f(x) = 1/x\) for all \(x \neq 0\).

30.1. Uniqueness of the Antiderivative.
Suppose \(F'(x) = f(x)\) for all \(x\) in an interval \((a, b)\). Is such an \(F(x)\) unique? This question is answered by Corollary 5 given at the end of Section 23. Indeed, let \(F(x)\) and \(G(x)\) be antiderivatives of \(f(x)\), that is, \(F'(x) = G'(x) = f(x)\) on \((a, b)\). By Corollary 5, \(F\) and \(G\) may only differ by a constant: \(G(x) = F(x) + C\). Thus, any two antiderivatives of the same function may differ at most by a constant on an interval.

Theorem 37. If \(F\) is an antiderivative of \(f\) on an interval \(I\), then the most general antiderivative of \(f\) on \(I\) is \(F(x) + C\), where \(C\) is an arbitrary constant.

For example, the general antiderivative of the power function \(f(x) = x^n\), \(n \neq -1\), is \(F(x) = x^{n+1}/(n + 1) + C\). Similarly, for \(f(x) = 1/x\), it is \(F(x) = \ln|x| + C\). Recall that Corollary 5 does not hold for the union of disjoint intervals.

This nonuniqueness of the antiderivative is not a drawback of the concept but rather a great advantage. This is explained by the following example. The velocity of a piece of chalk thrown vertically upward with a velocity of \(v_0\) is \(v(t) = v_0 - gt\), where \(g = 9.8\ \mathrm{m/s^2}\) is the acceleration of a free fall. At \(t = 0\), the chalk has a velocity of \(v(0) = v_0\). Then it begins to slow down (\(v(t)\) decreases because of gravity). Eventually, at \(t = v_0/g\), the chalk stops and begins to fall back.
If $h(t)$ is the height of the chalk relative to the floor, then $h'(t) = v(t)$, that is, the height is an antiderivative of $v(t)$. It is easy to find a particular antiderivative of $v(t)$ using the antiderivative of the power function: $h(t) = v_0 t - gt^2/2$ (indeed, $h'(t) = v_0 - gt$). What is the physical significance of the general antiderivative $h(t) = C + v_0 t - gt^2/2$? It appears as if the position of the chalk relative to the floor is not uniquely determined. But the chalk could be thrown upward at 1 m above the floor or 2 m above it with the very same initial velocity. In the first case, $h(0) = 1$, whereas in the second case, $h(0) = 2$. So, in both cases, $v(t)$ is the same, while the $h(t)$ are not. Thus, $h(0) = C$ is the height at the very moment when the chalk was thrown upward. In other words, the constant $C$ can be fixed by specifying the value of the antiderivative at a particular point.

For example, find $f(x)$ if $f'(x) = 3x^2$ and $f(2) = 1$. The general antiderivative of $3x^2$ is $f(x) = x^3 + C$. From $f(2) = 1$, it follows that $f(2) = 8 + C = 1$ or $C = -7$. Therefore, $f(x) = x^3 - 7$.

This feature of the general antiderivative can also be visualized by plotting the graphs $y = F(x) + C$ for different values of $C$. All such graphs are obtained from the graph $y = F(x)$ by rigid translations along the $y$ axis. If one demands that the graph $y = F(x) + C$ should pass through a particular point $(x_0, y_0)$, then $C$ is fixed: $y_0 = F(x_0) + C$ or $C = y_0 - F(x_0)$.

30.2. Linearity of the Antiderivative. Let $F$ and $G$ be antiderivatives of $f$ and $g$, respectively. Then an antiderivative of $f + g$ is $F + G$. An antiderivative of $kf$, where $k$ is an arbitrary constant, is $kF$. These properties are easily verified. Indeed, $(F + G)' = F' + G' = f + g$ and $(kF)' = kF' = kf$, where the linearity of the derivative has been used. Thus, antidifferentiation is a linear operation just like differentiation itself.

30.3. Antiderivatives of Basic Functions. An antiderivative of the power function has been found by studying the derivative of the power and logarithmic functions. The idea is useful for other basic functions. Their antiderivatives can be found by reading the table of derivatives of basic functions backward, that is, from the right to left. For example,
\[
(e^x)' = e^x, \quad (\sin x)' = \cos x, \quad (-\cos x)' = \sin x, \quad (\tan x)' = \sec^2 x, \quad (\sin^{-1} x)' = \frac{1}{\sqrt{1 - x^2}}, \quad (\tan^{-1} x)' = \frac{1}{1 + x^2}.
\]
In particular, this table says that the general antiderivative of $f(x) = 1/(1 + x^2)$ is $F(x) = \tan^{-1}(x) + C$.
The table of derivatives of basic functions combined with the linearity of antidifferentiation is a good source of antiderivatives of more complicated functions.

Example 56. Find the general antiderivative of $f(x) = e^{-2x} + \cos(4x) + x^2/(1 + x^2)$.
Solution: (I) By the linearity of the antiderivative, it is sufficient to find antiderivatives of $e^{-2x}$, $\cos(4x)$, and $x^2/(1 + x^2)$. The general antiderivative is obtained by adding a general constant to the sum of the particular antiderivatives of these three functions.
(II) From $(e^{-2x})' = -2e^{-2x}$, it follows that $(-e^{-2x}/2)' = e^{-2x}$. Thus, an antiderivative of $e^{-2x}$ is $-e^{-2x}/2$.
(III) Similarly, from $(\sin(4x))' = 4\cos(4x)$, it follows that an antiderivative of $\cos(4x)$ is $\sin(4x)/4$.
(IV) The table of derivatives does not appear helpful in the case of $x^2/(1 + x^2)$. However, a simple algebraic manipulation leads to the goal:
\[
\frac{x^2}{1 + x^2} = \frac{1 + x^2 - 1}{1 + x^2} = 1 - \frac{1}{1 + x^2}.
\]
So its antiderivative is $x - \tan^{-1}(x)$. Hence, the general antiderivative reads:
\[
F(x) = -\frac{1}{2}e^{-2x} + \frac{1}{4}\sin(4x) + x - \tan^{-1}(x) + C.
\]

30.4. Antiderivatives of Higher Order. What is $F(x)$ if $F''(x) = f(x)$ for a given $f(x)$? Or, more generally, what is $F(x)$ if $F^{(n)}(x) = f(x)$? A function $F$ that satisfies the latter condition is called an antiderivative of $f$ of the $n$th order. To find it, one has to antidifferentiate $f$ $n$ times. For example, let $F''(x) = 6x$. Taking the first antiderivative of $f(x) = 6x$, one gets $F'(x) = 3x^2$. Taking the antiderivative one more time yields $F(x) = x^3$.

What about the uniqueness of higher-order antiderivatives? To find the general antiderivative of a higher order, each time antidifferentiation is carried out, the corresponding general antiderivative must be used. In the preceding example, the general antiderivative of $f(x) = 6x$ is $3x^2 + C_1$, where $C_1$ is an arbitrary constant. Hence, $F'(x) = 3x^2 + C_1$. Its general antiderivative reads $F(x) = x^3 + C_1 x + C_2$, where $C_2$ is another arbitrary constant. Thus, the general second antiderivative can be obtained from a particular one by adding a general function whose second derivative is 0, which is a general linear function: $(C_1 x + C_2)'' = 0$.
Similarly, if $F(x)$ is a particular function that satisfies the condition $F^{(n)}(x) = f(x)$, then the general antiderivative of the $n$th order is
\[
F(x) + C_1 x^{n-1} + C_2 x^{n-2} + \cdots + C_{n-1} x + C_n,
\]
where $C_1, C_2, \ldots, C_n$ are arbitrary constants. Indeed, the $n$th derivative of a polynomial of degree $n - 1$ is 0. Note that this analysis applies only when $f$ is defined in an interval. Why? The following example illustrates the significance of arbitrary constants in general higher-order antiderivatives.

Example 57. A piece of chalk is thrown vertically upward at a speed of 7 m/s and at 1.5 m above the floor. Any free-falling object near the surface of the Earth has the free-fall acceleration of 9.8 m/s$^2$. When does the chalk hit the floor?
Solution: (I) Let $h(t)$ be the height of the chalk relative to the floor. Then its velocity is $v(t) = h'(t)$, and its acceleration is $a(t) = v'(t) = h''(t)$. Since all free-falling objects have an acceleration of 9.8 m/s$^2$, one has $h''(t) = -9.8$. The minus sign indicates that the acceleration is directed downward.
(II) The general second antiderivative of the constant function $-9.8$ is $h(t) = -9.8t^2/2 + C_1 t + C_2$, where $C_1$ and $C_2$ are arbitrary constants.
(III) To fix $C_1$ and $C_2$, the initial conditions of the motion must be used. The initial height is $h(0) = 1.5$. Hence, $h(0) = C_2 = 1.5$. The initial velocity is $v(0) = 7$. Since $v(t) = h'(t) = -9.8t + C_1$, one can infer that $v(0) = C_1 = 7$.
(IV) The height is $h(t) = -9.8t^2/2 + 7t + 1.5$. The chalk hits the floor when its height vanishes, that is, at the time moment $t > 0$ when $h(t) = 0$. The positive root of the quadratic equation $-9.8t^2/2 + 7t + 1.5 = 0$ is $t \approx 1.62$ s. The maximal height reached by the chalk is 4 m. Why?
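The numbers quoted in Example 57 can be checked with a few lines of Python. This is only a sketch: the names `g`, `v0`, `h0`, `height`, `t_hit`, `t_top`, and `h_max` are chosen here for illustration and do not appear in the text. The maximal height is evaluated at the moment the velocity $v(t) = 7 - 9.8t$ vanishes, which is one way to answer the closing "Why?".

```python
import math

# Data of Example 57: free-fall acceleration (m/s^2), initial speed (m/s),
# initial height above the floor (m).
g, v0, h0 = 9.8, 7.0, 1.5

def height(t):
    """h(t) = -g t^2/2 + v0 t + h0, the second antiderivative of -g with
    the constants fixed by h(0) = h0 and h'(0) = v0."""
    return -g * t**2 / 2 + v0 * t + h0

# Positive root of -g t^2/2 + v0 t + h0 = 0 (quadratic formula).
t_hit = (v0 + math.sqrt(v0**2 + 2 * g * h0)) / g

# The velocity v(t) = v0 - g t vanishes at t = v0/g; the height is maximal there.
t_top = v0 / g
h_max = height(t_top)

print(round(t_hit, 2), round(h_max, 2))
```

The printed values can be compared with the 1.62 s and 4 m stated in the example.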
CHAPTER 5
Integration

31. Areas and Distances

Consider the linear function $f(x) = x$. What is the area below the graph $y = f(x)$ and above the interval $0 \le x \le 1$? This question is easy to answer because the area in question is the area of the right triangle with catheti of unit length: $A = 1/2$. Let $f(x) = x^2$. What is the area now? To calculate it, consider a partition of the interval $[0, 1]$ by $n$ segments of length $1/n$. The partition is defined by the set of points $x_0 = 0$, $x_1 = 1/n$, $x_2 = 2/n$, ..., $x_{n-1} = (n-1)/n$, and $x_n = n/n = 1$, that is, $x_k = k/n$, where $k = 0, 1, 2, \ldots, n$. The area under the parabola $y = x^2$ over the interval $[0, 1]$ is the sum of the areas $S_k$ under the parabola over the partition intervals $[x_{k-1}, x_k]$, where $k = 1, 2, \ldots, n$. Thus, $A = S_1 + S_2 + \cdots + S_n$.

The area $S_k$ cannot exceed the area of a rectangle with base $1/n$ and height $f(x_k) = (k/n)^2$. Let us denote this upper bound by $S_k^U = k^2/n^3$. The area $S_k$ is greater than the area of a rectangle with base $1/n$ and height $f(x_{k-1}) = (k-1)^2/n^2$. The lower bound is denoted by $S_k^L = (k-1)^2/n^3$. Thus,
\[
S_k^L = \frac{(k-1)^2}{n^3} < S_k < \frac{k^2}{n^3} = S_k^U.
\]
Therefore, the area $A$ is bounded above by the sum of $S_k^U$ and below by the sum of $S_k^L$:
\[
S_1^L + S_2^L + \cdots + S_n^L = A_n^L \le A \le A_n^U = S_1^U + S_2^U + \cdots + S_n^U
\]
for any number $n$ of partition segments. Let us calculate the difference
\[
0 < S_k^U - S_k^L = \frac{2k - 1}{n^3} \le \frac{2n - 1}{n^3} < \frac{2}{n^2}
\]
for any $k = 1, 2, \ldots, n$; in the second inequality, the condition $k \le n$ has been used. This inequality allows us to estimate the difference $A_n^U - A_n^L$:
\[
0 < A_n^U - A_n^L = (S_1^U - S_1^L) + (S_2^U - S_2^L) + \cdots + (S_n^U - S_n^L) < n\,\frac{2}{n^2} = \frac{2}{n}.
\]
Thus, $A_n^L \le A \le A_n^U$ for any $n$. From a geometrical point of view, when $n$ gets larger, the area $A_n^U$ approaches $A$ from above while $A_n^L$ does so from below. For $n$ large enough, both $A_n^U$ and $A_n^L$ may serve as a good approximation of $A$; the error of either of the approximations does not exceed $2/n$ because $0 < A_n^U - A_n^L < 2/n$ and $A_n^L \le A \le A_n^U$. In fact, if the limit $\lim_{n\to\infty} A_n^U$ exists, then $\lim_{n\to\infty} A_n^U = \lim_{n\to\infty} A_n^L$ because $0 < A_n^U - A_n^L < 2/n \to 0$ as $n \to \infty$. Taking the limit $n \to \infty$ in the inequality $A_n^L \le A \le A_n^U$ yields
\[
\lim_{n\to\infty} A_n^L = A = \lim_{n\to\infty} A_n^U.
\]
It appears that the limit $\lim_{n\to\infty} A_n^U$ can actually be calculated by means of the formula for the sum of squares of the first $n$ positive integers:
\[
1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} = \frac{n}{6}\,(2n^2 + 3n + 1). \tag{5.1}
\]
Indeed, by making use of this formula, one can infer that
\[
A_n^U = \frac{1}{n^3}\,(1^2 + 2^2 + \cdots + n^2) = \frac{2n^2 + 3n + 1}{6n^2} = \frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2} \to \frac{1}{3}
\]
as $n \to \infty$. So the area is $A = 1/3$.

The area could have been approximated by, for example, $A_n^*$ with the sample points as the midpoints $x_k^* = (x_k + x_{k-1})/2$, or any other convenient choice. Let $x_k^*$ be a number in the interval $[x_{k-1}, x_k]$. Then the area $S_k$ can also be approximated by the area $S_k^*$ of a rectangle with base $1/n$ and height $f(x_k^*) = (x_k^*)^2$, that is, $S_k^* = f(x_k^*)/n$. Then the total area under the graph is approximated by the sum $A_n^*$ of all $S_k^*$. Since $S_k^L \le S_k^* \le S_k^U$ (owing to the monotonicity of the function $x^2$ in each interval $[x_{k-1}, x_k]$), the following inequality holds for any $n$:
\[
A_n^L \le S_1^* + S_2^* + \cdots + S_n^* = A_n^* \le A_n^U.
\]
Taking the limit $n \to \infty$ in this inequality leads to a remarkable result:
\[
\lim_{n\to\infty} A_n^* = A,
\]
that is, the limit of $A_n^*$ does not depend on the choice of sample points $x_k^*$.

31.1. The Area Under the Graph of a Continuous Function. This analysis can be extended to any continuous function. Let $f(x)$ be continuous on $[a, b]$. Consider a partition of $[a, b]$ by $n$ segments of length $\Delta x = (b - a)/n$. The endpoints of the partition segments are $x_k = a + k\,\Delta x$ with $k = 0, 1, 2, \ldots, n$, such that $x_0 = a$ and $x_n = b$. Let $x_k^*$ be a sample point in the interval $[x_{k-1}, x_k]$.
Definition 36. The area $A$ of the region that lies under the graph of a continuous function $f(x) \ge 0$ on an interval $[a, b]$ is
\[
A = \lim_{n\to\infty} A_n^* = \lim_{n\to\infty} \left[ f(x_1^*)\,\Delta x + f(x_2^*)\,\Delta x + \cdots + f(x_n^*)\,\Delta x \right] \tag{5.2}
\]
for any choice of sample points $x_k^*$.

Let us assess this definition. A nontrivial fact is that Equation (5.2) holds for any choice of $x_k^*$. Continuing the analogy with the example of $f(x) = x^2$, the limit (5.2) is independent of the choice of sample points if the lower sums $A_n^L = S_1^L + \cdots + S_n^L$ and the upper sums $A_n^U = S_1^U + \cdots + S_n^U$ converge to the same number as $n \to \infty$. Indeed, any continuous function attains its maximal and minimal values on a closed interval. Let $M_k$ and $m_k$ be the maximal and minimal values of $f(x)$ on the interval $[x_{k-1}, x_k]$. If $S_k$ is the area under the graph $y = f(x)$ on the interval $[x_{k-1}, x_k]$, then $S_k^L = m_k\,\Delta x \le S_k \le S_k^U = M_k\,\Delta x$. So, if $A_n^L$ and $A_n^U$ converge to the same number, then this number must be the area under the graph. On the other hand, $A_n^L \le A_n^* \le A_n^U$ for any choice of $x_k^*$ and for any $n$. Therefore, $\lim_{n\to\infty} A_n^* = A$ independently of the choice of sample points.

The area $S_k^* = f(x_k^*)\,\Delta x$ is a continuous function of $x_k^*$ on the interval $[x_{k-1}, x_k]$. Therefore, $S_k^*$ must take all the values between its minimal value $S_k^L$ and its maximal value $S_k^U$. In particular, $S_k = S_k^*$ for some $x_k^*$. Thus, for any $n$, one can always find a set of sample points for which Equation (5.2) gives the area under the graph of a continuous function, that is, there is a choice of sample points such that $A_n^* = A$.

It will be shown in the next section that for continuous functions the limits of $A_n^L$ and $A_n^U$ exist and coincide. From the inequality $A_n^L \le A \le A_n^U$ it would then follow that $\lim_{n\to\infty} A_n^L = \lim_{n\to\infty} A_n^U = A$. This fact justifies the previous definition.

In practice, Equation (5.2) can be used to find the area under the graph that is correct to any desired number of decimal places. Take a partition of the interval $[a, b]$, that is, fix some $n$ and $\Delta x$. Set sample points $x_k^*$. Convenient choices might be the left points $x_k^* = x_{k-1}$, the right points $x_k^* = x_k$, or the midpoints $x_k^* = (x_{k-1} + x_k)/2$. Calculate the sum $A_n^*$, keeping the desired number of decimal places. Refine the partition by, for example, doubling the number of segments, $n \to 2n$, and calculate $A_{2n}^*$. If $A_n^*$ and $A_{2n}^*$ coincide in the desired number of decimal places, then $A = A_{2n}^*$ is correct to that number of decimal places. If not, refine the partition further, compute $A_{4n}^*$, compare it with $A_{2n}^*$, and so on, until the needed accuracy is reached.
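The refinement procedure for Equation (5.2) can be sketched in Python as follows. The function names `riemann_sum` and `area_by_doubling`, the starting value `n=4`, and the cap `max_doublings` are assumptions made for this illustration; the stopping rule is the one described in the text (two successive sums agree after rounding to the desired number of decimal places), which is a practical criterion rather than a rigorous error bound.

```python
def riemann_sum(f, a, b, n, rule="mid"):
    """Sum f(x_k*) * dx over n equal segments of [a, b].

    rule selects the sample points: "left", "right", or "mid".
    """
    dx = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        left, right = a + (k - 1) * dx, a + k * dx
        if rule == "left":
            x_star = left
        elif rule == "right":
            x_star = right
        else:
            x_star = (left + right) / 2
        total += f(x_star) * dx
    return total

def area_by_doubling(f, a, b, places=4, n=4, max_doublings=20):
    """Double n until two successive sums agree to `places` decimal places."""
    prev = riemann_sum(f, a, b, n)
    for _ in range(max_doublings):
        n *= 2
        cur = riemann_sum(f, a, b, n)
        if round(prev, places) == round(cur, places):
            return cur, n
        prev = cur
    raise RuntimeError("desired accuracy not reached")

# Area under y = x^2 over [0, 1]; the exact value found above is 1/3.
area, n_used = area_by_doubling(lambda x: x**2, 0.0, 1.0)
print(area, n_used)
```

For $f(x) = x^2$ on $[0, 1]$ the returned value can be compared with the exact area $1/3$.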
31.2. Sigma Notation for Sums. To avoid writing lengthy expressions for sums of an arbitrary number of terms, it is convenient to adopt the following notation:
\[
A_n^* = S_1^* + S_2^* + \cdots + S_n^* = \sum_{k=1}^{n} S_k^* ,
\]
where the index $k$ is called the summation index. The symbol $\sum$ means adding all $S_k^*$, starting with $k = 1$ up to $k = n$. For example, the geometric sum formula can now be written as
\[
1 + q + q^2 + \cdots + q^n = \sum_{k=0}^{n} q^k = \frac{q^{n+1} - 1}{q - 1}. \tag{5.3}
\]
31.3. The Distance Problem. If an object moves with a constant velocity $v$ during a time interval $a \le t \le b$, then the distance traveled by the object is $D = v(b - a)$. How does one calculate the distance if the speed is a nonconstant function of time $v = v(t) \ge 0$? If $D(t)$ is the distance as a function of time, then $v(t) = D'(t)$. The function $D(t)$ is an antiderivative of $v(t)$. If it is known, then the distance traveled is $D(b) - D(a)$. It turns out that the problem can also be solved by finding the area under the graph of $v(t)$! This implies that there is a relation between an antiderivative of a function and the area under its graph.

The time interval $[a, b]$ can be partitioned into $n$ time intervals of length $\Delta t = (b - a)/n$. Let $t_k = a + k\,\Delta t$, $k = 0, 1, \ldots, n$, be the endpoints of the partition intervals. The distance $\Delta D_k = D(t_k) - D(t_{k-1})$ traveled by the object in the time interval $[t_{k-1}, t_k]$ can be found by the mean value theorem:
\[
D(t_k) - D(t_{k-1}) = v(t_k^*)\,\Delta t \quad \text{for some } t_k^* \text{ in } [t_{k-1}, t_k].
\]
Recall that the value $v(t_k^*)$ is the average velocity over the time interval $[t_{k-1}, t_k]$. Thus, the total distance is $D = \Delta D_1 + \cdots + \Delta D_n$. In a plot showing the graph of $v(t)$, the distance $\Delta D_k = v(t_k^*)\,\Delta t$ is nothing but the area under the graph over the interval $[t_{k-1}, t_k]$. Recall that one can always find a particular sample point $t_k^*$ such that this area coincides with the area of a rectangle $v(t_k^*)\,\Delta t$. Therefore, $D$ is the area under the graph of $v(t)$. This implies, in particular, that in order to calculate $D$, any sample points $t_k^*$ can be used, not necessarily those at which $v$ coincides with the average velocity in each partition interval. But there is a price for this generalization. Namely, the limit (5.2) must be computed:
\[
D = \lim_{n\to\infty} \sum_{k=1}^{n} v(t_k^*)\,\Delta t ,
\]
where $t_k^*$ is any set of sample points.
Example 58. A moving object slows down so that its velocity is $v(t) = e^{-2t}$. What is the distance traveled by the object during the time interval $0 \le t \le 1$?
Solution: Let $\Delta t = 1/n$ so that $t_k = k/n$, $k = 0, 1, \ldots, n$. Take $t_k^* = (k - 1)/n$, $k = 1, 2, \ldots, n$ (the left points of partition intervals). Then $\Delta D_k = v(t_k^*)\,\Delta t = q^{k-1}/n$, where $q = e^{-2/n}$. The distance traveled is
\[
D = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} q^k = \lim_{n\to\infty} \frac{1}{n}\,\frac{q^n - 1}{q - 1} = \frac{1 - e^{-2}}{\lim_{n\to\infty} n\,(1 - e^{-2/n})},
\]
where the sum formula (5.3) has been used. To compute the limit in the denominator, let $x = 1/n$, so that $x \to 0$ as $n \to \infty$. The limit becomes the indeterminate form $(1 - e^{-2x})/x$ of type 0/0, which can be resolved by l'Hospital's rule: $(1 - e^{-2x})'/(x)' = 2e^{-2x}/1 \to 2$ as $x \to 0$. Thus, the distance traveled is $D = (1 - e^{-2})/2$.

Remark. An alternative solution uses the general antiderivative of $e^{-2t}$: $D(t) = -e^{-2t}/2 + C$. The distance traveled is $D(1) - D(0) = (1 - e^{-2})/2$. Note that the result is independent of the arbitrary constant $C$. When compared to the previous solution, this one looks like cheating! More to the point, take $v(t) = t^2$ (the example discussed at the beginning of this section). Its antiderivative is $D(t) = t^3/3 + C$. So the distance traveled, or the area under the graph of $t^2$, is $D(1) - D(0) = 1/3$. It is so simple, isn't it? Thus, the concept of the antiderivative and its relation to the area (5.2) must be further investigated.

32. The Definite Integral

A generalization of the concept of the area under a graph leads to one of the most fundamental concepts in calculus, the definite integral. Let $f$ be a function on an interval $[a, b]$. Consider a partition of $[a, b]$ by $n$ intervals of length $\Delta x = (b - a)/n$ so that the endpoints of the partition intervals are $x_k = a + k\,\Delta x$, $k = 0, 1, \ldots, n$. Let $I_k = [x_{k-1}, x_k]$, $k = 1, 2, \ldots, n$, denote the partition intervals. Let $M_k = \max_{I_k} f(x)$ (the maximal value of $f(x)$ in the interval $I_k$) and $m_k = \min_{I_k} f(x)$ (the minimal value of $f(x)$ in the interval $I_k$). By analogy with the area under a graph, define the lower sum $A_n^L$ and the upper sum $A_n^U$ for $f$ by
\[
A_n^L = \sum_{k=1}^{n} m_k\,\Delta x , \qquad A_n^U = \sum_{k=1}^{n} M_k\,\Delta x
\]
for every partition of $[a, b]$.
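The lower and upper sums can be computed directly for the function $e^{-2x}$ of Example 58. The sketch below assumes that $f$ is monotone on $[a, b]$, so that the minimal and maximal values on each partition interval are attained at its endpoints; this shortcut is not valid for a general function. The names `lower_upper_sums`, `v`, and `exact` are illustrative only.

```python
import math

def lower_upper_sums(f, a, b, n):
    """Lower and upper sums of f over n equal segments of [a, b].

    Assumption: f is monotone on [a, b], so m_k and M_k are the smaller
    and the larger of the two endpoint values on each segment.
    """
    dx = (b - a) / n
    lower = upper = 0.0
    for k in range(1, n + 1):
        y_left, y_right = f(a + (k - 1) * dx), f(a + k * dx)
        lower += min(y_left, y_right) * dx
        upper += max(y_left, y_right) * dx
    return lower, upper

v = lambda t: math.exp(-2 * t)        # the velocity of Example 58
exact = (1 - math.exp(-2)) / 2        # the distance found in Example 58

for n in (10, 100, 1000):
    lo, up = lower_upper_sums(v, 0.0, 1.0, n)
    # The two sums bracket the exact value, and the gap shrinks like 1/n.
    print(n, lo, up, lo <= exact <= up)
```

For a monotone function the gap between the two sums is $|f(b) - f(a)|\,\Delta x$, which makes the convergence of both sequences to a common limit easy to see in this case.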
Definition 37 (The Definite Integral). A function $f$ on an interval $[a, b]$ is said to be integrable if the sequences of its lower and upper sums converge to the same number. This number is called the definite integral of $f$ from $a$ to $b$ and is denoted by
\[
\int_a^b f(x)\,dx = \lim_{n\to\infty} A_n^L = \lim_{n\to\infty} A_n^U ;
\]
the numbers $a$ and $b$ are called the lower and upper integration limits, respectively, and the function $f$ is called the integrand.

Evidently, for a continuous and nonnegative $f$ on $[a, b]$, the definite integral coincides with the area under the graph of $f$. The geometrical significance of the definite integral in general will be discussed later after establishing its basic properties.

Let $f$ be integrable on $[a, b]$ and let $x_k^*$ be a sample point in $I_k$ for each $k = 1, 2, \ldots, n$. For any number $\varepsilon > 0$, there exists an integer $N$ such that
\[
\left| \int_a^b f(x)\,dx - \sum_{k=1}^{n} f(x_k^*)\,\Delta x \right| < \varepsilon
\]
for every integer $n > N$ and for every choice of $x_k^*$ in $I_k$. Indeed, for any $x_k^*$, $m_k \le f(x_k^*) \le M_k$, and therefore
\[
A_n^L \le \sum_{k=1}^{n} f(x_k^*)\,\Delta x \le A_n^U .
\]
By the squeeze principle, the sequence of the sums in the middle of this inequality must converge to the definite integral of $f$ from $a$ to $b$ as $n \to \infty$:
\[
\lim_{n\to\infty} \sum_{k=1}^{n} f(x_k^*)\,\Delta x = \int_a^b f(x)\,dx . \tag{5.4}
\]
Hence, by the definition of the limit, no matter how small $\varepsilon$ is, there is always a large enough integer $N$ such that the deviation of the values of the sequence elements from the limit does not exceed $\varepsilon$ for all $n > N$. The sum in (5.4) is called the Riemann sum after the German mathematician Bernhard Riemann (1826–1866). It follows from the preceding analysis that the sequence of Riemann sums for an integrable function converges to the definite integral. Since the limit is independent of the choice of sample points, any choice convenient to calculate the limit (5.4) can be made.
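The independence of the limit (5.4) from the choice of sample points can be illustrated numerically. The sketch below compares left, right, midpoint, and randomly chosen sample points for $f(x) = x^2$ on $[0, 1]$, whose integral is $1/3$. The helper `riemann_sum`, the dictionary `rules`, and the fixed random seed are assumptions made for this illustration.

```python
import random

def riemann_sum(f, a, b, n, pick):
    """Riemann sum over n equal segments; pick(lo, hi) returns a sample
    point x_k* inside the segment [lo, hi]."""
    dx = (b - a) / n
    return sum(f(pick(a + (k - 1) * dx, a + k * dx)) * dx
               for k in range(1, n + 1))

rules = {
    "left": lambda lo, hi: lo,
    "right": lambda lo, hi: hi,
    "mid": lambda lo, hi: (lo + hi) / 2,
    "random": lambda lo, hi: random.uniform(lo, hi),
}

random.seed(0)  # fixed seed so that the random rule is reproducible
f = lambda x: x**2

for name, pick in rules.items():
    sums = [riemann_sum(f, 0.0, 1.0, n, pick) for n in (10, 100, 1000)]
    print(name, sums)
```

Every rule produces sums trapped between the lower and upper sums, so all four columns approach the same number as $n$ grows.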
32.1. Continuity and Integrability. The relation (5.4) holds and can be used to calculate the definite integral, provided the function $f$ is integrable. The question of integrability requires investigating the convergence of the sequences of the upper and lower sums, which might be a tedious task even for such simple functions as, for example, $f(x) = x^2$, as discussed in the previous section. The following theorem is helpful when studying the question of integrability.
Theorem 38. If $f$ is continuous on $[a, b]$, or if $f$ has only a finite number of jump discontinuities, then $f$ is integrable on $[a, b]$; that is, the definite integral $\int_a^b f(x)\,dx$ exists.

This theorem justifies the definition of the area under the graph of a continuous function introduced in the previous section.

Let $f(x)$ be defined on $[0, 1]$ such that $f(x) = 1$ if $x$ is a rational number, and $f(x) = 0$ otherwise (i.e., if $x$ is irrational). The function is not continuous anywhere in $[0, 1]$. For example, $f(1/2) = 1$, but when $x$ approaches $1/2$, the value $f(x)$ keeps jumping from 0 to 1 and back, no matter how close $x$ is to $1/2$ because, for any $\delta > 0$, the interval $(\tfrac{1}{2} - \delta, \tfrac{1}{2} + \delta)$ always contains both rational and irrational numbers. This function gives an example of a nonintegrable function. Indeed, take a partition $x_k = k/n$, $k = 0, 1, \ldots, n$. Any partition interval $[(k - 1)/n, k/n]$ contains both rational and irrational numbers. Therefore, $m_k = 0$ and $M_k = 1$. Hence, the lower sum vanishes for any partition, $A_n^L = 0$, whereas the upper sum is $A_n^U = \sum_{k=1}^{n} \Delta x = 1$, that is, $\lim_{n\to\infty} A_n^L = 0$ while $\lim_{n\to\infty} A_n^U = 1$. The function is not integrable. The integral does not exist. Note that the Riemann sum can still be defined, but its limit would depend on the choice of sample points (e.g., take $x_k^*$ to be rational numbers or take $x_k^*$ to be irrational numbers; both options are possible since any partition interval always contains rational and irrational numbers).
32.2. Properties of the Definite Integral. Suppose $f(x) = c$, where $c$ is a constant. In this case, for any partition interval $I_k$, $M_k = m_k = c$ and $A_n^U = A_n^L = c \sum_{k=1}^{n} \Delta x = cn\,\Delta x = c(b - a)$. In other words, a constant function is integrable and its integral is $c(b - a)$:
\[
\int_a^b c\,dx = c(b - a). \tag{5.5}
\]
For any two integrable functions $f(x)$ and $g(x)$ and constants $c_1$ and $c_2$, it follows from the convergence of the Riemann sums (5.4) for $f$ and $g$ that
\[
\begin{aligned}
\int_a^b [c_1 f(x) + c_2 g(x)]\,dx &= \lim_{n\to\infty} \sum_{k=1}^{n} [c_1 f(x_k^*) + c_2 g(x_k^*)]\,\Delta x \\
&= c_1 \lim_{n\to\infty} \sum_{k=1}^{n} f(x_k^*)\,\Delta x + c_2 \lim_{n\to\infty} \sum_{k=1}^{n} g(x_k^*)\,\Delta x \\
&= c_1 \int_a^b f(x)\,dx + c_2 \int_a^b g(x)\,dx .
\end{aligned} \tag{5.6}
\]
In particular, the integral of the sum of two functions is the sum of their integrals. The integral of a function multiplied by a constant is the product of the constant and the integral of the function. So the integration is a linear operation.

If the integration limits are reversed, then $\Delta x$ changes its sign: $\Delta x = (b - a)/n \to (a - b)/n$. Therefore,
\[
\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx \tag{5.7}
\]
and, in particular, $\int_a^a f(x)\,dx = 0$. It can be proved that
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \tag{5.8}
\]
for $f$ integrable on $[a, b]$ and any $a \le c \le b$. The proof is rather technical and is omitted. If $f$ is continuous and positive on $[a, b]$, then the property (5.8) is trivial: The area under the graph of $f$ on $[a, b]$ is the sum of the areas under the graph of $f$ on $[a, c]$ and $[c, b]$.

32.3. Geometrical Significance of the Definite Integral. As already noted, the definite integral of $f$ from $a$ to $b$ coincides with the area under the graph of $f$ for a continuous and positive $f$. Suppose $f$ is continuous and negative on $[a, b]$. Consider the function $g(x) = -f(x)$. The integral of $g$ is the area $A$ under the graph of $g$. But the area $A$ coincides with the area above the graph of $f$ and below the $x$ axis. By the linearity of the integral, $\int_a^b f(x)\,dx = -\int_a^b g(x)\,dx = -A$. So, for a negative $f$, the integral of $f$ coincides with the negative area of the region bounded below by the graph of $f$ and above by the $x$ axis. Now let $f$ be continuous on $[a, b]$. Let it be positive on $[a, c]$ and negative on $[c, b]$, that is, $f(c) = 0$. Then it follows from the property (5.8) that
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = A_1 - A_2 ,
\]
where $A_1$ is the area under the graph of $f$ on $[a, c]$ and $A_2$ is the area above the graph of $f$ on $[c, b]$.

32.4. Comparison Properties of the Integral. The following additional properties of the definite integral can be established:
\[
\int_a^b f(x)\,dx \ge 0 \quad \text{if } f(x) \ge 0 \text{ in } [a, b], \tag{5.9}
\]
\[
\int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx \quad \text{if } f(x) \ge g(x) \text{ in } [a, b], \tag{5.10}
\]
\[
m(b - a) \le \int_a^b f(x)\,dx \le M(b - a) \quad \text{if } m \le f(x) \le M \text{ in } [a, b]. \tag{5.11}
\]
If $f$ is continuous, the property (5.9) states the obvious that the area under the graph of $f$ is nonnegative. In general, the property (5.9) follows directly from the definition: $0 \le m_k \le M_k$ for any partition if $f(x) \ge 0$. Hence, the upper and lower sums are nonnegative and so must be the integral. The property (5.10) follows from (5.9) for the function $f(x) - g(x) \ge 0$ and the linearity of the integral (5.6). The property (5.11) is also a consequence of the definition. Indeed, $m \le m_k \le M_k \le M$ for any partition. Hence, $m(b - a) \le A_n^L \le A_n^U \le M(b - a)$ for any $n$. In the limit $n \to \infty$, this inequality turns into (5.11).

32.5. Evaluation of the Integral by the Riemann Sum. If the integral exists ($f$ is integrable), then it can be evaluated as the limit of the Riemann sum (5.4). The limit is independent of the choice of sample points. The following choices are often used in practice:
\[
x_k^* = x_{k-1} \ \text{(the left-point rule)}, \qquad x_k^* = x_k \ \text{(the right-point rule)}, \qquad x_k^* = (x_{k-1} + x_k)/2 \ \text{(the midpoint rule)}.
\]
The evaluation of the Riemann sum is rather technical. Formulas like (5.1), (5.3), and
\[
\sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad \sum_{k=1}^{n} k^3 = \left[ \frac{n(n+1)}{2} \right]^2 , \tag{5.12}
\]
in combination with the basic properties of the integral, can be helpful.
However, similarly to approximate calculations of the area discussed in the previous section, the Riemann sum is mostly used to calculate the integral approximately with some designated accuracy by means of computer simulations.

Example 59. Find the definite integral of $f(x) = e^{-2x} - 2x^2 + 4x^3$ from 0 to 1.
Solution: (I) The function is continuous on $[0, 1]$ and hence integrable.
(II) By the linearity of the integral,
\[
\int_0^1 f(x)\,dx = \int_0^1 e^{-2x}\,dx - 2\int_0^1 x^2\,dx + 4\int_0^1 x^3\,dx .
\]
The first integral is $(1 - e^{-2})/2$ by Example 58 (where the area under the graph of $e^{-2x}$ in $[0, 1]$ was calculated). The area under the graph of $x^2$ in $[0, 1]$ can be found at the beginning of the previous section and is equal to $1/3$. The area under the graph of $x^3$ can be found with the help of the sum-of-cubes relation in (5.12). Equation (5.4) applies for any choice of $x_k^*$. The left-point rule will be used. Let $\Delta x = 1/n$ and $x_k^* = (k - 1)/n$; then the Riemann sum (5.4) becomes
\[
\int_0^1 x^3\,dx = \lim_{n\to\infty} \frac{1}{n^4} \sum_{k=1}^{n} (k - 1)^3 = \lim_{n\to\infty} \frac{1}{n^4}\,\frac{n^2 (n - 1)^2}{4} = \frac{1}{4} .
\]
(III) Thus,
\[
\int_0^1 f(x)\,dx = \frac{1 - e^{-2}}{2} - \frac{2}{3} + 4 \cdot \frac{1}{4} = \frac{5 - 3e^{-2}}{6} .
\]

33. The Fundamental Theorem of Calculus

In this section, the relation between the definite integral of a function and its antiderivative will be established. This relation provides a powerful method for calculating the definite integral that avoids the use of the Riemann sum (5.4).

33.1. Integration and Differentiation. Consider the definite integral of $f(t) = t$ from 0 to $x$ for some $x > 0$.
This integral represents the area under the graph of $f(t) = t$ in the interval $[0, x]$, which is the area of a right triangle with catheti of length $x$:
\[
A(x) = \int_0^x t\,dt = \frac{x^2}{2} .
\]
The area $A(x)$ can be viewed as a function of the variable $x$. This function has an interesting property: $A'(x) = x = f(x)$, that is, the derivative of the integral with respect to its upper limit is again the value of the integrand at the upper limit. Recall that if $v(t) \ge 0$ is the speed of a moving object, then the distance traveled by the object in time $T$ is given by the area under the graph of $v(t)$, that is,
\[
s(T) = \int_0^T v(t)\,dt .
\]
On the other hand, the speed is the rate of change of $s(T)$, and therefore there should be $s'(T) = v(T)$. In other words, the derivative of the definite integral with respect to its upper limit equals the value of the integrand at the upper limit. How general is this property? Does it hold for all integrable functions? The following theorem answers these questions.

Theorem 39. If $f$ is continuous on $[a, b]$, then the function defined by
\[
g(x) = \int_a^x f(t)\,dt , \qquad a \le x \le b ,
\]
is continuous on $[a, b]$ and differentiable on $(a, b)$, and $g'(x) = f(x)$.

Proof. By the definition of the derivative, one has to prove that
\[
\lim_{h\to 0} \frac{g(x + h) - g(x)}{h} = f(x) \tag{5.13}
\]
for $a < x < b$. The ratio in the limit can be transformed as follows:
\[
\frac{g(x + h) - g(x)}{h} = \frac{1}{h} \left[ \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \right] = \frac{1}{h} \left[ \int_a^x f(t)\,dt + \int_x^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \right] = \frac{1}{h} \int_x^{x+h} f(t)\,dt ,
\]
where the property (5.8) has been used. Note that since $a < x < b$ (i.e., $x \ne a$ and $x \ne b$), for a sufficiently small $h \ne 0$, both $x$ and $x + h$ ($h$ can be positive or negative) always lie in the interval $(a, b)$ so that the interval between $x$ and $x + h$ is contained in $(a, b)$.
Take. Then m ≤ f (t) ≤ M for x ≤ t ≤ x + h and.15) can be established for h < 0 in a similar manner. h→0 h→0 h 2 This theorem basically states that if a continuous function is ﬁrst integrated and then diﬀerentiated. the sign is reversed. Indeed. h Since u and v lie in the interval [x. The continuity of f on [a. For x ≥ 1.15) f (u) ≤ 1 h x+h f (t) dt ≤ f (v) x for some u and v in [x. Let a = 0.16) d dx x f (t) dt = f (x) . then it remains unchanged: (5. g(x) = 0 f (t) dt = 0 if x < 1. a x a < x < b. where v and u are in [x.122 5. In other words. h→0 lim f (v) = f (x) . by the property (5.11). f (t) = 0 if x < 1 and f (t) = 1 if x ≥ 1. x + h]. the function f attains its absolute maximal and minimal values in [x. . By the x property (5.7).15). Inequality (5. f (u) ≤ h→0 lim f (u) = f (x) .14) holds for the x integral x+h f (t) dt. x + h]. respectively.15) is obtained but with the minus sign at the integral. x + h]. x + h] is contained in (a. b). b). yielding (5. Thus. Let M = f (v) and m = f (u) be the absolute maximal and minimal values.14) mh = f (u)h ≤ x f (t) dt ≤ f (v)h = M h . Suppose that h > 0. inequality (5. x + h]. x+h (5. by dividing this inequality by h. x + h].5). g(x + h) − g(x) ≤ f (v) . By the continuity of f (t) on the interval [x. one can infer that (5. After dividing it by −h > 0. INTEGRATION the interval [x. Since h > 0. Then the relation (5.13) follows from the squeeze principle: f (x) = lim f (u) ≤ lim h→0 g(x + h) − g(x) ≤ lim f (v) = f (x) . b] is essential for this relation to hold. F (x) = a f (t) dt is an antiderivative of f (x) in an open interval (a. Then f has a jump discontinuity at t = 1. one has x 1 x g(x) = 0 f (t) dt = 0 f (t) dt+ 1 f (t) dt = 0+(x−1) = x−1 . inequality (5. for example. By the property (5. x ≥ 1 .
Therefore. Solution: The function e−t is a continuous function everywhere as a composition of two continuous functions. 2 This example is a particular case of the general property d dx for a continuous f . THE FUNDAMENTAL THEOREM OF CALCULUS 123 Therefore. then b f (x) dx = F (b) − F (a) . Let g(x) = a f (t) dt. 33. Therefore. x Find g (x). a where F is any antiderivative of f . b] because limx→a+ g(x) = g(a) = 0 and limx→b− g(x) = g(b). b x Also. b].7). g(x) = − b e−t dt. The fundamental theorem of calculus provides a powerful analytic tool to evaluate deﬁnite integrals. the function g(x) is an antiderivative of f (x) in an open interval (a. The Deﬁnite Integral and Antiderivative. b F (b) − F (a) = g(b) + C − g(a) + C = g(b) = a f (t) dt . b] (as the sum of two continuous functions). the exponential and power 2 x functions. By (5. F (x) is also continuous on [a. . g (x) = 0 if x < 1 and g (x) = 1 if x > 1. a function such that F = f .16). The following theorem establishes the relation between the deﬁnite integral of a function and its antiderivative.16). by the deﬁnition of g(x). Let g(x) = 2 b −t2 e dt. g(a) = 0 and g(b) = a f (t) dt.2. Proof. a < x < b. But g (1) does not exist. b).33. b f (t) dt = − x d dx x f (t) dt = −f (x) b Theorem 40. 2 The proof is complete. By the property (5. (The Fundamental Theorem of Calculus). F (x) = g(x) + C . then F and g may diﬀer only by a constant. The function g(x) is continuous on [a. If F is any other antiderivative of f . Example 60. that is. If f is continuous on [a. Hence. 2 g (x) = −e−x by (5.
Example 61. Evaluate $\int_0^1 (1 + x^2)^{-1}\,dx$.

Solution: An antiderivative of $(1 + x^2)^{-1}$ is $F(x) = \tan^{-1}(x)$. Therefore,

$$ \int_0^1 \frac{dx}{1 + x^2} = \tan^{-1}(1) - \tan^{-1}(0) = \frac{\pi}{4} - 0 = \frac{\pi}{4} . $$

□

Example 62. Evaluate $\int_1^4 (1 + x)/\sqrt{x}\,dx$.

Solution: By the linearity of the integral,

$$ \int_1^4 \frac{1 + x}{\sqrt{x}}\,dx = \int_1^4 \bigl(x^{-1/2} + x^{1/2}\bigr)\,dx = \int_1^4 x^{-1/2}\,dx + \int_1^4 x^{1/2}\,dx . $$

An antiderivative of $x^n$ is $x^{n+1}/(n+1)$ for any real $n \ne -1$. By taking $n = -1/2$ and $n = 1/2$, an antiderivative is obtained: $F(x) = 2x^{1/2} + 2x^{3/2}/3$. Hence,

$$ \int_1^4 \frac{1 + x}{\sqrt{x}}\,dx = F(4) - F(1) = \Bigl(4 + \frac{16}{3}\Bigr) - \Bigl(2 + \frac{2}{3}\Bigr) = \frac{20}{3} . $$

□

34. Indefinite Integrals and the Net Change

As has been shown in the previous section, the derivative of the definite integral of a continuous function $f$ with respect to the upper limit equals the value of $f$ at the upper limit. So integration and differentiation appear as operations inverse to one another. To further stress this relation between the integration and differentiation, the notion of an indefinite integral is introduced.

Definition 38 (Indefinite Integral). The function $F$ is called an indefinite integral of $f$ and is denoted by

$$ F(x) = \int f(x)\,dx \quad \text{if} \quad F'(x) = f(x) . $$

It follows from this definition that an indefinite integral is nothing but the general antiderivative of $f$. The reason for introducing the integral symbol into the antiderivative notation is the fundamental theorem of calculus:

$$ \int_a^b f(x)\,dx = F(b) - F(a) , $$

where $F$ is any antiderivative of $f$. Since all antiderivatives differ only by a constant, which is always cancelled out in the difference $F(b) - F(a)$, the definite integral is the difference in values of the indefinite integral at the upper and lower limits of the definite integral. The
indefinite integral has the same properties as the antiderivative. It is linear:

$$ \int \bigl(c_1 f(x) + c_2 g(x)\bigr)\,dx = c_1 \int f(x)\,dx + c_2 \int g(x)\,dx \tag{5.17} $$

for any constants $c_1$ and $c_2$ and any functions $f$ and $g$. Using the table of antiderivatives of basic functions, one can make a table of indefinite integrals of basic functions. Let $C$ be an arbitrary constant. Then it is easy to verify the following relationships:

$$ \int x^n\,dx = \frac{x^{n+1}}{n+1} + C , \quad n \ne -1 ; \qquad \int \frac{1}{x}\,dx = \ln|x| + C ; $$

$$ \int e^x\,dx = e^x + C ; \qquad \int a^x\,dx = \frac{a^x}{\ln(a)} + C , \quad a > 0 ,\ a \ne 1 ; $$

$$ \int \sin(ax)\,dx = -\frac{\cos(ax)}{a} + C ; \qquad \int \cos(ax)\,dx = \frac{\sin(ax)}{a} + C ; $$

$$ \int \sec^2(ax)\,dx = \frac{\tan(ax)}{a} + C ; \qquad \int \csc^2(ax)\,dx = -\frac{\cot(ax)}{a} + C ; $$

$$ \int \sec(x)\tan(x)\,dx = \sec(x) + C ; \qquad \int \csc(x)\cot(x)\,dx = -\csc(x) + C ; $$

$$ \int \frac{1}{1 + x^2}\,dx = \tan^{-1}(x) + C ; \qquad \int \frac{1}{\sqrt{1 - x^2}}\,dx = \sin^{-1}(x) + C . $$

Recall that the general antiderivative on a given interval is obtained from a particular antiderivative by adding an arbitrary constant. This does not hold for a domain being a disjoint union of two or more intervals (review the properties of antiderivatives). So, in the preceding table, the convention is used that the given expressions for indefinite integrals are valid only in an interval.

Example 63. Find a general indefinite integral for $x^{-3}$.

Solution: The function $x^{-3}$ is not defined at $x = 0$. So its domain is the union of two disjoint intervals $(-\infty, 0)$ and $(0, \infty)$. By the first equality in the preceding table ($n = -3$),

$$ \int x^{-3}\,dx = -\frac{1}{2}x^{-2} + C_1 , \quad x > 0 ; \qquad \int x^{-3}\,dx = -\frac{1}{2}x^{-2} + C_2 , \quad x < 0 , $$

where $C_1$ and $C_2$ are arbitrary constants. □
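The entries of the table can be spot-checked numerically by differentiating each claimed antiderivative with a central difference and comparing with the integrand. The sketch below is only an illustration: the parameter value `A = 2.0`, the exponent `n = 3`, the sample point `x = 0.4`, and the helper names are assumptions, not part of the text.

```python
import math

A = 2.0  # hypothetical value of the parameter a used in the table

# (integrand, claimed antiderivative with C = 0) pairs from the table
TABLE = [
    (lambda x: x**3, lambda x: x**4 / 4),                        # x^n with n = 3
    (lambda x: 1 / x, lambda x: math.log(abs(x))),
    (lambda x: math.exp(x), lambda x: math.exp(x)),
    (lambda x: A**x, lambda x: A**x / math.log(A)),
    (lambda x: math.sin(A * x), lambda x: -math.cos(A * x) / A),
    (lambda x: math.cos(A * x), lambda x: math.sin(A * x) / A),
    (lambda x: 1 / math.cos(A * x) ** 2, lambda x: math.tan(A * x) / A),        # sec^2(ax)
    (lambda x: 1 / math.sin(A * x) ** 2, lambda x: -1 / (A * math.tan(A * x))), # csc^2(ax)
    (lambda x: math.tan(x) / math.cos(x), lambda x: 1 / math.cos(x)),           # sec(x)tan(x)
    (lambda x: 1 / (math.sin(x) * math.tan(x)), lambda x: -1 / math.sin(x)),    # csc(x)cot(x)
    (lambda x: 1 / (1 + x * x), lambda x: math.atan(x)),
    (lambda x: 1 / math.sqrt(1 - x * x), lambda x: math.asin(x)),
]

def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

x = 0.4  # sample point inside an interval where every entry is defined
max_table_error = max(abs(derivative(F, x) - f(x)) for f, F in TABLE)

# Example 63: -x^(-2)/2 differentiates to x^(-3) on both x > 0 and x < 0.
F63 = lambda t: -0.5 * t ** -2
example_63_errors = [abs(derivative(F63, t) - t ** -3) for t in (0.4, -0.4)]

print(max_table_error, example_63_errors)
```

A single sample point does not prove an identity, but a large discrepancy would flag a misremembered sign or a missing factor of $1/a$.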
The following notation is used in the fundamental theorem of calculus:

$$ \int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a) , \tag{5.18} $$

where $F(x) = \int f(x)\,dx$. An arbitrary constant in the indefinite integral may be omitted here because, as already noted, it is always cancelled out in the definite integral.

Example 64. Evaluate $\int_0^1 \bigl[3x^2 - x + 4(1 + x^2)^{-1}\bigr]\,dx$.

Solution: By the linearity of the indefinite integral (5.17), an indefinite integral of the integrand is $x^3 - x^2/2 + 4\tan^{-1}(x)$. Therefore,

$$ \int_0^1 \Bigl(3x^2 - x + \frac{4}{1 + x^2}\Bigr)\,dx = \Bigl(x^3 - \frac{x^2}{2} + 4\tan^{-1}(x)\Bigr)\Big|_0^1 = 1 - \frac{1}{2} + \pi = \frac{1}{2} + \pi , $$

where $\tan^{-1}(1) = \pi/4$ has been used. □

34.1. The Net Change Theorem. Put $f(x) = F'(x)$ in the fundamental theorem of calculus (5.18). The result obtained is known as the net change theorem.

Theorem 41. The integral of a continuous rate of change is the net change:

$$ \int_a^b F'(x)\,dx = F(b) - F(a) . $$

Note that $F'(x)$ may be positive and negative in the interval $[a, b]$ so that the quantity $y = F(x)$ may increase and decrease. The difference $F(b) - F(a)$ represents the net change of $y$ when $x$ changes from $a$ to $b$. The net change vanishes if $F(b) - F(a) = 0$. This does not mean that the quantity $y$ does not change at all, but rather this might mean, for example, that the quantity $y$ increases from the value $F(a)$, then, at some $c$ in $[a, b]$, it begins to decrease, returning to its initial value when $x = b$ so that its net change vanishes.

An analogy with an object moving along a straight line can be made to illustrate the net change. Let $s(t)$ be a position function of the object relative to some point on the line. Then $s'(t) = v(t)$ is its velocity (note that the velocity can be negative so that the object can move back and forth). The net change of the position over the time interval $[t_1, t_2]$ is

$$ \int_{t_1}^{t_2} v(t)\,dt = s(t_2) - s(t_1) . $$
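A small numerical sketch may help to see both Example 64 and the net change formula at work. The position function $s(t) = \sin(t)$ with velocity $v(t) = \cos(t)$ on $[0, 2\pi]$ is a hypothetical choice made only for this illustration, as are the helper `midpoint_integral` and the number of subintervals.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] by a midpoint Riemann sum."""
    dx = (b - a) / n
    return math.fsum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

# Example 64: compare the Riemann sum with F(1) - F(0).
integrand = lambda x: 3 * x**2 - x + 4 / (1 + x**2)
F = lambda x: x**3 - x**2 / 2 + 4 * math.atan(x)
example_64_numeric = midpoint_integral(integrand, 0.0, 1.0)
example_64_exact = F(1.0) - F(0.0)  # equals 1/2 + pi

# Net change for the hypothetical motion s(t) = sin(t), v(t) = cos(t).
t1, t2 = 0.0, 2 * math.pi
net_change = midpoint_integral(math.cos, t1, t2)                   # s(t2) - s(t1)
path_length = midpoint_integral(lambda t: abs(math.cos(t)), t1, t2)  # integral of |v|

print(example_64_numeric, example_64_exact)
print(net_change, path_length)
```

In this hypothetical motion the net change is zero because the object returns to its starting point, while the integral of $|v(t)|$, which measures the total path covered, is not.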
Example 65. Suppose an object travels along a straight line with a velocity of $v(t) = 1 - 2t$. Find the net change of its position over the time interval $[0, 1]$ and the total distance traveled by the object over the same time interval.

Solution: (I) The indefinite integral of $v(t)$ is $s(t) = t - t^2 + C$. So the net change of the object position is

$$ \int_0^1 v(t)\,dt = \int_0^1 s'(t)\,dt = s(1) - s(0) = 0 . $$

(II) Note that the velocity changes its sign at $t = 1/2$. So, in the interval $[0, 1/2]$, it is positive (i.e., the object moves to the right from its initial position); then the velocity becomes negative in $[1/2, 1]$ (i.e., the object goes back to the initial point). To find the distance traveled by the object, the absolute value $|v(t)|$ must be integrated over the interval $[0, 1]$. Think of $|v(t)|$ as the speed shown on the speedometer of your car; it is always nonnegative regardless of the direction in which the car is moving. So,

$$ \int_0^1 |1 - 2t|\,dt = \int_0^{1/2} (1 - 2t)\,dt - \int_{1/2}^1 (1 - 2t)\,dt = \bigl[s(1/2) - s(0)\bigr] - \bigl[s(1) - s(1/2)\bigr] = \frac{1}{2} , $$

where the definition $|v| = v$ if $v > 0$ and $|v| = -v$ if $v < 0$ has been used. □

Other examples of the net change include the volume $V(t)$ of water in a reservoir between two moments of time,

$$ \int_{t_1}^{t_2} V'(t)\,dt = V(t_2) - V(t_1) , $$

where $V'(t)$ is the rate of change of the volume, the net change of the population growth,

$$ \int_{t_1}^{t_2} n'(t)\,dt = n(t_2) - n(t_1) , $$
where $n'(t)$ is the growth rate, the relation between the cost and marginal cost functions:

$$ \int_{t_1}^{t_2} C'(t)\,dt = C(t_2) - C(t_1) , $$

and similarly for many other quantities.

35. The Substitution Rule

An indefinite integral of the derivative $F'(x)$ is the function $F(x)$ itself. Let $u = F(x)$. Consider the differential $du = F'(x)\,dx$. Then the following equalities hold:

$$ \int F'(x)\,dx = F(x) + C = u + C = \int du , $$

where $C$ is an arbitrary constant and the last equality follows from the fact that an indefinite integral of $f(u) = 1$ is $u$. So we can conclude that $F'(x)\,dx = du$, provided the variables $u$ and $x$ are related as $u = F(x)$. This also shows that it is permissible to operate with $dx$ and $du$ after the integral sign as if they were differentials. This observation leads to a neat technical trick to calculate indefinite integrals. For example,

$$ \int \frac{1}{\sqrt{x + 1}}\,dx = \int d\bigl(2\sqrt{x + 1}\bigr) = 2\sqrt{x + 1} + C , $$

where the substitution $u = 2\sqrt{x + 1}$ has been used.

This trick can be generalized. Let $F(u)$ be an indefinite integral of a continuous function $f(u)$ on an interval $I$. In other words, the most general indefinite integral of $f(u)$ is $\int f(u)\,du = F(u) + C$. Let $u = g(x)$, where $g$ is differentiable and its range is the interval $I$. By the chain rule,

$$ \bigl(F(g(x))\bigr)' = F'(g(x))\,g'(x) = f(g(x))\,g'(x) . $$

Therefore, $F(g(x)) + C$ is an indefinite integral of $f(g(x))g'(x)$. On an interval, $F(g(x))$ and $\int f(u)\,du$ can differ at most by an additive constant, where $u$ is a new variable defined as a differentiable function of $x$. This proves the following theorem.

Theorem 42 (The Substitution Rule). If $u = g(x)$ is a differentiable function whose range is an interval $I$ and $f$ is continuous on $I$, then

$$ \int f(g(x))\,g'(x)\,dx = \int f(g(x))\,dg(x) = \int f(u)\,du . \tag{5.19} $$
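One way to sanity-check a substitution is to differentiate the resulting antiderivative numerically and compare it with the original integrand. The sketch below does this for the example with $2\sqrt{x+1}$ and for one hypothetical instance of (5.19), namely $f(u) = \cos(u)$, $g(x) = x^2$, $F(u) = \sin(u)$; that instance, the sample points, and the helper names are assumptions chosen for illustration only.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Text example: the integral of 1/sqrt(x + 1) is 2*sqrt(x + 1) + C.
text_integrand = lambda x: 1 / math.sqrt(x + 1)
text_antiderivative = lambda x: 2 * math.sqrt(x + 1)

# Hypothetical instance of (5.19): f(u) = cos(u), g(x) = x^2, F(u) = sin(u).
f, F = math.cos, math.sin
g = lambda x: x * x
g_prime = lambda x: 2 * x
composite_integrand = lambda x: f(g(x)) * g_prime(x)   # f(g(x)) g'(x)
composite_antiderivative = lambda x: F(g(x))           # F(g(x))

points = [0.1, 0.5, 1.0, 1.7]
text_error = max(
    abs(derivative(text_antiderivative, x) - text_integrand(x)) for x in points
)
rule_error = max(
    abs(derivative(composite_antiderivative, x) - composite_integrand(x)) for x in points
)

print(text_error, rule_error)
```

Both errors should be at the level of the finite-difference truncation error, which is what the chain-rule argument above predicts.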
The substitution rule is often referred to as a change of the integration variable. It is a powerful method to calculate indefinite integrals.

Example 66. Find $\int x\sin(x^2 + 1)\,dx$.

Solution:

$$ \int x\sin(x^2 + 1)\,dx = \frac{1}{2}\int \sin(x^2 + 1)\,d(x^2 + 1) = \frac{1}{2}\int \sin(u)\,du = -\frac{1}{2}\cos(u) + C = -\frac{1}{2}\cos(x^2 + 1) + C , $$

where the substitution $u = x^2 + 1$ has been used. □

Example 67. Find $\int \tan(x)\,dx$.

Solution:

$$ \int \tan(x)\,dx = \int \frac{\sin(x)}{\cos(x)}\,dx = -\int \frac{d(\cos(x))}{\cos(x)} = -\int \frac{du}{u} = -\ln|u| + C = -\ln|\cos(x)| + C = \ln|\sec(x)| + C , $$

where the substitution $u = \cos(x)$ and the logarithm property $\ln(1/a) = -\ln(a)$ have been used. □

The substitution rule can be used to evaluate definite integrals by means of the fundamental theorem of calculus.

Example 68. Evaluate $\int_0^2 x e^{x^2}\,dx$.

Solution: First, find an indefinite integral:

$$ F(x) = \int x e^{x^2}\,dx = \frac{1}{2}\int e^{x^2}\,dx^2 = \frac{1}{2}\int e^{u}\,du = \frac{1}{2}e^{u} + C = \frac{1}{2}e^{x^2} + C , $$

where $u = x^2$. By the fundamental theorem of calculus,

$$ \int_0^2 x e^{x^2}\,dx = F(2) - F(0) = \frac{1}{2}\bigl(e^4 - 1\bigr) . $$

□

Note that, when evaluating the integral, the original variable $x$ has been restored in the indefinite integral in order to apply the fundamental theorem of calculus. The fundamental theorem of calculus can also be applied directly in the new variable $u$, provided the range of $u$ is properly changed. Indeed, in the previous example, the answer could have been recovered from the indefinite integral $\frac{1}{2}e^{u} + C$ if $u = x^2$ ranges from $0 = 0^2$ to $4 = 2^2$ as $x$ ranges from 0 to 2. This is especially useful when a calculation of a definite integral requires several changes of the integration variable.
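The remark about changing the range of $u$ can be illustrated numerically on Example 68: integrating $x e^{x^2}$ over $0 \le x \le 2$ and integrating $\frac{1}{2}e^{u}$ over $0 \le u \le 4$ should give the same number. The helper `midpoint_integral` and the number of subintervals are assumptions of this sketch.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] by a midpoint Riemann sum."""
    dx = (b - a) / n
    return math.fsum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

# Original variable: x runs from 0 to 2.
in_x = midpoint_integral(lambda x: x * math.exp(x * x), 0.0, 2.0)

# New variable u = x^2: u runs from 0 to 4 and x dx becomes du / 2.
in_u = midpoint_integral(lambda u: 0.5 * math.exp(u), 0.0, 4.0)

exact = 0.5 * (math.exp(4) - 1)

print(in_x, in_u, exact)
```

The two Riemann sums agree with each other and with $(e^4 - 1)/2$ to within the discretization error.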
Theorem 43 (The Substitution Rule for Definite Integrals). If $g'$ is continuous on $[a, b]$ and $f$ is continuous on the range of $u = g(x)$, then

$$ \int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du . \tag{5.20} $$

Proof. Let $F$ be an antiderivative of $f$. Then $F(g(x))$ is an antiderivative of $\bigl(F(g(x))\bigr)' = F'(g(x))\,g'(x) = f(g(x))\,g'(x)$. By the fundamental theorem of calculus,

$$ \int_a^b f(g(x))\,g'(x)\,dx = F(g(x))\Big|_a^b = F(g(b)) - F(g(a)) . $$

On the other hand, since $F(u)$ is an antiderivative of $f(u)$, the fundamental theorem of calculus yields

$$ \int_{g(a)}^{g(b)} f(u)\,du = F(u)\Big|_{g(a)}^{g(b)} = F(g(b)) - F(g(a)) . $$

Since the right-hand sides of these equalities coincide, so must their left-hand sides, which implies (5.20). □

Example 69. Evaluate $\int_1^e \ln(x)/x\,dx$.

Solution: The integrand can be transformed as

$$ \frac{\ln(x)}{x}\,dx = \ln(x)\,d\ln(x) . $$

So the substitution $u = \ln(x)$ can be made. The range of the new integration variable $u$ is determined by the range of the old one: $u = 0$ when $x = 1$ and $u = 1$ when $x = e$. Thus,

$$ \int_1^e \frac{\ln(x)}{x}\,dx = \int_0^1 u\,du = \frac{u^2}{2}\Big|_0^1 = \frac{1}{2} . $$

□

35.1. Symmetry. The calculation of a definite integral over a symmetric interval can be simplified if the integrand possesses symmetry properties.

Theorem 44. Suppose $f$ is continuous on a symmetric interval $[-a, a]$. Then

$$ \int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx \quad \text{if } f(-x) = f(x) \ (f \text{ is even}), \tag{5.21} $$

$$ \int_{-a}^{a} f(x)\,dx = 0 \quad \text{if } f(-x) = -f(x) \ (f \text{ is odd}). \tag{5.22} $$
by property (5. Therefore. THE SUBSTITUTION RULE 131 Proof. then f (−u) = −f (u) and (5. If f is odd. −π 2 . the area −a f (x) dx must coincide with A. In the ﬁrst integral on the very righthand side. Thus. 0] is obtained by the mirror reﬂection about the origin so that the area A appears beneath the x 0 axis. 0] is obtained from that on [0. π sin(x3 ) dx = 0 . −a a − 0 f (x) dx = 0 f (−u) du and a a a f (x) dx = −a 0 f (−u) du + 0 f (x) dx. sin((−x)3 ) = sin(−x3 ) = − sin(x3 ). π]. Solution: Unfortunately. Now. a] by a reﬂection 0 about the y axis. and the fundamental theorem of calculus cannot be used. The integration interval is also symmetric. a]. Evaluate π −π sin(x3 ) dx.22) follows. the substitution u = −x is made so that u = 0 when x = 0 and u = a when x = −a and dx = −du. 2 The geometrical interpretation of this theorem is transparent. Note that sin(x3 ) is an odd function. If f is odd. One can always evaluate the integral by taking the limit of the sequence of Riemann sums. If f is even.35. Hence. the graph of f on [−a. an antiderivative of sin(x3 ) cannot be expressed in elementary functions. −a f (x) dx = −A. Example 70. then its graph on [−a. The integral can be split into two integrals: a 0 a f (x) dx = −a −a + 0 f (x) dx = − 0 −a a f (x) dx + 0 f (x) dx. The integral 0 f (x) dx = A is the area under the graph of f on [0. Supa pose f (x) ≥ 0 for 0 ≤ x ≤ a. then. Hence. if f is even. An alternative solution is due to a simple symmetry argument. [−π.21) follows.22). by symmetry. then f (−u) = f (u) and (5.
Remark. In the previous example, take a partition of $[-\pi, \pi]$ by points $x_k = k\,\Delta x$, where $\Delta x = \pi/N$, $k = -N, -N+1, \ldots, -1, 0, 1, \ldots, N-1, N$. Consider the Riemann sum with sample points being the midpoints. It is then straightforward to show that the Riemann sum vanishes because $\sin\bigl((x^*_{-k})^3\bigr) = -\sin\bigl((x^*_k)^3\bigr)$ for $k = 1, 2, \ldots, N$.
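The cancellation described in the remark can be observed directly. The sketch below builds the midpoint Riemann sum of $\sin(x^3)$ on $[-\pi, \pi]$ for the partition $x_k = k\,\Delta x$ with $\Delta x = \pi/N$; the function name `symmetric_midpoint_sum` and the particular values of $N$ are choices made for this illustration.

```python
import math

def symmetric_midpoint_sum(N):
    """Midpoint Riemann sum of sin(x^3) over [-pi, pi] with 2N equal subintervals."""
    dx = math.pi / N
    # Midpoint of [x_k, x_{k+1}] with x_k = k*dx, for k = -N, ..., N-1.
    midpoints = [(k + 0.5) * dx for k in range(-N, N)]
    return math.fsum(math.sin(x**3) for x in midpoints) * dx

sums = [symmetric_midpoint_sum(N) for N in (10, 100, 1000)]
print(sums)
```

Because the midpoints come in pairs $\pm x^*$ and the summand is odd, every term is cancelled by its mirror image, so the sum is zero up to rounding for every $N$, not only in the limit.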