
Lecture 27

Indeterminate forms

(Relevant section from Stewart, Seventh Edition: Section 4.4)

Many of you are familiar with the idea of “indeterminate forms” as well as “l’Hôpital’s Rule”. If
you are not, there is no problem, since we shall develop the subject from first principles.

You will recall that we showed the following limit result,

\[
\lim_{x \to 0} \frac{\sin x}{x} = 1, \tag{1}
\]

by means of a geometric argument. But let’s step back a bit and pretend that we don’t know this
result. If we encounter the problem
\[
\lim_{x \to 0} \frac{\sin x}{x} \tag{2}
\]
for the first time, we see that there is indeed a problem: The numerator sin x tends to 0 as x → 0 as
does the denominator x. Had the denominator been x + 1, we could have easily concluded, from the
limit law for quotients, that
\[
\lim_{x \to 0} \frac{\sin x}{x + 1} = \frac{\lim_{x \to 0} \sin x}{\lim_{x \to 0} (x + 1)} = 0. \tag{3}
\]
This is possible because the limit of the denominator is not 0. But what happens if the limit of the
denominator is zero?
As you should know by now, the answer is, “It depends.” For example, from Eq. (1) and a change
of variable, one can show that
\[
\lim_{x \to 0} \frac{\sin 5x}{x} = 5. \tag{4}
\]
In this case again, the limits of the numerator and denominator are both 0, but we obtain a different
result.
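For completeness, here is the short computation behind Eq. (4): with the change of variable u = 5x, so that u → 0 as x → 0,
\[
\lim_{x \to 0} \frac{\sin 5x}{x} = \lim_{x \to 0} 5 \cdot \frac{\sin 5x}{5x} = 5 \lim_{u \to 0} \frac{\sin u}{u} = 5 \cdot 1 = 5.
\]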
The natural question is: What happens in general when we are faced with the following problem,

\[
\lim_{x \to a} \frac{f(x)}{g(x)}, \tag{5}
\]
when
\[
\lim_{x \to a} f(x) = 0 \quad \text{and} \quad \lim_{x \to a} g(x) = 0. \tag{6}
\]

Such a problem is known as an indeterminate form of type 0/0. It is “indeterminate” because we just can’t
conclude, on the basis of f(x) and g(x) alone, what the limit is, or even whether it exists.


You have also encountered another indeterminate form, namely ∞/∞. For example, consider the
limit problem,
\[
\lim_{x \to \infty} \frac{x^2 + 5}{2x^2 + 4}. \tag{7}
\]
Both functions approach ∞ as x → ∞. You are already aware of a method to determine this limit:
Divide out the highest power of x in both the numerator and denominator:

\[
\lim_{x \to \infty} \frac{x^2 + 5}{2x^2 + 4} = \lim_{x \to \infty} \frac{1 + 5/x^2}{2 + 4/x^2} = \frac{\lim_{x \to \infty} (1 + 5/x^2)}{\lim_{x \to \infty} (2 + 4/x^2)} = \frac{1}{2}. \tag{8}
\]

A “calculus-based” approach does exist to treat indeterminate forms – it is known as “l’Hôpital’s Rule”:

L’Hôpital’s Rule: Assume that f and g are differentiable, with g′(x) ≠ 0, on an open interval that
contains the point a ∈ R (except possibly at a). Also suppose that either

1. lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0, or

2. lim_{x→a} f(x) = ±∞ and lim_{x→a} g(x) = ±∞.

Then
\[
\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}, \tag{9}
\]
if the limit on the RHS exists (it can be ∞ or −∞).

Example 1: Let’s return to the first example considered in this lecture, i.e.,

\[
\lim_{x \to 0} \frac{\sin x}{x}. \tag{10}
\]

Here
\[
f(x) = \sin x, \qquad g(x) = x, \tag{11}
\]
so that
\[
f'(x) = \cos x, \qquad g'(x) = 1. \tag{12}
\]

We already know the following, but let’s repeat it here, to show that it satisfies the requirements of
l’Hôpital’s Rule:
\[
\lim_{x \to 0} f(x) = 0, \qquad \lim_{x \to 0} g(x) = 0. \tag{13}
\]

Now note that
\[
\lim_{x \to 0} \frac{f'(x)}{g'(x)} = \lim_{x \to 0} \frac{\cos x}{1} = 1. \tag{14}
\]
The limit exists, so we can conclude, by l’Hôpital’s Rule, that
\[
\lim_{x \to 0} \frac{\sin x}{x} = 1. \tag{15}
\]

Example 2: Before proving a simple version of l’Hôpital’s Rule, we return to the second example
mentioned earlier,
\[
\lim_{x \to \infty} \frac{x^2 + 5}{2x^2 + 4}. \tag{16}
\]
(We know this limit to be 1/2.) Here,
\[
f(x) = x^2 + 5, \qquad g(x) = 2x^2 + 4, \tag{17}
\]
so that
\[
f'(x) = 2x, \qquad g'(x) = 4x. \tag{18}
\]

Then
\[
\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{2x}{4x} = \lim_{x \to \infty} \frac{2}{4} = \frac{1}{2}, \tag{19}
\]
in agreement with our earlier result using another method.

Note: If necessary, l’Hôpital’s Rule can be applied more than once. In fact, in the above example, if
we decided that we didn’t want to divide out the x’s from the ratio 2x/4x, we could have considered
the numerator 2x and the denominator 4x as new functions to which the Rule could be applied.
Differentiating again, we obtain the ratio 2/4, which yields our limit 1/2.

Finally, we mention that l’Hôpital’s Rule is also applicable to one-sided limits, i.e., x → a+ and
x → a− .

Proof of l’Hôpital’s Rule for a special case

In what follows, we provide a proof of l’Hôpital’s Rule for a special case – the simplest or “nicest
case”, which is very often encountered in applications. We assume that the functions f ′ (x) and g′ (x)
are continuous functions in an interval containing the limit point a and that f (a) = g(a) = 0.
From the continuity of f ′ (x) and g′ (x), it follows that

\[
\lim_{x \to a} \frac{f'(x)}{g'(x)} = \frac{f'(a)}{g'(a)}. \tag{20}
\]

But the quotient on the right may also be expressed as follows, by definition of the derivative,
\[
\frac{f'(a)}{g'(a)} = \frac{\lim_{x \to a} \frac{f(x) - f(a)}{x - a}}{\lim_{x \to a} \frac{g(x) - g(a)}{x - a}} = \lim_{x \to a} \frac{\;\frac{f(x) - f(a)}{x - a}\;}{\;\frac{g(x) - g(a)}{x - a}\;} = \lim_{x \to a} \frac{f(x) - 0}{g(x) - 0}. \tag{21}
\]

It therefore follows that
\[
\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}. \tag{22}
\]

In the case that f(a) or g(a) (or both) is not defined, but lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0, then a
slightly more complicated proof is required. It can be found in Appendix F of the text by Stewart.

We now consider some additional examples of the use of l’Hôpital’s Rule as well as other indeterminate
forms.

Example 3: lim_{x→∞} x e^{−x}.

This doesn’t look like an indeterminate form of type 0/0 or ∞/∞. The term x goes to ∞ and e^{−x}
goes to zero, so it is actually the indeterminate form “0 · ∞”, which we’ll discuss a little later. We can
turn the above into an ∞/∞ indeterminate form by rewriting it as
\[
\lim_{x \to \infty} \frac{x}{e^x}. \tag{23}
\]

Here, f(x) = x and g(x) = e^x. This is now an indeterminate form: As x → ∞, f(x) → ∞ and
g(x) → ∞. Since f′(x) = 1 and g′(x) = e^x, we have that
\[
\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{1}{e^x} = 0, \tag{24}
\]

and it follows that
\[
\lim_{x \to \infty} \frac{x}{e^x} = 0. \tag{25}
\]

Example 4: lim_{x→∞} x²/e^x.

Here, f(x) = x² and g(x) = e^x. Once again, we have an indeterminate form: As x → ∞,
f(x) → ∞ and g(x) → ∞. Since f′(x) = 2x and g′(x) = e^x, we have that
\[
\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{2x}{e^x} = 0, \tag{26}
\]

where we have used the result of Example 3. (If we didn’t have this result at hand, we would simply
apply l’Hôpital’s Rule one more time.) Therefore

\[
\lim_{x \to \infty} \frac{x^2}{e^x} = 0. \tag{27}
\]

Note: From Examples 3 and 4 above, we should be able to see that lim_{x→∞} x^n/e^x = 0 for n = 1, 2, 3, · · · .
Once again, if we didn’t have these examples at hand, we would, for any given n, apply l’Hôpital’s
Rule n times. This result implies that e^x → ∞ faster than any power of x as x → ∞.
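As a quick numerical sanity check (our own illustration, not from Stewart), the following short Python snippet tabulates x^n/e^x for a few powers n; every column shrinks rapidly toward 0, as the l’Hôpital argument predicts.

import math

# Tabulate x^n / e^x for n = 1, 2, 5; each value decays toward 0 as x grows,
# illustrating that e^x outgrows any fixed power of x.
for x in [10, 20, 50, 100]:
    row = [x**n / math.exp(x) for n in (1, 2, 5)]
    print(f"x = {x:4d}: " + "  ".join(f"{v:.3e}" for v in row))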

Other indeterminate forms

There are other indeterminate forms, for example,

\[
0 \cdot \infty, \qquad \infty - \infty, \qquad 0^0, \qquad \infty^0, \qquad 1^\infty. \tag{28}
\]
As with the 0/0 and ∞/∞ indeterminate forms, we must investigate each case separately.
For example, in response to a question by a student, it does not necessarily follow that 0 · ∞ is
∞. Consider the following cases, in all of which

\[
\lim_{x \to \infty} f(x) = 0 \quad \text{and} \quad \lim_{x \to \infty} g(x) = \infty. \tag{29}
\]

1. f(x) = 1/x, g(x) = x. Then lim_{x→∞} f(x)g(x) = lim_{x→∞} 1 = 1.

2. f(x) = 1/x², g(x) = x. Then lim_{x→∞} f(x)g(x) = lim_{x→∞} 1/x = 0.

3. f(x) = 1/x, g(x) = x². Then lim_{x→∞} f(x)g(x) = lim_{x→∞} x = ∞.

As Stewart writes in his textbook, for the indeterminate forms in (28), for example, ∞ − ∞, the
question is “Which one wins?” The answer is, “We have to treat each case separately.”

Example 5: lim_{x→0+} x ln x.

(We must consider the right-sided limit because ln x is undefined for x < 0.) This is a 0 · ∞
indeterminate form. In an effort to use l’Hôpital’s Rule, we rewrite the function x ln x as a quotient:
\[
x \ln x = \frac{\ln x}{x^{-1}} = \frac{f(x)}{g(x)}. \tag{30}
\]
Since f(x) = ln x → −∞ and g(x) = x^{−1} → ∞ as x → 0+, this quotient is an ∞/∞ indeterminate form.

We now try to apply l’Hôpital’s Rule. Since
\[
\frac{f'(x)}{g'(x)} = \frac{x^{-1}}{-x^{-2}} = -x, \tag{31}
\]
we have
\[
\lim_{x \to 0^+} \frac{f'(x)}{g'(x)} = \lim_{x \to 0^+} (-x) = 0. \tag{32}
\]
It follows that
\[
\lim_{x \to 0^+} x \ln x = 0. \tag{33}
\]

There remains the question: Could we have used the other possible quotient, i.e.,

\[
x \ln x = \frac{x}{1/\ln x} = \frac{f(x)}{g(x)}\,? \tag{34}
\]
Unfortunately, the derivative of the denominator gets quite complicated:
\[
f'(x) = 1, \qquad g'(x) = -\frac{1}{x(\ln x)^2}, \tag{35}
\]
so that
\[
\frac{f'(x)}{g'(x)} = -x(\ln x)^2. \tag{36}
\]
This problem is more complicated than the original problem, suggesting that this is not the way to
proceed.

Another note on Example 5: You may recall that we encountered this limit problem in a previous
lecture – the lecture on logarithmic differentiation. The problem was to compute the derivative of the
function f(x) = x^x. To do so, we took logarithms:

\[
y = x^x \;\Rightarrow\; \ln y = x \ln x \;\Rightarrow\; \frac{1}{y}\frac{dy}{dx} = 1 + \ln x \;\Rightarrow\; \frac{dy}{dx} = y(1 + \ln x) = x^x(1 + \ln x). \tag{37}
\]

But to get an idea of the graph of x^x, it was necessary to determine its behaviour as x → 0+. Since
y = x^x, it follows that
\[
\ln y = x \ln x, \tag{38}
\]
so that
\[
\lim_{x \to 0^+} \ln y = \lim_{x \to 0^+} x \ln x. \tag{39}
\]

At that time, we simply stated that the limit on the RHS was 0, with the understanding that we
would prove it later in the course, which is what we have done in Example 5. From this result, it
follows that
\[
\lim_{x \to 0^+} \ln y = 0. \tag{40}
\]

We now rewrite the LHS as follows,
\[
\ln \left( \lim_{x \to 0^+} y \right) = 0. \tag{41}
\]

This follows from the continuity of the ln function. Since ln 1 = 0, we can conclude that

\[
\lim_{x \to 0^+} y = \lim_{x \to 0^+} x^x = 1. \tag{42}
\]

Example 6: lim_{x→∞} (√(x² + x) − x).

This is an ∞ − ∞ indeterminate form. Once again, we try to express this function in terms of
a quotient. We might try to remove the square root in the usual way:
\[
\sqrt{x^2 + x} - x = \left( \sqrt{x^2 + x} - x \right) \cdot \frac{\sqrt{x^2 + x} + x}{\sqrt{x^2 + x} + x} = \frac{(x^2 + x) - x^2}{\sqrt{x^2 + x} + x} = \frac{x}{\sqrt{x^2 + x} + x}. \tag{43}
\]
Then
\[
\lim_{x \to \infty} \left( \sqrt{x^2 + x} - x \right) = \lim_{x \to \infty} \frac{x}{\sqrt{x^2 + x} + x} = \lim_{x \to \infty} \frac{1}{\sqrt{1 + x^{-1}} + 1} = \frac{1}{2}. \tag{44}
\]


This result might be a little surprising. As x → ∞, both √(x² + x) and x go to infinity, yet their
difference approaches a finite number, and a small one. In MATH 138, you will learn another method

to obtain this limit, with the help of Taylor series. For the moment, consider two particular numerical
values of the function in this example, i.e.,
\[
f(x) = \sqrt{x^2 + x} - x. \tag{45}
\]
We find that
\[
f(100) \approx 0.498756, \qquad f(1000) \approx 0.499875. \tag{46}
\]

Even at x = 1000, which is far from ∞, f (1000) is quite close to the limiting value of 0.5. In MATH
138, with the help of Taylor series, you’ll be able to estimate the difference between f (x) and the
limiting value 0.5.
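The two numerical values in (46) are easy to reproduce; here is a short Python check (our own illustration, not from the text):

import math

# f(x) = sqrt(x^2 + x) - x approaches the limiting value 1/2 as x grows.
def f(x):
    return math.sqrt(x * x + x) - x

for x in [100, 1000, 10**6]:
    print(x, f(x))  # 0.4987..., 0.49987..., 0.4999998...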

Example 7: We return to the basic limit for e: lim_{x→∞} (1 + 1/x)^x = e.

We derived this limit in a previous lecture. It corresponds to a 1^∞ indeterminate form. We now
consider a slight modification of the above indeterminate form,
\[
\lim_{x \to \infty} \left( 1 + \frac{a}{x} \right)^x. \tag{47}
\]
We’ll try to express the above limit in terms of the known limit for e as follows: Let
\[
\frac{a}{x} = \frac{1}{y} \;\Rightarrow\; x = ay, \tag{48}
\]
so that
\[
\left( 1 + \frac{a}{x} \right)^x = \left( 1 + \frac{1}{y} \right)^{ay} = \left[ \left( 1 + \frac{1}{y} \right)^y \right]^a. \tag{49}
\]
Then
\[
\lim_{x \to \infty} \left( 1 + \frac{a}{x} \right)^x = \lim_{y \to \infty} \left[ \left( 1 + \frac{1}{y} \right)^y \right]^a = \left[ \lim_{y \to \infty} \left( 1 + \frac{1}{y} \right)^y \right]^a = e^a. \tag{50}
\]
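As a quick numerical illustration of Eq. (50) (our own check, taking a > 0 so that y = x/a → ∞ along with x), here is a short Python snippet:

import math

# Compare (1 + a/x)^x with e^a for growing x; the gap shrinks toward 0.
a = 3.0
for x in [10.0, 1000.0, 100000.0]:
    print(f"x = {x:>8.0f}: (1 + a/x)^x = {(1 + a / x) ** x:.6f}, e^a = {math.exp(a):.6f}")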

Lecture 28

Optimization problems

(This lecture was presented using a slide presentation of the lecture notes below.)

(Relevant section from Stewart, Seventh Edition: 4.7)


We now enter – albeit briefly – the territory of “Applied Calculus,” the use of Calculus to solve
some “practical” problems. These problems involve the maximizing or minimizing of something, i.e.,
a quantity “Q”, for example, a distance, a time, an area, a volume, or perhaps the cost of constructing
something (which, in turn, may depend upon its area or volume).
In real life, the quantities Q that one seeks to minimize or maximize will involve many variables.
This will be the subject of a future calculus course on functions of several variables (e.g. MATH 237
or MATH 227).
Since this course deals with functions of a single real variable, we’ll examine some problems that
can be transformed into such functions. Indeed, this is often the stumbling block – how to transform
a problem with a number of components into one with a single variable.
It is recommended that you read Section 4.7 of Stewart’s textbook thoroughly for a good
understanding of this topic. It begins with a helpful outline of the strategy behind solving optimization
problems. A number of illustrative problems are then solved.
In this lecture, we have time only to consider a few problems, starting with a simple one and then
working our way up to a very interesting problem from Physics: the refraction of light.

Example 1: Find the area of the largest rectangle that can be inscribed in a semicircle of radius R.
The first step is to sketch a picture of the situation, if only to get a feel for the problem.
The next step is to introduce a quantity that could serve as the unifying variable in the problem. In
this case, we may wish to consider the x-coordinate of the side of the rectangle, as indicated in the
figure below. The domain of this x variable will be [0, R]. By symmetry, this implies that the rectangle
will extend from −x to x. As such, its base length b is
\[
b = 2x. \tag{51}
\]

[Figure: rectangle inscribed in the semicircle y = √(R² − x²), with base from −x to x on the x-axis, −R ≤ x ≤ R.]

The height h of the rectangle will then be determined by the semicircle:


\[
h = \sqrt{R^2 - x^2}. \tag{52}
\]

The area of the rectangle now becomes a function of x:


\[
A = bh = 2x\sqrt{R^2 - x^2} = A(x). \tag{53}
\]

The goal is now to maximize A(x) over the interval [0, R]. Note that

\[
A(0) = 0, \qquad A(R) = 0, \tag{54}
\]

as expected from the sketch: When x = 0, the rectangle becomes the vertical line segment from (0, 0) to (0, R).
When x = R, the rectangle becomes the horizontal line segment y = 0, −R ≤ x ≤ R.
We look for critical points of A(x):
\[
\begin{aligned}
A'(x) &= 2\sqrt{R^2 - x^2} + 2x \cdot \frac{1}{2} \cdot \frac{-2x}{\sqrt{R^2 - x^2}} \\
      &= 2\sqrt{R^2 - x^2} - \frac{2x^2}{\sqrt{R^2 - x^2}} \\
      &= 2\,\frac{R^2 - x^2 - x^2}{\sqrt{R^2 - x^2}} \\
      &= 2\,\frac{R^2 - 2x^2}{\sqrt{R^2 - x^2}}.
\end{aligned} \tag{55}
\]
We see that
\[
A'(x) = 0 \;\Rightarrow\; 2x^2 = R^2 \;\Rightarrow\; x^2 = \frac{R^2}{2} \;\Rightarrow\; x = \frac{R}{\sqrt{2}}. \tag{56}
\]
Evaluating A at this critical point, we see that
\[
A\!\left( \frac{R}{\sqrt{2}} \right) = 2 \left( \frac{R}{\sqrt{2}} \right) \left( \frac{R}{\sqrt{2}} \right) = R^2. \tag{57}
\]

Since this value is greater than the value of A(x) at the endpoints, i.e., A(0) = A(R) = 0, we conclude
that this is the maximum value of A(x) on [0, R].

That being said, let’s just step back and examine another method to ascertain that, in fact, the
above point corresponds to a local maximum and perhaps even an absolute maximum. To see that
x = R/√2 is a local maximum, let’s examine the derivative A′(x) on both sides of this point. If we can
show that
\[
A'(x) > 0 \ \text{ for } \ x < \frac{R}{\sqrt{2}} \qquad \text{and} \qquad A'(x) < 0 \ \text{ for } \ x > \frac{R}{\sqrt{2}}, \tag{58}
\]
then x = R/√2 is a local maximum. The above implies that A(x) is increasing until we get to the critical
point, and then decreasing as we move away from it to the right.
To show the above inequality, let us rewrite the derivative A′ (x) slightly as follows,

\[
A'(x) = \frac{4}{\sqrt{R^2 - x^2}} \left( \frac{R^2}{2} - x^2 \right). \tag{59}
\]
2 2
We see that
\[
x^2 < \frac{R^2}{2} \;\Rightarrow\; A'(x) > 0 \qquad \text{and} \qquad x^2 > \frac{R^2}{2} \;\Rightarrow\; A'(x) < 0, \tag{60}
\]
which agrees with the inequalities in (58). Therefore the critical point x = R/√2 is a local maximum.
In fact, since the inequalities hold for all appropriate x-values in the interval [0, R], we can conclude
that the critical point is an absolute maximum on the interval. (This idea is discussed in Stewart’s
textbook, Section 4.7, p. 324, as the “First Derivative Test for Absolute Extrema”.)

In Eq. (59), we see that the derivative A′(x) → −∞ as x → R−. This suggests that the graph of
A(x) exhibits a cusp-like nature at x = R. This is confirmed by the following plot for the case R = 1,
courtesy of MAPLE:

[Plot: A(x) = 2x√(1 − x²) on [0, 1], rising from 0 to its maximum at x = 1/√2 and dropping steeply to 0 at x = 1.]

An alternate method: Instead of using the Cartesian coordinate x, we can introduce an angle
variable θ as shown below.

[Figure: inscribed rectangle described by the angle θ, with half-base R cos θ and height R sin θ under the semicircle y = √(R² − x²).]
The base length b and height l of the rectangle are now given in terms of θ:
\[
b = 2R\cos\theta, \qquad l = R\sin\theta, \qquad 0 \le \theta \le \frac{\pi}{2}, \tag{61}
\]
2
so that the area A is now a function of θ,

\[
A = bl = (2R\cos\theta)(R\sin\theta) = 2R^2\cos\theta\sin\theta = A(\theta). \tag{62}
\]



We now seek to maximize A(θ). One could compute the θ-derivative of the above expression to do
this. But recalling the double-angle formula for sin 2θ, we may rewrite A(θ) as

\[
A(\theta) = R^2 \sin 2\theta. \tag{63}
\]

Over the interval θ ∈ [0, π/2], the sin(2θ) function increases to 1 at θ = π/4 and then decreases to 0
at π/2. So we could simply write down that θ = π/4 yields the maximum value of A(θ). But just to
be complete, let’s compute the derivative,

\[
A'(\theta) = 2R^2 \cos 2\theta. \tag{64}
\]

We see that
\[
A'(\theta) = 0 \ \text{ when } \ \cos 2\theta = 0 \;\Rightarrow\; \theta = \frac{\pi}{4}. \tag{65}
\]
Thus, θ = π/4 is the only critical point on [0, π/2]. At this value of θ,
\[
A\!\left( \frac{\pi}{4} \right) = R^2 \sin\frac{\pi}{2} = R^2, \tag{66}
\]

as before.

Note that the critical point θ = π/4 found above corresponds to the value x = R/√2, which is
in agreement with the previous method. This is good! It is often possible to solve a problem using
several methods – if they are correct, they should yield the same result!
That being said, note that the graphs of A(x) and A(θ) are quite different. The former was plotted
above, and we saw that A′(x) → −∞ as x → R−. This comes from the derivative of the square root
function √(R² − x²) which, in turn, arises from the use of the Cartesian variable. On the other hand,
the function A(θ) = R² sin 2θ is a simple one-half sine wave over [0, π/2]. The square root function from
the Cartesian representation is replaced by a trigonometric function. It is often – but not always! –
easier to work with angles.

Example 2: We now consider a three-dimensional version of the above two-dimensional one. Find
the cylinder of largest volume that can be inscribed in a hemisphere of radius R. The situation is
sketched below.

For simplicity, however, we consider a cross-sectional view of this problem, which basically gives
us the same type of picture as in Example 1:
In this problem, however, we seek to maximize the volume of the inscribed cylinder, which will
be given by
\[
V = (\text{area of circular base}) \times (\text{height}) = \pi r^2 h. \tag{67}
\]

We’ll introduce the angle θ again, so that the base radius of the cylinder is given by

\[
r = R\cos\theta, \qquad h = R\sin\theta. \tag{68}
\]

[Figure: cross-section of the inscribed cylinder, bounded above by the semicircle y = √(R² − x²); the angle θ sets the base radius r = R cos θ and the height h = R sin θ.]

The volume V , as a function of θ, becomes


\[
V(\theta) = \pi R^3 \cos^2\theta \sin\theta, \qquad 0 \le \theta \le \frac{\pi}{2}. \tag{69}
\]
Let’s just step back for a moment and check whether the dimensionality of this result is correct: The
trigonometric terms are dimensionless, so the dimensionality of the RHS comes from the term R³,
which is length³, written as “L³”. This is the correct dimension for a volume.
Also note that V (0) = V (π/2) = 0. Since we are looking for the maximum value of V that will
be attained on the interval (0, π/2), we search for critical points on that interval. Just to reduce some
work, let’s write the volume function as

\[
V(\theta) = \pi R^3 f(\theta), \qquad \text{where } f(\theta) = \cos^2\theta \sin\theta. \tag{70}
\]

We now look for critical points of f (θ) by computing f ′ (θ).


There are several ways to proceed in the computation of f ′ (θ). We could simply differentiate the
above expression, i.e.,

\[
f'(\theta) = -2\cos\theta\sin^2\theta + \cos^3\theta = \cos\theta(-2\sin^2\theta + \cos^2\theta). \tag{71}
\]

Since it will be the term in brackets that will determine our critical point, we could convert the sin
function into a cos function or vice versa. If we use cos2 θ = 1 − sin2 θ, we obtain

\[
f'(\theta) = \cos\theta(1 - 3\sin^2\theta). \tag{72}
\]

We see that f ′ (θ) = 0 if either cos θ = 0 or the term in brackets vanishes. But cos θ = 0 implies that
θ = π/2, for which the volume V = 0, which will not be a maximum volume. The term in brackets
vanishes if
\[
\sin^2\theta = \frac{1}{3}. \tag{73}
\]
3

Before going on, let’s examine another way to obtain this result. We go back to Eq. (70) for the
definition of f (θ) and rewrite it as follows,

\[
f(\theta) = (1 - \sin^2\theta)\sin\theta = \sin\theta - \sin^3\theta. \tag{74}
\]

Differentiating,
\[
f'(\theta) = \cos\theta - 3\sin^2\theta\cos\theta = \cos\theta(1 - 3\sin^2\theta), \tag{75}
\]

which is in agreement with Eq. (72).


It is not necessary to solve for θ in Eq. (73). All we need to do is to find the values of sin θ and
cos θ from this equation, which is easy:
\[
\sin\theta = \sqrt{\frac{1}{3}} = \frac{1}{\sqrt{3}}, \qquad \cos^2\theta = 1 - \sin^2\theta = 1 - \frac{1}{3} = \frac{2}{3}. \tag{76}
\]

We then substitute these values into Eq. (70),


  
\[
V = \pi R^3 \left( \frac{2}{3} \right) \left( \frac{1}{\sqrt{3}} \right) = \frac{2}{3\sqrt{3}}\,\pi R^3. \tag{77}
\]

Clearly, this value of V is greater than the values V = 0 at the endpoints of the interval, so we can
comfortably conclude that this is the maximum cylinder volume.
That being said, we could once again investigate how the first derivative changes at this critical
point. We’ll express f ′ (θ) as follows,
 
\[
f'(\theta) = 3\cos\theta \left( \frac{1}{3} - \sin^2\theta \right). \tag{78}
\]

Then it can be seen that

\[
\sin^2\theta < \frac{1}{3} \;\Rightarrow\; f'(\theta) > 0, \qquad \text{and} \qquad \sin^2\theta > \frac{1}{3} \;\Rightarrow\; f'(\theta) < 0. \tag{79}
\]

Once again, the critical point is a local maximum – in fact, it is an absolute maximum.

The result in Eq. (77) is our desired result, but perhaps we can make it more meaningful if we
relate it to the volume of the hemisphere in which the cylinder is embedded. You may or may not
know (it doesn’t matter - you’ll derive this formula in MATH 138) that the volume of a sphere of
radius R is
\[
V_{\text{sphere}} = \frac{4}{3}\pi R^3, \tag{80}
\]
implying that the volume of the hemisphere is
\[
V_{\text{hemi}} = \frac{2}{3}\pi R^3. \tag{81}
\]
Let us now rewrite the maximum cylinder volume in terms of Vhemi:
\[
\frac{2}{3\sqrt{3}}\,\pi R^3 = \frac{2}{3}\pi R^3 \cdot \frac{1}{\sqrt{3}} = \frac{1}{\sqrt{3}}\,V_{\text{hemi}} \approx 0.577\,V_{\text{hemi}}. \tag{82}
\]
This gives us a better idea of the size of the cylinder.
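As a final sanity check on Example 2 (our own illustration), the following Python snippet maximizes f(θ) = cos²θ sin θ on a fine grid over [0, π/2] and compares the result with the exact values θ = arcsin(1/√3) ≈ 0.6155 and f(θ) = 2/(3√3) ≈ 0.3849:

import math

# Brute-force maximization of f(theta) = cos(theta)^2 * sin(theta) on [0, pi/2].
thetas = [k * (math.pi / 2) / 100000 for k in range(100001)]
best = max(thetas, key=lambda t: math.cos(t) ** 2 * math.sin(t))
print("grid :", best, math.cos(best) ** 2 * math.sin(best))
print("exact:", math.asin(1 / math.sqrt(3)), 2 / (3 * math.sqrt(3)))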

Example 3: Snell’s Law of Refraction


We now consider the very important phenomenon of refraction – the “bending” of light rays as
light moves from one medium to another, which is due to the difference in the velocity of light in the
two media. (This is presented as Problem No. 63 on Page 331 of the text by Stewart.)
Let v1 be the velocity of light in air and v2 the velocity of light in water. It is a fact that v1 > v2 ,
i.e., light travels faster in air than in water. Now suppose that light rays are emitted from A in all
directions. Ideally, there is only one ray of light that travels from A and hits point B – during its
travel, it is “bent downward” (toward the normal) as it passes from air to water. This phenomenon is
sketched below.
[Figure: a light ray travels from point A in air to point B in water, bending at point C on the surface; θ1 is the angle in air and θ2 the angle in water.]

The basic physical principle behind this “bending” or refraction is known as Fermat’s principle:
The path ACB taken by the ray is the one that minimizes the time required to travel from A
to B. (Note that it is the time that is minimized and not the distance. If it were the latter, then
the path would be the straight line AB.)
Our goal is to show that this minimizing path implies that the angles θ1 and θ2 are related as
follows,
\[
\frac{\sin\theta_1}{\sin\theta_2} = \frac{v_1}{v_2}. \tag{83}
\]

This is Snell’s Law of Refraction.

To prove Snell’s Law, we shall let x denote the position of point C on the x-axis, which is defined by
the horizontal water surface. Furthermore, introduce points O and P on the x-axis as shown below.
[Figure: coordinates for the refraction problem. A lies a height a above the surface and B a depth b below it; O is the origin, P lies a distance L along the surface, C is at position x, with d1 = |AC| and d2 = |CB| and angles θ1 and θ2 at C.]

We shall let O define the origin of a coordinate system and let L denote the length of the line OP .
In this way, 0 ≤ x ≤ L. Also let a and b denote the distances between, respectively, A and B to the
air-water interface, as shown in the diagram. We now wish to find the value of x that minimizes the
time T taken for the ray to travel from A to B. For convenience, let d1 = |AC| and d2 = |CB| denote
the distances travelled by the ray in, respectively, air and water. Then

\[
T = t_1 + t_2, \tag{84}
\]
where
\[
t_1 = \frac{d_1}{v_1} \quad \text{(time taken for light ray to travel from A to C)}, \tag{85}
\]
\[
t_2 = \frac{d_2}{v_2} \quad \text{(time taken for light ray to travel from C to B)}. \tag{86}
\]
In terms of x, the distances d1 and d2 become
\[
d_1 = \sqrt{a^2 + x^2}, \qquad d_2 = \sqrt{(L - x)^2 + b^2}. \tag{87}
\]

Therefore, the total time T may be expressed as a function of the variable x,

\[
T(x) = \frac{1}{v_1}\sqrt{a^2 + x^2} + \frac{1}{v_2}\sqrt{(L - x)^2 + b^2}. \tag{88}
\]

We now compute T ′ (x) in order to look for critical points,

\[
T'(x) = \frac{1}{v_1}\,\frac{x}{\sqrt{a^2 + x^2}} - \frac{1}{v_2}\,\frac{L - x}{\sqrt{(L - x)^2 + b^2}}. \tag{89}
\]

But this result may easily be expressed in terms of the angles θ1 and θ2 as follows,

\[
T'(x) = \frac{\sin\theta_1}{v_1} - \frac{\sin\theta_2}{v_2}. \tag{90}
\]

The condition for a critical point is that T ′ (x) = 0. In this case, the RHS is zero and a rearrangement
produces Snell’s Law in (83).
Even though the above method gives the desired answer, namely Snell’s law, we should check if
the critical point corresponds to a minimum or a maximum. Differentiation of the RHS in Eq. (89)
yields, after a little rearrangement,

\[
T''(x) = \frac{1}{v_1}\,\frac{a^2}{[a^2 + x^2]^{3/2}} + \frac{1}{v_2}\,\frac{b^2}{[(L - x)^2 + b^2]^{3/2}} > 0. \tag{91}
\]

Therefore, the graph of T (x) is concave upward on [0, L], implying that the critical point is a global
minimum: T (x) will decrease as x increases from 0 until it attains a minimum value at the critical
point, after which it will increase until x reaches L.
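Fermat’s principle is also easy to verify numerically. The following Python sketch (our own, with made-up values for a, b, L, v1 and v2) locates the minimizer of T(x) by bisection on T′(x), which is legitimate here because T″(x) > 0, and then confirms that sin θ1 / sin θ2 = v1/v2 at the minimizer:

import math

# Hypothetical geometry and speeds: A is a height a above the surface,
# B a depth b below it, P a distance L along the surface; v1 > v2.
a, b, L = 1.0, 2.0, 3.0
v1, v2 = 1.0, 0.75

def T_prime(x):
    # T'(x) from Eq. (89)
    return x / (v1 * math.hypot(a, x)) - (L - x) / (v2 * math.hypot(b, L - x))

# T'(0) < 0 and T'(L) > 0, and T is convex, so bisection brackets the minimum.
lo, hi = 0.0, L
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if T_prime(mid) < 0:
        lo = mid
    else:
        hi = mid

x = 0.5 * (lo + hi)
sin1 = x / math.hypot(a, x)            # sin(theta_1)
sin2 = (L - x) / math.hypot(b, L - x)  # sin(theta_2)
print("sin(theta1)/sin(theta2) =", sin1 / sin2, "   v1/v2 =", v1 / v2)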

A few notes on this refraction problem are in order before closing this section.

1. First of all, the angles θ1 and θ2 are normally displayed as the angles of incidence and refraction
with respect to the normal to the water surface at the point x, as shown below.

[Figure: θ1 and θ2 measured from the normal to the water surface, with speed v1 above and v2 below.]

Angles of incidence and refraction for light ray travelling from air to water (or vice versa).

2. Even more interesting is that an observer at A will perceive point B as being higher in the water
than it actually is, for example, at point B ′ . Likewise, an observer at B will perceive point A as
being higher than it actually is, for example at point A′ . This is sketched below.

[Figure: A′ marks the apparent position of A as seen from B, and B′ the apparent position of B as seen from A; each apparent point lies above the actual point.]

Apparent positions of each point from the other point’s perspective.

3. Finally, let us imagine a collection of light rays emanating from point B in all directions, and
where some of these will hit the line normal to the air-water interface that passes through point
A. For convenience, we’ll let this normal define the y-axis as shown below.
[Figure: rays emanating from B toward the y-axis (the normal through A), with air above and water below the surface; the critical ray is drawn with a thicker line.]

As the light rays from B become more vertical, i.e., the angle θ2 → 0+ , the y-coordinates of the
intersection points increase to infinity. This follows from the fact that the velocity of light in
air, v1 , is greater than the velocity of light in water, v2 . From Snell’s Law, it follows that

\[
\sin\theta_1 = \frac{v_1}{v_2}\sin\theta_2, \qquad \text{with } \frac{v_1}{v_2} > 1. \tag{92}
\]

Because the function sin θ is monotonically increasing on [0, π/2], it follows that θ1 > θ2. Nevertheless,
as θ2 → 0+, we see that θ1 → 0+.

Now consider the light rays that emanate from B with increasing θ2, i.e., they start out from
B travelling leftward and more horizontally. Since v1/v2 > 1, implying that θ1 > θ2, there will be a
critical value of θ2 – call it θ2* < π/2 – at which θ1 = π/2. This critical value is easily determined as
follows,
\[
\sin\frac{\pi}{2} = 1 = \frac{v_1}{v_2}\sin\theta_2^* \;\Rightarrow\; \sin\theta_2^* = \frac{v_2}{v_1} \;\Rightarrow\; \theta_2^* = \arcsin\left( \frac{v_2}{v_1} \right). \tag{93}
\]
The ray associated with this critical value θ2* is identified in the above figure with a thicker line.
At this critical value, light rays no longer travel into the air.

As you probably know, light rays that emanate from B with θ2 values higher than this critical
value θ2∗ are reflected back into the water. This is known as total internal reflection. However,
Snell’s Law is not sufficient to explain this phenomenon – we must examine the wave picture of
light to account for the reflection.

“Newton’s method” for finding the zeros of a function/roots of an
equation

(Relevant section from Stewart, Seventh Edition: Section 4.8, p. 338)

This subject is treated very well in Stewart’s textbook, Section 4.8, but these lecture notes will
provide some supplementary ideas and different perspectives which you may find interesting and
helpful.
First of all, the history of Newton’s method, also referred to as the Newton-Raphson method,
is interesting: You can find a brief outline in – where else? – Wikipedia:

http://en.wikipedia.org/wiki/Newton%27s_method

Very briefly, it was described by Isaac Newton in a book that was written in 1669 but not published
until 1711. His method, however, was different from the form used today – it was based on polynomial
approximations. By the time of the publication of his work, several other people, including Joseph
Raphson, had also published variations of the method.

The idea of the Newton-Raphson method is to find successive approximations xn of a zero x̄ of a
function f(x), i.e., f(x̄) = 0. As n increases, it is hoped that the xn values provide better and better
approximations to x̄ – in other words, that they converge to x̄, i.e.,
\[
\lim_{n \to \infty} x_n = \bar{x}. \tag{94}
\]

As we’ll see, these approximations xn are produced by the iteration of a function that we shall call
the Newton function. There are advantages and disadvantages to this approach, however.

We begin with a generic illustration of the problem, as shown in the figure below. From the graph
of the function f , we see that it has a zero x̄. Our goal is to start with a “reasonable” approximation
or guess of x̄ – we’ll call it x0 – and then try to come up with a better approximation x1 . Hopefully,
x1 is closer to x̄ than x0 is.
Starting with our initial guess x0 , we evaluate f (x0 ) and then construct the tangent line to the
graph of f at the point (x0 , f (x0 )), as shown in the figure below. Recall that this is the linearization

of f at x0, given by the formula,
\[
L_{x_0}(x) = f(x_0) + f'(x_0)(x - x_0). \tag{95}
\]

Of course, we must assume that f ′ (x0 ) exists – for this reason, we’ll assume that f ′ (x) exists for all
x, or at least for all x in an interval of suitable length containing the zero x̄.
[Figure: the graph y = f(x) with zero x̄; the tangent line y = L_{x0}(x) at x0 crosses the x-axis at x1, which lies between x̄ and x0.]

Our next approximation x1 is taken to be the point where the linearization Lx0(x) intersects the
x-axis, as shown in the above figure. We can easily solve for this point:

\[
L_{x_0}(x_1) = 0 \;\Rightarrow\; f(x_0) + f'(x_0)(x_1 - x_0) = 0. \tag{96}
\]

We now solve for x1:
\[
x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}. \tag{97}
\]
Of course, we may now continue the procedure, using x1 as our new “guess” and producing the next
approximation, x2 , by constructing the linearization of f at x1 , as sketched below. The result is
\[
x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}. \tag{98}
\]
We may keep repeating this procedure to produce a sequence of approximations x0 , x1 , x2 , · · · . In
general, if we know element xn , then the next approximation is given by
\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \tag{99}
\]

At this point it is convenient to summarize the Newton-Raphson procedure as follows: Given
a function f with a zero x̄, i.e., f(x̄) = 0, that we wish to approximate, we start with an initial
[Figure: two steps of the procedure; the tangent lines at x0 and then at x1 produce successive approximations x1 and x2 approaching x̄.]

approximation x0 and then perform the iteration process,

\[
x_{n+1} = N(x_n), \qquad \text{where } N(x) = x - \frac{f(x)}{f'(x)}. \tag{100}
\]

We shall refer to N (x) as the Newton function associated with the function f .
Eq. (100) represents the iteration of a function. If we consider x0 as the “input” into the
Newton function N(x), the associated “output” is x1. We then use x1 as an “input” and put it back
into the Newton function to produce the “output” x2 . The procedure is then repeated. We’ll return
to this concept of the iteration of a function a little later.
Note the appearance of f ′ (x) in the denominator of the Newton function in (100). This means
that we should avoid critical points in our iteration procedure. It would be desirable if no critical
points exist in our interval I that contains the zero x̄. This includes the zero x̄ itself. (If f(x̄) = 0 and
f′(x̄) = 0, then x̄ is a multiple zero of f, e.g., x = 2 is a double zero of f(x) = (x − 2)².)
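The iteration in Eq. (100) is straightforward to carry out on a computer. Here is a minimal Python sketch (our own helper, not from the text): the function and its derivative are passed in explicitly, and the iteration stops once successive iterates agree to within a tolerance.

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Iterate the Newton function N(x) = x - f(x)/f'(x), starting from x0."""
    x = x0
    for _ in range(max_iter):
        fp = fprime(x)
        if fp == 0:
            raise ZeroDivisionError("the iteration hit a critical point of f")
        x_next = x - f(x) / fp
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Example: the positive zero of f(x) = x^2 - 2 (the sqrt(2) computation of
# Example 1 below), starting from x0 = 2.
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 2.0))  # 1.4142135623730951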

There are a good number of illustrative examples in Stewart’s textbook, so we’ll consider only a
couple here.


Example 1: Use Newton’s method to estimate √2.

An appropriate function to use here is f(x) = x² − 2, the zeros of which are ±√2. It is sketched
below. The Newton function N(x) associated with this function is
\[
N(x) = x - \frac{f(x)}{f'(x)} = x - \frac{x^2 - 2}{2x}. \tag{101}
\]

[Plot: the parabola y = x² − 2 on [−2, 2], crossing the x-axis at ±√2.]

We could leave N (x) in this form or simplify it further as follows,

\[
N(x) = x - \frac{x^2}{2x} + \frac{2}{2x} = \frac{x}{2} + \frac{1}{x}. \tag{102}
\]

We’ll use this final form. Newton’s method then becomes the iteration procedure

\[
x_{n+1} = \frac{x_n}{2} + \frac{1}{x_n} \tag{103}
\]

for some starting point x0. Knowing that √2 lies between 1 and 2, let’s start with x0 = 2. Then, to
nine decimal digits,

\[
\begin{aligned}
x_1 &= \frac{x_0}{2} + \frac{1}{x_0} = 1 + \frac{1}{2} = \frac{3}{2} = 1.5 \\
x_2 &= \frac{x_1}{2} + \frac{1}{x_1} = \frac{3}{4} + \frac{2}{3} = \frac{17}{12} = 1.41\overline{6} \\
x_3 &= \frac{x_2}{2} + \frac{1}{x_2} \approx 1.414215686 \\
x_4 &= \frac{x_3}{2} + \frac{1}{x_3} \approx 1.414213562 \\
x_5 &= \frac{x_4}{2} + \frac{1}{x_4} \approx 1.414213562.
\end{aligned} \tag{104}
\]

Notice that, to nine decimal digits, there is no difference between x4 and x5, so there is no point in
going further, at least if we are displaying results to nine decimal digits. And, indeed, the result x4
is √2 to nine decimal digits. That was fast!

Let’s now start with x0 = 1, on the other side of √2. Then

\[
x_1 = \frac{x_0}{2} + \frac{1}{x_0} = \frac{1}{2} + 1 = 1.5. \tag{105}
\]

Note that this value of x1 is the same as was produced by starting with x0 = 2, so we know the result
of this iteration procedure. This doesn’t always happen – you may wish to investigate the graph of
f (x) a little more closely to see why it happens in this particular case.

It can be shown – but we won’t do it here – that for any x0 > 0, the Newton iteration sequence
{xn} converges to √2. Of course, the point x = 0 is to be avoided, since f′(0) = 0. And if we start
with x0 < 0, the Newton iteration sequence, as you may expect, will converge to −√2, the other zero
of f(x) = x² − 2. Therefore, the critical point of f(x), x = 0, serves as the boundary between the
regions x > 0 and x < 0 that are associated with the respective zeros √2 and −√2.

A further note: With the above example in mind, the set of points x ∈ R which, when used as
starting points for Newton’s method, converge to a given zero x̄ of a function f is known as the basin
of attraction of x̄. In the above example:

1. The basin of attraction of the zero x̄1 = √2 is the set (0, ∞).

2. The basin of attraction of the zero x̄2 = −√2 is the set (−∞, 0).

Finally, we mention that in the above example, you may have noticed that since our starting points
x0 were integers, therefore rational numbers, then the first few iterates were also rational numbers. In
fact, from a look at the Newton function in Eq. (103), we see that if xn is rational, then so is xn+1 .
In other words, the Newton function N (x) maps rational numbers to rational numbers. Therefore, if
we start with a rational number x0, all higher iterates xn are rational. Newton’s method therefore
generates a sequence of rational approximations xn to the root x̄ – in this case the irrational number
√2 – which converge to it in the limit n → ∞.

Example 2: This is probably belabouring the point, but let’s use Newton’s method to estimate
p = 7^{1/5}. Since p⁵ = 7, the appropriate function to use here is f(x) = x⁵ − 7. The Newton function
associated with f(x) is
\[
N(x) = x - \frac{f(x)}{f'(x)} = x - \frac{x^5 - 7}{5x^4}, \tag{106}
\]

which can be simplified to
\[
N(x) = \frac{4x}{5} + \frac{7}{5x^4}. \tag{107}
\]
Newton’s method then becomes the iteration procedure

\[
x_{n+1} = \frac{4x_n}{5} + \frac{7}{5x_n^4} \tag{108}
\]

for some starting point x0 . We note that f (1) = −6 and f (2) = 25. If we start with x0 = 2, then the
first few iterates xn are, to nine decimal places,

\[
\begin{aligned}
x_1 &= 1.6875 \\
x_2 &= 1.522644564 \\
x_3 &= 1.478571365 \\
x_4 &= 1.475783733 \\
x_5 &= 1.475773161 \\
x_6 &= 1.475773162 = 7^{1/5} \ \text{to nine decimal digits.}
\end{aligned} \tag{109}
\]

The convergence was perhaps not as rapid as the previous example, but still quite good.

Example 3: This example can be found in the textbook (Example 3 on p. 337). It was simply
introduced in the lecture, but we’ll come back to it a little later in these notes. The problem is to find
the root of the equation
\[
\cos x = x. \tag{110}
\]

We must rewrite this equation in terms of a function f(x), the zero(s) of which will be the solution
of the equation. In this case,
\[
f(x) = \cos x - x \tag{111}
\]

will do. (We could also use f (x) = x − cos x.) The Newton function N (x) associated with f (x) is

\[
N(x) = x - \frac{f(x)}{f'(x)} = x - \frac{\cos x - x}{-\sin x - 1} = x + \frac{\cos x - x}{\sin x + 1}. \tag{112}
\]

The solution of Eq. (110) represents the intersection of the graphs of the functions cos x and x. From
an examination of these graphs (see textbook, p. 338), it would seem that x0 = 1 is a reasonable

choice for an initial approximation. We then find, using nine decimal digits accuracy, that

\[
\begin{aligned}
x_1 &= 0.7503638679 \\
x_2 &= 0.7391128909 \\
x_3 &= 0.7390851334 \\
x_4 &= 0.7390851332 \\
x_5 &= 0.7390851332.
\end{aligned} \tag{113}
\]

Therefore, to nine decimal places, the root of Eq. (110) is x = 0.7390851332.
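In code, this computation is a few lines; here is a self-contained Python version of the iteration (our own illustration), using the simplified form of N(x) from Eq. (112):

import math

# Newton iteration for f(x) = cos(x) - x, starting from x0 = 1.
x = 1.0
for _ in range(5):
    x = x + (math.cos(x) - x) / (math.sin(x) + 1)  # N(x) from Eq. (112)
print(round(x, 10))  # 0.7390851332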

Some problems with Newton’s method

Newton’s method is based on local analysis – the behaviour of iterates of the Newton function N (x)
in a sufficiently small neighbourhood of a simple zero x̄ of a function f (x). It works extremely well if
we start close to a zero. But that’s about it – even though you think that you may be close to the
zero of a function, you may not be close enough, and Newton’s method may not necessarily take you
there. As such, care must be exercised when using Newton’s method.
Here is an illustration of what can go wrong. In the figure below is sketched the graph of a function
with several zeros. If you start with the seed point x0 which is quite close to the leftmost zero in the
figure, x̄1 , then Newton’s method will take you to x̄1 . But if you start with the more distant seed
point p0 , the point p1 = N (p0 ) actually jumps over the first two zeros, to be in a position to converge
to the third zero, x̄3 . Graphically, of course, we see the problem – as we move leftward away from the
zero x̄1 , the graph of the function begins to level off, i.e., veer more toward the left.

[Figure: a function with three zeros x̄1 < x̄2 < x̄3; the seed x0 close to x̄1 converges to x̄1, while the distant seed p0 is mapped to p1, beyond x̄2, and converges to x̄3; a nearby seed q0 lands close to x̄2.]

And, indeed, the situation is even more complicated. Between p0 and x0, there is a region over
which the slope of the graph of f(x) is such that starting points, such as q0, will be mapped very close
to the second zero, x̄2.
As a result, the basins of attraction of these three zeros are not simple intervals on the real line
– they will be comprised of subintervals. As such, the basins of attraction of the three zeros sketched
above will be intermingled with each other.
And then follows the natural question: What is the boundary of these basins of attraction? In
the case of three or more roots, this is an extremely interesting question that is certainly beyond the
scope of this course. To understand this problem, one must actually extend Newton’s method into the
complex plane. Very briefly, for the case of three or more zeros, the boundary separating the basins
of attraction of the zeros has a fractal structure. There is a remarkable result involving the theory
of complex functions stating that if you take a point z from such a boundary, and draw a tiny circle
of radius ε > 0 around it in the complex plane, then this circular region must contain points from all
basins of attraction! To get an idea of the fascinating structure of these sets, you are invited to look
at the following research paper,

E.R. Vrscay, Julia Sets and Mandelbrot-like Sets Associated With Higher Order Schröder
Rational Iteration Functions: A Computer-Assisted Study, Mathematics of Computation,
Vol. 46, No. 173, 151-169 (1986).

a copy of which has been posted after this set of lecture notes on UW-ACE.

Newton’s method as an iteration procedure, and the role of “fixed points”

As discussed earlier, Newton’s method is an iteration procedure. Given a function f (x), the zero(s)
of which we are interested in approximating, its associated Newton function N (x) is given by

\[
N(x) = x - \frac{f(x)}{f'(x)}. \tag{114}
\]

Given a seed point x0 sufficiently close to a zero x̄ of f , we form the iteration sequence, x1 , x2 , · · · , as
follows,
\[
x_{n+1} = N(x_n). \tag{115}
\]

If the method “works,” then the iterates xn approach the zero x̄, i.e.,

\[
\lim_{n \to \infty} x_n = \bar{x}. \tag{116}
\]

There is something deeper going on here. What happens if we set x0 = x̄, the zero of f (x)? In this
case,
\[
x_1 = N(x_0) = N(\bar{x}) = \bar{x} - \frac{f(\bar{x})}{f'(\bar{x})} = \bar{x}, \tag{117}
\]
since f (x̄) = 0. In summary,
\[
\bar{x} = N(\bar{x}), \tag{118}
\]

i.e., N maps the point x̄ to itself. Such a point is known as a fixed point.

Fixed points play an extremely important role in mathematics, both from a theoretical as well
as a practical, especially computational, viewpoint. Many algorithms for the solution of problems are
based on the fact that an iteration procedure will produce a sequence of iterates xn that converge to
a fixed point x̄ of a function.

In the Newton iteration procedure outlined above, the zero x̄ of the function f (x) was seen to be
a fixed point of the Newton function N (x). Moreover, the fixed point x̄ is attractive since, according
to Eq. (116), the iterates xn converge to it.
For the sake of completeness, we provide a more precise definition of an attractive fixed point.
We’ll consider a function g(x) instead of f (x), in order to avoid any confusion with the function f (x)
used earlier.

A fixed point p of a function g(x) is a point for which

\[
g(p) = p. \tag{119}
\]

Moreover, the fixed point p is attractive if there exists an open interval I which contains
p and for which
\[
|g(x) - p| < |x - p| \quad \text{for all } x \in I. \tag{120}
\]

In other words, for any x ∈ I, the point g(x) is closer to p than x is.

Graphically, the situation is sketched below. The fixed point p is the intersection of the graph of
g(x), i.e., y = g(x) and the line y = x. For an x ∈ I, the distance |g(x) − p| marked on the y-axis is
less than the distance |x − p| marked on the x-axis.

[Figure: the graphs y = g(x) and y = x crossing at the fixed point p; for x ∈ I, the vertical distance |g(x) − p| is smaller than the horizontal distance |x − p|.]

An attractive fixed point p = g(p)

The astute reader may start to wonder if the slope of the function g(x) around the point p has
something to do with the attractiveness of the fixed point p. If the magnitude of the slope were too
large, then the point g(x) could actually be repelled from p. The answer to this conjecture is yes.
We state the result a little more formally below.

Let g be a function with continuous first derivative, i.e., the function g′ (x) is a continuous
function of x. Furthermore, suppose that p is a fixed point of g and |g′ (p)| < 1. Then p is
an attractive fixed point of g (as defined earlier).

This result, which may be proved with the help of the Intermediate and Mean Value Theorems,
is left as an exercise. (It is posted as a bonus problem in the current assignment (No. 9).)
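To see an attractive (but not superattractive) fixed point in action, here is a small Python illustration (our own): iterating g(x) = cos x converges to the fixed point p ≈ 0.7390851332, the same number found in Example 3, since |g′(p)| = |−sin p| ≈ 0.67 < 1. Note how much slower this plain iteration is than Newton’s method, which needed only four steps.

import math

# Plain fixed-point iteration x_{n+1} = g(x_n) with g(x) = cos(x).
# |g'(p)| ~ 0.67 < 1 at the fixed point, so the iterates converge, but slowly:
# the error shrinks by a factor of only about 0.67 per step.
x = 1.0
for n in range(100):
    x = math.cos(x)
print(x)  # ~0.7390851332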

Returning to the Newton function N(x) associated with a function f(x), the situation is even
better than that of an attractive fixed point. Let’s compute the derivative of N(x) at the fixed point
x̄, i.e., N′(x̄):
 
\[
\begin{aligned}
N'(x) &= 1 - \frac{d}{dx}\!\left[ f(x) \cdot \frac{1}{f'(x)} \right] \\
      &= 1 - \frac{f'(x)}{f'(x)} + f(x)\,\frac{f''(x)}{[f'(x)]^2} \\
      &= \frac{f(x)\,f''(x)}{[f'(x)]^2}.
\end{aligned} \tag{121}
\]
Recalling that f(x̄) = 0, this implies that

\[
N'(\bar{x}) = 0. \tag{122}
\]

Obviously, |N′(x̄)| < 1, making x̄ an attractive fixed point. But the fact that N′(x̄) = 0 actually
makes it superattractive. We’ll simply state the fact that there exists an interval I containing x̄,
and a constant K ≥ 0 such that
\[
|N(x) - \bar{x}| \le K|x - \bar{x}|^2 \qquad \text{for all } x \in I. \tag{123}
\]

This means that if the Newton iterate xn lies a distance ε from x̄, i.e., |xn − x̄| = ε, then the next
Newton iterate xn+1 = N(xn) lies a distance of less than Kε² from x̄. For ε very small, this is a
significant improvement. Because the error is a multiple of ε², this behaviour is known as quadratic
convergence. If you go back and look at the iterates involved in some of our early illustrative
examples of Newton’s method, you’ll see the quadratic convergence at work. This convergence is
much faster than the 2⁻ⁿ convergence of the bisection method examined earlier in this course.
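The squaring of the error is easy to observe numerically. In this short Python illustration (our own), the error of the √2 iteration from Eq. (103) is printed at each step; each error is roughly a constant multiple of the square of the previous one, so the number of correct digits roughly doubles per step:

import math

# Newton iteration for sqrt(2): the errors e_n = |x_n - sqrt(2)| square at each step.
x, root = 2.0, math.sqrt(2.0)
for n in range(5):
    print(f"n = {n}: error = {abs(x - root):.3e}")
    x = x / 2 + 1 / x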

Some final comments:

1. A proof of quadratic convergence for Newton’s method can be performed using Taylor series
expansion of the Newton function N (x) about the fixed point x̄. You’ll learn more about
Taylor series in MATH 138. But that being said, you can prove quadratic convergence using the
quadratic approximation to Newton’s method – you have already encountered the quadratic
approximation to a function g(x) in this course.

2. The iteration of a function has an interesting graphical interpretation, which was introduced very
briefly in the lecture. For more information, you may consult the supplementary notes, “Graph-
ical interpretation of iteration,” which are posted after this week’s lecture notes on Waterloo
LEARN.

Lecture 29

Antiderivatives

(Relevant section from Stewart, Seventh Edition: Section 4.9)

We have already discussed the idea of antiderivatives earlier in this course, because of the need to
relate the velocity function v(t) = x′ (t) to the position function x(t). Obviously, the velocity function
v(t) is the derivative of the position function x(t). But in reverse, since the derivative of the position
function x(t) is the velocity function v(t), x(t) is the antiderivative of v(t).
As a particular example, given the function f(x) = x², the function F(x) = (1/3)x³ is an
antiderivative of f(x) since F′(x) = f(x). As we proved in an earlier lecture, the set of all antiderivatives
of x² may be given by the set of functions,

\[
\frac{1}{3}x^3 + C, \qquad \text{where } C \text{ is an arbitrary constant.} \tag{124}
\]

This is a one-parameter family of functions.


As you may already know, we may summarize this example as follows:

\[
\int x^2\,dx = \frac{1}{3}x^3 + C. \tag{125}
\]

The indefinite integral on the LHS of the above equation represents the general antiderivative
of the function x². And the RHS is the set of all such antiderivatives.

You are encouraged to read Section 4.9 in detail and study the examples discussed therein.

Integration

Definite integrals

(Relevant sections from Stewart, Seventh Edition: Sections 5.1 and 5.2)

The material presented in this lecture closely follows the presentation of Section 5.1 of the textbook
and therefore will not be reproduced here. Please read this and Section 5.2 for an excellent discussion
of integration along with numerous helpful examples.
Here is a very brief summary of what was covered in the lecture:
Let f(x) be a continuous (or piecewise continuous) function defined over the interval [a, b]. We
assumed that f(x) was positive, i.e., f(x) > 0 on [a, b], and that the goal was to find the area A of the
region in the plane enclosed by the graph of f(x), the x-axis and the lines x = a and x = b.
We first divided this region into n strips of equal width ∆x = (b − a)/n. This is conveniently done by
introducing the following set of partition points xk on the interval [a, b]:
\[
x_k = a + k\,\Delta x = a + k \cdot \frac{b - a}{n}. \tag{126}
\]

In this way the endpoints of the interval [a, b] are x0 = a and xn = b. We also let Ik = [xk−1, xk]
denote the kth subinterval produced by these partition points. The area Ak lies above the interval Ik.
For each interval Ik , 1 ≤ k ≤ n, pick a sample point x∗k ∈ Ik . This sample point could be the left
endpoint of Ik , i.e., xk−1 , the right endpoint xk or even the midpoint of Ik . (In fact, in computations,
it might be desirable to use the midpoint.)
Then evaluate the function f (x) at this sample point, i.e., compute f (x∗k ). The idea is that the
area of the kth strip is now approximated by the rectangle of width ∆x and height f (x∗k ), i.e.,

\[
A_k \approx f(x_k^*)\,\Delta x, \qquad 1 \le k \le n. \tag{127}
\]

Therefore, the total area A is approximated as follows:
\[
A = \sum_{k=1}^{n} A_k \approx \sum_{k=1}^{n} f(x_k^*)\,\Delta x. \tag{128}
\]

The sum on the right-hand side, which we’ll denote as
\[
S_n = \sum_{k=1}^{n} f(x_k^*)\,\Delta x, \tag{129}
\]

is known as a Riemann sum. It is an approximation to the true area. Note the subscript n which
indicates the number of subintervals that are being used to approximate the area.
We now claim that in the limit n → ∞, which implies that ∆x → 0, the Riemann sums Sn
converge to a limit, and the limit will be the desired area A, i.e.,

\[
A = \lim_{n \to \infty} S_n. \tag{130}
\]

This limit will be independent of the choice of sample points x*k ∈ Ik employed. This seems appropriate,
since the width ∆x of the subintervals Ik tends to zero as n → ∞.
In general, the function f (x) does not have to be positive over the interval [a, b], in which case this
procedure does not yield the area, but rather a signed area. But more on this later. The final result
is that if f (x) is a piecewise continuous function on the interval [a, b], then the sequence of Riemann
sums defined in Eq. (129) converges to a limit, denoted as follows,
\[
\lim_{n \to \infty} S_n = \lim_{n \to \infty} \sum_{k=1}^{n} f(x_k^*)\,\Delta x = \int_a^b f(x)\,dx. \tag{131}
\]

The quantity on the right represents the so-called Riemann integral of the function f (x) over the
interval [a, b].
The Riemann integral of a function will have many interesting applications in physics, as we’ll see
very shortly.
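To make the construction concrete, here is a small Python sketch (our own illustration) that evaluates the Riemann sum Sn for f(x) = x² on [0, 1] using midpoint sample points; the sums approach the exact value 1/3 as n grows:

def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum S_n for f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    # Sample point: the midpoint of I_k = [x_{k-1}, x_k].
    return sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1)) * dx

for n in [10, 100, 1000]:
    print(n, riemann_sum(lambda x: x * x, 0.0, 1.0, n))  # -> 1/3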
