Street-Fighting Mathematics

The Art of Educated Guessing and

Opportunistic Problem Solving

Sanjoy Mahajan

Foreword by Carver A. Mead

The MIT Press

Cambridge, Massachusetts

London, England

© 2010 by Sanjoy Mahajan

Foreword © 2010 by Carver A. Mead

Street-Fighting Mathematics: The Art of Educated Guessing and Opportunistic

Problem Solving by Sanjoy Mahajan (author), Carver A. Mead (foreword),

and MIT Press (publisher) is licensed under the Creative Commons

Attribution–Noncommercial–Share Alike 3.0 United States License.

A copy of the license is available at

http://creativecommons.org/licenses/by-nc-sa/3.0/us/

For information about special quantity discounts, please email

special_sales@mitpress.mit.edu

Typeset in Palatino and Euler by the author using ConTeXt and PDFTeX

Library of Congress Cataloging-in-Publication Data

Mahajan, Sanjoy, 1969–

Street-ﬁghting mathematics : the art of educated guessing and

opportunistic problem solving / Sanjoy Mahajan ; foreword by

Carver A. Mead.

p. cm.

Includes bibliographical references and index.

ISBN 978-0-262-51429-3 (pbk. : alk. paper) 1. Problem solving.

2. Hypothesis. 3. Estimation theory. I. Title.

QA63.M34 2010

510—dc22

2009028867

Printed and bound in the United States of America

10 9 8 7 6 5 4 3 2 1

For Juliet

Contents

Foreword

Preface

1 Dimensions

1.1 Economics: The power of multinational corporations

1.2 Newtonian mechanics: Free fall

1.3 Guessing integrals

1.4 Summary and further problems

2 Easy cases

2.1 Gaussian integral revisited

2.2 Plane geometry: The area of an ellipse

2.3 Solid geometry: The volume of a truncated pyramid

2.4 Fluid mechanics: Drag

2.5 Summary and further problems

3 Lumping

3.1 Estimating populations: How many babies?

3.2 Estimating integrals

3.3 Estimating derivatives

3.4 Analyzing differential equations: The spring–mass system

3.5 Predicting the period of a pendulum

3.6 Summary and further problems

4 Pictorial proofs

4.1 Adding odd numbers

4.2 Arithmetic and geometric means

4.3 Approximating the logarithm

4.4 Bisecting a triangle

4.5 Summing series

4.6 Summary and further problems

5 Taking out the big part

5.1 Multiplication using one and few

5.2 Fractional changes and low-entropy expressions

5.3 Fractional changes with general exponents

5.4 Successive approximation: How deep is the well?

5.5 Daunting trigonometric integral

5.6 Summary and further problems

6 Analogy

6.1 Spatial trigonometry: The bond angle in methane

6.2 Topology: How many regions?

6.3 Operators: Euler–MacLaurin summation

6.4 Tangent roots: A daunting transcendental sum

6.5 Bon voyage

Bibliography

Index

Foreword

Most of us took mathematics courses from mathematicians—Bad Idea!

Mathematicians see mathematics as an area of study in its own right.

The rest of us use mathematics as a precise language for expressing relationships among quantities in the real world, and as a tool for deriving quantitative conclusions from these relationships. For that purpose, mathematics courses, as they are taught today, are seldom helpful and are often

downright destructive.

As a student, I promised myself that if I ever became a teacher, I would

never put a student through that kind of teaching. I have spent my life

trying to ﬁnd direct and transparent ways of seeing reality and trying to

express these insights quantitatively, and I have never knowingly broken

my promise.

With rare exceptions, the mathematics that I have found most useful was

learned in science and engineering classes, on my own, or from this book.

Street-Fighting Mathematics is a breath of fresh air. Sanjoy Mahajan teaches

us, in the most friendly way, tools that work in the real world. Just when

we think that a topic is obvious, he brings us up to another level. My

personal favorite is the approach to the Navier–Stokes equations: so nasty

that I would never even attempt a solution. But he leads us through one,

gleaning gems of insight along the way.

In this little book are insights for every one of us. I have personally

adopted several of the techniques that you will ﬁnd here. I recommend

it highly to every one of you.

—Carver Mead

Preface

Too much mathematical rigor teaches rigor mortis: the fear of making

an unjustiﬁed leap even when it lands on a correct result. Instead of

paralysis, have courage—shoot ﬁrst and ask questions later. Although

unwise as public policy, it is a valuable problem-solving philosophy, and

it is the theme of this book: how to guess answers without a proof or an

exact calculation.

Educated guessing and opportunistic problem solving require a toolbox.

A tool, to paraphrase George Polya, is a trick I use twice. This book

builds, sharpens, and demonstrates tools useful across diverse ﬁelds of

human knowledge. The diverse examples help separate the tool—the

general principle—from the particular applications so that you can grasp

and transfer the tool to problems of particular interest to you.

The examples used to teach the tools include guessing integrals without integrating, refuting a common argument in the media, extracting

physical properties from nonlinear diﬀerential equations, estimating drag

forces without solving the Navier–Stokes equations, ﬁnding the shortest

path that bisects a triangle, guessing bond angles, and summing inﬁnite

series whose every term is unknown and transcendental.

This book complements works such as How to Solve It [37], Mathematics

and Plausible Reasoning [35, 36], and The Art and Craft of Problem Solving

[49]. They teach how to solve exactly stated problems exactly, whereas life

often hands us partly deﬁned problems needing only moderately accurate

solutions. A calculation accurate only to a factor of 2 may show that

a proposed bridge would never be built or a circuit could never work.

The eﬀort saved by not doing the precise analysis can be spent inventing

promising new designs.

This book grew out of a short course of the same name that I taught

for several years at MIT. The students varied widely in experience: from

first-year undergraduates to graduate students ready for careers in research and teaching. The students also varied widely in specialization: from physics, mathematics, and management to electrical engineering,

computer science, and biology. Despite or because of the diversity, the

students seemed to beneﬁt from the set of tools and to enjoy the diversity

of illustrations and applications. I wish the same for you.

How to use this book

Aristotle was tutor to the young Alexander of Macedon (later, Alexander

the Great). As ancient royalty knew, a skilled and knowledgeable tutor is

the most eﬀective teacher [8]. A skilled tutor makes few statements and

asks many questions, for she knows that questioning, wondering, and

discussing promote long-lasting learning. Therefore, questions of two

types are interspersed through the book.

Questions marked in the margin: These questions are what a tutor

might ask you during a tutorial, and ask you to work out the next steps

in an analysis. They are answered in the subsequent text, where you can

check your solutions and my analysis.

Numbered problems: These problems, marked with a shaded background,

are what a tutor might give you to take home after a tutorial. They ask

you to practice the tool, to extend an example, to use several tools together,

and even to resolve (apparent) paradoxes.

Try many questions of both types!

Copyright license

This book is licensed under the same license as MIT’s OpenCourseWare: a

Creative Commons Attribution-Noncommercial-Share Alike license. The

publisher and I encourage you to use, improve, and share the work noncommercially, and we will gladly receive any corrections and suggestions.

Acknowledgments

I gratefully thank the following individuals and organizations.

For the title: Carl Moyer.

For editorial guidance: Katherine Almeida and Robert Prior.

For sweeping, thorough reviews of the manuscript: Michael Gottlieb, David

Hogg, David MacKay, and Carver Mead.

For being inspiring teachers: John Allman, Arthur Eisenkraft, Peter Goldreich, John Hopfield, Jon Kettenring, Geoffrey Lloyd, Donald Knuth, Carver

Mead, David Middlebrook, Sterl Phinney, and Edwin Taylor.

For many valuable suggestions and discussions: Shehu Abdussalam, Daniel

Corbett, Dennis Freeman, Michael Godfrey, Hans Hagen, Jozef Hanc, Taco

Hoekwater, Stephen Hou, Kayla Jacobs, Aditya Mahajan, Haynes Miller,

Elisabeth Moyer, Hubert Pham, Benjamin Rapoport, Rahul Sarpeshkar,

Madeleine Sheldon-Dante, Edwin Taylor, Tadashi Tokieda, Mark Warner,

and Joshua Zucker.

For advice on the process of writing: Carver Mead and Hillary Rettig.

For advice on the book design: Yasuyo Iguchi.

For advice on free licensing: Daniel Ravicher and Richard Stallman.

For the free software used for calculations: Fredrik Johansson (mpmath), the

Maxima project, and the Python community.

For the free software used for typesetting: Hans Hagen and Taco Hoekwater (ConTeXt); Han The Thanh (PDFTeX); Donald Knuth (TeX); John Hobby (MetaPost); John Bowman, Andy Hammerlindl, and Tom Prince (Asymptote); Matt Mackall (Mercurial); Richard Stallman (Emacs); and the Debian GNU/Linux project.

For supporting my work in science and mathematics teaching: The Whitaker

Foundation in Biomedical Engineering; the Hertz Foundation; the Master

and Fellows of Corpus Christi College, Cambridge; the MIT Teaching

and Learning Laboratory and the Oﬃce of the Dean for Undergraduate

Education; and especially Roger Baker, John Williams, and the Trustees

of the Gatsby Charitable Foundation.

Bon voyage

As our ﬁrst tool, let’s welcome a visitor from physics and engineering:

the method of dimensional analysis.

1 Dimensions

1.1 Economics: The power of multinational corporations

1.2 Newtonian mechanics: Free fall

1.3 Guessing integrals

1.4 Summary and further problems

Our ﬁrst street-ﬁghting tool is dimensional analysis or, when abbreviated,

dimensions. To show its diversity of application, the tool is introduced

with an economics example and sharpened on examples from Newtonian

mechanics and integral calculus.

1.1 Economics: The power of multinational corporations

Critics of globalization often make the following comparison [25] to prove

the excessive power of multinational corporations:

In Nigeria, a relatively economically strong country, the GDP [gross domestic

product] is $99 billion. The net worth of Exxon is $119 billion. “When multinationals have a net worth higher than the GDP of the country in which they

operate, what kind of power relationship are we talking about?” asks Laura

Morosini.

Before continuing, explore the following question:

What is the most egregious fault in the comparison between Exxon and Nigeria?

The ﬁeld is competitive, but one fault stands out. It becomes evident after

unpacking the meaning of GDP. A GDP of $99 billion is shorthand for

a monetary ﬂow of $99 billion per year. A year, which is the time for

the earth to travel around the sun, is an astronomical phenomenon that

has been arbitrarily chosen for measuring a social phenomenon—namely,

monetary ﬂow.

Suppose instead that economists had chosen the decade as the unit of

time for measuring GDP. Then Nigeria’s GDP (assuming the ﬂow remains

steady from year to year) would be roughly $1 trillion per decade and

be reported as $1 trillion. Now Nigeria towers over Exxon, whose puny

assets are a mere one-tenth of Nigeria’s GDP. To deduce the opposite

conclusion, suppose the week were the unit of time for measuring GDP.

Nigeria’s GDP becomes $2 billion per week, reported as $2 billion. Now

puny Nigeria stands helpless before the mighty Exxon, 50-fold larger than

Nigeria.
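The unit dependence can be made concrete with a few lines of Python. This is only a sketch using the figures quoted above; the helper names are made up for illustration, and 52 weeks per year is a rounding assumption. The text rounds the weekly figure to $2 billion and the ratio to 50-fold, so the exact outputs differ slightly from those rounded values.

```python
# Nigeria's GDP and Exxon's net worth, using the figures quoted in the text.
GDP_PER_YEAR = 99e9        # dollars per year (a flow)
EXXON_NET_WORTH = 119e9    # dollars (an amount)

# Years per reporting period; 52 weeks per year is a rounding assumption.
YEARS_PER_PERIOD = {"week": 1 / 52, "year": 1.0, "decade": 10.0}


def reported_gdp(period: str) -> float:
    """GDP as it would be reported if `period` were the unit of time."""
    return GDP_PER_YEAR * YEARS_PER_PERIOD[period]


def apparent_ratio(period: str) -> float:
    """The dimensionally invalid 'ratio' of net worth to reported GDP."""
    return EXXON_NET_WORTH / reported_gdp(period)


if __name__ == "__main__":
    for period in YEARS_PER_PERIOD:
        print(f"per {period:>6}: GDP reported as ${reported_gdp(period):.3g}, "
              f"Exxon/Nigeria = {apparent_ratio(period):.3g}")
```

The same two physical facts give a "ratio" below 1 or far above 1 depending only on the chosen period, which is the signature of comparing quantities with different dimensions.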

A valid economic argument cannot reach a conclusion that depends on

the astronomical phenomenon chosen to measure time. The mistake lies

in comparing incomparable quantities. Net worth is an amount: It has

dimensions of money and is typically measured in units of dollars. GDP,

however, is a ﬂow or rate: It has dimensions of money per time and

typical units of dollars per year. (A dimension is general and independent

of the system of measurement, whereas the unit is how that dimension is

measured in a particular system.) Comparing net worth to GDP compares

a monetary amount to a monetary ﬂow. Because their dimensions diﬀer,

the comparison is a category mistake [39] and is therefore guaranteed to

generate nonsense.

Problem 1.1 Units or dimensions?

Are meters, kilograms, and seconds units or dimensions? What about energy,

charge, power, and force?

A similarly ﬂawed comparison is length per time (speed) versus length:

“I walk $1.5\,\mathrm{m\,s^{-1}}$, much smaller than the Empire State building in New York, which is 300 m high.” It is nonsense. To produce the opposite but still nonsense conclusion, measure time in hours: “I walk 5400 m/hr, much larger than the Empire State building, which is 300 m high.”

I often see comparisons of corporate and national power similar to our

Nigeria–Exxon example. I once wrote to one author explaining that I

sympathized with his conclusion but that his argument contained a fatal

dimensional mistake. He replied that I had made an interesting point

but that the numerical comparison showing the country’s weakness was

stronger as he had written it, so he was leaving it unchanged!

A dimensionally valid comparison would compare like with like: either

Nigeria’s GDP with Exxon’s revenues, or Exxon’s net worth with Nigeria’s net worth. Because net worths of countries are not often tabulated, whereas corporate revenues are widely available, try comparing Exxon’s annual revenues with Nigeria’s GDP. By 2006, Exxon had become Exxon Mobil with annual revenues of roughly $350 billion, almost twice Nigeria’s 2006 GDP of $200 billion. This valid comparison is stronger than the

ﬂawed one, so retaining the ﬂawed comparison was not even expedient!

That compared quantities must have identical dimensions is a necessary

condition for making valid comparisons, but it is not suﬃcient. A costly

illustration is the 1999 Mars Climate Orbiter (MCO), which crashed into

the surface of Mars rather than slipping into orbit around it. The cause,

according to the Mishap Investigation Board (MIB), was a mismatch between English and metric units [26, p. 6]:

The MCO MIB has determined that the root cause for the loss of the MCO

spacecraft was the failure to use metric units in the coding of a ground

software ﬁle, Small Forces, used in trajectory models. Speciﬁcally, thruster

performance data in English units instead of metric units was used in the

software application code titled SM_FORCES (small forces). A file called Angular Momentum Desaturation (AMD) contained the output data from the SM_FORCES software. The data in the AMD file was required to be in metric units per existing software interface documentation, and the trajectory modelers assumed the data was provided in metric units per the requirements.

Make sure to mind your dimensions and units.
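One way to guard against this kind of mismatch in software is to make every number carry its unit, so that conversion is explicit rather than assumed. The sketch below is illustrative only: the `Impulse` type and its methods are hypothetical, and the quoted passage does not name the units involved. The pairing of pound-force seconds with newton seconds is the commonly reported one and is used here as an assumption.

```python
from dataclasses import dataclass

# Exact by definition: 1 lbf = 0.45359237 kg * 9.80665 m/s^2.
NEWTONS_PER_LBF = 0.45359237 * 9.80665

_TO_NEWTON_SECONDS = {"N*s": 1.0, "lbf*s": NEWTONS_PER_LBF}


@dataclass(frozen=True)
class Impulse:
    """An impulse that carries its unit (hypothetical helper type)."""
    value: float
    unit: str  # "N*s" or "lbf*s"

    def in_newton_seconds(self) -> float:
        # An unknown unit raises KeyError instead of being silently accepted.
        return self.value * _TO_NEWTON_SECONDS[self.unit]


def total_impulse(parts: list[Impulse]) -> float:
    """Sum impulses in N*s, converting each one explicitly."""
    return sum(p.in_newton_seconds() for p in parts)
```

Both units measure the same dimension, so the quantities are comparable in principle. Treating a bare number in one unit as if it were in the other is off by a factor of about 4.45.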

Problem 1.2 Finding bad comparisons

Look for everyday comparisons—for example, on the news, in the newspaper,

or on the Internet—that are dimensionally faulty.

1.2 Newtonian mechanics: Free fall

Dimensions are useful not just to debunk incorrect arguments but also to

generate correct ones. To do so, the quantities in a problem need to have

dimensions. As a contrary example showing what not to do, here is how

many calculus textbooks introduce a classic problem in motion:

A ball initially at rest falls from a height of h **feet** and hits the ground at a speed of v **feet per second**. Find v assuming a gravitational acceleration of g **feet per second squared** and neglecting air resistance.

The units such as feet or feet per second are highlighted in boldface

because their inclusion is so frequent as to otherwise escape notice, and

their inclusion creates a signiﬁcant problem. Because the height is h

feet, the variable h does not contain the units of height: h is therefore

dimensionless. (For h to have dimensions, the problem would instead

state simply that the ball falls from a height h; then the dimension of

length would belong to h.) A similar explicit speciﬁcation of units means

that the variables g and v are also dimensionless. Because g, h, and v

are dimensionless, any comparison of v with quantities derived from g

and h is a comparison between dimensionless quantities. It is therefore

always dimensionally valid, so dimensional analysis cannot help us guess

the impact speed.

Giving up the valuable tool of dimensions is like ﬁghting with one hand

tied behind our back. Thereby constrained, we must instead solve the

following diﬀerential equation with initial conditions:

$$\frac{d^2y}{dt^2} = -g, \quad \text{with } y(0) = h \text{ and } dy/dt = 0 \text{ at } t = 0, \tag{1.1}$$

where y(t) is the ball’s height, dy/dt is the ball’s velocity, and g is the

gravitational acceleration.

Problem 1.3 Calculus solution

Use calculus to show that the free-fall differential equation $d^2y/dt^2 = -g$ with initial conditions $y(0) = h$ and $dy/dt = 0$ at $t = 0$ has the following solution:

$$\frac{dy}{dt} = -gt \quad \text{and} \quad y = -\frac{1}{2}gt^2 + h. \tag{1.2}$$

Using the solutions for the ball’s position and velocity in Problem 1.3, what is

the impact speed?

When $y(t) = 0$, the ball meets the ground. Thus the impact time $t_0$ is $\sqrt{2h/g}$. The impact velocity is $-gt_0$ or $-\sqrt{2gh}$. Therefore the impact speed (the unsigned velocity) is $\sqrt{2gh}$.
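As a rough cross-check of this algebra, the closed-form result can be compared with a crude step-by-step simulation of the free-fall equation. This is a sketch, not part of the original derivation: the function names, the time step, and the sample values of h and g are all assumptions chosen for illustration.

```python
import math


def impact_time(h: float, g: float) -> float:
    """Time t0 at which y(t) = h - g*t**2/2 reaches zero."""
    return math.sqrt(2 * h / g)


def impact_speed(h: float, g: float) -> float:
    """Unsigned velocity |dy/dt| = g*t0 at impact."""
    return g * impact_time(h, g)


def simulated_impact_speed(h: float, g: float, dt: float = 1e-4) -> float:
    """Crude check: step the free-fall equation until the ball reaches y = 0."""
    y, v = h, 0.0
    while y > 0:
        v -= g * dt          # dv/dt = -g
        y += v * dt          # dy/dt = v
    return abs(v)


if __name__ == "__main__":
    h, g = 5.0, 9.8  # sample values in SI units, chosen only for illustration
    print(impact_speed(h, g), math.sqrt(2 * g * h), simulated_impact_speed(h, g))
```

The first two printed values agree to rounding error, and the simulated value agrees to within the accuracy set by the time step.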

This analysis invites several algebra mistakes: forgetting to take a square root when solving for $t_0$, or dividing rather than multiplying by $g$ when finding the impact velocity. Practice (in other words, making and correcting many mistakes) reduces their prevalence in simple problems, but

complex problems with many steps remain mineﬁelds. We would like

less error-prone methods.

One robust alternative is the method of dimensional analysis. But this

tool requires that at least one quantity among v, g, and h have dimensions.

Otherwise, every candidate impact speed, no matter how absurd, equates

dimensionless quantities and therefore has valid dimensions.

Therefore, let’s restate the free-fall problem so that the quantities retain

their dimensions:

A ball initially at rest falls from a height h and hits the ground at speed v.

Find v assuming a gravitational acceleration g and neglecting air resistance.

The restatement is, ﬁrst, shorter and crisper than the original phrasing:

A ball initially at rest falls from a height of h **feet** and hits the ground at a speed of v **feet per second**. Find v assuming a gravitational acceleration of g **feet per second squared** and neglecting air resistance.

Second, the restatement is more general. It makes no assumption about

the system of units, so it is useful even if meters, cubits, or furlongs are

the unit of length. Most importantly, the restatement gives dimensions to

h, g, and v. Their dimensions will almost uniquely determine the impact

speed—without our needing to solve a diﬀerential equation.

The dimensions of height $h$ are simply length or, for short, L. The dimensions of gravitational acceleration $g$ are length per time squared or $\mathrm{LT^{-2}}$, where T represents the dimension of time. A speed has dimensions of $\mathrm{LT^{-1}}$, so $v$ is a function of $g$ and $h$ with dimensions of $\mathrm{LT^{-1}}$.

Problem 1.4 Dimensions of familiar quantities

In terms of the basic dimensions length L, mass M, and time T, what are the

dimensions of energy, power, and torque?

What combination of g and h has dimensions of speed?

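The combination $\sqrt{gh}$ has dimensions of speed.

$$\Bigl(\underbrace{\mathrm{LT^{-2}}}_{g} \times \underbrace{\mathrm{L}}_{h}\Bigr)^{1/2} = \sqrt{\mathrm{L^2T^{-2}}} = \underbrace{\mathrm{LT^{-1}}}_{\text{speed}}. \tag{1.3}$$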

Is $\sqrt{gh}$ the only combination of g and h with dimensions of speed?

In order to decide whether $\sqrt{gh}$ is the only possibility, use constraint propagation [43]. The strongest constraint is that the combination of $g$ and $h$, being a speed, should have dimensions of inverse time ($\mathrm{T^{-1}}$). Because $h$ contains no dimensions of time, it cannot help construct $\mathrm{T^{-1}}$. Because $g$ contains $\mathrm{T^{-2}}$, the $\mathrm{T^{-1}}$ must come from $\sqrt{g}$. The second constraint is that the combination contain $\mathrm{L^1}$. The $\sqrt{g}$ already contributes $\mathrm{L^{1/2}}$, so the missing $\mathrm{L^{1/2}}$ must come from $\sqrt{h}$. The two constraints thereby determine uniquely how $g$ and $h$ appear in the impact speed $v$.
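The two constraints can also be written as a pair of equations for the exponents in $g^a h^b$ and solved in the same order as the argument above: time first, then length. The following is a minimal sketch; representing dimensions as (length exponent, time exponent) pairs and the name `solve_exponents` are choices made here for illustration.

```python
from fractions import Fraction

# Dimensions as (length exponent, time exponent).
DIM_G = (Fraction(1), Fraction(-2))      # g: L T^-2
DIM_H = (Fraction(1), Fraction(0))       # h: L
DIM_SPEED = (Fraction(1), Fraction(-1))  # v: L T^-1


def solve_exponents(target=DIM_SPEED):
    """Find (a, b) such that g**a * h**b has the target dimensions."""
    # Time constraint first: only g contains time, so a * (-2) must equal
    # the target's time exponent.
    a = target[1] / DIM_G[1]
    # Length constraint second: g**a already supplies a * 1 powers of L,
    # and h must supply the rest.
    b = (target[0] - a * DIM_G[0]) / DIM_H[0]
    return a, b


if __name__ == "__main__":
    a, b = solve_exponents()
    print(f"v ~ g^({a}) * h^({b})")
```

Because each constraint fixes one exponent outright, the solution is unique, matching the conclusion that only $\sqrt{gh}$ (times a dimensionless constant) is possible.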

The exact expression for $v$ is, however, not unique. It could be $\sqrt{gh}$, $\sqrt{2gh}$, or, in general, $\sqrt{gh} \times \text{dimensionless constant}$. The idiom of multiplication by a dimensionless constant occurs frequently and deserves a compact notation akin to the equals sign:

$$v \sim \sqrt{gh}. \tag{1.4}$$

Including this $\sim$ notation, we have several species of equality:

$$\begin{array}{cl} \propto & \text{equality except perhaps for a factor with dimensions,} \\ \sim & \text{equality except perhaps for a factor without dimensions,} \\ \approx & \text{equality except perhaps for a factor close to 1.} \end{array} \tag{1.5}$$

The exact impact speed is $\sqrt{2gh}$, so the dimensions result $\sqrt{gh}$ contains the entire functional dependence! It lacks only the dimensionless factor $\sqrt{2}$, and these factors are often unimportant. In this example, the height might vary from a few centimeters (a flea hopping) to a few meters (a cat jumping from a ledge). The factor-of-100 variation in height contributes a factor-of-10 variation in impact speed. Similarly, the gravitational acceleration might vary from $0.27\,\mathrm{m\,s^{-2}}$ (on the asteroid Ceres) to $25\,\mathrm{m\,s^{-2}}$ (on Jupiter). The factor-of-100 variation in $g$ contributes another factor-of-10 variation in impact speed. Much variation in the impact speed, therefore, comes not from the dimensionless factor $\sqrt{2}$ but rather from the symbolic factors, which are computed exactly by dimensional analysis.
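A quick numerical comparison shows how small the missing $\sqrt{2}$ is next to these variations. The sketch below uses the gravitational accelerations quoted above; the specific heights of 3 cm and 3 m are assumed stand-ins for "a few centimeters" and "a few meters", and the variable names are illustrative.

```python
import math


def speed_scale(g: float, h: float) -> float:
    """Dimensional-analysis estimate sqrt(g*h), without the factor sqrt(2)."""
    return math.sqrt(g * h)


# Heights are assumed stand-ins for "a few centimeters" and "a few meters".
H_FLEA, H_CAT = 0.03, 3.0          # meters
G_CERES, G_JUPITER = 0.27, 25.0    # meters per second squared, from the text

# Vary one quantity at a time while holding the other fixed.
height_factor = speed_scale(9.8, H_CAT) / speed_scale(9.8, H_FLEA)
gravity_factor = speed_scale(G_JUPITER, 1.0) / speed_scale(G_CERES, 1.0)
missing_factor = math.sqrt(2)

if __name__ == "__main__":
    print(f"variation from height:  x{height_factor:.1f}")
    print(f"variation from gravity: x{gravity_factor:.1f}")
    print(f"missing constant:       x{missing_factor:.2f}")
```

Each symbolic factor changes the speed by roughly a factor of 10, whereas the omitted constant changes it by about 1.4.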

Furthermore, not calculating the exact answer can be an advantage. Exact answers have all factors and terms, permitting less important information, such as the dimensionless factor $\sqrt{2}$, to obscure important information such as $\sqrt{gh}$. As William James advised, “The art of being wise is the art of knowing what to overlook” [19, Chapter 22].

Problem 1.5 Vertical throw

You throw a ball directly upward with speed $v_0$. Use dimensional analysis to

estimate how long the ball takes to return to your hand (neglecting air resistance).

Then ﬁnd the exact time by solving the free-fall diﬀerential equation. What

dimensionless factor was missing from the dimensional-analysis result?

1.3 Guessing integrals

The analysis of free fall (Section 1.2) shows the value of not separating

dimensioned quantities from their units. However, what if the quantities

are dimensionless, such as the 5 and $x$ in the following Gaussian integral:

$$\int_{-\infty}^{\infty} e^{-5x^2}\,dx\,? \tag{1.6}$$

Alternatively, the dimensions might be unspeciﬁed—a common case in

mathematics because it is a universal language. For example, probability

theory uses the Gaussian integral

$$\int_{x_1}^{x_2} e^{-x^2/2\sigma^2}\,dx, \tag{1.7}$$

where x could be height, detector error, or much else. Thermal physics

uses the similar integral

$$\int e^{-\frac{1}{2}mv^2/kT}\,dv, \tag{1.8}$$

where $v$ is a molecular speed. Mathematics, as the common language, studies their common form $\int e^{-\alpha x^2}$ without specifying the dimensions of $\alpha$ and $x$. The lack of specificity gives mathematics its power of abstraction, but it makes using dimensional analysis difficult.

How can dimensional analysis be applied without losing the benefits of mathematical abstraction?

The answer is to ﬁnd the quantities with unspeciﬁed dimensions and then

to assign them a consistent set of dimensions. To illustrate the approach,

let’s apply it to the general definite Gaussian integral

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx. \tag{1.9}$$

Unlike its specific cousin with $\alpha = 5$, which is the integral $\int_{-\infty}^{\infty} e^{-5x^2}\,dx$, the general form does not specify the dimensions of $x$ or $\alpha$, and that openness provides the freedom needed to use the method of dimensional analysis.

The method requires that any equation be dimensionally valid. Thus,

in the following equation, the left and right sides must have identical

dimensions:

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \text{something}. \tag{1.10}$$

Is the right side a function of x? Is it a function of α? Does it contain a constant

of integration?

The left side contains no symbolic quantities other than x and α. But

x is the integration variable and the integral is over a deﬁnite range, so

x disappears upon integration (and no constant of integration appears).

Therefore, the right side—the “something”—is a function only of α. In

symbols,

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = f(\alpha). \tag{1.11}$$

The function $f$ might include dimensionless numbers such as 2/3 or $\sqrt{\pi}$,

but α is its only input with dimensions.

For the equation to be dimensionally valid, the integral must have the

same dimensions as f(α), and the dimensions of f(α) depend on the

dimensions of α. Accordingly, the dimensional-analysis procedure has

the following three steps:

Step 1. Assign dimensions to α (Section 1.3.1).

Step 2. Find the dimensions of the integral (Section 1.3.2).

Step 3. Make an f(α) with those dimensions (Section 1.3.3).

1.3.1 Assigning dimensions to α

The parameter α appears in an exponent. An exponent specifies how many times to multiply a quantity by itself. For example, here is $2^n$:

$$2^n = \underbrace{2 \times 2 \times \cdots \times 2}_{n \text{ terms}}. \tag{1.12}$$

The notion of “how many times” is a pure number, so an exponent is

dimensionless.

Hence the exponent $-\alpha x^2$ in the Gaussian integral is dimensionless. For convenience, denote the dimensions of $\alpha$ by $[\alpha]$ and of $x$ by $[x]$. Then

$$[\alpha]\,[x]^2 = 1, \tag{1.13}$$

or

$$[\alpha] = [x]^{-2}. \tag{1.14}$$

This conclusion is useful, but continuing to use unspeciﬁed but general

dimensions requires lots of notation, and the notation risks burying the

reasoning.

The simplest alternative is to make x dimensionless. That choice makes α

and f(α) dimensionless, so any candidate for f(α) would be dimensionally

valid, making dimensional analysis again useless. The simplest eﬀective

alternative is to give $x$ simple dimensions, for example, length. (This choice is natural if you imagine the $x$ axis lying on the floor.) Then $[\alpha] = \mathrm{L^{-2}}$.

1.3.2 Dimensions of the integral

The assignments $[x] = \mathrm{L}$ and $[\alpha] = \mathrm{L^{-2}}$ determine the dimensions of the Gaussian integral. Here is the integral again:

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx. \tag{1.15}$$

The dimensions of an integral depend on the dimensions of its three pieces: the integral sign $\int$, the integrand $e^{-\alpha x^2}$, and the differential $dx$.

The integral sign originated as an elongated S for Summe, the German

word for sum. In a valid sum, all terms have identical dimensions: The

fundamental principle of dimensions requires that apples be added only

to apples. For the same reason, the entire sum has the same dimensions

as any term. Thus, the summation sign—and therefore the integration

sign—do not aﬀect dimensions: The integral sign is dimensionless.

Problem 1.6 Integrating velocity

Position is the integral of velocity. However, position and velocity have different dimensions. How is this difference consistent with the conclusion that the integration sign is dimensionless?

Because the integration sign is dimensionless, the dimensions of the integral are the dimensions of the exponential factor $e^{-\alpha x^2}$ multiplied by the dimensions of $dx$. The exponential, despite its fierce exponent $-\alpha x^2$, is merely several copies of $e$ multiplied together. Because $e$ is dimensionless, so is $e^{-\alpha x^2}$.

What are the dimensions of dx?

To find the dimensions of $dx$, follow the advice of Silvanus Thompson [45, p. 1]: Read $d$ as “a little bit of.” Then $dx$ is “a little bit of $x$.” A little length is still a length, so $dx$ is a length. In general, $dx$ has the same dimensions as $x$. Equivalently, $d$ (the inverse of $\int$) is dimensionless.

Assembling the pieces, the whole integral has dimensions of length:

$$\left[\int e^{-\alpha x^2}\,dx\right] = \underbrace{\left[e^{-\alpha x^2}\right]}_{1} \times \underbrace{[dx]}_{\mathrm{L}} = \mathrm{L}. \tag{1.16}$$
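The bookkeeping in Sections 1.3.1 and 1.3.2 can be mimicked in code by tracking only the power of length that each piece carries. This is a minimal sketch: the `Dim` class is a hypothetical helper written for this illustration, and it handles only the single dimension L used in this example.

```python
from fractions import Fraction


class Dim:
    """Dimensions as a power of length only: Dim(n) stands for L**n."""

    def __init__(self, length_power=0):
        self.length_power = Fraction(length_power)

    def __mul__(self, other):
        # Multiplying quantities adds their dimension exponents.
        return Dim(self.length_power + other.length_power)

    def __pow__(self, exponent):
        # Raising a quantity to a power multiplies its dimension exponent.
        return Dim(self.length_power * Fraction(exponent))

    def __eq__(self, other):
        return isinstance(other, Dim) and self.length_power == other.length_power

    def __repr__(self):
        return f"L^{self.length_power}"


DIMENSIONLESS = Dim(0)
x = Dim(1)                    # choice made in the text: [x] = L
alpha = x ** -2               # from [alpha][x]^2 = 1
integrand = DIMENSIONLESS     # e**(dimensionless exponent) is dimensionless
dx = x                        # "a little bit of x" is still a length
integral = integrand * dx     # the integral sign itself is dimensionless

if __name__ == "__main__":
    print("[alpha] =", alpha, " [integral] =", integral)
```

The same objects can be used to test a candidate form for $f(\alpha)$: a candidate is admissible only if its `Dim` equals that of `integral`.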

Problem 1.7 Don’t integrals compute areas?

A common belief is that integration computes areas. Areas have dimensions of $\mathrm{L^2}$. How then can the Gaussian integral have dimensions of L?

1.3.3 Making an f(α) with correct dimensions

The third and final step in this dimensional analysis is to construct an $f(\alpha)$ with the same dimensions as the integral. Because the dimensions of $\alpha$ are $\mathrm{L^{-2}}$, the only way to turn $\alpha$ into a length is to form $\alpha^{-1/2}$. Therefore,

$$f(\alpha) \sim \alpha^{-1/2}. \tag{1.17}$$

This useful result, which lacks only a dimensionless factor, was obtained

without any integration.
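Although the result needs no integration, it is easy to test numerically: if $f(\alpha) \sim \alpha^{-1/2}$, then $f(\alpha)\sqrt{\alpha}$ should come out the same for every $\alpha$. The sketch below does this with a plain trapezoid rule. The function names, the fixed integration range, the step count, and the sample values of $\alpha$ are all assumptions made for illustration.

```python
import math


def gaussian_integral(alpha: float, limit: float = 40.0,
                      steps: int = 200_000) -> float:
    """Trapezoid-rule estimate of the integral of exp(-alpha*x**2) over [-limit, limit].

    The fixed range stands in for (-inf, inf). It is wide enough for the
    values of alpha tried below, and it does not itself depend on alpha.
    """
    dx = 2 * limit / steps
    total = 0.0
    for i in range(steps + 1):
        x = -limit + i * dx
        weight = 0.5 if i in (0, steps) else 1.0  # endpoints count half
        total += weight * math.exp(-alpha * x * x)
    return total * dx


def scaled(alpha: float) -> float:
    """f(alpha) * sqrt(alpha), which (1.17) predicts is independent of alpha."""
    return gaussian_integral(alpha) * math.sqrt(alpha)


if __name__ == "__main__":
    for alpha in (0.25, 1.0, 4.0, 25.0):
        print(f"alpha = {alpha:5.2f}  f(alpha)*sqrt(alpha) = {scaled(alpha):.6f}")
```

The printed values agree to many digits even though $\alpha$ varies by a factor of 100, so only a single dimensionless constant remains to be found.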

To determine the dimensionless constant, set $\alpha = 1$ and evaluate

$$f(1) = \int_{-\infty}^{\infty} e^{-x^2}\,dx. \tag{1.18}$$

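This classic integral will be approximated in Section 2.1 and guessed to be $\sqrt{\pi}$. The two results $f(1) = \sqrt{\pi}$ and $f(\alpha) \sim \alpha^{-1/2}$ require that $f(\alpha) = \sqrt{\pi/\alpha}$, which yields

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}. \tag{1.19}$$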

We often memorize the dimensionless constant but forget the power of α.

Do not do that. The α factor is usually much more important than the

dimensionless constant. Conveniently, the α factor is what dimensional

analysis can compute.

Problem 1.8 Change of variable

Rewind back to equation (1.11) on page 8 and pretend that you do not know $f(\alpha)$. Without doing dimensional analysis, show that $f(\alpha) \sim \alpha^{-1/2}$.

Problem 1.9 Easy case α = 1

Setting $\alpha = 1$, which is an example of easy-cases reasoning (Chapter 2), violates the assumption that $x$ is a length and $\alpha$ has dimensions of $\mathrm{L^{-2}}$. Why is it okay to set $\alpha = 1$?

Problem 1.10 Integrating a diﬃcult exponential

Use dimensional analysis to investigate $\int_0^{\infty} e^{-\alpha t^3}\,dt$.

1.4 Summary and further problems

Do not add apples to oranges: Every term in an equation or sum must

have identical dimensions! This restriction is a powerful tool. It helps us

to evaluate integrals without integrating and to predict the solutions of

diﬀerential equations. Here are further problems to practice this tool.

Problem 1.11 Integrals using dimensions

Use dimensional analysis to find $\int_0^{\infty} e^{-ax}\,dx$ and $\int \frac{dx}{x^2 + a^2}$. A useful result is

$$\int \frac{dx}{x^2 + 1} = \arctan x + C. \tag{1.20}$$

Problem 1.12 Stefan–Boltzmann law

Blackbody radiation is an electromagnetic phenomenon, so the radiation intensity depends on the speed of light $c$. It is also a thermal phenomenon, so it depends on the thermal energy $k_B T$, where $T$ is the object’s temperature and $k_B$ is Boltzmann’s constant. And it is a quantum phenomenon, so it depends on Planck’s constant $h$. Thus the blackbody-radiation intensity $I$ depends on $c$, $k_B T$, and $h$. Use dimensional analysis to show that $I \propto T^4$ and to find the constant of proportionality $\sigma$. Then look up the missing dimensionless constant. (These results are used in Section 5.3.3.)

Problem 1.13 Arcsine integral

Use dimensional analysis to find $\int \sqrt{1 - 3x^2}\,dx$. A useful result is

$$\int \sqrt{1 - x^2}\,dx = \frac{\arcsin x}{2} + \frac{x\sqrt{1 - x^2}}{2} + C. \tag{1.21}$$

Problem 1.14 Related rates

Water is poured into a large inverted cone (with a 90° opening angle) at a rate $dV/dt = 10\,\mathrm{m^3\,s^{-1}}$. When the water depth is $h = 5\,\mathrm{m}$, estimate the rate at which the depth is increasing. Then use calculus to find the exact rate.

Problem 1.15 Kepler’s third law

Newton’s law of universal gravitation—the famous inverse-square law—says that

the gravitational force between two masses is

$$F = -\frac{G m_1 m_2}{r^2}, \qquad (1.22)$$

where G is Newton’s constant, $m_1$ and $m_2$ are the two masses, and r is their

separation. For a planet orbiting the sun, universal gravitation together with

Newton’s second law gives

$$m\,\frac{d^2\mathbf{r}}{dt^2} = -\frac{GMm}{r^2}\,\hat{\mathbf{r}}, \qquad (1.23)$$

where M is the mass of the sun, m the mass of the planet, $\mathbf{r}$ is the vector from the sun to the planet, and $\hat{\mathbf{r}}$ is the unit vector in the $\mathbf{r}$ direction.

How does the orbital period τ depend on orbital radius r? Look up Kepler’s

third law and compare your result to it.

2 Easy cases

2.1 Gaussian integral revisited

2.2 Plane geometry: The area of an ellipse

2.3 Solid geometry: The volume of a truncated pyramid

2.4 Fluid mechanics: Drag

2.5 Summary and further problems

A correct solution works in all cases, including the easy ones. This maxim

underlies the second tool—the method of easy cases. It will help us guess

integrals, deduce volumes, and solve exacting diﬀerential equations.

2.1 Gaussian integral revisited

As the ﬁrst application, let’s revisit the Gaussian integral from Section 1.3,

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx. \qquad (2.1)$$

Is the integral $\sqrt{\pi\alpha}$ or $\sqrt{\pi/\alpha}$?

The correct choice must work for all α ≥ 0. At this range’s endpoints (α = ∞ and α = 0), the integral is easy to evaluate.

What is the integral when α = ∞?

[Figure: the narrow bell curve $e^{-10x^2}$, with x = 0 and x = 1 marked on the axis.]

As the first easy case, increase α to ∞. Then $-\alpha x^2$ becomes very negative, even when x is tiny. The exponential of a large negative number is tiny, so the bell curve narrows to a sliver, and its area shrinks to zero. Therefore, as α → ∞ the integral shrinks to zero. This result refutes the option $\sqrt{\pi\alpha}$, which is infinite when α = ∞; and it supports the option $\sqrt{\pi/\alpha}$, which is zero when α = ∞.

What is the integral when α = 0?

[Figure: the wide bell curve $e^{-x^2/10}$, with x = 0 and x = 1 marked on the axis.]

In the α = 0 extreme, the bell curve flattens into a horizontal line with unit height. Its area, integrated over the infinite range, is infinite. This result refutes the $\sqrt{\pi\alpha}$ option, which is zero when α = 0; and it supports the $\sqrt{\pi/\alpha}$ option, which is infinity when α = 0. Thus the $\sqrt{\pi\alpha}$ option fails both easy-cases tests, and the $\sqrt{\pi/\alpha}$ option passes both easy-cases tests.

If these two options were the only options, we would choose $\sqrt{\pi/\alpha}$. However, if a third option were $\sqrt{2/\alpha}$, how could you decide between it and $\sqrt{\pi/\alpha}$? Both options pass both easy-cases tests; they also have identical dimensions. The choice looks difficult.

To choose, try a third easy case: α = 1. Then the integral simpliﬁes to

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx. \qquad (2.2)$$

This classic integral can be evaluated in closed form by using polar coor-

dinates, but that method also requires a trick with few other applications

(textbooks on multivariable calculus give the gory details). A less elegant

but more general approach is to evaluate the integral numerically and to

use the approximate value to guess the closed form.

Therefore, replace the smooth curve $e^{-x^2}$ with a curve having n line segments. This piecewise-linear approximation turns the area into a sum of n trapezoids. As n approaches infinity, the area of the trapezoids more and more closely approaches the area under the smooth curve.

n     Area
10    2.07326300569564
20    1.77263720482665
30    1.77245385170978
40    1.77245385090552
50    1.77245385090552

The table gives the area under the curve in the range x = −10 … 10, after dividing the curve into n line segments. The areas settle onto a stable value, and it looks familiar. It begins with 1.7, which might arise from $\sqrt{3}$. However, it continues as 1.77, which is too large to be $\sqrt{3}$. Fortunately, π is slightly larger than 3, so the area might be converging to $\sqrt{\pi}$.
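The trapezoid computation can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's program: the function name `trapezoid_area` and its default limits are assumptions chosen to match the range x = −10 … 10 used in the text.

```python
import math

def trapezoid_area(n, lower=-10.0, upper=10.0):
    """Area under exp(-x**2) on [lower, upper] using n trapezoids."""
    width = (upper - lower) / n
    # Heights of the curve at the n + 1 segment endpoints.
    heights = [math.exp(-(lower + i * width) ** 2) for i in range(n + 1)]
    # The two outer heights count once; interior heights are shared by
    # two neighboring trapezoids and therefore count twice.
    return width * (sum(heights) - 0.5 * (heights[0] + heights[-1]))

for n in (10, 20, 30, 40, 50):
    print(n, trapezoid_area(n))
```

Squaring the value for large n and comparing it with `math.pi` repeats the check made in the text.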


Let’s check by comparing the squared area against π:

$$1.77245385090552^2 \approx 3.14159265358980, \qquad \pi \approx 3.14159265358979. \qquad (2.3)$$

The close match suggests that the α = 1 Gaussian integral is indeed $\sqrt{\pi}$:

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}. \qquad (2.4)$$

Therefore the general Gaussian integral

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx \qquad (2.5)$$

must reduce to $\sqrt{\pi}$ when α = 1. It must also behave correctly in the other two easy cases α = 0 and α = ∞.

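Among the three choices $\sqrt{2/\alpha}$, $\sqrt{\pi/\alpha}$, and $\sqrt{\pi\alpha}$, only $\sqrt{\pi/\alpha}$ passes all three tests α = 0, 1, and ∞. Therefore,

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}. \qquad (2.6)$$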

Easy cases are not the only way to judge these choices. Dimensional analysis, for example, can also restrict the possibilities (Section 1.3). It even eliminates choices like $\sqrt{\pi}/\alpha$ that pass all three easy-cases tests. However, easy cases are, by design, simple. They do not require us to invent or deduce dimensions for x, α, and dx (the extensive analysis of Section 1.3). Easy cases, unlike dimensional analysis, can also eliminate choices like $\sqrt{2/\alpha}$ with correct dimensions. Each tool has its strengths.
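The three easy-cases tests for the Gaussian integral can be mimicked numerically. The sketch below is an illustration only: the limits α = 0 and α = ∞ are approximated by the assumed stand-in values `tiny` and `huge`, and the threshold `tol` is likewise an arbitrary choice.

```python
import math

# The three candidates discussed in this section.
candidates = {
    "sqrt(pi*alpha)": lambda a: math.sqrt(math.pi * a),
    "sqrt(2/alpha)": lambda a: math.sqrt(2 / a),
    "sqrt(pi/alpha)": lambda a: math.sqrt(math.pi / a),
}

def passes_easy_cases(f, tiny=1e-12, huge=1e12, tol=1e-3):
    """Check a candidate against the alpha = infinity, 0, and 1 cases."""
    shrinks_at_infinity = f(huge) < tol        # area should vanish
    blows_up_at_zero = f(tiny) > 1 / tol       # area should diverge
    matches_at_one = math.isclose(f(1.0), math.sqrt(math.pi), rel_tol=1e-9)
    return shrinks_at_infinity and blows_up_at_zero and matches_at_one

results = {name: passes_easy_cases(f) for name, f in candidates.items()}
print(results)
```

Only the candidate $\sqrt{\pi/\alpha}$ survives all three checks, in agreement with the reasoning above.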

Problem 2.1 Testing several alternatives

For the Gaussian integral

$$\int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx, \qquad (2.7)$$

use the three easy-cases tests to evaluate the following candidates for its value.

(a) $\sqrt{\pi}/\alpha$  (b) $1 + (\sqrt{\pi} - 1)/\alpha$  (c) $1/\alpha^2 + (\sqrt{\pi} - 1)/\alpha$.

Problem 2.2 Plausible, incorrect alternative

Is there an alternative to $\sqrt{\pi/\alpha}$ that has valid dimensions and passes the three easy-cases tests?


Problem 2.3 Guessing a closed form

Use a change of variable to show that

$$\int_0^\infty \frac{dx}{1+x^2} = 2\int_0^1 \frac{dx}{1+x^2}. \qquad (2.8)$$

The second integral has a ﬁnite integration range, so it is easier than the ﬁrst

integral to evaluate numerically. Estimate the second integral using the trapezoid

approximation and a computer or programmable calculator. Then guess a closed

form for the ﬁrst integral.

2.2 Plane geometry: The area of an ellipse

[Figure: an ellipse with semimajor axis a and semiminor axis b.]

The second application of easy cases is from plane

geometry: the area of an ellipse. This ellipse has

semimajor axis a and semiminor axis b. For its area A

consider the following candidates:

(a) $ab^2$  (b) $a^2 + b^2$  (c) $a^3/b$  (d) $2ab$  (e) $\pi ab$.

What are the merits or drawbacks of each candidate?

The candidate $A = ab^2$ has dimensions of $L^3$, whereas an area must have dimensions of $L^2$. Thus $ab^2$ must be wrong.

The candidate $A = a^2 + b^2$ has correct dimensions (as do the remaining candidates), so the next tests are the easy cases of the radii a and b. For a, the low extreme a = 0 produces an infinitesimally thin ellipse with zero area. However, when a = 0 the candidate $A = a^2 + b^2$ reduces to $A = b^2$ rather than to 0; so $a^2 + b^2$ fails the a = 0 test.

The candidate $A = a^3/b$ correctly predicts zero area when a = 0. Because a = 0 was a useful easy case, and the axis labels a and b are almost interchangeable, its symmetric counterpart b = 0 should also be a useful easy case. It too produces an infinitesimally thin ellipse with zero area; alas, the candidate $a^3/b$ predicts an infinite area, so it fails the b = 0 test. Two candidates remain.

The candidate A = 2ab shows promise. When a = 0 or b = 0, the actual and predicted areas are zero, so A = 2ab passes both easy-cases tests. Further testing requires the third easy case: a = b. Then the ellipse becomes a circle with radius a and area $\pi a^2$. The candidate 2ab, however, reduces to $A = 2a^2$, so it fails the a = b test.


The candidate A = πab passes all three tests: a = 0, b = 0, and a = b.

With each passing test, our conﬁdence in the candidate increases; and

πab is indeed the correct area (Problem 2.4).
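The ellipse tests lend themselves to a small table-driven check. The following is a hedged sketch: the helper `failed_tests` and the unit test lengths are assumptions made for illustration, and a division by zero is treated as a prediction of infinite area.

```python
import math

# Candidates with correct dimensions (ab^2 was already rejected).
candidates = {
    "a^2 + b^2": lambda a, b: a**2 + b**2,
    "a^3/b": lambda a, b: a**3 / b,
    "2ab": lambda a, b: 2 * a * b,
    "pi*a*b": lambda a, b: math.pi * a * b,
}

def failed_tests(area):
    """Return the labels of the easy cases that a candidate fails."""
    # Each test: (a, b, true area). a = b = 1 is the unit circle.
    tests = {
        "a = 0": (0.0, 1.0, 0.0),
        "b = 0": (1.0, 0.0, 0.0),
        "a = b": (1.0, 1.0, math.pi),
    }
    failures = []
    for label, (a, b, expected) in tests.items():
        try:
            predicted = area(a, b)
        except ZeroDivisionError:
            predicted = math.inf  # the candidate predicts infinite area
        if not math.isclose(predicted, expected, abs_tol=1e-12):
            failures.append(label)
    return failures

report = {name: failed_tests(f) for name, f in candidates.items()}
print(report)
```

A candidate with an empty failure list has passed every test tried so far, which raises confidence in it but does not prove it.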

Problem 2.4 Area by calculus

Use integration to show that A = πab.

Problem 2.5 Inventing a passing candidate

Can you invent a second candidate for the area that has correct dimensions and

passes the a = 0, b = 0, and a = b tests?

Problem 2.6 Generalization

Guess the volume of an ellipsoid with principal radii a, b, and c.

2.3 Solid geometry: The volume of a truncated pyramid

The Gaussian-integral example (Section 2.1) and the ellipse-area example

(Section 2.2) showed easy cases as a method of analysis: for checking

whether formulas are correct. The next level of sophistication is to use

easy cases as a method of synthesis: for constructing formulas.

[Figure: a truncated pyramid with height h, base side length b, and top side length a.]

As an example, take a pyramid with a square base and

slice a piece from its top using a knife parallel to the

base. This truncated pyramid (called the frustum) has a

square base and square top parallel to the base. Let h be

its vertical height, b be the side length of its base, and a

be the side length of its top.

What is the volume of the truncated pyramid?

Let’s synthesize the formula for the volume. It is a function of the three lengths h, a, and b. These lengths split into two kinds: height and base lengths. For example, flipping the solid on its head interchanges the meanings of a and b but preserves h; and no simple operation interchanges height with a or b. Thus the volume probably has two factors, each containing a length or lengths of only one kind:

$$V(h, a, b) = f(h) \times g(a, b). \qquad (2.9)$$

Proportional reasoning will determine f; a bit of dimensional reasoning

and a lot of easy-cases reasoning will determine g.


What is f : How should the volume depend on the height?

To find f, use a proportional-reasoning thought experiment. Chop the solid into vertical slivers, each like an oil-drilling core; then imagine doubling h. This change doubles the volume of each sliver and therefore doubles the whole volume V. Thus f ∼ h and V ∝ h:

$$V = h \times g(a, b). \qquad (2.10)$$

What is g: How should the volume depend on a and b?

Because V has dimensions of $L^3$, the function g(a, b) has dimensions of $L^2$. That constraint is all that dimensional analysis can say. Further constraints are needed to synthesize g, and these constraints are provided by the method of easy cases.

2.3.1 Easy cases

What are the easy cases of a and b?

The easiest case is the extreme case a = 0 (an ordinary pyramid). The

symmetry between a and b suggests two further easy cases, namely a = b

and the extreme case b = 0. The easy cases are then threefold:

[Figure: the three easy cases: a = 0 (an ordinary pyramid), b = 0 (an upside-down pyramid), and a = b (a rectangular prism).]

When a = 0, the solid is an ordinary pyramid, and g is a function only of the base side length b. Because g has dimensions of $L^2$, the only possibility for g is $g \sim b^2$; in addition, V ∝ h; so $V \sim hb^2$. When b = 0, the solid is an upside-down version of the a = 0 pyramid and therefore has volume $V \sim ha^2$. When a = b, the solid is a rectangular prism having volume $V = ha^2$ (or $hb^2$).

Is there a volume formula that satisﬁes the three easy-cases constraints?


The a = 0 and b = 0 constraints are satisfied by the symmetric sum $V \sim h(a^2 + b^2)$. If the missing dimensionless constant is 1/2, making $V = h(a^2 + b^2)/2$, then the volume also satisfies the a = b constraint, and the volume of an ordinary pyramid (a = 0) would be $hb^2/2$.

When a = 0, is the prediction $V = hb^2/2$ correct?

Testing the prediction requires finding the exact dimensionless constant in $V \sim hb^2$. This task looks like a calculus problem: Slice a pyramid into thin horizontal sections and add (integrate) their volumes. However, a simple alternative is to apply easy cases again.

[Figure: a right triangle with base b and height h = b.]

The easy case is easier to construct after we solve a similar but simpler problem: to find the area of a triangle with base b and height h. The area satisfies A ∼ hb, but what is the dimensionless constant? To find it, choose b and h to make an easy triangle: a right triangle with h = b. Two such triangles make an easy shape: a square with area $b^2$. Thus each right triangle has area $A = b^2/2$; the dimensionless constant is 1/2. Now extend this reasoning to three dimensions: find an ordinary pyramid (with a square base) that combines with itself to make an easy solid.

What is the easy solid?

A convenient solid is suggested by the pyramid’s square

base: Perhaps each base is one face of a cube. The cube then

requires six pyramids whose tips meet in the center of the

cube; thus the pyramids have the aspect ratio h = b/2. For

numerical simplicity, let’s meet this condition with b = 2

and h = 1.

Six such pyramids form a cube with volume $b^3 = 8$, so the volume of one pyramid is 4/3. Because each pyramid has volume $V \sim hb^2$, and $hb^2 = 4$ for these pyramids, the dimensionless constant in $V \sim hb^2$ must be 1/3. The volume of an ordinary pyramid (a pyramid with a = 0) is therefore $V = hb^2/3$.

Problem 2.7 Triangular base

Guess the volume of a pyramid with height h and a triangular base of area A.

Assume that the top vertex lies directly over the centroid of the base. Then try

Problem 2.8.


Problem 2.8 Vertex location

The six pyramids do not make a cube unless each pyramid’s top vertex lies directly above the center of the base. Thus the result $V = hb^2/3$ might apply only with this restriction. If instead the top vertex lies above one of the base vertices, what is the volume?

The prediction from the first three easy-cases tests was $V = hb^2/2$ (when a = 0), whereas the further easy case h = b/2 alongside a = 0 just showed that $V = hb^2/3$. The two methods are making contradictory predictions.

How can this contradiction be resolved?

The contradiction must have snuck in during one of the reasoning steps. To find the culprit, revisit each step in turn. The argument for V ∝ h looks correct. The three easy-case requirements (that $V \sim hb^2$ when a = 0, that $V \sim ha^2$ when b = 0, and that $V = h(a^2 + b^2)/2$ when a = b) also look correct. The mistake was leaping from these constraints to the prediction $V \sim h(a^2 + b^2)$ for any a or b.

Instead let’s try the following general form that includes an ab term:

$$V = h(\alpha a^2 + \beta ab + \gamma b^2). \qquad (2.11)$$

Then solve for the coefficients α, β, and γ by reapplying the easy-cases requirements.

The b = 0 test along with the h = b/2 easy case, which showed that $V = hb^2/3$ for an ordinary pyramid, require that α = 1/3. The a = 0 test similarly requires that γ = 1/3. And the a = b test requires that α + β + γ = 1. Therefore β = 1/3 and voilà,

$$V = \frac{1}{3}h(a^2 + ab + b^2). \qquad (2.12)$$

This formula, the result of proportional reasoning, dimensional analysis,

and the method of easy cases, is exact (Problem 2.9)!
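As a numerical cross-check (not a substitute for the integration asked for in Problem 2.9), the formula can be compared against a stack of thin horizontal slabs. This is a minimal sketch: the function names, the midpoint rule, and the slab count are assumptions for illustration, and the side length is assumed to vary linearly from b at the bottom to a at the top.

```python
import math

def frustum_volume_formula(h, a, b):
    """Volume from equation (2.12): h (a^2 + ab + b^2) / 3."""
    return h * (a * a + a * b + b * b) / 3

def frustum_volume_slices(h, a, b, n=10_000):
    """Sum the volumes of n thin square slabs (midpoint rule)."""
    dz = h / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * dz             # height of the slab's midplane
        side = b + (a - b) * z / h     # side shrinks linearly from b to a
        total += side * side * dz
    return total

print(frustum_volume_formula(3.0, 1.0, 2.0))
print(frustum_volume_slices(3.0, 1.0, 2.0))
```

The same formula also reproduces the easy cases used to build it: a = 0 with b = 2 and h = 1 gives the 4/3 found from the cube, and a = b gives the prism volume.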

Problem 2.9 Integration

Use integration to show that $V = h(a^2 + ab + b^2)/3$.

Problem 2.10 Truncated triangular pyramid

Instead of a pyramid with a square base, start with a pyramid with an equilateral triangle of side length b as its base. Then make the truncated solid by slicing a piece from the top using a knife parallel to the base. In terms of the height h and the top and bottom side lengths a and b, what is the volume of this solid? (See also Problem 2.7.)

Problem 2.11 Truncated cone

What is the volume of a truncated cone with a circular base of radius $r_1$ and circular top of radius $r_2$ (with the top parallel to the base)? Generalize your formula to the volume of a truncated pyramid with height h, a base of an arbitrary shape and area $A_{\text{base}}$, and a corresponding top of area $A_{\text{top}}$.

2.4 Fluid mechanics: Drag

The preceding examples showed that easy cases can check and construct

formulas, but the examples can be done without easy cases (for example,

with calculus). For the next equations, from ﬂuid mechanics, no exact

solutions are known in general, so easy cases and other street-ﬁghting

tools are almost the only way to make progress.

Here then are the Navier–Stokes equations of ﬂuid mechanics:

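$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{1}{\rho}\nabla p + \nu\nabla^2\mathbf{v}, \qquad (2.13)$$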

where v is the velocity of the ﬂuid (as a function of position and time),

ρ is its density, p is the pressure, and ν is the kinematic viscosity. These

equations describe an amazing variety of phenomena including ﬂight,

tornadoes, and river rapids.

Our example is the following home experiment on drag. Photocopy this

page while magnifying it by a factor of 2; then cut out the following two

templates:

[Figure: two cone templates with shaded tape areas, labeled 1 in and 2 in.]

With each template, tape together the shaded areas to

make a cone. The two resulting cones have the same

shape, but the large cone has twice the height and width

of the small cone.

When the cones are dropped point downward, what is the

approximate ratio of their terminal speeds (the speeds at which drag balances

weight)?

The Navier–Stokes equations contain the answer to this question. Finding

the terminal speed involves four steps.

Step 1. Impose boundary conditions. The conditions include the motion

of the cone and the requirement that no ﬂuid enters the paper.

Step 2. Solve the equations, together with the continuity equation ∇·v =

0, in order to ﬁnd the pressure and velocity at the surface of the

cone.

Step 3. Use the pressure and velocity to ﬁnd the pressure and velocity

gradient at the surface of the cone; then integrate the resulting

forces to ﬁnd the net force and torque on the cone.

Step 4. Use the net force and torque to ﬁnd the motion of the cone. This

step is diﬃcult because the resulting motion must be consistent

with the motion assumed in step 1. If it is not consistent, go back

to step 1, assume a diﬀerent motion, and hope for better luck

upon reaching this step.

Unfortunately, the Navier–Stokes equations are coupled and nonlinear

partial-diﬀerential equations. Their solutions are known only in very

simple cases: for example, a sphere moving very slowly in a viscous ﬂuid,

or a sphere moving at any speed in a zero-viscosity ﬂuid. There is little

hope of solving for the complicated ﬂow around an irregular, quivering

shape such as a ﬂexible paper cone.

Problem 2.12 Checking dimensions in the Navier–Stokes equations

Check that the ﬁrst three terms of the Navier–Stokes equations have identical

dimensions.

Problem 2.13 Dimensions of kinematic viscosity

From the Navier–Stokes equations, ﬁnd the dimensions of kinematic viscosity ν.


2.4.1 Using dimensions

Because a direct solution of the Navier–Stokes equations is out of the

question, let’s use the methods of dimensional analysis and easy cases. A

direct approach is to use them to deduce the terminal velocity itself. An

indirect approach is to deduce the drag force as a function of fall speed

and then to ﬁnd the speed at which the drag balances the weight of

the cones. This two-step approach simpliﬁes the problem. It introduces

only one new quantity (the drag force) but eliminates two quantities: the

gravitational acceleration and the mass of the cone.

Problem 2.14 Explaining the simpliﬁcation

Why is the drag force independent of the gravitational acceleration g and of the

cone’s mass m (yet the force depends on the cone’s shape and size)?

The principle of dimensions is that all terms in a valid equation have

identical dimensions. Applied to the drag force F, it means that in the

equation F = f(quantities that aﬀect F) both sides have dimensions of

force. Therefore, the strategy is to ﬁnd the quantities that aﬀect F, ﬁnd

their dimensions, and then combine the quantities into a quantity with

dimensions of force.

On what quantities does the drag depend, and what are their dimensions?

The drag force depends on four quantities: two parameters of the cone and two parameters of the fluid (air). (For the dimensions of ν, see Problem 2.13.)

Symbol   Quantity            Dimensions
v        speed of the cone   $LT^{-1}$
r        size of the cone    $L$
ρ        density of air      $ML^{-3}$
ν        viscosity of air    $L^2T^{-1}$

Do any combinations of the four parameters

v, r, ρ, and ν have dimensions of force?

The next step is to combine v, r, ρ, and ν into a quantity with dimensions of force. Unfortunately, the possibilities are numerous. For example,

$$F_1 = \rho v^2 r^2, \qquad F_2 = \rho\nu v r, \qquad (2.14)$$

or the product combinations $\sqrt{F_1 F_2}$ and $F_1^2/F_2$. Any sum of these ugly products is also a force, so the drag force F could be $\sqrt{F_1 F_2} + F_1^2/F_2$, $3\sqrt{F_1 F_2} - 2F_1^2/F_2$, or much worse.
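That each of these combinations has dimensions of force can be checked mechanically by adding exponents of mass, length, and time. The sketch below is illustrative only: the representation of dimensions as (M, L, T) exponent tuples and the helper `dimensions_of` are assumptions, not notation from the text.

```python
from fractions import Fraction

# Exponents of (mass, length, time) for each quantity, from the table above.
DIMENSIONS = {
    "v": (0, 1, -1),     # speed: L T^-1
    "r": (0, 1, 0),      # size: L
    "rho": (1, -3, 0),   # density: M L^-3
    "nu": (0, 2, -1),    # kinematic viscosity: L^2 T^-1
}
FORCE = (1, 1, -2)       # M L T^-2

def dimensions_of(powers):
    """Dimensions of a product such as rho^1 v^2 r^2, given its powers."""
    total = [Fraction(0)] * 3
    for name, power in powers.items():
        for i, exponent in enumerate(DIMENSIONS[name]):
            total[i] += Fraction(power) * exponent
    return tuple(total)

F1 = {"rho": 1, "v": 2, "r": 2}                 # rho v^2 r^2
F2 = {"rho": 1, "nu": 1, "v": 1, "r": 1}        # rho nu v r
# sqrt(F1 F2) = rho nu^(1/2) v^(3/2) r^(3/2)
sqrt_F1F2 = {"rho": 1, "nu": Fraction(1, 2), "v": Fraction(3, 2), "r": Fraction(3, 2)}
# F1^2 / F2 = rho v^3 r^3 / nu
F1sq_over_F2 = {"rho": 1, "v": 3, "r": 3, "nu": -1}

for combo in (F1, F2, sqrt_F1F2, F1sq_over_F2):
    print(dimensions_of(combo) == FORCE)
```

Using exact fractions rather than floats keeps half-integer powers such as $v^{3/2}$ free of rounding error.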


Narrowing the possibilities requires a method more sophisticated than

simply guessing combinations with correct dimensions. To develop the

sophisticated approach, return to the ﬁrst principle of dimensions: All

terms in an equation have identical dimensions. This principle applies to

any statement about drag such as

$$A + B = C, \qquad (2.15)$$

where the blobs A, B, and C are functions of F, v, r, ρ, and ν.

Although the blobs can be absurdly complex functions, they have identical

dimensions. Therefore, dividing each term by A, which produces the

equation

$$\frac{A}{A} + \frac{B}{A} = \frac{C}{A}, \qquad (2.16)$$

makes each term dimensionless. The same method turns any valid equation into a dimensionless equation. Thus, any (true) equation describing the world can be written in a dimensionless form.

Any dimensionless form can be built from dimensionless groups: from

dimensionless products of the variables. Because any equation describing

the world can be written in a dimensionless form, and any dimensionless

form can be written using dimensionless groups, any equation describing

the world can be written using dimensionless groups.

Is the free-fall example (Section 1.2) consistent with this principle?

Before applying this principle to the complicated problem of drag, try it in the simple example of free fall (Section 1.2). The exact impact speed of an object dropped from a height h is $v = \sqrt{2gh}$, where g is the gravitational acceleration. This result can indeed be written in the dimensionless form $v/\sqrt{gh} = \sqrt{2}$, which itself uses only the dimensionless group $v/\sqrt{gh}$. The new principle passes its first test.

This dimensionless-group analysis of formulas, when reversed, becomes a method of synthesis. Let’s warm up by synthesizing the impact speed v. First, list the quantities in the problem; here, they are v, g, and h. Second, combine these quantities into dimensionless groups. Here, all dimensionless groups can be constructed just from one group. For that group, let’s choose $v^2/gh$ (the particular choice does not affect the conclusion). Then the only possible dimensionless statement is

$$\frac{v^2}{gh} = \text{dimensionless constant}. \qquad (2.17)$$

(The right side is a dimensionless constant because no second group is available to use there.) In other words, $v^2/gh \sim 1$ or $v \sim \sqrt{gh}$.

This result reproduces the result of the less sophisticated dimensional

analysis in Section 1.2. Indeed, with only one dimensionless group, either

analysis leads to the same conclusion. However, in hard problems—for

example, ﬁnding the drag force—the less sophisticated method does not

provide its constraint in a useful form; then the method of dimensionless

groups is essential.

Problem 2.15 Fall time

Synthesize an approximate formula for the free-fall time t from g and h.

Problem 2.16 Kepler’s third law

Synthesize Kepler’s third law connecting the orbital period of a planet to its

orbital radius. (See also Problem 1.15.)

What dimensionless groups can be constructed for the drag problem?

One dimensionless group could be $F/\rho v^2 r^2$; a second group could be rv/ν.

Any other group can be constructed from these groups (Problem 2.17), so

the problem is described by two independent dimensionless groups. The

most general dimensionless statement is then

one group = f(second group), (2.18)

where f is a still-unknown (but dimensionless) function.

Which dimensionless group belongs on the left side?

The goal is to synthesize a formula for F, and F appears only in the first group $F/\rho v^2 r^2$. With that constraint in mind, place the first group on the left side rather than wrapping it in the still-mysterious function f. With this choice, the most general statement about drag force is

$$\frac{F}{\rho v^2 r^2} = f\!\left(\frac{rv}{\nu}\right). \qquad (2.19)$$

The physics of the (steady-state) drag force on the cone is all contained

in the dimensionless function f.
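The count of two independent groups can be checked with a little linear algebra: by the Buckingham result cited in Problem 2.18, the number of independent groups equals the number of quantities minus the rank of their matrix of dimensional exponents. The sketch below is an illustration under that assumption; the `rank` helper (plain Gaussian elimination with exact fractions) and the (M, L, T) tuples are not from the text.

```python
from fractions import Fraction

# (mass, length, time) exponents of the five quantities in the drag problem.
EXPONENTS = {
    "F": (1, 1, -2),
    "v": (0, 1, -1),
    "r": (0, 1, 0),
    "rho": (1, -3, 0),
    "nu": (0, 2, -1),
}

def rank(rows):
    """Rank of a small matrix by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    found = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(found, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        for i in range(len(m)):
            if i != found and m[i][col] != 0:
                factor = m[i][col] / m[found][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[found])]
        found += 1
        if found == len(m):
            break
    return found

def dimensions_of(powers):
    """Sum the exponent vectors of a product of powers of the quantities."""
    return tuple(
        sum(power * EXPONENTS[name][i] for name, power in powers.items())
        for i in range(3)
    )

n_groups = len(EXPONENTS) - rank(list(EXPONENTS.values()))
group_1 = {"F": 1, "rho": -1, "v": -2, "r": -2}   # F / (rho v^2 r^2)
group_2 = {"r": 1, "v": 1, "nu": -1}              # r v / nu
print(n_groups, dimensions_of(group_1), dimensions_of(group_2))
```

Both groups come out with all exponents zero, so they are dimensionless, and the count agrees with the two groups used in the text.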


Problem 2.17 Only two groups

Show that F, v, r, ρ, and ν produce only two independent dimensionless groups.

Problem 2.18 How many groups in general?

Is there a general method to predict the number of independent dimensionless

groups? (The answer was given in 1914 by Buckingham [9].)

The procedure might seem pointless, having produced a drag force that

depends on the unknown function f. But it has greatly improved our

chances of finding f. The original problem formulation required guessing the four-variable function h in F = h(v, r, ρ, ν), whereas dimensional

analysis reduced the problem to guessing a function of only one variable

(the ratio vr/ν). The value of this simpliﬁcation was eloquently described

by the statistician and physicist Harold Jeﬀreys (quoted in [34, p. 82]):

A good table of functions of one variable may require a page; that of a function

of two variables a volume; that of a function of three variables a bookcase;

and that of a function of four variables a library.

Problem 2.19 Dimensionless groups for the truncated pyramid

The truncated pyramid of Section 2.3 has volume

$$V = \frac{1}{3}h(a^2 + ab + b^2). \qquad (2.20)$$

Make dimensionless groups from V, h, a, and b, and rewrite the volume using

these groups. (There are many ways to do so.)

2.4.2 Using easy cases

Although improved, our chances do not look high: Even the one-variable

drag problem has no exact solution. But it might have exact solutions in

its easy cases. Because the easiest cases are often extreme cases, look ﬁrst

at the extreme cases.

Extreme cases of what?

The unknown function f depends on only rv/ν,

$$\frac{F}{\rho v^2 r^2} = f\!\left(\frac{rv}{\nu}\right), \qquad (2.21)$$

so try extremes of rv/ν. However, to avoid lapsing into mindless symbol pushing, first determine the meaning of rv/ν. This combination rv/ν, often denoted Re, is the famous Reynolds number. (Its physical interpretation requires the technique of lumping and is explained in Section 3.4.3.)

The Reynolds number affects the drag force via the unknown function f:

$$\frac{F}{\rho v^2 r^2} = f(\mathrm{Re}). \qquad (2.22)$$

With luck, f can be deduced at extremes of the Reynolds number; with

further luck, the falling cones are an example of one extreme.

Are the falling cones an extreme of the Reynolds number?

The Reynolds number depends on r, v, and ν. For the speed v, everyday experience suggests that the cones fall at roughly $1\,\mathrm{m\,s^{-1}}$ (within, say, a factor of 2). The size r is roughly 0.1 m (again within a factor of 2). And the kinematic viscosity of air is $\nu \sim 10^{-5}\,\mathrm{m^2\,s^{-1}}$. The Reynolds number is

$$\frac{\overbrace{0.1\,\mathrm{m}}^{r} \times \overbrace{1\,\mathrm{m\,s^{-1}}}^{v}}{\underbrace{10^{-5}\,\mathrm{m^2\,s^{-1}}}_{\nu}} \sim 10^4. \qquad (2.23)$$

It is significantly greater than 1, so the falling cones are an extreme case of high Reynolds number. (For low Reynolds number, try Problem 2.27 and see [38].)
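The estimate, together with the factor-of-2 uncertainties quoted for r and v, can be reproduced in a few lines. This is only a sketch; the function name `reynolds_number` and the variable names are assumptions for illustration.

```python
import math

def reynolds_number(size_m, speed_m_per_s, kinematic_viscosity_m2_per_s):
    """Re = r v / nu, a dimensionless ratio."""
    return size_m * speed_m_per_s / kinematic_viscosity_m2_per_s

NU_AIR = 1e-5  # m^2/s, the rough value used in the text

re_typical = reynolds_number(0.1, 1.0, NU_AIR)
# Pessimistic and optimistic ends of the factor-of-2 ranges for r and v.
re_low = reynolds_number(0.05, 0.5, NU_AIR)
re_high = reynolds_number(0.2, 2.0, NU_AIR)
print(re_low, re_typical, re_high)
```

Even the low end of the range is in the thousands, so the conclusion that the cones are in the high-Reynolds-number regime does not hinge on the rough inputs.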

Problem 2.20 Reynolds numbers in everyday ﬂows

Estimate Re for a submarine cruising underwater, a falling pollen grain, a falling

raindrop, and a 747 crossing the Atlantic.

The high-Reynolds-number limit can be reached many ways. One way is to shrink the viscosity ν to 0, because ν lives in the denominator of the Reynolds number. Therefore, in the limit of high Reynolds number, viscosity disappears from the problem and the drag force should not depend on viscosity. This reasoning contains several subtle untruths, yet its conclusion is mostly correct. (Clarifying the subtleties required two centuries of progress in mathematics, culminating in singular perturbations and the theory of boundary layers [12, 46].)

Viscosity affects the drag force only through the Reynolds number:

$$\frac{F}{\rho v^2 r^2} = f\!\left(\frac{rv}{\nu}\right). \qquad (2.24)$$

To make F independent of viscosity, F must be independent of Reynolds number! The problem then contains only one independent dimensionless group, $F/\rho v^2 r^2$, so the most general statement about drag is

$$\frac{F}{\rho v^2 r^2} = \text{dimensionless constant}. \qquad (2.25)$$

The drag force itself is then $F \sim \rho v^2 r^2$. Because $r^2$ is proportional to the cone’s cross-sectional area A, the drag force is commonly written

$$F \sim \rho v^2 A. \qquad (2.26)$$

Although the derivation was for falling cones, the result applies to any

object as long as the Reynolds number is high. The shape aﬀects only

the missing dimensionless constant. For a sphere, it is roughly 1/4; for a

long cylinder moving perpendicular to its axis, it is roughly 1/2; and for

a ﬂat plate moving perpendicular to its face, it is roughly 1.

2.4.3 Terminal velocities

[Figure: a falling cone acted on by the drag force $F_{\text{drag}}$ and the weight W = mg.]

The result $F \sim \rho v^2 A$ is enough to predict the terminal velocities of the cones. Terminal velocity means zero acceleration, so the drag force must balance the weight. The weight is $W = \sigma_{\text{paper}} A_{\text{paper}} g$, where $\sigma_{\text{paper}}$ is the areal density of paper (mass per area) and $A_{\text{paper}}$ is the area of the template after cutting out the quarter section. Because $A_{\text{paper}}$ is comparable to the cross-sectional area A, the weight is roughly given by

$$W \sim \sigma_{\text{paper}} A g. \qquad (2.27)$$

Therefore,

$$\underbrace{\rho v^2 A}_{\text{drag}} \sim \underbrace{\sigma_{\text{paper}} A g}_{\text{weight}}. \qquad (2.28)$$

The area divides out and the terminal velocity becomes

$$v \sim \sqrt{\frac{g\sigma_{\text{paper}}}{\rho}}. \qquad (2.29)$$

All cones constructed from the same paper and having the same shape,

whatever their size, fall at the same speed!
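The cancellation of the area can be seen numerically by balancing drag against weight for two cones of different size. The sketch below rests on labeled assumptions that are not from the text: an areal density of 0.08 kg/m² (typical office paper), an air density of 1.2 kg/m³, a dimensionless drag constant of 1, and hypothetical cross-sectional radii of 2.5 cm and 5 cm.

```python
import math

G = 9.8               # m/s^2, gravitational acceleration
RHO_AIR = 1.2         # kg/m^3, assumed density of air
SIGMA_PAPER = 0.08    # kg/m^2, assumed areal density of paper

def terminal_speed(cross_section_area_m2, drag_constant=1.0):
    """Speed at which drag (c rho v^2 A) balances weight (sigma A g)."""
    weight = SIGMA_PAPER * cross_section_area_m2 * G
    return math.sqrt(weight / (drag_constant * RHO_AIR * cross_section_area_m2))

# Hypothetical cross-sections: the large cone has twice the radius.
area_small = math.pi * 0.025**2
area_large = math.pi * 0.05**2

v_small = terminal_speed(area_small)
v_large = terminal_speed(area_large)
print(v_small, v_large)
```

Under these assumed values both speeds come out somewhat below 1 m/s, and they are equal to each other because the area appears in both the drag and the weight.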


To test this prediction, I constructed the small and large cones described

on page 21, held one in each hand above my head, and let them fall. Their

2 m fall lasted roughly 2 s, and they landed within 0.1 s of one another.

Cheap experiment and cheap theory agree!

Problem 2.21 Home experiment of a small versus a large cone

Try the cone home experiment yourself (page 21).

Problem 2.22 Home experiment of four stacked cones versus one cone

Predict the ratio

$$\frac{\text{terminal velocity of four small cones stacked inside each other}}{\text{terminal velocity of one small cone}}. \qquad (2.30)$$

Test your prediction. Can you ﬁnd a method not requiring timing equipment?

Problem 2.23 Estimating the terminal speed

Estimate or look up the areal density of paper; predict the cones’ terminal speed;

and then compare that prediction to the result of the home experiment.

2.5 Summary and further problems

A correct solution works in all cases, including the easy ones. Therefore, check any proposed formula in the easy cases, and guess formulas

by constructing expressions that pass all easy-cases tests. To apply and

extend these ideas, try the following problems and see the concise and

instructive book by Cipra [10].

Problem 2.24 Fencepost errors

A garden has 10 m of horizontal fencing that you would like to divide into 1 m

segments by using vertical posts. Do you need 10 or 11 vertical posts (including

the posts needed at the ends)?

Problem 2.25 Odd sum

Here is the sum of the first n odd integers:

$$S_n = \underbrace{1 + 3 + 5 + \cdots + l_n}_{n\ \text{terms}} \qquad (2.31)$$

a. Does the last term $l_n$ equal 2n + 1 or 2n − 1?

b. Use easy cases to guess $S_n$ (as a function of n).

An alternative solution is discussed in Section 4.1.


Problem 2.26 Free fall with initial velocity

The ball in Section 1.2 was released from rest. Now imagine that it is given an initial velocity $v_0$ (where positive $v_0$ means an upward throw). Guess the impact velocity $v_i$.

Then solve the free-fall differential equation to find the exact $v_i$, and compare the exact result to your guess.

Problem 2.27 Low Reynolds number

In the limit Re ≪ 1, guess the form of f in

$$\frac{F}{\rho v^2 r^2} = f\!\left(\frac{rv}{\nu}\right). \qquad (2.32)$$

The result, when combined with the correct dimensionless constant, is known as Stokes drag [12].

Problem 2.28 Range formula

[Figure: a rock launched with velocity v at launch angle θ, landing a horizontal distance R away.]

How far does a rock travel horizontally (no air resistance)?

Use dimensions and easy cases to guess a formula for the

range R as a function of the launch velocity v, the launch

angle θ, and the gravitational acceleration g.

Problem 2.29 Spring equation

The angular frequency of an ideal mass–spring system (Section 3.4.2) is $\sqrt{k/m}$, where $k$ is the spring constant and $m$ is the mass. This expression has the spring constant $k$ in the numerator. Use extreme cases of $k$ or $m$ to decide whether that placement is correct.

Problem 2.30 Taping the cone templates

The tape mark on the large cone template (page 21) is twice as wide as the tape

mark on the small cone template. In other words, if the tape on the large cone

is, say, 6 mm wide, the tape on the small cone should be 3 mm wide. Why?

3 Lumping

3.1 Estimating populations: How many babies?
3.2 Estimating integrals
3.3 Estimating derivatives
3.4 Analyzing differential equations: The spring–mass system
3.5 Predicting the period of a pendulum
3.6 Summary and further problems

Where will an orbiting planet be 6 months from now? To predict its new

location, we cannot simply multiply the 6 months by the planet’s current

velocity, for its velocity constantly varies. Such calculations are the reason

that calculus was invented. Its fundamental idea is to divide the time into

tiny intervals over which the velocity is constant, to multiply each tiny

time by the corresponding velocity to compute a tiny distance, and then

to add the tiny distances.

Amazingly, this computation can often be done exactly, even when the intervals have infinitesimal width and are therefore infinite in number. However, the symbolic manipulations can be lengthy and, worse, are often rendered impossible by even small changes to the problem. Using calculus methods, for example, we can exactly calculate the area under the Gaussian $e^{-x^2}$ between $x = 0$ and $\infty$; yet if either limit is any value except zero or infinity, an exact calculation in closed form becomes impossible.

In contrast, approximate methods are robust: They almost always provide

a reasonable answer. And the least accurate but most robust method is

lumping. Instead of dividing a changing process into many tiny pieces,

group or lump it into one or two pieces. This simple approximation and

its advantages are illustrated using examples ranging from demographics

(Section 3.1) to nonlinear diﬀerential equations (Section 3.5).


3.1 Estimating populations: How many babies?

The ﬁrst example is to estimate the number of babies in the United States.

For deﬁniteness, call a child a baby until he or she turns 2 years old. An

exact calculation requires the birth dates of every person in the United

States. This, or closely similar, information is collected once every decade

by the US Census Bureau.

[Figure: census data $N(t)$, in units of $10^6\,\mathrm{yr}^{-1}$, plotted against age $t$ in years.]

As an approximation to this voluminous data, the Census Bureau [47] publishes the number of people at each age. The data for 1991 is a set of points lying on a wiggly line $N(t)$, where $t$ is age. Then

$$N_{\text{babies}} = \int_0^{2\,\mathrm{yr}} N(t)\,dt. \tag{3.1}$$

Problem 3.1 Dimensions of the vertical axis

Why is the vertical axis labeled in units of people per year rather than in units of people? Equivalently, why does the axis have dimensions of $T^{-1}$?

This method has several problems. First, it depends on the huge resources

of the US Census Bureau, so it is not usable on a desert island for back-

of-the-envelope calculations. Second, it requires integrating a curve with

no analytic form, so the integration must be done numerically. Third, the

integral is of data speciﬁc to this problem, whereas mathematics should

be about generality. An exact integration, in short, provides little insight

and has minimal transfer value. Instead of integrating the population

curve exactly, approximate it—lump the curve into one rectangle.

What are the height and width of this rectangle?

The rectangle’s width is a time, and a plausible time related to populations

is the life expectancy. It is roughly 80 years, so make 80 years the width

by pretending that everyone dies abruptly on his or her 80th birthday.

The rectangle’s height can be computed from the rectangle’s area, which

is the US population—conveniently 300 million in 2008. Therefore,

$$\text{height} = \frac{\text{area}}{\text{width}} \sim \frac{3 \times 10^{8}}{75\,\mathrm{yr}}. \tag{3.2}$$

Why did the life expectancy drop from 80 to 75 years?

[Figure: the census data and the lumped rectangle, in units of $10^6\,\mathrm{yr}^{-1}$ against age in years from 0 to 75; the region labeled "babies" marks the youngest ages.]

Fudging the life expectancy simplifies the mental division: 75 divides easily into 3 and 300. The inaccuracy is no larger than the error made by lumping, and it might even cancel the lumping error. Using 75 years as the width makes the height approximately $4 \times 10^{6}\,\mathrm{yr}^{-1}$.

Integrating the population curve over the range $t = 0 \ldots 2$ yr becomes just multiplication:

$$N_{\text{babies}} \sim \underbrace{4 \times 10^{6}\,\mathrm{yr}^{-1}}_{\text{height}} \times \underbrace{2\,\mathrm{yr}}_{\text{infancy}} = 8 \times 10^{6}. \tag{3.3}$$

The Census Bureau’s figure is very close: $7.980 \times 10^{6}$. The error from lumping canceled the error from fudging the life expectancy to 75 years!
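The arithmetic of the lumped estimate can be replayed in a few lines of Python. This is only a sketch: the variable names are illustrative, and the inputs are the round numbers quoted above (300 million people, a fudged 75-year lifetime, 2 years of infancy) together with the Census Bureau figure from the text.

```python
# Lumped estimate of the number of US babies (children under 2 years old).
# All inputs are the round numbers used in the text, not fresh data.
population = 3e8          # US population, people
life_expectancy = 75.0    # fudged width of the rectangle, years
infancy = 2.0             # a "baby" is anyone younger than this, years

height = population / life_expectancy   # people per year of age
babies = height * infancy               # area of the 0-2 yr strip

census_babies = 7.980e6                 # Census Bureau figure quoted in the text
relative_error = babies / census_babies - 1

print(f"height ~ {height:.1e} per year")
print(f"babies ~ {babies:.1e}")
print(f"relative error vs census figure: {relative_error:.2%}")
```

Changing `life_expectancy` back to 80 shows how much of the agreement comes from the fudge.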

Problem 3.2 Landﬁll volume

Estimate the US landﬁll volume used annually by disposable diapers.

Problem 3.3 Industry revenues

Estimate the annual revenue of the US diaper industry.

3.2 Estimating integrals

The US population curve (Section 3.1) was difficult to integrate partly because it was unknown. But even well-known functions can be difficult to integrate. In such cases, two lumping methods are particularly useful: the 1/e heuristic (Section 3.2.1) and the full width at half maximum (FWHM) heuristic (Section 3.2.2).

3.2.1 1/e heuristic

[Figure: the curve $e^{-t}$ plotted against $t$.]

Electronic circuits, atmospheric pressure, and radioactive decay contain the ubiquitous exponential and its integral (given here in dimensionless form)

$$\int_0^\infty e^{-t}\,dt. \tag{3.4}$$

To approximate its value, let’s lump the $e^{-t}$ curve into one rectangle.

What values should be chosen for the width and height of the rectangle?

[Figure: the curve $e^{-t}$ with its lumped rectangle of unit height and unit width.]

A reasonable height for the rectangle is the maximum of $e^{-t}$, namely 1. To choose its width, use significant change as the criterion (a method used again in Section 3.3.3): Choose a significant change in $e^{-t}$; then find the width $\Delta t$ that produces this change. In an exponential decay, a simple and natural significant change is when $e^{-t}$ becomes a factor of $e$ closer to its final value (which is 0 here because $t$ goes to $\infty$). With this criterion, $\Delta t = 1$. The lumping rectangle then has unit area, which is the exact value of the integral!

[Figure: the curve $e^{-x^2}$ with its lumped rectangle spanning $x = -1$ to $x = 1$.]

Encouraged by this result, let’s try the heuristic on the difficult integral

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx. \tag{3.5}$$

Again lump the area into a single rectangle. Its height is the maximum of $e^{-x^2}$, which is 1. Its width is enough that $e^{-x^2}$ falls by a factor of $e$. This drop happens at $x = \pm 1$, so the width is $\Delta x = 2$ and its area is $1 \times 2$. The exact area is $\sqrt{\pi} \approx 1.77$ (Section 2.1), so lumping makes an error of only 13%: For such a short derivation, the accuracy is extremely high.
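Both 1/e estimates can be checked numerically. The sketch below is an illustration, not part of the original derivation; the helper name `one_over_e_estimate` is made up, and the drop points ($t = 1$ and $x = \pm 1$) are supplied by hand from the reasoning above.

```python
import math

def one_over_e_estimate(f, x_peak, x_left, x_right):
    """Area of the lumping rectangle: peak height times the width over
    which f stays within a factor of e of its peak."""
    return f(x_peak) * (x_right - x_left)

# e^{-t} on [0, inf): peak at t = 0, falls by a factor of e at t = 1.
exp_estimate = one_over_e_estimate(lambda t: math.exp(-t), 0.0, 0.0, 1.0)
exp_exact = 1.0

# e^{-x^2} on (-inf, inf): peak at x = 0, falls by a factor of e at x = +-1.
gauss_estimate = one_over_e_estimate(lambda x: math.exp(-x * x), 0.0, -1.0, 1.0)
gauss_exact = math.sqrt(math.pi)
gauss_error = gauss_estimate / gauss_exact - 1

print(f"exponential: lumped {exp_estimate}, exact {exp_exact}")
print(f"Gaussian: lumped {gauss_estimate}, exact {gauss_exact:.4f}, error {gauss_error:.1%}")
```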

Problem 3.4 General exponential decay

Use lumping to estimate the integral

$$\int_0^\infty e^{-at}\,dt. \tag{3.6}$$

Use dimensional analysis and easy cases to check that your answer makes sense.

Problem 3.5 Atmospheric pressure

Atmospheric density $\rho$ decays roughly exponentially with height $z$:

$$\rho \sim \rho_0 e^{-z/H}, \tag{3.7}$$

where $\rho_0$ is the density at sea level, and $H$ is the so-called scale height (the height at which the density falls by a factor of $e$). Use your everyday experience to estimate $H$.

Then estimate the atmospheric pressure at sea level by estimating the weight of an infinitely high cylinder of air.

Problem 3.6 Cone free-fall distance

Roughly how far does a cone of Section 2.4 fall before reaching a signiﬁcant

fraction of its terminal velocity? How large is that distance compared to the

drop height of 2 m? Hint: Sketch (very roughly) the cone’s acceleration versus

time and make a lumping approximation.

3.2.2 Full width at half maximum

Another reasonable lumping heuristic arose in the early days of spec-

troscopy. As a spectroscope swept through a range of wavelengths, a

chart recorder would plot how strongly a molecule absorbed radiation

of that wavelength. This curve contains many peaks whose location and

area reveal the structure of the molecule (and were essential in developing

quantum theory [14]). But decades before digital chart recorders existed,

how could the areas of the peaks be computed?

They were computed by lumping the peak into a rectangle whose height is

the height of the peak and whose width is the full width at half maximum

(FWHM). Where the 1/e heuristic uses a factor of e as the signiﬁcant

change, the FWHM heuristic uses a factor of 2.

Try this recipe on the Gaussian integral $\int_{-\infty}^{\infty} e^{-x^2}\,dx$.

[Figure: the curve $e^{-x^2}$ with its FWHM rectangle spanning $x = -\sqrt{\ln 2}$ to $x = \sqrt{\ln 2}$.]

The maximum height of $e^{-x^2}$ is 1, so the half maxima are at $x = \pm\sqrt{\ln 2}$ and the full width is $2\sqrt{\ln 2}$. The lumped rectangle therefore has area $2\sqrt{\ln 2} \approx 1.665$. The exact area is $\sqrt{\pi} \approx 1.77$ (Section 2.1): The FWHM heuristic makes an error of only 6%, which is roughly one-half the error of the 1/e heuristic.
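When the half-maximum points are not obvious, they can be located numerically. The following sketch assumes a single symmetric peak that decreases monotonically away from its maximum; the function name `half_max_point` and the search bound of 5 are illustrative choices, not anything prescribed by the text.

```python
import math

def half_max_point(f, peak_x, far_x, tol=1e-12):
    """Bisect between peak_x and far_x for the point where f equals half
    its peak value. Assumes f decreases monotonically from peak_x to far_x."""
    target = f(peak_x) / 2
    lo, hi = peak_x, far_x
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid      # still above half maximum: move right
        else:
            hi = mid      # at or below half maximum: move left
    return (lo + hi) / 2

def gaussian(x):
    return math.exp(-x * x)

x_half = half_max_point(gaussian, 0.0, 5.0)
fwhm = 2 * x_half                       # the peak is symmetric about x = 0
area_estimate = gaussian(0.0) * fwhm    # height times full width at half maximum
area_exact = math.sqrt(math.pi)
error = area_estimate / area_exact - 1

print(f"half maximum at x = {x_half:.6f}, sqrt(ln 2) = {math.sqrt(math.log(2)):.6f}")
print(f"FWHM area {area_estimate:.4f}, exact {area_exact:.4f}, error {error:.1%}")
```

The same helper can be pointed at the integrands of Problem 3.7 by swapping in a different `f`.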

Problem 3.7 Trying the FWHM heuristic

Make single-rectangle lumping estimates of the following integrals. Choose the

height and width of the rectangle using the FWHM heuristic. How accurate is

each estimate?

a. $\displaystyle\int_{-\infty}^{\infty} \frac{1}{1 + x^2}\,dx$ [exact value: $\pi$].

b. $\displaystyle\int_{-\infty}^{\infty} e^{-x^4}\,dx$ [exact value: $\Gamma(1/4)/2 \approx 1.813$].


3.2.3 Stirling’s approximation

The 1/e and FWHM lumping heuristics next help us approximate the ubiquitous factorial function $n!$; this function’s uses range from probability theory to statistical mechanics and the analysis of algorithms. For positive integers, $n!$ is defined as $n \times (n - 1) \times (n - 2) \times \cdots \times 2 \times 1$. In this discrete form, it is difficult to approximate. However, the integral representation for $n!$,

$$n! \equiv \int_0^\infty t^n e^{-t}\,dt, \tag{3.8}$$

provides a definition even when $n$ is not a positive integer, and this integral can be approximated using lumping.

The lumping analysis will generate almost all of Stirling’s famous approximation formula

$$n! \approx n^n e^{-n} \sqrt{2\pi n}. \tag{3.9}$$

Lumping requires a peak, but does the integrand $t^n e^{-t}$ have a peak?

To understand the integrand $t^n e^{-t}$ or $t^n/e^t$, examine the extreme cases of $t$. When $t = 0$, the integrand is 0. In the opposite extreme, $t \to \infty$, the polynomial factor $t^n$ makes the product infinity while the exponential factor $e^{-t}$ makes it zero. Who wins that struggle? The Taylor series for $e^t$ contains every power of $t$ (and with positive coefficients), so it is an increasing, infinite-degree polynomial. Therefore, as $t$ goes to infinity, $e^t$ outruns any polynomial $t^n$ and makes the integrand $t^n/e^t$ equal 0 in the $t \to \infty$ extreme. Being zero at both extremes, the integrand must have a peak in between. In fact, it has exactly one peak. (Can you show that?)

[Figure: the curves $t e^{-t}$, $t^2 e^{-t}$, and $t^3 e^{-t}$ plotted against $t$, with tick marks at $t = 1$, 2, and 3.]

Increasing $n$ strengthens the polynomial factor $t^n$, so $t^n$ survives until higher $t$ before $e^t$ outruns it. Therefore, the peak of $t^n/e^t$ shifts right as $n$ increases. The graph confirms this prediction and suggests that the peak occurs at $t = n$. Let’s check by using calculus to maximize $t^n e^{-t}$ or, more simply, to maximize its logarithm $f(t) = n \ln t - t$. At a peak, a function has zero slope. Because $df/dt = n/t - 1$, the peak occurs at $t_{\text{peak}} = n$, when the integrand $t^n e^{-t}$ is $n^n e^{-n}$, thus reproducing the largest and most important factor in Stirling’s formula.

[Figure: the integrand $t^n e^{-t}$ with a lumping rectangle of height $n^n/e^n$ and width $2\Delta t$.]

What is a reasonable lumping rectangle?

The rectangle’s height is the peak height $n^n e^{-n}$. For the rectangle’s width, use either the 1/e or the FWHM heuristic. Because both heuristics require approximating $t^n e^{-t}$, expand its logarithm $f(t)$ in a Taylor series around its peak at $t = n$:

$$f(n + \Delta t) = f(n) + \Delta t \left.\frac{df}{dt}\right|_{t=n} + \frac{(\Delta t)^2}{2} \left.\frac{d^2 f}{dt^2}\right|_{t=n} + \cdots. \tag{3.10}$$

The second term of the Taylor expansion vanishes because $f(t)$ has zero slope at the peak. In the third term, the second derivative $d^2 f/dt^2$ at $t = n$ is $-n/t^2$ or $-1/n$. Thus,

$$f(n + \Delta t) \approx f(n) - \frac{(\Delta t)^2}{2n}. \tag{3.11}$$

To decrease $t^n e^{-t}$ by a factor of $F$ requires decreasing $f(t)$ by $\ln F$. This choice means $\Delta t = \sqrt{2n \ln F}$. Because the rectangle’s width is $2\Delta t$, the lumped-area estimate of $n!$ is

$$n! \sim n^n e^{-n} \sqrt{n} \times \begin{cases} \sqrt{8} & \text{(1/e criterion: } F = e\text{)} \\ \sqrt{8 \ln 2} & \text{(FWHM criterion: } F = 2\text{).} \end{cases} \tag{3.12}$$

For comparison, Stirling’s formula is $n! \approx n^n e^{-n} \sqrt{2\pi n}$. Lumping has explained almost every factor. The $n^n e^{-n}$ factor is the height of the rectangle, and the $\sqrt{n}$ factor is from the width of the rectangle. Although the exact $\sqrt{2\pi}$ factor remains mysterious (Problem 3.9), it is approximated to within 13% (the 1/e heuristic) or 6% (the FWHM heuristic).
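A short numerical comparison makes the quality of the lumped estimates concrete. The sketch below is illustrative only: the function names are invented, and `math.factorial` supplies the exact values for a few sample $n$.

```python
import math

def lumped_factorial(n, F):
    """Lumped estimate of n!: peak height n^n e^{-n} times width 2*dt,
    where dt = sqrt(2 n ln F) lowers the integrand by a factor of F."""
    height = n**n * math.exp(-n)
    width = 2 * math.sqrt(2 * n * math.log(F))
    return height * width

def stirling(n):
    """Stirling's formula n^n e^{-n} sqrt(2 pi n)."""
    return n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)

for n in (1, 5, 10, 20):
    exact = math.factorial(n)
    print(
        f"n={n:2d}  1/e: {lumped_factorial(n, math.e) / exact:.3f}"
        f"  FWHM: {lumped_factorial(n, 2) / exact:.3f}"
        f"  Stirling: {stirling(n) / exact:.3f}"
    )

# Relative to Stirling's formula, the two lumped constants do not depend on n.
ratio_e = lumped_factorial(10, math.e) / stirling(10)
ratio_fwhm = lumped_factorial(10, 2) / stirling(10)
```

Each printed column is the ratio of an estimate to the exact factorial, so values near 1 indicate a good approximation.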

Problem 3.8 Coincidence?

The FWHM approximation for the area under a Gaussian (Section 3.2.2) was

also accurate to 6%. Coincidence?

Problem 3.9 Exact constant in Stirling’s formula

Where does the more accurate constant factor of $\sqrt{2\pi}$ come from?

3.3 Estimating derivatives

In the preceding examples, lumping helped estimate integrals. Because integration and differentiation are closely related, lumping also provides a method for estimating derivatives. The method begins with a dimensional observation about derivatives. A derivative is a ratio of differentials; for example, $df/dx$ is the ratio of $df$ to $dx$. Because $d$ is dimensionless (Section 1.3.2), the dimensions of $df/dx$ are the dimensions of $f/x$. This useful, surprising conclusion is worth testing with a familiar example: Differentiating height $y$ with respect to time $t$ produces velocity $dy/dt$, whose dimensions of $LT^{-1}$ are indeed the dimensions of $y/t$.

Problem 3.10 Dimensions of a second derivative

What are the dimensions of $d^2 f/dx^2$?

3.3.1 Secant approximation

[Figure: the curve $x^2$ with its secant line and its tangent line at $x$.]

As $df/dx$ and $f/x$ have identical dimensions, perhaps their magnitudes are similar:

$$\frac{df}{dx} \sim \frac{f}{x}. \tag{3.13}$$

Geometrically, the derivative $df/dx$ is the slope of the tangent line, whereas the approximation $f/x$ is the slope of the secant line. By replacing the curve with the secant line, we make a lumping approximation.

Let’s test the approximation on an easy function such as $f(x) = x^2$. Good news: the secant and tangent slopes differ only by a factor of 2:

$$\frac{df}{dx} = 2x \quad\text{and}\quad \frac{f(x)}{x} = x. \tag{3.14}$$

Problem 3.11 Higher powers

Investigate the secant approximation for $f(x) = x^n$.

Problem 3.12 Second derivatives

Use the secant approximation to estimate $d^2 f/dx^2$ with $f(x) = x^2$. How does the approximation compare to the exact second derivative?

How accurate is the secant approximation for $f(x) = x^2 + 100$?

The secant approximation is quick and useful but can make large errors. When $f(x) = x^2 + 100$, for example, the secant and tangent at $x = 1$ have dramatically different slopes. The tangent slope $df/dx$ is 2, whereas the secant slope $f(1)/1$ is 101. The ratio of these two slopes, although dimensionless, is distressingly large.
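The two test functions can be compared side by side. In this sketch the helper names are illustrative, and a central difference with a small step stands in for the exact derivative.

```python
def origin_secant_slope(f, x):
    """Slope of the line from (0, 0) to (x, f(x)), that is, f(x)/x."""
    return f(x) / x

def tangent_slope(f, x, h=1e-6):
    """Central-difference stand-in for the exact derivative df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)

def parabola(x):
    return x**2

def shifted(x):
    return x**2 + 100

for name, f in (("x^2", parabola), ("x^2 + 100", shifted)):
    secant = origin_secant_slope(f, 1.0)
    tangent = tangent_slope(f, 1.0)
    print(f"{name}: secant {secant:.3f}, tangent {tangent:.3f}, ratio {secant / tangent:.2f}")
```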

Problem 3.13 Investigating the discrepancy

With $f(x) = x^2 + 100$, sketch the ratio

$$\frac{\text{secant slope}}{\text{tangent slope}} \tag{3.15}$$

as a function of x. The ratio is not constant! Why is the dimensionless factor not

constant? (That question is tricky.)

The large discrepancy in replacing the derivative $df/dx$, which is

$$\lim_{\Delta x \to 0} \frac{f(x) - f(x - \Delta x)}{\Delta x}, \tag{3.16}$$

with the secant slope $f(x)/x$ is due to two approximations. The first approximation is to take $\Delta x = x$ rather than $\Delta x = 0$. Then $df/dx \approx (f(x) - f(0))/x$. This first approximation produces the slope of the line from $(0, f(0))$ to $(x, f(x))$. The second approximation replaces $f(0)$ with 0, which produces $df/dx \approx f/x$; that ratio is the slope of the secant from $(0, 0)$ to $(x, f(x))$.

3.3.2 Improved secant approximation

[Figure: the curve $x^2 + C$ with its $x = 0$ secant and its tangent.]

The second approximation is fixed by starting the secant at $(0, f(0))$ instead of $(0, 0)$.

With that change, what are the secant and tangent slopes when $f(x) = x^2 + C$?

Call the secant starting at $(0, 0)$ the origin secant; call the new secant the $x = 0$ secant. Then the $x = 0$ secant always has one-half the slope of the tangent, no matter the constant $C$. The $x = 0$ secant approximation is robust against (is unaffected by) vertical translation.

How robust is the $x = 0$ secant approximation against horizontal translation?

To investigate how the $x = 0$ secant handles horizontal translation, translate $f(x) = x^2$ rightward by 100 to make $f(x) = (x - 100)^2$. At the parabola’s vertex $x = 100$, the $x = 0$ secant, from $(0, 10^4)$ to $(100, 0)$, has slope $-100$; however, the tangent has zero slope. Thus the $x = 0$ secant, although an improvement on the origin secant, is affected by horizontal translation.
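Both translation tests are easy to reproduce. The sketch below uses an illustrative helper, `x0_secant_slope`, and arbitrary sample values for the constant $C$ and the evaluation point.

```python
def x0_secant_slope(f, x):
    """Slope of the line from (0, f(0)) to (x, f(x))."""
    return (f(x) - f(0)) / x

# Vertical translation: f(x) = x^2 + C has tangent slope 2x for every C.
x = 3.0
ratios = [
    x0_secant_slope(lambda t, C=C: t**2 + C, x) / (2 * x)
    for C in (0, 100, -7.5)
]
print("secant/tangent for several C:", ratios)

# Horizontal translation: f(x) = (x - 100)^2, evaluated at its vertex.
vertex_secant = x0_secant_slope(lambda t: (t - 100) ** 2, 100.0)
vertex_tangent = 0.0   # exact derivative 2(x - 100) vanishes at x = 100
print("at the vertex: secant", vertex_secant, "tangent", vertex_tangent)
```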

3.3.3 Signiﬁcant-change approximation

The derivative itself is unaﬀected by horizontal and vertical translation,

so a derivative suitably approximated might be translation invariant. An

approximate derivative is

$$\frac{df}{dx} \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}, \tag{3.17}$$

where Δx is not zero but is still small.

How small should Δx be? Is Δx = 0.01 small enough?

The choice Δx = 0.01 has two defects. First, it cannot work when x has

dimensions. If x is a length, what length is small enough? Choosing Δx =

1 mm is probably small enough for computing derivatives related to the

solar system, but is probably too large for computing derivatives related

to falling fog droplets. Second, no ﬁxed choice can be scale invariant.

Although $\Delta x = 0.01$ produces accurate derivatives when $f(x) = \sin x$, it fails when $f(x) = \sin 1000x$, the result of simply rescaling $x$ to $1000x$.
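The failure of a fixed step is easy to see numerically. This sketch evaluates the forward difference at $x = 0$, an arbitrary choice of test point, where the exact slopes are 1 and 1000.

```python
import math

def forward_difference(f, x, dx):
    """Approximate df/dx by (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx

def fast_sine(u):
    return math.sin(1000 * u)

dx = 0.01
x = 0.0

slow = forward_difference(math.sin, x, dx)    # exact slope is 1
fast = forward_difference(fast_sine, x, dx)   # exact slope is 1000

# Shrinking the step by the same factor of 1000 restores the accuracy.
fast_rescaled = forward_difference(fast_sine, x, dx / 1000)

print(f"sin x with dx = 0.01:       {slow:.5f}")
print(f"sin 1000x with dx = 0.01:   {fast:.2f}")
print(f"sin 1000x with dx = 0.00001: {fast_rescaled:.2f}")
```

With the fixed step, the second estimate has the wrong magnitude and even the wrong sign, because the step spans more than one full oscillation of $\sin 1000x$.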

These problems suggest trying the following significant-change approximation:

$$\frac{df}{dx} \sim \frac{\text{significant } \Delta f \text{ (change in } f\text{) at } x}{\Delta x \text{ that produces a significant } \Delta f}. \tag{3.18}$$

Because the Δx here is deﬁned by the properties of the curve at the point

of interest, without favoring particular coordinate values or values of Δx,

the approximation is scale and translation invariant.

[Figure: the curve $\cos x$ from $(0, 1)$ to $(2\pi, 1)$, with the origin secant and the $x = 0$ secant.]

To illustrate this approximation, let’s try $f(x) = \cos x$ and estimate $df/dx$ at $x = 3\pi/2$ with the three approximations: the origin secant, the $x = 0$ secant, and the significant-change approximation. The origin secant goes from $(0, 0)$ to $(3\pi/2, 0)$, so it has zero slope. It is a poor approximation to the exact slope of 1. The $x = 0$ secant goes from $(0, 1)$ to $(3\pi/2, 0)$, so it has a slope of $-2/(3\pi)$, which is worse than predicting zero slope because even the sign is wrong!

[Figure: the curve $\cos x$ with the points $(3\pi/2, 0)$, $(5\pi/3, 1/2)$, and $(2\pi, 1)$ marked.]

The significant-change approximation might provide more accuracy. What is a significant change in $f(x) = \cos x$? Because the cosine changes by 2 (from $-1$ to 1), call 1/2 a significant change in $f(x)$. That change happens when $x$ changes from $3\pi/2$, where $f(x) = 0$, to $3\pi/2 + \pi/6$, where $f(x) = 1/2$. In other words, $\Delta x$ is $\pi/6$. The approximate derivative is therefore

$$\frac{df}{dx} \sim \frac{\text{significant } \Delta f \text{ near } x}{\Delta x} \sim \frac{1/2}{\pi/6} = \frac{3}{\pi}. \tag{3.19}$$

This estimate is approximately 0.955, amazingly close to the true derivative of 1.
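The three estimates for $\cos x$ at $x = 3\pi/2$ can be tabulated directly. The sketch below simply re-computes the numbers from the text; the variable names are illustrative, and the significant change of 1/2 is the choice made above.

```python
import math

f = math.cos
x = 3 * math.pi / 2

origin_secant = f(x) / x              # line from (0, 0) to (x, f(x))
x0_secant = (f(x) - f(0.0)) / x       # line from (0, f(0)) to (x, f(x))

delta_f = 0.5                         # the "significant" change chosen in the text
delta_x = math.pi / 6                 # cos rises from 0 to 1/2 over this interval
significant_change = delta_f / delta_x

exact = -math.sin(x)                  # the derivative of cos is -sin

print(f"origin secant:      {origin_secant:+.3f}")
print(f"x = 0 secant:       {x0_secant:+.3f}")
print(f"significant change: {significant_change:+.3f}")
print(f"exact:              {exact:+.3f}")
```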

Problem 3.14 Derivative of a quadratic

With $f(x) = x^2$, estimate $df/dx$ at $x = 5$ using three approximations: the origin secant, the $x = 0$ secant, and the significant-change approximation. Compare these estimates to the true slope.

Problem 3.15 Derivative of the logarithm

Use the significant-change approximation to estimate the derivative of $\ln x$ at $x = 10$. Compare the estimate to the true slope.

Problem 3.16 Lennard–Jones potential

The Lennard–Jones potential is a model of the interaction energy between two nonpolar molecules such as $\mathrm{N_2}$ or $\mathrm{CH_4}$. It has the form

$$V(r) = 4\epsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right], \tag{3.20}$$

where $r$ is the distance between the molecules, and $\epsilon$ and $\sigma$ are constants that depend on the molecules. Use the origin secant to estimate $r_0$, the separation $r$ at which $V(r)$ is a minimum. Compare the estimate to the true $r_0$ found using calculus.

Problem 3.17 Approximate maxima and minima

Let f(x) be an increasing function and g(x) a decreasing function. Use the origin

secant to show, approximately, that h(x) = f(x) + g(x) has a minimum where

f(x) = g(x). This useful rule of thumb, which generalizes Problem 3.16, is often

called the balancing heuristic.


3.4 Analyzing diﬀerential equations: The spring–mass system

Estimating derivatives reduces diﬀerentiation to division (Section 3.3); it

thereby reduces diﬀerential equations to algebraic equations.

[Figure: a block of mass $m$ attached to a spring of stiffness $k$ and displaced by $x_0$ from its equilibrium position.]

To produce an example equation to analyze, connect a block of mass $m$ to an ideal spring with spring constant (stiffness) $k$, pull the block a distance $x_0$ to the right relative to the equilibrium position $x = 0$, and release it at time $t = 0$. The block oscillates back and forth, its position $x$ described by the ideal-spring differential equation

$$m \frac{d^2 x}{dt^2} + kx = 0. \tag{3.21}$$

Let’s approximate the equation and thereby estimate the oscillation frequency.

3.4.1 Checking dimensions

Upon seeing any equation, ﬁrst check its dimensions (Chapter 1). If

all terms do not have identical dimensions, the equation is not worth

solving—a great savings of eﬀort. If the dimensions match, the check has

prompted reﬂection on the meaning of the terms; this reﬂection helps

prepare for solving the equation and for understanding any solution.

What are the dimensions of the two terms in the spring equation?

Look ﬁrst at the simple second term kx. It arises from Hooke’s law, which

says that an ideal spring exerts a force kx where x is the extension of the

spring relative to its equilibrium length. Thus the second term kx is a

force. Is the ﬁrst term also a force?

The first term $m(d^2 x/dt^2)$ contains the second derivative $d^2 x/dt^2$, which is familiar as an acceleration. Many differential equations, however, contain unfamiliar derivatives. The Navier–Stokes equations of fluid mechanics (Section 2.4),

$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla)\mathbf{v} = -\frac{1}{\rho} \nabla p + \nu \nabla^2 \mathbf{v}, \tag{3.22}$$

contain two strange derivatives: $(\mathbf{v} \cdot \nabla)\mathbf{v}$ and $\nabla^2 \mathbf{v}$. What are the dimensions of those terms?

To practice for later handling such complicated terms, let’s now find the dimensions of $d^2 x/dt^2$ by hand. Because $d^2 x/dt^2$ contains two exponents of 2, and $x$ is length and $t$ is time, $d^2 x/dt^2$ might plausibly have dimensions of $L^2 T^{-2}$.

Are $L^2 T^{-2}$ the correct dimensions?

To decide, use the idea from Section 1.3.2 that the differential symbol $d$ means “a little bit of.” The numerator $d^2 x$, meaning $d$ of $dx$, is “a little bit of a little bit of $x$.” Thus, it is a length. The denominator $dt^2$ could plausibly mean $(dt)^2$ or $d(t^2)$. [It turns out to mean $(dt)^2$.] In either case, its dimensions are $T^2$. Therefore, the dimensions of the second derivative are $LT^{-2}$:

$$\left[ \frac{d^2 x}{dt^2} \right] = LT^{-2}. \tag{3.23}$$

This combination is an acceleration, so the spring equation’s first term $m(d^2 x/dt^2)$ is mass times acceleration, giving it the same dimensions as the $kx$ term.
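The bookkeeping in this argument can be mechanized by storing dimensions as exponents of $M$, $L$, and $T$. The sketch below is one possible representation, assumed here purely for illustration; the helper names `multiply` and `power` are invented.

```python
def multiply(a, b):
    """Dimensions of a product: add the exponents of M, L, and T."""
    out = dict(a)
    for base, exponent in b.items():
        out[base] = out.get(base, 0) + exponent
    return {base: e for base, e in out.items() if e != 0}

def power(a, n):
    """Dimensions of a quantity raised to the n-th power."""
    return {base: e * n for base, e in a.items()}

MASS = {"M": 1}
LENGTH = {"L": 1}
TIME = {"T": 1}
FORCE = {"M": 1, "L": 1, "T": -2}

# d^2 x is "a little bit of a little bit of x": still a length.
# dt^2 means (dt)^2, so it has dimensions T^2.
d2x = LENGTH
dt2 = power(TIME, 2)
second_derivative = multiply(d2x, power(dt2, -1))

naive_guess = {"L": 2, "T": -2}   # from counting the exponents of 2
first_term = multiply(MASS, second_derivative)

print("d^2x/dt^2:", second_derivative)
print("m d^2x/dt^2:", first_term, "is a force:", first_term == FORCE)
```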

Problem 3.18 Dimensions of spring constant

What are the dimensions of the spring constant k?

3.4.2 Estimating the magnitudes of the terms

The spring equation passes the dimensions test, so it is worth analyzing

to ﬁnd the oscillation frequency. The method is to replace each term with

its approximate magnitude. These replacements will turn a complicated

diﬀerential equation into a simple algebraic equation for the frequency.

To approximate the first term $m(d^2 x/dt^2)$, use the significant-change approximation (Section 3.3.3) to estimate the magnitude of the acceleration $d^2 x/dt^2$:

$$\frac{d^2 x}{dt^2} \sim \frac{\text{significant } \Delta x}{(\Delta t \text{ that produces a significant } \Delta x)^2}. \tag{3.24}$$

Problem 3.19 Explaining the exponents

The numerator contains only the ﬁrst power of Δx, whereas the denominator

contains the second power of Δt. How can that discrepancy be correct?

To evaluate this approximate acceleration, first decide on a significant $\Delta x$, that is, on what constitutes a significant change in the mass’s position. The mass moves between the points $x = -x_0$ and $x = +x_0$, so a significant change in position should be a significant fraction of the peak-to-peak amplitude $2x_0$. The simplest choice is $\Delta x = x_0$.

Now estimate $\Delta t$: the time for the block to move a distance comparable to $\Delta x$. This time, called the characteristic time of the system, is related to the oscillation period $T$. During one period, the mass moves back and forth and travels a distance $4x_0$, much farther than $x_0$. If $\Delta t$ were, say, $T/4$ or $T/2\pi$, then in the time $\Delta t$ the mass would travel a distance comparable to $x_0$. Those choices for $\Delta t$ have a natural interpretation as being approximately $1/\omega$, where the angular frequency $\omega$ is connected to the period by the definition $\omega \equiv 2\pi/T$. With the preceding choices for $\Delta x$ and $\Delta t$, the $m(d^2 x/dt^2)$ term is roughly $m x_0 \omega^2$.

What does “is roughly” mean?

The phrase cannot mean that $m x_0 \omega^2$ and $m(d^2 x/dt^2)$ are within, say, a factor of 2, because $m(d^2 x/dt^2)$ varies and $m x_0 \omega^2$ is constant. Rather, “is roughly” means that a typical or characteristic magnitude of $m(d^2 x/dt^2)$ (for example, its root-mean-square value) is comparable to $m x_0 \omega^2$. Let’s include this meaning within the twiddle notation $\sim$. Then the typical-magnitude estimate can be written

$$m \frac{d^2 x}{dt^2} \sim m x_0 \omega^2. \tag{3.25}$$

With the same meaning of “is roughly”, namely that the typical magnitudes are comparable, the spring equation’s second term $kx$ is roughly $k x_0$. The two terms must add to zero, a consequence of the spring equation

$$m \frac{d^2 x}{dt^2} + kx = 0. \tag{3.26}$$

Therefore, the magnitudes of the two terms are comparable:

$$m x_0 \omega^2 \sim k x_0. \tag{3.27}$$

The amplitude $x_0$ divides out! With $x_0$ gone, the frequency $\omega$ and oscillation period $T = 2\pi/\omega$ are independent of amplitude. [This reasoning uses several approximations, but this conclusion is exact (Problem 3.20).] The approximated angular frequency $\omega$ is then $\sqrt{k/m}$.

For comparison, the exact solution of the spring differential equation is, from Problem 3.22,

$$x = x_0 \cos \omega t, \tag{3.28}$$

where $\omega$ is $\sqrt{k/m}$. The approximated angular frequency is also exact!
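Both conclusions, that $\omega = \sqrt{k/m}$ and that the period is independent of amplitude, can be checked by integrating the spring equation numerically. The sketch below uses a velocity Verlet integrator, which is an assumption of this example rather than anything from the text; the values $k = 4$ and $m = 1$ are arbitrary.

```python
import math

def spring_period(k, m, x0, dt=1e-4):
    """Integrate m x'' = -k x from rest at x0 > 0 with velocity Verlet.
    The block first reaches x = 0 after one quarter period."""
    x, v, t = x0, 0.0, 0.0
    a = -k * x / m
    while True:
        x_new = x + v * dt + 0.5 * a * dt * dt
        a_new = -k * x_new / m
        v += 0.5 * (a + a_new) * dt
        if x_new <= 0.0:
            # Linear interpolation for the crossing time within this step.
            quarter = t + dt * x / (x - x_new)
            return 4 * quarter
        x, a, t = x_new, a_new, t + dt

k, m = 4.0, 1.0                          # arbitrary illustrative values
predicted = 2 * math.pi * math.sqrt(m / k)

period_small = spring_period(k, m, x0=0.1)
period_large = spring_period(k, m, x0=10.0)

print(f"predicted period:  {predicted:.6f}")
print(f"amplitude 0.1:     {period_small:.6f}")
print(f"amplitude 10:      {period_large:.6f}")
```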

Problem 3.20 Amplitude independence

Use dimensional analysis to show that the angular frequency $\omega$ cannot depend on the amplitude $x_0$.

Problem 3.21 Checking dimensions in the alleged solution

What are the dimensions of $\omega t$? What are the dimensions of $\cos \omega t$? Check the dimensions of the proposed solution $x = x_0 \cos \omega t$, and the dimensions of the proposed period $2\pi\sqrt{m/k}$.

Problem 3.22 Veriﬁcation

Show that $x = x_0 \cos \omega t$ with $\omega = \sqrt{k/m}$ solves the spring differential equation

$$m \frac{d^2 x}{dt^2} + kx = 0. \tag{3.29}$$

3.4.3 Meaning of the Reynolds number

As a further example of lumping (in particular, of the significant-change approximation), let’s analyze the Navier–Stokes equations introduced in Section 2.4,

$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla)\mathbf{v} = -\frac{1}{\rho} \nabla p + \nu \nabla^2 \mathbf{v}, \tag{3.30}$$

and extract from them a physical meaning for the Reynolds number $rv/\nu$. To do so, we estimate the typical magnitude of the inertial term $(\mathbf{v} \cdot \nabla)\mathbf{v}$ and of the viscous term $\nu \nabla^2 \mathbf{v}$.

What is the typical magnitude of the inertial term?

The inertial term $(\mathbf{v} \cdot \nabla)\mathbf{v}$ contains the spatial derivative $\nabla \mathbf{v}$. According to the significant-change approximation (Section 3.3.3), the derivative $\nabla \mathbf{v}$ is roughly the ratio

$$\frac{\text{significant change in flow velocity}}{\text{distance over which flow velocity changes significantly}}. \tag{3.31}$$

The flow velocity (the velocity of the air) is nearly zero far from the cone and is comparable to $v$ near the cone (which is moving at speed $v$). Therefore, $v$, or a reasonable fraction of $v$, constitutes a significant change in flow velocity. This speed change happens over a distance comparable to the size of the cone: Several cone lengths away, the air hardly knows about the falling cone. Thus $\nabla \mathbf{v} \sim v/r$. The inertial term $(\mathbf{v} \cdot \nabla)\mathbf{v}$ contains a second factor of $v$, so $(\mathbf{v} \cdot \nabla)\mathbf{v}$ is roughly $v^2/r$.

What is the typical magnitude of the viscous term?

The viscous term $\nu \nabla^2 \mathbf{v}$ contains two spatial derivatives of $\mathbf{v}$. Because each spatial derivative contributes a factor of $1/r$ to the typical magnitude, $\nu \nabla^2 \mathbf{v}$ is roughly $\nu v/r^2$. The ratio of the inertial term to the viscous term is then roughly $(v^2/r)/(\nu v/r^2)$. This ratio simplifies to $rv/\nu$, the familiar, dimensionless Reynolds number.

Thus, the Reynolds number measures the importance of viscosity. When $\mathrm{Re} \gg 1$, the viscous term is small, and viscosity has a negligible effect. It cannot prevent nearby pieces of fluid from acquiring significantly different velocities, and the flow becomes turbulent. When $\mathrm{Re} \ll 1$, the viscous term is large, and viscosity is the dominant physical effect. The flow oozes, as when pouring cold honey.
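To get a feel for the two regimes, the lumped magnitudes can be evaluated for sample flows. In the sketch below every numerical value is a rough assumption chosen for illustration (a paper cone of a few centimeters falling at about 1 m/s in air, and a centimeter-scale stream of cold honey creeping at 1 cm/s); none of these figures comes from the text.

```python
def inertial_term(v, r):
    """Typical magnitude of (v . grad) v, roughly v^2 / r."""
    return v**2 / r

def viscous_term(v, r, nu):
    """Typical magnitude of nu grad^2 v, roughly nu v / r^2."""
    return nu * v / r**2

def reynolds(v, r, nu):
    """Reynolds number r v / nu."""
    return r * v / nu

# Assumed, order-of-magnitude inputs in SI units (m/s, m, m^2/s).
cone = {"v": 1.0, "r": 0.05, "nu": 1.5e-5}    # falling cone in air
honey = {"v": 0.01, "r": 0.01, "nu": 1e-2}    # oozing cold honey

for name, flow in (("cone in air", cone), ("cold honey", honey)):
    ratio = inertial_term(flow["v"], flow["r"]) / viscous_term(**flow)
    print(f"{name}: inertial/viscous ~ {ratio:.3g}, Re = {reynolds(**flow):.3g}")

re_cone = reynolds(**cone)
re_honey = reynolds(**honey)
```

The printed ratio and the Reynolds number agree by construction, which is the content of the simplification $(v^2/r)/(\nu v/r^2) = rv/\nu$.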

3.5 Predicting the period of a pendulum

Lumping not only turns integration into multiplication, it turns nonlinear into linear differential equations. Our example is the analysis of the period of a pendulum, for centuries the basis of Western timekeeping.

How does the period of a pendulum depend on its amplitude?

[Figure: a pendulum of length $l$ with a bob of mass $m$, displaced by an angle $\theta$.]

The amplitude $\theta_0$ is the maximum angle of the swing; for a lossless pendulum released from rest, it is also the angle of release. The effect of amplitude is contained in the solution to the pendulum differential equation (see [24] for the equation’s derivation):

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l} \sin\theta = 0. \tag{3.32}$$

The analysis will use all our tools: dimensions (Section 3.5.2), easy cases

(Section 3.5.1 and Section 3.5.3), and lumping (Section 3.5.4).


Problem 3.23 Angles

Explain why angles are dimensionless.

Problem 3.24 Checking and using dimensions

Does the pendulum equation have correct dimensions? Use dimensional analysis to show that the equation cannot contain the mass of the bob (except as a common factor that divides out).

3.5.1 Small amplitudes: Applying extreme cases

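[Figure: a unit circle showing the angle $\theta$, the arc of length $\theta$, and the vertical side of length $\sin\theta$.]

The pendulum equation is difficult because of its nonlinear factor $\sin\theta$. Fortunately, the factor is easy in the small-amplitude extreme case $\theta \to 0$. In that limit, the height of the triangle, which is $\sin\theta$, is almost exactly the arclength $\theta$. Therefore, for small angles, $\sin\theta \approx \theta$.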
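A quick table shows how small "small" needs to be. The sketch below computes the relative error of replacing $\sin\theta$ by $\theta$ for a few sample angles in radians; the comparison column $\theta^2/6$ is the leading term of the Taylor series of $\theta/\sin\theta - 1$.

```python
import math

angles = (0.01, 0.1, 0.5, 1.0)   # radians, arbitrary sample values

# Relative error of the approximation sin(theta) ~ theta.
errors = {theta: theta / math.sin(theta) - 1 for theta in angles}

for theta in angles:
    print(
        f"theta = {theta:4.2f} rad: relative error {errors[theta]:.2e}, "
        f"theta^2/6 = {theta**2 / 6:.2e}"
    )
```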

Problem 3.25 Chord approximation

The $\sin\theta \approx \theta$ approximation replaces the arc with a straight, vertical line. To make a more accurate approximation, replace the arc with the chord (a straight but nonvertical line). What is the resulting approximation for $\sin\theta$?

In the small-amplitude extreme, the pendulum equation becomes linear:

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l} \theta = 0. \tag{3.33}$$

Compare this equation to the spring–mass equation (Section 3.4)

$$\frac{d^2 x}{dt^2} + \frac{k}{m} x = 0. \tag{3.34}$$

The equations correspond with $x$ analogous to $\theta$ and $k/m$ analogous to $g/l$. The frequency of the spring–mass system is $\omega = \sqrt{k/m}$, and its period is $T = 2\pi/\omega = 2\pi\sqrt{m/k}$. For the pendulum equation, the corresponding period is

$$T = 2\pi\sqrt{\frac{l}{g}} \quad \text{(for small amplitudes).} \tag{3.35}$$

(This analysis is a preview of the method of analogy, which is the subject

of Chapter 6.)
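The small-amplitude period can be compared against a direct integration of the full nonlinear pendulum equation. The sketch below makes several assumptions for illustration: a velocity Verlet integrator, a 1 m pendulum, $g = 9.8\ \mathrm{m\,s^{-2}}$, and a release angle of 0.01 rad.

```python
import math

def pendulum_period(theta0, l=1.0, g=9.8, dt=1e-4):
    """Period of theta'' = -(g/l) sin(theta), released from rest at
    theta0 > 0. Integrates (velocity Verlet) until theta first reaches
    zero, which takes one quarter period."""
    theta, omega, t = theta0, 0.0, 0.0
    alpha = -(g / l) * math.sin(theta)
    while True:
        theta_new = theta + omega * dt + 0.5 * alpha * dt * dt
        alpha_new = -(g / l) * math.sin(theta_new)
        omega += 0.5 * (alpha + alpha_new) * dt
        if theta_new <= 0.0:
            # Linear interpolation for the crossing time within this step.
            crossing = t + dt * theta / (theta - theta_new)
            return 4 * crossing
        theta, alpha, t = theta_new, alpha_new, t + dt

small_angle_formula = 2 * math.pi * math.sqrt(1.0 / 9.8)
simulated = pendulum_period(0.01)

print(f"2 pi sqrt(l/g):             {small_angle_formula:.5f} s")
print(f"simulated, theta0 = 0.01:   {simulated:.5f} s")
```

Passing a larger `theta0` to `pendulum_period` is one way to explore numerically how the period behaves away from the small-amplitude extreme.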


Problem 3.26 Checking dimensions

Does the period $2\pi\sqrt{l/g}$ have correct dimensions?

Problem 3.27 Checking extreme cases

Does the period $T = 2\pi\sqrt{l/g}$ make sense in the extreme cases $g \to \infty$ and $g \to 0$?

Problem 3.28 Possible coincidence

Is it a coincidence that $g \approx \pi^2\ \mathrm{m\,s^{-2}}$? (For an extensive historical discussion that involves the pendulum, see [1] and more broadly also [4, 27, 42].)

Problem 3.29 Conical pendulum for the constant

[Figure: conical pendulum of length l with a bob of mass m, at angle θ.]

The dimensionless factor of 2π can be derived using an insight from Huygens [15, p. 79]: to analyze the motion of a pendulum moving in a horizontal circle (a conical pendulum). Projecting its two-dimensional motion onto a vertical screen produces one-dimensional pendulum motion, so the period of the two-dimensional motion is the same as the period of one-dimensional pendulum motion! Use that idea along with Newton's laws of motion to explain the 2π.

3.5.2 Arbitrary amplitudes: Applying dimensional analysis

The preceding results might change if the amplitude $\theta_0$ is no longer small.

As $\theta_0$ increases, does the period increase, remain constant, or decrease?

Any analysis becomes cleaner if expressed using dimensionless groups (Section 2.4.1). This problem involves the period $T$, length $l$, gravitational strength $g$, and amplitude $\theta_0$. Therefore, $T$ can belong to the dimensionless group $T/\sqrt{l/g}$. Because angles are dimensionless, $\theta_0$ is itself a dimensionless group. The two groups $T/\sqrt{l/g}$ and $\theta_0$ are independent and fully describe the problem (Problem 3.30).

[Figure: spring–mass system with spring constant k, mass m, and amplitude x₀.]

An instructive contrast is the ideal spring–mass system. The period $T$, spring constant $k$, and mass $m$ can form the dimensionless group $T/\sqrt{m/k}$; but the amplitude $x_0$, as the only quantity containing a length, cannot be part of any dimensionless group (Problem 3.20) and cannot therefore affect the period of the spring–mass system. In contrast, the pendulum's amplitude $\theta_0$ is already a dimensionless group, so it can affect the period of the system.

Problem 3.30 Choosing dimensionless groups

Check that period $T$, length $l$, gravitational strength $g$, and amplitude $\theta_0$ produce two independent dimensionless groups. In constructing useful groups for analyzing the period, why should $T$ appear in only one group? And why should $\theta_0$ not appear in the same group as $T$?

Two dimensionless groups produce the general dimensionless form

$$\text{one group} = \text{function of the other group}, \tag{3.36}$$

so

$$\frac{T}{\sqrt{l/g}} = \text{function of } \theta_0. \tag{3.37}$$

Because $T/\sqrt{l/g} = 2\pi$ when $\theta_0 = 0$ (the small-amplitude limit), factor out the $2\pi$ to simplify the subsequent equations, and define a dimensionless period $h$ as follows:

$$\frac{T}{\sqrt{l/g}} = 2\pi\,h(\theta_0). \tag{3.38}$$

The function $h$ contains all information about how amplitude affects the period of a pendulum. Using $h$, the original question about the period becomes the following: Is $h$ an increasing, constant, or decreasing function of amplitude? This question is answered in the following section.

3.5.3 Large amplitudes: Extreme cases again

For guessing the general behavior of h as a function of amplitude, useful

clues come from evaluating h at two amplitudes. One easy amplitude is

the extreme of zero amplitude, where h(0) = 1. A second easy amplitude

is the opposite extreme of large amplitudes.

How does the period behave at large amplitudes? As part of that question, what

is a large amplitude?

An interesting large amplitude is $\pi/2$, which means releasing the pendulum from horizontal. However, at $\pi/2$ the exact $h$ is the following awful expression (Problem 3.31):

$$h(\pi/2) = \frac{\sqrt{2}}{\pi}\int_0^{\pi/2} \frac{d\theta}{\sqrt{\cos\theta}}. \tag{3.39}$$

Is this integral less than, equal to, or more than 1? Who knows? The inte-

gral is likely to have no closed form and to require numerical evaluation

(Problem 3.32).

Problem 3.31 General expression for h

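Use conservation of energy to show that the period is

$$T(\theta_0) = 2\sqrt{2}\,\sqrt{\frac{l}{g}}\int_0^{\theta_0} \frac{d\theta}{\sqrt{\cos\theta - \cos\theta_0}}. \tag{3.40}$$

Confirm that the equivalent dimensionless statement is

$$h(\theta_0) = \frac{\sqrt{2}}{\pi}\int_0^{\theta_0} \frac{d\theta}{\sqrt{\cos\theta - \cos\theta_0}}. \tag{3.41}$$

For horizontal release, $\theta_0 = \pi/2$, and

$$h(\pi/2) = \frac{\sqrt{2}}{\pi}\int_0^{\pi/2} \frac{d\theta}{\sqrt{\cos\theta}}. \tag{3.42}$$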

Problem 3.32 Numerical evaluation for horizontal release

Why do the lumping recipes (Section 3.2) fail for the integrals in Problem 3.31?

Compute h(π/2) using numerical integration.
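One possible route to the numerical value, offered as a sketch rather than as the intended solution: the integrand in (3.41) blows up at θ = θ₀, so the code below first applies the change of variables sin(θ/2) = sin(θ₀/2) sin φ. That substitution is a standard one but is an assumption here, not something derived in the text. It turns (3.41) into h(θ₀) = (2/π) ∫₀^{π/2} dφ / √(1 − k² sin² φ) with k = sin(θ₀/2), which has a smooth integrand that a plain midpoint rule handles well. The function name `h_exact` and the step count are illustrative.

```python
import math

def h_exact(theta0, steps=20000):
    """Dimensionless pendulum period h(theta0) by midpoint-rule integration.

    Uses the substituted form (an assumption, see the lead-in):
        h = (2/pi) * integral_0^{pi/2} dphi / sqrt(1 - k^2 sin^2(phi)),
    with k = sin(theta0 / 2).
    """
    k = math.sin(theta0 / 2)
    dphi = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * dphi          # midpoint of the i-th slice
        total += dphi / math.sqrt(1 - (k * math.sin(phi)) ** 2)
    return 2 / math.pi * total

print(h_exact(0.0))             # zero-amplitude limit
print(h_exact(math.pi / 2))     # horizontal release
print(h_exact(math.pi - 0.1))   # nearly vertical release
```

The last value can be compared with the first entry of the table in Problem 3.34.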

Because $\theta_0 = \pi/2$ is not a helpful extreme, be even more extreme. Try $\theta_0 = \pi$, which means releasing the pendulum bob from vertical. If the bob is connected to the pivot point by a string, however, a vertical release would mean that the bob falls straight down instead of oscillating. This novel behavior is neither included in nor described by the pendulum differential equation.

[Figure: h(θ₀) sketched against θ₀, with the values 1 and π marked on the axes.]

Fortunately, a thought experiment is cheap to improve: Replace the string with a massless steel rod. Balanced perfectly at $\theta_0 = \pi$, the pendulum bob hangs upside down forever, so $T(\pi) = \infty$ and $h(\pi) = \infty$. Thus, $h(\pi) > 1$ and $h(0) = 1$. From these data, the most likely conjecture is that $h$ increases monotonically with amplitude. Although $h$ could first decrease and then increase, such twists and turns would be surprising behavior from such a clean differential equation. (For the behavior of $h$ near $\theta_0 = \pi$, see Problem 3.34.)


Problem 3.33 Small but nonzero amplitude

[Figure: two candidate curves for h versus θ₀, both starting at h = 1: curve A and curve B.]

As the amplitude approaches $\pi$, the dimensionless period $h$ diverges to infinity; at zero amplitude, $h = 1$. But what about the derivative of $h$? At zero amplitude ($\theta_0 = 0$), does $h(\theta_0)$ have zero slope (curve A) or positive slope (curve B)?

Problem 3.34 Nearly vertical release

Imagine releasing the pendulum from almost vertical: an initial angle $\pi - \beta$ with $\beta$ tiny. As a function of $\beta$, roughly how long does the pendulum take to rotate by a significant angle, say, by 1 rad? Use that information to predict how $h(\theta_0)$ behaves when $\theta_0 \approx \pi$. Check and refine your conjectures using the tabulated values. Then predict $h(\pi - 10^{-5})$.

β           h(π − β)
10^{-1}     2.791297
10^{-2}     4.255581
10^{-3}     5.721428
10^{-4}     7.187298

3.5.4 Moderate amplitudes: Applying lumping

The conjecture that h increases monotonically was derived using the ex-

tremes of zero and vertical amplitude, so it should apply at intermediate

amplitudes. Before taking that statement on faith, recall a proverb from

arms-control negotiations: “Trust, but verify.”

At moderate (small but nonzero) amplitudes, does the period, or its dimensionless

cousin h, increase with amplitude?

In the zero-amplitude extreme, sinθ is close to θ. That approximation

turned the nonlinear pendulum equation

$$\frac{d^2\theta}{dt^2} + \frac{g}{l}\sin\theta = 0 \tag{3.43}$$

into the linear, ideal-spring equation—in which the period is independent

of amplitude.

At nonzero amplitude, however, $\theta$ and $\sin\theta$ differ and their difference affects the period. To account for the difference and predict the period, split $\sin\theta$ into the tractable factor $\theta$ and an adjustment factor $f(\theta)$. The resulting equation is

$$\frac{d^2\theta}{dt^2} + \frac{g}{l}\,\theta\,\underbrace{\frac{\sin\theta}{\theta}}_{f(\theta)} = 0. \tag{3.44}$$

[Figure: f(θ) plotted for θ from 0 to θ₀, with vertical axis from 0 to 1.]

The nonconstant f(θ) encapsulates the nonlinearity of

the pendulum equation. When θ is tiny, f(θ) ≈ 1: The

pendulum behaves like a linear, ideal-spring system.

But when θ is large, f(θ) falls signiﬁcantly below 1,

making the ideal-spring approximation signiﬁcantly

inaccurate. As is often the case, a changing process is

diﬃcult to analyze—for example, see the awful integrals in Problem 3.31.

As a countermeasure, make a lumping approximation by replacing the

changing f(θ) with a constant.

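[Figure: f(θ) replaced by the constant f(0) for θ from 0 to θ₀, with vertical axis from 0 to 1.]

The simplest constant is $f(0)$. Then the pendulum differential equation becomes

$$\frac{d^2\theta}{dt^2} + \frac{g}{l}\,\theta = 0. \tag{3.45}$$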

This equation is, again, the ideal-spring equation.

In this approximation, period does not depend on amplitude, so h = 1 for

all amplitudes. For determining how the period of an unapproximated

pendulum depends on amplitude, the f(θ) → f(0) lumping approxima-

tion discards too much information.

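[Figure: f(θ) replaced by the constant f(θ₀) for θ from 0 to θ₀, with vertical axis from 0 to 1.]

Therefore, replace $f(\theta)$ with the other extreme $f(\theta_0)$. Then the pendulum equation becomes

$$\frac{d^2\theta}{dt^2} + \frac{g}{l}\,\theta f(\theta_0) = 0. \tag{3.46}$$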

Is this equation linear? What physical system does

it describe?

Because $f(\theta_0)$ is a constant, this equation is linear! It describes a zero-amplitude pendulum on a planet with gravity $g_{\text{eff}}$ that is slightly weaker than earth gravity, as shown by the following slight regrouping:

$$\frac{d^2\theta}{dt^2} + \frac{\overbrace{g f(\theta_0)}^{g_{\text{eff}}}}{l}\,\theta = 0. \tag{3.47}$$

Because the zero-amplitude pendulum has period $T = 2\pi\sqrt{l/g}$, the zero-amplitude, low-gravity pendulum has period

$$T(\theta_0) \approx 2\pi\sqrt{\frac{l}{g_{\text{eff}}}} = 2\pi\sqrt{\frac{l}{g f(\theta_0)}}. \tag{3.48}$$

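[Figure: h (dark curve) and the approximation f^{-1/2} plotted against θ₀, with 1 and π marked on the axes.]

Using the dimensionless period $h$ avoids writing the factors of $2\pi$, $l$, and $g$, and it yields the simple prediction

$$h(\theta_0) \approx f(\theta_0)^{-1/2} = \left(\frac{\sin\theta_0}{\theta_0}\right)^{-1/2}. \tag{3.49}$$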

At moderate amplitudes the approximation closely

follows the exact dimensionless period (dark curve). As a bonus, it also

predicts h(π) = ∞, so it agrees with the thought experiment of releasing

the pendulum from upright (Section 3.5.3).

How much larger than the period at zero amplitude is the period at 10° amplitude?

A 10° amplitude is roughly 0.17 rad, a moderate angle, so the approximate prediction for $h$ can itself accurately be approximated using a Taylor series. The Taylor series for $\sin\theta$ begins $\theta - \theta^3/6$, so

$$f(\theta_0) = \frac{\sin\theta_0}{\theta_0} \approx 1 - \frac{\theta_0^2}{6}. \tag{3.50}$$

Then $h(\theta_0)$, which is roughly $f(\theta_0)^{-1/2}$, becomes

$$h(\theta_0) \approx \left(1 - \frac{\theta_0^2}{6}\right)^{-1/2}. \tag{3.51}$$

Another Taylor series yields $(1 + x)^{-1/2} \approx 1 - x/2$ (for small $x$). Therefore, with $x = -\theta_0^2/6$,

$$h(\theta_0) \approx 1 + \frac{\theta_0^2}{12}. \tag{3.52}$$

Restoring the dimensioned quantities gives the period itself:

$$T \approx 2\pi\sqrt{\frac{l}{g}}\left(1 + \frac{\theta_0^2}{12}\right). \tag{3.53}$$

Compared to the period at zero amplitude, a 10° amplitude produces a fractional increase of roughly $\theta_0^2/12 \approx 0.0025$ or 0.25%. Even at moderate amplitudes, the period is nearly independent of amplitude!
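The arithmetic for the 10° case can be checked with a short sketch. It evaluates the lumping prediction (3.49) directly and its Taylor simplification (3.52); the function names are illustrative assumptions.

```python
import math

def h_lumped(theta0):
    """Lumping prediction (3.49): h ~ (sin(theta0) / theta0) ** (-1/2)."""
    return (math.sin(theta0) / theta0) ** -0.5

def h_taylor(theta0):
    """Taylor-simplified prediction (3.52): h ~ 1 + theta0**2 / 12."""
    return 1 + theta0 ** 2 / 12

theta0 = math.radians(10)        # 10 degrees, about 0.17 rad
print(theta0)
print(h_lumped(theta0) - 1)      # fractional increase from (3.49)
print(h_taylor(theta0) - 1)      # fractional increase from (3.52)
```

Both fractional increases come out near 0.0025, so at this amplitude little is lost by the extra Taylor step.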

Problem 3.35 Slope revisited

Use the preceding result for $h(\theta_0)$ to check your conclusion in Problem 3.33 about the slope of $h(\theta_0)$ at $\theta_0 = 0$.


Does our lumping approximation underestimate or overestimate the period?

The lumping approximation simplified the pendulum differential equation by replacing $f(\theta)$ with $f(\theta_0)$. Equivalently, it assumed that the mass always remained at the endpoints of the motion where $|\theta| = \theta_0$. Instead, the pendulum spends much of its time at intermediate positions where $|\theta| < \theta_0$ and $f(\theta) > f(\theta_0)$. Therefore, the average $f$ is greater than $f(\theta_0)$. Because $h$ is inversely related to $f$ ($h = f^{-1/2}$), the $f(\theta) \to f(\theta_0)$ lumping approximation overestimates $h$ and the period.

The $f(\theta) \to f(0)$ lumping approximation, which predicts $T = 2\pi\sqrt{l/g}$, underestimates the period. Therefore, the true coefficient of the $\theta_0^2$ term in the period approximation

$$T \approx 2\pi\sqrt{\frac{l}{g}}\left(1 + \frac{\theta_0^2}{12}\right) \tag{3.54}$$

lies between 0 and 1/12. A natural guess is that the coefficient lies halfway between these extremes, namely 1/24. However, the pendulum spends more time toward the extremes (where $f(\theta) = f(\theta_0)$) than it spends near the equilibrium position (where $f(\theta) = f(0)$). Therefore, the true coefficient is probably closer to 1/12, the prediction of the $f(\theta) \to f(\theta_0)$ approximation, than it is to 0. An improved guess might be two-thirds of the way from 0 to 1/12, namely 1/18.

In comparison, a full successive-approximation solution of the pendulum differential equation gives the following period [13, 33]:

$$T = 2\pi\sqrt{\frac{l}{g}}\left(1 + \frac{1}{16}\theta_0^2 + \frac{11}{3072}\theta_0^4 + \cdots\right). \tag{3.55}$$

Our educated guess of 1/18 is very close to the true coefficient of 1/16!
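The competing coefficients can also be compared numerically. The sketch below assumes the standard closed form h(θ₀) = 1/M(1, cos(θ₀/2)), where M is the arithmetic–geometric mean; that formula is brought in from outside this section and is not derived in the text. The helper names and the sample amplitude are illustrative.

```python
import math

def agm(a, g, iterations=30):
    """Arithmetic-geometric mean of a and g, using a fixed iteration count."""
    for _ in range(iterations):
        a, g = (a + g) / 2, math.sqrt(a * g)
    return a

def h_agm(theta0):
    """Dimensionless period, assuming h = 1 / agm(1, cos(theta0 / 2))."""
    return 1 / agm(1.0, math.cos(theta0 / 2))

theta0 = 0.05   # small amplitude in radians (illustrative choice)
coefficient = (h_agm(theta0) - 1) / theta0 ** 2

for name, value in [("lumping at theta0", 1 / 12),
                    ("educated guess", 1 / 18),
                    ("series (3.55)", 1 / 16),
                    ("numerical", coefficient)]:
    print(f"{name:18s} {value:.5f}")
```

At small amplitude the numerically extracted coefficient should sit next to 1/16, between the two lumping extremes of 0 and 1/12.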

3.6 Summary and further problems

Lumping turns calculus on its head. Whereas calculus analyzes a chang-

ing process by dividing it into ever ﬁner intervals, lumping simpliﬁes a

changing process by combining it into one unchanging process. It turns

curves into straight lines, diﬃcult integrals into multiplication, and mildly

nonlinear diﬀerential equations into linear diﬀerential equations.

. . . the crooked shall be made straight, and the rough places plain. (Isaiah 40:4)


Problem 3.36 FWHM for another decaying function

Use the FWHM heuristic to estimate

$$\int_{-\infty}^{\infty} \frac{dx}{1 + x^4}. \tag{3.56}$$

Then compare the estimate with the exact value of $\pi/\sqrt{2}$. For an enjoyable additional problem, derive the exact value.

Problem 3.37 Hypothetical pendulum equation

Suppose the pendulum equation had been

$$\frac{d^2\theta}{dt^2} + \frac{g}{l}\tan\theta = 0. \tag{3.57}$$

How would the period $T$ depend on amplitude $\theta_0$? In particular, as $\theta_0$ increases, would $T$ decrease, remain constant, or increase? What is the slope $dT/d\theta_0$ at zero amplitude? Compare your results with the results of Problem 3.33.

For small but nonzero $\theta_0$, find an approximate expression for the dimensionless period $h(\theta_0)$ and use it to check your previous conclusions.

Problem 3.38 Gaussian 1-sigma tail

The Gaussian probability density function with zero mean and unit variance is

$$p(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}. \tag{3.58}$$

The area of its tail is an important quantity in statistics, but it has no closed form. In this problem you estimate the area of the 1-sigma tail

$$\int_1^{\infty} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx. \tag{3.59}$$

a. Sketch the above Gaussian and shade the 1-sigma tail.

b. Use the 1/e lumping heuristic (Section 3.2.1) to estimate the area.

c. Use the FWHM heuristic to estimate the area.

d. Compare the two lumping estimates with the result of numerical integration:

$$\int_1^{\infty} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx = \frac{1 - \operatorname{erf}(1/\sqrt{2})}{2} \approx 0.159, \tag{3.60}$$

where erf(z) is the error function.
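The reference value in (3.60) can be reproduced in two ways, sketched below: once through the error function from Python's standard `math` module and once by brute-force midpoint integration. The function names, the truncation point of the tail, and the step count are illustrative assumptions.

```python
import math

def gaussian_tail(n):
    """Area of the n-sigma tail of the unit Gaussian via the error function."""
    return (1 - math.erf(n / math.sqrt(2))) / 2

def gaussian_tail_midpoint(n, upper=12.0, steps=100000):
    """Same area by a midpoint rule, truncating the infinite tail at `upper`."""
    dx = (upper - n) / steps
    total = 0.0
    for i in range(steps):
        x = n + (i + 0.5) * dx
        total += math.exp(-x * x / 2) * dx
    return total / math.sqrt(2 * math.pi)

print(gaussian_tail(1))
print(gaussian_tail_midpoint(1))
```

Either number gives a target against which to judge the lumping estimates from parts b and c.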

Problem 3.39 Distant Gaussian tails

For the canonical probability Gaussian, estimate the area of its n-sigma tail (for large n). In other words, estimate

$$\int_n^{\infty} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx. \tag{3.61}$$

4 Pictorial proofs

4.1 Adding odd numbers

4.2 Arithmetic and geometric means

4.3 Approximating the logarithm

4.4 Bisecting a triangle

4.5 Summing series

4.6 Summary and further problems

Have you ever worked through a proof, understood and conﬁrmed each

step, yet still not believed the theorem? You realize that the theorem is

true, but not why it is true.

To see the same contrast in a familiar example, imagine learning that your child has a fever and hearing the temperature in Fahrenheit or Celsius degrees, whichever is less familiar. In my everyday experience, temperatures are mostly in Fahrenheit. When I hear about a temperature of 40 °C, I therefore react in two stages:

1. I convert 40 °C to Fahrenheit: 40 × 1.8 + 32 = 104.

2. I react: “Wow, 104 °F. That’s dangerous! Get thee to a doctor!”

The Celsius temperature, although symbolically equivalent to the Fahren-

heit temperature, elicits no reaction. My danger sense activates only after

the temperature conversion connects the temperature to my experience.

A symbolic description, whether a proof or an unfamiliar temperature, is unconvincing compared to an argument that speaks to our perceptual system. The reason lies in how our brains acquired the capacity for symbolic reasoning. (See Evolving Brains [2] for an illustrated, scholarly history of the brain.) Symbolic, sequential reasoning requires language, which has evolved for only $10^5$ yr. Although $10^5$ yr spans many human lifetimes, it is an evolutionary eyeblink. In particular, it is short compared to the time span over which our perceptual hardware has evolved: For several hundred million years, organisms have refined their capacities for hearing, smelling, tasting, touching, and seeing.

Evolution has worked 1000 times longer on our perceptual abilities than

on our symbolic-reasoning abilities. Compared to our perceptual hard-

ware, our symbolic, sequential hardware is an ill-developed latecomer.

Not surprisingly, our perceptual abilities far surpass our symbolic abil-

ities. Even an apparently high-level symbolic activity such as playing

grandmaster chess uses mostly perceptual hardware [16]. Seeing an idea

conveys to us a depth of understanding that a symbolic description of it

cannot easily match.

Problem 4.1 Computers versus people

At tasks like expanding $(x + 2y)^{50}$, computers are much faster than people. At tasks like recognizing faces or smells, even young children are much faster than current computers. How do you explain these contrasts?

Problem 4.2 Linguistic evidence for the importance of perception

In your favorite language(s), think of the many sensory synonyms for under-

standing (for example, grasping).

4.1 Adding odd numbers

To illustrate the value of pictures, let's find the sum of the first n odd numbers (also the subject of Problem 2.25):

$$S_n = \underbrace{1 + 3 + 5 + \cdots + (2n - 1)}_{n\ \text{terms}}. \tag{4.1}$$

Easy cases such as n = 1, 2, or 3 lead to the conjecture that $S_n = n^2$. But how can the conjecture be proved? The standard symbolic method is proof by induction:

1. Verify that $S_n = n^2$ for the base case $n = 1$. In that case, $S_1$ is 1, as is $n^2$, so the base case is verified.

2. Make the induction hypothesis: Assume that $S_m = m^2$ for $m$ less than or equal to a maximum value $n$. For this proof, the following, weaker induction hypothesis is sufficient:

$$\sum_{k=1}^{n} (2k - 1) = n^2. \tag{4.2}$$

In other words, we assume the theorem only in the case that $m = n$.

3. Perform the induction step: Use the induction hypothesis to show that $S_{n+1} = (n + 1)^2$. The sum $S_{n+1}$ splits into two pieces:

$$S_{n+1} = \sum_{k=1}^{n+1} (2k - 1) = (2n + 1) + \sum_{k=1}^{n} (2k - 1). \tag{4.3}$$

Thanks to the induction hypothesis, the sum on the right is $n^2$. Thus

$$S_{n+1} = (2n + 1) + n^2, \tag{4.4}$$

which is $(n + 1)^2$; and the theorem is proved.

Although these steps prove the theorem, why the sum $S_n$ ends up as $n^2$ still feels elusive.
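The induction argument above can be mirrored by a brute-force check. The sketch below is only a sanity test over a finite range, not a proof; the function name and the range of n are illustrative choices.

```python
def sum_of_first_odds(n):
    """S_n = 1 + 3 + 5 + ... + (2n - 1), summed term by term."""
    return sum(2 * k - 1 for k in range(1, n + 1))

# Base case and conjecture: S_n == n**2 for every tested n.
for n in range(1, 1001):
    assert sum_of_first_odds(n) == n ** 2

# Induction step: adding the next odd number 2n + 1 to n**2 gives (n + 1)**2.
for n in range(1, 1001):
    assert n ** 2 + (2 * n + 1) == (n + 1) ** 2

print(sum_of_first_odds(3))   # 1 + 3 + 5
```

Like the symbolic proof, the check confirms that the conjecture holds without explaining why.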

That missing understanding—the kind of gestalt insight described by

Wertheimer [48]—requires a pictorial proof. Start by drawing each odd

number as an L-shaped puzzle piece:

[Figure (4.5): the odd numbers 1, 3, and 5 drawn as L-shaped puzzle pieces.]

How do these pieces ﬁt together?

Then compute $S_n$ by fitting together the puzzle pieces as follows:

[Figure (4.6): the pieces for S₂ = 1 + 3 and for S₃ = 1 + 3 + 5, each set fitted together into a square.]

Each successive odd number (each piece) extends the square by 1 unit in height and width, so the n terms build an n × n square. [Or is it an (n − 1) × (n − 1) square?] Therefore, their sum is $n^2$. After grasping this pictorial proof, you cannot forget why adding up the first n odd numbers produces $n^2$.


Problem 4.3 Triangular numbers

Draw a picture or pictures to show that

$$1 + 2 + 3 + \cdots + n + \cdots + 3 + 2 + 1 = n^2. \tag{4.7}$$

Then show that

$$1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2}. \tag{4.8}$$

Problem 4.4 Three dimensions

Draw a picture to show that

$$\sum_{k=0}^{n} (3k^2 + 3k + 1) = (n + 1)^3. \tag{4.9}$$

Give pictorial explanations for the 1 in the summand $3k^2 + 3k + 1$; for the 3 and the $k^2$ in $3k^2$; and for the 3 and the $k$ in $3k$.

4.2 Arithmetic and geometric means

The next pictorial proof starts with two nonnegative numbers (for example, 3 and 4) and compares the following two averages:

$$\text{arithmetic mean} \equiv \frac{3 + 4}{2} = 3.5; \tag{4.10}$$

$$\text{geometric mean} \equiv \sqrt{3 \times 4} \approx 3.464. \tag{4.11}$$

Try another pair of numbers, for example, 1 and 2. The arithmetic mean is 1.5; the geometric mean is $\sqrt{2} \approx 1.414$. For both pairs, the geometric mean is smaller than the arithmetic mean. This pattern is general; it is the famous arithmetic-mean–geometric-mean (AM–GM) inequality [18]:

$$\underbrace{\frac{a + b}{2}}_{\text{AM}} \geq \underbrace{\sqrt{ab}}_{\text{GM}}. \tag{4.12}$$

(The inequality requires that $a, b \geq 0$.)

Problem 4.5 More numerical examples

Test the AM–GM inequality using varied numerical examples. What do you

notice when a and b are close to each other? Can you formalize the pattern?

(See also Problem 4.16.)


4.2.1 Symbolic proof

The AM–GM inequality has a pictorial and a symbolic proof. The symbolic proof begins with $(a - b)^2$, a surprising choice because the inequality contains $a + b$ rather than $a - b$. The second odd choice is to form $(a - b)^2$. It is nonnegative, so $a^2 - 2ab + b^2 \geq 0$. Now magically decide to add $4ab$ to both sides. The result is

$$\underbrace{a^2 + 2ab + b^2}_{(a+b)^2} \geq 4ab. \tag{4.13}$$

The left side is $(a + b)^2$, so $a + b \geq 2\sqrt{ab}$ and

$$\frac{a + b}{2} \geq \sqrt{ab}. \tag{4.14}$$

Although each step is simple, the whole chain seems like magic and leaves the why mysterious. If the algebra had ended with $(a + b)/4 \geq \sqrt{ab}$, it would not look obviously wrong. In contrast, a convincing proof would leave us feeling that the inequality cannot help but be true.
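A numerical spot check complements the algebra. The sketch below samples random nonnegative pairs to test the AM–GM inequality and shows that the plausible-looking variant with (a + b)/4 fails for a simple pair. The helper names, the sampling range, and the seed are illustrative assumptions.

```python
import math
import random

def arithmetic_mean(a, b):
    return (a + b) / 2

def geometric_mean(a, b):
    return math.sqrt(a * b)

random.seed(0)   # fixed seed so the run is repeatable
pairs = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(1000)]

# AM >= GM for every sampled pair (tiny tolerance for floating-point rounding).
am_gm_holds = all(arithmetic_mean(a, b) >= geometric_mean(a, b) - 1e-12
                  for a, b in pairs)

# The bogus variant (a + b)/4 >= sqrt(ab) already fails for a = b = 1.
bogus_holds = (1 + 1) / 4 >= geometric_mean(1, 1)

print(am_gm_holds, bogus_holds)
```

A handful of numerical cases is enough to reject the wrong variant, even though the algebra alone does not make its wrongness obvious.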

4.2.2 Pictorial proof

This satisfaction is provided by a pictorial proof.

What is pictorial, or geometric, about the geometric mean?

[Figure: right triangle lying on its hypotenuse; the altitude x splits the hypotenuse into lengths a and b.]

A geometric picture for the geometric mean starts with a right triangle. Lay it with its hypotenuse horizontal; then cut it with the altitude $x$ into the light and dark subtriangles. The hypotenuse splits into two lengths $a$ and $b$, and the altitude $x$ is their geometric mean $\sqrt{ab}$.

Why is the altitude $x$ equal to $\sqrt{ab}$?

[Figure: the small, dark triangle rotated and laid on the large, light triangle, with sides b and x labeled.]

To show that $x = \sqrt{ab}$, compare the small, dark triangle to the large, light triangle by rotating the small triangle and laying it on the large triangle. The two triangles are similar! Therefore, their aspect ratios (the ratio of the short to the long side) are identical. In symbols, $x/a = b/x$: The altitude $x$ is therefore the geometric mean $\sqrt{ab}$.


The uncut right triangle represents the geometric-mean portion of the AM–GM inequality. The arithmetic mean $(a + b)/2$ also has a picture, as one-half of the hypotenuse. Thus, the inequality claims that

$$\frac{\text{hypotenuse}}{2} \geq \text{altitude}. \tag{4.15}$$

Alas, this claim is not pictorially obvious.

Can you ﬁnd an alternative geometric interpretation of the arithmetic mean that

makes the AM–GM inequality pictorially obvious?

[Figure: semicircle of radius (a + b)/2 circumscribed around the right triangle, whose altitude √(ab) splits the hypotenuse into a and b.]

The arithmetic mean is also the radius of a circle with diameter $a + b$. Therefore, circumscribe a semicircle around the triangle, matching the circle's diameter with the hypotenuse $a + b$ (Problem 4.7). The altitude cannot exceed the radius; therefore,

$$\frac{a + b}{2} \geq \sqrt{ab}. \tag{4.16}$$

Furthermore, the two sides are equal only when the altitude of the triangle is also a radius of the semicircle, namely when $a = b$. The picture therefore contains the inequality and its equality condition in one easy-to-grasp object. (An alternative pictorial proof of the AM–GM inequality is developed in Problem 4.33.)

Problem 4.6 Circumscribing a circle around a triangle

Here are a few examples showing a circle circumscribed around a triangle.

Draw a picture to show that the circle is uniquely determined by the triangle.

Problem 4.7 Finding the right semicircle

A triangle uniquely determines its circumscribing circle (Problem 4.6). However,

the circle’s diameter might not align with a side of the triangle. Can a semicir-

cle always be circumscribed around a right triangle while aligning the circle’s

diameter along the hypotenuse?


Problem 4.8 Geometric mean of three numbers

For three nonnegative numbers, the AM–GM inequality is

$$\frac{a + b + c}{3} \geq (abc)^{1/3}. \tag{4.17}$$

Why is this inequality, in contrast to its two-number cousin, unlikely to have a geometric proof? (If you find a proof, let me know.)

4.2.3 Applications

Arithmetic and geometric means have wide mathematical application.

The ﬁrst application is a problem more often solved with derivatives:

Fold a ﬁxed length of fence into a rectangle enclosing the largest garden.

What shape of rectangle maximizes the area?

[Figure: rectangular garden with sides a and b.]

The problem involves two quantities: a perimeter that is fixed and an area to maximize. If the perimeter is related to the arithmetic mean and the area to the geometric mean, then the AM–GM inequality might help maximize the area. The perimeter $P = 2(a + b)$ is four times the arithmetic mean, and the area $A = ab$ is the square of the geometric mean. Therefore, from the AM–GM inequality,

$$\underbrace{\frac{P}{4}}_{\text{AM}} \geq \underbrace{\sqrt{A}}_{\text{GM}} \tag{4.18}$$

with equality when $a = b$. The left side is fixed by the amount of fence. Thus the right side, which varies depending on $a$ and $b$, has a maximum of $P/4$ when $a = b$. The maximal-area rectangle is a square.
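The garden result can be checked by brute force: fix the perimeter, scan over one side length, and see which rectangle has the largest area. The fence length, grid spacing, and function name below are illustrative assumptions.

```python
import math

def rectangle_area(a, perimeter):
    """Area of a rectangle with one side a and fixed perimeter P = 2(a + b)."""
    b = perimeter / 2 - a
    return a * b

P = 40.0   # illustrative fence length (an assumption; any positive value works)
candidates = [i * 0.01 for i in range(1, 2000)]   # side lengths 0.01 .. 19.99
best_a = max(candidates, key=lambda a: rectangle_area(a, P))

print(best_a)                                  # close to P/4
print(math.sqrt(rectangle_area(best_a, P)))    # close to P/4, the AM-GM bound
```

The scan lands on a side length of about P/4, so both sides are equal and the best rectangle is the square, as the AM–GM argument predicts.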

Problem 4.9 Direct pictorial proof

The AM–GM reasoning for the maximal rectangular garden is indirect pictorial

reasoning. It is symbolic reasoning built upon the pictorial proof for the AM–

GM inequality. Can you draw a picture to show directly that the square is the

optimal shape?

Problem 4.10 Three-part product

Find the maximum value of $f(x) = x^2(1 - 2x)$ for $x \geq 0$, without using calculus. Sketch $f(x)$ to confirm your answer.


Problem 4.11 Unrestricted maximal area

If the garden need not be rectangular, what is the maximal-area shape?

Problem 4.12 Volume maximization

[Figure: unit square with four corner cutouts of side x, showing the base and a flap.]

Build an open-topped box as follows: Start with a unit square, cut out four identical corners, and fold in the flaps. The box has volume $V = x(1 - 2x)^2$, where $x$ is the side length of a corner cutout. What choice of $x$ maximizes the volume of the box?

Here is a plausible analysis modeled on the analysis of the rectangular garden. Set $a = x$, $b = 1 - 2x$, and $c = 1 - 2x$. Then $abc$ is the volume $V$, and $V^{1/3} = \sqrt[3]{abc}$ is the geometric mean (Problem 4.8). Because the geometric mean never exceeds the arithmetic mean and because the two means are equal when $a = b = c$, the maximum volume is attained when $x = 1 - 2x$. Therefore, choosing $x = 1/3$ should maximize the volume of the box.

Now show that this choice is wrong by graphing V(x) or setting dV/dx = 0;

explain what is wrong with the preceding reasoning; and make a correct version.

Problem 4.13 Trigonometric minimum

Find the minimum value of

$$\frac{9x^2\sin^2 x + 4}{x\sin x} \tag{4.19}$$

in the region $x \in (0, \pi)$.

Problem 4.14 Trigonometric maximum

In the region t ∈ [0, π/2], maximize sin2t or, equivalently, 2 sint cos t.

The second application of arithmetic and geometric means is a modern,

amazingly rapid method for computing π [5, 6]. Ancient methods for

computing π included calculating the perimeter of many-sided regular

polygons and provided a few decimal places of accuracy.

Recent computations have used Leibniz's arctangent series

$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \tag{4.20}$$

Imagine that you want to compute π to $10^9$ digits, perhaps to test the hardware of a new supercomputer or to study whether the digits of π are random (a theme in Carl Sagan's novel Contact [40]). Setting $x = 1$ in the Leibniz series produces π/4, but the series converges extremely slowly. Obtaining $10^9$ digits requires roughly $10^{10^9}$ terms, far more terms than atoms in the universe.


Fortunately, a surprising trigonometric identity due to John Machin (1686–1751)

$$\arctan 1 = 4\arctan\frac{1}{5} - \arctan\frac{1}{239} \tag{4.21}$$

accelerates the convergence by reducing $x$:

$$\frac{\pi}{4} = 4 \times \underbrace{\left(\frac{1}{5} - \frac{1}{3 \times 5^3} + \cdots\right)}_{\arctan(1/5)} - \underbrace{\left(\frac{1}{239} - \frac{1}{3 \times 239^3} + \cdots\right)}_{\arctan(1/239)}. \tag{4.22}$$

Even with the speedup, $10^9$-digit accuracy requires calculating roughly $10^9$ terms.
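The difference in convergence between x = 1 and Machin's identity shows up even in ordinary floating-point arithmetic. The sketch below sums the same number of terms of the arctangent series (4.20) both ways; the function names and the term counts are illustrative assumptions, and double precision limits the attainable accuracy to about 15 digits.

```python
import math

def arctan_series(x, terms):
    """Partial sum of arctan x = x - x**3/3 + x**5/5 - ... (series 4.20)."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

def pi_leibniz(terms):
    """pi from the arctangent series with x = 1."""
    return 4 * arctan_series(1.0, terms)

def pi_machin(terms):
    """pi from Machin's identity (4.21)."""
    return 4 * (4 * arctan_series(1 / 5, terms) - arctan_series(1 / 239, terms))

for terms in (5, 10, 15):
    print(terms,
          abs(pi_leibniz(terms) - math.pi),
          abs(pi_machin(terms) - math.pi))
```

With the same number of terms, the x = 1 series is still off in the second decimal place while Machin's version is already at the limit of double precision.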

In contrast, the modern Brent–Salamin algorithm [3, 41], which relies on arithmetic and geometric means, converges to π extremely rapidly. The algorithm is closely related to amazingly accurate methods for calculating the perimeter of an ellipse (Problem 4.15) and also for calculating mutual inductance [23]. The algorithm generates several sequences by starting with $a_0 = 1$ and $g_0 = 1/\sqrt{2}$; it then computes successive arithmetic means $a_n$, geometric means $g_n$, and the differences of their squares $d_n$.

$$a_{n+1} = \frac{a_n + g_n}{2}, \qquad g_{n+1} = \sqrt{a_n g_n}, \qquad d_n = a_n^2 - g_n^2. \tag{4.23}$$

The $a$ and $g$ sequences rapidly converge to a number $M(a_0, g_0)$ called the arithmetic–geometric mean of $a_0$ and $g_0$. Then $M(a_0, g_0)$ and the difference sequence $d$ determine π.

$$\pi = \frac{4M(a_0, g_0)^2}{1 - \sum_{j=1}^{\infty} 2^{j+1} d_j}. \tag{4.24}$$

The $d$ sequence approaches zero quadratically; in other words, $d_{n+1} \sim d_n^2$ (Problem 4.16). Therefore, each iteration in this computation of π doubles the digits of accuracy. A billion-digit calculation of π requires only about 30 iterations, far fewer than the $10^{10^9}$ terms using the arctangent series with $x = 1$ or even than the $10^9$ terms using Machin's speedup.
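The recurrences (4.23) and the formula (4.24) translate almost line for line into code. The sketch below uses ordinary double-precision floats, so it saturates at about 15 digits; a genuine billion-digit computation would need arbitrary-precision arithmetic, which is not attempted here. The function name, the use of the latest arithmetic mean as a stand-in for the limit M, and the truncation of the infinite sum are illustrative assumptions.

```python
import math

def brent_salamin_pi(iterations=4):
    """Approximate pi from (4.23) and (4.24), truncated after `iterations` steps."""
    a, g = 1.0, 1.0 / math.sqrt(2.0)      # a_0 and g_0
    weighted_sum = 0.0                    # running sum of 2**(j+1) * d_j
    for j in range(1, iterations + 1):
        a, g = (a + g) / 2.0, math.sqrt(a * g)    # a_j and g_j
        d = a * a - g * g                         # d_j
        weighted_sum += 2 ** (j + 1) * d
    # After a few iterations, a is an excellent approximation to M(a_0, g_0).
    return 4.0 * a * a / (1.0 - weighted_sum)

for n in range(1, 5):
    print(n, brent_salamin_pi(n), abs(brent_salamin_pi(n) - math.pi))
```

The printed errors shrink far faster than those of either arctangent approach, with the number of correct digits roughly doubling per iteration until floating-point precision runs out.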

Problem 4.15 Perimeter of an ellipse

To compute the perimeter of an ellipse with semimajor axis $a_0$ and semiminor axis $g_0$, compute the $a$, $g$, and $d$ sequences and the common limit $M(a_0, g_0)$ of the $a$ and $g$ sequences, as for the computation of π. Then the perimeter $P$ can be computed with the following formula:

$$P = \frac{A}{M(a_0, g_0)}\left(a_0^2 - B\sum_{j=0}^{\infty} 2^j d_j\right), \tag{4.25}$$

where $A$ and $B$ are constants for you to determine. Use the method of easy cases (Chapter 2) to determine their values. (See [3] to check your values and for a proof of the completed formula.)

Problem 4.16 Quadratic convergence

Start with $a_0 = 1$ and $g_0 = 1/\sqrt{2}$ (or any other positive pair) and follow several iterations of the AM–GM sequence

$$a_{n+1} = \frac{a_n + g_n}{2} \quad \text{and} \quad g_{n+1} = \sqrt{a_n g_n}. \tag{4.26}$$

Then generate $d_n = a_n^2 - g_n^2$ and $\log_{10} d_n$ to check that $d_{n+1} \sim d_n^2$ (quadratic convergence).

Problem 4.17 Rapidity of convergence

Pick a positive $x_0$; then generate a sequence by the iteration

$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right) \quad (n \geq 0). \tag{4.27}$$

To what and how rapidly does the sequence converge? What if $x_0 < 0$?

4.3 Approximating the logarithm

[Figure: unit circle with angle θ; the triangle's altitude is sin θ and the arc has length θ.]

A function is often approximated by its Taylor series

$$f(x) = f(0) + x\left.\frac{df}{dx}\right|_{x=0} + \frac{x^2}{2}\left.\frac{d^2f}{dx^2}\right|_{x=0} + \cdots, \tag{4.28}$$

which looks like an unintuitive sequence of symbols. Fortunately, pictures often explain the first and most important terms in a function approximation. For example, the one-term approximation $\sin\theta \approx \theta$, which replaces the altitude of the triangle by the arc of the circle, turns the nonlinear pendulum differential equation into a tractable, linear equation (Section 3.5).

Another Taylor-series illustration of the value of pictures comes from the series for the logarithm function:

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots. \tag{4.29}$$

Its first term, $x$, will lead to the wonderful approximation $(1 + x)^n \approx e^{nx}$ for small $x$ and arbitrary $n$ (Section 5.3.4). Its second term, $-x^2/2$, helps evaluate the accuracy of that approximation. These first two terms are the most useful terms, and they have pictorial explanations.

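[Figure: the curve 1/(1 + t) plotted against t; the shaded area under the curve from t = 0 to t = x is ln(1 + x).]

The starting picture is the integral representation

$$\ln(1 + x) = \int_0^x \frac{dt}{1 + t}. \tag{4.30}$$

What is the simplest approximation for the shaded area?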

[Figure: circumscribed rectangle of height 1 and width x drawn over the curve 1/(1 + t).]

As a first approximation, the shaded area is roughly the circumscribed rectangle, an example of lumping. The rectangle has area $x$:

$$\text{area} = \underbrace{\text{height}}_{1} \times \underbrace{\text{width}}_{x} = x. \tag{4.31}$$

This area reproduces the first term in the Taylor series. Because it uses a circumscribed rectangle, it slightly overestimates $\ln(1 + x)$.

[Figure: inscribed rectangle of width x drawn under the curve 1/(1 + t).]

The area can also be approximated by drawing an inscribed rectangle. Its width is again $x$, but its height is not 1 but rather $1/(1 + x)$, which is approximately $1 - x$ (Problem 4.18). Thus the inscribed rectangle has the approximate area $x(1 - x) = x - x^2$. This area slightly underestimates $\ln(1 + x)$.

Problem 4.18 Picture for approximating the reciprocal function

Confirm the approximation

$$\frac{1}{1 + x} \approx 1 - x \quad \text{(for small } x\text{)} \tag{4.32}$$

by trying $x = 0.1$ or $x = 0.2$. Then draw a picture to illustrate the equivalent approximation $(1 - x)(1 + x) \approx 1$.

We now have two approximations to ln(1 + x). The ﬁrst and slightly

simpler approximation came from drawing the circumscribed rectangle.

The second approximation came from drawing the inscribed rectangle.

Both dance around the exact value.

How can the inscribed- and circumscribed-rectangle approximations be combined

to make an improved approximation?


[Figure: the curve 1/(1+t) with a trapezoid spanning t = 0 to t = x.]

One approximation overestimates the area, and the other underestimates the area; their average ought to improve on either approximation. The average is a trapezoid with area

$$\frac{x + (x - x^2)}{2} = x - \frac{x^2}{2}. \tag{4.33}$$

This area reproduces the first two terms of the full Taylor series

$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots. \tag{4.34}$$
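The three pictures can be checked numerically. The following is a minimal sketch, not part of the original text; the function names are invented for illustration, and the inscribed rectangle uses the approximate height 1 − x, as in the discussion above.

```python
import math

def circumscribed(x):
    """Circumscribed rectangle: height 1, width x."""
    return x

def inscribed(x):
    """Inscribed rectangle with its height 1/(1+x) approximated as 1 - x."""
    return x * (1 - x)

def trapezoid(x):
    """Average of the two rectangles: x - x^2/2."""
    return (circumscribed(x) + inscribed(x)) / 2

for x in (0.05, 0.1, 0.2):
    exact = math.log1p(x)  # ln(1 + x)
    print(f"x={x}: ln(1+x)={exact:.5f}  over={circumscribed(x):.5f}  "
          f"under={inscribed(x):.5f}  trapezoid={trapezoid(x):.5f}")
```

For small x the two rectangles should bracket ln(1+x), with the trapezoid landing much closer than either.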

Problem 4.19 Cubic term

Estimate the cubic term in the Taylor series by estimating the diﬀerence between

the trapezoid and the true area.

For these logarithm approximations, the hardest problem is ln2.

$$\ln(1+1) \approx \begin{cases} 1 & \text{(one term)} \\ 1 - \dfrac{1}{2} & \text{(two terms).} \end{cases} \tag{4.35}$$

Both approximations differ significantly from the true value (roughly 0.693). Even moderate accuracy for ln 2 requires many terms of the Taylor series, far beyond what pictures explain (Problem 4.20). The problem is that x in ln(1+x) is 1, so the $x^n$ factor in each term of the Taylor series does not shrink the high-n terms.

The same problem happens when computing π using Leibniz's arctangent series (Section 4.2.3)

$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \tag{4.36}$$

By using x = 1, the direct approximation of π/4 requires many terms to attain even moderate accuracy. Fortunately, the trigonometric identity arctan 1 = 4 arctan(1/5) − arctan(1/239) lowers the largest x to 1/5 and thereby speeds the convergence.
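To see how much the rewriting helps, here is a small sketch that sums the arctangent series both ways. It is an illustration added to the text, and the helper names are assumptions rather than anything from the book.

```python
import math

def arctan_series(x, terms):
    """Partial sum of x - x^3/3 + x^5/5 - ... with the given number of terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

def pi_direct(terms):
    """pi from arctan(1) = pi/4, summed directly at x = 1."""
    return 4 * arctan_series(1.0, terms)

def pi_rewritten(terms):
    """pi from arctan 1 = 4 arctan(1/5) - arctan(1/239)."""
    return 4 * (4 * arctan_series(1 / 5, terms) - arctan_series(1 / 239, terms))

for terms in (1, 2, 3, 5):
    print(terms, pi_direct(terms), pi_rewritten(terms))
```

With only three terms of each series, the rewritten form should already agree with π to several decimal places, whereas the direct sum at x = 1 is still off in the first decimal place.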

Is there an analogous rewriting that helps estimate ln 2?

Because 2 is also (4/3)/(2/3), an analogous rewriting of ln 2 is

$$\ln 2 = \ln\frac{4}{3} - \ln\frac{2}{3}. \tag{4.37}$$


Each fraction has the form 1 + x with x = ±1/3. Because x is small, one term of the logarithm series might provide reasonable accuracy. Let's therefore use ln(1+x) ≈ x to approximate the two logarithms:

$$\ln 2 \approx \frac{1}{3} - \left(-\frac{1}{3}\right) = \frac{2}{3}. \tag{4.38}$$

This estimate is accurate to within 5%!
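A quick numerical comparison of the direct and rewritten one-term estimates, offered as a sketch (the variable names are illustrative only):

```python
import math

def one_term(x):
    """One-term logarithm series: ln(1 + x) ~ x."""
    return x

direct = one_term(1.0)                           # ln 2 = ln(1 + 1)
rewritten = one_term(1 / 3) - one_term(-1 / 3)   # ln(4/3) - ln(2/3)

exact = math.log(2)
for name, estimate in (("direct", direct), ("rewritten", rewritten)):
    error = abs(estimate - exact) / exact
    print(f"{name}: {estimate:.4f} (relative error {error:.1%})")
```

The same one-term series is used in both cases; only the size of x changes.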

The rewriting trick has helped to compute π (by rewriting the arctanx

series) and to estimate ln(1 + x) (by rewriting x itself). This idea there-

fore becomes a method—a trick that I use twice (this deﬁnition is often

attributed to Polya).

Problem 4.20 How many terms?

The full Taylor series for the logarithm is

$$\ln(1+x) = \sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{x^n}{n}. \tag{4.39}$$

If you set x = 1 in this series, how many terms are required to estimate ln2 to

within 5%?

Problem 4.21 Second rewriting

Repeat the rewriting method by rewriting 4/3 and 2/3; then estimate ln2 using

only one term of the logarithm series. How accurate is the revised estimate?

Problem 4.22 Two terms of the Taylor series

After rewriting ln 2 as ln(4/3) − ln(2/3), use the two-term approximation that $\ln(1+x) \approx x - x^2/2$ to estimate ln 2. Compare the approximation to the one-term estimate, namely 2/3. (Problem 4.24 investigates a pictorial explanation.)

Problem 4.23 Rational-function approximation for the logarithm

The replacement ln 2 = ln(4/3) − ln(2/3) has the general form

$$\ln(1+x) = \ln\frac{1+y}{1-y}, \tag{4.40}$$

where y = x/(2+x).

Use the expression for y and the one-term series ln(1+x) ≈ x to express ln(1+x)

as a rational function of x (as a ratio of polynomials in x). What are the ﬁrst few

terms of its Taylor series?

Compare those terms to the first few terms of the ln(1+x) Taylor series, and thereby explain why the rational-function approximation is more accurate than even the two-term series $\ln(1+x) \approx x - x^2/2$.


Problem 4.24 Pictorial interpretation of the rewriting

[Figure: the curve 1/(1+t) plotted against t; the shaded area under the curve from t = −1/3 to t = 1/3 is labeled ln 2.]

a. Use the integral representation of ln(1+x) to explain why the shaded area is ln 2.

b. Outline the region that represents

$$\ln\frac{4}{3} - \ln\frac{2}{3} \tag{4.41}$$

when using the circumscribed-rectangle approximation for each logarithm.

c. Outline the same region when using the trapezoid approximation $\ln(1+x) \approx x - x^2/2$. Show pictorially that this region, although a different shape, has the same area as the region that you drew in item b.

4.4 Bisecting a triangle

Pictorial solutions are especially likely for a geometric problem:

What is the shortest path that bisects an equilateral triangle into two regions of

equal area?

The possible bisecting paths form an uncountably inﬁnite set. To manage

the complexity, try easy cases (Chapter 2)—draw a few equilateral trian-

gles and bisect them with easy paths. Patterns, ideas, or even a solution

might emerge.

What are a few easy paths?

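[Figure: an equilateral triangle with side 1, bisected by its altitude of length $l = \sqrt{3}/2$.]

The simplest bisecting path is a vertical segment that splits the triangle into two right triangles each with base 1/2. This path is the triangle's altitude, and it has length

$$l = \sqrt{1^2 - (1/2)^2} = \frac{\sqrt{3}}{2} \approx 0.866. \tag{4.42}$$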

[Figure: the triangle bisected by a straight segment of length $l = 1/\sqrt{2}$.]

An alternative straight path splits the triangle into a trapezoid and a small triangle.

What is the shape of the smaller triangle, and how long is the path?

The triangle is similar to the original triangle, so it too is equilateral. Furthermore, it has one-half of the area of the original triangle, so its three sides, one of which is the bisecting path, are a factor of $\sqrt{2}$ smaller than the sides of the original triangle. Thus this path has length $1/\sqrt{2} \approx 0.707$, a substantial improvement on the vertical path with length $\sqrt{3}/2$.

Problem 4.25 All one-segment paths

An equilateral triangle has inﬁnitely many one-segment bisecting paths.

A few of them are shown in the ﬁgure. Which one-segment path is

the shortest?

[Figure: the triangle with a two-segment bisecting path of total length l = 1 that encloses a diamond.]

Now let's investigate easy two-segment paths. One possible path encloses a diamond and excludes two small triangles. The two small triangles occupy one-half of the entire area. Each small triangle therefore occupies one-fourth of the entire area and has side length 1/2. Because the bisecting path contains two of these sides, it has length 1. This path is, unfortunately, longer than our two one-segment candidates, whose lengths are $1/\sqrt{2}$ and $\sqrt{3}/2$.

Therefore, a reasonable conjecture is that the shortest path has the fewest

segments. This conjecture deserves to be tested (Problem 4.26).

Problem 4.26 All two-segment paths

Draw a figure showing the variety of two-segment paths. Find the shortest path, showing that it has length

$$l = 2 \times 3^{1/4} \times \sin 15^\circ \approx 0.681. \tag{4.43}$$

Problem 4.27 Bisecting with closed paths

The bisecting path need not begin or end at an edge of the triangle. Two examples

are illustrated here:

Do you expect closed bisecting paths to be longer or shorter than the shortest

one-segment path? Give a geometric reason for your conjecture, and check the

conjecture by ﬁnding the lengths of the two illustrative closed paths.

Does using fewer segments produce shorter paths?

The shortest one-segment path has an approximate length of 0.707; but the

shortest two-segment path has an approximate length of 0.681. The length

decrease suggests trying extreme paths: paths with an inﬁnite number of


segments. In other words, try curved paths. The easiest curved path is

probably a circle or a piece of a circle.

What is a likely candidate for the shortest circle or piece of a circle that bisects

the triangle?

Whether the path is a circle or piece of a circle, it needs a center.

However, putting the center inside the triangle and using a full

circle produces a long bisecting path (Problem 4.27). The only

other plausible center is a vertex of the triangle, so imagine a

bisecting arc centered on one vertex.

How long is this arc?

The arc subtends one-sixth (60°) of the full circle, so its length is l = πr/3, where r is the radius of the full circle. To find the radius, use the requirement that the arc must bisect the triangle. Therefore, the arc encloses one-half of the triangle's area. The condition on r is that $\pi r^2 = 3\sqrt{3}/4$:

$$\frac{1}{6} \times \underbrace{\text{area of the full circle}}_{\pi r^2} = \frac{1}{2} \times \underbrace{\text{area of the triangle}}_{\sqrt{3}/4}. \tag{4.44}$$

The radius is therefore $(3\sqrt{3}/4\pi)^{1/2}$; the length of the arc is πr/3, which is approximately 0.673. This curved path is shorter than the shortest two-segment path. It might be the shortest possible path.
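The candidate paths so far can be tabulated in a few lines. This is a sketch added for comparison, assuming a triangle with unit side; the two-segment length is simply the value quoted in Problem 4.26, not a derivation of it.

```python
import math

SIDE = 1.0
AREA = math.sqrt(3) / 4 * SIDE**2          # area of the unit equilateral triangle

altitude = math.sqrt(SIDE**2 - (SIDE / 2) ** 2)      # vertical one-segment path
parallel_cut = SIDE / math.sqrt(2)                   # side of the similar triangle with half the area
two_segment = 2 * 3 ** 0.25 * math.sin(math.radians(15))  # length quoted in Problem 4.26

# Arc centered on a vertex: one-sixth of a circle enclosing half the triangle.
radius = math.sqrt(6 * (AREA / 2) / math.pi)
arc = math.pi * radius / 3

paths = {"altitude": altitude, "parallel cut": parallel_cut,
         "diamond (two segments)": 1.0, "best two-segment": two_segment,
         "vertex arc": arc}
for name, length in sorted(paths.items(), key=lambda item: item[1]):
    print(f"{name:24s} {length:.3f}")
```

Sorting by length puts the vertex arc first, in line with the conjecture that it is the shortest bisecting path.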

To test this conjecture, we use symmetry. Because an equilateral triangle

is one-sixth of a hexagon, build a hexagon by replicating the bisected

equilateral triangle. Here is the hexagon built from the triangle bisected

by a horizontal line:

The six bisecting paths form an internal hexagon whose area is one-half

of the area of the large hexagon.

What happens when replicating the triangle bisected by the circular arc?


When that triangle is replicated, its six copies make a circle

with area equal to one-half of the area of the hexagon.

For a ﬁxed area, a circle has the shortest perimeter (the

isoperimetric theorem [30] and Problem 4.11); therefore,

one-sixth of the circle is the shortest bisecting path.

Problem 4.28 Replicating the vertical bisection

The triangle bisected by a vertical line, if replicated and only rotated, produces a

fragmented enclosed region rather than a convex polygon. How can the triangle

be replicated so that the six bisecting paths form a regular polygon?

Problem 4.29 Bisecting the cube

Of all surfaces that bisect a cube into two equal volumes, which surface has the

smallest area?

4.5 Summing series

For the ﬁnal example of what pictures can explain, return to the factorial

function. Our ﬁrst approximation to n! began with its integral represen-

tation and then used lumping (Section 3.2.3).

[Figure: the curve ln k plotted against k, with circumscribing rectangles of unit width and heights ln 2, ln 3, ln 4, and ln 5.]

Lumping, by replacing a curve with a rectangle whose area is easily computed, is already a pictorial analysis. A second picture for n! begins with the summation representation

$$\ln n! = \sum_{k=1}^{n} \ln k. \tag{4.45}$$

This sum equals the combined area of the circumscribing rectangles.

Problem 4.30 Drawing the smooth curve

Setting the height of the rectangles requires drawing the lnk curve—which

could intersect the top edge of each rectangle anywhere along the edge. In the

preceding ﬁgure and the analysis of this section, the curve intersects at the right

endpoint of the edge. After reading the section, redo the analysis for two other

cases:

a. The curve intersects at the left endpoint of the edge.

b. The curve intersects at the midpoint of the edge.


[Figure: the area under the ln k curve from k = 1 to k = n, labeled $\int_1^n \ln k\,dk$.]

That combined area is approximately the area under the ln k curve, so

$$\ln n! \approx \int_1^n \ln k\,dk = n\ln n - n + 1. \tag{4.46}$$

Each term in this ln n! approximation contributes one factor to n!:

$$n! \approx n^n \times e^{-n} \times e. \tag{4.47}$$

Each factor has a counterpart in a factor from Stirling's approximation (Section 3.2.3). In descending order of importance, the factors in Stirling's approximation are

$$n! \approx n^n \times e^{-n} \times \sqrt{n} \times \sqrt{2\pi}. \tag{4.48}$$

The integral approximation reproduces the two most important factors and almost reproduces the fourth factor: e and $\sqrt{2\pi}$ differ by only 8%. The only unexplained factor is $\sqrt{n}$.

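[Figure: the ln k curve from k = 1 to k = n with the circumscribing rectangles; fragments of the rectangles protrude above the curve.]

From where does the $\sqrt{n}$ factor come?

The $\sqrt{n}$ factor must come from the fragments above the ln k curve. They are almost triangles and would be easier to add if they were triangles. Therefore, redraw the ln k curve using straight-line segments (another use of lumping).

[Figures: the ln k curve from k = 1 to k = n redrawn with straight-line segments.]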

The resulting triangles would be easier to add if

they were rectangles. Therefore, let’s double each

triangle to make it a rectangle.

What is the sum of these rectangular pieces?

To sum these pieces, lay your right hand along the

k = n vertical line. With your left hand, shove the

pieces to the right until they hit your right hand.

The pieces then stack to form the lnn rectangle.

Because each piece is double the corresponding

triangular protrusion, the triangular protrusions

sum to (lnn)/2. This triangle correction improves the integral approxi-

mation. The resulting approximation for lnn! now has one more term:

$$\ln n! \approx \underbrace{n\ln n - n + 1}_{\text{integral}} + \underbrace{\frac{\ln n}{2}}_{\text{triangles}}. \tag{4.49}$$

Upon exponentiating to get n!, the correction contributes a factor of $\sqrt{n}$.

$$n! \approx n^n \times e^{-n} \times e \times \sqrt{n}. \tag{4.50}$$

Compared to Stirling's approximation, the only remaining difference is the factor of e that should be $\sqrt{2\pi}$, an error of only 8%, all from doing one integral and drawing a few pictures.
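The successive approximations to n! can be compared against the exact factorial. The sketch below is an added illustration with invented function names; it evaluates the integral-only estimate, the estimate with the triangle correction, and Stirling's approximation.

```python
import math

def integral_only(n):
    """n! ~ n^n e^(-n) e, from ln n! ~ n ln n - n + 1."""
    return n**n * math.exp(-n) * math.e

def with_triangles(n):
    """Adds the triangle correction (ln n)/2, i.e. a factor of sqrt(n)."""
    return integral_only(n) * math.sqrt(n)

def stirling(n):
    """Stirling's approximation: n^n e^(-n) sqrt(n) sqrt(2 pi)."""
    return n**n * math.exp(-n) * math.sqrt(n) * math.sqrt(2 * math.pi)

for n in (5, 10, 20):
    exact = math.factorial(n)
    print(n, integral_only(n) / exact, with_triangles(n) / exact, stirling(n) / exact)
```

Each printed ratio is an estimate divided by the exact n!, so values near 1 indicate a good approximation. The triangle-corrected estimate and Stirling's formula differ by the fixed factor $e/\sqrt{2\pi}$.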

Problem 4.31 Underestimate or overestimate?

Does the integral approximation with the triangle correction underestimate or

overestimate n!? Use pictorial reasoning; then check the conclusion numerically.

Problem 4.32 Next correction

The triangle correction is the first of an infinite series of corrections. The corrections include terms proportional to $n^{-2}$, $n^{-3}$, ..., and they are difficult to derive using only pictures. But the $n^{-1}$ correction can be derived with pictures.

a. Draw the regions showing the error made by replacing the smooth ln k curve with a piecewise-linear curve (a curve made of straight segments).

b. Each region is bounded above by a curve that is almost a parabola, whose area is given by Archimedes' formula (Problem 4.34)

$$\text{area} = \frac{2}{3} \times \text{area of the circumscribing rectangle}. \tag{4.51}$$

Use that property to approximate the area of each region.

c. Show that when evaluating $\ln n! = \sum_{k=1}^{n} \ln k$, these regions sum to approximately $(1 - n^{-1})/12$.

d. What is the resulting, improved constant term (formerly e) in the approximation to n! and how close is it to $\sqrt{2\pi}$? What factor does the $n^{-1}$ term in the ln n! approximation contribute to the n! approximation?

These and subsequent corrections are derived in Section 6.3.2 using the technique

of analogy.

4.6 Summary and further problems

For tens of millions of years, evolution has reﬁned our perceptual abilities.

A small child recognizes patterns more reliably and quickly than does


the largest supercomputer. Pictorial reasoning, therefore, taps the mind’s

vast computational power. It makes us more intelligent by helping us

understand and see large ideas at a glance.

For extensive and enjoyable collections of picture proofs, see the works of

Nelsen [31, 32]. Here are further problems to develop pictorial reasoning.

Problem 4.33 Another picture for the AM–GM inequality

Sketch y = lnx to show that the arithmetic mean of a and b is always greater

than or equal to their geometric mean, with equality when a = b.

Problem 4.34 Archimedes’ formula for the area of a parabola

Archimedes showed (long before calculus!) that the closed parabola

encloses two-thirds of its circumscribing rectangle. Prove this result

by integration.

Show that the closed parabola also encloses two-thirds of the circum-

scribing parallelogram with vertical sides. These pictorial recipes are

useful when approximating functions (for example, in Problem 4.32).

Problem 4.35 Ancient picture for the area of a circle

The ancient Greeks knew that the circumference of a circle with radius r was 2πr. They then used the following picture to show that its area is $\pi r^2$. Can you reconstruct the argument?

[Figure: picture equation for the area of the circle.]

Problem 4.36 Volume of a sphere

Extend the argument of Problem 4.35 to find the volume of a sphere of radius r, given that its surface area is $4\pi r^2$. Illustrate the argument with a sketch.

Problem 4.37 A famous sum

Use pictorial reasoning to approximate the famous Basel sum $\sum_{n=1}^{\infty} n^{-2}$.

Problem 4.38 Newton–Raphson method

In general, solving f(t) = 0 requires approximations. One method is to start with a guess $t_0$ and to improve it iteratively using the Newton–Raphson method

$$t_{n+1} = t_n - \frac{f(t_n)}{f'(t_n)}, \tag{4.52}$$

where $f'(t_n)$ is the derivative df/dt evaluated at $t = t_n$. Draw a picture to justify this recipe; then use the recipe to estimate $\sqrt{2}$. (Then try Problem 4.17.)

5 Taking out the big part

5.1 Multiplication using one and few

5.2 Fractional changes and low-entropy expressions

5.3 Fractional changes with general exponents

5.4 Successive approximation: How deep is the well?

5.5 Daunting trigonometric integral

5.6 Summary and further problems

In almost every quantitative problem, the analysis simpliﬁes when you

follow the proverbial advice of doing ﬁrst things ﬁrst. First approximate

and understand the most important eﬀect—the big part—then reﬁne your

analysis and understanding. This procedure of successive approximation

or “taking out the big part” generates meaningful, memorable, and usable

expressions. The following examples introduce the related idea of low-

entropy expressions (Section 5.2) and analyze mental multiplication (Sec-

tion 5.1), exponentiation (Section 5.3), quadratic equations (Section 5.4),

and a diﬃcult trigonometric integral (Section 5.5).

5.1 Multiplication using one and few

The ﬁrst illustration is a method of mental multiplication suited to rough,

back-of-the-envelope estimates. The particular calculation is the storage

capacity of a data CD-ROM. A data CD-ROM has the same format and

storage capacity as a music CD, whose capacity can be estimated as the

product of three factors:

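$$\underbrace{1\,\text{hr} \times \frac{3600\,\text{s}}{1\,\text{hr}}}_{\text{playing time}} \times \underbrace{\frac{4.4 \times 10^4\,\text{samples}}{1\,\text{s}}}_{\text{sample rate}} \times \underbrace{2\,\text{channels} \times \frac{16\,\text{bits}}{1\,\text{sample}}}_{\text{sample size}}. \tag{5.1}$$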


(In the sample-size factor, the two channels are for stereophonic sound.)

Problem 5.1 Sample rate

Look up the Shannon–Nyquist sampling theorem [22], and explain why the

sample rate (the rate at which the sound pressure is measured) is roughly 40 kHz.

Problem 5.2 Bits per sample

Because $2^{16} \sim 10^5$, a 16-bit sample (as chosen for the CD format) requires electronics accurate to roughly 0.001%. Why didn't the designers of the CD format choose a much larger sample size, say 32 bits (per channel)?

Problem 5.3 Checking units

Check that all the units in the estimate divide out—except for the desired units

of bits.

Back-of-the-envelope calculations use rough estimates such as the playing

time and neglect important factors such as the bits devoted to error detec-

tion and correction. In this and many other estimates, multiplication with

3 decimal places of accuracy would be overkill. An approximate analysis

needs an approximate method of calculation.

What is the data capacity to within a factor of 2?

The units (the biggest part!) are bits (Problem 5.3), and the three numerical factors contribute $3600 \times 4.4 \times 10^4 \times 32$. To estimate the product, split it into a big part and a correction.

The big part: The most important factor in a back-of-the-envelope product usually comes from the powers of 10, so evaluate this big part first: 3600 contributes three powers of 10, $4.4 \times 10^4$ contributes four, and 32 contributes one. The eight powers of 10 produce a factor of $10^8$.

The correction: After taking out the big part, the remaining part is a correction factor of 3.6 × 4.4 × 3.2. This product too is simplified by taking out its big part. Round each factor to the closest number among three choices: 1, few, or 10. The invented number few lies midway between 1 and 10: It is the geometric mean of 1 and 10, so $(\text{few})^2 = 10$ and few ≈ 3. In the product 3.6 × 4.4 × 3.2, each factor rounds to few, so $3.6 \times 4.4 \times 3.2 \approx (\text{few})^3$ or roughly 30.

The units, the powers of 10, and the correction factor combine to give

$$\text{capacity} \sim 10^8 \times 30\ \text{bits} = 3 \times 10^9\ \text{bits}. \tag{5.2}$$

This estimate is within a factor of 2 of the exact product (Problem 5.4), which is itself close to the actual capacity of $5.6 \times 10^9$ bits.
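The one-or-few rounding can be mimicked in code by rounding each factor to the nearest half power of 10 on a logarithmic scale. The following is a sketch under that assumption; the function name `one_or_few` is invented here, and it uses few = $\sqrt{10}$ exactly rather than the mental shortcut few ≈ 3.

```python
import math

def one_or_few(*factors):
    """Estimate a product by rounding each positive factor to a whole or half power of 10.

    A half power of 10 is 'few' (sqrt(10), about 3) times a power of 10.
    """
    half_powers = sum(round(2 * math.log10(f)) for f in factors)
    return 10 ** (half_powers / 2)

exact = 3600 * 4.4e4 * 32
estimate = one_or_few(3600, 4.4e4, 32)
print(f"estimate {estimate:.2g}, exact {exact:.3g}, ratio {exact / estimate:.2f}")
```

Working with logarithms makes "closest among 1, few, or 10" a plain rounding operation, which is why the geometric mean is the natural midpoint.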

Problem 5.4 Underestimate or overestimate?

Does $3 \times 10^9$ overestimate or underestimate $3600 \times 4.4 \times 10^4 \times 32$? Check your reasoning by computing the exact product.

Problem 5.5 More practice

Use the one-or-few method of multiplication to perform the following calcula-

tions mentally; then compare the approximate and actual products.

a. 161 × 294 × 280 × 438. The actual product is roughly $5.8 \times 10^9$.

b. Earth's surface area $A = 4\pi R^2$, where the radius is $R \sim 6 \times 10^6$ m. The actual surface area is roughly $5.1 \times 10^{14}\ \text{m}^2$.

5.2 Fractional changes and low-entropy expressions

Using the one-or-few method for mental multiplication is fast. For example, 3.15 × 7.21 quickly becomes $\text{few} \times 10^1 \sim 30$, which is within 50% of the exact product 22.7115. To get a more accurate estimate, round 3.15 to 3 and 7.21 to 7. Their product 21 is in error by only 8%. To reduce the error further, one could split 3.15 × 7.21 into a big part and an additive correction. This decomposition produces

$$(3 + 0.15)(7 + 0.21) = \underbrace{3 \times 7}_{\text{big part}} + \underbrace{0.15 \times 7 + 3 \times 0.21 + 0.15 \times 0.21}_{\text{additive correction}}. \tag{5.3}$$

The approach is sound, but the literal application of taking out the big

part produces a messy correction that is hard to remember and under-

stand. Slightly modiﬁed, however, taking out the big part provides a

clean and intuitive correction. As gravy, developing the improved cor-

rection introduces two important street-ﬁghting ideas: fractional changes

(Section 5.2.1) and low-entropy expressions (Section 5.2.2). The improved

correction will then, as a ﬁrst of many uses, help us estimate the energy

saved by highway speed limits (Section 5.2.3).

5.2.1 Fractional changes

The hygienic alternative to an additive correction is to split the product

into a big part and a multiplicative correction:

$$3.15 \times 7.21 = \underbrace{3 \times 7}_{\text{big part}} \times \underbrace{(1 + 0.05) \times (1 + 0.03)}_{\text{correction factor}}. \tag{5.4}$$

Can you ﬁnd a picture for the correction factor?

[Figure: a rectangle with width 1 + 0.05 and height 1 + 0.03, divided into subrectangles of area 1, 0.05, and 0.03, plus a corner piece of area ≈ 0.]

The correction factor is the area of a rectangle with width 1 + 0.05 and height 1 + 0.03. The rectangle contains one subrectangle for each term in the expansion of (1 + 0.05) × (1 + 0.03). Their combined area of roughly 1 + 0.05 + 0.03 represents an 8% fractional increase over the big part. The big part is 21, and 8% of it is 1.68, so 3.15 × 7.21 ≈ 22.68, which is within 0.14% of the exact product.
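The same calculation as a short sketch (the function name is hypothetical and chosen only for this illustration): take out the big part, add the two fractional changes, and ignore the tiny corner rectangle.

```python
def product_with_fractional_changes(x, dx, y, dy):
    """Approximate (x + dx)(y + dy) as x*y*(1 + dx/x + dy/y).

    The quadratic term (dx/x)*(dy/y) is dropped, as in the rectangle picture.
    """
    big_part = x * y
    fractional_change = dx / x + dy / y
    return big_part * (1 + fractional_change)

exact = 3.15 * 7.21
estimate = product_with_fractional_changes(3, 0.15, 7, 0.21)
print(f"estimate {estimate:.2f}, exact {exact:.4f}, "
      f"relative error {abs(estimate - exact) / exact:.2%}")
```

The neglected corner has relative size 0.05 × 0.03, which is the whole source of the remaining error.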

Problem 5.6 Picture for the fractional error

What is the pictorial explanation for the fractional error of roughly 0.15%?

Problem 5.7 Try it yourself

Estimate 245×42 by rounding each factor to a nearby multiple of 10, and compare

this big part with the exact product. Then draw a rectangle for the correction

factor, estimate its area, and correct the big part.

5.2.2 Low-entropy expressions

The correction to 3.15 × 7.21 was complicated as an absolute or additive

change but simple as a fractional change. This contrast is general. Using

the additive correction, a two-factor product becomes

$$(x + \Delta x)(y + \Delta y) = xy + \underbrace{x\,\Delta y + y\,\Delta x + \Delta x\,\Delta y}_{\text{additive correction}}. \tag{5.5}$$

Problem 5.8 Rectangle picture

Draw a rectangle representing the expansion

$$(x + \Delta x)(y + \Delta y) = xy + x\,\Delta y + y\,\Delta x + \Delta x\,\Delta y. \tag{5.6}$$

When the absolute changes Δx and Δy are small (x ≫ Δx and y ≫ Δy), the correction simplifies to xΔy + yΔx, but even so it is hard to remember because it has many plausible but incorrect alternatives. For example, it could plausibly contain terms such as ΔxΔy, xΔx, or yΔy. The extent of the plausible alternatives measures the gap between our intuition and reality; the larger the gap, the harder the correct result must work to fill it, and the harder we must work to remember the correct result.

Such gaps are the subject of statistical mechanics and information theory

[20, 21], which deﬁne the gap as the logarithm of the number of plausible

alternatives and call the logarithmic quantity the entropy. The logarithm

does not alter the essential point that expressions diﬀer in the number of

plausible alternatives and that high-entropy expressions [28]—ones with

many plausible alternatives—are hard to remember and understand.

In contrast, a low-entropy expression allows few plausible alternatives,

and elicits, “Yes! How could it be otherwise?!” Much mathematical and

scientiﬁc progress consists of ﬁnding ways of thinking that turn high-

entropy expressions into easy-to-understand, low-entropy expressions.

What is a low-entropy expression for the correction to the product xy?

A multiplicative correction, being dimensionless, automatically has lower

entropy than the additive correction: The set of plausible dimensionless

expressions is much smaller than the full set of plausible expressions.

The multiplicative correction is (x + Δx)(y + Δy)/xy. As written, this ratio contains gratuitous entropy. It constructs two dimensioned sums x + Δx and y + Δy, multiplies them, and finally divides the product by xy. Although the result is dimensionless, it becomes so only in the last step. A cleaner method is to group related factors by making dimensionless quantities right away:

$$\frac{(x + \Delta x)(y + \Delta y)}{xy} = \frac{x + \Delta x}{x}\,\frac{y + \Delta y}{y} = \left(1 + \frac{\Delta x}{x}\right)\left(1 + \frac{\Delta y}{y}\right). \tag{5.7}$$

The right side is built only from the fundamental dimensionless quantity 1

and from meaningful dimensionless ratios: (Δx)/x is the fractional change

in x, and (Δy)/y is the fractional change in y.

The gratuitous entropy came from mixing x + Δx, y + Δy, x, and y willy-nilly, and it was removed by regrouping or unmixing. Unmixing is difficult with physical systems. Try, for example, to remove a drop of food coloring mixed into a glass of water. The problem is that a glass of water contains roughly $10^{25}$ molecules. Fortunately, most mathematical expressions have fewer constituents. We can often regroup and unmix the mingled pieces and thereby reduce the entropy of the expression.


Problem 5.9 Rectangle for the correction factor

Draw a rectangle representing the low-entropy correction factor

$$\left(1 + \frac{\Delta x}{x}\right)\left(1 + \frac{\Delta y}{y}\right). \tag{5.8}$$

A low-entropy correction factor produces a low-entropy fractional change:

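$$\frac{\Delta(xy)}{xy} = \left(1 + \frac{\Delta x}{x}\right)\left(1 + \frac{\Delta y}{y}\right) - 1 = \frac{\Delta x}{x} + \frac{\Delta y}{y} + \frac{\Delta x}{x}\,\frac{\Delta y}{y}, \tag{5.9}$$

where Δ(xy)/xy is the fractional change from xy to (x + Δx)(y + Δy). The rightmost term is the product of two small fractions, so it is small compared to the preceding two terms. Without this small, quadratic term,

$$\frac{\Delta(xy)}{xy} \approx \frac{\Delta x}{x} + \frac{\Delta y}{y}. \tag{5.10}$$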

Small fractional changes simply add!

This fractional-change rule is far simpler than the corresponding approx-

imate rule that the absolute change is xΔy + yΔx. Simplicity indicates

low entropy; indeed, the only plausible alternative to the proposed rule

is the possibility that fractional changes multiply. And this conjecture is

not likely: When Δy = 0, it predicts that Δ(xy) = 0 no matter the value

of Δx (this prediction is explored also in Problem 5.12).

Problem 5.10 Thermal expansion

If, due to thermal expansion, a metal sheet expands in each dimension by 4%,

what happens to its area?

Problem 5.11 Price rise with a discount

Imagine that inﬂation, or copyright law, increases the price of a book by 10%

compared to last year. Fortunately, as a frequent book buyer, you start getting a

store discount of 15%. What is the net price change that you see?

5.2.3 Squaring

In analyzing the engineered and natural worlds, a common operation is squaring, a special case of multiplication. Squared lengths are areas, and squared speeds are proportional to the drag on most objects (Section 2.4):

$$F_d \sim \rho v^2 A, \tag{5.11}$$

where v is the speed of the object, A is its cross-sectional area, and ρ is the density of the fluid. As a consequence, driving at highway speeds for a distance d consumes an energy $E = F_d\,d \sim \rho A v^2 d$. Energy consumption can therefore be reduced by driving more slowly. This possibility became important to Western countries in the 1970s when oil prices rose rapidly (see [7] for an analysis). As a result, the United States instituted a highway speed limit of 55 mph (90 kph).

By what fraction does gasoline consumption fall due to driving 55 mph instead

of 65 mph?

A lower speed limit reduces gasoline consumption by reducing the drag force $\rho A v^2$ and by reducing the driving distance d: People measure and regulate their commuting more by time than by distance. But finding a new home or job is a slow process. Therefore, analyze first things first: assume for this initial analysis that the driving distance d stays fixed (then try Problem 5.14).

With that assumption, E is proportional to $v^2$, and

$$\frac{\Delta E}{E} = 2 \times \frac{\Delta v}{v}. \tag{5.12}$$

Going from 65 mph to 55 mph is roughly a 15% drop in v, so the energy

consumption drops by roughly 30%. Highway driving uses a signiﬁcant

fraction of the oil consumed by motor vehicles, which in the United States

consume a signiﬁcant fraction of all oil consumed. Thus the 30% drop

substantially reduced total US oil consumption.
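How good is the doubling rule for a change as large as 15%? The sketch below, added for illustration with invented function names, compares the linear fractional-change estimate with the exact change in $v^2$, keeping the distance d fixed as assumed above.

```python
def fractional_change_power(frac_change_x, n):
    """Linear rule: the fractional change in x**n is about n times that in x."""
    return n * frac_change_x

def exact_fractional_change_power(frac_change_x, n):
    """Exact fractional change in x**n for a given fractional change in x."""
    return (1 + frac_change_x) ** n - 1

dv_over_v = (55 - 65) / 65            # roughly a 15% drop in speed
linear = fractional_change_power(dv_over_v, 2)
exact = exact_fractional_change_power(dv_over_v, 2)
print(f"speed change {dv_over_v:.1%}, energy change: linear {linear:.1%}, exact {exact:.1%}")
```

The two answers differ by the neglected quadratic term $(\Delta v/v)^2$, a couple of percentage points here, which does not change the conclusion.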

Problem 5.12 A tempting error

If A and x are related by $A = x^2$, a tempting conjecture is that

$$\frac{\Delta A}{A} \approx \left(\frac{\Delta x}{x}\right)^2. \tag{5.13}$$

Disprove this conjecture using easy cases (Chapter 2).

Problem 5.13 Numerical estimates

Use fractional changes to estimate $6.3^3$. How accurate is the estimate?

Problem 5.14 Time limit on commuting

Assume that driving time, rather than distance, stays ﬁxed as highway driving

speeds fall by 15%. What is the resulting fractional change in the gasoline con-

sumed by highway driving?


Problem 5.15 Wind power

The power generated by an ideal wind turbine is proportional to $v^3$ (why?). If wind speeds increase by a mere 10%, what is the effect on the generated power? The quest for fast winds is one reason that wind turbines are placed on cliffs or hilltops or at sea.

5.3 Fractional changes with general exponents

The fractional-change approximations for changes in $x^2$ (Section 5.2.3) and in $x^3$ (Problem 5.13) are special cases of the approximation for $x^n$:

$$\frac{\Delta(x^n)}{x^n} \approx n \times \frac{\Delta x}{x}. \tag{5.14}$$

This rule oﬀers a method for mental division (Section 5.3.1), for estimating

square roots (Section 5.3.2), and for judging a common explanation for the

seasons (Section 5.3.3). The rule requires only that the fractional change

be small and that the exponent n not be too large (Section 5.3.4).

5.3.1 Rapid mental division

The special case n = −1 provides the method for rapid mental division. As an example, let's estimate 1/13. Rewrite it as $(x + \Delta x)^{-1}$ with x = 10 and Δx = 3. The big part is $x^{-1} = 0.1$. Because (Δx)/x = 30%, the fractional correction to $x^{-1}$ is roughly −30%. The result is 0.07.

$$\frac{1}{13} \approx \frac{1}{10} - 30\% = 0.07, \tag{5.15}$$

where the "−30%" notation, meaning "decrease the previous object by 30%," is a useful shorthand for a factor of 1 − 0.3.

How accurate is the estimate, and what is the source of the error?

The estimate is in error by only 9%. The error arises because the linear approximation

$$\frac{\Delta\left(x^{-1}\right)}{x^{-1}} \approx -1 \times \frac{\Delta x}{x} \tag{5.16}$$

does not include the square (or higher powers) of the fractional change (Δx)/x (Problem 5.17 asks you to find the squared term).


How can the error in the linear approximation be reduced?

To reduce the error, reduce the fractional change. Because the fractional change is determined by the big part, let's increase the accuracy of the big part. Accordingly, multiply 1/13 by 8/8, a convenient form of 1, to construct 8/104. Its big part 0.08 approximates 1/13 already to within 4%. To improve it, write 1/104 as $(x + \Delta x)^{-1}$ with x = 100 and Δx = 4. The fractional change (Δx)/x is now 0.04 (rather than 0.3); and the fractional correction to 1/x and 8/x is a mere −4%. The corrected estimate is 0.0768:

$$\frac{1}{13} \approx 0.08 - 4\% = 0.08 - 0.0032 = 0.0768. \tag{5.17}$$

This estimate can be done mentally in seconds and is accurate to 0.16%!
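Both versions of the division can be checked with a short sketch. The helper `reciprocal_estimate` is a name made up for this illustration; it applies the n = −1 rule to a denominator split into a round big part and a small change.

```python
def reciprocal_estimate(big_denominator, delta):
    """Estimate 1/(x + dx) as (1/x) * (1 - dx/x), the n = -1 fractional-change rule."""
    big_part = 1 / big_denominator
    return big_part * (1 - delta / big_denominator)

exact = 1 / 13
rough = reciprocal_estimate(10, 3)            # 1/13 as 1/(10 + 3)
refined = 8 * reciprocal_estimate(100, 4)     # 1/13 = 8/104 = 8/(100 + 4)

for name, estimate in (("rough", rough), ("refined", refined)):
    print(f"{name}: {estimate:.4f}, relative error {abs(estimate - exact) / exact:.2%}")
```

In both cases the relative error equals the square of the fractional change, which is why shrinking (Δx)/x from 0.3 to 0.04 pays off so handsomely.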

Problem 5.16 Next approximation

Multiply 1/13 by a convenient form of 1 to make a denominator near 1000; then

estimate 1/13. How accurate is the resulting approximation?

Problem 5.17 Quadratic approximation

Find A, the coefficient of the quadratic term in the improved fractional-change approximation

$$\frac{\Delta\left(x^{-1}\right)}{x^{-1}} \approx -1 \times \frac{\Delta x}{x} + A \times \left(\frac{\Delta x}{x}\right)^2. \tag{5.18}$$

Use the resulting approximation to improve the estimates for 1/13.

Problem 5.18 Fuel eﬃciency

Fuel eﬃciency is inversely proportional to energy consumption. If a 55 mph

speed limit decreases energy consumption by 30%, what is the new fuel eﬃciency

of a car that formerly got 30 miles per US gallon (12.8 kilometers per liter)?

5.3.2 Square roots

The fractional exponent n = 1/2 provides the method for estimating square roots. As an example, let's estimate $\sqrt{10}$. Rewrite it as $(x + \Delta x)^{1/2}$ with $x = 9$ and $\Delta x = 1$. The big part $x^{1/2}$ is 3. Because $(\Delta x)/x = 1/9$ and $n = 1/2$, the fractional correction is 1/18. The corrected estimate is

$$\sqrt{10} \approx 3 \times \left(1 + \frac{1}{18}\right) \approx 3.1667. \tag{5.19}$$

The exact value is 3.1622 . . ., so the estimate is accurate to 0.14%.
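As a numerical check, here is a small Python sketch of the linear fractional-change rule for a general exponent, applied to $\sqrt{10}$. The function name `power_estimate` is a hypothetical helper, not something defined in the text.

```python
import math

def power_estimate(x, dx, n):
    """Linear fractional-change estimate of (x + dx)**n:
    the big part x**n times (1 + n * dx / x)."""
    return x**n * (1 + n * dx / x)

estimate = power_estimate(9, 1, 0.5)   # sqrt(10) with x = 9, dx = 1
exact = math.sqrt(10)
relative_error = (estimate - exact) / exact

print(f"estimate {estimate:.4f}, exact {exact:.4f}, "
      f"relative error {relative_error:.2%}")
```

The sign of `relative_error` shows whether this particular estimate is high or low, which is a useful starting point for Problem 5.19.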


Problem 5.19 Overestimate or underestimate?

Does the linear fractional-change approximation overestimate all square roots (as it overestimated $\sqrt{10}$)? If yes, explain why; if no, give a counterexample.

Problem 5.20 Cosine approximation

Use the small-angle approximation $\sin\theta \approx \theta$ to show that $\cos\theta \approx 1 - \theta^2/2$.

Problem 5.21 Reducing the fractional change

To reduce the fractional change when estimating $\sqrt{10}$, rewrite it as $\sqrt{360}/6$ and then estimate $\sqrt{360}$. How accurate is the resulting estimate for $\sqrt{10}$?

Problem 5.22 Another method to reduce the fractional change

Because $\sqrt{2}$ is fractionally distant from the nearest integer square roots $\sqrt{1}$ and $\sqrt{4}$, fractional changes do not give a direct and accurate estimate of $\sqrt{2}$. A similar problem occurred in estimating ln 2 (Section 4.3); there, rewriting 2 as (4/3)/(2/3) improved the accuracy. Does that rewriting help estimate $\sqrt{2}$?

Problem 5.23 Cube root

Estimate $2^{1/3}$ to within 10%.

5.3.3 A reason for the seasons?

Summers are warmer than winters, it is often alleged, because the earth is closer to the sun in the summer than in the winter. This common explanation is bogus for two reasons. First, summers in the southern hemisphere happen alongside winters in the northern hemisphere, despite almost no difference in the respective distances to the sun. Second, as we will now estimate, the varying earth–sun distance produces too small a temperature difference. The causal chain (the distance determines the intensity of solar radiation, and the intensity determines the surface temperature) is most easily analyzed using fractional changes.

Intensity of solar radiation: The intensity is the solar power divided by the area over which it spreads. The solar power hardly changes over a year (the sun has existed for several billion years); however, at a distance r from the sun, the energy has spread over a giant sphere with surface area $\sim r^2$. The intensity I therefore varies according to $I \propto r^{-2}$. The fractional changes in radius and intensity are related by

$$\frac{\Delta I}{I} \approx -2 \times \frac{\Delta r}{r}. \tag{5.20}$$


Surface temperature: The incoming solar energy cannot accumulate and returns to space as blackbody radiation. Its outgoing intensity depends on the earth's surface temperature T according to the Stefan–Boltzmann law $I = \sigma T^4$ (Problem 1.12), where σ is the Stefan–Boltzmann constant. Therefore $T \propto I^{1/4}$. Using fractional changes,

$$\frac{\Delta T}{T} \approx \frac{1}{4} \times \frac{\Delta I}{I}. \tag{5.21}$$

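This relation connects intensity and temperature. The intensity and distance are connected by $(\Delta I)/I = -2 \times (\Delta r)/r$. When joined, the two relations connect distance and temperature as follows:

$$\frac{\Delta r}{r} \;\xrightarrow{\;\times(-2)\;}\; \frac{\Delta I}{I} \;\xrightarrow{\;\times\frac{1}{4}\;}\; \frac{\Delta T}{T}, \qquad \frac{\Delta T}{T} \approx -\frac{1}{2} \times \frac{\Delta r}{r}.$$

The first link comes from $I \propto r^{-2}$, which gives $(\Delta I)/I \approx -2 \times (\Delta r)/r$; the second comes from $T \propto I^{1/4}$.

[Figure: the earth's elliptical orbit, showing the distance r at polar angle θ (measured from θ = 0°), the semilatus rectum l, and the extreme distances $r_{\min}$ and $r_{\max}$.]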

The next step in the computation is to estimate the input $(\Delta r)/r$: the fractional change in the earth–sun distance. The earth orbits the sun in an ellipse; its orbital distance is

$$r = \frac{l}{1 + \epsilon\cos\theta}, \tag{5.22}$$

where $\epsilon$ is the eccentricity of the orbit, θ is the polar angle, and l is the semilatus rectum. Thus r varies from $r_{\min} = l/(1 + \epsilon)$ (when θ = 0°) to $r_{\max} = l/(1 - \epsilon)$ (when θ = 180°). The increase from $r_{\min}$ to l contributes a fractional change of roughly $\epsilon$. The increase from l to $r_{\max}$ contributes another fractional change of roughly $\epsilon$. Thus, r varies by roughly $2\epsilon$. For the earth's orbit, $\epsilon = 0.016$, so the earth–sun distance varies by 0.032 or 3.2% (making the intensity vary by 6.4%).

Problem 5.24 Where is the sun?

[Figure: an elliptical orbit with the sun at its center, with $r_{\max}$ and $r_{\min}$ marked.]

The preceding diagram of the earth's orbit placed the sun away from the center of the ellipse. The accompanying diagram shows the sun at an alternative and perhaps more natural location: at the center of the ellipse. What physical laws, if any, prevent the sun from sitting at the center of the ellipse?

Problem 5.25 Check the fractional change

Look up the minimum and maximum earth–sun distances and check that the

distance does vary by 3.2% from minimum to maximum.


A 3.2% increase in distance causes a slight drop in temperature:

$$\frac{\Delta T}{T} \approx -\frac{1}{2} \times \frac{\Delta r}{r} = -1.6\%. \tag{5.23}$$

However, man does not live by fractional changes alone and experiences the absolute temperature change ΔT.

$$\Delta T = -1.6\% \times T. \tag{5.24}$$

In winter T ≈ 0 °C, so is ΔT ≈ 0 °C?

If our calculation predicts that ΔT ≈ 0 °C, it must be wrong. An even less plausible conclusion results from measuring T in Fahrenheit degrees, which makes T often negative in parts of the northern hemisphere. Yet ΔT cannot flip its sign just because T is measured in Fahrenheit degrees!

Fortunately, the temperature scale is constrained by the Stefan–Boltzmann law. For blackbody flux to be proportional to $T^4$, temperature must be measured relative to a state with zero thermal energy: absolute zero. Neither the Celsius nor the Fahrenheit scale satisfies this requirement.

In contrast, the Kelvin scale does measure temperature relative to absolute zero. On the Kelvin scale, the average surface temperature is T ≈ 300 K; thus, a 1.6% change in T makes ΔT ≈ 5 K. A 5 K change is also a 5 °C change: Kelvin and Celsius degrees are the same size, although the scales have different zero points. (See also Problem 5.26.) A typical temperature change between summer and winter in temperate latitudes is 20 °C, much larger than the predicted 5 °C change, even after allowing for errors in the estimate. A varying earth–sun distance is a dubious explanation of the reason for the seasons.
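The chain of fractional changes can be followed numerically. The Python sketch below is illustrative only: the variable names are assumptions, and the inputs (eccentricity 0.016 and T ≈ 300 K) are the values quoted in the text.

```python
eccentricity = 0.016           # earth's orbital eccentricity, as quoted above
mean_temperature_kelvin = 300  # average surface temperature, as quoted above

# Distance varies by roughly twice the eccentricity from r_min to r_max.
dr_over_r = 2 * eccentricity

# I is proportional to r**-2, so the exponent -2 multiplies the fractional change.
dI_over_I = -2 * dr_over_r

# T is proportional to I**(1/4), so the exponent 1/4 multiplies the fractional change.
dT_over_T = 0.25 * dI_over_I

# The absolute change requires a temperature measured from absolute zero.
delta_T = dT_over_T * mean_temperature_kelvin

print(f"dr/r = {dr_over_r:.1%}, dI/I = {dI_over_I:.1%}, dT/T = {dT_over_T:.1%}")
print(f"delta T is about {delta_T:.1f} K")
```

The result, a drop of about 4.8 K, rounds to the 5 K used in the argument.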

Problem 5.26 Converting to Fahrenheit

The conversion between Fahrenheit and Celsius temperatures is

$$F = 1.8C + 32, \tag{5.25}$$

so a change of 5 °C should be a change of 41 °F, sufficiently large to explain the seasons! What is wrong with this reasoning?

Problem 5.27 Alternative explanation

If a varying distance to the sun cannot explain the seasons, what can? Your

proposal should, in passing, explain why the northern and southern hemispheres

have summer 6 months apart.


5.3.4 Limits of validity

The linear fractional-change approximation

$$\frac{\Delta\left(x^n\right)}{x^n} \approx n \times \frac{\Delta x}{x} \tag{5.26}$$

has been useful. But when is it valid? To investigate without drowning in notation, write z for Δx; then choose x = 1 to make z the absolute and the fractional change. The right side becomes nz, and the linear fractional-change approximation is equivalent to

$$(1 + z)^n \approx 1 + nz. \tag{5.27}$$

The approximation becomes inaccurate when z is too large: for example, when evaluating $\sqrt{1 + z}$ with z = 1 (Problem 5.22). Is the exponent n also restricted? The preceding examples illustrated only moderate-sized exponents: n = 2 for energy consumption (Section 5.2.3), −2 for fuel efficiency (Problem 5.18), −1 for reciprocals (Section 5.3.1), 1/2 for square roots (Section 5.3.2), and −2 and 1/4 for the seasons (Section 5.3.3). We need further data.

What happens in the extreme case of large exponents?

With a large exponent such as n = 100 and, say, z = 0.001, the approximation predicts that $1.001^{100} \approx 1.1$, close to the true value of 1.105 . . . However, choosing the same n alongside z = 0.1 (larger than 0.001 but still small) produces the terrible prediction

$$\underbrace{1.1^{100}}_{(1+z)^n} = 1 + \underbrace{100 \times 0.1}_{nz} = 11; \tag{5.28}$$

$1.1^{100}$ is roughly 14,000, more than 1000 times larger than the prediction.

Both predictions used large n and small z, yet only one prediction was accurate; thus, the problem cannot lie in n or z alone. Perhaps the culprit is the dimensionless product nz. To test that idea, hold nz constant while trying large values of n. For nz, a sensible constant is 1, the simplest dimensionless number. Here are several examples.

$$\begin{aligned} 1.1^{10} &\approx 2.59374, \\ 1.01^{100} &\approx 2.70481, \\ 1.001^{1000} &\approx 2.71692. \end{aligned} \tag{5.29}$$


In each example, the approximation incorrectly predicts that $(1 + z)^n = 2$.

What is the cause of the error?

To find the cause, continue the sequence beyond $1.001^{1000}$ and hope that a pattern will emerge:

k    $(1 + 10^{-k})^{10^k}$
1    2.5937425
2    2.7048138
3    2.7169239
4    2.7181459
5    2.7182682
6    2.7182805
7    2.7182817

The values seem to approach e = 2.718281828 . . ., the base of the natural logarithms. Therefore, take the logarithm of the whole approximation.

$$\ln(1 + z)^n = n\ln(1 + z). \tag{5.30}$$

Pictorial reasoning showed that $\ln(1 + z) \approx z$ when $z \ll 1$ (Section 4.3). Thus, $n\ln(1 + z) \approx nz$, making $(1 + z)^n \approx e^{nz}$. This improved approximation explains why the approximation $(1 + z)^n \approx 1 + nz$ failed with large nz: Only when $nz \ll 1$ is $e^{nz}$ approximately $1 + nz$. Therefore, when $z \ll 1$ the two simplest approximations are

$$(1 + z)^n \approx \begin{cases} 1 + nz & (z \ll 1 \text{ and } nz \ll 1), \\ e^{nz} & (z \ll 1 \text{ and } nz \text{ unrestricted}). \end{cases} \tag{5.31}$$

[Figure: the n–z plane on logarithmic axes, divided by the lines z = 1 and n = 1 and the curves nz = 1, n/z = 1, and n ln z = 1 into regions labeled with the approximations $1 + nz$, $e^{nz}$, $z^n e^{n/z}$, $z^n$, and $1 + n\ln z$.]

The diagram shows, across the whole n–z plane, the simplest approximation in each region. The axes are logarithmic and n and z are assumed positive: The right half plane shows $z \gg 1$, and the upper half plane shows $n \gg 1$. On the lower right, the boundary curve is $n\ln z = 1$. Explaining the boundaries and extending the approximations is an instructive exercise (Problem 5.28).
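The role of the product nz can be explored numerically. The following Python sketch compares the exact power with the linear and exponential approximations for a few (n, z) pairs taken from the examples above; the function name `compare` is a hypothetical helper.

```python
import math

def compare(n, z):
    """Return (exact, linear, exponential) values for (1 + z)**n."""
    exact = (1 + z) ** n
    linear = 1 + n * z            # valid when z << 1 and nz << 1
    exponential = math.exp(n * z) # intended for z << 1, any nz
    return exact, linear, exponential

for n, z in [(100, 0.001), (100, 0.1), (1000, 0.001)]:
    exact, linear, exponential = compare(n, z)
    print(f"n={n:5d} z={z:6.3f} nz={n * z:5.1f}  "
          f"exact={exact:12.5f}  1+nz={linear:8.3f}  e^nz={exponential:12.5f}")
```

For n = 100 and z = 0.1 the exponential form lands at the right order of magnitude but is still noticeably off, because z = 0.1 is not very small; with z = 0.001 it agrees with the exact value to about three significant figures.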

Problem 5.28 Explaining the approximation plane

In the right half plane, explain the n/z = 1 and $n\ln z = 1$ boundaries. For the whole plane, relax the assumption of positive n and z as far as possible.

Problem 5.29 Binomial-theorem derivation

Try the following alternative derivation of $(1 + z)^n \approx e^{nz}$ (where $n \gg 1$). Expand $(1 + z)^n$ using the binomial theorem, simplify the products in the binomial coefficients by approximating n − k as n, and compare the resulting expansion to the Taylor series for $e^{nz}$.


5.4 Successive approximation: How deep is the well?

The next illustration of taking out the big part emphasizes successive

approximation and is disguised as a physics problem.

You drop a stone down a well of unknown depth h and hear the splash 4 s later. Neglecting air resistance, find h to within 5%. Use $c_s = 340\ \mathrm{m\,s^{-1}}$ as the speed of sound and $g = 10\ \mathrm{m\,s^{-2}}$ as the strength of gravity.

Approximate and exact solutions give almost the same well depth, but

oﬀer signiﬁcantly diﬀerent understandings.

5.4.1 Exact depth

The depth is determined by the constraint that the 4 s wait splits into two times: the rock falling freely down the well and the sound traveling up the well. The free-fall time is $\sqrt{2h/g}$ (Problem 1.3), so the total time is

$$T = \underbrace{\sqrt{\frac{2h}{g}}}_{\text{rock}} + \underbrace{\frac{h}{c_s}}_{\text{sound}}. \tag{5.32}$$

To solve for h exactly, either isolate the square root on one side and square both sides to get a quadratic equation in h (Problem 5.30); or, for a less error-prone method, rewrite the constraint as a quadratic equation in a new variable $z = \sqrt{h}$.

Problem 5.30 Other quadratic

Solve for h by isolating the square root on one side and squaring both sides. What are the advantages and disadvantages of this method in comparison with the method of rewriting the constraint as a quadratic in $z = \sqrt{h}$?

As a quadratic equation in $z = \sqrt{h}$, the constraint is

$$\frac{1}{c_s} z^2 + \sqrt{\frac{2}{g}}\, z - T = 0. \tag{5.33}$$

Using the quadratic formula and choosing the positive root yields

$$z = \frac{-\sqrt{2/g} + \sqrt{2/g + 4T/c_s}}{2/c_s}. \tag{5.34}$$

Because $z^2 = h$,

$$h = \left(\frac{-\sqrt{2/g} + \sqrt{2/g + 4T/c_s}}{2/c_s}\right)^2. \tag{5.35}$$

Substituting $g = 10\ \mathrm{m\,s^{-2}}$ and $c_s = 340\ \mathrm{m\,s^{-1}}$ gives h ≈ 71.78 m.
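The exact formula is easy to get wrong by hand, so a numerical check is worthwhile. The Python sketch below solves the quadratic in $z = \sqrt{h}$ and then confirms that the two travel times add up to T; the function name and argument names are illustrative assumptions. With T = 4 s, g = 10 m s⁻² and c_s = 340 m s⁻¹ it evaluates to a depth of about 71.78 m.

```python
import math

def exact_depth(total_time, g=10.0, sound_speed=340.0):
    """Solve sqrt(2h/g) + h/c_s = T for h using the quadratic
    (1/c_s) z**2 + sqrt(2/g) z - T = 0 in z = sqrt(h)."""
    a = 1 / sound_speed
    b = math.sqrt(2 / g)
    c = -total_time
    z = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)  # positive root
    return z * z

h = exact_depth(4.0)
fall_time = math.sqrt(2 * h / 10.0)
sound_time = h / 340.0

print(f"h = {h:.2f} m, fall {fall_time:.3f} s + sound {sound_time:.3f} s "
      f"= {fall_time + sound_time:.3f} s")
```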

Even if the depth is correct, the exact formula for it is a mess. Such high-entropy horrors arise frequently from the quadratic formula; its use often signals the triumph of symbol manipulation over thought. Exact answers, we will find, may be less useful than approximate answers.

5.4.2 Approximate depth

To find a low-entropy, approximate depth, identify the big part: the most important effect. Here, most of the total time is the rock's free fall: The rock's maximum speed, even if it fell for the entire 4 s, is only $gT = 40\ \mathrm{m\,s^{-1}}$, which is far below $c_s$. Therefore, the most important effect should arise in the extreme case of infinite sound speed.

If $c_s = \infty$, how deep is the well?

In this zeroth approximation, the free-fall time $t_0$ is the full time T = 4 s, so the well depth $h_0$ becomes

$$h_0 = \frac{1}{2} g t_0^2 = 80\ \mathrm{m}. \tag{5.36}$$

Is this approximate depth an overestimate or underestimate? How accurate is it?

This approximation neglects the sound-travel time, so it overestimates the free-fall time and therefore the depth. Compared to the true depth of roughly 71.78 m, it overestimates the depth by only 11%, reasonable accuracy for a quick method offering physical insight. Furthermore, this approximation suggests its own refinement.

How can this approximation be improved?

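[Figure: the successive-approximation loop. The total time T and the depth h give the fall time $t = T - h/c_s$, and the fall time gives the depth $h = \frac{1}{2} g t^2$.]

To improve it, use the approximate depth $h_0$ to approximate the sound-travel time.

$$t_{\text{sound}} \approx \frac{h_0}{c_s} \approx 0.24\ \mathrm{s}. \tag{5.37}$$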

The remaining time is the next approximation to the free-fall time.

$$t_1 = T - \frac{h_0}{c_s} \approx 3.76\ \mathrm{s}. \tag{5.38}$$

In that time, the rock falls a distance $g t_1^2/2$, so the next approximation to the depth is

$$h_1 = \frac{1}{2} g t_1^2 \approx 70.87\ \mathrm{m}. \tag{5.39}$$

Is this approximate depth an overestimate or underestimate? How accurate is it?

The calculation of $h_1$ used $h_0$ to estimate the sound-travel time. Because $h_0$ overestimates the depth, the procedure overestimates the sound-travel time and, by the same amount, underestimates the free-fall time. Thus $h_1$ underestimates the depth. Indeed, $h_1$ is slightly smaller than the true depth of roughly 71.78 m, but by only 1.3%.
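The refinement step can be repeated as often as desired. Here is a minimal Python sketch of that loop, assuming the same parameter values as the problem statement; the function name `successive_depths` and the `steps` parameter are hypothetical choices for illustration.

```python
def successive_depths(total_time, g=10.0, sound_speed=340.0, steps=4):
    """Return the approximations h0, h1, h2, ... to the well depth."""
    depths = []
    fall_time = total_time  # zeroth approximation: infinite sound speed
    for _ in range(steps):
        depth = 0.5 * g * fall_time ** 2
        depths.append(depth)
        # Use the latest depth to estimate the sound-travel time,
        # and give the remaining time to the free fall.
        fall_time = total_time - depth / sound_speed
    return depths

depths = successive_depths(4.0)
for index, depth in enumerate(depths):
    print(f"h{index} = {depth:.2f} m")
```

The approximations alternate between overestimates and underestimates, each closer than the last, which matches the reasoning about $h_0$ and $h_1$ above.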

The method of successive approximation has several advantages over solving the quadratic formula exactly. First, it helps us develop a physical understanding of the system; we realize, for example, that most of the T = 4 s is spent in free fall, so the depth is roughly $gT^2/2$. Second, it has a pictorial explanation (Problem 5.34). Third, it gives a sufficiently accurate answer quickly. If you want to know whether it is safe to jump into the well, why calculate the depth to three decimal places?

Finally, the method can handle small changes in the model. Maybe the

speed of sound varies with depth, or air resistance becomes important

(Problem 5.32). Then the brute-force, quadratic-formula method fails. The

quadratic formula and the even messier cubic and the quartic formulas

are rare closed-form solutions to complicated equations. Most equations

have no closed-form solution. Therefore, a small change to a solvable

model usually produces an intractable model—if we demand an exact

answer. The method of successive approximation is a robust alternative

that produces low-entropy, comprehensible solutions.

Problem 5.31 Parameter-value inaccuracies

What is $h_2$, the second approximation to the depth? Compare the error in $h_1$ and $h_2$ with the error made by using $g = 10\ \mathrm{m\,s^{-2}}$.

Problem 5.32 Eﬀect of air resistance

Roughly what fractional error in the depth is produced by neglecting air resistance (Section 2.4.2)? Compare this error to the error in the first approximation $h_1$ and in the second approximation $h_2$ (Problem 5.31).


Problem 5.33 Dimensionless form of the well-depth analysis

Even the messiest results are cleaner and have lower entropy in dimensionless form. The four quantities h, g, T, and $c_s$ produce two independent dimensionless groups (Section 2.4.1). An intuitively reasonable pair are

$$\bar{h} \equiv \frac{h}{gT^2} \quad\text{and}\quad \bar{T} \equiv \frac{gT}{c_s}. \tag{5.40}$$

a. What is a physical interpretation of $\bar{T}$?

b. With two groups, the general dimensionless form is $\bar{h} = f(\bar{T})$. What is $\bar{h}$ in the easy case $\bar{T} \to 0$?

c. Rewrite the quadratic-formula solution

$$h = \left(\frac{-\sqrt{2/g} + \sqrt{2/g + 4T/c_s}}{2/c_s}\right)^2 \tag{5.41}$$

as $\bar{h} = f(\bar{T})$. Then check that $f(\bar{T})$ behaves correctly in the easy case $\bar{T} \to 0$.

Problem 5.34 Spacetime diagram of the well depth

[Figure: spacetime diagram of depth versus time t, showing the rock's curve and the sound wavefront, with the time 4 s marked.]

How does the spacetime diagram [44] illustrate the successive approximation of the well depth? On the diagram, mark $h_0$ (the zeroth approximation to the depth), $h_1$, and the exact depth h. Mark $t_0$, the zeroth approximation to the free-fall time. Why are portions of the rock and sound-wavefront curves dotted? How would you redraw the diagram if the speed of sound doubled? If g doubled?

5.5 Daunting trigonometric integral

The ﬁnal example of taking out the big part is to estimate a daunting

trigonometric integral that I learned as an undergraduate. My classmates

and I spent many late nights in the physics library solving homework

problems; the graduate students, doing the same for their courses, would

regale us with their favorite mathematics and physics problems.

The integral appeared on the mathematical-preliminaries exam to enter the Landau Institute for Theoretical Physics in the former USSR. The problem is to evaluate

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{100}\, dt \tag{5.42}$$

to within 5% in less than 5 min without using a calculator or computer!

That $(\cos t)^{100}$ looks frightening. Most trigonometric identities do not help. The usually helpful identity $(\cos t)^2 = (\cos 2t + 1)/2$ produces only

$$(\cos t)^{100} = \left(\frac{\cos 2t + 1}{2}\right)^{50}, \tag{5.43}$$

which becomes a trigonometric monster upon expanding the 50th power.

A clue pointing to a simpler method is that 5% accuracy is sufficient. So, find the big part! The integrand is largest when t is near zero. There, $\cos t \approx 1 - t^2/2$ (Problem 5.20), so the integrand is roughly

$$(\cos t)^{100} \approx \left(1 - \frac{t^2}{2}\right)^{100}. \tag{5.44}$$

It has the familiar form $(1 + z)^n$, with fractional change $z = -t^2/2$ and exponent n = 100. When t is small, $z = -t^2/2$ is tiny, so $(1 + z)^n$ may be approximated using the results of Section 5.3.4:

$$(1 + z)^n \approx \begin{cases} 1 + nz & (z \ll 1 \text{ and } nz \ll 1) \\ e^{nz} & (z \ll 1 \text{ and } nz \text{ unrestricted}). \end{cases} \tag{5.45}$$

Because the exponent n is large, nz can be large even when t and z are small. Therefore, the safest approximation is $(1 + z)^n \approx e^{nz}$; then

$$(\cos t)^{100} \approx \left(1 - \frac{t^2}{2}\right)^{100} \approx e^{-50t^2}. \tag{5.46}$$

A cosine raised to a high power becomes a Gaussian! As a check on this surprising conclusion, computer-generated plots of $(\cos t)^n$ for n = 1 . . . 5 show a Gaussian bell shape taking form as n increases.

[Figure: plots of $(\cos t)^n$ for n = 1 . . . 5.]

Even with this graphical evidence, replacing $(\cos t)^{100}$ by a Gaussian is a bit suspicious. In the original integral, t ranges from −π/2 to π/2, and these endpoints are far outside the region where $\cos t \approx 1 - t^2/2$ is an accurate approximation. Fortunately, this issue contributes only a tiny error (Problem 5.35). Ignoring this error turns the original integral into a Gaussian integral with finite limits:

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{100}\, dt \approx \int_{-\pi/2}^{\pi/2} e^{-50t^2}\, dt. \tag{5.47}$$

Unfortunately, with finite limits the integral has no closed form. But extending the limits to infinity produces a closed form while contributing almost no error (Problem 5.36). The approximation chain is now

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{100}\, dt \approx \int_{-\pi/2}^{\pi/2} e^{-50t^2}\, dt \approx \int_{-\infty}^{\infty} e^{-50t^2}\, dt. \tag{5.48}$$

Problem 5.35 Using the original limits

The approximation $\cos t \approx 1 - t^2/2$ requires that t be small. Why doesn't using the approximation outside the small-t range contribute a significant error?

Problem 5.36 Extending the limits

Why doesn’t extending the integration limits from ±π/2 to ±∞ contribute a

signiﬁcant error?

The last integral is an old friend (Section 2.1): $\int_{-\infty}^{\infty} e^{-\alpha t^2}\, dt = \sqrt{\pi/\alpha}$. With α = 50, the integral becomes $\sqrt{\pi/50}$. Conveniently, 50 is roughly 16π, so the square root (and our 5% estimate) is roughly 0.25.

For comparison, the exact integral is (Problem 5.41)

$$\int_{-\pi/2}^{\pi/2} (\cos t)^n\, dt = 2^{-n} \binom{n}{n/2} \pi. \tag{5.49}$$

When n = 100, the binomial coefficient and power of two produce

$$\frac{12611418068195524166851562157}{158456325028528675187087900672}\, \pi \approx 0.25003696348037. \tag{5.50}$$

Our 5-minute, within-5% estimate of 0.25 is accurate to almost 0.01%!
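Both numbers are quick to reproduce by machine. The Python sketch below evaluates the Gaussian estimate $\sqrt{\pi/50}$ and the closed form with the binomial coefficient, using only the standard library (`math.comb` requires Python 3.8 or later); the variable names are illustrative assumptions.

```python
import math

n = 100

# Gaussian estimate: integral of exp(-(n/2) t**2) over the whole real line.
gaussian_estimate = math.sqrt(math.pi / (n / 2))

# Closed form for even n: 2**-n * C(n, n/2) * pi.
exact = math.comb(n, n // 2) * math.pi / 2 ** n

relative_error = abs(gaussian_estimate - exact) / exact
print(f"Gaussian estimate {gaussian_estimate:.6f}")
print(f"exact value       {exact:.14f}")
print(f"relative error    {relative_error:.3%}")
```

The unrounded Gaussian estimate is slightly above 0.25; rounding 50 to 16π happens to move the answer closer to the exact value.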

Problem 5.37 Sketching the approximations

Plot $(\cos t)^{100}$ and its two approximations $e^{-50t^2}$ and $1 - 50t^2$.

Problem 5.38 Simplest approximation

Use the linear fractional-change approximation $(1 - t^2/2)^{100} \approx 1 - 50t^2$ to approximate the integrand; then integrate it over the range where $1 - 50t^2$ is positive. How close is the result of this 1-minute method to the exact value 0.2500 . . .?

Problem 5.39 Huge exponent

Estimate

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{10000}\, dt. \tag{5.51}$$


Problem 5.40 How low can you go?

Investigate the accuracy of the approximation

$$\int_{-\pi/2}^{\pi/2} (\cos t)^n\, dt \approx \sqrt{\frac{2\pi}{n}}, \tag{5.52}$$

for small n, including n = 1.

Problem 5.41 Closed form

To evaluate the integral

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{100}\, dt \tag{5.53}$$

in closed form, use the following steps:

a. Replace cos t with $(e^{it} + e^{-it})/2$.

b. Use the binomial theorem to expand the 100th power.

c. Pair each term like $e^{ikt}$ with a counterpart $e^{-ikt}$; then integrate their sum from −π/2 to π/2. What value or values of k produce a sum whose integral is nonzero?

5.6 Summary and further problems

Upon meeting a complicated problem, divide it into a big part—the most

important eﬀect—and a correction. Analyze the big part ﬁrst, and worry

about the correction afterward. This successive-approximation approach,

a species of divide-and-conquer reasoning, gives results automatically

in a low-entropy form. Low-entropy expressions admit few plausible

alternatives; they are therefore memorable and comprehensible. In short,

approximate results can be more useful than exact results.

Problem 5.42 Large logarithm

What is the big part in $\ln(1 + e^2)$? Give a short calculation to estimate $\ln(1 + e^2)$ to within 2%.

Problem 5.43 Bacterial mutations

In an experiment described in a Caltech biology seminar in the 1990s, researchers

repeatedly irradiated a population of bacteria in order to generate mutations. In

each round of radiation, 5% of the bacteria got mutated. After 140 rounds,

roughly what fraction of bacteria were left unmutated? (The seminar speaker

gave the audience 3 s to make a guess, hardly enough time to use or even ﬁnd

a calculator.)


Problem 5.44 Quadratic equations revisited

The following quadratic equation, inspired by [29], describes a very strongly damped oscillating system.

$$s^2 + 10^9 s + 1 = 0. \tag{5.54}$$

a. Use the quadratic formula and a standard calculator to ﬁnd both roots of the

quadratic. What goes wrong and why?

b. Estimate the roots by taking out the big part. (Hint: Approximate and solve

the equation in appropriate extreme cases.) Then improve the estimates using

successive approximation.

c. What are the advantages and disadvantages of the quadratic-formula analysis

versus successive approximation?

Problem 5.45 Normal approximation to the binomial distribution

The binomial expansion

$$\left(\frac{1}{2} + \frac{1}{2}\right)^{2n} \tag{5.55}$$

contains terms of the form

$$f(k) \equiv \binom{2n}{n-k} 2^{-2n}, \tag{5.56}$$

where k = −n . . . n. Each term f(k) is the probability of tossing n − k heads (and n + k tails) in 2n coin flips; f(k) is the so-called binomial distribution with parameters p = q = 1/2. Approximate this distribution by answering the following questions:

a. Is f(k) an even or an odd function of k? For what k does f(k) have its maximum?

b. Approximate f(k) when $k \ll n$ and sketch f(k). Therefore, derive and explain the normal approximation to the binomial distribution.

c. Use the normal approximation to show that the variance of this binomial distribution is n/2.

c. Use the normal approximation to show that the variance of this binomial

distribution is n/2.

Problem 5.46 Beta function

The following integral appears often in Bayesian inference:

f(a, b) =

1

0

x

a

(1 −x)

b

dx, (5.57)

where f(a − 1, b − 1) is the Euler beta function. Use street-ﬁghting methods to

conjecture functional forms for f(a, 0), f(a, a), and, ﬁnally, f(a, b). Check your

conjectures with a high-quality table of integrals or a computer-algebra system

such as Maxima.

6 Analogy

6.1 Spatial trigonometry: The bond angle in methane
6.2 Topology: How many regions?
6.3 Operators: Euler–MacLaurin summation
6.4 Tangent roots: A daunting transcendental sum
6.5 Bon voyage

When the going gets tough, the tough lower their standards. This idea,

the theme of the whole book, underlies the ﬁnal street-ﬁghting tool of

reasoning by analogy. Its advice is simple: Faced with a diﬃcult problem,

construct and solve a similar but simpler problem—an analogous problem.

Practice develops ﬂuency. The tool is introduced in spatial trigonometry

(Section 6.1); sharpened on solid geometry and topology (Section 6.2);

then applied to discrete mathematics (Section 6.3) and, in the farewell

example, to an inﬁnite transcendental sum (Section 6.4).

6.1 Spatial trigonometry: The bond angle in methane

[Figure: a methane molecule with the bond angle θ marked.]

The first analogy comes from spatial trigonometry. In methane (chemical formula $\mathrm{CH_4}$), a carbon atom sits at the center of a regular tetrahedron, and one hydrogen atom sits at each vertex. What is the angle θ between two carbon–hydrogen bonds?

Angles in three dimensions are hard to visualize. Try, for

example, to imagine and calculate the angle between two faces of a regular

tetrahedron. Because two-dimensional angles are easy to visualize, let’s

construct and analyze an analogous planar molecule. Knowing its bond

angle might help us guess methane’s bond angle.


Should the analogous planar molecule have four or three hydrogens?

Four hydrogens produce four bonds which, when spaced

regularly in a plane, produce two diﬀerent bond angles. In

contrast, methane contains only one bond angle. Therefore,

using four hydrogens alters a crucial feature of the original

problem. The likely solution is to construct the analogous

planar molecule using only three hydrogens.

Three hydrogens arranged regularly in a plane create only one bond angle: θ = 120°. Perhaps this angle is the bond angle in methane! One data point, however, is a thin reed on which to hang a prediction for higher dimensions. The single data point for two dimensions (d = 2) is consistent with numerous conjectures: for example, that in d dimensions the bond angle is 120° or (60d)° or much else.

Selecting a reasonable conjecture requires gathering further data. Easily available data comes from an even simpler yet analogous problem: the one-dimensional, linear molecule $\mathrm{CH_2}$. Its two hydrogens sit opposite one another, so the two C–H bonds form an angle of θ = 180°.

Based on the accumulated data, what are reasonable conjectures for the three-dimensional angle $\theta_3$?

d    $\theta_d$
1    180°
2    120°
3    ?

The one-dimensional molecule eliminates the conjecture that $\theta_d = (60d)^\circ$. It also suggests new conjectures: for example, that $\theta_d = (240 - 60d)^\circ$ or $\theta_d = 360^\circ/(d + 1)$. Testing these conjectures is an ideal task for the method of easy cases.

The easy-cases test of higher dimensions (high d) refutes the conjecture that $\theta_d = (240 - 60d)^\circ$. For high d, it predicts implausible bond angles: θ = 0 for d = 4 and θ < 0 for d > 4.

Fortunately, the second suggestion, $\theta_d = 360^\circ/(d + 1)$, passes the same easy-cases test. Let's continue to test it by evaluating its prediction for methane: $\theta_3 = 90^\circ$. Imagine then a big brother of methane: a $\mathrm{CH_6}$ molecule with carbon at the center of a cube and six hydrogens at the face centers. Its small bond angle is 90°. (The other bond angle is 180°.) Now remove two hydrogens to turn $\mathrm{CH_6}$ into $\mathrm{CH_4}$, evenly spreading out the remaining four hydrogens. Reducing the crowding raises the small bond angle above 90°, which refutes the prediction that $\theta_3 = 90^\circ$.


Problem 6.1 How many hydrogens?

How many hydrogens are needed in the analogous four- and five-dimensional bond-angle problems? Use this information to show that $\theta_4 > 90^\circ$. Is $\theta_d > 90^\circ$ for all d?

The data so far have refuted the simplest rational-function conjectures $(240 - 60d)^\circ$ and $360^\circ/(d + 1)$. Although other rational-function conjectures might survive, with only two data points the possibilities are too vast. Worse, $\theta_d$ might not even be a rational function of d.

Progress requires a new idea: The bond angle might not be the simplest

variable to study. An analogous diﬃculty arises when conjecturing the

next term in the series 3, 5, 11, 29, . . .

What is the next term in the series?

At first glance, the numbers seem almost random. Yet subtracting 2 from each term produces 1, 3, 9, 27, . . . Thus, in the original series the next term is likely to be 83. Similarly, a simple transformation of the $\theta_d$ data might help us conjecture a pattern for $\theta_d$.

What transformation of the $\theta_d$ data produces simple patterns?

The desired transformation should produce simple patterns and have aesthetic or logical justification. One justification is the structure of an honest calculation of the bond angle, which can be computed as a dot product of two C–H vectors (Problem 6.3). Because dot products involve cosines, a worthwhile transformation of $\theta_d$ is $\cos\theta_d$.

d    $\theta_d$    $\cos\theta_d$
1    180°    −1
2    120°    −1/2
3    ?       ?

This transformation simplifies the data: The $\cos\theta_d$ series begins simply −1, −1/2, . . . Two plausible continuations are −1/4 or −1/3; they correspond, respectively, to the general term $-1/2^{d-1}$ or $-1/d$.

Which continuation and conjecture is the more plausible?

Both conjectures predict cos θ < 0 and therefore $\theta_d > 90^\circ$ (for all d). This shared prediction is encouraging (Problem 6.1); however, being shared means that it does not distinguish between the conjectures.

Does either conjecture match the molecular geometry?

[Figure: the one-dimensional molecule H–C–H, with the carbon splitting the H–H segment in a 1 : 1 ratio.]

An important geometric feature, apart from the bond angle, is the position of the carbon. In one dimension, it lies halfway between the two hydrogens, so it splits the H–H line segment into two pieces having a 1 : 1 length ratio.

[Figure: the planar molecule with three hydrogens, with the carbon splitting the altitude in a 1 : 2 ratio.]

In two dimensions, the carbon lies on the altitude that connects one hydrogen to the midpoint of the other two hydrogens. The carbon splits the altitude into two pieces having a 1 : 2 length ratio.

How does the carbon split the analogous altitude of methane?


In methane, the analogous altitude runs from the top

vertex to the center of the base. The carbon lies at the

mean position and therefore at the mean height of the

four hydrogens. Because the three base hydrogens have

zero height, the mean height of the four hydrogens is

h/4, where h is the height of the top hydrogen. Thus,

in three dimensions, the carbon splits the altitude into

two parts having a length ratio of h/4 : 3h/4 or 1 : 3. In d dimensions,

therefore, the carbon probably splits the altitude into two parts having a

length ratio of 1: d (Problem 6.2).

Because $1 : d$ arises naturally in the geometry, $\cos\theta_d$ is more likely to contain $1/d$ rather than $1/2^{d-1}$. Thus, the more likely of the two $\cos\theta_d$ conjectures is that

$$\cos\theta_d = -\frac{1}{d}. \tag{6.1}$$

For methane, where $d = 3$, the predicted bond angle is $\arccos(-1/3)$ or approximately $109.47^\circ$. This prediction using reasoning by analogy agrees with experiment and with an honest calculation using analytic geometry (Problem 6.3).
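The conjecture $\cos\theta_d = -1/d$ can be checked numerically. The following is only a sketch under a stated assumption: the $d + 1$ hydrogens are placed at the unit basis vectors of $(d+1)$-dimensional space (which form a regular simplex), with the carbon at their centroid. The function name `bond_angle_cosine` is illustrative, not from the text.

```python
import math

def bond_angle_cosine(d):
    """Cosine of the angle between two C-H vectors in d dimensions.

    Assumption: the d + 1 hydrogens sit at the unit basis vectors of
    (d + 1)-dimensional space, which form a regular d-simplex, and the
    carbon sits at their centroid.
    """
    n = d + 1
    centroid = [1.0 / n] * n
    h1 = [1.0 if i == 0 else 0.0 for i in range(n)]
    h2 = [1.0 if i == 1 else 0.0 for i in range(n)]
    v1 = [a - c for a, c in zip(h1, centroid)]   # first C-H vector
    v2 = [a - c for a, c in zip(h2, centroid)]   # second C-H vector
    dot = sum(a * b for a, b in zip(v1, v2))
    norm_sq = sum(a * a for a in v1)             # both vectors have this squared length
    return dot / norm_sq

for d in range(1, 6):
    c = bond_angle_cosine(d)
    print(d, round(c, 6), round(math.degrees(math.acos(c)), 2))
```

For $d = 3$ the computed cosine should agree with $-1/3$ to rounding error, matching the methane bond angle quoted above.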

Problem 6.2 Carbon’s position in higher dimensions

Justify the conjecture that the carbon splits the altitude into two pieces having a length ratio of $1 : d$.

Problem 6.3 Analytic-geometry solution

In order to check the solution using analogy, use analytic geometry as follows to find the bond angle. First, assign coordinates $(x_n, y_n, z_n)$ to the $n$ hydrogens, where $n = 1 \ldots 4$, and solve for those coordinates. (Use symmetry to make the coordinates as simple as you can.) Then choose two C–H vectors and compute the angle that they subtend.


Problem 6.4 Extreme case of high dimensionality

Draw a picture to explain the small-angle approximation arccos x ≈ π/2 − x.

What is the approximate bond angle in high dimensions (large d)? Can you ﬁnd

an intuitive explanation for the approximate bond angle?

6.2 Topology: How many regions?

The bond angle in methane (Section 6.1) can be calculated directly with

analytic geometry (Problem 6.3), so reasoning by analogy does not show

its full power. Therefore, try the following problem.

Into how many regions do ﬁve planes divide space?

This formulation permits degenerate arrangements such as ﬁve parallel

planes, four planes meeting at a point, or three planes meeting at a line. To

eliminate these and other degeneracies, let’s place and orient the planes

randomly, thereby maximizing the number of regions. The problem is

then to ﬁnd the maximum number of regions produced by ﬁve planes.

Five planes are hard to imagine, but the method of easy

cases—using fewer planes—might produce a pattern

that generalizes to ﬁve planes. The easiest case is zero

planes: Space remains whole so R(0) = 1 (where R(n)

denotes the number of regions produced by n planes).

The ﬁrst plane divides space into two halves, giving

R(1) = 2. To add the second plane, imagine slicing an

orange twice to produce four wedges: R(2) = 4.

What pattern(s) appear in the data?

A reasonable conjecture is that $R(n) = 2^n$. To test it, try the case $n = 3$ by slicing the orange a third time and cutting each of the four pieces into two smaller pieces; thus, $R(3)$ is indeed 8. Perhaps the pattern continues with $R(4) = 16$ and $R(5) = 32$. In the following table for $R(n)$, these two extrapolations are shown in parentheses to distinguish them from the verified entries.

$n$   0   1   2   3   4      5
$R$   1   2   4   8   (16)   (32)


How can the $R(n) = 2^n$ conjecture be tested further?

A direct test by counting regions is difficult because the regions are hard to visualize in three dimensions. An analogous two-dimensional problem would be easier to solve, and its solution may help test the three-dimensional conjecture. A two-dimensional space is partitioned by lines, so the analogous question is the following:

What is the maximum number of regions into which n lines divide the plane?

The method of easy cases might suggest a pattern. If the pattern is $2^n$, then the $R(n) = 2^n$ conjecture is likely to apply in three dimensions.

What happens in a few easy cases?

Zero lines leave the plane whole, giving R(0) = 1. The next three cases

are as follows (although see Problem 6.5):

[Figure: one, two, and three lines dividing the plane, giving $R(1) = 2$, $R(2) = 4$, and $R(3) = 7$.]

Problem 6.5 Three lines again

The R(3) = 7 illustration showed three lines producing seven regions.

Here is another example with three lines, also in a random arrange-

ment, but it seems to produce only six regions. Where, if anywhere,

is the seventh region? Or is R(3) = 6?

Problem 6.6 Convexity

Must all the regions created by the lines be convex? (A region is convex if and

only if a line segment connecting any two points inside the region lies entirely

inside the region.) What about the three-dimensional regions created by placing

planes in space?

Until $R(3)$ turned out to be 7, the conjecture $R(n) = 2^n$ looked sound. However, before discarding such a simple conjecture, draw a fourth line and carefully count the regions. Four lines make only 11 regions rather than the predicted 16, so the $2^n$ conjecture is dead.

A new conjecture might arise from seeing the two-dimensional data $R_2(n)$ alongside the three-dimensional data $R_3(n)$.

$n$     0   1   2   3   4
$R_2$   1   2   4   7   11
$R_3$   1   2   4   8

In this table, several entries combine to make nearby entries. For example, $R_2(1)$ and $R_3(1)$ (the two entries in the $n = 1$ column) sum to $R_2(2)$ or $R_3(2)$. These two entries in turn sum to the $R_3(3)$ entry. But the table has many small numbers with many ways to combine them; discarding the coincidences requires gathering further data, and the simplest data source is the analogous one-dimensional problem.

What is the maximum number of segments into which n points divide a line?

A tempting answer is that n points make n segments. However, an easy

case—that one point produces two segments—reduces the temptation.

Rather, $n$ points make $n + 1$ segments. That result generates the $R_1$ row in the following table.

$n$     0   1   2   3   4    5   $n$
$R_1$   1   2   3   4   5    6   $n + 1$
$R_2$   1   2   4   7   11
$R_3$   1   2   4   8

What patterns are in these data?

The $2^n$ conjecture survives partially. In the $R_1$ row, it fails starting at $n = 2$. In the $R_2$ row, it fails starting at $n = 3$. Thus in the $R_3$ row, it probably fails starting at $n = 4$, making the conjectures $R_3(4) = 16$ and $R_3(5) = 32$ improbable. My personal estimate is that, before seeing these failures, the probability of the $R_3(4) = 16$ conjecture was 0.5; but now it falls to at most 0.01. (For more on estimating and updating the probabilities of conjectures, see the important works on plausible reasoning by Corfield [11], Jaynes [21], and Polya [36].)

In better news, the apparent coincidences contain a robust pattern:

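$n$     0   1   2   3   4    5   $n$
$R_1$   1   2   3   4   5    6   $n + 1$
$R_2$   1   2   4   7   11
$R_3$   1   2   4   8

Each entry is the sum of two entries in the preceding column: the one in the same row and the one in the row above, so $R_d(n) = R_{d-1}(n-1) + R_d(n-1)$.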

If the pattern continues, into how many regions can ﬁve planes divide space?

According to the pattern,

$$R_3(4) = \underbrace{R_2(3)}_{7} + \underbrace{R_3(3)}_{8} = 15 \tag{6.2}$$

and then

$$R_3(5) = \underbrace{R_2(4)}_{11} + \underbrace{R_3(4)}_{15} = 26. \tag{6.3}$$

Thus, ﬁve planes can divide space into a maximum of 26 regions.
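The pattern used in (6.2) and (6.3) can be applied mechanically to fill in the whole table. The sketch below assumes the conjectured recurrence $R_d(n) = R_{d-1}(n-1) + R_d(n-1)$ together with $R_d(0) = 1$ and $R_0(n) = 1$ (the zero-dimensional row of Problem 6.8); the function name `regions` is illustrative only.

```python
def regions(d, n):
    """Maximum number of regions made by n cuts in d dimensions,
    assuming the conjectured pattern R_d(n) = R_{d-1}(n-1) + R_d(n-1)."""
    if n == 0:
        return 1  # no cuts: space remains whole
    if d == 0:
        return 1  # assumed zero-dimensional row: a point stays one region
    return regions(d - 1, n - 1) + regions(d, n - 1)

for d in range(1, 4):
    print(f"R_{d}:", [regions(d, n) for n in range(6)])
```

The printed rows reproduce the verified table entries and extend them, which makes it easy to gather data for the conjectures in Problems 6.9 through 6.12.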

This number is hard to deduce by drawing five planes and counting the regions. Furthermore, that brute-force approach would give the value of only $R_3(5)$, whereas easy cases and analogy give a method to compute any entry in the table. They thereby provide enough data to conjecture expressions for $R_2(n)$ (Problem 6.9), for $R_3(n)$ (Problem 6.10), and for the general entry $R_d(n)$ (Problem 6.12).

Problem 6.7 Checking the pattern in two dimensions

The conjectured pattern predicts $R_2(5) = 16$: that five lines can divide the plane into 16 regions. Check the conjecture by drawing five lines and counting the regions.

Problem 6.8 Free data from zero dimensions

Because the one-dimensional problem gave useful data, try the zero-dimensional problem. Extend the pattern for the $R_3$, $R_2$, and $R_1$ rows upward to construct an $R_0$ row. It gives the number of zero-dimensional regions (points) produced by partitioning a point with $n$ objects (of dimension $-1$). What is $R_0$ if the row is to follow the observed pattern? Is that result consistent with the geometric meaning of trying to subdivide a point?

Problem 6.9 General result in two dimensions

The $R_0$ data fits $R_0(n) = 1$ (Problem 6.8), which is a zeroth-degree polynomial. The $R_1$ data fits $R_1(n) = n + 1$, which is a first-degree polynomial. Therefore, the $R_2$ data probably fits a quadratic.

Test this conjecture by fitting the data for $n = 0 \ldots 2$ to the general quadratic $An^2 + Bn + C$, repeatedly taking out the big part (Chapter 5) as follows.

a. Guess a reasonable value for the quadratic coefficient $A$. Then take out (subtract) the big part $An^2$ and tabulate the leftover, $R_2(n) - An^2$, for $n = 0 \ldots 2$. If the leftover is not linear in $n$, then a quadratic term remains or too much was removed. In either case, adjust $A$.

b. Once the quadratic coefficient $A$ is correct, use an analogous procedure to find the linear coefficient $B$.

c. Similarly solve for the constant coefficient $C$.

d. Check your quadratic fit against new data ($R_2(n)$ for $n \ge 3$).

Problem 6.10 General result in three dimensions

A reasonable conjecture is that the $R_3$ row matches a cubic (Problem 6.9). Use taking out the big part to fit a cubic to the $n = 0 \ldots 3$ data. Does it produce the conjectured values $R_3(4) = 15$ and $R_3(5) = 26$?

Problem 6.11 Geometric explanation

Find a geometric explanation for the observed pattern. Hint: Explain first why the pattern generates the $R_2$ row from the $R_1$ row; then generalize the reason to explain the $R_3$ row.

Problem 6.12 General solution in arbitrary dimension

The pattern connecting neighboring entries of the $R_d(n)$ table is the pattern that generates Pascal's triangle [17]. Because Pascal's triangle produces binomial coefficients, the general expression $R_d(n)$ should contain binomial coefficients. Therefore, use binomial coefficients to express $R_0(n)$ (Problem 6.8), $R_1(n)$, and $R_2(n)$ (Problem 6.9). Then conjecture a binomial-coefficient form for $R_3(n)$ and $R_d(n)$, checking the result against Problem 6.10.

Problem 6.13 Power-of-2 conjecture

Our first conjecture for the number of regions was $R_d(n) = 2^n$. In three dimensions, it worked until $n = 4$. In $d$ dimensions, show that $R_d(n) = 2^n$ for $n \le d$ (perhaps using the results of Problem 6.12).

6.3 Operators: Euler–MacLaurin summation

The next analogy studies unusual functions. Most functions turn numbers into other numbers, but special kinds of functions, called operators, turn functions into other functions. A familiar example is the derivative operator $D$. It turns the sine function into the cosine function, or the hyperbolic sine function into the hyperbolic cosine function. In operator notation, $D(\sin) = \cos$ and $D(\sinh) = \cosh$; omitting the parentheses gives the less cluttered expression $D\sin = \cos$ and $D\sinh = \cosh$. To understand and learn how to use operators, a fruitful tool is reasoning by analogy: Operators behave much like ordinary functions or even like numbers.


6.3.1 Left shift

Like a number, the derivative operator $D$ can be squared to make $D^2$ (the second-derivative operator) or to make any integer power of $D$. Similarly, the derivative operator can be fed to a polynomial. In that usage, an ordinary polynomial such as $P(x) = x^2 + x/10 + 1$ produces the operator polynomial $P(D) = D^2 + D/10 + 1$ (the differential operator for a lightly damped spring–mass system).

How far does the analogy to numbers extend? For example, do $\cosh D$ or $\sin D$ have a meaning? Because these functions can be written using the exponential function, let's investigate the operator exponential $e^D$.

What does $e^D$ mean?

The direct interpretation of $e^D$ is that it turns a function $f$ into $e^{Df}$.

[Figure: block diagram in which $f$ passes through $D$ to give $Df$ and then through $\exp$ to give $e^{Df}$.]

However, this interpretation is needlessly nonlinear. It turns $2f$ into $e^{2Df}$, which is the square of $e^{Df}$, whereas a linear operator that produces $e^{Df}$ from $f$ would produce $2e^{Df}$ from $2f$. To get a linear interpretation, use a Taylor series (as if $D$ were a number) to build $e^D$ out of linear operators.

$$e^D = 1 + D + \frac{1}{2}D^2 + \frac{1}{6}D^3 + \cdots. \tag{6.4}$$

What does this $e^D$ do to simple functions?

The simplest nonzero function is the constant function $f = 1$. Here is that function being fed to $e^D$:

$$\underbrace{(1 + D + \cdots)}_{e^D}\,\underbrace{1}_{f} = 1. \tag{6.5}$$

The next simplest function $x$ turns into $x + 1$.

$$\left(1 + D + \frac{D^2}{2} + \cdots\right) x = x + 1. \tag{6.6}$$

More interestingly, $x^2$ turns into $(x+1)^2$.

$$\left(1 + D + \frac{D^2}{2} + \frac{D^3}{6} + \cdots\right) x^2 = x^2 + 2x + 1 = (x+1)^2. \tag{6.7}$$


Problem 6.14 Continue the pattern

What is $e^D x^3$ and, in general, $e^D x^n$?

What does $e^D$ do in general?

The preceding examples follow the pattern $e^D x^n = (x+1)^n$. Because most functions of $x$ can be expanded in powers of $x$, and $e^D$ turns each $x^n$ term into $(x+1)^n$, the conclusion is that $e^D$ turns $f(x)$ into $f(x+1)$. Amazingly, $e^D$ is simply $L$, the left-shift operator.
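The pattern $e^D x^n = (x+1)^n$ can be tested by machine for several powers at once. The sketch below is an illustration, not the author's method: it stores a polynomial as a list of coefficients, applies the series $1 + D + D^2/2! + \cdots$ term by term with exact fractions, and compares the result with the binomial expansion of $(x+1)^n$. The helper names `derivative`, `exp_D`, and `shifted_power` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] (c_k multiplies x**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def exp_D(coeffs):
    """Apply 1 + D + D^2/2! + D^3/3! + ... to a polynomial.

    The series terminates because high enough derivatives of a polynomial vanish.
    """
    result = [Fraction(0)] * len(coeffs)
    term = [Fraction(c) for c in coeffs]   # D^j applied to the polynomial
    j = 0
    while term:
        for k, c in enumerate(term):
            result[k] += c / factorial(j)
        term = derivative(term)
        j += 1
    return result

def shifted_power(n):
    """Coefficients of (x + 1)**n from the binomial theorem."""
    return [Fraction(comb(n, k)) for k in range(n + 1)]

for n in range(6):
    x_to_the_n = [0] * n + [1]
    assert exp_D(x_to_the_n) == shifted_power(n)
```

Because the operator is linear, agreement on each power $x^n$ implies agreement on every polynomial, which is the step the text uses to conclude that $e^D$ shifts $f(x)$ to $f(x+1)$.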

Problem 6.15 Right or left shift

Draw a graph to show that $f(x) \to f(x + 1)$ is a left rather than a right shift. Apply $e^{-D}$ to a few simple functions to characterize its behavior.

Problem 6.16 Operating on a harder function

Apply the Taylor expansion for $e^D$ to $\sin x$ to show that $e^D \sin x = \sin(x + 1)$.

Problem 6.17 General shift operator

If $x$ has dimensions, then the derivative operator $D = d/dx$ is not dimensionless, and $e^D$ is an illegal expression. To make the general expression $e^{aD}$ legal, what must the dimensions of $a$ be? What does $e^{aD}$ do?

6.3.2 Summation

Just as the derivative operator can represent the left-shift operator (as $L = e^D$), the left-shift operator can represent the operation of summation. This operator representation will lead to a powerful method for approximating sums with no closed form.

Summation is analogous to the more familiar operation of integration. Integration occurs in definite and indefinite flavors: Definite integration is equivalent to indefinite integration followed by evaluation at the limits of integration. As an example, here is the definite integration of $f(x) = 2x$.

[Figure: the function $2x$ passes through indefinite integration to give $x^2 + C$; evaluating at the integration limits $a$ and $b$ gives $b^2 - a^2$.]

In general, the connection between an input function $g$ and the result of indefinite integration is $DG = g$, where $D$ is the derivative operator and $G = \int g$ is the result of indefinite integration. Thus $D$ and $\int$ are inverses of one another ($D\int = 1$ or $D = 1/\int$), a connection represented by the loop in the diagram. ($\int D \ne 1$ because of a possible integration constant.)

[Figure: $g$ passes through $\int$ to give $G$, with $D$ forming the return loop from $G$ to $g$; evaluating at the limits $a$ and $b$ gives $G(b) - G(a)$.]

What is the analogous picture for summation?

[Figure: rectangles of unit width and heights $f(2)$, $f(3)$, and $f(4)$ plotted against $k$.]

Analogously to integration, define definite summation as indefinite summation and then evaluation at the limits. But apply the analogy with care to avoid an off-by-one or fencepost error (Problem 2.24). The sum $\sum_2^4 f(k)$ includes three rectangles ($f(2)$, $f(3)$, and $f(4)$), whereas the definite integral $\int_2^4 f(k)\,dk$ does not include any of the $f(4)$ rectangle. Rather than rectifying the discrepancy by redefining the familiar operation of integration, interpret indefinite summation to exclude the last rectangle. Then indefinite summation followed by evaluating at the limits $a$ and $b$ produces a sum whose index ranges from $a$ to $b - 1$.

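As an example, take $f(k) = k$. Then the indefinite sum $\sum f$ is the function $F$ defined by $F(k) = k(k-1)/2 + C$ (where $C$ is the constant of summation). Evaluating $F$ between 0 and $n$ gives $n(n-1)/2$, which is $\sum_0^{n-1} k$. In the following diagram, these steps are the forward path.

[Figure: $f$ passes through $\Sigma$ to give $F$, with $\Delta$ forming the return path from $F$ to $f$; evaluating at the limits $a$ and $b$ gives $F(b) - F(a) = \sum_{k=a}^{b-1} f(k)$.]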

In the reverse path, the new $\Delta$ operator inverts $\Sigma$ just as differentiation inverts integration. Therefore, an operator representation for $\Delta$ provides one for $\Sigma$. Because $\Delta$ and the derivative operator $D$ are analogous, their representations are probably analogous. A derivative is the limit

$$\frac{df}{dx} = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}. \tag{6.8}$$

The derivative operator $D$ is therefore the operator limit

$$D = \lim_{h\to 0} \frac{L_h - 1}{h}, \tag{6.9}$$

where the $L_h$ operator turns $f(x)$ into $f(x+h)$; that is, $L_h$ left shifts by $h$.

Problem 6.18 Operator limit

Explain why $L_h \approx 1 + hD$ for small $h$. Show therefore that $L = e^D$.

What is an analogous representation of $\Delta$?

The operator limit for $D$ uses an infinitesimal left shift; correspondingly, the inverse operation of integration sums rectangles of infinitesimal width. Because summation $\Sigma$ sums rectangles of unit width, its inverse $\Delta$ should use a unit left shift, namely $L_h$ with $h = 1$. As a reasonable conjecture,

$$\Delta = \lim_{h\to 1} \frac{L_h - 1}{h} = L - 1. \tag{6.10}$$

This $\Delta$, called the finite-difference operator, is constructed to be $1/\Sigma$. If the construction is correct, then $(L - 1)\Sigma$ is the identity operator 1. In other words, $(L - 1)\Sigma$ should turn functions into themselves.

How well does this conjecture work in various easy cases?

To test the conjecture, apply the operator $(L-1)\Sigma$ first to the easy function $g = 1$. Then $\Sigma g$ is a function waiting to be fed an argument, and $(\Sigma g)(k)$ is the result of feeding it $k$. With that notation, $(\Sigma g)(k) = k + C$. Feeding this function to the $L - 1$ operator reproduces $g$.

$$\bigl((L-1)\Sigma g\bigr)(k) = \underbrace{(k + 1 + C)}_{(L\Sigma g)(k)} - \underbrace{(k + C)}_{(1\Sigma g)(k)} = \underbrace{1}_{g(k)}. \tag{6.11}$$

With the next-easiest function, defined by $g(k) = k$, the indefinite sum $(\Sigma g)(k)$ is $k(k-1)/2 + C$. Passing $\Sigma g$ through $L - 1$ again reproduces $g$.

$$\bigl((L-1)\Sigma g\bigr)(k) = \underbrace{\left(\frac{(k+1)k}{2} + C\right)}_{(L\Sigma g)(k)} - \underbrace{\left(\frac{k(k-1)}{2} + C\right)}_{(1\Sigma g)(k)} = \underbrace{k}_{g(k)}. \tag{6.12}$$

In summary, for the test functions $g(k) = 1$ and $g(k) = k$, the operator product $(L-1)\Sigma$ takes $g$ back to itself, so it acts like the identity operator.


This behavior is general: $(L-1)\Sigma$ is indeed the identity operator 1, and $\Sigma = 1/(L-1)$. Because $L = e^D$, we have $\Sigma = 1/(e^D - 1)$. Expanding the right side in a Taylor series gives an amazing representation of the summation operator.

$$\Sigma = \frac{1}{e^D - 1} = \frac{1}{D} - \frac{1}{2} + \frac{D}{12} - \frac{D^3}{720} + \frac{D^5}{30240} - \cdots. \tag{6.13}$$

Because $D\int = 1$, the leading term $1/D$ is integration. Thus, summation is approximately integration, a plausible conclusion indicating that the operator representation is not nonsense.

Applying this operator series to a function $f$ and then evaluating at the limits $a$ and $b$ produces the Euler–MacLaurin summation formula

$$\sum_a^{b-1} f(k) = \int_a^b f(k)\,dk - \frac{f(b) - f(a)}{2} + \frac{f^{(1)}(b) - f^{(1)}(a)}{12} - \frac{f^{(3)}(b) - f^{(3)}(a)}{720} + \frac{f^{(5)}(b) - f^{(5)}(a)}{30240} - \cdots, \tag{6.14}$$

where $f^{(n)}$ indicates the $n$th derivative of $f$.

The sum lacks the usual final term $f(b)$. Including this term gives the useful alternative

$$\sum_a^{b} f(k) = \int_a^b f(k)\,dk + \frac{f(b) + f(a)}{2} + \frac{f^{(1)}(b) - f^{(1)}(a)}{12} - \frac{f^{(3)}(b) - f^{(3)}(a)}{720} + \frac{f^{(5)}(b) - f^{(5)}(a)}{30240} - \cdots. \tag{6.15}$$

As a check, try an easy case: $\sum_0^n k$. Using Euler–MacLaurin summation, $f(k) = k$, $a = 0$, and $b = n$. The integral term then contributes $n^2/2$; the constant term $[f(b) + f(a)]/2$ contributes $n/2$; and later terms vanish. The result is familiar and correct:

$$\sum_0^n k = \frac{n^2}{2} + \frac{n}{2} + 0 = \frac{n(n+1)}{2}. \tag{6.16}$$

A more stringent test of Euler–MacLaurin summation is to approximate $\ln n!$, which is the sum $\sum_1^n \ln k$ (Section 4.5). Therefore, sum $f(k) = \ln k$ between the (inclusive) limits $a = 1$ and $b = n$. The result is

$$\sum_1^n \ln k = \int_1^n \ln k\,dk + \frac{\ln n}{2} + \cdots. \tag{6.17}$$

[Figure: the $\ln k$ curve for $k = 1 \ldots n$, with unit-width rectangles protruding above it.]

The integral, from the $1/D$ operator, contributes the area under the $\ln k$ curve. The correction, from the $1/2$ operator, incorporates the triangular protrusions (Problem 6.20). The ellipsis includes the higher-order corrections (Problem 6.21), which are hard to evaluate using pictures (Problem 4.32) but simple using Euler–MacLaurin summation (Problem 6.21).
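To see how quickly the corrections in formula (6.15) improve the estimate, the following sketch evaluates the first few terms for $f(k) = \ln k$, $a = 1$, $b = n$ and compares them with $\ln n!$ computed from the standard library. The function name `ln_factorial_em`, its `terms` parameter, and the choice $n = 10$ are illustrative assumptions.

```python
import math

def ln_factorial_em(n, terms=2):
    """Approximate ln n! = sum_{k=1}^{n} ln k with formula (6.15).

    `terms` is the number of derivative corrections kept after the
    integral and the (f(b) + f(a))/2 term (0, 1, or 2 here).
    """
    integral = n * math.log(n) - n + 1          # integral of ln k from 1 to n
    total = integral + (math.log(n) + math.log(1)) / 2
    if terms >= 1:
        total += (1 / n - 1) / 12               # uses f'(k) = 1/k
    if terms >= 2:
        total -= (2 / n**3 - 2) / 720           # uses f'''(k) = 2/k^3
    return total

n = 10
exact = math.lgamma(n + 1)                      # ln n!
for terms in range(3):
    approx = ln_factorial_em(n, terms)
    print(terms, round(approx, 5), round(approx - exact, 5))
```

Each added correction shrinks the error, although the series is only asymptotic, so adding ever more terms does not improve the accuracy indefinitely.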

Problem 6.19 Integer sums

Use Euler–MacLaurin summation to find closed forms for the following sums:

(a) $\sum_0^n k^2$   (b) $\sum_0^n (2k+1)$   (c) $\sum_0^n k^3$.

Problem 6.20 Boundary cases

In Euler–MacLaurin summation, the constant term is $[f(b) + f(a)]/2$: one-half of the first term plus one-half of the last term. The picture for summing $\ln k$ (Section 4.5) showed that the protrusions are approximately one-half of the last term, namely $\ln n$. What, pictorially, happened to one-half of the first term?

Problem 6.21 Higher-order terms

Approximate $\ln 5!$ using Euler–MacLaurin summation.

Problem 6.22 Basel sum

The Basel sum $\sum_1^\infty n^{-2}$ may be approximated with pictures (Problem 4.37). However, the approximation is too crude to help guess the closed form. As Euler did, use Euler–MacLaurin summation to improve the accuracy until you can confidently guess the closed form. Hint: Sum the first few terms explicitly.

6.4 Tangent roots: A daunting transcendental sum

Our farewell example, chosen because its analysis combines diverse street-fighting tools, is a difficult infinite sum.

Find $S \equiv \sum x_n^{-2}$ where the $x_n$ are the positive solutions of $\tan x = x$.

The solutions to $\tan x = x$ or, equivalently, the roots of $\tan x - x$, are transcendental and have no closed form, yet a closed form is required for almost every summation method. Street-fighting methods will come to our rescue.


6.4.1 Pictures and easy cases

Begin the analysis with a hopefully easy case.

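What is the first root $x_1$?

[Figure: graphs of $y = x$ and $y = \tan x$, with vertical asymptotes at $x = \pi/2$, $3\pi/2$, $5\pi/2$, and $7\pi/2$ and the intersections labeled 1, 2, and 3.]

The roots of $\tan x - x$ are given by the intersections of $y = x$ and $y = \tan x$. Surprisingly, no intersection occurs in the branch of $\tan x$ where $0 < x < \pi/2$ (Problem 6.23); the first intersection is just before the asymptote at $x = 3\pi/2$. Thus, $x_1 \approx 3\pi/2$.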

Problem 6.23 No intersection with the main branch

Show symbolically that $\tan x = x$ has no solution for $0 < x < \pi/2$. (The result looks plausible pictorially but is worth checking in order to draw the picture.)

Where, approximately, are the subsequent intersections?

As $x$ grows, the $y = x$ line intersects the $y = \tan x$ graph ever higher and therefore ever closer to the vertical asymptotes. Therefore, make the following asymptote approximation for the big part of $x_n$:

$$x_n \approx \left(n + \frac{1}{2}\right)\pi. \tag{6.18}$$

6.4.2 Taking out the big part

This approximate, low-entropy expression for $x_n$ gives the big part of $S$ (the zeroth approximation).

$$S \approx \sum \Bigl[\underbrace{\left(n + \tfrac{1}{2}\right)\pi}_{\approx x_n}\Bigr]^{-2} = \frac{4}{\pi^2} \sum_1^\infty \frac{1}{(2n+1)^2}. \tag{6.19}$$

The sum $\sum_1^\infty (2n+1)^{-2}$ is, from a picture (Section 4.5) or from Euler–MacLaurin summation (Section 6.3.2), roughly the following integral.

$$\sum_1^\infty (2n+1)^{-2} \approx \int_1^\infty (2n+1)^{-2}\,dn = -\frac{1}{2} \times \frac{1}{2n+1}\Bigg|_1^\infty = \frac{1}{6}. \tag{6.20}$$

Therefore,

$$S \approx \frac{4}{\pi^2} \times \frac{1}{6} = 0.067547\ldots \tag{6.21}$$

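[Figure: unit-width rectangles of height $(2k+1)^{-2}$ for $k = 1, 2, 3, 4, \ldots$, with the protrusions above the curve shaded.]

The shaded protrusions are roughly triangles, and they sum to one-half of the first rectangle. That rectangle has area 1/9, so

$$\sum_1^\infty (2n+1)^{-2} \approx \frac{1}{6} + \frac{1}{2} \times \frac{1}{9} = \frac{2}{9}. \tag{6.22}$$

Therefore, a more accurate estimate of $S$ is

$$S \approx \frac{4}{\pi^2} \times \frac{2}{9} = 0.090063\ldots, \tag{6.23}$$

which is slightly higher than the first estimate.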

Is the new approximation an overestimate or an underestimate?

The new approximation is based on two underestimates. First, the asymptote approximation $x_n \approx (n + 0.5)\pi$ overestimates each $x_n$ and therefore underestimates the squared reciprocals in the sum $\sum x_n^{-2}$. Second, after making the asymptote approximation, the pictorial approximation to the sum $\sum_1^\infty (2n+1)^{-2}$ replaces each protrusion with an inscribed triangle and thereby underestimates each protrusion (Problem 6.24).

Problem 6.24 Picture for the second underestimate

Draw a picture of the underestimate in the pictorial approximation

$$\sum_1^\infty \frac{1}{(2n+1)^2} \approx \frac{1}{6} + \frac{1}{2} \times \frac{1}{9}. \tag{6.24}$$

How can these two underestimates be remedied?

The second underestimate (the protrusions) is eliminated by summing $\sum_1^\infty (2n+1)^{-2}$ exactly. The sum is unfamiliar partly because its first term is the fraction 1/9, whose arbitrariness increases the entropy of the sum. Including the $n = 0$ term, which is 1, and the even squared reciprocals $1/(2n)^2$ produces a compact and familiar lower-entropy sum.

$$\sum_1^\infty \frac{1}{(2n+1)^2} + 1 + \sum_1^\infty \frac{1}{(2n)^2} = \sum_1^\infty \frac{1}{n^2}. \tag{6.25}$$


The final, low-entropy sum is the famous Basel sum (high-entropy results are not often famous). Its value is $B = \pi^2/6$ (Problem 6.22).

How does knowing $B = \pi^2/6$ help evaluate the original sum $\sum_1^\infty (2n+1)^{-2}$?

The major modification from the original sum was to include the even squared reciprocals. Their sum is $B/4$.

$$\sum_1^\infty \frac{1}{(2n)^2} = \frac{1}{4}\sum_1^\infty \frac{1}{n^2}. \tag{6.26}$$

The second modification was to include the $n = 0$ term. Thus, to obtain $\sum_1^\infty (2n+1)^{-2}$, adjust the Basel value $B$ by subtracting $B/4$ and then the $n = 0$ term. The result, after substituting $B = \pi^2/6$, is

$$\sum_1^\infty \frac{1}{(2n+1)^2} = B - \frac{1}{4}B - 1 = \frac{\pi^2}{8} - 1. \tag{6.27}$$

This exact sum, based on the asymptote approximation for $x_n$, produces the following estimate of $S$.

$$S \approx \frac{4}{\pi^2}\sum_1^\infty \frac{1}{(2n+1)^2} = \frac{4}{\pi^2}\left(\frac{\pi^2}{8} - 1\right). \tag{6.28}$$

Simplifying by expanding the product gives

$$S \approx \frac{1}{2} - \frac{4}{\pi^2} = 0.094715\ldots \tag{6.29}$$

Problem 6.25 Check the earlier reasoning

Check the earlier pictorial reasoning (Problem 6.24) that $1/6 + 1/18 = 2/9$ underestimates $\sum_1^\infty (2n+1)^{-2}$. How accurate was that estimate?

This estimate of $S$ is the third that uses the asymptote approximation $x_n \approx (n + 0.5)\pi$. Assembled together, the estimates are

$$S \approx \begin{cases} 0.067547 & \text{(integral approximation to } \sum_1^\infty (2n+1)^{-2}\text{)}, \\ 0.090063 & \text{(integral approximation and triangular overshoots)}, \\ 0.094715 & \text{(exact sum of } \sum_1^\infty (2n+1)^{-2}\text{)}. \end{cases}$$

Because the third estimate incorporated the exact value of $\sum_1^\infty (2n+1)^{-2}$, any remaining error in the estimate of $S$ must belong to the asymptote approximation itself.

For which term of $\sum x_n^{-2}$ is the asymptote approximation most inaccurate?

As $x$ grows, the graphs of $x$ and $\tan x$ intersect ever closer to the vertical asymptote. Thus, the asymptote approximation makes its largest absolute error when $n = 1$. Because $x_1$ is the smallest root, the fractional error in $x_n$ is, relative to the absolute error in $x_n$, even more concentrated at $n = 1$. The fractional error in $x_n^{-2}$, being $-2$ times the fractional error in $x_n$ (Section 5.3), is equally concentrated at $n = 1$. Because $x_n^{-2}$ is the largest at $n = 1$, the absolute error in $x_n^{-2}$ (the fractional error times $x_n^{-2}$ itself) is, by far, the largest at $n = 1$.

Problem 6.26 Absolute error in the early terms

Estimate, as a function of $n$, the absolute error in $x_n^{-2}$ that is produced by the asymptote approximation.

With the error so concentrated at $n = 1$, the greatest improvement in the estimate of $S$ comes from replacing the approximation $x_1 \approx 1.5\pi$ with a more accurate value. A simple numerical approach is successive approximation using the Newton–Raphson method (Problem 4.38). To find a root with this method, make a starting guess $x$ and repeatedly improve it using the replacement

$$x \longrightarrow x - \frac{\tan x - x}{\sec^2 x - 1}. \tag{6.30}$$

When the starting guess for $x$ is slightly below the first asymptote at $1.5\pi$, the procedure rapidly converges to $x_1 = 4.4934\ldots$

Therefore, to improve the estimate $S \approx 0.094715$, which was based on the asymptote approximation, subtract its approximate first term (its big part) and add the corrected first term.

$$S \approx S_{\text{old}} - \frac{1}{(1.5\pi)^2} + \frac{1}{4.4934^2} \approx 0.09921. \tag{6.31}$$

Using the Newton–Raphson method to refine, in addition, the $1/x_2^2$ term gives $S \approx 0.09978$ (Problem 6.27). Therefore, a highly educated guess is

$$S = \frac{1}{10}. \tag{6.32}$$

The infinite sum of unknown transcendental numbers seems to be neither transcendental nor irrational! This simple and surprising rational number deserves a simple explanation.
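The refinement procedure can be sketched in a few lines. The code below applies the replacement (6.30) to find each root and then corrects the first $N$ terms of the asymptote-based estimate (6.29). The starting guess, taken a distance $1/[(n + 0.5)\pi]$ below each asymptote, is an assumption made here for illustration, as are the function names `tangent_root` and `refined_sum`.

```python
import math

def tangent_root(n, iterations=20):
    """n-th positive root of tan x = x, found with the replacement (6.30)."""
    asymptote = (n + 0.5) * math.pi
    x = asymptote - 1 / asymptote      # assumed starting guess, just below the asymptote
    for _ in range(iterations):
        t = math.tan(x)
        x -= (t - x) / (t * t)         # sec^2 x - 1 = tan^2 x
    return x

def refined_sum(N):
    """Estimate S: the asymptote-based sum with its first N terms corrected."""
    s = 0.5 - 4 / math.pi**2           # estimate (6.29)
    for n in range(1, N + 1):
        s -= 1 / ((n + 0.5) * math.pi) ** 2   # remove the approximate term
        s += 1 / tangent_root(n) ** 2         # add the corrected term
    return s

print(round(tangent_root(1), 4))
for N in (1, 2, 4, 100):
    print(N, round(refined_sum(N), 5))
```

Increasing $N$ shows whether the refined estimates keep creeping toward the educated guess of 1/10.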


Problem 6.27 Continuing the corrections

Choose a small $N$, say 4. Then use the Newton–Raphson method to compute accurate values of $x_n$ for $n = 1 \ldots N$; and use those values to refine the estimate of $S$. As you extend the computation to larger values of $N$, do the refined estimates of $S$ approach our educated guess of 1/10?

6.4.3 Analogy with polynomials

If only the equation $\tan x - x = 0$ had just a few closed-form solutions! Then the sum $S$ would be easy to compute. That wish is fulfilled by replacing $\tan x - x$ with a polynomial equation with simple roots. The simplest interesting polynomial is the quadratic, so experiment with a simple quadratic, for example $x^2 - 3x + 2$.

This polynomial has two roots, $x_1 = 1$ and $x_2 = 2$; therefore $\sum x_n^{-2}$, the polynomial-root sum analog of the tangent-root sum, has two terms.

$$\sum x_n^{-2} = \frac{1}{1^2} + \frac{1}{2^2} = \frac{5}{4}. \tag{6.33}$$

This brute-force method for computing the root sum requires a solution to the quadratic equation. However, a method that can transfer to the equation $\tan x - x = 0$, which has no closed-form solution, cannot use the roots themselves. It must use only surface features of the quadratic, namely its two coefficients 2 and $-3$. Unfortunately, no plausible method of combining 2 and $-3$ predicts that $\sum x_n^{-2} = 5/4$.

Where did the polynomial analogy go wrong?

The problem is that the quadratic $x^2 - 3x + 2$ is not sufficiently similar to $\tan x - x$. The quadratic has only positive roots; however, $\tan x - x$, an odd function, has symmetric positive and negative roots and has a root at $x = 0$. Indeed, the Taylor series for $\tan x$ is $x + x^3/3 + 2x^5/15 + \cdots$ (Problem 6.28); therefore,

$$\tan x - x = \frac{x^3}{3} + \frac{2x^5}{15} + \cdots. \tag{6.34}$$

The common factor of $x^3$ means that $\tan x - x$ has a triple root at $x = 0$. An analogous polynomial (here, one with a triple root at $x = 0$, a positive root, and a symmetric negative root) is $(x+2)x^3(x-2)$ or, after expansion, $x^5 - 4x^3$. The sum $\sum x_n^{-2}$ (using the positive root) contains only one term and is simply 1/4. This value could plausibly arise as the (negative) ratio of the last two coefficients of the polynomial.

To decide whether that pattern is a coincidence, try a richer polynomial: one with roots at $-2$, $-1$, 0 (threefold), 1, and 2. One such polynomial is

$$(x+2)(x+1)x^3(x-1)(x-2) = x^7 - 5x^5 + 4x^3. \tag{6.35}$$

The polynomial-root sum uses only the two positive roots 1 and 2 and is $1/1^2 + 1/2^2$, which is 5/4: the (negative) ratio of the last two coefficients.

As a final test of this pattern, include $-3$ and 3 among the roots. The resulting polynomial is

$$(x^7 - 5x^5 + 4x^3)(x+3)(x-3) = x^9 - 14x^7 + 49x^5 - 36x^3. \tag{6.36}$$

The polynomial-root sum uses the three positive roots 1, 2, and 3 and is $1/1^2 + 1/2^2 + 1/3^2$, which is 49/36: again the (negative) ratio of the last two coefficients in the expanded polynomial.
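The coefficient pattern can be checked for all three test polynomials at once. The sketch below expands $x^3 \prod (x - r)(x + r)$ from a list of positive roots and compares the negative ratio of the $x^5$ and $x^3$ coefficients with the root sum $\sum r^{-2}$. The helper names `poly_mul` and `odd_polynomial` are illustrative, and polynomials are stored as coefficient lists indexed by power.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def odd_polynomial(positive_roots):
    """Expand x^3 * prod (x - r)(x + r) over the given positive roots."""
    poly = [0, 0, 0, 1]                        # x^3: the triple root at 0
    for r in positive_roots:
        poly = poly_mul(poly, [-r * r, 0, 1])  # (x - r)(x + r) = x^2 - r^2
    return poly

for roots in ([2], [1, 2], [1, 2, 3]):
    poly = odd_polynomial(roots)
    ratio = -Fraction(poly[5], poly[3])        # minus (x^5 coefficient) / (x^3 coefficient)
    root_sum = sum(Fraction(1, r * r) for r in roots)
    print(roots, poly, ratio, root_sum)
    assert ratio == root_sum
```

Using exact fractions avoids any doubt about rounding, so agreement here means that the pattern holds exactly for these examples.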

What is the origin of the pattern, and how can it be extended to $\tan x - x$?

To explain the pattern, tidy the polynomial as follows:

$$x^9 - 14x^7 + 49x^5 - 36x^3 = -36x^3\left(1 - \frac{49}{36}x^2 + \frac{14}{36}x^4 - \frac{1}{36}x^6\right). \tag{6.37}$$

In this arrangement, the sum 49/36 appears as the negative of the ﬁrst

interesting coeﬃcient. Let’s generalize. Placing k roots at x = 0 and single

roots at ±x

1

, ±x

2

, . . ., ±x

n

gives the polynomial

Ax

k

1 −

x

2

x

2

1

1 −

x

2

x

2

2

1 −

x

2

x

2

3

· · ·

1 −

x

2

x

2

n

, (6.38)

where A is a constant. When expanding the product of the factors in

parentheses, the coeﬃcient of the x

2

term in the expansion receives one

contribution from each x

2

/x

2

k

term in a factor. Thus, the expansion begins

Ax

k

¸

1 −

1

x

2

1

+

1

x

2

2

+

1

x

2

3

+· · · +

1

x

2

n

x

2

+· · ·

. (6.39)

The coefficient of $x^2$ in parentheses is $\sum x_n^{-2}$, which is the polynomial analog of the tangent-root sum.
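The general claim can be spot-checked for roots that are not integers. The sketch below is an illustration only: the function name `product_expansion` and the three sample roots are arbitrary choices, not values from the text. It expands the product of the factors $(1 - x^2/x_i^2)$ as a polynomial in $u = x^2$ and compares the coefficient of $u$ with the sum of reciprocal squares.

```python
from fractions import Fraction


def product_expansion(positive_roots):
    """Expand the product of (1 - u / r^2) over the roots, where u = x^2.

    Returns [c0, c1, ...], with c_j the coefficient of u^j, i.e. of x^(2j).
    """
    coeffs = [Fraction(1)]
    for r in positive_roots:
        inv_sq = 1 / (Fraction(r) ** 2)
        times_one = coeffs + [Fraction(0)]                       # 1 * p(u)
        times_u = [Fraction(0)] + [-inv_sq * c for c in coeffs]  # (-u / r^2) * p(u)
        coeffs = [a + b for a, b in zip(times_one, times_u)]
    return coeffs


# Arbitrary illustrative roots (hypothetical values chosen only for the check).
roots = [Fraction(1, 2), Fraction(3), Fraction(7, 3)]
expansion = product_expansion(roots)
reciprocal_square_sum = sum(1 / r**2 for r in roots)

# The negative of the x^2 coefficient should equal the sum of 1 / x_i^2.
print(-expansion[1], reciprocal_square_sum)
```

The prefactor $Ax^k$ is left out because it multiplies every term and so cannot change the ratio being checked.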

Let’s apply this method to $\tan x - x$. Although it is not a polynomial, its Taylor series is like an infinite-degree polynomial. The Taylor series is

$$\frac{x^3}{3} + \frac{2x^5}{15} + \frac{17x^7}{315} + \cdots = \frac{x^3}{3}\left(1 + \frac{2}{5}x^2 + \frac{17}{105}x^4 + \cdots\right). \tag{6.40}$$

The negative of the $x^2$ coefficient should be $\sum x_n^{-2}$. For the tangent-sum problem, $\sum x_n^{-2}$ should therefore be $-2/5$. Unfortunately, the sum of positive quantities cannot be negative!

What went wrong with the analogy?

One problem is that $\tan x - x$ might have imaginary or complex roots whose squares contribute negative amounts to $S$. Fortunately, all its roots are real (Problem 6.29). A harder-to-solve problem is that $\tan x - x$ goes to infinity at finite values of $x$, and does so infinitely often, whereas no polynomial does so even once.

[Figure: graph of $\sin x - x\cos x$, with 0 and the roots $x_1$, $x_2$, $x_3$ marked along the $x$ axis.]

The solution is to construct a function having no infinities but having the same roots as $\tan x - x$. The infinities of $\tan x - x$ occur where $\tan x$ blows up, which is where $\cos x = 0$. To remove the infinities without creating or destroying any roots, multiply $\tan x - x$ by $\cos x$. The polynomial-like function to expand is therefore $\sin x - x\cos x$.

Its Taylor expansion is

$$\underbrace{\left(x - \frac{x^3}{6} + \frac{x^5}{120} - \cdots\right)}_{\sin x} - \underbrace{\left(x - \frac{x^3}{2} + \frac{x^5}{24} - \cdots\right)}_{x\cos x}. \tag{6.41}$$

The difference of the two series is

$$\sin x - x\cos x = \frac{x^3}{3}\left(1 - \frac{1}{10}x^2 + \cdots\right). \tag{6.42}$$
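The subtraction of the two series is easy to get wrong by hand, so here is a small exact check. It is a sketch under stated assumptions: the function name `sin_minus_xcos_coeff` is invented for illustration, and the coefficients come from the standard series $\sin x = \sum (-1)^k x^{2k+1}/(2k+1)!$ and $x\cos x = \sum (-1)^k x^{2k+1}/(2k)!$.

```python
from fractions import Fraction
from math import factorial


def sin_minus_xcos_coeff(k):
    """Exact coefficient of x^(2k+1) in the Taylor series of sin x - x cos x."""
    sign = (-1) ** k
    sin_term = Fraction(sign, factorial(2 * k + 1))   # contribution from sin x
    xcos_term = Fraction(sign, factorial(2 * k))      # contribution from x cos x
    return sin_term - xcos_term


# The x^1 terms cancel, so the series starts at x^3.
leading = sin_minus_xcos_coeff(1)

# Coefficients of 1, x^2, x^4 inside the parentheses after factoring out
# the leading x^3 term.
normalized = [sin_minus_xcos_coeff(k) / leading for k in range(1, 4)]
print(leading, normalized)
```

The second entry of `normalized` is the $x^2$ coefficient whose negative the analogy identifies with the tangent-root sum.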

The $x^3/3$ factor indicates the triple root at $x = 0$. And there at last, as the negative of the $x^2$ coefficient, sits our tangent-root sum $S = 1/10$.
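A result obtained by analogy deserves a numerical sanity check. The following is a rough sketch, not the book's own calculation: it finds the positive roots of $\tan x = x$ by bisection on $\sin x - x\cos x$, using the fact that the $n$th root lies between $n\pi$ and $(n + 1/2)\pi$, and adds their reciprocal squares. The names `f`, `positive_root`, and `N`, and the lumped tail estimate $1/(\pi^2(N+1))$ (from approximating the far roots by $(n + 1/2)\pi$ and the remaining sum by an integral), are assumptions made for this illustration.

```python
import math


def f(x):
    """sin x - x cos x, which has the same nonzero roots as tan x - x."""
    return math.sin(x) - x * math.cos(x)


def positive_root(n, iterations=60):
    """Bisect for the root of tan x = x in (n*pi, (n + 1/2)*pi), for n >= 1."""
    lo, hi = n * math.pi, (n + 0.5) * math.pi
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # sign change in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # sign change in [mid, hi]
    return 0.5 * (lo + hi)


N = 2000  # number of roots summed explicitly (arbitrary choice)
partial_sum = sum(positive_root(n) ** -2 for n in range(1, N + 1))

# Rough estimate of the omitted terms: x_n is close to (n + 1/2)*pi for
# large n, so the tail behaves like the integral of 1/(pi*x)^2 from N + 1.
tail_estimate = 1 / (math.pi ** 2 * (N + 1))

print(partial_sum, partial_sum + tail_estimate)
```

The partial sum alone falls slightly short of 1/10 because every omitted term is positive; adding the tail estimate should close most of the gap.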

Problem 6.28 Taylor series for the tangent

Use the Taylor series for $\sin x$ and $\cos x$ to show that

$$\tan x = x + \frac{x^3}{3} + \frac{2x^5}{15} + \cdots. \tag{6.43}$$

Hint: Use taking out the big part.

Problem 6.29 Only real roots

Show that all roots of $\tan x - x$ are real.

Problem 6.30 Exact Basel sum

Use the polynomial analogy to evaluate the Basel sum

$$\sum_{1}^{\infty} \frac{1}{n^2}. \tag{6.44}$$

Compare your result with your solution to Problem 6.22.

Problem 6.31 Misleading alternative expansions

Squaring and taking the reciprocal of $\tan x = x$ gives $\cot^2 x = x^{-2}$; equivalently, $\cot^2 x - x^{-2} = 0$. Therefore, if $x$ is a root of $\tan x - x$, it is a root of $\cot^2 x - x^{-2}$. The Taylor expansion of $\cot^2 x - x^{-2}$ is

$$-\frac{2}{3}\left(1 - \frac{1}{10}x^2 - \frac{1}{63}x^4 - \cdots\right). \tag{6.45}$$

Because the coefficient of $x^2$ is $-1/10$, the tangent-root sum $S$ (for $\cot^2 x = x^{-2}$ and therefore $\tan x = x$) should be 1/10. As we found experimentally and analytically for $\tan x = x$, the conclusion is correct. However, what is wrong with the reasoning?

Problem 6.32 Fourth powers of the reciprocals

The Taylor series for $\sin x - x\cos x$ continues

$$\frac{x^3}{3}\left(1 - \frac{x^2}{10} + \frac{x^4}{280} - \cdots\right). \tag{6.46}$$

Therefore find $\sum x_n^{-4}$ for the positive roots of $\tan x = x$. Check numerically that your result is plausible.

Problem 6.33 Other source equations for the roots

Find $\sum x_n^{-2}$, where the $x_n$ are the positive roots of $\cos x$.

6.5 Bon voyage

I hope that you have enjoyed incorporating street-ﬁghting methods into

your problem-solving toolbox. May you ﬁnd diverse opportunities to use

dimensional analysis, easy cases, lumping, pictorial reasoning, taking out

the big part, and analogy. As you apply the tools, you will sharpen

them—and even build new tools.

Bibliography

[1] P. Agnoli and G. D’Agostini. Why does the meter beat the second? arXiv:physics/0412078v2, 2005. Accessed 14 September 2009.

[2] John Morgan Allman. Evolving Brains. W. H. Freeman, New York, 1999.

[3] Gert Almkvist and Bruce Berndt. Gauss, Landen, Ramanujan, the arithmetic-

geometric mean, ellipses, π, and the Ladies Diary. American Mathematical Monthly,

95(7):585–608, 1988.

[4] William J. H. Andrewes (Ed.). The Quest for Longitude: The Proceedings of the Longi-

tude Symposium, Harvard University, Cambridge, Massachusetts, November 4–6, 1993.

Collection of Historical Scientiﬁc Instruments, Harvard University, Cambridge,

Massachusetts, 1996.

[5] Petr Beckmann. A History of Pi. Golem Press, Boulder, Colo., 4th edition, 1977.

[6] Lennart Berggren, Jonathan Borwein and Peter Borwein (Eds.). Pi, A Source Book.

Springer, New York, 3rd edition, 2004.

[7] John Malcolm Blair. The Control of Oil. Pantheon Books, New York, 1976.

[8] Benjamin S. Bloom. The 2 sigma problem: The search for methods of group

instruction as eﬀective as one-to-one tutoring. Educational Researcher, 13(6):4–16,

1984.

[9] E. Buckingham. On physically similar systems. Physical Review, 4(4):345–376,

1914.

[10] Barry Cipra. Misteaks: And How to Find Them Before the Teacher Does. AK Peters,

Natick, Massachusetts, 3rd edition, 2000.

[11] David Corﬁeld. Towards a Philosophy of Real Mathematics. Cambridge University

Press, Cambridge, England, 2003.

[12] T. E. Faber. Fluid Dynamics for Physicists. Cambridge University Press, Cambridge,

England, 1995.

[13] L. P. Fulcher and B. F. Davis. Theoretical and experimental study of the motion

of the simple pendulum. American Journal of Physics, 44(1):51–55, 1976.

[14] George Gamow. Thirty Years that Shook Physics: The Story of Quantum Theory.

Dover, New York, 1985.

[15] Simon Gindikin. Tales of Mathematicians and Physicists. Springer, New York, 2007.

[16] Fernand Gobet and Herbert A. Simon. The role of recognition processes and look-ahead search in time-constrained expert problem solving: Evidence from grand-master-level chess. Psychological Science, 7(1):52–55, 1996.

[17] Ronald L. Graham, Donald E. Knuth and Oren Patashnik. Concrete Mathematics.

Addison–Wesley, Reading, Massachusetts, 2nd edition, 1994.

[18] Godfrey Harold Hardy, J. E. Littlewood and G. Polya. Inequalities. Cambridge

University Press, Cambridge, England, 2nd edition, 1988.

[19] William James. The Principles of Psychology. Harvard University Press, Cambridge,

MA, 1981. Originally published in 1890.

[20] Edwin T. Jaynes. Information theory and statistical mechanics. Physical Review,

106(4):620–630, 1957.

[21] Edwin T. Jaynes. Probability Theory: The Logic of Science. Cambridge University

Press, Cambridge, England, 2003.

[22] A. J. Jerri. The Shannon sampling theorem—Its various extensions and applica-

tions: A tutorial review. Proceedings of the IEEE, 65(11):1565–1596, 1977.

[23] Louis V. King. On some new formulae for the numerical calculation of the mutual

induction of coaxial circles. Proceedings of the Royal Society of London. Series A,

Containing Papers of a Mathematical and Physical Character, 100(702):60–66, 1921.

[24] Charles Kittel, Walter D. Knight and Malvin A. Ruderman. Mechanics, volume 1

of The Berkeley Physics Course. McGraw–Hill, New York, 1965.

[25] Anne Marchand. Impunity for multinationals. ATTAC, 11 September 2002.

[26] Mars Climate Orbiter Mishap Investigation Board. Phase I report. Technical Re-

port, NASA, 1999.

[27] Michael R. Matthews. Time for Science Education: How Teaching the History and

Philosophy of Pendulum Motion can Contribute to Science Literacy. Kluwer, New

York, 2000.

[28] R. D. Middlebrook. Low-entropy expressions: the key to design-oriented analysis. In Frontiers in Education Conference, 1991. Twenty-First Annual Conference. ‘Engineering Education in a New World Order’. Proceedings, pages 399–403, Purdue University, West Lafayette, Indiana, September 21–24, 1991.

[29] R. D. Middlebrook. Methods of design-oriented analysis: The quadratic equation revisited. In Frontiers in Education, 1992. Proceedings. Twenty-Second Annual Conference, pages 95–102, Vanderbilt University, November 11–15, 1992.

[30] Paul J. Nahin. When Least is Best: How Mathematicians Discovered Many Clever

Ways to Make Things as Small (or as Large) as Possible. Princeton University Press,

Princeton, New Jersey, 2004.

[31] Roger B. Nelsen. Proofs without Words: Exercises in Visual Thinking. Mathematical

Association of America, Washington, DC, 1997.

[32] Roger B. Nelsen. Proofs without Words II: More Exercises in Visual Thinking. Math-

ematical Association of America, Washington, DC, 2000.

[33] Robert A. Nelson and M. G. Olsson. The pendulum: Rich physics from a simple

system. American Journal of Physics, 54(2):112–121, 1986.

[34] R. C. Pankhurst. Dimensional Analysis and Scale Factors. Chapman and Hall, Lon-

don, 1964.

[35] George Polya. Induction and Analogy in Mathematics, volume 1 of Mathematics and

Plausible Reasoning. Princeton University Press, Princeton, New Jersey, 1954.

[36] George Polya. Patterns of Plausible Inference, volume 2 of Mathematics and Plausible

Reasoning. Princeton University Press, Princeton, New Jersey, 1954.

[37] George Polya. How to Solve It: A New Aspect of the Mathematical Method. Princeton

University Press, Princeton, New Jersey, 1957/2004.

[38] Edward M. Purcell. Life at low Reynolds number. American Journal of Physics,

45(1):3–11, 1977.

[39] Gilbert Ryle. The Concept of Mind. Hutchinson’s University Library, London, 1949.

[40] Carl Sagan. Contact. Simon & Schuster, New York, 1985.

[41] E. Salamin. Computation of pi using arithmetic-geometric mean. Mathematics of

Computation, 30:565–570, 1976.

[42] Dava Sobel. Longitude: The True Story of a Lone Genius Who Solved the Greatest

Scientiﬁc Problem of His Time. Walker and Company, New York, 1995.

[43] Richard M. Stallman and Gerald J. Sussman. Forward reasoning and dependency-

directed backtracking in a system for computer-aided circuit analysis. AI Memos

380, MIT, Artiﬁcial Intelligence Laboratory, 1976.

[44] Edwin F. Taylor and John Archibald Wheeler. Spacetime Physics: Introduction to

Special Relativity. W. H. Freeman, New York, 2nd edition, 1992.

[45] Silvanus P. Thompson. Calculus Made Easy: Being a Very-Simplest Introduction to

Those Beautiful Methods of Reasoning Which are Generally Called by the Terrifying

Names of the Diﬀerential Calculus and the Integral Calculus. Macmillan, New York,

2nd edition, 1914.

[46] D. J. Tritton. Physical Fluid Dynamics. Oxford University Press, New York, 2nd

edition, 1988.

[47] US Bureau of the Census. Statistical Abstracts of the United States: 1992. Govern-

ment Printing Oﬃce, Washington, DC, 112th edition, 1992.

[48] Max Wertheimer. Productive Thinking. Harper, New York, enlarged edition, 1959.

[49] Paul Zeitz. The Art and Craft of Problem Solving. Wiley, Hoboken, New Jersey, 2nd

edition, 2007.

Index

An italic page number refers to a problem on that page.

ν

see kinematic viscosity

1 or few

see few

≈ (approximately equal) 6

π, computing

arctangent series 64

Brent–Salamin algorithm 65

∝ (proportional to) 6

∼ (twiddle) 6, 44

ω

see angular frequency

analogy, reasoning by 99–121

dividing space with planes 103–107

generating conjectures

see conjectures: generating

operators 107–113

left shift (L) 108–109

summation (Σ) 109

preserving crucial features 100, 118,

120

pyramid volume 19

spatial angles 99–103

tangent-root sum 118–121

testing conjectures

see conjectures: testing

to polynomials 118–121

transforming dependent variable 101

angles, spatial 99–103

angular frequency 44

Aristotle xiv

arithmetic–geometric mean 65

arithmetic-mean–geometric-mean in-

equality 60–66

applications 63–66

computing π 64–66

maxima 63–64

equality condition 62

numerical examples 60

pictorial proof 61–63

symbolic proof 61

arithmetic mean

see also geometric mean

picture for 62

asymptotes of tan x 114

atmospheric pressure 34

back-of-the-envelope estimates

correcting 78

mental multiplication in 77

minimal accuracy required for 78

powers of 10 in 78

balancing 41

Basel sum ($\sum n^{-2}$) 76, 113, 116, 121

beta function 98

big part, correcting the

see also taking out the big part

additive messier than multiplicative

corrections 80

using multiplicative corrections

see fractional changes

using one or few 78

big part, taking out

see taking out the big part

binomial coeﬃcients 96, 107

binomial distribution 98

binomial theorem 90, 97

bisecting a triangle 70–73

bits, CD capacity in 78

blackbody radiation 87

boundary layers 27

brain evolution 57

Buckingham, Edgar 26

calculus, fundamental idea of 31

CD-ROM

see also CD

same format as CD 77

CD/CD-ROM, storage capacity 77–79

characteristic magnitudes (typical magni-

tudes) 44

characteristic times 44

checking units 78

circle

area from circumference 76

as polygon with many sides 72

comparisons, nonsense with diﬀerent

dimensions 2

cone free-fall distance 35

cone templates 21

conical pendulum 48

conjectures

discarding coincidences 105, 119

explaining 119

generating 100, 103, 104, 105

probabilities of 105

testing 100, 101, 104, 106, 111, 119

getting more data 100, 105, 106

constants of proportionality

Stefan–Boltzmann constant 11

constraint propagation 5

contradictions 20

convergence, accelerating 65, 68

convexity 104

copyright raising book prices 82

Corﬁeld, David 105

cosine

integral of high power 94–97

small-angle approximation

derived 86

used 95

cube, bisecting 73

d (diﬀerential symbol) 10, 43

degeneracies 103

derivative as a ratio 38

derivatives

approximating with nonzero Δx 40

secant approximation 38

errors in 39

improved starting point 39

large error 38

vertical translation 39

second

dimensions of 38

secant approximation to 38

signiﬁcant-change approximation

40–41

acceleration 43

Navier–Stokes derivatives 45

scale and translation invariance 40

translation invariance 40

desert-island method 32

diﬀerential equations

checking dimensions 42

linearizing 47, 51–54

orbital motion 12

pendulum 46

simplifying into algebraic equations

43–46

spring–mass system 42–45

exact solution 45

pendulum equation 47

dimensional analysis

see dimensions, method of; dimension-

less groups

dimensionless constants

Gaussian integral 10

simple harmonic motion 48

Stefan–Boltzmann law 11

dimensionless groups 24

drag 25

free-fall speed 24

pendulum period 48

spring–mass system 48

dimensionless quantities

depth of well 94

fractional change times exponent 89

have lower entropy 94

having lower entropy 81

dimensions

L for length 5

retaining 5

T for time 5

versus units 2

dimensions, method of 1–12

see also dimensionless groups

advantages 6

checking diﬀerential equations 42

choosing unspeciﬁed dimensions 7,

8–9

compared with easy cases 15

constraint propagation 5

drag 23–26

guessing integrals 7–11

Kepler’s third law 12

pendulum 48–49

related-rates problems 12

robust alternative to solving diﬀeren-

tial equations 5

Stefan–Boltzmann law 11

dimensions of

angles 47

d (diﬀerential) 10

dx 10

exponents 8

integrals 9

integration sign $\int$ 9

kinematic viscosity ν 22

pendulum equation 47

second derivative 38, 43

spring constant 43

summation sign Σ 9

drag 21–29

depth-of-well estimate, eﬀect on 93

high Reynolds number 28

low Reynolds number 30

quantities aﬀecting 23

drag force

see drag

e

in fractional changes 90

earth

surface area 79

surface temperature 87

easy cases 13–30

adding odd numbers 58

beta-function integral 98

bisecting a triangle 70

bond angles 100

checking formulas 13–17

compared with dimensions 15

ellipse area 16–17

ellipse perimeter 65

fewer lines 104

fewer planes 103

guessing integrals 13–16

high dimensionality 103

high Reynolds number 27

large exponents 89

low Reynolds number 30

of inﬁnite sound speed 92, 94

pendulum

large amplitude 49–51

small amplitude 47–48

polynomials 118

pyramid volume 19

roots of tan x = x 114

simple functions 108, 112

synthesizing formulas 17

truncated cone 21

truncated pyramid 18–21

ellipse

area 17

perimeter 65

elliptical orbit

eccentricity 87

position of sun 87

energy conservation 50

energy consumption in driving 82–84

eﬀect of longer commuting time 83

entropy of an expression

see low-entropy expressions

entropy of mixing 81

equality, kinds of 6

estimating derivatives

see derivatives, secant approximation;

derivatives, signiﬁcant-change approxi-

mation

Euler 113

see also Basel sum

beta function 98

Euler–MacLaurin summation 112

Evolving Brains 57

exact solution

invites algebra mistakes 4

examples

adding odd numbers 58–60

arithmetic-mean–geometric-mean in-

equality 60–66

babies, number of 32–33

bisecting a triangle 70–73

bond angle in methane 99–103

depth of a well 91–94

derivative of cos x, estimating 40–41

dividing space with planes 103–107

drag on falling paper cones 21–29

ellipse area 16–17

energy savings from 55 mph speed

limit 82–84

factorial function 36–37

free fall 3–6

Gaussian integral using dimensions

7–11

Gaussian integral using easy cases

13–16

logarithm series 66–70

maximizing garden area 63–64

multiplying 3.15 by 7.21

using fractional changes 79–80

using one or few 79

operators

left shift (L) 108–109

summation (Σ) 109–113

pendulum period 46–54

power of multinationals 1–3

rapidly computing 1/13 84–85

seasonal temperature ﬂuctuations

86–88

spring–mass diﬀerential equation

42–45

square root of ten 85–86

storage capacity of a CD-ROM or CD

77–79

summing ln n! 73–75

tangent-root sum 113–121

trigonometric integral 94–97

volume of truncated pyramid 17–21

exponential

decaying, integral of 33

outruns any polynomial 36

exponents, dimensions of 8

extreme cases

see easy cases

factorial

integral representation 36

Stirling’s formula

Euler–MacLaurin summation 112

lumping 36–37

pictures 74

summation representation 73

summing logarithm of 73–75

few

as geometric mean 78

as invented number 78

for mental multiplication 78

fractional changes

cube roots 86

cubing 83, 84

do not multiply 83

earth–sun distance 87

estimating wind power 84

exponent of −2 86

exponent of 1/4 87

general exponents 84–90

increasing accuracy 85, 86

introduced 79–80

large exponents 89–90, 95

linear approximation 82

multiplying 3.15 by 7.21 79

negative and fractional exponents

86–88

no plausible alternative to adding 82

picture 80

small changes add 82

square roots 85–86

squaring 82–84

tangent-root sum 117

free fall

analysis using dimensions 3–6

depth of well 91–94

diﬀerential equation 4

impact speed (exact) 4

with initial velocity 30

fudging 33

fuel eﬃciency 85

Gaussian integral

closed form, guessing 14, 16

extending limits to ∞ 96

tail area 55

trapezoidal approximation 14

using dimensions 7–11

using easy cases 13–16

using lumping 34, 35

GDP, as monetary ﬂow 1

geometric mean

see also arithmetic mean; arith-

metic-mean–geometric-mean theorem

deﬁnition 60

picture for 61

three numbers 63

gestalt understanding 59

globalization 1

graphical arguments

see pictorial proofs

high-entropy expressions

see also low-entropy expressions

from quadratic formula 92

How to Solve It xiii

Huygens 48

induction proof 58

information theory 81

integration

approximating as multiplication

see lumping

inverse of diﬀerentiation 109

numerical 14

operator 109

intensity of solar radiation 86

isoperimetric theorem 73

Jaynes, Edwin Thompson 105

Jeﬀreys, Harold 26

Kepler’s third law 25

kinematic viscosity (ν) 21, 27

Landau Institute, daunting trigonomet-

ric integral from 94

L (dimension of length) 5

Lennard–Jones potential 41

life expectancy 32

little bit (meaning of d) 10, 43

logarithms

analyzing fractional changes 90

integral deﬁnition 67

rational-function approximation 69

low-entropy expressions

basis of scientiﬁc progress 81

dimensionless quantities are often

81

fractional changes are often 81

from successive approximation 93

high-entropy intermediate steps 81

introduced 80–82

reducing mixing entropy 81

roots of tan x = x 114

lumping 31–55

1/e heuristic 34

atmospheric pressure 34

circumscribed rectangle 67

diﬀerential equations 51–54

estimating derivatives 37–41

inscribed rectangle 67

integrals 33–37

pendulum, moderate amplitudes 51

population estimates 32–33

too much 52

Mars Climate Orbiter, crash of 3

Mathematics and Plausible Reasoning xiii

mathematics, power of abstraction 7

maxima and minima 41, 70

arithmetic-mean–geometric-mean in-

equality 63–64

box volume 64

trigonometry 64

mental division 33

mental multiplication

using one or few

see few

method versus trick 69

mixing entropy 81

Navier–Stokes equations

diﬃcult to solve 22

inertial term 45

statement of 21

viscous term 46

Newton–Raphson method 76, 117, 118

numerical integration 14

odd numbers, sum of 58–60

one or few

if not accurate enough 79

operators

derivative (D) 107

exponential of 108

ﬁnite diﬀerence (Δ) 110

integration 109

left shift (L) 108–109

right shift 109

summation (Σ) 109–113

parabola, area without calculus 76

Pascal’s triangle 107

patterns, looking for 90

pendulum

diﬀerential equation 46

in weaker gravity 52

period of 46–54

perceptual abilities 58

pictorial proofs 57–76

adding odd numbers 58–60

area of circle 76

arithmetic-mean–geometric-mean in-

equality 60–63, 76

bisecting a triangle 70–73

compared to induction proof 58

dividing space with planes 107

factorial 73–75

logarithm series 66–70

Newton–Raphson method 76

roots of tan x = x 114

volume of sphere 76

pictorial reasoning

depth of well 94

plausible alternatives

see low-entropy expressions

Polya, George 105

population, estimating 32

power of multinationals 1–3

powers of ten 78

proportional reasoning 18

pyramid, truncated 17

quadratic formula 91

high entropy 92

versus successive approximation 93

quadratic terms

ignoring 80, 82, 84

including 85

range formula 30

rapid mental division 84–85

rational functions 69, 101

Re

see Reynolds number

related-rates problems 12

rewriting-as-a-ratio trick 68, 70, 86

Reynolds number (Re) 27

high 27

low 30

rigor xiii

rigor mortis xiii

rounding

to nearest integer 79

using one or few 78

scale invariance 40

seasonal temperature changes 86–88

seasonal temperature ﬂuctuations

alternative explanation 88

secant approximation

see derivatives, secant approximation

secant line, slope of 38

second derivatives

see derivatives, second

Shannon–Nyquist sampling theorem

78

signiﬁcant-change approximation

see derivatives, signiﬁcant-change

approximation

similar triangles 61, 70

simplifying problems

see taking out the big part; lumping;

easy cases; analogy

sine, small-angle approximation

derived 47

used 86

small-angle approximation

cosine 95

sine 47, 66

solar-radiation intensity 86

space, dividing with planes 103–107

spectroscopy 35

sphere, volume from surface area 76

spring–mass system 42–45

spring constant

dimensions of 43

Hooke’s law, in 42

statistical mechanics 81

Stefan–Boltzmann constant 11, 87

Stefan–Boltzmann law

derivation 11

requires temperature in Kelvin 88

to compute surface temperature 87

stiﬀness

see spring constant

Stirling’s formula

see factorial: Stirling’s formula

successive approximation

see also taking out the big part

depth of well 92–94

low-entropy expressions 93

physical insights 93

robustness 93

versus quadratic formula 93

summation

approximately integration 113, 114

Euler–MacLaurin 112, 113

indeﬁnite 110

integral approximation 74

operator 109–113

represented using diﬀerentiation 112

tangent roots 113–121

triangle correction 74, 113, 115

symbolic reasoning

brain evolution 57

seeming like magic 61

symmetry 72

taking out the big part 77–98

depth of well 92–94

polynomial extrapolation 106, 107

tangent-root sum 114, 117–118

trigonometric integral 94–97

Taylor series

factorial integrand 37

general 66

logarithm 66, 69

cubic term 68

pendulum period 53

tangent 118, 120

T (dimension of time) 5

tetrahedron, regular 99

The Art and Craft of Problem Solving xiii

thermal expansion 82

Thompson, Silvanus 10

thought experiments 18, 50

tools

see dimensions, method of; easy cases;

lumping; pictorial proofs; taking out

the big part; analogy, reasoning by

transformations

logarithmic 36

taking cosine 101

trapezoidal approximation 14

tricks

multiplication by one 85

rewriting as a ratio 68, 70, 86

variable transformation 36, 101

trick versus method 69

tutorial teaching xiv

under- or overestimate?

approximating depth of well 92, 93

computing square roots 86

lumping analysis 54

summation approximation 75

tangent-root sum 115

using one or few 79

units

cancellation 78

Mars Climate Orbiter, crash of 3

separating from quantities 4

versus dimensions 2

Wertheimer, Max 59

This book was created entirely with free software and fonts. The text

is set in Palatino, designed by Hermann Zapf and available as TeX Gyre

Pagella. The mathematics is set in Euler, also designed by Hermann Zapf.

Maxima 5.17.1 and the mpmath Python library aided several calculations.

The source ﬁles were created using many versions of GNU Emacs and

managed using the Mercurial revision-control system. The figure source files were compiled with MetaPost 1.208 and Asymptote 1.88. The TeX source was compiled to PDF using ConTeXt 2009.10.27 and PDFTeX 1.40.10.

The compilations were managed with GNU Make 3.81 and took 10 min on

a 2006-vintage laptop. All software was running on Debian GNU/Linux.

I warmly thank the many contributors to the software commons.

Street-Fighting Mathematics

**Street-Fighting Mathematics
**

The Art of Educated Guessing and Opportunistic Problem Solving

Sanjoy Mahajan

Foreword by Carver A. Mead

The MIT Press Cambridge, Massachusetts London, England

**2010 by Sanjoy Mahajan Foreword C 2010 by Carver A. Mead
**

C

Street-Fighting Mathematics: The Art of Educated Guessing and Opportunistic Problem Solving by Sanjoy Mahajan (author), Carver A. Mead (foreword), and MIT Press (publisher) is licensed under the Creative Commons Attribution–Noncommercial–Share Alike 3.0 United States License. A copy of the license is available at http://creativecommons.org/licenses/by-nc-sa/3.0/us/ For information about special quantity discounts, please email special_sales@mitpress.mit.edu Typeset in Palatino and Euler by the author using ConTEXt and PDFTEX

Library of Congress Cataloging-in-Publication Data Mahajan, Sanjoy, 1969– Street-ﬁghting mathematics : the art of educated guessing and opportunistic problem solving / Sanjoy Mahajan ; foreword by Carver A. Mead. p. cm. Includes bibliographical references and index. ISBN 978-0-262-51429-3 (pbk. : alk. paper) 1. Problem solving. 2. Hypothesis. 3. Estimation theory. I. Title. QA63.M34 2010 510—dc22 2009028867 Printed and bound in the United States of America 10 9 8 7 6 5 4 3 2 1

For Juliet

.

Brief contents Foreword Preface 1 2 3 4 5 6 Dimensions Easy cases Lumping Pictorial proofs Taking out the big part Analogy Bibliography Index xi xiii 1 13 31 57 77 99 123 127 .

.

3 Approximating the logarithm 4.2 Newtonian mechanics: Free fall 1.3 Solid geometry: The volume of a truncated pyramid 2.2 Estimating integrals 3.1 Estimating populations: How many babies? 3.4 Bisecting a triangle 4.4 Summary and further problems Easy cases 2.1 Gaussian integral revisited 2.4 Fluid mechanics: Drag 2.5 Summing series 4.3 Estimating derivatives 3.2 Plane geometry: The area of an ellipse 2.Contents Foreword Preface 1 Dimensions 1.5 Summary and further problems Lumping 3.1 Economics: The power of multinational corporations 1.5 Predicting the period of a pendulum 3.1 Adding odd numbers 4.6 Summary and further problems xi xiii 1 1 3 7 11 13 13 16 17 21 29 31 32 33 37 42 46 54 57 58 60 66 70 73 75 2 3 4 .2 Arithmetic and geometric means 4.4 Analyzing diﬀerential equations: The spring–mass system 3.6 Summary and further problems Pictorial proofs 4.3 Guessing integrals 1.

1 Spatial trigonometry: The bond angle in methane 6.x 5 Taking out the big part 5.5 Daunting trigonometric integral 5.4 Successive approximation: How deep is the well? 5.1 Multiplication using one and few 5.6 Summary and further problems Analogy 6.3 Fractional changes with general exponents 5.2 Topology: How many regions? 6.5 Bon voyage Bibliography Index 77 77 79 84 91 94 97 99 99 103 107 113 121 123 127 6 .2 Fractional changes and low-entropy expressions 5.4 Tangent roots: A daunting transcendental sum 6.3 Operators: Euler–MacLaurin summation 6.

But he leads us through one. on my own. he brings us up to another level. as they are taught today. and as a tool for deriving quantitative conclusions from these relationships. the mathematics that I have found most useful was learned in science and engineering classes. gleaning gems of insight along the way. Sanjoy Mahajan teaches us. In this little book are insights for every one of us. or from this book.Foreword Most of us took mathematics courses from mathematicians—Bad Idea! Mathematicians see mathematics as an area of study in its own right. For that purpose. tools that work in the real world. The rest of us use mathematics as a precise language for expressing relationships among quantities in the real world. I promised myself that if I ever became a teacher. mathematics courses. I have spent my life trying to ﬁnd direct and transparent ways of seeing reality and trying to express these insights quantitatively. I have personally adopted several of the techniques that you will ﬁnd here. My personal favorite is the approach to the Navier–Stokes equations: so nasty that I would never even attempt a solution. Just when we think that a topic is obvious. —Carver Mead . I recommend it highly to every one of you. As a student. I would never put a student through that kind of teaching. in the most friendly way. and I have never knowingly broken my promise. Street-Fighting Mathematics is a breath of fresh air. With rare exceptions. are seldom helpful and are often downright destructive.

.

estimating drag forces without solving the Navier–Stokes equations. refuting a common argument in the media. The students varied widely in experience: from ﬁrst-year undergraduates to graduate students ready for careers in research and teaching.Preface Too much mathematical rigor teaches rigor mortis: the fear of making an unjustiﬁed leap even when it lands on a correct result. This book grew out of a short course of the same name that I taught for several years at MIT. guessing bond angles. Although unwise as public policy. A calculation accurate only to a factor of 2 may show that a proposed bridge would never be built or a circuit could never work. Educated guessing and opportunistic problem solving require a toolbox. A tool. The diverse examples help separate the tool—the general principle—from the particular applications so that you can grasp and transfer the tool to problems of particular interest to you. They teach how to solve exactly stated problems exactly. and summing inﬁnite series whose every term is unknown and transcendental. to paraphrase George Polya. This book complements works such as How to Solve It [37]. The students also varied widely in specialization: . 36]. This book builds. is a trick I use twice. The examples used to teach the tools include guessing integrals without integrating. Mathematics and Plausible Reasoning [35. sharpens. Instead of paralysis. it is a valuable problem-solving philosophy. whereas life often hands us partly deﬁned problems needing only moderately accurate solutions. and it is the theme of this book: how to guess answers without a proof or an exact calculation. ﬁnding the shortest path that bisects a triangle. and demonstrates tools useful across diverse ﬁelds of human knowledge. The eﬀort saved by not doing the precise analysis can be spent inventing promising new designs. and The Art and Craft of Problem Solving [49]. extracting physical properties from nonlinear diﬀerential equations. 
have courage—shoot ﬁrst and ask questions later.

the students seemed to beneﬁt from the set of tools and to enjoy the diversity of illustrations and applications. marked with a shaded background. For editorial guidance: Katherine Almeida and Robert Prior. Therefore. They are answered in the subsequent text.xiv Preface from physics. For the title: Carl Moyer. to extend an example. are what a tutor might give you to take home after a tutorial. and we will gladly receive any corrections and suggestions. thorough reviews of the manuscript: Michael Gottlieb. to use several tools together. I wish the same for you. computer science. and Carver Mead. where you can check your solutions and my analysis. . Despite or because of the diversity. a skilled and knowledgeable tutor is the most eﬀective teacher [8]. David MacKay. For sweeping. How to use this book Aristotle was tutor to the young Alexander of Macedon (later. David Hogg. questions of two types are interspersed through the book. improve. mathematics. and even to resolve (apparent) paradoxes. wondering. and biology. and discussing promote long-lasting learning. Acknowledgments I gratefully thank the following individuals and organizations. and management to electrical engineering. for she knows that questioning. Try many questions of both types! Copyright license This book is licensed under the same license as MIT’s OpenCourseWare: a Creative Commons Attribution-Noncommercial-Share Alike license. Questions marked with a in the margin: These questions are what a tutor might ask you during a tutorial. The publisher and I encourage you to use. and share the work noncommercially. A skilled tutor makes few statements and asks many questions. Numbered problems: These problems. As ancient royalty knew. and ask you to work out the next steps in an analysis. They ask you to practice the tool. Alexander the Great).

For being inspiring teachers: John Allman, Arthur Eisenkraft, Peter Goldreich, John Hopfield, Jon Kettenring, Geoffrey Lloyd, Donald Knuth, Carver Mead, David Middlebrook, Sterl Phinney, and Edwin Taylor.

For many valuable suggestions and discussions: Shehu Abdussalam, Daniel Corbett, Dennis Freeman, Michael Godfrey, Hans Hagen, Jozef Hanc, Taco Hoekwater, Stephen Hou, Kayla Jacobs, Aditya Mahajan, Haynes Miller, Elisabeth Moyer, Hubert Pham, Benjamin Rapoport, Rahul Sarpeshkar, Madeleine Sheldon-Dante, Edwin Taylor, Tadashi Tokieda, Mark Warner, and Joshua Zucker.

For advice on the process of writing: Carver Mead and Hillary Rettig.

For advice on the book design: Yasuyo Iguchi.

For advice on free licensing: Daniel Ravicher and Richard Stallman.

For the free software used for calculations: Fredrik Johansson (mpmath), the Maxima project, and the Python community.

For the free software used for typesetting: Hans Hagen and Taco Hoekwater (ConTeXt); Han The Thanh (PDFTeX); Donald Knuth (TeX); John Hobby (MetaPost); John Bowman, Andy Hammerlindl, and Tom Prince (Asymptote); Matt Mackall (Mercurial); Richard Stallman (Emacs); and the Debian GNU/Linux project.

For supporting my work in science and mathematics teaching: The Whitaker Foundation in Biomedical Engineering; the Hertz Foundation; the Master and Fellows of Corpus Christi College, Cambridge; the MIT Teaching and Learning Laboratory and the Office of the Dean for Undergraduate Education; and especially Roger Baker, John Williams, and the Trustees of the Gatsby Charitable Foundation.

Bon voyage

As our first tool, let's welcome a visitor from physics and engineering: the method of dimensional analysis.


1 Dimensions

1.1 Economics: The power of multinational corporations
1.2 Newtonian mechanics: Free fall
1.3 Guessing integrals
1.4 Summary and further problems

Our first street-fighting tool is dimensional analysis or, when abbreviated, dimensions. To show its diversity of application, the tool is introduced with an economics example and sharpened on examples from Newtonian mechanics and integral calculus.

1.1 Economics: The power of multinational corporations

Critics of globalization often make the following comparison [25] to prove the excessive power of multinational corporations:

In Nigeria, a relatively economically strong country, the GDP [gross domestic product] is $99 billion. The net worth of Exxon is $119 billion. "When multinationals have a net worth higher than the GDP of the country in which they operate, what kind of power relationship are we talking about?" asks Laura Morosini.

Before continuing, explore the following question:

What is the most egregious fault in the comparison between Exxon and Nigeria?

The field is competitive, but one fault stands out. It becomes evident after unpacking the meaning of GDP. A GDP of $99 billion is shorthand for a monetary flow of $99 billion per year. A year, which is the time for the earth to travel around the sun, is an astronomical phenomenon that has been arbitrarily chosen for measuring a social phenomenon, namely monetary flow.

Suppose instead that economists had chosen the decade as the unit of time for measuring GDP. Then Nigeria's GDP (assuming the flow remains steady from year to year) would be roughly $1 trillion per decade and be reported as $1 trillion. Now Nigeria towers over Exxon, whose puny assets are a mere one-tenth of Nigeria's GDP. To deduce the opposite conclusion, suppose the week were the unit of time for measuring GDP. Nigeria's GDP becomes $2 billion per week, reported as $2 billion. Now puny Nigeria stands helpless before the mighty Exxon, 50-fold larger than Nigeria.

A valid economic argument cannot reach a conclusion that depends on the astronomical phenomenon chosen to measure time. The mistake lies in comparing incomparable quantities. Net worth is an amount: It has dimensions of money and is typically measured in units of dollars. GDP, however, is a flow or rate: It has dimensions of money per time and typical units of dollars per year. (A dimension is general and independent of the system of measurement, whereas the unit is how that dimension is measured in a particular system.) Comparing net worth to GDP compares a monetary amount to a monetary flow. Because their dimensions differ, the comparison is a category mistake [39] and is therefore guaranteed to generate nonsense.

Problem 1.1 Units or dimensions?
Are meters, kilograms, and seconds units or dimensions? What about energy, charge, power, and force?

A similarly flawed comparison is length per time (speed) versus length: "I walk 1.5 m/s, much smaller than the Empire State building in New York, which is 300 m high." It is nonsense. To produce the opposite but still nonsense conclusion, measure time in hours: "I walk 5400 m/hr, much larger than the Empire State building, which is 300 m high."

I often see comparisons of corporate and national power similar to our Nigeria–Exxon example. I once wrote to one author explaining that I sympathized with his conclusion but that his argument contained a fatal dimensional mistake. He replied that I had made an interesting point but that the numerical comparison showing the country's weakness was stronger as he had written it, so he was leaving it unchanged!
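The dependence of the Exxon–Nigeria comparison on the accounting period can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the names `reported_gdp` and `naive_ratio` are invented here, and the week is taken as 1/52 of a year, so the weekly figures come out near, not exactly at, the rounded values in the text.

```python
# Net worth is an amount (dollars); GDP is a flow (dollars per year).
EXXON_NET_WORTH = 119e9        # dollars, from the quoted comparison
NIGERIA_GDP_PER_YEAR = 99e9    # dollars per year, from the quoted comparison

# Candidate accounting periods, expressed in years (the week is approximated).
UNIT_IN_YEARS = {"week": 1 / 52, "year": 1.0, "decade": 10.0}


def reported_gdp(unit):
    """Number that would be reported as 'GDP' if `unit` were the accounting period."""
    return NIGERIA_GDP_PER_YEAR * UNIT_IN_YEARS[unit]


def naive_ratio(unit):
    """The invalid ratio 'net worth / GDP'; it is not dimensionless, so it depends on the unit."""
    return EXXON_NET_WORTH / reported_gdp(unit)


for unit in UNIT_IN_YEARS:
    print(f"{unit:>6}: GDP reported as ${reported_gdp(unit):.3g}, "
          f"Exxon/Nigeria = {naive_ratio(unit):.3g}")
```

A ratio that changes from well above 1 to well below 1 when only the unit of time changes cannot be measuring anything about relative power.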

A dimensionally valid comparison would compare like with like: either Nigeria's GDP with Exxon's revenues, or Exxon's net worth with Nigeria's net worth. Because net worths of countries are not often tabulated, whereas corporate revenues are widely available, try comparing Exxon's annual revenues with Nigeria's GDP. By 2006, Exxon had become Exxon Mobil with annual revenues of roughly $350 billion, almost twice Nigeria's 2006 GDP of $200 billion. This valid comparison is stronger than the flawed one, so retaining the flawed comparison was not even expedient!

That compared quantities must have identical dimensions is a necessary condition for making valid comparisons, but it is not sufficient. A costly illustration is the 1999 Mars Climate Orbiter (MCO), which crashed into the surface of Mars rather than slipping into orbit around it. The cause, according to the Mishap Investigation Board (MIB), was a mismatch between English and metric units [26, p. 6]:

The MCO MIB has determined that the root cause for the loss of the MCO spacecraft was the failure to use metric units in the coding of a ground software file, Small Forces, used in trajectory models. Specifically, thruster performance data in English units instead of metric units was used in the software application code titled SM_FORCES (small forces). A file called Angular Momentum Desaturation (AMD) contained the output data from the SM_FORCES software. The data in the AMD file was required to be in metric units per existing software interface documentation, and the trajectory modelers assumed the data was provided in metric units per the requirements.

Make sure to mind your dimensions and units.

Problem 1.2 Finding bad comparisons
Look for everyday comparisons (for example, on the news, in the newspaper, or on the Internet) that are dimensionally faulty.

1.2 Newtonian mechanics: Free fall

Dimensions are useful not just to debunk incorrect arguments but also to generate correct ones. To do so, the quantities in a problem need to have dimensions. As a contrary example showing what not to do, here is how many calculus textbooks introduce a classic problem in motion:

A ball initially at rest falls from a height of h feet and hits the ground at a speed of v feet per second. Find v assuming a gravitational acceleration of g feet per second squared and neglecting air resistance.

The units such as feet or feet per second are highlighted in boldface because their inclusion is so frequent as to otherwise escape notice, and their inclusion creates a significant problem. Because the height is h feet, the variable h does not contain the units of height: h is therefore dimensionless. (For h to have dimensions, the problem would instead state simply that the ball falls from a height h; then the dimension of length would belong to h.) A similar explicit specification of units means that the variables g and v are also dimensionless. Because g, h, and v are dimensionless, any comparison of v with quantities derived from g and h is a comparison between dimensionless quantities. It is therefore always dimensionally valid, so dimensional analysis cannot help us guess the impact speed.

Giving up the valuable tool of dimensions is like fighting with one hand tied behind our back. Thereby constrained, we must instead solve the following differential equation with initial conditions:

\[ \frac{d^2y}{dt^2} = -g, \quad\text{with } y(0) = h \text{ and } dy/dt = 0 \text{ at } t = 0, \tag{1.1} \]

where \(y(t)\) is the ball's height, \(dy/dt\) is the ball's velocity, and \(g\) is the gravitational acceleration.

Problem 1.3 Calculus solution
Use calculus to show that the free-fall differential equation \(d^2y/dt^2 = -g\) with initial conditions \(y(0) = h\) and \(dy/dt = 0\) at \(t = 0\) has the following solution:

\[ \frac{dy}{dt} = -gt \quad\text{and}\quad y = -\frac{1}{2}gt^2 + h. \tag{1.2} \]

Using the solutions for the ball's position and velocity in Problem 1.3, what is the impact speed?

When \(y(t) = 0\), the ball meets the ground. Thus the impact time \(t_0\) is \(\sqrt{2h/g}\). The impact velocity is \(-gt_0\) or \(-\sqrt{2gh}\). Therefore the impact speed (the unsigned velocity) is \(\sqrt{2gh}\).

This analysis invites several algebra mistakes: forgetting to take a square root when solving for \(t_0\), or dividing rather than multiplying by \(g\) when finding the impact velocity. Practice (in other words, making and correcting many mistakes) reduces their prevalence in simple problems, but complex problems with many steps remain minefields. We would like less error-prone methods.

One robust alternative is the method of dimensional analysis. But this tool requires that at least one quantity among v, g, and h have dimensions. Otherwise, every candidate impact speed, no matter how absurd, equates dimensionless quantities and therefore has valid dimensions.

Therefore, let's restate the free-fall problem so that the quantities retain their dimensions:

A ball initially at rest falls from a height h and hits the ground at speed v. Find v assuming a gravitational acceleration g and neglecting air resistance.

The restatement is, first, shorter and crisper than the original phrasing:

A ball initially at rest falls from a height of h feet and hits the ground at a speed of v feet per second. Find v assuming a gravitational acceleration of g feet per second squared and neglecting air resistance.

Second, the restatement is more general. It makes no assumption about the system of units, so it is useful even if meters, cubits, or furlongs are the unit of length. Most importantly, the restatement gives dimensions to h, g, and v. Their dimensions will almost uniquely determine the impact speed, without our needing to solve a differential equation.

The dimensions of height h are simply length or, for short, L. The dimensions of gravitational acceleration g are length per time squared or \(\mathrm{LT^{-2}}\), where T represents the dimension of time. A speed has dimensions of \(\mathrm{LT^{-1}}\), so v is a function of g and h with dimensions of \(\mathrm{LT^{-1}}\).

Problem 1.4 Dimensions of familiar quantities
In terms of the basic dimensions length L, mass M, and time T, what are the dimensions of energy, power, and torque?

What combination of g and h has dimensions of speed?

The combination \(\sqrt{gh}\) has dimensions of speed.

\[ \Big(\underbrace{\mathrm{LT^{-2}}}_{g} \times \underbrace{\mathrm{L}}_{h}\Big)^{1/2} = \sqrt{\mathrm{L^2T^{-2}}} = \underbrace{\mathrm{LT^{-1}}}_{\text{speed}}. \tag{1.3} \]

Is \(\sqrt{gh}\) the only combination of g and h with dimensions of speed?

In order to decide whether \(\sqrt{gh}\) is the only possibility, use constraint propagation [43]. The strongest constraint is that the combination of g and h, being a speed, should have dimensions of inverse time (\(\mathrm{T^{-1}}\)). Because h contains no dimensions of time, it cannot help construct \(\mathrm{T^{-1}}\). Because g contains \(\mathrm{T^{-2}}\), the \(\mathrm{T^{-1}}\) must come from \(\sqrt{g}\). The second constraint is that the combination contain \(\mathrm{L^1}\). The \(\sqrt{g}\) already contributes \(\mathrm{L^{1/2}}\), so the missing \(\mathrm{L^{1/2}}\) must come from \(\sqrt{h}\). The two constraints thereby determine uniquely how g and h appear in the impact speed v.

The exact expression for v is, however, not unique. It could be \(\sqrt{gh}\), \(\sqrt{2gh}\), or, in general, \(\sqrt{gh} \times \text{dimensionless constant}\). The idiom of multiplication by a dimensionless constant occurs frequently and deserves a compact notation akin to the equals sign:

\[ v \sim \sqrt{gh}. \tag{1.4} \]

Including this \(\sim\) notation, we have several species of equality:

\[ \begin{aligned} &\propto && \text{equality except perhaps for a factor with dimensions,}\\ &\sim && \text{equality except perhaps for a factor without dimensions,}\\ &\approx && \text{equality except perhaps for a factor close to 1.} \end{aligned} \tag{1.5} \]

The exact impact speed is \(\sqrt{2gh}\), so the dimensions result \(\sqrt{gh}\) contains the entire functional dependence! It lacks only the dimensionless factor \(\sqrt{2}\), and these factors are often unimportant. In this example, the height might vary from a few centimeters (a flea hopping) to a few meters (a cat jumping from a ledge). The factor-of-100 variation in height contributes a factor-of-10 variation in impact speed. Similarly, the gravitational acceleration might vary from \(0.27\ \mathrm{m\,s^{-2}}\) (on the asteroid Ceres) to \(25\ \mathrm{m\,s^{-2}}\) (on Jupiter). The factor-of-100 variation in g contributes another factor-of-10 variation in impact speed. Much variation in the impact speed, therefore, comes not from the dimensionless factor \(\sqrt{2}\) but rather from the symbolic factors, which are computed exactly by dimensional analysis.

Furthermore, not calculating the exact answer can be an advantage. Exact answers have all factors and terms, permitting less important information, such as the dimensionless factor \(\sqrt{2}\), to obscure important information such as \(\sqrt{gh}\). As William James advised, "The art of being wise is the art of knowing what to overlook" [19, Chapter 22].

Problem 1.5 Vertical throw
You throw a ball directly upward with speed \(v_0\). Use dimensional analysis to estimate how long the ball takes to return to your hand (neglecting air resistance). Then find the exact time by solving the free-fall differential equation. What dimensionless factor was missing from the dimensional-analysis result?
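The claim that \(\sqrt{gh}\) carries the whole functional dependence can be tested numerically. The Python sketch below is an illustration rather than part of the original analysis: the function name `impact_speed`, the step count, and the particular pairs of g and h (picked from the ranges mentioned above) are assumptions. It steps the free-fall equation forward in time and divides the resulting impact speed by \(\sqrt{gh}\).

```python
import math


def impact_speed(g, h, steps_per_timescale=20_000):
    """Step y'' = -g forward from rest at height h until the ball reaches the ground.

    The time step is a small fraction of sqrt(h/g), the only time scale
    that g and h can form.
    """
    dt = math.sqrt(h / g) / steps_per_timescale
    y, v = h, 0.0
    while y > 0:
        v += g * dt   # downward speed gained during this step
        y -= v * dt   # height lost during this step
    return v


# (g in m/s^2, h in m): Ceres-like and Jupiter-like gravity, flea-sized and cat-sized heights.
cases = [(0.27, 0.03), (0.27, 3.0), (25.0, 0.03), (25.0, 3.0)]
ratios = [impact_speed(g, h) / math.sqrt(g * h) for g, h in cases]
for (g, h), ratio in zip(cases, ratios):
    print(f"g = {g:5.2f}, h = {h:4.2f}: v / sqrt(g h) = {ratio:.4f}")
```

Although the impact speeds themselves span a factor of about 100 across these cases, the ratio stays at one value close to \(\sqrt{2}\): the dimensionless factor is the only thing the dimensional argument leaves undetermined.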

1.3 Guessing integrals

The analysis of free fall (Section 1.2) shows the value of not separating dimensioned quantities from their units. However, what if the quantities are dimensionless, such as the 5 and x in the following Gaussian integral:

\[ \int_{-\infty}^{\infty} e^{-5x^2}\,dx\;? \tag{1.6} \]

Alternatively, the dimensions might be unspecified, a common case in mathematics because it is a universal language. For example, probability theory uses the Gaussian integral

\[ \int_{x_1}^{x_2} e^{-x^2/2\sigma^2}\,dx, \tag{1.7} \]

where x could be height, detector error, or much else. Thermal physics uses the similar integral

\[ \int e^{-\frac{1}{2}mv^2/kT}\,dv, \tag{1.8} \]

where v is a molecular speed. Mathematics, as the common language, studies their common form \(\int e^{-\alpha x^2}\) without specifying the dimensions of \(\alpha\) and x. The lack of specificity gives mathematics its power of abstraction, but it makes using dimensional analysis difficult.

How can dimensional analysis be applied without losing the benefits of mathematical abstraction?

The answer is to find the quantities with unspecified dimensions and then to assign them a consistent set of dimensions. To illustrate the approach, let's apply it to the general definite Gaussian integral

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx. \tag{1.9} \]

Unlike its specific cousin with \(\alpha = 5\), which is the integral \(\int_{-\infty}^{\infty} e^{-5x^2}\,dx\), the general form does not specify the dimensions of x or \(\alpha\), and that openness provides the freedom needed to use the method of dimensional analysis. The method requires that any equation be dimensionally valid. Thus, in the following equation, the left and right sides must have identical dimensions:

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \text{something}. \tag{1.10} \]

Is the right side a function of x? Is it a function of \(\alpha\)? Does it contain a constant of integration?

The left side contains no symbolic quantities other than x and \(\alpha\). But x is the integration variable and the integral is over a definite range, so x disappears upon integration (and no constant of integration appears). Therefore, the right side (the "something") is a function only of \(\alpha\). In symbols,

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = f(\alpha). \tag{1.11} \]

The function f might include dimensionless numbers such as 2/3 or \(\sqrt{\pi}\), but \(\alpha\) is its only input with dimensions.

For the equation to be dimensionally valid, the integral must have the same dimensions as \(f(\alpha)\), and the dimensions of \(f(\alpha)\) depend on the dimensions of \(\alpha\). Accordingly, the dimensional-analysis procedure has the following three steps:

Step 1. Assign dimensions to \(\alpha\) (Section 1.3.1).
Step 2. Find the dimensions of the integral (Section 1.3.2).
Step 3. Make an \(f(\alpha)\) with those dimensions (Section 1.3.3).

1.3.1 Assigning dimensions to α

The parameter \(\alpha\) appears in an exponent. An exponent specifies how many times to multiply a quantity by itself. For example, here is \(2^n\):

\[ 2^n = \underbrace{2 \times 2 \times \cdots \times 2}_{n \text{ terms}}. \tag{1.12} \]

The notion of "how many times" is a pure number, so an exponent is dimensionless.

Hence the exponent \(-\alpha x^2\) in the Gaussian integral is dimensionless. For convenience, denote the dimensions of \(\alpha\) by \([\alpha]\) and of x by \([x]\). Then

\[ [\alpha]\,[x]^2 = 1, \tag{1.13} \]

or

\[ [\alpha] = [x]^{-2}. \tag{1.14} \]

This conclusion is useful, but continuing to use unspecified but general dimensions requires lots of notation, and the notation risks burying the reasoning.

The simplest alternative is to make x dimensionless. That choice makes \(\alpha\) and \(f(\alpha)\) dimensionless, so any candidate for \(f(\alpha)\) would be dimensionally valid, making dimensional analysis again useless. The simplest effective alternative is to give x simple dimensions, for example, length. (This choice is natural if you imagine the x axis lying on the floor.) Then \([\alpha] = \mathrm{L^{-2}}\).

1.3.2 Dimensions of the integral

The assignments \([x] = \mathrm{L}\) and \([\alpha] = \mathrm{L^{-2}}\) determine the dimensions of the Gaussian integral. Here is the integral again:

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx. \tag{1.15} \]

The dimensions of an integral depend on the dimensions of its three pieces: the integral sign \(\int\), the integrand \(e^{-\alpha x^2}\), and the differential \(dx\).

The integral sign originated as an elongated S for Summe, the German word for sum. In a valid sum, all terms have identical dimensions: The fundamental principle of dimensions requires that apples be added only to apples. For the same reason, the entire sum has the same dimensions as any term. Thus, the summation sign, and therefore the integration sign, do not affect dimensions: The integral sign is dimensionless.

Problem 1.6 Integrating velocity
Position is the integral of velocity. However, position and velocity have different dimensions. How is this difference consistent with the conclusion that the integration sign is dimensionless?

Because the integration sign is dimensionless, the dimensions of the integral are the dimensions of the exponential factor \(e^{-\alpha x^2}\) multiplied by the dimensions of \(dx\). The exponential, despite its fierce exponent \(-\alpha x^2\), is merely several copies of e multiplied together. Because e is dimensionless, so is \(e^{-\alpha x^2}\).

What are the dimensions of dx?

To find the dimensions of \(dx\), follow the advice of Silvanus Thompson [45, p. 1]: Read d as "a little bit of." Then \(dx\) is "a little bit of x." A little length is still a length, so \(dx\) is a length. In general, \(dx\) has the same dimensions as x. Equivalently, d, the inverse of \(\int\), is dimensionless.

Assembling the pieces, the whole integral has dimensions of length:

\[ \left[\int e^{-\alpha x^2}\,dx\right] = \underbrace{\left[e^{-\alpha x^2}\right]}_{1} \times \underbrace{[dx]}_{\mathrm{L}} = \mathrm{L}. \tag{1.16} \]

Problem 1.7 Don't integrals compute areas?
A common belief is that integration computes areas. Areas have dimensions of \(\mathrm{L^2}\). How then can the Gaussian integral have dimensions of L?

1.3.3 Making an f(α) with correct dimensions

The third and final step in this dimensional analysis is to construct an \(f(\alpha)\) with the same dimensions as the integral. Because the dimensions of \(\alpha\) are \(\mathrm{L^{-2}}\), the only way to turn \(\alpha\) into a length is to form \(\alpha^{-1/2}\). Therefore,

\[ f(\alpha) \sim \alpha^{-1/2}. \tag{1.17} \]

This useful result, which lacks only a dimensionless factor, was obtained without any integration.

To determine the dimensionless constant, set \(\alpha = 1\) and evaluate

\[ f(1) = \int_{-\infty}^{\infty} e^{-x^2}\,dx. \tag{1.18} \]

This classic integral will be approximated in Section 2.1 and guessed to be \(\sqrt{\pi}\). The two results \(f(1) = \sqrt{\pi}\) and \(f(\alpha) \sim \alpha^{-1/2}\) require that \(f(\alpha) = \sqrt{\pi/\alpha}\), which yields

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}. \tag{1.19} \]

We often memorize the dimensionless constant but forget the power of \(\alpha\). Do not do that. The \(\alpha\) factor is usually much more important than the dimensionless constant. Conveniently, the \(\alpha\) factor is what dimensional analysis can compute.
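The scaling \(f(\alpha) \sim \alpha^{-1/2}\) can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions: the function name `gaussian_integral`, the fixed integration range of \(\pm 60\), the number of trapezoids, and the sample values of \(\alpha\) are all choices made here, not part of the original text. If the dimensional argument holds, the product \(f(\alpha)\sqrt{\alpha}\) should be the same number for every \(\alpha\).

```python
import math


def gaussian_integral(alpha, half_range=60.0, n=120_000):
    """Trapezoid estimate of the integral of exp(-alpha x^2) over [-half_range, half_range].

    For the values of alpha used below, the integrand is negligible at the
    endpoints, so the finite range stands in for the infinite one.
    """
    dx = 2 * half_range / n
    ys = [math.exp(-alpha * (-half_range + i * dx) ** 2) for i in range(n + 1)]
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


alphas = [0.01, 0.25, 1.0, 4.0, 100.0]
scaled = [gaussian_integral(alpha) * math.sqrt(alpha) for alpha in alphas]
for alpha, value in zip(alphas, scaled):
    print(f"alpha = {alpha:6.2f}: f(alpha) * sqrt(alpha) = {value:.10f}")
```

The printed products agree with one another even though \(\alpha\) varies by a factor of 10,000; their common value is the dimensionless constant that dimensional analysis cannot supply.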

Problem 1.8 Change of variable
Rewind back to page 8 and pretend that you do not know \(f(\alpha)\). Without doing dimensional analysis, show that \(f(\alpha) \sim \alpha^{-1/2}\).

Problem 1.9 Easy case α = 1
Setting \(\alpha = 1\), which is an example of easy-cases reasoning (Chapter 2), violates the assumption that x is a length and \(\alpha\) has dimensions of \(\mathrm{L^{-2}}\). Why is it okay to set \(\alpha = 1\)?

Problem 1.10 Integrating a difficult exponential
Use dimensional analysis to investigate \(\int_0^{\infty} e^{-\alpha t^3}\,dt\).

1.4 Summary and further problems

Do not add apples to oranges: Every term in an equation or sum must have identical dimensions! This restriction is a powerful tool. It helps us to evaluate integrals without integrating and to predict the solutions of differential equations. Here are further problems to practice this tool.

Problem 1.11 Integrals using dimensions
Use dimensional analysis to find \(\int_0^{\infty} e^{-ax}\,dx\) and \(\int \frac{dx}{x^2 + a^2}\). A useful result is

\[ \int \frac{dx}{x^2 + 1} = \arctan x + C. \tag{1.20} \]

Problem 1.12 Stefan–Boltzmann law
Blackbody radiation is an electromagnetic phenomenon, so the radiation intensity depends on the speed of light c. It is also a thermal phenomenon, so it depends on the thermal energy \(k_B T\), where T is the object's temperature and \(k_B\) is Boltzmann's constant. And it is a quantum phenomenon, so it depends on Planck's constant h. Thus the blackbody-radiation intensity I depends on c, \(k_B T\), and h. Use dimensional analysis to show that \(I \propto T^4\) and to find the constant of proportionality \(\sigma\). Then look up the missing dimensionless constant. (These results are used in Section 5.3.3.)

Problem 1.13 Arcsine integral
Use dimensional analysis to find \(\int \sqrt{1 - 3x^2}\,dx\). A useful result is

\[ \int \sqrt{1 - x^2}\,dx = \frac{\arcsin x}{2} + \frac{x\sqrt{1 - x^2}}{2} + C. \tag{1.21} \]

Problem 1.14 Related rates
Water is poured into a large inverted cone (with a 90° opening angle) at a rate \(dV/dt = 10\ \mathrm{m^3\,s^{-1}}\). When the water depth is \(h = 5\ \mathrm{m}\), estimate the rate at which the depth is increasing. Then use calculus to find the exact rate.

Problem 1.15 Kepler's third law
Newton's law of universal gravitation, the famous inverse-square law, says that the gravitational force between two masses is

\[ F = -\frac{G m_1 m_2}{r^2}, \tag{1.22} \]

where G is Newton's constant, \(m_1\) and \(m_2\) are the two masses, and r is their separation. For a planet orbiting the sun, universal gravitation together with Newton's second law gives

\[ m\frac{d^2\mathbf{r}}{dt^2} = -\frac{GMm}{r^2}\,\hat{\mathbf{r}}, \tag{1.23} \]

where M is the mass of the sun, m the mass of the planet, \(\mathbf{r}\) is the vector from the sun to the planet, and \(\hat{\mathbf{r}}\) is the unit vector in the \(\mathbf{r}\) direction.

How does the orbital period \(\tau\) depend on orbital radius r? Look up Kepler's third law and compare your result to it.

2 Easy cases

2.1 Gaussian integral revisited
2.2 Plane geometry: The area of an ellipse
2.3 Solid geometry: The volume of a truncated pyramid
2.4 Fluid mechanics: Drag
2.5 Summary and further problems

A correct solution works in all cases, including the easy ones. This maxim underlies the second tool, the method of easy cases. It will help us guess integrals, deduce volumes, and solve exacting differential equations.

2.1 Gaussian integral revisited

As the first application, let's revisit the Gaussian integral from Section 1.3,

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx. \tag{2.1} \]

Is the integral \(\sqrt{\pi\alpha}\) or \(\sqrt{\pi/\alpha}\)?

The correct choice must work for all \(\alpha \ge 0\). At this range's endpoints (\(\alpha = \infty\) and \(\alpha = 0\)), the integral is easy to evaluate.

What is the integral when α = ∞?

As the first easy case, increase \(\alpha\) to \(\infty\). Then \(-\alpha x^2\) becomes very negative, even when x is tiny. The exponential of a large negative number is tiny, so the bell curve narrows to a sliver, and its area shrinks to zero.

[Figure: the narrow bell curve \(e^{-10x^2}\)]

Therefore, as \(\alpha \to \infty\) the integral shrinks to zero. This result refutes the option \(\sqrt{\pi\alpha}\), which is infinite when \(\alpha = \infty\); and it supports the option \(\sqrt{\pi/\alpha}\), which is zero when \(\alpha = \infty\).

What is the integral when α = 0?

In the \(\alpha = 0\) extreme, the bell curve flattens into a horizontal line with unit height.

[Figure: the wide bell curve \(e^{-x^2/10}\)]

Its area, integrated over the infinite range, is infinite. This result refutes the \(\sqrt{\pi\alpha}\) option, which is zero when \(\alpha = 0\); and it supports the \(\sqrt{\pi/\alpha}\) option, which is infinity when \(\alpha = 0\). Thus the \(\sqrt{\pi\alpha}\) option fails both easy-cases tests, and the \(\sqrt{\pi/\alpha}\) option passes both easy-cases tests.

If these two options were the only options, we would choose \(\sqrt{\pi/\alpha}\). However, if a third option were \(\sqrt{2/\alpha}\), how could you decide between it and \(\sqrt{\pi/\alpha}\)? Both options pass both easy-cases tests; they also have identical dimensions. The choice looks difficult.

To choose, try a third easy case: \(\alpha = 1\). Then the integral simplifies to

\[ \int_{-\infty}^{\infty} e^{-x^2}\,dx. \tag{2.2} \]

This classic integral can be evaluated in closed form by using polar coordinates, but that method also requires a trick with few other applications (textbooks on multivariable calculus give the gory details). A less elegant but more general approach is to evaluate the integral numerically and to use the approximate value to guess the closed form.

Therefore, replace the smooth curve \(e^{-x^2}\) with a curve having n line segments. This piecewise-linear approximation turns the area into a sum of n trapezoids. As n approaches infinity, the area of the trapezoids more and more closely approaches the area under the smooth curve.

The table gives the area under the curve in the range \(x = -10 \ldots 10\), after dividing the curve into n line segments.

n     Area
10    2.07326300569564
20    1.77263720482665
30    1.77245385170978
40    1.77245385090552
50    1.77245385090552

The areas settle onto a stable value, and it looks familiar. It begins with 1.7, which might arise from \(\sqrt{3}\). However, it continues as 1.77, which is too large to be \(\sqrt{3}\). Fortunately, \(\pi\) is slightly larger than 3, so the area might be converging to \(\sqrt{\pi}\).
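The trapezoid table can be regenerated in a few lines. The Python sketch below is one way to do it, not necessarily how the original figures were produced; the function name `trapezoid_area` is invented for this illustration. It divides the range from \(-10\) to \(10\) into n equal segments and sums the resulting trapezoids.

```python
import math


def trapezoid_area(n, lo=-10.0, hi=10.0):
    """Area under exp(-x^2) on [lo, hi], approximated by n equal-width trapezoids."""
    dx = (hi - lo) / n
    ys = [math.exp(-(lo + i * dx) ** 2) for i in range(n + 1)]
    # Interior points count fully; the two endpoints count with weight 1/2.
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


for n in (10, 20, 30, 40, 50):
    area = trapezoid_area(n)
    print(f"{n:3d}  {area:.14f}  area^2 = {area * area:.14f}")
```

Printing the squared area alongside each estimate makes the guess easy to test by eye, because the square can be compared directly against \(\pi\).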

Let's check by comparing the squared area against \(\pi\):

\[ \begin{aligned} 1.77245385090552^2 &\approx 3.14159265358980,\\ \pi &\approx 3.14159265358979. \end{aligned} \tag{2.3} \]

The close match suggests that the \(\alpha = 1\) Gaussian integral is indeed \(\sqrt{\pi}\):

\[ \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}. \tag{2.4} \]

Therefore the general Gaussian integral

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx \tag{2.5} \]

must reduce to \(\sqrt{\pi}\) when \(\alpha = 1\). It must also behave correctly in the other two easy cases \(\alpha = 0\) and \(\alpha = \infty\).

Among the three choices \(\sqrt{2/\alpha}\), \(\sqrt{\pi/\alpha}\), and \(\sqrt{\pi\alpha}\), only \(\sqrt{\pi/\alpha}\) passes all three tests \(\alpha = 0\), 1, and \(\infty\). Therefore,

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \sqrt{\frac{\pi}{\alpha}}. \tag{2.6} \]

Easy cases are not the only way to judge these choices. Dimensional analysis, for example, can also restrict the possibilities (Section 1.3). It even eliminates choices like \(\sqrt{\pi}/\alpha\) that pass all three easy-cases tests. However, easy cases are, by design, simple. They do not require us to invent or deduce dimensions for x, \(\alpha\), and \(dx\) (the extensive analysis of Section 1.3). Easy cases, unlike dimensional analysis, can also eliminate choices like \(\sqrt{2/\alpha}\) with correct dimensions. Each tool has its strengths.

Problem 2.1 Testing several alternatives
For the Gaussian integral

\[ \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx, \tag{2.7} \]

use the three easy-cases tests to evaluate the following candidates for its value.

(a) \(\sqrt{\pi}/\alpha\)   (b) \(1 + (\sqrt{\pi} - 1)/\alpha\)   (c) \(1/\alpha^2 + (\sqrt{\pi} - 1)/\alpha\).

Problem 2.2 Plausible, incorrect alternative
Is there an alternative to \(\sqrt{\pi/\alpha}\) that has valid dimensions and passes the three easy-cases tests?

Problem 2.3 Guessing a closed form
Use a change of variable to show that

\[ \int_0^{\infty} \frac{dx}{1 + x^2} = 2\int_0^1 \frac{dx}{1 + x^2}. \tag{2.8} \]

The second integral has a finite integration range, so it is easier than the first integral to evaluate numerically. Estimate the second integral using the trapezoid approximation and a computer or programmable calculator. Then guess a closed form for the first integral.

2.2 Plane geometry: The area of an ellipse

The second application of easy cases is from plane geometry: the area of an ellipse. This ellipse has semimajor axis a and semiminor axis b. For its area A consider the following candidates:

(a) \(ab^2\)   (b) \(a^2 + b^2\)   (c) \(a^3/b\)   (d) \(2ab\)   (e) \(\pi ab\).

What are the merits or drawbacks of each candidate?

The candidate \(A = ab^2\) has dimensions of \(\mathrm{L^3}\), whereas an area must have dimensions of \(\mathrm{L^2}\). Thus \(ab^2\) must be wrong.

The candidate \(A = a^2 + b^2\) has correct dimensions (as do the remaining candidates), so the next tests are the easy cases of the radii a and b. For a, the low extreme \(a = 0\) produces an infinitesimally thin ellipse with zero area. However, when \(a = 0\) the candidate \(A = a^2 + b^2\) reduces to \(A = b^2\) rather than to 0; so \(a^2 + b^2\) fails the \(a = 0\) test.

The candidate \(A = a^3/b\) correctly predicts zero area when \(a = 0\). Because \(a = 0\) was a useful easy case, and the axis labels a and b are almost interchangeable, its symmetric counterpart \(b = 0\) should also be a useful easy case. It too produces an infinitesimally thin ellipse with zero area; alas, the candidate \(a^3/b\) predicts an infinite area, so it fails the \(b = 0\) test. Two candidates remain.

The candidate \(A = 2ab\) shows promise. When \(a = 0\) or \(b = 0\), the actual and predicted areas are zero, so \(A = 2ab\) passes both easy-cases tests. Further testing requires the third easy case: \(a = b\). Then the ellipse becomes a circle with radius a and area \(\pi a^2\). The candidate \(2ab\), however, reduces to \(A = 2a^2\), so it fails the \(a = b\) test.
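The screening of the five ellipse-area candidates is mechanical enough to automate. The Python sketch below is an illustration with invented names (`CANDIDATES`, `checks`, `EPS`); it stands in for the limits \(a = 0\) and \(b = 0\) by using a very small radius, and it tests dimensions by checking that doubling both radii quadruples the predicted area, as any quantity with dimensions of \(\mathrm{L^2}\) must.

```python
import math

# The five candidate formulas for the area, keyed by a readable label.
CANDIDATES = {
    "a*b**2": lambda a, b: a * b**2,
    "a**2 + b**2": lambda a, b: a**2 + b**2,
    "a**3/b": lambda a, b: a**3 / b,
    "2*a*b": lambda a, b: 2 * a * b,
    "pi*a*b": lambda a, b: math.pi * a * b,
}

EPS = 1e-6  # a tiny radius standing in for the limit a -> 0 or b -> 0


def checks(area):
    """Apply the dimensional test and the three easy-cases tests to one candidate."""
    return {
        "dimensions L^2": math.isclose(area(2.0, 6.0), 4 * area(1.0, 3.0)),
        "a = 0": abs(area(EPS, 1.0)) < 1e-3,          # thin ellipse, zero area
        "b = 0": abs(area(1.0, EPS)) < 1e-3,          # thin ellipse, zero area
        "a = b": math.isclose(area(1.0, 1.0), math.pi),  # unit circle has area pi
    }


passing = []
for name, area in CANDIDATES.items():
    results = checks(area)
    failed = [test for test, ok in results.items() if not ok]
    print(f"{name:12s} fails: {', '.join(failed) if failed else 'nothing'}")
    if not failed:
        passing.append(name)
```

The script reports every test a candidate fails, whereas the text stops at the first failure; the surviving candidate is the same either way.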

The candidate \(A = \pi ab\) passes all three tests: \(a = 0\), \(b = 0\), and \(a = b\). With each passing test, our confidence in the candidate increases; and \(\pi ab\) is indeed the correct area (Problem 2.4).

Problem 2.4 Area by calculus
Use integration to show that \(A = \pi ab\).

Problem 2.5 Inventing a passing candidate
Can you invent a second candidate for the area that has correct dimensions and passes the \(a = 0\), \(b = 0\), and \(a = b\) tests?

Problem 2.6 Generalization
Guess the volume of an ellipsoid with principal radii a, b, and c.

2.3 Solid geometry: The volume of a truncated pyramid

The Gaussian-integral example (Section 2.1) and the ellipse-area example (Section 2.2) showed easy cases as a method of analysis: for checking whether formulas are correct. The next level of sophistication is to use easy cases as a method of synthesis: for constructing formulas.

As an example, take a pyramid with a square base and slice a piece from its top using a knife parallel to the base. This truncated pyramid (called the frustum) has a square base and square top parallel to the base. Let h be its vertical height, b be the side length of its base, and a be the side length of its top.

[Figure: truncated pyramid with top side a, base side b, and height h]

What is the volume of the truncated pyramid?

Let's synthesize the formula for the volume. It is a function of the three lengths h, a, and b. These lengths split into two kinds: height and base lengths. For example, flipping the solid on its head interchanges the meanings of a and b but preserves h; and no simple operation interchanges height with a or b. Thus the volume probably has two factors, each containing a length or lengths of only one kind:

\[ V(h, a, b) = f(h) \times g(a, b). \tag{2.9} \]

Proportional reasoning will determine f; a bit of dimensional reasoning and a lot of easy-cases reasoning will determine g.

What is f: How should the volume depend on the height?

To find f, use a proportional-reasoning thought experiment. Chop the solid into vertical slivers, each like an oil-drilling core; then imagine doubling h. This change doubles the volume of each sliver and therefore doubles the whole volume V. Thus \(f \sim h\) and \(V \propto h\):

\[ V = h \times g(a, b). \tag{2.10} \]

What is g: How should the volume depend on a and b?

Because V has dimensions of \(\mathrm{L^3}\), the function \(g(a, b)\) has dimensions of \(\mathrm{L^2}\). That constraint is all that dimensional analysis can say. Further constraints are needed to synthesize g, and these constraints are provided by the method of easy cases.

2.3.1 Easy cases

What are the easy cases of a and b?

The easiest case is the extreme case \(a = 0\) (an ordinary pyramid). The symmetry between a and b suggests two further easy cases, namely \(a = b\) and the extreme case \(b = 0\). The easy cases are then threefold:

[Figure: the three easy cases \(a = 0\), \(b = 0\), and \(a = b\)]

When \(a = 0\), the solid is an ordinary pyramid, and g is a function only of the base side length b. Because g has dimensions of \(\mathrm{L^2}\), the only possibility for g is \(g \sim b^2\); in addition, \(V \propto h\); so, \(V \sim hb^2\). When \(b = 0\), the solid is an upside-down version of the \(a = 0\) pyramid and therefore has volume \(V \sim ha^2\). When \(a = b\), the solid is a rectangular prism having volume \(V = ha^2\) (or \(hb^2\)).

Is there a volume formula that satisfies the three easy-cases constraints?

The $a = 0$ and $b = 0$ constraints are satisfied by the symmetric sum $V \sim h(a^2 + b^2)$. If the missing dimensionless constant is $1/2$, making $V = h(a^2 + b^2)/2$, then the volume also satisfies the $a = b$ constraint, and the volume of an ordinary pyramid ($a = 0$) would be $hb^2/2$.

When $a = 0$, is the prediction $V = hb^2/2$ correct?

Testing the prediction requires finding the exact dimensionless constant in $V \sim hb^2$. This task looks like a calculus problem: Slice a pyramid into thin horizontal sections and add (integrate) their volumes. However, a simple alternative is to apply easy cases again.

The easy case is easier to construct after we solve a similar but simpler problem: to find the area of a triangle with base $b$ and height $h$. The area satisfies $A \sim hb$, but what is the dimensionless constant? To find it, choose $b$ and $h$ to make an easy triangle: a right triangle with $h = b$. Two such triangles make an easy shape: a square with area $b^2$. Thus each right triangle has area $A = b^2/2$; the dimensionless constant is $1/2$. Now extend this reasoning to three dimensions: find an ordinary pyramid (with a square base) that combines with itself to make an easy solid.

What is the easy solid?

A convenient solid is suggested by the pyramid's square base: Perhaps each base is one face of a cube. The cube then requires six pyramids whose tips meet in the center of the cube; thus the pyramids have the aspect ratio $h = b/2$. For numerical simplicity, let's meet this condition with $b = 2$ and $h = 1$.

Six such pyramids form a cube with volume $b^3 = 8$, so the volume of one pyramid is $4/3$. Because each pyramid has volume $V \sim hb^2$, and $hb^2 = 4$ for these pyramids, the dimensionless constant in $V \sim hb^2$ must be $1/3$. The volume of an ordinary pyramid (a pyramid with $a = 0$) is therefore $V = hb^2/3$.

Problem 2.7 Triangular base
Guess the volume of a pyramid with height $h$ and a triangular base of area $A$. Assume that the top vertex lies directly over the centroid of the base. Then try Problem 2.8.
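The constant $1/3$ obtained from the cube of six pyramids can also be checked by the brute-force route that the text sets aside: slicing the pyramid into thin horizontal slabs and adding their volumes. The sketch below is only an illustration; the function name and the number of slabs are arbitrary choices, not part of the original argument.

```python
def pyramid_volume_by_slices(b, h, n=100_000):
    """Add up n thin square slabs of an ordinary pyramid (top side a = 0)."""
    dz = h / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * dz        # height of the slab's midpoint
        side = b * (1 - z / h)    # side length shrinks linearly to 0 at the tip
        total += side * side * dz
    return total

b, h = 2.0, 1.0                   # the cube-friendly case h = b/2
volume = pyramid_volume_by_slices(b, h)
constant = volume / (h * b * b)   # dimensionless constant in V ~ h b^2
print("one pyramid:", volume)
print("six pyramids:", 6 * volume, "cube:", b ** 3)
print("constant:", constant)
```

The numerical constant comes out very close to $1/3$, and six pyramids fill the cube of volume $b^3 = 8$, in agreement with the easy-case argument.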

Problem 2.8 Vertex location
The six pyramids do not make a cube unless each pyramid's top vertex lies directly above the center of the base. Thus the result $V = hb^2/3$ might apply only with this restriction. If instead the top vertex lies above one of the base vertices, what is the volume?

The prediction from the first three easy-cases tests was $V = hb^2/2$ (when $a = 0$), whereas the further easy case $h = b/2$ alongside $a = 0$ just showed that $V = hb^2/3$. The two methods are making contradictory predictions.

How can this contradiction be resolved?

The contradiction must have snuck in during one of the reasoning steps. To find the culprit, revisit each step in turn. The argument for $V \propto h$ looks correct. The three easy-case requirements (that $V \sim hb^2$ when $a = 0$, that $V \sim ha^2$ when $b = 0$, and that $V = h(a^2 + b^2)/2$ when $a = b$) also look correct. The mistake was leaping from these constraints to the prediction $V \sim h(a^2 + b^2)$ for any $a$ or $b$.

Instead let's try the following general form that includes an $ab$ term:

$$V = h(\alpha a^2 + \beta ab + \gamma b^2). \tag{2.11}$$

Then solve for the coefficients $\alpha$, $\beta$, and $\gamma$ by reapplying the easy-cases requirements.

The $b = 0$ test along with the $h = b/2$ easy case, which showed that $V = hb^2/3$ for an ordinary pyramid, require that $\alpha = 1/3$. The $a = 0$ test similarly requires that $\gamma = 1/3$. And the $a = b$ test requires that $\alpha + \beta + \gamma = 1$. Therefore $\beta = 1/3$ and voilà,

$$V = \frac{1}{3} h(a^2 + ab + b^2). \tag{2.12}$$

This formula, the result of proportional reasoning, dimensional analysis, and the method of easy cases, is exact (Problem 2.9)!

Problem 2.9 Integration
Use integration to show that $V = h(a^2 + ab + b^2)/3$.

Problem 2.10 Truncated triangular pyramid
Instead of a pyramid with a square base, start with a pyramid with an equilateral triangle of side length $b$ as its base. Then make the truncated solid by slicing a piece from the top using a knife parallel to the base. In terms of the height $h$ and the top and bottom side lengths $a$ and $b$, what is the volume of this solid? (See also Problem 2.7.)

Problem 2.11 Truncated cone
What is the volume of a truncated cone with a circular base of radius $r_1$ and circular top of radius $r_2$ (with the top parallel to the base)? Generalize your formula to the volume of a truncated pyramid with height $h$, a base of an arbitrary shape and area $A_\text{base}$, and a corresponding top of area $A_\text{top}$.

2.4 Fluid mechanics: Drag

The preceding examples showed that easy cases can check and construct formulas, but the examples can be done without easy cases (for example, with calculus). For the next equations, from fluid mechanics, no exact solutions are known in general, so easy cases and other street-fighting tools are almost the only way to make progress.

Here then are the Navier–Stokes equations of fluid mechanics:

$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{1}{\rho}\nabla p + \nu\nabla^2 \mathbf{v}, \tag{2.13}$$

where $\mathbf{v}$ is the velocity of the fluid (as a function of position and time), $\rho$ is its density, $p$ is the pressure, and $\nu$ is the kinematic viscosity. These equations describe an amazing variety of phenomena including flight, tornadoes, and river rapids.

Our example is the following home experiment on drag. Photocopy this page while magnifying it by a factor of 2; then cut out the following two templates:

[Figure: two cone templates, labeled 1 in and 2 in]

With each template, tape together the shaded areas to make a cone. The two resulting cones have the same shape, but the large cone has twice the height and width of the small cone.

When the cones are dropped point downward, what is the approximate ratio of their terminal speeds (the speeds at which drag balances weight)?

The Navier–Stokes equations contain the answer to this question. Finding the terminal speed involves four steps.

Step 1. Impose boundary conditions. The conditions include the motion of the cone and the requirement that no fluid enters the paper.

Step 2. Solve the equations, together with the continuity equation $\nabla\cdot\mathbf{v} = 0$, in order to find the pressure and velocity at the surface of the cone.

Step 3. Use the pressure and velocity to find the pressure and velocity gradient at the surface of the cone; then integrate the resulting forces to find the net force and torque on the cone.

Step 4. Use the net force and torque to find the motion of the cone. This step is difficult because the resulting motion must be consistent with the motion assumed in step 1. If it is not consistent, go back to step 1, assume a different motion, and hope for better luck upon reaching this step.

Unfortunately, the Navier–Stokes equations are coupled and nonlinear partial-differential equations. Their solutions are known only in very simple cases: for example, a sphere moving very slowly in a viscous fluid, or a sphere moving at any speed in a zero-viscosity fluid. There is little hope of solving for the complicated flow around an irregular, quivering shape such as a flexible paper cone.

Problem 2.12 Checking dimensions in the Navier–Stokes equations
Check that the first three terms of the Navier–Stokes equations have identical dimensions.

Problem 2.13 Dimensions of kinematic viscosity
From the Navier–Stokes equations, find the dimensions of kinematic viscosity $\nu$.

2.4.1 Using dimensions

Because a direct solution of the Navier–Stokes equations is out of the question, let's use the methods of dimensional analysis and easy cases. A direct approach is to use them to deduce the terminal velocity itself. An indirect approach is to deduce the drag force as a function of fall speed and then to find the speed at which the drag balances the weight of the cones. This two-step approach simplifies the problem. It introduces only one new quantity (the drag force) but eliminates two quantities: the gravitational acceleration and the mass of the cone.

Problem 2.14 Explaining the simplification
Why is the drag force independent of the gravitational acceleration $g$ and of the cone's mass $m$ (yet the force depends on the cone's shape and size)?

The principle of dimensions is that all terms in a valid equation have identical dimensions. Applied to the drag force $F$, it means that in the equation $F = f(\text{quantities that affect } F)$ both sides have dimensions of force. Therefore, the strategy is to find the quantities that affect $F$, find their dimensions, and then combine the quantities into a quantity with dimensions of force.

On what quantities does the drag depend, and what are their dimensions?

The drag force depends on four quantities: two parameters of the cone and two parameters of the fluid (air). (For the dimensions of $\nu$, see Problem 2.13.)

    $v$       speed of the cone      $LT^{-1}$
    $r$       size of the cone       $L$
    $\rho$    density of air         $ML^{-3}$
    $\nu$     viscosity of air       $L^2T^{-1}$

Do any combinations of the four parameters $v$, $r$, $\rho$, and $\nu$ have dimensions of force?

The next step is to combine $v$, $r$, $\rho$, and $\nu$ into a quantity with dimensions of force. Unfortunately, the possibilities are numerous, for example,

$$F_1 = \rho v^2 r^2, \qquad F_2 = \rho\nu v r, \tag{2.14}$$

or the product combinations $\sqrt{F_1 F_2}$ and $F_1^2/F_2$. Any sum of these ugly products is also a force, so the drag force $F$ could be $\sqrt{F_1 F_2} + F_1^2/F_2$, $3\sqrt{F_1 F_2} - 2F_1^2/F_2$, or much worse.
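The claim that $F_1$, $F_2$, $\sqrt{F_1 F_2}$, and $F_1^2/F_2$ all have dimensions of force can be checked mechanically by tracking exponents of mass, length, and time. The following is a minimal sketch of that bookkeeping; the helper names `dims_of` and `combine` are invented for this illustration, and the dimensions are the ones listed in the table of the four drag quantities.

```python
from fractions import Fraction

# Dimensions as exponent tuples (M, L, T) for the four drag quantities.
DIMS = {
    "v":   (0, 1, -1),    # speed of the cone
    "r":   (0, 1, 0),     # size of the cone
    "rho": (1, -3, 0),    # density of air
    "nu":  (0, 2, -1),    # kinematic viscosity of air
}
FORCE = (1, 1, -2)        # M L T^-2

def dims_of(powers):
    """Dimensions of a product of named quantities raised to given powers."""
    total = [Fraction(0)] * 3
    for name, power in powers.items():
        for i, exponent in enumerate(DIMS[name]):
            total[i] += Fraction(power) * exponent
    return tuple(total)

def combine(x, px, y, py):
    """Dimensions of x**px * y**py, where x and y are exponent tuples."""
    return tuple(Fraction(px) * a + Fraction(py) * b for a, b in zip(x, y))

F1 = dims_of({"rho": 1, "v": 2, "r": 2})             # rho v^2 r^2
F2 = dims_of({"rho": 1, "nu": 1, "v": 1, "r": 1})    # rho nu v r
geometric_mean = combine(F1, Fraction(1, 2), F2, Fraction(1, 2))
ratio = combine(F1, 2, F2, -1)                       # F1^2 / F2

for label, d in [("F1", F1), ("F2", F2),
                 ("sqrt(F1 F2)", geometric_mean), ("F1^2/F2", ratio)]:
    print(label, "is a force:", d == FORCE)
```

Every candidate passes the dimensions test, which is exactly why dimensions alone cannot select among them.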

Narrowing the possibilities requires a method more sophisticated than simply guessing combinations with correct dimensions. To develop the sophisticated approach, return to the first principle of dimensions: All terms in an equation have identical dimensions. This principle applies to any statement about drag such as

$$A + B = C \tag{2.15}$$

where the blobs $A$, $B$, and $C$ are functions of $F$, $v$, $r$, $\rho$, and $\nu$.

Although the blobs can be absurdly complex functions, they have identical dimensions. Therefore, dividing each term by $A$, which produces the equation

$$\frac{A}{A} + \frac{B}{A} = \frac{C}{A}, \tag{2.16}$$

makes each term dimensionless. The same method turns any valid equation into a dimensionless equation. Thus, any (true) equation describing the world can be written in a dimensionless form.

Any dimensionless form can be built from dimensionless groups: from dimensionless products of the variables. Because any equation describing the world can be written in a dimensionless form, and any dimensionless form can be written using dimensionless groups, any equation describing the world can be written using dimensionless groups.

Is the free-fall example (Section 1.2) consistent with this principle?

Before applying this principle to the complicated problem of drag, try it in the simple example of free fall (Section 1.2). The exact impact speed of an object dropped from a height $h$ is $v = \sqrt{2gh}$, where $g$ is the gravitational acceleration. This result can indeed be written in the dimensionless form $v/\sqrt{gh} = \sqrt{2}$, which itself uses only the dimensionless group $v/\sqrt{gh}$. The new principle passes its first test.

This dimensionless-group analysis of formulas, when reversed, becomes a method of synthesis. Let's warm up by synthesizing the impact speed $v$. First, list the quantities in the problem; here, they are $v$, $g$, and $h$. Second, combine these quantities into dimensionless groups. Here, all dimensionless groups can be constructed just from one group. For that group, let's choose $v^2/gh$ (the particular choice does not affect the conclusion). Then the only possible dimensionless statement is

$$\frac{v^2}{gh} = \text{dimensionless constant}. \tag{2.17}$$

(The right side is a dimensionless constant because no second group is available to use there.) In other words, $v^2/gh \sim 1$ or $v \sim \sqrt{gh}$.

This result reproduces the result of the less sophisticated dimensional analysis in Section 1.2. Indeed, with only one dimensionless group, either analysis leads to the same conclusion. However, in hard problems (for example, finding the drag force) the less sophisticated method does not provide its constraint in a useful form; then the method of dimensionless groups is essential.

Problem 2.15 Fall time
Synthesize an approximate formula for the free-fall time $t$ from $g$ and $h$.

Problem 2.16 Kepler's third law
Synthesize Kepler's third law connecting the orbital period of a planet to its orbital radius. (See also Problem 1.15.)

What dimensionless groups can be constructed for the drag problem?

One dimensionless group could be $F/\rho v^2 r^2$; a second group could be $rv/\nu$. Any other group can be constructed from these groups (Problem 2.17), so the problem is described by two independent dimensionless groups. The most general dimensionless statement is then

$$\text{one group} = f(\text{second group}), \tag{2.18}$$

where $f$ is a still-unknown (but dimensionless) function.

Which dimensionless group belongs on the left side?

The goal is to synthesize a formula for $F$, and $F$ appears only in the first group $F/\rho v^2 r^2$. With that constraint in mind, place the first group on the left side rather than wrapping it in the still-mysterious function $f$. With this choice, the most general statement about drag force is

$$\frac{F}{\rho v^2 r^2} = f\left(\frac{rv}{\nu}\right). \tag{2.19}$$

The physics of the (steady-state) drag force on the cone is all contained in the dimensionless function $f$.
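The free-fall warm-up in this section claims that the group $v^2/gh$ is a pure number, the same for every $g$ and $h$. One way to see that without using the exact formula is to simulate the fall step by step and evaluate the group at impact. The sketch below is an illustration under stated assumptions: the stepping scheme, the function name `impact_speed`, and the sample values of $g$ and $h$ are all choices made here, not part of the original text.

```python
import math

def impact_speed(g, h, steps_per_unit=10_000):
    """Step a dropped object downward until it reaches the ground."""
    dt = math.sqrt(h / g) / steps_per_unit   # time step scaled to sqrt(h/g)
    y, v = h, 0.0
    while y > 0:
        v += g * dt      # speed gained during this step
        y -= v * dt      # height lost during this step
    return v

cases = [(9.8, 1.0), (9.8, 50.0), (1.6, 3.0), (25.0, 0.2)]
groups = []
for g, h in cases:
    v = impact_speed(g, h)
    groups.append(v * v / (g * h))   # the dimensionless group v^2/(g h)
    print(f"g = {g}, h = {h}: v^2/(g h) = {groups[-1]:.3f}")
```

The group comes out close to 2 in every case, whatever the values of $g$ and $h$, which is the content of (2.17) with the dimensionless constant made explicit.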

Problem 2.17 Only two groups
Show that $F$, $v$, $r$, $\rho$, and $\nu$ produce only two independent dimensionless groups.

Problem 2.18 How many groups in general?
Is there a general method to predict the number of independent dimensionless groups? (The answer was given in 1914 by Buckingham [9].)

The procedure might seem pointless, having produced a drag force that depends on the unknown function $f$. But it has greatly improved our chances of finding $f$. The original problem formulation required guessing the four-variable function $h$ in $F = h(v, r, \rho, \nu)$, whereas dimensional analysis reduced the problem to guessing a function of only one variable (the ratio $vr/\nu$). The value of this simplification was eloquently described by the statistician and physicist Harold Jeffreys (quoted in [34, p. 82]):

    A good table of functions of one variable may require a page; that of a function of two variables a volume; that of a function of three variables a bookcase; and that of a function of four variables a library.

Problem 2.19 Dimensionless groups for the truncated pyramid
The truncated pyramid of Section 2.3 has volume

$$V = \frac{1}{3} h(a^2 + ab + b^2). \tag{2.20}$$

Make dimensionless groups from $V$, $h$, $a$, and $b$, and rewrite the volume using these groups. (There are many ways to do so.)

2.4.2 Using easy cases

Although improved, our chances do not look high: Even the one-variable drag problem has no exact solution. But it might have exact solutions in its easy cases. Because the easiest cases are often extreme cases, look first at the extreme cases.

Extreme cases of what?

The unknown function $f$ depends on only $rv/\nu$,

$$\frac{F}{\rho v^2 r^2} = f\left(\frac{rv}{\nu}\right), \tag{2.21}$$

so try extremes of $rv/\nu$. However, to avoid lapsing into mindless symbol pushing, first determine the meaning of $rv/\nu$. This combination $rv/\nu$, often denoted Re, is the famous Reynolds number. (Its physical interpretation requires the technique of lumping and is explained in Section 3.4.3.) The Reynolds number affects the drag force via the unknown function $f$:

$$\frac{F}{\rho v^2 r^2} = f(\mathrm{Re}). \tag{2.22}$$

With luck, $f$ can be deduced at extremes of the Reynolds number; with further luck, the falling cones are an example of one extreme.

Are the falling cones an extreme of the Reynolds number?

The Reynolds number depends on $r$, $v$, and $\nu$. For the speed $v$, everyday experience suggests that the cones fall at roughly $1\,\text{m s}^{-1}$ (within, say, a factor of 2). The size $r$ is roughly $0.1\,\text{m}$ (again within a factor of 2). And the kinematic viscosity of air is $\nu \sim 10^{-5}\,\text{m}^2\,\text{s}^{-1}$. The Reynolds number is

$$\frac{\overbrace{0.1\,\text{m}}^{r} \times \overbrace{1\,\text{m s}^{-1}}^{v}}{\underbrace{10^{-5}\,\text{m}^2\,\text{s}^{-1}}_{\nu}} \sim 10^4. \tag{2.23}$$

It is significantly greater than 1, so the falling cones are an extreme case of high Reynolds number. (For low Reynolds number, try Problem 2.27 and see [38].)

Problem 2.20 Reynolds numbers in everyday flows
Estimate Re for a submarine cruising underwater, a falling pollen grain, a falling raindrop, and a 747 crossing the Atlantic.

The high-Reynolds-number limit can be reached many ways. One way is to shrink the viscosity $\nu$ to 0, because $\nu$ lives in the denominator of the Reynolds number. Therefore, in the limit of high Reynolds number, viscosity disappears from the problem and the drag force should not depend on viscosity. This reasoning contains several subtle untruths, yet its conclusion is mostly correct. (Clarifying the subtleties required two centuries of progress in mathematics, culminating in singular perturbations and the theory of boundary layers [12, 46].)

Viscosity affects the drag force only through the Reynolds number:

$$\frac{F}{\rho v^2 r^2} = f\left(\frac{rv}{\nu}\right). \tag{2.24}$$
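The Reynolds-number estimate (2.23) for the falling cones is a one-line calculation, and it is worth seeing how little the stated factor-of-2 uncertainties matter. The sketch below uses only the values quoted in the text; the variable names are chosen here for illustration.

```python
r = 0.1      # cone size in m (quoted as good to within a factor of 2)
v = 1.0      # fall speed in m/s (quoted as good to within a factor of 2)
nu = 1e-5    # kinematic viscosity of air in m^2/s

reynolds = r * v / nu
# Worst cases if both r and v are off by a factor of 2 in the same direction.
lowest = (r / 2) * (v / 2) / nu
highest = (2 * r) * (2 * v) / nu

print(f"Re ~ {reynolds:.0e} (range {lowest:.0e} to {highest:.0e})")
```

Even the lowest value in the range is in the thousands, so the conclusion that the cones are at the high-Reynolds-number extreme does not depend on the rough inputs.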

To make $F$ independent of viscosity, $F$ must be independent of Reynolds number! The problem then contains only one independent dimensionless group, $F/\rho v^2 r^2$, so the most general statement about drag is

$$\frac{F}{\rho v^2 r^2} = \text{dimensionless constant}. \tag{2.25}$$

The drag force itself is then

$$F \sim \rho v^2 r^2. \tag{2.26}$$

Because $r^2$ is proportional to the cone's cross-sectional area $A$, the drag force is commonly written

$$F \sim \rho v^2 A. \tag{2.27}$$

Although the derivation was for falling cones, the result applies to any object as long as the Reynolds number is high. The shape affects only the missing dimensionless constant. For a sphere, it is roughly $1/4$; for a long cylinder moving perpendicular to its axis, it is roughly $1/2$; and for a flat plate moving perpendicular to its face, it is roughly 1.

2.4.3 Terminal velocities

The result $F \sim \rho v^2 A$ is enough to predict the terminal velocities of the cones. Terminal velocity means zero acceleration, so the drag force must balance the weight. The weight is $W = \sigma_\text{paper} A_\text{paper}\, g$, where $\sigma_\text{paper}$ is the areal density of paper (mass per area) and $A_\text{paper}$ is the area of the template after cutting out the quarter section. Because $A_\text{paper}$ is comparable to the cross-sectional area $A$, the weight is roughly given by

$$W \sim \sigma_\text{paper} A g.$$

[Figure: forces on the falling cone, labeled $F_\text{drag}$ and $W = mg$]

Therefore,

$$\underbrace{\rho v^2 A}_{\text{drag}} \sim \underbrace{\sigma_\text{paper} A g}_{\text{weight}}. \tag{2.28}$$

The area divides out and the terminal velocity becomes

$$v \sim \sqrt{\frac{g\sigma_\text{paper}}{\rho}}. \tag{2.29}$$

All cones constructed from the same paper and having the same shape, whatever their size, fall at the same speed!
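To see the area cancel numerically, the balance $\rho v^2 A \sim \sigma_\text{paper} A g$ can be solved for cones of different sizes. The sketch below is hedged in two ways: the areal density of 0.08 kg per square meter (ordinary 80 g/m² office paper) and the air density of 1.2 kg per cubic meter are assumed values that do not appear in the text, and the dimensionless constant is taken as 1, so the speed is only an order-of-magnitude estimate.

```python
import math

G = 9.8              # gravitational acceleration, m/s^2
RHO_AIR = 1.2        # density of air, kg/m^3 (assumed typical value)
SIGMA_PAPER = 0.08   # areal density of paper, kg/m^2 (assumed: 80 g/m^2 paper)

def terminal_speed(area):
    """Speed at which drag rho v^2 A balances weight sigma A g."""
    weight = SIGMA_PAPER * area * G
    return math.sqrt(weight / (RHO_AIR * area))

small_area = 0.005            # m^2, illustrative cross-section of the small cone
large_area = 4 * small_area   # twice the width means four times the area

v_small = terminal_speed(small_area)
v_large = terminal_speed(large_area)
print(f"small cone: {v_small:.2f} m/s, large cone: {v_large:.2f} m/s")
```

Both cones get the same speed, a bit under 1 m/s with these assumed inputs, which is consistent with the everyday estimate of roughly 1 m/s used for the Reynolds number.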

To test this prediction, I constructed the small and large cones described on page 21, held one in each hand above my head, and let them fall. Their 2 m fall lasted roughly 2 s, and they landed within 0.1 s of one another. Cheap experiment and cheap theory agree!

Problem 2.21 Home experiment of a small versus a large cone
Try the cone home experiment yourself (page 21).

Problem 2.22 Home experiment of four stacked cones versus one cone
Predict the ratio

$$\frac{\text{terminal velocity of four small cones stacked inside each other}}{\text{terminal velocity of one small cone}}. \tag{2.30}$$

Test your prediction. Can you find a method not requiring timing equipment?

Problem 2.23 Estimating the terminal speed
Estimate or look up the areal density of paper; predict the cones' terminal speed; and then compare that prediction to the result of the home experiment.

2.5 Summary and further problems

A correct solution works in all cases, including the easy ones. Therefore, check any proposed formula in the easy cases, and guess formulas by constructing expressions that pass all easy-cases tests. To apply and extend these ideas, try the following problems and see the concise and instructive book by Cipra [10].

Problem 2.24 Fencepost errors
A garden has 10 m of horizontal fencing that you would like to divide into 1 m segments by using vertical posts. Do you need 10 or 11 vertical posts (including the posts needed at the ends)?

Problem 2.25 Odd sum
Here is the sum of the first $n$ odd integers:

$$S_n = \underbrace{1 + 3 + 5 + \cdots + l_n}_{n\ \text{terms}} \tag{2.31}$$

a. Does the last term $l_n$ equal $2n + 1$ or $2n - 1$?
b. Use easy cases to guess $S_n$ (as a function of $n$).

An alternative solution is discussed in Section 4.1.

Problem 2.26 Free fall with initial velocity
The ball in Section 1.2 was released from rest. Now imagine that it is given an initial velocity $v_0$ (where positive $v_0$ means an upward throw). Guess the impact velocity $v_i$. Then solve the free-fall differential equation to find the exact $v_i$, and compare the exact result to your guess.

Problem 2.27 Low Reynolds number
In the limit $\mathrm{Re} \ll 1$, guess the form of $f$ in

$$\frac{F}{\rho v^2 r^2} = f\left(\frac{rv}{\nu}\right). \tag{2.32}$$

The result, when combined with the correct dimensionless constant, is known as Stokes drag [12].

Problem 2.28 Range formula
How far does a rock travel horizontally (no air resistance)? Use dimensions and easy cases to guess a formula for the range $R$ as a function of the launch velocity $v$, the launch angle $\theta$, and the gravitational acceleration $g$.

[Figure: trajectory of the rock, labeled with $v$, $\theta$, and $R$]

Problem 2.29 Spring equation
The angular frequency of an ideal mass–spring system (Section 3.4.2) is $\sqrt{k/m}$, where $k$ is the spring constant and $m$ is the mass. This expression has the spring constant $k$ in the numerator. Use extreme cases of $k$ or $m$ to decide whether that placement is correct.

Problem 2.30 Taping the cone templates
The tape mark on the large cone template (page 21) is twice as wide as the tape mark on the small cone template. In other words, if the tape on the large cone is, say, 6 mm wide, the tape on the small cone should be 3 mm wide. Why?

3 Lumping

3.1 Estimating populations: How many babies?
3.2 Estimating integrals
3.3 Estimating derivatives
3.4 Analyzing differential equations: The spring–mass system
3.5 Predicting the period of a pendulum
3.6 Summary and further problems

Where will an orbiting planet be 6 months from now? To predict its new location, we cannot simply multiply the 6 months by the planet's current velocity, for its velocity constantly varies. Such calculations are the reason that calculus was invented. Its fundamental idea is to divide the time into tiny intervals over which the velocity is constant, to multiply each tiny time by the corresponding velocity to compute a tiny distance, and then to add the tiny distances. Amazingly, this computation can often be done exactly, even when the intervals have infinitesimal width and are therefore infinite in number. However, the symbolic manipulations can be lengthy and, worse, are often rendered impossible by even small changes to the problem. Using calculus methods, for example, we can exactly calculate the area under the Gaussian $e^{-x^2}$ between $x = 0$ and $\infty$; yet if either limit is any value except zero or infinity, an exact calculation becomes impossible.

In contrast, approximate methods are robust: They almost always provide a reasonable answer. And the least accurate but most robust method is lumping. Instead of dividing a changing process into many tiny pieces, group or lump it into one or two pieces. This simple approximation and its advantages are illustrated using examples ranging from demographics (Section 3.1) to nonlinear differential equations (Section 3.5).

3.1 Estimating populations: How many babies?

The first example is to estimate the number of babies in the United States. For definiteness, call a child a baby until he or she turns 2 years old. An exact calculation requires the birth dates of every person in the United States. This, or closely similar, information is collected once every decade by the US Census Bureau.

As an approximation to this voluminous data, the Census Bureau [47] publishes the number of people at each age. The data for 1991 is a set of points lying on a wiggly line $N(t)$, where $t$ is age. Then

$$N_\text{babies} = \int_0^{2\,\text{yr}} N(t)\,dt. \tag{3.1}$$

[Figure: census curve $N(t)$, vertical axis in $10^6$ per year, horizontal axis age (yr)]

Problem 3.1 Dimensions of the vertical axis
Why is the vertical axis labeled in units of people per year rather than in units of people? Equivalently, why does the axis have dimensions of $T^{-1}$?

This method has several problems. First, it depends on the huge resources of the US Census Bureau, so it is not usable on a desert island for back-of-the-envelope calculations. Second, it requires integrating a curve with no analytic form, so the integration must be done numerically. Third, the integral is of data specific to this problem, whereas mathematics should be about generality. An exact integration, in short, provides little insight and has minimal transfer value.

Instead of integrating the population curve exactly, approximate it: lump the curve into one rectangle.

What are the height and width of this rectangle?

The rectangle's width is a time, and a plausible time related to populations is the life expectancy. It is roughly 80 years, so make 80 years the width by pretending that everyone dies abruptly on his or her 80th birthday. The rectangle's height can be computed from the rectangle's area, which is the US population, conveniently 300 million in 2008. Therefore,

$$\text{height} = \frac{\text{area}}{\text{width}} \sim \frac{3 \times 10^8}{75\,\text{yr}}. \tag{3.2}$$

Why did the life expectancy drop from 80 to 75 years?

Fudging the life expectancy simplifies the mental division: 75 divides easily into 3 and 300. The inaccuracy is no larger than the error made by lumping, and it might even cancel the lumping error. Using 75 years as the width makes the height approximately $4 \times 10^6\,\text{yr}^{-1}$.

[Figure: census data and the lumped rectangle, with the babies at the left edge; horizontal axis age (yr) from 0 to 75]

Integrating the population curve over the range $t = 0 \ldots 2\,\text{yr}$ becomes just multiplication:

$$N_\text{babies} \sim \underbrace{4 \times 10^6\,\text{yr}^{-1}}_{\text{height}} \times \underbrace{2\,\text{yr}}_{\text{infancy}} = 8 \times 10^6. \tag{3.3}$$

The Census Bureau's figure is very close: $7.980 \times 10^6$. The error from lumping canceled the error from fudging the life expectancy to 75 years!

Problem 3.2 Landfill volume
Estimate the US landfill volume used annually by disposable diapers.

Problem 3.3 Industry revenues
Estimate the annual revenue of the US diaper industry.

3.2 Estimating integrals

The US population curve (Section 3.1) was difficult to integrate partly because it was unknown. But even well-known functions can be difficult to integrate. In such cases, two lumping methods are particularly useful: the $1/e$ heuristic (Section 3.2.1) and the full width at half maximum (FWHM) heuristic (Section 3.2.2).

3.2.1 1/e heuristic

Electronic circuits, atmospheric pressure, and radioactive decay contain the ubiquitous exponential and its integral (given here in dimensionless form)

$$\int_0^\infty e^{-t}\,dt. \tag{3.4}$$

[Figure: the curve $e^{-t}$ versus $t$]

To approximate its value, let's lump the $e^{-t}$ curve into one rectangle.

What values should be chosen for the width and height of the rectangle?

A reasonable height for the rectangle is the maximum of $e^{-t}$, namely 1. To choose its width, use significant change as the criterion (a method used again in Section 3.3.3): Choose a significant change in $e^{-t}$; then find the width $\Delta t$ that produces this change. In an exponential decay, a simple and natural significant change is when $e^{-t}$ becomes a factor of $e$ closer to its final value (which is 0 here because $t$ goes to $\infty$). With this criterion, $\Delta t = 1$. The lumping rectangle then has unit area, which is the exact value of the integral!

[Figure: $e^{-t}$ and its lumped rectangle of height 1 and width 1]

Encouraged by this result, let's try the heuristic on the difficult integral

$$\int_{-\infty}^{\infty} e^{-x^2}\,dx. \tag{3.5}$$

Again lump the area into a single rectangle. Its height is the maximum of $e^{-x^2}$, which is 1. Its width is enough that $e^{-x^2}$ falls by a factor of $e$. This drop happens at $x = \pm 1$, so the width is $\Delta x = 2$ and its area is $1 \times 2$. The exact area is $\sqrt{\pi} \approx 1.77$ (Section 2.1), so lumping makes an error of only 13%: For such a short derivation, the accuracy is extremely high.

[Figure: $e^{-x^2}$ and its lumped rectangle from $x = -1$ to $x = 1$]

Problem 3.4 General exponential decay
Use lumping to estimate the integral

$$\int_0^\infty e^{-at}\,dt. \tag{3.6}$$

Use dimensional analysis and easy cases to check that your answer makes sense.

Problem 3.5 Atmospheric pressure
Atmospheric density $\rho$ decays roughly exponentially with height $z$:

$$\rho \sim \rho_0 e^{-z/H}, \tag{3.7}$$

where $\rho_0$ is the density at sea level, and $H$ is the so-called scale height (the height at which the density falls by a factor of $e$). Use your everyday experience to estimate $H$. Then estimate the atmospheric pressure at sea level by estimating the weight of an infinitely high cylinder of air.

Problem 3.6 Cone free-fall distance
Roughly how far does a cone of Section 2.4 fall before reaching a significant fraction of its terminal velocity? How large is that distance compared to the drop height of 2 m? Hint: Sketch (very roughly) the cone's acceleration versus time and make a lumping approximation.

3.2.2 Full width at half maximum

Another reasonable lumping heuristic arose in the early days of spectroscopy. As a spectroscope swept through a range of wavelengths, a chart recorder would plot how strongly a molecule absorbed radiation of that wavelength. This curve contains many peaks whose location and area reveal the structure of the molecule (and were essential in developing quantum theory [14]). But decades before digital chart recorders existed, how could the areas of the peaks be computed?

They were computed by lumping the peak into a rectangle whose height is the height of the peak and whose width is the full width at half maximum (FWHM). Where the $1/e$ heuristic uses a factor of $e$ as the significant change, the FWHM heuristic uses a factor of 2.

Try this recipe on the Gaussian integral $\int_{-\infty}^{\infty} e^{-x^2}\,dx$.

The maximum height of $e^{-x^2}$ is 1, so the half maxima are at $x = \pm\sqrt{\ln 2}$ and the full width is $2\sqrt{\ln 2}$. The lumped rectangle therefore has area $2\sqrt{\ln 2} \approx 1.665$. The exact area is $\sqrt{\pi} \approx 1.77$ (Section 2.1): The FWHM heuristic makes an error of only 6%, which is roughly one-half the error of the $1/e$ heuristic.

[Figure: $e^{-x^2}$ and its FWHM rectangle from $x = -\sqrt{\ln 2}$ to $x = \sqrt{\ln 2}$]

Problem 3.7 Trying the FWHM heuristic
Make single-rectangle lumping estimates of the following integrals. Choose the height and width of the rectangle using the FWHM heuristic. How accurate is each estimate?

a. $\displaystyle\int_{-\infty}^{\infty} \frac{1}{1 + x^2}\,dx$ [exact value: $\pi$].
b. $\displaystyle\int_{-\infty}^{\infty} e^{-x^4}\,dx$ [exact value: $\Gamma(1/4)/2 \approx 1.813$].
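The two lumping estimates of the Gaussian integral, from the $1/e$ heuristic and from the FWHM heuristic, can be compared side by side with the exact value $\sqrt{\pi}$. The sketch below simply restates the two rectangles from the text as arithmetic; the variable names are chosen here for illustration.

```python
import math

exact = math.sqrt(math.pi)    # exact area under exp(-x^2)

# 1/e heuristic: height 1, width from x = -1 to x = +1.
one_over_e = 1.0 * 2.0
# FWHM heuristic: height 1, width between the half maxima at +-sqrt(ln 2).
fwhm = 1.0 * 2.0 * math.sqrt(math.log(2))

errors = {}
for label, estimate in [("1/e", one_over_e), ("FWHM", fwhm)]:
    errors[label] = (estimate - exact) / exact
    print(f"{label:>4}: estimate {estimate:.3f}, "
          f"exact {exact:.3f}, error {100 * errors[label]:+.1f}%")
```

The $1/e$ rectangle overestimates the area by about 13%, whereas the FWHM rectangle underestimates it by about 6%, matching the figures quoted above.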

3.2.3 Stirling's approximation

The $1/e$ and FWHM lumping heuristics next help us approximate the ubiquitous factorial function $n!$; this function's uses range from probability theory to statistical mechanics and the analysis of algorithms. For positive integers, $n!$ is defined as $n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$. In this discrete form, it is difficult to approximate. However, the integral representation for $n!$,

$$n! \equiv \int_0^\infty t^n e^{-t}\,dt, \tag{3.8}$$

provides a definition even when $n$ is not a positive integer, and this integral can be approximated using lumping.

The lumping analysis will generate almost all of Stirling's famous approximation formula

$$n! \approx n^n e^{-n}\sqrt{2\pi n}. \tag{3.9}$$

Lumping requires a peak, but does the integrand $t^n e^{-t}$ have a peak?

To understand the integrand $t^n e^{-t}$ or $t^n/e^t$, examine the extreme cases of $t$. When $t = 0$, the integrand is 0. In the opposite extreme, $t \to \infty$, the polynomial factor $t^n$ makes the product infinity while the exponential factor $e^{-t}$ makes it zero. Who wins that struggle? The Taylor series for $e^t$ contains every power of $t$ (and with positive coefficients), so it is an increasing, infinite-degree polynomial. Therefore, as $t$ goes to infinity, $e^t$ outruns any polynomial $t^n$ and makes the integrand $t^n/e^t$ equal 0 in the $t \to \infty$ extreme. Being zero at both extremes, the integrand must have a peak in between. In fact, it has exactly one peak. (Can you show that?)

Increasing $n$ strengthens the polynomial factor $t^n$, so $t^n$ survives until higher $t$ before $e^t$ outruns it. Therefore, the peak of $t^n/e^t$ shifts right as $n$ increases. The graph confirms this prediction and suggests that the peak occurs at $t = n$. Let's check by using calculus to maximize $t^n e^{-t}$ or, more simply, to maximize its logarithm $f(t) = n\ln t - t$.

[Figure: graphs of $te^{-t}$, $t^2e^{-t}$, and $t^3e^{-t}$ versus $t$]

At a peak, a function has zero slope. Because $df/dt = n/t - 1$, the peak occurs at $t_\text{peak} = n$, when the integrand $t^n e^{-t}$ is $n^n e^{-n}$, thus reproducing the largest and most important factor in Stirling's formula.
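Both claims about the integrand, that its peak sits at $t = n$ and that Stirling's formula (3.9) tracks $n!$ closely, are easy to check numerically. The sketch below is one way to do so; the grid search for the peak and the chosen values of $n$ are illustrative choices made here, not part of the original analysis.

```python
import math

def integrand(t, n):
    """The integrand t^n e^(-t) of the integral representation of n!."""
    return t ** n * math.exp(-t)

def stirling(n):
    """Stirling's approximation n^n e^(-n) sqrt(2 pi n)."""
    return n ** n * math.exp(-n) * math.sqrt(2 * math.pi * n)

grid = [i / 1000 for i in range(1, 60_001)]   # t from 0.001 to 60

results = {}
for n in (1, 2, 5, 10, 20):
    t_peak = max(grid, key=lambda t: integrand(t, n))   # numerical peak location
    ratio = math.factorial(n) / stirling(n)             # exact n! over Stirling
    results[n] = (t_peak, ratio)
    print(f"n = {n:2d}: peak at t = {t_peak:.3f}, n!/Stirling = {ratio:.4f}")
```

The peak lands at $t = n$ in each case, and the ratio of $n!$ to Stirling's formula approaches 1 as $n$ grows; even at $n = 1$ the formula is within about 10%.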

Although √ the exact 2π factor remains mysterious (Problem 3.12) n! ∼ nn e−n n × √ 8 ln 2 (FWHM criterion: F = 2).9). and the n factor is from the width of the rectangle. Lumping has explained almost every factor. Stirling’s formula is n! ≈ nn e−n 2πn.10) The second term of the Taylor expansion vanishes because f(t) has zero slope at the peak. use either the 1/e or the FWHM heuristics. This √ choice means Δt = 2n ln F. lumping also provides . the second derivative d2 f/dt2 at t = n is −n/t2 or −1/n. it is approximated to within 13% (the 1/e heuristic) or 6% (the FWHM heuristic). Because integration and diﬀerentiation are closely related. Problem 3. Thus. the lumped-area estimate of n! is √ √ 8 (1/e criterion: F = e) (3.3 Estimating derivatives What is a reasonable lumping rectangle? The rectangle’s height is the peak height nn e−n .2) was also accurate to 6%.9 Exact constant in Stirling’s formula √ Where does the more accurate constant factor of 2π come from? 3. 2n (3.11) To decrease tn e−t by a factor of F requires decreasing f(t) by ln F. expand its logarithm f(t) in a Taylor series around its peak at t = n: f(n + Δt) = f(n) + Δt df dt t=n 37 2Δt nn /en tn e−t + (Δt)2 d2 f 2 dt2 t=n + ···. For the rectangle’s width. The nn e−n factor is the height of the rec√ tangle.8 Coincidence? The FWHM approximation for the area under a Gaussian (Section 3.3 Estimating derivatives In the preceding examples.3. lumping helped estimate integrals. Because the rectangle’s width is 2Δt. Coincidence? Problem 3. (3. Because both heuristic require approximating tn e−t . f(n + Δt) ≈ f(n) − (Δt)2 .2. √ For comparison. In the third term.

a method for estimating derivatives. The method begins with a dimensional observation about derivatives. A derivative is a ratio of differentials; for example, $df/dx$ is the ratio of $df$ to $dx$. Because $d$ is dimensionless (Section 1.2), the dimensions of $df/dx$ are the dimensions of $f/x$. This useful, surprising conclusion is worth testing with a familiar example: Differentiating height $y$ with respect to time $t$ produces velocity $dy/dt$, whose dimensions of $\mathrm{LT}^{-1}$ are indeed the dimensions of $y/t$.

Problem 3.10 Dimensions of a second derivative
What are the dimensions of $d^2 f/dx^2$?

3.3.1 Secant approximation

As $df/dx$ and $f/x$ have identical dimensions, perhaps their magnitudes are similar:

$$\frac{df}{dx} \sim \frac{f}{x}. \tag{3.13}$$

[Figure: the curve $x^2$ with its tangent and secant lines.]

Geometrically, the derivative $df/dx$ is the slope of the tangent line, whereas the approximation $f/x$ is the slope of the secant line. By replacing the curve with the secant line, we make a lumping approximation.

Let's test the approximation on an easy function such as $f(x) = x^2$. Good news: the secant and tangent slopes differ only by a factor of 2:

$$\frac{df}{dx} = 2x \quad\text{and}\quad \frac{f(x)}{x} = x. \tag{3.14}$$

Problem 3.11 Higher powers
Investigate the secant approximation for $f(x) = x^n$.

Problem 3.12 Second derivatives
Use the secant approximation to estimate $d^2 f/dx^2$ with $f(x) = x^2$. How does the approximation compare to the exact second derivative?

How accurate is the secant approximation for $f(x) = x^2 + 100$?

The secant approximation is quick and useful but can make large errors. When $f(x) = x^2 + 100$, for example, the secant and tangent at $x = 1$

have dramatically different slopes. The tangent slope $df/dx$ is 2, whereas the secant slope $f(1)/1$ is 101. The ratio of these two slopes, although dimensionless, is distressingly large.

Problem 3.13 Investigating the discrepancy
With $f(x) = x^2 + 100$, sketch the ratio

$$\frac{\text{secant slope}}{\text{tangent slope}} \tag{3.15}$$

as a function of $x$. The ratio is not constant! Why is the dimensionless factor not constant? (That question is tricky.)

The large discrepancy in replacing the derivative $df/dx$, which is

$$\lim_{\Delta x \to 0} \frac{f(x) - f(x - \Delta x)}{\Delta x}, \tag{3.16}$$

with the secant slope $f(x)/x$ is due to two approximations. The first approximation is to take $\Delta x = x$ rather than $\Delta x = 0$. Then $df/dx \approx (f(x) - f(0))/x$. This first approximation produces the slope of the line from $(0, f(0))$ to $(x, f(x))$. The second approximation replaces $f(0)$ with 0, which produces $df/dx \approx f/x$; that ratio is the slope of the secant from $(0, 0)$ to $(x, f(x))$.

3.3.2 Improved secant approximation

The second approximation is fixed by starting the secant at $(0, f(0))$ instead of $(0, 0)$.

With that change, what are the secant and tangent slopes when $f(x) = x^2 + C$?

[Figure: the curve $x^2 + C$ with its tangent and its $x = 0$ secant.]

Call the secant starting at $(0, 0)$ the origin secant; call the new secant the $x = 0$ secant. Then the $x = 0$ secant always has one-half the slope of the tangent, no matter the constant $C$. The $x = 0$ secant approximation is robust against (is unaffected by) vertical translation.

How robust is the $x = 0$ secant approximation against horizontal translation?

To investigate how the $x = 0$ secant handles horizontal translation, translate $f(x) = x^2$ rightward by 100 to make $f(x) = (x - 100)^2$. At the parabola's

vertex $x = 100$, the $x = 0$ secant, from $(0, 10^4)$ to $(100, 0)$, has slope $-100$; however, the tangent has zero slope. Thus the $x = 0$ secant, although an improvement on the origin secant, is affected by horizontal translation.

3.3.3 Significant-change approximation

The derivative itself is unaffected by horizontal and vertical translation, so a derivative suitably approximated might be translation invariant. An approximate derivative is

$$\frac{df}{dx} \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}, \tag{3.17}$$

where $\Delta x$ is not zero but is still small.

How small should $\Delta x$ be? Is $\Delta x = 0.01$ small enough?

The choice $\Delta x = 0.01$ has two defects. First, it cannot work when $x$ has dimensions. If $x$ is a length, what length is small enough? Choosing $\Delta x = 1\,\text{mm}$ is probably small enough for computing derivatives related to the solar system, but is probably too large for computing derivatives related to falling fog droplets. Second, no fixed choice can be scale invariant. Although $\Delta x = 0.01$ produces accurate derivatives when $f(x) = \sin x$, it fails when $f(x) = \sin 1000x$, the result of simply rescaling $x$ to $1000x$.

These problems suggest trying the following significant-change approximation:

$$\frac{df}{dx} \sim \frac{\text{significant } \Delta f \text{ (change in } f\text{) at } x}{\Delta x \text{ that produces a significant } \Delta f}. \tag{3.18}$$

Because the $\Delta x$ here is defined by the properties of the curve at the point of interest, without favoring particular coordinate values or values of $\Delta x$, the approximation is scale and translation invariant.

[Figure: $\cos x$ from $(0, 1)$ to $(2\pi, 1)$, with the origin secant and the $x = 0$ secant drawn to the point at $x = 3\pi/2$.]

To illustrate this approximation, let's try $f(x) = \cos x$ and estimate $df/dx$ at $x = 3\pi/2$ with the three approximations: the origin secant, the $x = 0$ secant, and the significant-change approximation.

The origin secant goes from $(0, 0)$ to $(3\pi/2, 0)$, so it has zero slope. It is a poor approximation to the exact slope of 1. The $x = 0$

secant goes from $(0, 1)$ to $(3\pi/2, 0)$, so it has a slope of $-2/3\pi$, which is worse than predicting zero slope because even the sign is wrong!

[Figure: $\cos x$ near $x = 3\pi/2$, marking the points $(3\pi/2, 0)$, $(5\pi/3, 1/2)$, and $(2\pi, 1)$.]

The significant-change approximation might provide more accuracy. What is a significant change in $f(x) = \cos x$? Because the cosine changes by 2 (from $-1$ to 1), call 1/2 a significant change in $f(x)$. That change happens when $x$ changes from $3\pi/2$, where $f(x) = 0$, to $3\pi/2 + \pi/6$, where $f(x) = 1/2$. In other words, $\Delta x$ is $\pi/6$. The approximate derivative is therefore

$$\frac{df}{dx} \sim \frac{\text{significant } \Delta f \text{ near } x}{\Delta x} \sim \frac{1/2}{\pi/6} = \frac{3}{\pi}. \tag{3.19}$$

This estimate is approximately 0.955, amazingly close to the true derivative of 1.

Problem 3.14 Derivative of a quadratic
With $f(x) = x^2$, estimate $df/dx$ at $x = 5$ using three approximations: the origin secant, the $x = 0$ secant, and the significant-change approximation. Compare these estimates to the true slope.

Problem 3.15 Derivative of the logarithm
Use the significant-change approximation to estimate the derivative of $\ln x$ at $x = 10$. Compare the estimate to the true slope.

Problem 3.16 Lennard–Jones potential
The Lennard–Jones potential is a model of the interaction energy between two nonpolar molecules such as N2 or CH4. It has the form

$$V(r) = 4\epsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right], \tag{3.20}$$

where $r$ is the distance between the molecules, and $\epsilon$ and $\sigma$ are constants that depend on the molecules. Use the origin secant to estimate $r_0$, the separation $r$ at which $V(r)$ is a minimum. Compare the estimate to the true $r_0$ found using calculus.

Problem 3.17 Approximate maxima and minima
Let $f(x)$ be an increasing function and $g(x)$ a decreasing function. Use the origin secant to show, approximately, that $h(x) = f(x) + g(x)$ has a minimum where $f(x) = g(x)$. This useful rule of thumb, which generalizes Problem 3.16, is often called the balancing heuristic.
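The three estimates for $f(x) = \cos x$ at $x = 3\pi/2$ can be reproduced in a few lines. This is a minimal sketch; the helper names are invented for illustration, and the significant change ($\Delta f = 1/2$ over $\Delta x = \pi/6$) is supplied by hand, exactly as in the text, rather than discovered by the code.

```python
import math

def origin_secant(f, x):
    """Slope of the line from (0, 0) to (x, f(x))."""
    return f(x) / x

def x0_secant(f, x):
    """Slope of the line from (0, f(0)) to (x, f(x))."""
    return (f(x) - f(0.0)) / x

def significant_change(delta_f, delta_x):
    """Ratio of a significant change in f to the change in x that produces it."""
    return delta_f / delta_x

x = 1.5 * math.pi
estimates = {
    "origin secant": origin_secant(math.cos, x),
    "x = 0 secant": x0_secant(math.cos, x),
    # cos x rises from 0 at 3*pi/2 to 1/2 at 3*pi/2 + pi/6
    "significant change": significant_change(0.5, math.pi / 6.0),
}
true_slope = -math.sin(x)  # exact derivative of cos x

for name, value in estimates.items():
    print(f"{name}: {value:.4f} (true slope {true_slope:.4f})")
```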

3.4 Analyzing differential equations: The spring–mass system

Estimating derivatives reduces differentiation to division (Section 3.3); it thereby reduces differential equations to algebraic equations.

[Figure: a block of mass $m$ attached to a spring of stiffness $k$, displaced by $x_0$.]

To produce an example equation to analyze, connect a block of mass $m$ to an ideal spring with spring constant (stiffness) $k$, pull the block a distance $x_0$ to the right relative to the equilibrium position $x = 0$, and release it at time $t = 0$. The block oscillates back and forth, its position $x$ described by the ideal-spring differential equation

$$m\frac{d^2 x}{dt^2} + kx = 0. \tag{3.21}$$

Let's approximate the equation and thereby estimate the oscillation frequency.

3.4.1 Checking dimensions

Upon seeing any equation, first check its dimensions (Chapter 1). If all terms do not have identical dimensions, the equation is not worth solving, which is a great savings of effort. If the dimensions match, the check has prompted reflection on the meaning of the terms; this reflection helps prepare for solving the equation and for understanding any solution.

What are the dimensions of the two terms in the spring equation?

Look first at the simple second term $kx$. It arises from Hooke's law, which says that an ideal spring exerts a force $kx$ where $x$ is the extension of the spring relative to its equilibrium length. Thus the second term $kx$ is a force. Is the first term also a force?

The first term $m(d^2 x/dt^2)$ contains the second derivative $d^2 x/dt^2$, which is familiar as an acceleration. Many differential equations, however, contain unfamiliar derivatives. The Navier–Stokes equations of fluid mechanics (Section 2.4),

$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{1}{\rho}\nabla p + \nu\nabla^2 \mathbf{v}, \tag{3.22}$$

contain two strange derivatives: $(\mathbf{v}\cdot\nabla)\mathbf{v}$ and $\nabla^2 \mathbf{v}$. What are the dimensions of those terms?

To practice for later handling such complicated terms, let's now find the dimensions of $d^2 x/dt^2$ by hand. Because $d^2 x/dt^2$ contains two exponents of 2, and $x$ is length and $t$ is time, $d^2 x/dt^2$ might plausibly have dimensions of $\mathrm{L}^2\mathrm{T}^{-2}$.

Are $\mathrm{L}^2\mathrm{T}^{-2}$ the correct dimensions?

To decide, use the idea from Section 1.2 that the differential symbol $d$ means "a little bit of." The numerator $d^2 x$, meaning $d$ of $dx$, is "a little bit of a little bit of $x$." Thus, it is a length. The denominator $dt^2$ could plausibly mean $(dt)^2$ or $d(t^2)$. [It turns out to mean $(dt)^2$.] In either case, its dimensions are $\mathrm{T}^2$. Therefore, the dimensions of the second derivative are $\mathrm{LT}^{-2}$:

$$\left[\frac{d^2 x}{dt^2}\right] = \mathrm{LT}^{-2}. \tag{3.23}$$

This combination is an acceleration, so the spring equation's first term $m(d^2 x/dt^2)$ is mass times acceleration, giving it the same dimensions as the $kx$ term.

Problem 3.18 Dimensions of spring constant
What are the dimensions of the spring constant $k$?

3.4.2 Estimating the magnitudes of the terms

The spring equation passes the dimensions test, so it is worth analyzing to find the oscillation frequency. The method is to replace each term with its approximate magnitude. These replacements will turn a complicated differential equation into a simple algebraic equation for the frequency.

To approximate the first term $m(d^2 x/dt^2)$, use the significant-change approximation (Section 3.3) to estimate the magnitude of the acceleration $d^2 x/dt^2$:

$$\frac{d^2 x}{dt^2} \sim \frac{\text{significant } \Delta x}{(\Delta t \text{ that produces a significant } \Delta x)^2}. \tag{3.24}$$

Problem 3.19 Explaining the exponents
The numerator contains only the first power of $\Delta x$, whereas the denominator contains the second power of $\Delta t$. How can that discrepancy be correct?

To evaluate this approximate acceleration, first decide on a significant $\Delta x$, that is, on what constitutes a significant change in the mass's position. The mass moves between the points $x = -x_0$ and $x = +x_0$, so a significant change in position should be a significant fraction of the peak-to-peak amplitude $2x_0$. The simplest choice is $\Delta x = x_0$.

Now estimate $\Delta t$: the time for the block to move a distance comparable to $\Delta x$. This time, called the characteristic time of the system, is related to the oscillation period $T$. During one period, the mass moves back and forth and travels a distance $4x_0$, much farther than $x_0$. If $\Delta t$ were, say, $T/4$ or $T/2\pi$, then in the time $\Delta t$ the mass would travel a distance comparable to $x_0$. Those choices for $\Delta t$ have a natural interpretation as being approximately $1/\omega$, where the angular frequency $\omega$ is connected to the period by the definition $\omega \equiv 2\pi/T$. With the preceding choices for $\Delta x$ and $\Delta t$, the $m(d^2 x/dt^2)$ term is roughly $m x_0 \omega^2$.

What does "is roughly" mean?

The phrase cannot mean that $m x_0 \omega^2$ and $m(d^2 x/dt^2)$ are within, say, a factor of 2, because $m(d^2 x/dt^2)$ varies and $m x_0 \omega^2$ is constant. Rather, "is roughly" means that a typical or characteristic magnitude of $m(d^2 x/dt^2)$ (for example, its root-mean-square value) is comparable to $m x_0 \omega^2$. Let's include this meaning within the twiddle notation $\sim$. Then the typical-magnitude estimate can be written

$$m\frac{d^2 x}{dt^2} \sim m x_0 \omega^2. \tag{3.25}$$

With the same meaning of "is roughly", namely that the typical magnitudes are comparable, the spring equation's second term $kx$ is roughly $k x_0$.

The two terms must add to zero, a consequence of the spring equation

$$m\frac{d^2 x}{dt^2} + kx = 0. \tag{3.26}$$

Therefore, the magnitudes of the two terms are comparable:

$$m x_0 \omega^2 \sim k x_0. \tag{3.27}$$

The amplitude $x_0$ divides out! With $x_0$ gone, the frequency $\omega$ and oscillation period $T = 2\pi/\omega$ are independent of amplitude. [This reasoning uses several approximations, but this conclusion is exact (Problem 3.20).] The approximated angular frequency $\omega$ is then $\sqrt{k/m}$.

For comparison, the exact solution of the spring differential equation is, from Problem 3.22,

$$x = x_0 \cos \omega t, \tag{3.28}$$

where $\omega$ is $\sqrt{k/m}$. The approximated angular frequency is also exact!

Problem 3.20 Amplitude independence
Use dimensional analysis to show that the angular frequency $\omega$ cannot depend on the amplitude $x_0$.

Problem 3.21 Checking dimensions in the alleged solution
What are the dimensions of $\omega t$? What are the dimensions of $\cos \omega t$? Check the dimensions of the proposed solution $x = x_0 \cos \omega t$, and the dimensions of the proposed period $2\pi\sqrt{m/k}$.

Problem 3.22 Verification
Show that $x = x_0 \cos \omega t$ with $\omega = \sqrt{k/m}$ solves the spring differential equation

$$m\frac{d^2 x}{dt^2} + kx = 0. \tag{3.29}$$

3.4.3 Meaning of the Reynolds number

As a further example of lumping, in particular of the significant-change approximation, let's analyze the Navier–Stokes equations introduced in Section 2.4,

$$\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} = -\frac{1}{\rho}\nabla p + \nu\nabla^2 \mathbf{v}, \tag{3.30}$$

and extract from them a physical meaning for the Reynolds number $rv/\nu$. To do so, we estimate the typical magnitude of the inertial term $(\mathbf{v}\cdot\nabla)\mathbf{v}$ and of the viscous term $\nu\nabla^2 \mathbf{v}$.

What is the typical magnitude of the inertial term?

The inertial term $(\mathbf{v}\cdot\nabla)\mathbf{v}$ contains the spatial derivative $\nabla \mathbf{v}$. According to the significant-change approximation (Section 3.3), the derivative $\nabla \mathbf{v}$ is roughly the ratio

$$\frac{\text{significant change in flow velocity}}{\text{distance over which flow velocity changes significantly}}. \tag{3.31}$$

The flow velocity (the velocity of the air) is nearly zero far from the cone and is comparable to $v$ near the cone (which is moving at speed $v$). Therefore, $v$, or a reasonable fraction of $v$, constitutes a significant change in flow velocity. This speed change happens over a distance comparable to the size of the cone: Several cone lengths away, the air hardly knows about the falling cone. Thus $\nabla v \sim v/r$. The inertial term $(\mathbf{v}\cdot\nabla)\mathbf{v}$ contains a second factor of $v$, so $(\mathbf{v}\cdot\nabla)\mathbf{v}$ is roughly $v^2/r$.

What is the typical magnitude of the viscous term?

The viscous term $\nu\nabla^2 \mathbf{v}$ contains two spatial derivatives of $\mathbf{v}$. Because each spatial derivative contributes a factor of $1/r$ to the typical magnitude, $\nu\nabla^2 \mathbf{v}$ is roughly $\nu v/r^2$. The ratio of the inertial term to the viscous term is then roughly $(v^2/r)/(\nu v/r^2)$. This ratio simplifies to $rv/\nu$, the familiar, dimensionless, Reynolds number.

Thus, the Reynolds number measures the importance of viscosity. When $\mathrm{Re} \gg 1$, the viscous term is small, and viscosity has a negligible effect. It cannot prevent nearby pieces of fluid from acquiring significantly different velocities, and the flow becomes turbulent. When $\mathrm{Re} \ll 1$, the viscous term is large, and viscosity is the dominant physical effect. The flow oozes, as when pouring cold honey.

3.5 Predicting the period of a pendulum

Lumping not only turns integration into multiplication, it turns nonlinear into linear differential equations. Our example is the analysis of the period of a pendulum, for centuries the basis of Western timekeeping.

How does the period of a pendulum depend on its amplitude?

[Figure: a pendulum of length $l$ with bob of mass $m$ at angle $\theta$.]

The amplitude $\theta_0$ is the maximum angle of the swing; for a lossless pendulum released from rest, it is also the angle of release. The effect of amplitude is contained in the solution to the pendulum differential equation (see [24] for the equation's derivation):

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l}\sin\theta = 0. \tag{3.32}$$

The analysis will use all our tools: dimensions (Section 3.5.2), easy cases (Section 3.5.1 and Section 3.5.3), and lumping (Section 3.5.4).
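The lumped magnitudes above translate directly into code. In this sketch the function names are invented for illustration, and the numbers (a cone about 0.1 m across, falling at about 1 m/s, in air with kinematic viscosity near $1.5 \times 10^{-5}\,\mathrm{m^2\,s^{-1}}$) are assumed round values, not data from the text.

```python
def inertial_magnitude(v, r):
    """Lumped size of the inertial term (v . grad) v: roughly v^2 / r."""
    return v * v / r

def viscous_magnitude(nu, v, r):
    """Lumped size of the viscous term nu * laplacian(v): roughly nu v / r^2."""
    return nu * v / (r * r)

def reynolds_number(r, v, nu):
    """Ratio of the two lumped magnitudes, which simplifies to r v / nu."""
    return r * v / nu

# Assumed illustrative values (SI units).
r, v, nu = 0.1, 1.0, 1.5e-5

ratio = inertial_magnitude(v, r) / viscous_magnitude(nu, v, r)
re = reynolds_number(r, v, nu)
regime = "viscosity negligible" if re > 1.0 else "viscosity dominant"
print(ratio, re, regime)
```

The ratio of the two lumped terms and $rv/\nu$ agree by construction, which is the point of the derivation.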

Problem 3.23 Angles
Explain why angles are dimensionless.

Problem 3.24 Checking and using dimensions
Does the pendulum equation have correct dimensions? Use dimensional analysis to show that the equation cannot contain the mass of the bob (except as a common factor that divides out).

3.5.1 Small amplitudes: Applying extreme cases

[Figure: a unit circle showing the angle $\theta$, the arc of length $\theta$, and the triangle of height $\sin\theta$.]

The pendulum equation is difficult because of its nonlinear factor $\sin\theta$. Fortunately, the factor is easy in the small-amplitude extreme case $\theta \to 0$. In that limit, the height of the triangle, which is $\sin\theta$, is almost exactly the arclength $\theta$. Therefore, for small angles, $\sin\theta \approx \theta$.

Problem 3.25 Chord approximation
The $\sin\theta \approx \theta$ approximation replaces the arc with a straight, vertical line. To make a more accurate approximation, replace the arc with the chord (a straight but nonvertical line). What is the resulting approximation for $\sin\theta$?

In the small-amplitude extreme, the pendulum equation becomes linear:

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l}\theta = 0. \tag{3.33}$$

Compare this equation to the spring–mass equation (Section 3.4)

$$\frac{d^2 x}{dt^2} + \frac{k}{m}x = 0. \tag{3.34}$$

The equations correspond with $x$ analogous to $\theta$ and $k/m$ analogous to $g/l$. The frequency of the spring–mass system is $\omega = \sqrt{k/m}$, and its period is $T = 2\pi/\omega = 2\pi\sqrt{m/k}$. For the pendulum equation, the corresponding period is

$$T = 2\pi\sqrt{\frac{l}{g}} \quad \text{(for small amplitudes).} \tag{3.35}$$

(This analysis is a preview of the method of analogy, which is the subject of Chapter 6.)

Problem 3.26 Checking dimensions
Does the period $2\pi\sqrt{l/g}$ have correct dimensions?

Problem 3.27 Checking extreme cases
Does the period $T = 2\pi\sqrt{l/g}$ make sense in the extreme cases $g \to \infty$ and $g \to 0$?

Problem 3.28 Possible coincidence
Is it a coincidence that $g \approx \pi^2\,\mathrm{m\,s^{-2}}$? (For an extensive historical discussion that involves the pendulum, see [1] and more broadly also [4, 27, 42].)

Problem 3.29 Conical pendulum for the constant
The dimensionless factor of $2\pi$ can be derived using an insight from Huygens [15, p. 79]: to analyze the motion of a pendulum moving in a horizontal circle (a conical pendulum). Projecting its two-dimensional motion onto a vertical screen produces one-dimensional pendulum motion, so the period of the two-dimensional motion is the same as the period of one-dimensional pendulum motion! Use that idea along with Newton's laws of motion to explain the $2\pi$.

3.5.2 Arbitrary amplitudes: Applying dimensional analysis

The preceding results might change if the amplitude $\theta_0$ is no longer small.

As $\theta_0$ increases, does the period increase, remain constant, or decrease?

Any analysis becomes cleaner if expressed using dimensionless groups (Section 2.1). This problem involves the period $T$, length $l$, gravitational strength $g$, and amplitude $\theta_0$. Therefore, $T$ can belong to the dimensionless group $T/\sqrt{l/g}$. Because angles are dimensionless, $\theta_0$ is itself a dimensionless group. The two groups $T/\sqrt{l/g}$ and $\theta_0$ are independent and fully describe the problem (Problem 3.30).

An instructive contrast is the ideal spring–mass system. The period $T$, spring constant $k$, and mass $m$ can form the dimensionless group $T/\sqrt{m/k}$; but the amplitude $x_0$, as the only quantity containing a length, cannot be part of any dimensionless group (Problem 3.20) and cannot therefore affect the period of the spring–mass system. In contrast,

the pendulum's amplitude $\theta_0$ is already a dimensionless group, so it can affect the period of the system.

Problem 3.30 Choosing dimensionless groups
Check that period $T$, length $l$, gravitational strength $g$, and amplitude $\theta_0$ produce two independent dimensionless groups. In constructing useful groups for analyzing the period, why should $T$ appear in only one group? And why should $\theta_0$ not appear in the same group as $T$?

Two dimensionless groups produce the general dimensionless form

$$\text{one group} = \text{function of the other group}, \tag{3.36}$$

so

$$\frac{T}{\sqrt{l/g}} = \text{function of } \theta_0. \tag{3.37}$$

Because $T/\sqrt{l/g} = 2\pi$ when $\theta_0 = 0$ (the small-amplitude limit), factor out the $2\pi$ to simplify the subsequent equations, and define a dimensionless period $h$ as follows:

$$\frac{T}{\sqrt{l/g}} = 2\pi\, h(\theta_0). \tag{3.38}$$

The function $h$ contains all information about how amplitude affects the period of a pendulum. Using $h$, the original question about the period becomes the following: Is $h$ an increasing, constant, or decreasing function of amplitude? This question is answered in the following section.

3.5.3 Large amplitudes: Extreme cases again

For guessing the general behavior of $h$ as a function of amplitude, useful clues come from evaluating $h$ at two amplitudes. One easy amplitude is the extreme of zero amplitude, where $h(0) = 1$. A second easy amplitude is the opposite extreme of large amplitudes.

How does the period behave at large amplitudes? As part of that question, what is a large amplitude?

An interesting large amplitude is $\pi/2$, which means releasing the pendulum from horizontal. However, at $\pi/2$ the exact $h$ is the following awful expression (Problem 3.31):

$$h(\pi/2) = \frac{\sqrt{2}}{\pi}\int_0^{\pi/2} \frac{d\theta}{\sqrt{\cos\theta}}. \tag{3.39}$$

Is this integral less than, equal to, or more than 1? Who knows? The integral is likely to have no closed form and to require numerical evaluation (Problem 3.32).

Problem 3.31 General expression for $h$
Use conservation of energy to show that the period is

$$T(\theta_0) = 2\sqrt{2}\sqrt{\frac{l}{g}}\int_0^{\theta_0} \frac{d\theta}{\sqrt{\cos\theta - \cos\theta_0}}. \tag{3.40}$$

Confirm that the equivalent dimensionless statement is

$$h(\theta_0) = \frac{\sqrt{2}}{\pi}\int_0^{\theta_0} \frac{d\theta}{\sqrt{\cos\theta - \cos\theta_0}}. \tag{3.41}$$

For horizontal release, $\theta_0 = \pi/2$, and

$$h(\pi/2) = \frac{\sqrt{2}}{\pi}\int_0^{\pi/2} \frac{d\theta}{\sqrt{\cos\theta}}. \tag{3.42}$$

Problem 3.32 Numerical evaluation for horizontal release
Why do the lumping recipes (Section 3.2) fail for the integrals in Problem 3.31? Compute $h(\pi/2)$ using numerical integration.

Because $\theta_0 = \pi/2$ is not a helpful extreme, be even more extreme. Try $\theta_0 = \pi$, which means releasing the pendulum bob from vertical. If the bob is connected to the pivot point by a string, however, a vertical release would mean that the bob falls straight down instead of oscillating. This novel behavior is neither included in nor described by the pendulum differential equation.

[Figure: $h(\theta_0)$ against $\theta_0$, with $h = 1$ and $\theta_0 = \pi$ marked.]

Fortunately, a thought experiment is cheap to improve: Replace the string with a massless steel rod. Balanced perfectly at $\theta_0 = \pi$, the pendulum bob hangs upside down forever, so $T(\pi) = \infty$ and $h(\pi) = \infty$. Thus, $h(\pi) > 1$ and $h(0) = 1$. From these data, the most likely conjecture is that $h$ increases monotonically with amplitude. Although $h$ could first decrease and then increase, such twists and turns would be surprising behavior from such a clean differential equation. (For the behavior of $h$ near $\theta_0 = \pi$, see Problem 3.34.)
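One way to evaluate (3.41) numerically is sketched below. The integrand is singular at $\theta = \theta_0$, so the sketch first applies the change of variables $\sin(\theta/2) = \sin(\theta_0/2)\sin\varphi$, a standard substitution that is an assumption of this example rather than a step taken in the text. It turns (3.41) into $h(\theta_0) = (2/\pi)\int_0^{\pi/2} d\varphi/\sqrt{1 - k^2\sin^2\varphi}$ with $k = \sin(\theta_0/2)$, which a plain midpoint rule handles for $\theta_0 < \pi$. The function name and step count are illustrative.

```python
import math

def h(theta0, steps=20000):
    """Dimensionless pendulum period h(theta0) for 0 <= theta0 < pi.

    Uses the substitution sin(theta/2) = k sin(phi), k = sin(theta0/2),
    then a midpoint-rule sum over phi in [0, pi/2].
    """
    k = math.sin(theta0 / 2.0)
    dphi = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * dphi                       # midpoint of the i-th slice
        total += 1.0 / math.sqrt(1.0 - (k * math.sin(phi)) ** 2)
    return (2.0 / math.pi) * total * dphi

for theta0 in (1e-6, math.pi / 4, math.pi / 2, math.pi - 0.1):
    print(theta0, h(theta0))
```

Evaluating `h(math.pi / 2)` gives the horizontal-release case, and amplitudes approaching $\pi$ show the period growing without bound.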

Problem 3.33 Small but nonzero amplitude
As the amplitude approaches $\pi$, the dimensionless period $h$ diverges to infinity; at zero amplitude, $h = 1$. But what about the derivative of $h$? At zero amplitude ($\theta_0 = 0$), does $h(\theta_0)$ have zero slope (curve A) or positive slope (curve B)?

[Figure: two candidate curves for $h(\theta_0)$ starting at $h = 1$: curve A with zero initial slope and curve B with positive initial slope.]

Problem 3.34 Nearly vertical release
Imagine releasing the pendulum from almost vertical: an initial angle $\pi - \beta$ with $\beta$ tiny. As a function of $\beta$, roughly how long does the pendulum take to rotate by a significant angle, say, by 1 rad? Use that information to predict how $h(\theta_0)$ behaves when $\theta_0 \approx \pi$. Check and refine your conjectures using the tabulated values. Then predict $h(\pi - 10^{-5})$.

β          h(π − β)
10^−1      2.791297
10^−2      4.255581
10^−3      5.721428
10^−4      7.187298

3.5.4 Moderate amplitudes: Applying lumping

The conjecture that $h$ increases monotonically was derived using the extremes of zero and vertical amplitude, so it should apply at intermediate amplitudes. Before taking that statement on faith, recall a proverb from arms-control negotiations: "Trust, but verify."

At moderate (small but nonzero) amplitudes, does the period, or its dimensionless cousin $h$, increase with amplitude?

In the zero-amplitude extreme, $\sin\theta$ is close to $\theta$. That approximation turned the nonlinear pendulum equation

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l}\sin\theta = 0 \tag{3.43}$$

into the linear, ideal-spring equation, in which the period is independent of amplitude.

At nonzero amplitude, however, $\theta$ and $\sin\theta$ differ and their difference affects the period. To account for the difference and predict the period, split $\sin\theta$ into the tractable factor $\theta$ and an adjustment factor $f(\theta)$. The resulting equation is

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l}\,\theta\,\underbrace{\frac{\sin\theta}{\theta}}_{f(\theta)} = 0. \tag{3.44}$$

[Figure: the adjustment factor $f(\theta)$ against $\theta$ from 0 to $\theta_0$, starting at 1 and falling as $\theta$ grows.]

The nonconstant $f(\theta)$ encapsulates the nonlinearity of the pendulum equation. When $\theta$ is tiny, $f(\theta) \approx 1$: The pendulum behaves like a linear, ideal-spring system. But when $\theta$ is large, $f(\theta)$ falls significantly below 1, making the ideal-spring approximation significantly inaccurate. As is often the case, a changing process is difficult to analyze; for example, see the awful integrals in Problem 3.31. As a countermeasure, make a lumping approximation by replacing the changing $f(\theta)$ with a constant.

The simplest constant is $f(0)$. Then the pendulum differential equation becomes

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l}\theta = 0. \tag{3.45}$$

This equation is, again, the ideal-spring equation. In this approximation, the period does not depend on amplitude, so $h = 1$ for all amplitudes. For determining how the period of an unapproximated pendulum depends on amplitude, the $f(\theta) \to f(0)$ lumping approximation discards too much information.

Therefore, replace $f(\theta)$ with the other extreme $f(\theta_0)$. Then the pendulum equation becomes

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l}\,\theta f(\theta_0) = 0. \tag{3.46}$$

Is this equation linear? What physical system does it describe?

Because $f(\theta_0)$ is a constant, this equation is linear! It describes a zero-amplitude pendulum on a planet with gravity $g_{\text{eff}}$ that is slightly weaker than earth gravity, as shown by the following slight regrouping:

$$\frac{d^2 \theta}{dt^2} + \frac{\overbrace{g f(\theta_0)}^{g_{\text{eff}}}}{l}\,\theta = 0. \tag{3.47}$$

Because the zero-amplitude pendulum has period $T = 2\pi\sqrt{l/g}$, the zero-amplitude, low-gravity pendulum has period

$$T(\theta_0) \approx 2\pi\sqrt{\frac{l}{g_{\text{eff}}}} = 2\pi\sqrt{\frac{l}{g f(\theta_0)}}. \tag{3.48}$$

Using the dimensionless period $h$ avoids writing the factors of $2\pi$, $l$, and $g$, and it yields the simple prediction

$$h(\theta_0) \approx f(\theta_0)^{-1/2} = \left(\frac{\sin\theta_0}{\theta_0}\right)^{-1/2}. \tag{3.49}$$

[Figure: the prediction $f^{-1/2}$ and the exact $h$ (dark curve) against $\theta_0$ from 0 to $\pi$, both starting at 1.]

At moderate amplitudes the approximation closely follows the exact dimensionless period (dark curve). As a bonus, it also predicts $h(\pi) = \infty$, so it agrees with the thought experiment of releasing the pendulum from upright (Section 3.5.3).

How much larger than the period at zero amplitude is the period at 10° amplitude?

A 10° amplitude is roughly 0.17 rad, a moderate angle, so the approximate prediction for $h$ can itself accurately be approximated using a Taylor series. The Taylor series for $\sin\theta$ begins $\theta - \theta^3/6$, so

$$f(\theta_0) = \frac{\sin\theta_0}{\theta_0} \approx 1 - \frac{\theta_0^2}{6}. \tag{3.50}$$

Then $h(\theta_0)$, which is roughly $f(\theta_0)^{-1/2}$, becomes

$$h(\theta_0) \approx \left(1 - \frac{\theta_0^2}{6}\right)^{-1/2}. \tag{3.51}$$

Another Taylor series yields $(1 + x)^{-1/2} \approx 1 - x/2$ (for small $x$). Therefore,

$$h(\theta_0) \approx 1 + \frac{\theta_0^2}{12}. \tag{3.52}$$

Restoring the dimensioned quantities gives the period itself:

$$T \approx 2\pi\sqrt{\frac{l}{g}}\left(1 + \frac{\theta_0^2}{12}\right). \tag{3.53}$$

Compared to the period at zero amplitude, a 10° amplitude produces a fractional increase of roughly $\theta_0^2/12 \approx 0.0025$ or 0.25%. Even at moderate amplitudes, the period is nearly independent of amplitude!

Problem 3.35 Slope revisited
Use the preceding result for $h(\theta_0)$ to check your conclusion in Problem 3.33 about the slope of $h(\theta_0)$ at $\theta_0 = 0$.
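The arithmetic for the 10° case is easy to reproduce. This small sketch, with function names chosen for illustration, evaluates the lumped prediction (3.49) and its Taylor approximation (3.52) side by side.

```python
import math

def h_lumped(theta0):
    """Lumped prediction (3.49): (sin(theta0) / theta0)^(-1/2)."""
    return (math.sin(theta0) / theta0) ** -0.5

def h_taylor(theta0):
    """Taylor approximation (3.52): 1 + theta0^2 / 12."""
    return 1.0 + theta0**2 / 12.0

theta0 = math.radians(10.0)          # 10 degrees in radians, roughly 0.17
print(theta0, h_lumped(theta0) - 1.0, h_taylor(theta0) - 1.0)
```

Both fractional increases come out close to the 0.25% quoted above, which confirms that the Taylor step loses very little at this amplitude.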

Does our lumping approximation underestimate or overestimate the period?

The lumping approximation simplified the pendulum differential equation by replacing $f(\theta)$ with $f(\theta_0)$. Equivalently, it assumed that the mass always remained at the endpoints of the motion where $|\theta| = \theta_0$. Instead, the pendulum spends much of its time at intermediate positions where $|\theta| < \theta_0$ and $f(\theta) > f(\theta_0)$. Therefore, the average $f$ is greater than $f(\theta_0)$. Because $h$ is inversely related to $f$ ($h = f^{-1/2}$), the $f(\theta) \to f(\theta_0)$ lumping approximation overestimates $h$ and the period.

The $f(\theta) \to f(0)$ lumping approximation, which predicts $T = 2\pi\sqrt{l/g}$, underestimates the period. Therefore, the true coefficient of the $\theta_0^2$ term in the period approximation

$$T \approx 2\pi\sqrt{\frac{l}{g}}\left(1 + \frac{\theta_0^2}{12}\right) \tag{3.54}$$

lies between 0 and 1/12. A natural guess is that the coefficient lies halfway between these extremes, namely 1/24. However, the pendulum spends more time toward the extremes (where $f(\theta) = f(\theta_0)$) than it spends near the equilibrium position (where $f(\theta) = f(0)$). Therefore, the true coefficient is probably closer to 1/12, the prediction of the $f(\theta) \to f(\theta_0)$ approximation, than it is to 0. An improved guess might be two-thirds of the way from 0 to 1/12, namely 1/18.

In comparison, a full successive-approximation solution of the pendulum differential equation gives the following period [13, 33]:

$$T = 2\pi\sqrt{\frac{l}{g}}\left(1 + \frac{1}{16}\theta_0^2 + \frac{11}{3072}\theta_0^4 + \cdots\right). \tag{3.55}$$

Our educated guess of 1/18 is very close to the true coefficient of 1/16!

3.6 Summary and further problems

Lumping turns calculus on its head. Whereas calculus analyzes a changing process by dividing it into ever finer intervals, lumping simplifies a changing process by combining it into one unchanging process. It turns curves into straight lines, difficult integrals into multiplication, and mildly nonlinear differential equations into linear differential equations.

. . . the crooked shall be made straight, and the rough places plain. (Isaiah 40:4)
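The coefficient of $\theta_0^2$ can also be estimated numerically and set beside the guesses 1/24, 1/18, and 1/12. The sketch below evaluates the exact $h$ from the integral of Problem 3.31 after the substitution $\sin(\theta/2) = \sin(\theta_0/2)\sin\varphi$ (an assumed, standard change of variables that removes the endpoint singularity), then forms $(h - 1)/\theta_0^2$ at a small amplitude. The names and the choice $\theta_0 = 0.1$ are illustrative.

```python
import math

def h_exact(theta0, steps=2000):
    """Exact dimensionless period via a midpoint rule on the substituted integral."""
    k = math.sin(theta0 / 2.0)
    dphi = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * dphi
        total += 1.0 / math.sqrt(1.0 - (k * math.sin(phi)) ** 2)
    return (2.0 / math.pi) * total * dphi

def theta_squared_coefficient(theta0):
    """Estimate c in h ~ 1 + c * theta0^2 from a single small amplitude."""
    return (h_exact(theta0) - 1.0) / theta0**2

c = theta_squared_coefficient(0.1)
for label, guess in (("1/24", 1 / 24), ("1/18", 1 / 18), ("1/16", 1 / 16), ("1/12", 1 / 12)):
    print(label, guess, abs(guess - c))
```

The smallest difference identifies which candidate coefficient the numerical value supports.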

Problem 3.36 FWHM for another decaying function
Use the FWHM heuristic to estimate

$$\int_{-\infty}^{\infty} \frac{dx}{1 + x^4}. \tag{3.56}$$

Then compare the estimate with the exact value of $\pi/\sqrt{2}$. For an enjoyable additional problem, derive the exact value.

Problem 3.37 Hypothetical pendulum equation
Suppose the pendulum equation had been

$$\frac{d^2 \theta}{dt^2} + \frac{g}{l}\tan\theta = 0. \tag{3.57}$$

How would the period $T$ depend on amplitude $\theta_0$? In particular, as $\theta_0$ increases, would $T$ decrease, remain constant, or increase? What is the slope $dT/d\theta_0$ at zero amplitude? Compare your results with the results of Problem 3.33.

For small but nonzero $\theta_0$, find an approximate expression for the dimensionless period $h(\theta_0)$ and use it to check your previous conclusions.

Problem 3.38 Gaussian 1-sigma tail
The Gaussian probability density function with zero mean and unit variance is

$$p(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}. \tag{3.58}$$

The area of its tail is an important quantity in statistics, but it has no closed form. In this problem you estimate the area of the 1-sigma tail

$$\int_1^\infty \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx. \tag{3.59}$$

a. Sketch the above Gaussian and shade the 1-sigma tail.
b. Use the 1/e lumping heuristic (Section 3.2.1) to estimate the area.
c. Use the FWHM heuristic to estimate the area.
d. Compare the two lumping estimates with the result of numerical integration:

$$\int_1^\infty \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx = \frac{1 - \operatorname{erf}(1/\sqrt{2})}{2} \approx 0.159, \tag{3.60}$$

where $\operatorname{erf}(z)$ is the error function.

Problem 3.39 Distant Gaussian tails
For the canonical probability Gaussian, estimate the area of its $n$-sigma tail (for large $n$). In other words, estimate

$$\int_n^\infty \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx. \tag{3.61}$$
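For checking lumping estimates in Problems 3.38 and 3.39, the reference tail area can be computed two ways: from the error function, as in (3.60), and by direct midpoint-rule integration. This is a sketch with invented function names; the integration cutoff of 10 standard deviations beyond the lower limit is an assumption, chosen because the integrand is negligible there.

```python
import math

def gaussian_tail(n):
    """Area of the n-sigma tail via the complementary error function."""
    return 0.5 * math.erfc(n / math.sqrt(2.0))

def gaussian_tail_numeric(n, span=10.0, steps=20000):
    """Midpoint-rule integral of the unit Gaussian from n to n + span."""
    dx = span / steps
    total = 0.0
    for i in range(steps):
        x = n + (i + 0.5) * dx
        total += math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return total * dx

for n in (1.0, 2.0, 3.0):
    print(n, gaussian_tail(n), gaussian_tail_numeric(n))
```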


4 Pictorial proofs

4.1 Adding odd numbers
4.2 Arithmetic and geometric means
4.3 Approximating the logarithm
4.4 Bisecting a triangle
4.5 Summing series
4.6 Summary and further problems

Have you ever worked through a proof, understood and confirmed each step, yet still not believed the theorem? You realize that the theorem is true, but not why it is true.

To see the same contrast in a familiar example, imagine learning that your child has a fever and hearing the temperature in Fahrenheit or Celsius degrees, whichever is less familiar. In my everyday experience, temperatures are mostly in Fahrenheit. When I hear about a temperature of 40°C, I therefore react in two stages:

1. I convert 40°C to Fahrenheit: 40 × 1.8 + 32 = 104.
2. I react: "Wow, 104°F. That's dangerous! Get thee to a doctor!"

The Celsius temperature, although symbolically equivalent to the Fahrenheit temperature, elicits no reaction. My danger sense activates only after the temperature conversion connects the temperature to my experience.

A symbolic description, whether a proof or an unfamiliar temperature, is unconvincing compared to an argument that speaks to our perceptual system. The reason lies in how our brains acquired the capacity for symbolic reasoning. (See Evolving Brains [2] for an illustrated, scholarly history of the brain.) Symbolic, sequential reasoning requires language, which has

evolved for only $10^5$ yr. Although $10^5$ yr spans many human lifetimes, it is an evolutionary eyeblink. In particular, it is short compared to the time span over which our perceptual hardware has evolved: For several hundred million years, organisms have refined their capacities for hearing, smelling, tasting, touching, and seeing.

Evolution has worked 1000 times longer on our perceptual abilities than on our symbolic-reasoning abilities. Compared to our perceptual hardware, our symbolic, sequential hardware is an ill-developed latecomer. Not surprisingly, our perceptual abilities far surpass our symbolic abilities. Even an apparently high-level symbolic activity such as playing grandmaster chess uses mostly perceptual hardware [16]. Seeing an idea conveys to us a depth of understanding that a symbolic description of it cannot easily match.

Problem 4.1 Computers versus people
At tasks like expanding $(x + 2y)^{50}$, computers are much faster than people. At tasks like recognizing faces or smells, even young children are much faster than current computers. How do you explain these contrasts?

Problem 4.2 Linguistic evidence for the importance of perception
In your favorite language(s), think of the many sensory synonyms for understanding (for example, grasping).

4.1 Adding odd numbers

To illustrate the value of pictures, let's find the sum of the first n odd numbers (also the subject of Problem 2.25):
$$S_n = \underbrace{1 + 3 + 5 + \cdots + (2n - 1)}_{n\ \text{terms}}. \tag{4.1}$$
Easy cases such as n = 1, 2, or 3 lead to the conjecture that $S_n = n^2$. But how can the conjecture be proved? The standard symbolic method is proof by induction:

1. Verify that $S_n = n^2$ for the base case n = 1. In that case, $S_1$ is 1, as is $n^2$, so the base case is verified.
2. Make the induction hypothesis: Assume that $S_m = m^2$ for m less than or equal to a maximum value n. For this proof, the following, weaker induction hypothesis is sufficient:

$$\sum_{1}^{n} (2k - 1) = n^2. \tag{4.2}$$
In other words, we assume the theorem only in the case that m = n.
3. Perform the induction step: Use the induction hypothesis to show that $S_{n+1} = (n + 1)^2$. The sum $S_{n+1}$ splits into two pieces:
$$S_{n+1} = \sum_{1}^{n+1} (2k - 1) = (2n + 1) + \sum_{1}^{n} (2k - 1). \tag{4.3}$$
Thanks to the induction hypothesis, the sum on the right is $n^2$. Thus
$$S_{n+1} = (2n + 1) + n^2, \tag{4.4}$$
which is $(n + 1)^2$, and the theorem is proved.

Although these steps prove the theorem, why the sum $S_n$ ends up as $n^2$ still feels elusive. That missing understanding (the kind of gestalt insight described by Wertheimer [48]) requires a pictorial proof. Start by drawing each odd number as an L-shaped puzzle piece:

[Figure (4.5): the odd numbers 1, 3, and 5 drawn as L-shaped pieces.]

How do these pieces fit together?

Then compute $S_n$ by fitting together the puzzle pieces as follows:

[Figure (4.6): the pieces 1 and 3 assembled into a square for $S_2$; the pieces 1, 3, and 5 assembled into a square for $S_3$.]

Each successive odd number (each piece) extends the square by 1 unit in height and width, so the n terms build an n × n square. [Or is it an (n − 1) × (n − 1) square?] Therefore, their sum is $n^2$. After grasping this pictorial proof, you cannot forget why adding up the first n odd numbers produces $n^2$.
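The L-shaped pieces can also be built explicitly. The following is a minimal Python sketch (not from the original text; the names `sum_of_first_odds` and `l_piece` and the grid-cell representation are illustrative assumptions): piece k consists of the unit cells on the outer border of a k × k square, and stacking pieces 1 through n tiles the n × n square exactly.

```python
def sum_of_first_odds(n):
    """S_n = 1 + 3 + 5 + ... + (2n - 1), computed term by term."""
    return sum(2 * k - 1 for k in range(1, n + 1))

def l_piece(k):
    """Cells (row, col) of the k-th L-shaped piece: the outer border of a k x k square."""
    return {(r, c) for r in range(k) for c in range(k) if r == k - 1 or c == k - 1}

n = 5
pieces = [l_piece(k) for k in range(1, n + 1)]
square = {(r, c) for r in range(n) for c in range(n)}

# Piece k has 2k - 1 cells, and together the pieces cover the n x n square.
print([len(piece) for piece in pieces])
print(set().union(*pieces) == square, sum_of_first_odds(n) == n * n)
```

Because the pieces do not overlap and their union is the full square, counting cells two ways gives $S_n = n^2$, which is the pictorial argument in discrete form.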

Problem 4.3 Triangular numbers
Draw a picture or pictures to show that
$$1 + 2 + 3 + \cdots + n + \cdots + 3 + 2 + 1 = n^2. \tag{4.7}$$
Then show that
$$1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2}. \tag{4.8}$$

Problem 4.4 Three dimensions
Draw a picture to show that
$$\sum_{0}^{n} (3k^2 + 3k + 1) = (n + 1)^3. \tag{4.9}$$
Give pictorial explanations for the 1 in the summand $3k^2 + 3k + 1$; for the 3 and the $k^2$ in $3k^2$; and for the 3 and the k in 3k.

4.2 Arithmetic and geometric means

The next pictorial proof starts with two nonnegative numbers (for example, 3 and 4) and compares the following two averages:
$$\text{arithmetic mean} \equiv \frac{3 + 4}{2} = 3.5; \tag{4.10}$$
$$\text{geometric mean} \equiv \sqrt{3 \times 4} \approx 3.464. \tag{4.11}$$
Try another pair of numbers, for example 1 and 2. The arithmetic mean is 1.5; the geometric mean is $\sqrt{2} \approx 1.414$. For both pairs, the geometric mean is smaller than the arithmetic mean. This pattern is general; it is the famous arithmetic-mean–geometric-mean (AM–GM) inequality [18]:
$$\underbrace{\frac{a + b}{2}}_{\text{AM}} \ge \underbrace{\sqrt{ab}}_{\text{GM}}. \tag{4.12}$$
(The inequality requires that $a, b \ge 0$.)

Problem 4.5 More numerical examples
Test the AM–GM inequality using varied numerical examples. What do you notice when a and b are close to each other? Can you formalize the pattern? (See also Problem 4.16.)
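The two worked pairs above are easy to reproduce, and the same helper makes it painless to try more. This is a minimal Python sketch (the function name `means` and the extra sample pairs are illustrative, not from the original text).

```python
import math

def means(a, b):
    """Return (arithmetic mean, geometric mean) of two nonnegative numbers."""
    return (a + b) / 2, math.sqrt(a * b)

# The first two pairs are the ones worked in the text; the rest are arbitrary extras.
for a, b in [(3, 4), (1, 2), (1, 100), (0, 7), (5, 5)]:
    am, gm = means(a, b)
    print(f"a={a}, b={b}: AM={am:.3f}, GM={gm:.3f}, AM>=GM: {am >= gm}")
```

Changing the list of pairs is the quickest way to explore how the gap between the two means behaves.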

4.2.1 Symbolic proof

The AM–GM inequality has a pictorial and a symbolic proof. The symbolic proof begins with $(a - b)^2$, a surprising choice because the inequality contains a + b rather than a − b. The second odd choice is to form $(a - b)^2$. It is nonnegative, so $a^2 - 2ab + b^2 \ge 0$. Now magically decide to add 4ab to both sides. The result is
$$\underbrace{a^2 + 2ab + b^2}_{(a+b)^2} \ge 4ab. \tag{4.13}$$
The left side is $(a + b)^2$, so $a + b \ge 2\sqrt{ab}$ and
$$\frac{a + b}{2} \ge \sqrt{ab}. \tag{4.14}$$
Although each step is simple, the whole chain seems like magic and leaves the why mysterious. If the algebra had ended with $(a + b)/4 \ge \sqrt{ab}$, it would not look obviously wrong. In contrast, a convincing proof would leave us feeling that the inequality cannot help but be true.

4.2.2 Pictorial proof

This satisfaction is provided by a pictorial proof.

What is pictorial, or geometric, about the geometric mean?

A geometric picture for the geometric mean starts with a right triangle. Lay it with its hypotenuse horizontal; then cut it with the altitude x into the light and dark subtriangles. The hypotenuse splits into two lengths a and b, and the altitude x is their geometric mean $\sqrt{ab}$.

[Figure: right triangle with horizontal hypotenuse split into lengths a and b by the altitude x.]

Why is the altitude x equal to $\sqrt{ab}$?

To show that $x = \sqrt{ab}$, compare the small, dark triangle to the large, light triangle by rotating the small triangle and laying it on the large triangle. The two triangles are similar! Therefore, their aspect ratios (the ratio of the short to the long side) are identical. In symbols, x/a = b/x: The altitude x is therefore the geometric mean $\sqrt{ab}$.

[Figure: the small triangle (sides b and x) rotated and laid on the large triangle (sides x and a).]
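The similar-triangles claim can be checked with coordinates. In this minimal Python sketch (the function name `altitude_check` and the coordinate placement are assumptions made for illustration), the hypotenuse runs from (0, 0) to (a + b, 0) and the apex sits at (a, x) with $x = \sqrt{ab}$; the apex angle is a right angle exactly when the two vectors from the apex to the hypotenuse endpoints have zero dot product.

```python
import math

def altitude_check(a, b):
    """Place the apex at (a, sqrt(ab)) and test whether the apex angle is a right angle."""
    x = math.sqrt(a * b)
    to_left = (-a, -x)   # vector from the apex to the endpoint (0, 0)
    to_right = (b, -x)   # vector from the apex to the endpoint (a + b, 0)
    dot = to_left[0] * to_right[0] + to_left[1] * to_right[1]
    return x, dot

x, dot = altitude_check(1.0, 4.0)
print(x, dot)                # altitude and dot product (zero means a right angle)
print(x / 1.0, 4.0 / x)      # the two aspect ratios x/a and b/x
```

The dot product is $-ab + x^2$, so it vanishes precisely when $x^2 = ab$, the same condition as x/a = b/x.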

The uncut right triangle represents the geometric-mean portion of the AM–GM inequality. The arithmetic mean (a + b)/2 also has a picture, as one-half of the hypotenuse. Thus, the inequality claims that
$$\frac{\text{hypotenuse}}{2} \ge \text{altitude}. \tag{4.15}$$
Alas, this claim is not pictorially obvious.

Can you find an alternative geometric interpretation of the arithmetic mean that makes the AM–GM inequality pictorially obvious?

The arithmetic mean is also the radius of a circle with diameter a + b. Therefore, circumscribe a semicircle around the triangle, matching the circle's diameter with the hypotenuse a + b (Problem 4.7). The altitude cannot exceed the radius; therefore,
$$\frac{a + b}{2} \ge \sqrt{ab}. \tag{4.16}$$

[Figure: semicircle of radius (a + b)/2 circumscribed around the right triangle with altitude $\sqrt{ab}$ and hypotenuse segments a and b.]

Furthermore, the two sides are equal only when the altitude of the triangle is also a radius of the semicircle, namely when a = b. The picture therefore contains the inequality and its equality condition in one easy-to-grasp object. (An alternative pictorial proof of the AM–GM inequality is developed in Problem 4.33.)

Problem 4.6 Circumscribing a circle around a triangle
Here are a few examples showing a circle circumscribed around a triangle.

[Figure: several triangles, each with its circumscribed circle.]

Draw a picture to show that the circle is uniquely determined by the triangle.

Problem 4.7 Finding the right semicircle
A triangle uniquely determines its circumscribing circle (Problem 4.6). However, the circle's diameter might not align with a side of the triangle. Can a semicircle always be circumscribed around a right triangle while aligning the circle's diameter along the hypotenuse?

Problem 4.8 Geometric mean of three numbers
For three nonnegative numbers, the AM–GM inequality is
$$\frac{a + b + c}{3} \ge (abc)^{1/3}. \tag{4.17}$$
Why is this inequality, in contrast to its two-number cousin, unlikely to have a geometric proof? (If you find a proof, let me know.)

4.2.3 Applications

Arithmetic and geometric means have wide mathematical application. The first application is a problem more often solved with derivatives: Fold a fixed length of fence into a rectangle enclosing the largest garden.

[Figure: rectangular garden with sides a and b.]

What shape of rectangle maximizes the area?

The problem involves two quantities: a perimeter that is fixed and an area to maximize. If the perimeter is related to the arithmetic mean and the area to the geometric mean, then the AM–GM inequality might help maximize the area. The perimeter P = 2(a + b) is four times the arithmetic mean, and the area A = ab is the square of the geometric mean. Therefore, from the AM–GM inequality,
$$\underbrace{\frac{P}{4}}_{\text{AM}} \ge \underbrace{\sqrt{A}}_{\text{GM}} \tag{4.18}$$
with equality when a = b. The left side is fixed by the amount of fence. Thus the right side, which varies depending on a and b, has a maximum of P/4 when a = b. The maximal-area rectangle is a square.

Problem 4.9 Direct pictorial proof
The AM–GM reasoning for the maximal rectangular garden is indirect pictorial reasoning. It is symbolic reasoning built upon the pictorial proof for the AM–GM inequality. Can you draw a picture to show directly that the square is the optimal shape?

Problem 4.10 Three-part product
Find the maximum value of $f(x) = x^2(1 - 2x)$ for $x \ge 0$, without using calculus. Sketch f(x) to confirm your answer.
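The conclusion of the fence argument in Section 4.2.3 (the maximal-area rectangle is a square) can be confirmed by brute force. The following minimal Python sketch scans rectangles of fixed perimeter; the function name `best_rectangle`, the grid resolution, and the sample perimeter of 40 are illustrative assumptions.

```python
def best_rectangle(perimeter, steps=1000):
    """Scan side lengths a in [0, P/2] and return the (a, b) pair with the largest area."""
    half = perimeter / 2          # a + b is fixed at P/2
    candidates = [half * i / steps for i in range(steps + 1)]
    a = max(candidates, key=lambda side: side * (half - side))
    return a, half - a

P = 40.0
a, b = best_rectangle(P)
print(a, b, a * b)        # sides and area of the best rectangle found
print((P / 4) ** 2)       # the AM-GM bound on the area, (P/4)^2
```

The scan lands on a = b = P/4, where the area reaches the bound $(P/4)^2$ implied by (4.18).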

Problem 4.11 Unrestricted maximal area
If the garden need not be rectangular, what is the maximal-area shape?

Problem 4.12 Volume maximization
Build an open-topped box as follows: Start with a unit square, cut out four identical corners, and fold in the flaps. The box has volume $V = x(1 - 2x)^2$, where x is the side length of a corner cutout. What choice of x maximizes the volume of the box?

[Figure: unit square with corner cutouts of side x, showing the base and a flap.]

Here is a plausible analysis modeled on the analysis of the rectangular garden. Set a = x, b = 1 − 2x, and c = 1 − 2x. Then abc is the volume V, and $V^{1/3} = \sqrt[3]{abc}$ is the geometric mean (Problem 4.8). Because the geometric mean never exceeds the arithmetic mean and because the two means are equal when a = b = c, the maximum volume is attained when x = 1 − 2x. Therefore, choosing x = 1/3 should maximize the volume of the box.
Now show that this choice is wrong by graphing V(x) or setting dV/dx = 0; explain what is wrong with the preceding reasoning; and make a correct version.

Problem 4.13 Trigonometric minimum
Find the minimum value of
$$\frac{9x^2 \sin^2 x + 4}{x \sin x} \tag{4.19}$$
in the region $x \in (0, \pi)$.

Problem 4.14 Trigonometric maximum
In the region $t \in [0, \pi/2]$, maximize sin 2t or, equivalently, 2 sin t cos t.

The second application of arithmetic and geometric means is a modern, amazingly rapid method for computing π [5, 6]. Ancient methods for computing π included calculating the perimeter of many-sided regular polygons and provided a few decimal places of accuracy. Recent computations have used Leibniz's arctangent series
$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \tag{4.20}$$
Imagine that you want to compute π to $10^9$ digits, perhaps to test the hardware of a new supercomputer or to study whether the digits of π are random (a theme in Carl Sagan's novel Contact [40]). Setting x = 1 in the Leibniz series produces π/4, but the series converges extremely slowly. Obtaining $10^9$ digits requires roughly $10^{10^9}$ terms, far more terms than atoms in the universe.
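The slow convergence of the x = 1 series is easy to see numerically. This minimal Python sketch (the function name `leibniz_pi` and the chosen term counts are illustrative) sums the first N terms of (4.20) at x = 1 and reports the error.

```python
import math

def leibniz_pi(terms):
    """Approximate pi as 4 * (1 - 1/3 + 1/5 - ...), using the given number of terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

for terms in (10, 100, 1000, 10_000):
    error = abs(leibniz_pi(terms) - math.pi)
    print(f"{terms:6d} terms: error = {error:.2e}")
```

The error falls roughly as 1/N, so each additional decimal digit costs about ten times as many terms, which is why $10^9$ digits would need on the order of $10^{10^9}$ terms.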

Fortunately, a surprising trigonometric identity due to John Machin (1686–1751)
$$\arctan 1 = 4 \arctan \frac{1}{5} - \arctan \frac{1}{239} \tag{4.21}$$
accelerates the convergence by reducing x:
$$\frac{\pi}{4} = 4 \times \underbrace{\left(\frac{1}{5} - \frac{1}{3 \times 5^3} + \cdots\right)}_{\arctan(1/5)} - \underbrace{\left(\frac{1}{239} - \frac{1}{3 \times 239^3} + \cdots\right)}_{\arctan(1/239)}. \tag{4.22}$$
Even with the speedup, $10^9$-digit accuracy requires calculating roughly $10^9$ terms.

In contrast, the modern Brent–Salamin algorithm [3, 41], which relies on arithmetic and geometric means, converges to π extremely rapidly. The algorithm is closely related to amazingly accurate methods for calculating the perimeter of an ellipse (Problem 4.15) and also for calculating mutual inductance [23]. The algorithm generates several sequences by starting with $a_0 = 1$ and $g_0 = 1/\sqrt{2}$; it then computes successive arithmetic means $a_n$, geometric means $g_n$, and their squared differences $d_n$:
$$a_{n+1} = \frac{a_n + g_n}{2}, \qquad g_{n+1} = \sqrt{a_n g_n}, \qquad d_n = a_n^2 - g_n^2. \tag{4.23}$$
The a and g sequences rapidly converge to a number $M(a_0, g_0)$ called the arithmetic–geometric mean of $a_0$ and $g_0$. Then $M(a_0, g_0)$ and the difference sequence d determine π:
$$\pi = \frac{4 M(a_0, g_0)^2}{1 - \sum_{j=1}^{\infty} 2^{j+1} d_j}. \tag{4.24}$$
The d sequence approaches zero quadratically; in other words, $d_{n+1} \sim d_n^2$ (Problem 4.16). Therefore, each iteration in this computation of π doubles the digits of accuracy. A billion-digit calculation of π requires only about 30 iterations, far fewer than the $10^{10^9}$ terms using the arctangent series with x = 1 or even than the $10^9$ terms using Machin's speedup.

Problem 4.15 Perimeter of an ellipse
To compute the perimeter of an ellipse with semimajor axis $a_0$ and semiminor axis $g_0$, compute the a, g, and d sequences and the common limit $M(a_0, g_0)$ of the a and g sequences, as for the computation of π. Then the perimeter P can be computed with the following formula:

$$P = \frac{A}{M(a_0, g_0)} \left( a_0^2 - B \sum_{j=0}^{\infty} 2^j d_j \right), \tag{4.25}$$
where A and B are constants for you to determine. Use the method of easy cases (Chapter 2) to determine their values. (See [3] to check your values and for a proof of the completed formula.)

Problem 4.16 Quadratic convergence
Start with $a_0 = 1$ and $g_0 = 1/\sqrt{2}$ (or any other positive pair) and follow several iterations of the AM–GM sequence
$$a_{n+1} = \frac{a_n + g_n}{2} \quad \text{and} \quad g_{n+1} = \sqrt{a_n g_n}. \tag{4.26}$$
Then generate $d_n = a_n^2 - g_n^2$ and $\log_{10} d_n$ to check that $d_{n+1} \sim d_n^2$ (quadratic convergence).

Problem 4.17 Rapidity of convergence
Pick a positive $x_0$; then generate a sequence by the iteration
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right) \quad (n \ge 0). \tag{4.27}$$
To what and how rapidly does the sequence converge? What if $x_0 < 0$?

4.3 Approximating the logarithm

A function is often approximated by its Taylor series
$$f(x) = f(0) + x \left.\frac{df}{dx}\right|_{x=0} + \frac{x^2}{2} \left.\frac{d^2 f}{dx^2}\right|_{x=0} + \cdots, \tag{4.28}$$
which looks like an unintuitive sequence of symbols. Fortunately, pictures often explain the first and most important terms in a function approximation. For example, the one-term approximation sin θ ≈ θ, which replaces the altitude of the triangle by the arc of the circle, turns the nonlinear pendulum differential equation into a tractable, linear equation (Section 3.5).

[Figure: unit circle with angle θ, showing the altitude sin θ and the arc θ.]

Another Taylor-series illustration of the value of pictures comes from the series for the logarithm function:
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots. \tag{4.29}$$

Its first term, x, will lead to the wonderful approximation $(1 + x)^n \approx e^{nx}$ for small x and arbitrary n (Section 5.3.4). Its second term, $-x^2/2$, helps evaluate the accuracy of that approximation. These first two terms are the most useful terms, and they have pictorial explanations.

The starting picture is the integral representation
$$\ln(1 + x) = \int_0^x \frac{dt}{1 + t}. \tag{4.30}$$

[Figure: the curve 1/(1 + t) versus t, with the area from t = 0 to t = x shaded and labeled ln(1 + x).]

What is the simplest approximation for the shaded area?

As a first approximation, the shaded area is roughly the circumscribed rectangle, an example of lumping. The rectangle has area x:
$$\text{area} = \underbrace{\text{height}}_{1} \times \underbrace{\text{width}}_{x} = x. \tag{4.31}$$
This area reproduces the first term in the Taylor series. Because it uses a circumscribed rectangle, it slightly overestimates ln(1 + x).

The area can also be approximated by drawing an inscribed rectangle. Its width is again x, but its height is not 1 but rather 1/(1 + x), which is approximately 1 − x (Problem 4.18). Thus the inscribed rectangle has the approximate area $x(1 - x) = x - x^2$. This area slightly underestimates ln(1 + x).

Problem 4.18 Picture for approximating the reciprocal function
Confirm the approximation
$$\frac{1}{1 + x} \approx 1 - x \quad (\text{for small } x) \tag{4.32}$$
by trying x = 0.1 or x = 0.2. Then draw a picture to illustrate the equivalent approximation (1 − x)(1 + x) ≈ 1.

We now have two approximations to ln(1 + x). The first and slightly simpler approximation came from drawing the circumscribed rectangle. The second approximation came from drawing the inscribed rectangle. Both dance around the exact value.

How can the inscribed- and circumscribed-rectangle approximations be combined to make an improved approximation?
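The two rectangle estimates can be compared with the true area numerically. This minimal Python sketch (the function name `rectangle_bounds` and the sample values of x are illustrative) evaluates the inscribed-rectangle estimate $x - x^2$, the exact ln(1 + x), and the circumscribed-rectangle estimate x.

```python
import math

def rectangle_bounds(x):
    """Return (inscribed estimate, exact value, circumscribed estimate) for ln(1 + x)."""
    inscribed = x * (1 - x)      # height 1/(1 + x) approximated by 1 - x
    circumscribed = x            # height 1, width x
    return inscribed, math.log1p(x), circumscribed

for x in (0.1, 0.2, 0.5):
    low, exact, high = rectangle_bounds(x)
    print(f"x={x}: {low:.4f} < {exact:.4f} < {high:.4f}")
```

For each sample x the exact value sits between the two estimates, which is the sense in which they "dance around" it.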

One approximation overestimates the area, and the other underestimates the area; their average ought to improve on either approximation. The average is a trapezoid with area
$$\frac{x + (x - x^2)}{2} = x - \frac{x^2}{2}. \tag{4.33}$$

[Figure: the curve 1/(1 + t) from t = 0 to t = x with the approximating trapezoid.]

This area reproduces the first two terms of the full Taylor series
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots. \tag{4.34}$$

Problem 4.19 Cubic term
Estimate the cubic term in the Taylor series by estimating the difference between the trapezoid and the true area.

For these logarithm approximations, the hardest problem is ln 2:
$$\ln(1 + 1) \approx \begin{cases} 1 & \text{(one term)} \\ 1 - \dfrac{1}{2} & \text{(two terms).} \end{cases} \tag{4.35}$$
Both approximations differ significantly from the true value (roughly 0.693). Even moderate accuracy for ln 2 requires many terms of the Taylor series, far beyond what pictures explain (Problem 4.20). The problem is that x in ln(1 + x) is 1, so the $x^n$ factor in each term of the Taylor series does not shrink the high-n terms.

The same problem happens when computing π using Leibniz's arctangent series (Section 4.2.3)
$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \tag{4.36}$$
By using x = 1, the direct approximation of π/4 requires many terms to attain even moderate accuracy. Fortunately, the trigonometric identity arctan 1 = 4 arctan(1/5) − arctan(1/239) lowers the largest x to 1/5 and thereby speeds the convergence.

Is there an analogous identity that helps estimate ln 2?

Because 2 is also (4/3)/(2/3), an analogous rewriting of ln 2 is
$$\ln 2 = \ln \frac{4}{3} - \ln \frac{2}{3}. \tag{4.37}$$

Each fraction has the form 1 + x with x = ±1/3. Because x is small, one term of the logarithm series might provide reasonable accuracy. Let's therefore use ln(1 + x) ≈ x to approximate the two logarithms:
$$\ln 2 \approx \frac{1}{3} - \left(-\frac{1}{3}\right) = \frac{2}{3}. \tag{4.38}$$
This estimate is accurate to within 5%!

The rewriting trick has helped to compute π (by rewriting the arctan x series) and to estimate ln(1 + x) (by rewriting x itself). This idea therefore becomes a method: a trick that I use twice (this definition is often attributed to Polya).

Problem 4.20 How many terms?
The full Taylor series for the logarithm is
$$\ln(1 + x) = \sum_{1}^{\infty} (-1)^{n+1} \frac{x^n}{n}. \tag{4.39}$$
If you set x = 1 in this series, how many terms are required to estimate ln 2 to within 5%?

Problem 4.21 Second rewriting
Repeat the rewriting method by rewriting 4/3 and 2/3; then estimate ln 2 using only one term of the logarithm series. How accurate is the revised estimate?

Problem 4.22 Two terms of the Taylor series
After rewriting ln 2 as ln(4/3) − ln(2/3), use the two-term approximation that $\ln(1 + x) \approx x - x^2/2$ to estimate ln 2. Compare the approximation to the one-term estimate, namely 2/3. (Problem 4.24 investigates a pictorial explanation.)

Problem 4.23 Rational-function approximation for the logarithm
The replacement ln 2 = ln(4/3) − ln(2/3) has the general form
$$\ln(1 + x) = \ln \frac{1 + y}{1 - y}, \tag{4.40}$$
where y = x/(2 + x).
Use the expression for y and the one-term series ln(1 + x) ≈ x to express ln(1 + x) as a rational function of x (as a ratio of polynomials in x). What are the first few terms of its Taylor series?
Compare those terms to the first few terms of the ln(1 + x) Taylor series, and thereby explain why the rational-function approximation is more accurate than even the two-term series $\ln(1 + x) \approx x - x^2/2$.
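The three estimates of ln 2 discussed in the text (one term at x = 1, two terms at x = 1, and one term after rewriting 2 as (4/3)/(2/3)) can be compared directly. This is a minimal Python sketch; the variable names and the `relative_error` helper are illustrative.

```python
import math

exact = math.log(2)

one_term_direct = 1.0               # ln(1 + x) ~ x at x = 1
two_terms_direct = 1.0 - 1.0 / 2    # ln(1 + x) ~ x - x^2/2 at x = 1
rewritten = (1 / 3) - (-1 / 3)      # ln(4/3) - ln(2/3) with ln(1 + x) ~ x, x = +-1/3

def relative_error(estimate):
    """Fractional error of an estimate of ln 2."""
    return abs(estimate - exact) / exact

for name, value in [("one term", one_term_direct),
                    ("two terms", two_terms_direct),
                    ("rewritten", rewritten)]:
    print(f"{name:10s} {value:.4f}  relative error {relative_error(value):.1%}")
```

The rewritten estimate 2/3 lands within the 5% claimed above, while both direct estimates miss by tens of percent.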

Problem 4.24 Pictorial interpretation of the rewriting
a. Use the integral representation of ln(1 + x) to explain why the shaded area is ln 2.

[Figure: the curve 1/(1 + t) with the area from t = −1/3 to t = 1/3 shaded and labeled ln 2.]

b. Outline the region that represents
$$\ln \frac{4}{3} - \ln \frac{2}{3} \tag{4.41}$$
when using the circumscribed-rectangle approximation for each logarithm.
c. Outline the same region when using the trapezoid approximation $\ln(1 + x) = x - x^2/2$. Show pictorially that this region, although a different shape, has the same area as the region that you drew in item b.

4.4 Bisecting a triangle

Pictorial solutions are especially likely for a geometric problem:

What is the shortest path that bisects an equilateral triangle into two regions of equal area?

The possible bisecting paths form an uncountably infinite set. To manage the complexity, try easy cases (Chapter 2): draw a few equilateral triangles and bisect them with easy paths. Patterns, ideas, or even a solution might emerge.

What are a few easy paths?

The simplest bisecting path is a vertical segment that splits the triangle into two right triangles each with base 1/2. This path is the triangle's altitude, and it has length
$$l = \sqrt{1^2 - (1/2)^2} = \frac{\sqrt{3}}{2} \approx 0.866. \tag{4.42}$$

[Figure: equilateral triangle of side 1 bisected by its altitude, $l = \sqrt{3}/2$.]

An alternative straight path splits the triangle into a trapezoid and a small triangle.

[Figure: equilateral triangle bisected by a segment parallel to one side, $l = 1/\sqrt{2}$.]

What is the shape of the smaller triangle, and how long is the path?

The triangle is similar to the original triangle, so it too is equilateral. Furthermore, it has one-half of the area of the original triangle, so its three

sides, one of which is the bisecting path, are a factor of $\sqrt{2}$ smaller than the sides of the original triangle. Thus this path has length $1/\sqrt{2} \approx 0.707$, a substantial improvement on the vertical path with length $\sqrt{3}/2$.

Problem 4.25 All one-segment paths
An equilateral triangle has infinitely many one-segment bisecting paths. A few of them are shown in the figure. Which one-segment path is the shortest?

[Figure: equilateral triangle with several one-segment bisecting paths.]

Now let's investigate easy two-segment paths. One possible path encloses a diamond and excludes two small triangles. The two small triangles occupy one-half of the entire area. Each small triangle therefore occupies one-fourth of the entire area and has side length 1/2. Because the bisecting path contains two of these sides, it has length 1. This path is, unfortunately, longer than our two one-segment candidates, whose lengths are $1/\sqrt{2}$ and $\sqrt{3}/2$.

[Figure: equilateral triangle with a two-segment path enclosing a diamond, l = 1.]

Therefore, a reasonable conjecture is that the shortest path has the fewest segments. This conjecture deserves to be tested (Problem 4.26).

Problem 4.26 All two-segment paths
Draw a figure showing the variety of two-segment paths. Find the shortest path, showing that it has length
$$l = 2 \times 3^{1/4} \times \sin 15^\circ \approx 0.681. \tag{4.43}$$

Problem 4.27 Bisecting with closed paths
The bisecting path need not begin or end at an edge of the triangle. Two examples are illustrated here:

[Figure: two equilateral triangles, each bisected by a closed path.]

Do you expect closed bisecting paths to be longer or shorter than the shortest one-segment path? Give a geometric reason for your conjecture, and check the conjecture by finding the lengths of the two illustrative closed paths.

Does using fewer segments produce shorter paths?

The shortest one-segment path has an approximate length of 0.707; but the shortest two-segment path has an approximate length of 0.681. The length decrease suggests trying extreme paths: paths with an infinite number of

segments. In other words, try curved paths. The easiest curved path is probably a circle or a piece of a circle.

What is a likely candidate for the shortest circle or piece of a circle that bisects the triangle?

Whether the path is a circle or piece of a circle, it needs a center. However, putting the center inside the triangle and using a full circle produces a long bisecting path (Problem 4.27). The only other plausible center is a vertex of the triangle, so imagine a bisecting arc centered on one vertex.

How long is this arc?

The arc subtends one-sixth (60°) of the full circle, so its length is l = πr/3, where r is the radius of the full circle. To find the radius, use the requirement that the arc must bisect the triangle. Therefore, the arc encloses one-half of the triangle's area. The condition on r is that $\pi r^2 = 3\sqrt{3}/4$:
$$\frac{1}{6} \times \underbrace{\text{area of the full circle}}_{\pi r^2} = \frac{1}{2} \times \underbrace{\text{area of the triangle}}_{\sqrt{3}/4}. \tag{4.44}$$
The radius is therefore $(3\sqrt{3}/4\pi)^{1/2}$; the length of the arc is πr/3, which is approximately 0.673. This curved path is shorter than the shortest two-segment path. It might be the shortest possible path.

To test this conjecture, we use symmetry. Because an equilateral triangle is one-sixth of a hexagon, build a hexagon by replicating the bisected equilateral triangle. Here is the hexagon built from the triangle bisected by a horizontal line:

[Figure: hexagon built from six copies of the horizontally bisected triangle.]

The six bisecting paths form an internal hexagon whose area is one-half of the area of the large hexagon.

What happens when replicating the triangle bisected by the circular arc?
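The candidate path lengths quoted so far in this section can be tabulated from their formulas. This is a minimal Python sketch for a triangle of unit side; the variable names are illustrative, and the two-segment value simply evaluates formula (4.43) rather than deriving it.

```python
import math

vertical = math.sqrt(3) / 2                                    # altitude, (4.42)
parallel = 1 / math.sqrt(2)                                    # segment parallel to a side
diamond = 2 * 0.5                                              # two sides of length 1/2
two_segment = 2 * 3 ** 0.25 * math.sin(math.radians(15))       # formula (4.43)

radius = math.sqrt(3 * math.sqrt(3) / (4 * math.pi))           # from pi r^2 = 3 sqrt(3) / 4
arc = math.pi * radius / 3                                     # one-sixth of the circumference

for name, length in [("vertical", vertical), ("parallel", parallel),
                     ("diamond", diamond), ("two-segment", two_segment),
                     ("arc", arc)]:
    print(f"{name:12s} {length:.3f}")
```

The arc comes out shortest of the five candidates, matching the ordering described in the text.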

[Figure: hexagon built from six copies of the arc-bisected triangle; the six arcs form a circle.]

When that triangle is replicated, its six copies make a circle with area equal to one-half of the area of the hexagon. For a fixed area, a circle has the shortest perimeter (the isoperimetric theorem [30] and Problem 4.11); therefore, one-sixth of the circle is the shortest bisecting path.

Problem 4.28 Replicating the vertical bisection
The triangle bisected by a vertical line, if replicated and only rotated, produces a fragmented enclosed region rather than a convex polygon. How can the triangle be replicated so that the six bisecting paths form a regular polygon?

Problem 4.29 Bisecting the cube
Of all surfaces that bisect a cube into two equal volumes, which surface has the smallest area?

4.5 Summing series

For the final example of what pictures can explain, return to the factorial function. Our first approximation to n! began with its integral representation and then used lumping (Section 3.2.3). Lumping, by replacing a curve with a rectangle whose area is easily computed, is already a pictorial analysis. A second picture for n! begins with the summation representation
$$\ln n! = \sum_{1}^{n} \ln k. \tag{4.45}$$

[Figure: the ln k curve versus k with circumscribing rectangles of heights ln 2, ln 3, ln 4, ln 5.]

This sum equals the combined area of the circumscribing rectangles.

Problem 4.30 Drawing the smooth curve
Setting the height of the rectangles requires drawing the ln k curve, which could intersect the top edge of each rectangle anywhere along the edge. In the preceding figure and the analysis of this section, the curve intersects at the right endpoint of the edge. After reading the section, redo the analysis for two other cases:
a. The curve intersects at the left endpoint of the edge.
b. The curve intersects at the midpoint of the edge.

That combined area is approximately the area under the ln k curve, so
$$\ln n! \approx \int_1^n \ln k \, dk = n \ln n - n + 1. \tag{4.46}$$
Each term in this ln n! approximation contributes one factor to n!:
$$n! \approx n^n \times e^{-n} \times e. \tag{4.47}$$
Each factor has a counterpart in a factor from Stirling's approximation (Section 3.2.3). In descending order of importance, the factors in Stirling's approximation are
$$n! \approx n^n \times e^{-n} \times \sqrt{n} \times \sqrt{2\pi}. \tag{4.48}$$
The integral approximation reproduces the two most important factors and almost reproduces the fourth factor: e and $\sqrt{2\pi}$ differ by only 8%. The only unexplained factor is $\sqrt{n}$.

From where does the $\sqrt{n}$ factor come?

The $\sqrt{n}$ factor must come from the fragments above the ln k curve. They are almost triangles and would be easier to add if they were triangles. Therefore, redraw the ln k curve using straight-line segments (another use of lumping).

[Figure: the ln k curve redrawn with straight segments, leaving triangular protrusions above it.]

The resulting triangles would be easier to add if they were rectangles. Therefore, let's double each triangle to make it a rectangle.

What is the sum of these rectangular pieces?

To sum these pieces, lay your right hand along the k = n vertical line. With your left hand, shove the pieces to the right until they hit your right hand. The pieces then stack to form the ln n rectangle. Because each piece is double the corresponding triangular protrusion, the triangular protrusions sum to (ln n)/2. This triangle correction improves the integral approximation. The resulting approximation for ln n! now has one more term:

$$\ln n! \approx \underbrace{n \ln n - n + 1}_{\text{integral}} + \underbrace{\frac{\ln n}{2}}_{\text{triangles}}. \tag{4.49}$$
Upon exponentiating to get n!, the correction contributes a factor of $\sqrt{n}$:
$$n! \approx n^n \times e^{-n} \times e \times \sqrt{n}. \tag{4.50}$$
Compared to Stirling's approximation, the only remaining difference is the factor of e that should be $\sqrt{2\pi}$, an error of only 8%, all from doing one integral and drawing a few pictures.

Problem 4.31 Underestimate or overestimate?
Does the integral approximation with the triangle correction underestimate or overestimate n!? Use pictorial reasoning; then check the conclusion numerically.

Problem 4.32 Next correction
The triangle correction is the first of an infinite series of corrections. The corrections include terms proportional to $n^{-2}$, $n^{-3}$, ..., and they are difficult to derive using only pictures. But the $n^{-1}$ correction can be derived with pictures.
a. Draw the regions showing the error made by replacing the smooth ln k curve with a piecewise-linear curve (a curve made of straight segments).
b. Each region is bounded above by a curve that is almost a parabola, whose area is given by Archimedes' formula (Problem 4.34)
$$\text{area} = \frac{2}{3} \times \text{area of the circumscribing rectangle}. \tag{4.51}$$
Use that property to approximate the area of each region.
c. Show that when evaluating $\ln n! = \sum_{1}^{n} \ln k$, these regions sum to approximately $(1 - n^{-1})/12$.
d. What is the resulting, improved constant term (formerly e) in the approximation to n! and how close is it to $\sqrt{2\pi}$? What factor does the $n^{-1}$ term in the ln n! approximation contribute to the n! approximation?
These and subsequent corrections are derived in Section 6.3.2 using the technique of analogy.

4.6 Summary and further problems

For tens of millions of years, evolution has refined our perceptual abilities. A small child recognizes patterns more reliably and quickly than does

the largest supercomputer. Pictorial reasoning, therefore, taps the mind's vast computational power. It makes us more intelligent by helping us understand and see large ideas at a glance.

For extensive and enjoyable collections of picture proofs, see the works of Nelsen [31, 32]. Here are further problems to develop pictorial reasoning.

Problem 4.33 Another picture for the AM–GM inequality
Sketch y = ln x to show that the arithmetic mean of a and b is always greater than or equal to their geometric mean, with equality when a = b.

Problem 4.34 Archimedes' formula for the area of a parabola
Archimedes showed (long before calculus!) that the closed parabola encloses two-thirds of its circumscribing rectangle. Prove this result by integration.
Show that the closed parabola also encloses two-thirds of the circumscribing parallelogram with vertical sides. These pictorial recipes are useful when approximating functions (for example, in Problem 4.32).

Problem 4.35 Ancient picture for the area of a circle
The ancient Greeks knew that the circumference of a circle with radius r was 2πr. They then used the following picture to show that its area is $\pi r^2$. Can you reconstruct the argument?

[Figure: a circle cut into sectors and the sectors rearranged.]

Problem 4.36 Volume of a sphere
Extend the argument of Problem 4.35 to find the volume of a sphere of radius r, given that its surface area is $4\pi r^2$. Illustrate the argument with a sketch.

Problem 4.37 A famous sum
Use pictorial reasoning to approximate the famous Basel sum $\sum_{1}^{\infty} n^{-2}$.

Problem 4.38 Newton–Raphson method
In general, solving f(t) = 0 requires approximations. One method is to start with a guess $t_0$ and to improve it iteratively using the Newton–Raphson method
$$t_{n+1} = t_n - \frac{f(t_n)}{f'(t_n)}, \tag{4.52}$$
where $f'(t_n)$ is the derivative df/dt evaluated at $t = t_n$. Draw a picture to justify this recipe; then use the recipe to estimate $\sqrt{2}$. (Then try Problem 4.17.)
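The recipe (4.52) translates directly into a loop. The following minimal Python sketch applies it to $f(t) = t^2 - 2$, whose positive root is $\sqrt{2}$; the function name `newton_raphson`, the starting guess $t_0 = 1$, and the fixed iteration count are illustrative assumptions, not choices made in the text.

```python
import math

def newton_raphson(f, fprime, t0, iterations=6):
    """Iterate t -> t - f(t)/f'(t) a fixed number of times, returning every iterate."""
    history = [t0]
    for _ in range(iterations):
        t = history[-1]
        history.append(t - f(t) / fprime(t))
    return history

history = newton_raphson(lambda t: t * t - 2, lambda t: 2 * t, t0=1.0)
for n, t in enumerate(history):
    print(n, t, abs(t - math.sqrt(2)))
```

For this particular f, the update simplifies to $t_{n+1} = (t_n + 2/t_n)/2$, the same iteration as in Problem 4.17, which is one reason the text points there next.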

5 Taking out the big part

5.1 Multiplication using one and few
5.2 Fractional changes and low-entropy expressions
5.3 Fractional changes with general exponents
5.4 Successive approximation: How deep is the well?
5.5 Daunting trigonometric integral
5.6 Summary and further problems

In almost every quantitative problem, the analysis simplifies when you follow the proverbial advice of doing first things first. First approximate and understand the most important effect (the big part); then refine your analysis and understanding. This procedure of successive approximation or "taking out the big part" generates meaningful, memorable, and usable expressions. The following examples introduce the related idea of low-entropy expressions (Section 5.2) and analyze mental multiplication (Section 5.1), exponentiation (Section 5.3), quadratic equations (Section 5.4), and a difficult trigonometric integral (Section 5.5).

5.1 Multiplication using one and few

The first illustration is a method of mental multiplication suited to rough, back-of-the-envelope estimates. The particular calculation is the storage capacity of a data CD-ROM. A data CD-ROM has the same format and storage capacity as a music CD, whose capacity can be estimated as the product of three factors:
$$\underbrace{1\,\text{hr} \times \frac{3600\,\text{s}}{1\,\text{hr}}}_{\text{playing time}} \times \underbrace{\frac{4.4 \times 10^4\,\text{samples}}{1\,\text{s}}}_{\text{sample rate}} \times \underbrace{2\,\text{channels} \times \frac{16\,\text{bits}}{1\,\text{sample}}}_{\text{sample size}}. \tag{5.1}$$

(In the sample-size factor, the two channels are for stereophonic sound.)

Problem 5.1 Sample rate
Look up the Shannon–Nyquist sampling theorem [22], and explain why the sample rate (the rate at which the sound pressure is measured) is roughly 40 kHz.

Problem 5.2 Bits per sample
Because $2^{16} \sim 10^5$, a 16-bit sample, as chosen for the CD format, requires electronics accurate to roughly 0.001%. Why didn't the designers of the CD format choose a much larger sample size, say 32 bits (per channel)?

Problem 5.3 Checking units
Check that all the units in the estimate divide out, except for the desired units of bits.

Back-of-the-envelope calculations use rough estimates such as the playing time and neglect important factors such as the bits devoted to error detection and correction. In this and many other estimates, multiplication with 3 decimal places of accuracy would be overkill. An approximate analysis needs an approximate method of calculation.

What is the data capacity to within a factor of 2?

The units (the biggest part!) are bits (Problem 5.3), and the three numerical factors contribute $3600 \times 4.4 \times 10^4 \times 32$. To estimate the product, split it into a big part and a correction.

The big part: The most important factor in a back-of-the-envelope product usually comes from the powers of 10, so evaluate this big part first: 3600 contributes three powers of 10, $4.4 \times 10^4$ contributes four, and 32 contributes one. The eight powers of 10 produce a factor of $10^8$.

The correction: After taking out the big part, the remaining part is a correction factor of $3.6 \times 4.4 \times 3.2$. This product too is simplified by taking out its big part. Round each factor to the closest number among three choices: 1, few, or 10. The invented number few lies midway between 1 and 10: It is the geometric mean of 1 and 10, so $(\text{few})^2 = 10$ and $\text{few} \approx 3$. In the product $3.6 \times 4.4 \times 3.2$, each factor rounds to few, so $3.6 \times 4.4 \times 3.2 \approx (\text{few})^3$ or roughly 30.

The units, the powers of 10, and the correction factor combine to give

$$\text{capacity} \sim 10^8 \times 30\ \text{bits} = 3 \times 10^9\ \text{bits}. \tag{5.2}$$

This estimate is within a factor of 2 of the exact product (Problem 5.4), which is itself close to the actual capacity of $5.6 \times 10^9$ bits.

Problem 5.4 Underestimate or overestimate?
Does $3 \times 10^9$ overestimate or underestimate $3600 \times 4.4 \times 10^4 \times 32$? Check your reasoning by computing the exact product.

Problem 5.5 More practice
Use the one-or-few method of multiplication to perform the following calculations mentally; then compare the approximate and actual products.
a. $161 \times 294 \times 280 \times 438$. The actual product is roughly $5.8 \times 10^9$.
b. Earth's surface area $A = 4\pi R^2$, where the radius is $R \sim 6 \times 10^6$ m. The actual surface area is roughly $5.1 \times 10^{14}\ \text{m}^2$.

5.2 Fractional changes and low-entropy expressions

Using the one-or-few method for mental multiplication is fast. For example, $3.15 \times 7.21$ quickly becomes $\text{few} \times 10^1 \sim 30$, which is within 50% of the exact product 22.7115. To get a more accurate estimate, round 3.15 to 3 and 7.21 to 7. Their product 21 is in error by only 8%. To reduce the error further, one could split $3.15 \times 7.21$ into a big part and an additive correction. This decomposition produces

$$(3 + 0.15)(7 + 0.21) = \underbrace{3 \times 7}_{\text{big part}} + \underbrace{0.15 \times 7 + 3 \times 0.21 + 0.15 \times 0.21}_{\text{additive correction}}. \tag{5.3}$$

The approach is sound, but the literal application of taking out the big part produces a messy correction that is hard to remember and understand. Slightly modified, however, taking out the big part provides a clean and intuitive correction. As gravy, developing the improved correction introduces two important street-fighting ideas: fractional changes (Section 5.2.1) and low-entropy expressions (Section 5.2.2). The improved correction will then, as a first of many uses, help us estimate the energy saved by highway speed limits (Section 5.2.3).

5.2.1 Fractional changes

The hygienic alternative to an additive correction is to split the product into a big part and a multiplicative correction:

$$3.15 \times 7.21 = \underbrace{3 \times 7}_{\text{big part}} \times \underbrace{(1 + 0.05) \times (1 + 0.03)}_{\text{correction factor}}. \tag{5.4}$$

Can you find a picture for the correction factor?

The correction factor is the area of a rectangle with width 1 + 0.05 and height 1 + 0.03. The rectangle contains one subrectangle for each term in the expansion of $(1 + 0.05) \times (1 + 0.03)$. Their combined area of roughly 1 + 0.05 + 0.03 represents an 8% fractional increase over the big part. The big part is 21, and 8% of it is 1.68, so $3.15 \times 7.21 = 22.68$, which is within 0.14% of the exact product.

[Figure: rectangle of width 1 + 0.05 and height 1 + 0.03, divided into subrectangles with areas 1, 0.05, 0.03, and ≈ 0.]

Problem 5.6 Picture for the fractional error
What is the pictorial explanation for the fractional error of roughly 0.15%?

Problem 5.7 Try it yourself
Estimate $245 \times 42$ by rounding each factor to a nearby multiple of 10, and compare this big part with the exact product. Then draw a rectangle for the correction factor, estimate its area, and correct the big part.

5.2.2 Low-entropy expressions

The correction to $3.15 \times 7.21$ was complicated as an absolute or additive change but simple as a fractional change. This contrast is general. Using the additive correction, a two-factor product becomes

$$(x + \Delta x)(y + \Delta y) = xy + \underbrace{x\,\Delta y + y\,\Delta x + \Delta x\,\Delta y}_{\text{additive correction}}. \tag{5.5}$$

Problem 5.8 Rectangle picture
Draw a rectangle representing the expansion

$$(x + \Delta x)(y + \Delta y) = xy + x\,\Delta y + y\,\Delta x + \Delta x\,\Delta y. \tag{5.6}$$

When the absolute changes Δx and Δy are small ($x \gg \Delta x$ and $y \gg \Delta y$), the correction simplifies to $x\,\Delta y + y\,\Delta x$, but even so it is hard to remember because it has many plausible but incorrect alternatives. For example, it could plausibly contain terms such as $\Delta x\,\Delta y$, $x\,\Delta x$, or $y\,\Delta y$. The extent

of the plausible alternatives measures the gap between our intuition and reality; the larger the gap, the harder the correct result must work to fill it, and the harder we must work to remember the correct result.

Such gaps are the subject of statistical mechanics and information theory [20, 21], which define the gap as the logarithm of the number of plausible alternatives and call the logarithmic quantity the entropy. The logarithm does not alter the essential point that expressions differ in the number of plausible alternatives and that high-entropy expressions [28], ones with many plausible alternatives, are hard to remember and understand. In contrast, a low-entropy expression allows few plausible alternatives, and elicits, "Yes! How could it be otherwise?!" Much mathematical and scientific progress consists of finding ways of thinking that turn high-entropy expressions into easy-to-understand, low-entropy expressions.

What is a low-entropy expression for the correction to the product xy?

A multiplicative correction, being dimensionless, automatically has lower entropy than the additive correction: The set of plausible dimensionless expressions is much smaller than the full set of plausible expressions.

The multiplicative correction is $(x + \Delta x)(y + \Delta y)/xy$. As written, this ratio contains gratuitous entropy. It constructs two dimensioned sums $x + \Delta x$ and $y + \Delta y$, multiplies them, and finally divides the product by xy. Although the result is dimensionless, it becomes so only in the last step. A cleaner method is to group related factors by making dimensionless quantities right away:

$$\frac{(x + \Delta x)(y + \Delta y)}{xy} = \frac{x + \Delta x}{x} \cdot \frac{y + \Delta y}{y} = \left(1 + \frac{\Delta x}{x}\right)\left(1 + \frac{\Delta y}{y}\right). \tag{5.7}$$

The right side is built only from the fundamental dimensionless quantity 1 and from meaningful dimensionless ratios: (Δx)/x is the fractional change in x, and (Δy)/y is the fractional change in y.

The gratuitous entropy came from mixing $x + \Delta x$, $y + \Delta y$, x, and y willy nilly, and it was removed by regrouping or unmixing. Unmixing is difficult with physical systems. Try, for example, to remove a drop of food coloring mixed into a glass of water. The problem is that a glass of water contains roughly $10^{25}$ molecules. Fortunately, most mathematical expressions have fewer constituents. We can often regroup and unmix the mingled pieces and thereby reduce the entropy of the expression.
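The multiplicative correction can be checked numerically on the 3.15 × 7.21 example. The following is a minimal sketch, not the author's code; the function name `product_with_fractional_changes` and its return layout are illustrative assumptions.

```python
def product_with_fractional_changes(x, dx, y, dy):
    """Estimate (x + dx) * (y + dy) from the big part x * y.

    Returns (big_part, linear_estimate, exact_product).
    The names here are hypothetical, chosen only for this illustration.
    """
    big_part = x * y
    frac_x = dx / x  # fractional change in x
    frac_y = dy / y  # fractional change in y
    # Keep only the sum of the fractional changes (drop their small product).
    linear_estimate = big_part * (1 + frac_x + frac_y)
    # Full multiplicative correction (1 + dx/x)(1 + dy/y).
    exact_product = big_part * (1 + frac_x) * (1 + frac_y)
    return big_part, linear_estimate, exact_product


big_part, linear_estimate, exact_product = product_with_fractional_changes(3, 0.15, 7, 0.21)
print(big_part, linear_estimate, exact_product)
```

With these inputs the fractional changes are 5% and 3%, so the linear estimate is the big part 21 raised by 8%, while the full correction factor reproduces 3.15 × 7.21.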


Problem 5.9 Rectangle for the correction factor
Draw a rectangle representing the low-entropy correction factor

$$\left(1 + \frac{\Delta x}{x}\right)\left(1 + \frac{\Delta y}{y}\right). \tag{5.8}$$

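A low-entropy correction factor produces a low-entropy fractional change:

$$\frac{\Delta(xy)}{xy} = \left(1 + \frac{\Delta x}{x}\right)\left(1 + \frac{\Delta y}{y}\right) - 1 = \frac{\Delta x}{x} + \frac{\Delta y}{y} + \frac{\Delta x}{x}\,\frac{\Delta y}{y}, \tag{5.9}$$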

where Δ(xy)/xy is the fractional change from xy to (x + Δx)(y + Δy). The rightmost term is the product of two small fractions, so it is small compared to the preceding two terms. Without this small, quadratic term,

$$\frac{\Delta(xy)}{xy} \approx \frac{\Delta x}{x} + \frac{\Delta y}{y}. \tag{5.10}$$

Small fractional changes simply add! This fractional-change rule is far simpler than the corresponding approximate rule that the absolute change is $x\,\Delta y + y\,\Delta x$. Simplicity indicates low entropy; indeed, the only plausible alternative to the proposed rule is the possibility that fractional changes multiply. And this conjecture is not likely: When Δy = 0, it predicts that Δ(xy) = 0 no matter the value of Δx (this prediction is explored also in Problem 5.12).

Problem 5.10 Thermal expansion
If, due to thermal expansion, a metal sheet expands in each dimension by 4%, what happens to its area?

Problem 5.11 Price rise with a discount
Imagine that inflation, or copyright law, increases the price of a book by 10% compared to last year. Fortunately, as a frequent book buyer, you start getting a store discount of 15%. What is the net price change that you see?

5.2.3 Squaring

In analyzing the engineered and natural worlds, a common operation is squaring, a special case of multiplication. Squared lengths are areas, and squared speeds are proportional to the drag on most objects (Section 2.4):

$$F_d \sim \rho v^2 A, \tag{5.11}$$


where v is the speed of the object, A is its cross-sectional area, and ρ is the density of the fluid. As a consequence, driving at highway speeds for a distance d consumes an energy $E = F_d\,d \sim \rho A v^2 d$. Energy consumption can therefore be reduced by driving more slowly. This possibility became important to Western countries in the 1970s when oil prices rose rapidly (see [7] for an analysis). As a result, the United States instituted a highway speed limit of 55 mph (90 kph).

By what fraction does gasoline consumption fall due to driving 55 mph instead of 65 mph?

A lower speed limit reduces gasoline consumption by reducing the drag force $\rho A v^2$ and by reducing the driving distance d: People measure and regulate their commuting more by time than by distance. But finding a new home or job is a slow process. Therefore, analyze first things first: assume for this initial analysis that the driving distance d stays fixed (then try Problem 5.14). With that assumption, E is proportional to $v^2$, and

$$\frac{\Delta E}{E} = 2 \times \frac{\Delta v}{v}. \tag{5.12}$$

Going from 65 mph to 55 mph is roughly a 15% drop in v, so the energy consumption drops by roughly 30%. Highway driving uses a signiﬁcant fraction of the oil consumed by motor vehicles, which in the United States consume a signiﬁcant fraction of all oil consumed. Thus the 30% drop substantially reduced total US oil consumption.
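The squaring rule can be compared against the exact change in $v^2$. The sketch below assumes only that E is proportional to $v^2$ with the distance fixed, as in the text; the function name `energy_change` is a hypothetical label.

```python
def energy_change(v_old, v_new):
    """Fractional change in E (proportional to v**2) when speed changes.

    Returns (fractional change in v, linear estimate 2 * dv/v, exact change).
    """
    frac_v = (v_new - v_old) / v_old
    linear_estimate = 2 * frac_v          # fractional-change rule for squaring
    exact_change = (v_new / v_old) ** 2 - 1
    return frac_v, linear_estimate, exact_change


frac_v, linear_estimate, exact_change = energy_change(65.0, 55.0)
print(f"dv/v = {frac_v:.1%}, linear dE/E = {linear_estimate:.1%}, exact dE/E = {exact_change:.1%}")
```

The linear rule slightly overstates the drop because it omits the quadratic term, but both values round to the same "roughly 30%" conclusion.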

Problem 5.12 A tempting error
If A and x are related by $A = x^2$, a tempting conjecture is that

$$\frac{\Delta A}{A} \approx \left(\frac{\Delta x}{x}\right)^2. \tag{5.13}$$

Disprove this conjecture using easy cases (Chapter 2).

Problem 5.13 Numerical estimates
Use fractional changes to estimate $6.3^3$. How accurate is the estimate?

Problem 5.14 Time limit on commuting
Assume that driving time, rather than distance, stays fixed as highway driving speeds fall by 15%. What is the resulting fractional change in the gasoline consumed by highway driving?


Problem 5.15 Wind power
The power generated by an ideal wind turbine is proportional to $v^3$ (why?). If wind speeds increase by a mere 10%, what is the effect on the generated power? The quest for fast winds is one reason that wind turbines are placed on cliffs or hilltops or at sea.

**5.3 Fractional changes with general exponents**

The fractional-change approximations for changes in $x^2$ (Section 5.2.3) and in $x^3$ (Problem 5.13) are special cases of the approximation for $x^n$

$$\frac{\Delta(x^n)}{x^n} \approx n \times \frac{\Delta x}{x}. \tag{5.14}$$

This rule offers a method for mental division (Section 5.3.1), for estimating square roots (Section 5.3.2), and for judging a common explanation for the seasons (Section 5.3.3). The rule requires only that the fractional change be small and that the exponent n not be too large (Section 5.3.4).

5.3.1 Rapid mental division

The special case n = −1 provides the method for rapid mental division. As an example, let's estimate 1/13. Rewrite it as $(x + \Delta x)^{-1}$ with x = 10 and Δx = 3. The big part is $x^{-1} = 0.1$. Because (Δx)/x = 30%, the fractional correction to $x^{-1}$ is roughly −30%. The result is 0.07.

$$\frac{1}{13} \approx \frac{1}{10} - 30\% = 0.07, \tag{5.15}$$

where the "−30%" notation, meaning "decrease the previous object by 30%," is a useful shorthand for a factor of 1 − 0.3.

How accurate is the estimate, and what is the source of the error?

The estimate is in error by only 9%. The error arises because the linear approximation

$$\frac{\Delta\left(x^{-1}\right)}{x^{-1}} \approx -1 \times \frac{\Delta x}{x} \tag{5.16}$$

does not include the square (or higher powers) of the fractional change (Δx)/x (Problem 5.17 asks you to ﬁnd the squared term).
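The 1/13 estimate and its 9% error can be reproduced in a few lines. This is a minimal sketch of the n = −1 rule; the name `reciprocal_estimate` is an assumption made for the illustration.

```python
def reciprocal_estimate(x, dx):
    """Estimate 1 / (x + dx) as the big part 1/x with a -dx/x fractional correction."""
    big_part = 1 / x
    return big_part * (1 - dx / x)


estimate = reciprocal_estimate(10, 3)              # 1/10 decreased by 30%
true_value = 1 / 13
relative_error = (true_value - estimate) / true_value
print(f"estimate = {estimate:.4f}, true = {true_value:.4f}, error = {relative_error:.1%}")
```

The estimate falls below the true value, which is consistent with a neglected positive higher-order term.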

How can the error in the linear approximation be reduced?

To reduce the error, reduce the fractional change. Because the fractional change is determined by the big part, let's increase the accuracy of the big part. Accordingly, multiply 1/13 by 8/8, a convenient form of 1, to construct 8/104. Its big part 0.08 approximates 1/13 already to within 4%. To improve it, write 1/104 as $(x + \Delta x)^{-1}$ with x = 100 and Δx = 4. The fractional change (Δx)/x is now 0.04 (rather than 0.3); and the fractional correction to 1/x and 8/x is a mere −4%. The corrected estimate is 0.0768:

$$\frac{1}{13} \approx 0.08 - 4\% = 0.08 - 0.0032 = 0.0768. \tag{5.17}$$

This estimate can be done mentally in seconds and is accurate to 0.16%!

Problem 5.16 Next approximation
Multiply 1/13 by a convenient form of 1 to make a denominator near 1000; then estimate 1/13. How accurate is the resulting approximation?

Problem 5.17 Quadratic approximation
Find A, the coefficient of the quadratic term in the improved fractional-change approximation

$$\frac{\Delta\left(x^{-1}\right)}{x^{-1}} \approx -1 \times \frac{\Delta x}{x} + A \times \left(\frac{\Delta x}{x}\right)^2. \tag{5.18}$$

Use the resulting approximation to improve the estimates for 1/13.

Problem 5.18 Fuel efficiency
Fuel efficiency is inversely proportional to energy consumption. If a 55 mph speed limit decreases energy consumption by 30%, what is the new fuel efficiency of a car that formerly got 30 miles per US gallon (12.8 kilometers per liter)?

5.3.2 Square roots

The fractional exponent n = 1/2 provides the method for estimating square roots. As an example, let's estimate $\sqrt{10}$. Rewrite it as $(x + \Delta x)^{1/2}$ with x = 9 and Δx = 1. The big part $x^{1/2}$ is 3. Because (Δx)/x = 1/9 and n = 1/2, the fractional correction is 1/18. The corrected estimate is

$$\sqrt{10} \approx 3 \times \left(1 + \frac{1}{18}\right) \approx 3.1667. \tag{5.19}$$

The exact value is 3.1622 . . ., so the estimate is accurate to 0.14%.
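The same one-line rule covers both the square-root estimate and the improved 8/104 form of 1/13. The sketch below is an illustration under the assumption that the fractional change is small; `power_estimate` is a hypothetical helper name.

```python
import math


def power_estimate(x, dx, n):
    """Estimate (x + dx)**n as the big part x**n times (1 + n * dx / x)."""
    return x ** n * (1 + n * dx / x)


sqrt10_estimate = power_estimate(9, 1, 0.5)             # big part 3, correction 1/18
sqrt10_error = (sqrt10_estimate - math.sqrt(10)) / math.sqrt(10)

thirteenth_estimate = 8 * power_estimate(100, 4, -1)    # 8/104 with a -4% correction

print(f"sqrt(10) estimate = {sqrt10_estimate:.4f}, relative error = {sqrt10_error:.2%}")
print(f"1/13 estimate = {thirteenth_estimate:.4f}")
```

Choosing the big part close to the target (9 for 10, 100 for 104) is what keeps the fractional change, and therefore the neglected quadratic term, small.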

Problem 5.19 Overestimate or underestimate?
Does the linear fractional-change approximation overestimate all square roots (as it overestimated $\sqrt{10}$)? If yes, explain why; if no, give a counterexample.

Problem 5.20 Cosine approximation
Use the small-angle approximation sin θ ≈ θ to show that $\cos\theta \approx 1 - \theta^2/2$.

Problem 5.21 Reducing the fractional change
To reduce the fractional change when estimating $\sqrt{10}$, rewrite it as $\sqrt{360}/6$ and then estimate $\sqrt{360}$. How accurate is the resulting estimate for $\sqrt{10}$?

Problem 5.22 Another method to reduce the fractional change
Because $\sqrt{2}$ is fractionally distant from the nearest integer square roots $\sqrt{1}$ and $\sqrt{4}$, fractional changes do not give a direct and accurate estimate of $\sqrt{2}$. A similar problem occurred in estimating ln 2 (Section 4.3); there, rewriting 2 as (4/3)/(2/3) improved the accuracy. Does that rewriting help estimate $\sqrt{2}$?

Problem 5.23 Cube root
Estimate $2^{1/3}$ to within 10%.

5.3.3 A reason for the seasons?

Summers are warmer than winters, it is often alleged, because the earth is closer to the sun in the summer than in the winter. This common explanation is bogus for two reasons. First, summers in the southern hemisphere happen alongside winters in the northern hemisphere, despite almost no difference in the respective distances to the sun. Second, as we will now estimate, the varying earth–sun distance produces too small a temperature difference. The causal chain (that the distance determines the intensity of solar radiation and that the intensity determines the surface temperature) is most easily analyzed using fractional changes.

Intensity of solar radiation: The intensity is the solar power divided by the area over which it spreads. The solar power hardly changes over a year (the sun has existed for several billion years); however, at a distance r from the sun, the energy has spread over a giant sphere with surface area $\sim r^2$. The intensity I therefore varies according to $I \propto r^{-2}$. The fractional changes in radius and intensity are related by

$$\frac{\Delta I}{I} \approx -2 \times \frac{\Delta r}{r}. \tag{5.20}$$

Surface temperature: The incoming solar energy cannot accumulate and returns to space as blackbody radiation. Its outgoing intensity depends on the earth's surface temperature T according to the Stefan–Boltzmann law $I = \sigma T^4$ (Problem 1.12), where σ is the Stefan–Boltzmann constant. Therefore $T \propto I^{1/4}$. Using fractional changes,

$$\frac{\Delta T}{T} \approx \frac{1}{4} \times \frac{\Delta I}{I}. \tag{5.21}$$

This relation connects intensity and temperature. Intensity and distance are connected by (ΔI)/I = −2 × (Δr)/r. When joined, the two relations connect distance and temperature as follows:

$$\frac{\Delta r}{r} \;\xrightarrow{\;I \propto r^{-2}:\ \times(-2)\;}\; \frac{\Delta I}{I} \approx -2 \times \frac{\Delta r}{r} \;\xrightarrow{\;T \propto I^{1/4}:\ \times\frac{1}{4}\;}\; \frac{\Delta T}{T} \approx -\frac{1}{2} \times \frac{\Delta r}{r}$$

The next step in the computation is to estimate the input (Δr)/r, namely the fractional change in the earth–sun distance. The earth orbits the sun in an ellipse; its orbital distance is

$$r = \frac{l}{1 + \epsilon\cos\theta}, \tag{5.22}$$

where ε is the eccentricity of the orbit, θ is the polar angle, and l is the semilatus rectum.

[Figure: elliptical orbit with the sun at a focus, showing r, l, θ, r_min (at θ = 0°), and r_max.]

Thus r varies from $r_\text{min} = l/(1 + \epsilon)$ (when θ = 0°) to $r_\text{max} = l/(1 - \epsilon)$ (when θ = 180°). The increase from $r_\text{min}$ to l contributes a fractional change of roughly ε. The increase from l to $r_\text{max}$ contributes another fractional change of roughly ε. Thus, r varies by roughly 2ε. For the earth's orbit, ε = 0.016, so the earth–sun distance varies by 0.032 or 3.2% (making the intensity vary by 6.4%).

Problem 5.24 Where is the sun?
The preceding diagram of the earth's orbit placed the sun away from the center of the ellipse. The diagram to the right shows the sun at an alternative and perhaps more natural location: at the center of the ellipse. What physical laws, if any, prevent the sun from sitting at the center of the ellipse?

[Figure: ellipse with the sun at its center, showing r_min and r_max.]

Problem 5.25 Check the fractional change
Look up the minimum and maximum earth–sun distances and check that the distance does vary by 3.2% from minimum to maximum.

A 3.2% increase in distance causes a slight drop in temperature:

$$\frac{\Delta T}{T} \approx -\frac{1}{2} \times \frac{\Delta r}{r} = -1.6\%. \tag{5.23}$$

However, man does not live by fractional changes alone and experiences the absolute temperature change ΔT:

$$\Delta T = -1.6\% \times T. \tag{5.24}$$

In winter T ≈ 0°C, so is ΔT ≈ 0°C?

If our calculation predicts that ΔT ≈ 0°C, it must be wrong. An even less plausible conclusion results from measuring T in Fahrenheit degrees, which makes T often negative in parts of the northern hemisphere. Yet ΔT cannot flip its sign just because T is measured in Fahrenheit degrees!

Fortunately, the temperature scale is constrained by the Stefan–Boltzmann law. For blackbody flux to be proportional to $T^4$, temperature must be measured relative to a state with zero thermal energy: absolute zero. Neither the Celsius nor the Fahrenheit scale satisfies this requirement.

In contrast, the Kelvin scale does measure temperature relative to absolute zero. On the Kelvin scale, the average surface temperature is T ≈ 300 K; thus, a 1.6% change in T makes ΔT ≈ 5 K. A 5 K change is also a 5°C change: Kelvin and Celsius degrees are the same size, although the scales have different zero points. (See also Problem 5.26.) A typical temperature change between summer and winter in temperate latitudes is 20°C, much larger than the predicted 5°C change, even after allowing for errors in the estimate. A varying earth–sun distance is a dubious explanation of the reason for the seasons.

Problem 5.26 Converting to Fahrenheit
The conversion between Fahrenheit and Celsius temperatures is

$$F = 1.8C + 32, \tag{5.25}$$

so a change of 5°C should be a change of 41°F, sufficiently large to explain the seasons! What is wrong with this reasoning?

Problem 5.27 Alternative explanation
If a varying distance to the sun cannot explain the seasons, what can? Your proposal should, in passing, explain why the northern and southern hemispheres have summer 6 months apart.
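The chain from distance to intensity to temperature in Section 5.3.3 can be traced numerically. The following is a rough sketch using the values quoted in the text (ε = 0.016 and T ≈ 300 K); the function name `seasonal_temperature_change` is a hypothetical label.

```python
def seasonal_temperature_change(eccentricity, mean_temperature_kelvin):
    """Chain fractional changes: distance -> intensity -> temperature.

    Returns (dr/r, dI/I, dT/T, dT in kelvin) for the minimum-to-maximum
    change in earth-sun distance.
    """
    frac_r = 2 * eccentricity     # r varies by roughly 2 * epsilon
    frac_i = -2 * frac_r          # I is proportional to r**-2
    frac_t = 0.25 * frac_i        # T is proportional to I**(1/4)
    delta_t = frac_t * mean_temperature_kelvin
    return frac_r, frac_i, frac_t, delta_t


frac_r, frac_i, frac_t, delta_t = seasonal_temperature_change(0.016, 300.0)
print(f"dr/r = {frac_r:.1%}, dI/I = {frac_i:.1%}, dT/T = {frac_t:.1%}, dT = {delta_t:.1f} K")
```

The temperature must be supplied in kelvin, since the $T^4$ law holds only for temperatures measured from absolute zero.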

5.3.4 Limits of validity

The linear fractional-change approximation

$$\frac{\Delta(x^n)}{x^n} \approx n \times \frac{\Delta x}{x} \tag{5.26}$$

has been useful. But when is it valid? To investigate without drowning in notation, write z for Δx; then choose x = 1 to make z the absolute and the fractional change. The right side becomes nz, and the linear fractional-change approximation is equivalent to

$$(1 + z)^n \approx 1 + nz. \tag{5.27}$$

The approximation becomes inaccurate when z is too large: for example, when evaluating $\sqrt{1 + z}$ with z = 1 (Problem 5.22). Is the exponent n also restricted? The preceding examples illustrated only moderate-sized exponents: n = 2 for energy consumption (Section 5.2.3), −2 for fuel efficiency (Problem 5.18), −1 for reciprocals (Section 5.3.1), 1/2 for square roots (Section 5.3.2), and −2 and 1/4 for the seasons (Section 5.3.3). We need further data.

What happens in the extreme case of large exponents?

With a large exponent such as n = 100 and, say, z = 0.001, the approximation predicts that $1.001^{100} \approx 1.1$, close to the true value of 1.105 . . . However, choosing the same n alongside z = 0.1 (larger than 0.001 but still small) produces the terrible prediction

$$1.1^{100} = 1 + 100 \times 0.1 = 11; \tag{5.28}$$

$1.1^{100}$ is roughly 14,000, more than 1000 times larger than the prediction.

Both predictions used large n and small z, yet only one prediction was accurate; thus, the problem cannot lie in n or z alone. Perhaps the culprit is the dimensionless product nz. To test that idea, hold nz constant while trying large values of n. For nz, a sensible constant is 1, the simplest dimensionless number. Here are several examples.

$$\begin{aligned} 1.1^{10} &\approx 2.59374, \\ 1.01^{100} &\approx 2.70481, \\ 1.001^{1000} &\approx 2.71692. \end{aligned} \tag{5.29}$$

In each example, the approximation incorrectly predicts that $(1 + z)^n = 2$.

What is the cause of the error?

To find the cause, continue the sequence beyond $1.001^{1000}$ and hope that a pattern will emerge:

k    (1 + 10^{-k})^{10^k}
1    2.5937425
2    2.7048138
3    2.7169239
4    2.7181459
5    2.7182682
6    2.7182805
7    2.7182817

The values seem to approach e = 2.718281828 . . ., the base of the natural logarithms. Therefore, take the logarithm of the whole approximation.

$$\ln(1 + z)^n = n \ln(1 + z). \tag{5.30}$$

Pictorial reasoning showed that ln(1 + z) ≈ z when $z \ll 1$ (Section 4.3). Thus, $n \ln(1 + z) \approx nz$, making $(1 + z)^n \approx e^{nz}$. This improved approximation explains why the approximation $(1 + z)^n \approx 1 + nz$ failed with large nz: Only when $nz \ll 1$ is $e^{nz}$ approximately 1 + nz. Therefore, when $z \ll 1$ the two simplest approximations are

$$(1 + z)^n \approx \begin{cases} 1 + nz & (z \ll 1 \text{ and } nz \ll 1), \\ e^{nz} & (z \ll 1 \text{ and } nz \text{ unrestricted}). \end{cases} \tag{5.31}$$

The diagram shows, across the whole n–z plane, the simplest approximation in each region. The axes are logarithmic and n and z are assumed positive: The right half plane shows $z \gg 1$, and the upper half plane shows $n \gg 1$. On the lower right, the boundary curve is n ln z = 1. Explaining the boundaries and extending the approximations is an instructive exercise (Problem 5.28).

[Figure: the n–z plane on logarithmic axes, divided into regions labeled with the simplest approximation to (1 + z)^n: 1 + nz, e^{nz}, z^n, e^{n/z} z^n, and 1 + n ln z; the labeled boundaries include z = 1, n = 1, nz = 1, n/z = 1, and n ln z = 1.]

Problem 5.28 Explaining the approximation plane
In the right half plane, explain the n/z = 1 and n ln z = 1 boundaries. For the whole plane, relax the assumption of positive n and z as far as possible.

Problem 5.29 Binomial-theorem derivation
Try the following alternative derivation of $(1 + z)^n \approx e^{nz}$ (where $n \gg 1$). Expand $(1 + z)^n$ using the binomial theorem, simplify the products in the binomial coefficients by approximating n − k as n, and compare the resulting expansion to the Taylor series for $e^{nz}$.
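The two approximations to $(1 + z)^n$ can be compared directly for small z, once with nz small and once with nz = 1. This is a minimal sketch; the function name `approximations` and the chosen sample values are assumptions for the illustration.

```python
import math


def approximations(n, z):
    """Return (exact, linear, exponential) values for (1 + z)**n."""
    exact = (1 + z) ** n
    linear = 1 + n * z              # valid when z << 1 and nz << 1
    exponential = math.exp(n * z)   # valid when z << 1, even if nz is not small
    return exact, linear, exponential


for n, z in [(100, 0.001), (1000, 0.001)]:
    exact, linear, exponential = approximations(n, z)
    print(f"n={n}, z={z}: exact={exact:.5f}, 1+nz={linear:.5f}, e^(nz)={exponential:.5f}")
```

In the first case (nz = 0.1) both approximations are close to the exact value; in the second (nz = 1) only the exponential form stays close.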

5.4 Successive approximation: How deep is the well?

The next illustration of taking out the big part emphasizes successive approximation and is disguised as a physics problem.

You drop a stone down a well of unknown depth h and hear the splash 4 s later. Neglecting air resistance, find h to within 5%. Use $c_s = 340\ \text{m s}^{-1}$ as the speed of sound and $g = 10\ \text{m s}^{-2}$ as the strength of gravity.

Approximate and exact solutions give almost the same well depth, but offer significantly different understandings.

5.4.1 Exact depth

The depth is determined by the constraint that the 4 s wait splits into two times: the rock falling freely down the well and the sound traveling up the well. The free-fall time is $\sqrt{2h/g}$ (Problem 1.3), so the total time is

$$T = \underbrace{\sqrt{\frac{2h}{g}}}_{\text{rock}} + \underbrace{\frac{h}{c_s}}_{\text{sound}}. \tag{5.32}$$

To solve for h exactly, either isolate the square root on one side and square both sides to get a quadratic equation in h (Problem 5.30); or, for a less error-prone method, rewrite the constraint as a quadratic equation in a new variable $z = \sqrt{h}$.

Problem 5.30 Other quadratic
Solve for h by isolating the square root on one side and squaring both sides. What are the advantages and disadvantages of this method in comparison with the method of rewriting the constraint as a quadratic in $z = \sqrt{h}$?

As a quadratic equation in $z = \sqrt{h}$, the constraint is

$$\frac{1}{c_s} z^2 + \sqrt{\frac{2}{g}}\, z - T = 0. \tag{5.33}$$

Using the quadratic formula and choosing the positive root yields

$$z = \frac{-\sqrt{2/g} + \sqrt{2/g + 4T/c_s}}{2/c_s}. \tag{5.34}$$

Because $z^2 = h$,

$$h = \left(\frac{-\sqrt{2/g} + \sqrt{2/g + 4T/c_s}}{2/c_s}\right)^2. \tag{5.35}$$

Substituting $g = 10\ \text{m s}^{-2}$ and $c_s = 340\ \text{m s}^{-1}$ gives h ≈ 71.78 m.

Even if the depth is correct, the exact formula for it is a mess. Such high-entropy horrors arise frequently from the quadratic formula; its use often signals the triumph of symbol manipulation over thought. Exact answers, we will find, may be less useful than approximate answers.

5.4.2 Approximate depth

To find a low-entropy, approximate depth, identify the big part, the most important effect. Here, most of the total time is the rock's free fall: The rock's maximum speed, even if it fell for the entire 4 s, is only $gT = 40\ \text{m s}^{-1}$, which is far below $c_s$. Therefore, the most important effect should arise in the extreme case of infinite sound speed.

If $c_s = \infty$, how deep is the well?

In this zeroth approximation, the free-fall time $t_0$ is the full time T = 4 s, so the well depth $h_0$ becomes

$$h_0 = \frac{1}{2} g t_0^2 = 80\ \text{m}. \tag{5.36}$$

Is this approximate depth an overestimate or underestimate? How accurate is it?

This approximation neglects the sound-travel time, so it overestimates the free-fall time and therefore the depth. Compared to the true depth of roughly 71.78 m, it overestimates the depth by only 11%, which is reasonable accuracy for a quick method offering physical insight. Furthermore, this approximation suggests its own refinement.

How can this approximation be improved?

To improve it, use the approximate depth $h_0$ to approximate the sound-travel time.

$$t_\text{sound} \approx \frac{h_0}{c_s} \approx 0.24\ \text{s}. \tag{5.37}$$

The remaining time is the next approximation to the free-fall time.

$$t_1 = T - \frac{h_0}{c_s} \approx 3.76\ \text{s}. \tag{5.38}$$

In that time, the rock falls a distance $g t_1^2/2$, so the next approximation to the depth is

$$h_1 = \frac{1}{2} g t_1^2 \approx 70.87\ \text{m}. \tag{5.39}$$

Is this approximate depth an overestimate or underestimate? How accurate is it?

The calculation of $h_1$ used $h_0$ to estimate the sound-travel time. Because $h_0$ overestimates the depth, the procedure overestimates the sound-travel time and, by the same amount, underestimates the free-fall time. Thus $h_1$ underestimates the depth. Indeed, $h_1$ is slightly smaller than the true depth of roughly 71.78 m, but by only 1.3%.

The method of successive approximation has several advantages over solving the quadratic formula exactly. First, it helps us develop a physical understanding of the system; we realize, for example, that most of the T = 4 s is spent in free fall, so the depth is roughly $gT^2/2$. Second, it has a pictorial explanation (Problem 5.34). Third, it gives a sufficiently accurate answer quickly. If you want to know whether it is safe to jump into the well, why calculate the depth to three decimal places?

Finally, the method can handle small changes in the model. Maybe the speed of sound varies with depth, or air resistance becomes important (Problem 5.32). Then the brute-force, quadratic-formula method fails. The quadratic formula and the even messier cubic and the quartic formulas are rare closed-form solutions to complicated equations. Most equations have no closed-form solution. Therefore, a small change to a solvable model usually produces an intractable model, if we demand an exact answer. The method of successive approximation is a robust alternative that produces low-entropy, comprehensible solutions.

Problem 5.31 Parameter-value inaccuracies
What is $h_2$, the second approximation to the depth? Compare the error in $h_1$ and $h_2$ with the error made by using $g = 10\ \text{m s}^{-2}$.

Problem 5.32 Effect of air resistance
Roughly what fractional error in the depth is produced by neglecting air resistance (Section 2.4.2)? Compare this error to the error in the first approximation $h_1$ and in the second approximation $h_2$ (Problem 5.31).
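The successive-approximation procedure of Section 5.4.2 translates directly into a short loop, which can be set beside the quadratic-formula result. This is a sketch under the same assumptions as the text (no air resistance, constant sound speed); the function names `exact_depth` and `successive_depths` are hypothetical.

```python
import math


def exact_depth(total_time, g, sound_speed):
    """Depth from the quadratic formula in z = sqrt(h), positive root."""
    z = (-math.sqrt(2 / g) + math.sqrt(2 / g + 4 * total_time / sound_speed)) / (2 / sound_speed)
    return z ** 2


def successive_depths(total_time, g, sound_speed, steps):
    """Return the list [h0, h1, ...] of successive approximations to the depth."""
    depths = []
    fall_time = total_time                  # zeroth approximation: sound travels instantly
    for _ in range(steps):
        depth = 0.5 * g * fall_time ** 2    # free fall for the current fall-time estimate
        depths.append(depth)
        # Refine: subtract the sound-travel time implied by this depth.
        fall_time = total_time - depth / sound_speed
    return depths


h0, h1 = successive_depths(4.0, 10.0, 340.0, steps=2)
h_exact = exact_depth(4.0, 10.0, 340.0)
print(f"h0 = {h0:.2f} m, h1 = {h1:.2f} m, quadratic formula = {h_exact:.2f} m")
```

Raising `steps` continues the refinement, with the approximations alternating between overestimates and underestimates of the quadratic-formula depth.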

Problem 5.33 Dimensionless form of the well-depth analysis
Even the messiest results are cleaner and have lower entropy in dimensionless form. The four quantities h, g, T, and $c_s$ produce two independent dimensionless groups (Section 2.4.1). An intuitively reasonable pair are

$$\bar{h} \equiv \frac{h}{gT^2} \quad\text{and}\quad \bar{T} \equiv \frac{gT}{c_s}. \tag{5.40}$$

a. What is a physical interpretation of $\bar{T}$?
b. With two groups, the general dimensionless form is $\bar{h} = f(\bar{T})$. What is $\bar{h}$ in the easy case $\bar{T} \to 0$?
c. Rewrite the quadratic-formula solution

$$h = \left(\frac{-\sqrt{2/g} + \sqrt{2/g + 4T/c_s}}{2/c_s}\right)^2 \tag{5.41}$$

as $\bar{h} = f(\bar{T})$. Then check that $f(\bar{T})$ behaves correctly in the easy case $\bar{T} \to 0$.

Problem 5.34 Spacetime diagram of the well depth
How does the spacetime diagram [44] illustrate the successive approximation of the well depth? On the diagram, mark $h_0$ (the zeroth approximation to the depth), $h_1$, and the exact depth h. Mark $t_0$, the zeroth approximation to the free-fall time. Why are portions of the rock and sound-wavefront curves dotted? How would you redraw the diagram if the speed of sound doubled? If g doubled?

[Figure: spacetime diagram of depth versus t, showing the rock curve and the sound-wavefront curve, with t = 4 s marked.]

5.5 Daunting trigonometric integral

The final example of taking out the big part is to estimate a daunting trigonometric integral that I learned as an undergraduate. My classmates and I spent many late nights in the physics library solving homework problems; the graduate students, doing the same for their courses, would regale us with their favorite mathematics and physics problems.

The integral appeared on the mathematical-preliminaries exam to enter the Landau Institute for Theoretical Physics in the former USSR. The problem is to evaluate

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{100}\, dt \tag{5.42}$$

(5. this issue contributes only a tiny error (Problem 5.43) which becomes a trigonometric monster upon expanding the 50th power. replacing (cos t)100 by a Gaussian is a bit suspicious. (5. so the integrand is roughly (cos t)100 ≈ 1− t2 2 100 .35). . so (1 + z)n may be approximated using the results of Section 5.44) It has the familiar form (1 + z)n . 5 show a Gaussian bell shape taking form as n increases. then (cos t)100 ≈ 1− t2 2 100 ≈ e−50t .47) . computergenerated plots of (cos t)n for n = 1 . In the original integral.3. A clue pointing to a simpler method is that 5% accuracy is suﬃcient—so.20).45) Because the exponent n is large. ﬁnd the big part! The integrand is largest when t is near zero. cos t ≈ 1 − t2 /2 (Problem 5. Fortunately.46) A cosine raised to a high power becomes a Gaussian! As a check on this surprising conclusion. Most trigonometric identities do not help. z = −t2 /2 is tiny.4: (1 + z)n ≈ 1 + nz enz (z (z 1 and nz 1) 1 and nz unrestricted). 2 (5. cos t Even with this graphical evidence.5. The usually helpful identity (cos t)2 = (cos 2t − 1)/2 produces only (cos t)100 = cos 2t − 1 2 50 . nz can be large even when t and z are small. with fractional change z = −t2 /2 and exponent n = 100. 2 (5. the safest approximation is (1 + z)n ≈ enz . (5. t ranges from −π/2 to π/2. . Therefore. and these endpoints are far outside the region where cos t ≈ 1 − t2 /2 is an accurate approximation. There. Ignoring this error turns the original integral into a Gaussian integral with ﬁnite limits: π/2 −π/2 (cos t)100 dt ≈ π/2 −π/2 e−50t dt.5 Daunting trigonometric integral 95 to within 5% in less than 5 min without using a calculator or computer! That (cos t)100 looks frightening. When t is small.

Unfortunately, with finite limits the integral has no closed form. But extending the limits to infinity produces a closed form while contributing almost no error (Problem 5.36). The approximation chain is now

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{100}\,dt \approx \int_{-\pi/2}^{\pi/2} e^{-50t^2}\,dt \approx \int_{-\infty}^{\infty} e^{-50t^2}\,dt. \tag{5.48}$$

Problem 5.35 Using the original limits
The approximation $\cos t \approx 1 - t^2/2$ requires that $t$ be small. Why doesn't using the approximation outside the small-$t$ range contribute a significant error?

Problem 5.36 Extending the limits
Why doesn't extending the integration limits from $\pm\pi/2$ to $\pm\infty$ contribute a significant error?

The last integral is an old friend (Section 2.1): $\int_{-\infty}^{\infty} e^{-\alpha t^2}\,dt = \sqrt{\pi/\alpha}$. With $\alpha = 50$, the integral becomes $\sqrt{\pi/50}$. Conveniently, 50 is roughly $16\pi$, so the square root, and our 5% estimate, is roughly 0.25.

For comparison, the exact integral is (Problem 5.41)

$$\int_{-\pi/2}^{\pi/2} (\cos t)^n\,dt = 2^{-n} \binom{n}{n/2} \pi. \tag{5.49}$$

When $n = 100$, the binomial coefficient and power of two produce

$$\frac{12611418068195524166851562157}{158456325028528675187087900672}\,\pi \approx 0.25003696348037. \tag{5.50}$$

Our 5-minute, within-5% estimate of 0.25 is accurate to almost 0.01%!

Problem 5.37 Sketching the approximations
Plot $(\cos t)^{100}$ and its two approximations $e^{-50t^2}$ and $1 - 50t^2$.

Problem 5.38 Simplest approximation
Use the linear fractional-change approximation $(1 - t^2/2)^{100} \approx 1 - 50t^2$ to approximate the integrand; then integrate it over the range where $1 - 50t^2$ is positive. How close is the result of this 1-minute method to the exact value 0.2500...?

Problem 5.39 Huge exponent
Estimate

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{10000}\,dt. \tag{5.51}$$
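The exact value quoted in (5.50) and the quality of the Gaussian estimate can be reproduced with exact rational arithmetic. This is a sketch under the assumption that the closed form (5.49) is applied only to even n; the helper names are hypothetical.

```python
import math
from fractions import Fraction

def exact_cos_power_integral(n):
    """Integral of (cos t)^n over [-pi/2, pi/2] for even n, using Eq. (5.49).

    Returns the rational factor multiplying pi and the floating-point value.
    """
    if n % 2:
        raise ValueError("this sketch uses the closed form only for even n")
    rational = Fraction(math.comb(n, n // 2), 2 ** n)
    return rational, float(rational) * math.pi

def gaussian_estimate(n):
    """sqrt(2*pi/n): the Gaussian integral with alpha = n/2 and infinite limits."""
    return math.sqrt(2 * math.pi / n)

rational, exact = exact_cos_power_integral(100)
print("rational factor of pi:", rational)
print("exact value:          ", exact)
print("sqrt(pi/50):          ", gaussian_estimate(100))
print("rounded estimate:     ", 0.25)
```

The rational factor printed by the sketch is the fraction in (5.50) in lowest terms, so the comparison in the text can be checked without trusting floating-point binomial coefficients.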

Problem 5.40 How low can you go?
Investigate the accuracy of the approximation

$$\int_{-\pi/2}^{\pi/2} (\cos t)^n\,dt \approx \sqrt{\frac{2\pi}{n}} \tag{5.52}$$

for small $n$, including $n = 1$.

Problem 5.41 Closed form
To evaluate the integral

$$\int_{-\pi/2}^{\pi/2} (\cos t)^{100}\,dt \tag{5.53}$$

in closed form, use the following steps:
a. Replace $\cos t$ with $(e^{it} + e^{-it})/2$.
b. Use the binomial theorem to expand the 100th power.
c. Pair each term like $e^{ikt}$ with a counterpart $e^{-ikt}$; then integrate their sum from $-\pi/2$ to $\pi/2$. What value or values of $k$ produce a sum whose integral is nonzero?

5.6 Summary and further problems

Upon meeting a complicated problem, divide it into a big part (the most important effect) and a correction. Analyze the big part first, and worry about the correction afterward. This successive-approximation approach, a species of divide-and-conquer reasoning, gives results automatically in a low-entropy form. Low-entropy expressions admit few plausible alternatives; they are therefore memorable and comprehensible. In short, approximate results can be more useful than exact results.

Problem 5.42 Large logarithm
What is the big part in $\ln(1 + e^2)$? Give a short calculation to estimate $\ln(1 + e^2)$ to within 2%.

Problem 5.43 Bacterial mutations
In an experiment described in a Caltech biology seminar in the 1990s, researchers repeatedly irradiated a population of bacteria in order to generate mutations. In each round of radiation, 5% of the bacteria got mutated. After 140 rounds, roughly what fraction of bacteria were left unmutated? (The seminar speaker gave the audience 3 s to make a guess, hardly enough time to use or even find a calculator.)

Problem 5.44 Quadratic equations revisited
The following quadratic equation, inspired by [29], describes a very strongly damped oscillating system.

$$s^2 + 10^9 s + 1 = 0. \tag{5.54}$$

a. Use the quadratic formula and a standard calculator to find both roots of the quadratic. What goes wrong and why?
b. Estimate the roots by taking out the big part. (Hint: Approximate and solve the equation in appropriate extreme cases.) Then improve the estimates using successive approximation.
c. What are the advantages and disadvantages of the quadratic-formula analysis versus successive approximation?

Problem 5.45 Normal approximation to the binomial distribution
The binomial expansion

$$\left( \frac{1}{2} + \frac{1}{2} \right)^{2n} \tag{5.55}$$

contains terms of the form

$$f(k) \equiv \binom{2n}{n-k}\, 2^{-2n}, \tag{5.56}$$

where $k = -n \ldots n$. Each term $f(k)$ is the probability of tossing $n - k$ heads (and $n + k$ tails) in $2n$ coin flips; $f(k)$ is the so-called binomial distribution with parameters $p = q = 1/2$. Approximate this distribution by answering the following questions:
a. Is $f(k)$ an even or an odd function of $k$? For what $k$ does $f(k)$ have its maximum?
b. Approximate $f(k)$ when $k \ll n$ and sketch $f(k)$. Therefore, derive and explain the normal approximation to the binomial distribution.
c. Use the normal approximation to show that the variance of this binomial distribution is $n/2$.

Problem 5.46 Beta function
The following integral appears often in Bayesian inference:

$$f(a, b) = \int_0^1 x^a (1 - x)^b\,dx, \tag{5.57}$$

where $f(a - 1, b - 1)$ is the Euler beta function. Use street-fighting methods to conjecture functional forms for $f(a, 0)$, $f(a, a)$, and, finally, $f(a, b)$. Check your conjectures with a high-quality table of integrals or a computer-algebra system such as Maxima.

6 Analogy

6.1 Spatial trigonometry: The bond angle in methane
6.2 Topology: How many regions?
6.3 Operators: Euler–MacLaurin summation
6.4 Tangent roots: A daunting transcendental sum
6.5 Bon voyage

When the going gets tough, the tough lower their standards. This idea, the theme of the whole book, underlies the final street-fighting tool of reasoning by analogy. Its advice is simple: Faced with a difficult problem, construct and solve a similar but simpler problem, an analogous problem.

Practice develops fluency. The tool is introduced in spatial trigonometry (Section 6.1), sharpened on solid geometry and topology (Section 6.2), then applied to discrete mathematics (Section 6.3) and, in the farewell example, to an infinite transcendental sum (Section 6.4).

6.1 Spatial trigonometry: The bond angle in methane

The first analogy comes from spatial trigonometry. In methane (chemical formula CH$_4$), a carbon atom sits at the center of a regular tetrahedron, and one hydrogen atom sits at each vertex. What is the angle $\theta$ between two carbon–hydrogen bonds?

Angles in three dimensions are hard to visualize. Try, for example, to imagine and calculate the angle between two faces of a regular tetrahedron. Because two-dimensional angles are easy to visualize, let's construct and analyze an analogous planar molecule. Knowing its bond angle might help us guess methane's bond angle.


Should the analogous planar molecule have four or three hydrogens?

Four hydrogens produce four bonds which, when spaced regularly in a plane, produce two different bond angles. In contrast, methane contains only one bond angle. Therefore, using four hydrogens alters a crucial feature of the original problem. The likely solution is to construct the analogous planar molecule using only three hydrogens.

Three hydrogens arranged regularly in a plane create only one bond angle: $\theta = 120^\circ$. Perhaps this angle is the bond angle in methane! One data point, however, is a thin reed on which to hang a prediction for higher dimensions. The single data point for two dimensions ($d = 2$) is consistent with numerous conjectures: for example, that in $d$ dimensions the bond angle is $120^\circ$ or $(60d)^\circ$ or much else.

Selecting a reasonable conjecture requires gathering further data. Easily available data comes from an even simpler yet analogous problem: the one-dimensional, linear molecule CH$_2$. Its two hydrogens sit opposite one another, so the two C–H bonds form an angle of $\theta = 180^\circ$.


Based on the accumulated data, what are reasonable conjectures for the three-dimensional angle $\theta_3$?

    d    θ_d
    1    180°
    2    120°
    3    ?

The one-dimensional molecule eliminates the conjecture that $\theta_d = (60d)^\circ$. It also suggests new conjectures: for example, that $\theta_d = (240 - 60d)^\circ$ or $\theta_d = 360^\circ/(d + 1)$. Testing these conjectures is an ideal task for the method of easy cases.

The easy-cases test of higher dimensions (high $d$) refutes the conjecture that $\theta_d = (240 - 60d)^\circ$. For high $d$, it predicts implausible bond angles: namely, $\theta = 0$ for $d = 4$ and $\theta < 0$ for $d > 4$.

Fortunately, the second suggestion, $\theta_d = 360^\circ/(d + 1)$, passes the same easy-cases test. Let's continue to test it by evaluating its prediction for methane: namely, $\theta_3 = 90^\circ$. Imagine then a big brother of methane: a CH$_6$ molecule with carbon at the center of a cube and six hydrogens at the face centers. Its small bond angle is $90^\circ$. (The other bond angle is $180^\circ$.) Now remove two hydrogens to turn CH$_6$ into CH$_4$, evenly spreading out the remaining four hydrogens. Reducing the crowding raises the small bond angle above $90^\circ$ and refutes the prediction that $\theta_3 = 90^\circ$.
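The easy-cases bookkeeping for the two conjectures can be tabulated in a few lines. This is only an illustrative sketch; the function names are made up for the example, and the range of dimensions is an arbitrary choice.

```python
def linear_conjecture(d):
    """Conjectured bond angle (240 - 60 d) in degrees."""
    return 240 - 60 * d

def reciprocal_conjecture(d):
    """Conjectured bond angle 360 / (d + 1) in degrees."""
    return 360 / (d + 1)

# Both conjectures are built to match the data at d = 1 and d = 2;
# the interesting rows are the higher dimensions.
print(" d   240-60d   360/(d+1)")
for d in range(1, 7):
    print(f"{d:2d}   {linear_conjecture(d):7d}   {reciprocal_conjecture(d):9.2f}")
```

The table makes the failure of the linear conjecture visible at a glance (zero or negative angles from d = 4 onward), whereas the reciprocal conjecture stays positive and needs the separate CH$_6$ argument to be refuted.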


Problem 6.1 How many hydrogens?
How many hydrogens are needed in the analogous four- and five-dimensional bond-angle problems? Use this information to show that $\theta_4 > 90^\circ$. Is $\theta_d > 90^\circ$ for all $d$?

The data so far have refuted the simplest rational-function conjectures $(240 - 60d)^\circ$ and $360^\circ/(d + 1)$. Although other rational-function conjectures might survive, with only two data points the possibilities are too vast. Worse, $\theta_d$ might not even be a rational function of $d$.

Progress requires a new idea: The bond angle might not be the simplest variable to study. An analogous difficulty arises when conjecturing the next term in the series 3, 5, 11, 29, . . .

What is the next term in the series?

At first glance, the numbers seem almost random. Yet subtracting 2 from each term produces 1, 3, 9, 27, . . . Thus, in the original series the next term is likely to be 81 + 2 = 83. Similarly, a simple transformation of the $\theta_d$ data might help us conjecture a pattern for $\theta_d$.

What transformation of the $\theta_d$ data produces simple patterns?

The desired transformation should produce simple patterns and have aesthetic or logical justification. One justification is the structure of an honest calculation of the bond angle, which can be computed as a dot product of two C–H vectors (Problem 6.3). Because dot products involve cosines, a worthwhile transformation of $\theta_d$ is $\cos\theta_d$.

    d    θ_d     cos θ_d
    1    180°    −1
    2    120°    −1/2
    3    ?       ?

This transformation simplifies the data: The $\cos\theta_d$ series begins simply $-1, -1/2, \ldots$ Two plausible continuations are $-1/4$ or $-1/3$; they correspond, respectively, to the general term $-1/2^{d-1}$ or $-1/d$.

Which continuation and conjecture is the more plausible?

Both conjectures predict $\cos\theta_d < 0$ and therefore $\theta_d > 90^\circ$ (for all $d$). This shared prediction is encouraging (Problem 6.1); however, being shared means that it does not distinguish between the conjectures.

Does either conjecture match the molecular geometry?

An important geometric feature, apart from the bond angle, is the position of the carbon. In one dimension, it lies halfway between the two hydrogens, so it splits the H–H line segment into two pieces having a 1 : 1 length ratio. In two dimensions, the carbon lies on the altitude that connects one hydrogen to the midpoint of the other two hydrogens. The carbon splits the altitude into two pieces having a 1 : 2 length ratio.

How does the carbon split the analogous altitude of methane?


In methane, the analogous altitude runs from the top vertex to the center of the base. The carbon lies at the mean position and therefore at the mean height of the four hydrogens. Because the three base hydrogens have zero height, the mean height of the four hydrogens is $h/4$, where $h$ is the height of the top hydrogen. Thus, in three dimensions, the carbon splits the altitude into two parts having a length ratio of $h/4 : 3h/4$ or 1 : 3. In $d$ dimensions, therefore, the carbon probably splits the altitude into two parts having a length ratio of 1 : $d$ (Problem 6.2).

Because 1 : $d$ arises naturally in the geometry, $\cos\theta_d$ is more likely to contain $1/d$ rather than $1/2^{d-1}$. Thus, the more likely of the two $\cos\theta_d$ conjectures is that

$$\cos\theta_d = -\frac{1}{d}. \tag{6.1}$$


For methane, where d = 3, the predicted bond angle is arccos(−1/3) or approximately 109.47◦ . This prediction using reasoning by analogy agrees with experiment and with an honest calculation using analytic geometry (Problem 6.3).
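The conjecture $\cos\theta_d = -1/d$ can also be checked by brute force in any dimension. The sketch below assumes one convenient (and not unique) set of coordinates: the $d + 1$ hydrogens sit at the standard basis vectors of $(d+1)$-dimensional space, which are mutually equidistant, and the carbon sits at their mean position. The function name is hypothetical.

```python
import math

def simplex_bond_angle(d):
    """Return (cos(theta), theta in degrees) for the center of a regular d-simplex.

    Assumed coordinates: vertices at the basis vectors e_0 .. e_d of R^(d+1),
    center at their mean position.
    """
    m = d + 1
    center = [1 / m] * m

    def bond(i):
        # Vector from the center to vertex i.
        return [(1.0 if j == i else 0.0) - center[j] for j in range(m)]

    u, v = bond(0), bond(1)
    dot = sum(a * b for a, b in zip(u, v))
    norm_squared = sum(a * a for a in u)      # all bonds have the same length
    cos_theta = max(-1.0, min(1.0, dot / norm_squared))
    return cos_theta, math.degrees(math.acos(cos_theta))

for d in range(1, 6):
    cos_theta, theta = simplex_bond_angle(d)
    print(f"d = {d}:  cos(theta) = {cos_theta:+.4f}   theta = {theta:.2f} degrees")
```

Embedding the molecule one dimension higher keeps the coordinates trivial; the price is that the picture is less direct than the tetrahedron-in-space coordinates that Problem 6.3 asks for.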

Problem 6.2 Carbon's position in higher dimensions
Justify the conjecture that the carbon splits the altitude into two pieces having a length ratio 1 : $d$.

Problem 6.3 Analytic-geometry solution
In order to check the solution using analogy, use analytic geometry as follows to find the bond angle. First, assign coordinates $(x_n, y_n, z_n)$ to the $n$th hydrogen, where $n = 1 \ldots 4$, and solve for those coordinates. (Use symmetry to make the coordinates as simple as you can.) Then choose two C–H vectors and compute the angle that they subtend.

Problem 6.4 Extreme case of high dimensionality
Draw a picture to explain the small-angle approximation $\arccos x \approx \pi/2 - x$. What is the approximate bond angle in high dimensions (large $d$)? Can you find an intuitive explanation for the approximate bond angle?

6.2 Topology: How many regions?

The bond angle in methane (Section 6.1) can be calculated directly with analytic geometry (Problem 6.3), so reasoning by analogy does not show its full power. Therefore, try the following problem.

Into how many regions do five planes divide space?

This formulation permits degenerate arrangements such as five parallel planes, four planes meeting at a point, or three planes meeting at a line. To eliminate these and other degeneracies, let's place and orient the planes randomly, thereby maximizing the number of regions. The problem is then to find the maximum number of regions produced by five planes.

Five planes are hard to imagine, but the method of easy cases (using fewer planes) might produce a pattern that generalizes to five planes. The easiest case is zero planes: Space remains whole so $R(0) = 1$ (where $R(n)$ denotes the number of regions produced by $n$ planes). The first plane divides space into two halves, giving $R(1) = 2$. To add the second plane, imagine slicing an orange twice to produce four wedges: $R(2) = 4$.

What pattern(s) appear in the data?

A reasonable conjecture is that $R(n) = 2^n$. To test it, try the case $n = 3$ by slicing the orange a third time and cutting each of the four pieces into two smaller pieces; thus, $R(3)$ is indeed 8. Perhaps the pattern continues with $R(4) = 16$ and $R(5) = 32$. In the following table for $R(n)$, these two extrapolations are set in parentheses to distinguish them from the verified entries.

    n    0    1    2    3    4     5
    R    1    2    4    8    (16)  (32)

How can the $R(n) = 2^n$ conjecture be tested further?

A direct test by counting regions is difficult because the regions are hard to visualize in three dimensions. An analogous two-dimensional problem would be easier to solve, and its solution may help test the three-dimensional conjecture. A two-dimensional space is partitioned by lines, so the analogous question is the following:

What is the maximum number of regions into which $n$ lines divide the plane?

The method of easy cases might suggest a pattern. If the pattern is $2^n$, then the $R(n) = 2^n$ conjecture is likely to apply in three dimensions.

What happens in a few easy cases?

Zero lines leave the plane whole, giving $R(0) = 1$. The next three cases are as follows (although see Problem 6.5):

[Figure: one, two, and three lines dividing the plane, labeled R(1) = 2, R(2) = 4, R(3) = 7.]

Problem 6.5 Three lines again
The $R(3) = 7$ illustration showed three lines producing seven regions. Here is another example with three lines, also in a random arrangement, but it seems to produce only six regions. Where, if anywhere, is the seventh region? Or is $R(3) = 6$?

Problem 6.6 Convexity
Must all the regions created by the lines be convex? (A region is convex if and only if a line segment connecting any two points inside the region lies entirely inside the region.) What about the three-dimensional regions created by placing planes in space?

Until $R(3)$ turned out to be 7, the conjecture $R(n) = 2^n$ looked sound. However, before discarding such a simple conjecture, draw a fourth line and carefully count the regions. Four lines make only 11 regions rather than the predicted 16, so the $2^n$ conjecture is dead.

A new conjecture might arise from seeing the two-dimensional data $R_2(n)$ alongside the three-dimensional data $R_3(n)$.

    n     0    1    2    3    4
    R_2   1    2    4    7    11
    R_3   1    2    4    8

In this table, several entries combine to make nearby entries. For example, $R_2(1)$ and $R_3(1)$, the two entries in the $n = 1$ column, sum to $R_2(2)$ or $R_3(2)$. These two entries in turn sum to the $R_3(3)$ entry. But the table has many small numbers with many ways to combine them; discarding the coincidences requires gathering further data, and the simplest data source is the analogous one-dimensional problem.

What is the maximum number of segments into which $n$ points divide a line?

A tempting answer is that $n$ points make $n$ segments. However, an easy case (that one point produces two segments) reduces the temptation. Rather, $n$ points make $n + 1$ segments. That result generates the $R_1$ row in the following table.

    n     0    1    2    3    4    5    n
    R_1   1    2    3    4    5    6    n + 1
    R_2   1    2    4    7    11
    R_3   1    2    4    8

What patterns are in these data?

The $2^n$ conjecture survives partially. In the $R_1$ row, it fails starting at $n = 2$. In the $R_2$ row, it fails starting at $n = 3$. Thus in the $R_3$ row, it probably fails starting at $n = 4$, making the conjectures $R_3(4) = 16$ and $R_3(5) = 32$ improbable. My personal estimate is that, before seeing these failures, the probability of the $R_3(4) = 16$ conjecture was 0.5; but now it falls to at most 0.01. (For more on estimating and updating the probabilities of conjectures, see the important works on plausible reasoning by Corfield [11], Jaynes [21], and Polya [36].)

In better news, the apparent coincidences contain a robust pattern:

    n     0    1    2    3    4    5    n
    R_1   1    2    3    4    5    6    n + 1
    R_2   1    2    4    7    11
    R_3   1    2    4    8
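One way to read the robust pattern is that each entry is the sum of its left neighbor and the entry directly above that neighbor, that is, $R_d(n) = R_d(n-1) + R_{d-1}(n-1)$. That reading is an assumption here: it matches every verified entry in the table but is not proved. Under it, the table can be extended mechanically from the $R_1$ row; the function name below is hypothetical.

```python
def regions_table(max_dim, max_n):
    """rows[d][n] = R_d(n), assuming R_d(n) = R_d(n-1) + R_{d-1}(n-1).

    The R_1 row is taken from the text: n points cut a line into n + 1 segments.
    """
    rows = {1: [n + 1 for n in range(max_n + 1)]}
    for d in range(2, max_dim + 1):
        row = [1]                              # zero cuts leave the space whole
        for n in range(1, max_n + 1):
            row.append(row[n - 1] + rows[d - 1][n - 1])
        rows[d] = row
    return rows

table = regions_table(max_dim=3, max_n=5)
print("n   ", list(range(6)))
for d in (1, 2, 3):
    print(f"R_{d} ", table[d])
```

The first four entries of the $R_2$ and $R_3$ rows reproduce the verified data; everything beyond them is a prediction of the assumed rule, not a count of actual regions.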

If the pattern continues, into how many regions can five planes divide space?

According to the pattern,

$$R_3(4) = \underbrace{R_2(3)}_{7} + \underbrace{R_3(3)}_{8} = 15 \tag{6.2}$$

and then

$$R_3(5) = \underbrace{R_2(4)}_{11} + \underbrace{R_3(4)}_{15} = 26. \tag{6.3}$$

Thus, five planes can divide space into a maximum of 26 regions.

This number is hard to deduce by drawing five planes and counting the regions. Furthermore, that brute-force approach would give the value of only $R_3(5)$, whereas easy cases and analogy give a method to compute any entry in the table. They thereby provide enough data to conjecture expressions for $R_2(n)$ (Problem 6.9), for $R_3(n)$ (Problem 6.10), and for the general entry $R_d(n)$ (Problem 6.12).

Problem 6.7 Checking the pattern in two dimensions
The conjectured pattern predicts $R_2(5) = 16$: that five lines can divide the plane into 16 regions. Check the conjecture by drawing five lines and counting the regions.

Problem 6.8 Free data from zero dimensions
Because the one-dimensional problem gave useful data, try the zero-dimensional problem. Extend the pattern for the $R_3$, $R_2$, and $R_1$ rows upward to construct an $R_0$ row. It gives the number of zero-dimensional regions (points) produced by partitioning a point with $n$ objects (of dimension $-1$). What is $R_0$ if the row is to follow the observed pattern? Is that result consistent with the geometric meaning of trying to subdivide a point?

Problem 6.9 General result in two dimensions
The $R_0$ data fits $R_0(n) = 1$ (Problem 6.8), which is a zeroth-degree polynomial. The $R_1$ data fits $R_1(n) = n + 1$, which is a first-degree polynomial. Therefore, the $R_2$ data probably fits a quadratic. Test this conjecture by fitting the data for $n = 0 \ldots 2$ to the general quadratic $An^2 + Bn + C$, repeatedly taking out the big part (Chapter 5) as follows.
a. Guess a reasonable value for the quadratic coefficient $A$. Then take out (subtract) the big part $An^2$ and tabulate the leftover, $R_2(n) - An^2$, for $n = 0 \ldots 2$.

If the leftover is not linear in $n$, then a quadratic term remains or too much was removed. In either case, adjust $A$.
b. Once the quadratic coefficient $A$ is correct, use an analogous procedure to find the linear coefficient $B$.
c. Similarly solve for the constant coefficient $C$.
d. Check your quadratic fit against new data ($R_2(n)$ for $n \ge 3$).

Problem 6.10 General result in three dimensions
A reasonable conjecture is that the $R_3$ row matches a cubic (Problem 6.9). Use taking out the big part to fit a cubic to the $n = 0 \ldots 3$ data. Does it produce the conjectured values $R_3(4) = 15$ and $R_3(5) = 26$?

Problem 6.11 Geometric explanation
Find a geometric explanation for the observed pattern. Hint: Explain first why the pattern generates the $R_2$ row from the $R_1$ row; then generalize the reason to explain the $R_3$ row.

Problem 6.12 General solution in arbitrary dimension
The pattern connecting neighboring entries of the $R_d(n)$ table is the pattern that generates Pascal's triangle [17]. Because Pascal's triangle produces binomial coefficients, the general expression $R_d(n)$ should contain binomial coefficients. Therefore, use binomial coefficients to express $R_0(n)$ (Problem 6.8), $R_1(n)$, and $R_2(n)$ (Problem 6.9). Then conjecture a binomial-coefficient form for $R_3(n)$ and $R_d(n)$, checking the result against Problem 6.10.

Problem 6.13 Power-of-2 conjecture
Our first conjecture for the number of regions was $R_d(n) = 2^n$. In three dimensions, it worked until $n = 4$. In $d$ dimensions, show that $R_d(n) = 2^n$ for $n \le d$ (perhaps using the results of Problem 6.12).

6.3 Operators: Euler–MacLaurin summation

The next analogy studies unusual functions. Most functions turn numbers into other numbers, but special kinds of functions (operators) turn functions into other functions. A familiar example is the derivative operator $D$. It turns the sine function into the cosine function, or the hyperbolic sine function into the hyperbolic cosine function. In operator notation, $D(\sin) = \cos$ and $D(\sinh) = \cosh$; omitting the parentheses gives the less cluttered expression $D \sin = \cos$ and $D \sinh = \cosh$. To understand and learn how to use operators, a fruitful tool is reasoning by analogy: Operators behave much like ordinary functions or even like numbers.

6.3.1 Left shift

Like a number, the derivative operator $D$ can be squared to make $D^2$ (the second-derivative operator) or to make any integer power of $D$. Similarly, the derivative operator can be fed to a polynomial. In that usage, an ordinary polynomial such as $P(x) = x^2 + x/10 + 1$ produces the operator polynomial $P(D) = D^2 + D/10 + 1$ (the differential operator for a lightly damped spring–mass system).

How far does the analogy to numbers extend? For example, do $\cosh D$ or $\sin D$ have a meaning? Because these functions can be written using the exponential function, let's investigate the operator exponential $e^D$.

What does $e^D$ mean?

The direct interpretation of $e^D$ is that it turns a function $f$ into $e^{Df}$.

$$f \xrightarrow{\;D\;} Df \xrightarrow{\;\exp\;} e^{Df}$$

However, this interpretation is needlessly nonlinear. It turns $2f$ into $e^{2Df}$, which is the square of $e^{Df}$, whereas a linear operator that produces $e^{Df}$ from $f$ would produce $2e^{Df}$ from $2f$. To get a linear interpretation, use a Taylor series, as if $D$ were a number, to build $e^D$ out of linear operators.

$$e^D = 1 + D + \frac{1}{2} D^2 + \frac{1}{6} D^3 + \cdots. \tag{6.4}$$

What does this $e^D$ do to simple functions?

The simplest nonzero function is the constant function $f = 1$. Here is that function being fed to $e^D$:

$$\underbrace{(1 + D + \cdots)}_{e^D} \underbrace{1}_{f} = 1. \tag{6.5}$$

The next simplest function $x$ turns into $x + 1$.

$$\left( 1 + D + \frac{D^2}{2} + \cdots \right) x = x + 1. \tag{6.6}$$

More interestingly, $x^2$ turns into $(x + 1)^2$.

$$\left( 1 + D + \frac{D^2}{2} + \frac{D^3}{6} + \cdots \right) x^2 = x^2 + 2x + 1 = (x + 1)^2. \tag{6.7}$$
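The term-by-term action of the series (6.4) can be mimicked on polynomials stored as coefficient lists. The sketch below is illustrative only; the representation (index i holds the coefficient of x^i) and the function names are assumptions made for the example. For a polynomial the series terminates, because high enough derivatives vanish.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Differentiate a polynomial; coeffs[i] multiplies x**i."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def exp_D(coeffs):
    """Apply 1 + D + D^2/2! + D^3/3! + ... to a polynomial."""
    result = [Fraction(0)] * len(coeffs)
    term = [Fraction(c) for c in coeffs]       # D^k applied to the polynomial
    k = 0
    while term:                                # stops once the derivatives vanish
        for i, c in enumerate(term):
            result[i] += c / factorial(k)
        term = derivative(term)
        k += 1
    return result

print(exp_D([1]))          # constant function 1
print(exp_D([0, 1]))       # x
print(exp_D([0, 0, 1]))    # x^2
```

Reading the printed lists as coefficients of 1, x, and x^2 recovers the right sides of (6.5) through (6.7).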

Problem 6.14 Continue the pattern
What is $e^D x^3$ and, in general, $e^D x^n$?

What does $e^D$ do in general?

The preceding examples follow the pattern $e^D x^n = (x + 1)^n$. Because most functions of $x$ can be expanded in powers of $x$, and $e^D$ turns each $x^n$ term into $(x + 1)^n$, the conclusion is that $e^D$ turns $f(x)$ into $f(x + 1)$. Amazingly, $e^D$ is simply $L$, the left-shift operator.

Problem 6.15 Right or left shift
Draw a graph to show that $f(x) \to f(x + 1)$ is a left rather than a right shift. Apply $e^{-D}$ to a few simple functions to characterize its behavior.

Problem 6.16 Operating on a harder function
Apply the Taylor expansion for $e^D$ to $\sin x$ to show that $e^D \sin x = \sin(x + 1)$.

Problem 6.17 General shift operator
If $x$ has dimensions, then the derivative operator $D = d/dx$ is not dimensionless, and $e^D$ is an illegal expression. To make the general expression $e^{aD}$ legal, what must the dimensions of $a$ be? What does $e^{aD}$ do?

6.3.2 Summation

Just as the derivative operator can represent the left-shift operator (as $L = e^D$), the left-shift operator can represent the operation of summation. This operator representation will lead to a powerful method for approximating sums with no closed form.

Summation is analogous to the more familiar operation of integration. Integration occurs in definite and indefinite flavors: Definite integration is equivalent to indefinite integration followed by evaluation at the limits of integration. As an example, here is the definite integration of $f(x) = 2x$.

$$2x \xrightarrow{\;\text{integration}\;} x^2 + C \xrightarrow{\;\text{limits } a,\ b\;} b^2 - a^2$$

In general, the connection between an input function $g$ and the result of indefinite integration is $DG = g$, where $D$ is the derivative operator and $G = \int g$ is the result of indefinite integration. Thus $D$ and $\int$ are inverses

of one another ($D\int = 1$ or $D = 1/\int$), a connection represented by the loop in the diagram. ($\int D \ne 1$ because of a possible integration constant.)

$$g \xrightarrow{\;\int\;} G \xrightarrow{\;\text{limits } a,\ b\;} G(b) - G(a), \qquad G \xrightarrow{\;D\;} g$$

What is the analogous picture for summation?

Analogously to integration, define definite summation as indefinite summation and then evaluation at the limits. But apply the analogy with care to avoid an off-by-one or fencepost error (Problem 2.24). The sum $\sum_2^4 f(k)$ includes three rectangles, $f(2)$, $f(3)$, and $f(4)$, whereas the definite integral $\int_2^4 f(k)\,dk$ does not include any of the $f(4)$ rectangle. Rather than rectifying the discrepancy by redefining the familiar operation of integration, interpret indefinite summation to exclude the last rectangle. Then indefinite summation followed by evaluating at the limits $a$ and $b$ produces a sum whose index ranges from $a$ to $b - 1$.

[Figure: rectangles of height f(2), f(3), and f(4) under the curve f(k), with k marked from 2 to 5.]

As an example, take $f(k) = k$. Then the indefinite sum $\sum f$ is the function $F$ defined by $F(k) = k(k - 1)/2 + C$ (where $C$ is the constant of summation). Evaluating $F$ between 0 and $n$ gives $n(n - 1)/2$, which is $\sum_0^{n-1} k$. In the following diagram, these steps are the forward path.

$$f \xrightarrow{\;\Sigma\;} F \xrightarrow{\;\text{limits } a,\ b\;} F(b) - F(a) = \sum_{k=a}^{b-1} f(k), \qquad F \xrightarrow{\;\Delta\;} f$$

In the reverse path, the new $\Delta$ operator inverts $\Sigma$ just as differentiation inverts integration. Therefore, an operator representation for $\Delta$ provides one for $\Sigma$. Because $\Delta$ and the derivative operator $D$ are analogous, their representations are probably analogous. A derivative is the limit

$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}. \tag{6.8}$$

The derivative operator $D$ is therefore the operator limit

$$D = \lim_{h \to 0} \frac{L_h - 1}{h}, \tag{6.9}$$

where the $L_h$ operator turns $f(x)$ into $f(x + h)$; that is, $L_h$ left shifts by $h$.

Problem 6.18 Operator limit
Explain why $L_h \approx 1 + hD$ for small $h$. Show therefore that $L = e^D$.

What is an analogous representation of $\Delta$?

The operator limit for $D$ uses an infinitesimal left shift; correspondingly, the inverse operation of integration sums rectangles of infinitesimal width. Because summation $\Sigma$ sums rectangles of unit width, its inverse $\Delta$ should use a unit left shift, namely $L_h$ with $h = 1$. As a reasonable conjecture,

$$\Delta = \lim_{h \to 1} \frac{L_h - 1}{h} = L - 1. \tag{6.10}$$

This $\Delta$, called the finite-difference operator, is constructed to be $1/\Sigma$. If the construction is correct, then $(L - 1)\Sigma$ is the identity operator 1. In other words, $(L - 1)\Sigma$ should turn functions into themselves.

How well does this conjecture work in various easy cases?

To test the conjecture, apply the operator $(L - 1)\Sigma$ first to the easy function $g = 1$. Then $\Sigma g$ is a function waiting to be fed an argument, and $(\Sigma g)(k)$ is the result of feeding it $k$. With that notation, $(\Sigma g)(k) = k + C$. Feeding this function to the $L - 1$ operator reproduces $g$.

$$\bigl( (L - 1)\Sigma g \bigr)(k) = \underbrace{(k + 1 + C)}_{(L\Sigma g)(k)} - \underbrace{(k + C)}_{(1\Sigma g)(k)} = \underbrace{1}_{g(k)}. \tag{6.11}$$

With the next-easiest function, defined by $g(k) = k$, the indefinite sum $(\Sigma g)(k)$ is $k(k - 1)/2 + C$. Passing $\Sigma g$ through $L - 1$ again reproduces $g$.

$$\bigl( (L - 1)\Sigma g \bigr)(k) = \underbrace{\left( \frac{(k + 1)k}{2} + C \right)}_{(L\Sigma g)(k)} - \underbrace{\left( \frac{k(k - 1)}{2} + C \right)}_{(1\Sigma g)(k)} = \underbrace{k}_{g(k)}. \tag{6.12}$$

In summary, for the test functions $g(k) = 1$ and $g(k) = k$, the operator product $(L - 1)\Sigma$ takes $g$ back to itself, so it acts like the identity operator.
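The same easy-cases test can be run numerically for other functions. The sketch below assumes nonnegative integer arguments and an indefinite sum that starts at 0 and excludes the last rectangle, as in the text; the function names and the constant C = 7 are arbitrary choices made for the example.

```python
def indefinite_sum(g, C=0):
    """Return F with F(k) = g(0) + ... + g(k - 1) + C (last rectangle g(k) excluded)."""
    def F(k):
        return sum(g(j) for j in range(k)) + C
    return F

def left_shift_minus_one(F):
    """Apply the operator L - 1, which maps F to the function k -> F(k + 1) - F(k)."""
    return lambda k: F(k + 1) - F(k)

test_functions = {
    "g(k) = 1":   lambda k: 1,
    "g(k) = k":   lambda k: k,
    "g(k) = k^2": lambda k: k * k,
}
for name, g in test_functions.items():
    recovered = left_shift_minus_one(indefinite_sum(g, C=7))
    print(name, [recovered(k) for k in range(6)])
```

The constant of summation cancels in the difference, just as the integration constant disappears on differentiation.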

This behavior is general: $(L - 1)\Sigma$ is indeed 1, and $\Sigma = 1/(L - 1)$. Because $L = e^D$, we have $\Sigma = 1/(e^D - 1)$. Expanding the right side in a Taylor series gives an amazing representation of the summation operator.

$$\Sigma = \frac{1}{e^D - 1} = \frac{1}{D} - \frac{1}{2} + \frac{D}{12} - \frac{D^3}{720} + \frac{D^5}{30240} - \cdots. \tag{6.13}$$

Because $D\int = 1$, the leading term $1/D$ is integration. Thus, summation is approximately integration, a plausible conclusion indicating that the operator representation is not nonsense.

Applying this operator series to a function $f$ and then evaluating at the limits $a$ and $b$ produces the Euler–MacLaurin summation formula

$$\sum_{a}^{b-1} f(k) = \int_a^b f(k)\,dk - \frac{f(b) - f(a)}{2} + \frac{f^{(1)}(b) - f^{(1)}(a)}{12} - \frac{f^{(3)}(b) - f^{(3)}(a)}{720} + \frac{f^{(5)}(b) - f^{(5)}(a)}{30240} - \cdots, \tag{6.14}$$

where $f^{(n)}$ indicates the $n$th derivative of $f$.

The sum lacks the usual final term $f(b)$. Including this term gives the useful alternative

$$\sum_{a}^{b} f(k) = \int_a^b f(k)\,dk + \frac{f(b) + f(a)}{2} + \frac{f^{(1)}(b) - f^{(1)}(a)}{12} - \frac{f^{(3)}(b) - f^{(3)}(a)}{720} + \frac{f^{(5)}(b) - f^{(5)}(a)}{30240} - \cdots. \tag{6.15}$$

As a check, try an easy case: $\sum_0^n k$. Using Euler–MacLaurin summation, $f(k) = k$, $a = 0$, and $b = n$. The integral term then contributes $n^2/2$; the constant term $[f(b) + f(a)]/2$ contributes $n/2$; and later terms vanish. The result is familiar and correct:

$$\sum_0^n k = \frac{n^2}{2} + \frac{n}{2} + 0 = \frac{n(n + 1)}{2}. \tag{6.16}$$

A more stringent test of Euler–MacLaurin summation is to approximate $\ln n!$, which is the sum $\sum_1^n \ln k$ (Section 4.5). Therefore, sum $f(k) = \ln k$ between the (inclusive) limits $a = 1$ and $b = n$. The result is

$$\sum_1^n \ln k = \int_1^n \ln k\,dk + \frac{\ln n}{2} + \cdots. \tag{6.17}$$
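How quickly the successive terms of (6.15) close in on $\ln n!$ can be seen numerically. The sketch below hard-codes the derivatives of $\ln k$ and uses $n = 10$ as an arbitrary test value; the function name and the `terms` parameter are assumptions made for the example.

```python
import math

def ln_factorial_euler_maclaurin(n, terms=5):
    """Partial sums of Eq. (6.15) for f(k) = ln k with a = 1 and b = n."""
    pieces = [
        n * math.log(n) - n + 1,        # integral of ln k from 1 to n
        math.log(n) / 2,                # [f(b) + f(a)] / 2, using ln 1 = 0
        (1 / n - 1) / 12,               # first derivative:  f'(k)   = 1/k
        -(2 / n**3 - 2) / 720,          # third derivative:  f'''(k) = 2/k^3
        (24 / n**5 - 24) / 30240,       # fifth derivative:  f^(5)(k) = 24/k^5
    ]
    return sum(pieces[:terms])

n = 10
exact = math.lgamma(n + 1)              # ln n! from the standard library
for terms in range(1, 6):
    approx = ln_factorial_euler_maclaurin(n, terms)
    print(f"{terms} term(s): {approx:.6f}   error = {approx - exact:+.6f}")
```

In this run the errors alternate in sign and shrink with each added term, although the series is only asymptotic, so adding terms indefinitely need not keep helping.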

The integral, from the $1/D$ operator, contributes the area under the $\ln k$ curve. The correction, from the $1/2$ operator, incorporates the triangular protrusions (Problem 6.20). The ellipsis includes the higher-order corrections (Problem 6.21), hard to evaluate using pictures (Problem 4.32) but simple using Euler–MacLaurin summation (Problem 6.21).

[Figure: rectangles approximating the area under the ln k curve, from k = 1 to n.]

Problem 6.19 Integer sums
Use Euler–MacLaurin summation to find closed forms for the following sums:
(a) $\sum_0^n k^2$   (b) $\sum_0^n (2k + 1)$   (c) $\sum_0^n k^3$.

Problem 6.20 Boundary cases
In Euler–MacLaurin summation, the constant term is $[f(b) + f(a)]/2$: one-half of the first term plus one-half of the last term. The picture for summing $\ln k$ (Section 4.5) showed that the protrusions are approximately one-half of the last term, namely $\ln n$. What, pictorially, happened to one-half of the first term?

Problem 6.21 Higher-order terms
Approximate $\ln 5!$ using Euler–MacLaurin summation.

Problem 6.22 The Basel sum
The Basel sum $\sum_1^\infty n^{-2}$ may be approximated with pictures (Problem 4.37). However, the approximation is too crude to help guess the closed form. As Euler did, use Euler–MacLaurin summation to improve the accuracy until you can confidently guess the closed form. Hint: Sum the first few terms explicitly.

6.4 Tangent roots: A daunting transcendental sum

Our farewell example, chosen because its analysis combines diverse street-fighting tools, is a difficult infinite sum.

Find $S \equiv \sum x_n^{-2}$ where the $x_n$ are the positive solutions of $\tan x = x$.

The solutions to $\tan x = x$ or, equivalently, the roots of $\tan x - x$, are transcendental and have no closed form, yet a closed form is required for almost every summation method. Street-fighting methods will come to our rescue.

6.4.1 Pictures and easy cases

Begin the analysis with a hopefully easy case.

What is the first root $x_1$?

The roots of $\tan x - x$ are given by the intersections of $y = x$ and $y = \tan x$. Surprisingly, no intersection occurs in the branch of $\tan x$ where $0 < x < \pi/2$ (Problem 6.23); the first intersection is just before the asymptote at $x = 3\pi/2$. Thus, $x_1 \approx 3\pi/2$.

[Figure: the line y = x crossing successive branches of y = tan x, with asymptotes at x = π/2, 3π/2, 5π/2, 7π/2.]

Problem 6.23 No intersection with the main branch
Show symbolically that $\tan x = x$ has no solution for $0 < x < \pi/2$. (The result looks plausible pictorially but is worth checking in order to draw the picture.)

Where, approximately, are the subsequent intersections?

As $x$ grows, the $y = x$ line intersects the $y = \tan x$ graph ever higher and therefore ever closer to the vertical asymptotes. Therefore, make the following asymptote approximation for the big part of $x_n$:

$$x_n \approx \left( n + \frac{1}{2} \right) \pi. \tag{6.18}$$

6.4.2 Taking out the big part

This approximate, low-entropy expression for $x_n$ gives the big part of $S$ (the zeroth approximation).

$$S \approx \sum \Bigl[ \underbrace{\left( n + \tfrac{1}{2} \right) \pi}_{\approx x_n} \Bigr]^{-2} = \frac{4}{\pi^2} \sum_1^\infty \frac{1}{(2n + 1)^2}. \tag{6.19}$$

The sum $\sum_1^\infty (2n + 1)^{-2}$ is, from a picture (Section 4.5) or from Euler–MacLaurin summation (Section 6.3.2), roughly the following integral.

$$\sum_1^\infty (2n + 1)^{-2} \approx \int_1^\infty (2n + 1)^{-2}\,dn = -\frac{1}{2} \times \frac{1}{2n + 1} \bigg|_1^\infty = \frac{1}{6}. \tag{6.20}$$

Therefore,

$$S \approx \frac{4}{\pi^2} \times \frac{1}{6} = 0.067547\ldots \tag{6.21}$$

[Figure: $(2k+1)^{-2}$ plotted against k = 1, 2, 3, 4, with shaded protrusions.]

The shaded protrusions are roughly triangles, and they sum to one-half of the first rectangle. That rectangle has area 1/9, so

$$\sum_1^\infty (2n+1)^{-2} \approx \frac{1}{6} + \frac{1}{2} \times \frac{1}{9} = \frac{2}{9}. \tag{6.22}$$

Therefore, a more accurate estimate of $S$ is

$$S \approx \frac{4}{\pi^2} \times \frac{2}{9} = 0.090063\ldots, \tag{6.23}$$

which is slightly higher than the first estimate.

Is the new approximation an overestimate or an underestimate?

The new approximation is based on two underestimates. First, the asymptote approximation $x_n \approx (n + 0.5)\pi$ overestimates each $x_n$ and therefore underestimates the squared reciprocals in the sum $\sum x_n^{-2}$. Second, after making the asymptote approximation, the pictorial approximation to the sum $\sum_1^\infty (2n+1)^{-2}$ replaces each protrusion with an inscribed triangle and thereby underestimates each protrusion (Problem 6.24).

Problem 6.24 Picture for the second underestimate
Draw a picture of the underestimate in the pictorial approximation

$$\sum_1^\infty \frac{1}{(2n+1)^2} \approx \frac{1}{6} + \frac{1}{2} \times \frac{1}{9}. \tag{6.24}$$

How can these two underestimates be remedied?

The second underestimate (the protrusions) is eliminated by summing $\sum_1^\infty (2n+1)^{-2}$ exactly. The sum is unfamiliar partly because its first term is the fraction 1/9, whose arbitrariness increases the entropy of the sum. Including the $n = 0$ term, which is 1, and the even squared reciprocals $1/(2n)^2$ produces a compact and familiar lower-entropy sum.

$$\sum_1^\infty \frac{1}{(2n+1)^2} + 1 + \sum_1^\infty \frac{1}{(2n)^2} = \sum_1^\infty \frac{1}{n^2}. \tag{6.25}$$

The final, low-entropy sum is the famous Basel sum (high-entropy results are not often famous). Its value is $B = \pi^2/6$ (Problem 6.22).

How does knowing $B = \pi^2/6$ help evaluate the original sum $\sum_1^\infty (2n+1)^{-2}$?

The major modification from the original sum was to include the even squared reciprocals. Their sum is $B/4$.

$$\sum_1^\infty \frac{1}{(2n)^2} = \frac{1}{4} \sum_1^\infty \frac{1}{n^2}. \tag{6.26}$$

The second modification was to include the $n = 0$ term. Thus, to obtain $\sum_1^\infty (2n+1)^{-2}$, adjust the Basel value $B$ by subtracting $B/4$ and then the $n = 0$ term. The result, after substituting $B = \pi^2/6$, is

$$\sum_1^\infty \frac{1}{(2n+1)^2} = B - \frac{1}{4}B - 1 = \frac{\pi^2}{8} - 1. \tag{6.27}$$

This exact sum, based on the asymptote approximation for $x_n$, produces the following estimate of $S$.

$$S \approx \frac{4}{\pi^2} \sum_1^\infty \frac{1}{(2n+1)^2} = \frac{4}{\pi^2}\left(\frac{\pi^2}{8} - 1\right). \tag{6.28}$$

Simplifying by expanding the product gives

$$S \approx \frac{1}{2} - \frac{4}{\pi^2} = 0.094715\ldots \tag{6.29}$$

Problem 6.25 Check the earlier reasoning
Check the earlier pictorial reasoning (Problem 6.24) that $1/6 + 1/18 = 2/9$ underestimates $\sum_1^\infty (2n+1)^{-2}$. How accurate was that estimate?

This estimate of $S$ is the third that uses the asymptote approximation $x_n \approx (n + 0.5)\pi$. Assembled together, the estimates are

$$S \approx \begin{cases} 0.067547 & \text{(integral approximation to } \sum_1^\infty (2n+1)^{-2}\text{)}, \\ 0.090063 & \text{(integral approximation and triangular overshoots)}, \\ 0.094715 & \text{(exact sum of } \sum_1^\infty (2n+1)^{-2}\text{)}. \end{cases}$$

Because the third estimate incorporated the exact value of $\sum_1^\infty (2n+1)^{-2}$, any remaining error in the estimate of $S$ must belong to the asymptote approximation itself.
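As a quick numerical check, the three estimates can be recomputed in a few lines of Python. This is only a sketch: the variable names are illustrative and not from the text, and the brute-force partial sum (cut off at an arbitrary 200,000 terms) serves merely as a sanity check on the exact value $\pi^2/8 - 1$.

```python
import math

# Prefactor from the asymptote approximation x_n ≈ (n + 1/2)π, as in (6.19).
prefactor = 4 / math.pi**2

# First estimate (6.21): replace the sum by the integral, which is 1/6.
integral_only = prefactor * (1 / 6)

# Second estimate (6.23): add the triangular protrusions, (1/2) × (1/9).
with_triangles = prefactor * (1 / 6 + 1 / 18)

# Third estimate (6.29): use the exact sum π²/8 − 1 from (6.27).
exact_odd_sum = math.pi**2 / 8 - 1
with_exact_sum = prefactor * exact_odd_sum

# Sanity check on (6.27): a brute-force partial sum of 1/(2n+1)² for n ≥ 1.
# The cutoff is an arbitrary choice for this illustration.
partial_sum = sum(1 / (2 * n + 1) ** 2 for n in range(1, 200_001))

print(f"integral only:        {integral_only:.6f}")
print(f"integral + triangles: {with_triangles:.6f}")
print(f"exact odd-square sum: {with_exact_sum:.6f}")
print(f"partial sum vs exact: {partial_sum:.6f} vs {exact_odd_sum:.6f}")
```

Each successive estimate is larger than the one before, consistent with the argument that the first two are underestimates of the third.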

For which term of $\sum x_n^{-2}$ is the asymptote approximation most inaccurate?

As $x$ grows, the graphs of $x$ and $\tan x$ intersect ever closer to the vertical asymptote. Thus, the asymptote approximation makes its largest absolute error when $n = 1$. Because $x_1$ is the smallest root, the fractional error in $x_n$ is, relative to the absolute error in $x_n$, even more concentrated at $n = 1$. The fractional error in $x_n^{-2}$, being $-2$ times the fractional error in $x_n$ (Section 5.3), is equally concentrated at $n = 1$. Because $x_n^{-2}$ is the largest at $n = 1$, the absolute error in $x_n^{-2}$ (the fractional error times $x_n^{-2}$ itself) is, by far, the largest at $n = 1$.

Problem 6.26 Absolute error in the early terms
Estimate, as a function of $n$, the absolute error in $x_n^{-2}$ that is produced by the asymptote approximation.

With the error so concentrated at $n = 1$, the greatest improvement in the estimate of $S$ comes from replacing the approximation $x_1 = (n + 0.5)\pi$ with a more accurate value. A simple numerical approach is successive approximation using the Newton–Raphson method (Problem 4.38). To find a root with this method, make a starting guess $x$ and repeatedly improve it using the replacement

$$x \longrightarrow x - \frac{\tan x - x}{\sec^2 x - 1}. \tag{6.30}$$

When the starting guess for $x$ is slightly below the first asymptote at $1.5\pi$, the procedure rapidly converges to $x_1 = 4.4934\ldots$

Therefore, to improve the estimate $S \approx 0.094715$, which was based on the asymptote approximation, subtract its approximate first term (its big part) and add the corrected first term.

$$S \approx S_{\text{old}} - \frac{1}{(1.5\pi)^2} + \frac{1}{4.4934^2} \approx 0.09921. \tag{6.31}$$

Using the Newton–Raphson method to refine, in addition, the $1/x_2^2$ term gives $S \approx 0.09978$ (Problem 6.27). Therefore, a highly educated guess is

$$S = \frac{1}{10}. \tag{6.32}$$

The infinite sum of unknown transcendental numbers seems to be neither transcendental nor irrational! This simple and surprising rational number deserves a simple explanation.
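The Newton–Raphson refinement can be sketched in Python as follows. The function name `tangent_root`, the fixed iteration count, and the starting guess $a - 1/a$ with $a = (n + 0.5)\pi$ (a point slightly below the $n$th asymptote) are assumptions made for this illustration; the text specifies only that the guess should lie slightly below the asymptote.

```python
import math

def tangent_root(n, iterations=20):
    """Approximate the nth positive root of tan x = x by Newton-Raphson.

    The starting guess a - 1/a, with a = (n + 0.5)*pi, is an illustrative
    choice that lies slightly below the nth vertical asymptote.
    """
    a = (n + 0.5) * math.pi
    x = a - 1 / a
    for _ in range(iterations):
        # The replacement (6.30): x -> x - (tan x - x) / (sec^2 x - 1).
        x -= (math.tan(x) - x) / (1 / math.cos(x) ** 2 - 1)
    return x

# Third estimate from the asymptote approximation, (6.29).
s_old = 0.5 - 4 / math.pi**2

# Correct the n = 1 term as in (6.31): remove the approximate term, add the refined one.
x1 = tangent_root(1)
s1 = s_old - 1 / (1.5 * math.pi) ** 2 + 1 / x1**2

# Correct the n = 2 term in the same way.
x2 = tangent_root(2)
s2 = s1 - 1 / (2.5 * math.pi) ** 2 + 1 / x2**2

print(f"x1 = {x1:.4f}")
print(f"S after correcting n = 1:     {s1:.5f}")
print(f"S after correcting n = 1, 2:  {s2:.5f}")
```

Each correction raises the estimate, and the corrected values creep toward 1/10 from below, which is what makes the rational guess plausible.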

Problem 6.27 Continuing the corrections
Choose a small $N$, say 4. Then use the Newton–Raphson method to compute accurate values of $x_n$ for $n = 1 \ldots N$, and use those values to refine the estimate of $S$. As you extend the computation to larger values of $N$, do the refined estimates of $S$ approach our educated guess of 1/10?

6.4.3 Analogy with polynomials

If only the equation $\tan x - x = 0$ had just a few closed-form solutions! Then the sum $S$ would be easy to compute. That wish is fulfilled by replacing $\tan x - x$ with a polynomial equation with simple roots. The simplest interesting polynomial is the quadratic, so experiment with a simple quadratic, for example $x^2 - 3x + 2$.

This polynomial has two roots, $x_1 = 1$ and $x_2 = 2$; therefore $\sum x_n^{-2}$, the polynomial-root sum analog of the tangent-root sum, has two terms.

$$\sum x_n^{-2} = \frac{1}{1^2} + \frac{1}{2^2} = \frac{5}{4}. \tag{6.33}$$

This brute-force method for computing the root sum requires a solution to the quadratic equation. However, a method that can transfer to the equation $\tan x - x = 0$, which has no closed-form solution, cannot use the roots themselves. It must use only surface features of the quadratic, namely its two coefficients 2 and $-3$. Unfortunately, no plausible method of combining 2 and $-3$ predicts that $\sum x_n^{-2} = 5/4$.

Where did the polynomial analogy go wrong?

The problem is that the quadratic $x^2 - 3x + 2$ is not sufficiently similar to $\tan x - x$. The quadratic has only positive roots; however, $\tan x - x$, an odd function, has symmetric positive and negative roots and has a root at $x = 0$. Indeed, the Taylor series for $\tan x$ is $x + x^3/3 + 2x^5/15 + \cdots$ (Problem 6.28); therefore,

$$\tan x - x = \frac{x^3}{3} + \frac{2x^5}{15} + \cdots. \tag{6.34}$$

The common factor of $x^3$ means that $\tan x - x$ has a triple root at $x = 0$. An analogous polynomial (here, one with a triple root at $x = 0$, a positive root, and a symmetric negative root) is $(x+2)x^3(x-2)$ or, after expansion, $x^5 - 4x^3$. The sum $\sum x_n^{-2}$ (using the positive root) contains only one term and is simply 1/4. This value could plausibly arise as the (negative) ratio of the last two coefficients of the polynomial.

To decide whether that pattern is a coincidence, try a richer polynomial: one with roots at $-2$, $-1$, 0 (threefold), 1, and 2. One such polynomial is

$$(x + 2)(x + 1)x^3(x - 1)(x - 2) = x^7 - 5x^5 + 4x^3. \tag{6.35}$$

The polynomial-root sum uses only the two positive roots 1 and 2 and is $1/1^2 + 1/2^2$, which is 5/4: the (negative) ratio of the last two coefficients.

As a final test of this pattern, include $-3$ and 3 among the roots. The resulting polynomial is

$$(x^7 - 5x^5 + 4x^3)(x + 3)(x - 3) = x^9 - 14x^7 + 49x^5 - 36x^3. \tag{6.36}$$

The polynomial-root sum uses the three positive roots 1, 2, and 3 and is $1/1^2 + 1/2^2 + 1/3^2$, which is 49/36: again the (negative) ratio of the last two coefficients in the expanded polynomial.

What is the origin of the pattern, and how can it be extended to $\tan x - x$?

To explain the pattern, tidy the polynomial as follows:

$$x^9 - 14x^7 + 49x^5 - 36x^3 = -36x^3\left(1 - \frac{49}{36}x^2 + \frac{14}{36}x^4 - \frac{1}{36}x^6\right). \tag{6.37}$$

In this arrangement, the sum 49/36 appears as the negative of the first interesting coefficient. Let's generalize. Placing $k$ roots at $x = 0$ and single roots at $\pm x_1, \pm x_2, \ldots, \pm x_n$ gives the polynomial

$$Ax^k\left(1 - \frac{x^2}{x_1^2}\right)\left(1 - \frac{x^2}{x_2^2}\right)\left(1 - \frac{x^2}{x_3^2}\right)\cdots\left(1 - \frac{x^2}{x_n^2}\right), \tag{6.38}$$

where $A$ is a constant. When expanding the product of the factors in parentheses, the coefficient of the $x^2$ term in the expansion receives one contribution from each $x^2/x_j^2$ term in a factor. Thus, the expansion begins

$$Ax^k\left[1 - \left(\frac{1}{x_1^2} + \frac{1}{x_2^2} + \frac{1}{x_3^2} + \cdots + \frac{1}{x_n^2}\right)x^2 + \cdots\right]. \tag{6.39}$$

The coefficient of $x^2$ in parentheses is $\sum x_n^{-2}$, which is the polynomial analog of the tangent-root sum.

Let's apply this method to $\tan x - x$. Although it is not a polynomial, its Taylor series is like an infinite-degree polynomial. The Taylor series is

$$\frac{x^3}{3} + \frac{2x^5}{15} + \frac{17x^7}{315} + \cdots = \frac{x^3}{3}\left(1 + \frac{2}{5}x^2 + \frac{17}{105}x^4 + \cdots\right). \tag{6.40}$$

The negative of the $x^2$ coefficient should be $\sum x_n^{-2}$. For the tangent-sum problem, $\sum x_n^{-2}$ should therefore be $-2/5$. Unfortunately, the sum of positive quantities cannot be negative!

What went wrong with the analogy?

One problem is that $\tan x - x$ might have imaginary or complex roots whose squares contribute negative amounts to $S$. Fortunately, all its roots are real (Problem 6.29). A harder-to-solve problem is that $\tan x - x$ goes to infinity at finite values of $x$, and does so infinitely often, whereas no polynomial does so even once.

The solution is to construct a function having no infinities but having the same roots as $\tan x - x$. The infinities of $\tan x - x$ occur where $\tan x$ blows up, which is where $\cos x = 0$. To remove the infinities without creating or destroying any roots, multiply $\tan x - x$ by $\cos x$. The polynomial-like function to expand is therefore $\sin x - x\cos x$.

[Figure: the graphs of sin x, x cos x, and sin x − x cos x, with the roots 0, x1, x2, and x3 marked.]

Its Taylor expansion is

$$\underbrace{\left(x - \frac{x^3}{6} + \frac{x^5}{120} - \cdots\right)}_{\sin x} - \underbrace{\left(x - \frac{x^3}{2} + \frac{x^5}{24} - \cdots\right)}_{x\cos x}. \tag{6.41}$$

The difference of the two series is

$$\sin x - x\cos x = \frac{x^3}{3}\left(1 - \frac{1}{10}x^2 + \cdots\right). \tag{6.42}$$

The $x^3/3$ factor indicates the triple root at $x = 0$. And there at last, as the negative of the $x^2$ coefficient, sits our tangent-root sum $S = 1/10$.

Problem 6.28 Taylor series for the tangent
Use the Taylor series for $\sin x$ and $\cos x$ to show that

$$\tan x = x + \frac{x^3}{3} + \frac{2x^5}{15} + \cdots. \tag{6.43}$$

Hint: Use taking out the big part.
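The coefficient pattern can be checked with exact rational arithmetic. The Python sketch below is illustrative only (the helper name `sin_minus_x_cos_coeff` and the dictionary layout are not from the text): it first confirms the pattern on the polynomial with roots $\pm 1$, $\pm 2$, $\pm 3$ and a triple root at 0, then reads the tangent-root sum off the Taylor coefficients of $\sin x - x\cos x$.

```python
from fractions import Fraction
from math import factorial

# Polynomial (6.36): x^9 - 14x^7 + 49x^5 - 36x^3, stored as {power: coefficient}.
poly = {9: 1, 7: -14, 5: 49, 3: -36}

# Root sum computed directly from the positive roots 1, 2, 3.
direct_sum = sum(Fraction(1, r * r) for r in (1, 2, 3))

# Root sum predicted by the pattern: negative ratio of the last two coefficients.
pattern_sum = -Fraction(poly[5], poly[3])

def sin_minus_x_cos_coeff(power):
    """Exact coefficient of x**power (power odd) in sin x - x cos x."""
    m = (power - 1) // 2
    sign = (-1) ** m
    sin_part = Fraction(sign, factorial(power))        # term from sin x
    x_cos_part = Fraction(sign, factorial(power - 1))  # term from x cos x
    return sin_part - x_cos_part

c3 = sin_minus_x_cos_coeff(3)  # leading coefficient, as in (6.42)
c5 = sin_minus_x_cos_coeff(5)

# Apply the same pattern: after factoring out c3 * x^3, the negative of the
# x^2 coefficient is the candidate value for the tangent-root sum S.
tangent_root_sum = -c5 / c3

print("direct polynomial root sum: ", direct_sum)
print("pattern polynomial root sum:", pattern_sum)
print("c3 =", c3, " c5 =", c5)
print("tangent-root sum from pattern:", tangent_root_sum)
```

Using `Fraction` rather than floating point keeps the comparison exact, so agreement between the direct sum and the coefficient ratio is an identity rather than a numerical coincidence.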

Problem 6.29 Only real roots
Show that all roots of $\tan x - x$ are real.

Problem 6.30 Exact Basel sum
Use the polynomial analogy to evaluate the Basel sum

$$\sum_1^\infty \frac{1}{n^2}. \tag{6.44}$$

Compare your result with your solution to Problem 6.22.

Problem 6.31 Misleading alternative expansions
Squaring and taking the reciprocal of $\tan x = x$ gives $\cot^2 x = x^{-2}$; equivalently, $\cot^2 x - x^{-2} = 0$. Therefore, if $x$ is a root of $\tan x - x$, it is a root of $\cot^2 x - x^{-2}$. The Taylor expansion of $\cot^2 x - x^{-2}$ is

$$-\frac{2}{3}\left(1 - \frac{1}{10}x^2 - \frac{1}{63}x^4 - \cdots\right). \tag{6.45}$$

Because the coefficient of $x^2$ is $-1/10$, the tangent-root sum $S$ (for $\cot^2 x = x^{-2}$ and therefore $\tan x = x$) should be 1/10. As we found experimentally and analytically for $\tan x = x$, the conclusion is correct. However, what is wrong with the reasoning?

Problem 6.32 Fourth powers of the reciprocals
The Taylor series for $\sin x - x\cos x$ continues

$$\frac{x^3}{3}\left(1 - \frac{x^2}{10} + \frac{x^4}{280} - \cdots\right). \tag{6.46}$$

Therefore find $\sum x_n^{-4}$ for the positive roots of $\tan x = x$. Check numerically that your result is plausible.

Problem 6.33 Other source equations for the roots
Find $\sum x_n^{-2}$, where the $x_n$ are the positive roots of $\cos x$.

6.5 Bon voyage

I hope that you have enjoyed incorporating street-fighting methods into your problem-solving toolbox. May you find diverse opportunities to use dimensional analysis, easy cases, lumping, pictorial reasoning, taking out the big part, and analogy. As you apply the tools, you will sharpen them, and even build new tools.



This book was created entirely with free software and fonts. The text is set in Palatino, designed by Hermann Zapf and available as TeX Gyre Pagella. The mathematics is set in Euler, also designed by Hermann Zapf. Maxima 5.17.1 and the mpmath Python library aided several calculations. The source files were created using many versions of GNU Emacs and managed using the Mercurial revision-control system. The figure source files were compiled with MetaPost 1.208 and Asymptote 1.88. The TeX source was compiled to PDF using ConTeXt 2009.10.27 and PDFTeX 1.40.10. The compilations were managed with GNU Make 3.81 and took 10 min on a 2006-vintage laptop. All software was running on Debian GNU/Linux. I warmly thank the many contributors to the software commons.

