Notes by Leah Edelstein-Keshet: All rights reserved
University of British Columbia
January 2, 2010
Contents
Preface xvii
1 Areas, volumes and simple sums 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Areas of simple shapes . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.1 Example 1: Finding the area of a polygon using triangles:
a “dissection” method . . . . . . . . . . . . . . . . . . . 3
1.2.2 Example 2: How Archimedes discovered the area of a
circle: dissect and “take a limit” . . . . . . . . . . . . . . 4
1.3 Simple volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Example 3: The Tower of Hanoi: a tower of disks . . . . 8
1.4 Summations and the “Sigma” notation . . . . . . . . . . . . . . . . . 9
1.4.1 Manipulations of sums . . . . . . . . . . . . . . . . . . . 10
1.5 Summation formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.1 Example 3, revisited: Volume of a Tower of Hanoi . . . . 12
1.6 Summing the geometric series . . . . . . . . . . . . . . . . . . . . . . 13
1.7 Prelude to infinite series . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.7.1 The infinite geometric series . . . . . . . . . . . . . . . . 15
1.7.2 Example: A geometric series that converges. . . . . . . . 16
1.7.3 Example: A geometric series that diverges . . . . . . . . 16
1.8 Application of geometric series to the branching structure of the lungs 16
1.8.1 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . 17
1.8.2 A simple geometric rule . . . . . . . . . . . . . . . . . . 19
1.8.3 Total number of segments . . . . . . . . . . . . . . . . . 20
1.8.4 Total volume of airways in the lung . . . . . . . . . . . . 20
1.8.5 Total surface area of the lung branches . . . . . . . . . . 21
1.8.6 Summary of predictions for specific parameter values . . 22
1.8.7 Exploring the problem numerically . . . . . . . . . . . . 23
1.8.8 For further independent study . . . . . . . . . . . . . . . 23
1.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2 Areas 27
2.1 Areas in the plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Computing the area under a curve by rectangular strips . . . . . . . . 29
2.2.1 First approach: Numerical integration using a spreadsheet 29
2.2.2 Second approach: Analytic computation using Riemann
sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2.3 Comments . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 The area of a leaf . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Area under an exponential curve . . . . . . . . . . . . . . . . . . . . 35
2.5 Extensions and other examples . . . . . . . . . . . . . . . . . . . . . 36
2.6 The definite integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.6.1 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.6.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.7 The area as a function . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3 The Fundamental Theorem of Calculus 43
3.1 The definite integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 Properties of the definite integral . . . . . . . . . . . . . . . . . . . . 44
3.3 The area as a function . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 The Fundamental Theorem of Calculus . . . . . . . . . . . . . . . . . 47
3.4.1 Fundamental theorem of calculus: Part I . . . . . . . . . 47
3.4.2 Example: an antiderivative . . . . . . . . . . . . . . . . . 47
3.4.3 Fundamental theorem of calculus: Part II . . . . . . . . . 48
3.5 Review of derivatives (and antiderivatives) . . . . . . . . . . . . . . . 49
3.6 Examples: Computing areas with the Fundamental Theorem of Calculus 50
3.6.1 Example 1: The area under a polynomial . . . . . . . . . 50
3.6.2 Example 2: Simple areas . . . . . . . . . . . . . . . . . 50
3.6.3 Example 3: The area between two curves . . . . . . . . . 52
3.6.4 Example 4: Area of land . . . . . . . . . . . . . . . . . . 53
3.7 Qualitative ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.7.1 Example: sketching A(x) . . . . . . . . . . . . . . . . . 56
3.8 Some fine print . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.8.1 Function unbounded I . . . . . . . . . . . . . . . . . . . 57
3.8.2 Function unbounded II . . . . . . . . . . . . . . . . . . . 57
3.8.3 Example: Function discontinuous or with distinct parts . 57
3.8.4 Function undefined . . . . . . . . . . . . . . . . . . . . 57
3.8.5 Infinite domain (“improper integral”) . . . . . . . . . . . 58
3.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4 Applications of the definite integral to velocities and rates 61
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2 Displacement, velocity and acceleration . . . . . . . . . . . . . . . . 62
4.2.1 Geometric interpretations . . . . . . . . . . . . . . . . . 62
4.2.2 Displacement for uniform motion . . . . . . . . . . . . . 63
4.2.3 Uniformly accelerated motion . . . . . . . . . . . . . . . 63
4.2.4 Nonconstant acceleration and terminal velocity . . . . . 64
4.3 From rates of change to total change . . . . . . . . . . . . . . . . . . 66
4.3.1 Tree growth rates . . . . . . . . . . . . . . . . . . . . . . 68
4.3.2 Radius of a tree trunk . . . . . . . . . . . . . . . . . . . 68
4.3.3 Birth rates and total births . . . . . . . . . . . . . . . . . 71
4.4 Production and removal . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5 Present value of a continuous income stream . . . . . . . . . . . . . . 74
4.6 Average value of a function . . . . . . . . . . . . . . . . . . . . . . . 76
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5 Applications of the definite integral to calculating volume, mass, and length 81
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2 Mass distributions in one dimension . . . . . . . . . . . . . . . . . . 82
5.2.1 A discrete distribution: total mass of beads on a wire . . . 82
5.2.2 A continuous distribution: mass density and total mass . . 82
5.2.3 Example: Actin density inside a cell . . . . . . . . . . . 84
5.3 Mass distribution and the center of mass . . . . . . . . . . . . . . . . 85
5.3.1 Center of mass of a discrete distribution . . . . . . . . . . 85
5.3.2 Center of mass of a continuous distribution . . . . . . . . 85
5.3.3 Example: Center of mass vs average mass density . . . . 86
5.3.4 Physical interpretation of the center of mass . . . . . . . 87
5.4 Miscellaneous examples and related problems . . . . . . . . . . . . . 87
5.4.1 Example: A glucose density gradient . . . . . . . . . . . 87
5.4.2 Example: A circular colony of bacteria . . . . . . . . . . 89
5.5 Volumes of solids of revolution . . . . . . . . . . . . . . . . . . . . . 90
5.5.1 Volumes of cylinders and shells . . . . . . . . . . . . . . 90
5.5.2 Computing the Volumes . . . . . . . . . . . . . . . . . . 91
5.6 Length of a curve: Arc length . . . . . . . . . . . . . . . . . . . . . . 96
5.6.1 How the alligator gets its smile . . . . . . . . . . . . . . 99
5.6.2 References . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6 Techniques of Integration 107
6.1 Differential notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.2 Antidifferentiation and indefinite integrals . . . . . . . . . . . . . . . 110
6.2.1 Integrals of derivatives . . . . . . . . . . . . . . . . . . . 111
6.3 Simple substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.3.1 Example: Simple substitution . . . . . . . . . . . . . . . 112
6.3.2 How to handle endpoints . . . . . . . . . . . . . . . . . 113
6.3.3 Examples: Substitution type integrals . . . . . . . . . . . 113
6.3.4 When simple substitution fails . . . . . . . . . . . . . . . 115
6.3.5 Checking your answer . . . . . . . . . . . . . . . . . . . 116
6.4 More substitutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.4.1 Example: perfect square in denominator . . . . . . . . . 116
6.4.2 Example: completing the square . . . . . . . . . . . . . . 117
6.4.3 Example: factoring the denominator . . . . . . . . . . . 117
6.5 Trigonometric substitutions . . . . . . . . . . . . . . . . . . . . . . . 118
6.5.1 Example: simple trigonometric substitution . . . . . . . . 118
6.5.2 Example: using trigonometric identities (1) . . . . . . . . 119
6.5.3 Example: using trigonometric identities (2) . . . . . . . . 119
6.5.4 Example: converting to trigonometric functions . . . . . 120
6.5.5 Example: The centroid of a two dimensional shape . . . . 122
6.5.6 Example: tan and sec substitution . . . . . . . . . . . . . 123
6.6 Partial fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.6.1 Example: partial fractions (1) . . . . . . . . . . . . . . . 124
6.6.2 Example: partial fractions (2) . . . . . . . . . . . . . . . 125
6.6.3 Example: partial fractions (3) . . . . . . . . . . . . . . . 126
6.7 Integration by parts . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7 Discrete probability and the laws of chance 133
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.2 Dealing with data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.3 Simple experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.3.1 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.3.2 Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.3.3 Empirical probability . . . . . . . . . . . . . . . . . . . 134
7.3.4 Theoretical Probability . . . . . . . . . . . . . . . . . . . 135
7.3.5 Random variables and probability distributions . . . . . . 135
7.3.6 The cumulative distribution . . . . . . . . . . . . . . . . 136
7.4 Examples of experimental data . . . . . . . . . . . . . . . . . . . . . 136
7.4.1 Example 1: Tossing a coin . . . . . . . . . . . . . . . . . 136
7.4.2 Example 2: grade distributions . . . . . . . . . . . . . . 137
7.5 Mean and variance of a probability distribution . . . . . . . . . . . . . 137
7.6 Bernoulli trials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.6.1 The Binomial distribution . . . . . . . . . . . . . . . . . 140
7.6.2 The Binomial theorem . . . . . . . . . . . . . . . . . . . 142
7.6.3 The binomial distribution . . . . . . . . . . . . . . . . . 143
7.6.4 The normalized binomial distribution . . . . . . . . . . . 145
7.7 Hardy-Weinberg genetics . . . . . . . . . . . . . . . . . . . . . . . . 146
7.7.1 Random non-assortative mating . . . . . . . . . . . . . . 147
7.8 Random walker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
7.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
8 Continuous probability distributions 153
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.2 Basic definitions and properties . . . . . . . . . . . . . . . . . . . . . 153
8.2.1 Example: probability density and the cumulative function 156
8.3 Mean and median . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.3.1 Example: Mean and median . . . . . . . . . . . . . . . . 158
8.3.2 How is the mean different from the median? . . . . . . . 160
8.3.3 Example: a nonsymmetric distribution . . . . . . . . . . 161
8.4 Applications of continuous probability . . . . . . . . . . . . . . . . . 161
8.4.1 Radioactive decay . . . . . . . . . . . . . . . . . . . . . 162
8.4.2 Discrete versus continuous probability . . . . . . . . . . 165
8.4.3 Example: Student heights . . . . . . . . . . . . . . . . . 166
8.4.4 Example: Age dependent mortality . . . . . . . . . . . . 167
8.4.5 Example: Raindrop size distribution . . . . . . . . . . . 169
8.5 Moments of a probability density . . . . . . . . . . . . . . . . . . . . 171
8.5.1 Definition of moments . . . . . . . . . . . . . . . . . . . 171
8.5.2 Relationship of moments to mean and variance of a
probability density . . . . . . . . . . . . . . . . . . . . 172
8.5.3 Example: computing moments . . . . . . . . . . . . . . 174
8.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9 Differential Equations 177
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.2 Unlimited population growth . . . . . . . . . . . . . . . . . . . . . . 178
9.2.1 A simple model for population growth . . . . . . . . . . 178
9.2.2 Separation of variables and integration . . . . . . . . . . 179
9.3 Terminal velocity and steady states . . . . . . . . . . . . . . . . . . . 180
9.3.1 Ignoring friction: the uniformly accelerated case . . . . . 181
9.3.2 Including friction: the case of terminal velocity . . . . . . 181
9.3.3 Steady state . . . . . . . . . . . . . . . . . . . . . . . . 184
9.4 Related problems and examples . . . . . . . . . . . . . . . . . . . . . 184
9.4.1 Blood alcohol . . . . . . . . . . . . . . . . . . . . . . . 185
9.4.2 Chemical kinetics . . . . . . . . . . . . . . . . . . . . . 185
9.5 Emptying a container . . . . . . . . . . . . . . . . . . . . . . . . . . 186
9.5.1 Conservation of mass . . . . . . . . . . . . . . . . . . . 186
9.5.2 Conservation of energy . . . . . . . . . . . . . . . . . . 188
9.5.3 Putting it together . . . . . . . . . . . . . . . . . . . . . 188
9.5.4 Solution by separation of variables . . . . . . . . . . . . 189
9.5.5 How long will it take the tank to empty? . . . . . . . . . 191
9.6 Density dependent growth . . . . . . . . . . . . . . . . . . . . . . . . 191
9.6.1 The logistic equation . . . . . . . . . . . . . . . . . . . . 191
9.6.2 Scaling the equation . . . . . . . . . . . . . . . . . . . . 192
9.6.3 Separation of variables . . . . . . . . . . . . . . . . . . . 192
9.6.4 Application of partial fractions . . . . . . . . . . . . . . 192
9.6.5 The solution of the logistic equation . . . . . . . . . . . . 193
9.6.6 What this solution tells us . . . . . . . . . . . . . . . . . 194
9.7 Extensions and other population models: the “Law of Mortality” . . . 195
9.7.1 Aging and Survival curves for a cohort: . . . . . . . . . . 196
9.7.2 Gompertz Model . . . . . . . . . . . . . . . . . . . . . . 197
9.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
10 Infinite series, improper integrals, and Taylor series 199
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
10.2 Convergence and divergence of series . . . . . . . . . . . . . . . . . . 200
10.3 Improper integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
10.3.1 Example: A decaying exponential: convergent improper
integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
10.3.2 Example: The improper integral of 1/x diverges . . . . . 203
10.3.3 Example: The improper integral of 1/x^2 converges . . . . 204
10.3.4 When does the integral of 1/x^p converge? . . . . . . . . 204
10.3.5 Integral comparison tests . . . . . . . . . . . . . . . . . 205
10.4 Comparing integrals and series . . . . . . . . . . . . . . . . . . . . . 206
10.4.1 The harmonic series . . . . . . . . . . . . . . . . . . . . 206
10.5 From geometric series to Taylor polynomials . . . . . . . . . . . . . . 208
10.5.1 Example 1: A simple expansion . . . . . . . . . . . . . . 209
10.5.2 Example 2: Another substitution . . . . . . . . . . . . . 210
10.5.3 Example 3: An expansion for the logarithm . . . . . . . . 210
10.5.4 Example 4: An expansion for arctan . . . . . . . . . . . 211
10.6 Taylor Series: a systematic approach . . . . . . . . . . . . . . . . . . 212
10.6.1 Taylor series for the exponential function, e^x . . . . . . . 213
10.6.2 Taylor series of trigonometric functions . . . . . . . . . . 214
10.7 Application of Taylor series . . . . . . . . . . . . . . . . . . . . . . . 216
10.7.1 Example 1: using a Taylor series to evaluate an integral . 216
10.7.2 Example 2: Series solution of a differential equation . . . 217
10.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
11 Appendix 221
11.1 How to prove the formulae for sums of squares and cubes . . . . . . . 221
11.2 Riemann Sums: Extensions and other examples . . . . . . . . . . . . 223
11.2.1 A general interval: a ≤ x ≤ b . . . . . . . . . . . . . . . 223
11.2.2 Using left (rather than right) endpoints . . . . . . . . . . 224
11.3 Physical interpretation of the center of mass . . . . . . . . . . . . . . 226
11.4 The shell method for computing volumes . . . . . . . . . . . . . . . . 229
11.4.1 Example: Volume of a cone using the shell method . . . . 229
11.5 More techniques of integration . . . . . . . . . . . . . . . . . . . . . 230
11.5.1 Secants and other “hard integrals” . . . . . . . . . . . . . 230
11.5.2 A special case of integration by partial fractions . . . . . 231
11.6 Analysis of data: a student grade distribution . . . . . . . . . . . . . . 232
11.6.1 Defining an average grade . . . . . . . . . . . . . . . . . 232
11.6.2 Fraction of students that scored a given grade . . . . . . . 232
11.6.3 Frequency distribution . . . . . . . . . . . . . . . . . . . 233
11.6.4 Average/mean of the distribution . . . . . . . . . . . . . 233
11.6.5 Cumulative function . . . . . . . . . . . . . . . . . . . . 234
11.6.6 The median . . . . . . . . . . . . . . . . . . . . . . . . . 236
11.7 Factorial notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
11.8 Appendix: Permutations and combinations . . . . . . . . . . . . . . . 236
11.8.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . 236
11.9 Appendix: Tests for convergence of series . . . . . . . . . . . . . . . 237
11.9.1 The ratio test: . . . . . . . . . . . . . . . . . . . . . . . 238
11.9.2 Series comparison tests . . . . . . . . . . . . . . . . . . 239
11.9.3 Alternating series . . . . . . . . . . . . . . . . . . . . . 240
11.10 Adding and multiplying series . . . . . . . . . . . . . . . . . . . . . . 240
11.11 Using series to solve a differential equation . . . . . . . . . . . . . . . 241
Index 243
List of Figures
1.1 Planar regions whose areas are given by elementary formulae. . . . . . . 2
1.2 Dissecting an n-sided polygon into n triangles . . . . . . . . . . . . . . 3
1.3 Archimedes’ approximation of the area of a circle . . . . . . . . . . . . 5
1.4 3-dimensional shapes whose volumes are given by elementary formulae . 7
1.5 Computing the volume of a set of disks. (This structure is sometimes
called the tower of Hanoi after a mathematical puzzle by the same name.) 8
1.6 Branched structure of the lung airways . . . . . . . . . . . . . . . . . . 17
1.7 Volume and surface area of the lung airways . . . . . . . . . . . . . . . 24
2.1 Areas of regions in the plane . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Increasing the number of strips improves the approximation . . . . . . . 28
2.3 Approximating an area by a set of rectangles . . . . . . . . . . . . . . . 30
2.4 The area of a leaf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5 The area corresponding to the definite integral of the function f(x) . . . 37
2.6 More areas related to definite integrals . . . . . . . . . . . . . . . . . . . 38
2.7 The area A(x) considered as a function . . . . . . . . . . . . . . . . . . 39
3.1 Definite integrals for functions that take on negative values, and
properties of the definite integral . . . . . . . . . . . . . . . . . . . . . 44
3.2 How the area changes when the interval changes . . . . . . . . . . . . . 46
3.3 The area of a symmetric region . . . . . . . . . . . . . . . . . . . . . . 51
3.4 The areas A_1 and A_2 in Example 3 . . . . . . . . . . . . . . . . . . . . 52
3.5 The “area function” corresponding to a function f(x) . . . . . . . . . . . 54
3.6 Sketching the antiderivative of f(x) . . . . . . . . . . . . . . . . . . . . 55
3.7 Sketches of a function and its antiderivative . . . . . . . . . . . . . . . 56
3.8 Splitting up a region to compute an integral . . . . . . . . . . . . . . . . 58
3.9 Integrating in the y direction . . . . . . . . . . . . . . . . . . . . . . . . 59
4.1 Displacement and velocity as areas under curves . . . . . . . . . . . . . 63
4.2 Terminal velocity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3 Tree growth rates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4 The rate of change of a tree radius . . . . . . . . . . . . . . . . . . . . . 69
4.5 The tree radius as a function of time . . . . . . . . . . . . . . . . . . . . 70
4.6 Rates of hormone production and removal . . . . . . . . . . . . . . . . . 72
4.7 Approximating hormone production/removal . . . . . . . . . . . . . . . 74
4.8 The yearly day length cycle and average day length . . . . . . . . . . . . 78
5.1 Discrete mass distribution . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2 Continuous mass distribution . . . . . . . . . . . . . . . . . . . . . . . 83
5.3 The actin cortex of a fish keratocyte cell . . . . . . . . . . . . . . . . . . 84
5.4 A glucose gradient in a test tube . . . . . . . . . . . . . . . . . . . . . . 88
5.5 A bacterial colony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.6 Volumes of simple 3D shapes . . . . . . . . . . . . . . . . . . . . . . . 91
5.7 Dissecting a solid of revolution into disks . . . . . . . . . . . . . . . . . 91
5.8 Volume of one of the disks . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.9 Generating a sphere by rotating a semicircle . . . . . . . . . . . . . . . . 93
5.10 A paraboloid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.11 Dissecting a curve into small arcs . . . . . . . . . . . . . . . . . . . . . 97
5.12 Elements of arclength . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.13 Using the spreadsheet to compute and graph arclength . . . . . . . . . . 100
5.14 Alligator mississippiensis and its teeth . . . . . . . . . . . . . . . . . . . 103
5.15 Analysis of distance between successive teeth . . . . . . . . . . . . . . . 104
6.1 Slope of a straight line, m = ∆y/∆x . . . . . . . . . . . . . . . . . . . 108
6.2 Figure illustrating differential notation . . . . . . . . . . . . . . . . . . . 108
6.3 A helpful triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.4 A semicircular shape. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.5 As in Figure 6.3 but for example 6.5.6. . . . . . . . . . . . . . . . . . . 124
7.1 A plot of data from a coin tossing experiment . . . . . . . . . . . . . . . 138
7.2 The Binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.3 The Normal (Gaussian) distribution . . . . . . . . . . . . . . . . . . . . 145
7.4 Normal probability density and its cumulative function . . . . . . . . . . 146
7.5 Hardy-Weinberg mating . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.6 A random walker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.1 Probability density and its cumulative function in Example 8.2.1 . . . . . 157
8.2 Median for Example 8.3.1 . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.3 Mean versus median . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.4 Mean and median for a nonsymmetric probability density . . . . . . . . 162
8.5 Refining a histogram by increasing the number of bins leads (eventually)
to the idea of a continuous probability density. . . . . . . . . . . . . . . 166
8.6 Raindrop radius and volume probability distributions . . . . . . . . . . . 169
9.1 Terminal velocity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
9.2 Blood alcohol level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
9.3 Emptying a container . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
9.4 Height of fluid versus time . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.5 Solutions to the logistic equation . . . . . . . . . . . . . . . . . . . . . . 194
9.6 Gompertz Law of Mortality . . . . . . . . . . . . . . . . . . . . . . . . 196
10.1 Approximating a function . . . . . . . . . . . . . . . . . . . . . . . . . 199
10.2 Convergence and divergence of an infinite series . . . . . . . . . . . . . 201
10.3 Improper integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
10.4 The harmonic series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
10.5 Taylor polynomials for sin(x) . . . . . . . . . . . . . . . . . . . . . . . 215
11.1 Rectangles attached to left or right endpoints . . . . . . . . . . . . . . . 225
11.2 Rectangles with left or right corners on the graph of y = x^2 . . . . . . . 227
11.3 Center of mass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
11.4 A cone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
11.5 Student grade distribution . . . . . . . . . . . . . . . . . . . . . . . . . 233
11.6 Cumulative grade function and the median . . . . . . . . . . . . . . . . 235
11.7 Permutations and combinations . . . . . . . . . . . . . . . . . . . . . . 237
List of Tables
1.1 Typical structure of branched airway passages in lungs. . . . . . . . . . . 18
1.2 Volume, surface area, scale factors, and other derived quantities . . . . . 22
1.3 Areas of planar regions . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4 Volumes of 3D shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.5 Surface areas of 3D shapes . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6 Useful summation formulae . . . . . . . . . . . . . . . . . . . . . . . . 26
2.1 Heights and areas of rectangular strips . . . . . . . . . . . . . . . . . . . 31
3.1 Common functions and their antiderivatives . . . . . . . . . . . . . . . . 49
5.1 Arc length calculated using spreadsheet . . . . . . . . . . . . . . . . . . 101
5.2 Alligator teeth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.1 Data from a coin-tossing experiment . . . . . . . . . . . . . . . . . . . . 137
7.2 A Bernoulli trial with n = 3 repetitions . . . . . . . . . . . . . . . . . . 141
7.3 Probability of X successes in a Bernoulli trial with n = 3 repetitions . . 141
7.4 Pascal’s triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.5 Hardy-Weinberg gene probabilities . . . . . . . . . . . . . . . . . . . . 147
7.6 Mating table for Hardy-Weinberg genetics . . . . . . . . . . . . . . . . 148
11.1 Student test scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Preface
Integral calculus arose originally to solve very practical problems that merchants,
landowners, and ordinary people faced on a daily basis. Among such pressing problems
were the following: How much should one pay for a piece of land? If that land has an
irregular shape, i.e. is not a simple geometrical shape, how should its area (and therefore,
its cost) be calculated? How much olive oil or wine are you getting when you purchase
a barrelful? Barrels come in a variety of shapes and sizes. If the barrel is not close
to cylindrical, what is its volume (and thus, a reasonable price to pay)? In most such
transactions, the need to accurately measure an area or a volume went well beyond the
available results of geometry. (It was known how to compute areas of rectangles, triangles,
and polygons. Volumes of cylinders and cubes were also known, but these were at best
crude approximations to actual shapes and objects encountered in commerce.) This need
motivated the development of the topic we now call integral calculus.
Essentially, the approach is based on the idea of “divide and conquer”: that is, cut up
the geometric shape into smaller pieces, and approximate those pieces by regular shapes
that can be quantified using simple geometry. In computing the area of an irregular shape,
add up the areas of the (approximately regular) little parts in your “dissection”, to arrive at
an approximation of the desired area of the shape. Depending on how fine the dissection
(i.e. how many little parts), this approximation could be quite crude, or fairly accurate.
The idea of applying a limit to obtain the true dimensions of the object was a flash of
inspiration that led to modern-day calculus. Similar ideas apply to computing the volume
of a 3D object by successive subdivisions.
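As a rough illustration of this "divide and conquer" idea, the following Python sketch dissects a quarter of a unit circle into thin rectangular strips and adds up their areas. The function names `strip_area` and `quarter_circle`, the choice of a unit circle, and the use of midpoint heights are assumptions made for this example only; they are not taken from the notes, which use a spreadsheet for numerical work.

```python
import math

def strip_area(f, a, b, n):
    """Approximate the area under f on [a, b] with n rectangular strips.

    Each strip has the same width, and its height is the value of f
    at the midpoint of the strip (an illustrative choice).
    """
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) * width for k in range(n))

def quarter_circle(x):
    """Height of the unit circle above the x-axis, for 0 <= x <= 1."""
    return math.sqrt(max(0.0, 1.0 - x * x))

# A finer dissection (more strips) should give a better approximation
# of the area of the full unit circle, whose true value is pi.
for n in (4, 16, 64, 256):
    approx = 4 * strip_area(quarter_circle, 0.0, 1.0, n)
    print(n, approx, abs(approx - math.pi))
```

Running the loop prints the number of strips, the approximate area, and the gap from the true value, so the effect of refining the dissection can be seen directly.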
It is the aim of a calculus course to develop the language to deal with such concepts,
to make such concepts systematic, and to find convenient and relevant shortcuts that can
be used to solve a variety of problems that have common features. More than that, it is
the purpose of this course to show that ideas developed in the original context of geometry
(finding areas or volumes of 2D or 3D shapes) can be generalized and extended to a variety
of applications that have little to do with geometry.
One area of application is that of computing total change given some time-dependent
rate of change. We encounter many cases where a process changes at a rate that varies
over time: the rate of production of hormone changes over a day, the rate of flow of water
in a river changes over the seasons, or the rate of motion of a vehicle (i.e. its velocity)
changes over its path. Computing the total change over some time span turns out to be
closely related to the same underlying concept of “divide and conquer”: namely, subdivide
(the time interval) and add up approximate changes over each of the smaller subintervals.
The same idea applies to quantities that are distributed not in time but rather over space.
We show the connection between material that is spatially distributed in a nonuniform way
(e.g. a density that varies from point to point) and total amount of material (obtained by
the same process of integration).
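The same subdivide-and-add recipe can be sketched for a rate that varies in time. In the Python example below, the velocity function v(t) = 2t, the time span from 0 to 3, and the name `total_change` are all hypothetical choices for illustration; the exact displacement for this velocity is 9, so the sums can be compared against it.

```python
def total_change(rate, t_start, t_end, n):
    """Approximate the total change over [t_start, t_end].

    The time interval is cut into n equal subintervals; on each one the
    rate is treated as constant (its value at the left endpoint), and the
    approximate changes rate * dt are added up.
    """
    dt = (t_end - t_start) / n
    return sum(rate(t_start + k * dt) * dt for k in range(n))

def velocity(t):
    """Hypothetical velocity that grows linearly with time."""
    return 2.0 * t

# More subintervals bring the sum closer to the exact displacement, 9.
for n in (10, 100, 1000):
    print(n, total_change(velocity, 0.0, 3.0, n))
```

Replacing `velocity` with a density that varies along a line, and time with position, gives the corresponding estimate of total mass.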
A theme that unites much of the approach is that integral calculus has both analytic
(i.e. pencil and paper) calculations, which apply only to a limited set of cases, and analogous
numerical (i.e. computer-enabled) calculations. The two go hand in hand, with concepts
that are closely linked. A set of computer labs using a spreadsheet tool is an important
part of this course. The importance of seeing calculus from these two distinct but related
perspectives is stressed: on the one hand, analytic computations can be very powerful and
helpful, but at the same time, many interesting problems are too challenging to be handled
by integration techniques. Here is where the same ideas, used in the context of simple
computer algorithms, come in handy. For this reason, understanding the
concepts (not just the technical results, or the “formulae” for integrals) is vital: Ideas used to
develop the analytic techniques on which calculus is based can be adapted to develop good
working methods for harnessing computer power to solve problems. This is particularly
useful in cases where the analytic methods are insufficient or too technically challenging.
This set of lecture notes grew out of many years of teaching Mathematics 103. The
material is organized as follows: In Chapter 1 we develop the basic formulae for areas and
volumes of elementary shapes, and show how to set up summations that describe compound
objects made up of many such shapes. An example to motivate these ideas is the volume
and surface area of a branching structure. In Chapter 2, we turn attention to the classic
problem of defining and computing the area of a two-dimensional region, leading to the
notion of the definite integral. In Chapter 3, we discuss the linchpin of Integral Calculus,
namely the Fundamental Theorem that connects derivatives and integrals. This allows us
to find a great shortcut to the analytic computations described in Chapter 2. Applications
of these ideas to calculating total change from rates of change, and to computing volumes
and masses are discussed in Chapters 4 and 5.
To expand our reach to other cases, we discuss the techniques of integration in Chapter 6.
Here, we find that the chain rule of calculus reappears (in the form of substitution
integrals), and a variety of miscellaneous tricks are devised to simplify integrals. Among
these, the most important is integration by parts, a technique that has independent
applications in many areas of science.
We study the ideas of probability in Chapters 7 and 8. Here we rediscover the connection
between discrete sums and continuous integration, and apply the techniques to
computing expected values for random variables. The connection between the mean (in
probability) and the center of mass (of a density distributed in space) is illustrated.
Many scientific problems are phrased in terms of rules about rates of change. Quite
often such rules take the form of differential equations. In an earlier differential calculus
course, the student will have made acquaintance with the topic of such equations and
qualitative techniques associated with interpreting their solutions. With the methods of integral
calculus in hand, we can solve some types of differential equations analytically. This is
discussed in Chapter 9.
The course concludes with the development of some notions of infinite sums and
convergence in Chapter 10. Of prime importance, the Taylor series is developed and discussed
in this concluding chapter.
Chapter 1
Areas, volumes and simple sums
1.1 Introduction
This introductory chapter has several aims. First, we collect here a number of basic
formulae for areas and volumes that are used later in developing the notions of integral
calculus. Among these are areas of simple geometric shapes and formulae for sums of
certain common sequences. An important idea is introduced, namely that we can use the
sum of areas of elementary shapes to approximate the areas of more complicated objects,
and that the approximation can be made more accurate by a process of refinement.
We show using examples how such ideas can be used in calculating the volumes or
areas of more complex objects. In particular, we conclude with a detailed exploration of
the structure of branched airways in the lung as an application of ideas in this chapter.
1.2 Areas of simple shapes
One of the main goals in this course will be calculating areas enclosed by curves in the
plane and volumes of three-dimensional shapes. We will find that the tools of calculus will
provide important and powerful techniques for meeting this goal. Some shapes are simple
enough that no elaborate techniques are needed to compute their areas (or volumes). We
brieﬂy survey some of these simple geometric shapes and list what we know or can easily
determine about their area or volume.
The areas of simple geometrical objects, such as rectangles, parallelograms, triangles,
and circles are given by elementary formulae. Indeed, our ability to compute areas and
volumes of more elaborate geometrical objects will rest on some of these simple formulae,
summarized below.
Rectangular areas
Most integration techniques discussed in this course are based on the idea of carving up
irregular shapes into rectangular strips. Thus, areas of rectangles will play an important
part in those methods.
• The area of a rectangle with base b and height h is
$$A = b \cdot h.$$
• Any parallelogram with height h and base b also has area $A = b \cdot h$. See Figure 1.1(a)
and (b).
Figure 1.1. Planar regions whose areas are given by elementary formulae. [Panels (a) to (f); labels: base b, height h, side r, angle θ.]
Areas of triangular shapes
A few illustrative examples in this chapter will be based on dissecting shapes (such as regular
polygons) into triangles. The areas of triangles are easy to compute, and we summarize
this review material below. However, triangles will play a less important role in subsequent
integration methods.
• The area of a triangle can be obtained by slicing a rectangle or parallelogram in half,
as shown in Figure 1.1(c) and (d). Thus, any triangle with base b and height h has
area
$$A = \frac{1}{2} b h.$$
• In some cases, the height of a triangle is not given, but can be determined from other
information provided. For example, if the triangle has sides of length b and r with
enclosed angle θ, as shown in Figure 1.1(e), then its height is simply $h = r \sin(\theta)$,
and its area is
$$A = \frac{1}{2} b r \sin(\theta).$$
• If the triangle is isosceles, with two sides of equal length, r, and base of length b,
as in Figure 1.1(f), then its height can be obtained from Pythagoras’s theorem, i.e.
$h^2 = r^2 - (b/2)^2$, so that the area of the triangle is
$$A = \frac{1}{2} b \sqrt{r^2 - (b/2)^2}.$$
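The three triangle formulae above can be checked numerically. The following is a minimal Python sketch; the function names are hypothetical and chosen only for illustration, and angles are assumed to be in radians.

```python
import math

def area_base_height(b, h):
    """Area of a triangle from its base b and height h: A = (1/2) b h."""
    return 0.5 * b * h

def area_two_sides_angle(b, r, theta):
    """Area from sides b and r with enclosed angle theta (radians).
    The height is h = r sin(theta), so A = (1/2) b r sin(theta)."""
    return 0.5 * b * r * math.sin(theta)

def area_isosceles(b, r):
    """Area of an isosceles triangle with base b and two equal sides r.
    Pythagoras gives h^2 = r^2 - (b/2)^2."""
    if b >= 2 * r:
        raise ValueError("the base must be shorter than 2r")
    h = math.sqrt(r**2 - (b / 2) ** 2)
    return 0.5 * b * h

# An isosceles triangle with base 6 and equal sides 5 has height 4.
print(area_isosceles(6, 5), area_base_height(6, 4))
```

All three functions reduce to the same base-times-height rule; they differ only in how the height is recovered from the given data.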
1.2.1 Example 1: Finding the area of a polygon using
triangles: a “dissection” method
Using the simple ideas reviewed so far, we can determine the areas of more complex
geometric shapes. For example, let us compute the area of a regular polygon with n equal
sides, where the length of each side is b = 1. This example illustrates how a complex shape
(the polygon) can be dissected into simpler shapes, namely triangles.[1]
Figure 1.2. An equilateral n-sided polygon with sides of unit length can be dissected
into n triangles. One of these triangles is shown at right. Since it can be further
divided into two right-angled triangles, trigonometric relations can be used to find the
height h in terms of the length of the base 1/2 and the angle θ/2.
Solution
The polygon has n sides, each of length b = 1. We dissect the polygon into n isosceles
triangles, as shown in Figure 1.2. We do not know the heights of these triangles, but the
angle θ can be found. It is
θ = 2π/n
since together, n of these identical angles make up a total of 360° or 2π radians.
[1] This calculation will be used again to find the area of a circle in Section 1.2.2. However, note that in later
chapters, our dissections of planar areas will focus mainly on rectangular pieces.
Let h stand for the height of one of the triangles in the dissected polygon. Then
trigonometric relations relate the height to the base length as follows:
$$\frac{\text{opp}}{\text{adj}} = \frac{b/2}{h} = \tan(\theta/2).$$
Using the fact that θ = 2π/n, and rearranging the above expression, we get
$$h = \frac{b}{2\tan(\pi/n)}.$$
Thus, the area of each of the n triangles is
$$A = \frac{1}{2} b h = \frac{1}{2} b \left( \frac{b}{2\tan(\pi/n)} \right).$$
The statement of the problem specifies that b = 1, so
$$A = \frac{1}{2} \left( \frac{1}{2\tan(\pi/n)} \right).$$
The area of the entire polygon is then n times this, namely
$$A_{n\text{-gon}} = \frac{n}{4\tan(\pi/n)}.$$
For example, the area of a square (a polygon with 4 equal sides, n = 4) is
$$A_{\text{square}} = \frac{4}{4\tan(\pi/4)} = \frac{1}{\tan(\pi/4)} = 1,$$
where we have used the fact that tan(π/4) = 1.
As a second example, the area of a hexagon (6-sided polygon, i.e. n = 6) is
$$A_{\text{hexagon}} = \frac{6}{4\tan(\pi/6)} = \frac{3}{2(1/\sqrt{3})} = \frac{3\sqrt{3}}{2}.$$
Here we used the fact that $\tan(\pi/6) = 1/\sqrt{3}$.
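The dissection argument of this example translates directly into a short computation. The Python sketch below is only an illustration (the function name and the optional side-length argument b are assumptions, not part of the notes); it follows the same steps: find the height of one triangle, then multiply its area by n.

```python
import math

def regular_polygon_area(n, b=1.0):
    """Area of a regular polygon with n sides, each of length b,
    obtained by dissecting it into n isosceles triangles."""
    if n < 3:
        raise ValueError("a polygon needs at least 3 sides")
    h = b / (2 * math.tan(math.pi / n))  # height of one triangle
    triangle_area = 0.5 * b * h          # A = (1/2) b h
    return n * triangle_area

print(regular_polygon_area(4))  # square with unit sides
print(regular_polygon_area(6))  # hexagon with unit sides
```

For b = 1 the function reduces to n / (4 tan(π/n)), so the square and hexagon values can be compared with the two hand calculations above.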
1.2.2 Example 2: How Archimedes discovered the area of a
circle: dissect and “take a limit”
We learn early in school the formula for the area of a circle of radius r, $A = \pi r^2$.
But how did this convenient formula come about, and how could we relate it to what we
know about simpler shapes whose areas we have discussed so far? Here we discuss how
this formula for the area of a circle was determined long ago by Archimedes using a clever
“dissection” and approximation trick. We have already seen part of this idea in dissecting
a polygon into triangles, in Section 1.2.1. Here we see a terrifically important second step
that formed the “leap of faith” on which most of calculus is based, namely taking a limit as
the number of subdivisions increases.[2]
First, we recall the deﬁnition of the constant π:
[2] This idea has important parallels with our later development of integration. Here it involves adding up the
areas of triangles, and then taking a limit as the number of triangles gets larger. Later on, we do much the same,
but using rectangles in the dissections.
Deﬁnition of π
In any circle, π is the ratio of the circumference to the diameter of the circle. (Comment:
expressed in terms of the radius, this assertion states the obvious fact that the ratio of 2πr
to 2r is π.)
Shown in Figure 1.3 is a sequence of regular polygons inscribed in the circle. As the
number of sides of the polygon increases, its area gradually becomes a better and better
approximation of the area inside the circle. Similar observations are central to integral
calculus, and we will encounter this idea often. We can compute the area of any one of
these polygons by dissecting into triangles. All triangles will be isosceles, since two sides
are radii of the circle, whose length we’ll call r.
Figure 1.3. Archimedes approximated the area of a circle by dissecting it into triangles. [Figure labels: radius r, base b, height h.]
Let r denote the radius of the circle. Suppose that at one stage we have an n-sided
polygon. (If we knew the side length of that polygon, then we would already have a formula for
its area. However, this side length is not known to us. Rather, we know that the polygon
should fit exactly inside a circle of radius r.) This polygon is made up of n triangles, each
one an isosceles triangle with two equal sides of length r and base of undetermined length
that we will denote by b. (See Figure 1.3.) The area of this triangle is
$$A_{\text{triangle}} = \frac{1}{2} b h.$$
The area of the whole polygon, $A_n$, is then
$$A_n = n \cdot (\text{area of triangle}) = n \left( \frac{1}{2} b h \right) = \frac{1}{2} (nb) h.$$
We have grouped terms so that (nb) can be recognized as the perimeter of the polygon
(i.e. the sum of the n equal sides of length b each). Now consider what happens when we
increase the number of sides of the polygon, taking larger and larger n. Then the height
of each triangle will get closer to the radius of the circle, and the perimeter of the polygon
will get closer and closer to the perimeter of the circle, which is (by definition) 2πr, i.e. as
n → ∞,
$$h \to r, \qquad (nb) \to 2\pi r,$$
so
$$A_n = \frac{1}{2} (nb) h \to \frac{1}{2} (2\pi r) r = \pi r^2.$$
We have used the notation “→” to mean that in the limit, as n gets large, the quantity of
interest “approaches” the value shown. This argument proves that the area of a circle must
be
$$A = \pi r^2.$$
One of the most important ideas contained in this little argument is that by approximating a
shape by a larger and larger number of simple pieces (in this case, a large number of
triangles), we get a better and better approximation of its area. This idea will appear again soon,
but in most of our standard calculus computations, we will use a collection of rectangles,
rather than triangles, to approximate areas of interesting regions in the plane.
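The limiting argument can be explored numerically. In the Python sketch below, the base and height of each triangle are written in terms of n using standard trigonometry for an isosceles triangle with apex angle 2π/n, namely b = 2r sin(π/n) and h = r cos(π/n). These two expressions are not derived in the text and are an assumption of this illustration, as is the function name.

```python
import math

def inscribed_polygon_area(n, r=1.0):
    """Area of a regular n-sided polygon inscribed in a circle of radius r,
    computed as (1/2) * (perimeter) * (height of one triangle)."""
    b = 2 * r * math.sin(math.pi / n)  # base of each isosceles triangle
    h = r * math.cos(math.pi / n)      # height of each triangle
    return 0.5 * (n * b) * h

# As n grows, the polygon area approaches pi * r^2.
for n in (6, 12, 96, 1000):
    print(n, inscribed_polygon_area(n), math.pi)
```

The printed areas increase with n and stay below π, which is the behaviour the argument above predicts for a unit circle.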
Areas of other shapes
We collect here the formulae for the area of a circle and of other shapes.
• The area of a circle of radius r is
$$A = \pi r^2.$$
• The surface area of a sphere of radius r is
$$S_{\text{ball}} = 4\pi r^2.$$
• The surface area of a right circular cylinder of height h and base radius r (the curved side only, not counting the two flat ends) is
$$S_{\text{cyl}} = 2\pi r h.$$
Units
The units of area can be meters$^2$ (m$^2$), centimeters$^2$ (cm$^2$), square inches, etc.
1.3 Simple volumes
Later in this course, we will also be computing the volumes of 3D shapes. As in the case
of areas, we collect below some basic formulae for volumes of elementary shapes. These
will be useful in our later discussions.
1. The volume of a cube of side length s (Figure 1.4a) is
$$V = s^3.$$
2. The volume of a rectangular box of dimensions h, w, l (Figure 1.4b) is
V = hwl.
3. The volume of a cylinder of base area A and height h, as in Figure 1.4(c), is
V = Ah.
This applies for a cylinder with ﬂat base of any shape, circular or not.
Figure 1.4. 3-dimensional shapes whose volumes are given by elementary formulae. [Panels (a) to (d); labels: s, h, w, l, A, r.]
4. In particular, the volume of a cylinder with a circular base of radius r (e.g. a disk) is
$$V = h(\pi r^2).$$
5. The volume of a sphere of radius r (Figure 1.4d) is
$$V = \frac{4}{3} \pi r^3.$$
6. The volume of a spherical shell (hollow sphere with a shell of some small thickness,
τ) is approximately
$$V \approx \tau \cdot (\text{surface area of sphere}) = 4\pi\tau r^2.$$
7. Similarly, a cylindrical shell of radius r, height h and small thickness τ has volume
given approximately by
$$V \approx \tau \cdot (\text{surface area of cylinder}) = 2\pi\tau r h.$$
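To see how good the two shell approximations are, one can compare them with exact volumes. The Python sketch below assumes that the shell occupies the region between radius r and radius r + τ (the text does not say whether the thickness is measured inward or outward), and all function names are hypothetical.

```python
import math

def sphere_shell_exact(r, tau):
    """Exact volume between concentric spheres of radius r and r + tau."""
    return (4.0 / 3.0) * math.pi * ((r + tau) ** 3 - r ** 3)

def sphere_shell_approx(r, tau):
    """Thin-shell approximation: thickness times surface area of the sphere."""
    return 4.0 * math.pi * tau * r ** 2

def cylinder_shell_exact(r, h, tau):
    """Exact volume between coaxial cylinders of radius r and r + tau, height h."""
    return math.pi * h * ((r + tau) ** 2 - r ** 2)

def cylinder_shell_approx(r, h, tau):
    """Thin-shell approximation: thickness times curved surface area."""
    return 2.0 * math.pi * tau * r * h

# The approximations improve as the thickness tau shrinks relative to r.
for tau in (0.1, 0.01, 0.001):
    print(tau, sphere_shell_approx(1.0, tau) / sphere_shell_exact(1.0, tau))
```

The ratio printed in the loop moves toward 1 as τ decreases, which is why the formulae are stated only for small thickness.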
Units
The units of volume are meters$^3$ (m$^3$), centimeters$^3$ (cm$^3$), cubic inches, etc.
1.3.1 Example 3: The Tower of Hanoi: a tower of disks
In this example, we consider how elementary shapes discussed above can be used to
determine volumes of more complex objects. The Tower of Hanoi is a shape consisting of a
number of stacked disks. It is a simple calculation to add up the volumes of these disks, but
if the tower is large, and composed of many disks, we would want some shortcut to avoid
long sums.[3]
Figure 1.5. Computing the volume of a set of disks. (This structure is sometimes
called the tower of Hanoi after a mathematical puzzle by the same name.)
(a) Compute the volume of a tower made up of four disks stacked up one on top of
the other, as shown in Figure 1.5. Assume that the radii of the disks are 1, 2, 3, 4 units and
that each disk has height 1.
(b) Compute the volume of a tower made up of 100 such stacked disks, with radii
r = 1, 2, . . . , 99, 100.
Solution
(a) The volume of the four-disk tower is calculated as follows:
$$V = V_1 + V_2 + V_3 + V_4,$$
where $V_i$ is the volume of the i’th disk whose radius is r = i, i = 1, 2, . . . , 4. The height of
each disk is h = 1, so
$$V = (\pi 1^2) + (\pi 2^2) + (\pi 3^2) + (\pi 4^2) = \pi(1 + 4 + 9 + 16) = 30\pi.$$
(b) The idea will be the same, but we have to calculate
$$V = \pi(1^2 + 2^2 + 3^2 + \ldots + 99^2 + 100^2).$$
It would be tedious to do this by adding up individual terms, and it is also cumbersome
to write down the long list of terms that we will need to add up. This motivates inventing
some helpful notation, and ﬁnding some clever way of performing such calculations.
[3] Note that the idea of computing a volume of a radially symmetric 3D shape by dissection into disks will form
one of the main themes in Chapter 5. Here, the sum of the volumes of the disks is exactly the same as the volume of
the tower. Later on, the disks will only approximate the true 3D volume, and a limit will be needed to arrive at a
“true volume”.
1.4 Summations and the “Sigma” notation
We introduce the following notation for the operation of summing a list of numbers:
$$S = a_1 + a_2 + a_3 + \ldots + a_N \equiv \sum_{k=1}^{N} a_k.$$
The Greek symbol Σ (“Sigma”) indicates summation. The symbol k used here is
called the “index of summation” and it keeps track of where we are in the list of summands.
The notation k = 1 that appears underneath Σ indicates where the sum begins (i.e. which
term starts off the series), and the superscript N tells us where it ends. We will be interested
in getting used to this notation, as well as in actually computing the value of the desired
sum using a variety of shortcuts.
Example 4a: Summation notation
Suppose we want to form the sum of ten numbers, each equal to 1. We would write this as
$$S = 1 + 1 + 1 + \ldots + 1 \equiv \sum_{k=1}^{10} 1.$$
The notation . . . signifies that we have left out some of the terms (out of laziness, or in cases
where there are too many to conveniently write down). We could just as well have written
the sum with another symbol (e.g. n) as the index, i.e. the same operation is implied by
$$\sum_{n=1}^{10} 1.$$
To compute the value of the sum we use the elementary fact that the sum of ten ones is just
10, so
$$S = \sum_{k=1}^{10} 1 = 10.$$
Example 4b: Sum of squares
Expand and sum the following:
$$S = \sum_{k=1}^{4} k^2.$$
Solution
$$S = \sum_{k=1}^{4} k^2 = 1 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30.$$
(We have already seen this sum in part (a) of The Tower of Hanoi.)
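The “Sigma” notation corresponds closely to a loop that adds up terms. As a small illustration, the Python sketch below defines a hypothetical helper sigma(term, start, stop), which is not part of the notes, and uses it to reproduce Examples 4a and 4b.

```python
def sigma(term, start, stop):
    """Sum term(k) for k = start, start + 1, ..., stop (both ends included),
    mirroring the notation with k = start below Sigma and stop above it."""
    return sum(term(k) for k in range(start, stop + 1))

ten_ones = sigma(lambda k: 1, 1, 10)        # Example 4a
sum_squares = sigma(lambda k: k ** 2, 1, 4)  # Example 4b

print(ten_ones, sum_squares)  # prints: 10 30
```

Note that Python's range excludes its upper limit, so the helper adds 1 to stop in order to include the last term, as the Sigma notation does.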
Example 4c: Common factors
Add up the following list of 100 numbers (only a few of them are shown):
S = 3 + 3 + 3 + 3 +. . . + 3.
Solution
There are 100 terms, all equal, so we can take out a common factor:
$$S = 3 + 3 + 3 + 3 + \ldots + 3 = \sum_{k=1}^{100} 3 = 3 \sum_{k=1}^{100} 1 = 3(100) = 300.$$
Example 4d: Finding the pattern
Write the following terms in summation notation:
$$S = \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \frac{1}{81}.$$
Solution
We recognize that there is a pattern in the sequence of terms, namely, each one is 1/3 raised
to an increasing integer power, i.e.
$$S = \frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \frac{1}{3^4}.$$
We can represent this with the “Sigma” notation as follows:
$$S = \sum_{n=1}^{4} \left( \frac{1}{3} \right)^n.$$
The “index” n starts at 1, and counts up through 2, 3, and 4, while each term has the form
$(1/3)^n$. This series is a geometric series, to be explored shortly. In most cases, a standard
geometric series starts off with the value 1. We can easily modify our notation to include
additional terms, for example:
$$S = \sum_{n=0}^{5} \left( \frac{1}{3} \right)^n = 1 + \frac{1}{3} + \frac{1}{3^2} + \frac{1}{3^3} + \frac{1}{3^4} + \frac{1}{3^5}.$$
Learning how to compute the sum of such terms will be important to us, and will be
described later on in this chapter.
1.4.1 Manipulations of sums
Since addition is commutative and associative, and multiplication distributes over addition,
sums of lists of numbers satisfy many convenient properties. We give a few examples below:
Example 5a: Simple operations
Simplify the following expression:
$$\sum_{k=1}^{10} 2^k - \sum_{k=3}^{10} 2^k.$$
Solution
$$\sum_{k=1}^{10} 2^k - \sum_{k=3}^{10} 2^k = (2 + 2^2 + 2^3 + \cdots + 2^{10}) - (2^3 + \cdots + 2^{10}) = 2 + 2^2.$$
We could have arrived at this conclusion directly from
$$\sum_{k=1}^{10} 2^k - \sum_{k=3}^{10} 2^k = \sum_{k=1}^{2} 2^k = 2 + 2^2 = 2 + 4 = 6.$$
The idea is that all but the first two terms in the first sum will cancel. The only remaining
terms are those corresponding to k = 1 and k = 2.
Example 5b: Expanding
Expand the following expression:
$$\sum_{n=0}^{5} (1 + 3^n).$$
Solution
$$\sum_{n=0}^{5} (1 + 3^n) = \sum_{n=0}^{5} 1 + \sum_{n=0}^{5} 3^n.$$
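Both manipulations in Examples 5a and 5b can be confirmed by direct computation. The short Python sketch below is only a check; the variable names are chosen for illustration.

```python
# Example 5a: terms with k = 3, ..., 10 cancel, leaving k = 1 and k = 2.
difference = sum(2 ** k for k in range(1, 11)) - sum(2 ** k for k in range(3, 11))
remaining = sum(2 ** k for k in range(1, 3))

# Example 5b: a sum of (1 + 3^n) splits into two separate sums.
combined = sum(1 + 3 ** n for n in range(0, 6))
split = sum(1 for n in range(0, 6)) + sum(3 ** n for n in range(0, 6))

print(difference, remaining)  # both equal 6
print(combined, split)        # both equal 370
```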
1.5 Summation formulas
In this section we introduce a few examples of useful sums and give formulae that provide
a shortcut to dreary calculations.
The sum of consecutive integers (Gauss’ formula)
We ﬁrst show that the sum of the ﬁrst N integers is:
$$S = 1 + 2 + 3 + \ldots + N = \sum_{k=1}^{N} k = \frac{N(N + 1)}{2}. \tag{1.1}$$
The following trick is due to Gauss. By aligning two copies of the above sum, one
written backwards, we can easily add them up one by one vertically. We see that:
S = 1 + 2 + . . . + (N −1) + N
+
S = N + (N −1) + . . . + 2 + 1
2S = (1 + N) + (1 +N) + . . . + (1 +N) + (1 +N)
Thus, the value (N + 1) appears N times on the right-hand side, so that
$$2S = N(1 + N), \quad \text{so} \quad S = \frac{N(1 + N)}{2}.$$
Thus, Gauss’ formula is conﬁrmed.
Example: Adding up the ﬁrst 1000 integers
Suppose we want to add up the ﬁrst 1000 integers. This formula is very useful in what
would otherwise be a huge calculation. We ﬁnd that
$$S = 1 + 2 + 3 + \ldots + 1000 = \sum_{k=1}^{1000} k = \frac{1000(1 + 1000)}{2} = 500(1001) = 500500.$$
Two other useful formulae are those for the sums of consecutive squares and of
consecutive cubes:
The sum of the first N consecutive square integers
$$S_2 = 1^2 + 2^2 + 3^2 + \ldots + N^2 = \sum_{k=1}^{N} k^2 = \frac{N(N + 1)(2N + 1)}{6}. \tag{1.2}$$
The sum of the first N consecutive cube integers
$$S_3 = 1^3 + 2^3 + 3^3 + \ldots + N^3 = \sum_{k=1}^{N} k^3 = \left( \frac{N(N + 1)}{2} \right)^2. \tag{1.3}$$
In the Appendix, we show how the formula for the sum of square integers can be
proved by a technique called mathematical induction.
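A numerical check is no substitute for a proof by induction, but it is a quick way to catch a misremembered formula. The Python sketch below compares formulae (1.1), (1.2) and (1.3) with brute-force sums for N = 1, ..., 50; the function names are hypothetical.

```python
def sum_integers(N):
    """Formula (1.1): 1 + 2 + ... + N."""
    return N * (N + 1) // 2

def sum_squares(N):
    """Formula (1.2): 1^2 + 2^2 + ... + N^2."""
    return N * (N + 1) * (2 * N + 1) // 6

def sum_cubes(N):
    """Formula (1.3): 1^3 + 2^3 + ... + N^3."""
    return (N * (N + 1) // 2) ** 2

# Compare each formula with a direct term-by-term sum.
for N in range(1, 51):
    assert sum_integers(N) == sum(k for k in range(1, N + 1))
    assert sum_squares(N) == sum(k ** 2 for k in range(1, N + 1))
    assert sum_cubes(N) == sum(k ** 3 for k in range(1, N + 1))

print(sum_integers(1000))  # prints: 500500
```

Integer division (//) is safe here because each numerator is always divisible by its denominator.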
1.5.1 Example 3, revisited: Volume of a Tower of Hanoi
Armed with the formula for the sum of squares, we can now return to the problem of
computing the volume of a tower of 100 stacked disks of height 1 and radii r = 1, 2, . . . , 99, 100.
We have
$$V = \pi(1^2 + 2^2 + 3^2 + \ldots + 99^2 + 100^2) = \pi \sum_{k=1}^{100} k^2 = \pi \frac{100(101)(201)}{6} = 338{,}350\pi \text{ cubic units}.$$
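The same volume can be obtained by adding up the disks one at a time, which is what a spreadsheet or a short program would do. The Python sketch below uses a hypothetical function tower_volume and assumes, as in the example, disks of radii 1, 2, ..., num_disks and equal height.

```python
import math

def tower_volume(num_disks, height=1.0):
    """Volume of a stack of disks with radii 1, 2, ..., num_disks,
    each a cylinder of volume pi * r^2 * height."""
    return sum(math.pi * r ** 2 * height for r in range(1, num_disks + 1))

print(tower_volume(4) / math.pi)    # close to 30
print(tower_volume(100) / math.pi)  # close to 338350
```

Dividing by π makes the result easy to compare with the values 30π and 338,350π found by hand.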
Examples: Evaluating the sums
Compute the following two sums:
$$\text{(a)} \ S_a = \sum_{k=1}^{20} (2 - 3k + 2k^2), \qquad \text{(b)} \ S_b = \sum_{k=10}^{50} k.$$
Solutions
(a) We can separate this into three individual sums, each of which can be handled by
algebraic simplification and/or use of the summation formulae developed so far.
$$S_a = \sum_{k=1}^{20} (2 - 3k + 2k^2) = 2 \sum_{k=1}^{20} 1 - 3 \sum_{k=1}^{20} k + 2 \sum_{k=1}^{20} k^2.$$
Thus, we get
$$S_a = 2(20) - 3 \left( \frac{20(21)}{2} \right) + 2 \left( \frac{(20)(21)(41)}{6} \right) = 5150.$$
(b) We can express the second sum as a difference of two sums:
$$S_b = \sum_{k=10}^{50} k = \left( \sum_{k=1}^{50} k \right) - \left( \sum_{k=1}^{9} k \right).$$
Thus
$$S_b = \frac{50(51)}{2} - \frac{9(10)}{2} = 1275 - 45 = 1230.$$
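As a check on the arithmetic, both sums can be evaluated term by term and compared with the values obtained from the summation formulae. The variable names in this Python sketch are chosen only for illustration.

```python
# (a) direct sum versus the three separate formula-based sums
S_a_direct = sum(2 - 3 * k + 2 * k ** 2 for k in range(1, 21))
S_a_formula = 2 * 20 - 3 * (20 * 21 // 2) + 2 * (20 * 21 * 41 // 6)

# (b) direct sum versus a difference of two Gauss sums
S_b_direct = sum(range(10, 51))
S_b_formula = 50 * 51 // 2 - 9 * 10 // 2

print(S_a_direct, S_a_formula)  # both equal 5150
print(S_b_direct, S_b_formula)  # both equal 1230
```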
1.6 Summing the geometric series
Consider a sum of terms that all have the form $r^k$, where r is some real number and k is
an integer power. We refer to a series of this type as a geometric series. We have already
seen one example of this type in a previous section. Below we will show that the sum of
such a series is given by:
$$S_N = 1 + r + r^2 + r^3 + \ldots + r^N = \sum_{k=0}^{N} r^k = \frac{1 - r^{N+1}}{1 - r} \tag{1.4}$$
where r ≠ 1. We call this sum a (finite) geometric series. We would like to find
an expression for sums of this form in the general case of any real number r, and finite
number of terms N. First we note that there are N + 1 terms in this sum, so that if r = 1
then
$$S_N = 1 + 1 + 1 + \ldots + 1 = N + 1$$
(a total of N + 1 ones added). If r ≠ 1 we have the following trick:
$$\begin{array}{rcl} S &=& 1 + r + r^2 + \ldots + r^N \\ rS &=& r + r^2 + \ldots + r^N + r^{N+1} \end{array}$$
Subtracting leads to
$$S - rS = (1 + r + r^2 + \ldots + r^N) - (r + r^2 + \ldots + r^N + r^{N+1}).$$
Most of the terms on the right hand side cancel, leaving
$$S(1 - r) = 1 - r^{N+1}.$$
Now dividing both sides by 1 − r leads to
$$S = \frac{1 - r^{N+1}}{1 - r},$$
which was the formula to be established.
Example: Geometric series
Compute the following sum:
$$S_c = \sum_{k=0}^{10} 2^k.$$
Solution
This is a geometric series
$$S_c = \sum_{k=0}^{10} 2^k = \frac{1 - 2^{10+1}}{1 - 2} = \frac{1 - 2048}{-1} = 2047.$$
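Formula (1.4), together with the special case r = 1, can be written as a small function and compared with a term-by-term sum. This Python sketch is an illustration only; the name geometric_sum is hypothetical.

```python
def geometric_sum(r, N):
    """Sum of 1 + r + r^2 + ... + r^N using formula (1.4).
    The case r = 1 is handled separately, since (1.4) requires r != 1."""
    if r == 1:
        return N + 1
    return (1 - r ** (N + 1)) / (1 - r)

# Compare the formula with a direct sum for the example above (r = 2, N = 10).
direct = sum(2 ** k for k in range(0, 11))
print(geometric_sum(2, 10), direct)
```

Treating r = 1 as a separate branch avoids a division by zero, mirroring the way the derivation above sets that case aside.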
1.7 Prelude to inﬁnite series
So far, we have looked at several examples of finite series, i.e. series in which there are
only a finite number of terms, N (where N is some integer). We would like to investigate
how the sum of a series behaves when more and more terms of the series are included. It
is evident that in many cases, such as Gauss’s series (1.1), or sums of squared or cubed
integers (e.g., Eqs. (1.2) and (1.3)), the sum simply gets larger and larger as more terms
are included. We say that such series diverge as N → ∞. Here we will look specifically
for series that converge, i.e. have a finite sum, even as more and more terms are included.[4]
Let us focus again on the geometric series and determine its behaviour when the
number of terms is increased. Our goal is to find a way of attaching a meaning to the
expression
$$S = \sum_{k=0}^{\infty} r^k,$$
when the series becomes an infinite series. We will use the following definition:
[4] Convergence and divergence of series is discussed in fuller depth in Chapter 10 in the context of Taylor Series.
However, these concepts are so important that it was felt necessary to introduce some preliminary ideas early in
the term.
1.7.1 The inﬁnite geometric series
Deﬁnition
An inﬁnite series that has a ﬁnite sum is said to be convergent. Otherwise it is divergent.
Deﬁnition
Suppose that S is an (infinite) series whose terms are $a_k$. Then the partial sums, $S_n$, of this
series are
$$S_n = \sum_{k=0}^{n} a_k.$$
We say that the sum of the infinite series is S, and write
$$S = \sum_{k=0}^{\infty} a_k,$$
provided that
$$S = \lim_{n \to \infty} \sum_{k=0}^{n} a_k.$$
That is, we consider the inﬁnite series as the limit of the partial sums as the number of
terms n is increased. In this case we also say that the inﬁnite series converges to S.
We will see that only under certain circumstances will inﬁnite series have a ﬁnite
sum, and we will be interested in exploring two questions:
1. Under what circumstances does an infinite series have a finite sum?
2. What value does the partial sum approach as more and more terms are included?
In the case of a geometric series, the sum of the series, (1.4), depends on the number
of terms in the series, n, via $r^{n+1}$. Whenever r > 1, or r < −1, this term will get bigger in
magnitude as n increases, whereas, for 0 < |r| < 1, this term decreases in magnitude with
n. We can say that
$$\lim_{n \to \infty} r^{n+1} = 0 \quad \text{provided } |r| < 1.$$
These observations are illustrated by two specific examples below. This leads to the
following conclusion:
The sum of an infinite geometric series,
$$S = 1 + r + r^2 + \ldots + r^k + \ldots = \sum_{k=0}^{\infty} r^k,$$
exists provided |r| < 1 and is
$$S = \frac{1}{1 - r}. \tag{1.5}$$
Examples of convergent and divergent geometric series are discussed below.
1.7.2 Example: A geometric series that converges.
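Consider the geometric series with $r = \frac{1}{2}$, i.e.
$$S_n = 1 + \frac{1}{2} + \left( \frac{1}{2} \right)^2 + \left( \frac{1}{2} \right)^3 + \ldots + \left( \frac{1}{2} \right)^n = \sum_{k=0}^{n} \left( \frac{1}{2} \right)^k.$$
Then
$$S_n = \frac{1 - (1/2)^{n+1}}{1 - (1/2)}.$$
We observe that as n increases, i.e. as we retain more and more terms, we obtain
$$\lim_{n \to \infty} S_n = \lim_{n \to \infty} \frac{1 - (1/2)^{n+1}}{1 - (1/2)} = \frac{1}{1 - (1/2)} = 2.$$
In this case, we write
$$\sum_{n=0}^{\infty} \left( \frac{1}{2} \right)^n = 1 + \frac{1}{2} + \left( \frac{1}{2} \right)^2 + \ldots = 2$$
and we say that “the (infinite) series converges to 2”.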
1.7.3 Example: A geometric series that diverges
In contrast, we now investigate the case that r = 2: then the partial sums are
$$S_n = 1 + 2 + 2^2 + 2^3 + \ldots + 2^n = \sum_{k=0}^{n} 2^k = \frac{1 - 2^{n+1}}{1 - 2} = 2^{n+1} - 1.$$
We observe that as n grows larger, the sum continues to grow indefinitely. In this case, we
say that the sum does not converge, or, equivalently, that the sum diverges.
It is important to remember that an inﬁnite series, i.e. a sum with inﬁnitely many
terms added up, can exhibit either one of these two very different behaviours. It may
converge in some cases, as the ﬁrst example shows, or diverge (fail to converge) in other
cases. We will see examples of each of these trends again. It is essential to be able to
distinguish the two. Divergent series (or series that diverge under certain conditions) must
be handled with particular care, for otherwise, we may ﬁnd contradictions or seemingly
reasonable calculations that have meaningless results.
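The contrast between the two examples is easy to see by tabulating partial sums. The Python sketch below (with a hypothetical helper partial_sums) lists $S_0, S_1, \ldots, S_n$ for a given ratio r; a few values are enough to see one sequence level off and the other grow without bound. A table of numbers only suggests the behaviour; the limit argument above is what establishes it.

```python
def partial_sums(r, n_max):
    """Return the list [S_0, S_1, ..., S_n_max] of partial sums
    of the geometric series 1 + r + r^2 + ..."""
    sums = []
    total = 0.0
    term = 1.0  # r^0
    for _ in range(n_max + 1):
        total += term
        sums.append(total)
        term *= r  # next power of r
    return sums

print(partial_sums(0.5, 10))  # approaches 2
print(partial_sums(2, 10))    # grows like 2^(n+1) - 1
```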
1.8 Application of geometric series to the branching
structure of the lungs
In this section, we will compute the volume and surface area of the branched airways of
lungs.[5] We use the summation formulae to arrive at the results, and we also illustrate how
the same calculation could be handled using a simple spreadsheet.
[5] This section provides an example of how to set up a biologically relevant calculation based on geometric
series. It is further studied in the homework problems. A similar example is given as an exercise for the student
in Lab 1 of this calculus course.
Our lungs pack an amazingly large surface area into a confined volume. Most of
the oxygen exchange takes place in tiny sacs called alveoli at the terminal branches of the
airway passages. The bronchial tubes conduct air, and distribute it to the many smaller
and smaller tubes that eventually lead to those alveoli. The principle of this efficient organ
for oxygen exchange is that these very many small structures present a very large surface
area. Oxygen from the air can diffuse across this area into the bloodstream very efficiently.
The lungs, and many other biological “distribution systems”, are composed of a
branched structure. The initial segment is quite large. It bifurcates into smaller segments,
which then bifurcate further, and so on, resulting in a geometric expansion in the number of
branches, their collective volume, length, etc. In this section, we apply geometric series to
explore this branched structure of the lung. We will construct a simple mathematical model
and explore its consequences. The model will consist of some well-formulated assumptions
about the way that “daughter branches” are related to their “parent branch”. Based on these
assumptions, and on tools developed in this chapter, we will then predict properties of the
structure as a whole. We will be particularly interested in the volume V and the surface
area S of the airway passages in the lungs.[6]
Figure 1.6. Air passages in the lungs consist of a branched structure. The index
n refers to the branch generation, starting from the initial segment, labeled 0. All segments
are assumed to be cylindrical, with radius $r_n$ and length $\ell_n$ in the n’th generation.
[Figure labels: Segment 0 with radius $r_0$ and length $\ell_0$; generations 1 and 2.]
1.8.1 Assumptions
• The airway passages consist of many “generations” of branched segments. We label
the largest segment with index “0”, and its daughter segments with index “1”, their
successive daughters “2”, and so on down the structure from large to small branch
segments. We assume that there are M “generations”, i.e. the initial segment has
undergone M subdivisions. Figure 1.6 shows only generations 0, 1, and 2. (Typically,
for human lungs there can be up to 25-30 generations of branching.)
• At each generation, every segment is approximated as a cylinder of radius $r_n$ and
length $\ell_n$.
[6] The surface area of the bronchial tubes does not actually absorb much oxygen in humans. However, as an
example of summation, we will compute this area and compare how it grows to the growth of the volume from
one branching layer to the next.
Quantity                                  Symbol    Value
radius of first segment                   r_0       0.5 cm
length of first segment                   ℓ_0       5.6 cm
ratio of daughter to parent length        α         0.9
ratio of daughter to parent radius        β         0.86
number of branch generations              M         30
average number of daughters per parent    b         1.7
Table 1.1. Typical structure of branched airway passages in lungs.
• The number of branches grows along the “tree”. On average, each parent branch
produces b daughter branches. In Figure 1.6, we have illustrated this idea for b = 2.
A branched structure in which each branch produces two daughter branches is
described as a bifurcating tree structure (whereas trifurcating implies b = 3). In real
lungs, the branching is slightly irregular. Not every level of the structure bifurcates,
but in general, averaging over the many branches in the structure, b is smaller than 2.
In fact, the rule that links the number of branches in generation n, here denoted $x_n$,
with the number (of smaller branches) in the next generation, $x_{n+1}$, is
$$x_{n+1} = b x_n. \tag{1.6}$$
We will assume, for simplicity, that b is a constant. Since the number of branches
is growing down the length of the structure, it must be true that b > 1. For human
lungs, on average, 1 < b < 2. Here we will take b to be constant, i.e. b = 1.7. In
actual fact, this simplification cannot be precise, because we have just one segment
initially ($x_0 = 1$), and at level 1, the number of branches $x_1$ should be some small
integer, not a number like “1.7”. However, as in many mathematical models, some
accuracy is sacrificed to get intuition. Later on, details that were missed and are
considered important can be corrected and refined.
• The ratios of radii and lengths of daughters to parents are approximated by
“proportional scaling”. This means that the radii and lengths satisfy
simple rules: The lengths are related by
$$\ell_{n+1} = \alpha \ell_n, \tag{1.7}$$
and the radii are related by
$$r_{n+1} = \beta r_n, \tag{1.8}$$
with α and β positive constants. For example, it could be the case that the radius of
daughter branches is 1/2 or 2/3 that of the parent branch. Since the branches decrease
in size (while their number grows), we expect that 0 < α < 1 and 0 < β < 1.
Rules such as those given by equations (1.7) and (1.8) are often called self-similar growth
laws. Such concepts are closely linked to the idea of fractals, i.e. theoretical structures
produced by iterating such growth laws indefinitely. In a real biological structure, the
number of generations is finite. (However, in some cases, a finite geometric series is well
approximated by an infinite sum.)
Actual lungs are not fully symmetric branching structures, but the above
approximations are used here for simplicity. According to physiological measurements, the scale
factors for sizes of daughter to parent size are in the range 0.65 ≤ α, β ≤ 0.9. (K. G.
Horsfield, G. Dart, D. E. Olson, and G. Cumming, (1971) J. Appl. Phys. 31, 207-217.) For
the purposes of this example, we will use the values of constants given in Table 1.1.
1.8.2 A simple geometric rule
The three equations that govern the rules for successive branching, i.e. equations (1.6), (1.7),
and (1.8), are examples of a very generic “geometric progression” recipe. Before returning
to the problem at hand, let us examine the implications of this recursive rule, when it is
applied to generating the whole structure. Essentially, we will see that the rule linking two
generations implies an exponential growth. To see this, let us write out the first few terms in
the progression of the sequence $\{x_n\}$:

initial value: $x_0$
first iteration: $x_1 = b x_0$
second iteration: $x_2 = b x_1 = b(b x_0) = b^2 x_0$
third iteration: $x_3 = b x_2 = b(b^2 x_0) = b^3 x_0$
...

By the same pattern, at the n’th generation, the number of segments will be

n’th iteration: $x_n = b x_{n-1} = b(b x_{n-2}) = b(b(b x_{n-3})) = \dots = \underbrace{(b \cdot b \cdots b)}_{n \text{ factors}} x_0 = b^n x_0.$
We have arrived at a simple, but important result, namely:
The rule linking two generations,
$$x_n = b x_{n-1} \tag{1.9}$$
implies that the n’th generation will have grown by a factor $b^n$, i.e.,
$$x_n = b^n x_0. \tag{1.10}$$
This connection between the rule linking two generations and the resulting number of
members at each generation is useful in other circumstances. Equation (1.9) is sometimes
called a recursion relation, and its solution is given by equation (1.10). We will use the
same idea to ﬁnd the connection between the volumes, and surface areas of successive
segments in the branching structure.
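The link between the recursion (1.9) and its solution (1.10) can be checked numerically. The following is a minimal sketch; the function names `iterate_generations` and `closed_form` are illustrative choices, not part of the original notes.

```python
def iterate_generations(x0, b, n):
    """Apply the rule x_n = b * x_{n-1} exactly n times, starting from x0."""
    x = x0
    for _ in range(n):
        x = b * x
    return x


def closed_form(x0, b, n):
    """Solution (1.10) of the recursion: x_n = b**n * x0."""
    return b**n * x0


# Compare the two for a bifurcating tree (b = 2) starting from one segment.
for n in range(6):
    print(n, iterate_generations(1, 2, n), closed_form(1, 2, n))
```

Both columns should agree at every generation, which is the content of equation (1.10).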
1.8.3 Total number of segments
We use the result of Section 1.8.2 and the fact that there is one segment in the 0’th
generation, i.e. $x_0 = 1$, to conclude that at the n’th generation, the number of segments is
$$x_n = x_0 b^n = 1 \cdot b^n = b^n.$$
For example, if b = 2, the number of segments grows by powers of 2, so that the tree
bifurcates with the pattern 1, 2, 4, 8, etc.
To determine how many branch segments there are in total, we add up over all
generations, $0, 1, \dots, M$. This is a geometric series, whose sum we can compute. Using
equation (1.4), we find
$$N = \sum_{n=0}^{M} b^n = \frac{1 - b^{M+1}}{1 - b}.$$
Given b and M, we can then predict the exact number of segments in the structure. The
calculation is summarized further on for values of the branching parameter, b, and the
number of branch generations, M, given in Table 1.1.
1.8.4 Total volume of airways in the lung
Since each lung segment is assumed to be cylindrical, its volume is
$$v_n = \pi r_n^2 \ell_n.$$
Here we mean just a single segment in the n’th generation of branches. (There are $b^n$ such
identical segments in the n’th generation, and we will refer to the volume of all of them
together as $V_n$ below.)
The length and radius of segments also follow a geometric progression. In fact, the
same idea developed above can be used to relate the length and radius of a segment in the
n’th generation to the length and radius of the original 0’th generation segment,
namely,
$$\ell_n = \alpha \ell_{n-1} \quad\Rightarrow\quad \ell_n = \alpha^n \ell_0,$$
and
$$r_n = \beta r_{n-1} \quad\Rightarrow\quad r_n = \beta^n r_0.$$
Thus the volume of one segment in generation n is
$$v_n = \pi r_n^2 \ell_n = \pi (\beta^n r_0)^2 (\alpha^n \ell_0) = (\alpha\beta^2)^n \underbrace{(\pi r_0^2 \ell_0)}_{v_0}.$$
This is just a product of the initial segment volume $v_0 = \pi r_0^2 \ell_0$ with the n’th power of a
certain factor that depends on α and β. (That factor takes into account that both the radius and the length are
being scaled down at every successive generation of branching.) Thus
$$v_n = (\alpha\beta^2)^n v_0.$$
The total volume of all ($b^n$) segments in the n’th layer is
$$V_n = b^n v_n = b^n (\alpha\beta^2)^n v_0 = (\underbrace{b\alpha\beta^2}_{a})^n v_0.$$
Here we have grouped terms together to reveal the simple structure of the relationship:
one part of the expression is just the initial segment volume, while the other is now a
“scale factor” that includes not only changes in length and radius, but also in the number of
branches. Letting the constant a stand for that scale factor, $a = b\alpha\beta^2$, leads to the result
that the volume of all segments in the n’th layer is
$$V_n = a^n v_0.$$
The total volume of the structure is obtained by summing the volumes obtained at
each layer. Since this is a geometric series, we can use the summation formula, i.e.,
equation (1.4). Accordingly, the total airways volume is
$$V = \sum_{n=0}^{M} V_n = v_0 \sum_{n=0}^{M} a^n = v_0 \frac{1 - a^{M+1}}{1 - a},$$
with $M = 30$ generations in our example.
The similarity of treatment with the previous calculation of the number of branches is apparent.
We compute the value of the constant a in Table 1.2, and find the total volume in
Section 1.8.6.
1.8.5 Total surface area of the lung branches
The surface area of a single segment at generation n, based on its cylindrical shape, is
$$s_n = 2\pi r_n \ell_n = 2\pi (\beta^n r_0)(\alpha^n \ell_0) = (\alpha\beta)^n \underbrace{(2\pi r_0 \ell_0)}_{s_0},$$
where $s_0$ is the surface area of the initial segment. Since there are $b^n$ branches at generation
n, the total surface area of all the n’th generation branches is thus
$$S_n = b^n (\alpha\beta)^n s_0 = (\underbrace{b\alpha\beta}_{c})^n s_0,$$
where we have let c stand for the scale factor $c = b\alpha\beta$. Thus,
$$S_n = c^n s_0.$$
This reveals the similar nature of the problem. To find the total surface area of the airways,
we sum up,
$$S = s_0 \sum_{n=0}^{M} c^n = s_0 \frac{1 - c^{M+1}}{1 - c}.$$
We compute the values of $s_0$ and c in Table 1.2, and summarize final calculations of the
total airways surface area in Section 1.8.6.
Quantity                                                  Symbol                        Value
volume of first segment                                   $v_0 = \pi r_0^2 \ell_0$      4.4 cm^3
surface area of first segment                             $s_0 = 2\pi r_0 \ell_0$       17.6 cm^2
ratio of daughter to parent segment volume                $\alpha\beta^2$               0.66564
ratio of daughter to parent segment surface area          $\alpha\beta$                 0.774
ratio of net volumes in successive generations            $a = b\alpha\beta^2$          1.131588
ratio of net surface areas in successive generations      $c = b\alpha\beta$            1.3158

Table 1.2. Volume, surface area, scale factors, and other derived quantities. Because
a and c are bases that will be raised to large powers, it is important that their
values are fairly accurate, so we keep more significant figures.
1.8.6 Summary of predictions for speciﬁc parameter values
By setting up the model in the above way, we have revealed that each quantity in the
structure obeys a simple geometric series, but with distinct “bases” b, a and c and coefficients
1, $v_0$, and $s_0$. This approach has shown that the formula for geometric series applies in
each case. Now it remains to merely “plug in” the appropriate quantities. In this section,
we collect our results, use the sample values for a model “human lung” given in Table 1.1,
or the resulting derived scale factors and quantities in Table 1.2 to finish the task at hand.
Total number of segments
$$N = \sum_{n=0}^{M} b^n = \frac{1 - b^{M+1}}{1 - b} = \frac{1 - (1.7)^{31}}{1 - 1.7} = 1.9898 \cdot 10^7 \approx 2 \cdot 10^7.$$
According to this calculation, there are a total of about 20 million branch segments overall
(including all layers, from top to bottom) in the entire structure!
Total volume of airways
Using the values for a and $v_0$ computed in Table 1.2, we find that the total volume of all
segments, summed over all generations, is
$$V = v_0 \sum_{n=0}^{M} a^n = v_0 \frac{1 - a^{M+1}}{1 - a} = 4.4 \, \frac{(1 - 1.131588^{31})}{(1 - 1.131588)} = 1510.3 \text{ cm}^3.$$
Recall that 1 litre = 1000 cm^3. Then we have found that the lung airways contain about 1.5
litres.
Total surface area of airways
Using the values of $s_0$ and c in Table 1.2, the total surface area of the tubes that make up
the airways is
$$S = s_0 \sum_{n=0}^{M} c^n = s_0 \frac{1 - c^{M+1}}{1 - c} = 17.6 \, \frac{(1 - 1.3158^{31})}{(1 - 1.3158)} = 2.76 \cdot 10^5 \text{ cm}^2.$$
There are 100 cm per meter, and $(100)^2 = 10^4$ cm^2 per m^2. Thus, the area we have
computed is equivalent to about 28 square meters!
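The three predictions of this section all use the same geometric sum formula, so they can be reproduced with one small helper. The sketch below takes b, a, c, $v_0$, $s_0$ from the text and Table 1.2 and uses M = 30; the function name `geometric_sum` and the variable names are illustrative assumptions.

```python
def geometric_sum(r, M):
    """Return 1 + r + r**2 + ... + r**M via the closed form (requires r != 1)."""
    return (1 - r ** (M + 1)) / (1 - r)


M = 30                            # number of branch generations used in the text
b, a, c = 1.7, 1.131588, 1.3158   # bases: branching, volume and surface-area factors
v0, s0 = 4.4, 17.6                # first-segment volume (cm^3) and surface area (cm^2)

N_total = geometric_sum(b, M)         # total number of segments
V_total = v0 * geometric_sum(a, M)    # total airway volume in cm^3
S_total = s0 * geometric_sum(c, M)    # total airway surface area in cm^2

print(f"segments: {N_total:.4e}")
print(f"volume:   {V_total:.1f} cm^3")
print(f"surface:  {S_total:.3e} cm^2 = {S_total / 1e4:.1f} m^2")
```

Changing b, M, or the scale factors in one place shows quickly how sensitive each total is to the parameters.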
1.8.7 Exploring the problem numerically
Up to now, all calculations were done using the formulae developed for geometric series.
However, sometimes it is more convenient to devise a computer algorithm to implement
“rules” and perform repetitive calculations in a problem such as the one discussed here. The
advantage of that approach is that it eliminates tedious calculations by hand, and, in cases
where summation formulae are not known to us, reduces the need for analytical
computations. It can also provide a shortcut to a visual summary of the results. The disadvantage is
that it can be less obvious how each of the values of parameters assigned to the problem
affects the final answers.
A spreadsheet is an ideal tool for exploring iterated rules such as those given in the
lung branching problem (see footnote 7). In Figure 1.7 we show the volumes and surface areas associated
with the lung airways for parameter values discussed above. Both layer by layer values and
cumulative sums leading to total volume and surface area are shown in each of (a) and (c).
In (b) and (d), we compare these results to similar graphs in the case that one parameter, the
branching number b, is adjusted from 1.7 (original value) to 2. The contrast between the
graphs shows how such a small change in this parameter can significantly affect the results.
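The layer-by-layer bookkeeping that a spreadsheet would perform can also be sketched in a few lines of Python (used here in place of a spreadsheet). The function `layer_table` and its argument names are hypothetical; the numerical values are those of Table 1.2, with b = 2 as the alternative branching number.

```python
def layer_table(b, ratio_single, q0, M):
    """Return (layer_values, cumulative_sums) for generations 0..M.

    ratio_single is the daughter-to-parent ratio for ONE segment
    (alpha*beta**2 for volume, alpha*beta for surface area), and
    q0 is the value of the quantity for the initial segment.
    """
    layer = q0
    total = 0.0
    layers, cumulative = [], []
    for _ in range(M + 1):
        total += layer
        layers.append(layer)
        cumulative.append(total)
        layer *= b * ratio_single   # iterate the rule: next layer = (b * ratio) * this layer
    return layers, cumulative


# Volumes for the original branching number and for b = 2.
vol_17, cum_vol_17 = layer_table(1.7, 0.66564, 4.4, 30)
vol_2, cum_vol_2 = layer_table(2.0, 0.66564, 4.4, 30)

print("total volume, b = 1.7:", round(cum_vol_17[-1], 1), "cm^3")
print("total volume, b = 2.0:", round(cum_vol_2[-1], 1), "cm^3")
```

The lists `vol_17` and `cum_vol_17` correspond to the two sets of bars in panel (a) of Figure 1.7; calling the function with `ratio_single=0.774` and `q0=17.6` gives the surface-area analogue.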
1.8.8 For further independent study
The following problems can be used for further independent exploration of these ideas.
1. In our model, we have assumed that, on average, a parent branch has only “1.7”
daughter branches, i.e. that b = 1.7. Suppose we had assumed that b = 2. What
would the total volume V be in that case, keeping all other parameters the same?
Explain why this is biologically impossible in the case M = 30 generations. For
what value of M would b = 2 lead to a reasonable result?
2. Suppose that the ﬁrst 5 generations of branching produce 2 daughters each, but then
from generation 6 on, the branching number is b = 1.7. How would you set up this
variant of the model? How would this affect the calculated volume?
3. In the problem we explored, the net volume and surface area keep growing by larger
and larger increments at each “generation” of branching. We would describe this as
“unbounded growth”. Explain why this is the case, paying particular attention to the
scale factors a and c.
Footnote 7: See Lab 1 for a similar problem that is also investigated using a spreadsheet.
[Figure 1.7: four bar-chart panels. Panels (a) and (b) show $V_n$, the volume of layer n, and the cumulative volume to layer n, on a vertical scale from 0 to 1500. Panels (c) and (d) show the surface area of the n’th layer and the cumulative surface area to the n’th layer, on a vertical scale from 0 to 250000. The horizontal axis in each panel runs from 0.5 to 30.5.]
Figure 1.7. (a) $V_n$, the volume of layer n (red bars), and the cumulative volume
down to layer n (yellow bars) are shown for parameters given in Table 1.1. (b) Same as (a)
but assuming that parent segments always produce two daughter branches (i.e. b = 2). The
graphs in (a) and (b) are shown on the same scale to accentuate the much more dramatic
growth in (b). (c) and (d): same idea showing the surface area of the n’th layer (green) and
the cumulative surface area to layer n (blue) for original parameters (in c), as well as for
the value b = 2 (in d).
4. Suppose we want a set of tubes with a large surface area but small total volume.
Which single factor or parameter should we change (and how should we change it) to
correct this feature of the model, i.e. to predict that the total volume of the branching
tubes remains roughly constant while the surface area increases as branching layers
are added?
5. Determine how the branching properties of real human lungs differ from our
assumed model, and use similar ideas to refine and correct our estimates. You may
want to investigate what is known about the actual branching parameter b, the
number of generations of branches, M, and the ratios of lengths and radii that we have
assumed. Alternatively, you may wish to find parameters for other species and do a
comparative study of lungs in a variety of animal sizes.
6. Branching structures are ubiquitous in biology. Many species of plants are based
on a regular geometric sequence of branching. Consider a tree that trifurcates (i.e.
produces 3 new daughter branches per parent branch, b = 3). Explain (a) what
biological problem is to be solved in creating such a structure, and (b) what sorts of
constraints must be satisfied by the branching parameters to lead to a viable structure.
This is an open-ended problem.
1.9 Summary
In this chapter, we collected useful formulae for areas and volumes of simple 2D and 3D
shapes. A summary of the most important ones is given below. Table 1.3 lists the areas of
simple shapes, Table 1.4 the volumes and Table 1.5 the surface areas of 3D shapes.
We used areas of triangles to compute areas of more complicated shapes, including
regular polygons. We used a polygon with N sides to approximate the area of a circle, and
then, by letting N go to infinity, we were able to prove that the area of a circle of radius r
is $A = \pi r^2$. This idea, and others related to it, will form a deep underlying theme in the
next two chapters and later on in this course.
We introduced some notation for series and collected useful formulae for summation
of such series. These are summarized in Table 1.6. We will use these extensively in our
next chapter.
Finally, we investigated geometric series and studied a biological application, namely
the branching structure of lungs.
Object       dimensions           area, A
triangle     base b, height h     $\frac{1}{2} b h$
rectangle    base b, height h     $b h$
circle       radius r             $\pi r^2$

Table 1.3. Areas of planar regions
Object               dimensions                          volume, V
box                  base b, height h, width w           $h w b$
circular cylinder    radius r, height h                  $\pi r^2 h$
sphere               radius r                            $\frac{4}{3} \pi r^3$
cylindrical shell*   radius r, height h, thickness τ     $2 \pi r h \tau$
spherical shell*     radius r, thickness τ               $4 \pi r^2 \tau$

Table 1.4. Volumes of 3D shapes. * Assumes a thin shell, i.e. small τ.

Object               dimensions                    surface area, S
box                  base b, height h, width w     $2(bh + bw + hw)$
circular cylinder    radius r, height h            $2 \pi r h$
sphere               radius r                      $4 \pi r^2$

Table 1.5. Surface areas of 3D shapes
Sum                                    Notation               Formula                               Comment
$1 + 2 + 3 + \dots + N$                $\sum_{k=1}^{N} k$     $\frac{N(1+N)}{2}$                    Gauss’ formula
$1^2 + 2^2 + 3^2 + \dots + N^2$        $\sum_{k=1}^{N} k^2$   $\frac{N(N+1)(2N+1)}{6}$              Sum of squares
$1^3 + 2^3 + 3^3 + \dots + N^3$        $\sum_{k=1}^{N} k^3$   $\left(\frac{N(N+1)}{2}\right)^2$     Sum of cubes
$1 + r + r^2 + r^3 + \dots + r^N$      $\sum_{k=0}^{N} r^k$   $\frac{1 - r^{N+1}}{1 - r}$           Geometric sum

Table 1.6. Useful summation formulae.
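Each row of Table 1.6 can be spot-checked by comparing a term-by-term sum with the closed formula. The sketch below is only an illustration for particular values of N and r (it is not a proof); the function name `direct_and_formula` is an assumption made for this example.

```python
def direct_and_formula(N, r):
    """Return (direct sum, closed formula) pairs for the four rows of Table 1.6."""
    ks = range(1, N + 1)
    return {
        "integers": (sum(ks), N * (N + 1) // 2),
        "squares": (sum(k**2 for k in ks), N * (N + 1) * (2 * N + 1) // 6),
        "cubes": (sum(k**3 for k in ks), (N * (N + 1) // 2) ** 2),
        # The geometric sum starts at k = 0 and requires r != 1.
        "geometric": (sum(r**k for k in range(N + 1)), (1 - r ** (N + 1)) / (1 - r)),
    }


for name, (direct, formula) in direct_and_formula(10, 0.5).items():
    print(f"{name:10s} direct = {direct}, formula = {formula}")
```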
Chapter 2
Areas
2.1 Areas in the plane
A longstanding problem of integral calculus is how to compute the area of a region in
the plane. This type of geometric problem formed part of the original motivation for the
development of calculus techniques, and we will discuss it in many contexts in this course.
We have already seen examples of the computation of areas of especially simple geometric
shapes in Chapter 1. For triangles, rectangles, polygons, and circles, no advanced methods
(beyond simple geometry) are needed. However, beyond these elementary shapes, such
methods fail, and a new idea is needed. We will discuss such ideas in this chapter, and in
Chapter 3.
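[Figure 2.1: a region A in the xy-plane lying below the graph of y = f(x), above the x axis, and between x = a and x = b.]
Figure 2.1. We consider the problem of determining areas of regions such as those
bounded by the x axis, the lines x = a and x = b, and the graph of some function, y = f(x).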
We now consider the problem of determining the area of a region in the plane that
has the following special properties: The region is formed by straight lines on three sides,
and by a smooth curve on one of its edges, as shown in Figure 2.1. You might imagine
that the shaded portion of this ﬁgure is a plot of land bounded by fences on three sides, and
by a river on the fourth side. A farmer wishing to purchase this land would want to know
exactly how large an area is being acquired. Here we set up the calculation of that area.
More specifically, we use a cartesian coordinate system to describe the region: we
require that it falls between the x-axis, the lines x = a and x = b, and the graph of a
function y = f(x). This is required for the process described below to work (see footnote 8). We will first
restrict attention to the case that $f(x) > 0$ for all points in the interval $a \le x \le b$ as we
concentrate on “real areas”. Later, we generalize our results and lift this restriction.
We will approximate the area of the region shown in Figure 2.1 by dissecting it into
smaller regions (rectangular strips) whose areas are easy to determine. We will refer to this
type of procedure as a Riemann sum. In Figure 2.2, we illustrate the basic idea using a
region bounded by the function $y = f(x) = x^2$ on $0 \le x \le 1$. It can be seen that the
approximation is fairly coarse when the number of rectangles is small (see footnote 9). However, if the
number of rectangles is increased (as shown in subsequent panels of this same figure), we
obtain a better and better approximation of the true area. In the limit as N, the number of
rectangles, approaches infinity, the area of the desired region is obtained. This idea will
form the core of this chapter. The reader will note a similarity with the idea we already
encountered in obtaining the area of a circle, though in that context, we had used a dissection
of the circle into approximating triangles.

[Figure 2.2: four panels showing $y = f(x) = x^2$ for x from 0.0 to 1.0 (vertical axis from 0.0 to 1.0), with N = 10, N = 20, and N = 40 rectangles, and the true area of the region for $N \to \infty$.]
Figure 2.2. The function $y = x^2$ for $0 \le x \le 1$ is shown, with rectangles that
approximate the area under its curve. As we increase the number of rectangular strips, the
total area of the strips becomes a better and better approximation of the desired “true”
area. Shown are the intermediate steps N = 10, N = 20, N = 40 and the true area for
$N \to \infty$.

Footnote 8: Not all planar areas have this property. Later examples indicate how to deal with some that do not.
Footnote 9: That is, the area of the rectangles is very different from the area of the region of interest.
With this idea in mind, in Section 2.2, we compute the area of the region shown in
Figure 2.2 in two ways. First, we use a simple spreadsheet to do the computations for us.
This is meant to illustrate the “numerical approach”.
Then, as the alternate analytic approach, we set up the Riemann sum corresponding
to the function shown in Figure 2.2. We will find that carefully setting up the calculation
of areas of the approximating rectangles will be important. Making a cameo appearance
in this calculation will be the formula for the sums of square integers developed in the
previous chapter. A new feature will be the limit $N \to \infty$ that introduces the final step of
arriving at the smooth region shown in the final panel of Figure 2.2.
2.2 Computing the area under a curve by rectangular strips
2.2.1 First approach: Numerical integration using a spreadsheet
The same tool that produces Figure 2.2 can be used to calculate the areas of the steps for
each of the panels in the figure. To do this, we fix N for a given panel (e.g. N = 10, 20,
or 40), find the corresponding value of $\Delta x$, and set up a calculation which adds up the
areas of steps, i.e. $\sum x^2 \Delta x$, in a given panel. The ideas are analogous to those described in
Section 2.2.2, but a spreadsheet does the number crunching for us.
Using a spreadsheet, for example, we find the following results at each stage: For
N = 10 strips, the area is 0.3850 units^2, for N = 20 strips it is 0.3588, for N = 40 strips,
the area is 0.3459. If we increase N greatly, e.g. set N = 1000 strips, which begins to
approximate the limit of $N \to \infty$, then the area obtained is 0.3338 units^2 (see footnote 10).
This example illustrates that areas can be computed “numerically”; indeed, many of
the laboratory exercises that accompany this course will be based on precisely this idea.
The advantage of this approach is that it requires only elementary “programming”, i.e.
the assembly of a simple algorithm (a set of instructions). Once assembled, we can use
essentially the same algorithm to explore various functions, intervals, numbers of rectangles,
etc. Lab 2 in this course will motivate the student to explore this numerical integration
approach, and later labs will expand and generalize the idea to a variety of settings.
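The same “simple algorithm” can be written as a short program instead of a spreadsheet. The sketch below uses Python (not the tool used in the course) and right-endpoint rectangles, as in Figure 2.2; the function name `right_riemann_sum` is an illustrative choice.

```python
def right_riemann_sum(f, a, b, N):
    """Sum the areas f(x_k) * dx of N strips on [a, b], using right endpoints x_k."""
    dx = (b - a) / N
    return sum(f(a + k * dx) * dx for k in range(1, N + 1))


def square(x):
    return x**2


# Strip counts used in the text for y = x^2 on 0 <= x <= 1.
for N in (10, 20, 40, 1000):
    print(N, right_riemann_sum(square, 0.0, 1.0, N))
```

Swapping in a different function, interval, or N requires changing only the arguments, which is the flexibility described above.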
In our second approach, we set up the problem analytically. We will ﬁnd that results
are similar. However, we will get deeper insight by understanding what happens in the limit
as the number of strips N gets very large.
Footnote 10: Note that all these values are approximations, correct to 4 decimal places. Compare with the exact
calculations in Section 2.2.2.
2.2.2 Second approach: Analytic computation using Riemann sums
In this section we consider the detailed steps involved in analytically computing the area of
the region bounded by the function
$$y = f(x) = x^2, \quad 0 \le x \le 1.$$
By this we mean that we use “pen-and-paper” calculations, rather than computational aids,
to determine that area.
We set up the rectangles (as shown in Figure 2.2, with detailed labeling in
Figure 2.3), determine the heights and areas of these rectangles, sum their total area, and
then determine how this value behaves as the rectangles get more numerous (and thinner).
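[Figure 2.3: two panels. Left: the graph of $y = f(x) = x^2$ on $0 \le x \le 1$ with one shaded rectangle of width $\Delta x$ and height $f(x)$. Right: the graph of $y = f(x)$ with the points $x_0, x_1, \dots, x_{k-1}, x_k, \dots, x_N$ marked on the x axis and the heights $f(x_k)$ and $f(x_N)$ labeled.]
Figure 2.3. The region under the graph of y = f(x) for $0 \le x \le 1$ will be
approximated by a set of N rectangles. A rectangle (shaded) has base width $\Delta x$ and
height $f(x)$. Since $0 \le x \le 1$, and all the rectangles have the same base width, it follows
that $\Delta x = 1/N$. In the panel on the right, the coordinates of base corners and two typical
heights of the rectangles have been labeled. Here $x_0 = 0$, $x_N = 1$ and $x_k = k\Delta x$.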
The interval of interest in this problem is $0 \le x \le 1$. Let us subdivide this interval
into N equal subintervals. Then each has width 1/N. (We will refer to this width as $\Delta x$, as
shown in Figure 2.3, as it forms a difference of successive x coordinates.) The coordinates
of the endpoints of these subintervals will be labeled $x_0, x_1, \dots, x_k, \dots, x_N$, where the
values $x_0 = 0$ and $x_N = 1$ are the endpoints of the original interval. Since the points are
equally spaced, starting at $x_0 = 0$, the coordinate $x_k$ is just k steps of size 1/N along
the x axis, i.e. $x_k = k(1/N) = k/N$. In the right panel of Figure 2.3, some of these
coordinates have been labeled. For clarity, we show only the first few points, together with
a representative pair $x_{k-1}$ and $x_k$ inside the region.
Let us look more carefully at one of the rectangles. Suppose we look at the rectangle
labeled k. Such a representative k’th rectangle is shown shaded in Figure 2.3. The height
of this rectangle is determined by the value of the function, since one corner of the rectangle
is “glued” to the curve. The choice shown in Figure 2.3 is to affix the right corner of each
rectangle on the curve. This implies that the height of the k’th rectangle is obtained from
substituting $x_k$ into the function, i.e. height = $f(x_k)$. The base of every rectangle is the
same, i.e. base = $\Delta x = 1/N$. This means that the area of the k’th rectangle, shown shaded,
is
$$a_k = \text{height} \times \text{base} = f(x_k)\Delta x.$$

rectangle (k)    right x coord ($x_k$)    height $f(x_k)$     area $a_k$
1                $1/N$                    $(1/N)^2$           $(1/N)^2 \Delta x$
2                $2/N$                    $(2/N)^2$           $(2/N)^2 \Delta x$
3                $3/N$                    $(3/N)^2$           $(3/N)^2 \Delta x$
...
k                $k/N$                    $(k/N)^2$           $(k/N)^2 \Delta x$
...
N                $N/N = 1$                $(N/N)^2 = 1$       $(1)\Delta x$

Table 2.1. The label, position, height, and area $a_k$ of each rectangular strip is
shown above. Each rectangle has the same base width, $\Delta x = 1/N$. We approximate the
area under the curve $y = f(x) = x^2$ by the sum of the values in the last column, i.e. the
total area of the rectangles.
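We now use three facts:
$$f(x_k) = x_k^2, \qquad \Delta x = \frac{1}{N}, \qquad x_k = \frac{k}{N}.$$
Then the area of the k’th rectangle is
$$a_k = \text{height} \times \text{base} = f(x_k)\Delta x = \underbrace{\left(\frac{k}{N}\right)^2}_{f(x_k)} \underbrace{\left(\frac{1}{N}\right)}_{\Delta x}.$$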
A list of rectangles and their properties is shown in Table 2.1. This may help the
reader to see the pattern that emerges in the summation. (In general this table is not needed
in our work, and it is presented for this example only, to help visualize how heights of
rectangles behave.) The total area of all rectangular strips (a sum of the values in the right
column of Table 2.1) is
$$A_{N \text{ strips}} = \sum_{k=1}^{N} a_k = \sum_{k=1}^{N} f(x_k)\Delta x = \sum_{k=1}^{N} \left(\frac{k}{N}\right)^2 \left(\frac{1}{N}\right). \tag{2.1}$$
The expression shown in Eqn. (2.1) is a Riemann sum. A recurring theme underlying
integral calculus is the relationship between Riemann sums and definite integrals, a concept
introduced later on in this chapter.
We now rewrite this sum in a more convenient form so that summation formulae
developed in Chapter 1 can be used. In this sum, only the quantity k changes from term to
term. All other quantities are common factors, so that
$$A_{N \text{ strips}} = \left(\frac{1}{N^3}\right) \sum_{k=1}^{N} k^2.$$
The formula (1.2) for the sum of square integers can be applied to the summation, resulting
in
$$A_{N \text{ strips}} = \left(\frac{1}{N^3}\right) \frac{N(N+1)(2N+1)}{6} = \frac{(N+1)(2N+1)}{6N^2}. \tag{2.2}$$
In the box below, we use Eqn. (2.2) to compute the approximate area for values of N
shown in the first three panels of Figure 2.2. Note that these are comparable to the values we
obtained “numerically” in Section 2.2.1. (We plug the value of N into (2.2) and use a
calculator to obtain the results below.)

If N = 10 strips (Figure 2.2a), the width of each strip is 0.1 unit. According to
equation (2.2), the area of the 10 strips (shown in red) is
$$A_{10 \text{ strips}} = \frac{(10 + 1)(2 \cdot 10 + 1)}{6 \cdot 10^2} = 0.385.$$
If N = 20 strips (Figure 2.2b), $\Delta x = 1/20 = 0.05$, and
$$A_{20 \text{ strips}} = \frac{(20 + 1)(2 \cdot 20 + 1)}{6 \cdot 20^2} = 0.35875.$$
If N = 40 strips (Figure 2.2c), $\Delta x = 1/40 = 0.025$, and
$$A_{40 \text{ strips}} = \frac{(40 + 1)(2 \cdot 40 + 1)}{6 \cdot 40^2} = 0.3459375.$$
We will define the true area under the graph of the function y = f(x) over the given
interval to be:
$$A = \lim_{N \to \infty} A_{N \text{ strips}}.$$
This means that the true area is obtained by letting the number of rectangular strips, N, get
very large (while the width of each one, $\Delta x = 1/N$, gets very small).
In the example discussed in this section, the true area is found by taking the limit as
N gets large in equation (2.2), i.e.,
$$A = \lim_{N \to \infty} \left(\frac{1}{N^2}\right) \frac{(N+1)(2N+1)}{6} = \frac{1}{6} \lim_{N \to \infty} \frac{(N+1)(2N+1)}{N^2}.$$
To evaluate this limit, note that when N gets very large, we can use the approximations
$(N + 1) \approx N$ and $(2N + 1) \approx 2N$, so that (simplifying and cancelling common factors)
$$\lim_{N \to \infty} \frac{(N+1)(2N+1)}{N^2} = \lim_{N \to \infty} \frac{(N)}{N} \frac{(2N)}{N} = 2.$$
The result is:
$$A = \frac{1}{6}(2) = \frac{1}{3} \approx 0.333. \tag{2.3}$$
Thus, the true area of the region (Figure 2.2d) is 1/3 units^2.
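The approach of formula (2.2) to the limiting value 1/3 can be followed exactly using rational arithmetic. This is a minimal sketch with illustrative function names; subtracting 1/3 from (2.2) and simplifying gives the gap $(3N+1)/(6N^2)$, which the code confirms for the values tried.

```python
from fractions import Fraction


def area_formula(N):
    """Equation (2.2): total area of N right-endpoint strips under y = x^2 on [0, 1]."""
    return Fraction((N + 1) * (2 * N + 1), 6 * N**2)


def area_direct(N):
    """The Riemann sum (2.1), added up term by term with exact fractions."""
    return sum(Fraction(k, N) ** 2 * Fraction(1, N) for k in range(1, N + 1))


for N in (10, 20, 40, 1000):
    A = area_formula(N)
    gap = A - Fraction(1, 3)            # how far the strips overshoot the true area
    assert A == area_direct(N)          # formula and direct sum agree exactly
    assert gap == Fraction(3 * N + 1, 6 * N**2)
    print(N, float(A), float(gap))
```

The gap shrinks roughly like 1/(2N), so doubling the number of strips about halves the error.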
2.2.3 Comments
Many students who have had calculus before in high school ask “why do we bother with
such tedious calculations, when we could just use integration?”. Indeed, our development
of Riemann sums foreshadows and anticipates the idea of a definite integral, and in short
order, some powerful techniques will help to shortcut such technical calculations. There
are two reasons why we linger on Riemann sums. First, in order to understand integration
adequately, we must understand the underlying “technology” and concepts; this proves
vital in understanding how to use the methods, and when things can go wrong. It also helps
to understand what integrals represent in applications that occur later on. Second, even
though we will shortly have better tools for analytical calculations, the idea of setting up
area approximations using rectangular strips is very similar to the way that the spreadsheet
computations are designed. (However, the summation is handled automatically using the
spreadsheet, and no “formulae” are needed.) In Section 2.2.1, we gave only a few details of
the steps involved. The student will find that understanding the ideas of Section 2.2.2 will
go hand-in-hand with understanding the numerical approach of Section 2.2.1.
The ideas outlined above can be applied to more complicated situations. In the next
section we consider a practical problem in which a similar calculation is carried out.
2.3 The area of a leaf
Leaves act as solar energy collectors for plants. Hence, their surface area is an important
property. In this section we use our techniques to determine the area of a rhododendron
leaf, shown in Figure 2.4. For simplicity of treatment, we will ﬁrst consider a function
designed to mimic the shape of the leaf in a simple system of units: we will scale distances
by the length of the leaf, so that its proﬁle is contained in the interval 0 ≤ x ≤ 1. We later
ask how to modify this treatment to describe similarly curved leaves of arbitrary length and
width, and leaves that are less symmetric. As shown in Figure 2.4, a simple parabola, of
the form
y = f(x) = x(1 −x),
provides a convenient approximation to the top edge of the leaf. To check that this is the
case, we observe that at x = 0 and x = 1, the curve intersects the x axis. At 0 < x < 1,
the curve is above the axis. Thus, the area between this curve and the x axis, is one half of
the leaf area.
We set up the computation of approximating rectangular strips as before, by
subdividing the interval of interest into N rectangular strips. We can set up the calculation
systematically, as follows:
length of interval = 1 − 0 = 1
number of segments, N
width of rectangular strips, $\Delta x = \frac{1}{N}$
the k’th x value, $x_k = k \frac{1}{N} = \frac{k}{N}$
height of k’th rectangular strip, $f(x_k) = x_k (1 - x_k)$

[Figure 2.4: two panels. Left: the leaf profile between x = 0 and x = 1, with top edge $y = f(x) = x(1 - x)$. Right: the same curve with the points $x_0 = 0, x_1, x_2, \dots, x_N = 1$ marked, and the k’th rectangle (enlarged) of width $\Delta x$ and height $y_k = f(x_k)$.]
Figure 2.4. In this figure we show how the area of a leaf can be approximated by
rectangular strips.
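The representative k’th rectangle is shown shaded in Figure 2.4. Its area is
$$a_k = \text{base} \times \text{height} = \Delta x \cdot f(x_k) = \underbrace{\left(\frac{1}{N}\right)}_{\Delta x} \cdot \underbrace{\frac{k}{N}\left(1 - \frac{k}{N}\right)}_{f(x_k)}.$$
The total area of these rectangular strips is:
$$A_{N \text{ strips}} = \sum_{k=1}^{N} a_k = \sum_{k=1}^{N} \Delta x \cdot f(x_k) = \sum_{k=1}^{N} \left(\frac{1}{N}\right) \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right).$$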
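Simplifying the result (so we can use summation formulae) leads to:
$$A_{N \text{ strips}} = \left(\frac{1}{N}\right) \sum_{k=1}^{N} \frac{k}{N}\left(1 - \frac{k}{N}\right) = \left(\frac{1}{N^2}\right) \sum_{k=1}^{N} k - \left(\frac{1}{N^3}\right) \sum_{k=1}^{N} k^2.$$
Using the summation formulae (1.1) and (1.2) from Chapter 1 results in:
$$A_{N \text{ strips}} = \left(\frac{1}{N^2}\right) \frac{N(N+1)}{2} - \left(\frac{1}{N^3}\right) \frac{(2N+1)N(N+1)}{6}.$$
Simplifying, and regrouping terms, we get
$$A_{N \text{ strips}} = \left(\frac{1}{2}\right) \frac{(N+1)}{N} - \left(\frac{1}{6}\right) \frac{(2N+1)(N+1)}{N^2}.$$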
This is the area for a finite number, N, of rectangular strips. As before, the true area is
obtained as the limit as N goes to infinity, i.e. $A = \lim_{N \to \infty} A_{N \text{ strips}}$. Taking the limit leads to
$$A = \lim_{N \to \infty} \left(\frac{1}{2}\right) \frac{(N+1)}{N} - \lim_{N \to \infty} \left(\frac{1}{6}\right) \frac{(2N+1)(N+1)}{N^2} = \frac{1}{2} - \frac{1}{6} \cdot 2 = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}.$$
Thus the area of the entire leaf (twice this area) is 1/3.
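As a check on the algebra, the strip areas for the leaf can be added up directly and compared with the simplified expression above. The sketch below uses exact fractions; the function names `leaf_half_area` and `leaf_half_area_formula` are illustrative assumptions.

```python
from fractions import Fraction


def leaf_half_area(N):
    """Sum of N strip areas dx * f(x_k), with f(x) = x(1 - x) and x_k = k/N."""
    dx = Fraction(1, N)
    return sum(dx * (k * dx) * (1 - k * dx) for k in range(1, N + 1))


def leaf_half_area_formula(N):
    """The simplified expression: (1/2)(N+1)/N - (1/6)(2N+1)(N+1)/N^2."""
    return Fraction(1, 2) * Fraction(N + 1, N) - Fraction(1, 6) * Fraction(
        (2 * N + 1) * (N + 1), N**2
    )


for N in (10, 100, 1000):
    half = leaf_half_area(N)
    assert half == leaf_half_area_formula(N)   # direct sum matches the formula
    print(N, float(half), "full leaf:", float(2 * half))
```

For these values the half-leaf area increases toward 1/6 as N grows, so the full leaf area approaches 1/3.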
Remark:
The function in this example can be written as $y = x - x^2$. For part of this expression,
we have seen a similar calculation in Section 2.2. This example illustrates an important
property of sums, namely the fact that we can rearrange the terms into simpler expressions
that can be summed individually.
In the homework problems accompanying this chapter, we investigate how to
describe leaves with arbitrary lengths and widths, as well as leaves with shapes that are
tapered, broad, or less symmetric than the current example.
2.4 Area under an exponential curve
In the previous examples, we considered areas under curves described by simple quadratic
functions. Each of these led to calculations in which sums of integers or square integers
appeared. Here we demonstrate an example in which a geometric sum will be used. Recall
that we derived Eqn. (1.4) in Chapter 1, for a finite geometric sum.
We will find the area under the graph of the function $y = f(x) = e^{2x}$ over the
interval between x = 0 and x = 2. In evaluating a limit in this example, we will also use
the fact that the exponential function has a linear approximation as follows:
$$e^z \approx 1 + z.$$
(See Linear Approximations in an earlier calculus course.)
As before, we subdivide the interval into N pieces, each of width 2/N. Proceeding
systematically as before, we write
length of interval = 2 − 0 = 2
number of segments = N
width of rectangular strips, $\Delta x = \frac{2}{N}$
the k’th x value, $x_k = k \frac{2}{N} = \frac{2k}{N}$
height of k’th rectangular strip, $f(x_k) = e^{2x_k} = e^{2(2k/N)} = e^{4k/N}$
We observe that the length of the interval (here 2) has affected the details of the calculation.
As before, the area of the k’th rectangle is
\[ a_k = \text{base} \times \text{height} = \Delta x \times f(x_k) = \frac{2}{N}\, e^{4k/N}, \]
and the total area of all the rectangles is
\[ A_{N\,\text{strips}} = \frac{2}{N} \sum_{k=1}^{N} e^{4k/N} = \frac{2}{N} \sum_{k=1}^{N} r^{k} = \frac{2}{N} \left( \sum_{k=0}^{N} r^{k} - r^{0} \right), \]
where $r = e^{4/N}$. This is a finite geometric series. Because the series starts with k = 1 and
not with k = 0, the sum is
\[ A_{N\,\text{strips}} = \frac{2}{N} \left[ \frac{(1 - r^{N+1})}{(1 - r)} - 1 \right]. \]
After some simplification and using $r = e^{4/N}$, we find that
\[ A_{N\,\text{strips}} = \frac{2}{N}\, e^{4/N}\, \frac{1 - e^{4}}{1 - e^{4/N}} = 2\, \frac{1 - e^{4}}{N(e^{-4/N} - 1)}. \]
We need to determine what happens when N gets very large. We can use the linear approximation
\[ e^{-4/N} \approx 1 - 4/N \]
to evaluate the limit of the term in the denominator, and we find that
\[ A = \lim_{N\to\infty} 2\, \frac{1 - e^{4}}{N(e^{-4/N} - 1)} = \lim_{N\to\infty} 2\, \frac{1 - e^{4}}{-N(1 + 4/N - 1)} = 2\, \frac{e^{4} - 1}{4} \approx 26.799. \]
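The limit above can also be checked numerically. The following is a minimal Python sketch, not part of the original notes; the helper name right_riemann_sum is our own choice. It adds up the areas of N right-endpoint strips under $y = e^{2x}$ on 0 ≤ x ≤ 2 and compares the total with $(e^4 - 1)/2$.

```python
import math

def right_riemann_sum(f, a, b, n):
    """Total area of n right-endpoint rectangles for f on [a, b]."""
    dx = (b - a) / n                      # width of each strip
    return dx * sum(f(a + k * dx) for k in range(1, n + 1))

def f(x):
    return math.exp(2 * x)                # the curve y = e^(2x)

exact = (math.exp(4) - 1) / 2             # the limit found above, about 26.799

for n in (10, 100, 1000, 10000):
    approx = right_riemann_sum(f, 0.0, 2.0, n)
    print(n, approx, approx - exact)      # the gap shrinks as n grows
```

Because the function is increasing, the right-endpoint strips overestimate the area, and the printed gap should shrink roughly in proportion to 1/N.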
2.5 Extensions and other examples
More general interval
To calculate the area under the curve $y = f(x) = x^2$ over the interval 2 ≤ x ≤ 5 using
N rectangles, the width of each one would be $\Delta x = (5 - 2)/N = 3/N$ (i.e., length of
interval divided by N). Since the interval starts at $x_0 = 2$ and increments in units of 3/N,
the k’th coordinate is $x_k = 2 + k(3/N) = 2 + (3k/N)$. The area of the k’th rectangle is
then $A_k = f(x_k) \times \Delta x = [(2 + (3k/N))^2](3/N)$, and this is to be summed over k.
Similar algebraic simplification, summation formulae, and a limit are needed to calculate the
true area.
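As a rough illustration of this setup, the sketch below (in Python, with a function name of our own choosing) sums the rectangle areas $[(2 + 3k/N)^2](3/N)$ for increasing N. The notes do not state the limiting value here; using the antiderivative $x^3/3$ introduced in Chapter 3, it is $(5^3 - 2^3)/3 = 39$, and the sums should approach that number.

```python
def area_n_strips(n):
    """Right-endpoint Riemann sum for f(x) = x^2 on [2, 5] with n strips."""
    dx = 3 / n                            # (5 - 2) / N
    total = 0.0
    for k in range(1, n + 1):
        x_k = 2 + 3 * k / n               # k'th coordinate
        total += x_k ** 2 * dx            # area of the k'th rectangle
    return total

for n in (10, 100, 1000, 10000):
    print(n, area_n_strips(n))            # values decrease toward 39
```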
Other examples
In Appendix 11.2 we discuss a number of other examples with several modifications:
First, in Appendix 11.2.1, we show how to set up a Riemann sum for a more complicated
quadratic function on a general interval, a ≤ x ≤ b.
Second, we show how Riemann sums can be set up for left, rather than right endpoint
approximations. The results are entirely analogous.
2.6 The deﬁnite integral
We now introduce a central concept that will form an important theme in this course, that
of the definite integral. We begin by defining a new piece of notation relevant to the topic
in this chapter, namely the area associated with the graph of a function. For a function
y = f(x) > 0 that is bounded and continuous (footnote 11) on an interval [a, b] (also written a ≤ x ≤ b),
we define the definite integral,
\[ I = \int_a^b f(x)\,dx \tag{2.4} \]
to be the area A of the region under the graph of the function between the endpoints a and
b. See Figure 2.5.
Figure 2.5. The shaded area A corresponds to the definite integral I of the function f(x) over the interval a ≤ x ≤ b.
2.6.1 Remarks
1. The definite integral is a number.
2. The value of the definite integral depends on the function, and on the two end points
of the interval.
3. From previous remarks, we have a procedure to calculate the value of the definite
integral by dissecting the region into rectangular strips, summing up the total area of
the strips, and taking a limit as N, the number of strips, gets large. (The calculation
may be nontrivial, and might involve sums that we have not discussed in our simple
examples so far, but in principle the procedure is well-defined.)
Footnote 11: A function is said to be bounded if its graph stays between some pair of horizontal lines. It is continuous if
there are no “breaks” in its graph.
Figure 2.6. Examples (1) to (4) relate the areas shown in panels (a) to (d) to definite integrals.
2.6.2 Examples
We have calculated the areas of regions bounded by particularly simple functions. To
practice notation, we write down the corresponding deﬁnite integral in each case. Note
that in many of the examples below, we need no elaborate calculations, but merely use
previously known or recently derived results, to familiarize the reader with the new notation
just defined.
Example (1)
The area under the function y = f(x) = x over the interval 0 ≤ x ≤ 1 is triangular,
with base and height 1. The area of this triangle is thus A = (1/2) base × height = 0.5
(Figure 2.6(a)). Hence,
\[ \int_0^1 x\,dx = 0.5. \]
Example (2)
In Section 2.2, we also computed the area under the function $y = f(x) = x^2$ on the interval
0 ≤ x ≤ 1 and found its area to be 1/3 (see Eqn. (2.3) and Figure 2.6(b)). Thus
\[ \int_0^1 x^2\,dx = 1/3 \approx 0.333. \]
Example (3)
A constant function of the form y = 1 over an interval 2 ≤ x ≤ 4 would produce a rectangular
region in the plane, with base (4 − 2) = 2 and height 1 (Figure 2.6(c)). Thus
\[ \int_2^4 1\,dx = 2. \]
Example (4)
The function y = f(x) = 1 − x/2 (Figure 2.6(d)) forms a triangular region with base 2
and height 1, thus
\[ \int_0^2 (1 - x/2)\,dx = 1. \]
2.7 The area as a function
In Chapter 3, we will elaborate on the idea of the definite integral and arrive at a very
important connection between differential and integral calculus. Before doing so, we have
to extend the idea of the deﬁnite integral somewhat, and thereby deﬁne a new function,
A(x).
Figure 2.7. We define a new function A(x) to be the area associated with the
graph of some function y = f(x) from the fixed endpoint a up to the endpoint x, where
a ≤ x ≤ b.
We will investigate how the area under the graph of a function changes as one of the
endpoints of the interval moves. We can think of this as a function that gradually changes
(i.e. the area accumulates) as we sweep across the interval (a, b) from left to right in
Figure 2.1. The function A(x) represents the area of the region shown in Figure 2.7.
Extending our definition of the definite integral, we might be tempted to use the
notation
\[ A(x) = \int_a^x f(x)\,dx. \]
However, there is a slight problem with this notation: the symbol x is used in slightly
confusing ways, both as the argument of the function and as the variable endpoint of the
interval. To avoid possible confusion, we will prefer the notation
\[ A(x) = \int_a^x f(s)\,ds \]
(or some symbol other than s used as a placeholder instead of x).
An analogue already seen is the sum
\[ \sum_{k=1}^{N} k^2 \]
where N denotes the “end” of the sum, and k keeps track of where we are in the process
of summation. The symbol s, sometimes called a “dummy variable” is analogous to the
summation symbol k.
In the upcoming Chapter 3, we will investigate properties of this new “area function”
A(x) deﬁned above. This will lead us to the Fundamental Theorem of Calculus, and will
provide new and powerful tools to replace the dreary summations that we had to perform
in much of Chapter 2. Indeed, we are about to discover the amazing connection between a
function, the area A(x) under its curve, and the derivative of A(x).
2.8 Summary
In this chapter, we showed how to calculate the area of a region in the plane that is bounded
by the x axis, two lines of the form x = a and x = b, and the graph of a positive function
y = f(x). We also introduced the terminology “deﬁnite integral” (Section 2.6) and the
notation (2.4) to represent that area.
One of our main efforts here focused on how to actually compute that area by the
following set of steps:
• Subdivide the interval [a, b] into smaller intervals (width ∆x).
• Construct rectangles whose heights approximate the height of the function above the
given interval.
• Add up the areas of these approximating rectangles. (Here we often used summation
formulae from Chapter 1.) The resulting expression, such as Eqn. (2.1), for example,
was denoted a Riemann sum.
• Find out what happens to this total area in the limit when the width ∆x goes to zero
(or, in other words, when the number of rectangles N goes to inﬁnity).
We showed both the analytic approach, using Riemann sums and summation formulae
to find areas, as well as numerical approximations using a spreadsheet tool to arrive at
similar results. We then used a variety of examples to illustrate the concepts and arrive at
computed areas.
As a ﬁnal important point, we noted that the area “under the graph of a function” can
itself be considered a function. This idea will emerge as particularly important and will lead
us to the key concept linking the geometric concept of areas with the analytic properties
of antiderivatives. We shall see this link in the Fundamental Theorem of Calculus, in
Chapter 3.
Chapter 3
The Fundamental
Theorem of Calculus
In this chapter we will formulate one of the most important results of calculus, the Fundamental
Theorem. This result will link together the notions of an integral and a derivative.
Using this result will allow us to replace the technical calculations of Chapter 2 by much
simpler procedures involving antiderivatives of a function.
3.1 The deﬁnite integral
In Chapter 2, we deﬁned the deﬁnite integral, I, of a function f(x) > 0 on an interval [a, b]
as the area under the graph of the function over the given interval a ≤ x ≤ b. We used the
notation
\[ I = \int_a^b f(x)\,dx \]
to represent that quantity. We also set up a technique for computing areas: the procedure
for calculating the value of I is to write down a sum of areas of rectangular strips and to
compute a limit as the number of strips increases:
\[ I = \int_a^b f(x)\,dx = \lim_{N\to\infty} \sum_{k=1}^{N} f(x_k)\,\Delta x, \tag{3.1} \]
where N is the number of strips used to approximate the region, k is an index associated
with the k’th strip, and $\Delta x = x_{k+1} - x_k$ is the width of the rectangle. As the number of
strips increases (N → ∞) and their width decreases (∆x → 0), the sum becomes a better
and better approximation of the true area, and hence, of the definite integral, I. Examples
of such calculations (tedious as they were) formed the main theme of Chapter 2.
We can generalize the definite integral to include functions that are not strictly positive,
as shown in Figure 3.1. To do so, note what happens as we incorporate strips corresponding
to regions of the graph below the x axis: These are associated with negative
values of the function, so that the quantity $f(x_k)\Delta x$ in the above sum would be negative
for each rectangle in the “negative” portions of the function. This means that regions of the
graph below the x axis will contribute negatively to the net value of I.
If we refer to $A_1$ as the area corresponding to regions of the graph of f(x) above the
x axis, and $A_2$ as the total area of regions of the graph under the x axis, then we will find
that the value of the definite integral I shown above will be
\[ I = A_1 - A_2. \]
Thus the notion of “area under the graph of a function” must be interpreted a little carefully
when the function dips below the axis.
Figure 3.1. (a) If f(x) is negative in some regions, there are terms in the sum (3.1)
that carry negative signs: this happens for all rectangles in parts of the graph that dip
below the x axis. (b) This means that the definite integral $I = \int_a^b f(x)\,dx$ will correspond
to the difference of two areas, $A_1 - A_2$, where $A_1$ is the total area (dark) of positive regions
and $A_2$ is the total area (light) of negative portions of the graph. Properties of the definite
integral: (c) illustrates Property 1. (d) illustrates Property 2.
3.2 Properties of the deﬁnite integral
The following properties of a definite integral stem from its definition, and the procedure for
calculating it discussed so far. For example, the fact that summation satisfies the distributive
property means that an integral will satisfy the same property. We illustrate some
of these in Figure 3.1.
1. $\int_a^a f(x)\,dx = 0$,
2. $\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx$,
3. $\int_a^b C f(x)\,dx = C \int_a^b f(x)\,dx$,
4. $\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$,
5. $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$.
Property 1 states that the “area” of a region with no width is zero. Property 2 shows
how a region can be broken up into two pieces whose total area is just the sum of the
individual areas. Properties 3 and 4 reﬂect the fact that the integral is actually just a sum,
and so satisfies properties of simple addition. Property 5 is obtained by noting that if
we perform the summation “in the opposite direction”, then we must replace the previous
“rectangle width” given by $\Delta x = x_{k+1} - x_k$ by the new “width” which is of opposite sign:
$x_k - x_{k+1}$. This accounts for the sign change shown in Property 5.
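These properties can be observed numerically. The sketch below is our own illustration in Python, with a hypothetical test function $f(x) = x^2 + 1$ and hypothetical endpoints. It approximates each integral by a sum of midpoint rectangles and lets the strip width $\Delta x = (b - a)/n$ become negative when the endpoints are reversed, which is exactly the sign change behind Property 5.

```python
def integral(f, a, b, n=10000):
    """Midpoint-rectangle approximation of the integral of f from a to b."""
    dx = (b - a) / n                      # negative when b < a (Property 5)
    return dx * sum(f(a + (k + 0.5) * dx) for k in range(n))

def f(x):
    return x ** 2 + 1                     # hypothetical test function

a, b, c = 0.0, 1.5, 3.0                   # hypothetical endpoints

print(integral(f, a, a))                                        # Property 1
print(integral(f, a, c), integral(f, a, b) + integral(f, b, c)) # Property 2
print(integral(f, a, b), -integral(f, b, a))                    # Property 5
```

Each printed pair should agree up to small rounding and discretization errors.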
3.3 The area as a function
In Chapter 2, we investigated how the area under the graph of a function changes as one of
the endpoints of the interval moves. We defined a function that represents the area under
the graph of a function f, from some fixed starting point a to an endpoint x:
\[ A(x) = \int_a^x f(t)\,dt. \]
This endpoint is considered as a variable (footnote 12), i.e. we will be interested in the way that this
area changes as the endpoint varies (Figure 3.2(a)). We will now investigate the interesting
connection between A(x) and the original function, f(x).
We would like to study how A(x) changes as x is increased ever so slightly. Let
∆x = h represent some (very small) increment in x. (Caution: do not confuse h with
height here. It is actually a step size along the x axis.) Then, according to our definition,
\[ A(x + h) = \int_a^{x+h} f(t)\,dt. \]
Footnote 12: Recall that the “dummy variable” t inside the integral is just a “place holder”, and is used to avoid confusion
with the endpoint of the integral (x in this case). Also note that the value of A(x) does not depend in any way on
t, so any letter or symbol in its place would do just as well.
Figure 3.2. Panels: (a) the area A(x); (b) the area A(x + h); (c) the difference A(x + h) − A(x); (d) a rectangle of base h and height f(x). When the right endpoint of the interval moves by a distance h, the area
of the region increases from A(x) to A(x + h). This leads to the important Fundamental
Theorem of Calculus, given in Eqn. (3.2).
In Figure 3.2(a) and (b), we illustrate the areas represented by A(x) and by A(x + h), respectively.
The difference between the two areas is a thin sliver (shown in Figure 3.2(c)) that
looks much like a rectangular strip (Figure 3.2(d)). (Indeed, if h is small, then the approximation
of this sliver by a rectangle will be good.) The height of this sliver is specified
by the function f evaluated at the point x, i.e. by f(x), so that the area of the sliver is
approximately f(x) · h. Thus,
\[ A(x + h) - A(x) \approx f(x)\,h \]
or
\[ \frac{A(x + h) - A(x)}{h} \approx f(x). \]
As h gets small, i.e. h → 0, we get a better and better approximation, so that, in the limit,
\[ \lim_{h\to 0} \frac{A(x + h) - A(x)}{h} = f(x). \]
The ratio above should be recognizable. It is simply the derivative of the area function, i.e.
\[ f(x) = \frac{dA}{dx} = \lim_{h\to 0} \frac{A(x + h) - A(x)}{h}. \tag{3.2} \]
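The argument above can be tried out numerically. In the following Python sketch (our own illustration; the test function cos(t) + 2 and the point x = 1 are hypothetical choices), the area function A(x) is approximated by many thin midpoint rectangles, and the ratio (A(x + h) − A(x))/h is printed next to f(x) for shrinking h.

```python
import math

def area(f, a, x, n=20000):
    """Approximate A(x), the area under f from a to x, with n midpoint strips."""
    dx = (x - a) / n
    return dx * sum(f(a + (k + 0.5) * dx) for k in range(n))

def f(t):
    return math.cos(t) + 2.0              # hypothetical positive test function

a, x = 0.0, 1.0
for h in (0.1, 0.01, 0.001):
    quotient = (area(f, a, x + h) - area(f, a, x)) / h
    print(h, quotient, f(x))              # the quotient approaches f(x)
```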
We have just given a simple argument in support of an important result, called the
Fundamental Theorem of Calculus, which is restated below.
3.4 The Fundamental Theorem of Calculus
3.4.1 Fundamental theorem of calculus: Part I
Let f(x) be a bounded and continuous function on an interval [a, b]. Let
\[ A(x) = \int_a^x f(t)\,dt. \]
Then for a < x < b,
\[ \frac{dA}{dx} = f(x). \]
In other words, this result says that A(x) is an “antiderivative” of the original function,
f(x) (footnote 13).
Proof
See the argument above and Figure 3.2.
3.4.2 Example: an antiderivative
Recall the connection between functions and their derivatives. Consider the following two
functions:
\[ g_1(x) = \frac{x^2}{2}, \qquad g_2(x) = \frac{x^2}{2} + 1. \]
Clearly, both functions have the same derivative:
\[ g_1'(x) = g_2'(x) = x. \]
We would say that $x^2/2$ is an “antiderivative” of x and that $(x^2/2) + 1$ is also an
“antiderivative” of x. In fact, any function of the form
\[ g(x) = \frac{x^2}{2} + C, \quad \text{where } C \text{ is any constant,} \]
is also an “antiderivative” of x.
This example illustrates that adding a constant to a given function will not affect
the value of its derivative, or, stated another way, antiderivatives of a given function are
deﬁned only up to some constant. We will use this fact shortly: if A(x) and F(x) are both
antiderivatives of some function f(x), then A(x) = F(x) +C.
Footnote 13: We often write “antiderivative”, with no hyphen.
3.4.3 Fundamental theorem of calculus: Part II
Let f(x) be a continuous function on [a, b]. Suppose F(x) is any antiderivative of f(x).
Then for a ≤ x ≤ b,
\[ A(x) = \int_a^x f(t)\,dt = F(x) - F(a). \]
Proof
From comments above, we know that a function f(x) could have many different antiderivatives
that differ from one another by some additive constant. We are told that F(x) is an
antiderivative of f(x). But from Part I of the Fundamental Theorem, we know that A(x) is
also an antiderivative of f(x). It follows that
\[ A(x) = \int_a^x f(t)\,dt = F(x) + C, \quad \text{where } C \text{ is some constant.} \tag{3.3} \]
However, by Property 1 of definite integrals,
\[ A(a) = \int_a^a f(t)\,dt = F(a) + C = 0. \]
Thus,
\[ C = -F(a). \]
Replacing C by −F(a) in Eqn. (3.3) leads to the desired result. Thus
\[ A(x) = \int_a^x f(t)\,dt = F(x) - F(a). \]
Remark 1: Implications
This theorem has tremendous implications, because it allows us to use a powerful new
tool in determining areas under curves. Instead of the drudgery of summations in order to
compute areas, we will be able to use a shortcut: ﬁnd an antiderivative, evaluate it at the
two endpoints a, b of the interval of interest, and subtract the results to get the area. In the
case of elementary functions, this will be very easy and convenient.
Remark 2: Notation
We will often use the notation
\[ F(t)\Big|_a^x = F(x) - F(a) \]
to denote the difference in the values of a function at two endpoints.
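To see the shortcut of Remark 1 in action, here is a small Python sketch of our own. The pair f(t) = cos(t), F(t) = sin(t) and the interval from 0 to π/2 are hypothetical choices made only for illustration. The sketch compares F(x) − F(a) with a laborious sum of 100000 rectangular strips.

```python
import math

def riemann_sum(f, a, b, n):
    """Right-endpoint Riemann sum of f on [a, b] with n strips."""
    dx = (b - a) / n
    return dx * sum(f(a + k * dx) for k in range(1, n + 1))

def f(t):
    return math.cos(t)

def F(t):
    return math.sin(t)                    # an antiderivative of f

a, x = 0.0, math.pi / 2
shortcut = F(x) - F(a)                    # Fundamental Theorem, Part II
print(shortcut)
print(riemann_sum(f, a, x, 100000))       # close to the shortcut value
```

The shortcut needs two function evaluations, whereas the sum needs one evaluation per strip and still carries a small error.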
3.5 Review of derivatives (and antiderivatives)
By remarks above, we see that integration is related to “antidifferentiation”. This motivates
a review of derivatives of common functions. Table 3.1 lists functions f(x) and their
derivatives f'(x) (in the first two columns) and functions f(x) and their antiderivatives
F(x) in the subsequent two columns. These will prove very helpful in our calculations of
basic integrals.

function f(x)    derivative f'(x)        function f(x)        antiderivative F(x)
-------------    --------------------    -----------------    -------------------
Cx               C                       C                    Cx
x^n              n x^(n-1)               x^m                  x^(m+1)/(m+1)
sin(ax)          a cos(ax)               cos(bx)              (1/b) sin(bx)
cos(ax)          -a sin(ax)              sin(bx)              -(1/b) cos(bx)
tan(ax)          a sec^2(ax)             sec^2(bx)            (1/b) tan(bx)
e^(kx)           k e^(kx)                e^(kx)               e^(kx)/k
ln(x)            1/x                     1/x                  ln(x)
arctan(x)        1/(1 + x^2)             1/(1 + x^2)          arctan(x)
arcsin(x)        1/sqrt(1 - x^2)         1/sqrt(1 - x^2)      arcsin(x)

Table 3.1. Common functions and their derivatives (on the left two columns) also
result in corresponding relationships between functions and their antiderivatives (right two
columns). In this table, we assume that m ≠ −1, b ≠ 0, k ≠ 0. Also, when using ln(x) as
antiderivative for 1/x, we assume that x > 0.
As an example, consider the polynomial
\[ p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots \]
This polynomial could have many other terms (or even an infinite number of such terms,
as we discuss much later, in Chapter 10). Its antiderivative can be found easily using the
“power rule” together with the properties of addition of terms. Indeed, the antiderivative is
\[ F(x) = C + a_0 x + \frac{a_1}{2} x^2 + \frac{a_2}{3} x^3 + \frac{a_3}{4} x^4 + \dots \]
This can be checked easily by differentiation (footnote 14).
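The power rule for polynomials is mechanical enough to express in a few lines of code. The sketch below is our own Python illustration, and it assumes a representation not used in the notes: a polynomial is stored as a list of coefficients [a_0, a_1, a_2, ...]. It builds the antiderivative term by term and then performs the differentiation check recommended above.

```python
from fractions import Fraction

def antiderivative(coeffs, C=0):
    """Coefficients of F(x) = C + a_0 x + (a_1/2) x^2 + ..., given [a_0, a_1, ...]."""
    return [Fraction(C)] + [Fraction(a, k + 1) for k, a in enumerate(coeffs)]

def derivative(coeffs):
    """Coefficients of the derivative of the polynomial with these coefficients."""
    return [k * a for k, a in enumerate(coeffs)][1:]

p = [1, 1, 1, 1]                          # 1 + x + x^2 + x^3 (hypothetical choice)
F = antiderivative(p)
print(F)                                  # coefficients 0, 1, 1/2, 1/3, 1/4
print(derivative(F) == p)                 # differentiating F recovers p
```

Exact fractions are used so that the check by differentiation is not blurred by rounding.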
3.6 Examples: Computing areas with the Fundamental Theorem of Calculus
3.6.1 Example 1: The area under a polynomial
Consider the polynomial
\[ p(x) = 1 + x + x^2 + x^3. \]
(Here we have taken the first few terms from the example of the last section with coefficients
all set to 1.) Then, computing
\[ I = \int_0^1 p(x)\,dx \]
leads to
\[ I = \int_0^1 (1 + x + x^2 + x^3)\,dx = \left( x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \frac{1}{4} x^4 \right)\Big|_0^1 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \approx 2.083. \]
3.6.2 Example 2: Simple areas
Determine the values of the following deﬁnite integrals by ﬁnding antiderivatives and using
the Fundamental Theorem of Calculus:
1. $I = \int_0^1 x^2\,dx$,
2. $I = \int_{-1}^{1} (1 - x^2)\,dx$,
3. $I = \int_{-1}^{1} e^{-2x}\,dx$,
4. $I = \int_0^{\pi} \sin\left(\frac{x}{2}\right) dx$.
Solutions
1. An antiderivative of $f(x) = x^2$ is $F(x) = x^3/3$, thus
\[ I = \int_0^1 x^2\,dx = F(x)\Big|_0^1 = (1/3)(x^3)\Big|_0^1 = \frac{1}{3}(1^3 - 0) = \frac{1}{3}. \]
Footnote 14: In fact, it is very good practice to perform such checks.
2. An antiderivative of $f(x) = (1 - x^2)$ is $F(x) = x - (x^3/3)$, thus
\[ I = \int_{-1}^{1} (1 - x^2)\,dx = F(x)\Big|_{-1}^{1} = \left( x - (x^3/3) \right)\Big|_{-1}^{1} = \left( 1 - (1^3/3) \right) - \left( (-1) - ((-1)^3/3) \right) = 4/3. \]
See the comment below for a simpler way to compute this integral.
3. An antiderivative of $e^{-2x}$ is $F(x) = (-1/2)e^{-2x}$. Thus,
\[ I = \int_{-1}^{1} e^{-2x}\,dx = F(x)\Big|_{-1}^{1} = (-1/2)(e^{-2x})\Big|_{-1}^{1} = (-1/2)(e^{-2} - e^{2}). \]
4. An antiderivative of sin(x/2) is F(x) = −cos(x/2)/(1/2) = −2 cos(x/2). Thus
\[ I = \int_0^{\pi} \sin\left(\frac{x}{2}\right) dx = -2\cos(x/2)\Big|_0^{\pi} = -2(\cos(\pi/2) - \cos(0)) = -2(0 - 1) = 2. \]
Comment: The evaluation of Integral 2 in the examples above is tricky only in that signs
can easily get garbled when we plug in the endpoint at −1. However, we can simplify our
work by noting the symmetry of the function $f(x) = 1 - x^2$ on the given interval. As
shown in Figure 3.3, the areas to the right and to the left of x = 0 are the same for the interval
−1 ≤ x ≤ 1. This stems directly from the fact that the function considered is even (footnote 15).
Thus, we can immediately write
\[ I = \int_{-1}^{1} (1 - x^2)\,dx = 2 \int_0^1 (1 - x^2)\,dx = 2 \left( x - (x^3/3) \right)\Big|_0^1 = 2 \left( 1 - (1^3/3) \right) = 4/3. \]
Note that this calculation is simpler since the endpoint at x = 0 is trivial to plug in.
Figure 3.3. We can exploit the symmetry of the function $f(x) = 1 - x^2$ in the
second integral of Examples 3.6.2. We can integrate over 0 ≤ x ≤ 1 and double the result.
We state the general result we have obtained, which holds true for any function with
even symmetry integrated on a symmetric interval about x = 0:
If f(x) is an even function, then
\[ \int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx \tag{3.4} \]
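The symmetry rule can be checked on the second integral of Example 2. The Python sketch below is our own illustration (the helper midpoint_integral is a hypothetical name); it approximates both sides of the rule for $f(x) = 1 - x^2$ with a = 1 and compares them with the exact value 4/3.

```python
def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rectangle approximation of the integral of f from a to b."""
    dx = (b - a) / n
    return dx * sum(f(a + (k + 0.5) * dx) for k in range(n))

def f(x):
    return 1 - x ** 2                     # an even function: f(x) = f(-x)

full = midpoint_integral(f, -1.0, 1.0)    # integral over -1 <= x <= 1
half = midpoint_integral(f, 0.0, 1.0)     # integral over 0 <= x <= 1
print(full, 2 * half, 4 / 3)              # all three should nearly agree
```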
Footnote 15: Recall that a function f(x) is even if f(x) = f(−x) for all x. A function is odd if f(x) = −f(−x).
3.6.3 Example 3: The area between two curves
The definite integral is an area of a somewhat special type of region, i.e., one bounded by an axis, two
vertical lines (x = a and x = b) and the graph of a function. However, using additive
(or subtractive) properties of areas, we can generalize to computing areas of other regions,
including those bounded by the graphs of two functions.
(a) Find the area enclosed between the graphs of the functions $y = x^3$ and $y = x^{1/3}$
in the first quadrant.
(b) Find the area enclosed between the graphs of the functions $y = x^3$ and $y = x$ in
the first quadrant.
(c) What is the relationship of these two areas? What is the relationship of the functions
$y = x^3$ and $y = x^{1/3}$ that leads to this relationship between the two areas?
Figure 3.4. In Example 3, we compute the areas $A_1$ and $A_2$ shown above. (The curves plotted are $y = x^{1/3}$, $y = x$ and $y = x^3$.)
Solution
(a) The two curves, $y = x^3$ and $y = x^{1/3}$, intersect at x = 0 and at x = 1 in the first
quadrant. Thus the interval that we will be concerned with is 0 < x < 1. On this
interval, $x^{1/3} > x^3$, so that the area we want to find can be expressed as:
\[ A_1 = \int_0^1 \left( x^{1/3} - x^3 \right) dx. \]
Thus,
\[ A_1 = \frac{x^{4/3}}{4/3}\Big|_0^1 - \frac{x^4}{4}\Big|_0^1 = \frac{3}{4} - \frac{1}{4} = \frac{1}{2}. \]
(b) The two curves $y = x^3$ and $y = x$ also intersect at x = 0 and at x = 1 in the
first quadrant, and on the interval 0 < x < 1 we have $x > x^3$. The area can be
represented as
\[ A_2 = \int_0^1 \left( x - x^3 \right) dx. \]
\[ A_2 = \frac{x^2}{2}\Big|_0^1 - \frac{x^4}{4}\Big|_0^1 = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}. \]
(c) The area calculated in (a) is twice the area calculated in (b). The reason for this is that
$x^{1/3}$ is the inverse of the function $x^3$, which means geometrically that the graph of
$x^{1/3}$ is the mirror image of the graph of $x^3$ reflected about the line y = x. Therefore,
the area $A_1$ between $y = x^{1/3}$ and $y = x^3$ is twice as large as the area $A_2$ between
$y = x$ and $y = x^3$ calculated in part (b): $A_1 = 2A_2$ (see Figure 3.4).
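As a quick numerical cross-check of parts (a) to (c), the following Python sketch (our own, using a hypothetical helper midpoint_integral) approximates both areas with thin midpoint rectangles. The curve $y = x^{1/3}$ is very steep near x = 0, so the first approximation converges somewhat more slowly than the second.

```python
def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rectangle approximation of the integral of f from a to b."""
    dx = (b - a) / n
    return dx * sum(f(a + (k + 0.5) * dx) for k in range(n))

# Height of each strip = upper curve minus lower curve on 0 < x < 1.
A1 = midpoint_integral(lambda x: x ** (1 / 3) - x ** 3, 0.0, 1.0)
A2 = midpoint_integral(lambda x: x - x ** 3, 0.0, 1.0)

print(A1, A2, A1 / A2)                    # near 0.5, 0.25, and 2
```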
3.6.4 Example 4: Area of land
Find the exact area of the piece of land which is bounded by the y axis on the west, the x
axis in the south, the lake described by the function $y = f(x) = 100 + (x/100)^2$ in the
north and the line x = 1000 in the east.
Solution
The area is
\[ A = \int_0^{1000} \left( 100 + \left( \frac{x}{100} \right)^2 \right) dx = \int_0^{1000} \left( 100 + \frac{1}{10000}\, x^2 \right) dx. \]
Note that the multiplicative constant (1/10000) is not affected by integration. The result is
\[ A = 100x\Big|_0^{1000} + \frac{x^3}{3}\Big|_0^{1000} \cdot \frac{1}{10000} = \frac{4}{3} \times 10^5. \]
3.7 Qualitative ideas
In some cases, we are given a sketch of the graph of a function, f(x), from which we would
like to construct a sketch of the associated function A(x). This sketching skill is illustrated
in the figures shown in this section.
Suppose we are given a function as shown in the top left hand panel of Figure 3.5.
We would like to assemble a sketch of
\[ A(x) = \int_a^x f(t)\,dt \]
which corresponds to the area associated with the graph of the function f. As x moves
from left to right, we show how the “area” accumulated along the graph gradually changes.
(See A(x) in bottom panels of Figure 3.5): We start with no area, at the point x = a
(since, by deﬁnition A(a) = 0) and gradually build up to some net positive amount, but
then we encounter a portion of the graph of f below the x axis, and this subtracts from
the amount accrued. (Hence the graph of A(x) has a little peak that corresponds to the
point at which f = 0.) Every time the function f(x) crosses the x axis, we see that A(x)
has either a maximum or minimum value. This ﬁts well with our idea of A(x) as the
antiderivative of f(x): Places where A(x) has a critical point coincide with places where
dA/dx = f(x) = 0.
Figure 3.5. Given a function f(x), we here show how to sketch the corresponding
“area function” A(x), in panels (a) to (d). (The relationship is that f(x) is the derivative of A(x).)
Sketching the function A(x) is thus analogous to sketching a function g(x) when we
are given a sketch of its derivative g'(x). Recall that this was one of the skills we built up in
learning the connection between functions and their derivatives in a first semester calculus
course.
Remarks
The following remarks may be helpful in gaining confidence with sketching the “area”
function $A(x) = \int_a^x f(t)\,dt$ from the original function f(x):
1. The endpoint of the interval, a, on the x axis indicates the place at which A(x) = 0.
This follows from Property 1 of the definite integral, i.e. from the fact that
$A(a) = \int_a^a f(t)\,dt = 0$.
2. Whenever f(x) is positive, A(x) is an increasing function. This follows from the fact
that the area continues to accumulate as we “sweep across” positive regions of f(x).
Figure 3.6. Given a function f(x) (top, solid line), we assemble a plot of the
corresponding function $g(x) = \int_a^x f(t)\,dt$ (bottom, solid line). g(x) is an antiderivative
of f(x). Whether f(x) is positive (+) or negative (−) in portions of its graph determines
whether g(x) is increasing or decreasing over the given intervals. Places where f(x)
changes sign correspond to maxima and minima of the function g(x). (Two such places are
indicated by dotted vertical lines.) The box in the middle of the sketch shows configurations
of tangent lines to g(x) based on the sign of f(x). Where f(x) = 0, those tangent lines
are horizontal. The function g(x) is drawn as a smooth curve whose direction is parallel to
the tangent lines shown in the box. While the function f(x) has many antiderivatives (e.g.,
dashed curve parallel to g(x)), only one of these satisfies g(a) = 0 as required by Property
1 of the definite integral. (See dashed vertical line at x = a.) This determines the height of
the desired function g(x).
3. Wherever f(x) changes sign, the function A(x) has a local minimum or maximum.
This means that either the area stops increasing (if the transition is from positive to
negative values of f), or else the area starts to increase (if f crosses from negative to
positive values).
4. Since dA/dx = f(x) by the Fundamental Theorem of Calculus, it follows that (taking
a derivative of both sides) $d^2A/dx^2 = f'(x)$. Thus, when f(x) has a local
maximum or minimum (i.e. $f'(x) = 0$), it follows that $A''(x) = 0$. This means that
at such points, the function A(x) would have an inflection point.
Given a function f(x), Figure 3.6 shows in detail how to sketch the corresponding function
$g(x) = \int_a^x f(t)\,dt$.
3.7.1 Example: sketching A(x)
Consider the function f(x) whose graph is shown in the top part of Figure 3.7. Sketch the corresponding
function $g(x) = \int_a^x f(t)\,dt$.
Figure 3.7. The original function, f(x), is shown above. The corresponding
function g(x) is drawn below.
Solution
See Figure 3.7
3.8 Some ﬁne print
The Fundamental Theorem has a number of restrictions that must be satisﬁed before its
results can be applied. In this section we look at some examples in which care must be
used.
3.8.1 Function unbounded I
Consider the deﬁnite integral
\[ \int_0^2 \frac{1}{x}\,dx. \]
The function $f(x) = \frac{1}{x}$ is undefined at x = 0, and unbounded on any interval that contains
the point x = 0. Hence, we cannot evaluate this integral using the Fundamental Theorem,
and indeed, we say that “this integral does not exist”.
3.8.2 Function unbounded II
Consider the deﬁnite integral
\[ \int_{-1}^{1} \frac{1}{x^2}\,dx. \]
This function is also undefined (and hence not continuous) at x = 0. The Fundamental
Theorem of Calculus cannot be applied. Technically, although one can “go through the
motions” of computing an antiderivative, evaluating it at both endpoints, and getting a
numerical answer, the result so obtained would be simply wrong. We say that this integral
does not exist.
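The following Python sketch, our own illustration, shows what “going through the motions” produces in this case and why it cannot be trusted. The antiderivative F(x) = −1/x is valid only away from x = 0; evaluating it blindly at the endpoints gives a negative number for a function that is positive everywhere it is defined, while sums of rectangular strips grow without settling down as the strips get thinner.

```python
def F(x):
    return -1.0 / x                       # antiderivative of 1/x^2 away from x = 0

naive = F(1.0) - F(-1.0)                  # "going through the motions"
print(naive)                              # negative, although 1/x^2 > 0

def midpoint_sum(f, a, b, n):
    """Midpoint-rectangle sum; with even n no midpoint lands on x = 0."""
    dx = (b - a) / n
    return dx * sum(f(a + (k + 0.5) * dx) for k in range(n))

for n in (10, 100, 1000, 10000):
    print(n, midpoint_sum(lambda x: 1 / x ** 2, -1.0, 1.0, n))  # keeps growing
```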
3.8.3 Example: Function discontinuous or with distinct parts
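Suppose we are given the integral
\[ I = \int_{-1}^{2} |x|\,dx. \]
This function is actually made up of two distinct parts, namely
\[ f(x) = |x| = \begin{cases} x & \text{if } x > 0 \\ -x & \text{if } x < 0. \end{cases} \]
The integral I must therefore be split up into two parts, namely
\[ I = \int_{-1}^{2} |x|\,dx = \int_{-1}^{0} (-x)\,dx + \int_0^2 x\,dx. \]
We find that
\[ I = -\frac{x^2}{2}\Big|_{-1}^{0} + \frac{x^2}{2}\Big|_0^2 = -\left[ 0 - \frac{1}{2} \right] + \left[ \frac{4}{2} - 0 \right] = 2.5 \]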
3.8.4 Function undeﬁned
Figure 3.8. In this example (y = |x|), to compute the integral over the interval −1 ≤ x ≤ 2,
we must split up the region into two distinct parts.
Now let us examine the integral
\[ \int_{-1}^{1} x^{1/2}\,dx. \]
We see that there is a problem here. Recall that $x^{1/2} = \sqrt{x}$. Hence, the function is not
defined for x < 0 and the interval of integration is inappropriate. Hence, this integral does
not make sense.
3.8.5 Inﬁnite domain (“improper integral”)
Consider the integral
\[ I = \int_0^b e^{-rx}\,dx, \quad \text{where } r > 0 \text{ and } b > 0 \text{ are constants.} \]
Simple integration using the antiderivative in Table 3.1 (for k = −r) leads to the result
\[ I = \frac{e^{-rx}}{-r}\Big|_0^b = -\frac{1}{r} \left( e^{-rb} - e^{0} \right) = \frac{1}{r} \left( 1 - e^{-rb} \right). \]
This is the area under the exponential curve between x = 0 and x = b. Now consider what
happens when b, the upper endpoint of the integral, increases, so that b → ∞. Then the
value of the integral becomes
\[ I = \lim_{b\to\infty} \int_0^b e^{-rx}\,dx = \lim_{b\to\infty} \frac{1}{r} \left( 1 - e^{-rb} \right) = \frac{1}{r}(1 - 0) = \frac{1}{r}. \]
(We used the fact that $e^{-rb} \to 0$ as b → ∞.) We have, in essence, found that
\[ I = \int_0^{\infty} e^{-rx}\,dx = \frac{1}{r}. \tag{3.5} \]
An integral of the form (3.5) is called an improper integral. Even though the domain
of integration of this integral is infinite, (0, ∞), observe that the value we computed is
finite, so long as r > 0. Not all such integrals have a bounded finite value. Learning to
distinguish between those that do and those that do not will form an important theme in
Chapter 10.
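The approach to the limit can be seen by tabulating the finite integral for growing b. In this small Python sketch of our own, the rate r = 0.5 is a hypothetical value chosen only for illustration, and the function name is ours.

```python
import math

def finite_integral(b, r):
    """Value of the integral of e^(-r x) from 0 to b, i.e. (1/r)(1 - e^(-r b))."""
    return (1 - math.exp(-r * b)) / r

r = 0.5                                   # hypothetical positive rate
for b in (1, 10, 100, 1000):
    print(b, finite_integral(b, r), 1 / r)   # approaches 1/r as b grows
```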
Regions that need special treatment
So far, we have learned how to compute areas of regions in the plane that are bounded
by one or more curves. In all our examples so far, the basis for these calculations rests on
imagining rectangles whose heights are speciﬁed by one or another function. Up to now, all
the rectangular strips we considered had bases (of width ∆x) on the x axis. In Figure 3.9
we observe an example in which it would not be possible to use this technique. We are
asked to find the area between the curve $y^2 - y + x = 0$ and the y axis. However, one and
the same curve, $y^2 - y + x = 0$, forms the boundary from both the top and the bottom of
the region. We are unable to set up a series of rectangles with bases along the x axis whose
heights are described by this curve. This means that our definite integral (which is really
just a convenient way of carrying out the process of area computation) has to be handled
with care.
Figure 3.9. The area in the region shown here is best computed by integrating
in the y direction. If we do so, we can use the curved boundary as a single function that
defines the region. (Note that the curve cannot be expressed in the form of a function in the
usual sense, y = f(x), but it can be expressed in the form of a function x = f(y).)
(Axes: x and y; the curve is labelled x = g(y) and the strips have width Δy.)
Let us consider this problem from a “new angle”: with rectangles based on the y
axis, we can achieve the desired result. To do so, let us express our curve in the form
\[ x = g(y) = y - y^2. \]
Then, placing our rectangles along the interval 0 < y < 1 on the y axis (each having base
of width ∆y) leads to the integral
\[ I = \int_0^1 g(y)\,dy = \int_0^1 (y - y^2)\,dy = \left.\left(\frac{y^2}{2} - \frac{y^3}{3}\right)\right|_0^1 = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}. \]
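The idea of stacking rectangles along the y axis can be sketched numerically. The Python fragment below is an illustration only (the helper name `area_by_horizontal_strips` and the number of strips are assumptions): it forms a midpoint Riemann sum of g(y) = y − y² using strips of width Δy and compares the result with the value 1/6.

```python
def area_by_horizontal_strips(g, y_low, y_high, n=10_000):
    """Midpoint Riemann sum of g(y) over [y_low, y_high] using n strips of width dy."""
    dy = (y_high - y_low) / n
    return sum(g(y_low + (j + 0.5) * dy) for j in range(n)) * dy

def g(y):
    # The curve y^2 - y + x = 0 solved for x: x = g(y) = y - y^2.
    return y - y * y

area = area_by_horizontal_strips(g, 0.0, 1.0)
print(area, 1.0 / 6.0)  # the two numbers should agree closely
```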
3.9 Summary
In this chapter we ﬁrst recapped the deﬁnition of the deﬁnite integral in Section 3.1, recalled
its connection to an area in the plane under the graph of some function f(x), and examined
its basic properties.
If one of the endpoints, x, of the integral is allowed to vary, the area it represents,
A(x), becomes a function of x. Our construction in Figure 3.2 showed that there is a connection
between the derivative $A'(x)$ of the area and the function f(x). Indeed, we showed
that $A'(x) = f(x)$ and argued that this makes A(x) an antiderivative of the function f(x).
This important connection between integrals and antiderivatives is the crux of Integral
Calculus, forming the Fundamental Theorem of Calculus. Its significance is that
ﬁnding areas need not be as tedious and labored as the calculation of Riemann sums that
formed the bulk of Chapter 2. Rather, we can take a shortcut using antidifferentiation.
Motivated by this very important result, we reviewed some common functions and
derivatives, and used this to relate functions and their antiderivatives in Table 3.1. We
used these antiderivatives to calculate areas in several examples. Finally, we extended the
treatment to include qualitative sketches of functions and their antiderivatives.
As we will see in upcoming chapters, the ideas presented here have a much wider
range of applicability than simple area calculations. Indeed, we will shortly show that the
same concepts can be used to calculate net changes in continually varying processes, to
compute volumes of various shapes, to determine displacement from velocity, mass from
densities, as well as a host of other quantities that involve a process of accumulation. These
ideas will be investigated in Chapters 4 and 5.
Chapter 4
Applications of the definite integral to velocities and rates
4.1 Introduction
In this chapter, we encounter a number of applications of the deﬁnite integral to practical
problems. We will discuss the connection between acceleration, velocity and displacement
of a moving object, a topic we visited in an earlier, Differential Calculus Course. Here
we will show that the notion of antiderivatives and integrals allows us to deduce details of
the motion of an object from underlying Laws of Motion. We will consider both uniform
and accelerated motion, and recall how air resistance can be described, and what effect it
induces.
An important connection is made in this chapter between a rate of change (e.g. rate
of growth) and the total change (i.e. the net change resulting from all the accumulation and
loss over a time span). We show that such examples also involve the concept of integration,
which, fundamentally, is a cumulative summation of inﬁnitesimal changes. This allows us
to extend the utility of the mathematical tools to a variety of novel situations. We will see
examples of this type in Sections 4.3 and 4.4.
Several other important ideas are introduced in this chapter. We encounter for the
ﬁrst time the idea of spatial density, and see that integration can also be used to “add up”
the total amount of material distributed over space. In Section 5.2.2, this idea is applied to
the density of cars along a highway. We also consider mass distributions and the notion of
a center of mass.
Finally, we also show that the deﬁnite integral is useful for determining the average
value of a function, as discussed in Section 4.6. In all these examples, the important step
is to properly set up the definite integral that corresponds to the desired net change. Computations
at this stage are kept relatively straightforward, to emphasize the process of setting up
the appropriate integrals and understanding what they represent.
4.2 Displacement, velocity and acceleration
Recall from our study of derivatives that for x(t) the position of some particle at time t,
v(t) its velocity, and a(t) the acceleration, the following relationships hold:
\[ \frac{dx}{dt} = v, \qquad \frac{dv}{dt} = a. \]
(Velocity is the derivative of position and acceleration is the derivative of velocity.) This
means that position is an antiderivative of velocity and velocity is an antiderivative of
acceleration.
Since position, x(t), is an antiderivative of velocity, v(t), by the Fundamental Theorem
of Calculus, it follows that over the time interval $T_1 \le t \le T_2$,
\[ \int_{T_1}^{T_2} v(t)\,dt = x(t)\Big|_{T_1}^{T_2} = x(T_2) - x(T_1). \tag{4.1} \]
The quantity on the right hand side of Eqn. (4.1) is a displacement, i.e., the difference
between the position at time $T_1$ and the position at time $T_2$. In the case that $T_1 = 0$, $T_2 = T$,
we have
\[ \int_0^T v(t)\,dt = x(T) - x(0), \]
as the displacement over the time interval 0 ≤ t ≤ T.
Similarly, since velocity is an antiderivative of acceleration, the Fundamental Theorem
of Calculus says that
\[ \int_{T_1}^{T_2} a(t)\,dt = v(t)\Big|_{T_1}^{T_2} = v(T_2) - v(T_1). \tag{4.2} \]
As above, we also have that
\[ \int_0^T a(t)\,dt = v(t)\Big|_0^T = v(T) - v(0) \]
is the net change in velocity between time 0 and time T (though this quantity does not
have a special name).
4.2.1 Geometric interpretations
Suppose we are given a graph of the velocity v(t), as shown on the left of Figure 4.1. Then
by the definition of the definite integral, we can interpret $\int_{T_1}^{T_2} v(t)\,dt$ as the “area” associated
with the curve (counting positive and negative contributions) between the endpoints
$T_1$ and $T_2$. Then according to the above observations, this area represents the displacement
of the particle between the two times $T_1$ and $T_2$.
Similarly, by previous remarks, the area under the curve a(t) is a geometric quantity
that represents the net change in the velocity, as shown on the right of Figure 4.1.
Next, we consider two examples where either the acceleration or the velocity is constant.
We use the results above to compute the displacements in each case.
Figure 4.1. The total area under the velocity graph represents net displacement,
and the total area under the graph of acceleration represents the net change in velocity
over the interval $T_1 \le t \le T_2$. (Left panel: v versus t, with the area between $T_1$ and $T_2$
labelled as displacement. Right panel: a versus t, with the area labelled as net velocity change.)
4.2.2 Displacement for uniform motion
We first examine the simplest case that the velocity is constant, i.e. v(t) = v = constant.
Then clearly, the acceleration is zero since a = dv/dt = 0 when v is constant. Thus, by
direct antidifferentiation,
\[ \int_0^T v\,dt = vt\Big|_0^T = v(T - 0) = vT. \]
However, applying result (4.1) over the time interval 0 ≤ t ≤ T also leads to
\[ \int_0^T v\,dt = x(T) - x(0). \]
Therefore, the two expressions obtained above must be equal, i.e.
\[ x(T) - x(0) = vT. \]
Thus, for uniform motion, the displacement is proportional to the velocity and to the time
elapsed. The final position is
\[ x(T) = x(0) + vT. \]
This is true for all time T, so we can rewrite the results in terms of the more familiar (lower
case) notation for time, t, i.e.
\[ x(t) = x(0) + vt. \tag{4.3} \]
4.2.3 Uniformly accelerated motion
In this case, the acceleration a is a constant. Thus, by direct antidifferentiation,
\[ \int_0^T a\,dt = at\Big|_0^T = a(T - 0) = aT. \]
However, using Equation (4.2) for 0 ≤ t ≤ T leads to
\[ \int_0^T a\,dt = v(T) - v(0). \]
Since these two results must match, v(T) − v(0) = aT so that
\[ v(T) = v(0) + aT. \]
Let us refer to the initial velocity v(0) as $v_0$. The above connection between velocity and
acceleration holds for any final time T, i.e., it is true for all t that:
\[ v(t) = v_0 + at. \tag{4.4} \]
This just means that velocity at time t is the initial velocity incremented by an increase (over
the given time interval) due to the acceleration. From this we can find the displacement and
position of the particle as follows: Let us call the initial position $x(0) = x_0$. Then
\[ \int_0^T v(t)\,dt = x(T) - x_0. \tag{4.5} \]
But
\[ I = \int_0^T v(t)\,dt = \int_0^T (v_0 + at)\,dt = \left.\left(v_0 t + a\frac{t^2}{2}\right)\right|_0^T = v_0 T + a\frac{T^2}{2}. \tag{4.6} \]
So, setting Equations (4.5) and (4.6) equal means that
\[ x(T) - x_0 = v_0 T + a\frac{T^2}{2}. \]
But this is true for all final times, T, i.e. this holds for any time t so that
\[ x(t) = x_0 + v_0 t + a\frac{t^2}{2}. \tag{4.7} \]
This expression represents the position of a particle at time t given that it experienced a
constant acceleration. The initial velocity $v_0$, initial position $x_0$ and acceleration a allowed
us to predict the position of the object x(t) at any later time t. That is the meaning of
Eqn. (4.7) [16].
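Equation (4.7) can be checked against a direct summation of the velocity. The following Python sketch is only an illustration: the function names and the sample values of the initial position, initial velocity and acceleration are assumptions, not values from the text. It compares the formula with the initial position plus a midpoint Riemann sum of v(s) = v0 + a s.

```python
def position_formula(t, x0, v0, a):
    """Eqn. (4.7): x(t) = x0 + v0 t + a t^2 / 2."""
    return x0 + v0 * t + 0.5 * a * t * t

def position_by_summation(t, x0, v0, a, n=10_000):
    """x0 plus a midpoint Riemann sum of v(s) = v0 + a s over 0 <= s <= t."""
    ds = t / n
    displacement = sum((v0 + a * (j + 0.5) * ds) * ds for j in range(n))
    return x0 + displacement

# Assumed sample values: x0 = 2 m, v0 = 3 m/s, a = -9.8 m/s^2, t = 1.5 s.
print(position_formula(1.5, 2.0, 3.0, -9.8))
print(position_by_summation(1.5, 2.0, 3.0, -9.8))
```

Because the velocity is linear in time, the midpoint sum reproduces the formula up to rounding error.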
[16] Of course, Eqn. (4.7) only holds so long as the object is accelerating. Once a falling object hits the ground,
for example, this equation no longer holds.
4.2.4 Nonconstant acceleration and terminal velocity
In general, the acceleration of a falling body is not actually uniform, because frictional
forces impede that motion. A better approximation to the rate of change of velocity is
given by the differential equation
\[ \frac{dv}{dt} = g - kv. \tag{4.8} \]
We will assume that initially the velocity is zero, i.e. v(0) = 0.
This equation is a mathematical statement that relates changes in velocity v(t) to the
constant acceleration due to gravity, g, and drag forces due to friction with the atmosphere.
A good approximation for such drag forces is the term kv, proportional to the velocity,
with k, a positive constant, representing a frictional coefﬁcient. Because v(t) appears both
in the derivative and in the expression kv, we cannot apply the methods developed in the
previous section directly. That is, we do not have an expression that depends on time whose
antiderivative we would calculate. The derivative of v(t) (on the left) is connected to the
unknown v(t) on the right.
Finding the velocity and then the displacement for this type of motion requires special
techniques. In Chapter 9, we will develop a systematic approach, called Separation of
Variables to ﬁnd analytic solutions to equations such as (4.8).
Here, we use a special procedure that allows us to determine the velocity in this case.
We ﬁrst recall the following result from ﬁrst term calculus material:
The differential equation and initial condition
\[ \frac{dy}{dt} = -ky, \qquad y(0) = y_0 \tag{4.9} \]
has a solution
\[ y(t) = y_0 e^{-kt}. \tag{4.10} \]
Equation (4.8) implies that
\[ a(t) = g - kv(t), \]
where a(t) is the acceleration at time t. Taking a derivative of both sides of this equation
leads to
\[ \frac{da}{dt} = -k\frac{dv}{dt} = -ka. \]
We observe that this equation has the same form as equation (4.9) (with a replacing y),
which implies (according to (4.10)) that a(t) is given by
\[ a(t) = C e^{-kt} = a_0 e^{-kt}. \]
Initially, at time t = 0, the acceleration is a(0) = g (since a(t) = g − kv(t), and v(0) = 0).
Therefore,
\[ a(t) = g e^{-kt}. \]
Since we now have an explicit formula for acceleration vs time, we can apply direct integration
as we did in the examples in Sections 4.2.2 and 4.2.3. The result is:
\[ \int_0^T a(t)\,dt = \int_0^T g e^{-kt}\,dt = g\int_0^T e^{-kt}\,dt = g\left[\frac{e^{-kt}}{-k}\right]_0^T = g\,\frac{(e^{-kT} - 1)}{-k} = \frac{g}{k}\left(1 - e^{-kT}\right). \]
In the calculation, we have used the fact that the antiderivative of $e^{-kt}$ is $-e^{-kt}/k$. (This can
be verified by simple differentiation.)
Figure 4.2. Terminal velocity (m/s) for acceleration due to gravity g = 9.8 m/s$^2$,
and k = 0.2/s. The velocity reaches a near constant 49 m/s by about 20 s.
(Axes: time t from 0 to 30; velocity v(t) from 0 to 50.)
As before, based on equation (4.2) this integral of the acceleration over 0 ≤ t ≤ T
must equal v(T) − v(0). But v(0) = 0 by assumption, and the result is true for any final
time T, so, in particular, setting T = t, and combining both results leads to an expression
for the velocity at any time:
\[ v(t) = \frac{g}{k}\left(1 - e^{-kt}\right). \tag{4.11} \]
We will study the differential equation (4.8) again in Section 9.3.2, in the context of a more
detailed discussion of differential equations.
From our result here, we can also determine how the velocity behaves in the long
term: observe that for $t \to \infty$, the exponential term $e^{-kt} \to 0$, so that
\[ v(t) \to \frac{g}{k}(1 - \text{very small quantity}) \approx \frac{g}{k}. \]
Thus, when drag forces are in effect, the falling object does not continue to accelerate
indeﬁnitely: it eventually attains a terminal velocity. We have seen that this limiting
velocity is v = g/k. The object continues to fall at this (approximately constant) speed as
shown in Figure 4.2. The terminal velocity is also a steady state value of Eqn. (4.8), i.e. a
value of the velocity at which no further change occurs.
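The approach to terminal velocity can be explored numerically. The sketch below is an illustration, not the method of the notes: it evaluates Eqn. (4.11) with the values g = 9.8 m/s² and k = 0.2/s quoted for Figure 4.2, and, as an independent check, steps Eqn. (4.8) forward in time with Euler's method (a simple numerical scheme brought in here as an assumption; the step size is also an assumed value).

```python
import math

G = 9.8  # m/s^2, value quoted for Figure 4.2
K = 0.2  # 1/s, value quoted for Figure 4.2

def velocity_exact(t, g=G, k=K):
    """Eqn. (4.11): v(t) = (g/k)(1 - e^(-k t)), with v(0) = 0."""
    return (g / k) * (1.0 - math.exp(-k * t))

def velocity_euler(t_end, g=G, k=K, dt=0.001):
    """Step dv/dt = g - k v forward from v(0) = 0 using Euler's method."""
    steps = round(t_end / dt)
    v = 0.0
    for _ in range(steps):
        v += (g - k * v) * dt
    return v

for t in (5.0, 10.0, 20.0, 30.0):
    print(t, round(velocity_exact(t), 3), round(velocity_euler(t), 3))
print("terminal velocity g/k =", G / K)
```

Both columns approach g/k, and neither ever exceeds it, in line with the steady state interpretation of the terminal velocity.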
4.3 From rates of change to total change
In this section, we examine several examples in which the rate of change of some process is
specified. We use this information to obtain the total change [17] that occurs over some time
period.
[17] We will use the terminology “total change” and “net change” interchangeably in this section.
Changing temperature
We must carefully distinguish information about the time dependence of some
function from information about the rate of change of some function. Here is an example
of these two different cases, and how we would handle them.
(a) The temperature of a cup of juice is observed to be
\[ T(t) = 25(1 - e^{-0.1t}) \ {}^\circ\text{Celsius} \]
where t is time in minutes. Find the change in the temperature of the juice between
the times t = 1 and t = 5.
(b) The rate of change of temperature of a cup of coffee is observed to be
\[ f(t) = 8e^{-0.2t} \ {}^\circ\text{Celsius per minute} \]
where t is time in minutes. What is the total change in the temperature between
t = 1 and t = 5 minutes?
Solutions
(a) In this case, we are given the temperature as a function of time. To determine what
net change occurred between times t = 1 and t = 5, we find the temperatures at
each time point and subtract. That is, the change in temperature between times t = 1
and t = 5 is simply
\[ T(5) - T(1) = 25(1 - e^{-0.5}) - 25(1 - e^{-0.1}) = 25(e^{-0.1} - e^{-0.5}) = 25(0.9048 - 0.6065) \approx 7.46 \ {}^\circ\text{Celsius}. \]
(b) Here, we do not know the temperature at any time, but we are given information
about the rate of change. (Carefully note the subtle difference in the wording.)
To get the total change, we would sum up all the small changes, f(t)∆t (over N
subintervals of duration ∆t = (5 − 1)/N = 4/N) for t starting at 1 and ending
at 5 min. We obtain a sum of the form $\sum f(t_k)\Delta t$ where $t_k$ is the k’th time point.
Finally, we take a limit as the number of subintervals increases (N → ∞). By now,
we recognize that this amounts to a process of integration. Based on this variation
of the same concept we can take the usual shortcut of integrating the rate of change,
f(t), from t = 1 to t = 5. To do so, we apply the Fundamental Theorem as before,
reducing the amount of computation to finding antiderivatives. We compute:
\[ I = \int_1^5 f(t)\,dt = \int_1^5 8e^{-0.2t}\,dt = -40e^{-0.2t}\Big|_1^5 = -40e^{-1} + 40e^{-0.2}, \]
\[ I = 40(e^{-0.2} - e^{-1}) = 40(0.8187 - 0.3679) \approx 18.0 \ {}^\circ\text{Celsius}. \]
Only in the second case did we need to use a deﬁnite integral to ﬁnd a net change, since we
were given the way that the rate of change depended on time. Recognizing the subtleties
of the wording in such examples will be an important skill that the reader should gain.
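The two cases can be contrasted in a few lines of Python. This is a sketch for illustration (the variable names and the number of subintervals N are assumptions): in case (a) the function itself is known, so the change is a simple subtraction; in case (b) only the rate is known, so the change is built up as a sum of small changes f(t_k)∆t and compared with the antiderivative result.

```python
import math

def T(t):
    """Case (a): temperature of the juice, in degrees Celsius, at time t (minutes)."""
    return 25.0 * (1.0 - math.exp(-0.1 * t))

def f(t):
    """Case (b): rate of change of the coffee temperature, degrees Celsius per minute."""
    return 8.0 * math.exp(-0.2 * t)

# (a) The temperature is given as a function of time: subtract the two values.
change_a = T(5) - T(1)

# (b) Only the rate is given: sum f(t_k) * dt over N subintervals of [1, 5],
#     using the midpoint of each subinterval as the time point t_k.
N = 100_000
dt = (5 - 1) / N
change_b_sum = sum(f(1 + (k + 0.5) * dt) for k in range(N)) * dt

# (b) again, using the antiderivative -40 e^(-0.2 t).
change_b_exact = 40.0 * (math.exp(-0.2) - math.exp(-1.0))

print(change_a, change_b_sum, change_b_exact)
```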
Figure 4.3. Growth rates of two trees over a four year period. Tree 1 initially has
a higher growth rate, but tree 2 catches up and grows faster after year 3.
(Axes: growth rate versus year, from 0 to 4; the two curves are labelled 1 and 2.)
4.3.1 Tree growth rates
The rate of growth in height for two species of trees (in feet per year) is shown in Figure 4.3.
If the trees start at the same height, which tree is taller after 1 year? After 4 years?
Solution
In this problem we are provided with a sketch, rather than a formula for the growth rate of
the trees. Our solution will thus be qualitative (i.e. descriptive), rather than quantitative.
(This means we do not have to calculate anything; rather, we have to make some important
observations about the behaviour shown in Fig 4.3.)
We recognize that the net change in height of each tree is of the form
\[ H_i(T) - H_i(0) = \int_0^T g_i(t)\,dt, \qquad i = 1, 2, \]
where i = 1 for tree 1, i = 2 for tree 2, $g_i(t)$ is the growth rate as a function of time
(shown for each tree in Figure 4.3) and $H_i(t)$ is the height of tree “i” at time t. But, by the
Fundamental Theorem of Calculus, this definite integral corresponds to the area under the
curve $g_i(t)$ from t = 0 to t = T. Thus we must interpret the net change in height for each
tree as the area under its growth curve. We see from Figure 4.3 that at t = 1 year, the area
under the curve for tree 1 is greater, so it has grown more. At t = 4 years the area under
the second curve is greatest so tree 2 has grown the most by that time.
4.3.2 Radius of a tree trunk
The trunk of a tree, assumed to have the shape of a cylinder, grows incrementally, so that its
cross-section consists of “rings”. In years of plentiful rain and adequate nutrients, the tree
grows faster than in years of drought or poor soil conditions. Suppose the rainfall pattern
has been cyclic, so that, over a period of 14 years, the growth rate of the radius of the tree
trunk (in cm/year) is given by the function
\[ f(t) = 1.5 + \sin(\pi t/5), \]
as shown in Figure 4.4. Let the height of the tree trunk be approximately constant over this
period, and assume that the density of the trunk is approximately 1 gm/cm$^3$.
Figure 4.4. Rate of change of radius, f(t), for a growing tree over a period of 14 years.
(Axes: time t from 0 to 14; f(t) from 0 to 3.)
(a) If the radius was initially $r_0$ at time t = 0, what will the radius of the trunk be at
time t later?
(b) What is the ratio of the mass of the tree trunk at t = 10 years and t = 0 years?
(i.e. find the ratio mass(10)/mass(0).)
Solution
(a) Let R(t) denote the trunk’s radius at time t. The rate of change of the radius of the tree
is given by the function f(t), and we are told that at t = 0, $R(0) = r_0$. A graph of this
growth rate over the first 14 years is shown in Figure 4.4. The net change in the radius
is
\[ R(t) - R(0) = \int_0^t f(s)\,ds = \int_0^t (1.5 + \sin(\pi s/5))\,ds. \]
Integrating the above, we get
\[ R(t) - R(0) = \left.\left(1.5s - \frac{\cos(\pi s/5)}{\pi/5}\right)\right|_0^t. \]
Here we have used the fact that the antiderivative of sin(ax) is −(cos(ax)/a).
Thus, using the initial value, $R(0) = r_0$ (which is a constant), and evaluating the
integral, leads to
\[ R(t) = r_0 + 1.5t - \frac{5\cos(\pi t/5)}{\pi} + \frac{5}{\pi}. \]
(The constant at the end of the expression stems from the fact that cos(0) = 1.) A graph
of the radius of the tree over time (using $r_0 = 1$) is shown in Figure 4.5. This function
is equivalent to the area associated with the function shown in Figure 4.4. Notice that
Figure 4.5 confirms that the radius keeps growing over the entire period, but that its growth
rate (slope of the curve) alternates between higher and lower values.
Figure 4.5. The radius of the tree, R(t), as a function of time, obtained by integrating
the rate of change of radius shown in Fig. 4.4.
(Axes: time t from 0 to 14; R(t) from 0 to 25.)
After ten years we have
\[ R(10) = r_0 + 15 - \frac{5}{\pi}\cos(2\pi) + \frac{5}{\pi}. \]
But cos(2π) = 1, so
\[ R(10) = r_0 + 15. \]
(b) The mass of the tree is density times volume, and since the density in this example
is constant, 1 gm/cm$^3$, we need only obtain the volumes at t = 10 and t = 0. Taking the trunk to be
cylindrical means that the volume at any given time is
\[ V(t) = \pi [R(t)]^2 h. \]
The ratio we want is then
\[ \frac{V(10)}{V(0)} = \frac{\pi [R(10)]^2 h}{\pi r_0^2 h} = \frac{[R(10)]^2}{r_0^2} = \left(\frac{r_0 + 15}{r_0}\right)^2. \]
In this problem we used simple antidifferentiation to compute the desired total change.
We also related the graph of the radial growth rate in Fig. 4.4 to that of the resulting radius
at time t, in Fig. 4.5.
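The formula for R(t) and the mass ratio can be verified with a short Python sketch. This is an illustration under stated assumptions: the function names are hypothetical, and the initial radius of 1 cm is the value used for Figure 4.5. The sketch compares the closed form for the radius with a midpoint Riemann sum of the growth rate, and then evaluates the mass ratio.

```python
import math

def growth_rate(t):
    """f(t) = 1.5 + sin(pi t / 5), in cm/year."""
    return 1.5 + math.sin(math.pi * t / 5.0)

def radius(t, r0):
    """R(t) = r0 + 1.5 t - (5/pi) cos(pi t / 5) + 5/pi."""
    return r0 + 1.5 * t - (5.0 / math.pi) * math.cos(math.pi * t / 5.0) + 5.0 / math.pi

def radius_by_summation(t, r0, n=100_000):
    """r0 plus a midpoint Riemann sum of the growth rate over 0 <= s <= t."""
    ds = t / n
    return r0 + sum(growth_rate((j + 0.5) * ds) for j in range(n)) * ds

r0 = 1.0  # cm, the value used for Figure 4.5
print(radius(10.0, r0), radius_by_summation(10.0, r0))  # both close to r0 + 15

# The height h and the density cancel in the ratio mass(10)/mass(0).
mass_ratio = (radius(10.0, r0) / r0) ** 2
print(mass_ratio)
```

With this initial radius the trunk is about 16 cm in radius after ten years, so the mass has grown by a factor of about 256.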
4.3.3 Birth rates and total births
After World War II, the birth rate in western countries increased dramatically. Suppose that
the number of babies born (in millions per year) was given by
b(t) = 5 + 2t, 0 ≤ t ≤ 10,
where t is time in years after the end of the war.
(a) How many babies in total were born during this time period (i.e. in the first 10 years
after the war)?
(b) Find the time $T_0$ such that the total number of babies born from the end of the war
up to the time $T_0$ was precisely 14 million.
Solution
(a) To find the number of births, we would integrate the birth rate, b(t), over the given
time period. The net change in the population due to births (neglecting deaths) is
\[ P(10) - P(0) = \int_0^{10} b(t)\,dt = \int_0^{10} (5 + 2t)\,dt = (5t + t^2)\Big|_0^{10} = 50 + 100 = 150 \ \text{[million babies]}. \]
(b) Denote by T the time at which the total number of babies born was 14 million (this is
the time called $T_0$ in the question). Then, [in units of million]
\[ I = \int_0^T b(t)\,dt = \int_0^T (5 + 2t)\,dt = 5T + T^2. \]
Equating I = 14 leads to the quadratic equation $T^2 + 5T - 14 = 0$, which can be
written in the factored form, (T − 2)(T + 7) = 0. This has two solutions, but we
reject T = −7 since we are looking for time after the War. Thus we find that T = 2
years, i.e. it took two years for 14 million babies to have been born.
While this problem involves simple integration, we had to solve for a quantity (T) based
on information about behaviour of that integral. Many problems in real application involve
such slight twists on the ideas of integration.
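Solving for the upper limit of an integral, as in part (b), can be sketched in Python as follows. The function names are hypothetical and the use of the quadratic formula (rather than factoring) is a choice made for this illustration.

```python
import math

def total_births(T):
    """Integral of b(t) = 5 + 2t from 0 to T, in millions: 5T + T^2."""
    return 5.0 * T + T * T

def time_to_reach(target):
    """Positive root of T^2 + 5T - target = 0, from the quadratic formula."""
    return (-5.0 + math.sqrt(25.0 + 4.0 * target)) / 2.0

print(total_births(10.0))   # part (a): total over the first 10 years, in millions
print(time_to_reach(14.0))  # part (b): time at which 14 million have been born
```

Only the positive root is returned, which mirrors the rejection of T = −7 in the solution above.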
4.4 Production and removal
The process of integration can be used to convert rates of production and removal into net
amounts present at a given time. The example in this section is of this type. We investigate
a process in which a substance accumulates as it is being produced, but disappears through
some removal process. We would like to determine when the quantity of material increases,
and when it decreases.
Circadian rhythm in hormone levels
Consider a hormone whose level in the blood at time t will be denoted by H(t). We will
assume that the level of hormone is regulated by two separate processes: one might be
the secretion rate of specialized cells that produce the hormone. (The production rate of
hormone might depend on the time of day, in some cyclic pattern that repeats every 24
hours or so.) This type of cyclic pattern is called circadian rhythm. A competing process
might be the removal of hormone (or its deactivation by some enzymes secreted by other
cells). In this example, we will assume that both the production rate, p(t), and the removal
rate, r(t), of the hormone are time-dependent, periodic functions with somewhat different
phases.
Figure 4.6. The rate of hormone production p(t) and the rate of removal r(t) are
shown here. We want to use these graphs to deduce when the level of hormone is highest
and lowest. (Axes: hormone production/removal rates versus hour t over 24 hours, with
noon marked at the centre.)
A typical example of two such functions is shown in Figure 4.6. This figure shows
the production and removal rates over a period of 24 hours, starting at midnight. Our ﬁrst
task will be to use properties of the graph in Figure 4.6 to answer the following questions:
1. When is the production rate, p(t), maximal?
2. When is the removal rate r(t) minimal?
3. At what time is the hormone level in the blood highest?
4. At what time is the hormone level in the blood lowest?
5. Find the maximal level of hormone in the blood over the period shown, assuming
that its basal (lowest) level is H = 0.
Solutions
1. We see directly from Fig. 4.6 that production rate is maximal at about 9:00 am.
2. Similarly, removal rate is minimal at noon.
3. To answer this question we note that the total amount of hormone produced over a
time period a ≤ t ≤ b is
\[ P_{\text{total}} = \int_a^b p(t)\,dt. \]
The total amount removed over time interval a ≤ t ≤ b is
\[ R_{\text{total}} = \int_a^b r(t)\,dt. \]
This means that the net change in hormone level over the given time interval (amount
produced minus amount removed) is
\[ H(b) - H(a) = P_{\text{total}} - R_{\text{total}} = \int_a^b (p(t) - r(t))\,dt. \]
We interpret this integral as the area between the curves p(t) and r(t). But we
must use caution here: For any time interval over which p(t) > r(t), this integral
will be positive, and the hormone level will have increased. Otherwise, if p(t) <
r(t), the integral yields a negative result, so that the hormone level has decreased.
This makes simple intuitive sense: If production is greater than removal, the level
of the substance is accumulating, whereas in the opposite situation, the substance is
decreasing. With these remarks, we find that the hormone level in the blood will be
greatest at 3:00 pm, when the greatest (positive) area between the two curves has
been obtained.
4. Similarly, the least hormone level occurs after a period in which the removal rate has
been larger than production for the longest stretch. This occurs at 3:00 am, just as
the curves cross each other.
5. Here we will practice integration by actually fitting some cyclic functions to the
graphs shown in Figure 4.6. Our first observation is that if the length of the cycle
(also called the period) is 24 hours, then the frequency of the oscillation is ω =
(2π)/24 = π/12. We further observe that the functions shown in Figure 4.7
have the form
\[ p(t) = A(1 + \sin(\omega t)), \qquad r(t) = A(1 + \cos(\omega t)). \]
Intersection points occur when
\[ p(t) = r(t), \]
\[ A(1 + \sin(\omega t)) = A(1 + \cos(\omega t)), \]
\[ \sin(\omega t) = \cos(\omega t), \]
\[ \Rightarrow \tan(\omega t) = 1. \]
This last equality leads to ωt = π/4, 5π/4. But then, the fact that ω = π/12 implies
that t = 3, 15. Thus, over the time period 3 ≤ t ≤ 15 hrs, the hormone level is
increasing. For simplicity, we will take the amplitude A = 1. (In general, this would
just be a multiplicative constant in whatever solution we compute.) Then the net
increase in hormone over this period is calculated from the integral
\[ H_{\text{total}} = \int_3^{15} [p(t) - r(t)]\,dt = \int_3^{15} [(1 + \sin(\omega t)) - (1 + \cos(\omega t))]\,dt. \]
In the problem set, the reader is asked to compute this integral and to show that the
amount of hormone that accumulated over the time interval 3 ≤ t ≤ 15, i.e. between
3:00 am and 3:00 pm is $24\sqrt{2}/\pi$.
Figure 4.7. The functions shown above are trigonometric approximations to the
rates of hormone production and removal from Figure 4.6. (Axes: t from 0 to 24 hours;
rates from 0 to 2.)
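The value quoted for the accumulated hormone can be checked numerically before working the integral by hand. The Python sketch below is an illustration only; the helper `midpoint_integral` and the number of subintervals are assumptions. It approximates the integral of p(t) − r(t) with A = 1 over 3 ≤ t ≤ 15 and compares the result with 24√2/π.

```python
import math

OMEGA = math.pi / 12.0  # frequency for a 24-hour period

def net_rate(t):
    """p(t) - r(t) with A = 1, i.e. sin(omega t) - cos(omega t)."""
    return math.sin(OMEGA * t) - math.cos(OMEGA * t)

def midpoint_integral(func, a, b, n=100_000):
    """Midpoint Riemann sum of func over [a, b] with n subintervals."""
    h = (b - a) / n
    return sum(func(a + (j + 0.5) * h) for j in range(n)) * h

accumulated = midpoint_integral(net_rate, 3.0, 15.0)
expected = 24.0 * math.sqrt(2.0) / math.pi
print(accumulated, expected)
```

The net rate vanishes at t = 3 and t = 15, which is consistent with these being the times at which the two curves cross.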
4.5 Present value of a continuous income stream
Here we discuss the value of an annuity, which is a kind of savings account that guarantees
a continuous stream of income. You would like to pay P dollars to purchase an annuity that
will pay you an income f(t) every year from now on, for t > 0. In some cases, we might
want a constant income every year, in which case f(t) would be constant. More generally,
we can consider the case that at each future year t, we ask for income f(t) that could vary
from year to year. If the bank interest rate is r, how much should you pay now?
Solution
If we invest P dollars (the “principal”, i.e., the amount deposited) in the bank with interest
rate r, compounded n times per year, then the amount A(t) in the account at time t (in years) will
grow as follows:
\[ A(t) = P\left(1 + \frac{r}{n}\right)^{nt}, \]
where r is the yearly interest rate (e.g. 5%) and n is the number of times per year that
interest is compounded (e.g. n = 2 means interest compounded twice per year, n = 12
means monthly compounded interest, etc.). Define
\[ h = \frac{r}{n}. \quad \text{Then} \quad n = \frac{r}{h}. \]
Then at time t, we have that
\[ A(t) = P(1 + h)^{\frac{1}{h}rt} = P\left[(1 + h)^{\frac{1}{h}}\right]^{rt} \approx Pe^{rt} \quad \text{for large } n \text{ or small } h. \]
Here we have used the fact that when h is small (i.e. frequent intervals of compounding)
the expression in square brackets above can be approximated by e, the base of the natural
logarithms. Recall that
\[ e = \lim_{h \to 0} (1 + h)^{\frac{1}{h}}. \]
(This result was obtained in a first semester calculus course by selecting the base of exponentials
such that the derivative of $e^x$ is just $e^x$ itself.) Thus, we have found that the amount
in the bank at time t will grow as
\[ A(t) = Pe^{rt} \quad \text{(assuming continually compounded interest)}. \tag{4.12} \]
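The passage from compounding n times per year to continual compounding can be seen numerically. In the Python sketch below, the principal of $1000 and the time span of 10 years are assumed values chosen only for illustration; the rate of 5% is the example rate mentioned in the text.

```python
import math

def compound(P, r, n, t):
    """Balance after t years with yearly rate r compounded n times per year."""
    return P * (1.0 + r / n) ** (n * t)

def continuous(P, r, t):
    """Eqn. (4.12): balance with continually compounded interest."""
    return P * math.exp(r * t)

P, r, t = 1000.0, 0.05, 10.0  # P and t are assumed illustrative values
for n in (1, 2, 12, 365, 10_000):
    print(n, round(compound(P, r, n, t), 2))
print("continuous:", round(continuous(P, r, t), 2))
```

As n increases, the balance creeps upward toward the continual-compounding value, as the limit defining e suggests.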
Having established the exponential growth of an investment, we return to the question
of how to set up an annuity for a continuous stream of income in the future. Rewriting
Eqn. (4.12), the principal amount that we should invest in order to have A(t) to spend at
time t is
\[ P = A(t)e^{-rt}. \]
Suppose we want to have f(t) spending money for each year t. We refer to the present
value of year t as the quantity
\[ P = f(t)e^{-rt}. \]
(i.e. we must pay P now, in the present, to get f(t) in a future year t.) Summing over all
the years, we find that the present value of the continuous income stream is
\[ P = \sum_{t=1}^{L} f(t)e^{-rt} \cdot \underbrace{1}_{\Delta t} \approx \int_0^L f(t)e^{-rt}\,dt, \]
where L is the expected number of years left in the lifespan of the individual to whom this
annuity will be paid, and where we have approximated a sum of payments by an integral
(of a continuous income stream). One problem is that we do not know in advance how long
the lifespan L will be. As a crude approximation, we could assume that this income stream
continues forever, i.e. that L ≈ ∞. In such an approximation, we have to compute the
integral:
\[ P = \int_0^\infty f(t)e^{-rt}\,dt. \tag{4.13} \]
The integral in Eqn. (4.13) is an improper integral (i.e. integral over an unbounded domain),
as we have already encountered in Section 3.8.5. We shall have more to say about
such integrals, and about their technical definition, existence, and properties
in Chapter 10. We refer to the quantity
\[ P = \int_0^\infty f(t)e^{-rt}\,dt, \tag{4.14} \]
as the present value of a continuous income stream f(t).
Example: Setting up an annuity
Suppose we want an annuity that provides us with an annual payment of $10,000 from the
bank, i.e. in this case f(t) = $10,000 is a function that has a constant value for every year.
Then from Eqn. (4.14),
\[ P = \int_0^\infty 10000e^{-rt}\,dt = 10000\int_0^\infty e^{-rt}\,dt. \]
By a previous calculation in Section 3.8.5, we find that
\[ P = 10000 \cdot \frac{1}{r}, \]
e.g. if the interest rate is 5% (and assumed constant over future years), then
\[ P = \frac{10000}{0.05} = \$200{,}000. \]
Therefore, we need to pay $200,000 today to get $10,000 annually for every future year.
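To see how crude the approximation L ≈ ∞ is, one can compute the present value for finite lifespans. The following Python sketch is an illustration: the function names, the sample lifespans and the midpoint-sum discretization are assumptions. It approximates the integral of f(t)e^{−rt} over 0 ≤ t ≤ L for the constant income stream of the example and compares with the limiting value 10000/r.

```python
import math

def present_value(f, r, L, n=100_000):
    """Midpoint Riemann sum of f(t) e^(-r t) over 0 <= t <= L."""
    dt = L / n
    return sum(f((j + 0.5) * dt) * math.exp(-r * (j + 0.5) * dt) for j in range(n)) * dt

def constant_income(t):
    """Income stream of the example: 10,000 dollars per year."""
    return 10_000.0

r = 0.05  # interest rate used in the example
for L in (20, 50, 100, 400):  # assumed sample lifespans, in years
    print(L, round(present_value(constant_income, r, L), 2))
print("limit 10000 / r =", 10_000.0 / r)
```

For this income stream the finite-lifespan value is (10000/r)(1 − e^{−rL}), so a 20-year annuity at 5% costs noticeably less than the $200,000 obtained for an unbounded lifespan.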
4.6 Average value of a function
In this final example, we apply the definite integral to computing the average height of
a function over some interval. First, we define what is meant by average value in this
context. [18] Given a function
\[ y = f(x) \]
over some interval a ≤ x ≤ b, we will define average value of the function as follows:
Definition
The average value of f(x) over the interval a ≤ x ≤ b is
\[ \bar{f} = \frac{1}{b - a}\int_a^b f(x)\,dx. \]
[18] In Chapters 5 and 8, we will encounter a different type of average (also called mean) that will represent an
average horizontal position or center of mass. It is important to avoid confusing these distinct notions.
Example 1
Find the average value of the function $y = f(x) = x^2$ over the interval 2 < x < 4.
Solution
\[ \bar{f} = \frac{1}{4 - 2}\int_2^4 x^2\,dx = \frac{1}{2}\,\frac{x^3}{3}\Big|_2^4 = \frac{1}{6}\left(4^3 - 2^3\right) = \frac{28}{3}. \]
Example 2: Day length over the year
Suppose we want to know the average length of the day during summer and spring. We
will assume that day length follows a simple periodic behaviour, with a cycle length of 1
year (365 days). Let us measure time t in days, with t = 0 at the spring equinox, i.e. the
date in spring when night and day lengths are equal, each being 12 hrs. We will refer to
the number of daylight hours on day t by the function f(t). (We will also call f(t) the
day length on day t.)
A simple function that would describe the cyclic changes of day length over the
seasons is
\[ f(t) = 12 + 4 \sin\left( \frac{2\pi t}{365} \right). \]
This is a periodic function with period 365 days as shown in Figure 4.8. Its maximal value
is 16h and its minimal value is 8h. The average day length over spring and summer, i.e.
over the ﬁrst (365/2) ≈ 182 days is:
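\[
\begin{aligned}
\bar{f} &= \frac{1}{182} \int_0^{182} f(t)\, dt \\
&= \frac{1}{182} \int_0^{182} \left( 12 + 4 \sin\left( \frac{\pi t}{182} \right) \right) dt \\
&= \frac{1}{182} \left[ 12t - \frac{4 \cdot 182}{\pi} \cos\left( \frac{\pi t}{182} \right) \right]_0^{182} \\
&= \frac{1}{182} \left( 12 \cdot 182 - \frac{4 \cdot 182}{\pi} \left[ \cos(\pi) - \cos(0) \right] \right) \\
&= 12 + \frac{8}{\pi} \approx 14.546 \text{ hours}
\end{aligned}
\tag{4.15}
\]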
[Figure 4.8: plot with horizontal axis ticks 0.0 to 365.0 and vertical axis ticks 0.0 to 16.0; the two halves of the horizontal axis are labeled "summer" and "winter".]
Figure 4.8. We show the variations in day length (cyclic curve) as well as the
average day length (height of rectangle) in this figure.
Thus, on average, the day is 14.546 hrs long during the spring and summer.
In Figure 4.8, we show the entire day length cycle over one year. It is left as an
exercise for the reader to show that the average value of f over the entire year is 12 hrs.
(This makes intuitive sense, since overall, the short days in winter will average out with the
longer days in summer.)
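A short numerical sketch can confirm both averages. It assumes a midpoint Riemann sum and uses the half-year length 365/2 exactly, rather than the rounded value 182 used in the calculation above; the function names are illustrative.

```python
import math

def day_length(t):
    """Day length in hours on day t, with t = 0 at the spring equinox."""
    return 12 + 4 * math.sin(2 * math.pi * t / 365)

def average_value(f, a, b, n=100_000):
    """Approximate (1/(b-a)) * integral of f over [a, b] with a midpoint Riemann sum."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx / (b - a)

half_year = average_value(day_length, 0, 365 / 2)  # spring and summer
full_year = average_value(day_length, 0, 365)      # entire year
print(half_year, 12 + 8 / math.pi)
print(full_year)
```

The first average should be close to 12 + 8/π hours and the second close to 12 hours, consistent with the exercise mentioned above.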
Figure 4.8 also shows geometrically what the average value of the function represents.
Suppose we determine the area associated with the graph of f(t) over the interval of
interest. (This area is painted red (dark) in Figure 4.8, where the interval is 0 ≤ t ≤ 365,
i.e. the whole year.) Now let us draw in a rectangle over the same interval (0 ≤ t ≤ 365)
having the same total area. (See the big rectangle in Figure 4.8, and note that its area
matches with the darker, red region.) The height of the rectangle represents the average
value of f(t) over the interval of interest.
4.7 Summary
In this chapter, we arrived at a number of practical applications of the deﬁnite integral.
1. In Section 4.2, we found that for motion at constant acceleration a, the displacement
of a moving object can be obtained by integrating twice: the definite integral
of acceleration is the velocity v(t), and the definite integral of the velocity is the
displacement.
\[ v(t) = v_0 + \int_0^t a\, ds, \qquad x(t) = x(0) + \int_0^t v(s)\, ds. \]
(Here we use the “dummy variable” s inside the integral, but the meaning is, of
course, the same as in the previous presentation of the formulae.) We showed that at
any time t, the position of an object initially at \(x_0\) with velocity \(v_0\) is
\[ x(t) = x_0 + v_0 t + a \frac{t^2}{2}. \]
2. We extended our analysis of a moving object to the case of nonconstant acceleration
(Section 4.2.4), when air resistance tends to produce a drag force to slow the motion
of a falling object. We found that in that case, the acceleration gradually decreases,
\(a(t) = g e^{-kt}\). (The decaying exponential means that a → 0 as t increases.) Again,
using the definite integral, we could compute a velocity,
\[ v(t) = \int_0^t a(s)\, ds = \frac{g}{k} \left( 1 - e^{-kt} \right). \]
3. We illustrated the connection between rates of change (over time) and total change
(between one time point and another). In general, we saw that if r(t) represents a rate
of change of some process, then
\[ \int_a^b r(s)\, ds = \text{Total change over the time interval } a \le t \le b. \]
This idea was discussed in Section 4.3.
4. In the concluding Section 4.6, we discussed the average value of a function f(x)
over some interval a ≤ x ≤ b,
\[ \bar{f} = \frac{1}{b-a} \int_a^b f(x)\, dx. \]
In the next few chapters we encounter a vast assortment of further examples and
practical applications of the definite integral, to such topics as mass, volumes, length,
etc. In some of these we will be called on to “dissect” a geometric shape into pieces
that are not simple rectangles. The essential idea of an integral as a sum of many
(infinitesimally) small pieces will nevertheless be the same.
Chapter 5
Applications of the definite integral to calculating volume, mass, and length
5.1 Introduction
In this chapter, we consider applications of the deﬁnite integral to calculating geometric
quantities such as volumes of geometric solids, masses, centers of mass, and lengths of
curves.
The idea behind the procedure described in this chapter is closely related to those we
have become familiar with in the context of computing areas. That is, we ﬁrst imagine an
approximation using a ﬁnite number of pieces to represent a desired result. Then, a limiting
process of refinement leads to the desired result. The technology of the definite integral,
developed in Chapters 2 and 3, applies directly. This means that we need not rederive
the link between Riemann sums and the definite integral; we can use these as we did in
Chapter 4.
In the ﬁrst parts of this chapter, we will calculate the total mass of a continuous
density distribution. In this context, we will also deﬁne the concept of a center of mass.
We will ﬁrst motivate this concept for a discrete distribution made up of a number of ﬁnite
masses. Then we show how the same concept can be applied in the continuous case. The
connection between the discrete and continuous representation will form an important link
with our study of analogous concepts in probability, in Chapters 7 and 8.
In the second part of this chapter, we will consider how to dissect certain three-dimensional
solids into a set of simpler parts whose volumes are easy to compute. We will use
familiar formulae for the volumes of disks and cylindrical shells, and carefully construct
a summation to represent the desired volume. The volume of the entire object will then
be obtained by summing up volumes of a stack of disks or a set of embedded shells, and
considering the limit as the thickness of the dissection cuts gets thinner. There are some
important differences between material in this chapter and in previous chapters. Calculating
volumes will stretch our imagination, requiring us to visualize 3-dimensional (3D) objects,
and how they can be subdivided into shells or slices. Most of our effort will be aimed at
understanding how to set up the needed integral. We provide a number of examples of this
procedure, but first we review the basics of elementary volumes that will play the dominant
role in our calculations.
5.2 Mass distributions in one dimension
We start our discussion with a number of examples of mass distributed along a single dimension.
First, we consider a discrete collection of masses and then generalize to a continuous
density. We will be interested in computing the total mass (by summation or integration)
as well as other properties of the distribution.
In considering the example of mass distributions, it becomes an easy step to develop
the analogous concepts for continuous distributions. This allows us to recapitulate the link
between ﬁnite sums and deﬁnite integrals that we developed in earlier chapters. Examples
in this chapter also further reinforce the idea of density (in the context of mass density).
Later, we will ﬁnd that these ideas are equally useful in the context of probability, explored
in Chapters 7 and 8.
5.2.1 A discrete distribution: total mass of beads on a wire
[Figure 5.1: five beads with masses m_1, ..., m_5 located at positions x_1, ..., x_5 along a wire.]
Figure 5.1. A discrete distribution of masses along a (one dimensional) wire.
In Figure 5.1 we see a number of beads distributed along a thin wire. We will label
each bead with an index, i = 1 . . . n (there are ﬁve beads so that n = 5). Each bead has a
certain position (that we will think of as the value of \(x_i\)) and a mass that we will call \(m_i\).
We will think of this arrangement as a discrete mass distribution: both the masses of the
beads, and their positions are of interest to us. We would like to describe some properties
of this distribution.
The total mass of the beads, M, is just the sum of the individual masses, so that
\[ M = \sum_{i=1}^{n} m_i. \tag{5.1} \]
5.2.2 A continuous distribution: mass density and total mass
We now consider a continuous mass distribution where the mass per unit length (“density”)
changes gradually from one point to another. For example, the bar in Figure 5.2 has a
density that varies along its length.
The portion at the left is made of lighter material, or has a lower density than the
portions further to the right. We will denote that density by ρ(x) and this carries units of
mass per unit length. (The density of the material along the length of the bar is shown in
the graph.) How would we ﬁnd the total mass?
Suppose the bar has length L and let x (0 ≤ x ≤ L) denote position along that bar.
Let us imagine dividing up the bar into small pieces of length ∆x as shown in Figure 5.2.
[Figure 5.2: top, a bar along the x axis with its mass distribution ρ(x); bottom, the bar cut into pieces of length ∆x with masses m_1, m_2, ..., m_n at positions x_1, x_2, ..., x_n.]
Figure 5.2. Top: A continuous mass distribution along a one dimensional bar,
discussed in Example 5.3.3. The density of the bar (mass per unit length), ρ(x), is shown
on the graph. Bottom: the discretized approximation of this same distribution. Here we
have subdivided the bar into n smaller pieces, each of length ∆x. The mass of each piece
is approximately \(m_k = \rho(x_k)\Delta x\) where \(x_k = k\Delta x\). The total mass of the bar (“sum of
all the pieces”) will be represented by an integral (5.2) as we let the size, ∆x, of the pieces
become infinitesimal.
The coordinates of the endpoints of those pieces are then
\[ x_0 = 0, \; \ldots, \; x_k = k\Delta x, \; \ldots, \; x_N = L \]
and the corresponding masses of each of the pieces are approximately
\[ m_k = \rho(x_k)\Delta x. \]
(Observe that units are correct, that is \(\text{mass}_k\) = (mass/length) · length. Note that ∆x has units
of length.) The total mass is then a sum of masses of all the pieces, and, as we have seen in
an earlier chapter, this sum will approach the integral
\[ M = \int_0^L \rho(x)\, dx \tag{5.2} \]
as we make the size of the pieces smaller.
We can also deﬁne a cumulative function for the mass distribution as
\[ M(x) = \int_0^x \rho(s)\, ds. \tag{5.3} \]
Then M(x) is the total mass in the part of the interval between the left end (assumed at
0) and the position x. The idea of a cumulative function becomes useful in discussions of
probability in Chapter 8.
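The passage from a sum of small masses to the integrals (5.2) and (5.3) can be illustrated numerically. The sketch below sums the piece masses ρ(x_k)∆x with x_k = k∆x for a large number of pieces. The density ρ(x) = 2 + x, the bar length 3, and the function names are hypothetical choices made only for this illustration.

```python
def total_mass(rho, length, n=100_000):
    """Sum m_k = rho(x_k) * dx over the pieces, with x_k = k * dx (right endpoints)."""
    dx = length / n
    return sum(rho(k * dx) * dx for k in range(1, n + 1))

def cumulative_mass(rho, x, n=100_000):
    """Approximate M(x): the mass between the left end (at 0) and position x."""
    return total_mass(rho, x, n)

# Hypothetical density, chosen only for illustration: rho(x) = 2 + x on a bar of length 3.
rho = lambda x: 2 + x
M = total_mass(rho, 3.0)            # exact integral: 2*3 + 3**2/2 = 10.5
M_at_1 = cumulative_mass(rho, 1.0)  # exact integral: 2*1 + 1**2/2 = 2.5
print(M, M_at_1)
```

As the number of pieces grows, the sums approach the exact values of the integrals noted in the comments.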
5.2.3 Example: Actin density inside a cell
[Figure 5.3: panels (a) to (d). Labels: nucleus, actin cortex. In (d), the density ρ = 1 − x^2 is plotted for −1 ≤ x ≤ 1.]
Figure 5.3. A cell (keratocyte) shown in (a) has a dense distribution of actin
in a band called the actin cortex. In (b) we show a schematic sketch of the actin cortex
(shaded). In (c) that band of actin is scaled and straightened out so that it occupies a
length corresponding to the interval −1 ≤ x ≤ 1. We are interested in the distribution
of actin filaments across that band. That distribution is shown in (d). Note that actin is
densest in the middle of the band. (a) Credit to Alex Mogilner.
Biologists often describe the density of protein, receptors, or other molecules in cells. One
example is shown in Fig. 5.3. Here we show a keratocyte, which is a cell from the scale
of a fish. Actin filaments (protein responsible for structure and motion of the
cell) are found at the edge of the cell in a band called the actin cortex. It has been found
experimentally that the density of actin is greatest in the middle of the band, i.e. the position
corresponding to the midpoint of the edge of the cell shown in Fig. 5.3a. According to
Alex Mogilner[19], the density of actin across the cortex in filaments per edge µm is well
approximated by a distribution of the form
\[ \rho(x) = \alpha(1 - x^2), \qquad -1 \le x \le 1, \]
where x is the fraction of distance[20] from midpoint to the end of the band (Fig. 5.3c and d).
Here ρ(x) is an actin filament density in units of filaments per µm. That is, ρ is the number
of actin fibers per unit length.
[19] Alex Mogilner is a professor of mathematics who specializes in cell biology and the actin cytoskeleton.
[20] Note that 1 µm (read “1 micrometer” or “micron”) is \(10^{-6}\) meters, and is appropriate for measuring lengths
of small objects such as cells.
We can ﬁnd the total number of actin ﬁlaments, N in the band by integration, i.e.
\[ N = \int_{-1}^{1} \alpha(1 - x^2)\, dx = \alpha \int_{-1}^{1} (1 - x^2)\, dx. \]
The integral above has already been computed (Integral 2) in Examples 3.6.2 of Chapter 3
and was found to be 4/3. Thus, we have that there are N = 4α/3 actin filaments in
the band.
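The value 4α/3 can be checked with a quick numerical sketch. The value α = 3 used here is a hypothetical choice (so that the expected count is 4), and the function name and midpoint rule are assumptions made for illustration.

```python
def filament_count(alpha, n=100_000):
    """Midpoint Riemann sum of rho(x) = alpha * (1 - x^2) over -1 <= x <= 1."""
    dx = 2 / n
    total = 0.0
    for k in range(n):
        x = -1 + (k + 0.5) * dx   # midpoint of the k-th subinterval
        total += alpha * (1 - x**2) * dx
    return total

alpha = 3.0                        # hypothetical value, for illustration only
N = filament_count(alpha)
print(N, 4 * alpha / 3)
```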
5.3 Mass distribution and the center of mass
It is useful to describe several other properties of mass distributions. We ﬁrst deﬁne the
“center of mass”, \(\bar{x}\), which is akin to an average x coordinate.
5.3.1 Center of mass of a discrete distribution
The center of mass \(\bar{x}\) of a mass distribution is given by:
\[ \bar{x} = \frac{1}{M} \sum_{i=1}^{n} x_i m_i. \]
This can also be written in the form
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i m_i}{\sum_{i=1}^{n} m_i}. \]
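For a concrete feel for this formula, here is a minimal sketch with five beads. The positions and masses are made-up numbers, and the function name `center_of_mass` is an illustrative choice.

```python
def center_of_mass(positions, masses):
    """Return (sum of x_i * m_i) / (sum of m_i) for a discrete mass distribution."""
    total_mass = sum(masses)
    weighted_sum = sum(x * m for x, m in zip(positions, masses))
    return weighted_sum / total_mass

# Hypothetical beads: positions x_1..x_5 and masses m_1..m_5.
positions = [1.0, 2.0, 3.0, 4.0, 5.0]
masses = [2.0, 1.0, 1.0, 1.0, 5.0]
x_bar = center_of_mass(positions, masses)
print(x_bar)  # (2 + 2 + 3 + 4 + 25) / 10 = 3.6
```

The heavy bead at x = 5 pulls the center of mass to the right of the geometric middle of the wire.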
5.3.2 Center of mass of a continuous distribution
We can generalize the concept of the center of mass for a continuous mass density. Our
usual approach of subdividing the interval 0 ≤ x ≤ L and computing a Riemann sum leads
to
\[ \bar{x} = \frac{1}{M} \sum_{i=1}^{n} x_i \rho(x_i) \Delta x. \]
As ∆x → dx, this becomes an integral. Based on this, it makes sense to deﬁne the center
of mass of the continuous mass distribution as follows:
\[ \bar{x} = \frac{1}{M} \int_0^L x \rho(x)\, dx. \]
We can also write this in the form
\[ \bar{x} = \frac{\int_0^L x \rho(x)\, dx}{\int_0^L \rho(x)\, dx}. \]
5.3.3 Example: Center of mass vs average mass density
Here we distinguish between two (potentially confusing) quantities in the context of an
example.
A long thin bar of length L is made of material whose density varies along the length
of the bar. Let x be distance from one end of the bar. Suppose that the mass density is given
by
ρ(x) = ax, 0 ≤ x ≤ L.
This type of mass density is shown in a panel in Fig. 5.2.
(a) Find the total mass of the bar.
(b) Find the average mass density along the bar.
(c) Find the center of mass of the bar.
(d) Where along the length of the bar should you cut to get two pieces of equal mass?
Solution
(a) From our previous discussion, the total mass of the bar is
\[ M = \int_0^L ax\, dx = \left[ \frac{ax^2}{2} \right]_0^L = \frac{aL^2}{2}. \]
(b) The average mass density along the bar is computed just as one would compute the
average value of a function: integrate the function over an interval and divide by the
length of the interval. An example of this type appeared in Section 4.6. Thus
\[ \bar{\rho} = \frac{1}{L} \int_0^L \rho(x)\, dx = \frac{1}{L} \cdot \frac{aL^2}{2} = \frac{aL}{2}. \]
A bar having a uniform density \(\bar{\rho} = aL/2\) would have the same total mass as the bar
in this example. (This is the physical interpretation of average mass density.)
(c) The center of mass of the bar is
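\[ \bar{x} = \frac{\int_0^L x \rho(x)\, dx}{M} = \frac{1}{M} \int_0^L ax^2\, dx = \frac{a}{M} \left[ \frac{x^3}{3} \right]_0^L = \frac{2a}{aL^2} \cdot \frac{L^3}{3} = \frac{2}{3} L. \]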
Observe that the center of mass is an “average x coordinate”, which is not the same
as the average mass density.
(d) We can use the cumulative function deﬁned in Eqn. (5.3) to ﬁgure out where half of
the mass is concentrated. Suppose we cut the bar at some position x = s. Then the
mass of the part of the bar to the left of the cut is
\[ M_1 = \int_0^s \rho(x)\, dx = \frac{as^2}{2}. \]
We ask for what values of s is it true that \(M_1\) is exactly half the total mass? Using
the result of part (a), we find that for this to be true, we must have
\[ M_1 = \frac{M}{2} \quad \Rightarrow \quad \frac{as^2}{2} = \frac{1}{2} \cdot \frac{aL^2}{2}. \]
Solving for s leads to
\[ s = \frac{1}{\sqrt{2}} L = \frac{\sqrt{2}}{2} L. \]
Thus, cutting the bar at a distance \((\sqrt{2}/2)L\) from x = 0 results in two equal masses.
Remark: the position that subdivides the mass into two equal pieces is analogous to
the idea of a median. This concept will appear again in the context of probability in
Chapter 8.
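The four answers in this example can be checked numerically. The sketch below assumes the hypothetical values a = 2 and L = 3 (so that M = 9, the average density is 3, the center of mass is at 2, and the cut is at 3/√2), a midpoint Riemann sum for the integrals, and a simple bisection search for the cut position. None of these choices come from the notes.

```python
import math

def integrate(f, a, b, n=20_000):
    """Midpoint Riemann sum for the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

a_coef, L = 2.0, 3.0               # hypothetical values for a and L
rho = lambda x: a_coef * x         # density rho(x) = a x

M = integrate(rho, 0, L)                           # (a) total mass
rho_bar = M / L                                    # (b) average mass density
x_bar = integrate(lambda x: x * rho(x), 0, L) / M  # (c) center of mass

# (d) bisection: find s where the cumulative mass M(s) equals M / 2.
lo, hi = 0.0, L
for _ in range(50):
    mid = (lo + hi) / 2
    if integrate(rho, 0, mid) < M / 2:
        lo = mid
    else:
        hi = mid
cut = (lo + hi) / 2

print(M, rho_bar, x_bar, cut, L / math.sqrt(2))
```

Note that the cut position (about 0.707 L) and the center of mass (about 0.667 L) differ, which mirrors the distinction between a median and a mean.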
5.3.4 Physical interpretation of the center of mass
The center of mass has a physical interpretation: it is the point at which the mass would
“balance”. In the Appendix 11.3 we discuss this in detail.
5.4 Miscellaneous examples and related problems
The idea of mass density can be extended to related problems of various kinds. We give
some examples in this section.
Up to now, we have seen examples of mass distributed in one dimension: beads on a
wire, actin density along the edge of a cell, (in Chapter 4), or a bar of varying density. For
the continuous distributions, we determined the total mass by integration. Underlying the
integral we computed was the idea that the interval could be “dissected” into small parts
(of width ∆x), and a sum of pieces transformed into an integral. In the next examples, we
consider similar ideas, but instead of dissecting the region into 1-dimensional intervals, we
have slightly more interesting geometries.
5.4.1 Example: A glucose density gradient
A cylindrical test tube of radius r and height h contains a solution of glucose which has
been prepared so that the concentration of glucose is greatest at the bottom and decreases
gradually towards the top of the tube. (This is called a density gradient.) Suppose that the
concentration c as a function of the depth x is c(x) = 0.1 + 0.5x grams per cubic centimeter.
(x = 0 at the top of the tube, and x = h at the bottom of the tube.) In Figure 5.4 we show a
schematic version of what this gradient might look like. (In reality, the transition between
high and low concentration would be smoother than shown in this ﬁgure.) Determine the
total amount of glucose in the tube (in gm). Neglect the “rounded” lower portion of the
tube: i.e. assume that it is a simple cylinder.
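[Figure 5.4: a test tube of radius r, with x = 0 at the top and x = h at the bottom, and one slice of thickness ∆x marked.]
Figure 5.4. A test tube of radius r containing a gradient of glucose. A disk-shaped
slice of the tube with small thickness ∆x has approximately constant density.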
Solution
We assume a simple cylindrical tube and consider imaginary “slices” of this tube along
its vertical axis, here labeled as the “x” axis. Suppose that the thickness of a slice is ∆x.
Then the volume of each of these (disk shaped) slices is \(\pi r^2 \Delta x\). The amount of glucose in
the slice is approximately equal to the concentration c(x) multiplied by the volume of the
slice, i.e. the small slice contains an amount \(\pi r^2 \Delta x\, c(x)\) of glucose. In order to sum up the
total amount over all slices, we use a definite integral. (As before, we imagine ∆x → dx
becoming “infinitesimal” as the number of slices increases.) The integral we want is
\[ G = \pi r^2 \int_0^h c(x)\, dx. \]
Even though the geometry of the test tube, at first glance, seems more complicated than
the one-dimensional highway described in Chapter 4, we observe here that the integral
that computes the total amount is still a sum over a single spatial variable, x. (Note the
resemblance between the integrals
\[ I = \int_0^L C(x)\, dx \quad \text{and} \quad G = \pi r^2 \int_0^h c(x)\, dx, \]
here and in the previous example.) We have neglected the complication of the rounded bottom
portion of the test tube, so that integration over its length (which is actually summation
of disks shown in Figure 5.4) is a one-dimensional problem.
In this case the total amount of glucose in the tube is
\[ G = \pi r^2 \int_0^h (0.1 + 0.5x)\, dx = \pi r^2 \left[ 0.1x + \frac{0.5x^2}{2} \right]_0^h = \pi r^2 \left( 0.1h + \frac{0.5h^2}{2} \right). \]
Suppose that the height of the test tube is h = 10 cm and its radius is r = 1 cm.
Then the total mass of glucose is
\[ G = \pi \left( 0.1(10) + \frac{0.5(100)}{2} \right) = \pi (1 + 25) = 26\pi \text{ gm}. \]
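The slicing argument translates directly into a short computation. The sketch below adds up the glucose in many thin disk-shaped slices, evaluating the concentration at the middle of each slice. The function name and the slice count are assumptions made for illustration.

```python
import math

def glucose_mass(radius, height, n=100_000):
    """Sum (slice volume) * (concentration) over n thin disk-shaped slices."""
    dx = height / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * dx                      # depth at the middle of the slice
        slice_volume = math.pi * radius**2 * dx
        total += slice_volume * (0.1 + 0.5 * x)  # c(x) = 0.1 + 0.5 x
    return total

G = glucose_mass(radius=1.0, height=10.0)
print(G, 26 * math.pi)
```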
In the next example, we consider a circular geometry, but the concept of dissecting
and summing is the same. Our task is to determine how to set up the problem in terms
of an integral, and, again, we must imagine which type of subdivision would lead to the
summation (integration) needed to compute total amount.
5.4.2 Example: A circular colony of bacteria
A circular colony of bacteria has radius of 1 cm. At distance r from the center of the
colony, the density of the bacteria, in units of one million cells per square centimeter, is
observed to be \(b(r) = 1 - r^2\). (Note: r is distance from the center in cm, so that 0 ≤ r ≤ 1.)
What is the total number of bacteria in the colony?
[Figure 5.5: left, side view with the density curve b(r) = 1 − r^2 plotted against r; right, top-down view showing one ring of thickness ∆r.]
Figure 5.5. A colony of bacteria with circular symmetry. A ring of small thickness
∆r has roughly constant density. The superimposed curve on the left is the bacterial density
b(r) as a function of the radius r.
Solution
Figure 5.5 shows a rough sketch of a ﬂat surface with a colony of bacteria growing on it.
We assume that this distribution is radially symmetric. The density as a function of distance
from the center is given by b(r), as shown in Figure 5.5. Note that the function describing
density, b(r) is smooth, but to accentuate the strategy of dissecting the region, we have
shown a topdown view of a ring of nearly constant density on the right in Figure 5.5. We
see that this ring occupies the region between two circles, e.g. between a circle of radius r
and a slightly bigger circle of radius r + ∆r. The area of that “ring”[21] would then be the
area of the larger circle minus that of the smaller circle, namely
\[ A_{\text{ring}} = \pi(r + \Delta r)^2 - \pi r^2 = \pi \left( 2r\Delta r + (\Delta r)^2 \right). \]
However, if we make the thickness of that ring really small (∆r → 0), then the quadratic
term is very very small so that
\[ A_{\text{ring}} \approx 2\pi r \Delta r. \]
[21] Students commonly make the error of writing \(A_{\text{ring}} = \pi(r + \Delta r - r)^2 = \pi(\Delta r)^2\). This trap should be
avoided! It is clear that the correct expression has additional terms, since we really are computing a difference of
two circular areas.
Consider all the bacteria that are found inside a “ring” of radius r and thickness ∆r
(see Figure 5.5.) The total number within such a ring is the product of the density, b(r) and
the area of the ring, i.e.
\[ b(r) \cdot (2\pi r \Delta r) = 2\pi r (1 - r^2) \Delta r. \]
To get the total number in the colony we sum up over all the rings from r = 0 to r = 1
and let the thickness, ∆r → dr become very small. But, as with other examples, this is
equivalent to calculating a deﬁnite integral, namely:
\[ B_{\text{total}} = \int_0^1 (1 - r^2)(2\pi r)\, dr = 2\pi \int_0^1 (1 - r^2)\, r\, dr = 2\pi \int_0^1 (r - r^3)\, dr. \]
We calculate the result as follows:
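\[ B_{\text{total}} = 2\pi \left[ \frac{r^2}{2} - \frac{r^4}{4} \right]_0^1 = \left. \left( \pi r^2 - \pi \frac{r^4}{2} \right) \right|_0^1 = \pi - \frac{\pi}{2} = \frac{\pi}{2}. \]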
Thus the total number of bacteria in the entire colony is π/2 million which is approximately
1.57 million cells.
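The ring dissection can be mimicked numerically. The sketch below uses the exact area of each ring (a difference of two circular areas) and the density at the middle of the ring, then adds up the contributions. The function name and the number of rings are illustrative assumptions.

```python
import math

def colony_total(n=100_000):
    """Sum density * ring area over n thin rings from r = 0 to r = 1 (units: millions of cells)."""
    dr = 1 / n
    total = 0.0
    for k in range(n):
        r = k * dr                                      # inner radius of the ring
        ring_area = math.pi * ((r + dr) ** 2 - r**2)    # exact difference of two circle areas
        density = 1 - (r + dr / 2) ** 2                 # b(r) at the middle of the ring
        total += density * ring_area
    return total

B_total = colony_total()
print(B_total, math.pi / 2)
```

Even though the exact ring area includes the quadratic term in ∆r, the sum approaches π/2, in line with the argument that this term becomes negligible for thin rings.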
5.5 Volumes of solids of revolution
We now turn to the problem of calculating volumes of 3D solids. Here we restrict attention
to symmetric objects denoted solids of revolution. The outer surface of these objects is
generated by revolving some curve around a coordinate axis. In Figure 5.7 we show one
such curve, and the surface it forms when it is revolved about the y axis.
5.5.1 Volumes of cylinders and shells
Before starting the calculation, let us recall the volumes of some of the geometric shapes
that are to be used as elementary pieces into which our shapes will be carved. See Figure 5.6.
1. The volume of a cylinder of height h having circular base of radius r, is
\[ V_{\text{cylinder}} = \pi r^2 h. \]
2. The volume of a circular disk of thickness τ, and radius r (shown on the left in
Figure 5.6), is a special case of the above,
\[ V_{\text{disk}} = \pi r^2 \tau. \]
3. The volume of a cylindrical shell of height h, with circular radius r and small
thickness τ (shown on the right in Figure 5.6) is
\[ V_{\text{shell}} = 2\pi r h \tau. \]
(This approximation holds for τ << r.)
[Figure 5.6: a disk of radius r and thickness τ (left) and a cylindrical shell of radius r, height h and thickness τ (right).]
Figure 5.6. The volumes of these simple 3D shapes are given by simple formulae.
We use them as basic elements in computing more complicated volumes. Here we will
present examples based on disks. In Appendix 11.4 we give an example based on shells.
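[Figure 5.7: a region in the xy-plane, its approximation by rectangles, and the corresponding stack of disks approximating the solid of revolution.]
Figure 5.7. A solid of revolution is formed by revolving a region in the xy-plane
about the y-axis. We show how the region is approximated by rectangles of some given
width, and how these form a set of approximating disks for the 3D solid of revolution.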
5.5.2 Computing the Volumes
Consider the curve in Figure 5.7 and the surface it forms when it is revolved about the
y axis. In the same ﬁgure, we also show how a set of approximating rectangular strips
associated with the planar region (grey rectangles) lead to a set of stacked disks (orange
shapes) that approximate the volume of the solid (greenish object in Fig. 5.7). The total
volume of the disks is not the same as the volume of the object but if we make the thickness
of these disks very small, the approximation of the true volume is good. In the limit, as
the thickness of the disks becomes inﬁnitesimal, we arrive at the true volume of the solid
of revolution. The reader should recognize a familiar theme. We used the same concept in
computing areas using Riemann sums based on rectangular strips in Chapter 2.
Fig. 5.8 similarly shows a volume of revolution obtained by revolving the graph of
the function y = f(x) about the x axis. We note that if this surface is cut into slices, the
radius of the cross-sections depends on the position of the cut. Let us imagine a stack of
disks approximating this volume. One such disk has been pulled out and labeled for our
inspection. We note that its radius (in the y direction) is given by the height of the graph
of the function, so that r = f(x). The thickness of the disk (in the x direction) is ∆x. The
volume of this single disk is then \(v = \pi [f(x)]^2 \Delta x\). Considering this disk to be based at
the k’th coordinate point in the stack, i.e. at \(x_k\), means that its volume is
\[ v_k = \pi [f(x_k)]^2 \Delta x. \]
Summing up the volumes of all such disks in the stack leads to the total volume of disks
\[ V_{\text{disks}} = \sum_k \pi [f(x_k)]^2 \Delta x. \]
[Figure 5.8: the curve y = f(x) and the solid it generates, with one disk of radius r = f(x) and thickness ∆x pulled out.]
Figure 5.8. Here the solid of revolution is formed by revolving the curve y = f(x)
about the x axis. A typical disk used to approximate the volume is shown. The radius of the
disk (parallel to the y axis) is r = y = f(x). The thickness of the disk (parallel to the x
axis) is ∆x. The volume of this disk is hence \(v = \pi [f(x)]^2 \Delta x\).
When we increase the number of disks, making each one thinner so that ∆x → 0, we
arrive at a definite integral,
\[ V = \int_a^b \pi [f(x)]^2\, dx. \]
In most of the examples discussed in this chapter, the key step is to make careful observation
of the way that the radius of a given disk depends on the function that generates the surface.
(By this we mean the function that speciﬁes the curve that forms the surface of revolution.)
We also pay attention to the dimension that forms the disk thickness, ∆x.
Some of our examples will involve surfaces revolved about the x axis, and others
will be revolved about the y axis. In setting up these examples, a diagram is usually quite
helpful.
Example 1: Volume of a sphere
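[Figure 5.9: left, a semicircle over the x axis with a shaded rectangle of width ∆x and height f(x_k); right, the disk generated by revolving that rectangle.]
Figure 5.9. When the semicircle (on the left) is rotated about the x axis, it generates
a sphere. On the right, we show one disk generated by the revolution of the shaded
rectangle.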
We can think of a sphere of radius R as a solid whose outer surface is formed by
rotating a semicircle about its long axis. A function that describes a semicircle (i.e. the
top half of the circle, \(y^2 + x^2 = R^2\)) is
\[ y = f(x) = \sqrt{R^2 - x^2}. \]
In Figure 5.9, we show the sphere dissected into a set of disks, each of width ∆x. The disks
are lined up along the x axis with coordinates \(x_k\), where \(-R \le x_k \le R\). These are
offset from −R by integer multiples of the slice thickness ∆x, so for example,
\[ x_0 = -R, \quad x_1 = -R + \Delta x, \quad \ldots, \quad x_k = -R + k\Delta x. \]
The radius of the disk depends on its position[22]. Indeed, the radius of a disk through the x
axis at a point \(x_k\) is specified by the function \(r_k = f(x_k)\). The volume of the k’th disk is
\[ V_k = \pi r_k^2 \Delta x. \]
By the above remarks, using the fact that the function f(x) determines the radius, we have
\[ V_k = \pi [f(x_k)]^2 \Delta x, \]
\[ V_k = \pi \left[ \sqrt{R^2 - x_k^2} \right]^2 \Delta x = \pi (R^2 - x_k^2) \Delta x. \]
[22] Note that the radius is oriented along the y axis, so sometimes we may write this as \(r_k = y_k = f(x_k)\).
The total volume of all the disks is
\[ V = \sum_k V_k = \sum_k \pi [f(x_k)]^2 \Delta x = \pi \sum_k (R^2 - x_k^2) \Delta x. \]
As ∆x → 0, this sum becomes a definite integral, and represents the true volume. We start
the summation at x = −R and end at \(x_N = R\) since the semicircle extends from x = −R
to x = R. Thus
\[ V_{\text{sphere}} = \int_{-R}^{R} \pi [f(x)]^2\, dx = \pi \int_{-R}^{R} (R^2 - x^2)\, dx. \]
We compute this integral using the Fundamental Theorem of Calculus, obtaining
\[ V_{\text{sphere}} = \pi \left[ R^2 x - \frac{x^3}{3} \right]_{-R}^{R}. \]
Observe that this is twice the volume obtained for the interval 0 < x < R,
\[ V_{\text{sphere}} = 2\pi \left[ R^2 x - \frac{x^3}{3} \right]_0^R = 2\pi \left( R^3 - \frac{R^3}{3} \right). \]
We often use such symmetry properties to simplify computations. After simplification, we
arrive at the familiar formula
\[ V_{\text{sphere}} = \frac{4}{3} \pi R^3. \]
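The stack-of-disks picture can be tried out numerically. The sketch below adds up the volumes of thin disks whose radius is given by the semicircle, and compares the total with the formula for the volume of a sphere. The radius R = 2, the helper name `disk_volume`, and the use of slice midpoints are assumptions made for this illustration.

```python
import math

def disk_volume(radius_fn, a, b, n=100_000):
    """Sum pi * [radius]^2 * dx over n thin disks stacked along the axis from a to b."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx          # middle of the k-th slice
        total += math.pi * radius_fn(x) ** 2 * dx
    return total

R = 2.0                                  # hypothetical sphere radius
semicircle = lambda x: math.sqrt(R**2 - x**2)
approx = disk_volume(semicircle, -R, R)
exact = 4 / 3 * math.pi * R**3
print(approx, exact)
```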
Example 2: Volume of a paraboloid
Consider the curve
\[ y = f(x) = 1 - x^2. \]
If we rotate this curve about the y axis, we will get a paraboloid, as shown in Figure 5.10.
In this section we show how to compute the volume by dissecting into disks stacked up
along the y axis.
Solution
[Figure 5.10: the curve y = f(x) = 1 − x^2 (left) and the paraboloid obtained from it (right).]
Figure 5.10. The curve that generates the shape of a paraboloid (left) and the
shape of the paraboloid (right).
The object has the y axis as its axis of symmetry. Hence disks are stacked up along the
y axis to approximate this volume. This means that the width of each disk is ∆y. This
accounts for the dy in the integral below. The volume of each disk is
\[ V_{\text{disk}} = \pi r^2 \Delta y, \]
where the radius, r, is now in the direction parallel to the x axis. Thus we must express
radius as
\[ r = x = f^{-1}(y), \]
i.e., we invert the relationship to obtain x as a function of y. From \(y = 1 - x^2\) we have
\(x^2 = 1 - y\) so \(x = \sqrt{1 - y}\). The radius of a disk at height y is therefore \(r = x = \sqrt{1 - y}\).
The shape extends from a smallest value of y = 0 up to y = 1. Thus the volume is
\[ V = \pi \int_0^1 [f^{-1}(y)]^2\, dy = \pi \int_0^1 \left[ \sqrt{1 - y} \right]^2 dy. \]
It is helpful to note that once we have identiﬁed the thickness of the disks (∆y), we are
guided to write an integral in terms of the variable y, i.e. to reformulate the equation
describing the curve. We compute
\[ V = \pi \int_0^1 (1 - y)\, dy = \pi \left[ y - \frac{y^2}{2} \right]_0^1 = \pi \left( 1 - \frac{1}{2} \right) = \frac{\pi}{2}. \]
The above example was set up using disks. However, there are other options. In
Appendix 11.4 we show yet another method, comprised of cylindrical shells to compute
the volume of a cone. In some cases, one method is preferable to another, but here either
method works equally well.
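The disk calculation for the paraboloid can be verified with a small numerical sketch. It stacks thin disks along the y axis, each with radius √(1 − y) taken at the middle of the slice. The function name and the slice count are illustrative assumptions.

```python
import math

def paraboloid_volume(n=100_000):
    """Sum pi * r^2 * dy over disks stacked along the y axis, with r = sqrt(1 - y)."""
    dy = 1 / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * dy               # height at the middle of the disk
        radius = math.sqrt(1 - y)        # inverse of y = 1 - x^2
        total += math.pi * radius**2 * dy
    return total

V = paraboloid_volume()
print(V, math.pi / 2)
```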
Example 3
Find the volume of the surface formed by rotating the curve
\[ y = f(x) = \sqrt{x}, \qquad 0 \le x \le 1 \]
(a) about the x axis. (b) about the y axis.
Solution
(a) If we rotate this curve about the x axis, we obtain a bowl shape; dissecting this
surface leads to disks stacked along the x axis, with thickness ∆x → dx, with radii
in the y direction, i.e. r = y = f(x), and with x in the range 0 ≤ x ≤ 1. The
volume will thus be
\[ V = \pi \int_0^1 [f(x)]^2\, dx = \pi \int_0^1 [\sqrt{x}]^2\, dx = \pi \int_0^1 x\, dx = \pi \left[ \frac{x^2}{2} \right]_0^1 = \frac{\pi}{2}. \]
(b) When the curve is rotated about the y axis, it forms a surface with a sharp point at the
origin. The disks are stacked along the y axis, with thickness ∆y →dy, and radii in
the x direction. We must rewrite the function in the form
\[ x = g(y) = y^2. \]
We now use the interval along the y axis, i.e. 0 < y < 1. The volume is then
\[ V = \pi \int_0^1 [g(y)]^2\, dy = \pi \int_0^1 [y^2]^2\, dy = \pi \int_0^1 y^4\, dy = \pi \left[ \frac{y^5}{5} \right]_0^1 = \frac{\pi}{5}. \]
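Both parts of this example follow the same recipe with a different stacking axis, which a single helper can illustrate. In the sketch below, `disk_volume` is a hypothetical helper that sums thin disks along whichever axis the radius function is written in terms of.

```python
import math

def disk_volume(radius_fn, a, b, n=100_000):
    """Sum pi * [radius]^2 * (slice thickness) over n thin disks between a and b."""
    step = (b - a) / n
    total = 0.0
    for k in range(n):
        s = a + (k + 0.5) * step         # middle of the k-th slice
        total += math.pi * radius_fn(s) ** 2 * step
    return total

# (a) Rotation about the x axis: disks stacked along x, radius r = sqrt(x).
V_a = disk_volume(lambda x: math.sqrt(x), 0, 1)
# (b) Rotation about the y axis: disks stacked along y, radius r = y^2.
V_b = disk_volume(lambda y: y**2, 0, 1)
print(V_a, math.pi / 2)
print(V_b, math.pi / 5)
```

The only change between the two parts is the radius function, which reflects the remark that the key step is deciding how the disk radius depends on the variable of integration.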
5.6 Length of a curve: Arc length
Analytic geometry provides a simple way to compute the length of a straight line segment,
based on the distance formula[23]. Recall that, given points \(P_1 = (x_1, y_1)\) and \(P_2 = (x_2, y_2)\),
the length of the line joining those points is
\[ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}. \]
Things are more complicated for “curves” that are not straight lines, but in many cases, we
are interested in calculating the length of such curves. In this section we describe how this
can be done using the deﬁnite integral “technology”.
The idea of dissection also applies to the problem of determining the length of a
curve. In Figure 5.11, we see the general idea of subdividing a curve into many small
“arcs”. Before we look in detail at this construction, we consider a simple example, shown
in Figure 5.12. In the triangle shown, by the Pythagorean theorem we have the length of
the sloped side related as follows to the side lengths ∆x, ∆y:
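\[ \Delta \ell^2 = \Delta x^2 + \Delta y^2, \]
\[ \Delta \ell = \sqrt{\Delta x^2 + \Delta y^2} = \sqrt{1 + \frac{\Delta y^2}{\Delta x^2}}\; \Delta x = \sqrt{1 + \left( \frac{\Delta y}{\Delta x} \right)^2}\; \Delta x. \]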
We now consider a curve given by some function
y = f(x) a < x < b,
as shown in Figure 5.11(a). We will approximate this curve by a set of line segments, as
shown in Figure 5.11(b). To obtain these, we have selected some step size ∆x along the
x axis, and placed points on the curve at each of these x values. We connect the points
with straight line segments, and determine the lengths of those segments. (The total length of the segments is only an approximation of the length of the curve, but as the subdivision gets finer and finer, we will arrive at the true total length of the curve.)

Footnote 23: The reader should recall that this formula is a simple application of the Pythagorean theorem.

Figure 5.11. Top: Given the graph of a function, y = f(x) (at left), we draw secant lines connecting points on its graph at values of x that are multiples of ∆x (right). Bottom: a small part of this graph is shown, and then enlarged, to illustrate the relationship between the arc length and the length of the secant line segment.

Figure 5.12. The basic idea of arclength is to add up lengths ∆ℓ of small line segments that approximate the curve.

We show one such segment enlarged in the circular inset in Figure 5.11. Its slope,
shown at right is given by ∆y/∆x. According to our remarks, above, the length of this
segment is given by
\[ \Delta \ell = \sqrt{1 + \left( \frac{\Delta y}{\Delta x} \right)^2}\; \Delta x. \]
As the step size is made smaller and smaller, ∆x → dx, ∆y → dy and
\[ \Delta \ell \to \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\; dx. \]
We recognize the ratio inside the square root as the derivative, dy/dx. If our curve is given by a function y = f(x) then we can rewrite this as
\[ d\ell = \sqrt{1 + (f'(x))^2}\; dx. \]
Thus, the length of the entire curve is obtained from summing (i.e. adding up) these small pieces, i.e.
\[ L = \int_a^b \sqrt{1 + (f'(x))^2}\; dx. \tag{5.4} \]
Example 1
Find the length of a line whose slope is −2 given that the line extends from x = 1 to x = 5.
Solution
We could find the equation of the line and use the distance formula. But for the purpose of this example, we apply the method of Equation (5.4): we are given that the slope $f'(x)$ is −2. The integral in question is
\[ L = \int_1^5 \sqrt{1 + (f'(x))^2}\; dx = \int_1^5 \sqrt{1 + (-2)^2}\; dx = \int_1^5 \sqrt{5}\; dx. \]
We get
\[ L = \sqrt{5} \int_1^5 dx = \sqrt{5}\, x \Big|_1^5 = \sqrt{5}\,[5 - 1] = 4\sqrt{5}. \]
Example 2
Find an integral that represents the length of the curve that forms the graph of the function
\[ y = f(x) = x^3, \quad 1 < x < 2. \]
Solution
We find that
\[ \frac{dy}{dx} = f'(x) = 3x^2. \]
Thus, the integral is
\[ L = \int_1^2 \sqrt{1 + (3x^2)^2}\; dx = \int_1^2 \sqrt{1 + 9x^4}\; dx. \]
At this point, we will not attempt to find the actual length, as we must first develop techniques for finding the antiderivative for functions such as $\sqrt{1 + 9x^4}$.
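Although no antiderivative is attempted here, the integral of Equation (5.4) can still be estimated numerically. The following is a minimal sketch, assuming composite Simpson's rule as the numerical method (a technique not used in the notes); the function name `arc_length` and the number of subintervals are illustrative. It is applied to Example 1, where the exact answer $4\sqrt{5}$ is known, and to Example 2.

```python
import math

def arc_length(dfdx, a, b, n=1000):
    """Approximate L = integral_a^b sqrt(1 + f'(x)^2) dx by composite Simpson's rule."""
    if n % 2:                 # Simpson's rule needs an even number of subintervals
        n += 1
    h = (b - a) / n
    g = lambda x: math.sqrt(1.0 + dfdx(x) ** 2)   # the arc length integrand
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

# Example 1: straight line of slope -2 on 1 <= x <= 5 (exact length 4*sqrt(5))
line_length = arc_length(lambda x: -2.0, 1.0, 5.0)
# Example 2: y = x^3 on 1 <= x <= 2, so f'(x) = 3x^2
cubic_length = arc_length(lambda x: 3.0 * x ** 2, 1.0, 2.0)

print(line_length, 4 * math.sqrt(5))
print(cubic_length)
```

As a sanity check, the length found for the cubic should be slightly larger than the straight-line distance $\sqrt{50}$ between the endpoints (1, 1) and (2, 8).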
Using the spreadsheet to calculate arclength
Most integrals for arclength contain square roots and functions that are not easy to integrate, simply because their antiderivatives are difficult to determine. However, now that we know the idea behind determining the length of a curve, we can apply the ideas developed here to approximate the length of a curve “numerically”. The spreadsheet is a simple tool for doing the necessary summations.
As an example, we show here how to calculate the length of the curve
\[ y = f(x) = 1 - x^2 \quad \text{for } 0 \le x \le 1 \]
using a simple numerical procedure implemented on the spreadsheet.
Let us choose a step size of ∆x = 0.1 along the x axis, for the interval 0 ≤ x ≤ 1. We calculate the function, the slopes of the little segments (change in y divided by change in x), and from this, compute the length of each segment
\[ \Delta \ell = \sqrt{1 + (\Delta y/\Delta x)^2}\; \Delta x \]
and the accumulated length along the curve from left to right, L, which is just a sum of such values. Table 5.1 shows steps in the calculation of the ratio ∆y/∆x, the value of ∆ℓ, the cumulative sum, and, finally, the total length L. The final value of L = 1.4782 represents the total length of the curve over the entire interval 0 < x < 1.
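The same summation can be carried out in a few lines of code instead of a spreadsheet. The sketch below assumes the procedure just described (forward differences with ∆x = 0.1); the variable names are illustrative. Note that the slopes of this curve are negative, so the table of values evidently lists their magnitudes.

```python
import math

def f(x):
    return 1.0 - x ** 2

dx = 0.1          # step size along the x axis
n_steps = 10      # ten segments cover 0 <= x <= 1

rows = []         # (x, f(x), |slope|, segment length, cumulative length so far)
cumulative = 0.0
for i in range(n_steps):
    x = i * dx
    slope = (f(x + dx) - f(x)) / dx            # change in y divided by change in x
    dl = math.sqrt(1.0 + slope ** 2) * dx      # length of this straight segment
    rows.append((x, f(x), abs(slope), dl, cumulative))
    cumulative += dl

total_length = cumulative
for x, y, s, dl, acc in rows:
    print(f"{x:4.1f} {y:8.4f} {s:6.1f} {dl:8.4f} {acc:8.4f}")
print("total length:", round(total_length, 4))
```

Halving `dx` (and doubling `n_steps`) gives a finer subdivision and a slightly larger total, approaching the true arclength.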
In Figure 5.13(a) we show the actual curve y = 1 − x², with points placed on it at each multiple of ∆x. In Figure 5.13(b), we show (in blue) how the lengths of the little straight-line segments connecting these points change across the interval. (The segments on the left along the original curve are nearly flat, so their length is very close to ∆x. The segments on the right part of the curve are much more sloped, and their lengths are thus bigger.) We also show (in red) how the total accumulated length L depends on the position x across the interval. This function represents the total arclength of the curve y = 1 − x², from x = 0 up to a given x value. At x = 1 this function returns the value L, as it has added up the full length of the curve for 0 ≤ x ≤ 1.
Figure 5.13. The spreadsheet can be used to compute approximate values of integrals, and hence to calculate arclength. Shown here is the graph of the function y = f(x) = 1 − x² for 0 ≤ x ≤ 1, together with the length increment and the cumulative arclength along that curve.

x      y = f(x)   ∆y/∆x   ∆ℓ       L = Σ∆ℓ
0.0    1.0000     0.1     0.1005   0.0000
0.1    0.9900     0.3     0.1044   0.1005
0.2    0.9600     0.5     0.1118   0.2049
0.3    0.9100     0.7     0.1221   0.3167
0.4    0.8400     0.9     0.1345   0.4388
0.5    0.7500     1.1     0.1487   0.5733
0.6    0.6400     1.3     0.1640   0.7220
0.7    0.5100     1.5     0.1803   0.8860
0.8    0.3600     1.7     0.1972   1.0663
0.9    0.1900     1.9     0.2147   1.2635
1.0    0.0000     2.1     0.2326   1.4782
Table 5.1. For the function y = f(x) = 1 − x², and 0 ≤ x ≤ 1, we show how to calculate an approximation to the arclength using the spreadsheet.

5.6.1 How the alligator gets its smile
The American alligator, Alligator mississippiensis, has a set of teeth best viewed at some distance. The regular arrangement of these teeth, i.e. their spacing along the jaw, is important in giving the reptile its famous bite. We will concern ourselves here with how that pattern of teeth is formed as the alligator develops from its embryonic stage to that of an adult. As is the case in humans, the teeth on an alligator do not form or sprout simultaneously. In the development of the baby alligator, there is a sequence of initiation of teeth, one after the other, at well-defined positions along the jaw.
Paul Kulesa, a former student of James D. Murray, set out to understand the pattern of development of these teeth, based on data in the literature about what happens at distinct stages of embryonic growth. Of interest in his research were several questions, including what determines the positions and timing of initiation of individual teeth, and what mechanisms lead to this pattern of initiation. One theory proposed by this group was that chemical signals that diffuse along the jaw at an early stage of development give rise to instructions that are interpreted by jaw cells: where the signal is at a high level, a tooth will start to initiate.
While we will not address the details of the mechanism of development here, we
will ﬁnd a simple application of the ideas of arclength in the developmental sequence of
teething. Shown in Figure 5.14 is a smiling baby alligator (no doubt thinking of some
future tasty meal). A close up of its smile (at an earlier stage of development) reveals the
shape of the jaw, together with the sites at which teeth are becoming evident. (One of these
sites, called primordia, is shown enlarged in an inset in this ﬁgure.)
Paul Kulesa found that the shape of the alligator’s jaw can be described remarkably
well by a parabola. A proper choice of coordinate system, and some experimentation leads
to the equation of the best ﬁt parabola
\[ y = f(x) = -a x^2 + b \]
where a = 0.256, and b = 7.28 (in units not specified).
We show this curve in Figure 5.15(a). Also shown in this curve is a set of points at
which teeth are found, labelled by order of appearance. In Figure 5.15(b) we see the same
curve, but we have here superimposed the function L(x) given by the arc length along the
curve from the front of the jaw (i.e. the top of the parabola), i.e.
\[ L(x) = \int_0^x \sqrt{1 + [f'(s)]^2}\; ds. \]
This curve measures distance along the jaw, from front to back. The distances of the teeth
from one another, or along the curve of the jaw can be determined using this curve if we
know the x coordinates of their positions.
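For the parabolic jaw, $f'(s) = -2as$, so L(x) can be evaluated directly. The sketch below is an illustration under stated assumptions: it uses a = 0.256 from the text, a midpoint-rule sum for the integral, and (as a cross-check) the standard closed form $\int_0^x \sqrt{1 + k^2 s^2}\, ds = \tfrac{1}{2}\left[ x\sqrt{1 + k^2 x^2} + \operatorname{arcsinh}(kx)/k \right]$, which is not derived in these notes. The function names are hypothetical. The notes do not say how the tabulated L(x) values were computed, so these results should be close to, but need not coincide with, the table entries.

```python
import math

A = 0.256   # coefficient a of the best-fit parabola y = -a x^2 + b (from the text)

def jaw_length_numeric(x, n=2000):
    """Midpoint-rule estimate of L(x) = integral_0^x sqrt(1 + f'(s)^2) ds, with f'(s) = -2 a s."""
    ds = x / n
    return sum(math.sqrt(1.0 + (2.0 * A * (i + 0.5) * ds) ** 2) * ds for i in range(n))

def jaw_length_closed_form(x):
    """Closed form of the same integral, writing k = 2a so that |f'(s)| = k s."""
    k = 2.0 * A
    return 0.5 * (x * math.sqrt(1.0 + (k * x) ** 2) + math.asinh(k * x) / k)

for x in (0.60, 1.95, 5.30):    # x coordinates of teeth 11, 1 and 13
    print(x, round(jaw_length_numeric(x), 4), round(jaw_length_closed_form(x), 4))
```

The distance along the jaw between two teeth at positions $x_1 < x_2$ is then $L(x_2) - L(x_1)$.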
The table below gives the original data, courtesy of Dr. Kulesa, showing the order
of the teeth, their (x, y) coordinates, and the value of L(x) obtained from the arclength
formula. We see from this table that the teeth do not appear randomly, nor do they ﬁll in
the jaw in one sweep. Rather, they appear in several stages.
In Figure 5.15(c), we show the pattern of appearance: Plotting the distance along the jaw of successive teeth reveals that the teeth appear in waves of nearly equally spaced sites. (By equally spaced, we refer to distance along the parabolic jaw.) The first wave (teeth 1, 2, 3) is followed by a second wave (4, 5, 6, 7), and so on. Each wave forms a linear pattern of distance from the front, and each successive wave fills in the gaps in a similar, equally spaced pattern.
The true situation is a bit more complicated: the jaw grows as the teeth appear, as shown in Figure 5.15(d). This has not been taken into account in our simple treatment here, where we illustrate only the essential idea of arc length application.
Tooth number   position x   position y   distance along jaw L(x)
1              1.95         6.35         2.1486
2              3.45         4.40         4.7000
3              4.54         2.05         7.1189
4              1.35         6.95         1.4000
5              2.60         5.50         3.2052
6              3.80         3.40         5.4884
7              5.00         1.00         8.4241
8              3.15         4.80         4.1500
9              4.25         2.20         6.3923
10             4.60         1.65         7.3705
11             0.60         7.15         0.6072
12             3.45         4.05         4.6572
13             5.30         0.45         9.2644
Table 5.2. Data for the appearance of teeth, in the order in which they appear as the alligator develops. We can use arclength computations to determine the distances between successive teeth.
Figure 5.14. Alligator mississippiensis and its teeth.

Figure 5.15. (a) The parabolic shape of the jaw, showing positions of teeth and numerical order of emergence. (b) Arc length along the jaw from front to back. (c) Distance of successive teeth along the jaw. (d) Growth of the jaw.

5.6.2 References
1. P.M. Kulesa and J.D. Murray (1995). Modelling the Wavelike Initiation of Teeth Primordia in the Alligator. FORMA. Cover Article. Vol. 10, No. 3, 259-280.
2. J.D. Murray and P.M. Kulesa (1996). On A Dynamic Reaction-Diffusion Mechanism for the Spatial Patterning of Teeth Primordia in the Alligator. Journal of Chemical Physics. J. Chem. Soc., Faraday Trans., 92 (16), 2927-2932.
3. P.M. Kulesa, G.C. Cruywagen, S.R. Lubkin, M.W.J. Ferguson and J.D. Murray (1996). Modelling the Spatial Patterning of Teeth Primordia in the Alligator. Acta Biotheoretica, 44, 153-164.
5.7 Summary
Here are the main points of the chapter:
1. We introduced the idea of a spatially distributed mass density ρ(x) in Section 5.2.2. Here the definite integral represents
\[ \int_a^b \rho(x)\, dx = \text{Total mass in the interval } a \le x \le b. \]
2. In this chapter, we defined the center of mass of a (discrete) distribution of n masses by
\[ \bar{X} = \frac{1}{M} \sum_{i=0}^{n} x_i m_i. \tag{5.5} \]
We developed the analogue of this for a continuous mass distribution (distributed in the interval 0 ≤ x ≤ L). We defined the center of mass of a continuous distribution by the definite integral
\[ \bar{x} = \frac{1}{M} \int_0^L x \rho(x)\, dx. \tag{5.6} \]
Importantly, the quantities $m_i$ in the sum (5.5) carry units of mass, whereas the analogous quantities in (5.6) are ρ(x)dx. [Recall that ρ(x) is a mass per unit length in the case of mass distributed along a bar or straight line.]
3. We defined a cumulative function, first in the discrete case and then in the continuous case. In the continuous case, it is
\[ M(x) = \int_0^x \rho(s)\, ds. \]
4. The mean is an average x coordinate, whereas the median is the x coordinate that
splits the distribution into two equal masses (Geometrically, the median subdivides
the graph of the distribution into two regions of equal areas). The mean and median
are the same only in symmetric distributions. They differ for any distribution that is
asymmetric. The mean (but not the median) is inﬂuenced more strongly by distant
portions of the distribution.
5. In the later parts of this chapter, we showed how to compute volumes of various objects that have radial symmetry (“solids of revolution”). We showed that if the surface is generated by rotating the graph of a function y = f(x) about the x axis (for a ≤ x ≤ b), then its volume can be described by an integral of the form
\[ V = \int_a^b \pi [f(x)]^2\, dx. \]
We used this idea to show that the volume of a sphere of radius R is $V_{\text{sphere}} = (4/3)\pi R^3$.
In Chapters 7 and 8, we find applications of the ideas of density and center of mass to the context of a probability distribution and its mean.
Chapter 6
Techniques of Integration
In this chapter, we expand our repertoire for antiderivatives beyond the “elementary” functions discussed so far. A review of the table of elementary antiderivatives (found in Chapter 3) will be useful. Here we will discuss a number of methods for finding antiderivatives. We refer to these collected “tricks” as methods of integration. As will be shown, in some cases, these methods are systematic (i.e. with clear steps), whereas in other cases, guesswork and trial and error are an important part of the process.
A primary method of integration to be described is substitution. A close relationship exists between the chain rule of differential calculus and the substitution method. A second very important method is integration by parts. Aside from its usefulness in integration per se, this method has numerous applications in physics, mathematics, and other sciences. Many other techniques of integration used to form a core of methods taught in such courses in integral calculus. Many of these are quite technical. Nowadays, with sophisticated mathematical software packages (including Maple and Mathematica), integration can be carried out automatically via computation called “symbolic manipulation”, reducing our need to dwell on these technical methods.
6.1 Differential notation
We begin by familiarizing the reader with notation that appears frequently in substitution
integrals, i.e. differential notation. Consider a straight line
y = mx +b.
Recall that the slope of the line, m, is
\[ m = \frac{\text{change in } y}{\text{change in } x} = \frac{\Delta y}{\Delta x}. \]
This relationship can also be written in the form
∆y = m∆x.
Figure 6.1. The slope of the line shown here is m = ∆y/∆x. This means that the small quantities ∆y and ∆x are related by ∆y = m∆x.

If we take a very small step along this line in the x direction, call it dx (to remind us of an
“inﬁnitesimally small” quantity), then the resulting change in the y direction, (call it dy) is
related by
dy = mdx.
Now suppose that we have a curve defined by some arbitrary function, y = f(x), which need not be a straight line. For a given point (x, y) on this curve, a step ∆x in the x direction is associated with a step ∆y in the y direction. The relationship between the step sizes is:
\[ \Delta y = m_s \Delta x, \]
where now $m_s$ is the slope of a secant line (shown connecting two points on the curve in Figure 6.2). If the sizes of the steps are small (dx and dy), then this relationship is well approximated by the slope of the tangent line, $m_t$, as shown in Figure 6.2, i.e.
\[ dy = m_t\, dx = f'(x)\, dx. \]

Figure 6.2. On this figure, the graph of some function is used to illustrate the connection between differentials dy and dx. Note that these are related via the slope of a tangent line, $m_t$, to the curve, in contrast with the relationship of ∆y and ∆x which stems from the slope of the secant line $m_s$ on the same curve.

The quantities dx and dy are called differentials. In general, they link a small step on the
x axis with the resulting small change in height along the tangent line to the curve (shown
in Figure 6.2). We might observe that the ratio of the differentials, i.e.
\[ \frac{dy}{dx} = f'(x), \]
appears to link our result to the definition of the derivative. We remember, though, that the derivative is actually defined as a limit:
\[ f'(x) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}. \]
When the step size ∆x is quite small, it is approximately true that
\[ \frac{\Delta y}{\Delta x} \approx \frac{dy}{dx}. \]
This notation will be useful in substitution integrals.
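A short numerical experiment illustrates how the secant slope ∆y/∆x approaches the tangent slope dy/dx as the step shrinks. This is a minimal sketch, assuming the sample function f(x) = x³ and the point x = 2 purely for illustration; the helper name `secant_slope` is hypothetical.

```python
def f(x):
    return x ** 3

def fprime(x):
    return 3.0 * x ** 2            # derivative of x^3, i.e. the tangent slope

def secant_slope(func, x, step):
    """Slope of the secant line through (x, func(x)) and (x + step, func(x + step))."""
    return (func(x + step) - func(x)) / step

x0 = 2.0
for step in (0.1, 0.01, 0.001):
    delta_y = f(x0 + step) - f(x0)  # actual change in y along the curve
    dy = fprime(x0) * step          # change in y along the tangent line
    print(step, delta_y, dy, secant_slope(f, x0, step))
```

For each smaller step the two changes agree to more digits, and the secant slope moves closer to $f'(2) = 12$.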
Examples
We give some examples of functions, their derivatives, and the differential notation that
goes with them.
1. The function $y = f(x) = x^3$ has derivative $f'(x) = 3x^2$. Thus
\[ dy = 3x^2\, dx. \]
2. The function $y = f(x) = \tan(x)$ has derivative $f'(x) = \sec^2(x)$. Therefore
\[ dy = \sec^2(x)\, dx. \]
3. The function $y = f(x) = \ln(x)$ has derivative $f'(x) = \frac{1}{x}$ so
\[ dy = \frac{1}{x}\, dx. \]
With some practice, we can omit the intermediate step of writing down a derivative and go directly from function to differential notation. Given a function y = f(x) we will often write
\[ df(x) = \frac{df}{dx}\, dx \]
and occasionally, we use just the symbol df to mean the same thing. The following examples illustrate this idea with specific functions.
\[ d(\sin(x)) = \cos(x)\, dx, \qquad d(x^n) = n x^{n-1}\, dx, \qquad d(\arctan(x)) = \frac{1}{1 + x^2}\, dx. \]
Moreover, some of the basic rules of differentiation translate directly into rules for handling and manipulating differentials. We give a list of some of these elementary rules below.
Rules for derivatives and differentials
1. $\frac{d}{dx} C = 0$, $\quad dC = 0$.
2. $\frac{d}{dx} (u(x) + v(x)) = \frac{du}{dx} + \frac{dv}{dx}$, $\quad d(u + v) = du + dv$.
3. $\frac{d}{dx} (u(x) v(x)) = u \frac{dv}{dx} + v \frac{du}{dx}$, $\quad d(uv) = u\, dv + v\, du$.
4. $\frac{d}{dx} (C u(x)) = C \frac{du}{dx}$, $\quad d(Cu) = C\, du$.
6.2 Antidifferentiation and indeﬁnite integrals
In Chapters 2 and 3, we deﬁned the concept of the deﬁnite integral, which represents a
number. It will be useful here to consider the idea of an indeﬁnite integral, which is a
function, namely an antiderivative.
If two functions, F(x) and G(x), have the same derivative, say f(x), then they differ
at most by a constant, that is F(x) = G(x) +C, where C is some constant.
Proof
Since F(x) and G(x) have the same derivative, we have
\[ \frac{d}{dx} F(x) = \frac{d}{dx} G(x), \]
\[ \frac{d}{dx} F(x) - \frac{d}{dx} G(x) = 0, \]
\[ \frac{d}{dx} \left( F(x) - G(x) \right) = 0. \]
This means that the function F(x) −G(x) should be a constant, since its derivative is zero.
Thus
F(x) −G(x) = C,
so
F(x) = G(x) +C,
as required. F(x) and G(x) are called antiderivatives of f(x), and this conﬁrms, once
more, that any two antiderivatives differ at most by a constant.
In another terminology, which means the same thing, we also say that F(x) (or G(x))
is the integral of the function f(x), and we refer to f(x) as the integrand. We write this as
follows:
\[ F(x) = \int f(x)\, dx. \]
This notation is sometimes called “an indefinite integral” because it does not denote a specific numerical value, nor is an interval specified for the integration range. An indefinite integral is a function with an arbitrary constant. (Contrast this with the definite integral studied in our last chapters: in the case of the definite integral, we specified an interval, and interpreted the result, a number, in terms of areas associated with curves.) We also write
\[ \int f(x)\, dx = F(x) + C, \]
if we want to indicate the form of all possible functions that are antiderivatives of f(x). C
is referred to as a constant of integration.
6.2.1 Integrals of derivatives
Suppose we are given an integral of the form
\[ \int \frac{df}{dx}\, dx, \]
or alternately, the same thing written using differential notation,
\[ \int df. \]
How do we handle this? We reason as follows. The df/dx (a quantity that is, itself, a function) is the derivative of the function f(x). That means that f(x) is the antiderivative of df/dx. Then, according to the Fundamental Theorem of Calculus,
\[ \int \frac{df}{dx}\, dx = f(x) + C. \]
We can write this same result using the differential of f, as follows:
\[ \int df = f(x) + C. \]
The following examples illustrate the idea with several elementary functions.
Examples
1. $\int d(\cos x) = \cos x + C$.
2. $\int dv = v + C$.
3. $\int d(x^3) = x^3 + C$.
6.3 Simple substitution
In this section, we observe that the forms of some integrals can be simpliﬁed by making a
judicious substitution, and using our familiarity with derivatives (and the chain rule). The
idea rests on the fact that in some cases, we can spot a “helper function”
\[ u = f(x), \]
such that the quantity
\[ du = f'(x)\, dx \]
appears in the integrand. In that case, the substitution will lead to eliminating x entirely in
favour of the new quantity u, and simpliﬁcation may occur.
6.3.1 Example: Simple substitution
Suppose we are given the function
\[ f(x) = (x + 1)^{10}. \]
Then its antiderivative (indefinite integral) is
\[ F(x) = \int f(x)\, dx = \int (x + 1)^{10}\, dx. \]
We could find an antiderivative by expanding the integrand $(x + 1)^{10}$ into a degree 10 polynomial and using methods already known to us; but this would be laborious. Let us observe, however, that if we define
\[ u = (x + 1), \]
then
\[ du = \frac{d(x + 1)}{dx}\, dx = \left( \frac{dx}{dx} + \frac{d(1)}{dx} \right) dx = (1 + 0)\, dx = dx. \]
Now replacing (x + 1) by u and dx by the equivalent du we get:
\[ F(x) = \int u^{10}\, du. \]
An antiderivative to this can be easily found, namely,
\[ F(x) = \frac{u^{11}}{11} = \frac{(x + 1)^{11}}{11} + C. \]
In the last step, we converted the result back to the original variable, and included the arbitrary integration constant. A very important point to remember is that we can always check our results by differentiation:
Check
Differentiate F(x) to obtain
\[ \frac{dF}{dx} = \frac{1}{11} \left( 11 (x + 1)^{10} \right) = (x + 1)^{10}. \]
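The same check can be done numerically: a finite-difference estimate of dF/dx should reproduce the integrand. The following is a minimal sketch, assuming a central difference with a small step `h`; the helper name `central_difference` and the test points are illustrative choices.

```python
def F(x):
    return (x + 1.0) ** 11 / 11.0      # antiderivative found by substitution (with C = 0)

def integrand(x):
    return (x + 1.0) ** 10

def central_difference(func, x, h=1e-5):
    """Approximate the derivative of func at x using points on both sides of x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

for x in (0.0, 0.5, 1.0):
    print(x, central_difference(F, x), integrand(x))   # the two columns should agree closely
```

The constant C plays no role in the check, since it disappears on differentiation.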
6.3.2 How to handle endpoints
We consider how substitution type integrals can be calculated when we have endpoints, i.e.
in evaluating deﬁnite integrals. Consider the example:
\[ I = \int_1^2 \frac{1}{x + 1}\, dx. \]
This integration can be done by making the substitution u = x + 1 for which du = dx. We can handle the endpoints in one of two ways:
Method 1: Change the endpoints
We can change the integral over entirely to a definite integral in the variable u as follows: Since u = x + 1, the endpoint x = 1 corresponds to u = 2, and the endpoint x = 2 corresponds to u = 3, so changing the endpoints to reflect the change of variables leads to
\[ I = \int_2^3 \frac{1}{u}\, du = \ln|u| \Big|_2^3 = \ln 3 - \ln 2 = \ln \frac{3}{2}. \]
In the last steps we have plugged in the new endpoints (appropriate to u).
Method 2: Change back to x before evaluating at endpoints
Alternately, we could rewrite the antiderivative in terms of x,
\[ \int \frac{1}{u}\, du = \ln|u| = \ln|x + 1|, \]
and then evaluate this function at the original endpoints:
\[ \int_1^2 \frac{1}{x + 1}\, dx = \ln|x + 1| \Big|_1^2 = \ln \frac{3}{2}. \]
Here we plugged in the original endpoints (as appropriate to the variable x).
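Both ways of handling the endpoints can be confirmed numerically. The sketch below assumes a simple midpoint-rule approximation of each definite integral; the helper name `midpoint_integral` is hypothetical and not part of the notes.

```python
import math

def midpoint_integral(g, a, b, n=100_000):
    """Midpoint-rule approximation of the definite integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Integral in the original variable x, with the original endpoints 1 and 2
in_x = midpoint_integral(lambda x: 1.0 / (x + 1.0), 1.0, 2.0)
# Integral in the new variable u = x + 1, with the changed endpoints 2 and 3
in_u = midpoint_integral(lambda u: 1.0 / u, 2.0, 3.0)

exact = math.log(3.0 / 2.0)
print(in_x, in_u, exact)    # all three values should agree closely
```

A common mistake is to change the variable but keep the old endpoints; replacing `2.0, 3.0` by `1.0, 2.0` in the second call gives ln 2 instead, which makes the error visible.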
6.3.3 Examples: Substitution type integrals
Find a simple substitution and determine the antiderivatives (indeﬁnite or deﬁnite integrals)
of the following functions:
1. $I = \int \frac{2}{x + 2}\, dx$.
2. $I = \int_0^1 x^2 e^{x^3}\, dx$.
3. $I = \int \frac{1}{(x + 1)^2 + 1}\, dx$.
4. $I = \int (x + 3) \sqrt{x^2 + 6x + 10}\; dx$.
5. $I = \int_0^{\pi} \cos^3(x) \sin(x)\, dx$.
6. $I = \int \frac{1}{ax + b}\, dx$.
7. $I = \int \frac{1}{b + ax^2}\, dx$.
Solutions
1. $I = \int \frac{2}{x + 2}\, dx$. Let u = x + 2. Then du = dx and we get
\[ I = \int \frac{2}{u}\, du = 2 \int \frac{1}{u}\, du = 2 \ln|u| = 2 \ln|x + 2| + C. \]
2. $I = \int_0^1 x^2 e^{x^3}\, dx$. Let $u = x^3$. Then $du = 3x^2\, dx$. Here we use method 2 for handling endpoints.
\[ \int e^u\, \frac{du}{3} = \frac{1}{3} e^u = \frac{1}{3} e^{x^3} + C. \]
Then
\[ I = \int_0^1 x^2 e^{x^3}\, dx = \frac{1}{3} e^{x^3} \Big|_0^1 = \frac{1}{3} (e - 1). \]
(We converted the antiderivative to the original variable, x, before plugging in the original endpoints.)
3. $I = \int \frac{1}{(x + 1)^2 + 1}\, dx$. Let u = x + 1, then du = dx so we have
\[ I = \int \frac{1}{u^2 + 1}\, du = \arctan(u) = \arctan(x + 1) + C. \]
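4. $I = \int (x + 3) \sqrt{x^2 + 6x + 10}\; dx$. Let $u = x^2 + 6x + 10$. Then $du = (2x + 6)\, dx = 2(x + 3)\, dx$. With this substitution we get
\[ I = \int \sqrt{u}\; \frac{du}{2} = \frac{1}{2} \int u^{1/2}\, du = \frac{1}{2} \frac{u^{3/2}}{3/2} = \frac{1}{3} u^{3/2} = \frac{1}{3} (x^2 + 6x + 10)^{3/2} + C. \]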
5. $I = \int_0^{\pi} \cos^3(x) \sin(x)\, dx$. Let u = cos(x). Then du = −sin(x) dx. Here we use method 1 for handling endpoints. For x = 0, u = cos 0 = 1 and for x = π, u = cos π = −1, so changing the integral and endpoints to u leads to
\[ I = \int_1^{-1} u^3 (-du) = -\frac{u^4}{4} \Big|_1^{-1} = -\frac{1}{4} \left( (-1)^4 - 1^4 \right) = 0. \]
Here we plugged in the new endpoints that are relevant to the variable u.
6. $I = \int \frac{1}{ax + b}\, dx$. Let u = ax + b. Then du = a dx, so dx = du/a. Substitute the above equations into the first equation and simplify to get
\[ I = \int \frac{1}{u}\, \frac{du}{a} = \frac{1}{a} \int \frac{1}{u}\, du = \frac{1}{a} \ln|u| + C. \]
Substitute u = ax + b back to arrive at the solution
\[ I = \int \frac{1}{ax + b}\, dx = \frac{1}{a} \ln|ax + b| + C. \tag{6.1} \]
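7. $I = \int \frac{1}{b + ax^2}\, dx = \frac{1}{b} \int \frac{1}{1 + (a/b) x^2}\, dx$. This can be brought to the form of an arctan type integral as follows: Let $u^2 = (a/b) x^2$, so $u = \sqrt{a/b}\; x$ and $du = \sqrt{a/b}\; dx$. Now substituting these, we get
\[ I = \frac{1}{b} \int \frac{1}{1 + u^2}\, \frac{du}{\sqrt{a/b}} = \sqrt{b/a}\; \frac{1}{b} \int \frac{1}{1 + u^2}\, du, \]
\[ I = \frac{1}{\sqrt{ba}} \arctan(u) + C = \frac{1}{\sqrt{ba}} \arctan\left( \sqrt{a/b}\; x \right) + C. \]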
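Some of these solutions can be spot-checked numerically. The sketch below is illustrative only: it assumes a midpoint-rule approximation for the two definite integrals (Solutions 2 and 5) and a central-difference derivative for the arctan result of Solution 7, with sample values a = 2 and b = 3 chosen arbitrarily. All helper names are hypothetical.

```python
import math

def midpoint_integral(g, lo, hi, n=20_000):
    """Midpoint-rule approximation of the definite integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

def central_difference(func, x, h=1e-5):
    """Approximate the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

# Solution 2: integral of x^2 e^(x^3) over [0, 1] should equal (e - 1)/3
i2 = midpoint_integral(lambda x: x ** 2 * math.exp(x ** 3), 0.0, 1.0)
# Solution 5: integral of cos^3(x) sin(x) over [0, pi] should equal 0
i5 = midpoint_integral(lambda x: math.cos(x) ** 3 * math.sin(x), 0.0, math.pi)

# Solution 7 with sample constants (assumed values, positive so the roots exist)
a, b = 2.0, 3.0
F7 = lambda x: math.atan(math.sqrt(a / b) * x) / math.sqrt(a * b)
f7 = lambda x: 1.0 / (b + a * x ** 2)

print(i2, (math.e - 1.0) / 3.0)
print(i5)
print(central_difference(F7, 0.7), f7(0.7))
```

Note that the arctan form in Solution 7 requires a/b > 0; the sample values respect that.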
6.3.4 When simple substitution fails
Not every integral can be handled by simple substitution. Let us see what could go wrong:
Example: Substitution that does not work
Consider the case
\[ F(x) = \int \sqrt{1 + x^2}\; dx = \int (1 + x^2)^{1/2}\, dx. \]
A “reasonable” guess for substitution might be
\[ u = (1 + x^2). \]
Then
\[ du = 2x\, dx, \]
and dx = du/(2x). Attempting to convert the integral to the form containing u would lead to
\[ I = \int \sqrt{u}\; \frac{du}{2x}. \]
We have not succeeded in eliminating x entirely, so the expression obtained contains a
mixture of two variables. We can proceed no further. This substitution did not simplify the
integral and we must try some other technique.
6.3.5 Checking your answer
Finding an antiderivative can be tricky. (To a large extent, methods described in this chapter are a “collection of tricks”.) However, it is always possible (and wise) to check for correctness, by differentiating the result. This can help uncover errors.
For example, suppose that (in the previous example) we had incorrectly guessed that the antiderivative of
\[ \int (1 + x^2)^{1/2}\, dx \]
might be
\[ F_{\text{guess}}(x) = \frac{1}{3/2} (1 + x^2)^{3/2}. \]
The following check demonstrates the incorrectness of this guess: Differentiate $F_{\text{guess}}(x)$ to obtain
\[ F'_{\text{guess}}(x) = \frac{1}{3/2}\, (3/2)\, (1 + x^2)^{(3/2) - 1} \cdot 2x = (1 + x^2)^{1/2} \cdot 2x. \]
The result is clearly not the same as $(1 + x^2)^{1/2}$, since an “extra” factor of 2x appears from application of the chain rule: this means that the trial function $F_{\text{guess}}(x)$ was not the correct antiderivative. (We can similarly check to confirm correctness of any antiderivative found by following steps of methods here described. This can help to uncover sign errors and other algebraic mistakes.)
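The failed guess can also be exposed numerically: if the guess were right, its derivative divided by the integrand would equal 1 everywhere. This is a minimal sketch, assuming a central-difference derivative; the helper names are illustrative.

```python
import math

def f_guess(x):
    return (1.0 + x ** 2) ** 1.5 / 1.5     # the incorrect trial antiderivative

def integrand(x):
    return math.sqrt(1.0 + x ** 2)

def derivative(func, x, h=1e-6):
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

for x in (0.5, 1.0, 2.0):
    ratio = derivative(f_guess, x) / integrand(x)
    print(x, ratio)    # a correct antiderivative would give 1; here the ratio tracks 2x
```

The ratio reproduces the “extra” factor 2x introduced by the chain rule.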
6.4 More substitutions
In some cases, rearrangement is needed before the form of an integral becomes apparent.
We give some examples in this section. The idea is to reduce each one to the form of an
elementary integral, whose antiderivative is known.
Standard integral forms
1. $I = \int \frac{1}{u}\, du = \ln|u| + C$.
2. $I = \int u^n\, du = \frac{u^{n+1}}{n + 1} + C$ (for $n \neq -1$).
3. $I = \int \frac{1}{1 + u^2}\, du = \arctan(u) + C$.
However, finding which of these forms is appropriate in a given case will take some ingenuity and algebra skills. Integration tends to be more of an art than differentiation, and recognition of patterns plays an important role here.
6.4.1 Example: perfect square in denominator
Find the antiderivative for
\[ I = \int \frac{1}{x^2 - 6x + 9}\, dx. \]
Solution
We observe that the denominator of the integrand is a perfect square, i.e. that $x^2 - 6x + 9 = (x - 3)^2$. Replacing this in the integral, we obtain
\[ I = \int \frac{1}{x^2 - 6x + 9}\, dx = \int \frac{1}{(x - 3)^2}\, dx. \]
Now making the substitution u = (x − 3), and du = dx leads to a power type integral
\[ I = \int \frac{1}{u^2}\, du = \int u^{-2}\, du = -u^{-1} = -\frac{1}{(x - 3)} + C. \]
6.4.2 Example: completing the square
A small change in the denominator will change the character of the integral, as shown by
this example:
\[ I = \int \frac{1}{x^2 - 6x + 10}\, dx. \]
Solution
Here we use “completing the square” to express the denominator in the form $x^2 - 6x + 10 = (x - 3)^2 + 1$. Then the integral takes the form
\[ I = \int \frac{1}{1 + (x - 3)^2}\, dx. \]
Now a substitution u = (x − 3) and du = dx will result in
\[ I = \int \frac{1}{1 + u^2}\, du = \arctan(u) = \arctan(x - 3) + C. \]
Remark: in cases where completing the square gives rise to a constant other than 1 in the denominator, we use the rescaling technique illustrated in Solution 7 of Section 6.3.3 to simplify the problem.
6.4.3 Example: factoring the denominator
A change in one sign can also lead to a drastic change in the antiderivative. Consider
\[ I = \int \frac{1}{1 - x^2}\, dx. \]
In this case, we can factor the denominator to obtain
\[ I = \int \frac{1}{(1 - x)(1 + x)}\, dx. \]
We will show shortly that the integrand can be simplified to the sum of two fractions, i.e. that
\[ I = \int \frac{1}{(1 - x)(1 + x)}\, dx = \int \left( \frac{A}{(1 - x)} + \frac{B}{(1 + x)} \right) dx, \]
where A, B are constants. The algebraic technique for ﬁnding these constants, and hence of
forming the simpler expressions, called Partial fractions, will be discussed in an upcoming
section. Once these constants are found, each of the resulting integrals can be handled by
substitution.
6.5 Trigonometric substitutions
Trigonometric functions provide a rich set of interconnected functions that show up in
many problems. It is useful to remember three very important trigonometric identities that
help to simplify many integrals. These are:
Essential trigonometric identities
1. $\sin^2(x) + \cos^2(x) = 1$
2. $\sin(A + B) = \sin(A)\cos(B) + \sin(B)\cos(A)$
3. $\cos(A + B) = \cos(A)\cos(B) - \sin(A)\sin(B)$.
In the special case that A = B = x, the last two identities above lead to:
Double angle trigonometric identities
1. $\sin(2x) = 2\sin(x)\cos(x)$.
2. $\cos(2x) = \cos^2(x) - \sin^2(x)$.
From these, we can generate a variety of other identities as special cases. We list the most useful below. The first two are obtained by combining the double-angle formula for cosines with the identity $\sin^2(x) + \cos^2(x) = 1$.
Useful trigonometric identities
1. $\cos^2(x) = \frac{1 + \cos(2x)}{2}$.
2. $\sin^2(x) = \frac{1 - \cos(2x)}{2}$.
3. $\tan^2(x) + 1 = \sec^2(x)$.
6.5.1 Example: simple trigonometric substitution
Find the antiderivative of
\[ I = \int \sin(x) \cos^2(x)\, dx. \]
Solution
This integral can be computed by a simple substitution, similar to Example 5 of Section 6.3. We let u = cos(x) and du = −sin(x)dx to get the integral into the form
\[ I = -\int u^2\, du = \frac{-u^3}{3} = \frac{-\cos^3(x)}{3} + C. \]
We need none of the trigonometric identities in this case. Simple substitution is always the
easiest method to use. It should be the ﬁrst method attempted in each case.
6.5.2 Example: using trigonometric identities (1)
Find the antiderivative of
\[ I = \int \cos^2(x)\, dx. \]
Solution
This is an example in which the “Useful trigonometric identity” 1 leads to a simpler integral. We write
\[ I = \int \cos^2(x)\, dx = \int \frac{1 + \cos(2x)}{2}\, dx = \frac{1}{2} \int (1 + \cos(2x))\, dx. \]
Then clearly,
\[ I = \frac{1}{2} \left( x + \frac{\sin(2x)}{2} \right) + C. \]
6.5.3 Example: using trigonometric identities (2)
Find the antiderivative of
\[ I = \int \sin^3(x)\, dx. \]
Solution
We can rewrite this integral in the form
\[ I = \int \sin^2(x) \sin(x)\, dx. \]
Now using the trigonometric identity $\sin^2(x) + \cos^2(x) = 1$ leads to
\[ I = \int (1 - \cos^2(x)) \sin(x)\, dx. \]
This can be split up into
\[ I = \int \sin(x)\, dx - \int \sin(x) \cos^2(x)\, dx. \]
The first part is elementary, and the second was shown in a previous example. Therefore we end up with
\[ I = -\cos(x) + \frac{\cos^3(x)}{3} + C. \]
Note that it is customary to combine all constants obtained in the calculation into a single
constant, C at the end.
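The two trigonometric antiderivatives found above can be tested against the Fundamental Theorem of Calculus: F(b) − F(a) should match a direct numerical estimate of the definite integral. This is a sketch under stated assumptions: the interval [0, π/2] is an arbitrary choice, the integral is estimated by the midpoint rule, and the function names are illustrative.

```python
import math

def midpoint_integral(g, a, b, n=50_000):
    """Midpoint-rule approximation of the definite integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def F_cos2(x):
    return 0.5 * (x + math.sin(2.0 * x) / 2.0)     # antiderivative of cos^2(x)

def F_sin3(x):
    return -math.cos(x) + math.cos(x) ** 3 / 3.0   # antiderivative of sin^3(x)

a, b = 0.0, math.pi / 2.0
num_cos2 = midpoint_integral(lambda x: math.cos(x) ** 2, a, b)
num_sin3 = midpoint_integral(lambda x: math.sin(x) ** 3, a, b)

print(num_cos2, F_cos2(b) - F_cos2(a))   # both should be close to pi/4
print(num_sin3, F_sin3(b) - F_sin3(a))   # both should be close to 2/3
```

Because the constant C cancels in F(b) − F(a), it can be left out of the functions used for the check.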
Aside from integrals that, themselves, contain trigonometric functions, there are other cases in which use of trigonometric identities, though at first seemingly unrelated, is crucial. Many expressions involving the form $\sqrt{1 \pm x^2}$ or the related form $\sqrt{a \pm bx^2}$ will be simplified eventually by conversion to trigonometric expressions!
6.5.4 Example: converting to trigonometric functions
Find the antiderivative of
\[ I = \int \sqrt{1 - x^2}\, dx. \]
Solution
The simple substitution $u = 1 - x^2$ will not work (as shown by a similar example in
Section 6.3). However, converting to trigonometric expressions will do the trick. Let
$x = \sin(u)$; then $dx = \cos(u)\, du$.
(In Figure 6.3, we show this relationship on a triangle. This diagram is useful in reversing
the substitutions after the integration step.) Then $1 - x^2 = 1 - \sin^2(u) = \cos^2(u)$, so the
substitutions lead to
\[ I = \int \sqrt{\cos^2(u)}\, \cos(u)\, du = \int \cos^2(u)\, du. \]
Figure 6.3. This triangle (hypotenuse 1, side x opposite the angle u, remaining side
$\sqrt{1 - x^2}$) helps to convert the (trigonometric) functions of u to the original
variable x in Example 6.5.4.
From a previous example, we already know how to handle this integral. We find that
\[ I = \frac{1}{2} \left( u + \frac{\sin(2u)}{2} \right) + C = \frac{1}{2} \left( u + \sin(u) \cos(u) \right) + C. \]
(In the last step, we have used the double angle trigonometric identity. We will shortly see
why this simplification is relevant.)
We now wish to convert the result back to a function of the original variable, x.
We note that $x = \sin(u)$ implies $u = \arcsin(x)$. To convert the term $\cos(u)$ back to an
expression depending on x we can use the relationship $1 - \sin^2(u) = \cos^2(u)$ to deduce
that
\[ \cos(u) = \sqrt{1 - \sin^2(u)} = \sqrt{1 - x^2}. \]
It is sometimes helpful to use a Pythagorean triangle, as shown in Figure 6.3, to
rewrite the antiderivative in terms of the variable x. The idea is this: We construct the
triangle in such a way that its side lengths are related to the “angle” u according to the
substitution rule. In this example, $x = \sin(u)$, so the sides labeled x and 1 were chosen so
that their ratio (“opposite over hypotenuse”) coincides with the sine of the indicated angle,
u, thereby satisfying $x = \sin(u)$. We can then determine the length of the third leg of
the triangle (using the Pythagorean formula) and thus all other trigonometric functions
of u. For example, we note that the ratio of “adjacent over hypotenuse” is
$\cos(u) = \sqrt{1 - x^2}/1 = \sqrt{1 - x^2}$. Finally, with these reverse substitutions, we find that
\[ I = \int \sqrt{1 - x^2}\, dx = \frac{1}{2} \left( \arcsin(x) + x \sqrt{1 - x^2} \right) + C. \]
Remark: In computing a definite integral of the same type, we can circumvent the
need for the conversion back to an expression involving x by using the appropriate method
for handling endpoints. For example, the integral
\[ I = \int_0^1 \sqrt{1 - x^2}\, dx \]
can be transformed to
\[ I = \int_0^{\pi/2} \sqrt{\cos^2(u)}\, \cos(u)\, du, \]
by observing that $x = \sin(u)$ implies that $u = 0$ when $x = 0$ and $u = \pi/2$ when $x = 1$.
This means that the integral can be evaluated directly (without changing back to the
variable x) as follows:
\[ I = \int_0^{\pi/2} \sqrt{\cos^2(u)}\, \cos(u)\, du
     = \frac{1}{2} \left. \left( u + \frac{\sin(2u)}{2} \right) \right|_0^{\pi/2}
     = \frac{1}{2} \left( \frac{\pi}{2} + \frac{\sin(\pi)}{2} \right)
     = \frac{\pi}{4}, \]
where we have used the fact that $\sin(\pi) = 0$.
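The endpoint conversion can be checked numerically. The sketch below (an illustration added here, not the author's method) approximates both forms of the integral with a midpoint Riemann sum, in the spirit of the sums of Chapter 1. The function name `midpoint_rule` and the number of subintervals are assumptions made for this example.

```python
import math

def midpoint_rule(f, a, b, n=100_000):
    # Riemann sum using the midpoint of each of n equal subintervals.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Original variable x, endpoints 0 and 1.
in_x = midpoint_rule(lambda x: math.sqrt(1 - x * x), 0.0, 1.0)

# After x = sin(u), endpoints become 0 and pi/2.
in_u = midpoint_rule(lambda u: math.cos(u) ** 2, 0.0, math.pi / 2)

print(in_x, in_u, math.pi / 4)
```

Both approximations should be close to $\pi/4$, which is also the area of one quarter of the unit circle.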
Some subtle points about the domains of deﬁnition of inverse trigonometric functions
will not be discussed here in detail. (See material on these functions in a ﬁrst term calculus
course.) Sufﬁce it to say that some integrals of this type will be undeﬁned if this endpoint
conversion cannot be carried out (e.g. if the interval of integration had been 0 ≤ x ≤ 2,
we would encounter an impossible relation 2 = sin(u). Since no value of u satisﬁes this
relation, such a deﬁnite integral has no meaning, i.e. “does not exist”.)
6.5.5 Example: The centroid of a two dimensional shape
We extend the concept of centroid (center of mass) to a region that has uniform density
in 2D, but where we consider the distribution of mass along the x (or y) axis. Consider
the semicircle shape of uniform thickness, shown in Figure 6.4, and suppose it is balanced
along its horizontal edge. Find the x coordinate $\bar{x}$ at which the shape balances.
Figure 6.4. A semicircular shape. (Axes: x, y; curve: $y = \sqrt{9 - x^2}$.)
Solution
The shape is one quarter of a circle of radius 3. Its curved edge is described by the equation
\[ y = f(x) = \sqrt{9 - x^2}. \]
We will assume that the density per unit area is uniform. However, the mass per unit
length along the x axis is not uniform, due to the shape of the object. We apply the idea of
integration: If we cut the shape at increments of $\Delta x$ along the x axis, we get a collection
of pieces whose mass is each proportional to $f(x) \Delta x$. Summing up such contributions and
letting the widths $\Delta x \to dx$ get small, we arrive at the integral for mass. The total mass of
the shape is thus
\[ M = \int_0^3 f(x)\, dx = \int_0^3 \sqrt{9 - x^2}\, dx. \]
Furthermore, if we compute the integral
\[ I = \int_0^3 x f(x)\, dx = \int_0^3 x \sqrt{9 - x^2}\, dx, \]
we obtain the x coordinate of the center of mass,
\[ \bar{x} = \frac{I}{M}. \]
It is evident that the mass is proportional to the area of one quarter of a circle of radius 3:
\[ M = \frac{1}{4} \pi (3)^2 = \frac{9}{4} \pi. \]
(We could also see this by performing a trigonometric substitution integral.) The second
integral can be done by simple substitution. Consider
\[ I = \int_0^3 x f(x)\, dx = \int_0^3 x \sqrt{9 - x^2}\, dx. \]
Let $u = 9 - x^2$. Then $du = -2x\, dx$. The endpoints are converted as follows:
$x = 0 \Rightarrow u = 9 - 0^2 = 9$ and $x = 3 \Rightarrow u = 9 - 3^2 = 0$, so that we get the integral
\[ I = \int_9^0 \sqrt{u}\, \frac{1}{-2}\, du. \]
We can reverse the endpoints if we switch the sign, and this leads to
\[ I = \frac{1}{2} \int_0^9 u^{1/2}\, du = \frac{1}{2} \left. \frac{u^{3/2}}{3/2} \right|_0^9. \]
Since $9^{3/2} = (9^{1/2})^3 = 3^3$, we get $I = (3^3)/3 = 3^2 = 9$. Thus the x coordinate of the
center of mass is
\[ \bar{x} = \frac{I}{M} = \frac{9}{(9/4)\pi} = \frac{4}{\pi}. \]
We can similarly find the y coordinate of the center of mass. To do so, we would express
the boundary of the shape in the form $x = f(y)$ and integrate to find
\[ \bar{y} = \frac{1}{M} \int_0^3 y f(y)\, dy. \]
For the semicircle, $y^2 + x^2 = 9$, so $x = f(y) = \sqrt{9 - y^2}$. Thus
\[ \bar{y} = \frac{1}{M} \int_0^3 y \sqrt{9 - y^2}\, dy. \]
This integral has the same form as the integral I that we computed for $\bar{x}$. Thus, based on
this similarity (or based on the symmetry of the problem) we find that
\[ \bar{y} = \frac{4}{\pi}. \]
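The centroid calculation can be reproduced numerically as a check. The following sketch (added for illustration; the names `midpoint_rule`, `f`, `M`, `I` and `x_bar` mirror the symbols above but the numerical method is an assumption of this example) approximates the two integrals by midpoint sums and forms their ratio.

```python
import math

def midpoint_rule(g, a, b, n=100_000):
    # Riemann sum using the midpoint of each of n equal subintervals.
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def f(x):
    # Height of the shape above the x axis.
    return math.sqrt(9 - x * x)

M = midpoint_rule(f, 0.0, 3.0)                    # total mass (unit density)
I = midpoint_rule(lambda x: x * f(x), 0.0, 3.0)   # integral of x f(x)
x_bar = I / M

print(M, 9 * math.pi / 4)
print(I, 9.0)
print(x_bar, 4 / math.pi)
```

Each printed pair should agree to several decimal places, confirming $M = 9\pi/4$, $I = 9$ and $\bar{x} = 4/\pi \approx 1.27$.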
6.5.6 Example: tan and sec substitution
Find the antiderivative of
\[ I = \int \sqrt{1 + x^2}\, dx. \]
Solution
We aim for simplification by the identity $1 + \tan^2(u) = \sec^2(u)$, so we set
\[ x = \tan(u), \quad dx = \sec^2(u)\, du. \]
Then the substitution leads to
\[ I = \int \sqrt{1 + \tan^2(u)}\, \sec^2(u)\, du = \int \sqrt{\sec^2(u)}\, \sec^2(u)\, du = \int \sec^3(u)\, du. \]
This integral will require further work, and will be partly calculated by Integration by Parts
in Appendix 11.5. In this example, the triangle shown in Figure 6.5 shows the relationship
between x and u and will help to convert other trigonometric functions of u to functions of
x.
Figure 6.5. As in Figure 6.3 but for Example 6.5.6. (The triangle has angle u, sides x and 1,
and hypotenuse $\sqrt{1 + x^2}$.)
6.6 Partial fractions
In this section, we show a simple algebraic trick that helps to simplify an integrand when
it is in the form of some rational function such as
\[ f(x) = \frac{1}{(ax + b)(cx + d)}. \]
The idea is to break this up into simpler rational expressions by finding constants A, B
such that
\[ \frac{1}{(ax + b)(cx + d)} = \frac{A}{(ax + b)} + \frac{B}{(cx + d)}. \]
Each part can then be handled by a simple substitution, as shown in Example 6.3.3, Eqn. (6.1).
We give several examples below.
6.6.1 Example: partial fractions (1)
Find the antiderivative of
\[ I = \int \frac{1}{x^2 - 1}\, dx. \]
Factoring the denominator, $x^2 - 1 = (x - 1)(x + 1)$, suggests breaking up the integrand
into the form
\[ \frac{1}{x^2 - 1} = \frac{A}{(x + 1)} + \frac{B}{(x - 1)}. \]
The two sides are equal provided:
\[ \frac{1}{x^2 - 1} = \frac{A(x - 1) + B(x + 1)}{x^2 - 1}. \]
This means that
\[ 1 = A(x - 1) + B(x + 1) \]
must be true for all x values. We now ask what values of A and B make this equation hold
for any x. Choosing two “easy” values, namely $x = -1$ and $x = 1$, isolates one
or the other of the unknown constants A, B, with the results:
\[ 1 = -2A, \quad 1 = 2B. \]
Thus $B = 1/2$, $A = -1/2$, so the integral can be written in the simpler form
\[ I = \frac{1}{2} \left( \int \frac{-1}{(x + 1)}\, dx + \int \frac{1}{(x - 1)}\, dx \right). \]
(A common factor of (1/2) has been taken out.) Now a simple substitution will work for
each component. (Let $u = x + 1$ for the first, and $u = x - 1$ for the second integral.) The
result is
\[ I = \int \frac{1}{x^2 - 1}\, dx = \frac{1}{2} \left( -\ln|x + 1| + \ln|x - 1| \right) + C. \]
6.6.2 Example: partial fractions (2)
Find the antiderivative of
\[ I = \int \frac{1}{x(1 - x)}\, dx. \]
This example is similar to the previous one. We set
\[ \frac{1}{x(1 - x)} = \frac{A}{x} + \frac{B}{(1 - x)}. \]
Then
\[ 1 = A(1 - x) + Bx. \]
This must hold for all x values. In particular, convenient values of x for determining the
constants are $x = 0, 1$. We find that
\[ A = 1, \quad B = 1. \]
Thus
\[ I = \int \frac{1}{x(1 - x)}\, dx = \int \frac{1}{x}\, dx + \int \frac{1}{1 - x}\, dx. \]
Simple substitution now gives
\[ I = \ln|x| - \ln|1 - x| + C. \]
6.6.3 Example: partial fractions (3)
Find the antiderivative of
\[ I = \int \frac{x}{x^2 + x - 2}\, dx. \]
The denominator factors as $x^2 + x - 2 = (x - 1)(x + 2)$, leading to the
expression
\[ \frac{x}{x^2 + x - 2} = \frac{A}{(x - 1)} + \frac{B}{(x + 2)}. \]
Consequently, it follows that
\[ A(x + 2) + B(x - 1) = x. \]
Substituting the values $x = 1, -2$ into this leads to $A = 1/3$ and $B = 2/3$. The usual
procedure then results in
\[ I = \int \frac{x}{x^2 + x - 2}\, dx = \frac{1}{3} \ln|x - 1| + \frac{2}{3} \ln|x + 2| + C. \]
Another example of the technique of partial fractions is provided in Appendix 11.5.2.
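The “plug in an easy value of x” step used in Examples 6.6.1 and 6.6.3 can be sketched in a few lines of code. This is an illustrative helper written for these notes' examples only: the function name `two_term_coefficients` is hypothetical, and it assumes a denominator with two distinct integer roots and a numerator that returns integers.

```python
from fractions import Fraction

def two_term_coefficients(numerator, r1, r2):
    """Return (c1, c2) with N(x)/((x - r1)(x - r2)) = c1/(x - r1) + c2/(x - r2).

    Setting x = r1 in N(x) = c1 (x - r2) + c2 (x - r1) removes the c2 term,
    and setting x = r2 removes the c1 term.
    """
    c1 = Fraction(numerator(r1), r1 - r2)
    c2 = Fraction(numerator(r2), r2 - r1)
    return c1, c2

# Example 6.6.1: 1/(x^2 - 1), with A over (x + 1) and B over (x - 1).
A, B = two_term_coefficients(lambda x: 1, -1, 1)
print(A, B)

# Example 6.6.3: x/(x^2 + x - 2), with A over (x - 1) and B over (x + 2).
A3, B3 = two_term_coefficients(lambda x: x, 1, -2)
print(A3, B3)
```

Exact fractions are used so that the coefficients can be compared directly with the hand calculation ($A = -1/2$, $B = 1/2$ in the first example and $A = 1/3$, $B = 2/3$ in the third).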
6.7 Integration by parts
The method described in this section is important as an additional tool for integration. It
also has independent theoretical stature in many applications in mathematics and physics.
The essential idea is that in some cases, we can exchange the task of integrating a function
with the job of differentiating it.
The idea rests on the product rule for derivatives. Suppose that u(x) and v(x) are
two differentiable functions. Then we know that the derivative of their product is
\[ \frac{d(uv)}{dx} = v \frac{du}{dx} + u \frac{dv}{dx}, \]
or, in the differential notation:
\[ d(uv) = v\, du + u\, dv. \]
Integrating both sides, we obtain
\[ \int d(uv) = \int v\, du + \int u\, dv, \]
i.e.
\[ uv = \int v\, du + \int u\, dv. \]
We write this result in the more suggestive form
\[ \int u\, dv = uv - \int v\, du. \]
The idea here is that if we have difficulty evaluating an integral such as $\int u\, dv$, we may be
able to “exchange it” for a simpler integral in the form $\int v\, du$. This is best illustrated by
the examples below.
Example: Integration by parts (1)
Compute
\[ I = \int_1^2 \ln(x)\, dx. \]
Solution
Let $u = \ln(x)$ and $dv = dx$. Then $du = (1/x)\, dx$ and $v = x$, so that
\[ \int \ln(x)\, dx = x \ln(x) - \int x (1/x)\, dx = x \ln(x) - \int dx = x \ln(x) - x. \]
We now evaluate this result at the endpoints to obtain
\[ I = \int_1^2 \ln(x)\, dx = \left. (x \ln(x) - x) \right|_1^2 = (2 \ln(2) - 2) - (1 \ln(1) - 1) = 2 \ln(2) - 1. \]
(Here we used the fact that $\ln(1) = 0$.)
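As an illustration of how the two sides of the integration by parts formula match on a definite integral, the sketch below evaluates each piece separately for $u = \ln(x)$, $v = x$ on $1 \le x \le 2$. The midpoint-sum helper and all variable names are assumptions made for this example.

```python
import math

def midpoint_rule(g, a, b, n=10_000):
    # Riemann sum using the midpoint of each of n equal subintervals.
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

a, b = 1.0, 2.0

# Left-hand side: integral of u dv with u = ln(x), dv = dx.
lhs = midpoint_rule(math.log, a, b)

# Right-hand side: [u v] at the endpoints minus the integral of v du,
# where v = x and du = (1/x) dx, so v du = 1 dx.
boundary = b * math.log(b) - a * math.log(a)
remaining = midpoint_rule(lambda x: x * (1 / x), a, b)
rhs = boundary - remaining

print(lhs, rhs, 2 * math.log(2) - 1)
```

All three printed numbers should agree closely (about 0.386), which is the value $2\ln(2) - 1$ found above.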
Example: Integration by parts (2)
Compute
\[ I = \int_0^1 x e^x\, dx. \]
Solution
At first, it may be hard to decide how to assign roles for u and dv. Suppose we try $u = e^x$
and $dv = x\, dx$. Then $du = e^x\, dx$ and $v = x^2/2$. This means that we would get the integral
in the form
\[ I = \frac{x^2}{2} e^x - \int \frac{x^2}{2} e^x\, dx. \]
This is certainly not a simplification, because the integral we obtain has a higher power of
x, and is consequently harder, not easier, to integrate. This suggests that our first attempt
was not a helpful one. (Note that integration often requires trial and error.)
Let $u = x$ and $dv = e^x\, dx$. This is a wiser choice because when we differentiate u,
we reduce the power of x (from 1 to 0), and get a simpler expression. Indeed, $du = dx$,
$v = e^x$ so that
\[ \int x e^x\, dx = x e^x - \int e^x\, dx = x e^x - e^x + C. \]
To find a definite integral of this kind on some interval (say $0 \le x \le 1$), we compute
\[ I = \int_0^1 x e^x\, dx = \left. (x e^x - e^x) \right|_0^1 = (1 e^1 - e^1) - (0 e^0 - e^0) = 0 + e^0 = 1. \]
Note that all parts of the expression are evaluated at the two endpoints.
Example: Integration by parts (2b)
Compute
\[ I_n = \int x^n e^x\, dx. \]
Solution
We can calculate this integral by repeated application of the idea in the previous example.
Letting $u = x^n$ and $dv = e^x\, dx$ leads to $du = n x^{n-1}\, dx$ and $v = e^x$. Then
\[ I_n = x^n e^x - \int n x^{n-1} e^x\, dx = x^n e^x - n \int x^{n-1} e^x\, dx. \]
Each application of integration by parts reduces the power of the term $x^n$ inside an integral
by one. The calculation is repeated until the very last integral has been simplified, with
no remaining powers of x. This illustrates that in some problems, integration by parts is
needed more than once.
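The repeated reduction can be sketched as a short loop. To obtain numbers, the example below applies the formula to the definite integral $J_n = \int_0^1 x^n e^x\, dx$ (the interval 0 to 1 is an assumption made here, borrowed from the previous example). Evaluating the boundary term $x^n e^x$ at the endpoints gives $J_n = e - n J_{n-1}$ for $n \ge 1$, with $J_0 = e - 1$. The function name `J` is hypothetical.

```python
import math

def J(n):
    """Integral of x**n * exp(x) over [0, 1], using J_n = e - n * J_{n-1}."""
    value = math.e - 1.0           # J_0 = integral of exp(x) over [0, 1]
    for k in range(1, n + 1):
        value = math.e - k * value  # one integration by parts lowers the power by one
    return value

for n in range(4):
    print(n, J(n))
```

For $n = 1$ this reproduces the value 1 found in Example (2). The forward recurrence amplifies rounding error as n grows, so this sketch is only meant for small n.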
Example: Integration by parts (3)
Compute
\[ I = \int \arctan(x)\, dx. \]
Solution
Let $u = \arctan(x)$ and $dv = dx$. Then $du = (1/(1 + x^2))\, dx$ and $v = x$ so that
\[ I = x \arctan(x) - \int \frac{1}{1 + x^2}\, x\, dx. \]
The last integral can be done with the simple substitution $w = 1 + x^2$ and $dw = 2x\, dx$,
giving
\[ I = x \arctan(x) - \frac{1}{2} \int \frac{1}{w}\, dw. \]
We obtain, as a result,
\[ I = x \arctan(x) - \frac{1}{2} \ln(1 + x^2) + C. \]
Example: Integration by parts (3b)
Compute
\[ I = \int \tan(x)\, dx. \]
Solution
We might try to fit this into a similar pattern, i.e. let $u = \tan(x)$ and $dv = dx$. Then
$du = \sec^2(x)\, dx$ and $v = x$, so we obtain
\[ I = x \tan(x) - \int x \sec^2(x)\, dx. \]
This is not really a simplification, and we see that integration by parts will not necessarily
work, even on a seemingly related example. However, we might instead try to rewrite the
integral in the form
\[ I = \int \tan(x)\, dx = \int \frac{\sin(x)}{\cos(x)}\, dx. \]
Now we find that a simple substitution will do the trick, i.e. that $w = \cos(x)$ and
$dw = -\sin(x)\, dx$ will convert the integral into the form
\[ I = \int \frac{1}{w}\, (-dw) = -\ln|w| + C = -\ln|\cos(x)| + C. \]
This example illustrates that we should always try substitution first, before attempting
other methods.
Example: Integration by parts (4)
Compute
\[ I_1 = \int e^x \sin(x)\, dx. \]
We refer to this integral as $I_1$ because a related second integral, which we'll call $I_2$,
will appear in the calculation.
Solution
Let $u = e^x$ and $dv = \sin(x)\, dx$. Then $du = e^x\, dx$ and $v = -\cos(x)$. Therefore
\[ I_1 = -e^x \cos(x) - \int (-\cos(x)) e^x\, dx = -e^x \cos(x) + \int \cos(x) e^x\, dx. \]
We now have another integral of a similar form to tackle. This seems hopeless, as we
have not simplified the result, but let us not give up! In this case, another application of
integration by parts will do the trick. Call $I_2$ the integral
\[ I_2 = \int \cos(x) e^x\, dx, \]
so that
\[ I_1 = -e^x \cos(x) + I_2. \]
Repeat the same procedure for the new integral $I_2$, i.e. let $u = e^x$ and $dv = \cos(x)\, dx$.
Then $du = e^x\, dx$ and $v = \sin(x)$. Thus
\[ I_2 = e^x \sin(x) - \int \sin(x) e^x\, dx = e^x \sin(x) - I_1. \]
This appears to be a circular argument, but in fact, it has a purpose. We have determined
that the following relationships are satisfied by the above two integrals:
\[ I_1 = -e^x \cos(x) + I_2, \qquad I_2 = e^x \sin(x) - I_1. \]
We can eliminate $I_2$, obtaining
\[ I_1 = -e^x \cos(x) + I_2 = -e^x \cos(x) + e^x \sin(x) - I_1, \]
that is,
\[ I_1 = -e^x \cos(x) + e^x \sin(x) - I_1. \]
Rearranging (taking $I_1$ to the left hand side) leads to
\[ 2 I_1 = -e^x \cos(x) + e^x \sin(x), \]
and thus, the desired integral has been found to be
\[ I_1 = \int e^x \sin(x)\, dx = \frac{1}{2} \left( -e^x \cos(x) + e^x \sin(x) \right) = \frac{1}{2} e^x (\sin(x) - \cos(x)) + C. \]
(At this last step, we have included the constant of integration.) Moreover, we have also
found that $I_2$ is related, i.e. using $I_2 = e^x \sin(x) - I_1$ we now know that
\[ I_2 = \int \cos(x) e^x\, dx = \frac{1}{2} e^x (\sin(x) + \cos(x)) + C. \]
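Because the argument above looks circular, it is reassuring to check the two results by differentiation. The sketch below (an illustration added here; the function names and the finite-difference step are assumptions) differentiates each antiderivative numerically and compares it with the corresponding integrand.

```python
import math

def I1(x):
    # Antiderivative of exp(x) sin(x) found above (with C = 0).
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))

def I2(x):
    # Antiderivative of exp(x) cos(x) found above (with C = 0).
    return 0.5 * math.exp(x) * (math.sin(x) + math.cos(x))

def central_difference(f, x, h=1e-5):
    # Symmetric finite-difference approximation to f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (-1.0, 0.0, 0.5, 2.0):
    err1 = central_difference(I1, x) - math.exp(x) * math.sin(x)
    err2 = central_difference(I2, x) - math.exp(x) * math.cos(x)
    print(x, err1, err2)
```

Both error columns should be very close to zero at every sample point.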
6.8 Summary
In this chapter, we explored a number of techniques for computing antiderivatives. We here
summarize the most important results:
1. Substitution is the ﬁrst method to consider. This method works provided the change
of variable results in elimination of the original variable and leads to a simpler, more
elementary integral.
2. When using substitution on a deﬁnite integral, endpoints can be converted to the
new variable (Method 1) or the resulting antiderivative can be converted back to its
original variable before plugging in the (original) endpoints (Method 2).
3. The integration by parts formula for functions u(x), v(x) is
\[ \int u\, dv = uv - \int v\, du. \]
Integration by parts is useful when u is easy to differentiate (but not easy to integrate).
It is also helpful when the integral contains a product of elementary functions such
as $x^n$ and a trigonometric or an exponential function. Sometimes more than one
application of this method is needed. Other times, this method is combined with
substitution or other simplifications.
4. Using integration by parts on a deﬁnite integral means that both parts of the formula
are to be evaluated at the endpoints.
5. Integrals involving $\sqrt{1 \pm x^2}$ can be simplified by making a trigonometric
substitution.
6. Integrals with products or powers of trigonometric functions can sometimes be
simplified by application of trigonometric identities or simple substitution.
7. Algebraic tricks, and many associated manipulations are often applied to twist and
turn a complicated integral into a set of simpler expressions that can each be handled
more easily.
8. Even with all these techniques, the problem of finding an antiderivative can be very
complicated. In some cases, we resort to handbooks of integrals, use symbolic
manipulation software packages, or, if none of these work, calculate a given definite
integral numerically using a spreadsheet.
Table of elementary antiderivatives
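1. $\int \dfrac{1}{u}\, du = \ln|u| + C$.
2. $\int u^n\, du = \dfrac{u^{n+1}}{n + 1} + C$ (for $n \neq -1$).
3. $\int \dfrac{1}{1 + u^2}\, du = \arctan(u) + C$.
4. $\int \dfrac{1}{\sqrt{1 - u^2}}\, du = \arcsin(u) + C$.
5. $\int \sin(u)\, du = -\cos(u) + C$.
6. $\int \cos(u)\, du = \sin(u) + C$.
7. $\int \sec^2(u)\, du = \tan(u) + C$.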
Additional useful antiderivatives
1. $\int \tan(u)\, du = \ln|\sec(u)| + C$.
2. $\int \cot(u)\, du = \ln|\sin(u)| + C$.
3. $\int \sec(u)\, du = \ln|\sec(u) + \tan(u)| + C$.
Chapter 7
Discrete probability and the laws of chance
7.1 Introduction
In this chapter we lay the groundwork for calculations and rules governing simple discrete
probabilities [24]. Such skills are essential in understanding problems related to random
processes of all sorts. In biology, there are many examples of such processes, including the
inheritance of genes and genetic diseases, the random motion of cells, the fluctuations in
the number of RNA molecules in a cell, and a vast array of other phenomena.
To gain experience with probability, it is important to see simple examples. In this
chapter, we discuss experiments that can be easily reproduced and tested by the reader.
7.2 Dealing with data
Scientists studying phenomena in the real world collect data of all kinds, some resulting
from experimental measurement or field observations. Data sets can be large and complex.
If an experiment is repeated, and comparisons are to be made between multiple data sets,
it is unrealistic to compare each and every numerical value. Some shortcuts allow us to
summarize trends or descriptions of data sets in simple values such as averages (means),
medians, and similar quantities. In doing so we lose the detailed information that the data
set contains, in favor of the simplicity of one or several numerical descriptors such
as the mean and the median of a distribution. We have seen related ideas in Chapter 5
in the context of mass distributions. The idea of a center of mass is closely related to that
of the mean of a distribution. Here we revisit such ideas in the context of probability. An
additional example of real data is described in Appendix 11.6. There, we show how grade
distributions on a test can be analyzed by similar methods.
[24] I am grateful to Robert Israel for comments regarding the organization of this chapter.
7.3 Simple experiments
7.3.1 Experiment
We will consider “experiments” such as tossing a coin, rolling a die, dealing cards, applying
treatment to sick patients, and recording how many are cured. In order for the ideas of
probability to apply, we should be able to repeat the experiment as many times as desired
under exactly the same conditions. The number of repetitions will often be denoted N.
7.3.2 Outcome
Whenever we perform the experiment, exactly one outcome happens. In this chapter we
will deal with discrete probability, where there is a ﬁnite list of possible outcomes.
Consider the following experiment: We toss a coin and see how it lands. Here there
are only two possible results: “heads” (H) or “tails” (T). A fair coin is one for which these
results are equally likely. This means that if we repeat this experiment many many times,
we expect that on average, we get H roughly 50% of the time and T roughly 50% of the
time. This will lead us to deﬁne a probability of 1/2 for each outcome.
Similarly, consider the experiment of rolling a die: A six-sided die can land on any
of its six faces, so that a “single experiment” has six possible outcomes. For a fair die, we
anticipate getting each of the results with an equal probability, i.e. if we were to repeat
the same experiment many many times, we would expect that, on average, the six possible
events would occur with similar frequencies, each 1/6 of the time. We say that the events
are random and unbiased for a “fair” die.
We will often be interested in more complex experiments. For example, if we toss a
coin five times, an outcome corresponds to a five-letter sequence of “Heads” (H) and “Tails”
(T), such as THTHH. We are interested in understanding how to quantify the probability
of each such outcome for fair (as well as unfair) coins. If we toss a coin ten times, how
probable is it that we get eight out of ten heads? For dice, we could ask how likely
we are to roll a 5 and a 6 in successive experiments. A 5 or a 6? For such experiments we
are interested in quantifying how likely it is that a certain event is obtained. Our goal in
this chapter is to make more precise our notion of probability, and to examine ways of
quantifying and computing probabilities. To motivate this investigation, we first look at
results of a real experiment performed in class by students.
7.3.3 Empirical probability
We can arrive at a notion of probability by actually repeating a real experiment N times,
and counting how many times each outcome happens. Let us use the notation $x_i$ to refer to
the number of times that outcome i was obtained. An example of this sort is illustrated in
Section 7.4.1. We define the empirical probability $p_i$ of outcome i to be
\[ p_i = x_i / N, \]
i.e. $p_i$ is the fraction of times that the result i is obtained out of all the experiments. We
expect that if we repeated the experiment many more times, this empirical probability would
approach, as a limit, the actual probability of the outcome. So if in a coin-tossing
experiment, repeated 1000 times, the outcome HHTHH is obtained 25 times, then we would say
that the empirical probability $p_{HHTHH}$ is 25/1000.
7.3.4 Theoretical Probability
For theoretical probability, we make some reasonable basic assumptions on which we base
a calculation of the probabilities. For example, in the case of a “fair coin”, we can argue by
symmetry that every sequence of n heads and tails has the same probability as any other.
We then use two fundamental rules of probability to calculate the probability as illustrated
below.
Rules of probability
1. In discrete probability, $0 \le p_i \le 1$ for each outcome i.
2. For discrete probability, $\sum_i p_i = 1$, where the sum is over all possible outcomes.
About Rule 1: $p_i = 0$ implies that the given outcome never happens, whereas $p_i = 1$
implies that this outcome is the only possibility (and always happens). Any value inside
the range (0,1) means that the outcome occurs some of the time. Rule 2 makes intuitive
sense: it means that we have accounted for all possibilities, i.e. the fractions corresponding
to all of the outcomes add up to 100% of the results.
In a case where there are M possible outcomes, all with equal probability, it follows
that $p_i = 1/M$ for every i.
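The relationship between empirical and theoretical probability can be explored with a small simulation. The sketch below is an illustration added here, not an experiment from the notes: it uses Python's pseudo-random number generator in place of a real coin, a fixed seed so that the run is reproducible, and a hypothetical function name `empirical_probability`. For five tosses of a fair coin there are $M = 2^5 = 32$ equally likely sequences, so the theoretical probability of HHTHH is 1/32.

```python
import random

def empirical_probability(target, n_repeats, seed=0):
    # Repeat the experiment "toss a fair coin len(target) times" n_repeats
    # times and return the fraction of repetitions that produced `target`.
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    hits = 0
    for _ in range(n_repeats):
        outcome = "".join(rng.choice("HT") for _ in range(len(target)))
        if outcome == target:
            hits += 1
    return hits / n_repeats

p_emp = empirical_probability("HHTHH", n_repeats=100_000)
p_theory = 1 / 2 ** 5   # 32 equally likely sequences of length 5

print(p_emp, p_theory)
```

With many repetitions the empirical value should settle near 1/32, roughly 0.031; with only 1000 repetitions, a count such as 25 out of 1000 is within the scatter one would expect.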
7.3.5 Random variables and probability distributions
A random variable is a numerical quantity X that depends on the outcome of an
experiment. For example, suppose we toss a coin n times, and let X be the number of heads
that appear. If, say, we toss the coin n = 4 times, then the number of heads, X, could take
on any of the values $\{x_i\} = \{0, 1, 2, 3, 4\}$ (i.e., no heads, one head, ..., four heads). In the
case of discrete probability there are a discrete number of possible values for the random
variable to take on.
We will be interested in the probability distribution of X. In general if the possible
values $x_i$ are listed in increasing order for $i = 0, \ldots, n$, we would like to characterize their
probabilities $p(x_i)$, where $p(x_i) = \mathrm{Prob}(X = x_i)$ [25].
Even though $p(x_i)$ is a discrete quantity taking on one of a discrete set of values,
we should still think of this mathematical object as a function: it associates a number
(the probability) p with each allowable value of the random variable $x_i$ for $i = 0, \ldots, n$.
In what follows, we will be interested in characterizing such functions, termed probability
distributions, and their properties.
[25] Read: $p(x_i)$ is the probability that the random variable X takes on the value $x_i$.
7.3.6 The cumulative distribution
Given a probability distribution, we can also define a cumulative function as follows:
The cumulative function corresponding to the probability distribution $p(x_i)$ is defined as
\[ F(x_i) = \mathrm{Prob}(X \le x_i). \]
For a given numerical outcome $x_i$, the value of $F(x_i)$ is hence
\[ F(x_i) = \sum_{j=0}^{i} p(x_j). \]
The function F merely sums up all the probabilities of outcomes up to and including $x_i$,
hence is called “cumulative”. This implies that $F(x_n) = 1$ where $x_n$ is the largest value
attainable by the random variable. For example, in the rolling of a die, if we list the possible
outcomes in ascending order as {1, 2, ..., 6}, then F(6) stands for the probability of rolling
a 6 or any lower value, which is clearly equal to 1 for a six-sided die.
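The cumulative function is just a running sum, which can be sketched for the fair die as follows. The variable names are choices made for this illustration, and exact fractions are used so that the final value is exactly 1.

```python
from fractions import Fraction
from itertools import accumulate

outcomes = [1, 2, 3, 4, 5, 6]
p = [Fraction(1, 6)] * 6     # fair die: each outcome has probability 1/6

# F[i] is the sum of p[0], ..., p[i], i.e. Prob(X <= outcomes[i]).
F = list(accumulate(p))

for x, Fx in zip(outcomes, F):
    print(x, Fx)
```

The last entry, F(6), equals 1, and F(3) equals 1/2, the probability of rolling a 3 or less.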
7.4 Examples of experimental data
7.4.1 Example 1: Tossing a coin
We illustrate ideas with an example of real data obtained by repeating an “experiment”
many times. The experiment, actually carried out by each of 121 students in this
calculus course, consisted of tossing a coin n = 10 times and recording the number, $x_i$, of
“Heads” that came up. Each student recorded one of eleven possible outcomes,
$x_i \in \{0, 1, 2, \ldots, 10\}$ (i.e. no heads, one, two, etc., up to ten heads out of the ten tosses). By
pooling together such data, we implicitly assume that all coins and all tossers are more
or less identical and unbiased, so the “experiment” has N = 121 replicates (one for each
student). Table 7.1 shows the result of this experiment. Here $n_i$ is the number of students
who got $x_i$ heads. We refer to this as the frequency of the given result. Also, $n_i/N$ is
the fraction of experiments that led to the given result, and we define the empirical
probability assigned to $x_i$ as this fraction, that is $p(x_i) = n_i/N$. In column (3) we display the
cumulative number of students who got any number up to and including $x_i$ heads, and then
in column (5) we compute the cumulative (empirical) probability $F(x_i)$.
In Figure 7.1 we show what this distribution looks like on a bar graph. The horizontal
axis is $x_i$, the number of heads obtained, and the vertical axis is $p(x_i)$. Because in this
example only discrete integer values (0, 1, 2, etc.) can be obtained in the experiment,
it makes sense to represent the data as discrete points, as shown on the bottom panel in
Fig. 7.1. We also show the cumulative function $F(x_i)$, superimposed as an xy-plot on a
graph of $p(x_i)$. Observe that F starts with the value 0 and climbs up to value 1, since the
probabilities of any of the events (0, 1, 2, etc. heads) must add up to 1.
Number of   Frequency (number   Cumulative             Empirical          Cumulative
heads       of students)        number                 probability        function
$x_i$       $n_i$               $\sum_{j=0}^{i} n_j$   $p(x_i) = n_i/N$   $F(x_i) = \sum_{j=0}^{i} p(x_j)$
0           0                   0                      0.00               0.00
1           1                   1                      0.0083             0.0083
2           2                   3                      0.0165             0.0248
3           10                  13                     0.0826             0.1074
4           27                  40                     0.2231             0.3306
5           26                  66                     0.2149             0.5455
6           34                  100                    0.2810             0.8264
7           14                  114                    0.1157             0.9421
8           7                   121                    0.0579             1.00
9           0                   121                    0.00               1.00
10          0                   121                    0.00               1.00
Table 7.1. Results of a real coin-tossing experiment carried out by 121 students
in this mathematics course. Each student tossed a coin 10 times. We recorded the
“frequency”, i.e. the number of students $n_i$ who each got $x_i = 0, 1, 2, \ldots, 10$ heads. The
fraction of the class that got each outcome, $n_i/N$, is identified with the (empirical)
probability of that outcome, $p(x_i)$. We also compute the cumulative function $F(x_i)$ in the last
column. See Figure 7.1 for the same data presented graphically.
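The derived columns of Table 7.1 follow mechanically from the frequency column. The sketch below (added for illustration; the variable names are assumptions) rebuilds the cumulative counts, the empirical probabilities and the cumulative function from the recorded frequencies.

```python
from itertools import accumulate

# Frequencies n_i for x_i = 0, 1, ..., 10 heads, copied from Table 7.1.
frequencies = [0, 1, 2, 10, 27, 26, 34, 14, 7, 0, 0]

N = sum(frequencies)                           # number of students
cumulative_counts = list(accumulate(frequencies))
p = [n / N for n in frequencies]               # empirical probabilities p(x_i)
F = [c / N for c in cumulative_counts]         # cumulative function F(x_i)

for x, n, c, px, Fx in zip(range(11), frequencies, cumulative_counts, p, F):
    print(f"{x:2d} {n:3d} {c:4d} {px:.4f} {Fx:.4f}")
```

The printed rows should match the table to the four decimal places shown there.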
7.4.2 Example 2: grade distributions
Another example of real data is provided in Appendix 11.6. There we discuss distributions
of grades on a test. Many of the ideas described here apply in the same way. Because of space
constraints, that example is provided in an Appendix, rather than here.
7.5 Mean and variance of a probability distribution
We next discuss some very important quantities related to the random variable. Such
quantities provide numerical descriptions of the average value of the random variable and the
fluctuations about that average. We define each of these as follows:
The mean (or average or expected value), $\bar{x}$, of a probability distribution is
\[ \bar{x} = \sum_{i=0}^{n} x_i\, p(x_i). \]
The expected value is a kind of “average value of x”, where values of x are weighted
by their frequency of occurrence. This idea is related to the concept of center of mass
defined in Section 5.3.1 (x positions weighted by masses associated with those positions).
[Figure 7.1 panels: “empirical probability of i heads in 10 tosses” versus number of heads (i),
axis ranges 0.0 to 10.0 and 0.0 to 0.4; “Cumulative function” versus number of heads (i),
axis ranges 0.0 to 10.0 and 0.0 to 1.0.]
Figure 7.1. The data from Table 7.1 is shown plotted on this graph. A total of
N = 121 people were asked to toss a coin n = 10 times. In the bar graph (left), the
horizontal axis reflects i, the number of heads (H) that came up during those 10 coin
tosses. The vertical axis reflects the fraction $p(x_i)$ of the class that achieved that particular
number of heads. In the lower graph, the same data is shown by the discrete points. We
also show the cumulative function that sums up the values from left to right. Note that the
cumulative function is a “step function”.
The mean is a point on the x axis, representing the “average” outcome of an experiment.
(Recall that in the distributions we are describing, the possible outcomes of some
observation or measurement process are depicted on the x axis of the graph.) The mean is not the
same as the average value of a function, discussed in Section 4.6. (In that case, the average
is an average y coordinate, or average height of the function.) [26]
We also define quantities that represent the width of the distribution. We define the
variance, V, and standard deviation, $\sigma$, as follows:
The variance, V, of a distribution is
\[ V = \sum_{i=0}^{n} (x_i - \bar{x})^2\, p(x_i), \]
where $\bar{x}$ is the mean. The standard deviation, $\sigma$, is
\[ \sigma = \sqrt{V}. \]
The variance is related to the square of the quantity represented on the x axis, and since
the standard deviation is its square root, $\sigma$ carries the same units as x. For this reason, it is
common to associate the value of $\sigma$ with a typical “width” of the distribution. Having a
low value of $\sigma$ means that most of the experimental results are close to the mean, whereas
a large $\sigma$ signifies that there is a large scatter of experimental values about the mean.
[26] Note to the instructor: students often mix these two distinct meanings of the word average,
and they should be helped to overcome this difficulty with terminology.
In the problem sets, we show that the variance can also be expressed in the form
\[ V = M_2 - \bar{x}^2, \]
where $M_2$ is the second moment of the distribution. Moments of a distribution are defined
as the values obtained by summing up products of the probability weighted by powers of
x.
The j'th moment, $M_j$, of a distribution is
\[ M_j = \sum_{i=0}^{n} (x_i)^j\, p(x_i). \]
Example 7.1 (Rolling a die) Suppose you toss a die, and let the random variable $X$ be
the number obtained on the die (i.e. 1 to 6). If this die is fair, then it is equally likely to get
any of the six possible outcomes, so each has probability 1/6. In this case
\[ x_i = i, \quad i = 1, 2, \ldots, 6, \qquad p(x_i) = 1/6. \]
We calculate the various quantities as follows: The mean is
\[ \bar{x} = \sum_{i=1}^{6} i \cdot \frac{1}{6} = \frac{1}{6} \cdot \frac{6 \cdot 7}{2} = \frac{7}{2} = 3.5. \]
The second moment, $M_2$, is
\[ M_2 = \sum_{i=1}^{6} i^2 \cdot \frac{1}{6} = \frac{1}{6} \cdot \frac{6 \cdot 7 \cdot 13}{6} = \frac{91}{6}. \]
We can now obtain the variance,
\[ V = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}, \]
and the standard deviation,
\[ \sigma = \sqrt{35/12} \approx 1.7078. \]
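The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch added for illustration, not part of the original notes; the helper name `moment` and the variable names are our own assumptions. Exact fractions are used so that the results can be compared directly with 7/2, 91/6 and 35/12.

```python
from fractions import Fraction

def moment(values, probs, j):
    """Return the j-th moment: the sum of x_i**j * p(x_i)."""
    return sum(Fraction(x) ** j * p for x, p in zip(values, probs))

# Fair six-sided die: outcomes 1..6, each with probability 1/6.
faces = list(range(1, 7))
probs = [Fraction(1, 6)] * 6

mean = moment(faces, probs, 1)            # first moment
second_moment = moment(faces, probs, 2)   # M_2
variance = second_moment - mean ** 2      # V = M_2 - mean^2
std_dev = float(variance) ** 0.5          # sigma = sqrt(V)

# Cross-check against the defining sum for the variance.
variance_direct = sum((x - mean) ** 2 * p for x, p in zip(faces, probs))

print(mean, second_moment, variance, round(std_dev, 4))
```

The same `moment` helper would work for any finite list of outcomes and probabilities, for instance the empirical distribution of the next example, provided the probabilities are supplied.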
Example 7.2 (Expected number of heads (empirical)) For the empirical probability
distribution shown in Figure 7.1, the mean (expected value) is calculated from results in
Table 7.1 as follows:
\[ \bar{x} = \sum_{k=0}^{10} x_k \, p(x_k) = 0(0) + 1(0.0083) + 2(0.0165) + \ldots + 8(0.0579) + 9(0) + 10(0) = 5.2149. \]
Thus, the mean number of heads in this set of experiments is about 5.2. This is close to
what we would expect intuitively for a fair coin, namely that, on average, 5 out of 10 tosses
(i.e. 50%) would result in heads. To compute the variance we form the sum
\[ V = \sum_{k=0}^{10} (x_k - \bar{x})^2 \, p(x_k) = \sum_{k=0}^{10} (k - 5.2149)^2 \, p(k). \]
Here we have used the mean calculated above and the fact that $x_k = k$. We obtain
\begin{align*}
V &= (0 - 5.2149)^2 (0) + (1 - 5.2149)^2 (0.0083) + \ldots + (7 - 5.2149)^2 (0.1157) \\
  &\quad + (8 - 5.2149)^2 (0.0579) + (9 - 5.2149)^2 (0) + (10 - 5.2149)^2 (0) = 2.053.
\end{align*}
(Because there was no replicate of the experiment that led to 9 or 10 heads out of 10
tosses, these values do not contribute to the calculation.) The standard deviation is then
\[ \sigma = \sqrt{V} = 1.4328. \]
7.6 Bernoulli trials
A Bernoulli trial is an experiment in which there are two possible outcomes. A typical
example, motivated previously, is tossing a coin (the outcome being H or T). Traditionally,
we refer to one of the outcomes of a Bernoulli trial as “success”, S, and the other as “failure”$^{27}$,
F.
Let p be the probability of success and q = 1 − p the probability of failure in a
Bernoulli trial. We now consider how to calculate the probability of some number of
“successes” in a set of repetitions of a Bernoulli trial. In short, we are interested in the
probability of tossing some number of Heads in n coin tosses.
7.6.1 The Binomial distribution
Suppose we repeat a Bernoulli trial n times; we will assume that each trial is identical and
independent of the others. This implies that the probability p of success and q of failure
is the same in each trial. Let X be the number of successes. Then X is said to have a
Binomial distribution with parameters n and p.
Let us consider how to calculate the probability distribution of X, i.e. the probability
that X = k where k is some number of successes between none (k = 0) and all (k = n).
Recall that the notation for this probability is Prob(X = k) for k = 0, 1, . . . , n. Also note
that
X = k means that in the n trials there are k successes and n − k failures. Consider
the following example for the case of n = 3, where we list all possible outcomes and their
probabilities:
In constructing Table 7.2, we use a multiplication principle applied to computing
the probability of a compound experiment. We state this, together with a useful addition
principle below.
27
For example “Heads you win, Tails you lose”.
Result    Probability    Number of successes (heads)
SSS       $p^3$          X = 3
SSF       $p^2 q$        X = 2
SFS       $p^2 q$        X = 2
SFF       $p q^2$        X = 1
FSS       $p^2 q$        X = 2
FSF       $p q^2$        X = 1
FFS       $p q^2$        X = 1
FFF       $q^3$          X = 0
Table 7.2. A list of all possible results of three repetitions (n = 3) of a Bernoulli
trial. S = “success” and F = “failure”. (Substituting H for S, and T for F gives the same
results for a coin tossing experiment repeated 3 times.)
Multiplication principle: if $e_1, \ldots, e_k$ are independent events, then
\[ \text{Prob}(e_1 \text{ and } e_2 \text{ and } \ldots \text{ and } e_k) = \text{Prob}(e_1)\,\text{Prob}(e_2) \cdots \text{Prob}(e_k). \]
Addition principle: if $e_1, \ldots, e_k$ are mutually exclusive events, then
\[ \text{Prob}(e_1 \text{ or } e_2 \text{ or } \ldots \text{ or } e_k) = \text{Prob}(e_1) + \text{Prob}(e_2) + \ldots + \text{Prob}(e_k). \]
Based on the results in Table 7.2 and on the two principles outlined above, we can compute
the probability of obtaining 0, 1, 2, or 3 successes out of 3 trials. The results are shown
in Table 7.3. In constructing Table 7.3, we have considered all the ways of obtaining 0
successes (there is only one such way, namely FFF, and its probability is $q^3$), all the ways
of obtaining only one success (here we must allow for SFF, FSF, FFS, each having the
same probability $pq^2$), etc. Since these results are mutually exclusive (only one such result
is possible for any given replicate of the 3-trial experiment), the addition principle is used
to compute the probability Prob(SFF or FSF or FFS) = $3pq^2$.
Probability of X successes (heads)
Prob(X = 0) = $q^3$
Prob(X = 1) = $3pq^2$
Prob(X = 2) = $3p^2 q$
Prob(X = 3) = $p^3$
Table 7.3. The probability of obtaining X successes out of 3 Bernoulli trials,
based on results in Table 7.2 and the addition principle of probability.
In general, for each replicate of an experiment consisting of $n$ Bernoulli trials, the
probability of an outcome that has $k$ successes and $n - k$ failures (in some specific order)
is $p^k q^{n-k}$. To get the total probability of $X = k$, we need to count how many possible
outcomes consist of $k$ successes and $n - k$ failures. As illustrated by the above example,
there are, in general, many such ways, since the order in which S and F appear can differ
from one outcome to another. In mathematical terminology, there can be many permutations
(i.e. arrangements of the order) of S and F that have the same number of successes
in total. (See Section 11.8 for a review.) In fact, the number of ways that $n$ trials can lead
to $k$ successes is $C(n, k)$, the binomial coefficient, which is, by definition, the number of
ways of choosing $k$ objects out of a collection of $n$ objects. That binomial coefficient is
\[ C(n, k) = (n \text{ choose } k) = \frac{n!}{(n-k)!\,k!}. \]
(See Section 11.7 for the definition of factorial notation “!” used here.) We have arrived at
the following result for $n$ Bernoulli trials:
The probability of $k$ successes in $n$ Bernoulli trials is
\[ \text{Prob}(X = k) = C(n, k)\, p^k q^{n-k}. \]
In the above example, with $n = 3$, we find that
\[ \text{Prob}(X = 2) = C(3, 2)\, p^2 q = 3p^2 q. \]
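The counting argument can be checked by brute force. The sketch below is our own illustration (the function names `binomial_pmf` and `enumerate_pmf`, and the value p = 1/3, are assumptions chosen for the example). It lists every sequence of S and F of length n, applies the multiplication and addition principles, and compares the result with the formula $C(n, k)\, p^k q^{n-k}$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_pmf(n, k, p):
    """Prob(X = k) = C(n, k) p^k q^(n-k) with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p ** k * q ** (n - k)

def enumerate_pmf(n, p):
    """Build the distribution by listing every S/F sequence of length n."""
    q = 1 - p
    totals = [Fraction(0)] * (n + 1)
    for outcome in product("SF", repeat=n):
        k = outcome.count("S")
        # Multiplication principle: independent trials multiply.
        prob = p ** k * q ** (n - k)
        # Addition principle: distinct sequences are mutually exclusive.
        totals[k] += prob
    return totals

p = Fraction(1, 3)   # illustrative value, not taken from the text
n = 3
by_listing = enumerate_pmf(n, p)
by_formula = [binomial_pmf(n, k, p) for k in range(n + 1)]
print(by_listing == by_formula)
```

Listing all $2^n$ sequences is only practical for small n; the formula is what makes larger cases tractable.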
7.6.2 The Binomial theorem
The name binomial coefficient comes from the binomial theorem, which accounts for the
expression obtained by expanding a binomial:
\[ (a + b)^n = \sum_{k=0}^{n} C(n, k)\, a^k b^{n-k}. \]
Let us consider a few examples. A familiar example is
\[ (a + b)^2 = (a + b) \cdot (a + b) = a^2 + ab + ba + b^2 = a^2 + 2ab + b^2. \]
The coefficients $C(2, 2) = 1$, $C(2, 1) = 2$, and $C(2, 0) = 1$ appear in front of the three
terms, representing, respectively, the number of ways of choosing 2 a’s, 1 a, and no a’s out
of the $n = 2$ factors of $(a + b)$. [Respectively, these account for the terms $a^2$, $ab$ and $b^2$ in the
resulting expansion.] Similarly, the product of three factors is
\[ (a + b)^3 = (a + b) \cdot (a + b) \cdot (a + b) = a^3 + 3a^2 b + 3ab^2 + b^3, \]
whereby coefficients are of the form $C(3, k)$ for $k = 3, 2, 1, 0$. More generally, expanding
a product of $n$ factors leads to
\begin{align*}
(a + b)^n &= a^n + C(n, 1)\, a^{n-1} b + C(n, 2)\, a^{n-2} b^2 + \ldots + C(n, k)\, a^k b^{n-k} + \ldots \\
          &\quad + C(n, n-2)\, a^2 b^{n-2} + C(n, n-1)\, a b^{n-1} + b^n \\
          &= \sum_{k=0}^{n} C(n, k)\, a^k b^{n-k}.
\end{align*}
          1
        1   1
      1   2   1
    1   3   3   1
  1   4   6   4   1
1   5  10  10   5   1
Table 7.4. Pascal’s triangle contains the binomial coefficients $C(n, k)$.
Each term in Pascal’s triangle is obtained by adding the two diagonally above it. The top
of the triangle represents $C(0, 0)$. The next row represents $C(1, 0)$ and $C(1, 1)$. For row
number $n$, terms along the row are the binomial coefficients $C(n, k)$, starting with $k = 0$
at the beginning of the row and going to $k = n$ at the end of the row.
The binomial coefficients are symmetric, so that $C(n, k) = C(n, n - k)$. They are entries
that occur in Pascal’s triangle, shown in Table 7.4.
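The addition rule for Pascal’s triangle is easy to turn into a short program. The following sketch is our own illustration (the function name `pascal_rows` is an assumption). It builds the first six rows by adding neighbouring entries, so the rows can be compared with the factorial formula for $C(n, k)$ and with the symmetry property.

```python
from math import comb

def pascal_rows(num_rows):
    """Return the first num_rows rows of Pascal's triangle (row 0 first).

    Assumes num_rows >= 1.
    """
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Interior entries are sums of the two entries diagonally above.
        interior = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + interior + [1])
    return rows

rows = pascal_rows(6)
for row in rows:
    # Center each row to imitate the triangular layout.
    print(" ".join(str(c) for c in row).center(20))
```

Each entry produced this way agrees with `math.comb(n, k)`, which evaluates $n!/((n-k)!\,k!)$ directly.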
7.6.3 The binomial distribution
[Figure 7.2 panels: “The binomial distribution” with p = 1/2, q = 1/2, and “The binomial distribution” with p = 1/4, q = 3/4.]
Figure 7.2. The binomial distribution is shown here for $n = 10$. We have plotted
Prob($X = k$) versus $k$ for $k = 0, 1, \ldots, 10$. This distribution is the same as the probability
of getting $X$ heads out of 10 coin tosses for a fair coin. In the first panel, the probability
of success and failure are the same, i.e. $p = q = 0.5$. The distribution is then symmetric.
In the second panel, the probability of success is $p = 1/4$, so $q = 3/4$ and the resulting
distribution is skewed.
What does the binomial theorem say about the binomial distribution? First, since
there are only two possible outcomes in each Bernoulli trial, it follows that
\[ p + q = 1, \quad \text{and hence} \quad (p + q)^n = 1. \]
Using the binomial theorem, we can expand the latter to obtain
\[ (p + q)^n = \sum_{k=0}^{n} C(n, k)\, p^k q^{n-k} = \sum_{k=0}^{n} \text{Prob}(X = k) = 1. \]
That is, the sum of these terms represents the sum of probabilities of obtaining
$k = 0, 1, \ldots, n$ successes. (And since this accounts for all possibilities, it follows that the sum
adds up to 1.)
We can compute the mean and variance of the binomial distribution using the following
trick. We will write out an expansion for a product of the form $(px + q)^n$. Here $x$ will
be an abstract quantity introduced for convenience (i.e., for making the trick work):
\[ (px + q)^n = \sum_{k=0}^{n} C(n, k)\, (px)^k q^{n-k} = \sum_{k=0}^{n} C(n, k)\, p^k q^{n-k} x^k. \]
Taking the derivative of the above with respect to $x$ leads to
\[ n(px + q)^{n-1} \cdot p = \sum_{k=0}^{n} C(n, k)\, p^k q^{n-k}\, k x^{k-1}, \]
which (plugging in $x = 1$ and using $p + q = 1$) implies that
\[ np = \sum_{k=0}^{n} k \cdot C(n, k)\, p^k q^{n-k} = \sum_{k=0}^{n} k \cdot \text{Prob}(X = k) = \bar{X}. \qquad (7.1) \]
Thus, we have found that
The mean of the binomial distribution is $\bar{X} = np$, where $n$ is the number of trials and $p$ is
the probability of success in one trial.
We continue to compute other quantities of interest. Multiply both sides of the
differentiated expansion (the equation just before Eqn. 7.1, before setting $x = 1$)
by $x$ to obtain
\[ nx(px + q)^{n-1} p = \sum_{k=0}^{n} C(n, k)\, p^k q^{n-k}\, k x^k. \]
Take the derivative again. The result is
\[ n(px + q)^{n-1} p + n(n-1)x(px + q)^{n-2} p^2 = \sum_{k=0}^{n} C(n, k)\, p^k q^{n-k}\, k^2 x^{k-1}. \]
Plug in $x = 1$ to get
\[ np + n(n-1)p^2 = \sum_{k=0}^{n} k^2\, C(n, k)\, p^k q^{n-k} = M_2. \]
Thereby we have calculated the second moment of the distribution, from which the variance
and the standard deviation follow. In summary, we found the following results:
The second moment $M_2$, the variance $V$, and the standard deviation $\sigma$ of a binomial
distribution are
\begin{align*}
M_2 &= np + n^2 p^2 - np^2, \\
V &= M_2 - \bar{X}^2 = np - np^2 = np(1 - p) = npq, \\
\sigma &= \sqrt{npq}.
\end{align*}
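These closed forms can be compared with the defining sums for a particular case. The sketch below is our own illustration (the function name `binomial_distribution` is an assumption, and the values $n = 10$, $p = 1/4$ are chosen to match the second panel of Figure 7.2). It evaluates the mean, second moment and variance directly from the probabilities, using exact fractions.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, p):
    """Return the list [Prob(X = 0), ..., Prob(X = n)]."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

n, p = 10, Fraction(1, 4)
probs = binomial_distribution(n, p)

# Defining sums for the mean and the second moment.
mean = sum(k * pk for k, pk in enumerate(probs))
second_moment = sum(k ** 2 * pk for k, pk in enumerate(probs))
variance = second_moment - mean ** 2   # V = M_2 - mean^2

print(mean, second_moment, variance)
```

For these values the sums should reproduce $np$, $np + n(n-1)p^2$ and $npq$ exactly, since no rounding is involved.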
7.6.4 The normalized binomial distribution
We can “normalize” (i.e. rescale) the binomial random variable so that it has a convenient
mean and width. To do so, define the new random variable $\tilde{X}$ to be $\tilde{X} = X - \bar{X}$. Then $\tilde{X}$
has mean 0 and standard deviation $\sigma$. Now define
\[ Z = \frac{X - \bar{X}}{\sigma}. \]
Then $Z$ has mean 0 and standard deviation 1. In the limit as $n \to \infty$, we can approximate
$Z$ with a continuous distribution, called the standard normal distribution.
[Figure 7.3 panel: “The Normal distribution”.]
Figure 7.3. The Normal (or Gaussian) distribution is given by equation (7.2) and
has the distribution shown in this figure.
As the number of Bernoulli trials grows, i.e. as we toss our imaginary coin in longer
and longer sets ($n \to \infty$), a remarkable thing happens to the binomial distribution: it
becomes smoother and smoother, until it grows to resemble a continuous distribution that
looks like a “Bell curve”. That curve is known as the Gaussian or Normal distribution. If
we scale this curve vertically and horizontally (stretch vertically and compress horizontally
by the factor $\sqrt{n}/2$, which is the standard deviation $\sigma = \sqrt{npq}$ when $p = q = 1/2$)
and shift its peak to $x = 0$, then we find a distribution that describes
the deviation from the expected value of 50% heads. The resulting function is of the form
\[ p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \qquad (7.2) \]
We will study properties of this and other continuous distributions in a later
section. We show a typical example of the Normal distribution in Figure 7.3. Its cumulative
distribution is then shown (without and with the original distribution superimposed) in
Figure 7.4.
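The approach of the rescaled binomial distribution to the curve (7.2) can be explored numerically. The sketch below is our own illustration under stated assumptions: a fair coin ($p = q = 1/2$), the arbitrary choice $n = 1000$, and function names (`normal_density`, `scaled_binomial`) that are not from the notes. Each bar of the binomial distribution is stretched vertically by $\sigma$ and placed at $z = (k - \bar{X})/\sigma$, then compared with the normal density at the same point.

```python
from fractions import Fraction
from math import comb, exp, pi, sqrt

def normal_density(x):
    """Standard normal density, Eqn. (7.2)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def scaled_binomial(n, k):
    """Return (z, sigma * Prob(X = k)) for a fair coin, p = q = 1/2."""
    sigma = sqrt(n) / 2                          # sqrt(n p q) with p = q = 1/2
    prob = float(Fraction(comb(n, k), 2 ** n))   # exact ratio, then rounded
    z = (k - n / 2) / sigma                      # Z = (X - mean) / sigma
    return z, sigma * prob

n = 1000   # illustrative choice of a "large" n
max_gap = max(
    abs(height - normal_density(z))
    for z, height in (scaled_binomial(n, k) for k in range(n + 1))
)
print(max_gap)
```

Repeating the comparison with larger values of `n` should show the largest gap shrinking, which is the sense in which the binomial distribution “grows to resemble” the Normal distribution.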
[Figure 7.4 panels: “The cumulative distribution” alone, and “The cumulative distribution” with “The normal distribution” superimposed.]
Figure 7.4. The Normal probability density with its corresponding cumulative function.
7.7 Hardy-Weinberg genetics
In this section, we investigate how the ideas developed in this chapter apply to genetics.
We ﬁnd that many of the simple concepts presented here will be useful in calculating the
probability of inheriting genes from one generation to the next.
Each of us has two entire sets of chromosomes: one set is inherited from our mother,
and one set comes from our father. These chromosomes carry genes, the unit of genetic
material that “codes” for proteins and ultimately, through complicated biochemistry and
molecular biology, determines all of our physical traits.
We will investigate how a single gene (with two “ﬂavors”, called alleles) is passed
from one generation to the next. We will consider a particularly simple situation, when the
single gene determines some physical trait (such as eye color). The trait (say blue or green
eyes) will be denoted the phenotype and the actual pair of genes (one on each parentally
derived chromosome) will be called the genotype.
Suppose that the gene for eye color comes in two forms that will be referred to as
A and a. For example, A might be an allele for blue eyes, whereas a could be an allele
for brown eyes. Consider the following “experiment”: select a random individual from the
population of interest, and examine the region in one of their chromosomes determining
eye colour. Then there are two possible mutually exclusive outcomes, A or a; according to
our previous deﬁnition, the experiment just described is a Bernoulli trial.
The actual eye color phenotype will depend on both inherited alleles, and hence, we
are interested in a “repeated Bernoulli trial” with n = 2. In principle, each chromosome
will come with one or the other allele, so each individual would have one of the following
pairs of combinations AA, Aa, aA, or aa. The orders Aa and aA are equivalent, so only the
number of alleles of type A (or equivalently of type a) is important.
Genotype:      aA      AA       aa       Aa
Probability:   $pq$    $p^2$    $q^2$    $pq$

Genotype:      aA or Aa    AA       aa
Probability:   $2pq$       $p^2$    $q^2$
Table 7.5. If the probability of finding allele A is $p$ and the probability of finding
allele a is $q$, then the eye color gene probabilities are as shown in the top table. However,
because genotype Aa is equivalent to genotype aA, we have combined these outcomes in
the revised second table.
Suppose we know that the fraction of all genes for eye color of type A in the population
is $p$, and the fraction of all genes for eye color of type a is $q$, where $p + q = 1$. (We
have used the fact that there are only two possibilities for the gene type, of course.) Then
we can interpret $p$ and $q$ as probabilities that a gene selected at random from the population
will turn out to be type A (respectively a), i.e., Prob(A) = $p$, Prob(a) = $q$.
Now suppose we draw at random two alleles out of the (large) population. If the
population size is $N$, then, on average, we would expect $Np^2$ individuals of type AA, $Nq^2$
of type aa and $2Npq$ individuals of the mixed type. Note that the sum of the probabilities
of all the genotypes is
\[ p^2 + 2pq + q^2 = (p + q)^2 = 1. \]
(We have seen this before in the discussion of Bernoulli trials, and in the definition of
properties of probability.)
7.7.1 Random non-assortative mating
We now examine what happens if mates are chosen randomly and offspring arise from
such parents. The father and mother each pass down one or another copy of their alleles to
the progeny. We investigate how the proportion of genes of various types is arranged, and
whether it changes in the next generation. In Table 7.6, we show the possible genotypes of
the mother and father, and calculate the probability that mating of such individuals would
occur under the assumption that choice of mate is random, i.e., does not depend at all
on “eye color”. We assume that the allele donated by the father (carried in his sperm) is
independent of the allele found in the mother’s egg cell$^{28}$. This means that we can use the
multiplicative property of probability to determine the probability of a given combination
of parental alleles (i.e. Prob(x and y) = Prob(x)·Prob(y)).
For example, the probability that a couple chosen at random will consist of a woman
of genotype aA and a man of genotype AA is a product of the fraction of females that are of
type aA and the fraction of males that are of type AA. But that is just $(2pq)(p^2)$, or simply
$2p^3 q$. Now let us examine the distribution of possible offspring of various parents.
28
Recall that the sperm and the egg each have one single set of chromosomes, and their union produces the
zygote that carries the doubled set of chromosomes.
In Table 7.6, we note, for example, that if the couple are both of type aA, each parent
can “donate” either a or A to the progeny, so we expect to see children of types aa, aA, AA
in the ratio 1:2:1 (regardless of the values of p and q).
We can now group together and summarize all the progeny of a given genotype, with
the probabilities that they are produced by one or another such random mating. Using this
table, we can then determine the probability of each of the three genotypes in the next
generation.
                        Mother: AA                Mother: aA                       Mother: aa
                        $p^2$                     $2pq$                            $q^2$
Father: AA  ($p^2$)     AA                        1/2 aA, 1/2 AA                   Aa
                        $p^4$                     $2pq \cdot p^2$                  $p^2 q^2$
Father: aA  ($2pq$)     1/2 aA, 1/2 AA            1/4 aa, 1/2 aA, 1/4 AA           1/2 aa, 1/2 Aa
                        $2pq \cdot p^2$           $4p^2 q^2$                       $2pq \cdot q^2$
Father: aa  ($q^2$)     Aa                        1/2 aA, 1/2 aa                   aa
                        $p^2 q^2$                 $2pq \cdot q^2$                  $q^4$
Table 7.6. The frequency of progeny of various types in Hardy-Weinberg genetics
can be calculated as shown in this “mating table”. The genotype of the mother is shown
across the top and the father’s genotype is shown on the left column. The various progeny
resulting from mating are shown in the upper line of each entry. The probabilities of the given
progeny are directly under those entries. (We did not simplify the expressions; this is to
emphasize that they are products of the original parental probabilities.)
Example 7.3 (Probability of AA progeny) Find the probability that a random (Hardy-Weinberg)
mating will give rise to a progeny of type AA.
Solution 1
Using Table 7.6, we see that there are only four ways that a child of type AA can result
from a mating: either both parents are AA, or one or the other parent is Aa (and the other is AA),
or both parents are Aa. Thus, for children of type AA the probability is
\[ \text{Prob(child of type AA)} = p^4 + \frac{1}{2}(2pq \cdot p^2) + \frac{1}{2}(2pq \cdot p^2) + \frac{1}{4}(4p^2 q^2). \]
Simplifying leads to
\[ \text{Prob(child of type AA)} = p^2 (p^2 + 2qp + q^2) = p^2 (p + q)^2 = p^2. \]
In the problem set, we also find that the probability of a child of type aA is $2qp$, and the
probability of the child being type aa is $q^2$. We thus observe that the frequency of genotypes
of the progeny is exactly the same as that of the parents. This type of genetic makeup is
termed Hardy-Weinberg genetics.
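The bookkeeping behind the mating table can also be carried out by a short program. The sketch below is our own illustration (the names `genotype_freqs`, `PASS_A` and `next_generation`, and the value $p = 3/10$, are assumptions). It sums over every mother/father pair, weights each pair by the product of the parental genotype probabilities, and uses the chance that each parent passes on allele A to obtain the offspring genotype probabilities.

```python
from fractions import Fraction
from itertools import product

def genotype_freqs(p):
    """Parental genotype probabilities: AA, Aa (either order), aa."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

# Probability that a parent of each genotype passes on allele A.
PASS_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}

def next_generation(p):
    """Sum over all mother/father pairs, as in the mating table."""
    parents = genotype_freqs(p)
    child = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for mother, father in product(parents, repeat=2):
        pair_prob = parents[mother] * parents[father]   # random mating
        a_m, a_f = PASS_A[mother], PASS_A[father]
        child["AA"] += pair_prob * a_m * a_f
        child["aa"] += pair_prob * (1 - a_m) * (1 - a_f)
        child["Aa"] += pair_prob * (a_m * (1 - a_f) + (1 - a_m) * a_f)
    return child

p = Fraction(3, 10)   # illustrative allele frequency, not from the text
offspring = next_generation(p)
print(offspring)
```

With exact fractions, the offspring probabilities can be compared directly with $p^2$, $2pq$ and $q^2$ for any allele frequency one cares to try.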
Alternate solution
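[Figure 7.5: tree diagram. The child AA receives one allele A from the mother and one from the
father. Each parent is either Aa (probability $2pq$, passing on A with probability 1/2) or AA
(probability $p^2$, passing on A with probability 1), so each side of the tree contributes $(pq + p^2)$.]
Figure 7.5. A tree diagram to aid the calculation of the probability that a child
with genotype AA results from random non-assortative (Hardy-Weinberg) mating.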
In Figure 7.5, we show an alternate solution to the same problem using a tree diagram.
Reading from the top down, we examine all the possibilities at each branch point.
A child AA cannot have any parent of genotype aa, so both father and mother’s genotype
could only have been one of AA or Aa. Each arrow indicating the given case is accompanied
by the probability of that event. (For example, a random individual has probability
$2pq$ of having genotype Aa, as shown on the arrows from the father and mother to these
genotypes.) Continuing down the branches, we ask with what probability the given parent
would have contributed an allele of type A to the child. For a parent of type AA, this is
certainly true, so the given branch carries probability 1. For a parent of type Aa, the probability
that A is passed down to the child is only 1/2. The combined probability is computed
as follows: we determine the probability of getting an A from the father (of type AA OR Aa).
This is Prob(A from father) $= \frac{1}{2}(2pq) + 1 \cdot p^2 = pq + p^2$, and we multiply it by a similar
probability of getting A from the mother (of type AA OR Aa). (We must multiply, since
we need A from the father AND A from the mother for the genotype AA.) Thus,
\[ \text{Prob(child of type AA)} = (pq + p^2)(pq + p^2) = p^2 (q + p)^2 = p^2 \cdot 1 = p^2. \]
It is of interest to investigate what happens when one of the assumptions we made is
relaxed, for example, when the genotype of the individual has an impact on survival or on
the ability to reproduce. While this is beyond our scope here, it forms an important theme
in the area of genetics.
7.8 Random walker
In this section we discuss an application of the binomial distribution to the process of a
random walk. As shown in Figure 7.6(a), we consider a straight (1 dimensional) path and an
erratic walker who takes steps randomly to the left or right. We will assume that the walker
never stops. With probability $p$, she takes a step towards the right, and with probability $q$
she takes a step towards the left. (Since these are the only two choices, it must be true that
$p + q = 1$.) In Figure 7.6(b) we show the walker’s position, $x$, plotted versus the number of
steps ($n$) she has taken. (We may as well assume that the steps occur at regular intervals of
time, so that the horizontal axis of this plot can be thought of as a time axis.)
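[Figure 7.6: panel (a) shows the x axis with positions −1, 0, 1 and steps labelled p and q; panel (b) plots x against n.]
Figure 7.6. A random walker in 1 dimension takes a step to the right with probability
$p$ and a step to the left with probability $q$.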
The process described here is classic, and often attributed to a drunken wanderer. In
our case, we could consider this motion as a 1D simplification of the random tumbles and
swims of a bacterium in its turbulent environment. It is usually the case that a goal of this
swim is a search for some nutrient source, or possibly avoidance of poor environmental
conditions. We shall see that if the probabilities of left and right motion are unequal (i.e.
the motion is biased in one direction or another) this swimmer tends to drift along towards
a preferred direction.
In this problem, each step has only two outcomes (analogous to a trial in a Bernoulli
experiment). We could imagine the walker tossing a coin to determine whether to move
right or left. We wish to characterize the probability of the walker being at a certain position
at a given time, and to find her expected position after $n$ steps. Our familiarity with
Bernoulli trials and the binomial distribution will prove useful in this context.
Example
(a) What is the probability of a run of steps as follows: RLRRRLRLLL?
(b) Find the probability that the walker moves $k$ steps to the right out of a total run of $n$
consecutive steps.
(c) Suppose that $p = q = 1/2$. What is the probability that a walker starting at the origin
returns to the origin on her 10th step?
Solution
(a) The probability of the run RLRRRLRLLL is the product $pqpppqpqqq = p^5 q^5$. Note
the similarity to the question “What is the probability of tossing HTHHHTHTTT?”
(b) This problem is identical to the problem of $k$ heads in $n$ tosses of a coin. The probability
of such an event is given by a term in the binomial distribution:
\[ P(k \text{ out of } n \text{ moves to right}) = C(n, k)\, p^k q^{n-k}. \]
(c) The walker returns to the origin after 10 steps only if she has taken 5 steps to the left
(total) and 5 steps to the right (total). The order of the steps does not matter. Thus
this problem reduces to problem (b) with 5 steps out of 10 taken to the right. The
probability is thus
\begin{align*}
P(\text{back at 0 after 10 steps}) &= P(5 \text{ out of } 10 \text{ steps to right}) \\
&= C(10, 5)\, p^5 q^5 = C(10, 5) \left(\frac{1}{2}\right)^{10} = \frac{10!}{5!\,5!} \cdot \frac{1}{1024} \approx 0.24609.
\end{align*}
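Part (c) can be double-checked by listing all $2^{10}$ possible walks. The sketch below is our own illustration (the function name `return_probability` is an assumption). It evaluates the binomial formula and, independently, counts the walks of 10 steps of size ±1 that end at the origin.

```python
from fractions import Fraction
from itertools import product
from math import comb

def return_probability(n, p):
    """Probability of being back at the origin after n steps."""
    if n % 2:                 # an odd number of steps cannot cancel out
        return Fraction(0)
    q = 1 - p
    k = n // 2                # need equally many right and left steps
    return comb(n, k) * p ** k * q ** k

# Brute-force check for p = q = 1/2: count the paths that end at 0.
paths_home = sum(1 for steps in product((+1, -1), repeat=10) if sum(steps) == 0)
prob_formula = return_probability(10, Fraction(1, 2))
prob_count = Fraction(paths_home, 2 ** 10)
print(paths_home, prob_formula, float(prob_formula))
```

When $p = q = 1/2$ every path is equally likely, which is why simple counting suffices; for a biased walker the paths would have to be weighted by $p^k q^{n-k}$.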
Mean position
We now ask how to determine the expected position of the walker after $n$ steps, i.e. how
the mean value of $x$ depends on the number of steps and the probabilities associated with
each step. After 1 step, with probability $p$ the position is $x = +1$ and with probability $q$,
the position is $x = -1$. The expected (mean) position after 1 move is thus
\[ x_1 = p(+1) + q(-1) = p - q. \]
But the process follows a binomial distribution: if $k$ of the $n$ steps are to the right, the
position is $x = k - (n - k) = 2k - n$, and the mean number of steps to the right is $np$. Since
$p + q = 1$, the mean after $n$ steps is thus
\[ x_n = 2np - n = n(2p - 1) = n(p - q). \]
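The drift formula can be compared with a direct average over the binomial distribution. The sketch below is our own illustration (the function name `expected_position` and the values $n = 12$, $p = 2/3$ are assumptions). It weights each possible final position $2k - n$ by the probability of taking $k$ steps to the right.

```python
from fractions import Fraction
from math import comb

def expected_position(n, p):
    """Average of x = k - (n - k) over the binomial distribution of k."""
    q = 1 - p
    return sum(
        (2 * k - n) * comb(n, k) * p ** k * q ** (n - k)
        for k in range(n + 1)
    )

n, p = 12, Fraction(2, 3)   # illustrative values, not from the text
q = 1 - p
drift = expected_position(n, p)
print(drift, n * (p - q))
```

For an unbiased walker ($p = q$) the same function returns 0: on average she goes nowhere, even though any single walk typically ends away from the origin.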
7.9 Summary
In this chapter, we introduced the notion of discrete probability of elementary events. We
learned that a probability is always a number between 0 and 1, and that the sum of (discrete)
probabilities of all possible (discrete) outcomes is 1. We then described how to
combine probabilities of elementary events to calculate probabilities of compound independent
events in a variety of simple experiments. We defined the notion of a Bernoulli
trial, such as tossing of a coin, and studied this in detail.
We investigated a number of ways of describing results of experiments, whether in
tabular or graphical form, and we used the distribution of results to deﬁne simple numerical
descriptors. The mean is a number that, more or less, describes the location of the “center”
of the distribution (analogous to center of mass), deﬁned as follows:
The mean (expected value) $\bar{x}$ of a probability distribution is
\[ \bar{x} = \sum_{i=0}^{n} x_i \, p(x_i). \]
The standard deviation is, roughly speaking, the “width” of the distribution.
The standard deviation, $\sigma$, is
\[ \sigma = \sqrt{V}, \]
where $V$ is the variance,
\[ V = \sum_{i=0}^{n} (x_i - \bar{x})^2 \, p(x_i). \]
While the chapter was motivated by results of a real experiment, we then investigated
theoretical distributions, including the binomial. We found that the distribution of events in
a repetition of a Bernoulli trial (e.g. coin tossed n times) was a binomial distribution, and
we computed the mean of that distribution.
Suppose that the probability of one of the events, say event $e_1$, in a Bernoulli trial is $p$ (and
hence the probability of the other event $e_2$ is $q = 1 - p$). Then
\[ P(k \text{ occurrences of given event out of } n \text{ trials}) = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}. \]
This is called the binomial distribution. The mean of the binomial distribution, i.e. the
mean number of events $e_1$ in $n$ repeated Bernoulli trials, is
\[ \bar{x} = np. \]
Chapter 8
Continuous probability distributions
8.1 Introduction
In Chapter 7, we explored the concepts of probability in a discrete setting, where outcomes
of an experiment can take on only one of a ﬁnite set of values. Here we extend these
ideas to continuous probability. In doing so, we will see that quantities such as mean and
variance that were previously deﬁned by sums will now become deﬁnite integrals. Here
again, we will see the concepts of integral calculus in the context of practical examples and
applications.
We begin by extending the idea of a discrete random variable to the continuous case.
We call x a continuous random variable in a ≤ x ≤ b if x can take on any value in this
interval. An example of a random variable is the height of a person, say an adult male,
selected randomly from a population. (This height typically takes on values in the range
0.5 ≤ x ≤ 3 meters, say, so a = 0.5 and b = 3.)
If we select a male subject at random from a large population, and measure his height,
we might expect to get a result in the proximity of 1.7-1.8 meters most often; thus, such
heights will be associated with a larger value of probability than heights in some other
interval of equal length, e.g. heights in the range 2.7 < x < 2.8 meters, say. Unlike
the case of discrete probability, however, the measured height can take on any real number
within the interval of interest. This leads us to redefine our idea of a continuous probability,
using a continuous function in place of the discrete bar graph seen in Chapter 7.
8.2 Basic deﬁnitions and properties
Here we extend previous definitions from Chapter 7 to the case of continuous probability.
One of the most important differences is that we now consider a probability density, rather
than a value of the probability per se$^{29}$. First and foremost, we observe that now $p(x)$
will no longer be a probability, but rather “a probability per unit $x$”. This idea is analogous
to the connection between the mass of discrete beads and a continuous mass density,
encountered previously in Chapter 5.
29
This leap from discrete values that are the probability of an outcome (as seen in Chapter 7) to a probability
density is challenging for many students. Reinforcing the analogy with discrete masses versus distributed mass
density (discussed in Chapter 5) may be helpful.
Definition
A function $p(x)$ is a probability density provided it satisfies the following properties:
1. $p(x) \ge 0$ for all $x$.
2. $\int_a^b p(x)\, dx = 1$, where the possible range of values of $x$ is $a \le x \le b$.
The probability that a random variable $x$ takes on values in the interval $a_1 \le x \le a_2$
is defined as
\[ \int_{a_1}^{a_2} p(x)\, dx. \]
The transition to probability density means that the quantity $p(x)$ does not carry the same
meaning as our previous notation for probability of an outcome $x_i$, namely $p(x_i)$ in the
discrete case. In fact, $p(x)\,dx$, or its approximation $p(x)\Delta x$, is now associated with the
probability of an outcome whose value is “close to $x$”.
Unlike our previous discrete probability, we will not ask “what is the probability that
$x$ takes on some exact value?” Rather, we ask for the probability that $x$ is within some
range of values, and this is computed by performing an integral$^{30}$.
Having generalized the idea of probability, we will now find that many of the associated
concepts have a natural and straightforward generalization as well. We first define
the cumulative function, and then show how the mean, median, and variance of a continuous
probability density can be computed. Here we will have the opportunity to practice
integration skills, as integrals replace the sums in such calculations.
Definition
For experiments whose outcome takes on values on some interval $a \le x \le b$, we define a
cumulative function, $F(x)$, as follows:
\[ F(x) = \int_a^x p(s)\, ds. \]
Then $F(x)$ represents the probability that the random variable takes on a value in the range
$(a, x)$$^{31}$. The cumulative function is simply the area under the probability density (between
the left endpoint of the interval, $a$, and the point $x$).
30
Remark: the probability that $x$ is exactly equal to $b$ is the integral $\int_b^b p(x)\, dx$. But this integral has a value
zero, by properties of the definite integral.
31
By now, the reader should be comfortable with the use of “s” as the “dummy variable” in this formula, where
$x$ plays the role of right endpoint of the interval of integration.
The above definition has several implications:
Properties of continuous probability
1. Since $p(x) \ge 0$, the cumulative function is an increasing (more precisely, non-decreasing) function.
2. The connection between the probability density and its cumulative function can be
written (using the Fundamental Theorem of Calculus) as
\[ p(x) = F'(x). \]
3. $F(a) = 0$. This follows from the fact that
\[ F(a) = \int_a^a p(s)\, ds. \]
By a property of the definite integral, this is zero.
4. $F(b) = 1$. This follows from the fact that
\[ F(b) = \int_a^b p(s)\, ds = 1 \]
by Property 2 of the definition of the probability density, $p(x)$.
5. The probability that $x$ takes on a value in the interval $a_1 \le x \le a_2$ is the same as
\[ F(a_2) - F(a_1). \]
This follows from the additive property of integrals and the Fundamental Theorem
of Calculus:
\[ \int_a^{a_2} p(s)\, ds - \int_a^{a_1} p(s)\, ds = \int_{a_1}^{a_2} p(s)\, ds = \int_{a_1}^{a_2} F'(s)\, ds = F(a_2) - F(a_1). \]
Finding the normalization constant
Not every real-valued function can represent a probability density. For one thing, the function
must be non-negative everywhere. Further, the total area under its graph should be 1, by
Property 2 of a probability density. Given an arbitrary non-negative function, $f(x) \ge 0$, on
some interval $a \le x \le b$ such that
\[ \int_a^b f(x)\, dx = A > 0, \]
we can always define a corresponding probability density, $p(x)$, as
\[ p(x) = \frac{1}{A} f(x), \quad a \le x \le b. \]
It is easy to check that $p(x) \ge 0$ and that $\int_a^b p(x)\, dx = 1$. Thus we have converted the
original function to a probability density. This process is called normalization, and the
constant $C = 1/A$ is called the normalization constant$^{32}$.
32
The reader should recognize that we have essentially rescaled the original function by dividing it by the “area”
$A$. This is really what normalization is all about.
8.2.1 Example: probability density and the cumulative function
Consider the function f(x) = sin (πx/6) for 0 ≤ x ≤ 6.
(a) Normalize the function so that it describes a probability density.
(b) Find the cumulative distribution function, F(x).
Solution
The function is non-negative on the interval 0 ≤ x ≤ 6 (it vanishes only at the endpoints), so we can define the desired probability density. Let
\[ p(x) = C \sin\left(\frac{\pi}{6}x\right). \]
(a) We must find the normalization constant, C, such that Property 2 of continuous probability is satisfied, i.e. such that
\[ 1 = \int_0^6 p(x)\,dx. \]
Carrying out this computation leads to
\[ \int_0^6 C \sin\left(\frac{\pi}{6}x\right) dx = C\,\frac{6}{\pi}\left[-\cos\left(\frac{\pi}{6}x\right)\right]_0^6 = C\,\frac{6}{\pi}\,(1 - \cos(\pi)) = C\,\frac{12}{\pi}. \]
(We have used the fact that cos(0) = 1 in a step here.) But by Property 2, for p(x) to be a probability density, it must be true that C(12/π) = 1. Solving for C leads to the desired normalization constant,
\[ C = \frac{\pi}{12}. \]
Note that this calculation is identical to finding the area
\[ A = \int_0^6 \sin\left(\frac{\pi}{6}x\right) dx, \]
and setting the normalization constant to C = 1/A.
Once we rescale our function by this constant, we get the probability density,
\[ p(x) = \frac{\pi}{12} \sin\left(\frac{\pi}{6}x\right). \]
This density has the property that the total area under its graph over the interval
0 ≤ x ≤ 6 is 1. A graph of this probability density function is shown as the black
curve in Figure 8.1.
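(b) We now compute the cumulative function,
\[ F(x) = \int_0^x p(s)\,ds = \frac{\pi}{12}\int_0^x \sin\left(\frac{\pi}{6}s\right) ds. \]
Carrying out the calculation [33] leads to
\[ F(x) = \frac{\pi}{12}\cdot\frac{6}{\pi}\left[-\cos\left(\frac{\pi}{6}s\right)\right]_0^x = \frac{1}{2}\left(1 - \cos\left(\frac{\pi}{6}x\right)\right). \]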
This cumulative function is shown as a red curve in Figure 8.1.
Figure 8.1. The probability density p(x) (black), and the cumulative function F(x) (red) for Example 8.2.1, plotted for 0 ≤ x ≤ 6 with the vertical axis running from 0 to 1. Note that the area under the black curve is 1 (by normalization), and thus the value of F(x), which is the cumulative area function, is 1 at the right endpoint of the interval.
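The normalization and the cumulative function of Example 8.2.1 can also be checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper `integrate` (a midpoint-rule sum) and the names `A`, `C`, `p`, `F` are choices made here for illustration. It approximates the area A, sets C = 1/A, and compares the numerical F(x) with the closed form found above.

```python
import math

def f(x):
    """Unnormalized function from Example 8.2.1."""
    return math.sin(math.pi * x / 6)

def integrate(g, lo, hi, n=10_000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    dx = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * dx) for i in range(n)) * dx

A = integrate(f, 0, 6)   # area under f; the text finds 12/pi
C = 1 / A                # normalization constant; the text finds pi/12

def p(x):
    """Normalized probability density."""
    return C * f(x)

def F(x):
    """Cumulative function: area under p from 0 to x."""
    return integrate(p, 0, x)

print("C =", C, " pi/12 =", math.pi / 12)
for x in (0, 2, 3, 6):
    closed_form = 0.5 * (1 - math.cos(math.pi * x / 6))
    print(x, F(x), closed_form)
```

The two columns printed for F(x) should agree to several decimal places, with F(6) equal to 1 up to rounding.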
8.3 Mean and median
When we are given a distribution, we often want to describe it with simpler numerical values that characterize its “center”: the mean and the median both give this type of information. We also want to describe whether the distribution is narrow or fat, i.e. how clustered it is about its “center”. The variance and higher moments will provide that type of information.
Recall that in Chapter 5 for mass density ρ(x), we defined a center of mass,
\[ \bar{x} = \frac{\int_a^b x\,\rho(x)\,dx}{\int_a^b \rho(x)\,dx}. \tag{8.1} \]
[33] Notice that the integration involved in finding F(x) is the same as the one done to find the normalization constant. The only difference is the ultimate step of evaluating the integral at the variable endpoint x rather than the fixed endpoint b = 6.
The mean of a probability density is defined similarly, but the definition simplifies by virtue of the fact that $\int_a^b p(x)\,dx = 1$. Since probability distributions are normalized, the denominator in Eqn. (8.1) is simply 1. Consequently, the mean of a probability density is given as follows:
Deﬁnition
For a random variable in a ≤ x ≤ b and a probability density p(x) defined on this interval, the mean or average value of x (also called the expected value), denoted $\bar{x}$, is given by
\[ \bar{x} = \int_a^b x\,p(x)\,dx. \]
To avoid confusion note the distinction between the mean as an average value of x versus
the average value of the function p over the given interval. Reviewing Example 5.3.3 may
help to dispel such confusion.
The idea of median encountered previously in grade distributions also has a parallel
here. Simply put, the median is the value of x that splits the probability distribution into
two portions whose areas are identical.
Deﬁnition
The median $x_{\mathrm{med}}$ of a probability distribution is a value of x in the interval $a \le x_{\mathrm{med}} \le b$ such that
\[ \int_a^{x_{\mathrm{med}}} p(x)\,dx = \int_{x_{\mathrm{med}}}^b p(x)\,dx = \frac{1}{2}. \]
It follows from this definition that the median is the value of x for which the cumulative function satisfies
\[ F(x_{\mathrm{med}}) = \frac{1}{2}. \]
8.3.1 Example: Mean and median
Find the mean and the median of the probability density found in Example 8.2.1.
Solution
To find the mean we compute
\[ \bar{x} = \frac{\pi}{12}\int_0^6 x \sin\left(\frac{\pi}{6}x\right) dx. \]
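Integration by parts is required here [34]. Let $u = x$, $dv = \sin\left(\frac{\pi}{6}x\right) dx$. Then $du = dx$, $v = -\frac{6}{\pi}\cos\left(\frac{\pi}{6}x\right)$. The calculation is then as follows:
\begin{align}
\bar{x} &= \frac{\pi}{12}\left( \left[-x\,\frac{6}{\pi}\cos\left(\frac{\pi}{6}x\right)\right]_0^6 + \frac{6}{\pi}\int_0^6 \cos\left(\frac{\pi}{6}x\right) dx \right) \nonumber \\
&= \frac{1}{2}\left( \left[-x\cos\left(\frac{\pi}{6}x\right)\right]_0^6 + \frac{6}{\pi}\left[\sin\left(\frac{\pi}{6}x\right)\right]_0^6 \right) \nonumber \\
&= \frac{1}{2}\left( -6\cos(\pi) + \frac{6}{\pi}\sin(\pi) - \frac{6}{\pi}\sin(0) \right) \nonumber \\
&= \frac{6}{2} = 3. \tag{8.2}
\end{align}
(We have used cos(π) = −1, sin(0) = sin(π) = 0 in the above.)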
To find the median, $x_{\mathrm{med}}$, we look for the value of x for which
\[ F(x_{\mathrm{med}}) = \frac{1}{2}. \]
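Using the form of the cumulative function from Example 8.2.1, we find that
\[ \frac{\pi}{12}\int_0^{x_{\mathrm{med}}} \sin\left(\frac{\pi}{6}s\right) ds = \frac{1}{2} \quad\Rightarrow\quad \frac{1}{2}\left(1 - \cos\left(\frac{\pi}{6}x_{\mathrm{med}}\right)\right) = \frac{1}{2}. \]
Figure 8.2. The cumulative function F(x) (red) for Example 8.2.1 in relation to the median, as computed in Example 8.3.1, plotted for 0 ≤ x ≤ 6 with the vertical axis running from 0 to 1. The median is the value of x at which F(x) = 0.5, as shown in green.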
[34] Recall from Chapter 6 that $\int u\,dv = vu - \int v\,du$. Calculations of the mean in continuous probability often involve Integration by Parts (IBP), since the integrand consists of an expression x p(x) dx. The idea of IBP is to reduce the integration to something involving only p(x) dx, which is done essentially by “differentiating” the term u = x, as we show here.
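Here we must solve for the unknown value of $x_{\mathrm{med}}$:
\[ 1 - \cos\left(\frac{\pi}{6}x_{\mathrm{med}}\right) = 1 \quad\Rightarrow\quad \cos\left(\frac{\pi}{6}x_{\mathrm{med}}\right) = 0. \]
The angles whose cosine is zero are ±π/2, ±3π/2 etc. We select the angle so that the resulting value of $x_{\mathrm{med}}$ will be inside the relevant interval (0 ≤ x ≤ 6 for this example), i.e. π/2. This leads to
\[ \frac{\pi}{6}x_{\mathrm{med}} = \frac{\pi}{2} \]
so the median is
\[ x_{\mathrm{med}} = 3. \]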
In other words, we have found that the point x
med
subdivides the interval 0 ≤ x ≤ 6 into
two subintervals whose probability is the same. The relationship of the median and the
cumulative function F(x) is illustrated in Fig 8.2.
Remark
A glance at the original probability distribution should convince us that it is symmetric
about the value x = 3. Thus we should have anticipated that the mean and median of this
distribution would both occur at the same place, i.e. at the midpoint of the interval. This
will be true in general for symmetric probability distributions, just as it was for symmetric
mass or grade distributions.
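As a numerical cross-check of Example 8.3.1, the mean can be approximated by a sum of terms x p(x) Δx, and the median can be located by repeatedly halving the interval until F(x) = 1/2 is bracketed. The sketch below is an illustration only: the function names and the bisection approach are assumptions made here, not the method used in the notes.

```python
import math

def p(x):
    """Probability density from Example 8.2.1."""
    return (math.pi / 12) * math.sin(math.pi * x / 6)

def F(x):
    """Cumulative function from Example 8.2.1."""
    return 0.5 * (1 - math.cos(math.pi * x / 6))

def mean(density, lo, hi, n=10_000):
    """Midpoint-rule approximation to the integral of x * density(x)."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += x * density(x) * dx
    return total

def median(cdf, lo, hi, tol=1e-10):
    """Bisection: shrink [lo, hi] around the point where cdf equals 1/2."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print("mean   ~", mean(p, 0, 6))
print("median ~", median(F, 0, 6))
```

Bisection works here because F is increasing on the interval, so F(x) = 1/2 has a single solution. Both printed values should be very close to 3.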
8.3.2 How is the mean different from the median?
Figure 8.3. Two sketches of p(x) against x. In a symmetric probability distribution (left) the mean and median are the same. If the distribution is changed slightly so that it is no longer symmetric (as shown on the right) then the median may still be the same, while the mean will have shifted to the new “center of mass” of the probability density.
We have seen in Example 8.3.1 that for symmetric distributions, the mean and the
median are the same. Is this always the case? When are the two different, and how can we
understand the distinction?
Recall that the mean is closely associated with the idea of a center of mass, a concept
from physics that describes the location of a pivot point at which the entire “mass” would
exactly balance. It is worth remembering that
mean of p(x) = expected value of x = average value of x.
This concept is not to be confused with the average value of a function, which is an average
value of the y coordinate, i.e., the average height of the function on the given interval.
The median simply indicates a place at which the “total mass” is subdivided into two equal portions. (In the case of probability density, each of those portions represents an equal area, $A_1 = A_2 = 1/2$, since the total area under the graph is 1 by definition.)
Figure 8.3 shows how the two concepts of median (indicated by vertical line) and
mean (indicated by triangular “pivot point”) differ. At the left, for a symmetric probability
density, the mean and the median coincide, just as they did in Example 8.3.1. To the right,
a small portion of the distribution was moved off to the far right. This change did not affect
the location of the median, since the total areas to the right and to the left of the vertical
line are still equal. However, the fact that part of the mass is farther away to the right leads
to a shift in the mean of the distribution, to compensate for the change.
Simply put, the mean contains more information about the way that the distribution is arranged spatially. This stems from the fact that the mean of the distribution is a “sum” (i.e. an integral) of terms of the form x p(x)∆x. Thus the location along the x axis, x, not just the “mass”, p(x)∆x, affects the contribution of parts of the distribution to the value of the mean.
8.3.3 Example: a nonsymmetric distribution
We slightly modify the function used in Example 8.2.1 to the new expression
f(x) = x sin(πx/6) for 0 ≤ x ≤ 6.
This results in a nonsymmetric probability density, shown in black in Figure 8.4. Steps in obtaining p(x) would be similar [35], but we have to carry out an integration by parts to find the normalization constant and/or to calculate the cumulative function, F(x). Further, to compute the mean of the distribution we have to integrate by parts twice.
Alternatively, we can carry out all such computations (approximately) using the spreadsheet, as shown in Figure 8.4. We can plot f(x) using sufficiently fine increments ∆x along the x axis and compute the approximation for its integral by adding up the quantities f(x)∆x. The area under the curve A, and hence the normalization constant (C = 1/A), will thereby be determined (at the point corresponding to the end of the interval, x = 6). It is then an easy matter to replot the revised function f(x)/A, which corresponds to the normalized probability density. This is the curve shown in black in Figure 8.4. In the problem sets, we leave as an exercise for the reader how to determine the median and the mean using the same spreadsheet tool for a related (simpler) example.
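The spreadsheet procedure described above (tabulate f(x) on a fine grid, add up the quantities f(x)∆x, rescale by the area) translates directly into a short script. The following Python sketch is one possible rendering, offered as an assumption-laden stand-in for the spreadsheet: the grid size `n` and all variable names are choices made here, and the running sum plays the role of the spreadsheet's cumulative column.

```python
import math

def f(x):
    """Unnormalized function of Example 8.3.3."""
    return x * math.sin(math.pi * x / 6)

n = 6000                                   # number of strips (an arbitrary choice)
dx = 6 / n
xs = [(i + 0.5) * dx for i in range(n)]    # midpoint of each strip

A = sum(f(x) * dx for x in xs)             # area under f on 0 <= x <= 6
C = 1 / A                                  # normalization constant

# Mean: sum of x * p(x) * dx with p(x) = C * f(x).
mean = sum(x * C * f(x) * dx for x in xs)

# Median: first grid point at which the running area reaches 1/2.
running = 0.0
median = None
for x in xs:
    running += C * f(x) * dx
    if running >= 0.5:
        median = x
        break

print("C      ~", C, "(compare pi/36 =", math.pi / 36, ")")
print("mean   ~", mean)
print("median ~", median)
```

The output should land near the values quoted in the caption of Figure 8.4: a normalization constant close to π/36, a median of about 3.6, and a mean of about 3.57.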
[35] This is good practice, and the reader is encouraged to do this calculation.
Figure 8.4. As in Figures 8.1 and 8.2, but for the probability density p(x) = (π/36) x sin(πx/6), showing p(x) and F(x) for 0 ≤ x ≤ 6 with the level F = 0.5 and the median marked. This function is not symmetric, so the mean and median are not the same. From this figure, we see that the median is approximately $x_{\mathrm{med}} = 3.6$. We do not show the mean (which is close but not identical). We can compute both the mean and the median for this distribution using numerical integration with the spreadsheet. We find that the mean is $\bar{x} = 3.5679$. Note that the “most probable value”, i.e. the point at which p(x) is maximal, is at x = 3.9, which is again different from both the mean and the median.
8.4 Applications of continuous probability
In the next few sections, we explore applications of the ideas developed in this chapter to a variety of problems. We treat the decay of radioactive atoms, consider distribution of heights in a population, and explore how the distribution of radii is related to the distribution of volumes in raindrop sizes. The interpretation of the probability density and the cumulative function, as well as the means and medians in these cases, will form the main focus of our discussion.
8.4.1 Radioactive decay
Radioactive decay is a probabilistic phenomenon: an atom spontaneously emits a particle and changes into a new form. We cannot predict exactly when a given atom will undergo this event, but we can study a large collection of atoms and draw some interesting conclusions.
We can define a probability density function that represents the probability per unit time that an atom would decay at time t. It turns out that a good candidate for such a function is
\[ p(t) = C e^{-kt}, \]
where k is a constant that represents the rate of decay (in units of 1/time) of the specific radioactive material. In principle, this function is defined over the interval 0 ≤ t < ∞; that is, it is possible that we would have to wait a “very long time” to have all of the atoms decay. This means that integrals of this density have to be evaluated “at infinity”, leading to an improper integral. Using this probability density for atom decay, we can characterize the mean and median decay time for the material.
Normalization
We first find the constant of normalization, i.e. find the constant C such that
\[ \int_0^\infty p(t)\,dt = \int_0^\infty C e^{-kt}\,dt = 1. \]
Recall that an integral of this sort, in which one of the endpoints is at infinity, is called an improper integral [36]. Some care is needed in understanding how to handle such integrals, and in particular when they “exist” (in the sense of producing a finite value, despite the infinitely long domain of integration). We will delay full discussion to Chapter 10, and state here the definition:
\[ I = \int_0^\infty C e^{-kt}\,dt \equiv \lim_{T\to\infty} I_T \quad\text{where}\quad I_T = \int_0^T C e^{-kt}\,dt. \]
The idea is to compute an integral over a finite interval 0 ≤ t ≤ T and then take a limit as the upper endpoint, T, goes to infinity (T → ∞). We compute:
\[ I_T = C\int_0^T e^{-kt}\,dt = C\left[\frac{e^{-kt}}{-k}\right]_0^T = \frac{1}{k}\,C\,(1 - e^{-kT}). \]
Now we take the limit:
\[ I = \lim_{T\to\infty} I_T = \lim_{T\to\infty} \frac{1}{k}\,C\,(1 - e^{-kT}) = \frac{1}{k}\,C\,\left(1 - \lim_{T\to\infty} e^{-kT}\right). \tag{8.3} \]
To compute this limit, recall that for k > 0, T > 0, the exponential term in Eqn. (8.3) decays to zero as T increases, so that
\[ \lim_{T\to\infty} e^{-kT} = 0. \]
Thus, the second term in parentheses in Eqn. (8.3) will vanish as T → ∞, so that the value of the improper integral will be
\[ I = \lim_{T\to\infty} I_T = \frac{1}{k}\,C. \]
To find the constant of normalization C we require that I = 1, i.e. $\frac{1}{k}C = 1$, which means that
\[ C = k. \]
Thus the (normalized) probability density for the decay is
\[ p(t) = k e^{-kt}. \]
This means that the fraction of atoms that decay between time $t_1$ and $t_2$ is
\[ k\int_{t_1}^{t_2} e^{-kt}\,dt. \]
[36] We have already encountered such integrals in Sections 3.8.5 and 4.5. See also Chapter 10 for a more detailed discussion of improper integrals.
Cumulative decays
The fraction of the atoms that decay between time 0 and time t (i.e. “any time up to time t” or “by time t” [37]) is
\[ F(t) = \int_0^t p(s)\,ds = k\int_0^t e^{-ks}\,ds. \]
We can simplify this expression by integrating:
\[ F(t) = k\left[\frac{e^{-ks}}{-k}\right]_0^t = -\left(e^{-kt} - e^{0}\right) = 1 - e^{-kt}. \]
Thus, the probability of the atoms decaying by time t (which means any time up to time t) is
\[ F(t) = 1 - e^{-kt}. \]
We note that F(0) = 0 and that F(t) → 1 as t → ∞, as expected for the cumulative function.
Median decay time
As before, to determine the median decay time, $t_m$ (the time at which half of the atoms have decayed), we set $F(t_m) = 1/2$. Then
\[ \frac{1}{2} = F(t_m) = 1 - e^{-k t_m}, \]
so we get
\[ e^{-k t_m} = \frac{1}{2} \;\Rightarrow\; e^{k t_m} = 2 \;\Rightarrow\; k t_m = \ln 2 \;\Rightarrow\; t_m = \frac{\ln 2}{k}. \]
Thus half of the atoms have decayed by this time. (Remark: this is easily recognized as the half life of the radioactive process from previous familiarity with exponentially decaying functions.)
Mean decay time
The mean time of decay $\bar{t}$ is given by
\[ \bar{t} = \int_0^\infty t\,p(t)\,dt. \]
We compute this integral again as an improper integral by taking a limit as the top endpoint increases to infinity, i.e. we first find
\[ I_T = \int_0^T t\,p(t)\,dt, \]
and then set
\[ \bar{t} = \lim_{T\to\infty} I_T. \]
[37] Note that the precise English wording is subtle, but very important here. “By time t” means that the event could have happened at any time right up to time t.
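To compute $I_T$ we use integration by parts:
\[ I_T = \int_0^T t\,k e^{-kt}\,dt = k\int_0^T t\,e^{-kt}\,dt. \]
Let $u = t$, $dv = e^{-kt}\,dt$. Then $du = dt$, $v = e^{-kt}/(-k)$, so that
\begin{align*}
I_T &= k\left[ t\,\frac{e^{-kt}}{(-k)} - \int \frac{e^{-kt}}{(-k)}\,dt \right]_0^T
     = \left[ -t e^{-kt} + \int e^{-kt}\,dt \right]_0^T \\
    &= \left[ -t e^{-kt} - \frac{e^{-kt}}{k} \right]_0^T
     = -T e^{-kT} - \frac{e^{-kT}}{k} + \frac{1}{k}.
\end{align*}
Now as T → ∞, we have $e^{-kT} \to 0$, and also $T e^{-kT} \to 0$ (the exponential decays faster than T grows), so that
\[ \bar{t} = \lim_{T\to\infty} I_T = \frac{1}{k}. \]
Thus the mean or expected decay time is
\[ \bar{t} = \frac{1}{k}. \]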
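The limit process used for the mean decay time can be watched numerically: compute $I_T$ for a few increasing values of T and see it approach 1/k. The sketch below is illustrative only; the decay rate `k = 0.5` is a hypothetical value chosen here (it does not correspond to any particular material), and `I` is a midpoint-rule approximation rather than the exact antiderivative.

```python
import math

k = 0.5   # hypothetical decay rate (1/time), chosen only for illustration

def p(t):
    """Normalized decay density p(t) = k e^{-kt}."""
    return k * math.exp(-k * t)

def F(t):
    """Cumulative function F(t) = 1 - e^{-kt}."""
    return 1 - math.exp(-k * t)

def I(T, n=100_000):
    """Midpoint-rule approximation to the integral of t p(t) from 0 to T."""
    dt = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += t * p(t) * dt
    return total

for T in (5, 10, 20, 40, 80):
    print("T =", T, " I_T ~", I(T))   # should approach 1/k as T grows

t_median = math.log(2) / k            # median decay time (the half life)
t_mean = 1 / k                        # mean decay time
print("median:", t_median, " F(median) =", F(t_median))
print("mean:  ", t_mean)
```

Note that the median ln 2 / k is smaller than the mean 1/k: the long tail of late decays pulls the mean to the right, in line with the discussion of Section 8.3.2.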
8.4.2 Discrete versus continuous probability
In Section 5.3, we compared the treatment of two types of mass distributions. We first
explored a set of discrete masses strung along a “thin wire”. Later, we considered a single
“bar” with a continuous distribution of density along its length. In the ﬁrst case, there was
an unambiguous meaning to the concept of “mass at a point”. In the second case, we could
assign a mass to some section of the bar between, say x = a and x = b. (To do so we had
to integrate the mass density on the interval a ≤ x ≤ b.) In the ﬁrst case, we talked about
the mass of the objects, whereas in the latter case, we were interested in the idea of density
(mass per unit distance: Note that the units of mass density are not the same as the units of
mass.)
As we have seen so far in this chapter, the same dichotomy exists in the topic of
probability. In Chapter 7, we were concerned with the probability of discrete events whose
outcome belongs to some ﬁnite set of possibilities (e.g. Head or Tail for a coin toss, allele
A or a in genetics).
The example below provides some further insight into the connection between continuous and discrete probability. In particular, we will see that one can arrive at the idea of probability density by refining a set of measurements and making the appropriate scaling. We explore this connection in more detail below.
166 Chapter 8. Continuous probability distributions
8.4.3 Example: Student heights
Suppose we measure the heights of all UBC students. This would produce about 30,000 data values [38]. We could make a graph and show how these heights are distributed. For example, we could subdivide the student body into those students between 0 and 1.5 m, and those between 1.5 and 3 m. Our bar graph would contain two bars, with the number of students in each height category represented by the heights of the bars, as shown in Figure 8.5(a).
Figure 8.5. Refining a histogram by increasing the number of bins leads (eventually) to the idea of a continuous probability density. The panels show p(h) plotted against height h for progressively smaller bin widths ∆h.
Suppose we want to record this distribution in more detail. We could divide the
population into smaller groups by shrinking the size of the interval or “bin” into which
height is subdivided. (An example is shown in Figure 8.5(b)). Here, by a “bin” we mean a
little interval of width ∆h where h is height, i.e. a height interval. For example, we could
keep track of the heights in increments of 50 cm. If we were to plot the number of students
in each height category, then as the size of the bins gets smaller, so would the height of the
bar: there would be fewer students in each category if we increase the number of categories.
To keep the bar height from shrinking, we might reorganize the data slightly. Instead of plotting the number of students in each bin, we might plot
\[ \frac{\text{number of students in the bin}}{\Delta h}. \]
If we do this, then both numerator and denominator decrease as the size of the bins is made smaller, so that the shape of the distribution is preserved (i.e. it does not get flatter).
We observe that in this case, the number of students in a given height category is represented by the area of the bar corresponding to that category:
\[ \text{Area of bin} = \Delta h \cdot \frac{\text{number of students in the bin}}{\Delta h} = \text{number of students in the bin}. \]
The important point to consider is that the height of each bar in the plot represents the number of students per unit height.
[38] I am grateful to David Austin for developing this example.
This type of plot is precisely what leads us to the idea of a density distribution. As ∆h shrinks, we get a continuous graph. If we “normalize”, i.e. divide by the total area under the graph, we get a probability density, p(h), for the height of the population. As noted, p(h) represents the fraction of students per unit height [39] whose height is h. It is thus a density, and has the appropriate units. In this case, p(h) ∆h represents the fraction of individuals whose height is in the range h ≤ height ≤ h + ∆h.
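The rescaling described in this example is easy to try on made-up data. In the Python sketch below, everything about the data is an assumption made for illustration: the 30,000 "heights" are random numbers drawn from a bell-shaped distribution centered at 1.7 m, not real measurements, and the function name `density_histogram` is invented here. The point is only that dividing each bin count by (total count × ∆h) gives bars whose total area is 1, no matter how many bins are used.

```python
import random

random.seed(0)                       # fixed seed so the run is repeatable
LO, HI = 1.0, 2.4                    # height range in metres (an assumption)

# Synthetic heights, NOT real data; values outside [LO, HI) are discarded.
raw = [random.gauss(1.7, 0.1) for _ in range(30_000)]
heights = [h for h in raw if LO <= h < HI]

def density_histogram(data, lo, hi, nbins):
    """Return (dh, densities): fraction of data per unit height in each bin."""
    dh = (hi - lo) / nbins
    counts = [0] * nbins
    for h in data:
        idx = min(int((h - lo) / dh), nbins - 1)   # guard against rounding at hi
        counts[idx] += 1
    total = len(data)
    return dh, [c / (total * dh) for c in counts]

for nbins in (2, 7, 28):
    dh, dens = density_histogram(heights, LO, HI, nbins)
    area = sum(d * dh for d in dens)
    print(nbins, "bins: tallest bar =", round(max(dens), 3), " total area =", area)
```

As the number of bins grows, the raw counts per bin shrink, but the rescaled bar heights settle toward a fixed curve, which is the probability density p(h).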
8.4.4 Example: Age dependent mortality
In this example, we consider an age distribution and interpret the meanings of the probability density and of the cumulative function. Understanding the connection between the verbal description and the symbols we use to represent these concepts requires practice and experience. Related problems are presented in the homework.
Let p(a) be a probability density for the probability of mortality of a female Canadian
nonsmoker at age a, where 0 ≤ a ≤ 120. (We have chosen an upper endpoint of age
120 since practically no Canadian female lives past this age at present.) Let F(a) be the
cumulative distribution corresponding to this probability density. We would like to answer
the following questions:
(a) What is the probability of dying by age a?
(b) What is the probability of surviving to age a?
(c) Suppose that we are told that F(75) = 0.8 and that F(80) differs from F(75) by
0.11. Interpret this information in plain English. What is the probability of surviving
to age 80? Which is larger, F(75) or F(80)?
(d) Use the information in part (c) to estimate the probability of dying between the ages
of 75 and 80 years old. Further, estimate p(80) from this information.
Solution
(a) The probability of dying by age a is the same as the probability of dying any time up to age a. Restated, this is the probability that the age of death is in the interval 0 ≤ age of death ≤ a. The appropriate quantity is the cumulative function for this probability density,
\[ F(a) = \int_0^a p(x)\,dx. \]
Remark: note that, as customary, x is playing the role of a “dummy variable”. We are integrating over all ages between 0 and a, so we do not want to confuse the notation for the variable of integration, x, and the endpoint of the interval, a. Hence the symbol x rather than a inside the integral.
[39] Note in particular the units of $h^{-1}$ attached to this probability density, and contrast this with a discrete probability that is a pure number carrying no such units.
(b) The probability of surviving to age a is the same as the probability of not dying
before age a. By the elementary properties of probability discussed in the previous
chapter, this is
1 −F(a).
(c) F(75) = 0.8 means that the probability of dying some time up to age 75 is 0.8. (This also means that the probability of surviving past this age would be 1 − 0.8 = 0.2.) From the properties of probability, we know that the cumulative distribution is an increasing function, and thus it must be true that F(80) > F(75). Then F(80) = F(75) + 0.11 = 0.8 + 0.11 = 0.91. Thus the probability of surviving to age 80 is 1 − 0.91 = 0.09. This means that 9% of the population will make it to their 80th birthday.
(d) The probability of dying between the ages of 75 and 80 years old is exactly
\[ \int_{75}^{80} p(x)\,dx. \]
However, we can also state this in terms of the cumulative function, since
\[ \int_{75}^{80} p(x)\,dx = \int_0^{80} p(x)\,dx - \int_0^{75} p(x)\,dx = F(80) - F(75) = 0.11. \]
Thus the probability of death between the ages of 75 and 80 is 0.11.
To estimate p(80), we use the connection between the probability density and the cumulative distribution [40]:
\[ p(x) = F'(x). \tag{8.4} \]
Then it is approximately true that
\[ p(x) \approx \frac{F(x + \Delta x) - F(x)}{\Delta x}. \tag{8.5} \]
(Recall the definition of the derivative, and note that we are approximating the derivative by the slope of a secant line.) Here we have information at ages 75 and 80, so ∆x = 80 − 75 = 5, and the approximation is rather crude, leading to
\[ p(80) \approx \frac{F(80) - F(75)}{5} = \frac{0.11}{5} = 0.022 \text{ per year}. \]
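The arithmetic in parts (c) and (d) is short enough to script. The sketch below simply restates it in Python; the helper name `secant_density` is made up here, and the only inputs are the two values F(75) = 0.8 and F(80) = F(75) + 0.11 given in the problem.

```python
def secant_density(F_lo, F_hi, x_lo, x_hi):
    """Secant-line estimate of the density: (F(x_hi) - F(x_lo)) / (x_hi - x_lo)."""
    return (F_hi - F_lo) / (x_hi - x_lo)

F_75 = 0.80
F_80 = F_75 + 0.11                        # the cumulative function increases

prob_survive_to_80 = 1 - F_80             # part (c)
prob_die_75_to_80 = F_80 - F_75           # part (d)
p_80_estimate = secant_density(F_75, F_80, 75, 80)   # per year

print("P(survive to 80)    =", round(prob_survive_to_80, 3))
print("P(die between 75-80) =", round(prob_die_75_to_80, 3))
print("p(80) estimate       =", round(p_80_estimate, 4), "per year")
```

With more closely spaced values of F (say, at ages 79 and 80) the same function would give a less crude estimate of p(80).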
Several important points merit attention in the above example. First, information contained in the cumulative function is useful. Differences in values of F between x = a and x = b are, after all, equivalent to an integral of the function, $\int_a^b p(x)\,dx$, and are the probability of a result in the given interval, a ≤ x ≤ b. Second, p(x) is the derivative of F(x). In the expression (8.5), we approximated that derivative by a small finite difference. Here we see at play many of the themes that have appeared in studying calculus: the connection between derivatives and integrals, the Fundamental Theorem of Calculus, and the relationship between tangent and secant lines.
[40] In Eqn. (8.4) there is no longer confusion between a variable of integration and an endpoint, so we could revert to the notation p(a) = F'(a), helping us to identify the independent variable as age. However, we have avoided doing so simply so that the formula in Eqn. (8.5) would be very recognizable as an approximation for a derivative.
8.4.5 Example: Raindrop size distribution
In this example, we ﬁnd a rather nonintuitive result, linking the distribution of raindrops
of various radii with the distribution of their volumes. This reinforces the caution needed
in interpreting and handling probabilities.
During a Vancouver rainstorm, the distribution of raindrop radii is uniform for radii
0 ≤ r ≤ 4 (where r is measured in mm) and zero for larger r. By a uniform distribution
we mean a function that has a constant value in the given interval. Thus, we are saying that
the distribution looks like f(r) = C for 0 ≤ r ≤ 4.
(a) Determine the probability density for raindrop radii, p(r). Interpret the meaning of that function.
(b) What is the associated cumulative function F(r) for this probability density? Interpret the meaning of that function.
(c) In terms of the volume, what is the cumulative distribution F(V )?
(d) In terms of the volume, what is the probability density p(V )?
(e) What is the average volume of a raindrop?
Solution
This problem is challenging because one may be tempted to think that the uniform distribution of drop radii should give a uniform distribution of drop volumes. This is not the case, as the following argument shows! The sequence of steps is illustrated in Figure 8.6.
Figure 8.6. Probability densities for raindrop radius and raindrop volume (left panels) and for the cumulative distributions (right) of each for Example 8.4.5. Panels: (a) p(r), (b) F(r), (c) p(V), (d) F(V), with the radius axis running from 0 to 4.
(a) The probability density function is p(r) = 1/4 for 0 ≤ r ≤ 4. This means that
the probability per unit radius of ﬁnding a drop of size r is the same for all radii in
0 ≤ r ≤ 4, as shown in Fig. 8.6(a). Some of these drops will correspond to small
volumes, and others to very large volumes. We will see that the probability per unit
volume of ﬁnding a drop of given volume will be quite different.
(b) The cumulative function is
\[ F(r) = \int_0^r \frac{1}{4}\,ds = \frac{r}{4}, \quad 0 \le r \le 4. \tag{8.6} \]
A sketch of this function is shown in Fig. 8.6(b).
(c) The cumulative function F(r) is proportional to the radius of the drop. We use the connection between radii and volume of spheres to rewrite that function in terms of the volume of the drop: Since
\[ V = \frac{4}{3}\pi r^3 \tag{8.7} \]
we have
\[ r = \left(\frac{3}{4\pi}\right)^{1/3} V^{1/3}. \]
Substituting this expression into the formula (8.6), we get
\[ F(V) = \frac{1}{4}\left(\frac{3}{4\pi}\right)^{1/3} V^{1/3}. \]
We find the range of values of V by substituting r = 0, 4 into Eqn. (8.7) to get $V = 0,\ \frac{4}{3}\pi 4^3$. Therefore the interval is $0 \le V \le \frac{4}{3}\pi 4^3$, or 0 ≤ V ≤ (256/3)π. The function F(V) is sketched in panel (d) of Fig. 8.6.
(d) We now use the connection between the probability density and the cumulative distribution, namely that p is the derivative of F. Now that the variable has been converted to volume, that derivative is a little more “interesting”:
\[ p(V) = F'(V). \]
Therefore,
\[ p(V) = \frac{1}{4}\left(\frac{3}{4\pi}\right)^{1/3} \frac{1}{3}\,V^{-2/3}. \]
Thus the probability per unit volume of finding a drop of volume V in $0 \le V \le \frac{4}{3}\pi 4^3$ is not at all uniform. This probability density is shown in Fig. 8.6(c). This results from the fact that the differential quantity dr behaves very differently from dV, and reinforces the fact that we are dealing with density, not with a probability per se. We note that this distribution has smaller values at larger values of V.
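(e) The range of values of V is
\[ 0 \le V \le \frac{256\pi}{3}, \]
and therefore the mean volume is
\begin{align*}
\bar{V} &= \int_0^{256\pi/3} V\,p(V)\,dV = \frac{1}{12}\left(\frac{3}{4\pi}\right)^{1/3} \int_0^{256\pi/3} V \cdot V^{-2/3}\,dV \\
&= \frac{1}{12}\left(\frac{3}{4\pi}\right)^{1/3} \int_0^{256\pi/3} V^{1/3}\,dV = \frac{1}{12}\left(\frac{3}{4\pi}\right)^{1/3} \left[\frac{3}{4}V^{4/3}\right]_0^{256\pi/3} \\
&= \frac{1}{16}\left(\frac{3}{4\pi}\right)^{1/3} \left(\frac{256\pi}{3}\right)^{4/3} = \frac{64\pi}{3} \approx 67\ \mathrm{mm}^3.
\end{align*}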
8.5 Moments of a probability density
We are now familiar with some of the properties of probability distributions. In this section we will introduce a set of numbers that describe various properties of such distributions. Some of these have already been encountered in our previous discussion, but now we will see that these fit into a pattern of quantities called moments of the distribution.
8.5.1 Deﬁnition of moments
Let f(x) be any function which is defined and positive on an interval [a, b]. We might refer to the function as a distribution, whether or not we consider it to be a probability density. Then we will define the following moments of this function:
\begin{align*}
\text{zero'th moment} \quad M_0 &= \int_a^b f(x)\,dx \\
\text{first moment} \quad M_1 &= \int_a^b x\,f(x)\,dx \\
\text{second moment} \quad M_2 &= \int_a^b x^2 f(x)\,dx \\
&\ \ \vdots \\
\text{n'th moment} \quad M_n &= \int_a^b x^n f(x)\,dx.
\end{align*}
Observe that moments of any order are deﬁned by integrating the distribution f(x)
with a suitable power of x over the interval [a, b]. However, in practice we will see that
usually moments up to the second are usefully employed to describe common attributes of
a distribution.
8.5.2 Relationship of moments to mean and variance of a probability density
In the particular case that the distribution is a probability density, p(x), defined on the interval a ≤ x ≤ b, we have already established the following:
\[ M_0 = \int_a^b p(x)\,dx = 1. \]
(This follows from the basic property of a probability density.) Thus the zero'th moment of any probability density is 1. Further,
\[ M_1 = \int_a^b x\,p(x)\,dx = \bar{x} = \mu. \]
That is, the first moment of a probability density is the same as the mean (i.e. expected value) of that probability density. So far, we have used the symbol $\bar{x}$ to represent the mean or average value of x, but often the symbol µ is also used to denote the mean.
The second moment of a probability density also has a useful interpretation. From the above definitions, the second moment of p(x) over the interval a ≤ x ≤ b is
\[ M_2 = \int_a^b x^2 p(x)\,dx. \]
We will shortly see that the second moment helps describe the way that density is distributed about the mean. For this purpose, we must describe the notion of variance or standard deviation.
Variance and standard deviation
Two children of approximately the same size can balance on a teeter-totter by sitting very
close to the point at which the beam pivots. They can also achieve a balance by sitting at the
very ends of the beam, equally far away. In both cases, the center of mass of the distribution
is at the same place: precisely at the pivot point. However, the mass is distributed very
differently in these two cases. In the ﬁrst case, the mass is clustered close to the center,
whereas in the second, it is distributed further away. We may want to be able to describe
this distinction, and we could do so by considering higher moments of the mass distribution.
Similarly, if we want to describe how a probability density distribution is distributed
about its mean, we consider moments higher than the ﬁrst. We use the idea of the variance
to describe whether the distribution is clustered close to its mean, or spread out over a great
distance from the mean.
Variance
The variance is defined as the average value of the quantity (distance from mean)², where the average is taken over the whole distribution. (The reason for the square is that we would not like values to the left and right of the mean to cancel out.) For discrete probability with mean µ, we define variance by
\[ V = \sum_i (x_i - \mu)^2 p_i. \]
For a continuous probability density, with mean µ, we define the variance by
\[ V = \int_a^b (x - \mu)^2 p(x)\,dx. \]
The standard deviation
The standard deviation is defined as
\[ \sigma = \sqrt{V}. \]
Let us see what this implies about the connection between the variance and the moments of the distribution.
Relationship of variance to second moment
From the equation for variance we calculate that
\[ V = \int_a^b (x - \mu)^2 \, p(x) \, dx = \int_a^b (x^2 - 2\mu x + \mu^2) \, p(x) \, dx. \]
Expanding the integral leads to:
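\[
\begin{aligned}
V &= \int_a^b x^2 \, p(x) \, dx - \int_a^b 2\mu x \, p(x) \, dx + \int_a^b \mu^2 \, p(x) \, dx \\
  &= \int_a^b x^2 \, p(x) \, dx - 2\mu \int_a^b x \, p(x) \, dx + \mu^2 \int_a^b p(x) \, dx.
\end{aligned}
\]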
We recognize the integrals in the above expression, since they are simply moments of the
probability distribution: the second moment $M_2$, the first moment $M_1 = \mu$, and the
zeroth moment $M_0 = 1$. Using these definitions, we arrive at
\[ V = M_2 - 2\mu \cdot \mu + \mu^2. \]
Thus
\[ V = M_2 - \mu^2. \]
Observe that the variance is related to the second moment, $M_2$, and to the mean, $\mu$, of the
distribution.
Relationship of standard deviation to second moment
Using the above definitions, the standard deviation $\sigma$ can be expressed as
\[ \sigma = \sqrt{V} = \sqrt{M_2 - \mu^2}. \]
8.5.3 Example: computing moments
Consider a probability density such that p(x) = C is constant for values of x in the interval
[a, b] and zero for values outside this interval [41]. The area under the graph of this function
for a ≤ x ≤ b is A = C · (b − a) ≡ 1 (enforced by the usual property of a probability
density), so it is easy to see that the value of the constant C should be C = 1/(b − a). Thus
\[ p(x) = \frac{1}{b - a}, \qquad a \le x \le b. \]
We compute some of the moments of this probability density:
\[ M_0 = \int_a^b p(x) \, dx = \frac{1}{b - a} \int_a^b 1 \, dx = 1. \]
(This was already known, since we have determined that the zeroth moment of any
probability density is 1.) We also find that
\[ M_1 = \int_a^b x \, p(x) \, dx = \frac{1}{b - a} \int_a^b x \, dx
       = \frac{1}{b - a} \, \frac{x^2}{2} \Big|_a^b = \frac{b^2 - a^2}{2(b - a)}. \]
This last expression can be simpliﬁed by factoring, leading to
\[ \mu = M_1 = \frac{(b - a)(b + a)}{2(b - a)} = \frac{b + a}{2}. \]
The value (b+a)/2 is the midpoint of the interval [a, b]. Thus we have found that the mean µ
is in the center of the interval, as expected for a symmetric distribution. The median would
be at the same place by a simple symmetry argument: half the area is to the left and half
the area is to the right of this point.
To ﬁnd the variance we calculate the second moment,
\[ M_2 = \int_a^b x^2 \, p(x) \, dx = \frac{1}{b - a} \int_a^b x^2 \, dx
       = \frac{1}{b - a} \, \frac{x^3}{3} \Big|_a^b = \frac{b^3 - a^3}{3(b - a)}. \]
Factoring simpliﬁes this to
\[ M_2 = \frac{(b - a)(b^2 + ab + a^2)}{3(b - a)} = \frac{b^2 + ab + a^2}{3}. \]
[41] As noted before, this is a uniform distribution. It has the shape of a rectangular band of
height C and base (b − a).
The variance is then
\[ V = M_2 - \mu^2 = \frac{b^2 + ab + a^2}{3} - \frac{(b + a)^2}{4}
     = \frac{b^2 - 2ab + a^2}{12} = \frac{(b - a)^2}{12}. \]
The standard deviation is
\[ \sigma = \frac{b - a}{2\sqrt{3}}. \]
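The formulas above can be checked numerically. The following is a minimal sketch, not part of the original notes: the endpoints a = 2 and b = 5 and the helper `midpoint_integral` are illustrative assumptions, chosen only to compare the integrals defining the moments with the closed-form results.

```python
# Numerical check of the moments of a uniform density on [a, b].
# The endpoints and the midpoint-rule helper are illustrative assumptions.

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

a, b = 2.0, 5.0  # hypothetical interval


def p(x):
    """Uniform probability density on [a, b]."""
    return 1.0 / (b - a)


M0 = midpoint_integral(p, a, b)                        # zeroth moment
M1 = midpoint_integral(lambda x: x * p(x), a, b)       # first moment (mean)
M2 = midpoint_integral(lambda x: x**2 * p(x), a, b)    # second moment

mean = M1
variance = M2 - mean**2        # V = M_2 - mu^2
sigma = variance ** 0.5        # standard deviation

print("numerical  :", M0, mean, variance, sigma)
print("closed form:", 1.0, (a + b) / 2, (b - a) ** 2 / 12, (b - a) / (2 * 3 ** 0.5))
```

The two printed lines should agree to many decimal places, which is a quick way to catch algebra slips in the factoring steps.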
8.6 Summary
In this chapter, we extended the discrete probability encountered in Chapter 7 to the case of
continuous probability density. We learned that this function is a probability per unit value
(of the variable of interest), so that
\[ \int_a^b p(x) \, dx = \text{probability that } x \text{ takes a value in the interval } (a, b). \]
We also deﬁned and studied the cumulative function
\[ F(x) = \int_a^x p(s) \, ds = \text{probability of a value in the interval } (a, x). \]
We noted that by the Fundamental Theorem of Calculus, F(x) is an antiderivative of p(x)
(or, synonymously, $p(x) = F'(x)$).
The mean and median are two descriptors for some features of probability densities.
For p(x) defined on an interval a ≤ x ≤ b and zero outside, the mean ($\bar{x}$, sometimes
called $\mu$) is
\[ \bar{x} = \int_a^b x \, p(x) \, dx, \]
whereas the median, $x_{\mathrm{med}}$, is the value for which
\[ F(x_{\mathrm{med}}) = \frac{1}{2}. \]
Both mean and median correspond to the “center” of a symmetric distribution. If the
distribution is nonsymmetric, a long tail in one direction will shift the mean toward that
direction more strongly than the median. The variance of a probability density is
\[ V = \int_a^b (x - \mu)^2 \, p(x) \, dx, \]
and the standard deviation is
\[ \sigma = \sqrt{V}. \]
This quantity describes the “width” of the distribution, i.e. how spread out (large σ) or
clumped (small σ) it is.
We deﬁned the n’th moment of a probability density as
\[ M_n = \int_a^b x^n \, p(x) \, dx, \]
and showed that the first few moments are related to the mean and variance of the probability density.
Most of these concepts are directly linked to the analogous ideas in discrete probability,
but in this chapter, we used integration in place of summation, to deal with the continuous,
rather than the discrete case.
Chapter 9
Differential Equations
9.1 Introduction
A differential equation is a relationship between some (unknown) function and one of its
derivatives. Examples of differential equations were encountered in an earlier calculus
course in the context of population growth, temperature of a cooling object, and speed of a
moving object subjected to friction. In Section 4.2.4, we reviewed an example of a
differential equation for velocity, (4.8), and discussed its solution, but here, we present a more
systematic approach to solving such equations using a technique called separation of
variables. In this chapter, we apply the tools of integration to finding solutions to differential
equations. The importance and wide applicability of this topic cannot be overstated.
In this course, since we are concerned only with functions that depend on a single
variable, we discuss ordinary differential equations (ODEs), whereas later, after a
multivariate calculus course where partial derivatives are introduced, a wider class of partial
differential equations (PDEs) can be studied. Such equations are encountered in many
areas of science, and in any quantitative analysis of systems where rates of change are linked
to the state of the system. Most laws of physics are of this form; for example, applying
the familiar Newton’s law, F = ma, links the position of a pendulum’s mass to its
acceleration (second derivative of position) [42]. Many biological processes are also described by
differential equations. The rate of growth of a population dN/dt depends on the size of
that population at the given time N(t).
Constructing the differential equation that adequately represents a system of interest
is an art that takes some thought and experience. In this process, which we call “modeling”,
many simplifications are made so that the essential properties of a given system are
captured, leaving out many complicating details. For example, friction might be neglected in
“modeling” a perfect pendulum. The details of age distribution might be neglected in
modeling a growing population. Now that we have techniques for integration, we can devise a
new approach to computing solutions of differential equations.
[42] Newton’s law states that force is proportional to acceleration. For a pendulum, the force is due to gravity, and
the acceleration is a second derivative of the x or y coordinate of the bob on the pendulum.
Given a differential equation and a starting value, the goal is to make a prediction
about the future behaviour of the system. This is equivalent to identifying the function that
satisfies the given differential equation and initial value(s). We refer to such a function as
the solution to the initial value problem (IVP). In differential calculus, our exploration
of differential equations was limited to those whose solution could be guessed, or whose
solution was supplied in advance. We also explored some of the fascinating geometric and
qualitative properties of such equations and their predictions.
Now that we have techniques of integration, we can ﬁnd the analytic solution to a
variety of simple first-order differential equations (i.e. those involving the first derivative
of the unknown function). We will describe the technique of separation of variables. This
technique works for examples that are simple enough that we can isolate the dependent
variable (e.g. y) on one side of the equation, and the independent variable (e.g. time t) on
the other side.
9.2 Unlimited population growth
We start with a simple example that was treated thoroughly in the differential calculus
semester of this course. We consider a population with per capita birth and mortality rates
that are constant, irrespective of age, disease, environmental changes, or other effects. We
ask how a population in such ideal circumstances would change over time. We build up
a simple model (i.e. a differential equation) to describe this ideal case, and then proceed
to ﬁnd its solution. Solving the differential equation is accomplished by a new technique
introduced here, namely separation of variables. This reduces the problem to integration
and algebraic manipulation, allowing us to compute the population size at any time t. By
going through this process, we essentially convert information about the rate of change and
starting level of the population to a detailed prediction of the population at later times [43].
9.2.1 A simple model for population growth
Let y(t) represent the size of a population at time t. We will assume that at time t = 0, the
population level is specified, i.e. $y(0) = y_0$ is some given constant. We want to find the
population at later times, given information about birth and mortality rates (both of which
are here assumed to be constant over time).
The population changes through births and mortality. Suppose that b > 0 is the per
capita average birth rate, and m > 0 the per capita average mortality rate. The assumption
that b, m are both constants is a simpliﬁcation that neglects many biological effects, but
will be used for simplicity in this ﬁrst example.
[43] Of course, we must keep in mind that such predictions are based on simplifying assumptions, and are to be
taken as an approximation of any real population growth.
The statement that the population increases through births and decreases due to
mortality can be restated as
\[ \text{rate of change of } y = \text{rate of births} - \text{rate of mortality}, \]
where the rate of births is given by the product of the per capita average birth rate b and the
population size y. Similarly, the rate of mortality is given by my. Translating the rate of
change into the corresponding derivative of y leads to
\[ \frac{dy}{dt} = by - my = (b - m)y. \]
Let us deﬁne the new constant,
k = b −m.
Then k is the net per capita growth rate of the population. We can distinguish two possible
cases: b > m means that there are more births than deaths, so we expect the population
to grow; b < m means that there are more deaths than births, so that the population will
eventually go extinct. There is also a marginal case, b = m, for which k = 0, where the
population does not change at all. To summarize, this simple model of unlimited growth
leads to the differential equation and initial condition:
\[ \frac{dy}{dt} = ky, \qquad y(0) = y_0. \tag{9.1} \]
Recall that a differential equation together with an initial condition is called an initial value
problem. To ﬁnd a solution to such a problem, we look for the function y(t) that describes
the population size at any future time t, given its initial size at time t = 0.
9.2.2 Separation of variables and integration
We here introduce the technique, separation of variables, that will be used in all the
examples described in this chapter. Since the differential equation (9.1) is relatively simple,
this ﬁrst example will be relatively straightforward. We would like to determine y(t) given
the differential equation
\[ \frac{dy}{dt} = ky. \]
Rather than integrating this equation as is [44], we use an alternate approach,
considering dt and dy as “differentials” in the sense defined in Section 6.1. We rearrange and
rewrite the above equation in the form
\[ \frac{1}{y} \, dy = k \, dt. \tag{9.2} \]
This step of putting expressions involving the independent variable t on one side and
expressions involving the dependent variable y on the opposite side gives rise to the name
“separation of variables”.
Now, the LHS of Eqn. (9.2) depends only on the variable y, and the RHS only on t.
The constant k will not interfere with any integration step. Moreover, integrating each side
of Eqn. (9.2) can be carried out independently.
[44] We may be tempted to integrate both sides of this equation with respect to the independent variable t, e.g.
writing $\int \frac{dy}{dt} \, dt = \int ky \, dt + C$ (where C is some constant), but this is not very useful, since the integral on
the right hand side (RHS) can only be carried out if we know the function y = y(t), which we are trying to
determine.
To determine the appropriate intervals for integration, we observe that when time
sweeps over some interval 0 ≤ t ≤ T (from initial to final time), the value of y(t) will
change over a corresponding interval $y_0 \le y \le y(T)$. Here $y_0$ is the given starting value
of y (prescribed by the initial condition in (9.1)). We do not yet know y(T), but our goal
is to find that value, i.e. to predict the future behaviour of y. Integrating leads to
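\[
\begin{aligned}
\int_{y_0}^{y(T)} \frac{1}{y} \, dy &= \int_0^T k \, dt = k \int_0^T dt, \\
\ln y \, \Big|_{y_0}^{y(T)} &= kt \, \Big|_0^T, \\
\ln y(T) - \ln y(0) &= k(T - 0), \\
\ln \left( \frac{y(T)}{y_0} \right) &= kT, \\
\frac{y(T)}{y_0} &= e^{kT}, \\
y(T) &= y_0 e^{kT}.
\end{aligned}
\]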
But this result holds for any arbitrary final time, T. In other words, since this is true for any
time we choose, we can set T = t, arriving at the desired solution
\[ y(t) = y_0 e^{kt}. \tag{9.3} \]
The above formula relates the predicted value of y at any time t to its initial value, and to
all the parameters of the problem. Observe that plugging in t = 0, we get
$y(0) = y_0 e^{k \cdot 0} = y_0 e^0 = y_0$, so that the solution (9.3) satisfies the initial condition. We leave as an exercise
for the reader [45] to validate that the function in (9.3) also satisfies the differential equation in
(9.1).
By solving the initial value problem (9.1), we have determined that, under ideal
conditions, when the net per capita growth rate k is constant, a population will grow
exponentially with time. Recall that this validates results that we had encountered in our first
calculus course.
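As a quick illustration of the result, the sketch below compares the analytic solution (9.3) with a simple Euler approximation of (9.1), and also checks numerically that the derivative of the solution equals k times the solution. The parameter values (initial size 100, k = 0.03, final time 10) and the function names are hypothetical, chosen only for illustration.

```python
import math


def exact_population(t, y0, k):
    """Analytic solution y(t) = y0 * exp(k t) of dy/dt = k y, y(0) = y0."""
    return y0 * math.exp(k * t)


def euler_population(t_end, y0, k, steps=10_000):
    """Euler's method for dy/dt = k y, stepping from t = 0 to t = t_end."""
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += k * y * dt      # y_new = y_old + (rate of change) * dt
    return y


# Hypothetical parameter values, for illustration only.
y0, k, t_end = 100.0, 0.03, 10.0

y_exact = exact_population(t_end, y0, k)
y_euler = euler_population(t_end, y0, k)

# Central-difference estimate of dy/dt at t_end, to compare with k * y.
h = 1e-5
dydt = (exact_population(t_end + h, y0, k) - exact_population(t_end - h, y0, k)) / (2 * h)

print("analytic y(T):", y_exact)
print("Euler    y(T):", y_euler)
print("dy/dt vs k*y :", dydt, k * y_exact)
```

With a small enough step, the Euler value approaches the analytic one, and the last line shows the two sides of the differential equation agreeing.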
9.3 Terminal velocity and steady states
Here we revisit the equation for velocity of a falling object that we first encountered in
Section 4.2.4. We wish to derive the appropriate differential equation governing that velocity,
and ﬁnd the solution v(t) as a function of time. We will ﬁrst reconsider the simplest case of
uniformly accelerated motion (i.e. where friction is neglected), as in Section 4.2.3. We then
include friction, as in Section 4.2.4 and use the new technique of separation of variables to
shortcut the method of solution.
[45] This kind of check is good practice and helps to spot errors. Simply differentiate Eqn. (9.3) and show that the
result is the same as k times the original function, as required by the equation (9.1).
9.3.1 Ignoring friction: the uniformly accelerated case
Let v(t) and a(t) be the velocity and the acceleration, respectively, of an object falling
under the force of gravity at time t. We take the positive direction to be downwards, for
convenience. Suppose that at time t = 0, the object starts from rest, i.e. the initial velocity
of the object is known to be v(0) = 0. When friction is neglected, the object will accelerate,
a(t) = g,
which is equivalent to the statement that the velocity increases at a constant rate,
\[ \frac{dv}{dt} = g. \tag{9.4} \]
Because g is constant, we do not need to use separation of variables, i.e. we can integrate
each side of this equation directly [46]. Writing
\[ \int \frac{dv}{dt} \, dt = \int g \, dt + C = g \int dt + C, \]
where C is an integration constant, we arrive at
\[ v(t) = gt + C. \tag{9.5} \]
Here we have used (on the LHS) that v is the antiderivative of dv/dt (equivalently, we can
simplify the integral $\int \frac{dv}{dt} \, dt = \int dv = v$). Plugging v(0) = 0 into Eqn. (9.5) leads to
0 = g · 0 + C = C, so the constant we need is C = 0 and the velocity satisfies
\[ v(t) = gt. \]
We have just arrived at a result that parallels Eqn. (4.4) of Section 4.2.3 (in slightly different
notation).
9.3.2 Including friction: the case of terminal velocity
When a falling object experiences the force of friction, it cannot accelerate indeﬁnitely. In
fact, a frictional force retards the downwards motion. To a good approximation, that force
is proportional to the velocity.
A force balance for the falling object leads to
\[ m \, a(t) = mg - \gamma v(t), \]
where γ is the frictional coefficient. For an object of constant mass, we can divide through
by m, so
\[ a(t) = g - \frac{\gamma}{m} \, v(t). \]
[46] It is important to note the distinction between this simple example and other cases where separation of
variables is required. It would not be wrong to use separation of variables to find the solution for Eqn. (9.4), but it
would just be “overkill”, since simple integration of each side of the equation “as is” does the job.
Let k = γ/m. Then, the velocity at any time satisﬁes the differential equation and initial
condition
\[ \frac{dv}{dt} = g - kv, \qquad v(0) = 0. \tag{9.6} \]
We can ﬁnd the solution to this differential equation and predict the velocity at any time t
using separation of variables.
[Figure 9.1: plot of velocity v (vertical axis, 0 to 20) against time t (horizontal axis, 0 to 10), with the terminal velocity marked.]
Figure 9.1. The velocity v(t) as a function of time given by Eqn. (9.7) as found in
Section 9.3.2. Note that as time increases, the velocity approaches some constant terminal
velocity. The parameters used were g = 9.8 m/s$^2$ and k = 0.5.
Consider a time interval 0 ≤ t ≤ T, and suppose that, during this time interval, the
velocity changes from an initial value of v(0) = 0 to the ﬁnal value, v(T) at the ﬁnal time,
T. Then using separation of variables and integration, we get
\[
\begin{aligned}
\frac{dv}{dt} &= g - kv, \\
\frac{dv}{g - kv} &= dt, \\
\int_0^{v(T)} \frac{dv}{g - kv} &= \int_0^T dt.
\end{aligned}
\]
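Substitute u = g − kv for the integral on the left hand side. Then du = −k dv, dv =
(−1/k) du, so we get an integral of the form
\[ -\frac{1}{k} \int \frac{1}{u} \, du = -\frac{1}{k} \ln |u|. \]
After replacing u by g − kv, we arrive at
\[ -\frac{1}{k} \ln |g - kv| \, \Big|_0^{v(T)} = t \, \Big|_0^T. \]
We use the fact that v(0) = 0 to write this as
\[
\begin{aligned}
-\frac{1}{k} \left( \ln |g - kv(T)| - \ln |g| \right) &= T, \\
-\frac{1}{k} \ln \left| \frac{g - kv(T)}{g} \right| &= T, \\
\ln \left| \frac{g - kv(T)}{g} \right| &= -kT.
\end{aligned}
\]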
We are ﬁnished with the integration step, but the function we are trying to ﬁnd, v(T)
is still tangled up inside an expression involving the natural logarithm. Extricating it will
involve some subtle reasoning about signs because there is an absolute value to contend
with. As a first step, we exponentiate both sides to remove the logarithm:
\[ \left| \frac{g - kv(T)}{g} \right| = e^{-kT} \quad \Rightarrow \quad |g - kv(T)| = g e^{-kT}. \]
Because the constant g is positive, we could remove the absolute value signs from it. To
simplify further, we have to consider the sign of the term inside the absolute value in the
numerator. In the case we are considering here, v(0) = 0. This will mean that the quantity
g − kv(T) is always nonnegative (i.e. g − kv(T) ≥ 0). We will verify this fact shortly.
For the moment, supposing this is true, we can write
\[ |g - kv(T)| = g - kv(T) = g e^{-kT}, \]
and ﬁnally solve for v(T) to obtain our ﬁnal result,
\[ v(T) = \frac{g}{k} \left( 1 - e^{-kT} \right). \]
Here we note that v(T) can never be larger than g/k since the term $(1 - e^{-kT})$ is always
≤ 1. Hence, we were correct in assuming that g − kv(T) ≥ 0.
As before, the above formula relating velocity to time holds for any choice of the
ﬁnal time T, so we can write, in general,
\[ v(t) = \frac{g}{k} \left( 1 - e^{-kt} \right). \tag{9.7} \]
This is the solution to the initial value problem (9.6). It predicts the velocity of the
falling object through time. Note that we have arrived once more at the result obtained
in Eqn. (4.11), but using the technique of separation of variables [47].
We graph the expression given in (9.7) in Figure 9.1. Note that as t increases, the
term $e^{-kt}$ decreases rapidly, so that the velocity approaches a constant whose value is
\[ v(t) \to \frac{g}{k}. \]
We call this the terminal velocity [48].
9.3.3 Steady state
We might observe that the terminal velocity can also be found quite simply and directly
from the differential equation itself: it is the steady state of the differential equation, i.e.
the value for which no further change takes place. The steady state can be found by setting
the derivative in the differential equation to zero, i.e. by letting
\[ \frac{dv}{dt} = 0. \]
When this is done, we arrive at
\[ g - kv = 0 \quad \Rightarrow \quad v = \frac{g}{k}. \]
Thus, at steady state, the velocity of the falling object is indeed the same as the terminal
velocity that we have just discovered.
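To make the approach to terminal velocity concrete, here is a short sketch that evaluates the solution (9.7) with the parameter values quoted in the caption of Figure 9.1 (g = 9.8 m/s^2, k = 0.5). The function names and the sample times are our own choices, and the finite-difference check of the differential equation is an added illustration rather than part of the derivation.

```python
import math

g, k = 9.8, 0.5          # values quoted in the caption of Figure 9.1


def v(t):
    """Velocity from Eqn. (9.7): v(t) = (g/k) * (1 - exp(-k t))."""
    return (g / k) * (1.0 - math.exp(-k * t))


v_terminal = g / k       # steady state of dv/dt = g - k v


def residual(t, h=1e-6):
    """dv/dt (central difference) minus the right-hand side g - k v(t)."""
    dvdt = (v(t + h) - v(t - h)) / (2 * h)
    return dvdt - (g - k * v(t))


for t in (0, 2, 5, 10, 20):          # sample times, chosen for illustration
    print(f"t = {t:5.1f}   v = {v(t):8.4f}   g - k v = {g - k * v(t):8.4f}")

print("terminal velocity g/k =", v_terminal)
```

The last column, g − kv, stays nonnegative and shrinks toward zero, which is the sign condition assumed when the absolute value was removed.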
9.4 Related problems and examples
The example discussed in Section 9.3.2 belongs to a class of problems that share many
common features. Generally, this class is represented by linear differential equations of the
form
\[ \frac{dy}{dt} = a - by, \tag{9.8} \]
with given initial condition $y(0) = y_0$. Properties of this equation were studied in the
context of differential calculus in a previous semester. Now, with the same method as we
applied to the problem of terminal velocity, we can integrate this equation by separation of
variables, writing
\[ \frac{dy}{a - by} = dt \]
and proceeding as in the previous example. We arrive at its solution,
\[ y(t) = \frac{a}{b} + \left( y_0 - \frac{a}{b} \right) e^{-bt}. \tag{9.9} \]
[47] It often happens that a differential equation can be solved using several different methods.
[48] A similar plot of the solution of the differential equation (9.6) could be assembled using Euler’s method, as
studied in differential calculus. That is the numerical method alternative to the analytic technique discussed in this
chapter. The student may wish to review results obtained in a previous semester to appreciate the correspondence.
The steps are left as an exercise for the reader.
We observe that the steady state of the above equation is obtained by setting
\[ \frac{dy}{dt} = a - by = 0, \quad \text{i.e.} \quad y = \frac{a}{b}. \]
Indeed the solution given in the formula (9.9) has the property that as t increases, the
exponential term $e^{-bt} \to 0$ (for b > 0), so that the term containing the large brackets will vanish and y → a/b.
This means that from any initial value, y will approach its steady state level.
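A reader who works through the exercise can check the answer numerically. The sketch below evaluates formula (9.9) and tests, by a central finite difference, that it satisfies dy/dt = a − by. The values a = 3 and b = 0.5, the three initial values, and the function names are hypothetical choices made only for this illustration.

```python
import math


def linear_solution(t, a, b, y0):
    """Formula (9.9): y(t) = a/b + (y0 - a/b) * exp(-b t)."""
    return a / b + (y0 - a / b) * math.exp(-b * t)


def ode_residual(t, a, b, y0, h=1e-6):
    """dy/dt (central difference) minus the right-hand side a - b*y."""
    dydt = (linear_solution(t + h, a, b, y0) - linear_solution(t - h, a, b, y0)) / (2 * h)
    return dydt - (a - b * linear_solution(t, a, b, y0))


a, b = 3.0, 0.5                      # hypothetical constants; steady state a/b = 6

for y0 in (0.0, 6.0, 10.0):          # below, at, and above the steady state
    late = linear_solution(80.0, a, b, y0)
    print(f"y0 = {y0:5.1f}   residual at t=2: {ode_residual(2.0, a, b, y0):.2e}   y(80) = {late:.6f}")
```

Starting below, at, or above a/b, the late-time value is the same, which matches the statement that every initial value leads to the steady state.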
This equation has a number of important applications that arise in a variety of contexts.
A few of these are mentioned below.
9.4.1 Blood alcohol
Let y(t) be the level of alcohol in the blood of an individual during a party. Suppose that
the average rate of drinking is gradual and constant (i.e. small sips are continually taken,
so that the rate of input of alcohol is approximately constant). Further, assume that alcohol
is detoxiﬁed in the liver at a rate proportional to its blood level. Then an equation of the
form (9.8) would describe the blood level over the period of drinking. y(0) = 0 would
signify the absence of alcohol in the body at the beginning of the evening. The constant a
would reflect the rate of intake per unit volume of the individual’s blood: larger people take
longer to “get drunk” for a given amount consumed [49]. The constant b represents the rate of
decay of alcohol per unit time due to degradation by the liver, assumed constant [50]; young
healthy drinkers have a higher value of b than those who can no longer metabolize alcohol
as efficiently.
The solution (9.9) has several features of note: it illustrates the fact that alcohol
would increase from the initial level, but only up to a maximum of a/b, where the intake
and degradation balance. Indeed, the level y = a/b represents a steady state level (as
long as drinking continues). Of course, this level could be toxic to the drinker, and the
assumptions of the model may break down in that region! In the phase of “recovery”,
after drinking stops, the above differential equation no longer describes the level of blood
alcohol. Instead, the process of recovery is represented by
\[ \frac{dy}{dt} = -by, \qquad y(0) = y_0. \tag{9.10} \]
The level of blood alcohol then decays exponentially with rate b from its level at the
moment that drinking ends. We show this typical pattern in Figure 9.2.
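The two phases can be pieced together in a few lines. The sketch below is only an illustration: the intake rate a = 0.5, decay rate b = 0.5, and the function name `blood_alcohol` are hypothetical (the notes do not state the parameters behind Figure 9.2); only the stopping time of 2 hours is taken from the figure caption. During drinking it uses (9.9) with y(0) = 0, and afterwards the exponential decay that solves (9.10), started from the level reached when drinking stops.

```python
import math


def blood_alcohol(t, a, b, t_stop):
    """Blood alcohol level at time t.

    For t <= t_stop: solution of dy/dt = a - b*y with y(0) = 0.
    For t >  t_stop: solution of dy/dt = -b*y, starting from the level at t_stop.
    """
    if t <= t_stop:
        return (a / b) * (1.0 - math.exp(-b * t))
    y_stop = (a / b) * (1.0 - math.exp(-b * t_stop))
    return y_stop * math.exp(-b * (t - t_stop))


# Hypothetical rates; the stopping time of 2 hours follows the caption of Figure 9.2.
a, b, t_stop = 0.5, 0.5, 2.0

for t in (0, 1, 2, 3, 5, 10):
    print(f"t = {t:4.1f} h   level = {blood_alcohol(t, a, b, t_stop):.4f}")
```

The level rises toward, but never reaches, the steady state a/b while drinking continues, peaks at the stopping time, and then decays back toward zero.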
9.4.2 Chemical kinetics
The same ideas apply to any chemical substance that is formed at a constant rate (or
supplied at a constant rate) a, and then breaks down with rate proportional to its concentration.
We then call the constant b the “decay rate constant”.
[49] Of course, we are here assuming a constant intake rate, as though the alcohol is being continually sipped
all evening at a uniform rate. Most people do not drink this way, instead quaffing a few large drinks over some
hour(s). It is possible to describe this, but we will not do so in this chapter.
[50] This is also a simplifying assumption, as the rate of metabolism can depend on other factors, such as food
intake.
[Figure 9.2: plot of blood alcohol level (vertical axis, 0 to 1) against time (horizontal axis, 0 to 10).]
Figure 9.2. The level of alcohol in the blood is described by Eqn. (9.8) for the first
two hours of drinking. At t = 2h, the drinking stopped (so a = 0 from then on). The level
of alcohol in the blood then decays back to zero, following Eqn. (9.10).
The variable y(t) represents the concentration of chemical at time t, and the same
differential equation describes this chemical process. As above, given any initial level of
the substance, $y(0) = y_0$, the level of y will eventually approach the steady state, y = a/b.
9.5 Emptying a container
In this section we investigate a new problem in which the differential equation that
describes a process will be derived from basic physical principles [51]. We will look at the flow
of fluid leaking out of a container, and use mass balance to derive a differential equation
model. When this is done, we will also use separation of variables to predict how long it
takes for the container to be emptied.
We will assume that the container has a small hole at its base. The rate of emptying
of the container will depend on the height of fluid in the container above the hole [52]. We
can derive a simple differential equation that describes the rate at which the height of the fluid
changes using the following physical argument.
9.5.1 Conservation of mass
Suppose that the container is a cylinder, with a constant cross sectional area A > 0, as
shown in Fig. 9.3. Suppose that the area of the hole is a. The rate at which fluid leaves through
the hole must balance the rate at which the fluid volume decreases in the container. This principle is
called mass balance. We will here assume that the density of water is constant, so that we
can talk about the net changes in volume (rather than mass).
[51] This example is particularly instructive. First, it shows precisely how physical laws can be combined to
formulate a model, then it shows how the problem can be recast as a single ODE in one dependent variable.
Finally, it illustrates a slightly different integral.
[52] As we have assumed that the hole is at h = 0, we henceforth consider the height of the fluid surface, h(t), to
be the same as “the height of fluid above the hole”.
[Figure 9.3: sketch of a cylindrical tank with labels A, h, a, and v∆t.]
Figure 9.3. We investigate the time it takes to empty a container full of fluid by
deriving a differential equation model and solving it using the methods developed in this
chapter. A is the cross-sectional area of the cylindrical tank, a is the cross-sectional area
of the hole through which fluid drains, v(t) is the velocity of the fluid, and h(t) is the
time-dependent height of fluid remaining in the tank (indicated by the dashed line). The volume
of fluid leaking out in a time span ∆t is av∆t; see the small cylindrical volume indicated on
the right.
We refer to V(t) as the volume of fluid in the container at time t. Note that for the
cylindrical container, V(t) = Ah(t) where A is the cross-sectional area and h(t) is the
height of the fluid at time t. The rate of change of V is
\[ \frac{dV}{dt} = -(\text{rate volume lost as fluid flows out}). \]
(The minus sign indicates that the volume is decreasing.)
At every second, some amount of ﬂuid leaves through the hole. Suppose we are
told that the velocity of the water molecules leaving the hole is precisely v(t) in units of
cm/sec. (We will ﬁnd out how to determine this velocity shortly.) Then in one second,
those particles have moved a distance v cm/sec · 1 sec = v cm. In fact, all the particles in
a little cylinder of length v behind these molecules have also left the hole. Indeed, if we
know the area of the hole, we can determine precisely what volume of water exits through
the hole each second, namely
rate volume lost as ﬂuid ﬂows out = va.
(The small inset in Fig. 9.3 shows a little “cylindrical unit” of ﬂuid that ﬂows out of the
hole per second. The area is a and the length of that little volume is v. Thus the volume
leaving per second is va.)
So far we have a relationship between the volume of ﬂuid in the tank and the velocity
of the water exiting the hole:
\[ \frac{dV}{dt} = -av. \]
Now we need to determine the velocity v of the ﬂow to complete the formulation of the
problem.
9.5.2 Conservation of energy
The ﬂuid “picks up speed” because it has “dropped” by a height h from the top of the ﬂuid
surface to the hole. In doing so, a small mass of water has simply exchanged some potential
energy (due to its relative height above the hole) for kinetic energy (expressed by how fast
it is moving). Potential energy of a small mass of water (m) at height h will be mgh,
whereas when the water flows out of the hole, its kinetic energy is given by $(1/2)mv^2$,
where v is velocity. Thus, for these to balance (so that total energy is conserved) we have
\[ \frac{1}{2} m v^2 = mgh. \]
(Here v = v(t) is the instantaneous velocity of the fluid leaving the hole and h = h(t) is
the height of the water column.) This allows us to relate the velocity of the fluid leaving
the hole to the height of the water in the tank, i.e.
\[ v^2 = 2gh \quad \Rightarrow \quad v = \sqrt{2gh}. \tag{9.11} \]
In fact, both the height of fluid and its exit velocity are constantly changing as the fluid
drains, so we might write $[v(t)]^2 = 2gh(t)$ or $v(t) = \sqrt{2gh(t)}$. We have arrived at this
result using an energy balance argument.
9.5.3 Putting it together
We now combine the various pieces of information to arrive at the model, a differential
equation for a single (unknown) function of time. There are three time-dependent variables
that were discussed above: the volume V(t), the height h(t), and the velocity v(t). It proves
convenient to express everything in terms of the height of water in the tank, h(t), though
this choice is to some extent arbitrary. Keeping units in an equation consistent is essential.
Checking for unit consistency can help to uncover errors in equations, including differential
equations.
Recall that the volume of the water in the tank, V (t) is related to the height of ﬂuid
h(t) by
V (t) = Ah(t),
where A > 0 is a constant, the cross-sectional area of the tank. We can simplify as follows:
\[ \frac{dV}{dt} = \frac{d(Ah(t))}{dt} = A \frac{d(h(t))}{dt}. \]
But by previous steps and Eqn. (9.11)
\[ \frac{dV}{dt} = -av = -a \sqrt{2gh}. \]
Thus
\[ A \frac{d(h(t))}{dt} = -a \sqrt{2gh}, \]
or simply put,
\[ \frac{dh}{dt} = -\frac{a}{A} \sqrt{2gh} = -k \sqrt{h}, \tag{9.12} \]
where k is a constant that depends on the size and shape of the cylinder and its hole:
\[ k = \frac{a}{A} \sqrt{2g}. \]
If the area of the hole is very small relative to the cross-sectional area of the tank, then
k will be very small, so that the tank will drain very slowly (i.e. the rate of change in h
per unit time will not be large). On a planet with a very high gravitational force, the same
tank will drain more quickly. A taller column of water drains faster. Once its height has
been reduced, its rate of draining also slows down. We comment that Equation (9.12) has
a minus sign, signifying that the height of the ﬂuid decreases.
Using simple principles such as conservation of mass and conservation of energy,
we have shown that the height h(t) of water in the tank at time t satisﬁes the differential
equation (9.12). Putting this together with the initial condition (height of fluid $h_0$ at time
t = 0), we arrive at the initial value problem to solve:
\[ \frac{dh}{dt} = -k \sqrt{h}, \qquad h(0) = h_0. \tag{9.13} \]
Clearly, this equation is valid only for h nonnegative. We also remark that Eqn. (9.13) is
nonlinear [53] as it involves the variable h in a nonlinear term, $\sqrt{h}$. Next, we use separation
of variables to find the height as a function of time.
9.5.4 Solution by separation of variables
The equation (9.13) shows how height of ﬂuid is related to its rate of change, but we are
interested in an explicit formula for ﬂuid height h versus time t. To obtain that relationship,
we must determine the solution to this differential equation. We do this using separation of
variables. (We will also use the initial condition $h(0) = h_0$ that accompanies Eqn. (9.13).)
As usual, rewrite the equation in the separated form,
\[ \frac{dh}{\sqrt{h}} = -k \, dt. \]
We integrate from t = 0 to t = T, during which the height of fluid that started as $h_0$
becomes some new height h(T) to be determined.
\[ \int_{h_0}^{h(T)} \frac{1}{\sqrt{h}} \, dh = -k \int_0^T dt. \]
Now integrate both sides and simplify:
h
1/2
(1/2)
h(T)
h0
= −kT
2
h(T) −
h
0
= −kT
Footnote 53: In many cases, nonlinear differential equations are more challenging than linear ones. However, examples
chosen in this chapter are simple enough that we will not experience the true challenges of such nonlinearities.
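\[ \sqrt{h(T)} = -k\frac{T}{2} + \sqrt{h_0}, \]
\[ h(T) = \left(\sqrt{h_0} - k\frac{T}{2}\right)^2. \]
Since this is true for any time $t$, we can also write the form of the solution as
\[ h(t) = \left(\sqrt{h_0} - k\frac{t}{2}\right)^2. \qquad (9.14) \]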
Eqn. (9.14) predicts fluid height remaining in the tank versus time $t$. In Fig. 9.4 we show
some of the “solution curves”$^{54}$, i.e. functions of the form Eqn. (9.14) for a variety of initial
fluid height values $h_0$. We can also use our results to predict the emptying time, as shown
in the next section.
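[Figure 9.4: plot titled “Emptying a fluid-filled container”, showing h(t) (vertical axis, 0 to 10) versus time t (horizontal axis, 0 to 20), with the initial height of fluid and the emptying time marked.]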
Figure 9.4. Solution curves obtained by plotting Eqn. (9.14) for three different
initial heights of fluid in the container, $h_0 = 2.5, 5, 10$. The parameter $k = 0.4$ in
each case. The “V” points to the time it takes the tank to empty starting from an initial height of
$h_0 = 10$.
Footnote 54: As before, this figure was produced by plotting the analytic solution (9.14). An alternative, numerical
approach would use Euler’s Method and the spreadsheet to obtain the (approximate) solution directly from the initial value
problem (9.13).
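The numerical alternative mentioned in the footnote can be sketched in a few lines of Python instead of a spreadsheet. This is a minimal sketch, not the method used to produce the figure: the function names, the step size `dt`, and the sample time t = 5 are assumptions chosen for illustration, while k = 0.4 and the three initial heights are those quoted for Figure 9.4.

```python
import math

def analytic_height(t, h0, k):
    """Eqn. (9.14), held at zero once the tank has emptied."""
    root = math.sqrt(h0) - k * t / 2.0
    return root * root if root > 0.0 else 0.0

def euler_height(t_end, h0, k, dt=0.001):
    """Approximate h(t_end) for dh/dt = -k*sqrt(h), h(0) = h0, by Euler's method."""
    steps = int(round(t_end / dt))
    h = h0
    for _ in range(steps):
        # One Euler step; clamp at zero because the height cannot be negative.
        h = max(h - k * math.sqrt(h) * dt, 0.0)
    return h

k = 0.4  # value quoted in the caption of Figure 9.4
for h0 in (2.5, 5.0, 10.0):
    exact = analytic_height(5.0, h0, k)
    approx = euler_height(5.0, h0, k)
    print(f"h0 = {h0:5.2f}: analytic {exact:.4f}, Euler {approx:.4f}")
```

With a small step size the two columns should agree to several decimal places, which is a useful sanity check on the separation of variables calculation.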
9.5.5 How long will it take the tank to empty?
The tank will be empty when the height of fluid is zero. Setting $h(t) = 0$ in Eqn. (9.14) gives
\[ \left(\sqrt{h_0} - k\frac{t}{2}\right)^2 = 0. \]
Solving this equation for the emptying time $t_e$, we get
\[ k\frac{t_e}{2} = \sqrt{h_0} \quad\Rightarrow\quad t_e = \frac{2\sqrt{h_0}}{k}. \]
The time it takes to empty the tank depends on the initial height of water in the tank. Three
examples are shown in Figure 9.4 for initial heights of $h_0 = 2.5, 5, 10$. The emptying time
depends on the square root of the initial height. This means, for instance, that doubling the
height of fluid initially in the tank only increases the time it takes by a factor of $\sqrt{2} \approx 1.41$.
Making the hole smaller has a more direct “proportional” effect: since we have found that
$k = (a/A)\sqrt{2g}$, the emptying time is inversely proportional to the area $a$ of the hole.
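The dependence of the emptying time on the initial height and on the hole size can be explored with a short script. This is only an illustrative sketch: the function names are hypothetical, and the value g = 9.8 and the hole and tank areas used below are assumptions, not values from the text.

```python
import math

def drain_constant(hole_area, tank_area, g=9.8):
    """k = (a/A) * sqrt(2g); g = 9.8 is an assumed value of gravitational acceleration."""
    return (hole_area / tank_area) * math.sqrt(2.0 * g)

def emptying_time(h0, k):
    """t_e = 2 * sqrt(h0) / k, obtained by setting h(t) = 0 in Eqn. (9.14)."""
    return 2.0 * math.sqrt(h0) / k

# Initial heights and k from Figure 9.4.
for h0 in (2.5, 5.0, 10.0):
    print(f"h0 = {h0:5.2f}: emptying time {emptying_time(h0, 0.4):.3f}")

# Doubling the initial height multiplies the emptying time by sqrt(2).
print(emptying_time(20.0, 0.4) / emptying_time(10.0, 0.4))

# Halving the hole area (hypothetical areas) doubles the emptying time.
k_full = drain_constant(1.0, 100.0)
k_half = drain_constant(0.5, 100.0)
print(emptying_time(10.0, k_half) / emptying_time(10.0, k_full))
```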
9.6 Density dependent growth
The simple model discussed in Section 9.2 for population growth has an unrealistic feature
of unlimited explosive exponential growth. To correct for this unrealistic feature, a common
assumption is that the rate of growth is “density dependent”. In this section, we consider
a revised differential equation that describes such growth, and use the new tools to analyze
its predictions. In place of our previous notation we will now use N to represent the size
of a population.
9.6.1 The logistic equation
The logistic equation is the simplest density dependent growth equation, and we study its
behaviour below.
Let N(t) be the size of a population at time t. Clearly, we expect N(t) ≥ 0 for all
time t, since a population cannot be negative. We will assume that the initial population is
known, $N(0) = N_0$. The logistic differential equation states that the rate of change of the
population is given by
\[ \frac{dN}{dt} = rN\left(\frac{K - N}{K}\right). \qquad (9.15) \]
Here $r > 0$ is called the intrinsic growth rate and $K > 0$ is called the carrying capacity.
$K$ reflects the size of the population that can be sustained by the given environment. We
can understand this equation as a modified growth law in which the “density dependent”
term, $r(K - N)/K$, replaces the previous constant net growth rate $k$.
9.6.2 Scaling the equation
The form of the equation can be simplified if we measure the population in units of the
carrying capacity, instead of “numbers of individuals”, i.e. if we define a new quantity
\[ y(t) = \frac{N(t)}{K}. \]
This procedure is called scaling. To see its effect, consider dividing each side of the logistic
equation (9.15) by the constant $K$. Then
\[ \frac{1}{K}\frac{dN}{dt} = \frac{r}{K}N\left(\frac{K - N}{K}\right). \]
We now group terms conveniently, forming
\[ \frac{d\left(\frac{N}{K}\right)}{dt} = r\left(\frac{N}{K}\right)\left(1 - \frac{N}{K}\right). \]
Replacing $(N/K)$ by $y$ in each case, we obtain the scaled equation and initial condition
given by
\[ \frac{dy}{dt} = ry(1 - y), \qquad y(0) = y_0. \qquad (9.16) \]
Now the variable $y(t)$ measures population size in “units” of the carrying capacity, and
$y_0 = N_0/K$ is the scaled initial population level. Here again is an initial value problem.
Like Eqn. (9.13), but unlike Eqn. (9.1), the logistic differential equation is nonlinear. That
is, the variable $y$ appears in a nonlinear expression (in fact a quadratic) in the equation.
9.6.3 Separation of variables
Here we will solve Eqn. (9.16) by separation of variables. The idea is essentially the same
as our previous examples, but is somewhat more involved. To show an alternative method
of handling the integration, we will treat both sides as indeﬁnite integrals. Separating the
variables leads to
\[ \frac{1}{y(1 - y)}\,dy = r\,dt, \]
\[ \int \frac{1}{y(1 - y)}\,dy = \int r\,dt + K. \]
The integral on the right will lead to $rt + K$, where $K$ is some constant of integration (not to be
confused with the carrying capacity, which no longer appears in the scaled equation) that
we need to incorporate since we do not have endpoints on our integrals. But we must work
harder to evaluate the integral on the left. We can do so by partial fractions, the technique
described in Section 6.6. Details are given in Section 9.6.4.
9.6.4 Application of partial fractions
Let
\[ I = \int \frac{1}{y(1 - y)}\,dy. \]
Then for some constants $A$, $B$ we can write
\[ I = \int \left(\frac{A}{y} + \frac{B}{1 - y}\right) dy = A\ln|y| - B\ln|1 - y|. \]
(The minus sign in front of $B$ stems from the fact that letting $u = 1 - y$ would lead to
$du = -dy$.) We can find $A$, $B$ from the fact that
\[ \frac{A}{y} + \frac{B}{1 - y} = \frac{1}{y(1 - y)}, \]
so that
\[ A(1 - y) + By = 1. \]
This must be true for all $y$, and in particular, substituting in $y = 0$ and $y = 1$ leads to
$A = 1$, $B = 1$ so that
\[ I = \ln|y| - \ln|1 - y| = \ln\left|\frac{y}{1 - y}\right|. \]
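As a check on the partial fractions result, we can differentiate the antiderivative and confirm that the original integrand is recovered. The short calculation below assumes $0 < y < 1$, so that the absolute values can be dropped:

```latex
\[
\frac{d}{dy}\Bigl[\ln y - \ln(1 - y)\Bigr]
  = \frac{1}{y} + \frac{1}{1 - y}
  = \frac{(1 - y) + y}{y(1 - y)}
  = \frac{1}{y(1 - y)}.
\]
```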
9.6.5 The solution of the logistic equation
We now have to extract the quantity $y$ from the equation
\[ \ln\left|\frac{y}{1 - y}\right| = rt + K. \]
That is, we want $y$ as a function of $t$. After exponentiating both sides we need to remove
the absolute value. We will now assume that $y$ is initially smaller than 1, and show that it
remains so. In that case, everything inside the absolute value is positive, and we can write
\[ \frac{y(t)}{1 - y(t)} = e^{rt + K} = e^{K}e^{rt} = Ce^{rt}. \]
In the above step, we have simply renamed the constant $e^{K}$ by the new name $C$ for simplicity.
$C > 0$ is now also an arbitrary constant whose value will be determined from the
initial conditions. Indeed, if we substitute $t = 0$ into the most recent equation, we find that
\[ \frac{y(0)}{1 - y(0)} = Ce^{0} = C, \]
so that
\[ C = \frac{y_0}{1 - y_0}. \]
We will use this fact shortly. What remains now is some algebra to isolate the desired
function $y(t)$:
\[ y(t) = (1 - y(t))\,Ce^{rt}, \]
\[ y(t)\left(1 + Ce^{rt}\right) = Ce^{rt}, \]
\[ y(t) = \frac{Ce^{rt}}{1 + Ce^{rt}} = \frac{1}{(1/C)e^{-rt} + 1}. \]
The desired function is now expressed in terms of the time $t$, and the constants $r$, $C$. We
can also express it in terms of the initial value of $y$, i.e. $y_0$, by using what we know to be
true about the constant $C$, i.e. $C = y_0/(1 - y_0)$. When we do so, we arrive at
\[ y(t) = \frac{1}{\frac{1 - y_0}{y_0}e^{-rt} + 1} = \frac{y_0}{y_0 + (1 - y_0)e^{-rt}}. \qquad (9.17) \]
Some typical solution curves of the logistic equation are shown in Fig. 9.5.
[Figure 9.5: plot titled “Solutions to Logistic equation”, showing y(t) (vertical axis, 0 to 1) versus time t (horizontal axis, 0 to 30).]
Figure 9.5. Solution curves for $y(t)$ in the scaled form of the logistic equation
based on (9.18). We show the predicted behaviour of $y(t)$ as given by Eqn. (9.17) for three
different initial conditions, $y_0 = 0.1, 0.25, 0.5$. Note that all solutions approach the value
$y = 1$.
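Solution curves of this kind can be reproduced numerically. The sketch below evaluates the formula $y(t) = y_0/(y_0 + (1 - y_0)e^{-rt})$ and compares it with an Euler approximation of the scaled logistic equation (9.16). The growth rate r = 0.5 is an assumed value (the text does not state which r was used for the figure), and the function names and step size are likewise hypothetical.

```python
import math

def logistic_scaled(t, y0, r):
    """Analytic solution y(t) = y0 / (y0 + (1 - y0) * exp(-r t))."""
    return y0 / (y0 + (1.0 - y0) * math.exp(-r * t))

def logistic_euler(t_end, y0, r, dt=0.001):
    """Euler approximation of dy/dt = r*y*(1 - y), y(0) = y0, at time t_end."""
    steps = int(round(t_end / dt))
    y = y0
    for _ in range(steps):
        y += r * y * (1.0 - y) * dt
    return y

r = 0.5  # assumed growth rate, for illustration only
for y0 in (0.1, 0.25, 0.5):
    row = [logistic_scaled(t, y0, r) for t in (0.0, 10.0, 30.0)]
    print(y0, [round(v, 4) for v in row], round(logistic_euler(10.0, y0, r), 4))
```

Each row starts at its initial value y0 and climbs towards 1, in line with the behaviour described in the caption.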
9.6.6 What this solution tells us
We have arrived at the function that describes the scaled population as a function of time
as predicted by the scaled logistic equation, (9.16). The level of population (in units of the
carrying capacity $K$) follows the time-dependent function
\[ y(t) = \frac{y_0}{y_0 + (1 - y_0)e^{-rt}}. \qquad (9.18) \]
We can convert this result to an equivalent expression for the unscaled total population $N(t)$
by recalling that $y(t) = N(t)/K$. Substituting this for $y(t)$, noting that $y_0 = N_0/K$, and
multiplying through by $K$ leads to
\[ N(t) = \frac{K N_0}{N_0 + (K - N_0)e^{-rt}}. \qquad (9.19) \]
It is left as an exercise for the reader to check this claim.
Now recall that $r > 0$. This means that $e^{-rt}$ is a decreasing function of time. Therefore,
(9.18) implies that, after a long time, the term $e^{-rt}$ in the denominator will be negligibly
small, and so
\[ y(t) \to \frac{y_0}{y_0} = 1, \]
so that $y$ will approach the value 1. This means that
\[ (N/K) \to 1 \quad \text{or simply} \quad N(t) \to K. \]
The population will thus settle into a constant level, i.e., a steady state, at which no further
change will occur.
As an aside, we observe that this too could have been predicted directly from the
differential equation. By setting $dy/dt = 0$, we find that
\[ 0 = ry(1 - y), \]
which suggests that $y = 1$ is a steady state. (This is also true for the less interesting case
of no population, i.e. $y = 0$ is also a steady state.) Similarly, this could have been found
by setting the derivative to zero in Eqn. (9.15), the original, unscaled logistic differential
equation. Doing so leads to
\[ \frac{dN}{dt} = 0 \quad\Rightarrow\quad rN\left(\frac{K - N}{K}\right) = 0. \]
If $r > 0$, the only values of $N$ satisfying this steady state equation are $N = 0$ or $N = K$.
This implies that both $N = 0$ and $N = K$ are steady states. The former is not too
interesting. It states the obvious fact that if there is no population, then there can be no
population growth. The latter reflects that $N = K$, the carrying capacity, is the population
size that will be sustained by the environment.
In summary, we have shown that the behaviour of the logistic equation for population
growth is more realistic than the simpler exponential growth we studied earlier. We saw
in Figure 9.5, that a small population will grow, but only up to some constant level (the
carrying capacity). Integration, and in particular the use of partial fractions allowed us to
make a full prediction of the behaviour of the population level as a function of time, given
by Eqn. (9.19).
9.7 Extensions and other population models: the “Law of Mortality”
There are many variants of the logistic model that are used to investigate the growth or
mortality of a population. Here we extend our tools to another example, the gradual decline of
a group of individuals born at the same time. Such a group is called a “cohort”$^{55}$. In 1825,
Gompertz suggested that the rate of mortality, $m$, would depend on the age of the individuals.
Because we consider a group of people who were born at the same time, we can trade
“age” for “time”. Essentially, Gompertz assumed that mortality is not constant: it is low
at first, and increases as individuals age. Gompertz argued that mortality increases exponentially.
This turns out to be equivalent to the assumption that the logarithm of mortality
increases linearly with time$^{56}$. It is easy to see that these two statements are equivalent:
Suppose we assume that for some constants $A > 0$, $\mu > 0$,
\[ \ln(m(t)) = A + \mu t. \qquad (9.20) \]
Then Eqn. (9.20) means that
\[ m(t) = e^{A + \mu t} = e^{A}e^{\mu t}. \]
[Figure 9.6: sketch of log mortality, $\ln(m)$, versus age $t$, showing a straight line with intercept $A$ and slope $\mu$.]
Figure 9.6. In the Gompertz Law of Mortality, it is assumed that the log of mortality
increases linearly with time, as depicted by Eqn. (9.20) and by the solid curve in this
diagram. Here the slope of $\ln(m)$ versus time (or age) is $\mu$. For real populations, the
mortality looks more like the dashed curve.
Since $A$ is constant, so is $e^{A}$. For simplicity, let us define $m_0 = e^{A}$. ($m_0 = m(0)$
is the so-called “birth mortality”, i.e. the value of $m$ at age 0.) Thus, the time-dependent
mortality is
\[ m = m(t) = m_0 e^{\mu t}. \qquad (9.21) \]
9.7.1 Aging and Survival curves for a cohort:
We now study a population model having Gompertz mortality, together with the following
additional assumptions.
Footnote 55: This section was formulated with help from Lu Fan.
Footnote 56: In actual fact, this is likely true for some range of ages. Infant mortality is generally higher than mortality
for young children, whereas mortality levels off or even decreases slightly for those oldest old who have survived
past the average lifespan.
1. All individuals are assumed to be identical.
2. There is “natural” mortality, but no other type of removal. This means we ignore the
mortality caused by epidemics, by violence and by wars.
3. We consider a single cohort, and assume that no new individuals are introduced (e.g.
by immigration)$^{57}$.
We will now study the size of a “cohort”, i.e. a group of people who were born in the same
year. We will denote by $N(t)$ the number of people in this group who are alive at time $t$,
where $t$ is time since birth, i.e. age. Let $N(0) = N_0$ be the initial number of individuals in
the cohort.
9.7.2 Gompertz Model
All the people in the cohort were born at time (age) $t = 0$, and there were $N_0$ of them at
that time. That number changes with time due to mortality. Indeed,
The rate of change of cohort size = −[number of deaths per unit time]
= −[mortality rate] · [cohort size]
Translating to mathematical notation, we arrive at the differential equation
\[ \frac{dN(t)}{dt} = -m(t)N(t), \]
and using information about the size of the cohort at birth leads to the initial condition,
$N(0) = N_0$. Together, this leads to the initial value problem
\[ \frac{dN(t)}{dt} = -m(t)N(t), \qquad N(0) = N_0. \]
Note the similarity to Eqn. (9.1), but now mortality is time-dependent.
In the Problem set, we apply separation of variables and integrate over the time
interval $[0, T]$ to show that the remaining population at age $t$ is
\[ N(t) = N_0\, e^{-\frac{m_0}{\mu}\left(e^{\mu t} - 1\right)}. \]
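The survival formula can be checked numerically against the initial value problem it solves. The following is a rough sketch under stated assumptions: the parameter values (a cohort of 1000, birth mortality 0.001 per year, and mortality growth rate 0.08 per year) are hypothetical and chosen only for illustration, as are the function names.

```python
import math

def gompertz_survivors(t, n0, m0, mu):
    """N(t) = N0 * exp(-(m0/mu) * (exp(mu*t) - 1))."""
    return n0 * math.exp(-(m0 / mu) * (math.exp(mu * t) - 1.0))

def gompertz_euler(t_end, n0, m0, mu, dt=0.001):
    """Euler approximation of dN/dt = -m(t) N with m(t) = m0 * exp(mu*t), N(0) = n0."""
    steps = int(round(t_end / dt))
    n = n0
    for i in range(steps):
        t = i * dt
        n -= m0 * math.exp(mu * t) * n * dt
    return n

# Hypothetical parameters, chosen only to illustrate the shape of the curve.
n0, m0, mu = 1000.0, 0.001, 0.08
for age in (0, 25, 50, 75, 100):
    print(age, round(gompertz_survivors(age, n0, m0, mu), 1))
print(round(gompertz_euler(50.0, n0, m0, mu), 1))
```

The printed values fall slowly at young ages and much more steeply at older ages, which is the signature of exponentially increasing mortality.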
9.8 Summary
In this chapter, we used integration methods to ﬁnd the analytical solutions to a variety of
differential equations where initial values were prescribed.
We investigated a number of population growth models:
1. Exponential growth, given by $\frac{dy}{dt} = ky$, with initial population level $y(0) = y_0$,
was investigated (Eqn. (9.1)). This model had an unrealistic feature that growth is
unlimited.
Footnote 57: Note that new births would contribute to other cohorts.
2. The Logistic equation $\frac{dN}{dt} = rN\left(\frac{K - N}{K}\right)$ was analyzed (Eqn. (9.15)), showing that
density-dependent growth can correct for the above unrealistic feature.
3. The Gompertz equation, $\frac{dN(t)}{dt} = -m(t)N(t)$, was solved to understand how
age-dependent mortality affects a cohort of individuals.
In each of these cases, we used separation of variables to “integrate” the differential
equation, and predict the population as a function of time.
We also investigated several other physical models in this chapter, including the
velocity of a falling object subject to drag force. This led us to study a differential equation
of the form $\frac{dy}{dt} = a - by$. By slight reinterpretation of terms in this equation, we can use
results to understand chemical kinetics and blood alcohol levels, as well as a host of other
scientiﬁc applications.
Section 9.5, the “centerpiece” of this chapter, illustrated the detailed steps that go into
the formulation of a differential equation model for ﬂow of liquid out of a container. Here
we saw how conservation statements and simplifying assumptions are interpreted together,
to arrive at a differential equation model. Such ideas occur in many scientiﬁc problems, in
chemistry, physics, and biology.
Chapter 10
Infinite series, improper integrals, and Taylor series
10.1 Introduction
This chapter has several important and challenging goals. The first of these is to understand
how concepts that were discussed for finite series and integrals can be meaningfully
extended to infinite series and improper integrals, i.e. integrals of functions over an infinite
domain. In this part of the discussion, we will find that the notion of convergence and
divergence will be important.
A second theme will be that of approximation of functions in terms of power series,
also called Taylor series. Such series can be described informally as inﬁnite polynomials
(i.e. polynomials containing inﬁnitely many terms). Understanding when these objects are
meaningful is also related to the issue of convergence, so we use the background assembled
in the ﬁrst part of the chapter to address such concepts arising in the second part.
[Figure 10.1: sketch of the curve y = f(x) near the point x_0, together with its linear approximation (LA) and a higher order approximation (HOA).]
Figure 10.1. The function $y = f(x)$ (solid heavy curve) is shown together with its
linear approximation (LA, dashed line) at the point $x_0$, and a better “higher order” approximation
(HOA, thin solid curve). Notice that this better approximation stays closer to the
graph of the function near $x_0$. In this chapter, we discuss how such better approximations
can be obtained.
The theme of approximation has appeared often in our calculus course. In a previous
semester, we discussed a linear approximation to a function. The idea was to approximate
the value of the function close to a point on its graph using a straight line (the tangent line).
We noted in doing so that the approximation was good only close to the point of tangency.
Further away, the graph of the function curves away from that straight line. This leads
naturally to the question: can we do better in making this approximation if we include
other terms to describe this “curving away”? Here we extend such linear approximation
methods. Our goal is to increase the accuracy of the linear approximation by including
higher order terms (quadratic, cubic, etc), i.e. to ﬁnd a polynomial that approximates the
given function. This idea forms an important goal in this chapter.
We ﬁrst review the idea of series introduced in Chapter 1.
10.2 Convergence and divergence of series
Recall the geometric series discussed in Section 1.6.
The sum of a finite geometric series,
\[ S_n = 1 + r + r^2 + \ldots + r^n = \sum_{k=0}^{n} r^k, \quad \text{is} \quad S_n = \frac{1 - r^{n+1}}{1 - r}. \qquad (10.1) \]
We also review definitions discussed in Section 1.7.
Deﬁnition: Convergence of inﬁnite series
An inﬁnite series that has a ﬁnite sum is said to be convergent. Otherwise it is divergent.
Deﬁnition: Partial sums and convergence
Suppose that $S$ is an (infinite) series whose terms are $a_k$. Then the partial sums, $S_n$, of
this series are
\[ S_n = \sum_{k=0}^{n} a_k. \]
We say that the sum of the infinite series is $S$, and write
\[ S = \sum_{k=0}^{\infty} a_k, \quad \text{provided that} \quad S = \lim_{n\to\infty} S_n = \lim_{n\to\infty} \sum_{k=0}^{n} a_k. \]
That is, we consider the infinite series as the limit of partial sums $S_n$ as the number of
terms $n$ is increased. If this limit exists, we say that the infinite series converges$^{58}$ to $S$.
This leads to the following conclusion:
Footnote 58: If the limit does not exist, we say that the series diverges.
The sum of an infinite geometric series,
\[ S = 1 + r + r^2 + \ldots + r^k + \ldots = \sum_{k=0}^{\infty} r^k = \frac{1}{1 - r}, \quad \text{provided } |r| < 1. \qquad (10.2) \]
If this inequality is not satisﬁed, then we say that this sum does not exist (meaning that it is
not ﬁnite).
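The two behaviours of the geometric series can be seen by computing partial sums directly. This is a small illustrative sketch; the function names and the sample ratios r = 0.5 and r = 2 are choices made here, not taken from the text.

```python
def geometric_partial_sum(r, n):
    """S_n = 1 + r + r^2 + ... + r^n, added up term by term."""
    total, term = 0.0, 1.0
    for _ in range(n + 1):
        total += term
        term *= r
    return total

def geometric_closed_form(r, n):
    """Closed form (1 - r^(n+1)) / (1 - r), valid for r != 1."""
    return (1.0 - r ** (n + 1)) / (1.0 - r)

for n in (5, 10, 20, 50):
    # r = 0.5 satisfies |r| < 1: the partial sums approach 1/(1 - r) = 2.
    # r = 2 does not: the partial sums grow without bound.
    print(n, geometric_partial_sum(0.5, n), geometric_partial_sum(2.0, n))
```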
It is important to remember that an inﬁnite series, i.e. a sum with inﬁnitely many
terms added up, can exhibit either one of these two very different behaviours. It may
converge in some cases, as the ﬁrst example shows, or diverge (fail to converge) in other
cases. We will see examples of each of these trends again. It is essential to be able to
distinguish the two. Divergent series (or series that diverge under certain conditions) must
be handled with particular care, for otherwise, we may ﬁnd contradictions or “seemingly
reasonable” calculations that have meaningless results.
We can think of convergence or divergence of series using a geometric analogy. If we
start on the number line at the origin and take a sequence of steps $\{a_0, a_1, a_2, \ldots, a_k, \ldots\}$,
we can think of $S = \sum_{k=0}^{\infty} a_k$ as the total distance we have travelled. $S$ converges if that
distance remains finite and if we approach some fixed number.
[Figure 10.2: schematic with two panels, labelled “convergence” and “divergence”.]
Figure 10.2. An informal schematic illustrating the concept of convergence and
divergence of inﬁnite series. Here we show only a few terms of the inﬁnite series: from
left to right, each step is a term in the series. In the top example, the sum of the steps gets
closer and closer to some (ﬁnite) value. In the bottom example, the steps lead to an ever
increasing total sum.
In order for the sum of ‘inﬁnitely many things’ to add up to a ﬁnite number, the
terms have to get smaller. But just getting smaller is not, in itself, enough to guarantee
convergence. (We will show this later on by considering the harmonic series.) There are
rigorous mathematical tests which help determine whether a series converges or not. We
discuss some of these tests in Appendix 11.9.
10.3 Improper integrals
We will see that there is a close connection between certain inﬁnite series and improper
integrals, i.e. integrals over an inﬁnite domain. We have already encountered an example of
an improper integral in Section 3.8.5 and in the context of radioactive decay in Section 8.4.
Recall the following deﬁnition:
Deﬁnition
An improper integral is an integral performed over an infinite domain, e.g.
\[ \int_a^{\infty} f(x)\,dx. \]
The value of such an integral is understood to be a limit, as given in the following definition:
\[ \int_a^{\infty} f(x)\,dx = \lim_{b\to\infty} \int_a^{b} f(x)\,dx. \]
i.e. we evaluate an improper integral by ﬁrst computing a deﬁnite integral over a ﬁnite
domain a ≤ x ≤ b, and then taking a limit as the endpoint b moves off to larger and larger
values. The deﬁnite integral can be interpreted as an area under the graph of the function.
The essential question being addressed here is whether that area remains bounded when we
include the “inﬁnite tail” of the function (i.e. as the endpoint b moves to larger values.) For
some functions (whose values get small enough fast enough) the answer is “yes”.
Deﬁnition
When the limit shown above exists, we say that the improper integral converges. Otherwise
we say that the improper integral diverges.
With this in mind, we compute a number of classic integrals:
10.3.1 Example: A decaying exponential: convergent improper integral
Here we recall that the improper integral of a decaying exponential converges. (We have
seen this earlier, in Section 3.8.5, and again in applications in Sections 4.5 and 8.4.1. Here
we recap this important result in the context of our discussion of improper integrals.) Suppose
that $r > 0$ and let
\[ I = \int_0^{\infty} e^{-rt}\,dt \equiv \lim_{b\to\infty} \int_0^{b} e^{-rt}\,dt. \]
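Then note that $b > 0$ so that
\[ I = \lim_{b\to\infty} \left. -\frac{1}{r}e^{-rt} \right|_0^b = -\frac{1}{r}\lim_{b\to\infty}\left(e^{-rb} - e^{0}\right) = -\frac{1}{r}\Bigl(\underbrace{\lim_{b\to\infty} e^{-rb}}_{0} - 1\Bigr) = \frac{1}{r}. \]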
We have used the fact that $\lim_{b\to\infty} e^{-rb} = 0$ since (for $r, b > 0$) the exponential function
$e^{-rb}$ decays to zero with increasing $b$. Thus the limit exists (is finite) and the integral converges.
In fact it converges to the value $I = 1/r$.
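The limiting process can be made concrete by evaluating the definite integral over $[0, b]$ for larger and larger $b$. The sketch below uses the antiderivative computed above; the rate r = 2 and the function name are assumptions made for illustration.

```python
import math

def truncated_exponential_integral(r, b):
    """Integral of exp(-r t) from t = 0 to t = b, for r > 0: (1 - exp(-r b)) / r."""
    return (1.0 - math.exp(-r * b)) / r

r = 2.0  # assumed decay rate
for b in (1.0, 10.0, 100.0):
    # The values increase with b and level off at 1/r.
    print(b, truncated_exponential_integral(r, b), 1.0 / r)
```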
[Figure 10.3: graphs of y = 1/x and y = 1/x^2 for x ≥ 1, with a heavy arrow along the x axis.]
Figure 10.3. In Sections 10.3.2 and 10.3.3, we consider two functions whose
values decrease along the x axis, $f(x) = 1/x$ and $f(x) = 1/x^2$. We show that one, but not
the other, encloses a finite (bounded) area over the interval $(1, \infty)$. To do so, we compute
an improper integral for each one. The heavy arrow is meant to remind us that we are
considering areas over an unbounded domain.
10.3.2 Example: The improper integral of 1/x diverges
We now consider a classic and counter-intuitive result, and one of the most important results
in this chapter. Consider the function
\[ y = f(x) = \frac{1}{x}. \]
Examining the graph of this function for positive $x$, e.g. for the interval $(1, \infty)$, we know
that values decrease to zero as $x$ increases$^{59}$. The function is not only itself bounded, but
also falls to arbitrarily small values as $x$ increases. Nevertheless, this is not enough to
guarantee that the enclosed area remains finite! We show this in the following calculation.
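\[ I = \int_1^{\infty} \frac{1}{x}\,dx = \lim_{b\to\infty} \int_1^{b} \frac{1}{x}\,dx = \lim_{b\to\infty} \ln(x)\Big|_1^b = \lim_{b\to\infty} \left(\ln(b) - \ln(1)\right), \]
\[ I = \lim_{b\to\infty} \ln(b) = \infty. \]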
The fact that we get an infinite value for this integral follows from the observation that
$\ln(b)$ increases without bound as $b$ increases, that is, the limit does not exist (is not finite).
Thus the area under the curve $f(x) = 1/x$ over the interval $1 \le x < \infty$ is infinite. We say
that the improper integral of $1/x$ diverges (or does not converge). We will use this result
again in Section 10.4.1.
Footnote 59: We do not choose the interval $(0, \infty)$ because this function is undefined at $x = 0$. We want here to emphasize
the behaviour at infinity, not the blow up that occurs close to $x = 0$.
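10.3.3 Example: The improper integral of 1/x^2 converges
Now consider the related function
\[ y = f(x) = \frac{1}{x^2}, \quad \text{and the corresponding integral} \quad I = \int_1^{\infty} \frac{1}{x^2}\,dx. \]
Then
\[ I = \lim_{b\to\infty} \int_1^{b} x^{-2}\,dx = \lim_{b\to\infty} \left.\left(-x^{-1}\right)\right|_1^b = -\lim_{b\to\infty}\left(\frac{1}{b} - 1\right) = 1. \]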
Thus, the limit exists, and, in fact, $I = 1$, so, in contrast to Example 10.3.2, this integral
converges.
We observe that the behaviours of the improper integrals of the functions $1/x$ and
$1/x^2$ are very different. The former diverges, while the latter converges. The only difference
between these functions is the power of $x$. As shown in Figure 10.3, that power affects
how rapidly the graph “falls off” to zero as $x$ increases. The function $1/x^2$ decreases much
faster than $1/x$. (Consequently $1/x^2$ has a sufficiently “slim” infinite “tail” that the area
under its graph does not become infinite, which is not an easy concept to digest!) This observation
leads us to wonder what power $p$ is needed to make the improper integral of a function
$1/x^p$ converge. We answer this question below.
10.3.4 When does the integral of 1/x^p converge?
Here we consider an arbitrary power, $p$, that can be any real number. We ask when the
corresponding improper integral converges or diverges. Let
\[ I = \int_1^{\infty} \frac{1}{x^p}\,dx. \]
For $p = 1$ we have already established that this integral diverges (Section 10.3.2), and for
$p = 2$ we have seen that it is convergent (Section 10.3.3). By a similar calculation (for $p \ne 1$), we find
that
\[ I = \lim_{b\to\infty} \left.\frac{x^{1-p}}{(1 - p)}\right|_1^b = \lim_{b\to\infty} \frac{1}{1 - p}\left(b^{1-p} - 1\right). \]
Thus this integral converges provided that the term $b^{1-p}$ does not “blow up” as $b$ increases.
For this to be true, we require that the exponent $(1 - p)$ should be negative, i.e. $1 - p < 0$
or $p > 1$. In this case, we have
\[ I = \frac{1}{p - 1}. \]
To summarize our result,
\[ \int_1^{\infty} \frac{1}{x^p}\,dx \quad \text{converges if } p > 1, \quad \text{diverges if } p \le 1. \]
Examples: integrals of $1/x^p$ that do or do not converge
1. The integral $\int_1^{\infty} \frac{1}{\sqrt{x}}\,dx$ diverges.
We see this from the following argument: $\sqrt{x} = x^{1/2}$, so $p = \frac{1}{2} < 1$. Thus, by the
general result, this integral diverges.
2. The integral $\int_1^{\infty} x^{-1.01}\,dx$ converges.
Here $p = 1.01 > 1$, so the result implies convergence of the integral.
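To get a feel for how the power $p$ controls convergence, one can tabulate the integral over $[1, b]$ for increasing $b$. The sketch below simply evaluates the antiderivatives found above (the logarithm for $p = 1$, the power formula otherwise); the function name and the sample values of $b$ are choices made here for illustration.

```python
import math

def truncated_p_integral(p, b):
    """Integral of x**(-p) from x = 1 to x = b, for b >= 1."""
    if p == 1.0:
        return math.log(b)
    return (b ** (1.0 - p) - 1.0) / (1.0 - p)

for p in (0.5, 1.0, 1.01, 2.0):
    values = [truncated_p_integral(p, b) for b in (1e2, 1e4, 1e8)]
    print(p, [round(v, 4) for v in values])
```

For p = 2 the values settle quickly near 1/(p - 1) = 1. For p = 1.01 they stay below 1/(p - 1) = 100 but approach it very slowly, which shows that convergence can be hard to recognize from a table of numbers alone.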
10.3.5 Integral comparison tests
The integrals discussed above can be used to make comparisons that help us to identify
when other improper integrals converge or diverge$^{60}$. The following important result establishes
how these comparisons work:
Suppose we are given two functions, $f(x)$ and $g(x)$, both continuous on some infinite
interval $[a, \infty)$. Suppose, moreover, that, at all points on this interval, the first function is
smaller than the second, i.e.
\[ 0 \le f(x) \le g(x). \]
Then the following conclusions$^{a}$ can be made:
1. $\int_a^{\infty} f(x)\,dx \le \int_a^{\infty} g(x)\,dx$. (This means that the area under $f(x)$ is smaller than
the area under $g(x)$.)
2. If $\int_a^{\infty} g(x)\,dx$ converges, then $\int_a^{\infty} f(x)\,dx$ converges. (If the larger area is finite,
so is the smaller one.)
3. If $\int_a^{\infty} f(x)\,dx$ diverges, then $\int_a^{\infty} g(x)\,dx$ diverges. (If the smaller area is infinite,
so is the larger one.)
Footnote a: These statements have to be carefully noted. What is assumed and what is concluded works “one way”. That
is, the order “if..then” is important. Reversing that order leads to a common error.
Footnote 60: The reader should notice the similarity of these ideas to the comparisons made for infinite series in the
Appendix 11.9.2. This similarity stems from the fact that there is a close connection between series and integrals,
a recurring theme in this course.
Example: comparison of improper integrals
We can determine that the integral
\[ \int_1^{\infty} \frac{x}{1 + x^3}\,dx \quad \text{converges} \]
by noting that for all $x > 0$
\[ 0 \le \frac{x}{1 + x^3} \le \frac{x}{x^3} = \frac{1}{x^2}. \]
Thus
\[ \int_1^{\infty} \frac{x}{1 + x^3}\,dx \le \int_1^{\infty} \frac{1}{x^2}\,dx. \]
Since the larger integral on the right is known to converge, so does the smaller integral on
the left.
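The comparison in this example can be illustrated numerically by approximating both integrals over $[1, b]$ with a simple midpoint rule. This is only a sketch: the midpoint rule, the number of subintervals, and the function names are choices made here and are not part of the argument in the text, which does not depend on any numerical estimate.

```python
def midpoint_integral(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b] with n subintervals."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def smaller(x):
    """The integrand x / (1 + x^3)."""
    return x / (1.0 + x ** 3)

def larger(x):
    """The comparison function 1 / x^2."""
    return 1.0 / x ** 2

for b in (10.0, 100.0, 1000.0):
    small_area = midpoint_integral(smaller, 1.0, b)
    large_area = midpoint_integral(larger, 1.0, b)
    # The smaller area stays below the larger one, which itself stays below 1.
    print(b, round(small_area, 5), round(large_area, 5))
```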
10.4 Comparing integrals and series
The convergence of inﬁnite series was discussed earlier, in Section 1.7 and here again in
Section 10.2. Many tests for convergence are provided in the Appendix 11.9, and will not
be discussed in detail due to time and space constraints. However, an interesting connection
exists between convergence of series and integrals. This is the topic we examine here.
Convergence of series and convergence of integrals are related. We can use the
convergence/divergence of an integral/series to determine the behaviour of the other. Here we
give an example of this type by establishing a connection between the harmonic series and
a divergent improper integral.
10.4.1 The harmonic series
The harmonic series is a sum of terms of the form $1/k$ where $k = 1, 2, \ldots$. At first appearance,
this series might seem to have the desired qualities of a convergent series, simply
because the successive terms being added are getting smaller and smaller, but this appearance
is deceptive and actually wrong$^{61}$.
The harmonic series
\[ \sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots + \frac{1}{k} + \ldots \quad \text{diverges.} \]
We establish that the harmonic series diverges by comparing it to the improper integral of
the related function$^{62}$
\[ y = f(x) = \frac{1}{x}. \]
Footnote 61: We have already noticed a similar surprise in connection with the improper integral of $1/x$. These two
“surprises” are closely related, as we show here using a comparison of the series and the integral.
Footnote 62: This function is “related” since for integer values of $x$, the function takes on values that are the same as
successive terms in the series, i.e. if $x = k$ is an integer, then $f(x) = f(k) = 1/k$.
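[Figure 10.4: plot titled “The harmonic series”, showing a staircase of unit-width steps drawn above the graph of the function y = 1/x, for x from 0 to 11 and y from 0 to 1.]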
Figure 10.4. The harmonic series is a sum that corresponds to the area under the
staircase shown above. Note that we have purposely shown the stairs arranged so that they
are higher than the function. This is essential in drawing the conclusion that the sum of the
series is inﬁnite: It is larger than an area under 1/x that we already know to be inﬁnite, by
Section 10.3.2.
In Figure 10.4 we show on one graph a comparison of the area under this curve, and a
staircase area representing the first few terms in the harmonic series. For the area of the
staircase, we note that the width of each step is 1, and the heights form the sequence
\[ \left\{1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\right\}. \]
Thus the area of (infinitely many of) these steps can be expressed as the (infinite) harmonic
series,
\[ A = 1 \cdot 1 + 1 \cdot \frac{1}{2} + 1 \cdot \frac{1}{3} + 1 \cdot \frac{1}{4} + \ldots = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots = \sum_{k=1}^{\infty} \frac{1}{k}. \]
On the other hand, the area under the graph of the function $y = f(x) = 1/x$ for $1 \le x < \infty$
is given by the improper integral
\[ \int_1^{\infty} \frac{1}{x}\,dx. \]
We have seen previously in Section 10.3.2 that this integral diverges!
From Figure 10.4 we see that the areas under the function, $A_f$, and under the staircase,
$A_s$, satisfy
\[ 0 < A_f < A_s. \]
Thus, since the smaller of the two (the improper integral) is infinite, so is the larger (the
sum of the harmonic series). We have established, using this comparison, that the sum
of the harmonic series cannot be finite, so that this series diverges.
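The staircase comparison can be checked for a finite number of steps. The first $n$ steps of the staircase cover the interval from $x = 1$ to $x = n + 1$ and lie above the curve $1/x$, so the partial sum $1 + 1/2 + \ldots + 1/n$ should exceed $\ln(n + 1)$, the area under the curve over that interval. The sketch below (with a hypothetical function name) tabulates both quantities.

```python
import math

def harmonic_partial_sum(n):
    """1 + 1/2 + ... + 1/n: the area of the first n unit-width steps of the staircase."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 1000, 100000):
    # Area under 1/x from x = 1 to x = n + 1 is ln(n + 1); the staircase lies above it.
    print(n, round(harmonic_partial_sum(n), 4), round(math.log(n + 1), 4))
```

Both columns keep growing as n increases, although very slowly, which is why the divergence of the harmonic series is easy to miss.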
Other comparisons: The “p” series
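More generally, we can compare series of the form
\[ \sum_{k=1}^{\infty} \frac{1}{k^p} \quad \text{to the integral} \quad \int_1^{\infty} \frac{1}{x^p}\,dx \]
in precisely the same way. This leads to the conclusion that
The “p” series,
\[ \sum_{k=1}^{\infty} \frac{1}{k^p} \quad \text{converges if } p > 1, \quad \text{diverges if } p \le 1. \]
For example, the series
\[ \sum_{k=1}^{\infty} \frac{1}{k^2} = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \ldots \]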
converges, since p = 2 > 1. Notice, however, that the comparison does not give us a value
to which the sum converges. It merely indicates that the series does converge.
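Partial sums give a numerical feel for this contrast. The sketch below adds up the first $n$ terms of a p-series; the function name is hypothetical. For $p = 2$ the comment relies on a staircase drawn below the curve (a variation on Figure 10.4 that is assumed here, not shown in the text), which bounds the partial sums by $1 + \int_1^{\infty} x^{-2}\,dx = 2$.

```python
def p_series_partial_sum(p, n):
    """Sum of 1/k**p for k = 1, ..., n."""
    return sum(1.0 / k ** p for k in range(1, n + 1))

for n in (10, 100, 10000):
    # p = 2: partial sums increase but stay below 2.
    # p = 1: partial sums of the harmonic series keep growing.
    print(n, round(p_series_partial_sum(2, n), 5), round(p_series_partial_sum(1, n), 5))
```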
10.5 From geometric series to Taylor polynomials
In studying calculus, we explored a variety of functions. Among the most basic are polynomials,
i.e. functions such as
\[ p(x) = x^5 + 2x^2 + 3x + 2. \]
Our introduction to differential calculus started with such functions for a reason: these
functions are convenient and simple to handle. We found long ago that it is easy to compute
derivatives of polynominals. The same can be said for integrals. One of our ﬁrst examples,
in Section 3.6.1 was the integral of a polynomial. We needed only use a power rule to
integrate each term. An additional convenience of polynomials is that “evaluating” the
function, (i.e. plugging in an x value and determining the corresponding y value) can be
done by simple multiplications and additions, i.e. by basic operations easily handled by
an ordinary calculator. This is not the case for, say, trigonometric functions, exponential
10.5. From geometric series to Taylor polynomials 209
functions, or for that matter, most other functions we considered
63
. For this reason, being
able to approximate a function by a polynomial is an attractive proposition. This forms our
main concern in the sections that follow.
We can arrive at connections between several functions and their polynomial approximations by exploiting our familiarity with the geometric series. We use both the results for convergence of the geometric series (from Section 10.2) and the formula for the sum of that series to derive a number of interesting (somewhat haphazard) results [64].
Recall from Sections 1.7.1 and 10.2 that the sum of an infinite geometric series is
\[ S = 1 + r + r^2 + \dots + r^k + \dots = \sum_{k=0}^{\infty} r^k = \frac{1}{1-r}, \quad \text{provided } |r| < 1. \qquad (10.3) \]
To connect this result to a statement about a function, we need a “variable”. Let us consider the behaviour of this series when we vary the quantity r. To emphasize that now r is our variable, it will be helpful to change notation by substituting r = x into the above equation, while remembering that the formula in Eqn. (10.3) holds only provided $|r| = |x| < 1$.
10.5.1 Example 1: A simple expansion
Substitute the variable x = r into the series (10.3). Then formally, rewriting the above with this substitution leads to the conclusion that
\[ \frac{1}{1-x} = 1 + x + x^2 + \dots \qquad (10.4) \]
We can think of this result as follows: Let
\[ f(x) = \frac{1}{1-x}. \qquad (10.5) \]
Then for every x in $-1 < x < 1$, it is true that f(x) can be approximated by terms in the polynomial
\[ p(x) = 1 + x + x^2 + \dots \qquad (10.6) \]
In other words, by (10.3), for $|x| < 1$ the two expressions “are the same”, in the sense that the polynomial converges to the value of the function. We refer to this p(x) as an (infinite) Taylor polynomial [65] or simply a Taylor series for the function f(x). The usefulness of this kind of result can be illustrated by a simple example.
Example 10.1 (Using the Taylor Series (10.6) to approximate the function (10.5)) Compute
the value of the function f(x) given by Eqn. (10.5) for x = 0.1 without using a calculator.
[63] For example, to find the decimal value of sin(2.5) we would need a scientific calculator. These days the distinction is blurred, since powerful handheld calculators are ubiquitous. Before such devices were available, the ease of evaluating polynomials made them even more important.
[64] We say “haphazard” here because we are not yet at the point of a systematic procedure for computing a Taylor series. That will be done in Section 10.6. Here we “take what we can get” using simple manipulations of a geometric series.
[65] A Taylor polynomial contains finitely many terms, n, whereas a Taylor series has n → ∞.
Solution: Plugging the value x = 0.1 into the function directly leads to 1/(1 − 0.1) = 1/0.9, whose evaluation with no calculator requires long division [66]. Using the polynomial representation, we have a simpler method:
\[ p(0.1) = 1 + 0.1 + 0.1^2 + \dots = 1 + 0.1 + 0.01 + \dots = 1.11\dots \]
We provide a few other examples based on substitutions of various sorts using the geometric series as a starting point.
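The calculation in Example 10.1 uses nothing beyond multiplication and addition, which the following sketch mimics. We assume Python here, and the function name geometric_partial_sum is our own choice.

```python
def geometric_partial_sum(x, n_terms):
    """Return 1 + x + x^2 + ... using only multiplication and addition."""
    total = 0.0
    power = 1.0  # current power of x, starting with x^0
    for _ in range(n_terms):
        total += power
        power *= x
    return total


if __name__ == "__main__":
    x = 0.1
    for n_terms in (1, 2, 3, 5, 10):
        # Compare the polynomial value with the function 1/(1 - x).
        print(n_terms, geometric_partial_sum(x, n_terms), 1.0 / (1.0 - x))
```

With x = 0.1 each additional term contributes one more decimal digit, so a handful of terms already agrees closely with 1/0.9.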
10.5.2 Example 2: Another substitution
We make the substitution $r = -t$. Then $|r| < 1$ means that $|-t| = |t| < 1$, so that the formula (10.3) for the sum of a geometric series implies that
\[ \frac{1}{1-(-t)} = 1 + (-t) + (-t)^2 + (-t)^3 + \dots \]
\[ \frac{1}{1+t} = 1 - t + t^2 - t^3 + t^4 - \dots, \quad \text{provided } |t| < 1. \]
This means we have produced a series expansion for the function 1/(1 + t). We can go
farther with this example by a new manipulation, whereby we integrate both sides to arrive
at a new function and its expansion, shown next.
10.5.3 Example 3: An expansion for the logarithm
We will use the results of Example 10.5.2, but we follow our substitution by integration.
On the left, we integrate the function f(t) = 1/(1 + t) (to arrive at a logarithm type
integral) and on the right we integrate the power terms of the expansion. We are permitted
to integrate the power series term by term provided that the series converges. This is an
important restriction that we emphasize: Manipulation of inﬁnite series by integration,
differentiation, addition, multiplication, or any other “term by term” computation makes
sense only so long as the original series converges.
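Provided $|x| < 1$ (so that $|t| < 1$ throughout the interval of integration), we have that
\[ \int_0^x \frac{1}{1+t}\, dt = \int_0^x \left( 1 - t + t^2 - t^3 + t^4 - \dots \right) dt \]
\[ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots \]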
This procedure has allowed us to ﬁnd a series representation for a new function, ln(1 +x).
\[ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^k}{k}. \qquad (10.7) \]
[66] This example is slightly “trivial”, in the sense that evaluating the function itself is not very difficult. However, in other cases, we will find that the polynomial expansion is the only way to find the desired value.
The formula appended on the right is just a compact notation that represents the pattern of the terms. Recall that in Chapter 1, we became thoroughly familiar with such summation notation [67].
Example 10.2 (Evaluating the logarithm for x = 0.25) An expansion for the logarithm is definitely useful, in the sense that (without a scientific calculator or log tables) it is not possible to easily calculate the value of this function at a given point. For example, for x = 0.25, we cannot find ln(1 + 0.25) = ln(1.25) using simple operations, whereas the value of the first few terms of the series is computable by simple multiplication, division, and addition:
\[ 0.25 - \frac{0.25^2}{2} + \frac{0.25^3}{3} \approx 0.2240. \]
(A scientific calculator gives ln(1.25) ≈ 0.2231, so the approximation produced by the series is relatively good.)
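The arithmetic in Example 10.2 can be repeated with more terms. Below is a minimal sketch, assuming Python; log1p_series is a name we introduce for illustration, and math.log is used only as the reference value that a scientific calculator would supply.

```python
import math


def log1p_series(x, n_terms):
    """Partial sum of x - x^2/2 + x^3/3 - ... with n_terms terms, as in (10.7)."""
    return sum((-1) ** (k + 1) * x**k / k for k in range(1, n_terms + 1))


if __name__ == "__main__":
    x = 0.25
    for n_terms in (1, 2, 3, 5, 10):
        approx = log1p_series(x, n_terms)
        # The last column is the error relative to the library logarithm.
        print(n_terms, approx, abs(approx - math.log(1.0 + x)))
```

For x = 0.25 the error shrinks quickly with each added term, since successive powers of 0.25 become small rapidly.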
When is the series for ln(1 + x) in (10.7) expected to converge? Retracing our steps from the beginning of Example 10.5.2 we note that the value of t is not permitted to leave the interval $|t| < 1$, so we need also $|x| < 1$ in the integration step [68]. We certainly cannot expect the series for ln(1 + x) to converge when $|x| > 1$. At the endpoint x = −1, we have ln(1 + x) = ln(0), which is undefined. Also note that for x = −1 the right hand side of (10.7) becomes
\[ -\left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots \right). \]
This is the recognizable harmonic series (multiplied by −1). But we already know from Section 10.4.1 that the harmonic series diverges. Thus, we must avoid x = −1, since the expansion will not converge there, and neither is the function defined. This example illustrates that outside the interval of convergence, the series and the function become “meaningless”.
Example 10.3 (An expansion for ln(2)) Strictly speaking, our analysis does not predict what happens if we substitute x = 1 into the expansion of the function found in Section 10.5.3, because this value of x is outside of the permitted range −1 < x < 1 in which the Taylor series can be guaranteed to converge. It takes some deeper mathematics (Abel’s theorem) to prove that the result of this substitution actually makes sense, and converges, i.e. that
\[ \ln(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots \]
We state without proof here that the alternating harmonic series converges to ln(2).
10.5.4 Example 4: An expansion for arctan
Suppose we make the substitution $r = -t^2$ into the geometric series formula, and recall that we need $|r| < 1$ for convergence. Then
\[ \frac{1}{1-(-t^2)} = 1 + (-t^2) + (-t^2)^2 + (-t^2)^3 + \dots \]
[67] The summation notation is not crucial and should certainly not be memorized. We are usually interested only in the first few terms of such a series in any approximation of practical value.
[68] Strictly speaking, we should have ensured that we are inside this interval of convergence before we computed the last example.
\[ \frac{1}{1+t^2} = 1 - t^2 + t^4 - t^6 + t^8 - \dots = \sum_{n=0}^{\infty} (-1)^n t^{2n}. \]
This series will converge provided $|t| < 1$. Now integrate both sides, and recall that the antiderivative of the function $1/(1+t^2)$ is arctan(t). Then
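\[ \int_0^x \frac{1}{1+t^2}\, dt = \int_0^x \left( 1 - t^2 + t^4 - t^6 + t^8 - \dots \right) dt \]
\[ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^{2k-1}}{2k-1}. \qquad (10.8) \]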
Example 10.4 (An expansion for π) For a particular application of this expansion, consider plugging x = 1 into Equation (10.8). Then
\[ \arctan(1) = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots \]
But arctan(1) = π/4. Thus we have found a way of computing the irrational number π, namely
\[ \pi = 4 \left( 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots \right) = 4 \sum_{k=1}^{\infty} (-1)^{k+1} \frac{1}{2k-1}. \]
While it is true that this series converges, the convergence is slow. (This can be seen by
adding up the ﬁrst 100 or 1000 terms of this series with a spreadsheet.) This means that it
is not practical to use such a series as an approximation for π. (There are other series that
converge to π very rapidly that are used in any practical application.)
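The slow convergence can be seen with a few lines of code in place of a spreadsheet. This is a sketch only, assuming Python; the name leibniz_pi is ours, and math.pi serves as the reference value.

```python
import math


def leibniz_pi(n_terms):
    """Return 4 * (1 - 1/3 + 1/5 - ...) using n_terms terms of the series."""
    return 4.0 * sum((-1) ** (k + 1) / (2 * k - 1) for k in range(1, n_terms + 1))


if __name__ == "__main__":
    for n_terms in (10, 100, 1000, 10000):
        approx = leibniz_pi(n_terms)
        # The error decreases only gradually as terms are added.
        print(n_terms, approx, abs(approx - math.pi))
```

Even with 1000 terms only the first two or three decimal places are correct, which is why this series is not a practical way to compute π.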
10.6 Taylor Series: a systematic approach
In Section 10.5, we found a variety of Taylor series expansions directly from the formula for a geometric series. Here we ask how such Taylor series can be constructed more systematically, if we are given a function that we want to approximate [69].
Suppose we have a function which we want to represent by a power series,
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots = \sum_{k=0}^{\infty} a_k x^k. \]
Here we will use the function to directly determine the coefficients $a_k$. To determine $a_0$, let x = 0 and note that
\[ f(0) = a_0 + a_1 \cdot 0 + a_2 \cdot 0^2 + a_3 \cdot 0^3 + \dots = a_0. \]
We conclude that
\[ a_0 = f(0). \]
[69] The development of this section was motivated by online notes by David Austin.
By differentiating both sides we find the following:
\[ f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \dots + k a_k x^{k-1} + \dots \]
\[ f''(x) = 2a_2 + 2 \cdot 3\, a_3 x + \dots + (k-1)k\, a_k x^{k-2} + \dots \]
\[ f'''(x) = 2 \cdot 3\, a_3 + \dots + (k-2)(k-1)k\, a_k x^{k-3} + \dots \]
\[ f^{(k)}(x) = 1 \cdot 2 \cdot 3 \cdot 4 \cdots k\, a_k + \dots \]
Here we have used the notation $f^{(k)}(x)$ to denote the k’th derivative of the function. Now evaluate each of the above derivatives at x = 0. Then
\[ f'(0) = a_1 \;\Rightarrow\; a_1 = f'(0), \]
\[ f''(0) = 2a_2 \;\Rightarrow\; a_2 = \frac{f''(0)}{2}, \]
\[ f'''(0) = 2 \cdot 3\, a_3 \;\Rightarrow\; a_3 = \frac{f'''(0)}{2 \cdot 3}, \]
\[ f^{(k)}(0) = k!\, a_k \;\Rightarrow\; a_k = \frac{f^{(k)}(0)}{k!}. \]
This gives us a recipe for calculating all coefficients $a_k$. This means that if we can compute all the derivatives of the function f(x), then we know the coefficients of the Taylor series as well. Because we have evaluated all the coefficients by the substitution x = 0, we say that the resulting power series is the Taylor series of the function about x = 0.
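The recipe can be sketched in code as follows. We assume Python, and both function names are hypothetical helpers of our own. The caller supplies the derivative values f(0), f'(0), f''(0), ... by hand; the code does not differentiate anything itself. As a check, we use f(x) = 1/(1 − x), whose k’th derivative at x = 0 equals k!, so that every coefficient should come out as 1, in agreement with (10.4).

```python
import math


def taylor_coefficients(derivs_at_zero):
    """Convert [f(0), f'(0), f''(0), ...] into [a_0, a_1, a_2, ...] via a_k = f^(k)(0)/k!."""
    return [d / math.factorial(k) for k, d in enumerate(derivs_at_zero)]


def evaluate_polynomial(coeffs, x):
    """Evaluate a_0 + a_1 x + a_2 x^2 + ... by nested multiplication."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


if __name__ == "__main__":
    # For f(x) = 1/(1 - x), the k'th derivative at 0 is k!.
    derivs = [math.factorial(k) for k in range(6)]
    coeffs = taylor_coefficients(derivs)
    print(coeffs)
    print(evaluate_polynomial(coeffs, 0.1), 1.0 / (1.0 - 0.1))
```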
10.6.1 Taylor series for the exponential function, $e^x$
Consider the function $f(x) = e^x$. All the derivatives of this function are equal to $e^x$. In particular,
\[ f^{(k)}(x) = e^x \;\Rightarrow\; f^{(k)}(0) = 1, \]
so that the coefficients of the Taylor series are
\[ a_k = \frac{f^{(k)}(0)}{k!} = \frac{1}{k!}. \]
Therefore the Taylor series for $e^x$ about x = 0 is
\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_k x^k + \dots = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \dots + \frac{x^k}{k!} + \dots = \sum_{k=0}^{\infty} \frac{x^k}{k!}. \]
This is a very interesting series. We state here without proof that this series converges for all values of x. Further, the function defined by the series is in fact equal to $e^x$, that is,
\[ e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \dots = \sum_{k=0}^{\infty} \frac{x^k}{k!}. \]
The implication is that the function $e^x$ is completely determined (for all x values) by its behaviour (i.e. derivatives of all orders) at x = 0. In other words, the value of the function at x = 1,000,000 is determined by the behaviour of the function around x = 0. This means that $e^x$ is a very special function with superior “predictable features”. If a function f(x) agrees with its Taylor series on a region (−a, a), as was the case here, we say that f is analytic on this region. It is known that $e^x$ is analytic for all x.
We can use the results of this example to establish the fact that the exponential function grows “faster” than any power function $x^n$. That is the same as saying that the ratio of $e^x$ to $x^n$ (for any power n) grows without bound as x increases. We leave this as an exercise for the reader.
We can also easily obtain a Taylor series expansion for functions related to $e^x$, without assembling the derivatives. We start with the result that
\[ e^u = 1 + u + \frac{u^2}{2} + \frac{u^3}{6} + \dots = \sum_{k=0}^{\infty} \frac{u^k}{k!}. \]
Then, for example, the substitution $u = x^2$ leads to
\[ e^{x^2} = 1 + x^2 + \frac{(x^2)^2}{2} + \frac{(x^2)^3}{6} + \dots = \sum_{k=0}^{\infty} \frac{(x^2)^k}{k!}. \]
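The substitution idea carries over directly to computation: a routine that sums the series for $e^u$ can be reused for $e^{x^2}$ simply by passing in $u = x^2$. The sketch below assumes Python; exp_series is our own name, and math.exp is used only for comparison.

```python
import math


def exp_series(u, n_terms):
    """Partial sum 1 + u + u^2/2! + ... with n_terms terms."""
    total = 0.0
    term = 1.0  # u^0 / 0!
    for k in range(n_terms):
        total += term
        term *= u / (k + 1)  # turn u^k/k! into u^(k+1)/(k+1)!
    return total


if __name__ == "__main__":
    x = 0.5
    # Substituting u = x^2 gives the series for e^(x^2).
    print(exp_series(x**2, 10), math.exp(x**2))
    print(exp_series(1.0, 15), math.e)
```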
10.6.2 Taylor series of trigonometric functions
In this example we determine the Taylor series for the sine function. The function and its derivatives are
\[ f(x) = \sin x, \quad f'(x) = \cos x, \quad f''(x) = -\sin x, \quad f'''(x) = -\cos x, \quad f^{(4)}(x) = \sin x, \dots \]
After this, the cycle repeats. This means that
\[ f(0) = 0, \quad f'(0) = 1, \quad f''(0) = 0, \quad f'''(0) = -1, \dots \]
and so on in a cyclic fashion. In other words,
\[ a_0 = 0, \quad a_1 = 1, \quad a_2 = 0, \quad a_3 = -\frac{1}{3!}, \quad a_4 = 0, \quad a_5 = \frac{1}{5!}, \dots \]
Thus,
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}. \]
We state here without proof that the function sin(x) is analytic, so that the expansion
converges to the function for all x.
It is instructive to demonstrate how successive terms in a Taylor series expansion
lead to approximations that improve. Doing this kind of thing will be the subject of the last
computer laboratory exercise in this course.
Figure 10.5. An approximation of the function y = sin(x) by successive Taylor polynomials, $T_1$, $T_2$, $T_3$, $T_4$. The higher Taylor polynomials do a better job of approximating the function on a larger interval about x = 0.
Here we demonstrate this idea with the expansion for the function sin(x) that we just obtained. To see this, consider the sequence of polynomials
\[ T_1(x) = x, \]
\[ T_2(x) = x - \frac{x^3}{3!}, \]
\[ T_3(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}, \]
\[ T_4(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!}. \]
Then these polynomials provide a better and better approximation to the function sin(x) close to x = 0. The first of these is just a linear (or tangent line) approximation that we had studied long ago. The second improves this with a cubic approximation, etc. Figure 10.5 illustrates how the first few Taylor polynomials approximate the function sin(x) near x = 0. Observe that as we keep more terms, n, in the polynomial $T_n(x)$, the approximating curve “hugs” the graph of sin(x) over a longer and longer range. The student will be asked to use the spreadsheet, together with some calculations as done in this section, to produce a composite graph similar to Fig. 10.5 for some other function.
Example 10.5 (The error in successive approximations) For a given value of x close to the base point (at x = 0), the error in the approximation between the polynomials and the function is the vertical distance between the graphs of the polynomial and the function sin(x) (shown in black). For example, at x = 2 radians, sin(2) ≈ 0.9093 (as found on a scientific calculator). The approximations are: $T_1(2) = 2$, which is very inaccurate, $T_2(2) = 2 - 2^3/3! \approx 0.667$, which is too small, $T_3(2) \approx 0.9333$, which is much closer, and $T_4(2) \approx 0.9079$, which is closer still. In general, we can approximate the size of the error using the next term that would occur in the polynomial if we kept a higher order expansion. The details of estimating such errors are omitted from our discussion.
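The values quoted in Example 10.5 can be reproduced with a short program in place of a spreadsheet. This is a sketch under the assumption that Python is available; sin_taylor is a name we introduce, with n counting the number of nonzero terms kept, as in $T_1, \dots, T_4$.

```python
import math


def sin_taylor(x, n):
    """T_n(x): the first n nonzero terms of x - x^3/3! + x^5/5! - ..."""
    return sum(
        (-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1) for j in range(n)
    )


if __name__ == "__main__":
    x = 2.0
    for n in range(1, 6):
        approx = sin_taylor(x, n)
        # The error column is the vertical distance to the graph of sin(x) at x = 2.
        print(n, approx, abs(approx - math.sin(x)))
```

The printed errors shrink with each additional term, and each error is comparable in size to the first term that was left out.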
We also note that all polynomials that approximate sin(x) contain only odd powers
of x. This stems from the fact that sin(x) is an odd function, i.e. its graph is symmetric to
rotation about the origin, a concept we discussed in an earlier term.
The Taylor series for cos(x) could be found by a similar sequence of steps. But in this case, this is unnecessary: We already know the expansion for sin(x), so we can find the Taylor series for cos(x) by simple differentiation term by term. (Since sin(x) is analytic, this is permitted for all x.) We leave as an exercise for the reader to show that
\[ \cos(x) = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}. \]
Since cos(x) has symmetry properties of an even function, we ﬁnd that its Taylor series is
composed of even powers of x only.
10.7 Application of Taylor series
In this section we illustrate some of the applications of Taylor series to problems that may be difficult to solve using other conventional methods. Some functions do not have an antiderivative that can be expressed in terms of other simple functions. Integrating these functions can be a problem, as we cannot evaluate the integral in the way that the Fundamental Theorem of Calculus specifies. In some cases, we can approximate the value of the definite integral using a Taylor series. We show this in Section 10.7.1.
Another application of Taylor series is to compute an approximate solution to a differential equation. We provide one example of that sort in Section 10.7.2 and another in Appendix 11.11.
10.7.1 Example 1: using a Taylor series to evaluate an integral
Evaluate the definite integral
\[ \int_0^1 \sin(x^2)\, dx. \]
A simple substitution (e.g. $u = x^2$) will not work here, and we cannot find an antiderivative. Here is how we might approach the problem using Taylor series: We know that the series expansion for sin(t) is
\[ \sin t = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \frac{t^7}{7!} + \dots \]
Substituting $t = x^2$, we have
\[ \sin(x^2) = x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \frac{x^{14}}{7!} + \dots \]
In spite of the fact that we cannot antidifferentiate the function, we can antidifferentiate the Taylor series, just as we would a polynomial:
\[ \int_0^1 \sin(x^2)\, dx = \int_0^1 \left( x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \frac{x^{14}}{7!} + \dots \right) dx \]
\[ = \left. \left( \frac{x^3}{3} - \frac{x^7}{7 \cdot 3!} + \frac{x^{11}}{11 \cdot 5!} - \frac{x^{15}}{15 \cdot 7!} + \dots \right) \right|_0^1 \]
\[ = \frac{1}{3} - \frac{1}{7 \cdot 3!} + \frac{1}{11 \cdot 5!} - \frac{1}{15 \cdot 7!} + \dots \]
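This is an alternating series whose terms decrease in magnitude toward zero, so we know that it converges. If we add up the first four terms, the pattern becomes clear: the series converges to approximately 0.31027, and the terms that follow are too small to affect these digits.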
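As a check on this calculation, the series can be summed by machine and compared with a direct numerical estimate of the integral. The sketch below assumes Python; both function names are ours, and the midpoint rule is simply a convenient independent check that we have chosen, not a method used in these notes.

```python
import math


def sin_x2_integral_series(n_terms):
    """Sum of (-1)^j / ((4j + 3) (2j + 1)!) for j = 0, ..., n_terms - 1."""
    return sum(
        (-1) ** j / ((4 * j + 3) * math.factorial(2 * j + 1)) for j in range(n_terms)
    )


def midpoint_rule(n_strips):
    """Midpoint-rule estimate of the integral of sin(x^2) over 0 <= x <= 1."""
    dx = 1.0 / n_strips
    return sum(math.sin(((k + 0.5) * dx) ** 2) * dx for k in range(n_strips))


if __name__ == "__main__":
    for n_terms in (1, 2, 3, 4, 5):
        print(n_terms, sin_x2_integral_series(n_terms))
    print("midpoint rule:", midpoint_rule(10000))
```

The j = 0, 1, 2, 3 terms of the sum are exactly the four terms displayed above.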
10.7.2 Example 2: Series solution of a differential equation
We are already familiar with the differential equation and initial condition that describes unlimited exponential growth,
\[ \frac{dy}{dx} = y, \quad y(0) = 1. \]
Indeed, from previous work, we know that the solution of this differential equation and initial condition is $y(x) = e^x$, but we will pretend that we do not know this fact in illustrating the usefulness of Taylor series. In some cases, where separation of variables does not work, this option would have great practical value.
Let us express the “unknown” solution to the differential equation as
\[ y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \dots \]
Our task is to determine values for the coefficients $a_i$. Since this function satisfies the condition y(0) = 1, we must have $y(0) = a_0 = 1$. Differentiating this power series leads to
\[ \frac{dy}{dx} = a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + \dots \]
But according to the differential equation, $\frac{dy}{dx} = y$. Thus, it must be true that the two Taylor series match, i.e.
\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \dots = a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + \dots \]
This equality must hold for all values of x. This can only happen if the coefficients of like terms are the same, i.e. if the constant terms on either side of the equation are equal, if the terms of the form $Cx^2$ on either side are equal, and so on for all powers of x. Equating coefficients, we obtain:
\[ a_0 = a_1 = 1 \;\Rightarrow\; a_1 = 1, \]
\[ a_1 = 2a_2 \;\Rightarrow\; a_2 = \frac{a_1}{2} = \frac{1}{2}, \]
\[ a_2 = 3a_3 \;\Rightarrow\; a_3 = \frac{a_2}{3} = \frac{1}{2 \cdot 3}, \]
\[ a_3 = 4a_4 \;\Rightarrow\; a_4 = \frac{a_3}{4} = \frac{1}{2 \cdot 3 \cdot 4}, \]
\[ a_{n-1} = n a_n \;\Rightarrow\; a_n = \frac{a_{n-1}}{n} = \frac{1}{1 \cdot 2 \cdot 3 \cdots n} = \frac{1}{n!}. \]
This means that
\[ y = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \dots = e^x, \]
which, as we have seen, is the expansion for the exponential function. This agrees with the solution we have been expecting. In the example shown here, we would hardly need to use series to arrive at the right conclusion, but in the next example, we would not find it as easy to discover the solution by other techniques discussed previously.
We provide an example of a more complicated differential equation and its series
solution in Appendix 11.11.
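The coefficient matching in Section 10.7.2 amounts to the recurrence $a_n = a_{n-1}/n$ with $a_0 = 1$, which is easy to iterate by machine. The following is a minimal sketch, assuming Python; the function names are ours, and exact fractions are used so that the coefficients can be compared with 1/n! without rounding.

```python
import math
from fractions import Fraction


def series_coefficients(n_max):
    """Coefficients a_0, ..., a_n_max of the series solution of dy/dx = y, y(0) = 1."""
    coeffs = [Fraction(1)]  # a_0 = y(0) = 1
    for n in range(1, n_max + 1):
        coeffs.append(coeffs[n - 1] / n)  # from matching: a_(n-1) = n * a_n
    return coeffs


def evaluate_series(coeffs, x):
    """Evaluate the truncated series a_0 + a_1 x + ... at the point x."""
    return sum(float(a) * x**k for k, a in enumerate(coeffs))


if __name__ == "__main__":
    coeffs = series_coefficients(10)
    print(coeffs[:5])
    print(evaluate_series(coeffs, 1.0), math.e)
```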
10.8 Summary
The main points of this chapter can be summarized as follows:
1. We reviewed the definition of an improper integral
\[ \int_a^{\infty} f(x)\, dx = \lim_{b \to \infty} \int_a^b f(x)\, dx. \]
2. We computed some examples of improper integrals and discussed their convergence or divergence. We recalled (from earlier chapters) that
\[ I = \int_0^{\infty} e^{-rt}\, dt \quad \text{converges}, \]
whereas
\[ I = \int_1^{\infty} \frac{1}{x}\, dx \quad \text{diverges}. \]
3. More generally, we showed that
\[ \int_1^{\infty} \frac{1}{x^p}\, dx \quad \text{converges if } p > 1, \text{ diverges if } p \le 1. \]
4. Using a comparison between integrals and series we showed that the harmonic series,
\[ \sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{k} + \dots \quad \text{diverges}. \]
5. More generally, our results led to the conclusion that the “p” series,
\[ \sum_{k=1}^{\infty} \frac{1}{k^p} \quad \text{converges if } p > 1, \text{ diverges if } p \le 1. \]
6. We studied Taylor series and showed that some can be found using the formula for convergent geometric series. Two examples of Taylor series that were obtained in this way are
\[ \ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots \quad \text{for } |x| < 1 \]
and
\[ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots \quad \text{for } |x| < 1. \]
7. In discussing Taylor series, we considered some of the following questions: (a) For what range of values of x can we expect the series to converge? (b) Suppose we approximate the function on the left by a finite number of terms of the series on the right. How good is that approximation? Another interesting question is: (c) If we include more and more such terms, does that approximation get better and better? (i.e., does the series converge to the function?) (d) Is the convergence rate rapid? Some of these questions occupy the attention of career mathematicians, and are beyond the scope of this introductory calculus course.
8. More generally, we showed that the Taylor series for a function about x = 0,
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots = \sum_{k=0}^{\infty} a_k x^k, \]
can be found by computing the coefficients
\[ a_k = \frac{f^{(k)}(0)}{k!}. \]
9. We discussed some of the applications of Taylor series. We used Taylor series to
approximate a function, to ﬁnd an approximation for a deﬁnite integral of a function,
and to solve a differential equation.
Chapter 11
Appendix
11.1 How to prove the formulae for sums of squares and cubes
Mathematicians are concerned with rigorously establishing formulae such as sums of squared (or cubed) integers. While it is not hard to see that these formulae “work” for a few cases, determining that they work in general requires more work. Here we provide a taste of how such careful arguments work. We give two examples. The first, based on mathematical induction, provides a general method that could be used in many similar kinds of proofs. The second argument, also for purposes of illustration, uses a “trick”. Devising such tricks is not as straightforward, and depends to some degree on serendipity or experience with numbers.
Proof by induction (optional)
Here, we prove the formula for the sum of square integers,
\[ \sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}, \]
using a technique called induction. The idea of the method is to check that the formula
works for one or two simple cases (e.g. the “sum” of just one or just two terms of the
series), and then show that whenever it works for one case (summing up to N), it has to
also work for the next case (summing up to N + 1).
First, we verify that this formula works for a few test cases:
N = 1: If there is only one term, then clearly, by inspection,
\[ \sum_{k=1}^{1} k^2 = 1^2 = 1. \]
The formula indicates that we should get
\[ S = \frac{1(1+1)(2 \cdot 1 + 1)}{6} = \frac{1(2)(3)}{6} = 1, \]
so this case agrees with the prediction.
N = 2:
\[ \sum_{k=1}^{2} k^2 = 1^2 + 2^2 = 1 + 4 = 5. \]
The formula would then predict that
\[ S = \frac{2(2+1)(2 \cdot 2 + 1)}{6} = \frac{2(3)(5)}{6} = 5. \]
So far, elementary computation matches with the result predicted by the formula.
Now we show that if this formula holds for any one case, e.g. for the sum of the ﬁrst N
squares, then it is also true for the next case, i.e. for the sum of N + 1 squares. So we will
assume that someone has checked that for some particular value of N it is true that
\[ S_N = \sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}. \]
Now the sum of the first N + 1 squares will be just a bit bigger: it will have one more term added to it:
\[ S_{N+1} = \sum_{k=1}^{N+1} k^2 = \sum_{k=1}^{N} k^2 + (N+1)^2. \]
Thus
\[ S_{N+1} = \frac{N(N+1)(2N+1)}{6} + (N+1)^2. \]
Combining terms, we get
\[ S_{N+1} = (N+1) \left[ \frac{N(2N+1)}{6} + (N+1) \right], \]
\[ S_{N+1} = (N+1)\, \frac{2N^2 + N + 6N + 6}{6} = (N+1)\, \frac{2N^2 + 7N + 6}{6}. \]
Simplifying and factoring the last term leads to
\[ S_{N+1} = (N+1)\, \frac{(2N+3)(N+2)}{6}. \]
We want to check that this still agrees with what the formula predicts. To make the notation simpler, we will let M stand for N + 1. Then, expressing the result in terms of the quantity M = N + 1 we get
\[ S_M = \sum_{k=1}^{M} k^2 = (N+1)\, \frac{[2(N+1)+1][(N+1)+1]}{6} = M\, \frac{[2M+1][M+1]}{6}. \]
This is the same formula as we started with, only written in terms of M instead of N. Thus
we have veriﬁed that the formula works. By Mathematical Induction we ﬁnd that the result
has been proved.
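A computer cannot replace the induction argument, since it can only test finitely many cases, but it can confirm both ingredients of the proof for as many values of N as we care to try. The sketch below assumes Python; the function names are ours.

```python
def sum_of_squares_formula(n):
    """The closed form N(N + 1)(2N + 1)/6 (always a whole number)."""
    return n * (n + 1) * (2 * n + 1) // 6


def sum_of_squares_direct(n):
    """Add up 1^2 + 2^2 + ... + n^2 term by term."""
    return sum(k * k for k in range(1, n + 1))


def inductive_step_holds(n):
    """Check that formula(n) + (n + 1)^2 equals formula(n + 1)."""
    return sum_of_squares_formula(n) + (n + 1) ** 2 == sum_of_squares_formula(n + 1)


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, sum_of_squares_direct(n), sum_of_squares_formula(n), inductive_step_holds(n))
```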
Another method using a trick [70]
n
¸
k=1
k
2
or
n
¸
k=1
k
3
. Write
(k + 1)
3
−(k −1)
3
= 6k
2
+ 2,
so
n
¸
k=1
(k + 1)
3
−(k −1)
3
=
n
¸
k=0
(6k
2
+ 2).
But looking more carefully at the left hand side (LHS), we see that
\[ \sum_{k=1}^{n} \left( (k+1)^3 - (k-1)^3 \right) = 2^3 - 0^3 + 3^3 - 1^3 + 4^3 - 2^3 + 5^3 - 3^3 + \dots + (n+1)^3 - (n-1)^3. \]
Most of the terms cancel, leaving only $-1 + n^3 + (n+1)^3$, so this means that
\[ -1 + n^3 + (n+1)^3 = 6 \sum_{k=1}^{n} k^2 + \sum_{k=1}^{n} 2, \]
so
\[ \sum_{k=1}^{n} k^2 = \frac{-1 + n^3 + (n+1)^3 - 2n}{6} = \frac{2n^3 + 3n^2 + n}{6}. \]
Similarly, the formula for $\sum_{k=1}^{n} k^3$ can be obtained by starting with
\[ (k+1)^4 - (k-1)^4 = 8k^3 + 8k. \]
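The cancellation in the telescoping sum is easy to confirm numerically. This is only an illustrative sketch, assuming Python, with function names of our own: it adds up the left hand side term by term, compares it with what survives the cancellation, and then recovers the sum of squares as in the argument above.

```python
def telescoped_lhs(n):
    """Add up (k + 1)^3 - (k - 1)^3 for k = 1, ..., n, term by term."""
    return sum((k + 1) ** 3 - (k - 1) ** 3 for k in range(1, n + 1))


def sum_of_squares_via_trick(n):
    """Solve LHS = 6 * (sum of k^2) + 2n for the sum of squares."""
    return (telescoped_lhs(n) - 2 * n) // 6


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        surviving_terms = -1 + n**3 + (n + 1) ** 3
        print(n, telescoped_lhs(n), surviving_terms, sum_of_squares_via_trick(n))
```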
11.2 Riemann Sums: Extensions and other examples
We take up some issues here that were not yet considered in the context of our examples of Riemann sums in Chapter 2. First, we consider an arbitrary interval a ≤ x ≤ b. Then we comment on other ways of constructing the rectangular strip approximation (that eventually lead to the same limit when the true area is computed).
11.2.1 A general interval: a ≤ x ≤ b
Example 2: (Lu Fan)
Find the area under the graph of the function
\[ y = f(x) = x^2 + 2x + 1, \quad a \le x \le b. \]
[70] I want to thank Robert Israel for contributing this material.
Here the interval is a ≤ x ≤ b. Let us leave the values of a, b general for a moment, and
consider how the calculation is set up in this case. Then we have
length of interval = $b - a$
number of segments = $N$
width of rectangular strips = $\Delta x = \frac{b-a}{N}$
the k’th x value = $x_k = a + k\,\frac{(b-a)}{N}$
height of k’th rectangular strip = $f(x_k) = x_k^2 + 2x_k + 1$
Combining the last two steps, the height of rectangle k is:
\[ f(x_k) = \left( a + \frac{k(b-a)}{N} \right)^2 + 2 \left( a + \frac{k(b-a)}{N} \right) + 1 \]
and its area is
\[ a_k = f(x_k) \times \Delta x = f(x_k) \times \frac{b-a}{N}. \]
We use the last two equations to express $a_k$ in terms of k (and the quantities a, b, N), then sum over k as before ($A = \sum a_k$). Some algebra is needed to simplify the sums so that summation formulae can be applied. The details are left as an exercise for the reader (see homework problems). Evaluating the limit N → ∞, we finally obtain
\[ A = \lim_{N \to \infty} \sum_{k=1}^{N} a_k = (a+1)^2 (b-a) + (a+1)(b-a)^2 + \frac{(b-a)^3}{3} \]
as the area under the function $f(x) = x^2 + 2x + 1$ over the interval a ≤ x ≤ b. Observe that the solution depends on a and b. (The endpoints of the interval influence the total area under the curve.) For example, if the given interval happens to be 2 ≤ x ≤ 4, then substituting a = 2, b = 4 into the above result for A leads to
\[ A = (2+1)^2 (4-2) + (2+1)(4-2)^2 + \frac{(4-2)^3}{3} = 18 + 12 + \frac{8}{3} = \frac{98}{3}. \]
In the next chapter, we will show that the tools of integration will lead to the same conclusion.
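The limit above can be checked numerically by carrying out the Riemann sum for a large number of strips. The sketch below assumes Python and uses our own function names; it follows the setup in this section (right endpoints $x_k = a + k\,\Delta x$) and compares the sum with the closed-form expression for A.

```python
def f(x):
    """The function whose area we want: f(x) = x^2 + 2x + 1."""
    return x**2 + 2 * x + 1


def right_riemann_sum(a, b, n):
    """Sum of f(x_k) * dx over k = 1, ..., n with x_k = a + k * dx."""
    dx = (b - a) / n
    return sum(f(a + k * dx) * dx for k in range(1, n + 1))


def limiting_area(a, b):
    """The limit (a+1)^2 (b-a) + (a+1)(b-a)^2 + (b-a)^3 / 3 obtained above."""
    return (a + 1) ** 2 * (b - a) + (a + 1) * (b - a) ** 2 + (b - a) ** 3 / 3


if __name__ == "__main__":
    a, b = 2.0, 4.0
    for n in (10, 100, 1000, 10000):
        print(n, right_riemann_sum(a, b, n), limiting_area(a, b))
```

As N grows, the Riemann sums approach the value given by the closed-form expression.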
11.2.2 Using left (rather than right) endpoints
So far, we used the right endpoint of each rectangular strip to assign its height using the given function (see Figs. 2.2, 2.3, 2.4). Restated, we “glued” the top right corner of the rectangle to the graph of the function. This is the so-called right endpoint approximation. We can just as well use the left corners of the rectangles to assign their heights (left endpoint approximation). A comparison of these for the function $y = f(x) = x^2$ is shown in Figs. 11.1 and 11.2. In the case of the left endpoint approximation, we evaluate the heights of the rectangles starting at $x_0$ (instead of $x_1$), and ending at $x_{N-1}$ (instead of $x_N$). There are still N rectangles. To compare, the sums of the areas of the rectangles in the left versus the right endpoint approximation are
\[ \text{Right endpoints:} \quad A_{N\,\text{strips}} = \sum_{k=1}^{N} f(x_k)\, \Delta x, \]
\[ \text{Left endpoints:} \quad A_{N\,\text{strips}} = \sum_{k=0}^{N-1} f(x_k)\, \Delta x. \]
Details of one such computation are given in the box.
Figure 11.1. The area under the curve y = f(x) over an interval a ≤ x ≤ b could be computed by using either a left or right endpoint approximation. That is, the heights of the rectangles are adjusted to match the function of interest either on the right or on their left corner. Here we compare the two approaches. Usually both lead to the same result once a limit is computed to arrive at the “true” area.
Example of left endpoint calculation
Here we look again at a simple example, using the quadratic function
\[ f(x) = x^2, \quad 0 \le x \le 1. \]
We now compare the right and left endpoint approximations. These are shown in panels of Figure 11.2. Note that
\[ \Delta x = \frac{1}{N}, \quad x_k = \frac{k}{N}. \]
The area of the k’th rectangle is
\[ a_k = f(x_k) \times \Delta x = (k/N)^2 (1/N), \]
but now the sum starts at k = 0 so
\[ A_{N\,\text{strips}} = \sum_{k=0}^{N-1} f(x_k)\, \Delta x = \sum_{k=0}^{N-1} \left( \frac{k}{N} \right)^2 \frac{1}{N} = \frac{1}{N^3} \sum_{k=0}^{N-1} k^2. \]
The first rectangle corresponds to k = 0 in the left endpoint approximation (rather than k = 1 in the right endpoint approximation). But the k = 0 rectangle makes no contribution (as its area is zero in this example) and we have one less rectangle at the right endpoint of the interval, since the N’th rectangle is k = N − 1. Then the sum is
\[ A_{N\,\text{strips}} = \frac{1}{N^3} \cdot \frac{(2(N-1)+1)(N-1)(N)}{6} = \frac{(2N-1)(N-1)}{6N^2}. \]
The area, obtained by taking a limit for N → ∞, is
\[ A = \lim_{N \to \infty} A_{N\,\text{strips}} = \lim_{N \to \infty} \frac{(2N-1)(N-1)}{6N^2} = \frac{2}{6} = \frac{1}{3}. \]
We see that, after computing the limit, the result for the “true area” under the curve is
exactly the same as we found earlier in this chapter using the right endpoint approximation.
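The left and right endpoint sums for this example can be computed exactly and compared. The following is a minimal sketch, assuming Python; the function names are ours, and exact fractions are used so that the left sum can be checked against the closed form $(2N-1)(N-1)/(6N^2)$ derived above.

```python
from fractions import Fraction


def left_sum(n):
    """Left endpoint sum for f(x) = x^2 on 0 <= x <= 1, using k = 0, ..., n - 1."""
    return sum(Fraction(k, n) ** 2 * Fraction(1, n) for k in range(0, n))


def right_sum(n):
    """Right endpoint sum for f(x) = x^2 on 0 <= x <= 1, using k = 1, ..., n."""
    return sum(Fraction(k, n) ** 2 * Fraction(1, n) for k in range(1, n + 1))


def left_sum_closed_form(n):
    """The expression (2N - 1)(N - 1) / (6 N^2) for the left endpoint sum."""
    return Fraction((2 * n - 1) * (n - 1), 6 * n * n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Both sums approach 1/3; they differ by the area of one rectangle.
        print(n, float(left_sum(n)), float(right_sum(n)), float(right_sum(n) - left_sum(n)))
```

The two sums differ by exactly the area of the largest rectangle, which is 1/N in this example, so the difference vanishes in the limit.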
11.3 Physical interpretation of the center of mass
We defined the idea of a center of mass in Chapter 5. The center of mass has a physical interpretation for a real mass distribution. Loosely speaking, it is the position at which the mass “balances” without rotating to the left or right. In physics, we say that there is no net torque. The analogy with children sitting on a teeter totter is relevant: many children may sit along the length of the frame of a teeter totter, but if they distribute themselves in a way that the center of mass is at the fulcrum of the teeter totter, they will remain precariously balanced (until one of them fidgets or gets off!). Notice that both the mass and the position of each child are important: a light child sitting on the very edge of the teeter totter can balance a heavier child sitting closer to the fulcrum (middle). The center of mass need not be the same as the median position. As we have seen, the median is a position that subdivides the distribution into two equal masses (or, more generally, produces equal sized areas under the graph of the density function). The center of mass assigns a greater weight to parts of the distribution that are “far away” in the same sense. (However, for symmetric distributions, the median and the mean are the same.)
Figure 11.2. Rectangles with left or right corners on the graph of $y = x^2$ are compared in this picture. The approximation shown in pink is “missing” the largest rectangle shown in green. However, in the limit as the number of rectangles, N → ∞, the true area obtained is the same. (Panels: right endpoint approximation; left endpoint approximation; comparison of left and right approximations.)
In physics, we speak of the “moment of mass” of a distribution about a point. This quantity is related to the tendency of the mass to contribute a torque, i.e. to make the object rotate. Suppose we are interested in a particular point of reference $x$. In a discrete mass distribution, for example, the moment of mass of each of the beads relative to point $x$ is given by the product of the mass and its distance away from the point. As with the teeter totter, beads farther away will contribute more torque than beads closer to point $x$, and heavier beads (i.e. greater mass) will contribute more torque than lighter beads. For example, mass 1 contributes an amount $m_1 (x - x_1)$ to the total moment of mass of the distribution about the point $x$. Altogether, the moment of mass of the distribution about the point $x$ is defined as
\[ M_1(x) = \sum_{i=1}^{n} m_i (x - x_i). \]
The center of mass is a special point $\bar{x}$ such that the moment of mass about that point is zero. (Loosely speaking, the tendencies to rotate to the left and to the right are the same: thus the distribution would be balanced if it “rested on that point”.)
[Figure 11.3: masses $m_1$, $m_2$, $m_3$ placed at positions $x_1$, $x_2$, $x_3$ along the $x$ axis.]
Figure 11.3. A discrete set of masses $m_1, m_2, m_3$ is distributed at positions $x_1, x_2, x_3$. The center of mass of the distribution is the position at which the given mass distribution would balance, here represented by the white triangle.
Thus, we identify the center of mass as the point at which
\[ M_1(\bar{x}) = 0, \]
or
\[ \sum_{i=1}^{n} m_i (\bar{x} - x_i) = 0. \]
Now expanding the sum, we rewrite the above as
\[ \sum_{i=1}^{n} m_i \bar{x} - \sum_{i=1}^{n} m_i x_i = 0, \]
\[ \bar{x} \sum_{i=1}^{n} m_i - \sum_{i=1}^{n} m_i x_i = 0. \]
But we already know that the first summation above is just the total mass $M$, so that
\[ \bar{x} M - \sum_{i=1}^{n} m_i x_i = 0, \]
so, taking the second term to the other side and dividing by $M$ leads to
\[ \bar{x} = \frac{1}{M} \sum_{i=1}^{n} m_i x_i . \]
We have recovered precisely the deﬁnition of the center of mass or “average x coordinate”.
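As a quick numerical illustration of this derivation, the sketch below computes $\bar{x}$ for a small set of beads and confirms that the moment of mass about $\bar{x}$ vanishes. The masses and positions are hypothetical values chosen for this example, and the function names are not from the notes.

```python
def moment_of_mass(masses, positions, x):
    """M_1(x) = sum of m_i * (x - x_i) about the reference point x."""
    return sum(m * (x - xi) for m, xi in zip(masses, positions))


def center_of_mass(masses, positions):
    """x_bar = (1/M) * sum of m_i * x_i, where M is the total mass."""
    total_mass = sum(masses)
    return sum(m * xi for m, xi in zip(masses, positions)) / total_mass


# Hypothetical example values (not taken from the notes).
masses = [2.0, 1.0, 3.0]
positions = [0.0, 4.0, 6.0]

x_bar = center_of_mass(masses, positions)
print(x_bar)                                     # balance point
print(moment_of_mass(masses, positions, x_bar))  # should be 0 up to rounding
```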
11.4 The shell method for computing volumes
In Chapter 5, we used dissection into small disks to compute the volume of solids of revolution. Here we use an alternative dissection into shells.
11.4.1 Example: Volume of a cone using the shell method
[Figure 11.4: panels showing the line $y = f(x) = 1 - x$, the resulting cone, and the cone with one cylindrical shell of thickness $dx$.]
Figure 11.4. Top: The curve that generates the cone (left) and the shape of the cone (right). Bottom: the cone showing one of the series of shells that are used in this example to calculate its volume.
We use the shell method (see footnote 71) to find the volume of the cone formed by rotating the curve
\[ y = 1 - x \]
about the $y$ axis.
Solution
We show the cone and its generating curve in Figure 11.4, together with a representative shell used in the calculation of total volume. The volume of a cylindrical shell of radius $r$, height $h$ and thickness $\tau$ is
\[ V_{\text{shell}} = 2\pi r h \tau. \]
We will place these shells one inside the other so that their radii are parallel to the $x$ axis (so $r = x$). The heights of the shells are determined by their $y$ value (i.e. $h = y = 1 - x = 1 - r$). For the tallest shell $r = 0$, and for the flattest shell $r = 1$. The thickness of the shell is $\Delta r$. Therefore, the volume of one shell is
\[ V_{\text{shell}} = 2\pi r (1 - r)\, \Delta r. \]
Footnote 71. Note to the instructor: This material may be skipped in the interest of time. It presents an alternative to the disk method, but there may not be enough time to cover this in detail.
The volume of the object is obtained by summing up these shell volumes. In the limit, as $\Delta r \to dr$ gets infinitesimally small, we recognize this as a process of integration. We integrate over $0 \le r \le 1$, to obtain:
\[ V = 2\pi \int_0^1 r(1 - r)\,dr = 2\pi \int_0^1 (r - r^2)\,dr. \]
We find that
\[ V = 2\pi \left( \frac{r^2}{2} - \frac{r^3}{3} \right) \Big|_0^1 = 2\pi \left( \frac{1}{2} - \frac{1}{3} \right) = \frac{\pi}{3}. \]
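The summation step can be imitated directly before passing to the integral. The sketch below is an illustration only, with a function name chosen here; it adds up $n$ thin shells of volume $2\pi r(1-r)\Delta r$, evaluating each shell at its midpoint radius (an assumption of this sketch, any point in the shell would do in the limit), and compares the total with $\pi/3$.

```python
import math


def cone_volume_by_shells(n):
    """Sum n cylindrical shells 2*pi*r*h*dr for the cone generated by y = 1 - x."""
    dr = 1.0 / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr      # midpoint radius of the k-th shell (a choice made here)
        h = 1.0 - r             # shell height from the generating line
        total += 2.0 * math.pi * r * h * dr
    return total


for n in (10, 100, 1000):
    print(n, cone_volume_by_shells(n), math.pi / 3)
```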
11.5 More techniques of integration
11.5.1 Secants and other “hard integrals”
In a previous section, we encountered the integral
\[ I = \int \sec^3(x)\,dx. \]
This integral can be simplified to some extent by integration by parts as follows: Let $u = \sec(x)$, $dv = \sec^2(x)\,dx$. Then $du = \sec(x)\tan(x)\,dx$ while $v = \int \sec^2(x)\,dx = \tan(x)$. The integral can be transformed to
\[ I = \sec(x)\tan(x) - \int \sec(x)\tan^2(x)\,dx. \]
The latter can be rewritten:
\[ I_1 = \int \sec(x)\tan^2(x)\,dx = \int \sec(x)(\sec^2(x) - 1)\,dx, \]
where we have used the trigonometric identity $\tan^2(x) = \sec^2(x) - 1$. Then
\[ I = \sec(x)\tan(x) - \int \sec^3(x)\,dx + \int \sec(x)\,dx = \sec(x)\tan(x) - I + \int \sec(x)\,dx, \]
so (taking both $I$'s to the left hand side, and dividing by 2)
\[ I = \frac{1}{2}\left( \sec(x)\tan(x) + \int \sec(x)\,dx \right). \]
We are now in need of an antiderivative for $\sec(x)$. No “obvious substitution” or further integration by parts helps here, but it can be checked by differentiation that
\[ \int \sec(x)\,dx = \ln|\sec(x) + \tan(x)| + C. \]
Then the final result is
\[ I = \frac{1}{2}\left( \sec(x)\tan(x) + \ln|\sec(x) + \tan(x)| \right) + C. \]
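The phrase “it can be checked by differentiation” also applies to the final answer. As a rough numerical check (a sketch, not a proof; the helper names and sample points are chosen here), the code below differentiates the candidate antiderivative with a central difference and compares the result with $\sec^3(x)$.

```python
import math


def sec(x):
    return 1.0 / math.cos(x)


def antiderivative(x):
    """Candidate antiderivative of sec^3(x) from the text (with C = 0)."""
    return 0.5 * (sec(x) * math.tan(x) + math.log(abs(sec(x) + math.tan(x))))


def central_difference(f, x, h=1e-5):
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Hypothetical sample points in (-pi/2, pi/2), where sec(x) is defined.
for x in (-1.0, 0.0, 0.5, 1.2):
    print(x, central_difference(antiderivative, x), sec(x) ** 3)
```

The two printed columns should agree to several decimal places at each sample point.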
11.5.2 A special case of integration by partial fractions
Evaluate this integral (see footnote 72):
\[ \int_1^2 \frac{7x + 4}{6x^2 + 7x + 2}\,dx. \]
This integral involves a rational function (that is, a ratio of two polynomials). The denominator is a degree 2 polynomial function that has two roots and that can be factored easily; the numerator is a degree 1 polynomial function. In this case, we can use the following strategy. First, factor the denominator:
\[ 6x^2 + 7x + 2 = (2x + 1)(3x + 2). \]
Assign $A$ and $B$ in the following way:
\[ \frac{A}{2x + 1} + \frac{B}{3x + 2} = \frac{7x + 4}{(2x + 1)(3x + 2)} = \frac{7x + 4}{6x^2 + 7x + 2}. \]
(Remember, this is how we define $A$ and $B$.)
Next, find the common denominator and rewrite the left-hand side as a single fraction in terms of $A$ and $B$:
\[ \frac{A}{2x + 1} + \frac{B}{3x + 2} = \frac{3Ax + 2A + 2Bx + B}{(2x + 1)(3x + 2)}. \]
Group like terms in the numerator, and note that this has to match the original fraction, so:
\[ \frac{3Ax + 2A + 2Bx + B}{(2x + 1)(3x + 2)} = \frac{(3A + 2B)x + (2A + B)}{(2x + 1)(3x + 2)} = \frac{7x + 4}{(2x + 1)(3x + 2)}. \]
The above equation should hold true for all $x$ values; therefore:
\[ 3A + 2B = 7, \qquad 2A + B = 4. \]
Solving the system of equations leads to $A = 1$, $B = 2$. Using this result, we rewrite the original expression in the form:
\[ \frac{7x + 4}{6x^2 + 7x + 2} = \frac{7x + 4}{(2x + 1)(3x + 2)} = \frac{A}{2x + 1} + \frac{B}{3x + 2} = \frac{1}{2x + 1} + \frac{2}{3x + 2}. \]
Now we are ready to rewrite the integral:
\[ I = \int_1^2 \frac{7x + 4}{6x^2 + 7x + 2}\,dx = \int_1^2 \left( \frac{1}{2x + 1} + \frac{2}{3x + 2} \right) dx. \]
Simplify:
\[ I = \int_1^2 \frac{1}{2x + 1}\,dx + 2 \int_1^2 \frac{1}{3x + 2}\,dx. \]
Now the integral becomes a simple natural log integral that follows the pattern of Eqn. 6.1. Simplify:
\[ I = \frac{1}{2} \ln|2x + 1| \,\Big|_1^2 + \frac{2}{3} \ln|3x + 2| \,\Big|_1^2 . \]
Simplify further:
\[ I = \frac{1}{2}(\ln 5 - \ln 3) + \frac{2}{3}(\ln 8 - \ln 5) = -\frac{1}{6}\ln 5 - \frac{1}{2}\ln 3 + \frac{2}{3}\ln 8. \]
Footnote 72. This section was contributed by Lu Fan.
This method can be used to evaluate any integral that contains a fraction with a degree 1 polynomial in the numerator and a degree 2 polynomial (that has two distinct real roots) in the denominator.
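The result of this example can be sanity-checked numerically. The sketch below (function names are chosen here, and the midpoint rule is just one convenient numerical method) compares the closed form $-\frac{1}{6}\ln 5 - \frac{1}{2}\ln 3 + \frac{2}{3}\ln 8$ with a direct numerical integration of the original integrand, and also checks the decomposition with $A = 1$, $B = 2$ at a sample point.

```python
import math


def integrand(x):
    return (7.0 * x + 4.0) / (6.0 * x * x + 7.0 * x + 2.0)


def partial_fractions(x):
    # Decomposition found in the text with A = 1, B = 2.
    return 1.0 / (2.0 * x + 1.0) + 2.0 / (3.0 * x + 2.0)


def midpoint_rule(f, a, b, n=10000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


closed_form = -math.log(5) / 6 - math.log(3) / 2 + 2 * math.log(8) / 3
numeric = midpoint_rule(integrand, 1.0, 2.0)
print(closed_form, numeric)
print(integrand(1.5), partial_fractions(1.5))
```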
11.6 Analysis of data: a student grade distribution
We study the distribution of student grades on a test written by 76 students and graded out
of a maximum of 50 points.
11.6.1 Deﬁning an average grade
Let $N$ be the size of the class, and $y_k$ the grade of student $k$. (Here $k$ is the number of the student from 1 to $N$, and $y_k$ takes any value between 0 and 50 points.) Then the average grade $\bar{Y}$ is computed by adding up the scores of all students and dividing by the number of students as follows:
\[ \bar{Y} = \frac{1}{N} \sum_{k=1}^{N} y_k . \]
For example, for a class of 76 students, we would have the sum
\[ \bar{Y} = \frac{1}{76} \sum_{k=1}^{76} y_k . \]
11.6.2 Fraction of students that scored a given grade
Suppose that the number of students who got the grade $x_i$ is $p_i$. If the class consists of a total of $N$ students, then it follows that
\[ N = \sum_{i=1}^{10} p_i . \]
This is just saying that the sum of the number of students in every one of the categories has to add up to the total class size. The fraction of the class that scored grade $x_i$ is
\[ \frac{p_i}{N}. \]
(Dividing by $N$ has normalized the distribution. The value $p_i/N$ is the empirical probability of getting grade $x_i$.) The mean or average grade is:
\[ \bar{X} = \frac{1}{N} \sum_{i=0}^{50} x_i p_i . \]
[Figure 11.5: bar graph titled “Grade Distribution”; horizontal axis from 0.0 to 50.0, vertical axis from 0.0 to 25.0; the mean, 31.9, is marked.]
Figure 11.5. Distributions of grades on a test with 50 point maximum. There were a total of 76 students writing the test. The mean grade 31.9 is shown.
11.6.3 Frequency distribution
It is difficult to visualize all the data if we list all the grades obtained. We “lump together” scores into various categories (or “bins”) and create a distribution. For example, test scores might be divided into bins in increments of 5 points (1-5, 6-10, 11-15, etc.). We could represent grades in each bin by some value up to a specified level of accuracy. For example, grades in the range 16-20 can be described by the score 18 up to an accuracy of ±2. This is what we have done in Table 11.1.
We will now reinterpret our notation somewhat. We will refer to $\tilde{x}_i$ as the score and $p_i$ as the number of students whose test score fell within the range represented by $\tilde{x}_i \pm$ accuracy. (The notation $\tilde{x}_i$ is meant to remind us that we are approximating the grade value.) For example, consider 10 “bins” or grade categories. In that case, the index $i$ takes on values $i = 1, 2, \dots, 10$. Then, e.g., $\tilde{x}_4$ represents all grades in the fourth “bin”, i.e. grades in the range 16-20. A plot of $p_i$ against $\tilde{x}_i$ is called a frequency distribution. The bar graph shown in Figure 11.5 represents this distribution. Table 11.1 shows the data that produced that bar graph.
11.6.4 Average/mean of the distribution
The frequency distribution can also be used to compute an average value: each (approximate) grade value $\tilde{x}_i$ is achieved by $p_i$ students, which is a fraction $(p_i/N)$ of the whole class. When we form the product $(p_i/N)\tilde{x}_i$, we assign a “weight” to each of the categories according to the proportion of the class that was in that category. (The terminology weighted average is sometimes used.)

 i    grade $\tilde{x}_i$   number $p_i$   $\sum p_i$   $\sum \tilde{x}_i p_i$   $(1/N)\sum \tilde{x}_i p_i$
 0    0                     0              0            0.0                      0.0
 1    3±2                   1              1            3.0                      0.0395
 2    8±2                   2              3            19                       0.25
 3    13±2                  0              3            19                       0.25
 4    18±2                  5              8            109                      1.4342
 5    23±2                  10             18           339                      4.4605
 6    28±2                  8              26           563                      7.4079
 7    33±2                  21             47           1256                     16.5263
 8    38±2                  19             66           1978                     26.0263
 9    43±2                  6              72           2236                     29.4211
 10   48±2                  4              76           2428                     31.9474

Table 11.1. Distribution of grades (out of 50) for a class of 76 students. The mean grade for this class is 31.9474.
We define the mean or average grade in the distribution by
\[ \bar{x} = \sum_{i=1}^{M} \tilde{x}_i \frac{p_i}{N}, \qquad (11.1) \]
where $M$ is the number of bins. An equivalent way of expressing the mean (average) is:
\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{M} \tilde{x}_i p_i = \frac{\sum_{i=1}^{M} \tilde{x}_i p_i}{\sum_{i=1}^{M} p_i}. \qquad (11.2) \]
The sum in the denominator of this last fraction is simply the total class size.
In Table 11.1, we show steps in the calculation of the mean grade for the class. This
calculation is easily handled on the same spreadsheet that recorded the frequency of grades
and that was used to plot the bar graph of that distribution. Equations 11.1 and 11.2 are
saying the same thing. We will see the second of these again in the context of a more
general probability distribution in Chapter 8.
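The notes carry out this calculation on a spreadsheet; the same steps can be sketched in a few lines of code. The bin scores and counts below are copied from Table 11.1, while the variable names are chosen here for illustration.

```python
# Bin representatives and counts copied from Table 11.1 (bins i = 1..10).
scores = [3, 8, 13, 18, 23, 28, 33, 38, 43, 48]
counts = [1, 2, 0, 5, 10, 8, 21, 19, 6, 4]

class_size = sum(counts)                                  # N, the denominator of (11.2)
weighted_total = sum(x * p for x, p in zip(scores, counts))
mean_grade = weighted_total / class_size                  # equation (11.2)

# Equation (11.1): weight each score by the fraction p_i / N of the class.
mean_from_fractions = sum(x * (p / class_size) for x, p in zip(scores, counts))

print(class_size, weighted_total, round(mean_grade, 4))
```

The printed values should reproduce the last row of Table 11.1 (76 students, a weighted total of 2428, and a mean of 31.9474), and the two forms of the mean agree up to rounding.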
11.6.5 Cumulative function
We can calculate a “running total”, as shown in Figure 11.6, where we plot, for each grade category, the total number of students whose grade was in that category or a lower one.
We define the cumulative function $F_i$ to be:
\[ F_i = \sum_{k=1}^{i} p_k . \]
Then $F_i$ is the number of students whose grade $x_k$ was between $x_1$ and $x_i$ ($x_1 \le x_k \le x_i$). Of course, when we add up all the way to the last category, we arrive at the total number of students in the class (assuming each student wrote the test and received a grade). Thus
\[ F_M = \sum_{k=1}^{M} p_k = N, \]
where, as before, $M$ stands for the number of “bins” used to represent the grade distribution. (Note that each student has been counted in one of the categories corresponding to the grade he or she achieved.) Another way of saying the same thing is that
\[ \sum_{k=1}^{M} \frac{p_k}{N} = 1. \]
In Figure 11.6 we show the cumulative function, i.e. we plot $F_i$ against $\tilde{x}_i$. Note that this graph is a step function. That is, the function takes on a set of discrete values with jumps at every 5th integer (see footnote 73).
[Figure 11.6: two panels plotting the cumulative function; horizontal axes from 0.0 to 50.0, vertical axes from 0.0 to 80.0. The bottom panel marks the 50% level and the median.]
Figure 11.6. Top: The same grade distribution as in Figure 11.5, but showing the cumulative function. The grid has been removed for easier visualization of that step function. Bottom: The cumulative function is used to determine an approximate median grade.
Footnote 73. Note: ideally, this graph should be discontinuous, with horizontal segments only. The vertical “jumps” cannot correspond to values of a function. However, the spreadsheet tool used to plot this function does not currently allow this graphing option.
11.6.6 The median
We can use the cumulative function and its features to come up with new ways of summarizing the distribution or comparing the performance of two sections. Suppose we subdivide a given class into exactly two equal groups based on performance on the test. Then there would be some grade that was achieved or surpassed by the top half of the class only; the rest of the students (i.e. the other half of the class) got scores below that level. We call that grade the median of the distribution.
To find the median grade using a cumulative function, we must ask what grade level corresponds to a cumulative 1/2 of the class, i.e. to $N/2$ students. To determine that level, we draw a horizontal line corresponding to $N/2$. As shown in Figure 11.6, because the cumulative function is discontinuous, we only have an approximate median of 30. We observe that the median is not in general equal to the mean computed earlier.
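The running total and the search for the half-way level can be sketched as follows. The data are copied from Table 11.1; the variable names and the rule “first bin whose running total reaches $N/2$” are choices made for this illustration rather than a procedure stated in the notes.

```python
from itertools import accumulate

# Bin representatives and counts copied from Table 11.1 (bins i = 1..10).
scores = [3, 8, 13, 18, 23, 28, 33, 38, 43, 48]
counts = [1, 2, 0, 5, 10, 8, 21, 19, 6, 4]

cumulative = list(accumulate(counts))   # F_i = p_1 + ... + p_i
class_size = cumulative[-1]             # F_M = N

# First bin at which the running total reaches half the class.
half = class_size / 2
median_bin = next(i for i, total in enumerate(cumulative) if total >= half)

print(cumulative)
print(scores[median_bin])   # representative score of the bin containing the median
```

With these data the half-way level $N/2 = 38$ is first reached in the bin $33 \pm 2$, whose lower edge sits at the jump near 30 in the cumulative graph, in rough agreement with the approximate median read from Figure 11.6.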
11.7 Factorial notation
Let $n$ be an integer, $n \ge 1$. Then $n!$, called “$n$ factorial”, is defined as the following product of integers:
\[ n! = n(n-1)(n-2) \cdots (2)(1). \]
Example
1! = 1
2! = 2 · 1 = 2
3! = 3 · 2 · 1 = 6
4! = 4 · 3 · 2 · 1 = 24
5! = 5 · 4 · 3 · 2 · 1 = 120
We also deﬁne
0! = 1
11.8 Appendix: Permutations and combinations
11.8.1 Permutations
A permutation is a way of arranging objects, where the order of appearance of the objects
is important.
[Figure 11.7: diagram with parts (a), (b), (c) showing $n$ distinct objects being placed into $n$ slots or $k$ slots; labels include $n!$, $P(n,k) = n!/(n-k)!$, $C(n,k)$ and $k!$.]
Figure 11.7. This diagram illustrates the meanings of permutations and combinations. (a) The number of permutations (ways of arranging) $n$ objects into $n$ slots. There are $n$ choices for the first slot, and for each of these, there are $n - 1$ choices for the second slot, etc. In total there are $n!$ ways of arranging these objects. (Note that the order of the objects is here important.) (b) The number of permutations of $n$ objects into $k$ slots, $P(n, k)$, is the product $n \cdot (n-1) \cdot (n-2) \cdots (n-k+1)$, which can also be written as a ratio of factorials. (c) The number of combinations of $n$ objects in groups of $k$ is called $C(n, k)$ (shown as the first arrow in part c). Here order is not important. The step shown in (b) is equivalent to the two steps shown in (c). This means that there is a relationship between $P(n, k)$ and $C(n, k)$, namely, $P(n, k) = k!\,C(n, k)$.
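The relationship $P(n, k) = k!\,C(n, k)$ can be checked on a small case. In the sketch below the values $n = 5$, $k = 3$ are hypothetical, chosen only for illustration; `math.perm` and `math.comb` are standard-library functions available in Python 3.8 and later.

```python
import math

# Hypothetical example values: arrange or choose k = 3 out of n = 5 objects.
n, k = 5, 3

permutations = math.factorial(n) // math.factorial(n - k)   # P(n, k) = n!/(n-k)!
combinations = permutations // math.factorial(k)            # C(n, k) = P(n, k)/k!

print(permutations, combinations)
# Cross-check against the standard-library helpers (Python 3.8+).
print(math.perm(n, k), math.comb(n, k))
```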
11.9 Appendix: Tests for convergence of series
In order for the sum of ‘inﬁnitely many things’ to add up to a ﬁnite number, the terms have
to get smaller. But just getting smaller is not, in itself, enough to guarantee convergence.
(We will show this later on by considering the harmonic series.)
There are rigorous mathematical tests which help determine whether a series converges or not. We discuss some of these tests here (see footnote 74).
Footnote 74. Recall that ⇒ means “implies that”. This is a one-way implication: A ⇒ B says that “A implies B” and cannot be used to conclude that B implies A. ⇔ means that each statement implies the other, a two-way implication. Just as it is important to “obey traffic signs” and avoid “driving the wrong way” on a one-way street, it is also important to be careful about the use of these mathematical statements.
11.9.1 The ratio test:
If $\sum_{k=0}^{\infty} a_k$ is a series with $a_k > 0$ and $\lim_{k\to\infty} \frac{a_{k+1}}{a_k} = L$, then
(a) $L < 1 \Rightarrow$ the series converges,
(b) $L > 1 \Rightarrow$ the series diverges,
(c) $L = 1 \Rightarrow$ the test is inconclusive.
Example 1: Reciprocal factorial series
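Recall that if $k > 0$ is an integer then the notation $k!$ (read “$k$ factorial”) means
\[ k! = k \cdot (k-1) \cdot (k-2) \cdots 3 \cdot 2 \cdot 1. \]
Consider the series
\[ S = \sum_{k=1}^{\infty} \frac{1}{k!} = 1 + \frac{1}{2 \cdot 1} + \frac{1}{3 \cdot 2 \cdot 1} + \dots + \frac{1}{k(k-1) \cdots 1} + \dots, \]
then
\[ a_{k+1} = \frac{1}{(k+1)!}, \qquad a_k = \frac{1}{k!}, \]
\[ L = \lim_{k\to\infty} \frac{a_{k+1}}{a_k} = \lim_{k\to\infty} \frac{\frac{1}{(k+1)!}}{\frac{1}{k!}} = \lim_{k\to\infty} \frac{k!}{(k+1)!} = \lim_{k\to\infty} \frac{1}{k+1} = 0. \]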
Thus L = 0, L < 1 so this series converges by the ratio test. Later, we will see a second
method (comparison) to arrive at the same conclusion.
Example 2: Harmonic series
Does the following converge?
\[ S = \sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{k} + \dots \]
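This series is the Harmonic Series. To apply the ratio test, we note that
\[ a_{k+1} = \frac{1}{k+1}, \qquad a_k = \frac{1}{k}, \]
\[ L = \lim_{k\to\infty} \frac{a_{k+1}}{a_k} = \lim_{k\to\infty} \frac{\frac{1}{k+1}}{\frac{1}{k}} = \lim_{k\to\infty} \frac{k}{k+1} = 1. \]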
Since L = 1, in this case, the test is inconclusive. In fact, we show in Section 10.4 that the
harmonic series diverges.
Example 3: Geometric series
Apply the ratio test to determine the condition for convergence of the geometric series,
\[ S = \sum_{k=0}^{\infty} r^k . \]
Here
\[ a_{k+1} = r^{k+1}, \qquad a_k = r^k, \qquad \frac{a_{k+1}}{a_k} = r, \]
\[ L = \lim_{k\to\infty} \frac{a_{k+1}}{a_k} = r. \]
So, by the ratio test, if $L = r < 1$ then the geometric series converges (confirming a fact we have already established).
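The three examples can be compared side by side by tabulating the ratio $a_{k+1}/a_k$ for increasing $k$. This is a minimal sketch with function names chosen here, and $r = 0.5$ is a hypothetical value picked for the geometric case; numerical ratios only suggest the limit $L$ and do not replace the calculations above.

```python
import math


def ratio(term, k):
    """Return a_{k+1} / a_k for a sequence given as a function of the index k."""
    return term(k + 1) / term(k)


def reciprocal_factorial(k):
    return 1.0 / math.factorial(k)


def harmonic(k):
    return 1.0 / k


def geometric(k, r=0.5):   # r = 0.5 is a hypothetical choice with 0 < r < 1
    return r ** k


for k in (1, 10, 100):
    print(k,
          ratio(reciprocal_factorial, k),   # equals 1/(k+1), tends to 0
          ratio(harmonic, k),               # equals k/(k+1), tends to 1
          ratio(geometric, k))              # equals r for every k
```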
11.9.2 Series comparison tests
We can sometimes use the convergence (or divergence) of a known series to conclude
whether a second series converges (or diverges).
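Suppose we have two series,
\[ S_a = \sum_{k=0}^{\infty} a_k \quad \text{and} \quad S_b = \sum_{k=0}^{\infty} b_k, \]
such that terms of one series are always smaller than terms of the other, i.e. satisfy
\[ 0 < a_k < b_k \quad \text{for all } k = 0, 1, \dots \]
Then
\[ \sum b_k \ \text{converges} \;\Rightarrow\; \sum a_k \ \text{converges}, \]
\[ \sum a_k \ \text{diverges} \;\Rightarrow\; \sum b_k \ \text{diverges}. \]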
The idea behind the first of these statements is that the “smaller” series $\sum a_k$ is “squeezed in” between 0 (the lower bound) and the sum of the larger series (which we know must exist, since $\sum b_k$ converges). This means that the smaller series cannot become unbounded. For the second statement, we have that the smaller of the two series is known to diverge, forcing the larger also to be unbounded. One must carefully observe that “⇒” applies only in one direction. (For example, if the smaller series converges, we cannot conclude anything about the larger series.)
Example: Comparison with geometric series
Does the series below converge or diverge?
\[ S = \sum_{k=0}^{\infty} \frac{1}{2^k + 1}. \]
Solution: We compare terms in this series to terms in a geometric series with $r = \frac{1}{2}$, i.e. consider
\[ a_k = \frac{1}{2^k + 1}, \qquad b_k = \frac{1}{2^k}. \]
Then clearly
\[ 0 < a_k < b_k \quad \text{for every } k \]
(since the denominator in $a_k$ is larger). But we know that $\sum \frac{1}{2^k}$ converges. Therefore, so does $\sum \frac{1}{2^k + 1}$.
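The “squeezing” in this example can be seen in the partial sums. The sketch below (names chosen here for illustration) tabulates partial sums of both series; those of $\sum a_k$ stay below those of the geometric series, which in turn stay below its sum, 2.

```python
def partial_sum(term, n):
    """Sum the terms a_0 + a_1 + ... + a_{n-1}."""
    return sum(term(k) for k in range(n))


def a(k):
    return 1.0 / (2 ** k + 1)   # terms of the series in question


def b(k):
    return 1.0 / 2 ** k         # geometric series with r = 1/2, whose sum is 2


for n in (5, 10, 20, 40):
    print(n, partial_sum(a, n), partial_sum(b, n))
```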
11.9.3 Alternating series
An alternating series is a series in which the signs of successive terms alternate. An example of this type is the series
\[ 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n}. \]
We will show that this series converges (essentially because terms nearly cancel out), and in fact, we show in Section 10.5.3 that it converges to the number $\ln(2) \approx 0.693$. More generally, we have the following result.
If $S$ is an alternating series,
\[ S = \sum_{k=1}^{\infty} (-1)^{k+1} a_k = a_1 - a_2 + a_3 - a_4 + \dots \]
with $a_k > 0$ and such that (1) $|a_1| \ge |a_2| \ge |a_3| \ge \dots$ etc. and (2) $\lim_{k\to\infty} a_k = 0$, then the series converges. (This was established by Leibniz in 1705.)
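For the alternating harmonic series mentioned above, the slow approach to $\ln(2)$ can be observed numerically. The sketch below uses a function name chosen here; it also prints the size of the next term, which for a series satisfying conditions (1) and (2) bounds the error of the partial sum (a standard fact about such series, not derived in these notes).

```python
import math


def alternating_harmonic_partial_sum(n):
    """Sum of (-1)^(k+1) / k for k = 1..n."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))


for n in (10, 100, 1000, 10000):
    s = alternating_harmonic_partial_sum(n)
    # The error of an alternating series of this type is at most the next term.
    print(n, s, abs(s - math.log(2)), 1.0 / (n + 1))
```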
11.10 Adding and multiplying series
We ﬁrst comment that arithmetic operations on inﬁnite series only make sense if the series
are convergent. In this discussion, we will deal only with series of the convergent type.
When this is true, then (and only then) is it true that we can exchange the order of operations
as discussed below.
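If $\sum a_k$ and $\sum b_k$ both converge and $\sum a_k = S$, $\sum b_k = T$, then
(a) $\sum (a_k + b_k)$ converges and $\sum (a_k + b_k) = \sum a_k + \sum b_k = S + T$.
(b) $\sum c\,a_k = c \sum a_k = cS$, where $c$ is any constant.
(c) The product $\left( \sum a_k \right) \cdot \left( \sum b_k \right) = \sum_{k=0}^{\infty} \sum_{i=0}^{k} a_i b_{k-i} = S \cdot T$ (this rearranged form requires that at least one of the two series converges absolutely, as is the case for convergent series with positive terms).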
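Example:
\[ \sum \frac{1}{2^k} \cdot \sum \frac{1}{3^j} = \left( 1 + \frac{1}{2} + \frac{1}{4} + \dots \right) \left( 1 + \frac{1}{3} + \frac{1}{9} + \dots \right). \]
Both series converge, so we can write
\[ \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \frac{1}{2^k} \frac{1}{3^j} = \frac{1}{1 - \frac{1}{2}} \cdot \frac{1}{1 - \frac{1}{3}} = 2 \cdot \frac{3}{2} = 3. \]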
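The product rule (c) can be tried out on this example by summing the terms $\sum_{i=0}^{k} a_i b_{k-i}$ for $k = 0, 1, \dots$ and watching the totals approach 3. This is a sketch only, with function names chosen here for illustration.

```python
def cauchy_product_partial_sum(a, b, n):
    """Sum over k = 0..n-1 of c_k, where c_k = sum_{i=0}^{k} a(i) * b(k - i)."""
    return sum(a(i) * b(k - i) for k in range(n) for i in range(k + 1))


def a(k):
    return 0.5 ** k          # terms 1/2^k, with sum S = 2


def b(k):
    return (1.0 / 3.0) ** k  # terms 1/3^k, with sum T = 3/2


for n in (5, 10, 20, 40):
    print(n, cauchy_product_partial_sum(a, b, n))
```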
11.11 Using series to solve a differential equation
Airy's equation arises in the study of optics, and (with initial conditions) is as follows:
\[ y'' = xy, \qquad y(0) = 1, \qquad y'(0) = 0. \]
As before, we will write the solution as a series:
\[ y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + \dots \]
Using the information from the initial conditions, we get $y(0) = a_0 = 1$ and $y'(0) = a_1 = 0$. Now we can write down the derivatives:
\[ y' = a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + 5a_5 x^4 + \dots \]
\[ y'' = 2a_2 + 2 \cdot 3 a_3 x + 3 \cdot 4 a_4 x^2 + 4 \cdot 5 a_5 x^3 + \dots \]
The equation then gives
\[ y'' = xy \]
\[ 2a_2 + 2 \cdot 3 a_3 x + 3 \cdot 4 a_4 x^2 + 4 \cdot 5 a_5 x^3 + \dots = x(a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots) \]
\[ 2a_2 + 2 \cdot 3 a_3 x + 3 \cdot 4 a_4 x^2 + 4 \cdot 5 a_5 x^3 + \dots = a_0 x + a_1 x^2 + a_2 x^3 + a_3 x^4 + \dots \]
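Again, we can equate the coefficients of like powers of $x$, and use $a_0 = 1$ and $a_1 = 0$, to obtain
\[ 2a_2 = 0 \;\Rightarrow\; a_2 = 0, \]
\[ 2 \cdot 3 a_3 = a_0 \;\Rightarrow\; a_3 = \frac{1}{2 \cdot 3}, \]
\[ 3 \cdot 4 a_4 = a_1 \;\Rightarrow\; a_4 = 0, \]
\[ 4 \cdot 5 a_5 = a_2 \;\Rightarrow\; a_5 = 0, \]
\[ 5 \cdot 6 a_6 = a_3 \;\Rightarrow\; a_6 = \frac{1}{2 \cdot 3 \cdot 5 \cdot 6}. \]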
This gives us the first few terms of the solution:
\[ y = 1 + \frac{x^3}{2 \cdot 3} + \frac{x^6}{2 \cdot 3 \cdot 5 \cdot 6} + \dots \]
If we continue in this way, we can write down many terms of the series.
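Continuing “in this way” amounts to applying the pattern visible in the equations above, $(n+2)(n+3)\,a_{n+3} = a_n$, together with $a_0 = 1$, $a_1 = 0$, $a_2 = 0$. The general recurrence is inferred here from the listed cases rather than stated in the notes, and the function name is chosen for illustration. The sketch uses exact fractions so the coefficients can be compared with the ones found by hand.

```python
from fractions import Fraction


def airy_series_coefficients(n_terms):
    """Coefficients a_0..a_{n_terms-1} of the series solution of y'' = x*y
    with y(0) = 1 and y'(0) = 0, using (n+2)(n+3) a_{n+3} = a_n and a_2 = 0."""
    a = [Fraction(1), Fraction(0), Fraction(0)]   # a_0, a_1, a_2
    for n in range(n_terms - 3):
        a.append(a[n] / ((n + 2) * (n + 3)))
    return a[:n_terms]


coefficients = airy_series_coefficients(10)
for n, a_n in enumerate(coefficients):
    if a_n != 0:
        print(n, a_n)
```

Only every third coefficient is nonzero, so the next term after $x^6/(2 \cdot 3 \cdot 5 \cdot 6)$ involves $x^9$.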
Index
3D
objects, 81
Abel’s theorem, 211
acceleration, 62
actin
cortex, 84
addition
principle, 140
age
distribution, 167
of death, 167
airways
surface area, 23
volume, 22
Airy’s equation, 241
alcohol
in the blood, 185
algorithm, 29
allele, 146, 165
alligator, 101
alternating series, 240
alveoli, 17
analytic, 214
approach, 29
annuity, 74
antidifferentiation, 49
antiderivative, 47, 110
table of, 49
applications
of integration, 61
approximation
left endpoint, 224
linear, 36, 200
right endpoint, 224
Archimedes, 4
area
as a function, 39
circle, 6
of planar region, 27
of simple shapes, 1
parallelogram, 2
polygon, 3
rectangle, 2
triangle, 2
true, 35
average, 234
mass density, 86
of probability distribution, 137
weighted, 234
average value
of a function, 76, 161
bacterial
motion, 150
balance
energy, 188
mass, 186
bank
interest rate, 74
bell
curve, 145
Bernoulli trial, 140
bifurcate, 18
bin, 166, 233
binomial
coefﬁcient, 142
distribution, 140, 143
theorem, 142
birth, 71, 178
blood alcohol, 185
branch
daughter, 18
parent, 18
bronchial tubes, 17
calculus
motivation for, xvii
carrying capacity, 191
center
of mass, 81, 85, 133, 137, 157, 228
centroid, 122
chain rule, 107
change
net, 66, 71
total, 66
chemical kinetics, 185
chromosomes, 146
circadian
rhythm, 72
cohort, 196, 197
coin
fair, 134
toss, 136, 165
combination, 237
comparison
integral and series, 206
integrals, 205
tests, 239
completing the square, 117
conservation
of energy, 188
of mass, 186
converge, 14
convergence, 199
of series, 200
tests for, 201, 206, 237
convergent, 15
coordinate
system, 28
critical point, 53
cumulative
function, 84, 136, 154, 155, 235
data, 133
set, 133
decay
radioactive, 162
rate, 185
deﬁnite
integral, 37, 43
density, 61, 82
probability, 153
dice, 139
differential, 107–109
equation, 177
notation, 107, 108
differential equation
linear, 184
nonlinear, 192
displacement, 62
distribution
binomial, 140
frequency, 233
Gaussian, 145
grade, 133, 137
normal, 145
uniform, 169, 174
diverge, 14, 201
divergence, 199
of series, 200
divergent, 15
dummy
variable, 40
emptying
container, 186
time, 191
endpoints, 30, 113
energy
balance, 188
conservation, 188
kinetic, 188
potential, 188
error
approximation, 216
Euler’s method, 184
evaluate
a function, 208
even
function, 51
expected value, 137
experiment, 134
cointoss, 137
repeated, 134
exponential, 35
decaying, 202
function, 214
growth, 19, 180
eye
color, 146, 147
factorial, 238
notation, 236
factoring
denominator, 117
failure, 140
fair
dice, 139
falling object, 181
ﬁrstorder
differential equation, 178
force
frictional, 181
of gravity, 181
formulae
areas, 25
volumes, 25
fractals, 18
frequency, 73, 136
friction, 181
frictional
coefﬁcient, 181
fulcrum, 226
function
bounded, 37
continuous, 37
even, 51
inverse, 53
Fundamental Theorem of Calculus, 40,
41, 43, 47, 62, 155, 216
Gauss, 11
formula, 11, 12
Gaussian
distribution, 145
gene, 146
genetics, 146
genotype, 146
geometric
series, 10, 209, 240
series, ﬁnite, 13
series,ﬁnite, 200
series,inﬁnite, 201
Gompertz, 196
grade
distribution, 137, 232
growth
density dependent, 191
exponential, 19, 75, 180
logistic, 191
population, 197
selfsimilar, 18
unlimited, 179, 191
growth rate
intrinsic, 191
per capita, 179
Hanoi
tower of, 8
HardyWeinberg, 146
harmonic
series, 201, 206, 211, 237
height
distribution, 166
higher order terms, 200
hormone
level of, 72
hypotenuse, 121
implication, 237
improper
integral, 58, 162–164, 203
income
stream, 74
induction, 221
mathematical, 12
inﬁnite
series, 14, 200
initial
condition, 179
initial value, 178
problem, 178, 179
integral, 110
applications of, 61
converges, 202, 204
deﬁnite, 31, 33, 37, 40, 43, 110
deﬁnite,properties of, 44
diverges, 202, 203
does not exist, 57, 121
exists, 163
improper, 58, 76, 162–164, 199, 202,
206
indeﬁnite, 110, 192
integrand, 110
integration, 33
by partial fractions, 124
by parts, 107, 124, 126
by substitution, 111
constant, 111
numerical, 162
interest
compounded, 75
rate, 74
inverse function, 53
inverse trigonometric functions, 121
keratocyte, 84
kinetic
energy, 188
Kulesa
Paul, 101
leaf
area of, 33
leaking
container, 186
Leibniz, 240
length
of curve, 81, 96
of straight line, 96
limit, 29
linear approximation, 36
logistic
equation, 191
growth, 191
lung
branching, 16
human, 22
Maple, 107
mass
balance, 186
conservation, 186
density, 82, 165
discrete, 165
mass distribution
continuous, 82
discrete, 82
Mathematica, 107
mating
table, 148
maximum, 53, 55
mean, 76, 133, 137, 153, 158, 161, 234
continuous probability, 154
decay time, 162, 164
of a distribution, 157
of a probability distribution, 106
of binomial distribution, 144
measurement, 133
median, 87, 133, 158, 161, 236
continuous probability, 154
decay time, 162, 164
micron(µm), 84
minimum, 53, 55
model
derivation of, 186
modeling, 177
Mogilner
Alex, 84
moment, 171
j’th, 139
ﬁrst, 172
of a distribution, 171
of distribution, 139
of mass, 227
second, 139, 172
zero’th, 172
mortality, 178
age distribution, 167
constant, 178
Gompertz law of, 196
nonconstant, 196
motion
uniform, 63
uniformly accelerated, 63
multiplication
principle, 140
Murray,James D., 101
net change, 67
Newton’s law, 177
nonlinear
differential equation, 189
normalization, 145, 155
constant, 155, 163
numerical
approach, 29
method, 184
observation, 133
ODE, 177
oscillation, 73
outcome
of experiment, 134
partial fractions, 118, 192, 231
partial sums, 15, 200
PDE, 177
pendulum, 177
perfect square, 117
period, 73
permutation, 142, 237
phenotype, 146
pi(π)
approximation for, 212
deﬁnition of, 5
polygon, 3
polynomials, 208
population
growth, 178, 197
sustainable, 195
potential
energy, 188
power
series, 199
Preface, xvii
present value, 75, 76
probability
applications of, 161
continuous, 153, 165
cumulative, 136
density, 154
discrete, 165
discrete, rules of, 135
empirical, 134, 136
symmetric, 160
theoretical, 135
product rule
for derivatives, 126
production, 71
progression
geometric, 20
mathematical, 19
Pythagorean
theorem, 96
triangle, 121
radioactive
decay, 162
radioactive decay
cumulative, 164
raindrops, 169
random
event, 134
variable, 135
walk, 150
random variable
continuous, 153
discrete, 153
rate
birth, 178
mortality, 178
of change, 67
production, 72
removal, 72
ratio
test, 238
rational
function, 124, 231
rectangle
height of, 30
rectangular strips, 28, 43
recursion relation, 19
removal, 71
replicate, 136
rescale, 145
Riemann
sum, 28–31, 33, 40
rule
chain, 116
rules
iterated, 23
savings account, 74
scaled
equation, 192
scaling, 192
secant, 230
separation
of variables, 65, 177–179, 182, 189,
197
series
p, 208
alternating, 240
comparison tests, 239
converges, 200
divergent, 238
diverges, 200, 208
ﬁnite geometric, 13
geometric, 10, 13, 200, 209, 239
harmonic, 201, 206, 211, 237, 238
inﬁnite, 14, 199, 200
operations on, 240
Taylor, 199, 209
term by term integration, 210
Sigma
notation, 9
size distribution, 169
sketching
antiderivative, 53
solids
of revolution, 90, 91
solution
curves, 190
of initial value problem, 179
qualitative, 68
quantitative, 68
to ODE, 180
spreadsheet, 23, 29, 190
standard deviation, 138, 172, 173
steady state, 66, 184, 195
step
function, 138, 235
strips
area of, 28
rectangular, 28, 43
substitution, 107, 111
examples, 113
trigonometric, 118, 123
success, 140
sum
geometric, 35
of N cubes, 12
of N integers, 11
of N squares, 12
of square integers, 32
Riemann, 29, 30, 40
summation
index, 9
notation, 9
sums
partial, 15, 200
surface area
cylinder, 6
survival
probability, 168
tangent line, 200
Taylor polynomial, 209
Taylor series, 199, 209
for cos(x), 216
for sin(x), 214
for e
x
, 213
teeth, 99
temperature, 67
terminal velocity, 180
torque, 226, 227
tree
growth, 68
structure, 18
trial
Bernoulli, 140
triangle
Pythagorean, 121
trigonometric, 120
trifurcate, 18
trigonometric
identities, 118
substitution, 118
unbiased, 134, 136
unbounded
function, 57
undeﬁned
function, 58
units, 7
variance, 138, 153, 172
continuous probability, 154
velocity, 62
terminal, 66, 180, 184
volume
cube, 6
cylinder, 7, 90
cylindrical shell, 7
disk, 90
of solids, 81
rectangular box, 6
shell, 90
simple shapes, 6
sphere, 7
spherical shell, 7
zygote, 147
Preface

Integral calculus arose originally to solve very practical problems that merchants, landowners, and ordinary people faced on a daily basis. Among such pressing problems were the following: How much should one pay for a piece of land? If that land has an irregular shape, i.e. is not a simple geometrical shape, how should its area (and therefore, its cost) be calculated? How much olive oil or wine are you getting when you purchase a barrel-full? Barrels come in a variety of shapes and sizes. If the barrel is not close to cylindrical, what is its volume (and thus, a reasonable price to pay)? In most such transactions, the need to accurately measure an area or a volume went well beyond the available results of geometry. (It was known how to compute areas of rectangles, triangles, and polygons. Volumes of cylinders and cubes were also known, but these were at best crude approximations to actual shapes and objects encountered in commerce.) This need motivated the development of the topic we now call integral calculus.

Essentially, the approach is based on the idea of “divide and conquer”: that is, cut up the geometric shape into smaller pieces, and approximate those pieces by regular shapes that can be quantified using simple geometry. In computing the area of an irregular shape, add up the areas of the (approximately regular) little parts in your “dissection”, to arrive at an approximation of the desired area of the shape. Depending on how fine the dissection (i.e. how many little parts), this approximation could be quite crude, or fairly accurate. The idea of applying a limit to obtain the true dimensions of the object was a flash of inspiration that led to modern day calculus. Similar ideas apply to computing the volume of a 3D object by successive subdivisions.

It is the aim of a calculus course to develop the language to deal with such concepts, to make such concepts systematic, and to find convenient and relevant shortcuts that can be used to solve a variety of problems that have common features. More than that, it is the purpose of this course to show that ideas developed in the original context of geometry (finding areas or volumes of 2D or 3D shapes) can be generalized and extended to a variety of applications that have little to do with geometry.

One area of application is that of computing total change given some time-dependent rate of change. We encounter many cases where a process changes at a rate that varies over time: the rate of production of hormone changes over a day, the rate of flow of water in a river changes over the seasons, or the rate of motion of a vehicle (i.e. its velocity) changes over its path. Computing the total change over some time span turns out to be closely related to the same underlying concept of “divide and conquer”: namely, subdivide (the time interval) and add up approximate changes over each of the smaller subintervals. The same idea applies to quantities that are distributed not in time but rather over space.
We show the connection between material that is spatially distributed in a nonuniform way (e.g. a density that varies from point to point) and total amount of material (obtained by the same process of integration).

A theme that unites much of the approach is that integral calculus has both analytic (i.e. pencil and paper) calculations, which apply to a limited set of cases, and analogous numerical (i.e. computer-enabled) calculations. The two go hand-in-hand, with concepts that are closely linked. A set of computer labs using a spreadsheet tool is an important part of this course. The importance of seeing calculus from these two distinct but related perspectives is stressed: on the one hand, analytic computations can be very powerful and helpful, but at the same time, many interesting problems are too challenging to be handled by integration techniques. Here is where the same ideas, used in the context of simple computer algorithms, come in handy. For this reason, the importance of understanding the concepts (not just the technical results, or the “formulae” for integrals) is vital: ideas used to develop the analytic techniques on which calculus is based can be adapted to develop good working methods for harnessing computer power to solve problems. This is particularly useful in cases where the analytic methods are not sufficient or too technically challenging.

This set of lecture notes grew out of many years of teaching of Mathematics 103. The material is organized as follows: In Chapter 1 we develop the basic formulae for areas and volumes of elementary shapes, and show how to set up summations that describe compound objects made up of many such shapes. An example to motivate these ideas is the volume and surface area of a branching structure. In Chapter 2, we turn attention to the classic problem of defining and computing the area of a two-dimensional region, leading to the notion of the definite integral. In Chapter 3, we discuss the linchpin of Integral Calculus, namely the Fundamental Theorem that connects derivatives and integrals. This allows us to find a great shortcut to the analytic computations described in Chapter 2. Applications of these ideas to calculating total change from rates of change, and to computing volumes and masses are discussed in Chapters 4 and 5.

To expand our reach to other cases, we discuss the techniques of integration in Chapter 6. Here, we find that the chain rule of calculus reappears (in the form of substitution integrals), and a variety of miscellaneous tricks are devised to simplify integrals. Among these, the most important is integration by parts, a technique that has independent applications in many areas of science. We study the ideas of probability in Chapters 7 and 8. Here we rediscover the connection between discrete sums and continuous integration, and apply the techniques to computing expected values for random variables. The connection between the mean (in probability) and the center of mass (of a density distributed in space) is illustrated.

Many scientific problems are phrased in terms of rules about rates of change. Quite often such rules take the form of differential equations. In an earlier differential calculus course, the student will have made acquaintance with the topic of such equations and qualitative techniques associated with interpreting their solutions. With the methods of integral calculus in hand, we can solve some types of differential equations analytically. This is discussed in Chapter 9.

The course concludes with the development of some notions of infinite sums and convergence in Chapter 10. Of prime importance, the Taylor series is developed and discussed in this concluding chapter.
Chapter 1
Areas, volumes and simple sums

1.1 Introduction

This introductory chapter has several aims. First, we collect here a number of basic formulae for areas and volumes that are used later in developing the notions of integral calculus. Among these are areas of simple geometric shapes and formulae for sums of certain common sequences. An important idea is introduced, namely that we can use the sum of areas of elementary shapes to approximate the areas of more complicated objects, and that the approximation can be made more accurate by a process of refinement.

We show using examples how such ideas can be used in calculating the volumes or areas of more complex objects. In particular, we conclude with a detailed exploration of the structure of branched airways in the lung as an application of ideas in this chapter.

1.2 Areas of simple shapes

One of the main goals in this course will be calculating areas enclosed by curves in the plane and volumes of three dimensional shapes. We will find that the tools of calculus will provide important and powerful techniques for meeting this goal. Some shapes are simple enough that no elaborate techniques are needed to compute their areas (or volumes). We briefly survey some of these simple geometric shapes and list what we know or can easily determine about their area or volume.

The areas of simple geometrical objects, such as rectangles, parallelograms, triangles, and circles are given by elementary formulae. Indeed, our ability to compute areas and volumes of more elaborate geometrical objects will rest on some of these simple formulae, summarized below.

Rectangular areas

Most integration techniques discussed in this course are based on the idea of carving up irregular shapes into rectangular strips. Thus, areas of rectangles will play an important part in those methods.
• The area of a rectangle with base $b$ and height $h$ is $A = b \cdot h$.

• Any parallelogram with height $h$ and base $b$ also has area $A = b \cdot h$. See Figure 1.1(a) and (b).

Figure 1.1. Planar regions whose areas are given by elementary formulae.

Areas of triangular shapes

A few illustrative examples in this chapter will be based on dissecting shapes (such as regular polygons) into triangles. The areas of triangles are easy to compute, and we summarize this review material below. However, triangles will play a less important role in subsequent integration methods.

• The area of a triangle can be obtained by slicing a rectangle or parallelogram in half, as shown in Figure 1.1(c) and (d). Thus, any triangle with base $b$ and height $h$ has area
\[ A = \frac{1}{2} b h. \]
• In some cases, the height of a triangle is not given, but can be determined from other information provided. For example, if the triangle has sides of length $b$ and $r$ with enclosed angle $\theta$, as shown in Figure 1.1(e), then its height is simply $h = r \sin(\theta)$, and its area is
\[ A = \frac{1}{2} b r \sin(\theta). \]

• If the triangle is isosceles, with two sides of equal length, $r$, and base of length $b$, as in Figure 1.1(f), then its height can be obtained from Pythagoras's theorem, i.e. $h^2 = r^2 - (b/2)^2$, so that the area of the triangle is
\[ A = \frac{1}{2} b \sqrt{r^2 - (b/2)^2}. \]

1.2.1 Example 1: Finding the area of a polygon using triangles: a "dissection" method

Using the simple ideas reviewed so far, we can determine the areas of more complex geometric shapes. For example, let us compute the area of a regular polygon with $n$ equal sides, where the length of each side is $b = 1$. This example illustrates how a complex shape (the polygon) can be dissected into simpler shapes, namely triangles¹.

Solution

The polygon has $n$ sides, each of length $b = 1$. We dissect the polygon into $n$ isosceles triangles, as shown in Figure 1.2. We do not know the heights of these triangles, but the angle $\theta$ can be found. It is $\theta = 2\pi/n$ since together, $n$ of these identical angles make up a total of 360° or $2\pi$ radians.

Figure 1.2. An equilateral n-sided polygon with sides of unit length can be dissected into n triangles. One of these triangles is shown at right. Since it can be further divided into two right triangles, trigonometric relations can be used to find the height $h$ in terms of the length of the base $1/2$ and the angle $\theta/2$.

¹ This calculation will be used again to find the area of a circle in Section 1.2.2. However, note that in later chapters, our dissections of planar areas will focus mainly on rectangular pieces.
Let $h$ stand for the height of one of the triangles in the dissected polygon. Then trigonometric relations relate the height to the base length as follows:
\[ \frac{\text{opp}}{\text{adj}} = \frac{b/2}{h} = \tan(\theta/2). \]
Using the fact that $\theta = 2\pi/n$, and rearranging the above expression, we get
\[ h = \frac{b}{2\tan(\pi/n)}. \]
Thus, the area of each of the $n$ triangles is
\[ A = \frac{1}{2} b h = \frac{1}{2} b \left( \frac{b}{2\tan(\pi/n)} \right). \]
The statement of the problem specifies that $b = 1$, so
\[ A = \frac{1}{2} \cdot \frac{1}{2\tan(\pi/n)}. \]
The area of the entire polygon is then $n$ times this, namely
\[ A_{n\text{-gon}} = \frac{n}{4\tan(\pi/n)}. \]
For example, the area of a square (a polygon with 4 equal sides, $n = 4$) is
\[ A_{\text{square}} = \frac{4}{4\tan(\pi/4)} = \frac{1}{\tan(\pi/4)} = 1, \]
where we have used the fact that $\tan(\pi/4) = 1$. As a second example, the area of a hexagon (6-sided polygon, i.e. $n = 6$) is
\[ A_{\text{hexagon}} = \frac{6}{4\tan(\pi/6)} = \frac{3}{2(1/\sqrt{3})} = \frac{3\sqrt{3}}{2}. \]
Here we used the fact that $\tan(\pi/6) = 1/\sqrt{3}$.

1.2.2 Example 2: How Archimedes discovered the area of a circle: dissect and "take a limit"

We learn early in school the formula for the area of a circle of radius $r$, $A = \pi r^2$. But how did this convenient formula come about, and how could we relate it to what we know about simpler shapes whose areas we have discussed so far? Here we discuss how this formula for the area of a circle was determined long ago by Archimedes using a clever "dissection" and approximation trick. We have already seen part of this idea in dissecting a polygon into triangles, in Section 1.2.1. Here we see a terrifically important second step that formed the "leap of faith" on which most of calculus is based, namely taking a limit as the number of subdivisions increases².

² This idea has important parallels with our later development of integration. Here it involves adding up the areas of triangles, and then taking a limit as the number of triangles gets larger. Later on, we do much the same, but using rectangles in the dissections.

First, we recall the definition of the constant $\pi$:
Definition of π

In any circle, $\pi$ is the ratio of the circumference to the diameter of the circle. (Comment: expressed in terms of the radius, this assertion states the obvious fact that the ratio of $2\pi r$ to $2r$ is $\pi$.)

Shown in Figure 1.3 is a sequence of regular polygons inscribed in the circle. As the number of sides of the polygon increases, its area gradually becomes a better and better approximation of the area inside the circle. Similar observations are central to integral calculus, and we will encounter this idea often. We can compute the area of any one of these polygons by dissecting into triangles. All triangles will be isosceles, since two sides are radii of the circle, whose length we'll call $r$.

Figure 1.3. Archimedes approximated the area of a circle by dissecting it into triangles.

Let $r$ denote the radius of the circle. Suppose that at one stage we have an $n$-sided polygon. (If we knew the side length of that polygon, then we would already have a formula for its area. However, this side length is not known to us. Rather, we know that the polygon should fit exactly inside a circle of radius $r$.) This polygon is made up of $n$ triangles, each one an isosceles triangle with two equal sides of length $r$ and base of undetermined length that we will denote by $b$. (See Figure 1.3.) The area of this triangle is
\[ A_{\text{triangle}} = \frac{1}{2} b h. \]
The area of the whole polygon, $A_n$, is then
\[ A_n = n \cdot (\text{area of triangle}) = n \, \frac{1}{2} b h = \frac{1}{2} (nb) h. \]
We have grouped terms so that $(nb)$ can be recognized as the perimeter of the polygon (i.e. the sum of the $n$ equal sides of length $b$ each). Now consider what happens when we increase the number of sides of the polygon, taking larger and larger $n$. Then the height of each triangle will get closer to the radius of the circle, and the perimeter of the polygon will get closer and closer to the perimeter of the circle, which is (by definition) $2\pi r$. That is, as $n \to \infty$,
\[ h \to r, \qquad (nb) \to 2\pi r, \]
so
\[ A = \frac{1}{2} (nb) h \to \frac{1}{2} (2\pi r) r = \pi r^2. \]
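The two dissection arguments above can be checked numerically. The following is a minimal Python sketch, not part of the original notes. The function names are hypothetical, and the expressions $b = 2r\sin(\pi/n)$ and $h = r\cos(\pi/n)$ for the base and height of each triangle of the inscribed polygon are standard trigonometry assumed here rather than taken from the text.

```python
import math


def unit_side_polygon_area(n: int) -> float:
    """Area of a regular n-gon with sides of length b = 1 (Section 1.2.1):
    A = n / (4 tan(pi/n))."""
    return n / (4.0 * math.tan(math.pi / n))


def inscribed_polygon_area(n: int, r: float = 1.0) -> float:
    """Area of a regular n-gon inscribed in a circle of radius r (Section 1.2.2).

    Assumption (not stated in the text): each isosceles triangle has
    base b = 2 r sin(pi/n) and height h = r cos(pi/n).
    """
    half_angle = math.pi / n              # theta/2, with theta = 2*pi/n
    b = 2.0 * r * math.sin(half_angle)    # base of one triangle
    h = r * math.cos(half_angle)          # height of one triangle
    return 0.5 * (n * b) * h              # (1/2) * perimeter * height


if __name__ == "__main__":
    print("square, hexagon:", unit_side_polygon_area(4), unit_side_polygon_area(6))
    for n in (4, 8, 16, 64, 1024):
        # The polygon area should creep up toward pi * r^2 as n grows.
        print(n, inscribed_polygon_area(n), math.pi)
```

Running the loop with larger and larger n is one way to watch the "take a limit" step happen: the printed areas increase toward the value of π for a circle of radius 1.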
We have used the notation "→" to mean that in the limit, as $n$ gets large, the quantity of interest "approaches" the value shown. This argument proves that the area of a circle must be
\[ A = \pi r^2. \]
One of the most important ideas contained in this little argument is that by approximating a shape by a larger and larger number of simple pieces (in this case, a large number of triangles), we get a better and better approximation of its area. This idea will appear again soon, but in most of our standard calculus computations, we will use a collection of rectangles, rather than triangles, to approximate areas of interesting regions in the plane.

Areas of other shapes

We collect here the area of a circle and of other shapes.

• The area of a circle of radius $r$ is $A = \pi r^2$.

• The surface area of a sphere of radius $r$ is $S_{\text{ball}} = 4\pi r^2$.

• The surface area of a right circular cylinder of height $h$ and base radius $r$ is $S_{\text{cyl}} = 2\pi r h$.

Units

The units of area can be meters² (m²), centimeters² (cm²), square inches, etc.

1.3 Simple volumes

Later in this course, we will also be computing the volumes of 3D shapes. As in the case of areas, we collect below some basic formulae for volumes of elementary shapes. These will be useful in our later discussions.

1. The volume of a cube of side length $s$ (Figure 1.4a) is $V = s^3$.

2. The volume of a rectangular box of dimensions $h$, $w$, $l$ (Figure 1.4b) is $V = hwl$.

3. The volume of a cylinder of base area $A$ and height $h$, as in Figure 1.4(c), is $V = Ah$. This applies for a cylinder with flat base of any shape, circular or not.
4. In particular, the volume of a cylinder with a circular base of radius $r$ (e.g. a disk) is $V = h(\pi r^2)$.

5. The volume of a sphere of radius $r$ (Figure 1.4d) is $V = \frac{4}{3}\pi r^3$.

6. The volume of a spherical shell (hollow sphere with a shell of some small thickness, $\tau$) is approximately
\[ V \approx \tau \cdot (\text{surface area of sphere}) = 4\pi \tau r^2. \]

7. Similarly, a cylindrical shell of radius $r$, height $h$ and small thickness, $\tau$, has volume given approximately by
\[ V \approx \tau \cdot (\text{surface area of cylinder}) = 2\pi \tau r h. \]

Figure 1.4. 3-dimensional shapes whose volumes are given by elementary formulae.

Units

The units of volume are meters³ (m³), centimeters³ (cm³), cubic inches, etc.
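The shell formulae in items 6 and 7 are approximations that improve as the thickness $\tau$ shrinks. As a rough illustration, the Python sketch below compares the spherical-shell approximation with an exact volume. It assumes, as a convention not fixed by the text, that the shell lies between radius $r$ and radius $r + \tau$. The function names are hypothetical.

```python
import math


def sphere_volume(r: float) -> float:
    """Volume of a sphere of radius r: (4/3) pi r^3."""
    return 4.0 / 3.0 * math.pi * r**3


def spherical_shell_exact(r: float, tau: float) -> float:
    """Exact volume between radii r and r + tau (assumed convention)."""
    return sphere_volume(r + tau) - sphere_volume(r)


def spherical_shell_approx(r: float, tau: float) -> float:
    """Approximation from item 6: thickness times surface area of the sphere."""
    return 4.0 * math.pi * tau * r**2


if __name__ == "__main__":
    r = 1.0
    for tau in (0.1, 0.01, 0.001):
        ratio = spherical_shell_exact(r, tau) / spherical_shell_approx(r, tau)
        # The ratio should move toward 1 as the shell gets thinner.
        print(tau, ratio)
```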
1.3.1 Example 3: The Tower of Hanoi: a tower of disks

In this example, we consider how elementary shapes discussed above can be used to determine volumes of more complex objects. The Tower of Hanoi is a shape consisting of a number of stacked disks. It is a simple calculation to add up the volumes of these disks, but if the tower is large, and comprised of many disks, we would want some shortcut to avoid long sums³.

(a) Compute the volume of a tower made up of four disks stacked up one on top of the other, as shown in Figure 1.5. Assume that the radii of the disks are 1, 2, 3, 4 units and that each disk has height 1.

(b) Compute the volume of a tower made up of 100 such stacked disks, with radii $r = 1, 2, \ldots, 99, 100$.

Figure 1.5. Computing the volume of a set of disks. (This structure is sometimes called the tower of Hanoi after a mathematical puzzle by the same name.)

Solution

(a) The volume of the four-disk tower is calculated as follows:
\[ V = V_1 + V_2 + V_3 + V_4, \]
where $V_i$ is the volume of the $i$'th disk whose radius is $r = i$, $i = 1, 2, \ldots, 4$. The height of each disk is $h = 1$, so
\[ V = (\pi 1^2) + (\pi 2^2) + (\pi 3^2) + (\pi 4^2) = \pi(1 + 4 + 9 + 16) = 30\pi. \]

(b) The idea will be the same, but we have to calculate
\[ V = \pi(1^2 + 2^2 + 3^2 + \ldots + 99^2 + 100^2). \]
It would be tedious to do this by adding up individual terms, and it is also cumbersome to write down the long list of terms that we will need to add up. This motivates inventing some helpful notation, and finding some clever way of performing such calculations.

³ Note that the idea of computing a volume of a radially symmetric 3D shape by dissection into disks will form one of the main themes in Chapter 5. Here, the sum of the volumes of disks is exactly the same as the volume of the tower. Later on, the disks will only approximate the true 3D volume, and a limit will be needed to arrive at a "true volume".
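Before any shortcut is developed, part (b) can also be handled by brute force on a computer. The Python sketch below simply adds up the disk volumes one at a time. The function name `tower_volume` is hypothetical, and the disk radii 1, 2, ..., n with a common height are the assumptions of the example.

```python
import math


def tower_volume(num_disks: int, height: float = 1.0) -> float:
    """Sum the volumes pi * r^2 * h of disks with radii r = 1, 2, ..., num_disks."""
    return sum(math.pi * radius**2 * height for radius in range(1, num_disks + 1))


if __name__ == "__main__":
    # Dividing by pi makes the result easy to compare with the hand calculation.
    print(tower_volume(4) / math.pi)
    print(tower_volume(100) / math.pi)
```

The loop is exactly the "tedious" sum described in the solution; the value printed for four disks can be compared with the $30\pi$ found in part (a).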
1.4 Summations and the "Sigma" notation

We introduce the following notation for the operation of summing a list of numbers:
\[ S = a_1 + a_2 + a_3 + \ldots + a_N \equiv \sum_{k=1}^{N} a_k. \]
The Greek symbol $\Sigma$ ("Sigma") indicates summation. The symbol $k$ used here is called the "index of summation" and it keeps track of where we are in the list of summands. The notation $k = 1$ that appears underneath $\Sigma$ indicates where the sum begins (i.e. which term starts off the series), and the superscript $N$ tells us where it ends. We will be interested in getting used to this notation, as well as in actually computing the value of the desired sum using a variety of shortcuts.

Example 4a: Summation notation

Suppose we want to form the sum of ten numbers, each equal to 1. We would write this as
\[ S = 1 + 1 + 1 + \ldots + 1 \equiv \sum_{k=1}^{10} 1. \]
The notation "$\ldots$" signifies that we have left out some of the terms (out of laziness, or in cases where there are too many to conveniently write down). We could have just as well written the sum with another symbol (e.g. $n$) as the index, i.e. the same operation is implied by
\[ \sum_{n=1}^{10} 1. \]
To compute the value of the sum we use the elementary fact that the sum of ten ones is just 10, so
\[ S = \sum_{k=1}^{10} 1 = 10. \]

Example 4b: Sum of squares

Expand and sum the following:
\[ S = \sum_{k=1}^{4} k^2. \]

Solution

\[ S = \sum_{k=1}^{4} k^2 = 1 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30. \]
(We have already seen this sum in part (a) of The Tower of Hanoi.)
Example 4c: Common factors

Add up the following list of 100 numbers (only a few of them are shown):
\[ S = 3 + 3 + 3 + 3 + \ldots + 3. \]

Solution

There are 100 terms, all equal, so we can take out a common factor:
\[ S = 3 + 3 + 3 + 3 + \ldots + 3 = \sum_{k=1}^{100} 3 = 3 \sum_{k=1}^{100} 1 = 3(100) = 300. \]

Example 4d: Finding the pattern

Write the following terms in summation notation:
\[ S = \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \frac{1}{81}. \]

Solution

We recognize that there is a pattern in the sequence of terms, namely, each one is 1/3 raised to an increasing integer power, i.e.
\[ S = \frac{1}{3} + \left(\frac{1}{3}\right)^2 + \left(\frac{1}{3}\right)^3 + \left(\frac{1}{3}\right)^4. \]
We can represent this with the "Sigma" notation as follows:
\[ S = \sum_{n=1}^{4} \left(\frac{1}{3}\right)^n. \]
The "index" $n$ starts at 1, and counts up through 2, 3, and 4, while each term has the form of $(1/3)^n$. This series is a geometric series, to be explored shortly. In most cases, a standard geometric series starts off with the value 1. We can easily modify our notation to include additional terms, for example:
\[ S = \sum_{n=0}^{5} \left(\frac{1}{3}\right)^n = 1 + \frac{1}{3} + \left(\frac{1}{3}\right)^2 + \left(\frac{1}{3}\right)^3 + \left(\frac{1}{3}\right)^4 + \left(\frac{1}{3}\right)^5. \]
Learning how to compute the sum of such terms will be important to us, and will be described later on in this chapter.

1.4.1 Manipulations of sums

Since addition is commutative and associative, and multiplication distributes over addition, sums of lists of numbers satisfy many convenient properties. We give a few examples below:
Example 5a: Simple operations

Simplify the following expression:
\[ \sum_{k=1}^{10} 2^k - \sum_{k=3}^{10} 2^k. \]

Solution

\[ \sum_{k=1}^{10} 2^k - \sum_{k=3}^{10} 2^k = (2 + 2^2 + 2^3 + \cdots + 2^{10}) - (2^3 + \cdots + 2^{10}) = 2 + 2^2. \]
The idea is that all but the first two terms in the first sum will cancel. The only remaining terms are those corresponding to $k = 1$ and $k = 2$. We could have arrived at this conclusion directly from
\[ \sum_{k=1}^{10} 2^k - \sum_{k=3}^{10} 2^k = \sum_{k=1}^{2} 2^k = 2 + 2^2 = 2 + 4 = 6. \]

Example 5b: Expanding

Expand the following expression:
\[ \sum_{n=0}^{5} (1 + 3^n). \]

Solution

\[ \sum_{n=0}^{5} (1 + 3^n) = \sum_{n=0}^{5} 1 + \sum_{n=0}^{5} 3^n. \]

1.5 Summation formulas

In this section we introduce a few examples of useful sums and give formulae that provide a shortcut to dreary calculations.

The sum of consecutive integers (Gauss' formula)

We first show that the sum of the first $N$ integers is:
\[ S = 1 + 2 + 3 + \ldots + N = \sum_{k=1}^{N} k = \frac{N(N+1)}{2}. \tag{1.1} \]
The following trick is due to Gauss. By aligning two copies of the above sum, one written backwards, we can easily add them up one by one vertically. We see that:
\[
\begin{array}{crcccccccccc}
 & S &=& 1 &+& 2 &+& \ldots &+& (N-1) &+& N \\
+ & S &=& N &+& (N-1) &+& \ldots &+& 2 &+& 1 \\ \hline
 & 2S &=& (1+N) &+& (1+N) &+& \ldots &+& (1+N) &+& (1+N)
\end{array}
\]
Thus, there are $N$ times the value $(N + 1)$ above, so that
\[ 2S = N(1 + N), \quad \text{so} \quad S = \frac{N(1+N)}{2}. \]
Thus, Gauss' formula is confirmed.

Example: Adding up the first 1000 integers

Suppose we want to add up the first 1000 integers. This formula is very useful in what would otherwise be a huge calculation. We find that
\[ S = 1 + 2 + 3 + \ldots + 1000 = \sum_{k=1}^{1000} k = \frac{1000(1 + 1000)}{2} = 500(1001) = 500500. \]

Two other useful formulae are those for the sums of consecutive squares and of consecutive cubes:

The sum of the first N consecutive square integers
\[ S_2 = 1^2 + 2^2 + 3^2 + \ldots + N^2 = \sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}. \tag{1.2} \]

The sum of the first N consecutive cube integers
\[ S_3 = 1^3 + 2^3 + 3^3 + \ldots + N^3 = \sum_{k=1}^{N} k^3 = \left( \frac{N(N+1)}{2} \right)^2. \tag{1.3} \]

In the Appendix, we show how the formula for the sum of square integers can be proved by a technique called mathematical induction.

1.5.1 Example 3, revisited: Volume of a Tower of Hanoi

Armed with the formula for the sum of squares, we can now return to the problem of computing the volume of a tower of 100 stacked disks of heights 1 and radii $r = 1, 2, \ldots, 99, 100$. We have
\[ V = \pi(1^2 + 2^2 + 3^2 + \ldots + 99^2 + 100^2) = \pi \sum_{k=1}^{100} k^2 = \pi \, \frac{100(101)(201)}{6} = 338{,}350\pi \text{ cubic units.} \]
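Formulae (1.1), (1.2) and (1.3) are easy to spot-check against direct summation. The Python sketch below is only such a check, not a proof (the notes use induction for that). The function names are hypothetical.

```python
def sum_integers(n: int) -> int:
    """Formula (1.1): 1 + 2 + ... + n = n(n+1)/2."""
    return n * (n + 1) // 2


def sum_squares(n: int) -> int:
    """Formula (1.2): 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6


def sum_cubes(n: int) -> int:
    """Formula (1.3): 1^3 + 2^3 + ... + n^3 = (n(n+1)/2)^2."""
    return (n * (n + 1) // 2) ** 2


if __name__ == "__main__":
    for n in (1, 4, 100, 1000):
        ks = range(1, n + 1)
        # Compare each closed form with the term-by-term sum.
        print(n,
              sum_integers(n) == sum(ks),
              sum_squares(n) == sum(k**2 for k in ks),
              sum_cubes(n) == sum(k**3 for k in ks))
```

Integer division (`//`) is used so that the results stay exact integers; this is safe here because each numerator is always divisible by its denominator.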
Examples: Evaluating the sums

Compute the following two sums:
\[ \text{(a)} \quad S_a = \sum_{k=1}^{20} (2 - 3k + 2k^2), \qquad \text{(b)} \quad S_b = \sum_{k=10}^{50} k. \]

Solutions

(a) We can separate this into three individual sums, each of which can be handled by algebraic simplification and/or use of the summation formulae developed so far.
\[ S_a = \sum_{k=1}^{20} (2 - 3k + 2k^2) = 2 \sum_{k=1}^{20} 1 - 3 \sum_{k=1}^{20} k + 2 \sum_{k=1}^{20} k^2. \]
Thus, we get
\[ S_a = 2(20) - 3\left(\frac{20(21)}{2}\right) + 2\left(\frac{(20)(21)(41)}{6}\right) = 5150. \]

(b) We can express the second sum as a difference of two sums:
\[ S_b = \sum_{k=10}^{50} k = \sum_{k=1}^{50} k - \sum_{k=1}^{9} k. \]
Thus
\[ S_b = \frac{50(51)}{2} - \frac{9(10)}{2} = 1275 - 45 = 1230. \]

1.6 Summing the geometric series

Consider a sum of terms that all have the form $r^k$, where $r$ is some real number and $k$ is an integer power. We refer to a series of this type as a geometric series. We have already seen one example of this type in a previous section. Below we will show that the sum of such a series is given by:
\[ S_N = 1 + r + r^2 + r^3 + \ldots + r^N = \sum_{k=0}^{N} r^k = \frac{1 - r^{N+1}}{1 - r} \tag{1.4} \]
where $r \neq 1$. We call this sum a (finite) geometric series. We would like to find an expression for sums of this form in the general case of any real number $r$, and finite number of terms $N$. First we note that there are $N + 1$ terms in this sum, so that if $r = 1$ then
\[ S_N = 1 + 1 + 1 + \ldots + 1 = N + 1 \]
(a total of $N + 1$ ones added). If $r \neq 1$ we have the following trick:
\[
\begin{array}{crcl}
 & S &=& 1 + r + r^2 + \ldots + r^N \\
- & rS &=& \phantom{1 + {}} r + r^2 + \ldots + r^N + r^{N+1}
\end{array}
\]
Subtracting leads to
\[ S - rS = (1 + r + r^2 + \ldots + r^N) - (r + r^2 + \ldots + r^N + r^{N+1}). \]
Most of the terms on the right hand side cancel, leaving
\[ S(1 - r) = 1 - r^{N+1}. \]
Now dividing both sides by $1 - r$ leads to
\[ S = \frac{1 - r^{N+1}}{1 - r}, \]
which was the formula to be established.

Example: Geometric series

Compute the following sum:
\[ S_c = \sum_{k=0}^{10} 2^k. \]

Solution

This is a geometric series:
\[ S_c = \sum_{k=0}^{10} 2^k = \frac{1 - 2^{10+1}}{1 - 2} = \frac{1 - 2048}{-1} = 2047. \]

1.7 Prelude to infinite series

So far, we have looked at several examples of finite series, i.e. series in which there are only a finite number of terms, $N$ (where $N$ is some integer). We would like to investigate how the sum of a series behaves when more and more terms of the series are included. It is evident that in many cases, such as Gauss's series (1.1), or sums of squared or cubed integers (e.g., Eqs. (1.2) and (1.3)), the series simply gets larger and larger as more terms are included. We say that such series diverge as $N \to \infty$. Here we will look specifically for series that converge, i.e. have a finite sum, even as more and more terms are included⁴.

⁴ Convergence and divergence of series is discussed in fuller depth in Chapter 10 in the context of Taylor Series. However, these concepts are so important that it was felt necessary to introduce some preliminary ideas early in the term.

Let us focus again on the geometric series and determine its behaviour when the number of terms is increased, i.e. when the series becomes an infinite series. Our goal is to find a way of attaching a meaning to the expression
\[ S = \sum_{k=0}^{\infty} r^k. \]
We will use the following definition:
Definition

An infinite series that has a finite sum is said to be convergent. Otherwise it is divergent.

Definition

Suppose that $S$ is an (infinite) series whose terms are $a_k$. Then the partial sums, $S_n$, of this series are
\[ S_n = \sum_{k=0}^{n} a_k. \]
We say that the sum of the infinite series is $S$, and write
\[ S = \sum_{k=0}^{\infty} a_k, \]
provided that
\[ S = \lim_{n \to \infty} \sum_{k=0}^{n} a_k. \]
That is, we consider the infinite series as the limit of the partial sums as the number of terms $n$ is increased. In this case we also say that the infinite series converges to $S$.

We will see that only under certain circumstances will infinite series have a finite sum, and we will be interested in exploring two questions:

1. Under what circumstances does an infinite series have a finite sum?

2. What value does the partial sum approach as more and more terms are included?

1.7.1 The infinite geometric series

In the case of a geometric series, the sum of the series, (1.4), depends on the number of terms in the series, $n$, via $r^{n+1}$. Whenever $r > 1$, or $r < -1$, this term will get bigger in magnitude as $n$ increases, whereas, for $0 < |r| < 1$, this term decreases in magnitude with $n$. We can say that
\[ \lim_{n \to \infty} r^{n+1} = 0 \quad \text{provided } |r| < 1. \]
These observations are illustrated by two specific examples below. This leads to the following conclusion:

The sum of an infinite geometric series,
\[ S = 1 + r + r^2 + \ldots + r^k + \ldots = \sum_{k=0}^{\infty} r^k, \]
exists provided $|r| < 1$ and is
\[ S = \frac{1}{1 - r}. \tag{1.5} \]

Examples of convergent and divergent geometric series are discussed below.
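The behaviour of the partial sums can also be explored numerically. The following Python sketch evaluates the finite sum both by formula (1.4) and term by term, and prints partial sums for one ratio with $|r| < 1$ and one with $|r| > 1$. The function names are hypothetical and the choice of ratios is only illustrative.

```python
def geometric_partial_sum(r: float, n: int) -> float:
    """S_n = 1 + r + ... + r**n using formula (1.4); r = 1 is handled separately."""
    if r == 1:
        return float(n + 1)          # n + 1 ones added together
    return (1 - r ** (n + 1)) / (1 - r)


def geometric_partial_sum_direct(r: float, n: int) -> float:
    """The same partial sum, computed by adding the terms one at a time."""
    return sum(r**k for k in range(n + 1))


if __name__ == "__main__":
    for r in (0.5, 2.0):
        print("r =", r)
        for n in (1, 5, 10, 20, 40):
            # For |r| < 1 the values settle near 1 / (1 - r); for |r| > 1 they keep growing.
            print("  n =", n, " S_n =", geometric_partial_sum(r, n))
```

Comparing the two functions for a few values of r and n is a quick sanity check that the cancellation argument behind (1.4) was carried out correctly.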
1.7.2 Example: A geometric series that converges

Consider the geometric series with $r = \frac{1}{2}$, i.e.
\[ S_n = 1 + \frac{1}{2} + \left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^3 + \ldots + \left(\frac{1}{2}\right)^n = \sum_{k=0}^{n} \left(\frac{1}{2}\right)^k. \]
Then
\[ S_n = \frac{1 - (1/2)^{n+1}}{1 - (1/2)}. \]
We observe that as $n$ increases, i.e. as we retain more and more terms, we obtain
\[ \lim_{n \to \infty} S_n = \lim_{n \to \infty} \frac{1 - (1/2)^{n+1}}{1 - (1/2)} = \frac{1}{1 - (1/2)} = 2. \]
In this case, we write
\[ \sum_{n=0}^{\infty} \left(\frac{1}{2}\right)^n = 1 + \frac{1}{2} + \left(\frac{1}{2}\right)^2 + \ldots = 2 \]
and we say that "the (infinite) series converges to 2".

1.7.3 Example: A geometric series that diverges

In contrast, we now investigate the case that $r = 2$: then the series consists of terms
\[ S_n = 1 + 2 + 2^2 + 2^3 + \ldots + 2^n = \sum_{k=0}^{n} 2^k = \frac{1 - 2^{n+1}}{1 - 2} = 2^{n+1} - 1. \]
We observe that as $n$ grows larger, the sum continues to grow indefinitely. In this case, we say that the sum does not converge, or, equivalently, that the sum diverges.

It is important to remember that an infinite series, i.e. a sum with infinitely many terms added up, can exhibit either one of these two very different behaviours. It may converge in some cases, as the first example shows, or diverge (fail to converge) in other cases. We will see examples of each of these trends again. It is essential to be able to distinguish the two. Divergent series (or series that diverge under certain conditions) must be handled with particular care, for otherwise, we may find contradictions or seemingly reasonable calculations that have meaningless results.

1.8 Application of geometric series to the branching structure of the lungs

In this section, we will compute the volume and surface area of the branched airways of lungs⁵. We use the summation formulae to arrive at the results, and we also illustrate how the same calculation could be handled using a simple spreadsheet.

⁵ This section provides an example of how to set up a biologically relevant calculation based on geometric series. It is further studied in the homework problems. A similar example is given as an exercise for the student in Lab 1 of this calculus course.
Figure 1. All segments are assumed to be cylindrical. their successive daughters “2”. The principle of this efﬁcient organ for oxygen exchange is that these very many small structures present a very large surface area. The bronchial tubes conduct air. labeled 0. We label the largest segment with index “0”. Most of the oxygen exchange takes place in tiny sacs called alveoli at the terminal branches of the airways passages. and its daughter segments with index “1”. and 2.6 shows only generations 0.8. The lungs. starting from the initial segment. In this section. the initial segment has undergone M subdivisions. we will compute this area and compare how it grows to the growth of the volume from one branching layer to the next. and many other biological “distribution systems” are composed of a branched structure. and on tools developed in this chapter. We assume that there are M “generations”. etc. 1.8. The index n refers to the branch generation. r0 Segment 0 l0 1 2 Figure 1.1 Assumptions • The airway passages consist of many “generations” of branched segments. It bifurcates into smaller segments. and so on down the structure from large to small branch segments. in humans. which then bifurcate further. and distribute it to the many smaller and smaller tubes that eventually lead to those alveoli. every segment is approximated as a cylinder of radius rn and length n . However. We will construct a simple mathematical model and explore its consequences.e.) • At each generation. for human lungs there can be up to 2530 generations of branching. Air passages in the lungs consist of a branched structure. 6 The surface area of the bronchial tubes does not actually absorb much oxygen. and so on. Oxygen from the air can diffuse across this area into the bloodstream very efﬁciently. with radius rn and length n in the n’th generation.1. length. i. as an example of summation. 
Application of geometric series to the branching structure of the lungs 17 Our lungs pack an amazingly large surface area into a conﬁned volume. . Based on these assumptions. (Typically. resulting in a geometric expansion in the number of branches. we apply geometric series to explore this branched structure of the lung. 1. their collective volume. The model will consist in some wellformulated assumptions about the way that “daughter branches” are related to their “parent branch”. we will then predict properties of the structure as a whole. The initial segment is quite large. We will be particularly interested in the volume V and the surface area S of the airway passages in the lungs6 .6.
• The number of branches grows along the "tree". On average, each parent branch produces $b$ daughter branches. In Figure 1.6, we have illustrated this idea for $b = 2$. A branched structure in which each branch produces two daughter branches is described as a bifurcating tree structure (whereas trifurcating implies $b = 3$). In real lungs, the branching is slightly irregular. Not every level of the structure bifurcates, but in general, averaging over the many branches in the structure, $b$ is smaller than 2. In fact, the rule that links the number of branches in generation $n$, here denoted $x_n$, with the number (of smaller branches) in the next generation, $x_{n+1}$, is
\[ x_{n+1} = b x_n. \tag{1.6} \]
We will assume, for simplicity, that $b$ is a constant. Since the number of branches is growing down the length of the structure, it must be true that $b > 1$. For human lungs, on average, $1 < b < 2$. Here we will take $b$ to be constant, i.e. $b = 1.7$. In actual fact, this simplification cannot be precise, because we have just one segment initially ($x_0 = 1$), and at level 1, the number of branches $x_1$ should be some small integer, not a number like "1.7". However, as in many mathematical models, some accuracy is sacrificed to get intuition. Later on, details that were missed and are considered important can be corrected and refined.

• The ratios of radii and lengths of daughters to parents are approximated by "proportional scaling". This means that the relationship of the radii and lengths satisfy simple rules: The lengths are related by
\[ \ell_{n+1} = \alpha \ell_n, \tag{1.7} \]
and the radii are related by
\[ r_{n+1} = \beta r_n, \tag{1.8} \]
with $\alpha$ and $\beta$ positive constants. For example, it could be the case that the radius of daughter branches is 1/2 or 2/3 that of the parent branch. Since the branches decrease in size (while their number grows), we expect that $0 < \alpha < 1$ and $0 < \beta < 1$.

Rules such as those given by equations (1.7) and (1.8) are often called self-similar growth laws. Such concepts are closely linked to the idea of fractals, i.e. theoretical structures produced by iterating such growth laws indefinitely. In a real biological structure, the number of generations is finite. (However, in some cases, a finite geometric series is well-approximated by an infinite sum.)

Actual lungs are not fully symmetric branching structures, but the above approximations are used here for simplicity. According to physiological measurements, the scale factors for sizes of daughter to parent size are in the range $0.65 \le \alpha, \beta \le 0.9$. (K. Horsfield, G. Dart, D. E. Olson, and G. Cumming, (1971) J. Appl. Phys. 31, 207-217.) For the purposes of this example, we will use the values of constants given in Table 1.1.

Table 1.1. Typical structure of branched airway passages in lungs.

  Quantity                                  Symbol    Value
  radius of first segment                   r_0       0.5 cm
  length of first segment                   ℓ_0       5.6 cm
  ratio of daughter to parent length        α         0.9
  ratio of daughter to parent radius        β         0.86
  number of branch generations              M         30
  average number daughters per parent       b         1.7

1.8.2 A simple geometric rule

The three equations that govern the rules for successive branching, i.e. equations (1.6), (1.7), and (1.8), are examples of a very generic "geometric progression" recipe. Before returning to the problem at hand, let us examine the implications of this recursive rule, when it is applied to generating the whole structure. Essentially, we will see that the rule linking two generations implies an exponential growth. To see this, let us write out a few first terms in the progression of the sequence $\{x_n\}$:

  initial value:     $x_0$
  first iteration:   $x_1 = b x_0$
  second iteration:  $x_2 = b x_1 = b(b x_0) = b^2 x_0$
  third iteration:   $x_3 = b x_2 = b(b^2 x_0) = b^3 x_0$
  ...

By the same pattern, at the $n$'th generation, the number of segments will be

  n'th iteration:    $x_n = b x_{n-1} = b(b x_{n-2}) = b(b(b x_{n-3})) = \ldots = \underbrace{(b \cdot b \cdots b)}_{n \text{ factors}} x_0 = b^n x_0$.

We have arrived at a simple, but important result, namely:

The rule linking two generations,
\[ x_n = b x_{n-1}, \tag{1.9} \]
implies that the $n$'th generation will have grown by a factor $b^n$, i.e.,
\[ x_n = b^n x_0. \tag{1.10} \]

This connection between the rule linking two generations and the resulting number of members at each generation is useful in other circumstances. Equation (1.9) is sometimes called a recursion relation, and its solution is given by equation (1.10). We will use the same idea to find the connection between the volumes, and surface areas of successive segments in the branching structure.
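The link between the recursion relation (1.9) and its solution (1.10) can be made concrete with a few lines of Python. The sketch below applies the rule $x_n = b x_{n-1}$ repeatedly and compares the outcome with $b^n x_0$. The function names are hypothetical, and $b = 1.7$, $x_0 = 1$ are the values assumed in the model.

```python
def branches_by_recursion(b: float, n: int, x0: float = 1.0) -> float:
    """Apply the rule x_n = b * x_(n-1) exactly n times, starting from x0."""
    x = x0
    for _ in range(n):
        x = b * x
    return x


def branches_closed_form(b: float, n: int, x0: float = 1.0) -> float:
    """The solution (1.10): x_n = b**n * x0."""
    return b**n * x0


if __name__ == "__main__":
    b = 1.7
    for n in (0, 1, 2, 3, 10, 30):
        # The two columns should agree up to floating-point rounding.
        print(n, branches_by_recursion(b, n), branches_closed_form(b, n))
```

The same loop structure works for the length rule (1.7) and the radius rule (1.8), with $\alpha$ or $\beta$ in place of $b$.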
1.8.3 Total number of segments

We use the result of Section 1.8.2 and the fact that there is one segment in the 0'th generation, i.e. $x_0 = 1$, to conclude that at the $n$'th generation, the number of segments is
\[ x_n = x_0 b^n = 1 \cdot b^n = b^n. \]
For example, if $b = 2$, the number of segments grows by powers of 2, so that the tree bifurcates with the pattern 1, 2, 4, 8, etc.

To determine how many branch segments there are in total, we add up over all generations, $0, 1, \ldots, M$. This is a geometric series, whose sum we can compute. Using equation (1.4), we find
\[ N = \sum_{n=0}^{M} b^n = \frac{1 - b^{M+1}}{1 - b}. \]
Given $b$ and $M$, we can then predict the exact number of segments in the structure. The calculation is summarized further on for values of the branching parameter, $b$, and the number of branch generations, $M$, given in Table 1.1.

1.8.4 Total volume of airways in the lung

Since each lung segment is assumed to be cylindrical, its volume is
\[ v_n = \pi r_n^2 \ell_n. \]
Here we mean just a single segment in the $n$'th generation of branches. (There are $b^n$ such identical segments in the $n$'th generation, and we will refer to the volume of all of them together as $V_n$ below.)

The length and radius of segments also follow a geometric progression. In fact, the same idea developed above can be used to relate the length and radius of a segment in the $n$'th generation to the length and radius of the original 0'th generation segment, namely,
\[ \ell_n = \alpha \ell_{n-1} \;\Rightarrow\; \ell_n = \alpha^n \ell_0, \]
and
\[ r_n = \beta r_{n-1} \;\Rightarrow\; r_n = \beta^n r_0. \]
Thus the volume of one segment in generation $n$ is
\[ v_n = \pi r_n^2 \ell_n = \pi (\beta^n r_0)^2 (\alpha^n \ell_0) = (\alpha\beta^2)^n \underbrace{(\pi r_0^2 \ell_0)}_{v_0}. \]
This is just a product of the initial segment volume $v_0 = \pi r_0^2 \ell_0$ with the $n$'th power of a certain factor of $\alpha$ and $\beta$. (That factor takes into account that both the radius and the length are being scaled down at every successive generation of branching.) Thus
\[ v_n = (\alpha\beta^2)^n v_0. \]
The total volume of all ($b^n$) segments in the $n$'th layer is
\[ V_n = b^n v_n = b^n (\alpha\beta^2)^n v_0 = \underbrace{(b\alpha\beta^2)^n}_{a^n} v_0. \]
Here we have grouped terms together to reveal the simple structure of the relationship: one part of the expression is just the initial segment volume, while the other is now a "scale factor" that includes not only changes in length and radius, but also in the number of branches. Letting the constant $a$ stand for that scale factor, $a = (b\alpha\beta^2)$ leads to the result that the volume of all segments in the $n$'th layer is
\[ V_n = a^n v_0. \]
The total volume of the structure is obtained by summing the volumes obtained at each layer. Since this is a geometric series, we can use the summation formula, i.e., Equation (1.4). Accordingly, with $M = 30$ generations, the total airways volume is
\[ V = \sum_{n=0}^{M} V_n = v_0 \sum_{n=0}^{M} a^n = v_0 \left( \frac{1 - a^{M+1}}{1 - a} \right). \]
The similarity of treatment with the previous calculation of number of branches is apparent. We compute the value of the constant $a$ in Table 1.2, and find the total volume in Section 1.8.6.

1.8.5 Total surface area of the lung branches

The surface area of a single segment at generation $n$, based on its cylindrical shape, is
\[ s_n = 2\pi r_n \ell_n = 2\pi (\beta^n r_0)(\alpha^n \ell_0) = (\alpha\beta)^n \underbrace{(2\pi r_0 \ell_0)}_{s_0}, \]
where $s_0$ is the surface area of the initial segment. Since there are $b^n$ branches at generation $n$, the total surface area of all the $n$'th generation branches is thus
\[ S_n = b^n (\alpha\beta)^n s_0 = \underbrace{(b\alpha\beta)^n}_{c^n} s_0, \]
where we have let $c$ stand for the scale factor $c = (b\alpha\beta)$. Thus,
\[ S_n = c^n s_0. \]
This reveals the similar nature of the problem. To find the total surface area of the airways, we sum up,
\[ S = s_0 \sum_{n=0}^{M} c^n = s_0 \left( \frac{1 - c^{M+1}}{1 - c} \right). \]
We compute the values of $s_0$ and $c$ in Table 1.2, and summarize final calculations of the total airways surface area in Section 1.8.6.
Quantity                                                  Symbol                        Value
volume of first segment                                   $v_0 = \pi r_0^2 \ell_0$      4.4 cm$^3$
surface area of first segment                             $s_0 = 2 \pi r_0 \ell_0$      17.6 cm$^2$
ratio of daughter to parent segment volume                $(\alpha \beta^2)$            0.66564
ratio of daughter to parent segment surface area          $(\alpha \beta)$              0.774
ratio of net volumes in successive generations            $a = b \alpha \beta^2$        1.131588
ratio of net surface areas in successive generations      $c = b \alpha \beta$          1.3158

Table 1.2. Volume, surface area, scale factors, and other derived quantities. Because $a$ and $c$ are bases that will be raised to large powers, it is important that their values are fairly accurate, so we keep more significant figures.

1.8.6 Summary of predictions for specific parameter values

By setting up the model in the above way, we have revealed that each quantity in the structure obeys a simple geometric series, but with distinct “bases” $b$, $a$ and $c$ and coefficients 1, $v_0$, and $s_0$. This approach has shown that the formula for geometric series applies in each case. Now it remains to merely “plug in” the appropriate quantities. In this section, we collect our results, and use the sample values for a model “human lung” given in Table 1.1, or the resulting derived scale factors and quantities in Table 1.2, to finish the task at hand.

Total number of segments

\[ N = \sum_{n=0}^{M} b^n = \frac{1 - b^{M+1}}{1 - b} = \frac{1 - (1.7)^{31}}{1 - 1.7} = 1.9898 \cdot 10^7 \approx 2 \cdot 10^7. \]

According to this calculation, there are a total of about 20 million branch segments overall (including all layers, from top to bottom) in the entire structure!

Total volume of airways

Using the values for $a$ and $v_0$ computed in Table 1.2, we find that the total volume of all segments, summed over all generations, is

\[ V = v_0 \sum_{n=0}^{30} a^n = v_0 \left( \frac{1 - a^{M+1}}{1 - a} \right) = 4.4 \, \frac{(1 - 1.131588^{31})}{(1 - 1.131588)} = 1510.3 \text{ cm}^3. \]

Recall that 1 litre = 1000 cm$^3$. Then we have found that the lung airways contain about 1.5 litres.
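The two totals above can be checked with a few lines of code. The following is a minimal sketch, not part of the original notes: it assumes the values b = 1.7 and M = 30 used in the text and the entries of Table 1.2, and the function names `geometric_sum` and `layer_by_layer` are hypothetical, chosen only for this illustration. It compares the closed-form geometric sum with a generation-by-generation tally.

```python
# Sketch: closed-form geometric sum versus a layer-by-layer tally.
# Parameter values are those quoted in the text (b = 1.7, M = 30, Table 1.2).

def geometric_sum(ratio, last_power):
    """Closed form for 1 + ratio + ... + ratio**last_power (ratio != 1)."""
    return (1 - ratio ** (last_power + 1)) / (1 - ratio)

def layer_by_layer(ratio, last_power, first_term=1.0):
    """Add the terms one generation at a time, as a spreadsheet column would."""
    total, term = 0.0, first_term
    for _ in range(last_power + 1):
        total += term      # accumulate this generation's contribution
        term *= ratio      # scale to obtain the next generation
    return total

b, M = 1.7, 30             # branching number and number of generations
v0, a = 4.4, 1.131588      # first-segment volume (cm^3) and volume scale factor

total_segments = geometric_sum(b, M)
total_volume = v0 * geometric_sum(a, M)

print(f"segments (closed form): {total_segments:.4e}")
print(f"volume (closed form):   {total_volume:.1f} cm^3")
print(f"volume (layer by layer): {layer_by_layer(a, M, v0):.1f} cm^3")
```

Both routes should agree up to rounding, which is the point of the geometric series formula: the loop does M + 1 additions, while the closed form needs a single power.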
Total surface area of airways

Using the values of $s_0$ and $c$ in Table 1.2, the total surface area of the tubes that make up the airways is

\[ S = s_0 \sum_{n=0}^{M} c^n = s_0 \left( \frac{1 - c^{M+1}}{1 - c} \right) = 17.6 \, \frac{(1 - 1.3158^{31})}{(1 - 1.3158)} = 2.76 \cdot 10^5 \text{ cm}^2. \]

There are 100 cm per meter, and $(100)^2 = 10^4$ cm$^2$ per m$^2$. Thus, the area we have computed is equivalent to about 28 square meters!

1.8.7 Exploring the problem numerically

Up to now, all calculations were done using the formulae developed for geometric series. However, sometimes it is more convenient to devise a computer algorithm to implement “rules” and perform repetitive calculations in a problem such as discussed here. The advantage of that approach is that it eliminates tedious calculations by hand, and, in cases where summation formulae are not known to us, reduces the need for analytical computations. It can also provide a shortcut to a visual summary of the results. The disadvantage is that it can be less obvious how each of the values of parameters assigned to the problem affects the final answers.

A spreadsheet is an ideal tool for exploring iterated rules such as those given in the lung branching problem [7]. In Figure 1.7 we show the volumes and surface areas associated with the lung airways for parameter values discussed above. Both layer by layer values and cumulative sums leading to total volume and surface area are shown in each of (a) and (c). In (b) and (d), we compare these results to similar graphs in the case that one parameter, the branching number, $b$, is adjusted from 1.7 (original value) to 2. The contrast between the graphs shows how such a small change in this parameter can significantly affect the results.

[7] See Lab 1 for a similar problem that is also investigated using a spreadsheet.

1.8.8 For further independent study

The following problems can be used for further independent exploration of these ideas.

1. In our model, we have assumed that, on average, a parent branch has only “1.7” daughter branches, i.e. that $b = 1.7$. Suppose we had assumed that $b = 2$. What would the total volume $V$ be in that case, keeping all other parameters the same? Explain why this is biologically impossible in the case $M = 30$ generations. For what value of $M$ would $b = 2$ lead to a reasonable result?

2. Suppose that the first 5 generations of branching produce 2 daughters each, but then from generation 6 on, the branching number is $b = 1.7$. How would you set up this variant of the model? How would this affect the calculated volume?

3. In the problem we explored, the net volume and surface area keep growing by larger and larger increments at each “generation” of branching. We would describe this as “unbounded growth”. Explain why this is the case, paying particular attention to the scale factors $a$ and $c$.
4. Suppose we want a set of tubes with a large surface area but small total volume. Which single factor or parameter should we change (and how should we change it) to correct this feature of the model, i.e. to predict that the total volume of the branching tubes remains roughly constant while the surface area increases as branching layers are added?

5. Determine how the branching properties of real human lungs differ from our assumed model, and use similar ideas to refine and correct our estimates. You may want to investigate what is known about the actual branching parameter $b$, the number of generations of branches, $M$, and the ratios of lengths and radii that we have assumed. Alternately, you may wish to find parameters for other species and do a comparative study of lungs in a variety of animal sizes.

6. Branching structures are ubiquitous in biology. Many species of plants are based on a regular geometric sequence of branching. Consider a tree that trifurcates (i.e. produces 3 new daughter branches per parent branch, $b = 3$). Explain (a) what biological problem is to be solved in creating such a structure, and (b) what sorts of constraints must be satisfied by the branching parameters to lead to a viable structure. This is an open-ended problem.

Figure 1.7. (a) $V_n$, the volume of layer $n$ (red bars), and the cumulative volume down to layer $n$ (yellow bars) are shown for parameters given in Table 1.1. (b) Same as (a) but assuming that parent segments always produce two daughter branches (i.e. $b = 2$). The graphs in (a) and (b) are shown on the same scale to accentuate the much more dramatic growth in (b). (c) and (d): same idea showing the surface area of the n’th layer (green) and the cumulative surface area to layer $n$ (blue) for original parameters (in c), as well as for the value $b = 2$ (in d).

1.9 Summary

In this chapter, we collected useful formulae for areas and volumes of simple 2D and 3D shapes. A summary of the most important ones is given below. Table 1.3 lists the areas of simple shapes, Table 1.4 the volumes and Table 1.5 the surface areas of 3D shapes.

We used areas of triangles to compute areas of more complicated shapes, including regular polygons. We used a polygon with $N$ sides to approximate the area of a circle, and then, by letting $N$ go to infinity, we were able to prove that the area of a circle of radius $r$ is $A = \pi r^2$. This idea, and others related to it, will form a deep underlying theme in the next two chapters and later on in this course.

We introduced some notation for series and collected useful formulae for summation of such series. These are summarized in Table 1.6. We will use these extensively in our next chapter.

Finally, we investigated geometric series and studied a biological application, namely the branching structure of lungs.

Object       dimensions            area, A
triangle     base b, height h      $\frac{1}{2} b h$
rectangle    base b, height h      $b h$
circle       radius r              $\pi r^2$

Table 1.3. Areas of planar regions
Object                dimensions                              volume, V
box                   base b, height h, width w               $h w b$
circular cylinder     radius r, height h                      $\pi r^2 h$
sphere                radius r                                $\frac{4}{3} \pi r^3$
cylindrical shell*    radius r, height h, thickness τ         $2 \pi r h \tau$
spherical shell*      radius r, thickness τ                   $4 \pi r^2 \tau$

Table 1.4. Volumes of 3D shapes. * Assumes a thin shell, i.e. small τ.

Object                dimensions                              surface area, S
box                   base b, height h, width w               $2(bh + bw + hw)$
circular cylinder     radius r, height h                      $2 \pi r h$
sphere                radius r                                $4 \pi r^2$

Table 1.5. Surface areas of 3D shapes

Sum                                   Notation                    Formula                               Comment
$1 + 2 + 3 + \ldots + N$              $\sum_{k=1}^{N} k$          $\frac{N(1+N)}{2}$                    Gauss’ formula
$1^2 + 2^2 + 3^2 + \ldots + N^2$      $\sum_{k=1}^{N} k^2$        $\frac{N(N+1)(2N+1)}{6}$              Sum of squares
$1^3 + 2^3 + 3^3 + \ldots + N^3$      $\sum_{k=1}^{N} k^3$        $\left(\frac{N(N+1)}{2}\right)^2$     Sum of cubes
$1 + r + r^2 + r^3 + \ldots + r^N$    $\sum_{k=0}^{N} r^k$        $\frac{1 - r^{N+1}}{1 - r}$           Geometric sum

Table 1.6. Useful summation formulae.
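The entries of Table 1.6 are easy to test by brute force for small N. The sketch below is an illustration only: the helper name `check_formulae` and the choice r = 1/2 are assumptions made here, and exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction

def check_formulae(N, r=Fraction(1, 2)):
    """Compare each closed form in Table 1.6 with a term-by-term sum.

    Returns four booleans: Gauss' formula, sum of squares, sum of cubes,
    and the geometric sum (for the illustrative ratio r, with r != 1).
    """
    ks = range(1, N + 1)
    gauss = sum(ks) == N * (N + 1) // 2
    squares = sum(k**2 for k in ks) == N * (N + 1) * (2 * N + 1) // 6
    cubes = sum(k**3 for k in ks) == (N * (N + 1) // 2) ** 2
    geometric = sum(r**k for k in range(N + 1)) == (1 - r ** (N + 1)) / (1 - r)
    return gauss, squares, cubes, geometric

for N in (1, 5, 10, 50):
    print(N, check_formulae(N))
```

A check of this kind does not prove the formulae, but it catches transcription slips such as a missing square in the sum of cubes.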
Chapter 2

Areas

2.1 Areas in the plane

A long-standing problem of integral calculus is how to compute the area of a region in the plane. This type of geometric problem formed part of the original motivation for the development of calculus techniques, and we will discuss it in many contexts in this course. We have already seen examples of the computation of areas of especially simple geometric shapes in Chapter 1. For triangles, rectangles, polygons, and circles, no advanced methods (beyond simple geometry) are needed. However, beyond these elementary shapes, such methods fail, and a new idea is needed. We will discuss such ideas in this chapter, and in Chapter 3.

Figure 2.1. We consider the problem of determining areas of regions bounded by the x axis, the lines $x = a$ and $x = b$, and the graph of some function, $y = f(x)$.

We now consider the problem of determining the area of a region in the plane that has the following special properties: the region is formed by straight lines on three sides, and by a smooth curve on one of its edges, as shown in Figure 2.1. You might imagine that the shaded portion of this figure is a plot of land bounded by fences on three sides, and by a river on the fourth side. A farmer wishing to purchase this land would want to know exactly how large an area is being acquired. Here we set up the calculation of that area.
More specifically, we use a cartesian coordinate system to describe the region: we require that it falls between the x-axis, the lines $x = a$ and $x = b$, and the graph of a function $y = f(x)$. This is required for the process described below to work [8]. We will first restrict attention to the case that $f(x) > 0$ for all points in the interval $a \le x \le b$ as we concentrate on “real areas”. Later, we generalize our results and lift this restriction.

[8] Not all planar areas have this property. Later examples indicate how to deal with some that do not.

We will approximate the area of the region shown in Figure 2.1 by dissecting it into smaller regions (rectangular strips) whose areas are easy to determine. We will refer to this type of procedure as a Riemann sum.

Figure 2.2. The function $y = x^2$ for $0 \le x \le 1$ is shown, with rectangles that approximate the area under its curve. As we increase the number of rectangular strips, the total area of the strips becomes a better and better approximation of the desired “true” area. Shown are the intermediate steps $N = 10$, $N = 20$, $N = 40$ and the true area for $N \to \infty$.

In Figure 2.2, we illustrate the basic idea using a region bounded by the function $y = f(x) = x^2$ on $0 \le x \le 1$. It can be seen that the approximation is fairly coarse when the number of rectangles is small [9]. However, if the number of rectangles is increased (as shown in subsequent panels of this same figure), we obtain a better and better approximation of the true area. In the limit as $N$, the number of rectangles, approaches infinity, the area of the desired region is obtained. This idea will form the core of this chapter. The reader will note a similarity with the idea we already encountered in obtaining the area of a circle, though in that context, we had used a dissection of the circle into approximating triangles.

[9] That is, the area of the rectangles is very different from the area of the region of interest.

With this idea in mind, in Section 2.2, we compute the area of the region shown in Figure 2.2 in two ways. First, we use a simple spreadsheet to do the computations for us. This is meant to illustrate the “numerical approach”. Then, as the alternate analytic approach, we set up the Riemann sum corresponding to the function shown in Figure 2.2. We will find that carefully setting up the calculation of areas of the approximating rectangles will be important. Making a cameo appearance in this calculation will be the formula for the sums of square integers developed in the previous chapter. A new feature will be the limit $N \to \infty$ that introduces the final step of arriving at the smooth region shown in the final panel of Figure 2.2.

2.2 Computing the area under a curve by rectangular strips

2.2.1 First approach: Numerical integration using a spreadsheet

The same tool that produces Figure 2.2 can be used to calculate the areas of the steps for each of the panels in the figure. To do this, we fix $N$ for a given panel (e.g. $N = 10$, 20, or 40), find the corresponding value of $\Delta x$, and set up a calculation which adds up the areas of steps, $x^2 \Delta x$, in a given panel. The ideas are analogous to those described in Section 2.2.2, but a spreadsheet does the number crunching for us.

Using a spreadsheet, for example, we find the following results at each stage: for $N = 10$ strips, the area is 0.3850 units$^2$; for $N = 20$ strips it is 0.3588; for $N = 40$ strips, the area is 0.3459. If we increase $N$ greatly, e.g. set $N = 1000$ strips, which begins to approximate the limit of $N \to \infty$, then the area obtained is 0.3338 units$^2$ [10].

[10] Note that all these values are approximations, correct to 4 decimal places. Compare with the exact calculations in Section 2.2.2.

This example illustrates that areas can be computed “numerically”. Indeed, many of the laboratory exercises that accompany this course will be based on precisely this idea. The advantage of this approach is that it requires only elementary “programming”, i.e. the assembly of a simple algorithm (a set of instructions). Once assembled, we can use essentially the same algorithm to explore various functions, intervals, number of rectangles, etc. Lab 2 in this course will motivate the student to explore this numerical integration approach, and later labs will expand and generalize the idea to a variety of settings.

In our second approach, we set up the problem analytically. We will find that results are similar. However, we will get deeper insight by understanding what happens in the limit as the number of strips $N$ gets very large.
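The spreadsheet calculation described in Section 2.2.1 can be imitated in a few lines of code. This is a minimal sketch rather than the course's Lab 2 material: the function name `strip_areas` is hypothetical, and each list plays the role of one spreadsheet column.

```python
def strip_areas(N):
    """Total area of N right-endpoint strips under y = x**2 on 0 <= x <= 1."""
    dx = 1.0 / N                                # width of every strip
    xs = [k * dx for k in range(1, N + 1)]      # column 1: right endpoints x_k
    areas = [x**2 * dx for x in xs]             # column 2: strip areas x_k**2 * dx
    return sum(areas)                           # bottom cell: the column total

for N in (10, 20, 40, 1000):
    print(f"N = {N:5d}   area = {strip_areas(N):.6f}")
```

The printed totals can be compared with the four-decimal values quoted above; changing the expression `x**2` or the interval is all that is needed to explore other regions.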
2.2.2 Second approach: Analytic computation using Riemann sums

In this section we consider the detailed steps involved in analytically computing the area of the region bounded by the function

\[ y = f(x) = x^2, \qquad 0 \le x \le 1. \]

By this we mean that we use “pen-and-paper” calculations, rather than computational aids, to determine that area. We set up the rectangles (as shown, with detailed labeling, in Figure 2.3), determine the heights and areas of these rectangles, sum their total area, and then determine how this value behaves as the rectangles get more numerous (and thinner).

Figure 2.3. The region under the graph of $y = f(x)$ for $0 \le x \le 1$ will be approximated by a set of $N$ rectangles. A rectangle (shaded) has base width $\Delta x$ and height $f(x)$. Since $0 \le x \le 1$, and all the rectangles have the same base width, it follows that $\Delta x = 1/N$. In the panel on the right, the coordinates of base corners and two typical heights of the rectangles have been labeled. Here $x_0 = 0$, $x_N = 1$ and $x_k = k \Delta x$.

The interval of interest in this problem is $0 \le x \le 1$. Let us subdivide this interval into $N$ equal subintervals. Then each has width $1/N$. (We will refer to this width as $\Delta x$, as it forms a difference of successive x coordinates.) The coordinates of the endpoints of these subintervals will be labeled $x_0, x_1, \ldots, x_k, \ldots, x_N$, where the values $x_0 = 0$ and $x_N = 1$ are the endpoints of the original interval. Since the points are equally spaced, starting at $x_0 = 0$, the coordinate $x_k$ is just $k$ steps of size $1/N$ along the x axis, i.e. $x_k = k(1/N) = k/N$. In the right panel of Figure 2.3, some of these coordinates have been labeled. For clarity, we show only the first few points, together with a representative pair $x_{k-1}$ and $x_k$ inside the region.

Let us look more carefully at one of the rectangles. Suppose we look at the rectangle labeled $k$. Such a representative k’th rectangle is shown shaded in Figure 2.3. The height of this rectangle is determined by the value of the function, since one corner of the rectangle is “glued” to the curve. The choice shown in Figure 2.3 is to affix the right corner of each rectangle on the curve. This implies that the height of the k’th rectangle is obtained from substituting $x_k$ into the function, i.e. height $= f(x_k)$. The base of every rectangle is the same, i.e. base $= \Delta x = 1/N$. This means that the area of the k’th rectangle, shown shaded, is

\[ a_k = \text{height} \times \text{base} = f(x_k) \Delta x. \]

We now use three facts:

\[ f(x_k) = x_k^2, \qquad \Delta x = \frac{1}{N}, \qquad x_k = \frac{k}{N}. \]

Then the area of the k’th rectangle is

\[ a_k = \text{height} \times \text{base} = f(x_k) \Delta x = \underbrace{\left( \frac{k}{N} \right)^2}_{f(x_k)} \underbrace{\left( \frac{1}{N} \right)}_{\Delta x}. \]

A list of rectangles and their properties is shown in Table 2.1. This may help the reader to see the pattern that emerges in the summation. (In general this table is not needed in our work, and it is presented for this example only, to help visualize how heights of rectangles behave.)

rectangle (k)    right x coord ($x_k$)    height $f(x_k)$       area $a_k$
1                $(1/N)$                  $(1/N)^2$             $(1/N)^2 \Delta x$
2                $(2/N)$                  $(2/N)^2$             $(2/N)^2 \Delta x$
3                $(3/N)$                  $(3/N)^2$             $(3/N)^2 \Delta x$
...
k                $(k/N)$                  $(k/N)^2$             $(k/N)^2 \Delta x$
...
N                $(N/N) = 1$              $(N/N)^2 = 1$         $(1) \Delta x$

Table 2.1. The label, position, height, and area $a_k$ of each rectangular strip is shown above. Each rectangle has the same base width, $\Delta x = 1/N$. We approximate the area under the curve $y = f(x) = x^2$ by the sum of the values in the last column, i.e. the total area of the rectangles.

The total area of all rectangular strips (a sum of the values in the right column of Table 2.1) is

\[ A_{N \text{ strips}} = \sum_{k=1}^{N} a_k = \sum_{k=1}^{N} f(x_k) \Delta x = \sum_{k=1}^{N} \left( \frac{k}{N} \right)^2 \left( \frac{1}{N} \right). \qquad (2.1) \]

The expression shown in Eqn. (2.1) is a Riemann sum. A recurring theme underlying integral calculus is the relationship between Riemann sums and definite integrals, a concept introduced later on in this chapter.
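To see the pattern of Table 2.1 with actual numbers, the rows can be generated for a small value of N. The sketch below is illustrative only: the name `strip_table` and the choice N = 4 are assumptions made here, and exact fractions are used so that each entry appears in the form k/N.

```python
from fractions import Fraction

def strip_table(N):
    """Rows (k, x_k, f(x_k), a_k) of Table 2.1 for f(x) = x**2 on [0, 1]."""
    dx = Fraction(1, N)                  # common base width, 1/N
    rows = []
    for k in range(1, N + 1):
        x_k = k * dx                     # right endpoint of the k'th strip
        height = x_k ** 2                # f(x_k) = (k/N)**2
        rows.append((k, x_k, height, height * dx))
    return rows

rows = strip_table(4)
for k, x_k, height, area in rows:
    print(k, x_k, height, area)

total = sum(area for *_, area in rows)   # the Riemann sum (2.1) for N = 4
print("total area:", total)
```

Summing the last column reproduces the Riemann sum of Eqn. (2.1) for the chosen N.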
We now rewrite this sum in a more convenient form so that summation formulae developed in Chapter 1 can be used. In this sum, only the quantity $k$ changes from term to term. All other quantities are common factors, so that

\[ A_{N \text{ strips}} = \frac{1}{N^3} \sum_{k=1}^{N} k^2. \]

The formula (1.2) for the sum of square integers can be applied to the summation, resulting in

\[ A_{N \text{ strips}} = \frac{1}{N^3} \, \frac{N(N+1)(2N+1)}{6} = \frac{(N+1)(2N+1)}{6 N^2}. \qquad (2.2) \]

In the box below, we use Eqn. (2.2) to compute that approximate area for values of $N$ shown in the first three panels of Fig 2.2. Note that these are comparable to the values we obtained “numerically” in Section 2.2.1. (We plug in the value of $N$ into (2.2) and use a calculator to obtain the results below.)

If $N = 10$ strips (Figure 2.2a), the width of each strip is 0.1 unit. According to equation (2.2), the area of the 10 strips (shown in red) is

\[ A_{10 \text{ strips}} = \frac{(10 + 1)(2 \cdot 10 + 1)}{6 \cdot 10^2} = 0.385. \]

If $N = 20$ strips (Figure 2.2b), $\Delta x = 1/20 = 0.05$, and

\[ A_{20 \text{ strips}} = \frac{(20 + 1)(2 \cdot 20 + 1)}{6 \cdot 20^2} = 0.35875. \]

If $N = 40$ strips (Figure 2.2c), $\Delta x = 1/40 = 0.025$ and

\[ A_{40 \text{ strips}} = \frac{(40 + 1)(2 \cdot 40 + 1)}{6 \cdot 40^2} = 0.3459375. \]

We will define the true area under the graph of the function $y = f(x)$ over the given interval to be:

\[ A = \lim_{N \to \infty} A_{N \text{ strips}}. \]

This means that the true area is obtained by letting the number of rectangular strips, $N$, get very large (while the width of each one, $\Delta x = 1/N$, gets very small).

In the example discussed in this section, the true area is found by taking the limit as $N$ gets large in equation (2.2), i.e.,

\[ A = \lim_{N \to \infty} \frac{1}{N^2} \, \frac{(N+1)(2N+1)}{6} = \frac{1}{6} \lim_{N \to \infty} \frac{(N+1)(2N+1)}{N^2}. \]

To evaluate this limit, note that when $N$ gets very large, we can use the approximations $(N + 1) \approx N$ and $(2N + 1) \approx 2N$, so that (simplifying and cancelling common factors)

\[ \lim_{N \to \infty} \frac{(N+1)(2N+1)}{N^2} = \lim_{N \to \infty} \frac{(N)}{N} \, \frac{(2N)}{N} = 2. \]
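The approximations $(N + 1) \approx N$ and $(2N + 1) \approx 2N$ can be avoided by dividing each factor by $N$ before taking the limit. The following short derivation is offered as a supplementary check, not as part of the original argument:

\[ \frac{(N+1)(2N+1)}{N^2} = \left( \frac{N+1}{N} \right) \left( \frac{2N+1}{N} \right) = \left( 1 + \frac{1}{N} \right) \left( 2 + \frac{1}{N} \right) \;\longrightarrow\; (1)(2) = 2 \quad \text{as } N \to \infty. \]

Each term $1/N$ tends to zero, so the limit is exactly 2, in agreement with the value obtained from the approximations.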
The result is:

\[ A = \frac{1}{6} (2) = \frac{1}{3} \approx 0.333. \qquad (2.3) \]

Thus, the true area of the region (Figure 2.2d) is 1/3 units$^2$.

2.2.3 Comments

Many students who have had calculus before in high school ask “why do we bother with such tedious calculations, when we could just use integration?”. Indeed, our development of Riemann sums foreshadows and anticipates the idea of a definite integral, and in short order, some powerful techniques will help to shortcut such technical calculations. There are two reasons why we linger on Riemann sums. First, in order to understand integration adequately, we must understand the underlying “technology” and concepts; this proves vital in understanding how to use the methods, and when things can go wrong. It also helps to understand what integrals represent in applications that occur later on. Second, even though we will shortly have better tools for analytical calculations, the idea of setting up area approximations using rectangular strips is very similar to the way that the spreadsheet computations are designed. (However, the summation is handled automatically using the spreadsheet, and no “formulae” are needed.)

In Section 2.2.1, we gave only few details of the steps involved. The student will find that understanding the ideas of Section 2.2.2 will go hand-in-hand with understanding the numerical approach of Section 2.2.1.

The ideas outlined above can be applied to more complicated situations. In the next section we consider a practical problem in which a similar calculation is carried out.

2.3 The area of a leaf

Leaves act as solar energy collectors for plants. Hence, their surface area is an important property. In this section we use our techniques to determine the area of a rhododendron leaf, shown in Figure 2.4. For simplicity of treatment, we will first consider a function designed to mimic the shape of the leaf in a simple system of units: we will scale distances by the length of the leaf, so that its profile is contained in the interval $0 \le x \le 1$. We later ask how to modify this treatment to describe similarly curved leaves of arbitrary length and width, and leaves that are less symmetric.

As shown in Figure 2.4, a simple parabola, of the form

\[ y = f(x) = x(1 - x), \]

provides a convenient approximation to the top edge of the leaf. To check that this is the case, we observe that at $x = 0$ and $x = 1$, the curve intersects the x axis. At $0 < x < 1$, the curve is above the axis. Thus, the area between this curve and the x axis is one half of the leaf area.

We set up the computation of approximating rectangular strips as before, by subdividing the interval of interest into $N$ rectangular strips. We can set up the calculation systematically, as follows:

length of interval $= 1 - 0 = 1$
number of segments, $N$
width of rectangular strips, $\Delta x = \frac{1}{N}$
the k’th x value, $x_k = k \frac{1}{N} = \frac{k}{N}$
height of k’th rectangular strip, $f(x_k) = x_k (1 - x_k)$

Figure 2.4. In this figure we show how the area of a leaf can be approximated by rectangular strips.

The representative k’th rectangle is shown shaded in Figure 2.4. Its area is

\[ a_k = \text{base} \times \text{height} = \Delta x \cdot f(x_k) = \underbrace{\frac{1}{N}}_{\Delta x} \cdot \underbrace{\frac{k}{N} \left( 1 - \frac{k}{N} \right)}_{f(x_k)}. \]

The total area of these rectangular strips is:

\[ A_{N \text{ strips}} = \sum_{k=1}^{N} a_k = \sum_{k=1}^{N} \Delta x \cdot f(x_k) = \sum_{k=1}^{N} \frac{1}{N} \cdot \frac{k}{N} \left( 1 - \frac{k}{N} \right). \]
Simplifying the result (so we can use summation formulae) leads to:

\[ A_{N \text{ strips}} = \frac{1}{N} \sum_{k=1}^{N} \frac{k}{N} \left( 1 - \frac{k}{N} \right) = \frac{1}{N^2} \sum_{k=1}^{N} k - \frac{1}{N^3} \sum_{k=1}^{N} k^2. \]

Using the summation formulae (1.1) and (1.2) from Chapter 1 results in:

\[ A_{N \text{ strips}} = \frac{1}{N^2} \left( \frac{N(N+1)}{2} \right) - \frac{1}{N^3} \left( \frac{(2N+1) N (N+1)}{6} \right). \]

Simplifying, and regrouping terms, we get

\[ A_{N \text{ strips}} = \frac{1}{2} \left( \frac{N+1}{N} \right) - \frac{1}{6} \left( \frac{(2N+1)(N+1)}{N^2} \right). \]

This is the area for a finite number, $N$, of rectangular strips. As before, the true area is obtained as the limit as $N$ goes to infinity, i.e. $A = \lim_{N \to \infty} A_{N \text{ strips}}$. We obtain:

\[ A = \lim_{N \to \infty} \frac{1}{2} \left( \frac{N+1}{N} \right) - \lim_{N \to \infty} \frac{1}{6} \left( \frac{(2N+1)(N+1)}{N^2} \right) = \frac{1}{2} - \frac{1}{6} \cdot 2 = \frac{1}{6}. \]

For part of this expression, we have seen a similar calculation in Section 2.2. Taking the limit leads to

\[ A = \frac{1}{2} - \frac{1}{6} \cdot 2 = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}. \]

Thus the area of the entire leaf (twice this area) is 1/3.

Remark: The function in this example can be written as $y = x - x^2$. For part of this expression, we have seen a similar calculation in Section 2.2. This example illustrates an important property of sums, namely the fact that we can rearrange the terms into simpler expressions that can be summed individually. In the homework problems accompanying this chapter, we investigate how to describe leaves with arbitrary lengths and widths, as well as leaves with shapes that are tapered, broad, or less symmetric than the current example.

2.4 Area under an exponential curve

In the previous examples, we considered areas under curves described by simple quadratic functions. Each of these led to calculations in which sums of integers or square integers appeared. Here we demonstrate an example in which a geometric sum will be used. Recall that we derived Eqn. (1.4) in Chapter 1, for a finite geometric sum. We will find the area under the graph of the function $y = f(x) = e^{2x}$ over the interval between $x = 0$ and $x = 2$. In evaluating a limit in this example, we will also use the fact that the exponential function has a linear approximation as follows:

\[ e^z \approx 1 + z. \]
(See Linear Approximations in an earlier calculus course.)

As before, we subdivide the interval into $N$ pieces, each of width $2/N$. Proceeding systematically as before, we write

length of interval $= 2 - 0 = 2$
number of segments $= N$
width of rectangular strips, $\Delta x = \frac{2}{N}$
the k’th x value, $x_k = k \frac{2}{N} = \frac{2k}{N}$
height of k’th rectangular strip, $f(x_k) = e^{2 x_k} = e^{2(2k/N)} = e^{4k/N}$

We observe that the length of the interval (here 2) has affected the details of the calculation. As before, the area of the k’th rectangle is

\[ a_k = \text{base} \times \text{height} = \Delta x \times f(x_k) = \frac{2}{N} e^{4k/N}, \]

and the total area of all the rectangles is

\[ A_{N \text{ strips}} = \frac{2}{N} \sum_{k=1}^{N} e^{4k/N} = \frac{2}{N} \sum_{k=1}^{N} r^k = \frac{2}{N} \left( \sum_{k=0}^{N} r^k - r^0 \right), \]

where $r = e^{4/N}$. This is a finite geometric series. Because the series starts with $k = 1$ and not with $k = 0$, the sum is

\[ A_{N \text{ strips}} = \frac{2}{N} \left[ \frac{(1 - r^{N+1})}{(1 - r)} - 1 \right]. \]

After some simplification and using $r = e^{4/N}$, we find that

\[ A_{N \text{ strips}} = \frac{2}{N} e^{4/N} \, \frac{1 - e^4}{1 - e^{4/N}} = 2 \, \frac{1 - e^4}{N (e^{-4/N} - 1)}. \]

We need to determine what happens when $N$ gets very large. We can use the linear approximation

\[ e^{-4/N} \approx 1 - 4/N \]

to evaluate the limit of the term in the denominator, and we find that

\[ A = \lim_{N \to \infty} 2 \, \frac{1 - e^4}{N (e^{-4/N} - 1)} = \lim_{N \to \infty} 2 \, \frac{1 - e^4}{-N (1 + 4/N - 1)} = 2 \, \frac{e^4 - 1}{4} \approx 26.799. \]

2.5 Extensions and other examples

More general interval

To calculate the area under the curve $y = f(x) = x^2$ over the interval $2 \le x \le 5$ using $N$ rectangles, the width of each one would be $\Delta x = (5 - 2)/N = 3/N$ (i.e., length of
interval divided by $N$). Since the interval starts at $x_0 = 2$, and increments in units of $(3/N)$, the k’th coordinate is

\[ x_k = 2 + k(3/N) = 2 + (3k/N). \]

The area of the k’th rectangle is then

\[ a_k = f(x_k) \times \Delta x = [(2 + (3k/N))^2](3/N), \]

and this is to be summed over $k$. A similar algebraic simplification, summation formulae, and limit is needed to calculate the true area.

Other examples

In the Appendix 11.2 we discuss a number of other examples with several modifications: First, in Appendix 11.2.1, we show how to set up a Riemann sum for a more complicated quadratic function on a general interval, $a \le x \le b$. Second, we show how Riemann sums can be set up for left, rather than right endpoint approximations. The results are entirely analogous.

2.6 The definite integral

We now introduce a central concept that will form an important theme in this course, that of the definite integral. We begin by defining a new piece of notation relevant to the topic in this chapter, namely the area associated with the graph of a function. For a function $y = f(x) > 0$ that is bounded and continuous [11] on an interval $[a, b]$ (also written $a \le x \le b$), we define the definite integral,

\[ I = \int_a^b f(x) \, dx \qquad (2.4) \]

to be the area $A$ of the region under the graph of the function between the endpoints $a$ and $b$. See Figure 2.5.

[11] A function is said to be bounded if its graph stays between some pair of horizontal lines. It is continuous if there are no “breaks” in its graph.

Figure 2.5. The shaded area $A$ corresponds to the definite integral $I$ of the function $f(x)$ over the interval $a \le x \le b$.

2.6.1 Remarks

1. The definite integral is a number.
2. The value of the definite integral depends on the function, and on the two end points of the interval.

3. From previous remarks, we have a procedure to calculate the value of the definite integral by dissecting the region into rectangular strips, summing up the total area of the strips, and taking a limit as $N$, the number of strips, gets large. (The calculation may be non-trivial, and might involve sums that we have not discussed in our simple examples so far, but in principle the procedure is well-defined.)

2.6.2 Examples

We have calculated the areas of regions bounded by particularly simple functions. To practice notation, we write down the corresponding definite integral in each case. Note that in many of the examples below, we need no elaborate calculations, but merely use previously known or recently derived results, to familiarize the reader with the new notation just defined.

Figure 2.6. Examples (1-4) relate areas shown above to definite integrals.

Example (1)

The area under the function $y = f(x) = x$ over the interval $0 \le x \le 1$ is triangular, with base and height 1. The area of this triangle is thus $A = (1/2) \, \text{base} \times \text{height} = 0.5$ (Figure 2.6a). Hence,

\[ \int_0^1 x \, dx = 0.5. \]
Example (2)

In Section 2.2, we also computed the area under the function $y = f(x) = x^2$ on the interval $0 \le x \le 1$ and found its area to be 1/3 (see Eqn. (2.3) and Fig. 2.6(b)). Thus

\[ \int_0^1 x^2 \, dx = 1/3 \approx 0.333. \]

Example (3)

A constant function of the form $y = 1$ over an interval $2 \le x \le 4$ would produce a rectangular region in the plane, with base $(4 - 2) = 2$ and height 1 (Figure 2.6(c)). Thus

\[ \int_2^4 1 \, dx = 2. \]

Example (4)

The function $y = f(x) = 1 - x/2$ (Figure 2.6(d)) forms a triangular region with base 2 and height 1, thus

\[ \int_0^2 (1 - x/2) \, dx = 1. \]

2.7 The area as a function

In Chapter 3, we will elaborate on the idea of the definite integral and arrive at some very important connection between differential and integral calculus. Before doing so, we have to extend the idea of the definite integral somewhat, and thereby define a new function, $A(x)$.

Figure 2.7. We define a new function $A(x)$ to be the area associated with the graph of some function $y = f(x)$ from the fixed endpoint $a$ up to the endpoint $x$, where $a \le x \le b$.

We will investigate how the area under the graph of a function changes as one of the endpoints of the interval moves. We can think of this as a function that gradually changes
(i.e. the area accumulates) as we sweep across the interval (a, b) from left to right in Figure 2.1. The function A(x) represents the area of the region shown in Figure 2.7. Extending our definition of the definite integral, we might be tempted to use the notation
\[ A(x) = \int_a^x f(x) \, dx. \]
However, there is a slight problem with this notation: the symbol x is used in slightly confusing ways, both as the argument of the function and as the variable endpoint of the interval. To avoid possible confusion, we will prefer the notation
\[ A(x) = \int_a^x f(s) \, ds. \]
(or some symbol other than s used as a placeholder instead of x.) An analogue already seen is the sum
\[ \sum_{k=1}^{N} k^2 \]
where N denotes the “end” of the sum, and k keeps track of where we are in the process of summation. The symbol s, sometimes called a “dummy variable”, is analogous to the summation index k.

In the upcoming Chapter 3, we will investigate properties of this new “area function” A(x) defined above. This will lead us to the Fundamental Theorem of Calculus, and will provide new and powerful tools to replace the dreary summations that we had to perform in much of Chapter 2. Indeed, we are about to discover the amazing connection between a function, the area A(x) under its curve, and the derivative of A(x).
2.8 Summary
In this chapter, we showed how to calculate the area of a region in the plane that is bounded by the x axis, two lines of the form x = a and x = b, and the graph of a positive function y = f(x). We also introduced the terminology “definite integral” (Section 2.6) and the notation (2.4) to represent that area. One of our main efforts here focused on how to actually compute that area by the following set of steps:
• Subdivide the interval [a, b] into smaller intervals (width ∆x).
• Construct rectangles whose heights approximate the height of the function above the given interval.
• Add up the areas of these approximating rectangles. (Here we often used summation formulae from Chapter 1.) The resulting expression, such as Eqn. (2.1), for example, was denoted a Riemann sum.
• Find out what happens to this total area in the limit when the width ∆x goes to zero (or, in other words, when the number of rectangles N goes to infinity).
We showed both the analytic approach, using Riemann sums and summation formulae to ﬁnd areas, as well as numerical approximations using a spreadsheet tool to arrive at similar results. We then used a variety of examples to illustrate the concepts and arrive at computed areas. As a ﬁnal important point, we noted that the area “under the graph of a function” can itself be considered a function. This idea will emerge as particularly important and will lead us to the key concept linking the geometric concept of areas with the analytic properties of antiderivatives. We shall see this link in the Fundamental Theorem of Calculus, in Chapter 3.
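The four steps listed in this summary can be sketched in a few lines of code. The following is only an illustrative sketch, not the spreadsheet calculation used in the chapter: the function name riemann_sum, the choice of right endpoints, and the test function y = x² on 0 ≤ x ≤ 1 are assumptions made here for the example.

```python
def riemann_sum(f, a, b, n):
    """Right-endpoint Riemann sum of f on [a, b] using n strips of equal width."""
    dx = (b - a) / n                  # step 1: subdivide the interval
    total = 0.0
    for k in range(1, n + 1):
        x_k = a + k * dx              # right endpoint of the k-th strip
        total += f(x_k) * dx          # steps 2 and 3: rectangle area, added up
    return total


def square(x):
    return x * x


# Step 4: watch what happens as the number of strips grows.
for n in (10, 100, 1000):
    print(n, riemann_sum(square, 0.0, 1.0, n))
```

As n increases, the printed values settle toward 1/3, the area found analytically for y = x² on this interval.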
Chapter 3
The Fundamental Theorem of Calculus
In this chapter we will formulate one of the most important results of calculus, the Fundamental Theorem. This result will link together the notions of an integral and a derivative. Using this result will allow us to replace the technical calculations of Chapter 2 by much simpler procedures involving antiderivatives of a function.
3.1 The definite integral
In Chapter 2, we deﬁned the deﬁnite integral, I, of a function f (x) > 0 on an interval [a, b] as the area under the graph of the function over the given interval a ≤ x ≤ b. We used the notation
\[ I = \int_a^b f(x) \, dx \]
to represent that quantity. We also set up a technique for computing areas: the procedure for calculating the value of I is to write down a sum of areas of rectangular strips and to compute a limit as the number of strips increases:
\[ I = \int_a^b f(x) \, dx = \lim_{N \to \infty} \sum_{k=1}^{N} f(x_k) \, \Delta x, \qquad (3.1) \]
where N is the number of strips used to approximate the region, k is an index associated with the k’th strip, and $\Delta x = x_{k+1} - x_k$ is the width of the rectangle. As the number of strips increases (N → ∞), and their width decreases (∆x → 0), the sum becomes a better and better approximation of the true area, and hence, of the definite integral, I. Examples of such calculations (tedious as they were) formed the main theme of Chapter 2.

We can generalize the definite integral to include functions that are not strictly positive, as shown in Figure 3.1. To do so, note what happens as we incorporate strips corresponding to regions of the graph below the x axis: these are associated with negative values of the function, so that the quantity $f(x_k) \Delta x$ in the above sum would be negative for each rectangle in the “negative” portions of the function. This means that regions of the graph below the x axis will contribute negatively to the net value of I.
If we refer to A1 as the area corresponding to regions of the graph of f (x) above the x axis, and A2 as the total area of regions of the graph under the x axis, then we will ﬁnd that the value of the deﬁnite integral I shown above will be I = A1 − A2 . Thus the notion of “area under the graph of a function” must be interpreted a little carefully when the function dips below the axis.
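The relation I = A1 − A2 can be checked numerically. The sketch below is an illustration under stated assumptions: the helper signed_areas, the use of midpoint rectangles, and the test function f(x) = x on −1 ≤ x ≤ 2 are choices made here, not part of the notes. For this function the region above the axis is a triangle of area 2 and the region below the axis is a triangle of area 1/2.

```python
def signed_areas(f, a, b, n=100000):
    """Split a midpoint Riemann sum into the parts above and below the x axis.

    Returns (A1, A2): the total area of strips with f > 0 and the total
    (positive) area of strips with f < 0.
    """
    dx = (b - a) / n
    above = 0.0
    below = 0.0
    for k in range(n):
        x_mid = a + (k + 0.5) * dx    # midpoint of the k-th strip
        strip = f(x_mid) * dx         # signed area of the rectangle
        if strip >= 0:
            above += strip
        else:
            below += -strip
    return above, below


A1, A2 = signed_areas(lambda x: x, -1.0, 2.0)
I = A1 - A2                           # value of the definite integral
print(A1, A2, I)
```

The printed values are close to 2, 0.5 and 1.5, in agreement with I = A1 − A2.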
Figure 3.1. (a) If f(x) is negative in some regions, there are terms in the sum (3.1) that carry negative signs: this happens for all rectangles in parts of the graph that dip below the x axis. (b) This means that the definite integral $I = \int_a^b f(x) \, dx$ will correspond to the difference of two areas, A1 − A2, where A1 is the total area (dark) of positive regions and A2 is the total area (light) of negative portions of the graph. Properties of the definite integral: (c) illustrates Property 1. (d) illustrates Property 2.
3.2 Properties of the deﬁnite integral
The following properties of a definite integral stem from its definition, and the procedure for calculating it discussed so far. For example, the fact that summation satisfies the distributive property means that an integral will satisfy the same property. We illustrate some of these in Fig 3.1.

1. $\int_a^a f(x) \, dx = 0$,
2. $\int_a^c f(x) \, dx = \int_a^b f(x) \, dx + \int_b^c f(x) \, dx$,
3. $\int_a^b C f(x) \, dx = C \int_a^b f(x) \, dx$,
4. $\int_a^b (f(x) + g(x)) \, dx = \int_a^b f(x) \, dx + \int_a^b g(x) \, dx$,
5. $\int_a^b f(x) \, dx = -\int_b^a f(x) \, dx$.

Property 1 states that the “area” of a region with no width is zero. Property 2 shows how a region can be broken up into two pieces whose total area is just the sum of the individual areas. Properties 3 and 4 reflect the fact that the integral is actually just a sum, and so satisfies properties of simple addition. Property 5 is obtained by noting that if we perform the summation “in the opposite direction”, then we must replace the previous “rectangle width” given by $\Delta x = x_{k+1} - x_k$ by the new “width” which is of opposite sign: $x_k - x_{k+1}$. This accounts for the sign change shown in Property 5.

3.3 The area as a function

In Chapter 2, we investigated how the area under the graph of a function changes as one of the endpoints of the interval moves. We defined a function that represents the area under the graph of a function f, from some fixed starting point, a, to an endpoint x:
\[ A(x) = \int_a^x f(t) \, dt. \]
This endpoint is considered as a variable [12], i.e. we will be interested in the way that this area changes as the endpoint varies (Figure 3.2(a)). We will now investigate the interesting connection between A(x) and the original function, f(x).

We would like to study how A(x) changes as x is increased ever so slightly. Let ∆x = h represent some (very small) increment in x. (Caution: do not confuse h with height here. It is actually a step size along the x axis.) Then, according to our definition,
\[ A(x + h) = \int_a^{x+h} f(t) \, dt. \]

[12] Recall that the “dummy variable” t inside the integral is just a “place holder”, and is used to avoid confusion with the endpoint of the integral (x in this case). Also note that the value of A(x) does not depend in any way on t, so any letter or symbol in its place would do just as well.
Figure 3.2. When the right endpoint of the interval moves by a distance h, the area of the region increases from A(x) to A(x + h). This leads to the important Fundamental Theorem of Calculus, given in Eqn. (3.2).

In Figure 3.2(a)(b), we illustrate the areas represented by A(x) and by A(x + h), respectively. The difference between the two areas is a thin sliver (shown in Figure 3.2(c)) that looks much like a rectangular strip (Figure 3.2(d)). (Indeed, if h is small, then the approximation of this sliver by a rectangle will be good.) The height of this sliver is specified by the function f evaluated at the point x, i.e. by f(x), so that the area of the sliver is approximately f(x) · h. Thus,
\[ A(x + h) - A(x) \approx f(x) h \]
or
\[ \frac{A(x + h) - A(x)}{h} \approx f(x). \]
As h gets small, i.e. h → 0, we get a better and better approximation, so that, in the limit,
\[ \lim_{h \to 0} \frac{A(x + h) - A(x)}{h} = f(x). \]
The ratio above should be recognizable. It is simply the derivative of the area function, i.e.
\[ f(x) = \frac{dA}{dx} = \lim_{h \to 0} \frac{A(x + h) - A(x)}{h}. \qquad (3.2) \]
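The sliver argument can be tried out numerically. The sketch below is only an illustration: the helper names area and difference_quotient, the midpoint-rectangle approximation of A(x), and the sample choices f(t) = t², a = 0, x = 1.5 are all assumptions made here. It approximates A(x) by a sum of thin rectangles and then forms the ratio (A(x + h) − A(x))/h for shrinking h.

```python
def area(f, a, x, n=20000):
    """Approximate A(x), the area under f from a to x, with n midpoint rectangles."""
    dt = (x - a) / n
    return sum(f(a + (k + 0.5) * dt) for k in range(n)) * dt


def difference_quotient(f, a, x, h):
    """The ratio (A(x + h) - A(x)) / h: area of the sliver divided by its width."""
    return (area(f, a, x + h) - area(f, a, x)) / h


def f(t):
    return t * t


x = 1.5
for h in (0.1, 0.01, 0.001):
    # The ratio should approach the height of the graph at x, namely f(x).
    print(h, difference_quotient(f, 0.0, x, h), f(x))
```

For each smaller h the ratio moves closer to f(1.5) = 2.25, which is what the limit in the text predicts.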
We have just given a simple argument in support of an important result, called the Fundamental Theorem of Calculus, which is restated below.

3.4 The Fundamental Theorem of Calculus

3.4.1 Fundamental theorem of calculus: Part I

Let f(x) be a bounded and continuous function on an interval [a, b]. Let
\[ A(x) = \int_a^x f(t) \, dt. \]
Then for a < x < b,
\[ \frac{dA}{dx} = f(x). \]
In other words, this result says that A(x) is an “antiderivative” of the original function, f(x) [13].

Proof
See above argument, and Figure 3.2.

[13] We often write “antiderivative”, with no hyphen.

3.4.2 Example: an antiderivative

Recall the connection between functions and their derivatives. Consider the following two functions:
\[ g_1(x) = \frac{x^2}{2}, \qquad g_2(x) = \frac{x^2}{2} + 1. \]
Clearly, both functions have the same derivative:
\[ g_1'(x) = g_2'(x) = x. \]
We would say that x²/2 is an “antiderivative” of x and that (x²/2) + 1 is also an “antiderivative” of x. In fact, any function of the form
\[ g(x) = \frac{x^2}{2} + C, \]
where C is any constant, is also an “antiderivative” of x.

This example illustrates that adding a constant to a given function will not affect the value of its derivative, or, stated another way, antiderivatives of a given function are defined only up to some constant. We will use this fact shortly: if A(x) and F(x) are both antiderivatives of some function f(x), then A(x) = F(x) + C.
3.4.3 Fundamental theorem of calculus: Part II

Let f(x) be a continuous function on [a, b]. Suppose F(x) is any antiderivative of f(x). Then for a ≤ x ≤ b,
\[ A(x) = \int_a^x f(t) \, dt = F(x) - F(a). \]

Proof
From comments above, we know that a function f(x) could have many different antiderivatives that differ from one another by some additive constant. We are told that F(x) is an antiderivative of f(x). But from Part I of the Fundamental Theorem, we know that A(x) is also an antiderivative of f(x). It follows that
\[ A(x) = \int_a^x f(t) \, dt = F(x) + C, \quad \text{where C is some constant.} \qquad (3.3) \]
However, by property 1 of definite integrals,
\[ A(a) = \int_a^a f(t) \, dt = F(a) + C = 0. \]
Thus, C = −F(a). Replacing C by −F(a) in equation 3.3 leads to the desired result. Thus
\[ A(x) = \int_a^x f(t) \, dt = F(x) - F(a). \]

Remark 1: Implications
This theorem has tremendous implications, because it allows us to use a powerful new tool in determining areas under curves. Instead of the drudgery of summations in order to compute areas, we will be able to use a shortcut: find an antiderivative, evaluate it at the two endpoints a, b of the interval of interest, and subtract the results to get the area. In the case of elementary functions, this will be very easy and convenient.

Remark 2: Notation
We will often use the notation
\[ F(t) \Big|_a^x = F(x) - F(a) \]
to denote the difference in the values of a function at two endpoints.
3.5 Review of derivatives (and antiderivatives)

By remarks above, we see that integration is related to “antidifferentiation”. This motivates a review of derivatives of common functions. Table 3.1 lists functions f(x) and their derivatives f'(x) (in the first two columns) and functions f(x) and their antiderivatives F(x) in the subsequent two columns. These will prove very helpful in our calculations of basic integrals.

function f(x)    derivative f'(x)    |  function f(x)    antiderivative F(x)
Cx               C                   |  C                Cx
x^n              n x^(n-1)           |  x^m              x^(m+1)/(m+1)
sin(ax)          a cos(ax)           |  cos(bx)          (1/b) sin(bx)
cos(ax)          -a sin(ax)          |  sin(bx)          -(1/b) cos(bx)
tan(ax)          a sec^2(ax)         |  sec^2(bx)        (1/b) tan(bx)
e^(kx)           k e^(kx)            |  e^(kx)           e^(kx)/k
ln(x)            1/x                 |  1/x              ln(x)
arctan(x)        1/(1+x^2)           |  1/(1+x^2)        arctan(x)
arcsin(x)        1/sqrt(1-x^2)       |  1/sqrt(1-x^2)    arcsin(x)

Table 3.1. Common functions and their derivatives (on the left two columns) also result in corresponding relationships between functions and their antiderivatives (right two columns). In this table, we assume that m ≠ −1, b ≠ 0, k ≠ 0. Also, when using ln(x) as antiderivative for 1/x, we assume that x > 0.

As an example, consider the polynomial
\[ p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots \]
This polynomial could have many other terms (or even an infinite number of such terms, as we discuss much later, in Chapter 10). Its antiderivative can be found easily using the “power rule” together with the properties of addition of terms. Indeed, the antiderivative is
\[ F(x) = C + a_0 x + \frac{a_1}{2} x^2 + \frac{a_2}{3} x^3 + \frac{a_3}{4} x^4 + \dots \]
This can be checked easily by differentiation [14].

[14] In fact, it is very good practice to perform such checks.

3.6 Examples: Computing areas with the Fundamental Theorem of Calculus

3.6.1 Example 1: The area under a polynomial

Consider the polynomial
\[ p(x) = 1 + x + x^2 + x^3. \]
(Here we have taken the first few terms from the example of the last section with coefficients all set to 1.) Then, computing
\[ I = \int_0^1 p(x) \, dx \]
leads to
\[ I = \int_0^1 (1 + x + x^2 + x^3) \, dx = \left( x + \frac{1}{2} x^2 + \frac{1}{3} x^3 + \frac{1}{4} x^4 \right) \Big|_0^1 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} \approx 2.083. \]

3.6.2 Example 2: Simple areas

Determine the values of the following definite integrals by finding antiderivatives and using the Fundamental Theorem of Calculus:

1. $I = \int_0^1 x^2 \, dx$,
2. $I = \int_{-1}^1 (1 - x^2) \, dx$,
3. $I = \int_{-1}^1 e^{-2x} \, dx$,
4. $I = \int_0^{\pi} \sin\left(\frac{x}{2}\right) dx$.

Solutions

1. An antiderivative of f(x) = x² is F(x) = (x³/3), thus
\[ I = \int_0^1 x^2 \, dx = F(x) \Big|_0^1 = (1/3)(x^3) \Big|_0^1 = \frac{1}{3}(1^3 - 0) = \frac{1}{3}. \]
2. An antiderivative of f(x) = (1 − x²) is F(x) = x − (x³/3), thus
\[ I = \int_{-1}^1 (1 - x^2) \, dx = F(x) \Big|_{-1}^1 = \left( x - (x^3/3) \right) \Big|_{-1}^1 = \left( 1 - (1^3/3) \right) - \left( (-1) - ((-1)^3/3) \right) = 4/3. \]
See comment below for a simpler way to compute this integral.

3. An antiderivative of $e^{-2x}$ is $F(x) = (-1/2) e^{-2x}$. Thus,
\[ I = \int_{-1}^1 e^{-2x} \, dx = F(x) \Big|_{-1}^1 = (-1/2)(e^{-2x}) \Big|_{-1}^1 = (-1/2)(e^{-2} - e^{2}). \]

4. An antiderivative of sin(x/2) is F(x) = −cos(x/2)/(1/2) = −2 cos(x/2). Thus
\[ I = \int_0^{\pi} \sin\left(\frac{x}{2}\right) dx = -2\cos(x/2) \Big|_0^{\pi} = -2(\cos(\pi/2) - \cos(0)) = -2(0 - 1) = 2. \]

Comment: The evaluation of Integral 2 in the examples above is tricky only in that signs can easily get garbled when we plug in the endpoint at −1. However, we can simplify our work by noting the symmetry of the function f(x) = 1 − x² on the given interval. As shown in Fig 3.3, the areas to the right and to the left of x = 0 are the same for the interval −1 ≤ x ≤ 1. This stems directly from the fact that the function considered is even [15]. Thus, we can immediately write
\[ I = \int_{-1}^1 (1 - x^2) \, dx = 2 \int_0^1 (1 - x^2) \, dx = 2 \left( x - (x^3/3) \right) \Big|_0^1 = 2 \left( 1 - (1^3/3) \right) = 4/3. \]
Note that this calculation is simpler since the endpoint at x = 0 is trivial to plug in.

We state the general result we have obtained, which holds true for any function with even symmetry integrated on a symmetric interval about x = 0: If f(x) is an even function, then
\[ \int_{-a}^{a} f(x) \, dx = 2 \int_0^a f(x) \, dx. \qquad (3.4) \]

[15] Recall that a function f(x) is even if f(x) = f(−x) for all x. A function is odd if f(x) = −f(−x).

Figure 3.3. We can exploit the symmetry of the function f(x) = 1 − x² in the second integral of Examples 3.6.2. We can integrate over 0 ≤ x ≤ 1 and double the result.
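The four answers in Example 2 can be cross-checked against a direct numerical sum. The sketch below is one possible check, under assumptions made here: the helper name midpoint_integral and the use of midpoint rectangles (rather than the endpoint rectangles of Chapter 2) are illustrative choices. Each numerical value is compared with the value obtained from the antiderivative.

```python
import math


def midpoint_integral(f, a, b, n=20000):
    """Approximate the definite integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


# (label, integrand, lower limit, upper limit, value from the antiderivative)
examples = [
    ("x^2 on [0, 1]", lambda x: x ** 2, 0.0, 1.0, 1.0 / 3.0),
    ("1 - x^2 on [-1, 1]", lambda x: 1 - x ** 2, -1.0, 1.0, 4.0 / 3.0),
    ("exp(-2x) on [-1, 1]", lambda x: math.exp(-2 * x), -1.0, 1.0,
     -0.5 * (math.exp(-2) - math.exp(2))),
    ("sin(x/2) on [0, pi]", lambda x: math.sin(x / 2), 0.0, math.pi, 2.0),
]

for label, f, a, b, exact in examples:
    print(label, midpoint_integral(f, a, b), exact)

# Even symmetry, Eqn. (3.4): integrate 1 - x^2 over [0, 1] and double the result.
half = midpoint_integral(lambda x: 1 - x ** 2, 0.0, 1.0)
print(2 * half)
```

In each row the two printed numbers agree to several decimal places, and doubling the integral over 0 ≤ x ≤ 1 reproduces the value 4/3.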
3.6.3 Example 3: The area between two curves

The definite integral is an area of a somewhat special type of region, i.e., one bounded by an axis, two vertical lines (x = a and x = b) and the graph of a function. However, using additive (or subtractive) properties of areas, we can generalize to computing areas of other regions, including those bounded by the graphs of two functions.

(a) Find the area enclosed between the graphs of the functions y = x³ and $y = x^{1/3}$ in the first quadrant.
(b) Find the area enclosed between the graphs of the functions y = x³ and y = x in the first quadrant.
(c) What is the relationship of these two areas? What is the relationship of the functions y = x³ and $y = x^{1/3}$ that leads to this relationship between the two areas?

Figure 3.4. (Graphs of $y = x^{1/3}$, y = x and y = x³, with the regions A1 and A2 marked.) In Example 3, we compute the areas A1 and A2 shown above.

Solution

(a) The two curves, y = x³ and $y = x^{1/3}$, intersect at x = 0 and at x = 1 in the first quadrant. Thus the interval that we will be concerned with is 0 < x < 1. On this interval, $x^{1/3} > x^3$, so that the area we want to find can be expressed as:
\[ A_1 = \int_0^1 \left( x^{1/3} - x^3 \right) dx. \]
Thus,
\[ A_1 = \left( \frac{x^{4/3}}{4/3} - \frac{x^4}{4} \right) \Big|_0^1 = \frac{3}{4} - \frac{1}{4} = \frac{1}{2}. \]

(b) The two curves y = x³ and y = x also intersect at x = 0 and at x = 1 in the first quadrant, and on the interval 0 < x < 1 we have x > x³. The area can be represented as
\[ A_2 = \int_0^1 \left( x - x^3 \right) dx. \]
\[ A_2 = \left( \frac{x^2}{2} - \frac{x^4}{4} \right) \Big|_0^1 = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}. \]

(c) The area calculated in (a) is twice the area calculated in (b). The reason for this is that $x^{1/3}$ is the inverse of the function x³, which means geometrically that the graph of $x^{1/3}$ is the mirror image of the graph of x³ reflected about the line y = x. The line y = x therefore divides the region between the two curves into two mirror-image halves, each of area A2. Hence the area A1 between $y = x^{1/3}$ and y = x³ is twice as large as the area A2 between y = x and y = x³ calculated in part (b): A1 = 2 A2 (see Figure 3.4).

3.6.4 Example 4: Area of land

Find the exact area of the piece of land which is bounded by the y axis on the west, the x axis in the south, the lake described by the function y = f(x) = 100 + (x/100)² in the north and the line x = 1000 in the east.

Solution
The area is
\[ A = \int_0^{1000} \left( 100 + \left( \frac{x}{100} \right)^2 \right) dx = \int_0^{1000} \left( 100 + \frac{1}{10000} x^2 \right) dx. \]
Note that the multiplicative constant (1/10000) is not affected by integration. The result is
\[ A = 100 x \Big|_0^{1000} + \frac{x^3}{3} \Big|_0^{1000} \cdot \frac{1}{10000} = \frac{4}{3} \cdot 10^5. \]

3.7 Qualitative ideas

In some cases, we are given a sketch of the graph of a function, f(x), from which we would like to construct a sketch of the associated function A(x). This sketching skill is illustrated in the figures shown in this section.

Suppose we are given a function as shown in the top left hand panel of Figure 3.5. We would like to assemble a sketch of
\[ A(x) = \int_a^x f(t) \, dt \]
which corresponds to the area associated with the graph of the function f. As x moves from left to right, we show how the “area” accumulated along the graph gradually changes. (See A(x) in bottom panels of Figure 3.5.) We start with no area, at the point x = a (since, by definition A(a) = 0) and gradually build up to some net positive amount, but then we encounter a portion of the graph of f below the x axis, and this subtracts from the amount accrued. (Hence the graph of A(x) has a little peak that corresponds to the point at which f = 0.) Every time the function f(x) crosses the x axis, we see that A(x) has either a maximum or minimum value. This fits well with our idea of A(x) as the antiderivative of f(x): places where A(x) has a critical point coincide with places where dA/dx = f(x) = 0.
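The accumulation of area described in Section 3.7 can also be tabulated numerically, which gives the same picture as a hand sketch. The following is a sketch under assumptions made here: the helper area_function, the use of trapezoids instead of rectangles for each strip, and the sample function f(t) = cos(t) on 0 ≤ t ≤ π (which crosses the axis once, at π/2) are illustrative choices and do not come from the notes.

```python
import math


def area_function(f, a, b, n=1000):
    """Tabulate A(x), the accumulated signed area under f from a to x.

    Returns the grid points xs and the running totals, with A(a) = 0.
    Each strip is approximated by a trapezoid.
    """
    dx = (b - a) / n
    xs = [a + k * dx for k in range(n + 1)]
    areas = [0.0]                                  # Property 1: A(a) = 0
    for k in range(n):
        strip = 0.5 * (f(xs[k]) + f(xs[k + 1])) * dx
        areas.append(areas[-1] + strip)            # accumulate as x sweeps right
    return xs, areas


xs, areas = area_function(math.cos, 0.0, math.pi)

# Locate the peak of A(x); it should sit where f crosses the x axis.
k_max = max(range(len(areas)), key=lambda k: areas[k])
print(xs[k_max], areas[k_max])
```

The peak of the tabulated A(x) occurs near x = π/2, the point where cos changes sign from positive to negative, and A(x) decreases afterwards as negative strips subtract from the total.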
Figure 3.5. Given a function f(x), we here show how to sketch the corresponding “area function” A(x). (The relationship is that f(x) is the derivative of A(x).)

Sketching the function A(x) is thus analogous to sketching a function g(x) when we are given a sketch of its derivative g'(x). Recall that this was one of the skills we built up in learning the connection between functions and their derivatives in a first semester calculus course.

Remarks
The following remarks may be helpful in gaining confidence with sketching the “area” function $A(x) = \int_a^x f(t) \, dt$, from the original function f(x):

1. The endpoint of the interval, a, on the x axis indicates the place at which A(x) = 0. This follows from Property 1 of the definite integral, i.e. from the fact that $A(a) = \int_a^a f(t) \, dt = 0$.

2. Whenever f(x) is positive, A(x) is an increasing function. This follows from the fact that the area continues to accumulate as we “sweep across” positive regions of f(x).
3. Wherever f(x) changes sign, the function A(x) has a local minimum or maximum. This means that either the area stops increasing (if the transition is from positive to negative values of f), or else the area starts to increase (if f crosses from negative to positive values).

4. Since dA/dx = f(x) by the Fundamental Theorem of Calculus, it follows that (taking a derivative of both sides) d²A/dx² = f'(x). Thus, when f(x) has a local maximum or minimum (i.e. f'(x) = 0), it follows that A''(x) = 0. This means that at such points, the function A(x) would have an inflection point.

Figure 3.6. Given a function f(x) (top, solid line), we assemble a plot of the corresponding function $g(x) = \int_a^x f(t) \, dt$ (bottom, solid line). g(x) is an antiderivative of f(x). Whether f(x) is positive (+) or negative (−) in portions of its graph determines whether g(x) is increasing or decreasing over the given intervals. Places where f(x) changes sign correspond to maxima and minima of the function g(x). (Two such places are indicated by dotted vertical lines.) The box in the middle of the sketch shows configurations of tangent lines to g(x) based on the sign of f(x). Where f(x) = 0, those tangent lines are horizontal. The function g(x) is drawn as a smooth curve whose direction is parallel to the tangent lines shown in the box. While the function f(x) has many antiderivatives (e.g., dashed curve parallel to g(x)), only one of these satisfies g(a) = 0 as required by Property 1 of the definite integral. (See dashed vertical line at x = a.) This determines the height of the desired function g(x).

3.7.1 Example: sketching A(x)

Consider the f(x) whose graph is shown in the top part of Figure 3.7. Sketch the corresponding function $g(x) = \int_a^x f(t) \, dt$.

Solution
See Figure 3.7. Figure 3.6 shows in detail how to sketch the corresponding function $g(x) = \int_a^x f(t) \, dt$.

Figure 3.7. The original function, f(x), is shown above. The corresponding function g(x) is drawn below.

3.8 Some fine print

The Fundamental Theorem has a number of restrictions that must be satisfied before its results can be applied. In this section we look at some examples in which care must be used.
3.8.1 Function unbounded I

Consider the definite integral
\[ \int_0^2 \frac{1}{x} \, dx. \]
The function f(x) = 1/x is undefined at x = 0, and unbounded on any interval that contains the point x = 0. Hence, we cannot evaluate this integral using the Fundamental Theorem.

3.8.2 Function unbounded II

Consider the definite integral
\[ \int_{-1}^1 \frac{1}{x^2} \, dx. \]
This function is also undefined (and hence not continuous) at x = 0. The Fundamental Theorem of Calculus cannot be applied. Technically, although one can “go through the motions” of computing an antiderivative, evaluating it at both endpoints, and getting a numerical answer, the result so obtained would be simply wrong. We say that this integral does not exist.

3.8.3 Example: Function discontinuous or with distinct parts

Suppose we are given the integral
\[ I = \int_{-1}^2 |x| \, dx. \]
This function is actually made up of two distinct parts, namely
\[ f(x) = \begin{cases} x & \text{if } x > 0, \\ -x & \text{if } x < 0. \end{cases} \]
The integral I must therefore be split up into two parts, namely
\[ I = \int_{-1}^2 |x| \, dx = \int_{-1}^0 (-x) \, dx + \int_0^2 x \, dx. \]
We find that
\[ I = -\frac{x^2}{2} \Big|_{-1}^0 + \frac{x^2}{2} \Big|_0^2 = -\left( 0 - \frac{1}{2} \right) + \left( \frac{4}{2} - 0 \right) = 2.5. \]

3.8.4 Function undefined

Now let us examine the integral
\[ \int_{-1}^1 x^{1/2} \, dx. \]
We see that there is a problem here. Recall that $x^{1/2} = \sqrt{x}$. Hence, the function is not defined for x < 0 and the interval of integration is inappropriate. Hence, this integral does not make sense.

Figure 3.8. (Graph of y = |x| for −1 ≤ x ≤ 2.) In this example, to compute the integral over the interval −1 ≤ x ≤ 2, we must split up the region into two distinct parts.

3.8.5 Infinite domain (“improper integral”)

Consider the integral
\[ I = \int_0^b e^{-rx} \, dx, \]
where r > 0 and b > 0 are constants. Simple integration using the antiderivative in Table 3.1 (for k = −r) leads to the result
\[ I = \frac{e^{-rx}}{-r} \Big|_0^b = -\frac{1}{r} \left( e^{-rb} - e^0 \right) = \frac{1}{r} \left( 1 - e^{-rb} \right). \]
This is the area under the exponential curve between x = 0 and x = b. Now consider what happens when b, the upper endpoint of the integral, increases, so that b → ∞. Then the value of the integral becomes
\[ I = \lim_{b \to \infty} \int_0^b e^{-rx} \, dx = \lim_{b \to \infty} \frac{1}{r} \left( 1 - e^{-rb} \right) = \frac{1}{r} (1 - 0) = \frac{1}{r}. \]
(We used the fact that $e^{-rb} \to 0$ as b → ∞.) We have, in essence, found that
\[ I = \int_0^{\infty} e^{-rx} \, dx = \frac{1}{r}. \qquad (3.5) \]
An integral of the form (3.5) is called an improper integral. Even though the domain of integration of this integral is infinite, (0, ∞), observe that the value we computed is finite, so long as r > 0. Not all such integrals have a bounded finite value. Learning to distinguish between those that do and those that do not will form an important theme in Chapter 10.
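The limit behind Eqn. (3.5) can be watched numerically by evaluating the finite-b formula for larger and larger b. This is only a sketch: the function names truncated_integral and log_integral and the value r = 2 are assumptions made for illustration. For contrast, the second function uses the standard fact that the integral of 1/x from 1 to b equals ln(b), which is not discussed in this section; it is included here as an example of an integral over an infinite domain that does not settle to a finite value.

```python
import math


def truncated_integral(r, b):
    """Value of the integral of exp(-r x) from 0 to b, i.e. (1 - exp(-r b)) / r."""
    return (1.0 - math.exp(-r * b)) / r


def log_integral(b):
    """Value of the integral of 1/x from 1 to b, i.e. ln(b). Grows without bound."""
    return math.log(b)


r = 2.0
for b in (1, 5, 10, 50):
    # First column approaches 1/r = 0.5; second column keeps growing.
    print(b, truncated_integral(r, b), log_integral(b))
```

As b grows, the first value levels off at 1/r while the second continues to increase, which is the distinction between improper integrals that have a finite value and those that do not.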
Regions that need special treatment

So far, we have learned how to compute areas of regions in the plane that are bounded by one or more curves. In all our examples so far, the basis for these calculations rests on imagining rectangles whose heights are specified by one or another function. Up to now, all the rectangular strips we considered had bases (of width ∆x) on the x axis. In Figure 3.9 we observe an example in which it would not be possible to use this technique.

Figure 3.9. The area in the region shown here is best computed by integrating in the y direction. If we do so, we can use the curved boundary as a single function that defines the region. (Note that the curve cannot be expressed in the form of a function in the usual sense, y = f(x), but it can be expressed in the form of a function x = f(y).)

We are asked to find the area between the curve y² − y + x = 0 and the y axis. However, one and the same curve, y² − y + x = 0, forms the boundary from both the top and the bottom of the region. We are unable to set up a series of rectangles with bases along the x axis whose heights are described by this curve. This means that our definite integral (which is really just a convenient way of carrying out the process of area computation) has to be handled with care.

Let us consider this problem from a “new angle”: with rectangles based on the y axis, we can achieve the desired result. To do so, let us express our curve in the form x = g(y) = y − y². Then, placing our rectangles along the interval 0 < y < 1 on the y axis (each having base of width ∆y) leads to the integral
\[ I = \int_0^1 g(y) \, dy = \int_0^1 (y - y^2) \, dy = \left( \frac{y^2}{2} - \frac{y^3}{3} \right) \Big|_0^1 = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}. \]

3.9 Summary

In this chapter we first recapped the definition of the definite integral in Section 3.1, recalled its connection to an area in the plane under the graph of some function f(x), and examined its basic properties.

If one of the endpoints, x, of the integral is allowed to vary, the area it represents, A(x), becomes a function of x. Our construction in Figure 3.2 showed that there is a connection between the derivative A'(x) of the area and the function f(x). Indeed, we showed that A'(x) = f(x) and argued that this makes A(x) an antiderivative of the function f(x).
This important connection between integrals and antiderivatives is the crux of Integral Calculus, forming the Fundamental Theorem of Calculus. Its significance is that finding areas need not be as tedious and labored as the calculation of Riemann sums that formed the bulk of Chapter 2. Rather, we can take a shortcut using antidifferentiation.

Motivated by this very important result, we reviewed some common functions and derivatives, and used this to relate functions and their antiderivatives in Table 3.1. We used these antiderivatives to calculate areas in several examples. Finally, we extended the treatment to include qualitative sketches of functions and their antiderivatives.

As we will see in upcoming chapters, the ideas presented here have a much wider range of applicability than simple area calculations. Indeed, we will shortly show that the same concepts can be used to calculate net changes in continually varying processes, to compute volumes of various shapes, to determine displacement from velocity, mass from densities, as well as a host of other quantities that involve a process of accumulation. These ideas will be investigated in Chapters 4 and 5.
Chapter 4
Applications of the definite integral to velocities and rates

4.1 Introduction

In this chapter, we encounter a number of applications of the definite integral to practical problems. We will discuss the connection between acceleration, velocity and displacement of a moving object, a topic we visited in an earlier, Differential Calculus Course. Here we will show that the notion of antiderivatives and integrals allows us to deduce details of the motion of an object from underlying Laws of Motion. We will consider both uniform and accelerated motion, and recall how air resistance can be described, and what effect it induces.

An important connection is made in this chapter between a rate of change (e.g. rate of growth) and the total change (i.e. the net change resulting from all the accumulation and loss over a time span). We show that such examples also involve the concept of integration, which, fundamentally, is a cumulative summation of infinitesimal changes. This allows us to extend the utility of the mathematical tools to a variety of novel situations. We will see examples of this type in Sections 4.3 and 4.4.

Several other important ideas are introduced in this chapter. We encounter for the first time the idea of spatial density, and see that integration can also be used to “add up” the total amount of material distributed over space. In Section 5.2.2, this idea is applied to the density of cars along a highway. We also consider mass distributions and the notion of a center of mass.

Finally, we also show that the definite integral is useful for determining the average value of a function, as discussed in Section 4.6. In all these examples, the important step is to properly set up the definite integral that corresponds to the desired net change. Computations at this stage are relatively straightforward to emphasize the process of setting up the appropriate integrals and understanding what they represent.
4.2 Displacement, velocity and acceleration

Recall from our study of derivatives that for x(t) the position of some particle at time t, v(t) its velocity, and a(t) the acceleration, the following relationships hold:

\[ \frac{dx}{dt} = v, \qquad \frac{dv}{dt} = a. \]

(Velocity is the derivative of position and acceleration is the derivative of velocity.) This means that position is an antiderivative of velocity and velocity is an antiderivative of acceleration.

Since position, x(t), is an antiderivative of velocity, v(t), by the Fundamental Theorem of Calculus, it follows that over the time interval $T_1 \le t \le T_2$,

\[ \int_{T_1}^{T_2} v(t)\,dt = x(t)\Big|_{T_1}^{T_2} = x(T_2) - x(T_1). \qquad (4.1) \]

The quantity on the right hand side of Eqn. (4.1) is a displacement, i.e., the difference between the position at time $T_1$ and the position at time $T_2$. In the case that $T_1 = 0$, $T_2 = T$, we have

\[ \int_0^T v(t)\,dt = x(T) - x(0), \]

as the displacement over the time interval $0 \le t \le T$.

Similarly, since velocity is an antiderivative of acceleration, the Fundamental Theorem of Calculus says that

\[ \int_{T_1}^{T_2} a(t)\,dt = v(t)\Big|_{T_1}^{T_2} = v(T_2) - v(T_1). \qquad (4.2) \]

As above, we also have that

\[ \int_0^T a(t)\,dt = v(t)\Big|_0^T = v(T) - v(0) \]

is the net change in velocity between time 0 and time T (though this quantity does not have a special name).

4.2.1 Geometric interpretations

Suppose we are given a graph of the velocity v(t), as shown on the left of Figure 4.1. Then by the definition of the definite integral, we can interpret $\int_{T_1}^{T_2} v(t)\,dt$ as the "area" associated with the curve (counting positive and negative contributions) between the endpoints $T_1$ and $T_2$. Then according to the above observations, this area represents the displacement of the particle between the two times $T_1$ and $T_2$. Similarly, by previous remarks, the area under the curve a(t) is a geometric quantity that represents the net change in the velocity, as shown on the right of Figure 4.1.

Next, we consider two examples where either the acceleration or the velocity is constant. We use the results above to compute the displacements in each case.
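The statement that the signed area under the velocity curve equals the displacement can be checked numerically. The sketch below is only an illustration: the velocity v(t) = 3t² − 2t and its antiderivative x(t) = t³ − t² are hypothetical choices that do not appear in the notes, and the helper name `trapezoid` is likewise an assumption.

```python
def trapezoid(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for j in range(1, n):
        total += f(a + j * h)
    return total * h


def v(t):
    # Hypothetical velocity, chosen only for illustration.
    return 3 * t**2 - 2 * t


def x(t):
    # An antiderivative of v(t): the corresponding position.
    return t**3 - t**2


T1, T2 = 0.5, 2.0
area = trapezoid(v, T1, T2)      # left-hand side of Eqn. (4.1)
displacement = x(T2) - x(T1)     # right-hand side of Eqn. (4.1)
print(area, displacement)
```

The two printed numbers should agree to several decimal places. On the part of the interval where v(t) < 0 the area is counted as negative, as described above.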
[Figure 4.1: two panels, velocity v against t (left) and acceleration a against t (right), each with the region between $T_1$ and $T_2$ shaded.]

Figure 4.1. The total area under the velocity graph represents net displacement, and the total area under the graph of acceleration represents the net change in velocity over the interval $T_1 \le t \le T_2$.

4.2.2 Displacement for uniform motion

We first examine the simplest case that the velocity is constant, i.e. v(t) = v = constant. Then clearly, the acceleration is zero since a = dv/dt = 0 when v is constant. Thus, by direct antidifferentiation,

\[ \int_0^T v\,dt = vt\Big|_0^T = v(T - 0) = vT. \]

However, applying result (4.1) over the time interval $0 \le t \le T$ also leads to

\[ \int_0^T v\,dt = x(T) - x(0). \]

Therefore, the two expressions obtained above must be equal, i.e.

\[ x(T) - x(0) = vT. \]

Thus, for uniform motion, the displacement is proportional to the velocity and to the time elapsed. The final position is

\[ x(T) = x(0) + vT. \]

This is true for all time T, so we can rewrite the results in terms of the more familiar (lower case) notation for time, t, i.e.

\[ x(t) = x(0) + vt. \qquad (4.3) \]

4.2.3 Uniformly accelerated motion

In this case, the acceleration a is a constant. Thus, by direct antidifferentiation,

\[ \int_0^T a\,dt = at\Big|_0^T = a(T - 0) = aT. \]
However, using Equation (4.2) for $0 \le t \le T$ leads to

\[ \int_0^T a\,dt = v(T) - v(0). \]

Since these two results must match, v(T) − v(0) = aT so that

\[ v(T) = v(0) + aT. \]

Let us refer to the initial velocity v(0) as $v_0$. The above connection between velocity and acceleration holds for any final time T, i.e., it is true for all t that:

\[ v(t) = v_0 + at. \qquad (4.4) \]

This just means that velocity at time t is the initial velocity incremented by an increase (over the given time interval) due to the acceleration. From this we can find the displacement and position of the particle as follows: Let us call the initial position $x(0) = x_0$. Then

\[ \int_0^T v(t)\,dt = x(T) - x_0. \qquad (4.5) \]

But

\[ I = \int_0^T v(t)\,dt = \int_0^T (v_0 + at)\,dt = \left( v_0 t + a\frac{t^2}{2} \right)\Big|_0^T = v_0 T + a\frac{T^2}{2}. \qquad (4.6) \]

So, setting Equations (4.5) and (4.6) equal means that

\[ x(T) - x_0 = v_0 T + a\frac{T^2}{2}. \]

But this is true for all final times, T, i.e. this holds for any time t so that

\[ x(t) = x_0 + v_0 t + a\frac{t^2}{2}. \qquad (4.7) \]

This expression represents the position of a particle at time t given that it experienced a constant acceleration. The initial velocity $v_0$, initial position $x_0$ and acceleration a allowed us to predict the position of the object x(t) at any later time t. That is the meaning of Eqn. (4.7).$^{16}$

4.2.4 Nonconstant acceleration and terminal velocity

In general, the acceleration of a falling body is not actually uniform, because frictional forces impede that motion. A better approximation to the rate of change of velocity is given by the differential equation

\[ \frac{dv}{dt} = g - kv. \qquad (4.8) \]

$^{16}$ Of course, Eqn. (4.7) only holds so long as the object is accelerating. Once a falling object hits the ground, for example, this equation no longer holds.
We will assume that initially the velocity is zero, i.e. v(0) = 0.

This equation is a mathematical statement that relates changes in velocity v(t) to the constant acceleration due to gravity, g, and drag forces due to friction with the atmosphere. A good approximation for such drag forces is the term kv, proportional to the velocity, with k, a positive constant, representing a frictional coefficient. Because v(t) appears both in the derivative and in the expression kv, we cannot apply the methods developed in the previous section directly. That is, we do not have an expression that depends on time whose antiderivative we would calculate. The derivative of v(t) (on the left) is connected to the unknown v(t) on the right.

Finding the velocity and then the displacement for this type of motion requires special techniques. In Chapter 9, we will develop a systematic approach, called Separation of Variables, to find analytic solutions to equations such as (4.8).

Here, we use a special procedure that allows us to determine the velocity in this case. We first recall the following result from first term calculus material:

The differential equation and initial condition

\[ \frac{dy}{dt} = -ky, \qquad y(0) = y_0 \qquad (4.9) \]

has a solution

\[ y(t) = y_0 e^{-kt}. \qquad (4.10) \]

Equation (4.8) implies that

\[ a(t) = g - kv(t), \]

where a(t) is the acceleration at time t. Taking a derivative of both sides of this equation leads to

\[ \frac{da}{dt} = -k\frac{dv}{dt} = -ka. \]

We observe that this equation has the same form as equation (4.9) (with a replacing y), which implies (according to (4.10)) that a(t) is given by

\[ a(t) = C e^{-kt} = a_0 e^{-kt}. \]

Initially, at time t = 0, the acceleration is a(0) = g (since a(t) = g − kv(t), and v(0) = 0). Therefore,

\[ a(t) = g e^{-kt}. \]

Since we now have an explicit formula for acceleration vs time, we can apply direct integration as we did in the examples in Sections 4.2.2 and 4.2.3. The result is:

\[ \int_0^T a(t)\,dt = \int_0^T g e^{-kt}\,dt = g\int_0^T e^{-kt}\,dt = g\left[\frac{e^{-kt}}{-k}\right]_0^T = g\,\frac{e^{-kT} - 1}{-k} = \frac{g}{k}\left(1 - e^{-kT}\right). \]

In the calculation, we have used the fact that the antiderivative of $e^{-kt}$ is $-e^{-kt}/k$. (This can be verified by simple differentiation.)
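The integral just computed can be cross-checked numerically. The following is a minimal sketch, not part of the original notes: it uses the parameter values g = 9.8 m/s² and k = 0.2/s quoted in the caption of Figure 4.2, and the function names are assumptions introduced for illustration. It compares a midpoint-rule approximation of the integral of a(t) = g e^{−kt} with the closed form (g/k)(1 − e^{−kT}).

```python
import math

g = 9.8   # acceleration due to gravity (m/s^2), value quoted with Figure 4.2
k = 0.2   # frictional coefficient (1/s), value quoted with Figure 4.2


def acceleration(t):
    """a(t) = g * exp(-k t), the acceleration derived above."""
    return g * math.exp(-k * t)


def velocity_change_exact(T):
    """Closed form of the integral of a(t) from 0 to T."""
    return (g / k) * (1.0 - math.exp(-k * T))


def velocity_change_numeric(T, n=20_000):
    """Midpoint-rule approximation of the same integral."""
    h = T / n
    return h * sum(acceleration((j + 0.5) * h) for j in range(n))


for T in (1.0, 5.0, 20.0):
    print(T, velocity_change_numeric(T), velocity_change_exact(T))
```

For each T the two columns should agree closely, and for large T both approach g/k = 49 m/s.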
As before, based on equation (4.2), this integral of the acceleration over $0 \le t \le T$ must equal v(T) − v(0). But v(0) = 0 by assumption, and the result is true for any final time T, so, in particular, setting T = t, and combining both results leads to an expression for the velocity at any time:

\[ v(t) = \frac{g}{k}\left(1 - e^{-kt}\right). \qquad (4.11) \]

We will study the differential equation (4.8) again in Section 9.3.2, in the context of a more detailed discussion of differential equations.

From our result here, we can also determine how the velocity behaves in the long term: observe that for $t \to \infty$, the exponential term $e^{-kt} \to 0$, so that

\[ v(t) \to \frac{g}{k}\,(1 - \text{very small quantity}) \approx \frac{g}{k}. \]

Thus, when drag forces are in effect, the falling object does not continue to accelerate indefinitely: it eventually attains a terminal velocity. We have seen that this limiting velocity is v = g/k. The object continues to fall at this (approximately constant) speed as shown in Figure 4.2. The terminal velocity is also a steady state value of Eqn. (4.8), i.e. a value of the velocity at which no further change occurs.

[Figure 4.2: velocity v(t), from 0 to 50 m/s, plotted against time t, from 0 to 30 s.]

Figure 4.2. Terminal velocity (m/s) for acceleration due to gravity g = 9.8 m/s$^2$, and k = 0.2/s. The velocity reaches a near constant 49 m/s by about 20 s.

4.3 From rates of change to total change

In this section, we examine several examples in which the rate of change of some process is specified. We use this information to obtain the total change$^{17}$ that occurs over some time period.

$^{17}$ We will use the terminology "total change" and "net change" interchangeably in this section.
Changing temperature

We must carefully distinguish between information about the time dependence of some function, and information about the rate of change of some function. Here is an example of these two different cases, and how we would handle them.

(a) The temperature of a cup of juice is observed to be $T(t) = 25(1 - e^{-0.1t})$ degrees Celsius, where t is time in minutes. Find the change in the temperature of the juice between the times t = 1 and t = 5.

(b) The rate of change of temperature of a cup of coffee is observed to be $f(t) = 8e^{-0.2t}$ degrees Celsius per minute, where t is time in minutes. What is the total change in the temperature between t = 1 and t = 5 minutes?

Solutions

(a) In this case, we are given the temperature as a function of time. To determine what net change occurred between times t = 1 and t = 5, we find the temperatures at each time point and subtract: That is, the change in temperature between times t = 1 and t = 5 is simply

\[ T(5) - T(1) = 25(1 - e^{-0.5}) - 25(1 - e^{-0.1}) = 25(0.9048 - 0.6065) \approx 7.46^{\circ}\ \text{Celsius}. \]

(b) Here, we do not know the temperature at any time, but we are given information about the rate of change. (Carefully note the subtle difference in the wording.) To get the total change, we would sum up all the small changes, $f(t)\Delta t$ (over N subintervals of duration $\Delta t = (5 - 1)/N = 4/N$) for t starting at 1 and ending at 5 min. We obtain a sum of the form $\sum f(t_k)\Delta t$ where $t_k$ is the k'th time point. Finally, we take a limit as the number of subintervals increases ($N \to \infty$). By now, we recognize that this amounts to a process of integration. Based on this variation of the same concept we can take the usual shortcut of integrating the rate of change, f(t), from t = 1 to t = 5. To do so, we apply the Fundamental Theorem as before, reducing the amount of computation to finding antiderivatives. We compute:

\[ I = \int_1^5 f(t)\,dt = \int_1^5 8e^{-0.2t}\,dt = -40e^{-0.2t}\Big|_1^5 = -40e^{-1} + 40e^{-0.2}, \]

\[ I = 40(e^{-0.2} - e^{-1}) = 40(0.8187 - 0.3679) \approx 18.03^{\circ}\ \text{Celsius}. \]

Only in the second case did we need to use a definite integral to find a net change, since we were given the way that the rate of change depended on time. Recognizing the subtleties of the wording in such examples will be an important skill that the reader should gain.
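The two cases can be contrasted in a few lines of code. This is a sketch under stated assumptions rather than anything from the original notes: the function names are invented for illustration, and the Riemann sum uses left endpoints, which is one of several equally valid choices. Case (a) needs only a subtraction, while case (b) is approximated by the sum of f(t_k)Δt described above and compared with the antiderivative result.

```python
import math


def temperature(t):
    """Case (a): temperature of the juice, T(t) = 25 (1 - e^{-0.1 t})."""
    return 25.0 * (1.0 - math.exp(-0.1 * t))


def rate(t):
    """Case (b): rate of change of the coffee temperature, f(t) = 8 e^{-0.2 t}."""
    return 8.0 * math.exp(-0.2 * t)


def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f over [a, b] with n subintervals."""
    dt = (b - a) / n
    return sum(f(a + j * dt) * dt for j in range(n))


# (a) The function itself is known: evaluate and subtract.
change_a = temperature(5) - temperature(1)

# (b) Only the rate is known: integrate it (exact value via the antiderivative).
change_b_exact = 40.0 * (math.exp(-0.2) - math.exp(-1.0))

print("case (a):", change_a)
for n in (4, 40, 400, 4000):
    print("case (b), N =", n, ":", riemann_sum(rate, 1, 5, n))
print("case (b), exact:", change_b_exact)
```

As N grows, the Riemann sums settle toward the exact value of the integral, which is the limiting process the text refers to.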
[Figure 4.3: growth rate plotted against time in years, from 0 to 4, with one curve for tree 1 and one for tree 2.]

Figure 4.3. Growth rates of two trees over a four year period. Tree 1 initially has a higher growth rate, but tree 2 catches up and grows faster after year 3.

4.3.1 Tree growth rates

The rate of growth in height for two species of trees (in feet per year) is shown in Figure 4.3. If the trees start at the same height, which tree is taller after 1 year? After 4 years?

Solution

In this problem we are provided with a sketch, rather than a formula for the growth rate of the trees. Our solution will thus be qualitative (i.e. descriptive), rather than quantitative. (This means we do not have to calculate anything; rather, we have to make some important observations about the behaviour shown in Fig 4.3.)

We recognize that the net change in height of each tree is of the form

\[ H_i(T) - H_i(0) = \int_0^T g_i(t)\,dt, \qquad i = 1, 2, \]

where i = 1 for tree 1, i = 2 for tree 2, $g_i(t)$ is the growth rate as a function of time (shown for each tree in Figure 4.3) and $H_i(t)$ is the height of tree "i" at time t. But, by the Fundamental Theorem of Calculus, this definite integral corresponds to the area under the curve $g_i(t)$ from t = 0 to t = T. Thus we must interpret the net change in height for each tree as the area under its growth curve. We see from Figure 4.3 that at t = 1 year, the area under the curve for tree 1 is greater, so it has grown more. At t = 4 years the area under the second curve is greatest so tree 2 has grown the most by that time.

4.3.2 Radius of a tree trunk

The trunk of a tree, assumed to have the shape of a cylinder, grows incrementally, so that its cross-section consists of "rings". In years of plentiful rain and adequate nutrients, the tree grows faster than in years of drought or poor soil conditions. Suppose the rainfall pattern
has been cyclic, so that, over a period of 14 years, the growth rate of the radius of the tree trunk (in cm/year) is given by the function

\[ f(t) = 1.5 + \sin(\pi t/5), \]

as shown in Figure 4.4. Let the height of the tree trunk be approximately constant over this ten year period, and assume that the density of the trunk is approximately 1 gm/cm$^3$.

[Figure 4.4: f(t), from 0 to 3, plotted against time t, from 0 to 14 years.]

Figure 4.4. Rate of change of radius, f(t), for a growing tree over a period of 14 years.

(a) If the radius was initially $r_0$ at time t = 0, what will the radius of the trunk be at time t later?

(b) What is the ratio of the mass of the tree trunk at t = 10 years and t = 0 years? (i.e. find the ratio mass(10)/mass(0).)

Solution

(a) Let R(t) denote the trunk's radius at time t. The rate of change of the radius of the tree is given by the function f(t), and we are told that at t = 0, $R(0) = r_0$. A graph of this growth rate over the first 14 years is shown in Figure 4.4. The net change in the radius is

\[ R(t) - R(0) = \int_0^t f(s)\,ds = \int_0^t \left(1.5 + \sin(\pi s/5)\right) ds. \]

Integrating the above, we get

\[ R(t) - R(0) = \left[ 1.5s - \frac{\cos(\pi s/5)}{\pi/5} \right]_0^t. \]

Here we have used the fact that the antiderivative of sin(ax) is −(cos(ax)/a). Thus, using the initial value, $R(0) = r_0$ (which is a constant), and evaluating the integral, leads to

\[ R(t) = r_0 + 1.5t - \frac{5\cos(\pi t/5)}{\pi} + \frac{5}{\pi}. \]
(The constant at the end of the expression stems from the fact that cos(0) = 1.) A graph of the radius of the tree over time (using $r_0 = 1$) is shown in Figure 4.5. This function is equivalent to the area associated with the function shown in Figure 4.4. Notice that Figure 4.5 confirms that the radius keeps growing over the entire period, but that its growth rate (slope of the curve) alternates between higher and lower values.

[Figure 4.5: R(t), from 0 to 25, plotted against time t, from 0 to 14 years.]

Figure 4.5. The radius of the tree, R(t), as a function of time, obtained by integrating the rate of change of radius shown in Fig. 4.4.

After ten years we have

\[ R(10) = r_0 + 15 - \frac{5}{\pi}\cos(2\pi) + \frac{5}{\pi}. \]

But cos(2π) = 1, so

\[ R(10) = r_0 + 15. \]

(b) The mass of the tree is density times volume, and since the density in this example is constant, 1 gm/cm$^3$, we need only obtain the volume at t = 10. Taking the trunk to be cylindrical means that the volume at any given time is

\[ V(t) = \pi [R(t)]^2 h. \]

The ratio we want is then

\[ \frac{V(10)}{V(0)} = \frac{\pi [R(10)]^2 h}{\pi r_0^2 h} = \frac{[R(10)]^2}{r_0^2} = \left( \frac{r_0 + 15}{r_0} \right)^2. \]

In this problem we used simple antidifferentiation to compute the desired total change. We also related the graph of the radial growth rate in Fig. 4.4 to that of the resulting radius at time t, in Fig. 4.5.
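The formulas for the trunk radius and the mass ratio can be evaluated and cross-checked directly. The sketch below is illustrative only: the function names are assumptions, and the initial radius r0 = 1 cm is the value the notes use for plotting Figure 4.5, not a measured quantity. The midpoint rule provides an independent numerical check on the antiderivative.

```python
import math


def growth_rate(t):
    """f(t) = 1.5 + sin(pi t / 5), radial growth rate in cm/year."""
    return 1.5 + math.sin(math.pi * t / 5)


def radius(t, r0):
    """R(t) = r0 + 1.5 t - (5/pi) cos(pi t / 5) + 5/pi, from part (a)."""
    return r0 + 1.5 * t - (5 / math.pi) * math.cos(math.pi * t / 5) + 5 / math.pi


def midpoint_integral(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (j + 0.5) * h) for j in range(n))


def mass_ratio(r0):
    """mass(10)/mass(0) = (R(10)/r0)^2 for constant height and density."""
    return (radius(10, r0) / r0) ** 2


r0 = 1.0  # cm, the value used for Figure 4.5
print("R(10) from formula:    ", radius(10, r0))
print("R(10) from integration:", r0 + midpoint_integral(growth_rate, 0, 10))
print("mass ratio:            ", mass_ratio(r0))
```

With r0 = 1 the radius after ten years is 16 cm, so the ratio is 16² = 256. A thicker initial trunk gives a much smaller ratio, since the same 15 cm of added radius is a smaller relative change.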
4.3.3 Birth rates and total births

After World War II, the birth rate in western countries increased dramatically. Suppose that the number of babies born (in millions per year) was given by

\[ b(t) = 5 + 2t, \qquad 0 \le t \le 10, \]

where t is time in years after the end of the war.

(a) How many babies in total were born during this time period (i.e. in the first 10 years after the war)?

(b) Find the time $T_0$ such that the total number of babies born from the end of the war up to the time $T_0$ was precisely 14 million.

Solution

(a) To find the number of births, we would integrate the birth rate, b(t), over the given time period. The net change in the population due to births (neglecting deaths) is

\[ P(10) - P(0) = \int_0^{10} b(t)\,dt = \int_0^{10} (5 + 2t)\,dt = (5t + t^2)\Big|_0^{10} = 50 + 100 = 150 \ \text{[million babies]}. \]

(b) Denote by T the time at which the total number of babies born was 14 million. Then, [in units of million]

\[ I = \int_0^T b(t)\,dt = \int_0^T (5 + 2t)\,dt = 5T + T^2. \]

Equating I = 14 leads to the quadratic equation, $T^2 + 5T - 14 = 0$, which can be written in the factored form, (T − 2)(T + 7) = 0. This has two solutions, but we reject T = −7 since we are looking for time after the War. Thus we find that T = 2 years, i.e. it took two years for 14 million babies to have been born.

While this problem involves simple integration, we had to solve for a quantity (T) based on information about behaviour of that integral. Many problems in real application involve such slight twists on the ideas of integration.

4.4 Production and removal

The process of integration can be used to convert rates of production and removal into net amounts present at a given time. The example in this section is of this type. We investigate a process in which a substance accumulates as it is being produced, but disappears through some removal process. We would like to determine when the quantity of material increases, and when it decreases.
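For the birth-rate example of Section 4.3.3, the "twist" of solving for the upper limit of an integral can be sketched in code. This is an illustration under assumptions, not the notes' own method of presentation: the function names are invented, and the quadratic formula replaces the factoring used in the text (the two give the same roots).

```python
import math


def total_births(T):
    """Integral of b(t) = 5 + 2t from 0 to T, in millions: 5T + T^2."""
    return 5 * T + T**2


def time_to_reach(target):
    """Solve T^2 + 5T - target = 0 and keep only times after the war (T >= 0)."""
    disc = 25 + 4 * target
    roots = [(-5 + math.sqrt(disc)) / 2, (-5 - math.sqrt(disc)) / 2]
    return [r for r in roots if r >= 0]


print(total_births(10))    # part (a): births in the first 10 years, in millions
print(time_to_reach(14))   # part (b): time at which 14 million births is reached
```

The negative root, T = −7, is discarded by the filter for the same reason it is rejected in the text.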
Circadian rhythm in hormone levels

Consider a hormone whose level in the blood at time t will be denoted by H(t). We will assume that the level of hormone is regulated by two separate processes: one might be the secretion rate of specialized cells that produce the hormone. (The production rate of hormone might depend on the time of day, in some cyclic pattern that repeats every 24 hours or so. This type of cyclic pattern is called circadian rhythm.) A competing process might be the removal of hormone (or its deactivation by some enzymes secreted by other cells). In this example, we will assume that both the production rate, p(t), and the removal rate, r(t), of the hormone are time-dependent, periodic functions with somewhat different phases.

[Figure 4.6: hormone production and removal rates p(t) and r(t) plotted against the hour of day, from midnight through noon to midnight.]

Figure 4.6. The rate of hormone production p(t) and the rate of removal r(t) are shown here. We want to use these graphs to deduce when the level of hormone is highest and lowest.

A typical example of two such functions is shown in Figure 4.6. This figure shows the production and removal rates over a period of 24 hours, starting at midnight. Our first task will be to use properties of the graph in Figure 4.6 to answer the following questions:

1. When is the production rate, p(t), maximal?

2. When is the removal rate r(t) minimal?

3. At what time is the hormone level in the blood highest?

4. At what time is the hormone level in the blood lowest?

5. Find the maximal level of hormone in the blood over the period shown, assuming that its basal (lowest) level is H = 0.

Solutions

1. We see directly from Fig. 4.6 that production rate is maximal at about 9:00 am.
2. Similarly, removal rate is minimal at noon.

3. To answer this question we note that the total amount of hormone produced over a time period $a \le t \le b$ is

\[ P_{total} = \int_a^b p(t)\,dt. \]

The total amount removed over time interval $a \le t \le b$ is

\[ R_{total} = \int_a^b r(t)\,dt. \]

This means that the net change in hormone level over the given time interval (amount produced minus amount removed) is

\[ H(b) - H(a) = P_{total} - R_{total} = \int_a^b \left(p(t) - r(t)\right) dt. \]

We interpret this integral as the area between the curves p(t) and r(t). But we must use caution here: For any time interval over which p(t) > r(t), this integral will be positive, and the hormone level will have increased. Otherwise, if p(t) < r(t), the integral yields a negative result, so that the hormone level has decreased. This makes simple intuitive sense: If production is greater than removal, the level of the substance is accumulating, whereas in the opposite situation, the substance is decreasing. With these remarks, we find that the hormone level in the blood will be greatest at 3:00 pm, when the greatest (positive) area between the two curves has been obtained.

4. Similarly, the least hormone level occurs after a period in which the removal rate has been larger than production for the longest stretch. This occurs at 3:00 am, just as the curves cross each other.

5. Here we will practice integration by actually fitting some cyclic functions to the graphs shown in Figure 4.6. Our first observation is that if the length of the cycle (also called the period) is 24 hours, then the frequency of the oscillation is ω = (2π)/24 = π/12. We further observe that the functions shown in Figure 4.7 have the form

\[ p(t) = A(1 + \sin(\omega t)), \qquad r(t) = A(1 + \cos(\omega t)). \]

Intersection points occur when

\[ p(t) = r(t), \]
\[ A(1 + \sin(\omega t)) = A(1 + \cos(\omega t)), \]
\[ \sin(\omega t) = \cos(\omega t) \quad \Rightarrow \quad \tan(\omega t) = 1. \]

This last equality leads to ωt = π/4, 5π/4. But then, the fact that ω = π/12 implies that t = 3, 15. Thus, over the time period $3 \le t \le 15$ hrs, the hormone level is
increasing. For simplicity, we will take the amplitude A = 1. (In general, this would just be a multiplicative constant in whatever solution we compute.) Then the net increase in hormone over this period is calculated from the integral

\[ H_{total} = \int_3^{15} [p(t) - r(t)]\,dt = \int_3^{15} [(1 + \sin(\omega t)) - (1 + \cos(\omega t))]\,dt. \]

In the problem set, the reader is asked to compute this integral and to show that the amount of hormone that accumulated over the time interval $3 \le t \le 15$, i.e. between 3:00 am and 3:00 pm, is $24\sqrt{2}/\pi$.

[Figure 4.7: two curves with values between 0 and 2, plotted against t from 0 to 20.]

Figure 4.7. The functions shown above are trigonometric approximations to the rates of hormone production and removal from Figure 4.6.

4.5 Present value of a continuous income stream

Here we discuss the value of an annuity, which is a kind of savings account that guarantees a continuous stream of income. You would like to pay P dollars to purchase an annuity that will pay you an income f(t) every year from now on, for t > 0. In some cases, we might want a constant income every year, in which case f(t) would be constant. More generally, we can consider the case that at each future year t, we ask for income f(t) that could vary from year to year. If the bank interest rate is r, how much should you pay now?

Solution

If we invest P dollars (the "principal", i.e., the amount deposited) in the bank with interest r (compounded continually) then the amount A(t) in the account at time t (in years), will
grow as follows:

\[ A(t) = P\left(1 + \frac{r}{n}\right)^{nt}, \]

where r is the yearly interest rate (e.g. 5%) and n is the number of times per year that interest is compounded (e.g. n = 2 means interest compounded twice per year, n = 12 means monthly compounded interest, etc.) Define

\[ h = \frac{r}{n}. \quad \text{Then} \quad n = \frac{r}{h}. \]

Then at time t, we have that

\[ A(t) = P(1 + h)^{\frac{r}{h}t} = P\left[(1 + h)^{\frac{1}{h}}\right]^{rt} \approx Pe^{rt} \quad \text{for large } n \text{ or small } h. \]

Here we have used the fact that when h is small (i.e. frequent intervals of compounding) the expression in square brackets above can be approximated by e, the base of the natural logarithms. Recall that

\[ e = \lim_{h \to 0} (1 + h)^{\frac{1}{h}}. \]

(This result was obtained in a first semester calculus course by selecting the base of exponentials such that the derivative of $e^x$ is just $e^x$ itself.) Thus, we have found that the amount in the bank at time t will grow as

\[ A(t) = Pe^{rt} \quad \text{(assuming continually compounded interest)}. \qquad (4.12) \]

Having established the exponential growth of an investment, we return to the question of how to set up an annuity for a continuous stream of income in the future. Rewriting Eqn. (4.12), the principal amount that we should invest in order to have A(t) to spend at time t is

\[ P = A(t)e^{-rt}. \]

Suppose we want to have f(t) spending money for each year t. We refer to the present value of year t as the quantity

\[ P = f(t)e^{-rt}. \]

(i.e. We must pay P now, in the present, to get f(t) in a future year t.) Summing over all the years, we find that the present value of the continuous income stream is

\[ P = \sum_{t=1}^{L} f(t)e^{-rt} \cdot \underbrace{1}_{\text{``}\Delta t\text{''}} \approx \int_0^L f(t)e^{-rt}\,dt, \]

where L is the expected number of years left in the lifespan of the individual to whom this annuity will be paid, and where we have approximated a sum of payments by an integral (of a continuous income stream). One problem is that we do not know in advance how long
the lifespan L will be. As a crude approximation, we could assume that this income stream continues forever, i.e. that $L \approx \infty$. In such an approximation, we have to compute the integral:

\[ P = \int_0^\infty f(t)e^{-rt}\,dt. \qquad (4.13) \]

The integral in Eqn. (4.13) is an improper integral (i.e. integral over an unbounded domain), as we have already encountered in Section 3.8.5. We shall have more to say about such integrals, and about their technical definition, existence, and properties in Chapter 10. We refer to the quantity

\[ P = \int_0^\infty f(t)e^{-rt}\,dt, \qquad (4.14) \]

as the present value of a continuous income stream f(t).

Example: Setting up an annuity

Suppose we want an annuity that provides us with an annual payment of $10,000 from the bank, i.e. in this case f(t) = $10,000 is a function that has a constant value for every year. Then from Eqn (4.14),

\[ P = \int_0^\infty 10000e^{-rt}\,dt = 10000\int_0^\infty e^{-rt}\,dt. \]

By a previous calculation in Section 3.8.5, we find that

\[ P = 10000 \cdot \frac{1}{r}, \]

e.g. if interest rate is 5% (and assumed constant over future years), then

\[ P = \frac{10000}{0.05} = \$200{,}000. \]

Therefore, we need to pay $200,000 today to get $10,000 annually for every future year.

4.6 Average value of a function

In this final example, we apply the definite integral to computing the average height of a function over some interval. First, we define what is meant by average value in this context.$^{18}$ Given a function y = f(x) over some interval $a \le x \le b$, we will define average value of the function as follows:

$^{18}$ In Chapters 5 and 8, we will encounter a different type of average (also called mean) that will represent an average horizontal position or center of mass. It is important to avoid confusing these distinct notions.
Definition

The average value of f(x) over the interval $a \le x \le b$ is

\[ \bar{f} = \frac{1}{b - a}\int_a^b f(x)\,dx. \]

Example 1

Find the average value of the function $y = f(x) = x^2$ over the interval 2 < x < 4.

Solution

\[ \bar{f} = \frac{1}{4 - 2}\int_2^4 x^2\,dx = \frac{1}{2}\,\frac{x^3}{3}\Big|_2^4 = \frac{1}{6}\left(4^3 - 2^3\right) = \frac{28}{3}. \]

Example 2: Day length over the year

Suppose we want to know the average length of the day during summer and spring. We will assume that day length follows a simple periodic behaviour, with a cycle length of 1 year (365 days). Let us measure time t in days, with t = 0 at the spring equinox, i.e. the date in spring when night and day lengths are equal, each being 12 hrs. We will refer to the number of daylight hours on day t by the function f(t). (We will also call f(t) the daylength on day t.) A simple function that would describe the cyclic changes of day length over the seasons is

\[ f(t) = 12 + 4\sin\left(\frac{2\pi t}{365}\right). \]

This is a periodic function with period 365 days as shown in Figure 4.8. Its maximal value is 16h and its minimal value is 8h. The average day length over spring and summer, i.e. over the first (365/2) ≈ 182 days is:

\[
\begin{aligned}
\bar{f} &= \frac{1}{182}\int_0^{182} f(t)\,dt \\
&= \frac{1}{182}\int_0^{182} \left(12 + 4\sin\left(\frac{\pi t}{182}\right)\right) dt \\
&= \frac{1}{182}\left[12t - \frac{4 \cdot 182}{\pi}\cos\left(\frac{\pi t}{182}\right)\right]_0^{182} \\
&= \frac{1}{182}\left[12 \cdot 182 - \frac{4 \cdot 182}{\pi}\left(\cos(\pi) - \cos(0)\right)\right] \\
&= 12 + \frac{8}{\pi} \approx 14.546 \ \text{hours}. \qquad (4.15)
\end{aligned}
\]
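Both examples of the average value can be reproduced with a short numerical routine. The sketch below is an illustration, not part of the original notes: `average_value` is a hypothetical helper that applies the definition with a midpoint rule, and the half-year interval is taken as exactly 365/2 days, whereas the hand calculation above rounds it to 182.

```python
import math


def average_value(f, a, b, n=100_000):
    """Average of f over [a, b]: (1/(b-a)) * integral of f, by the midpoint rule."""
    h = (b - a) / n
    integral = h * sum(f(a + (j + 0.5) * h) for j in range(n))
    return integral / (b - a)


def day_length(t):
    """f(t) = 12 + 4 sin(2 pi t / 365), daylight hours on day t after the equinox."""
    return 12.0 + 4.0 * math.sin(2.0 * math.pi * t / 365.0)


avg_square = average_value(lambda x: x**2, 2, 4)     # Example 1
avg_half_year = average_value(day_length, 0, 365 / 2)  # Example 2

print(avg_square, 28 / 3)
print(avg_half_year, 12 + 8 / math.pi)
```

Each printed pair should agree to many decimal places, which supports the closed-form values 28/3 and 12 + 8/π obtained by antidifferentiation.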
Thus, on average, the day is 14.546 hrs long during the spring and summer. In Figure 4.8, we show the entire day length cycle over one year. It is left as an exercise for the reader to show that the average value of f over the entire year is 12 hrs. (This makes intuitive sense, since overall, the short days in winter will average out with the longer days in summer.)

Figure 4.8 also shows geometrically what the average value of the function represents. Suppose we determine the area associated with the graph of f(x) over the interval of interest. (This area is painted red (dark) in Figure 4.8, where the interval is $0 \le t \le 365$, i.e. the whole year.) Now let us draw in a rectangle over the same interval ($0 \le t \le 365$) having the same total area. (See the big rectangle in Figure 4.8, and note that its area matches with the darker, red region.) The height of the rectangle represents the average value of f(t) over the interval of interest.

[Figure 4.8: two panels showing day length, from 0 to 16 hours, against t from 0 to 365 days, with the summer and winter halves of the year marked.]

Figure 4.8. We show the variations in day length (cyclic curve) as well as the average day length (height of rectangle) in this figure.

4.7 Summary

In this chapter, we arrived at a number of practical applications of the definite integral.

1. In Section 4.2, we found that for motion at constant acceleration a, the displacement of a moving object can be obtained by integrating twice: the definite integral of acceleration is the velocity v(t), and the definite integral of the velocity is the displacement.

\[ v(t) = v_0 + \int_0^t a\,ds, \qquad x(t) = x(0) + \int_0^t v(s)\,ds. \]

(Here we use the "dummy variable" s inside the integral, but the meaning is, of course, the same as in the previous presentation of the formulae.) We showed that at
any time t, the position of an object initially at $x_0$ with velocity $v_0$ is

\[ x(t) = x_0 + v_0 t + a\frac{t^2}{2}. \]

2. We extended our analysis of a moving object to the case of nonconstant acceleration (Section 4.2.4), when air resistance tends to produce a drag force to slow the motion of a falling object. We found that in that case, the acceleration gradually decreases, $a(t) = ge^{-kt}$. (The decaying exponential means that $a \to 0$ as t increases.) Again, using the definite integral, we could compute a velocity,

\[ v(t) = \int_0^t a(s)\,ds = \frac{g}{k}\left(1 - e^{-kt}\right). \]

3. We illustrated the connection between rates of change (over time) and total change (between one time point and another). In general, we saw that if r(t) represents a rate of change of some process, then

\[ \int_a^b r(s)\,ds = \text{Total change over the time interval } a \le t \le b. \]

This idea was discussed in Section 4.3.

4. In the concluding Section 4.6, we discussed the average value of a function f(x) over some interval $a \le x \le b$,

\[ \bar{f} = \frac{1}{b - a}\int_a^b f(x)\,dx. \]

In the next few chapters we encounter a vast assortment of further examples and practical applications of the definite integral, to such topics as mass, volumes, length, etc. In some of these we will be called on to "dissect" a geometric shape into pieces that are not simple rectangles. The essential idea of an integral as a sum of many (infinitesimally) small pieces will, nevertheless, be the same.
Chapter 5

Applications of the definite integral to calculating volume, mass, and length

5.1 Introduction

In this chapter, we consider applications of the definite integral to calculating geometric quantities such as volumes of geometric solids, masses, centers of mass, and lengths of curves.

The idea behind the procedure described in this chapter is closely related to those we have become familiar with in the context of computing areas. That is, we first imagine an approximation using a finite number of pieces to represent a desired result. Then, a limiting process of refinement leads to the desired result. The technology of the definite integral, developed in Chapters 2 and 3, applies directly. This means that we need not rederive the link between Riemann Sums and the definite integral; we can use these as we did in Chapter 4.

In the first parts of this chapter, we will calculate the total mass of a continuous density distribution. In this context, we will also define the concept of a center of mass. We will first motivate this concept for a discrete distribution made up of a number of finite masses. Then we show how the same concept can be applied in the continuous case. The connection between the discrete and continuous representation will form an important link with our study of analogous concepts in probability, in Chapters 7 and 8.

In the second part of this chapter, we will consider how to dissect certain three dimensional solids into a set of simpler parts whose volumes are easy to compute. We will use familiar formulae for the volumes of disks and cylindrical shells, and carefully construct a summation to represent the desired volume. The volume of the entire object will then be obtained by summing up volumes of a stack of disks or a set of embedded shells, and considering the limit as the thickness of the dissection cuts gets thinner. There are some important differences between material in this chapter and in previous chapters. Calculating volumes will stretch our imagination, requiring us to visualize 3-dimensional (3D) objects, and how they can be subdivided into shells or slices. Most of our effort will be aimed at understanding how to set up the needed integral. We provide a number of examples of this procedure, but first we review the basics of elementary volumes that will play the dominant role in our calculations.
5.2 Mass distributions in one dimension

We start our discussion with a number of examples of mass distributed along a single dimension. First, we consider a discrete collection of masses and then generalize to a continuous density. We will be interested in computing the total mass (by summation or integration) as well as other properties of the distribution. In considering the example of mass distributions, it becomes an easy step to develop the analogous concepts for continuous distributions. This allows us to recapitulate the link between finite sums and definite integrals that we developed in earlier chapters. Examples in this chapter also further reinforce the idea of density (in the context of mass density). Later, we will find that these ideas are equally useful in the context of probability, explored in Chapters 7 and 8.

5.2.1 A discrete distribution: total mass of beads on a wire

[Figure 5.1: five beads of masses $m_1, \ldots, m_5$ at positions $x_1, \ldots, x_5$ along a wire.]

Figure 5.1. A discrete distribution of masses along a (one dimensional) wire.

In Figure 5.1 we see a number of beads distributed along a thin wire. We will label each bead with an index, i = 1 . . . n (there are five beads so that n = 5). Each bead has a certain position (that we will think of as the value of $x_i$) and a mass that we will call $m_i$. We will think of this arrangement as a discrete mass distribution: both the masses of the beads, and their positions are of interest to us. We would like to describe some properties of this distribution.

The total mass of the beads, M, is just the sum of the individual masses, so that

\[ M = \sum_{i=1}^{n} m_i. \qquad (5.1) \]

5.2.2 A continuous distribution: mass density and total mass

We now consider a continuous mass distribution where the mass per unit length ("density") changes gradually from one point to another. For example, the bar in Figure 5.2 has a density that varies along its length. The portion at the left is made of lighter material, or has a lower density than the portions further to the right. We will denote that density by ρ(x) and this carries units of mass per unit length. (The density of the material along the length of the bar is shown in the graph.) How would we find the total mass?

Suppose the bar has length L and let x ($0 \le x \le L$) denote position along that bar. Let us imagine dividing up the bar into small pieces of length ∆x as shown in Figure 5.2.
discussed in Example 5. The total mass of the bar (“sum of all the pieces”) will be represented by an integral (5. xk = k∆x. ∆x.) The total mass is then a sum of masses of all the pieces. (Observe that units are correct. . Here we have subdivided the bar into n smaller pieces.2) as we let the size. The mass of each piece is approximately mk = ρ(xk )∆x where xk = k∆x.. .. .2) as we make the size of the pieces smaller.5. xN = L and the corresponding masses of each of the pieces are approximately mk = ρ(xk )∆x. each of length ∆x. mn xn Figure 5.. and. Bottom: the discretized approximation of this same distribution. this sum will approach the integral L M= 0 ρ(x)dx (5.. . Note that ∆x has units of length. The density of the bar (mass per unit length). ..3. .3.2. of the pieces become inﬁnitesimal.. . that is massk =(mass/length)· length. The coordinates of the endpoints of those pieces are then x0 = 0.2. Mass distributions in one dimension 83 mass distribution ρ( x ) Δx x m1 x1 m2 x2 .. ρ(x) is shown on the graph. as we have seen in an earlier chapter. Top: A continuous mass distribution along a one dimensional bar.
We can also define a cumulative function for the mass distribution as

$$M(x) = \int_0^x \rho(s)\,ds. \qquad (5.3)$$

Then $M(x)$ is the total mass in the part of the interval between the left end (assumed at 0) and the position $x$. The idea of a cumulative function becomes useful in discussions of probability in Chapter 8.

5.2.3 Example: Actin density inside a cell

Biologists often describe the density of protein, receptors, or other molecules in cells. One example is shown in Fig. 5.3. Here we show a keratocyte, which is a cell from the scale of a fish. Actin filaments (a protein responsible for structure and motion of the cell) are found at the edge of the cell in a band called the actin cortex. It has been found experimentally that the density of actin is greatest in the middle of the band, i.e. at the position corresponding to the midpoint of the edge of the cell shown in Fig. 5.3a. According to Alex Mogilner [19], the density of actin across the cortex in filaments per edge µm [20] is well approximated by a distribution of the form

$$\rho(x) = \alpha(1 - x^2), \qquad -1 \le x \le 1,$$

where $x$ is the fraction of distance from midpoint to the end of the band (Fig. 5.3c and d). Here $\rho(x)$ is an actin filament density in units of filaments per µm. That is, $\rho$ is the number of actin fibers per unit length. We can find the total number of actin filaments, $N$, in the band by integration, i.e.

$$N = \int_{-1}^{1} \alpha(1 - x^2)\,dx = \alpha \int_{-1}^{1} (1 - x^2)\,dx.$$

The integral above has already been computed (Integral 2 in Examples 3.6.2 of Chapter 3) and was found to be 4/3. Thus, we have that there are $N = 4\alpha/3$ actin filaments in the band.

Figure 5.3. A cell (keratocyte) shown in (a) has a dense distribution of actin in a band called the actin cortex. In (b) we show a schematic sketch of the actin cortex (shaded). In (c) that band of actin is scaled and straightened out so that it occupies a length corresponding to the interval $-1 \le x \le 1$. We are interested in the distribution of actin filaments across that band. That distribution, $\rho = 1 - x^2$, is shown in (d). Note that actin is densest in the middle of the band. (a) Credit to Alex Mogilner.

[19] Alex Mogilner is a professor of mathematics who specializes in cell biology and the actin cytoskeleton.

[20] Note that 1 µm (read "1 micrometer" or "micron") is $10^{-6}$ meters, and is appropriate for measuring lengths of small objects such as cells.

5.3 Mass distribution and the center of mass

It is useful to describe several other properties of mass distributions. We first define the "center of mass", $\bar{x}$, which is akin to an average $x$ coordinate.

5.3.1 Center of mass of a discrete distribution

The center of mass $\bar{x}$ of a mass distribution is given by:

$$\bar{x} = \frac{1}{M} \sum_{i=1}^{n} x_i m_i.$$

This can also be written in the form

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i m_i}{\sum_{i=1}^{n} m_i}.$$

5.3.2 Center of mass of a continuous distribution

We can generalize the concept of the center of mass for a continuous mass density. Our usual approach of subdividing the interval $0 \le x \le L$ and computing a Riemann sum leads to

$$\bar{x} = \frac{1}{M} \sum_{i=1}^{n} x_i\,\rho(x_i)\,\Delta x.$$

As $\Delta x \to dx$, this becomes an integral. Based on this, it makes sense to define the center of mass of the continuous mass distribution as follows:

$$\bar{x} = \frac{1}{M} \int_0^L x\rho(x)\,dx.$$

We can also write this in the form

$$\bar{x} = \frac{\int_0^L x\rho(x)\,dx}{\int_0^L \rho(x)\,dx}.$$
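The two definitions of the center of mass can be compared side by side in a short Python sketch. The bead positions and masses, and the density `rho`, are hypothetical values chosen only for illustration. The continuous version uses a Riemann sum with midpoints, which is one reasonable choice among several.

```python
def center_of_mass_discrete(positions, masses):
    """Center of mass of beads: sum of x_i * m_i divided by the total mass."""
    total = sum(masses)
    return sum(x * m for x, m in zip(positions, masses)) / total


def center_of_mass_continuous(rho, length, n=10000):
    """Approximate the ratio of integrals on 0 <= x <= length by Riemann sums."""
    dx = length / n
    xs = [(k + 0.5) * dx for k in range(n)]  # midpoint of each small piece
    mass = sum(rho(x) * dx for x in xs)
    moment = sum(x * rho(x) * dx for x in xs)
    return moment / mass


def rho(x):
    # Hypothetical density, for illustration only.
    return 1.0 + x


# Five hypothetical beads; the heavy bead at x = 5 pulls the center to the right.
print(center_of_mass_discrete([1, 2, 3, 4, 5], [2, 1, 1, 1, 5]))
# For rho(x) = 1 + x on 0 <= x <= 2 the exact center of mass is 7/6.
print(center_of_mass_continuous(rho, 2.0))
```

In both cases the result is a weighted average of position, with mass acting as the weight.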
5.3.3 Example: Center of mass vs average mass density

Here we distinguish between two (potentially confusing) quantities in the context of an example.

A long thin bar of length $L$ is made of material whose density varies along the length of the bar. Let $x$ be distance from one end of the bar. Suppose that the mass density is given by

$$\rho(x) = ax, \qquad 0 \le x \le L.$$

This type of mass density is shown in a panel in Fig. 5.2.

(a) Find the total mass of the bar.
(b) Find the average mass density along the bar.
(c) Find the center of mass of the bar.
(d) Where along the length of the bar should you cut to get two pieces of equal mass?

Solution

(a) From our previous discussion, the total mass of the bar is

$$M = \int_0^L ax\,dx = \left.\frac{ax^2}{2}\right|_0^L = \frac{aL^2}{2}.$$

(b) The average mass density along the bar is computed just as one would compute the average value of a function: integrate the function over an interval and divide by the length of the interval. An example of this type appeared in Section 4.6. Thus

$$\bar{\rho} = \frac{1}{L}\int_0^L \rho(x)\,dx = \frac{1}{L}\left(\frac{aL^2}{2}\right) = \frac{aL}{2}.$$

A bar having a uniform density $\bar{\rho} = aL/2$ would have the same total mass as the bar in this example. (This is the physical interpretation of average mass density.)

(c) The center of mass of the bar is

$$\bar{x} = \frac{\int_0^L x\rho(x)\,dx}{M} = \frac{1}{M}\int_0^L ax^2\,dx = \frac{a}{M}\left.\frac{x^3}{3}\right|_0^L = \frac{2a}{aL^2}\cdot\frac{L^3}{3} = \frac{2}{3}L.$$

Observe that the center of mass is an "average $x$ coordinate", which is not the same as the average mass density.

(d) We can use the cumulative function defined in Eqn. (5.3) to figure out where half of the mass is concentrated. Suppose we cut the bar at some position $x = s$. Then the mass of the part of the bar between 0 and $s$ is

$$M_1 = \int_0^s \rho(x)\,dx = \frac{as^2}{2}.$$

We ask for what value of $s$ it is true that $M_1$ is exactly half the total mass. Using the result of part (a), we find that for this to be true, we must have

$$M_1 = \frac{M}{2} \quad\Rightarrow\quad \frac{as^2}{2} = \frac{1}{2}\cdot\frac{aL^2}{2}.$$

Solving for $s$ leads to

$$s = \frac{1}{\sqrt{2}}L = \frac{\sqrt{2}}{2}L.$$

Thus, cutting the bar at a distance $(\sqrt{2}/2)L$ from $x = 0$ results in two equal masses.

Remark: the position that subdivides the mass into two equal pieces is analogous to the idea of a median. This concept will appear again in the context of probability in Chapter 8.

5.3.4 Physical interpretation of the center of mass

The center of mass has a physical interpretation: it is the point at which the mass would "balance". In the Appendix 11.3 we discuss this in detail.

5.4 Miscellaneous examples and related problems

The idea of mass density can be extended to related problems of various kinds. We give some examples in this section.

Up to now, we have seen examples of mass distributed in one dimension: beads on a wire, actin density along the edge of a cell, the one-dimensional highway (in Chapter 4), or a bar of varying density. For the continuous distributions, we determined the total mass by integration. Underlying the integral we computed was the idea that the interval could be "dissected" into small parts (of width $\Delta x$), and a sum of pieces transformed into an integral. In the next examples, we consider similar ideas, but instead of dissecting the region into 1-dimensional intervals, we have slightly more interesting geometries.

5.4.1 Example: A glucose density gradient

A cylindrical test-tube of radius $r$ and height $h$ contains a solution of glucose which has been prepared so that the concentration of glucose is greatest at the bottom and decreases gradually towards the top of the tube. (This is called a density gradient.) Suppose that the concentration $c$ as a function of the depth $x$ is $c(x) = 0.1 + 0.5x$ grams per centimeter$^3$. ($x = 0$ at the top of the tube, and $x = h$ at the bottom of the tube.) In Figure 5.4 we show a schematic version of what this gradient might look like. (In reality, the transition between high and low concentration would be smoother than shown in this figure.) Determine the total amount of glucose in the tube (in gm). Neglect the "rounded" lower portion of the tube: i.e. assume that it is a simple cylinder.
Figure 5.4. A test-tube of radius $r$ containing a gradient of glucose. A disk-shaped slice of the tube with small thickness $\Delta x$ has approximately constant density.

Solution

We assume a simple cylindrical tube and consider imaginary "slices" of this tube along its vertical axis, here labeled as the "$x$" axis. Suppose that the thickness of a slice is $\Delta x$. Then the volume of each of these (disk shaped) slices is $\pi r^2 \Delta x$. The amount of glucose in the slice is approximately equal to the concentration $c(x)$ multiplied by the volume of the slice, i.e. the small slice contains an amount $\pi r^2 \Delta x\, c(x)$ of glucose. In order to sum up the total amount over all slices, we use a definite integral. (As before, we imagine $\Delta x \to dx$ becoming "infinitesimal" as the number of slices increases.) The integral we want is

$$G = \pi r^2 \int_0^h c(x)\,dx.$$

Even though the geometry of the test-tube, at first glance, seems more complicated than the one-dimensional highway described in Chapter 4, we observe here that the integral that computes the total amount is still a sum over a single spatial variable, $x$. (Note the resemblance between the integrals

$$I = \int_0^L C(x)\,dx \quad\text{and}\quad G = \pi r^2 \int_0^h c(x)\,dx,$$

here and in the previous example.) We have neglected the complication of the rounded bottom portion of the test-tube, so that integration over its length (which is actually summation of disks shown in Figure 5.4) is a one-dimensional problem. In this case the total amount of glucose in the tube is

$$G = \pi r^2 \int_0^h (0.1 + 0.5x)\,dx = \pi r^2 \left.\left(0.1x + \frac{0.5x^2}{2}\right)\right|_0^h = \pi r^2 \left(0.1h + \frac{0.5h^2}{2}\right).$$

Suppose that the height of the test-tube is $h = 10$ cm and its radius is $r = 1$ cm. Then the total mass of glucose is

$$G = \pi\left(0.1(10) + \frac{0.5(100)}{2}\right) = \pi(1 + 25) = 26\pi \text{ gm}.$$
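The slicing argument can be checked numerically with a short Python sketch. It sums the glucose in thin disk-shaped slices, using the concentration, radius, and height given in the example. Evaluating the concentration at the middle of each slice and the number of slices are assumptions of this sketch, not part of the text.

```python
import math


def glucose_total(c, r, h, n=1000):
    """Sum the glucose in n disk-shaped slices of thickness dx."""
    dx = h / n
    slice_volume = math.pi * r ** 2 * dx
    # Amount in each slice: concentration at the slice midpoint times its volume.
    return sum(c((k + 0.5) * dx) * slice_volume for k in range(n))


def c(x):
    # Concentration from the example, in grams per cubic centimeter.
    return 0.1 + 0.5 * x


approx = glucose_total(c, r=1.0, h=10.0)
print(approx, 26 * math.pi)  # the two values should agree closely
```

Because the concentration is linear in x, the midpoint sum agrees with the integral up to rounding error; for a curved concentration profile the agreement would improve as the slices get thinner.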
In the next example, we consider a circular geometry, but the concept of dissecting and summing is the same.

5.4.2 Example: A circular colony of bacteria

A circular colony of bacteria has radius of 1 cm. At distance $r$ from the center of the colony, the density of the bacteria, in units of one million cells per square centimeter, is observed to be $b(r) = 1 - r^2$. (Note: $r$ is distance from the center in cm, so that $0 \le r \le 1$.) What is the total number of bacteria in the colony?

Figure 5.5. A colony of bacteria with circular symmetry (side view, left; top-down view of one ring, right). A ring of small thickness $\Delta r$ has roughly constant density. The superimposed curve on the left is the bacterial density $b(r) = 1 - r^2$ as a function of the radius $r$.

Solution

Figure 5.5 shows a rough sketch of a flat surface with a colony of bacteria growing on it. We assume that this distribution is radially symmetric. The density as a function of distance from the center is given by $b(r)$, as shown in Figure 5.5. Our task is to determine how to set up the problem in terms of an integral. Again, we must imagine which type of subdivision would lead to the summation (integration) needed to compute total amount. Note that the function describing density, $b(r)$, is smooth, but to accentuate the strategy of dissecting the region, we have shown a top-down view of a ring of nearly constant density on the right in Figure 5.5. We see that this ring occupies the region between two circles, e.g. between a circle of radius $r$ and a slightly bigger circle of radius $r + \Delta r$. The area of that "ring" [21] would then be the area of the larger circle minus that of the smaller circle, namely

$$A_{\text{ring}} = \pi(r + \Delta r)^2 - \pi r^2 = \pi\left(2r\Delta r + (\Delta r)^2\right).$$

However, if we make the thickness of that ring really small ($\Delta r \to 0$), then the quadratic term is very small, so that

$$A_{\text{ring}} \approx 2\pi r \Delta r.$$

Consider all the bacteria that are found inside a "ring" of radius $r$ and thickness $\Delta r$ (see Figure 5.5). The total number within such a ring is the product of the density, $b(r)$, and the area of the ring, i.e.

$$b(r)\cdot(2\pi r \Delta r) = 2\pi r(1 - r^2)\Delta r.$$

To get the total number in the colony we sum up over all the rings from $r = 0$ to $r = 1$ and let the thickness, $\Delta r \to dr$, become very small. But, as with other examples, this is equivalent to calculating a definite integral, namely:

$$B_{\text{total}} = \int_0^1 (1 - r^2)(2\pi r)\,dr = 2\pi\int_0^1 (1 - r^2)\,r\,dr = 2\pi\int_0^1 (r - r^3)\,dr.$$

We calculate the result as follows:

$$B_{\text{total}} = 2\pi\left.\left(\frac{r^2}{2} - \frac{r^4}{4}\right)\right|_0^1 = \left.\left(\pi r^2 - \frac{\pi r^4}{2}\right)\right|_0^1 = \pi - \frac{\pi}{2} = \frac{\pi}{2}.$$

Thus the total number of bacteria in the entire colony is $\pi/2$ million, which is approximately 1.57 million cells.

[21] Students commonly make the error of writing $A_{\text{ring}} = \pi(r + \Delta r - r)^2 = \pi(\Delta r)^2$. This trap should be avoided! It is clear that the correct expression has additional terms, since we really are computing a difference of two circular areas.

5.5 Volumes of solids of revolution

We now turn to the problem of calculating volumes of 3D solids. Here we restrict attention to symmetric objects denoted solids of revolution. The outer surface of these objects is generated by revolving some curve around a coordinate axis. In Figure 5.7 we show one such curve, and the surface it forms when it is revolved about the $y$ axis.

5.5.1 Volumes of cylinders and shells

Before starting the calculation, let us recall the volumes of some of the geometric shapes that are to be used as elementary pieces into which our shapes will be carved. See Figure 5.6.

1. The volume of a cylinder of height $h$ having circular base of radius $r$ is $V_{\text{cylinder}} = \pi r^2 h$.

2. The volume of a circular disk of thickness $\tau$ and radius $r$ (shown on the left in Figure 5.6) is a special case of the above, $V_{\text{disk}} = \pi r^2 \tau$.

3. The volume of a cylindrical shell of height $h$, with circular radius $r$ and small thickness $\tau$ (shown on the right in Figure 5.6), is $V_{\text{shell}} = 2\pi r h \tau$. (This approximation holds for $\tau \ll r$.)
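How good the shell formula is when the thickness is small can be seen in a brief Python sketch. It compares the approximate volume with the exact volume of the region between two cylinders. Taking the inner radius as r and the outer radius as r + τ is an assumption of this sketch; the numerical values are hypothetical.

```python
import math


def shell_exact(r, h, tau):
    """Exact volume between cylinders of radius r and r + tau (assumed geometry)."""
    return math.pi * ((r + tau) ** 2 - r ** 2) * h


def shell_approx(r, h, tau):
    """Thin-shell approximation: circumference times height times thickness."""
    return 2 * math.pi * r * h * tau


def relative_error(r, h, tau):
    exact = shell_exact(r, h, tau)
    return (exact - shell_approx(r, h, tau)) / exact


# Hypothetical values: the error shrinks as tau becomes small compared with r.
for tau in (0.5, 0.1, 0.01):
    print(tau, relative_error(1.0, 2.0, tau))
```

The neglected term is the one that is quadratic in the thickness, just as the quadratic term was dropped from the area of a thin ring.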
Figure 5.6. The volumes of these simple 3D shapes (disk, left; shell, right) are given by simple formulae. We use them as basic elements in computing more complicated volumes.

Here we will present examples based on disks. In Appendix 11.4 we give an example based on shells.

5.5.2 Computing the Volumes

Consider the curve in Figure 5.7 and the surface it forms when it is revolved about the $y$ axis. In the same figure, we also show how a set of approximating rectangular strips associated with the planar region (grey rectangles) lead to a set of stacked disks (orange shapes) that approximate the volume of the solid (greenish object in Fig. 5.7). The total volume of the disks is not the same as the volume of the object, but if we make the thickness of these disks very small, the approximation of the true volume is good. In the limit, as the thickness of the disks becomes infinitesimal, we arrive at the true volume of the solid of revolution. The reader should recognize a familiar theme. We used the same concept in computing areas using Riemann sums based on rectangular strips in Chapter 2.

Figure 5.7. A solid of revolution is formed by revolving a region in the $xy$-plane about the $y$-axis. We show how the region is approximated by rectangles of some given width, and how these form a set of approximating disks for the 3D solid of revolution.

Fig. 5.8 similarly shows a volume of revolution obtained by revolving the graph of the function $y = f(x)$ about the $x$ axis. We note that if this surface is cut into slices, the radius of the cross-sections depends on the position of the cut. Let us imagine a stack of disks approximating this volume. One such disk has been pulled out and labeled for our inspection. We note that its radius (in the $y$ direction) is given by the height of the graph of the function, so that $r = f(x)$. The thickness of the disk (in the $x$ direction) is $\Delta x$. The volume of this single disk is then $v = \pi[f(x)]^2 \Delta x$. Considering this disk to be based at the $k$'th coordinate point in the stack, i.e. at $x_k$, means that its volume is

$$v_k = \pi[f(x_k)]^2 \Delta x.$$

Summing up the volumes of all such disks in the stack leads to the total volume of disks

$$V_{\text{disks}} = \sum_k \pi[f(x_k)]^2 \Delta x.$$

When we increase the number of disks, making each one thinner so that $\Delta x \to 0$, we arrive at a definite integral,

$$V = \int_a^b \pi[f(x)]^2\,dx.$$

Figure 5.8. Here the solid of revolution is formed by revolving the curve $y = f(x)$ about the $x$ axis. A typical disk used to approximate the volume is shown. The radius of the disk (parallel to the $y$ axis) is $r = y = f(x)$. The thickness of the disk (parallel to the $x$ axis) is $\Delta x$. The volume of this disk is hence $v = \pi[f(x)]^2 \Delta x$.

In most of the examples discussed in this chapter, the key step is to make careful observation of the way that the radius of a given disk depends on the function that generates the surface.
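The stack-of-disks sum can be sketched in Python as follows. The function name `volume_by_disks`, the choice of the left edge of each slice as the point $x_k$, and the test curve are assumptions made for illustration. The test curve is the straight line $f(x) = x$ on $0 \le x \le 1$, which generates a cone of radius 1 and height 1 with volume π/3.

```python
import math


def volume_by_disks(f, a, b, n=1000):
    """Approximate the volume of revolution about the x axis by n stacked disks."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x_k = a + k * dx                      # position of the k-th disk
        total += math.pi * f(x_k) ** 2 * dx   # disk volume: pi * radius^2 * thickness
    return total


def line(x):
    # Hypothetical generating curve: revolving it about the x axis gives a cone.
    return x


print(volume_by_disks(line, 0.0, 1.0), math.pi / 3)
```

The only ingredient that changes from one solid to another is the radius function passed in as `f`, together with the limits `a` and `b`.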
Here, by the function that generates the surface we mean the function that specifies the curve that forms the surface of revolution. We also pay attention to the dimension that forms the disk thickness, $\Delta x$. Some of our examples will involve surfaces revolved about the $x$ axis, and others will be revolved about the $y$ axis. In setting up these examples, a diagram is usually quite helpful.

Example 1: Volume of a sphere

Figure 5.9. When the semicircle (on the left) is rotated about the $x$ axis, it generates a sphere. On the right, we show one disk generated by the revolution of the shaded rectangle.

We can think of a sphere of radius $R$ as a solid whose outer surface is formed by rotating a semi-circle about its long axis. A function that describes a semi-circle (i.e., the top half of the circle $y^2 + x^2 = R^2$) is

$$y = f(x) = \sqrt{R^2 - x^2}.$$

In Figure 5.9, we show the sphere dissected into a set of disks, each of width $\Delta x$. The disks are lined up along the $x$ axis with coordinates $x_k$, where $-R \le x_k \le R$. These are just integer multiples of the slice thickness $\Delta x$, so for example,

$$x_k = -R + k\Delta x, \qquad x_0 = -R, \quad x_1 = -R + \Delta x, \quad \ldots$$

The radius of the disk depends on its position [22]. Indeed, the radius of a disk through the $x$ axis at a point $x_k$ is specified by the function $r_k = f(x_k)$. The volume of the $k$'th disk is

$$V_k = \pi r_k^2 \Delta x.$$

By the above remarks, using the fact that the function $f(x)$ determines the radius, we have

$$V_k = \pi[f(x_k)]^2 \Delta x,$$

$$V_k = \pi\left(\sqrt{R^2 - x_k^2}\right)^2 \Delta x = \pi(R^2 - x_k^2)\Delta x.$$

The total volume of all the disks is

$$V = \sum_k V_k = \sum_k \pi[f(x_k)]^2 \Delta x = \pi\sum_k (R^2 - x_k^2)\Delta x.$$

As $\Delta x \to 0$, this sum becomes a definite integral, and represents the true volume. We start the summation at $x_0 = -R$ and end at $x_N = R$ since the semi-circle extends from $x = -R$ to $x = R$. Thus

$$V_{\text{sphere}} = \int_{-R}^{R} \pi[f(x)]^2\,dx = \pi\int_{-R}^{R} (R^2 - x^2)\,dx.$$

We compute this integral using the Fundamental Theorem of Calculus, obtaining

$$V_{\text{sphere}} = \pi\left.\left(R^2 x - \frac{x^3}{3}\right)\right|_{-R}^{R}.$$

Observe that this is twice the volume obtained for the interval $0 < x < R$,

$$V_{\text{sphere}} = 2\pi\left.\left(R^2 x - \frac{x^3}{3}\right)\right|_0^R = 2\pi\left(R^3 - \frac{R^3}{3}\right).$$

We often use such symmetry properties to simplify computations. After simplification, we arrive at the familiar formula

$$V_{\text{sphere}} = \frac{4}{3}\pi R^3.$$

[22] Note that the radius is oriented along the $y$ axis, so sometimes we may write this as $r_k = y_k = f(x_k)$.

Example 2: Volume of a paraboloid

Consider the curve

$$y = f(x) = 1 - x^2.$$

If we rotate this curve about the $y$ axis, we will get a paraboloid, as shown in Figure 5.10. In this section we show how to compute the volume by dissecting into disks stacked up along the $y$ axis.

Solution

The object has the $y$ axis as its axis of symmetry. Hence disks are stacked up along the $y$ axis to approximate this volume. This means that the width of each disk is $\Delta y$. This accounts for the $dy$ in the integral below. The volume of each disk is

$$V_{\text{disk}} = \pi r^2 \Delta y,$$

where the radius, $r$, is now in the direction parallel to the $x$ axis. Thus we must express the radius as

$$r = x = f^{-1}(y).$$
That is, we invert the relationship to obtain $x$ as a function of $y$. From $y = 1 - x^2$ we have $x^2 = 1 - y$, so $x = \sqrt{1 - y}$. The radius of a disk at height $y$ is therefore $r = x = \sqrt{1 - y}$. The shape extends from a smallest value of $y = 0$ up to $y = 1$. Thus the volume is

$$V = \pi\int_0^1 [f^{-1}(y)]^2\,dy = \pi\int_0^1 \left[\sqrt{1 - y}\right]^2 dy.$$

It is helpful to note that once we have identified the thickness of the disks ($\Delta y$), we are guided to write an integral in terms of the variable $y$, i.e. to reformulate the equation describing the curve. We compute

$$V = \pi\int_0^1 (1 - y)\,dy = \pi\left.\left(y - \frac{y^2}{2}\right)\right|_0^1 = \pi\left(1 - \frac{1}{2}\right) = \frac{\pi}{2}.$$

Figure 5.10. The curve $y = f(x) = 1 - x^2$ that generates the shape of a paraboloid (left) and the shape of the paraboloid (right).

The above example was set up using disks. However, there are other options. In Appendix 11.4 we show yet another method, comprised of cylindrical shells, to compute the volume of a cone. In some cases, one method is preferable to another, but here either method works equally well.

Example 3

Find the volume of the surface formed by rotating the curve

$$y = f(x) = \sqrt{x}, \qquad 0 \le x \le 1,$$

(a) about the $x$ axis,
(b) about the $y$ axis.

Solution

(a) If we rotate this curve about the $x$ axis, we obtain a bowl shape. Dissecting this surface leads to disks stacked along the $x$ axis, with thickness $\Delta x \to dx$, with radii in the $y$ direction, i.e. $r = y = f(x)$, and with $x$ in the range $0 \le x \le 1$.
The volume will thus be

$$V = \pi\int_0^1 [f(x)]^2\,dx = \pi\int_0^1 \left[\sqrt{x}\right]^2 dx = \pi\int_0^1 x\,dx = \pi\left.\frac{x^2}{2}\right|_0^1 = \frac{\pi}{2}.$$

(b) When the curve is rotated about the $y$ axis, it forms a surface with a sharp point at the origin. The disks are stacked along the $y$ axis, with thickness $\Delta y \to dy$, and radii in the $x$ direction. We must rewrite the function in the form

$$x = g(y) = y^2.$$

We now use the interval along the $y$ axis, i.e. $0 < y < 1$. The volume is then

$$V = \pi\int_0^1 [g(y)]^2\,dy = \pi\int_0^1 [y^2]^2\,dy = \pi\int_0^1 y^4\,dy = \pi\left.\frac{y^5}{5}\right|_0^1 = \frac{\pi}{5}.$$

5.6 Length of a curve: Arc length

Analytic geometry provides a simple way to compute the length of a straight line segment, based on the distance formula [23]. Recall that, given points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$, the length of the line joining those points is

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$

Things are more complicated for "curves" that are not straight lines, but in many cases, we are interested in calculating the length of such curves. In this section we describe how this can be done using the definite integral "technology".

The idea of dissection also applies to the problem of determining the length of a curve. In Figure 5.11, we see the general idea of subdividing a curve into many small "arcs". Before we look in detail at this construction, we consider a simple example, shown in Figure 5.12. In the triangle shown, by the Pythagorean theorem we have the length of the sloped side related as follows to the side lengths $\Delta x$, $\Delta y$:

$$\Delta\ell^2 = \Delta x^2 + \Delta y^2,$$

$$\Delta\ell = \sqrt{\Delta x^2 + \Delta y^2} = \sqrt{1 + \frac{\Delta y^2}{\Delta x^2}}\;\Delta x = \sqrt{1 + \left(\frac{\Delta y}{\Delta x}\right)^2}\;\Delta x.$$

We now consider a curve given by some function

$$y = f(x), \qquad a < x < b,$$

as shown in Figure 5.11(a). We will approximate this curve by a set of line segments, as shown in Figure 5.11(b). To obtain these, we have selected some step size $\Delta x$ along the $x$ axis, and placed points on the curve at each of these $x$ values. We connect the points with straight line segments, and determine the lengths of those segments. (The total length of the segments is only an approximation of the length of the curve, but as the subdivision gets finer and finer, we will arrive at the true total length of the curve.)

Figure 5.11. Top: Given the graph of a function, $y = f(x)$ (at left), we draw secant lines connecting points on its graph at values of $x$ that are multiples of $\Delta x$ (right). Bottom: a small part of this graph is shown, and then enlarged, to illustrate the relationship between the arc length and the length of the secant line segment.

Figure 5.12. The basic idea of arclength is to add up lengths $\Delta\ell$ of small line segments that approximate the curve.

We show one such segment enlarged in the circular inset in Figure 5.11. Its slope, shown at right, is given by $\Delta y/\Delta x$. According to our remarks above, the length of this segment is given by

$$\Delta\ell = \sqrt{1 + \left(\frac{\Delta y}{\Delta x}\right)^2}\;\Delta x.$$

As the step size is made smaller and smaller, $\Delta x \to dx$, $\Delta y \to dy$ and

$$\Delta\ell \to \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\;dx.$$

We recognize the ratio inside the square root as the derivative, $dy/dx$. If our curve is given by a function $y = f(x)$ then we can rewrite this as

$$d\ell = \sqrt{1 + (f'(x))^2}\;dx.$$

Thus, the length of the entire curve is obtained from summing (i.e. adding up) these small pieces, i.e.

$$L = \int_a^b \sqrt{1 + (f'(x))^2}\;dx. \qquad (5.4)$$

[23] The reader should recall that this formula is a simple application of Pythagorean theorem.

Example 1

Find the length of a line whose slope is $-2$ given that the line extends from $x = 1$ to $x = 5$.

Solution

We could find the equation of the line and use the distance formula. But for the purpose of this example, we apply the method of Equation (5.4): we are given that the slope $f'(x)$ is $-2$. The integral in question is

$$L = \int_1^5 \sqrt{1 + (f'(x))^2}\;dx = \int_1^5 \sqrt{1 + (-2)^2}\;dx = \int_1^5 \sqrt{5}\;dx.$$

We get

$$L = \sqrt{5}\int_1^5 dx = \sqrt{5}\,x\Big|_1^5 = \sqrt{5}\,[5 - 1] = 4\sqrt{5}.$$

Example 2

Find an integral that represents the length of the curve that forms the graph of the function

$$y = f(x) = x^3, \qquad 1 < x < 2.$$

Solution

We find that

$$\frac{dy}{dx} = f'(x) = 3x^2.$$
Thus, the integral is

$$L = \int_1^2 \sqrt{1 + (3x^2)^2}\;dx = \int_1^2 \sqrt{1 + 9x^4}\;dx.$$

At this point, we will not attempt to find the actual length, as we must first develop techniques for finding the antiderivative for functions such as $\sqrt{1 + 9x^4}$.

Using the spreadsheet to calculate arclength

Most integrals for arclength contain square roots and functions that are not easy to integrate, simply because their antiderivatives are difficult to determine. However, now that we know the idea behind determining the length of a curve, we can apply the ideas developed here to approximate the length of a curve "numerically". The spreadsheet is a simple tool for doing the necessary summations.

As an example, we show here how to calculate the length of the curve

$$y = f(x) = 1 - x^2 \quad\text{for } 0 \le x \le 1$$

using a simple numerical procedure implemented on the spreadsheet. Let us choose a step size of $\Delta x = 0.1$ along the $x$ axis, for the interval $0 \le x \le 1$. We calculate the function, the slopes of the little segments (change in $y$ divided by change in $x$), and from this, compute the length of each segment

$$\Delta\ell = \sqrt{1 + (\Delta y/\Delta x)^2}\;\Delta x$$

and the accumulated length along the curve from left to right, $L$, which is just a sum of such values.

Table 5.1 shows steps in the calculation of the ratio $\Delta y/\Delta x$, the value of $\Delta\ell$, the cumulative sum, and, finally, the total length $L$. The final value of $L = 1.4782$ represents the total length of the curve over the entire interval $0 < x < 1$.

In Figure 5.13(a) we show the actual curve $y = 1 - x^2$, with points placed on it at each multiple of $\Delta x$. In Figure 5.13(b), we show (in blue) how the lengths of the little straight-line segments connecting these points change across the interval. (The segments on the left along the original curve are nearly flat, so their length is very close to $\Delta x$. The segments on the right part of the curve are much more sloped, and their lengths are thus bigger.) We also show (in red) how the total accumulated length $L$ depends on the position $x$ across the interval. This function represents the total arc-length of the curve $y = 1 - x^2$, from $x = 0$ up to a given $x$ value. At $x = 1$ this function returns the value $y = L$, as it has added up the full length of the curve for $0 \le x \le 1$.

5.6.1 How the alligator gets its smile

The American alligator, Alligator mississippiensis, has a set of teeth best viewed at some distance. The regular arrangement of these teeth, i.e. their spacing along the jaw, is important in giving the reptile its famous bite. We will concern ourselves here with how that pattern of teeth is formed as the alligator develops from its embryonic stage to that of an adult.
0 1. The spreadsheet can be used to compute approximate values of integrals.5 Arc Length cumulative length L y = f(x) =1x^2 length increment 0. mass.0 1. and length 1. Applications of the deﬁnite integral to calculating volume.0 1. and hence to calculate arclength.0 0. together with the length increment and the cumulative arclength along that curve.5 y = f(x) =1x^2 0.13. Shown here is the graph of the function y = f (x) = 1 − x2 for 0 ≤ x ≤ 1. .100Chapter 5.0 0.0 Figure 5.
is shown enlarged in an inset in this ﬁgure.3 0.15(b) we see the same curve. Paul Kulesa. based on data in the literature about what happens at distinct stages of embryonic growth.0663 1.28 (in units not speciﬁed).6. and b = 7. at welldeﬁned positions along the jaw. Shown in Figure 5.9 1. 0 y = f (x) 1.1221 0. A proper choice of coordinate system. set out to understand the pattern of development of these teeth. In Figure 5.2049 0.4782 Table 5.1972 0.3600 0.1640 0.1 0.1803 0.1044 0. and 0 ≤ x ≤ 1.5 1.7500 0. One theory proposed by this group was that chemical signals that diffuse along the jaw at an early stage of development give rise to instructions that are interpreted by jaw cells: where the signal is at a high level. we show how to calculate an approximation to the arclength using the spreadsheet.1 ∆ 0. the teeth on an alligator do not form or sprout simultaneously.1345 0.256.9600 0. 0 0.1118 0.6400 0.2326 L= ∆ 0.0000 ∆y/∆x 0.4 0.1900 0.1 1. we will ﬁnd a simple application of the ideas of arclength in the developmental sequence of teething. a former student of James D Muray.5733 0.5100 0.2635 1.0000 0.9100 0.7 0. A close up of its smile (at an earlier stage of development) reveals the shape of the jaw.8860 1. a tooth will start to initiate.) Paul Kulesa found that the shape of the alligator’s jaw can be described remarkably well by a parabola.4388 0. called primordia.3 1.14 is a smiling baby alligator (no doubt thinking of some future tasty meal). and what mechanisms lead to this pattern of initiation.8 0. For the function y = f (x) = 1 − x2 . Also shown in this curve is a set of points at which teeth are found.9 2.9 1. labelled by order of appearance.0000 0. In the development of the baby alligator.5. Of interest in his research were several questions. As is the case in humans. Length of a curve: Arc length 101 x 0. together with the sites at which teeth are becoming evident.3 0. (One of these sites. 
While we will not address the details of the mechanism of development here.15(a). one after the other.1 0. and some experimentation leads to the equation of the best ﬁt parabola y = f (x) = −ax2 + b where a = 0.1487 0. there is a sequence of initiation of teeth. including what determines the positions and timing of initiation of individual teeth.7220 0.8400 0. We show this curve in Figure 5. adult.3167 0.5 0.2147 0.5 0. but we have here superimposed the function L(x) given by the arc length along the .2 0.1.7 1.1005 0.1005 0.6 0.7 0.9900 0.
curve from the front of the jaw (i.e. the top of the parabola), i.e.
\[ L(x) = \int_0^x \sqrt{1 + [f'(s)]^2}\, ds. \]
This curve measures distance along the jaw, from front to back. The distances of the teeth from one another, or along the curve of the jaw, can be determined using this curve if we know the x coordinates of their positions.

The table below gives the original data, courtesy of Dr. Kulesa, showing the order of the teeth, their (x, y) coordinates, and the value of L(x) obtained from the arclength formula. We see from this table that the teeth do not appear randomly, nor do they fill in the jaw in one sweep. Rather, they appear in several stages. In Figure 5.15(c), we show the pattern of appearance: plotting the distance along the jaw of successive teeth reveals that the teeth appear in waves of nearly equally-spaced sites. (By equally spaced, we refer to distance along the parabolic jaw.) The first wave (teeth 1, 2, 3) is followed by a second wave (4, 5, 6, 7), and so on. Each wave forms a linear pattern of distance from the front, and each successive wave fills in the gaps in a similar, equally spaced pattern.

Table 5.2. Data for the appearance of teeth, in the order in which they appear as the alligator develops. We can use arclength computations to determine the distances between successive teeth. Columns: tooth number (1 to 13), position x, y, and distance along jaw L(x).

The true situation is a bit more complicated: the jaw grows as the teeth appear, as shown in Figure 5.15(d). This has not been taken into account in our simple treatment here, where we illustrate only the essential idea of arc length application.
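The arclength calculation described here can be tried numerically. The following is a minimal sketch, not the author's spreadsheet: the function names are hypothetical, the parabola coefficients are the ones quoted in the text, and the closed-form expression is included only as an independent check on the straight-segment approximation.

```python
import math

def polyline_length(f, a, b, n):
    """Approximate the arc length of y = f(x) on [a, b] using n straight segments."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0 = a + i * h
        x1 = a + (i + 1) * h
        # Each segment has length sqrt(dx^2 + dy^2)
        total += math.hypot(x1 - x0, f(x1) - f(x0))
    return total

def jaw(x, a=0.256, b=7.28):
    """Best-fit parabola quoted in the text: y = -a x^2 + b."""
    return -a * x * x + b

def jaw_arc_length(x, n=1000):
    """Distance along the jaw from the front (x = 0) to position x."""
    return polyline_length(jaw, 0.0, x, n)

def jaw_arc_length_exact(x, a=0.256):
    """Closed form of the integral of sqrt(1 + (2 a s)^2) for s from 0 to x."""
    t = 2 * a * x
    return (t * math.sqrt(1 + t * t) + math.asinh(t)) / (4 * a)

# Spreadsheet-style estimate for y = 1 - x^2 on [0, 1] with ten segments
coarse = polyline_length(lambda x: 1 - x * x, 0.0, 1.0, 10)
print(round(coarse, 4))
print(jaw_arc_length(3.0), jaw_arc_length_exact(3.0))
```

With only ten segments the estimate for $y = 1 - x^2$ is already close to the true length; refining the subdivision brings the straight-segment sum toward the value of the arclength integral.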
Figure 5.14. Alligator mississippiensis and its teeth.
Figure 5.15. (a) The parabolic shape of the jaw, showing positions of teeth and numerical order of emergence. (b) Arc length along the jaw from front to back. (c) Distance of successive teeth along the jaw. (d) Growth of the jaw.
5.6.2 References

1. P.M. Kulesa and J.D. Murray (1995). Modelling the Wavelike Initiation of Teeth Primordia in the Alligator. FORMA. Cover Article. Vol. 10, No. 3, 259-280.

2. J.D. Murray and P.M. Kulesa (1996). On A Dynamic Reaction-Diffusion Mechanism for the Spatial Patterning of Teeth Primordia in the Alligator. Journal of Chemical Physics. J. Chem. Soc., Faraday Trans., 92 (16), 2927-2932.

3. P.M. Kulesa, G.C. Cruywagen, S.R. Lubkin, M.W.J. Ferguson and J.D. Murray (1996). Modelling the Spatial Patterning of Teeth Primordia in the Alligator. Acta Biotheoretica, 44, 153-164.

5.7 Summary

Here are the main points of the chapter:

1. We introduced the idea of a spatially distributed mass density $\rho(x)$ in Section 5.2. Here the definite integral represents
\[ \int_a^b \rho(x)\, dx = \text{Total mass in the interval } a \le x \le b. \]

2. In this chapter, we defined the center of mass of a (discrete) distribution of n masses by
\[ \bar{X} = \frac{1}{M} \sum_{i=0}^{n} x_i m_i. \tag{5.5} \]
We developed the analogue of this for a continuous mass distribution (distributed in the interval $0 \le x \le L$). We defined the center of mass of a continuous distribution by the definite integral
\[ \bar{x} = \frac{1}{M} \int_0^L x \rho(x)\, dx. \tag{5.6} \]
Importantly, the quantities $m_i$ in the sum (5.5) carry units of mass, whereas the analogous quantities in (5.6) are $\rho(x)\,dx$. [Recall that $\rho(x)$ is a mass per unit length in the case of mass distributed along a bar or straight line.]

3. We defined a cumulative function in both the discrete and the continuous case. In the continuous case, it is
\[ M(x) = \int_0^x \rho(s)\, ds. \]

4. The mean is an average x coordinate, whereas the median is the x coordinate that splits the distribution into two equal masses. (Geometrically, the median subdivides the graph of the distribution into two regions of equal areas.) The mean and median are the same only in symmetric distributions. They differ for any distribution that is asymmetric. The mean (but not the median) is influenced more strongly by distant portions of the distribution.

5. In the later parts of this chapter, we showed how to compute volumes of various objects that have radial symmetry ("solids of revolution"). We showed that if the surface is generated by rotating the graph of a function $y = f(x)$ about the x axis (for $a \le x \le b$), then its volume can be described by an integral of the form
\[ V = \int_a^b \pi [f(x)]^2\, dx. \]
We used this idea to show that the volume of a sphere of radius R is $V_{\text{sphere}} = \frac{4}{3}\pi R^3$.

In Chapters 7 and 8, we find applications of the ideas of density and center of mass to the context of a probability distribution and its mean.
Chapter 6

Techniques of Integration

In this chapter, we expand our repertoire for antiderivatives beyond the "elementary" functions discussed so far. A review of the table of elementary antiderivatives (found in Chapter 3) will be useful. Here we will discuss a number of methods for finding antiderivatives. We refer to these collected "tricks" as methods of integration. As will be shown, in some cases, these methods are systematic (i.e. with clear steps), whereas in other cases, guesswork and trial and error are an important part of the process.

A primary method of integration to be described is substitution. A close relationship exists between the chain rule of differential calculus and the substitution method. A second very important method is integration by parts. Aside from its usefulness in integration per se, this method has numerous applications in physics, mathematics, and other sciences. Many other techniques of integration used to form a core of methods taught in such courses in integral calculus. Many of these are quite technical. Nowadays, with sophisticated mathematical software packages (including Maple and Mathematica), integration can be carried out automatically via computation called "symbolic manipulation", reducing our need to dwell on these technical methods.

6.1 Differential notation

We begin by familiarizing the reader with notation that appears frequently in substitution integrals, i.e. differential notation. Consider a straight line $y = mx + b$. Recall that the slope of the line, m, is
\[ m = \frac{\text{change in } y}{\text{change in } x} = \frac{\Delta y}{\Delta x}. \]
This relationship can also be written in the form
\[ \Delta y = m \Delta x. \]
Figure 6.1. The slope of the line shown here is $m = \Delta y/\Delta x$. This means that the small quantities $\Delta y$ and $\Delta x$ are related by $\Delta y = m\Delta x$.

If we take a very small step along this line in the x direction, call it dx (to remind us of an "infinitesimally small" quantity), then the resulting change in the y direction (call it dy) is related by
\[ dy = m\, dx. \]
Now suppose that we have a curve defined by some arbitrary function, $y = f(x)$, which need not be a straight line. For a given point (x, y) on this curve, a step $\Delta x$ in the x direction is associated with a step $\Delta y$ in the y direction. The relationship between the step sizes is:
\[ \Delta y = m_s \Delta x, \]
where now $m_s$ is the slope of a secant line (shown connecting two points on the curve in Figure 6.2). If the sizes of the steps are small (dx and dy), then this relationship is well approximated by the slope of the tangent line, $m_t$, as shown in Figure 6.2, i.e.
\[ dy = m_t\, dx = f'(x)\, dx. \]

Figure 6.2. On this figure, the graph of some function is used to illustrate the connection between differentials dy and dx. Note that these are related via the slope of a tangent line, $m_t$, to the curve, in contrast with the relationship of $\Delta y$ and $\Delta x$ which stems from the slope of the secant line $m_s$ on the same curve.
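The difference between the secant step $\Delta y$ and the tangent step $dy$ can be seen numerically. The sketch below is only an illustration: it assumes the sample function $f(x) = x^3$ and the point $x = 2$, both chosen arbitrarily, and the helper name is hypothetical.

```python
def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

def secant_and_tangent_steps(x, dx):
    """Return (delta_y, dy): the change along the curve and along the tangent line."""
    delta_y = f(x + dx) - f(x)   # step measured on the curve (secant)
    dy = f_prime(x) * dx         # step measured on the tangent line
    return delta_y, dy

for dx in (0.1, 0.01, 0.001):
    delta_y, dy = secant_and_tangent_steps(2.0, dx)
    print(f"dx={dx}: delta_y={delta_y:.6f}, dy={dy:.6f}, gap={delta_y - dy:.2e}")
```

For this function the gap between the two steps is $6\,dx^2 + dx^3$ at $x = 2$, so it shrinks much faster than the step itself, which is the sense in which the tangent slope approximates the secant slope for small steps.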
The quantities dx and dy are called differentials. In general, they link a small step on the x axis with the resulting small change in height along the tangent line to the curve (shown in Figure 6.2). We might observe that the ratio of the differentials, i.e.
\[ \frac{dy}{dx} = f'(x), \]
appears to link our result to the definition of the derivative. We remember, though, that the derivative is actually defined as a limit:
\[ f'(x) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}. \]
When the step size $\Delta x$ is quite small, it is approximately true that
\[ \frac{\Delta y}{\Delta x} \approx \frac{dy}{dx}. \]
This notation will be useful in substitution integrals.

Examples

We give some examples of functions, their derivatives, and the differential notation that goes with them.

1. The function $y = f(x) = x^3$ has derivative $f'(x) = 3x^2$. Thus $dy = 3x^2\, dx$.

2. The function $y = f(x) = \tan(x)$ has derivative $f'(x) = \sec^2(x)$. Therefore $dy = \sec^2(x)\, dx$.

3. The function $y = f(x) = \ln(x)$ has derivative $f'(x) = \frac{1}{x}$, so $dy = \frac{1}{x}\, dx$.

With some practice, we can omit the intermediate step of writing down a derivative and go directly from function to differential notation. Given a function $y = f(x)$ we will often write
\[ df(x) = \frac{df}{dx}\, dx \]
and occasionally, we use just the symbol df to mean the same thing. The following examples illustrate this idea with specific functions.
\[ d(\sin(x)) = \cos(x)\, dx, \qquad d(x^n) = n x^{n-1}\, dx, \qquad d(\arctan(x)) = \frac{1}{1 + x^2}\, dx. \]
Moreover, some of the basic rules of differentiation translate directly into rules for handling and manipulating differentials. We give a list of some of these elementary rules below.
Rules for derivatives and differentials

1. $\dfrac{d}{dx} C = 0$, $\quad dC = 0$.

2. $\dfrac{d}{dx}\big(u(x) + v(x)\big) = \dfrac{du}{dx} + \dfrac{dv}{dx}$, $\quad d(u + v) = du + dv$.

3. $\dfrac{d}{dx}\big(u(x)v(x)\big) = u \dfrac{dv}{dx} + v \dfrac{du}{dx}$, $\quad d(uv) = u\, dv + v\, du$.

4. $\dfrac{d}{dx}\big(C u(x)\big) = C \dfrac{du}{dx}$, $\quad d(Cu) = C\, du$.

6.2 Antidifferentiation and indefinite integrals

In Chapters 2 and 3, we defined the concept of the definite integral, which represents a number. It will be useful here to consider the idea of an indefinite integral, which is a function, namely an antiderivative.

If two functions, F(x) and G(x), have the same derivative, say f(x), then they differ at most by a constant, that is $F(x) = G(x) + C$, where C is some constant.

Proof

Since F(x) and G(x) have the same derivative, we have
\[ \frac{d}{dx} F(x) = \frac{d}{dx} G(x), \]
\[ \frac{d}{dx} F(x) - \frac{d}{dx} G(x) = 0, \]
\[ \frac{d}{dx} \big(F(x) - G(x)\big) = 0. \]
This means that the function $F(x) - G(x)$ should be a constant, since its derivative is zero. Thus
\[ F(x) - G(x) = C, \]
so
\[ F(x) = G(x) + C, \]
as required.

F(x) and G(x) are called antiderivatives of f(x), and this confirms, once more, that any two antiderivatives differ at most by a constant.

In another terminology, which means the same thing, we also say that F(x) (or G(x)) is the integral of the function f(x), and we refer to f(x) as the integrand. We write this as follows:
\[ F(x) = \int f(x)\, dx. \]
This notation is sometimes called "an indefinite integral" because it does not denote a specific numerical value, nor is an interval specified for the integration range. An indefinite
integral is a function with an arbitrary constant. (Contrast this with the definite integral studied in our last chapters: in the case of the definite integral, we specified an interval, and interpreted the result, a number, in terms of areas associated with curves.) We also write
\[ \int f(x)\, dx = F(x) + C, \]
if we want to indicate the form of all possible functions that are antiderivatives of f(x). C is referred to as a constant of integration.

6.2.1 Integrals of derivatives

Suppose we are given an integral of the form
\[ \int \frac{df}{dx}\, dx, \]
or alternately, the same thing written using differential notation,
\[ \int df. \]
How do we handle this? We reason as follows. The quantity df/dx (which is, itself, a function) is the derivative of the function f(x). That means that f(x) is the antiderivative of df/dx. Then, according to the Fundamental Theorem of Calculus,
\[ \int \frac{df}{dx}\, dx = f(x) + C. \]
We can write this same result using the differential of f, as follows:
\[ \int df = f(x) + C. \]
The following examples illustrate the idea with several elementary functions.

Examples

1. $\int d(\cos x) = \cos x + C$.

2. $\int dv = v + C$.

3. $\int d(x^3) = x^3 + C$.

6.3 Simple substitution

In this section, we observe that the forms of some integrals can be simplified by making a judicious substitution, and using our familiarity with derivatives (and the chain rule). The idea rests on the fact that in some cases, we can spot a "helper function"
\[ u = f(x), \]
such that the quantity
\[ du = f'(x)\, dx \]
appears in the integrand. In that case, the substitution will lead to eliminating x entirely in favour of the new quantity u, and simplification may occur.

6.3.1 Example: Simple substitution

Suppose we are given the function
\[ f(x) = (x + 1)^{10}. \]
Then its antiderivative (indefinite integral) is
\[ F(x) = \int f(x)\, dx = \int (x + 1)^{10}\, dx. \]
We could find an antiderivative by expanding the integrand $(x + 1)^{10}$ into a degree 10 polynomial and using methods already known to us, but this would be laborious. Let us observe, however, that if we define
\[ u = (x + 1), \]
then
\[ du = \frac{d(x + 1)}{dx}\, dx = \left( \frac{dx}{dx} + \frac{d(1)}{dx} \right) dx = (1 + 0)\, dx = dx. \]
Now replacing (x + 1) by u and dx by the equivalent du we get:
\[ F(x) = \int u^{10}\, du. \]
An antiderivative to this can be easily found, namely,
\[ F(x) = \frac{u^{11}}{11} = \frac{(x + 1)^{11}}{11} + C. \]
In the last step, we converted the result back to the original variable, and included the arbitrary integration constant. A very important point to remember is that we can always check our results by differentiation:

Check

Differentiate F(x) to obtain
\[ \frac{dF}{dx} = \frac{1}{11}\big(11 (x + 1)^{10}\big) = (x + 1)^{10}. \]
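The check by differentiation can also be carried out numerically. The following is a rough sketch rather than a proof: it approximates dF/dx by a central difference (a choice made here, with an arbitrary step size) and compares it with the integrand at a few sample points.

```python
def F(x):
    """Candidate antiderivative found by the substitution u = x + 1."""
    return (x + 1) ** 11 / 11

def f(x):
    """Original integrand."""
    return (x + 1) ** 10

def central_difference(g, x, h=1e-5):
    """Approximate g'(x) with a symmetric difference quotient."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (-0.5, 0.0, 0.7, 1.3):
    approx = central_difference(F, x)
    print(x, approx, f(x))
```

At each sample point the two printed values should agree to several digits; a mismatch would signal an algebra slip in the substitution.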
6.3.2 How to handle endpoints

We consider how substitution type integrals can be calculated when we have endpoints, i.e. in evaluating definite integrals. Consider the example:
\[ I = \int_1^2 \frac{1}{x + 1}\, dx. \]
This integration can be done by making the substitution $u = x + 1$ for which $du = dx$. We can handle the endpoints in one of two ways:

Method 1: Change the endpoints

We can change the integral over entirely to a definite integral in the variable u as follows: Since $u = x + 1$, the endpoint $x = 1$ corresponds to $u = 2$, and the endpoint $x = 2$ corresponds to $u = 3$, so changing the endpoints to reflect the change of variables leads to
\[ I = \int_2^3 \frac{1}{u}\, du = \ln|u| \,\Big|_2^3 = \ln 3 - \ln 2 = \ln \frac{3}{2}. \]
In the last steps we have plugged in the new endpoints (appropriate to u).

Method 2: Change back to x before evaluating at endpoints

Alternately, we could rewrite the antiderivative in terms of x,
\[ \int \frac{1}{u}\, du = \ln|u| = \ln|x + 1| \]
and then evaluate this function at the original endpoints.
\[ \int_1^2 \frac{1}{x + 1}\, dx = \ln|x + 1| \,\Big|_1^2 = \ln \frac{3}{2}. \]
Here we plugged in the original endpoints (as appropriate to the variable x).

6.3.3 Examples: Substitution type integrals

Find a simple substitution and determine the antiderivatives (indefinite or definite integrals) of the following functions:

1. $I = \displaystyle\int \frac{2}{x + 2}\, dx$.

2. $I = \displaystyle\int_0^1 x^2 e^{x^3}\, dx$.

3. $I = \displaystyle\int \frac{1}{(x + 1)^2 + 1}\, dx$.
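4. $I = \displaystyle\int (x + 3)\sqrt{x^2 + 6x + 10}\, dx$.

5. $I = \displaystyle\int_0^{\pi} \cos^3(x) \sin(x)\, dx$.

6. $I = \displaystyle\int \frac{1}{ax + b}\, dx$.

7. $I = \displaystyle\int \frac{1}{b + ax^2}\, dx$.

Solutions

1. $I = \displaystyle\int \frac{2}{x + 2}\, dx$. Let $u = x + 2$. Then $du = dx$ and we get
\[ I = \int \frac{2}{u}\, du = 2 \int \frac{1}{u}\, du = 2 \ln|u| = 2 \ln|x + 2| + C. \]

2. $I = \displaystyle\int_0^1 x^2 e^{x^3}\, dx$. Let $u = x^3$. Then $du = 3x^2\, dx$. Here we use method 2 for handling endpoints.
\[ \int x^2 e^{x^3}\, dx = \int e^u \frac{du}{3} = \frac{1}{3} e^u = \frac{1}{3} e^{x^3} + C. \]
Then
\[ I = \int_0^1 x^2 e^{x^3}\, dx = \frac{1}{3} e^{x^3} \Big|_0^1 = \frac{1}{3}(e - 1). \]
(We converted the antiderivative to the original variable, x, before plugging in the original endpoints.)

3. $I = \displaystyle\int \frac{1}{(x + 1)^2 + 1}\, dx$. Let $u = x + 1$, then $du = dx$ so we have
\[ I = \int \frac{1}{u^2 + 1}\, du = \arctan(u) = \arctan(x + 1) + C. \]

4. $I = \displaystyle\int (x + 3)\sqrt{x^2 + 6x + 10}\, dx$. Let $u = x^2 + 6x + 10$. Then $du = (2x + 6)\, dx = 2(x + 3)\, dx$. With this substitution we get
\[ I = \int \sqrt{u}\, \frac{du}{2} = \frac{1}{2} \int u^{1/2}\, du = \frac{1}{2} \frac{u^{3/2}}{3/2} = \frac{1}{3} u^{3/2} = \frac{1}{3} (x^2 + 6x + 10)^{3/2} + C. \]

5. $I = \displaystyle\int_0^{\pi} \cos^3(x) \sin(x)\, dx$. Let $u = \cos(x)$. Then $du = -\sin(x)\, dx$. Here we use method 1 for handling endpoints. For $x = 0$, $u = \cos 0 = 1$ and for $x = \pi$, $u = \cos \pi = -1$, so changing the integral and endpoints to u leads to
\[ I = \int_1^{-1} u^3 (-du) = -\frac{1}{4} u^4 \Big|_1^{-1} = -\frac{1}{4}\big((-1)^4 - 1^4\big) = 0. \]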
Here we plugged in the new endpoints that are relevant to the variable u.

6. $I = \displaystyle\int \frac{1}{ax + b}\, dx$. Let $u = ax + b$. Then $du = a\, dx$, so $dx = du/a$. Substitute the above equations into the first equation and simplify to get
\[ I = \int \frac{1}{u} \frac{du}{a} = \frac{1}{a} \int \frac{1}{u}\, du = \frac{1}{a} \ln|u| + C. \]
Substitute $u = ax + b$ back to arrive at the solution
\[ I = \int \frac{1}{ax + b}\, dx = \frac{1}{a} \ln|ax + b| + C. \tag{6.1} \]

7. $I = \displaystyle\int \frac{1}{b + ax^2}\, dx = \frac{1}{b} \int \frac{1}{1 + (a/b)x^2}\, dx$. This can be brought to the form of an arctan type integral as follows: Let $u^2 = (a/b)x^2$, so $u = \sqrt{a/b}\, x$ and $du = \sqrt{a/b}\, dx$. Now substituting these, we get
\[ I = \frac{1}{b} \int \frac{1}{1 + u^2} \frac{du}{\sqrt{a/b}} = \sqrt{b/a}\, \frac{1}{b} \int \frac{1}{1 + u^2}\, du, \]
\[ I = \frac{1}{\sqrt{ba}} \arctan(u) = \frac{1}{\sqrt{ba}} \arctan\!\big(\sqrt{a/b}\, x\big) + C. \]

6.3.4 When simple substitution fails

Not every integral can be handled by simple substitution. Let us see what could go wrong:

Example: Substitution that does not work

Consider the case
\[ F(x) = \int \sqrt{1 + x^2}\, dx = \int (1 + x^2)^{1/2}\, dx. \]
A "reasonable" guess for substitution might be
\[ u = (1 + x^2). \]
Then $du = 2x\, dx$, and $dx = du/2x$. Attempting to convert the integral to the form containing u would lead to
\[ I = \int \sqrt{u}\, \frac{du}{2x}. \]
We have not succeeded in eliminating x entirely, so the expression obtained contains a mixture of two variables. We can proceed no further. This substitution did not simplify the integral and we must try some other technique.
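The formulas obtained in Solutions 6 and 7 can be spot-checked numerically. The sketch below is an illustration under stated assumptions: the values of a and b are arbitrary positive numbers chosen here, the interval is $0 \le x \le 1$, and the definite integrals are approximated with a hand-written composite Simpson's rule.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def log_form(a, b, x):
    """Antiderivative of 1/(a x + b) from Eqn. (6.1), without the constant."""
    return math.log(abs(a * x + b)) / a

def arctan_form(a, b, x):
    """Antiderivative of 1/(b + a x^2) from Solution 7, without the constant."""
    return math.atan(math.sqrt(a / b) * x) / math.sqrt(a * b)

a, b = 2.0, 3.0  # arbitrary positive values, chosen only for this check

numeric_6 = simpson(lambda x: 1.0 / (a * x + b), 0.0, 1.0)
numeric_7 = simpson(lambda x: 1.0 / (b + a * x * x), 0.0, 1.0)
print(numeric_6, log_form(a, b, 1.0) - log_form(a, b, 0.0))
print(numeric_7, arctan_form(a, b, 1.0) - arctan_form(a, b, 0.0))
```

In each printed pair the numerical integral and the difference of the antiderivative at the endpoints should agree closely.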
6.3.5 Checking your answer

Finding an antiderivative can be tricky. (To a large extent, methods described in this chapter are a "collection of tricks".) However, it is always possible (and wise) to check for correctness, by differentiating the result. This can help uncover errors.

For example, suppose that (in the previous example) we had incorrectly guessed that the antiderivative of
\[ \int (1 + x^2)^{1/2}\, dx \]
might be
\[ F_{\text{guess}}(x) = \frac{1}{3/2} (1 + x^2)^{3/2}. \]
The following check demonstrates the incorrectness of this guess: Differentiate $F_{\text{guess}}(x)$ to obtain
\[ F'_{\text{guess}}(x) = \frac{1}{3/2} (3/2)(1 + x^2)^{(3/2) - 1} \cdot 2x = (1 + x^2)^{1/2} \cdot 2x. \]
The result is clearly not the same as $(1 + x^2)^{1/2}$, since an "extra" factor of 2x appears from application of the chain rule: this means that the trial function $F_{\text{guess}}(x)$ was not the correct antiderivative. (We can similarly check to confirm correctness of any antiderivative found by following steps of methods here described. This can help to uncover sign errors and other algebraic mistakes.)

6.4 More substitutions

In some cases, rearrangement is needed before the form of an integral becomes apparent. We give some examples in this section. The idea is to reduce each one to the form of an elementary integral, whose antiderivative is known.

Standard integral forms

1. $I = \displaystyle\int \frac{1}{u}\, du = \ln|u| + C$.

2. $I = \displaystyle\int u^n\, du = \frac{u^{n+1}}{n + 1} + C$ (for $n \ne -1$).

3. $I = \displaystyle\int \frac{1}{1 + u^2}\, du = \arctan(u) + C$.

However, finding which of these forms is appropriate in a given case will take some ingenuity and algebra skills. Integration tends to be more of an art than differentiation, and recognition of patterns plays an important role here.

6.4.1 Example: perfect square in denominator

Find the antiderivative for
\[ I = \int \frac{1}{x^2 - 6x + 9}\, dx. \]
Solution

We observe that the denominator of the integrand is a perfect square, i.e. that $x^2 - 6x + 9 = (x - 3)^2$. Replacing this in the integral, we obtain
\[ I = \int \frac{1}{x^2 - 6x + 9}\, dx = \int \frac{1}{(x - 3)^2}\, dx. \]
Now making the substitution $u = (x - 3)$, and $du = dx$ leads to a power type integral
\[ I = \int \frac{1}{u^2}\, du = \int u^{-2}\, du = -u^{-1} = -\frac{1}{(x - 3)} + C. \]

6.4.2 Example: completing the square

A small change in the denominator will change the character of the integral, as shown by this example:
\[ I = \int \frac{1}{x^2 - 6x + 10}\, dx. \]

Solution

Here we use "completing the square" to express the denominator in the form $x^2 - 6x + 10 = (x - 3)^2 + 1$. Then the integral takes the form
\[ I = \int \frac{1}{1 + (x - 3)^2}\, dx. \]
Now a substitution $u = (x - 3)$ and $du = dx$ will result in
\[ I = \int \frac{1}{1 + u^2}\, du = \arctan(u) = \arctan(x - 3) + C. \]
Remark: in cases where completing the square gives rise to a constant other than 1 in the denominator, we use the technique illustrated in Example 6.3.3 Eqn. (6.1) to simplify the problem.

6.4.3 Example: factoring the denominator

A change in one sign can also lead to a drastic change in the antiderivative. Consider
\[ I = \int \frac{1}{1 - x^2}\, dx. \]
In this case, we can factor the denominator to obtain
\[ I = \int \frac{1}{(1 - x)(1 + x)}\, dx. \]
We will show shortly that the integrand can be simplified to the sum of two fractions, i.e. that
\[ I = \int \frac{1}{(1 - x)(1 + x)}\, dx = \int \left( \frac{A}{(1 - x)} + \frac{B}{(1 + x)} \right) dx, \]
where A, B are constants. The algebraic technique for finding these constants, and hence of forming the simpler expressions, called partial fractions, will be discussed in an upcoming section. Once these constants are found, each of the resulting integrals can be handled by substitution.
6.5 Trigonometric substitutions
Trigonometric functions provide a rich set of interconnected functions that show up in many problems. It is useful to remember three very important trigonometric identities that help to simplify many integrals. These are:

Essential trigonometric identities

1. $\sin^2(x) + \cos^2(x) = 1$

2. $\sin(A + B) = \sin(A)\cos(B) + \sin(B)\cos(A)$

3. $\cos(A + B) = \cos(A)\cos(B) - \sin(A)\sin(B)$

In the special case that $A = B = x$, the last two identities above lead to:

Double angle trigonometric identities

1. $\sin(2x) = 2\sin(x)\cos(x)$.

2. $\cos(2x) = \cos^2(x) - \sin^2(x)$.

From these, we can generate a variety of other identities as special cases. We list the most useful below. The first two are obtained by combining the double-angle formula for cosines with the identity $\sin^2(x) + \cos^2(x) = 1$.

Useful trigonometric identities

1. $\cos^2(x) = \dfrac{1 + \cos(2x)}{2}$.

2. $\sin^2(x) = \dfrac{1 - \cos(2x)}{2}$.

3. $\tan^2(x) + 1 = \sec^2(x)$.
6.5.1 Example: simple trigonometric substitution
Find the antiderivative of
\[ I = \int \sin(x) \cos^2(x)\, dx. \]

Solution

This integral can be computed by a simple substitution, similar to Example 5 of Section 6.3. We let $u = \cos(x)$ and $du = -\sin(x)\, dx$ to get the integral into the form
\[ I = -\int u^2\, du = \frac{-u^3}{3} = \frac{-\cos^3(x)}{3} + C. \]
We need none of the trigonometric identities in this case. Simple substitution is always the easiest method to use. It should be the first method attempted in each case.
6.5.2 Example: using trigonometric identities (1)
Find the antiderivative of
\[ I = \int \cos^2(x)\, dx. \]

Solution

This is an example in which the "Useful trigonometric identity" 1 leads to a simpler integral. We write
\[ I = \int \cos^2(x)\, dx = \int \frac{1 + \cos(2x)}{2}\, dx = \frac{1}{2} \int (1 + \cos(2x))\, dx. \]
Then clearly,
\[ I = \frac{1}{2} \left( x + \frac{\sin(2x)}{2} \right) + C. \]
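As a sanity check on this result, one can compare a numerical integral of $\cos^2(x)$ with the antiderivative evaluated at the endpoints. The sketch below assumes a hand-written composite Simpson's rule and a few arbitrarily chosen upper limits; it is a spot check, not a derivation.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def cos_squared_antiderivative(x):
    """(1/2)(x + sin(2x)/2), the antiderivative found above without the constant."""
    return 0.5 * (x + math.sin(2 * x) / 2)

for t in (0.5, 1.0, 2.5):
    numeric = simpson(lambda x: math.cos(x) ** 2, 0.0, t)
    exact = cos_squared_antiderivative(t) - cos_squared_antiderivative(0.0)
    print(t, numeric, exact)
```

The two columns should match to many digits for each upper limit t.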
6.5.3 Example: using trigonometric identities (2)
Find the antiderivative of
\[ I = \int \sin^3(x)\, dx. \]

Solution

We can rewrite this integral in the form
\[ I = \int \sin^2(x) \sin(x)\, dx. \]
Now using the trigonometric identity $\sin^2(x) + \cos^2(x) = 1$ leads to
\[ I = \int (1 - \cos^2(x)) \sin(x)\, dx. \]
This can be split up into
\[ I = \int \sin(x)\, dx - \int \sin(x) \cos^2(x)\, dx. \]
The first part is elementary, and the second was shown in a previous example. Therefore we end up with
\[ I = -\cos(x) + \frac{\cos^3(x)}{3} + C. \]
Note that it is customary to combine all constants obtained in the calculation into a single constant, C, at the end.

Aside from integrals that, themselves, contain trigonometric functions, there are other cases in which use of trigonometric identities, though at first seemingly unrelated, is crucial. Many expressions involving the form $\sqrt{1 \pm x^2}$ or the related form $\sqrt{a \pm bx^2}$ will be simplified eventually by conversion to trigonometric expressions!
6.5.4 Example: converting to trigonometric functions
Find the antiderivative of
\[ I = \int \sqrt{1 - x^2}\, dx. \]

Solution

The simple substitution $u = 1 - x^2$ will not work (as shown by a similar example in Section 6.3). However, converting to trigonometric expressions will do the trick. Let
\[ x = \sin(u), \quad \text{then} \quad dx = \cos(u)\, du. \]
(In Figure 6.3, we show this relationship on a triangle. This diagram is useful in reversing the substitutions after the integration step.) Then $1 - x^2 = 1 - \sin^2(u) = \cos^2(u)$, so the substitutions lead to
\[ I = \int \sqrt{\cos^2(u)}\, \cos(u)\, du = \int \cos^2(u)\, du. \]

Figure 6.3. This triangle helps to convert the (trigonometric) functions of u to the original variable x in Example 6.5.4. (Triangle labels: angle u, hypotenuse 1, opposite side x, adjacent side $\sqrt{1 - x^2}$.)

From a previous example, we already know how to handle this integral. We find that
\[ I = \frac{1}{2} \left( u + \frac{\sin(2u)}{2} \right) = \frac{1}{2} \big( u + \sin(u)\cos(u) \big) + C. \]
(In the last step, we have used the double angle trigonometric identity. We will shortly see why this simplification is relevant.)

We now desire to convert the result back to a function of the original variable, x. We note that $x = \sin(u)$ implies $u = \arcsin(x)$. To convert the term $\cos(u)$ back to an expression depending on x we can use the relationship $1 - \sin^2(u) = \cos^2(u)$, to deduce that
\[ \cos(u) = \sqrt{1 - \sin^2(u)} = \sqrt{1 - x^2}. \]
It is sometimes helpful to use a Pythagorean triangle, as shown in Figure 6.3, to rewrite the antiderivative in terms of the variable x. The idea is this: We construct the triangle in such a way that its side lengths are related to the "angle" u according to the substitution rule. In this example, $x = \sin(u)$ so the sides labeled x and 1 were chosen so that their ratio ("opposite over hypotenuse") coincides with the sine of the indicated angle, u, thereby satisfying $x = \sin(u)$. We can then determine the length of the third leg of the triangle (using the Pythagorean formula) and thus all other trigonometric functions of u. For example, we note that the ratio of "adjacent over hypotenuse" is $\cos(u) = \sqrt{1 - x^2}/1 = \sqrt{1 - x^2}$. Finally, with these reverse substitutions, we find that
\[ I = \int \sqrt{1 - x^2}\, dx = \frac{1}{2} \left( \arcsin(x) + x\sqrt{1 - x^2} \right) + C. \]
Remark: In computing a definite integral of the same type, we can circumvent the need for the conversion back to an expression involving x by using the appropriate method for handling endpoints. For example, the integral
\[ I = \int_0^1 \sqrt{1 - x^2}\, dx \]
can be transformed to
\[ I = \int_0^{\pi/2} \sqrt{\cos^2(u)}\, \cos(u)\, du, \]
by observing that $x = \sin(u)$ implies that $u = 0$ when $x = 0$ and $u = \pi/2$ when $x = 1$. Then this means that the integral can be evaluated directly (without changing back to the variable x) as follows:
\[ I = \int_0^{\pi/2} \sqrt{\cos^2(u)}\, \cos(u)\, du = \frac{1}{2} \left( u + \frac{\sin(2u)}{2} \right) \Big|_0^{\pi/2} = \frac{1}{2} \left( \frac{\pi}{2} + \frac{\sin(\pi)}{2} \right) = \frac{\pi}{4} \]
where we have used the fact that $\sin(\pi) = 0$.

Some subtle points about the domains of definition of inverse trigonometric functions will not be discussed here in detail. (See material on these functions in a first term calculus course.) Suffice it to say that some integrals of this type will be undefined if this endpoint conversion cannot be carried out (e.g. if the interval of integration had been $0 \le x \le 2$, we would encounter an impossible relation $2 = \sin(u)$. Since no value of u satisfies this relation, such a definite integral has no meaning, i.e. "does not exist".)
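The value $\pi/4$ can be confirmed numerically in both variables. The sketch below is a rough check under stated assumptions: it uses a hand-written composite Simpson's rule, and it uses many more subintervals for the integral in x because the integrand $\sqrt{1 - x^2}$ has a vertical tangent at $x = 1$, which slows convergence.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

# After the substitution x = sin(u): integrate cos^2(u) over 0 <= u <= pi/2
in_u = simpson(lambda u: math.cos(u) ** 2, 0.0, math.pi / 2)

# Original variable: integrate sqrt(1 - x^2) over 0 <= x <= 1
# (max(...) guards against tiny negative values from rounding)
in_x = simpson(lambda x: math.sqrt(max(0.0, 1.0 - x * x)), 0.0, 1.0, n=2000)

print(in_u, in_x, math.pi / 4)
```

Both numbers should be close to $\pi/4$, which is also the area of a quarter of the unit circle; the integral in u converges much faster because its integrand is smooth on the whole interval.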
6.5.5 Example: The centroid of a two dimensional shape
We extend the concept of centroid (center of mass) for a region that has uniform density in 2D, but where we consider the distribution of mass along the x (or y) axis. Consider the semicircle shape of uniform thickness, shown in Figure 6.4, and suppose it is balanced along its horizontal edge. Find the x coordinate $\bar{x}$ at which the shape balances.

Figure 6.4. A semicircular shape. (Curve label: $y = \sqrt{9 - x^2}$.)

Solution

The shape is one quarter of a circle of radius 3. Its edge is described by the equation
\[ y = f(x) = \sqrt{9 - x^2}. \]
We will assume that the density per unit area is uniform. However, the mass per unit length along the x axis is not uniform, due to the shape of the object. We apply the idea of integration: If we cut the shape at increments of $\Delta x$ along the x axis, we get a collection of pieces whose mass is each proportional to $f(x)\Delta x$. Summing up such contributions and letting the widths $\Delta x \to dx$ get small, we arrive at the integral for mass. The total mass of the shape is thus
\[ M = \int_0^3 f(x)\, dx = \int_0^3 \sqrt{9 - x^2}\, dx. \]
Furthermore, if we compute the integral
\[ I = \int_0^3 x f(x)\, dx = \int_0^3 x \sqrt{9 - x^2}\, dx, \]
we obtain the x coordinate of the center of mass,
\[ \bar{x} = \frac{I}{M}. \]
It is evident that the mass is proportional to the area of one quarter of a circle of radius 3:
\[ M = \frac{1}{4} \pi (3)^2 = \frac{9}{4} \pi. \]
(We could also see this by performing a trigonometric substitution integral.) The second integral can be done by simple substitution. Consider
\[ I = \int_0^3 x f(x)\, dx = \int_0^3 x \sqrt{9 - x^2}\, dx. \]
Let $u = 9 - x^2$. Then $du = -2x\, dx$. The endpoints are converted as follows: $x = 0 \Rightarrow u = 9 - 0^2 = 9$ and $x = 3 \Rightarrow u = 9 - 3^2 = 0$ so that we get the integral
\[ I = \int_9^0 \sqrt{u}\, \frac{1}{-2}\, du. \]
We can reverse the endpoints if we switch the sign, and this leads to
\[ I = \frac{1}{2} \int_0^9 u^{1/2}\, du = \frac{1}{2} \frac{u^{3/2}}{3/2} \Big|_0^9. \]
Since $9^{3/2} = (9^{1/2})^3 = 3^3$, we get $I = (3^3)/3 = 3^2 = 9$. Thus the x coordinate of the center of mass is
\[ \bar{x} = \frac{I}{M} = \frac{9}{(9/4)\pi} = \frac{4}{\pi}. \]
We can similarly find the y coordinate of the center of mass: To do so, we would express the boundary of the shape in the form $x = f(y)$ and integrate to find
\[ \bar{y} = \frac{1}{M} \int_0^3 y f(y)\, dy. \]
For the semicircle, $y^2 + x^2 = 9$, so $x = f(y) = \sqrt{9 - y^2}$. Thus
\[ \bar{y} = \frac{1}{M} \int_0^3 y \sqrt{9 - y^2}\, dy. \]
This integral looks identical to the one we wrote down for $\bar{x}$. Thus, based on this similarity (or based on the symmetry of the problem) we will find that
\[ \bar{y} = \frac{4}{\pi}. \]

6.5.6 Example: tan and sec substitution

Find the antiderivative of
\[ I = \int \sqrt{1 + x^2}\, dx. \]

Solution

We aim for simplification by the identity $1 + \tan^2(u) = \sec^2(u)$, so we set
\[ x = \tan(u), \quad dx = \sec^2(u)\, du. \]
Then the substitution leads to
\[ I = \int \sqrt{1 + \tan^2(u)}\, \sec^2(u)\, du = \int \sqrt{\sec^2(u)}\, \sec^2(u)\, du = \int \sec^3(u)\, du. \]
This integral will require further work, and will be partly calculated by Integration by Parts in Appendix 11. In this example, the triangle shown in Figure 6.5 shows the relationship between x and u and will help to convert other trigonometric functions of u to functions of x.

Figure 6.5. As in Figure 6.3 but for example 6.5.6. (Triangle labels: angle u, hypotenuse $\sqrt{1 + x^2}$, opposite side x, adjacent side 1.)

6.6 Partial fractions

In this section, we show a simple algebraic trick that helps to simplify an integrand when it is in the form of some rational function such as
\[ f(x) = \frac{1}{(ax + b)(cx + d)}. \]
The idea is to break this up into simpler rational expressions by finding constants A, B such that
\[ \frac{1}{(ax + b)(cx + d)} = \frac{A}{(ax + b)} + \frac{B}{(cx + d)}. \]
Each part can then be handled by a simple substitution, as shown in Example 6.3.3, Eqn. (6.1). We give several examples below.

6.6.1 Example: partial fractions (1)

Find the antiderivative of
\[ I = \int \frac{1}{x^2 - 1}\, dx. \]
Factoring the denominator, $x^2 - 1 = (x - 1)(x + 1)$, suggests breaking up the integrand into the form
\[ \frac{1}{x^2 - 1} = \frac{A}{(x + 1)} + \frac{B}{(x - 1)}. \]
The two sides are equal provided:
\[ \frac{1}{x^2 - 1} = \frac{A(x - 1) + B(x + 1)}{x^2 - 1}. \]
This means that
\[ 1 = A(x - 1) + B(x + 1) \]
must be true for all x values. We now ask what values of A and B make this equation hold for any x. Choosing two "easy" values, namely $x = 1$ and $x = -1$, leads to isolating one or the other unknown constants, A, B, with the results:
\[ 1 = -2A, \quad 1 = 2B. \]
Thus $B = 1/2$, $A = -1/2$, so the integral can be written in the simpler form
\[ I = \frac{1}{2} \left( \int \frac{-1}{(x + 1)}\, dx + \int \frac{1}{(x - 1)}\, dx \right). \]
(A common factor of (1/2) has been taken out.) Now a simple substitution will work for each component. (Let $u = x + 1$ for the first, and $u = x - 1$ for the second integral.) The result is
\[ I = \int \frac{1}{x^2 - 1}\, dx = \frac{1}{2} \big( -\ln|x + 1| + \ln|x - 1| \big) + C. \]

6.6.2 Example: partial fractions (2)

Find the antiderivative of
\[ I = \int \frac{1}{x(1 - x)}\, dx. \]
This example is similar to the previous one. We set
\[ \frac{1}{x(1 - x)} = \frac{A}{x} + \frac{B}{(1 - x)}. \]
Then
\[ 1 = A(1 - x) + Bx. \]
This must hold for all x values. In particular, convenient values of x for determining the constants are $x = 0, 1$. We find that
\[ A = 1, \quad B = 1. \]
Thus
\[ I = \int \frac{1}{x(1 - x)}\, dx = \int \frac{1}{x}\, dx + \int \frac{1}{1 - x}\, dx. \]
Simple substitution now gives
\[ I = \ln|x| - \ln|1 - x| + C. \]
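The step of plugging in "easy" x values can be written as a small routine. The following sketch is illustrative only: the function name `cover_up` is hypothetical, and it assumes the denominator has already been factored as $(x - r_1)(x - r_2)$ with distinct roots, so that evaluating the numerator at each root isolates one constant.

```python
from fractions import Fraction

def cover_up(numerator, r1, r2):
    """Return (A, B) with numerator(x)/((x - r1)(x - r2)) = A/(x - r1) + B/(x - r2).

    Setting x = r1 in numerator(x) = A (x - r2) + B (x - r1) isolates A,
    and setting x = r2 isolates B. Requires r1 != r2.
    """
    A = Fraction(numerator(r1)) / (r1 - r2)
    B = Fraction(numerator(r2)) / (r2 - r1)
    return A, B

# Example (1): 1/(x^2 - 1) = A/(x + 1) + B/(x - 1), so r1 = -1 and r2 = 1
A, B = cover_up(lambda x: 1, -1, 1)
print(A, B)

# Spot-check the decomposition at a point away from the roots
x = Fraction(3)
assert 1 / (x * x - 1) == A / (x + 1) + B / (x - 1)
```

Exact rational arithmetic is used here so that the spot-check is an equality rather than an approximate comparison.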
6.6.3 Example: partial fractions (3)

Find the antiderivative of
\[ I = \int \frac{x}{x^2 + x - 2}\, dx. \]
The rational expression above factors into $x^2 + x - 2 = (x - 1)(x + 2)$, leading to the expression
\[ \frac{x}{x^2 + x - 2} = \frac{A}{(x - 1)} + \frac{B}{(x + 2)}. \]
Consequently, it follows that
\[ A(x + 2) + B(x - 1) = x. \]
Substituting the values $x = 1, -2$ into this leads to $A = 1/3$ and $B = 2/3$. The usual procedure then results in
\[ I = \int \frac{x}{x^2 + x - 2}\, dx = \frac{1}{3} \ln|x - 1| + \frac{2}{3} \ln|x + 2| + C. \]
Another example of the technique of partial fractions is provided in Appendix 11.

6.7 Integration by parts

The method described in this section is important as an additional tool for integration. It also has independent theoretical stature in many applications in mathematics and physics. The essential idea is that in some cases, we can exchange the task of integrating a function with the job of differentiating it.

The idea rests on the product rule for derivatives. Suppose that u(x) and v(x) are two differentiable functions. Then we know that the derivative of their product is
\[ \frac{d(uv)}{dx} = v \frac{du}{dx} + u \frac{dv}{dx}, \]
or, in the differential notation:
\[ d(uv) = v\, du + u\, dv. \]
Integrating both sides, we obtain
\[ \int d(uv) = \int v\, du + \int u\, dv, \]
i.e.
\[ uv = \int v\, du + \int u\, dv. \]
We write this result in the more suggestive form
\[ \int u\, dv = uv - \int v\, du. \]
The idea here is that if we have difficulty evaluating an integral such as $\int u\, dv$, we may be able to "exchange it" for a simpler integral in the form $\int v\, du$. This is best illustrated by the examples below.
Example: Integration by parts (1)

Compute
$$I = \int_1^2 \ln(x)\,dx.$$

Solution

Let $u = \ln(x)$ and $dv = dx$. Then $du = (1/x)\,dx$ and $v = x$.
$$\int \ln(x)\,dx = x\ln(x) - \int x(1/x)\,dx = x\ln(x) - \int dx = x\ln(x) - x.$$
We now evaluate this result at the endpoints to obtain
$$I = \int_1^2 \ln(x)\,dx = \left(x\ln(x) - x\right)\Big|_1^2 = (2\ln(2) - 2) - (1\ln(1) - 1) = 2\ln(2) - 1.$$
(Here we used the fact that $\ln(1) = 0$.)

Example: Integration by parts (2)

Compute
$$I = \int_0^1 x e^x\,dx.$$

Solution

At first, it may be hard to decide how to assign roles for $u$ and $dv$. Suppose we try $u = e^x$ and $dv = x\,dx$. Then $du = e^x\,dx$ and $v = x^2/2$. This means that we would get the integral in the form
$$I = \frac{x^2}{2}e^x - \int \frac{x^2}{2}e^x\,dx.$$
This is certainly not a simplification, because the integral we obtain has a higher power of $x$, and is consequently harder, not easier to integrate. This suggests that our first attempt was not a helpful one. (Note that integration often requires trial and error.)

Let $u = x$ and $dv = e^x\,dx$. This is a wiser choice because when we differentiate $u$, we reduce the power of $x$ (from 1 to 0), and get a simpler expression. Indeed, $du = dx$, $v = e^x$ so that
$$\int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + C.$$
To find a definite integral of this kind on some interval (say $0 \le x \le 1$), we compute
$$I = \int_0^1 x e^x\,dx = \left(x e^x - e^x\right)\Big|_0^1 = (1e^1 - e^1) - (0e^0 - e^0) = 0 + e^0 = e^0 = 1.$$
Note that all parts of the expression are evaluated at the two endpoints.
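Both definite integrals above can be compared with a direct numerical approximation. The sketch below uses a midpoint Riemann sum; the function name `midpoint_sum` and the number of rectangles are illustrative assumptions rather than anything prescribed in the notes.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Example (1): integral of ln(x) on [1, 2]; the exact value is 2 ln(2) - 1.
approx_ln = midpoint_sum(math.log, 1.0, 2.0)
exact_ln = 2 * math.log(2) - 1

# Example (2): integral of x e^x on [0, 1]; the exact value is 1.
approx_xexp = midpoint_sum(lambda x: x * math.exp(x), 0.0, 1.0)
exact_xexp = 1.0

# Both differences should be very small if the antiderivatives are right.
print(abs(approx_ln - exact_ln), abs(approx_xexp - exact_xexp))
```

A numerical check of this kind cannot replace the calculation, but it quickly reveals a wrong sign or a wrong assignment of $u$ and $dv$.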
Example: Integration by parts (2b)

Compute
$$I_n = \int x^n e^x\,dx.$$

Solution

We can calculate this integral by repeated application of the idea in the previous example. Letting $u = x^n$ and $dv = e^x\,dx$ leads to $du = n x^{n-1}\,dx$ and $v = e^x$. Then
$$I_n = x^n e^x - \int n x^{n-1} e^x\,dx = x^n e^x - n\int x^{n-1} e^x\,dx.$$
Each application of integration by parts reduces the power of the term $x^n$ inside an integral by one. The calculation is repeated until the very last integral has been simplified, with no remaining powers of $x$. This illustrates that in some problems, integration by parts is needed more than once.

Example: Integration by parts (3)

Compute
$$I = \int \arctan(x)\,dx.$$

Solution

Let $u = \arctan(x)$ and $dv = dx$. Then $du = (1/(1+x^2))\,dx$ and $v = x$ so that
$$I = x\arctan(x) - \int \frac{x}{1+x^2}\,dx.$$
The last integral can be done with the simple substitution $w = (1 + x^2)$ and $dw = 2x\,dx$, giving
$$I = x\arctan(x) - \frac{1}{2}\int \frac{1}{w}\,dw.$$
We obtain, as a result,
$$I = x\arctan(x) - \frac{1}{2}\ln(1+x^2) + C.$$

Example: Integration by parts (3b)

Compute
$$I = \int \tan(x)\,dx.$$
Solution

We might try to fit this into a similar pattern, i.e. let $u = \tan(x)$ and $dv = dx$. Then $du = \sec^2(x)\,dx$ and $v = x$, so we obtain
$$I = x\tan(x) - \int x\sec^2(x)\,dx.$$
This is not really a simplification, and we see that integration by parts will not necessarily work, even on a seemingly related example. However, we might instead try to rewrite the integral in the form
$$I = \int \tan(x)\,dx = \int \frac{\sin(x)}{\cos(x)}\,dx.$$
Now we find that a simple substitution will do the trick, i.e. that $w = \cos(x)$ and $dw = -\sin(x)\,dx$ will convert the integral into the form
$$I = \int \frac{1}{w}(-dw) = -\ln|w| + C = -\ln|\cos(x)| + C.$$
This example illustrates that we should always try substitution first, before attempting other methods.

Example: Integration by parts (4)

Compute
$$I_1 = \int e^x \sin(x)\,dx.$$
We refer to this integral as $I_1$ because a related second integral, that we’ll call $I_2$, will appear in the calculation.

Solution

Let $u = e^x$ and $dv = \sin(x)\,dx$. Then $du = e^x\,dx$ and $v = -\cos(x)$. Therefore
$$I_1 = -e^x\cos(x) - \int (-\cos(x))e^x\,dx = -e^x\cos(x) + \int \cos(x)e^x\,dx.$$
We now have another integral of a similar form to tackle. This seems hopeless, as we have not simplified the result, but let us not give up! In this case, another application of integration by parts will do the trick. Call $I_2$ the integral
$$I_2 = \int \cos(x)e^x\,dx$$
so that
$$I_1 = -e^x\cos(x) + I_2.$$
Repeat the same procedure for the new integral $I_2$, i.e. let $u = e^x$ and $dv = \cos(x)\,dx$. Then $du = e^x\,dx$ and $v = \sin(x)$. Thus
$$I_2 = e^x\sin(x) - \int \sin(x)e^x\,dx = e^x\sin(x) - I_1.$$
This appears to be a circular argument, but in fact, it has a purpose. We have determined that the following relationships are satisfied by the above two integrals:
$$I_1 = -e^x\cos(x) + I_2,$$
$$I_2 = e^x\sin(x) - I_1.$$
We can eliminate $I_2$, obtaining
$$I_1 = -e^x\cos(x) + I_2 = -e^x\cos(x) + e^x\sin(x) - I_1,$$
that is,
$$I_1 = -e^x\cos(x) + e^x\sin(x) - I_1.$$
Rearranging (taking $I_1$ to the left hand side) leads to
$$2I_1 = -e^x\cos(x) + e^x\sin(x),$$
and thus, the desired integral has been found to be
$$I_1 = \int e^x\sin(x)\,dx = \frac{1}{2}\left(-e^x\cos(x) + e^x\sin(x)\right) + C = \frac{1}{2}e^x\left(\sin(x) - \cos(x)\right) + C.$$
(At this last step, we have included the constant of integration.) Moreover, we have also found that $I_2$ is related, i.e. using $I_2 = e^x\sin(x) - I_1$ we now know that
$$I_2 = \int \cos(x)e^x\,dx = \frac{1}{2}e^x\left(\sin(x) + \cos(x)\right) + C.$$

6.8 Summary

In this chapter, we explored a number of techniques for computing antiderivatives. We here summarize the most important results:

1. Substitution is the first method to consider. This method works provided the change of variable results in elimination of the original variable and leads to a simpler, more elementary integral.

2. When using substitution on a definite integral, endpoints can be converted to the new variable (Method 1) or the resulting antiderivative can be converted back to its original variable before plugging in the (original) endpoints (Method 2).

3. The integration by parts formula for functions $u(x)$, $v(x)$ is
$$\int u\,dv = uv - \int v\,du.$$
Integration by parts is useful when $u$ is easy to differentiate (but not easy to integrate). It is also helpful when the integral contains a product of elementary functions such as $x^n$ and a trigonometric or an exponential function. Sometimes more than one application of this method is needed. Other times, this method is combined with substitution or other simplifications.
4. the problem of ﬁnding an antiderivative can be very complicated. Even with all these techniques. Table of elementary antiderivatives 1. √ 5.6. and many associated manipulations are often applied to twist and turn a complicated integral into a set of simpler expressions that can each be handled more easily. 3. In some cases.8. 5. 6. or. Algebraic tricks. 1 du = ln u + C. Using integration by parts on a deﬁnite integral means that both parts of the formula are to be evaluated at the endpoints. 3. u un du = un+1 +C n+1 1 = arctan(u) + C 1 + u2 √ 1 = arcsin(u) + C 1 − x2 sin(u) du = − cos(u) + C cos(u) du = sin(u) + C sec2 (u) du = tan(u) + C Additional useful antiderivatives 1. if none of these work. cot(u) du = ln  sin(u) + C sec(u) = ln  sec(u) + tan(u) + C . use symbolic manipulation software packages. tan(u) du = ln  sec(u) + C. 7. Integrals involving 1 ± x2 can be simpliﬁed by making a trigonometric substitution. Summary 131 4. Integrals with products or powers of trigonometric functions can sometimes be simpliﬁed by application of trigonometric identities or simple substitution. 8. 6. calculate a given deﬁnite integral numerically using a spreadsheet. we resort to handbooks of integrals. 2. 7. 2.
Chapter 7

Discrete probability and the laws of chance

7.1 Introduction

In this chapter we lay the groundwork for calculations and rules governing simple discrete probabilities [24]. Such skills are essential in understanding problems related to random processes of all sorts. In biology, there are many examples of such processes, including the inheritance of genes and genetic diseases, the random motion of cells, the fluctuations in the number of RNA molecules in a cell, and a vast array of other phenomena.

To gain experience with probability, it is important to see simple examples. In this chapter, we discuss experiments that can be easily reproduced and tested by the reader.

7.2 Dealing with data

Scientists studying phenomena in the real world collect data of all kinds, some resulting from experimental measurement or field observations. Data sets can be large and complex. If an experiment is repeated, and comparisons are to be made between multiple data sets, it is unrealistic to compare each and every numerical value. Some shortcuts allow us to summarize trends or descriptions of data sets in simple values such as averages (means), medians, and similar quantities. In doing so we lose the detailed information that the data set contains, in favor of the simplicity of one or several “simple” numerical descriptors such as the mean and the median of a distribution.

We have seen related ideas in Chapter 5 in the context of mass distributions. The idea of a center of mass is closely related to that of the mean of a distribution. Here we revisit such ideas in the context of probability. An additional example of real data is described in Appendix 11.6. There, we show how grade distributions on a test can be analyzed by similar methods.

[24] I am grateful to Robert Israel for comments regarding the organization of this chapter.
7.3 Simple experiments

7.3.1 Experiment

We will consider “experiments” such as tossing a coin, rolling a die, dealing cards, applying treatment to sick patients and recording how many are cured. In order for the ideas of probability to apply, we should be able to repeat the experiment as many times as desired under exactly the same conditions. The number of repetitions will often be denoted $N$.

7.3.2 Outcome

Whenever we perform the experiment, exactly one outcome happens. In this chapter we will deal with discrete probability, where there is a finite list of possible outcomes.

Consider the following experiment: We toss a coin and see how it lands. Here there are only two possible results: “heads” (H) or “tails” (T). A fair coin is one for which these results are equally likely. This means that if we repeat this experiment many, many times, we expect that on average, we get H roughly 50% of the time and T roughly 50% of the time. This will lead us to define a probability of 1/2 for each outcome.

Similarly, consider the experiment of rolling a die: A six-sided die can land on any of its six faces, so that a “single experiment” has six possible outcomes. For a fair die, we anticipate getting each of the results with an equal probability, i.e. if we were to repeat the same experiment many, many times, we would expect that, on average, the six possible events would occur with similar frequencies, each 1/6 of the time. We say that the events are random and unbiased for “fair” dice.

We will often be interested in more complex experiments. For example, if we toss a coin five times, an outcome corresponds to a five-letter sequence of “Heads” (H) and “Tails” (T), such as THTHH. We are interested in understanding how to quantify the probability of each such outcome in fair (as well as unfair) coins. If we toss a coin ten times, how probable is it that we get eight out of ten heads? For dice, we could ask how likely we are to roll a 5 and a 6 in successive experiments. A 5 or a 6? For such experiments we are interested in quantifying how likely it is that a certain event is obtained. Our goal in this chapter is to make more precise our notion of probability, and to examine ways of quantifying and computing probabilities. To motivate this investigation, we first look at results of a real experiment performed in class by students. An example of this sort is illustrated in Section 7.4.1.

7.3.3 Empirical probability

We can arrive at a notion of probability by actually repeating a real experiment $N$ times, and counting how many times each outcome happens. Let us use the notation $x_i$ to refer to the number of times that outcome $i$ was obtained. We define the empirical probability $p_i$ of outcome $i$ to be
$$p_i = x_i/N,$$
i.e. $p_i$ is the fraction of times that the result $i$ is obtained out of all the experiments. We expect that if we repeated the experiment many more times, this empirical probability would
approach, as a limit, the actual probability of the outcome. So if in a coin-tossing experiment, repeated 1000 times, the outcome HHTHH is obtained 25 times, then we would say that the empirical probability $p_{HHTHH}$ is 25/1000.

7.3.4 Theoretical Probability

For theoretical probability, we make some reasonable basic assumptions on which we base a calculation of the probabilities. For example, in the case of a “fair coin”, we can argue by symmetry that every sequence of $n$ heads and tails has the same probability as any other. We then use two fundamental rules of probability to calculate the probability as illustrated below.

Rules of probability

1. In discrete probability, $0 \le p_i \le 1$ for each outcome $i$.
2. For discrete probability, $\sum_i p_i = 1$, where the sum is over all possible outcomes.

About Rule 1: $p_i = 0$ implies that the given outcome never happens, whereas $p_i = 1$ implies that this outcome is the only possibility (and always happens). Any value inside the range (0, 1) means that the outcome occurs some of the time. Rule 2 makes intuitive sense: it means that we have accounted for all possibilities, i.e. the fractions corresponding to all of the outcomes add up to 100% of the results.

In a case where there are $M$ possible outcomes, all with equal probability, it follows that $p_i = 1/M$ for every $i$.

7.3.5 Random variables and probability distributions

A random variable is a numerical quantity $X$ that depends on the outcome of an experiment. For example, suppose we toss a coin $n$ times, and let $X$ be the number of heads that appear. If, say, we toss the coin $n = 4$ times, then the number of heads, $X$, could take on any of the values $\{x_i\} = \{0, 1, 2, 3, 4\}$ (i.e., no heads, one head, ..., four heads). In the case of discrete probability there are a discrete number of possible values for the random variable to take on.

We will be interested in the probability distribution of $X$. In general, if the possible values $x_i$ are listed in increasing order for $i = 0, \ldots, n$, we would like to characterize their probabilities $p(x_i)$, where $p(x_i) = \text{Prob}(X = x_i)$ [25].

Even though $p(x_i)$ is a discrete quantity taking on one of a discrete set of values, we should still think of this mathematical object as a function: it associates a number (the probability) $p$ with each allowable value of the random variable $x_i$ for $i = 0, \ldots, n$. In what follows, we will be interested in characterizing such functions, termed probability distributions, and their properties.

[25] Read: $p(x_i)$ is the probability that the random variable $X$ takes on the value $x_i$.
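The $n = 4$ coin example can be worked out by brute force: list every sequence of tosses, assign each the same probability $1/M$, and add up the probabilities of the sequences that give the same number of heads. The following is a minimal sketch under the fair-coin assumption; the variable names are our own.

```python
from fractions import Fraction
from itertools import product

n = 4  # number of tosses, as in the example in the text

# Every sequence of n tosses of a fair coin is assumed equally likely.
sequences = list(product("HT", repeat=n))
p_sequence = Fraction(1, len(sequences))  # 1/M with M = 2**n outcomes

# p[x] = Prob(X = x), where X is the number of heads in the sequence.
p = {x: Fraction(0) for x in range(n + 1)}
for seq in sequences:
    p[seq.count("H")] += p_sequence

for x, prob in p.items():
    print(x, prob)

# Rule 1 and Rule 2 of probability hold for the resulting distribution.
rule1 = all(0 <= prob <= 1 for prob in p.values())
rule2 = sum(p.values()) == 1
```

Exact fractions are used so that Rule 2 can be tested with equality rather than with a rounding tolerance.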
7.3.6 The cumulative distribution

Given a probability distribution, we can also define a cumulative function as follows:

The cumulative function corresponding to the probability distribution $p(x_i)$ is defined as
$$F(x_i) = \text{Prob}(X \le x_i).$$

For a given numerical outcome $x_i$, the value of $F(x_i)$ is hence
$$F(x_i) = \sum_{j=0}^{i} p(x_j).$$
The function $F$ merely sums up all the probabilities of outcomes up to and including $x_i$, hence is called “cumulative”. This implies that $F(x_n) = 1$ where $x_n$ is the largest value attainable by the random variable. For example, in the rolling of a die, if we list the possible outcomes in ascending order as $\{1, 2, \ldots, 6\}$, then $F(6)$ stands for the probability of rolling a 6 or any lower value, which is clearly equal to 1 for a six-sided die.

7.4 Examples of experimental data

7.4.1 Example 1: Tossing a coin

We illustrate ideas with an example of real data obtained by repeating an “experiment” many times. The experiment, actually carried out by each of 121 students in this calculus course, consisted of tossing a coin $n = 10$ times and recording the number, $x_i$, of “Heads” that came up. Each student recorded one of eleven possible outcomes, $x_i = \{0, 1, 2, \ldots, 10\}$ (i.e. no heads, one, two, etc., up to ten heads out of the ten tosses).

By pooling together such data, we implicitly assume that all coins and all tossers are more or less identical and unbiased, so the “experiment” has $N = 121$ replicates (one for each student). Table 7.1 shows the result of this experiment. Here $n_i$ is the number of students who got $x_i$ heads. We refer to this as the frequency of the given result, so $n_i/N$ is the fraction of experiments that led to the given result, and we define the empirical probability assigned to $x_i$ as this fraction, that is $p(x_i) = n_i/N$. In column (3) we display the cumulative number of students who got any number up to and including $x_i$ heads, and then in column (5) we compute the cumulative (empirical) probability $F(x_i)$.

In Figure 7.1 we show what this distribution looks like on a bar graph. The horizontal axis is $x_i$, the number of heads obtained, and the vertical axis is $p(x_i)$. Because in this example only discrete integer values (0, 1, 2, etc.) can be obtained in the experiment, it makes sense to represent the data as discrete points, as shown on the bottom panel in Fig. 7.1. We also show the cumulative function $F(x_i)$, superimposed as an xy-plot on a graph of $p(x_i)$. Observe that $F$ starts with the value 0 and climbs up to value 1, since the probabilities of any of the events (0, 1, 2, etc. heads) must add up to 1.
Number of heads   Frequency (number    Cumulative number   Empirical probability   Cumulative function
x_i               of students) n_i     sum of n_j, j<=i    p(x_i) = n_i/N          F(x_i) = sum of p(x_j), j<=i
 0                 0                     0                 0.00                    0.00
 1                 1                     1                 0.0083                  0.0083
 2                 2                     3                 0.0165                  0.0248
 3                10                    13                 0.0826                  0.1074
 4                27                    40                 0.2231                  0.3306
 5                26                    66                 0.2149                  0.5455
 6                34                   100                 0.2810                  0.8264
 7                14                   114                 0.1157                  0.9421
 8                 7                   121                 0.0579                  1.00
 9                 0                   121                 0.00                    1.00
10                 0                   121                 0.00                    1.00

Table 7.1. Results of a real coin-tossing experiment carried out by 121 students in this mathematics course. Each student tossed a coin 10 times. We recorded the “frequency”, i.e. the number of students $n_i$ who each got $x_i = 0, 1, 2, \ldots, 10$ heads. The fraction of the class that got each outcome, $n_i/N$, is identified with the (empirical) probability of that outcome, $p(x_i)$. We also compute the cumulative function $F(x_i)$ in the last column. See Figure 7.1 for the same data presented graphically.

7.4.2 Example 2: grade distributions

Another example of real data is provided in Appendix 11.6. There we discuss distributions of grades on a test. Many of the ideas described here apply in the same way. For space constraints, that example is provided in an Appendix, rather than here.

7.5 Mean and variance of a probability distribution

We next discuss some very important quantities related to the random variable. Such quantities provide numerical descriptions of the average value of the random variable and the fluctuations about that average. We define each of these as follows:

The mean (or average or expected value), $\bar{x}$, of a probability distribution is
$$\bar{x} = \sum_{i=0}^{n} x_i\,p(x_i).$$

The expected value is a kind of “average value of $x$”, where values of $x$ are weighted by their frequency of occurrence. This idea is related to the concept of center of mass defined in Section 5.3.1 ($x$ positions weighted by masses associated with those positions).
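The last three columns of Table 7.1, and the mean just defined, follow mechanically from the frequencies $n_i$. The sketch below recomputes them; the list `counts` is copied from the table, while the other names are our own choices.

```python
# Frequencies n_i from Table 7.1: number of students who got i heads, i = 0..10.
counts = [0, 1, 2, 10, 27, 26, 34, 14, 7, 0, 0]
N = sum(counts)  # number of replicates (students)

# Empirical probability p(x_i) = n_i / N.
p = [n_i / N for n_i in counts]

# Cumulative function F(x_i) = p(x_0) + ... + p(x_i).
F = []
running = 0.0
for p_i in p:
    running += p_i
    F.append(running)

# Mean: each value x_i = i weighted by its empirical probability.
mean = sum(i * p_i for i, p_i in enumerate(p))

print(N, round(mean, 4))
for i, (p_i, F_i) in enumerate(zip(p, F)):
    print(i, round(p_i, 4), round(F_i, 4))
```

The same few lines apply to any table of frequencies, for instance the grade distributions mentioned in Section 7.4.2.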
[Figure 7.1: two panels, “empirical probability of i heads in 10 tosses” and “Cumulative function”, each plotted against the number of heads (i) from 0 to 10.]

Figure 7.1. The data from Table 7.1 is shown plotted on this graph. A total of $N = 121$ people were asked to toss a coin $n = 10$ times. In the bar graph (left), the horizontal axis reflects $i$, the number of heads (H) that came up during those 10 coin tosses. The vertical axis reflects the fraction $p(x_i)$ of the class that achieved that particular number of heads. In the lower graph, the same data is shown by the discrete points. We also show the cumulative function that sums up the values from left to right. Note that the cumulative function is a “step function”.

The mean is a point on the $x$ axis, representing the “average” outcome of an experiment. (Recall that in the distributions we are describing, the possible outcomes of some observation or measurement process are depicted on the $x$ axis of the graph.) The mean is not the same as the average value of a function, discussed in Section 4.6. (In that case, the average is an average $y$ coordinate, or average height of the function.) [26]

We also define quantities that represent the width of the distribution. We define the variance, $V$, and standard deviation, $\sigma$, as follows:

The variance, $V$, of a distribution is
$$V = \sum_{i=0}^{n} (x_i - \bar{x})^2\,p(x_i),$$
where $\bar{x}$ is the mean. The standard deviation, $\sigma$, is
$$\sigma = \sqrt{V}.$$

The variance is related to the square of the quantity represented on the $x$ axis, and since the standard deviation is its square root, $\sigma$ carries the same units as $x$. For this reason, it is

[26] Note to the instructor: students often mix these two distinct meanings of the word average, and they should be helped to overcome this difficulty with terminology.
common to associate the value of $\sigma$ with a typical “width” of the distribution. Having a low value of $\sigma$ means that most of the experimental results are close to the mean, whereas a large $\sigma$ signifies that there is a large scatter of experimental values about the mean.

In the problem sets, we show that the variance can also be expressed in the form
$$V = M_2 - \bar{x}^2,$$
where $M_2$ is the second moment of the distribution. Moments of a distribution are defined as the values obtained by summing up products of the probability weighted by powers of $x$.

The $j$'th moment, $M_j$, of a distribution is
$$M_j = \sum_{i=0}^{n} (x_i)^j\,p(x_i).$$

Example 7.1 (Rolling a die) Suppose you toss a die, and let the random variable $X$ be the number obtained on the die, i.e. (1 to 6). If this die is fair, then it is equally likely to get any of the six possible outcomes, so each has probability 1/6. In this case
$$x_i = i, \quad i = 1, 2, \ldots, 6, \qquad p(x_i) = 1/6.$$
We calculate the various quantities as follows: The mean is
$$\bar{x} = \sum_{i=1}^{6} i \cdot \frac{1}{6} = \frac{1}{6} \cdot \frac{6 \cdot 7}{2} = \frac{7}{2} = 3.5.$$
The second moment, $M_2$, is
$$M_2 = \sum_{i=1}^{6} i^2 \cdot \frac{1}{6} = \frac{1}{6} \cdot \frac{6 \cdot 7 \cdot 13}{6} = \frac{91}{6}.$$
We can now obtain the variance,
$$V = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12},$$
and the standard deviation,
$$\sigma = \sqrt{35/12} \approx 1.7078.$$

Example 7.2 (Expected number of heads (empirical)) For the empirical probability distribution shown in Figure 7.1, the mean (expected value) is calculated from results in Table 7.1 as follows:
$$\bar{x} = \sum_{k=0}^{10} x_k\,p(x_k) = 0(0) + 1(0.0083) + 2(0.0165) + \ldots + 8(0.0579) + 9(0) + 10(0) = 5.2149.$$
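The numbers in Example 7.1 can be reproduced with exact fractions, which also gives a check that the two expressions for the variance (the definition and $V = M_2 - \bar{x}^2$) agree. This is a minimal sketch for the fair die only; the variable names are our own.

```python
from fractions import Fraction
from math import sqrt

values = range(1, 7)   # faces of the die, x_i = i
p = Fraction(1, 6)     # fair die: each face equally likely

mean = sum(x * p for x in values)                   # first moment
M2 = sum(x**2 * p for x in values)                  # second moment
V_moments = M2 - mean**2                            # V = M2 - mean^2
V_definition = sum((x - mean)**2 * p for x in values)
sigma = sqrt(V_moments)                             # standard deviation

print(mean, M2, V_moments, V_definition, round(sigma, 4))
```

Replacing `values` and the constant `p` by a list of values with their own probabilities would give the same quantities for any discrete distribution.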
Thus, the mean number of heads in this set of experiments is about 5.2. This is close to what we would expect intuitively in a fair coin, namely that, on average, 5 out of 10 tosses (i.e. 50%) would result in heads.

To compute the variance we form the sum
$$V = \sum_{k=0}^{10} (x_k - \bar{x})^2\,p(x_k) = \sum_{k=0}^{10} (k - 5.2149)^2\,p(k).$$
Here we have used the mean calculated above and the fact that $x_k = k$. We obtain
$$V = (0 - 5.2149)^2(0) + (1 - 5.2149)^2(0.0083) + \ldots + (7 - 5.2149)^2(0.1157) + (8 - 5.2149)^2(0.0579) + (9 - 5.2149)^2(0) + (10 - 5.2149)^2(0) = 2.053.$$
(Because there was no replicate of the experiment that led to 9 or 10 heads out of 10 tosses, these values do not contribute to the calculation.) The standard deviation is then
$$\sigma = \sqrt{V} = 1.4328.$$

7.6 Bernoulli trials

A Bernoulli trial is an experiment in which there are two possible outcomes. A typical example, motivated previously, is tossing a coin (the outcome being H or T). Traditionally, we refer to one of the outcomes of a Bernoulli trial as “success”, S, and the other as “failure” [27], F.

Let $p$ be the probability of success and $q = 1 - p$ the probability of failure in a Bernoulli trial. We now consider how to calculate the probability of some number of “successes” in a set of repetitions of a Bernoulli trial. In short, we are interested in the probability of tossing some number of Heads in $n$ coin tosses.

7.6.1 The Binomial distribution

Suppose we repeat a Bernoulli trial $n$ times; we will assume that each trial is identical and independent of the others. This implies that the probability $p$ of success and $q$ of failure is the same in each trial. Let $X$ be the number of successes. Then $X$ is said to have a Binomial distribution with parameters $n$ and $p$.

Let us consider how to calculate the probability distribution of $X$, i.e. the probability that $X = k$ where $k$ is some number of successes between none ($k = 0$) and all ($k = n$). Recall that the notation for this probability is $\text{Prob}(X = k)$ for $k = 0, 1, \ldots, n$. Also note that $X = k$ means that in the $n$ trials there are $k$ successes and $n - k$ failures. Consider the following example for the case of $n = 3$, where we list all possible outcomes and their probabilities:

In constructing Table 7.2, we use a multiplication principle applied to computing the probability of a compound experiment. We state this, together with a useful addition principle below.

[27] For example “Heads you win, Tails you lose”.
Result   Probability   Number of heads
SSS      p^3           X = 3
SSF      p^2 q         X = 2
SFS      p^2 q         X = 2
SFF      p q^2         X = 1
FSS      p^2 q         X = 2
FSF      p q^2         X = 1
FFS      p q^2         X = 1
FFF      q^3           X = 0

Table 7.2. A list of all possible results of three repetitions ($n = 3$) of a Bernoulli trial. S = “success” and F = “failure.” (Substituting H for S, and T for F gives the same results for a coin tossing experiment repeated 3 times.)

Multiplication principle: if $e_1, \ldots, e_k$ are independent events, then
$$\text{Prob}(e_1 \text{ and } e_2 \text{ and } \ldots e_k) = \text{Prob}(e_1)\,\text{Prob}(e_2) \ldots \text{Prob}(e_k).$$

Addition principle: if $e_1, \ldots, e_k$ are mutually exclusive events, then
$$\text{Prob}(e_1 \text{ or } e_2 \text{ or } \ldots e_k) = \text{Prob}(e_1) + \text{Prob}(e_2) + \ldots + \text{Prob}(e_k).$$

Based on the results in Table 7.2 and on the two principles outlined above, we can compute the probability of obtaining 0, 1, 2, or 3 successes out of 3 trials. The results are shown in Table 7.3.

Probability of X heads
Prob(X = 0) = q^3
Prob(X = 1) = 3 p q^2
Prob(X = 2) = 3 p^2 q
Prob(X = 3) = p^3

Table 7.3. The probability of obtaining $X$ successes out of 3 Bernoulli trials, based on results in Table 7.2 and the addition principle of probability.

In constructing Table 7.3, we have considered all the ways of obtaining 3 successes (there is only one such way, namely SSS, and its probability is $p^3$), all the ways of obtaining only one success (here we must allow for SFF, FSF, FFS, each having the same probability $pq^2$), etc. Since these results are mutually exclusive (only one such result is possible for any given replicate of the 3-trial experiment), the addition principle is used to compute the probability Prob(SFF or FSF or FFS).
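The construction of Tables 7.2 and 7.3 can be mimicked in a few lines: list all sequences of S and F, use the multiplication principle for the probability of each sequence, and use the addition principle to pool sequences with the same number of successes. The sketch below is illustrative only; the function name `success_distribution` and the value $p = 0.25$ are our own assumptions.

```python
from itertools import product

def success_distribution(p, n=3):
    """Return [Prob(X = 0), ..., Prob(X = n)] by listing all S/F sequences."""
    q = 1 - p
    prob = [0.0] * (n + 1)
    for seq in product("SF", repeat=n):
        k = seq.count("S")
        # Multiplication principle: independent trials multiply.
        p_seq = p**k * q**(n - k)
        # Addition principle: distinct sequences are mutually exclusive.
        prob[k] += p_seq
    return prob

p = 0.25
q = 1 - p
dist = success_distribution(p)
table_7_3 = [q**3, 3 * p * q**2, 3 * p**2 * q, p**3]

print(dist)
print(table_7_3)
```

The two printed lists should agree up to rounding, whatever value of $p$ between 0 and 1 is chosen.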
In general, for each replicate of an experiment consisting of $n$ Bernoulli trials, the probability of an outcome that has $k$ successes and $n - k$ failures (in some specific order) is $p^k q^{(n-k)}$. To get the total probability of $X = k$, we need to count how many possible outcomes consist of $k$ successes and $n - k$ failures. As illustrated by the above example, there are, in general, many such ways, since the order in which S and F appear can differ from one outcome to another. In mathematical terminology, there can be many permutations (i.e. arrangements of the order) of S and F that have the same number of successes in total. (See Section 11.8 for a review.) In fact, the number of ways that $n$ trials can lead to $k$ successes is $C(n, k)$, the binomial coefficient, which is, by definition, the number of ways of choosing $k$ objects out of a collection of $n$ objects. That binomial coefficient is
$$C(n, k) = (n \text{ choose } k) = \frac{n!}{(n-k)!\,k!}.$$
(See Section 11.7 for the definition of factorial notation “!” used here.)

We have arrived at the following result for $n$ Bernoulli trials:

The probability of $k$ successes in $n$ Bernoulli trials is
$$\text{Prob}(X = k) = C(n, k)\,p^k q^{n-k}.$$

In the above example, with $n = 3$, we find that
$$\text{Prob}(X = 2) = C(3, 2)\,p^2 q = 3p^2 q.$$

7.6.2 The Binomial theorem

The name binomial coefficient comes from the binomial theorem, which accounts for the expression obtained by expanding a binomial:
$$(a + b)^n = \sum_{k=0}^{n} C(n, k)\,a^k b^{n-k}.$$
Let us consider a few examples. A familiar example is
$$(a + b)^2 = (a + b) \cdot (a + b) = a^2 + ab + ba + b^2 = a^2 + 2ab + b^2.$$
The coefficients $C(2, 2) = 1$, $C(2, 1) = 2$, and $C(2, 0) = 1$ appear in front of the three terms, representing, respectively, the number of ways of choosing 2 a's, 1 a, and no a's out of the $n$ factors of $(a + b)$. [Respectively, these account for the terms $a^2$, $ab$ and $b^2$ in the resulting expansion.] Similarly, the product of three terms is
$$(a + b)^3 = (a + b) \cdot (a + b) \cdot (a + b) = a^3 + 3a^2 b + 3ab^2 + b^3,$$
whereby coefficients are of the form $C(3, k)$ for $k = 3, 2, 1, 0$. More generally, an expansion of $n$ terms leads to
$$(a + b)^n = a^n + C(n, 1)a^{n-1}b + C(n, 2)a^{n-2}b^2 + \ldots + C(n, k)a^k b^{n-k} + \ldots + C(n, n-2)a^2 b^{n-2} + C(n, n-1)ab^{n-1} + b^n = \sum_{k=0}^{n} C(n, k)\,a^k b^{n-k}.$$
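The formula $\text{Prob}(X = k) = C(n, k)\,p^k q^{n-k}$ translates directly into code. The sketch below uses `math.comb` from the Python standard library (available from Python 3.8) for the binomial coefficient; the function name `binomial_pmf` is our own. As an illustration it evaluates the question raised in Section 7.3.2, the probability of eight heads in ten tosses of a fair coin.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Prob(X = k) = C(n, k) p^k q^(n-k) for n Bernoulli trials."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Probability of 8 heads in 10 tosses of a fair coin: C(10, 8) / 2**10.
print(pmf[8])

# The probabilities of k = 0, 1, ..., n successes account for all outcomes.
print(sum(pmf))
```

For $n = 3$ the same function returns the entries of Table 7.3, for example `binomial_pmf(2, 3, p)` equals $3p^2q$.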
            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1
  1   5  10  10   5   1

Table 7.4. Pascal's triangle contains the binomial coefficients $C(n, k)$. Each term in Pascal's triangle is obtained by adding the two diagonally above it. The top of the triangle represents $C(0, 0)$. The next row represents $C(1, 0)$ and $C(1, 1)$. For row number $n$, terms along the row are the binomial coefficients $C(n, k)$, starting with $k = 0$ at the beginning of the row and going to $k = n$ at the end of the row.

The binomial coefficients are symmetric, so that $C(n, k) = C(n, n - k)$. They are entries that occur in Pascal's triangle, shown in Table 7.4.

[Figure 7.2: two panels titled “The binomial distribution”, one for p = 1/2, q = 1/2 and one for p = 1/4, q = 3/4.]

Figure 7.2. The binomial distribution is shown here for $n = 10$. We have plotted $\text{Prob}(X = k)$ versus $k$ for $k = 0, 1, \ldots, 10$. In the first panel, the probability of success and failure are the same, i.e. $p = q = 0.5$. The distribution is then symmetric. This distribution is the same as the probability of getting $X$ heads out of 10 coin tosses for a fair coin. In the second panel, the probability of success is $p = 1/4$, so $q = 3/4$ and the resulting distribution is skewed.

7.6.3 The binomial distribution

What does the binomial theorem say about the binomial distribution? First, since there are only two possible outcomes in each Bernoulli trial, it follows that
$$p + q = 1, \quad \text{and hence} \quad (p + q)^n = 1.$$
Using the binomial theorem, we can expand the latter to obtain
$$(p + q)^n = \sum_{k=0}^{n} C(n, k)\,p^k q^{n-k} = \sum_{k=0}^{n} \text{Prob}(X = k) = 1.$$
That is, the sum of these terms represents the sum of probabilities of obtaining $k = 0, 1, \ldots, n$ successes. (And since this accounts for all possibilities, it follows that the sum adds up to 1.)

We can compute the mean and variance of the binomial distribution using the following tricks. We will write out an expansion for a product of the form $(px + q)^n$. Here $x$ will be an abstract quantity introduced for convenience (i.e., for making the trick work):
$$(px + q)^n = \sum_{k=0}^{n} C(n, k)(px)^k q^{n-k} = \sum_{k=0}^{n} C(n, k)\,p^k q^{n-k} x^k.$$
Taking the derivative of the above with respect to $x$ leads to:
$$n(px + q)^{n-1} \cdot p = \sum_{k=0}^{n} C(n, k)\,p^k q^{n-k}\,k\,x^{k-1}, \qquad (7.1)$$
which (plugging in $x = 1$) implies that
$$np = \sum_{k=0}^{n} k \cdot C(n, k)\,p^k q^{n-k} = \sum_{k=0}^{n} k \cdot \text{Prob}(X = k) = \bar{X}.$$
Thus, we have found that

The mean of the binomial distribution is $\bar{X} = np$, where $n$ is the number of trials and $p$ is the probability of success in one trial.

We continue to compute other quantities of interest. Multiply both sides of Eqn. 7.1 by $x$ to obtain
$$nx(px + q)^{n-1} p = \sum_{k=0}^{n} C(n, k)\,p^k q^{n-k}\,k\,x^k.$$
Take the derivative again. The result is
$$n(px + q)^{n-1} p + n(n-1)x(px + q)^{n-2} p^2 = \sum_{k=0}^{n} C(n, k)\,p^k q^{n-k}\,k^2 x^{k-1}.$$
Plug in $x = 1$ to get
$$np + n(n-1)p^2 = \sum_{k=0}^{n} k^2\,C(n, k)\,p^k q^{n-k} = M_2.$$
Thereby we have calculated the second moment of the distribution, from which the variance and the standard deviation follow. In summary, we found the following results:
The second moment $M_2$, the variance $V$ and the standard deviation $\sigma$ of a binomial distribution are
$$M_2 = np + n^2 p^2 - np^2,$$
$$V = M_2 - \bar{X}^2 = np - np^2 = np(1 - p) = npq,$$
$$\sigma = \sqrt{npq}.$$

7.6.4 The normalized binomial distribution

We can “normalize” (i.e. rescale) the binomial random variable so that it has a convenient mean and width. To do so, define the new random variable $\tilde{X}$ to be $\tilde{X} = X - \bar{X}$. Then $\tilde{X}$ has mean 0 and standard deviation $\sigma$. Now define
$$Z = \frac{X - \bar{X}}{\sigma}.$$
Then $Z$ has mean 0 and standard deviation 1. In the limit as $n \to \infty$, we can approximate $Z$ with a continuous distribution, called the standard normal distribution.

[Figure 7.3: plot titled “The Normal distribution”, horizontal axis from -4.0 to 4.0.]

Figure 7.3. The Normal (or Gaussian) distribution is given by equation (7.2) and has the distribution shown in this figure.

As the number of Bernoulli trials grows, i.e. as we toss our imaginary coin in longer and longer sets ($n \to \infty$), a remarkable thing happens to the binomial distribution: it becomes smoother and smoother, until it grows to resemble a continuous distribution that looks like a “Bell curve”. That curve is known as the Gaussian or Normal distribution. If we scale this curve vertically and horizontally (stretch vertically and compress horizontally by the factor $\sqrt{N}/2$) and shift its peak to $x = 0$, then we find a distribution that describes the deviation from the expected value of 50% heads. The resulting function is of the form
$$p(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}. \qquad (7.2)$$
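The results $\bar{X} = np$ and $V = npq$, and the approach of the rescaled binomial distribution to the curve (7.2), can be explored numerically. The following is a minimal sketch for a fair coin with $n = 400$ tosses; the value of $n$, the helper names, and the way the comparison is made (multiplying each probability by $\sigma$ because the values of $Z$ are spaced $1/\sigma$ apart) are our own assumptions.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p):
    """Prob(X = k) for a binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 400, 0.5
q = 1 - p
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the distribution.
mean = sum(k * pk for k, pk in enumerate(pmf))
M2 = sum(k**2 * pk for k, pk in enumerate(pmf))
V = M2 - mean**2
sigma = sqrt(V)

def standard_normal(z):
    """Density of equation (7.2)."""
    return exp(-z**2 / 2) / sqrt(2 * pi)

# Z = (X - mean)/sigma takes values spaced 1/sigma apart, so each
# probability is rescaled by sigma before comparing with the density.
max_gap = max(
    abs(sigma * pk - standard_normal((k - mean) / sigma))
    for k, pk in enumerate(pmf)
)

print(mean, n * p)      # both close to 200
print(V, n * p * q)     # both close to 100
print(max_gap)          # small, and it shrinks as n grows
```

Repeating the run with larger $n$ should make `max_gap` smaller, which is the numerical counterpart of the statement that the normalized binomial distribution tends to the standard normal distribution.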
We will study properties of this (and other) such continuous distributions in a later section. We show a typical example of the Normal distribution in Figure 7.3. Its cumulative distribution is then shown (without and with the original distribution superimposed) in Figure 7.4.

Figure 7.4. The Normal probability density with its corresponding cumulative function. (Panels: "The normal distribution" and "The cumulative distribution".)

7.7 Hardy-Weinberg genetics

In this section, we investigate how the ideas developed in this chapter apply to genetics. We find that many of the simple concepts presented here will be useful in calculating the probability of inheriting genes from one generation to the next.

Each of us has two entire sets of chromosomes: one set is inherited from our mother, and one set comes from our father. These chromosomes carry genes, the unit of genetic material that "codes" for proteins and ultimately, through complicated biochemistry and molecular biology, determines all of our physical traits.

We will investigate how a single gene (with two "flavors", called alleles) is passed from one generation to the next. We will consider a particularly simple situation, when the single gene determines some physical trait (such as eye color). The trait (say blue or green eyes) will be denoted the phenotype and the actual pair of genes (one on each parentally derived chromosome) will be called the genotype.

Suppose that the gene for eye color comes in two forms that will be referred to as A and a. For example, A might be an allele for blue eyes, whereas a could be an allele for brown eyes. Consider the following "experiment": select a random individual from the population of interest, and examine the region in one of their chromosomes determining eye color. Then there are two possible mutually exclusive outcomes, A or a; according to our previous definition, the experiment just described is a Bernoulli trial.

The actual eye color phenotype will depend on both inherited alleles, and hence, we are interested in a "repeated Bernoulli trial" with $n = 2$. In principle, each chromosome will come with one or the other allele, so each individual would have one of the following pairs of combinations AA, Aa, aA, or aa. The order Aa or aA is synonymous, so only the
number of alleles of type A (or equivalently of type a) is important.

Suppose we know that the fraction of all genes for eye color of type A in the population is $p$, and the fraction of all genes for eye color of type a is $q$, where $p + q = 1$. (We have used the fact that there are only two possibilities for the gene type, of course.) Then we can interpret $p$ and $q$ as probabilities that a gene selected at random from the population will turn out to be type A (respectively a), i.e., Prob(A) $= p$, Prob(a) $= q$.

Now suppose we draw at random two alleles out of the (large) population. If the population size is $N$, then, on average we would expect $Np^2$ individuals of type AA, $Nq^2$ of type aa and $2Npq$ individuals of the mixed type. Note that the sum of the probabilities of all the genotypes is
\[ p^2 + 2pq + q^2 = (p+q)^2 = 1. \]
(We have seen this before in the discussion of Bernoulli trials, and in the definition of properties of probability.)

Genotype:      aA      Aa      AA      aa
Probability:   $pq$    $pq$    $p^2$   $q^2$

Genotype:      AA      aA or Aa    aa
Probability:   $p^2$   $2pq$       $q^2$

Table 7.5. If the probability of finding allele A is $p$ and the probability of finding allele a is $q$, then the eye color gene probabilities are as shown in the top table. However, because genotype Aa is equivalent to genotype aA, we have combined these outcomes in the revised second table.

7.7.1 Random non-assortative mating

We now examine what happens if mates are chosen randomly and offspring arise from such parents. The father and mother each pass down one or another copy of their alleles to the progeny. We investigate how the proportion of genes of various types is arranged, and whether it changes in the next generation. In Table 7.6, we show the possible genotypes of the mother and father, and calculate the probability that mating of such individuals would occur under the assumption that choice of mate is random, i.e., does not depend at all on "eye color". We assume that the allele donated by the father (carried in his sperm) is independent of the allele found in the mother's egg cell [28]. This means that we can use the multiplicative property of probability to determine the probability of a given combination of parental alleles (i.e. Prob(x and y) = Prob(x) · Prob(y)).

[28] Recall that the sperm and the egg each have one single set of chromosomes, and their union produces the zygote that carries the doubled set of chromosomes.

For example, the probability that a couple chosen at random will consist of a woman of genotype aA and a man of genotype aa is a product of the fraction of females that are of type aA and the fraction of males that are of type aa. But that is just $(2pq)(q^2)$, or simply $2pq^3$. Now let us examine the distribution of possible offspring of various parents.
In Table 7.6, we note, for example, that if the couple are both of type aA, each parent can "donate" either a or A to the progeny, so we expect to see children of types aa, aA, AA in the ratio 1:2:1 (regardless of the values of $p$ and $q$). We can now group together and summarize all the progeny of a given genotype, with the probabilities that they are produced by one or another such random mating. Using this table, we can then determine the probability of each of the three genotypes in the next generation.

Father \ Mother  |  AA ($p^2$)                  |  aA ($2pq$)                           |  aa ($q^2$)
AA ($p^2$)       |  AA; $p^4$                   |  1/2 aA, 1/2 AA; $2pq\,p^2$           |  Aa; $p^2q^2$
aA ($2pq$)       |  1/2 aA, 1/2 AA; $2pq\,p^2$  |  1/4 aa, 1/2 aA, 1/4 AA; $4p^2q^2$    |  1/2 aA, 1/2 aa; $2pq\,q^2$
aa ($q^2$)       |  Aa; $p^2q^2$                |  1/2 aA, 1/2 aa; $2pq\,q^2$           |  aa; $q^4$

Table 7.6. The frequency of progeny of various types in Hardy-Weinberg genetics can be calculated as shown in this "mating table". The genotype of the mother is shown across the top and the father's genotype is shown on the left column. Each cell lists first the progeny resulting from that mating (with their relative proportions) and then the probability of the mating itself. (We did not simplify the expressions; this is to emphasize that they are products of the original parental probabilities.)

Example 7.3 (Probability of AA progeny) Find the probability that a random (Hardy-Weinberg) mating will give rise to a progeny of type AA.

Solution 1
Using Table 7.6, we see that there are only four ways that a child of type AA can result from a mating: either both parents are AA, or one or the other parent is Aa (with the remaining parent AA), or both parents are Aa. Thus, for children of type AA the probability is
\[ \mathrm{Prob}(\text{child of type AA}) = p^4 + \frac{1}{2}(2pq\,p^2) + \frac{1}{2}(2pq\,p^2) + \frac{1}{4}(4p^2q^2). \]
Simplifying leads to
\[ \mathrm{Prob}(\text{child of type AA}) = p^2(p^2 + 2qp + q^2) = p^2(p+q)^2 = p^2. \]
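The bookkeeping in the mating table can also be carried out mechanically. The following sketch is an illustration only, assuming Python's standard `fractions` module for exact arithmetic; the function name `next_generation`, the genotype labels and the value $p = 3/10$ are choices made for this example. It loops over all nine father/mother pairings, weights each by the probability of that mating, and accumulates the genotype probabilities of the children.

```python
from fractions import Fraction

def next_generation(p):
    """Genotype frequencies of parents and of their children under random mating.

    p is Prob(A); q = 1 - p is Prob(a). "Aa" stands for the mixed type (aA or Aa).
    """
    q = 1 - p
    parents = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    # Probability that a parent of the given genotype passes allele A to the child.
    pass_A = {"AA": Fraction(1), "Aa": Fraction(1, 2), "aa": Fraction(0)}

    children = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for father, prob_father in parents.items():
        for mother, prob_mother in parents.items():
            mating = prob_father * prob_mother       # independent choice of mates
            fa, ma = pass_A[father], pass_A[mother]  # independent allele donation
            children["AA"] += mating * fa * ma
            children["aa"] += mating * (1 - fa) * (1 - ma)
            children["Aa"] += mating * (fa * (1 - ma) + (1 - fa) * ma)
    return parents, children

# Illustrative allele frequency (an assumption for this example).
parents, children = next_generation(Fraction(3, 10))
print(parents)
print(children)
```

Because the arithmetic is exact, the two dictionaries can be compared directly: the children's genotype frequencies come out as $p^2$, $2pq$ and $q^2$, the same as the parents'.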
Alternate solution
In Figure 7.5, we show an alternate solution to the same problem using a tree diagram. Reading from the top down, we examine all the possibilities at each branch point. A child AA cannot have any parent of genotype aa, so both father and mother's genotype could only have been one of AA or Aa. Each arrow indicating the given case is accompanied by the probability of that event. (For example, a random individual has probability $2pq$ of having genotype Aa, as shown on the arrows from the father and mother to these genotypes.) Continuing down the branches, we ask with what probability the given parent would have contributed an allele of type A to the child. For a parent of type AA, this is certainly true, so the given branch carries probability 1. For a parent of type Aa, the probability that A is passed down to the child is only 1/2.

The combined probability is computed as follows: we determine the probability of getting an A from father (of type AA OR Aa). This is
\[ \mathrm{Prob}(\text{A from father}) = \tfrac{1}{2}\,(2pq) + 1 \cdot p^2 = pq + p^2, \]
and we multiply it by a similar probability of getting A from the mother (of type AA OR Aa). (We must multiply, since we need A from the father AND A from the mother for the genotype AA.) Thus,
\[ \mathrm{Prob}(\text{child of type AA}) = (pq + p^2)(pq + p^2) = p^2(q+p)^2 = p^2 \cdot 1 = p^2. \]

Figure 7.5. A tree diagram to aid the calculation of the probability that a child with genotype AA results from random non-assortative (Hardy-Weinberg) mating. (Branches: father and mother are each Aa with probability $2pq$, passing on A with probability 1/2, or AA with probability $p^2$, passing on A with probability 1; each parent contributes A with total probability $pq + p^2$.)

In the problem set, we also find that the probability of a child of type aA is $2qp$, and the probability of the child being type aa is $q^2$. We thus observe that the frequency of genotypes of the progeny is exactly the same as that of the parents. This type of genetic makeup is termed Hardy-Weinberg genetics.

It is of interest to investigate what happens when one of the assumptions we made is
relaxed, for example, when the genotype of the individual has an impact on survival or on the ability to reproduce. While this is beyond our scope here, it forms an important theme in the area of genetics.

7.8 Random walker

In this section we discuss an application of the binomial distribution to the process of a random walk. As shown in Figure 7.6(a), we consider a straight (1 dimensional) path and an erratic walker who takes steps randomly to the left or right. We will assume that the walker never stops. With probability $p$, she takes a step towards the right, and with probability $q$ she takes a step towards the left. (Since these are the only two choices, it must be true that $p + q = 1$.) In Figure 7.6(b) we show the walker's position, $x$, plotted versus the number of steps ($n$) she has taken. (We may as well assume that the steps occur at regular intervals of time, so that the horizontal axis of this plot can be thought of as a time axis.)

Figure 7.6. A random walker in 1 dimension takes a step to the right with probability $p$ and a step to the left with probability $q$. (Panel (a): the path along the $x$ axis; panel (b): position $x$ versus number of steps $n$.)

The process described here is classic, and often attributed to a drunken wanderer. In our case, we could consider this motion as a 1D simplification of the random tumbles and swims of a bacterium in its turbulent environment. It is usually the case that a goal of this swim is a search for some nutrient source, or possibly avoidance of poor environmental conditions. We shall see that if the probabilities of left and right motion are unequal (i.e. the motion is biased in one direction or another) this swimmer tends to drift along towards a preferred direction.

In this problem, each step has only two outcomes (analogous to a trial in a Bernoulli experiment). We could imagine the walker tossing a coin to determine whether to move
right or left. We wish to characterize the probability of the walker being at a certain position at a given time, and to find her expected position after $n$ steps. Our familiarity with Bernoulli trials and the binomial distribution will prove useful in this context.

Example
(a) What is the probability of a run of steps as follows: RLRRRLRLLL?
(b) Find the probability that the walker moves $k$ steps to the right out of a total run of $n$ consecutive steps.
(c) Suppose that $p = q = 1/2$. What is the probability that a walker starting at the origin is back at the origin on her 10th step?

Solution
(a) The probability of the run RLRRRLRLLL is the product $pqpppqpqqq = p^5q^5$. Note the similarity to the question "What is the probability of tossing HTHHHTHTTT?"
(b) This problem is identical to the problem of $k$ heads in $n$ tosses of a coin. The probability of such an event is given by a term in the binomial distribution:
\[ P(k \text{ out of } n \text{ moves to right}) = C(n,k)\, p^k q^{n-k}. \]
(c) The walker is at the origin after 10 steps only if she has taken 5 steps to the left (total) and 5 steps to the right (total). The order of the steps does not matter. Thus this problem reduces to problem (b) with 5 steps out of 10 taken to the right. The probability is thus
\[ P(\text{back at 0 after 10 steps}) = P(5 \text{ out of } 10 \text{ steps to right}) = C(10,5)\, p^5 q^5 = C(10,5) \left(\frac{1}{2}\right)^{10} = \frac{10!}{5!\,5!} \cdot \frac{1}{1024} = 0.24609. \]

Mean position
We now ask how to determine the expected position of the walker after $n$ steps, i.e. how the mean value of $x$ depends on the number of steps and the probabilities associated with each step. After 1 step, with probability $p$ the position is $x = +1$ and with probability $q$, the position is $x = -1$. The expected (mean) position after 1 move is thus
\[ x_1 = p(+1) + q(-1) = p - q. \]
But the process follows a binomial distribution, and thus the mean after $n$ steps is
\[ x_n = n(p - q). \]
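The random walk results above can be reproduced with a few lines of code. This is a sketch for illustration, assuming Python's standard `math.comb`; the function names and the bias $p = 0.7$ used for the mean are assumptions of the example. The position after $k$ steps to the right out of $n$ is $k - (n - k) = 2k - n$, which is what the mean is summed over.

```python
from math import comb

def prob_k_right(n, k, p):
    """Probability of exactly k steps to the right out of n (binomial term)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_origin(n, p):
    """Probability that the walker is at x = 0 after n steps."""
    if n % 2 == 1:
        return 0.0          # an odd number of steps can never end at the origin
    return prob_k_right(n, n // 2, p)

def mean_position(n, p):
    """Expected position after n steps, summing (2k - n) over the binomial distribution."""
    return sum((2 * k - n) * prob_k_right(n, k, p) for k in range(n + 1))

print(prob_at_origin(10, 0.5))   # part (c): C(10,5) / 1024
print(mean_position(10, 0.7))    # compare with n(p - q) = 10 * (0.7 - 0.3)
```

With $p = q = 1/2$ and $n = 10$ the first value is $252/1024 \approx 0.24609$, in agreement with part (c), and the summed mean agrees with $n(p - q)$.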
7.9 Summary

In this chapter, we introduced the notion of discrete probability of elementary events. We learned that a probability is always a number between 0 and 1, and that the sum of (discrete) probabilities of all possible (discrete) outcomes is 1. We then described how to combine probabilities of elementary events to calculate probabilities of compound independent events in a variety of simple experiments. We defined the notion of a Bernoulli trial, such as tossing of a coin, and studied this in detail.

We investigated a number of ways of describing results of experiments, whether in tabular or graphical form, and we used the distribution of results to define simple numerical descriptors. The mean is a number that, more or less, describes the location of the "center" of the distribution (analogous to center of mass), defined as follows:

The mean (expected value) $\bar{x}$ of a probability distribution is
\[ \bar{x} = \sum_{i=0}^{n} x_i\, p(x_i). \]

The standard deviation is, roughly speaking, the "width" of the distribution.

The standard deviation, $\sigma$, is
\[ \sigma = \sqrt{V} \]
where $V$ is the variance,
\[ V = \sum_{i=0}^{n} (x_i - \bar{x})^2\, p(x_i). \]

While the chapter was motivated by results of a real experiment, we then investigated theoretical distributions, including the binomial. We found that the distribution of events in a repetition of a Bernoulli trial (e.g. coin tossed $n$ times) was a binomial distribution, and we computed the mean of that distribution.

Suppose that the probability of one of the events, say event $e_1$ in a Bernoulli trial, is $p$ (and hence the probability of the other event $e_2$ is $q = 1 - p$). Then
\[ P(k \text{ occurrences of given event out of } n \text{ trials}) = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}. \]
This is called the binomial distribution. The mean of the binomial distribution, i.e. the mean number of events $e_1$ in $n$ repeated Bernoulli trials, is
\[ \bar{x} = np. \]
Chapter 8

Continuous probability distributions

8.1 Introduction

In Chapter 7, we explored the concepts of probability in a discrete setting, where outcomes of an experiment can take on only one of a finite set of values. Here we extend these ideas to continuous probability. In doing so, we will see that quantities such as mean and variance that were previously defined by sums will now become definite integrals. Here again, we will see the concepts of integral calculus in the context of practical examples and applications.

We begin by extending the idea of a discrete random variable to the continuous case. We call $x$ a continuous random variable in $a \le x \le b$ if $x$ can take on any value in this interval. An example of a random variable is the height of a person, say an adult male, selected randomly from a population. (This height typically takes on values in the range $0.5 \le x \le 3$ meters, say, so $a = 0.5$ and $b = 3$.)

If we select a male subject at random from a large population, and measure his height, we might expect to get a result in the proximity of 1.7 to 1.8 meters most often; thus, such heights will be associated with a larger value of probability than heights in some other interval of equal length, e.g. heights in the range $2.7 < x < 2.8$ meters, say. Unlike the case of discrete probability, however, the measured height can take on any real number within the interval of interest. This leads us to redefine our idea of a continuous probability, using a continuous function in place of the discrete bar-graph seen in Chapter 7.

8.2 Basic definitions and properties

Here we extend previous definitions from Chapter 7 to the case of continuous probability. One of the most important differences is that we now consider a probability density, rather than a value of the probability per se [29]. First and foremost, we observe that now $p(x)$ will no longer be a probability, but rather "a probability per unit $x$". This idea is analogous to the connection between the mass of discrete beads and a continuous mass density, encountered previously in Chapter 5.

[29] This leap from discrete values that are the probability of an outcome (as seen in Chapter 7) to a probability density is challenging for many students. Reinforcing the analogy with discrete masses versus distributed mass density (discussed in Chapter 5) may be helpful.

Definition
A function $p(x)$ is a probability density provided it satisfies the following properties:
1. $p(x) \ge 0$ for all $x$.
2. $\int_a^b p(x)\, dx = 1$ where the possible range of values of $x$ is $a \le x \le b$.

The probability that a random variable $x$ takes on values in the interval $a_1 \le x \le a_2$ is defined as
\[ \int_{a_1}^{a_2} p(x)\, dx. \]

The transition to probability density means that the quantity $p(x)$ does not carry the same meaning as our previous notation for probability of an outcome $x_i$, namely $p(x_i)$ in the discrete case. In fact, $p(x)\,dx$, or its approximation $p(x)\Delta x$, is now associated with the probability of an outcome whose value is "close to $x$".

Unlike our previous discrete probability, we will not ask "what is the probability that $x$ takes on some exact value?" Rather, we ask for the probability that $x$ is within some range of values, and this is computed by performing an integral [30].

[30] Remark: the probability that $x$ is exactly equal to $b$ is the integral $\int_b^b p(x)\, dx$. But this integral has a value zero, by properties of the definite integral.

Having generalized the idea of probability, we will now find that many of the associated concepts have a natural and straightforward generalization as well. We first define the cumulative function, and then show how the mean, median, and variance of a continuous probability density can be computed. Here we will have the opportunity to practice integration skills, as integrals replace the sums in such calculations.

Definition
For experiments whose outcome takes on values on some interval $a \le x \le b$, we define a cumulative function, $F(x)$, as follows:
\[ F(x) = \int_a^x p(s)\, ds. \]
Then $F(x)$ represents the probability that the random variable takes on a value in the range $(a, x)$ [31]. The cumulative function is simply the area under the probability density (between the left endpoint of the interval, $a$, and the point $x$).

[31] By now, the reader should be comfortable with the use of "$s$" as the "dummy variable" in this formula, where $x$ plays the role of right endpoint of the interval of integration.

The above definition has several implications:
Properties of continuous probability

1. Since $p(x) \ge 0$, the cumulative function is an increasing (more precisely, non-decreasing) function.
2. The connection between the probability density and its cumulative function can be written (using the Fundamental Theorem of Calculus) as $p(x) = F'(x)$.
3. $F(a) = 0$. This follows from the fact that $F(a) = \int_a^a p(s)\, ds$. By a property of the definite integral, this is zero.
4. $F(b) = 1$. This follows from the fact that $F(b) = \int_a^b p(s)\, ds = 1$ by Property 2 of the definition of the probability density.
5. The probability that $x$ takes on a value in the interval $a_1 \le x \le a_2$ is the same as $F(a_2) - F(a_1)$. This follows from the additive property of integrals and the Fundamental Theorem of Calculus:
\[ \int_a^{a_2} p(s)\, ds - \int_a^{a_1} p(s)\, ds = \int_{a_1}^{a_2} p(s)\, ds = \int_{a_1}^{a_2} F'(s)\, ds = F(a_2) - F(a_1). \]

Finding the normalization constant

Not every real-valued function can represent a probability density. For one thing, the function must be non-negative everywhere. Further, the total area under its graph should be 1, by Property 2 of a probability density. Given an arbitrary non-negative function, $f(x) \ge 0$, on some interval $a \le x \le b$ such that
\[ \int_a^b f(x)\, dx = A > 0, \]
we can always define a corresponding probability density, $p(x)$, as
\[ p(x) = \frac{1}{A} f(x), \qquad a \le x \le b. \]
It is easy to check that $p(x) \ge 0$ and that $\int_a^b p(x)\, dx = 1$. Thus we have converted the original function to a probability density. This process is called normalization, and the constant $C = 1/A$ is called the normalization constant [32].

[32] The reader should recognize that we have essentially rescaled the original function by dividing it by the "area" $A$. This is really what normalization is all about.
8.2.1 Example: probability density and the cumulative function

Consider the function $f(x) = \sin(\pi x/6)$ for $0 \le x \le 6$.
(a) Normalize the function so that it describes a probability density.
(b) Find the cumulative distribution function, $F(x)$.

Solution
The function is non-negative in the interval $0 \le x \le 6$, so we can define the desired probability density. Let
\[ p(x) = C \sin\left(\frac{\pi}{6} x\right). \]
(a) We must find the normalization constant, $C$, such that Property 2 of a probability density is satisfied, i.e. such that
\[ 1 = \int_0^6 p(x)\, dx. \]
Carrying out this computation leads to
\[ \int_0^6 C \sin\left(\frac{\pi}{6} x\right) dx = C\, \frac{6}{\pi} \left( -\cos\left(\frac{\pi}{6} x\right) \right) \Big|_0^6 = C\, \frac{6}{\pi} (1 - \cos(\pi)) = C\, \frac{12}{\pi}. \]
(We have used the fact that $\cos(0) = 1$ in a step here.) But by Property 2, for $p(x)$ to be a probability density, it must be true that $C(12/\pi) = 1$. Solving for $C$ leads to the desired normalization constant,
\[ C = \frac{\pi}{12}. \]
Note that this calculation is identical to finding the area
\[ A = \int_0^6 \sin\left(\frac{\pi}{6} x\right) dx, \]
and setting the normalization constant to $C = 1/A$. Once we rescale our function by this constant, we get the probability density,
\[ p(x) = \frac{\pi}{12} \sin\left(\frac{\pi}{6} x\right). \]
This density has the property that the total area under its graph over the interval $0 \le x \le 6$ is 1. A graph of this probability density function is shown as the black curve in Figure 8.1.
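The normalization step in part (a) can be mimicked numerically, in the spirit of adding up the quantities $f(x)\Delta x$. The sketch below is an illustration only; it assumes a simple midpoint rule, and the helper names `midpoint_integral` and `normalize` are hypothetical, introduced just for this example.

```python
from math import sin, pi

def midpoint_integral(f, a, b, steps=10000):
    """Approximate the integral of f over [a, b] by summing f(x) * dx at midpoints."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

def normalize(f, a, b):
    """Return (C, p) where C = 1/A and p(x) = C * f(x) has total area 1 on [a, b]."""
    area = midpoint_integral(f, a, b)
    c = 1.0 / area
    return c, (lambda x: c * f(x))

def f(x):
    return sin(pi * x / 6)

C, p = normalize(f, 0.0, 6.0)
print(C, pi / 12)                       # numerical and exact normalization constants
print(midpoint_integral(p, 0.0, 6.0))   # total area under p(x)
```

The numerical constant agrees with $C = \pi/12$ to several decimal places, and the area under the rescaled function is 1 up to rounding.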
(b) We now compute the cumulative function,
\[ F(x) = \int_0^x p(s)\, ds = \frac{\pi}{12} \int_0^x \sin\left(\frac{\pi}{6} s\right) ds. \]
Carrying out the calculation [33] leads to
\[ F(x) = \frac{\pi}{12} \cdot \frac{6}{\pi} \left( -\cos\left(\frac{\pi}{6} s\right) \right) \Big|_0^x = \frac{1}{2} \left( 1 - \cos\left(\frac{\pi}{6} x\right) \right). \]
This cumulative function is shown as a red curve in Figure 8.1.

[33] Notice that the integration involved in finding $F(x)$ is the same as the one done to find the normalization constant. The only difference is the ultimate step of evaluating the integral at the variable endpoint $x$ rather than the fixed endpoint $b = 6$.

Figure 8.1. The probability density $p(x)$ (black), and the cumulative function $F(x)$ (red) for Example 8.2.1. Note that the area under the black curve is 1 (by normalization), and thus the value of $F(x)$, which is the cumulative area function, is 1 at the right endpoint of the interval.

8.3 Mean and median

When we are given a distribution, we often want to describe it with simpler numerical values that characterize its "center": the mean and the median both give this type of information. We also want to describe whether the distribution is narrow or fat, i.e. how clustered it is about its "center". The variance and higher moments will provide that type of information.

Recall that in Chapter 5 for mass density $\rho(x)$, we defined a center of mass,
\[ \bar{x} = \frac{\int_a^b x \rho(x)\, dx}{\int_a^b \rho(x)\, dx}. \qquad (8.1) \]
The mean of a probability density is defined similarly, but the definition simplifies by virtue of the fact that $\int_a^b p(x)\, dx = 1$. Since probability distributions are normalized, the denominator in Eqn. (8.1) is simply 1. Consequently, the mean of a probability density is given as follows:

Definition
For a random variable in $a \le x \le b$ and a probability density $p(x)$ defined on this interval, the mean or average value of $x$ (also called the expected value), denoted $\bar{x}$, is given by
\[ \bar{x} = \int_a^b x\, p(x)\, dx. \]

To avoid confusion note the distinction between the mean as an average value of $x$ versus the average value of the function $p$ over the given interval. Reviewing Example 5.3.3 may help to dispel such confusion.

The idea of median encountered previously in grade distributions also has a parallel here. Simply put, the median is the value of $x$ that splits the probability distribution into two portions whose areas are identical.

Definition
The median $x_{med}$ of a probability distribution is a value of $x$ in the interval $a \le x_{med} \le b$ such that
\[ \int_a^{x_{med}} p(x)\, dx = \int_{x_{med}}^b p(x)\, dx = \frac{1}{2}. \]
It follows from this definition that the median is the value of $x$ for which the cumulative function satisfies
\[ F(x_{med}) = \frac{1}{2}. \]

8.3.1 Example: Mean and median

Find the mean and the median of the probability density found in Example 8.2.1.

Solution
To find the mean we compute
\[ \bar{x} = \frac{\pi}{12} \int_0^6 x \sin\left(\frac{\pi}{6} x\right) dx. \]
0 Figure 8. since the integrand consists of an expression xp(x)dx.2.2.0 x med 6.) To ﬁnd the median.1 in relation to the median. The idea of IBP is to reduce the integration to something involving only p(x)dx. as shown in green.1. xmed .3.0 F(x) 0. The median is the value of x at which F (x) = 0.2. 2 34 Recall from Chapter 6 that udv = vu − vdu. as we show here. Mean and median Integration by parts is required here34 . v = − π cos π x . we ﬁnd that 1.0 0.2) (We have used cos(π) = −1. xmed sin 0 π s 6 ds = 1 2 ⇒ 1 π 1 − cos xmed 2 6 = 1 .3. we look for the value of x for which F (xmed ) = 1 . 2 Using the form of the cumulative function from Example 8. dv = sin π x dx. 2 (8. as computed in Example 8.5 0. which is done essentially by “differentiating” the term u = x.8. Calculations of the mean in continuous probability often involve Integration by Parts (IBP). . 6 6 Then du = dx. sin(0) = sin(π) = 0 in the above.1. The cumulative function F (x) (red) for Example 8. Let u = x.5. The calculation is then as follows: 6 x= ¯ = π 12 1 2 −x 6 π cos x π 6 π x 6 6 6 159 + 0 6 π 6 cos 0 6 0 π x dx 6 −x cos + 0 6 π sin x π 6 1 = 2 6 6 −6 cos(π) + sin(π) − sin(0) π π = 6 = 3.
Here we must solve for the unknown value of $x_{med}$:
\[ 1 - \cos\left(\frac{\pi}{6} x_{med}\right) = 1 \quad \Rightarrow \quad \cos\left(\frac{\pi}{6} x_{med}\right) = 0. \]
The angles whose cosine is zero are $\pm\pi/2$, $\pm 3\pi/2$, etc. We select the angle so that the resulting value of $x_{med}$ will be inside the relevant interval ($0 \le x \le 6$ for this example), i.e. $\pi/2$. This leads to
\[ \frac{\pi}{6} x_{med} = \frac{\pi}{2}, \]
so the median is $x_{med} = 3$. In other words, we have found that the point $x_{med}$ subdivides the interval $0 \le x \le 6$ into two subintervals whose probability is the same. The relationship of the median and the cumulative function $F(x)$ is illustrated in Fig 8.2.

Remark
A glance at the original probability distribution should convince us that it is symmetric about the value $x = 3$. Thus we should have anticipated that the mean and median of this distribution would both occur at the same place, i.e. at the midpoint of the interval. This will be true in general for symmetric probability distributions, just as it was for symmetric mass or grade distributions.

8.3.2 How is the mean different from the median?

Figure 8.3. In a symmetric probability distribution (left) the mean and median are the same. If the distribution is changed slightly so that it is no longer symmetric (as shown on the right) then the median may still be the same, while the mean will have shifted to the new "center of mass" of the probability density.

We have seen in Example 8.3.1 that for symmetric distributions, the mean and the median are the same. Is this always the case? When are the two different, and how can we understand the distinction?

Recall that the mean is closely associated with the idea of a center of mass, a concept from physics that describes the location of a pivot point at which the entire "mass" would
exactly balance. It is worth remembering that

mean of $p(x)$ = expected value of $x$ = average value of $x$.

This concept is not to be confused with the average value of a function, which is an average value of the $y$ coordinate, i.e., the average height of the function on the given interval.

The median simply indicates a place at which the "total mass" is subdivided into two equal portions. (In the case of probability density, each of those portions represents an equal area, $A_1 = A_2 = 1/2$, since the total area under the graph is 1 by definition.)

Figure 8.3 shows how the two concepts of median (indicated by vertical line) and mean (indicated by triangular "pivot point") differ. At the left, for a symmetric probability density, the mean and the median coincide, just as they did in Example 8.3.1. To the right, a small portion of the distribution was moved off to the far right. This change did not affect the location of the median, since the total areas to the right and to the left of the vertical line are still equal. However, the fact that part of the mass is farther away to the right leads to a shift in the mean of the distribution, to compensate for the change.

Simply put, the mean contains more information about the way that the distribution is arranged spatially. This stems from the fact that the mean of the distribution is a "sum" (i.e. integral) of terms of the form $x\,p(x)\Delta x$. Thus the location along the $x$ axis, $x$, not just the "mass", $p(x)\Delta x$, affects the contribution of parts of the distribution to the value of the mean.

8.3.3 Example: a nonsymmetric distribution

We slightly modify the function used in Example 8.2.1 to the new expression
\[ f(x) = x \sin(\pi x/6) \quad \text{for } 0 \le x \le 6. \]
This results in a nonsymmetric probability density, shown in black in Figure 8.4. Steps in obtaining $p(x)$ would be similar [35], but we have to carry out an integration by parts to find the normalization constant and/or to calculate the cumulative function, $F(x)$. Further, to compute the mean of the distribution we have to integrate by parts twice.

[35] This is good practice, and the reader is encouraged to do this calculation.

Alternatively, we can carry out all such computations (approximately) using the spreadsheet, as shown in Figure 8.4. We can plot $f(x)$ using sufficiently fine increments $\Delta x$ along the $x$ axis and compute the approximation for its integral by adding up the quantities $f(x)\Delta x$. The area under the curve, $A$, and hence the normalization constant ($C = 1/A$), will be thereby determined (at the point corresponding to the end of the interval, $x = 6$). It is then an easy matter to replot the revised function $f(x)/A$, which corresponds to the normalized probability density. This is the curve shown in black in Figure 8.4. In the problem sets, we leave as an exercise for the reader how to determine the median and the mean using the same spreadsheet tool for a related (simpler) example.

8.4 Applications of continuous probability

In the next few sections, we explore applications of the ideas developed in this chapter to a variety of problems. We treat the decay of radioactive atoms, consider distribution of
heights in a population, and explore how the distribution of radii is related to the distribution of volumes in raindrop drop sizes. The interpretation of the probability density and the cumulative function, as well as the means and medians in these cases, will form the main focus of our discussion.

Figure 8.4. As in Figures 8.1 and 8.2, but for the probability density $p(x) = (\pi/36)\, x \sin(\pi x/6)$. This function is not symmetric, so the mean and median are not the same. From this figure, we see that the median is approximately $x_{med} = 3.6$. We do not show the mean (which is close but not identical). We can compute both the mean and the median for this distribution using numerical integration with the spreadsheet. We find that the mean is $\bar{x} = 3.5679$. Note that the "most probable value", i.e. the point at which $p(x)$ is maximal, is at $x = 3.9$, which is again different from both the mean and the median.

8.4.1 Radioactive decay

Radioactive decay is a probabilistic phenomenon: an atom spontaneously emits a particle and changes into a new form. We cannot predict exactly when a given atom will undergo this event, but we can study a large collection of atoms and draw some interesting conclusions.

We can define a probability density function that represents the probability per unit time that an atom would decay at time $t$. It turns out that a good candidate for such a function is
\[ p(t) = C e^{-kt}, \]
where $k$ is a constant that represents the rate of decay (in units of 1/time) of the specific radioactive material. In principle, this function is defined over the interval $0 \le t < \infty$; that is, it is possible that we would have to wait a "very long time" to have all of the atoms decay. This means that these integrals have to be evaluated "at infinity", leading to an improper integral. Using this probability density for atom decay, we can characterize the mean and median decay time for the material.
Normalization

We first find the constant of normalization, i.e. find the constant C such that

\[ \int_0^\infty p(t)\,dt = \int_0^\infty C e^{-kt}\,dt = 1. \]

Recall that an integral of this sort, in which one of the endpoints is at infinity, is called an improper integral^36. Some care is needed in understanding how to handle such integrals, and in particular when they "exist" (in the sense of producing a finite value, despite the infinitely long domain of integration). We will delay full discussion to Chapter 10, and state here the definition:

\[ I = \int_0^\infty C e^{-kt}\,dt \equiv \lim_{T\to\infty} I_T \quad\text{where}\quad I_T = \int_0^T C e^{-kt}\,dt. \]

The idea is to compute an integral over a finite interval 0 ≤ t ≤ T and then take a limit as the upper endpoint, T, goes to infinity (T → ∞). We compute:

\[ I_T = C\int_0^T e^{-kt}\,dt = C\left[\frac{e^{-kt}}{-k}\right]_0^T = \frac{1}{k}\,C\,(1 - e^{-kT}). \]

Now we take the limit:

\[ I = \lim_{T\to\infty} I_T = \lim_{T\to\infty} \frac{1}{k}\,C\,(1 - e^{-kT}) = \frac{1}{k}\,C\,\Bigl(1 - \lim_{T\to\infty} e^{-kT}\Bigr). \qquad (8.3) \]

To compute this limit, recall that for k > 0, T > 0, the exponential term in Eqn. 8.3 decays to zero as T increases, so that

\[ \lim_{T\to\infty} e^{-kT} = 0. \]

Thus, the second term in braces in the integral I in Eqn. 8.3 will vanish as T → ∞, so that the value of the improper integral will be

\[ I = \lim_{T\to\infty} I_T = \frac{1}{k}\,C. \]

To find the constant of normalization C we require that I = 1, i.e. (1/k) C = 1, which means that C = k. Thus the (normalized) probability density for the decay is

\[ p(t) = k e^{-kt}. \]

This means that the fraction of atoms that decay between time t_1 and t_2 is

\[ k\int_{t_1}^{t_2} e^{-kt}\,dt. \]

36 We have already encountered such integrals in Sections 3.8.5 and 4.5. See also Chapter 10 for a more detailed discussion of improper integrals.
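The normalization argument above can be checked numerically. The following is a minimal sketch, not part of the original notes: the decay rate `k = 0.5`, the times `t1`, `t2`, and the helper names `decay_density` and `integrate` are all assumptions chosen for illustration. It approximates I_T with a midpoint rule for several finite T, so that one can watch I_T approach 1, and then estimates the fraction of atoms decaying between two times.

```python
import math

def decay_density(t, k):
    """Normalized decay density p(t) = k * exp(-k * t) for t >= 0."""
    return k * math.exp(-k * t)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

k = 0.5  # hypothetical decay rate (units of 1/time), for illustration only

# I_T = integral of p(t) from 0 to T; it should approach 1 as T grows.
partial_integrals = {
    T: integrate(lambda t: decay_density(t, k), 0.0, T) for T in (1, 5, 20, 50)
}
for T, value in partial_integrals.items():
    print(f"T = {T:>2}: I_T = {value:.6f}   (1 - exp(-k T) = {1 - math.exp(-k * T):.6f})")

# Fraction of atoms that decay between t1 and t2 (hypothetical times).
t1, t2 = 1.0, 3.0
fraction = integrate(lambda t: decay_density(t, k), t1, t2)
print(f"fraction decaying between t1 and t2: {fraction:.6f}")
```

The comparison column uses the closed form 1 − e^{−kT} obtained in the text with C = k, so the two columns should agree to within the accuracy of the midpoint rule.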
Cumulative decays

The fraction of the atoms that decay between time 0 and time t (i.e. "any time up to time t" or "by time t"; note the subtle wording^37) is

\[ F(t) = \int_0^t p(s)\,ds = k\int_0^t e^{-ks}\,ds. \]

We can simplify this expression by integrating:

\[ F(t) = k\left[\frac{e^{-ks}}{-k}\right]_0^t = -\left[e^{-kt} - e^{0}\right] = 1 - e^{-kt}. \]

Thus, the probability of the atoms decaying by time t (which means any time up to time t) is

\[ F(t) = 1 - e^{-kt}. \]

We note that F(0) = 0 and F(∞) = 1, as expected for the cumulative function.

37 Note that the precise English wording is subtle, but very important here. "By time t" means that the event could have happened at any time right up to time t.

Median decay time

As before, to determine the median decay time, t_m (the time at which half of the atoms have decayed), we set F(t_m) = 1/2. Then

\[ \frac{1}{2} = F(t_m) = 1 - e^{-k t_m}, \]

so we get

\[ e^{-k t_m} = \frac{1}{2} \;\Rightarrow\; e^{k t_m} = 2 \;\Rightarrow\; k t_m = \ln 2 \;\Rightarrow\; t_m = \frac{\ln 2}{k}. \]

Thus half of the atoms have decayed by this time. (Remark: this is easily recognized as the half life of the radioactive process from previous familiarity with exponentially decaying functions.)

Mean decay time

The mean time of decay \bar{t} is given by

\[ \bar{t} = \int_0^\infty t\,p(t)\,dt. \]

We compute this integral again as an improper integral by taking a limit as the top endpoint increases to infinity, i.e. we first find

\[ I_T = \int_0^T t\,p(t)\,dt, \]
and then set

\[ \bar{t} = \lim_{T\to\infty} I_T. \]

To compute I_T we use integration by parts:

\[ I_T = \int_0^T t\,k e^{-kt}\,dt = k\int_0^T t e^{-kt}\,dt. \]

Let u = t, dv = e^{-kt} dt. Then du = dt, v = e^{-kt}/(-k), so that

\[ I_T = k\left[\,t\,\frac{e^{-kt}}{(-k)} - \int \frac{e^{-kt}}{(-k)}\,dt\right]_0^T = \left[-t e^{-kt} + \int e^{-kt}\,dt\right]_0^T = \left[-t e^{-kt} - \frac{e^{-kt}}{k}\right]_0^T = -T e^{-kT} - \frac{e^{-kT}}{k} + \frac{1}{k}. \]

Now as T → ∞, we have e^{-kT} → 0 (and also T e^{-kT} → 0, since the exponential decays faster than T grows), so that

\[ \bar{t} = \lim_{T\to\infty} I_T = \frac{1}{k}. \]

Thus the mean or expected decay time is

\[ \bar{t} = \frac{1}{k}. \]

8.4.2 Discrete versus continuous probability

In Chapter 5, we compared the treatment of two types of mass distributions. We first explored a set of discrete masses strung along a "thin wire". Later, we considered a single "bar" with a continuous distribution of density along its length. In the first case, there was an unambiguous meaning to the concept of "mass at a point", whereas in the latter case, we could assign a mass to some section of the bar between, say, x = a and x = b. (To do so we had to integrate the mass density on the interval a ≤ x ≤ b.) In the first case, we talked about the mass of the objects. In the second case, we were interested in the idea of density (mass per unit distance: note that the units of mass density are not the same as the units of mass).

As we have seen so far in this chapter, the same dichotomy exists in the topic of probability. In Chapter 7, we were concerned with the probability of discrete events whose outcome belongs to some finite set of possibilities (e.g. Head or Tail for a coin toss, allele A or a in genetics). The example below provides some further insight to the connection between continuous and discrete probability. In particular, we will see that one can arrive at the idea of probability density by refining a set of measurements and making the appropriate scaling. We explore this connection in more detail below.
8.4.3 Example: Student heights

Suppose we measure the heights of all UBC students. This would produce about 30,000 data values^38. We could make a graph and show how these heights are distributed. For example, we could subdivide the student body into those students between 0 and 1.5 m, and those between 1.5 and 3 meters. Our bar graph would contain two bars, with the number of students in each height category represented by the heights of the bars, as shown in Figure 8.5(a).

[Figure 8.5: three histograms of p(h) against height h, with progressively narrower bins of width Δh.]
Figure 8.5. Refining a histogram by increasing the number of bins leads (eventually) to the idea of a continuous probability density.

Suppose we want to record this distribution in more detail. We could divide the population into smaller groups by shrinking the size of the interval or "bin" into which height is subdivided. (An example is shown in Figure 8.5(b).) Here, by a "bin" we mean a little interval of width Δh where h is height, i.e. a height interval. For example, we could keep track of the heights in increments of 50 cm. If we were to plot the number of students in each height category, then as the size of the bins gets smaller, so would the height of the bar: there would be fewer students in each category if we increase the number of categories.

To keep the bar height from shrinking, we might reorganize the data slightly. Instead of plotting the number of students in each bin, we might plot

\[ \frac{\text{number of students in the bin}}{\Delta h}. \]

If we do this, then both numerator and denominator decrease as the size of the bins is made smaller, so that the shape of the distribution is preserved (i.e. it does not get flatter).

We observe that in this case, the number of students in a given height category is represented by the area of the bar corresponding to that category:

\[ \text{Area of bin} = \Delta h \cdot \frac{\text{number of students in the bin}}{\Delta h} = \text{number of students in the bin}. \]

The important point to consider is that the height of each bar in the plot represents the number of students per unit height.

38 I am grateful to David Austin for developing this example.
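The rescaling described above can be tried on synthetic data. The sketch below is an illustration only: the heights are randomly generated (a normal distribution with mean 1.70 m and standard deviation 0.10 m is an assumption, not real student data), and the function name `density_histogram` is hypothetical. Each bar height is the count in the bin divided by the bin width, and additionally by the total number of students so that the total area is 1.

```python
import random

random.seed(0)
# Synthetic heights in metres: an assumption for illustration, not real UBC data.
heights = [random.gauss(1.70, 0.10) for _ in range(30_000)]

def density_histogram(data, lo, hi, n_bins):
    """Return (bin width, bar heights), where each bar height is
    (number of values in the bin) / (total number of values * bin width)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in data:
        if lo <= value < hi:
            index = min(int((value - lo) / width), n_bins - 1)
            counts[index] += 1
    bars = [count / (len(data) * width) for count in counts]
    return width, bars

results = {}
for n_bins in (2, 6, 30):
    width, bars = density_histogram(heights, 0.0, 3.0, n_bins)
    total_area = sum(bar * width for bar in bars)
    results[n_bins] = (max(bars), total_area)
    print(f"{n_bins:>2} bins: width = {width:.2f} m, "
          f"tallest bar = {max(bars):.3f} per m, total area = {total_area:.3f}")
```

With raw counts the bars would shrink as the bins narrow. With the division by Δh the total area stays fixed at 1 for every choice of bin width, which is the property that carries over to a probability density.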
This type of plot is precisely what leads us to the idea of a density distribution. As Δh shrinks, we get a continuous graph. If we "normalize", i.e. divide by the total area under the graph, we get a probability density, p(h), for the height of the population. As noted, p(h) represents the fraction of students per unit height^39 whose height is h. It is thus a density, and has the appropriate units. In this case, p(h) Δh represents the fraction of individuals whose height is in the range h ≤ height ≤ h + Δh.

39 Note in particular the units of h^{-1} attached to this probability density, and contrast this with a discrete probability that is a pure number carrying no such units.

8.4.4 Example: Age dependent mortality

In this example, we consider an age distribution and interpret the meanings of the probability density and of the cumulative function. Understanding the connection between the verbal description and the symbols we use to represent these concepts requires practice and experience. Related problems are presented in the homework.

Let p(a) be a probability density for the probability of mortality of a female Canadian nonsmoker at age a, where 0 ≤ a ≤ 120. (We have chosen an upper endpoint of age 120 since practically no Canadian female lives past this age at present.) Let F(a) be the cumulative distribution corresponding to this probability density. We would like to answer the following questions:

(a) What is the probability of dying by age a?
(b) What is the probability of surviving to age a?
(c) Suppose that we are told that F(75) = 0.8 and that F(80) differs from F(75) by 0.11. Interpret this information in plain English. What is the probability of surviving to age 80? Which is larger, F(75) or F(80)?
(d) Use the information in part (c) to estimate the probability of dying between the ages of 75 and 80 years old. Further, estimate p(80) from this information.

Solution

(a) The probability of dying by age a is the same as the probability of dying any time up to age a. Restated, this is the probability that the age of death is in the interval 0 ≤ age of death ≤ a. The appropriate quantity is the cumulative function, for this probability density

\[ F(a) = \int_0^a p(x)\,dx. \]

Remark: note that, as customary, x is playing the role of a "dummy variable". We are integrating over all ages between 0 and a, so we do not want to confuse the notation for variable of integration, x, and endpoint of the interval, a. Hence the symbol x rather than a inside the integral.
(b) The probability of surviving to age a is the same as the probability of not dying before age a. By the elementary properties of probability discussed in the previous chapter, this is 1 − F(a).

(c) F(75) = 0.8 means that the probability of dying some time up to age 75 is 0.8. (This also means that the probability of surviving past this age would be 1 − 0.8 = 0.2.) From the properties of probability, we know that the cumulative distribution is an increasing function, and thus it must be true that F(80) > F(75). Then F(80) = F(75) + 0.11 = 0.8 + 0.11 = 0.91. Thus the probability of surviving to age 80 is 1 − 0.91 = 0.09. This means that 9% of the population will make it to their 80th birthday.

(d) The probability of dying between the ages of 75 and 80 years old is exactly
\[ \int_{75}^{80} p(x)\,dx. \]

However, we can also state this in terms of the cumulative function, since

\[ \int_{75}^{80} p(x)\,dx = \int_0^{80} p(x)\,dx - \int_0^{75} p(x)\,dx = F(80) - F(75) = 0.11. \]
Thus the probability of death between the ages of 75 and 80 is 0.11. To estimate p(80), we use the connection between the probability density and the cumulative distribution^40:

\[ p(x) = F'(x). \qquad (8.4) \]

Then it is approximately true that

\[ p(x) \approx \frac{F(x + \Delta x) - F(x)}{\Delta x}. \qquad (8.5) \]

(Recall the definition of the derivative, and note that we are approximating the derivative by the slope of a secant line.) Here we have information at ages 75 and 80, so Δx = 80 − 75 = 5, and the approximation is rather crude, leading to

\[ p(80) \approx \frac{F(80) - F(75)}{5} = \frac{0.11}{5} = 0.022 \text{ per year}. \]
Several important points merit attention in the above example. First, information contained in the cumulative function is useful. Differences in values of F between x = a and x = b are, after all, equivalent to the integral \int_a^b p(x)\,dx, and are the probability of a result in the given interval, a ≤ x ≤ b. Second, p(x) is the derivative of F(x). In the expression (8.5), we approximated that derivative by a small finite difference. Here we see at play many of the themes that have appeared in studying calculus: the connection between derivatives and integrals, the Fundamental Theorem of Calculus, and the relationship between tangent and secant lines.
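The arithmetic of parts (c) and (d) can be laid out as a short calculation. This is only a sketch of the steps in the example; the variable names are assumptions introduced here.

```python
F_75 = 0.80          # probability of dying by age 75 (given in the example)
F_80 = F_75 + 0.11   # F(80) exceeds F(75) by 0.11

prob_survive_to_80 = 1 - F_80        # part (c): probability of surviving to age 80
prob_die_75_to_80 = F_80 - F_75      # part (d): integral of p(x) from 75 to 80

# Secant-slope estimate of the density p(80), in units of "per year".
delta_x = 80 - 75
p_80_estimate = prob_die_75_to_80 / delta_x

print(f"P(survive to 80)      ~ {prob_survive_to_80:.2f}")
print(f"P(die between 75, 80) ~ {prob_die_75_to_80:.2f}")
print(f"p(80) estimate        ~ {p_80_estimate:.3f} per year")
```

Note that the last quantity carries units of 1/year, unlike the two probabilities above it, which are pure numbers.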
40 In Eqn. (8.4) there is no longer confusion between a variable of integration and an endpoint, so we could revert to the notation p(a) = F'(a), helping us to identify the independent variable as age. However, we have avoided doing so simply so that the formula in Eqn. (8.5) would be very recognizable as an approximation for a derivative.
8.4.5 Example: Raindrop size distribution
In this example, we ﬁnd a rather nonintuitive result, linking the distribution of raindrops of various radii with the distribution of their volumes. This reinforces the caution needed in interpreting and handling probabilities. During a Vancouver rainstorm, the distribution of raindrop radii is uniform for radii 0 ≤ r ≤ 4 (where r is measured in mm) and zero for larger r. By a uniform distribution we mean a function that has a constant value in the given interval. Thus, we are saying that the distribution looks like f (r) = C for 0 ≤ r ≤ 4. (a) Determine what is the probability density for raindrop radii, p(r)? Interpret the meaning of that function. (b) What is the associated cumulative function F (r) for this probability density? Interpret the meaning of that function. (c) In terms of the volume, what is the cumulative distribution F (V )? (d) In terms of the volume, what is the probability density p(V )? (e) What is the average volume of a raindrop? Solution This problem is challenging because one may be tempted to think that the uniform distribution of drop radii should give a uniform distribution of drop volumes. This is not the case, as the following argument shows! The sequence of steps is illustrated in Figure 8.6.
[Figure 8.6: four panels. (a) p(r) against r, (b) F(r) against r, (c) p(V) against V, (d) F(V) against V. The radius axis runs from 0 to 4.]
Figure 8.6. Probability densities for raindrop radius and raindrop volume (left panels) and for the cumulative distributions (right) of each for Example 8.4.5.
(a) The probability density function is p(r) = 1/4 for 0 ≤ r ≤ 4. This means that the probability per unit radius of ﬁnding a drop of size r is the same for all radii in 0 ≤ r ≤ 4, as shown in Fig. 8.6(a). Some of these drops will correspond to small volumes, and others to very large volumes. We will see that the probability per unit volume of ﬁnding a drop of given volume will be quite different. (b) The cumulative function is
\[ F(r) = \int_0^r \frac{1}{4}\,ds = \frac{r}{4}, \qquad 0 \le r \le 4. \qquad (8.6) \]
A sketch of this function is shown in Fig. 8.6(b).

(c) The cumulative function F(r) is proportional to the radius of the drop. We use the connection between radii and volume of spheres to rewrite that function in terms of the volume of the drop. Since

\[ V = \frac{4}{3}\pi r^3 \qquad (8.7) \]

we have

\[ r = \left(\frac{3}{4\pi}\right)^{1/3} V^{1/3}. \]

Substituting this expression into the formula (8.6), we get

\[ F(V) = \frac{1}{4}\left(\frac{3}{4\pi}\right)^{1/3} V^{1/3}. \]
We find the range of values of V by substituting r = 0, 4 into Eqn. (8.7) to get V = 0, (4/3)π 4^3. Therefore the interval is 0 ≤ V ≤ (4/3)π 4^3, or 0 ≤ V ≤ (256/3)π. The function F(V) is sketched in panel (d) of Fig. 8.6.

(d) We now use the connection between the probability density and the cumulative distribution, namely that p is the derivative of F. Now that the variable has been converted to volume, that derivative is a little more "interesting":

\[ p(V) = F'(V). \]

Therefore,

\[ p(V) = \frac{1}{4}\left(\frac{3}{4\pi}\right)^{1/3} \frac{1}{3}\,V^{-2/3}. \]

Thus the probability per unit volume of finding a drop of volume V in 0 ≤ V ≤ (4/3)π 4^3 is not at all uniform. This probability density is shown in Fig. 8.6(c). This results from the fact that the differential quantity dr behaves very differently from dV, and reinforces the fact that we are dealing with density, not with a probability per se. We note that this distribution has smaller values at larger values of V.
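(e) The range of values of V is

\[ 0 \le V \le \frac{256\pi}{3}, \]

and therefore the mean volume is

\[ \bar{V} = \int_0^{256\pi/3} V\,p(V)\,dV = \frac{1}{12}\left(\frac{3}{4\pi}\right)^{1/3} \int_0^{256\pi/3} V \cdot V^{-2/3}\,dV = \frac{1}{12}\left(\frac{3}{4\pi}\right)^{1/3} \int_0^{256\pi/3} V^{1/3}\,dV \]

\[ = \frac{1}{12}\left(\frac{3}{4\pi}\right)^{1/3} \left[\frac{3}{4}V^{4/3}\right]_0^{256\pi/3} = \frac{1}{16}\left(\frac{3}{4\pi}\right)^{1/3} \left(\frac{256\pi}{3}\right)^{4/3} = \frac{64\pi}{3} \approx 67\ \text{mm}^3. \]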
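As a cross-check on part (e), the mean volume can be computed in two ways: by averaging V = (4/3)πr³ against the uniform density in r, and by averaging V against the density p(V) found in part (d). The sketch below uses a simple midpoint rule; the helper name `midpoint` and the step count are assumptions made for illustration.

```python
import math

def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

R_MAX = 4.0                                   # largest drop radius, in mm
V_MAX = (4.0 / 3.0) * math.pi * R_MAX ** 3    # corresponding volume, 256*pi/3

def p_r(r):
    """Uniform density for the radius on [0, 4]."""
    return 1.0 / R_MAX

def p_V(V):
    """Density for the volume from part (d): (1/12) (3/(4 pi))^(1/3) V^(-2/3)."""
    return (1.0 / 12.0) * (3.0 / (4.0 * math.pi)) ** (1.0 / 3.0) * V ** (-2.0 / 3.0)

# Mean volume, averaging over radii.
mean_from_r = midpoint(lambda r: (4.0 / 3.0) * math.pi * r ** 3 * p_r(r), 0.0, R_MAX)
# Mean volume, averaging over volumes (midpoints avoid evaluating p_V at V = 0).
mean_from_V = midpoint(lambda V: V * p_V(V), 0.0, V_MAX)
exact = 64.0 * math.pi / 3.0

print(f"mean volume via r: {mean_from_r:.3f} mm^3")
print(f"mean volume via V: {mean_from_V:.3f} mm^3")
print(f"64*pi/3          : {exact:.3f} mm^3")
```

Both routes should land close to 64π/3, even though p(r) is flat and p(V) is not.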
8.5 Moments of a probability density
We are now familiar with some of the properties of probability distributions. On this page we will introduce a set of numbers that describe various properties of such distributions. Some of these have already been encountered in our previous discussion, but now we will see that these ﬁt into a pattern of quantities called moments of the distribution.
8.5.1 Deﬁnition of moments
Let f (x) be any function which is deﬁned and positive on an interval [a, b]. We might refer to the function as a distribution, whether or not we consider it to be a probability density. Then we will deﬁne the following moments of this function:
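\[ \text{zero'th moment:}\quad M_0 = \int_a^b f(x)\,dx \]

\[ \text{first moment:}\quad M_1 = \int_a^b x\,f(x)\,dx \]

\[ \text{second moment:}\quad M_2 = \int_a^b x^2 f(x)\,dx \]

\[ \vdots \]

\[ \text{n'th moment:}\quad M_n = \int_a^b x^n f(x)\,dx. \]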
Observe that moments of any order are deﬁned by integrating the distribution f (x) with a suitable power of x over the interval [a, b]. However, in practice we will see that usually moments up to the second are usefully employed to describe common attributes of a distribution.
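The definition translates directly into a numerical recipe. The following minimal sketch (the function name `moment` and the midpoint rule are choices made here, not part of the notes) evaluates the first few moments of the uniform raindrop-radius density p(r) = 1/4 on [0, 4] from Section 8.4.5.

```python
def moment(f, a, b, n, steps=100_000):
    """Approximate the n'th moment, the integral of x**n * f(x) over [a, b],
    using the midpoint rule."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x ** n * f(x)
    return h * total

def uniform_density(x):
    """The raindrop-radius density p(r) = 1/4 on 0 <= r <= 4."""
    return 0.25

M0, M1, M2 = (moment(uniform_density, 0.0, 4.0, n) for n in range(3))
print(f"M0 = {M0:.4f}, M1 = {M1:.4f}, M2 = {M2:.4f}")
```

For this density the integrals can also be done by hand (M0 = 1, M1 = 2, M2 = 16/3), which gives a quick way to confirm that the routine behaves as intended.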
8.5.2 Relationship of moments to mean and variance of a probability density
In the particular case that the distribution is a probability density, p(x), defined on the interval a ≤ x ≤ b, we have already established the following:

\[ M_0 = \int_a^b p(x)\,dx = 1. \]
(This follows from the basic property of a probability density.) Thus the zero'th moment of any probability density is 1. Further,

\[ M_1 = \int_a^b x\,p(x)\,dx = \bar{x} = \mu. \]
That is, the first moment of a probability density is the same as the mean (i.e. expected value) of that probability density. So far, we have used the symbol \bar{x} to represent the mean or average value of x, but often the symbol µ is also used to denote the mean. The second moment of a probability density also has a useful interpretation. From the above definitions, the second moment of p(x) over the interval a ≤ x ≤ b is

\[ M_2 = \int_a^b x^2 p(x)\,dx. \]
We will shortly see that the second moment helps describe the way that density is distributed about the mean. For this purpose, we must describe the notion of variance or standard deviation.

Variance and standard deviation

Two children of approximately the same size can balance on a teeter-totter by sitting very close to the point at which the beam pivots. They can also achieve a balance by sitting at the very ends of the beam, equally far away. In both cases, the center of mass of the distribution is at the same place: precisely at the pivot point. However, the mass is distributed very differently in these two cases. In the first case, the mass is clustered close to the center, whereas in the second, it is distributed further away. We may want to be able to describe this distinction, and we could do so by considering higher moments of the mass distribution.

Similarly, if we want to describe how a probability density distribution is distributed about its mean, we consider moments higher than the first. We use the idea of the variance to describe whether the distribution is clustered close to its mean, or spread out over a great distance from the mean.

Variance

The variance is defined as the average value of the quantity (distance from mean)^2, where the average is taken over the whole distribution. (The reason for the square is that we would not like values to the left and right of the mean to cancel out.) For discrete probability with mean µ, we define variance by
\[ V = \sum_i (x_i - \mu)^2 p_i. \]

For a continuous probability density, with mean µ, we define the variance by

\[ V = \int_a^b (x - \mu)^2 p(x)\,dx. \]

The standard deviation

The standard deviation is defined as

\[ \sigma = \sqrt{V}. \]

Let us see what this implies about the connection between the variance and the moments of the distribution.

Relationship of variance to second moment

From the equation for variance we calculate that

\[ V = \int_a^b (x - \mu)^2 p(x)\,dx = \int_a^b (x^2 - 2\mu x + \mu^2)\,p(x)\,dx. \]

Expanding the integral leads to:

\[ V = \int_a^b x^2 p(x)\,dx - \int_a^b 2\mu x\,p(x)\,dx + \int_a^b \mu^2 p(x)\,dx = \int_a^b x^2 p(x)\,dx - 2\mu \int_a^b x\,p(x)\,dx + \mu^2 \int_a^b p(x)\,dx. \]

We recognize the integrals in the above expression, since they are simply moments of the probability distribution. Using the definitions, we arrive at

\[ V = M_2 - 2\mu\,\mu + \mu^2. \]

Thus

\[ V = M_2 - \mu^2. \]

Observe that the variance is related to the second moment, M_2, and to the mean, µ, of the distribution.
Using the above definitions, the standard deviation, σ, can be expressed as

\[ \sigma = \sqrt{V} = \sqrt{M_2 - \mu^2}. \]

8.5.3 Example: computing moments

Consider a probability density such that p(x) = C is constant for values of x in the interval [a, b] and zero for values outside this interval^41. The area under the graph of this function for a ≤ x ≤ b is A = C · (b − a) ≡ 1 (enforced by the usual property of a probability density), so it is easy to see that the value of the constant C should be C = 1/(b − a). Thus

\[ p(x) = \frac{1}{b - a}, \qquad a \le x \le b. \]

We compute some of the moments of this probability density:

\[ M_0 = \int_a^b p(x)\,dx = \frac{1}{b - a}\int_a^b 1\,dx = 1. \]

(This was already known, since we have determined that the zeroth moment of any probability density is 1.) We also find that

\[ M_1 = \int_a^b x\,p(x)\,dx = \frac{1}{b - a}\int_a^b x\,dx = \frac{1}{b - a}\left[\frac{x^2}{2}\right]_a^b = \frac{b^2 - a^2}{2(b - a)}. \]

This last expression can be simplified by factoring, leading to

\[ \mu = M_1 = \frac{(b - a)(b + a)}{2(b - a)} = \frac{b + a}{2}. \]

The value (b + a)/2 is the midpoint of the interval [a, b]. Thus we have found that the mean µ is in the center of the interval, as expected for a symmetric distribution. The median would be at the same place by a simple symmetry argument: half the area is to the left and half the area is to the right of this point.

To find the variance we calculate the second moment,

\[ M_2 = \int_a^b x^2 p(x)\,dx = \frac{1}{b - a}\int_a^b x^2\,dx = \frac{1}{b - a}\left[\frac{x^3}{3}\right]_a^b = \frac{b^3 - a^3}{3(b - a)}. \]

Factoring simplifies this to

\[ M_2 = \frac{(b - a)(b^2 + ab + a^2)}{3(b - a)} = \frac{b^2 + ab + a^2}{3}. \]

41 As noted before, this is a uniform distribution. It has the shape of a rectangular band of height C and base (b − a).
The variance is then

\[ V = M_2 - \mu^2 = \frac{b^2 + ab + a^2}{3} - \frac{(b + a)^2}{4} = \frac{b^2 - 2ab + a^2}{12} = \frac{(b - a)^2}{12}. \]

The standard deviation is

\[ \sigma = \frac{(b - a)}{2\sqrt{3}}. \]

8.6 Summary

In this chapter, we extended the discrete probability encountered in Chapter 7 to the case of continuous probability density. We learned that this function is a probability per unit value (of the variable of interest), so that

\[ \int_a^b p(x)\,dx = \text{probability that } x \text{ takes a value in the interval } (a, b). \]

We also defined and studied the cumulative function

\[ F(x) = \int_a^x p(s)\,ds = \text{probability of a value in the interval } (a, x). \]

We noted that by the Fundamental Theorem of Calculus, F(x) is an antiderivative of p(x) (or synonymously, p(x) = F'(x)).

The mean and median are two descriptors for some features of probability densities. For p(x) defined on an interval a ≤ x ≤ b and zero outside, the mean (\bar{x}, or sometimes called µ) is

\[ \bar{x} = \int_a^b x\,p(x)\,dx, \]

whereas the median, x_med, is the value for which

\[ F(x_{med}) = \frac{1}{2}. \]

Both mean and median correspond to the "center" of a symmetric distribution. If the distribution is nonsymmetric, a long tail in one direction will shift the mean toward that direction more strongly than the median. The variance of a probability density is

\[ V = \int_a^b (x - \mu)^2 p(x)\,dx, \]

and the standard deviation is

\[ \sigma = \sqrt{V}. \]

This quantity describes the "width" of the distribution, i.e. how spread out (large σ) or clumped (small σ) it is.

We defined the n'th moment of a probability density as

\[ M_n = \int_a^b x^n p(x)\,dx, \]

and showed that the first few moments are related to mean and variance of the probability. Most of these concepts are directly linked to the analogous ideas in discrete probability, but in this chapter, we used integration in place of summation, to deal with the continuous, rather than the discrete case.
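The quantities collected in this summary can all be computed numerically for a given density. As a hedged illustration, the sketch below takes the nonsymmetric density p(x) = (π/36) x sin(πx/6) on 0 ≤ x ≤ 6 discussed earlier in the chapter and estimates its zeroth moment, mean, variance, standard deviation and median. The midpoint rule, the bisection search for the median, and all function names are choices made here rather than the method of the notes, which used a spreadsheet.

```python
import math

def p(x):
    """Nonsymmetric density from earlier in the chapter, on 0 <= x <= 6."""
    return (math.pi / 36.0) * x * math.sin(math.pi * x / 6.0)

def integrate(f, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

A, B = 0.0, 6.0
M0 = integrate(p, A, B)                        # should be close to 1
mean = integrate(lambda x: x * p(x), A, B)     # first moment
M2 = integrate(lambda x: x * x * p(x), A, B)   # second moment
variance = M2 - mean ** 2                      # V = M2 - mu^2
sigma = math.sqrt(variance)

def F(x):
    """Cumulative function, computed numerically as the integral of p from 0 to x."""
    return integrate(p, A, x, n=2_000)

# Bisection for the median: the value x_med with F(x_med) = 1/2.
lo, hi = A, B
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if F(mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = 0.5 * (lo + hi)

print(f"M0 = {M0:.4f}, mean = {mean:.4f}, median = {median:.4f}, sigma = {sigma:.4f}")
```

Under these assumptions the mean comes out near 3.57 and the median near 3.6, in line with the values quoted for Figure 8.4, so the two "centers" differ slightly, as expected for a nonsymmetric density.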
Chapter 9 Differential Equations

9.1 Introduction

A differential equation is a relationship between some (unknown) function and one of its derivatives. Examples of differential equations were encountered in an earlier calculus course in the context of population growth, temperature of a cooling object, and speed of a moving object subjected to friction. In Section 4.2.4, we reviewed an example of a differential equation for velocity, (4.8), and discussed its solution, but here, we present a more systematic approach to solving such equations using a technique called separation of variables. In this chapter, we apply the tools of integration to finding solutions to differential equations. The importance and wide applicability of this topic cannot be overstated.

In this course, since we are concerned only with functions that depend on a single variable, we discuss ordinary differential equations (ODE's), whereas later, after a multivariate calculus course where partial derivatives are introduced, a wider class, of partial differential equations (PDE's), can be studied. Such equations are encountered in many areas of science, and in any quantitative analysis of systems where rates of change are linked to the state of the system. Most laws of physics are of this form; for example, applying the familiar Newton's law, F = ma, links the position of a pendulum's mass to its acceleration (second derivative of position).^42 Many biological processes are also described by differential equations. The rate of growth of a population dN/dt depends on the size of that population at the given time N(t).

42 Newton's law states that force is proportional to acceleration. For a pendulum, the force is due to gravity, and the acceleration is a second derivative of the x or y coordinate of the bob on the pendulum.

Constructing the differential equation that adequately represents a system of interest is an art that takes some thought and experience. In this process, which we call "modeling", many simplifications are made so that the essential properties of a given system are captured, leaving out many complicating details. For example, friction might be neglected in "modeling" a perfect pendulum. The details of age distribution might be neglected in modeling a growing population. Now that we have techniques for integration, we can devise a new approach to computing solutions of differential equations.

Given a differential equation and a starting value, the goal is to make a prediction
about the future behaviour of the system. This is equivalent to identifying the function that satisfies the given differential equation and initial value(s). We refer to such a function as the solution to the initial value problem (IVP). In differential calculus, our exploration of differential equations was limited to those whose solution could be guessed, or whose solution was supplied in advance. We also explored some of the fascinating geometric and qualitative properties of such equations and their predictions.

Now that we have techniques of integration, we can find the analytic solution to a variety of simple first-order differential equations (i.e. those involving the first derivative of the unknown function). We will describe the technique of separation of variables. This technique works for examples that are simple enough that we can isolate the dependent variable (e.g. y) on one side of the equation, and the independent variable (e.g. time t) on the other side.

9.2 Unlimited population growth

We start with a simple example that was treated thoroughly in the differential calculus semester of this course. We consider a population with per capita birth and mortality rates that are constant, irrespective of age, disease, environmental changes, or other effects. We ask how a population in such ideal circumstances would change over time. We build up a simple model (i.e. a differential equation) to describe this ideal case, and then proceed to find its solution. Solving the differential equation is accomplished by a new technique introduced here, namely separation of variables. This reduces the problem to integration and algebraic manipulation, allowing us to compute the population size at any time t. By going through this process, we essentially convert information about the rate of change and starting level of the population to a detailed prediction of the population at later times.^43

43 Of course, we must keep in mind that such predictions are based on simplifying assumptions, and are to be taken as an approximation of any real population growth.

9.2.1 A simple model for population growth

Let y(t) represent the size of a population at time t. We will assume that at time t = 0, the population level is specified, i.e. y(0) = y_0 is some given constant. We want to find the population at later times, given information about birth and mortality rates (both of which are here assumed to be constant over time).

The population changes through births and mortality. Suppose that b > 0 is the per capita average birth rate, and m > 0 the per capita average mortality rate. The assumption that b, m are both constants is a simplification that neglects many biological effects, but will be used for simplicity in this first example.

The statement that the population increases through births and decreases due to mortality can be restated as

rate of change of y = rate of births − rate of mortality

where the rate of births is given by the product of the per capita average birth rate b and the population size y. Similarly, the rate of mortality is given by my. Translating the rate of
change into the corresponding derivative of y leads to

\[ \frac{dy}{dt} = by - my = (b - m)y. \]

Let us define the new constant, k = b − m. Then k is the net per capita growth rate of the population. We can distinguish two possible cases: b > m means that there are more births than deaths, so we expect the population to grow; b < m means that there are more deaths than births, so that the population will eventually go extinct. There is also a marginal case that b = m, for which k = 0, where the population does not change at all. To summarize, this simple model of unlimited growth leads to the differential equation and initial condition:

\[ \frac{dy}{dt} = ky, \qquad y(0) = y_0. \qquad (9.1) \]

Recall that a differential equation together with an initial condition is called an initial value problem. To find a solution to such a problem, we look for the function y(t) that describes the population size at any future time t, given its initial size at time t = 0.

9.2.2 Separation of variables and integration

We here introduce the technique, separation of variables, that will be used in all the examples described in this chapter. Since the differential equation (9.1) is relatively simple, this first example will be relatively straightforward. We would like to determine y(t) given the differential equation

\[ \frac{dy}{dt} = ky. \]

Rather than integrating this equation as is^44, we use an alternate approach, considering dt and dy as "differentials" in the sense defined in Section 6.1. We rearrange and rewrite the above equation in the form

\[ \frac{1}{y}\,dy = k\,dt. \qquad (9.2) \]

This step of putting expressions involving the independent variable t on one side and expressions involving the dependent variable y on the opposite side gives rise to the name "separation of variables".

44 We may be tempted to integrate both sides of this equation with respect to the independent variable t, e.g. writing \int \frac{dy}{dt}\,dt = \int ky\,dt + C (where C is some constant), but this is not very useful, since the integral on the right hand side (RHS) can only be carried out if we know the function y = y(t), which we are trying to determine.

Now, the LHS of Eqn. (9.2) depends only on the variable y, and the RHS only on t. The constant k will not interfere with any integration step. Moreover, integrating each side of Eqn. (9.2) can be carried out independently.

To determine the appropriate intervals for integration, we observe that when time sweeps over some interval 0 ≤ t ≤ T (from initial to final time), the value of y(t) will
where friction is neglected). T . ln y(T ) = kT.180 Chapter 9.3) also satisﬁes the differential equation in (9. Simply differentiate Eqn. as required by the equation (9. . ln y = kt . and ﬁnd the solution v(t) as a function of time. we can set T = t. under ideal conditions.3 Terminal velocity and steady states Here we revisit the equation for velocity of a falling object that we ﬁrst encountered in Section 4. a population will grow exponentially with time. 45 This kind of check is good practice and helps to spot errors.4. In other words. Here y0 is the given starting value of y (prescribed by the initial condition in (9. so that the solution (9. 9. Differential Equations change over a corresponding interval y0 ≤ y ≤ y(T ). but our goal is to ﬁnd that value.e. We will ﬁrst reconsider the simplest case of uniformly accelerated motion (i. y0 0 ln y(T ) − ln y(0) = k(T − 0). (9. But this result holds for any arbitrary ﬁnal time. we have determined that.2.1).1). By solving the initial value problem (9. as in Section 4.3) The above formula relates the predicted value of y at any time t to its initial value. since this is true for any time we chose. as in Section 4.3) and show that the result is the same as k times the original function. (9. we get y(0) = y0 ekt = y0 e0 = y0 . arriving at the desired solution y(t) = y0 ekt . Observe that plugging in t = 0. when the net per capita growth rate t is constant. and to all the parameters of the problem. We then include friction. i. y0 y(T ) = y0 ekT .e to predict the future behaviour of y.2.1).2.3. Integrating leads to y(T ) y0 1 dy = y T T k dt = k 0 y(T ) T 0 dt. We do not yet know y(T ). We wish to derive the appropriate differential equation governing that velocity.3) satisﬁes the initial condition.4 and use the new technique of separation of variables to shortcut the method of solution. y0 y(T ) = ekT . Recall that this validates results that we had encountered in our ﬁrst calculus course. 
We leave as an exercise for the reader45 to validate that the function in(9.1)).
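The check suggested for the exponential solution (9.3) can also be sketched numerically. The Python fragment below is a minimal illustration and not part of the original notes: the values $y_0 = 100$ and $k = 0.3$ and the helper names are assumptions chosen for the example. It compares a finite-difference estimate of $dy/dt$ with $k\,y(t)$ at a few times.

```python
import math

def exponential_solution(t, y0, k):
    """Candidate solution y(t) = y0 * exp(k t) of dy/dt = k y, y(0) = y0."""
    return y0 * math.exp(k * t)

def central_difference(f, t, h=1e-6):
    """Numerical estimate of f'(t) by a symmetric difference quotient."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

# Hypothetical parameter values, chosen only for illustration.
y0, k = 100.0, 0.3

def y(t):
    return exponential_solution(t, y0, k)

for t in (0.0, 1.0, 5.0):
    lhs = central_difference(y, t)   # estimate of dy/dt
    rhs = k * y(t)                   # right-hand side of the ODE
    print(f"t={t:4.1f}  dy/dt ~ {lhs:.6f}   k*y = {rhs:.6f}")
```

The two printed columns should agree to many digits, and $y(0)$ should return the prescribed initial value. This does not replace the differentiation exercise, but it is a quick way to catch algebra slips.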
9.3.1 Ignoring friction: the uniformly accelerated case

Let $v(t)$ and $a(t)$ be the velocity and the acceleration, respectively, of an object falling under the force of gravity at time $t$. We take the positive direction to be downwards, for convenience. Suppose that at time $t = 0$ the object starts from rest, i.e. the initial velocity of the object is known to be $v(0) = 0$. When friction is neglected, the object will accelerate, $a(t) = g$, which is equivalent to the statement that the velocity increases at a constant rate,

$$\frac{dv}{dt} = g. \tag{9.4}$$

Because $g$ is constant, we do not need to use separation of variables, i.e. we can integrate each side of this equation directly [46]. Writing

$$\int \frac{dv}{dt}\,dt = \int g\,dt + C = g\int dt + C,$$

where $C$ is an integration constant, we arrive at

$$v(t) = gt + C. \tag{9.5}$$

Here we have used (on the LHS) that $v$ is the antiderivative of $dv/dt$ (equivalently, we can simplify the integral $\int \frac{dv}{dt}\,dt = \int dv = v$). Plugging $v(0) = 0$ into Eqn. (9.5) leads to $0 = g \cdot 0 + C = C$, so the constant we need is $C = 0$ and the velocity satisfies $v(t) = gt$. We have just arrived at a result that parallels Eqn. (4.4) of Section 4.2.3 (in slightly different notation).

Footnote 46: It is important to note the distinction between this simple example and other cases where separation of variables is required. It would not be wrong to use separation of variables to find the solution for Eqn. (9.4), but it would just be "overkill", since simple integration of each side of the equation "as is" does the job.

9.3.2 Including friction: the case of terminal velocity

When a falling object experiences the force of friction, it cannot accelerate indefinitely. In fact, a frictional force retards the downwards motion. To a good approximation, that force is proportional to the velocity. A force balance for the falling object leads to

$$m\,a(t) = mg - \gamma v(t),$$

where $\gamma$ is the frictional coefficient. For an object of constant mass, we can divide through by $m$, so

$$a(t) = g - \frac{\gamma}{m}\,v(t).$$
Figure 9.1. The velocity $v(t)$ (vertical axis) as a function of time $t$ (horizontal axis) given by Eqn. (9.7) as found in Section 9.3.2. Note that as time increases, the velocity approaches some constant terminal velocity. The parameters used were $g = 9.8$ m/s$^2$ and $k = 0.5$.

Let $k = \gamma/m$. Then, the velocity at any time satisfies the differential equation and initial condition

$$\frac{dv}{dt} = g - kv, \qquad v(0) = 0. \tag{9.6}$$

We can find the solution to this differential equation and predict the velocity at any time $t$ using separation of variables. Consider a time interval $0 \le t \le T$, and suppose that, during this time interval, the velocity changes from an initial value of $v(0) = 0$ to the final value, $v(T)$, at the final time, $T$. Then using separation of variables and integration, we get

$$\frac{dv}{dt} = g - kv, \qquad \frac{dv}{g - kv} = dt, \qquad \int_0^{v(T)} \frac{dv}{g - kv} = \int_0^T dt.$$
Substitute $u = g - kv$ for the integral on the left hand side. Then $du = -k\,dv$, $dv = (-1/k)\,du$, so we get an integral of the form

$$-\frac{1}{k}\int \frac{1}{u}\,du = -\frac{1}{k}\ln|u|.$$

After replacing $u$ by $g - kv$, we arrive at

$$-\frac{1}{k}\ln|g - kv|\,\Big|_0^{v(T)} = t\,\Big|_0^{T}.$$

We use the fact that $v(0) = 0$ to write this as

$$-\frac{1}{k}\bigl(\ln|g - kv(T)| - \ln|g|\bigr) = T, \qquad -\frac{1}{k}\ln\left|\frac{g - kv(T)}{g}\right| = T, \qquad \ln\left|\frac{g - kv(T)}{g}\right| = -kT.$$

We are finished with the integration step, but the function we are trying to find, $v(T)$, is still tangled up inside an expression involving the natural logarithm. Extricating it will involve some subtle reasoning about signs because there is an absolute value to contend with. As a first step, we exponentiate both sides to remove the logarithm:

$$\left|\frac{g - kv(T)}{g}\right| = e^{-kT} \quad\Rightarrow\quad |g - kv(T)| = g\,e^{-kT}.$$

Because the constant $g$ is positive, we could remove absolute value signs from it. To simplify further, we have to consider the sign of the term inside the absolute value in the numerator. In the case we are considering here, $v(0) = 0$. This will mean that the quantity $g - kv(T)$ is always nonnegative (i.e. $g - kv(T) \ge 0$). We will verify this fact shortly. For the moment, supposing this is true, we can write

$$|g - kv(T)| = g - kv(T) = g\,e^{-kT},$$

and finally solve for $v(T)$ to obtain our final result,

$$v(T) = \frac{g}{k}\left(1 - e^{-kT}\right).$$

Here we note that $v(T)$ can never be larger than $g/k$ since the term $(1 - e^{-kT})$ is always $\le 1$. Hence, we were correct in assuming that $g - kv(T) \ge 0$. As before, the above formula relating velocity to time holds for any choice of the final time $T$, so we can write, in general,

$$v(t) = \frac{g}{k}\left(1 - e^{-kt}\right). \tag{9.7}$$
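To see the formula for the velocity with friction in action, the following Python sketch evaluates $v(t) = (g/k)(1 - e^{-kt})$ and compares it with a simple Euler approximation of $dv/dt = g - kv$, $v(0) = 0$. This is an illustrative sketch only: the function names, the step count, and the values $g = 9.8$ and $k = 0.5$ are assumptions made for the example.

```python
import math

def velocity(t, g=9.8, k=0.5):
    """Analytic solution v(t) = (g/k) * (1 - exp(-k t)) with v(0) = 0."""
    return (g / k) * (1.0 - math.exp(-k * t))

def euler_velocity(t_end, g=9.8, k=0.5, steps=10_000):
    """Euler's method for dv/dt = g - k v, starting from v(0) = 0."""
    dt = t_end / steps
    v = 0.0
    for _ in range(steps):
        v += dt * (g - k * v)   # one Euler step
    return v

g, k = 9.8, 0.5                  # illustrative values (assumed)
terminal = g / k                 # steady state, where g - k v = 0

for t in (1.0, 5.0, 10.0):
    print(f"t={t:5.1f}  analytic={velocity(t, g, k):.4f}  "
          f"euler={euler_velocity(t, g, k):.4f}  terminal={terminal:.4f}")
```

With these values the analytic and Euler columns should agree closely, and both should stay below $g/k$ while creeping towards it as $t$ grows.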
This is the solution to the initial value problem (9.6). It predicts the velocity of the falling object through time. Note that we have arrived once more at the result obtained in Eqn. (4.11), but using the technique of separation of variables [47]. We graph the expression given in (9.7) in Figure 9.1. Note that as $t$ increases, the term $e^{-kt}$ decreases rapidly, so that the velocity approaches a constant whose value is

$$v(t) \to \frac{g}{k}.$$

We call this the terminal velocity [48].

Footnote 47: It often happens that a differential equation can be solved using several different methods.

Footnote 48: A similar plot of the solution of the differential equation (9.6) could be assembled using Euler's method, as studied in differential calculus. That is the numerical method alternative to the analytic technique discussed in this chapter. The student may wish to review results obtained in a previous semester to appreciate the correspondence.

9.3.3 Steady state

We might observe that the terminal velocity can also be found quite simply and directly from the differential equation itself: it is the steady state of the differential equation, i.e. the value for which no further change takes place. The steady state can be found by setting the derivative in the differential equation to zero, i.e. by letting

$$\frac{dv}{dt} = 0.$$

When this is done, we arrive at

$$g - kv = 0 \quad\Rightarrow\quad v = \frac{g}{k}.$$

Thus, at steady state, the velocity of the falling object is indeed the same as the terminal velocity that we have just discovered.

9.4 Related problems and examples

The example discussed in Section 9.3.2 belongs to a class of problems that share many common features. Generally, this class is represented by linear differential equations of the form

$$\frac{dy}{dt} = a - by, \tag{9.8}$$

with given initial condition $y(0) = y_0$. Properties of this equation were studied in the context of differential calculus in a previous semester. Now, with the same method as we applied to the problem of terminal velocity, we can integrate this equation by separation of variables, writing

$$\frac{dy}{a - by} = dt$$

and proceeding as in the previous example. We arrive at its solution,

$$y(t) = \frac{a}{b} + \left(y_0 - \frac{a}{b}\right)e^{-bt}. \tag{9.9}$$
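As a quick sanity check on the general linear equation $dy/dt = a - by$ with $y(0) = y_0$, the sketch below codes the solution $y(t) = a/b + (y_0 - a/b)e^{-bt}$ and tests three properties: the initial condition, the differential equation itself (via a finite difference), and the approach to $a/b$. The parameter values and function names are hypothetical, picked only for illustration.

```python
import math

def linear_solution(t, a, b, y0):
    """y(t) = a/b + (y0 - a/b) * exp(-b t), solving dy/dt = a - b y, y(0) = y0."""
    steady = a / b
    return steady + (y0 - steady) * math.exp(-b * t)

def ode_residual(t, a, b, y0, h=1e-6):
    """dy/dt - (a - b y) at time t, using a central difference for dy/dt."""
    dydt = (linear_solution(t + h, a, b, y0)
            - linear_solution(t - h, a, b, y0)) / (2.0 * h)
    return dydt - (a - b * linear_solution(t, a, b, y0))

a, b, y0 = 2.0, 0.5, 1.0        # hypothetical values for illustration

print("y(0)      =", linear_solution(0.0, a, b, y0))    # should equal y0
print("residual  =", ode_residual(3.0, a, b, y0))       # should be close to 0
print("y(50)     =", linear_solution(50.0, a, b, y0))   # should be close to a/b
print("a/b       =", a / b)
```

Setting $a = g$, $b = k$ and $y_0 = 0$ in the same function reproduces the falling-object velocity $(g/k)(1 - e^{-kt})$, which is one way to see that the terminal velocity problem is a special case of this class.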
The steps are left as an exercise for the reader. We observe that the steady state of the above equation is obtained by setting

$$\frac{dy}{dt} = a - by = 0, \quad\text{i.e.}\quad y = \frac{a}{b}.$$

Indeed the solution given in the formula (9.9) has the property that as $t$ increases, the exponential term $e^{-bt} \to 0$, so that the second term will vanish and $y \to a/b$. This means that from any initial value, $y$ will approach its steady state level. This equation has a number of important applications that arise in a variety of contexts. A few of these are mentioned below.

9.4.1 Blood alcohol

Let $y(t)$ be the level of alcohol in the blood of an individual during a party. Suppose that the average rate of drinking is gradual and constant (i.e. small sips are continually taken, so that the rate of input of alcohol is approximately constant). Further, assume that alcohol is detoxified in the liver at a rate proportional to its blood level. Then an equation of the form (9.8) would describe the blood level over the period of drinking. The initial condition $y(0) = 0$ would signify the absence of alcohol in the body at the beginning of the evening. The constant $a$ would reflect the rate of intake per unit volume of the individual's blood: larger people take longer to "get drunk" for a given amount consumed [49]. The constant $b$ represents the rate of decay of alcohol per unit time due to degradation by the liver, assumed constant [50]. Indeed, young healthy drinkers have a higher value of $b$ than those who can no longer metabolize alcohol as efficiently.

The solution (9.9) has several features of note: it illustrates the fact that alcohol would increase from the initial level, but only up to a maximum of $a/b$, where the intake and degradation balance. The level $y = a/b$ represents a steady state level (as long as drinking continues). Of course, this level could be toxic to the drinker, and the assumptions of the model may break down in that region!

In the phase of "recovery", after drinking stops, the above differential equation no longer describes the level of blood alcohol. Instead, the process of recovery is represented by

$$\frac{dy}{dt} = -by, \qquad y(0) = y_0. \tag{9.10}$$

The level of blood alcohol then decays exponentially with rate $b$ from its level at the moment that drinking ends. We show this typical pattern in Figure 9.2.

Footnote 49: Of course, we are here assuming a constant intake rate, as though the alcohol is being continually sipped all evening at a uniform rate. Most people do not drink this way, instead quaffing a few large drinks over some hour(s). It is possible to describe this, but we will not do so in this chapter.

Footnote 50: This is also a simplifying assumption, as the rate of metabolism can depend on other factors, such as food intake.

9.4.2 Chemical kinetics

The same ideas apply to any chemical substance that is formed at a constant rate (or supplied at a constant rate) $a$, and then breaks down with rate proportional to its concentration. We then call the constant $b$ the "decay rate constant".
The variable $y(t)$ represents the concentration of chemical at time $t$, and the same differential equation describes this chemical process. As above, given any initial level of the substance, $y(0) = y_0$, the level of $y$ will eventually approach the steady state, $y = a/b$.

Figure 9.2. Blood alcohol level (vertical axis) versus time (horizontal axis). The level of alcohol in the blood is described by Eqn. (9.8) for the first two hours of drinking. At $t = 2$ h, the drinking stopped (so $a = 0$ from then on). The level of alcohol in the blood then decays back to zero, following Eqn. (9.10).

9.5 Emptying a container

In this section we investigate a new problem in which the differential equation that describes a process will be derived from basic physical principles [51]. We will look at the flow of fluid leaking out of a container, and use mass balance to derive a differential equation model. When this is done, we will also use separation of variables to predict how long it takes for the container to be emptied.

We will assume that the container has a small hole at its base. The rate of emptying of the container will depend on the height of fluid in the container above the hole [52]. We can derive a simple differential equation that describes the rate that the height of the fluid changes using the following physical argument.

Footnote 51: This example is particularly instructive. First, it shows precisely how physical laws can be combined to formulate a model, then it shows how the problem can be recast as a single ODE in one dependent variable. Finally, it illustrates a slightly different integral.

Footnote 52: As we have assumed that the hole is at $h = 0$, we henceforth consider the height of the fluid surface, $h(t)$, to be the same as "the height of fluid above the hole".

9.5.1 Conservation of mass

Suppose that the container is a cylinder, with a constant cross-sectional area $A > 0$, as shown in Fig. 9.3. Suppose that the area of the hole is $a$. The rate that fluid leaves through the hole must balance with the rate that fluid decreases in the container. This principle is called mass balance. We will here assume that the density of water is constant, so that we can talk about the net changes in volume (rather than mass).
Figure 9.3. We investigate the time it takes to empty a container full of fluid by deriving a differential equation model and solving it using the methods developed in this chapter. $A$ is the cross-sectional area of the cylindrical tank, $a$ is the cross-sectional area of the hole through which fluid drains, $v(t)$ is the velocity of the fluid, and $h(t)$ is the time dependent height of fluid remaining in the tank (indicated by the dashed line). The volume of fluid leaking out in a time span $\Delta t$ is $a v \Delta t$ (see small cylindrical volume indicated on the right).

We refer to $V(t)$ as the volume of fluid in the container at time $t$. Note that for the cylindrical container, $V(t) = A h(t)$ where $A$ is the cross-sectional area and $h(t)$ is the height of the fluid at time $t$. The rate of change of $V$ is

$$\frac{dV}{dt} = -(\text{rate volume lost as fluid flows out}).$$

(The minus sign indicates that the volume is decreasing.)

At every second, some amount of fluid leaves through the hole. Suppose we are told that the velocity of the water molecules leaving the hole is precisely $v(t)$ in units of cm/sec. (We will find out how to determine this velocity shortly.) Then in one second, those particles have moved a distance $v$ cm/sec $\cdot$ 1 sec $= v$ cm. In fact, all the particles in a little cylinder of length $v$ behind these molecules have also left the hole. Indeed, if we know the area of the hole, we can determine precisely what volume of water exits through the hole each second, namely

$$\text{rate volume lost as fluid flows out} = va.$$

(The small inset in Fig. 9.3 shows a little "cylindrical unit" of fluid that flows out of the hole per second. The area is $a$ and the length of that little volume is $v$. Thus the volume leaving per second is $va$.)

So far we have a relationship between the volume of fluid in the tank and the velocity of the water exiting the hole:

$$\frac{dV}{dt} = -av.$$

Now we need to determine the velocity $v$ of the flow to complete the formulation of the problem.
Footnote: Keeping units in an equation consistent is essential. Checking for unit consistency can help to uncover errors in equations, including differential equations.

9.5.2 Conservation of energy

The fluid "picks up speed" because it has "dropped" by a height $h$ from the top of the fluid surface to the hole. In doing so, a small mass of water has simply exchanged some potential energy (due to its relative height above the hole) for kinetic energy (expressed by how fast it is moving). Potential energy of a small mass of water ($m$) at height $h$ will be $mgh$, whereas when the water flows out of the hole, its kinetic energy is given by $(1/2)mv^2$ where $v$ is velocity. Thus, for these to balance (so that total energy is conserved) we have

$$\frac{1}{2}mv^2 = mgh.$$

(Here $v = v(t)$ is the instantaneous velocity of the fluid leaving the hole and $h = h(t)$ is the height of the water column.) This allows us to relate the velocity of the fluid leaving the hole to the height of the water in the tank, i.e.

$$v^2 = 2gh \quad\Rightarrow\quad v = \sqrt{2gh}. \tag{9.11}$$

In fact, both the height of fluid and its exit velocity are constantly changing as the fluid drains, so we might write $[v(t)]^2 = 2gh(t)$ or $v(t) = \sqrt{2gh(t)}$. We have arrived at this result using an energy balance argument.

9.5.3 Putting it together

We now combine the various pieces of information to arrive at the model, a differential equation for a single (unknown) function of time. There are three time-dependent variables that were discussed above: the volume $V(t)$, the height $h(t)$, and the velocity $v(t)$. It proves convenient to express everything in terms of the height of water in the tank, $h(t)$, though this choice is to some extent arbitrary. Recall that the volume of the water in the tank, $V(t)$, is related to the height of fluid $h(t)$ by

$$V(t) = A h(t),$$

where $A > 0$ is a constant, the cross-sectional area of the tank. We can simplify as follows:

$$\frac{dV}{dt} = \frac{d(Ah(t))}{dt} = A\,\frac{d(h(t))}{dt}.$$

But by previous steps and Eqn. (9.11),

$$\frac{dV}{dt} = -av = -a\sqrt{2gh}.$$

Thus

$$A\,\frac{d(h(t))}{dt} = -a\sqrt{2gh},$$

or simply put,

$$\frac{dh}{dt} = -\frac{a}{A}\sqrt{2gh} = -k\sqrt{h}. \tag{9.12}$$
Here $k$ is a constant that depends on the size and shape of the cylinder and its hole:

$$k = \frac{a}{A}\sqrt{2g}.$$

If the area of the hole is very small relative to the cross-sectional area of the tank, then $k$ will be very small, so that the tank will drain very slowly (i.e. the rate of change in $h$ per unit time will not be large). On a planet with a very high gravitational force, the same tank will drain more quickly. A taller column of water drains faster. Once its height has been reduced, its rate of draining also slows down. We comment that Equation (9.12) has a minus sign, signifying that the height of the fluid decreases.

Using simple principles such as conservation of mass and conservation of energy, we have shown that the height $h(t)$ of water in the tank at time $t$ satisfies the differential equation (9.12). Putting this together with the initial condition (height of fluid $h_0$ at time $t = 0$), we arrive at the initial value problem to solve:

$$\frac{dh}{dt} = -k\sqrt{h}, \qquad h(0) = h_0. \tag{9.13}$$

Clearly, this equation is valid only for $h$ nonnegative. We also remark that Eqn. (9.13) is nonlinear [53] as it involves the variable $h$ in a nonlinear term, $\sqrt{h}$. Next, we use separation of variables to find the height as a function of time.

Footnote 53: In many cases, nonlinear differential equations are more challenging than linear ones. However, examples chosen in this chapter are simple enough that we will not experience the true challenges of such nonlinearities.

9.5.4 Solution by separation of variables

The equation (9.13) shows how height of fluid is related to its rate of change, but we are interested in an explicit formula for fluid height $h$ versus time $t$. To obtain that relationship, we must determine the solution to this differential equation. We do this using separation of variables. (We will also use the initial condition $h(0) = h_0$ that accompanies Eqn. (9.13).) As usual, rewrite the equation in the separated form,

$$\frac{dh}{\sqrt{h}} = -k\,dt.$$

We integrate from $t = 0$ to $t = T$, during which the height of fluid that started as $h_0$ becomes some new height $h(T)$ to be determined:

$$\int_{h_0}^{h(T)} \frac{1}{\sqrt{h}}\,dh = -k\int_0^T dt.$$

Now integrate both sides and simplify:

$$\frac{h^{1/2}}{(1/2)}\,\Bigg|_{h_0}^{h(T)} = -kT,$$

$$2\left(\sqrt{h(T)} - \sqrt{h_0}\right) = -kT,$$

$$\sqrt{h(T)} = -k\frac{T}{2} + \sqrt{h_0},$$

$$h(T) = \left(\sqrt{h_0} - k\frac{T}{2}\right)^2. \tag{9.14}$$

Since this is true for any time $t$, we can also write the form of the solution as

$$h(t) = \left(\sqrt{h_0} - k\frac{t}{2}\right)^2.$$

Eqn. (9.14) predicts fluid height remaining in the tank versus time $t$. In Fig. 9.4 we show some of the "solution curves" [54], i.e. functions of the form Eqn. (9.14) for a variety of initial fluid height values $h_0$. We can also use our results to predict the emptying time, as shown in the next section.

Figure 9.4. Emptying a fluid-filled container: fluid height $h(t)$ (vertical axis) versus time $t$ (horizontal axis). Solution curves obtained by plotting Eqn. (9.14) for three different initial heights of fluid in the container, $h_0 = 2.5, 5, 10$. The parameter $k = 0.4$ in each case. The "V" points to the time it takes the tank to empty starting from a height of $h(0) = 10$.

Footnote 54: As before, this figure was produced by plotting the analytic solution (9.14). A numerical method alternative would use Euler's Method and the spreadsheet to obtain the (approximate) solution directly from the initial value problem (9.13).
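The comparison between the analytic height formula and Euler's Method mentioned in the footnote can be sketched in a few lines of Python instead of a spreadsheet. This is an illustrative sketch under stated assumptions: the function names and step count are invented for the example, the values $k = 0.4$ and $h_0 = 10$ are used for illustration, and the analytic formula is clamped at zero once $\sqrt{h_0} - kt/2$ reaches zero, since the model only makes sense for nonnegative height (the clamping is an addition here, not a statement from the notes).

```python
import math

def height(t, h0, k):
    """Analytic height h(t) = (sqrt(h0) - k t / 2)^2, held at 0 once the tank is empty."""
    root = math.sqrt(h0) - k * t / 2.0
    return root * root if root > 0.0 else 0.0

def euler_height(t_end, h0, k, steps=20_000):
    """Euler's method for dh/dt = -k sqrt(h), h(0) = h0, keeping h nonnegative."""
    dt = t_end / steps
    h = h0
    for _ in range(steps):
        h = max(h - dt * k * math.sqrt(h), 0.0)   # one Euler step, clamped at 0
    return h

h0, k = 10.0, 0.4               # illustrative values (assumed)

for t in (0.0, 5.0, 10.0, 15.0):
    print(f"t={t:5.1f}  analytic={height(t, h0, k):.4f}  "
          f"euler={euler_height(t, h0, k):.4f}")
```

Without the clamp, the squared expression would start to increase again at late times, which is an artifact of the formula rather than a prediction of the model.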
9.5.5 How long will it take the tank to empty?

The tank will be empty when the height of fluid is zero. Setting $h(t) = 0$ in Eqn. (9.14),

$$\left(\sqrt{h_0} - k\frac{t}{2}\right)^2 = 0.$$

Solving this equation for the emptying time $t_e$, we get

$$k\frac{t_e}{2} = \sqrt{h_0} \quad\Rightarrow\quad t_e = \frac{2\sqrt{h_0}}{k}.$$

The time it takes to empty the tank depends on the initial height of water in the tank. Three examples are shown in Figure 9.4 for initial heights of $h_0 = 2.5, 5, 10$. The emptying time depends on the square root of the initial height. This means, for instance, that doubling the height of fluid initially in the tank only increases the time it takes by a factor of $\sqrt{2} \approx 1.41$. Making the hole smaller has a more direct "proportional" effect, since we have found that $k = (a/A)\sqrt{2g}$.

9.6 Density dependent growth

The simple model discussed in Section 9.2 for population growth has an unrealistic feature of unlimited explosive exponential growth. To correct for this unrealistic feature, a common assumption is that the rate of growth is "density dependent". In this section, we consider a revised differential equation that describes such growth, and use the new tools to analyze its predictions. In place of our previous notation we will now use $N$ to represent the size of a population.

9.6.1 The logistic equation

The logistic equation is the simplest density dependent growth equation, and we study its behaviour below. Let $N(t)$ be the size of a population at time $t$. Clearly, we expect $N(t) \ge 0$ for all time $t$, since a population cannot be negative. We will assume that the initial population is known, $N(0) = N_0$. The logistic differential equation states that the rate of change of the population is given by

$$\frac{dN}{dt} = rN\left(\frac{K - N}{K}\right). \tag{9.15}$$

Here $r > 0$ is called the intrinsic growth rate and $K > 0$ is called the carrying capacity. $K$ reflects the size of the population that can be sustained by the given environment. We can understand this equation as a modified growth law in which the "density dependent" term, $r(K - N)/K$, replaces the previous constant net growth rate $k$.
9.6.2 Scaling the equation

The form of the equation can be simplified if we measure the population in units of the carrying capacity, instead of "numbers of individuals", i.e. if we define a new quantity

$$y(t) = \frac{N(t)}{K}.$$

This procedure is called scaling. To see this, consider dividing each side of the logistic equation (9.15) by the constant $K$. Then

$$\frac{1}{K}\frac{dN}{dt} = \frac{r}{K}\,N\left(\frac{K - N}{K}\right).$$

We now group terms conveniently, forming

$$\frac{d\left(\frac{N}{K}\right)}{dt} = r\,\frac{N}{K}\left(1 - \frac{N}{K}\right).$$

Replacing $(N/K)$ by $y$ in each case, we obtain the scaled equation and initial condition given by

$$\frac{dy}{dt} = ry(1 - y), \qquad y(0) = y_0. \tag{9.16}$$

Now the variable $y(t)$ measures population size in "units" of the carrying capacity, and $y_0 = N_0/K$ is the scaled initial population level. Here again is an initial value problem, like Eqn. (9.1). But unlike Eqn. (9.1), and like Eqn. (9.13), the logistic differential equation is nonlinear. That is, the variable $y$ appears in a nonlinear expression (in fact a quadratic) in the equation.

9.6.3 Separation of variables

Here we will solve Eqn. (9.16) by separation of variables. The idea is essentially the same as our previous examples, but is somewhat more involved. To show an alternative method of handling the integration, we will treat both sides as indefinite integrals. Separating the variables leads to

$$\frac{1}{y(1 - y)}\,dy = r\,dt, \qquad \int \frac{1}{y(1 - y)}\,dy = \int r\,dt + K.$$

The integral on the right will lead to $rt + K$, where $K$ is some constant of integration that we need to incorporate since we do not have endpoints on our integrals. (From here on $K$ denotes this constant of integration, not the carrying capacity, which no longer appears in the scaled equation.) But we must work harder to evaluate the integral on the left. We can do so by partial fractions, the technique described in Section 6.6. Details are given in Section 9.6.4.

9.6.4 Application of partial fractions

Let

$$I = \int \frac{1}{y(1 - y)}\,dy.$$
Then for some constants $A$, $B$ we can write

$$I = \int \left(\frac{A}{y} + \frac{B}{1 - y}\right)dy = A\ln|y| - B\ln|1 - y|.$$

(The minus sign in front of $B$ stems from the fact that letting $u = 1 - y$ would lead to $du = -dy$.) We can find $A$, $B$ from the fact that

$$\frac{A}{y} + \frac{B}{1 - y} = \frac{1}{y(1 - y)},$$

so that

$$A(1 - y) + By = 1.$$

This must be true for all $y$, and in particular, substituting in $y = 0$ and $y = 1$ leads to $A = 1$, $B = 1$, so that

$$I = \ln|y| - \ln|1 - y| = \ln\left|\frac{y}{1 - y}\right|.$$

9.6.5 The solution of the logistic equation

We now have to extract the quantity $y$ from the equation

$$\ln\left|\frac{y}{1 - y}\right| = rt + K.$$

That is, we want $y$ as a function of $t$. After exponentiating both sides we need to remove the absolute value. We will now assume that $y$ is initially smaller than 1 (and positive), and show that it remains so. In that case, everything inside the absolute value is positive, and we can write

$$\frac{y(t)}{1 - y(t)} = e^{rt + K} = e^K e^{rt} = Ce^{rt}.$$

In the above step, we have simply renamed the constant $e^K$ by the new name $C$ for simplicity. $C > 0$ is now also an arbitrary constant whose value will be determined from the initial conditions. Indeed, if we substitute $t = 0$ into the most recent equation, we find that

$$\frac{y(0)}{1 - y(0)} = Ce^0 = C,$$

so that

$$C = \frac{y_0}{1 - y_0}.$$

We will use this fact shortly. What remains now is some algebra to isolate the desired function $y(t)$:

$$y(t) = (1 - y(t))\,Ce^{rt},$$

$$y(t)\left(1 + Ce^{rt}\right) = Ce^{rt},$$
$$y(t) = \frac{Ce^{rt}}{1 + Ce^{rt}} = \frac{1}{(1/C)e^{-rt} + 1}.$$

The desired function is now expressed in terms of the time $t$, and the constants $r$, $C$. We can also express it in terms of the initial value of $y$, i.e. $y_0$, by using what we know to be true about the constant $C$, i.e. $C = y_0/(1 - y_0)$. When we do so, we arrive at

$$y(t) = \frac{1}{\frac{1 - y_0}{y_0}e^{-rt} + 1} = \frac{y_0}{y_0 + (1 - y_0)e^{-rt}}. \tag{9.17}$$

Some typical solution curves of the logistic equation are shown in Fig. 9.5.

Figure 9.5. Solutions to the logistic equation: $y(t)$ (vertical axis) versus time $t$ (horizontal axis). Solution curves for $y(t)$ in the scaled form of the logistic equation based on (9.17). We show the predicted behaviour of $y(t)$ as given by Eqn. (9.18) for three different initial conditions, $y_0 = 0.1, 0.25, 0.5$. Note that all solutions approach the value $y = 1$.

9.6.6 What this solution tells us

We have arrived at the function that describes the scaled population as a function of time as predicted by the scaled logistic equation, (9.16). The level of population (in units of the carrying capacity $K$) follows the time-dependent function

$$y(t) = \frac{y_0}{y_0 + (1 - y_0)e^{-rt}}. \tag{9.18}$$
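The scaled logistic solution can be checked numerically in the same spirit as the earlier examples. The Python sketch below is illustrative only: the growth rate $r = 0.5$ and the helper names are assumptions, while the three initial values echo the ones quoted for the solution curves. It confirms that the form written with $C = y_0/(1 - y_0)$ and the form written with $y_0$ agree, that $dy/dt = ry(1 - y)$ holds to finite-difference accuracy, and that $y$ approaches 1.

```python
import math

def logistic(t, y0, r):
    """Scaled logistic solution y(t) = y0 / (y0 + (1 - y0) * exp(-r t))."""
    return y0 / (y0 + (1.0 - y0) * math.exp(-r * t))

def logistic_via_C(t, y0, r):
    """Same solution written as C e^{rt} / (1 + C e^{rt}) with C = y0 / (1 - y0)."""
    C = y0 / (1.0 - y0)
    return C * math.exp(r * t) / (1.0 + C * math.exp(r * t))

def ode_residual(t, y0, r, h=1e-6):
    """dy/dt - r y (1 - y), with dy/dt estimated by a central difference."""
    dydt = (logistic(t + h, y0, r) - logistic(t - h, y0, r)) / (2.0 * h)
    y = logistic(t, y0, r)
    return dydt - r * y * (1.0 - y)

r = 0.5                          # hypothetical growth rate for illustration

for y0 in (0.1, 0.25, 0.5):
    print(f"y0={y0:4.2f}  y(5)={logistic(5.0, y0, r):.5f}  "
          f"via C={logistic_via_C(5.0, y0, r):.5f}  "
          f"residual={ode_residual(5.0, y0, r):.1e}  "
          f"y(60)={logistic(60.0, y0, r):.6f}")
```

All three starting values should give $y(60)$ very close to 1, in line with the observation that every solution curve levels off at the scaled carrying capacity.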
We can convert this result to an equivalent expression for the unscaled total population $N(t)$ by recalling that $y(t) = N(t)/K$. Substituting this for $y(t)$, and noting that $y_0 = N_0/K$, leads to

$$N(t) = \frac{K N_0}{N_0 + (K - N_0)e^{-rt}}. \tag{9.19}$$

It is left as an exercise for the reader to check this claim.

Now recall that $r > 0$. This means that $e^{-rt}$ is a decreasing function of time. Therefore, (9.18) implies that, after a long time, the term $e^{-rt}$ in the denominator will be negligibly small, and so

$$y(t) \to \frac{y_0}{y_0} = 1,$$

so that $y$ will approach the value 1. This means that $(N/K) \to 1$ or simply $N(t) \to K$. The population will thus settle into a constant level, i.e. a steady state, at which no further change will occur.

As an aside, we observe that this too could have been predicted directly from the differential equation. By setting $dy/dt = 0$, we find that

$$0 = ry(1 - y),$$

which suggests that $y = 1$ is a steady state. (This is also true for the less interesting case of no population, i.e. $y = 0$ is also a steady state.) Similarly, this could have been found by setting the derivative to zero in Eqn. (9.15), the original, unscaled logistic differential equation. Doing so leads to

$$\frac{dN}{dt} = 0 \quad\Rightarrow\quad rN\left(\frac{K - N}{K}\right) = 0.$$

If $r > 0$, the only values of $N$ satisfying this steady state equation are $N = 0$ or $N = K$. This implies that both $N = 0$ and $N = K$ are steady states. The former is not too interesting. It states the obvious fact that if there is no population, then there can be no population growth. The latter reflects that $N = K$, the carrying capacity, is the population size that will be sustained by the environment.

In summary, we have shown that the behaviour of the logistic equation for population growth is more realistic than the simpler exponential growth we studied earlier. We saw in Figure 9.5 that a small population will grow, but only up to some constant level (the carrying capacity). Integration, and in particular the use of partial fractions, allowed us to make a full prediction of the behaviour of the population level as a function of time, given by Eqn. (9.19).

9.7 Extensions and other population models: the "Law of Mortality"

There are many variants of the logistic model that are used to investigate the growth or mortality of a population. Here we extend tools to another example, the gradual decline of a group of individuals born at the same time. Such a group is called a "cohort" [55].

In 1825, Gompertz suggested that the rate of mortality, $m$, would depend on the age of the individuals. Because we consider a group of people who were born at the same time, we can trade "age" for "time". Essentially, Gompertz assumed that mortality is not constant: it is low at first, and increases as individuals age. Gompertz argued that mortality increases exponentially. This turns out to be equivalent to the assumption that the logarithm of mortality increases linearly with time [56]. It is easy to see that these two statements are equivalent. Suppose we assume that for some constants $A > 0$, $\mu > 0$,

$$\ln(m(t)) = A + \mu t. \tag{9.20}$$

Then Eqn. (9.20) means that

$$m(t) = e^{A + \mu t} = e^A e^{\mu t}.$$

Since $A$ is constant, so is $e^A$. For simplicity we define $m_0 = e^A$. ($m_0 = m(0)$ is the so-called "birth mortality", i.e. the value of $m$ at age 0.) Thus, the time-dependent mortality is

$$m = m(t) = m_0 e^{\mu t}. \tag{9.21}$$

Figure 9.6. Log mortality $\ln(m)$ (vertical axis, intercept $A$) versus age $t$ (horizontal axis). In the Gompertz Law of Mortality, it is assumed that the log of mortality increases linearly with time, as depicted by Eqn. (9.20) and by the solid curve in this diagram. Here the slope of $\ln(m)$ versus time (or age) is $\mu$. For real populations, the mortality looks more like the dashed curve.

Footnote 55: This section was formulated with help from Lu Fan.

Footnote 56: In actual fact, this is likely true for some range of ages. Infant mortality is generally higher than mortality for young children, whereas mortality levels off or even decreases slightly for those oldest old who have survived past the average lifespan.

9.7.1 Aging and survival curves for a cohort

We now study a population model having Gompertz mortality, together with the following additional assumptions.
1. All individuals are assumed to be identical.
2. There is "natural" mortality, but no other type of removal. This means we ignore the mortality caused by epidemics, by violence and by wars.
3. We consider a single cohort, and assume that no new individuals are introduced (e.g. by immigration) [57].

Footnote 57: Note that new births would contribute to other cohorts.

9.7.2 Gompertz Model

We will now study the size of a "cohort", i.e. a group of people who were born in the same year. We will denote by $N(t)$ the number of people in this group who are alive at time $t$, where $t$ is time since birth, i.e. age. Let $N(0) = N_0$ be the initial number of individuals in the cohort. That number changes with time due to mortality. Indeed,

The rate of change of cohort size = −[number of deaths per unit time] = −[mortality rate] · [cohort size].

Translating to mathematical notation, we arrive at the differential equation

$$\frac{dN(t)}{dt} = -m(t)N(t),$$

and using information about the size of the cohort at birth leads to the initial condition, $N(0) = N_0$. Together, this leads to the initial value problem

$$\frac{dN(t)}{dt} = -m(t)N(t), \qquad N(0) = N_0.$$

Note the similarity to Eqn. (9.1), but now mortality is time-dependent. All the people in the cohort were born at time (age) $t = 0$, and there were $N_0$ of them at that time. In the Problem set, we apply separation of variables and integrate over the time interval $[0, T]$ to show that the remaining population at age $t$ is

$$N(t) = N_0\, e^{-\frac{m_0}{\mu}\left(e^{\mu t} - 1\right)}.$$

9.8 Summary

In this chapter, we used integration methods to find the analytical solutions to a variety of differential equations where initial values were prescribed. We investigated a number of population growth models:

1. Exponential growth, given by $\frac{dy}{dt} = ky$, with initial population level $y(0) = y_0$, was investigated (Eqn. (9.1)). This model had an unrealistic feature that growth is unlimited.
2. The Logistic equation $\frac{dN}{dt} = rN\left(\frac{K - N}{K}\right)$ was analyzed (Eqn. (9.15)), showing that density-dependent growth can correct for the above unrealistic feature.
3. The Gompertz equation, $\frac{dN(t)}{dt} = -m(t)N(t)$, was solved to understand how age-dependent mortality affects a cohort of individuals.

In each of these cases, we used separation of variables to "integrate" the differential equation, and predict the population as a function of time.

We also investigated several other physical models in this chapter, including the velocity of a falling object subject to drag force. This led us to study a differential equation of the form $\frac{dy}{dt} = a - by$. By slight reinterpretation of terms in this equation, we can use results to understand chemical kinetics and blood alcohol levels, as well as a host of other scientific applications.

Section 9.5, the "centerpiece" of this chapter, illustrated the detailed steps that go into the formulation of a differential equation model for flow of liquid out of a container. Here we saw how conservation statements and simplifying assumptions are interpreted together, to arrive at a differential equation model. Such ideas occur in many scientific problems, in chemistry, physics, and biology.
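As a closing illustration for this chapter, the Gompertz survival formula $N(t) = N_0 \exp\!\left(-\frac{m_0}{\mu}(e^{\mu t} - 1)\right)$ quoted in Section 9.7.2 can be checked against its differential equation $dN/dt = -m_0 e^{\mu t} N$ with a short Python sketch. The parameter values ($N_0 = 1000$, $m_0 = 0.001$, $\mu = 0.08$) and function names are hypothetical, chosen only so that the example runs; they are not taken from the notes or from data.

```python
import math

def mortality(t, m0, mu):
    """Gompertz mortality m(t) = m0 * exp(mu t)."""
    return m0 * math.exp(mu * t)

def cohort_size(t, N0, m0, mu):
    """N(t) = N0 * exp(-(m0/mu) * (exp(mu t) - 1)), the surviving cohort at age t."""
    return N0 * math.exp(-(m0 / mu) * (math.exp(mu * t) - 1.0))

def ode_residual(t, N0, m0, mu, h=1e-5):
    """dN/dt + m(t) N(t), with dN/dt estimated by a central difference."""
    dNdt = (cohort_size(t + h, N0, m0, mu)
            - cohort_size(t - h, N0, m0, mu)) / (2.0 * h)
    return dNdt + mortality(t, m0, mu) * cohort_size(t, N0, m0, mu)

N0, m0, mu = 1000.0, 0.001, 0.08   # hypothetical values for illustration

for age in (0.0, 25.0, 50.0, 75.0):
    print(f"age={age:5.1f}  N={cohort_size(age, N0, m0, mu):9.3f}  "
          f"residual={ode_residual(age, N0, m0, mu):.1e}")
```

The residual column should be close to zero at every age, and $N(0)$ should return $N_0$. Because mortality itself grows exponentially, the cohort size falls faster than a simple exponential decay would predict at late ages.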
Chapter 10

Infinite series, improper integrals, and Taylor series

10.1 Introduction

This chapter has several important and challenging goals. The first of these is to understand how concepts that were discussed for finite series and integrals can be meaningfully extended to infinite series and improper integrals, i.e. integrals of functions over an infinite domain. In this part of the discussion, we will find that the notion of convergence and divergence will be important.

A second theme will be that of approximation of functions in terms of power series, also called Taylor series. Such series can be described informally as infinite polynomials (i.e. polynomials containing infinitely many terms). Understanding when these objects are meaningful is also related to the issue of convergence, so we use the background assembled in the first part of the chapter to address such concepts arising in the second part.

Figure 10.1. The function $y = f(x)$ (solid heavy curve) is shown together with its linear approximation (LA, dashed line) at the point $x_0$, and a better “higher order” approximation (HOA, thin solid curve). Notice that this better approximation stays closer to the graph of the function near $x_0$. In this chapter, we discuss how such better approximations can be obtained.

The theme of approximation has appeared often in our calculus course. In a previous
semester, we discussed a linear approximation to a function. The idea was to approximate the value of the function close to a point on its graph using a straight line (the tangent line). We noted in doing so that the approximation was good only close to the point of tangency. Further away, the graph of the function curves away from that straight line. This leads naturally to the question: can we do better in making this approximation if we include other terms to describe this “curving away”? Here we extend such linear approximation methods. Our goal is to increase the accuracy of the linear approximation by including higher order terms (quadratic, cubic, etc), i.e. to find a polynomial that approximates the given function. This idea forms an important goal in this chapter.

We first review the idea of series introduced in Chapter 1.

10.2 Convergence and divergence of series

Recall the geometric series discussed in Section 1.6, i.e.
\[ S_n = 1 + r + r^2 + \dots + r^n = \sum_{k=0}^{n} r^k . \]
The sum of a finite geometric series is
\[ S_n = \frac{1 - r^{n+1}}{1 - r} . \tag{10.1} \]
We also review definitions discussed in Section 1.7.

Definition: Convergence of infinite series
An infinite series that has a finite sum is said to be convergent. Otherwise it is divergent.

Definition: Partial sums and convergence
Suppose that $S$ is an (infinite) series whose terms are $a_k$. Then the partial sums, $S_n$, of this series are
\[ S_n = \sum_{k=0}^{n} a_k . \]
We say that the sum of the infinite series is $S$, and write
\[ S = \sum_{k=0}^{\infty} a_k , \]
provided that
\[ S = \lim_{n\to\infty} S_n = \lim_{n\to\infty} \sum_{k=0}^{n} a_k . \]
That is, we consider the infinite series as the limit of partial sums $S_n$ as the number of terms $n$ is increased. If this limit exists, we say that the infinite series converges[58] to $S$. This leads to the following conclusion:

[58] If the limit does not exist, we say that the series diverges.
The sum of an infinite geometric series,
\[ S = 1 + r + r^2 + \dots + r^k + \dots = \sum_{k=0}^{\infty} r^k = \frac{1}{1 - r}, \quad \text{provided } |r| < 1. \tag{10.2} \]
If this inequality is not satisfied, then we say that this sum does not exist (meaning that it is not finite).

It is important to remember that an infinite series, i.e. a sum with infinitely many terms added up, can exhibit either one of these two very different behaviours. It may converge in some cases, as the first example shows, or diverge (fail to converge) in other cases. We will see examples of each of these trends again. It is essential to be able to distinguish the two. Divergent series (or series that diverge under certain conditions) must be handled with particular care, for otherwise, we may find contradictions or “seemingly reasonable” calculations that have meaningless results.

We can think of convergence or divergence of series using a geometric analogy. If we start on the number line at the origin and take a sequence of steps $\{a_0, a_1, a_2, \dots, a_k, \dots\}$, we can think of $S = \sum_{k=0}^{\infty} a_k$ as the total distance we have travelled. $S$ converges if that distance remains finite and if we approach some fixed number.

Figure 10.2. An informal schematic illustrating the concept of convergence and divergence of infinite series. Here we show only a few terms of the infinite series: from left to right, each step is a term in the series. In the top example (“convergence”), the sum of the steps gets closer and closer to some (finite) value. In the bottom example (“divergence”), the steps lead to an ever increasing total sum.

In order for the sum of ‘infinitely many things’ to add up to a finite number, the terms have to get smaller. But just getting smaller is not, in itself, enough to guarantee convergence. (We will show this later on by considering the harmonic series.) There are rigorous mathematical tests which help determine whether a series converges or not. We discuss some of these tests in Appendix 11.9.
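The two behaviours of the geometric series can be seen directly by computing partial sums. The Python sketch below is only an illustration: the function names and the sample values of $r$ are assumptions made for the example. It sums the terms one by one, checks the result against formula (10.1), and prints partial sums for ratios inside and outside $|r| < 1$.

```python
def geometric_partial_sum(r, n):
    """Return S_n = 1 + r + r**2 + ... + r**n by direct summation."""
    total, term = 0.0, 1.0
    for _ in range(n + 1):
        total += term
        term *= r
    return total

def closed_form(r, n):
    """Formula (10.1): S_n = (1 - r**(n+1)) / (1 - r), valid for r != 1."""
    return (1.0 - r ** (n + 1)) / (1.0 - r)

for r in (0.5, -0.5, 1.5):
    sums = [geometric_partial_sum(r, n) for n in (5, 10, 20, 40)]
    # The limit 1/(1-r) only makes sense when |r| < 1.
    limit = 1.0 / (1.0 - r) if abs(r) < 1 else None
    print(f"r={r}: partial sums {sums}, limit 1/(1-r) = {limit}")
```

For $r = \pm 0.5$ the partial sums settle near $1/(1-r)$, while for $r = 1.5$ they grow without bound, which is the divergent case described above.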
10.3 Improper integrals

We will see that there is a close connection between certain infinite series and improper integrals, i.e. integrals over an infinite domain. We have already encountered an example of an improper integral in Section 3.8.5 and in the context of radioactive decay in Section 8.4. Recall the following definition:

Definition
An improper integral is an integral performed over an infinite domain, e.g.
\[ \int_a^{\infty} f(x)\,dx . \]
The value of such an integral is understood to be a limit, as given in the following definition:
\[ \int_a^{\infty} f(x)\,dx = \lim_{b\to\infty} \int_a^{b} f(x)\,dx , \]
i.e. we evaluate an improper integral by first computing a definite integral over a finite domain $a \le x \le b$, and then taking a limit as the endpoint $b$ moves off to larger and larger values. The definite integral can be interpreted as an area under the graph of the function. The essential question being addressed here is whether that area remains bounded when we include the “infinite tail” of the function (i.e. as the endpoint $b$ moves to larger values.) For some functions (whose values get small enough fast enough) the answer is “yes”.

Definition
When the limit shown above exists, we say that the improper integral converges. Otherwise we say that the improper integral diverges.

With this in mind, we compute a number of classic integrals:

10.3.1 Example: A decaying exponential: convergent improper integral

Here we recall that the improper integral of a decaying exponential converges. (We have seen this earlier, in Section 3.8.5, and again in applications in Sections 4.5 and 8.4.1. Here we recap this important result in the context of our discussion of improper integrals.) Suppose that $r > 0$ and let
\[ I = \int_0^{\infty} e^{-rt}\,dt \equiv \lim_{b\to\infty} \int_0^{b} e^{-rt}\,dt . \]
Then note that $b > 0$ so that
\[ I = \lim_{b\to\infty} \left. -\frac{1}{r} e^{-rt} \right|_0^b = -\frac{1}{r} \lim_{b\to\infty} \left( e^{-rb} - e^{0} \right) = -\frac{1}{r} \left( \lim_{b\to\infty} e^{-rb} - 1 \right) = \frac{1}{r} . \]
We have used the fact that $\lim_{b\to\infty} e^{-rb} = 0$ since (for $r, b > 0$) the exponential function is decreasing with increasing $b$. Thus the limit exists (is finite) and the integral converges. In fact it converges to the value $I = 1/r$.

Figure 10.3. In Sections 10.3.2 and 10.3.3, we consider two functions whose values decrease along the x axis, $f(x) = 1/x$ and $f(x) = 1/x^2$. We show that one, but not the other, encloses a finite (bounded) area over the interval $(1, \infty)$. To do so, we compute an improper integral for each one. The heavy arrow is meant to remind us that we are considering areas over an unbounded domain.

10.3.2 Example: The improper integral of 1/x diverges

We now consider a classic and counterintuitive result, and one of the most important results in this chapter. Consider the function
\[ y = f(x) = \frac{1}{x} . \]
Examining the graph of this function for positive $x$, e.g. for the interval $(1, \infty)$, we know that values decrease to zero as $x$ increases[59]. The function is not only itself bounded, but also falls to arbitrarily small values as $x$ increases. Nevertheless, this is not enough to guarantee that the enclosed area remains finite! We show this in the following calculation.
\[ I = \int_1^{\infty} \frac{1}{x}\,dx = \lim_{b\to\infty} \int_1^{b} \frac{1}{x}\,dx = \lim_{b\to\infty} \ln(x) \Big|_1^b = \lim_{b\to\infty} \left( \ln(b) - \ln(1) \right) \]
\[ I = \lim_{b\to\infty} \ln(b) = \infty \]
The fact that we get an infinite value for this integral follows from the observation that $\ln(b)$ increases without bound as $b$ increases, that is, the limit does not exist (is not finite). Thus the area under the curve $f(x) = 1/x$ over the interval $1 \le x < \infty$ is infinite. We say that the improper integral of $1/x$ diverges (or does not converge). We will use this result again in Section 10.4.1.

[59] We do not choose the interval $(0, \infty)$ because this function is undefined at $x = 0$. We want here to emphasize the behaviour at infinity, not the blow up that occurs close to $x = 0$.
10.3.3 Example: The improper integral of 1/x² converges

Now consider the related function
\[ y = f(x) = \frac{1}{x^2}, \]
and the corresponding integral
\[ I = \int_1^{\infty} \frac{1}{x^2}\,dx . \]
Then
\[ I = \lim_{b\to\infty} \int_1^{b} x^{-2}\,dx = \lim_{b\to\infty} \left( -x^{-1} \right) \Big|_1^b = -\lim_{b\to\infty} \left( \frac{1}{b} - 1 \right) = 1 . \]
Thus, the limit exists, and, in fact, $I = 1$, so, in contrast to the Example 10.3.2, this integral converges.

We observe that the behaviours of the improper integrals of the functions $1/x$ and $1/x^2$ are very different. The former diverges, while the latter converges. The only difference between these functions is the power of $x$. As shown in Figure 10.3, that power affects how rapidly the graph “falls off” to zero as $x$ increases. The function $1/x^2$ decreases much faster than $1/x$. (Consequently $1/x^2$ has a sufficiently “slim” infinite “tail” that the area under its graph does not become infinite; not an easy concept to digest!) This observation leads us to wonder what power $p$ is needed to make the improper integral of a function $1/x^p$ converge. We answer this question below.

10.3.4 When does the integral of 1/x^p converge?

Here we consider an arbitrary power, $p$, that can be any real number. We ask when the corresponding improper integral converges or diverges. Let
\[ I = \int_1^{\infty} \frac{1}{x^p}\,dx . \]
For $p = 1$ we have already established that this integral diverges (Section 10.3.2), and for $p = 2$ we have seen that it is convergent (Section 10.3.3). By a similar calculation, we find that
\[ I = \lim_{b\to\infty} \left. \frac{x^{1-p}}{1-p} \right|_1^b = \lim_{b\to\infty} \frac{1}{1-p} \left( b^{1-p} - 1 \right) . \]
Thus this integral converges provided that the term $b^{1-p}$ does not “blow up” as $b$ increases. For this to be true, we require that the exponent $(1 - p)$ should be negative, i.e. $1 - p < 0$ or $p > 1$. In this case, we have
\[ I = \frac{1}{p-1} . \]

To summarize our result,
\[ \int_1^{\infty} \frac{1}{x^p}\,dx \quad \text{converges if } p > 1, \quad \text{diverges if } p \le 1 . \]
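The role of the power $p$ can be explored numerically by evaluating the truncated integral $\int_1^b x^{-p}\,dx$ for larger and larger $b$. The short Python sketch below uses the antiderivative from the calculation above; the function name and the sample values of $p$ and $b$ are illustrative assumptions.

```python
import math

def truncated_p_integral(p, b):
    """Integral of x**(-p) from 1 to b, using the antiderivative."""
    if p == 1:
        return math.log(b)                       # the 1/x case
    return (b ** (1.0 - p) - 1.0) / (1.0 - p)    # all other powers

for p in (0.5, 1.0, 1.01, 2.0):
    values = [truncated_p_integral(p, b) for b in (1e2, 1e4, 1e8)]
    print(f"p={p}: {values}")
```

For $p = 2$ the values level off near $1/(p-1) = 1$, while for $p = 0.5$ and $p = 1$ they keep growing. For $p = 1.01$ the integral does converge (to 100), but so slowly that even $b = 10^8$ is still far from the limit, so a numerical experiment alone can be misleading near the borderline case.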
Examples: 1/x^p that do or do not converge

1. The integral
\[ \int_1^{\infty} \frac{1}{\sqrt{x}}\,dx \]
diverges. We see this from the following argument: $\sqrt{x} = x^{1/2}$, so $p = \frac{1}{2} < 1$. Thus, by the general result, this integral diverges.
2. The integral
\[ \int_1^{\infty} x^{-1.01}\,dx \]
converges. Here $p = 1.01 > 1$, so the result implies convergence of the integral.

10.3.5 Integral comparison tests

The integrals discussed above can be used to make comparisons that help us to identify when other improper integrals converge or diverge[60]. The following important result establishes how these comparisons work:

Suppose we are given two functions, $f(x)$ and $g(x)$, both continuous on some infinite interval $[a, \infty)$. Suppose, moreover, that, at all points on this interval, the first function is smaller than the second, i.e. $0 \le f(x) \le g(x)$. Then the following conclusions[a] can be made:

1. $\int_a^{\infty} f(x)\,dx \le \int_a^{\infty} g(x)\,dx$. (This means that the area under $f(x)$ is smaller than the area under $g(x)$.)
2. If $\int_a^{\infty} g(x)\,dx$ converges, then $\int_a^{\infty} f(x)\,dx$ converges. (If the larger area is finite, so is the smaller one.)
3. If $\int_a^{\infty} f(x)\,dx$ diverges, then $\int_a^{\infty} g(x)\,dx$ diverges. (If the smaller area is infinite, so is the larger one.)

[a] These statements have to be carefully noted. What is assumed and what is concluded works “one way”. That is, the order “if...then” is important. Reversing that order leads to a common error.

[60] The reader should notice the similarity of these ideas to the comparisons made for infinite series in the Appendix 11.9. This similarity stems from the fact that there is a close connection between series and integrals, a recurring theme in this course.
Example: comparison of improper integrals

We can determine that the integral
\[ \int_1^{\infty} \frac{x}{1 + x^3}\,dx \]
converges by noting that for all $x > 0$
\[ 0 \le \frac{x}{1 + x^3} \le \frac{x}{x^3} = \frac{1}{x^2} . \]
Thus
\[ \int_1^{\infty} \frac{x}{1 + x^3}\,dx \le \int_1^{\infty} \frac{1}{x^2}\,dx . \]
Since the larger integral on the right is known to converge, so does the smaller integral on the left.

10.4 Comparing integrals and series

The convergence of infinite series was discussed earlier, in Section 1.7 and here again in Section 10.2. Many tests for convergence are provided in the Appendix 11.9, and will not be discussed in detail due to time and space constraints. However, an interesting connection exists between convergence of series and integrals. This is the topic we examine here.

Convergence of series and convergence of integrals are related. We can use the convergence/divergence of an integral/series to determine the behaviour of the other. Here we give an example of this type by establishing a connection between the harmonic series and a divergent improper integral.

10.4.1 The harmonic series

The harmonic series is a sum of terms of the form $1/k$ where $k = 1, 2, \dots$. At first appearance, this series might seem to have the desired qualities of a convergent series, simply because the successive terms being added are getting smaller and smaller, but this appearance is deceptive and actually wrong[61].

The harmonic series
\[ \sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{k} + \dots \]
diverges.

We establish that the harmonic series diverges by comparing it to the improper integral of the related function[62],
\[ y = f(x) = \frac{1}{x} . \]

[61] We have already noticed a similar surprise in connection with the improper integral of $1/x$. These two “surprises” are closely related, as we show here using a comparison of the series and the integral.

[62] This function is “related” since for integer values of $x$, the function takes on values that are the same as successive terms in the series, i.e. if $x = k$ is an integer, then $f(x) = f(k) = 1/k$.
In Figure 10. = 1 + + + + . Comparing integrals and series 207 1. by Section 10. .4. k On the other hand..4 we show on one graph a comparison of the area under this curve. and the heights form the sequence 1 1 1 {1. Note that we have purposely shown the stairs arranged so that they are higher than the function.3.0 11. .} 2 3 4 Thus the area of (inﬁnitely many) of these steps can be expressed as the (inﬁnite) harmonic series.0 Figure 10.. x 1 . .10. A= 1·1+1· 1 1 1 1 1 1 + 1 · + 1 · + ..0 <==== the function y=1/x The harmonic series 0. we note that the width of each step is 1. For the area of the staircase. The harmonic series is a sum that corresponds to the area under the staircase shown above. = 2 3 4 2 3 4 ∞ k=1 1 .2. . This is essential in drawing the conclusion that the sum of the series is inﬁnite: It is larger than an area under 1/x that we already know to be inﬁnite.. . .4. the area under the graph of the function y = f (x) = 1/x for 0 ≤ x ≤ ∞ is given by the improper integral ∞ 1 dx. and a staircase area representing the ﬁrst few terms in the harmonic series.0 0.
We have seen previously in Section 10.3.2 that this integral diverges! From Figure 10.4 we see that the areas under the function, $A_f$, and under the staircase, $A_s$, satisfy
\[ 0 < A_f < A_s . \]
Thus, since the smaller of the two (the improper integral) is infinite, so is the larger (the sum of the harmonic series). We have established, using this comparison, that the sum of the harmonic series cannot be finite, so that this series diverges.

Other comparisons: The “p” series

More generally, we can compare series of the form
\[ \sum_{k=1}^{\infty} \frac{1}{k^p} \quad \text{to the integral} \quad \int_1^{\infty} \frac{1}{x^p}\,dx \]
in precisely the same way. This leads to the conclusion that

The “p” series,
\[ \sum_{k=1}^{\infty} \frac{1}{k^p} \quad \text{converges if } p > 1, \quad \text{diverges if } p \le 1 . \]

For example, the series
\[ \sum_{k=1}^{\infty} \frac{1}{k^2} = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \dots \]
converges, since $p = 2 > 1$. Notice, however, that the comparison does not give us a value to which the sum converges. It merely indicates that the series does converge.

10.5 From geometric series to Taylor polynomials

In studying calculus, we explored a variety of functions. Among the most basic are polynomials, i.e. functions such as
\[ p(x) = x^5 + 2x^2 + 3x + 2 . \]
Our introduction to differential calculus started with such functions for a reason: these functions are convenient and simple to handle. We found long ago that it is easy to compute derivatives of polynomials. The same can be said for integrals. One of our first examples, in Section 3.6.1, was the integral of a polynomial. We needed only use a power rule to integrate each term. An additional convenience of polynomials is that “evaluating” the function (i.e. plugging in an $x$ value and determining the corresponding $y$ value) can be done by simple multiplications and additions, i.e. by basic operations easily handled by an ordinary calculator. This is not the case for, say, trigonometric functions, exponential
functions, or for that matter, most other functions we considered[63]. For this reason, being able to approximate a function by a polynomial is an attractive proposition. This forms our main concern in the sections that follow.

We can arrive at connections between several functions and their polynomial approximations by exploiting our familiarity with the geometric series. We use both the results for convergence of the geometric series (from Section 10.2) and the formula for the sum of that series to derive a number of interesting (somewhat haphazard) results[64].

Recall from Sections 1.7.1 and 10.2 that the sum of an infinite geometric series is
\[ S = 1 + r + r^2 + \dots + r^k + \dots = \sum_{k=0}^{\infty} r^k = \frac{1}{1 - r}, \quad \text{provided } |r| < 1. \tag{10.3} \]
To connect this result to a statement about a function, we need a “variable”. Let us consider the behaviour of this series when we vary the quantity $r$. To emphasize that now $r$ is our variable, it will be helpful to change notation by substituting $r = x$ into the above equation, while remembering that the formula in Eqn (10.3) holds only provided $|r| = |x| < 1$.

10.5.1 Example 1: A simple expansion

Substitute the variable $x = r$ into the series (10.3). Then formally, rewriting the above with this substitution leads to the conclusion that
\[ \frac{1}{1 - x} = 1 + x + x^2 + \dots \tag{10.4} \]
We can think of this result as follows: Let
\[ f(x) = \frac{1}{1 - x} \tag{10.5} \]
Then for every $x$ in $-1 < x < 1$, it is true that $f(x)$ can be approximated by terms in the polynomial
\[ p(x) = 1 + x + x^2 + \dots \tag{10.6} \]
In other words, by (10.3), for $|x| < 1$ the two expressions “are the same”, in the sense that the polynomial converges to the value of the function. We refer to this $p(x)$ as an (infinite) Taylor polynomial[65] or simply a Taylor series for the function $f(x)$. The usefulness of this kind of result can be illustrated by a simple example.

Example 10.1 (Using the Taylor Series (10.6) to approximate the function (10.5)) Compute the value of the function $f(x)$ given by Eqn. (10.5) for $x = 0.1$ without using a calculator.

[63] For example, to find the decimal value of $\sin(2.5)$ we would need a scientific calculator. These days the distinction is blurred, since powerful handheld calculators are ubiquitous. Before such devices were available, the ease of evaluating polynomials made them even more important.

[64] We say “haphazard” here because we are not yet at the point of a systematic procedure for computing a Taylor Series. That will be done in Section 10.6. Here we “take what we can get” using simple manipulations of a geometric series.

[65] A Taylor polynomial contains finitely many terms, $n$, whereas a Taylor series has $n \to \infty$.
Solution: Plugging in the value $x = 0.1$ into the function directly leads to $1/(1 - 0.1) = 1/0.9$, whose evaluation with no calculator requires long division[66]. Using the polynomial representation, we have a simpler method:
\[ p(0.1) = 1 + 0.1 + 0.1^2 + \dots = 1 + 0.1 + 0.01 + \dots = 1.11\ldots \]
We provide a few other examples based on substitutions of various sorts using the geometric series as a starting point.

10.5.2 Example 2: Another substitution

We make the substitution $r = -t$; then $|r| < 1$ means that $|-t| = |t| < 1$, so that the formula (10.3) for the sum of a geometric series implies that:
\[ \frac{1}{1 - (-t)} = 1 + (-t) + (-t)^2 + (-t)^3 + \dots \]
\[ \frac{1}{1 + t} = 1 - t + t^2 - t^3 + t^4 - \dots \quad \text{provided } |t| < 1 \]
This means we have produced a series expansion for the function $1/(1 + t)$. We can go farther with this example by a new manipulation, whereby we integrate both sides to arrive at a new function and its expansion, shown next.

10.5.3 Example 3: An expansion for the logarithm

We will use the results of Example 10.5.2, but we follow our substitution by integration. On the left, we integrate the function $f(t) = 1/(1 + t)$ (to arrive at a logarithm type integral) and on the right we integrate the power terms of the expansion. We are permitted to integrate the power series term by term provided that the series converges. This is an important restriction that we emphasize:

Manipulation of infinite series by integration, differentiation, addition, multiplication, or any other “term by term” computation makes sense only so long as the original series converges.

Provided $|t| < 1$, we have that
\[ \int_0^x \frac{1}{1 + t}\,dt = \int_0^x \left( 1 - t + t^2 - t^3 + t^4 - \dots \right) dt \]
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots \]
This procedure has allowed us to find a series representation for a new function, $\ln(1 + x)$.
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^k}{k} . \tag{10.7} \]

[66] This example is slightly “trivial”, in the sense that evaluating the function itself is not very difficult. However, in other cases, we will find that the polynomial expansion is the only way to find the desired value.
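Series (10.7) is easy to test against a library logarithm. The following Python sketch sums the first few terms at an illustrative point inside the interval of convergence ($x = 0.25$ is an assumed sample value, and the function name is hypothetical) and reports the error against `math.log`.

```python
import math

def log1p_series(x, n_terms):
    """Partial sum of series (10.7): sum_{k=1}^{n} (-1)**(k+1) * x**k / k."""
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, n_terms + 1))

x = 0.25
for n in (1, 2, 3, 5, 10):
    approx = log1p_series(x, n)
    print(n, approx, abs(approx - math.log(1 + x)))
```

Because $|x|$ is well below 1, each extra term shrinks the error by roughly a factor of $x$. Repeating the experiment with $x$ closer to 1 shows much slower improvement, consistent with the restriction $|t| < 1$ used in the derivation.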
The formula appended on the right is just a compact notation that represents the pattern of the terms. Recall that in Chapter 1, we have gotten thoroughly familiar with such summation notation[67].

Example 10.2 (Evaluating the logarithm for x = 0.25) An expansion for the logarithm is definitely useful, in the sense that (without a scientific calculator or log tables) it is not possible to easily calculate the value of this function at a given point. For example, for $x = 0.25$, we cannot find $\ln(1 + 0.25) = \ln(1.25)$ using simple operations, whereas the value of the first few terms of the series is computable by simple multiplication, division, and addition ($0.25 - \frac{0.25^2}{2} + \frac{0.25^3}{3} \approx 0.2239$). (A scientific calculator gives $\ln(1.25) \approx 0.2231$, so the approximation produced by the series is relatively good.)

When is the series for $\ln(1 + x)$ in (10.7) expected to converge? Retracing our steps from the beginning of Example 10.5.2 we note that the value of $t$ is not permitted to leave the interval $|t| < 1$ so we need also $|x| < 1$ in the integration step[68]. We certainly cannot expect the series for $\ln(1 + x)$ to converge when $|x| > 1$. Indeed, for $x = -1$, we have $\ln(1 + x) = \ln(0)$ which is undefined. Also note that for $x = -1$ the right hand side of (10.7) becomes
\[ -\left( 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots \right) . \]
This is the recognizable harmonic series (multiplied by $-1$). But we already know from Section 10.4.1 that the harmonic series diverges. Thus, we must avoid $x = -1$, since the expansion will not converge there, and neither is the function defined. This example illustrates that outside the interval of convergence, the series and the function become “meaningless”.

Example 10.3 (An expansion for ln(2)) Strictly speaking, our analysis does not predict what happens if we substitute $x = 1$ into the expansion of the function found in Section 10.5.3, because this value of $x$ is outside of the permitted range $-1 < x < 1$ in which the Taylor series can be guaranteed to converge. It takes some deeper mathematics (Abel’s theorem) to prove that the result of this substitution actually makes sense, and converges, i.e. that
\[ \ln(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots \]
We state without proof here that the alternating harmonic series converges to $\ln(2)$.

10.5.4 Example 4: An expansion for arctan

Suppose we make the substitution $r = -t^2$ into the geometric series formula, and recall that we need $|r| < 1$ for convergence. Then
\[ \frac{1}{1 - (-t^2)} = 1 + (-t^2) + (-t^2)^2 + (-t^2)^3 + \dots \]

[67] The summation notation is not crucial and should certainly not be memorized. We are usually interested only in the first few terms of such a series in any approximation of practical value.

[68] Strictly speaking, we should have ensured that we are inside this interval of convergence before we computed the last example.
\[ \frac{1}{1 + t^2} = 1 - t^2 + t^4 - t^6 + t^8 - \dots = \sum_{n=0}^{\infty} (-1)^n t^{2n} \]
This series will converge provided $|t| < 1$. Now integrate both sides, and recall that the antiderivative of the function $1/(1 + t^2)$ is $\arctan(t)$. Then
\[ \int_0^x \frac{1}{1 + t^2}\,dt = \int_0^x \left( 1 - t^2 + t^4 - t^6 + t^8 - \dots \right) dt \]
\[ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{x^{2k-1}}{2k - 1} . \tag{10.8} \]

Example 10.4 (An expansion for π) For a particular application of this expansion, consider plugging in $x = 1$ into Equation (10.8). Then
\[ \arctan(1) = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots \]
But $\arctan(1) = \pi/4$. Thus we have found a way of computing the irrational number $\pi$, namely
\[ \pi = 4 \left( 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots \right) = 4 \sum_{k=1}^{\infty} (-1)^{k+1} \frac{1}{2k - 1} . \]
While it is true that this series converges, the convergence is slow. (This can be seen by adding up the first 100 or 1000 terms of this series with a spreadsheet.) This means that it is not practical to use such a series as an approximation for $\pi$. (There are other series that converge to $\pi$ very rapidly that are used in any practical application.)

10.6 Taylor Series: a systematic approach

In Section 10.5, we found a variety of Taylor series expansions directly from the formula for a geometric series. Here we ask how such Taylor series can be constructed more systematically, if we are given a function that we want to approximate[69].

Suppose we have a function which we want to represent by a power series,
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots = \sum_{k=0}^{\infty} a_k x^k . \]
Here we will use the function to directly determine the coefficients $a_k$. To determine $a_0$, let $x = 0$ and note that
\[ f(0) = a_0 + a_1 \cdot 0 + a_2 \cdot 0^2 + a_3 \cdot 0^3 + \dots = a_0 . \]
We conclude that $a_0 = f(0)$.

[69] The development of this section was motivated by online notes by David Austin.
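Looking back at Example 10.4, the slow convergence of the series for $\pi$ can be seen without a spreadsheet. The Python sketch below (the function name is an assumption for illustration) adds up the first $n$ terms of $4\left(1 - \frac13 + \frac15 - \dots\right)$ and prints the distance from `math.pi`.

```python
import math

def leibniz_pi(n_terms):
    """4 * sum_{k=1}^{n} (-1)**(k+1) / (2k - 1), the series from Example 10.4."""
    return 4.0 * sum((-1) ** (k + 1) / (2 * k - 1) for k in range(1, n_terms + 1))

for n in (10, 100, 1000, 10_000):
    estimate = leibniz_pi(n)
    print(n, estimate, abs(estimate - math.pi))
```

The error shrinks only about in proportion to $1/n$, so each additional correct decimal digit costs roughly ten times as many terms.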
By differentiating both sides we find the following:
\[ f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \dots + k a_k x^{k-1} + \dots \]
\[ f''(x) = 2a_2 + 2 \cdot 3\, a_3 x + \dots + (k-1)k\, a_k x^{k-2} + \dots \]
\[ f'''(x) = 2 \cdot 3\, a_3 + \dots + (k-2)(k-1)k\, a_k x^{k-3} + \dots \]
\[ f^{(k)}(x) = 1 \cdot 2 \cdot 3 \cdot 4 \cdots k\, a_k + \dots \]
Here we have used the notation $f^{(k)}(x)$ to denote the k’th derivative of the function. Now evaluate each of the above derivatives at $x = 0$. Then
\[ f'(0) = a_1 \;\Rightarrow\; a_1 = f'(0) \]
\[ f''(0) = 2a_2 \;\Rightarrow\; a_2 = \frac{f''(0)}{2} \]
\[ f'''(0) = 2 \cdot 3\, a_3 \;\Rightarrow\; a_3 = \frac{f'''(0)}{2 \cdot 3} \]
\[ f^{(k)}(0) = k!\, a_k \;\Rightarrow\; a_k = \frac{f^{(k)}(0)}{k!} \]
This gives us a recipe for calculating all coefficients $a_k$. This means that if we can compute all the derivatives of the function $f(x)$, then we know the coefficients of the Taylor series as well. Because we have evaluated all the coefficients by the substitution $x = 0$, we say that the resulting power series is the Taylor series of the function about $x = 0$.

10.6.1 Taylor series for the exponential function, e^x

Consider the function $f(x) = e^x$. All the derivatives of this function are equal to $e^x$. In particular,
\[ f^{(k)}(x) = e^x \;\Rightarrow\; f^{(k)}(0) = 1 . \]
So that the coefficients of the Taylor series are
\[ a_k = \frac{f^{(k)}(0)}{k!} = \frac{1}{k!} . \]
Therefore the Taylor series for $e^x$ about $x = 0$ is
\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_k x^k + \dots = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \dots + \frac{x^k}{k!} + \dots = \sum_{k=0}^{\infty} \frac{x^k}{k!} \]
This is a very interesting series. We state here without proof that this series converges for all values of $x$. Further, the function defined by the series is in fact equal to $e^x$, that is,
\[ e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \dots = \sum_{k=0}^{\infty} \frac{x^k}{k!} \]
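The recipe $a_k = f^{(k)}(0)/k!$ with $f^{(k)}(0) = 1$ translates into a few lines of code. The Python sketch below (function name and sample points are assumptions for illustration) builds each term $x^k/k!$ from the previous one and compares the partial sums with `math.exp`.

```python
import math

def exp_taylor(x, n):
    """Partial sum 1 + x + x**2/2! + ... + x**n/n! of the series for e**x."""
    total, term = 0.0, 1.0          # term holds x**k / k!
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)         # next term: x**(k+1) / (k+1)!
    return total

for x in (1.0, -2.0, 5.0):
    print(x, [exp_taylor(x, n) for n in (2, 5, 10, 20)], math.exp(x))
```

The partial sums approach $e^x$ at every sample point, although more terms are needed the further $x$ is from 0, in line with the statement that the series converges for all $x$.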
The implication is that the function $e^x$ is completely determined (for all $x$ values) by its behaviour (i.e. derivatives of all orders) at $x = 0$. In other words, the value of the function at $x = 1{,}000{,}000$ is determined by the behaviour of the function around $x = 0$. This means that $e^x$ is a very special function with superior “predictable features”. If a function $f(x)$ agrees with its Taylor polynomial on a region $(-a, a)$, as was the case here, we say that $f$ is analytic on this region. It is known that $e^x$ is analytic for all $x$.

We can use the results of this example to establish the fact that the exponential function grows “faster” than any power function $x^n$. That is the same as saying that the ratio of $e^x$ to $x^n$ (for any power $n$) grows without bound as $x$ increases. We leave this as an exercise for the reader.

We can also easily obtain a Taylor series expansion for functions related to $e^x$, without assembling the derivatives. We start with the result that
\[ e^u = 1 + u + \frac{u^2}{2} + \frac{u^3}{6} + \dots = \sum_{k=0}^{\infty} \frac{u^k}{k!} \]
Then, for example, the substitution $u = x^2$ leads to
\[ e^{x^2} = 1 + x^2 + \frac{(x^2)^2}{2} + \frac{(x^2)^3}{6} + \dots = \sum_{k=0}^{\infty} \frac{(x^2)^k}{k!} \]

10.6.2 Taylor series of trigonometric functions

In this example we determine the Taylor series for the sine function. The function and its derivatives are
\[ f(x) = \sin x, \quad f'(x) = \cos x, \quad f''(x) = -\sin x, \quad f'''(x) = -\cos x, \quad f^{(4)}(x) = \sin x, \dots \]
After this, the cycle repeats. This means that
\[ f(0) = 0, \quad f'(0) = 1, \quad f''(0) = 0, \quad f'''(0) = -1, \dots \]
and so on in a cyclic fashion. In other words,
\[ a_0 = 0, \quad a_1 = 1, \quad a_2 = 0, \quad a_3 = -\frac{1}{3!}, \quad a_4 = 0, \quad a_5 = \frac{1}{5!}, \dots \]
Thus,
\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!} \]
We state here without proof that the function $\sin(x)$ is analytic, so that the expansion converges to the function for all $x$.

It is instructive to demonstrate how successive terms in a Taylor series expansion lead to approximations that improve. Doing this kind of thing will be the subject of the last computer laboratory exercise in this course.
Here we demonstrate this idea with the expansion for the function $\sin(x)$ that we just obtained. To see this, consider the sequence of polynomials
\[ T_1(x) = x, \]
\[ T_2(x) = x - \frac{x^3}{3!}, \]
\[ T_3(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}, \]
\[ T_4(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} . \]
Then these polynomials provide a better and better approximation to the function $\sin(x)$ close to $x = 0$. The first of these is just a linear (or tangent line) approximation that we had studied long ago. The second improves this by adding a cubic term, etc. Figure 10.5 illustrates how the first few Taylor polynomials approximate the function $\sin(x)$ near $x = 0$. Observe that as we keep more terms, $n$, in the polynomial $T_n(x)$, the approximating curve “hugs” the graph of $\sin(x)$ over a longer and longer range. The student will be asked to use the spreadsheet, together with some calculations as done in this section, to produce a composite graph similar to Fig. 10.5 for some other function.

Figure 10.5. An approximation of the function $y = \sin(x)$ by successive Taylor polynomials, $T_1, T_2, T_3, T_4$. The higher Taylor polynomials do a better job of approximating the function on a larger interval about $x = 0$.
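The polynomials $T_1, \dots, T_4$ defined above can be evaluated in a few lines of Python, as an alternative to the spreadsheet. In this sketch the function name and the sample point $x = 2$ are assumptions chosen for illustration; `math.sin` supplies the reference value.

```python
import math

def sin_taylor(x, n):
    """T_n(x): the first n nonzero terms of the sine series, as defined above."""
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(n))

x = 2.0
for n in (1, 2, 3, 4):
    t = sin_taylor(x, n)
    print(f"T{n}({x}) = {t:.4f}, error = {abs(t - math.sin(x)):.4f}")
```

Changing `x` to a value further from 0 shows that more terms are needed before the error becomes small, which matches the way the curves in Figure 10.5 peel away from $\sin(x)$ one after another.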
Example 10.5 (The error in successive approximations) For a given value of $x$ close to the base point (at $x = 0$), the error in the approximation between the polynomials and the function is the vertical distance between the graphs of the polynomial and the function $\sin(x)$ (shown in black). For example, at $x = 2$ radians, $\sin(2) = 0.9093$ (as found on a scientific calculator). The approximations are: $T_1(2) = 2$, which is very inaccurate, $T_2(2) = 2 - 2^3/3! \approx 0.667$, which is too small, $T_3(2) \approx 0.9333$, which is much closer, and $T_4(2) \approx 0.9079$, which is closer still. In general, we can approximate the size of the error using the next term that would occur in the polynomial if we kept a higher order expansion. The details of estimating such errors are omitted from our discussion.

We also note that all polynomials that approximate $\sin(x)$ contain only odd powers of $x$. This stems from the fact that $\sin(x)$ is an odd function, i.e. its graph is symmetric to rotation about the origin, a concept we discussed in an earlier term.

The Taylor series for $\cos(x)$ could be found by a similar sequence of steps. But in this case, this is unnecessary: We already know the expansion for $\sin(x)$, so we can find the Taylor series for $\cos(x)$ by simple differentiation term by term. (Since $\sin(x)$ is analytic, this is permitted for all $x$.) We leave as an exercise for the reader to show that
\[ \cos(x) = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!} \]
Since $\cos(x)$ has symmetry properties of an even function, we find that its Taylor series is composed of even powers of $x$ only.

10.7 Application of Taylor series

In this section we illustrate some of the applications of Taylor series to problems that may be difficult to solve using other conventional methods. Some functions do not have an antiderivative that can be expressed in terms of other simple functions. Integrating these functions can be a problem, as we cannot use the Fundamental Theorem of Calculus. In some cases, we can approximate the value of the definite integral using a Taylor series. We show this in Section 10.7.1.

Another application of Taylor series is to compute an approximate solution to a differential equation. We provide one example of that sort in Section 10.7.2 and another in Appendix 11.11.

10.7.1 Example 1: using a Taylor series to evaluate an integral

Evaluate the definite integral
\[ \int_0^1 \sin(x^2)\,dx . \]
A simple substitution (e.g. $u = x^2$) will not work here, and we cannot find an antiderivative. Here is how we might approach the problem using Taylor series: We know that the
series expansion for sin(t) is
\[ \sin t = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \frac{t^7}{7!} + \dots \]
Substituting $t = x^2$, we have
\[ \sin(x^2) = x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \frac{x^{14}}{7!} + \dots \]
In spite of the fact that we cannot antidifferentiate the function, we can antidifferentiate the Taylor series, just as we would a polynomial:
\[ \int_0^1 \sin(x^2)\, dx = \int_0^1 \left( x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \frac{x^{14}}{7!} + \dots \right) dx \]
\[ = \left[ \frac{x^3}{3} - \frac{x^7}{7 \cdot 3!} + \frac{x^{11}}{11 \cdot 5!} - \frac{x^{15}}{15 \cdot 7!} + \dots \right]_0^1 \]
\[ = \frac{1}{3} - \frac{1}{7 \cdot 3!} + \frac{1}{11 \cdot 5!} - \frac{1}{15 \cdot 7!} + \dots \]
This is an alternating series so we know that it converges. If we add up the first four terms, the pattern becomes clear: the series converges to 0.31026.

10.7.2 Example 2: Series solution of a differential equation

We are already familiar with the differential equation and initial condition that describes unlimited exponential growth,
\[ \frac{dy}{dx} = y, \qquad y(0) = 1. \]
Indeed, from previous work, we know that the solution of this differential equation and initial condition is $y(x) = e^x$, but we will pretend that we do not know this fact in illustrating the usefulness of Taylor series. In some cases, where separation of variables does not work, this option would have great practical value.

Let us express the "unknown" solution to the differential equation as
\[ y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \dots \]
Our task is to determine values for the coefficients $a_i$. Since this function satisfies the condition y(0) = 1, we must have $y(0) = a_0 = 1$. Differentiating this power series leads to
\[ \frac{dy}{dx} = a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + \dots \]
But according to the differential equation, $\frac{dy}{dx} = y$. Thus, it must be true that the two Taylor series match, i.e.
\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \dots = a_1 + 2a_2 x + 3a_3 x^2 + 4a_4 x^3 + \dots \]
This equality holds for all values of x. This can only happen if the coefficients of like terms are the same, i.e. if the constant terms on either side of the equation are equal, if the terms of the form $Cx^2$ on either side are equal, and so on for all powers of x. Equating coefficients, we obtain:
\[ a_0 = a_1 = 1 \;\Rightarrow\; a_1 = 1, \]
\[ a_1 = 2a_2 \;\Rightarrow\; a_2 = \frac{a_1}{2} = \frac{1}{2}, \]
\[ a_2 = 3a_3 \;\Rightarrow\; a_3 = \frac{a_2}{3} = \frac{1}{2 \cdot 3}, \]
\[ a_3 = 4a_4 \;\Rightarrow\; a_4 = \frac{a_3}{4} = \frac{1}{2 \cdot 3 \cdot 4}, \]
\[ a_{n-1} = n a_n \;\Rightarrow\; a_n = \frac{a_{n-1}}{n} = \frac{1}{1 \cdot 2 \cdot 3 \cdots n} = \frac{1}{n!}. \]
This means that
\[ y = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \dots = e^x, \]
which, as we have seen, is the expansion for the exponential function. This agrees with the solution we have been expecting. In the example here shown, we would hardly need to use series to arrive at the right conclusion, but in the next example, we would not find it as easy to discover the solution by other techniques discussed previously. We provide an example of a more complicated differential equation and its series solution in Appendix 11.

10.8 Summary

The main points of this chapter can be summarized as follows:

1. We reviewed the definition of an improper integral
\[ \int_a^{\infty} f(x)\, dx = \lim_{b \to \infty} \int_a^b f(x)\, dx. \]

2. We computed some examples of improper integrals and discussed their convergence or divergence. We recalled (from earlier chapters) that
\[ I = \int_0^{\infty} e^{-rt}\, dt \quad \text{converges, whereas} \quad I = \int_1^{\infty} \frac{1}{x}\, dx \quad \text{diverges.} \]
3. More generally, we showed that
\[ \int_1^{\infty} \frac{1}{x^p}\, dx \quad \text{converges if } p > 1, \text{ diverges if } p \le 1. \]

4. Using a comparison between integrals and series we showed that the harmonic series,
\[ \sum_{k=1}^{\infty} \frac{1}{k} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{k} + \dots \]
diverges.

5. More generally, our results led to the conclusion that the "p" series,
\[ \sum_{k=1}^{\infty} \frac{1}{k^p} \quad \text{converges if } p > 1, \text{ diverges if } p \le 1. \]

6. We studied Taylor series and showed that some can be found using the formula for convergent geometric series. Two examples of Taylor series that were obtained in this way are
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots \quad \text{for } |x| < 1 \]
and
\[ \arctan(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \dots \quad \text{for } |x| < 1. \]

7. In discussing Taylor series, we considered some of the following questions: (a) For what range of values of x can we expect the series to converge? (b) Suppose we approximate the function on the right by a finite number of terms on the left. How good is that approximation? Another interesting question is: (c) If we include more and more such terms, does that approximation get better and better? (i.e., does the series converge to the function?) (d) Is the convergence rate rapid? Some of these questions occupy the attention of career mathematicians, and are beyond the scope of this introductory calculus course.

8. More generally, we showed that the Taylor series for a function about x = 0,
\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots = \sum_{k=0}^{\infty} a_k x^k \]
can be found by computing the coefficients
\[ a_k = \frac{f^{(k)}(0)}{k!}. \]

9. We discussed some of the applications of Taylor series. We used Taylor series to approximate a function, to find an approximation for a definite integral of a function, and to solve a differential equation.
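The two applications listed in the last item can be checked numerically. The sketch below, in Python, is only an illustration (the function names are ours, not part of the notes): it adds up terms of the series $\frac{1}{3} - \frac{1}{7 \cdot 3!} + \frac{1}{11 \cdot 5!} - \dots$ obtained for the integral of $\sin(x^2)$ on $0 \le x \le 1$, and it generates the coefficients of the series solution of dy/dx = y, y(0) = 1 from the rule $a_n = a_{n-1}/n$.

```python
import math

def integral_sin_x2_partial(n_terms):
    """Partial sum of the series for the integral of sin(x^2) from 0 to 1.

    Term j is (-1)^j / ((4j + 3) * (2j + 1)!), i.e. 1/3, -1/(7*3!), 1/(11*5!), ...
    """
    total = 0.0
    for j in range(n_terms):
        total += (-1) ** j / ((4 * j + 3) * math.factorial(2 * j + 1))
    return total

def exp_series_coefficients(n_max):
    """Coefficients a_0, ..., a_n_max from a_0 = 1 and a_n = a_(n-1) / n."""
    coeffs = [1.0]
    for n in range(1, n_max + 1):
        coeffs.append(coeffs[-1] / n)
    return coeffs

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "terms:", round(integral_sin_x2_partial(n), 6))
    coeffs = exp_series_coefficients(15)
    # Evaluating the series at x = 1 should approach e.
    print("series at x = 1:", sum(coeffs), " e =", math.e)
```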
Chapter 11 Appendix

11.1 How to prove the formulae for sums of squares and cubes

Mathematicians are concerned with rigorously establishing formulae such as sums of squared (or cubed) integers. While it is not hard to see that these formulae "work" for a few cases, determining that they work in general requires more work. Here we provide a taste of how such careful arguments work. We give two examples. The first, based on mathematical induction, provides a general method that could be used in many similar kinds of proofs. The second argument, also for purposes of illustration, uses a "trick". Devising such tricks is not as straightforward, and depends to some degree on serendipity or experience with numbers.

Proof by induction (optional)

Here, we prove the formula for the sum of square integers,
\[ \sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}, \]
using a technique called induction. The idea of the method is to check that the formula works for one or two simple cases (e.g. the "sum" of just one or just two terms of the series), and then show that whenever it works for one case (summing up to N), it has to also work for the next case (summing up to N + 1).

First, we verify that this formula works for a few test cases:

N = 1: If there is only one term, then clearly, by inspection,
\[ \sum_{k=1}^{1} k^2 = 1^2 = 1. \]
The formula indicates that we should get
\[ \frac{1(1+1)(2 \cdot 1 + 1)}{6} = \frac{1(2)(3)}{6} = 1, \]
so this case agrees with the prediction.

N = 2:
\[ S = \sum_{k=1}^{2} k^2 = 1^2 + 2^2 = 1 + 4 = 5. \]
The formula would then predict that
\[ \frac{2(2+1)(2 \cdot 2 + 1)}{6} = \frac{2(3)(5)}{6} = 5. \]
So far, elementary computation matches with the result predicted by the formula. Now we show that if this formula holds for any one case, e.g. for the sum of the first N squares, then it is also true for the next case, i.e. for the sum of N + 1 squares. So we will assume that someone has checked that for some particular value of N it is true that
\[ S_N = \sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}. \]
Now the sum of the first N + 1 squares will be just a bit bigger: it will have one more term added to it:
\[ S_{N+1} = \sum_{k=1}^{N+1} k^2 = \sum_{k=1}^{N} k^2 + (N+1)^2. \]
Thus
\[ S_{N+1} = \frac{N(N+1)(2N+1)}{6} + (N+1)^2. \]
Combining terms, we get
\[ S_{N+1} = (N+1) \left[ \frac{N(2N+1)}{6} + (N+1) \right], \]
\[ S_{N+1} = (N+1)\, \frac{2N^2 + N + 6N + 6}{6} = (N+1)\, \frac{2N^2 + 7N + 6}{6}. \]
Simplifying and factoring the last term leads to
\[ S_{N+1} = (N+1)\, \frac{(2N+3)(N+2)}{6}. \]
We want to check that this still agrees with what the formula predicts. To make the notation simpler, we will let M stand for N + 1. Then, expressing the result in terms of the quantity M = N + 1 we get
\[ S_M = \sum_{k=1}^{M} k^2 = (N+1)\, \frac{[2(N+1)+1][(N+1)+1]}{6} = M\, \frac{[2M+1][M+1]}{6}. \]
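This is the same formula as we started with, only written in terms of M instead of N. Thus we have verified that whenever the formula holds for some N, it also holds for N + 1. Since we checked directly that it holds for N = 1, mathematical induction shows that the formula holds for every positive integer N.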
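A numerical check is no substitute for the proof, but it is a quick way to catch algebra slips. The following short Python sketch (function names are ours) compares the formula with a direct sum, and also tests the induction step $S_{N+1} = S_N + (N+1)^2$ for a range of N.

```python
def sum_of_squares_direct(n):
    """Add up 1^2 + 2^2 + ... + n^2 term by term."""
    return sum(k * k for k in range(1, n + 1))

def sum_of_squares_formula(n):
    """Closed form N(N+1)(2N+1)/6; the product is always divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6

if __name__ == "__main__":
    for n in range(1, 201):
        # Base cases and general agreement with the direct sum.
        assert sum_of_squares_direct(n) == sum_of_squares_formula(n)
        # Induction step: adding the next square gives the formula at n + 1.
        assert sum_of_squares_formula(n) + (n + 1) ** 2 == sum_of_squares_formula(n + 1)
    print("formula agrees with direct summation for N = 1, ..., 200")
```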
Another method using a trick (footnote 70)

There is another method for determining the sums $\sum_{k=1}^{n} k^2$ or $\sum_{k=1}^{n} k^3$. Write
\[ (k+1)^3 - (k-1)^3 = 6k^2 + 2, \]
so
\[ \sum_{k=1}^{n} \left[ (k+1)^3 - (k-1)^3 \right] = \sum_{k=1}^{n} (6k^2 + 2). \]
But looking more carefully at the left hand side (LHS), we see that
\[ \sum_{k=1}^{n} \left( (k+1)^3 - (k-1)^3 \right) = 2^3 - 0^3 + 3^3 - 1^3 + 4^3 - 2^3 + 5^3 - 3^3 + \dots + (n+1)^3 - (n-1)^3. \]
Most of the terms cancel, leaving only $-1 + n^3 + (n+1)^3$, so this means that
\[ -1 + n^3 + (n+1)^3 = 6 \sum_{k=1}^{n} k^2 + \sum_{k=1}^{n} 2, \]
so
\[ \sum_{k=1}^{n} k^2 = \frac{-1 + n^3 + (n+1)^3 - 2n}{6} = \frac{2n^3 + 3n^2 + n}{6}. \]
Similarly, the formula for $\sum_{k=1}^{n} k^3$ can be obtained by starting with
\[ (k+1)^4 - (k-1)^4 = 8k^3 + 8k. \]
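The telescoping argument can be tested directly. In the Python sketch below (helper names are ours), `telescoping_lhs` adds up the differences $(k+1)^p - (k-1)^p$ term by term, and `sum_of_cubes_via_trick` carries out the analogous argument for cubes, using the expansion $(k+1)^4 - (k-1)^4 = 8k^3 + 8k$ and the known sum $\sum k = n(n+1)/2$.

```python
def telescoping_lhs(n, power):
    """Sum of (k+1)^power - (k-1)^power for k = 1..n, added term by term."""
    return sum((k + 1) ** power - (k - 1) ** power for k in range(1, n + 1))

def sum_of_squares_via_trick(n):
    """Solve -1 + n^3 + (n+1)^3 = 6 * sum(k^2) + 2n for the sum of squares."""
    return (-1 + n ** 3 + (n + 1) ** 3 - 2 * n) // 6

def sum_of_cubes_via_trick(n):
    """Solve (n+1)^4 + n^4 - 1 = 8 * sum(k^3) + 8 * sum(k) for the sum of cubes."""
    lhs = (n + 1) ** 4 + n ** 4 - 1
    sum_k = n * (n + 1) // 2
    return (lhs - 8 * sum_k) // 8

if __name__ == "__main__":
    n = 10
    print(telescoping_lhs(n, 3), (n + 1) ** 3 + n ** 3 - 1)  # the two should match
    print(sum_of_squares_via_trick(n), sum(k ** 2 for k in range(1, n + 1)))
    print(sum_of_cubes_via_trick(n), sum(k ** 3 for k in range(1, n + 1)))
```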
11.2 Riemann Sums: Extensions and other examples
We take up some issues here that were not yet considered in the context of our examples of Riemann sums in Chapter 2. First, we consider an arbitrary interval a ≤ x ≤ b. Then we comment on other ways of constructing the rectangular strip approximation (that eventually lead to the same limit when the true area is computed).

11.2.1 A general interval: a ≤ x ≤ b

Example 2: (Lu Fan)

Find the area under the graph of the function
\[ y = f(x) = x^2 + 2x + 1, \qquad a \le x \le b. \]

Footnote 70: I want to thank Robert Israel for contributing this material.
Here the interval is a ≤ x ≤ b. Let us leave the values of a, b general for a moment, and consider how the calculation is set up in this case. Then we have

length of interval = $b - a$
number of segments = $N$
width of rectangular strips = $\Delta x = \frac{b-a}{N}$
the k'th x value = $x_k = a + k\,\frac{(b-a)}{N}$
height of k'th rectangular strip = $f(x_k) = x_k^2 + 2x_k + 1$

Combining the last two steps, the height of rectangle k is:
\[ f(x_k) = \left( a + \frac{k(b-a)}{N} \right)^2 + 2\left( a + \frac{k(b-a)}{N} \right) + 1 \]
and its area is
\[ a_k = f(x_k) \times \Delta x = f(x_k) \times \frac{b-a}{N}. \]
We use the last two equations to express $a_k$ in terms of k (and the quantities a, b, N), then sum over k as before ($A = \sum a_k$). Some algebra is needed to simplify the sums so that summation formulae can be applied. The details are left as an exercise for the reader (see homework problems). Evaluating the limit N → ∞, we finally obtain
\[ A = \lim_{N \to \infty} \sum_{k=1}^{N} a_k = (a+1)^2 (b-a) + (a+1)(b-a)^2 + \frac{(b-a)^3}{3} \]
as the area under the function $f(x) = x^2 + 2x + 1$, over the interval a ≤ x ≤ b. Observe that the solution depends on a and b. (The endpoints of the interval influence the total area under the curve.) For example, if the given interval happens to be 2 ≤ x ≤ 4, then substituting a = 2, b = 4 into the above result for A leads to
\[ A = (2+1)^2 (4-2) + (2+1)(4-2)^2 + \frac{(4-2)^3}{3} = 18 + 12 + \frac{8}{3} = 32\tfrac{2}{3}. \]
In the next chapter, we will show that the tools of integration will lead to the same conclusion.
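The limiting value can be compared with an actual right endpoint Riemann sum for a large but finite N. The Python sketch below is only an illustration (the function names and the choice N = 100000 are ours); it follows the setup above, with $\Delta x = (b-a)/N$ and $x_k = a + k\,\Delta x$.

```python
def f(x):
    """The function of this example, f(x) = x^2 + 2x + 1."""
    return x ** 2 + 2 * x + 1

def right_riemann_sum(func, a, b, n):
    """Sum of n rectangular strips on [a, b], heights taken at right endpoints."""
    dx = (b - a) / n
    return sum(func(a + k * dx) for k in range(1, n + 1)) * dx

def area_formula(a, b):
    """The limiting area (a+1)^2 (b-a) + (a+1)(b-a)^2 + (b-a)^3 / 3."""
    h = b - a
    return (a + 1) ** 2 * h + (a + 1) * h ** 2 + h ** 3 / 3

if __name__ == "__main__":
    a, b = 2, 4
    for n in (10, 100, 1000, 100000):
        print(n, right_riemann_sum(f, a, b, n))
    print("limit:", area_formula(a, b))
```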
11.2.2 Using left (rather than right) endpoints
So far, we used the right endpoint of each rectangular strip to assign its height using the given function (see Figs. 2.2, 2.3, 2.4). Restated, we “glued” the top right corner of the rectangle to the graph of the function. This is the so called right endpoint approximation. We can just as well use the left corners of the rectangles to assign their heights (left endpoint approximation). A comparison of these for the function y = f (x) = x2 is shown in Figs. 11.1 and 11.2. In the case of the left endpoint approximation, we evaluate
the heights of the rectangles starting at $x_0$ (instead of $x_1$), and ending at $x_{N-1}$ (instead of $x_N$). There are still N rectangles. To compare, the sum of the areas of the rectangles in the left versus the right endpoint approximation is
\[ \text{Right endpoints:} \quad A_{N\ \text{strips}} = \sum_{k=1}^{N} f(x_k)\,\Delta x, \]
\[ \text{Left endpoints:} \quad A_{N\ \text{strips}} = \sum_{k=0}^{N-1} f(x_k)\,\Delta x. \]
Details of one such computation are given in the box.

Figure 11.1. The area under the curve y = f(x) over an interval a ≤ x ≤ b could be computed by using either a left or right endpoint approximation. That is, the heights of the rectangles are adjusted to match the function of interest either on their right or on their left corner. Here we compare the two approaches. Usually both lead to the same result once a limit is computed to arrive at the "true" area.
Example of left endpoint calculation

We here look again at a simple example, using the quadratic function,
\[ f(x) = x^2, \qquad 0 \le x \le 1. \]
We now compare the right and left endpoint approximation. These are shown in panels of Figure 11.2. Note that
\[ \Delta x = \frac{1}{N}, \qquad x_k = \frac{k}{N}. \]
The area of the k'th rectangle is
\[ a_k = f(x_k) \times \Delta x = (k/N)^2 (1/N), \]
but now the sum starts at k = 0 so
\[ A_{N\ \text{strips}} = \sum_{k=0}^{N-1} f(x_k)\,\Delta x = \sum_{k=0}^{N-1} \left( \frac{k}{N} \right)^2 \frac{1}{N} = \frac{1}{N^3} \sum_{k=0}^{N-1} k^2. \]
The first rectangle corresponds to k = 0 in the left endpoint approximation (rather than k = 1 in the right endpoint approximation). But the k = 0 rectangle makes no contribution (as its area is zero in this example) and we have one less rectangle at the right endpoint of the interval, since the N'th rectangle is k = N − 1. Then the sum is
\[ A_{N\ \text{strips}} = \frac{1}{N^3} \cdot \frac{(2(N-1)+1)(N-1)(N)}{6} = \frac{(2N-1)(N-1)}{6N^2}. \]
The area, obtained by taking a limit for N → ∞, is
\[ A = \lim_{N \to \infty} A_{N\ \text{strips}} = \lim_{N \to \infty} \frac{(2N-1)(N-1)}{6N^2} = \frac{2}{6} = \frac{1}{3}. \]
We see that, after computing the limit, the result for the “true area” under the curve is exactly the same as we found earlier in this chapter using the right endpoint approximation.
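To see how the two approximations approach the common limit 1/3, one can tabulate them for increasing N. The following Python sketch (function names are ours) computes both sums for $f(x) = x^2$ on 0 ≤ x ≤ 1 and compares the left sum with the closed form $(2N-1)(N-1)/(6N^2)$ derived in the box.

```python
def left_sum(n):
    """Left endpoint approximation for f(x) = x^2 on [0, 1] with n strips."""
    dx = 1.0 / n
    return sum((k * dx) ** 2 for k in range(0, n)) * dx

def right_sum(n):
    """Right endpoint approximation for f(x) = x^2 on [0, 1] with n strips."""
    dx = 1.0 / n
    return sum((k * dx) ** 2 for k in range(1, n + 1)) * dx

def left_sum_closed_form(n):
    """The expression (2N - 1)(N - 1) / (6 N^2) obtained in the box."""
    return (2 * n - 1) * (n - 1) / (6 * n ** 2)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, left_sum(n), left_sum_closed_form(n), right_sum(n))
```

For this increasing function the left sum stays below 1/3 and the right sum above it; the two differ by exactly one strip of height f(1) = 1, i.e. by 1/N.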
11.3 Physical interpretation of the center of mass
We defined the idea of a center of mass in Chapter 5. The center of mass has a physical interpretation for a real mass distribution. Loosely speaking, it is the position at which the mass "balances" without rotating to the left or right. In physics, we say that there is no net torque. The analogy with children sitting on a teeter totter is relevant: many children may sit along the length of the frame of a teeter totter, but if they distribute themselves in a way that the center of mass is at the fulcrum of the teeter totter, they will remain precariously balanced (until one of them fidgets or gets off!). Notice that both the mass and the position of each child are important: a light child sitting on the very edge of the teeter totter can balance a heavier child sitting closer to the fulcrum (middle). The center of mass need not be the same as the median position. As we have seen, the median is a position that
subdivides the distribution into two equal masses (or, more generally, produces equal sized areas under the graph of the density function.) The center of mass assigns a greater weight to parts of the distribution that are "far away" in the same sense. (However, for symmetric distributions, the median and the mean are the same.)

Figure 11.2. Rectangles with left or right corners on the graph of $y = x^2$ are compared in this picture. (Panels: comparison of right and left approximations; right endpoint approximation; left endpoint approximation.) The approximation shown in pink is "missing" the largest rectangle shown in green. However, in the limit as the number of rectangles, N → ∞, the true area obtained is the same.

In physics, we speak of the "moment of mass" of a distribution about a point. This quantity is related to the tendency of the mass to contribute a torque, i.e. to make the object rotate. Suppose we are interested in a particular point of reference x. In a discrete mass distribution, for example, the moment of mass of each of the beads relative to point x is given by the product of the mass and its distance away from the point. As with the teeter totter, beads farther away will contribute more torque than beads closer to point x, and heavier beads (i.e. greater mass) will contribute more torque than lighter beads. For example, mass 1 contributes an amount $m_1 (x - x_1)$ to the total moment of mass of the distribution about the point x. Altogether the moment of mass of the distribution about the
point x is defined as
\[ M_1(x) = \sum_{i=1}^{n} m_i (x - x_i). \]
The center of mass is a special point $\bar{x}$ such that the moment of mass about that point is zero. (Loosely speaking the tendency to rotate to the left or the right are the same: thus the distribution would be balanced if it "rested on that point".) Thus, we identify the center of mass as the point at which
\[ M_1(\bar{x}) = 0, \quad \text{or} \quad \sum_{i=1}^{n} m_i (\bar{x} - x_i) = 0. \]
Now expanding the sum, we rewrite the above as
\[ \sum_{i=1}^{n} m_i \bar{x} - \sum_{i=1}^{n} m_i x_i = 0, \quad \text{so} \quad \bar{x} \sum_{i=1}^{n} m_i - \sum_{i=1}^{n} m_i x_i = 0. \]
But we already know that the first summation above is just the total mass, M, so that
\[ \bar{x} M - \sum_{i=1}^{n} m_i x_i = 0, \]
so, taking the second term to the other side and dividing by M leads to
\[ \bar{x} = \frac{1}{M} \sum_{i=1}^{n} m_i x_i. \]
We have recovered precisely the definition of the center of mass or "average x coordinate".

Figure 11.3. A discrete set of masses $m_1, m_2, m_3$ is distributed at positions $x_1, x_2, x_3$. The center of mass of the distribution is the position at which the given mass distribution would balance, here represented by the white triangle.
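The balance condition $M_1(\bar{x}) = 0$ is easy to confirm for a concrete set of beads. In the Python sketch below, the masses and positions are made-up illustrative values (they do not come from the notes), and the function names are ours.

```python
def moment_of_mass(masses, positions, x):
    """M1(x): sum of m_i * (x - x_i) over all masses."""
    return sum(m * (x - xi) for m, xi in zip(masses, positions))

def center_of_mass(masses, positions):
    """x_bar = (1/M) * sum of m_i * x_i, where M is the total mass."""
    total_mass = sum(masses)
    return sum(m * xi for m, xi in zip(masses, positions)) / total_mass

if __name__ == "__main__":
    # Hypothetical example: three beads on a wire.
    masses = [1.0, 2.0, 3.0]
    positions = [0.0, 1.0, 4.0]
    x_bar = center_of_mass(masses, positions)
    print("center of mass:", x_bar)
    print("moment about center of mass:", moment_of_mass(masses, positions, x_bar))

    # Teeter totter: a light child far out balances a heavier child closer in.
    print(center_of_mass([20.0, 40.0], [-2.0, 1.0]))
```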
11.4 The shell method for computing volumes

In Chapter 5, we used dissection into small disks to compute the volume of solids of revolution. Here we use an alternative dissection into shells.

11.4.1 Example: Volume of a cone using the shell method

We use the shell method (footnote 71) to find the volume of the cone formed by rotating the curve
\[ y = 1 - x \]
about the y axis.

Figure 11.4. Top: The curve that generates the cone (left) and the shape of the cone (right). Bottom: the cone showing one of the series of shells that are used in this example to calculate its volume.

Footnote 71: Note to the instructor: This material may be skipped in the interest of time. It presents an alternative to the disk method, but there may not be enough time to cover this in detail.

Solution

We show the cone and its generating curve in Figure 11.4, together with a representative shell used in the calculation of total volume. The volume of a cylindrical shell of radius r, height h and thickness τ is
\[ V_{\text{shell}} = 2\pi r h \tau. \]
We will place these shells one inside the other so that their radii are parallel to the x axis (so r = x). The heights of the shells are determined by their y value (i.e. h = y = 1 − x =
1 − r). For the tallest shell r = 0, and for the flattest shell r = 1. The thickness of the shell is Δr. Therefore, the volume of one shell is
\[ V_{\text{shell}} = 2\pi r (1 - r)\, \Delta r. \]
The volume of the object is obtained by summing up these shell volumes. In the limit, as Δr → dr gets infinitesimally small, we recognize this as a process of integration. We integrate over 0 ≤ r ≤ 1, to obtain:
\[ V = 2\pi \int_0^1 r(1 - r)\, dr = 2\pi \int_0^1 (r - r^2)\, dr. \]
We find that
\[ V = 2\pi \left[ \frac{r^2}{2} - \frac{r^3}{3} \right]_0^1 = 2\pi \left( \frac{1}{2} - \frac{1}{3} \right) = \frac{\pi}{3}. \]

11.5 More techniques of integration

11.5.1 Secants and other "hard integrals"

In a previous section, we encountered the integral
\[ I = \int \sec^3(x)\, dx. \]
This integral can be simplified to some extent by integration by parts as follows: Let $u = \sec(x)$, $dv = \sec^2(x)\, dx$. Then $du = \sec(x)\tan(x)\, dx$ while $v = \int \sec^2(x)\, dx = \tan(x)$. The integral can be transformed to
\[ I = \sec(x)\tan(x) - \int \sec(x)\tan^2(x)\, dx. \]
The latter can be rewritten:
\[ I_1 = \int \sec(x)\tan^2(x)\, dx = \int \sec(x)(\sec^2(x) - 1)\, dx, \]
where we have used a trigonometric identity for $\tan^2(x)$. Then
\[ I = \sec(x)\tan(x) - \int \sec^3(x)\, dx + \int \sec(x)\, dx = \sec(x)\tan(x) - I + \int \sec(x)\, dx \]
so (taking both I's to the left hand side, and dividing by 2)
\[ I = \frac{1}{2} \left( \sec(x)\tan(x) + \int \sec(x)\, dx \right). \]
We are now in need of an antiderivative for sec(x). No "obvious substitution" or further integration by parts helps here, but it can be checked by differentiation that
\[ \int \sec(x)\, dx = \ln|\sec(x) + \tan(x)| + C. \]
Then the final result is
\[ I = \frac{1}{2} \left( \sec(x)\tan(x) + \ln|\sec(x) + \tan(x)| \right) + C. \]
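The final antiderivative can be checked "by differentiation" numerically as well as by hand. The Python sketch below is an illustration only (the function names and the step size h are our choices): it approximates the derivative of the claimed antiderivative with a central difference and compares it with $\sec^3(x)$ at a few points in $0 \le x < \pi/2$.

```python
import math

def sec(x):
    """Secant, 1 / cos(x)."""
    return 1.0 / math.cos(x)

def antiderivative_sec3(x):
    """Claimed antiderivative (1/2)(sec x tan x + ln|sec x + tan x|), with C = 0."""
    return 0.5 * (sec(x) * math.tan(x) + math.log(abs(sec(x) + math.tan(x))))

def central_difference(func, x, h=1e-5):
    """Numerical derivative (func(x + h) - func(x - h)) / (2h)."""
    return (func(x + h) - func(x - h)) / (2 * h)

if __name__ == "__main__":
    for x in (0.0, 0.3, 0.7, 1.0):
        slope = central_difference(antiderivative_sec3, x)
        print(f"x = {x}: numerical derivative = {slope:.6f}, sec^3(x) = {sec(x) ** 3:.6f}")
```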
11.5.2 A special case of integration by partial fractions

Evaluate this integral (footnote 72):
\[ I = \int_1^2 \frac{7x + 4}{6x^2 + 7x + 2}\, dx. \]
This integral involves a rational function (that is, a ratio of two polynomials). The denominator is a degree 2 polynomial function that has two roots and that can be factored easily; the numerator is a degree 1 polynomial function. In this case, we can use the following strategy. First, factor the denominator:
\[ 6x^2 + 7x + 2 = (2x + 1)(3x + 2). \]
Assign A and B in the following way:
\[ \frac{A}{2x+1} + \frac{B}{3x+2} = \frac{7x+4}{(2x+1)(3x+2)} = \frac{7x+4}{6x^2 + 7x + 2}. \]
(Remember, this is how we define A and B.) Next, find the common denominator and rewrite it as a single fraction in terms of A and B:
\[ \frac{A}{2x+1} + \frac{B}{3x+2} = \frac{3Ax + 2A + 2Bx + B}{(2x+1)(3x+2)}. \]
Group like terms in the numerator, and note that this has to match the original fraction, so:
\[ \frac{3Ax + 2A + 2Bx + B}{(2x+1)(3x+2)} = \frac{(3A + 2B)x + (2A + B)}{(2x+1)(3x+2)} = \frac{7x + 4}{(2x+1)(3x+2)}. \]
The above equation should hold true for all x values, therefore:
\[ 3A + 2B = 7, \qquad 2A + B = 4. \]
Solving the system of equations leads to A = 1, B = 2. Using this result, we rewrite the original expression in the form:
\[ \frac{7x+4}{6x^2 + 7x + 2} = \frac{7x+4}{(2x+1)(3x+2)} = \frac{A}{2x+1} + \frac{B}{3x+2} = \frac{1}{2x+1} + \frac{2}{3x+2}. \]
Now we are ready to rewrite the integral:
\[ I = \int_1^2 \frac{7x+4}{6x^2 + 7x + 2}\, dx = \int_1^2 \left( \frac{1}{2x+1} + \frac{2}{3x+2} \right) dx. \]
Simplify:
\[ I = \int_1^2 \frac{1}{2x+1}\, dx + 2 \int_1^2 \frac{1}{3x+2}\, dx. \]
Now the integral becomes a simple natural log integral that follows the pattern of Eqn. 6.1. Simplify:
\[ I = \frac{1}{2} \ln|2x+1| \Big|_1^2 + \frac{2}{3} \ln|3x+2| \Big|_1^2. \]

Footnote 72: This section was contributed by Lu Fan.
Simplify further:
\[ I = \frac{1}{2}(\ln 5 - \ln 3) + \frac{2}{3}(\ln 8 - \ln 5) = -\frac{1}{6}\ln 5 - \frac{1}{2}\ln 3 + \frac{2}{3}\ln 8. \]
This method can be used to solve any integral that contains a fraction with a degree 1 polynomial in the numerator and a degree 2 polynomial (that has two roots) in the denominator.

11.6 Analysis of data: a student grade distribution

We study the distribution of student grades on a test written by 76 students and graded out of a maximum of 50 points.

11.6.1 Defining an average grade

Let N be the size of the class, and $y_k$ the grade of student k. Here k is the number of the student from 1 to N, and $y_k$ takes any value between 0 and 50 points. Then the average grade $\bar{Y}$ is computed by adding up the scores of all students and dividing by the number of students as follows:
\[ \bar{Y} = \frac{1}{N} \sum_{k=1}^{N} y_k. \]
For example, for a class of 76 students, we would have the sum
\[ \bar{Y} = \frac{1}{76} \sum_{k=1}^{76} y_k. \]

11.6.2 Fraction of students that scored a given grade

Suppose that the number of students who got the grade $x_i$ is $p_i$. If the class consists of a total of N students, then it follows that
\[ N = \sum_{i=1}^{10} p_i. \]
This is just saying that the sum of the number of students in every one of the categories has to add up to the total class size. The fraction of the class that scored grade $x_i$ is $p_i / N$. The value $p_i / N$ is the empirical probability of getting grade $x_i$. (Dividing by N has normalized the distribution.) The mean or average grade is:
\[ \bar{X} = \frac{1}{N} \sum_{i=0}^{50} x_i p_i. \]
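As a concrete illustration of the last formula, the Python sketch below computes the class size and the mean grade from counts per grade. The scores and counts are the binned values that appear in Table 11.1 later in this section (each score stands for a range of ±2 points), so the result is an approximation to the true mean; the variable and function names are ours.

```python
# Representative score of each bin and number of students in it (from Table 11.1).
scores = [0, 3, 8, 13, 18, 23, 28, 33, 38, 43, 48]
counts = [0, 1, 2, 0, 5, 10, 8, 21, 19, 6, 4]

def class_size(counts):
    """N: the counts in all categories add up to the total class size."""
    return sum(counts)

def mean_grade(scores, counts):
    """Mean grade (1/N) * sum of x_i * p_i."""
    n = class_size(counts)
    return sum(x * p for x, p in zip(scores, counts)) / n

if __name__ == "__main__":
    print("class size:", class_size(counts))
    print("mean grade:", round(mean_grade(scores, counts), 1))
```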
11.6.3 Frequency distribution

It is difficult to visualize all the data if we list all the grades obtained. We "lump together" scores into various categories (or "bins") and create a distribution. For example, test scores might be divided into ranges of bins in increments of 5 points: (1-5, 6-10, 11-15, etc). We could represent grades in each bin by some value up to a specified level of accuracy. For example, grades in the range 16-20 can be described by the score 18 up to an accuracy of ±2. This is what we have done in Table 11.1.

We will now reinterpret our notation somewhat. We will refer to $\tilde{x}_i$ as the score and $p_i$ the number of students whose test score fell within the range represented by $\tilde{x}_i \pm$ accuracy. (The notation $\tilde{x}_i$ is meant to remind us that we are approximating the grade value.) For example, consider 10 "bins" or grade categories. In that case, the index i takes on values i = 1, 2, ..., 10. Then, e.g., $\tilde{x}_4$ represents all grades in the fourth "bin", i.e. grades between 16-20. A plot of $p_i$ against $\tilde{x}_i$ is called a frequency distribution. The bar graph shown in Figure 11.5 represents this distribution. Table 11.1 shows the data that produced that bar graph.

Figure 11.5. Distributions of grades on a test with 50 point maximum. There were a total of 76 students writing the test. The mean grade 31.9 is shown.

11.6.4 Average/mean of the distribution

The frequency distribution can also be used to compute an average value: each (approximate) grade value $\tilde{x}_i$ is achieved by $p_i$ students, which is a fraction ($p_i/N$) of the whole class. When we form the multiple $(p_i/N)\tilde{x}_i$, we assign a "weight" to each of the cate-
Appendix (1/N ) xi pi ˜ 0.2 are saying the same thing.2) The sum in the denominator of this last fraction is simply the total class size. M i=1 pi (11.234 i 0 1 2 3 4 5 6 7 8 9 10 grade xi ˜ 0 3±2 8±2 13±2 18±2 23±2 28±2 33±2 38±2 43±2 48±2 number pi 0 1 2 0 5 10 8 21 19 6 4 pi 0 1 3 3 8 18 26 47 66 72 76 xi pi ˜ 0.0263 29.5 Cumulative function We can calculate a “runn